Databricks Certified Associate Developer for Apache Spark Exam Practice Questions (P. 4)
Question #31
Which of the following code blocks will most quickly return an approximation for the number of distinct values in column division in DataFrame storesDF?
- A. storesDF.agg(approx_count_distinct(col("division")).alias("divisionDistinct"))
- B. storesDF.agg(approx_count_distinct(col("division"), 0.01).alias("divisionDistinct"))
- C. storesDF.agg(approx_count_distinct(col("division"), 0.15).alias("divisionDistinct")) (Most Voted)
- D. storesDF.agg(approx_count_distinct(col("division"), 0.0).alias("divisionDistinct"))
- E. storesDF.agg(approx_count_distinct(col("division"), 0.05).alias("divisionDistinct"))
Question #32
The code block shown below contains an error. The code block is intended to return a new DataFrame with the mean of column sqft from DataFrame storesDF in column sqftMean. Identify the error.
Code block:
storesDF.agg(mean("sqft").alias("sqftMean"))
- A. The argument to the mean() operation should be a Column object rather than a string column name. (Most Voted)
- B. The argument to the mean() operation should not be quoted.
- C. The mean() operation is not a standalone function – it’s a method of the Column object.
- D. The agg() operation is not appropriate here – the withColumn() operation should be used instead.
- E. The only way to compute a mean of a column is with the mean() method from a DataFrame.
Correct Answer:
A
Question #33
Which of the following operations can be used to return the number of rows in a DataFrame?
- A. DataFrame.numberOfRows()
- B. DataFrame.n()
- C. DataFrame.sum()
- D. DataFrame.count()
- E. DataFrame.countDistinct()
Correct Answer:
D
Question #34
Which of the following operations returns a GroupedData object?
- A. DataFrame.GroupBy()
- B. DataFrame.cubed()
- C. DataFrame.group()
- D. DataFrame.groupBy() (Most Voted)
- E. DataFrame.grouping_id()
Correct Answer:
D
Question #35
Which of the following code blocks returns a collection of summary statistics for all columns in DataFrame storesDF?
- A. storesDF.summary("mean")
- B. storesDF.describe(all = True)
- C. storesDF.describe("all")
- D. storesDF.summary("all")
- E. storesDF.describe() (Most Voted)
Correct Answer:
E
Question #36
Which of the following code blocks fails to return a DataFrame reverse sorted alphabetically based on column division?
- A. storesDF.orderBy("division", ascending = False)
- B. storesDF.orderBy(["division"], ascending = [0])
- C. storesDF.orderBy(col("division").asc())
- D. storesDF.sort("division", ascending = False)
- E. storesDF.sort(desc("division"))
Correct Answer:
C
Question #37
Which of the following code blocks returns a 15 percent sample of rows from DataFrame storesDF without replacement?
- A. storesDF.sample(fraction = 0.10)
- B. storesDF.sampleBy(fraction = 0.15)
- C. storesDF.sample(True, fraction = 0.10)
- D. storesDF.sample()
- E. storesDF.sample(fraction = 0.15) (Most Voted)
Correct Answer:
E
Question #38
Which of the following code blocks returns all the rows from DataFrame storesDF?
- A. storesDF.head()
- B. storesDF.collect() (Most Voted)
- C. storesDF.count()
- D. storesDF.take()
- E. storesDF.show()
Correct Answer:
B
Question #39
Which of the following code blocks applies the function assessPerformance() to each row of DataFrame storesDF?
- A. [assessPerformance(row) for row in storesDF.take(3)]
- B. [assessPerformance() for row in storesDF]
- C. storesDF.collect().apply(lambda: assessPerformance)
- D. [assessPerformance(row) for row in storesDF.collect()] (Most Voted)
- E. [assessPerformance(row) for row in storesDF]
Correct Answer:
D
Question #40
The code block shown below contains an error. The code block is intended to print the schema of DataFrame storesDF. Identify the error.
Code block:
storesDF.printSchema
- A. There is no printSchema member of DataFrame – schema and the print() function should be used instead.
- B. The entire line needs to be a string – it should be wrapped by str().
- C. There is no printSchema member of DataFrame – the getSchema() operation should be used instead.
- D. There is no printSchema member of DataFrame – the schema() operation should be used instead.
- E. The printSchema member of DataFrame is an operation and needs to be followed by parentheses. (Most Voted)
Correct Answer:
E
