Databricks Certified Associate Developer for Apache Spark Exam Practice Questions (P. 4)
Question #16
Which of the following operations can be used to create a new DataFrame that has 12 partitions from an original DataFrame df that has 8 partitions?
- A. df.repartition(12) (Most Voted)
- B. df.cache()
- C. df.partitionBy(1.5)
- D. df.coalesce(12)
- E. df.partitionBy(12)
Correct Answer: A

Option A is correct because repartition() can both increase and decrease the number of partitions in a DataFrame. df.repartition(12) performs a full shuffle and redistributes the data across 12 partitions, regardless of the original count of 8. Option D does not work here: coalesce() only merges existing partitions without a shuffle, so calling df.coalesce(12) on a DataFrame with 8 partitions leaves it at 8. cache() does not change partitioning, and partitionBy() is not a DataFrame method (it belongs to DataFrameWriter and to window specifications).
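The difference between repartition() and coalesce() can be sketched with a toy model in plain Python. This is not Spark code: `toy_repartition` and `toy_coalesce` are hypothetical helpers that treat a DataFrame as a list of row lists, and they only mimic the documented partition-count behavior, assuming coalesce() never shuffles.

```python
from typing import List

Partition = List[int]


def toy_repartition(parts: List[Partition], n: int) -> List[Partition]:
    """Mimic a full shuffle: every row is reassigned round-robin to n partitions."""
    rows = [row for part in parts for row in part]
    out: List[Partition] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        out[i % n].append(row)
    return out


def toy_coalesce(parts: List[Partition], n: int) -> List[Partition]:
    """Mimic a shuffle-free merge: partitions are combined, never split."""
    if n >= len(parts):
        return parts  # nothing to merge, so the count cannot grow
    out: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        out[i % n].extend(part)
    return out


# 8 partitions of 3 rows each, standing in for df
original = [list(range(p * 3, p * 3 + 3)) for p in range(8)]

print(len(toy_repartition(original, 12)))  # 12
print(len(toy_coalesce(original, 12)))     # 8
print(len(toy_coalesce(original, 4)))      # 4
```

In a live PySpark session, df.repartition(12).rdd.getNumPartitions() reports the resulting partition count directly.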
Question #17
Which of the following object types cannot be contained within a column of a Spark DataFrame?
Question #18
Which of the following operations can be used to create a DataFrame with a subset of columns from DataFrame storesDF that are specified by name?
- A. storesDF.subset()
- B. storesDF.select() (Most Voted)
- C. storesDF.selectColumn()
- D. storesDF.filter()
- E. storesDF.drop()
Correct Answer: B
Question #19
The code block shown below contains an error. The code block is intended to return a DataFrame containing all columns from DataFrame storesDF except for column sqft and column customerSatisfaction. Identify the error.
Code block:
storesDF.drop(sqft, customerSatisfaction)
- A. The drop() operation only works if one column name is called at a time – there should be two calls in succession like storesDF.drop("sqft").drop("customerSatisfaction").
- B. The drop() operation only works if column names are wrapped inside the col() function like storesDF.drop(col(sqft), col(customerSatisfaction)).
- C. There is no drop() operation for storesDF.
- D. The sqft and customerSatisfaction column names should be quoted like "sqft" and "customerSatisfaction". (Most Voted)
- E. The sqft and customerSatisfaction column names should be subset from the DataFrame storesDF like storesDF."sqft" and storesDF."customerSatisfaction".
Correct Answer: D
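Why the quotes matter can be shown in plain Python, assuming the code block is PySpark. The `drop` function below is a hypothetical stand-in that only echoes its arguments; it is not the DataFrame method. Bare names such as sqft are looked up as Python variables before drop() is ever called, so the call fails unless variables with those names happen to exist.

```python
def drop(*cols):
    """Hypothetical stand-in for DataFrame.drop: returns the names it was given."""
    return cols


try:
    # sqft and customerSatisfaction are undefined variables here
    drop(sqft, customerSatisfaction)
    unquoted_outcome = "ok"
except NameError:
    unquoted_outcome = "NameError"

# String literals reach the function unchanged
quoted_outcome = drop("sqft", "customerSatisfaction")

print(unquoted_outcome)  # NameError
print(quoted_outcome)    # ('sqft', 'customerSatisfaction')
```

In PySpark itself, storesDF.drop("sqft", "customerSatisfaction") removes both columns in a single call, because drop() accepts several column names; this is also why option A is wrong.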
Question #20
Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000?
- A. storesDF.filter("sqft" <= 25000)
- B. storesDF.filter(sqft > 25000)
- C. storesDF.where(storesDF[sqft] > 25000)
- D. storesDF.where(sqft > 25000)
- E. storesDF.filter(col("sqft") <= 25000) (Most Voted)
Correct Answer: E
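The reason option A fails while option E works can be sketched in plain Python, assuming PySpark semantics. `ToyColumn` is a hypothetical class that imitates how a column object overloads comparison operators to build an expression; it is not Spark's Column.

```python
class ToyColumn:
    """Hypothetical stand-in for a column object with an overloaded <= operator."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __le__(self, other: object) -> str:
        # Build a description of the comparison instead of evaluating it
        return f"({self.name} <= {other})"


try:
    # Option A: a plain str compared with an int, evaluated by Python itself
    "sqft" <= 25000
    plain_string = "ok"
except TypeError:
    plain_string = "TypeError"

# Option E: the column object decides what <= means
expression = ToyColumn("sqft") <= 25000

print(plain_string)  # TypeError
print(expression)    # (sqft <= 25000)
```

Options B, C, and D use > rather than <=, so they would select the wrong rows even with valid column references. A SQL expression string with the whole condition inside the quotes, such as storesDF.filter("sqft <= 25000"), is also accepted by filter().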