Databricks Certified Associate Developer for Apache Spark Exam Practice Questions (P. 5)
Question #41
The code block shown below should create and register a SQL UDF named "ASSESS_PERFORMANCE" using the Python function assessPerformance() and apply it to column customerSatisfaction in table stores. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
spark._1_._2_(_3_, _4_)
spark.sql("SELECT customerSatisfaction, _5_(customerSatisfaction) AS result FROM stores")
- A. 1. udf, 2. register, 3. "ASSESS_PERFORMANCE", 4. assessPerformance, 5. ASSESS_PERFORMANCE (Most Voted)
- B. 1. udf, 2. register, 3. assessPerformance, 4. "ASSESS_PERFORMANCE", 5. "ASSESS_PERFORMANCE"
- C. 1. udf, 2. register, 3. "ASSESS_PERFORMANCE", 4. assessPerformance, 5. "ASSESS_PERFORMANCE"
- D. 1. register, 2. udf, 3. "ASSESS_PERFORMANCE", 4. assessPerformance, 5. "ASSESS_PERFORMANCE"
- E. 1. udf, 2. register, 3. ASSESS_PERFORMANCE, 4. assessPerformance, 5. ASSESS_PERFORMANCE
Correct Answer: A
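For reference, a minimal sketch of answer A, assuming an active SparkSession named spark, a registered table stores, and a hypothetical body for assessPerformance():

# Hypothetical Python function; the body is an assumption for illustration.
def assessPerformance(satisfaction):
    return "high" if satisfaction >= 4 else "low"

# Register it under the SQL name "ASSESS_PERFORMANCE" (blanks 1-4):
spark.udf.register("ASSESS_PERFORMANCE", assessPerformance)

# Call the registered name from SQL (blank 5):
resultDF = spark.sql("SELECT customerSatisfaction, ASSESS_PERFORMANCE(customerSatisfaction) AS result FROM stores")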
Question #42
The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and apply it to column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
assessPerformanceUDF = udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
- A. The assessPerformance() operation is not properly registered as a UDF.
- B. The withColumn() operation is not appropriate here – UDFs should be applied by iterating over rows instead.
- C. UDFs can only be applied via SQL and not through the DataFrame API.
- D. The return type of assessPerformanceUDF() is not specified in the udf() operation. (Most Voted)
- E. The assessPerformance() operation should be used on column customerSatisfaction rather than the assessPerformanceUDF() operation.
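A corrected sketch reflecting the voted answer (D): udf() assumes a string return type unless told otherwise, so the integer return type should be declared explicitly. The function body is a hypothetical stand-in, and storesDF is assumed to exist:

from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

def assessPerformance(satisfaction):
    # Hypothetical integer-returning logic, for illustration only.
    return int(satisfaction) * 2

# Declare the return type; without it, udf() defaults to StringType.
assessPerformanceUDF = udf(assessPerformance, IntegerType())

resultDF = storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))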
Question #43
The code block shown below contains an error. The code block is intended to use SQL to return a new DataFrame containing column storeId and column managerName from a table created from DataFrame storesDF. Identify the error.
Code block:
storesDF.createOrReplaceTempView("stores")
storesDF.sql("SELECT storeId, managerName FROM stores")
- A. The createOrReplaceTempView() operation does not make a DataFrame accessible via SQL.
- B. The sql() operation should be accessed via the spark variable rather than DataFrame storesDF. (Most Voted)
- C. There is no sql() operation in DataFrame storesDF – the query() operation should be used instead.
- D. This cannot be accomplished using SQL – the DataFrame API should be used instead.
- E. The createOrReplaceTempView() operation should be accessed via the spark variable rather than DataFrame storesDF.
Correct Answer: B
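A corrected sketch reflecting answer B: sql() is a method of the SparkSession, not of a DataFrame. Assumes an active SparkSession named spark and an existing storesDF:

storesDF.createOrReplaceTempView("stores")
# sql() must be called on the SparkSession, not on storesDF.
resultDF = spark.sql("SELECT storeId, managerName FROM stores")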
Question #44
The code block shown below should create a single-column DataFrame from the Python list years, which is made up of integers. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
_1_._2_(_3_, _4_)
- A. 1. spark, 2. createDataFrame, 3. years, 4. IntegerType
- B. 1. DataFrame, 2. create, 3. [years], 4. IntegerType
- C. 1. spark, 2. createDataFrame, 3. [years], 4. IntegerType
- D. 1. spark, 2. createDataFrame, 3. [years], 4. IntegerType()
- E. 1. spark, 2. createDataFrame, 3. years, 4. IntegerType() (Most Voted)
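A minimal sketch of the voted answer (E); the list contents are an assumption, since the question only states that years holds integers:

from pyspark.sql.types import IntegerType

years = [2019, 2020, 2021, 2022]  # hypothetical sample values

# Passing an instantiated IntegerType() as the schema yields a
# single-column DataFrame (the column is named "value" by default).
yearsDF = spark.createDataFrame(years, IntegerType())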
Question #45
The code block shown below contains an error. The code block is intended to cache DataFrame storesDF only in Spark’s memory and then return the number of rows in the cached DataFrame. Identify the error.
Code block:
storesDF.cache().count()
- A. The cache() operation caches DataFrames at the MEMORY_AND_DISK level by default – the storage level must be specified as MEMORY_ONLY as an argument to cache().
- B. The cache() operation caches DataFrames at the MEMORY_AND_DISK level by default – the storage level must be set via storesDF.storageLevel prior to calling cache().
- C. The storesDF DataFrame has not been checkpointed – it must have a checkpoint in order to be cached.
- D. DataFrames themselves cannot be cached – DataFrame storesDF must be cached as a table.
- E. The cache() operation can only cache DataFrames at the MEMORY_AND_DISK level (the default) – persist() should be used instead. (Most Voted)
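A corrected sketch reflecting the voted answer (E): cache() takes no arguments and always uses the default storage level, so persist() is needed to request memory-only caching. Assumes an existing storesDF:

from pyspark import StorageLevel

# persist() accepts an explicit storage level; count() materializes the cache.
rowCount = storesDF.persist(StorageLevel.MEMORY_ONLY).count()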
Question #46
Which of the following operations can be used to return a new DataFrame from DataFrame storesDF without inducing a shuffle?
- A. storesDF.intersect()
- B. storesDF.repartition(1)
- C. storesDF.union() (Most Voted)
- D. storesDF.coalesce(1)
- E. storesDF.rdd.getNumPartitions()
Correct Answer: D
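A minimal sketch of answer D: coalesce() merges rows into fewer partitions without a shuffle, whereas repartition() always shuffles. Assumes an existing storesDF:

# No shuffle: existing partitions are merged locally into one.
singlePartitionDF = storesDF.coalesce(1)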
Question #47
The code block shown below contains an error. The code block is intended to return a new 12-partition DataFrame from the 8-partition DataFrame storesDF by inducing a shuffle. Identify the error.
Code block:
storesDF.coalesce(12)
- A. The coalesce() operation cannot guarantee the number of target partitions – the repartition() operation should be used instead.
- B. The coalesce() operation does not induce a shuffle and cannot increase the number of partitions – the repartition() operation should be used instead. (Most Voted)
- C. The coalesce() operation will only work if the DataFrame has been cached to memory – the repartition() operation should be used instead.
- D. The coalesce() operation requires a column by which to partition rather than a number of partitions – the repartition() operation should be used instead.
- E. The number of resulting partitions, 12, is not achievable for an 8-partition DataFrame.
Correct Answer: B
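A corrected sketch reflecting answer B, assuming an existing 8-partition storesDF:

# repartition() performs a full shuffle and can increase the partition count.
twelvePartitionDF = storesDF.repartition(12)
twelvePartitionDF.rdd.getNumPartitions()  # 12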
Question #48
Which of the following Spark properties is used to configure whether DataFrame partitions that do not meet a minimum size threshold are automatically coalesced into larger partitions during a shuffle?
- A. spark.sql.shuffle.partitions
- B. spark.sql.autoBroadcastJoinThreshold
- C. spark.sql.adaptive.skewJoin.enabled
- D. spark.sql.inMemoryColumnarStorage.batchSize
- E. spark.sql.adaptive.coalescePartitions.enabled (Most Voted)
Correct Answer: E
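For reference, a sketch of setting answer E at runtime; note that small-partition coalescing only takes effect when adaptive query execution is also enabled (assumes an active SparkSession named spark):

spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")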
Question #49
The code block shown below contains an error. The code block is intended to return a DataFrame containing column openDateString, a string representation of column openDate in Java’s SimpleDateFormat. Identify the error.
Note that column openDate is of type integer and represents a date in the UNIX epoch format – the number of seconds since midnight on January 1st, 1970.
An example of Java’s SimpleDateFormat is "Sunday, Dec 4, 2008 1:05 PM".
A sample of storesDF is displayed below (sample table not reproduced):
Code block:
storesDF.withColumn("openDateString", from_unixtime(col("openDate"), "EEE, MMM d, yyyy h:mm a", TimestampType()))
- A. The from_unixtime() operation only accepts two parameters – the TimestampType() argument is not necessary. (Most Voted)
- B. The from_unixtime() operation only works if column openDate is of type long rather than integer – column openDate must first be converted.
- C. The second argument to from_unixtime() is not correct – it should be a variant of TimestampType() rather than a string.
- D. The from_unixtime() operation automatically places the input column in Java’s SimpleDateFormat – there is no need for a second or third argument.
- E. The column openDate must first be converted to a timestamp, and then the Date() function can be used to reformat to Java’s SimpleDateFormat.
Correct Answer: A
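A corrected sketch reflecting answer A: from_unixtime() accepts only the column and an optional format string. Assumes an existing storesDF:

from pyspark.sql.functions import from_unixtime, col

# Two arguments only: the epoch-seconds column and the SimpleDateFormat pattern.
resultDF = storesDF.withColumn(
    "openDateString",
    from_unixtime(col("openDate"), "EEE, MMM d, yyyy h:mm a")
)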
Question #50
Which of the following code blocks returns a DataFrame containing a column dayOfYear, an integer representation of the day of the year from column openDate from DataFrame storesDF?
Note that column openDate is of type integer and represents a date in the UNIX epoch format – the number of seconds since midnight on January 1st, 1970.
A sample of storesDF is displayed below (sample table not reproduced):
- A. (storesDF.withColumn("openTimestamp", col("openDate").cast("Timestamp"))
  .withColumn("dayOfYear", dayofyear(col("openTimestamp")))) (Most Voted)
- B. storesDF.withColumn("dayOfYear", get dayofyear(col("openDate")))
- C. storesDF.withColumn("dayOfYear", dayofyear(col("openDate")))
- D. (storesDF.withColumn("openDateFormat", col("openDate").cast("Date"))
  .withColumn("dayOfYear", dayofyear(col("openDateFormat"))))
- E. storesDF.withColumn("dayOfYear", substr(col("openDate"), 4, 6))
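A minimal sketch of the voted answer (A), assuming an existing storesDF: casting the epoch-seconds integer to a timestamp lets dayofyear() interpret it as a date:

from pyspark.sql.functions import col, dayofyear

resultDF = (storesDF
    .withColumn("openTimestamp", col("openDate").cast("Timestamp"))
    .withColumn("dayOfYear", dayofyear(col("openTimestamp"))))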