Databricks Certified Machine Learning Professional Exam Practice Questions (P. 1)

Question #1

Which of the following describes concept drift?

A
Concept drift is when there is a change in the distribution of an input variable
B
Concept drift is when there is a change in the distribution of a target variable
C
Concept drift is when there is a change in the relationship between input variables and target variables
Most Voted
D
Concept drift is when there is a change in the distribution of the predicted target given by the model
E
None of these describe Concept drift

Correct Answer: C

GPT

Concept drift is a change in the relationship between the input variables and the target variable: the patterns the model learned during training no longer hold. The input and target distributions can each look unchanged while the mapping between them shifts. Model accuracy typically degrades over time as a result, so the model needs retraining or another adaptation strategy to keep up with the changing data.
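A minimal sketch of the idea, using only the Python standard library. The data, the fixed threshold "model", and all names here are hypothetical and chosen for illustration. The inputs are drawn from the same uniform distribution in both periods and the share of positive labels stays near 50%, yet the rule linking input to label is reversed, so a model fitted to the old relationship fails.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def model(x):
    """Hypothetical 'trained' model: it learned the rule y = (x > 0.5)."""
    return x > 0.5


def accuracy(xs, ys):
    """Fraction of inputs for which the model's prediction matches the label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


# Reference period: the label follows the rule the model learned.
xs_reference = [rng.random() for _ in range(1000)]
ys_reference = [x > 0.5 for x in xs_reference]

# Recent period: same input distribution, but the input-to-target rule is reversed.
xs_recent = [rng.random() for _ in range(1000)]
ys_recent = [x <= 0.5 for x in xs_recent]

acc_reference = accuracy(xs_reference, ys_reference)
acc_recent = accuracy(xs_recent, ys_recent)

print(f"accuracy on reference data: {acc_reference:.2f}")
print(f"accuracy on recent data:    {acc_recent:.2f}")
```

Because only the mapping from input to label changed, monitoring the input distribution alone (option A) or the label distribution alone (option B) would not reveal this kind of drift.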

Question #2

A machine learning engineer is monitoring categorical input variables for a production machine learning application. The engineer believes that missing values are becoming more prevalent in more recent data for a particular value in one of the categorical input variables.
Which of the following tools can the machine learning engineer use to assess their theory?

A
Kolmogorov-Smirnov (KS) test
B
One-way Chi-squared Test
Most Voted
C
Two-way Chi-squared Test
D
Jensen-Shannon distance
E
None of these

Correct Answer: B

GPT

The one-way chi-squared (goodness-of-fit) test is suitable here because it compares the observed category counts of a single categorical variable against expected counts. The engineer can take the category frequencies from an earlier reference period, including the count of missing values, as the expected distribution and test whether the counts in recent data deviate from it significantly. By contrast, the Kolmogorov-Smirnov test is designed for numeric distributions, and the two-way chi-squared test is normally used to assess the association between two categorical variables.
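The test can be sketched by hand with the standard library. The counts and the 5% baseline missing rate below are hypothetical. With two categories (missing and present) the test has one degree of freedom, in which case the p-value equals `erfc(sqrt(statistic / 2))`; for more categories a statistics library such as SciPy would be the usual choice.

```python
import math


def chi_squared_statistic(observed, expected_proportions):
    """One-way chi-squared statistic for observed counts vs. expected proportions."""
    total = sum(observed)
    expected = [p * total for p in expected_proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical reference period: 5% of values were missing.
baseline_missing_rate = 0.05

# Hypothetical recent window: [missing, present] counts.
recent_counts = [120, 880]

statistic = chi_squared_statistic(
    recent_counts, [baseline_missing_rate, 1 - baseline_missing_rate]
)

# With two categories there is 1 degree of freedom, and the chi-squared
# survival function reduces to erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

print(f"chi-squared statistic: {statistic:.3f}")
print(f"p-value: {p_value:.3g}")
```

A small p-value would support the theory that missing values have become more prevalent than in the reference period.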

Question #3

A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.
They are using the following code block:

The code block is not nesting the runs in MLflow as they expected.
Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?

A
Indent the child run blocks within the parent run block
Most Voted
B
Add the nested=True argument to the parent run
C
Remove the nested=True argument from the child runs
D
Provide the same name to the run_name parameter for all three run blocks
E
Add the nested=True argument to the parent run and remove the nested=True arguments from the child runs

Correct Answer: A

Question #4

A machine learning engineer wants to log feature importance data from a CSV file at path importance_path with an MLflow run for model model.
Which of the following code blocks will accomplish this task inside of an existing MLflow run block?

A
B
C
mlflow.log_data(importance_path, "feature-importance.csv")
D
mlflow.log_artifact(importance_path, "feature-importance.csv")
Most Voted
E
None of these code blocks can accomplish the task.

Correct Answer: D

GPT

To log a feature importance file with an MLflow run, use `mlflow.log_artifact()`, which uploads a local file as an artifact of the active run. Of the options shown, `mlflow.log_artifact(importance_path, "feature-importance.csv")` is the one that calls a function from the documented MLflow tracking API; `mlflow.log_data` is not part of it. Note that the second argument of `log_artifact` is `artifact_path`, a destination directory inside the run's artifact store rather than a new file name, so the uploaded file keeps its original base name. Storing the file with the run keeps the feature importance data traceable alongside the run's parameters and metrics.

Question #5

Which of the following is a simple, low-cost method of monitoring numeric feature drift?

A
Jensen-Shannon test
B
Summary statistics trends
Most Voted
C
Chi-squared test
D
None of these can be used to monitor feature drift
E
Kolmogorov-Smirnov (KS) test

Correct Answer: B

GPT

Tracking summary statistics over time is a simple, low-cost way to monitor numeric feature drift. Computing basic metrics such as the mean, median, and standard deviation for each time window gives an early indication of a shift in the data distribution without running a formal statistical test.
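As a rough illustration, the sketch below computes these statistics per time window with the standard library and flags windows whose mean moves away from the first window. The weekly values and the threshold of two baseline standard deviations are hypothetical choices, not a recommended rule.

```python
import statistics


def summarize(values):
    """Basic summary statistics for one time window of a numeric feature."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


# Hypothetical weekly samples of a single numeric feature.
windows = {
    "week-1": [10.1, 9.8, 10.3, 9.9, 10.0],
    "week-2": [10.2, 9.7, 10.1, 10.0, 9.9],
    "week-3": [12.4, 12.9, 13.1, 12.6, 12.8],
}

summaries = {name: summarize(values) for name, values in windows.items()}
baseline = summaries["week-1"]

# Assumed rule: flag a window when its mean is more than two baseline
# standard deviations away from the baseline mean.
flagged = [
    name
    for name, stats in summaries.items()
    if abs(stats["mean"] - baseline["mean"]) > 2 * baseline["stdev"]
]

for name, stats in summaries.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
print("flagged windows:", flagged)
```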
