Databricks Certified Machine Learning Professional Exam Practice Questions (P. 1)
Question #1
Which of the following describes concept drift?
- A. Concept drift is when there is a change in the distribution of an input variable
- B. Concept drift is when there is a change in the distribution of a target variable
- C. Concept drift is when there is a change in the relationship between input variables and target variables (Most Voted)
- D. Concept drift is when there is a change in the distribution of the predicted target given by the model
- E. None of these describe concept drift
Correct Answer:
C

Concept drift is a change in the relationship between input variables and target variables: the patterns the model learned during training no longer hold. Model accuracy typically degrades over time because the model keeps applying an outdated mapping from inputs to outcomes. A change in the distribution of an input variable alone is feature drift, a change in the target distribution is label drift, and a change in the distribution of the model's predictions is prediction drift. Handling concept drift requires retraining the model or another adaptation strategy.
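As a rough illustration of the definition above, the following sketch keeps the input values identical across two periods and changes only the rule that maps inputs to labels. The thresholds, sample size, and the fixed "model" are assumptions made up for this example, not part of the exam question.

```python
import random


def make_data(threshold, n=1000, seed=0):
    """Draw inputs uniformly on [0, 1); only the x -> y rule depends on threshold."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > threshold)) for x in xs]


def model(x):
    """Stand-in for a model trained when the true rule was y = 1 if x > 0.5."""
    return int(x > 0.5)


def accuracy(data):
    """Fraction of (x, y) pairs the fixed model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


# Same seed, so both periods contain exactly the same input values.
before = make_data(threshold=0.5)  # relationship matches training time
after = make_data(threshold=0.8)   # relationship between x and y has changed

acc_before = accuracy(before)
acc_after = accuracy(after)
print(f"accuracy before drift: {acc_before:.2f}")
print(f"accuracy after drift:  {acc_after:.2f}")
```

Because the inputs are unchanged, monitoring the input distribution alone would not reveal this kind of drift; it only shows up once labels are available to compare against predictions.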
Question #2
A machine learning engineer is monitoring categorical input variables for a production machine learning application. The engineer believes that missing values are becoming more prevalent in more recent data for a particular value in one of the categorical input variables.
Which of the following tools can the machine learning engineer use to assess their theory?
- A. Kolmogorov-Smirnov (KS) test
- B. One-way Chi-squared Test (Most Voted)
- C. Two-way Chi-squared Test
- D. Jensen-Shannon distance
- E. None of these
Correct Answer:
B

The One-way Chi-squared Test fits here because it checks whether the observed frequencies of a single categorical variable differ significantly from expected frequencies. Taking the expected frequencies from an earlier period and the observed counts, including the count of missing values, from recent data lets the engineer test whether missing values have become significantly more prevalent, and so confirm or reject the theory.
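The comparison described above can be sketched in plain Python. The counts below are hypothetical, the reference-period proportions are treated as the expected distribution, and 3.841 is the standard chi-squared critical value for one degree of freedom at a 0.05 significance level.

```python
def chi_squared_statistic(observed, expected_props):
    """One-way (goodness-of-fit) chi-squared statistic for observed counts."""
    total = sum(observed)
    expected = [p * total for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for the variable: [non-missing, missing].
reference_counts = [950, 50]   # earlier period
recent_counts = [900, 100]     # recent period

reference_props = [c / sum(reference_counts) for c in reference_counts]
stat = chi_squared_statistic(recent_counts, reference_props)

# Two categories give one degree of freedom; critical value at alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
drift_detected = stat > CRITICAL_VALUE_DF1_ALPHA_05
print(f"chi-squared statistic: {stat:.2f}, drift detected: {drift_detected}")
```

In practice a library routine such as `scipy.stats.chisquare` would also return a p-value; the hand-rolled version here only keeps the sketch free of dependencies.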
Question #3
A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.
They are using the following code block:

The code block is not nesting the runs in MLflow as they expected.
Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?
- A. Indent the child run blocks within the parent run block (Most Voted)
- B. Add the `nested=True` argument to the parent run
- C. Remove the `nested=True` argument from the child runs
- D. Provide the same name to the `run_name` parameter for all three run blocks
- E. Add the `nested=True` argument to the parent run and remove the `nested=True` arguments from the child runs
Correct Answer:
A
Question #4
A machine learning engineer wants to log feature importance data from a CSV file at path importance_path with an MLflow run for model model.
Which of the following code blocks will accomplish this task inside of an existing MLflow run block?
- A
- B
- C. `mlflow.log_data(importance_path, "feature-importance.csv")`
- D. `mlflow.log_artifact(importance_path, "feature-importance.csv")` (Most Voted)
- E. None of these code blocks can accomplish the task.
Correct Answer:
D

To log a feature importance file with an MLflow run, use `mlflow.log_artifact()`, which uploads a local file as an artifact of the active run. Here, `mlflow.log_artifact(importance_path, "feature-importance.csv")` logs the CSV at `importance_path`; the second argument is the artifact path under which the file is stored. MLflow has no `mlflow.log_data()` function, which rules out option C. Because the file is stored with the run, the feature importance data stays traceable to the model it describes.
Question #5
Which of the following is a simple, low-cost method of monitoring numeric feature drift?
- A. Jensen-Shannon test
- B. Summary statistics trends (Most Voted)
- C. Chi-squared test
- D. None of these can be used to monitor feature drift
- E. Kolmogorov-Smirnov (KS) test
Correct Answer:
B

Tracking summary statistics trends is a simple, low-cost way to monitor numeric feature drift. Watching basic metrics such as the mean, median, and standard deviation over time gives an early indication of shifts in the data distribution without running formal statistical tests such as the KS test.
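One way to picture this kind of monitoring is the sketch below, which uses only the standard library. The weekly batches and the alert rule (mean shifted by more than three baseline standard deviations) are hypothetical choices for illustration, not a recommended threshold.

```python
import statistics

# Hypothetical weekly batches of one numeric feature.
weekly_batches = {
    "week_1": [10.1, 9.8, 10.3, 10.0, 9.9],
    "week_2": [10.2, 9.9, 10.1, 10.0, 9.7],
    "week_3": [12.4, 12.9, 13.1, 12.6, 12.8],
}


def summarize(values):
    """Basic summary statistics for one batch of a numeric feature."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


# Treat the first week as the baseline to compare later weeks against.
baseline = summarize(weekly_batches["week_1"])

flagged = []
for week, values in weekly_batches.items():
    stats = summarize(values)
    shift = abs(stats["mean"] - baseline["mean"])
    # Hypothetical rule: flag a week whose mean moved > 3 baseline stdevs.
    if shift > 3 * baseline["stdev"]:
        flagged.append(week)
    print(week, {name: round(value, 2) for name, value in stats.items()})

print("weeks flagged for possible drift:", flagged)
```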