Google Professional Machine Learning Engineer Exam Practice Questions (P. 1)
Question #1
You are building an ML model to detect anomalies in real-time sensor data. You will use Pub/Sub to handle incoming requests. You want to store the results for analytics and visualization. How should you configure the pipeline?
- A. 1 = Dataflow, 2 = AI Platform, 3 = BigQuery (Most Voted)
- B. 1 = DataProc, 2 = AutoML, 3 = Cloud Bigtable
- C. 1 = BigQuery, 2 = AutoML, 3 = Cloud Functions
- D. 1 = BigQuery, 2 = AI Platform, 3 = Cloud Storage
Correct Answer:
A
Dataflow processes the stream arriving from Pub/Sub, AI Platform serves the anomaly detection model, and BigQuery stores the results for analytics and visualization.
Reference:
https://cloud.google.com/solutions/building-anomaly-detection-dataflow-bigqueryml-dlp
Question #2
Your organization wants to make its internal shuttle service route more efficient. The shuttles currently stop at all pick-up points across the city every 30 minutes between 7 am and 10 am. The development team has already built an application on Google Kubernetes Engine that requires users to confirm their presence and shuttle station one day in advance. What approach should you take?
- A. 1. Build a tree-based regression model that predicts how many passengers will be picked up at each shuttle station. 2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the prediction.
- B. 1. Build a tree-based classification model that predicts whether the shuttle should pick up passengers at each shuttle station. 2. Dispatch an available shuttle and provide the map with the required stops based on the prediction.
- C. 1. Define the optimal route as the shortest route that passes by all shuttle stations with confirmed attendance at the given time under capacity constraints. 2. Dispatch an appropriately sized shuttle and indicate the required stops on the map. (Most Voted)
- D. 1. Build a reinforcement learning model with tree-based classification models that predict the presence of passengers at shuttle stops as agents and a reward function around a distance-based metric. 2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the simulated outcome.
Correct Answer:
C
Because riders confirm their presence and station one day in advance, demand is already known and nothing needs to be predicted. The task is a routing optimization problem: find the shortest route through the stations with confirmed attendance, subject to shuttle capacity, and dispatch an appropriately sized shuttle. The regression, classification, and reinforcement learning models in options A, B, and D add complexity without adding information.
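The route formulation in option C can be illustrated with a small sketch. It assumes a single shuttle, straight-line distances, and a handful of stops, so brute force is enough. The station coordinates, confirmation counts, shuttle sizes, and the helper name `shortest_route` are all hypothetical, and a real deployment would likely use a dedicated routing solver instead.

```python
from itertools import permutations
from math import dist


def shortest_route(depot, stops):
    """Brute-force the shortest round trip from the depot through every stop.

    `stops` maps a station name to its (x, y) coordinates. Exhaustive search
    is only practical for a few stops.
    """
    best_order, best_length = (), float("inf")
    for order in permutations(stops):
        path = [depot, *(stops[name] for name in order), depot]
        length = sum(dist(a, b) for a, b in zip(path, path[1:]))
        if length < best_length:
            best_order, best_length = order, length
    return best_order, best_length


# Hypothetical station coordinates (km) and next-day confirmations.
stations = {"A": (0, 3), "B": (4, 3), "C": (4, 0), "D": (9, 9)}
confirmed = {"A": 5, "B": 0, "C": 7, "D": 0}
shuttle_sizes = [8, 16, 32]  # hypothetical seat counts

# Only stations with confirmed riders need a stop.
required = {name: xy for name, xy in stations.items() if confirmed[name] > 0}
riders = sum(confirmed.values())

# Capacity constraint: smallest shuttle that seats every confirmed rider.
shuttle = min(size for size in shuttle_sizes if size >= riders)
order, length = shortest_route((0, 0), required)
print(shuttle, order, length)
```

No model is trained here: the confirmations are the demand, and the only computation is the route and the shuttle size.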
Question #3
You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of them converge. How should you resolve the class imbalance problem?
- A. Use the class distribution to generate 10% positive examples.
- B. Use a convolutional neural network with max pooling and softmax activation.
- C. Downsample the data with upweighting to create a sample with 10% positive examples.
- D. Remove negative examples until the numbers of positive and negative examples are equal.
Correct Answer:
C
Downsampling the majority class raises the share of positive examples the model sees during training, which helps it converge. Upweighting the downsampled class by the downsampling factor keeps the model's predictions calibrated to the original class distribution. Switching to a convolutional architecture (option B) does nothing to address the imbalance.
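The downsampling-with-upweighting idea in option C can be sketched with NumPy. This is a minimal illustration on made-up data, not a prescribed implementation: the function name `downsample_and_upweight`, the 10,000-reading dataset, and the 0.5% failure rate are assumptions for the example.

```python
import numpy as np


def downsample_and_upweight(labels, target_pos_fraction=0.10, seed=0):
    """Return (indices, weights) for a sample with the target positive share.

    All positives are kept with weight 1. Negatives are randomly downsampled,
    and each kept negative is weighted by the downsampling factor so that the
    weighted class totals match the original data.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)

    # Negatives to keep so that positives make up the target fraction.
    n_neg_keep = int(round(len(pos_idx) * (1 - target_pos_fraction) / target_pos_fraction))
    n_neg_keep = min(n_neg_keep, len(neg_idx))
    kept_neg = rng.choice(neg_idx, size=n_neg_keep, replace=False)

    factor = len(neg_idx) / n_neg_keep  # downsampling factor
    indices = np.concatenate([pos_idx, kept_neg])
    weights = np.concatenate([np.ones(len(pos_idx)), np.full(n_neg_keep, factor)])
    return indices, weights


# Hypothetical data: 10,000 sensor readings, 0.5% of them failures.
labels = np.zeros(10_000, dtype=int)
labels[:50] = 1

idx, w = downsample_and_upweight(labels)
pos_fraction = labels[idx].mean()
print(len(idx), pos_fraction, w.max())
```

The returned weights would typically be passed to the training routine as per-example sample weights.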
Question #4
You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting the speed and processing requirements?
- A. Use Data Fusion's GUI to build the transformation pipelines, and then write the data into BigQuery.
- B. Convert your PySpark into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.
- C. Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries from BigQuery for machine learning.
- D. Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table. (Most Voted)
Correct Answer:
D
BigQuery is serverless and uses SQL, so loading the raw data from Cloud Storage into BigQuery and rewriting the PySpark transformations as BigQuery SQL meets both requirements. Option B also uses SQL syntax, but running the pipeline on a Dataproc cluster is not serverless, and Cloud SQL in option C is not designed for transformations at this scale.
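The load-then-transform pattern described in option D (load raw data, transform it with SQL, write the result to a new table) can be illustrated locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for BigQuery so that it runs without cloud credentials; the table and column names are hypothetical, and BigQuery SQL differs from SQLite in dialect and scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: load raw data into a table (stand-in for a BigQuery load job).
conn.execute("CREATE TABLE raw_events (user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("u1", 10.0), ("u1", 5.0), ("u2", 7.5)],
)

# Step 2: express the transformation in SQL and write it to a new table.
conn.execute(
    """
    CREATE TABLE user_totals AS
    SELECT user_id,
           SUM(amount) AS total_amount,
           COUNT(*)    AS n_events
    FROM raw_events
    GROUP BY user_id
    """
)

rows = conn.execute("SELECT * FROM user_totals ORDER BY user_id").fetchall()
print(rows)
```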
Question #5
You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano, scikit-learn, and custom libraries. What should you do?
- A. Use the AI Platform custom containers feature to receive training jobs using any framework. (Most Voted)
- B. Configure Kubeflow to run on Google Kubernetes Engine and receive training jobs through TF Job.
- C. Create a library of VM images on Compute Engine, and publish these images on a centralized repository.
- D. Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.
Correct Answer:
A
The goal is to replace a hard-to-administer system with a managed service that supports many frameworks. AI Platform custom containers (AI Platform is now Vertex AI) let each training job bring its own framework and libraries while Google manages the infrastructure. Kubeflow on GKE, a library of VM images, and Slurm (options B, C, and D) all leave the team administering the platform itself, and TF Job in option B targets TensorFlow workloads.
Question #6
You work for an online retail company that is creating a visual search engine. You have set up an end-to-end ML pipeline on Google Cloud to classify whether an image contains your company's product. Expecting the release of new products in the near future, you configured a retraining functionality in the pipeline so that new data can be fed into your ML models. You also want to use AI Platform's continuous evaluation service to ensure that the models have high accuracy on your test dataset. What should you do?
- A. Keep the original test dataset unchanged even if newer products are incorporated into retraining.
- B. Extend your test dataset with images of the newer products when they are introduced to retraining. (Most Voted)
- C. Replace your test dataset with images of the newer products when they are introduced to retraining.
- D. Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold.
Correct Answer:
B
The test dataset should mirror what the model sees in production: existing products plus new ones. Extending it with images of new products as they enter retraining keeps evaluation representative of the full catalog. Replacing the test set (option C) would hide regressions on existing products, and keeping it unchanged (option A) would leave the new products unevaluated.
Question #7
You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, hyperparameter tuning, and serving. What should you do?
- A. Configure AutoML Tables to perform the classification task. (Most Voted)
- B. Run a BigQuery ML task to perform logistic regression for the classification.
- C. Use AI Platform Notebooks to run the classification model with pandas library.
- D. Use AI Platform to run the classification model job configured for hyperparameter tuning.
Correct Answer:
A
AutoML Tables handles the listed steps, from data analysis through hyperparameter tuning and serving, without writing code. BigQuery ML supports supervised learning with the logistic regression model type, but it requires writing SQL.
Reference:
https://cloud.google.com/bigquery-ml/docs/logistic-regression-prediction
Question #8
You work for a public transportation company and need to build a model to estimate delay times for multiple transportation routes. Predictions are served directly to users in an app in real time. Because different seasons and population increases impact the data relevance, you will retrain the model every month. You want to follow Google-recommended best practices. How should you configure the end-to-end architecture of the predictive model?
- A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model. (Most Voted)
- B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery.
- C. Write a Cloud Functions script that launches a training and deploying job on AI Platform that is triggered by Cloud Scheduler.
- D. Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model.
Correct Answer:
A
Kubeflow Pipelines orchestrates multi-step ML workflows, from training through deployment, and can run them on a schedule. That fits this scenario, where the model must be retrained every month to track seasonal variation and population changes, and new models must be rolled out with minimal manual intervention.
Question #9
You are developing ML models with AI Platform for image segmentation on CT scans. You frequently update your model architectures based on the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize computation costs and manual intervention while having version control for your code. What should you do?
- A. Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job.
- B. Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code.
- C. Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository. (Most Voted)
- D. Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor.
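Correct Answer:
C
Cloud Source Repositories provides version control, and a Cloud Build trigger on the repository starts retraining automatically, and only when new code is pushed. Submitting jobs with the gcloud command-line tool (option B) works but is a manual step each time the code changes.
Reference:
https://cloud.google.com/ai-platform/training/docs/training-jobs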
Question #10
Your team needs to build a model that predicts whether images contain a driver's license, passport, or credit card. The data engineering team already built the pipeline and generated a dataset composed of 10,000 images with driver's licenses, 1,000 images with passports, and 1,000 images with credit cards. You now have to train a model with the following label map: ['drivers_license', 'passport', 'credit_card']. Which loss function should you use?
- A. Categorical hinge
- B. Binary cross-entropy
- C. Categorical cross-entropy (Most Voted)
- D. Sparse categorical cross-entropy
Correct Answer:
D
Use sparse_categorical_crossentropy. Examples for the above 3-class classification problem: [1], [2], [3]
Reference:
https://stats.stackexchange.com/questions/326065/cross-entropy-vs-sparse-cross-entropy-when-to-use-one-over-the-other
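The difference between options C and D lies in the label encoding rather than the loss value. The NumPy sketch below computes both losses by hand for the three classes in the label map. The predicted probabilities are made up, the two function names mirror the Keras loss names but are plain local helpers, and the integer labels are assumed to be zero-based indices into the label map.

```python
import numpy as np


def categorical_crossentropy(one_hot, probs):
    """Per-example cross-entropy for one-hot encoded labels."""
    return -np.sum(one_hot * np.log(probs), axis=1)


def sparse_categorical_crossentropy(class_ids, probs):
    """Per-example cross-entropy for integer class labels."""
    return -np.log(probs[np.arange(len(class_ids)), class_ids])


label_map = ["drivers_license", "passport", "credit_card"]

# Hypothetical model outputs for three images (each row sums to 1).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
])

class_ids = np.array([0, 1, 2])             # integer labels (sparse form)
one_hot = np.eye(len(label_map))[class_ids]  # same labels, one-hot form

loss_dense = categorical_crossentropy(one_hot, probs)
loss_sparse = sparse_categorical_crossentropy(class_ids, probs)
print(loss_dense, loss_sparse)
```

Both functions return the same values; the sparse form simply avoids materializing one-hot vectors when labels are stored as integers.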