Amazon AWS Certified Data Engineer - Associate DEA-C01 Exam Practice Questions (P. 1)
Question #1
A data engineer is configuring an AWS Glue job to read data from an Amazon S3 bucket. The data engineer has set up the necessary AWS Glue connection details and an associated IAM role. However, when the data engineer attempts to run the AWS Glue job, the data engineer receives an error message that indicates that there are problems with the Amazon S3 VPC gateway endpoint.
The data engineer must resolve the error and connect the AWS Glue job to the S3 bucket.
Which solution will meet this requirement?
- A. Update the AWS Glue security group to allow inbound traffic from the Amazon S3 VPC gateway endpoint.
- B. Configure an S3 bucket policy to explicitly grant the AWS Glue job permissions to access the S3 bucket.
- C. Review the AWS Glue job code to ensure that the AWS Glue connection details include a fully qualified domain name.
- D. Verify that the VPC's route table includes inbound and outbound routes for the Amazon S3 VPC gateway endpoint. (Most Voted)
Correct Answer:
D
GPT-4o - Answer
Check the route table that is associated with the subnet the AWS Glue connection uses. VPC route tables do not literally have "inbound and outbound routes", but the table must contain a route that sends S3 traffic to the gateway endpoint. A Glue job that uses a connection runs with network interfaces inside the VPC, so if that route is missing the job has no path to S3 through the endpoint and fails with a gateway endpoint error. Adding or correcting the route resolves the error and lets the job reach the S3 bucket.
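The route check described above can be sketched in Python. This is a minimal illustration, not a definitive diagnostic: it assumes the route table is a dictionary shaped like one entry of the EC2 DescribeRouteTables response, where a gateway endpoint route carries a `DestinationPrefixListId` and a `GatewayId` that holds the endpoint ID. The function name and all IDs are hypothetical.

```python
def has_s3_gateway_route(route_table, endpoint_id):
    """Return True if the table has an active prefix-list route targeting the endpoint."""
    for route in route_table.get("Routes", []):
        targets_endpoint = route.get("GatewayId") == endpoint_id
        uses_prefix_list = "DestinationPrefixListId" in route
        if targets_endpoint and uses_prefix_list and route.get("State") == "active":
            return True
    return False


# Hypothetical route table (assumed shape: one item of DescribeRouteTables output).
example_table = {
    "RouteTableId": "rtb-0example",
    "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationPrefixListId": "pl-0example", "GatewayId": "vpce-0example", "State": "active"},
    ],
}

print(has_s3_gateway_route(example_table, "vpce-0example"))
```

In practice the dictionary would come from the EC2 API for the subnet's route table; the sketch only shows which fields to look at.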
Question #2
A retail company has a customer data hub in an Amazon S3 bucket. Employees from many countries use the data hub to support company-wide analytics. A governance team must ensure that the company's data analysts can access data only for customers who are within the same country as the analysts.
Which solution will meet these requirements with the LEAST operational effort?
- A. Create a separate table for each country's customer data. Provide access to each analyst based on the country that the analyst serves.
- B. Register the S3 bucket as a data lake location in AWS Lake Formation. Use the Lake Formation row-level security features to enforce the company's access policies. (Most Voted)
- C. Move the data to AWS Regions that are close to the countries where the customers are. Provide access to each analyst based on the country that the analyst serves.
- D. Load the data into Amazon Redshift. Create a view for each country. Create separate IAM roles for each country to provide access to data from each country. Assign the appropriate roles to the analysts.
Correct Answer:
B
GPT-4o - Answer
The most efficient approach is AWS Lake Formation. After the S3 bucket is registered as a data lake location, Lake Formation can apply granular, row-level access control to the existing data. Row-level security enforces the country restriction with far less manual effort than the other options, which require separate tables, Regions, views, or IAM roles for each country. The data stays in one place and the policy is managed centrally, which makes this the best fit for the scenario described.
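As a rough sketch of what a row-level rule looks like, the Python below builds the table-data structure for one country-specific data cells filter. It assumes the customer table has a column named `country` holding two-letter codes; that column, the filter naming scheme, and the function name are illustrative assumptions, not details from the question.

```python
def country_filter(catalog_id, database, table, country_code):
    """Build the TableData argument for one country-specific data cells filter."""
    # The code is interpolated into a filter expression, so accept only two letters.
    if not (len(country_code) == 2 and country_code.isalpha()):
        raise ValueError(f"unexpected country code: {country_code!r}")
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": f"customers_{country_code.lower()}",  # hypothetical naming scheme
        "RowFilter": {"FilterExpression": f"country = '{country_code}'"},
        "ColumnWildcard": {},  # every column stays visible; only rows are restricted
    }


print(country_filter("123456789012", "hub", "customers", "DE"))
```

With boto3, a structure like this is typically passed as `TableData` to the Lake Formation `create_data_cells_filter` call, and the filter is then granted to the analysts for that country.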
Question #3
A media company wants to improve a system that recommends media content to customers based on user behavior and preferences. To improve the recommendation system, the company needs to incorporate insights from third-party datasets into the company's existing analytics platform.
The company wants to minimize the effort and time required to incorporate third-party datasets.
Which solution will meet these requirements with the LEAST operational overhead?
- A. Use API calls to access and integrate third-party datasets from AWS Data Exchange. (Most Voted)
- B. Use API calls to access and integrate third-party datasets from AWS DataSync.
- C. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from AWS CodeCommit repositories.
- D. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from Amazon Elastic Container Registry (Amazon ECR).
Correct Answer:
A
GPT-4o - Answer
AWS Data Exchange is built for finding, subscribing to, and using third-party datasets, so accessing those datasets through its API adds the least operational overhead. AWS DataSync is a data transfer service for moving data between storage systems; it is not a source of third-party datasets. AWS CodeCommit hosts source code repositories and Amazon ECR hosts container images, so neither of those options provides third-party data either. Data Exchange directly addresses the company's need to incorporate third-party datasets with minimal effort and time.
Question #4
A financial company wants to implement a data mesh. The data mesh must support centralized data governance, data analysis, and data access control. The company has decided to use AWS Glue for data catalogs and extract, transform, and load (ETL) operations.
Which combination of AWS services will implement a data mesh? (Choose two.)
- A. Use Amazon Aurora for data storage. Use an Amazon Redshift provisioned cluster for data analysis.
- B. Use Amazon S3 for data storage. Use Amazon Athena for data analysis. (Most Voted)
- C. Use AWS Glue DataBrew for centralized data governance and access control.
- D. Use Amazon RDS for data storage. Use Amazon EMR for data analysis.
- E. Use AWS Lake Formation for centralized data governance and access control. (Most Voted)
Correct Answer:
BE
GPT-4o - Answer
A data mesh built around AWS Glue for cataloging and ETL fits naturally with Amazon S3 for data storage and Amazon Athena for data analysis: Athena queries data in S3 directly by using the table definitions in the AWS Glue Data Catalog. AWS Lake Formation supplies the centralized data governance and access control, working on top of the same catalog. AWS Glue DataBrew is primarily a data preparation tool rather than a governance or access control service, and the Aurora, Redshift, RDS, and EMR options add separate data stores and clusters that the requirements do not call for.
Question #5
A data engineer maintains custom Python scripts that perform a data formatting process that many AWS Lambda functions use. When the data engineer needs to modify the Python scripts, the data engineer must manually update all the Lambda functions.
The data engineer requires a less manual way to update the Lambda functions.
Which solution will meet this requirement?
- A. Store a pointer to the custom Python scripts in the execution context object in a shared Amazon S3 bucket.
- B. Package the custom Python scripts into Lambda layers. Apply the Lambda layers to the Lambda functions. (Most Voted)
- C. Store a pointer to the custom Python scripts in environment variables in a shared Amazon S3 bucket.
- D. Assign the same alias to each Lambda function. Call each Lambda function by specifying the function's alias.
Correct Answer:
B
GPT-4o - Answer
Lambda layers are the right fit for code that many functions share. The data engineer packages the custom Python scripts into a layer and attaches the layer to each Lambda function. When the scripts change, the engineer publishes a new layer version and points the functions at it, a step that is easy to script, instead of repackaging and redeploying every function's code by hand. Layer versions are immutable, so each function keeps using the version it references until its configuration is updated. This keeps the shared code consistent across functions and removes most of the manual work.
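A minimal sketch of rolling out a new layer version is shown below. It assumes `lambda_client` behaves like a boto3 Lambda client (`publish_layer_version` and `update_function_configuration`); the helper name, the default runtime, and the one-layer-per-function simplification are assumptions made for illustration.

```python
def roll_out_layer(lambda_client, layer_name, zip_bytes, function_names,
                   runtime="python3.12"):
    """Publish a new layer version and attach it to each listed function."""
    published = lambda_client.publish_layer_version(
        LayerName=layer_name,
        Content={"ZipFile": zip_bytes},
        CompatibleRuntimes=[runtime],
    )
    layer_arn = published["LayerVersionArn"]
    for name in function_names:
        # Layers replaces the function's whole layer list, so this sketch
        # assumes each function uses only this one shared layer.
        lambda_client.update_function_configuration(
            FunctionName=name,
            Layers=[layer_arn],
        )
    return layer_arn
```

A caller would pass something like `boto3.client("lambda")`, the bytes of the packaged scripts, and the names of the functions that share them. Functions that use several layers would need their existing layer list merged rather than replaced.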
Question #6
A company created an extract, transform, and load (ETL) data pipeline in AWS Glue. A data engineer must crawl a table that is in Microsoft SQL Server. The data engineer needs to extract, transform, and load the output of the crawl to an Amazon S3 bucket. The data engineer also must orchestrate the data pipeline.
Which AWS service or feature will meet these requirements MOST cost-effectively?
- A. AWS Step Functions
- B. AWS Glue workflows (Most Voted)
- C. AWS Glue Studio
- D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
Correct Answer:
B
GPT-4o - Answer
AWS Glue workflows are indeed powerful, allowing for the orchestration of ETL pipelines. They enable you to define a sequence of actions, like crawlers and jobs, that handle and process data. Glue workflows are designed to be a comprehensive and straightforward approach for data pipeline management within AWS, integrating seamlessly with other AWS services and providing a cost-effective solution. Thus, they suit the needs described for effectively managing ETL processes from SQL Server to S3 within a workflow context.
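The crawler-then-job sequence in a Glue workflow can be sketched as two triggers. The Python below only builds the keyword arguments; it assumes they would be passed to the boto3 Glue `create_trigger` call after the workflow itself has been created. The function name and the trigger naming scheme are hypothetical.

```python
def workflow_triggers(workflow, crawler, job):
    """Return create_trigger arguments: start the crawler, then run the job on success."""
    start = {
        "Name": f"{workflow}-start",
        "WorkflowName": workflow,
        "Type": "ON_DEMAND",
        "Actions": [{"CrawlerName": crawler}],
    }
    after_crawl = {
        "Name": f"{workflow}-after-crawl",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler,
                    "CrawlState": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": job}],  # the ETL job that writes to S3
    }
    return [start, after_crawl]


for trigger in workflow_triggers("sqlserver-to-s3", "sqlserver-crawler", "load-to-s3"):
    print(trigger["Name"], trigger["Type"])
```

Because the crawler, the job, and the orchestration all stay inside AWS Glue, no separate orchestration service has to be provisioned, which is the basis of the cost argument.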
Question #7
A financial services company stores financial data in Amazon Redshift. A data engineer wants to run real-time queries on the financial data to support a web-based trading application. The data engineer wants to run the queries from within the trading application.
Which solution will meet these requirements with the LEAST operational overhead?
- A. Establish WebSocket connections to Amazon Redshift.
- B. Use the Amazon Redshift Data API. (Most Voted)
- C. Set up Java Database Connectivity (JDBC) connections to Amazon Redshift.
- D. Store frequently accessed data in Amazon S3. Use Amazon S3 Select to run the queries.
Correct Answer:
B
GPT-4o - Answer
The Amazon Redshift Data API lets an application run SQL statements through HTTPS API calls. There are no persistent connections, connection pools, or database drivers to manage, all of which a JDBC setup would require. That makes it the lowest-overhead way to run queries from within the web-based trading application.
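The Data API is asynchronous: a statement is submitted, polled, and then read. The sketch below shows that flow, assuming `client` behaves like a boto3 `redshift-data` client and that credentials come from an AWS Secrets Manager secret. The helper name and the polling interval are illustrative choices.

```python
import time


def run_query(client, sql, cluster_id, database, secret_arn, poll_seconds=0.5):
    """Submit a statement, wait for it to finish, and return the first page of records."""
    submitted = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        SecretArn=secret_arn,
        Sql=sql,
    )
    statement_id = submitted["Id"]
    while True:
        status = client.describe_statement(Id=statement_id)["Status"]
        if status == "FINISHED":
            # Only the first page is returned here; large results are paginated.
            return client.get_statement_result(Id=statement_id)["Records"]
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(f"statement {statement_id} ended with status {status}")
        time.sleep(poll_seconds)
```

The application holds no open database connection between calls, which is the property the explanation above relies on.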
Question #8
A company uses Amazon Athena for one-time queries against data that is in Amazon S3. The company has several use cases. The company must implement permission controls to separate query processes and access to query history among users, teams, and applications that are in the same AWS account.
Which solution will meet these requirements?
- A. Create an S3 bucket for each use case. Create an S3 bucket policy that grants permissions to appropriate individual IAM users. Apply the S3 bucket policy to the S3 bucket.
- B. Create an Athena workgroup for each use case. Apply tags to the workgroup. Create an IAM policy that uses the tags to apply appropriate permissions to the workgroup. (Most Voted)
- C. Create an IAM role for each use case. Assign appropriate permissions to the role for each use case. Associate the role with Athena.
- D. Create an AWS Glue Data Catalog resource policy that grants permissions to appropriate individual IAM users for each use case. Apply the resource policy to the specific tables that Athena uses.
Correct Answer:
B
GPT-4o - Answer
To manage permissions and access to query history optimally in Amazon Athena, while using the same AWS account, implementing Athena workgroups for each use case is the most effective approach. By designating separate workgroups, you can isolate query processes and maintain control over permissions and query history for individual teams and users. This can be further enhanced by tagging workgroups, facilitating efficient management and simplification of IAM policies based on specific tags, catering to different use cases. This setup not only ensures operational efficiency but also strengthens data security and governance within the organization.
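A tag-based IAM policy for workgroups might look like the following sketch, written as a Python dictionary so it can be serialized to JSON. The tag key and value and the exact list of actions are illustrative assumptions; a real policy would be scoped to the account, Region, and actions each use case needs.

```python
import json


def workgroup_policy(tag_key, tag_value):
    """Allow query actions only on workgroups that carry the given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:ListQueryExecutions",
                    "athena:GetWorkGroup",
                ],
                "Resource": "arn:aws:athena:*:*:workgroup/*",
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            }
        ],
    }


print(json.dumps(workgroup_policy("team", "analytics"), indent=2))
```

Because query history belongs to the workgroup in which a query ran, limiting a principal to workgroups with its own tag also limits which history it can list and read.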
Question #9
A data engineer needs to schedule a workflow that runs a set of AWS Glue jobs every day. The data engineer does not require the Glue jobs to run or finish at a specific time.
Which solution will run the Glue jobs in the MOST cost-effective way?
- A. Choose the FLEX execution class in the Glue job properties.
- B. Use the Spot Instance type in Glue job properties.
- C. Choose the STANDARD execution class in the Glue job properties.
- D. Choose the latest version in the GlueVersion field in the Glue job properties.
Correct Answer:
A
GPT-4o - Answer
The FLEX execution class is the cost-effective choice for jobs that have no strict timing requirements. Flex jobs run on spare compute capacity instead of dedicated resources, so they cost less, but their start times and runtimes can vary. That trade-off suits non-urgent data integration work such as this daily workflow, where the jobs do not need to run or finish at a specific time.
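The execution class can be set in the job properties or chosen per run. The short sketch below starts a run with the Flex class, assuming `glue_client` behaves like a boto3 Glue client; the helper name and the job name in the usage note are hypothetical.

```python
def start_flex_run(glue_client, job_name):
    """Start one run of an existing Glue job using the Flex execution class."""
    response = glue_client.start_job_run(
        JobName=job_name,
        ExecutionClass="FLEX",  # the alternative value is "STANDARD"
    )
    return response["JobRunId"]
```

A call such as `start_flex_run(boto3.client("glue"), "daily-etl")` would return the run ID, which can then be used to check the run's state.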
Question #10
A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket.
Which solution will meet these requirements with the LEAST operational overhead?
- A. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification. (Most Voted)
- B. Create an S3 event notification that has an event type of s3:ObjectTagging:* for objects that have a tag set to .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
- C. Create an S3 event notification that has an event type of s3:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
- D. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set an Amazon Simple Notification Service (Amazon SNS) topic as the destination for the event notification. Subscribe the Lambda function to the SNS topic.
Correct Answer:
A
GPT-4o - Answer
The correct implementation is an S3 event notification for the s3:ObjectCreated:* event type with a filter rule that matches object keys ending in .csv. With this setup the Lambda function is invoked only when a .csv file is uploaded. S3 invokes the function directly, so no intermediary service such as Amazon SNS is needed, which keeps the workflow simpler and avoids an extra hop. Option C's s3:* event type would also fire on events other than uploads, and option B reacts to tagging rather than to object creation.
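The notification described above can be sketched as a configuration dictionary. The Python below assumes the structure would be passed as `NotificationConfiguration` to the boto3 S3 `put_bucket_notification_configuration` call; the function name and the ARN in the example are hypothetical.

```python
def csv_notification(lambda_arn):
    """Build a notification config that invokes the function for new .csv objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": ".csv"}]
                    }
                },
            }
        ]
    }


config = csv_notification("arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet")
print(config["LambdaFunctionConfigurations"][0]["Filter"])
```

The Lambda function also needs a resource-based permission that allows Amazon S3 to invoke it; that step is outside this sketch.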
