Amazon AWS Certified Data Engineer - Associate DEA-C01 Exam Practice Questions (P. 2)
Question #6
A company created an extract, transform, and load (ETL) data pipeline in AWS Glue. A data engineer must crawl a table that is in Microsoft SQL Server. The data engineer needs to extract, transform, and load the output of the crawl to an Amazon S3 bucket. The data engineer also must orchestrate the data pipeline.
Which AWS service or feature will meet these requirements MOST cost-effectively?
- A. AWS Step Functions
- B. AWS Glue workflows (Most Voted)
- C. AWS Glue Studio
- D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
Correct Answer: B

AWS Glue workflows orchestrate ETL pipelines natively within Glue: a workflow chains crawlers, triggers, and jobs into a single managed sequence. Because the crawler, the transformation job, and the orchestration all run inside the service the company already uses, no separate orchestration service (Step Functions or MWAA) is needed, making Glue workflows the most cost-effective way to crawl the SQL Server table, transform the output, and load it to the Amazon S3 bucket.
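As a rough illustration, the boto3 sketch below assembles such a workflow: a scheduled trigger starts the crawler, and a conditional trigger starts the ETL job once the crawl succeeds. The workflow, crawler, connection, role, and job names are placeholders, not values from the question.

```python
import boto3

glue = boto3.client("glue")

# All names and ARNs below are hypothetical, for illustration only.
glue.create_workflow(Name="sqlserver-to-s3-etl")

# Crawler that catalogs the SQL Server table through an existing JDBC connection.
glue.create_crawler(
    Name="crawl-sqlserver-table",
    Role="arn:aws:iam::123456789012:role/GlueServiceRole",
    DatabaseName="sales_catalog",
    Targets={"JdbcTargets": [{"ConnectionName": "sqlserver-conn", "Path": "sales/dbo/orders"}]},
)

# Start the crawler on a daily schedule as the workflow's entry point.
glue.create_trigger(
    Name="start-crawl",
    WorkflowName="sqlserver-to-s3-etl",
    Type="SCHEDULED",
    Schedule="cron(0 2 * * ? *)",
    Actions=[{"CrawlerName": "crawl-sqlserver-table"}],
    StartOnCreation=True,
)

# Run the ETL job (which writes to S3) only after the crawl succeeds.
glue.create_trigger(
    Name="start-etl-after-crawl",
    WorkflowName="sqlserver-to-s3-etl",
    Type="CONDITIONAL",
    Predicate={"Conditions": [{
        "LogicalOperator": "EQUALS",
        "CrawlerName": "crawl-sqlserver-table",
        "CrawlState": "SUCCEEDED",
    }]},
    Actions=[{"JobName": "orders-to-s3-parquet"}],
    StartOnCreation=True,
)
```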
Question #7
A financial services company stores financial data in Amazon Redshift. A data engineer wants to run real-time queries on the financial data to support a web-based trading application. The data engineer wants to run the queries from within the trading application.
Which solution will meet these requirements with the LEAST operational overhead?
- A. Establish WebSocket connections to Amazon Redshift.
- B. Use the Amazon Redshift Data API. (Most Voted)
- C. Set up Java Database Connectivity (JDBC) connections to Amazon Redshift.
- D. Store frequently accessed data in Amazon S3. Use Amazon S3 Select to run the queries.
Correct Answer: B

The Amazon Redshift Data API lets an application run SQL statements against Redshift over a simple HTTPS API, with no persistent connections, drivers, or connection pools to manage, which WebSocket or JDBC approaches would require. That makes it the lowest-overhead way for the web-based trading application to issue queries directly from its own code.
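A minimal sketch of that pattern with boto3 follows; the cluster identifier, database, secret ARN, and table are hypothetical. The Data API is asynchronous, so the application submits a statement and then polls for the result.

```python
import time
import boto3

redshift_data = boto3.client("redshift-data")

# Cluster, database, secret, and table names are placeholders for illustration only.
resp = redshift_data.execute_statement(
    ClusterIdentifier="trading-cluster",
    Database="finance",
    SecretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-app-user",
    Sql="SELECT symbol, price, traded_at FROM trades WHERE symbol = :symbol "
        "ORDER BY traded_at DESC LIMIT 10",
    Parameters=[{"name": "symbol", "value": "AMZN"}],
)

# Poll the statement until it completes, then fetch the result set.
statement_id = resp["Id"]
while True:
    status = redshift_data.describe_statement(Id=statement_id)["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(0.25)

if status == "FINISHED":
    rows = redshift_data.get_statement_result(Id=statement_id)["Records"]
    print(rows)
```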
Question #8
A company uses Amazon Athena for one-time queries against data that is in Amazon S3. The company has several use cases. The company must implement permission controls to separate query processes and access to query history among users, teams, and applications that are in the same AWS account.
Which solution will meet these requirements?
- A. Create an S3 bucket for each use case. Create an S3 bucket policy that grants permissions to appropriate individual IAM users. Apply the S3 bucket policy to the S3 bucket.
- B. Create an Athena workgroup for each use case. Apply tags to the workgroup. Create an IAM policy that uses the tags to apply appropriate permissions to the workgroup. (Most Voted)
- C. Create an IAM role for each use case. Assign appropriate permissions to the role for each use case. Associate the role with Athena.
- D. Create an AWS Glue Data Catalog resource policy that grants permissions to appropriate individual IAM users for each use case. Apply the resource policy to the specific tables that Athena uses.
Correct Answer: B

Athena workgroups isolate query execution and query history per use case within a single AWS account. Creating a workgroup for each use case, tagging each workgroup, and writing IAM policies that grant access based on those tags separates query processes and access to query history among users, teams, and applications, while keeping permission management simple and consistent.
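A hedged sketch of that setup with boto3 is shown below; the workgroup name, results bucket, tag values, policy name, and account ID are placeholders, not prescribed values.

```python
import json
import boto3

athena = boto3.client("athena")
iam = boto3.client("iam")

# Workgroup name, bucket, and tag values are hypothetical.
athena.create_work_group(
    Name="trading-analytics",
    Configuration={
        "ResultConfiguration": {"OutputLocation": "s3://example-athena-results/trading-analytics/"},
        "PublishCloudWatchMetricsEnabled": True,
    },
    Tags=[{"Key": "team", "Value": "trading"}],
)

# IAM policy that allows Athena query actions only on workgroups tagged team=trading,
# keeping each team's query executions and history separate.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "athena:StartQueryExecution",
            "athena:GetQueryExecution",
            "athena:GetQueryResults",
            "athena:ListQueryExecutions",
        ],
        "Resource": "arn:aws:athena:*:123456789012:workgroup/*",
        "Condition": {"StringEquals": {"aws:ResourceTag/team": "trading"}},
    }],
}

iam.create_policy(
    PolicyName="athena-trading-workgroup-access",
    PolicyDocument=json.dumps(policy),
)
```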
Question #9
A data engineer needs to schedule a workflow that runs a set of AWS Glue jobs every day. The data engineer does not require the Glue jobs to run or finish at a specific time.
Which solution will run the Glue jobs in the MOST cost-effective way?
- A. Choose the FLEX execution class in the Glue job properties.
- B. Use the Spot Instance type in Glue job properties.
- C. Choose the STANDARD execution class in the Glue job properties.
- D. Choose the latest version in the GlueVersion field in the Glue job properties.
Correct Answer: A

The FLEX execution class runs Glue jobs on spare compute capacity at a lower rate than the STANDARD class. Jobs may take longer to start and finish, which is acceptable here because the daily jobs have no required start or completion time, so FLEX is the most cost-effective choice for this non-urgent workflow.
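For illustration, a boto3 sketch of defining a job with the FLEX execution class; the job name, role, script location, and worker settings are assumptions, not values from the question.

```python
import boto3

glue = boto3.client("glue")

# Job name, role ARN, and script location are placeholders for illustration only.
glue.create_job(
    Name="daily-batch-transform",
    Role="arn:aws:iam::123456789012:role/GlueServiceRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://example-glue-scripts/daily_batch_transform.py",
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=10,
    ExecutionClass="FLEX",  # run on spare capacity at a reduced rate
)

# The execution class can also be set (or overridden) per run.
glue.start_job_run(JobName="daily-batch-transform", ExecutionClass="FLEX")
```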
Question #10
A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket.
Which solution will meet these requirements with the LEAST operational overhead?
- A. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification. (Most Voted)
- B. Create an S3 event notification that has an event type of s3:ObjectTagging:* for objects that have a tag set to .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
- C. Create an S3 event notification that has an event type of s3:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
- D. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set an Amazon Simple Notification Service (Amazon SNS) topic as the destination for the event notification. Subscribe the Lambda function to the SNS topic.
Correct Answer: A

An S3 event notification with the s3:ObjectCreated:* event type and a suffix filter of .csv invokes the Lambda function only when a .csv file is uploaded, which is exactly the required behavior. Setting the function's ARN directly as the notification destination avoids an intermediary such as an SNS topic, and the narrower event type avoids triggering on every S3 operation, keeping operational overhead and latency to a minimum.
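A minimal boto3 sketch of that configuration, assuming a placeholder bucket name and function ARN; S3 must also be granted permission to invoke the function.

```python
import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

BUCKET = "example-upload-bucket"  # placeholder bucket name
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet"  # placeholder ARN

# Allow S3 to invoke the Lambda function for events from this bucket.
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId="AllowS3Invoke",
    Action="lambda:InvokeFunction",
    Principal="s3.amazonaws.com",
    SourceArn=f"arn:aws:s3:::{BUCKET}",
)

# Invoke the function only for newly created objects whose key ends in .csv.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": FUNCTION_ARN,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}},
        }]
    },
)
```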