Amazon AWS Certified Data Analytics - Specialty Exam Practice Questions (P. 3)
Question #11
A company that produces network devices has millions of users. Data is collected from the devices on an hourly basis and stored in an Amazon S3 data lake.
The company runs analyses on the last 24 hours of data flow logs for abnormality detection and to troubleshoot and resolve user issues. The company also analyzes historical logs dating back 2 years to discover patterns and look for improvement opportunities.
The data flow logs contain many metrics, such as date, timestamp, source IP, and target IP. There are about 10 billion events every day.
How should this data be stored for optimal performance?
- A. In Apache ORC partitioned by date and sorted by source IP (Most Voted)
- B. In compressed .csv partitioned by date and sorted by source IP
- C. In Apache Parquet partitioned by source IP and sorted by date
- D. In compressed nested JSON partitioned by source IP and sorted by date
Correct Answer:
D

Apache ORC and Parquet are both efficient for the company's use case due to their columnar storage format, which accelerates big data retrievals for complex queries. However, proper partitioning is crucial. The company needs daily analysis, so ideally, data should be partitioned by date rather than source IP, to optimize for frequent access patterns focused on recent data. While sorting by source IP would facilitate specific IP-based searches, partitioning by date aligns better with the primary operational requirement of daily analysis. Therefore, despite D being listed as the correct answer, option A might be more aligned with the described requirements.
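As a concrete illustration of the layout favored above (option A), here is a minimal PySpark sketch; the bucket paths and column names (event_date, source_ip) are hypothetical. It writes the flow logs as ORC, partitioned by date and sorted within each partition by source IP.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("flow-log-writer").getOrCreate()

# Assume the raw hourly flow logs have already been loaded into a DataFrame.
logs = spark.read.json("s3://example-raw-flow-logs/2024/05/01/")

(
    logs
    .repartition("event_date")          # group records by the date partition key
    .sortWithinPartitions("source_ip")  # sort each partition by source IP
    .write
    .mode("append")
    .partitionBy("event_date")          # Hive-style date partitions in S3
    .orc("s3://example-curated-flow-logs/")
)
```

With this layout, the daily 24-hour analysis prunes to a single date partition, while the two-year historical scans still benefit from ORC's columnar compression.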
Question #12
A banking company is currently using an Amazon Redshift cluster with dense storage (DS) nodes to store sensitive data. An audit found that the cluster is unencrypted. Compliance requirements state that a database with sensitive data must be encrypted through a hardware security module (HSM) with automated key rotation.
Which combination of steps is required to achieve compliance? (Choose two.)
- A. Set up a trusted connection with HSM using a client and server certificate with automatic key rotation. (Most Voted)
- B. Modify the cluster with an HSM encryption option and automatic key rotation.
- C. Create a new HSM-encrypted Amazon Redshift cluster and migrate the data to the new cluster. (Most Voted)
- D. Enable HSM with key rotation through the AWS CLI.
- E. Enable Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) encryption in the HSM.
Correct Answer:
BD
Reference:
https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-db-encryption.html
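No explanation is given for this question, but the referenced documentation describes the HSM flow: establish a trusted connection between Amazon Redshift and the HSM using client and server certificates, then create a new HSM-encrypted cluster and migrate the data into it, which is why options A and C are the most voted. The boto3 sketch below illustrates that flow under those assumptions; every identifier, address, and credential is a placeholder.

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# 1. Client certificate that Redshift presents to the HSM (trusted connection).
redshift.create_hsm_client_certificate(
    HsmClientCertificateIdentifier="example-hsm-client-cert"
)

# 2. HSM configuration describing how to reach the HSM and which partition to use.
redshift.create_hsm_configuration(
    HsmConfigurationIdentifier="example-hsm-config",
    Description="HSM for sensitive banking data",
    HsmIpAddress="10.0.0.50",
    HsmPartitionName="example-partition",
    HsmPartitionPassword="example-password",
    HsmServerPublicCertificate="-----BEGIN CERTIFICATE-----...",
)

# 3. New HSM-encrypted cluster; the unencrypted cluster's data is then migrated
#    into it (for example by unload/copy through Amazon S3).
redshift.create_cluster(
    ClusterIdentifier="example-encrypted-cluster",
    NodeType="ds2.xlarge",
    NumberOfNodes=2,
    MasterUsername="admin_user",
    MasterUserPassword="ExamplePassw0rd!",
    Encrypted=True,
    HsmClientCertificateIdentifier="example-hsm-client-cert",
    HsmConfigurationIdentifier="example-hsm-config",
)

# Encryption key rotation can then be triggered (or scheduled) per cluster.
redshift.rotate_encryption_key(ClusterIdentifier="example-encrypted-cluster")
```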
Question #13
A company is planning to do a proof of concept for a machine learning (ML) project using Amazon SageMaker with a subset of existing on-premises data hosted in the company's 3 TB data warehouse. For part of the project, AWS Direct Connect is established and tested. To prepare the data for ML, data analysts are performing data curation. The data analysts want to perform multiple steps, including mapping, dropping null fields, resolving choice, and splitting fields. The company needs the fastest solution to curate the data for this project.
Which solution meets these requirements?
- A. Ingest data into Amazon S3 using AWS DataSync and use Apache Spark scripts to curate the data in an Amazon EMR cluster. Store the curated data in Amazon S3 for ML processing.
- B. Create custom ETL jobs on-premises to curate the data. Use AWS DMS to ingest data into Amazon S3 for ML processing.
- C. Ingest data into Amazon S3 using AWS DMS. Use AWS Glue to perform data curation and store the data in Amazon S3 for ML processing. (Most Voted)
- D. Take a full backup of the data store and ship the backup files using AWS Snowball. Upload Snowball data into Amazon S3 and schedule data curation jobs using AWS Batch to prepare the data for ML.
Correct Answer:
C

Ingesting the data into Amazon S3 with AWS Database Migration Service (AWS DMS) takes advantage of the AWS Direct Connect link that is already established and tested, and avoids building custom ETL on premises. Once the data is in S3, AWS Glue, a serverless Apache Spark service, provides built-in transforms that map directly to the curation steps the analysts listed: mapping, dropping null fields, resolving choice, and splitting fields. This combination requires the least setup and development effort, making option C the fastest path to curated data for the proof of concept.
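A minimal sketch of what the AWS Glue job script for the curation step could look like, using Glue's built-in transforms (ApplyMapping, ResolveChoice, DropNullFields) that correspond to the steps in the question; the catalog database, table, column mappings, and output path are hypothetical.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping, DropNullFields, ResolveChoice
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the data that AWS DMS landed in S3, via the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="warehouse_extract", table_name="transactions"
)

# Map and rename columns to the schema expected by the ML pipeline.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("txn_id", "string", "transaction_id", "string"),
        ("amount", "string", "amount", "double"),
    ],
)

# Resolve ambiguous column types, then drop fields that are entirely null.
resolved = ResolveChoice.apply(frame=mapped, choice="make_cols")
cleaned = DropNullFields.apply(frame=resolved)

# Write the curated data back to S3 for SageMaker to consume.
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-ml-data/"},
    format="parquet",
)

job.commit()
```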
Question #14
A US-based sneaker retail company launched its global website. All the transaction data is stored in Amazon RDS and curated historic transaction data is stored in Amazon Redshift in the us-east-1 Region. The business intelligence (BI) team wants to enhance the user experience by providing a dashboard for sneaker trends.
The BI team decides to use Amazon QuickSight to render the website dashboards. During development, a team in Japan provisioned Amazon QuickSight in ap-northeast-1. The team is having difficulty connecting Amazon QuickSight from ap-northeast-1 to Amazon Redshift in us-east-1.
Which solution will solve this issue and meet the requirements?
- A. In the Amazon Redshift console, choose to configure cross-Region snapshots and set the destination Region as ap-northeast-1. Restore the Amazon Redshift cluster from the snapshot and connect to Amazon QuickSight launched in ap-northeast-1.
- B. Create a VPC endpoint from the Amazon QuickSight VPC to the Amazon Redshift VPC so Amazon QuickSight can access data from Amazon Redshift.
- C. Create an Amazon Redshift endpoint connection string with Region information in the string and use this connection string in Amazon QuickSight to connect to Amazon Redshift.
- D. Create a new security group for Amazon Redshift in us-east-1 with an inbound rule authorizing access from the appropriate IP address range for the Amazon QuickSight servers in ap-northeast-1. (Most Voted)
Correct Answer:
B

The listed answer is B, a VPC endpoint from the Amazon QuickSight VPC to the Amazon Redshift VPC. VPC endpoints are Regional constructs, however, and do not by themselves bridge ap-northeast-1 and us-east-1. The documented way to let Amazon QuickSight reach a Redshift cluster in another Region is to authorize the QuickSight IP address range for the QuickSight Region in the cluster's security group, which is exactly what option D describes and why it is the most voted choice. Therefore, despite B being listed as the correct answer, option D is better aligned with the described requirements.
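For reference, a minimal boto3 sketch of option D: add an inbound rule on the Redshift cluster's security group in us-east-1 for the published QuickSight IP range for ap-northeast-1. The security group ID and CIDR below are placeholders; check the QuickSight documentation for the current per-Region ranges.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # security group attached to the Redshift cluster
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 5439,        # default Amazon Redshift port
            "ToPort": 5439,
            "IpRanges": [
                {
                    # Placeholder for the published QuickSight CIDR range
                    # for ap-northeast-1.
                    "CidrIp": "203.0.113.0/27",
                    "Description": "Amazon QuickSight ap-northeast-1",
                }
            ],
        }
    ],
)
```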
Question #15
An airline has .csv-formatted data stored in Amazon S3 with an AWS Glue Data Catalog. Data analysts want to join this data with call center data stored in
Amazon Redshift as part of a daily batch process. The Amazon Redshift cluster is already under a heavy load. The solution must be managed, serverless, well-functioning, and minimize the load on the existing Amazon Redshift cluster. The solution should also require minimal effort and development activity.
Which solution meets these requirements?
- A. Unload the call center data from Amazon Redshift to Amazon S3 using an AWS Lambda function. Perform the join with AWS Glue ETL scripts.
- B. Export the call center data from Amazon Redshift using a Python shell in AWS Glue. Perform the join with AWS Glue ETL scripts.
- C. Create an external table using Amazon Redshift Spectrum for the call center data and perform the join with Amazon Redshift. (Most Voted)
- D. Export the call center data from Amazon Redshift to Amazon EMR using Apache Sqoop. Perform the join with Apache Hive.
Correct Answer:
C

Redshift Spectrum is ideal for this scenario as it enables querying data directly stored in S3 using an external table strategy without imposing additional load on the Redshift cluster. This setup doesn’t require moving or duplicating existing datasets into Redshift, which would add unnecessary strain. Spectrum operates serverlessly, aligning seamlessly with the requirement for a managed and minimal-effort solution. By utilizing Spectrum, the airline can efficiently join their S3 data with the Redshift-based call center data, thus optimizing performance and resource utilization in an environment already experiencing high demands.
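A hedged sketch of how option C could look in practice, using the Redshift Data API through boto3 so the batch job itself stays serverless; the cluster identifier, database, user, IAM role ARN, and schema/table names are all hypothetical.

```python
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")


def run_sql(sql: str) -> str:
    """Submit a statement against the existing cluster and return the statement id."""
    response = redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="analytics",
        DbUser="analyst",
        Sql=sql,
    )
    return response["Id"]


# One-time setup: map the existing AWS Glue Data Catalog database as an
# external schema so Spectrum can read the .csv files directly from S3.
run_sql("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS airline_s3
    FROM DATA CATALOG
    DATABASE 'airline_flight_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-spectrum-role';
""")

# Daily batch join: Spectrum scans the S3 data, and only the call center rows
# come from cluster storage, keeping extra load on the cluster minimal.
run_sql("""
    SELECT f.flight_id, f.passenger_count, c.call_reason
    FROM airline_s3.flights f
    JOIN call_center.calls c
      ON f.booking_id = c.booking_id;
""")
```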