Amazon AWS Certified Data Analytics - Specialty Exam Practice Questions (P. 4)
Question #16
A data analyst is using Amazon QuickSight for data visualization across multiple datasets generated by applications. Each application stores files within a separate Amazon S3 bucket. AWS Glue Data Catalog is used as a central catalog across all application data in Amazon S3. A new application stores its data within a separate S3 bucket. After updating the catalog to include the new application data source, the data analyst created a new Amazon QuickSight data source from an Amazon Athena table, but the import into SPICE failed.
How should the data analyst resolve the issue?
- A. Edit the permissions for the AWS Glue Data Catalog from within the Amazon QuickSight console.
- B. Edit the permissions for the new S3 bucket from within the Amazon QuickSight console. (Most Voted)
- C. Edit the permissions for the AWS Glue Data Catalog from within the AWS Glue console.
- D. Edit the permissions for the new S3 bucket from within the S3 console.
Correct Answer: B
Reference:
https://aws.amazon.com/blogs/big-data/harmonize-query-and-visualize-data-from-various-providers-using-aws-glue-amazon-athena-and-amazon-quicksight/
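For context, the sketch below (Python with boto3) shows how an equivalent read grant could be attached programmatically to the IAM role that QuickSight uses, rather than through the console flow the answer describes. The bucket name and the QuickSight service role name are assumptions and will differ per account.

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical names -- adjust to the account's QuickSight role and bucket.
QUICKSIGHT_ROLE = "aws-quicksight-service-role-v0"  # assumed QuickSight service role
NEW_BUCKET = "new-application-data"                 # hypothetical new application bucket

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{NEW_BUCKET}"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:GetObjectVersion"],
            "Resource": [f"arn:aws:s3:::{NEW_BUCKET}/*"],
        },
    ],
}

# Attach the read-only grant as an inline policy on the QuickSight role so that
# Athena queries run on QuickSight's behalf can read the new bucket into SPICE.
iam.put_role_policy(
    RoleName=QUICKSIGHT_ROLE,
    PolicyName="quicksight-new-application-bucket-access",
    PolicyDocument=json.dumps(policy),
)
```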
Question #17
A team of data scientists plans to analyze market trend data for their company's new investment strategy. The trend data comes from five different data sources in large volumes. The team wants to use Amazon Kinesis to support their use case. The team uses SQL-like queries to analyze trends and wants to send notifications based on certain significant patterns in the trends. Additionally, the data scientists want to save the data to Amazon S3 for archival and historical reprocessing, and use AWS managed services wherever possible. The team wants to implement the lowest-cost solution.
Which solution meets these requirements?
- A. Publish data to one Kinesis data stream. Deploy a custom application using the Kinesis Client Library (KCL) for analyzing trends, and send notifications using Amazon SNS. Configure Kinesis Data Firehose on the Kinesis data stream to persist data to an S3 bucket.
- B. Publish data to one Kinesis data stream. Deploy Kinesis Data Analytics to the stream for analyzing trends, and configure an AWS Lambda function as an output to send notifications using Amazon SNS. Configure Kinesis Data Firehose on the Kinesis data stream to persist data to an S3 bucket. (Most Voted)
- C. Publish data to two Kinesis data streams. Deploy Kinesis Data Analytics to the first stream for analyzing trends, and configure an AWS Lambda function as an output to send notifications using Amazon SNS. Configure Kinesis Data Firehose on the second Kinesis data stream to persist data to an S3 bucket.
- D. Publish data to two Kinesis data streams. Deploy a custom application using the Kinesis Client Library (KCL) to the first stream for analyzing trends, and send notifications using Amazon SNS. Configure Kinesis Data Firehose on the second Kinesis data stream to persist data to an S3 bucket.
Correct Answer: A

Option A relies on a custom application built with the Kinesis Client Library (KCL), which is not designed for SQL-like trend analysis and adds development and operational overhead compared with Kinesis Data Analytics. Option B is a streamlined, fully managed design: Kinesis Data Analytics runs the SQL-like queries directly on the stream, an AWS Lambda function configured as an output sends notifications through Amazon SNS, and Kinesis Data Firehose on the same stream persists the data to an S3 bucket. Because it uses a single stream and managed services throughout, it is more cost-effective than the two-stream or custom-application designs, so option B could actually be the more appropriate choice given the requirements.
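To illustrate the Lambda-to-SNS step in option B, here is a minimal sketch of a Lambda function attached as a Kinesis Data Analytics output destination. The topic ARN and payload shape are hypothetical, and the exact event/response contract should be verified against the Kinesis Data Analytics Lambda output documentation.

```python
import base64
import json
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:trend-alerts"  # hypothetical topic


def handler(event, context):
    """Receive records emitted by a Kinesis Data Analytics output and publish
    each significant trend as an Amazon SNS notification."""
    results = []
    for record in event.get("records", []):
        payload = json.loads(base64.b64decode(record["data"]))
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="Significant market trend detected",
            Message=json.dumps(payload),
        )
        # Acknowledge delivery so Kinesis Data Analytics does not retry the record.
        results.append({"recordId": record["recordId"], "result": "Ok"})
    return {"records": results}
```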
Question #18
A company currently uses Amazon Athena to query its global datasets. The regional data is stored in Amazon S3 in the us-east-1 and us-west-2 Regions. The data is not encrypted. To simplify the query process and manage it centrally, the company wants to use Athena in us-west-2 to query data from Amazon S3 in both Regions. The solution should be as low-cost as possible.
What should the company do to achieve this goal?
- A. Use AWS DMS to migrate the AWS Glue Data Catalog from us-east-1 to us-west-2. Run Athena queries in us-west-2.
- B. Run the AWS Glue crawler in us-west-2 to catalog datasets in all Regions. Once the data is crawled, run Athena queries in us-west-2. (Most Voted)
- C. Enable cross-Region replication for the S3 buckets in us-east-1 to replicate data in us-west-2. Once the data is replicated in us-west-2, run the AWS Glue crawler there to update the AWS Glue Data Catalog in us-west-2 and run Athena queries.
- D. Update AWS Glue resource policies to provide us-east-1 AWS Glue Data Catalog access to us-west-2. Once the catalog in us-west-2 has access to the catalog in us-east-1, run Athena queries in us-west-2.
Correct Answer: C

Enabling cross-Region replication from the S3 buckets in us-east-1 to us-west-2, and then running an AWS Glue crawler in us-west-2 to update the Data Catalog there, gives the company a single, centralized place to manage and query the data with Athena. Replication adds storage and transfer cost up front, but simplifying data management and querying can offset that cost in a complex multi-Region setup. Centralized querying in one Region also tends to improve performance and shorten time to insight. When implementing such a solution, evaluate the balance between replication cost and operational efficiency.
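A rough sketch of the two steps in option C using boto3 is shown below. The bucket names, role ARNs, and crawler name are hypothetical, and cross-Region replication additionally requires versioning to be enabled on both buckets, which is omitted here.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
glue = boto3.client("glue", region_name="us-west-2")

# Step 1: replicate the us-east-1 bucket into us-west-2 (both buckets must have
# versioning enabled, and the role must allow the s3:Replicate* actions).
s3.put_bucket_replication(
    Bucket="company-data-us-east-1",  # hypothetical source bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # hypothetical role
        "Rules": [
            {
                "ID": "replicate-to-us-west-2",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::company-data-us-west-2"},
            }
        ],
    },
)

# Step 2: crawl the replicated data in us-west-2 so Athena in that Region can query it.
glue.create_crawler(
    Name="us-west-2-datasets",
    Role="arn:aws:iam::123456789012:role/glue-crawler-role",  # hypothetical role
    DatabaseName="global_datasets",
    Targets={"S3Targets": [{"Path": "s3://company-data-us-west-2/"}]},
)
glue.start_crawler(Name="us-west-2-datasets")
```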
Question #19
A large company receives files from external parties in Amazon EC2 throughout the day. At the end of the day, the files are combined into a single file, compressed into a gzip file, and uploaded to Amazon S3. The total size of all the files is close to 100 GB daily. Once the files are uploaded to Amazon S3, an AWS Batch program executes a COPY command to load the files into an Amazon Redshift cluster.
Which program modification will accelerate the COPY process?
- A. Upload the individual files to Amazon S3 and run the COPY command as soon as the files become available.
- B. Split the number of files so they are equal to a multiple of the number of slices in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files. (Most Voted)
- C. Split the number of files so they are equal to a multiple of the number of compute nodes in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
- D. Apply sharding by breaking up the files so the distkey columns with the same values go to the same file. Gzip and upload the sharded files to Amazon S3. Run the COPY command on the files.
Correct Answer: B
Reference:
https://docs.aws.amazon.com/redshift/latest/dg/t_splitting-data-files.html
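The sketch below illustrates the idea behind option B in Python with boto3: split the combined file into a multiple of the cluster's slice count, gzip each part, upload the parts under one prefix, and issue a single COPY against that prefix. The file names, bucket, cluster details, and IAM role are hypothetical, and the COPY is submitted through the Redshift Data API as one possible execution path.

```python
import gzip
import shutil
import boto3

s3 = boto3.client("s3")
redshift_data = boto3.client("redshift-data")

SLICES = 16            # total slices in the cluster (nodes x slices per node)
BUCKET = "daily-load"  # hypothetical bucket
PREFIX = "trips/2024-05-01/"

# Assume the daily 100 GB file has already been split into SLICES equal parts,
# e.g. part_0000 .. part_0015; gzip each part and upload it under one prefix.
for i in range(SLICES):
    part = f"part_{i:04d}"
    with open(part, "rb") as src, gzip.open(f"{part}.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    s3.upload_file(f"{part}.gz", BUCKET, f"{PREFIX}{part}.gz")

# One COPY over the prefix lets every slice load a file in parallel.
redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="analytics",
    DbUser="loader",
    Sql=(
        f"COPY trips FROM 's3://{BUCKET}/{PREFIX}' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' GZIP;"
    ),
)
```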
Question #20
A large ride-sharing company has thousands of drivers globally serving millions of unique customers every day. The company has decided to migrate an existing data mart to Amazon Redshift. The existing schema includes the following tables.
- A trips fact table for information on completed rides.
- A drivers dimension table for driver profiles.
- A customers fact table holding customer profile information.
The company analyzes trip details by date and destination to examine profitability by region. The drivers data rarely changes. The customers data frequently changes.
What table design provides optimal query performance?
- A. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers and customers tables.
- B. Use DISTSTYLE EVEN for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.
- C. Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table. (Most Voted)
- D. Use DISTSTYLE EVEN for the drivers table and sort by date. Use DISTSTYLE ALL for both fact tables.
Correct Answer: A

In choice A, DISTSTYLE ALL for the drivers table takes advantage of its rarely changing data by storing a complete copy on each node, which avoids shuffling during queries that involve driver data. For the trips table, DISTSTYLE KEY on destination collocates related rows on the same node, and sorting by date optimizes the analyses of trip details by date and destination. Although the customers table changes frequently, DISTSTYLE ALL avoids redistribution when it is joined with other tables, which can outweigh the extra storage and maintenance cost. This setup balances query efficiency against update handling.
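To make the distribution and sort choices in answer A concrete, here is a sketch of the corresponding DDL submitted through the Redshift Data API; the column names and cluster details are hypothetical.

```python
import boto3

redshift_data = boto3.client("redshift-data")

DDL = [
    # Trips fact table: collocate rows by destination and sort by date.
    """CREATE TABLE trips (
           trip_id      BIGINT,
           trip_date    DATE,
           destination  VARCHAR(64),
           driver_id    BIGINT,
           customer_id  BIGINT,
           fare         DECIMAL(10, 2)
       )
       DISTSTYLE KEY DISTKEY (destination) SORTKEY (trip_date);""",
    # Drivers rarely change, so a full copy on every node is cheap to maintain.
    """CREATE TABLE drivers (
           driver_id BIGINT,
           name      VARCHAR(128)
       )
       DISTSTYLE ALL;""",
    # Customers per answer A also use DISTSTYLE ALL to avoid redistribution on joins.
    """CREATE TABLE customers (
           customer_id BIGINT,
           name        VARCHAR(128)
       )
       DISTSTYLE ALL;""",
]

for statement in DDL:
    redshift_data.execute_statement(
        ClusterIdentifier="analytics-cluster",  # hypothetical cluster
        Database="analytics",
        DbUser="admin",
        Sql=statement,
    )
```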