Amazon AWS Certified Big Data - Specialty Exam Practice Questions (P. 4)
Question #16
An administrator needs to design the event log storage architecture for events from mobile devices. The event data will be processed by an Amazon EMR cluster daily for aggregated reporting and analytics before being archived.
How should the administrator recommend storing the log data?
- A. Create an Amazon S3 bucket and write log data into folders by device. Execute the EMR job on the device folders.
- B. Create an Amazon DynamoDB table partitioned on the device and sorted on date, and write log data to the table. Execute the EMR job on the Amazon DynamoDB table.
- C. Create an Amazon S3 bucket and write data into folders by day. Execute the EMR job on the daily folder.
- D. Create an Amazon DynamoDB table partitioned on EventID, and write log data to the table. Execute the EMR job on the table.
Correct Answer: A
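For context on option A, here is a minimal boto3 sketch of device-partitioned log storage on S3; the bucket name, key layout, and event payload are assumptions for illustration, not part of the question.

```python
# Minimal sketch of option A: device-partitioned log storage on S3.
# Bucket name, device IDs, and payload format are hypothetical.
import datetime
import json

import boto3

s3 = boto3.client("s3")

def put_event(bucket: str, device_id: str, event: dict) -> None:
    """Write one event under a per-device prefix, e.g. logs/device=1234/..."""
    ts = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H%M%S.%f")
    key = f"logs/device={device_id}/{ts}.json"
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(event).encode("utf-8"))

# The daily EMR job can then be pointed at s3://<bucket>/logs/device=<id>/.
put_event("example-log-bucket", "device-1234", {"event": "app_open"})
```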
Question #17
A data engineer wants to use Amazon Elastic MapReduce (EMR) for an application. The data engineer needs to make sure it complies with regulatory requirements. The auditor must be able to confirm at any point which servers are running and which network access controls are deployed.
Which action should the data engineer take to meet this requirement?
- A. Provide the auditor IAM accounts with the SecurityAudit policy attached to their group.
- B. Provide the auditor with SSH keys for access to the Amazon EMR cluster.
- C. Provide the auditor with CloudFormation templates.
- D. Provide the auditor with access to AWS Direct Connect to use their existing tools.
Correct Answer: C
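For context on option C, the sketch below shows how an auditor could pull both the declared template and the live resource list for a stack with boto3; the stack name is a hypothetical placeholder.

```python
# Minimal sketch for option C: let an auditor inspect what a CloudFormation
# stack declares and which resources are actually deployed.
# The stack name "emr-analytics-stack" is hypothetical.
import boto3

cfn = boto3.client("cloudformation")

# The template documents the intended servers and network access controls.
template = cfn.get_template(StackName="emr-analytics-stack")
print(template["TemplateBody"])

# The resource list confirms what is running right now.
resources = cfn.describe_stack_resources(StackName="emr-analytics-stack")
for res in resources["StackResources"]:
    print(res["LogicalResourceId"], res["ResourceType"], res["ResourceStatus"])
```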
Question #18
A social media customer has data from different data sources, including RDS running MySQL, Redshift, and Hive on EMR. To support better analysis, the customer needs to be able to analyze data from different data sources and to combine the results.
What is the most cost-effective solution to meet these requirements?
- A. Load all data from the different databases/warehouses to S3. Use the Redshift COPY command to copy the data to Redshift for analysis.
- B. Install Presto on the EMR cluster where Hive sits. Configure MySQL and PostgreSQL connectors to select from different data sources in a single query.
- C. Spin up an Elasticsearch cluster. Load data from all three data sources and use Kibana to analyze.
- D. Write a program running on a separate EC2 instance to run queries against the three different systems. Aggregate the results after getting the responses from all three systems.
Correct Answer: B
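For context on option B, a minimal sketch using the presto-python-client: a single query that joins a Hive table with a MySQL table through Presto connectors. The host, port, catalogs, schemas, and table names are assumptions; the connectors themselves must already be configured on the cluster.

```python
# Minimal sketch of option B: one federated Presto query across catalogs.
# Host, catalogs, schemas, and table names below are hypothetical; the
# MySQL/PostgreSQL connectors must already be configured in the Presto
# catalog properties on the EMR cluster.
import prestodb  # pip install presto-python-client

conn = prestodb.dbapi.connect(
    host="emr-master.example.internal",
    port=8889,  # default Presto port on EMR
    user="hadoop",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()

# Join click events in Hive with user profiles in MySQL in one query.
cur.execute("""
    SELECT u.country, COUNT(*) AS clicks
    FROM hive.default.click_events AS e
    JOIN mysql.appdb.users AS u ON e.user_id = u.id
    GROUP BY u.country
""")
for row in cur.fetchall():
    print(row)
```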
Question #19
An Amazon EMR cluster using EMRFS has access to petabytes of data on Amazon S3, originating from multiple unique data sources. The customer needs to query common fields across some of the data sets to be able to perform interactive joins and then display results quickly.
Which technology is most appropriate to enable this capability?
Question #20
A game company needs to properly scale its game application, which is backed by DynamoDB. Amazon Redshift has the past two years of historical data. Game traffic varies throughout the year based on various factors such as seasonality, movie releases, and holidays. An administrator needs to calculate how much read and write throughput should be provisioned for the DynamoDB table for each week in advance.
How should the administrator accomplish this task?
- A. Feed the data into Amazon Machine Learning and build a regression model.
- B. Feed the data into Spark MLlib and build a random forest model.
- C. Feed the data into Apache Mahout and build a multi-classification model.
- D. Feed the data into Amazon Machine Learning and build a binary classification model.
Correct Answer: B
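For context on option B, a minimal Spark MLlib sketch: train a random forest regressor on the historical traffic and predict the coming week's read throughput. The S3 path, column names, and feature set are assumptions; write throughput would get a second model with a different label column.

```python
# Minimal sketch of option B: forecast weekly DynamoDB throughput with a
# random forest in Spark MLlib. The S3 path and column names are hypothetical;
# the history is assumed to have been unloaded from Redshift to S3.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import RandomForestRegressor

spark = SparkSession.builder.appName("throughput-forecast").getOrCreate()

# Two years of weekly traffic history, e.g. UNLOADed from Redshift as CSV.
df = spark.read.csv("s3://example-bucket/history/", header=True, inferSchema=True)

# Assemble the assumed per-week features into a single vector column.
features = VectorAssembler(
    inputCols=["week_of_year", "is_holiday_week", "movie_release_count"],
    outputCol="features",
)
train = features.transform(df)

# One model per target; here, read capacity units for the coming week.
rf = RandomForestRegressor(featuresCol="features", labelCol="read_capacity_units")
model = rf.fit(train)

# Predict next week's required read throughput from its known features.
model.transform(features.transform(df.limit(1))).select("prediction").show()
```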