Cloudera CCD-410 Exam Practice Questions (P. 3)
Question #11
You have just executed a MapReduce job. Where is intermediate data written to after being emitted from the Mapper's map method?
- A. Intermediate data is streamed across the network from the Mapper to the Reducer and is never written to disk.
- B. Into in-memory buffers on the TaskTracker node running the Mapper that spill over and are written into HDFS.
- C. Into in-memory buffers that spill over to the local file system of the TaskTracker node running the Mapper.
- D. Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker node running the Reducer.
- E. Into in-memory buffers on the TaskTracker node running the Reducer that spill over and are written into HDFS.
Correct Answer:
C
The mapper output (intermediate data) is stored on the local file system (not HDFS) of each individual mapper node. This is typically a temporary directory location that can be configured by the Hadoop administrator. The intermediate data is cleaned up after the Hadoop job completes.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, Where is the Mapper output (intermediate key-value data) stored?
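To see where a particular cluster puts this intermediate data, you can read the relevant property from the job configuration. Below is a minimal sketch, assuming a classic MRv1 (TaskTracker/JobTracker) setup where the property is mapred.local.dir; the class name is illustrative and property names differ slightly across Hadoop versions.

import org.apache.hadoop.mapred.JobConf;

public class ShowSpillDirs {
    public static void main(String[] args) {
        // JobConf pulls in mapred-site.xml in addition to core-site.xml.
        JobConf conf = new JobConf();
        // mapred.local.dir lists the local (non-HDFS) directories the
        // TaskTracker uses for map-side spill files.
        System.out.println(conf.get("mapred.local.dir"));
    }
}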
Question #12
You want to understand more about how users browse your public website, such as which pages they visit prior to placing an order. You have a farm of 200 web servers hosting your website. How will you gather this data for your analysis?
- A. Ingest the server web logs into HDFS using Flume.
- B. Write a MapReduce job, with the web servers for mappers, and the Hadoop cluster nodes for reducers.
- C. Import all users' clicks from your OLTP databases into Hadoop, using Sqoop.
- D. Channel these clickstreams into Hadoop using Hadoop Streaming.
- E. Sample the weblogs from the web servers, copying them into Hadoop using curl.
Correct Answer:
B
Hadoop MapReduce for Parsing Weblogs
Here are the steps for parsing a log file using Hadoop MapReduce:
Load the log files into HDFS using this Hadoop command:
hadoop fs -put <local file path of weblogs> <hadoop HDFS location>
The opencsv 2.3 library (opencsv-2.3.jar) is used for parsing the log records.
Below is the Mapper program for parsing the log file from the HDFS location.

public static class ParseMapper extends Mapper<Object, Text, NullWritable, Text> {

    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the log line on spaces, treating double-quoted fields as single tokens.
        CSVParser parse = new CSVParser(' ', '\"');
        String[] sp = parse.parseLine(value.toString());
        int spSize = sp.length;
        StringBuffer rec = new StringBuffer();
        for (int i = 0; i < spSize; i++) {
            rec.append(sp[i]);
            if (i != (spSize - 1))
                rec.append(",");
        }
        word.set(rec.toString());
        // Emit the comma-joined record with a null key.
        context.write(NullWritable.get(), word);
    }
}
The command below runs the Hadoop-based log parse. The MapReduce program is attached in this article; you can add extra parsing methods to the class. Be sure to build a new JAR after any change and copy it to the machine from which the job is submitted to the cluster.
hadoop jar <path of logparse jar> <hadoop HDFS logfile path> <output path of parsed log file>
The output file is stored in the HDFS location, and the output file name starts with "part-".
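For completeness, here is a hedged sketch of a driver that wires the ParseMapper above into a map-only job; the class name LogParseDriver and the assumption that ParseMapper is declared as a nested class inside it are illustrative, not taken from the article.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogParseDriver {

    // ... the ParseMapper class shown above would be declared here ...

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "weblog parse");
        job.setJarByClass(LogParseDriver.class);
        job.setMapperClass(ParseMapper.class);        // the mapper shown above
        job.setNumReduceTasks(0);                     // map-only: parsed records go straight to HDFS
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS log-file path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output path ("part-*" files)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}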
Question #13
MapReduce v2 (MRv2/YARN) is designed to address which two issues?
- A. Single point of failure in the NameNode.
- B. Resource pressure on the JobTracker.
- C. HDFS latency.
- D. Ability to run frameworks other than MapReduce, such as MPI.
- E. Reduce complexity of the MapReduce APIs.
- F. Standardize on a single MapReduce API.
Correct Answer:
BD
YARN (Yet Another Resource Negotiator), as an aspect of Hadoop, has two major kinds of benefits:
* (D) The ability to use programming frameworks other than MapReduce.
/ MPI (Message Passing Interface) was mentioned as a paradigmatic example of a MapReduce alternative
* Scalability, no matter what programming framework you use.
Note:
* The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.
* (B) The central goal of YARN is to clearly separate two things that are unfortunately smushed together in current Hadoop, specifically in (mainly) JobTracker:
/ Monitoring the status of the cluster with respect to which nodes have which resources available. Under YARN, this will be global.
/ Managing the parallel execution of any specific job. Under YARN, this will be done separately for each job.
The current Hadoop MapReduce system is fairly scalable: Yahoo! runs 5,000 Hadoop jobs, truly concurrently, on a single cluster, for a total of 1.5 to 2 million jobs per cluster per month. Still, YARN will remove scalability bottlenecks.
Reference: Apache Hadoop YARN - Concepts & Applications
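As a small, hedged illustration of that separation (not from the referenced article): the same MapReduce driver code can be pointed at either resource-management layer purely through configuration; the ResourceManager host below is a placeholder, and the "classic" value applies to the CDH4-era Hadoop releases this exam covers.

import org.apache.hadoop.conf.Configuration;

public class FrameworkSelectionSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // "classic" hands the job to the MRv1 JobTracker, "yarn" to the
        // MRv2 ResourceManager, and "local" runs it in-process.
        conf.set("mapreduce.framework.name", "yarn");
        // Placeholder ResourceManager address (default port 8032).
        conf.set("yarn.resourcemanager.address", "rmhost:8032");
        System.out.println("framework = " + conf.get("mapreduce.framework.name"));
    }
}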
Question #14
You need to run the same job many times with minor variations. Rather than hardcoding all job configuration options in your driver code, you've decided to have your Driver subclass org.apache.hadoop.conf.Configured and implement the org.apache.hadoop.util.Tool interface.
Identify which invocation correctly passes mapred.job.name with a value of Example to Hadoop.
- A. hadoop "mapred.job.name=Example" MyDriver input output
- B. hadoop MyDriver mapred.job.name=Example input output
- C. hadoop MyDriver -D mapred.job.name=Example input output
- D. hadoop setproperty mapred.job.name=Example MyDriver input output
- E. hadoop setproperty ("mapred.job.name=Example") MyDriver input output
Correct Answer:
C
Configure the property using the -D key=value notation:
-D mapred.job.name='My Job'
You can list a whole bunch of options by calling the streaming jar with just the -info argument
Reference: Python hadoop streaming : Setting a job name
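For reference, a hedged sketch of what such a driver might look like: ToolRunner feeds the command line through GenericOptionsParser, which strips the -D option and merges mapred.job.name=Example into the configuration before run() is called (the job-building code itself is omitted here).

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Generic options such as -D key=value have already been parsed and
        // merged into getConf(); args now holds only the remaining arguments
        // (here, the input and output paths).
        System.out.println("job name = " + getConf().get("mapred.job.name"));
        // ... build and submit the job using getConf() ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyDriver(), args));
    }
}

Packaged into a JAR, it would be launched along the lines of option C, for example: hadoop jar mydriver.jar MyDriver -D mapred.job.name=Example input output.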
Question #15
You are developing a MapReduce job for sales reporting. The mapper will process input keys representing the year (IntWritable) and input values representing product identifiers (Text).
Identify what determines the data types used by the Mapper for a given job.
- A. The key and value types specified in the JobConf.setMapInputKeyClass and JobConf.setMapInputValuesClass methods
- B. The data types specified in the HADOOP_MAP_DATATYPES environment variable
- C. The mapper-specification.xml file submitted with the job determines the mapper's input key and value types.
- D. The InputFormat used by the job determines the mapper's input key and value types.
Correct Answer:
D
The input types fed to the mapper are controlled by the InputFormat used. The default input format, "TextInputFormat," will load data in as (LongWritable, Text) pairs. The long value is the byte offset of the line in the file. The Text object holds the string contents of the line of the file.
Note: The data types emitted by the reducer are identified by setOutputKeyClass() and setOutputValueClass(). By default, it is assumed that these are the output types of the mapper as well. If this is not the case, the setMapOutputKeyClass() and setMapOutputValueClass() methods of the JobConf class will override these.
Reference: Yahoo! Hadoop Tutorial, THE DRIVER METHOD
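To make the relationship concrete, here is a hedged driver sketch for the scenario in the question; the class names SalesReportDriver and SalesMapper are illustrative, as is the assumption that the input is a SequenceFile whose (IntWritable, Text) records match the year/product-identifier types described above.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SalesReportDriver {

    // The InputFormat chosen in the driver fixes the mapper's input types: a
    // SequenceFile of (IntWritable year, Text productId) records means the
    // Mapper's first two type parameters must be IntWritable and Text.
    public static class SalesMapper extends Mapper<IntWritable, Text, IntWritable, Text> {
        @Override
        protected void map(IntWritable year, Text productId, Context context)
                throws IOException, InterruptedException {
            context.write(year, productId);   // illustrative pass-through
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "sales report");
        job.setJarByClass(SalesReportDriver.class);
        job.setInputFormatClass(SequenceFileInputFormat.class); // determines the mapper's INPUT key/value types
        job.setMapperClass(SalesMapper.class);
        job.setMapOutputKeyClass(IntWritable.class);            // mapper OUTPUT types (needed when they
        job.setMapOutputValueClass(Text.class);                 //   differ from the final output types)
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}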