Hadoop MapReduce - Interview Questions

What Java classes are provided in Hadoop framework to support the implementation of MapReduce phases?


Hadoop provides the Mapper and Reducer classes that support the implementation of MapReduce phases.

Mapper - Hadoop framework provides the org.apache.hadoop.mapreduce.Mapper class that defines the method map(). The programmer has to override the map() method and implement the map phase. The map() method takes three parameters - a key, a value, and a Context object to which the output is written.

Reducer - Hadoop framework provides the org.apache.hadoop.mapreduce.Reducer class that defines the method reduce(). The programmer has to override the reduce() method and implement the reduce phase. The reduce() method also takes three parameters - a key, the collection of values grouped for that key, and a Context object to which the output is written.

The output types of the map() function should match the input types of the reduce() function.
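When the map output types differ from the job's final output types, they can be declared on the Job explicitly. A configuration sketch, assuming the word-count classes used elsewhere on this page:

```java
// If the mapper's output types differ from the job's final output types,
// declare them explicitly; otherwise the values set via setOutputKeyClass()
// and setOutputValueClass() are assumed for both phases.
job.setMapOutputKeyClass(Text.class);       // map() emits <Text, IntWritable>
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);          // reduce() also emits <Text, IntWritable>
job.setOutputValueClass(IntWritable.class);
```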

What class does Hadoop framework provide to configure, execute and manage MapReduce jobs?


Hadoop provides the org.apache.hadoop.mapreduce.Job class that is used to configure a MapReduce job, submit the job, control its execution and query its state.

What are the key steps in a MapReduce Job program?


Following are the key steps in a Job program.

1. Create an instance of Job class.

2. Set job specific parameters.

3. Set mapper, reducer and optionally a combiner class for the job.

4. Set input and output paths.

5. Submit job and poll for completion.

The code snippet below highlights these steps.

// create the job
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "wordcount");
job.setJarByClass(WordCount.class);

// set job specific parameters
job.setJobName("wordcount");

// set mapper, reducer and combiner
job.setMapperClass(WordcountMapper.class);
job.setCombinerClass(WordcountReducer.class);
job.setReducerClass(WordcountReducer.class);

// set key and value classes
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

// set input and output paths
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

// submit the job and poll for progress
System.exit(job.waitForCompletion(true) ? 0 : 1);

What happens if you attempt to call a set method on the Job class after the Job is submitted for execution?


Hadoop throws an IllegalStateException if you call the set method on the Job class after it is submitted for execution.

What class does the Hadoop framework provide to implement the MapReduce map phase?


Hadoop provides the org.apache.hadoop.mapreduce.Mapper class that can be extended to implement the map functionality. The Mapper class maps input key/value pairs to a set of intermediate key/value pairs. Map tasks (instances of the Mapper class) are the individual tasks that transform input key/value pairs into intermediate key/value pairs. The transformed intermediate records need not be of the same type as the input records, and an input pair may map to zero or many output pairs.


What are the key methods provided in the Mapper class?


The Mapper class provides the following methods.

setup() - setup() is called once at the beginning of the map task and is used to set up one-time resources for the task.

map() - map() is called once for each key/value pair of the input split and performs the map functionality.

cleanup() - cleanup() is called once at the end of the map task and is used to clean up resources.

run() - run() can be overridden to get more complete control over the execution of the Mapper.
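How these methods fit together can be sketched in plain Java. This is an illustration of the default run() flow described above, not the actual Hadoop source, and MapperLifecycle is a made-up class name:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class MapperLifecycle {
    // Mirrors the default run() loop of a Mapper: setup() once,
    // map() once per input record, then cleanup() once.
    static List<String> run(Iterator<String> records) {
        List<String> calls = new ArrayList<>();
        calls.add("setup");                      // setup(context)
        while (records.hasNext()) {              // while (context.nextKeyValue())
            calls.add("map:" + records.next());  //     map(key, value, context)
        }
        calls.add("cleanup");                    // cleanup(context)
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b").iterator()));
        // prints [setup, map:a, map:b, cleanup]
    }
}
```

Overriding run() lets you change this flow, for example to process records in batches.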

What method of the Mapper class has to be overridden to implement the map functionality?


The map() method of the Mapper class has to be overridden to implement the map phase of the MapReduce functionality. The map() method is called once for each key/value pair in the input split.

What parameters are passed to the map() method of the Mapper class?


A key, a value and a Context object are passed to the map() method. The output is written to the Context object.

What is the role of the setup() method in the Mapper class?


The setup() method is called once at the beginning of the map task and is used to set up one-time resources for the task.

What is the role of the cleanup() method in the Mapper class?


The cleanup() method is called once at the end of the map task and is used to clean up resources.


Write a sample map() function?


The word count program is commonly used to demonstrate MapReduce functionality. You can use this program if the interviewer does not ask for specific functionality.

The map() function in a word count program takes each line of input, splits it into words, and outputs a key/value pair of the form <word, 1> for each word.

// map function - key, value and context are passed as parameters
// key - offset of the line in the input file
// value - the line itself
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();

public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
    }
}
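The tokenization at the heart of this map() can be exercised outside Hadoop with plain java.util.StringTokenizer. A standalone sketch (TokenizeDemo is a made-up name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeDemo {
    // The same tokenization the map() above performs, minus the Hadoop
    // types: each whitespace-separated token becomes one output key.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("the quick brown fox"));
        // prints [the, quick, brown, fox]
    }
}
```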

What are the three primary phases performed in a Reducer?


Following are the three primary phases of a Reducer.

Shuffle - In this phase the Reducer copies the sorted output from each Mapper across the network using HTTP.

Sort - In this phase the framework merge sorts the Reducer inputs by key (since different Mappers may have output the same key). The shuffle and sort phases occur simultaneously, i.e. outputs are merged while they are being fetched.

Reduce - In this phase the reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context) method is called for each <key, (collection of values)> in the sorted inputs. The output of the reduce task is typically written to a RecordWriter via TaskInputOutputContext.write(Object, Object).
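The three phases can be simulated in plain Java. In this sketch (not Hadoop code; ShuffleSortReduce is a made-up name), grouping the mapper pairs by key stands in for shuffle, the TreeMap's key ordering for sort, and the summation for a word-count reduce:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSortReduce {
    // Simulates the reducer side: group mapper output pairs by key
    // (shuffle), iterate keys in sorted order (sort), and sum the
    // values for each key (reduce).
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();  // shuffle + sort
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();         // reduce
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), e.getValue().stream().mapToInt(Integer::intValue).sum());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(reduce(List.of(
                Map.entry("b", 1), Map.entry("a", 1), Map.entry("b", 1))));
        // prints {a=1, b=2}
    }
}
```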

What are the key methods provided in the Reducer class?


Reducer class provides the following methods.

setup() - setup() is called once at the beginning of the reduce task and is used to set up one-time resources for the task.

reduce() - reduce() is called once for each key, with the collection of values grouped for that key, and performs the reduce functionality.

cleanup() - cleanup() is called once at the end of the reduce task and is used to clean up resources.

run() - run() can be overridden to get more complete control over the execution of the Reducer.

What method of the Reducer class has to be overridden to implement the reduce functionality?


The reduce() method of the Reducer class has to be overridden to implement the reduce phase of the MapReduce functionality. The reduce() method is called once for each key.

What parameters are passed to the reduce() method of the Reducer class?


A key, the Iterable collection of values grouped for that key, and a Context object are passed to the reduce() method. The output is written to the Context object.


Write a sample reduce() function


The word count program is commonly used to demonstrate MapReduce functionality. You can use this program if the interviewer does not ask for specific functionality.

The reduce() function in a word count program takes the key/value pairs output from the mapper and aggregates the count for each key.

public void reduce(Text word, Iterable<IntWritable> counts, Context context) throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable count : counts) {
        sum += count.get();
    }
    context.write(word, new IntWritable(sum));
}
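Together, the two sample functions amount to the following plain-Java computation (a standalone sketch without the Hadoop types; WordCountLocal is a made-up name):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountLocal {
    // End-to-end word count in one JVM: tokenize each line (the map step),
    // then accumulate a per-word sum (the reduce step).
    static Map<String, Integer> wordCount(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("to be or", "not to be"));
        // prints {be=2, not=1, or=1, to=2}
    }
}
```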
 
