Hadoop YARN - Interview Questions

What is YARN?

Apache YARN, which stands for 'Yet Another Resource Negotiator', is Hadoop's cluster resource management system.

YARN provides APIs for requesting and working with Hadoop's cluster resources. These APIs are usually used by components of Hadoop's distributed frameworks, such as MapReduce, Spark, and Tez, which are built on top of YARN. User applications typically do not use the YARN APIs directly. Instead, they use higher-level APIs provided by the framework (MapReduce, Spark, etc.), which hide the resource management details from the user.

What are the key components of YARN?

The basic idea of YARN is to split the functionality of resource management and job scheduling/monitoring into separate daemons. YARN consists of the following components:

ResourceManager - The ResourceManager is a global daemon, one per cluster, that handles resource requests and manages resources across the nodes of the cluster.

NodeManager - The NodeManager runs on each node of the cluster and is responsible for launching and monitoring containers and reporting their status back to the ResourceManager.

ApplicationMaster - The ApplicationMaster is a per-application component that is responsible for negotiating resource containers from the ResourceManager and working with NodeManagers to execute and monitor the container tasks.

Container - A container in the YARN framework is a Unix process running on a node that executes an application-specific process with a constrained set of resources (memory, CPU, etc.). A container is an abstract notion in YARN, not a physical component.

What is ResourceManager in YARN?

The YARN ResourceManager is a global daemon, one per cluster, that handles resource requests and manages resources across the nodes of the cluster.

The ResourceManager has two main components: the Scheduler and the ApplicationsManager.

Scheduler - The Scheduler is responsible for allocating resources to running applications, based on the abstract notion of a resource container with a constrained set of resources. It is a pure scheduler: it does not monitor or track the status of applications.

ApplicationsManager - The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.

What are the scheduling policies available in YARN?

The YARN scheduler is responsible for allocating resources to user applications based on a defined scheduling policy. YARN provides three scheduling options: the FIFO scheduler, the Capacity scheduler, and the Fair scheduler.

FIFO Scheduler - The FIFO scheduler places application requests in a single queue and runs them in order of submission.

Capacity Scheduler - The Capacity scheduler divides cluster capacity among queues. A separate dedicated queue for smaller jobs, for example, lets them start as soon as they are submitted instead of waiting behind large jobs.

Fair Scheduler - The Fair scheduler dynamically balances resources among all running jobs so that, over time, each receives on average an equal share.

How do you set up the ResourceManager to use the CapacityScheduler?

You can configure the ResourceManager to use the CapacityScheduler by setting the property 'yarn.resourcemanager.scheduler.class' to 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler' in the file 'conf/yarn-site.xml'.
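
As a minimal sketch, the CapacityScheduler setting might look like the fragment below. It uses only the property name and class name given above, and it assumes the standard Hadoop layout in which every property sits inside the file's top-level configuration element.

```xml
<!-- conf/yarn-site.xml: place inside the top-level <configuration> element -->
<property>
  <!-- Selects the scheduler implementation used by the ResourceManager -->
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```

The ResourceManager typically needs a restart to pick up a change to the scheduler class.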

How do you set up the ResourceManager to use the FairScheduler?

You can configure the ResourceManager to use the FairScheduler by setting the property 'yarn.resourcemanager.scheduler.class' to 'org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler' in the file 'conf/yarn-site.xml'.
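
As a minimal sketch, the FairScheduler setting might look like the fragment below. It uses only the property name and class name given above, and it assumes the standard Hadoop layout in which every property sits inside the file's top-level configuration element.

```xml
<!-- conf/yarn-site.xml: place inside the top-level <configuration> element -->
<property>
  <!-- Selects the scheduler implementation used by the ResourceManager -->
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```

The ResourceManager typically needs a restart to pick up a change to the scheduler class.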

How do you set up HA for the ResourceManager?

The ResourceManager is responsible for scheduling applications and tracking resources in a cluster. Prior to Hadoop 2.4, the ResourceManager could not be set up for HA and was a single point of failure in a YARN cluster.

Since Hadoop 2.4, the YARN ResourceManager can be set up for high availability. High availability is achieved through an Active/Standby architecture. At any point in time, one ResourceManager is active and one or more ResourceManagers are in standby mode. If the active ResourceManager fails, one of the standby ResourceManagers transitions to the active state.
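
The answer above does not list the configuration itself, so the following is only a sketch of what an Active/Standby pair might look like in 'conf/yarn-site.xml'. The property names are the ones documented for ResourceManager HA. The cluster id, the ids 'rm1' and 'rm2', and all hostnames are hypothetical placeholders. The sketch also assumes a ZooKeeper ensemble is used for leader election, which is not stated in the text above.

```xml
<!-- conf/yarn-site.xml: place inside the top-level <configuration> element -->
<property>
  <!-- Turns on ResourceManager high availability -->
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Hypothetical cluster name shared by all ResourceManagers -->
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
</property>
<property>
  <!-- Logical ids for the ResourceManagers (hypothetical names) -->
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <!-- Host for the first ResourceManager (hypothetical hostname) -->
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1.example.com</value>
</property>
<property>
  <!-- Host for the second ResourceManager (hypothetical hostname) -->
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2.example.com</value>
</property>
<property>
  <!-- ZooKeeper quorum (hypothetical hosts); newer Hadoop releases
       use the property hadoop.zk.address instead -->
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```

With a setup along these lines, the state of each ResourceManager can be checked with 'yarn rmadmin -getServiceState rm1'.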
