Apache Oozie - Interview Questions

What is Apache Oozie?

Apache Oozie is a workflow scheduler engine that manages and schedules Apache Hadoop jobs. Oozie supports several kinds of Hadoop jobs out of the box, such as MapReduce jobs, Streaming jobs, Pig, Hive and Sqoop. Oozie also supports system-specific jobs such as shell scripts and Java programs.

What kind of application is Oozie?

Oozie is a Java web application that runs in a Java servlet container.

What is Apache Oozie Workflow?

An Apache Oozie Workflow is a collection of actions, such as Hadoop MapReduce jobs and Pig jobs. The actions are arranged in a control dependency DAG (Directed Acyclic Graph), which controls how and when each action can run. Oozie workflow definitions are written in hPDL, an XML Process Definition Language.
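
As a rough illustration of what an hPDL definition looks like, the sketch below chains two FS actions so that the second runs only after the first succeeds. The workflow name, node names, the `nameNode` job property and the schema version `0.5` are assumptions chosen for the example, not values taken from any particular deployment.

```xml
<!-- Minimal sketch of a workflow definition (workflow.xml).
     Names, paths and the schema version are illustrative assumptions. -->
<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
    <!-- Entry point: the job transitions to the first action -->
    <start to="make-dir"/>

    <!-- First action: create a working directory on HDFS -->
    <action name="make-dir">
        <fs>
            <mkdir path="${nameNode}/user/${wf:user()}/example/work"/>
        </fs>
        <ok to="make-done-flag"/>   <!-- control dependency: next node on success -->
        <error to="fail"/>          <!-- next node on failure -->
    </action>

    <!-- Second action: runs only after make-dir has succeeded -->
    <action name="make-done-flag">
        <fs>
            <touchz path="${nameNode}/user/${wf:user()}/example/work/_DONE"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The `ok` and `error` transitions on each action are the edges of the DAG; the `nameNode` value would typically be supplied in the job properties at submission time.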

What are the key components of Apache Oozie Workflow?

Apache Oozie Workflow contains control flow nodes and action nodes.

Control Flow Nodes - Control flow nodes are the mechanisms that define the beginning and end of the workflow (start, end, kill). In addition, control flow nodes provide a mechanism to control the execution path of the workflow (decision, fork and join).

Action Nodes - Action nodes are the mechanisms that trigger the execution of a computation or processing task. Oozie supports different types of Hadoop actions out of the box, such as Hadoop MapReduce, Hadoop file system and Pig. In addition, Oozie supports system-specific jobs such as SSH, HTTP and email.

What are the different states of an Apache Oozie Workflow job?

An Apache Oozie Workflow job can be in one of the following states: PREP, RUNNING, SUSPENDED, SUCCEEDED, KILLED and FAILED.

Does Apache Oozie Workflow support cycles?

Apache Oozie Workflow does not support cycles. Apache Oozie workflow definitions must be a strict DAG. If Oozie detects a cycle in the workflow definition at workflow application deployment time, it fails the deployment.
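
To make the restriction concrete, the following sketch shows a deliberately invalid definition in which two actions point back at each other. All names, the `workDir` property and the schema version are hypothetical; the point is only the back-edge from `step-b` to `step-a`, which turns the graph into a cycle and should cause Oozie to reject the workflow.

```xml
<!-- Intentionally INVALID sketch: step-a -> step-b -> step-a forms a cycle.
     Names, the workDir property and the schema version are assumptions. -->
<workflow-app name="cyclic-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="step-a"/>

    <action name="step-a">
        <fs>
            <mkdir path="${workDir}/a"/>
        </fs>
        <ok to="step-b"/>
        <error to="fail"/>
    </action>

    <action name="step-b">
        <fs>
            <mkdir path="${workDir}/b"/>
        </fs>
        <!-- Back-edge: transitioning to an earlier node closes the loop -->
        <ok to="step-a"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Changing the `ok` transition of `step-b` to point at `end` would restore a strict DAG.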

What are the different control flow nodes supported by Apache Oozie workflows that start and end the workflow?

Apache Oozie workflow supports the following control flow nodes that start or end the workflow execution.

Start Control Node - The start node is the first node that an Oozie workflow job transitions to and is the entry point for a workflow job. Every Apache Oozie workflow definition must have one start node.

End Control Node - The end node is the last node that an Oozie workflow job transitions to, and it indicates that the workflow job has completed successfully. When a workflow job reaches the end node, it finishes successfully and the job status changes to SUCCEEDED. Every Apache Oozie workflow definition must have one end node.

Kill Control Node - The kill node allows a workflow job to kill itself. When a workflow job reaches the kill node it finishes in error and the status of the job changes to KILLED.
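
The three nodes above might appear in a definition roughly as follows. This is a sketch under assumptions: the node names, the `outputDir` job property and the schema version are made up for illustration. The kill message uses the workflow EL functions `wf:lastErrorNode()` and `wf:errorMessage()` to report which action failed and why.

```xml
<!-- Sketch of start, end and kill nodes around a single action.
     Node names, outputDir and the schema version are assumptions. -->
<workflow-app name="start-end-kill-wf" xmlns="uri:oozie:workflow:0.5">
    <!-- start: entry point of the job, exactly one per workflow -->
    <start to="cleanup"/>

    <action name="cleanup">
        <fs>
            <delete path="${outputDir}"/>
        </fs>
        <ok to="done"/>      <!-- success path leads to the end node -->
        <error to="abort"/>  <!-- failure path leads to the kill node -->
    </action>

    <!-- kill: the job finishes in error with status KILLED -->
    <kill name="abort">
        <message>Cleanup failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <!-- end: the job finishes with status SUCCEEDED -->
    <end name="done"/>
</workflow-app>
```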

What are the different control flow nodes supported by Apache Oozie workflows that control the workflow execution path?

Apache Oozie workflow supports the following control flow nodes that control the execution path of the workflow.

Decision Control Node - The decision control node is like a switch-case statement, which enables a workflow to make a selection on the execution path to follow.

Fork and Join Control Node - The fork and join control nodes are used in pairs. The fork node splits a single path of execution into multiple concurrent paths of execution. The join node waits until every concurrent execution path of the corresponding fork node arrives at it.
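
A possible shape for these nodes is sketched below: a decision node checks whether an input directory exists, and if so a fork runs two FS actions concurrently before a join brings the paths back together. The node names, the `inputDir`, `outputDirA` and `outputDirB` job properties, and the schema version are assumptions; `fs:exists()` is one of the HDFS EL functions Oozie provides.

```xml
<!-- Sketch of decision, fork and join nodes.
     Names, job properties and the schema version are assumptions. -->
<workflow-app name="branching-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="check-input"/>

    <!-- decision: works like switch-case; the first case that evaluates to true wins -->
    <decision name="check-input">
        <switch>
            <case to="split">${fs:exists(inputDir)}</case>
            <default to="end"/>
        </switch>
    </decision>

    <!-- fork: start two concurrent execution paths -->
    <fork name="split">
        <path start="clean-a"/>
        <path start="clean-b"/>
    </fork>

    <action name="clean-a">
        <fs>
            <delete path="${outputDirA}"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>

    <action name="clean-b">
        <fs>
            <delete path="${outputDirB}"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>

    <!-- join: waits for both paths from the fork, then continues -->
    <join name="merge" to="end"/>

    <kill name="fail">
        <message>Branch failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```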

What are the different Action nodes supported by Apache Oozie workflow?

Apache Oozie supports the following action nodes which trigger the execution of computation and processing tasks.

Map-Reduce Action - The map-reduce action node starts a Hadoop Map-Reduce job from an Oozie workflow.

Pig Action - The pig action node starts a Pig job from an Oozie workflow.

FS (HDFS) Action - The FS action node enables an Oozie workflow to manipulate HDFS files and directories. FS action nodes support the commands move, delete, mkdir, chmod, touchz and chgrp.

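SSH Action - The ssh action node runs a shell command on a remote machine over a secure shell.

Sub-workflow Action - The sub-workflow action node runs a child workflow job from an Oozie workflow. The parent workflow job waits until the child workflow job has completed.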

Java Action - The java action node executes the public static void main(String[] args) method of the specified main Java class from an Oozie workflow.
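
As a hedged example of how action nodes are declared, the sketch below combines an FS action with a Java action. The class name `com.example.ExampleMain`, the `jobTracker`, `nameNode`, `inputDir` and `outputDir` job properties, and the schema version are all placeholders invented for illustration.

```xml
<!-- Sketch of an FS action followed by a Java action.
     The main class, job properties and schema version are assumptions. -->
<workflow-app name="actions-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="prepare"/>

    <!-- FS action: remove old output, then recreate the directory -->
    <action name="prepare">
        <fs>
            <delete path="${outputDir}"/>
            <mkdir path="${outputDir}"/>
        </fs>
        <ok to="run-main"/>
        <error to="fail"/>
    </action>

    <!-- Java action: invokes main(String[] args) of the given class -->
    <action name="run-main">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.example.ExampleMain</main-class>
            <!-- Each arg element becomes one entry in the args array -->
            <arg>${inputDir}</arg>
            <arg>${outputDir}</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Other action types follow the same pattern: the action-specific element (for example `map-reduce` or `pig`) sits inside `action`, followed by the `ok` and `error` transitions.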

Describe the life cycle of an Apache Oozie workflow job.

The Apache Oozie workflow job transitions through the following states.

PREP - An Oozie workflow job is in the PREP state when it is first created. In this state the workflow job is defined but is not running.

RUNNING - An Oozie workflow job transitions to the RUNNING state when it is started. The job remains in the RUNNING state until it reaches its end state, ends in error, or is suspended.

SUSPENDED - An Oozie workflow job transitions to the SUSPENDED state if it is suspended. The job remains in the SUSPENDED state until it is resumed or killed.

SUCCEEDED - A RUNNING Oozie job transitions to the SUCCEEDED state when it reaches the end node.

KILLED - A PREP, RUNNING or SUSPENDED workflow job transitions to the KILLED state when the workflow job is killed, either explicitly (for example, by an administrator) or by reaching a kill node.

FAILED - A RUNNING Oozie job transitions to a FAILED state when the workflow job fails with an unexpected error.
