YARN – Yet Another Resource Negotiator

Ayushi Last Updated : 07 Jan, 2022

5 min read

In today’s world, data is being generated at an ever-growing pace, leading to a boom in demand for Big Data tools such as Hadoop, Pig, Spark, Hive, and many more. The tool that stands out the most is Apache Hadoop, and one of its core components is YARN.

Apache Hadoop YARN, or as it is called Yet Another Resource Negotiator. It is an upgrade to MapReduce present in Hadoop version 1.0 because it is a mighty and efficient resource manager that helps support applications such as HBase, Spark, and Hive. The main idea behind YARN is a layer that is used to split up the resource management layer and processing component layer.

YARN can work parallelly with various applications, bringing greater efficiency while processing the data. YARN is responsible for providing resources such as storage space and also functions as a resource manager.

Before the framework received its official name, it was known as MapReduce2, which is used by YARN to manage and allocate resources for various processes and jobs efficiently.

Why YARN?

In the Hadoop version 1.0, we used MapReduce, which uses the functionalities of job trackers and task trackers for resource management, job scheduling, and monitoring of jobs. Still, after the introduction of YARN, these responsibilities are managed by the resource manager and application master.

Additionally, Hadoop version 2.0 introduces the YARN model, which is more isolated and scalable than its earlier version. YARN was designed to address many of the limitations in MapReduce, which are discussed below:

Scalability

Unlike MapReduce, the resource manager and application master has cluster-wise responsibilities which handle all the jobs and tasks. YARN provides a single application master per job, which is why YARN has high scalability, designed to scale up to 10,000 nodes and 100,000 studies.

Utilization

The YARN framework does not have any fixed slots for tasks that do not cause any issue of resource wastage. It provides a central resource manager which allows users to share multiple applications through a shared resource.

Compatibility

YARN introduces the concept of central resource management. YARN comes as the backwards-compatible framework, which enables Hadoop to support various kinds of applications along with MapReduce.

Reliability

MapReduce is based on a single master and multiple slave architecture, which means the master is down. The slave will not execute their operation, leading to a single failure point in Hadoop version 1.0. In contrast, Hadoop version 2.0 is based on the YARN, which is more reliable as it has the concept of multiple masters and slaves. Hence, if, in any case, the master is unavailable, there is another master that will act as a backup to resume its process and continue its execution.

Components Of YARN

_{Resource Manager}

The Resource Manager has the highest authority as it manages the allocation of resources. It runs many services, including the resource scheduler, which decides how to assign the resources.

Resource Manager contains the metadata regarding the location and number of resources available to the data nodes, collectively known as Rack awareness. The Resource Manager is present on each cluster and can accept processes from the users and allocate resources to them.

The resource manager can be broken down into two sub-parts:

Application Manager

The application manager is responsible for validating the jobs submitted to the resource manager. It verifies if the system has enough resources to run the job and rejects them if it is out of resources. It also ensures that there is no other application with the same ID that has already been submitted that could cause an error. After performing these checks, it finally forwards the application to the scheduler.

Schedulers

The scheduler as the name suggests is responsible for scheduling the tasks. The scheduler neither monitors nor tracks the status of the application nor does restart if there is any failure occurred. There are three types of Schedulers available in YARN: FIFO [First In, First Out] schedulers, capacity schedulers, and Fair schedulers. Out of these clusters to run large jobs executed promptly, it’s better to use Capacity and Fair Schedulers.

_{Node Manager}

The Node Manager works as a slave installed at each node and functions as a monitoring and reporting agent for the Resource Manager. It also transmits the health status of each node to the Resource Manager and offers resources to the cluster.

It is responsible for monitoring resource usage by individual containers and reporting it to the Resource manager. The Node Manager can also kill or destroy the container if it receives a command from the Resource Manager to do so.

It also monitors the usage of resources, performs log Management, and helps in creating container processes and executing them on the request of the application master.

Now we shall discuss the components of Node Manager:

Containers

Containers are a fraction of Node Manager capacity, whose responsibility is to provide physical resources like a disk space on a single node. All of the actual processing takes place inside the container. An application can use a specific amount of memory from the CPU only after permission has been granted by the container.

Application Master

Application master is the framework-specific process that negotiates resources for a single application. It works along with the Node Manager and monitors the execution of the task. Application Master also sends heartbeats to the resource manager which provides a report after the application has started. Application Master requests the container in the node manager by launching CLC (Container Launch Context) which takes care of all the resources required an application needs to execute.

Steps to Submit YARN Application

1. To run an application client connects the resource manager and requests the new application ID. Resource Manager forwards the ID and available resources, depending on various constraints such as queue capacity, access control list, etc.

2. Resource Manager then provides application master, which launches containers to carry out the operations required for the job.

3. Application Master registers to the Resource Manager, which allows AM to get the necessary details about the job, which helps select an optimal number of containers.

4. The resource manager permits app master, which will select appropriate containers required for the job and performs processing of the job.

5. On successful allocation of containers, the application master launches containers by container launch specifications to the Node Manager. The launch specifications include the required information needed by the container that allows communication with the application master.

6. The code running inside the container consists of information such as progress, status, etc., which is forwarded to the application master by application-specific protocol.

7. The client that submits the program communicates directly with the application master to collect the info like status, progress update, etc., with the help of application-specific protocol.

8. Once all the tasks are performed, the AM will free the resources occupied by shutting them down and disconnects itself from the resource manager to get utilized again.

_ENDNOTES

To sum it up, I hope you found the resources helpful to get an overview of YARN and get a basic about its components and understand its architecture. I hope you have understood every concept in this article. If you enjoyed this article, share it with your friends and colleagues! If you have any queries, reach out to me in the comments below. Till then, KEEP LEARNING 🙂

Ayushi

Big data Data Science

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.6

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

Reading list

YARN – Yet Another Resource Negotiator

Why YARN?

Components Of YARN

_{Resource Manager}

_{Node Manager}

Steps to Submit YARN Application

_ENDNOTES

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Become an Author

Flagship Programs

Free Courses

Popular Categories

Generative AI Tools and Techniques

Popular GenAI Models

AI Development Frameworks

Data Science Tools and Techniques

Reading list

Introduction to Deep Learning

Feed Forward Networks

Gradient Descent

Loss Function

Activation Functions

Introduction to Neural networks

Forward and Backward Propagation

Optimizers

Learning Rate Schedulers

NN on Structured Data

Improving the Deep Learning Model

Deep Learning Model Optimization

Unsupervised Deep Learning

AutoDL

Model Deployment

Introduction to PyTorch

YARN – Yet Another Resource Negotiator

Why YARN?

Components Of YARN

Resource Manager

Node Manager

Steps to Submit YARN Application

ENDNOTES

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Become an Author

Flagship Programs

Free Courses

Popular Categories

Generative AI Tools and Techniques

Popular GenAI Models

AI Development Frameworks

Data Science Tools and Techniques

_{Resource Manager}

_{Node Manager}

_ENDNOTES