Shivani Sharma — June 14, 2021

This article was published as a part of the Data Science Blogathon

Introduction

This article aims to give you a working knowledge of one of the most important types of machine learning: reinforcement learning. Reinforcement Learning is based on a self-learning mechanism, i.e., it does not need a pre-collected labeled dataset to train on. So you might be wondering how a machine can learn without such data. The answer is “FEEDBACK”.

In Reinforcement Learning, the machine learns from interactive feedback. It performs an action and receives feedback (a reward) for it: if the feedback is positive, the machine continues that behavior, and if the feedback is negative, it changes its behavior. It is sometimes also called experience-based learning. The major applications of RL are in game playing and robotics, where decision-making is sequential and the goal is long-term. The primary goal is to collect the maximum positive reward, or equivalently to reduce the punishment received.

EXAMPLE: To understand this better, let’s take a gaming example. Consider a 3 × 3 grid containing fire and a diamond. The diamond is a positive reward for the agent, while the fire is a negative one. The goal of the agent is to move through the grid along a path that reaches the diamond. Every time the agent runs into the fire, it has to change its path to get a positive reward.

[Figure: Reinforcement learning illustration. Image source: GeeksforGeeks]

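A minimal interaction loop, here using the CartPole-v1 environment from OpenAI Gym with random actions, looks like the following:

import gym

# create the environment
env = gym.make("CartPole-v1")
# reset the environment before starting
env.reset()
# loop 10 times
for i in range(10):
    # take a random action (the values returned by step() are ignored here)
    env.step(env.action_space.sample())
    # render the game (newer Gym releases expect render_mode="human" in gym.make instead)
    env.render()
# close the environment
env.close()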

Terminologies used in Reinforcement Learning

1. Agent: The entity that acts upon the environment; in other words, the learner or machine that collects rewards.

2. Environment: The setting in which the agent learns and collects rewards. It can also be the situation or problem to which the algorithm is applied.

3. Action: A step taken by the agent to carry out a task.

4. Reward: The most important element of RL. It is the feedback that describes how well the agent is performing. A reward can be positive or negative, and it is generally a scalar value.

5. Policy: The function or control strategy behind the agent’s decision-making. It is a mapping from situations (states) to actions.
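To make these terms concrete, here is a minimal sketch of the 3 × 3 fire-and-diamond grid from the introduction in plain Python. The grid layout, the reward values (+1, -1, 0), and the names step, random_policy and run_episode are assumptions made for illustration; the article does not specify them.

```python
import random

# Hypothetical 3 x 3 layout: "S" start, "F" fire, "D" diamond, "." empty.
GRID = [
    ["S", ".", "."],
    [".", "F", "."],
    [".", ".", "D"],
]
# Actions the agent can take, as (row change, column change).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Environment: apply an action and return (next_state, reward, done)."""
    d_row, d_col = ACTIONS[action]
    # Clamp the move so the agent stays inside the grid.
    row = min(max(state[0] + d_row, 0), 2)
    col = min(max(state[1] + d_col, 0), 2)
    cell = GRID[row][col]
    if cell == "D":
        return (row, col), 1, True    # positive reward: diamond reached
    if cell == "F":
        return (row, col), -1, True   # negative reward: stepped into the fire
    return (row, col), 0, False       # neutral move, the episode continues


def random_policy(state):
    """Policy: maps a state to an action (this one ignores the state)."""
    return random.choice(list(ACTIONS))


def run_episode(policy, max_steps=20):
    """Agent: follow the policy from the start cell and sum the rewards."""
    state, total_reward = (0, 0), 0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total_reward += reward
        if done:
            break
    return total_reward


print(run_episode(random_policy))
```

A random policy sometimes reaches the diamond and sometimes the fire; a learning algorithm would replace random_policy with a policy that improves from the rewards it observes.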

Types of Reinforcement learning

There are mainly two types of reinforcement learning:

  • Positive Reinforcement Learning

  • Negative Reinforcement Learning

Positive Reinforcement Learning:

Positive reinforcement learning strengthens a behavior by providing positive rewards for it. It adds something that makes the expected behavior more likely to occur again. The behavior of the agent is positively impacted, and the strength of the behavior increases.

The effects of this type of reinforcement can last for a long time. However, too many positive rewards can cause the agent to stop learning effectively, which is undesirable.

Negative Reinforcement Learning:

Negative reinforcement learning is the opposite of positive reinforcement: the agent is expected to change its behavior in order to avoid a negative condition. The feedback takes the form of poor rewards or punishment. Whenever the agent is punished, it has to change its behavior, and this is how it learns what to avoid.

Depending on the situation and the behavior, it can be more effective than positive reinforcement learning.
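As a rough illustration of how positive and negative feedback push an agent in opposite directions, the sketch below keeps a value estimate per action and nudges it toward each reward received. The update rule (a simple incremental average), the learning rate, and the action names are assumptions for illustration, not something the article prescribes.

```python
def update_value(value, reward, learning_rate=0.5):
    """Move the estimated value of an action toward the reward just received."""
    return value + learning_rate * (reward - value)


# Hypothetical value estimates for two actions in the same situation.
values = {"toward_diamond": 0.0, "toward_fire": 0.0}

# Positive feedback raises the estimate, negative feedback lowers it.
values["toward_diamond"] = update_value(values["toward_diamond"], reward=1)
values["toward_fire"] = update_value(values["toward_fire"], reward=-1)

# The agent then prefers the action with the highest estimate.
best_action = max(values, key=values.get)
print(values)
print(best_action)
```

After one update each, the rewarded action has a higher estimate than the punished one, so the agent repeats the former and avoids the latter.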

Reinforcement Learning Algorithms

Reinforcement learning algorithms are widely used in AI and gaming applications. The most common algorithms are:

  • Q-Learning:

It is a commonly used model-free RL algorithm. Q-learning is an off-policy RL method: the agent learns the value function based on the action a* derived from another policy, not necessarily the one it is currently following. It is a temporal difference learning method; temporal difference methods learn by comparing temporally successive predictions.

It learns the value function Q(s, a), which estimates how good it is to take action “a” in a particular state “s”. The diagram below explains the working of Q-learning:

[Figure: working of Q-learning]
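A minimal sketch of tabular Q-learning may help here. It uses the standard update Q(s, a) ← Q(s, a) + α [r + γ max Q(s’, a’) − Q(s, a)]. The learning rate α, discount factor γ, exploration rate ε, the four-cell corridor environment, and all function names are assumptions introduced for illustration.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # assumed hyperparameters
ACTIONS = ["left", "right"]


def step(state, action):
    """Hypothetical corridor of cells 0..3; the reward sits at cell 3."""
    next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state):
    """Epsilon-greedy behaviour policy: mostly greedy, sometimes random."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=500, seed=0):
    random.seed(seed)
    q = defaultdict(float)   # Q(s, a) table, every entry starts at 0.0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state)
            next_state, reward, done = step(state, action)
            # Off-policy target: value of the best next action,
            # regardless of which action is actually taken next.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(3)})
```

With these settings the greedy action learned for each non-terminal cell should be "right", since that is the only direction that leads to the reward.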

  • State Action Reward State Action (SARSA):

    • SARSA stands for State Action Reward State Action. It is another commonly used model-free RL algorithm, and it is an on-policy temporal difference learning method. In on-policy control methods, the action for each state during learning is selected by a specific policy, and the value estimates are learned from the action the agent actually takes under that current policy.

    • SARSA is named after the quintuple (s, a, r, s’, a’) that it uses to update Q(s, a), where
      s: original state
      a: original action
      r: reward observed after taking action a in state s
      s’, a’: new state-action pair
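The quintuple above maps directly onto code. The following is a minimal sketch of tabular SARSA, assuming the standard update Q(s, a) ← Q(s, a) + α [r + γ Q(s’, a’) − Q(s, a)]. The hyperparameters, the small corridor environment, and the function names are assumptions for illustration only.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # assumed hyperparameters
ACTIONS = ["left", "right"]


def step(state, action):
    """Hypothetical corridor of cells 0..3; the reward sits at cell 3."""
    next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state):
    """Epsilon-greedy policy, used both to act and to learn (on-policy)."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])


def sarsa(episodes=500, seed=0):
    random.seed(seed)
    q = defaultdict(float)   # Q(s, a) table, every entry starts at 0.0
    for _ in range(episodes):
        s = 0
        a = choose_action(q, s)
        done = False
        while not done:
            s2, r, done = step(s, a)          # observe r and s'
            a2 = choose_action(q, s2)         # pick a' with the same policy
            # On-policy target: value of the action a' that will be taken.
            target = r if done else r + GAMMA * q[(s2, a2)]
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s, a = s2, a2                     # (s', a') becomes the next (s, a)
    return q


q = sarsa()
print({s: round(q[(s, "right")], 2) for s in range(3)})
```

The only difference from Q-learning is the target: SARSA uses the value of the action it will actually take next, while Q-learning uses the maximum over all next actions.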

Difference between Reinforcement Learning and Supervised Learning

Reinforcement Learning and Supervised Learning are both parts of machine learning, but the two are very different, like the north and south poles. Supervised learning algorithms learn from a labeled dataset and predict the output based on that training, whereas RL algorithms learn from feedback or experience. In RL, the agent performs an action in the environment and receives a reward for it.

The differences between RL and Supervised Learning are shown below:

Reinforcement Learning | Supervised Learning
-----------------------|--------------------
RL works through the interaction of the agent with the environment. | Supervised Learning works on an existing dataset.
The RL algorithm works more like the human brain does when making decisions. | Supervised Learning works the way a human learns under the supervision of a guide.
No labeled dataset is present. | A labeled dataset is present.
No prior training is given to the learning agent. | Training is provided to the algorithm beforehand so that it can predict the output correctly.
RL helps with sequential decision-making. | Decisions are based entirely on the given input.


From the above discussion, it can be concluded that Reinforcement Learning is one of the most interesting and useful types of machine learning. In RL, the agent explores the environment without any human intervention. It is one of the main learning approaches used in Artificial Intelligence. However, it is not always the best choice: when enough data is available to solve the problem, other ML algorithms can be used more efficiently.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
