Reinforcement Learning – Guide for Beginners
This article was published as a part of the Data Science Blogathon
This article aims to give you a working knowledge of reinforcement learning, one of the most important types of machine learning. Reinforcement learning is based on a self-learning mechanism: it does not need labeled data or extra training resources. So how can a machine learn without data? The answer is “FEEDBACK”.
In reinforcement learning, machines learn from interactive feedback. The machine performs some action and receives feedback on it: if the feedback is positive, the machine continues that behavior, and if the feedback is negative, it changes the behavior. This is sometimes also called learning from experience. The major applications of RL are in game playing and robotics, where decision-making is sequential and the goal is long-term. The primary goal of an RL algorithm is to improve performance by collecting the maximum positive reward or by reducing punishment.
EXAMPLE: To understand this better, let’s take a gaming example. Consider a 3 × 3 grid containing fire and a diamond. The diamond is a positive reward for the agent, while the fire represents a negative one. The agent’s goal is to move through the grid along a path that reaches the diamond. Whenever the agent reaches the fire, it has to change its path so it can still earn the positive reward.
A minimal interaction loop with an environment (here, gym’s CartPole) looks like the following:

```python
import gym

# create the environment
env = gym.make("CartPole-v1")
# reset the environment before starting
env.reset()
# loop 10 times
for i in range(10):
    # take a random action
    env.step(env.action_space.sample())
    # render the game
    env.render()
# close the environment
env.close()
```
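The 3 × 3 grid example above can also be sketched in plain Python. The grid positions of the fire and the diamond below are illustrative assumptions (the article does not specify them), as are the reward values of +10 and -10:

```python
import random

# Reward layout for the 3x3 grid (positions and values are assumptions).
REWARDS = {
    (2, 2): 10,   # diamond: positive reward
    (1, 1): -10,  # fire: negative reward
}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Apply an action, clipping moves that would leave the grid."""
    dr, dc = ACTIONS[action]
    row = min(max(state[0] + dr, 0), 2)
    col = min(max(state[1] + dc, 0), 2)
    new_state = (row, col)
    return new_state, REWARDS.get(new_state, 0)

# A random agent wandering the grid, collecting rewards.
random.seed(0)
state, total = (0, 0), 0
for _ in range(20):
    state, reward = step(state, random.choice(list(ACTIONS)))
    total += reward
    if reward > 0:  # reached the diamond
        break
```

A learning agent would use the collected rewards to prefer paths that avoid the fire cell and reach the diamond.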
Terminologies used in Reinforcement Learning
1. Agent - The entity that acts on the environment; the learner or machine that collects rewards.
2. Environment - The world the agent interacts with and learns from; the situation or problem the algorithm is applied to.
3. Action - A step taken by the agent to perform the task.
4. Reward - The most important element of RL. It describes how well the machine is performing. Rewards can be positive or negative, and a reward is generally a scalar value.
5. Policy - The function or control strategy behind the agent’s decision-making. It is a mapping from situations (states) to actions.
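The five terms above can be tied together in a minimal sketch. The toy environment below (a number line with a goal state) and its hand-written policy are illustrative assumptions, not something from the article:

```python
# Environment: a number line where the goal state is 3 (an assumption).
GOAL = 3

def environment_step(state, action):
    """Environment: applies the agent's action and emits a reward."""
    next_state = state + action                  # Action: +1 or -1
    reward = 1 if next_state == GOAL else 0      # Reward: scalar feedback
    return next_state, reward

def policy(state):
    """Policy: a mapping from situations (states) to actions."""
    return 1 if state < GOAL else -1

# Agent: the learner that acts on the environment and collects rewards.
state, total_reward = 0, 0
for _ in range(5):
    action = policy(state)
    state, reward = environment_step(state, action)
    total_reward += reward
```

In a real RL algorithm the policy would not be hand-written; it would be improved over time using the rewards.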
Types of Reinforcement learning
There are mainly two types of reinforcement learning:
Positive Reinforcement Learning
Negative Reinforcement Learning
Positive Reinforcement Learning:
Positive reinforcement learning strengthens decision-making by providing a desirable reward. It adds something to increase the tendency that the expected behavior will occur again and again. The agent’s behavior is positively reinforced, and the strength of that behavior increases.
The effects of this type of reinforcement can last a long time. However, too many positive rewards can cause the machine to stop learning, which is not a good sign.
Negative Reinforcement Learning:
Negative reinforcement learning is the opposite of positive reinforcement: it encourages a change in behavior by avoiding a negative condition. It occurs through poor rewards or punishment. This type of learning can be very powerful, because whenever the agent is punished it has to change its behavior, and in doing so it learns more.
Depending on the situation and the behavior involved, it can be more effective than positive reinforcement learning.
Reinforcement Learning Algorithms
Reinforcement learning algorithms are widely used in AI and gaming applications. The most common ones are:
Q-Learning:
Q-learning is a commonly used model-free RL algorithm. It is an off-policy method: the agent learns the value of the best action independently of the policy it is currently following. It is a temporal difference learning method; temporal difference methods learn by comparing temporally successive predictions.
Q-learning learns the value function Q(s, a), which indicates how good it is to take action “a” in a particular state “s”.
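The core of Q-learning is a single update rule: Q(s, a) ← Q(s, a) + α [r + γ · max Q(s′, ·) − Q(s, a)]. A minimal sketch of that update, with an illustrative learning rate, discount factor, and toy transition (all assumptions, not from the article):

```python
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumptions)

def q_update(Q, s, a, r, s_next, actions):
    """Off-policy update: bootstraps from the best next action,
    regardless of which action the behavior policy actually takes."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + ALPHA * (r + GAMMA * best_next - old)

Q = {}
# One toy transition: taking "right" in state "s0" gave reward 10.
q_update(Q, "s0", "right", 10, "s1", ["left", "right"])
# Q[("s0", "right")] is now 0.5 * 10 = 5.0
```

The `max` over next actions is what makes Q-learning off-policy: it always evaluates the greedy choice, even if the agent explored with a different action.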
State Action Reward State action (SARSA):
SARSA stands for State Action Reward State Action. It is another commonly used model-free RL algorithm and an on-policy temporal difference learning method. In on-policy control methods, the actions taken in each state during learning are selected by a specific policy, and the value being learned is based on the current action derived from that current policy.
SARSA is named after the quintuple Q(s, a, r, s’, a’) that it uses, where:
s: original state
a: original action
r: reward observed while following the states
s’: new state
a’: new action taken in the new state
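The SARSA update mirrors the Q-learning one, but it bootstraps from the action a’ that the policy actually chose rather than from the greedy maximum. A minimal sketch, with an illustrative learning rate, discount factor, and toy transition (all assumptions):

```python
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumptions)

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy update over the quintuple (s, a, r, s', a'): bootstraps
    from the action a' the current policy actually selected in s'."""
    old = Q.get((s, a), 0.0)
    target = r + GAMMA * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = old + ALPHA * (target - old)

# Toy transition: "right" in "s0" gave reward 1; in "s1" the policy chose "left".
Q = {("s1", "left"): 2.0}
sarsa_update(Q, "s0", "right", 1, "s1", "left")
# Q[("s0", "right")] = 0.5 * (1 + 0.9 * 2.0) = 1.4
```

Comparing the two sketches makes the off-policy vs. on-policy distinction concrete: Q-learning takes a `max` over next actions, while SARSA uses the single action a’ the policy picked.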
Difference between Reinforcement Learning and Supervised Learning
Reinforcement learning and supervised learning are both branches of machine learning, but the two are as different as the north pole and the south pole. Supervised learning algorithms learn from a labeled dataset and predict outputs based on that training, whereas RL algorithms learn from feedback and experience: the agent performs actions in the environment and is rewarded for them.
The difference between RL and Supervised learning is shown below :
| Reinforcement Learning | Supervised Learning |
| --- | --- |
| Learns by interacting with an environment and receiving feedback (rewards) | Learns from a labeled training dataset |
| Decision-making is sequential; each action depends on the current state | Decisions are made independently for each input |
| Works without a labeled dataset | Requires labeled data |
From the above discussion, it can be concluded that reinforcement learning is one of the most interesting and useful types of machine learning. In RL, the agent explores the environment without any human intervention, and it is a core learning approach in artificial intelligence. However, it is not the right tool for every case: if we already have enough data to solve the problem, other ML algorithms can be used more efficiently.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.