Understanding Reinforcement Learning with Panda Gym
This article was published as a part of the Data Science Blogathon
In this article, we introduce readers to panda-gym, an open-source library that provides reinforcement learning environments for the Franka Emika Panda robot, built on the Gym interface. Gym itself supports Atari game environments, board games, 2D and 3D physics simulations, and more. This allows you to train multiple agents, compare them, or develop new machine learning algorithms for reinforcement learning problems.
Reinforcement Learning and the machine learning environment play an important role in the development of an intelligent machine. The development environment used for machine learning is no less important than the machine learning methods used in solving predictive modelling problems.
In a reinforcement learning task, the environment constitutes the fundamental elements, so it is important to understand the underlying environment with which the RL agent will interact. Understanding the environment helps develop the correct design and teaching method for the agent being taught.
The environment is the agent’s world: the agent interacts with the environment by performing actions, but in doing so it has no power to change the rules or dynamics of the environment, just as people are agents in the terrestrial environment and are limited by its laws.
We can interact with the environment, but we cannot change its laws. The environment also rewards the agent: a scalar return value acts as feedback, informing the agent whether its action was good or bad.
In reinforcement learning, many paradigms exist for arriving at a winning strategy, that is, for getting the agent to perform the desired actions. In difficult settings, computing the precise winning strategy or reward function is hard, especially when agents must learn from interaction rather than from prior experience.
There are several types of learning environments:
A single-agent environment, where exactly one agent exists and interacts with the environment.
A multi-agent environment, where more than one agent interacts with the environment.
A discrete environment, whose action space is discrete.
A continuous environment, whose action space is continuous.
An episodic environment, where the agent’s actions are confined to a specific episode and are not related to previous actions.
A sequential environment, where the agent’s actions are linked to its previous actions.
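To make the discrete/continuous distinction concrete, here is a minimal sketch of sampling from the two kinds of action spaces. This is plain Python for illustration only; the class names are ours, not part of any library (Gym represents these ideas with its own `gym.spaces.Discrete` and `gym.spaces.Box` classes).

```python
import random

class DiscreteSpace:
    """A discrete action space: actions are integers 0..n-1 (e.g. left/right/up/down)."""
    def __init__(self, n):
        self.n = n

    def sample(self):
        return random.randrange(self.n)

class ContinuousSpace:
    """A continuous (box) action space: each action is a real-valued vector."""
    def __init__(self, low, high, dim):
        self.low, self.high, self.dim = low, high, dim

    def sample(self):
        return [random.uniform(self.low, self.high) for _ in range(self.dim)]

discrete = DiscreteSpace(4)                 # e.g. a grid-world agent with 4 moves
continuous = ContinuousSpace(-1.0, 1.0, 3)  # e.g. a 3D end-effector displacement

print(discrete.sample())    # an integer in 0..3
print(continuous.sample())  # a list of 3 floats in [-1, 1]
```

A robot arm like the Panda naturally lives in the continuous case: its actions are real-valued joint or end-effector commands rather than a finite menu of moves.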
What is OpenAI Gym?
For the development and comparison of reinforcement learning algorithms, an open-source toolkit known as Gym is widely used. It is easy to work with because it lets you set up an environment with just a few lines of code, and it is compatible with any numerical computation library, such as TensorFlow or Theano.
The Gym library is a collection of test problems and environments that you can use to teach and develop more effective reinforcement learning models. The presented environments have a common interface that allows you to write generic algorithms. The Gym provides a wide variety of simulation environments.
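The common interface Gym environments share boils down to `reset()` and `step(action)`. The toy environment below is ours, purely for illustration, and is not part of the Gym library; it only mimics the same interface so the generic interaction loop can be seen without installing anything.

```python
import random

class CoinFlipEnv:
    """A toy environment with the Gym-style reset()/step() interface.

    The agent guesses a coin flip (0 or 1) and earns +1 reward for a
    correct guess; an episode lasts a fixed number of steps.
    """
    def __init__(self, episode_length=10):
        self.episode_length = episode_length

    def reset(self):
        self.steps = 0
        self.coin = random.randint(0, 1)
        return self.coin  # initial observation

    def step(self, action):
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = random.randint(0, 1)
        done = self.steps >= self.episode_length
        return self.coin, reward, done, {}  # obs, reward, done, info

# The generic loop below works for any environment with this interface.
env = CoinFlipEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.randint(0, 1)  # a random "policy"
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)
```

Because every Gym environment exposes this same loop, the same training code can drive CartPole, an Atari game, or a panda-gym robot task without modification.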
In the simulation environment, you can perform five tasks: reach, push, slide, pick and place, and stack objects. It follows a multi-goal framework to enable goal-conditioned reinforcement learning algorithms.
The open-source physics engine PyBullet is used to facilitate open research. The implementation chosen for this package makes it easy to define new tasks or even create new robots.
Simulation and its problems
The presented environment consists of a Panda robotic arm from Franka Emika, which is widely used in simulations and real academic work. The robot has 7 degrees of freedom and a parallel-finger gripper. It is modelled using the PyBullet physics engine, which is open source.
The modelling task is to move or grasp objects to a given position; a task is considered complete if the distance between the moved object and the target position is less than 5 cm.
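This success criterion can be written as a short helper. The function name and positions below are ours for illustration; only the 5 cm threshold comes from the tasks' definition (positions are in metres, so 5 cm is 0.05).

```python
import math

def is_success(object_pos, target_pos, threshold=0.05):
    """Return True if the object is within `threshold` metres (5 cm) of the target."""
    distance = math.dist(object_pos, target_pos)  # Euclidean distance (Python 3.8+)
    return distance < threshold

print(is_success((0.10, 0.00, 0.02), (0.12, 0.00, 0.02)))  # 2 cm away  -> True
print(is_success((0.10, 0.00, 0.02), (0.30, 0.00, 0.02)))  # 20 cm away -> False
```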
The difficulty level of the five presented tasks is configurable. In the PandaReach-v1 task, you need to reach the target position with the gripper; the target position is randomly generated in a volume of 30 cm × 30 cm × 30 cm.
In the PandaPush-v1 problem, a cube placed on the table must be pushed to the target position on the table surface with the gripper locked. The target and starting cube positions are randomly generated in a 30 × 30 cm square around the neutral position of the robot.
The task of the PandaSlide-v1 simulation is as follows: with the gripper locked, a flat cylinder must be slid to a target position on the table surface. The target position is randomly generated in a 50 × 50 cm square located 40 cm in front of the robot’s neutral position.
Since the target positions are out of the robot’s reach, the object must be given an impulse rather than simply pushed. In the PandaPickAndPlace-v1 simulation, the goal is to bring the cube to a target position generated in a volume of 30 × 30 × 20 cm above the table. To do so, the cube must be grasped with the gripper fingers.
In the PandaStack-v1 problem, two cubes must be placed in a given position on the table surface. The target position is generated in a 30 × 30 cm square. For a correct stack, the red cube must sit exactly under the green one. All these modelling problems are still being investigated and do not yet have an ideal solution.
Let’s try to run two simulations from the Panda Gym Challenge and understand what is required to develop and set up the environment. Install the library while working in Google Colab:
!pip install panda-gym
We import dependencies:
# importing dependencies
import gym
import panda_gym
Let’s install the environment and simulation:
# assigning the simulation task to the environment
env = gym.make('PandaPickAndPlace-v1')
state = env.reset()  # resetting the environment
done = False
# collecting rendered frames as the agent acts
images = [env.render('rgb_array')]
while not done:
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    images.append(env.render('rgb_array'))
env.close()
To set up your environment, you can run the lines of code below. Hyperparameters can be tuned according to the required performance; here we will run a basic demo simulation. Next, install the numpngw library, a package that defines a write_png function, which writes a NumPy array to a PNG file, and a write_apng function, which writes a sequence of arrays to an animated PNG (APNG) file.
# installing numpngw
!pip3 install numpngw
from numpngw import write_apng
write_apng('anima.png', images, delay=100)  # 100 ms between frames (40 ms would be real time)
Let's display the results:
# rendering the simulation
from IPython.display import Image
Image(filename="anima.png")
We see that the robot is moving the block using a gripper! In addition, two block positions are visible. Although the simulation may not be very clear, it can be further tuned with hyperparameters or, for better rendering performance, run on a more advanced computing system. Now we simulate the sliding gripper problem and extract the slides of its execution:
import gym
import panda_gym

env = gym.make('PandaSlide-v1')
state = env.reset()
done = False
images = [env.render('rgb_array')]
while not done:
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    images.append(env.render('rgb_array'))
env.close()
!pip3 install numpngw
from numpngw import write_apng
write_apng('anima.png', images, delay=70)  # 70 ms between frames (40 ms would be real time)
from IPython.display import Image
Image(filename="anima.png")
Render times and other training and environment settings are configurable. The tool is very handy for testing deep reinforcement learning algorithms. There are some restrictions, however, such as restrictions on controlling the gripper: it can only be controlled through high-level actions, so additional effort is required to implement a training policy. In addition, the simulation is not completely realistic; the main issue is the way the robot grasps objects in the environment.
Through this article, we have seen the essence of a learning environment in the field of reinforcement learning. We also explored the panda-gym tasks and ran basic demo simulations of two of them, rendering the Franka Emika Panda robotic arm. Enjoy your modelling!