# Solving real-world problems using Reinforcement Learning

5th August 2023, 9:30 am to 5:30 pm · RENAISSANCE, Race Course Rd, Madhava Nagar Extension

ChatGPT, the revolutionary generative model, uses Reinforcement Learning under the hood. Reinforcement Learning from Human Feedback (RLHF) is a core technique behind ChatGPT and similar systems: it is used to align Large Language Models (LLMs) with human preferences. This success shows that Reinforcement Learning has a lot of potential for solving real-world problems.

In this workshop, you will learn Reinforcement Learning from the basics through advanced topics and see how to apply it to real-world problems. You will also learn about RLHF and its significance for Large Language Models. Whether you're a seasoned AI practitioner or just starting out, this workshop will equip you with the tools and knowledge to tackle real-world challenges using Reinforcement Learning. Join us and discover how Reinforcement Learning can transform the way you approach problem solving!

#### Module 1: Mathematical Prerequisites for Reinforcement Learning

• Markov Decision Processes
• Bellman equation and Dynamic Programming
• Value Iteration
• Policy Iteration
• Hands-on: Jupyter notebook with a simple NumPy-based tutorial (with solutions) on Value Iteration and Policy Iteration
• Introduction to Partially Observable Markov Decision Processes and Games

#### Module 2: Simple Reinforcement Learning

• Temporal difference (TD) learning and Monte Carlo (MC) methods
• RL framework: OpenAI Gym environments
• Exploration vs Exploitation in RL
• Actor-Only, Critic-Only and Actor-Critic Algorithms
• Q-learning
• SARSA
• REINFORCE
• Jupyter notebook tutorial with solutions for TD, MC, Q-learning, SARSA and REINFORCE
• Discussion of online vs. offline RL

#### Module 3: Reinforcement Learning with Function Approximation

• Basic Introduction to Linear Function Approximation
• The deadly triad in deep RL: function approximation, bootstrapping and off-policy learning
• DQN and variants
• OpenAI Spinning Up-based tutorial on DQN with solution
• Stochastic Policy Gradient Theorem
• PPO and variants
• OpenAI Spinning Up-based tutorial on PPO with solution
• Deterministic Policy Gradient Theorem
• DDPG, TD3, SAC
• OpenAI Spinning Up-based tutorial on TD3 with solution

#### Module 4: RLHF for LLMs

• LLM Basics
• Types of human feedback
• Supervised Fine-Tuning (SFT) Basics
• Reward Modeling from Human Feedback
• RL-based LLM fine-tuning with PPO
• RL-based LLM fine-tuning with ILQL
• trlX-based tutorial on fine-tuning GPT-2 with PPO and ILQL*
• Discussion of other open-source RLHF libraries*

Prerequisites:

System requirements and setup:

• Laptop with at least 4-8 GB of RAM
• We will be using a GPU-powered cloud Jupyter notebook for the workshop

Offline setup [optional]:

• GPU good to have!
• Install Python 3.9 or higher
• Install Jupyter Notebook