Reinforcement Learning for LLM Agents: Training, Fine-Tuning & Deployment

About the Workshop

In this hands-on workshop, participants will learn how reinforcement learning (RL) is used to train large language model–based agents that can make sequential decisions, interact with environments, call tools autonomously, and improve performance through experience. 
 
We will cover RL fundamentals for LLM agents, extend Markov Decision Processes (MDPs) to agent settings, explore modular RL frameworks, and dive into practical implementations using OpenPipe’s Agent Reinforcement Trainer (ART). By the end, attendees will understand how to design, train, and evaluate RL-based LLM agents for real-world tasks. 

*Note: Workshop details are tentative and subject to change.


Prerequisites

  • Familiarity with Large Language Models (LLMs) and Python

  • Basic understanding of Reinforcement Learning concepts (policies, rewards, environments)

  • Prior exposure to agent frameworks is helpful but not required

Workshop Modules

RL Fundamentals for LLM Agents

  • What makes an LLM an agent vs. a predictor 
  • Markov Decision Processes (MDPs) in the context of LLM actions 
  • States, actions, rewards, and environment interactions 
  • Challenges in RL for LLMs (instability, reward design, scaling)
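
The MDP framing above can be sketched as a minimal interaction loop, where the state is the conversation so far and actions are generated messages. All names here (`toy_policy`, `toy_env_step`, the `|`-delimited state encoding) are illustrative stand-ins, not part of any specific framework:

```python
from dataclasses import dataclass

# A minimal MDP-style interaction loop for an LLM agent.
# The "policy" is a stand-in for an LLM call; the state is the
# dialogue history; actions are generated messages/tool calls.

@dataclass
class Transition:
    state: str
    action: str
    reward: float

def toy_policy(state: str) -> str:
    # Placeholder for an LLM: emits a canned action per step count.
    step = state.count("|")
    return f"action_{step}"

def toy_env_step(state: str, action: str) -> tuple[str, float, bool]:
    # Placeholder environment: appends the action to the state and
    # terminates with a terminal reward after 3 steps.
    next_state = state + "|" + action
    done = next_state.count("|") >= 3
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def rollout(initial_state: str) -> list[Transition]:
    state, done, traj = initial_state, False, []
    while not done:
        action = toy_policy(state)
        next_state, reward, done = toy_env_step(state, action)
        traj.append(Transition(state, action, reward))
        state = next_state
    return traj

trajectory = rollout("task: summarize")
```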

RL Algorithms and Frameworks

  • Overview of PPO, GRPO, and policy optimization methods 
  • End-to-end RL workflows for LLM agents 
  • Understanding the Agent-R1 framework and structured RL pipelines 
  • Crafting reward functions for multi-step tasks

Hands-On with OpenPipe ART

  • Overview of the OpenPipe ART architecture 
  • Installation and environment setup 
  • Training loop walkthrough 
  • Experiment tracking with Weights & Biases
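
The shape of such a training loop can be sketched generically. Note this is NOT OpenPipe ART's actual API; `Policy`, `collect_rollouts`, and `score` are hypothetical stand-ins for the model, the rollout sampler, and the reward model:

```python
import random

class Policy:
    def __init__(self):
        self.weight = 0.0  # stand-in for model parameters

    def update(self, scored):
        # Toy "policy update": nudge weight toward high-reward rollouts.
        self.weight += 0.01 * sum(r for _, r in scored) / len(scored)

def collect_rollouts(policy, n=4):
    # Stand-in for sampling agent trajectories from the environment.
    return [f"rollout_{i}" for i in range(n)]

def score(rollout) -> float:
    # Stand-in reward model / task checker (deterministic per rollout).
    random.seed(rollout)
    return random.random()

def train(steps=3):
    policy = Policy()
    history = []
    for _ in range(steps):
        rollouts = collect_rollouts(policy)
        scored = [(r, score(r)) for r in rollouts]
        policy.update(scored)
        history.append(policy.weight)  # log to e.g. W&B in practice
    return history

history = train()
```

In a real run, each `history.append` would instead be a metrics log call to an experiment tracker such as Weights & Biases.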

Designing RL-Based LLM Agents

  • Task and environment design 
  • Reward shaping and policy objectives 
  • Tool use and hierarchical decision making 
  • Case study and implementation walkthrough
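
Tool use usually reduces to a dispatch step: the agent emits a tool name plus arguments, and a registry routes the call. The tool names and the `"tool: argument"` action format below are illustrative assumptions:

```python
# Minimal tool-calling dispatch sketch. In training, the environment
# would execute the call and fold the result back into the state.

TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "echo": lambda text: text,
}

def run_action(action: str) -> str:
    # Expected action format: "tool_name: argument"
    name, _, arg = action.partition(":")
    tool = TOOLS.get(name.strip())
    if tool is None:
        return f"error: unknown tool {name!r}"
    return tool(arg.strip())

result = run_action("calculator: 6 * 7")  # → "42"
```

A hierarchical agent layers this: a high-level policy picks which tool (or sub-policy) to invoke, and lower-level steps fill in the arguments.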

Evaluation, Safety, and Deployment

  • Evaluation metrics (success rate, trajectory efficiency, robustness) 
  • Human-in-the-loop evaluation 
  • Reward hacking and safety risks 
  • Deployment considerations 
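
The first two metrics above can be computed from logged episodes. The `Episode` record and the specific efficiency formula (steps used against a fixed budget) are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Episode:
    success: bool
    steps: int

def success_rate(episodes):
    # Fraction of episodes that reached the goal.
    return sum(e.success for e in episodes) / len(episodes)

def trajectory_efficiency(episodes, step_budget=10):
    # Successful episodes score higher the fewer steps they used;
    # failed episodes score 0.
    scores = [(step_budget - e.steps + 1) / step_budget if e.success else 0.0
              for e in episodes]
    return sum(scores) / len(scores)

episodes = [Episode(True, 3), Episode(True, 5), Episode(False, 10)]
sr = success_rate(episodes)
```

Robustness is typically measured separately, by re-running the same tasks under perturbed prompts or environments and comparing these metrics across conditions.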


Instructor

Workshop Details