Post-Training Is Back: From Prompts to Policies

About

This session, "Post-Training Is Back: From Prompts to Policies," explores the resurgence of post-training techniques in the development and alignment of large language models (LLMs). We begin by analyzing the current plateau in prompt engineering, where simple prompt tweaks deliver only short-term, brittle solutions that don’t scale to complex or long-term objectives. The session explains why post-training is regaining importance, driven by the democratization of fine-tuning pipelines and reward-model toolkits. As LLMs are increasingly deployed in critical real-world applications, we need robust, policy-driven alignment methods that can go beyond input tweaking and deliver reliable, safe behavior at scale.

We introduce new paradigms, such as leveraging test-time computation for improved policy learning, and demonstrate how integrating tool use with reinforcement learning (RL) leads to more capable agents. We also detail the challenges in this transition and highlight the opportunities it unlocks for both research and industrial deployment.
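
As a rough illustration of what test-time computation can look like in practice, the sketch below implements best-of-N sampling in Python: the model spends extra compute by proposing several candidate completions, and a scoring function (for example, a learned reward model) picks the best one. The `generate` and `score` callables and the toy stand-ins are assumptions for this sketch, not material from the session.

```python
import random
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # assumed: samples one completion from the LLM
    score: Callable[[str, str], float],  # assumed: reward model scoring (prompt, completion)
    n: int = 8,
) -> str:
    """Spend extra test-time compute: sample n candidates and return the
    one the scoring function ranks highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))


# Toy stand-ins so the sketch runs end to end.
def toy_generate(prompt: str) -> str:
    return f"{prompt} -> answer #{random.randint(0, 100)}"


def toy_score(prompt: str, completion: str) -> float:
    # A real setup would call a learned reward model here.
    return float(len(completion))


if __name__ == "__main__":
    print(best_of_n("What is 2 + 2?", toy_generate, toy_score, n=4))
```

The same selection signal can also feed back into training, which is one way extra test-time compute supports policy learning.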

Attendees will see practical applications such as fine-tuning LLMs to adhere to organization-specific policies, including regulatory compliance in sectors like Indian finance or healthcare. The session will also demonstrate how reward models and verifiable rewards can teach agents complex multi-step tasks, such as support automation or conversational assistants that reason over extensive documents. Furthermore, we will explore integrating external tools, such as calculators, code execution, and web search, with LLMs using RL to enhance capabilities in areas like customer support, education, and data analytics. A live code demo will illustrate how to train an LLM to properly invoke external APIs or tools, such as weather or web search functions, showcasing RL for tool use in action.
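
To give a flavor of what a verifiable reward for tool use might look like, here is a minimal sketch (not the session's demo code): it checks whether a completion contains a well-formed call to a known tool, using a hypothetical `<tool_call>` JSON format and an assumed tool schema (`get_weather`, `web_search`). A reward of this kind is what an RL trainer such as PPO or GRPO would maximize when teaching an LLM to invoke external APIs.

```python
import json
import re

# Hypothetical tool schema; the demo's actual tools and format may differ.
ALLOWED_TOOLS = {"get_weather": {"city"}, "web_search": {"query"}}
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def tool_call_reward(completion: str) -> float:
    """Verifiable reward: 1.0 for a well-formed call to a known tool with
    the expected arguments, partial credit for parseable-but-wrong calls,
    0.0 otherwise."""
    match = TOOL_CALL_RE.search(completion)
    if not match:
        return 0.0
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return 0.1  # emitted the tags but not valid JSON
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in ALLOWED_TOOLS:
        return 0.2  # valid JSON, unknown tool
    if set(args) != ALLOWED_TOOLS[name]:
        return 0.5  # right tool, wrong argument names
    return 1.0


# Example completions a policy might produce during training.
good = '<tool_call>{"name": "get_weather", "arguments": {"city": "Mumbai"}}</tool_call>'
bad = "The weather is probably fine."
print(tool_call_reward(good), tool_call_reward(bad))  # 1.0 0.0
```

The point of a check like this is that format and schema validation give a cheap, automatically verifiable training signal, independent of any human labels.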

Key Takeaways:

  • Understand the limitations of prompt engineering for aligning LLMs.
  • Recognize why post-training (fine-tuning, reward modeling) is critical for robust, scalable alignment.
  • Learn about the new paradigms emerging in test-time computation and their role in post-training.
  • Discover practical methods for combining tool use and RL to build better agents.
  • Explore challenges unique to post-training and policy alignment, especially in the context of critical applications.

