Aashay Sachdeva is a data scientist and a member of the founding team at Sarvam AI, where he specializes in machine learning (ML) and artificial intelligence (AI) solutions. With five years of experience spanning the healthcare, creative, and gaming industries, Aashay has honed his expertise in building real-time ML systems that enhance operational efficiency and drive significant business impact. He currently works as an ML engineer on the models team at Sarvam AI, spearheading the development of a full-stack platform for generative AI using cutting-edge technologies and frameworks.
This session, "Post-Training Is Back: From Prompts to Policies," explores the resurgence of post-training techniques in the development and alignment of large language models (LLMs). We begin by analyzing the current plateau in prompt engineering, where simple prompt tweaks deliver only short-term, brittle solutions that don’t scale to complex or long-term objectives. The session explains why post-training is regaining importance, driven by the democratization of fine-tuning pipelines and reward-model toolkits. As LLMs are increasingly deployed in critical real-world applications, we need robust, policy-driven alignment methods that can go beyond input tweaking and deliver reliable, safe behavior at scale.
We introduce new paradigms such as leveraging test-time computation for improved policy learning and demonstrate how integrating tool use with reinforcement learning (RL) leads to better, more capable agents. We will detail the challenges in this transition and highlight the opportunities it unlocks for both research and industrial deployment.
Attendees will see practical applications such as fine-tuning LLMs to adhere to organization-specific policies, including regulatory compliance in sectors like Indian finance or healthcare. The session will also demonstrate how reward models and verifiable rewards can teach agents complex multi-step tasks, like support automation or conversational assistants that reason over extensive documents. Furthermore, we will explore integrating external tools (such as calculators, code execution, and web search) with LLMs using RL to enhance capabilities in areas like customer support, education, and data analytics. A live code demo will specifically illustrate how to train an LLM to properly invoke external APIs or tools, such as weather or web search functions, showcasing RL for tool use in action.
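The core ingredient in RL for tool use is a verifiable reward: a programmatic check that a completion actually contains a well-formed call to the right tool. The sketch below shows one way such a reward function might look. The `<tool_call>` tag convention, the `get_weather` tool name, and the JSON schema are all assumptions for illustration, not the format of any specific framework or the session's actual demo code.

```python
import json
import re

def tool_call_reward(completion: str, expected_tool: str) -> float:
    """Verifiable reward for tool use: 1.0 if the completion contains a
    well-formed JSON tool call to the expected tool, else 0.0."""
    # Assumed convention: tool calls appear inside <tool_call>...</tool_call>.
    match = re.search(r"<tool_call>(.*?)</tool_call>", completion, re.DOTALL)
    if not match:
        return 0.0
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return 0.0  # malformed JSON earns no reward
    # Reward only calls that name the expected tool and pass an arguments dict.
    if call.get("name") == expected_tool and isinstance(call.get("arguments"), dict):
        return 1.0
    return 0.0

good = ('Let me check. <tool_call>{"name": "get_weather", '
        '"arguments": {"city": "Pune"}}</tool_call>')
bad = "The weather in Pune is probably sunny."

print(tool_call_reward(good, "get_weather"))  # 1.0
print(tool_call_reward(bad, "get_weather"))   # 0.0
```

A policy-gradient algorithm can then optimize the model against this signal, reinforcing completions that invoke tools correctly over ones that guess or hallucinate an answer.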