Post-Training Is Back: From Prompts to Policies

About

This session, "Post-Training Is Back: From Prompts to Policies," explores the resurgence of post-training techniques in the development and alignment of large language models (LLMs). We begin by analyzing the current plateau in prompt engineering, where simple prompt tweaks deliver only short-term, brittle solutions that don’t scale to complex or long-term objectives. The session explains why post-training is regaining importance, driven by the democratization of fine-tuning pipelines and reward-model toolkits. As LLMs are increasingly deployed in critical real-world applications, we need robust, policy-driven alignment methods that can go beyond input tweaking and deliver reliable, safe behavior at scale.

We introduce new paradigms, such as leveraging test-time computation for improved policy learning, and demonstrate how integrating tool use with reinforcement learning (RL) leads to more capable agents. We also detail the challenges in this transition and highlight the opportunities it unlocks for both research and industrial deployment.
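
As a rough illustration of what test-time computation can look like in practice, the sketch below implements best-of-N sampling in Python: the model spends extra compute by proposing several candidate completions, and a scoring function (for example, a learned reward model) picks the best one. The `generate` and `score` callables and the toy stand-ins are assumptions for this sketch, not material from the session.

```python
import random
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # assumed: samples one completion from the LLM
    score: Callable[[str, str], float],  # assumed: reward model scoring (prompt, completion)
    n: int = 8,
) -> str:
    """Spend extra test-time compute: sample n candidates and return the
    one the scoring function ranks highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))


# Toy stand-ins so the sketch runs end to end.
def toy_generate(prompt: str) -> str:
    return f"{prompt} -> answer #{random.randint(0, 100)}"


def toy_score(prompt: str, completion: str) -> float:
    # A real setup would call a learned reward model here.
    return float(len(completion))


if __name__ == "__main__":
    print(best_of_n("What is 2 + 2?", toy_generate, toy_score, n=4))
```

The same selection signal can also feed back into training, which is one way extra test-time compute supports policy learning.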

Attendees will see practical applications such as fine-tuning LLMs to adhere to organization-specific policies, including regulatory compliance in sectors like Indian finance or healthcare. The session will also demonstrate how reward models and verifiable rewards can teach agents complex multi-step tasks, such as support automation or conversational assistants that reason over extensive documents. Furthermore, we will explore integrating external tools, such as calculators, code execution, and web search, with LLMs using RL to enhance capabilities in areas like customer support, education, and data analytics. A live code demo will illustrate how to train an LLM to properly invoke external APIs or tools, such as weather or web search functions, showcasing RL for tool use in action.
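
To give a flavor of what a verifiable reward for tool use might look like, here is a minimal sketch (not the session's demo code): it checks whether a completion contains a well-formed call to a known tool, using a hypothetical `<tool_call>` JSON format and an assumed tool schema (`get_weather`, `web_search`). A reward of this kind is what an RL trainer such as PPO or GRPO would maximize when teaching an LLM to invoke external APIs.

```python
import json
import re

# Hypothetical tool schema; the demo's actual tools and format may differ.
ALLOWED_TOOLS = {"get_weather": {"city"}, "web_search": {"query"}}
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def tool_call_reward(completion: str) -> float:
    """Verifiable reward: 1.0 for a well-formed call to a known tool with
    the expected arguments, partial credit for parseable-but-wrong calls,
    0.0 otherwise."""
    match = TOOL_CALL_RE.search(completion)
    if not match:
        return 0.0
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return 0.1  # emitted the tags but not valid JSON
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in ALLOWED_TOOLS:
        return 0.2  # valid JSON, unknown tool
    if set(args) != ALLOWED_TOOLS[name]:
        return 0.5  # right tool, wrong argument names
    return 1.0


# Example completions a policy might produce during training.
good = '<tool_call>{"name": "get_weather", "arguments": {"city": "Mumbai"}}</tool_call>'
bad = "The weather is probably fine."
print(tool_call_reward(good), tool_call_reward(bad))  # 1.0 0.0
```

The point of a check like this is that format and schema validation give a cheap, automatically verifiable training signal, independent of any human labels.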

Key Takeaways:

  • Understand the limitations of prompt engineering for aligning LLMs.
  • Recognize why post-training (fine-tuning, reward modeling) is critical for robust, scalable alignment.
  • Learn about the new paradigms emerging in test-time computation and their role in post-training.
  • Discover practical methods for combining tool use and RL to build better agents.
  • Explore challenges unique to post-training and policy alignment, especially in the context of critical applications.

