Scaling Test-time Inference Compute & Advent of Reasoning Models

About

Enabling LLMs to enhance their outputs through increased test-time computation is a crucial step toward building self-improving agents capable of handling open-ended natural language tasks. This session explores how giving a model a fixed but non-trivial amount of additional inference-time compute can improve performance on challenging prompts, a question with significant implications for LLM pretraining strategies and the trade-off between inference-time and pretraining compute.

Reasoning-focused LLMs, particularly open-source ones, now rival closed models, delivering comparable performance with less compute. We’ll explore the mechanisms behind this shift, including Chain-of-Thought (CoT) prompting and reinforcement learning-based reward modeling.

The session will cover the architectures, benchmarks, and performance of next-gen reasoning models through hands-on code walkthroughs. Topics include foundational LLM architectures (pre-training, post-training, and inference), zero-shot CoT prompting (without RL), reward-model-guided search strategies for scaling test-time compute (beam search, Best-of-N, lookahead), and a comparison of fine-tuning strategies: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). Finally, we'll demonstrate how to run and fine-tune models efficiently using the Unsloth.ai framework on limited compute setups.
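
To make the test-time-compute idea concrete ahead of the walkthroughs, here is a minimal, illustrative sketch of Best-of-N sampling with a zero-shot CoT prompt using the Hugging Face transformers library. The model name, prompt, and the toy score function are placeholders (a real setup would score candidates with a learned reward model or verifier); this is not the session's actual code.

```python
# Illustrative Best-of-N sampling: spend extra inference-time compute by
# drawing several chain-of-thought completions and keeping the best-scored one.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder small model

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Zero-shot CoT prompt: "Let's think step by step" elicits intermediate reasoning.
prompt = (
    "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "A: Let's think step by step."
)
inputs = tok(prompt, return_tensors="pt")

# More test-time compute = more sampled candidates (N = 8 here).
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    max_new_tokens=128,
    num_return_sequences=8,
    pad_token_id=tok.eos_token_id,
)
prompt_len = inputs["input_ids"].shape[1]
candidates = [
    tok.decode(seq[prompt_len:], skip_special_tokens=True) for seq in outputs
]

def score(answer: str) -> float:
    # Stand-in for a learned reward model / verifier: here we just check for
    # the correct number (80 km/h); in practice an outcome- or process-reward
    # model would score each candidate completion.
    return 1.0 if "80" in answer else 0.0

best = max(candidates, key=score)
print(best)
```

The same loop generalizes to the other search strategies mentioned above: beam search and lookahead simply reuse the scorer at intermediate steps rather than only on finished answers.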

Key Takeaways:

  • Smarter AI with More Thinking Time: When we let AI models spend a bit more time thinking (using more compute during response generation), they can come up with better answers, especially for tough questions. This idea is like giving someone extra time on a test to think through a tricky problem.
  • Where to Invest Computing Power: AI systems need a lot of computing power. This talk explores whether it’s better to spend that power while training the AI or when it's actually answering questions. The answer can change how we build and use AI in the future.
  • Rise of Thinking AIs: Modern AI models are starting to “reason” more like humans, breaking problems into steps instead of just guessing. Open-source models (free and accessible to all) are now competing strongly with big, private AI systems by doing more with less.
  • Learning from Feedback: Just like people learn from rewards and consequences, some AI models use a technique called reward modeling to learn how to give better answers based on what we want. This is a big part of how reasoning in AI is improving.
