Scaling Test-time Inference Compute & Advent of Reasoning Models

About

Enabling LLMs to enhance their outputs through increased test-time computation is a crucial step toward building self-improving agents capable of handling open-ended natural language tasks. This session explores how giving a model a fixed but non-trivial amount of additional inference-time compute can improve performance on challenging prompts, a question with significant implications for LLM pretraining strategies and the trade-off between inference-time and pretraining compute.

Reasoning-focused LLMs, particularly open-source ones, now rival closed models, delivering comparable performance with less compute. We’ll explore the mechanisms behind this shift, including Chain-of-Thought (CoT) prompting and reinforcement learning-based reward modeling.

The session will cover the architectures, benchmarks, and performance of next-gen reasoning models through hands-on code walkthroughs. Topics include foundational LLM architectures (pre-training, post-training, and inference), zero-shot CoT prompting (without RL), reward-model-guided search strategies for scaling test-time compute (beam search, Best-of-N, lookahead), and a comparison of fine-tuning strategies: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). Finally, we'll demonstrate how to run and fine-tune models efficiently using the Unsloth.ai framework on limited compute setups.
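
To make the test-time-compute idea concrete ahead of the walkthroughs, here is a minimal, illustrative sketch of Best-of-N sampling with a zero-shot CoT prompt using the Hugging Face transformers library. The model name, prompt, and the toy score function are placeholders (a real setup would score candidates with a learned reward model or verifier); this is not the session's actual code.

```python
# Illustrative Best-of-N sampling: spend extra inference-time compute by
# drawing several chain-of-thought completions and keeping the best-scored one.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder small model

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Zero-shot CoT prompt: "Let's think step by step" elicits intermediate reasoning.
prompt = (
    "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "A: Let's think step by step."
)
inputs = tok(prompt, return_tensors="pt")

# More test-time compute = more sampled candidates (N = 8 here).
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    max_new_tokens=128,
    num_return_sequences=8,
    pad_token_id=tok.eos_token_id,
)
prompt_len = inputs["input_ids"].shape[1]
candidates = [
    tok.decode(seq[prompt_len:], skip_special_tokens=True) for seq in outputs
]

def score(answer: str) -> float:
    # Stand-in for a learned reward model / verifier: here we just check for
    # the correct number (80 km/h); in practice an outcome- or process-reward
    # model would score each candidate completion.
    return 1.0 if "80" in answer else 0.0

best = max(candidates, key=score)
print(best)
```

The same loop generalizes to the other search strategies mentioned above: beam search and lookahead simply reuse the scorer at intermediate steps rather than only on finished answers.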

Key Takeaways:

  • Smarter AI with More Thinking Time: When we let AI models spend a bit more time thinking (using more compute during response generation), they can come up with better answers, especially for tough questions. This idea is like giving someone extra time on a test to think through a tricky problem.
  • Where to Invest Computing Power: AI systems need a lot of computing power. This talk explores whether it’s better to spend that power while training the AI or when it's actually answering questions. The answer can change how we build and use AI in the future.
  • Rise of Thinking AIs: Modern AI models are starting to “reason” more like humans, breaking problems into steps instead of just guessing. Open-source models (free and accessible to all) are now competing strongly with big, private AI systems by doing more with less.
  • Learning from Feedback: Just like people learn from rewards and consequences, some AI models use a technique called reward modeling to learn how to give better answers based on what we want. This is a big part of how reasoning in AI is improving.
