When it comes to training models, companies usually bet on feeding them more and more data. The assumption is simple: bigger datasets = smarter models.
When DeepSeek was first released, it challenged this assumption and reset expectations for how models are trained. It was followed by a wave of approaches that train on less data, chosen more deliberately. I came across one such research paper, LIMI: Less Is More for Intelligent Agency, and it really got me hooked. It argues that you don’t need thousands of examples to build a powerful AI. In fact, just 78 carefully chosen training samples are enough to outperform models trained on 10,000.
How? By focusing on quality over quantity. Instead of flooding the model with repetitive or shallow examples, LIMI uses rich, real-world scenarios from software development and scientific research. Each sample captures the full arc of problem-solving: planning, tool use, debugging, and collaboration.
The result? A model that doesn’t just “know” things; it does things. And it does them better, faster, and with far less data.
This article explains how LIMI works!
The paper defines agency as an emergent capability where AI systems function as autonomous agents. These agents do not wait for step-by-step instructions. Instead, they plan their own approach, choose and use tools, adapt when things go wrong, and see tasks through to completion.
This contrasts sharply with traditional language models that generate responses but cannot act. Real-world applications, like debugging code, managing research workflows, or operating microservices, require this kind of proactive intelligence.
The shift from “thinking AI” to “working AI” is driven by industry needs. Companies now seek systems that can complete tasks end-to-end, not just answer questions.
For over a decade, AI progress has followed one rule: scale up. Bigger models. More tokens. Larger datasets. And it worked, at least for language understanding. However, recent work in other domains suggests that more data is not always the answer.
But agency is different. You can’t learn to build by reading millions of code snippets. You learn by doing. And doing well requires dense, high-fidelity examples, not just volume.
Think of it like learning to cook. Watching 10,000 cooking videos might teach you vocabulary. But one hands-on session with a chef, where you chop, season, taste, and adjust, teaches you how to cook.
LIMI applies this idea to AI training. Instead of collecting endless logs of tool calls, it curates 78 full “cooking sessions,” each one a complete, successful collaboration between human and AI on a complex task.
The result? The model learns the essence of agency: how to plan, adapt, and deliver.
LIMI’s success rests on three methodological pillars:
First, queries are not generic prompts. They simulate real collaborative tasks in software development (“vibe coding”) and scientific research, gathered both from real practitioner workflows and from high-quality GitHub repositories (more on this below).
Second, for each query the team recorded full interaction trajectories: multi-turn sequences that capture the complete arc of the collaboration, from planning and tool calls to debugging and the final solution.
Third, all 78 training samples come from two domains that represent the bulk of knowledge work: software development and scientific research.
This focus ensures that every training example is dense with agentic signals.
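To make this concrete, here is a minimal sketch in Python of how one query-plus-trajectory sample could be represented. The field names, tool names, and the example task are illustrative assumptions, not the paper’s actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Turn:
    """One step in a multi-turn trajectory: who acted and what happened."""
    role: str                        # "user", "assistant", or "tool"
    content: str                     # plan, reasoning, tool call, or tool output
    tool_name: Optional[str] = None  # set when the assistant invokes a tool

@dataclass
class AgenticSample:
    """One curated sample: a real-world query plus its full successful trajectory."""
    query: str                       # the task description
    domain: str                      # e.g. "vibe_coding" or "research_workflow"
    trajectory: List[Turn] = field(default_factory=list)

# Hypothetical example of a single sample
sample = AgenticSample(
    query="Fix the failing integration test in the payments microservice.",
    domain="vibe_coding",
    trajectory=[
        Turn("assistant", "Plan: reproduce the failure, inspect logs, patch, re-run tests."),
        Turn("assistant", "run_tests(path='tests/integration')", tool_name="shell"),
        Turn("tool", "1 failed: test_refund_flow - KeyError: 'currency'"),
        Turn("assistant", "Patch applied; all tests pass. Summarizing the fix for review."),
    ],
)
```

The point of a record like this is density: a single sample carries planning, tool use, feedback, and resolution, rather than one isolated prompt-response pair.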
The LIMI dataset was built through a meticulous pipeline:
First, real queries came from actual developer and researcher workflows. Synthetic queries were derived from 100 high-star GitHub repositories, filtered for meaningful code changes (documentation-only PRs were excluded).
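As a rough illustration of that filtering step, here is a minimal sketch of a check that keeps only PRs touching real code. The file extensions, directory names, and threshold are assumptions; the paper does not specify its pipeline at this level of detail.

```python
# Keep only PRs that change real code, not just documentation.
DOC_EXTENSIONS = (".md", ".rst", ".txt")
DOC_DIRS = ("docs/", "doc/")

def is_meaningful_pr(changed_files: list[str], min_code_files: int = 1) -> bool:
    """Return True if the PR changes at least `min_code_files` non-documentation files."""
    code_files = [
        path for path in changed_files
        if not path.endswith(DOC_EXTENSIONS) and not path.startswith(DOC_DIRS)
    ]
    return len(code_files) >= min_code_files

# Example: a documentation-only PR is filtered out
print(is_meaningful_pr(["docs/index.md", "README.md"]))             # False
print(is_meaningful_pr(["src/server.py", "tests/test_server.py"]))  # True
```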
Next, four PhD-level annotators reviewed all queries for semantic alignment with real tasks; only the best 78 were selected.
Finally, using the SII CLI environment, a tool-rich interface supporting code execution, file system access, and web search, human annotators collaborated with GPT-5 to complete each task. Every successful trajectory was logged in full.
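To give a feel for what such a tool-rich environment enables, here is a conceptual sketch of an agent-environment loop. The tool registry, the model.next_step interface, and the stopping logic are illustrative assumptions, not the actual SII CLI API.

```python
import subprocess

def run_shell(command: str) -> str:
    """Execute a shell command and return its combined output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

TOOLS = {"shell": run_shell}  # a real environment would also expose file and web tools

def agent_loop(task: str, model, max_turns: int = 20) -> str:
    """Alternate between model reasoning and tool execution until the task is done."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        step = model.next_step(history)          # assumed model interface
        if step["type"] == "final_answer":
            return step["content"]
        tool_output = TOOLS[step["tool"]](step["input"])
        history.append({"role": "tool", "content": tool_output})
    return "Stopped: turn limit reached."
```

Logging the full `history` from loops like this, rather than only the final answer, is what makes each trajectory so information-dense.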
The result is a compact but extremely rich dataset where each sample encapsulates hours of realistic problem-solving.
To test LIMI’s capabilities, the team used AgencyBench, a new benchmark of 10 complex, real-world tasks.
Each task has multiple subtasks, requiring planning, tool use, and iterative refinement.
In addition to AgencyBench, LIMI was tested on generalization benchmarks covering coding and tool use.
LIMI was implemented by fine-tuning GLM-4.5 (355B parameters) on the 78-sample dataset. It was compared against strong baseline models and data-rich variants, including one trained on 10,000 samples.
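For readers who want to picture the training step, here is a hedged sketch of supervised fine-tuning on a tiny trajectory dataset with Hugging Face Transformers. A small open model stands in for GLM-4.5, whose 355B parameters require a distributed setup far beyond a snippet; the dataset file, hyperparameters, and formatting are assumptions, not the paper’s recipe.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "Qwen/Qwen2.5-0.5B"   # small stand-in for the real 355B base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 78 trajectories, each flattened to a single training text (hypothetical file)
dataset = load_dataset("json", data_files="limi_style_trajectories.jsonl")["train"]

def tokenize(example):
    tokens = tokenizer(example["text"], truncation=True, max_length=4096)
    tokens["labels"] = tokens["input_ids"].copy()   # standard causal-LM loss
    return tokens

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="limi-sft",
        num_train_epochs=3,              # tiny dataset, so several passes are cheap
        per_device_train_batch_size=1,
        learning_rate=1e-5,
    ),
    train_dataset=tokenized,
)
trainer.train()
```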
On AgencyBench, LIMI scored 73.5%, far ahead of all competitors.
Even more striking: LIMI outperformed the 10,000-sample model by 53.7 absolute percentage points while using roughly 128 times fewer samples.
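A quick back-of-envelope check, using only the figures quoted in this article, shows where those numbers come from:

```python
limi_score = 73.5                  # LIMI on AgencyBench (%)
gap = 53.7                         # absolute points over the 10,000-sample model
baseline_score = limi_score - gap  # score implied for the data-rich variant

samples_limi, samples_baseline = 78, 10_000
data_ratio = samples_baseline / samples_limi

print(f"10k-sample baseline ~ {baseline_score:.1f}%")   # ~ 19.8%
print(f"Data used: {data_ratio:.0f}x fewer samples")    # ~ 128x
```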
On generalization benchmarks, LIMI averaged 57.2%, beating all baselines and data-rich variants. It achieved top scores on coding (92.1% on HumanEval) and competitive results on tool use (45.6% on TAU2-retail).
The paper also includes detailed case comparisons between LIMI and baseline models; these examples illustrate LIMI’s superior reasoning, tool use, and adaptability.
LIMI establishes the Agency Efficiency Principle:
Machine autonomy emerges not from data abundance but from strategic curation of high-quality agentic demonstrations.
This challenges the industry’s reliance on massive data pipelines. Instead, it suggests that a small number of carefully curated, complete agentic demonstrations can teach a model more than thousands of shallow ones.
For practitioners, this means investing in task design, human-AI collaboration protocols, and trajectory quality, not just data volume.
The LIMI paper delivers a bold message: you don’t need 10,000 examples to teach an AI how to work. You need 78 really good ones. By focusing on high-quality, real-world collaborations, LIMI achieves state-of-the-art agentic performance with a fraction of the data. It proves that agency isn’t about scale. It’s about signal.
As AI moves from chatbots to coworkers, this insight will be crucial. The future belongs not to those who collect the most data, but to those who design the most meaningful learning experiences.
In the age of agentic AI, less isn’t just more. It’s better!