The Rarer the Signal, the Higher the Tax

Hack Session

About the session

Why optimizing for edge cases quietly degrades general intelligence, and how to prevent it
 
Fine-tuning and reinforcement learning have become the default tools for making LLMs safer and more useful. But these methods introduce a hidden cost: optimizing for rare, high-stakes signals often degrades the model’s broader capabilities in ways that are rarely measured before deployment.
 
This talk explores what we call the "alignment tax", and why it grows disproportionately when training signals are sparse. Safety violations, edge-case behaviors, and domain-specific exceptions are often the most critical examples in a dataset, but their rarity creates a structural imbalance: the updates meant to fix them can overwrite the very capabilities that make the model useful.
 
We will unpack the mechanism behind this effect, showing how rare signals produce outsized gradient updates that distort the model’s internal representations and erode its "logic scaffolding", the general reasoning ability underlying tasks like coding, mathematics, and structured problem solving.
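
One simple way to see why rarity can amplify per-example impact is a toy binary classifier trained with cross-entropy. This is an illustrative sketch, not the speaker's analysis. It assumes a model that currently predicts the base rate for every input, and the function name `per_example_grad_magnitudes` is hypothetical. For cross-entropy, the gradient with respect to the logit is `p - y`, so a rare positive example pulls roughly `(1 - p) / p` times harder than a common negative one.

```python
def per_example_grad_magnitudes(rare_fraction):
    """Gradient magnitudes w.r.t. the logit for one rare and one common example.

    Assumption: the model outputs a constant probability equal to the
    base rate of the rare class (p = rare_fraction). For binary
    cross-entropy, d(loss)/d(logit) = p - y.
    """
    p = rare_fraction
    rare_grad = abs(p - 1.0)    # rare example has label y = 1
    common_grad = abs(p - 0.0)  # common example has label y = 0
    return rare_grad, common_grad


def amplification_ratio(rare_fraction):
    """How many times larger the rare example's gradient is: (1 - p) / p."""
    rare_grad, common_grad = per_example_grad_magnitudes(rare_fraction)
    return rare_grad / common_grad


if __name__ == "__main__":
    for frac in (0.5, 0.1, 0.01, 0.001):
        print(f"rare fraction {frac}: per-example ratio {amplification_ratio(frac):.1f}")
```

In this toy setting the ratio grows without bound as the rare class shrinks. Real fine-tuning involves many more factors (optimizer state, loss reweighting, reward scaling), so treat this only as intuition for the direction of the effect.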
 
Key areas we will cover:
 
  • Why rarity amplifies impact: the gradient dynamics that cause small slices of data to disproportionately shape model behavior
  • Three production failure modes: capability forgetting, distribution brittleness, and reward hacking masked by evaluation metrics
  • The logic scaffolding problem: why degrading general reasoning is an early warning sign of deeper system failure
  • A measurement framework: how to detect alignment tax before deployment
  • Practical mitigation strategies: gradient isolation, model averaging, and training-time tradeoff design
  • Cross-domain grounding: systems tuned for rare, high-stakes signals often become brittle in the common case. Examples range from anomaly-sensitive decision pipelines that over-flag legitimate inputs, to compressed detectors in scientific systems that miss the very rare events they seek, to medical models that maintain headline accuracy while drifting under real-world distribution shift.
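
The measurement and model-averaging items in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the framework presented in the talk. It treats the alignment tax as the per-benchmark score drop between a base and a tuned model, and it treats model averaging as plain linear interpolation of weights. All function names, the threshold, and the dictionary layout are hypothetical.

```python
def alignment_tax(base_scores, tuned_scores):
    """Per-benchmark capability drop: base score minus tuned score.

    Positive values mean the tuned model regressed on that benchmark.
    """
    return {name: base_scores[name] - tuned_scores[name] for name in base_scores}


def flag_regressions(tax, threshold):
    """Return the sorted benchmark names whose drop exceeds the threshold."""
    return sorted(name for name, drop in tax.items() if drop > threshold)


def interpolate_weights(base, tuned, alpha):
    """Linear weight averaging: (1 - alpha) * base + alpha * tuned.

    Assumption: weights are stored as {parameter_name: list of floats}
    with identical keys and shapes in both models.
    """
    return {
        key: [(1 - alpha) * b + alpha * t for b, t in zip(base[key], tuned[key])]
        for key in base
    }


if __name__ == "__main__":
    # Placeholder scores for illustration only, not measured results.
    base_scores = {"coding": 0.70, "math": 0.60, "safety": 0.50}
    tuned_scores = {"coding": 0.62, "math": 0.59, "safety": 0.90}
    tax = alignment_tax(base_scores, tuned_scores)
    print(flag_regressions(tax, threshold=0.05))
```

A pre-deployment check could run `flag_regressions` over a fixed capability suite after every tuning run, then sweep `alpha` in `interpolate_weights` to look for a blend that keeps the safety gain while shrinking the flagged drops.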

If you are building, fine-tuning, or deploying LLM systems, this talk offers a more precise lens for understanding the tradeoffs you are already making, whether you realize it or not.

Speaker
