Self-Distillation Can Crash Math Skills: Here Is The Surprising Reason
Based on research by Jeonghye Kim, Xufang Luo, Minbeom Kim, Sangmook Lee, and Dohyung Kim
Researchers have uncovered a hidden flaw in self-distillation, a widely used AI training technique, that can paradoxically make large language models worse at math. While the method is designed to boost efficiency, it silences the uncertainty-driven exploration needed for solving complex problems.
The researchers found that self-distillation pushes models to become confident too quickly. During reasoning, a healthy model expresses uncertainty as it explores different solution paths; self-distillation suppresses that hesitation in favor of speed. The trade-off is most damaging on tasks the model has never seen: performance on unseen problems dropped by up to 40% because the models stopped adjusting their reasoning style dynamically.
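To make the mechanism concrete, here is a minimal, hypothetical sketch of a generic self-distillation step, not the authors' actual training recipe: a student model is pulled toward the output distribution of a frozen copy of itself, which tends to sharpen its token distributions and shrink the entropy that signals hesitation. The model name, temperature, and learning rate below are illustrative assumptions.

```python
# Minimal, illustrative sketch of a self-distillation step (not the paper's
# training recipe). A student is trained to match the token distribution of a
# frozen copy of itself; repeated rounds tend to sharpen that distribution,
# raising confidence and lowering the entropy that marks "hesitation" tokens.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, chosen only for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
student = AutoModelForCausalLM.from_pretrained(model_name)
teacher = AutoModelForCausalLM.from_pretrained(model_name)  # frozen self-copy
teacher.eval()

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

def self_distillation_step(text, temperature=1.0):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(**inputs).logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(**inputs).logits / temperature, dim=-1)

    # KL divergence from the model's own (teacher) distribution: minimizing it
    # rewards matching the already-confident predictions, so uncertainty in the
    # next-token distribution is gradually squeezed out.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

    # Mean per-token entropy, a rough proxy for the "hesitation" the paper
    # argues is needed to explore alternative reasoning paths.
    entropy = -(teacher_probs * torch.log(teacher_probs + 1e-9)).sum(-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), entropy.item()

# Example usage with a short reasoning prompt:
# loss, ent = self_distillation_step("What is 17 * 24? Let's think step by step.")
```

In this sketch, tracking the entropy value across training rounds is one simple way to observe the effect described above: as the loss falls, the entropy of the next-token distribution typically falls with it.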
The takeaway is clear: efficiency cannot come at the cost of genuine intellectual flexibility, and preserving a model's ability to admit confusion is vital for robust reasoning.
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? by Jeonghye Kim et al., https://arxiv.org/abs/2603.24472