Using a smarter AI to write training data for a smaller one sounds like a perfect shortcut, yet it often backfires spectacularly. New research reveals that simply copying the output of a powerful model can actually cripple the reasoning skills of the student you are trying to teach. This counterintuitive failure has left many developers wondering why their best resources are making their models worse instead of better.

The core problem lies in a subtle but critical mismatch: the writing style of the teacher model is too different from what the student model naturally produces. When researchers tried fine-tuning a reasoning model using data generated by a much stronger system, the student struggled to adapt because the patterns felt alien. The resulting synthetic sequences were logically sound but stylistically jarring, causing the model to lose its footing during actual use rather than gaining new skills.

To solve this, the team developed a cooperative framework that interleaves teacher and student models to alternately generate style and non-style tokens. Instead of letting the teacher write everything alone, the system alternates between generating high-level reasoning from the expert and filling in the stylistic details with the student's own voice. This hybrid approach ensures the final training data retains the intelligence of the powerful model while sounding exactly like something the student would naturally say.

The results are stark: where standard methods caused performance to plummet by over 10 percent, this cooperative strategy delivered double-digit improvements in code generation tasks. The takeaway is clear for anyone building reasoning models: you cannot just copy-paste from a genius; you must teach the student how to think in their own voice while borrowing their wisdom.