Back to blog

Skip Training To 2x Faster Diffusion LLM Decoding

Based on research by Ligong Han, Hao Wang, Han Gao, Kai Xu, Akash Srivastava

Imagine your AI generating text twice as fast without any extra training. A new method called S2D2 promises exactly that by bypassing the training bottleneck that usually slows down diffusion large language models. The core conflict is a long-standing trade-off: pushing for speed tends to ruin quality, while preserving accuracy drags generation out. Existing solutions struggle to balance the two, forcing a choice between a sluggish but accurate model and a fast one that hallucinates.

S2D2 flips the script by repurposing the existing model as its own editor, creating a self-checking system that decides on the fly when speed is worth the risk. This hybrid approach keeps the diffusion engine drafting tokens in parallel for raw speed, while the same model, run in an autoregressive mode, acts as a local critic that catches errors before they spread. By routing the extra verification computation only where it is needed, the method avoids paying the critic's cost on every token.

The results are surprising but grounded in hard numbers, with gains across different model families and no loss of truthfulness. On standard benchmarks, the technique reaches up to 4.7 times faster generation than traditional autoregressive decoding, surpassing current dynamic baselines. In conservative settings, it boosts accuracy by up to 4.5 points while still delivering a 4.4 times speedup. Ultimately, this shows that high performance and rapid inference do not require expensive retraining, opening the door to real-time applications of block-diffusion models.
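The draft-then-verify loop described above can be sketched in a few lines. This is a toy illustration, not the paper's algorithm: `draft_block` stands in for the diffusion model proposing a whole block in parallel, `ar_next` and `ar_logprob` stand in for the same model run autoregressively as a critic, and the tokens, threshold, and fallback rule are all invented for the example (here the "model" just continues a counting pattern, with one error injected into the draft to exercise the critic).

```python
def draft_block(context, block_size):
    """Toy diffusion draft: proposes the counting pattern, garbled at position 2."""
    out = []
    for i in range(block_size):
        tok = (context[-1] + i + 1) % 10
        if i == 2:
            tok = (tok + 5) % 10  # injected error so the critic has work to do
        out.append(tok)
    return out

def ar_next(context):
    """Toy autoregressive model: continues the counting pattern."""
    return (context[-1] + 1) % 10

def ar_logprob(context, token):
    """Toy critic score: confident only in the pattern's continuation."""
    return 0.0 if token == ar_next(context) else -5.0

def hybrid_decode_block(context, block_size=4, threshold=-1.0):
    """Accept parallel-drafted tokens left to right until the AR critic objects."""
    accepted = []
    for tok in draft_block(context, block_size):
        if ar_logprob(context + accepted, tok) >= threshold:
            accepted.append(tok)  # critic agrees: keep the fast parallel draft
        else:
            accepted.append(ar_next(context + accepted))  # critic overrides
            break  # redraft the rest of the block from the corrected prefix
    return accepted

print(hybrid_decode_block([3]))  # first two drafts accepted, third corrected
```

The speedup comes from the fact that accepted tokens cost only a cheap verification pass, while the slow autoregressive path is invoked only at the point of disagreement.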

Source: arXiv:2603.25702

This post was generated by staik AI based on the academic publication above.