Auto-Regressive Masked Diffusion Models

The Auto-Regressive Masked Diffusion (ARMD) model addresses the performance gap between masked diffusion models (MDMs) and autoregressive models (ARMs) by combining the training efficiency of ARMs with the parallel generation capabilities of diffusion models. ARMD employs a causal, permutation-equivariant architecture, enabling efficient autoregressive-style decoding and a new strided parallel generation strategy. This design accelerates inference while preserving coherence, leading to state-of-the-art results on language modeling benchmarks with fewer training steps and narrowing the gap between parallel and sequential decoding methods.
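One reason a causal architecture helps training efficiency is that a single forward pass can score every next-token conditional at once, rather than requiring a separate denoising pass per step. As a minimal illustration (not the paper's model), here is the single-sweep likelihood computation that a causal architecture makes possible; `cond_prob` is a hypothetical stand-in for the network's per-position output:

```python
import math

def causal_log_likelihood(tokens, cond_prob):
    """Sum log p(x_t | x_<t) over a sequence in one sweep.

    `cond_prob(context, token)` is a stand-in for a causal model's
    output: the probability of `token` given the preceding context.
    With a causal architecture, all of these conditionals come out
    of one parallel forward pass.
    """
    return sum(
        math.log(cond_prob(tokens[:t], tokens[t]))
        for t in range(len(tokens))
    )

# Toy check with a uniform "model" over a 4-symbol vocabulary:
# each of the 3 tokens contributes log(1/4).
ll = causal_log_likelihood([1, 2, 3], lambda ctx, x: 0.25)
```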
Auto-Regressive Masked Diffusion Models Revolutionize Language Modeling
Recent work in language modeling has introduced the Auto-Regressive Masked Diffusion (ARMD) model, which merges autoregressive and diffusion-based architectures. This approach improves training efficiency and narrows the performance gap between masked diffusion models and autoregressive models.
Key Innovations of the ARMD Model
- Causal Architecture: Computes the conditional probabilities for all denoising steps in a single parallel forward pass.
- Efficient Decoding: Supports autoregressive-style decoding via a progressive permutation training scheme that accommodates arbitrary token orderings.
- Strided Parallel Generation: Accelerates inference by generating tokens across parallel streams while ensuring global coherence.
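The strided schedule can be pictured as splitting the sequence into interleaved streams and unmasking one offset per pass. The sketch below is a hypothetical reconstruction of that schedule; the paper's actual decoding API is not public, so `predict_tokens` is a stand-in for a model call that fills the given positions in one parallel forward pass:

```python
def strided_generate(predict_tokens, seq_len, stride, mask_id=-1):
    """Fill a fully masked sequence in `stride` parallel passes.

    On pass s, positions s, s + stride, s + 2*stride, ... are
    unmasked simultaneously, each conditioned on every token
    revealed in earlier passes. `predict_tokens(tokens, positions)`
    is a hypothetical model call returning one token per position.
    """
    tokens = [mask_id] * seq_len
    for step in range(stride):
        positions = list(range(step, seq_len, stride))
        # One parallel forward pass fills all streams at this offset.
        predictions = predict_tokens(tokens, positions)
        for pos, tok in zip(positions, predictions):
            tokens[pos] = tok
    return tokens
```

With `stride` parallel streams, a length-`seq_len` sequence is produced in `stride` model calls instead of `seq_len`, which is the source of the inference speedup; coherence depends on each pass conditioning on all previously revealed tokens.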
Empirical evaluations indicate that ARMD sets a new standard in language modeling benchmarks, outstripping established diffusion baselines while requiring significantly fewer training steps.
ARMD's performance enhancements showcase its ability to bridge the gap between parallel and sequential decoding methods, redefining expectations in language model training.
📰 Original Source: https://arxiv.org/abs/2601.16971v1
All rights and credit belong to the original publisher.