
DAWN: Dependency-Aware Fast Inference for Diffusion LLMs

Source: arXiv
Original Author: Lizhuo Luo et al.


The article introduces DAWN, a new method for improving inference speed in diffusion large language models (dLLMs) without sacrificing output quality. DAWN addresses the inefficiencies of conventional parallel decoding by modeling inter-token dependencies, which allows it to unmask tokens more reliably. Experimental results show that DAWN speeds up inference by 1.80x to 8.06x over existing methods while maintaining generation quality. The code is available on GitHub.

New Decoding Method DAWN Enhances Inference Speed for Diffusion LLMs

Researchers have introduced DAWN, a novel decoding technique aimed at optimizing inference speed for diffusion large language models (dLLMs). This method addresses the inefficiencies of existing parallel decoding strategies.

DAWN is a training-free, dependency-aware approach that constructs a dependency graph over token positions to guide decoding. It builds on two key insights:

  • Positions whose dependencies have already been unmasked yield more reliable predictions.
  • Unmasking multiple strongly coupled tokens simultaneously can introduce generation errors.

At each iteration, DAWN selects the most reliable positions to unmask, achieving high parallelism while preserving text quality. Experiments show that DAWN accelerates inference by a factor of 1.80 to 8.06 over existing baselines without compromising output quality. The code for DAWN is publicly available on GitHub.
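The selection step described above can be sketched as a small greedy routine. This is a hypothetical illustration, not the authors' implementation: it assumes each masked position has a confidence score (e.g. the model's top-token probability) and a pairwise coupling matrix capturing dependency strength (e.g. derived from attention weights), and it unmasks confident positions while deferring tokens strongly coupled to ones already chosen in the same step.

```python
import numpy as np

def select_unmask_positions(confidence, coupling,
                            conf_thresh=0.9, couple_thresh=0.5):
    """Greedy, training-free selection of positions to unmask in one step.

    Hypothetical sketch of dependency-aware parallel decoding:
      confidence[i]  -- model confidence for masked position i
      coupling[i, j] -- dependency strength between positions i and j
                        (assumed symmetric, e.g. from attention weights)
    Positions are considered in order of decreasing confidence; a candidate
    is skipped if it is strongly coupled to a position already chosen this
    step, since unmasking strongly coupled tokens together risks
    inconsistent generations.
    """
    order = np.argsort(-confidence)  # most confident first
    chosen = []
    for i in order:
        if confidence[i] < conf_thresh and chosen:
            break  # stop at low confidence, but always unmask at least one
        if any(coupling[i, j] > couple_thresh for j in chosen):
            continue  # defer tokens strongly coupled to already-chosen ones
        chosen.append(int(i))
    return chosen

# Example: positions 0 and 3 are strongly coupled, so only the more
# confident of the two is unmasked in this iteration.
conf = np.array([0.95, 0.40, 0.92, 0.99])
coup = np.zeros((4, 4))
coup[0, 3] = coup[3, 0] = 0.8
print(select_unmask_positions(conf, coup))  # -> [3, 2]
```

In practice the thresholds would be tuned per model, and the coupling estimate is the interesting part; the greedy loop simply enforces the paper's two insights at each decoding step.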

Related Topics:

DAWN, dependency-aware decoding, diffusion large language models, parallel decoding, inference speedup

📰 Original Source: https://arxiv.org/abs/2602.06953v1

All rights and credit belong to the original publisher.
