Learning on the Manifold: Unlocking Standard Diffusion Transformers with Representation Encoders

A new approach called Riemannian Flow Matching with Jacobi Regularization (RJF) addresses convergence issues that arise when diffusion transformers generate high-fidelity outputs from representation encoders. By constraining generation to manifold geodesics and correcting curvature-induced errors, RJF enables the DiT-B architecture (131M parameters) to achieve an FID of 3.37, outperforming previous methods. Code is available on GitHub.
Unlocking Standard Diffusion Transformers with Riemannian Flow Matching
A new approach, Riemannian Flow Matching with Jacobi Regularization (RJF), resolves convergence issues in standard diffusion transformers, improving generation quality without costly architectural modifications.
Previous research attributed these convergence failures to a capacity bottleneck, but this study identifies Geometric Interference as the primary cause: standard flow matching directs probability paths through low-density regions of the space rather than along the manifold surface where the data are concentrated.
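The off-manifold drift described above can be seen in a toy setting. The sketch below assumes the data manifold is a unit hypersphere (an illustrative stand-in, not the paper's actual setting) and shows that the straight-line interpolation used by standard flow matching pulls intermediate points off the manifold:

```python
import numpy as np

def lerp(x0, x1, t):
    """Straight-line (Euclidean) interpolation, as in standard flow matching."""
    return (1 - t) * x0 + t * x1

# Two points on the unit sphere, a toy stand-in for the data manifold.
rng = np.random.default_rng(0)
x0 = rng.normal(size=8); x0 /= np.linalg.norm(x0)
x1 = rng.normal(size=8); x1 /= np.linalg.norm(x1)

# The straight-line path cuts through the interior of the sphere:
# its midpoint has norm < 1, i.e. it leaves the manifold entirely,
# passing through a region where no training data lives.
midpoint = lerp(x0, x1, 0.5)
print(np.linalg.norm(midpoint))
```

In high dimensions the effect is severe: two random points on the sphere are nearly orthogonal, so the chord midpoint sits far inside the ball, in exactly the kind of low-density region the article describes.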
Introducing Riemannian Flow Matching
The RJF method constrains the generative process to follow manifold geodesics, reducing curvature-induced error propagation. This allows the DiT-B architecture, with 131 million parameters, to achieve a Fréchet Inception Distance (FID) of 3.37, marking a significant improvement over previous methods.
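Continuing the toy hypersphere setting, a geodesic path can be sketched with spherical linear interpolation (`slerp` here is illustrative only, not the paper's actual parameterization): every intermediate point stays on the manifold, so the generative trajectory never crosses low-density interior regions.

```python
import numpy as np

def slerp(x0, x1, t):
    """Geodesic (great-circle) interpolation between two unit vectors,
    a toy analogue of the manifold-constrained paths RJF proposes."""
    theta = np.arccos(np.clip(np.dot(x0, x1), -1.0, 1.0))
    return (np.sin((1 - t) * theta) * x0 + np.sin(t * theta) * x1) / np.sin(theta)

rng = np.random.default_rng(1)
x0 = rng.normal(size=8); x0 /= np.linalg.norm(x0)
x1 = rng.normal(size=8); x1 /= np.linalg.norm(x1)

# Every point along the geodesic keeps unit norm, i.e. stays on the manifold.
norms = [np.linalg.norm(slerp(x0, x1, t)) for t in np.linspace(0.1, 0.9, 5)]
print(norms)
```

The contrast with the straight-line path is the whole point: supervising the velocity field along such on-manifold trajectories is what lets a modest backbone like DiT-B avoid wasting capacity on off-manifold regions.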
Implications for Generative Modeling
By keeping probability paths on the data manifold, RJF improves the fidelity of generative outputs without additional model capacity. The research team has made the implementation publicly available on GitHub.
📰 Original Source: https://arxiv.org/abs/2602.10099v1