Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling

Source: arXiv
Original Author: Gongye Liu et al.

Researchers have introduced DiNa-LRM, a diffusion-native latent reward model that performs preference learning directly on noisy diffusion states. The approach uses a noise-calibrated Thurstone likelihood to make alignment more efficient. DiNa-LRM outperforms existing diffusion-based reward models and is competitive with leading Vision-Language Models, while substantially reducing the time and memory required for model alignment.

New Diffusion-Native Reward Model Rivals Vision-Language Models at Lower Cost

A novel approach to preference optimization in diffusion models, known as DiNa-LRM, delivers substantial gains in computational efficiency over traditional Vision-Language Model (VLM) rewards while remaining competitive in alignment performance. The model formulates preference learning directly on noisy diffusion states.
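To make the idea concrete, the sketch below shows what a reward head operating on noisy latents could look like. It is an illustrative assumption, not the paper's architecture: the names (LatentRewardHead, the 4-channel latent shape, the hidden width, the timestep embedding) are hypothetical; the only commitment is that the model scores a noisy latent z_t conditioned on its timestep t.

```python
# Illustrative sketch of a reward model that scores noisy diffusion
# latents directly. Architecture and names are assumptions for
# exposition, not the DiNa-LRM implementation.
import torch
import torch.nn as nn

class LatentRewardHead(nn.Module):
    """Maps a noisy latent z_t plus its timestep t to a scalar reward."""
    def __init__(self, latent_channels: int = 4, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(latent_channels, hidden, 3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, stride=2, padding=1),
            nn.SiLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.t_embed = nn.Linear(1, hidden)   # simple timestep conditioning
        self.score = nn.Linear(hidden, 1)

    def forward(self, z_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        h = self.encoder(z_t) + self.t_embed(t.float().unsqueeze(-1))
        return self.score(h).squeeze(-1)      # one scalar reward per sample
```

Because such a reward is computed in latent space at an intermediate noise level, no VAE decode or VLM forward pass is needed per comparison, which is a plausible source of the efficiency gains the authors report.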

DiNa-LRM addresses a key limitation of current reward functions that rely on VLMs: their high computational and memory costs. The method introduces a noise-calibrated Thurstone likelihood that accounts for the diffusion noise level when comparing preference pairs.
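The summary does not spell the likelihood out, but a standard Thurstone (probit) pairwise model, with its comparison noise calibrated to the diffusion noise level sigma(t), would take a form like the following hedged reconstruction, where r_theta scores the noisy latents and Phi is the standard normal CDF:

```latex
% Hedged reconstruction, not the paper's exact formula.
P\big(x_A \succ x_B \mid t\big)
  = \Phi\!\left(\frac{r_\theta(z_t^A,\, t) - r_\theta(z_t^B,\, t)}
                     {\sqrt{2}\,\sigma(t)}\right)
```

Training would then minimize the negative log-likelihood over preference pairs; a minimal sketch, reusing the hypothetical reward head above:

```python
# Illustrative noise-calibrated Thurstone loss; function name and exact
# calibration are assumptions, not the paper's implementation.
import math
import torch

def thurstone_nll(r_win: torch.Tensor, r_lose: torch.Tensor,
                  sigma_t: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of the preferred sample winning.
    The probit margin shrinks as the diffusion noise sigma(t) grows, so
    comparisons at very noisy timesteps contribute softer gradients."""
    standard_normal = torch.distributions.Normal(0.0, 1.0)
    margin = (r_win - r_lose) / (math.sqrt(2.0) * sigma_t)
    return -standard_normal.cdf(margin).clamp_min(1e-8).log().mean()
```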

Performance Metrics and Comparisons

In image alignment benchmarks, DiNa-LRM demonstrated substantial improvements over current diffusion-based reward models, achieving performance competitive with state-of-the-art VLMs at a significantly reduced computational cost. This positions DiNa-LRM as a compelling alternative for preference optimization in diffusion models.

Related Topics:

VLM-Based Rewards, DiNa-LRM, diffusion-native latent reward model, preference optimization, computational efficiency

📰 Original Source: https://arxiv.org/abs/2602.11146v1

All rights and credit belong to the original publisher.
