WorldCompass: Reinforcement Learning for Long-Horizon World Models

WorldCompass is a reinforcement learning (RL) framework for improving long-horizon, interactive video-based world models. Its main components are a clip-level rollout strategy for more efficient rollouts, two complementary reward functions covering interaction accuracy and visual quality, and a negative-aware fine-tuning method. Evaluations on the WorldPlay model show improvements in interaction accuracy and visual fidelity, which suggests possible applications in interactive media and simulation environments.
WorldCompass Applies Reinforcement Learning to Video-Based World Models
WorldCompass is a post-training framework for long-horizon, interactive video-based world models. It aims to make these models explore environments more accurately by using interaction signals to guide training.
WorldCompass introduces three key innovations:
- Clip-level Rollout Strategy: Generates and evaluates multiple samples at a single target clip, enhancing rollout efficiency.
- Complementary Reward Functions: Employs two distinct reward functions to guide the model, focusing on interaction-following accuracy and visual quality.
- Efficient RL Algorithm: Uses a negative-aware fine-tuning strategy to improve the model's capabilities efficiently.
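The first two ideas in the list can be illustrated with a small toy sketch. It is not the authors' implementation: the function names, the equal reward weights, the group-normalized advantage, and the toy "clips" (lists of per-frame camera displacements) are all assumptions made for illustration. The sketch samples several candidates for one target clip from a shared history and action, scores each with an interaction reward and a quality reward, and then splits the candidates into above-average and below-average groups, which is the kind of signal a negative-aware update could consume.

```python
import random
from statistics import mean, pstdev


def clip_level_rollout(prefix, action, sample_clip, interaction_reward,
                       quality_reward, num_samples=4,
                       w_interaction=0.5, w_quality=0.5):
    """Sample several candidates for ONE target clip and score each of them."""
    # All candidates share the same prefix (history) and the same action,
    # so only the target clip is generated repeatedly.
    candidates = [sample_clip(prefix, action) for _ in range(num_samples)]

    # Weighted sum of the two rewards; the weights are an assumption.
    rewards = [
        w_interaction * interaction_reward(clip, action)
        + w_quality * quality_reward(clip)
        for clip in candidates
    ]

    # Group-normalized advantages (a common RL choice, assumed here).
    mu, sigma = mean(rewards), pstdev(rewards)
    advantages = [(r - mu) / (sigma + 1e-8) for r in rewards]
    return candidates, rewards, advantages


# Hypothetical stand-ins for a real world model and real reward models.
rng = random.Random(0)


def toy_sample_clip(prefix, action):
    # A "clip" is four per-frame camera displacements near the requested action.
    return [action + rng.gauss(0.0, 0.5) for _ in range(4)]


def toy_interaction_reward(clip, action):
    # Higher when the average displacement matches the requested action.
    return -abs(mean(clip) - action)


def toy_quality_reward(clip):
    # Higher when motion is smooth (small frame-to-frame jumps).
    return -mean([abs(b - a) for a, b in zip(clip, clip[1:])])


candidates, rewards, advantages = clip_level_rollout(
    prefix=[0.0, 0.0],
    action=1.0,
    sample_clip=toy_sample_clip,
    interaction_reward=toy_interaction_reward,
    quality_reward=toy_quality_reward,
)

# Above-average samples would be reinforced, below-average ones discouraged.
positives = [c for c, a in zip(candidates, advantages) if a > 0]
negatives = [c for c, a in zip(candidates, advantages) if a <= 0]
```

Because every candidate shares the same history, the cost of generating that history is paid once per group rather than once per sample, which is one plausible reading of the efficiency gain described above.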
Evaluations on the open-source world model WorldPlay show that WorldCompass improves both interaction accuracy and visual fidelity.
📰 Original Source: https://arxiv.org/abs/2602.09022v1
All rights and credit belong to the original publisher.