Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes

Original Author: Jing Tan et al.

January 5, 2026

Talk2Move is a reinforcement learning framework for applying spatial transformations to objects in scenes based on text instructions. It addresses limitations of existing methods by enabling geometric adjustments, such as rotation and resizing, without requiring extensive paired data. By combining Group Relative Policy Optimization (GRPO) with an object-centric spatial reward, Talk2Move improves learning efficiency and the accuracy of object transformations. In the reported experiments it outperforms current text-guided editing techniques, producing interpretable and coherent spatial edits.

Talk2Move: Advancing Object-Level Geometric Transformation Through Reinforcement Learning

A new framework, Talk2Move, uses reinforcement learning to perform text-instructed spatial transformations of objects in scenes. It addresses a weakness of existing multimodal generation systems, which struggle with object-level geometric adjustments such as translation, rotation, and resizing.

Talk2Move employs Group Relative Policy Optimization (GRPO) to explore geometric actions through diverse rollouts generated from input images and lightweight textual variations. The framework also integrates a spatial reward model that aligns geometric transformations with their linguistic descriptions.
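To make the "group relative" part of GRPO concrete, here is a minimal sketch of the standard advantage computation: each rollout's reward is normalized against the mean and standard deviation of its own group. The function name, the reward values, and the choice of population standard deviation are illustrative assumptions. The article does not describe how Talk2Move implements this step.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group (GRPO-style).

    `rewards` holds one scalar reward per rollout generated for the same
    input image and instruction. Names and defaults are hypothetical.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    # eps keeps the division defined when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical spatial rewards for four rollouts of one edit instruction.
advantages = group_relative_advantages([0.2, 0.5, 0.9, 0.4])
```

Rollouts that score above the group mean receive a positive advantage and are reinforced, while those below the mean are discouraged. No separate value network is needed for the baseline.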

Key Features of Talk2Move

Off-Policy Step Evaluation: Enhances learning efficiency by focusing on informative stages of transformation.
Active Step Sampling: Refines outputs based on real-time feedback.
Object-Centric Spatial Rewards: Directly assess behaviors such as displacement, rotation, and scaling.
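As a rough illustration of an object-centric spatial reward, the sketch below scores an edited object's pose against the pose implied by the instruction, combining displacement, rotation, and scaling errors into one value in (0, 1]. Everything here is an assumption made for illustration, not the authors' reward model: the `ObjectPose` fields, the in-plane rotation simplification, the weights, and the exponential form.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectPose:
    """Hypothetical object state; the field choices are assumptions."""
    x: float      # object center, normalized image coordinates
    y: float
    angle: float  # in-plane orientation in degrees (a simplification)
    scale: float  # size relative to the original object (1.0 = unchanged)


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def spatial_reward(edited, target, w_move=1.0, w_rot=1.0, w_scale=1.0):
    """Return a reward in (0, 1]; 1.0 means the edit matches the target pose."""
    move_err = math.hypot(edited.x - target.x, edited.y - target.y)
    rot_err = angle_diff(edited.angle, target.angle) / 180.0
    # Log ratio treats doubling and halving the size as equally large errors.
    scale_err = abs(math.log(edited.scale / target.scale))
    return math.exp(-(w_move * move_err + w_rot * rot_err + w_scale * scale_err))


# Target pose for a hypothetical instruction: "rotate the chair 90 degrees".
target = ObjectPose(x=0.5, y=0.5, angle=90.0, scale=1.0)
close = spatial_reward(ObjectPose(0.5, 0.5, 80.0, 1.0), target)
far = spatial_reward(ObjectPose(0.7, 0.5, 10.0, 1.0), target)
```

In practice the edited object's pose would have to be estimated from the generated image, for example with detection or pose-estimation models. That step is omitted here.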

Experimental results indicate that Talk2Move improves the precision and consistency of object transformations, surpassing existing text-guided editing methods in spatial accuracy while also improving scene coherence.
