
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

Source: arXiv
Original Authors: Yuchen Yan et al.


InftyThink+ is a new reinforcement learning framework designed to improve iterative reasoning in large language models by learning when to summarize intermediate reasoning and how to resume from that summary. Trained with a two-stage process, it improves accuracy on AIME24 by 21% and outperforms conventional long chain-of-thought reinforcement learning while reducing inference latency. It also generalizes better to out-of-distribution benchmarks, making reasoning both more accurate and more efficient.

InftyThink+: A Breakthrough in Infinite-Horizon Reasoning via Reinforcement Learning

A new framework, InftyThink+, has been introduced to enhance infinite-horizon reasoning in large models. This end-to-end reinforcement learning approach optimizes the iterative reasoning process as a whole, improving accuracy while reducing inference latency.
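The article does not describe InftyThink+'s training objective. As an illustration only, the sketch below shows one common way to make reinforcement learning "end-to-end" over a multi-step reasoning trajectory: score only the final answer, normalize that reward within a group of sampled trajectories (a GRPO-style baseline, which is an assumption here, not something the article states), and apply the same advantage to every token the model produced in every iteration. The functions `group_advantages` and `trajectory_loss` are hypothetical names for this sketch.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize outcome rewards within a group of sampled trajectories.

    Assumed GRPO-style baseline: subtract the group mean and divide by the
    group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sd = pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def trajectory_loss(segment_logprobs: List[List[float]], advantage: float) -> float:
    """Policy-gradient surrogate for one multi-iteration trajectory.

    segment_logprobs[i] holds the token log-probabilities produced in
    iteration i (reasoning tokens and summary tokens alike). Every iteration
    shares the trajectory's single advantage, so a summary that helped reach
    a correct final answer is reinforced along with the reasoning around it.
    """
    total_logprob = sum(sum(seg) for seg in segment_logprobs)
    # Minimizing this raises the log-probabilities when advantage > 0.
    return -advantage * total_logprob


# Usage: four sampled trajectories, two with correct final answers (reward 1.0).
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_advantages(rewards)
loss = trajectory_loss([[-0.5, -0.5], [-1.0]], advantages[0])
```

In a real trainer the log-probabilities would be tensors so that an autograd framework can differentiate the surrogate; plain floats are used here only to keep the sketch self-contained.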

InftyThink+ builds on iterative reasoning: the model reasons in segments and summarizes its intermediate thoughts, so later iterations can continue from a compact summary instead of the full reasoning history. Its reinforcement learning framework optimizes the entire multi-iteration trajectory, covering both the model-controlled iteration boundaries (when to stop a segment and summarize) and the explicit summaries themselves.
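To make the mechanism concrete, here is a minimal, framework-free sketch of a reason-summarize-resume loop. The model is abstracted as two hypothetical callables, `reason_segment` (produces one reasoning segment and, if it decides it is finished, a final answer) and `summarize` (compresses progress into a short summary); neither is an API from the paper, and the `[Summary so far]` prompt format is likewise an assumption. The property it illustrates is that each iteration's context holds only the question and the latest summary, while the model itself decides where each segment ends and when to stop.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical model interfaces (not from the paper):
#   reason_segment(prompt) -> (segment_text, final_answer_or_None)
#   summarize(previous_summary, segment_text) -> new_summary
SegmentFn = Callable[[str], Tuple[str, Optional[str]]]
SummarizeFn = Callable[[str, str], str]


@dataclass
class Trajectory:
    segments: List[str] = field(default_factory=list)
    summaries: List[str] = field(default_factory=list)
    answer: Optional[str] = None


def iterative_reason(question: str, reason_segment: SegmentFn,
                     summarize: SummarizeFn, max_iters: int = 8) -> Trajectory:
    traj = Trajectory()
    summary = ""
    for _ in range(max_iters):
        # Context = question + latest summary only; earlier segments are dropped.
        prompt = question if not summary else f"{question}\n[Summary so far]\n{summary}"
        # The model decides where this segment ends and whether it is done.
        segment, answer = reason_segment(prompt)
        traj.segments.append(segment)
        if answer is not None:
            traj.answer = answer
            break
        # Not done: compress progress so the next iteration can resume from it.
        summary = summarize(summary, segment)
        traj.summaries.append(summary)
    return traj


# Toy stand-ins so the loop runs end to end: "progress" is a counter
# carried in the summary, advanced by 3 per segment, finished at 10.
def toy_segment(prompt: str) -> Tuple[str, Optional[str]]:
    done = int(prompt.rsplit("\n", 1)[-1]) if "[Summary so far]" in prompt else 0
    done = min(done + 3, 10)
    return f"worked up to step {done}", (str(done) if done == 10 else None)


def toy_summarize(previous: str, segment: str) -> str:
    return segment.rsplit(" ", 1)[-1]  # keep only the latest step number


traj = iterative_reason("Count to 10", toy_segment, toy_summarize)
print(traj.answer, traj.summaries)  # 10 ['3', '6', '9']
```

In InftyThink+ both callables would be the same language model; reinforcement learning is what teaches it where to place boundaries and what to keep in each summary.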

In experiments with the DeepSeek-R1-Distill-Qwen-1.5B model, InftyThink+ improves accuracy on the AIME24 benchmark by 21%, surpassing conventional long chain-of-thought reinforcement learning. It also generalizes better to out-of-distribution benchmarks and reduces inference latency, so the accuracy gains come with better efficiency rather than at its expense.
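One intuition for the latency reduction, offered as a simplified illustration rather than the paper's analysis, is context length. In a single long chain of thought, every new token attends to all earlier tokens, so total attention work grows roughly quadratically with reasoning length. If reasoning is instead split into segments of at most `segment_len` tokens that each restart from a `summary_len`-token summary, the context never exceeds `summary_len + segment_len`. The toy cost model below (hypothetical function names) counts attended positions under those assumptions; it ignores the prompt, the tokens spent writing summaries, and hardware effects such as KV-cache memory bandwidth, so its numbers are not latency measurements.

```python
def long_cot_attention_cost(total_tokens: int) -> int:
    """Attended positions when every token sees all earlier tokens (and itself)."""
    return total_tokens * (total_tokens + 1) // 2


def iterative_attention_cost(total_tokens: int, segment_len: int, summary_len: int) -> int:
    """Attended positions when reasoning restarts from a summary after each segment.

    The first segment starts from an empty context; every later segment
    starts with summary_len tokens of summary in context.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    cost = 0
    remaining = total_tokens
    prefix = 0
    while remaining > 0:
        n = min(segment_len, remaining)
        # Token j (1-based) of this segment attends to prefix + j positions.
        cost += n * prefix + n * (n + 1) // 2
        remaining -= n
        prefix = summary_len
    return cost


# Illustrative numbers only: 32k reasoning tokens, 8k-token segments, 500-token summaries.
full = long_cot_attention_cost(32_000)                   # 512,016,000
chunked = iterative_attention_cost(32_000, 8_000, 500)   # 140,016,000
print(f"{full / chunked:.2f}x fewer attended positions")  # 3.66x fewer attended positions
```

The gap widens as total reasoning length grows, because the long chain-of-thought cost is quadratic in total length while the segmented cost grows roughly linearly in the number of segments.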

Related Topics:

InftyThink+, reinforcement learning, iterative reasoning, strategic summarization, reasoning efficiency

📰 Original Source: https://arxiv.org/abs/2602.06960v1

All rights and credit belong to the original publisher.
