Bellman Calibration for V-Learning in Offline Reinforcement Learning

This article presents Iterated Bellman Calibration, a model-agnostic method for improving off-policy value predictions in infinite-horizon Markov decision processes. The method uses histogram and isotonic calibration to ensure that states with similar predicted returns are consistent, on average, with their Bellman targets. For off-policy data it employs a doubly robust pseudo-outcome, reducing calibration to a one-dimensional fitted value iteration that can be applied on top of any value estimator. Importantly, it provides finite-sample guarantees without requiring Bellman completeness or realizability, enhancing the reliability of the resulting predictions.
New Method Introduced for Offline Reinforcement Learning Calibration
Researchers have introduced Iterated Bellman Calibration, a novel post-hoc procedure for improving off-policy value predictions in infinite-horizon Markov decision processes. This model-agnostic approach calibrates predicted long-term returns so that states with similar predictions agree, on average, with the Bellman equation under the target policy.
The accompanying analysis offers finite-sample guarantees for both calibration accuracy and predictive performance under relatively weak assumptions. Notably, the approach requires neither Bellman completeness nor realizability, conditions that are often difficult to verify or satisfy in reinforcement learning settings.
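To make the idea concrete, here is a minimal sketch of histogram-style Bellman calibration, not the paper's exact algorithm. It assumes a simplified on-policy setting, so the doubly robust pseudo-outcome reduces to the plain Bellman target `r + gamma * v[s_next]`; the function names, binning scheme, and toy MDP below are illustrative assumptions. Each iteration bins states by their current predicted value and replaces every prediction in a bin with that bin's average Bellman target, a one-dimensional fitted value iteration over the predictions themselves.

```python
def histogram_bellman_calibrate(v, transitions, gamma=0.9, n_bins=2, n_iters=20):
    """Sketch of iterated histogram calibration of value predictions.

    v           : dict mapping state -> initial predicted value (any estimator)
    transitions : list of (s, r, s_next) tuples; assumed sampled under the
                  target policy, so the plain Bellman target stands in for
                  the paper's doubly robust pseudo-outcome (a simplification)
    """
    v = dict(v)
    for _ in range(n_iters):
        # Bin states by their current predicted value.
        lo, hi = min(v.values()), max(v.values())
        width = (hi - lo) / n_bins or 1.0
        bin_of = lambda x: min(max(int((x - lo) / width), 0), n_bins - 1)

        # Accumulate Bellman targets r + gamma * v(s') per bin.
        sums, counts = [0.0] * n_bins, [0] * n_bins
        for s, r, s_next in transitions:
            b = bin_of(v[s])
            sums[b] += r + gamma * v[s_next]
            counts[b] += 1

        # Replace each state's prediction with its bin's average target.
        means = [sums[b] / counts[b] if counts[b] else None for b in range(n_bins)]
        v = {s: (means[bin_of(x)] if means[bin_of(x)] is not None else x)
             for s, x in v.items()}
    return v


# Toy two-state chain: s0 -(r=0)-> s1, s1 -(r=1)-> s1, gamma = 0.5,
# so the true values are V(s0) = 1 and V(s1) = 2.
transitions = [(0, 0.0, 1), (1, 1.0, 1)]
calibrated = histogram_bellman_calibrate({0: 0.0, 1: 1.0}, transitions,
                                         gamma=0.5, n_bins=2, n_iters=20)
```

On this toy chain the calibrated predictions converge toward the true values even though the initial estimates were off, illustrating how repeatedly aligning bin averages with Bellman targets sharpens a black-box estimator.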
📰 Original Source: https://arxiv.org/abs/2512.23694v1
All rights and credit belong to the original publisher.