Trust, Don't Trust, or Flip: Robust Preference-Based Reinforcement Learning with Multi-Expert Feedback

TriTrust-PBRL (TTP) is a new framework designed to make preference-based reinforcement learning robust to heterogeneous annotators. Unlike existing methods, TTP jointly learns a reward model and expert-specific trust parameters, allowing it to identify adversarial feedback and invert it rather than discard it. This yields strong robustness on benchmarks such as MetaWorld and DM Control, where TTP outperforms current PBRL approaches and maintains high performance even when much of the feedback is unreliable. The framework requires no expert features beyond an identity index, making it a seamless addition to existing pipelines.
New Framework Enhances Preference-Based Reinforcement Learning Amidst Noisy Feedback
Researchers have introduced TriTrust-PBRL (TTP), a novel framework that improves preference-based reinforcement learning (PBRL) when annotators are heterogeneous. The approach handles feedback from reliable and adversarial sources alike, significantly enhancing the robustness of the learning algorithm.
The TTP framework learns a shared reward model together with one trust parameter per expert, and these parameters evolve during optimization. Each trust parameter can settle into one of three regimes: positive (trust the expert's labels), near zero (ignore them), or negative (flip the stated preference). Negative trust lets the model invert adversarial preferences, extracting useful signal from corrupted data instead of discarding it.
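The mechanism above can be illustrated with a trust-weighted Bradley-Terry preference loss. This is a minimal sketch under our own assumptions (the names `trust_weighted_loss`, `trust`, `expert_id` are hypothetical, and the paper's exact objective may differ): a scalar trust parameter per expert scales the reward margin before the sigmoid, so a negative value flips the effective preference.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def trust_weighted_loss(r_a, r_b, pref, trust, expert_id):
    """Negative log-likelihood of a preference label under a Bradley-Terry
    model whose logit is scaled by the annotator's trust parameter.

    r_a, r_b : predicted returns of the two compared segments
    pref     : 1 if the expert preferred segment A, else 0
    trust    : array of per-expert trust scalars (hypothetical parameterization)

    trust[k] > 0  -> expert k's labels are used as given (trust)
    trust[k] ~ 0  -> expert k's labels barely affect the reward (ignore)
    trust[k] < 0  -> expert k's labels are effectively inverted (flip)
    """
    logit = trust[expert_id] * (r_a - r_b)  # trust scales/flips the margin
    p_a = sigmoid(logit)
    return -(pref * np.log(p_a + 1e-8) + (1 - pref) * np.log(1 - p_a + 1e-8))

# A reliable expert (trust +1) labeling correctly and an adversarial expert
# (trust -1) labeling in reverse incur the same loss, so both carry signal.
trust = np.array([1.0, -1.0])
loss_reliable = trust_weighted_loss(2.0, 1.0, pref=1, trust=trust, expert_id=0)
loss_flipped = trust_weighted_loss(2.0, 1.0, pref=0, trust=trust, expert_id=1)
```

In a full system, `trust` would be optimized by gradient descent alongside the reward model's parameters, so each expert's trust drifts toward the regime that best explains their labels.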
To validate TTP, researchers conducted evaluations across four domains, including manipulation tasks from MetaWorld and locomotion challenges from DM Control. The results highlighted TTP's superior robustness, maintaining performance close to oracle levels in scenarios involving adversarial corruption, while standard PBRL methods exhibited significant failures.
Notably, TTP outperformed existing baselines by learning successfully from mixed pools of expert feedback, and it requires no expert features beyond identity indices, making it easy to integrate into existing PBRL pipelines.
📰 Original Source: https://arxiv.org/abs/2601.18751v1
All rights and credit belong to the original publisher.