
On-Policy Context Distillation for Language Models

Source: arXiv
Original Author: Tianzhu Ye et al.


A new framework called On-Policy Context Distillation (OPCD) enhances language models by training them to internalize knowledge that would otherwise live only in their context window. The method consolidates experiential knowledge and distills system prompts into model weights, improving accuracy on tasks such as mathematical reasoning and text-based games. OPCD also enables knowledge transfer from larger to smaller models, outperforming existing baseline techniques.

On-Policy Context Distillation Framework Introduced for Language Models

A new framework, On-Policy Context Distillation (OPCD), has been proposed to enhance language models by enabling them to internalize in-context knowledge more effectively. The OPCD framework trains a student model using its own generated trajectories while minimizing the reverse Kullback-Leibler divergence against a context-conditioned teacher model. This method has shown promise in experiential knowledge distillation and system prompt distillation.
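The core objective described above can be sketched in a few lines. The snippet below is a minimal, illustrative reconstruction (not the authors' implementation): it computes the reverse Kullback-Leibler divergence between the student's next-token distributions on a trajectory the student itself sampled and the context-conditioned teacher's distributions at the same positions. The function names and the toy three-token vocabulary are assumptions for illustration.

```python
import math

def reverse_kl(student_probs, teacher_probs):
    """Reverse KL divergence D_KL(student || teacher) at one token position.

    Reverse KL is mode-seeking: it heavily penalizes the student for
    placing probability mass where the teacher assigns little.
    """
    return sum(
        s * math.log(s / t)
        for s, t in zip(student_probs, teacher_probs)
        if s > 0.0  # terms with zero student mass contribute nothing
    )

def opcd_loss(student_dists, teacher_dists):
    """Average reverse KL over the positions of a student-sampled
    trajectory -- the 'on-policy' aspect: the student is trained on its
    own generations, scored against the context-conditioned teacher."""
    kls = [reverse_kl(s, t) for s, t in zip(student_dists, teacher_dists)]
    return sum(kls) / len(kls)

# Toy example: two token positions over a hypothetical 3-token vocabulary.
# student_dists: distributions the student produced along its own sample;
# teacher_dists: the teacher's distributions (conditioned on the extra
# context the student does not see) at the same positions.
student_dists = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
teacher_dists = [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]]
loss = opcd_loss(student_dists, teacher_dists)
print(round(loss, 4))
```

In a real training loop the distributions would come from model logits and the loss would be minimized by gradient descent; this sketch only shows the shape of the objective.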

Performance Outcomes

The effectiveness of OPCD has been validated across multiple domains, including:

  • Mathematical reasoning
  • Text-based games
  • Domain-specific tasks

In these applications, OPCD consistently outperformed baseline methods, achieving higher task accuracy and demonstrating better preservation of out-of-distribution capabilities.

Related Topics:

On-Policy Context Distillation, language models, knowledge distillation, Kullback-Leibler divergence, task accuracy

📰 Original Source: https://arxiv.org/abs/2602.12275v1

All rights and credit belong to the original publisher.
