Step-resolved data attribution for looped transformers

Researchers have developed Step-Decomposed Influence (SDI), a method for analyzing how individual training examples affect looped transformers across their recurrent computation. Where existing methods return a single influence score, SDI produces a detailed influence trajectory over the loop iterations. A TensorSketch-based implementation avoids materializing per-example gradients, keeping the method scalable to transformer models. Experiments show that SDI aligns closely with traditional full-gradient methods while improving data attribution and interpretability on algorithmic reasoning tasks.
New Method Enhances Data Attribution in Looped Transformers
Researchers have developed a novel approach, Step-Decomposed Influence (SDI), to improve the understanding of how individual training examples shape computation within looped transformers. This addresses a significant limitation of existing methods, which provide only a single scalar score aggregating influence across all iterations, obscuring when an example becomes relevant.
SDI decomposes the influence attributed by existing estimators such as TracIn into a detailed influence trajectory spanning the recurrent iterations. By unrolling the recurrent computation graph, the method attributes influence to specific loop iterations, offering a clearer picture of the step-by-step reasoning inside the model.
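The paper's exact decomposition is not reproduced here, but the idea of unrolling a shared-weight loop and splitting the gradient into per-iteration contributions can be sketched. Below is a minimal NumPy illustration of one natural variant: the looped block is a toy `h_{k+1} = tanh(W h_k)` (an assumption, not the paper's architecture), and each iteration's share of a TracIn-style influence score is the dot product between that iteration's gradient contribution for a training example and the test example's total gradient. Because the full gradient of a shared weight is the sum of its per-step contributions, the trajectory sums exactly to the aggregate score.

```python
import numpy as np

def forward(W, h0, K):
    """Unroll a looped block h_{k+1} = tanh(W h_k) for K iterations."""
    hs = [h0]
    for _ in range(K):
        hs.append(np.tanh(W @ hs[-1]))
    return hs

def per_step_grads(W, h0, y, K):
    """Backprop through the unrolled loop, keeping each iteration's
    contribution to dL/dW separate (W is shared across steps)."""
    hs = forward(W, h0, K)
    dL_dh = hs[-1] - y                            # loss = 0.5 * ||h_K - y||^2
    grads = [None] * K
    for k in range(K - 1, -1, -1):
        dL_da = dL_dh * (1.0 - hs[k + 1] ** 2)    # tanh' at step k
        grads[k] = np.outer(dL_da, hs[k])         # step k's share of dL/dW
        dL_dh = W.T @ dL_da                       # propagate to earlier steps
    return grads                                  # sum(grads) == full gradient

rng = np.random.default_rng(0)
d, K = 4, 3
W = rng.normal(scale=0.5, size=(d, d))

g_train = per_step_grads(W, rng.normal(size=d), rng.normal(size=d), K)
g_test = per_step_grads(W, rng.normal(size=d), rng.normal(size=d), K)
g_test_total = sum(g_test)

# Influence trajectory: iteration k's share of the TracIn-style dot product.
trajectory = [float(np.sum(gk * g_test_total)) for gk in g_train]
total = float(np.sum(sum(g_train) * g_test_total))
assert np.isclose(sum(trajectory), total)         # decomposition is exact
```

The trajectory reveals which iteration a training example matters for, whereas the scalar `total` is all an aggregate estimator would report.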
Experimental Validation
Extensive experiments were conducted with looped GPT-style models on a range of algorithmic reasoning tasks. The results indicate that SDI scales effectively and aligns closely with full-gradient baselines, with low approximation error relative to the full-gradient scores. This performance demonstrates SDI's potential as a reliable tool for data attribution and interpretability in machine learning.
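The scalability claim rests on comparing sketched gradients rather than materializing full per-example gradients. A minimal illustration of why this preserves influence scores, using CountSketch (the hashing primitive TensorSketch builds on): sketched dot products are unbiased estimates of the originals. The dimensions and seed below are arbitrary choices for the demo, not values from the paper.

```python
import numpy as np

def count_sketch(vec, hashes, signs, m):
    """Project vec into m buckets: sketch[h(i)] += s(i) * vec[i].
    Dot products of sketches estimate dot products of the inputs."""
    sk = np.zeros(m)
    np.add.at(sk, hashes, signs * vec)   # unbuffered scatter-add
    return sk

rng = np.random.default_rng(1)
d, m = 20_000, 4_096                     # gradient dim -> sketch dim
hashes = rng.integers(0, m, size=d)      # one shared random hash function
signs = rng.choice([-1.0, 1.0], size=d)  # one shared random sign function

g1, g2 = rng.normal(size=d), rng.normal(size=d)  # stand-ins for two gradients
exact = float(g1 @ g2)
approx = float(count_sketch(g1, hashes, signs, m)
               @ count_sketch(g2, hashes, signs, m))

# Error measured relative to the gradient norms, not the (possibly small) dot.
rel_err = abs(approx - exact) / (np.linalg.norm(g1) * np.linalg.norm(g2))
```

Storing a 4,096-dimensional sketch per checkpoint instead of a 20,000-dimensional gradient per example is what makes trajectory-style attribution tractable at transformer scale.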
📰 Original Source: https://arxiv.org/abs/2602.10097v1
All rights and credit belong to the original publisher.