HexFormer: Hyperbolic Vision Transformer with Exponential Map Aggregation

Researchers have developed HexFormer, a hyperbolic vision transformer for image classification that employs exponential map aggregation in its attention mechanism. The architecture includes both a fully hyperbolic variant and a hybrid version that pairs a hyperbolic encoder with a Euclidean classification head. Experiments show HexFormer outperforming standard Euclidean models and previous hyperbolic transformers across various datasets, with the hybrid variant achieving the best results. The study also finds that hyperbolic models offer improved gradient stability and reduced sensitivity to training strategies, suggesting practical advantages of hyperbolic geometry for vision tasks.
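To make the geometric primitives concrete, the sketch below implements exponential and logarithmic maps at the origin of the Poincaré ball, the standard way to move features between Euclidean tangent space and a hyperbolic manifold. The Poincaré-ball model, the fixed curvature, and the helper names are illustrative assumptions; HexFormer's exact formulation may differ.

```python
import torch

def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Map a tangent vector at the origin onto the Poincare ball of curvature -c."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def logmap0(x: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Map a point on the Poincare ball back to the tangent space at the origin."""
    sqrt_c = c ** 0.5
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.atanh((sqrt_c * norm).clamp(max=1 - eps)) * x / (sqrt_c * norm)

# Round trip: Euclidean token features mapped onto the ball and recovered.
tokens = torch.randn(4, 16) * 0.1
on_ball = expmap0(tokens)
recovered = logmap0(on_ball)
print(torch.allclose(tokens, recovered, atol=1e-4))  # True
```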
HexFormer: A New Era in Vision Transformers
A recent study introduces HexFormer, a hyperbolic vision transformer designed to improve image classification through the use of hyperbolic geometry. The model incorporates an exponential map aggregation mechanism within its attention framework, which the authors present as an advance over standard Euclidean attention.
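The paper's precise definition of exponential map aggregation is not reproduced here, but one plausible reading is that attention operates on value tokens in the tangent space at the origin: values are pulled down with the logarithmic map, combined by the softmax weights there, and the weighted sum is pushed back onto the manifold with the exponential map. The sketch below illustrates that idea under Poincaré-ball assumptions; the single-head layout and function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def expmap0(v, c=1.0, eps=1e-6):
    n = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(c ** 0.5 * n) * v / (c ** 0.5 * n)

def logmap0(x, c=1.0, eps=1e-6):
    n = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.atanh((c ** 0.5 * n).clamp(max=1 - eps)) * x / (c ** 0.5 * n)

def exp_map_attention(q, k, v, c=1.0):
    """Single-head attention over hyperbolic tokens of shape (tokens, dim)."""
    # Dot-product scores are formed in the tangent space at the origin.
    qt, kt, vt = logmap0(q, c), logmap0(k, c), logmap0(v, c)
    scores = qt @ kt.transpose(-1, -2) / qt.shape[-1] ** 0.5
    weights = F.softmax(scores, dim=-1)
    # Aggregate values in tangent space, then re-project with the exp map.
    return expmap0(weights @ vt, c)

x = expmap0(torch.randn(8, 32) * 0.1)   # toy tokens on the Poincare ball
out = exp_map_attention(x, x, x)
print(out.shape)                        # torch.Size([8, 32])
```

Aggregating in tangent space keeps the attention arithmetic Euclidean while the token representations themselves remain on the curved manifold.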
Performance Enhancements
Extensive experiments conducted across multiple datasets reveal consistent performance improvements for HexFormer over both Euclidean baselines and previous hyperbolic vision transformers. Notably, the hybrid variant has achieved the strongest overall results, underscoring the effectiveness of combining hyperbolic and Euclidean elements in model design.
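The hybrid design is described only at a high level, but a common way to pair a hyperbolic encoder with a Euclidean head is to pull the encoder output back to the tangent space with the logarithmic map and classify with an ordinary linear layer. The sketch below follows that pattern; the HybridClassifier name and the stand-in encoder are hypothetical, not HexFormer's actual modules.

```python
import torch
import torch.nn as nn

def logmap0(x, c=1.0, eps=1e-6):
    n = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.atanh((c ** 0.5 * n).clamp(max=1 - eps)) * x / (c ** 0.5 * n)

class HybridClassifier(nn.Module):
    """Hyperbolic encoder features -> log map -> Euclidean linear head."""
    def __init__(self, encoder: nn.Module, dim: int, num_classes: int, c: float = 1.0):
        super().__init__()
        self.encoder = encoder                    # assumed to emit (batch, dim) features
        self.c = c
        self.head = nn.Linear(dim, num_classes)   # ordinary Euclidean classifier

    def forward(self, x):
        z = self.encoder(x)
        z_euclid = logmap0(z, self.c)             # back to the tangent (Euclidean) space
        return self.head(z_euclid)

# Toy usage with a stand-in encoder; a real hyperbolic encoder would keep its
# outputs inside the ball, which the clamp in logmap0 tolerates here.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
model = HybridClassifier(encoder, dim=64, num_classes=10)
print(model(torch.randn(2, 3, 32, 32)).shape)     # torch.Size([2, 10])
```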
Gradient Stability Analysis
The research also delves into the gradient stability of hyperbolic transformers. Findings indicate that these models maintain more stable gradients and exhibit reduced sensitivity to warmup strategies when compared to their Euclidean counterparts.
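The paper's stability analysis is not reproduced here, but a simple way to probe such a claim in one's own runs is to log the global gradient norm at every optimization step and compare curves with and without learning-rate warmup. The helper below is a generic diagnostic, not HexFormer's evaluation protocol.

```python
import torch

def grad_global_norm(model: torch.nn.Module) -> float:
    """Square root of the summed squared gradient norms over all parameters."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().norm().item() ** 2
    return total ** 0.5

# Tiny demo: one backward pass on a toy model; in practice, record this value
# every step for runs with and without warmup and compare the curves.
model = torch.nn.Linear(10, 2)
model(torch.randn(4, 10)).sum().backward()
print(grad_global_norm(model))
```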
📰 Original Source: https://arxiv.org/abs/2601.19849v1
All rights and credit belong to the original publisher.