Provable Robustness in Multimodal Large Language Models via Feature Space Smoothing

A new approach called Feature-space Smoothing (FS) has been proposed to enhance the robustness of multimodal large language models (MLLMs) against adversarial attacks. FS provides a certified lower bound on feature cosine similarity under $\ell_2$-bounded attacks. Adding the Purifier and Smoothness Mapper (PSM) module further improves robustness without retraining. Experiments show that FS-PSM reduces the Attack Success Rate from nearly 90% to about 1%, outperforming traditional adversarial training across a range of MLLMs and tasks.
New Method Enhances Robustness of Multimodal Large Language Models
Research on multimodal large language models (MLLMs) has led to a new technique aimed at countering vulnerabilities to adversarial attacks. The Feature-space Smoothing (FS) method provides certified robustness by ensuring stable feature representations, significantly enhancing models' resistance to perturbations.
The FS method transforms any feature encoder into a smoothed variant that carries a certified lower bound on the cosine similarity between clean and adversarial feature representations under $\ell_2$-bounded attacks. This guarantee keeps the encoder's outputs stable, and hence the downstream model's behavior predictable, in the face of adversarial perturbations.
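To make the construction concrete, here is a minimal sketch of smoothing in feature space: the smoothed encoder averages the base encoder's features over Gaussian perturbations of the input (a Monte Carlo estimate), so clean and $\ell_2$-perturbed inputs yield features with high cosine similarity. The toy two-dimensional encoder, noise level, and sample count here are illustrative assumptions, not the paper's actual components:

```python
import math
import random

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def toy_encoder(x):
    # Hypothetical stand-in for an MLLM vision encoder: a fixed
    # nonlinear map from input space to feature space.
    return [math.tanh(3.0 * x[0] + x[1]), math.tanh(x[0] - 2.0 * x[1])]

def smoothed_encoder(encoder, x, sigma=0.25, n_samples=500, seed=0):
    # Feature-space smoothing: average the encoder's features over
    # Gaussian input noise (Monte Carlo estimate of the expectation).
    rng = random.Random(seed)
    dim = len(encoder(x))
    acc = [0.0] * dim
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        acc = [a + f for a, f in zip(acc, encoder(noisy))]
    return [a / n_samples for a in acc]

clean = [0.8, -0.3]
adversarial = [0.85, -0.35]  # small l2-bounded perturbation of `clean`

f_clean = smoothed_encoder(toy_encoder, clean)
f_adv = smoothed_encoder(toy_encoder, adversarial, seed=1)
print(f"smoothed cosine similarity: {cosine(f_clean, f_adv):.3f}")
```

Because the smoothed features vary slowly with the input, the cosine similarity between the clean and perturbed representations stays near 1; the paper's contribution is turning this qualitative stability into a certified lower bound.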
Key Findings and Methodology
The Feature Cosine Similarity Bound (FCSB) derived from FS tightens as the Gaussian robustness score of the original encoder increases. The Purifier and Smoothness Mapper (PSM) module raises this score without retraining the MLLM, so the certified guarantee improves at no additional training cost.
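As a rough illustration of why a purifier can help, the sketch below treats the "Gaussian robustness score" as the expected cosine similarity between clean and noise-perturbed features (one plausible reading; the paper's exact definition may differ), and shows a simple moving-average purifier raising it for a noise-sensitive toy encoder. All of the components here are hypothetical stand-ins for PSM, not its actual architecture:

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def encoder(x):
    # Hypothetical frozen encoder: tanh of scaled first differences,
    # deliberately sensitive to per-coordinate input noise.
    return [math.tanh(4.0 * (x[i + 1] - x[i])) for i in range(len(x) - 1)]

def purifier(x):
    # Hypothetical purifier: a 3-tap moving average that suppresses
    # i.i.d. Gaussian noise while roughly preserving a smooth signal.
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out

def gaussian_robustness_score(pipeline, x, sigma=0.3, n=300, seed=0):
    # Assumed definition: expected cosine similarity between the
    # pipeline's clean features and its noise-perturbed features.
    rng = random.Random(seed)
    f_clean = pipeline(x)
    total = 0.0
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        total += cosine(f_clean, pipeline(noisy))
    return total / n

x = [math.sin(i / 3.0) for i in range(16)]  # smooth clean input
base = gaussian_robustness_score(encoder, x)
with_psm = gaussian_robustness_score(lambda z: encoder(purifier(z)), x)
print(f"score without purifier: {base:.3f}, with purifier: {with_psm:.3f}")
```

The purifier sits in front of the frozen encoder, so the score rises with no change to the encoder's weights; in the paper this is what lets PSM tighten the FCSB without retraining.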
Integration of FS and PSM demonstrates superior empirical performance. Extensive experiments reveal that the FS-PSM method significantly reduces the Attack Success Rate (ASR) of multiple white-box attacks, dropping from nearly 90% to approximately 1%.
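For reference, ASR is simply the fraction of attacked inputs on which the attack succeeds. A minimal sketch, where the success criterion (clean-vs-adversarial feature cosine similarity falling below a threshold) and the per-sample numbers are hypothetical illustrations, not the paper's data:

```python
def attack_success_rate(cos_sims, threshold=0.5):
    # ASR = fraction of attacked samples whose clean-vs-adversarial
    # feature cosine similarity falls below the success threshold.
    hits = sum(1 for c in cos_sims if c < threshold)
    return hits / len(cos_sims)

# Hypothetical per-sample similarities without and with the defense:
undefended = [0.10, 0.20, 0.05, 0.30, 0.15, 0.40, 0.10, 0.20, 0.60, 0.25]
defended = [0.95, 0.90, 0.97, 0.92, 0.30, 0.96, 0.91, 0.93, 0.94, 0.98]

print(attack_success_rate(undefended))  # → 0.9
print(attack_success_rate(defended))    # → 0.1
```

With the defense in place, most attacked samples keep high feature similarity, so few clear the success threshold and the ASR drops accordingly.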
📰 Original Source: https://arxiv.org/abs/2601.16200v1
All rights and credit belong to the original publisher.