MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images

MedMO is a new multimodal large language model designed for the medical domain, addressing limitations in existing models. It employs a multi-stage training process spanning cross-modal pretraining, instruction tuning, and reinforcement learning, yielding significant performance improvements: a +13.7% average accuracy gain in visual question answering and notable gains in report generation accuracy. MedMO shows strong grounding capabilities across various medical specialties. Two model versions, 4B and 8B, are available at genmilab.github.io/MedMO-Page.
MedMO: A Breakthrough in Multimodal Large Language Models for Medical Imaging
Researchers have introduced MedMO, a specialized medical foundation model designed to enhance the use of multimodal large language models (MLLMs) in healthcare settings. MedMO addresses limitations hindering the adoption of MLLMs in medicine, particularly in domain coverage and grounded reasoning.
Training Methodology and Performance
MedMO utilizes a multi-stage training approach that includes cross-modal pretraining, instruction tuning, and reinforcement learning. As a result, it consistently outperforms existing open-source medical MLLMs. On visual question answering benchmarks, MedMO achieved an average accuracy improvement of 13.7% over baseline models and came within 1.9% of the state-of-the-art model, Fleming-VL.
Clinical Application and Grounding Capabilities
MedMO demonstrates significant advances in medical report generation, with notable improvements in semantic and clinical accuracy. Its grounding capabilities, which are essential for localizing findings in complex medical images, show a 40.4% improvement in Intersection over Union (IoU) over baseline models.
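For readers unfamiliar with the metric, the short sketch below computes IoU for two axis-aligned bounding boxes, the standard measure of overlap between a predicted region and an annotated ground-truth region. It is a minimal illustration, not MedMO's evaluation code; the (x1, y1, x2, y2) box format and the example coordinates are assumptions for demonstration only.

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Intersection area is zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: a predicted lesion box vs. an annotated ground-truth box.
print(iou((30, 40, 120, 160), (50, 60, 130, 170)))  # ≈ 0.56, i.e. moderate overlap
```

A reported IoU gain therefore means the model's predicted regions overlap more closely with expert annotations, which is what "grounding" measures in this context.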
Availability
MedMO is available in two versions, with 4 billion and 8 billion parameters. The project page can be accessed at genmilab.github.io/MedMO-Page.
📰 Original Source: https://arxiv.org/abs/2602.06965v1
All rights and credit belong to the original publisher.