AI News

MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images

Source: arXiv
Original Author: Ankan Deria et al.

Image generated by Gemini AI

MedMO is a new multimodal large language model designed for the medical field, addressing limitations in existing models. It employs a multi-stage training process, including cross-modal pretraining and reinforcement learning, resulting in significant performance improvements: +13.7% in visual question answering and notable gains in report generation accuracy. MedMO shows strong grounding capabilities across various medical specialties. Two model versions, 4B and 8B, are available at genmilab.github.io/MedMO-Page.

MedMO: A Breakthrough in Multimodal Large Language Models for Medical Imaging

Researchers have introduced MedMO, a specialized medical foundation model designed to enhance the use of multimodal large language models (MLLMs) in healthcare settings. MedMO addresses limitations hindering the adoption of MLLMs in medicine, particularly in domain coverage and grounded reasoning.

Training Methodology and Performance

MedMO uses a multi-stage training approach comprising cross-modal pretraining, instruction tuning, and reinforcement learning. As a result, it consistently outperforms existing open-source medical MLLMs: on visual question answering benchmarks, MedMO achieved an average accuracy improvement of 13.7% over baseline models and came within 1.9% of the state-of-the-art model, Fleming-VL.

Clinical Application and Grounding Capabilities

MedMO demonstrates significant advances in medical report generation, with notable improvements in semantic and clinical accuracy. Its grounding capabilities show a 40.4% increase in Intersection over Union (IoU) over baseline models, a capability essential for localizing findings in complex medical images.
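For readers unfamiliar with the grounding metric mentioned above: IoU measures how well a predicted bounding box overlaps a reference (ground-truth) box. The sketch below is a generic illustration of the standard metric, not code from the MedMO paper; box coordinates and names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Coordinates of the intersection rectangle (may be empty).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = sum of the two areas minus the double-counted overlap.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Example: a predicted lesion box vs. a radiologist-annotated box.
score = iou((0, 0, 10, 10), (5, 5, 15, 15))  # overlap 25, union 175 -> ~0.143
```

An IoU of 1.0 means a perfect match; a reported relative gain in average IoU, as in MedMO's grounding results, indicates predicted regions that overlap the annotated regions more tightly.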

Availability

MedMO is available in two versions, with 4 billion and 8 billion parameters. The project can be accessed at the MedMO project page: genmilab.github.io/MedMO-Page.

Related Topics:

MedMO, multimodal large language models, medical foundation model, cross-modal pretraining, spatial reasoning

📰 Original Source: https://arxiv.org/abs/2602.06965v1

All rights and credit belong to the original publisher.
