LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR

LightOn has launched LightOnOCR-2-1B, a 1B-parameter multilingual vision-language model that converts document images into structured text end to end, without a traditional OCR pipeline. It reports state-of-the-art accuracy on OlmOCR-Bench while being 9x smaller and faster than earlier leading models. The model also predicts normalized bounding boxes for images embedded in a page, refined through reinforcement learning with IoU-based rewards. The checkpoints are released under Apache 2.0, and the dataset and a new benchmark, LightOnOCR-bbox-bench, are available for further research.
LightOnOCR-2-1B: A Breakthrough in Multilingual OCR Technology
LightOn has unveiled LightOnOCR-2-1B, a multilingual vision-language model with 1 billion parameters that converts document images directly into structured text. Because it works end to end, it is positioned as a replacement for traditional multi-stage Optical Character Recognition (OCR) systems.
LightOnOCR-2-1B reports state-of-the-art performance on the OlmOCR-Bench benchmark while being 9 times smaller and significantly faster than earlier leading models.
Key Features
- Normalized Bounding Box Prediction: Predicts normalized bounding boxes for embedded images, improving utility for complex layouts.
- Reinforcement Learning with Verifiable Rewards (RLVR): Refines the bounding box predictions using rewards based on intersection over union (IoU), which measures how much a predicted box overlaps its reference box.
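To make the two features above concrete, here is a minimal Python sketch of what a normalized bounding box and an IoU score look like in practice. It is illustrative only and is not LightOn's code: the `(x_min, y_min, x_max, y_max)` box layout, the `[0, 1]` coordinate range, and the helper names `denormalize_box` and `iou` are assumptions made for this example, and the model's actual output format may differ.

```python
from typing import Tuple

# Assumed layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def denormalize_box(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Map a box with coordinates in [0, 1] (assumed range) to pixel coordinates."""
    x0, y0, x1, y1 = box
    return (
        round(x0 * width),
        round(y0 * height),
        round(x1 * width),
        round(y1 * height),
    )


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 0.5, 1.0)
    reference = (0.25, 0.0, 0.75, 1.0)
    print(denormalize_box(predicted, width=1000, height=800))
    print(iou(predicted, reference))
```

Because the coordinates are normalized, the same prediction can be scaled to any rendering resolution of the page. An IoU-based reward would score a prediction near 1 when it closely matches the reference box and 0 when the two do not overlap at all.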
LightOn has released the model checkpoints under the Apache 2.0 license, along with the accompanying dataset and the new LightOnOCR-bbox-bench evaluation. This positions LightOnOCR-2-1B as a significant advancement for applications requiring quick and accurate text extraction from multilingual document images.
📰 Original Source: https://arxiv.org/abs/2601.14251v1
All rights and credit belong to the original publisher.