Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models
This is the official repository for the Infi-MMR paper and the Infi-MMR-3B model.
In this work, we design a novel framework, Infi-MMR, to systematically unlock the reasoning potential of Multimodal Small Language Models (MSLMs) through a curriculum of three carefully structured phases, and we propose our multimodal reasoning model, Infi-MMR-3B.
Specifically, Infi-MMR is a curriculum-based, progressive, rule-based RL training framework that unfolds in three distinct phases:
- Foundational Reasoning Activation leverages high-quality textual reasoning datasets to activate and strengthen the model's logical reasoning capabilities.
- Cross-Modal Reasoning Adaptation utilizes caption-augmented multimodal data to facilitate the progressive transfer of reasoning skills to multimodal contexts.
- Multimodal Reasoning Enhancement employs curated, caption-free multimodal data to mitigate linguistic biases and promote robust cross-modal reasoning.
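The three phases above can be pictured as an ordered schedule in which only the middle phase adds captions to the prompt. The following is a minimal illustrative sketch, not the repository's training code: the `Phase` class, the `CURRICULUM` tuple, the `build_prompt` helper, and the prompt wording are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phase:
    """Hypothetical description of one curriculum phase."""
    name: str
    multimodal: bool    # does the phase train on image inputs?
    use_captions: bool  # are image captions added to the prompt?


# Order mirrors the three phases listed above.
CURRICULUM = (
    Phase("Foundational Reasoning Activation", multimodal=False, use_captions=False),
    Phase("Cross-Modal Reasoning Adaptation", multimodal=True, use_captions=True),
    Phase("Multimodal Reasoning Enhancement", multimodal=True, use_captions=False),
)


def build_prompt(phase: Phase, question: str, caption: Optional[str] = None) -> str:
    """Assemble the text part of a prompt for the given phase (assumed format)."""
    if phase.multimodal and phase.use_captions and caption:
        # Phase 2: the caption bridges textual reasoning and the image.
        return f"Image caption: {caption}\nQuestion: {question}"
    # Phase 1 (text only) and phase 3 (caption-free): the question alone.
    return f"Question: {question}"


if __name__ == "__main__":
    for phase in CURRICULUM:
        prompt = build_prompt(phase, "How many cubes are shown?", "Three red cubes.")
        print(f"{phase.name}: {prompt!r}")
```

Dropping the caption again in the final phase reflects the stated goal of mitigating linguistic biases: the model has to rely on the image itself rather than on a textual description of it.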
2025/06/03
Model weights have been uploaded to Hugging Face.
2025/05/30
Our preprint has been published on arXiv.
If you find this work useful, please consider citing the following paper:
@article{liu2025infimmr,
title={Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models},
author={Zeyu Liu and Yuhang Liu and Guanghao Zhu and Congkai Xie and Zhen Li and Jianbo Yuan and Xinyao Wang and Qing Li and Shing-Chi Cheung and Shengyu Zhang and Fei Wu and Hongxia Yang},
journal={arXiv preprint arXiv:2505.23091},
year={2025}
}