Reallm-Labs/Infi-MMR

Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models

arXiv Paper | Hugging Face Paper | Hugging Face Model


This is the official repository for the Infi-MMR paper and its model, Infi-MMR-3B.

🌟 Overview

In this work, we design a novel framework, Infi-MMR, to systematically unlock the reasoning potential of Multimodal Small Language Models (MSLMs) through a curriculum of three carefully structured phases and propose our multimodal reasoning model Infi-MMR-3B.

Specifically, Infi-MMR is a curriculum-based, progressive, rule-based RL training framework that unfolds in three distinct phases:

  • Foundational Reasoning Activation leverages high-quality textual reasoning datasets to activate and strengthen the model's logical reasoning capabilities.
  • Cross-Modal Reasoning Adaptation utilizes caption-augmented multimodal data to facilitate the progressive transfer of reasoning skills to multimodal contexts.
  • Multimodal Reasoning Enhancement employs curated, caption-free multimodal data to mitigate linguistic biases and promote robust cross-modal reasoning.

Figure: Method overview of the Infi-MMR training framework.
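
The three phases above can be read as an ordered schedule. The following is a minimal sketch of that reading, not the repository's training code: the `Phase` fields, `run_curriculum`, and the stand-in `fake_train_phase` callback are all hypothetical names introduced for illustration, and the sketch assumes each phase starts from the checkpoint produced by the previous one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Phase:
    """One stage of the curriculum (hypothetical structure, for illustration)."""
    name: str
    uses_images: bool    # whether training samples include images
    uses_captions: bool  # whether image captions are added to the input
    goal: str


# The three phases, in the order the overview lists them.
CURRICULUM: List[Phase] = [
    Phase("Foundational Reasoning Activation", False, False,
          "activate and strengthen logical reasoning on textual data"),
    Phase("Cross-Modal Reasoning Adaptation", True, True,
          "transfer reasoning skills using caption-augmented multimodal data"),
    Phase("Multimodal Reasoning Enhancement", True, False,
          "reduce linguistic bias using caption-free multimodal data"),
]


def run_curriculum(train_phase: Callable[[Phase, str], str],
                   initial_checkpoint: str) -> str:
    """Run the phases in order, passing each result to the next phase.

    Assumption: phases are chained through checkpoints. `train_phase` is a
    caller-supplied function standing in for one rule-based RL training run.
    """
    checkpoint = initial_checkpoint
    for phase in CURRICULUM:
        checkpoint = train_phase(phase, checkpoint)
    return checkpoint


def fake_train_phase(phase: Phase, checkpoint: str) -> str:
    # Placeholder: performs no training, only records the order of phases.
    return f"{checkpoint} -> {phase.name}"


if __name__ == "__main__":
    print(run_curriculum(fake_train_phase, "base"))
```

Keeping the schedule as plain data makes the ordering explicit: captions are present only in the middle phase, which mirrors the move from text-only data to caption-augmented and then caption-free multimodal data.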

🚀 Updates

📚 Citation Information

If you find this work useful, citations to the following papers are welcome:

@article{liu2025infimmr,
  title={Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models},
  author={Zeyu Liu and Yuhang Liu and Guanghao Zhu and Congkai Xie and Zhen Li and Jianbo Yuan and Xinyao Wang and Qing Li and Shing-Chi Cheung and Shengyu Zhang and Fei Wu and Hongxia Yang},
  journal={arXiv preprint arXiv:2505.23091},
  year={2025}
}
