Official Implementation of OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

LanceZPF/OpenING


OpenING

🚀 Quickstart | 🌐 Homepage | 🏆 Leaderboard | 🤗 IntJudge | 📖 OpenING arXiv | 🖊️ Citation

This repository is the official implementation of OpenING (CVPR 2025 Oral).

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou*, Xiaopeng Peng*, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang†
* Equal Contribution
† Corresponding Author: [email protected]

💡 News

  • 2025/02/27: The beta version of OpenING data can be accessed via Google Drive. If you have any questions, please contact us.
  • 2025/02/26: Our paper was accepted to CVPR 2025 and selected as an Oral presentation. Thanks to all contributors.
  • 2024/11/29: Our judge model IntJudge is released!
  • 2024/11/28: We are releasing the evaluation code here.
  • 2024/11/27: The technical report of OpenING is released! Check out our project page as well.

📖 Introduction

We introduce OpenING, a comprehensive benchmark comprising 5,400 high-quality human-annotated instances across 56 real-world tasks. OpenING covers diverse daily scenarios such as travel guides, design, and brainstorming, offering a robust platform for challenging interleaved generation methods. In addition, we present IntJudge, a judge model for evaluating open-ended multimodal generation methods. Trained with a novel data pipeline, IntJudge achieves an agreement rate of 82.42% with human judgments, outperforming GPT-based evaluators by 11.34%. Extensive experiments on OpenING reveal that current interleaved generation methods still have substantial room for improvement. We further present key findings on interleaved image-text generation to guide the development of next-generation generative models. We anticipate that more advanced multimodal judge models can be trained and tested on OpenING, and we believe that OpenING will push the boundaries of MLLMs towards general-purpose multimodal intelligence.
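To make the agreement-rate figure concrete, here is a minimal sketch of how such a rate could be computed over pairwise verdicts. It assumes agreement means the judge and the human picked the same outcome for a comparison; the function name `agreement_rate` and the label values are hypothetical and are not taken from the OpenING evaluation code.

```python
def agreement_rate(judge_labels, human_labels):
    """Fraction of pairwise comparisons where the judge matches the human verdict.

    Assumption: each label is the chosen outcome for one comparison,
    e.g. "A", "B", or "tie". This is an illustrative definition only.
    """
    if len(judge_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not judge_labels:
        return 0.0
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)


# Toy data: the judge disagrees with the human on the third comparison only.
judge = ["A", "B", "A", "tie"]
human = ["A", "B", "B", "tie"]
rate = agreement_rate(judge, human)
print(f"agreement: {rate:.2%}")
```

The numbers reported above come from the paper's own protocol, which may treat ties differently from this sketch.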

🏆 Leaderboard

  • An overview of model win rates evaluated by human judges, GPT-4o, and our IntJudge under FDT and different tie metrics. FDT: Force Dividing Tie metric. w/o Tie: Non-tie case. w/ Tie (0) and w/ Tie (.5): Count a tie as 0 and 0.5 wins for a model in a pairwise comparison, respectively. The best-performing model in each category is in bold, and the second best is underlined.
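The tie metrics above can be sketched as follows. This is an illustrative reading, not the repository's evaluation code: it assumes "w/o Tie" means the win rate over non-tied comparisons only, and it omits FDT because its dividing rule is not described in this section. The function name `win_rates` and the outcome labels are hypothetical.

```python
from collections import Counter


def win_rates(outcomes):
    """Win rates for one model from its pairwise outcomes.

    outcomes: iterable of "win", "loss", or "tie", seen from that model's side.
    Assumption: "w/o Tie" drops tied comparisons from the denominator.
    """
    counts = Counter(outcomes)
    wins, losses, ties = counts["win"], counts["loss"], counts["tie"]
    total = wins + losses + ties
    decided = wins + losses
    return {
        # Ties are excluded entirely.
        "w/o Tie": wins / decided if decided else 0.0,
        # A tie counts as 0 wins but still counts as a comparison.
        "w/ Tie (0)": wins / total if total else 0.0,
        # A tie counts as half a win.
        "w/ Tie (.5)": (wins + 0.5 * ties) / total if total else 0.0,
    }


# Toy data: 2 wins, 1 loss, 1 tie.
rates = win_rates(["win", "win", "loss", "tie"])
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

Under this reading, the three metrics differ only in how a tie enters the numerator and denominator, so the same set of pairwise outcomes can yield different rankings depending on the metric chosen.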

Please see the Leaderboard link at the top of this README to view the dynamic leaderboard.

🚀 Quick Start

Please see the Quickstart link at the top of this README to get started.

🌟 Disclaimers

The guidelines for the annotators emphasized strict compliance with copyright and licensing rules from the initial data source, specifically avoiding materials from websites that forbid copying and redistribution. Should you encounter any data samples potentially breaching the copyright or licensing regulations of any site, please feel free to contact us. Upon verification, we will immediately remove the potentially breaching samples.

📞 Contact

🖊️ Citation

If you find OpenING useful in your project or research, please cite our paper using the following BibTeX entry. Thanks!

@misc{zhou2024GATE,
  title={GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation},
  author={Pengfei Zhou and Xiaopeng Peng and Jiajun Song and Chuanhao Li and Zhaopan Xu and Yue Yang and Ziyao Guo and Hao Zhang and Yuqi Lin and Yefei He and Lirui Zhao and Shuo Liu and Tianhua Li and Yuxuan Xie and Xiaojun Chang and Yu Qiao and Wenqi Shao and Kaipeng Zhang},
  year={2024},
  eprint={2411.18499},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
