🚀 Quickstart | 🌐 Homepage | 🏆 Leaderboard | 🤗 IntJudge | 📖 OpenING arXiv | 🖊️ Citation
This repository is the official implementation of OpenING (CVPR 2025 Oral).
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou*, Xiaopeng Peng*, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang†

\* Equal Contribution
† Corresponding Author: [email protected]
- **2025/02/27**: The beta version of OpenING data can be accessed via Google Drive. If you have any questions, please contact us.
- **2025/02/26**: Our paper is accepted by CVPR 2025 and selected as an Oral. Thanks to all contributors.
- **2024/11/29**: Our judge model IntJudge is released!
- **2024/11/28**: We are releasing the evaluation code here.
- **2024/11/27**: The technical report of OpenING is released! Also check out our project page!
We introduce OpenING, a comprehensive benchmark comprising 5,400 high-quality human-annotated instances across 56 real-world tasks. OpenING covers diverse daily scenarios such as travel guides, design, and brainstorming, offering a robust platform for challenging interleaved generation methods. In addition, we present IntJudge, a judge model for evaluating open-ended multimodal generation methods. Trained with a novel data pipeline, our IntJudge achieves an agreement rate of 82.42% with human judgments, outperforming GPT-based evaluators by 11.34%. Extensive experiments on OpenING reveal that current interleaved generation methods still have substantial room for improvement. Key findings on interleaved image-text generation are further presented to guide the development of next-generation generative models. We anticipate that more advanced multimodal judge models can be trained and tested on OpenING, and we believe that OpenING will push the boundaries of MLLMs toward general-purpose multimodal intelligence.
- An overview of model win rates evaluated by human, GPT-4o, and our IntJudge under FDT and different tie metrics. FDT: Force Dividing Tie metric. w/o Tie: non-tie cases only. w/ Tie (0) and w/ Tie (.5): count a tie as 0 and 0.5 wins for a model in a pairwise comparison, respectively. The best-performing model in each category is shown in bold, and the second best is underlined.
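To make the tie conventions concrete, here is a minimal sketch of how a win rate could be computed from a list of pairwise outcomes under the three tie treatments above. This is an illustrative helper, not the official evaluation script, and it does not implement FDT, whose tie-dividing mechanism is described in the paper rather than here.

```python
from typing import List

def win_rate(outcomes: List[str], tie_mode: str = "w/ Tie (.5)") -> float:
    """Win rate for one model over a list of pairwise outcomes.

    Each entry of `outcomes` is "win", "lose", or "tie".
      - "w/o Tie":     drop tied comparisons entirely
      - "w/ Tie (0)":  count a tie as 0 wins
      - "w/ Tie (.5)": count a tie as 0.5 wins
    """
    wins = outcomes.count("win")
    ties = outcomes.count("tie")
    total = len(outcomes)
    if tie_mode == "w/o Tie":
        non_tie = total - ties
        return wins / non_tie if non_tie else 0.0
    if tie_mode == "w/ Tie (0)":
        return wins / total
    if tie_mode == "w/ Tie (.5)":
        return (wins + 0.5 * ties) / total
    raise ValueError(f"unknown tie mode: {tie_mode!r}")

results = ["win", "tie", "lose", "win", "tie"]
print(win_rate(results, "w/o Tie"))      # 2 wins / 3 non-tie comparisons
print(win_rate(results, "w/ Tie (0)"))   # 2 / 5 = 0.4
print(win_rate(results, "w/ Tie (.5)"))  # (2 + 1) / 5 = 0.6
```

Note how the choice of tie treatment alone moves the reported win rate, which is why the leaderboard reports all of them side by side.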
Please refer to this to view the dynamic leaderboard.
Please refer to this for a quick start.
The guidelines for the annotators emphasized strict compliance with copyright and licensing rules from the initial data source, specifically avoiding materials from websites that forbid copying and redistribution. Should you encounter any data samples potentially breaching the copyright or licensing regulations of any site, please feel free to contact us. Upon verification, we will immediately remove the potentially breaching samples.
- Pengfei Zhou: [email protected]
- Kaipeng Zhang: [email protected]
If you find OpenING useful in your project or research, please use the following BibTeX entry to cite our paper. Thanks!
@misc{zhou2024GATE,
title={GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation},
author={Pengfei Zhou and Xiaopeng Peng and Jiajun Song and Chuanhao Li and Zhaopan Xu and Yue Yang and Ziyao Guo and Hao Zhang and Yuqi Lin and Yefei He and Lirui Zhao and Shuo Liu and Tianhua Li and Yuxuan Xie and Xiaojun Chang and Yu Qiao and Wenqi Shao and Kaipeng Zhang},
year={2024},
eprint={2411.18499},
archivePrefix={arXiv},
primaryClass={cs.CV}
}