[Benchmark] Support M4Bench #1163

Thorin215 · 2025-07-15T13:40:12Z

Support M4Bench [IJCAI 2025]

Thorin215 · 2025-07-17T05:35:30Z

Hi @kennymckormick @niboshi @mfarre @gnobitab
I'm writing to ask for help with a persistent CI failure on my PR #1163. The vlm_test job is consistently failing.

I've reviewed my changes but can't spot an obvious cause for a major performance issue. I'm confident that my modifications do not affect the files exercised by the test, such as MMBench and OCRBench. Could you please take a look when you have a moment, or suggest what might be causing the runner to terminate the process?

Any help would be greatly appreciated.

kennymckormick · 2025-07-20T09:39:42Z

Hi, @Thorin215

The CI problems are probably because you developed on top of an earlier commit, and we have updated the CI since then. You can ignore them for now.

However, I noticed several problems in this PR that make the benchmark unusable:

1. In m4bench.py, you use several interfaces we have not defined, such as parse_choice and has_image, which leads to errors.
2. Your tsv file stores only the URLs of the images. This is not a recommended practice, and your build_prompt does not handle it correctly. We recommend storing each image in base64 format in the tsv file.

Please fix those issues and make sure you can successfully run the evaluation under VLMEvalKit.
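
For the base64 recommendation above, here is a minimal sketch of embedding image bytes in a tsv file using only the Python standard library. The column names (`index`, `image`, `question`, `answer`), the helper names, and the sample row are assumptions for illustration; they are not taken from this PR, and the actual schema should follow the existing benchmarks in the repository.

```python
import base64
import csv
import io


def encode_image_bytes(raw: bytes) -> str:
    """Return the base64 text form of raw image bytes."""
    return base64.b64encode(raw).decode("ascii")


def write_tsv(rows, out):
    """Write rows (dicts) as TSV, replacing each image with its base64 text."""
    # Hypothetical column set; adjust to the schema the benchmark needs.
    fields = ["index", "image", "question", "answer"]
    writer = csv.DictWriter(
        out, fieldnames=fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "image": encode_image_bytes(row["image"])})


# Hypothetical sample row: the 8-byte PNG signature stands in for a real image.
sample = [
    {
        "index": 0,
        "image": b"\x89PNG\r\n\x1a\n",
        "question": "What is shown?",
        "answer": "A",
    }
]

buf = io.StringIO()
write_tsv(sample, buf)
tsv_text = buf.getvalue()
```

In practice the bytes would come from reading each image file in binary mode (or downloading the URL once) rather than from a literal. Base64 output contains no tabs or newlines, so it fits in a single tsv cell.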

[Benchmark]Support M4Bench

200d406

kennymckormick added the WIP label Jul 20, 2025
