[Benchmark] Support M4Bench #1163

Open · wants to merge 1 commit into main

Thorin215
Support M4Bench [IJCAI 2025]

@Thorin215 (Author)

Hi @kennymckormick, @niboshi, @mfarre, @gnobitab,
I'm writing to ask for help with a persistent CI failure on my PR #1163: the vlm_test job fails consistently.

I've reviewed my changes but can't find an obvious cause for a major performance issue, and I'm confident my modifications do not affect the files the test exercises, such as MMBench and OCRBench. Could you take a look when you have a moment, or suggest what might be causing the runner to terminate the process?

Any help would be greatly appreciated.

@kennymckormick (Member)

Hi @Thorin215,

The CI failures are probably because your branch is based on an earlier commit, and we updated the CI after that commit. You can ignore them for now.

However, I noticed several problems in this PR that make the benchmark unusable:

  1. In m4bench.py, you call several interfaces that VLMEvalKit does not define, such as parse_choice and has_image, which leads to errors.
  2. Your tsv file stores only the URLs of the images. This is not a recommended practice, and your build_prompt does not handle it correctly. We recommend saving the images in base64 format in the tsv file.
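For the second point, here is a minimal Python sketch (standard library only) of converting a URL-only tsv into one with inline base64 images. It assumes the URLs sit in a column named `image_url` and that the target column is named `image`; those column names and the helpers `encode_image_bytes`, `urls_to_base64`, and `write_tsv` are hypothetical illustrations, not VLMEvalKit's API. The `fetch` callable is passed in so the sketch runs offline; real code might pass something like `lambda u: urllib.request.urlopen(u).read()` or read local files instead.

```python
import base64
import csv
import io
from typing import Callable, Dict, List, TextIO


def encode_image_bytes(data: bytes) -> str:
    """Return base64 text for raw image bytes (e.g. a JPEG or PNG file)."""
    # b64encode emits no newlines, so the result is safe inside a TSV cell.
    return base64.b64encode(data).decode("ascii")


def urls_to_base64(
    rows: List[Dict[str, str]],
    fetch: Callable[[str], bytes],
    url_col: str = "image_url",  # hypothetical column holding the image URL
    image_col: str = "image",    # hypothetical column for the base64 string
) -> List[Dict[str, str]]:
    """Replace each row's URL column with an inline base64 image column."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        url = row.pop(url_col)
        row[image_col] = encode_image_bytes(fetch(url))
        out.append(row)
    return out


def write_tsv(rows: List[Dict[str, str]], fh: TextIO) -> None:
    """Write rows as tab-separated values with a header line."""
    writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Demo with a stand-in fetcher that returns fixed bytes instead of downloading.
    rows = [{"index": "0", "question": "What is shown?",
             "image_url": "https://example.com/a.png"}]
    converted = urls_to_base64(rows, fetch=lambda url: b"abc")
    buf = io.StringIO()
    write_tsv(converted, buf)
    print(buf.getvalue())
```

Embedding the image bytes makes the tsv self-contained, so evaluation no longer depends on the URLs staying reachable, at the cost of a larger file.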

Please fix these issues and make sure you can run the evaluation successfully with VLMEvalKit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants