feat(datasets): add MaritimeBench dataset and related configuration #2018
✅ Motivation
This PR introduces MaritimeBench, a new domain-specific benchmark dataset designed to evaluate AI models' understanding and reasoning in the maritime domain. The goal is to provide a comprehensive evaluation tool for maritime tasks such as navigation, marine engineering, and GMDSS (Global Maritime Distress and Safety System) operations.
✅ Modification
Added a new dataset class MaritimeBenchDataset in opencompass/datasets/ (a minimal sketch follows this list).
Added the configuration file maritimebench_gen.py and a README.md under configs/datasets/maritimebench/.
Updated datasets_info.py with the dataset metadata.
Updated text_postprocessors.py with parse_bracketed_answer to support the MaritimeBench answer format (an illustrative sketch also follows this list).
Registered the dataset in opencompass/datasets/__init__.py.
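For reviewers, this is a minimal sketch of how the dataset class is wired in; the actual column handling, subset names, and prompt formatting are defined by this PR, and the `subset_name`/`split` arguments below are illustrative assumptions only.

```python
from datasets import load_dataset

from opencompass.datasets.base import BaseDataset
from opencompass.registry import LOAD_DATASET


@LOAD_DATASET.register_module()
class MaritimeBenchDataset(BaseDataset):

    @staticmethod
    def load(path: str, subset_name: str = 'default', **kwargs):
        # Load the single-choice maritime questions; whether the data is pulled
        # from HuggingFace or ModelScope depends on the configured dataset source.
        return load_dataset(path, subset_name, split='test')
```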
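The answer-format postprocessor is expected to pull the selected option out of a bracketed answer such as `[A]`. The real implementation lives in text_postprocessors.py in this PR; the signature and regex below are only an assumed illustration of that format.

```python
import re


def parse_bracketed_answer(text: str, options: str = 'ABCD') -> str:
    """Return the option letter from an answer like '[A]'; '' if none is found."""
    match = re.search(rf'\[([{options}])\]', text)
    return match.group(1) if match else ''
```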
❌ BC-breaking (Optional)
No backward-compatibility-breaking changes are introduced.
✅ Use cases (Optional)
This dataset can be used to evaluate foundation models such as Qwen2.5-32B, InternLM, or Yi-34B on professional maritime tasks (see the example launcher config after this list). It is especially useful for:
Knowledge understanding and reasoning in maritime exams
Evaluating model accuracy on single-choice maritime questions
Automated assessments in crew training or certification
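As a usage sketch, an evaluation could be launched through a small launcher config that pulls in the new dataset config; the variable name `maritimebench_datasets` and the model config path below are placeholders for illustration, not part of this PR.

```python
# configs/eval_maritimebench.py (hypothetical launcher config)
from mmengine.config import read_base

with read_base():
    # Placeholder names: substitute the dataset list actually exported by
    # maritimebench_gen.py and any existing model config you want to evaluate.
    from .datasets.maritimebench.maritimebench_gen import maritimebench_datasets
    from .models.hf_internlm.hf_internlm2_chat_7b import models

datasets = maritimebench_datasets
```

It would then run through the standard entry point, e.g. `python run.py configs/eval_maritimebench.py`.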
✅ Checklist
Before PR:
Pre-commit hooks have been run to ensure code quality.
Dataset loading has been tested against both the HuggingFace and ModelScope sources.
Answer-format postprocessing has been verified with parse_bracketed_answer.
After PR:
This PR has no impact on existing benchmarks or interfaces.
CLA has been signed.