feat(datasets): add MaritimeBench dataset and related configuration #2018


Open
wants to merge 1 commit into base: main


K-zhy commented Apr 14, 2025

✅ Motivation
This PR introduces a new domain-specific benchmark dataset, MaritimeBench, designed to evaluate AI models' understanding and reasoning capabilities in the maritime field. The goal is to provide comprehensive evaluation tools for maritime-related tasks such as navigation, marine engineering, and the Global Maritime Distress and Safety System (GMDSS).

✅ Modification
Added a new dataset class MaritimeBenchDataset in opencompass/datasets/.

Added configuration file maritimebench_gen.py and README.md in configs/datasets/maritimebench/.

Updated datasets_info.py with dataset metadata.

Updated text_postprocessors.py with parse_bracketed_answer to support MaritimeBench answer format.

Registered the dataset in datasets/__init__.py.
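For reviewers who have not opened the diff, the kind of post-processing described above can be sketched as follows. This is only an illustration, not the PR's parse_bracketed_answer: the function name extract_bracketed_choice, the option letters A-D, and the "[B]"-style answer format are all assumptions made for the example.

```python
import re
from typing import Optional

# Assumed answer format (hypothetical): one option letter A-D in square brackets, e.g. "[B]".
_BRACKETED_CHOICE = re.compile(r"\[([A-D])\]")


def extract_bracketed_choice(text: str) -> Optional[str]:
    """Return the last bracketed option letter in `text`, or None if there is none.

    Taking the last match is a design choice for this sketch: models often
    discuss rejected options before stating the final answer.
    """
    matches = _BRACKETED_CHOICE.findall(text)
    return matches[-1] if matches else None
```

With this sketch, a response such as "The answer is [B]." would yield "B", and a response with no bracketed letter would yield None, which an evaluator can then count as incorrect.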

❌ BC-breaking (Optional)
This PR introduces no backward-compatibility-breaking changes.

✅ Use cases (Optional)
This dataset can be used to evaluate foundation models such as Qwen2.5-32B, InternLM, or Yi-34B on professional maritime tasks. It is especially useful for:

Knowledge understanding and reasoning in maritime exams

Evaluating model accuracy on single-choice maritime questions

Automated assessments in crew training or certification
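As a rough illustration of the single-choice accuracy use case, the scoring step might look like the sketch below. The helper name single_choice_accuracy and the case-insensitive comparison are assumptions for this example; the evaluator actually configured in the PR may behave differently.

```python
from typing import Optional, Sequence


def single_choice_accuracy(
    predictions: Sequence[Optional[str]],
    references: Sequence[str],
) -> float:
    """Return the percentage of predictions that match the reference option letter.

    A prediction of None (no answer could be parsed) counts as incorrect.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        pred is not None and pred.strip().upper() == ref.strip().upper()
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * correct / len(references)
```

Counting unparsed responses as wrong keeps the denominator fixed at the number of questions, so scores stay comparable across models that differ in how often they follow the answer format.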

✅ Checklist
Before PR:

Pre-commit hooks have been run to ensure code quality.

Dataset logic tested on both HuggingFace and ModelScope sources.

Format postprocessing verified with parse_bracketed_answer.

After PR:

This PR has no impact on existing benchmarks or interfaces.

CLA has been signed.

Added MaritimeBench dataset, including dataset metadata, configuration files, data processing logic, and a text post-processing function. This dataset is designed to evaluate AI models' domain knowledge and reasoning ability in the maritime field.

3 participants