feat(datasets): add MaritimeBench dataset and related configuration #2018
✅ Motivation
This PR introduces MaritimeBench, a new domain-specific benchmark dataset designed to evaluate AI models' understanding and reasoning in the maritime domain. The goal is to provide a comprehensive evaluation tool for maritime tasks such as navigation, marine engineering, and GMDSS (Global Maritime Distress and Safety System) operations.
✅ Modification
Added a new dataset class MaritimeBenchDataset in opencompass/datasets/ (a minimal sketch follows this list).
Added the configuration file maritimebench_gen.py and a README.md under configs/datasets/maritimebench/.
Updated datasets_info.py with the dataset metadata.
Updated text_postprocessors.py with parse_bracketed_answer to support the MaritimeBench answer format (an illustrative sketch also follows this list).
Registered the dataset in opencompass/datasets/__init__.py.
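For reviewers, this is a minimal sketch of how the dataset class is wired in; the actual column handling, subset names, and prompt formatting are defined by this PR, and the `subset_name`/`split` arguments below are illustrative assumptions only.

```python
from datasets import load_dataset

from opencompass.datasets.base import BaseDataset
from opencompass.registry import LOAD_DATASET


@LOAD_DATASET.register_module()
class MaritimeBenchDataset(BaseDataset):

    @staticmethod
    def load(path: str, subset_name: str = 'default', **kwargs):
        # Load the single-choice maritime questions; whether the data is pulled
        # from HuggingFace or ModelScope depends on the configured dataset source.
        return load_dataset(path, subset_name, split='test')
```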
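The answer-format postprocessor is expected to pull the selected option out of a bracketed answer such as `[A]`. The real implementation lives in text_postprocessors.py in this PR; the signature and regex below are only an assumed illustration of that format.

```python
import re


def parse_bracketed_answer(text: str, options: str = 'ABCD') -> str:
    """Return the option letter from an answer like '[A]'; '' if none is found."""
    match = re.search(rf'\[([{options}])\]', text)
    return match.group(1) if match else ''
```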
❌ BC-breaking (Optional)
No backward-compatibility-breaking changes are introduced.
✅ Use cases (Optional)
This dataset can be used to evaluate foundation models such as Qwen2.5-32B, InternLM, or Yi-34B on professional maritime tasks (see the example launcher config after this list). It is especially useful for:
Knowledge understanding and reasoning in maritime exams
Evaluating model accuracy on single-choice maritime questions
Automated assessments in crew training or certification
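As a usage sketch, an evaluation could be launched through a small launcher config that pulls in the new dataset config; the variable name `maritimebench_datasets` and the model config path below are placeholders for illustration, not part of this PR.

```python
# configs/eval_maritimebench.py (hypothetical launcher config)
from mmengine.config import read_base

with read_base():
    # Placeholder names: substitute the dataset list actually exported by
    # maritimebench_gen.py and any existing model config you want to evaluate.
    from .datasets.maritimebench.maritimebench_gen import maritimebench_datasets
    from .models.hf_internlm.hf_internlm2_chat_7b import models

datasets = maritimebench_datasets
```

It would then run through the standard entry point, e.g. `python run.py configs/eval_maritimebench.py`.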
✅ Checklist
Before PR:
Pre-commit hooks have been run to ensure code quality.
Dataset loading has been tested against both the HuggingFace and ModelScope sources.
Answer-format postprocessing has been verified with parse_bracketed_answer.
After PR:
This PR has no impact on existing benchmarks or interfaces.
CLA has been signed.