feat(datasets): add MaritimeBench dataset and related configuration #2018

Open. Wants to merge 1 commit into base: main.
61 changes: 61 additions & 0 deletions opencompass/configs/datasets/maritimebench/README.md
@@ -0,0 +1,61 @@
## 📘 About MaritimeBench

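**MaritimeBench** is the shipping industry's first professional-knowledge benchmark built on a three-level taxonomy of discipline (level 1), sub-discipline (level 2), and specific exam point (level 3). The dataset contains **1,888 objective multiple-choice questions** covering the following core areas:

- Navigation
- Marine engineering
- Electro-technical officer
- GMDSS (Global Maritime Distress and Safety System)
- Seafarer training

The benchmark covers theoretical knowledge, operational skills, and industry regulations, and aims to:

- Improve AI models' **understanding and reasoning** in the shipping domain
- Ensure their **accuracy and adaptability** on key knowledge points
- Support **automated assessment** for maritime professional exams, seafarer training, and qualification certification
- Improve **intelligent question-answering and decision systems** for ship management, navigation operations, maritime communication, and similar scenarios

Built on authoritative industry standards, MaritimeBench provides a **systematic, well-founded knowledge evaluation framework** that measures model performance across the maritime specialties and supports their professional development.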

---

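## 🧪 Examples

Answer the single-choice question. Output only the option, with no explanation; place the option inside `< >` and output the answer directly.

**Question 1:**
In a ship's main propulsion plant, the transmission shafting is subjected to the following complex stresses and loads during operation, but not ______.
Options:
A. Electromagnetic force
B. Compressive and tensile stress
C. Bending stress
D. Torsional stress
**Answer:** `<A>`

**Question 2:**
When a ship is surveyed under PMS, the special-survey items specified in the current CCS rules shall be incorporated into the PMS schedule. The schedule shall include ______.
① Confirmatory inspection items to be carried out every year
② Open-up inspection items to be carried out every year
③ Items to be opened up for inspection within 5 years
④ Confirmatory inspection items to be carried out within 5 years
Options:
A. ①④
B. ②④
C. ①③
D. ①②③④
**Answer:** `<C>`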

---

## 📂 Dataset Links

- [MaritimeBench on Hugging Face](https://huggingface.co/datasets/Hi-Dolphin/MaritimeBench)
- [MaritimeBench on ModelScope](https://modelscope.cn/datasets/HiDolphin/MaritimeBench/summary)

---

## 📊 Model Results

| dataset | version | metric | mode | Qwen2.5-32B |
|----- | ----- | ----- | ----- | -----|
| maritimebench | 6d56ec | accuracy | gen | 72.99 |
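
The `accuracy` value in the table reads as a percentage of questions whose extracted option equals the reference answer. A minimal sketch of that computation follows, assuming exact string match and percentage scaling; `accuracy_percent` and the sample lists are hypothetical and are not part of OpenCompass or of this PR.

```python
def accuracy_percent(predictions, references):
    """Share of exact matches, as a percentage rounded to two decimals."""
    if len(predictions) != len(references):
        raise ValueError('predictions and references must have equal length')
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return round(100 * correct / len(references), 2)


# Hypothetical extracted options and gold answers for four questions.
# The empty string stands for an output with no parsable <X> answer.
preds = ['A', 'C', '', 'D']
golds = ['A', 'C', 'B', 'B']
print(accuracy_percent(preds, golds))  # 50.0
```

Under this reading, an output from which no option could be extracted simply counts as a wrong answer.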
42 changes: 42 additions & 0 deletions opencompass/configs/datasets/maritimebench/maritimebench_gen.py
@@ -0,0 +1,42 @@
from opencompass.datasets import MaritimeBenchDataset
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.utils.text_postprocessors import parse_bracketed_answer
from opencompass.openicl.icl_evaluator import AccEvaluator
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer

maritimebench_reader_cfg = dict(
    input_columns=['question', 'A', 'B', 'C', 'D'],
    output_column='answer',
    train_split='test'  # explicitly use the test split
)

maritimebench_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(
            round=[
                dict(role='HUMAN', prompt='请回答单选题。要求只输出选项,不输出解释,将选项放在<>里,直接输出答案。示例:\n\n题目:在船舶主推进动力装置中,传动轴系在运转中承受以下复杂的应力和负荷,但不包括______。\n选项:\nA. 电磁力\nB. 压拉应力\nC. 弯曲应力\nD. 扭应力\n答:<A> 当前题目:\n {question}\nA:{A}\nB:{B}\nC:{C}\nD:{D}')
            ]
        ),
    ),
    retriever=dict(type=ZeroRetriever),  # no in-context examples are retrieved
    inferencer=dict(type=GenInferencer)  # inferencer configuration
)

maritimebench_eval_cfg = dict(
    evaluator=dict(type=AccEvaluator),
    pred_postprocessor=dict(type=parse_bracketed_answer, options='A|B|C|D')
)

maritimebench_datasets = [
    dict(
        abbr='maritimebench',
        type=MaritimeBenchDataset,
        name='default',
        path='opencompass/maritimebench',
        reader_cfg=maritimebench_reader_cfg,
        infer_cfg=maritimebench_infer_cfg,
        eval_cfg=maritimebench_eval_cfg
    )
]
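
To see roughly what the model receives, here is a minimal sketch of how the `{question}`, `{A}`, `{B}`, `{C}`, `{D}` placeholders get filled for one record. It uses plain `str.format` rather than OpenCompass's `PromptTemplate`, drops the worked example from the real prompt for brevity, and the record is hypothetical.

```python
# Shortened stand-in for the HUMAN prompt in maritimebench_infer_cfg;
# the worked example embedded in the real prompt is omitted here.
TEMPLATE = ('请回答单选题。要求只输出选项,不输出解释,将选项放在<>里,直接输出答案。'
            '当前题目:\n {question}\nA:{A}\nB:{B}\nC:{C}\nD:{D}')

# Hypothetical record carrying the columns named in maritimebench_reader_cfg.
record = {
    'question': 'Sample question ______.',
    'A': 'first option',
    'B': 'second option',
    'C': 'third option',
    'D': 'fourth option',
    'answer': 'A',
}

# Only the input columns are substituted; 'answer' is the output column
# and never appears in the prompt.
input_columns = ['question', 'A', 'B', 'C', 'D']
prompt = TEMPLATE.format(**{key: record[key] for key in input_columns})
print(prompt)
```

Because the prompt contains literal `<>` but no other braces, `str.format` is enough for this illustration; the real template engine may treat placeholders differently.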
1 change: 1 addition & 0 deletions opencompass/datasets/__init__.py
@@ -85,6 +85,7 @@
from .longbench import * # noqa: F401, F403
from .longbenchv2 import * # noqa: F401, F403
from .lveval import * # noqa: F401, F403
from .maritime_bench import * # noqa: F401, F403
from .mastermath2024v1 import * # noqa: F401, F403
from .math import * # noqa: F401, F403
from .math401 import * # noqa: F401, F403
64 changes: 64 additions & 0 deletions opencompass/datasets/maritime_bench.py
@@ -0,0 +1,64 @@
import json
import os.path as osp
from os import environ

import datasets
from datasets import Dataset, DatasetDict

from opencompass.registry import LOAD_DATASET
from opencompass.utils import get_data_path

from .base import BaseDataset


@LOAD_DATASET.register_module()
class MaritimeBenchDataset(BaseDataset):

    @staticmethod
    def load(path: str, name: str) -> DatasetDict:
        path = get_data_path(path)
        dataset = DatasetDict()
        dataset_list = []

        if environ.get('DATASET_SOURCE') == 'ModelScope':
            from modelscope import MsDataset
            for split in ['test']:
                # Load the data from ModelScope
                ms_dataset = MsDataset.load(path,
                                            subset_name=name,
                                            split=split)

                for line in ms_dataset:
                    question = line['question']
                    A = line['A']
                    B = line['B']
                    C = line['C']
                    D = line['D']
                    answer = line['answer']
                    dataset_list.append({
                        'question': question,
                        'A': A,
                        'B': B,
                        'C': C,
                        'D': D,
                        'answer': answer,
                    })
                # The split is stored once below, after both branches.
        else:
            for split in ['test']:
                filename = osp.join(path, split, f'{name}_{split}.jsonl')
                with open(filename, encoding='utf-8') as f:
                    for line in f:
                        data = json.loads(line)
                        dataset_list.append({
                            'question': data['question'],
                            'A': data['A'],
                            'B': data['B'],
                            'C': data['C'],
                            'D': data['D'],
                            'answer': data['answer']
                        })

        dataset[split] = Dataset.from_list(dataset_list)

        return dataset
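
The local branch of `load` expects one JSON object per line at `<path>/<split>/<name>_<split>.jsonl`. A minimal sketch of that layout follows, using only the standard library so it runs without `datasets` or OpenCompass; `read_split`, the temporary directory, and the sample record are hypothetical.

```python
import json
import os
import os.path as osp
import tempfile

FIELDS = ('question', 'A', 'B', 'C', 'D', 'answer')


def read_split(path, name, split='test'):
    """Read <path>/<split>/<name>_<split>.jsonl into a list of dicts."""
    filename = osp.join(path, split, f'{name}_{split}.jsonl')
    rows = []
    with open(filename, encoding='utf-8') as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines; json.loads would raise on them
            data = json.loads(line)
            # Keep only the six fields the loader copies; extras are dropped.
            rows.append({key: data[key] for key in FIELDS})
    return rows


with tempfile.TemporaryDirectory() as root:
    os.makedirs(osp.join(root, 'test'))
    sample = {'question': 'Q?', 'A': 'a', 'B': 'b', 'C': 'c', 'D': 'd',
              'answer': 'A', 'extra': 'ignored'}
    with open(osp.join(root, 'test', 'default_test.jsonl'), 'w',
              encoding='utf-8') as f:
        f.write(json.dumps(sample, ensure_ascii=False) + '\n')
    rows = read_split(root, 'default')

print(rows)
```

With `name='default'` and the single `test` split used in the config, the file name resolves to `default_test.jsonl`.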
5 changes: 5 additions & 0 deletions opencompass/utils/datasets_info.py
@@ -420,6 +420,11 @@
        "hf_id": "",
        "local": "./data/OlympiadBench",
    },
    "opencompass/maritimebench": {
        "ms_id": "HiDolphin/MaritimeBench",
        "hf_id": "Hi-Dolphin/MaritimeBench",
        "local": "./data/maritimebench",
    },
}

DATASETS_URL = {
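
The `opencompass/maritimebench` key is the value passed as `path=` in the dataset config. How `get_data_path` resolves it is not shown in this diff, so the sketch below is only an assumption about the general idea: return the ModelScope ID when `DATASET_SOURCE=ModelScope`, otherwise the local directory. `resolve_path` is a hypothetical helper, not the real implementation.

```python
import os

# Copy of the entry added in this PR.
DATASETS_MAPPING = {
    'opencompass/maritimebench': {
        'ms_id': 'HiDolphin/MaritimeBench',
        'hf_id': 'Hi-Dolphin/MaritimeBench',
        'local': './data/maritimebench',
    },
}


def resolve_path(dataset_id, env=None):
    """Hypothetical stand-in for get_data_path (assumed behaviour only)."""
    env = os.environ if env is None else env
    entry = DATASETS_MAPPING[dataset_id]
    if env.get('DATASET_SOURCE') == 'ModelScope':
        return entry['ms_id']
    return entry['local']


print(resolve_path('opencompass/maritimebench', env={}))
print(resolve_path('opencompass/maritimebench',
                   env={'DATASET_SOURCE': 'ModelScope'}))
```

Passing `env` explicitly keeps the sketch independent of the caller's real environment variables.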
7 changes: 7 additions & 0 deletions opencompass/utils/text_postprocessors.py
@@ -283,3 +283,10 @@ def extract_non_reasoning_content(
                                 re.DOTALL)
    non_reasoning_content = reasoning_regex.sub('', text).strip()
    return non_reasoning_content


def parse_bracketed_answer(text: str, options: str) -> str:
    match = re.search(rf'<({options})>', text)
    if match:
        return match.group(1)
    return ''
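
A short usage sketch may help reviewers check the edge cases of `parse_bracketed_answer`. The function body is copied from the diff so the block is self-contained; the sample model outputs are hypothetical.

```python
import re


def parse_bracketed_answer(text: str, options: str) -> str:
    # Same logic as the function added in this PR.
    match = re.search(rf'<({options})>', text)
    if match:
        return match.group(1)
    return ''


# Hypothetical model outputs paired with what the postprocessor returns.
samples = ['<A>', '答:<C>', 'B', '<E>', '<B> or <D>', '< A >']
results = [parse_bracketed_answer(s, 'A|B|C|D') for s in samples]
for sample, result in zip(samples, results):
    print(repr(sample), '->', repr(result))
```

Three behaviours are worth noting: the first bracketed option wins when several appear, a bare letter without brackets yields an empty string, and spaces inside the brackets (`< A >`) also yield an empty string because the pattern allows no whitespace.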