[Dataset] Add SeedBench Dataset #2020

ChenZiHong-Gavin · 2025-04-14T06:27:57Z

Motivation

This PR introduces a new domain-specific benchmark dataset, SeedBench, which is the first multi-task benchmark designed to evaluate large language models (LLMs) in seed science, focusing on seed breeding.

Modification

Added a new dataset class, SeedBenchDataset, and implemented evaluation metrics such as F1Evaluator in opencompass/datasets/SeedBench.py.

Added the configuration files seedbench_gen_5d5ea1.py and seedbench_gen.py, plus a README.md, in configs/datasets/SeedBench/.

Registered the dataset in datasets/__init__.py.

Updated datasets_info.py with dataset metadata.

Updated dataset-index.yml with dataset metadata.
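
As an illustration of the kind of metric mentioned above, here is a minimal sketch of a token-level F1 score. It is not the code in opencompass/datasets/SeedBench.py: the function names `token_f1` and `mean_f1` are hypothetical, and whitespace tokenization with lowercasing is an assumption that may not match how the PR tokenizes answers.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two strings (hypothetical helper).

    Assumes whitespace tokenization and case-insensitive matching.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty strings count as a match; one empty string is a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: list[str], references: list[str]) -> float:
    """Average token-level F1 over paired predictions and references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)
```

Averaging per-sample scores, as `mean_f1` does, is one common choice; whether the PR aggregates this way is not stated here.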

BC-breaking (Optional)

No backward compatibility breaking changes introduced.

Use cases (Optional)

SeedBench assesses LLMs across three core seed breeding stages:

Gene Information Retrieval
Gene Function and Regulation Analysis
Variety Breeding with Agronomic Trait Optimization

Built with domain experts, SeedBench features 2,264 expert-validated questions across 11 task types and 10 subcategories, initially targeting rice breeding. Future updates will include other crops like maize, soybean, and wheat.

Following the instructions, a model can be evaluated on SeedBench with:

DATASET_SOURCE=ModelScope python run.py --hf-type chat --hf-path Qwen/Qwen2.5-0.5B-Instruct --datasets seedbench_gen --debug

Checklist

Before PR:

Pre-commit or other linting tools are used to fix the potential lint issues.
Tested on ModelScope and local environment.
The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
CLA has been signed and all committers have signed the CLA in this PR.

tpoisonooo · 2025-04-14T07:32:06Z

cc @tonysy


Copilot

Pull Request Overview

This PR adds a new domain-specific benchmark dataset, SeedBench, for evaluating LLMs in seed science and breeding.

Introduces the SeedBenchDataset and multiple evaluators (F1ScoreEvaluator, AverageRougeScoreEvaluator, AccScoreStr_Evaluator) in opencompass/datasets/SeedBench.py.
Adds a new dataset configuration along with corresponding documentation and metadata updates in datasets_info.py, dataset-index.yml, and configs/datasets/SeedBench/.
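
For readers unfamiliar with string-match accuracy, the idea behind an evaluator of that kind can be sketched as follows. This is a hedged illustration only: `exact_match_accuracy` is a hypothetical name, and the normalization (trimming whitespace and ignoring case) is an assumption rather than the behavior of AccScoreStr_Evaluator.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference (hypothetical helper).

    Assumes comparison after stripping surrounding whitespace and lowercasing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(
        pred.strip().lower() == ref.strip().lower()
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)
```

For example, `exact_match_accuracy([" A", "b", "C"], ["a", "B", "D"])` counts the first two pairs as matches and the third as a miss.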

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
opencompass/utils/datasets_info.py	Registers SeedBench metadata in dataset info
opencompass/datasets/__init__.py	Imports the new SeedBench dataset module
opencompass/datasets/SeedBench.py	Implements SeedBenchDataset and its evaluators
opencompass/configs/datasets/SeedBench/seedbench_gen_5d5ea1.py	Provides configuration for SeedBench evaluation
opencompass/configs/datasets/SeedBench/seedbench_gen.py	Reads base configuration for SeedBench datasets
opencompass/configs/datasets/SeedBench/README.md	Documents the SeedBench dataset details
dataset-index.yml	Adds SeedBench entry for dataset indexing

Comments suppressed due to low confidence (2)

opencompass/datasets/SeedBench.py:305

[nitpick] The evaluator class name 'AccScoreStr_Evaluator' is inconsistent with its base class naming convention. Consider renaming it to 'AccScoreStrEvaluator' to maintain clarity and consistency.

class AccScoreStr_Evaluator(AccScoreStrEvaluator):

opencompass/utils/datasets_info.py:233

[nitpick] The dataset key 'opencompass/seedbench' uses lowercase while the corresponding module file is named 'SeedBench.py'. Ensure consistent casing across modules and identifiers to avoid potential issues on case-sensitive systems.

"opencompass/seedbench": {

[Dataset] Add SeedBench Dataset

dae700e

mm-assistant bot assigned bittersweet1999 Apr 14, 2025

ChenZiHong-Gavin and others added 7 commits April 14, 2025 19:51

docs: add README for SeedBench

f9b1636

refactor: delete unnecessary comment

db04df7

Merge branch 'main' into SeedBench

8000375

fix: fix load function for SeedBenchDataset

c9ea024

Merge branch 'open-compass:main' into SeedBench

cbfac1e

fix: delete unnecessary code

39b34d6

Merge branch 'SeedBench' of https://github.com/ChenZiHong-Gavin/openc…

332acdf

…ompass into SeedBench

ChenZiHong-Gavin marked this pull request as ready for review April 15, 2025 06:27

ChenZiHong-Gavin and others added 2 commits April 15, 2025 15:34

fix: fix typo

e335b29

Merge branch 'main' into SeedBench

2ded84a

ChenZiHong-Gavin temporarily deployed to prod April 24, 2025 11:31 — with GitHub Actions Inactive

tonysy requested review from Myhs-phz, Copilot and MaiziXiao April 24, 2025 11:31

Copilot AI reviewed Apr 24, 2025


Merge branch 'main' into SeedBench

d26e808
