[Dataset] Add SeedBench Dataset #2020

Open: 11 commits to merge into base: main

@ChenZiHong-Gavin commented Apr 14, 2025

Motivation

This PR introduces a new domain-specific benchmark dataset, SeedBench, which is the first multi-task benchmark designed to evaluate large language models (LLMs) in seed science, focusing on seed breeding.

Modification

Added a new dataset class, SeedBenchDataset, and implemented evaluator classes such as F1Evaluator in opencompass/datasets/SeedBench.py.

Added the configuration files seedbench_gen_5d5ea1.py and seedbench_gen.py, plus a README.md, in configs/datasets/SeedBench/.

Registered the dataset in datasets/__init__.py.

Updated datasets_info.py with dataset metadata.

Updated dataset-index.yml with dataset metadata.
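
For readers unfamiliar with the kind of metric listed above, here is a minimal sketch of a token-level F1 score. It is not the code from this PR: the function name `token_f1`, the lowercase normalization, and the whitespace tokenization are all assumptions made for illustration, and the evaluator in opencompass/datasets/SeedBench.py may compute F1 differently.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (hypothetical helper, not from the PR)."""
    # Assumption: lowercase and split on whitespace to get tokens.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

With this definition, a prediction that shares one of its two tokens with a two-token reference has precision and recall of 0.5 each, so the score is 0.5.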

BC-breaking (Optional)

No backward compatibility breaking changes introduced.

Use cases (Optional)

SeedBench assesses LLMs across three core seed breeding stages:

  • Gene Information Retrieval
  • Gene Function and Regulation Analysis
  • Variety Breeding with Agronomic Trait Optimization

Built with domain experts, SeedBench features 2,264 expert-validated questions across 11 task types and 10 subcategories, initially targeting rice breeding. Future updates will include other crops like maize, soybean, and wheat.

Following the instructions, we can run an evaluation on SeedBench with:

DATASET_SOURCE=ModelScope python run.py --hf-type chat --hf-path Qwen/Qwen2.5-0.5B-Instruct --datasets seedbench_gen --debug

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Tested on ModelScope and local environment.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
  • CLA has been signed and all committers have signed the CLA in this PR.

@tpoisonooo
Contributor

cc @tonysy

@ChenZiHong-Gavin marked this pull request as ready for review on April 15, 2025, 06:27

@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR adds a new domain-specific benchmark dataset, SeedBench, for evaluating LLMs in seed science and breeding.

  • Introduces the SeedBenchDataset and multiple evaluators (F1ScoreEvaluator, AverageRougeScoreEvaluator, AccScoreStr_Evaluator) in opencompass/datasets/SeedBench.py.
  • Adds a new dataset configuration along with corresponding documentation and metadata updates in datasets_info.py, dataset-index.yml, and configs/datasets/SeedBench/.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| opencompass/utils/datasets_info.py | Registers SeedBench metadata in dataset info |
| opencompass/datasets/__init__.py | Imports the new SeedBench dataset module |
| opencompass/datasets/SeedBench.py | Implements SeedBenchDataset and its evaluators |
| opencompass/configs/datasets/SeedBench/seedbench_gen_5d5ea1.py | Provides configuration for SeedBench evaluation |
| opencompass/configs/datasets/SeedBench/seedbench_gen.py | Reads base configuration for SeedBench datasets |
| opencompass/configs/datasets/SeedBench/README.md | Documents the SeedBench dataset details |
| dataset-index.yml | Adds SeedBench entry for dataset indexing |
Comments suppressed due to low confidence (2)

opencompass/datasets/SeedBench.py:305

  • [nitpick] The evaluator class name 'AccScoreStr_Evaluator' is inconsistent with its base class naming convention. Consider renaming it to 'AccScoreStrEvaluator' to maintain clarity and consistency.
class AccScoreStr_Evaluator(AccScoreStrEvaluator):
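
One point worth noting about this suggestion: the proposed name, AccScoreStrEvaluator, is identical to the base class named on the line above, so a literal rename would shadow the base class. A rough sketch of one way around that follows. The name `SeedBenchAccEvaluator` is hypothetical, and the base class here is a simplified stand-in whose `score` method is an assumption, not the actual OpenCompass implementation.

```python
class AccScoreStrEvaluator:
    """Simplified stand-in for the base class (assumption for this sketch)."""

    def score(self, predictions, references):
        # Exact-match accuracy as a percentage; guards against empty input.
        correct = sum(p == r for p, r in zip(predictions, references))
        return {"accuracy": 100 * correct / max(len(references), 1)}


class SeedBenchAccEvaluator(AccScoreStrEvaluator):  # hypothetical new name
    """PascalCase subclass whose name does not collide with its base."""


# Keep the old identifier importable so existing configs keep working.
AccScoreStr_Evaluator = SeedBenchAccEvaluator
```

Keeping the old name as an alias means any configuration file that already imports AccScoreStr_Evaluator would continue to resolve to the same class.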

opencompass/utils/datasets_info.py:233

  • [nitpick] The dataset key 'opencompass/seedbench' uses lowercase while the corresponding module file is named 'SeedBench.py'. Ensure consistent casing across modules and identifiers to avoid potential issues on case-sensitive systems.
"opencompass/seedbench": {
