[Dataset] Add SeedBench Dataset #2020
cc @tonysy
Pull Request Overview
This PR adds a new domain-specific benchmark dataset, SeedBench, for evaluating LLMs in seed science and breeding.
- Introduces the SeedBenchDataset and multiple evaluators (F1ScoreEvaluator, AverageRougeScoreEvaluator, AccScoreStr_Evaluator) in opencompass/datasets/SeedBench.py.
- Adds a new dataset configuration along with corresponding documentation and metadata updates in datasets_info.py, dataset-index.yml, and configs/datasets/SeedBench/.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
File | Description |
---|---|
opencompass/utils/datasets_info.py | Registers SeedBench metadata in dataset info |
opencompass/datasets/__init__.py | Imports the new SeedBench dataset module |
opencompass/datasets/SeedBench.py | Implements SeedBenchDataset and its evaluators |
opencompass/configs/datasets/SeedBench/seedbench_gen_5d5ea1.py | Provides configuration for SeedBench evaluation |
opencompass/configs/datasets/SeedBench/seedbench_gen.py | Reads base configuration for SeedBench datasets |
opencompass/configs/datasets/SeedBench/README.md | Documents the SeedBench dataset details |
dataset-index.yml | Adds SeedBench entry for dataset indexing |
Comments suppressed due to low confidence (2)
opencompass/datasets/SeedBench.py:305
- [nitpick] The evaluator class name 'AccScoreStr_Evaluator' is inconsistent with its base class naming convention. Consider renaming it to 'AccScoreStrEvaluator' to maintain clarity and consistency.
class AccScoreStr_Evaluator(AccScoreStrEvaluator):
opencompass/utils/datasets_info.py:233
- [nitpick] The dataset key 'opencompass/seedbench' uses lowercase while the corresponding module file is named 'SeedBench.py'. Ensure consistent casing across modules and identifiers to avoid potential issues on case-sensitive systems.
"opencompass/seedbench": {
Motivation
This PR introduces a new domain-specific benchmark dataset, SeedBench, which is the first multi-task benchmark designed to evaluate large language models (LLMs) in seed science, focusing on seed breeding.
Modification
- Added a new dataset class, SeedBenchDataset, and implemented metric evaluators such as F1ScoreEvaluator in opencompass/datasets/SeedBench.py.
- Added the configuration files seedbench_gen_5d5ea1.py and seedbench_gen.py, plus README.md, in configs/datasets/SeedBench/.
- Registered the dataset in datasets/__init__.py.
- Updated datasets_info.py with dataset metadata.
- Updated dataset-index.yml with dataset metadata.
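
For readers unfamiliar with F1-style scoring, here is a minimal sketch of what such an evaluator might compute. It is not the code from opencompass/datasets/SeedBench.py: the class name `SketchF1Evaluator`, the helper `token_f1`, whitespace tokenization, and the 0-100 scaling are all assumptions made for illustration. The `score(predictions, references)` method returning a dict mirrors the shape OpenCompass evaluators conventionally expose.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (hypothetical helper).

    Assumes lowercase, whitespace-separated tokens; the real evaluator
    may normalize or tokenize differently.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


class SketchF1Evaluator:
    """Hypothetical stand-in, not the PR's F1ScoreEvaluator."""

    def score(self, predictions, references):
        if len(predictions) != len(references):
            return {'error': 'predictions and references have different lengths'}
        if not predictions:
            return {'f1': 0.0}
        scores = [token_f1(p, r) for p, r in zip(predictions, references)]
        # Report the mean F1 on a 0-100 scale (an assumption).
        return {'f1': 100 * sum(scores) / len(scores)}
```

Calling `SketchF1Evaluator().score(predictions, references)` with two equal-length lists of strings returns a dict with a single `f1` entry; mismatched lengths yield an `error` entry instead.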
BC-breaking (Optional)
No backward compatibility breaking changes introduced.
Use cases (Optional)
SeedBench assesses LLMs across three core seed breeding stages:
Built with domain experts, SeedBench features 2,264 expert-validated questions across 11 task types and 10 subcategories, initially targeting rice breeding. Future updates will include other crops like maize, soybean, and wheat.
Following the instruction, we can evaluate with SeedBench using:
Checklist
Before PR:
After PR: