This repository contains the companion code for the AgEval benchmark, which focuses on plant stress identification, classification, and quantification. It covers the 12 datasets used in the benchmark.
The main components of this repository are:
- `inference.py`: Script for evaluating models on the AgEval benchmark datasets.
- `data_loader.py`: Functions for downloading and preparing the benchmark datasets.
To replicate the results presented in the paper, run `inference.py` to evaluate no-context or few-shot in-context learning on the datasets.

The `inference.py` script contains:
- Implementation of multiple AI models (OpenAI, Anthropic, Google, OpenRouter)
- Functions for asynchronous processing to improve performance (a minimal pattern is sketched after this list)
- Progress tracking using tqdm
- Result saving in CSV format
- Evaluation of no-context and few-shot in-context learning
- Customizable number of shots for in-context learning
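As a rough illustration of how the asynchronous requests, tqdm progress bar, and CSV output listed above can fit together, here is a minimal sketch; `classify_image`, the model name, and the file names are hypothetical and do not reflect the actual `inference.py` interface:

```python
# Illustrative sketch only: classify_image and the model name are hypothetical,
# not the actual inference.py API.
import asyncio
import csv

from tqdm.asyncio import tqdm_asyncio

async def classify_image(model_name: str, image_path: str, shots: int = 0) -> dict:
    # Placeholder for an API call to one of the evaluated vision-language models.
    await asyncio.sleep(0)  # stands in for network latency
    return {"model": model_name, "image": image_path, "shots": shots, "prediction": "healthy"}

async def run_evaluation(image_paths, model_name="example-model", shots=0, out_csv="results.csv"):
    tasks = [classify_image(model_name, path, shots) for path in image_paths]
    # Gather responses concurrently while tqdm displays a progress bar.
    results = await tqdm_asyncio.gather(*tasks)
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["model", "image", "shots", "prediction"])
        writer.writeheader()
        writer.writerows(results)

asyncio.run(run_evaluation(["leaf_001.jpg", "leaf_002.jpg"], shots=2))
```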
Supported models:
- GPT-4 (OpenAI)
- Claude-3.5-sonnet (Anthropic)
- Claude-3-haiku (Anthropic)
- LLaVA v1.6 34B (OpenRouter)
- Gemini-flash-1.5 (Google)
- Gemini-pro-1.5 (Google)
The `data_loader.py` script provides functions to download and prepare the 12 AgEval benchmark datasets. Features include:
- Dataset-specific loading functions
- Automatic downloading from Kaggle or Zenodo if not present
- Extraction and renaming of files in the `/data` folder
- Random sampling with a fixed seed for reproducibility (a minimal sketch follows this list)
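The fixed-seed, per-class sampling mentioned above might look roughly like the following; `sample_per_class` is a hypothetical helper for illustration, not a function from `data_loader.py`:

```python
# Sketch only: sample_per_class is a hypothetical helper, not part of data_loader.py.
import random
from pathlib import Path

def sample_per_class(data_dir: str, samples_per_class: int = 100, seed: int = 42):
    random.seed(seed)  # fixed seed keeps the sampled files identical across runs
    selected = []
    for class_dir in sorted(Path(data_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        images = sorted(class_dir.glob("*.jpg"))
        k = min(samples_per_class, len(images))
        selected.extend(random.sample(images, k))
    return selected
```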
The 12 benchmark datasets are:
- Durum Wheat Dataset
- Soybean Seeds Dataset
- Mango Leaf Disease Dataset
- DeepWeeds Dataset
- Bean Leaf Lesions Dataset
- Yellow Rust 19 Dataset
- FUSARIUM 22 Dataset
- PlantDoc (Leaf Disease Segmentation)
- Dangerous Insects Dataset
- IDC (Iron Deficiency Chlorosis) Dataset
- Soybean Diseases Dataset (PNAS)
- InsectCount Dataset
Each dataset loading function accepts a `total_samples_to_check` parameter (default: 100) that specifies the number of samples per class to evaluate:

```python
from data_loader import load_and_prepare_data_DurumWheat

samples, classes, dataset_name = load_and_prepare_data_DurumWheat(total_samples_to_check=50)
```
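Assuming the remaining loaders follow the same naming pattern (an assumption; check `data_loader.py` for the exact function names), the same argument can be passed to each loader in a loop:

```python
# load_and_prepare_data_SoybeanSeeds is an assumed name following the same pattern;
# verify the actual loader names in data_loader.py before running.
from data_loader import (
    load_and_prepare_data_DurumWheat,
    load_and_prepare_data_SoybeanSeeds,
)

for loader in (load_and_prepare_data_DurumWheat, load_and_prepare_data_SoybeanSeeds):
    samples, classes, dataset_name = loader(total_samples_to_check=50)
    print(f"Prepared {dataset_name}")
```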
- The scripts will skip downloading datasets that already exist in the `/data` folder.
- Evaluation results are saved in the `/results` folder, organized by model name and dataset.
- For detailed information on each dataset and the evaluation process, please refer to the AgEval benchmark paper.
For more detailed information about the implementation, please refer to the comments in the source code files.