This repository contains the companion code for the AgEval benchmark, which focuses on plant stress identification, classification, and quantification. It covers the 12 datasets used in the benchmark.
The main components of this repository are:
- `inference.py`: Script for evaluating models on the AgEval benchmark datasets.
- `data_loader.py`: Functions for downloading and preparing the benchmark datasets.
To replicate the results presented in the paper, run `inference.py` to evaluate no-context or few-shot in-context learning on the datasets.

The `inference.py` script contains:
- Implementation of multiple AI models (OpenAI, Anthropic, Google, OpenRouter)
- Functions for asynchronous processing to improve performance (a minimal pattern is sketched after this list)
- Progress tracking using tqdm
- Result saving in CSV format
- Evaluation of no-context and few-shot in-context learning
- Customizable number of shots for in-context learning
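As a rough illustration of how the asynchronous requests, tqdm progress bar, and CSV output listed above can fit together, here is a minimal sketch; `classify_image`, the model name, and the file names are hypothetical and do not reflect the actual `inference.py` interface:

```python
# Illustrative sketch only: classify_image and the model name are hypothetical,
# not the actual inference.py API.
import asyncio
import csv

from tqdm.asyncio import tqdm_asyncio

async def classify_image(model_name: str, image_path: str, shots: int = 0) -> dict:
    # Placeholder for an API call to one of the evaluated vision-language models.
    await asyncio.sleep(0)  # stands in for network latency
    return {"model": model_name, "image": image_path, "shots": shots, "prediction": "healthy"}

async def run_evaluation(image_paths, model_name="example-model", shots=0, out_csv="results.csv"):
    tasks = [classify_image(model_name, path, shots) for path in image_paths]
    # Gather responses concurrently while tqdm displays a progress bar.
    results = await tqdm_asyncio.gather(*tasks)
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["model", "image", "shots", "prediction"])
        writer.writeheader()
        writer.writerows(results)

asyncio.run(run_evaluation(["leaf_001.jpg", "leaf_002.jpg"], shots=2))
```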
Supported models:
- GPT-4 (OpenAI)
- Claude-3.5-sonnet (Anthropic)
- Claude-3-haiku (Anthropic)
- LLaVA v1.6 34B (OpenRouter)
- Gemini-flash-1.5 (Google)
- Gemini-pro-1.5 (Google)
The `data_loader.py` script provides functions to download and prepare the 12 AgEval benchmark datasets. Features include:
- Dataset-specific loading functions
- Automatic downloading from Kaggle or Zenodo if not present
- Extraction and renaming of files in the `/data` folder
- Random sampling with a fixed seed for reproducibility (a minimal sketch follows this list)
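The fixed-seed, per-class sampling mentioned above might look roughly like the following; `sample_per_class` is a hypothetical helper for illustration, not a function from `data_loader.py`:

```python
# Sketch only: sample_per_class is a hypothetical helper, not part of data_loader.py.
import random
from pathlib import Path

def sample_per_class(data_dir: str, samples_per_class: int = 100, seed: int = 42):
    random.seed(seed)  # fixed seed keeps the sampled files identical across runs
    selected = []
    for class_dir in sorted(Path(data_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        images = sorted(class_dir.glob("*.jpg"))
        k = min(samples_per_class, len(images))
        selected.extend(random.sample(images, k))
    return selected
```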
The 12 benchmark datasets are:
- Durum Wheat Dataset
- Soybean Seeds Dataset
- Mango Leaf Disease Dataset
- DeepWeeds Dataset
- Bean Leaf Lesions Dataset
- Yellow Rust 19 Dataset
- FUSARIUM 22 Dataset
- PlantDoc (Leaf Disease Segmentation)
- Dangerous Insects Dataset
- IDC (Iron Deficiency Chlorosis) Dataset
- Soybean Diseases Dataset (PNAS)
- InsectCount Dataset
Each dataset loading function accepts a `total_samples_to_check` parameter (default: 100) that specifies the number of samples per class to evaluate:

```python
from data_loader import load_and_prepare_data_DurumWheat

samples, classes, dataset_name = load_and_prepare_data_DurumWheat(total_samples_to_check=50)
```
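Assuming the remaining loaders follow the same naming pattern (an assumption; check `data_loader.py` for the exact function names), the same argument can be passed to each loader in a loop:

```python
# load_and_prepare_data_SoybeanSeeds is an assumed name following the same pattern;
# verify the actual loader names in data_loader.py before running.
from data_loader import (
    load_and_prepare_data_DurumWheat,
    load_and_prepare_data_SoybeanSeeds,
)

for loader in (load_and_prepare_data_DurumWheat, load_and_prepare_data_SoybeanSeeds):
    samples, classes, dataset_name = loader(total_samples_to_check=50)
    print(f"Prepared {dataset_name}")
```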
- The scripts will skip downloading datasets that already exist in the `/data` folder.
- Evaluation results are saved in the `/results` folder, organized by model name and dataset.
- For detailed information on each dataset and the evaluation process, please refer to the AgEval benchmark paper.
For more detailed information about the implementation, please refer to the comments in the source code files.