This repository contains the development framework for the BioDCASE-Tiny 2025 competition (Task 3), focusing on TinyML implementation for bird species recognition on the ESP32-S3-Korvo-2 development board.
For complete competition details, visit the official BioDCASE 2025 Task 3 website.
BioDCASE-Tiny is a competition for developing efficient machine learning models for bird audio recognition that can run on resource-constrained embedded devices. The project uses the ESP32-S3-Korvo-2 development board, which offers audio processing capabilities in a small form factor suitable for field deployment.
- Setup and Installation
- Usage
- Dataset
- Development
- ESP32-S3-Korvo-2 Development Board
- Code Structure
- Development Tips
- Evaluation Metrics
- License
- Citation
- Funding
- Partners
- Python 3.11+ with pip and venv
- Docker for ESP-IDF environment
- USB cable and ESP32-S3-Korvo-2 development board
- Clone the repository:
git clone https://github.com/birdnet-team/BioDCASE-Tiny-2025.git
cd BioDCASE-Tiny-2025
- Create a virtual environment (recommended):
python3 -m venv .venv
source .venv/bin/activate
- Install Python dependencies:
pip install -e .
- Set your serial device port in pipeline_config.yaml:
embedded_code_generation:
  serial_device: <YOUR_DEVICE>
As the required tflite-micro package is not easily available for Windows, we recommend using WSL to run this project.
To make your device accessible for WSL you can use this guide: https://learn.microsoft.com/en-us/windows/wsl/connect-usb
To determine your serial device port you can use the following command:
dmesg | grep tty
You might also need to grant your user permission to access the serial port before deploying:
sudo adduser $USER dialout
sudo chmod a+rw $SERIAL_PORT
- Modify model.py with your architecture, making sure to compile it with an optimizer and a loss (see the sketch below)
- Modify the training loop in the same file, if needed
- Adjust the feature extraction parameters in pipeline_config.yaml
- Run biodcase.py
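For reference, a minimal sketch of what model.py could contain is shown below. The function name, input shape, and layer sizes are placeholders rather than the framework's actual interface, and must be adapted to your feature extraction settings in pipeline_config.yaml:

```python
# Minimal, hypothetical sketch of a compact Keras model; the input shape and
# layer sizes are placeholders and must match your feature extraction output.
import tensorflow as tf

def build_model(input_shape=(40, 32, 1)):  # hypothetical (mel_bands, frames, channels)
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(8, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # Yellowhammer vs. negative
    ])
    # The pipeline expects a compiled model, so attach an optimizer and a loss here.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.AUC(curve="PR", name="average_precision")],
    )
    return model
```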
The BioDCASE-Tiny 2025 competition uses a specialized dataset of Yellowhammer bird vocalizations. Key features include:
- 2+ hours of audio recordings
- Songs from multiple individuals recorded at various distances (6.5m to 200m)
- Recordings in different environments (forest and grassland)
- Includes negative samples (other bird species and background noise)
- Dataset is split into training, validation, and evaluation sets
The training set contains recordings from 8 individuals, while the validation set contains recordings from 2 individuals. An additional 2 individuals are reserved for the final evaluation.
The dataset is organized as follows:
Development_Set/
├── Training_Set/
│   ├── Yellowhammer/
│   │   └── *.wav (Yellowhammer vocalizations)
│   └── Negatives/
│       └── *.wav (Other species and background noise)
└── Validation_Set/
    ├── Yellowhammer/
    │   └── *.wav (Yellowhammer vocalizations)
    └── Negatives/
        └── *.wav (Other species and background noise)
- Yellowhammer/ - Contains target species vocalizations with filenames following the format YH_SongID_Location_Distance.wav
  - SongID: 3-digit identifier for each song
  - Location: "forest", "grassland", or "speaker" for original recordings
  - Distance: A (original) through H (farthest at ~200m)
- Negatives/ - Contains negative samples with filenames following the format Type_ID.wav
  - Type: "Background" for noise or "bird" for non-target species vocalizations
  - ID: File identifier
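If you want to analyze results by recording condition (for example, performance per distance class), the metadata encoded in these filenames can be parsed with a few lines of Python. The helper below is purely illustrative and not part of the framework:

```python
# Illustrative helper (not part of the framework) for reading the metadata
# encoded in the dataset filenames.
from pathlib import Path

def parse_clip_name(path):
    parts = Path(path).stem.split("_")
    if parts[0] == "YH":  # e.g. YH_001_forest_C.wav
        return {"label": "yellowhammer", "song_id": parts[1],
                "location": parts[2], "distance": parts[3]}
    # e.g. Background_012.wav or bird_034.wav
    return {"label": "negative", "type": parts[0], "id": parts[1]}

print(parse_clip_name("YH_001_grassland_H.wav"))
print(parse_clip_name("bird_034.wav"))
```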
Download the dataset from the BioDCASE-Tiny 2025 Dataset on Zenodo: https://doi.org/10.5281/zenodo.15228365
After downloading, paste the folders into /data/01_raw/clips
To run the complete pipeline, execute:
python biodcase.py
This will preprocess the data, extract the features, train the model, and deploy it to your board.
Once deployed, benchmarking code on the ESP32-S3 reports the runtime performance of the preprocessing steps and of the model itself via the serial monitor.
The steps of the pipeline can also be executed individually:
- Data Preprocessing: python data_preprocessing.py
- Feature Extraction: python feature_extraction.py
- Model Training: python model_training.py
- Deployment: python embedded_code_generation.py
The data processing pipeline follows these steps:
- Raw audio files are read and preprocessed
- Features are extracted according to the configuration in pipeline_config.yaml (see the sketch below)
- The dataset is split into training/validation/testing sets
- Features are used for model training
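To give an idea of what the feature extraction step produces, the sketch below computes a log-mel spectrogram with TensorFlow. It is only an illustration: the actual implementation lives in feature_extraction.py, and the parameter values here (sample rate, frame length, number of mel bands) are placeholders rather than the defaults from pipeline_config.yaml.

```python
# Illustrative log-mel feature extraction; parameter values are placeholders,
# not the framework defaults from pipeline_config.yaml.
import tensorflow as tf

def log_mel_spectrogram(waveform, sample_rate=16000, frame_length=400,
                        frame_step=160, num_mel_bins=40):
    stft = tf.signal.stft(waveform, frame_length=frame_length, frame_step=frame_step)
    magnitude = tf.abs(stft)
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=num_mel_bins,
        num_spectrogram_bins=magnitude.shape[-1],
        sample_rate=sample_rate)
    mel = tf.matmul(magnitude, mel_matrix)
    return tf.math.log(mel + 1e-6)

features = log_mel_spectrogram(tf.random.normal([16000]))  # 1 s of dummy audio
print(features.shape)  # (frames, num_mel_bins)
```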
The model training process is managed in model_training.py. You can customize:
- The model architecture in model.py and, optionally, the training loop
- Training hyperparameters in pipeline_config.yaml
- Feature extraction parameters to optimize model input
To deploy your model to the ESP32-S3-Korvo-2 board, you'll use the built-in deployment tools that handle model conversion, code generation, and flashing. The deployment process:
- Converts your trained Keras model to TensorFlow Lite format optimized for the ESP32-S3
- Packages your feature extraction configuration for embedded use
- Generates C++ code that integrates with the ESP-IDF framework
- Compiles the firmware using Docker-based ESP-IDF toolchain
- Flashes the compiled firmware to your connected ESP32-S3-Korvo-2 board
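The Keras-to-TFLite conversion in the first step is handled for you, but conceptually it corresponds to something like the sketch below; the exact quantization settings used by the pipeline may differ:

```python
# Conceptual sketch of Keras-to-TFLite int8 conversion; the framework's actual
# converter settings may differ.
import numpy as np
import tensorflow as tf

def convert_to_tflite(keras_model, representative_features):
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    def representative_dataset():
        # Feed a few real feature examples so activations can be calibrated.
        for sample in representative_features[:100]:
            yield [np.expand_dims(sample, 0).astype(np.float32)]

    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()

# Example usage (names are placeholders):
# tflite_bytes = convert_to_tflite(model, validation_features)
# open("model.tflite", "wb").write(tflite_bytes)
```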
The ESP32-S3-Korvo-2 board features:
- ESP32-S3 dual-core processor
- Built-in microphone array
- Audio codec for high-quality audio processing
- Wi-Fi and Bluetooth connectivity
- USB-C connection for programming and debugging
- biodcase.py - Main execution pipeline
- model.py - Define your model architecture
- feature_extraction.py - Audio feature extraction implementations
- embedded_code_generation.py - ESP32 code generation utilities
- biodcase_tiny/embedded/esp_target.py - ESP target definition and configuration
- biodcase_tiny/embedded/firmware/main - Firmware source code
The codebase includes performance benchmarking tools that measure:
- Feature extraction time
- Model inference time
- Memory usage on the target device
- Feature Extraction Parameters: Carefully tune the feature extraction parameters in pipeline_config.yaml for your specific audio dataset.
- Model Size: Keep your model compact. The ESP32-S3 has limited memory, so optimize your architecture accordingly.
- Profiling: Use the profiling tools to identify bottlenecks in your implementation.
- Memory Management: Be mindful of memory allocation on the ESP32. Monitor the allocations reported by the firmware.
- Docker Environment: The toolchain uses Docker to provide a consistent ESP-IDF environment, making it easier to build on any host system.
The BioDCASE-Tiny competition evaluates models based on multiple criteria:
- Average Precision: the average value of precision across all recall levels from 0 to 1
- Model Size: TFLite model file size (KB)
- Inference Time: average time required for a single audio classification, including feature extraction (ms)
- Peak Memory Usage: maximum RAM usage during inference (KB)
Participants will be ranked separately for each of the evaluation criteria.
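For a quick local sanity check, average precision can be computed offline from your model's validation scores, for example with scikit-learn; this snippet is illustrative and not the official evaluation script:

```python
# Illustrative offline average-precision check; not the official evaluation script.
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([1, 0, 1, 1, 0, 0])                 # 1 = Yellowhammer, 0 = negative
y_score = np.array([0.9, 0.2, 0.65, 0.4, 0.55, 0.1])  # model output probabilities
print(f"Average precision: {average_precision_score(y_true, y_score):.3f}")
```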
This project is licensed under the Apache License 2.0 - see the license headers in individual files for details.
If you use the BioDCASE-Tiny framework or dataset in your research, please cite the following:
@misc{biodcase_tiny_2025,
  author = {Carmantini, Giovanni and Förstner, Friedrich and Isik, Can and Kahl, Stefan},
  title = {BioDCASE-Tiny 2025: A Framework for Bird Species Recognition on Resource-Constrained Hardware},
  year = {2025},
  institution = {Cornell University and Chemnitz University of Technology},
  type = {Software},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/birdnet-team/BioDCASE-Tiny-2025}}
}
@dataset{yellowhammer_dataset_2025,
  author = {Morandi, Ilaria and Linhart, Pavel and Kwak, Minkyung and Petrusková, Tereza},
  title = {BioDCASE 2025 Task 3: Bioacoustics for Tiny Hardware Development Set},
  year = {2025},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.15228365},
  url = {https://doi.org/10.5281/zenodo.15228365}
}
This project is supported by Jake Holshuh (Cornell class of '69) and The Arthur Vining Davis Foundations. Our work in the K. Lisa Yang Center for Conservation Bioacoustics is made possible by the generosity of K. Lisa Yang to advance innovative conservation technologies to inspire and inform the conservation of wildlife and habitats.
The development of BirdNET is supported by the German Federal Ministry of Education and Research through the project “BirdNET+” (FKZ 01|S22072). The German Federal Ministry for the Environment, Nature Conservation and Nuclear Safety contributes through the “DeepBirdDetect” project (FKZ 67KI31040E). In addition, the Deutsche Bundesstiftung Umwelt supports BirdNET through the project “RangerSound” (project 39263/01).
BirdNET is a joint effort of partners from academia and industry. Without these partnerships, this project would not have been possible. Thank you!