Quality Metrics
- The LibriParty validation dataset contains 4 hours of audio;
- The AVA Spoken Activity Datasets validation set also contains 4 hours of audio;
Modern Voice Activity Detectors output a speech probability (a float between 0 and 1) for an audio chunk of a desired length using:
- a pre-trained model;
- some function of its state or an internal buffer;
The threshold is a user-selected value that determines whether an audio chunk contains speech. If the speech probability of a chunk is higher than the threshold, we assume the chunk contains speech. Depending on the desired result, the threshold should be tuned for a specific dataset or domain.
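As a minimal sketch of this chunk-level thresholding (the `vad_model` callable, the helper name, and the chunk size are illustrative assumptions, not the repo's exact API):

```python
import torch

SAMPLE_RATE = 16000
CHUNK_SIZE = 512          # chunk length in samples, matching the parameters quoted below
THRESHOLD = 0.5           # user-selected; tune per dataset or domain

def detect_speech_chunks(audio: torch.Tensor, vad_model, threshold: float = THRESHOLD):
    """Split audio into fixed-size chunks and flag each one as speech / non-speech.

    `vad_model` is assumed to be a callable returning a speech probability
    (a float in [0, 1]) for a single chunk, e.g. a pre-trained streaming VAD.
    """
    flags = []
    for start in range(0, len(audio) - CHUNK_SIZE + 1, CHUNK_SIZE):
        chunk = audio[start:start + CHUNK_SIZE]
        prob = float(vad_model(chunk, SAMPLE_RATE))  # speech probability for this chunk
        flags.append(prob >= threshold)              # above threshold -> treat as speech
    return flags
```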
ROC-AUC score
SpeechBrain VAD does not support streaming (it looks at the entire audio context), so we did not include it in the precision-recall curves.
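For reference, the ROC-AUC scores reported below can be computed from per-chunk probabilities and ground-truth labels, for example with scikit-learn (the toy arrays here are hypothetical):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve

# Hypothetical per-chunk ground truth (1 = speech) and VAD probabilities.
labels = np.array([0, 0, 1, 1, 1, 0, 1, 0])
probs = np.array([0.10, 0.30, 0.80, 0.95, 0.60, 0.20, 0.90, 0.40])

# ROC-AUC is threshold-free: it measures how well the probabilities rank
# speech chunks above non-speech chunks across all possible thresholds.
print("ROC-AUC:", roc_auc_score(labels, probs))

# Precision-recall curves sweep the threshold instead of fixing it.
precision, recall, thresholds = precision_recall_curve(labels, probs)
```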
| Model | AVA | LibriParty | Streaming |
|---|---|---|---|
| Silero v3 (current), 16 kHz | 0.9 | 0.99 | ✔️ |
| Silero v3 (current), 8 kHz | 0.89 | 0.97 | ✔️ |
| Silero v2, 16 kHz | 0.87 | 0.93 | ✔️ |
| Silero v2, 8 kHz | 0.85 | 0.94 | ✔️ |
| SpeechBrain | 0.85 | 0.99 | ❌ |
| WebRTC | 0.66 | 0.81 | ✔️ |
| Picovoice | 0.88 | 0.97 | ✔️ |
Parameters: 16000 Hz sampling rate, 30 ms chunks (512 samples).
The WebRTC VAD algorithm is extremely fast and fairly good at separating noise from silence, but poor at separating speech from noise.
Picovoice VAD is good overall, but we were able to surpass it in quality (as of the end of 2022).
As the results above show, there was a large jump in the model's quality between v2 and v3.