Quality Metrics

Silero VAD quality metrics

Info

Test Dataset

Test Dataset statistics:

2200 audio files with an average duration of 7 seconds

30+ languages

varying speech audibility (from clear studio recordings to background speech)

varying noise levels

manual annotation of each whole audio file (whether it contains speech), not of short chunks

55% of the test audio files are annotated as speech

Probability

Modern voice activity detectors (VADs) output a speech probability (a float between 0 and 1) for an audio chunk of a desired length, using:

acoustic features extracted from the sound

parameters obtained during training

the state carried over from the previous chunk
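To make these three inputs concrete, here is a minimal toy sketch in Python. It is not Silero VAD's architecture: `ToyStatefulVAD` is a hypothetical class, RMS energy stands in for the acoustic features, the weights are made-up values standing in for trained parameters, and the previous chunk's logit plays the role of the carried-over state.

```python
import math


class ToyStatefulVAD:
    """Hypothetical stand-in for a streaming VAD (not Silero's model)."""

    def __init__(self, weight=40.0, bias=-2.0, state_weight=0.5):
        # Made-up values standing in for parameters a real model learns in training.
        self.weight = weight
        self.bias = bias
        self.state_weight = state_weight
        self.state = 0.0  # logit of the previous chunk

    def reset(self):
        # Call between independent audio files so state does not leak across them.
        self.state = 0.0

    def __call__(self, chunk):
        # Acoustic feature: root-mean-square energy of the chunk samples.
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        # Combine the feature, the "trained" parameters and the previous state.
        logit = self.weight * rms + self.bias + self.state_weight * self.state
        self.state = logit
        # The sigmoid maps the logit to a speech probability in (0, 1).
        return 1.0 / (1.0 + math.exp(-logit))


vad = ToyStatefulVAD()
silence = [0.0] * 512
loud = [0.2, -0.2] * 256  # 512 samples with RMS 0.2
probs = [vad(c) for c in (silence, loud, loud, silence)]
print([round(p, 3) for p in probs])  # [0.119, 0.993, 1.0, 0.905]
```

Because of the carried-over state, the final silent chunk still receives a high probability right after the loud chunks.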

Threshold

The threshold is a user-selectable value that separates speech from non-speech. If the speech probability of an audio chunk is higher than the threshold, we assume the chunk contains speech. Depending on the desired result, the threshold should be tuned for the specific dataset.
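A minimal sketch of applying a threshold to per-chunk probabilities; `binarize` is a hypothetical helper and the probabilities are made-up values. Following the text, a chunk counts as speech only if its probability is strictly higher than the threshold.

```python
def binarize(probs, threshold):
    """Mark each chunk as speech (True) if its probability exceeds the threshold."""
    return [p > threshold for p in probs]


chunk_probs = [0.05, 0.30, 0.55, 0.80, 0.92, 0.40]  # made-up example values
print(binarize(chunk_probs, 0.5))  # [False, False, True, True, True, False]
print(binarize(chunk_probs, 0.3))  # [False, False, True, True, True, True]
```

Lowering the threshold from 0.5 to 0.3 flips the last chunk (0.40) to speech, while the chunk at exactly 0.30 stays non-speech because the comparison is strict.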

Method

We assume that a VAD algorithm predicted speech for the whole audio file if its chunk predictions contain a run of consecutive probabilities above the threshold that lasts longer than 250 milliseconds.
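This whole-audio decision rule can be sketched as a run-length check. It is an illustration under assumptions, not the authors' evaluation code: `has_speech` is a hypothetical name, and the chunk duration is passed in milliseconds (512 samples at 16 kHz gives 32 ms).

```python
def has_speech(probs, threshold, chunk_ms, min_speech_ms=250.0):
    """Return True if some run of consecutive chunks with probability above
    `threshold` lasts longer than `min_speech_ms` milliseconds."""
    run_ms = 0.0
    for p in probs:
        if p > threshold:
            run_ms += chunk_ms
            if run_ms > min_speech_ms:
                return True
        else:
            run_ms = 0.0  # the run is broken; start counting again
    return False


CHUNK_MS = 512 * 1000 / 16000  # 32.0 ms per 512-sample chunk at 16 kHz
short_burst = [0.9] * 7 + [0.1] * 5  # 7 chunks = 224 ms above threshold
long_burst = [0.1] * 3 + [0.9] * 8   # 8 chunks = 256 ms above threshold
print(has_speech(short_burst, 0.5, CHUNK_MS))  # False
print(has_speech(long_burst, 0.5, CHUNK_MS))   # True
```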

The test method is therefore as follows:

Get raw model predictions (a sequence of speech probabilities between 0 and 1) for each audio file in the test set.
Apply different thresholds to the raw predictions to decide whether each whole audio file contains speech.
Calculate recall, precision, accuracy and zero-class recall (recall on non-speech audio) for each threshold.
Draw the precision-recall curve.
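Putting the steps together, a minimal sketch of the threshold sweep could look like the following. It assumes per-file lists of chunk probabilities and boolean whole-file labels; the helper names, the toy data, and the convention of reporting precision as 1.0 when nothing is predicted as speech are all assumptions. Zero-class recall is computed here as recall on the non-speech class (files labeled `False`).

```python
def has_speech(probs, threshold, chunk_ms, min_speech_ms=250.0):
    """True if some run of above-threshold chunks lasts longer than min_speech_ms."""
    run_ms = 0.0
    for p in probs:
        run_ms = run_ms + chunk_ms if p > threshold else 0.0
        if run_ms > min_speech_ms:
            return True
    return False


def metrics_at(predictions, labels, threshold, chunk_ms):
    """Whole-file metrics for one threshold.

    predictions: one list of per-chunk speech probabilities per audio file.
    labels: one bool per file, True if the file is annotated as speech.
    """
    tp = fp = tn = fn = 0
    for probs, is_speech in zip(predictions, labels):
        predicted = has_speech(probs, threshold, chunk_ms)
        if predicted and is_speech:
            tp += 1
        elif predicted:
            fp += 1
        elif is_speech:
            fn += 1
        else:
            tn += 1
    return {
        "threshold": threshold,
        # Assumed convention: precision is 1.0 when nothing is predicted as speech.
        "precision": tp / (tp + fp) if tp + fp else 1.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Recall on the non-speech ("zero") class.
        "zero_class_recall": tn / (tn + fp) if tn + fp else 0.0,
    }


CHUNK_MS = 512 * 1000 / 16000  # 32.0 ms per chunk at 16 kHz

# Toy data (made up): two speech files and two non-speech files.
predictions = [
    [0.9] * 10,             # speech, confident
    [0.6] * 10,             # speech, less confident
    [0.1] * 10,             # silence
    [0.7] * 3 + [0.1] * 7,  # non-speech with a 96 ms noise burst
]
labels = [True, True, False, False]

results = [metrics_at(predictions, labels, t, CHUNK_MS) for t in (0.05, 0.5, 0.8)]
for r in results:
    print(r)
```

The (recall, precision) pair computed for each threshold is one point of the precision-recall curve and can be passed to any plotting library. At threshold 0.5 the fourth file's 96 ms burst is too short to count as speech, so it does not become a false positive.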

Metrics

Vs other available solutions

Parameters: 16000 Hz sample rate, ~30 ms chunks (512 samples = 32 ms).

The WebRTC VAD algorithm is extremely fast and fairly good at separating noise from silence, but poor at separating speech from noise.

Picovoice VAD is good overall, but we were able to surpass it in quality.

[Figure: Silero VAD vs competitors]

Vs old Silero VAD

Parameters: 16000 Hz sample rate; ~100 ms chunks (1536 samples = 96 ms) for the new VAD models and 250 ms chunks (4000 samples) for the old one.
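The chunk sizes above translate into durations as `samples * 1000 / sample_rate`. The sketch below also computes how many consecutive above-threshold chunks are needed to be strictly longer than 250 ms, assuming the method's 250 ms rule is applied at each chunk size; the helper names are hypothetical.

```python
import math

SAMPLE_RATE = 16000  # Hz, as in the comparisons above


def chunk_ms(num_samples, sample_rate=SAMPLE_RATE):
    """Duration of a chunk in milliseconds."""
    return num_samples * 1000 / sample_rate


def chunks_to_exceed(min_ms, num_samples, sample_rate=SAMPLE_RATE):
    """Smallest number of consecutive chunks whose total duration is strictly
    longer than `min_ms` (the "longer than 250 ms" rule of the method)."""
    return math.floor(min_ms / chunk_ms(num_samples, sample_rate)) + 1


for n in (512, 1536, 4000):
    print(n, chunk_ms(n), chunks_to_exceed(250, n))
# 512 32.0 8
# 1536 96.0 3
# 4000 250.0 2
```

With 4000-sample chunks a single chunk lasts exactly 250 ms, so under a strict "longer than" reading two consecutive chunks are needed.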

As you can see, model quality improved substantially. The Silero Big model is not publicly available; please contact us if you are interested in it.

[Figure: Silero VAD vs old Silero VAD]

Sample rate comparison

[Figure: comparison across sample rates]

Chunk size comparison

[Figure: comparison across chunk sizes]
