Inference: pad very short signals before embedding them #14055
Conversation
The speaker embedding model crashes on very short signals, so we zero-pad the end of any signal shorter than 0.5 seconds before running it through the speaker embedding model. Signed-off-by: Fejgin, Roy <[email protected]>
LGTM. Left some minor comments.
```python
def extract_embedding(model, extractor, audio_path, device, sv_model_type):
    speech_array, sampling_rate = librosa.load(audio_path, sr=16000)

    # pad to 0.5 seconds as the extractor may not be able to handle very short signals
    speech_array = pad_audio_to_min_length(speech_array, int(sampling_rate), min_seconds=0.5)
```
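The body of `pad_audio_to_min_length` is not shown in this snippet; a minimal sketch of what it could look like, assuming a 1-D NumPy array and trailing zero-padding as the PR description states:

```python
import numpy as np

def pad_audio_to_min_length(speech_array, sampling_rate, min_seconds=0.5):
    """Zero-pad the end of a 1-D signal so it is at least `min_seconds` long.

    Signals that are already long enough are returned unchanged.
    """
    min_samples = int(min_seconds * sampling_rate)
    shortfall = min_samples - len(speech_array)
    if shortfall > 0:
        # append `shortfall` zeros at the end; the start is left untouched
        speech_array = np.pad(speech_array, (0, shortfall), mode="constant")
    return speech_array
```

With a 0.1-second signal at 16 kHz (1600 samples), this would return an 8000-sample array whose tail is all zeros.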
Has this been tested? Does this affect the final evaluation metrics in any way?
I tested that the padding works correctly, but I did not collect pre/post evaluation stats. This should only kick in very rarely, when the generated speech is shorter than 0.5 sec.
Following up on this: I ran libri_unseen_test with and without the padding fix and found no statistically significant differences in WER and SSIM.
Thanks @subhankar-ghosh. It had been set to auto-merge, so it got merged as soon as you approved, but I'll still look at your comments.
Looks good to me.