Update fastspeech2 model card #37377

Open
wants to merge 10 commits into base: main

Conversation

ricalanis
Contributor

#36979

  • Updated the FastSpeech2Conformer model card
  • Could not replicate the pipeline example from the original code: it raises a bug related to FastSpeech2ConformerConfig (AttributeError: 'FastSpeech2ConformerConfig' object has no attribute 'model_config'), so I left the example as is.
  • Did not add a terminal callout, as I could not find a direct way to do that.
  • Added a small section to use the combined model with the vocoder.
  • Did not add quantization, as the model architecture is not linear-layer heavy, so quantization would have limited impact (AFAIK).
  • This one was a bit more challenging; thank you for your patience!
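As a rough sanity check on the quantization point above, one can measure what fraction of a model's parameters live in nn.Linear layers, since weight-only quantization mostly targets those. This is a generic sketch using a toy conv-heavy stack as a stand-in for a convolution-module-dominated architecture, not the actual FastSpeech2Conformer model:

```python
import torch.nn as nn

def linear_param_fraction(model: nn.Module) -> float:
    """Fraction of parameters in nn.Linear layers -- a rough proxy for how
    much weight-only quantization could shrink the model."""
    linear = sum(
        p.numel()
        for m in model.modules()
        if isinstance(m, nn.Linear)
        for p in m.parameters()
    )
    total = sum(p.numel() for p in model.parameters())
    return linear / total if total else 0.0

# Toy conv-heavy stack: two Conv1d layers dominate the parameter count,
# so the linear fraction (and hence quantization headroom) is small.
toy = nn.Sequential(
    nn.Conv1d(80, 256, 3),
    nn.Conv1d(256, 256, 3),
    nn.Linear(256, 80),
)
print(f"{linear_param_fraction(toy):.2f}")
```

Loading the real checkpoint into this helper would give the actual figure; the toy stack just illustrates the shape of the argument.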

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests? N/A

@github-actions github-actions bot marked this pull request as draft April 8, 2025 21:14
Contributor

github-actions bot commented Apr 8, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@ricalanis ricalanis marked this pull request as ready for review April 8, 2025 21:54
@github-actions github-actions bot requested a review from stevhliu April 8, 2025 21:54
@Rocketknight1
Member

cc @stevhliu

Member

@stevhliu stevhliu left a comment


Thanks for your contribution! I've pinged our audio expert @eustlb to have a look at why the code examples are failing.

#### Convolution Module
![Convolution Module](https://d3i71xaburhd42.cloudfront.net/8809d0732f6147d4ad9218c8f9b20227c837a746/2-Figure1-1.png)
<hfoptions id="usage">
<hfoption id="Pipeline">
Member


@eustlb, would you mind taking a look at why the code is failing here? It returns an AttributeError: 'FastSpeech2ConformerConfig' object has no attribute 'model_config', even though the attribute seems to be available here. Thanks!

# pip install -U -q g2p-en
import torch
import soundfile as sf
from transformers import pipeline, FastSpeech2ConformerHifiGan

vocoder = FastSpeech2ConformerHifiGan.from_pretrained("espnet/fastspeech2_conformer_hifigan")
synthesiser = pipeline(task="text-to-audio", model="espnet/fastspeech2_conformer", vocoder=vocoder, device=0, torch_dtype=torch.float16)
speech = synthesiser("Hello, my dog is cooler than you!")
sf.write("speech.wav", speech["audio"].squeeze(), samplerate=speech["sampling_rate"])

Contributor


Nice catch, let's simply modify modeling_auto.py's MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES:

MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES = OrderedDict(
    [
        # Model for Text-To-Waveform mapping
        ("bark", "BarkModel"),
        ("csm", "CsmForConditionalGeneration"),
        ("fastspeech2_conformer", "FastSpeech2ConformerModel"),
        ("fastspeech2_conformer_with_hifigan", "FastSpeech2ConformerWithHifiGan"),
        ("musicgen", "MusicgenForConditionalGeneration"),
        ("musicgen_melody", "MusicgenMelodyForConditionalGeneration"),
        ("qwen2_5_omni", "Qwen2_5OmniForConditionalGeneration"),
        ("seamless_m4t", "SeamlessM4TForTextToSpeech"),
        ("seamless_m4t_v2", "SeamlessM4Tv2ForTextToSpeech"),
        ("vits", "VitsModel"),
    ]
)
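To illustrate why the missing entry matters, here is a minimal, stdlib-only sketch of the lookup the auto classes perform. The mapping contents are abbreviated, and resolve_model_class is a hypothetical stand-in for the real auto-class machinery, not transformers code:

```python
from collections import OrderedDict

# Abbreviated copy of the mapping from model_type to model class name.
MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES = OrderedDict(
    [
        ("fastspeech2_conformer", "FastSpeech2ConformerModel"),
        ("fastspeech2_conformer_with_hifigan", "FastSpeech2ConformerWithHifiGan"),
        ("vits", "VitsModel"),
    ]
)

def resolve_model_class(model_type: str) -> str:
    # The auto classes look up a config's model_type in the mapping;
    # a missing entry is what surfaces as an error when the pipeline
    # tries to instantiate the model.
    try:
        return MODEL_FOR_TEXT_TO_WAVEFORM_MAPPING_NAMES[model_type]
    except KeyError:
        raise ValueError(f"No text-to-waveform model registered for {model_type!r}")

print(resolve_model_class("fastspeech2_conformer_with_hifigan"))
# → FastSpeech2ConformerWithHifiGan
```

With the "fastspeech2_conformer_with_hifigan" entry present, the text-to-audio pipeline can resolve the combined model directly instead of failing partway through.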


</hfoption>
<hfoption id="AutoModel">
Member


Let's use the combined version so it's easier:

# pip install -U -q g2p-en
import soundfile as sf
import torch
from transformers import AutoTokenizer, FastSpeech2ConformerWithHifiGan

tokenizer = AutoTokenizer.from_pretrained("espnet/fastspeech2_conformer")
model = FastSpeech2ConformerWithHifiGan.from_pretrained("espnet/fastspeech2_conformer_with_hifigan", torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Hello, my dog is cute.", return_tensors="pt").to("cuda")
input_ids = inputs["input_ids"]

output_dict = model(input_ids, return_dict=True)
waveform = output_dict["waveform"]
# move the waveform to CPU and upcast to float32 before writing it out
sf.write("speech.wav", waveform.squeeze().detach().cpu().float().numpy(), samplerate=22050)

Contributor Author


Done!
