[Bug] If sentence too long, some part will be missing during audio file generation #1680

hengway · 2022-06-22T10:24:28Z

Describe the bug

If a sentence is too long (with clauses separated by commas), part of it is missing from the generated audio.

Example:
On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.

The missing part is: he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire.

As a workaround, shorten the sentence by replacing a comma with a full stop:
On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later. He would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.
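
The manual workaround above could be automated before the text is passed to `tts`. The following is a minimal sketch, not part of the TTS package: `split_long_sentence` and its `max_chars` parameter are hypothetical names, and the default of 250 is only borrowed from the cap mentioned later in this thread. It packs comma-separated clauses greedily into chunks and ends each chunk with a full stop.

```python
import re
from typing import List


def split_long_sentence(sentence: str, max_chars: int = 250) -> List[str]:
    """Greedily pack comma-separated clauses into chunks of at most max_chars.

    A single clause that is itself longer than max_chars is left untouched.
    """
    sentence = sentence.strip()
    if len(sentence) <= max_chars:
        return [sentence]

    # Split after each comma, keeping the comma with the clause before it.
    clauses = [c.strip() for c in re.split(r"(?<=,)\s+", sentence) if c.strip()]

    chunks: List[str] = []
    current = ""
    for clause in clauses:
        candidate = f"{current} {clause}".strip()
        if current and len(candidate) > max_chars:
            # Adding this clause would overflow, so close the current chunk.
            chunks.append(current)
            current = clause
        else:
            current = candidate
    if current:
        chunks.append(current)

    # Replace a trailing comma with a full stop, as in the manual workaround.
    return [c[:-1] + "." if c.endswith(",") else c for c in chunks]


if __name__ == "__main__":
    text = (
        "Little did he realize that three and a half years later, "
        "he would be standing on the White House lawn, receiving the nations "
        "highest award for his bravery and courage under fire."
    )
    for part in split_long_sentence(text, max_chars=80):
        print(len(part), part)
```

Splitting only at commas keeps the cut points at natural pauses, but whether the resulting prosody is acceptable depends on the model and has to be checked by ear.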

To Reproduce

Run the command below:
tts --text "On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor. " --model_name "tts_models/en/ljspeech/tacotron2-DDC_ph" --out_path /var/data/The-unlikely-hero5.wav

Expected behavior

The whole text should be synthesized into the audio file, with no part missing.

Logs

ubuntu@ubuntu:~$ tts --text "On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor. " --model_name "tts_models/en/ljspeech/tacotron2-DDC_ph" --out_path /opt/tts_output/The-unlikely-hero5.wav
 > tts_models/en/ljspeech/tacotron2-DDC_ph is already downloaded.
 > vocoder_models/en/ljspeech/univnet is already downloaded.
 > Using model: Tacotron2
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:/home/xstts/.local/share/tts/tts_models--en--ljspeech--tacotron2-DDC_ph/scale_stats.npy
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Model's reduction rate `r` is set to: 2
 > Vocoder Model: univnet
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:/home/xstts/.local/share/tts/vocoder_models--en--ljspeech--univnet/scale_stats.npy
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Generator Model: univnet_generator
 > Discriminator Model: univnet_discriminator
 > Text: On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.
 > Text splitted to sentences.
['On April 1, 1942, Desmond Doss joined the United States Army.', 'Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire.', 'Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.']
ɔn eɪpɹəl wʌn, naɪntin fɔɹti tu, dɛzmənd dɔs d͡ʒɔɪnd ðə junaɪtɪd steɪts ɑɹmi.
 [!] Character '͡' not found in the vocabulary. Discarding it.
[W NNPACK.cpp:51] Could not initialize NNPACK! Reason: Unsupported hardware.
 > Processing time: 18.15455675125122
 > Real-time factor: 0.9681247735486627
 > Saving output to /opt/tts_output/The-unlikely-hero5.wav

Environment

Package                Version              Location
---------------------- -------------------- --------
anyascii               0.3.1
appdirs                1.4.4
astroid                2.7.3
attrs                  19.3.0
audioread              2.1.9
Automat                0.8.0
Babel                  2.10.3
backports.zoneinfo     0.2.1
black                  22.3.0
blinker                1.4
bokeh                  1.4.0
certifi                2019.11.28
cffi                   1.15.0
chardet                3.0.4
click                  8.1.3
cloud-init             22.2
colorama               0.4.3
command-not-found      0.3
configobj              5.0.6
constantly             15.1.0
coqpit                 0.0.16
coverage               6.4.1
cryptography           2.8
cycler                 0.11.0
Cython                 0.29.28
dateparser             1.1.1
dbus-python            1.2.16
decorator              5.1.1
distro                 1.4.0
distro-info            0.23ubuntu1
docopt                 0.6.2
entrypoints            0.3
Flask                  2.1.2
fonttools              4.33.3
fsspec                 2022.5.0
gruut                  2.2.3
gruut-ipa              0.13.0
gruut-lang-cs          2.0.0
gruut-lang-de          2.0.0
gruut-lang-en          2.0.0
gruut-lang-es          2.0.0
gruut-lang-fr          2.0.2
gruut-lang-it          2.0.0
gruut-lang-nl          2.0.2
gruut-lang-pt          2.0.0
gruut-lang-ru          2.0.0
gruut-lang-sv          2.0.0
httplib2               0.14.0
hyperlink              19.0.0
idna                   2.8
importlib-metadata     4.11.4
importlib-resources    5.8.0
incremental            16.10.1
inflect                5.6.0
isort                  5.10.1
itsdangerous           2.1.2
jieba                  0.42.1
Jinja2                 3.1.2
joblib                 1.1.0
jsonlines              1.2.0
jsonpatch              1.22
jsonpointer            2.0
jsonschema             3.2.0
keyring                18.0.1
kiwisolver             1.4.3
language-selector      0.1
launchpadlib           1.10.13
lazr.restfulclient     0.14.2
lazr.uri               1.0.3
lazy-object-proxy      1.7.1
librosa                0.8.0
llvmlite               0.38.1
MarkupSafe             2.1.1
matplotlib             3.5.2
mccabe                 0.6.1
mecab-python3          1.0.5
more-itertools         4.2.0
mypy-extensions        0.4.3
netifaces              0.10.4
networkx               2.8.4
nose2                  0.11.0
num2words              0.5.10
numba                  0.55.1
numpy                  1.21.6
oauthlib               3.1.0
packaging              21.3
pandas                 1.4.2
pathspec               0.9.0
pexpect                4.6.0
Pillow                 9.1.1
pip                    20.0.2
platformdirs           2.5.2
pooch                  1.6.0
protobuf               3.19.4
pyasn1                 0.4.2
pyasn1-modules         0.2.1
pycparser              2.21
PyGObject              3.36.0
PyHamcrest             1.9.0
PyJWT                  1.7.1
pylint                 2.10.2
pymacaroons            0.13.0
PyNaCl                 1.3.0
pynndescent            0.5.7
pyOpenSSL              19.0.0
pyparsing              3.0.9
pypinyin               0.46.0
pyrsistent             0.15.5
pysbd                  0.3.4
pyserial               3.4
python-apt             2.0.0+ubuntu0.20.4.7
python-crfsuite        0.9.8
python-dateutil        2.8.2
python-debian          0.1.36ubuntu1
pytz                   2022.1
pytz-deprecation-shim  0.1.0.post0
pyworld                0.2.10
PyYAML                 5.3.1
regex                  2022.3.2
requests               2.22.0
requests-unixsocket    0.2.0
resampy                0.2.2
scikit-learn           1.1.1
scipy                  1.8.1
SecretStorage          2.3.1
service-identity       18.1.0
setuptools             45.2.0
simplejson             3.16.0
six                    1.14.0
sos                    4.3
SoundFile              0.10.3.post1
ssh-import-id          5.10
systemd-python         234
tensorboardX           2.5.1
threadpoolctl          3.1.0
toml                   0.10.2
tomli                  2.0.1
torch                  1.11.0
torchaudio             0.11.0
tornado                6.1
tqdm                   4.64.0
trainer                0.0.12
TTS                    0.7.0                /opt/TTS
Twisted                18.9.0
typing-extensions      4.2.0
tzdata                 2022.1
tzlocal                4.2
ubuntu-advantage-tools 27.8
ufw                    0.36
umap-learn             0.5.1
unattended-upgrades    0.1
unidic-lite            1.0.8
urllib3                1.25.8
wadllib                1.3.3
Werkzeug               2.1.2
wheel                  0.34.2
wrapt                  1.12.1
zipp                   3.8.0
zope.interface         4.7.1

Additional context

No response

erogol · 2022-07-05T09:13:18Z

For Tacotron models there is a cap of 250 chars not to crash your memory. You need to set it manually if you wanna change it.

genglinxiao · 2023-11-15T10:39:25Z

I'm also looking for methods to generate long sentences. What I've found is, the limit is actually in the tokenizer, and is hard coded:

class VoiceBpeTokenizer:
    def __init__(self, vocab_file=None):
        self.tokenizer = None
        if vocab_file is not None:
            self.tokenizer = Tokenizer.from_file(vocab_file)
        self.char_limits = {
            "en": 250,
            "de": 253,
            "fr": 273,
            "es": 239,
            "it": 213,
            "pt": 203,
            "pl": 224,
            "zh-cn": 82,
            "ar": 166,
            "cs": 186,
            "ru": 182,
            "nl": 251,
            "tr": 226,
            "ja": 71,
            "hu": 224,
            "ko": 95,
        }

So you can simply modify the limit. However, I'm not sure about the downstream effect.
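
Since `char_limits` in the snippet above is assigned in `__init__`, it is an instance attribute, so in principle it can be changed on an already constructed tokenizer object instead of editing the source file. The sketch below only illustrates that pattern with a stand-in class so it runs without TTS installed. `StandInTokenizer`, `exceeds_limit` and `raise_char_limit` are hypothetical names, the fallback of 250 for unknown languages is an assumption, and where the real tokenizer instance lives on a loaded model is not confirmed in this thread.

```python
class StandInTokenizer:
    """Hypothetical stand-in that mimics only the char_limits attribute."""

    def __init__(self):
        # A few of the values quoted in the comment above.
        self.char_limits = {"en": 250, "de": 253, "fr": 273}

    def exceeds_limit(self, text: str, lang: str) -> bool:
        # Hypothetical helper; unknown languages fall back to 250 (assumption).
        return len(text) > self.char_limits.get(lang, 250)


def raise_char_limit(tokenizer, lang: str, new_limit: int) -> int:
    """Override one language's limit on this instance and return the old value."""
    old_limit = tokenizer.char_limits[lang]
    tokenizer.char_limits[lang] = new_limit
    return old_limit


tokenizer = StandInTokenizer()
text = "x" * 300

before = tokenizer.exceeds_limit(text, "en")       # checked against 250
previous = raise_char_limit(tokenizer, "en", 400)  # returns the old limit
after = tokenizer.exceeds_limit(text, "en")        # checked against 400

print(before, previous, after)
```

Changing the dictionary only moves the threshold that is checked; as noted above, what happens downstream when the model receives longer input is not established here.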

FurkanGozukara · 2023-11-16T07:18:07Z

> For Tacotron models there is a cap of 250 chars not to crash your memory. You need to set it manually if you wanna change it.

What is the limit for TTS V2? I saw 400 tokens in the code.

m000lie · 2024-03-06T01:03:28Z

> For Tacotron models there is a cap of 250 chars not to crash your memory. You need to set it manually if you wanna change it.

How much memory is it expected to use per char? I have access to 1x H100 SCM 80GB. Surely memory shouldn't be a problem, right?

OlegRuban-ai · 2024-11-05T20:21:45Z

@genglinxiao is there a way to make these changes in the code installed with pip, without having to clone the repository?

bigsk1 · 2025-04-02T02:18:49Z

@genglinxiao yes, you can. This is what I have done:

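import asyncio
import json
import os

import torch
from TTS.api import TTS

# Assumed setup: the original snippet relies on these names being defined elsewhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
TTS_PROVIDER = None
tts = None
# send_message_to_clients() is a coroutine from the surrounding application (not shown).


def init_set_tts(set_tts):
    global TTS_PROVIDER, tts
    if set_tts == 'xtts':
        print("Initializing XTTS model (may download on first run)...")
        try:
            os.environ["COQUI_TOS_AGREED"] = "1"  # Auto-agree to terms
            tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
            print("Model downloaded, loading into memory...")
            tts = tts.to(device)
            # Override the character limit on the loaded model (default is 255)
            tts.synthesizer.tts_model.args.num_chars = 1000
            print("XTTS model loaded successfully.")
            TTS_PROVIDER = set_tts
        except Exception as e:
            print(f"Failed to load XTTS model: {e}")
            # Report the failure to connected clients; needs a running event loop
            loop = asyncio.get_running_loop()
            loop.create_task(send_message_to_clients(json.dumps({
                "action": "error",
                "message": "Failed to load XTTS model. Please check your internet connection or model availability."
            })))
    else:
        TTS_PROVIDER = set_tts
        tts = None
        print(f"Switched to TTS Provider: {set_tts}")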
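
An alternative to raising the limit is to keep every request short and join the audio afterwards. The following is a minimal sketch under stated assumptions: `synthesize_in_chunks` is a hypothetical helper, `synthesize` is any callable that turns text into a list of float samples (a wrapper around the loaded model could plausibly fill that role, but that is not shown or verified here), and `fake_synthesize` exists only so the example runs without a model. The default sample rate of 22050 matches the value in the logs of the original report and may differ for other models; the pause length is arbitrary.

```python
from typing import Callable, List, Sequence


def synthesize_in_chunks(
    chunks: Sequence[str],
    synthesize: Callable[[str], List[float]],
    sample_rate: int = 22050,
    pause_seconds: float = 0.2,
) -> List[float]:
    """Synthesize each chunk separately and join the waveforms with short pauses."""
    silence = [0.0] * int(sample_rate * pause_seconds)
    waveform: List[float] = []
    for index, chunk in enumerate(chunks):
        if index > 0:
            # Insert a pause between consecutive chunks, not before the first.
            waveform.extend(silence)
        waveform.extend(synthesize(chunk))
    return waveform


def fake_synthesize(text: str) -> List[float]:
    # Placeholder so the sketch runs without a model: one sample per character.
    return [0.1] * len(text)


audio = synthesize_in_chunks(
    ["First part.", "Second part."],
    fake_synthesize,
    sample_rate=10,
    pause_seconds=0.5,
)
print(len(audio))
```

Passing the synthesizer in as a callable keeps the joining logic independent of any particular model or library version.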

hengway added the bug (Something isn't working) label Jun 22, 2022

erogol closed this as completed Jul 5, 2022

mmol67 mentioned this issue Jan 17, 2024 in aedocw/epub2tts#193, "Catch max tokens before exceeding it" (Closed)
