[Bug] If sentence too long, some part will be missing during audio file generation #1680
Comments
For Tacotron models there is a cap of 250 chars so that you don't run out of memory. You need to set it manually if you want to change it.
I'm also looking for ways to generate long sentences. What I've found is that the limit is actually in the tokenizer and is hard-coded:
So you can simply modify the limit. However, I'm not sure about the downstream effects.
What is the limit for TTS V2? I saw 400 tokens in the code.
How much memory is it expected to use per char? I have access to 1x H100 SCM 80GB. Surely memory shouldn't be a problem, right?
@genglinxiao Is there a way to make these changes in the code installed with pip, without having to clone the repository?
@genglinxiao Yes, you can. This is what I have done:

import asyncio
import json
import os

from TTS.api import TTS

# Note: `device` and `send_message_to_clients` are defined elsewhere in the application.

def init_set_tts(set_tts):
    global TTS_PROVIDER, tts
    if set_tts == 'xtts':
        print("Initializing XTTS model (may download on first run)...")
        try:
            os.environ["COQUI_TOS_AGREED"] = "1"  # Auto-agree to terms
            tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
            print("Model downloaded, loading into memory...")
            tts = tts.to(device)
            # Default is 255; we are overriding it
            tts.synthesizer.tts_model.args.num_chars = 1000
            print("XTTS model loaded successfully.")
            TTS_PROVIDER = set_tts
        except Exception as e:
            print(f"Failed to load XTTS model: {e}")
            # Report the failure to connected clients
            loop = asyncio.get_running_loop()
            loop.create_task(send_message_to_clients(json.dumps({
                "action": "error",
                "message": "Failed to load XTTS model. Please check your internet connection or model availability."
            })))
    else:
        TTS_PROVIDER = set_tts
        tts = None
        print(f"Switched to TTS Provider: {set_tts}")
Describe the bug
If a sentence is too long (with clauses separated by commas), part of it will be missing from the generated audio.
Example:
On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.
The missing part is: he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire.
To work around this, shorten the sentence by replacing a comma with a full stop:
On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later. He would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.
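The manual workaround above could be automated along these lines. This is only a rough sketch, not something from the TTS library: the function name `split_long_sentences` and the `max_chars` threshold are made up for illustration, and the sentence splitting is deliberately naive (it breaks after `.`, `!` or `?` followed by whitespace). Sentences that fit the limit are left alone; longer ones are broken at their commas, with a full stop added and the next clause capitalized, the same way as in the hand-edited text.

```python
import re
from typing import List


def split_long_sentences(text: str, max_chars: int = 100) -> List[str]:
    """Break sentences longer than max_chars at their commas.

    Hypothetical helper: max_chars is an illustrative threshold,
    not a value taken from the TTS code base.
    """
    # Naive sentence split: after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    pieces: List[str] = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            # Short enough, keep it untouched (commas included).
            pieces.append(sentence)
            continue
        current = ""
        for clause in sentence.split(","):
            clause = clause.strip()
            if not clause:
                continue
            candidate = f"{current}, {clause}" if current else clause
            if current and len(candidate) > max_chars:
                # Close the piece with a full stop instead of the comma
                # and start a new, capitalized piece.
                pieces.append(current + ".")
                current = clause[:1].upper() + clause[1:]
            else:
                current = candidate
        if current:
            pieces.append(current)
    return pieces


if __name__ == "__main__":
    demo = "Short one. aaaa bbbb, cccc dddd, eeee ffff."
    for piece in split_long_sentences(demo, max_chars=12):
        print(piece)
```

The resulting pieces can be joined back with spaces and passed to `--text`, or synthesized one at a time. A single clause that is longer than `max_chars` and contains no comma is left as it is, so this does not guarantee that every piece fits the limit.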
To Reproduce
Run the command below:
tts --text "On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor. " --model_name "tts_models/en/ljspeech/tacotron2-DDC_ph" --out_path /var/data/The-unlikely-hero5.wav
Expected behavior
The whole audio file should be generated, with no part of the text missing.
Logs
Environment
Additional context
No response