Individual Word Timestamps - Kokoro TTS #278
Comments
Thanks for the inspiration. I integrated it in v0.4.51.
Awesome! How can I get the words and timestamps from your implementation in RealtimeTTS?
Check out the kokoro_test.py file and use the on_word callback. The object passed to the callback has the properties "word" (the text), "start_time", and "end_time" (the time offsets, in seconds, at which the word starts and ends). The callback fires right when a word starts playing.
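For readers who want to see the shape of such a callback, here is a minimal sketch. It does not import RealtimeTTS: the `WordTiming` class is a hypothetical stand-in for whatever object the library passes to on_word, and only the three property names ("word", "start_time", "end_time") come from the comment above. How the callback is registered on the stream is not shown; kokoro_test.py is the reference for that.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    """Hypothetical stand-in for the object passed to on_word."""
    word: str
    start_time: float  # offset in seconds where the word starts
    end_time: float    # offset in seconds where the word ends


collected = []


def on_word(timing):
    """Record each word with its offsets and print a subtitle-style line."""
    collected.append((timing.word, timing.start_time, timing.end_time))
    print(f"{timing.start_time:6.2f}s - {timing.end_time:6.2f}s  {timing.word}")


# Simulate the engine firing the callback as each word starts playing.
for timing in [WordTiming("Hello", 0.0, 0.42), WordTiming("world", 0.42, 0.90)]:
    on_word(timing)
```

Because the handler only reads three attributes, it should work unchanged with any object that exposes them.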
I hope I implemented it in a way that lets you use the feature as you intended. If you had a different approach in mind for the timestamps, please let me know, and I'll think about a way to integrate it.
Looks like a logical implementation. I tried it, but sometimes got errors like these: ⚡ synthesizing → '1. Answer your questions: I've been trained on a vast amount of knowledge, so I can provide information on a wide range of topics, from science and history to entertainment and culture.' I'm using play_async and muted=True, similar to the async_server example.
Oh, that wasn't supposed to happen. I'll try to reproduce it.
Probably me not handling the muted=True situation correctly, I guess. Will look into that.
Hmm, I could not reproduce it. I took the sentence from your log with async and muted=True. It did not raise the callback (because in the muted case it gets ignored), but it did not throw any errors either. Hopefully it's not OS-dependent. Could you share the code?
Okay, muted=True was probably why I wasn't seeing the word printout, then. What I am trying to achieve is to send the words to a web client so I can see the text as it is being spoken, like live subtitles. I think the error I got was caused by the asterisks ('**') or the numbers. I see that the ** just made "Answer your questions" bold in the text I pasted here, but it was actually 1. ** Answer your questions **. I got several of those errors with similar text, i.e. the Markdown syntax for bold text. Some code, basically from the async_server example: class TTSRequestHandler:
The asterisk was the problem. It should be fixed now in v0.4.52.
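For anyone stuck on an older version, one possible workaround is to strip the Markdown emphasis markers before handing text to the engine. The sketch below is an assumption about how that could be done, not a description of the v0.4.52 fix; `strip_markdown_emphasis` is a hypothetical helper and not part of RealtimeTTS.

```python
import re


def strip_markdown_emphasis(text: str) -> str:
    """Remove runs of asterisks and collapse the whitespace they leave behind."""
    without_stars = re.sub(r"\*+", "", text)
    return re.sub(r"\s+", " ", without_stars).strip()


cleaned = strip_markdown_emphasis("1. ** Answer your questions **")
print(cleaned)
```

Note that this also removes literal asterisks (for example in "2 * 3"), so it only suits text where asterisks are used purely as formatting.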
Okay, great. Is it possible to get the word timestamps with muted=True and play_async now as well? That would be needed to use the functionality when RealtimeTTS runs as an async server, streaming voice and text with timestamps to clients.
Yeah, agreed. Will integrate that.
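As a rough illustration of the live-subtitle use case, each word timing could be serialized into a small JSON message that a server pushes to web clients. This is only a sketch under assumptions: the `word_event` helper and the message layout (including the "type" field) are hypothetical, the transport (for example a WebSocket) is not shown, and it presumes the word timings are available in the muted case, which the thread above says was not yet true at this point.

```python
import json


def word_event(word: str, start_time: float, end_time: float) -> str:
    """Build one JSON subtitle message for a web client (hypothetical layout)."""
    return json.dumps({
        "type": "word",
        "word": word,
        "start_time": start_time,  # seconds from the start of the audio
        "end_time": end_time,      # seconds from the start of the audio
    })


message = word_event("Hello", 0.0, 0.42)
print(message)
```

A client can then schedule each word against its own audio playback clock using start_time and end_time, rather than relying on when the message arrives.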
Hi, and thank you for a great repo. Kokoro TTS now supports individual word timestamps in its output. Is that something you have for Kokoro (and possibly other models) in RealtimeTTS as well? If not, it would be an awesome feature.
Here is how to get it from Kokoro:
hexgrad/kokoro#32