
Can this be used for audio synthesis? #24

Open
MonolithFoundation opened this issue Sep 18, 2024 · 6 comments

Comments

@MonolithFoundation

For instance, could an LLM output SNAC tokens that are then decoded into audio?
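One practical detail behind this question is that SNAC produces several codebook levels at different time resolutions, while an LLM emits a single token stream. The sketch below shows one hypothetical way to interleave the levels into one stream and recover them afterwards. It assumes three levels with a 1:2:4 ratio of codes per coarse frame and 4096 entries per codebook (our understanding of the 24 kHz SNAC model); the per-level token offsets and the within-frame ordering are illustrative choices, not the scheme used by any particular project.

```python
from typing import List

# Assumptions (not taken from this thread): three SNAC levels whose code
# counts per coarse frame are 1, 2 and 4, each with a 4096-entry codebook.
CODEBOOK_SIZE = 4096
LEVEL_COUNTS = (1, 2, 4)          # codes per coarse frame at each level
FRAME_TOKENS = sum(LEVEL_COUNTS)  # 7 LM tokens per coarse frame


def flatten_snac_codes(codes: List[List[int]]) -> List[int]:
    """Interleave per-level SNAC codes into one LM token stream.

    codes[0] has n entries, codes[1] has 2n and codes[2] has 4n.
    Each level is shifted into its own id range (hypothetical offsets)
    so that the original level can be recovered from the token id.
    """
    n = len(codes[0])
    if len(codes) != len(LEVEL_COUNTS) or any(
        len(codes[lvl]) != cnt * n for lvl, cnt in enumerate(LEVEL_COUNTS)
    ):
        raise ValueError("expected three levels of length n, 2n, 4n")
    tokens: List[int] = []
    for f in range(n):
        for level, count in enumerate(LEVEL_COUNTS):
            for k in range(count):
                code = codes[level][f * count + k]
                if not 0 <= code < CODEBOOK_SIZE:
                    raise ValueError(f"code {code} out of range")
                tokens.append(level * CODEBOOK_SIZE + code)
    return tokens


def unflatten_snac_codes(tokens: List[int]) -> List[List[int]]:
    """Invert flatten_snac_codes, checking that each token sits at the
    position expected for its level."""
    if len(tokens) % FRAME_TOKENS != 0:
        raise ValueError(f"token count must be a multiple of {FRAME_TOKENS}")
    codes: List[List[int]] = [[] for _ in LEVEL_COUNTS]
    i = 0
    while i < len(tokens):
        for level, count in enumerate(LEVEL_COUNTS):
            for _ in range(count):
                token_level, code = divmod(tokens[i], CODEBOOK_SIZE)
                if token_level != level:
                    raise ValueError(f"token at position {i} is not a level-{level} code")
                codes[level].append(code)
                i += 1
    return codes
```

Before decoding, each recovered level would presumably be wrapped as a tensor of shape (1, T_level) and the list passed to `decode` on a model loaded with `SNAC.from_pretrained(...)` from the `snac` package; check the package README for the exact expected shapes. Giving each level its own id range costs a larger vocabulary but lets malformed LLM output be rejected before it reaches the decoder.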

@itsliupeng

FYI: https://github.com/gpt-omni/mini-omni uses the SNAC codec to generate audio.

@MonolithFoundation
Author

Thanks for the hint. What about Chinese?

@MrWaterZhou

https://github.com/MrWaterZhou/viitor-voice

I tried, and it works well :)

@MonolithFoundation
Author

Wow, does it support Mandarin and Japanese?

@MrWaterZhou

Wow, does it support Mandarin and Japanese?

Not yet, but we are working on Mandarin and will release it soon.

@MrWaterZhou

MrWaterZhou commented Nov 28, 2024

Wow, does it support Mandarin and Japanese?

Our Chinese model has been updated; feel free to give it a try!
https://github.com/viitor-ai/viitor-voice/tree/main

FYI, we’ve noticed that the 24 kHz SNAC model doesn’t perform well on Chinese audio, especially on higher-pitched samples. We’re currently experimenting with fine-tuning the decoder using Vocos, and so far the results look promising.

3 participants