Hacker Next
Spark-TTS: Text-2-Speech Model Single-Stream Decoupled Tokens [pdf] (arxiv.org)
mike978 4 days ago
smusamashah 4 days ago
The Chinese-origin voices, when used to generate English samples, really do sound like a Chinese person speaking English. Very interesting.
vessenes 4 days ago
This is really quite good at sounding like Donald, especially in the first half of the audio. I’ll probably play around with it for a bit; it’s not clear to me how much voice variation you can get in latent space. Anyway, it looks to be a very high-quality (at least) short-form TTS engine with open weights, so thanks, team!
deknos 3 days ago
Is this really free software? I am really looking for _GOOD_ TTS software that is maintainable, truly open source (for every use), and can do English/German/Spanish/French/Russian.
popalchemist 3 days ago
Zonos TTS is state of the art, fully open source (Apache license), and supports English, Japanese, Chinese, French, and German out of the box. You could train it to add Russian, or run the output of this TTS through Meta's Seamless translation.

https://github.com/Zyphra/Zonos

maxglute 2 days ago
>Fast: our model runs with a real-time factor of ~2x on an RTX 4090 (i.e. generates 2 seconds of audio per 1 second of compute time)

As a heavy TTS user, I think this is great. Now waiting for 3x real time.
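To make the quoted figure concrete, here is a minimal sketch of the arithmetic, assuming the definition given in the quote (seconds of audio produced per second of compute, so higher is faster). The function names are hypothetical and not part of Zonos or Spark-TTS. Note that some papers use the inverse ratio (compute time divided by audio duration), where lower is faster.

```python
def realtime_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio produced per second of compute (the quoted definition).

    Values above 1.0 mean generation runs faster than real time.
    """
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def compute_time_needed(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time to synthesize `audio_seconds` at a given factor."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


if __name__ == "__main__":
    print(realtime_factor(2.0, 1.0))       # 2.0, the "~2x" example from the quote
    print(compute_time_needed(60.0, 2.0))  # 30.0 s of compute for a one-minute clip
    print(compute_time_needed(60.0, 3.0))  # 20.0 s at the hoped-for 3x
```

Under this definition, going from 2x to 3x cuts the wait for a one-minute clip from about 30 to about 20 seconds.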
