Spark-TTS: Text-2-Speech Model Single-Stream Decoupled Tokens [pdf] (arxiv.org)

78 points by bilekas 338 days ago | 6 comments

mike978 336 days ago [-]

https://spark-tts.github.io/

smusamashah 335 days ago [-]

The voices with Chinese origin when generated as English samples do sound like a Chinese person speaking English. It is very interesting.

vessenes 335 days ago [-]

This is really quite good at sounding like Donald, especially for the first half of the audio. I'll probably play around with this for a bit; it's not clear to me how much variation you can get in voice in latent space. Anyway, it looks to be a very high quality (at least) short-form TTS engine with open weights, so thanks, team!

deknos 335 days ago [-]

Is this really free software? I am really looking for _GOOD_ TTS software which is maintainable, really open source (for every usage) and can do English/German/Spanish/French/Russian.

popalchemist 334 days ago [-]

Zonos TTS is the SOTA, fully open-source (Apache license), and supports English, Japanese, Chinese, French, and German out of the box. You could train to add Russian, or run the output of this TTS through Meta's Seamless translation.

https://github.com/Zyphra/Zonos

maxglute 334 days ago [-]

>Fast: our model runs with a real-time factor of ~2x on an RTX 4090 (i.e. generates 2 seconds of audio per 1 second of compute time)

This is great as a heavy TTS user. Waiting for 3x real time.
