Effortlessly convert articles, research papers, or any text into high-quality audio. Listen on the go, anytime, anywhere.
This tool uses Kokoro for speech synthesis. More voices and voice cloning are coming soon.
Upload PDFs, EPUBs, or paste text to generate your own audiobooks. Perfect for long-form content and study materials.
Create your own private podcast feed from articles or documents. Use your favorite podcast app to listen.
Optional "Humanize" feature rephrases content for a more natural, conversational listening experience.
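To make the private podcast feed idea more concrete, here is a minimal sketch of how generated audio files could be exposed as an RSS 2.0 feed that a podcast app can subscribe to. It is not this tool's actual implementation: the function name `build_feed`, the episode dictionary keys, and the `example.com` URLs are all hypothetical, and it assumes the audio has already been rendered to MP3 and hosted somewhere.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(title, link, description, episodes):
    """Return an RSS 2.0 document (as a string) with one <item> per episode.

    `episodes` is a list of dicts with the hypothetical keys
    "title", "audio_url", "size_bytes", and "published" (an aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "guid").text = ep["audio_url"]
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(ep["published"])
        # The <enclosure> element points the podcast app at the audio file.
        ET.SubElement(
            item,
            "enclosure",
            url=ep["audio_url"],
            length=str(ep["size_bytes"]),
            type="audio/mpeg",
        )

    body = ET.tostring(rss, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder episode data for illustration only.
    print(build_feed(
        "My Reading List",
        "https://example.com/feed",
        "Articles converted to audio",
        [{
            "title": "Sample article",
            "audio_url": "https://example.com/audio/sample.mp3",
            "size_bytes": 1234567,
            "published": datetime(2024, 1, 1, tzinfo=timezone.utc),
        }],
    ))
```

The resulting XML would be served at a URL that is then added to a podcast app as a custom feed. How the feed is kept private (for example, an unguessable URL or authentication) is left open here.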
The RealtimeTTS library is the ideal wrapper.

| Model | Architecture | Features / License | Parameters |
|---|---|---|---|
| Kokoro | StyleTTS 2 | Fast CPU/GPU inference. Apache 2.0. | 82M |
| Piper | VITS/ONNX | Fastest (Raspberry Pi), local-first. MIT. | 5-32M |
| StyleTTS 2 | Diffusion | Clean audio, zero-shot cloning. MIT (Code). | ~200-300M |
| XTTS-v2 | VQ-VAE+GPT | Best cross-lingual cloning (17 languages). Non-commercial. | ~500M |
| Chatterbox | Llama | Emotion control, watermarking. MIT. | 0.5B |
| Kyutai | Transformer | True text-streaming (~220 ms), word timestamps. | Not specified |
| IndexTTS | GPT-style | Chinese pinyin control. Apache 2.0. | Not specified |
LLM Text -> TTS Node -> Audio Output. Custom nodes exist for most popular models, including Piper (ComfyUI-TTS), Chatterbox (Chatterbox Nodes), Spark-TTS, F5-TTS, and more.
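The three-stage flow above (LLM text, then a TTS node, then audio output) can be mimicked in plain Python to show how the stages connect. This is only a sketch: it is not node-graph code and does not call any real TTS model. `placeholder_tts` is a hypothetical stand-in that emits a short tone per character, `run_pipeline` is a hypothetical name, and the 24 kHz sample rate is an arbitrary assumption. In practice the placeholder would be swapped for a call into whichever model or custom node is in use.

```python
import io
import math
import struct
import wave
from typing import Callable

SAMPLE_RATE = 24_000  # assumed output rate, chosen only for this sketch


def placeholder_tts(text: str) -> bytes:
    """Stand-in for a TTS node: returns 16-bit mono PCM.

    Emits 50 ms of a 220 Hz tone per input character. It exists only so the
    pipeline has something to pass along; it is NOT speech synthesis.
    """
    n_samples = int(SAMPLE_RATE * 0.05) * len(text)
    frames = bytearray()
    for i in range(n_samples):
        sample = int(0.2 * 32767 * math.sin(2 * math.pi * 220 * i / SAMPLE_RATE))
        frames += struct.pack("<h", sample)
    return bytes(frames)


def run_pipeline(llm_text: str, tts_node: Callable[[str], bytes]) -> bytes:
    """LLM text -> TTS node -> audio output, returned as WAV file bytes."""
    pcm = tts_node(llm_text)          # TTS node: text in, raw PCM out
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:  # audio output: wrap PCM in a WAV container
        wav.setnchannels(1)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit audio
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    audio = run_pipeline("Hello from the LLM.", placeholder_tts)
    with open("output.wav", "wb") as f:
        f.write(audio)
```

Keeping the TTS stage behind a plain callable mirrors the node idea: the text source and the audio sink stay the same while the model in the middle is exchanged.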