Text-to-Speech

On Device AI can read text aloud using three on-device engines: Apple's built-in voices, Kokoro neural TTS, and PocketTTS for low-latency streaming speech. Pick the one that fits your needs in Settings → Voice.

TTS Engines

Three text-to-speech engines are available: Apple's built-in voices, Kokoro neural TTS, and PocketTTS.

ℹ️ Which engine to pick

Apple voices are lightweight and cover the most languages. Kokoro sounds more natural and still supports multiple languages. PocketTTS has the lowest playback latency but only works with English text.
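The guidance above can be read as a simple decision rule. The sketch below is illustrative only and is not how the app chooses an engine. The function name `pick_engine`, the `want_low_latency` flag, and the `kokoro_languages` set are hypothetical, and the languages Kokoro actually supports are not listed here, so the caller supplies them.

```python
def pick_engine(language, want_low_latency=False, kokoro_languages=frozenset({"en"})):
    """Suggest an engine name from the trade-offs described above.

    `language` is a tag such as "en-US"; only the primary subtag is used.
    `kokoro_languages` is an assumed, caller-supplied set of language codes.
    """
    lang = language.lower().split("-")[0]
    # PocketTTS has the lowest latency but only handles English text.
    if want_low_latency and lang == "en":
        return "PocketTTS"
    # Kokoro sounds more natural, within the languages it supports.
    if lang in kokoro_languages:
        return "Kokoro"
    # Apple voices cover the most languages, so they are the fallback.
    return "Apple"


print(pick_engine("en-US", want_low_latency=True))
print(pick_engine("fr-FR", want_low_latency=True, kokoro_languages={"en", "fr"}))
print(pick_engine("sw"))
```

In practice the choice is yours: the active engine is whatever you select in Settings → Voice.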

PocketTTS settings

When PocketTTS is the active engine, two extra controls appear in Settings → Voice:

PocketTTS supports English only. Non-English text will still synthesize on a best-effort basis, but the results are unpredictable. For other languages, switch to Kokoro or Apple.

Using TTS

There are several ways to use text-to-speech:

Generate & Save

From the Text to Speech tab, you can export speech to a WAV file. Tap "Generate & Save" to write the audio to your device's recordings folder, along with a metadata sidecar file that records the text, engine, voice, and speed. If you cancel mid-export, any partial files are removed automatically.
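For readers curious what an export of this kind involves, here is a minimal sketch of writing a WAV file with a metadata sidecar and cleaning up on cancellation. It is not the app's implementation. The JSON sidecar format, the file names, the 24 kHz sample rate, and the `is_cancelled` callback are all assumptions made for illustration.

```python
import json
import struct
import wave
from pathlib import Path


def export_speech(samples, out_dir, stem, metadata,
                  sample_rate=24000, is_cancelled=lambda: False):
    """Write 16-bit mono PCM samples to <stem>.wav plus a <stem>.json sidecar.

    Returns the WAV path, or None if the export was cancelled.
    The sidecar layout and sample rate are assumed, not taken from the app.
    """
    out_dir = Path(out_dir)
    wav_path = out_dir / f"{stem}.wav"
    sidecar_path = out_dir / f"{stem}.json"
    try:
        with wave.open(str(wav_path), "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
            wav.setframerate(sample_rate)
            # Write in chunks so a cancel request is noticed mid-export.
            for start in range(0, len(samples), 1024):
                if is_cancelled():
                    raise InterruptedError("export cancelled")
                chunk = samples[start:start + 1024]
                wav.writeframes(struct.pack(f"<{len(chunk)}h", *chunk))
        # The sidecar is written only after the audio is complete.
        sidecar_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    except InterruptedError:
        # Remove anything that was partially written.
        for path in (wav_path, sidecar_path):
            path.unlink(missing_ok=True)
        return None
    return wav_path
```

A call might look like `export_speech(samples, "recordings", "clip", {"text": "Hello", "engine": "Kokoro", "voice": "example-voice", "speed": 1.0})`, where the voice name is a placeholder. Writing the sidecar last means a sidecar on disk always has a complete WAV next to it.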

Speech Speed

Adjust the speech speed in Settings → Voice:

Auto-Play Responses

When enabled, AI responses in chat are automatically converted to speech and played back. This creates a hands-free conversational experience, especially useful when combined with voice input.

Toggle auto-play in Settings → Voice → "Auto-play voice responses".