Text-to-Speech
On Device AI can read text aloud using three on-device engines: Apple's built-in voices, Kokoro neural TTS, and PocketTTS for low-latency streaming speech. Choose the engine that fits your needs in Settings → Voice.
TTS Engines
Three text-to-speech engines are available:
- Apple Voices: The system's built-in speech synthesis. Dozens of voices, many languages, minimal resource use. No download needed.
- Kokoro TTS: Neural TTS that produces natural, human-like speech in multiple languages. Runs on-device. Requires a one-time model download (~80 MB).
- PocketTTS: Streaming neural TTS, English only. Audio begins playing within roughly 80 ms, before the full sentence finishes generating. Requires a Pro subscription and a one-time model download.
Apple voices are lightweight and cover the most languages. Kokoro sounds more natural and still supports multiple languages. PocketTTS has the lowest playback latency but only works with English text.
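The trade-offs above can be written down as a small decision helper. This is only an illustrative sketch, not how the app chooses an engine: the function name, its parameters, and the `kokoro_languages` placeholder are all assumptions, since this guide does not list the exact languages Kokoro supports.

```python
from enum import Enum


class Engine(Enum):
    APPLE = "Apple Voices"
    KOKORO = "Kokoro TTS"
    POCKET = "PocketTTS"


def pick_engine(language, want_low_latency=False, has_pro=False,
                kokoro_languages=frozenset({"en"})):
    """Suggest an engine for a language code such as "en" or "fr-FR".

    kokoro_languages is a placeholder: the real list of languages
    Kokoro supports is not given in this guide.
    """
    base = language.lower().split("-")[0]
    # PocketTTS: English only, and it requires a Pro subscription.
    if want_low_latency and base == "en" and has_pro:
        return Engine.POCKET
    # Kokoro: more natural speech, for the languages it supports.
    if base in kokoro_languages:
        return Engine.KOKORO
    # Apple Voices: widest language coverage, no download needed.
    return Engine.APPLE
```

For example, `pick_engine("en-US", want_low_latency=True, has_pro=True)` suggests PocketTTS, while a language outside the placeholder Kokoro set falls back to Apple Voices.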
PocketTTS settings
When PocketTTS is the active engine, two extra controls appear in Settings → Voice:
- Temperature (0.0–1.0, default 0.7): Lower values produce more consistent speech. Higher values add more expressiveness and variation.
- De-essing (on by default): Reduces harsh sibilant ("s" and "sh") sounds in the output.
PocketTTS supports English only. Non-English text will still synthesize on a best-effort basis, but the results are unpredictable. For other languages, switch to Kokoro TTS or Apple Voices.
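If it helps to picture the two controls as data, here is a minimal sketch of a settings record with the documented defaults. The class and field names are hypothetical and do not reflect the app's internals; only the range, the 0.7 default, and the on-by-default de-essing come from this page.

```python
from dataclasses import dataclass


@dataclass
class PocketTTSSettings:
    temperature: float = 0.7   # documented default
    de_essing: bool = True     # on by default

    def __post_init__(self):
        # Clamp to the documented 0.0 to 1.0 range.
        self.temperature = min(1.0, max(0.0, float(self.temperature)))
```

A value outside the range, such as `PocketTTSSettings(temperature=1.5)`, is clamped to the nearest limit rather than rejected. That mirrors how a slider behaves, but it is a design choice of this sketch.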
Using TTS
There are several ways to use text-to-speech:
- TTS Tab: Navigate to the Text to Speech tab, paste or type text, and tap the play button.
- AI Chat responses: Tap the speaker icon on any AI response to have it read aloud.
- Message Actions: Select "Send to TTS" from a message's context menu (iOS) or the "More Actions" button (macOS) to send it directly to the TTS tab.
- Auto-play: Enable auto-play to have every AI response spoken automatically.
Generate & Save
From the Text to Speech tab, you can export speech to a WAV file. Tap "Generate & Save" to write the audio to your device's recordings folder with a metadata sidecar (text, engine, voice, speed). Cancelling mid-export removes any partial files automatically.
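The export behavior described above (a WAV file, a metadata sidecar, and cleanup on cancel) can be sketched as follows. This is an assumption-heavy illustration rather than the app's actual code: the JSON sidecar format, the file naming, the 24 kHz 16-bit mono audio format, and every function and parameter name are hypothetical.

```python
import json
import os
import wave


class ExportCancelled(Exception):
    """Raised internally when the user cancels mid-export."""


def generate_and_save(chunks, out_dir, name, text, engine, voice, speed,
                      sample_rate=24000, is_cancelled=lambda: False):
    """Write <name>.wav plus a <name>.json sidecar; remove both if cancelled.

    chunks: iterable of bytes objects holding 16-bit mono PCM audio.
    Returns the WAV path, or None if the export was cancelled.
    """
    wav_path = os.path.join(out_dir, name + ".wav")
    meta_path = os.path.join(out_dir, name + ".json")
    try:
        with wave.open(wav_path, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)            # 2 bytes = 16-bit samples
            wav.setframerate(sample_rate)  # assumed rate, not documented
            for chunk in chunks:
                if is_cancelled():
                    raise ExportCancelled()
                wav.writeframes(chunk)
        # Sidecar fields match the ones listed in this guide.
        with open(meta_path, "w", encoding="utf-8") as f:
            json.dump({"text": text, "engine": engine,
                       "voice": voice, "speed": speed}, f, indent=2)
        return wav_path
    except ExportCancelled:
        # Remove any partial output so nothing half-written is left behind.
        for path in (wav_path, meta_path):
            if os.path.exists(path):
                os.remove(path)
        return None
```

Checking for cancellation once per audio chunk keeps the export responsive without needing threads in the sketch itself.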
Speech Speed
Adjust the speech speed in Settings → Voice:
- Range: 0.5x (slow) to 2.0x (fast)
- Default: 1.0x (normal speed)
- Test button: Preview the current speed setting before applying
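As a rough illustration of the speed range, the sketch below clamps a requested multiplier to the documented limits and estimates playback time. The helper names are hypothetical, and the estimate assumes duration scales inversely with speed, which this guide does not state.

```python
MIN_SPEED, MAX_SPEED, DEFAULT_SPEED = 0.5, 2.0, 1.0


def normalize_speed(value=None):
    """Return a speed multiplier inside the documented 0.5x to 2.0x range."""
    if value is None:
        return DEFAULT_SPEED
    return min(MAX_SPEED, max(MIN_SPEED, float(value)))


def estimated_duration(seconds_at_normal_speed, speed):
    """Rough playback time, assuming duration scales inversely with speed."""
    return seconds_at_normal_speed / normalize_speed(speed)
```

Under that assumption, a passage that takes 10 seconds at 1.0x would take about 5 seconds at 2.0x and about 20 seconds at 0.5x.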
Auto-Play Responses
When enabled, AI responses in chat are automatically converted to speech and played back. This creates a hands-free conversational experience, especially useful when combined with voice input.
Toggle auto-play in Settings → Voice → "Auto-play voice responses".