Voice Notes

Never miss a detail again. Instantly capture, transcribe, label speakers, and summarize meetings, lectures, or brilliant ideas with word-level precision, all securely processed on your device.

Recording Audio & Import

Navigate to the AI Voice Note tab and tap the record button to start capturing audio. The app uses either WhisperKit or Apple STT for on-device speech recognition.

ℹ️ Privacy

All audio processing happens entirely on your device, using on-device engines such as WhisperKit and Apple's Speech framework. No audio data is sent to any server.

Transcription

Transcription runs locally using one of several engines: Whisper models, Apple STT, Parakeet, Nemotron, or Qwen3-ASR. Pick a model in Settings → Voice based on your language needs and device. Features include:

Transcription models

In addition to Whisper and Apple STT, several newer models are available. They run entirely on-device and are downloaded once before first use.

Batch transcription

These models process a recorded or imported audio file after recording finishes. They tend to be more accurate than streaming models because they can look at the full audio context.

| Model | Languages | Runs on | Notes |
| --- | --- | --- | --- |
| Parakeet TDT v2 | English | Mac (8 GB+) | Highest English accuracy in this group. ~700 MB download. |
| Parakeet TDT v3 | 25 European languages | iPad / Mac | Same architecture as v2 with multilingual support. ~700 MB download. |
| Parakeet TDT-CTC 110M | English | iPhone / iPad / Mac | Smaller model (~407 MB). The only Parakeet model that supports custom vocabulary. |
| Parakeet CTC Japanese | Japanese | iPad / Mac | ~700 MB download. CER around 6.85%. |
| Parakeet CTC Chinese | Mandarin Chinese | iPad / Mac | Available in Int8 (~550 MB) and full-precision (~1.1 GB) variants. |
| Qwen3-ASR | 30 languages | iPad / Mac | Covers CJK, Southeast Asian, and European languages. 30-second clip limit per segment. Available in Int8 (~800 MB) and F32 (~1.5 GB). |

ℹ️ Custom vocabulary

Parakeet TDT-CTC 110M is the only model in this group that works with the custom vocabulary feature. If you need the app to recognize domain-specific terms (product names, acronyms, jargon), use this model.
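The Qwen3-ASR entry in the table above notes a 30-second clip limit per segment. As a rough illustration of what that implies for a longer recording, the sketch below divides a duration into back-to-back segments of at most 30 seconds. The function name `split_into_segments` is hypothetical, and the app's actual segmentation (for example, whether it cuts at pauses) is not described here, so treat this as arithmetic only.

```python
from typing import List, Tuple


def split_into_segments(duration_s: float, max_segment_s: float = 30.0) -> List[Tuple[float, float]]:
    """Return (start, end) pairs in seconds, each at most max_segment_s long.

    Hypothetical helper: fixed-length cuts are an assumption, not the app's
    documented behavior.
    """
    duration_s = float(duration_s)
    if duration_s < 0 or max_segment_s <= 0:
        raise ValueError("duration must be >= 0 and max_segment_s must be > 0")

    segments: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    # A 75-second recording needs three segments under a 30-second limit.
    for start, end in split_into_segments(75):
        print(f"{start:5.1f}s to {end:5.1f}s")
```

Under this assumption, a one-hour recording would be handled as 120 segments.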

Streaming recognition

These models transcribe in real time while you record, so text appears on screen as you speak.

| Model | Languages | Runs on | Notes |
| --- | --- | --- | --- |
| Parakeet EOU 120M | 35+ languages | iPhone / iPad / Mac | Free for all users. ~150 MB download. End-of-utterance detection for natural segmentation. |
| Nemotron Streaming | English | iPad / Mac | Lower error rate than Parakeet EOU for English (~2% WER). Available in 560 ms and 1120 ms chunk variants. ~600 MB download. |

💡 Tip

Parakeet EOU 120M is free and covers 35+ languages. It's a good default for real-time transcription on any device, including iPhone.
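The model tables quote accuracy as WER (word error rate) and CER (character error rate). Both are conventionally computed as the edit distance between a reference transcript and the model's output, divided by the length of the reference. The sketch below shows that calculation. The function names are illustrative, and details such as ignoring whitespace for CER are assumptions, since the exact scoring setup behind the quoted figures is not stated.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring whitespace (an assumption of this sketch)."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    print(wer("the quick brown fox", "the quick brown box"))
```

On this definition, a WER of about 2% corresponds to roughly one word-level error per 50 reference words.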

All Parakeet, Nemotron, and Qwen3 models require a Pro subscription except Parakeet EOU 120M, which is free.
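To make the device, subscription, and custom vocabulary constraints easier to compare, the sketch below encodes the two tables as data and filters them. It is only an illustration: the `Model` class and `available_models` function are hypothetical and not part of the app. Language coverage is kept as a descriptive label because the tables do not list individual languages, Whisper and Apple STT are left out because their requirements are not listed, and custom vocabulary support for Nemotron Streaming is not stated, so it is assumed to be unsupported.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    mode: str                # "batch" or "streaming"
    languages: str           # label copied from the tables, not a full language list
    devices: FrozenSet[str]  # device families from the "Runs on" column
    custom_vocabulary: bool
    free: bool


ALL_DEVICES = frozenset({"iPhone", "iPad", "Mac"})
IPAD_AND_MAC = frozenset({"iPad", "Mac"})

CATALOG = [
    # Parakeet TDT v2 additionally needs a Mac with 8 GB+ of memory.
    Model("Parakeet TDT v2", "batch", "English", frozenset({"Mac"}), False, False),
    Model("Parakeet TDT v3", "batch", "25 European languages", IPAD_AND_MAC, False, False),
    Model("Parakeet TDT-CTC 110M", "batch", "English", ALL_DEVICES, True, False),
    Model("Parakeet CTC Japanese", "batch", "Japanese", IPAD_AND_MAC, False, False),
    Model("Parakeet CTC Chinese", "batch", "Mandarin Chinese", IPAD_AND_MAC, False, False),
    Model("Qwen3-ASR", "batch", "30 languages", IPAD_AND_MAC, False, False),
    Model("Parakeet EOU 120M", "streaming", "35+ languages", ALL_DEVICES, False, True),
    # Custom vocabulary support is not stated for Nemotron; False is an assumption.
    Model("Nemotron Streaming", "streaming", "English", IPAD_AND_MAC, False, False),
]


def available_models(device: str, has_pro: bool, mode: Optional[str] = None,
                     needs_custom_vocabulary: bool = False) -> List[str]:
    """Return names of catalog models that satisfy the given constraints."""
    names = []
    for m in CATALOG:
        if device not in m.devices:
            continue
        if not (m.free or has_pro):
            continue
        if mode is not None and m.mode != mode:
            continue
        if needs_custom_vocabulary and not m.custom_vocabulary:
            continue
        names.append(m.name)
    return names


if __name__ == "__main__":
    print(available_models("iPhone", has_pro=False))
    print(available_models("Mac", has_pro=True, needs_custom_vocabulary=True))
```

Without a Pro subscription the only match on any device is Parakeet EOU 120M, which is consistent with the tip above.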

Speaker Diarization (Pro)

Automatically label who said what in your voice notes and imported recordings.

Re-Transcription

Use Re-Transcript to regenerate transcript text from existing audio with Whisper or Apple STT.

AI Processing

After transcription, you can process the text with AI for:

Tap any word in the transcription to jump to that exact moment in the recording. This makes it easy to:

Organization & Renaming

Keep your brilliant ideas and crucial meetings perfectly organized.

Language Support

Whisper and Apple STT support transcription in many languages. The newer Parakeet and Qwen3 models expand coverage to include Japanese, Mandarin Chinese, Korean, Thai, Vietnamese, and dozens of other languages not previously available. You can let the model detect spoken language automatically or set it manually for better accuracy.

💡 Tip

For best transcription quality, use a quiet environment and speak clearly. External microphones also improve accuracy significantly.