Voice Notes
Never miss a detail again. Instantly capture, transcribe, label speakers, and summarize meetings, lectures, or brilliant ideas with word-level precision—all securely processed on your device.
Recording Audio & Import
Navigate to the AI Voice Note tab and tap the record button to start capturing audio. Speech recognition runs on-device using the engine selected in Settings → Voice, such as WhisperKit or Apple STT.
- Real-time transcription: Text appears as you speak
- Background recording: On iOS, recording continues when the app is in the background
- Import audio (PRO): Turn your past recordings into searchable, summarizable text with clear, step-by-step progress tracking so you're never left guessing.
All audio processing happens entirely on your device, using on-device engines such as WhisperKit and Apple's Speech framework. No audio data is sent to any server.
Transcription
Transcription runs locally using one of several engines: Whisper models, Apple STT, Parakeet, Nemotron, or Qwen3-ASR. Pick a model in Settings → Voice based on your language needs and device. Features include:
- Word-level timestamps: Each word is timestamped for precise navigation
- Multiple languages: Support for many languages including English, Chinese, Japanese, Spanish, French, and more
- Automatic punctuation: The model adds punctuation and sentence structure
Transcription models
In addition to Whisper and Apple STT, several newer models are available. They run entirely on-device and are downloaded once before first use.
Batch transcription
These models process a recorded or imported audio file after recording finishes. They tend to be more accurate than streaming models because they can look at the full audio context.
| Model | Languages | Runs on | Notes |
|---|---|---|---|
| Parakeet TDT v2 | English | Mac (8 GB+) | Highest English accuracy in this group. ~700 MB download. |
| Parakeet TDT v3 | 25 European languages | iPad / Mac | Same architecture as v2 with multilingual support. ~700 MB download. |
| Parakeet TDT-CTC 110M | English | iPhone / iPad / Mac | Smaller model (~407 MB). The only Parakeet model that supports custom vocabulary. |
| Parakeet CTC Japanese | Japanese | iPad / Mac | ~700 MB download. Character error rate (CER) around 6.85%. |
| Parakeet CTC Chinese | Mandarin Chinese | iPad / Mac | Available in Int8 (~550 MB) and full-precision (~1.1 GB) variants. |
| Qwen3-ASR | 30 languages | iPad / Mac | Covers CJK, Southeast Asian, and European languages. Processes audio in segments of up to 30 seconds. Available in Int8 (~800 MB) and F32 (~1.5 GB). |
Parakeet TDT-CTC 110M is the only model in this group that works with the custom vocabulary feature. If you need the app to recognize domain-specific terms (product names, acronyms, jargon), use this model.
Streaming recognition
These models transcribe in real time while you record, so text appears on screen as you speak.
| Model | Languages | Runs on | Notes |
|---|---|---|---|
| Parakeet EOU 120M | 35+ languages | iPhone / iPad / Mac | Free for all users. ~150 MB download. End-of-utterance detection for natural segmentation. |
| Nemotron Streaming | English | iPad / Mac | Lower error rate than Parakeet EOU for English (word error rate, WER, around 2%). Available in 560 ms and 1120 ms chunk variants. ~600 MB download. |
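The tables quote accuracy as WER (word error rate) and CER (character error rate). As a rough illustration of what those figures mean, the sketch below computes both as edit distance divided by reference length, which is the standard definition. The sample sentences are made up for the example, and this is not how the app or the model authors measured the published numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from output
                curr[j - 1] + 1,     # insertion: extra item in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level errors divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_error_rate(reference, hypothesis):
    """Character-level errors divided by the number of reference characters."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# One wrong word out of five reference words gives a WER of 20%.
wer = word_error_rate("send the report by friday", "send a report by friday")
```

CER is the usual metric for languages such as Japanese and Chinese, where text is not split into words by spaces. A WER of about 2% corresponds to roughly one word-level error per 50 reference words.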
Parakeet EOU 120M is a good default for real-time transcription on any device, including iPhone.
All Parakeet, Nemotron, and Qwen3 models require a Pro subscription except Parakeet EOU 120M, which is free.
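To make the two tables easier to apply, here is a minimal sketch that encodes them as data and filters by device, mode, subscription, and the need for custom vocabulary. The `Model` type, the `CATALOG` list, and the `candidates` function are hypothetical names for illustration and are not part of the app. Language coverage is kept as the descriptive text from the tables, because the full language lists are not given here.

```python
from dataclasses import dataclass

ALL_DEVICES = frozenset({"iphone", "ipad", "mac"})
IPAD_MAC = frozenset({"ipad", "mac"})


@dataclass(frozen=True)
class Model:
    name: str
    mode: str               # "batch" or "streaming"
    languages: str          # coverage as described in the tables
    devices: frozenset      # subset of {"iphone", "ipad", "mac"}
    free: bool = False      # False means a Pro subscription is required
    custom_vocabulary: bool = False


# Values transcribed from the batch and streaming tables above.
CATALOG = [
    Model("Parakeet TDT v2", "batch", "English", frozenset({"mac"})),
    Model("Parakeet TDT v3", "batch", "25 European languages", IPAD_MAC),
    Model("Parakeet TDT-CTC 110M", "batch", "English", ALL_DEVICES,
          custom_vocabulary=True),
    Model("Parakeet CTC Japanese", "batch", "Japanese", IPAD_MAC),
    Model("Parakeet CTC Chinese", "batch", "Mandarin Chinese", IPAD_MAC),
    Model("Qwen3-ASR", "batch", "30 languages", IPAD_MAC),
    Model("Parakeet EOU 120M", "streaming", "35+ languages", ALL_DEVICES,
          free=True),
    Model("Nemotron Streaming", "streaming", "English", IPAD_MAC),
]


def candidates(device, mode, has_pro, needs_custom_vocabulary=False):
    """Return the names of models that fit the given constraints."""
    return [
        m.name
        for m in CATALOG
        if device in m.devices
        and m.mode == mode
        and (has_pro or m.free)
        and (m.custom_vocabulary or not needs_custom_vocabulary)
    ]
```

For example, `candidates("iphone", "streaming", has_pro=False)` returns only Parakeet EOU 120M, and `candidates("mac", "batch", has_pro=True, needs_custom_vocabulary=True)` returns only Parakeet TDT-CTC 110M. The sketch ignores the 8 GB memory requirement noted for Parakeet TDT v2.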
Speaker Diarization PRO
Automatically label who said what in your voice notes and imported recordings.
- Model download: Download speaker models once directly from the Voice Note UI.
- Label Speakers toggle: Turn on diarization to instantly split transcripts by speaker.
- Precision matching: Advanced tuning controls help improve speaker label accuracy, including when people talk over each other.
- Display Speakers toggle: Switch cleanly between speaker labels and timestamp-only views.
- Persisted labels: Speaker labels are securely saved and instantly restored when you reopen a recording.
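One common way to combine speaker diarization with a word-timestamped transcript is to assign each word to the speaker turn it overlaps most in time, then merge consecutive words from the same speaker into lines. The sketch below shows that idea on made-up data. It is an assumption for illustration, not a description of how the app implements speaker labeling, and the names `label_words` and `group_by_speaker` are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time ranges."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words, turns):
    """Attach a speaker to each (text, start, end) word.

    `turns` is a list of (speaker, start, end) tuples from diarization.
    Words that overlap no turn get the speaker None.
    """
    labeled = []
    for text, start, end in words:
        best = max(turns, key=lambda t: overlap(start, end, t[1], t[2]),
                   default=None)
        if best is not None and overlap(start, end, best[1], best[2]) > 0:
            labeled.append((best[0], text))
        else:
            labeled.append((None, text))
    return labeled


def group_by_speaker(labeled):
    """Merge consecutive words from the same speaker into one line."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return [(speaker, " ".join(parts)) for speaker, parts in lines]


# Made-up sample data: word timestamps and two speaker turns.
words = [("Hello", 0.0, 0.4), ("everyone", 0.4, 1.0),
         ("Hi", 1.2, 1.5), ("there", 1.5, 1.9)]
turns = [("Speaker 1", 0.0, 1.1), ("Speaker 2", 1.1, 2.0)]

transcript = group_by_speaker(label_words(words, turns))
# [("Speaker 1", "Hello everyone"), ("Speaker 2", "Hi there")]
```

Overlapping speech is the hard case for this approach: a word spoken while two people talk at once overlaps both turns, so the assignment depends on where the turn boundaries fall.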
Re-Transcription
Use Re-Transcript to regenerate transcript text from existing audio with Whisper or Apple STT.
- Model switching: Re-run with a different transcription model for better results
- Optional speaker labeling: Apply diarization automatically after re-transcription
- Safe overwrite: New transcript and speaker labels replace older results for the same audio
AI Processing
After transcription, you can process the text with AI for:
- Summarization: Get concise summaries of meetings or lectures
- Translation: Translate the transcription to another language
- Key points: Extract action items and key takeaways
- Speaker-aware analysis: Your AI assistant knows exactly who said what. Ask it to "Summarize Sarah's points" or "List action items for John" for deeply personalized insights.
- Custom processing: Use any prompt to analyze your transcript however you need.
Word-Level Navigation
Tap any word in the transcription to jump to that exact moment in the recording. This makes it easy to:
- Verify specific quotes or statements
- Re-listen to important sections
- Navigate long recordings efficiently
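Word-level navigation relies on each word carrying its own start time. As a minimal sketch of the idea, assuming words are stored as `(text, start, end)` tuples sorted by start time, tapping a word maps to a seek position, and the current playback position maps back to a word through a binary search. The function names and sample data are hypothetical.

```python
from bisect import bisect_right

# Made-up sample data: (text, start_seconds, end_seconds), sorted by start.
words = [
    ("Ship", 0.00, 0.32),
    ("the", 0.32, 0.45),
    ("release", 0.45, 1.02),
    ("on", 1.30, 1.42),
    ("Friday", 1.42, 2.00),
]


def seek_time_for_tap(words, index):
    """Playback position to jump to when the word at `index` is tapped."""
    return words[index][1]


def word_index_at(words, position):
    """Index of the word to highlight at a playback position.

    Returns the most recently started word, so the highlight stays on the
    previous word during a pause between words.
    """
    starts = [start for _, start, _ in words]
    return max(bisect_right(starts, position) - 1, 0)


tapped = seek_time_for_tap(words, 4)        # 1.42, the start of "Friday"
current = words[word_index_at(words, 0.5)]  # the "release" entry
```

Binary search keeps the lookup fast even for long recordings with many thousands of words.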
Organization & Renaming
Keep your brilliant ideas and crucial meetings perfectly organized.
- Custom naming: Easily rename any recording so you can identify important lectures, interviews, and brainstorms at a single glance.
Language Support
Whisper and Apple STT support transcription in many languages. The newer Parakeet and Qwen3 models add further options for Japanese, Mandarin Chinese, Korean, Thai, Vietnamese, and dozens of other languages. You can let the model detect the spoken language automatically or set it manually for better accuracy.
For best transcription quality, record in a quiet environment and speak clearly. An external microphone can also improve accuracy.