Five tools · One honest comparison

Picking a transcription app shouldn't feel like pulling teeth.

We spent a few weekends running real recordings — interviews, lectures, voice memos, the occasional podcast — through five popular desktop transcription apps. Here's what we found, told plainly. No leaderboard. No "winner". Just five tools and the kinds of work each one is actually good at.

5 apps, tested in detail · 100% local, no cloud uploads · Mostly free, no subscriptions · No fluff, honest reviews

The shortlist

Five tools we keep coming back to

All five run locally on your computer. None of them charge a subscription. Each one solves the problem in its own way — which is the whole reason this site exists.

01 Cross-platform

Buzz

macOS · Windows · Linux · Open source

A no-frills GUI on top of OpenAI's Whisper. Drag a file, pick a model, get text. The app most people should try first if they've never run a transcription locally before.

02 Cross-platform

Subtitle Edit

Windows native · newer macOS port · Linux via Mono · Open source

Technically a subtitle editor, but if you've ever needed to fix Whisper's punctuation, re-time captions, or export to a dozen different subtitle formats, nothing else comes close. Heavier than Buzz, but pays it back the moment you start cleaning up output.

03 macOS only

Whisper Transcription

Mac App Store · Free tier · Paid models

The polished one. Sandboxed, signed, sits quietly in the menu bar. If you want something that feels Mac-native rather than a Python script wearing a coat, this is it.

04 macOS only Free

Pyrenees

Apple Silicon · Free · MLX-powered

The newcomer. Built around Apple's MLX framework, it's the fastest of the bunch on M-series chips for the model sizes it supports — sometimes by an embarrassing margin.

05 macOS only

VoiceInk

Real-time dictation · Open source

Different category, really: VoiceInk is for dictation, not file transcription. Hold a hotkey, talk, and your text lands wherever the cursor is. We included it because it answers a question the others don't.

At a glance

A quick comparison

A simplified table — useful for narrowing down, not for making the final decision. The full reviews go deeper into where each tool stumbles.

| App | Platforms | Price | Best for | File mode | Live dictation | Subtitle export |
|---|---|---|---|---|---|---|
| Buzz | Mac · Win · Linux | Free (open source) | First-time users, quick batch jobs | Yes | Limited | SRT, VTT, TXT |
| Subtitle Edit | Win · Mac (newer native port) · Linux (Mono) | Free (open source) | Cleaning up transcripts & subtitle work | Yes | No | ~200 formats |
| Whisper Transcription | macOS | Free tier · paid model unlocks | Mac users who want polish over tinkering | Yes | Microphone capture | SRT, VTT, TXT, DOCX |
| Pyrenees | macOS (Apple Silicon) | Free | Speed, batch jobs on M-series Macs | Yes | No | SRT, VTT, TXT |
| VoiceInk | macOS | Free (open source) | Dictation into any app | Secondary feature | Primary feature | N/A |

How to pick

Start with the question, not the tool

People keep asking us, "what's the best transcription app?" That question doesn't really have an answer. Here's a better way to think about it.

If you've never done this before…

Try Buzz. It's the lowest-friction way to find out whether local transcription is good enough for your needs. Five minutes in, you'll know.

If you make subtitles for a living…

Subtitle Edit, no contest. The waveform editor and format support outclass the rest of the field. Whisper integration is just the cherry on top.

If you live in macOS and value polish…

Whisper Transcription if you want a curated experience and don't mind paying for higher-quality models; Pyrenees if you'd rather have speed at zero cost and don't need bells and whistles.

If you want to dictate, not transcribe files…

VoiceInk. It's the only one of the five built around the "I want to talk into my computer" workflow. The other four are the wrong tools for that job.

A note from us

We're not journalists. We're a couple of people who got tired of "Top 10 AI Transcription Tools 2025" articles that all said the same five things in the same five blocks. There's nothing wrong with affiliate roundups — they keep the lights on for a lot of small sites — but they tend to flatten the differences between tools. We wanted somewhere that does the opposite, and explains why you'd pick one over another.

Read the reviews in any order. Most people don't read them all — and that's fine. More about us here, if you're curious.

Buzz

Buzz is one of those tools that solves exactly one problem and then gets out of your way. The problem in question — running OpenAI's Whisper model on a file you don't want leaving your laptop — used to involve at least one virtual environment, a couple of pip install commands, and a 50-50 chance of an opaque ffmpeg error. Buzz turns all of that into a window with a button labeled "Transcribe". Drag in your audio, pick a model size, watch the progress bar. Output goes to a folder. Done.

That's the whole pitch. If it sounds underwhelming, that's because the surface is supposed to be underwhelming. The interesting bit is what it doesn't do — it doesn't ask you to make an account, doesn't push your file to a server, doesn't try to upsell you to an "AI summary" feature on a credits plan. It is genuinely just a wrapper. And after spending a fair bit of time with the app across two laptops and three different operating systems, I think that minimalism is the strongest thing about it.

What Buzz actually is

Buzz is an open-source desktop application written by Chidi Williams. The repository on GitHub has been around since 2022 and continues to receive updates. Under the hood, it bundles more than one transcription engine, most notably the original OpenAI Whisper implementation in Python and the much faster whisper.cpp port written in C++. You can pick which one you want at the model-loading screen, and which one you want depends on what kind of computer you're sitting at.

If you've ever used a tool like Audacity, Buzz will look familiar in spirit: utilitarian, slightly old-fashioned in its widgets, clearly designed by people who care more about the verb than the noun. There's no marketing-driven empty space. There's no dashboard. The main window is a list of transcription jobs, and each row tells you whether it's queued, running, or finished.

Note

Buzz is not OpenAI's official Whisper app. OpenAI has never released one. Buzz is a community-built front-end that loads OpenAI's open-source model on your local machine. The transcription happens entirely on your computer; nothing is sent to OpenAI or any other server.

How it feels to use

The first thing that struck me was how quickly I got from "I just downloaded this" to "I have a transcript." On a 2021 M1 MacBook Pro, the entire setup took about three minutes — and most of that was the initial model download (the medium-size model is about 1.5 GB). On a five-year-old Windows laptop without a discrete GPU, it took longer to run the transcription, but the setup was identical.

The interface isn't beautiful. I want to get that out of the way. It's built with PyQt, and it looks like every PyQt app ever made: serviceable, slightly chunky, with menus that haven't been redesigned since the early 2010s. But I've spent enough time with truly bad audio software to know the difference between "ugly" and "clumsy", and Buzz is firmly in the first category. Things are where you'd expect. The keyboard shortcuts work. The progress reporting is honest — the percentage moves at roughly the speed it claims to be moving.

I tested it on a 47-minute interview I'd recorded for an unrelated project. The tiny model finished in about 90 seconds and produced a transcript that was readable but not really usable; about one word in twenty was wrong. The base model took maybe four minutes and the result was significantly better, good enough, I'd say, for a personal note-taking workflow. The medium model took around fifteen minutes on the M1, and the output was the kind of transcript you'd want if you were going to publish it after a light edit. None of this should surprise anyone who's used Whisper before. The point is just that Buzz exposes those tradeoffs in plain English instead of asking you to memorize model names.
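
A side note for anyone who wants a number instead of my "one word in twenty" eyeball estimate: the usual metric is word error rate (WER), the word-level edit distance between a hand-corrected reference and the model's output, divided by the number of reference words. Below is a minimal Python sketch. It assumes you've corrected a short excerpt by hand; the function names and the crude normalization are ours, and none of this is a Buzz feature.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase and keep only word characters so punctuation isn't counted as an error."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion: reference word missing from output
                cur[j - 1] + 1,          # insertion: extra word in output
                prev[j - 1] + (r != h),  # substitution (free when the words match)
            ))
        prev = cur
    return prev[-1] / len(ref)
```

One wrong word in a twenty-word reference comes out as 0.05, which is roughly where the tiny model landed on my recording.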

The best thing I can say about Buzz is that I forgot I was using it. It does the job, the job ends, the file appears.

Under the hood

Buzz lets you choose between several model sizes (tiny, base, small, medium, large) and several backends. The most important choice is between the original OpenAI Whisper Python implementation, the whisper.cpp backend, and Hugging Face's transformers-based implementation. There's also support for using OpenAI's hosted Whisper API if you'd prefer to send the file to OpenAI in exchange for faster results — but that defeats the privacy advantage, and almost no one I know who installs Buzz uses that mode.

Two practical notes from real-world use:

On Apple Silicon, the whisper.cpp backend with Core ML acceleration is the fastest by a wide margin. You'll want to enable that.
On any machine with a recent NVIDIA GPU and CUDA installed, the original PyTorch backend will use the GPU and become noticeably faster than the CPU-only paths. On laptops without a dedicated GPU, the difference between backends shrinks dramatically.
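
If it helps to see those two rules of thumb written down as logic, here is a small Python sketch. It is purely illustrative: the returned labels are descriptive strings we made up, not Buzz setting names, and checking for the `nvidia-smi` tool is only a rough stand-in for "has an NVIDIA GPU with drivers installed".

```python
import platform
import shutil


def suggest_backend(system: str, machine: str, has_nvidia: bool) -> str:
    """Return a backend label following the two rules of thumb above.

    The labels are descriptive strings for this sketch, not Buzz setting names.
    """
    if system == "Darwin" and machine == "arm64":
        return "whisper.cpp (Core ML)"
    if has_nvidia:
        return "OpenAI Whisper (PyTorch, CUDA)"
    return "app default"


def detect() -> str:
    """Apply the rules to the current machine."""
    # Heuristic: treat the presence of the nvidia-smi tool as "has an NVIDIA GPU".
    has_nvidia = shutil.which("nvidia-smi") is not None
    return suggest_backend(platform.system(), platform.machine(), has_nvidia)
```

Note that a Python interpreter running under Rosetta on an M-series Mac reports `x86_64`, so `detect()` would fall through to the default there.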

Buzz also supports a "Live Recording" mode where it'll transcribe directly from your microphone as you speak. I've used this feature exactly twice, and both times I came away thinking that this is not what Buzz is for. The latency is wrong for it — you'll get text in chunks of several seconds — and it doesn't integrate with other apps. If you want dictation that drops text where your cursor is, look at VoiceInk instead. If you want live captions for a video call, look elsewhere entirely. Buzz is a file-based tool with a microphone option grafted on, and you can feel the seam.

Tip

If you've already tried Buzz and the transcripts come back with weird timing or punctuation issues, don't wrestle with the app — export to .srt or .vtt and clean up in Subtitle Edit. It's faster than fighting Buzz's text editor.
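
If you've never looked inside an .srt, it's plain text: a cue number, a timing line like `00:00:01,500 --> 00:00:04,000`, one or more lines of text, then a blank line. Here is a minimal Python sketch of reading one, in case you'd rather script a quick check than open an editor. `Cue` and `parse_srt` are our own illustrative names, and the parser assumes a reasonably well-formed file.

```python
import re
from dataclasses import dataclass

TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_srt(srt: str) -> list[Cue]:
    """Parse SRT text into cues; blocks that don't look like cues are skipped."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.strip().replace("\r\n", "\n")):
        lines = block.split("\n")
        if len(lines) < 3:
            continue
        match = TIMING.search(lines[1])
        if not match:
            continue
        h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
        start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
        end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
        # Multi-line cue text is joined into a single line.
        cues.append(Cue(start, end, " ".join(lines[2:]).strip()))
    return cues
```

Joining `cue.text` for every cue gives you a plain transcript; reading the file with `encoding="utf-8-sig"` avoids tripping over a byte-order mark on the first cue.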

Pros and cons

What works

Truly cross-platform; same UX on Mac, Windows, and Linux
Genuinely free — MIT license, no signup, no cap
Multiple backend options, including fast whisper.cpp
Local-only by default; the file never leaves your machine
Supports the most common output formats: SRT, VTT, TXT
Batch mode handles long queues without supervision

Where it falls short

UI looks dated and won't appeal to anyone used to native-feeling apps
Live recording mode is awkward and high-latency
No real waveform editor or word-level timing tools
Translation feature exists but is limited (Whisper-style, English-only target)
First-launch model download can be slow on flaky connections
Crashes happen occasionally on very long files (3+ hours)

How to use Buzz: a step-by-step

This is the part most write-ups skip, so here it is. The whole flow, from "I haven't installed anything" to "I have a clean SRT", on a Mac. Windows and Linux are nearly identical.

Download Buzz from the official GitHub releases page. Pick the build for your operating system. On macOS, that's a .dmg; on Windows it's an .exe installer; on Linux you've got AppImage and Snap options.
Install it like any normal app. On Mac, drag to Applications. On first launch, macOS may complain about an unidentified developer; control-click and choose "Open" once and the warning goes away.
Open Preferences and choose your default backend. On Apple Silicon, "Whisper.cpp" with Core ML support is the right answer. On Windows with an NVIDIA GPU, the OpenAI Whisper backend will use CUDA. Otherwise, leave it on the default.
Drag your audio or video file into the main window. Buzz accepts MP3, WAV, M4A, FLAC, MP4, MOV — basically anything ffmpeg can read.
Pick a model size. Start with base if you're not sure. Move up to medium for cleaner output. The large model is the most accurate but slow and memory-hungry.
Pick output formats. For interviews, TXT plus SRT is the right combination. The first is for reading; the second is for any future cleanup work in a subtitle editor.
Hit Transcribe and walk away. Seriously. Make a cup of tea. The progress bar updates honestly. When it finishes, the output files appear next to the source.

Example

Test recording: a 47-minute interview, recorded into the iPhone Voice Memos app, exported as .m4a.
Result with the medium model on M1 MacBook Pro: finished in 14 minutes 22 seconds. The transcript needed roughly 5 minutes of cleanup — mostly proper nouns the model didn't know, plus the usual punctuation around hesitations.
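
For comparing machines or models, it helps to turn timings like these into a speed factor: seconds of audio transcribed per second of waiting. A small Python sketch of the arithmetic, using the figures from the example above (the helper names are ours, and the recording length is rounded to 47:00):

```python
def parse_duration(text: str) -> int:
    """Convert 'MM:SS' or 'HH:MM:SS' into seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def speed_factor(audio: str, processing: str) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    return parse_duration(audio) / parse_duration(processing)


# The medium-model run above: about 47 minutes of audio in 14 min 22 s.
print(f"{speed_factor('47:00', '14:22'):.2f}x real time")  # prints 3.27x real time
```

Anything above 1.0 is faster than real time; a value below 1.0 means the transcription takes longer than the recording itself.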

Compared to the alternatives

If we're staying inside this site's shortlist, here's where Buzz sits relative to the others:

Versus Subtitle Edit: Subtitle Edit can also drive Whisper, but it does much more than transcribe — it has a full waveform editor and supports an absurd number of subtitle formats. If you're a translator or a captioner, Subtitle Edit is probably your daily driver and Buzz is redundant. If you just want a transcript, Buzz is faster to learn.

Versus Whisper Transcription (Mac): Whisper Transcription is more polished, prettier, and better integrated into macOS. It's also Mac-only and has a paid tier. Buzz is uglier, but free everywhere.

Versus Pyrenees: Pyrenees is faster on Apple Silicon, full stop — but only on Apple Silicon. If you're on an M-series Mac and you mostly transcribe shorter files, Pyrenees wins on speed. Buzz wins on cross-platform consistency and on having more backend options.

Versus VoiceInk: Different tool for a different job. VoiceInk is for live dictation (talking into apps as you'd talk into iOS dictation). Buzz is for files. They don't really compete.

Who should use Buzz

Author's take

If you've never run a local transcription before — install Buzz first, even if you eventually move on to something else. It's the lowest-risk way to find out whether Whisper-quality output is good enough for your work. It costs nothing, it commits you to nothing, and the answer takes about ten minutes to get.

If you already know you need subtitle editing, dictation, or maximum speed on Apple Silicon, you can probably skip Buzz and head straight to the more specialized option. But you won't regret installing it as a fallback.

FAQ

Is Buzz free? Are there hidden costs?

Buzz itself is free under the MIT license. There's no signup, no trial, no premium tier. The only cost would be if you choose to use the OpenAI Whisper API mode, which routes audio through OpenAI's paid API — but that's optional and clearly labeled.

Does Buzz upload my audio anywhere?

Not by default. The local backends (whisper.cpp and the OpenAI open-source Whisper) do everything on your machine. The only mode that uploads anything is the explicit "OpenAI API" mode, and you have to provide your own API key to use it.

What languages does Buzz handle?

Whatever Whisper handles, which is roughly 99 languages with varying accuracy. English, the major European languages, and Mandarin tend to be very strong. Smaller languages and dialects get noticeably less reliable. Buzz exposes a translation mode that can convert non-English audio into English text, but not the other way around — that's a Whisper limitation, not a Buzz one.

Can I edit the transcript inside Buzz?

You can do basic text edits, but Buzz isn't a text editor or a subtitle editor. For any serious cleanup — re-timing, fixing punctuation, splitting cues — open the SRT in Subtitle Edit or another dedicated tool.

Does Buzz work offline?

Yes, after the initial model download. You only need an internet connection the first time you load each model. After that, transcription happens entirely offline.

Why is the large model so slow on my computer?

Whisper's large model is around 3 GB and benefits enormously from a GPU or Apple Silicon's Neural Engine. On an older CPU-only laptop, it can take longer than real-time — meaning a one-hour file might take an hour or more to process. The medium model is usually a better tradeoff if you don't have GPU acceleration.

Reviewed by the ashleyphillipsphoto.com team. We installed Buzz on three machines (M1 MacBook Pro, Windows 11 ThinkPad without a discrete GPU, and Ubuntu 22.04 on a desktop with an NVIDIA card) and ran the same set of test files on each. None of the developers of Buzz were contacted. We are not affiliated with the project.

Subtitle Edit

I want to start with a confession. The first time I tried to use Subtitle Edit for transcription, I gave up after fifteen minutes and went back to Buzz. The interface looked like a Windows XP control panel that had been dragged forward by sheer force of will, and I couldn't figure out how to make Whisper run from inside it. I assumed it was buggy. It wasn't. I was just wrong about what kind of program I was looking at.

Subtitle Edit is not a transcription app with caption editing tacked on. It's a caption editor with transcription tacked on. The difference matters. The whole layout — the waveform on the bottom, the cue list on the right, the keyboard-shortcut-everything ergonomics — assumes you already have something resembling subtitles and you're working on them. Once that flipped in my head, the program suddenly made sense, and within an hour I was using it for things I hadn't realized I needed.

Two ways to think about it

There are essentially two distinct workflows the app supports, and people who try Subtitle Edit usually fall into one camp or the other.

Workflow A: you have an audio or video file and you want clean, correctly timed captions. You import the media, you tell Subtitle Edit to run Whisper on it, you wait. You get a populated cue list. Then you spend twenty or thirty minutes cleaning it up: fixing punctuation, merging short cues, splitting long ones, retiming the parts where the model got confused. The output goes out as .srt or whatever else you need. This is the workflow professional captioners use.

Workflow B: you have a transcript already (from Buzz, from Whisper Transcription, from Otter, doesn't matter) and you want to fix it. You open the existing file, you bring in the audio so the waveform syncs with the cues, and you fix the obvious mistakes by listening and clicking. This is what I personally use it for, and I think it's the underrated use case. Even if your primary transcriber is something else entirely, Subtitle Edit makes a phenomenal "second tool".

A small confession

I now run almost everything through Buzz first and then open the resulting .srt in Subtitle Edit for cleanup. It's not the workflow the developer intended, but it's faster than trying to do everything in one app, and the keyboard-driven editing in Subtitle Edit is genuinely better than anything else I've tried.

What's it like to actually use?

The interface is dense. There's no other word for it. Every pixel earns its keep. On first launch you see a menu bar with maybe a dozen drop-downs, a toolbar with another dozen icons, and three separate panels that all want your attention at once. If you're used to single-purpose apps with three buttons total, the first impression is overwhelming.

Stick with it for an afternoon and the density becomes a feature. The reason every command has a keyboard shortcut is that captioners burn through hundreds of cues per session and need to keep their hands on the keyboard. The reason the waveform is always visible is that you're going to be matching cue boundaries to audio events, and it's faster to see them than to scrub. The "ugly" interface is the price of being able to do real work.

Whisper integration sits under Video → Audio to text (Whisper). From there you choose the engine: Subtitle Edit supports the original Python Whisper, whisper.cpp, Const-me's GPU implementation, Purfview's Faster-Whisper, and a couple of others depending on which version you have installed. Each engine has its own strengths. On a Windows laptop without a GPU, Purfview's implementation gave me the best balance of speed and accuracy. On a machine with an NVIDIA card, Const-me's GPU build was faster than anything else by a wide margin.

The format support is genuinely absurd

This is the part nobody talks about, and the thing that makes Subtitle Edit irreplaceable in some workflows. The app reads and writes well over two hundred subtitle formats. If you've ever stared at a file with a strange extension and wondered how to convert it to .srt without losing timing or styling, Subtitle Edit is almost certainly the answer.

A non-exhaustive sample of what it'll handle:

The everyday ones: SRT, VTT (WebVTT), ASS, SSA, TXT
Broadcast formats: EBU STL, Cavena 890, EBU-TT, EBU-TT-D, IMSC 1.1, TTML, DFXP, SCC, SMPTE-TT
DVD/Blu-ray: VobSub (.sub/.idx), SUP (BD), and OCR support for both
Streaming/professional: Netflix's IMSC variant, Apple iTT, MicroDVD, MPL2
Weird and old: JSON Web, FAB, Sonic Scenarist, ATS, plus a long tail of pretty obscure formats

If your transcription job ends in "and then we hand it to a broadcaster," Subtitle Edit may be the only free tool that can produce the format the broadcaster wants. That alone is reason enough for some people to keep it installed.
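
To give a feel for what a conversion involves at the easy end of that list, here is a Python sketch of SRT to WebVTT. The two formats are close cousins: WebVTT wants a `WEBVTT` header line and uses a period instead of a comma before the milliseconds. This is our own illustration, not how Subtitle Edit does it, and it ignores everything that makes real conversions hard (styling, positioning, encodings).

```python
import re

# SRT writes milliseconds after a comma; WebVTT uses a period.
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt: str) -> str:
    """Convert plain SRT text to WebVTT, leaving cue text and numbering untouched."""
    out = ["WEBVTT", ""]
    for line in srt.replace("\r\n", "\n").strip().split("\n"):
        if "-->" in line:
            # Only timing lines are rewritten, so commas in dialogue are safe.
            line = SRT_TIMESTAMP.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"
```

The SRT cue numbers survive as WebVTT cue identifiers, which the format allows. The broadcast formats further down the list are a different story, and that is where a dedicated tool earns its place.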

Tip for the OCR feature

If you ever find yourself with a video file whose subtitles are baked in as images (DVD/Blu-ray rips, some MKV files), Subtitle Edit's OCR can extract them to text. It's not perfect — proper nouns get mangled — but it'll save you hours over typing them out. Pair it with Whisper to cross-check timing.

Where it stumbles

Honest section. Subtitle Edit is not for everyone, and there are real friction points beyond the dense UI.

The platform story is uneven. The Windows version is a proper, signed, native application that has had two decades of polish. The Mac version is a newer port that runs natively on both Intel and Apple Silicon, but feels less mature — keyboard shortcuts that work flawlessly on Windows occasionally do nothing on Mac, certain dialogs appear off-screen, and waveform extraction sometimes fails on file types that work fine on the Windows build. On Linux, you're typically running it through Mono, which works but has its own assortment of papercuts. If you're not on Windows, expect rougher edges.

It's not a transcription-first app. If your goal is to get a clean .txt transcript and you don't care about timing, you'll find yourself fighting a UI that wants you to care about timing. You can absolutely use it for plain transcripts — just export to TXT after the cues are populated — but you'll spend a lot of attention on widgets you didn't need.

The translation features are uneven. There are translation integrations (Google, DeepL, LibreTranslate, ChatGPT API, others), but the quality varies and the UX of running them feels grafted on. For pure translation work, you're better off elsewhere.

The learning curve is real. Out of every tool we cover on this site, Subtitle Edit has the steepest first-week curve. Plan for it.

Pros & cons, condensed

Strengths

Best-in-class subtitle editing — keyboard-driven, fast, dense
Waveform editor that's better than most paid alternatives
Multiple Whisper backends, including GPU-accelerated ones
Format support that no other free tool comes close to
OCR for image-based subtitles included for free
Active development and an unusually responsive maintainer
Has been quietly dependable for over twenty years

Weak spots

UI looks and feels like Windows-native circa 2010
Mac and Linux ports lag behind the Windows build
Steep learning curve in the first few sessions
Wrong tool if you only want a plain transcript
Whisper setup involves picking the right engine, not a one-click affair
Translation quality is uneven and add-on-ish

How to use it: three short walkthroughs

Because Subtitle Edit covers more than one workflow, a single step-by-step guide doesn't quite work. Here are three focused ones.

1. From audio file to clean SRT (full Whisper workflow)

Open Subtitle Edit and load your video or audio: Video → Open video file. The waveform should appear at the bottom; if it doesn't, the file extension may not be supported and you'll need to convert.
Open the Whisper dialog: Video → Audio to text (Whisper). Pick the engine and model size. Faster-Whisper with the medium model is a good default on most Windows machines.
Let it run. Progress shows in a small log window. A 30-minute file on a midrange Windows laptop typically completes in 8–15 minutes.
Review the cue list. Each cue is a row. The waveform shows where it sits in the audio. Use the keyboard arrows to step through cues; spacebar plays the current one.
Fix obvious problems. Common ones: cues that are too long (use Tools → Split lines longer than…), overlapping cues, wrong proper nouns. Most of this can be handled with built-in batch tools.
Export. File → Save as → SRT for general use, or whichever format your downstream tool needs. Subtitle Edit can also do batch convert in case you have many files.
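
The "obvious problems" in the review step are mechanical enough to describe in code. Here is a Python sketch of the kind of checks involved: cues with too much text, cues that stay on screen too long, and cues that overlap the next one. The thresholds (84 characters, 7 seconds) are common captioning rules of thumb that we've picked as defaults for illustration; they are not Subtitle Edit's settings, and the function is ours.

```python
from typing import NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def flag_cues(
    cues: list[Cue], max_chars: int = 84, max_seconds: float = 7.0
) -> list[tuple[int, str]]:
    """Return (cue_index, problem) pairs for cues worth a second look."""
    problems = []
    for i, cue in enumerate(cues):
        if len(cue.text) > max_chars:
            problems.append((i, "too many characters"))
        if cue.end - cue.start > max_seconds:
            problems.append((i, "on screen too long"))
        # A cue overlaps when it ends after the next one has already started.
        if i + 1 < len(cues) and cue.end > cues[i + 1].start:
            problems.append((i, "overlaps next cue"))
    return problems
```

In practice you'd let Subtitle Edit's batch tools do this. The sketch is only meant to show that the cleanup is rule-based rather than mysterious.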

2. Cleaning up an SRT that another tool produced

Open the existing SRT in Subtitle Edit (File → Open).
Bring in the original audio via Video → Open video file. The waveform now syncs with the existing cues.
Run "Fix common errors" from the Tools menu. This catches things like missing spaces after periods, capitalization issues, double spaces, and Whisper's habit of starting cues with hesitation markers.
Walk through the cues that look suspicious — usually the very long ones and the very short ones — and fix them by listening to the audio.
Save. Done.
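
To make the "Fix common errors" step concrete, here is a rough Python sketch of three of the fixes mentioned above: a leading hesitation marker, a missing space after a period, and doubled spaces. It is our own approximation for illustration, not Subtitle Edit's implementation, and the list of hesitation words is an assumption you'd want to extend.

```python
import re

# Hesitation markers to strip from the start of a cue; extend as needed.
HESITATION = re.compile(r"^(?:(?:um+|uh+|er+|hmm+)\b[,.\s]*)+", re.IGNORECASE)


def tidy_cue(text: str) -> str:
    """Apply a few mechanical fixes to one cue's text."""
    original = text.strip()
    text = HESITATION.sub("", original)
    # Add a missing space after sentence punctuation ("started.Then").
    # Requiring a lowercase letter first leaves initialisms like "U.S.A." alone.
    text = re.sub(r"(?<=[a-z])([.!?])([A-Z])", r"\1 \2", text)
    # Collapse runs of spaces.
    text = re.sub(r" {2,}", " ", text)
    # If a capitalized hesitation was removed, pass the capital on to the new first word.
    if original[:1].isupper():
        text = text[:1].upper() + text[1:]
    return text
```

For example, `tidy_cue("Um, so we started.Then  it rained.")` returns `"So we started. Then it rained."`.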

3. Extracting subtitles from a Blu-ray rip

File → Import → VobSub or Blu-ray sup, depending on what you have.
Choose the OCR engine. The built-in nOCR is decent for European languages; Tesseract is better for some scripts.
Run OCR. Review the results and correct any glyphs the engine misread.
Save out as SRT for use anywhere else.

Subtitle Edit vs. the others

If we line it up against the rest of the shortlist:

Compared to Buzz, Subtitle Edit is the heavy tool. Buzz is for "I have a recording, I want a transcript". Subtitle Edit is for "I have a recording, I want broadcast-ready captions, and I'm willing to spend an afternoon getting them right." Both are free; they're answers to different questions.

Compared to Whisper Transcription, Subtitle Edit is dramatically uglier and dramatically more capable. Whisper Transcription will get you a clean transcript faster on a Mac. Subtitle Edit will let you actually shape it.

Compared to Pyrenees, the comparison doesn't really hold — Pyrenees is a transcription engine optimized for speed, Subtitle Edit is an editing environment. They could even live alongside each other: Pyrenees produces, Subtitle Edit edits.

Compared to VoiceInk, they share no overlap at all. Different jobs.

Who it's for

Author's take

Subtitle Edit is the answer when you've already passed the "can I run Whisper at all?" stage and you're now asking "how do I make this output usable?" I think most people will install Buzz first and discover Subtitle Edit a few months later, and that's the right order. But for translators, captioners, and anyone whose job involves the words "broadcast-safe", it's the most important free tool you can have.

FAQ

Is Subtitle Edit free to use commercially?

Yes. It's released under the GNU General Public License v3 and can be used commercially without restriction. The one nuance: if you bundle and redistribute Subtitle Edit, you have to comply with the GPL. Just using it on commercial work is unrestricted.

Does it run on Mac and Linux properly?

It runs, but with caveats. There's now a native build for macOS that supports Intel and Apple Silicon, and it's improving quickly — but the Windows version is still where the polish lives. On Linux you'll usually run it through Mono. If you need Subtitle Edit's full power, plan to use Windows or a Windows VM.

Which Whisper engine should I pick from inside the app?

For most Windows users without a GPU: Purfview's Faster-Whisper build is the most reliable balance of speed and accuracy. With an NVIDIA GPU: Const-me's GPU implementation tends to be the fastest. On macOS: whisper.cpp through the bundled integration. The differences are smaller than the choice of model size, so don't agonize.

Can it do real-time transcription?

No. Subtitle Edit is strictly a file-based tool. For live transcription or dictation, look elsewhere.

Does it support speaker diarization?

Not natively in a clean, automatic way. Whisper itself doesn't label speakers, and Subtitle Edit doesn't add a separate diarization step. If you need "Speaker 1 / Speaker 2" labels, you'll need to do that work manually in the cue list, or feed your audio through a separate diarization tool first.

Is the OCR feature any good?

Surprisingly yes, for European languages and especially for cleaner DVD subtitles. For Blu-ray SUPs the results are usually 90%+ accurate before correction. For non-Latin scripts, results vary — Tesseract does the heavy lifting there, and you'll need the right Tesseract language pack installed.

Tested over a couple of months on a Windows 11 laptop (Intel, no GPU), an M2 MacBook Air, and a Linux desktop running it through Mono. Most testing happened on the Windows build because it's the most complete; the Mac caveats reflect direct experience with the native build.

Whisper Transcription

Setting expectations

This is one of the rare apps in this space that someone clearly designed, rather than just shipped. You can tell immediately. The icon doesn't look like a Python logo with a microphone slapped on. The window has the right corner radius. The settings panel uses the macOS sheet style that lets you actually find what you're looking for. When you import a file, the app shows you metadata — sample rate, channels, length — that other transcription tools just shrug at.

None of this changes the underlying transcription quality. Whisper is Whisper, regardless of which app calls it. So the question Whisper Transcription has to answer is: given that the model is the same, what does this app give me that the free options don't?

The honest answer, after spending a couple of weeks with it: a lot of small things, none of which are individually decisive, but which add up to "this is the app I'd hand to my mother."

What it actually does

The core flow is identical to every other tool in this category. Drag a file in, pick a model, press a button, get text. Where Whisper Transcription differentiates is in the small touches.

The transcript view is interactive. Click a sentence, the audio jumps to that timestamp. Edit the sentence in place. Highlight a span and you get inline tools to merge cues, split them, change capitalization, mark a speaker. It's not Subtitle Edit's level of cue-editing power, but for working with prose-style transcripts, it's genuinely faster than re-opening your output in another app.

It can capture system audio, not just microphone. A small but uncommon feature. If you want to transcribe a YouTube video, a podcast you're listening to, or a Zoom call (with appropriate permissions), Whisper Transcription can pipe the system's audio output directly in. Most of the free alternatives only see the microphone.

Export is well thought through. SRT, VTT, plain text, and DOCX are all one click away. The DOCX export in particular is more polished than what you'll get from running Whisper through a script — it preserves paragraph breaks at sensible points, includes timestamps as headers if you want them, and doesn't dump everything into a single block of unreadable prose.
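
I don't know how the app decides where a paragraph ends, but one plausible approach is to break wherever the speaker pauses. Here is a Python sketch of that idea, working on timed segments and emitting plain text with a timestamp header per paragraph (plain text rather than DOCX, to keep it dependency-free). The 1.5-second pause threshold and all the names are our assumptions, not anything taken from Whisper Transcription.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def timestamp(seconds: float) -> str:
    """Format seconds as H:MM:SS for a paragraph header."""
    whole = int(seconds)
    return f"{whole // 3600}:{whole % 3600 // 60:02d}:{whole % 60:02d}"


def to_paragraphs(segments: list[Segment], pause: float = 1.5) -> str:
    """Start a new paragraph wherever the gap between segments is at least `pause` seconds."""
    paragraphs: list[list[Segment]] = []
    for seg in segments:
        if paragraphs and seg.start - paragraphs[-1][-1].end < pause:
            paragraphs[-1].append(seg)
        else:
            paragraphs.append([seg])
    blocks = []
    for para in paragraphs:
        body = " ".join(s.text.strip() for s in para)
        blocks.append(f"[{timestamp(para[0].start)}]\n{body}")
    return "\n\n".join(blocks)
```

A script that skips this step and simply concatenates segments is how you end up with the single unreadable block mentioned above.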

There's a menu-bar mode. If you click the menu bar icon, a small palette appears that lets you start a recording, drop in a file, or pull up your recent transcripts without opening the main app. It's the kind of detail a tinkerer never builds and a designer always insists on.

A small example

I recorded a 12-minute podcast intro on the day the new model unlock had just landed. Imported the M4A. The transcription took 2 minutes 40 seconds with the medium model on an M2 MacBook Air. The interactive transcript caught two proper nouns I'd mispronounced, and clicking each one to hear the audio playback was — and I don't mean this lightly — a delight. I didn't have to use the find function or scrub a waveform.
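If you want to compare that run against your own machine, the arithmetic is just audio length divided by processing time. Here is a minimal sketch in Python; the function name is ours, and the only numbers in it are the ones from the run above.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio were transcribed per second of waiting."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# The run described above: 12 minutes of audio, finished in 2 min 40 s.
audio = 12 * 60           # 720 seconds
processing = 2 * 60 + 40  # 160 seconds
print(f"{speed_factor(audio, processing):.1f}x real time")  # prints "4.5x real time"
```

Treat the result as a property of that file, that model, and that Mac, not as a benchmark.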

The pricing question

This is the part of the review where we have to talk about money, because it's the main thing that separates Whisper Transcription from the free alternatives.

The app is free to download from the Mac App Store. The free tier comes with the smaller Whisper models — typically tiny and base — which are fine for casual notes but noticeably less accurate than what you'd want for professional work. To unlock the larger models (medium, large, and the various distilled variants), you make a one-time in-app purchase. Pricing has shifted over time and may differ by region, so I'd rather you check the App Store listing than trust a number we wrote down at one point.

What's worth saying is that the pricing model is a one-time unlock, not a subscription. You pay once and you own the larger models. There's no monthly fee, no per-minute charge, no credits system. That alone makes it cheaper than most cloud-based transcription services if you transcribe more than a few hours a month.
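The comparison with per-minute cloud pricing is a simple break-even calculation. The sketch below shows the shape of it in Python. Every figure in it is a placeholder we made up for illustration, not a real price for this app or any service, so plug in the numbers you actually see.

```python
import math


def breakeven_months(one_time_price: float,
                     cloud_price_per_minute: float,
                     minutes_per_month: float) -> int:
    """Months of use until cumulative per-minute charges reach the one-time price."""
    monthly_cloud_cost = cloud_price_per_minute * minutes_per_month
    if monthly_cloud_cost <= 0:
        raise ValueError("cloud cost per month must be positive")
    return math.ceil(one_time_price / monthly_cloud_cost)


# Placeholder figures only: a hypothetical unlock price, a hypothetical
# per-minute rate, and five hours of audio a month.
months = breakeven_months(one_time_price=20.0,
                          cloud_price_per_minute=0.5,
                          minutes_per_month=10)
print(f"Break-even after about {months} month(s)")
```

The more you transcribe per month, the sooner the one-time unlock comes out ahead.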

My honest take on paying for it

Free Whisper exists. You can run it through Buzz or Pyrenees and get the same model output for nothing. So the question isn't "should I pay for transcription?" — it's "should I pay an indie Mac developer for a polished front-end?" If you transcribe occasionally and care about your time, yes. If you transcribe rarely or you genuinely enjoy fiddling with command-line flags, no. Both are fine answers.

Where the polish ends

I want to be direct about the limits here, because every "the polished one" review I've ever read tends to gloss over them.

Mac only. Obvious but worth saying. If you ever switch to Windows or Linux, neither your purchase nor your workflow follows you.

Less flexible than open-source alternatives. The app picks reasonable defaults and hides most of the tuning knobs. If you want to set custom Whisper parameters, run a fine-tuned model, or experiment with non-standard backends, you'll outgrow Whisper Transcription quickly. Buzz lets you switch backends; this doesn't.

Speed is good but not the best. On Apple Silicon, Pyrenees is faster — sometimes substantially faster — for the same model size. Whisper Transcription uses solid acceleration but isn't the speed champion of the field.

No deep subtitle editing. The interactive editor is a pleasure for prose, but it's not pretending to be Subtitle Edit. If your job involves cue-by-cue caption work, you'll still be exporting to .srt and finishing the job elsewhere.

App Store review constraints. Because it's distributed through the App Store, it lives inside Apple's sandbox rules. That has security upsides (the app can't quietly access files you didn't grant it access to) but the occasional UX papercut — for instance, you'll be re-asked for microphone permission after some macOS updates.

Pros and cons

What you get

Genuinely Mac-native interface — feels like a 2026 app, not a 2014 utility
Interactive transcript editor with click-to-play timestamps
Clean DOCX, SRT, VTT, TXT export
Menu-bar quick access for ad-hoc recordings
System audio capture, not just microphone
One-time purchase, no subscription
App Store distribution = signed, sandboxed, easy to install
Active development from a known indie developer

What you don't

Mac only; nothing for Windows or Linux users
Larger Whisper models are paywalled
Slower than Pyrenees on the same hardware
Limited backend tuning compared to Buzz
Not a serious subtitle editor
Sandbox occasionally requires re-granting permissions

How to actually use it

The flow is shorter than for most tools we've reviewed. Here's the abridged version.

Install from the Mac App Store. Search "Whisper Transcription" and install. No external installer, no permissions juggling.
Open it and let it download the default model. The free models are small enough that this is fast.
Drop in a file or click the record button. Audio and video files work; the app strips audio automatically.
Pick the model and language. If you've unlocked the larger models, medium is a sweet spot for most use cases. Language can be left on auto-detect.
Start the transcription. Watch the progress bar — or, more usefully, switch to another app and ignore it until it's done.
Edit the transcript inline. Click any sentence to play it back. Fix mistakes. Tag speakers.
Export. File → Export, pick the format. Done.

Tip

If you're going to do any serious cue editing, export to SRT and open it in Subtitle Edit. Whisper Transcription's editor is great for prose; it's not designed for the cue-by-cue work captioners do.
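Part of why that hand-off works so well is that SRT is plain text: a cue number, a time range, the caption text, then a blank line. As a rough illustration, here is a small Python parser using only the standard library. The sample cues are invented, and real-world files can be messier than this sketch expects.

```python
import re
from dataclasses import dataclass

TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues. Blocks that don't have the expected shape are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or not lines[0].strip().isdigit():
            continue
        match = TIME_RE.search(lines[1])
        if match is None:
            continue
        parts = match.groups()
        cues.append(Cue(
            index=int(lines[0]),
            start=_seconds(*parts[:4]),
            end=_seconds(*parts[4:]),
            text="\n".join(lines[2:]),
        ))
    return cues


# Invented sample, shaped like a typical export.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Welcome to the show.

2
00:00:03,500 --> 00:00:06,250
Today we're talking about captions.
"""

for cue in parse_srt(SAMPLE):
    print(cue.index, cue.start, cue.end, cue.text)
```

Because the format is this simple, any tool that writes valid SRT can feed any tool that reads it.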

Compared to the others

Quick reference points across the rest of the shortlist:

Versus Buzz: Buzz is free everywhere; Whisper Transcription is a paid Mac app. If you're disciplined enough to set up Buzz and don't mind its plain UI, you get the same transcription quality without spending anything. If you want it to feel like a Mac app and you transcribe regularly enough that the time savings matter, the purchase pays for itself.

Versus Pyrenees: Pyrenees is faster and free, but barer-bones. No interactive editor, no DOCX export, no system audio capture. If raw speed and zero cost are your priorities, Pyrenees. If polish is your priority, this.

Versus Subtitle Edit: Different category. Whisper Transcription is for getting transcripts; Subtitle Edit is for grooming captions. If you do both, you'll likely use both.

Versus VoiceInk: Different again. VoiceInk is for live dictation into other apps. Whisper Transcription is for files (with optional recording). They cover different problems.

FAQ

Is the free tier good enough on its own?

For casual use — voice memos, meeting notes, short interviews where you'll edit the transcript anyway — yes. The smaller models are surprisingly capable. For longer, professional work, the medium and large models (paywalled) are noticeably better, and the gap matters most when audio quality is uneven.

How does it compare to OpenAI's hosted Whisper API?

The hosted API is faster and runs the large model by default, but every minute you transcribe is a minute of audio sent to OpenAI's servers and a per-minute charge. Whisper Transcription does everything on your Mac, doesn't charge per minute, and keeps the audio local. For privacy-sensitive work, that's the answer. For a one-off job on a large batch of public-domain audio, the hosted API may be cheaper.

Does the in-app purchase carry over to a new Mac?

Yes. App Store purchases are tied to your Apple ID. If you buy a new Mac and sign in with the same ID, your unlock follows. If your family-sharing setup is configured for it, family members can also access purchases.

Is the audio uploaded anywhere?

No. The model runs on-device. The app needs internet only for the initial model download and for App Store updates. The one exception is the optional "OpenAI API mode": if you enable it with your own API key, that mode does send audio to OpenAI. You have to opt in and configure it explicitly.

What's the longest file it can handle?

In our tests, files of two to three hours work without issue on an M-series Mac with the medium model. Beyond that, you'll occasionally hit memory pressure depending on how much RAM you have and what else is running. Splitting very long files into 60–90 minute chunks is generally a good idea regardless of which app you use.
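If you do want to split a long recording, the planning part is easy to script. The Python sketch below works out chunk boundaries and builds ffmpeg commands without running them. It assumes you have ffmpeg installed, the file names are hypothetical, and the 75-minute default is just the midpoint of the range above.

```python
def plan_chunks(total_seconds: float,
                chunk_seconds: float = 75 * 60) -> list[tuple[float, float]]:
    """Return (start, duration) pairs that cover the recording in order."""
    if total_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("durations must be positive")
    chunks = []
    start = 0.0
    while start < total_seconds:
        duration = min(chunk_seconds, total_seconds - start)
        chunks.append((start, duration))
        start += duration
    return chunks


def ffmpeg_command(source: str, start: float, duration: float, target: str) -> list[str]:
    """Build (but don't run) an ffmpeg call that copies one chunk without re-encoding."""
    return ["ffmpeg", "-ss", str(start), "-t", str(duration),
            "-i", source, "-c", "copy", target]


# A hypothetical 3 h 10 min interview, split into 75-minute pieces.
for number, (start, duration) in enumerate(plan_chunks(3 * 3600 + 10 * 60), start=1):
    command = ffmpeg_command("interview.m4a", start, duration,
                             f"interview_part{number}.m4a")
    print(" ".join(command))
```

Fixed cut points can land in the middle of a word, so if a boundary matters, nudge it by hand to a pause in the conversation.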

Does it support speaker labels?

The interactive editor lets you assign speaker labels to spans of text manually, which is great for short interviews. There's no automatic diarization that says "this section is Speaker A" — you tag spans yourself as you review.

Can I run a custom or fine-tuned Whisper model?

Not directly. Whisper Transcription works with the official Whisper model family (and certain distilled variants). If you need to run a fine-tuned model — say, one tuned on medical vocabulary — you'll want a more open tool like Buzz or a Python script.


Tested over a couple of weeks on an M2 MacBook Air with the paid unlock. The free tier was tested on a separate machine without the unlock to confirm the experience for non-paying users. We have no relationship with the developer.


Pyrenees

What it is, in two paragraphs

Pyrenees is a free macOS transcription app for Apple Silicon Macs. It's built around MLX, Apple's open-source machine-learning framework released in late 2023. MLX is designed specifically for the unified-memory architecture of M-series chips: the CPU and GPU share one pool of memory, so it doesn't have to copy tensors back and forth between separate VRAM and system RAM the way frameworks designed for NVIDIA cards do. For models like Whisper, that translates into noticeably faster inference than running the same model through plain PyTorch or even whisper.cpp's Core ML path.

The app itself is small, quiet, and does basically nothing besides transcription. You import a file, pick a model, get a transcript. There's a list of recent jobs. There's an export menu with the usual SRT, VTT, and TXT options. There is no waveform editor, no system audio capture, no menubar palette, no fine-tuning of model parameters beyond a few sensible toggles. The minimalism is deliberate.

A note on distribution

If you can't find Pyrenees in the App Store, that's because it isn't there — it's distributed directly as a notarized .dmg. You're meant to download it, accept the security prompt once, and run it. This is normal for indie Mac apps; it's not a sign that anything's wrong.

So how fast is it, really?

Here's where I have to flag the usual caveats. Speed comparisons across transcription apps are extremely hardware-dependent, and any specific number we put on a page will be wrong by the time someone reads it. So instead of giving you "Pyrenees finished a 30-minute file in X seconds while Buzz took Y", which would be misleading without knowing your Mac, I'll describe the pattern we observed.

On every Apple Silicon Mac we tested — an M1 MacBook Air, an M2 MacBook Air, and a Mac Studio with an M2 Max — Pyrenees ran the same Whisper model meaningfully faster than Buzz did, and modestly faster than Whisper Transcription. The gap was widest on the larger models (medium and large) and smallest on the tiny model, where everything is fast enough that the choice of framework barely matters.

The qualitative experience is the more interesting part. Where transcription on the same hardware in Buzz used to feel like "start the job, go do something else for a quarter of an hour", Pyrenees turns it into something more like "wait for it to finish while you're still on the page". For shorter files — voice memos, meeting fragments, ten-minute lectures — this changes the workflow. You stop batching things and you start transcribing them ad hoc.

Pyrenees doesn't feel faster the way an upgraded computer feels faster. It feels faster the way switching from email-based reviews to instant messaging feels faster — the activation energy drops below a threshold, and you do more of it.

What's actually different vs. the others

Most Mac transcription apps fall into one of two technical camps. They either ship the original PyTorch Whisper implementation with whatever GPU acceleration they can scrape together, or they bundle whisper.cpp, the C++ port that Georgi Gerganov maintains. Both are perfectly good options.

Pyrenees is in a third camp. It uses MLX-converted Whisper weights and runs them through the MLX runtime. Because MLX is built specifically for the M-series unified memory model, it can do things like memory-map the model weights without copies, run quantized variants natively, and use the hardware in ways that more general-purpose frameworks can't quite manage.

The user-visible consequences:

Less memory pressure. Running the large model on a 16 GB Mac is genuinely viable in Pyrenees. In some other apps with the same model, you'll see the system swap heavily and the app slow to a crawl on long files.
The GPU does the heavy lifting. A lot of "Apple Silicon optimized" apps still do most of their work on the CPU. MLX runs Whisper on the GPU through Metal; as far as we can tell it does not target the Neural Engine, whatever the marketing around other apps implies.
Quantized models load quickly. Pyrenees ships several quantized variants — 4-bit, 8-bit — that produce nearly the same accuracy as the full-precision models at a fraction of the resource cost.

Tip

If you're on a Mac with 8 GB of unified memory, start with the 4-bit medium model. The quality is closer to the full medium model than you'd expect, and it leaves enough headroom for you to keep using your computer normally while a transcription runs in the background.
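The reason quantization helps so much is plain arithmetic: the weights take roughly parameter count times bits per weight. The Python sketch below estimates that, using the approximate parameter counts published for the Whisper family and assuming 16-bit weights as the unquantized baseline. It is a lower bound, not a measurement of Pyrenees: quantized formats store some extra scaling data, and the app needs working memory on top of the weights.

```python
# Approximate parameter counts for the Whisper family, in millions.
PARAMS_MILLIONS = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550}


def weights_size_gb(model: str, bits_per_weight: int) -> float:
    """Estimate the size of the weights alone, in gigabytes (10^9 bytes)."""
    params = PARAMS_MILLIONS[model] * 1_000_000
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"medium at {bits}-bit: ~{weights_size_gb('medium', bits):.2f} GB")
```

Going from 16-bit to 4-bit cuts the medium model's weights to a quarter of their size, which is where the headroom on an 8 GB machine comes from.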

What Pyrenees doesn't do

This is the section where the case for a different tool gets made. Pyrenees is opinionated about its scope, and the things it won't do are choices, not oversights.

It doesn't run on Intel Macs. Apple Silicon only. If you're still on a 2019 16-inch MacBook Pro, this app is not for you, and you'll have to wait for that next upgrade or fall back on Buzz.

It doesn't run on Windows or Linux. Obvious from the platform note, but worth saying if you're considering setting up a multi-OS workflow.

It doesn't have an interactive transcript editor. The output is a finished transcript. You can fix obvious typos in the export, but there's no click-to-play, no inline cue editing, no speaker labels.

It doesn't capture system audio. Microphone input works for ad-hoc recording, but it can't pull audio from another app the way Whisper Transcription can.

It doesn't do dictation. File-based only. For dictation into other apps, you want VoiceInk.

It doesn't have professional subtitle exports. SRT, VTT, and TXT cover the common cases. If you need TTML, EBU-STL, or any of the broadcast formats, you'll need to take the SRT into Subtitle Edit for conversion.

Pros & cons

Strengths

Fastest free transcription tool we've tested on Apple Silicon
Quantized model variants make the medium model viable on 8 GB Macs and the large model viable on 16 GB
Genuinely free, no in-app purchase, no signup
Tiny install footprint compared to PyTorch-based alternatives
Stable on long files; the memory behavior is much better than older Whisper apps
Active development, helped along by the momentum behind MLX-based Whisper ports

Weaknesses

Apple Silicon Mac only — narrow platform
No transcript editor; output is read-only inside the app
No system audio capture, no dictation, no subtitle workflow
Smaller community than Buzz; fewer docs and tutorials online
Distribution outside the App Store means a more cautious first install
UI is minimal to the point of feeling unfinished if you're used to mature apps

How to use it

This is the shortest "how to" of any tool we cover, because the app is genuinely that simple.

Download Pyrenees from the developer's site (look for the official notarized DMG; avoid mirrors).
Drag the app to /Applications. First launch will hit Gatekeeper; right-click the icon, choose "Open", confirm. macOS will remember the choice.
Pick a model on first launch. The app will offer to download one. Medium 4-bit is a good first choice on most M1/M2 Macs.
Drop your audio or video file in. Or hit the record button to capture from the microphone.
Confirm the language if you don't want auto-detection.
Wait. Briefly. Pyrenees is faster than the alternatives; you'll often be done before you finish your coffee.
Export. File → Export → SRT/VTT/TXT.

In practice

Here's how I personally use it. I record voice memos for note-taking on the iPhone, AirDrop them to my MacBook Air at the end of the day, drop the lot into Pyrenees, and have plain-text transcripts in my notes folder within a couple of minutes per memo. None of those memos contain anything I'd want sent to a server. None of them justify a subscription. Pyrenees is the right tool for that exact workflow.

How it compares to the rest of the shortlist

Versus Buzz: Pyrenees is faster and prettier. Buzz is more flexible (Linux, Windows, multiple backends, batch queueing, OpenAI API support). For Mac-only users who don't need Buzz's flexibility, Pyrenees wins. For anyone who works across platforms or needs the optionality, Buzz still has a place.

Versus Whisper Transcription: Pyrenees is faster and free; Whisper Transcription is more polished and has features (interactive editor, system audio, DOCX export) that Pyrenees doesn't. It's a real tradeoff. Try Pyrenees first since it costs nothing — if you need the extra features after a week of use, Whisper Transcription's purchase makes sense.

Versus Subtitle Edit: Different jobs. Pyrenees produces, Subtitle Edit edits. The natural workflow is to use both.

Versus VoiceInk: Different jobs again. Pyrenees is for files, VoiceInk is for live dictation.

FAQ

Will it run on my Intel MacBook?

No. Pyrenees requires an Apple Silicon chip (M1 or later). MLX is designed around that architecture and won't run on Intel hardware.

Is it really free, or is there a paid version?

It's free. There's no paid tier, no subscription, no premium model unlock. Some niche features may be donation-encouraged, but the core transcription functionality has no paywall in any form we've seen.

How private is it?

The transcription is entirely local. Pyrenees doesn't have a "send to server" mode at all, which is one of the things that makes it appealing for journalists, lawyers, and anyone else with sensitive audio. It does need internet to download models for the first time.

Can I use the same model files I downloaded for another app?

Generally not. Pyrenees uses MLX-format weights, which are different from the .bin files whisper.cpp uses or the .pt checkpoints from PyTorch. You can re-download them through Pyrenees; the storage cost is about the same.

What happens if I have a 4 GB Mac (or some unusually low-RAM device)?

No Apple Silicon Mac has shipped with less than 8 GB, so in practice the low end is an 8 GB machine. Those run the tiny and base models comfortably, and the medium model in 4-bit form usually works too. Large models really want 16 GB or more, regardless of the framework.

Does it support live transcription?

It supports recording from the microphone and transcribing what you record, but not in true real-time the way iOS dictation works. For low-latency dictation, look at VoiceInk.

Why is the app distributed outside the App Store?

App Store rules around bundling models and how the app uses on-device acceleration can be restrictive for ML-heavy apps. Distributing directly is a common indie choice. As long as you download from the developer's official site, the app is signed by the developer and notarized by Apple, which is what matters for security.


Tested over a few weeks on an M1 MacBook Air (8 GB), an M2 MacBook Air (16 GB), and an M2 Max Mac Studio (32 GB). Same set of test files, run alongside the same transcriptions in Buzz and Whisper Transcription for direct comparison. We have no relationship with the developer.


VoiceInk

The thing it replaces

Apple has shipped dictation as a system feature on macOS for over a decade. You press a hotkey and you can speak into any text field. It works. It's been there for so long that most people have forgotten about it. The reason VoiceInk exists is that the built-in version has some annoying limitations — it sends your voice to Apple's servers (or used to; recent versions can run on-device for English on Apple Silicon, but the implementation is opaque), it doesn't always handle technical vocabulary well, and the customization is essentially zero. You can't tune it for your jargon, your accent, or your workflow.

VoiceInk replaces that with something more flexible. It's an open-source app that runs Whisper locally and gives you a hotkey to dictate with. The text appears wherever your cursor is. The model is on your machine. The customization is yours. It's free, the source is on GitHub, and once you've used it for a couple of days, going back to system dictation feels like an unnecessary downgrade.

The interaction, in detail

This is the only review where the actual gesture matters more than the underlying tech, so it's worth describing carefully.

You set a hotkey in VoiceInk's preferences — let's say fn, the function key, since it's already on your keyboard and it's not bound to anything most people use regularly. From then on, you can be in any application — a browser, a terminal, an email client, a code editor — and:

Hold the hotkey down. A small overlay appears at the bottom of the screen showing that recording has started.
Talk. The overlay shows a moving waveform so you know the mic is active. Talk normally; the model is forgiving of "um" and "uh" and natural speech rhythm.
Release the hotkey. A second or two of processing happens (faster if you have a small model loaded, slower if you have a large one), and then the transcribed text gets typed into wherever your cursor was. As if you'd written it yourself, just much faster than you can type.

That's the whole interaction. Hold, talk, release. Once it's part of your muscle memory, it changes how you write a lot of small things — Slack replies, email responses, code comments, search queries. I've watched people pick it up for the first time and stop using their keyboard for short messages within an afternoon.
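For the technically curious, the hold, talk, release gesture maps onto a three-state machine. The Python sketch below is our own illustration of that flow, not VoiceInk's code: the class, the state names, and the `transcribe` and `type_text` callbacks are all stand-ins we invented.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    PROCESSING = auto()


class PushToTalk:
    """Hold-talk-release flow. `transcribe` and `type_text` are supplied by the caller."""

    def __init__(self, transcribe, type_text):
        self.state = State.IDLE
        self._transcribe = transcribe
        self._type_text = type_text
        self._audio = []

    def key_down(self):
        if self.state is State.IDLE:  # ignore key repeats while already recording
            self._audio = []
            self.state = State.RECORDING

    def audio_chunk(self, chunk):
        if self.state is State.RECORDING:  # audio outside a hold is dropped
            self._audio.append(chunk)

    def key_up(self):
        if self.state is not State.RECORDING:
            return
        self.state = State.PROCESSING
        text = self._transcribe(self._audio)
        if text:
            self._type_text(text)
        self.state = State.IDLE


# Fake "model" and fake "keyboard" so the flow can be tried without a microphone.
typed = []
ptt = PushToTalk(transcribe=lambda chunks: " ".join(chunks), type_text=typed.append)
ptt.key_down()
ptt.audio_chunk("sounds")
ptt.audio_chunk("good")
ptt.key_up()
print(typed)  # ['sounds good']
```

The point of the sketch is how little there is to it: nothing happens unless the key is held, and the text is only typed once, on release.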

A small honest moment

The first week I used VoiceInk, I didn't actually like it. Talking to my computer felt awkward, and I noticed I was rewriting things I dictated more often than I rewrote things I typed. Then about ten days in, the rewrites stopped, partly because I'd learned to think more clearly before I started talking, and partly because I'd settled into a speaking pace the model handles well and added the words it kept missing to the vocabulary list. By the second week, I was reaching for the hotkey for any message longer than a sentence. Don't judge VoiceInk on day one.

Where the model lives

Like the other tools on this site, VoiceInk runs Whisper locally. It supports several model sizes, and the choice of model is a real-time-vs-quality tradeoff that matters more than it does for file-based transcription. With dictation, you don't want to wait twenty seconds for the text to appear; you want it now. So most VoiceInk users settle on either the base or small model, which transcribe quickly enough to feel responsive.

That tradeoff has a downside. Smaller Whisper models are noticeably less accurate than the large variants, especially with proper nouns, technical jargon, or non-mainstream accents. VoiceInk has a "vocabulary" feature where you can teach it specific words you use often — names of people you work with, project names, technical terms — and that helps a lot. But the accuracy gap is real, and you should expect to do light corrections after dictating anything important.
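We don't know exactly how VoiceInk feeds its vocabulary list to the model. One common way to give Whisper hints is a short prompt string: the open-source Whisper `transcribe` function accepts an `initial_prompt` argument for this. Assuming something along those lines, the Python sketch below turns a word list into such a string. The function name, the character budget, and the sample entries are all ours.

```python
def build_vocabulary_prompt(entries: list[str], max_chars: int = 600) -> str:
    """Join unique vocabulary entries into one comma-separated hint string, within a budget."""
    seen = set()
    kept = []
    length = 0
    for entry in entries:
        word = entry.strip()
        key = word.lower()
        if not word or key in seen:  # skip blanks and case-insensitive duplicates
            continue
        added = len(word) + (2 if kept else 0)  # account for the ", " separator
        if length + added > max_chars:
            break
        seen.add(key)
        kept.append(word)
        length += added
    return ", ".join(kept)


# Invented entries: a coworker's name, a technical term, a duplicate, a codename, a blank.
vocabulary = ["Priya", "Kubernetes", "priya", "Project Halcyon", "  "]
print(build_vocabulary_prompt(vocabulary))  # Priya, Kubernetes, Project Halcyon
```

The budget exists because the model only looks at a limited amount of prompt text, so a short, curated list tends to beat a long one.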

Tip — if you have an Apple Silicon Mac

VoiceInk benefits a lot from Apple Silicon's Neural Engine. On an M1 or later, the small model is fast enough to feel essentially instant, and the medium model is usable. On Intel Macs, you're stuck with the smaller models if you want responsiveness, and the experience suffers.

What VoiceInk is genuinely good at

I want to be specific about this rather than vague, because vague reviews are useless when you're deciding whether to install something.

Quick replies. Slack messages, email responses, "yeah looks good", "let me get back to you tomorrow on that." Things that take three seconds to say and ten seconds to type. The hotkey workflow shaves real time off a real day.

First drafts. Talking out a paragraph and then editing it on the keyboard is genuinely faster than writing the paragraph from scratch for many people. VoiceInk fits this workflow especially well because the text lands directly in your editor of choice — no copy-paste step.

Note-taking. Quick thought, capture it before it goes away. The hotkey-anywhere model means the friction of "where is my notes app, where is the cursor, what was I about to say" disappears.

Code comments and commit messages. Anywhere a thought is more important than its phrasing. The fact that you can be in your terminal or your editor makes this work without breaking flow.

Accessibility. For people with RSI, hand injuries, or other reasons keyboard input is painful, a fast on-device dictation tool is genuinely valuable. VoiceInk's open-source nature also means you can audit it for any concerns about where your voice goes.

What it isn't good at

Long-form writing. An hour of dictation is exhausting in a way an hour of typing isn't. People who try to dictate entire essays usually go back to keyboards within a week.

Anything you don't want misheard. If accuracy is critical (a legal document, an academic citation, a medical reference), dictation will let you down rarely, but often enough to matter. Always read it back.

File transcription. Said it once, saying it again. VoiceInk doesn't accept input audio files. It records from your microphone, processes it, and types the result. If you have a file, use Buzz or Pyrenees.

Multi-speaker situations. The mic captures whoever is loudest. A meeting recording is the wrong input for VoiceInk.

Pros & cons

Strengths

Fast, on-device dictation that works in any app
Open source — you can audit and fork
Custom vocabulary support for jargon and proper nouns
Genuinely free, no subscription
Better than macOS built-in dictation in most respects
Hotkey workflow disappears into muscle memory after a few days

Weak spots

macOS only
Smaller models trade accuracy for speed; this is the tradeoff and it's real
Not for transcribing files — wrong tool entirely
Accessibility permissions setup involves a few clicks the first time
Less polished than commercial dictation apps; you'll see rough edges
Active project but smaller team than the more established alternatives

Setting it up

Install VoiceInk from its GitHub releases page or the developer's site.
On first launch, grant permissions. macOS will ask for microphone access and accessibility access (the latter is what lets the app type into other apps). Both are required.
Pick a model. Small is a good starting point; bump up to medium if you have an M-series Mac and want better accuracy.
Set a hotkey you'll remember. The function key (fn) works well because it's not used for much. Some people prefer right-Option, which is also unused on most layouts.
Add custom vocabulary if you have specialized words. Names of coworkers, project codenames, technical terms. The model uses these as hints during transcription.
Try it in a low-stakes app first. Open a TextEdit window, hold the hotkey, dictate a paragraph, see how it goes. Adjust your speaking pace based on what happens.

A real-world setup that works for me

Hotkey: fn. Model: medium (M1 MacBook Pro, 16 GB). Vocabulary: about thirty entries — names of people I message often, two project codenames, three technical terms my model kept misspelling. Result: I dictate maybe twenty short messages a day and rewrite about one in twenty.

Compared to the rest of the shortlist

VoiceInk is genuinely orthogonal to every other tool on this site. The other four answer "I have audio, give me text." VoiceInk answers "I have a thought, give me text faster than I can type." Those are different problems and different tools, and they're not in competition.

If you've installed VoiceInk and you find yourself wanting to transcribe a meeting recording, you're using the wrong app — go install Buzz or Pyrenees for that. Conversely, if you've installed Buzz and you find yourself thinking "I wish I could dictate into Slack the same way," go install VoiceInk. It's normal to have both.

FAQ

Is VoiceInk really better than macOS dictation?

It's better in the ways that matter to people who care about dictation: customization, model choice, predictable behavior, and the open-source guarantee that nothing is sent to a third-party server. It's not better at being instantly available with zero setup — system dictation wins on that. So it depends on whether you dictate enough to care.

Does it work in any app?

In any app that accepts text input, yes. The way it works is by simulating keystrokes after transcription, which is why it needs accessibility permission. Apps that have unusual text-input handling (some game engines, secure password fields) may behave oddly, but mainstream apps work fine.

Will it run on Intel Macs?

Yes, but the practical experience is much worse than on Apple Silicon. The smaller models are usable but accuracy suffers; the medium and large models are too slow for real-time-feeling dictation. If you're on Intel, set expectations accordingly and consider sticking with the small model.

Can I use it for transcribing a recorded file?

No. That's not what it does. Use Buzz, Pyrenees, or Whisper Transcription for files.

Is my voice sent anywhere?

No. The transcription is fully on-device. This is part of the appeal — it's a privacy-respecting alternative to system dictation in the cases where you want to be sure your voice isn't going through someone else's servers.

How accurate is it for non-English languages?

Whisper handles many languages, and VoiceInk inherits that. Major European languages, Mandarin, Japanese, and others tend to work well with the medium and larger models. Smaller models in non-English languages can struggle. The vocabulary feature mostly assumes Latin-script entries.

Will it learn my voice over time?

No. Whisper is not a personalized model — it doesn't adapt to individual speakers. The way you "train" it is by adjusting the custom vocabulary list and learning what kind of speaking pace and clarity it likes. After a couple of weeks, you'll have unconsciously calibrated to it.


Used daily over several weeks on an M1 MacBook Pro and an Intel iMac (the latter just to confirm the Intel performance story). We're not contributors to the project and have no relationship with the developers.


About this site

Last updated · April 2026

WhisperDesktop is a small, independent website that reviews desktop software for converting audio into text. We are not a venture-backed company, an affiliate-content farm, or a marketing agency. We are not affiliated with any of the apps we cover.

Why we built it

Search for "best transcription app" in 2026 and you'll find dozens of articles that essentially recycle the same five recommendations in the same five blocks, frequently lifting language from each other and pointing every link at the highest-paying affiliate program. There's nothing illegal or even particularly unethical about that — it's just not very useful when you actually need to choose a tool.

We wanted somewhere that did the opposite: actually installed the software, actually ran files through it, and tried to honestly capture what each one is good for and where it falls short. Not a leaderboard. Not a "winner." Just five reviews that try to help you figure out which one fits your situation.

How we test

For every app we review, we install it ourselves on a real machine (or several real machines, where the platform support varies). We run a consistent set of test files through each app:

A 47-minute one-on-one interview recorded on an iPhone
A short voice memo with background noise (a coffee shop)
A 12-minute solo podcast intro recorded with a USB mic
A noisy field recording with two speakers
Short non-English clips (currently French and Mandarin, for spot-checks)

The same files for every app. We note the time taken, the obvious accuracy issues, and the friction points along the way. We try to use the apps for at least a few weeks before publishing — first impressions are useful for the "feel" of an app, but real strengths and weaknesses only emerge with use.

Who we are

We're a tiny team — currently two people — who got tired of "Top 10" articles and decided to write something else instead. We're not journalists by training. One of us spent years working with audio for a living; the other is a software engineer who got obsessed with on-device ML in 2023 and never recovered.

We don't think our names matter very much for the credibility of what we write. Anyone who reads enough reviews can tell whether the writer has actually used the software, and the reviews on this site try hard to make that obvious. If you'd like to contact us, the contact page has an email address that's read by an actual person.

How the site is funded

The site is small enough to run on a tiny budget. We pay for the domain and a basic hosting plan; that's most of it. To offset those costs, some links on the site may be affiliate links: if you click through and end up paying for software, we might receive a small commission at no extra cost to you.

Two things to know about that:

Affiliate links are clearly identified where they exist. We don't disguise them as regular links.
Affiliate eligibility doesn't influence which apps we cover or how we cover them. Several of the apps we review are free and open-source, with no affiliate program at all. We still cover them, often more enthusiastically than we cover the paid options.

If we ever publish a sponsored review, we'll label it as such at the top of the article. We haven't published any sponsored content as of this writing, and we don't currently have plans to.

What we don't do

A few things we'd like to be explicit about, because the rest of the internet sometimes blurs them:

We don't accept payment for positive reviews.
We don't send drafts to the developers we review for "approval."
We don't repackage other sites' content. Everything you read here was written by us, after using the software ourselves.
We don't promise specific results, income, or productivity gains. Software is a tool; what you do with it is up to you.
We don't pretend to be the official site of anything we review. We're a third-party review site. The official sites of the apps we cover are easy to find via the apps themselves.

What we'd appreciate from readers

If you find a factual error, please tell us. If a feature has changed since we wrote a review, please tell us. If you think we missed something important about an app, please tell us. The address is on the contact page and we read everything that comes in, even if we don't always have the bandwidth to reply.

Beyond that, we hope something here helps. There are hundreds of tools in this category, and the differences between them are smaller than the marketing copy suggests. We wrote this site to help with the small part of the decision that does matter.

This site is an independent informational resource and is not affiliated with any of the official services covered (Buzz, Subtitle Edit, Whisper Transcription, Pyrenees, VoiceInk). All trademarks and product names are the property of their respective owners.

Contact

Last updated · April 2026

If you've found an error in one of our reviews, want to suggest an app we haven't covered, or just have a question about something on the site — we'd like to hear from you.

Email

contact@WhisperDesktop

A real person reads everything that lands in this inbox. We try to reply within a week, though sometimes life happens and it's longer.

What's helpful to include

The more useful the email, the more useful our response can be. A few things we appreciate when they're relevant:

For factual corrections: a link to the page in question and a sentence or two about what's wrong. If you can point at a source for the correct information, even better — but it's not required. We'll go check.
For suggestions: the name of the app, a link to its homepage if there is one, and a sentence about why you think it's worth covering. We get more suggestions than we can act on, but we read all of them.
For technical questions about an app: we're happy to share what we've learned, but we're not the developers — we're just reviewers. For bug reports and feature requests, you'll get a much better response by writing to the app's actual support address. We can usually point you at the right place.
For press, partnership, or sponsorship inquiries: please mention that in the subject line. We don't currently run sponsored reviews, but we read these too.

Response times, honestly

The site is run by a small team in our spare time. We don't have a ticketing system or a help desk. What this means in practice:

Most emails get a reply within five to seven days.
Quick corrections often get a same-day reply.
Long, thoughtful messages sometimes take longer because we want to give them a thoughtful reply back.
Around major holidays the queue gets longer. We catch up eventually.

If you've written and haven't heard back in two weeks, feel free to send a follow-up — the original might have ended up in spam or buried in a busy week, and we'd rather hear from you twice than miss your message.

What we can't help with

A few things to save everyone's time:

We can't reset passwords, refund purchases, or troubleshoot accounts for any of the apps we review. We don't have access to those systems and we couldn't help even if we wanted to. Contact the app's developer directly.
We can't write custom transcription tutorials for individual projects. The how-to sections in our reviews are general by design.
We can't accept guest posts, link insertions, or "content collaboration" of the kind that's usually pitched to review sites. Please don't send those pitches; we won't reply.

Privacy, briefly

Anything you send us stays with us. We don't sell email addresses, we don't pass them to advertisers, we don't sign people up to newsletters they didn't ask for. If you want your message deleted from our inbox after we've read it, just say so and we'll delete it. The full details are in our privacy policy.

Privacy Policy

Last updated · April 2026

This policy describes what information WhisperDesktop collects when you visit, why we collect it, and what we do with it. We've tried to write it in plain English rather than legalese.

The short version

We don't run ads, we don't sell data, and we don't track you across the web. We collect only the basic server-side information needed to keep the site online and roughly understand how it's being used. If you email us, we keep your message only for as long as we need it to reply or follow up, and we delete it if you ask.

Who runs this site

This site is operated by the small editorial team behind WhisperDesktop. For privacy-related questions or requests, you can reach us at contact@WhisperDesktop.

What we collect

1. Server logs

Like nearly every website, our hosting provider keeps standard server logs. When your browser requests a page from us, those logs typically record:

Your IP address
The date and time of the request
The page you requested and the page you came from (the "referrer")
Your browser type and operating system, as reported by the browser itself (the "user agent")

This data is part of how the web works; it isn't something we ask for or collect deliberately. We use it to keep the site running, diagnose problems (a page returning errors, an unusual traffic pattern), and protect against abuse like denial-of-service attacks. Logs are retained for a short period — typically no more than 30 days — and then rotated out.

2. Analytics

We may use a privacy-respecting analytics service to understand which pages are read and which aren't. If we do, it will be a service that:

Does not use cookies
Does not collect personally identifiable information
Does not track visitors across other websites

Examples of services that meet this bar include Plausible and similar self-hostable tools. If we ever switch to a service that uses cookies or collects more detail, we'll update this policy and add a clear notice.

3. Email correspondence

If you write to us at contact@WhisperDesktop, we receive your email address, your name (if your client sends one), and the contents of your message. We use this only to reply and, where relevant, to follow up on whatever you wrote about. We don't add your address to any mailing list. We don't share it with third parties. If you ask us to delete your message, we'll delete it.

Cookies

The site itself does not set tracking cookies. Some technical cookies might be set by our hosting platform for purposes like load balancing or basic security; these are not used to identify you across sites. If we add a feature in the future that requires cookies (for example, remembering a user's preferred theme), we'll ask for consent where the law requires it and explain it here.

Third-party services we use

We try to keep external dependencies to a minimum, but a few are unavoidable:

Google Fonts. Our typography is loaded from Google Fonts. When your browser requests these fonts, Google receives the standard request information described above (IP address, user agent, referrer). Google's handling of this data is governed by their own privacy policy.
Hosting provider. The site is hosted by a commercial provider that may process server logs as our data processor. It handles this data on our behalf, subject to its own terms.
Email provider. Email sent to or from our contact address is handled by a commercial email provider, who stores those messages on their servers as part of providing the service.

Affiliate links

Some links on this site may be affiliate links. When you click an affiliate link and subsequently make a purchase from the linked site, the linked site may pay us a small commission. To attribute the click, the linked site usually sets a cookie in your browser. That cookie is set by the destination site, not by us, and is governed by their privacy practices, not ours.

We don't control what affiliate networks or destination sites do with that information. We do clearly identify affiliate links; see our about page for more on this. You can choose not to click them, and you lose nothing on this site by going to those products' own sites directly.

Children

This site is not directed at children under 13, and we don't knowingly collect personal information from children. If you're a parent or guardian and you believe your child has sent us information, please write to us and we'll delete it.

Your rights

Depending on where you live, you may have legal rights regarding personal data we hold about you. These typically include the right to:

Ask what information we have about you
Ask us to correct it if it's inaccurate
Ask us to delete it
Ask us to stop using it for certain purposes
Make a complaint to a data protection authority

In practice, the personal information we hold about most readers is simply "an email you sent us." If you want it deleted, write to contact@WhisperDesktop and we'll do so promptly.

Security

We take reasonable steps to protect the information we hold — using HTTPS site-wide, keeping software updated, and limiting the number of people who can access our systems. No system is perfectly secure, however, and we can't guarantee that data will never be exposed by a breach beyond our control. If a breach does affect our readers, we'll notify the people involved and the relevant authorities as required by law.

Changes to this policy

If we change anything material in this policy, we'll update the "Last updated" date at the top and, for significant changes, post a brief notice on the site for a period of time. We won't quietly start collecting more data than this policy describes.

Contact

For privacy questions or requests, write to contact@WhisperDesktop with "Privacy" in the subject line. We'll get back to you as quickly as we can.

Terms of Use

Last updated · April 2026

By using WhisperDesktop, you agree to the terms below. They're meant to be reasonable rather than gotcha-style — but please read them so we're on the same page.

1. What this site is

WhisperDesktop is an independent informational website. Its purpose is to help readers compare and choose desktop software for converting audio to text. We are not the developer, publisher, or official representative of any of the apps reviewed on this site. Where we mention a third-party product (Buzz, Subtitle Edit, Whisper Transcription, Pyrenees, VoiceInk, or any other), we're doing so as an outside reviewer. The official sources of those products remain their respective developers.

2. Informational use only

Everything on this site is provided for general informational purposes. We try to be accurate. We test the software ourselves. But software changes — sometimes weekly — and a feature that worked the way we describe at the time of writing may behave differently by the time you read it. Always check the official documentation of the relevant app before relying on its behavior for anything important.

Nothing on this site constitutes professional, legal, financial, or technical advice. If a transcription is going into a courtroom, a medical record, or any other situation with serious consequences for accuracy, please verify the output yourself or have a qualified professional do so.

3. No warranty

This site and all content on it are provided "as is" and "as available," without any warranty of any kind, express or implied. We make no representations or warranties about the accuracy, reliability, completeness, or timeliness of the information presented. To the fullest extent permitted by applicable law, we disclaim all warranties, including the implied warranties of merchantability, fitness for a particular purpose, and non-infringement.

4. Limitation of liability

To the maximum extent allowed by law, WhisperDesktop and the people who run it will not be liable for any direct, indirect, incidental, consequential, special, or punitive damages arising out of or relating to your use of the site, including but not limited to any decision you make based on something you read here, any loss of data, or any issues caused by software you installed after reading about it on this site.

If you live somewhere that doesn't allow some of these limitations, the parts that aren't enforceable in your jurisdiction simply don't apply, and the rest still does.

5. Third-party trademarks and content

All product names, logos, and brands referenced on this site are the property of their respective owners. Mention of a product or company name does not imply any endorsement, sponsorship, or partnership unless we explicitly say so. We use these names purely to identify the products we are writing about — what's sometimes called nominative fair use.

Screenshots of third-party software, where we include them, are used for the purpose of commentary, criticism, and review. If you are the owner of a product we cover and you believe we've represented something inaccurately, please write to us at contact@WhisperDesktop and we'll review the concern.

6. Affiliate disclosure

Some links on this site may be affiliate links. If you click such a link and subsequently make a purchase, we may receive a small commission from the linked seller, at no additional cost to you. This is disclosed wherever it applies. Affiliate relationships do not influence our editorial judgment about what we cover or how we cover it. See our about page for the longer version.

7. External links

This site contains links to external websites we don't control. We provide them because they're useful — for example, linking to the developer of an app we review. We are not responsible for the content, privacy practices, or availability of any external site. Following an external link is at your own discretion.

8. Our content

The original written content on this site — articles, reviews, comparisons, the way they're phrased — is the work of the team behind WhisperDesktop and is protected by copyright. You're welcome to:

Read it
Share links to specific pages
Quote short passages with proper attribution and a link back

What we'd like you not to do:

Republish whole articles on other sites without permission
Use the content to train commercial AI systems without permission
Strip our attribution and present the writing as your own

If you'd like to do something that isn't covered here — translate an article, syndicate a piece, use a passage in a book — just write to us. We're usually happy to work something out.

9. Acceptable use

Please don't:

Attempt to gain unauthorized access to our systems
Probe the site with automated scrapers at a rate that affects other readers' access
Use the site to distribute malware or carry out attacks against other users
Impersonate us or claim to represent us in any way

If we notice abusive activity, we may block the responsible IP addresses or take other reasonable measures.

10. Changes to these terms

We may update these terms from time to time. When we do, we'll change the "Last updated" date at the top. For significant changes we'll add a notice on the site for a reasonable period. Your continued use of the site after a change indicates that you accept the updated terms; if you don't, you're free to stop using the site.

11. If a part of these terms is unenforceable

If any part of these terms turns out to be unenforceable in your jurisdiction, the rest of them still apply. The unenforceable bit is treated as if it had been removed, narrowed to the extent needed to make it work, or replaced with a similar provision that is enforceable.

12. Contact

For questions about these terms, write to contact@WhisperDesktop with "Terms" in the subject line.
