Whisper on Mac – an installation and usage guide

You have an hour-long interview from last week sitting on your phone. You recorded it so you could focus on what the other person was saying instead of frantically scribbling notes. Now you need to turn that audio into text. You find an online service, upload the file, and a form pops up: pay this much for the transcription. And by the way, you're now sending a sensitive recording off to a stranger's server somewhere in the world.

There is another way. Your Mac can handle the whole thing on its own - offline, for free, and without a single byte leaving your computer. You just set up a tool called Whisper once, and from then on you can transcribe anything anytime. This guide walks you through that setup step by step.

What Whisper is and why offline

Whisper is a transcription model from OpenAI, that is, a program that turns spoken words into text. The key thing is that it runs directly on your Mac. You don't need an account, a subscription, or an internet connection (apart from the first download). Whisper handles Czech well and can cope with hour-long recordings.

And because it sends nothing to anyone else's server, your files stay right there on your disk - no cloud service, no terms of use, no per-minute fees. For sensitive material like work meetings, interviews, or medical records, this privacy is exactly what matters most.

Before you start: what you'll need

You only need a few things:

A Mac - with an Apple Silicon processor (M1 and newer) or an older one with Intel. Both work.
Terminal - the app you can find through Spotlight (Cmd + Space, type "Terminal").
A few GB of free disk space - depending on how accurate a model you download.
A little patience for the first download - the models range from hundreds of MB to several GB.

You don't need to know how to program. You'll just copy all the commands in this guide into Terminal and press Enter. Nothing more.

Setting up the environment: Homebrew, Python, ffmpeg

We'll start by adding three tools to your Mac. The first is Homebrew - a package manager that then makes it easy to install everything else. Open Terminal and paste in this command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

The installation takes a moment and may ask for your password. At the end, Homebrew itself prints two lines for you to run - copy them exactly as you see them on screen. On a Mac with Apple Silicon, they usually look like this:

echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

This step adds Homebrew to the so-called PATH (the list of places where your Mac looks for commands) so that Terminal can find the brew command. On an Intel Mac the paths are different, so always go by what Homebrew prints. You can verify that everything is fine like this:

brew --version

You should see a version line such as Homebrew 4.x.x (the exact number depends on when you install). Now install Python through Homebrew:

brew install python

And verify:

python3 --version

Finally, install ffmpeg. It's required - Whisper uses it to read the audio from your recordings:

brew install ffmpeg
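
If you'd like to confirm all three tools in one go, a small check along these lines should do it. It is only a sketch: check_tool is a made-up helper name, not part of Homebrew or Whisper, and it does nothing more than ask the shell whether each command can be found.

```shell
# Hypothetical helper: report whether a command can be found on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Check the three tools this guide relies on.
for tool in brew python3 ffmpeg; do
  check_tool "$tool"
done
```

Any line that starts with "missing" points at the install step to repeat.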

Installing Whisper into a virtual environment

We won't install Whisper directly into the system, but into a so-called virtual environment (venv). It's a separate space for Python packages, which means you don't risk breaking the system Python or other tools on your Mac.

Create the virtual environment and activate it:

python3 -m venv ~/whisper-env
source ~/whisper-env/bin/activate

After activation, (whisper-env) appears at the start of the line in Terminal. That's the sign you're inside the environment. Now install Whisper itself:

pip install openai-whisper

The installation also downloads the necessary libraries, so it may take a moment. Once it finishes, verify it:

whisper --help

If it prints out help with a list of options, you're done.
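
If you're ever unsure whether the environment is active in the current Terminal, you can also ask the shell. The sketch below assumes the standard behavior of Python's activate script, which sets a variable called VIRTUAL_ENV while the environment is active; venv_status is a made-up helper name.

```shell
# Hypothetical helper: say whether a Python virtual environment is active.
# The activate script sets VIRTUAL_ENV; the variable is empty otherwise.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: $VIRTUAL_ENV"
  else
    echo "inactive"
  fi
}

venv_status
```

To leave the environment again, type deactivate in the same Terminal.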

Your first transcription: the basic command

That completes the setup, and you can start transcribing. Get an audio file ready - say nazev-souboru.mp3 - and run:

whisper nazev-souboru.mp3

On the very first run, Whisper pauses for a moment and downloads a model. This happens only once; next time it uses the downloaded model right away. Then the transcribed text starts appearing in Terminal bit by bit, and the output files are created in the folder you're currently in.

Czech: always set the language

If you don't specify the language, Whisper guesses it from the first 30 seconds of the recording - and with Czech it sometimes gets it wrong, especially when the recording starts with silence or a foreign word. So it's better to always set the language:

whisper nazev-souboru.mp3 --language Czech

The shorter form --language cs works too. The result is the same, you just save a few characters:

whisper nazev-souboru.mp3 --language cs

Choosing a model: table and recommendation

Whisper offers several models, and each is a different trade-off between speed and accuracy. Smaller models run fast but make more mistakes; larger ones are more accurate but slower and take up more space. If you don't specify the --model parameter at all, as of 2024 the default model is turbo (about 809 million parameters) - it's almost as fast as small, but its accuracy is close to large. You choose a specific model with the --model parameter:

whisper nazev-souboru.mp3 --language Czech --model small

Model	Accuracy	Speed	When to use
tiny	⭐	⚡⚡⚡⚡	Quick testing (~75 MB)
base	⭐⭐	⚡⚡⚡	Simple recordings (~142 MB)
small	⭐⭐⭐	⚡⚡	A good starting point (~466 MB)
turbo	⭐⭐⭐⭐	⚡⚡	The default choice, fast and accurate (~1.6 GB)
medium	⭐⭐⭐⭐	⚡	Longer or more complex recordings (~1.5 GB)
large	⭐⭐⭐⭐⭐	🐌	Maximum accuracy (~3 GB)

The turbo model has just one limitation: it can't translate. So if you wanted to translate the transcription right away with --task translate, reach for a different model.

Keep an eye on timing, too. With this setup, Whisper runs on the Mac's CPU, so with the larger models (medium, large) the transcription can take longer than the recording itself. For everyday use, turbo or small gives the best balance of speed and accuracy.
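
The rule about turbo and translation can be written down as a tiny helper. This is only an illustration: pick_model is a made-up name, and falling back to medium for translation is one reasonable choice, not a recommendation from Whisper itself.

```shell
# Hypothetical helper: choose a model name for a given task.
pick_model() {
  case "$1" in
    translate) echo medium ;;  # turbo can't translate, so use another model
    *)         echo turbo ;;   # the default for plain transcription
  esac
}

# Example (run it with the virtual environment active):
# whisper nazev-souboru.mp3 --language Czech --task translate --model "$(pick_model translate)"
```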

Output: formats and where it's saved

By default, Whisper creates several files at once: .txt with plain text, .srt and .vtt with subtitles and timing, .tsv with timestamped segments in a tab-separated table, and .json with detailed data. When plain text is all you need, limit the output with the --output_format parameter:

whisper nazev-souboru.mp3 --language Czech --output_format txt

By default, the files are saved to the folder you're currently in. If you want to send them elsewhere, specify the destination with --output_dir:

whisper nazev-souboru.mp3 --language Czech --output_dir ~/Desktop/prepisy

Whisper handles common audio and video formats - among them mp3, mp4, m4a, wav, and ogg.
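
To know in advance where a transcript will end up, it helps to see how the file name is put together. The sketch below assumes what current versions of Whisper appear to do: take the input file's name, drop its extension, and add the new one inside the output folder. transcript_path is a made-up helper name.

```shell
# Hypothetical helper: print where the .txt transcript should land.
# Usage: transcript_path <audio-file> <output-folder>
transcript_path() {
  local base
  base=$(basename "$1")                    # drop the folder part of the input
  printf '%s/%s.txt\n' "$2" "${base%.*}"   # swap the extension for .txt
}

transcript_path ~/Downloads/schuzka.mp4 ~/Desktop/prepisy
```

The printed path ends in prepisy/schuzka.txt, which is where you would look after running the --output_dir command above on that file.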

Everyday use

The virtual environment applies only to the current Terminal window. Every time you open a new Terminal, you'd have to activate it again with source ~/whisper-env/bin/activate before you can transcribe anything. For day-to-day use that gets annoying fast. So let's set up our own command that takes that chore off your hands.

This command adds a wt function (short for "Whisper transcribe") to the end of your zsh shell config:

cat >> ~/.zshrc <<'EOF'
wt() {
  ~/whisper-env/bin/whisper "$@"
}
EOF

The function calls Whisper directly from the virtual environment, so you don't even have to activate it - and your Terminal stays clean. To apply the change, load the config once (you won't have to do this in future Terminals):

source ~/.zshrc

From now on, every Terminal has a wt command that passes everything after it to Whisper. Transcribing from anywhere then looks like this - you switch to the folder with the file and run it:

cd ~/Downloads
wt schuzka.mp4 --language Czech --model small --output_format txt

You can also process several files at once - either list them or use a wildcard:

wt *.mp3 --language Czech
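
If you keep adding recordings to the same folder, you may not want to transcribe the old ones again. Here is a sketch of one way to skip them. It assumes you run it from the folder that holds the recordings, that the wt function from above is loaded, and that transcripts are saved as .txt in that same folder. wt_new is a made-up name.

```shell
# Hypothetical helper: transcribe only files that have no .txt transcript yet.
# Usage: wt_new *.mp3
wt_new() {
  local f
  for f in "$@"; do
    if [ -e "${f%.*}.txt" ]; then
      echo "skip: $f"                               # transcript already exists
    else
      wt "$f" --language Czech --output_format txt  # transcribe the new file
    fi
  done
}
```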

And that's all there is to it. No uploading, no waiting in a queue, no form.

When something doesn't work (troubleshooting)

Most problems have a simple cause. Here are the most common ones:

command not found: whisper - you're calling Whisper directly without activating the environment. The easiest fix is to use the wt function (it reaches Whisper right inside the venv). If you want to run whisper directly, activate the venv first:

source ~/whisper-env/bin/activate

command not found: wt - you added the function, but the shell hasn't loaded it yet. Run source ~/.zshrc, or open a new Terminal.

command not found: brew - Homebrew isn't in your PATH. Add it as described in the PATH step above and restart Terminal.
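
If you no longer have the lines Homebrew printed, the sketch below works out the usual location instead. It assumes Homebrew's default install folders (/opt/homebrew on Apple Silicon, /usr/local on Intel); brew_prefix_for is a made-up helper name. A Terminal running under Rosetta reports itself as Intel, so treat the result as a hint rather than a guarantee.

```shell
# Hypothetical helper: map the CPU type printed by `uname -m`
# to Homebrew's default install folder.
brew_prefix_for() {
  case "$1" in
    arm64) echo /opt/homebrew ;;   # Apple Silicon
    *)     echo /usr/local ;;      # Intel
  esac
}

brew_bin="$(brew_prefix_for "$(uname -m)")/bin/brew"
if [ -x "$brew_bin" ]; then
  eval "$("$brew_bin" shellenv)"   # puts brew on the PATH for this Terminal
else
  echo "brew not found at $brew_bin"
fi
```

This only fixes the current Terminal window; the echo line from the installation step is what makes the change permanent.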

The transcription is inaccurate - try a larger model and make sure you've set the language:

whisper nazev-souboru.mp3 --language Czech --model medium

For maximum quality, reach for --model large.

The installation or first transcription takes a long time - that's fine. The first time you use each model, that model gets downloaded (large is roughly 3 GB). Once it's on your disk, later transcriptions skip the download and start right away.
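
To see which models are already on your disk and how much space they take, you can look into the folder where Whisper keeps them. The sketch assumes the default location used by current versions of Whisper, a whisper folder inside ~/.cache (or inside XDG_CACHE_HOME if you have set that variable). whisper_cache_dir is a made-up helper name.

```shell
# Hypothetical helper: print the folder where Whisper stores downloaded models.
whisper_cache_dir() {
  echo "${XDG_CACHE_HOME:-$HOME/.cache}/whisper"
}

# List the downloaded models with their sizes, if there are any.
ls -lh "$(whisper_cache_dir)" 2>/dev/null || echo "no models downloaded yet"
```

Deleting a model file from that folder frees the space; Whisper downloads it again the next time you ask for that model.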

Command cheat sheet

Here's a compact cheat sheet you can copy anytime:

# One-time setup: add the function to ~/.zshrc and load it (source ~/.zshrc)
wt() {
  ~/whisper-env/bin/whisper "$@"
}

# Basic transcription
wt soubor.mp3 --language Czech

# Transcription with a specific model and output
wt soubor.mp3 --language Czech --model small --output_format txt

# Save to a folder
wt soubor.mp3 --language Czech --output_dir ~/Desktop/prepisy

Next time an hour-long interview or a recorded meeting lands on your phone, you won't open any paid service or upload anything anywhere - you'll just open Terminal, run wt, and let your Mac handle the transcription on its own. Privately and for free.
