You have an hour-long interview from last week sitting on your phone. You recorded it so you could focus on what the other person was saying instead of frantically scribbling notes. Now you need to turn that audio into text. You find an online service, upload the file, and a form pops up: pay this much for the transcription, and by the way, you're now sending a sensitive recording off to some stranger's server somewhere in the world.
There is another way. Your Mac can handle the whole thing on its own - offline, for free, and without a single byte leaving your computer. You just set up a tool called Whisper once, and from then on you can transcribe anything anytime. This guide walks you through that setup step by step.
What Whisper is and why offline
Whisper is a transcription model from OpenAI, that is, a program that turns spoken words into text. The key thing is that it runs directly on your Mac. You don't need an account or a subscription, and you only need an internet connection for the installation and the first download of each model. Whisper handles Czech well and can cope with hour-long recordings.
And because it sends nothing to anyone else's server, your files stay right there on your disk - no cloud service, no terms of use, no per-minute fees. For sensitive material like work meetings, interviews, or medical records, this privacy is exactly what matters most.
Before you start: what you'll need
You only need a few things:
- A Mac - with an Apple Silicon processor (M1 and newer) or an older one with Intel. Both work.
- Terminal - the app you can find through Spotlight (Cmd + Space, type "Terminal").
- A few GB of free disk space - depending on how accurate a model you download.
- A little patience for the first download - the models range from hundreds of MB to several GB.
You don't need to know how to program. You'll just copy all the commands in this guide into Terminal and press Enter. Nothing more.
Setting up the environment: Homebrew, Python, ffmpeg
We'll start by adding three tools to your Mac. The first is Homebrew - a package manager that then makes it easy to install everything else. Open Terminal and paste in this command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
The installation takes a moment and may ask for your password. At the end, Homebrew itself prints the commands you need to run next - copy them exactly as you see them on screen. On a Mac with Apple Silicon, they typically look like this:
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
This step adds Homebrew to the so-called PATH (the list of places where your Mac looks for commands) so that Terminal can find the brew command. On an Intel Mac the paths are different, so always go by what Homebrew prints. You can verify that everything is fine like this:
brew --version
You should see something like Homebrew 4.x.x (the exact version number will differ). Now install Python through Homebrew:
brew install python
And verify:
python3 --version
Finally, install ffmpeg. It's required - Whisper uses it to read the audio from your recordings:
brew install ffmpeg
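Before moving on, you can check all three tools in one go. The following is a minimal sketch, not something Homebrew or Whisper provides: check_tools is a made-up helper name, and all it does is ask the shell whether each command can be found on your PATH.

```shell
# Report whether each named command can be found on PATH.
# check_tools is a hypothetical helper, not part of Homebrew or Whisper.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Exit status 0 only when every tool was found.
  return "$missing"
}
```

Run it as check_tools brew python3 ffmpeg. If any line says "missing", repeat the matching install step above before you continue.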
Installing Whisper into a virtual environment
We won't install Whisper directly into the system, but into a so-called virtual environment (venv). It's a separate space for Python packages, which means you don't risk breaking the system Python or other tools on your Mac.
Create the virtual environment and activate it:
python3 -m venv ~/whisper-env
source ~/whisper-env/bin/activate
After activation, (whisper-env) appears at the start of the line in Terminal. That's the sign you're inside the environment. Now install Whisper itself:
pip install openai-whisper
The installation also downloads the necessary libraries, so it may take a moment. Once it finishes, verify it:
whisper --help
If it prints out help with a list of options, you're done.
Your first transcription: the basic command
That completes the setup, and you can start transcribing. Get an audio file ready - say nazev-souboru.mp3 - and run:
whisper nazev-souboru.mp3
On the very first run, Whisper pauses for a moment and downloads a model. This happens only once per model; next time it uses the downloaded copy right away. Then the transcribed text starts appearing in Terminal bit by bit, and the output files are written to the folder you ran the command from.
Czech: always set the language
If you don't specify the language, Whisper guesses it from the first 30 seconds of the recording - and with Czech it sometimes gets it wrong, especially when the recording starts with silence or a foreign word. So it's better to always set the language:
whisper nazev-souboru.mp3 --language Czech
The shorter form --language cs works too. The result is the same; you just save a few characters:
whisper nazev-souboru.mp3 --language cs
Choosing a model: table and recommendation
Whisper offers several models, each a different trade-off between speed and accuracy. Smaller models run fast but make more mistakes; larger ones are more accurate but slower and take up more space. If you leave out the --model parameter, as of 2024 the default model is turbo, an optimized variant of large with 809M parameters (a download of roughly 1.6 GB). It gives up a little of large's accuracy in exchange for much faster transcription. To pick a specific model, use the --model parameter:
whisper nazev-souboru.mp3 --language Czech --model small
| Model | Accuracy | Speed | When to use |
|---|---|---|---|
| tiny | ⭐ | ⚡⚡⚡⚡ | Quick testing (~75 MB) |
| base | ⭐⭐ | ⚡⚡⚡ | Simple recordings (~142 MB) |
| small | ⭐⭐⭐ | ⚡⚡ | A good starting point (~466 MB) |
| turbo | ⭐⭐⭐⭐ | ⚡⚡ | The default choice, fast and accurate (~1.6 GB, 809M parameters) |
| medium | ⭐⭐⭐⭐ | ⚡ | Longer or more complex recordings (~1.5 GB) |
| large | ⭐⭐⭐⭐⭐ | 🐌 | Maximum accuracy (~3 GB) |
The turbo model has just one limitation: it wasn't trained for translation. So if you want Whisper to translate the recording into English right away with --task translate, reach for one of the other models.
Keep an eye on timing, too. Whisper runs on the CPU on a Mac, so with the larger models (medium, large) the transcription can take longer than the recording itself. For everyday use, turbo or small gives the best balance of speed and accuracy.
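How long each model takes depends a lot on your Mac, so it can be worth measuring on a short sample before committing to an hour-long recording. Here is a rough sketch of how that could look. The names compare_models and WHISPER are made up for this example, and the path assumes the ~/whisper-env environment created earlier.

```shell
# Path to the Whisper command inside the venv (assumes ~/whisper-env from this guide).
WHISPER="${WHISPER:-$HOME/whisper-env/bin/whisper}"

# Transcribe one file with several models and print how long each took.
# compare_models is a hypothetical helper, not part of Whisper.
compare_models() {
  audio="$1"
  shift
  for model in "$@"; do
    started=$(date +%s)
    # Each model writes its transcript into its own folder, e.g. compare-small/.
    "$WHISPER" "$audio" --language Czech --model "$model" \
      --output_format txt --output_dir "compare-$model"
    echo "$model: $(( $(date +%s) - started )) s"
  done
}
```

For example, compare_models ukazka.mp3 small turbo medium (ukazka.mp3 stands for any short clip of yours) leaves three transcripts side by side. Note that the first run of each model also includes its download time, so run the comparison twice if you want clean numbers.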
Output: formats and where it's saved
By default, Whisper creates several files at once: .txt with plain text, .srt and .vtt with subtitles and timing, .tsv with a table of timestamps, and .json with detailed data. When plain text is all you need, limit the output with the --output_format parameter:
whisper nazev-souboru.mp3 --language Czech --output_format txt
By default, the files are saved to the folder you're currently in. If you want to send them elsewhere, specify the destination with --output_dir:
whisper nazev-souboru.mp3 --language Czech --output_dir ~/Desktop/prepisy
Because it reads files through ffmpeg, Whisper handles common audio and video formats - among them mp3, mp4, m4a, wav, and ogg.
Everyday use
The virtual environment applies only to the current Terminal window. Every time you open a new Terminal, you'd have to activate it again with source ~/whisper-env/bin/activate before you can transcribe anything. For day-to-day use that gets annoying fast. So let's set up our own command that takes that chore off your hands.
This command adds a wt function (short for "Whisper transcribe") to the end of your zsh shell config:
cat >> ~/.zshrc <<'EOF'
# wt: run Whisper straight from its virtual environment
wt() {
  ~/whisper-env/bin/whisper "$@"
}
EOF
The function calls Whisper directly from the virtual environment, so you don't even have to activate it - and your Terminal stays clean. To apply the change, load the config once (you won't have to do this in future Terminals):
source ~/.zshrc
From now on, every Terminal has a wt command that passes everything after it to Whisper. Transcribing from anywhere then looks like this - you switch to the folder with the file and run it:
cd ~/Downloads
wt schuzka.mp4 --language Czech --model small --output_format txt
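If you find yourself typing the same options every time, you could wrap wt once more. This is only a sketch: wtcs is a made-up name, and it assumes the wt function defined above is already loaded.

```shell
# wtcs: like wt, but with Czech and plain-text output preset.
# Hypothetical convenience wrapper; relies on the wt function from ~/.zshrc.
wtcs() {
  wt --language Czech --output_format txt "$@"
}
```

With it, wtcs schuzka.mp4 --model small does the same as the longer command above. The presets come first on purpose: when an option appears twice, Whisper uses the last one, so you can still override them, for example with wtcs schuzka.mp4 --output_format srt.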
You can also process several files at once - either list them or use a wildcard:
wt *.mp3 --language Czech
And that's all there is to it. No uploading, no waiting in a queue, no form.
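One thing the wildcard doesn't do is remember what you've already transcribed. If you keep dropping new recordings into the same folder, a small helper along these lines could skip the finished ones. It's a sketch under a few assumptions: transcribe_new is a made-up name, it relies on the wt function above, and it expects the transcripts to sit in the current folder as name.txt, which is where Whisper puts them by default.

```shell
# Transcribe only the files that don't have a .txt transcript yet.
# transcribe_new is a hypothetical helper; it calls the wt function from ~/.zshrc.
transcribe_new() {
  for audio in "$@"; do
    name="$(basename "$audio")"
    transcript="${name%.*}.txt"   # schuzka.mp4 -> schuzka.txt
    if [ -e "$transcript" ]; then
      echo "skip: $audio ($transcript already exists)"
      continue
    fi
    wt "$audio" --language Czech --output_format txt
  done
}
```

You'd call it the same way as wt with a wildcard, for instance transcribe_new *.mp3.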
When something doesn't work (troubleshooting)
Most problems have a simple cause. Here are the most common ones:
command not found: whisper - you're calling Whisper directly without activating the environment. The easiest fix is to use the wt function (it reaches Whisper right inside the venv). If you want to run whisper directly, activate the venv first:
source ~/whisper-env/bin/activate
command not found: wt - you added the function, but the shell hasn't loaded it yet. Run source ~/.zshrc, or open a new Terminal.
command not found: brew - Homebrew isn't in your PATH. Add it as described in the PATH step above and restart Terminal.
The transcription is inaccurate - try a larger model and make sure you've set the language:
whisper nazev-souboru.mp3 --language Czech --model medium
For maximum quality, reach for --model large.
The installation or first transcription takes a long time - that's expected. The first time you use a model, Whisper has to download it (large is roughly 3 GB). Once the model is on your disk, later runs skip the download and start transcribing right away.
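To see which models are already on your disk and how much space they take, you can look into the folder where Whisper keeps them. The sketch below assumes the default location, ~/.cache/whisper; if you passed --model_dir or your system sets XDG_CACHE_HOME, the models will be elsewhere. list_models is a made-up helper name.

```shell
# List downloaded Whisper models with their sizes.
# Assumption: models live in ~/.cache/whisper (the default unless --model_dir
# or XDG_CACHE_HOME points somewhere else). list_models is a hypothetical helper.
list_models() {
  dir="${1:-$HOME/.cache/whisper}"
  if [ -d "$dir" ]; then
    ls -lh "$dir"
  else
    echo "no models found in $dir"
  fi
}
```

Run list_models on its own, or give it a different folder as the first argument. Deleting a .pt file from that folder frees the space; Whisper simply downloads the model again the next time you ask for it.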
Command cheat sheet
Here's a compact cheat sheet you can copy anytime:
# One-time setup: add the function to ~/.zshrc and load it (source ~/.zshrc)
wt() {
  ~/whisper-env/bin/whisper "$@"
}
# Basic transcription
wt soubor.mp3 --language Czech
# Transcription with a specific model and output
wt soubor.mp3 --language Czech --model small --output_format txt
# Save to a folder
wt soubor.mp3 --language Czech --output_dir ~/Desktop/prepisy
Next time an hour-long interview or a recorded meeting lands on your phone, you won't open any paid service or upload anything anywhere - you'll just open Terminal, run wt, and let your Mac handle the transcription on its own. Privately and for free.
