How to Use Whisper CLI for Audio Transcription and Translation


Whisper CLI lets you transcribe and translate audio directly from your terminal. Whether you’re working with podcasts, interviews, or conference recordings, it converts spoken content into text and can translate speech in other languages into English. In this guide, we’ll walk you through installing Whisper CLI, running the most common commands, and exploring its language options, use cases, and potential areas for expansion.



Introduction to Whisper CLI

Whisper by OpenAI is an automatic speech recognition (ASR) system for audio transcription and translation; Whisper CLI, as this guide calls it, is the whisper command that is installed with it. The tool supports more than 90 languages, with accuracy that varies by language and by the model size you choose. Whisper CLI is a good fit for content creators, developers, and businesses that want a transcription solution they can run locally.

For more information about Whisper CLI, check the official GitHub repository at https://github.com/openai/whisper.

Installing Whisper CLI

Prerequisites:

  • Python 3.8 or higher
  • pip (Python package installer)
  • ffmpeg (command-line tool that Whisper uses to read audio files)

Step-by-Step Installation:

Install Python, pip, and ffmpeg:

For Debian/Ubuntu Linux, use:

$ sudo apt update && sudo apt install python3 python3-pip ffmpeg

For macOS, use Homebrew:

$ brew install python3 ffmpeg

Install Whisper CLI: Install Whisper CLI directly from its GitHub repository using pip:

$ pip3 install git+https://github.com/openai/whisper.git

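Optional: Install PyTorch: Whisper CLI relies on PyTorch for neural network computations, and the pip command above normally installs it as a dependency. Install PyTorch yourself only if you need a specific build, for example one that matches your GPU drivers: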

$ pip3 install torch torchvision torchaudio

Verify Installation: After installation, verify it by running:

$ whisper --help
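The same check can be scripted. The following is a minimal Python sketch, not part of Whisper itself, that looks for the whisper and ffmpeg executables on your PATH and checks the interpreter version. The function name check_environment and the 3.8 minimum are assumptions made for illustration.

```python
import shutil
import sys


def check_environment(min_python=(3, 8)):
    """Return a dict describing whether the tools this guide relies on are available."""
    return {
        # True if the running interpreter is at least min_python (an assumed minimum).
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # shutil.which returns the executable's path, or None if it is not on PATH.
        "whisper_path": shutil.which("whisper"),
        "ffmpeg_path": shutil.which("ffmpeg"),
    }


for name, value in check_environment().items():
    print(f"{name}: {value}")
```

A value of None for whisper_path or ffmpeg_path means that tool could not be found and the corresponding installation step should be revisited.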

Basic Usage of Whisper CLI

With Whisper CLI installed, you can start transcribing and translating audio files with simple commands.

Transcribing Audio Files:

To transcribe an audio file to text, use:

$ whisper path/to/audio.mp3 --task transcribe

Translating Audio Files to English:

Whisper CLI can translate audio from any supported language into English:

$ whisper path/to/audio.mp3 --task translate

Specify Output Format:

To write the transcription in a specific format, such as .srt subtitles (other accepted values include txt, vtt, tsv, and json):

$ whisper path/to/audio.mp3 --output_format srt
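To make the .srt output more concrete, here is a small Python sketch of how timed segments map onto SubRip subtitle cues. It does not call Whisper: the sample list is made-up data shaped like the start/end/text segments that Whisper's Python API returns, and the helper names srt_timestamp and segments_to_srt are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm (the SubRip timestamp layout)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with start, end, text) as numbered SubRip cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


# Made-up sample data for illustration only.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 61.04, "text": " Let's get started."},
]
print(segments_to_srt(sample))
```

Each cue consists of a running number, a start and end time, and the spoken text, which is the structure video players expect from a subtitle file.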

Specify Output Directory:

To save the output in a specific directory:

$ whisper path/to/audio.mp3 --output_dir /path/to/output_directory
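These options can be combined in a single command. If you want to drive Whisper CLI from a script, one possible approach is sketched below in Python: it assembles the argument list from the flags shown in this section and prints the equivalent shell command. The helper names build_whisper_command and run_whisper are hypothetical, and the paths are the same placeholders used above.

```python
import shlex
import subprocess


def build_whisper_command(audio_path, task="transcribe", output_format=None, output_dir=None):
    """Assemble the argument list for the whisper CLI using flags shown in this guide."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    command = ["whisper", str(audio_path), "--task", task]
    if output_format is not None:
        command += ["--output_format", output_format]
    if output_dir is not None:
        command += ["--output_dir", str(output_dir)]
    return command


def run_whisper(command):
    """Run the command; raises CalledProcessError if whisper exits with an error."""
    subprocess.run(command, check=True)


command = build_whisper_command(
    "path/to/audio.mp3",
    task="translate",
    output_format="srt",
    output_dir="/path/to/output_directory",
)
# Print the equivalent shell command instead of running it,
# so the sketch is safe to execute without Whisper installed.
print(shlex.join(command))
```

Passing the arguments as a list rather than a single string avoids quoting problems when file names contain spaces. Call run_whisper(command) once the printed command looks right.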

For detailed usage, run whisper --help or refer to the README in the Whisper GitHub repository.

Language Options in Whisper CLI

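Whisper CLI supports more than 90 languages for transcription and translation. The table below lists language codes; the exact set accepted by your installed version appears in the output of whisper --help.

By default, Whisper detects the spoken language automatically from the first 30 seconds of audio. To specify a language manually, use:

$ whisper path/to/audio.mp3 --language [language_code] --task transcribe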
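Whisper also has a Python API, and the same language choice can be passed there. The sketch below assumes the openai-whisper package is installed and uses the "base" model as an arbitrary example. The wrapper name transcribe_in_language and its model parameter are hypothetical conveniences, not part of Whisper.

```python
def transcribe_in_language(audio_path, language_code, model=None, model_name="base"):
    """Transcribe audio_path, telling Whisper the spoken language instead of auto-detecting it."""
    if model is None:
        # Imported lazily so this function can be defined without Whisper installed.
        import whisper

        model = whisper.load_model(model_name)
    # language and task are forwarded to Whisper's decoding options.
    result = model.transcribe(str(audio_path), language=language_code, task="transcribe")
    return result["text"].strip()


# Example call (requires Whisper and a real audio file):
# print(transcribe_in_language("path/to/audio.mp3", "fr"))
```

Setting the language explicitly skips the detection step, which can help with short clips or recordings that open with music or silence.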

Language Codes Table

Code  Language
af    Afrikaans
am    Amharic
ar    Arabic
as    Assamese
az    Azerbaijani
ba    Bashkir
be    Belarusian
bg    Bulgarian
bn    Bengali
bo    Tibetan
bs    Bosnian
ca    Catalan
cs    Czech
cy    Welsh
da    Danish
de    German
el    Greek
en    English
es    Spanish
et    Estonian
eu    Basque
fa    Persian
fi    Finnish
fr    French
gl    Galician
gu    Gujarati
he    Hebrew
hi    Hindi
hr    Croatian
ht    Haitian Creole
hu    Hungarian
hy    Armenian
id    Indonesian
is    Icelandic
it    Italian
ja    Japanese
jv    Javanese
ka    Georgian
kk    Kazakh
km    Khmer
kn    Kannada
ko    Korean
ku    Kurdish
ky    Kyrgyz
la    Latin
lb    Luxembourgish
lo    Lao
lt    Lithuanian
lv    Latvian
mg    Malagasy
mi    Maori
mk    Macedonian
ml    Malayalam
mn    Mongolian
mr    Marathi
ms    Malay
mt    Maltese
my    Burmese
ne    Nepali
nl    Dutch
no    Norwegian
or    Oriya
pa    Punjabi
pl    Polish
ps    Pashto
pt    Portuguese
ro    Romanian
ru    Russian
sa    Sanskrit
sd    Sindhi
si    Sinhala
sk    Slovak
sl    Slovenian
sq    Albanian
sr    Serbian
su    Sundanese
sv    Swedish
sw    Swahili
ta    Tamil
te    Telugu
th    Thai
tl    Tagalog
tr    Turkish
tt    Tatar
uk    Ukrainian
ur    Urdu
uz    Uzbek
vi    Vietnamese
xh    Xhosa
yi    Yiddish
zh    Chinese (Mandarin)
zu    Zulu

Use Cases for Whisper CLI

Whisper CLI offers a range of applications across various fields:

  • Media and Content Creation: Transcribe interviews, podcasts, and generate subtitles.
  • Education: Transcribe lectures and aid language learning.
  • Business: Generate meeting minutes and improve customer support.
  • Legal: Transcribe legal proceedings and documents.

Potential Areas of Expansion

Whisper CLI has potential for future improvements:

  • Real-Time Transcription: Implement live transcription for events and calls.
  • API Integration: Develop APIs for cloud-based transcription services.
  • Language-Specific Models: Enhance accuracy for specific languages.
  • Audio File Format Support: Broaden the range of supported audio formats.
  • NLP Integration: Combine with NLP tools for advanced text analysis.

Handling NumPy Compatibility Issues

If you see an error saying that a module compiled using NumPy 1.x cannot be run in NumPy 2.x (for example, 2.0.1), you need to either downgrade NumPy or rebuild the affected module against NumPy 2. Downgrading is usually the quicker fix:

Downgrade NumPy

Uninstall Current NumPy Version:

$ pip uninstall numpy

Install Compatible NumPy Version:

$ pip install 'numpy<2'
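Before downgrading, it can help to confirm which NumPy version is actually installed in the environment where Whisper runs. This is a small sketch using only the Python standard library; needs_numpy_downgrade and installed_numpy_version are hypothetical helpers, and the check looks only at the major version number.

```python
from importlib import metadata


def needs_numpy_downgrade(version_string):
    """Return True if the given NumPy version is 2.x or newer."""
    major = int(version_string.split(".")[0])
    return major >= 2


def installed_numpy_version():
    """Return the installed NumPy version string, or None if NumPy is not installed."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


version = installed_numpy_version()
if version is None:
    print("NumPy is not installed")
elif needs_numpy_downgrade(version):
    print(f"NumPy {version} found; consider: pip install 'numpy<2'")
else:
    print(f"NumPy {version} found; no downgrade needed")
```

Run it with the same Python interpreter that runs Whisper, since a different environment may report a different version.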

Conclusion

Whisper CLI is a robust tool for audio transcription and translation, providing users with high-quality text conversion from audio files in multiple languages. By following this guide, you can effectively use Whisper CLI for various purposes and explore its full potential.

For more information, visit the Whisper GitHub repository at https://github.com/openai/whisper.
