Perform Speech Recognition In Python
OpenAI Whisper API is an open source speech-to-text (speech recognition) model.
Python is a requirement for the Whisper API. Captions, subtitles and speech recognition are possible using the open source tools Whisper API and FFmpeg.
Python can be used from the command line to transcribe audio input via the Whisper API.
Test Tools
Test System:
- CPU: Intel(R) i7 2600 @ 3.40GHz.
- Memory: 16GB DDR3.
- Operating System: Fedora Linux 40 64bit.
- Desktop: Gnome 46 On Wayland.
- Graphics: NVIDIA GeForce GTX 950.
- Hard Disk: OCZ Intrepid 3800 SSD.
Test Suite
- Audio file: 56 seconds, 1536 kbps Bit Rate, 48.00 kHz Sample Rate, WAV Codec
- Python file: Transcribe Audio File Into Text
- Video player: mpv Media Player
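The audio figures above are consistent with uncompressed PCM audio: 48,000 samples per second × 16 bits × 2 channels = 1,536,000 bits per second. The sample width and channel count are not stated in this article, so 16-bit stereo is an assumption here. A minimal sketch using Python's standard `wave` module shows the arithmetic; the silent in-memory file is only a stand-in for the real test file.

```python
import io
import wave


def wav_bit_rate_kbps(wav_file):
    """Return the uncompressed PCM bit rate of a WAV file in kbps."""
    with wave.open(wav_file, "rb") as wav:
        bits_per_sample = wav.getsampwidth() * 8
        return wav.getframerate() * bits_per_sample * wav.getnchannels() / 1000


def make_silent_wav(sample_rate=48000, channels=2, sample_width=2, seconds=1):
    """Build an in-memory WAV of silence (a stand-in for the real test file)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # 2 bytes = 16 bits (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * sample_rate * channels * sample_width * seconds)
    buffer.seek(0)
    return buffer


print(wav_bit_rate_kbps(make_silent_wav()))  # 1536.0
```

Passing a path such as "audio.wav" to `wav_bit_rate_kbps` works the same way for a file on disk.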
Whisper API
Whisper Language Models
Name | Description | Example |
---|---|---|
tiny | 32x speed model | whisper audio.wav --model tiny |
base | 16x speed model | whisper audio.wav --model base |
small | 6x speed model | whisper audio.wav --model small |
medium | 2x speed model | whisper audio.wav --model medium |
large | 1x speed model | whisper audio.wav --model large |
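The speed column can be turned into a small lookup for choosing a model. The sketch below copies the relative speeds from the table and picks the slowest model that still meets a speed requirement, on the assumption that slower, larger models are generally more accurate. The names `RELATIVE_SPEED`, `pick_model` and `estimated_seconds` are hypothetical helpers for illustration, not part of Whisper.

```python
# Relative speeds copied from the table above (large = 1x baseline).
RELATIVE_SPEED = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large": 1}


def pick_model(min_speed):
    """Return the slowest model that is at least min_speed times faster than large."""
    candidates = [name for name, speed in RELATIVE_SPEED.items() if speed >= min_speed]
    if not candidates:
        raise ValueError(f"No model is {min_speed}x faster than large")
    return min(candidates, key=RELATIVE_SPEED.get)


def estimated_seconds(model, large_seconds):
    """Scale a measured run time of the large model to another model (rough estimate)."""
    return large_seconds / RELATIVE_SPEED[model]


print(pick_model(10))                  # base
print(estimated_seconds("base", 160))  # 10.0
```

The estimate is only as good as the table's relative speeds, which vary with hardware and audio content.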
Whisper Setup For Fedora Linux
sudo dnf install ffmpeg
sudo dnf install python-pip
pip install -U openai-whisper
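After installing, it can help to confirm that the tools are visible to Python. The following is a minimal sketch using only the standard library; `check_whisper_setup` is a hypothetical helper name, and it only checks that the executables and the module can be found, not that they work.

```python
import importlib.util
import shutil


def check_whisper_setup():
    """Report whether FFmpeg and Whisper can be found on this system."""
    return {
        # Executables are looked up on the PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "whisper_cli": shutil.which("whisper") is not None,
        # The Python package installed by openai-whisper is named "whisper".
        "whisper_module": importlib.util.find_spec("whisper") is not None,
    }


for name, found in check_whisper_setup().items():
    print(f"{name}: {'found' if found else 'missing'}")
```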
Whisper Command Line Usage
whisper audio.wav --model base.en
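The same command can be started from a Python script. The sketch below builds the argument list and hands it to `subprocess.run`; `build_whisper_command` and `run_whisper` are hypothetical helper names. The `--output_format` and `--output_dir` options are those of the `whisper` command line tool, with `srt` being one way to get a subtitle file.

```python
import subprocess


def build_whisper_command(audio_path, model="base.en", output_format=None, output_dir=None):
    """Build the argument list for the whisper command line tool."""
    command = ["whisper", audio_path, "--model", model]
    if output_format:
        command += ["--output_format", output_format]
    if output_dir:
        command += ["--output_dir", output_dir]
    return command


def run_whisper(audio_path, **options):
    """Run whisper and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_whisper_command(audio_path, **options), check=True)


# Only prints the command; call run_whisper("audio.wav") to execute it.
print(" ".join(build_whisper_command("audio.wav", output_format="srt")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.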
Whisper Python Usage
import whisper
model = whisper.load_model("base.en")
result = model.transcribe("audio.wav")
print(result["text"])
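Besides `result["text"]`, the transcription result also carries a `result["segments"]` list whose entries have `start`, `end` and `text` keys, which is what captions and subtitles need. The following is a minimal sketch that turns such segments into SubRip (SRT) text; `srt_timestamp` and `segments_to_srt` are hypothetical helper names, and the sample segments are made up for illustration rather than taken from the test audio.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def segments_to_srt(segments):
    """Convert a list of segments (start, end, text) into SRT subtitle text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(blocks)


# Made-up sample in the shape of result["segments"].
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 61.25, "text": " This is a subtitle test."},
]
print(segments_to_srt(sample_segments))
```

With a real transcription, `segments_to_srt(result["segments"])` can be written to a `.srt` file and loaded in a video player such as mpv.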
Usage
You can use any IDE or text editor and the command line to write and run Python code. In this tutorial, OpenAI Whisper was used to transcribe the audio from the short video for the Python Arithmetic Operators article.
Results
The base model was chosen for speed. It was very accurate, making only 4 transcription mistakes in the 56-second audio file:
- The phrase “Arithmetic Operators” was incorrectly transcribed as “At three million co-opeters”.
- The phrase “for your reader” was incorrectly transcribed as “the kabillano jiarida”.
- The phrase “the link is in the description” was incorrectly transcribed as “the link sign-in description”.
- The name “Ojambo” was transcribed as “jumbo” for both OjamboShop.com and OjamboServices.com.
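Mistakes like these can be measured with the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the transcription into the reference, divided by the number of reference words. This metric was not used for the results above. The following is a minimal sketch, with `word_error_rate` as a hypothetical helper name, applied to one of the phrases quoted above.

```python
def word_error_rate(reference, hypothesis):
    """Return the word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("The reference text must contain at least one word")
    # Classic dynamic-programming edit distance, computed over words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1] / len(ref)


print(word_error_rate("the link is in the description",
                      "the link sign-in description"))  # 0.5
```

Here three edits (one substitution and two deletions) against six reference words give a WER of 0.5 for that phrase.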
Open Source
Python is licensed under the Python Software Foundation License. This allows commercial use, modification, and distribution, and permits proprietary derivatives.
OpenAI Whisper API is licensed under the permissive MIT License. This allows commercial use, modification, and distribution, and permits proprietary derivatives. The MIT License was drafted before software patents were recognized under US Law.
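FFmpeg is licensed under the GNU Lesser General Public License (LGPL) version 2.1 or later, with some optional components under the GNU General Public License (GPL). The LGPL allows commercial use, modification, and distribution, and proprietary software may link against the libraries, but modified versions of FFmpeg itself must remain under the LGPL.
mpv is licensed under the GNU General Public License (GPL) version 2 or later, and can optionally be built under the GNU Lesser General Public License (LGPL) version 2.1 or later. Both allow commercial use, modification, and distribution, but derivatives of mpv must be released under the same license and cannot be made proprietary.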
Conclusion:
Python makes it easy to perform speech recognition using the OpenAI Whisper API. Converting audio to text for captions or subtitles is easy in Python using the open source tools.
Take this opportunity to learn the Python or PHP programming language by making a one-time purchase at Learning Python Course or Learning PHP Course. A web browser is the only thing needed to learn Python or PHP in 2024 at your leisure. All the developer tools are provided right in your web browser.
If you prefer to download ebook versions for your reader, you may purchase the Learning Python Ebook or Learning PHP Ebook.
References:
- Learning Python Course on OjamboShop.com
- Learning PHP Course on OjamboShop.com
- Learning Python Ebook on Amazon
- Learning PHP Ebook on Amazon
- OpenAI Whisper Github Page
- OpenAI Whisper License
- FFmpeg Project
- FFmpeg License
- mpv Media Player
- mpv GPL License
- mpv LGPL License