Python Speech Recognition Using Whisper API And FFmpeg

Perform Speech Recognition In Python

OpenAI Whisper API, as used in this article, is the open-source speech-to-text (speech recognition) model from OpenAI, installed locally as the openai-whisper Python package.

Python is a requirement for the Whisper API. Captions, subtitles and speech recognition are all possible using the open-source tools Whisper API and FFmpeg.

Audio input can be transcribed either from the command line with the whisper command or from a Python script that calls the Whisper API.

Test Tools

Test System:

CPU: Intel(R) i7 2600 @ 3.40GHz.
Memory: 16GB DDR3.
Operating System: Fedora Linux 40 64bit.
Desktop: Gnome 46 On Wayland.
Graphics: NVIDIA GeForce GTX 950.
Hard Disk: OCZ Intrepid 3800 SSD.

Test Suite

Audio file: 56 seconds, 1536 kbps Bit Rate, 48.00 kHz Sample Rate, WAV Codec
Python file: Transcribe Audio File Into Text
Video player: mpv Media Player

Whisper API

Whisper Language Models

Whisper Models And Languages
Name	Approximate speed (relative to large)	Example
tiny	32x speed model	whisper audio.wav --model tiny
base	16x speed model	whisper audio.wav --model base
small	6x speed model	whisper audio.wav --model small
medium	2x speed model	whisper audio.wav --model medium
large	1x speed model	whisper audio.wav --model large
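
As a small sketch of how a model name from the table might be chosen in code, the helper below builds the name from a size and an English-only flag. The function model_name and both constants are hypothetical and not part of the whisper package. The sketch assumes that English-only ".en" variants exist for tiny, base, small and medium but not for large, and the speed figures are simply copied from the table.

```python
# Hypothetical helper, not part of the whisper package.
# Approximate relative speeds copied from the table above (large = 1x).
RELATIVE_SPEED = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large": 1}

# Sizes assumed to ship an English-only ".en" variant.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def model_name(size: str, english_only: bool = False) -> str:
    """Return a model name such as "base" or "base.en"."""
    if size not in RELATIVE_SPEED:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    # Sizes without an English-only variant fall back to the multilingual name.
    return size


if __name__ == "__main__":
    for size, speed in RELATIVE_SPEED.items():
        print(model_name(size, english_only=True), f"~{speed}x")
```

The returned string can be passed to the --model option on the command line or to whisper.load_model in Python.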

Whisper Setup For Fedora Linux

sudo dnf install ffmpeg
sudo dnf install python-pip
pip install -U openai-whisper
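
After installation, a quick check can confirm that both pieces are visible to Python. The sketch below uses only the standard library; the function name check_environment is made up for this example. It only looks for an ffmpeg executable on the PATH and an importable whisper module, and does not verify versions or that transcription actually works.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether the ffmpeg binary and the whisper module can be found."""
    return {
        # shutil.which returns None when no ffmpeg executable is on the PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when the whisper module is not installed
        "whisper": importlib.util.find_spec("whisper") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```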

Whisper Command Line Usage

whisper audio.wav --model base.en
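
The same command can also be launched from a Python script, which is convenient when several files need captions. The sketch below is one possible wrapper, assuming the whisper command is on the PATH; build_whisper_command is a hypothetical helper. It adds the --output_format and --output_dir options of the whisper command so that an SRT subtitle file is written next to the audio.

```python
import subprocess


def build_whisper_command(audio_path, model="base.en", output_format="srt", output_dir="."):
    """Build the argument list for the whisper command line tool."""
    return [
        "whisper", audio_path,
        "--model", model,
        "--output_format", output_format,
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    command = build_whisper_command("audio.wav")
    print(" ".join(command))
    # Uncomment to actually run the transcription (requires whisper and FFmpeg):
    # subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.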

Whisper Python Usage

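import whisper

# Load the English-only base model (downloaded on first use)
model = whisper.load_model("base.en")
# Transcribe the audio file; FFmpeg is used to decode it
result = model.transcribe("audio.wav")
# The full transcript is stored under the "text" key
print(result["text"])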
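
For captions or subtitles, the timed segments are more useful than the plain text. The result returned by transcribe also contains a "segments" list whose entries carry "start", "end" and "text" values. The sketch below turns such a list into SRT caption text; format_timestamp, segments_to_srt and transcribe_to_srt are hypothetical helper names, and the demo segments are made up only to show the output shape.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def segments_to_srt(segments) -> str:
    """Turn a list of {"start", "end", "text"} dicts into SRT caption text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    # A blank line separates consecutive caption blocks
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "base.en") -> str:
    """Transcribe audio_path with Whisper and return SRT text."""
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_srt(result["segments"])


if __name__ == "__main__":
    # Made-up segments, only to show the output shape.
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 5.04, "text": " This is a caption."},
    ]
    print(segments_to_srt(demo))
```

Calling transcribe_to_srt("audio.wav") and writing the returned string to audio.srt should give a subtitle file that a player such as mpv can load alongside the video.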

OpenAI Whisper API Command Line Base Speech To Text

OpenAI Whisper API Python Base Speech To Text

Usage

You can use any IDE or text editor and the command line to write and run Python code. In this tutorial, OpenAI Whisper was used to transcribe the audio from the short video of the Python Arithmetic Operators article.

Results

The base model was chosen for speed. It was accurate overall and made only 4 transcription mistakes in the 56-second audio file.

The phrase “Arithmetic Operators” was incorrectly transcribed as “At three million co-opeters”. The phrase “for your reader” was incorrectly transcribed as “the kabillano jiarida”. The phrase “the link is in the description” was incorrectly transcribed as “the link sign-in description”. The phrase “Ojambo” was transcribed as “jumbo” for both OjamboShop.com and OjamboServices.com.
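
One common way to put a number on mistakes like these is the word error rate: the word-level edit distance between the expected text and the transcript, divided by the number of expected words. The sketch below is a minimal implementation, not the method used for the results above; word_error_rate is a hypothetical helper, and the comparison is case-insensitive and splits on whitespace only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    expected = "the link is in the description"
    transcribed = "the link sign-in description"
    print(word_error_rate(expected, transcribed))  # 3 edits / 6 words = 0.5
```

For the third mistake listed above, three of the six expected words are lost or changed, so the rate for that phrase alone is 0.5, while a perfect transcript scores 0.0.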

Open Source

Python is licensed under the Python Software Foundation License. This allows commercial use, modification, distribution, and allows making derivatives proprietary.

OpenAI Whisper API is licensed under the permissive MIT License. This allows commercial use, modification, distribution, and allows making derivatives proprietary. The MIT License was drafted before software patents were recognized under US Law.

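FFmpeg is licensed under the GNU Lesser General Public License (LGPL) version 2.1 or later, with some optional components under the GNU General Public License (GPL) version 2 or later. This allows commercial use, modification and distribution. Proprietary software may link against the LGPL libraries, but modified versions of FFmpeg itself must stay under the same license.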

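mpv is licensed under the GNU General Public License (GPL) version 2 or later by default and can optionally be built under the GNU Lesser General Public License (LGPL) version 2.1 or later. This allows commercial use, modification and distribution, but derivatives must be released under the same license and cannot be made proprietary.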

Conclusion:

Python makes it easy to perform speech recognition using the OpenAI Whisper API. Converting audio to text for captions or subtitles is easy in Python using the open source tools.
