Python Speech Recognition Using Whisper API And FFmpeg

Perform Speech Recognition In Python

The OpenAI Whisper API is an open source speech-to-text (speech recognition) model.

Python is a requirement for the Whisper API. Captions, subtitles and speech recognition are all possible using the open source tools Whisper API and FFmpeg.

Python can be used from the command line to transcribe audio input via the Whisper API.

Test Tools

Test System:

  • CPU: Intel(R) i7 2600 @ 3.40GHz.
  • Memory: 16GB DDR3.
  • Operating System: Fedora Linux 40 64-bit.
  • Desktop: GNOME 46 on Wayland.
  • Graphics: NVIDIA GeForce GTX 950.
  • Hard Disk: OCZ Intrepid 3800 SSD.

Test Suite

  • Audio file: 56 seconds, 1536 kbps Bit Rate, 48.00 kHz Sample Rate, WAV Codec
  • Python file: Transcribe Audio File Into Text
  • Video player: mpv Media Player

Whisper API

Whisper Language Models

Whisper Models And Languages
Name    Description      Example
tiny    32x speed model  whisper audio.wav --model tiny
base    16x speed model  whisper audio.wav --model base
small   6x speed model   whisper audio.wav --model small
medium  2x speed model   whisper audio.wav --model medium
large   1x speed model   whisper audio.wav --model large

Whisper Setup For Fedora Linux

sudo dnf install ffmpeg
sudo dnf install python-pip
pip install -U openai-whisper
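
Before the first transcription, it can help to confirm that the installed tools are actually on the PATH. The following is a minimal sketch using only the Python standard library. The helper name missing_tools is hypothetical and is not part of Whisper or FFmpeg.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # "ffmpeg" and "whisper" are the executables the setup commands install.
    missing = missing_tools(["ffmpeg", "whisper"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and whisper are both available.")
```

If whisper is reported as missing after a pip install, the directory where pip places scripts may not be on the PATH.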

Whisper Command Line Usage

whisper audio.wav --model base.en
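
The same command can also be launched from a Python script. The sketch below builds the argument list and runs it with the standard subprocess module. The function names and the KNOWN_MODELS set are assumptions made for illustration; the set only covers the models in the table plus their English-only ".en" variants, not every model Whisper offers.

```python
import subprocess

# Illustrative subset of model names: those in the table, plus the
# English-only ".en" variants (the large model has no ".en" variant).
KNOWN_MODELS = {
    "tiny", "tiny.en", "base", "base.en",
    "small", "small.en", "medium", "medium.en", "large",
}


def build_whisper_command(audio_path, model="base.en"):
    """Build the argument list for the whisper command line tool."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"Unknown model name: {model!r}")
    return ["whisper", audio_path, "--model", model]


def transcribe_with_cli(audio_path, model="base.en"):
    """Run whisper as a subprocess; raises CalledProcessError on failure."""
    subprocess.run(build_whisper_command(audio_path, model), check=True)


if __name__ == "__main__":
    # Only print the command here; call transcribe_with_cli() to run it.
    print(" ".join(build_whisper_command("audio.wav")))
```

Passing the arguments as a list avoids shell quoting problems when a file name contains spaces.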

Whisper Python Usage

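import whisper
# Load the English-only base model (downloaded on first use)
model = whisper.load_model("base.en")
# Transcribe the audio file; the result is a dictionary
result = model.transcribe("audio.wav")
# The full transcript is stored under the "text" key
print(result["text"])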
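
For captions or subtitles, the result of model.transcribe() also contains a "segments" list whose entries carry "start", "end" and "text" values. The sketch below turns such a list into SRT caption text. The helper names are hypothetical, and the sample segments are hand-written for illustration so that the example runs without Whisper installed.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with start, end, text) as SRT caption text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up sample data; with Whisper installed, pass result["segments"]
    # from model.transcribe() instead.
    sample = [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 5.0, "text": " The link is in the description."},
    ]
    print(segments_to_srt(sample))
```

The whisper command line tool can also write caption files directly through its --output_format option, so a helper like this is mainly useful when the captions need custom processing in Python.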

[Image: OpenAI Whisper API command line base speech to text]

[Image: OpenAI Whisper API Python base speech to text]

Usage

You can use any IDE or text editor and the command line to write and run Python code. In this tutorial, the OpenAI Whisper API was used to transcribe an audio file taken from the short video for the Python Arithmetic Operators article.

Results

The base model was chosen for speed. It was very accurate, making only 4 mistakes while transcribing the 56-second audio file.

The phrase “Arithmetic Operators” was incorrectly transcribed as “At three million co-opeters”. The phrase “for your reader” was incorrectly transcribed as “the kabillano jiarida”. The phrase “the link is in the description” was incorrectly transcribed as “the link sign-in description”. The name “Ojambo” was transcribed as “jumbo” in both OjamboShop.com and OjamboServices.com.
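
The mistakes above were counted by hand. A common way to put a number on transcription accuracy is the word error rate (WER): the word-level edit distance between the expected text and the transcript, divided by the number of expected words. The sketch below is an illustrative implementation, not the method used for this article, and the function name is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # One of the mistranscribed phrases from the results.
    expected = "the link is in the description"
    transcribed = "the link sign-in description"
    print(word_error_rate(expected, transcribed))  # prints 0.5
```

For that phrase, one substitution and two deletions across six expected words give a word error rate of 0.5.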

Open Source

Python is licensed under the Python Software Foundation License. This allows commercial use, modification, distribution, and allows making derivatives proprietary.

OpenAI Whisper API is licensed under the permissive MIT License. This allows commercial use, modification, distribution, and allows making derivatives proprietary. The MIT License was drafted before software patents were recognized under US Law.

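FFmpeg is licensed under the GNU Lesser General Public License (LGPL) version 2.1 or later, with some optional components under the GNU General Public License (GPL). This allows commercial use, modification, and distribution, but modified versions of FFmpeg itself must stay under the LGPL or GPL and cannot be made proprietary.

mpv is licensed under the GNU General Public License (GPL) version 2 or later by default, and can be built under the GNU Lesser General Public License (LGPL) version 2.1 or later. This allows commercial use, modification, and distribution, but derivatives must be released under the same license terms and cannot be made proprietary.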

Conclusion:

Python makes it easy to perform speech recognition using the OpenAI Whisper API. Converting audio to text for captions or subtitles is straightforward with these open source tools.

