Creating a Voice Recognition Application with Python

Introduction
Prerequisites
Setup
Step 1: Installing Required Libraries
Step 2: Recording Audio
Step 3: Speech-to-Text Conversion
Step 4: Text Processing
Step 5: Responding to Voice Commands
Conclusion

Introduction

In this tutorial, we will learn how to create a voice recognition application using Python. Voice recognition, also known as speech recognition, enables a computer to interpret and understand human speech. By the end of this tutorial, you will be able to build a basic voice recognition application that can convert spoken words into written text and perform actions based on voice commands.

Prerequisites

Before getting started, you should have a basic understanding of Python programming. Familiarity with installing libraries using pip and working with command-line interfaces (CLI) will also be helpful.

Setup

To follow along with this tutorial, you will need to have Python installed on your system. You can download the latest version of Python from the official website (https://www.python.org/downloads/). Additionally, you may need to install a text editor or integrated development environment (IDE) to write and run Python code. Examples of popular text editors/IDEs include Visual Studio Code, PyCharm, and Atom.

Step 1: Installing Required Libraries

To begin, we need to install the Python libraries that our voice recognition application depends on. Open your command-line interface and run the following command: `pip install SpeechRecognition pyaudio`

SpeechRecognition: This library provides speech recognition functionality in Python. It is imported in code as `speech_recognition`.
pyaudio: This library provides Python bindings for PortAudio and gives SpeechRecognition access to the microphone.
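
One way to confirm that the installation worked is to check that both modules can be imported. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of either library.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# The import names differ from the pip package names:
# SpeechRecognition -> speech_recognition, PyAudio -> pyaudio.
required = ["speech_recognition", "pyaudio"]
missing = missing_modules(required)

if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

If a module is reported as missing, re-running the pip command in the same Python environment you use to run the scripts is usually the first thing to try.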

Step 2: Recording Audio

Now that we have installed the necessary libraries, we can start developing our voice recognition application. The first step is to record audio from the user’s microphone. Create a Python script named, for example, record_audio.py:

```python
import speech_recognition as sr

# Create an instance of the Recognizer class
r = sr.Recognizer()

# Open the microphone and start recording
with sr.Microphone() as source:
    # Calibrate to the ambient noise level before the user speaks
    r.adjust_for_ambient_noise(source)

    print("Listening...")

    # Record until a pause in speech is detected
    audio = r.listen(source)

# Save the recorded audio to a file
with open("audio.wav", "wb") as f:
    f.write(audio.get_wav_data())

print("Audio saved as audio.wav")
```

In the code above, we import the `speech_recognition` library and create an instance of the `Recognizer` class. We then open the microphone and call `adjust_for_ambient_noise`, which samples the background noise for about a second so that the recognizer can tell speech from silence more reliably. Because this calibration happens before "Listening..." is printed, wait for the prompt before speaking. The `listen` method then records until it detects a pause, and the recorded audio is saved as a WAV file.
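
Before moving on, it can be useful to confirm that the recording contains what you expect. The following is a minimal sketch that reads the file's properties with Python's standard `wave` module; the helper name `describe_wav` and the dictionary keys are hypothetical, chosen for illustration.

```python
import os
import wave


def describe_wav(path):
    """Return basic properties of a WAV file as a dictionary."""
    with wave.open(path, "rb") as wav_file:
        frame_rate = wav_file.getframerate()
        frame_count = wav_file.getnframes()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "frame_rate_hz": frame_rate,
            "duration_seconds": frame_count / frame_rate,
        }


# Only inspect the file if the recording step has already produced it.
if os.path.exists("audio.wav"):
    print(describe_wav("audio.wav"))
```

A duration close to zero suggests that the microphone did not pick up any speech, in which case it is worth checking the input device before continuing.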

To run the script, execute the following command in your command-line interface: `python record_audio.py`

Step 3: Speech-to-Text Conversion

With the audio recorded, we can now convert it into text using the Google Web Speech API. Replace the contents of record_audio.py with the following code, which reads the audio.wav file saved in Step 2 instead of recording from the microphone:

```python
import speech_recognition as sr

# Create an instance of the Recognizer class
r = sr.Recognizer()

# Open the audio file
with sr.AudioFile("audio.wav") as source:
    # Load audio to memory
    audio = r.record(source)

try:
    # Use Google Web Speech API for speech-to-text conversion
    text = r.recognize_google(audio)
    print("Text:", text)
except sr.UnknownValueError:
    # The speech could not be understood
    print("Unable to convert speech to text.")
except sr.RequestError as e:
    # The API could not be reached or rejected the request
    print("Error:", e)
```

In the updated code, we use the `AudioFile` class from `speech_recognition` to read the audio file recorded in the previous step. We then call the `recognize_google` method, which sends the audio to the Google Web Speech API and therefore requires an internet connection. If the call succeeds, the recognized text is printed. If the speech cannot be understood, `UnknownValueError` is raised; if the API cannot be reached, `RequestError` is raised. In both cases an error message is displayed.
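
Because recognition depends on a network request, a `RequestError` may be temporary. The following is a minimal retry sketch, assuming that pausing briefly and trying again is acceptable for your application. The helper `recognize_with_retry` and its parameters are hypothetical names for illustration, not part of the SpeechRecognition library. The recognition function and the error type are passed in as arguments so the helper can be tried out without a microphone or network connection.

```python
import time


def recognize_with_retry(recognize, audio, transient_errors,
                         attempts=3, delay_seconds=1.0):
    """Call recognize(audio), retrying when a transient error is raised.

    recognize:        a callable that takes the audio and returns text
    transient_errors: an exception class (or tuple of classes) worth retrying
    attempts:         total number of calls to make before giving up
    delay_seconds:    pause between failed attempts
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(1, attempts + 1):
        try:
            return recognize(audio)
        except transient_errors as error:
            # Re-raise the error once the final attempt has failed
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({error}); retrying...")
            time.sleep(delay_seconds)
```

With the script above, it might be called as `recognize_with_retry(r.recognize_google, audio, sr.RequestError)`. `UnknownValueError` is deliberately left out of the retry, since sending the same audio again would most likely produce the same result.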

Run the updated script using the command: python python record_audio.py

Step 4: Text Processing

Now that we have the converted text, we can perform any necessary processing or analysis on it. In this example, we will simply display the recognized text with a clearer label. Modify the record_audio.py script once again:

```python
import speech_recognition as sr

# Create an instance of the Recognizer class
r = sr.Recognizer()

# Open the audio file
with sr.AudioFile("audio.wav") as source:
    # Load audio to memory
    audio = r.record(source)

try:
    # Use Google Web Speech API for speech-to-text conversion
    text = r.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Unable to convert speech to text.")
except sr.RequestError as e:
    print("Error:", e)
```

Now, when you run the script, it will output the recognized text as `"You said: <text>"`, where `<text>` is the speech input you provided.
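
If you want to go beyond displaying the text, a common first processing step is to normalize it so that later comparisons do not depend on capitalization or punctuation. The following is a minimal sketch using only the standard library; the function names `normalize_transcript` and `tokenize` are hypothetical, and the choice to keep apostrophes is an assumption made so that contractions stay intact.

```python
import re


def normalize_transcript(text):
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    lowered = text.lower()
    # Replace everything except word characters, whitespace, and apostrophes
    without_punctuation = re.sub(r"[^\w\s']", " ", lowered)
    return " ".join(without_punctuation.split())


def tokenize(text):
    """Split normalized text into a list of words."""
    return normalize_transcript(text).split()


print(normalize_transcript("  Open the Browser, please!  "))
# prints: open the browser please
```

In the script above, this could be applied to `text` right after the call to `recognize_google`.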

Step 5: Responding to Voice Commands

To make our voice recognition application more interactive, we can define specific voice commands and associate them with actions. For this example, let’s create a simple command for opening a web browser. Modify the record_audio.py script one last time:

```python
import webbrowser

import speech_recognition as sr

# Create an instance of the Recognizer class
r = sr.Recognizer()

# Open the audio file
with sr.AudioFile("audio.wav") as source:
    # Load audio to memory
    audio = r.record(source)

try:
    # Use Google Web Speech API for speech-to-text conversion
    text = r.recognize_google(audio)
    print("You said:", text)

    # Respond to voice commands
    if "open browser" in text.lower():
        webbrowser.open("https://www.example.com")
except sr.UnknownValueError:
    print("Unable to convert speech to text.")
except sr.RequestError as e:
    print("Error:", e)
```

In the updated script, we check whether the recognized text contains the phrase "open browser" (case-insensitive). If the phrase is found, we use the `webbrowser` module, imported at the top of the script, to open the specified URL ("https://www.example.com" in this case). You can modify the command and the corresponding action according to your requirements.
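
As the number of commands grows, a chain of `if` statements becomes harder to maintain. One possible alternative, sketched here under the assumption that each command is a fixed phrase, is a dictionary that maps phrases to functions. The names `COMMANDS`, `find_command`, `open_browser`, and `say_time` are hypothetical and chosen for illustration.

```python
import webbrowser
from datetime import datetime


def open_browser():
    """Open the example URL in the default web browser."""
    webbrowser.open("https://www.example.com")


def say_time():
    """Print the current local time."""
    print("The time is", datetime.now().strftime("%H:%M"))


# Phrases must be lowercase because the recognized text is lowercased
# before matching.
COMMANDS = {
    "open browser": open_browser,
    "what time": say_time,
}


def find_command(text, commands=COMMANDS):
    """Return the action for the first phrase found in text, or None."""
    lowered = text.lower()
    for phrase, action in commands.items():
        if phrase in lowered:
            return action
    return None


# Example with a fixed string standing in for the recognized text
action = find_command("What time is it?")
if action is not None:
    action()
```

Adding a new command then only requires writing a function and adding one entry to the dictionary.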

Run the script once more to see the voice command in action: `python record_audio.py`

Congratulations! You have just created a voice recognition application with Python. You can further enhance the application by adding more voice commands and associated actions.

Conclusion

In this tutorial, we have learned how to build a voice recognition application using Python. We started by installing the necessary libraries and then recorded audio from a microphone. We converted the recorded audio into text using the Google Web Speech API and displayed the recognized text. Finally, we responded to a specific voice command by executing an associated action. This application can serve as a foundation for developing voice-enabled applications or voice assistants.

Remember to explore the libraries used in this tutorial and refer to their documentation for more advanced features and customization options.

Published: 25 May 2021