Python for Sound Processing: Using Librosa

Table of Contents

  1. Introduction
  2. Prerequisites
  3. Installation
  4. Loading and Visualizing Audio Files
  5. Basic Audio Analysis
  6. Feature Extraction
  7. Pitch and Tempo
  8. Audio Effects
  9. Conclusion

Introduction

In this tutorial, we will explore how to process and analyze sound using Python’s Librosa library. Librosa is a powerful audio analysis library that provides a simple and intuitive interface for working with audio data. By the end of this tutorial, you will be able to load and visualize audio files, perform basic audio analysis, extract useful features, analyze pitch and tempo, and apply various audio effects.

Prerequisites

To follow along with this tutorial, you should have a basic understanding of Python programming. Familiarity with concepts such as arrays and functions will be helpful. Additionally, you should have the following software installed on your computer:

  • Python (version 3.6 or higher)
  • Librosa library (can be installed using pip)

Installation

Before we begin, let’s make sure we have the Librosa library installed. Open your command prompt or terminal and run the following command to install Librosa:

```shell
pip install librosa
```

Once the installation is complete, we can start processing sound using Python and Librosa.

Loading and Visualizing Audio Files

The first step in analyzing sound is to load an audio file into our Python environment. Librosa provides functions to load various audio file formats, such as WAV, MP3, and FLAC. Let’s start by loading an audio file and visualizing its waveform.

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load audio file
audio_path = 'path/to/audio/file.wav'
audio, sr = librosa.load(audio_path)

# Visualize waveform
plt.figure(figsize=(14, 5))
librosa.display.waveshow(audio, sr=sr)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Waveform')
plt.show()
```

In the code above, we first import the necessary libraries: Librosa, Librosa's display module, and Matplotlib for visualization. We then specify the path to our audio file and use Librosa's `load` function to load the audio into memory. The `load` function returns the audio data and the sample rate (`sr`) of the audio file.

Next, we create a figure using Matplotlib and use Librosa’s `waveshow` function to plot the waveform of the audio. We set the x-axis label to represent time in seconds and the y-axis label to represent amplitude. Finally, we display the waveform using `plt.show()`.
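
Before moving on, it is often useful to check a few basic properties of the loaded signal. The snippet below is a small sketch that prints the sample rate, duration, and number of samples; note that `load` resamples to 22050 Hz by default, and passing `sr=None` keeps the file’s native rate.

```python
# Inspect the loaded signal
print(f"Sample rate: {sr} Hz")
print(f"Duration: {librosa.get_duration(y=audio, sr=sr):.2f} s")
print(f"Number of samples: {len(audio)}")

# librosa.load resamples to 22050 Hz by default;
# pass sr=None to keep the file's original sample rate instead.
audio_native, sr_native = librosa.load(audio_path, sr=None)
```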

Basic Audio Analysis

Now that we know how to load and visualize audio, let’s explore some basic audio analysis techniques. Librosa provides a range of functions to compute various audio features, such as Mel-frequency cepstral coefficients (MFCCs), spectral contrast, and chroma features. Let’s compute and visualize the MFCCs of our audio file.

```python
# Compute MFCCs
mfccs = librosa.feature.mfcc(y=audio, sr=sr)

# Visualize MFCCs
plt.figure(figsize=(10, 4))
librosa.display.specshow(mfccs, x_axis='time')
plt.colorbar(format='%+2.0f dB')
plt.xlabel('Time (s)')
plt.ylabel('MFCC')
plt.title('MFCCs')
plt.show()
```

In the code above, we use Librosa's `feature.mfcc` function to compute the MFCCs of our audio. The function takes the audio data (`audio`) and the sample rate (`sr`) as input. The output is a 2D array representing the MFCC values over time.

We then use Librosa’s `specshow` function to visualize the MFCCs. We set the x-axis to represent time in seconds, the y-axis to represent the MFCC coefficients, and add a color bar to indicate the magnitude of the coefficient values. Finally, we display the plot using `plt.show()`.
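
By default, `feature.mfcc` returns 20 coefficients; the `n_mfcc` parameter controls how many are computed. A quick sketch:

```python
# Compute 13 MFCCs instead of the default 20
mfccs_13 = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
print(mfccs_13.shape)  # (13, number of frames)
```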

Feature Extraction

Apart from MFCCs, Librosa provides several other functions for feature extraction. These features can be useful for tasks such as audio classification, music genre recognition, and audio similarity analysis. Let’s explore a few feature extraction techniques using Librosa.
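
For tasks like classification, the frame-by-frame features are often summarized into a single fixed-length vector per clip. The snippet below is a minimal sketch of one common approach: taking the mean and standard deviation of each MFCC coefficient over time.

```python
import numpy as np

# Summarize frame-level MFCCs into one fixed-length vector per clip
mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
feature_vector = np.concatenate([mfccs.mean(axis=1), mfccs.std(axis=1)])
print(feature_vector.shape)  # (26,)
```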

Spectral Contrast

Spectral contrast measures the difference in amplitude between peaks and valleys in the spectrogram. It can be used to identify regions of interest in an audio signal. Let’s compute and visualize the spectral contrast of our audio file.

```python
# Compute spectral contrast
contrast = librosa.feature.spectral_contrast(y=audio, sr=sr)

# Visualize spectral contrast
plt.figure(figsize=(10, 4))
librosa.display.specshow(contrast, x_axis='time')
plt.colorbar(format='%+2.0f dB')
plt.xlabel('Time (s)')
plt.ylabel('Frequency Band')
plt.title('Spectral Contrast')
plt.show()
```

In the above code, we use Librosa's `feature.spectral_contrast` function to compute the spectral contrast of our audio. The function takes the audio data and the sample rate as input and returns a 2D array representing the spectral contrast values over time.

We then use Librosa’s specshow function to visualize the spectral contrast as a spectrogram, similar to the MFCCs visualization. We set the x-axis labels to represent time in seconds, the y-axis labels to represent frequency bands, and add a color bar to indicate the magnitude of the spectral contrast values. Finally, we display the spectral contrast using plt.show().
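
The band layout can be adjusted with parameters such as `n_bands` and `fmin`; the values below are just example settings. The output has `n_bands + 1` rows, one per frequency sub-band.

```python
# Spectral contrast with 4 octave bands above 200 Hz (example settings)
contrast_custom = librosa.feature.spectral_contrast(y=audio, sr=sr,
                                                    n_bands=4, fmin=200.0)
print(contrast_custom.shape)  # (5, number of frames)
```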

Chroma Feature

The chroma feature represents the 12 different pitch classes of an audio signal. It can be used to analyze the harmonic content of a piece of music or detect key changes. Let’s compute and visualize the chroma feature of our audio file.

```python
# Compute chroma feature
chroma = librosa.feature.chroma_stft(y=audio, sr=sr)

# Visualize chroma feature
plt.figure(figsize=(10, 4))
librosa.display.specshow(chroma, x_axis='time', y_axis='chroma')
plt.colorbar()
plt.xlabel('Time (s)')
plt.ylabel('Pitch Class')
plt.title('Chroma Feature')
plt.show()
```

In the above code, we use Librosa's `feature.chroma_stft` function to compute the chroma feature of our audio. The function takes the audio data and the sample rate as input and returns a 2D array representing the chroma feature values over time.

We then use Librosa’s specshow function to visualize the chroma feature. This time, we set the y-axis labels to represent the 12 pitch classes, and the color bar indicates the magnitude of the chroma feature values. Finally, we display the chroma feature using plt.show().
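
As a quick illustration of how the chroma matrix can be used, the sketch below averages the chroma over time and reports the most prominent pitch class. This is only a rough indicator of tonal content, not a full key-detection method.

```python
# Rows of the chroma matrix correspond to pitch classes C, C#, ..., B
pitch_classes = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
mean_chroma = chroma.mean(axis=1)
print(f"Most prominent pitch class: {pitch_classes[mean_chroma.argmax()]}")
```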

Pitch and Tempo

Librosa provides functions to estimate pitch and tempo from audio signals. Pitch refers to the perceived frequency of a musical note, while tempo represents the speed or pace of a piece of music. Let’s explore how to estimate the pitch and tempo of our audio file using Librosa.

Pitch Estimation

Pitch estimation can be performed by analyzing the harmonic content of an audio signal. Librosa offers several pitch estimators, including `yin` and `piptrack`; here we use `piptrack`, which tracks pitch candidates across the spectrogram. Let’s estimate the pitch of our audio file.

```python
import numpy as np

# Estimate pitch candidates and their magnitudes
pitches, magnitudes = librosa.piptrack(y=audio, sr=sr)

# Keep the candidate with the highest magnitude in each frame
pitch = pitches[magnitudes.argmax(axis=0), np.arange(pitches.shape[1])]

# Visualize pitch
plt.figure(figsize=(10, 4))
plt.plot(pitch)
plt.xlabel('Frames')
plt.ylabel('Pitch (Hz)')
plt.title('Pitch Estimation')
plt.show()
```

In the above code, we use Librosa's `piptrack` function to estimate the pitch of our audio. The function takes the audio data and the sample rate as input and returns two 2D arrays: pitch candidates and their magnitudes for each frequency bin and frame. We then keep the candidate with the largest magnitude in each frame to obtain a single pitch track.

We then use Matplotlib’s `plot` function to visualize the pitch estimates over time. The x-axis represents the frame index, and the y-axis represents the pitch in Hz. Finally, we display the pitch estimation using `plt.show()`.
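
If you prefer a single fundamental-frequency (f0) estimate per frame, Librosa’s `yin` function is an alternative; the sketch below assumes a search range of roughly C2 to C7.

```python
# Estimate f0 per frame with yin (assumed C2-C7 search range)
f0 = librosa.yin(audio, fmin=librosa.note_to_hz('C2'),
                 fmax=librosa.note_to_hz('C7'), sr=sr)
times = librosa.times_like(f0, sr=sr)

plt.figure(figsize=(10, 4))
plt.plot(times, f0)
plt.xlabel('Time (s)')
plt.ylabel('F0 (Hz)')
plt.title('Pitch Estimation with yin')
plt.show()
```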

Tempo Estimation

Tempo estimation is the process of determining the tempo or beats-per-minute (BPM) of an audio signal. Librosa’s beat module provides functions to estimate the tempo of a music recording. Let’s estimate the tempo of our audio file.

```python
# Estimate tempo
tempo, beat_frames = librosa.beat.beat_track(y=audio, sr=sr)

print(f"The estimated tempo is {tempo} BPM.")
```

In the above code, we use Librosa's `beat.beat_track` function to estimate the tempo of our audio. The function takes the audio data and the sample rate as input and returns the estimated tempo along with the frame indices of the detected beats.

We then print the estimated tempo in BPM to the console. You can further process the tempo value or use it for various tempo-based applications.
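
The detected beat frames can also be converted to timestamps in seconds, which is handy for aligning other analyses or visualizations to the beat. A brief sketch:

```python
# Convert beat frame indices to timestamps in seconds
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
print(f"First few beat times (s): {beat_times[:5]}")
```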

Audio Effects

Apart from analysis, Librosa also provides functions to manipulate audio signals and apply audio effects. Let’s explore a few audio effects that can be applied using Librosa.

Time Stretching

Time stretching is a technique that changes the duration of an audio signal without affecting its pitch. Librosa’s `effects.time_stretch` function can be used to time stretch an audio signal. Let’s apply time stretching to our audio file.

```python
import soundfile as sf

# Apply time stretching (2x speed)
audio_stretched = librosa.effects.time_stretch(audio, rate=2.0)

# Save stretched audio to file
output_path = 'path/to/stretched/file.wav'
sf.write(output_path, audio_stretched, sr)
```

In the above code, we use Librosa's `effects.time_stretch` function to time stretch our audio by a factor of 2.0, effectively doubling its speed and halving its duration. The function takes the audio data and the stretch `rate` as input and returns the time-stretched audio.

We then use the soundfile library’s `write` function to save the time-stretched audio to a file (the older `librosa.output.write_wav` has been removed from recent Librosa releases). The function takes the output file path, the audio data, and the sample rate as input.
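
A common variation is stretching a clip to a specific target length. The sketch below assumes an example 30-second target and a hypothetical output path, deriving the stretch rate from the current duration.

```python
# Stretch the clip to an assumed 30-second target duration
current_duration = librosa.get_duration(y=audio, sr=sr)
target_duration = 30.0  # assumed target length in seconds
audio_fitted = librosa.effects.time_stretch(audio, rate=current_duration / target_duration)
sf.write('path/to/fitted/file.wav', audio_fitted, sr)  # hypothetical output path
```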

Pitch Shifting

Pitch shifting is a technique that changes the pitch of an audio signal without affecting its duration. Librosa’s `effects.pitch_shift` function can be used to pitch shift an audio signal. Let’s apply pitch shifting to our audio file.

```python
# Apply pitch shifting (+5 semitones)
audio_pitch_shifted = librosa.effects.pitch_shift(audio, sr=sr, n_steps=5)

# Save pitch-shifted audio to file
output_path = 'path/to/pitch-shifted/file.wav'
sf.write(output_path, audio_pitch_shifted, sr)
```

In the above code, we use Librosa's `effects.pitch_shift` function to pitch shift our audio up by 5 semitones. The function takes the audio data, the sample rate, and the number of semitones to shift (`n_steps`) as input and returns the pitch-shifted audio.

We then use soundfile’s `write` function to save the pitch-shifted audio to a file, just as in the time stretching example.

Conclusion

In this tutorial, we have learned how to process and analyze sound using Python’s Librosa library. We started by loading and visualizing audio files, then explored basic audio analysis techniques such as computing MFCCs, spectral contrast, and chroma features. We also learned how to estimate pitch and tempo from audio signals. Finally, we applied audio effects such as time stretching and pitch shifting.

Librosa provides a wide range of functionality for sound processing and analysis. With the knowledge gained from this tutorial, you can now start exploring more advanced audio processing techniques and apply them to various real-world applications.

Remember to refer to the official Librosa documentation for additional details, functions, and examples. Happy sound processing with Python and Librosa!