Drishti Sharma — January 21, 2022

This article was published as a part of the Data Science Blogathon.

In this article, we analyze the zero-crossing rates (ZCRs) of tracks from different music genres. This post is inspired by Valerio Velardo’s work. I highly encourage you to check out his YouTube channel for his outstanding work in the field of ML/DL for audio.

Tools used

  • Python
  • Librosa (librosa.feature.zero_crossing_rate)
  • One 30-second audio clip from each of 10 distinct music genres (Classical, Blues, Reggae, Rock, Jazz, Pop, Hip-hop, Country, Disco, and Metal) from the GTZAN dataset

Introduction

Zero-Crossing: A zero-crossing is an instantaneous point at which the sign of a mathematical function changes (e.g. from positive to negative). It is represented by an intercept of the axis (zero value) in the graph of the function. 
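
To make the definition concrete, here is a minimal sketch that lists the positions where consecutive samples change sign. The helper name `zero_crossing_indices` and the sample values are illustrative assumptions, not part of librosa, and treating an exact zero as non-negative is just one possible convention.

```python
def zero_crossing_indices(samples):
    """Return indices i where samples[i-1] and samples[i] have opposite signs.

    A value of exactly zero is treated as non-negative here; this is an
    assumption made for illustration.
    """
    crossings = []
    for i in range(1, len(samples)):
        prev_negative = samples[i - 1] < 0
        curr_negative = samples[i] < 0
        if prev_negative != curr_negative:
            crossings.append(i)
    return crossings


# Hypothetical toy signal: positive, then negative, then positive again.
signal = [0.5, 0.2, -0.1, -0.4, 0.3, 0.6]
print(zero_crossing_indices(signal))  # [2, 4]
```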

 

Figure: A zero-crossing in a line graph of a waveform representing voltage over time

 

Zero-Crossing Rate: The zero-crossing rate (ZCR) is the rate at which a signal transitions from positive to zero to negative or negative to zero to positive. Its value has been extensively used in both speech recognition and music information retrieval for classifying percussive sounds.

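ZCR is defined as (following the Wikipedia definition listed in the references):

$$\mathrm{zcr} = \frac{1}{T-1} \sum_{t=1}^{T-1} \mathbb{1}_{\mathbb{R}_{<0}}\left(s_t \, s_{t-1}\right)$$

where s is a signal of length T and the indicator function equals 1 when its argument is negative and 0 otherwise.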
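
In words, the ZCR of a signal is the fraction of consecutive sample pairs whose product is negative. The sketch below computes that quantity directly with NumPy. The function name `zcr` is an assumption made for this example; librosa's own function works frame by frame and may treat zeros and frame edges differently, so its values need not match exactly.

```python
import numpy as np


def zcr(signal):
    """Fraction of consecutive sample pairs whose product is negative."""
    s = np.asarray(signal, dtype=float)
    if s.size < 2:
        return 0.0
    # s[1:] * s[:-1] < 0 is True exactly where the sign flips between neighbours.
    return float(np.mean(s[1:] * s[:-1] < 0))


# Hypothetical examples: an alternating signal flips sign at every step,
# while a constant signal never does.
print(zcr([1, -1, 1, -1, 1]))  # 1.0
print(zcr([1, 1, 1, 1, 1]))    # 0.0
```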

The zero-crossing rate can be utilized as a basic pitch detection algorithm for monophonic tonal signals. Voice activity detection (VAD), which determines whether or not human speech is present in an audio segment, also makes use of zero-crossing rates.
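
The pitch-detection idea rests on the fact that a pure tone of frequency f crosses zero about 2f times per second. The sketch below applies this to a synthetic 440 Hz sine. The function name, the test tone, and the small phase offset are assumptions chosen for illustration; on real recordings, harmonics and noise add extra crossings, so this is only a rough estimate.

```python
import numpy as np


def estimate_pitch_from_zcr(signal, sample_rate):
    """Estimate the frequency of a clean monophonic tone from its zero crossings."""
    s = np.asarray(signal, dtype=float)
    # Count the positions where the sign bit differs between neighbouring samples.
    crossings = np.count_nonzero(np.signbit(s[1:]) != np.signbit(s[:-1]))
    duration = s.size / sample_rate
    # A sinusoid crosses zero twice per cycle.
    return crossings / (2.0 * duration)


sr = 22050                      # same default rate that librosa.load uses
t = np.arange(sr) / sr          # one second of sample times
tone = np.sin(2 * np.pi * 440.0 * t + 0.1)   # hypothetical 440 Hz test tone
print(estimate_pitch_from_zcr(tone, sr))     # close to 440 Hz
```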

Now, let’s take a closer look at it using the librosa library. To begin, we will import all of the required libraries and load the audio files from different music genres with the help of librosa.

A Brief Analysis using Librosa

#Importing all the necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display
import IPython.display as ipd
%matplotlib inline
#Specifying the path to audio files
classical_music_file = "/content/drive/MyDrive/trytheseaudios/classical.00000.wav"
blues_music_file = "/content/drive/MyDrive/trytheseaudios/blues.00000.wav"
reggae_music_file = "/content/drive/MyDrive/trytheseaudios/reggae.00000.wav"
rock_music_file = "/content/drive/MyDrive/trytheseaudios/rock.00000.wav"
jazz_music_file = "/content/drive/MyDrive/trytheseaudios/jazz.00000.wav"
country_music_file = "/content/drive/MyDrive/trytheseaudios/country.00000.wav"
disco_music_file = "/content/drive/MyDrive/trytheseaudios/disco.00000.wav"
hiphop_music_file = "/content/drive/MyDrive/trytheseaudios/hiphop.00000.wav"
metal_music_file = "/content/drive/MyDrive/trytheseaudios/metal.00000.wav"
pop_music_file = "/content/drive/MyDrive/trytheseaudios/pop.00000.wav"
# load audio files with librosa
classical, sr = librosa.load(classical_music_file, duration=30)
blues, _ = librosa.load(blues_music_file, duration=30)
reggae, _ = librosa.load(reggae_music_file, duration=30)
rock, _ = librosa.load(rock_music_file, duration=30)
jazz, _ = librosa.load(jazz_music_file, duration=30)
country, _ = librosa.load(country_music_file, duration=30)
disco, _ = librosa.load(disco_music_file, duration=30)
hiphop, _ = librosa.load(hiphop_music_file, duration=30)
metal, _ = librosa.load(metal_music_file, duration=30)
pop, _ = librosa.load(pop_music_file, duration=30)
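
As an alternative to ten near-identical assignments, the paths and clips could be kept in dictionaries keyed by genre. This is only a sketch: it assumes the same `<genre>.00000.wav` naming and Drive folder as above, and `build_paths` and `load_clips` are hypothetical helper names.

```python
import os

GENRES = ["classical", "blues", "reggae", "rock", "jazz",
          "country", "disco", "hiphop", "metal", "pop"]


def build_paths(base_dir, genres=GENRES):
    """Map each genre name to its expected file path (assumed naming scheme)."""
    return {g: os.path.join(base_dir, f"{g}.00000.wav") for g in genres}


def load_clips(paths, duration=30):
    """Load every clip with librosa; returns ({genre: samples}, sample_rate)."""
    import librosa  # imported here so build_paths can be used without librosa

    clips, sr = {}, None
    for genre, path in paths.items():
        clips[genre], sr = librosa.load(path, duration=duration)
    return clips, sr


paths = build_paths("/content/drive/MyDrive/trytheseaudios")
print(paths["jazz"])
# clips, sr = load_clips(paths)   # requires the audio files to be present
```

With a dictionary such as `clips`, later steps can loop over `clips.items()` instead of repeating one line per genre.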

Following that, we will evaluate and compare the lowest and highest instantaneous ZCR values, as well as the lowest and highest average ZCR values of various music genre samples.

#Determining the music genre with the lowest instantaneous value of ZCR
min([librosa.feature.zero_crossing_rate(classical).min(), librosa.feature.zero_crossing_rate(blues).min(), librosa.feature.zero_crossing_rate(reggae).min(), librosa.feature.zero_crossing_rate(rock).min(), librosa.feature.zero_crossing_rate(jazz).min(), librosa.feature.zero_crossing_rate(country).min(), librosa.feature.zero_crossing_rate(disco).min(), librosa.feature.zero_crossing_rate(hiphop).min(), librosa.feature.zero_crossing_rate(metal).min(), librosa.feature.zero_crossing_rate(pop).min()])

Output: 0.00585, which belongs to the jazz track.

#Determining the music genre with the highest instantaneous value of ZCR
max([librosa.feature.zero_crossing_rate(classical).max(), librosa.feature.zero_crossing_rate(blues).max(), librosa.feature.zero_crossing_rate(reggae).max(), librosa.feature.zero_crossing_rate(rock).max(), librosa.feature.zero_crossing_rate(jazz).max(), librosa.feature.zero_crossing_rate(country).max(), librosa.feature.zero_crossing_rate(disco).max(), librosa.feature.zero_crossing_rate(hiphop).max(), librosa.feature.zero_crossing_rate(metal).max(), librosa.feature.zero_crossing_rate(pop).max()])

Output: 0.67675, which belongs to the pop track.

#Determining the music genre with the LOWEST AVERAGE value of ZCR
min([librosa.feature.zero_crossing_rate(classical).mean(), librosa.feature.zero_crossing_rate(blues).mean(), librosa.feature.zero_crossing_rate(reggae).mean(), librosa.feature.zero_crossing_rate(rock).mean(), librosa.feature.zero_crossing_rate(jazz).mean(), librosa.feature.zero_crossing_rate(country).mean(), librosa.feature.zero_crossing_rate(disco).mean(), librosa.feature.zero_crossing_rate(hiphop).mean(), librosa.feature.zero_crossing_rate(metal).mean(), librosa.feature.zero_crossing_rate(pop).mean()])

Output: 0.07846, which belongs to the jazz track.

#Determining the music genre with the HIGHEST AVERAGE value of ZCR
max([librosa.feature.zero_crossing_rate(classical).mean(), librosa.feature.zero_crossing_rate(blues).mean(), librosa.feature.zero_crossing_rate(reggae).mean(), librosa.feature.zero_crossing_rate(rock).mean(), librosa.feature.zero_crossing_rate(jazz).mean(), librosa.feature.zero_crossing_rate(country).mean(), librosa.feature.zero_crossing_rate(disco).mean(), librosa.feature.zero_crossing_rate(hiphop).mean(), librosa.feature.zero_crossing_rate(metal).mean(), librosa.feature.zero_crossing_rate(pop).mean()])

Output: 0.18307, which belongs to the metal track.

On further investigation, the classical sample was also found to have a low ZCR.

print(f"Minimum Instantaneous ZCR for Classical Genre song:{librosa.feature.zero_crossing_rate(classical).min()}, Maximum Instantaneous ZCR for Classical Genre song:{librosa.feature.zero_crossing_rate(classical).max()}, Average ZCR for Classical Genre song: {librosa.feature.zero_crossing_rate(classical).mean()}")

Output: Minimum Instantaneous ZCR for Classical Genre song:0.02685, Maximum Instantaneous ZCR for Classical Genre song:0.1767, Average ZCR for Classical Genre song: 0.0982

#Determining Minimum instantaneous, Maximum instantaneous and average ZCR for pop music genre track
print(f"Minimum Instantaneous ZCR for Pop Genre song:{librosa.feature.zero_crossing_rate(pop).min()}, Maximum Instantaneous ZCR for Pop Genre song:{librosa.feature.zero_crossing_rate(pop).max()}, Average ZCR for Pop Genre song: {librosa.feature.zero_crossing_rate(pop).mean()}")

Output: Minimum Instantaneous ZCR for Pop Genre song:0.00683, Maximum Instantaneous ZCR for Pop Genre song:0.6767, Average ZCR for Pop Genre song: 0.12676

Observation: According to the results, the jazz track has both the lowest instantaneous and the lowest average ZCR. Further analysis revealed that the classical track also has low ZCR values (average 0.0982). As a result, we can’t generalize and declare that jazz songs have the lowest ZCR, because the value depends on the composition of the individual song. At the other end, the metal track has the highest average ZCR (0.18307), and the pop track has the highest instantaneous ZCR (0.6767).
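
The `min([...])` and `max([...])` calls above return only a number, so the matching genre has to be looked up by hand. One way to get the name directly is to keep the statistics in a dictionary and pass a `key` function to `min` and `max`. The sketch below uses two synthetic tones as stand-ins for the genre clips and a simplified NumPy frame-wise ZCR (`frame_zcr`, a hypothetical helper with no centering or padding), so its numbers will not match librosa's exactly.

```python
import numpy as np


def frame_zcr(signal, frame_length=2048, hop_length=512):
    """Per-frame fraction of sign changes (simplified: no centering or padding)."""
    s = np.asarray(signal, dtype=float)
    flips = np.signbit(s[1:]) != np.signbit(s[:-1])
    starts = range(0, max(len(flips) - frame_length + 1, 1), hop_length)
    return np.array([flips[i:i + frame_length].mean() for i in starts])


def summarize(clips):
    """Return {name: (min, max, mean)} of the frame-wise ZCR for each clip."""
    stats = {}
    for name, samples in clips.items():
        z = frame_zcr(samples)
        stats[name] = (z.min(), z.max(), z.mean())
    return stats


# Hypothetical stand-ins for the genre clips: two sine tones of different pitch.
sr = 22050
t = np.arange(2 * sr) / sr
clips = {"low_tone": np.sin(2 * np.pi * 110 * t),
         "high_tone": np.sin(2 * np.pi * 1760 * t)}

stats = summarize(clips)
lowest = min(stats, key=lambda name: stats[name][2])    # index 2 is the mean
highest = max(stats, key=lambda name: stats[name][2])
print(lowest, highest)  # low_tone high_tone
```

Computing the statistics once per clip also avoids calling the feature extractor repeatedly on the same audio.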

Demystification via Visualization

Let us now demystify a little more with the help of visuals. In this regard, we will first use Librosa to extract the zero-crossing rate for each music genre track, and then plot the normalized ZCR for each music genre, followed by the actual (non-normalized) ZCR for each music genre.

#Specifying frame size and hop length
FRAME_SIZE = 1024
HOP_LENGTH = 512
#Extracting zero crossing rate for each music genre song using Librosa 
zcr_classical = librosa.feature.zero_crossing_rate(classical, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_blues = librosa.feature.zero_crossing_rate(blues, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_reggae = librosa.feature.zero_crossing_rate(reggae, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_rock = librosa.feature.zero_crossing_rate(rock, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_jazz = librosa.feature.zero_crossing_rate(jazz, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_country = librosa.feature.zero_crossing_rate(country, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_disco = librosa.feature.zero_crossing_rate(disco, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_hiphop = librosa.feature.zero_crossing_rate(hiphop, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_metal = librosa.feature.zero_crossing_rate(metal, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
zcr_pop = librosa.feature.zero_crossing_rate(pop, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
frames = range(len(zcr_classical))
t = librosa.frames_to_time(frames, hop_length=HOP_LENGTH)
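#Visualizing normalized Zero-crossing rate (ZCR) of different music genre songs
#Note: librosa.display.waveplot was removed in librosa 0.10; on newer versions use librosa.display.waveshow instead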
plt.figure(figsize=(20, 20))
ax = plt.subplot(5, 2, 1)
librosa.display.waveplot(classical, alpha=0.5)
plt.plot(t, zcr_classical, color="b")
plt.ylim((-1, 1))
plt.title("Classical Music Genre song")
plt.subplot(5, 2, 2)
librosa.display.waveplot(blues, alpha=0.5)
plt.plot(t, zcr_blues, color="g")
plt.ylim((-1, 1))
plt.title("Blues Music Genre song")
plt.subplot(5, 2, 3)
librosa.display.waveplot(reggae, alpha=0.5)
plt.plot(t, zcr_reggae, color="k")
plt.ylim((-1, 1))
plt.title("Reggae Music Genre Song")
plt.subplot(5, 2, 4)
librosa.display.waveplot(rock, alpha=0.5)
plt.plot(t, zcr_rock, color="#E9967A")
plt.ylim((-1, 1))
plt.title("Rock Music Genre song")
plt.subplot(5, 2, 5)
librosa.display.waveplot(jazz, alpha=0.5)
plt.plot(t, zcr_jazz, color="m")
plt.ylim((-1, 1))
plt.title("Jazz Music Genre song")
plt.subplot(5, 2, 6)
librosa.display.waveplot(country, alpha=0.5)
plt.plot(t, zcr_country, color="y")
plt.ylim((-1, 1))
plt.title("Country Music Genre song")
plt.subplot(5, 2, 7)
librosa.display.waveplot(disco, alpha=0.5)
plt.plot(t, zcr_disco, color="r")
plt.ylim((-1, 1))
plt.title("Disco Music Genre song")
plt.subplot(5, 2, 8)
librosa.display.waveplot(hiphop, alpha=0.5)
plt.plot(t, zcr_hiphop, color="#7FFF00")
plt.ylim((-1, 1))
plt.title("Hiphop Music Genre song")
plt.subplot(5, 2, 9)
librosa.display.waveplot(metal, alpha=0.5)
plt.plot(t, zcr_metal, color="#FFB90F")
plt.ylim((-1, 1))
plt.title("Metal Music Genre song")
plt.subplot(5, 2, 10)
librosa.display.waveplot(pop, alpha=0.5)
plt.plot(t, zcr_pop, color="#458B00")
plt.ylim((-1, 1))
plt.title("Pop Music Genre song")
plt.subplots_adjust(hspace = 0.75)

 

Figure: Waveplots illustrating the zero-crossing rates of various music genre tracks
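
The time axis `t` in the plotting code comes from converting frame indices to seconds: frame i corresponds to time i * hop_length / sr. The sketch below reproduces that arithmetic in plain NumPy, assuming the 22050 Hz rate that `librosa.load` uses by default; `frames_to_seconds` is a hypothetical helper name, not a librosa function.

```python
import numpy as np

SR = 22050          # default sampling rate of librosa.load
HOP_LENGTH = 512    # same hop length as in the article


def frames_to_seconds(frame_indices, sr=SR, hop_length=HOP_LENGTH):
    """Convert frame indices to timestamps in seconds."""
    return np.asarray(frame_indices) * hop_length / sr


print(frames_to_seconds([0, 1, 2]))
```

Each hop therefore advances the axis by roughly 23 ms. Note that `t` is built from the classical clip only, so if another clip were shorter than 30 seconds, its ZCR array would have fewer frames than `t` and would need its own time axis.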

#Visualizing NORMALIZED Zero-crossing rates of different music genre tracks

plt.figure(figsize=(25, 25))
plt.plot(t, zcr_classical, color="b")
plt.plot(t, zcr_blues, color="g")
plt.plot(t, zcr_reggae, color="k")
plt.plot(t, zcr_rock, color="#E9967A")
plt.plot(t, zcr_jazz, color="m")
plt.plot(t, zcr_country, color="y")
plt.plot(t, zcr_disco, color="r")
plt.plot(t, zcr_hiphop, color="#7FFF00")
plt.plot(t, zcr_metal, color="#FFB90F")
plt.plot(t, zcr_pop, color="#458B00")
plt.ylim(0, 1)

 

Figure: Graph depicting the NORMALIZED zero-crossing rates of different music genre tracks

#Visualizing ACTUAL (NON-NORMALIZED) Zero-crossing rate of different music genre tracks

plt.figure(figsize=(25, 25))
plt.plot(t, zcr_classical*FRAME_SIZE, color="b")
plt.plot(t, zcr_blues*FRAME_SIZE, color="g")
plt.plot(t, zcr_reggae*FRAME_SIZE, color="k")
plt.plot(t, zcr_rock*FRAME_SIZE, color="#E9967A")
plt.plot(t, zcr_jazz*FRAME_SIZE, color="m")
plt.plot(t, zcr_country*FRAME_SIZE, color="y")
plt.plot(t, zcr_disco*FRAME_SIZE, color="r")
plt.plot(t, zcr_hiphop*FRAME_SIZE, color="#7FFF00")
plt.plot(t, zcr_metal*FRAME_SIZE, color="#FFB90F")
plt.plot(t, zcr_pop*FRAME_SIZE, color="#458B00")
plt.ylim(0, 600)

 

Figure: Graph depicting the ACTUAL (NON-NORMALIZED) zero-crossing rates of different music genre tracks
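
The normalized value is a fraction of the samples in a frame, which is why multiplying by `FRAME_SIZE` recovers a count per frame. The short sketch below works through that arithmetic and, as an extra, converts the same fraction to crossings per second. The rate of 0.25 is a made-up value and both helper names are assumptions for this example.

```python
FRAME_SIZE = 1024   # frame length used in the article
SR = 22050          # default sampling rate of librosa.load


def crossings_per_frame(rate, frame_size=FRAME_SIZE):
    """Number of zero crossings in one frame, given the normalized rate."""
    return rate * frame_size


def crossings_per_second(rate, sr=SR):
    """Zero crossings per second implied by the normalized rate."""
    return rate * sr


rate = 0.25  # hypothetical normalized ZCR for a single frame
print(crossings_per_frame(rate))   # 256.0
print(crossings_per_second(rate))  # 5512.5
```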

Conclusion

Upon numerical and visual inspection, the jazz and classical samples have low ZCR values, while the pop and metal samples have high ZCR values. However, with only one clip per genre, we cannot extrapolate these findings to entire genres. What the analysis does offer is a concise summary, a form of intuition, about how ZCR differs across types of music.

Thanks for reading. If you have any questions or concerns, please leave them in the comments section below. Happy Learning!


Link to GitHub repo: Click here!

References:

1. https://www.youtube.com/watch?v=EycaSbIRx-0&t=1352s

2. https://en.wikipedia.org/wiki/Zero_crossing

3. https://en.wikipedia.org/wiki/Zero-crossing_rate

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
