Exploring the World of Music Generation with AI

Akshit Behera 16 Aug, 2023 • 15 min read

Introduction

AI-based music generation has become an active area of research and practice, changing the way music is produced and enjoyed. This project introduces the concept and purpose of using artificial intelligence in music creation. We explore how music can be generated with AI algorithms and the potential this holds.

Our project focuses on understanding and implementing AI techniques for music composition. An AI model learns from a large collection of music, using statistical methods to capture its patterns, rhythms, and structures, and then produces new pieces based on what it has learned. By training models on musical data, we enable AI systems to produce new, original compositions. We will also examine recent developments in AI-generated music, in particular MusicGen by Meta.

By exploring the scope of AI in music generation, the objective of this project is to inspire musicians, researchers, and music enthusiasts to explore the possibilities of this innovative technology. Together, let us embark on this musical expedition and uncover the melodies AI can generate.

Learning Objectives

By working on this project, we stand to gain new technical skills and an understanding of how AI algorithms can be implemented to build innovative applications. By the end of this project, we will:

  1. Gain an understanding of how artificial intelligence is employed in creating music. We will learn the fundamental concepts and techniques used to train AI models for music composition.
  2. Learn how to collect and prepare relevant musical data for AI model training. We will discover how to gather .mp3 files and convert them into MIDI files, utilizing tools such as Spotify’s Basic Pitch.
  3. Understand the steps involved in building an AI model for music generation. We will look at a model architecture suited to this task and gain hands-on experience in training the model, including choosing the number of epochs and the batch size.
  4. Discover methods to evaluate the performance of the trained model. We will learn how to analyze metrics and assess the quality of generated music pieces to gauge the model’s effectiveness and identify areas for improvement.
  5. Explore the process of using the trained AI model to generate new musical compositions.

This article was published as a part of the Data Science Blogathon.

Project Description

The purpose of this project is to explore the intriguing domain of music generation using AI. We aim to investigate how artificial intelligence techniques create unique musical pieces. By leveraging machine learning algorithms, our objective is to train an AI model capable of producing melodies and harmonies across various musical genres.

The project’s focus is on gathering a diverse range of musical data, specifically .mp3 files, which will serve as the foundation for training the AI model. These files will undergo preprocessing to convert them into MIDI format using specialized tools like Spotify’s Basic Pitch. This conversion is essential as MIDI files provide a structured representation of musical elements that the AI model can easily interpret.

The subsequent phase involves building an AI model tailored for music generation. We train the model on the prepared MIDI data, aiming to capture the underlying patterns and structures present in the music.

We then evaluate the model’s performance. This involves generating music samples and assessing their quality to refine the process and enhance the model’s ability to produce creative music.

The final outcome of this project will be the ability to generate original compositions using the trained AI model. These compositions can be further refined through post-processing techniques to enrich their musicality and coherence.

Problem Statement

The project endeavours to tackle the issue of limited accessibility to music composition tools. Traditional methods of music creation can be laborious and demand specialized knowledge. Moreover, generating fresh and distinct musical concepts can pose a formidable challenge. The aim of this project is to employ artificial intelligence to circumvent these obstacles and offer a seamless solution for music generation, even for non-musicians. Through the development of an AI model with the capability to compose melodies and harmonies, the project aims to democratize the process of music creation, empowering musicians, hobbyists, and novices to unleash their creative potential and craft unique compositions with ease.

A Brief History of Music Generation Using AI

The story of AI in music goes back to the 1950s, when the Illiac Suite for String Quartet became the first piece composed with a computer’s help. However, it is only in the last few years that AI has really started to shine in this area. Today, AI can generate music in many genres, from classical to pop, and can even imitate the style of famous musicians.

AI music generation has advanced considerably in recent years. Recently, Meta released an AI-powered music generator called MusicGen. Built on a Transformer model, MusicGen predicts the next section of a piece of music in much the same way that a language model predicts the next tokens in a sentence. It uses an audio tokenizer called EnCodec to break audio data down into smaller units for easier processing.

One of MusicGen’s distinctive features is its ability to handle text descriptions and melodic cues at the same time, so the output can follow both a written prompt and a reference tune. It was trained on a large dataset of 20,000 hours of licensed music, which helps it produce music that connects with listeners. Further, models such as OpenAI’s MuseNet and Jukin Media’s Jukin Composer can generate music in a wide range of styles and genres. AI-generated music can now come close to music made by humans, which makes it a powerful tool in the music world.

Ethical Considerations

Discussing the ethical aspects of AI-generated music is crucial when exploring this field. One pertinent area of concern involves potential copyright and intellectual property infringements. AI models are trained on extensive musical datasets, which could result in generated compositions bearing similarities to existing works. It is vital to respect copyright laws and attribute original artists appropriately to uphold fair practices.

Moreover, the advent of AI-generated music may disrupt the music industry, posing challenges for musicians seeking recognition in a landscape inundated with AI compositions. Striking a balance between utilizing AI as a creative tool and safeguarding the artistic individuality of human musicians is an essential consideration.

Data Collection & Preparation

For the purpose of this project, we will try to generate some original instrumental music using AI. Personally, I am a big fan of renowned instrumental music channels like Fluidified, MusicLabChill, and FilFar on YouTube, which have excellent tracks for all kinds of moods. Taking inspiration from these channels, we will attempt to generate music along similar lines, which we will finally share on YouTube.

To assemble the necessary data for our project, we focus on sourcing the relevant .mp3 files that align with our desired musical style. Through extensive exploration of online platforms and websites, we discover legal and freely available instrumental music tracks. These tracks serve as invaluable assets for our dataset, encompassing a diverse assortment of melodies and harmonies to enrich the training process of our model.

Once we have successfully acquired the desired .mp3 files, we proceed to transform them into MIDI files. MIDI files represent musical compositions in a digital format, enabling efficient analysis and generation by our models. For this conversion, we rely on the practical and user-friendly functionality provided by Spotify’s Basic Pitch.

With the assistance of Spotify’s Basic Pitch, we upload the acquired .mp3 files, initiating the transformation process. The tool harnesses advanced algorithms to decipher the audio content, extracting crucial musical elements such as notes and structures to generate corresponding MIDI files. These MIDI files serve as the cornerstone of our music generation models, empowering us to manipulate and produce fresh, innovative compositions.
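The conversion above is done through Basic Pitch's upload interface. For a larger collection, the same step might be scripted with the `basic-pitch` Python package. The sketch below assumes that package is installed and that its `predict` function returns a MIDI object with a `write` method; the folder names and the helpers `midi_path_for` and `convert_folder` are hypothetical and not part of the original project.

```python
from pathlib import Path


def midi_path_for(mp3_path, midi_dir):
    """Return the output path: same file stem, '.mid' extension, inside midi_dir."""
    return Path(midi_dir) / (Path(mp3_path).stem + ".mid")


def convert_folder(mp3_dir, midi_dir):
    """Convert every .mp3 file in mp3_dir to a MIDI file in midi_dir."""
    # Imported lazily so the path helper above works without basic-pitch installed.
    from basic_pitch.inference import predict

    Path(midi_dir).mkdir(parents=True, exist_ok=True)
    for mp3_path in sorted(Path(mp3_dir).glob("*.mp3")):
        # Assumption: predict returns (model_output, midi_data, note_events)
        _, midi_data, _ = predict(str(mp3_path))
        midi_data.write(str(midi_path_for(mp3_path, midi_dir)))


# Example call with hypothetical folders:
# convert_folder("/content/Mp3 Files", "/content/Midi Files")
```

Keeping the path logic in its own function makes it easy to check where each file will land before running the slower transcription step.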

Model Architecture

To develop our music generation model, we utilize a specialized architecture tailored specifically for this purpose. The chosen architecture comprises two LSTM (Long Short-Term Memory) layers, each consisting of 256 units. LSTM, a type of recurrent neural network (RNN), excels in handling sequential data, making it an excellent choice for generating music with its inherent temporal characteristics.

The first LSTM layer processes input sequences with a fixed length of 100, as determined by the sequence_length variable. By returning sequences, this layer effectively preserves the temporal relationships present in the musical data. To prevent overfitting and improve the model’s adaptability to new data, a dropout layer with a dropout rate of 0.3 is incorporated.

The second LSTM layer, which does not return sequences, receives the outputs from the previous layer and further learns intricate patterns within the music. Finally, a dense layer with a softmax activation function generates output probabilities for the subsequent note.
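To get a feel for the size of this architecture, the number of trainable weights can be estimated by hand. The sketch below assumes the standard LSTM parameterization (four gates, each with input weights, recurrent weights, and a bias); `vocab_size` is a hypothetical value, since the real one depends on the dataset. The dropout layer adds no weights.

```python
def lstm_params(units, input_dim):
    """Trainable weights in a standard LSTM layer: four gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs


vocab_size = 2000  # hypothetical number of unique notes/chords

first_lstm = lstm_params(256, input_dim=1)     # one feature per time step
second_lstm = lstm_params(256, input_dim=256)  # fed by the first LSTM's outputs
output_layer = dense_params(256, vocab_size)

print(first_lstm, second_lstm, output_layer)
print("total:", first_lstm + second_lstm + output_layer)
```

With a vocabulary of this size, the output layer holds a large share of the weights, so the number of unique notes and chords in the dataset directly affects model size.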

Building the Model

Having established our model architecture, let’s dive straight into building it. We will break down the code into sections and explain each part.

We start by importing the necessary libraries that provide useful functionalities for our project. In addition to the usual libraries required for routine operations, we will be using tensorflow for deep learning and music21 for music manipulation.

import numpy as np
import os
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.utils import to_categorical
from music21 import converter, instrument, stream, note, chord
from google.colab import files

Loading and Processing MIDI Files

Next, we define the directory where our MIDI files are located. The code then goes through each file in the directory, extracts the notes and chords, and stores them for further processing. The ‘converter’ module from the music21 library is used to parse the MIDI files and retrieve the musical elements. As an experiment, we will first use just one MIDI file to train the model and then compare the result by using five MIDI files for training.

# Directory containing the MIDI files
midi_dir = "/content/Midi Files"

notes = []

# Process each MIDI file in the directory
for filename in os.listdir(midi_dir):
    # Accept both common MIDI extensions (.mid and .midi)
    if filename.lower().endswith((".mid", ".midi")):
        file = converter.parse(os.path.join(midi_dir, filename))

        # Find all the notes and chords in the MIDI file
        try:
            # If the MIDI file has instrument parts
            s2 = file.parts.stream()
            notes_to_parse = s2[0].recurse()
        except Exception:
            # If the MIDI file has no instrument parts, read the notes
            # from a flattened view of the whole file instead
            notes_to_parse = file.flat.notes

        # Extract pitch and duration information from notes and chords
        for element in notes_to_parse:
            if isinstance(element, note.Note):
                notes.append(str(element.pitch))
            elif isinstance(element, chord.Chord):
                notes.append('.'.join(str(n) for n in element.normalOrder))

# Print the number of notes and some example notes
print("Total notes:", len(notes))
print("Example notes:", notes[:10])

Mapping Notes to Integers

To convert the notes into numerical sequences that our model can process, we create a dictionary that maps each unique note or chord to a corresponding integer. This step allows us to represent the musical elements in a numerical format.

# Create a dictionary to map unique notes to integers
unique_notes = sorted(set(notes))
note_to_int = {note: i for i, note in enumerate(unique_notes)}
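A toy run makes the mapping concrete. The note names below are made up for illustration; `4.7` stands for a chord written in the dotted format produced by the extraction step.

```python
toy_notes = ["E4", "C4", "4.7", "C4", "E4", "G4"]  # made-up notes and one chord

# Same recipe as above: sort the unique symbols, then number them
toy_unique = sorted(set(toy_notes))
toy_note_to_int = {name: i for i, name in enumerate(toy_unique)}

# Encode the original sequence as integers
encoded = [toy_note_to_int[name] for name in toy_notes]
print(toy_note_to_int)
print(encoded)
```

Because the symbols are sorted as strings, the integer assigned to a note reflects alphabetical order only, not pitch height.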

Generating Input and Output Sequences

In order to train our model, we need to create input and output sequences. This is done by sliding a fixed-length window over the list of notes. The input sequence consists of the preceding notes and the output sequence is the next note. These sequences are stored in separate lists.

# Convert the notes to numerical sequences
sequence_length = 100  # Length of each input sequence
input_sequences = []
output_sequences = []

# Generate input/output sequences
for i in range(0, len(notes) - sequence_length, 1):
    # Extract the input sequence
    input_sequence = notes[i:i + sequence_length]
    input_sequences.append([note_to_int[note] for note in input_sequence])

    # Extract the output sequence
    output_sequence = notes[i + sequence_length]
    output_sequences.append(note_to_int[output_sequence])
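The windowing is easier to see on a short sequence. This sketch uses a window of 3 instead of 100 and a made-up list of encoded notes; `make_windows` is a hypothetical helper that mirrors the loop above.

```python
def make_windows(encoded, window):
    """Slide a fixed-length window over `encoded` and pair each window
    with the element that immediately follows it."""
    inputs, targets = [], []
    for i in range(len(encoded) - window):
        inputs.append(encoded[i:i + window])
        targets.append(encoded[i + window])
    return inputs, targets


toy_encoded = [2, 1, 0, 1, 2, 3]
toy_inputs, toy_targets = make_windows(toy_encoded, window=3)

for window, target in zip(toy_inputs, toy_targets):
    print(window, "->", target)
```

A sequence of N notes yields N minus the window length training pairs, so a file with fewer than 101 notes would contribute nothing when the window is 100.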

Reshaping and Normalizing Input Sequences

Before feeding the input sequences to our model, we reshape them to match the expected input shape of the LSTM layer. Additionally, we normalize the sequences by dividing them by the total number of unique notes. This step ensures that the input values fall within a suitable range for the model to learn effectively.

# Reshape and normalize the input sequences
num_sequences = len(input_sequences)
num_unique_notes = len(unique_notes)

# Reshape the input sequences
X = np.reshape(input_sequences, (num_sequences, sequence_length, 1))
# Normalize the input sequences
X = X / float(num_unique_notes)
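As a quick check of what the reshape and normalization produce, here is the same operation on a toy batch. The values and the vocabulary size of 4 are made up for illustration.

```python
import numpy as np

toy_inputs = [[2, 1, 0], [1, 0, 1], [0, 1, 2]]  # 3 windows of length 3
toy_vocab_size = 4                               # hypothetical number of unique notes

# Shape expected by an LSTM layer: (samples, time steps, features)
X_toy = np.reshape(toy_inputs, (len(toy_inputs), 3, 1))
X_toy = X_toy / float(toy_vocab_size)

print(X_toy.shape)
print(X_toy.min(), X_toy.max())
```

Since the largest index is one less than the vocabulary size, every value ends up at or above 0 and strictly below 1.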

One-Hot Encoding Output Sequences

The output sequences, which represent the next note to predict, are converted into a one-hot encoded format. This encoding lets the model output a probability distribution over all available notes for the next note.

# One-hot encode the output sequences
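# Pass num_classes so the output width always matches the vocabulary size,
# even if the highest note index never appears as a target
y = to_categorical(output_sequences, num_classes=num_unique_notes)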
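One-hot encoding can be reproduced in plain NumPy to see what the targets look like. This is an illustrative equivalent, not the Keras implementation; `one_hot` is a hypothetical helper and the targets are made up.

```python
import numpy as np


def one_hot(indices, num_classes):
    """Return one row per index, with a 1 in that index's column and 0 elsewhere."""
    return np.eye(num_classes, dtype="float32")[np.asarray(indices)]


toy_targets = [1, 2, 1]  # class 3 exists in the vocabulary but never occurs here
y_toy = one_hot(toy_targets, num_classes=4)

print(y_toy)
print(y_toy.shape)
```

Stating the number of classes explicitly keeps the encoded width equal to the vocabulary size even when some notes never appear as a target.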

Defining the RNN Model

We define our RNN (Recurrent Neural Network) model using the Sequential class from the tensorflow.keras.models module. The model consists of two LSTM (Long Short-Term Memory) layers with a dropout layer between them to prevent overfitting. The last layer is a Dense layer with a softmax activation function that outputs the probability of each note.

# Define the RNN model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]),
               return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(256))
model.add(Dense(y.shape[1], activation='softmax'))

Compiling and Training the Model

We compile the model by specifying the loss function and optimizer. We then proceed to train the model on the input sequences (X) and output sequences (y) for a specific number of epochs and with a given batch size.

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Train the model
model.fit(X, y, batch_size=64, epochs=100)
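The loss printed during training is easier to interpret with a small worked example. The sketch below is a NumPy illustration of categorical cross-entropy for a single target, not the Keras implementation; the probability values are made up.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log of the probability assigned to the correct class."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


y_true = np.array([[0.0, 1.0, 0.0, 0.0]])        # the correct next note is class 1
confident = np.array([[0.05, 0.85, 0.05, 0.05]])  # most probability on class 1
uniform = np.full((1, 4), 0.25)                   # no preference at all

loss_confident = categorical_crossentropy(y_true, confident)[0]
loss_uniform = categorical_crossentropy(y_true, uniform)[0]
print(loss_confident, loss_uniform)
```

A model that spreads probability evenly over V notes scores ln(V), which gives a rough baseline: a training loss well below that value suggests the model has picked up some structure from the data.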

Music Generation

Once the model is trained, we can generate new music sequences. We define a function named generate_music that takes three inputs: the trained model, seed_sequence, and length. It uses the model to predict the next note in the sequence based on the previous notes and repeats this process to generate the desired length of music.

To start, we create a copy of the seed_sequence to prevent any modifications to the original sequence. This seed_sequence serves as the initial point for generating the music.

We then enter a loop that runs length times. Within each iteration, we perform the following steps:

  1. Convert the generated_sequence into a numpy array.
  2. Reshape the input_sequence by adding an extra dimension to match the expected input shape of the model.
  3. Normalize the input_sequence by dividing it by the total number of unique notes. This ensures that the values fall within a suitable range for the model to work effectively.

After normalizing the input_sequence, we use the model to predict the probabilities of the next note. The model.predict method takes the input_sequence as input and returns the predicted probabilities.

To select the next note, the np.random.choice function is used, which randomly picks an index based on the probabilities obtained. This randomness introduces diversity and unpredictability into the generated music.

The selected index represents the new note, which is appended to the generated_sequence. The first element of generated_sequence is then removed so that the sequence keeps the same length as the seed. Once the loop completes, the generated_sequence is returned, representing the newly generated music.

The seed_sequence and the desired generated_length need to be set to generate the music. The seed_sequence should be a valid input sequence that the model has been trained on, and the generated_length determines the number of notes the generated music should contain.

# Generate new music
def generate_music(model, seed_sequence, length):
    generated_sequence = seed_sequence.copy()

    for _ in range(length):
        input_sequence = np.array(generated_sequence)
        input_sequence = np.reshape(input_sequence, (1, len(input_sequence), 1))
        input_sequence = input_sequence / float(num_unique_notes)  # Normalize input sequence

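        predictions = model.predict(input_sequence, verbose=0)[0]
        # Renormalize in float64 so the probabilities sum to 1 for np.random.choice
        predictions = predictions.astype("float64")
        predictions /= predictions.sum()
        new_note = int(np.random.choice(len(predictions), p=predictions))
        generated_sequence.append(new_note)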
        generated_sequence = generated_sequence[1:]

    return generated_sequence

# Set the seed sequence and length of the generated music
seed_sequence = input_sequences[0]   # Replace with your own seed sequence
generated_length = 100  # Replace with the desired length of the generated music

generated_music = generate_music(model, seed_sequence, generated_length)
generated_music
# Output of the above code
[1928,
 1916,
 1959,
 1964,
 1948,
 1928,
 1190,
 873,
 1965,
 1946,
 1928,
 1970,
 1947,
 1946,
 1964,
 1948,
 1022,
 1945,
 1916,
 1653,
 873,
 873,
 1960,
 1946,
 1959,
 1942,
 1348,
 1960,
 1961,
 1971,
 1966,
 1927,
 705,
 1054,
 150,
 1935,
 864,
 1932,
 1936,
 1763,
 1978,
 1949,
 1946,
 351,
 1926,
 357,
 363,
 864,
 1965,
 357,
 1928,
 1949,
 351,
 1928,
 1949,
 1662,
 1352,
 1034,
 1021,
 977,
 150,
 325,
 1916,
 1960,
 363,
 943,
 1949,
 553,
 1917,
 1962,
 1917,
 1916,
 1947,
 1021,
 1021,
 1051,
 1648,
 873,
 977,
 1959,
 1927,
 1959,
 1947,
 434,
 1949,
 553,
 360,
 1916,
 1190,
 1022,
 1348,
 1051,
 325,
 1965,
 1051,
 1917,
 1917,
 407,
 1948,
 1051]
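The rolling-window behaviour of the generation loop can be checked without a trained network. The sketch below swaps in a stand-in model that treats every note as equally likely; `UniformModel` and `sample_window` are hypothetical names, and the vocabulary of 4 notes is made up.

```python
import numpy as np


class UniformModel:
    """Stand-in for a trained model: every note is equally likely."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

    def predict(self, batch, verbose=0):
        return np.full((len(batch), self.vocab_size), 1.0 / self.vocab_size)


def sample_window(model, seed, steps, vocab_size, rng):
    """Append one sampled note per step and drop the oldest one each time."""
    window = list(seed)
    for _ in range(steps):
        x = np.array(window, dtype="float64").reshape(1, len(window), 1) / vocab_size
        probs = np.asarray(model.predict(x, verbose=0)[0], dtype="float64")
        probs /= probs.sum()  # guard against rounding drift
        window.append(int(rng.choice(vocab_size, p=probs)))
        window = window[1:]   # keep the window length fixed
    return window


rng = np.random.default_rng(0)
seed = [0, 1, 2, 3, 0]
out = sample_window(UniformModel(4), seed, steps=2, vocab_size=4, rng=rng)
print(len(out), out[:3])
```

The function returns the final window, so the output always has the seed's length. After two steps on a seed of five notes, the first three output notes are still seed notes; the output consists entirely of generated notes only when the number of steps is at least the seed length, as with 100 steps on a 100-note seed.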

Post-Processing

The generated output, as seen, is a sequence of integers representing the notes or chords in our generated music. To listen to it, we have to convert it back into music by reversing the mapping we created earlier to recover the original notes and chords. To do this, we first create a dictionary called int_to_note, where the integers are the keys and the corresponding notes are the values.

Next, we create a stream called output_stream to store the generated notes and chords. This stream acts as a container to hold the musical elements that will constitute the generated music.

We then iterate through each element in the generated_music sequence. Each element is a number representing a note or a chord. We use the int_to_note dictionary to convert the number back to its original note or chord string representation.

If the pattern is a chord, which can be identified by the presence of a dot or by consisting only of digits, we split the pattern string into individual notes. For each note, we create a note.Note object, assign it a piano instrument, and collect it in a list. Finally, we create a chord.Chord object from that list, representing the chord, and append it to the output_stream.

If the pattern is a single note, we create a note.Note object for that note, assign it a piano instrument, and add it directly to the output_stream.

Once all the patterns in the generated_music sequence have been processed, we write the output_stream to a MIDI file named ‘generated_music.mid’. Finally, we download the generated music file from Colab using the files.download function.

# Reverse the mapping from notes to integers
int_to_note = {i: note for note, i in note_to_int.items()}

# Create a stream to hold the generated notes/chords
output_stream = stream.Stream()

# Convert the output from the model into notes/chords
for pattern in generated_music:
    # pattern is a number, so we convert it back to a note/chord string
    pattern = int_to_note[pattern]

    # If the pattern is a chord
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        chord_notes = []  # separate name so the earlier `notes` list is not overwritten
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        output_stream.append(new_chord)
    # If the pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.storedInstrument = instrument.Piano()
        output_stream.append(new_note)

# Write the stream to a MIDI file
output_stream.write('midi', fp='generated_music.mid')

# Download the generated music file from Colab
files.download('generated_music.mid')
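The chord-versus-note test in the loop above can be tried on its own, without music21. `split_pattern` is a hypothetical helper that applies the same rule; the tokens are made up.

```python
def split_pattern(pattern):
    """Classify a token as ('chord', [integers]) or ('note', name)."""
    if "." in pattern or pattern.isdigit():
        return "chord", [int(p) for p in pattern.split(".")]
    return "note", pattern


for token in ["4.7.11", "9", "C#5"]:
    print(token, "->", split_pattern(token))
```

A token made of digits alone, such as `9`, is treated as a one-note chord because chords were stored as dot-joined integers taken from `normalOrder`, while single notes were stored by pitch name.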

Final output

Now, it’s time to listen to the outcome of our AI-generated music. You can find the link to listen to the music below.

To be honest, the initial result may sound like someone with limited experience playing musical instruments. This is primarily because we trained our model using only a single MIDI file. However, we can enhance the quality of the music by repeating the process and training our model on a larger dataset. In this case, we will train our model using five MIDI files, all of which will be instrumental music of a similar style.

The difference in the quality of the music generated from the expanded dataset is quite remarkable. It clearly demonstrates that training the model on a more diverse range of MIDI files leads to significant improvements in the generated music. This emphasizes the importance of increasing the size and variety of the training dataset to achieve better musical results.

Limitations

Although we managed to generate music using a sophisticated model, there are certain limitations to scaling such a system.

  1. Limited Dataset: The quality and diversity of the generated music depend on the variety and size of the dataset used for training. A limited dataset can restrict the range of musical ideas and styles our model can learn from.
  2. Creativity Gap: Although AI-generated music can produce impressive results, it lacks the inherent creativity and emotional depth that human composers bring to their compositions. The music generated by AI may sound robotic or miss the subtle nuances that make music truly captivating.
  3. Data Dependency: The generated music is influenced by the input MIDI files used for training. If the training dataset has biases or specific patterns, the generated music may exhibit similar biases or patterns, limiting its originality.
  4. Computational Requirements: Training and generating music using AI models can be computationally expensive and time-consuming. It requires powerful hardware and efficient algorithms to train complex models and generate music in a reasonable timeframe.
  5. Subjective Evaluation: Assessing the quality and artistic value of AI-generated music can be subjective. Different people may have different opinions on the aesthetics and emotional impact of the music, making it challenging to establish universal evaluation standards.

Conclusion

In this project, we embarked on the fascinating journey of generating music using AI. Our goal was to explore the capabilities of AI in music composition and unleash its potential in creating unique musical pieces. Through the implementation of AI models and deep learning techniques, we successfully generated music that closely resembled the style of the input MIDI files. The project showcased the ability of AI to assist and inspire in the creative process of music composition.

Key Takeaways

Here are some of the key takeaways from this project:

  1. We learned that AI can serve as a valuable assistant in the creative process, offering new perspectives and ideas for musicians and composers.
  2. The quality and diversity of the training dataset greatly influence the output of AI-generated music. Curating a well-rounded and varied dataset is crucial to achieving more original and diverse compositions.
  3. While AI-generated music shows promise, it cannot replace the artistic and emotional depth brought by human composers. The optimal approach is to leverage AI as a collaborative tool that complements human creativity.
  4. Exploring AI-generated music raises important ethical considerations, such as copyright and intellectual property rights. It is essential to respect these rights and foster a healthy and supportive environment for both AI and human artists.
  5. This project reinforced the significance of continuous learning in the field of AI-generated music. Staying updated with advancements and embracing new techniques enables us to push the boundaries of musical expression and innovation.

Frequently Asked Questions

Q1. How does AI create music?

A. AI creates music by understanding patterns and structures in a vast collection of music data. It learns how notes, chords, and rhythms are related and applies this understanding to generate new melodies, harmonies, and rhythms.

Q2. Can AI compose music in diverse styles?

A. Yes, AI can compose music in a wide range of styles. By training AI models on different styles of music, it can learn the distinct characteristics and elements of each style. This enables it to generate music that captures the essence of various styles like classical, jazz, rock, or electronic.

Q3. Does copyright protect AI-generated music?

A. AI-generated music can involve copyright complexities. Although AI algorithms create the music, the input data often includes copyrighted material. The legal protection and ownership of AI-generated music depend on the jurisdiction and specific situations. Proper attribution and knowledge of copyright laws are crucial when using or sharing AI-generated music.

Q4. Can you use AI music in business projects?

A. Yes, AI-created music can be used in business projects, but it’s important to consider copyright aspects. Certain AI models are trained on copyrighted music, which might necessitate acquiring appropriate licenses or permissions for commercial usage. Consulting legal experts or copyright specialists is advisable to ensure adherence to copyright laws.

Q5. Can AI-created music substitute human musicians?

A. AI-created music cannot completely replace human musicians. Although AI can compose music with impressive outcomes, it lacks the emotional depth, creativity, and interpretive skills of human musicians. AI serves as a valuable tool for inspiration and collaboration, but the unique artistry and expression of human musicians cannot be replicated.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
