MuseGAN
🧠 Overview
The MuseGAN model generates polyphonic, multi-track (multi-instrument) music using Generative Adversarial Networks (GANs). The model aims to generate 4 bars of coherent multi-track music from scratch for 5 instruments. We also aim to extend the model for Human-AI collaboration, where 4 instrument tracks are conditionally generated on the basis of one human input track. Check out the GitHub Repo
🎯 Objectives
- Generate multi-track pianoroll music consisting of 5 tracks (drums, bass, guitar, piano, strings).
- Learn temporal and harmonic relationships across bars and tracks.
- Build the model on conditional GANs (CGANs) with Wasserstein loss and gradient penalty (WGAN-GP); a sketch of the penalty term follows this list.
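For reference, the gradient-penalty term looks roughly like this in PyTorch (a minimal sketch, not the repo's actual training code; the `critic` argument and tensor shapes are assumptions):

```python
import torch

def gradient_penalty(critic, real, fake, gp_weight=10.0):
    """WGAN-GP term: penalize the critic when its gradient norm on
    random real/fake interpolations deviates from 1."""
    # One mixing ratio per sample, broadcast over the remaining dims
    alpha = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0].view(real.size(0), -1)
    return gp_weight * ((grads.norm(2, dim=1) - 1) ** 2).mean()

# Critic loss = mean(critic(fake)) - mean(critic(real)) + gradient_penalty(...)
```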
📂 Dataset
Lakh Pianoroll Dataset (LPD-5 Cleansed)
Derived from the Lakh MIDI Dataset (LMD), it contains over 60,000 five-track multitrack pianorolls.
Pianorolls are a matrix representation of music in which the horizontal axis denotes time steps, the vertical axis denotes pitches, and each cell stores a velocity value (the loudness of a note); a toy example follows.
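Concretely, a single-track pianoroll is just a time-step × pitch matrix of velocities; a small NumPy illustration (the shapes here are illustrative, not the dataset's exact dimensions):

```python
import numpy as np

TIME_STEPS, N_PITCHES = 96, 128  # e.g. one 4/4 bar at 24 ticks per beat, 128 MIDI pitches
roll = np.zeros((TIME_STEPS, N_PITCHES), dtype=np.uint8)

# Hold C4 (MIDI pitch 60) for the first 24 time steps at velocity 100
roll[:24, 60] = 100

# An LPD-5 sample stacks 5 such matrices, one per instrument track
multitrack = np.stack([roll] * 5)  # shape: (5, 96, 128)
```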
🧩 Model Structure
The whole MuseGAN model is primarily split into two parts: the multi-track model and the temporal model.
Multi-Track Model
This is further split into 3 types of models: the composer, jamming, and hybrid models (see the sketch after this list).
- Composer Model: creates uniformity across the instruments of all tracks by using a single generator and a single discriminator.
- Jamming Model: gives each instrument track its characteristic style by using 5 generators and 5 discriminators, one per track.
- Hybrid Model: merges the composer and jamming models into one single model using a global latent vector z and 5 track-dependent vectors zi.
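The three variants differ mainly in which latent vectors each track's generator receives; a hedged sketch of the hybrid wiring (layer sizes and names are assumptions, and a stand-in linear layer replaces the actual generator):

```python
import torch
import torch.nn as nn

Z_DIM, N_TRACKS = 32, 5

# Hybrid scheme: every track generator sees the shared latent z plus its
# own private latent zi[k], concatenated along the feature dimension.
track_generators = nn.ModuleList([nn.Linear(2 * Z_DIM, 128) for _ in range(N_TRACKS)])

z = torch.randn(1, Z_DIM)             # inter-track (shared) latent
zi = torch.randn(N_TRACKS, 1, Z_DIM)  # intra-track (private) latents
tracks = [g(torch.cat([z, zi[k]], dim=1)) for k, g in enumerate(track_generators)]

# Composer would instead be a single generator fed z alone;
# jamming would be 5 generators, each fed only its own zi[k].
```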
Temporal Model
This model is responsible for encoding bar-specific temporal information into the latent vectors. The temporal model also has two variants (a sketch of the from-scratch case follows this list):
- Generation from Scratch: a temporal generator (GTemp) is used when 5 coherent tracks are to be generated from scratch.
- Conditional Generation: if a conditional track input is provided, a temporal encoder is used to encode the temporal characteristics of the human-input track into the latent vectors.
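For the from-scratch case, the temporal generator can be pictured as expanding one noise vector into a sequence of bar-specific latents (a minimal sketch; shapes and the stand-in linear layer are assumptions):

```python
import torch
import torch.nn as nn

Z_DIM, N_BARS = 32, 4

# GTemp: one noise vector in, one latent per bar out, so that the
# bars of a phrase share a common temporal context.
g_temp = nn.Sequential(nn.Linear(Z_DIM, N_BARS * Z_DIM), nn.ReLU())

z = torch.randn(1, Z_DIM)
bar_latents = g_temp(z).view(N_BARS, Z_DIM)  # one temporal latent per bar
```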
Overall Structure
The overall structure incorporates both the temporal generators and the bar generators, and uses four kinds of latent vectors: a global latent vector z, a global temporal vector zt, track-dependent latent vectors zi, and track-dependent temporal vectors zit. The sketch below shows how they are combined.
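A hedged sketch of how the four latents feed a single bar of a single track (all dimensions are assumptions):

```python
import torch

Z_DIM, N_TRACKS, N_BARS = 32, 5, 4
z   = torch.randn(Z_DIM)                    # global latent, shared by all tracks and bars
zt  = torch.randn(N_BARS, Z_DIM)            # global temporal latents, one per bar
zi  = torch.randn(N_TRACKS, Z_DIM)          # track-dependent latents
zit = torch.randn(N_TRACKS, N_BARS, Z_DIM)  # track- and bar-dependent latents

def bar_generator_input(track, bar):
    # Each (track, bar) pair is generated from all four latent vectors
    return torch.cat([z, zt[bar], zi[track], zit[track, bar]])  # shape: (4 * Z_DIM,)
```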
💃 Model Versions & Outputs
Version 1
- Shared Temporal Generator: takes two noise vectors and upscales them into the temporal context of the music, which is passed on to the private temporal generators.
- Private Temporal Generator: takes a combined vector containing the two outputs of the shared temporal generator as well as the other two noise vectors passed to it directly, and produces the content for one track.
- Bar Generator: takes the 5 track-wise vectors produced by the private temporal generators and combines them to form a pianoroll.
Outputs
Version 2
- Temporal Generator: takes in the global latent vector z and the track-dependent latent vectors zi, and generates the temporal latent vectors zt and zit.
- Bar Generator: takes all 4 latent vectors and generates the pianoroll bar by bar for every track (see the sketch below).
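A hedged sketch of the bar-by-bar assembly (the linear `bar_G` is a stand-in for the actual bar generator; all names and dimensions are assumptions):

```python
import torch
import torch.nn as nn

Z_DIM, N_TRACKS, N_BARS, TIME, PITCH = 32, 5, 4, 96, 128

# Stand-in bar generator: 4 concatenated latents -> one bar of one track
bar_G = nn.Sequential(nn.Linear(4 * Z_DIM, TIME * PITCH), nn.Tanh())

z, zt = torch.randn(Z_DIM), torch.randn(N_BARS, Z_DIM)
zi, zit = torch.randn(N_TRACKS, Z_DIM), torch.randn(N_TRACKS, N_BARS, Z_DIM)

pianoroll = torch.stack([
    torch.cat([bar_G(torch.cat([z, zt[b], zi[i], zit[i, b]])).view(TIME, PITCH)
               for b in range(N_BARS)], dim=0)
    for i in range(N_TRACKS)
])  # shape: (5 tracks, 4 bars x 96 steps, 128 pitches)
```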
Outputs
Outputs of this model after 120 epochs
Checkpoint
Checkpoints after 120 epochs for this model can be found here
Conditional Generation
The conditional generator adds a temporal encoder, which encodes the temporal structure of the input track into a latent vector and feeds it to the Version 1 generator model. The rest of the generator remains the same as the Version 1 code. In all cases, the generator outputs 4 tracks whose temporal structure across bars is similar to that of the input track. A sketch of the encoder follows.
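The encoder can be pictured as compressing each bar of the input track into a latent that replaces the sampled temporal vector (a minimal sketch; names and shapes are assumptions):

```python
import torch
import torch.nn as nn

N_BARS, TIME, PITCH, Z_DIM = 4, 96, 128, 32

# Temporal encoder: one bar of the human-input pianoroll -> one temporal latent
encoder = nn.Sequential(nn.Flatten(), nn.Linear(TIME * PITCH, Z_DIM))

human_track = torch.rand(N_BARS, TIME, PITCH)  # 4 bars of the conditioning track
zt = encoder(human_track)                      # (4, Z_DIM): replaces the sampled zt
# These encoded latents go into the Version 1 generator, so the other
# 4 generated tracks follow the input track's bar-to-bar structure.
```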
Outputs
Outputs of the model after 25 epochs
Checkpoint
The Checkpoints for this model after 25 epochs can be found here
🚂 How To Train The Model
- Install the dependencies: `pip install -r requirements.txt`
- Go to the folder of the version you want to train and download the `.ipynb` file.
- Run the notebook locally or in JupyterLab.
- To access the trained checkpoint for a particular model, check the `README.md` file in that version's folder.
🎼 Outputs
To access the output audio, check out the Audio folder under the corresponding version's folder.
👏 Acknowledgement
- Thanks to everyone at CoC and ProjectX for helping us in the progress of this project.
- Special shoutout to our mentors Kavya Rambhia and Swayam Shah for their support and guidance throughout.
Made by Pratyush Rao and Yashasvi Choudhary