MuseGAN

🧠 Overview

The MuseGAN model generates polyphonic, multi-track (multi-instrument) music using Generative Adversarial Networks (GANs). The model aims to generate 4 bars of coherent multi-track music from scratch for 5 instruments. We also aim to extend the model to Human-AI collaboration, where 4 instrument tracks are generated conditionally on one human-input track. Check out the GitHub Repo


🎯 Objectives

  • Generate multi-track pianoroll music consisting of 5 tracks (drums, bass, guitar, piano, strings).
  • Learn temporal and harmonic relationships across bars and tracks.
  • Build the model on conditional GANs (CGANs) trained with the Wasserstein loss and a gradient penalty (see the sketch below).
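
As a reference for the loss mentioned above, below is a minimal sketch of the WGAN-GP critic objective, assuming a PyTorch setup; the function names, tensor shapes, and penalty weight are illustrative and are not taken from the repository code.

    import torch

    def gradient_penalty(critic, real, fake, device="cpu"):
        """WGAN-GP term: push the critic's gradient norm toward 1 on random
        interpolations between real and generated pianorolls."""
        alpha_shape = [real.size(0)] + [1] * (real.dim() - 1)   # one mix ratio per sample
        alpha = torch.rand(alpha_shape, device=device)
        interpolated = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
        scores = critic(interpolated)
        grads = torch.autograd.grad(
            outputs=scores, inputs=interpolated,
            grad_outputs=torch.ones_like(scores),
            create_graph=True, retain_graph=True,
        )[0]
        grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
        return ((grad_norm - 1) ** 2).mean()

    def critic_loss(critic, real, fake, lambda_gp=10.0, device="cpu"):
        # Wasserstein distance estimate plus the gradient penalty
        return (critic(fake).mean() - critic(real).mean()
                + lambda_gp * gradient_penalty(critic, real, fake, device))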

📂 Dataset

Lakh Pianoroll Dataset (LPD-5 Cleansed)

Derived from the Lakh MIDI Dataset (LMD), it contains over 60,000 five-track multitrack pianorolls.
A pianoroll is a matrix representation of music: the horizontal axis denotes time steps, the vertical axis denotes pitches, and each entry stores a velocity value (the loudness of the note).
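
As a toy illustration of this representation (using NumPy; the resolution of 96 time steps per bar and 128 MIDI pitches is an assumption, not the dataset's exact preprocessing):

    import numpy as np

    TIME_STEPS, PITCHES = 96, 128          # assumed: one bar, full MIDI pitch range
    pianoroll = np.zeros((TIME_STEPS, PITCHES), dtype=np.uint8)

    # Place a C-major triad (C4=60, E4=64, G4=67) on the first beat at velocity 100.
    pianoroll[0:24, [60, 64, 67]] = 100

    # LPD-5 stacks 5 such matrices, one per instrument track.
    multitrack = np.stack([pianoroll] * 5, axis=0)
    print(multitrack.shape)                # (5, 96, 128)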


🧩 Model Structure

The MuseGAN model is primarily split into 2 parts: the multi-track model and the temporal model.

Multi-Track Model

The multi-track model comes in 3 variants: the composer, jamming, and hybrid models.

(figure: composer, jamming, and hybrid model architectures)

  • Composer Model

    Creates uniformity across all instrument tracks by using a single shared generator and a single discriminator.
  • Jamming Model

    Gives each instrument track its own characteristic style by using 5 generators and 5 discriminators, one per track.
  • Hybrid Model

    Merges the composer and jamming models into a single model using a global latent vector Z and 5 track-dependent vectors Zi.

Temporal Model

This model is responsible for adding bar-specific temporal information to the latent vectors. The temporal model also comes in two forms:

  • Generation From Scratch

    A Temporal Generator (GTemp) is used when 5 coherent tracks are to be generated from scratch.
  • Conditional Generation

    If a conditioning track is provided, a Temporal Encoder is used to encode the temporal characteristics of the human-input track into the latent vectors.

Overall Structure

(figure: overall MuseGAN generator structure)

This incorporates both the Temporal Generators and the Bar Generators, and uses four kinds of latent inputs: a global latent vector z, a global temporal vector Zt, track-dependent latent vectors Zi, and track-dependent temporal vectors Zit.
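
As a rough sketch of how these four latents could be combined for a single (track, bar) pair before the bar generator (the names and dimensions below are illustrative assumptions, not the repository's configuration):

    import torch

    N_TRACKS, N_BARS, Z_DIM = 5, 4, 32     # assumed sizes

    z    = torch.randn(Z_DIM)                        # global latent, shared by all tracks and bars
    z_i  = torch.randn(N_TRACKS, Z_DIM)              # track-dependent latents
    z_t  = torch.randn(N_BARS, Z_DIM)                # global temporal latents (one per bar)
    z_it = torch.randn(N_TRACKS, N_BARS, Z_DIM)      # track-dependent temporal latents

    def bar_generator_input(track, bar):
        """Concatenate the four latent pieces seen by the bar generator
        for one (track, bar) pair."""
        return torch.cat([z, z_i[track], z_t[bar], z_it[track, bar]])

    print(bar_generator_input(0, 0).shape)           # torch.Size([128]) = 4 * Z_DIM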


💃 Model Versions & Outputs

Version 1

(figure: Version 1 generator architecture)

  • Shared Temporal Generator: Takes two noise vectors and upscales them into the shared temporal context of the music, which is passed on to the Private Temporal Generators.
  • Private Temporal Generator: Takes a combined vector containing the two outputs of the Shared Temporal Generator along with two further noise vectors passed directly to it, and produces the content for one track.
  • Bar Generator: Takes the 5 bars generated from the Private Temporal Generators and combines them to form a pianoroll.

Outputs

(figure: Version 1 sample outputs)

Version 2

(figure: Version 2 model diagram)

  • Temporal Generator: Takes in the global latent vector Z and the track-dependent latent vectors Zi, and generates the temporal latent vectors Zt and Zit.

  • Bar Generator: Takes all 4 latent vectors and generates the pianoroll bar-by-bar for every track (see the sketch below).
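
The data flow of this version can be sketched roughly as below. This is a simplified illustration: the module internals are placeholder MLPs, the sizes are assumptions, and the same temporal generator is reused for the global and track-dependent latents for brevity, which may differ from the actual code.

    import torch
    import torch.nn as nn

    N_TRACKS, N_BARS, Z_DIM = 5, 4, 32
    TIME_STEPS, PITCHES = 96, 128                    # assumed bar resolution

    class TemporalGenerator(nn.Module):
        """Maps a static latent to one temporal latent per bar (placeholder MLP)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(Z_DIM, 256), nn.ReLU(),
                                     nn.Linear(256, N_BARS * Z_DIM))
        def forward(self, z):
            return self.net(z).view(-1, N_BARS, Z_DIM)

    class BarGenerator(nn.Module):
        """Maps the concatenated latents for one (track, bar) to a pianoroll bar."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(4 * Z_DIM, 512), nn.ReLU(),
                                     nn.Linear(512, TIME_STEPS * PITCHES), nn.Tanh())
        def forward(self, latent):
            return self.net(latent).view(-1, TIME_STEPS, PITCHES)

    temporal_g, bar_g = TemporalGenerator(), BarGenerator()

    batch = 2
    z  = torch.randn(batch, Z_DIM)                   # global latent vector Z
    zi = torch.randn(batch, N_TRACKS, Z_DIM)         # track-dependent latent vectors Zi
    zt  = temporal_g(z)                              # Zt:  (batch, bars, Z_DIM)
    zit = torch.stack([temporal_g(zi[:, i]) for i in range(N_TRACKS)], dim=1)  # Zit

    tracks = []
    for i in range(N_TRACKS):
        track_bars = [bar_g(torch.cat([z, zi[:, i], zt[:, b], zit[:, i, b]], dim=1))
                      for b in range(N_BARS)]
        tracks.append(torch.stack(track_bars, dim=1))
    pianoroll = torch.stack(tracks, dim=1)           # (batch, tracks, bars, time, pitch)
    print(pianoroll.shape)                           # torch.Size([2, 5, 4, 96, 128])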

Outputs

(figure: Version 2 sample outputs)

Outputs of this model after 120 epochs.

Checkpoint

Checkpoints after 120 epochs for this model can be found here

Conditional Generation

(figure: conditional generation architecture)

The conditional generator adds a Temporal Encoder, which encodes the temporal structure of the input track into a latent vector and feeds it to the Version 1 generator model. The rest of the generator remains the same as the Version 1 code. In every case, the generator outputs 4 tracks whose temporal structure across bars is similar to that of the input track.
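
A minimal sketch of such a temporal encoder, assuming the conditioning track is a per-bar pianoroll tensor; the module is a placeholder MLP and the sizes are assumptions, not the repository's code:

    import torch
    import torch.nn as nn

    N_BARS, Z_DIM = 4, 32
    TIME_STEPS, PITCHES = 96, 128                    # assumed bar resolution

    class TemporalEncoder(nn.Module):
        """Encodes each bar of the conditioning track into a temporal latent vector."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Flatten(start_dim=2),       # (batch, bars, time*pitch)
                                     nn.Linear(TIME_STEPS * PITCHES, 256), nn.ReLU(),
                                     nn.Linear(256, Z_DIM))
        def forward(self, track):
            # track: (batch, bars, time, pitch) -> (batch, bars, Z_DIM)
            return self.net(track)

    encoder = TemporalEncoder()
    human_track = torch.rand(1, N_BARS, TIME_STEPS, PITCHES)   # one conditioning pianoroll
    zt_cond = encoder(human_track)                             # replaces the sampled temporal latents
    print(zt_cond.shape)                                       # torch.Size([1, 4, 32])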

Outputs

(figure: conditional generation sample outputs)

Outputs of the model after 25 epochs.

Checkpoint

The Checkpoints for this model after 25 epochs can be found here

🚂 How To Train The Model

  • Install the dependencies

    pip install -r requirements.txt

  • Go to the folder of the version you want to train and download the .ipynb file.

  • Run the notebook locally or in JupyterLab.
  • To access the trained checkpoint for a particular model, check the README.md file in that version's folder (a minimal loading sketch follows below).
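
A minimal sketch of inspecting a downloaded checkpoint, assuming the checkpoints are standard PyTorch files; the path and keys below are illustrative, so check the version folder's README for the actual names.

    import torch

    # Illustrative path; replace with the checkpoint file linked in the version's README.
    checkpoint = torch.load("checkpoints/version2_epoch120.pt", map_location="cpu")

    # Inspect what the file stores before wiring it into the notebook's model classes.
    if isinstance(checkpoint, dict):
        print(list(checkpoint.keys()))

    # Typical usage, assuming a 'generator' state dict is stored:
    # generator.load_state_dict(checkpoint["generator"])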

🎼 Outputs

To access the output audio, check the Audio folder under the corresponding version's folder.

👏 Acknowledgement

  • Thanks to everyone at CoC and ProjectX for helping us in the progress of this project.
  • A special shout-out to our mentors Kavya Rambhia and Swayam Shah for their support and guidance throughout.

Made By Pratyush Rao and Yashasvi Choudhary