Genre Identifier

Methodology

Random Forest Implementation

We implemented a random forest model in Python using the RandomForestClassifier class from the sklearn library. The classifier builds a collection of decision trees. Each tree is trained on a bootstrap sample of the data rows, and only a random subset of the features is considered at each split. The forest's final prediction aggregates the individual trees: sklearn averages the trees' predicted class probabilities and picks the most probable genre, a soft form of majority voting.
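
A minimal sketch of this setup, assuming scikit-learn and NumPy are installed. It uses synthetic data from make_classification as a stand-in for the GTZAN features, and the hyperparameters are illustrative rather than the values used in our experiments. It also checks that the forest's probabilities are the average of the individual trees' probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the GTZAN feature table (assumption: 4 classes, 20 features).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=4, random_state=0
)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees (illustrative value)
    bootstrap=True,       # each tree is fit on a bootstrap sample of the rows
    max_features="sqrt",  # random subset of features considered at each split
    random_state=0,
)
forest.fit(X, y)

# Average the per-tree class probabilities by hand.
tree_probs = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
print(np.allclose(tree_probs, forest.predict_proba(X)))  # True

# predict() returns the class with the highest averaged probability.
manual_pred = forest.classes_[np.argmax(forest.predict_proba(X), axis=1)]
```

In practice, X would be the precomputed feature columns and y the genre labels.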

GTZAN Dataset

We used the GTZAN dataset for our model. It contains 1000 audio clips evenly divided among 10 genres (100 clips each). Each clip is 30 seconds long and comes with 59 precomputed features, including chroma features, spectral centroids, harmony features, and MFCCs. However, the GTZAN dataset has several known issues, including repeated audio files. To address this, we also created a modified version of the dataset with the 51 repeated audio files identified in previous research removed.

The following table shows how many repeated files were removed from each genre (blues, classical, and country had none):

Genre	Jazz	Reggae	Metal	Pop	Disco	Hip-hop	Rock	Total
Removed	13	11	9	9	6	2	1	51
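
The removal step can be sketched in plain Python as follows. The filenames and the duplicate set below are hypothetical placeholders in GTZAN's "genre.NNNNN.wav" naming style, not the actual list of 51 files. The same filtering works on a pandas DataFrame of the feature table.

```python
from collections import Counter

# Hypothetical (filename, genre) rows standing in for the GTZAN feature table.
clips = [
    ("jazz.00001.wav", "jazz"),
    ("jazz.00002.wav", "jazz"),
    ("reggae.00003.wav", "reggae"),
    ("rock.00004.wav", "rock"),
    ("blues.00005.wav", "blues"),
]

# Hypothetical subset of the duplicate list from prior research.
duplicates = {"jazz.00002.wav", "reggae.00003.wav"}

# Keep every clip that is not on the duplicate list.
kept = [(name, genre) for name, genre in clips if name not in duplicates]

# Count how many clips were dropped per genre, as in the table above.
removed_per_genre = Counter(genre for name, genre in clips if name in duplicates)

print(len(kept))                # 3
print(dict(removed_per_genre))  # {'jazz': 1, 'reggae': 1}
```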

Training and Testing

We trained our model on 80% of the dataset and tested it on the remaining 20%, using sklearn's train_test_split function to ensure that the two sets were mutually exclusive. We then used the trained model's predict method to make genre predictions on the test set.

The following table shows the number of samples per genre in the train and test sets:

Set	Blues	Classical	Country	Jazz	Reggae	Metal	Pop	Disco	Hip-hop	Rock
Train	80	87	73	65	66	84	78	73	83	78
Test	20	13	27	22	23	25	13	21	15	21
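
A sketch of the 80/20 split and the per-genre counts, assuming scikit-learn and NumPy. The three genres, 100 clips each, and random feature values are hypothetical stand-ins for the real data. Passing stratify=y to train_test_split would keep each genre's proportion equal in both sets. Uneven test counts like those in the table are typical of a split without stratification.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 3 hypothetical genres with 100 clips each, 5 features per clip.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.repeat(["blues", "jazz", "rock"], 100)

# 80/20 split; random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Per-genre sample counts in each set, analogous to the table above.
print(Counter(y_train.tolist()))
print(Counter(y_test.tolist()))
print(len(X_train), len(X_test))  # 240 60
```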

Success Metrics

We then analyzed the results using three measures: overall accuracy, a confusion matrix, and feature importance.

Accuracy:
To measure the success rate of our model, we compared the predicted labels to the actual labels using the accuracy_score function from sklearn. accuracy_score computes the fraction of test clips whose predicted genre matches the actual genre, giving a single overall measure of how well the model classifies the audio clips.
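
A small illustration, assuming scikit-learn is installed. The five labels are hypothetical, and the manual computation shows what accuracy_score returns.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels for five test clips.
y_true = ["rock", "jazz", "pop", "rock", "classical"]
y_pred = ["rock", "jazz", "rock", "metal", "classical"]

# accuracy_score: fraction of positions where the prediction matches the truth.
acc = accuracy_score(y_true, y_pred)

# Same quantity computed by hand: 3 correct out of 5.
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(acc, manual)  # 0.6 0.6
```
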
Confusion Matrix:
To analyze the model's performance in more detail, we created a confusion matrix using the confusion_matrix function from sklearn. A confusion matrix is a table that shows where the model makes correct and incorrect predictions. Rows represent the actual genres, columns represent the predicted genres, and each cell counts the test clips with that combination of actual and predicted genre, so correct predictions lie on the diagonal. Using the confusion matrix, we can identify which genres are confused with one another and gain insight into the strengths and weaknesses of our model across genres.
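
A sketch of how confusion_matrix orients its output, assuming scikit-learn. The labels are hypothetical. The labels argument fixes the row and column order.

```python
from sklearn.metrics import confusion_matrix

labels = ["jazz", "pop", "rock"]
y_true = ["jazz", "jazz", "pop", "rock", "rock", "rock"]
y_pred = ["jazz", "pop", "pop", "rock", "pop", "jazz"]

# Rows follow y_true (actual genre), columns follow y_pred (predicted genre),
# both in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# [[1 1 0]
#  [0 1 0]
#  [1 1 1]]
```

The diagonal (1, 1, 1) holds the correct predictions. The off-diagonal cells show, for example, one jazz clip predicted as pop.
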
Feature Importance:
To determine which features were most important to our model, we used the feature_importances_ attribute of the trained random forest. It scores each feature by how much that feature reduces impurity in the decision trees (mean decrease in impurity). Whenever a feature is used to split a node, the resulting impurity reduction is weighted by the fraction of training samples that reach that node, and these weighted reductions are summed per feature within each tree. Each tree's scores are normalized to sum to 1 and then averaged across all trees, so the importances are relative to one another. Features with higher scores had a greater impact on the model's predictions.
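
A sketch of reading and ranking these importances, assuming scikit-learn and NumPy. The data is synthetic, and the names feature_0 through feature_5 are hypothetical placeholders for GTZAN columns. The final check confirms that the forest's scores are the average of the per-tree normalized scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data; the feature names are hypothetical placeholders.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: one value per feature, summing to 1.
importances = forest.feature_importances_

# Rank features from most to least important.
for idx in np.argsort(importances)[::-1]:
    print(f"{feature_names[idx]}: {importances[idx]:.3f}")

# Forest-level scores equal the mean of each tree's normalized importances.
per_tree = np.mean([t.feature_importances_ for t in forest.estimators_], axis=0)
print(np.isclose(importances.sum(), 1.0), np.allclose(importances, per_tree))  # True True
```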

Results

Original Dataset Evaluation

Accuracy Score: 75%

Feature Importance

Confusion Matrix

The random forest model trained on the original dataset had an accuracy of 75%. The confusion matrix shows that several genres, such as classical and jazz, are classified with relatively high accuracy, whereas genres like pop and rock are misclassified more often. The model distinguishes some genres better than others, possibly because certain genres share similar acoustic features that make them harder to tell apart. For example, classical music has a 100% classification rate, possibly due to its characteristic lack of percussive instrumentation. Rock, on the other hand, had the lowest accuracy (10/21, or 48%) and was often misidentified as country or metal. Additionally, because this model was trained on the full dataset, duplicate files may appear in both the training and testing sets. The model may therefore have overfitted to these repeated samples, which would inflate its accuracy.
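
Per-genre rates such as "10/21, or 48%" are the diagonal of the confusion matrix divided by each row's total, which is each genre's recall. The sketch below uses a hypothetical 3x3 matrix with illustrative counts only, not our experimental results, and assumes NumPy. sklearn's recall_score with average=None gives the same per-class values from raw labels.

```python
import numpy as np

# Hypothetical confusion matrix (rows = actual, columns = predicted).
# These counts are illustrative only and are not our experimental results.
labels = ["classical", "country", "rock"]
cm = np.array([
    [12, 0, 0],
    [1, 15, 4],
    [0, 5, 5],
])

# Per-genre rate = correct predictions (diagonal) / clips of that genre (row sum).
per_genre = cm.diagonal() / cm.sum(axis=1)

# Overall accuracy = all correct predictions / all predictions.
overall = cm.trace() / cm.sum()

for label, rate in zip(labels, per_genre):
    print(f"{label}: {rate:.0%}")
print(f"overall: {overall:.1%}")
```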

The feature importance graph shows that perceptual variance, chroma STFT mean, and RMS mean were among the most important features, suggesting that harmonic (chroma) and energy (RMS) characteristics were important for differentiating genres.

Modified Dataset Evaluation

Accuracy Score: 78.42%

Feature Importance

Confusion Matrix

The random forest model trained on the modified dataset improved accuracy from 75% to 78.42%. The confusion matrix shows that the model improved substantially at classifying pop (which had 9 duplicate files removed) and improved slightly on classical, blues, and country (none of which had duplicates). On the other hand, jazz accuracy dropped sharply (from 91% to 45%) after 13 duplicates were removed, and reggae accuracy decreased slightly after 11 duplicates were removed. This suggests that the duplicated audio files had inflated the jazz and reggae accuracies on the original dataset because the model overfitted to the repeated clips. After their removal, the model predicts the other genres better. The feature importance graph shows that perceptual variance, chroma STFT mean, and RMS mean were still among the most important features.
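
The inflation argument depends on copies of the same clip landing on both sides of a random split. A plain-Python sketch of checking for this is shown below. The duplicate groups, the train/test assignment, and the helper leaked_groups are all hypothetical. A group-aware splitter such as sklearn's GroupShuffleSplit is one way to keep each group on a single side. This is an alternative to removing the duplicates, not what we did.

```python
# Hypothetical duplicate groups: each set holds filenames that are copies of the same audio.
duplicate_groups = [
    {"jazz.00010.wav", "jazz.00011.wav"},
    {"reggae.00020.wav", "reggae.00021.wav"},
    {"pop.00030.wav", "pop.00031.wav"},
]

# Hypothetical train/test assignment of those files after a random split.
train_files = {"jazz.00010.wav", "reggae.00020.wav", "reggae.00021.wav", "pop.00030.wav"}
test_files = {"jazz.00011.wav", "pop.00031.wav"}


def leaked_groups(groups, train, test):
    """Return the duplicate groups (hypothetical helper) with a copy in each split."""
    return [g for g in groups if g & train and g & test]


leaks = leaked_groups(duplicate_groups, train_files, test_files)
print(len(leaks))  # 2 (the jazz and pop pairs straddle the split)
```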

Try It Out!

Pick a Genre

Audio Sample

A file of the selected genre is picked at random from a subset of our test data.

File Name: —

Results

Predicted Genre: —

The genre that our random forest model, trained on the modified dataset, assigned to this clip.

Actual Genre: —

The actual label from the GTZAN dataset.

Low Baseline: —

A baseline that always predicts the most common genre in the training set, used to contextualize our model's performance.
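
A minimal sketch of such a majority-class baseline in plain Python, using hypothetical labels. sklearn's DummyClassifier with strategy="most_frequent" implements the same idea.

```python
from collections import Counter

# Hypothetical labels; the baseline only uses the training labels to choose its guess.
y_train = ["rock", "jazz", "rock", "pop", "rock", "jazz"]
y_test = ["jazz", "rock", "pop", "rock"]

# The most common training genre is predicted for every test clip.
majority_genre = Counter(y_train).most_common(1)[0][0]
baseline_pred = [majority_genre] * len(y_test)

baseline_acc = sum(p == t for p, t in zip(baseline_pred, y_test)) / len(y_test)
print(majority_genre, baseline_acc)  # rock 0.5
```

Any useful model should beat this baseline, whose accuracy is simply the majority genre's share of the test set.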

High Baseline: —

This baseline comes from an online genre identifier with a reported accuracy of 84%.

Classify Genre of Music Using Random Forests