Automatic Music Classification
Cory McKay
2/47
Introduction
Many areas of research in music information retrieval (MIR) involve using computers to classify music in various ways:
- Genre or style classification
- Mood classification
- Performer or composer identification
- Music recommendation
- Playlist generation
- Hit prediction
- Audio-to-symbolic transcription
- etc.
Such areas often share similar central procedures.
3/47
Fundamental music classification tasks (1/3)
- Musical data collection: the instances (basic entities) to classify. Audio recordings, scores, cultural data, etc.
- Feature extraction: features represent characteristic information about instances. They must provide sufficient information to distinguish instances among classes (categories).
- Machine learning: algorithms (“classifiers” or “learners”) learn to associate feature patterns of instances with their classes.
[Diagram: basic classification tasks — Music → Musical Data Collection → Feature Extraction → Machine Learning → Classifications]
4/47
Fundamental music classification tasks (2/3)
- Many classification tasks require metadata about instances: title, composer, performer, genre, date, etc.
- Metadata must be validated and corrected: raw information found in ID3 tags, Gracenote CDDB, etc. is often erroneous and inconsistent.
[Diagram: basic classification tasks — Music, Musical Data Collection, Metadata Analysis, Feature Extraction, Machine Learning, Classifications]
5/47
Fundamental music classification tasks (3/3)
- Supervised learning requires training: correctly labeled model instances (“ground truth”) are used to teach classifiers to associate certain feature patterns with desired classes.
- Trained classifiers can then classify novel instances.
- The success of classifiers depends on the quality of the ground truth. It is therefore essential that the metadata labeling of the musical data be accurate.
[Diagram: basic classification tasks — Music, Musical Data Collection, Metadata Analysis, Metadata, Feature Extraction, Classifier Training, Machine Learning, Classifications]
6/47
Consolidating fundamental tasks
Properly performing these tasks requires significant effort and knowledge in (at least) data mining, signal processing and musicology.
Result: naïve or improperly performed research; duplication of effort; reluctance to use automatic music classification in musicological or other research where it could be useful.
Solution: standardized MIR research software, which makes automatic music classification technology available to researchers in many disciplines.
7/47
Existing MIR software
Only a few MIR software systems have been built for use by other researchers (e.g. Marsyas and M2K). They tend to focus primarily on particular sub-tasks (e.g. audio feature extraction), are not typically well integrated with other systems, do not sufficiently emphasize extensibility, and typically have usability problems (installation and licensing issues, poor documentation).
Result: emphasis on existing techniques rather than development of new approaches; difficulties in integrating research between labs; inaccessibility to non-technical music researchers.
8/47
jMIR
jMIR has been developed to meet the need for standardized MIR research software. It has a separate software component to address each important aspect of automatic music classification. Each component can be used independently, and combinations of components can be used as an integrated whole.
Its architecture emphasizes providing an extensible platform for iteratively developing new techniques and algorithms, but it can also be used directly as is. Interfaces are designed for both technical and non-technical users. jMIR is well documented, free and open source, with a cross-platform Java implementation.
9/47
Musical data collection
[Diagram: basic classification tasks — Music, Musical Data Collection, Metadata Analysis, Metadata, Feature Extraction, Classifier Training, Machine Learning, Classifications]
10/47
Types of musical data
- Audio recordings: sampled sound. Wave, MP3, AAC, etc.
- Symbolic recordings: abstract musical instructions. Scores, MIDI, Humdrum, etc.
- Cultural information: information external to the musical content itself, e.g. playlists, album reviews, Billboard stats, etc. Based on web searches, surveys, expert opinion, etc.
[Diagram: Musical Data Collection — Symbolic Recordings (MIDI, scores, Humdrum, etc.), Audio Recordings (MP3, AAC, Wave, etc.), Cultural Information (web, surveys, experts, etc.)]
11/47
Connections between data types
- Automatic transcription technologies are increasingly making it possible to automatically generate symbolic recordings from audio.
- Metadata annotations are necessary for linking cultural information with particular recordings.
[Diagram: Musical Data Collection — Symbolic Recordings, Audio Recordings and Cultural Information, linked by Transcription and Metadata]
12/47
jMIR Codaich
Codaich is a research database of labeled MP3 recordings, for use in training and testing algorithms. There are plans to eventually include additional format types in Codaich, including symbolic formats.
[Diagram: Musical Data Collection — Symbolic Recordings, Audio Recordings and Cultural Information, linked by Transcription and Metadata, with jMIR Codaich]
13/47
Sharing Codaich
Codaich is intended to provide a common knowledge base that researchers in different labs can use to compare the effectiveness of their varying approaches.
Overcoming copyright limitations on distributing music: the On-demand Feature Extraction Network (OMEN), implemented by Daniel McEnnis. Researchers use distributed computing and the jMIR jAudio feature extractor to request local feature extraction at sites (e.g. libraries) that have legal access to individual recordings. jAudio and OMEN allow custom original features and extraction parameters.
14/47
Statistics on Codaich
- 27,305 MP3 recordings (constantly growing)
- 2247 artists
- 55 genres: popular, classical, jazz and “world”
- 19 metadata fields
15/47
jMIR Bodhidharma MIDI Database
A collection of labeled MIDI recordings: 950 recordings, 38 genres.
[Diagram: Musical Data Collection — Symbolic Recordings, Audio Recordings and Cultural Information, linked by Transcription and Metadata, with jMIR Codaich and the jMIR Bodhidharma MIDI Database]
16/47
jMIR jMusicMetaManager
Metadata found with recordings is typically problematic: inconsistent and error-prone. jMusicMetaManager is software that automatically analyzes metadata across recordings. It is currently used to maintain Codaich, and there are plans to adapt it to MIDI as well.
[Diagram: Musical Data Collection — Symbolic Recordings, Audio Recordings and Cultural Information, linked by Transcription and Metadata, with jMIR Codaich, the jMIR Bodhidharma MIDI Database and jMIR jMusicMetaManager]
17/47
Tasks performed by jMusicMetaManager
- Detects differing metadata values that should in fact be the same: e.g. in a performer identification task, “Charlie Mingus” should not be misclassified as a different performer than “Mingus, Charles”.
- Detects redundant copies of recordings, which could contaminate test sets.
- Generates inventory and statistical profile reports (39 reports in all).
18/47
How jMusicMetaManager works
- Calculates edit distance between pairs of field values, with a threshold based on field lengths
- Performs 23 additional pre-processing equivalency operations
- Considers varied word orderings and word subsets
- Applies false error filtering
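The edit-distance step can be sketched in Java. This is only a minimal illustration: the 0.25 threshold fraction and lowercase normalization are illustrative assumptions, not jMusicMetaManager's actual rules, and catching reordered names such as “Mingus, Charles” would additionally require the word-reordering operations mentioned above.

```java
// Sketch of near-duplicate metadata detection via edit distance, in the
// spirit of jMusicMetaManager. The length-relative threshold heuristic is
// an illustrative assumption, not the tool's actual rule.
public class MetadataMatcher {

    // Standard Levenshtein edit distance between two strings.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    // Flag two field values as probable duplicates when their edit distance
    // is small relative to the longer value's length.
    static boolean probableDuplicate(String a, String b, double maxFraction) {
        int longer = Math.max(a.length(), b.length());
        return editDistance(a.toLowerCase(), b.toLowerCase())
               <= (int) (maxFraction * longer);
    }
}
```

With a threshold fraction of 0.25, “Charlie Mingus” and “Charles Mingus” (edit distance 2) would be flagged as the same performer, while unrelated names would not.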
19/47
jMusicMetaManager’s I/O
- Parses metadata from Apple iTunes XML or MP3 ID3 tags (and Gracenote CDDB, indirectly)
- Can export to ACE XML or Weka ARFF
- Generates reports in frames-based HTML
20/47
Musical data collection summary
[Diagram: Musical Data Collection — Symbolic Recordings, Audio Recordings and Cultural Information, linked by Transcription and Metadata, with jMIR Codaich, the jMIR Bodhidharma MIDI Database and jMIR jMusicMetaManager]
21/47
Feature extraction
[Diagram: basic classification tasks — Music, Musical Data Collection, Metadata Analysis, Metadata, Feature Extraction, Classifier Training, Machine Learning, Classifications]
22/47
Types of features
- Low-level: associated with signal processing and basic auditory perception, e.g. spectral flux or RMS. Usually not intuitively musical.
- High-level: musical abstractions, e.g. meter or pitch class distributions.
- Cultural: sociocultural information outside the scope of auditory or musical content, e.g. playlist co-occurrence or purchase correlations.
[Diagram: Feature Extraction — Low-Level Features, High-Level Features, Cultural Features]
23/47
jMIR jAudio
Implemented jointly with Daniel McEnnis. Extracts features from audio files (MP3, WAV, AIFF, AU, SND). 28 bundled core features, mainly low-level, some high-level.
[Diagram: Feature Extraction — jMIR jAudio extracting feature values from Audio Recordings; Low-Level, High-Level and Cultural Features; Extracted Feature Values]
24/47
Developing features with jAudio
Two general ways of using jAudio:
- Directly as an audio feature extractor
- As a platform for developing and sharing new features, which can be independent features or based on existing features
New features are added using a modular plugin interface. jAudio (like all jMIR feature extractors) automatically calculates feature dependencies and scheduling at runtime.
25/47
Metafeatures and aggregators
jAudio automatically calculates “metafeatures” of new or existing features, e.g. running means, standard deviations or derivatives across sample windows.
jAudio also automatically calculates “aggregators” for new or existing features: functions that collapse a sequence of feature vectors into a single vector or smaller sequence of vectors. These are useful for representing in a low-dimensional way how different features change together. e.g. the Area of Moments aggregator transforms a set of feature vectors into a two-dimensional image matrix and calculates two-dimensional moments.
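The ideas of metafeatures and aggregators can be sketched in a few lines of Java. The block-wise windowing and the simple [mean, standard deviation] aggregator below are illustrative simplifications, not jAudio's actual implementations:

```java
// Illustrative sketches of a metafeature (running means over windows) and
// an aggregator (collapsing a feature-value sequence to a single vector).
public class FeatureAggregation {

    // Metafeature sketch: mean of each consecutive block of `window` values
    // (any trailing partial block is dropped).
    static double[] runningMeans(double[] values, int window) {
        double[] means = new double[values.length / window];
        for (int i = 0; i < means.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < window; j++) sum += values[i * window + j];
            means[i] = sum / window;
        }
        return means;
    }

    // Aggregator sketch: collapse a per-window feature sequence to
    // the two-element vector [mean, population standard deviation].
    static double[] meanAndStd(double[] values) {
        double mean = 0.0;
        for (double v : values) mean += v;
        mean /= values.length;
        double var = 0.0;
        for (double v : values) var += (v - mean) * (v - mean);
        return new double[] { mean, Math.sqrt(var / values.length) };
    }
}
```

A metafeature still produces a sequence (one value per window group), whereas an aggregator reduces the whole sequence to a fixed-size vector suitable for a classifier.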
26/47
Using jAudio
- Customizable extraction parameters: window size and overlap, normalization, downsampling, individual feature parameters
- Records and synthesizes audio
- Converts MIDI to audio
- Displays audio in both the time and frequency domains
27/47
jMIR jSymbolic
Extracts high-level features from MIDI files. 111 bundled features (currently being expanded to 160), many of them original.
[Diagram: Feature Extraction — jMIR jAudio and jMIR jSymbolic extracting feature values from Audio and Symbolic Recordings; Low-Level, High-Level and Cultural Features; Extracted Feature Values]
28/47
jSymbolic’s features
Features fall into 7 broad categories: instrumentation, musical texture, rhythm, dynamics, pitch statistics, melody and chords.
Histogram aggregators are often used: rhythm, pitch, pitch class, melody, vertical interval and chord histograms.
29/47
jMIR jWebMiner
Extracts cultural features from the web using web services (Google, Yahoo!). Calculates the co-occurrence and cross tabulation of metadata fields: e.g. how often does Bach co-occur on a web page with Baroque, compared to Stravinsky? Currently in alpha development.
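A co-occurrence score of this general kind can be sketched from search-engine hit counts. The normalization below (joint hits divided by the term's solo hits) and the hypothetical hit counts in the test are assumptions for illustration; jWebMiner's actual scoring and weighting may differ:

```java
// Sketch of a web co-occurrence score computed from hit counts, in the
// spirit of jWebMiner. The normalization used here is an assumption.
public class CoOccurrence {

    // Fraction of the pages mentioning `term` that also mention a class label.
    static double score(long jointHits, long termHits) {
        return termHits == 0 ? 0.0 : (double) jointHits / termHits;
    }

    // Index of the class label whose co-occurrence score with the term is
    // highest, given one joint hit count per candidate class.
    static int bestClass(long[] jointHits, long termHits) {
        int best = 0;
        for (int i = 1; i < jointHits.length; i++)
            if (score(jointHits[i], termHits) > score(jointHits[best], termHits))
                best = i;
        return best;
    }
}
```

For the Bach example above, one would compare score(hits("Bach" AND "Baroque"), hits("Bach")) against the corresponding score for Stravinsky, and assign the composer to the period with the higher score.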
[Diagram: Feature Extraction — jMIR jAudio, jSymbolic and jWebMiner extracting Low-Level, High-Level and Cultural Features (Extracted Feature Values) from Audio Recordings, Symbolic Recordings and Cultural Information]
30/47
jWebMiner’s functionality
- Parses search terms from iTunes, ACE XML, Weka ARFF or text files
- Can assign higher weights to particular sites, e.g. All Music, Wikipedia, Pitchfork, etc.
- Can enforce filter words, e.g. a site must include the word “music” to be considered
31/47
Feature extraction summary
[Diagram: Feature Extraction — jMIR jAudio, jSymbolic and jWebMiner extracting Low-Level, High-Level and Cultural Features (Extracted Feature Values) from Audio Recordings, Symbolic Recordings and Cultural Information]
32/47
Machine learning
[Diagram: basic classification tasks — Music, Musical Data Collection, Metadata Analysis, Metadata, Feature Extraction, Classifier Training, Machine Learning, Classifications]
33/47
Some types of machine learning
- Supervised: learners trained on labeled model instances
- Unsupervised: examines instances in terms of internal similarities rather than externally provided labels
- Ensemble: multiple classifiers work together, hopefully performing better overall than individually
[Diagram: Machine Learning — Supervised, Unsupervised and Ensemble Algorithms]
34/47
Input to machine learning systems
- Extracted feature values serve as the percepts of classifiers.
- Ground truth is needed by supervised learners.
- A class ontology (a structured set of relationships between classes) is sometimes used; some learners can capitalize on this structuring. A long-term goal is to allow arbitrary ontologies in jMIR.
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology feeding Supervised, Unsupervised and Ensemble Algorithms]
35/47
Training and testing sets
Data is segmented into training and testing sets if classifiers need to be trained, to avoid overtraining (failure to generalize from training instance features to those of the general instance population). Feature values are simply passed on if training is not needed.
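A minimal sketch of such a segmentation, assuming a simple shuffled percentage split; the 70/30 fraction and fixed seed in the usage below are arbitrary illustrative choices, not jMIR defaults:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of segmenting labeled instances into training and testing sets.
public class DataSplitter {

    // Shuffle instance indices with a fixed seed for reproducibility, then
    // cut them into a training portion and a testing portion.
    static int[][] split(int numInstances, double trainFraction, long seed) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) ids.add(i);
        Collections.shuffle(ids, new Random(seed));
        int cut = (int) Math.round(trainFraction * numInstances);
        int[] train = new int[cut];
        int[] test = new int[numInstances - cut];
        for (int i = 0; i < cut; i++) train[i] = ids.get(i);
        for (int i = cut; i < numInstances; i++) test[i - cut] = ids.get(i);
        return new int[][] { train, test };
    }
}
```

e.g. `DataSplitter.split(10, 0.7, 42L)` yields a 7-instance training set and a 3-instance testing set with no overlap, so testing accuracy measures generalization rather than memorization.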
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify, feeding Supervised, Unsupervised and Ensemble Algorithms]
36/47
Dimensionality reduction algorithms
Too many features can degrade classifier performance (the “curse of dimensionality”), while too few features can fail to encapsulate sufficient information. Dimensionality reduction algorithms automatically find a good lower-dimensional subset or projection of the given features.
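As a toy illustration of keeping an informative lower-dimensional feature subset, the sketch below retains the k highest-variance features. This filter is far simpler than methods such as PCA or genetic feature selection and is not part of jMIR; it only illustrates the idea of discarding uninformative dimensions:

```java
// Toy dimensionality reduction: keep the k features with the highest
// variance across instances. Purely illustrative.
public class VarianceFilter {

    // rows = instances, columns = features; returns the indices of the k
    // features with the highest (unnormalized) variance across instances.
    static int[] topKByVariance(double[][] data, int k) {
        int nFeatures = data[0].length;
        double[] variance = new double[nFeatures];
        for (int f = 0; f < nFeatures; f++) {
            double mean = 0.0;
            for (double[] row : data) mean += row[f];
            mean /= data.length;
            for (double[] row : data) variance[f] += (row[f] - mean) * (row[f] - mean);
        }
        int[] selected = new int[k];
        boolean[] used = new boolean[nFeatures];
        for (int s = 0; s < k; s++) {      // repeatedly take the largest unused
            int best = -1;
            for (int f = 0; f < nFeatures; f++)
                if (!used[f] && (best < 0 || variance[f] > variance[best])) best = f;
            selected[s] = best;
            used[best] = true;
        }
        return selected;
    }
}
```

A feature that never varies across instances carries no class information, so it is the first to be dropped by a filter like this.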
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify; Dimensionality Reduction Algorithms; Supervised, Unsupervised and Ensemble Algorithms]
37/47
Output of machine learning systems
- Classifications of instances are output if no supervised training is needed.
- Metalearners can be used to choose appropriate classifier(s), since each algorithm has its own strengths and weaknesses.
- Training output consists of evaluations of each algorithm as well as the trained classifiers.
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify; Dimensionality Reduction Algorithms; Supervised, Unsupervised and Ensemble Algorithms; outputs are Classification Results, Algorithm Evaluations and Trained Classifiers]
38/47
jMIR ACE
ACE is jMIR’s classifier and metalearner. It automatically experiments with and selects classifier(s), trains classifiers, and classifies novel instances.
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify; Dimensionality Reduction Algorithms; jMIR ACE spanning the Supervised, Unsupervised and Ensemble Algorithms; outputs are Classification Results, Algorithm Evaluations and Trained Classifiers]
39/47
Algorithms experimented with by ACE
- Classifiers: induction trees, naive Bayes, k-nearest neighbour, neural networks, support vector machines. Classifier parameters are also varied automatically.
- Dimensionality reduction: principal component analysis, exhaustive searches, feature selection using genetic algorithms.
- Classifier ensembles: bagging, boosting.
- Additional algorithms will be added in the future, including unsupervised learning algorithms.
Researchers are encouraged to add their own algorithms; ACE, like all jMIR components, emphasizes extensibility. ACE utilizes the Weka general pattern recognition library.
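One of the simplest classifiers in the families listed above, a 1-nearest-neighbour rule, can be sketched in a few lines. A real ACE run delegates to Weka's implementations; this standalone version only illustrates the mapping from feature vectors to class labels:

```java
// Minimal 1-nearest-neighbour classifier: assign a query instance the class
// of the closest training instance in (squared) Euclidean feature distance.
public class NearestNeighbour {

    static int classify(double[][] trainFeatures, int[] trainLabels, double[] query) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trainFeatures.length; i++) {
            double d = 0.0;
            for (int j = 0; j < query.length; j++) {
                double diff = trainFeatures[i][j] - query[j];
                d += diff * diff;   // squared distance; ordering is unchanged
            }
            if (d < bestDist) {
                bestDist = d;
                best = trainLabels[i];
            }
        }
        return best;
    }
}
```

Each classifier family trades off differently between training cost, classification cost and accuracy, which is exactly why ACE's metalearner evaluates several candidates rather than committing to one in advance.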
40/47
Details of ACE
ACE evaluates algorithms in terms of classification accuracy, performance consistency, training complexity/time and classification complexity/time.
There are future plans to utilize distributed computing to spread out the computational burden, and to add the ability to impose limits on the time available for the ACE metalearner to come up with algorithm selections.
41/47
ACE’s interface
- Command line
- Java API
- GUI (in alpha development)
42/47
jMIR ACE XML files
ACE XML files allow jMIR components to communicate with each other, and allow jMIR output to be used by other software. To help ensure interoperability, jMIR components also produce and parse Weka ARFF files.
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify; Dimensionality Reduction Algorithms; jMIR ACE spanning the Supervised, Unsupervised and Ensemble Algorithms; jMIR ACE XML Files; outputs are Classification Results, Algorithm Evaluations and Trained Classifiers]
43/47
Details of the ACE XML files
Information stored in ACE XML files:
- Feature values and information about features
- Model classifications and other metadata
- Class taxonomies (will be expanded to general ontologies in the future)
Advantages of ACE XML compared to general data mining file formats (e.g. Weka ARFF):
- Ability to assign multiple classes to individual instances
- Ability to classify both overall instances and their sub-sections
- Maintenance of logical groupings of multi-dimensional features
- Maintenance of internal identifying metadata about instances
- Ability to represent taxonomical class structures
44/47
Machine learning summary
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify; Dimensionality Reduction Algorithms; jMIR ACE spanning the Supervised, Unsupervised and Ensemble Algorithms; jMIR ACE XML Files; outputs are Classification Results, Algorithm Evaluations and Trained Classifiers]
45/47
Overview of jMIR
[Diagram: jMIR and its components — Codaich, the Bodhidharma MIDI Database and jMusicMetaManager handle Audio Music, Symbolic Music and the Internet; jAudio, jSymbolic and jWebMiner extract features; ACE and ACE XML Files produce Classification Output, Algorithm Evaluations and Trained Classifiers; all mapped onto the basic classification tasks (Musical Data Collection, Metadata Analysis, Feature Extraction, Classifier Training, Machine Learning, Classifications)]
46/47
Goals of jMIR
- Make sophisticated pattern recognition technologies accessible to music researchers with both technical and non-technical backgrounds
- Increase cooperation between research groups
- Enable objective comparisons of algorithms
- Eliminate redundant duplication of effort
- Facilitate iterative development and sharing of new MIR technologies
- Facilitate research combining all 3 feature types: there is limited intersection between the information encapsulated by each type, so combining them has significant potential to improve classification performance
47/47
Contact information
Software available at: http://sourceforge.net/projects/jmir
e-mail: