Automatic Music Classification
Cory McKay
2/47
Introduction
Many areas of research in music information retrieval (MIR) involve using computers to classify music in various ways:
- Genre or style classification
- Mood classification
- Performer or composer identification
- Music recommendation
- Playlist generation
- Hit prediction
- Audio-to-symbolic transcription
- etc.
Such areas often share similar central procedures.
3/47
Fundamental music classification tasks (1/3)
- Musical data collection: the instances (basic entities) to classify. Audio recordings, scores, cultural data, etc.
- Feature extraction: features represent characteristic information about instances. They must provide sufficient information to distinguish instances among classes (categories).
- Machine learning: algorithms (“classifiers” or “learners”) learn to associate feature patterns of instances with their classes.
[Diagram: basic classification tasks — Music → Musical Data Collection → Feature Extraction → Machine Learning → Classifications]
4/47
Fundamental music classification tasks (2/3)
- Many classification tasks require metadata about instances: title, composer, performer, genre, date, etc.
- Metadata must be validated and corrected: raw information found in ID3 tags, Gracenote CDDB, etc. is often erroneous and inconsistent.
[Diagram: basic classification tasks — Music, Musical Data Collection, Metadata Analysis, Feature Extraction, Machine Learning, Classifications]
5/47
Fundamental music classification tasks (3/3)
- Supervised learning requires training: correctly labeled model instances (“ground truth”) are used to teach classifiers to associate certain feature patterns with desired classes.
- Trained classifiers can then classify novel instances.
- The success of classifiers depends on the quality of the ground truth. It is therefore essential that the metadata labeling of the musical data be accurate.
[Diagram: basic classification tasks — Music, Musical Data Collection, Metadata Analysis, Metadata, Feature Extraction, Classifier Training, Machine Learning, Classifications]
6/47
Consolidating fundamental tasks
Properly performing these tasks requires significant effort and knowledge in (at least) data mining, signal processing and musicology.
Result: naïve or improperly performed research; duplication of effort; reluctance to use automatic music classification in musicological or other research where it could be useful.
Solution: standardized MIR research software, which makes automatic music classification technology available to researchers in many disciplines.
7/47
Existing MIR software
Only a few MIR software systems have been built for use by other researchers (e.g. Marsyas and M2K). They tend to focus primarily on particular sub-tasks (e.g. audio feature extraction), are not typically well integrated with other systems, do not sufficiently emphasize extensibility, and typically have usability problems (installation and licensing issues, poor documentation).
Result: emphasis on existing techniques rather than development of new approaches; difficulties in integrating research between labs; inaccessibility to non-technical music researchers.
8/47
jMIR
jMIR has been developed to meet the need for standardized MIR research software. It has a separate software component to address each important aspect of automatic music classification. Each component can be used independently, and combinations of components can be used as an integrated whole.
Its architecture emphasizes providing an extensible platform for iteratively developing new techniques and algorithms, but it can also be used directly as is. Interfaces are designed for both technical and non-technical users. jMIR is well documented, free and open source, with a cross-platform Java implementation.
9/47
Musical data collection
[Diagram: basic classification tasks — Music, Musical Data Collection, Metadata Analysis, Metadata, Feature Extraction, Classifier Training, Machine Learning, Classifications]
10/47
Types of musical data
- Audio recordings: sampled sound. Wave, MP3, AAC, etc.
- Symbolic recordings: abstract musical instructions. Scores, MIDI, Humdrum, etc.
- Cultural information: information external to the musical content itself, e.g. playlists, album reviews, Billboard stats, etc. Based on web searches, surveys, expert opinion, etc.
[Diagram: Musical Data Collection — Symbolic Recordings (MIDI, scores, Humdrum, etc.), Audio Recordings (MP3, AAC, Wave, etc.), Cultural Information (web, surveys, experts, etc.)]
11/47
Connections between data types
- Automatic transcription technologies are increasingly making it possible to automatically generate symbolic recordings from audio.
- Metadata annotations are necessary for linking cultural information with particular recordings.
[Diagram: Musical Data Collection — Symbolic Recordings, Audio Recordings and Cultural Information, linked by Transcription and Metadata]
12/47
jMIR Codaich
Codaich is a research database of labeled MP3 recordings, for use in training and testing algorithms. There are plans to eventually include additional format types in Codaich, including symbolic formats.
[Diagram: Musical Data Collection — Symbolic Recordings, Audio Recordings and Cultural Information, linked by Transcription and Metadata, with jMIR Codaich]
13/47
Sharing Codaich
Codaich is intended to provide a common knowledge base that researchers in different labs can use to compare the effectiveness of their varying approaches.
Overcoming copyright limitations on distributing music: the On-demand Feature Extraction Network (OMEN), implemented by Daniel McEnnis. Researchers use distributed computing and the jMIR jAudio feature extractor to request local feature extraction at sites (e.g. libraries) that have legal access to individual recordings. jAudio and OMEN allow custom original features and extraction parameters.
14/47
Statistics on Codaich
- 27,305 MP3 recordings (constantly growing)
- 2247 artists
- 55 genres: popular, classical, jazz and “world”
- 19 metadata fields
15/47
jMIR Bodhidharma MIDI Database
A collection of labeled MIDI recordings: 950 recordings, 38 genres.
[Diagram: Musical Data Collection — Symbolic Recordings, Audio Recordings and Cultural Information, linked by Transcription and Metadata, with jMIR Codaich and the jMIR Bodhidharma MIDI Database]
16/47
jMIR jMusicMetaManager
Metadata found with recordings is typically problematic: inconsistent and error-prone. jMusicMetaManager is software that automatically analyzes metadata across recordings. It is currently used to maintain Codaich, and there are plans to adapt it to MIDI as well.
[Diagram: Musical Data Collection — Symbolic Recordings, Audio Recordings and Cultural Information, linked by Transcription and Metadata, with jMIR Codaich, the jMIR Bodhidharma MIDI Database and jMIR jMusicMetaManager]
17/47
Tasks performed by jMusicMetaManager
- Detects differing metadata values that should in fact be the same: e.g. in a performer identification task, “Charlie Mingus” should not be misclassified as a different performer than “Mingus, Charles”.
- Detects redundant copies of recordings, which could contaminate test sets.
- Generates inventory and statistical profile reports (39 reports in all).
18/47
How jMusicMetaManager works
- Calculates edit distance between pairs of field values, with a threshold based on field lengths
- Performs 23 additional pre-processing equivalency operations
- Considers varied word orderings and word subsets
- Applies false error filtering
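The edit-distance step can be sketched in Java. This is only a minimal illustration: the 0.25 threshold fraction and lowercase normalization are illustrative assumptions, not jMusicMetaManager's actual rules, and catching reordered names such as “Mingus, Charles” would additionally require the word-reordering operations mentioned above.

```java
// Sketch of near-duplicate metadata detection via edit distance, in the
// spirit of jMusicMetaManager. The length-relative threshold heuristic is
// an illustrative assumption, not the tool's actual rule.
public class MetadataMatcher {

    // Standard Levenshtein edit distance between two strings.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    // Flag two field values as probable duplicates when their edit distance
    // is small relative to the longer value's length.
    static boolean probableDuplicate(String a, String b, double maxFraction) {
        int longer = Math.max(a.length(), b.length());
        return editDistance(a.toLowerCase(), b.toLowerCase())
               <= (int) (maxFraction * longer);
    }
}
```

With a threshold fraction of 0.25, “Charlie Mingus” and “Charles Mingus” (edit distance 2) would be flagged as the same performer, while unrelated names would not.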
19/47
jMusicMetaManager’s I/O
- Parses metadata from Apple iTunes XML or MP3 ID3 tags (and Gracenote CDDB, indirectly)
- Can export to ACE XML or Weka ARFF
- Generates reports in frames-based HTML
20/47
Musical data collection summary
[Diagram: Musical Data Collection — Symbolic Recordings, Audio Recordings and Cultural Information, linked by Transcription and Metadata, with jMIR Codaich, the jMIR Bodhidharma MIDI Database and jMIR jMusicMetaManager]
21/47
Feature extraction
[Diagram: basic classification tasks — Music, Musical Data Collection, Metadata Analysis, Metadata, Feature Extraction, Classifier Training, Machine Learning, Classifications]
22/47
Types of features
- Low-level: associated with signal processing and basic auditory perception, e.g. spectral flux or RMS. Usually not intuitively musical.
- High-level: musical abstractions, e.g. meter or pitch class distributions.
- Cultural: sociocultural information outside the scope of auditory or musical content, e.g. playlist co-occurrence or purchase correlations.
[Diagram: Feature Extraction — Low-Level Features, High-Level Features, Cultural Features]
23/47
jMIR jAudio
Implemented jointly with Daniel McEnnis. Extracts features from audio files (MP3, WAV, AIFF, AU, SND). 28 bundled core features, mainly low-level, some high-level.
[Diagram: Feature Extraction — jMIR jAudio extracting feature values from Audio Recordings; Low-Level, High-Level and Cultural Features; Extracted Feature Values]
24/47
Developing features with jAudio
Two general ways of using jAudio:
- Directly as an audio feature extractor
- As a platform for developing and sharing new features, which can be independent features or based on existing features
New features are added using a modular plugin interface. jAudio (like all jMIR feature extractors) automatically calculates feature dependencies and scheduling at runtime.
25/47
Metafeatures and aggregators
jAudio automatically calculates “metafeatures” of new or existing features, e.g. running means, standard deviations or derivatives across sample windows.
jAudio also automatically calculates “aggregators” for new or existing features: functions that collapse a sequence of feature vectors into a single vector or smaller sequence of vectors. These are useful for representing in a low-dimensional way how different features change together. e.g. the Area of Moments aggregator transforms a set of feature vectors into a two-dimensional image matrix and calculates two-dimensional moments.
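The ideas of metafeatures and aggregators can be sketched in a few lines of Java. The block-wise windowing and the simple [mean, standard deviation] aggregator below are illustrative simplifications, not jAudio's actual implementations:

```java
// Illustrative sketches of a metafeature (running means over windows) and
// an aggregator (collapsing a feature-value sequence to a single vector).
public class FeatureAggregation {

    // Metafeature sketch: mean of each consecutive block of `window` values
    // (any trailing partial block is dropped).
    static double[] runningMeans(double[] values, int window) {
        double[] means = new double[values.length / window];
        for (int i = 0; i < means.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < window; j++) sum += values[i * window + j];
            means[i] = sum / window;
        }
        return means;
    }

    // Aggregator sketch: collapse a per-window feature sequence to
    // the two-element vector [mean, population standard deviation].
    static double[] meanAndStd(double[] values) {
        double mean = 0.0;
        for (double v : values) mean += v;
        mean /= values.length;
        double var = 0.0;
        for (double v : values) var += (v - mean) * (v - mean);
        return new double[] { mean, Math.sqrt(var / values.length) };
    }
}
```

A metafeature still produces a sequence (one value per window group), whereas an aggregator reduces the whole sequence to a fixed-size vector suitable for a classifier.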
26/47
Using jAudio
- Customizable extraction parameters: window size and overlap, normalization, downsampling, individual feature parameters
- Records and synthesizes audio
- Converts MIDI to audio
- Displays audio in both the time and frequency domains
27/47
jMIR jSymbolic
Extracts high-level features from MIDI files. 111 bundled features (currently being expanded to 160), many of them original.
[Diagram: Feature Extraction — jMIR jAudio and jMIR jSymbolic extracting feature values from Audio and Symbolic Recordings; Low-Level, High-Level and Cultural Features; Extracted Feature Values]
28/47
jSymbolic’s features
Features fall into 7 broad categories: instrumentation, musical texture, rhythm, dynamics, pitch statistics, melody and chords.
Histogram aggregators are often used: rhythm, pitch, pitch class, melody, vertical interval and chord histograms.
29/47
jMIR jWebMiner
Extracts cultural features from the web using web services (Google, Yahoo!). Calculates the co-occurrence and cross tabulation of metadata fields: e.g. how often does Bach co-occur on a web page with Baroque, compared to Stravinsky? Currently in alpha development.
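A co-occurrence score of this general kind can be sketched from search-engine hit counts. The normalization below (joint hits divided by the term's solo hits) and the hypothetical hit counts in the test are assumptions for illustration; jWebMiner's actual scoring and weighting may differ:

```java
// Sketch of a web co-occurrence score computed from hit counts, in the
// spirit of jWebMiner. The normalization used here is an assumption.
public class CoOccurrence {

    // Fraction of the pages mentioning `term` that also mention a class label.
    static double score(long jointHits, long termHits) {
        return termHits == 0 ? 0.0 : (double) jointHits / termHits;
    }

    // Index of the class label whose co-occurrence score with the term is
    // highest, given one joint hit count per candidate class.
    static int bestClass(long[] jointHits, long termHits) {
        int best = 0;
        for (int i = 1; i < jointHits.length; i++)
            if (score(jointHits[i], termHits) > score(jointHits[best], termHits))
                best = i;
        return best;
    }
}
```

For the Bach example above, one would compare score(hits("Bach" AND "Baroque"), hits("Bach")) against the corresponding score for Stravinsky, and assign the composer to the period with the higher score.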
[Diagram: Feature Extraction — jMIR jAudio, jSymbolic and jWebMiner extracting Low-Level, High-Level and Cultural Features (Extracted Feature Values) from Audio Recordings, Symbolic Recordings and Cultural Information]
30/47
jWebMiner’s functionality
- Parses search terms from iTunes, ACE XML, Weka ARFF or text files
- Can assign higher weights to particular sites, e.g. All Music, Wikipedia, Pitchfork, etc.
- Can enforce filter words, e.g. a site must include the word “music” to be considered
31/47
Feature extraction summary
[Diagram: Feature Extraction — jMIR jAudio, jSymbolic and jWebMiner extracting Low-Level, High-Level and Cultural Features (Extracted Feature Values) from Audio Recordings, Symbolic Recordings and Cultural Information]
32/47
Machine learning
[Diagram: basic classification tasks — Music, Musical Data Collection, Metadata Analysis, Metadata, Feature Extraction, Classifier Training, Machine Learning, Classifications]
33/47
Some types of machine learning
- Supervised: learners trained on labeled model instances
- Unsupervised: examines instances in terms of internal similarities rather than externally provided labels
- Ensemble: multiple classifiers work together, hopefully performing better overall than individually
[Diagram: Machine Learning — Supervised, Unsupervised and Ensemble Algorithms]
34/47
Input to machine learning systems
- Extracted feature values serve as the percepts of classifiers.
- Ground truth is needed by supervised learners.
- A class ontology (a structured set of relationships between classes) is sometimes used; some learners can capitalize on this structuring. A long-term goal is to allow arbitrary ontologies in jMIR.
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology feeding Supervised, Unsupervised and Ensemble Algorithms]
35/47
Training and testing sets
Data is segmented into training and testing sets if classifiers need to be trained, to avoid overtraining (failure to generalize from training instance features to those of the general instance population). Feature values are simply passed on if training is not needed.
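A minimal sketch of such a segmentation, assuming a simple shuffled percentage split; the 70/30 fraction and fixed seed in the usage below are arbitrary illustrative choices, not jMIR defaults:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of segmenting labeled instances into training and testing sets.
public class DataSplitter {

    // Shuffle instance indices with a fixed seed for reproducibility, then
    // cut them into a training portion and a testing portion.
    static int[][] split(int numInstances, double trainFraction, long seed) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) ids.add(i);
        Collections.shuffle(ids, new Random(seed));
        int cut = (int) Math.round(trainFraction * numInstances);
        int[] train = new int[cut];
        int[] test = new int[numInstances - cut];
        for (int i = 0; i < cut; i++) train[i] = ids.get(i);
        for (int i = cut; i < numInstances; i++) test[i - cut] = ids.get(i);
        return new int[][] { train, test };
    }
}
```

e.g. `DataSplitter.split(10, 0.7, 42L)` yields a 7-instance training set and a 3-instance testing set with no overlap, so testing accuracy measures generalization rather than memorization.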
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify, feeding Supervised, Unsupervised and Ensemble Algorithms]
36/47
Dimensionality reduction algorithms
Too many features can degrade classifier performance (the “curse of dimensionality”), while too few features can fail to encapsulate sufficient information. Dimensionality reduction algorithms automatically find a good lower-dimensional subset or projection of the given features.
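As a toy illustration of keeping an informative lower-dimensional feature subset, the sketch below retains the k highest-variance features. This filter is far simpler than methods such as PCA or genetic feature selection and is not part of jMIR; it only illustrates the idea of discarding uninformative dimensions:

```java
// Toy dimensionality reduction: keep the k features with the highest
// variance across instances. Purely illustrative.
public class VarianceFilter {

    // rows = instances, columns = features; returns the indices of the k
    // features with the highest (unnormalized) variance across instances.
    static int[] topKByVariance(double[][] data, int k) {
        int nFeatures = data[0].length;
        double[] variance = new double[nFeatures];
        for (int f = 0; f < nFeatures; f++) {
            double mean = 0.0;
            for (double[] row : data) mean += row[f];
            mean /= data.length;
            for (double[] row : data) variance[f] += (row[f] - mean) * (row[f] - mean);
        }
        int[] selected = new int[k];
        boolean[] used = new boolean[nFeatures];
        for (int s = 0; s < k; s++) {      // repeatedly take the largest unused
            int best = -1;
            for (int f = 0; f < nFeatures; f++)
                if (!used[f] && (best < 0 || variance[f] > variance[best])) best = f;
            selected[s] = best;
            used[best] = true;
        }
        return selected;
    }
}
```

A feature that never varies across instances carries no class information, so it is the first to be dropped by a filter like this.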
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify; Dimensionality Reduction Algorithms; Supervised, Unsupervised and Ensemble Algorithms]
37/47
Output of machine learning systems
- Classifications of instances are output if no supervised training is needed.
- Metalearners can be used to choose appropriate classifier(s), since each algorithm has its own strengths and weaknesses.
- Training output consists of evaluations of each algorithm as well as the trained classifiers.
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify; Dimensionality Reduction Algorithms; Supervised, Unsupervised and Ensemble Algorithms; outputs are Classification Results, Algorithm Evaluations and Trained Classifiers]
38/47
jMIR ACE
ACE is jMIR’s classifier and metalearner. It automatically experiments with and selects classifier(s), trains classifiers, and classifies novel instances.
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify; Dimensionality Reduction Algorithms; jMIR ACE spanning the Supervised, Unsupervised and Ensemble Algorithms; outputs are Classification Results, Algorithm Evaluations and Trained Classifiers]
39/47
Algorithms experimented with by ACE
- Classifiers: induction trees, naive Bayes, k-nearest neighbour, neural networks, support vector machines. Classifier parameters are also varied automatically.
- Dimensionality reduction: principal component analysis, exhaustive searches, feature selection using genetic algorithms.
- Classifier ensembles: bagging, boosting.
- Additional algorithms will be added in the future, including unsupervised learning algorithms.
Researchers are encouraged to add their own algorithms; ACE, like all jMIR components, emphasizes extensibility. ACE utilizes the Weka general pattern recognition library.
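One of the simplest classifiers in the families listed above, a 1-nearest-neighbour rule, can be sketched in a few lines. A real ACE run delegates to Weka's implementations; this standalone version only illustrates the mapping from feature vectors to class labels:

```java
// Minimal 1-nearest-neighbour classifier: assign a query instance the class
// of the closest training instance in (squared) Euclidean feature distance.
public class NearestNeighbour {

    static int classify(double[][] trainFeatures, int[] trainLabels, double[] query) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trainFeatures.length; i++) {
            double d = 0.0;
            for (int j = 0; j < query.length; j++) {
                double diff = trainFeatures[i][j] - query[j];
                d += diff * diff;   // squared distance; ordering is unchanged
            }
            if (d < bestDist) {
                bestDist = d;
                best = trainLabels[i];
            }
        }
        return best;
    }
}
```

Each classifier family trades off differently between training cost, classification cost and accuracy, which is exactly why ACE's metalearner evaluates several candidates rather than committing to one in advance.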
40/47
Details of ACE
ACE evaluates algorithms in terms of classification accuracy, performance consistency, training complexity/time and classification complexity/time.
There are future plans to utilize distributed computing to spread out the computational burden, and to add the ability to impose limits on the time available for the ACE metalearner to come up with algorithm selections.
41/47
ACE’s interface
- Command line
- Java API
- GUI (in alpha development)
42/47
jMIR ACE XML files
ACE XML files allow jMIR components to communicate with each other, and allow jMIR output to be used by other software. To help ensure interoperability, jMIR components also produce and parse Weka ARFF files.
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify; Dimensionality Reduction Algorithms; jMIR ACE spanning the Supervised, Unsupervised and Ensemble Algorithms; jMIR ACE XML Files; outputs are Classification Results, Algorithm Evaluations and Trained Classifiers]
43/47
Details of the ACE XML files
Information stored in ACE XML files:
- Feature values and information about features
- Model classifications and other metadata
- Class taxonomies (will be expanded to general ontologies in the future)
Advantages of ACE XML compared to general data mining file formats (e.g. Weka ARFF):
- Ability to assign multiple classes to individual instances
- Ability to classify both overall instances and their sub-sections
- Maintenance of logical groupings of multi-dimensional features
- Maintenance of internal identifying metadata about instances
- Ability to represent taxonomical class structures
44/47
Machine learning summary
[Diagram: Machine Learning — Extracted Features, Ground Truth and Class Ontology split into Training Sets, Testing Sets or Features to Classify; Dimensionality Reduction Algorithms; jMIR ACE spanning the Supervised, Unsupervised and Ensemble Algorithms; jMIR ACE XML Files; outputs are Classification Results, Algorithm Evaluations and Trained Classifiers]
45/47
Overview of jMIR
[Diagram: jMIR and its components — Codaich, the Bodhidharma MIDI Database and jMusicMetaManager handle Audio Music, Symbolic Music and the Internet; jAudio, jSymbolic and jWebMiner extract features; ACE and ACE XML Files produce Classification Output, Algorithm Evaluations and Trained Classifiers; all mapped onto the basic classification tasks (Musical Data Collection, Metadata Analysis, Feature Extraction, Classifier Training, Machine Learning, Classifications)]
46/47
Goals of jMIR
- Make sophisticated pattern recognition technologies accessible to music researchers with both technical and non-technical backgrounds
- Increase cooperation between research groups
- Enable objective comparisons of algorithms
- Eliminate redundant duplication of effort
- Facilitate iterative development and sharing of new MIR technologies
- Facilitate research combining all 3 feature types: there is limited intersection between the information encapsulated by each type, so combining them has significant potential to improve classification performance
47/47
Contact information
Software available at: http://sourceforge.net/projects/jmir
e-mail: