Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval
Jay LeBoeuf (jay at izotope.com)
Leigh Smith (lsmith at izotope.com)
Steve Tjoa (kiemyang@gmail.com)
June 2013
DAY 1
Administration
• https://ccrma.stanford.edu/wiki/MIR_workshop_2013
• Getting set up at CCRMA
  - Parking: Tresidder ($12/day), CCRMA ($12/day daily permits)
  - Building access, logins, laptops
  - Wifi (lecturers can log you in as guest)
• Online access + lectures, video, materials
• Daily schedule
  - 9930
  - Contact us if you know you'll be late
• Introductions
  - Our background
  - A little about yourself
  - Your area of interest, background with DSP, coding, Matlab, and any specific items of interest that you'd like to see covered
  - Will you be using your own laptop for the lab?
    • If so, do you have Matlab? (Mac/PC)
Introduction: Jay LeBoeuf
• Cornell University, Electrical Engineering
• Stanford University, CCRMA (MA/MST)
• Digidesign/Avid
  - ATG: enter ML and MIR
• Imagine Research
  - Pandora, Line 6, Lucasfilm, iZotope
  - NSF SBIR Phase I, II, IIB
  - Businessweek Innovator, November 2012
• CCRMA MIR Workshop 2008-2013 (6th Year)
• iZotope
Example: Seed…
Problems
1. Computers are deaf
2. Content is overwhelming and unsearchable
Problems
• Text metadata only (Name, Creator, User Tags)
• The state of the art is "search by text"
Solution: Add ears to computers
[Figure labels: Male Speech; Female Speech; Applause; Baby Cry; Music (Rock); Sound signature; Genre; Tags; Tempo; Beats; Groove; Smart chapter markers; Intelligent FFWD; Machine learning; Content analysis]
Trends
1. Ubiquitous Content
2. Machine Learning
3. Cloud Computing
[Figure labels: Metadata (XML, JSON); Organize; Search; Understand]
Discover and interact with content
• Audio search, similarity search, "find similar": "Find me sound effects that sound like this"
• "distorted guitar"
• "Find me songs that sound like this" ("find similar")
• James Brown - The Payback: Original, Groove, Remixes (Graphic: Clint Bajakian)
• Remixes with Echo Nest Remix API: http://thewubmachine.com and http://static.echonest.com/girltalkinabox
Why MIR?
• content-based querying and retrieval, indexing (tagging, similarity)
• fingerprinting and digital rights management
• music recommendation and playlist generation
• music transcription and annotation
• score following and audio alignment
• automatic classification
• rhythm, beat, tempo, and form
• harmony, chords, and tonality
• timbre, instrumentation, genre, emotion, style, and mood analysis
• music summarization
Commercial Applications
• Music recommendation and metadata APIs
  - Gracenote, Echo Nest, Rovi, BMAT, Bach Technology, Moodagent, Clio Music
• Audio fingerprinting (user-driven, broadcast monitoring and copyrightbma, 2nd screen)
  - SoundHound, Shazam, Audible Magic, EchoNest, BMAT, Gracenote, Civolution, Digimarc
  - 19 Billion Transactions
• Pitch and rhythm tracking, analysis
  - Algorithms in Guitar Hero, Rock Band
  - Skore (BMAT)
• DAW products that include beat/tempo/key/note analysis
  - Ableton Live, Melodyne, Mixed In Key
• Innovative software for music creation
  - Khush, UJAM, Songsmith, VoiceBand
• Dynamic Music
  - SongCruncher
• QBH
  - SoundHound (Madonna example)
• Music Visualization & Discover
  - SpectralMind's Sonarflow, Mufin audiovision and mufinplayer
• Music players with recommendation
  - Apple Genius, Google Instant Mix
• Broadcast monitoring
  - Audible Magic, Clustermedia Labs
This week…
• Day 1: Overview, Features + Classification
  - MIR overview, basic feature extraction, classification, time domain features
• Day 2: Musical Analysis: Beat, Rhythm, Pitch and Chroma Analysis
  - Frequency domain features, beat, onset, rhythm
• Day 3: Machine Learning: Clustering and Classification
  - Clustering and classification (PCA, LDA, k-means, GMM, SVM)
  - Guest speaker: Gracenote
• Day 4: Music Information Retrieval in Polyphonic Mixtures
  - Guest: Adobe Research
• Day 5: Information Retrieval Metrics, Evaluation, Real World Considerations
  - Guest: DJZ
Questions? Concerns?
INTRO: BASIC SYSTEM OVERVIEW
Basic system overview
• Segmentation (Frames, Onsets, Beats, Bars, Chord Changes, etc.)
• Feature Extraction (Time-based, spectral energy, MFCC, etc.)
• Analysis / Decision Making (Classification, Clustering, etc.)
TIMING AND SEGMENTATION
Timing and Segmentation
• Slicing up by fixed time slices…
  - 1 second, 80 ms, 100 ms, 20-40 ms, etc.
  - Usually 50% overlap with a window
  - [Figure: a signal divided into frames of 1 second each]
• "Frames"
  - Different problems call for different frame lengths
• Onset detection
• Beat detection
  - Beat
  - Measure / Bar / Harmonic changes
• Segments
  - Musically relevant boundaries
  - Separate by some perceptual cue
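Fixed-length framing with overlap can be sketched in a few lines. This is a minimal illustration, not the workshop's lab code: the function name `frame_signal`, the 8 kHz sample rate, the 40 ms frame length, and the choice of a Hann window are all assumptions made for the example.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=40.0, overlap=0.5):
    """Split samples into fixed-length, Hann-windowed, overlapping frames."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    # With 50% overlap the hop (distance between frame starts) is half a frame.
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    # The Hann window tapers each frame toward zero at its edges.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames, hop

# One second of a 440 Hz sine at an assumed 8 kHz sample rate.
sr = 8000
signal = [math.sin(2.0 * math.pi * 440.0 * n / sr) for n in range(sr)]
frames, hop = frame_signal(signal, sr, frame_ms=40.0, overlap=0.5)
print(len(frames), len(frames[0]), hop)
```

Shorter frames give finer time resolution and longer frames give finer frequency resolution, which is one reason different problems call for different frame lengths.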
OVER TO YOU, LEIGH
Example Feature Vector
ZCR, Centroid, Bandwidth, Skew
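As a rough sketch of how such a four-element feature vector might be computed for one frame: the slide names the features but does not define them, so the definitions below (zero crossing rate as the fraction of sign changes; centroid, bandwidth, and skew as the first three moments of the magnitude spectrum) are common conventions assumed here. The function names and test signals are hypothetical, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but fine for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectral_moments(frame, sample_rate):
    """Centroid (Hz), bandwidth (Hz), and skewness of the magnitude spectrum."""
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / len(frame) for k in range(len(mags))]
    total = sum(mags)
    if total == 0.0:
        return 0.0, 0.0, 0.0  # silent frame
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    variance = sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total
    bandwidth = math.sqrt(variance)
    if bandwidth == 0.0:
        return centroid, 0.0, 0.0
    third = sum((f - centroid) ** 3 * m for f, m in zip(freqs, mags)) / total
    return centroid, bandwidth, third / bandwidth ** 3

def feature_vector(frame, sample_rate):
    """[ZCR, centroid, bandwidth, skew] for one frame."""
    centroid, bandwidth, skew = spectral_moments(frame, sample_rate)
    return [zero_crossing_rate(frame), centroid, bandwidth, skew]

# Two hypothetical test frames: a dark tone pair and a bright tone pair.
sr = 8000
n = 256
low = [math.sin(2 * math.pi * 250.0 * i / sr)
       + 0.5 * math.sin(2 * math.pi * 500.0 * i / sr) for i in range(n)]
high = [math.sin(2 * math.pi * 2000.0 * i / sr)
        + 0.5 * math.sin(2 * math.pi * 3000.0 * i / sr) for i in range(n)]
fv_low = feature_vector(low, sr)
fv_high = feature_vector(high, sr)
print(fv_low)
print(fv_high)
```

The brighter frame should show both a higher zero crossing rate and a higher centroid, which is what makes these features useful for telling sounds apart.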
ANALYSIS AND DECISION MAKING: HEURISTICS
Heuristic Analysis
• Example: "Cowbell" on just the snare drum of a drum loop. "Simple" instrument recognition!
• Use basic thresholds or a simple decision tree to form a rudimentary transcription of kicks and snares
• Time for more sophistication!
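A threshold-based labeler of the kind described above might look like the following sketch. The threshold values, the feature choice (ZCR and spectral centroid), and the per-hit measurements are all hypothetical; in practice they would be tuned by inspecting features from the actual drum loop.

```python
def label_drum_hit(zcr, centroid_hz, zcr_threshold=0.1, centroid_threshold_hz=1000.0):
    """Tiny decision tree: kicks are assumed dark, snares bright and noisy."""
    if centroid_hz < centroid_threshold_hz:
        return "kick"
    if zcr >= zcr_threshold:
        return "snare"
    return "unknown"

# Hypothetical (onset time in seconds, ZCR, spectral centroid in Hz) per detected hit.
hits = [(0.00, 0.02, 180.0), (0.50, 0.35, 3200.0),
        (1.00, 0.03, 210.0), (1.50, 0.31, 2900.0)]
transcription = [(t, label_drum_hit(z, c)) for t, z, c in hits]
# The onsets labeled "snare" are where a cowbell sample would be layered.
snare_times = [t for t, label in transcription if label == "snare"]
print(transcription)
print(snare_times)
```

Hand-set thresholds like these tend to break as soon as the recording conditions change, which motivates the learned classifiers that follow.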
ANALYSIS AND DECISION MAKING: INSTANCE-BASED CLASSIFIERS (k-NN)
k-NN
[Figure labels: Training…; TEST; TRAINING SET; "1"; "0"]
• Explanation…
Advantages:
• Training is trivial: just store the training samples
• Very simple to implement and use
Disadvantages:
• Classification gets computationally expensive with a lot of training data: must measure distance to all training samples
• Euclidean distance becomes problematic in high-dimensional spaces
• Can easily be "overfit"
We can improve computational efficiency by storing just the class prototypes
k-NN
• Steps
  - Measure distance to all points
  - Take the k closest
  - Majority rules (e.g., if k=5, then take 3 out of 5)
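The three steps above can be sketched directly. This is a minimal illustration assuming Euclidean distance and a small made-up two-dimensional training set; the function name `knn_classify` and the data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(query, training_points, training_labels, k=5):
    """Label a query point by majority vote among its k nearest training points."""
    # Step 1: measure the distance from the query to every training point.
    distances = [(math.dist(query, point), label)
                 for point, label in zip(training_points, training_labels)]
    # Step 2: take the k closest.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority rules.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: class 0 near (0, 0), class 1 near (1, 1).
train_points = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2),
                (1.0, 0.9), (0.9, 1.0), (0.8, 0.8)]
train_labels = [0, 0, 0, 1, 1, 1]
print(knn_classify((0.9, 0.8), train_points, train_labels, k=5))
print(knn_classify((0.1, 0.1), train_points, train_labels, k=5))
```

Note that every query scans the whole training set, which is the cost the disadvantages list refers to.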
…but… scaling is a good idea…
Scaling Example: linear scale to [-1, 1]
unscaled                                           | scaled
zcr       centroid  brightness  rolloff   kurtosis | zcr       centroid  brightness rolloff   kurtosis
2138.663  4461.262  0.5278183   574.1853  1.54E+08 | 0.567221  0.654219  0.485151   -0.91438  -0.95797
1420.046  3860.736  0.5266741   443.3475  1.03E+08 | -0.00683  0.396341  0.479636   -0.96526  -0.98848
1412.664  4095.468  0.3829324   1168.014  3.96E+08 | -0.01273  0.49714   -0.21313   -0.68342  -0.81426
1223.236  3551.842  0.5074382   435.1981  1.03E+08 | -0.16405  0.263696  0.386928   -0.96843  -0.98836
2680.428  5266.489  0.5354988   444.444   1.37E+08 | 1         1         0.522167   -0.96484  -0.96824
2304.393  4316.17   0.4682006   781.9634  2.28E+08 | 0.699611  0.591913  0.19782    -0.83357  -0.91439
1594.079  4597.727  0.5422522   573.3602  1.49E+08 | 0.13219   0.71282   0.554715   -0.9147   -0.96108
1900.388  4732.717  0.6346437   354.0374  83337090 | 0.37688   0.770787  1          -1        -1
2231.022  4914.268  0.5765587   376.5195  1.01E+08 | 0.641     0.848749  0.720057   -0.99126  -0.98934
1221.818  4323.917  0.2196666   5496.309  3.45E+09 | -0.16518  0.59524   -1         1         1
1018.673  2169.696  0.2944604   3511.451  1.48E+09 | -0.32746  -0.32983  -0.63953   0.228023  -0.17284
176.7722  1038.943  0.3765943   972.8746  3.18E+08 | -1        -0.81539  -0.24368   -0.75931  -0.86091
758.9167  609.0479  0.219855    2146.602  1.37E+09 | -0.53496  -1        -0.99909   -0.30281  -0.23807
656.2162  1982.091  0.3404355   1829.962  6.59E+08 | -0.617    -0.41039  -0.41795   -0.42596  -0.6582
563.9213  1309.174  0.4358942   1152.514  3.71E+08 | -0.69073  -0.69935  0.042118   -0.68945  -0.82931
1104.88   967.4398  0.378372    930.8457  3.04E+08 | -0.2586   -0.8461   -0.23511   -0.77566  -0.86889
423.1187  1113.061  0.2758021   3217.640  1.62E+09 | -0.80321  -0.78357  -0.72945   0.11375   -0.08884
717.5933  1638.758  0.3447308   1721.541  6.2E+08  | -0.56797  -0.55782  -0.39725   -0.46813  -0.68172
242.3024  1256.06   0.3870912   891.1627  2.73E+08 | -0.94765  -0.72216  -0.19309   -0.79109  -0.88744
474.6985  1240.683  0.2655689   3657.653  1.75E+09 | -0.76201  -0.72876  -0.77877   0.284886  -0.01055
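A linear scale to [-1, 1] maps each feature's minimum to -1 and its maximum to 1, so that no single feature (such as kurtosis, with values in the hundreds of millions) dominates the distance. The sketch below is one way to do it; the function names are hypothetical, and the sample rows are four (zcr, centroid) pairs from the slide's table, with decimal points placed by assumption since they are missing in the extracted text.

```python
def fit_minmax(rows):
    """Record the (min, max) of each feature column."""
    return [(min(column), max(column)) for column in zip(*rows)]

def scale_rows(rows, ranges):
    """Linearly map each column so its min becomes -1 and its max becomes 1."""
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread; map it to 0 to avoid dividing by zero.
            0.0 if hi == lo else 2.0 * (x - lo) / (hi - lo) - 1.0
            for x, (lo, hi) in zip(row, ranges)
        ])
    return scaled

# (zcr, centroid) pairs; decimal placement is an assumption.
rows = [
    (2138.663, 4461.262),
    (2680.428, 5266.489),
    (176.7722, 1038.943),
    (758.9167, 609.0479),
]
ranges = fit_minmax(rows)
scaled = scale_rows(rows, ranges)
for row in scaled:
    print([round(value, 6) for value in row])
```

The ranges should be fitted on the training data and then reused unchanged for test data, so both are mapped by the same transformation.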
k-NN
• Instance-based learning: training examples are stored directly, rather than used to estimate model parameters
• For two-class problems, generally choose an odd k so that the majority vote cannot tie (with more than two classes, ties are still possible)
Distance Classification
1. Find nearest neighbor
2. Find representative match via class prototype (e.g., center of group or mean of training data class)
Distance metric: most common is Euclidean distance
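The class-prototype variant can be sketched as follows, assuming the prototype is the per-class mean and the metric is Euclidean distance. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def class_prototypes(points, labels):
    """Compute one prototype per class: the mean of that class's training points."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {label: [sum(dim) / len(members) for dim in zip(*members)]
            for label, members in grouped.items()}

def nearest_prototype(query, prototypes):
    """Return the label whose prototype is closest to the query (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical training set: class 0 near (0, 0), class 1 near (1, 1).
train_points = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2),
                (1.0, 0.9), (0.9, 1.0), (0.8, 0.8)]
train_labels = [0, 0, 0, 1, 1, 1]
prototypes = class_prototypes(train_points, train_labels)
print(prototypes)
print(nearest_prototype((0.7, 0.6), prototypes))
```

Classification now costs one distance per class instead of one per training sample, at the price of assuming each class is reasonably summarized by a single center.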
> End of Lecture 1, Part 1
1 Computers are deaf 2 Content is overwhelming and unsearchable
Problems
Text metadata only (Name Creator User Tags)
The state of the art is ldquosearch by textrdquo
Solution Add ears to computers
Male Speech
Applause Baby Cry
Music (Rock)
Sound signature Smart chapter markers
Female Speech
Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD
Machine learning
Content analysis
Trends
1 Ubiquitous Content
2 Machine Learning
3 Cloud Computing
Find me sound effects that sound like this
Discover and interact with content
Audio Search similarity search find similar
Male Speech
Applause Baby Cry
Music (Rock)
Sound signature Smart chapter markers
Female Speech
Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD
Machine learning Content analysis
Metadata (XML JSON)
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
ldquodistorted guitarrdquo
Discover and interact with content
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
Find me songs that sound like this
find similar
Discover and interact with content
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
James Brown ndash The Payback
Original
Groove
Remixes
Graphic Clint Bajakian
Discover and interact with content
Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox
content-based querying and retrieval indexing (tagging similarity)
fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization
Why MIR
Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs
Commercial Applications
This weekhellip
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ
This weekhellip
Day 1
Day 2
Day 3
Day 4
Day 5
Questions Concerns
INTRO BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Basic system overview
Segmentation (Frames, Onsets, Beats, Bars, Chord Changes, etc.)
Feature Extraction (Time-based, spectral energy, MFCC, etc.)
Analysis / Decision Making (Classification, Clustering, etc.)
Basic system overview
TIMING AND SEGMENTATION
• Slicing up by fixed time slices…
  - 1 second, 80 ms, 100 ms, 20-40 ms, etc.
  - Usually 50% overlap with a window
• "Frames"
  - Different problems call for different frame lengths
• Onset detection
• Beat detection
  - Beat
  - Measure / Bar / Harmonic changes
• Segments
  - Musically relevant boundaries
  - Separate by some perceptual cue
Timing and Segmentation
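The fixed-slice framing described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `frame_signal`, the 8 kHz sample rate, the 40 ms frame length, and the choice of a Hann window are not from the slides. The slides only say "50% overlap with a window".

```python
import math

def frame_signal(samples, frame_length, overlap=0.5):
    """Split samples into fixed-length, windowed frames (hypothetical helper)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Hop size: how far the frame start advances each step (50% overlap -> half a frame)
    hop = max(1, int(round(frame_length * (1.0 - overlap))))
    # Periodic Hann window (an assumed choice) tapers the edges of every frame
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_length)
              for n in range(frame_length)]
    frames = []
    # Only complete frames are kept; a trailing partial frame is dropped
    for start in range(0, len(samples) - frame_length + 1, hop):
        chunk = samples[start:start + frame_length]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

sample_rate = 8000                              # Hz, assumed for the demo
frame_length = sample_rate * 40 // 1000         # 40 ms -> 320 samples
# One second of a 440 Hz test tone
signal = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate) for n in range(sample_rate)]
frames = frame_signal(signal, frame_length, overlap=0.5)
print(len(frames), "frames of", frame_length, "samples each")
```

Shorter frames track fast events such as onsets, and longer frames give better frequency resolution, which is one reason different problems call for different frame lengths.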
OVER TO YOU LEIGH
BASIC SYSTEM OVERVIEW
Segmentation (Frames, Onsets, Beats, Bars, Chord Changes, etc.)
Feature Extraction (Time-based, spectral energy, MFCC, etc.)
Analysis / Decision Making (Classification, Clustering, etc.)
Basic system overview
Example Feature Vector
ZCR, Centroid, Bandwidth, Skew
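As a rough illustration of how such a four-element feature vector might be computed for one frame, here is a minimal sketch. The exact definitions are assumptions, since the slides do not give formulas and toolboxes differ: zero crossing rate is taken as the fraction of sign changes, and centroid, bandwidth, and skew as the first three magnitude-weighted moments of the spectrum. All function names are hypothetical.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for short demo frames)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags

def feature_vector(frame, sample_rate):
    """Return [ZCR, centroid (Hz), bandwidth (Hz), skew] for one frame."""
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / len(frame) for k in range(len(mags))]
    total = sum(mags) or 1e-12                       # avoid division by zero on silence
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    variance = sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total
    bandwidth = math.sqrt(variance)
    third = sum((f - centroid) ** 3 * m for f, m in zip(freqs, mags)) / total
    # Skew is undefined for a (numerically) zero-width spectrum; report 0.0 then
    skew = third / bandwidth ** 3 if bandwidth > 1e-9 else 0.0
    return [zero_crossing_rate(frame), centroid, bandwidth, skew]

sample_rate = 8000                                   # Hz, assumed for the demo
# 64-sample frame of a 1000 Hz tone (small phase offset avoids samples exactly at zero)
tone = [math.sin(2.0 * math.pi * 1000.0 * i / sample_rate + 0.1) for i in range(64)]
features = feature_vector(tone, sample_rate)
print(features)
```

For a pure tone the centroid sits at the tone frequency and the bandwidth is near zero, so the skew value is not meaningful there. Real drum hits spread energy across many bins.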
ANALYSIS AND DECISION MAKING: HEURISTICS
• Example: "Cowbell" on just the snare drum of a drum loop. "Simple" instrument recognition
• Use basic thresholds or a simple decision tree to form a rudimentary transcription of kicks and snares
• Time for more sophistication
Heuristic Analysis
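A threshold-based kick/snare decision of the kind mentioned above might look like the following sketch. The threshold values, the two features used, and the function name `classify_drum_hit` are all hypothetical. In practice the thresholds would be tuned by inspecting features from real recordings.

```python
def classify_drum_hit(zcr, centroid_hz, zcr_threshold=0.1, centroid_threshold_hz=1500.0):
    """Tiny hand-built decision tree: bright or noisy hits are called snares."""
    if centroid_hz > centroid_threshold_hz:
        return "snare"          # lots of high-frequency energy
    if zcr > zcr_threshold:
        return "snare"          # noisy waveform despite a lowish centroid
    return "kick"               # dark and smooth: treat as a kick

# Made-up (zcr, centroid in Hz) pairs, one per detected onset
hits = [(0.02, 300.0), (0.30, 4200.0), (0.15, 1200.0)]
labels = [classify_drum_hit(z, c) for z, c in hits]
print(labels)
```

Rules like these are quick to write but brittle, which is the motivation for moving on to trained classifiers.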
ANALYSIS AND DECISION MAKING: INSTANCE-BASED CLASSIFIERS (K-NN)
Training…
[Figure: TEST sample and TRAINING SET, with classes "1" and "0"]
• Explanation…
Advantages:
• Training is trivial: just store the training samples
• Very simple to implement and use
Disadvantages:
• Classification gets very complex with a lot of training data
  - Must measure distance to all training samples
• Euclidean distance becomes problematic in high-dimensional spaces
• Can easily be "overfit"
We can improve computation efficiency by storing just the class prototypes
k-NN
• Steps:
  - Measure distance to all points
  - Take the k closest
  - Majority rules (e.g., if k=5, then take 3 out of 5)
…but… scaling is a good idea…
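The three k-NN steps listed above can be sketched directly. This is a minimal illustration, not a tuned implementation: the helper names and the toy two-dimensional training data are made up, and Euclidean distance is assumed as the metric.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, training_points, training_labels, k=3):
    # Step 1: measure the distance from the query to every training point
    distances = [(euclidean(query, p), label)
                 for p, label in zip(training_points, training_labels)]
    # Step 2: keep the k closest
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Step 3: majority vote among those k labels
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy training set: class 0 clusters near (0.1, 0.1), class 1 near (1.0, 1.0)
training_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                   (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
training_labels = [0, 0, 0, 1, 1, 1]

prediction_a = knn_classify((0.15, 0.15), training_points, training_labels, k=3)
prediction_b = knn_classify((0.8, 0.9), training_points, training_labels, k=5)
print(prediction_a, prediction_b)
```

With k=5 the second query sees three class-1 neighbors and two class-0 neighbors, so the vote is 3 out of 5. Every query scans the whole training set, which is the cost noted under the disadvantages.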
Scaling Example: linear scale to [-1, 1]
unscaled                                            | scaled
zcr      centroid  brightness  rolloff   kurtosis   | zcr       centroid  brightness  rolloff   kurtosis
2138663  4461262   5278183     5741853   1.54E+08   | 0.567221  0.654219  0.485151    -0.91438  -0.95797
1420046  3860736   5266741     4433475   1.03E+08   | -0.00683  0.396341  0.479636    -0.96526  -0.98848
1412664  4095468   3829324     1168014   3.96E+08   | -0.01273  0.49714   -0.21313    -0.68342  -0.81426
1223236  3551842   5074382     4351981   1.03E+08   | -0.16405  0.263696  0.386928    -0.96843  -0.98836
2680428  5266489   5354988     444444    1.37E+08   | 1         1         0.522167    -0.96484  -0.96824
2304393  431617    4682006     7819634   2.28E+08   | 0.699611  0.591913  0.19782     -0.83357  -0.91439
1594079  4597727   5422522     5733602   1.49E+08   | 0.13219   0.71282   0.554715    -0.9147   -0.96108
1900388  4732717   6346437     3540374   83337090   | 0.37688   0.770787  1           -1        -1
2231022  4914268   5765587     3765195   1.01E+08   | 0.641     0.848749  0.720057    -0.99126  -0.98934
1221818  4323917   2196666     5496309   3.45E+09   | -0.16518  0.59524   -1          1         1
1018673  2169696   2944604     3511451   1.48E+09   | -0.32746  -0.32983  -0.63953    0.228023  -0.17284
1767722  1038943   3765943     9728746   3.18E+08   | -1        -0.81539  -0.24368    -0.75931  -0.86091
7589167  6090479   219855      2146602   1.37E+09   | -0.53496  -1        -0.99909    -0.30281  -0.23807
6562162  1982091   3404355     1829962   6.59E+08   | -0.617    -0.41039  -0.41795    -0.42596  -0.6582
5639213  1309174   4358942     1152514   3.71E+08   | -0.69073  -0.69935  0.042118    -0.68945  -0.82931
110488   9674398   378372      9308457   3.04E+08   | -0.2586   -0.8461   -0.23511    -0.77566  -0.86889
4231187  1113061   2758021     3217640   1.62E+09   | -0.80321  -0.78357  -0.72945    0.11375   -0.08884
7175933  1638758   3447308     1721541   6.2E+08    | -0.56797  -0.55782  -0.39725    -0.46813  -0.68172
2423024  125606    3870912     8911627   2.73E+08   | -0.94765  -0.72216  -0.19309    -0.79109  -0.88744
4746985  1240683   2655689     3657653   1.75E+09   | -0.76201  -0.72876  -0.77877    0.284886  -0.01055
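The linear scaling shown in the table maps each feature column independently so that its minimum becomes -1 and its maximum becomes 1. A minimal sketch follows, assuming plain per-column min-max scaling. The function name `scale_columns` and the three sample rows are hypothetical and are not taken from the table.

```python
def scale_columns(rows, low=-1.0, high=1.0):
    """Linearly rescale every column of a feature matrix to [low, high]."""
    columns = list(zip(*rows))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    scaled = []
    for row in rows:
        scaled_row = []
        for x, col_min, col_max in zip(row, mins, maxs):
            if col_max == col_min:
                # Constant column carries no information; place it mid-range
                scaled_row.append((low + high) / 2.0)
            else:
                scaled_row.append(low + (high - low) * (x - col_min) / (col_max - col_min))
        scaled.append(scaled_row)
    return scaled

# Made-up rows with very different ranges per column
features = [[100.0, 4000.0, 1.0e8],
            [200.0,  600.0, 3.0e9],
            [150.0, 2300.0, 1.55e9]]
scaled = scale_columns(features)
for row in scaled:
    print(row)
```

Without scaling, the column with the largest numeric range dominates the Euclidean distance, so k-NN would effectively ignore the other features. When classifying new data, the minima and maxima computed from the training set should be reused rather than recomputed.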
k-NN
• Instance-based learning: training examples are stored directly, rather than used to estimate model parameters
• Generally choose an odd k to guarantee a majority vote for a class (this rules out ties when there are two classes)
Distance Classification
1. Find nearest neighbor
2. Find representative match via class prototype (e.g., center of group or mean of training data class)
Distance metric: most common is Euclidean distance
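Matching against class prototypes, the second option above, can be sketched as follows. The sketch assumes each prototype is the per-class mean of the training vectors and that distance is Euclidean. The function names and the toy data are hypothetical.

```python
import math

def class_prototypes(points, labels):
    """Compute one mean vector (prototype) per class label."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def nearest_prototype(query, prototypes):
    """Return the label whose prototype is closest to the query (Euclidean)."""
    def distance(label):
        return math.sqrt(sum((q - p) ** 2 for q, p in zip(query, prototypes[label])))
    return min(prototypes, key=distance)

# Toy training set with two classes
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = [0, 0, 0, 1, 1, 1]

prototypes = class_prototypes(points, labels)
predicted = nearest_prototype((0.3, 0.2), prototypes)
print(prototypes, predicted)
```

Only one distance per class is computed at query time instead of one per training sample. The cost is that classes whose samples do not cluster around a single center are represented poorly.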
> End of Lecture 1, Part 1
fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization
Why MIR
Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs
Commercial Applications
This weekhellip
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ
This weekhellip
Day 1
Day 2
Day 3
Day 4
Day 5
Questions Concerns
INTRO BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
TIMING AND SEGMENTATION
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
Timing and Segmentation
Frames 1 second 1 second
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
bull Onset detection bull Beat detection
ndash Beat ndash Measure Bar Harmonic changes
bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue
Timing and Segmentation
OVER TO YOU LEIGH
BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Example Feature Vector
ZCR Centroid Bandwidth Skew
ANALYSIS AND DECISION MAKING HEURISTICS
bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition
bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares
bull Time for more sophistication
Heuristic Analysis
ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)
Traininghellip
TEST
TRAINING SET
ldquo1rdquo ldquo0rdquo
bull Explanationhellip
Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data
Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces
Can easily be ldquooverfitrdquo
We can improve computation efficiency by storing just the class prototypes
k-NN
k-NN
bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)
hellipbuthellip scaling is a good ideahellip
Scaling Example unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
Scaling Example linear scale to -11 unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
k-NN
bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters
bull Generally choose k being odd to guarantee a majority vote for a class
Distance Classification
1 Find nearest neighbor 2 Find representative match via class prototype (eg
center of group or mean of training data class) Distance metric Most common Euclidean distance
gt End of Lecture 1 Part 1
Trends
1 Ubiquitous Content
2 Machine Learning
3 Cloud Computing
Find me sound effects that sound like this
Discover and interact with content
Audio Search similarity search find similar
Male Speech
Applause Baby Cry
Music (Rock)
Sound signature Smart chapter markers
Female Speech
Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD
Machine learning Content analysis
Metadata (XML JSON)
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
ldquodistorted guitarrdquo
Discover and interact with content
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
Find me songs that sound like this
find similar
Discover and interact with content
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
James Brown ndash The Payback
Original
Groove
Remixes
Graphic Clint Bajakian
Discover and interact with content
Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox
content-based querying and retrieval indexing (tagging similarity)
fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization
Why MIR
Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs
Commercial Applications
This weekhellip
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ
This weekhellip
Day 1
Day 2
Day 3
Day 4
Day 5
Questions Concerns
INTRO BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
TIMING AND SEGMENTATION
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
Timing and Segmentation
Frames 1 second 1 second
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
bull Onset detection bull Beat detection
ndash Beat ndash Measure Bar Harmonic changes
bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue
Timing and Segmentation
OVER TO YOU LEIGH
BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Example Feature Vector
ZCR Centroid Bandwidth Skew
ANALYSIS AND DECISION MAKING HEURISTICS
bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition
bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares
bull Time for more sophistication
Heuristic Analysis
ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)
Traininghellip
TEST
TRAINING SET
ldquo1rdquo ldquo0rdquo
bull Explanationhellip
Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data
Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces
Can easily be ldquooverfitrdquo
We can improve computation efficiency by storing just the class prototypes
k-NN
k-NN
bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)
hellipbuthellip scaling is a good ideahellip
Scaling Example unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
Scaling Example linear scale to -11 unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
k-NN
bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters
bull Generally choose k being odd to guarantee a majority vote for a class
Distance Classification
1 Find nearest neighbor 2 Find representative match via class prototype (eg
center of group or mean of training data class) Distance metric Most common Euclidean distance
gt End of Lecture 1 Part 1
Find me sound effects that sound like this
Discover and interact with content
Audio Search similarity search find similar
Male Speech
Applause Baby Cry
Music (Rock)
Sound signature Smart chapter markers
Female Speech
Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD
Machine learning Content analysis
Metadata (XML JSON)
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
ldquodistorted guitarrdquo
Discover and interact with content
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
Find me songs that sound like this
find similar
Discover and interact with content
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
James Brown ndash The Payback
Original
Groove
Remixes
Graphic Clint Bajakian
Discover and interact with content
Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox
content-based querying and retrieval indexing (tagging similarity)
fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization
Why MIR
Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs
Commercial Applications
This weekhellip
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ
This weekhellip
Day 1
Day 2
Day 3
Day 4
Day 5
Questions Concerns
INTRO BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
TIMING AND SEGMENTATION
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
Timing and Segmentation
Frames 1 second 1 second
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
bull Onset detection bull Beat detection
ndash Beat ndash Measure Bar Harmonic changes
bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue
Timing and Segmentation
OVER TO YOU LEIGH
BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Example Feature Vector
ZCR Centroid Bandwidth Skew
ANALYSIS AND DECISION MAKING HEURISTICS
bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition
bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares
bull Time for more sophistication
Heuristic Analysis
ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)
Traininghellip
TEST
TRAINING SET
ldquo1rdquo ldquo0rdquo
bull Explanationhellip
Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data
Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces
Can easily be ldquooverfitrdquo
We can improve computation efficiency by storing just the class prototypes
k-NN
k-NN
bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)
hellipbuthellip scaling is a good ideahellip
Scaling Example unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
[Figure: machine learning and content analysis turn audio into metadata (XML, JSON). Detected content: male speech, female speech, applause, baby cry, music (rock). Derived features: sound signature, smart chapter markers, genre, tags, tempo, beats, groove, intelligent FFWD.]
Trends
1. Ubiquitous Content
2. Machine Learning
3. Cloud Computing
Discover and interact with content: organize, search, understand
• “distorted guitar”
• “Find me songs that sound like this” (find similar)
• James Brown – The Payback: original, groove, remixes (graphic: Clint Bajakian)
• Remixes with Echo Nest Remix API: http://thewubmachine.com and http://static.echonest.com/girltalkinabox
Why MIR?
• content-based querying and retrieval, indexing (tagging, similarity)
• fingerprinting and digital rights management
• music recommendation and playlist generation
• music transcription and annotation
• score following and audio alignment
• automatic classification
• rhythm, beat, tempo, and form
• harmony, chords, and tonality
• timbre, instrumentation
• genre, emotion, style, and mood analysis
• music summarization
Commercial Applications
• Music recommendation and metadata APIs: Gracenote, Echo Nest, Rovi, BMAT, Bach Technology, Moodagent, Clio Music
• Audio fingerprinting (user-driven, broadcast monitoring and copyright/BMA, 2nd screen): SoundHound, Shazam, Audible Magic, Echo Nest, BMAT, Gracenote, Civolution, Digimarc (19 Billion Transactions)
• Pitch and rhythm tracking/analysis: algorithms in Guitar Hero, Rock Band; Skore (BMAT)
• DAW products that include beat/tempo/key/note analysis: Ableton Live, Melodyne, Mixed In Key
• Innovative software for music creation: Khush, UJAM, Songsmith, VoiceBand
• Dynamic music: SongCruncher
• QBH: SoundHound (Madonna example)
• Music visualization & discovery: SpectralMind’s Sonarflow, Mufin audiovision and mufinplayer
• Music players with recommendation: Apple Genius, Google Instant Mix
• Broadcast monitoring: Audible Magic, Clustermedia Labs
This week…
Basic system overview
1. Segmentation (frames, onsets, beats, bars, chord changes, etc.)
2. Feature Extraction (time-based, spectral energy, MFCC, etc.)
3. Analysis / Decision Making (classification, clustering, etc.)
This week…
• Day 1: Overview, Features + Classification
  – MIR overview
  – Basic feature extraction, classification
  – Time-domain features
• Day 2: Musical Analysis: Beat, Rhythm, Pitch and Chroma Analysis
  – Frequency-domain features
  – Beat, onset, rhythm
• Day 3: Machine Learning: Clustering and Classification
  – Clustering and classification (PCA, LDA, k-means, GMM, SVM)
  – Guest speaker: Gracenote
• Day 4: Music Information Retrieval in Polyphonic Mixtures
  – Guest: Adobe Research
• Day 5: Information Retrieval Metrics, Evaluation, Real World Considerations
  – Guest: DJZ
Questions? Concerns?
INTRO: BASIC SYSTEM OVERVIEW
Basic system overview
1. Segmentation (frames, onsets, beats, bars, chord changes, etc.)
2. Feature Extraction (time-based, spectral energy, MFCC, etc.)
3. Analysis / Decision Making (classification, clustering, etc.)
TIMING AND SEGMENTATION
Timing and Segmentation
• Slicing up by fixed time slices…
  – 1 second, 80 ms, 100 ms, 20–40 ms, etc.
  – Usually 50% overlap with a window
• “Frames”
  – Different problems call for different frame lengths
  – [Figure: a signal cut into frames of 1 second each]
• Onset detection
• Beat detection
  – Beat
  – Measure / Bar / Harmonic changes
• Segments
  – Musically relevant boundaries
  – Separate by some perceptual cue
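Fixed-length framing with 50% overlap can be sketched in a few lines of NumPy. This is only an illustration: the sample rate, the 40 ms frame length, the Hann window, and the helper name `frame_signal` are assumptions, not values taken from the slides.

```python
import numpy as np

def frame_signal(x, frame_length, hop_length):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    if len(x) < frame_length:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)
    # Index matrix: each row holds the sample indices of one frame.
    idx = starts[:, None] + np.arange(frame_length)[None, :]
    return x[idx]

sr = 44100                           # assumed sample rate in Hz
frame_length = round(0.040 * sr)     # 40 ms frame -> 1764 samples
hop_length = frame_length // 2       # 50% overlap -> 882 samples
x = np.random.default_rng(0).standard_normal(sr)  # one second of noise as stand-in audio

frames = frame_signal(x, frame_length, hop_length)
windowed = frames * np.hanning(frame_length)      # taper each frame before spectral analysis
```

Shorter frames track fast events such as onsets more closely, while longer frames give finer frequency resolution, which is one reason different problems call for different frame lengths.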
OVER TO YOU, LEIGH
BASIC SYSTEM OVERVIEW
Basic system overview
1. Segmentation (frames, onsets, beats, bars, chord changes, etc.)
2. Feature Extraction (time-based, spectral energy, MFCC, etc.)
3. Analysis / Decision Making (classification, clustering, etc.)
Example Feature Vector
[ZCR, Centroid, Bandwidth, Skew]
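One way to compute such a four-element feature vector for a single frame is sketched here. The definitions are common textbook ones (sign-change rate for ZCR, moments of the normalised magnitude spectrum for the other three) and may differ from those used in the workshop labs; the function name `feature_vector` and the test tones are assumptions.

```python
import numpy as np

def feature_vector(frame, sr):
    """Return [zcr, centroid, bandwidth, skew] for one frame of audio."""
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frame)
    zcr = np.mean(signs[1:] != signs[:-1])

    # Magnitude spectrum, normalised so it can be treated as a distribution.
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    p = mag / (np.sum(mag) + 1e-12)

    centroid = np.sum(freqs * p)                              # spectral mean (Hz)
    bandwidth = np.sqrt(np.sum((freqs - centroid) ** 2 * p))  # spectral spread (Hz)
    skew = np.sum((freqs - centroid) ** 3 * p) / (bandwidth ** 3 + 1e-12)
    return np.array([zcr, centroid, bandwidth, skew])

sr = 22050                            # assumed sample rate in Hz
t = np.arange(1024) / sr
low = feature_vector(np.sin(2 * np.pi * 200 * t), sr)    # 200 Hz tone
high = feature_vector(np.sin(2 * np.pi * 4000 * t), sr)  # 4000 Hz tone
```

The higher tone yields both a larger ZCR and a larger centroid, which is the kind of separation a classifier can exploit.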
ANALYSIS AND DECISION MAKING: HEURISTICS
Heuristic Analysis
• Example: “Cowbell” on just the snare drum of a drum loop. “Simple” instrument recognition.
• Use basic thresholds or a simple decision tree to form a rudimentary transcription of kicks and snares
• Time for more sophistication
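A threshold heuristic of this kind might look like the sketch below. The single rule on spectral centroid, the 1000 Hz threshold, and the synthetic "kick" and "snare" signals are all hypothetical, chosen only to make the idea concrete.

```python
import numpy as np

# Hypothetical threshold, chosen only for illustration; a real one would be
# tuned by inspecting feature values for labelled kicks and snares.
CENTROID_THRESHOLD_HZ = 1000.0

def spectral_centroid(frame, sr):
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

def label_drum_hit(frame, sr):
    """One-rule decision: dark spectrum -> kick, bright spectrum -> snare."""
    return "kick" if spectral_centroid(frame, sr) < CENTROID_THRESHOLD_HZ else "snare"

sr = 22050
t = np.arange(2048) / sr
rng = np.random.default_rng(0)
decay = np.exp(-t * 30)
fake_kick = np.sin(2 * np.pi * 60 * t) * decay     # low sine burst
fake_snare = rng.standard_normal(len(t)) * decay   # noise burst

labels = [label_drum_hit(fake_kick, sr), label_drum_hit(fake_snare, sr)]
```

Hand-set rules like this are quick to write but brittle, which motivates the learned classifiers that follow.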
ANALYSIS AND DECISION MAKING: INSTANCE-BASED CLASSIFIERS (K-NN)
k-NN
[Figure: Training… a TEST point and the TRAINING SET, with classes “1” and “0”]
• Explanation…
• Advantages
  – Training is trivial: just store the training samples
  – Very simple to implement and use
• Disadvantages
  – Classification gets very complex with a lot of training data: must measure distance to all training samples
  – Euclidean distance becomes problematic in high-dimensional spaces
  – Can easily be “overfit”
• We can improve computation efficiency by storing just the class prototypes
k-NN
• Steps
  – Measure distance to all points
  – Take the k closest
  – Majority rules (e.g., if k=5, then take 3 out of 5)
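The three steps map directly onto code. The following is a minimal sketch with a made-up two-class training set; the function name `knn_predict` and all data values are assumptions for illustration.

```python
from collections import Counter
import numpy as np

def knn_predict(train_X, train_y, query, k=5):
    """Classify one query vector by majority vote among its k nearest training points."""
    # 1. Measure (Euclidean) distance to all training points.
    dists = np.linalg.norm(train_X - query, axis=1)
    # 2. Take the k closest.
    nearest = np.argsort(dists)[:k]
    # 3. Majority rules.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy two-class training set with labels "0" and "1" (made-up values).
train_X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
                    [1.0, 0.9], [0.8, 1.0], [0.9, 0.7], [1.1, 1.0]])
train_y = np.array(["0", "0", "0", "0", "1", "1", "1", "1"])

pred_a = knn_predict(train_X, train_y, np.array([0.15, 0.15]), k=5)
pred_b = knn_predict(train_X, train_y, np.array([0.95, 0.90]), k=5)
```

Note that "training" here is nothing more than keeping `train_X` and `train_y` around; all the work happens at query time, which is why cost grows with the size of the training set.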
…but… scaling is a good idea…
Scaling Example: linear scale to [-1, 1]
unscaled                                            | scaled
zcr      centroid  brightness  rolloff  kurtosis    | zcr       centroid   brightness  rolloff    kurtosis
2138663  4461262   5278183     5741853  1.54E+08    | 0.567221  0.654219   0.485151    -0.91438   -0.95797
1420046  3860736   5266741     4433475  1.03E+08    | -0.00683  0.396341   0.479636    -0.96526   -0.98848
1412664  4095468   3829324     1168014  3.96E+08    | -0.01273  0.49714    -0.21313    -0.68342   -0.81426
1223236  3551842   5074382     4351981  1.03E+08    | -0.16405  0.263696   0.386928    -0.96843   -0.98836
2680428  5266489   5354988     444444   1.37E+08    | 1         1          0.522167    -0.96484   -0.96824
2304393  431617    4682006     7819634  2.28E+08    | 0.699611  0.591913   0.19782     -0.83357   -0.91439
1594079  4597727   5422522     5733602  1.49E+08    | 0.13219   0.71282    0.554715    -0.9147    -0.96108
1900388  4732717   6346437     3540374  83337090    | 0.37688   0.770787   1           -1         -1
2231022  4914268   5765587     3765195  1.01E+08    | 0.641     0.848749   0.720057    -0.99126   -0.98934
1221818  4323917   2196666     5496309  3.45E+09    | -0.16518  0.59524    -1          1          1
1018673  2169696   2944604     3511451  1.48E+09    | -0.32746  -0.32983   -0.63953    0.228023   -0.17284
1767722  1038943   3765943     9728746  3.18E+08    | -1        -0.81539   -0.24368    -0.75931   -0.86091
7589167  6090479   219855      2146602  1.37E+09    | -0.53496  -1         -0.99909    -0.30281   -0.23807
6562162  1982091   3404355     1829962  6.59E+08    | -0.617    -0.41039   -0.41795    -0.42596   -0.6582
5639213  1309174   4358942     1152514  3.71E+08    | -0.69073  -0.69935   0.042118    -0.68945   -0.82931
110488   9674398   378372      9308457  3.04E+08    | -0.2586   -0.8461    -0.23511    -0.77566   -0.86889
4231187  1113061   2758021     3217640  1.62E+09    | -0.80321  -0.78357   -0.72945    0.11375    -0.08884
7175933  1638758   3447308     1721541  6.2E+08     | -0.56797  -0.55782   -0.39725    -0.46813   -0.68172
2423024  125606    3870912     8911627  2.73E+08    | -0.94765  -0.72216   -0.19309    -0.79109   -0.88744
4746985  1240683   2655689     3657653  1.75E+09    | -0.76201  -0.72876   -0.77877    0.284886   -0.01055
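A linear scale to [-1, 1] maps each feature column so that its minimum becomes -1 and its maximum becomes 1. The sketch below uses made-up numbers rather than the values in the table, and the helper names are assumptions.

```python
import numpy as np

def fit_linear_scaler(X, lo=-1.0, hi=1.0):
    """Learn per-column min and range from training data."""
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0   # avoid dividing by zero for constant columns
    return col_min, col_range, lo, hi

def apply_linear_scaler(X, scaler):
    """Map each column linearly so the training min -> lo and training max -> hi."""
    col_min, col_range, lo, hi = scaler
    return lo + (hi - lo) * (X - col_min) / col_range

# Made-up features on very different scales (not the values from the slide).
X = np.array([[0.10, 4000.0, 1.0e8],
              [0.30, 5000.0, 3.0e9],
              [0.20, 1000.0, 1.5e9]])
scaler = fit_linear_scaler(X)
X_scaled = apply_linear_scaler(X, scaler)
```

Without scaling, the third column would dominate any Euclidean distance simply because its numbers are larger. The same `scaler` learned from the training data should be applied to test vectors, so that both live in the same coordinate system.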
k-NN
• Instance-based learning – training examples are stored directly, rather than estimating model parameters
• Generally choose k to be odd, which guarantees a majority vote in a two-class problem
Distance Classification
1. Find nearest neighbor
2. Find representative match via class prototype (e.g., center of group or mean of training data class)
Distance metric: the most common is Euclidean distance
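Matching against class prototypes, with the class mean as the prototype and Euclidean distance as the metric, could be sketched as follows. The function names, labels, and training values are assumptions for illustration.

```python
import numpy as np

def fit_prototypes(train_X, train_y):
    """Store one prototype per class: the mean of that class's training vectors."""
    classes = np.unique(train_y)
    prototypes = np.array([train_X[train_y == c].mean(axis=0) for c in classes])
    return classes, prototypes

def predict_nearest_prototype(query, classes, prototypes):
    """Return the class whose prototype is closest in Euclidean distance."""
    dists = np.linalg.norm(prototypes - query, axis=1)
    return classes[np.argmin(dists)]

# Made-up, already-scaled 2-D feature vectors.
train_X = np.array([[0.0, 0.0], [0.2, 0.2], [1.0, 1.0], [0.8, 1.2]])
train_y = np.array(["kick", "kick", "snare", "snare"])
classes, prototypes = fit_prototypes(train_X, train_y)
label = predict_nearest_prototype(np.array([0.9, 0.9]), classes, prototypes)
```

Compared with full k-NN, this needs only one distance computation per class at query time, at the cost of assuming each class is reasonably well summarised by a single point.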
> End of Lecture 1, Part 1
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
Find me songs that sound like this
find similar
Discover and interact with content
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
James Brown ndash The Payback
Original
Groove
Remixes
Graphic Clint Bajakian
Discover and interact with content
Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox
content-based querying and retrieval indexing (tagging similarity)
fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization
Why MIR
Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs
Commercial Applications
This weekhellip
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ
This weekhellip
Day 1
Day 2
Day 3
Day 4
Day 5
Questions Concerns
INTRO BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
TIMING AND SEGMENTATION
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
Timing and Segmentation
Frames 1 second 1 second
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
bull Onset detection bull Beat detection
ndash Beat ndash Measure Bar Harmonic changes
bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue
Timing and Segmentation
OVER TO YOU LEIGH
BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Example Feature Vector
ZCR Centroid Bandwidth Skew
ANALYSIS AND DECISION MAKING HEURISTICS
bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition
bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares
bull Time for more sophistication
Heuristic Analysis
ANALYSIS AND DECISION MAKING: INSTANCE-BASED CLASSIFIERS (k-NN)
Training…
[Figure: a TEST point plotted against a TRAINING SET with classes “1” and “0”]
• Explanation…
Advantages:
  - Training is trivial: just store the training samples
  - Very simple to implement and use
Disadvantages:
  - Classification gets computationally expensive with a lot of training data: distance must be measured to all training samples
  - Euclidean distance becomes problematic in high-dimensional spaces
  - Can easily be “overfit”
We can improve computational efficiency by storing just the class prototypes
k-NN
• Steps
  - Measure distance to all points
  - Take the k closest
  - Majority rules (e.g., if k=5, then take 3 out of 5)
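The three steps map directly onto a few lines of Python. This is a brute-force sketch with made-up two-dimensional points and hypothetical function names. The class labels "1" and "0" follow the training-set figure.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, points, labels, k=5):
    """Classify query by majority vote among its k nearest training points."""
    # Step 1: measure distance to all points.
    distances = [(euclidean(query, p), label) for p, label in zip(points, labels)]
    # Step 2: take the k closest.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority rules.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training set: two features per example.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
labels = ["0", "0", "0", "1", "1", "1"]

prediction = knn_classify((1.1, 1.0), points, labels, k=5)
```

With k=5 the query above collects three votes for "0" and two for "1", so it is labeled "0".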
…but… scaling is a good idea…
Scaling Example: linear scale to [-1, 1]
unscaled                                          | scaled
zcr      centroid  brightness  rolloff  kurtosis | zcr       centroid  brightness  rolloff   kurtosis
2138663  4461262   5278183     5741853  1.54E+08 | 0.567221  0.654219  0.485151    -0.91438  -0.95797
1420046  3860736   5266741     4433475  1.03E+08 | -0.00683  0.396341  0.479636    -0.96526  -0.98848
1412664  4095468   3829324     1168014  3.96E+08 | -0.01273  0.49714   -0.21313    -0.68342  -0.81426
1223236  3551842   5074382     4351981  1.03E+08 | -0.16405  0.263696  0.386928    -0.96843  -0.98836
2680428  5266489   5354988     444444   1.37E+08 | 1         1         0.522167    -0.96484  -0.96824
2304393  431617    4682006     7819634  2.28E+08 | 0.699611  0.591913  0.19782     -0.83357  -0.91439
1594079  4597727   5422522     5733602  1.49E+08 | 0.13219   0.71282   0.554715    -0.9147   -0.96108
1900388  4732717   6346437     3540374  83337090 | 0.37688   0.770787  1           -1        -1
2231022  4914268   5765587     3765195  1.01E+08 | 0.641     0.848749  0.720057    -0.99126  -0.98934
1221818  4323917   2196666     5496309  3.45E+09 | -0.16518  0.59524   -1          1         1
1018673  2169696   2944604     3511451  1.48E+09 | -0.32746  -0.32983  -0.63953    0.228023  -0.17284
1767722  1038943   3765943     9728746  3.18E+08 | -1        -0.81539  -0.24368    -0.75931  -0.86091
7589167  6090479   219855      2146602  1.37E+09 | -0.53496  -1        -0.99909    -0.30281  -0.23807
6562162  1982091   3404355     1829962  6.59E+08 | -0.617    -0.41039  -0.41795    -0.42596  -0.6582
5639213  1309174   4358942     1152514  3.71E+08 | -0.69073  -0.69935  0.042118    -0.68945  -0.82931
110488   9674398   378372      9308457  3.04E+08 | -0.2586   -0.8461   -0.23511    -0.77566  -0.86889
4231187  1113061   2758021     3217640  1.62E+09 | -0.80321  -0.78357  -0.72945    0.11375   -0.08884
7175933  1638758   3447308     1721541  6.2E+08  | -0.56797  -0.55782  -0.39725    -0.46813  -0.68172
2423024  125606    3870912     8911627  2.73E+08 | -0.94765  -0.72216  -0.19309    -0.79109  -0.88744
4746985  1240683   2655689     3657653  1.75E+09 | -0.76201  -0.72876  -0.77877    0.284886  -0.01055
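A linear scale to [-1, 1] maps each feature column so that its minimum becomes -1 and its maximum becomes 1. The sketch below uses made-up rows with two features of very different magnitude. The function names are hypothetical, and the choice to map a constant column to 0 is an assumption of this example.

```python
def fit_scaling(rows):
    """Record the per-column minimum and maximum of the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_scaling(row, mins, maxs, lo=-1.0, hi=1.0):
    """Map each value linearly so that its column min -> lo and column max -> hi."""
    scaled = []
    for v, mn, mx in zip(row, mins, maxs):
        if mx == mn:
            scaled.append(0.0)      # constant column: assumed convention for this sketch
        else:
            scaled.append(lo + (hi - lo) * (v - mn) / (mx - mn))
    return scaled

# Made-up training rows: a ZCR-like feature and a kurtosis-like feature.
rows = [
    [0.05, 1.00e8],
    [0.10, 3.00e9],
    [0.30, 1.55e9],
]

mins, maxs = fit_scaling(rows)
scaled = [apply_scaling(r, mins, maxs) for r in rows]
```

Without scaling, the Euclidean distance between these rows would be determined almost entirely by the second column. The same `mins` and `maxs` learned from the training data should be reused for test data, so that both live in the same scaled space.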
k-NN
• Instance-based learning
  - Training examples are stored directly rather than used to estimate model parameters
• Generally choose an odd k to guarantee a majority vote for a class (in a two-class problem)
Distance Classification
1. Find nearest neighbor
2. Find representative match via class prototype (e.g., center of group or mean of training data class)
Distance metric: most common is Euclidean distance
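The second option, matching against one prototype per class, can be sketched as follows. Here the prototype is assumed to be the per-class mean of the training vectors, and the data and function names are made up for illustration.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_prototypes(points, labels):
    """Compute one prototype per class: the mean of that class's training vectors."""
    groups = defaultdict(list)
    for p, label in zip(points, labels):
        groups[label].append(p)
    return {label: [sum(col) / len(col) for col in zip(*members)]
            for label, members in groups.items()}

def nearest_prototype(query, prototypes):
    """Return the label whose prototype is closest to the query."""
    return min(prototypes, key=lambda label: euclidean(query, prototypes[label]))

# Made-up training set: two features per example.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
labels = ["0", "0", "0", "1", "1", "1"]

prototypes = class_prototypes(points, labels)
prediction = nearest_prototype((1.1, 1.0), prototypes)
```

Classification then costs one distance computation per class instead of one per training sample, at the price of assuming that each class is reasonably well summarized by a single center.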
> End of Lecture 1, Part 1
Find me songs that sound like this
find similar
Discover and interact with content
Trends
1 Ubiquitous Content
2 Machine Learning
Organize
Search
Understand
3 Cloud Computing
James Brown - The Payback
Original
Groove
Remixes
Graphic: Clint Bajakian
Discover and interact with content
Remixes with the Echo Nest Remix API: http://thewubmachine.com and http://static.echonest.com/girltalkinabox
• content-based querying and retrieval, indexing (tagging, similarity)
• fingerprinting and digital rights management
• music recommendation and playlist generation
• music transcription and annotation
• score following and audio alignment
• automatic classification
• rhythm, beat, tempo, and form
• harmony, chords, and tonality
• timbre, instrumentation, genre, emotion, style, and mood analysis
• music summarization
Why MIR?
Music recommendation and metadata APIs
  - Gracenote, Echo Nest, Rovi, BMAT, Bach Technology, Moodagent, Clio Music
Audio fingerprinting (User-driven, Broadcast monitoring and copyrightbma, 2nd screen)
  - SoundHound, Shazam, Audible Magic, Echo Nest, BMAT, Gracenote, Civolution, Digimarc
  - 19 Billion Transactions
Pitch and rhythm tracking / analysis
  - Algorithms in Guitar Hero / Rock Band
  - Skore (BMAT)
DAW products that include beat/tempo/key/note analysis
  - Ableton Live, Melodyne, Mixed In Key
Innovative software for music creation
  - Khush, UJAM, Songsmith, VoiceBand
Dynamic Music
  - SongCruncher
QBH
  - SoundHound (Madonna example)
Music Visualization & Discover
  - SpectralMind’s Sonarflow, Mufin audiovision and mufinplayer
Music players with recommendation
  - Apple Genius, Google Instant Mix
Broadcast monitoring
  - Audible Magic, Clustermedia Labs
Commercial Applications
Overview; Features + Classification; MIR Overview; Basic feature extraction, classification; Time domain features; Musical Analysis: Beat, Rhythm, Pitch and Chroma Analysis; Frequency domain features; Beat, Onset, Rhythm; Machine Learning: Clustering and Classification; Clustering and Classification (PCA, LDA, k-means, GMM, SVM); Guest Speaker: Gracenote; Music Information Retrieval in Polyphonic Mixtures; Guest: Adobe Research; Information Retrieval Metrics, Evaluation, Real World Considerations; Guest: DJZ
This week…
Day 1
Day 2
Day 3
Day 4
Day 5
Questions? Concerns?
content-based querying and retrieval indexing (tagging similarity)
fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization
Why MIR
Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs
Commercial Applications
This weekhellip
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ
This weekhellip
Day 1
Day 2
Day 3
Day 4
Day 5
Questions Concerns
INTRO BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
TIMING AND SEGMENTATION
Timing and Segmentation
[Figure: a signal divided into frames, 1 second each]
• Slicing up by fixed time slices…
  – 1 second, 80 ms, 100 ms, 20–40 ms, etc.
  – Usually 50% overlap with a window
• "Frames"
  – Different problems call for different frame lengths
• Onset detection
• Beat detection
  – Beat
  – Measure / Bar / Harmonic changes
• Segments
  – Musically relevant boundaries
  – Separate by some perceptual cue
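The fixed-slice idea above can be sketched in a few lines of Python. This is a minimal illustration, not code from the workshop: the function names `ms_to_samples` and `frame_signal` are made up here, the Hann window is one common choice among many, and the trailing partial frame is simply dropped.

```python
import math


def ms_to_samples(duration_ms, sample_rate):
    """Convert a frame length in milliseconds to a whole number of samples."""
    return sample_rate * duration_ms // 1000


def frame_signal(samples, frame_len, hop=None):
    """Cut `samples` into frames of `frame_len` samples, each Hann-windowed.

    `hop` is the distance between frame starts; the default of half a frame
    gives 50% overlap.
    """
    if hop is None:
        hop = max(1, frame_len // 2)
    # Periodic Hann window (an assumption; any tapering window would do).
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    # Stop once a full frame no longer fits; the tail is dropped, not padded.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + n] * window[n] for n in range(frame_len)])
    return frames
```

For example, `frame_signal(signal, ms_to_samples(20, 44100))` would give 20 ms frames that start every 10 ms. Longer frames trade time resolution for frequency resolution, which is why different problems call for different frame lengths.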
OVER TO YOU, LEIGH
BASIC SYSTEM OVERVIEW
Segmentation (Frames, Onsets, Beats, Bars, Chord Changes, etc.)
→ Feature Extraction (Time-based, spectral energy, MFCC, etc.)
→ Analysis / Decision Making (Classification, Clustering, etc.)
Example Feature Vector
[ZCR, Centroid, Bandwidth, Skew]
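As a small illustration of how one entry of such a vector might be computed, here is a sketch of the zero-crossing rate (ZCR) for a single frame. The function name and the normalization (crossings per adjacent sample pair, so the result lies in [0, 1]) are assumptions; toolboxes differ, and some report crossings per second instead.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative, which is a convention chosen here.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)
```

Noisy, bright sounds (snares, hi-hats) tend to score higher than low, tonal ones (kicks), which is what makes ZCR a cheap first feature.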
ANALYSIS AND DECISION MAKING: HEURISTICS
Heuristic Analysis
• Example: "Cowbell" on just the snare drum of a drum loop. "Simple" instrument recognition!
• Use basic thresholds or a simple decision tree to form a rudimentary transcription of kicks and snares
• Time for more sophistication!
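The threshold approach described above could look roughly like the following. This is a hedged sketch: the two features (ZCR and spectral centroid), the function names, and both threshold values are hypothetical and would need tuning by ear on a real drum loop.

```python
def label_drum_hit(zcr, centroid_hz, centroid_split_hz=1500.0, zcr_split=0.1):
    """Tiny hand-built decision tree over two features of one detected hit."""
    # Low spectral centroid: most energy sits in the low end, so call it a kick.
    if centroid_hz < centroid_split_hz:
        return "kick"
    # Bright and noisy (many zero crossings): call it a snare.
    if zcr > zcr_split:
        return "snare"
    # Anything else is left unlabeled rather than forced into a class.
    return "other"


def transcribe(hits):
    """Map (onset_time_s, zcr, centroid_hz) tuples to (onset_time_s, label)."""
    return [(t, label_drum_hit(zcr, centroid)) for t, zcr, centroid in hits]
```

Rules like these are easy to write and to debug, but they break as soon as the recording conditions change, which motivates the learned classifiers that follow.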
ANALYSIS AND DECISION MAKING: INSTANCE-BASED CLASSIFIERS (K-NN)
Training…
[Figure: a test point plotted against a training set with two classes, "1" and "0"]
• Explanation…
k-NN
Advantages:
• Training is trivial: just store the training samples
• Very simple to implement and use
Disadvantages:
• Classification gets computationally expensive with a lot of training data: must measure distance to all training samples
• Euclidean distance becomes problematic in high-dimensional spaces
• Can easily be "overfit"
We can improve computational efficiency by storing just the class prototypes.
k-NN
• Steps:
  – Measure distance to all points
  – Take the k closest
  – Majority rules (e.g., if k=5, then take 3 out of 5)
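The three steps above map almost line for line onto code. The sketch below is illustrative only: `knn_classify` is a name chosen here, Euclidean distance is assumed, and the brute-force search over every training point is exactly the cost that grows with the size of the training set.

```python
import math
from collections import Counter


def knn_classify(query, training_points, training_labels, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: measure the (Euclidean) distance to every training point.
    distances = [
        (math.dist(query, point), label)
        for point, label in zip(training_points, training_labels)
    ]
    # Step 2: take the k closest.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Step 3: majority rules.
    return Counter(nearest_labels).most_common(1)[0][0]
```

With two classes and an odd k the vote cannot tie; with more classes a tie is still possible, and this sketch then returns whichever tied label it encountered first among the nearest neighbors.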
…but… scaling is a good idea…
Scaling Example: linear scale to [-1, 1]
unscaled                                             | scaled
zcr       centroid  brightness rolloff   kurtosis   | zcr       centroid  brightness rolloff   kurtosis
2138.663  4461.262  0.5278183  5741.853  1.54E+08   | 0.567221  0.654219  0.485151   -0.91438  -0.95797
1420.046  3860.736  0.5266741  4433.475  1.03E+08   | -0.00683  0.396341  0.479636   -0.96526  -0.98848
1412.664  4095.468  0.3829324  11680.14  3.96E+08   | -0.01273  0.49714   -0.21313   -0.68342  -0.81426
1223.236  3551.842  0.5074382  4351.981  1.03E+08   | -0.16405  0.263696  0.386928   -0.96843  -0.98836
2680.428  5266.489  0.5354988  4444.44   1.37E+08   | 1         1         0.522167   -0.96484  -0.96824
2304.393  4316.17   0.4682006  7819.634  2.28E+08   | 0.699611  0.591913  0.19782    -0.83357  -0.91439
1594.079  4597.727  0.5422522  5733.602  1.49E+08   | 0.13219   0.71282   0.554715   -0.9147   -0.96108
1900.388  4732.717  0.6346437  3540.374  83337090   | 0.37688   0.770787  1          -1        -1
2231.022  4914.268  0.5765587  3765.195  1.01E+08   | 0.641     0.848749  0.720057   -0.99126  -0.98934
1221.818  4323.917  0.2196666  54963.09  3.45E+09   | -0.16518  0.59524   -1         1         1
1018.673  2169.696  0.2944604  35114.51  1.48E+09   | -0.32746  -0.32983  -0.63953   0.228023  -0.17284
176.7722  1038.943  0.3765943  9728.746  3.18E+08   | -1        -0.81539  -0.24368   -0.75931  -0.86091
758.9167  609.0479  0.219855   21466.02  1.37E+09   | -0.53496  -1        -0.99909   -0.30281  -0.23807
656.2162  1982.091  0.3404355  18299.62  6.59E+08   | -0.617    -0.41039  -0.41795   -0.42596  -0.6582
563.9213  1309.174  0.4358942  11525.14  3.71E+08   | -0.69073  -0.69935  0.042118   -0.68945  -0.82931
1104.88   967.4398  0.378372   9308.457  3.04E+08   | -0.2586   -0.8461   -0.23511   -0.77566  -0.86889
423.1187  1113.061  0.2758021  32176.40  1.62E+09   | -0.80321  -0.78357  -0.72945   0.11375   -0.08884
717.5933  1638.758  0.3447308  17215.41  6.2E+08    | -0.56797  -0.55782  -0.39725   -0.46813  -0.68172
242.3024  1256.06   0.3870912  8911.627  2.73E+08   | -0.94765  -0.72216  -0.19309   -0.79109  -0.88744
474.6985  1240.683  0.2655689  36576.53  1.75E+09   | -0.76201  -0.72876  -0.77877   0.284886  -0.01055
Note: decimal points were lost when the slide was extracted. The scaled values are unambiguous because they lie in [-1, 1]; the decimal positions in the unscaled zcr, centroid, brightness, and rolloff columns are inferred from the scaled values and hold only up to a common power of ten per column.
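A linear scale to [-1, 1] is a per-feature min/max mapping. The sketch below assumes that is the whole of what the slide's scaling does; the function name `scale_columns` and the handling of a constant column (mapped to the midpoint) are choices made here, not something the slides specify.

```python
def scale_columns(rows, low=-1.0, high=1.0):
    """Linearly map each feature column of `rows` onto [low, high].

    Returns the scaled rows plus the per-column minima and maxima, so the
    same mapping can later be applied to unseen data.
    """
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled_row = []
        for x, col_min, col_max in zip(row, mins, maxs):
            if col_max == col_min:
                # Constant column: no spread to scale, use the midpoint (assumption).
                scaled_row.append((low + high) / 2.0)
            else:
                scaled_row.append(low + (high - low) * (x - col_min) / (col_max - col_min))
        scaled.append(scaled_row)
    return scaled, mins, maxs
```

Without this step a feature with values around 1e9 (kurtosis in the table) would swamp one with values below 1 (brightness) in any Euclidean distance, so k-NN would effectively ignore the small-valued features. Feeding the table's smallest, largest, and first kurtosis entries (83337090, 3.45E+09, 1.54E+08) through this function gives -1, 1, and roughly -0.958, close to the slide's -0.95797.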
k-NN
• Instance-based learning – training examples are stored directly, rather than used to estimate model parameters
• Generally choose k to be odd, which guarantees a majority vote for a class in a two-class problem
Distance Classification
1. Find nearest neighbor
2. Find representative match via class prototype (e.g., center of group or mean of training data class)
Distance metric: the most common is Euclidean distance
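Matching against class prototypes can be sketched as follows, taking the per-class mean as the prototype and Euclidean distance as the metric, as the slide suggests. The function names are illustrative.

```python
import math


def class_prototypes(points, labels):
    """Return {label: mean feature vector} for each class in the training set."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], point)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify_by_prototype(query, prototypes):
    """Pick the class whose prototype is nearest to `query` (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))
```

Compared with full k-NN, classification now needs one distance per class instead of one per training sample, at the price of assuming each class is reasonably well summarized by a single center.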
> End of Lecture 1, Part 1
Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ
This weekhellip
Day 1
Day 2
Day 3
Day 4
Day 5
Questions Concerns
INTRO BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Basic system overview
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
TIMING AND SEGMENTATION
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
Timing and Segmentation
Frames 1 second 1 second
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
bull Onset detection bull Beat detection
ndash Beat ndash Measure Bar Harmonic changes
bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue
Timing and Segmentation
OVER TO YOU LEIGH
BASIC SYSTEM OVERVIEW
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Example Feature Vector
ZCR Centroid Bandwidth Skew
ANALYSIS AND DECISION MAKING HEURISTICS
bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition
bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares
bull Time for more sophistication
Heuristic Analysis
ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)
Traininghellip
TEST
TRAINING SET
ldquo1rdquo ldquo0rdquo
bull Explanationhellip
Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data
Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces
Can easily be ldquooverfitrdquo
We can improve computation efficiency by storing just the class prototypes
k-NN
k-NN
bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)
hellipbuthellip scaling is a good ideahellip
Scaling Example unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
> End of Lecture 1, Part 1
Questions? Concerns?
INTRO: BASIC SYSTEM OVERVIEW
Basic system overview
1. Segmentation (frames, onsets, beats, bars, chord changes, etc.)
2. Feature extraction (time-based, spectral energy, MFCC, etc.)
3. Analysis / decision making (classification, clustering, etc.)
TIMING AND SEGMENTATION
[Figure: signal sliced into frames of 1 second each]
• Slicing up by fixed time slices…
  – 1 second, 80 ms, 100 ms, 20-40 ms, etc.
  – Usually 50% overlap with a window
• "Frames"
  – Different problems call for different frame lengths
• Onset detection
• Beat detection
  – Beat
  – Measure / bar / harmonic changes
• Segments
  – Musically relevant boundaries
  – Separate by some perceptual cue
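The fixed-time slicing above can be sketched in a few lines. This is a minimal illustration, not the workshop's lab code: the function name `frame_signal`, the 8 kHz sample rate, the 40 ms frame length, and the choice of a Hann window are all assumptions made for the example.

```python
import math

def frame_signal(signal, frame_length, overlap=0.5):
    """Split `signal` into Hann-windowed frames of `frame_length` samples.

    `overlap` is the fraction shared by consecutive frames (0.5 = 50%).
    Trailing samples that do not fill a whole frame are dropped.
    """
    hop = max(1, int(frame_length * (1.0 - overlap)))
    # Periodic Hann window: tapers both ends of each frame toward zero.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_length)
              for n in range(frame_length)]
    frames = []
    for start in range(0, len(signal) - frame_length + 1, hop):
        chunk = signal[start:start + frame_length]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# Hypothetical input: one second of a 440 Hz sine at an assumed 8 kHz rate.
sample_rate = 8000
frame_length = sample_rate * 40 // 1000  # 40 ms frames
signal = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate)
          for n in range(sample_rate)]

frames = frame_signal(signal, frame_length, overlap=0.5)
print(len(frames), "frames of", frame_length, "samples each")
```

Shorter frames track fast events such as drum onsets more closely, while longer frames give finer frequency resolution, which is one reason different problems call for different frame lengths.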
OVER TO YOU, LEIGH
Example Feature Vector
[ZCR, Centroid, Bandwidth, Skew]
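A feature vector like the one above could be computed per frame roughly as follows. The slide only names the four features, so the exact definitions here are assumptions: ZCR as the fraction of adjacent samples that change sign, and centroid, bandwidth, and skew as the first three magnitude-weighted moments of the spectrum. The naive DFT and the names `zero_crossing_rate`, `magnitude_spectrum`, and `feature_vector` are illustrative only.

```python
import cmath
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)

def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def feature_vector(frame, sample_rate):
    """Return [zcr, centroid, bandwidth, skew] for one frame."""
    zcr = zero_crossing_rate(frame)
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / len(frame) for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return [zcr, 0.0, 0.0, 0.0]  # silent frame: spectral shape undefined
    # Centroid: magnitude-weighted mean frequency.
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    # Bandwidth: magnitude-weighted standard deviation around the centroid.
    variance = sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total
    bandwidth = math.sqrt(variance)
    # Skew: normalized third moment (asymmetry of the spectrum).
    third = sum((f - centroid) ** 3 * m for f, m in zip(freqs, mags)) / total
    skew = third / bandwidth ** 3 if bandwidth > 0 else 0.0
    return [zcr, centroid, bandwidth, skew]

# Hypothetical frame: 1000 Hz tone plus a quieter 3000 Hz tone, 8 kHz rate.
rate = 8000
frame = [math.sin(2 * math.pi * 1000 * n / rate + 0.4)
         + 0.5 * math.sin(2 * math.pi * 3000 * n / rate + 0.4)
         for n in range(64)]
print(feature_vector(frame, rate))
```

In practice an FFT routine would replace the naive DFT; it is written out here only to keep the sketch free of dependencies.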
ANALYSIS AND DECISION MAKING: HEURISTICS
Heuristic Analysis
• Example: "Cowbell" on just the snare drum of a drum loop ("simple" instrument recognition)
• Use basic thresholds or a simple decision tree to form a rudimentary transcription of kicks and snares
• Time for more sophistication!
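A threshold-based rule of the kind described above might look like the following sketch. The function name `classify_drum_hit`, the two features used, and both threshold values are hypothetical, chosen only to illustrate the idea; real thresholds would be tuned by inspecting the features of actual kick and snare hits.

```python
def classify_drum_hit(zcr, centroid_hz,
                      zcr_threshold=0.1, centroid_threshold_hz=1000.0):
    """Label one detected hit using two hand-picked (hypothetical) thresholds.

    Kicks are assumed to be dark and smooth (low centroid, low ZCR);
    snares are assumed to be bright and noisy (high centroid, high ZCR).
    """
    if centroid_hz < centroid_threshold_hz and zcr < zcr_threshold:
        return "kick"
    if centroid_hz >= centroid_threshold_hz and zcr >= zcr_threshold:
        return "snare"
    return "unknown"  # features disagree; a heuristic cannot decide

# Made-up (zcr, centroid) pairs for three detected onsets.
hits = [(0.02, 150.0), (0.35, 3200.0), (0.03, 2500.0)]
transcription = [classify_drum_hit(z, c) for z, c in hits]
print(transcription)
```

The "unknown" branch shows where hand-written rules run out, which is the motivation for moving on to trained classifiers.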
ANALYSIS AND DECISION MAKING: INSTANCE-BASED CLASSIFIERS (K-NN)
Training…
[Figure: a training set with points labeled "1" and "0", and a test point]
k-NN
• Explanation…
Advantages:
  – Training is trivial: just store the training samples
  – Very simple to implement and use
Disadvantages:
  – Classification gets computationally expensive with a lot of training data: must measure distance to all training samples
  – Euclidean distance becomes problematic in high-dimensional spaces
  – Can easily be "overfit"
We can improve computational efficiency by storing just the class prototypes.
k-NN
• Steps
  – Measure distance to all points
  – Take the k closest
  – Majority rules (e.g., if k=5, then take 3 out of 5)
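The three steps above translate almost directly into code. This is a minimal sketch under assumed names (`knn_classify`, the toy two-dimensional training set, and the labels 1 and 0), using Euclidean distance via the standard library's `math.dist`.

```python
import math
from collections import Counter

def knn_classify(query, training_points, training_labels, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: measure the distance to every training point.
    distances = sorted(
        ((math.dist(query, point), label)
         for point, label in zip(training_points, training_labels)),
        key=lambda pair: pair[0])
    # Step 2: take the k closest.
    nearest_labels = [label for _, label in distances[:k]]
    # Step 3: majority rules.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy training set (made up): class 1 clusters near (0, 0), class 0 near (1, 1).
training_points = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2),
                   (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
training_labels = [1, 1, 1, 0, 0, 0]

print(knn_classify((0.15, 0.1), training_points, training_labels, k=5))
print(knn_classify((0.7, 0.9), training_points, training_labels, k=3))
```

Every query scans the whole training set, which is the cost noted under the disadvantages of k-NN.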
…but… scaling is a good idea…
Scaling Example: linear scale to [-1, 1]
unscaled                                          | scaled
zcr      centroid  brightness  rolloff  kurtosis  | zcr       centroid  brightness  rolloff   kurtosis
2138663  4461262   5278183     5741853  1.54E+08  | 0.567221  0.654219  0.485151    -0.91438  -0.95797
1420046  3860736   5266741     4433475  1.03E+08  | -0.00683  0.396341  0.479636    -0.96526  -0.98848
1412664  4095468   3829324     1168014  3.96E+08  | -0.01273  0.49714   -0.21313    -0.68342  -0.81426
1223236  3551842   5074382     4351981  1.03E+08  | -0.16405  0.263696  0.386928    -0.96843  -0.98836
2680428  5266489   5354988     444444   1.37E+08  | 1         1         0.522167    -0.96484  -0.96824
2304393  431617    4682006     7819634  2.28E+08  | 0.699611  0.591913  0.19782     -0.83357  -0.91439
1594079  4597727   5422522     5733602  1.49E+08  | 0.13219   0.71282   0.554715    -0.9147   -0.96108
1900388  4732717   6346437     3540374  83337090  | 0.37688   0.770787  1           -1        -1
2231022  4914268   5765587     3765195  1.01E+08  | 0.641     0.848749  0.720057    -0.99126  -0.98934
1221818  4323917   2196666     5496309  3.45E+09  | -0.16518  0.59524   -1          1         1
1018673  2169696   2944604     3511451  1.48E+09  | -0.32746  -0.32983  -0.63953    0.228023  -0.17284
1767722  1038943   3765943     9728746  3.18E+08  | -1        -0.81539  -0.24368    -0.75931  -0.86091
7589167  6090479   219855      2146602  1.37E+09  | -0.53496  -1        -0.99909    -0.30281  -0.23807
6562162  1982091   3404355     1829962  6.59E+08  | -0.617    -0.41039  -0.41795    -0.42596  -0.6582
5639213  1309174   4358942     1152514  3.71E+08  | -0.69073  -0.69935  0.042118    -0.68945  -0.82931
110488   9674398   378372      9308457  3.04E+08  | -0.2586   -0.8461   -0.23511    -0.77566  -0.86889
4231187  1113061   2758021     3217640  1.62E+09  | -0.80321  -0.78357  -0.72945    0.11375   -0.08884
7175933  1638758   3447308     1721541  6.2E+08   | -0.56797  -0.55782  -0.39725    -0.46813  -0.68172
2423024  125606    3870912     8911627  2.73E+08  | -0.94765  -0.72216  -0.19309    -0.79109  -0.88744
4746985  1240683   2655689     3657653  1.75E+09  | -0.76201  -0.72876  -0.77877    0.284886  -0.01055
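A linear scale to [-1, 1], as in the table above, maps each feature column's minimum to -1 and its maximum to +1. The sketch below assumes the helper names `linear_scale` and `scale_features` and uses small made-up numbers rather than the slide's data.

```python
def linear_scale(column, low=-1.0, high=1.0):
    """Map the values of one feature linearly so min -> low and max -> high."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: no spread to scale, so place it mid-range.
        return [(low + high) / 2.0] * len(column)
    return [low + (high - low) * (x - lo) / (hi - lo) for x in column]

def scale_features(rows):
    """Scale every column of a list of feature vectors independently."""
    columns = list(zip(*rows))
    scaled_columns = [linear_scale(col) for col in columns]
    return [list(row) for row in zip(*scaled_columns)]

# Made-up feature vectors: the second feature is about 100x larger than the first.
rows = [[10.0, 2000.0],
        [20.0, 1000.0],
        [15.0, 4000.0]]
scaled = scale_features(rows)
for original, result in zip(rows, scaled):
    print(original, "->", result)
```

Without this step, a feature with a large numeric range (such as kurtosis in the table) would dominate the Euclidean distance. When classifying new data, the minimum and maximum found on the training set should be reused so that both sets are scaled identically.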
k-NN
• Instance-based learning – training examples are stored directly, rather than estimating model parameters
• Generally choose k to be odd, to guarantee a majority vote for a class (in a two-class problem)
Distance Classification
1. Find nearest neighbor
2. Find representative match via class prototype (e.g., center of group or mean of training data class)
Distance metric: the most common is Euclidean distance
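The second option above, matching against one prototype per class, can be sketched as follows. The names `class_prototypes` and `nearest_prototype` and the toy data are assumptions for illustration; the prototype here is the per-class mean, one of the choices the slide mentions.

```python
import math
from collections import defaultdict

def class_prototypes(points, labels):
    """Return {label: mean feature vector} for each class in the training data."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {label: [sum(dim) / len(members) for dim in zip(*members)]
            for label, members in grouped.items()}

def nearest_prototype(query, prototypes):
    """Pick the class whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Toy training set (made up): class 1 near (0, 0), class 0 near (1, 1).
points = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2),
          (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
labels = [1, 1, 1, 0, 0, 0]

prototypes = class_prototypes(points, labels)
print(prototypes)
print(nearest_prototype((0.2, 0.1), prototypes))
```

Each query now needs one distance per class instead of one per training sample, at the cost of describing each class by a single point.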
TIMING AND SEGMENTATION
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
Timing and Segmentation
Frames 1 second 1 second
bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window
bull ldquoFramesrdquo ndash Different problems call for different frame lengths
bull Onset detection bull Beat detection
ndash Beat ndash Measure Bar Harmonic changes
bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue
Timing and Segmentation
OVER TO YOU LEIGH
BASIC SYSTEM OVERVIEW
• Segmentation (Frames, Onsets, Beats, Bars, Chord Changes, etc.)
• Feature Extraction (Time-based, spectral energy, MFCC, etc.)
• Analysis / Decision Making (Classification, Clustering, etc.)
Example Feature Vector: ZCR, Centroid, Bandwidth, Skew
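As a rough illustration of the first two entries in such a feature vector, the sketch below computes a zero-crossing rate and a spectral centroid for one frame. The function names, the naive DFT, and the 100 Hz test tone are assumptions made for this example; bandwidth and skew are left out to keep it short.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (adequate for a short demo frame)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    freqs = [k * sample_rate / len(frame) for k in range(len(mags))]
    return sum(f * m for f, m in zip(freqs, mags)) / total


# Hypothetical test frame: 64 samples of a 100 Hz sine at an 800 Hz sample rate.
sample_rate = 800
frame = [math.sin(2.0 * math.pi * 100.0 * t / sample_rate + 0.1) for t in range(64)]
feature_vector = [zero_crossing_rate(frame), spectral_centroid(frame, sample_rate)]
```

For a pure tone the centroid sits at the tone's frequency; noisier or brighter sounds push both features upward.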
ANALYSIS AND DECISION MAKING: HEURISTICS
Heuristic Analysis
• Example: "Cowbell" on just the snare drum of a drum loop. "Simple" instrument recognition.
• Use basic thresholds or a simple decision tree to form a rudimentary transcription of kicks and snares
• Time for more sophistication
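A threshold heuristic of this kind might look like the sketch below. It assumes each detected onset already has a zero-crossing rate and a spectral centroid, and that kicks are darker and less noisy than snares. The threshold values, the function name `classify_drum_hit`, and the onset data are all hypothetical; real thresholds would be tuned by inspecting the features of the loop at hand.

```python
def classify_drum_hit(zcr, centroid_hz, zcr_threshold=0.1, centroid_threshold_hz=1500.0):
    """Two-test decision rule: low ZCR and low centroid means kick, anything else snare."""
    if zcr < zcr_threshold and centroid_hz < centroid_threshold_hz:
        return "kick"
    return "snare"


# Hypothetical onsets: (time in seconds, zero-crossing rate, spectral centroid in Hz).
onsets = [
    (0.0, 0.02, 300.0),
    (0.5, 0.25, 4200.0),
    (1.0, 0.03, 450.0),
    (1.5, 0.21, 3900.0),
]

# Rudimentary transcription: one (time, label) pair per onset.
transcription = [(time, classify_drum_hit(zcr, centroid)) for time, zcr, centroid in onsets]
```

The appeal is that the rule is readable and needs no training; the drawback is that hand-picked thresholds rarely transfer to a different recording.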
ANALYSIS AND DECISION MAKING: INSTANCE-BASED CLASSIFIERS (K-NN)
Training…
(Figure: a TEST point and the TRAINING SET, with classes labeled "1" and "0")
• Explanation…
Advantages:
• Training is trivial: just store the training samples
• Very simple to implement and use
Disadvantages:
• Classification gets computationally expensive with a lot of training data, because distance must be measured to all training samples
• Euclidean distance becomes problematic in high-dimensional spaces
• Can easily be "overfit"
We can improve computational efficiency by storing just the class prototypes.
k-NN
• Steps
  - Measure distance to all points
  - Take the k closest
  - Majority rules (e.g., if k=5, then take 3 out of 5)
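The three steps can be sketched in a few lines of plain Python. The function name `knn_classify` and the toy 2-D points are made up for illustration; the labels "0" and "1" simply echo the class labels on the slide.

```python
import math
from collections import Counter


def knn_classify(query, training_points, training_labels, k=5):
    """Classify query by majority vote among its k nearest training points."""
    # Step 1: measure the Euclidean distance to all training points.
    distances = [
        (math.dist(query, point), label)
        for point, label in zip(training_points, training_labels)
    ]
    # Step 2: take the k closest.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority rules.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set with two classes.
training_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
training_labels = ["0", "0", "0", "1", "1", "1"]

predicted = knn_classify((0.8, 0.8), training_points, training_labels, k=5)
```

With k=5 the query's neighbors are three points of class "1" and two of class "0", so the vote is 3 out of 5.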
…but… scaling is a good idea…
Scaling Example: linear scale to [-1, 1]

unscaled                                              | scaled
zcr         centroid  brightness  rolloff   kurtosis  | zcr       centroid  brightness  rolloff   kurtosis
0.2138663   4461.262  0.5278183   574.1853  1.54E+08  | 0.567221  0.654219  0.485151    -0.91438  -0.95797
0.1420046   3860.736  0.5266741   443.3475  1.03E+08  | -0.00683  0.396341  0.479636    -0.96526  -0.98848
0.1412664   4095.468  0.3829324   1168.014  3.96E+08  | -0.01273  0.49714   -0.21313    -0.68342  -0.81426
0.1223236   3551.842  0.5074382   435.1981  1.03E+08  | -0.16405  0.263696  0.386928    -0.96843  -0.98836
0.2680428   5266.489  0.5354988   444.444   1.37E+08  | 1         1         0.522167    -0.96484  -0.96824
0.2304393   4316.17   0.4682006   781.9634  2.28E+08  | 0.699611  0.591913  0.19782     -0.83357  -0.91439
0.1594079   4597.727  0.5422522   573.3602  1.49E+08  | 0.13219   0.71282   0.554715    -0.9147   -0.96108
0.1900388   4732.717  0.6346437   354.0374  83337090  | 0.37688   0.770787  1           -1        -1
0.2231022   4914.268  0.5765587   376.5195  1.01E+08  | 0.641     0.848749  0.720057    -0.99126  -0.98934
0.1221818   4323.917  0.2196666   5496.309  3.45E+09  | -0.16518  0.59524   -1          1         1
0.1018673   2169.696  0.2944604   3511.451  1.48E+09  | -0.32746  -0.32983  -0.63953    0.228023  -0.17284
0.01767722  1038.943  0.3765943   972.8746  3.18E+08  | -1        -0.81539  -0.24368    -0.75931  -0.86091
0.07589167  609.0479  0.219855    2146.602  1.37E+09  | -0.53496  -1        -0.99909    -0.30281  -0.23807
0.06562162  1982.091  0.3404355   1829.962  6.59E+08  | -0.617    -0.41039  -0.41795    -0.42596  -0.6582
0.05639213  1309.174  0.4358942   1152.514  3.71E+08  | -0.69073  -0.69935  0.042118    -0.68945  -0.82931
0.110488    967.4398  0.378372    930.8457  3.04E+08  | -0.2586   -0.8461   -0.23511    -0.77566  -0.86889
0.04231187  1113.061  0.2758021   3217.640  1.62E+09  | -0.80321  -0.78357  -0.72945    0.11375   -0.08884
0.07175933  1638.758  0.3447308   1721.541  6.2E+08   | -0.56797  -0.55782  -0.39725    -0.46813  -0.68172
0.02423024  1256.06   0.3870912   891.1627  2.73E+08  | -0.94765  -0.72216  -0.19309    -0.79109  -0.88744
0.04746985  1240.683  0.2655689   3657.653  1.75E+09  | -0.76201  -0.72876  -0.77877    0.284886  -0.01055
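The linear scaling to [-1, 1] maps each feature's minimum to -1 and its maximum to 1. A sketch in plain Python, using four values from the zcr column: the function names are made up for this example, and the placement of the decimal points in the unscaled values is an assumption, chosen because it reproduces the scaled column.

```python
def fit_range(values):
    """Learn the minimum and maximum of one feature from the training data."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot scale a constant feature")
    return lo, hi


def scale_linear(value, lo, hi):
    """Map lo to -1 and hi to +1, linearly in between."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


# Four zcr values, including the column's maximum and minimum
# (decimal placement assumed, see the note above).
zcr = [0.2138663, 0.1420046, 0.2680428, 0.01767722]
lo, hi = fit_range(zcr)
scaled = [scale_linear(v, lo, hi) for v in zcr]
```

Without this step the kurtosis column, with values around 1E+08 to 1E+09, would dominate any Euclidean distance and the other four features would barely matter. The same `lo` and `hi` learned from the training data should be reused when scaling test data.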
k-NN
• Instance-based learning: training examples are stored directly, rather than used to estimate model parameters
• Generally choose an odd k so that a vote between two classes cannot end in a tie
Distance Classification
1. Find nearest neighbor
2. Find representative match via class prototype (e.g., center of group or mean of training data class)
Distance metric: the most common is Euclidean distance
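The class-prototype variant can be sketched as follows: each class is reduced to the mean of its training points, and a query is assigned to the class whose prototype is nearest in Euclidean distance. The function names and toy data are illustrative assumptions.

```python
import math


def class_prototypes(points, labels):
    """Return one prototype per class: the mean of that class's training points."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def classify_by_prototype(query, prototypes):
    """Pick the class whose prototype has the smallest Euclidean distance to query."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical training set with two classes.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["0", "0", "0", "1", "1", "1"]

prototypes = class_prototypes(points, labels)
predicted = classify_by_prototype((0.8, 0.8), prototypes)
```

Only one distance per class is computed at classification time, rather than one per training sample, at the cost of ignoring the shape of each class.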
> End of Lecture 1 Part 1
Segmentation
(Frames Onsets Beats Bars Chord
Changes etc)
Feature Extraction
(Time-based spectral energy
MFCC etc)
Analysis Decision Making
(Classification Clustering etc)
Basic system overview
Example Feature Vector
ZCR Centroid Bandwidth Skew
ANALYSIS AND DECISION MAKING HEURISTICS
bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition
bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares
bull Time for more sophistication
Heuristic Analysis
ANALYSIS AND DECISION MAKING: INSTANCE-BASED CLASSIFIERS (K-NN)
k-NN
[Figure: training. A TRAINING SET of points labeled “1” and “0”, with a TEST point to classify]
• Explanation…
Advantages:
– Training is trivial: just store the training samples
– Very simple to implement and use
Disadvantages:
– Classification gets expensive with a lot of training data: the distance to every training sample must be measured
– Euclidean distance becomes problematic in high-dimensional spaces
– Can easily be “overfit”
We can improve computational efficiency by storing just the class prototypes.
k-NN
• Steps:
– Measure distance to all points
– Take the k closest
– Majority rules (e.g., if k=5, then take 3 out of 5)
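The three steps can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance and plain lists of feature vectors; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: measure distance to all points.
    distances = [
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: take the k closest.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority rules.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-class training set (invented for illustration).
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = [0, 0, 0, 1, 1, 1]
near_origin = knn_classify(train_points, train_labels, (0.2, 0.2), k=5)
near_corner = knn_classify(train_points, train_labels, (4.5, 5.0), k=3)
```

With k=5 the query near the origin collects three votes for class 0 and two for class 1, the "3 out of 5" case from the slide.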
…but… scaling is a good idea…

Scaling Example: linear scale to [-1, 1]
(Unscaled zcr, centroid, brightness, and rolloff are shown as extracted digit strings; their decimal points did not survive extraction.)

unscaled                                          | scaled
zcr      centroid  brightness  rolloff  kurtosis  | zcr       centroid  brightness  rolloff   kurtosis
2138663  4461262   5278183     5741853  1.54E+08  | 0.567221  0.654219  0.485151    -0.91438  -0.95797
1420046  3860736   5266741     4433475  1.03E+08  | -0.00683  0.396341  0.479636    -0.96526  -0.98848
1412664  4095468   3829324     1168014  3.96E+08  | -0.01273  0.49714   -0.21313    -0.68342  -0.81426
1223236  3551842   5074382     4351981  1.03E+08  | -0.16405  0.263696  0.386928    -0.96843  -0.98836
2680428  5266489   5354988     444444   1.37E+08  | 1         1         0.522167    -0.96484  -0.96824
2304393  431617    4682006     7819634  2.28E+08  | 0.699611  0.591913  0.19782     -0.83357  -0.91439
1594079  4597727   5422522     5733602  1.49E+08  | 0.13219   0.71282   0.554715    -0.9147   -0.96108
1900388  4732717   6346437     3540374  83337090  | 0.37688   0.770787  1           -1        -1
2231022  4914268   5765587     3765195  1.01E+08  | 0.641     0.848749  0.720057    -0.99126  -0.98934
1221818  4323917   2196666     5496309  3.45E+09  | -0.16518  0.59524   -1          1         1
1018673  2169696   2944604     3511451  1.48E+09  | -0.32746  -0.32983  -0.63953    0.228023  -0.17284
1767722  1038943   3765943     9728746  3.18E+08  | -1        -0.81539  -0.24368    -0.75931  -0.86091
7589167  6090479   219855      2146602  1.37E+09  | -0.53496  -1        -0.99909    -0.30281  -0.23807
6562162  1982091   3404355     1829962  6.59E+08  | -0.617    -0.41039  -0.41795    -0.42596  -0.6582
5639213  1309174   4358942     1152514  3.71E+08  | -0.69073  -0.69935  0.042118    -0.68945  -0.82931
110488   9674398   378372      9308457  3.04E+08  | -0.2586   -0.8461   -0.23511    -0.77566  -0.86889
4231187  1113061   2758021     3217640  1.62E+09  | -0.80321  -0.78357  -0.72945    0.11375   -0.08884
7175933  1638758   3447308     1721541  6.2E+08   | -0.56797  -0.55782  -0.39725    -0.46813  -0.68172
2423024  125606    3870912     8911627  2.73E+08  | -0.94765  -0.72216  -0.19309    -0.79109  -0.88744
4746985  1240683   2655689     3657653  1.75E+09  | -0.76201  -0.72876  -0.77877    0.284886  -0.01055
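A linear scale to [-1, 1] maps each column's smallest value to -1 and its largest to 1, which is the pattern visible in the table above: x' = 2 (x - min) / (max - min) - 1. The sketch below is one way to write it, with hypothetical function names and invented toy rows. Returning the per-column minima and maxima is a design assumption so that the same mapping can later be applied to test data.

```python
def fit_scaler(rows):
    """Return per-column (mins, maxs) for a list of equal-length feature rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale_rows(rows, mins, maxs, lo=-1.0, hi=1.0):
    """Linearly map each column from [min, max] to [lo, hi]."""
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no range; place it at the midpoint.
            lo + (hi - lo) * (x - mn) / (mx - mn) if mx > mn else (lo + hi) / 2
            for x, mn, mx in zip(row, mins, maxs)
        ])
    return scaled


# Toy rows with very different column ranges (invented for illustration).
rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
mins, maxs = fit_scaler(rows)
scaled_rows = scale_rows(rows, mins, maxs)
```

Without scaling, a feature with a huge numeric range (kurtosis in the table) would dominate the Euclidean distance and the small-range features would barely influence the k-NN vote.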
k-NN
• Instance-based learning – training examples are stored directly, rather than used to estimate model parameters
• Generally choose k to be odd, to guarantee a majority vote (no ties) in a two-class problem
Distance Classification
1. Find nearest neighbor
2. Find representative match via class prototype (e.g., center of group or mean of training data class)
Distance metric: most common is Euclidean distance
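A minimal sketch of the class-prototype variant, assuming each prototype is the mean of its class's training vectors and that distance is Euclidean, d(x, y) = sqrt(sum_i (x_i - y_i)^2). The function names and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_prototypes(points, labels):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, x in enumerate(point):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify_by_prototype(prototypes, query):
    """Return the label whose prototype is closest to `query`."""
    return min(prototypes, key=lambda label: euclidean(prototypes[label], query))


# Toy two-class training set (invented for illustration).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
prototypes = class_prototypes(points, labels)
predicted = classify_by_prototype(prototypes, (4, 4))
```

Classification then costs one distance per class instead of one per training sample, at the price of assuming each class is reasonably summarized by a single center.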
> End of Lecture 1 Part 1
Example Feature Vector
ZCR Centroid Bandwidth Skew
ANALYSIS AND DECISION MAKING HEURISTICS
bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition
bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares
bull Time for more sophistication
Heuristic Analysis
ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)
Traininghellip
TEST
TRAINING SET
ldquo1rdquo ldquo0rdquo
bull Explanationhellip
Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data
Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces
Can easily be ldquooverfitrdquo
We can improve computation efficiency by storing just the class prototypes
k-NN
k-NN
bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)
hellipbuthellip scaling is a good ideahellip
Scaling Example unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
Scaling Example linear scale to -11 unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
k-NN
bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters
bull Generally choose k being odd to guarantee a majority vote for a class
Distance Classification
1 Find nearest neighbor 2 Find representative match via class prototype (eg
center of group or mean of training data class) Distance metric Most common Euclidean distance
gt End of Lecture 1 Part 1
ANALYSIS AND DECISION MAKING HEURISTICS
bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition
bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares
bull Time for more sophistication
Heuristic Analysis
ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)
Traininghellip
TEST
TRAINING SET
ldquo1rdquo ldquo0rdquo
bull Explanationhellip
Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data
Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces
Can easily be ldquooverfitrdquo
We can improve computation efficiency by storing just the class prototypes
k-NN
k-NN
bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)
hellipbuthellip scaling is a good ideahellip
Scaling Example unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
Scaling Example linear scale to -11 unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
k-NN
bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters
bull Generally choose k being odd to guarantee a majority vote for a class
Distance Classification
1 Find nearest neighbor 2 Find representative match via class prototype (eg
center of group or mean of training data class) Distance metric Most common Euclidean distance
gt End of Lecture 1 Part 1
bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition
bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares
bull Time for more sophistication
Heuristic Analysis
ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)
Traininghellip
TEST
TRAINING SET
ldquo1rdquo ldquo0rdquo
bull Explanationhellip
Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data
Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces
Can easily be ldquooverfitrdquo
We can improve computation efficiency by storing just the class prototypes
k-NN
k-NN
bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)
hellipbuthellip scaling is a good ideahellip
Scaling Example unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
Scaling Example linear scale to -11 unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
k-NN
bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters
bull Generally choose k being odd to guarantee a majority vote for a class
Distance Classification
1 Find nearest neighbor 2 Find representative match via class prototype (eg
center of group or mean of training data class) Distance metric Most common Euclidean distance
gt End of Lecture 1 Part 1
ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)
Traininghellip
TEST
TRAINING SET
ldquo1rdquo ldquo0rdquo
bull Explanationhellip
Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data
Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces
Can easily be ldquooverfitrdquo
We can improve computation efficiency by storing just the class prototypes
k-NN
k-NN
bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)
hellipbuthellip scaling is a good ideahellip
Scaling Example unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
Scaling Example linear scale to -11 unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
k-NN
bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters
bull Generally choose k being odd to guarantee a majority vote for a class
Distance Classification
1 Find nearest neighbor 2 Find representative match via class prototype (eg
center of group or mean of training data class) Distance metric Most common Euclidean distance
gt End of Lecture 1 Part 1
Traininghellip
TEST
TRAINING SET
ldquo1rdquo ldquo0rdquo
bull Explanationhellip
Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data
Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces
Can easily be ldquooverfitrdquo
We can improve computation efficiency by storing just the class prototypes
k-NN
k-NN
bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)
hellipbuthellip scaling is a good ideahellip
Scaling Example unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
Scaling Example linear scale to -11 unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
k-NN
bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters
bull Generally choose k being odd to guarantee a majority vote for a class
Distance Classification
1 Find nearest neighbor 2 Find representative match via class prototype (eg
center of group or mean of training data class) Distance metric Most common Euclidean distance
gt End of Lecture 1 Part 1
bull Explanationhellip
Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data
Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces
Can easily be ldquooverfitrdquo
We can improve computation efficiency by storing just the class prototypes
k-NN
k-NN
bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)
hellipbuthellip scaling is a good ideahellip
Scaling Example unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
Scaling Example linear scale to -11 unscaled scaled
zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis
2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797
1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848
1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426
1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836
2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824
2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439
1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108
1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1
2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934
1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1
1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284
1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091
7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807
6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582
5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931
110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889
4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884
7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172
2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744
4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055
k-NN
bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters
bull Generally choose k being odd to guarantee a majority vote for a class
Distance Classification
1 Find nearest neighbor 2 Find representative match via class prototype (eg
center of group or mean of training data class) Distance metric Most common Euclidean distance
gt End of Lecture 1 Part 1
k-NN
bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)
hellipbuthellip scaling is a good ideahellip
Scaling Example: linear scale to [-1, 1]

unscaled                                         | scaled
zcr      centroid  brightness  rolloff  kurtosis |  zcr         centroid    brightness  rolloff     kurtosis
2138663  4461262   5278183     5741853  1.54E+08 |  0.567221    0.654219    0.485151   -0.91438    -0.95797
1420046  3860736   5266741     4433475  1.03E+08 | -0.00683     0.396341    0.479636   -0.96526    -0.98848
1412664  4095468   3829324     1168014  3.96E+08 | -0.01273     0.49714    -0.21313    -0.68342    -0.81426
1223236  3551842   5074382     4351981  1.03E+08 | -0.16405     0.263696    0.386928   -0.96843    -0.98836
2680428  5266489   5354988     444444   1.37E+08 |  1           1           0.522167   -0.96484    -0.96824
2304393  431617    4682006     7819634  2.28E+08 |  0.699611    0.591913    0.19782    -0.83357    -0.91439
1594079  4597727   5422522     5733602  1.49E+08 |  0.13219     0.71282     0.554715   -0.9147     -0.96108
1900388  4732717   6346437     3540374  83337090 |  0.37688     0.770787    1          -1          -1
2231022  4914268   5765587     3765195  1.01E+08 |  0.641       0.848749    0.720057   -0.99126    -0.98934
1221818  4323917   2196666     5496309  3.45E+09 | -0.16518     0.59524    -1           1           1
1018673  2169696   2944604     3511451  1.48E+09 | -0.32746    -0.32983    -0.63953     0.228023   -0.17284
1767722  1038943   3765943     9728746  3.18E+08 | -1          -0.81539    -0.24368    -0.75931    -0.86091
7589167  6090479   219855      2146602  1.37E+09 | -0.53496    -1          -0.99909    -0.30281    -0.23807
6562162  1982091   3404355     1829962  6.59E+08 | -0.617      -0.41039    -0.41795    -0.42596    -0.6582
5639213  1309174   4358942     1152514  3.71E+08 | -0.69073    -0.69935     0.042118   -0.68945    -0.82931
110488   9674398   378372      9308457  3.04E+08 | -0.2586     -0.8461     -0.23511    -0.77566    -0.86889
4231187  1113061   2758021     3217640  1.62E+09 | -0.80321    -0.78357    -0.72945     0.11375    -0.08884
7175933  1638758   3447308     1721541  6.2E+08  | -0.56797    -0.55782    -0.39725    -0.46813    -0.68172
2423024  125606    3870912     8911627  2.73E+08 | -0.94765    -0.72216    -0.19309    -0.79109    -0.88744
4746985  1240683   2655689     3657653  1.75E+09 | -0.76201    -0.72876    -0.77877     0.284886   -0.01055
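A linear scale to [-1, 1] maps each feature column independently with x' = 2 (x - min) / (max - min) - 1, so the column minimum becomes -1 and the column maximum becomes 1. The sketch below is one way to write that; the function name `scale_columns`, the handling of constant columns, and the toy numbers are assumptions for illustration, not the workshop's own code.

```python
def scale_columns(rows, lo=-1.0, hi=1.0):
    """Linearly rescale every column of `rows` to the range [lo, hi]."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]

    scaled = []
    for row in rows:
        new_row = []
        for value, mn, mx in zip(row, mins, maxs):
            if mx > mn:
                new_row.append(lo + (hi - lo) * (value - mn) / (mx - mn))
            else:
                # Assumption: a constant column carries no information,
                # so map it to the middle of the target range.
                new_row.append((lo + hi) / 2.0)
        scaled.append(new_row)
    return scaled


# Hypothetical two-feature data with very different magnitudes.
rows = [[2.0, 100.0], [4.0, 300.0], [3.0, 200.0]]
print(scale_columns(rows))  # -> [[-1.0, -1.0], [1.0, 1.0], [0.0, 0.0]]
```

Without this step, a Euclidean distance is dominated by whichever feature has the largest raw magnitude (kurtosis in the table), and the remaining features barely influence which neighbors are chosen. The minima and maxima should come from the training data and then be reused unchanged on test data.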
k-NN
• Instance-based learning: training examples are stored directly rather than used to estimate model parameters
• Generally choose k to be odd so that a two-class vote cannot tie
Distance Classification
1. Find nearest neighbor
2. Find representative match via class prototype (e.g., center of group or mean of training data class)
Distance metric: most common is Euclidean distance
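The second option, matching against a class prototype, can be sketched as follows. Each class is reduced to the mean of its training vectors, and a query is assigned to the class whose mean is closest in Euclidean distance. The function names `class_prototypes` and `classify_by_prototype` and the toy data are assumptions for illustration.

```python
import math


def class_prototypes(points, labels):
    """Return {label: mean feature vector} computed from the training data."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], point)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in sums[label]]
        for label in sums
    }


def classify_by_prototype(prototypes, query):
    """Assign `query` to the class with the nearest prototype (Euclidean)."""
    def distance(label):
        proto = prototypes[label]
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(proto, query)))
    return min(prototypes, key=distance)


# Hypothetical toy data: the class means are (1, 0) and (11, 10).
protos = class_prototypes(
    [(0, 0), (2, 0), (10, 10), (12, 10)], ["A", "A", "B", "B"]
)
print(classify_by_prototype(protos, (3, 1)))  # -> A
```

Compared with a nearest-neighbor search, this stores one vector per class instead of every training example, at the cost of assuming each class is reasonably summarized by a single center.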
> End of Lecture 1 Part 1