intelligent audio systems: a review of the foundations and ... · pdf fileremixes with echo...

49
Jay LeBoeuf jay{at}izotope.com Leigh Smith lsmith{at}izotope.com Steve Tjoa [email protected] June 2013 Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval DAY 1

Upload: hoangtu

Post on 18-Mar-2018

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Jay LeBoeuf jayatizotopecom

Leigh Smith

lsmithatizotopecom

Steve Tjoa kiemyanggmailcom

June 2013

Intelligent Audio Systems A review of the foundations and applications of

semantic audio analysis and music information retrieval

DAY 1

bull httpsccrmastanfordeduwikiMIR_workshop_2013

bull Getting setup at CCRMA ndash Parking Tresidder ($12day) CCRMA ($12day daily permits) ndash Building access logins laptops ndash Wifi (Lecturers can log you in as guest)

bull Online access + lectures video materials

bull Daily schedule ndash 9930 ndash Contact us if you know yoursquoll be late

bull Introductions

ndash Our background ndash A little about yourself ndash Your area of interest background with DSP coding Matlab and any specific

items of interest that yoursquod like to see covered ndash Will you be using your own laptop for the lab

bull If so do you have Matlab (MacPC)

Administration

bull Cornell University Electrical Engineering bull Stanford University CCRMA (MAMST) bull DigidesignAvid

ndash ATG Enter ML and MIR

bull Imagine Research ndash Pandora Line 6 Lucasfilm iZotope ndash NSF SBIR Phase I II IIB ndash Businessweek Innovator November 2012

bull CCRMA MIR Workshop 2008-2013 (6th Year) bull iZotope

Introduction Jay LeBoeuf

Example Seedhellip

Problems

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

Text metadata only (Name Creator User Tags)

The state of the art is ldquosearch by textrdquo

Solution Add ears to computers

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning

Content analysis

Trends

1 Ubiquitous Content

2 Machine Learning

3 Cloud Computing

Find me sound effects that sound like this

Discover and interact with content

Audio Search similarity search find similar

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning Content analysis

Metadata (XML JSON)

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 2: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

bull httpsccrmastanfordeduwikiMIR_workshop_2013

bull Getting setup at CCRMA ndash Parking Tresidder ($12day) CCRMA ($12day daily permits) ndash Building access logins laptops ndash Wifi (Lecturers can log you in as guest)

bull Online access + lectures video materials

bull Daily schedule ndash 9930 ndash Contact us if you know yoursquoll be late

bull Introductions

ndash Our background ndash A little about yourself ndash Your area of interest background with DSP coding Matlab and any specific

items of interest that yoursquod like to see covered ndash Will you be using your own laptop for the lab

bull If so do you have Matlab (MacPC)

Administration

bull Cornell University Electrical Engineering bull Stanford University CCRMA (MAMST) bull DigidesignAvid

ndash ATG Enter ML and MIR

bull Imagine Research ndash Pandora Line 6 Lucasfilm iZotope ndash NSF SBIR Phase I II IIB ndash Businessweek Innovator November 2012

bull CCRMA MIR Workshop 2008-2013 (6th Year) bull iZotope

Introduction Jay LeBoeuf

Example Seedhellip

Problems

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

Text metadata only (Name Creator User Tags)

The state of the art is ldquosearch by textrdquo

Solution Add ears to computers

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning

Content analysis

Trends

1 Ubiquitous Content

2 Machine Learning

3 Cloud Computing

Find me sound effects that sound like this

Discover and interact with content

Audio Search similarity search find similar

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning Content analysis

Metadata (XML JSON)

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 3: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

bull Cornell University Electrical Engineering bull Stanford University CCRMA (MAMST) bull DigidesignAvid

ndash ATG Enter ML and MIR

bull Imagine Research ndash Pandora Line 6 Lucasfilm iZotope ndash NSF SBIR Phase I II IIB ndash Businessweek Innovator November 2012

bull CCRMA MIR Workshop 2008-2013 (6th Year) bull iZotope

Introduction Jay LeBoeuf

Example Seedhellip

Problems

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

Text metadata only (Name Creator User Tags)

The state of the art is ldquosearch by textrdquo

Solution Add ears to computers

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning

Content analysis

Trends

1 Ubiquitous Content

2 Machine Learning

3 Cloud Computing

Find me sound effects that sound like this

Discover and interact with content

Audio Search similarity search find similar

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning Content analysis

Metadata (XML JSON)

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 4: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Example Seedhellip

Problems

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

Text metadata only (Name Creator User Tags)

The state of the art is ldquosearch by textrdquo

Solution Add ears to computers

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning

Content analysis

Trends

1 Ubiquitous Content

2 Machine Learning

3 Cloud Computing

Find me sound effects that sound like this

Discover and interact with content

Audio Search similarity search find similar

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning Content analysis

Metadata (XML JSON)

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 5: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Problems

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

Text metadata only (Name Creator User Tags)

The state of the art is ldquosearch by textrdquo

Solution Add ears to computers

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning

Content analysis

Trends

1 Ubiquitous Content

2 Machine Learning

3 Cloud Computing

Find me sound effects that sound like this

Discover and interact with content

Audio Search similarity search find similar

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning Content analysis

Metadata (XML JSON)

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 6: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

Text metadata only (Name Creator User Tags)

The state of the art is ldquosearch by textrdquo

Solution Add ears to computers

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning

Content analysis

Trends

1 Ubiquitous Content

2 Machine Learning

3 Cloud Computing

Find me sound effects that sound like this

Discover and interact with content

Audio Search similarity search find similar

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning Content analysis

Metadata (XML JSON)

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 7: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

1 Computers are deaf 2 Content is overwhelming and unsearchable

Problems

Text metadata only (Name Creator User Tags)

The state of the art is ldquosearch by textrdquo

Solution Add ears to computers

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning

Content analysis

Trends

1 Ubiquitous Content

2 Machine Learning

3 Cloud Computing

Find me sound effects that sound like this

Discover and interact with content

Audio Search similarity search find similar

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning Content analysis

Metadata (XML JSON)

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 8: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Solution Add ears to computers

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning

Content analysis

Trends

1 Ubiquitous Content

2 Machine Learning

3 Cloud Computing

Find me sound effects that sound like this

Discover and interact with content

Audio Search similarity search find similar

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning Content analysis

Metadata (XML JSON)

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 9: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Trends

1 Ubiquitous Content

2 Machine Learning

3 Cloud Computing

Find me sound effects that sound like this

Discover and interact with content

Audio Search similarity search find similar

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning Content analysis

Metadata (XML JSON)

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 10: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Find me sound effects that sound like this

Discover and interact with content

Audio Search similarity search find similar

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning Content analysis

Metadata (XML JSON)

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 11: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Male Speech

Applause Baby Cry

Music (Rock)

Sound signature Smart chapter markers

Female Speech

Sound signature Genre Tags Tempo Beats Groove Intelligent FFWD

Machine learning Content analysis

Metadata (XML JSON)

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 12: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 13: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

ldquodistorted guitarrdquo

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 14: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 15: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Find me songs that sound like this

find similar

Discover and interact with content

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 16: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Trends

1 Ubiquitous Content

2 Machine Learning

Organize

Search

Understand

3 Cloud Computing

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 17: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

James Brown ndash The Payback

Original

Groove

Remixes

Graphic Clint Bajakian

Discover and interact with content

Remixes with Echo Nest Remix API httpthewubmachinecom and httpstaticechonestcomgirltalkinabox

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 18: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

content-based querying and retrieval indexing (tagging similarity)

fingerprinting and digital rights management music recommendation and playlist generation music transcription and annotation score following and audio alignment automatic classification rhythm beat tempo and form harmony chords and tonality timbre instrumentation genre emotion style and mood analysis music summarization

Why MIR

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 19: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Music recommendation and metadata APIs - Gracenote Echo Nest Rovi BMAT Bach Technology Moodagent Clio Music Audio fingerprinting (User-driven Broadcast monitoring and copyrightbma 2nd screen) -SoundHound Shazam Audible Magic EchoNest BMAT Gracenote Civolution Digimarc 19 Billion Transactions Pitch and rhythm tracking analysis - Algorithms in Guitar Hero Rock Band - Skore (BMAT) DAW products that include beattempokeynote analysis - Ableton Live Melodyne Mixed In Key Innovative software for music creation - Khush UJAM Songsmith VoiceBand Dynamic Music - SongCruncher QBH - SoundHound (madonna example) Music Visualization amp Discover SpectralMindrsquos Sonarflow Mufin audiovision and mufinplayer Music players with recommendation - Apple Genius Google Instant Mix Broadcast monitoring - Audible Magic Clustermedia Labs

Commercial Applications

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 20: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

This weekhellip

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 21: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 22: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Overview Features + Classification MIR Overview Basic feature extraction classification Time domain features Musical Analysis Beat Rhythm Pitch and Chroma Analysis Frequency domain features Beat Onset Rhythm Machine Learning Clustering and Classification Clustering and Classification (PCA LDA k-means GMM SVM) Guest Speaker Gracenote Music Information Retrieval in Polyphonic Mixtures Guest Adobe Research Information Retrieval Metrics Evaluation Real World Considerations Guest DJZ

This weekhellip

Day 1

Day 2

Day 3

Day 4

Day 5

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 23: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Questions Concerns

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 24: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

INTRO BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 25: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 26: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Basic system overview

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 27: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 28: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

TIMING AND SEGMENTATION

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 29: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

Timing and Segmentation

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 30: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Frames 1 second 1 second

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 31: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

bull Slicing up by fixed time sliceshellip ndash 1 second 80 ms 100 ms 20-40ms etc ndash Usually 50 overlap with a window

bull ldquoFramesrdquo ndash Different problems call for different frame lengths

bull Onset detection bull Beat detection

ndash Beat ndash Measure Bar Harmonic changes

bull Segments ndash Musically relevant boundaries ndash Separate by some perceptual cue

Timing and Segmentation

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 32: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

OVER TO YOU LEIGH

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 33: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

BASIC SYSTEM OVERVIEW

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 34: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Segmentation

(Frames Onsets Beats Bars Chord

Changes etc)

Feature Extraction

(Time-based spectral energy

MFCC etc)

Analysis Decision Making

(Classification Clustering etc)

Basic system overview

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 35: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Example Feature Vector

ZCR Centroid Bandwidth Skew

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 36: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

ANALYSIS AND DECISION MAKING HEURISTICS

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 37: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

bull Example ldquoCowbellrdquo on just the snare drum of a drum loop ldquoSimplerdquo instrument recognition

bull Use basic thresholds or simple decision tree to form rudimentary transcription of kicks and snares

bull Time for more sophistication

Heuristic Analysis

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 38: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

ANALYSIS AND DECISION MAKING INSTANCE-BASED CLASSIFIERS (K-NN)

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 39: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Traininghellip

TEST

TRAINING SET

ldquo1rdquo ldquo0rdquo

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 40: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

bull Explanationhellip

Advantages Training is trivial just store the training samples very simple to implement and use Disadvantages Classification gets very complex with a lot of training data

Must measure distance to all training samples Euclidean distance becomes problematic in high-dimensional spaces

Can easily be ldquooverfitrdquo

We can improve computation efficiency by storing just the class prototypes

k-NN

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 41: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

k-NN

bull Steps ndash Measure distance to all points ndash Take the k closest ndash Majority rules (eg if k=5 then take 3 out of 5)

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 42: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

hellipbuthellip scaling is a good ideahellip

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 43: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Scaling Example unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 44: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Scaling Example linear scale to -11 unscaled scaled

zcr centroid brightnessrolloff kurtosis zcr centroid brightnessrolloff kurtosis

2138663 4461262 5278183 5741853 154E+08 0567221 0654219 0485151 -091438 -095797

1420046 3860736 5266741 4433475 103E+08 -000683 0396341 0479636 -096526 -098848

1412664 4095468 3829324 1168014 396E+08 -001273 049714 -021313 -068342 -081426

1223236 3551842 5074382 4351981 103E+08 -016405 0263696 0386928 -096843 -098836

2680428 5266489 5354988 444444 137E+08 1 1 0522167 -096484 -096824

2304393 431617 4682006 7819634 228E+08 0699611 0591913 019782 -083357 -091439

1594079 4597727 5422522 5733602 149E+08 013219 071282 0554715 -09147 -096108

1900388 4732717 6346437 3540374 83337090 037688 0770787 1 -1 -1

2231022 4914268 5765587 3765195 101E+08 0641 0848749 0720057 -099126 -098934

1221818 4323917 2196666 5496309 345E+09 -016518 059524 -1 1 1

1018673 2169696 2944604 3511451 148E+09 -032746 -032983 -063953 0228023 -017284

1767722 1038943 3765943 9728746 318E+08 -1 -081539 -024368 -075931 -086091

7589167 6090479 219855 2146602 137E+09 -053496 -1 -099909 -030281 -023807

6562162 1982091 3404355 1829962 659E+08 -0617 -041039 -041795 -042596 -06582

5639213 1309174 4358942 1152514 371E+08 -069073 -069935 0042118 -068945 -082931

110488 9674398 378372 9308457 304E+08 -02586 -08461 -023511 -077566 -086889

4231187 1113061 2758021 3217640 162E+09 -080321 -078357 -072945 011375 -008884

7175933 1638758 3447308 1721541 62E+08 -056797 -055782 -039725 -046813 -068172

2423024 125606 3870912 8911627 273E+08 -094765 -072216 -019309 -079109 -088744

4746985 1240683 2655689 3657653 175E+09 -076201 -072876 -077877 0284886 -001055

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 45: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

k-NN

bull Instance-based learning ndash training examples are stored directly rather than estimate model parameters

bull Generally choose k being odd to guarantee a majority vote for a class

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 46: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

Distance Classification

1 Find nearest neighbor 2 Find representative match via class prototype (eg

center of group or mean of training data class) Distance metric Most common Euclidean distance

gt End of Lecture 1 Part 1

Page 47: Intelligent Audio Systems: A review of the foundations and ... · PDF fileRemixes with Echo Nest Remix API, ... 1104.88 967.4398 37.8372 930845.7 3.04E+08 -0.2586 -0.8461 -0.23511

gt End of Lecture 1 Part 1