learning instrument identificationcs229.stanford.edu/proj2015/010_poster.pdf · for each method, we...

All Instruments (1440) Kernel Linear Radial (Gaussian) Polynomial Cost parameter 1 10 0.1 Gamma n/a 0.5 1 Degree n/a n/a 2 Training Error 8.00% 8.00% 8.00% Test Error 20.5 30 17.9 Support Vectors 713 1105 681 10-fold cv error % 21.5 30.9 17.8 Guitars (771) Kernel Linear Radial (Gaussian) Polynomial Cost Parameter 5 10 0.1 Gamma n/a 0.5 0.5 Degree n/a n/a 3 Training Error 6.5 6.5 6.5 Test Error 17.4 3.6 6 Support Vectors 188 604 299 10-fold cv error % 14.4 38.2 13.1 Learning Instrument Identification Lewis Guignard and Greg Kehoe Abstract In this project we perform instrument recognition from a mono- track audio recording. We focus on single instrument classification for eight different instruments. • Acoustic Guitar • Electric Guitar • Electric Bass Guitar • Tenor Drum • Bass Drum • Snare Drum • Cymbals • Hi-Hat Feature Selection After normalizing all samples to unit energy, we extract 10 features from the DFT of each sample. Each of the n features consists of a 3-tuple, thus each example is a length-30 feature vector. Feature Selection and Instrument Comparison • Frequency f n of n th largest magnitude • Magnitude of peak at frequency f n • Power contribution of frequency f n SVM Classification We ran SVM for all instruments (8 classes) and for the Electric Guitar, Bass Guitar, and Acoustic guitars only (3 classes). For multi-class situations we used a set of one-versus-one classifiers. For each method, we used three kernels, trained on 80% of a random permutation the data, and evaluated training and test error. We iterated over several settings of hyper-parameter values to determine the lowest training error then used the optimal hyper-parameters to run 10-fold cross-validation to estimate the test error of each model. K-Means Using k-means, we attempted to group each instrument into its own cluster. We discovered that certain instruments were not discernable by using k-means, while other instruments are easily separable. Intuitively, instruments that are significantly different in their acoustic qualities are more easily separated into distinct clusters. • Electric Guitar and Cymbals 93.2% correct labeling • Electric Guitar and Tenor Drum 58.4% correct labeling • Electric, Bass, and Acoustic Guitar No discernable mapping from clusters to instrument class Principal Component Analysis Using the first three principal components, the data set appears to be separable, although the three guitars are very similar and more difficult to classify given our feature selection. Data Collection & Sample Quality We used two methods of data collection, each with varying levels of quality in obtained in the sample. Our analysis shows that the spectral content is well-preserved for various bit-rates. • Live recording Controlled quality for sample generation (mp3 or wav format) • YouTube audio extraction Quality limited by online content (mp3 format at various bit rates) WAV vs. MP3 Format

Upload: dangnhan

Post on 20-Jul-2018

213 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Learning Instrument Identificationcs229.stanford.edu/proj2015/010_poster.pdf · For each method, we used three kernels, trained on 80% of a ... • Electric, Bass, and Acoustic Guitar

Guitars (771)Kernel Linear Radial (Gaussian) PolynomialCost Parameter 5 10 0.1Gamma n/a 0.5 0.5Degree n/a n/a 3Training Error 6.5 6.5 6.5Test Error 17.4 3.6 6Support Vectors 188 604 29910-fold cv error % 14.4 38.2 13.1

Learning Instrument Identification Lewis Guignard and Greg Kehoe

Abstract In this project we perform instrument recognition from a mono-track audio recording. We focus on single instrument classification for eight different instruments.

•  Acoustic Guitar •  Electric Guitar •  Electric Bass Guitar •  Tenor Drum

•  Bass Drum •  Snare Drum •  Cymbals •  Hi-Hat

Feature Selection After normalizing all samples to unit energy, we extract 10 features from the DFT of each sample. Each of the n features consists of a 3-tuple, thus each example is a length-30 feature vector.

Feature Selection and Instrument Comparison

•  Frequency fn of nth largest magnitude •  Magnitude of peak at frequency fn •  Power contribution of frequency fn

SVM Classification We ran SVM for all instruments (8 classes) and for the Electric Guitar, Bass Guitar, and Acoustic guitars only (3 classes). For multi-class situations we used a set of one-versus-one classifiers. For each method, we used three kernels, trained on 80% of a random permutation the data, and evaluated training and test error. We iterated over several settings of hyper-parameter values to determine the lowest training error then used the optimal hyper-parameters to run 10-fold cross-validation to estimate the test error of each model. K-Means

Using k-means, we attempted to group each instrument into its own cluster. We discovered that certain instruments were not discernable by using k-means, while other instruments are easily separable. Intuitively, instruments that are significantly different in their acoustic qualities are more easily separated into distinct clusters.

•  Electric Guitar and Cymbals 93.2% correct labeling

•  Electric Guitar and Tenor Drum 58.4% correct labeling

•  Electric, Bass, and Acoustic Guitar No discernable mapping from clusters to instrument class

Principal Component Analysis Using the first three principal components, the data set appears to be separable, although the three guitars are very similar and more difficult to classify given our feature selection.

Data Collection & Sample Quality We used two methods of data collection, each with varying levels of quality in obtained in the sample. Our analysis shows that the spectral content is well-preserved for various bit-rates.

•  Live recording Controlled quality for sample generation (mp3 or wav format)

•  YouTube audio extraction Quality limited by online content (mp3 format at various bit rates)

WAV vs. MP3 Format

Calorie estimation from food imagescs229.stanford.edu/proj2015/151_poster.pdfWe perform food image classiﬁcation using SVManddeeplearningalgorithms. Diﬀer-entfeaturessuchasLBP,HOG,andCNN)

Earthquake-Induced Structural Damage Classi cation Algorithmcs229.stanford.edu/proj2015/343_report.pdf · Earthquake-Induced Structural Damage Classi cation ... To obtain some of

Boqi Li, Mingyu Wang CS 229, Stanford Universitycs229.stanford.edu/proj2015/313_poster.pdfCuisine Classification from Ingredients Boqi Li, Mingyu Wang CS 229, Stanford University References

Sales Prediction with Time Series Modelingcs229.stanford.edu/proj2015/219_poster.pdfSales forecasting is critical for retailers • optimal stocking of products • website stability

Learning of Visualization of Object Recognition Features ...cs229.stanford.edu/proj2015/167_report.pdf · Learning of Visualization of Object Recognition Features and Image Reconstruction

Exploring Commodity and Stock Volatility using Topic ...cs229.stanford.edu/proj2015/189_report.pdf3 Finally, using the visualization, knowledge of the major historical events, and

Prediction of clonal subtypes in breast invasive carcinomacs229.stanford.edu/proj2015/285_report.pdf · 2017. 9. 23. · Hunter Boyce, Anna Shcherbina, Alice Yu Abstract—Breast

Reducing)False)ArrhythmiaAlarms)in)the)Intensive)Care)Unitcs229.stanford.edu/proj2015/288_poster.pdf · 2017-09-23 · Reducing)False)ArrhythmiaAlarms)in)the)Intensive)Care)Unit!

Predicting Valuations of Venture Capital Funded Private ...cs229.stanford.edu/proj2015/210_report.pdfpredicting the numerical valuation of venture capital funded company, however,