machine learning for music

Posted on 17-Jul-2015


Machine Learning for Music

Petko Nikolov

Faculty of Mathematics and Informatics, SU

April 8, 2015

About Me

● Machine Learning

● Music Information Retrieval

● Machine Learning / Automated Data Science

What’s Music Information Retrieval?

Music Information Retrieval (MIR) lies at the intersection of:

● Musicology

● Computer Science

● Signal Processing

● Machine Learning

Music Recommendations

Recommending tags

Spotify’s Shuffle Mode

● Not truly random

● Certainly involves some processing

● Probably has some MIR behind it

Pandora’s Music Genome Project

● Started in 2000

● 800,000 tracks manually annotated by music experts

● 450 attributes to describe the music

● 25 minutes to label each track
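A quick back-of-the-envelope check of the figures above shows why manual annotation does not scale. It assumes that the 25 minutes apply to every one of the 800,000 tracks and that labeling never pauses.

```python
tracks = 800_000
minutes_per_track = 25

total_minutes = tracks * minutes_per_track   # 20,000,000 minutes
total_hours = total_minutes / 60             # about 333,333 hours
nonstop_years = total_hours / (24 * 365)     # about 38 years of uninterrupted labeling

print(f"{total_minutes:,} min = {total_hours:,.0f} h = {nonstop_years:.1f} years nonstop")
```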

MIREX

Music Information Retrieval Evaluation eXchange: an annual competition featuring more than 20 tasks, in which state-of-the-art algorithms compete against each other.

Task groups: Structured Information, Retrieval, Synthesis

Tasks: fingerprinting, cover song detection, genre recognition, instrument recognition, mood detection, transcription, playlist generation, beat tracking, key detection, pitch tracking, vocal detection, recommendation, audio similarity, source separation

Highlighted: genre recognition, instrument recognition, mood detection, vocal detection, audio similarity

MIR Architecture

Audio → Segmentation and Preprocessing → Feature Extraction → Machine Learning

Example output for one recording: classical, piano, romantic, Beethoven, by Daniel Barenboim, 2/4

With deep learning, the Feature Extraction and Machine Learning stages are replaced by a single learned stage:

Audio → Segmentation and Preprocessing → Deep Learning

Audio signal

Human hearing: 20 Hz to 20 kHz

Segmentation

The audio signal is split into consecutive frames of 52 ms each: f1, f2, f3, f4, …, fn
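Here is a minimal NumPy sketch of the segmentation step. It assumes a mono signal and non-overlapping 52 ms frames; the slides do not mention overlap or a hop size. Leftover samples at the end are simply dropped. The function name is hypothetical.

```python
import numpy as np

def split_into_frames(signal, sample_rate, frame_ms=52.0):
    """Split a mono signal into consecutive, non-overlapping frames.
    Trailing samples that do not fill a whole frame are dropped."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # samples per frame
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

sr = 22050
t = np.arange(sr) / sr                    # 1 second of time stamps
x = np.sin(2 * np.pi * 440.0 * t)         # a 440 Hz test tone
frames = split_into_frames(x, sr)
print(frames.shape)                       # (19, 1147)
```

At 22,050 Hz a 52 ms frame holds 1,147 samples, so one second of audio gives 19 full frames.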

Spectrum - on frame level

The Discrete Fourier Transform (DFT) takes each frame from the time domain to the frequency domain.
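The sketch below computes the frame-level spectrum with NumPy's real-input FFT. The Hann window is an assumption: it is a common step to reduce spectral leakage, but the slides do not mention it. The test tone is chosen to fall exactly on a DFT bin, so the peak lands at 440 Hz.

```python
import numpy as np

def magnitude_spectrum(frame, sample_rate):
    """Return (frequencies in Hz, magnitudes) for one frame via the real-input DFT."""
    windowed = frame * np.hanning(len(frame))    # assumption: Hann window against leakage
    magnitudes = np.abs(np.fft.rfft(windowed))   # non-negative frequencies only
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs, magnitudes

sr = 8192
n = 1024                                          # bin spacing = sr / n = 8 Hz
t = np.arange(n) / sr
frame = np.sin(2 * np.pi * 440.0 * t)             # 440 Hz falls exactly on bin 55
freqs, mags = magnitude_spectrum(frame, sr)
print(freqs[np.argmax(mags)])                     # 440.0
```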

Feature extraction

Each frame f is mapped to a feature vector x computed from its spectrum.

Spectral Centroid

Where the ‘center of mass’ of the spectrum lies: the average frequency, weighted by the magnitude of each frequency bin.
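In formula form, with bin frequencies f_k and magnitudes |X_k|, the centroid is Σ_k f_k |X_k| / Σ_k |X_k|. The sketch below implements this directly in NumPy. The function name is hypothetical, and returning 0.0 for a silent frame is our own convention.

```python
import numpy as np

def spectral_centroid(freqs, magnitudes):
    """Magnitude-weighted mean frequency ('center of mass') of a spectrum."""
    total = magnitudes.sum()
    if total == 0.0:
        return 0.0                        # silent frame: no meaningful centroid
    return float(np.dot(freqs, magnitudes) / total)

freqs = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
mags = np.array([0.0, 1.0, 0.0, 1.0, 0.0])   # equal energy at 100 Hz and 300 Hz
print(spectral_centroid(freqs, mags))         # 200.0
```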

Spectral Slope

Fit a linear regression to the spectrum (magnitude as a function of frequency) and take the slope coefficient.
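The sketch below applies `numpy.polyfit` with degree 1 to the raw magnitude spectrum. The talk does not say whether it fits raw magnitudes, decibels, or a normalized spectrum, so treat that choice as an assumption. The function name is hypothetical.

```python
import numpy as np

def spectral_slope(freqs, magnitudes):
    """Slope of the least-squares line: magnitude ≈ slope * frequency + intercept."""
    slope, _intercept = np.polyfit(freqs, magnitudes, deg=1)
    return float(slope)

freqs = np.linspace(0.0, 4000.0, 50)
mags = 2.0 - 0.0005 * freqs      # magnitudes that fall off linearly with frequency
print(spectral_slope(freqs, mags))
```

A negative slope means energy decreases toward high frequencies, which is typical of darker, less bright sounds.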

Spectral Correlation / Variation

Spectral correlation is the cosine similarity between the frequency (magnitude) vectors of two consecutive frames. Spectral variation is, respectively, 1.0 - correlation (i.e. the cosine distance).
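The sketch below computes both quantities for two magnitude spectra. The function names are hypothetical. The cosine is undefined for a silent frame, so the value returned in that case is our own choice.

```python
import numpy as np

def spectral_correlation(prev_mag, curr_mag):
    """Cosine similarity between two consecutive magnitude spectra."""
    denom = np.linalg.norm(prev_mag) * np.linalg.norm(curr_mag)
    if denom == 0.0:
        return 0.0                  # assumption: treat a silent frame as uncorrelated
    return float(np.dot(prev_mag, curr_mag) / denom)

def spectral_variation(prev_mag, curr_mag):
    """1.0 - correlation, i.e. the cosine distance."""
    return 1.0 - spectral_correlation(prev_mag, curr_mag)

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])   # same spectral shape as a, just louder
c = np.array([0.0, 0.0, 3.0])   # energy in a completely different bin
print(spectral_correlation(a, b), spectral_variation(a, c))
```

Because it is a cosine, the measure ignores loudness: a and b correlate perfectly, while a and c share no energy and give the maximum variation.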

Feature extraction - Result

One row per feature, one column per frame:

f11 f12 f13 f14 f15 … f1m   (centroid)
f21 f22 f23 f24 f25 … f2m   (correlation)

The number of frames m varies across audio recordings, so recordings of different lengths yield matrices of different sizes.
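The sketch below (NumPy only, hypothetical function name) assembles this matrix for two of the features above. It shows that a longer recording yields more columns. It assumes non-overlapping 52 ms frames and no window. The first frame has no predecessor, so its correlation is set to 1.0; the slides do not say how that case is handled.

```python
import numpy as np

def feature_matrix(signal, sample_rate, frame_ms=52.0):
    """Return a (2, m) matrix: row 0 = spectral centroid, row 1 = spectral
    correlation with the previous frame; one column per frame."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    m = len(signal) // frame_len
    frames = signal[: m * frame_len].reshape(m, frame_len)
    spectra = np.abs(np.fft.rfft(frames, axis=1))           # (m, frame_len // 2 + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    eps = 1e-12                                              # guards silent frames
    centroid = spectra @ freqs / (spectra.sum(axis=1) + eps)
    norms = np.linalg.norm(spectra, axis=1) + eps
    correlation = np.ones(m)   # assumption: first frame has no predecessor -> 1.0
    correlation[1:] = np.sum(spectra[1:] * spectra[:-1], axis=1) / (norms[1:] * norms[:-1])
    return np.vstack([centroid, correlation])

sr = 22050
rng = np.random.default_rng(0)
short_clip = rng.normal(size=sr)        # 1 s of noise
long_clip = rng.normal(size=2 * sr)     # 2 s of noise
print(feature_matrix(short_clip, sr).shape, feature_matrix(long_clip, sr).shape)  # (2, 19) (2, 38)
```

The feature count stays fixed while the frame count does not. That mismatch is what the Universal Background Model approach addresses.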

Universal Background Model

The Universal Background Model (UBM) is a Gaussian Mixture Model (GMM) fitted to frame feature vectors pooled across recordings. Each mixture component is a multivariate Gaussian distribution.
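For reference, this is the standard density of a GMM with K components over D-dimensional frame feature vectors x. The mixture weights are π_k, the means μ_k and the covariances Σ_k; these symbols are our notation, not taken from the slides. With K = 4, the means are the 𝛍1 … 𝛍4 used below.

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0

\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \frac{1}{(2\pi)^{D/2} \lvert \boldsymbol{\Sigma} \rvert^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```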

Gaussian Mixture Model - per track

A GMM is then obtained for each track. Its component means are concatenated into one fixed-length vector that describes the whole track, whatever its number of frames:

[𝛍1, 𝛍2, 𝛍3, 𝛍4]
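The slides do not say how the per-track GMM is obtained. A common choice in UBM-based systems, borrowed from speaker verification, is to MAP-adapt only the UBM means to the track's frames. The sketch below assumes that approach, along with:

● diagonal covariances
● 4 components, to match [𝛍1, 𝛍2, 𝛍3, 𝛍4]
● a relevance factor of 16
● plain EM (only NumPy) to fit the UBM

All function names are hypothetical.

```python
import numpy as np

def log_gaussians(X, means, variances):
    """log N(x_t | mu_k, diag(var_k)) for every frame t and component k -> (T, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def responsibilities(X, weights, means, variances):
    """Posterior probability of each component for each frame -> (T, K)."""
    log_p = np.log(weights) + log_gaussians(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)           # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def fit_ubm(X, k=4, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM (the UBM) to pooled frames with plain EM."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]
    variances = np.tile(X.var(axis=0), (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        g = responsibilities(X, weights, means, variances)   # E-step
        nk = g.sum(axis=0) + 1e-10                           # M-step
        weights = nk / nk.sum()
        means = (g.T @ X) / nk[:, None]
        variances = np.maximum((g.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def track_supervector(X, ubm, relevance=16.0):
    """MAP-adapt the UBM means to one track's frames and concatenate them."""
    weights, means, variances = ubm
    g = responsibilities(X, weights, means, variances)
    nk = g.sum(axis=0)
    frame_means = (g.T @ X) / np.maximum(nk, 1e-10)[:, None]
    alpha = (nk / (nk + relevance))[:, None]    # more frames -> trust the track more
    adapted = alpha * frame_means + (1.0 - alpha) * means
    return adapted.ravel()                      # [mu1, mu2, mu3, mu4] as one vector

rng = np.random.default_rng(1)
track_a = rng.normal(0.0, 1.0, size=(300, 2))   # 300 frames x 2 features
track_b = rng.normal(2.0, 1.0, size=(120, 2))   # a shorter track: 120 frames
ubm = fit_ubm(np.vstack([track_a, track_b]), k=4)
sv_a = track_supervector(track_a, ubm)
sv_b = track_supervector(track_b, ubm)
print(sv_a.shape, sv_b.shape)                   # (8,) (8,)
```

Both tracks map to an 8-dimensional vector (4 components × 2 features), even though they have 300 and 120 frames. Any standard classifier can therefore consume them.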

Classification - Example Neural Net

Layers: Input (feature vector) → Hidden → Output (likelihood of Rock?)

Weights: a_ik (input to hidden), w_k (hidden to output)
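Here is a sketch of a forward pass through the network on the slide. Sigmoid activations, the absence of bias terms, and the random untrained weights are all assumptions. So are the function and parameter names; `a` and `w` mirror the a_ik and w_k weights. Training the weights, e.g. with backpropagation, is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rock_likelihood(x, a, w):
    """x: feature vector (n_inputs,); a: input-to-hidden weights (n_inputs, n_hidden);
    w: hidden-to-output weights (n_hidden,). Returns a value in (0, 1)."""
    hidden = sigmoid(x @ a)             # hidden unit k: sigmoid(sum_i x_i * a_ik)
    return float(sigmoid(hidden @ w))   # output: sigmoid(sum_k hidden_k * w_k)

rng = np.random.default_rng(0)
x = rng.normal(size=8)                  # e.g. an 8-dimensional track feature vector
a = rng.normal(size=(8, 5))
w = rng.normal(size=5)
print(rock_likelihood(x, a, w))
```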

What’s Deep Learning?

(defn deep-learning? [neural-net] (hidden-layer? neural-net))

We are trying to learn new, high-level representations by using many more hidden layers.

The input is as raw as possible.

Mel-spectrum
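The slides use the Mel spectrum as the raw input without defining it. The sketch below shows one common construction, offered as an assumption. It passes the DFT magnitude spectrum through triangular filters spaced evenly on the mel scale, using the HTK-style formula m = 2595 · log10(1 + f / 700). Other mel formulas exist (e.g. the Slaney variant), and many pipelines also take the logarithm of the result. Function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale -> (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)      # 0 at left edge, 1 at center
        falling = (right - fft_freqs) / (right - center)   # 1 at center, 0 at right edge
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_spectrum(frame, sample_rate, n_mels=40):
    """Mel-band energies of one frame."""
    magnitudes = np.abs(np.fft.rfft(frame))
    return mel_filterbank(n_mels, len(frame), sample_rate) @ magnitudes

frame = np.random.default_rng(0).normal(size=1024)
print(mel_spectrum(frame, 22050).shape)   # (40,)
```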

Deep Neural Network

Trained with backpropagation, but the backpropagated gradient fades quickly as it passes back through many layers (the vanishing gradient problem).
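The fading gradient can be reproduced numerically; this illustration is not from the talk. The sketch pushes a random input through 10 sigmoid layers with random weights, backpropagates a gradient of ones from the top, and records the gradient norm at each layer. The sigmoid derivative is at most 0.25, so the norm shrinks by roughly that factor per layer, and the layers nearest the input receive almost no signal. Layer count, width, and initialization are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_gradient_norms(n_layers=10, width=64, seed=0):
    """Return the gradient norm at each layer's pre-activation,
    index 0 = layer closest to the input."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
               for _ in range(n_layers)]
    h = rng.normal(size=width)
    activations = []
    for W in weights:                     # forward pass
        h = sigmoid(W @ h)
        activations.append(h)
    grad = np.ones(width)                 # dLoss/dh at the top (arbitrary choice)
    norms = []
    for W, a in zip(reversed(weights), reversed(activations)):   # backward pass
        grad_pre = grad * a * (1.0 - a)   # sigmoid'(z) = s(z) * (1 - s(z))
        norms.append(np.linalg.norm(grad_pre))
        grad = W.T @ grad_pre             # through the linear map to the layer below
    return norms[::-1]

norms = layer_gradient_norms()
print(norms[0] / norms[-1])               # a tiny ratio: the first layer barely learns
```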

Deep Belief Network

Input (Mel spectrum) → Hidden Layer 1 → Hidden Layer 2 → Hidden Layer 3 → Output (Rock, Jazz, Punk, Electronic)

Each hidden layer is trained as a Restricted Boltzmann Machine (RBM) on top of the layer below it, one RBM at a time, starting from the input.
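Here is a compact NumPy sketch of greedy layer-wise training. It assumes Bernoulli (binary-valued) units trained with one step of contrastive divergence (CD-1). Real-valued Mel spectra are typically either scaled to [0, 1] or modelled with a Gaussian-Bernoulli RBM instead. That variant, the supervised genre output layer, and fine-tuning are not shown. Class and function names, layer sizes, and the learning rate are illustrative.

```python
import numpy as np

class RBM:
    """Bernoulli-Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)    # visible biases
        self.b_h = np.zeros(n_hidden)     # hidden biases

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def hidden_probs(self, v):
        return self._sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return self._sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        """One full-batch CD-1 update; returns the reconstruction error before it."""
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)          # reconstruction
        h1 = self.hidden_probs(v1)
        n = len(v0)
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))

def train_dbn(data, layer_sizes, epochs=200):
    """Greedy layer-wise training: each RBM learns from the hidden
    probabilities produced by the RBM below it."""
    rbms, x = [], data
    for i, n_hidden in enumerate(layer_sizes):
        rbm = RBM(x.shape[1], n_hidden, seed=i)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)           # becomes the input of the next RBM
    return rbms

protos = np.repeat(np.eye(4), 3, axis=1)  # 4 binary patterns over 12 visible units
data = np.tile(protos, (25, 1))           # 100 training vectors
rbm = RBM(12, 8)
errors = [rbm.cd1_step(data) for _ in range(300)]
dbn = train_dbn(data, [8, 6, 4])
print(errors[0], errors[-1], [r.W.shape for r in dbn])
```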

Deep Auto Encoders

Input (Mel spectrum) → hidden layers → Output (Mel spectrum)

The network learns to reproduce its input at its output. Used for denoising.
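Here is a minimal denoising autoencoder, sketched under simplifying assumptions:

● a single tanh hidden layer rather than a deep stack
● a linear output layer
● Gaussian input noise
● full-batch gradient descent on the mean squared error
● toy bump-shaped vectors standing in for Mel-spectrum frames

The key idea is that the network sees a corrupted input but is scored against the clean one. Function names are hypothetical.

```python
import numpy as np

def train_denoising_autoencoder(clean, n_hidden=8, noise_std=0.1,
                                lr=0.5, epochs=2000, seed=0):
    """Input: a noisy copy of each vector; training target: the clean vector."""
    rng = np.random.default_rng(seed)
    n, d = clean.shape
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        noisy = clean + rng.normal(0.0, noise_std, size=clean.shape)
        h = np.tanh(noisy @ W1 + b1)            # encoder
        out = h @ W2 + b2                       # linear decoder
        err = out - clean                       # scored against the CLEAN target
        losses.append(float(np.mean(err ** 2)))
        d_out = 2.0 * err / err.size            # gradient of the mean squared error
        d_h = (d_out @ W2.T) * (1.0 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (noisy.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return (W1, b1, W2, b2), losses

def denoise(x, params):
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

# Toy stand-ins for Mel-spectrum frames: 50 smooth bumps over 16 bands.
bands = np.linspace(0.0, 1.0, 16)
clean = np.array([np.exp(-((bands - c) ** 2) / 0.02) for c in np.linspace(0.2, 0.8, 50)])
params, losses = train_denoising_autoencoder(clean)
print(losses[0], losses[-1])
```

After training, `denoise` can be applied to a new noisy frame of the same size.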

Tools

● essentia - audio retrieval algorithms

● theano - CPU/GPU symbolic optimization

● scikit-learn - machine learning in Python
