Spectral methods for automatic processing of audio documents

José Anibal Arias Aguilar

Advisor: Régine André-Obrecht
Tutor: Jérôme Farinas

Université Toulouse III - Paul Sabatier, September 29, 2008

Upload: others

Post on 21-Feb-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

September 29, 2008 Université Toulouse III - Paul Sabatier 1

Spectral methods for automatic processing of audio documents

José Anibal Arias Aguilar

Advisor: Régine André-ObrechtTutor: Jérôme Farinas

September 29, 2008 Université Toulouse III - Paul Sabatier 2

Objectives

- Unify different dimensionality reduction approaches
- Visualize speech
- Identify basic units in speech
- Represent variable-length acoustic sequences by 3D vectors

Outline

- Introduction
- State of the art
  - Kernel functions
  - Spectral methods for dimensionality reduction
- Contribution
  - Acoustic information in low-dimensional spaces
  - Speech segmentation and labeling
  - Content visualization of audio databases
- Conclusions and perspectives

Introduction

- Data and machine learning
  - Quantity and quality, but dimensionality?
- Manifolds
  - Low-dimensional data embedded in high-dimensional spaces
- Kernel functions
  - Link between pattern space and feature space
- Speech sounds
  - Complex, high-dimensional information


State of the art: kernel functions

Pattern space vs. feature space

- We can transform the pattern space to find more informative data representations

State of the art: kernel functions

Feature space: properties

- Desirable properties of the new space
  - Contains a rich class of functions
  - Has linear structure
  - Has an inner product, so that we can take projections
- Example: Hilbert space (complete vector space with inner product)

State of the art: kernel functions

Access to feature space: Kernels

- X is a compact metric space
- A kernel is a function $\kappa : X \times X \to \mathbb{R}$ such that $\kappa(x, z) = \kappa(z, x)\ \forall x, z \in X$ and the matrix $K_{ij} = \kappa(x_i, x_j)$ is psd
- For every Mercer kernel there is a map $\Phi : X \to F$, where $F$ is a Hilbert space, such that $\kappa(x, z) = \Phi(x) \cdot \Phi(z)$
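
The kernel conditions above can be checked numerically on a small example. The sketch below builds the Gram matrix of a Gaussian (RBF) kernel and verifies symmetry and positive semi-definiteness. The choice of the Gaussian kernel, the function name `gaussian_gram` and the width `sigma` are assumptions made for illustration; the slide does not name a specific kernel.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from rounding
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # 20 toy patterns in 3 dimensions
K = gaussian_gram(X)
eigenvalues = np.linalg.eigvalsh(K)   # all should be >= 0 up to rounding
```

A symmetric Gram matrix with no negative eigenvalues (up to rounding) is what the psd condition requires.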

State of the art: kernel functions

Kernels and regularization theory [Evg99]

- Data: $(x_1, o_1), \ldots, (x_n, o_n) \in \mathbb{R}^d \times \mathbb{R}$
- Estimate $f : X \to O$
- Hypothesis space $H$ (RKHS), complexity of the solution controlled by the Hilbert space norm:

$$f = \arg\min_{f \in H} \underbrace{\frac{1}{n} \sum_i V(f(x_i), o_i)}_{\text{fit to data}} + \underbrace{\lambda \|f\|_H^2}_{\text{complexity penalty}}$$

- Representer theorem: $f(x) = \sum_i \alpha_i \kappa(x, x_i)$
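
For the square loss, the regularized problem above has a closed-form solution in the form given by the representer theorem. The following is a minimal sketch under assumptions not stated on the slide: V is taken to be the square loss, the kernel is Gaussian, and the names `gaussian_kernel`, `fit_alpha`, `predict`, `lam` and `sigma` are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def fit_alpha(X, o, lam, sigma=1.0):
    """Coefficients alpha of f(x) = sum_i alpha_i k(x, x_i) for the square loss."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    # Setting the gradient of (1/n)||K a - o||^2 + lam a^T K a to zero
    # gives the linear system (K + n lam I) a = o.
    return np.linalg.solve(K + n * lam * np.eye(n), o)

def predict(X_train, alpha, X_new, sigma=1.0):
    """Evaluate the kernel expansion at new points."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

X = np.linspace(0.0, 1.0, 15)[:, None]
o = np.sin(2.0 * np.pi * X[:, 0])
alpha = fit_alpha(X, o, lam=1e-8, sigma=0.1)
fitted = predict(X, alpha, X, sigma=0.1)
```

Increasing `lam` strengthens the complexity penalty and yields a smoother function that fits the training targets less closely.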

Outline

- Introduction
- State of the art
  - Kernel functions
  - Spectral methods for dimensionality reduction
    - Principal Component Analysis (PCA)
    - Metric Multidimensional Scaling (MDS)
    - Isometric mapping (ISOMAP)
    - Locally Linear Embedding (LLE)
    - Spectral Clustering (SC)

State of the art: spectral methods

Spectral methods for dimensionality reduction: two approaches

- Manifold learning: nearby points remain nearby, distant points remain distant
- Information extraction: separate clusters
- Spectral methods reveal low-dimensional structure through the eigenvalues and eigenvectors of special matrices

State of the art: spectral methods

Linear methods: PCA

- Principal Component Analysis [Alp04]
  - Spectral decomposition of the covariance matrix
  - Eigenvectors: principal axes of the maximum-variance subspace
  - Eigenvalues: projected variance of the inputs along the principal axes. The number of significant (non-negative) eigenvalues estimates the dimensionality
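
The PCA steps listed above can be sketched with NumPy as follows. The toy data and the function name `pca` are illustrative assumptions; the fraction `retained` is the share of total variance carried by the leading eigenvalues.

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the leading eigenvectors of the covariance."""
    X_centered = X - X.mean(axis=0)
    cov = X_centered.T @ X_centered / (X.shape[0] - 1)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending order
    order = np.argsort(eigenvalues)[::-1]             # sort descending
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    projected = X_centered @ eigenvectors[:, :n_components]
    retained = eigenvalues[:n_components].sum() / eigenvalues.sum()
    return projected, eigenvalues, retained

rng = np.random.default_rng(0)
# Toy data: 2 strong directions embedded in 5 dimensions plus small noise.
latent = rng.normal(size=(200, 2)) * np.array([3.0, 2.0])
X = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
Y, eigenvalues, retained = pca(X, n_components=2)
```

On this toy set only two eigenvalues are significant, which matches the two latent directions used to generate it.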

State of the art: spectral methods

Linear methods: MDS

- Metric Multidimensional Scaling [Bor97]
  - Spectral decomposition of the dot product matrix (computed in terms of Euclidean distances of zero-mean vectors)
  - Eigenvectors: low-dimensional embedding
  - Eigenvalues: measure how each dimension contributes to dot products. The number of significant (non-negative) eigenvalues estimates the dimensionality
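
A minimal sketch of metric MDS as described above: the dot product matrix is obtained from squared Euclidean distances by double centering, and its leading eigenvectors give the embedding. The function name `classical_mds` and the toy points are assumptions for illustration.

```python
import numpy as np

def classical_mds(D, n_components):
    """Embed points from a matrix D of pairwise Euclidean distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # dot products of zero-mean vectors
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    scale = np.sqrt(np.maximum(eigenvalues[order], 0.0))
    return eigenvectors[:, order] * scale, eigenvalues[order]

rng = np.random.default_rng(1)
points = rng.normal(size=(30, 2))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
Y, top_eigenvalues = classical_mds(D, n_components=2)
D_embedded = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

The embedding is only defined up to rotation and reflection, so the comparison is made on pairwise distances rather than on coordinates.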

State of the art: spectral methods – manifold learning

Nonlinear methods: ISOMAP

- Preserve geodesic distances as estimated along the manifold
- Algorithm [Ten00]:
  - Build adjacency graph: vertices represent inputs, and edges weighted by local distances connect neighbors
  - Estimate geodesics: compute shortest paths through the graph
  - Metric MDS
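
The three steps of the algorithm above can be sketched as follows. This is a small-scale illustration, not the reference implementation of [Ten00]: it uses a k-nearest-neighbor graph, a dense Floyd-Warshall pass for the shortest paths, and the names `isomap` and `n_neighbors` are assumptions.

```python
import numpy as np

def isomap(X, n_neighbors, n_components):
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Step 1: adjacency graph, edges weighted by local distances.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]
        G[nearest[i], i] = D[i, nearest[i]]   # keep the graph symmetric
    # Step 2: geodesics as shortest paths through the graph (Floyd-Warshall).
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    if np.isinf(G).any():
        raise ValueError("graph is not connected; increase n_neighbors")
    # Step 3: metric MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0)), G

# Points along a half circle: a 1D manifold embedded in 2D.
t = np.linspace(0.0, np.pi, 40)
X = np.column_stack([np.cos(t), np.sin(t)])
Y, geodesics = isomap(X, n_neighbors=2, n_components=1)
```

Here the estimated geodesic distance between the two end points is close to the arc length, whereas their straight-line distance is 2.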

State of the art: spectral methods – manifold learning

Nonlinear methods: ISOMAP

- Assumptions
  - Graph is connected
  - Neighborhoods on the graph reflect neighborhoods on the manifold (no shortcuts)
  - Dense graph without “holes”

State of the art: spectral methods – manifold learning

Nonlinear methods: ISOMAP

[Figure: ISOMAP embedding; axes: finger extension, wrist rotation]

State of the art: spectral methods – manifold learning

Nonlinear methods: LLE

- Preserve local geometric relationships
- Algorithm [Row00]:
  - Nearest neighbor search
  - Characterize the local geometry of each neighborhood by weights $W_{ij}$
  - Optimize low-dimensional outputs
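
The LLE steps above can be sketched as follows. This is an illustrative version under assumptions: the regularization term `reg`, the function name `lle` and the toy spiral data are not from the slides.

```python
import numpy as np

def lle(X, n_neighbors, n_components, reg=1e-3):
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Step 1: nearest neighbor search (column 0 is the point itself).
    neighbors = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    # Step 2: weights W_ij that best reconstruct each point from its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]
        C = Z @ Z.T                                    # local Gram matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)   # regularize if singular
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()               # rows sum to one
    # Step 3: low-dimensional outputs from the bottom eigenvectors of M.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigenvalues, eigenvectors = np.linalg.eigh(M)      # ascending order
    # Skip the constant eigenvector associated with the eigenvalue near zero.
    return eigenvectors[:, 1:n_components + 1], W

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0.0, 3.0 * np.pi, size=150))
X = np.column_stack([t * np.cos(t), t * np.sin(t), rng.uniform(size=150)])
Y, W = lle(X, n_neighbors=8, n_components=2)
```

Each row of `W` has at most `n_neighbors` nonzero entries, which is why the matrix used in the final eigenproblem is sparse.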

State of the art: spectral methods – manifold learning

Nonlinear methods: LLE

- Different approach from ISOMAP
  - Preserve local geometry: assume neighbors lie on locally linear patches
  - Construct a sparse matrix

State of the art: spectral methods – manifold learning

Nonlinear methods: LLE

[Figure: LLE embedding; axes: pose, expression]

State of the art: spectral methods – information extraction

Nonlinear methods: Spectral clustering

- Discover non-convex clusters
- Graph partition problem (minimal cut)

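State of the art: spectral methods – information extraction

Nonlinear methods: Spectral clustering

- Relaxation of the Ncut problem
- Solution based on eigenvectors of an affinity matrix [Ng01]

Ideal case with three clusters (block-diagonal affinity matrix):

$$A = \begin{pmatrix} A^{(11)} & 0 & 0 \\ 0 & A^{(22)} & 0 \\ 0 & 0 & A^{(33)} \end{pmatrix}, \qquad \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \begin{pmatrix} v_1 & 0 & 0 \\ 0 & v_2 & 0 \\ 0 & 0 & v_3 \end{pmatrix}$$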
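
A minimal sketch of eigenvector-based spectral clustering in the spirit of [Ng01]: Gaussian affinity with zero diagonal, symmetric normalization, leading eigenvectors, row normalization, then k-means. The kernel width `sigma`, the small deterministic `kmeans` helper and the toy rings are assumptions made for illustration.

```python
import numpy as np

def spectral_embedding(X, n_clusters, sigma=1.0):
    """Rows of the returned matrix are the points in eigenvector space."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                   # affinity matrix with A_ii = 0
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))            # D^{-1/2} A D^{-1/2}
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    V = eigenvectors[:, -n_clusters:]          # largest eigenvalues
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def kmeans(Y, n_clusters, n_iter=50):
    # Deterministic farthest-point initialization, then Lloyd iterations.
    centers = [Y[0]]
    for _ in range(1, n_clusters):
        dists = np.min([np.linalg.norm(Y - c, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(
            np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=-1), axis=1)
        centers = np.array([Y[labels == k].mean(axis=0) for k in range(n_clusters)])
    return labels

# Two concentric rings: clusters that are not convex.
angles = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([ring, 4.0 * ring])
labels = kmeans(spectral_embedding(X, n_clusters=2, sigma=0.5), n_clusters=2)
```

Because the affinity between the two rings is nearly zero, the affinity matrix is close to block diagonal and each ring collapses to a single point in eigenvector space.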


Contribution

Corpora

- OGI-MLTS
  - 100 files of spontaneous telephone speech (~45 s, 8 kHz)
  - Multilingual (English, German, Hindi, Japanese, Mandarin, Spanish)
  - Phonetically labeled
- ANITA
  - 150 files of studio speech (~7 s, 16 kHz)
  - 6 speakers
  - Posed and stressed conditions
- MUSIC
  - 70 files (60 s, 16 kHz)
  - Classical, singing voice, rock, jazz

Contribution

Some considerations

- Complexity
  - Isomap, SC-Kernel PCA: ~8000 vectors (~1 min of signal), 10 min if we use phonetic speech labels
  - LLE, LapEig, Landmark Isomap: ~10000 vectors
- Audio intrinsic dimensionality (MLE)
  - Speech: ~8-9 MFCC
  - Music: ~7-8 MFCC
  - Speech in stress conditions: dim - 1
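
The slides only name the estimator as "MLE". One common maximum likelihood estimator of intrinsic dimensionality is the nearest-neighbor estimator of Levina and Bickel; the sketch below assumes that estimator, and the function name, the neighborhood size `k` and the toy data are illustrative.

```python
import numpy as np

def mle_intrinsic_dimension(X, k=10):
    """Average of the local maximum likelihood dimension estimates."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Distances to the k nearest neighbors (column 0 is the point itself).
    T = np.sort(D, axis=1)[:, 1:k + 1]
    # Local estimate: inverse of the mean of log(T_k / T_j), j = 1..k-1.
    log_ratios = np.log(T[:, -1][:, None] / T[:, :-1])
    local = (k - 1) / log_ratios.sum(axis=1)
    return local.mean()

rng = np.random.default_rng(4)
# 2D data linearly embedded in 5 dimensions.
X = rng.uniform(size=(500, 2)) @ rng.normal(size=(2, 5))
estimate = mle_intrinsic_dimension(X)
```

On this toy set the estimate should be near 2 even though the vectors have 5 coordinates, which is the kind of gap the figures above report for MFCC vectors.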

- Acoustic information in low-dimensional spaces
- Speech segmentation and labeling
- Content visualization of audio databases

Contribution: acoustic information in low-dimensional spaces

Speech manifolds: speech structure

- OGI sequence
- 15 MFCC
- Simplified phonetic labels
- ISOMAP discovers a particular distribution of phonetic classes

Contribution: acoustic information in low-dimensional spaces

Eigenvalues as intrinsic dimensionality estimators

- OGI sequence
- 15 MFCC
- Original variance retained in the first 6 dimensions:
  - PCA: 74.27%
  - Kernel PCA: 86.84%
  - ISOMAP: 89.80%

Contribution: acoustic information in low-dimensional spaces

Speech manifolds: speech and music

- 20 s of audio signal containing speech and music
- 15 MFCC
- Laplacian eigenmaps
- Different zones of variation

Contribution: acoustic information in low-dimensional spaces

Information extraction: a new kind of projection

- OGI sequences in English, Mandarin and Spanish
- 15 MFCC
- Spectral clustering
- Different geometric structure from the manifold learning approach


Contribution: acoustic information in low-dimensional spaces

Information extraction: labels


Contribution: speech segmentation and labeling

Temporal spectral clustering

- OGI sequences
- 15 MFCC + Δ + ΔΔ
- Classical SC affinity matrix (A)
- New metric applied to the main diagonal (A')

Contribution: speech segmentation and labeling

Temporal spectral clustering

$$a_{ik} = \begin{cases} e^{-\|x_i - x_k\|^2 / 2} & \text{if } x_k \in S_i \\ 0 & \text{otherwise} \end{cases} \quad \text{if } i < k, \qquad a_{ki} = a_{ik}$$

- The new metric takes into account temporal closeness between vectors
- Eigenvectors of A' are associated with segments of the signal
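
One way to read the temporal metric is as a Gaussian affinity that is kept only between frames that are close in time, which gives a band around the main diagonal. The sketch below is an assumption-laden illustration: the Gaussian form, the parameter `window` and the interpretation of the neighborhood set as "the next few frames" are not confirmed by the slides.

```python
import numpy as np

def temporal_affinity(X, window):
    """Affinity a_ik between frames i and k, nonzero only for nearby frames."""
    n = X.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        # Assumption: S_i is the set of frames at most `window` steps after i.
        for k in range(i + 1, min(i + window, n - 1) + 1):
            A[i, k] = np.exp(-np.sum((X[i] - X[k]) ** 2) / 2.0)
            A[k, i] = A[i, k]                  # a_ki = a_ik
    return A

rng = np.random.default_rng(3)
# Toy sequence: two stationary segments of 20 frames each.
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 4)),
               rng.normal(3.0, 0.1, size=(20, 4))])
A = temporal_affinity(X, window=5)
```

With such a banded matrix, frames inside one stationary segment stay strongly connected while the affinity drops to almost zero across a segment boundary, which is consistent with eigenvectors that follow contiguous segments.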


Contribution: speech segmentation and labeling

TSC: results

Contribution: speech segmentation and labeling

SCV labeling

- Segments produced by TSC
- MFCC from the middle of each segment
- Kernel PCA
- k-means (k=3)
- Labeling of clusters according to their mean energy
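
The last step above, naming the three k-means clusters from their mean energy, can be sketched as follows. The ordering used here (lowest energy labeled S, intermediate C, highest V) is an assumption, as are the function name and the toy values; the slides only state that labels follow mean energy.

```python
import numpy as np

def label_clusters_by_energy(cluster_ids, energies, names=("S", "C", "V")):
    """Name clusters in order of increasing mean energy."""
    ids = np.unique(cluster_ids)
    mean_energy = np.array([energies[cluster_ids == c].mean() for c in ids])
    ranked = ids[np.argsort(mean_energy)]      # lowest mean energy first
    mapping = {int(c): names[rank] for rank, c in enumerate(ranked)}
    return np.array([mapping[int(c)] for c in cluster_ids]), mapping

# Toy input: one k-means cluster id and one energy value per segment.
cluster_ids = np.array([2, 0, 1, 1, 0, 2, 1])
energies = np.array([0.1, 5.0, 9.0, 8.5, 4.0, 0.2, 9.5])
labels, mapping = label_clusters_by_energy(cluster_ids, energies)
```

In a full pipeline the `cluster_ids` would come from k-means on the kernel PCA projection of the mid-segment MFCC vectors.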

Contribution: speech segmentation and labeling

TSC-SCV labeling: test conditions & results

- 40 minutes of speech from the OGI corpus (6 languages)
- Results:
  - 74.66% accuracy compared to manual labeling
  - Fbd + VActivity + Edetection [AO88]: 72.66%
  - HMM system: 81.22%

Contribution: speech segmentation and labeling

Application: projection alignment

- Spectral projections can be randomly rotated
- After SCV labeling
  - Mean of S cluster in the positive side of X
  - Mean of V cluster in the positive side of Y
  - Mean of S cluster in the positive side of Z
- We can now model and compare projections
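
The alignment rule above can be sketched as a sign correction per axis. This sketch assumes that the arbitrary rotation reduces to axis reflections; the function name `align_projection` and the axis-to-cluster dictionary are hypothetical, with the mapping copied from the list above.

```python
import numpy as np

def align_projection(Y, labels, axis_to_cluster):
    """Flip axes so that each chosen cluster mean has a positive coordinate."""
    aligned = Y.copy()
    for axis, cluster in axis_to_cluster.items():
        if aligned[labels == cluster, axis].mean() < 0:
            aligned[:, axis] *= -1.0
    return aligned

# Toy 3D projection with SCV labels for four segments.
Y = np.array([[-1.0, 0.5, -0.2],
              [-2.0, 0.1, -0.4],
              [0.5, -1.5, 0.3],
              [0.7, -2.5, 0.1]])
labels = np.array(["S", "S", "V", "V"])
aligned = align_projection(Y, labels, {0: "S", 1: "V", 2: "S"})
```

After this step, projections computed from different recordings share the same orientation and can be compared axis by axis.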

Contribution: speech segmentation and labeling

Application: Voiced C - Non Voiced C labeling

- After TSC-SCV, Isomap with consonants
- 67.08% accuracy


Contribution: content visualization of audio databases

Audio databases

- Speech/music
- Music
- Languages
- Speakers
- Proposal: visualization of acoustic sequences in 3D spaces!
  - Unsupervised and supervised analysis


Contribution: content visualization of audio databases

KL system

Contribution: content visualization of audio databases

KL system: speech – music database

- 60 files
  - 30 from music db (60 s)
  - 30 from OGI (45 s)
- 15 MFCC
- GMM, 16 components
- 2 well-defined clusters

Contribution: content visualization of audio databases

KL system: music results

- Music cluster files
  - 9 singing voice
  - 17 instrumental
  - 30 rock/jazz

Contribution: content visualization of audio databases

KL system: languages database

- 60 OGI files, 3 languages (English, Italian, Mandarin)
- MFCC-SDC parameters
- Very difficult task

Contribution: content visualization of audio databases

KL system: speakers database

- 6 speakers from the ANITA corpus
  - 3 women, 3 men
- 25 files per speaker
- 15 MFCC + Δ
- GMM, 32 components
- SC eigengap indicates 6 clusters in the set


Contribution: content visualization of audio databases

KL-CV system: two modeling spaces

Contribution: content visualization of audio databases

KL-CV system: speakers database

- 6 speakers from the ANITA corpus
  - 3 women, 3 men
- 25 files per speaker
- 15 MFCC + Δ
- GMM, 8 components
- SC eigengap indicates 4 clusters in the set


Contribution: content visualization of audio databases

SV system

Contribution: content visualization of audio databases

SV system: speakers database

- 6 speakers from the ANITA corpus
  - 3 women, 3 men
- 25 files per speaker
- 15 MFCC + Δ
- GMM, 32 components
- SC eigengap indicates 9 clusters in the set

Contribution: content visualization of audio databases

Supervised learning results on speakers database

- Multiclass SVM, one-vs-all configuration
  - 90 files for learning, 60 files for tests
- KL system
  - 0% test error, 85 support vectors
- KL-C system
  - 3.33% test error, 22 support vectors
- KL-V system
  - 3.33% test error, 17 support vectors ?
- SV system
  - 6.66% test error, 33 support vectors


Conclusions and perspectives

Conclusions

- Spectral matrices => kernel matrices
- Intrinsic dimensionality < original MFCC dimensionality
- Speech manifolds
  - Particular structure
  - Hints to phonetic and perceptive studies
- Interpretation of speech invariants

Conclusions and perspectives

Conclusions

- Speech segmentation and labeling
  - Original approach
  - Good results and several applications
- Several proposals to transform variable-length acoustic sequences into 3D vectors
  - Similarity measure between sequences
  - Unsupervised and supervised analysis of results

Conclusions and perspectives

Future work

- Generalize regression, classification and clustering in manifolds
- Study intra-inter speaker variations
- Identify intrinsic dimensions of speech and music
- Source separation
- Framework for time series studies
  - Speech coding schemes
  - Statistical modeling of sequences, distance measures between models

Bibliography

- [Alp04] E. Alpaydin. Introduction to Machine Learning. MIT Press, 2004.
- [AO88] R. André-Obrecht. A new statistical approach for automatic speech segmentation. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1988.
- [Bor97] I. Borg, P. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, 1997.
- [Evg99] T. Evgeniou, M. Pontil, T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 1999.
- [Ng01] A. Ng, M. Jordan, Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, MIT Press, 2001.
- [Row00] S. Roweis, L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000.
- [Ten00] J. Tenenbaum, V. de Silva, J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 2000.


Thank you!