September 29, 2008 Université Toulouse III - Paul Sabatier 1
Spectral methods for automatic processing of audio documents
José Anibal Arias Aguilar
Advisor: Régine André-Obrecht
Tutor: Jérôme Farinas
Objectives
- Unify different dimensionality reduction approaches
- Visualize speech
- Identify basic units in speech
- Represent variable-length acoustic sequences by 3D vectors
Outline
- Introduction
- State of the art
  - Kernel functions
  - Spectral methods for dimensionality reduction
- Contribution
  - Acoustic information in low-dimensional spaces
  - Speech segmentation and labeling
  - Content visualization of audio databases
- Conclusions and perspectives
Introduction
- Data and machine learning
  - Quantity and quality, but dimensionality?
- Manifolds
  - Low-dimensional data embedded in high-dimensional spaces
- Kernel functions
  - Link between pattern space and feature space
- Speech sounds
  - Complex, high-dimensional information
State of the art: kernel functions
Pattern space vs feature space
- We can transform the pattern space to find more informative data representations
State of the art: kernel functions
Feature space: properties
- Desirable properties of the new space
  - Contain a rich class of functions
  - Have linear structure
  - Have an inner product so that we can take projections
- Example: Hilbert space (complete vector space with inner product)
State of the art: kernel functions
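Access to feature space: Kernels
- X is a compact metric space
$$\kappa : X \times X \to \mathbb{R} \quad \text{such that} \quad \kappa(x, z) = \kappa(z, x), \quad K_{ij} = \kappa(x_i, x_j) \ \text{is psd} \quad \forall x_i \in X$$
- For every Mercer kernel:
$$\Phi : X \to F, \ \text{where } F \text{ is a Hilbert space, such that} \quad \kappa(x, z) = \Phi(x) \cdot \Phi(z)$$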
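The two kernel conditions (symmetry and a positive semi-definite Gram matrix) can be checked numerically. The sketch below assumes a Gaussian kernel, which the slide does not name; `gaussian_kernel_matrix` and `sigma` are illustrative names only.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    The Gaussian kernel is an assumed example of a Mercer kernel.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))              # 20 toy patterns in 5 dimensions
K = gaussian_kernel_matrix(X, sigma=2.0)

is_symmetric = np.allclose(K, K.T)        # kappa(x, z) == kappa(z, x)
eigvals = np.linalg.eigvalsh(K)           # eigenvalues of the symmetric Gram matrix
is_psd = eigvals.min() > -1e-10           # psd up to numerical tolerance
```

The feature map Φ is never computed here: only inner products in feature space, that is, kernel values, are needed.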
State of the art: kernel functions
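Kernels and regularization theory [Evg99]
- Data: $(x_1, o_1), \dots, (x_n, o_n) \in \mathbb{R}^d \times \mathbb{R}$
- Estimate $f : X \to O$
- Hypothesis space H (RKHS), complexity of the solution controlled by Hilbert space norm
$$f = \arg\min_{f \in H} \ \underbrace{\frac{1}{n} \sum_i V(f(x_i), o_i)}_{\text{fit to data}} + \underbrace{\lambda \|f\|_H^2}_{\text{complexity penalty}}$$
- Representer theorem:
$$f(x) = \sum_i \alpha_i \, \kappa(x, x_i)$$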
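As one concrete instance of the regularized problem, take the square loss V(f(x), o) = (f(x) - o)^2. The representer theorem then gives the coefficients in closed form, α = (K + nλI)^{-1} o. The sketch below assumes this loss and a Gaussian kernel; neither choice comes from the slide, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Kernel values between the rows of A and the rows of B (assumed Gaussian)."""
    sq = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def fit_kernel_ridge(X, o, lam, sigma=1.0):
    """Solve (K + n*lam*I) alpha = o, the square-loss minimizer in the RKHS."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + n * lam * np.eye(n), o)

def predict(X_train, alpha, X_new, sigma=1.0):
    """Representer theorem: f(x) = sum_i alpha_i * kappa(x, x_i)."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Toy data: noisy samples of a sine curve.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
o = np.sin(X[:, 0]) + 0.05 * rng.normal(size=40)

alpha = fit_kernel_ridge(X, o, lam=1e-3, sigma=1.0)
train_pred = predict(X, alpha, X)
mse = float(np.mean((train_pred - np.sin(X[:, 0])) ** 2))
```

Larger values of `lam` put more weight on the complexity penalty and give smoother estimates.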
Outline
- Introduction
- State of the art
  - Kernel functions
  - Spectral methods for dimensionality reduction
    - Principal Component Analysis (PCA)
    - Metric Multidimensional Scaling (MDS)
    - Isometric mapping (ISOMAP)
    - Locally Linear Embedding (LLE)
    - Spectral Clustering (SC)
State of the art: spectral methods
Spectral methods for dimensionality reduction: two approaches
- Manifold learning: nearby points remain nearby, distant points remain distant
- Information extraction: separate clusters
- Spectral methods reveal low-dimensional structure through the eigenvalues and eigenvectors of special matrices
State of the art: spectral methods
Linear methods: PCA
- Principal Component Analysis [Alp04]
  - Spectral decomposition of the covariance matrix
  - Eigenvectors: principal axes of the maximum-variance subspace
  - Eigenvalues: projected variance of inputs along principal axes. The number of significant (non-negative) eigenvalues estimates dimensionality
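A minimal NumPy sketch of these three points, on synthetic data built to lie close to a 2D plane inside a 6D space. The data, the function name `pca` and the 1% significance threshold are illustrative assumptions.

```python
import numpy as np

def pca(X, n_components):
    """PCA by spectral decomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                    # zero-mean inputs
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    Y = Xc @ eigvecs[:, :n_components]         # projection on principal axes
    return Y, eigvals, eigvecs

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])   # 2 true dimensions
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))                # random orthogonal basis
X = latent @ Q[:2, :] + 0.01 * rng.normal(size=(200, 6))    # embed in 6D plus small noise

Y, eigvals, axes = pca(X, n_components=2)
# Count eigenvalues above 1% of the largest one (threshold is an assumption).
n_significant = int(np.sum(eigvals > 1e-2 * eigvals[0]))
```

The variance of each projected coordinate in `Y` equals the corresponding eigenvalue, which is what makes the eigenvalue spectrum usable as a dimensionality estimate.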
State of the art: spectral methods
Linear methods: MDS
- Metric Multidimensional Scaling [Bor97]
  - Spectral decomposition of the dot-product matrix (computed in terms of Euclidean distances of zero-mean vectors)
  - Eigenvectors: low-dimensional embedding
  - Eigenvalues: measure how each dimension contributes to dot products. The number of significant (non-negative) eigenvalues estimates dimensionality
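The dot-product matrix can be recovered from squared distances by double centering, B = -1/2 J D² J with J = I - 11ᵀ/n. The sketch below follows that standard construction; the helper names and toy data are assumptions for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X ** 2, 1)[:, None] + np.sum(X ** 2, 1)[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(sq, 0.0))

def classical_mds(D, n_components):
    """Metric MDS: eigendecomposition of the double-centered dot-product matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # dot products of zero-mean vectors
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    top = np.maximum(eigvals[:n_components], 0.0)
    return eigvecs[:, :n_components] * np.sqrt(top), eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
D = pairwise_distances(X)

Y, eigvals = classical_mds(D, n_components=3)
D_rec = pairwise_distances(Y)                  # distances in the embedding
n_significant = int(np.sum(eigvals > 1e-8 * eigvals[0]))
```

For truly Euclidean input distances the embedding reproduces them up to rotation and translation, and the number of significant eigenvalues matches the dimension of the original points.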
State of the art: spectral methods – manifold learning
Nonlinear methods: ISOMAP
- Preserve geodesic distances as estimated along the manifold
- Algorithm [Ten00]:
  - Build adjacency graph: vertices represent inputs, and edges weighted by local distances connect neighbors
  - Estimate geodesics: compute shortest paths through the graph
  - Metric MDS
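The three steps can be sketched in plain NumPy as follows. This is an illustrative version under simple assumptions (k-nearest-neighbor graph, Floyd-Warshall shortest paths, a toy arc as input), not the implementation used in this work.

```python
import numpy as np

def isomap(X, n_neighbors, n_components):
    """ISOMAP sketch: kNN graph, shortest-path geodesics, then metric MDS."""
    n = X.shape[0]
    sq = np.sum(X ** 2, 1)[:, None] + np.sum(X ** 2, 1)[None, :] - 2.0 * X @ X.T
    D = np.sqrt(np.maximum(sq, 0.0))
    np.fill_diagonal(D, 0.0)

    # Step 1: adjacency graph, edges weighted by local Euclidean distances.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]   # skip self at column 0
    for i in range(n):
        G[i, nn[i]] = D[i, nn[i]]
        G[nn[i], i] = D[i, nn[i]]                      # keep the graph symmetric

    # Step 2: geodesic estimates by Floyd-Warshall shortest paths.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is not connected")

    # Step 3: metric MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))

# Toy manifold: three quarters of a circle, a 1D curve embedded in 2D.
t = np.linspace(0.0, 1.5 * np.pi, 80)
X = np.column_stack([np.cos(t), np.sin(t)])
Y = isomap(X, n_neighbors=4, n_components=1)
corr = abs(np.corrcoef(Y[:, 0], t)[0, 1])   # agreement with the arc parameter
```

The recovered coordinate follows the position along the arc, which a linear projection of the same points cannot do.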
State of the art: spectral methods – manifold learning
Nonlinear methods: ISOMAP
- Assumptions
  - Graph is connected
  - Neighborhoods on graph reflect neighborhoods on manifold (no shortcuts)
  - Dense graph without "holes"
State of the art: spectral methods – manifold learning
Nonlinear methods: ISOMAP
[Figure: ISOMAP embedding with axes labeled "Finger extension" and "Wrist rotation"]
State of the art: spectral methods – manifold learning
Nonlinear methods: LLE
- Preserve local geometric relationships
- Algorithm [Row00]:
  - Nearest neighbor search
  - Characterize local geometry of each neighborhood by weights W_ij
  - Optimize low-dimensional outputs
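A compact sketch of the three LLE steps. The regularization term `reg` and the toy data are assumptions added to keep the local linear systems well conditioned; they are not taken from the slides.

```python
import numpy as np

def lle(X, n_neighbors, n_components, reg=1e-3):
    """LLE sketch: neighbors, reconstruction weights W, bottom eigenvectors."""
    n = X.shape[0]
    sq = np.sum(X ** 2, 1)[:, None] + np.sum(X ** 2, 1)[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(sq, np.inf)                      # exclude self-matches
    nn = np.argsort(sq, axis=1)[:, :n_neighbors]      # step 1: nearest neighbors

    # Step 2: weights that best reconstruct each point from its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nn[i]] - X[i]                           # neighbors centered on x_i
        C = Z @ Z.T                                   # local Gram matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)  # regularize (assumption)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nn[i]] = w / w.sum()                     # weights sum to one

    # Step 3: outputs minimizing the same reconstruction error in low dimension.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 1:n_components + 1], W          # skip the constant eigenvector

t = np.linspace(0.0, 1.5 * np.pi, 80)
X = np.column_stack([np.cos(t), np.sin(t)])
Y, W = lle(X, n_neighbors=4, n_components=1)
```

Each row of `W` has at most `n_neighbors` non-zero entries, so the matrix whose eigenvectors are computed is sparse.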
State of the art: spectral methods – manifold learning
Nonlinear methods: LLE
- Different approach than ISOMAP
  - Preserve local geometry: assume neighbors lie on locally linear patches
  - Construct sparse matrix
State of the art: spectral methods – manifold learning
Nonlinear methods: LLE
[Figure: LLE embedding with axes labeled "Pose" and "Expression"]
State of the art: spectral methods – information extraction
Nonlinear methods: Spectral clustering
- Discover non-convex clusters
- Graph partition problem (minimal cut)
State of the art: spectral methods – information extraction
Nonlinear methods: Spectral clustering
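- Relaxation of the Ncut problem
- Solution based on eigenvectors of an affinity matrix [Ng01]
$$A = \begin{pmatrix} A^{(11)} & 0 & 0 \\ 0 & A^{(22)} & 0 \\ 0 & 0 & A^{(33)} \end{pmatrix} \ \Rightarrow \ \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \begin{pmatrix} v_1 & 0 & 0 \\ 0 & v_2 & 0 \\ 0 & 0 & v_3 \end{pmatrix}$$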
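The block structure above suggests the following procedure, sketched here in the style of [Ng01]: build a Gaussian affinity matrix, normalize it, stack the top k eigenvectors, normalize rows and run k-means. The two-ring data, the value of `sigma` and the simple k-means with farthest-point initialization are illustrative assumptions.

```python
import numpy as np

def kmeans(Y, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = np.sum((Y[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        labels = np.argmin(d, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def spectral_clustering(X, k, sigma):
    """Cluster the rows of X from the top-k eigenvectors of a normalized affinity."""
    sq = np.sum(X ** 2, 1)[:, None] + np.sum(X ** 2, 1)[None, :] - 2.0 * X @ X.T
    A = np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)                        # no self-affinity
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))                 # D^{-1/2} A D^{-1/2}
    _, V = np.linalg.eigh(L)
    Y = V[:, -k:]                                   # k largest eigenvectors
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, k)

# Two concentric rings: clusters that are not convex in the input space.
angles = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([1.0 * ring, 4.0 * ring])
labels = spectral_clustering(X, k=2, sigma=0.5)
```

With well-separated groups the affinity matrix is nearly block diagonal, so points of the same ring collapse onto the same direction in the eigenvector space and k-means separates them easily.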
Contribution
Corpora
- OGI-MLTS
  - 100 files of spontaneous telephone speech (~45 s, 8 kHz)
  - Multilingual (English, German, Hindi, Japanese, Mandarin, Spanish)
  - Phonetically labeled
- ANITA
  - 150 files of studio speech (~7 s, 16 kHz)
  - 6 speakers
  - Posed and stressed conditions
- MUSIC
  - 70 files (60 s, 16 kHz)
  - Classical, singing voice, rock, jazz
Contribution
Some considerations
- Complexity
  - Isomap, SC-Kernel PCA: ~8000 vectors (~1 min of signal), 10 min if we use phonetic speech labels
  - LLE, LapEig, Landmark Isomap: ~10000 vectors
- Audio intrinsic dimensionality (MLE)
  - Speech: ~8-9 MFCC
  - Music: ~7-8 MFCC
  - Speech in stress conditions: dim - 1
- acoustic information in low-dimensional spaces
- speech segmentation and labeling
- content visualization of audio databases
Contribution: acoustic information in low-dimensional spaces
Speech manifolds: speech structure
- OGI sequence
- 15 MFCC
- Simplified phonetic labels
- ISOMAP discovers a particular distribution of phonetic classes
Contribution: acoustic information in low-dimensional spaces
Eigenvalues as intrinsic dimensionality estimators
- OGI sequence
- 15 MFCC
- Original variance retained in the first 6 dimensions:
  - PCA: 74.27%
  - Kernel PCA: 86.84%
  - ISOMAP: 89.80%
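One common way to obtain such percentages is the share of the total eigenvalue mass carried by the leading eigenvalues. The sketch below assumes that definition; the slides do not state how the figures were computed, and the spectrum used here is made up for illustration.

```python
import numpy as np

def retained_variance(eigvals, n_dims):
    """Fraction of total eigenvalue mass in the n_dims largest eigenvalues.

    Negative eigenvalues (possible for ISOMAP Gram matrices) are clipped to
    zero, which is an assumption of this sketch.
    """
    vals = np.sort(np.clip(np.asarray(eigvals, dtype=float), 0.0, None))[::-1]
    return float(vals[:n_dims].sum() / vals.sum())

example_spectrum = [5.0, 3.0, 1.0, 0.5, 0.3, 0.2]   # hypothetical eigenvalues
ratio = retained_variance(example_spectrum, n_dims=2)
```

A method whose spectrum decays faster concentrates more variance in the first dimensions, which is the comparison the percentages above express.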
Contribution: acoustic information in low-dimensional spaces
Speech manifolds: speech and music
- 20 s of audio signal containing speech and music
- 15 MFCC
- Laplacian eigenmaps
- Different zones of variation
Contribution: acoustic information in low-dimensional spaces
Information extraction: a new kind of projection
- OGI sequences in English, Mandarin and Spanish
- 15 MFCC
- Spectral clustering
- Different geometric structure than the manifold learning approach
Contribution: acoustic information in low-dimensional spaces
Information extraction: labels
Contribution: speech segmentation and labeling
Temporal spectral clustering
- OGI sequences
- 15 MFCC + Δ + ΔΔ
- Classical SC affinity matrix (A)
- New metric applied to the main diagonal (A')
Contribution: speech segmentation and labeling
Temporal spectral clustering
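$$a_{ik} = \begin{cases} e^{-\frac{\|x_i - x_k\|^2}{2\sigma^2}} & \text{if } x_k \in S_i,\ i < k \\ 0 & \text{otherwise} \end{cases} \qquad a_{ki} = a_{ik}$$
- The new metric takes into account temporal closeness between vectors
- Eigenvectors of A' are associated with segments of the signal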
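A sketch of an affinity matrix restricted in time. It assumes that the set S for frame i is a window of `window` frames on each side of i; the slides do not define S, so this choice, the diagonal values and the function name are all assumptions.

```python
import numpy as np

def temporal_affinity(X, sigma, window):
    """Gaussian affinities kept only between temporally close frames.

    Assumption: frame k belongs to the set S of frame i when |i - k| <= window.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, 1)[:, None] + np.sum(X ** 2, 1)[None, :] - 2.0 * X @ X.T
    A = np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))
    idx = np.arange(n)
    band = np.abs(idx[:, None] - idx[None, :]) <= window   # band around the diagonal
    return np.where(band, A, 0.0)                           # zero outside the band

# Toy sequence: 10 frames near 0 followed by 10 frames near 5 (one feature).
X = np.concatenate([np.zeros((10, 1)), 5.0 * np.ones((10, 1))])
A_prime = temporal_affinity(X, sigma=1.0, window=3)
```

Inside the band the matrix is close to block diagonal, with one block per stationary stretch of the signal. This is consistent with eigenvectors that concentrate on individual segments.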
Contribution: speech segmentation and labeling
TSC: results
Contribution: speech segmentation and labeling
SCV labeling
- Segments obtained from TSC
- MFCC from the middle of each segment
- Kernel PCA
- k-means (k=3)
- Labeling of clusters according to their mean energy
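The last two steps can be sketched as follows. The Kernel PCA projection is omitted for brevity, and the mapping from ascending mean energy to the names S, C and V is an assumption of this sketch (lowest energy first); the features and energies are toy values.

```python
import numpy as np

def kmeans(Y, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = np.sum((Y[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        labels = np.argmin(d, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def label_by_energy(cluster_ids, energies, names=("S", "C", "V")):
    """Name clusters by ascending mean energy (assumed order: S < C < V)."""
    means = [energies[cluster_ids == j].mean() for j in range(len(names))]
    order = np.argsort(means)                       # cluster ids, lowest energy first
    mapping = {int(c): names[rank] for rank, c in enumerate(order)}
    return [mapping[int(c)] for c in cluster_ids]

# One feature vector and one energy value per segment (hypothetical values).
features = np.array([[0.0, 0.1], [0.1, 0.0],
                     [5.0, 5.1], [5.1, 5.0],
                     [10.0, 9.9], [9.9, 10.0]])
energies = np.array([0.1, 0.2, 1.0, 1.1, 3.0, 3.2])

cluster_ids = kmeans(features, k=3)
labels = label_by_energy(cluster_ids, energies)
```

Because k-means cluster indices are arbitrary, the energy ranking is what turns anonymous clusters into named classes.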
Contribution: speech segmentation and labeling
TSC-SCV labeling: test conditions & results
- 40 minutes of speech from OGI corpus (6 languages)
- Results:
  - 74.66% accuracy compared to manual labeling
  - Fbd + VActivity + Edetection [AO88]: 72.66%
  - HMM system: 81.22%
Contribution: speech segmentation and labeling
Application: projection alignment
- Spectral projections can randomly rotate
- After SCV labeling
  - Mean of S cluster on the positive side of X
  - Mean of V cluster on the positive side of Y
  - Mean of S cluster on the positive side of Z
- We can now model and compare projections
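A sketch of the alignment rule, assuming the only ambiguity to remove is the sign of each axis (eigenvectors are defined up to sign). The function name and the toy projection are hypothetical; the cluster-to-axis pairs follow the list above.

```python
import numpy as np

def align_projection(Y, labels, axis_refs=(("S", 0), ("V", 1), ("S", 2))):
    """Flip axes so that each reference cluster mean lies on the positive side.

    axis_refs pairs a cluster name with an axis index (X=0, Y=1, Z=2).
    Only sign flips are handled, which is an assumption of this sketch.
    """
    Y = np.array(Y, dtype=float, copy=True)
    labels = np.asarray(labels)
    for name, axis in axis_refs:
        if Y[labels == name, axis].mean() < 0:
            Y[:, axis] *= -1.0
    return Y

projection = np.array([[-1.0, 2.0, -3.0],
                       [-2.0, 1.0, -1.0],
                       [3.0, -4.0, 2.0]])
labels = ["S", "S", "V"]
aligned = align_projection(projection, labels)
```

After alignment, projections computed from different recordings share the same orientation and can be compared coordinate by coordinate.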
Contribution: speech segmentation and labeling
Application: Voiced C - Non Voiced C labeling
- After TSC-SCV, Isomap with consonants
- 67.08% accuracy
Contribution: content visualization of audio databases
Audio databases
- Speech/music
- Music
- Languages
- Speakers
- Proposal: Visualization of acoustic sequences in 3D spaces!
  - Unsupervised and supervised analysis
Contribution: content visualization of audio databases
KL system
Contribution: content visualization of audio databases
KL system: speech – music database
- 60 files
  - 30 from music db (60 s)
  - 30 from OGI (45 s)
- 15 MFCC
- GMM 16 components
- 2 well-defined clusters
Contribution: content visualization of audio databases
KL system: music results
- Music cluster files
  - 9 singing voice
  - 17 instrumental
  - 30 rock/jazz
Contribution: content visualization of audio databases
KL system: languages database
- 60 OGI files, 3 languages (English, Italian, Mandarin)
- MFCC-SDC parameters
- Very difficult task
Contribution: content visualization of audio databases
KL system: speakers database
- 6 speakers from ANITA corpus
  - 3 women, 3 men
- 25 files per speaker
- 15 MFCC + Δ
- GMM 32 components
- SC eigengap indicates 6 clusters in the set
Contribution: content visualization of audio databases
KL-CV system: two modeling spaces
Contribution: content visualization of audio databases
KL-CV system: speakers database
- 6 speakers from ANITA corpus
  - 3 women, 3 men
- 25 files per speaker
- 15 MFCC + Δ
- GMM 8 components
- SC eigengap indicates 4 clusters in the set
Contribution: content visualization of audio databases
SV system
Contribution: content visualization of audio databases
SV system: speakers database
- 6 speakers from ANITA corpus
  - 3 women, 3 men
- 25 files per speaker
- 15 MFCC + Δ
- GMM 32 components
- SC eigengap indicates 9 clusters in the set
Contribution: content visualization of audio databases
Supervised learning results on speakers database
- SVM multiclass, one vs. all configuration
  - 90 files for learning, 60 files for tests
- KL system
  - 0% test error, 85 support vectors
- KL-C system
  - 3.33% test error, 22 support vectors
- KL-V system
  - 3.33% test error, 17 support vectors ?
- SV system
  - 6.66% test error, 33 support vectors
Conclusions and perspectives
Conclusions
- Spectral matrices => kernel matrices
- Intrinsic dimensionality < original MFCC dimensionality
- Speech manifolds
  - Particular structure
  - Hints to phonetic and perceptive studies
    - Interpretation of speech invariants
Conclusions and perspectives
Conclusions
- Speech segmentation and labeling
  - Original approach
  - Good results and several applications
- Several proposals to transform variable-length acoustic sequences into 3D vectors
  - Similarity measure between sequences
  - Unsupervised and supervised analysis of results
Conclusions and perspectives
Future work
- Generalize regression, classification and clustering in manifolds
- Study intra-inter speaker variations
- Identify intrinsic dimensions of speech and music
- Source separation
- Framework for time series studies
  - Speech coding schemes
  - Statistical modeling of sequences, distance measures between models
Bibliography
- [Alp04] E. Alpaydin. Introduction to Machine Learning. MIT Press, 2004.
- [AO88] R. André-Obrecht. A new statistical approach for automatic speech segmentation. Transactions on Audio, Speech, and Signal Processing, 1988.
- [Bor97] I. Borg, P. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, 1997.
- [Evg99] T. Evgeniou, M. Pontil, T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 1999.
- [Ng01] A. Ng, M. Jordan, Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, MIT Press, 2001.
- [Row00] S. Roweis, L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000.
- [Ten00] J. Tenenbaum, V. de Silva, J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 2000.