HIWIRE Progress Report Trento, January 2007 Presenter: Prof. Alex Potamianos Technical University of Crete




Outline

Long Term Research
• Audio-Visual Processing (WP1)
• Segment Models (WP1)
• Bayes’ Optimal Adaptation (WP2)

Research for the Platforms
• Features and Fusion

Integration on Year 2 Platforms
• Mobile Platform
• Fixed Platform



Stream-Weights: Motivation

Low performance of ASR systems at low SNR

Combine several sources of information

The sources of information are not equally reliable across environments and noise conditions

Mismatch between training and test conditions

Unsupervised stream-weight computation for multi-stream classifiers is an open problem


Problem Definition

Compute “optimal” exponent weights for each stream s_i

Optimality in the sense of minimizing the “total classification error”
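
As a rough illustration of what exponent weights do: raising each stream likelihood to a weight and multiplying is the same as taking a weighted sum of the per-stream log-likelihoods. The sketch below shows only that combination rule. The function names, the weights and the toy scores are assumptions made for illustration and do not come from the slides.

```python
import math

def combine_streams(stream_loglikes, weights):
    """Weighted sum of per-stream log-likelihoods for every class.

    stream_loglikes: one dict {class: log p(x_i | class)} per stream
    weights: one exponent weight per stream (hypothetical values)
    """
    classes = stream_loglikes[0].keys()
    return {
        c: sum(w * ll[c] for w, ll in zip(weights, stream_loglikes))
        for c in classes
    }

def classify(stream_loglikes, weights):
    """Pick the class with the highest combined score."""
    scores = combine_streams(stream_loglikes, weights)
    return max(scores, key=scores.get)

# Toy scores: the audio stream favors w2, the video stream favors w1.
audio = {"w1": math.log(0.30), "w2": math.log(0.70)}
video = {"w1": math.log(0.90), "w2": math.log(0.10)}

audio_heavy = classify([audio, video], [0.9, 0.1])
video_heavy = classify([audio, video], [0.1, 0.9])
print(audio_heavy, video_heavy)
```

The decision flips with the weights, which is why the choice of weights matters when the streams disagree.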


Total Error Computation

Two-class problem w1, w2 for the feature vector x

Feature pdfs p(x|w1), p(x|w2)

Assume that the estimation/modeling error is a normal random variable z_i


Optimal Stream Weights (1)

Minimize σ² with respect to the stream weights

Two interesting cases
• Equal error rate in the single-stream classifiers: p(x1|w1) = p(x2|w1) in the decision region
• Equal estimation error variance in each stream: σ_S1² = σ_S2²


Optimal Stream Weights (2)

Equal error rate in single-stream classifiers

Equal estimation error variance in each stream


Antimodels, Inter and Intra Distances

The multi-class problem is recast as multiple two-class classification problems

If p(x|w) follows a Gaussian distribution N(μ, σ²), the Bayes error is a function of D = |μ1 − μ2|/σ
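
For the special case of two equally likely classes that share the same variance (an assumption added here, not stated on the slide), the Bayes error has the closed form Q(D/2), where Q is the standard Gaussian tail probability. A minimal sketch of that dependence on D, with a hypothetical function name:

```python
import math

def bayes_error(mu1, mu2, sigma):
    """Bayes error for two equal-prior Gaussian classes with common std sigma."""
    d = abs(mu1 - mu2) / sigma  # normalized inter-class distance D
    # Q(D/2) written with the complementary error function
    return 0.5 * math.erfc(d / (2.0 * math.sqrt(2.0)))

for d in (0.0, 1.0, 2.0, 4.0):
    print(f"D = {d:.1f}  ->  Bayes error = {bayes_error(0.0, d, 1.0):.4f}")
```

The error falls monotonically as D grows, so a larger distance in a stream indicates a more reliable stream.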


Experimental Results (1)

Test case: audio-visual continuous digit recognition task

Differences from the ideal two-class case
• Multi-class problem
• Recognition instead of classification

Multiple experiments:
• Clean video stream
• Noise-corrupted audio streams at various SNRs


Experimental Results (2)

Subset of the CUAVE database used:
• 36 speakers (30 training, 6 testing)
• 5 sequences of 10 connected digits per speaker
• Training set: 1500 digits (30x5x10)
• Test set: 300 digits (6x5x10)

Features:
• Audio: 39 features (MFCC_D_A)
• Visual: 39 features (ROIDCT_D_A, odd columns)

Multi-stream HMM models:
• 8-state, left-to-right, whole-digit HMMs
• Single Gaussian mixture
• AV-HMM uses separate audio and video feature streams


Weights’ distribution


Results (classification)


Inter- Intra- Distances and Recognition

In each stream a total inter-/intra-class distance is computed



Results (recognition)


Conclusions

We have proposed a stream-weight computation method for a multi-class classification task, based on theoretical results obtained for a two-class classification problem and making use of an anti-model technique

We use only the test utterance and the information contained in the trained models

The results are of interest for the problem of unsupervised estimation of stream weights in multi-stream classification and recognition problems


Outline

Long Term Research
• Audio-Visual Processing (WP1)
• Segment Models (WP1)
• Bayes’ Optimal Adaptation (WP2)

Research for the Platforms
• New Features and Fusion

Integration on Year 2 Platforms
• Mobile Platform
• Fixed Platform


Dynamical System Segment Model

Segment models directly model the time evolution of speech parameters

Based on a linear dynamical system

The system parameters should guarantee
• Identifiability, Controllability, Observability, Stability

Only simple matrix topologies have been studied up to now

\[ x_{k+1} = F x_k + w_k \]
\[ y_k = H x_k + v_k \]


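Linear dynamical system with state control:

\[ x_{k+1} = F x_k + B u_k + w_k \]
\[ y_k = H x_k + v_k \]

Parameters F, B, H have canonical forms (Ljung, “System Identification”)

Generalized forms of parameter structures

\[
F = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
\times & \times & \times & \times & \times & \times & \times & \times & \times \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
\times & \times & \times & \times & \times & \times & \times & \times & \times \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
\times & \times & \times & \times & \times & \times & \times & \times & \times
\end{bmatrix}
\]

\[
H = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}
\]

Here × marks a free parameter; every entry of B is a free parameter.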
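
One way to read the F and H entries on this slide is as an observability-type canonical form in which the state is split into blocks, here assumed to have sizes 3, 2 and 4. Inside a block F shifts the state, the last row of each block holds free parameters, and H selects the first state of each block. The helper below is a hypothetical sketch of that reading, not code from the project; `FREE` and `canonical_structure` are illustrative names.

```python
FREE = None  # marker for a free (estimated) parameter

def canonical_structure(block_sizes):
    """Return (F, H) templates for the assumed block structure."""
    n = sum(block_sizes)
    F, H = [], []
    start = 0
    for size in block_sizes:
        # Shift rows: within a block, state j is driven by state j + 1.
        for j in range(start, start + size - 1):
            F.append([1 if c == j + 1 else 0 for c in range(n)])
        # The last row of each block is filled with free parameters.
        F.append([FREE] * n)
        # H observes the first state of the block.
        H.append([1 if c == start else 0 for c in range(n)])
        start += size
    return F, H

F, H = canonical_structure([3, 2, 4])
for row in F:
    print(" ".join("x" if v is FREE else str(v) for v in row))
```

With block sizes (3, 2, 4) the fixed rows of F and all rows of H match the 0/1 rows shown on the slide.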


Parameter Estimation

Use of the EM algorithm to estimate the parameters F, B, P, R
• We propose a new element-wise parameter estimation algorithm

For the forward-backward recursions, use the Kalman smoother recursions

[Equations: element-wise re-estimation formulas for the entries \hat{f}_{ij} of F, written in terms of the cofactors cof(P) and sums over k of products of state elements, and for the entries \hat{p}_{ij} of P]
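
The slide names the Kalman smoother for the forward-backward pass but does not list the recursions. As a hedged illustration, the sketch below runs a Kalman filter forward and a Rauch-Tung-Striebel smoother backward for a scalar system x_{k+1} = f x_k + w_k, y_k = h x_k + v_k. The scalar restriction, the function name and all numeric values are assumptions for illustration; the models in this deck are multivariate.

```python
def kalman_smoother(ys, f, h, q, r, x0, p0):
    """Scalar Kalman filter (forward) and RTS smoother (backward).

    q, r: variances of the state noise w_k and the observation noise v_k
    x0, p0: prior mean and variance of the first state
    Returns filtered means/variances and smoothed means/variances.
    """
    x_pred, p_pred = x0, p0
    xp, pp, xf, pf = [], [], [], []
    for y in ys:
        xp.append(x_pred)
        pp.append(p_pred)
        gain = p_pred * h / (h * h * p_pred + r)      # Kalman gain
        x_filt = x_pred + gain * (y - h * x_pred)     # measurement update
        p_filt = (1.0 - gain * h) * p_pred
        xf.append(x_filt)
        pf.append(p_filt)
        x_pred = f * x_filt                           # time update
        p_pred = f * f * p_filt + q

    xs, ps = xf[:], pf[:]
    for k in range(len(ys) - 2, -1, -1):
        g = pf[k] * f / pp[k + 1]                     # smoother gain
        xs[k] = xf[k] + g * (xs[k + 1] - xp[k + 1])
        ps[k] = pf[k] + g * g * (ps[k + 1] - pp[k + 1])
    return xf, pf, xs, ps

ys = [0.9, 1.1, 0.7, 1.3, 1.0]
xf, pf, xs, ps = kalman_smoother(ys, f=0.9, h=1.0, q=0.1, r=0.5, x0=0.0, p0=1.0)
print(xs)
```

In an EM setting the smoothed means and variances supply the expected state statistics that the re-estimation step consumes.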


Experiments with artificial data

Experiment description:
• Select random system parameters (using the canonical matrix topology)
• Generate artificial data from the system
• Estimate the parameters from the artificial data

Criteria for the evaluation of the system:
• The log-likelihood of the observations increases with each EM iteration
• The parameter estimation error decreases with each EM iteration
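
The generate-then-estimate loop can be tried out on a deliberately simplified stand-in: a scalar system whose state is observed directly, with a closed-form least-squares estimate in place of the EM algorithm. Everything in this sketch (names, seed, noise level, sample count) is an assumption chosen for illustration.

```python
import random

def simulate(f, q_std, n, seed=0):
    """Generate x_{k+1} = f * x_k + w_k with Gaussian noise w_k (scalar state)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(f * x[-1] + rng.gauss(0.0, q_std))
    return x

def estimate_f(x):
    """Least-squares estimate of f from a fully observed state sequence."""
    num = sum(a * b for a, b in zip(x[1:], x[:-1]))
    den = sum(a * a for a in x[:-1])
    return num / den

true_f = 0.8
states = simulate(true_f, q_std=1.0, n=5000)
error = abs(estimate_f(states) - true_f)
print(f"parameter estimation error: {error:.4f}")
```

The printed error plays the role of the second evaluation criterion; with hidden states and EM it would be tracked per iteration instead.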


• Without state control
• Dimension of F: 3x3
• Observation vector size: 3x1
• Number of rows with free parameters: 3
• Number of samples: 1000


Model Training on Speech Data

Aurora 2 database

77 training sentences

Word models with different numbers of states, based on the phonetic transcription

State alignments produced using HTK

Segments    Models
2           oh
4           two, eight
6           one, three, four, five, six, nine, zero
8           seven


Speech Segment Modeling


Classification process

Keep true word boundaries fixed
• Digit-level alignments produced by an HMM

Apply a suboptimal search and pruning algorithm
• Keep the 11 most probable word histories for each word in the sentence

Classification is based on maximizing the likelihood

Test set:
• Aurora 2, test set A, subway sentences
• 1000 test sentences
• Different levels of noise (clean; SNR 20, 15, 10, 5 dB)
• Front-end extracts 14-dimensional static features:
  • HTK standard front-end
  • 2 feature configurations
    – 12 cepstral coefficients + C0 + energy
    – + first- and second-order derivatives (δ, δδ)
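
The pruning step (keeping the 11 most probable word histories per word) can be sketched as a small beam search. In this toy version every segment is scored independently of its history, which is an assumption made to keep the example short; the function names and scores are hypothetical.

```python
import heapq

def prune_histories(histories, beam=11):
    """Keep the `beam` highest-scoring (log-likelihood, words) histories."""
    return heapq.nlargest(beam, histories, key=lambda h: h[0])

def decode(segment_scores, beam=11):
    """segment_scores: one dict {word: log-likelihood} per word segment."""
    histories = [(0.0, ())]
    for scores in segment_scores:
        # Extend every surviving history with every candidate word.
        expanded = [
            (total + ll, words + (w,))
            for total, words in histories
            for w, ll in scores.items()
        ]
        histories = prune_histories(expanded, beam)
    # Classification: the history with the maximum likelihood wins.
    return max(histories, key=lambda h: h[0])

toy_scores = [{"one": -1.0, "two": -2.0}, {"three": -0.5, "four": -3.0}]
best = decode(toy_scores)
print(best)
```

Because only a fixed number of histories survive each step, the search is suboptimal but its cost grows linearly with the number of words.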


Classification results

Comparison of segment-model and HTK HMM classification (% accuracy)
• Same front-end configuration, same alignments
• Both models trained on clean training data

AURORA Subway    HMM (HTK)               Segment Models
                 MFCC, E     +δ +δδ      MFCC, E     +δ +δδ
Clean            97.19%      97.57%      97.53%      97.61%
SNR20            90.91%      95.71%      93.23%      95.12%
SNR15            80.09%      91.76%      87.91%      91.13%
SNR10            57.68%      81.93%      76.29%      82.69%
SNR5             36.01%      64.24%      54.87%      63.56%


Conclusions and Future work

Without derivatives, segment models significantly outperform HMMs, particularly under highly noisy conditions (54.87% vs. 36.01% accuracy at SNR 5 dB)

When derivatives are used for both models, their performance is similar

Use formants and other articulatory features to initialize the state vectors

Examine different dimensions of the state vector

Extension to a non-linear dynamical system
• Use of the extended Kalman filter
• Derivation of the EM re-estimation formulae for the non-linear case


Outline

Long Term Research
• Audio-Visual Processing (WP1)
• Segment Models (WP1)
• Bayes’ Optimal Adaptation (WP2)

Research for the Platforms
• New Features and Fusion

Integration on Year 2 Platforms
• Mobile Platform
• Fixed Platform


MAP versus Bayes Optimal

MAP adaptation techniques derive from Bayes optimal classification
• Assumption: the posterior is peaked around the most probable model
• It is not optimal

Bayes optimal adaptation is based on a weighted average of the posteriors
• Better performance with less training data
• But:
  – Computationally expensive
  – Hard to find analytical solutions
• Approximations should be considered


Bayes Optimal Adaptation

Bayes optimal classification is based on:

\[ p(x_t \mid s_t, X_a) = \int p(x_t \mid \theta, X_a, s_t)\, p(\theta \mid X_a, s_t)\, d\theta \]

Assuming θ denotes a Gaussian component, the integral becomes a sum over the N Gaussians of Θ
• Θ is a subset of Gaussians


Our Approach

To obtain the N Gaussians of Θ:
• Step 1: Cluster the Gaussian mixtures associated with context-dependent models that share a common central phone
• Step 2: From the extended Gaussian mixture, choose the N least distant Gaussians from each Gaussian component of the SI Gaussian mixture

Bayes optimal classification becomes:

\[ p(x_t \mid s_t, X_a) = \sum_{i=1}^{M} c_i(s_t)\, N(x_t; m_i, S_i)\, p(\theta_i \mid X_a, s_t) \]
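
To make the contrast with MAP concrete, the sketch below evaluates a posterior-weighted sum of the form c_i · N(x; m_i, S_i) · p(θ_i | X_a) for 1-D Gaussians and compares it with a MAP-style shortcut that keeps only the most probable component. How the posterior is obtained is not given on the slides; here it is assumed to be the normalized likelihood of the adaptation data under each component. All names and numbers are hypothetical.

```python
import math

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def component_posteriors(adapt_data, components):
    """Assumed posterior p(theta_i | X_a): normalized likelihood of the
    adaptation data under each component (uniform prior over components)."""
    log_like = [
        sum(math.log(gauss_pdf(x, m, v)) for x in adapt_data)
        for _, m, v in components
    ]
    top = max(log_like)  # subtract the maximum for numerical stability
    unnorm = [math.exp(l - top) for l in log_like]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def bayes_score(x, components, post):
    """Sum over i of c_i * N(x; m_i, S_i) * p(theta_i | X_a)."""
    return sum(c * gauss_pdf(x, m, v) * p for (c, m, v), p in zip(components, post))

def map_score(x, components, post):
    """MAP-style shortcut: keep only the component with the highest posterior."""
    best = max(range(len(components)), key=lambda i: post[i])
    c, m, v = components[best]
    return c * gauss_pdf(x, m, v)

# Hypothetical components (weight c_i, mean m_i, variance S_i) and adaptation data.
components = [(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)]
adapt_data = [1.8, 2.1, 2.3]
post = component_posteriors(adapt_data, components)
print(post, bayes_score(2.0, components, post), map_score(2.0, components, post))
```

When the posterior is sharply peaked, as in this toy data, the two scores nearly coincide; they differ more when little adaptation data is available and the posterior is spread over several components.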


[Figure: Gaussian components 1, 2, ..., M of Mixture 1 and Mixture 2; axes: Gaussian size vs. number of mixture components]

• For example, based on an entropy-based distance between the Gaussians, the least distant Gaussians (shown in gray) are clustered together

• The clustering can be performed on an element or sub-vector basis, thus increasing the degrees of freedom
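
The slides do not say which entropy-based distance is used. As a stand-in, the sketch below uses the symmetric Kullback-Leibler divergence between 1-D Gaussians and picks the N least distant Gaussians from a pool. The choice of distance, the function names and the numbers are assumptions for illustration.

```python
import math

def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for 1-D Gaussians with variances v1, v2."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def sym_kl(g1, g2):
    """Symmetric KL divergence between two (mean, variance) pairs."""
    return kl_gauss(*g1, *g2) + kl_gauss(*g2, *g1)

def nearest_gaussians(target, pool, n):
    """Return the n Gaussians in `pool` that are least distant from `target`."""
    return sorted(pool, key=lambda g: sym_kl(target, g))[:n]

target = (0.0, 1.0)  # (mean, variance) of one SI component, hypothetical
pool = [(0.1, 1.0), (3.0, 1.0), (0.5, 2.0), (-4.0, 0.5)]
associated = nearest_gaussians(target, pool, 2)
print(associated)
```

Applying the same selection per element or per sub-vector, as the slide suggests, would mean calling `nearest_gaussians` on each dimension or group of dimensions separately.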


Adaptation Configuration

Baseline trained on the WSJ database

Adaptation data:
• spoke3 WSJ task
• Non-native speakers
• 5 male and 5 female
• 20 adaptation sentences per speaker
• 40 test sentences per speaker

Experiments performed for different numbers of associated mixtures (associations)


Adaptation Results (% WER)

                         Bayes’ Adaptation                      Baseline
                         5 Associations    6 Associations
Male speaker (4n0)       51.52%            47.65%               59.28%
Male speaker (4n3)       43.27%            41.98%               51.72%
Male speaker (4n5)       33.13%            31.48%               36.30%
Male speaker (4n9)       34.48%            33.43%               28.96%
Male speaker (4na)       26.66%            26.22%               28.72%
Total Male %WER          37.87%            36.15%               40.99%

Female speaker (4n1)     74.96%            74.47%               81.01%
Female speaker (4n4)     58.18%            58.18%               60.12%
Female speaker (4n8)     34.16%            35.99%               30.85%
Female speaker (4nb)     40.31%            39.38%               39.06%
Female speaker (4nc)     40.23%            41.68%               42.97%
Total Female %WER        49.56%            49.94%               50.80%


Total Results and Conclusions

                 Adaptation                             Baseline
                 5 Associations    6 Associations
Total %WER       43.71%            43.04%               45.89%

Small improvements can be obtained compared to the baseline (total WER drops from 45.89% to 43.04% with 6 associations)

The number of associations significantly influences the adaptation performance

The optimum number of associations depends on the baseline models and the adaptation data, so the associations should be chosen dynamically
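
To put the size of the improvement in perspective, the total WER figures from this slide can be turned into relative reductions. The helper name is hypothetical; the numbers are the totals reported on the slide.

```python
def relative_wer_reduction(baseline, adapted):
    """Relative WER reduction of the adapted system with respect to the baseline."""
    return (baseline - adapted) / baseline

total_wer = {"baseline": 45.89, "5 associations": 43.71, "6 associations": 43.04}
for name in ("5 associations", "6 associations"):
    rel = relative_wer_reduction(total_wer["baseline"], total_wer[name])
    print(f"{name}: {100 * rel:.2f}% relative WER reduction")
```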