July 2011

Age and Gender Recognition from Speech Patterns Based on Supervised Non-Negative Matrix Factorization

Mohamad Hasan Bahari

Hugo Van hamme


Outline

Introduction and Motivations

Age and Gender Recognition

Corpora

Supervised Non-negative Matrix Factorization

Proposed Method

Results

Conclusions and Future Research


Introduction

Confirming the identity of individuals

Biometric characteristics:

Fingerprint

Face

Iris

Hand Geometry

Ear Shape

Voice pattern

Choosing a characteristic:

Availability

Reliability


Motivation

In many real-world cases, only speech patterns are available (kidnapping, threatening calls, …)

Speech patterns can carry a lot of useful information:

Gender

Age

Dialect (original or previous regions)

Membership of a particular social group

To help identify a criminal

To narrow down the number of suspects


Goal


To extract different physical and psychological characteristics of the speaker from his/her voice patterns (Speaker Profiling).

Physical:

1. Gender

2. Age

3. Accent

4. …

Psychological:

1. Anxiousness

2. Stress

3. Confidence

4. …

Age and Gender Recognition

Three approaches:

I. Directly from speech signal.

II. Modeling the speech generation system.

III. Modeling the hearing system.


Age and Gender Recognition

I. Directly from the speech signal

Different acoustic features vary with age:

1) Fundamental frequency

2) Speech rate

3) Sound pressure level

4) …

Approach: find all acoustic features that vary with age and their exact relation to the speaker's age.

✓ Conceptually simple and computationally inexpensive

✗ These features are also affected by many other factors, such as weight, height, voice quality, emotional condition, …


Age and Gender Recognition

Effect of age and gender on speech (fundamental frequency) [1]

Age is only one of the inputs affecting speech and, consequently, the acoustic features.

It is impossible to estimate age without considering the other inputs.

Perceptions of gender and age have a significant mutual impact on each other.

[1] W. S. Brown, R. J. Morris, H. Hollien, and E. Howell, Journal of Voice, vol. 5, pp. 310–315, 1991.


Age and Gender Recognition

II. Modeling the speech generation system

It is an input estimation problem.

✗ Modeling the speech generation system of the speaker is very difficult.


Age and Gender Recognition

III. Modeling the hearing system

To solve the speech recognition problem, the hearing system is modeled using Hidden Markov Models (HMMs).

Approach: use the tools applied in speech recognition (HMMs).

✓ Well established

✓ Accurate in recognizing content

✗ There is a difference between a speaker's perceived age and their actual age

✗ Computationally complex


Corpora

Category name    Age (years)   Number of speakers
Young Male       18-35         85
Young Female     18-35         53
Middle Male      36-45         160
Middle Female    36-45         41
Senior Male      46-81         191
Senior Female    46-81         25

555 speakers from the N-best evaluation corpus [1]

The corpus contains live and read commentaries, news, interviews, and reports broadcast in Belgium

Different age groups and genders
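As a quick check on the corpus table, the sketch below adds up the category counts and expresses each category as a share of all speakers. The counts are copied from the table; the variable names are illustrative only.

```python
# Speaker counts per category, copied from the corpus table.
counts = {
    "Young Male": 85,
    "Young Female": 53,
    "Middle Male": 160,
    "Middle Female": 41,
    "Senior Male": 191,
    "Senior Female": 25,
}

total = sum(counts.values())

# Share of each category in percent of all speakers.
shares = {name: 100.0 * n / total for name, n in counts.items()}

# Gender totals (female category names end in "Female").
n_female = sum(n for name, n in counts.items() if name.endswith("Female"))
n_male = total - n_female

print(total, n_male, n_female)
for name, share in shares.items():
    print(f"{name:14s} {share:5.1f}%")
```

The categories are strongly imbalanced (far more male than female speakers), which matters when reading per-category accuracies.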

[1] D. A. Van Leeuwen, J. Kessens, E. Sanders, and H. van den Heuvel, In proc. Interspeech, pp. 2571-2574, 2009.

SNMF

Non-negative Matrix Factorization (NMF) is a popular machine learning algorithm [1]

It can be used in supervised or unsupervised mode.

Supervised NMF (SNMF) is a pattern recognition method [1]

It is very effective when the input space is high-dimensional.

It is a generative classifier.

It can directly classify patterns into multiple classes (no need to recast the problem as multiple binary classifications).

[1] H. Van hamme, In proc. Interspeech, Australia, pp. 2554-2557, 2008.


SNMF

Problem Statement:

Given a training data set: $S_{tr} = \{(x_1, y_1), \ldots, (x_n, y_n), \ldots, (x_N, y_N)\}$

$x_n$ is a vector of observed characteristics of the $n$-th data item

$y_n$ is a label vector representing the class that $x_n$ belongs to

Goal:

Approximate a classifier function $g$ such that $\hat{y} = g(x^{tst})$ is as close as possible to the true label, where $x^{tst}$ is an unseen observation.
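The slides do not say how the label vector is encoded. A minimal sketch, assuming a one-hot encoding over the six age-gender categories (the class abbreviations and helper names are illustrative, not taken from the talk):

```python
import numpy as np

# Assumed class order; one entry of the label vector per category.
CLASSES = ["YM", "YF", "MM", "MF", "SM", "SF"]


def label_vector(class_name):
    """Return a non-negative one-hot label vector y_n for a class name."""
    y = np.zeros(len(CLASSES))
    y[CLASSES.index(class_name)] = 1.0
    return y


def predicted_class(y_hat):
    """Map an estimated label vector back to the class with the largest entry."""
    return CLASSES[int(np.argmax(y_hat))]


print(label_vector("MM"))
print(predicted_class(np.array([0.1, 0.2, 0.05, 0.6, 0.0, 0.05])))
```

A non-negative encoding like this keeps the labels compatible with a non-negative factorization.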


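SNMF

SNMF in Training Phase:

First step:

$$V_S^{tr} = \begin{bmatrix} y_1 & \cdots & y_{N_{tr}} \end{bmatrix}, \qquad V_B^{tr} = \begin{bmatrix} x_1 & \cdots & x_{N_{tr}} \end{bmatrix}$$

Second step:

$$V^{tr} = \begin{bmatrix} V_S^{tr} \\ V_B^{tr} \end{bmatrix} \approx \begin{bmatrix} W_S^{tr} \\ W_B^{tr} \end{bmatrix} H^{tr} = W^{tr} H^{tr}$$

Extended Kullback-Leibler divergence:

$$D_{KL}\left(V^{tr} \,\|\, W^{tr}H^{tr}\right) = \sum_{m,n} \left( [V^{tr}]_{mn} \log \frac{[V^{tr}]_{mn}}{[W^{tr}H^{tr}]_{mn}} - [V^{tr}]_{mn} + [W^{tr}H^{tr}]_{mn} \right)$$

Multiplicative updating formula ($\odot$ and the fraction bars denote element-wise multiplication and division; $\mathbf{1}_{M \times N}$ is an all-ones matrix of the same size as $V^{tr}$):

$$H^{tr} \leftarrow H^{tr} \odot \frac{(W^{tr})^{T}\, \dfrac{V^{tr}}{W^{tr}H^{tr}}}{(W^{tr})^{T}\, \mathbf{1}_{M \times N}}, \qquad W^{tr} \leftarrow W^{tr} \odot \frac{\dfrac{V^{tr}}{W^{tr}H^{tr}}\, (H^{tr})^{T}}{\mathbf{1}_{M \times N}\, (H^{tr})^{T}}$$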
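The training phase can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. It uses the standard multiplicative updates for the extended Kullback-Leibler divergence. The function names, the random initialization, the fixed iteration count and the small constant `eps` are all assumptions.

```python
import numpy as np


def extended_kl(V, WH, eps=1e-12):
    """Extended Kullback-Leibler divergence between V and its approximation WH."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def train_snmf(X, Y, n_components, n_iter=200, eps=1e-12, seed=0):
    """Factorize V = [Y; X] ~ W H with multiplicative updates.

    X: (n_features, N) non-negative observations, one column per item.
    Y: (n_classes, N) non-negative label vectors, one column per item.
    Returns (W_S, W_B, H): label part and observation part of W, and H.
    """
    rng = np.random.default_rng(seed)
    V = np.vstack([Y, X])                       # stack labels on top of data
    M, N = V.shape
    W = rng.random((M, n_components)) + eps     # random positive start (assumed)
    H = rng.random((n_components, N)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update activations H, then basis W (element-wise multiply/divide).
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        W *= ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
    n_classes = Y.shape[0]
    return W[:n_classes], W[n_classes:], H
```

Only the two blocks of W are needed at test time; H is returned here so that the fit can be inspected with `extended_kl`.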


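SNMF

SNMF in Testing Phase:

$$\hat{y} = g(x^{tst}) = W_S^{tr}\, \underset{H^{tst}}{\arg\min}\; D_{KL}\left(x^{tst} \,\|\, W_B^{tr} H^{tst}\right)$$

First step: $x^{tst} \approx W_B^{tr} H^{tst}$

Second step: $\hat{y} = g(x^{tst}) = W_S^{tr} H^{tst}$

Extended Kullback-Leibler divergence:

$$D_{KL}\left(x^{tst} \,\|\, W_B^{tr}H^{tst}\right) = \sum_{m} \left( x^{tst}_{m} \log \frac{x^{tst}_{m}}{[W_B^{tr}H^{tst}]_{m}} - x^{tst}_{m} + [W_B^{tr}H^{tst}]_{m} \right)$$

Multiplicative updating formula ($\odot$ and the fraction bars denote element-wise multiplication and division; $\mathbf{1}$ is an all-ones vector of the same length as $x^{tst}$):

$$H^{tst} \leftarrow H^{tst} \odot \frac{(W_B^{tr})^{T}\, \dfrac{x^{tst}}{W_B^{tr}H^{tst}}}{(W_B^{tr})^{T}\, \mathbf{1}}$$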
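The two testing steps can be sketched as below: the activations of the unseen observation are estimated with the observation part of the trained basis held fixed, and the label estimate is then read out through the label part. The function name, the uniform initialization and the iteration count are assumptions made for illustration.

```python
import numpy as np


def snmf_predict(x, W_S, W_B, n_iter=200, eps=1e-12):
    """Estimate the label vector of one unseen observation x.

    x:   (n_features,) non-negative observation.
    W_S: (n_classes, Z) label part of the trained basis.
    W_B: (n_features, Z) observation part of the trained basis.
    """
    n_basis = W_B.shape[1]
    h = np.full(n_basis, 1.0 / n_basis)            # positive start (assumed)
    denom = W_B.T @ np.ones_like(x) + eps          # does not change over iterations
    for _ in range(n_iter):
        # First step: fit x ~ W_B h by multiplicative updates on h only.
        h *= (W_B.T @ (x / (W_B @ h + eps))) / denom
    # Second step: map the activations to a label estimate.
    return W_S @ h


# Tiny usage example with a hand-made basis of two patterns.
W_B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
W_S = np.eye(2)
y_hat = snmf_predict(np.array([2.0, 0.0, 2.0]), W_S, W_B)
print(int(np.argmax(y_hat)))
```

Taking the index of the largest entry of the returned vector gives a class decision; that decision rule is one plausible choice and is not stated on the slide.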

Proposed Method

1. Feature selection

2. Acoustic modeling

3. Supervector making procedure

4. Training phase

5. Testing phase

Proposed Method

1. Feature selection

• Mel spectra

• Mean normalization

• Vocal tract length normalization

• Augmented with their first- and second-order time derivatives

Speech Signal → Feature selection → Feature Vectors
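Two of the listed steps, mean normalization and augmentation with time derivatives, can be sketched as below. This is only an illustration under stated assumptions: the input is a matrix of per-frame features, the derivatives are plain finite differences from `np.gradient` (speech front ends often use a regression over several frames instead), and vocal tract length normalization is not covered.

```python
import numpy as np


def mean_normalize(feats):
    """Subtract the per-dimension mean over time. feats: (n_frames, n_dims)."""
    return feats - feats.mean(axis=0, keepdims=True)


def add_deltas(feats):
    """Append first- and second-order time differences to each frame."""
    d1 = np.gradient(feats, axis=0)     # first-order difference along time
    d2 = np.gradient(d1, axis=0)        # second-order difference along time
    return np.hstack([feats, d1, d2])


# Toy input: 4 frames of 3-dimensional features (made-up numbers).
frames = np.arange(12.0).reshape(4, 3)
features = add_deltas(mean_normalize(frames))
print(features.shape)
```

The output has three times as many dimensions per frame as the input: the static features followed by their first and second differences.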

Proposed Method

2. Acoustic modeling

Speaker independent Model:

• An HMM with a shared pool of 49740 Gaussians to model the observations in 3873 cross-word context-dependent tied triphone states.

Adaptation Method:

• The speaker-dependent mixture weights for each speaker are obtained by re-estimating the speaker-independent weights, based on a forced alignment of that speaker's training data with the speaker-independent acoustic model.

The result of this step is 555 speaker-adapted models.

Speaker Independent Model → Speaker Adaptation Method → Model of the Speaker
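The slide does not give the re-estimation formula. One common way to re-estimate mixture weights is a single EM-style update: the frames aligned to a state are softly assigned to the Gaussians using the speaker-independent weights, and the new weights are the average assignments. The sketch below shows that update for one state. It is an assumption about the procedure, and the function and argument names are hypothetical.

```python
import numpy as np


def reestimate_weights(si_weights, likelihoods):
    """One EM-style update of the mixture weights of a single state.

    si_weights:  (J,) speaker-independent weights of the state's Gaussians.
    likelihoods: (T, J) value of each Gaussian density on each of the T
                 frames that the forced alignment assigned to this state.
    """
    joint = likelihoods * si_weights                      # weight times density
    posteriors = joint / joint.sum(axis=1, keepdims=True) # per-frame soft assignment
    return posteriors.mean(axis=0)                        # new weights sum to 1


# Toy example: 4 aligned frames, 2 Gaussians (made-up likelihoods).
si = np.array([0.5, 0.5])
lik = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(reestimate_weights(si, lik))
```

Because only the weights change, every speaker-adapted model shares the same Gaussians and differs only in how much each Gaussian is used.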

Proposed Method

3. Supervector making procedure

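The Gaussian mixture model (GMM) of each speaker-adapted HMM is:

$$f^{s}(o_t) = \sum_{j=1}^{J} w_j^{s}\, \mathcal{N}\left(o_t, \mu_j^{s}, \Sigma_j^{s}\right)$$

Three types of supervectors:

1. Means

2. Variances

3. Weights

Weight supervector of speaker $s$:

$$\begin{bmatrix} (w_1^{s})^{T} & \cdots & (w_q^{s})^{T} & \cdots & (w_Q^{s})^{T} \end{bmatrix}^{T}$$

The result of this step is 555 supervectors, one for each of the 555 speakers.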
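Building a weight supervector amounts to stacking the weight vectors of one speaker into a single long vector, and the supervectors of all speakers then form the columns of a data matrix. A minimal sketch with made-up sizes (the helper name and the toy numbers are illustrative):

```python
import numpy as np


def weight_supervector(state_weights):
    """Concatenate the weight vectors of one speaker into one supervector.

    state_weights: sequence of 1-D arrays, one weight vector per block.
    """
    return np.concatenate([np.asarray(w, dtype=float).ravel() for w in state_weights])


# Two toy speakers, each with three small weight vectors (made-up numbers).
speaker_a = [[0.2, 0.8], [0.5, 0.5], [1.0]]
speaker_b = [[0.6, 0.4], [0.1, 0.9], [1.0]]

# One column per speaker, ready to be used as non-negative observations.
X = np.column_stack([weight_supervector(speaker_a), weight_supervector(speaker_b)])
print(X.shape)
```

Mixture weights are non-negative by definition, so such supervectors can be fed to a non-negative factorization without any shifting or rescaling.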

Proposed Method

4. Training phase

5. Testing phase

Results

Evaluation methodology: 5-fold cross-validation (five independent runs)

In each of the five runs:

The training set is the speech data of 444 speakers

The testing set is the speech data of 111 speakers

Run 1: TST TR TR TR TR

Run 2: TR TST TR TR TR

…
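The speaker-level split can be sketched as follows. The shuffling, the seed and the function name are assumptions. The slides only state the fold sizes.

```python
import numpy as np


def five_fold_splits(n_speakers=555, n_folds=5, seed=0):
    """Yield (train, test) arrays of speaker indices, one pair per run."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_speakers)          # shuffle the speakers once
    folds = np.array_split(order, n_folds)       # five disjoint blocks
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[i] for i in range(n_folds) if i != k])
        yield train, test


for run, (train, test) in enumerate(five_fold_splits(), start=1):
    print(run, len(train), len(test))
```

Splitting by speaker rather than by utterance keeps every test speaker unseen during training, and each speaker is tested exactly once over the five runs.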

Results

Gender recognition accuracy is 96%.

Age group recognition

Relative confusion matrix:

CL/AC   YM   YF   MM   MF   SM    SF
YM      13   03   58   0    26    0
YF      02   77   04   11   057   0
MM      06   01   44   01   47    0
MF      0    54   02   24   17    02
SM      03   01   19   0    76    0
SF      0    2    08   28   28    16

Category name    Prior   Accuracy
Young Male       15      13
Young Female     10      77
Middle Male      29      44
Middle Female    7       24
Senior Male      34      76
Senior Female    4       16
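To put the per-category accuracies next to the priors, the sketch below checks which categories are recognized more often than their prior share and forms a prior-weighted average. This is a back-of-the-envelope reading of the table, not a figure reported in the talk. It assumes both rows are percentages and that the priors are the category shares of the corpus.

```python
# Prior and accuracy rows copied from the table (assumed to be percentages).
prior = {"YM": 15, "YF": 10, "MM": 29, "MF": 7, "SM": 34, "SF": 4}
accuracy = {"YM": 13, "YF": 77, "MM": 44, "MF": 24, "SM": 76, "SF": 16}

# Categories recognized more often than their prior share.
above_prior = [c for c in prior if accuracy[c] > prior[c]]

# Accuracy of always guessing the largest category.
majority_baseline = max(prior.values())

# Average accuracy over speakers, using the priors as weights.
weighted_accuracy = sum(prior[c] * accuracy[c] for c in prior) / sum(prior.values())

print(above_prior)
print(majority_baseline, round(weighted_accuracy, 1))
```

Under these assumptions the weighted average comes out at roughly 51%, against 34% for always guessing the largest category. Young Male is the only category recognized less often than its prior.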

Conclusions and Future Research

Conclusions:

1. A new age-gender recognition method based on SNMF

2. Supervectors of GMM weights were used

3. Evaluated on N-Best Corpus

4. Gender recognition accuracy is 96%

5. Age group recognition accuracy is significantly higher than chance level

Future Research:

1. Age estimation instead of age group recognition.

2. Using supervectors of GMM means and variances and combining these features

Thank You for Your Attention