Emotional Speech Recognition - Columbia University (dpwe/e6820/proposals/kisang.pdf)

Emotional Speech Recognition Kisang Pak E6820: Speech & Audio Processing & Recognition Professor Dan Ellis


Post on 19-Jul-2020


Page 1

Emotional Speech Recognition

Kisang Pak
E6820: Speech & Audio Processing & Recognition
Professor Dan Ellis

Page 2

What is emotional speech recognition?

• A technique that recognizes emotions in speech
• Common emotions: anxiety, boredom, dissatisfaction, dominance, depression, disgust, frustration, fear, happiness, indifference, irony, joy, neutral, panic, prohibition, surprise, sadness, stress, shyness, shock, tiredness, task load stress, worry
• A system usually recognizes 3-5 emotions

Page 3

Common technique

Input Signal
• Speech samples

Feature Extraction
• Pitch
• Speech Energy
• Formant Frequencies

Classification
• HMM
• Binary decision tree
• ANN (Artificial Neural Networks)

Result
• Based on the results of the classification

Page 4

Feature Extraction: Pitch

Page 5

Feature Extraction: Energy

$f_s(n; m) = s(n)\,w(m - n)$, where $s(n)$ is the speech signal and $w(m - n)$ is a window (e.g. Hamming) of length $N_w$.
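The energy formula itself is missing from this transcript; only the windowed-frame definition survives. As a rough sketch, assuming the common short-time energy definition $E_s(m) = \frac{1}{N_w} \sum_n |f_s(n; m)|^2$ (an assumption, not confirmed by the slide), the per-frame energy could be computed as follows. The frame length, hop size, and test signal are illustrative values, not taken from the slides.

```python
import numpy as np

def short_time_energy(s, frame_len=400, hop=160):
    """Mean squared amplitude of each Hamming-windowed frame.

    frame_len plays the role of N_w; both frame_len and hop are
    illustrative defaults, not values from the slides.
    """
    s = np.asarray(s, dtype=float)
    if len(s) < frame_len:
        return np.empty(0)
    w = np.hamming(frame_len)
    n_frames = 1 + (len(s) - frame_len) // hop
    energy = np.empty(n_frames)
    for i in range(n_frames):
        # Windowed frame, i.e. f_s(n; m) = s(n) w(m - n)
        frame = s[i * hop : i * hop + frame_len] * w
        energy[i] = np.sum(frame ** 2) / frame_len
    return energy

# Usage: a quiet tone followed by a loud tone (synthetic data).
fs = 16000
t = np.arange(fs) / fs
quiet = 0.1 * np.sin(2 * np.pi * 200 * t)
loud = 1.0 * np.sin(2 * np.pi * 200 * t)
e = short_time_energy(np.concatenate([quiet, loud]))
```

The energy contour rises sharply where the loud segment begins, which is the kind of cue an energy-based split between calm and agitated speech would rely on.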

Page 6

Feature Extraction: Formants

Formant frequency (Hz)    Neutral    Anger     Joy
Formant 1                 355.6      562.9     412.1
Formant 2                 1400.4     743.9     674.6
Formant 3                 2588.6     1458.5    1567.9
Formant 4                 3505.9     2882.6    2653.4
Formant 5                 4653.3     3731.8    3661.1
Formant 6                 5338.3     4196.8    4372.9
Formant 7                 6279.6     5381.2    5489.9
Formant 8                 7000.2     6419.5    6422.8
Formant 9                 -          7215.3    7038.4
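The slide does not say how these formant values were obtained. One widely used approach is linear prediction (LPC) followed by root-finding, sketched below under that assumption. The function names, the LPC order, and the synthetic two-tone test signal are hypothetical and chosen only for illustration.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    frame = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    # Prediction-error polynomial A(z) = 1 - sum_k a_k z^-k
    return np.concatenate(([1.0], -a))

def formant_candidates(frame, fs, order):
    """Frequencies (Hz) of the LPC poles in the upper half plane, sorted."""
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]  # keep one root per conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

# Usage on a synthetic signal with two known resonances (not real speech).
fs = 8000
t = np.arange(512) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
candidates = formant_candidates(x, fs, order=4)
```

On real speech a practical formant tracker would also discard poles with very wide bandwidths and smooth the estimates across frames; this sketch omits both steps.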

Page 7

Emotion Classification: HMM

States: Anger, Happy, Neutral, Surprise, Frustrated, Sadness

Example)
Initial State: Anger
Observation: F0 = 250 Hz
Gender: Male

Probabilities labeled in the diagram: 0.2, 0.3, 0.25, 0.15, 0.05, 0.05
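The six probabilities in this example sum to 1, so they can be read as one row of a transition matrix out of the initial state Anger. The sketch below shows only that single Markov step, not a full HMM with emission probabilities. Which probability belongs to which target state cannot be recovered from the transcript, so the pairing used here is an assumption made purely for illustration.

```python
import numpy as np

STATES = ["Anger", "Happy", "Neutral", "Surprise", "Frustrated", "Sadness"]

# Values read off the slide. Pairing each value with a target state
# (same order as STATES) is an assumption for illustration only.
FROM_ANGER = np.array([0.2, 0.3, 0.25, 0.15, 0.05, 0.05])

def next_state_distribution(row):
    """Return {state: probability} after checking the row is a valid distribution."""
    row = np.asarray(row, dtype=float)
    if not np.isclose(row.sum(), 1.0):
        raise ValueError("transition probabilities must sum to 1")
    return dict(zip(STATES, row))

def sample_next_state(row, rng):
    """Draw the next emotion state according to the transition row."""
    return STATES[rng.choice(len(STATES), p=row)]

dist = next_state_distribution(FROM_ANGER)
most_likely = max(dist, key=dist.get)
sampled = sample_next_state(FROM_ANGER, np.random.default_rng(0))
```

In a complete HMM the observation (here F0 = 250 Hz, male speaker) would reweight these transition probabilities through per-state emission likelihoods, which the slide does not specify.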

Page 8

My technique: Overview

• Emotions: Sadness, Neutral, Anger, Happy, (Frustrated), (Surprised)
• Language: English
• Features to be used: Pitch, Energy, Formants
• Classification: Modified Binary Decision Tree (why not HMM???)
• Goal: 50% correct recognition rate (independent, gender unknown)

Page 9

My technique: Overview

Speech
  Men / Women
    Non-hyper (Sadness, Neutral)
      Sadness
      Neutral
    Hyper (Anger, Frustrated, Happy, Surprise)
      Negative (Anger, Frustrated)
        Anger
        Frustrated
      Positive (Happy, Surprise)
        Happy
        Surprise

Features labeled on the diagram: f0, Energy (i.e. rising slopes), Pitch Track, Frequency, Local maxima, p.d.f. of pitch contour
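The tree on this slide can be encoded as nested yes/no tests. The sketch below is only an illustration of that structure: the feature names (energy_slope, pitch_mean, pitch_peaks, pitch_spread), the pairing of features with splits, and the 0.5 thresholds on normalized features are all hypothetical, and the gender split is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Features = Dict[str, float]

@dataclass
class Node:
    label: str
    test: Optional[Callable[[Features], bool]] = None  # None marks a leaf
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None

def classify(node: Node, x: Features) -> str:
    """Walk the tree from the root until a leaf is reached."""
    while node.test is not None:
        node = node.if_true if node.test(x) else node.if_false
    return node.label

# Hypothetical splits; every feature and threshold is a placeholder.
TREE = Node(
    "Hyper?", lambda x: x["energy_slope"] > 0.5,
    if_true=Node(
        "Positive?", lambda x: x["pitch_peaks"] > 0.5,
        if_true=Node("Surprise?", lambda x: x["pitch_spread"] > 0.5,
                     if_true=Node("Surprise"), if_false=Node("Happy")),
        if_false=Node("Anger?", lambda x: x["pitch_mean"] > 0.5,
                      if_true=Node("Anger"), if_false=Node("Frustrated")),
    ),
    if_false=Node("Neutral?", lambda x: x["pitch_mean"] > 0.5,
                  if_true=Node("Neutral"), if_false=Node("Sadness")),
)

calm = {"energy_slope": 0.1, "pitch_mean": 0.2, "pitch_peaks": 0.0, "pitch_spread": 0.0}
result = classify(TREE, calm)
```

Because each internal node makes one binary decision, any single split can be tuned or replaced without touching the rest of the tree.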

Page 10

My technique: Example (Gender Differentiation)

1. Fundamental Frequencies (Hz)
(Time-Domain Analysis using autocorrelation)

           Male    Female
Sad        104     176
Neutral    110     202
Happy      281     410

2. PDFs of mean value of pitch contour

           Male    Female
1          127     186
2          101     182
3          119     207
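The slide names time-domain autocorrelation as the method for finding the fundamental frequency. A minimal sketch of that idea follows: the lag of the strongest autocorrelation peak inside a plausible pitch range gives the period. The function name, the 60-500 Hz search range, and the synthetic test tones are assumptions, not details from the slides.

```python
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Autocorrelation at non-negative lags
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]
    lag_min = int(fs / fmax)                       # shortest period considered
    lag_max = min(int(fs / fmin), len(ac) - 1)     # longest period considered
    lag = lag_min + np.argmax(ac[lag_min : lag_max + 1])
    return fs / lag

# Usage on pure tones at the neutral-speech values listed on this slide.
fs = 16000
t = np.arange(1024) / fs
f0_low = estimate_f0(np.sin(2 * np.pi * 110 * t), fs)
f0_high = estimate_f0(np.sin(2 * np.pi * 202 * t), fs)
```

The slide's own numbers show why a fixed pitch threshold is a weak gender cue on emotional speech: the happy male value (281) exceeds the neutral female value (202).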

Page 11

My technique: Example (Non-hyper vs. Hyper)

Figure panels: Neutral, Angry

Page 12

My technique: Example (Neutral vs. Sadness)

Figure panels: Neutral, Sadness

Page 13

Challenges (or Opportunities)

• Database (main source: movies, TV)
  • Enough angry speech but insufficient happy speech in Hollywood movies
  • TV sitcoms might be good (e.g. Friends, Seinfeld)
• No standard methodologies
  • Characterize emotions according to pitch, energy, formants, etc.
  • Input is very subjective

Page 14

Final Product

1. MATLAB

2. Stand-alone application in LabVIEW

Page 15

Bonus work: “Dream big!”

Emotional Speech Synthesis

joy

angry

neutral

Page 16

Discussions

Speech
  Men / Women
    Non-hyper (Sadness, Neutral)
      Sadness
      Neutral
    Hyper (Anger, Frustrated, Happy, Surprise)
      Negative (Anger, Frustrated)
        Anger
        Frustrated
      Positive (Happy, Surprise)
        Happy
        Surprise