Production of Emotional Speech

Upload: susheela-dodla

Post on 05-Apr-2018

TRANSCRIPT

  • 7/31/2019 Production of Emotional Speech

    1/41

    Gabriel Schubiner

    2/41

    Generation of Affect in Synthesized Speech

    Corpus-based approach to synthesis

    Expressive visual speech using talking head

    Demos

    Affect Editor Quiz/Demo

    Synface Demo

    3/41

    Affect in Speech

    Goals

    Addition of Emotion to Synthetic speech

    Acoustic Model

    Typology of parameters of emotional speech

    Quantification

    Addresses problem of expressiveness

    What benefit is gained from expressive speech?

    4/41

    Emotion Theory

    Assumptions

    Emotion -> Nervous System -> Speech Output

    Binary distinction

    Parasympathetic vs. Sympathetic

    based on physical changes

    universal emotions

    5/41

    Approaches to Affect

    Generative

    Emotion -> Physical -> Acoustic

    Descriptive

    Observed acoustic params imposed

    6/41

    Descriptive Framework

    4 Parameter groups

    Pitch

    Timing

    Voice Quality

    Articulation

    Assumption of independence

    How could this affect design and results?

    7/41

    Pitch

    Timing

    Accent Shape

    Average Pitch

    Contour Slope

    Final Lowering

    Pitch Range

    Reference Line

    Exaggeration (not used)

    8/41

    Voice Quality

    Articulation

    Breathiness

    Brilliance

    Loudness

    Pause Discontinuity

    Pitch Discontinuity

    Tremor

    Laryngealization

    9/41

    Implementation

    Each parameter has a scale between positive and negative

    Each scale is independent from other parameters

    10/41

    Implementation

    Settings grouped into preset conditions for each emotion

    based on prior studies
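
The scales and presets described on these two slides can be pictured as a small data structure. This is a minimal sketch, not the Affect Editor's own code: the -10 to 10 range, the `make_settings` helper, and every number in the `sad_example` preset are illustrative assumptions, not values taken from the slides or from the prior studies they mention. Only the parameter names come from the slides.

```python
# Assumed scale: symmetric around 0, where 0 means "neutral".
SCALE_MIN, SCALE_MAX = -10, 10

# Parameter names as listed on the slides ("Exaggeration" is marked "not used").
PARAMETER_NAMES = (
    "accent_shape", "average_pitch", "contour_slope", "final_lowering",
    "pitch_range", "reference_line", "breathiness", "brilliance", "loudness",
    "pause_discontinuity", "pitch_discontinuity", "tremor", "laryngealization",
)


def make_settings(**overrides):
    """Return a full parameter dict, neutral (0) unless overridden."""
    settings = dict.fromkeys(PARAMETER_NAMES, 0)
    for name, value in overrides.items():
        if name not in settings:
            raise KeyError(f"unknown parameter: {name}")
        # Each scale is clamped on its own, independently of the others.
        settings[name] = max(SCALE_MIN, min(SCALE_MAX, value))
    return settings


# Hypothetical presets: placeholder values, not published settings.
PRESETS = {
    "neutral": make_settings(),
    "sad_example": make_settings(
        average_pitch=-3, pitch_range=-5, loudness=-4, breathiness=12
    ),
}

print(PRESETS["sad_example"]["breathiness"])
```

Storing one independent value per parameter mirrors the independence assumption questioned earlier in the deck: nothing in this structure lets one setting constrain another.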

    11/41

    Program Flow: Input

    Emotion -> parameter representation

    Utterance -> clauses

    Agent, Action, Object, Locative

    Clause and lexeme annotations

    Finds all possible locations of affect and chooses whether or not to use

    12/41

    Program Flow

    Utterance -> Tree structure -> linear phonology

    compiled for specific synthesizer with software to simulate affects not available in hardware

    13/41

    14/41

    Perception

    30 Utterances

    5 sentences * 6 affects

    Forced choice of one of six affects

    magnitude and comments
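
The arithmetic behind this design is easy to check. The sketch below enumerates the stimulus set and the chance level for a six-way forced choice. The sentences and affect labels are the ones listed on the later elicitation and quiz slides. The variable names are illustrative.

```python
from itertools import product

SENTENCES = [
    "I'm almost finished",
    "I'm going to the city",
    "I saw your name in the paper",
    "I thought you really meant it",
    "Look at that picture",
]
AFFECTS = ["Anger", "Disgust", "Fear", "Gladness", "Sadness", "Surprise"]

# One stimulus per (sentence, affect) pair.
stimuli = list(product(SENTENCES, AFFECTS))

# A listener guessing uniformly at random among the six labels.
chance_level = 1 / len(AFFECTS)

print(len(stimuli))            # 30
print(round(chance_level, 3))  # 0.167
```

A recognition rate is only meaningful relative to this chance level of about 17%.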

    15/41

    Elicitation Sentences

    Intro

    I'm almost finished

    I'm going to the city

    I saw your name in the paper X

    I thought you really meant it

    Look at that picture

    16/41

    Pop Quiz!!!

    17/41

    Pop Quiz Solutions

    I'm almost finished

    Disgust : Surprise : Sadness : Gladness : Anger : Fear

    I'm going to the city

    Surprise : Gladness : Anger : Disgust : Sadness : Fear

    I thought you really meant it

    Anger : Disgust : Gladness : Sadness : Fear : Surprise

    Look at that picture

    Anger : Fear : Disgust : Sadness : Gladness : Surprise

    18/41

    Results

    approx 50% recognition rate

    91% sadness

    19/41

    20/41

    Conclusions

    Effective?

    Thoughts?

    21/41

    Corpus-based Approach to Expressive Speech Synthesis

    22/41

    Corpus

    Collect utterances in each emotion

    emotion-dependent semantics

    One speaker

    Good news, Bad news, Question

    23/41

    Model: Feature Vector

    Features

    Lexical stress

    Phrase-level stress

    Distance from beginning of phrase

    Distance from end of phrase

    POS

    Phrase-type

    End of syllable pitch
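
The feature list above maps naturally onto a per-syllable record. This is a sketch under assumptions: the class name, field types, and units (distances in syllables, pitch in Hz) are illustrative choices that the slides do not specify.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class SyllableFeatures:
    lexical_stress: bool
    phrase_stress: bool
    dist_from_phrase_start: int   # assumed unit: syllables
    dist_from_phrase_end: int     # assumed unit: syllables
    pos: str                      # part-of-speech tag, e.g. "NN"
    phrase_type: str              # e.g. "question"
    end_of_syllable_pitch: float  # assumed unit: Hz


# Made-up example values, for illustration only.
example = SyllableFeatures(True, False, 2, 3, "NN", "question", 182.0)

# Flatten to a plain tuple when a model expects a positional vector.
vector = astuple(example)
print(vector)
```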

    24/41

    Model: Classification

    Predicts F0

    5 syllable window

    Uses feature vector to predict observation vector

    observation vector: log(p), p

    p = end of syllable pitch

    Decision Tree
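
One way to read "5 syllable window" is that each prediction sees the features of the current syllable and its neighbours. The sketch below builds such windows. That the window is centred (two syllables on each side) and padded with `None` at phrase edges is an assumption, since the slides give only the window size. The `windows` helper is hypothetical.

```python
def windows(items, size=5, pad=None):
    """Return one centred window of `size` items per position, padded at the edges."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the window has a centre")
    items = list(items)
    half = size // 2
    padded = [pad] * half + items + [pad] * half
    return [tuple(padded[i:i + size]) for i in range(len(items))]


syllables = ["look", "at", "that", "pic", "ture"]
for window in windows(syllables):
    print(window)
```

Each tuple has the current syllable in the middle position. In a real system the items would be per-syllable feature vectors, not strings.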

    25/41

    Model: Target Duration

    Similar to predicting F0

    build tree with goal of providing Gaussian at leaves

    Use mean of class as target duration

    discretization
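
A tree with a Gaussian at each leaf can be sketched in a few lines. The version below grows a small regression tree by variance reduction and stores the mean and variance of the durations at each leaf. The splitting criterion, the `min_leaf` stopping rule, the two toy features, and all the data values are assumptions for illustration. The slides do not say how the tree in the paper is grown.

```python
from statistics import fmean, pvariance


def sse(values):
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, min_leaf=2):
    """rows: list of (feature_tuple, duration). Returns a nested dict."""
    durations = [d for _, d in rows]
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= threshold]
            right = [r for r in rows if r[0][f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([d for _, d in left]) + sse([d for _, d in right])
            if best is None or cost < best[0]:
                best = (cost, f, threshold, left, right)
    if best is None or best[0] >= sse(durations):
        # Leaf: summarise the class as a Gaussian (mean, variance).
        return {"mean": fmean(durations), "var": pvariance(durations)}
    _, f, threshold, left, right = best
    return {"feature": f, "threshold": threshold,
            "left": build_tree(left, min_leaf),
            "right": build_tree(right, min_leaf)}


def target_duration(tree, x):
    """Walk to a leaf and use the mean of its class as the target duration."""
    while "mean" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["mean"]


# Toy data: (lexical_stress, distance_from_phrase_end) -> duration in ms.
data = [((1, 0), 260.0), ((1, 1), 240.0), ((0, 0), 180.0),
        ((0, 2), 120.0), ((0, 3), 110.0), ((1, 3), 200.0)]
tree = build_tree(data)
print(target_duration(tree, (0, 3)))
```

On this toy data the tree splits once, on lexical stress, and each side becomes a leaf. The stored variance is not used for the target here, but it is what makes the leaf a Gaussian, not just a point estimate.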

    26/41

    Models

    Uses acoustic analogue of n-grams

    captures sense of context

    compared to describing full emotion as sequence

    compare to Affect Editor

    Uses only F0 and length (comp. AE)

    Include information about which utterance the features are derived from

    intentional bias, justified?

    27/41

    Model: Synthesis

    Data tagged with original expression and emotion

    expression-cost matrix

    noted trade-off: emotional intensity vs. smoothness

    Paralinguistic events
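
The trade-off between emotional intensity and smoothness can be made concrete with a toy cost function. This is a sketch under assumptions, not the system's actual search. The tags borrow "good news" and "bad news" from the corpus slide, while the cost values, the `expression_weight` knob, and the single-slot selection are hypothetical. A real unit-selection synthesizer searches over whole sequences of units.

```python
# Hypothetical substitution costs: outer key is the requested expression,
# inner key is the expression the candidate unit was recorded in.
EXPRESSION_COST = {
    "neutral":   {"neutral": 0.0, "good_news": 1.0, "bad_news": 1.0},
    "good_news": {"neutral": 0.5, "good_news": 0.0, "bad_news": 2.0},
    "bad_news":  {"neutral": 0.5, "good_news": 2.0, "bad_news": 0.0},
}


def total_cost(unit, requested, expression_weight):
    """Join (smoothness) cost plus weighted expression-mismatch cost."""
    mismatch = EXPRESSION_COST[requested][unit["tag"]]
    return unit["join_cost"] + expression_weight * mismatch


def pick_unit(candidates, requested, expression_weight):
    """Choose the cheapest candidate for a single slot."""
    return min(candidates,
               key=lambda u: total_cost(u, requested, expression_weight))


candidates = [
    {"id": "a", "tag": "neutral",   "join_cost": 0.2},  # smooth, wrong tag
    {"id": "b", "tag": "good_news", "join_cost": 0.6},  # rougher, right tag
]

print(pick_unit(candidates, "good_news", expression_weight=0.1)["id"])  # a
print(pick_unit(candidates, "good_news", expression_weight=2.0)["id"])  # b
```

With a low weight the smoother neutral unit wins. Raising the weight buys expressiveness at the price of a rougher join.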

    28/41

    SSML

    Compare to Cahn's typology

    Abstraction layers

    29/41

    Perception Experiment

    Distinguish same utterance spoken with neutral and affected prosody

    Semantic content problematic?

    30/41

    Results

    Binary decision

    Reasonable gain over baseline?

    31/41

    Conclusion

    Major contributions?

    Paths forward?

    32/41

    Synthesis of Expressive Visual Speech on a Talking Head

    33/41

    < Not these Talking Heads...

    34/41

    Synthesis Background

    Manipulation of video images

    Virtual model with deformation parameters

    Synchronized with time-aligned transcription

    Articulatory Control Model

    Cohen & Massaro (1993)

    35/41

    Data

    Single actor

    Given specific emotion as instruction

    6 emotions + neutral

    36/41

    Facial Animation Parameters

    Face independent

    FAP Matrix * scaling factor + position0

    Weighted deformations of distance between vertices and feature point
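
The two formulas on this slide can be sketched for a single feature point. The first function applies "FAP * scaling factor + position0" along a movement direction. The second spreads that displacement to nearby mesh vertices with a distance-based weight. The linear falloff, the `radius` parameter, and both function names are assumptions, since the slide says only that the deformation is weighted by distance to the feature point.

```python
import math


def feature_point_position(position0, fap_value, scaling_factor, direction):
    """position = FAP * scaling factor + position0, along a unit direction."""
    return tuple(p0 + fap_value * scaling_factor * d
                 for p0, d in zip(position0, direction))


def deform_vertices(vertices, feature_point0, displacement, radius):
    """Move each vertex by the feature-point displacement, weighted by distance."""
    moved = []
    for v in vertices:
        dist = math.dist(v, feature_point0)
        weight = max(0.0, 1.0 - dist / radius)  # assumed linear falloff
        moved.append(tuple(c + weight * d for c, d in zip(v, displacement)))
    return moved


rest = (0.0, 0.0, 0.0)
new = feature_point_position(rest, fap_value=100, scaling_factor=0.01,
                             direction=(0.0, 1.0, 0.0))
displacement = tuple(n - r for n, r in zip(new, rest))

mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(deform_vertices(mesh, rest, displacement, radius=2.0))
```

Because the FAP value is multiplied by a per-face scaling factor, the same FAP stream can drive faces of different proportions, which is one reading of "face independent".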

    37/41

    Modeling

    Phonetic segments assigned target parameter vector

    temporal blending over dominance functions

    Principal components
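
Temporal blending with dominance functions, in the style of the Cohen & Massaro model cited earlier, can be sketched for one parameter track. Each segment has a target value and a dominance function that decays exponentially away from the segment centre. The output at time t is the dominance-weighted average of the targets. The specific `alpha`, `theta`, and `c` values, the segment times, and the scalar targets are illustrative assumptions. The system in the slides blends whole parameter vectors.

```python
import math


def dominance(t, centre, alpha=1.0, theta=4.0, c=1.0):
    """Negative-exponential dominance: alpha * exp(-theta * |t - centre| ** c)."""
    return alpha * math.exp(-theta * abs(t - centre) ** c)


def blend(t, segments):
    """Dominance-weighted average of the segment targets at time t."""
    weights = [dominance(t, s["centre"], s.get("alpha", 1.0), s.get("theta", 4.0))
               for s in segments]
    total = sum(weights)
    return sum(w * s["target"] for w, s in zip(weights, segments)) / total


# Two hypothetical segments on one parameter track (times in seconds).
segments = [
    {"centre": 0.10, "target": 0.0},
    {"centre": 0.30, "target": 1.0},
]

for t in (0.10, 0.20, 0.30):
    print(t, round(blend(t, segments), 3))
```

Halfway between the two centres both segments have equal dominance, so the blended value sits midway between the targets. Closer to either centre, that segment's target dominates, which is how coarticulation emerges from the model.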

    38/41

    ML

    Separate models for each

    emotion

    6:1 training:testing ratio

    models -> PC traj -> FAP traj * emotion param matrix

    39/41

    Results

    More extreme emotions easier to perceive

    73% sad, 60% angry, 40% sad

    40/41

    Synface Demo

    41/41

    Discussion

    Changes in approach from Cahn to Eide

    Production compared to Detection