Exploring Tools for Expressive Voice

Affective Computing, Fall 2011

Natalie Freed

Project Goal

•  Build and evaluate tools to support people modulating their voice (speed, loudness, pitch) in a performative context

•  Approach: real-time feedback, playfulness

•  Technology: speech analysis, audio manipulation

Applications

1. “Stretching Your Range” (loudness and speech rate)

2. “Playing with Voices” (pitch and intonation)

3. Pilot study: expert vs. computer analysis of voice modulation

4. Study design to determine effect of interventions 1 and 2

System Architecture (both)

Client: Flash + ActionScript + Adobe AIR (visualization)

Server: Praat running on web server (analysis)

Flow: client sends audio file to server via CGI call; server returns analysis; client draws visualization

Goal: flexible, cross-platform, can run on portable devices; Praat audio analysis made into a web service that can be used for future applications.
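A minimal sketch of the server side of this architecture, under stated assumptions: the Praat script name (`analyze.praat`), its "name: value" output format, and the XML element names are all hypothetical, since the slides do not show the actual scripts or response format. It only illustrates the idea of wrapping a Praat run so a client can request analysis of an uploaded audio file.

```python
import subprocess
from xml.etree import ElementTree as ET


def parse_praat_output(text):
    """Parse 'name: value' lines into a dict of floats.

    Assumption: the (hypothetical) Praat script prints one measure per line.
    Lines that do not fit this shape are ignored.
    """
    measures = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            measures[name.strip()] = float(value)
        except ValueError:
            continue
    return measures


def to_xml(measures):
    """Serialize the measures as a small XML document (element names are made up)."""
    root = ET.Element("analysis")
    for name, value in measures.items():
        ET.SubElement(root, "measure", name=name).text = repr(value)
    return ET.tostring(root, encoding="unicode")


def analyze(wav_path, script_path="analyze.praat"):
    """Run Praat on one audio file and return the XML response body.

    Assumption: Praat is installed and on PATH; script_path is a placeholder.
    """
    result = subprocess.run(
        ["praat", "--run", script_path, wav_path],
        capture_output=True, text=True, check=True,
    )
    return to_xml(parse_praat_output(result.stdout))
```

A CGI handler would save the uploaded audio to a temporary file, call `analyze` on it, and write the returned XML to the response.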

Application 1: Stretching Your Range (Loudness and Speech Rate)

Navigate through exercises

Target area

feedback is plotted

Evaluation of “Stretching Your Range”

1. Read book

2. Speech modulation exercises with feedback

3. Read book with feedback

n = 4

Not a controlled study: it was not compared to a group that read the book twice without using the tool, or with the prompts alone.

Goal: qualitative analysis of the interface, and learning how to measure effectiveness and degree of voice modulation from different audio recordings, in preparation for a controlled study.

Human Analysis

Four 20-second audio samples from each participant. For each recording (pre and post):

•  30 seconds after start

•  1 minute before end

Order randomized

Expert evaluator: public speaking instructor Bill Hoogterp

1-7 scale
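The sampling scheme on this slide can be sketched as follows. This is an illustration, not the procedure actually used: it assumes the two offsets mark where each 20-second clip starts, and the function names, recording lengths, and seed are made up.

```python
import random

CLIP_SECONDS = 20.0


def clip_windows(duration):
    """Return (start, end) times in seconds for the two clips of one recording.

    Assumption: one clip starts 30 s after the start of the recording,
    the other starts 60 s before its end.
    """
    first_start = 30.0
    second_start = duration - 60.0
    return [
        (first_start, first_start + CLIP_SECONDS),
        (second_start, second_start + CLIP_SECONDS),
    ]


def build_playlist(durations, seed=0):
    """Build a randomized list of clips for the evaluator.

    durations maps (participant, condition) -> recording length in seconds.
    """
    clips = []
    for (participant, condition), duration in durations.items():
        for start, end in clip_windows(duration):
            clips.append((participant, condition, start, end))
    random.Random(seed).shuffle(clips)  # fixed seed keeps the order reproducible
    return clips


# Hypothetical recording lengths for one participant.
playlist = build_playlist({("A", "pre"): 240.0, ("A", "post"): 250.0})
```

Shuffling all clips together means the evaluator cannot tell from the order whether a clip is pre or post.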

Human Analysis

Audio (mean of 2 samples per participant)

Q1. How effectively does this speaker keep the listener's attention?

Q2. How much is this speaker modulating the speed of his or her voice?

Q3. How much is this speaker modulating the loudness of his or her voice?

Q4. How much is this speaker modulating other aspects of his or her voice, such as pitch, rhythm, or intonation?

Audio    Q1    Q2    Q3    Q4
A PRE    2     3     2     2
A POST   2.5   2     2.5   2
B PRE    4.5   4     4     3.5
B POST   4.5   4     4     3.5
C PRE    3.5   3.5   3.5   3.5
C POST   4     4     4     3.5
D PRE    2.5   2.5   2     2
D POST   4     3.5   4     2.5

Human Analysis

One-tailed t-test for correlated samples (within-groups), alpha = 0.05

Not statistically significant, but would be if there were one more participant who upheld the trend. => Need a larger n!
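As a worked illustration of the test named above, here is a sketch of a paired (correlated-samples) t statistic computed on the expert "keep the listener's attention" ratings for participants A-D. The helper name is made up; 2.353 is the standard one-tailed critical value of t for alpha = 0.05 with 3 degrees of freedom.

```python
import math


def paired_t(pre, post):
    """Return (t, degrees of freedom) for a paired-samples t-test."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1


# Expert attention ratings for participants A, B, C, D.
pre_attention = [2.0, 4.5, 3.5, 2.5]
post_attention = [2.5, 4.5, 4.0, 4.0]

t, df = paired_t(pre_attention, post_attention)
T_CRITICAL = 2.353  # one-tailed, alpha = 0.05, df = 3
significant = t > T_CRITICAL
print(f"t = {t:.2f}, df = {df}, significant: {significant}")
```

With these four pairs the statistic stays below the critical value, in line with the non-significant result reported on the slide.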

[Chart: mean expert ratings, pre vs. post, for Keep Attention, Modulate loudness, Modulate speed, Modulate other]

Software Analysis (Praat)

ID       Duration  Speaking rate  Articulation rate  Loudness range  Pitch range  Pitch SD  Intensity range  Intensity SD  Mode intensity
A PRE    240.83    2.69           4.40               58.34           448.83       48.65     35.68            9.88          57.84
A POST   253.42    2.56           4.31               56.99           360.89       39.77     35.09            9.39          56.36
B PRE    119.26    3.79           4.87               58.12           440.20       59.61     39.86            8.39          62.95
B POST   120.59    3.46           4.82               58.59           434.35       67.06     38.68            9.30          60.70
C PRE    179.42    3.41           4.63               61.72           446.26       69.52     39.70            10.08         64.52
C POST   212.30    2.96           4.51               60.49           447.73       76.11     43.66            10.77         62.26
D PRE    232.86    2.31           2.90               52.73           454.57       68.46     33.95            7.94          58.25
D POST   206.29    2.61           3.74               60.44           441.37       88.80     47.96            9.65          61.99

SD = standard deviation

[Figure: pre and post recordings for participants A-D (same book)]

[Figure: pre and post recordings for participants A-D (time-stretched)]

“Effectiveness” (self report and evaluated)

[Chart: expert evaluation of effectiveness at keeping the listener’s attention, from audio samples (1-7), pre vs. post, participants A-D]

[Chart: self-report of own “public speaking effectiveness” (Likert, 1-7), participants A-D]

Again, there is not enough data for meaningful results (and the two questions are not directly comparable), but this raises an interesting question for future work: how accurately do people estimate their own effectiveness?

Human/Software?

Human analysis (pre vs. post, participants A-D):

[Charts: Speed modulation; Other/pitch modulation; Loudness modulation]

Software analysis (pre vs. post, participants A-D):

[Charts: loudness range (max – min); pitch range (max – min); speech rate (pause ratio)]

How to most accurately evaluate and compare overall success at modulation? Exploring which metrics might map to expert human evaluation, or how they can be used together.
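One simple way to explore such a mapping is to correlate the pre-to-post change in an expert rating with the change in a software measure. The sketch below does this for one arbitrarily chosen pair (expert attention rating vs. Praat pitch standard deviation) using the values reported in these slides. The pairing is only an example, and with n = 4 any coefficient is illustrative rather than evidence.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Post-minus-pre change per participant (A, B, C, D).
attention_change = [2.5 - 2.0, 4.5 - 4.5, 4.0 - 3.5, 4.0 - 2.5]
pitch_sd_change = [39.77 - 48.65, 67.06 - 59.61, 76.11 - 69.52, 88.80 - 68.46]

r = pearson(attention_change, pitch_sd_change)
print(f"r = {r:.2f} (n = 4, illustrative only)")
```

The same function could be run over every rating/measure pair once a larger sample is available.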

Redesign of “Stretching Your Range”

The prompts themselves are helpful, so it is important to identify the value added by the feedback component:

-  “You can go louder!” Personalized and credible encouragement.

-  Identifying what you can’t hear about your own voice.

Can we avoid the calibration/microphone/perceptual loudness issue for ease of deployment? Calibration is possible, but it may not be necessary to have an absolute sense of loudness to get people to stretch their range.

New approach: continuous visualization. Eliminates the problem of when feedback arrives and allows people to speak without unnatural breaks.

New playback button: reinforces feedback and introduces distance from one's own voice.

Proposed redesign

Three main exercises based on study results:

1.  Stretching range to limits

2.  Slowing down (pausing more and longer)

3.  Spanning full range

1. Stretching range to limits

continuous audio level

target area
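A minimal sketch of how a continuous audio level could be computed and checked against a target area. It assumes samples normalized to the range -1 to 1 and reports levels relative to full scale (uncalibrated, so no absolute loudness is implied). The function names, frame size, and silence floor are illustrative choices, not the redesign's actual implementation.

```python
import math


def frame_levels_db(samples, frame_size, floor_db=-100.0):
    """Return the RMS level of each full frame, in dB relative to full scale."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        # Silent frames have no defined log level, so clamp them to a floor.
        levels.append(20 * math.log10(rms) if rms > 0 else floor_db)
    return levels


def in_target(level_db, low_db, high_db):
    """True when the current level falls inside the target area."""
    return low_db <= level_db <= high_db


# Three 4-sample frames: loud, quiet, silent.
levels = frame_levels_db([1.0] * 4 + [0.1] * 4 + [0.0] * 4, frame_size=4)
hits = [in_target(level, -25.0, -15.0) for level in levels]
```

Because only relative levels are used, the target area can be set per speaker and per microphone.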

2. Slowing down

pause duration target
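A sketch of how pauses might be detected for this exercise, given a sequence of per-frame levels in dB (however they are obtained). The silence threshold and minimum pause length are assumed values for illustration; a real system would need to tune them per speaker and microphone.

```python
def find_pauses(levels_db, frame_seconds, silence_db=-40.0, min_pause_seconds=0.2):
    """Return (start_time, duration) in seconds for each run of quiet frames."""
    pauses = []
    run_start = None
    # A trailing sentinel (louder than any threshold) closes a final run.
    for i, level in enumerate(list(levels_db) + [float("inf")]):
        if level < silence_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            duration = (i - run_start) * frame_seconds
            if duration >= min_pause_seconds:
                pauses.append((run_start * frame_seconds, duration))
            run_start = None
    return pauses


def meets_target(pauses, target_seconds):
    """For each detected pause, report whether it reached the duration target."""
    return [duration >= target_seconds for _, duration in pauses]


levels = [-20, -50, -50, -50, -20, -50, -20, -60, -60]
pauses = find_pauses(levels, frame_seconds=0.5, min_pause_seconds=1.0)
```

Comparing each pause against a duration target gives the speaker immediate feedback on whether they paused long enough.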

3. Spanning full range

infrequently occurring loudness

frequently occurring loudness
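The distinction between frequently and infrequently occurring loudness can be sketched as a histogram over per-frame levels. The bin width, the level range, and the 10% cutoff for "infrequent" are assumptions chosen for illustration.

```python
from collections import Counter


def loudness_histogram(levels_db, low_db, high_db, bin_db=5.0):
    """Count how many frames fall into each loudness bin between low_db and high_db."""
    n_bins = int((high_db - low_db) / bin_db)
    counts = Counter()
    for level in levels_db:
        if low_db <= level < high_db:
            counts[int((level - low_db) // bin_db)] += 1
    return [counts[i] for i in range(n_bins)]


def rare_bins(histogram, max_share=0.1):
    """Return indexes of bins holding at most max_share of all counted frames."""
    total = sum(histogram)
    return [i for i, c in enumerate(histogram) if total and c / total <= max_share]


levels = [-38, -37, -36, -33, -32, -24, -12]
hist = loudness_histogram(levels, low_db=-40.0, high_db=-10.0, bin_db=10.0)
rare = rare_bins(hist, max_share=0.2)
```

Rarely used bins are the parts of the range the speaker could be encouraged to visit more often.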

Interface 2: Playing with Voices (Pitch and intonation)

Demo at: http://vimeo.com/33385700


1.  Reader’s voice is recorded and sent to server for analysis.

2.  Audio is compared to different recordings; the closest match in pitch and intonation is returned.

3.  “Doctor” character (the hand puppet) plays back the audio through embedded speaker.

=> Puppet mimics the reader’s character voices, encouraging silliness.
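The slides do not describe how the closest match is computed. As a hedged sketch of step 2, the example below summarizes each pitch contour by its mean and standard deviation (a crude stand-in for "pitch and intonation") and returns the nearest recording by Euclidean distance. The feature choice, function names, and the small library of contours are all hypothetical.

```python
import math


def pitch_features(pitch_hz):
    """Summarize a pitch contour as (mean, standard deviation) over voiced frames.

    Assumption: unvoiced frames are encoded as 0 and are skipped.
    """
    voiced = [p for p in pitch_hz if p > 0]
    mean = sum(voiced) / len(voiced)
    sd = math.sqrt(sum((p - mean) ** 2 for p in voiced) / len(voiced))
    return (mean, sd)


def closest_recording(reader_pitch, library):
    """Return the name of the library recording nearest to the reader's features."""
    target = pitch_features(reader_pitch)
    return min(
        library,
        key=lambda name: math.dist(target, pitch_features(library[name])),
    )


# Hypothetical pre-recorded puppet voices (pitch contours in Hz).
library = {
    "low_flat": [100, 100, 0, 100],
    "high_wide": [300, 400, 0, 500],
}
match = closest_recording([310, 390, 480], library)
```

A richer feature set (for example, contour shape over time) would capture intonation better than two summary numbers.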

“Playing with Voices” User Feedback

First tested with random voices for puppet playback to learn what people expected and what was engaging. Feedback from 5 users:

•  “Wanted to try the extremes – because the extreme voices are funny!”

•  “I liked it when it spoke with the same rhythm.”

•  “I wanted it to mimic me.”

•  “Turning the pages breaks the rhythm, but the pause before it speaks is right.”

Mimicry and extreme voices appealed, so both were built into the final application (video on previous slide).

Controlled Study Design

Research question: Do these interventions impact speech modulation and expressiveness?

n > 10. All participants read the same book.

Control: Read book once, prompt to read more expressively, read book again.

Group A: Read book, [study 1: exercises with no feedback], read book again. [OR study 2: read “no more monkeys” book with random voices]

Group B: Read book, [study 1: exercises with feedback], read book again. [OR study 2: read “no more monkeys” book with pitch-matched voices]

Secondary question: compare human evaluations (not expert only) to software evaluation and identify correlated measures.

References

Boersma, P. and Weenink, D. “Praat: doing phonetics by computer.” Version 4.4.16 ed., 2006.

Camlot, J. et al. “The Victorianator.” 2011. http://ludicvoice.concordia.ca/?page_id=28

Hoque, M. E., Lane, J. K., el Kaliouby, R., Goodwin, M., Picard, R. W. “Exploring Speech Therapy Games with Children on the Autism Spectrum.” Proceedings of InterSpeech, Brighton, UK, September 6-10, 2009.

Lewis, J. and Tsonis, F. “SenseText: Gesture Based Control of Text Visualization.” Proceedings of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation, Berder Island, France, May 18-20, 2005.

Rodenburg, P. The Actor Speaks: Voice and the Performer. New York, NY: St. Martin's Press, 2000.

Thank you!

Ehsan Hoque: speech analysis scripts, guidance, COUHES assistance

Bill Hoogterp: expert evaluation of audio data

Ryan McDermott: help with web server setup and XML parsing

Adam Setapen: read book for demo video

Cynthia Breazeal: help with COUHES approval

User study participants