exploring tools for expressive voice
DESCRIPTION
Final presentation for Affective Computing Fall 2011: tools for helping people be more expressive with their voices.
TRANSCRIPT
Exploring Tools for Expressive Voice
Affective Computing Fall 2011
Natalie Freed
Project Goal
• Build and evaluate tools to support people modulating their voice (speed, loudness, pitch) in a performative context
• Approach: real-time feedback, playfulness
• Technology: speech analysis, audio manipulation
Applications
1. “Stretching Your Range” (loudness and speech rate)
2. “Playing with Voices” (pitch and intonation)
3. Pilot study: expert vs. computer analysis of voice modulation
4. Study design to determine effect of interventions 1 and 2
System Architecture (both)
Client: Flash + ActionScript + Adobe AIR
Server: Praat running on web server
Flow: audio file -> CGI call -> Praat analysis -> visualization
Goal: flexible, cross-platform, can run on portable devices; Praat audio analysis made into a web service that can be used for future applications.
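A minimal sketch of how a client might talk to such a web service, assuming the audio is POSTed as WAV bytes and the analysis comes back as XML. The endpoint URL, the content type, and the `<measure>` schema are all hypothetical; the slides do not specify them, and the original client was written in ActionScript rather than Python.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint; the slides do not give the real URL.
ANALYSIS_URL = "http://example.com/cgi-bin/analyze.cgi"


def build_analysis_request(wav_bytes: bytes) -> urllib.request.Request:
    """Wrap raw WAV bytes in a POST request for the (assumed) CGI script."""
    return urllib.request.Request(
        ANALYSIS_URL,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def parse_analysis(xml_text: str) -> dict:
    """Read <measure name="..." value="..."/> elements (an assumed schema)."""
    root = ET.fromstring(xml_text)
    return {m.get("name"): float(m.get("value")) for m in root.iter("measure")}


# Example with a made-up response body; nothing is sent over the network.
request = build_analysis_request(b"RIFF....WAVE")
sample_response = '<analysis><measure name="example_metric" value="1.5"/></analysis>'
measures = parse_analysis(sample_response)
```

Keeping the analysis behind a plain HTTP call is what would let other front ends reuse it, which matches the stated goal of supporting future applications.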
Application 1: Stretching Your Range (Loudness and Speech Rate)
Navigate through exercises
Target area
feedback is plotted
Evaluation of “Stretching Your Range”
1. Read book
2. Speech modulation exercises with feedback
3. Read book with feedback
n = 4
Not a controlled study: it was not compared to a group that read the book twice without using the tool, or with the prompts alone.
Goal: qualitative analysis of the interface, plus learning how to measure effectiveness and degree of voice modulation from different audio recordings, in preparation for a controlled study.
Human Analysis
Four 20-second audio samples from each participant: two from each recording (pre and post)
• 30 seconds after start
• 1 minute before end
Order randomized
Expert evaluator: public speaking instructor Bill Hoogterp
1-7 scale
Human Analysis
Ratings per audio recording (mean of 2 samples per participant):
Q1: How effectively does this speaker keep the listener's attention?
Q2: How much is this speaker modulating the speed of his or her voice?
Q3: How much is this speaker modulating the loudness of his or her voice?
Q4: How much is this speaker modulating other aspects of his or her voice, such as pitch, rhythm, or intonation?

Audio    Q1    Q2    Q3    Q4
A PRE    2     3     2     2
A POST   2.5   2     2.5   2
B PRE    4.5   4     4     3.5
B POST   4.5   4     4     3.5
C PRE    3.5   3.5   3.5   3.5
C POST   4     4     4     3.5
D PRE    2.5   2.5   2     2
D POST   4     3.5   4     2.5
Human Analysis
1-tailed t-test for correlated samples (within-groups), alpha = 0.05
Not statistically significant, but would be if there were one more participant who upheld the trend. => Need a larger n!
[Bar chart: pre (mean) vs. post (mean) expert ratings for Keep attention, Modulate loudness, Modulate speed, Modulate other; y-axis 0 to 4]
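The test named on this slide can be sketched as follows, using the expert ratings from the Human Analysis table (participants A to D). This is an illustration under assumptions, not the original analysis: the slides do not say which tool or exact procedure was used, and the critical value 2.353 is the standard table value for a one-tailed test at alpha = 0.05 with 3 degrees of freedom.

```python
import math
import statistics

# Expert ratings from the Human Analysis table, participants A-D, as (pre, post).
RATINGS = {
    "attention": ([2.0, 4.5, 3.5, 2.5], [2.5, 4.5, 4.0, 4.0]),
    "speed": ([3.0, 4.0, 3.5, 2.5], [2.0, 4.0, 4.0, 3.5]),
    "loudness": ([2.0, 4.0, 3.5, 2.0], [2.5, 4.0, 4.0, 4.0]),
    "other": ([2.0, 3.5, 3.5, 2.0], [2.0, 3.5, 3.5, 2.5]),
}

# Standard table value: one-tailed critical t for alpha = 0.05, df = n - 1 = 3.
T_CRITICAL = 2.353


def paired_t(pre, post):
    """t statistic for correlated samples: mean difference over its standard error."""
    diffs = [after - before for before, after in zip(pre, post)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err


results = {name: paired_t(pre, post) for name, (pre, post) in RATINGS.items()}
for name, t in results.items():
    verdict = "significant" if t > T_CRITICAL else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict})")
```

With these numbers no measure exceeds the critical value, which agrees with the slide's conclusion that a larger n is needed.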
Software Analysis (Praat)
ID       duration  speaking rate  articulation rate  loudness range  pitch range  pitch SD  intensity range  intensity SD  mode intensity
A PRE    240.83    2.69           4.40               58.34           448.83       48.65     35.68            9.88          57.84
A POST   253.42    2.56           4.31               56.99           360.89       39.77     35.09            9.39          56.36
B PRE    119.26    3.79           4.87               58.12           440.20       59.61     39.86            8.39          62.95
B POST   120.59    3.46           4.82               58.59           434.35       67.06     38.68            9.30          60.70
C PRE    179.42    3.41           4.63               61.72           446.26       69.52     39.70            10.08         64.52
C POST   212.30    2.96           4.51               60.49           447.73       76.11     43.66            10.77         62.26
D PRE    232.86    2.31           2.90               52.73           454.57       68.46     33.95            7.94          58.25
D POST   206.29    2.61           3.74               60.44           441.37       88.80     47.96            9.65          61.99
(SD = standard deviation)
Pre and post recordings
Same book
[Figure: pre and post recordings for participants A, B, C, D]
Pre and post recordings (time-stretched)
[Figure: time-stretched pre and post recordings for participants A, B, C, D]
“Effectiveness” (self report and evaluated)
[Bar chart: expert evaluation of effectiveness at keeping listener’s attention, from audio samples (1-7); participants A, B, C, D; pre vs. post]
[Bar chart: self-report of own “public speaking effectiveness” (Likert, 1-7); participants A, B, C, D]
Again, not enough data for meaningful results (and the questions are not comparable here), but an interesting question for future work: how accurately do people estimate their own effectiveness?
Human/Software?
Human analysis: [bar charts, participants A, B, C, D, pre vs. post: speed modulation; other/pitch modulation; loudness modulation]
Software analysis: [bar charts, participants A, B, C, D, pre vs. post: loudness range (max – min); pitch range (max – min); speech rate (pause ratio)]
How to most accurately evaluate and compare overall success at modulation? Exploring which metrics might map to expert human evaluation, or how they can be used together.
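One way to start exploring this mapping is a simple correlation between a software metric and an expert rating. The sketch below pairs Praat's intensity standard deviation with the expert's loudness-modulation rating across the eight recordings; the choice of this particular pair is an assumption made for illustration, not something the slides report. With only four participants, and pre/post recordings that are not independent, any coefficient here is exploratory at best.

```python
import math

# Values copied from the tables in this deck, in the order
# A pre, A post, B pre, B post, C pre, C post, D pre, D post.
INTENSITY_SD = [9.88, 9.39, 8.39, 9.30, 10.08, 10.77, 7.94, 9.65]
EXPERT_LOUDNESS = [2.0, 2.5, 4.0, 4.0, 3.5, 4.0, 2.0, 4.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(INTENSITY_SD, EXPERT_LOUDNESS)
print(f"intensity SD vs. expert loudness rating: r = {r:.2f}")
```

The same function could be run over every metric and rating pair to shortlist candidates for the controlled study.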
Redesign of “Stretching Your Range”
The prompts themselves are helpful, so it is important to identify the added value of the feedback component:
- “You can go louder!” Personalized and credible encouragement.
- Identifying what you can’t hear about your own voice.
Can we avoid the calibration/microphone/perceptual loudness issue for ease of deployment? Calibration is possible, but an absolute sense of loudness may not be necessary to get people to stretch their range.
New approach: continuous visualization. Eliminates the problem of when feedback arrives and lets people speak without unnatural breaks.
New playback button: reinforces feedback, introduces distance from own voice.
Proposed redesign
Three main exercises based on study results:
1. Stretching range to limits
2. Slowing down (pausing more and longer)
3. Spanning full range
1. Stretching range to limits
continuous audio level
target area
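A continuous audio level with a target area could be computed roughly like this. The sketch assumes samples are floats in [-1, 1] and reports RMS level per frame in dB relative to full scale; the frame size and the target bounds are illustrative values, not taken from the slides. Because the levels are relative rather than calibrated, this fits the idea that an absolute sense of loudness may not be needed.

```python
import math


def frame_levels_db(samples, frame_size):
    """RMS level of each full frame, in dB relative to full scale (1.0)."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        # Floor silent frames at -100 dB to avoid log10(0).
        levels.append(20 * math.log10(rms) if rms > 0 else -100.0)
    return levels


def in_target(level_db, low_db, high_db):
    """True when a frame's level falls inside the target area."""
    return low_db <= level_db <= high_db


# Toy signal: a quiet stretch followed by a louder one.
signal = [0.05, -0.05] * 50 + [0.5, -0.5] * 50
levels = frame_levels_db(signal, frame_size=50)
hits = [in_target(level, -10.0, 0.0) for level in levels]
```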
2. Slowing down
pause duration target
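Pause durations could be measured from the same kind of per-frame levels by looking for runs of quiet frames. This is a sketch under assumptions: the silence threshold, the minimum pause length, and the frame duration are hypothetical parameters, and the slides do not describe how pauses were actually detected.

```python
def pause_durations(levels_db, frame_seconds, silence_db=-40.0, min_pause=0.2):
    """Durations (seconds) of silent runs lasting at least min_pause."""
    pauses = []
    run = 0  # consecutive silent frames seen so far
    # An infinitely loud sentinel frame closes a run that reaches the end.
    for level in list(levels_db) + [float("inf")]:
        if level < silence_db:
            run += 1
        else:
            duration = run * frame_seconds
            if run and duration >= min_pause:
                pauses.append(duration)
            run = 0
    return pauses


# Toy levels at 0.1 s per frame: a 0.3 s pause, a 0.1 s blip, a 0.2 s pause.
toy_levels = [-20, -50, -50, -50, -20, -50, -20, -50, -50]
pauses = pause_durations(toy_levels, frame_seconds=0.1)
```

Each returned duration can then be compared against the pause duration target shown in the exercise.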
3. Spanning full range
infrequently occurring loudness
frequently occurring loudness
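Frequently and infrequently occurring loudness can be found by binning per-frame levels into a histogram. The 5 dB bin width and the sample levels below are illustrative assumptions.

```python
import math
from collections import Counter


def loudness_histogram(levels_db, bin_width=5.0):
    """Count frames per loudness bin; keys are each bin's lower edge in dB."""
    return Counter(math.floor(level / bin_width) * bin_width for level in levels_db)


# Made-up frame levels: mostly around -32 dB, with two louder outliers.
sample_levels = [-32.0, -31.0, -33.5, -30.5, -22.0, -12.0, -31.5]
histogram = loudness_histogram(sample_levels)
frequent_bin = histogram.most_common(1)[0][0]
rare_bins = sorted(edge for edge, count in histogram.items() if count == 1)
```

A speaker who spans their full range would show counts spread across many bins instead of one dominant bin.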
Interface 2: Playing with Voices (Pitch and intonation)
Demo at: http://vimeo.com/33385700
1. Reader’s voice is recorded and sent to server for analysis.
2. Audio is compared to different recordings, closest match in pitch and intonation is returned.
3. “Doctor” character (the hand puppet) plays back the audio through embedded speaker.
=> Puppet mimics the reader’s character voices, encouraging silliness.
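Step 2 above, returning the closest match in pitch and intonation, can be sketched as a nearest-neighbor lookup. Everything specific here is an assumption: the file names, the feature pair (mean pitch and pitch standard deviation), the values, and the Euclidean distance are chosen for illustration, and the slides do not say how the match was actually computed.

```python
import math

# Hypothetical library of prerecorded puppet voices, each summarized as
# (mean pitch in Hz, pitch standard deviation in Hz). Values are made up.
VOICE_LIBRARY = {
    "low_flat.wav": (110.0, 10.0),
    "mid_lively.wav": (200.0, 45.0),
    "high_squeaky.wav": (380.0, 70.0),
}


def closest_voice(mean_pitch, pitch_sd, library=VOICE_LIBRARY):
    """Name of the library recording nearest to the reader's pitch features."""
    query = (mean_pitch, pitch_sd)
    return min(library, key=lambda name: math.dist(query, library[name]))


# A reader using a high, varied character voice.
match = closest_voice(350.0, 60.0)
```

In practice the two features would likely need scaling so that neither dominates the distance.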
“Playing with Voices” User Feedback
First tested with random voices for puppet playback to learn what people expected and what was engaging. Feedback from 5 users:
“Wanted to try the extremes – because the extreme voices are funny!”
“I liked it when it spoke with the same rhythm.”
“I wanted it to mimic me.”
“Turning the pages breaks the rhythm, but the pause before it speaks is right.”
Mimicry and extreme voices appealed, so they were built into the final application (see video).
Controlled Study Design
Research question: Do these interventions impact speech modulation and expressiveness?
n > 10. All participants read the same book.
Control: Read book once, prompt to read more expressively, read book again.
Group A: Read book, [study 1: exercises with no feedback], read book again. [OR study 2: read “no more monkeys” book with random voices]
Group B: Read book, [study 1: exercises with feedback], read book again. [OR study 2: read “no more monkeys” book with pitch-matched voices]
Secondary question: compare human evaluations (not expert only) to software evaluation and identify correlated measures.
References
Boersma, P. and Weenink, D. “Praat: doing phonetics by computer.” Version 4.4.16, 2006.
Camlot, J. et al. “The Victorianator.” 2011. http://ludicvoice.concordia.ca/?page_id=28
Hoque, M. E., Lane, J. K., el Kaliouby, R., Goodwin, M., and Picard, R. W. “Exploring Speech Therapy Games with Children on the Autism Spectrum.” Proceedings of InterSpeech, Brighton, UK, September 6-10, 2009.
Lewis, J. and Tsonis, F. “SenseText: Gesture Based Control of Text Visualization.” Proceedings of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation, Berder Island, France, May 18-20, 2005.
Rodenburg, P. The Actor Speaks: Voice and the Performer. New York, NY: St. Martin's Press, 2000.
Thank you!
Ehsan Hoque: speech analysis scripts, guidance, COUHES assistance
Bill Hoogterp: expert evaluation of audio data
Ryan McDermott: help with web server setup and XML parsing
Adam Setapen: read book for demo video
Cynthia Breazeal: help with COUHES approval
User study participants