MikeTalk: An Adaptive Man-Machine Interface
![Page 1: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/1.jpg)
MikeTalk: An Adaptive Man-Machine Interface
Tony Ezzat
Volker Blanz
Tomaso Poggio
![Page 2: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/2.jpg)
TTVS Overview
• Input: Text
• Output: Photo-realistic talking face uttering text
![Page 3: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/3.jpg)
Desktop Agents
![Page 4: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/4.jpg)
Desktop Agents
You have received 1 email from Tommy Poggio.
![Page 5: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/5.jpg)
Customer Support
![Page 6: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/6.jpg)
Customer Support
You have bought 20 shares of SONY at $40 each.
![Page 7: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/7.jpg)
Advertisements
![Page 8: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/8.jpg)
Advertisements
Hi Tony, would you be interested in a ticket from Boston to New York for $50.00?
![Page 9: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/9.jpg)
Modules
![Page 10: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/10.jpg)
Phoneme Corpus
Step 1:
– collect a visual corpus from a subject
– corpus contains 44 words
– one word for each American English phoneme
![Page 11: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/11.jpg)
6 Consonantal Visemes
Step 2:
– extract one image per phoneme: viseme
– group visemes together by visual similarity
![Page 12: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/12.jpg)
9 Vocalic Visemes (+ 1 Silence Viseme)
![Page 13: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/13.jpg)
Problem 1: Need to Interpolate!
![Page 14: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/14.jpg)
Solution: Morphing!
Simultaneous interpolation of shape & texture (Beier & Neely 1992).
Problem 2: Too tedious to specify correspondence by hand across many images!
![Page 15: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/15.jpg)
Solution: Optical Flow
• To interpolate between two visemes, optical flow is first computed
• A 2D motion vector field is produced:
dx(x,y) dy(x,y)
(Horn & Schunck 1986) (Lucas & Kanade 1988)
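As a rough illustration of how a dense field dx(x,y), dy(x,y) can be estimated, here is a minimal Horn & Schunck-style iteration in NumPy. It is a sketch only: the function names, the smoothness weight `smooth`, and the iteration count are assumptions for illustration, and the slides do not say which optical flow implementation MikeTalk actually uses.

```python
import numpy as np

def neighbor_mean(f):
    """Average of the 4 neighbours of each pixel, with edge replication."""
    p = np.pad(f, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0

def horn_schunck(a, b, smooth=1.0, iters=200):
    """Estimate a dense flow (dx, dy) from grayscale image a to image b.

    smooth and iters are illustrative defaults, not values from the slides.
    """
    a = a.astype(float)
    b = b.astype(float)
    # Spatial gradients of the average image; np.gradient returns the
    # derivative along rows (y) first, then along columns (x).
    Iy, Ix = np.gradient(0.5 * (a + b))
    It = b - a  # temporal derivative
    dx = np.zeros_like(a)
    dy = np.zeros_like(a)
    denom = smooth ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(iters):
        dxm, dym = neighbor_mean(dx), neighbor_mean(dy)
        # Residual of the brightness-constancy constraint Ix*dx + Iy*dy + It = 0
        r = (Ix * dxm + Iy * dym + It) / denom
        dx = dxm - Ix * r
        dy = dym - Iy * r
    return dx, dy
```

With a small smoothness weight the field follows the image gradients closely; a larger weight gives a smoother but more underestimated flow.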
![Page 16: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/16.jpg)
Morphing
• Forward warping A to B
• Forward warping B to A
• Blending
• Hole filling
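The four morphing steps above can be sketched roughly as follows, assuming grayscale images and precomputed flow fields in both directions. The function names, the nearest-pixel rounding, and the row-wise hole filling are simplifications chosen for illustration, not necessarily what MikeTalk does.

```python
import numpy as np

def forward_warp(img, dx, dy, alpha):
    """Push every pixel of img along alpha * (dx, dy).

    Returns the warped image and a mask of target pixels that were hit.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.rint(xs + alpha * dx).astype(int)
    yt = np.rint(ys + alpha * dy).astype(int)
    ok = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    out = np.zeros((h, w), dtype=float)
    hit = np.zeros((h, w), dtype=bool)
    out[yt[ok], xt[ok]] = img[ys[ok], xs[ok]]
    hit[yt[ok], xt[ok]] = True
    return out, hit

def fill_holes(img, hit):
    """Fill pixels that no source pixel landed on with the nearest hit pixel in the same row."""
    out = img.copy()
    for y in range(out.shape[0]):
        cols = np.flatnonzero(hit[y])
        if cols.size == 0:
            continue  # nothing to copy from in this row
        for x in np.flatnonzero(~hit[y]):
            out[y, x] = img[y, cols[np.argmin(np.abs(cols - x))]]
    return out

def morph(a, b, flow_ab, flow_ba, alpha):
    """Intermediate image between a (alpha = 0) and b (alpha = 1)."""
    wa, ha = forward_warp(a, flow_ab[0], flow_ab[1], alpha)        # warp A toward B
    wb, hb = forward_warp(b, flow_ba[0], flow_ba[1], 1.0 - alpha)  # warp B toward A
    wa, wb = fill_holes(wa, ha), fill_holes(wb, hb)
    return (1.0 - alpha) * wa + alpha * wb                         # cross-dissolve
```

Sweeping `alpha` from 0 to 1 yields the frames of one viseme transition.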
![Page 17: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/17.jpg)
Synthesis Database
• 16 visemes total (6 consonantal + 9 vocalic + 1 silence)
• 256 optical flow fields total (16 × 16), from every viseme to every other viseme
![Page 18: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/18.jpg)
Concatenation and Lip Sync
• Load the correct viseme transitions
• Concatenate viseme transitions
• Sample the viseme transitions using audio durations
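One way to picture the sampling step is the sketch below, which assigns each video frame a viseme transition and an interpolation parameter. It assumes, purely for illustration, that the transition from viseme k to viseme k+1 lasts exactly as long as the k-th audio duration; the function name, the millisecond units, and the frame rate are hypothetical and do not come from the slides.

```python
def sample_transitions(visemes, durations_ms, fps=25):
    """Return one (source viseme, target viseme, alpha) triple per video frame.

    visemes has one more entry than durations_ms: transition k goes from
    visemes[k] to visemes[k + 1] and lasts durations_ms[k] milliseconds.
    """
    # Start time of each transition, in milliseconds.
    starts = [0]
    for d in durations_ms:
        starts.append(starts[-1] + d)

    n_frames = starts[-1] * fps // 1000
    frames = []
    k = 0
    for f in range(n_frames):
        t = f * 1000 / fps  # timestamp of this frame in milliseconds
        # Advance to the transition that contains time t.
        while k < len(durations_ms) - 1 and t >= starts[k + 1]:
            k += 1
        alpha = (t - starts[k]) / durations_ms[k]
        frames.append((visemes[k], visemes[k + 1], alpha))
    return frames
```

For example, `sample_transitions(["sil", "k", "ae", "t"], [80, 120, 80])` yields seven frames at 25 fps, and each triple says which stored transition to load and how far along it to morph.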
![Page 19: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/19.jpg)
Examples
“1, 2, 3, 4, 5”
“cat, dog, pig, cow, moose, horse, sheep”
“you have received 10 email messages.”
![Page 20: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/20.jpg)
Current Work
• Coarticulation
• Eye + head movements
• Emotion
• 3D instead of 2D
• Psychophysics
![Page 21: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/21.jpg)
3D
With Volker Blanz
![Page 22: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/22.jpg)
The End
![Page 23: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/23.jpg)
Coarticulation
• Problem: Current method does not handle coarticulation, so speech looks overly articulated
• Could record all possible triphones/quadriphones, but this approach requires a lot of data!
• Best method is to learn a model for coarticulation, but what is the representation for the lips?
![Page 24: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/24.jpg)
Principal Components Analysis
• Each image is a vector in a high-dimensional space
• Using PCA, find the optimal set of vectors that span the space
• Project the entire corpus onto those basis vectors
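The three steps above can be sketched with NumPy's SVD as follows. The function name `pca_basis` and the array layout are assumptions made for this illustration.

```python
import numpy as np

def pca_basis(images, k):
    """PCA of a stack of images with shape (N, H, W).

    Returns the mean image (flattened), the top-k basis vectors (k, H*W),
    and the coefficients of every image in that basis (N, k).
    """
    # Each image becomes one row vector in a high-dimensional space.
    X = images.reshape(len(images), -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]
    # Project the whole corpus onto the basis.
    coeffs = (X - mean) @ basis.T
    return mean, basis, coeffs
```

An image is then approximated by `mean + coeffs[i] @ basis`, so each frame of the corpus reduces to `k` numbers.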
![Page 25: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/25.jpg)
Top 2 PCA Bases for /buut/
![Page 26: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/26.jpg)
Top 2 PCA Bases for /get/
Problem: Too nonlinear!
![Page 27: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/27.jpg)
Flow Component Analysis
• Compute optical flow from a reference lip image to all other images in the corpus
• Compute PCA on all the flows
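A minimal sketch of the same idea applied to flows rather than pixels: each flow field from the reference image is flattened into one vector, PCA is run on those vectors, and a new flow is synthesized from a few coefficients. The names `flow_pca` and `synthesize_flow` and the (N, H, W, 2) layout are assumptions for illustration.

```python
import numpy as np

def flow_pca(flows, k):
    """PCA on flow fields of shape (N, H, W, 2), holding (dx, dy) per pixel."""
    X = flows.reshape(flows.shape[0], -1).astype(float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]                  # top-k flow bases
    coeffs = (X - mean) @ basis.T   # low-dimensional mouth parameters
    return mean, basis, coeffs

def synthesize_flow(mean, basis, params, shape):
    """Rebuild a flow field of the given (H, W, 2) shape from k parameters."""
    return (mean + np.asarray(params) @ basis).reshape(shape)
```

A synthesized flow could then be used to warp the reference lip image, which is what makes the parameters a candidate representation for the lips.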
![Page 28: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/28.jpg)
Top 2 FPCA Bases for /buut/
![Page 29: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/29.jpg)
Top 2 FPCA Bases for /get/
Much more linear behavior!
![Page 30: MikeTalk:An Adaptive Man-Machine Interface](https://reader035.vdocuments.us/reader035/viewer/2022062519/568151fa550346895dc031ed/html5/thumbnails/30.jpg)
Current Work
• Now that we have parameterized the mouth, what is the model for mouth synthesis?
• How is that model fit to the PCA data?