face animation


1/57

Facial Animation

By: Shahzad Malik

CSC2529 Presentation, March 5, 2003


2/57

Motivation

Realistic human facial animation is a challenging problem (many degrees of freedom, complex deformations)

Would like expressive and plausible animations of a photorealistic 3D face

Useful for virtual characters in film and video games


3/57

Papers

Three major areas are Lip Syncing, Face Modeling, and Expression Synthesis

Video Rewrite (Bregler, SIGGRAPH 1997)

Making Faces (Guenter, SIGGRAPH 1998)

Expression Cloning (Noh, SIGGRAPH 2001)


4/57

Video Rewrite

Generate a new video of an actor mouthing a new utterance by piecing together old footage

Two stages:

Analysis

Synthesis


5/57

Analysis Stage

Given footage of the subject speaking, extract mouth position and lip shape

Hand label 26 training images:

34 points on mouth (20 outer boundary, 12 inner boundary, 1 at bottom of upper teeth, 1 at top of lower teeth)

20 points on chin and jaw line

Morph training set to get to 351 images


6/57

EigenPoints

Create EigenPoints models using this set

Use derived EigenPoints model to label features in all frames of the training video


7/57

EigenPoints (continued)

Problem: EigenPoints assumes features are undergoing pure translation


8/57

Face Warping

Before EigenPoints labeling, warp each image into a reference plane

Use a minimization algorithm to register images

$$E(M) = \sum_i \left[ I_F(\mathbf{x}'_i) - I_T(\mathbf{x}_i) \right]^2$$

$$\mathbf{x}' = M\mathbf{x}, \qquad M = \begin{bmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ m_6 & m_7 & m_8 \end{bmatrix}, \qquad \mathbf{x} = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
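The registration error for a candidate 3x3 warp can be sketched as below. This is only an illustration under stated assumptions: images are plain nested lists of grayscale values, sampling is nearest-neighbor, pixels that warp outside the face image are skipped, and the function names `warp_point` and `registration_error` are hypothetical. The minimization over the entries of M is not shown.

```python
def warp_point(M, x, y):
    """Map (x, y) through the 3x3 matrix M using homogeneous coordinates."""
    xh = M[0][0] * x + M[0][1] * y + M[0][2]
    yh = M[1][0] * x + M[1][1] * y + M[1][2]
    w = M[2][0] * x + M[2][1] * y + M[2][2]
    return xh / w, yh / w


def registration_error(M, face, template):
    """Sum of squared differences between template pixels and the face
    pixels they map to under M (nearest-neighbor sampling, an assumption)."""
    total = 0.0
    for y in range(len(template)):
        for x in range(len(template[0])):
            xp, yp = warp_point(M, x, y)
            xi, yi = int(round(xp)), int(round(yp))
            # Skip samples that fall outside the face image.
            if 0 <= yi < len(face) and 0 <= xi < len(face[0]):
                total += (face[yi][xi] - template[y][x]) ** 2
    return total
```

A registration routine would search for the M that makes this error smallest; any general-purpose optimizer could play that role.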


9/57

Face Warping (continued)

Use rigid parts of face to estimate warp $M$

Warp face by $M^{-1}$

Perform EigenPoints analysis

Back-project features by $M$ onto face


10/57

Audio Analysis

Want to capture visual dynamics of speech

Phonemes are not enough

Consider coarticulation

Lip shapes for many phonemes are modified based on the phoneme's context (e.g. /T/ in beet vs. /T/ in boot)


11/57

Audio Analysis (continued)

Segment speech into triphones

E.g. teapot becomes /SIL-T-IY/, /T-IY-P/, /IY-P-AA/, /P-AA-T/, and /AA-T-SIL/

Emphasize middle of each triphone

Effectively captures forward and backward coarticulation
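The triphone segmentation can be illustrated with a few lines of Python. This is a minimal sketch: the function name `to_triphones` is hypothetical, and the phoneme sequence for "teapot" is the one given on the slide.

```python
def to_triphones(phonemes, silence="SIL"):
    """Pad a phoneme sequence with silence and slide a 3-wide window over it."""
    padded = [silence] + list(phonemes) + [silence]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


teapot = to_triphones(["T", "IY", "P", "AA", "T"])
# [('SIL', 'T', 'IY'), ('T', 'IY', 'P'), ('IY', 'P', 'AA'),
#  ('P', 'AA', 'T'), ('AA', 'T', 'SIL')]
```

Each phoneme appears as the center of exactly one triphone, with its left and right neighbors carrying the coarticulation context.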


12/57

Audio Analysis (continued)

Training footage audio is labeled with phonemes and associated timing

Use gender-specific HMMs for segmentation

Convert transcript into triphones


13/57

Synthesis Stage

Given some new speech utterance

Mark it with phoneme labels

Determine triphones

Find a video example with the desired transition in database

Compute a matching distance to each triphone:

$$\text{error} = \alpha D_p + (1 - \alpha) D_s$$


14/57

Viseme Classes

Cluster phonemes into viseme classes

Use 26 viseme classes (10 consonant, 15 vowel, 1 silence):

(1) /CH/, /JH/, /SH/, /ZH/

(2) /K/, /G/, /N/, /L/

...

(25) /IH/, /AE/, /AH/

(26) /SIL/


15/57

Phoneme Context Distance

Dp is the phoneme context distance

Distance is 0 if phonemic categories are the same (e.g. /P/ and /P/)

Distance is 1 if viseme classes are different (e.g. /P/ and /IY/)

Distance is between 0 and 1 if different phonemic classes but same viseme class (e.g. /P/ and /B/)

Compute for the entire triphone

Weight the center phoneme most
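The three distance rules can be sketched as follows. Several things here are assumptions for illustration only: the viseme table is partial (it holds the classes listed on the slides, plus placeholder class ids 100 and 101 for /P/, /B/ and /IY/), the same-viseme distance of 0.5 is an arbitrary value between 0 and 1, and the per-position weights are made up so that the center phoneme counts most.

```python
# Partial, hypothetical viseme table. Classes 1, 2, 25, 26 follow the slides;
# ids 100 and 101 are placeholders, not the real class numbers.
VISEME_CLASS = {
    "CH": 1, "JH": 1, "SH": 1, "ZH": 1,
    "K": 2, "G": 2, "N": 2, "L": 2,
    "IH": 25, "AE": 25, "AH": 25,
    "SIL": 26,
    "P": 100, "B": 100,
    "IY": 101,
}


def phoneme_distance(a, b, same_viseme=0.5):
    """0 for identical phonemes, `same_viseme` within a class, 1 otherwise."""
    if a == b:
        return 0.0
    class_a, class_b = VISEME_CLASS.get(a), VISEME_CLASS.get(b)
    if class_a is not None and class_a == class_b:
        return same_viseme
    return 1.0


def context_distance(tri_a, tri_b, weights=(0.25, 0.5, 0.25)):
    """Weighted sum over the three positions; the center weight is largest."""
    return sum(w * phoneme_distance(a, b)
               for w, a, b in zip(weights, tri_a, tri_b))
```

With these assumed weights, two triphones that differ only in their outer phonemes score lower than two that differ in the center.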


16/57

Lip Shape Distance

Ds is the distance between lip shapes in overlapping triphones

E.g. for teapot, contours for /IY/ and /P/ should match between /T-IY-P/ and /IY-P-AA/

Compute Euclidean distance between 4-element vectors (lip width, lip height, inner lip height, height of visible teeth)

Solution depends on neighbors in both directions (use dynamic programming)
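Because each choice affects the lip-shape cost of both its neighbors, the selection is a shortest-path problem over candidate triphone videos. The sketch below is a simplified illustration, not the paper's implementation: each candidate is assumed to be a tuple `(dp, lips_in, lips_out)` holding its phoneme context distance and a single 4-element lip vector at each end, and the weight `alpha` combines the two distances as a weighted sum. All names are hypothetical.

```python
import math


def lip_distance(a, b):
    """Euclidean distance between two 4-element lip-shape vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_triphones(candidates, alpha=0.5):
    """candidates[t] lists (dp, lips_in, lips_out) options for position t.
    Returns (total_error, chosen_indices) for the cheapest chain."""
    # best[j] = (cost, path) of the cheapest chain ending in candidate j.
    best = [(alpha * dp, [j]) for j, (dp, _, _) in enumerate(candidates[0])]
    for t in range(1, len(candidates)):
        new_best = []
        for j, (dp, lips_in, _) in enumerate(candidates[t]):
            options = []
            for i, (cost, path) in enumerate(best):
                prev_out = candidates[t - 1][i][2]
                ds = lip_distance(prev_out, lips_in)
                options.append((cost + (1 - alpha) * ds, path))
            cost, path = min(options, key=lambda o: o[0])
            new_best.append((cost + alpha * dp, path + [j]))
        best = new_best
    return min(best, key=lambda b: b[0])
```

The dynamic program keeps one best partial chain per candidate, so the cost grows with the square of the number of candidates per position rather than exponentially in the utterance length.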


17/57

Time Alignment of Triphone Videos

Need to combine triphone videos

Choose portion of overlapping triphones where lip shapes are as close as possible

Already done when computing Ds


18/57

Time Alignment to Utterance

Still need to time align with target audio

Compare corresponding phoneme transcripts

Start time of center phoneme in triphone is aligned with label in target transcript

Video is then stretched/compressed to fit time needed between target phoneme boundaries
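Stretching or compressing a clip to a target duration amounts to resampling its frame indices. A minimal sketch, assuming a purely linear time map and nearest-frame selection (the slides do not say how frames are resampled); `retime_frames` is a hypothetical name.

```python
def retime_frames(num_source_frames, num_target_frames):
    """Return, for each output frame, the index of the source frame to show
    under a linear stretch/compression of the clip."""
    if num_source_frames < 1 or num_target_frames < 1:
        raise ValueError("frame counts must be positive")
    if num_target_frames == 1:
        return [0]
    scale = (num_source_frames - 1) / (num_target_frames - 1)
    # Adding 0.5 before truncation rounds to the nearest source frame.
    return [int(i * scale + 0.5) for i in range(num_target_frames)]
```

For instance, `retime_frames(3, 5)` repeats some frames to stretch a 3-frame clip over 5 output frames, while `retime_frames(5, 3)` drops frames to compress it.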


19/57

Combining Lips and Background

Need to stitch new mouth movie into the original background face sequence

Compute transform $M$ as before

Warping replacement mask defines mouth and background portions in final video

[Figure: mouth mask and background mask]


20/57

Combining Lips and Background

Mouth shape comes from triphone image, and is warped using $M$

Jaw shape is combination of background jaw and triphone jaw lines

Near ears, jaw depends on background; near chin, jaw depends on mouth

Illumination matching is used to avoid seams between mouth and background


21/57

Video Rewrite Results

Video: 8 minutes of video, 109 sentences

Training data: front-facing segments of video, around 1700 triphones

Emily sequences


22/57

Video Rewrite Results

2 minutes of video, 1157 triphones

JFK sequences


23/57

Video Rewrite

Image-based facial animation system

Driven by audio

Output sequence created from real video

Allows natural facial movements (eye blinks, head motions)


24/57

Making Faces

Allows capturing facial expressions in 3D from a video sequence

Provides a 3D model and texture that can be rendered on 3D hardware


25/57

Data Capture

Actor's face digitized using a Cyberware scanner to get a base 3D mesh

Six calibrated video cameras capture actor's expressions

Six camera views


26/57

Data Capture

182 dots are glued to actor's face

Each dot is one of six colors with fluorescent pigment

Dots of same color are placed as far apart as possible

Dots follow the contours of the face (eyes, lips, nasolabial furrows, etc.)


27/57

Dot Labeling

Each dot needs a unique label

Dots will be used to warp the 3D mesh

Also used later for texture generation from the six views

For each frame in each camera:

Classify each pixel as belonging to one of six categories

Find connected components

Compute the centroid
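The per-frame steps (classified pixels, connected components, centroids) can be sketched on a small label grid. This assumes the pixel classification has already happened and produced an integer grid where 0 means "not a dot" and 1 to 6 are the color categories; 4-connectivity and the name `dot_centroids` are also assumptions.

```python
from collections import deque


def dot_centroids(labels):
    """Find 4-connected regions of equal nonzero label and return a list of
    (color, (cx, cy)) centroids in scan order."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    dots = []
    for y in range(h):
        for x in range(w):
            color = labels[y][x]
            if color == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill over same-colored neighbors.
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                px, py = queue.popleft()
                pixels.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py),
                               (px, py + 1), (px, py - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx]
                            and labels[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            cx = sum(p[0] for p in pixels) / len(pixels)
            cy = sum(p[1] for p in pixels) / len(pixels)
            dots.append((color, (cx, cy)))
    return dots
```

The centroid of each component serves as the 2D dot position used in the later correspondence steps.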


28/57

Dot Labeling (continued)

Need to compute dot correspondences between camera views

Must handle occlusions, false matches

Compute all point correspondences between $k$ cameras and $n$ 2D dots

$$\binom{k}{2} n^2 \text{ point correspondences}$$


29/57


For each correspondence

Triangulate a 3D point based on closest intersection of rays cast through 2D dots

Check if back-projection error is above some threshold

All 3D candidates below threshold are stored
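Two rays cast through matching 2D dots rarely intersect exactly, so the "closest intersection" can be taken as the midpoint of the shortest segment joining them. The sketch below makes some simplifying assumptions: the rays are treated as infinite lines, camera calibration is assumed to have already produced ray origins and directions, and the gap between the rays stands in for the back-projection test. The function name is hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def closest_point_between_rays(o1, d1, o2, d2, eps=1e-12):
    """Return (midpoint, gap) for lines o1 + s*d1 and o2 + t*d2,
    or None when the directions are (nearly) parallel."""
    w = [a - b for a, b in zip(o1, o2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    # Parameters of the mutually closest points on each line.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * v for o, v in zip(o1, d1)]
    p2 = [o + t * v for o, v in zip(o2, d2)]
    midpoint = tuple((x + y) / 2 for x, y in zip(p1, p2))
    return midpoint, math.dist(p1, p2)
```

A candidate would be kept only when the returned gap (or a proper reprojection error) falls below the chosen threshold.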


30/57


Project stored 3D points into a reference view

Keep points that are within 2 pixels of dots in reference view

These points are potential 3D matches for a given 2D dot

Compute average as final 3D position

Assign to 2D dot in reference view


31/57


Need to assign consistent labels to 3D dot locations across entire sequence

Define a reference set of dots $D$ (frame 0)

Let $d_j \in D$ be the neutral location for dot $j$

Position of $d_j$ at frame $i$ is $d_j^i = d_j + v_j^i$

For each reference dot, find the closest 3D dot of same color within some distance


32/57

Moving the Dots

Move reference dot to matched location

For unmatched reference dot $d_k$, let $n_k$ be the set of neighbor dots with a match in current frame $i$

$$v_k^i = \frac{1}{|n_k|} \sum_{d_j^i \in n_k} v_j^i$$
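Matching reference dots to a frame and filling in the unmatched ones can be sketched as below. It is a simplified illustration: matching is a plain nearest-neighbor search per reference dot (no conflict resolution when two reference dots pick the same observed dot), neighbor lists are supplied by the caller, and the names `match_offsets` and `fill_unmatched` are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def match_offsets(reference, observed, max_dist):
    """reference/observed: lists of (color, (x, y, z)).
    Returns one offset per reference dot, or None when no dot of the same
    color lies within max_dist of its neutral position."""
    offsets = []
    for color, d in reference:
        same = [p for c, p in observed if c == color]
        best = min(same, key=lambda p: math.dist(p, d), default=None)
        if best is not None and math.dist(best, d) <= max_dist:
            offsets.append(tuple(b - a for a, b in zip(d, best)))
        else:
            offsets.append(None)
    return offsets


def fill_unmatched(offsets, neighbors):
    """Give each unmatched dot the mean offset of its matched neighbors.
    neighbors[k] lists the indices of dots adjacent to dot k."""
    filled = list(offsets)
    for k, v in enumerate(offsets):
        if v is None:
            matched = [offsets[j] for j in neighbors[k]
                       if offsets[j] is not None]
            if matched:
                filled[k] = tuple(sum(m[c] for m in matched) / len(matched)
                                  for c in range(3))
    return filled
```

A dot with no matched neighbors stays `None` in this sketch; a fuller version would need a fallback for that case.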


33/57

Constructing the Mesh

Cyberware scan has problems:

Fluorescent markers cause bumps on mesh

No mouth opening

Too many polygons

Bumps removed manually

Split mouth polygons, add teeth and tongue polygons

Run mesh simplification algorithm (Hoppe's algorithm: 460k to 4800 polys)


34/57

Moving the Mesh

Move vertices by linear combination of offsets of nearest dots

$$p_j^i = p_j + \sum_k \alpha_k^j \left( d_k^i - d_k \right)$$

where

$$\sum_{d_k \in D} \alpha_k^j = 1$$
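Moving a vertex by a weighted combination of dot offsets is a short loop. In this sketch the blend weights for one vertex are assumed to be a dictionary from dot index to coefficient (summing to 1), and `move_vertex` is a hypothetical name.

```python
def move_vertex(p, weights, dots_neutral, dots_now):
    """Displace vertex p by the weighted sum of (current - neutral) dot
    positions. weights maps dot index -> blend coefficient."""
    offset = [0.0, 0.0, 0.0]
    for k, alpha in weights.items():
        for c in range(3):
            offset[c] += alpha * (dots_now[k][c] - dots_neutral[k][c])
    return tuple(pc + oc for pc, oc in zip(p, offset))


moved = move_vertex(
    (0, 0, 0),
    {0: 0.5, 1: 0.5},
    dots_neutral=[(1, 0, 0), (0, 1, 0)],
    dots_now=[(1, 2, 0), (0, 1, 4)],
)
# moved == (0.0, 1.0, 2.0)
```

Because only dots with nonzero coefficients appear in the dictionary, each vertex is influenced by a handful of nearby dots rather than all of them.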


35/57

Assigning Blend Coefficients

Assign blend coefficients for a grid of 1400 evenly distributed points on face


36/57

Assigning Blend Coefficients

Label each dot, vertex, grid point as above, below, or neither

Find 2 closest dots to each grid point $p$

$D_n$ is the set of dots within $1.8\,(d_1 + d_2)/2$ of $p$, where $d_1$ and $d_2$ are the distances to the two closest dots

Remove points in relatively same direction

Assign blend values based on distance from $p$


37/57

Assign Blend Coefficients (cont.)

If dot not in $D_n$, then $\alpha$ is 0

If dot in $D_n$:

$$\text{let } l_i = \frac{1.0}{\| d_i - p \|}, \quad \text{then } \alpha_i = \frac{l_i}{\sum_{d_i \in D_n} l_i}$$

For vertices, find closest grid points

Copy blend coefficients
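The coefficient rule for one grid point is normalized inverse-distance weighting over a local set of dots. The sketch below covers only part of the procedure: it omits the above/below labeling and the removal of dots lying in roughly the same direction, treats "within" as inclusive, and gives a dot that coincides with the grid point the full weight to avoid dividing by zero. Those choices and the name `blend_coefficients` are assumptions.

```python
import math


def blend_coefficients(p, dots):
    """Return one coefficient per dot for grid point p; they sum to 1."""
    if len(dots) < 2:
        raise ValueError("need at least two dots")
    dists = [math.dist(p, d) for d in dots]
    # A dot exactly at p takes all the weight (assumed tie-break).
    for i, dist in enumerate(dists):
        if dist == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(dots))]
    d1, d2 = sorted(dists)[:2]
    radius = 1.8 * (d1 + d2) / 2
    # l_i = 1 / distance for dots inside the radius, 0 outside.
    inv = [1.0 / dist if dist <= radius else 0.0 for dist in dists]
    total = sum(inv)
    return [l / total for l in inv]
```

The closest dot always lies inside the radius, so the normalizing sum is never zero.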


38/57

Dot Removal

Substitute skin color for dot colors

First low-pass filter the image

Directional filter prevents color bleeding

Black = symmetric, White = directional

[Figure: face with dots; low-pass mask]


39/57

Dot Removal (continued)

Extract rectangular patch of dot-free skin

High-pass filter this patch

Register patch to center of dot regions

Blend it with the low-frequency skin

Clamp hue values to narrow range


40/57

Dot Removal (continued)

[Figure: original; low-pass; high-pass; hue clamped]


41/57

Texture Generation

Texture map generated for each frame

Project mesh onto cylinder

Compute mesh location $(k, \beta_1, \beta_2)$ for each texel $(u, v)$

For each camera, transform mesh into view

For each texel, get 3D coordinates in mesh

Project 3D point to camera plane

Get color at $(x, y)$ and store as texel color


42/57

Texture Generation (continued)

Compute a texel weight (dot product between texel normal on mesh and direction to camera)

Merge the texture maps from all the cameras based on the weight map
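The weighting and merge for a single texel can be sketched as follows. Two details are assumptions rather than something the slides state: back-facing texels (negative dot product) are clamped to weight 0, and the merge is a normalized weighted average of the per-camera colors. Function names are hypothetical.

```python
import math


def texel_weight(normal, texel_pos, camera_pos):
    """Cosine of the angle between the surface normal and the direction
    from the texel to the camera, clamped at 0 for back-facing texels."""
    to_cam = [c - t for c, t in zip(camera_pos, texel_pos)]
    cam_len = math.sqrt(sum(v * v for v in to_cam))
    n_len = math.sqrt(sum(v * v for v in normal))
    cos = sum(n * v for n, v in zip(normal, to_cam)) / (cam_len * n_len)
    return max(cos, 0.0)


def merge_texel(colors, weights):
    """Weighted average of per-camera RGB colors; None if no camera sees it."""
    total = sum(weights)
    if total == 0:
        return None
    return tuple(sum(w * c[ch] for w, c in zip(weights, colors)) / total
                 for ch in range(3))
```

Cameras that view the surface head-on dominate the result, while grazing views contribute little.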


43/57

Results


44/57

Making Faces Summary

3D geometry and texture of a face

Data generated for each frame of video

Shading and highlights glued to face

Nice results, but not totally automated

Need to repeat entire process for every new face we want to animate


45/57

Expression Cloning

Allows facial expressions to be mapped from one model to another

[Figure: source model, animation, target model]


46/57

Expression Cloning Outline

[Diagram: the source model and target model feed dense surface correspondences, which are used to deform the source; motion capture data (or any animation mechanism) gives the source animation, whose vertex displacements go through motion transfer to yield cloned expressions as the target animation]


47/57

Source Animation Creation

Use any existing facial animation method

E.g. Making Faces paper described earlier

[Figure: motion capture data, source model, source animation]


48/57

Dense Surface Correspondence

Manually select 15-35 correspondences

Morph the source model using RBFs

Perform cylindrical projection (ray through source vertex, into target mesh)

Compute barycentric coords of intersection

[Figure: initial features; after RBF; after projection]
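Once a ray hits a triangle, the barycentric coordinates of the hit point record where it lies relative to the triangle's three vertices. The sketch below assumes the intersection point is already known and lies in the triangle's plane; the ray casting itself is not shown, and the function names are hypothetical.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of p in triangle abc, with
    p = u*a + v*b + w*c and u + v + w = 1."""
    v0, v1, v2 = _sub(b, a), _sub(c, a), _sub(p, a)
    d00, d01, d11 = _dot(v0, v0), _dot(v0, v1), _dot(v1, v1)
    d20, d21 = _dot(v2, v0), _dot(v2, v1)
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return 1.0 - v - w, v, w


def interpolate(values, coords):
    """Blend three per-vertex 3D values with barycentric weights."""
    return tuple(sum(wt * val[i] for wt, val in zip(coords, values))
                 for i in range(3))
```

The same weights can blend any per-vertex quantity on that triangle, such as positions or displacement vectors.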


49/57

Automatic Feature Selection

Can also automate initial correspondences

Use basic facts about human geometry:

Tip of Nose = point with highest Z value

Top of Head = point with highest Y value

Currently use around 15 such heuristic rules
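The two rules named above translate directly into a search over vertex coordinates. This sketch assumes the model is oriented with Y pointing up and the face looking along +Z, and covers only these two rules; the remaining heuristics are not listed on the slide. `find_features` and the dictionary keys are hypothetical names.

```python
def find_features(vertices):
    """Return vertex indices for the two heuristic features on the slide.
    vertices is a list of (x, y, z) tuples."""
    indices = range(len(vertices))
    return {
        "nose_tip": max(indices, key=lambda i: vertices[i][2]),  # highest Z
        "head_top": max(indices, key=lambda i: vertices[i][1]),  # highest Y
    }


head = [(0, 0, 0), (0, 1, 3), (0, 5, 1), (1, 2, 2)]
features = find_features(head)
# features == {"nose_tip": 1, "head_top": 2}
```

Running the same rules on both source and target yields index pairs that can seed the correspondence step without manual selection.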


50/57

Example Deformations

[Figure: source, deformed source, target]

Deformed source closely approximates the target models


51/57

Animation with Motion Vectors

Animate by displacing target vertex by motion of corresponding source point

Interpolate barycentric coordinates of target vertices based on source vertices

Need to project target model onto source model (opposite of what we did before)


52/57

Motion Vector Transfer

Need to adjust direction and magnitude

[Figure: source and target models, showing the source motion vector and the proper target motion vector]


53/57

Motion Vector Transfer (cont.)

Attach local coordinate system for each vertex in source and deformed source

X-axis = average normal

Y-axis = projection of any adjacent edge onto plane with X-axis as normal

Z-axis = cross product of X and Y

[Figure: local X, Y, Z axes at a vertex on the source, deformed source, and target, with the motion vector m]


54/57

Motion Vector Transfer

Compute transformation between two coordinate systems

Mapping determines the deformed source model motion vectors
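Building a local frame at a vertex and re-expressing a motion vector in another frame can be sketched as below. This illustrates the direction adjustment only: the `scale` parameter is a hypothetical stand-in for the magnitude adjustment, whose computation the slides do not give, and the function names are made up for this example.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    length = math.sqrt(_dot(v, v))
    return tuple(x / length for x in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def local_frame(avg_normal, edge):
    """X = unit average normal, Y = edge projected onto the plane
    perpendicular to X, Z = X cross Y."""
    x = _normalize(avg_normal)
    d = _dot(edge, x)
    y = _normalize(tuple(e - d * xc for e, xc in zip(edge, x)))
    return x, y, _cross(x, y)


def transfer_motion(m, src_frame, dst_frame, scale=1.0):
    """Read m's components in the source frame, then rebuild the vector
    from the destination frame's axes (a rotation between the frames)."""
    coords = [_dot(m, axis) for axis in src_frame]
    return tuple(scale * sum(c * axis[i] for c, axis in zip(coords, dst_frame))
                 for i in range(3))
```

A motion along the source vertex's normal ends up along the normal of the corresponding vertex in the other frame, which is the kind of direction change the transfer needs.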


55/57

Example Motion Transfer

[Figure: source and target models with their motion vectors; the adjusted target motions are smaller and more horizontal]


56/57

Results

[Figure: source and targets for three expressions: angry expression, distorted mouth, big open mouth]


57/57

Summary

Expression Cloning can animate new models using a library of existing expressions

Transfers motion vectors from source animations to target models

Process is fast and can be fully automated
