Volume x (xxxx), Number x pp. 1–xxx COMPUTER GRAPHICS forum

Exploring Nonlinear Relationship of Blendshape Facial Animation

Xuecheng Liu 1,2, Shihong Xia 1, Yiwen Fan 1,2 and Zhaoqi Wang 1

1 Institute of Computing Technology, Chinese Academy of Sciences
2 Graduate School of the Chinese Academy of Sciences

Abstract
The human face is a complex biomechanical system, and nonlinearity is a remarkable feature of facial expressions. In blendshape animation, however, the facial expression space is linearized by assuming a linear relationship between blending weights and deformed face geometry. This results in a loss of realism in facial animation. To synthesize more realistic facial animation, the aforementioned relationship should be nonlinear, allowing the greatest generality and fidelity of facial expressions. Unfortunately, few existing works address how to measure this nonlinear relationship. In this paper, we propose an optimization scheme that automatically explores the nonlinear relationship of blendshape facial animation from captured facial expressions. Experiments show that the explored nonlinear relationship is soundly consistent with the nonlinearity of facial expressions and is able to synthesize more realistic facial animation than the linear one.

Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Animation

1. Introduction

Creating realistic facial animation is one of the greatest challenges in computer graphics. The human face is a complex biomechanical system, so it is very hard to simulate facial motion accurately. Furthermore, humans are so sensitive to facial expressions that even a tiny flaw in a facial animation can hardly escape our attention. Although realistic facial animation synthesis has come a long way and many useful methods and tools have been developed, there is still room for improvement.

In both research and industry, the blendshape method is widely used to synthesize facial animation due to its efficiency and intuitiveness. In blendshape animation, the space of potential faces is a linear subspace defined by a set of key shapes, which are linearly blended to synthesize facial expressions:

E = \sum_{i=1}^{N_{bs}} w_i e_i, \quad 0 \le w_i \le 1    (1)

In the above equation, the facial expression E is expressed as a vector of vertex-coordinate increments relative to those of the neutral expression model, and likewise for each key shape e_i. N_{bs} is the number of key shapes and w_i is a blending weight. To avoid meaningless facial expressions, the blending weights are generally restricted to the closed interval [0, 1].
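As a concrete illustration, the linear blend of Equation (1) is a single matrix product. The following is a minimal sketch in Python/NumPy, assuming the key shapes are stacked into an (N_bs, D) array of vertex-coordinate increments; the names here are illustrative, not from the paper:

```python
import numpy as np

def blend_linear(weights, key_shapes, neutral):
    """Equation (1): facial expression E as a weighted sum of key shapes.

    weights:    length-N_bs array of blending weights in [0, 1].
    key_shapes: (N_bs, D) vertex-coordinate increments relative to neutral.
    neutral:    length-D flattened vertex coordinates of the neutral face.
    Returns the deformed face, i.e. neutral + E.
    """
    w = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)  # enforce 0 <= w_i <= 1
    return neutral + w @ key_shapes
```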

There is a drawback in synthesizing realistic facial animation through linear blending of key shapes. The human face is a complex biomechanical system composed of skeleton, muscles, flesh, skin and so on; therefore, nonlinearity is a remarkable feature of facial expressions. However, this feature is ignored by the blendshape method, which inevitably results in a loss of realism. Taking jaw motion as an example: while gradually adjusting the blending weight of the key shape "Mouth Opening", the vertex on the chin should, in reality, move along an approximate arc centered at the temporomandibular joint (Figure 1a). In blendshape facial animation, however, the orbit is linearized and poorly simplified (Figure 1b). Another example is "Eyes Closing": the trail of eyelid motion, which should follow the contour of the cornea (Figure 13a), is simplified to a line segment (Figure 13b) in blendshape facial animation.

Figure 1: The nonlinearity of jaw motion. (a) The arc orbit (red curve) of a vertex on the chin during "Mouth Opening". (b) In blendshape facial animation, the orbit is linearly simplified (blue line). (c) By adopting appropriate nonlinear relationship functions (green curves), the orbit can be well reconstructed.

As proposed by Pighin and Lewis, the relationship between blending weights and deformed face geometry should be nonlinear to allow the greatest generality and fidelity of facial expressions [PL06]. However, this relationship is linearized as w_i e_i in blendshape facial animation. By adopting nonlinear relationship functions f_i(w_i) that are consistent with the nonlinearity of facial expressions, more realistic facial animation can be synthesized (Equation 2). Taking jaw motion as an example, the orbit of the chin vertex can be well reconstructed by setting f_i(w_i) to appropriate trigonometric functions (Figure 1c). Nonlinear relationship functions likewise apply to the case of "Eyes Closing" (Figure 13c).

E = \sum_{i=1}^{N_{bs}} f_i(w_i), \quad 0 \le w_i \le 1    (2)
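Equation (2) only changes the per-shape contribution from w_i e_i to f_i(w_i). Anticipating the cubic parameterization chosen later in Section 4.1, a hedged sketch follows; the coefficient arrays A, B, C are illustrative names:

```python
import numpy as np

def blend_nonlinear(weights, A, B, C, neutral):
    """Equation (2) with cubic f_i: f_i(w)_j = A[i,j]*w**3 + B[i,j]*w**2 + C[i,j]*w.

    A, B, C: (N_bs, D) coefficient arrays. The linear model of Equation (1)
    is recovered with A = B = 0 and C = key_shapes.
    """
    w = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)
    return neutral + (w**3) @ A + (w**2) @ B + w @ C
```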

Unfortunately, to our knowledge, few researchers have discussed the topic of exploring the nonlinear relationship of blendshape facial animation. In general, given an arbitrary facial model, it is very difficult to construct the nonlinear relationship functions f_i(w_i) due to the complex biomechanics of the human face. A potential solution is physical simulation of an anatomical face model, such as the techniques of [KHS01] and [TSIF05]. However, it is not easy to construct an accurate anatomy for a given face. Moreover, how to map blending weights to facial muscle contractions remains unexplored. Therefore, physics-based simulation is not suitable for exploring the nonlinear relationship functions.

In this paper, we propose an optimization scheme that automatically explores the nonlinear relationship between blending weights and deformed face geometry in blendshape facial animation. The essence of the proposed optimization is to search for the nonlinear relationship functions that match captured facial expressions.

Through our scheme, the explored nonlinear relationship is consistent with the nonlinearity of facial expressions and is able to synthesize more realistic facial animation than the linear one. In the experiments, we first show the nonlinearity of facial expressions brought by the explored nonlinear relationship functions. Then, we compare the facial animations synthesized by linear relationship functions against those synthesized by the explored nonlinear ones; the results are assessed through both a user study and quantitative evaluation. Finally, we discuss the computational efficiency of the algorithms presented in this paper.

The remainder of this paper is organized as follows: Section 2 reviews related work on blendshape facial animation. Section 3 describes facial motion capture. Section 4 illustrates the optimization scheme in detail. Section 5 presents the experimental results. Finally, we conclude and discuss our work in Section 6.

2. Related work

Since blendshape facial animation was first proposed in Parke's pioneering work [Par72] [Par74], it has been widely used in both the research and industry domains of facial animation due to its intuitiveness and convenience. In this section, we review recent work related to blendshape facial animation.

Principal component analysis (PCA) based methods construct the basic shapes of facial animation through dimension reduction of facial expression samples. Their primary advantage is the rigid orthogonality of the constructed space; the disadvantage is that principal components lack visual intuition and are not suitable for manual manipulation. Addressing this problem, Chuang designed three schemes that select key shapes from facial expression samples based on PCA results [CDB02], and Li used region-based PCA to automatically construct a local orthogonal space from motion capture data [LD08].

Since the facial action coding system (FACS) was first proposed by Ekman [EF77], it has been popular in facial animation. According to FACS, facial expressions can be represented as combinations of distinct Action Units, each of which intuitively corresponds to a basic shape of facial animation. In specific applications, reduced or modified versions of FACS also animate faces well with improved usability [CK01] [Hav06] [Sag06] [LMX∗08].

In light of the importance of facial animation, MPEG-4 has specified a standard for facial animation synthesis for network transmission [PF03]. In essence, MPEG-4 defines a piecewise linear blendshape facial animation. MPEG-4 defines 68 facial animation parameters (FAPs): FAP1 and FAP2 describe fourteen static visemes and six basic facial expressions respectively, and the other 66 FAPs define blending weights that enable the synthesis of arbitrary facial expressions. In MPEG-4 facial animation, a vital issue is how to construct the facial animation table (FAT) that defines the rules of facial motion; Kshirsagar, Fratarcangeli and Jiang have all made efforts to explore the FAT [KGMT01] [FSF07] [DZZW02].

Other works have also strived to construct linear spaces of facial expressions. For example, Joshi proposed an automatic, physically-motivated scheme that segments blendshapes into smaller regions [JTDP03]. Cao and Shin used independent component analysis (ICA) to extract sets of meaningful parameters that are as independent of each other as possible [CFP03] [SL09].

3. Facial motion capture

A passive optical motion capture system is employed to capture facial motion. In this system, twelve infrared cameras record the coordinates of optical markers attached to the actor's face, as shown in Figure 2.

Figure 2: Illustration of facial motion capture. The left image shows the infrared camera arrangement. The right one shows the optical marker distribution on the actor's face; the smaller markers are used to capture facial expressions, and the four larger ones serve as references for computing the transformation matrix of the head.

We process the captured facial expressions as follows. First, we transfer the marker coordinates from the world coordinate system to the facial local system: taking the larger optical markers, which are stationary relative to the actor's head, as references, we compute the transformation matrix of the head using an absolute orientation algorithm [Ume91]. After that, because the facial model differs from the actor's face, we retarget the captured expressions to the facial model using [NN01] [FSF07]. Through this retargeting, expressions captured from one actor can be used to explore the nonlinearity of different facial models, which makes the proposed method more widely applicable. Finally, the neutral expression model is subtracted from each captured facial expression to obtain the marker-coordinate increments. For ease of presentation, in the remainder of the paper we refer to the data processed as above as the captured facial expressions.
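The absolute orientation problem used for head stabilization has a standard closed-form SVD solution. Below is a sketch in the spirit of [Ume91], assuming the four large reference markers as input; the scale factor is omitted since the head is rigid:

```python
import numpy as np

def absolute_orientation(src, dst):
    """Least-squares rigid transform (R, t) such that dst[i] ≈ R @ src[i] + t.

    src, dst: (N, 3) corresponding marker positions, N >= 3.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

Applying the inverse of each frame's head transform to all markers moves them into the facial local system; subtracting the neutral pose then yields the increments described above.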

In essence, the scheme proposed in this paper is independent of the specific data form of the captured facial expressions: coordinates of sparse markers, point clouds, and retargeted or copied expressions are all applicable.

Each captured facial expression is represented as the following row vector:

M_k = (M_{k,1,x}, M_{k,1,y}, M_{k,1,z}, \cdots, M_{k,N_m,x}, M_{k,N_m,y}, M_{k,N_m,z})

Here M_k is composed of marker-coordinate increments, k is the captured expression index, and N_m is the number of markers. For ease of illustration, we rewrite the above vector as

M_k = (M_{k,1}, M_{k,2}, \cdots, M_{k,j}, \cdots, M_{k,N_m\times 3})

where j indicates the jth component of the vector and N_m \times 3 is the vector dimension.

We further arrange the captured facial expressions into the following matrix:

M = \begin{pmatrix}
M_{1,1} & M_{1,2} & \cdots & M_{1,N_m\times 3} \\
M_{2,1} & M_{2,2} & \cdots & M_{2,N_m\times 3} \\
\vdots & \vdots & \ddots & \vdots \\
M_{N_f,1} & M_{N_f,2} & \cdots & M_{N_f,N_m\times 3}
\end{pmatrix}    (3)

In the matrix of Equation (3), each row is a captured facial expression and N_f is the number of expressions.

We have captured two groups of facial expressions. The first group is used to analyze and parameterize the nonlinear relationship functions f_i(w_i); for this, the actor performs several basic facial actions such as "Open mouth", "Close eyes", "Raise lip corners" and so on, and the motion capture data records the facial deformation during these basic actions. The second group is used to optimize the nonlinear relationship functions f_i(w_i). For this purpose, the actor tries to contract all of his facial muscles so as to perform enough facial expressions to span the facial expression space.

4. Explore the nonlinear relationship functions

We explore the nonlinear relationship functions of blendshape facial animation from captured facial expressions through optimization. The scheme is illustrated in Figure 3.

Figure 3: Diagram of exploring the nonlinear relationship functions.

In this scheme, we first analyze and parameterize the nonlinear relationship functions (Section 4.1) through analysis of the first group of expressions. Then, in Section 4.2, we optimize the nonlinear relationship functions from the second group of expressions; in this optimization, the parameters of the nonlinear relationship functions are the optimization variables. Because the captured facial expressions only record the motion of sparse markers, the explored nonlinear relationship functions are likewise defined only at the sparse markers. Therefore, we finally expand the nonlinear relationship functions from the sparse markers to the whole facial model through radial basis function interpolation (Section 4.3).

4.1. Analyze and parameterize the nonlinear relationship functions

We analyze and parameterize the nonlinear relationship functions from the first group of captured facial expressions, which record the facial geometry deformation during basic facial actions. For each facial action, we first select the expression M_{max-norm} with the maximum norm; it is regarded as the extreme expression of this facial action. Then, the other expressions are projected onto the extreme expression as in Equation (4), which yields samples (w_{i,k}, M_k), where M_k is the expression vector projected onto M_{max-norm} and w_{i,k} is the projection weight. Some representative samples of (w_{i,k}, M_{k,j}) are shown in Figure 4a, where M_{k,j} is the jth component of expression M_k. They demonstrate that the relationship between blending weights and deformed facial geometry is definitely nonlinear.

w_{i,k} = \arg\min_{w_{i,k}} \left| M_k - w_{i,k} M_{max\text{-}norm} \right|, \quad 0 \le w_{i,k} \le 1    (4)
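The bounded one-dimensional least-squares problem of Equation (4) has a closed-form solution: the unconstrained optimum is the scalar projection coefficient, clamped to [0, 1]. A minimal sketch:

```python
import numpy as np

def projection_weight(M_k, M_max):
    """Equation (4): weight of expression M_k projected onto the extreme
    expression M_max (both flattened increment vectors)."""
    w = float(M_k @ M_max) / float(M_max @ M_max)
    return float(np.clip(w, 0.0, 1.0))
```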

We choose cubic polynomials as the form of the nonlinear relationship functions f_i(w_i), based on the following considerations. First, polynomials contribute to the computational efficiency of the proposed optimization scheme, as will be shown in Section 4.2. On top of that, cubic polynomials are selected because they can represent the nonlinearity of facial expressions with few polynomial coefficients. They fit the (w_{i,k}, M_{k,j}) samples well; the fitting results are shown in Figure 4b.

The cubic polynomial used in this paper is

f(w) = a w^3 + b w^2 + c w

The constant term is omitted because a zero blending weight must correspond to the neutral facial expression. The cubic polynomial is parameterized by the coefficients a, b and c.
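Fitting such a constant-free cubic to the (w_{i,k}, M_{k,j}) samples of one component is an ordinary linear least-squares problem, since f is linear in (a, b, c). A sketch:

```python
import numpy as np

def fit_cubic_through_origin(w, y):
    """Fit f(w) = a*w**3 + b*w**2 + c*w to samples; f(0) = 0 by construction.

    w, y: 1-D arrays of projection weights and one component of the
    projected expressions. Returns (a, b, c).
    """
    design = np.stack([w**3, w**2, w], axis=1)   # (N, 3) design matrix
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs
```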

We set a cubic polynomial for each f_{i,j}(w_i), the jth component of the relationship vector function f_i(w_i); that is, the polynomial coefficients a_{i,j}, b_{i,j} and c_{i,j} parameterize f_{i,j}(w_i). We do not take these parameters from the fitting results of Figure 4b, because it is generally impossible for the actor to perform all the expected facial actions precisely. Instead, we explore them from the second group of captured expressions through optimization.

Figure 4: (a) (w_{i,k}, M_{k,j}) samples (blue dots). (b) The fitting results of the (w_{i,k}, M_{k,j}) samples using cubic polynomials (red curves).

In Shin's work [SL09], cubic polynomials were used to approximate vertex movements due to actuation signals; the polynomial coefficients were learnt to match motion capture data given known actuation signals. Our work differs from theirs in two respects. First, in Shin's work the key shapes e_i are extracted from motion capture data automatically, whereas in our work the key shapes can be constructed according to the specific application, which is neater and more usable in practice. Second, in [SL09] the blending weights w_i are computed from motion capture data through independent component analysis (ICA), whereas in our work the blending weights are adjusted together with the polynomial coefficients in the optimization, which guarantees the best match to the motion capture data.

4.2. Optimize the nonlinear relationship functions

In this section, we optimize the nonlinear relationship functions of blendshape facial animation from the second group of captured facial expressions. The purpose of the proposed optimization is to find the nonlinear relationship functions that are able to reproduce the captured expressions. To this end, we define the fitting energy as the objective function of the optimization:

E_{fitting} = \sum_{k=1}^{N_f} \left| M_k - \sum_{i=1}^{N_{bs}} f_i(w_{i,k}) \right|^2

In this fitting energy, M_k is a captured expression and w_{i,k} is the blending weight of key shape i for this expression; \sum_{i=1}^{N_{bs}} f_i(w_{i,k}) is the synthesized expression generated by the nonlinear relationship functions. E_{fitting} thus measures the difference between the captured facial expressions and the synthesized ones.
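With the cubic parameterization, the fitting energy is cheap to evaluate in matrix form. A sketch, assuming expressions and coefficients are stored as arrays (names illustrative):

```python
import numpy as np

def fitting_energy(M, W, A, B, C):
    """E_fitting over all captured expressions.

    M:       (N_f, D) captured expressions (rows of the matrix in Eq. 3).
    W:       (N_f, N_bs) blending weights w_{i,k}.
    A, B, C: (N_bs, D) cubic coefficients a_{i,j}, b_{i,j}, c_{i,j}.
    """
    synth = (W**3) @ A + (W**2) @ B + W @ C   # sum_i f_i(w_{i,k}), per frame
    return float(np.sum((M - synth) ** 2))
```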

In this optimization, the variables comprise the parameters of the nonlinear relationship functions, a_{i,j}, b_{i,j}, c_{i,j}, and the blending weights of the synthesized facial expressions, w_{i,k}. Due to the large number of optimization variables (more than 30,000 in our experiments), minimizing E_{fitting} over the global variable space results in overfitting: the numerically optimal nonlinear relationship functions synthesize the captured facial expressions most accurately, but they lose their visual meaning. Such overfitted nonlinear relationship functions are not the desired ones; the overfitted results are illustrated in the experiments.

To avoid overfitting, we restrict the optimization variables to an appropriate subspace rather than the global space. In blendshape facial animation, users usually design a set of key shapes e_i to animate the face; these shapes have specific visual meanings, such as facial action units, basic expressions or visemes. In the optimization, besides being able to synthesize the captured facial expressions accurately, the desired nonlinear relationship functions f_i(w_i) should also preserve the visual meanings of their corresponding key shapes e_i. This is vital in blendshape facial animation.

In this paper, the subspace of optimization variables is established as follows. First, a set of key shapes e_i is constructed according to the specific application; in our experiments, given the facial model, we constructed 29 key shapes of basic facial action units [EF77] and visemes. Then, we rewrite the linear relationship functions w_i e_i in nonlinear form by setting a_{i,j} = 0, b_{i,j} = 0 and c_{i,j} = e_{i,j}; that is, the nonlinear form of w_i e_i is f_{i,j}(w_i) = e_{i,j} w_i, where e_{i,j} and f_{i,j}(w_i) are the jth components of key shape e_i and relationship function f_i(w_i) respectively, and a_{i,j}, b_{i,j} and c_{i,j} are the parameters of f_{i,j}(w_i). Finally, we restrict the parameters a_{i,j}, b_{i,j} and c_{i,j} to intervals A_{i,j}, B_{i,j} and C_{i,j}, which are neighborhoods of 0, 0 and e_{i,j} respectively. We also restrict the blending weights w_i to the closed interval [0, 1] to avoid meaningless facial expressions.

From the above, the complete optimization is expressed as follows:

\min_{w_{i,k},\, a_{i,j},\, b_{i,j},\, c_{i,j}} \sum_{k=1}^{N_f} \left| M_k - \sum_{i=1}^{N_{bs}} f_i(w_{i,k}) \right|^2
\quad subject\ to:\ 0 \le w_{i,k} \le 1,\ a_{i,j} \in A_{i,j},\ b_{i,j} \in B_{i,j},\ c_{i,j} \in C_{i,j}    (5)

The above optimization is a bound-constrained nonlinear least-squares problem. It is difficult to solve directly due to the large number of variables. For efficiency, we propose a two-step iterative optimization algorithm, as shown in Figure 5.

Figure 5: The iterative optimization algorithm.

The flow of this algorithm can be illustrated as follows:

1. Initialize the optimization variables by setting w_{i,k} = 0, a_{i,j} = 0, b_{i,j} = 0 and c_{i,j} = e_{i,j}.

2. Minimize E_{fitting} over the blending weights w_{i,k} of the synthesized expressions. That is, for each captured facial expression M_k, we solve the following optimization:

\min_{w_{i,k}} \left| M_k - \sum_{i=1}^{N_{bs}} f_i(w_{i,k}) \right|^2, \quad subject\ to:\ 0 \le w_{i,k} \le 1    (6)

This is a nonlinear least-squares problem, but with only a small number of variables (N_{bs} = 29 in our experiments), so it can be solved efficiently.

3. Minimize E_{fitting} over the parameters of the nonlinear relationship functions, a_{i,j}, b_{i,j} and c_{i,j}. This can be expressed as the following optimization:

\min_{a_{i,j},\, b_{i,j},\, c_{i,j}} \sum_{k=1}^{N_f} \left| M_k - \sum_{i=1}^{N_{bs}} f_i(w_{i,k}) \right|^2, \quad subject\ to:\ a_{i,j} \in A_{i,j},\ b_{i,j} \in B_{i,j},\ c_{i,j} \in C_{i,j}    (7)

Because a polynomial is linear in its coefficients, this optimization is a linear least-squares problem. Although it has a large number of variables, it can be solved efficiently.

4. Check for convergence. If E_{fitting} has converged, the algorithm terminates; otherwise, jump back to step 2 for a further iteration.

In the above algorithm, the fact that the nonlinear relationship functions are linear in their parameters is crucial to solving the optimization in step 3 efficiently; otherwise, it would become a nonlinear optimization with a large number of variables, which is difficult to solve. Polynomials are linear in their coefficients, which is an important reason for choosing cubic polynomials as the nonlinear relationship functions, as discussed in Section 4.1.
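Both steps map directly onto standard bounded least-squares solvers. Below is a hedged sketch of one iteration in Python with SciPy; the solver choice, array names, and the interval construction (taken from the settings reported in Section 5) are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np
from scipy.optimize import least_squares, lsq_linear

def step2_weights(M_k, A, B, C, w0):
    """Step 2 / Equation (6): bounded nonlinear least squares over the
    N_bs blending weights of one captured expression M_k."""
    def residual(w):
        return M_k - ((w**3) @ A + (w**2) @ B + w @ C)
    return least_squares(residual, w0, bounds=(0.0, 1.0)).x

def step3_coefficients(M, W, E, eps=1e-9):
    """Step 3 / Equation (7): bounded linear least squares over the cubic
    coefficients. For fixed weights the problem decouples per component j,
    with the design matrix [W^3 | W^2 | W] shared by all components.

    E is the (N_bs, D) key-shape matrix; the intervals follow Section 5:
    A_ij, B_ij in [-|e_ij|, |e_ij|], C_ij in [0.8*e_ij, 1.2*e_ij].
    """
    N_bs, D = E.shape
    Phi = np.hstack([W**3, W**2, W])              # (N_f, 3*N_bs) design matrix
    A = np.empty((N_bs, D)); B = np.empty((N_bs, D)); C = np.empty((N_bs, D))
    for j in range(D):
        e = E[:, j]
        lo = np.concatenate([-np.abs(e), -np.abs(e),
                             np.minimum(0.8 * e, 1.2 * e)]) - eps
        hi = np.concatenate([np.abs(e), np.abs(e),
                             np.maximum(0.8 * e, 1.2 * e)]) + eps
        x = lsq_linear(Phi, M[:, j], bounds=(lo, hi)).x  # linear in a, b, c
        A[:, j], B[:, j], C[:, j] = x[:N_bs], x[N_bs:2 * N_bs], x[2 * N_bs:]
    return A, B, C
```

Alternating step2_weights over all captured expressions with step3_coefficients, starting from the initialization of step 1 and stopping when E_{fitting} no longer decreases, reproduces the loop of Figure 5 under these assumptions; eps merely keeps degenerate bounds (e_{i,j} = 0) strictly ordered for the solver.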

4.3. Expand the nonlinear relationship to the facial model

The explored nonlinear relationship functions are expanded from the sparse markers to the whole facial model through radial basis function (RBF) interpolation. We train an RBF (Equation 8) for each nonlinear relationship function f_i(w_i). The input of this RBF is the vertex coordinates v of the facial model; the output is the difference between the parameters of the nonlinear relationship functions and the linear ones at this vertex, namely a_{i,j}, b_{i,j} and c_{i,j} - e_{i,j}. The coordinates and nonlinear relationship functions of the markers serve as the training samples. We choose inverse multiquadrics [Har71] as the basis function h_m(v) (Equation 9), where the distance dist(v_t, v_m) between two vertices v_t and v_m is the shortest path obtained by regarding the facial model as an undirected graph. The RBF training algorithm is elaborated in [Hay94].

F(v) = \sum_{m=1}^{N_m} p_{j,m} h_m(v)    (8)

h_m(v) = \left( dist^2(v, v_m) + c_m^2 \right)^{-1/2}, \quad c_m = \min_{t \ne m} dist(v_t, v_m)    (9)
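Because Equation (8) is linear in the weights p_{j,m}, training reduces to one dense linear solve per output channel. A minimal sketch, assuming precomputed distance matrices (the paper's dist is the shortest path over the mesh graph; any distance matrix can be plugged in):

```python
import numpy as np

def train_rbf(D_mm, Y):
    """Fit inverse-multiquadric RBF weights (Equations 8-9).

    D_mm: (N_m, N_m) marker-to-marker distances dist(v_t, v_m).
    Y:    (N_m, K) training targets at the markers, e.g. the stacked
          differences (a_ij, b_ij, c_ij - e_ij) for one key shape.
    """
    off_diag = np.where(np.eye(len(D_mm), dtype=bool), np.inf, D_mm)
    c = off_diag.min(axis=0)                  # c_m = min_{t != m} dist(v_t, v_m)
    H = 1.0 / np.sqrt(D_mm**2 + c**2)         # H[t, m] = h_m(v_t)
    return np.linalg.solve(H, Y), c

def eval_rbf(D_vm, c, P):
    """Interpolate at arbitrary vertices; D_vm: (N_v, N_m) vertex-to-marker
    distances."""
    return (1.0 / np.sqrt(D_vm**2 + c**2)) @ P
```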

5. Experiments

This section presents the experimental results of the proposed scheme. We first show the nonlinearity of facial expressions brought by the explored nonlinear relationship functions. Then, we compare the facial animations synthesized by linear relationship functions with those synthesized by the explored nonlinear ones; the results are assessed through both a user study and quantitative evaluation. After that, we report the computational efficiency of the proposed algorithms. Finally, we illustrate the overfitting that results from optimizing in the global variable space. In the experiments, the facial model comprises more than 6000 vertices, and the computer has a 2.67 GHz CPU and 2.0 GB RAM. We captured about 500 expressions to optimize the nonlinear relationship functions. For the intervals of the optimization variables, A_{i,j} and B_{i,j} are set to [-|e_{i,j}|, |e_{i,j}|], and C_{i,j} is set to [0.8 e_{i,j}, 1.2 e_{i,j}].

Figure 6: The nonlinearity of "Mouth Opening" brought by the explored nonlinear relationship functions. The top images show that the nonlinearity of facial expressions is linearized by the linear relationship function w_i e_i. Through optimization, the explored nonlinear relationship function f_i(w_i) captures the nonlinearity well (bottom images). From left to right, the blending weight of "Mouth Opening" increases progressively from 0.0 to 1.0 in steps of 0.2.

Taking "Mouth Opening" as an example, we show the nonlinearity of jaw motion brought by the explored nonlinear relationship functions. In Figure 6, the top face geometries are generated by the linear relationship functions w_i e_i, and the bottom images are synthesized by the explored nonlinear ones f_i(w_i); from left to right, the blending weight w_i increases progressively from 0.0 to 1.0 in steps of 0.2. The green spherules indicate the chin vertex positions at different blending weights; the purple ones reveal the orbit of the chin vertex as the blending weight changes gradually. The example of "Eyes Closing" is shown in Figure 7. From Figures 6 and 7 we can see that, through optimization, the explored nonlinear relationship between blending weights and deformed geometry is well consistent with the nonlinearity of facial expressions. More examples are shown in the accompanying video.

We compare the facial animations synthesized by the linear relationship functions (Equation 1) with those synthesized by the explored nonlinear ones (Equation 2). The results are assessed through a user study. For this, we captured another 13 clips of facial motion with durations ranging from 10 seconds to one minute; these clips are independent of the captured expressions used to explore the nonlinear relationship functions. For each clip, we synthesize facial animations through the linear relationship functions w_i e_i and the nonlinear ones f_i(w_i) respectively. Given a frame of captured facial motion M_k, for the linear case the blending weights are calculated through the optimization of Equation (10) and the facial expression is synthesized through Equation (1); for the nonlinear case, the blending weights are optimized through Equation (6) and the facial expression is generated according to Equation (2). The two animations and the corresponding video clip of facial motion are then displayed side by side on the screen, with the two animations randomly placed on the left or right without the observers' awareness. Referring to the video clip of the facial motion capture placed in the middle of the screen, observers vote for the more realistic facial animation, as shown in Figure 8. Fifteen observers participated in the user study, and the vote percentages are listed in Table 1.

\min_{w_{i,k}} \left| M_k - \sum_{i=1}^{N_{bs}} w_{i,k} e_i \right|, \quad subject\ to:\ 0 \le w_{i,k} \le 1    (10)
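Since the linear model is linear in the weights, Equation (10) is a bounded linear least-squares problem. A sketch of the linear baseline:

```python
import numpy as np
from scipy.optimize import lsq_linear

def linear_weights(M_k, E):
    """Equation (10): blending weights for the linear baseline.

    E: (N_bs, D) key-shape matrix; the model is w @ E, so the design
    matrix passed to the solver is E.T.
    """
    return lsq_linear(E.T, M_k, bounds=(0.0, 1.0)).x
```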

From this table, we can see that the explored nonlinear relationship functions synthesize much more realistic facial animation than the linear ones (89.7% vs. 10.3%).

Figure 8: The arrangement of facial animations and video in the user study. The facial animations are respectively synthesized by linear and nonlinear relationship functions, and they are randomly placed on the left and right sides of the screen. Between them are the video clips of the facial motion capture.

For intuition, we show several synthesized facial expressions from the user study in Figure 9. In this figure, the green spherules denote the marker positions of the synthesized facial expressions, and the red ones represent the marker coordinates of the captured facial motion. The expressions in the second and third rows are respectively synthesized by the linear relationship functions and the nonlinear ones from the clips of captured facial motion.


Figure 7: The nonlinearity of facial expression "Eyes Closing" brought by the explored nonlinear relationship functions.

Figure 9: Comparison of facial expressions synthesized by linear and nonlinear relationship functions in the user study. The second-row expressions are synthesized by the linear relationship functions, and the third- and bottom-row images are synthesized by the nonlinear ones. The green spherules indicate the synthesized facial motion; the red ones mark the captured motion.

Then, we remove the markers from the third-row expressions to generate the bottom-row ones. The top-row images are the corresponding stills of the facial motion capture.

We also quantitatively evaluate the improvement in realism brought by the explored nonlinear relationship functions relative to the linear ones. We quantify the synthesis error as the average Euclidean distance of the markers between synthesized facial expressions and captured ones. Over the thirteen clips of captured facial motion, the synthesis error of the linear relationship functions is 3.52 mm, while that of the nonlinear ones is 0.71 mm.
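The error metric is the per-marker Euclidean distance averaged over markers and frames. A sketch, assuming marker coordinates are stored as flattened (x, y, z) triples:

```python
import numpy as np

def synthesis_error(M_synth, M_capt):
    """Average Euclidean marker distance between synthesized and
    captured expressions; both arrays are (N_frames, N_m * 3)."""
    diff = (M_synth - M_capt).reshape(len(M_synth), -1, 3)
    return float(np.linalg.norm(diff, axis=2).mean())
```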

The algorithms proposed in this paper can be solved efficiently. In the iterative algorithm for exploring the nonlinear relationship functions, the optimization of Equation (6) takes about 0.03 seconds and that of Equation (7) about 1.47 seconds; the iterative optimization algorithm takes about ten minutes in total to converge. It takes about another 3 minutes to expand the explored nonlinear relationship functions from the sparse markers to the facial model. Finally, synthesizing a facial expression through the nonlinear relationship functions according to Equation (2) takes only about 0.0006 seconds per frame; although this is about three times as long as the linear blending of Equation (1), it poses no problem for real-time applications.


In the optimization, optimizing the nonlinear relationship functions over the global variable space results in overfitting. The overfitted nonlinear relationship functions synthesize the captured facial expressions used to explore them most accurately, but they lose their visual meanings, which renders them unusable, as shown in Figure 10a. By setting the optimization intervals A_{i,j}, B_{i,j} and C_{i,j} to neighborhoods of 0, 0 and e_{i,j} respectively, the overfitting is avoided; as a result, the explored nonlinear relationship functions retain their visual meanings and are usable for synthesizing facial expressions, as shown in Figure 10b.

Figure 10: Illustration of the overfitting of nonlinear relationship functions, taking "Mouth Opening" as an example. (a) The overfitted result of "Mouth Opening", obtained by optimization in the global variable space. (b) By restricting the optimization variables to appropriate intervals, overfitting is avoided.

6. Conclusion and discussion

In this paper, we address the problem of exploring the nonlinear relationship between blending weights and deformed face geometry in blendshape facial animation. We explore this relationship through optimization, searching in an appropriate subspace of variables for the nonlinear relationship functions that synthesize realistic facial expressions. Experiments show that the explored nonlinear relationship functions are well consistent with the nonlinearity of facial expressions, and that they synthesize more realistic facial animation than the linear ones.

The proposed scheme can be used for realistic facial animation synthesis as conveniently as the linear blending method. In blendshape animation, to animate a facial model, a set of key shapes is first constructed as an initialization step; blending weights are then adjusted to synthesize the desired facial animation. Our flow differs only slightly: after the construction of the key shapes, a further step is needed to explore the nonlinear relationship functions f_i(w_i). This is done efficiently with little manual operation, and as another initialization step it causes no usability trouble. After the exploration of f_i(w_i), users can synthesize facial animation through the nonlinear relationship functions just as with linear ones: when adjusting blending weights, facial expressions are synthesized through the nonlinear relationship functions as illustrated in Equation (2).

We have applied the proposed scheme to both performance-driven and key-frame facial animation. In the performance-driven application, given one frame of captured facial motion, the blending weights are computed through the optimization of Equation (6), and the facial expression is then synthesized from the blending weights through Equation (2). We applied this in the user study experiments, as shown in Figure 9.

The scheme can also be used for key-frame facial animation. First, users configure key-frame expressions by manually adjusting blending weights, as shown in Figure 11. The blending weights of the other frames are then obtained by interpolation between the key frames, as sketched below. Finally, the expression of each frame is synthesized through Equation (2). Several resulting expressions of key-frame facial animation are shown in Figure 12.
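A hedged sketch of the key-frame path, assuming simple linear interpolation of weights between key frames (the paper does not specify the interpolant):

```python
import numpy as np

def interpolate_weights(key_times, key_weights, frame_times):
    """Interpolate blending weights between key frames.

    key_times:   sorted key-frame indices or times, length K.
    key_weights: (K, N_bs) weights set by the user at the key frames.
    Returns (N_frames, N_bs) weights; each row is then fed to the
    nonlinear synthesis of Equation (2).
    """
    key_weights = np.asarray(key_weights, dtype=float)
    return np.stack(
        [np.interp(frame_times, key_times, key_weights[:, i])
         for i in range(key_weights.shape[1])], axis=1)
```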

Figure 11: Configuration of key-frame expressions through manually adjusting blending weights.

In our work, geometric constraints are ignored, which may result in artifacts; for example, when closing the eyes, the upper and lower eyelids may penetrate each other in extreme cases. In blendshape facial animation, users may enforce geometric constraints by designing key shapes carefully, but these constraints may be broken during our optimization. In future work, we will therefore take geometric constraints into consideration and try to add equality constraints to the optimization to improve the scheme.

7. Acknowledgements

This work was supported in part by the National Natural Science Foundation of China, Nos. U0935003 and 60970086.


Figure 12: Results of key-frame facial animation. Given the blending weights of the key frames (red bars), those of the other frames (slider block) are obtained through interpolation between key frames. The facial expressions are then synthesized from the blending weights through the nonlinear relationship functions.

References

[CDB02] Chuang E. S., Deshpande H., Bregler C.: Facial expression space learning. In PG '02: Proceedings of the 10th Pacific Conference on Computer Graphics and Applications (Washington, DC, USA, 2002), IEEE Computer Society, p. 68.

[CFP03] Cao Y., Faloutsos P., Pighin F.: Unsupervised learning for speech motion editing. In SCA '03: Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (Aire-la-Ville, Switzerland, 2003), Eurographics Association, pp. 225–231.

[CK01] Choe B., Ko H.-S.: Analysis and synthesis of facial expressions with hand-generated muscle actuation basis. In IEEE Computer Animation Conference (2001), pp. 12–19.

[DZZW02] Dalong J., Zhiguo L., Zhaoqi W., Wen G.: Animating 3D facial models with MPEG-4 FaceDefTables. In SS '02: Proceedings of the 35th Annual Simulation Symposium (Washington, DC, USA, 2002), IEEE Computer Society, p. 395.

[EF77] Ekman P., Friesen W. (Eds.): Manual for the Facial Action Coding System. Consulting Psychologists Press, Palo Alto, 1977.

[FSF07] Fratarcangeli M., Schaerf M., Forchheimer R.: Facial motion cloning with radial basis functions in MPEG-4 FBA. Graphical Models 69, 2 (2007), 106–118.

[Har71] Hardy R. L.: Multiquadric equations of topography and other irregular surfaces. Journal of Geophysical Research 76 (1971), 1905–1915.

[Hav06] Havaldar P.: Sony Pictures Imageworks. In SIGGRAPH '06: ACM SIGGRAPH 2006 Courses (New York, NY, USA, 2006), ACM, p. 5.

[Hay94] Haykin S.: Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1994.

[JTDP03] Joshi P., Tien W. C., Desbrun M., Pighin F.: Learning controls for blend shape based realistic facial animation. In SCA '03: Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2003).

[KGMT01] Kshirsagar S., Garchery S., Magnenat-Thalmann N.: Feature point based mesh deformation applied to MPEG-4 facial animation. In Proceedings of the IFIP TC5/WG5.10 DEFORM'2000 and AVATARS'2000 Workshops on Deformable Avatars (Deventer, The Netherlands, 2001), Kluwer, pp. 24–34.

[KHS01] Kähler K., Haber J., Seidel H.-P.: Geometry-based muscle modeling for facial animation. In Proceedings of Graphics Interface 2001 (Toronto, Canada, 2001), Canadian Information Processing Society, pp. 37–46.

[LD08] Li Q., Deng Z.: Facial motion capture editing by automated orthogonal blendshape construction and weight propagation. IEEE Computer Graphics and Applications (2008).

[LMX∗08] Liu X., Mao T., Xia S., Yu Y., Wang Z.: Facial animation by optimized blendshapes from motion capture data. Computer Animation and Virtual Worlds 19, 3-4 (2008), 235–245.

[NN01] Noh J.-Y., Neumann U.: Expression cloning. In SIGGRAPH '01: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (New York, NY, USA, 2001), ACM, pp. 277–288.

[Par72] Parke F. I.: Computer generated animation of faces. In ACM '72: Proceedings of the ACM Annual Conference (New York, NY, USA, 1972), ACM, pp. 451–457.

[Par74] Parke F. I.: A Parametric Model for Human Faces. PhD thesis, 1974.

[PF03] Pandzic I. S., Forchheimer R. (Eds.): MPEG-4 Facial Animation: The Standard, Implementation and Applications. John Wiley & Sons, New York, NY, USA, 2003.

[PL06] Pighin F., Lewis J. P.: Facial motion retargeting. In SIGGRAPH '06: ACM SIGGRAPH 2006 Courses (2006), p. 2.

[Sag06] Sagar M.: Facial performance capture and expressive translation for King Kong. In SIGGRAPH '06: ACM SIGGRAPH 2006 Sketches (New York, NY, USA, 2006), ACM, p. 26.

[SL09] Shin H., Lee Y.: Expression synthesis and transfer in parameter spaces. Computer Graphics Forum 28 (2009), 1829–1835.

[TSIF05] Teran J., Sifakis E., Irving G., Fedkiw R.: Robust quasistatic finite elements and flesh simulation. In SCA '05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (New York, NY, USA, 2005), ACM, pp. 181–190.

[Ume91] Umeyama S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 13, 4 (1991), 376–380.


Figure 13: The nonlinearity of upper eyelid motion. (a) The orbit (red curve) of a vertex on the upper eyelid during "Eyes Closing". (b) In blendshape facial animation, the orbit is linearly simplified (blue line). (c) By setting appropriate nonlinear relationship functions (green curves), the orbit is well approximated.

Table 1: The vote percentages of the user study.

Clip Index     1     2     3     4     5     6     7     8    9    10    11   12    13   Total
Nonlinear    100  93.3  100  86.7  93.3  66.7  100   100   80  86.7  86.7   80  93.3   89.7
Linear         0   6.7    0  13.3   6.7  33.3    0     0   20  13.3  13.3   20   6.7   10.3
