semantic parametric body shape estimation from...
TRANSCRIPT
SEMANTIC PARAMETRIC BODY SHAPE ESTIMATION FROM NOISY DEPTH
SEQUENCES
Alexandru E. Ichim Federico Tombari
Online Submission ID: 0417
Dynamic 3D Avatar Creation from Hand-held Video Input
Figure 1: Our system creates a fully rigged 3D avatar of the user from uncalibrated video input acquired with a cell-phone camera. Theblendshape models of the reconstructed avatars are augmented with textures and dynamic detail maps, and can be animated in realtime.
Abstract1
We present a complete pipeline for creating fully rigged, personal-2
ized 3D facial avatars from hand-held video. Our system faithfully3
recovers facial expression dynamics of the user by adapting a blend-4
shape template to an image sequence of recorded expressions using5
an optimization that integrates feature tracking, optical flow, and6
shape from shading. Fine-scale details such as wrinkles are captured7
separately in normal maps and ambient occlusion maps. From this8
user- and expression-specific data, we learn a regressor for on-the-fly9
detail synthesis during animation to enhance the perceptual realism10
of the avatars. Our system demonstrates that the use of appropri-11
ate reconstruction priors yields compelling face rigs even with a12
minimalistic acquisition system and limited user assistance. This13
facilitates a range of new applications in computer animation and14
consumer-level online communication based on personalized avatars.15
We present realtime application demos to validate our method.16
CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional17
Graphics and Realism—Animation;18
Keywords: 3D avatar creation, face animation, blendshapes, rig-19
ging20
1 Introduction21
Recent advances in realtime face tracking enable fascinating new ap-22
plications in performance-based facial animation for entertainment23
and human communication. Current realtime systems typically use24
the extracted tracking parameters to animate a set of pre-defined25
characters [Weise et al. 2011; Li et al. 2013; Cao et al. 2013; Bouaziz26
et al. 2013; Cao et al. 2014a]. While this allows the user to enact27
virtual avatars in realtime, personalized interaction requires a custom28
rig that matches the facial geometry, texture, and expression dynam-29
ics of the user. With accurate tracking solutions in place, creating30
compelling user-specific face rigs is currently a major challenge for31
new interactive applications in online communication. In this paper32
we propose a software pipeline for building fully rigged 3D avatars33
from hand-held video recordings of the user.34
Avatar-based interactions offer a number of distinct advantages for35
online communication compared to video streaming. An important36
benefit for mobile applications is the significantly lower demand37
on bandwidth. Once the avatar has been transferred to the target38
device, only animation parameters need to be transmitted during39
live interaction. Bandwidth can thus be reduced by several orders40
of magnitude compared to video streaming, which is particularly41
relevant for multi-person interactions such as conference calls.42
A second main advantage is the increased content flexibility. A 3D43
avatar can be more easily integrated into different scenes, such as44
games or virtual meeting rooms, with changing geometry, illumi-45
nation, or viewpoint. This facilitates a range of new applications,46
in particular on mobile platforms and for VR devices such as the47
Occulus Rift.48
Our goal is to enable users to create fully rigged and textured 3D49
avatars of themselves at home. These avatars should be as realistic50
as possible, yet lightweight, so that they can be readily integrated51
into realtime applications for online communication. Achieving52
this goal implies meeting a number of constraints: the acquisition53
hardware and process need to be simple and robust, precluding54
any custom-build setups that are not easily deployable. Manual55
assistance needs to be minimal and restricted to operations that can56
be easily performed by untrained users. The created rigs need to be57
efficient to support realtime animation, yet accurate and detailed to58
enable engaging virtual interactions.59
These requirements pose significant technical challenges. We maxi-60
mize the potential user base of our system by only relying on simple61
photo and video recording using a hand-held cell-phone camera to62
acquire user-specific data.63
The core processing algorithms of our reconstruction pipeline run64
automatically. To improve reconstruction quality, we integrate a65
simple UI to enable the user to communicate tracking errors with66
simple point clicking. User assistance is minimal, however, and67
required less than 15 minutes of interaction for all our examples.68
Realtime applications are enabled by representing the facial rig as69
a set of blendshape meshes with low polygon count. Blendshapes70
allow for efficient animation and are compatible with all major ani-71
mation tools. We increase perceptual realism by adding fine-scale72
facial features such as dynamic wrinkles that are synthesized on the73
fly during animation based on precomputed normal and ambient74
occlusion maps.75
We aim for the best possible quality of the facial rigs in terms of76
geometry, texture, and expression dynamics. To achieve this goal,77
we formulate dynamic avatar creation as a geometry and texture re-78
construction problem that is regularized through the use of carefully79
designed facial priors. These priors enforce consistency and guaran-80
1
Dynamic 3D Avatar Creation From Hand-held Video Input
SOME OF MY OTHER WORK 1
2
Static Dynamic8 MP, ~100 images HD, ~2 min
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
SOME OF MY OTHER WORK 1
3
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
SOME OF MY OTHER WORK 2
4
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences 5
SOME OF MY OTHER WORK 2
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences 6
SOME OF MY OTHER WORK 2
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
MOTIVATION
7
Generic framework for tracking and modeling articulated bodies
in our experiments:
human bodies
RGB-D input
can be applied to:
hand tracking
animal tracking
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
BODY REPRESENTATION
8
Pose variations - skeleton + linear blend skinning
Rest pose variations - blendshapes + bone lengths
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
REGISTRATION ENERGY
9
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
REGISTRATION ENERGY
10
Features
Efeatures
=X
j
����tposej
� �j
����22
NiTE landmarks - replaceable
Point-to-plane dense closeness
Esurface =X
i
����n
Ti (xi � vi)
����22
ICP iterations
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
REGISTRATION ENERGY
11
Contour
n =(r
x
Icontour
,ry
Icontour
, 0)T����(rx
Icontour
,ry
Icontour
, 0)����
extracted from the 2D depth image
can be extended to RGB-segmentation
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
REGISTRATION ENERGY
12
Statistical Pose Prior
Eprior
= �5Eprior proj
+ �6Eprior dev
Eprior proj
=������(� � µ)�MMT (� � µ)
������2
2distance to projection
Eprior dev
=������⌃�1/2MT (� � µ)
������2
2
std-dev-weighting of the projection
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
REGISTRATION ENERGY
13
Smoothness term
Esmoothness
=�������(t) � �(t�1)
������2
2
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
MODELING ENERGY
14
Emodeling
= �1Esurface
+ �2Econtour
+ �3Ebs reg
Regularization termsEbs reg L2 = ||s||22Ebs reg L1 = ||s||1
accumulate constraints over windows of frames - performance trade-off
solve and update the body template shape
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
MODELING ENERGY
15
word cloud visualizations
L1 regularization
L2 regularization
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
RESULTS
16
Semantic Parametric Body Shape Estimation from Noisy Depth Sequences
Thank you!
17