2d tracking to 3d reconstruction of human body from monocular video moin nabi mohammad rastegari


Posted on 20-Dec-2015


TRANSCRIPT

Page 1

2D Tracking to 3D Reconstruction of Human Body from Monocular Video

Moin Nabi, Mohammad Rastegari

Page 2

Introduction to 3D Reconstruction

Approaches:

Stereo [Multiple Camera]

Monocular [Single Camera]: Difficult!

Page 3

Difficulties of 3D Reconstruction

Page 4

Difficulties of Monocular 3D Reconstruction

Local properties are not enough for depth estimation.

We need to learn the global structure.

Overall organization of the image (contextual information)

Page 5

Difficulties of Monocular 3D Reconstruction

Page 6

Difficulties of Monocular 3D Reconstruction

Depth ambiguity problem

We must estimate depth: a single observation is consistent with innumerable 3D states.

Page 7

Difficulties of Monocular 3D Reconstruction

Forward-Backward ambiguity problem

There are 2^(#limbs) possible configurations.

• With physical constraints
• With learning
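To make the count concrete, the short Python sketch below enumerates every forward/backward flag assignment for a set of limbs. The function name `flip_configurations` is hypothetical, and encoding the two possible depth directions of each limb as 0/1 is an illustrative assumption.

```python
from itertools import product

def flip_configurations(n_limbs):
    """Enumerate every forward/backward flag assignment for n_limbs segments.

    Each flag (0 or 1) stands for one of the two depth directions a limb can
    take while keeping the same 2D projection, so there are 2 ** n_limbs
    combinations in total.
    """
    return list(product((0, 1), repeat=n_limbs))

configs = flip_configurations(3)
print(len(configs))  # 8
print(configs[:3])   # [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
```

For a skeleton with, say, 10 limbs this already gives 1024 candidates, which is why physical constraints or learned knowledge are needed to cut the set down.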

Page 8

Application of Monocular 3D Reconstruction

3D Motion Capturing

3D Medical Imaging

Page 9

Application of Monocular 3D Reconstruction

Human-Computer Interfaces

Video games, more realism

Page 10

Problem Backgrounds

2D video offers limited clues about actual 3D motion

Humans interpret 2D video easily

Goal: Reliable 3D reconstructions from standard single-camera input

Page 11

Work-Flow of Monocular 3D Reconstruction

2D Tracking (Skeleton Extraction) -> ? -> 3D Reconstruction (Build Flesh)

Page 12

Skeleton Extraction

Page 13

Skeleton Extraction

Proposed Skeleton for Human Body

Page 14

Overview of approach

2D Tracking -> 3D Reconstruction

Page 15

Reconstruction of articulated Objects from Point Correspondences in a Single Uncalibrated Image

[Camillo J. Taylor, 2000]

Objective: To recover the configuration of an articulated object from image measurements

Assumptions: Scaled orthographic projection (unknown scale); relative lengths of the segments in the model are known

Input: Correspondences between joints in the model and points in the image

Output: Characterization of the set of all possible configurations

Page 16

Reconstruction of articulated Objects from Point Correspondences in a Single Uncalibrated Image

[Camillo J. Taylor, 2000]

?

Page 17

Reconstruction of articulated Objects from Point Correspondences in a Single Uncalibrated Image

[Camillo J. Taylor, 2000]

The set of all possible solutions can be characterized by a single scalar parameter, s, and a set of binary flags indicating the direction of each segment.

Solutions for various values of the s parameter
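Under the assumptions of this method (scaled orthographic projection (u, v) = s(X, Y), known segment lengths l), each segment's relative depth satisfies dz = ±sqrt(l^2 - (du^2 + dv^2) / s^2), so fixing s and one sign flag per segment fixes the whole configuration. The Python sketch below walks down a kinematic chain from a root joint placed at depth 0. The function name, the joint-indexing convention and the reading of flag 1 as "+dz" are our assumptions, not details taken from Taylor's paper.

```python
import math

def reconstruct(points_2d, segments, s, flags):
    """Relative 3D joint positions under scaled orthographic projection.

    points_2d : list of (u, v) image coordinates, one per joint; joint 0 is the root
    segments  : list of (parent, child, length) triples, ordered so that every
                parent is reconstructed before its child and every joint is reached
    s         : projection scale, assumed to satisfy (u, v) = s * (X, Y)
    flags     : one 0/1 flag per segment; 1 picks +dz, 0 picks -dz (our convention)
    """
    u0, v0 = points_2d[0]
    xyz = {0: (u0 / s, v0 / s, 0.0)}  # root depth is arbitrary, fixed at 0
    for (parent, child, length), flag in zip(segments, flags):
        du = points_2d[child][0] - points_2d[parent][0]
        dv = points_2d[child][1] - points_2d[parent][1]
        remainder = length ** 2 - (du ** 2 + dv ** 2) / s ** 2
        if remainder < -1e-12:
            raise ValueError("s is below the minimum allowed for this segment")
        dz = math.sqrt(max(remainder, 0.0))
        z_parent = xyz[parent][2]
        uc, vc = points_2d[child]
        xyz[child] = (uc / s, vc / s, z_parent + (dz if flag else -dz))
    return [xyz[j] for j in range(len(points_2d))]

# Two segments of length 5 whose image projections have lengths 3 and 4
pose = reconstruct([(0, 0), (3, 0), (3, 4)], [(0, 1, 5), (1, 2, 5)], s=1.0, flags=[1, 0])
print(pose)  # [(0.0, 0.0, 0.0), (3.0, 0.0, 4.0), (3.0, 4.0, 1.0)]
```

Flipping a flag mirrors the depth offset of that segment without changing its projection, which is exactly the forward-backward ambiguity.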

Page 18

Reconstruction of articulated Objects from Point Correspondences in a Single Uncalibrated Image

[Camillo J. Taylor, 2000]

The scalar, s, was chosen to be the minimum possible value, and the segment directions were specified by the user.

In practice, choosing the minimum allowable value of the scale parameter as the default usually yields acceptable results, since it reflects the fact that one or more segments in the model are typically quite close to perpendicular to the viewing direction and are, therefore, not significantly foreshortened.
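A real depth offset exists only when s·l >= d for every segment, where d is the segment's projected length in the image, so the minimum allowable scale is s_min = max(d / l) over all segments. A minimal Python sketch, assuming segments are given as (parent, child, length) triples; the function name is hypothetical.

```python
import math

def minimum_scale(points_2d, segments):
    """Smallest scale s for which every segment has a real depth offset.

    For a segment of model length l whose image projection has length d,
    s * l >= d must hold, so s_min = max(d / l) over all segments.
    """
    return max(math.dist(points_2d[p], points_2d[c]) / length
               for p, c, length in segments)

print(minimum_scale([(0, 0), (3, 0), (3, 4)], [(0, 1, 5), (1, 2, 5)]))  # 0.8
```

At s_min the limiting segment gets a zero depth offset, i.e. it lies parallel to the image plane, which matches the rationale above that some segment is usually nearly perpendicular to the viewing direction.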

Page 19

Reconstruction of articulated Objects from Point Correspondences in a Single Uncalibrated Image

[Camillo J. Taylor, 2000]

Experimental results:

Page 20

Reconstruction of articulated Objects from Point Correspondences in a Single Uncalibrated Image

[Camillo J. Taylor, 2000]

Page 21

Bayesian Reconstruction of 3D Human Motion from Single-Camera Video

[N. R. Howe, M. E. Leventon, W. T. Freeman, 2001]

Motion is divided into short movements, informally called snippets.

Probabilities are assigned to 3D snippets by analyzing a knowledge base.

Each snippet of 2D observations is matched to the most likely 3D motion.

The resulting snippets are stitched together to reconstruct the complete movement.

Page 22

Bayesian Reconstruction of 3D Human Motion from Single-Camera Video

[N. R. Howe, M. E. Leventon, W. T. Freeman, 2001]

Learning Priors on Human Motion

Choose the snippet length: long enough to be informative, but short enough to characterize.

Collect known 3D motions and form snippets.

Group similar movements and assemble a matrix.

SVD gives a Gaussian probability cloud that generalizes to similar movements.
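The prior-learning step can be sketched in Python with NumPy. This is a simplification under stated assumptions: each snippet is flattened into one row vector, a single Gaussian is fitted to all snippets (the slide groups similar movements first, so a fuller version would fit one Gaussian per group), and the log-density is evaluated only in the span of the top principal directions. Function names are hypothetical.

```python
import numpy as np

def fit_snippet_prior(snippets, n_components):
    """Fit a Gaussian prior over motion snippets with a truncated SVD.

    snippets : array (n_snippets, snippet_dim); each row is one flattened
               3D snippet (all joint coordinates of all its frames)
    Returns the mean, the principal directions (rows) and their variances.
    """
    X = np.asarray(snippets, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centred data matrix: rows of Vt are principal directions
    _, sing, Vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = sing[:n_components] ** 2 / (len(X) - 1)
    return mean, Vt[:n_components], variances

def log_prior(snippet, mean, directions, variances):
    """Unnormalized Gaussian log-density within the retained subspace."""
    coeffs = directions @ (np.asarray(snippet, dtype=float) - mean)
    return -0.5 * np.sum(coeffs ** 2 / variances)
```

Keeping only the leading directions is what lets the Gaussian "cloud" generalize to movements similar to, but not identical with, the training snippets.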

Page 23

Bayesian Reconstruction of 3D Human Motion from Single-Camera Video

[N. R. Howe, M. E. Leventon, W. T. Freeman, 2001]

Posterior Probability

Bayes' law gives the probability of a 3D snippet given the 2D observations:

P(snip|obs) = k P(obs|snip) P(snip)

The training database gives the prior P(snip). Assuming normally distributed tracking errors gives the likelihood P(obs|snip).
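Combining the two terms, the log-posterior of a candidate snippet is a Gaussian log-likelihood on the 2D residuals plus the log-prior. The sketch below assumes orthographic projection (the 2D projection simply drops the depth coordinate), isotropic tracking noise with standard deviation `sigma`, and a finite list of candidate snippets; all names are hypothetical, and a continuous optimizer could replace the `max` over candidates.

```python
import numpy as np

def log_posterior(snip3d, obs2d, log_prior, sigma):
    """log P(snip | obs) up to the constant log k.

    snip3d    : array (frames, joints, 3), a candidate 3D snippet
    obs2d     : array (frames, joints, 2), tracked 2D joint positions
    log_prior : callable returning log P(snip), e.g. a Gaussian fitted to training data
    sigma     : std. dev. of the assumed isotropic Gaussian tracking error
    """
    projected = np.asarray(snip3d, dtype=float)[..., :2]  # orthographic: drop depth
    residual = np.asarray(obs2d, dtype=float) - projected
    log_lik = -0.5 * np.sum(residual ** 2) / sigma ** 2
    return log_lik + log_prior(snip3d)

def most_likely(candidates, obs2d, log_prior, sigma):
    """Return the candidate snippet with the highest posterior."""
    return max(candidates, key=lambda s: log_posterior(s, obs2d, log_prior, sigma))
```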

Page 24

Bayesian Reconstruction of 3D Human Motion from Single-Camera Video

[N. R. Howe, M. E. Leventon, W. T. Freeman, 2001]

Stitching

Snippets overlap by n frames. Frames in the overlap are computed by weighted interpolation of the overlapping snippets.
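One way to realize the weighted interpolation is a linear cross-fade over the n shared frames, sketched below in Python with NumPy. The linear weights, the requirement that all snippets have the same length, and the restriction overlap <= length / 2 are our assumptions; the slide does not specify the weighting.

```python
import numpy as np

def stitch(snippets, overlap):
    """Blend consecutive snippets that share `overlap` frames.

    snippets : list of arrays, each (frames, dims), all of the same length;
               snippet i+1 starts (frames - overlap) frames after snippet i
    Within each shared region the earlier snippet fades out linearly while
    the later one fades in; the result is the weight-normalized sum.
    """
    length = snippets[0].shape[0]
    if not 0 <= overlap <= length // 2:
        raise ValueError("overlap must be between 0 and half the snippet length")
    step = length - overlap
    total = step * (len(snippets) - 1) + length
    acc = np.zeros((total, snippets[0].shape[1]))
    wsum = np.zeros(total)
    up = np.arange(1, overlap + 1) / (overlap + 1)  # weights ramping from 0 towards 1
    for i, snip in enumerate(snippets):
        w = np.ones(length)
        if i > 0:
            w[:overlap] = up                 # fade in over the frames shared with the previous snippet
        if i < len(snippets) - 1:
            w[length - overlap:] = up[::-1]  # fade out over the frames shared with the next snippet
        start = i * step
        acc[start:start + length] += w[:, None] * snip
        wsum[start:start + length] += w
    return acc / wsum[:, None]
```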

Page 25

Bayesian Reconstruction of 3D Human Motion from Single-Camera Video

[N. R. Howe, M. E. Leventon, W. T. Freeman, 2001]

Page 26

Bayesian Reconstruction of 3D Human Motion from Single-Camera Video

[N. R. Howe, M. E. Leventon, W. T. Freeman, 2001]

Page 27

Bayesian Reconstruction of 3D Human Motion from Single-Camera Video

[N. R. Howe, M. E. Leventon, W. T. Freeman, 2001]

Page 28

Monocular Reconstruction of 3D Human Motion by Qualitative Selection

[M. Eriksson, S. Carlsson, 2004]

Depth ambiguity -> resolved using Taylor's method

Forward-Backward ambiguity -> handled by pruning the possible binary configurations

Page 29

Monocular Reconstruction of 3D Human Motion by Qualitative Selection

[M. Eriksson, S. Carlsson, 2004]

Forward-Backward ambiguity ->

For any point set X representing a motion, we can represent its binary configuration with respect to the image plane as a vector with one binary entry per limb,

where 0 means that the limb points outwards, away from the image plane, and 1 means that the limb points inwards, towards the image plane.
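Computing the binary configuration from a reconstructed point set is a per-limb comparison of joint depths. In this Python sketch the depth-axis convention (larger Z means farther from the camera) and the mapping of "child joint deeper than parent" to 1 are assumptions; limbs exactly parallel to the image plane receive 0 here, although such limbs are genuinely ambiguous. The function name is hypothetical.

```python
def binary_configuration(points_3d, limbs):
    """One flag per limb: 1 if the child joint is deeper than its parent, else 0.

    points_3d : list of (X, Y, Z) joint positions; Z is assumed to grow away
                from the camera (our convention)
    limbs     : list of (parent, child) joint index pairs
    """
    return [1 if points_3d[c][2] > points_3d[p][2] else 0 for p, c in limbs]

print(binary_configuration([(0, 0, 0), (1, 0, 2), (1, 1, 1)], [(0, 1), (1, 2)]))  # [1, 0]
```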

Page 30

Monocular Reconstruction of 3D Human Motion by Qualitative Selection

[M. Eriksson, S. Carlsson, 2004]

Example for 4 limbs:

In this case, limb 1 and limb 2 are both parallel to the image plane. If limb 1 is the root segment, limb 3 points towards the image plane, while limb 4 points away from the image plane. Any infinitesimal rotation of this structure (except for rotations around limb 1 and limb 2) will put it into one of the following four binary configurations: [0, 0, 1, 0], [0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 1, 0]

Page 31

Monocular Reconstruction of 3D Human Motion by Qualitative Selection

[M. Eriksson, S. Carlsson, 2004]

Limited Domain:

3D Reconstruction in a Limited Domain

Key Frame Selection

Page 32

Monocular Reconstruction of 3D Human Motion by Qualitative Selection

[M. Eriksson, S. Carlsson, 2004]

Qualitative measure:

Sign of determinant

Hamming distance
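The slide lists two ingredients of the qualitative measure. The Python sketch below shows one plausible reading, which is our assumption rather than the authors' exact definition: the sign of a 3x3 determinant encodes the orientation (counter-clockwise or clockwise) of a triple of image points, and the Hamming distance counts how many entries two such sign vectors, or two binary configurations, disagree on. Key-frame selection could then pick the stored frame with the smallest distance. Function names are hypothetical.

```python
import numpy as np

def orientation_signs(points_2d, triples):
    """Sign of det([[x1, y1, 1], [x2, y2, 1], [x3, y3, 1]]) for each point triple.

    +1 means the triple is ordered counter-clockwise, -1 clockwise, 0 collinear.
    """
    signs = []
    for i, j, k in triples:
        m = np.array([[*points_2d[i], 1.0],
                      [*points_2d[j], 1.0],
                      [*points_2d[k], 1.0]])
        signs.append(int(np.sign(np.linalg.det(m))))
    return signs

def hamming(a, b):
    """Number of positions at which two equal-length code vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))

print(orientation_signs([(0, 0), (1, 0), (0, 1)], [(0, 1, 2), (0, 2, 1)]))  # [1, -1]
print(hamming([1, 0, 1, 0], [1, 1, 1, 0]))  # 1
```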

Page 33

Monocular Reconstruction of 3D Human Motion by Qualitative Selection

[M. Eriksson, S. Carlsson, 2004]

Experimental Results:

Page 34

Monocular Reconstruction of 3D Human Motion by Qualitative Selection

[M. Eriksson, S. Carlsson, 2004]

Page 35

Learning to Reconstruct 3D Human Pose and Motion from Silhouettes

[A. Agarwal, B. Triggs, 2004]

Recover 3D human body pose from image silhouettes

3D pose = joint angles

Use either individual images or video sequences

Page 36

2 Broad Classes of Approaches

• Model-based approaches
  – Presuppose an explicitly known parametric body model
  – Inverting kinematics / numerical optimization
  – Subcase: model-based tracking

• Learning-based approaches
  – Avoid accurate 3D modeling/rendering
  – e.g. example-based methods

Page 37

“Model-Free” Learning-Based Approach

• Recovers 3D pose (joint angles) by direct regression on robust silhouette descriptors

• Sparse kernel-based regressor trained using human motion capture data

Advantages:
• no need to build an explicit 3D model
• easily adapted to different people / appearances
• may be more robust than model-based approaches

Disadvantages:
• harder to interpret than an explicit model, and may be less accurate

Page 38

The Basic Idea

To learn a compact system that directly outputs pose from an image

• Represent the input (image) by a descriptor vector z.
• Write the multi-parameter output (pose) as a vector x.
• Learn a regressor

x = F(z) + ε

Note: this assumes a functional relationship between z and x, which might not really be the case.

Page 39

Silhouette Descriptors

Page 40

Why Use Silhouettes?

• Captures most of the available pose information
• Can (often) be extracted from real images
• Insensitive to colour, texture, clothing
• No prior labeling (e.g. of limbs) required

Limitations:
• Artifacts like attached shadows are common
• Depth ordering / sidedness information is lost

Page 41

Ambiguities

Which arm / leg is forwards? Front or back view? Where is the occluded arm? How much is the knee bent?

The silhouette-to-pose problem is inherently multi-valued, so single-valued regressors sometimes behave erratically.

Page 42

Shape Context Histograms

• Need to capture silhouette shape but be robust against occlusions/segmentation failures
  – Avoid global descriptors like moments
  – Use Shape Context Histograms: distributions of local shape context responses

Page 43

Shape Context Histograms Encode Locality

First 2 principal components of Shape Context (SC) distribution from combined training data, with k-means centres superimposed, and an SC distribution from a single silhouette.

SCs implicitly encode position on the silhouette: a form resembling an average human silhouette is discernible.
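The histogram step can be sketched as vector quantization against the k-means centres: each local shape context votes for its nearest centre, and the normalized vote counts form the descriptor z. Hard nearest-centre voting is a simplifying assumption here, as are the function name and array layouts.

```python
import numpy as np

def shape_context_histogram(descriptors, centres):
    """Bag-of-features histogram of local shape context descriptors.

    descriptors : array (n_points, d) of shape contexts sampled on one silhouette
    centres     : array (k, d) of k-means centres learned from training silhouettes
    Each descriptor votes for its nearest centre (hard assignment); the
    histogram is normalized to sum to 1.
    """
    D = np.asarray(descriptors, dtype=float)
    C = np.asarray(centres, dtype=float)
    dist2 = ((D[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)  # (n_points, k)
    nearest = dist2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(C)).astype(float)
    return hist / hist.sum()
```

Because the vector depends only on how many local contexts fall into each cluster, a few corrupted points from occlusion or segmentation failure change it only slightly.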

Page 44

Nonlinear Regression

Page 45

Regression Model

Predict output vector x (here 3D human pose), given input vector z (here a shape context histogram):

$$x = \sum_{k=1}^{p} a_k\, \phi_k(z) + \varepsilon \equiv A\, f(z) + \varepsilon$$

• {φk(z) | k = 1…p}: basis functions
• A ≡ (a1 a2 … ap)
• f(z) = (φ1(z) φ2(z) … φp(z))ᵀ
• Kernel bases φk = K(z, zk) for given centre points zk and kernel K, e.g. K(z, zk) = exp(-β‖z - zk‖²)
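Evaluating the kernel basis and the prediction x = A f(z) takes only a few lines of Python with NumPy. The sketch assumes the Gaussian kernel given above with a scalar width `beta`, centre points zk stored as rows of an array, and hypothetical function names.

```python
import numpy as np

def kernel_features(z, centres, beta):
    """f(z) = (phi_1(z), ..., phi_p(z)) with phi_k(z) = exp(-beta * ||z - z_k||^2)."""
    z = np.asarray(z, dtype=float)
    diff = np.asarray(centres, dtype=float) - z  # one row per centre z_k
    return np.exp(-beta * np.sum(diff ** 2, axis=1))

def predict_pose(A, z, centres, beta):
    """x = A f(z), where A holds one column a_k per basis function."""
    return A @ kernel_features(z, centres, beta)
```

Only centres near z contribute appreciably, so a sparse A (few nonzero columns) keeps prediction cheap.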

Page 46

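Regularized Least Squares

$$A = \arg\min_{A} \Big\{ \sum_{i=1}^{n} \lVert A\, f(z_i) - x_i \rVert^2 + R(A) \Big\} = \arg\min_{A} \Big\{ \lVert A F - X \rVert^2 + R(A) \Big\}$$

where F ≡ (f(z1) … f(zn)) and X ≡ (x1 … xn) stack the n training examples as columns.

R(A): regularizer / penalty function to control overfitting

Ridge regression:

$$R(A) = \operatorname{trace}(A^\top A)$$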
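With the ridge penalty the minimization has a closed-form solution: setting the gradient of ‖AF - X‖² + λ trace(AᵀA) to zero gives A(FFᵀ + λI) = XFᵀ. The weight λ is our addition (the slide's R(A) corresponds to λ = 1), and the function name is hypothetical. The sketch solves the linear system rather than forming an explicit inverse.

```python
import numpy as np

def fit_ridge(F, X, lam):
    """Minimize ||A F - X||^2 + lam * trace(A^T A) over A in closed form.

    F : (p, n) matrix whose columns are the basis vectors f(z_i)
    X : (d, n) matrix whose columns are the target poses x_i
    Returns A of shape (d, p), satisfying A (F F^T + lam I) = X F^T.
    """
    F = np.asarray(F, dtype=float)
    X = np.asarray(X, dtype=float)
    G = F @ F.T + lam * np.eye(F.shape[0])  # symmetric (p, p)
    # Transposing A G = X F^T gives G A^T = F X^T, a standard linear system
    return np.linalg.solve(G, F @ X.T).T
```

Larger λ shrinks A towards zero, trading training fit for smoother, less overfitted predictions.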

Page 47

Synthetic Spiral Walk Test Sequence

Page 48

Spiral Walk Test Sequence

Mostly OK, but ~15% “glitches” owing to pose ambiguities

Page 49

Glitches

• Results are OK most of the time, but there are frequent “glitches”
  – the regressor either chooses the wrong case of an ambiguous pair, or remains undecided.
• The problem is especially evident for the heading angle, the most visible pose variable.

Page 50

Real Image example

Page 51

Tracking Framework

• Reduce glitches by embedding problem in a tracking framework.

• Idea: use temporal information as a hint to ‘select’ the correct solution

• To include state information, we use the familiar (dynamical prediction) + (observation update) framework, but implement both parts using learned regression models.

Page 52

Joint Regression equations

• Dynamics: 2nd-order linear autoregressive model

$$x'_t \equiv A\, x_{t-1} + B\, x_{t-2}$$

• State-sensitive observation update: nonlinear dependence on the state prediction

$$x_t = C\, x'_t + \sum_{k} d_k\, \phi_k(x'_t, z_t) + \varepsilon$$

[The kernel selects examples close in both z and x space]
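Both regression steps can be sketched together in Python with NumPy. The kernel φk is assumed here to be a Gaussian on the stacked vector (x'_t, z_t) around training pairs (x_k, z_k), which is one way to make it select examples close in both z and x space; the matrices A, B, C and D (whose columns are the d_k) would be learned from training sequences, and all function names are hypothetical.

```python
import numpy as np

def predict_state(A, B, x_prev, x_prev2):
    """Second-order linear autoregressive prediction x'_t = A x_{t-1} + B x_{t-2}."""
    return A @ x_prev + B @ x_prev2

def update_state(C, D, x_pred, z, centres_x, centres_z, beta):
    """x_t = C x'_t + sum_k d_k phi_k(x'_t, z_t), with D = (d_1 ... d_K).

    phi_k is assumed to be exp(-beta * ||(x'_t, z_t) - (x_k, z_k)||^2), so it is
    large only for training examples close to the prediction in both spaces.
    centres_x : (K, state_dim) training states x_k
    centres_z : (K, descriptor_dim) matching training descriptors z_k
    """
    query = np.concatenate([x_pred, z])
    centres = np.hstack([centres_x, centres_z])
    phi = np.exp(-beta * np.sum((centres - query) ** 2, axis=1))
    return C @ x_pred + D @ phi
```

With A = 2I and B = -I the dynamics reduce to constant-velocity extrapolation, a quick sanity check for the predictor.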

Page 53

Spiral Walk Test Sequence

Page 54

Real Images Test Sequence

Weakness: a weak observation may let the dynamical model dominate.

Page 55

Thank You