[footsteps and inchworms€¦ · recover 3d vectors of world motion stereo view 1 stereo view 2 t...

60

Upload: others

Post on 16-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating
Page 2: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

[Footsteps and Inchworms – Anstis, Perception 2001]

Page 3: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

[Footsteps and Inchworms – Anstis, Perception 2001]

Page 4: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

[Footsteps and Inchworms – Anstis, Perception 2001]

Page 5: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

https://scratch.mit.edu/projects/188838060/

Page 6: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Optical flow

Gradient is informative of direction over only < 1 pixel

Page 7: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Reduce the resolution!

Page 8: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

image 2image 1

Gaussian pyramid of image 1 Gaussian pyramid of image 2

image 2image 1 u=10 pixels

u=5 pixels

u=2.5 pixels

u=1.25 pixels

Coarse-to-fine optical flow estimation

Page 9: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

image Iimage J

Gaussian pyramid of image 1 Gaussian pyramid of image 2

image 2image 1

Coarse-to-fine optical flow estimation

run iterative L-K

run iterative L-K

warp & upsample

.

.

.

Page 10: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Optical Flow Results

* From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

Page 11: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Optical Flow Results

* From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

Page 12: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Can we do more? Scene flow

Combine spatial stereo & temporal constraints

Recover 3D vectors of world motion

Stereo view 1 Stereo view 2

t

t-1

3D world motion vector per pixel

z

x

y

Page 13: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Scene flow example for human motion

Estimating 3D Scene Flow from Multiple 2D Optical Flows, Ruttle et al., 2009

Page 14: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Stereo correspondence

• Let x be a point in left image, x’ in right image

• Epipolar relation• x maps to epipolar line l’

• x’ maps to epipolar line ll’l

x x’

Page 15: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

How does a depth camera work?

Intel laptop depth camera

Microsoft Kinect v1

Page 16: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Active stereo with structured light

Project “structured” light patterns onto the object• Simplifies the correspondence problem

• Allows us to use only one camera

camera

projector

L. Zhang, B. Curless, and S. M. Seitz. Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming.

3DPVT 2002

Stereo system!

Page 17: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Kinect: Structured infrared light

http://bbzippo.wordpress.com/2010/11/28/kinect-in-infrared/

Page 18: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

With either technique…

…I gain depth maps over time.

Optex Depth Camera Based on Canesta Solution

Page 19: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Yisheng Zhou

Page 20: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Demo

Page 21: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Real-Time Human Pose Recognition inParts from Single DepthImages

Jamie Shotton et al. ( MS Research & Xbox Incubation )

CVPR 2011

Slides by YoungSun Kwonhttp://sglab.kaist.ac.kr/~sungeui/IR/Presentation/first/20143050권용선.pdf

2014. 11. 11

Page 22: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Background

• Motion Capture ( Mocap )

• Capture a motion from sensors attached to human body

http://www.neogaf.com/forum/showthread.php?t=824332

Page 23: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Background

• Pose Recognition

• Estimate a pose from images and make a skeletal model

http://www.vision.ee.ethz.ch/~hpedemo/fullhpedemo.png

http://www.youtube.com/watch?v=Y-iKWe-U9bY

Page 24: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Background

• Depth Image

• Each pixel has distance information, instead of RGB

Depth Image

Depth Camera

RGB Image

RGB Camerahttp://userpage.fu-berlin.de/~latotzky/wheelchair/wp-content/uploads/kinect1_cropped.png

Page 25: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Why thispaper?

• Main Contribution

• Convert pose recognition problem to classification problem

• One of application for image retrieval technique

Solution

Solution

Pose Recognition Problem

difficult

ClassificationProblem

Pose RecognitionProblem

simple

Page 26: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Why thispaper?

• Main Contribution

• Convert pose recognition problem to classification problem

• One of application for image retrieval technique

SolutionPose Recognition

Problem

difficult

[1] V. Ganapathi et al.,Real-Time motion Captureusing a Single Time-of-Flight camera,CVPR, 2010

[2] C. Palgemann et al.,Real-Time Identification and Localization of Body Parts from DepthImages,ICRA, 2010

Kinematic constraintT-pose initialization

Limited patches Only 3 parts

Page 27: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Why thispaper?

• Main Contribution

• Convert pose recognition problem to classification problem

• One of application for image retrieval technique

SolutionClassification

ProblemPose Recognition

Problem

No constraint, More General

Page 28: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

• Overview

Overview

Classification OutputInput

Training Classifier Dataset

Feature detect

Lecture Note - BoW

Page 29: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

• Overview

Overview

Capture depth image

Calculate feature response per pixel

Classify body parts per pixel

Estimate body joint positions

Kinect Slides CVPR2011.pptx

Training Decision Forest Classifier Synthetic training set

Page 30: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Body PartRepresentation

• 31 body parts ( classes )

• LU/RU/LW/RW head

• Neck

• L/R shoulder

• LU/RU/LW/RW arm

• L/R elbow

• L/R wrist

• L/R hand

• LU/RU/LW/RW torso

• LU/RU/LW/RW leg

• L/R knee

• L/R ankle

• L/R foot

Page 31: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Synthetic dataset

• To account for variations in real world

• Rotation & Translation, Hair, Clothing, Height, Camera Pose, etc…

• Large scale and variety

Record motion captures500K frames and

extract 100K poses among these

Render (depth, body parts) pairs

Create several modelswith variations

Supplementary Material

Page 32: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

• ∆1= 0 , 1

• 𝑓 𝐼, 𝒙| ∆1

• 𝑓 𝐼, 𝒙| ∆3

∆3= −1 ,0

has small value

has large value

Depth ImageFeature Comparison

Calculate feature response for each pixel

• ∆ is chosen in training step randomly

• For example

• Can be trained in parallel on GPUs

Input depth image

∆1

image depth offset depth

𝑓 𝐼, x = 𝑑𝐼 x − 𝑑𝐼(x + Δ)

pixel

∆2

∆3

∆2

∆1

∆3

∆3

∆1

[3] V. Lepetit et al., Randomized trees for real-time keypoint recognition, CVPR, 2005

Feature Response Function

Page 33: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

• Remember Viola-Jones face detector?

• Example of classification for hand(H) or foot(F)

At point𝒙

no

F H

P(c)

𝑓 𝐼, 𝒙|∆1

yes

> 100

no

𝑓 𝐼, 𝒙| ∆3 > 50

yes

F H

P(c)

F H

P(c)

4 T. Amit et al., Shape quantization and recognition with randomized trees, Neural Computation,19975 L. Breiman, Random forests, Mach. Learning,20016 F. Moosmann et al., Fast discriminative visual codebooks using randomized clustering forests, NIPS,2006

Decision tree classifier

Page 34: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Decision ForestClassifier

• In training step, ∆ is chosen randomly

• Generate many trees to build a decision forest

• In testing step, check all trees and compute average probability

………

tree 1 tree T

c

PT(c)

(𝐼, x) (𝐼, x)

c

P1(c)

1𝑃 𝑐 𝐼, x =

𝑇σ𝑡

𝑇 𝑃𝑡(𝑥, 𝐼|𝑐)

Page 35: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

But…normalized in depth

• for Depth Invariance in

Yisheng Zhou

Page 36: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Joint PositionProposal

• Find mode using mean shift algorithm

• With weighted Gaussian kernel

• Using class probabilities for each pixel, find representative positions of classes

Estimate bodyjoint positions

[7] S. Belongie et al., Mean shift: A robust approach toward feature space analysis, PAMI,2002

Page 37: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Mean shift algorithmTry to find modes of a non-parametric density.

Color

space

Color space

clusters

Page 38: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Region of

interest

Center of

mass

Mean Shift

vector

Slide by Y. Ukrainitz & B. Sarel

Mean shift

Page 39: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Region of

interest

Center of

mass

Mean Shift

vector

Slide by Y. Ukrainitz & B. Sarel

Mean shift

Page 40: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Region of

interest

Center of

mass

Mean Shift

vector

Slide by Y. Ukrainitz & B. Sarel

Mean shift

Page 41: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Region of

interest

Center of

mass

Mean Shift

vector

Mean shift

Slide by Y. Ukrainitz & B. Sarel

Page 42: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Region of

interest

Center of

mass

Mean Shift

vector

Slide by Y. Ukrainitz & B. Sarel

Mean shift

Page 43: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Region of

interest

Center of

mass

Mean Shift

vector

Slide by Y. Ukrainitz & B. Sarel

Mean shift

Page 44: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Region of

interest

Center of

mass

Slide by Y. Ukrainitz & B. Sarel

Mean shift

Page 45: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Kernel density estimation

Kernel density estimation function

Gaussian kernel

n = number of points assessedh = ‘bandwidth’, or normalization for size of region

Page 46: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Mean shift clustering

The mean shift algorithm seeks modes of the given set of points1. Choose kernel and bandwidth

2. For each point:a) Center a window on that point

b) Compute the mean of the data in the search window

c) Center the search window at the new mean location

d) Repeat (b,c) until convergence

3. Assign points that lead to nearby modes to the same cluster

Page 47: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Joint PositionProposal

• Find mode using mean shift algorithm

• With weighted Gaussian kernel

• Using class probabilities for each pixel, find representative positions of classes

Estimate bodyjoint positions

3D position3D position pixel of i pixel

of class weight

pixel index i bandwidth

class depth at probability i pixel

[7] S. Belongie et al., Mean shift: A robust approach toward feature space analysis, PAMI,2002

Page 48: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Results

• Fast Joint Proposals

• Max. 200 FPS on Xbox 360 GPU, 50 FPS on 8 core CPU

• Previous work was 4 ~ 16FPS

Input depth image(background segmented)

Inferred body parts

Front view

Side view

Top view

Page 49: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Depth of trees

Page 50: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Offset Size

Page 51: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Results

• Body Parts Classification Accuracy on synthetic test set

• GT body parts ( 0.914 mAP ) vs Our Algorithm ( 0.731 mAP )

Page 52: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Results

• Joint Prediction Accuracy

• How well body joint position is predicted

[1]

Page 53: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Summary

• Body parts representation for efficiency

• Fast, simple machine learning – Decision Forest

• No constraint, high generality

• Significant engineering to scale to a massive, varied training dataset

Page 54: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

VNect – Mehta et al.

Depth information is rich…

…but do we always need it?

Can we learn to predict joint locations from RGB data?

Page 55: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating
Page 56: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Pipeline

Page 57: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Joint position encoding

Page 58: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Training data

Page 59: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating

Architecture

ResNetreminder

Page 60: [Footsteps and Inchworms€¦ · Recover 3D vectors of world motion Stereo view 1 Stereo view 2 t t-1 3D world motion vector per pixel z x y. Scene flow example for human motion Estimating