d , human pose and camera posefiles.is.tue.mpg.de/jgall/tutorials/slides/cvpr...joint work with...
TRANSCRIPT
![Page 1: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/1.jpg)
DEPTH, HUMAN POSE, AND CAMERA POSEJAMIE SHOTTON
![Page 2: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/2.jpg)
Kinect Adventures
• Depth sensing camera
• Tracks 20 body joints in real time
• Recognises your face and voice
![Page 3: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/3.jpg)
![Page 4: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/4.jpg)
top view
side view
depth image(camera view)
What the Kinect Sees
![Page 5: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/5.jpg)
Structured light
x
yz
baseline
imagingplane
optic centreof camera
optic centreof IR laser
object atdepth d1
object atdepth d2
![Page 6: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/6.jpg)
Depth Makes Vision That Little Bit Easier
RGB
Only works well lit
Background clutter
Scale unknown
Color and texture variation
DEPTH
Works in low light
Background removal easier
Calibrated depth readings
Uniform texture
![Page 7: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/7.jpg)
![Page 8: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/8.jpg)
Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet
Kohli, Steve Hodges, Andrew Davison, Andrew Fitzgibbon. SIGGRAPH, UIST and ISMAR 2011.
KIN
ECTFusion
![Page 9: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/9.jpg)
KIN
ECTFusion
Camera drift
![Page 10: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/10.jpg)
ROADMAP
THEVITRUVIAN MANIFOLD
[CVPR 2012]
SCENE COORDINATE REGRESSION
[CVPR 2013]
![Page 11: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/11.jpg)
THE VITRUVIAN MANIFOLD
Jonathan Taylor Jamie Shotton Toby Sharp Andrew Fitzgibbon
CVPR 2012
![Page 12: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/12.jpg)
Human Pose Estimation
In this work:
• Single frame at a time (no tracking)
• Kinect depth image as input (background removed)
Given some image input, recover the 3D human pose:
Joint positions and angles
![Page 13: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/13.jpg)
Why is Pose Estimation Hard?
![Page 14: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/14.jpg)
A Few Approaches
Regress directly to pose?e.g. [Gavrila ’00] [Agarwal & Triggs ’04]
Per-Pixel Body Part Classification[Shotton et al. ‘11]
Per-Pixel Joint Offset Regression[Girshick et al. ‘11]
Detect and assemble parts?e.g. [Felzenszwalb & Huttenlocher ’00] [Ramanan & Forsyth ’03] [Sigal et al. ’04]
Detect parts?e.g. [Bourdev & Malik ‘09] [Plagemann et al. ‘10] [Kalogerakis et al. ‘10]
![Page 15: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/15.jpg)
body joint hypotheses
front view side view top view
input depth image body parts
BPC Clustering
Background: Learning Body Parts for Kinect
[Shotton et al. CVPR 2011]
![Page 16: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/16.jpg)
Synthetic Training Data
Train invariance to:
Record mocap100,000s of poses
Retarget to varied body shapes
Render (depth, body parts) pairs
[Vicon]
![Page 17: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/17.jpg)
Depth Image Features
• Depth comparisons
– very fast to compute
inputdepthimage
xΔ
xΔ
xΔx
Δ
x
Δ
x
Δ
f(x; v) = 𝑑 x − 𝑑(x+ Δ)
offset depth
image coordinate
offset depth
featureresponse
Background pixelsd = large constant
scales inversely with depth
Δ =𝐯
𝑑 x
![Page 18: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/18.jpg)
Decision tree classification
image windowcentred at x
no
no yes
yes
P(c)P(c)
f(x; v1) > θ1
f(x; v2) > θ2
no yes
P(c)P(c)
f(x; v3) > θ3
![Page 19: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/19.jpg)
Training Decision Trees
Sn = x
f(x; vn) > θn
no yes
c
Pr(c)
body part c
Pn(c)
c
Pl(c)
Take (v, θ) that maximisesinformation gain:
n
l r
Goal: drive entropyat leaf nodesto zero
reduceentropy
[Breiman et al. 84]
for all pixels
Δ𝐸 = −𝑆l𝑆𝑛𝐸(Sl) −
𝑆r𝑆𝑛𝐸(Sr)
![Page 20: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/20.jpg)
Decision Forests Book
• Theory – Tutorial & Reference
• Practice – Invited Chapters
• Software and Exercises
• Tricks of the Trade
![Page 21: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/21.jpg)
input depth inferred body parts
no tracking or smoothing
![Page 22: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/22.jpg)
body joint hypotheses
front view side view top view
input depth image body parts
BPC Clustering
![Page 23: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/23.jpg)
front view top viewside view
input depth inferred body parts
inferred joint position hypothesesno tracking or smoothing
![Page 24: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/24.jpg)
Single frame at a time –> robust
Large training corpus -> invariant
Fast, parallel implementation
Skeleton does not explain the depth dataLimited ability to cope with hard poses
Body Part Recognition in Kinect
![Page 25: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/25.jpg)
Explain the data directly with a mesh model[Ballan et al. ‘08] [Baak et al. ‘11]
• GOOD: Full skeleton• GOOD: Kinematic constraints enforced from the outset• GOOD: Able to cope with occlusion and cropping• BAD: Many local minima• BAD: Highly sensitive to initial guess• BAD: Potentially slow
A few approaches
![Page 26: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/26.jpg)
𝑅l_arm(𝜃)
• Mesh is attached to a hierarchical skeleton
• Each limb 𝑙 has a transformation matrix 𝑇𝑙 𝜃relating its local coordinate system to the world:
• 𝑅global(𝜃) encodes a global scaling, translation and rotation
• 𝑅𝑙(𝜃) encodes a rotation and fixed translation relative to its parent
• 13 parameterized joints using quaternions to represent unconstrained rotations
• This gives 𝜃 a total of 1 + 3 + 4 + 4 ∗ 13 = 60 degrees of freedom
𝑅global(𝜃)𝑇root 𝜃 = 𝑅global(𝜃)
𝑇𝑙 𝜃 = 𝑇parent 𝑙 𝜃 𝑅𝑙(𝜃)
Human Skeleton Model
![Page 27: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/27.jpg)
Linear Blend Skinning
𝑀 𝑢; 𝜃 =
𝑘=1
𝐾
𝛼𝑘𝑇𝑙𝑘 𝜃 𝑇𝑙𝑘−1 𝜃0 𝑝
Each vertex 𝑢
• has position 𝑝 in base pose 𝜃0• is attached to K limbs 𝑙𝑘 𝑘=1
𝐾 with weights 𝛼𝑘 𝑘=1𝐾
In a new pose 𝜃, the skinned position 𝑢 of is:
Mesh in base pose 𝜃0
position in limb lk’scoordinate system
position in world coordinate system
![Page 28: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/28.jpg)
min𝜃min𝑢1…𝑢𝑛
𝑖
𝑑(𝑥𝑖 , 𝑀 𝑢𝑖; 𝜃 )
Test Time Model Fitting• Assume each observation 𝑥𝑖 is generated by a point on our model 𝑢𝑖
𝑥𝑖 = 𝑀 𝑢𝑖; 𝜃
What pose isthe model in?
Observed 3D Point Predicted 3D Point• Optimize:
𝐶𝑜𝑟𝑟𝑒𝑠𝑝𝑜𝑛𝑑𝑖𝑛𝑔𝑀𝑜𝑑𝑒𝑙 𝑃𝑜𝑖𝑛𝑡𝑠: 𝑢1, … 𝑢𝑛
𝑂𝑏𝑠𝑒𝑟𝑣𝑒𝑑 𝑃𝑜𝑖𝑛𝑡𝑠: 𝑥1, … , 𝑥𝑛
𝑥𝑖
Note: simplified energy - more details to come
![Page 29: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/29.jpg)
Optimizing
• Alternating between pose 𝜃 and correspondences 𝑢1, … 𝑢𝑛Articulated Iterative Closest Point (ICP)
• Traditionally, start from initial 𝜃
• manual initialization
• track from previous frame
• Could we instead infer initial correspondences 𝑢1, … 𝑢𝑛 discriminatively?
• And, do we even need to iterate?
min𝜃min𝑢1…𝑢𝑛
𝑖
𝑑(𝑥𝑖 , 𝑀 𝑢𝑖; 𝜃 )
![Page 30: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/30.jpg)
One-Shot Pose Estimation: An Early Result
Can we achieve a good result without iterating between pose 𝜃 and correspondences 𝑢1, … 𝑢n?
ground truthcorrespondences
testdepth image
convergencevisualization
![Page 31: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/31.jpg)
Texture is mapped across body shapes and poses
From Body Parts to Dense Correspondences
increasing number of parts
classification regression
The “Vitruvian Manifold”Body Parts
![Page 32: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/32.jpg)
The “Vitruvian Manifold” Embedding in 3D
v = 1
v = -1
u = -1 u = 1
w = -1 [L. Da Vinci, 1487]
w = 1
Geodesic surface distances approximated by Euclidean distance
![Page 33: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/33.jpg)
Overview
inferred dense correspondences
testimages
regression forest
…
energyfunction
op
tim
izat
ion
of
mo
de
l par
amet
ers 𝜃
final optimized poses
front right top
trainingimages
![Page 34: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/34.jpg)
Discriminative Model: Predicting Correspondences
inputimages
inferred dense correspondences
regression forest
…
trainingimages
![Page 35: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/35.jpg)
Learning the Correspondences
• How to learn the mapping from depth pixels to correspondences?𝑥𝑖
• Render synthetic training set:
rendercharacters
mocap
• Train regression forest
![Page 36: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/36.jpg)
mean shift
mode detection
Each pixel-correspondence pair descends to a leaf in the tree
Learning a Regression Model at the Leaf Nodes
![Page 37: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/37.jpg)
Inferring Correspondences
![Page 38: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/38.jpg)
infer correspondences 𝑈
optimize parameters
min𝜃𝐸(𝜃, 𝑈)
![Page 39: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/39.jpg)
Full Energy
• Term Evis approximates hidden surface removal and uses robust error
• Gaussian prior term Eprior
• Self-intersection prior term Eint approximates interior volume
𝐸 𝜃, 𝑈 =𝜆vis𝐸vis 𝜃, 𝑈 + 𝜆prior𝐸prior 𝜃 + 𝜆int𝐸int 𝜃
Energy is robust to noisy correspondences
• Correspondences far from their image points are “ignored”
• Correspondences facing away from the camera are “ignored”• avoids model getting stuck in front of the image pixels
𝜌(𝑒)
𝑒 = 0
𝑐𝑠(𝜃0) 𝑐𝑡(𝜃0)
![Page 40: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/40.jpg)
![Page 41: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/41.jpg)
![Page 42: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/42.jpg)
“Easy” Metric: Average Joint Accuracy
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
0 0.05 0.1 0.15 0.2
Join
ts a
vera
ge a
ccu
racy
(% jo
ints
wit
hin
dis
tan
ce D
)
D: max allowed distance to GT (m)
Our algorithm
Given GT u
Optimal θ𝜃
𝑈
Results on 5000 synthetic images
![Page 43: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/43.jpg)
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
0 0.05 0.1 0.15 0.2 0.25 0.3
Wo
rst-
case
acc
ura
cy(%
fra
me
s w
ith
all
join
ts w
ith
in d
ist.
D)
D: max allowed distance to GT (m)
Our algorithm
Given GT u
Optimal θ
Hard Metric:“Perfect” Frame Accuracy
𝜃
𝑈Results on 5000synthetic images
0.09m 0.11m 0.17m 0.21m 0.45mD:
![Page 44: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/44.jpg)
Comparison
0%
10%
20%
30%
40%
50%
60%
70%
0 0.05 0.1 0.15 0.2 0.25 0.3
Wo
rst
case
acc
ura
cy(%
fra
me
s w
ith
all
join
ts w
ith
in d
ist.
D)
D: max allowed distance to GT (m)
Our algorithm
[Shotton et al. '11] (top hypothesis)
[Girshick et al. 11] (top hypothesis)
[Shotton et al. '11] (best of top 5)
[Girshick et al. '11] (best of top 5)
Require an oracle
Achievable algorithms
Results on 5000 synthetic images
Vitruvian Manifold
![Page 45: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/45.jpg)
Sequence Result
Each frame fit independently: no temporal information used
![Page 46: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/46.jpg)
![Page 47: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/47.jpg)
• Easily extended to Q views where each view has
– 𝑛𝑞 correspondences per view
– viewing matrix 𝑃𝑞 to register the scene
• Can also extend to 2D silhouette views
– let data points 𝑥𝑖𝑘 be 2D image coordinates
– let 𝑃𝑞 include a projection to 2D
– minimize re-projection error
Generalization to Multiple 3D/2D Views
min𝜃
𝑞=1
𝑄
𝑖
𝑛𝑞
𝑑(𝑥𝑖𝑞 , 𝑃𝑞𝑀 𝑢𝑖𝑞; 𝜃 )
![Page 48: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/48.jpg)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2
Wo
rst
case
acc
ura
cy(%
fra
me
s w
ith
all
join
ts w
ith
in d
ist.
D)
D: max allowed distance to GT (m)
2 silhouette views
3 silhouette views
5 silhouette views
1 depth view
2 depth views
5 depth views
Silhouette Experiment
![Page 49: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/49.jpg)
Vitruvian Manifold: Summary
• Predict per-pixel image-to-model correspondences
– train invariance to body shape, size, and pose
• “One-shot” pose estimation
– fast, accurate
– auto-initializes using correspondences
![Page 50: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/50.jpg)
SCENE COORDINATE REGRESSION FORESTS
FOR CAMERA RELOCALIZATION IN RGB-D IMAGES
JAMIE SHOTTON BEN GLOCKER CHRISTOPHER ZACH SHAHRAM IZADI ANTONIO CRIMINISI ANDREW FITZGIBBON
[CVPR 2013]
![Page 51: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/51.jpg)
Know this
Observe this
Where is this?
6D camera pose, 𝐻(camera to scene transformation)
Single RGB-D frame
A world scene
![Page 52: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/52.jpg)
APPLICATIONS
Lost or kidnapped robots
Improving KinectFusion
Augmented reality
![Page 53: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/53.jpg)
TYPICAL APPROACHES TO CAMERA LOCALIZATION
Tracking – alignment relative to previous frame e.g. [Besl & MacKay ‘92]
Key point detection → local descriptors → matching → geometric verification
e.g. [Holzer et al. ‘12], [Winder & Brown ‘07], [Lepetit & Fua ‘06], [Irschara et al. ‘09]
Whole key-frame matching e.g. [Klein & Murray 2008] [Gee & Mayol-Cuevas 2012]
Epitomic location recognition [Ni et al. 2009]approximate
precise
![Page 54: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/54.jpg)
PROBLEMS IN REAL WORLD CAMERA LOCALIZATION
The real world is less exciting than vision researchers might like
sparse interest points can fail
The real world is big
?
!
![Page 55: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/55.jpg)
KEY IDEA: SCENE COORDINATE REGRESSION
Scene coordinate XYZ
RGB color space
![Page 56: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/56.jpg)
KEY IDEA: SCENE COORDINATE REGRESSION
Let each pixel predict direct correspondence
to 3D point in scene coordinates:
AB C
Input RGB Input Depth Desired Correspondences
A
B
C Scene coordinate XYZ RGB color space
3D model from KinectFusion
(only used for visualization)
![Page 57: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/57.jpg)
SCENE COORDINATE REGRESSION
Offline approach to relocalization
observe a scene
train a regression forest
revisit the scene
Aim for really precise localization
e.g. suitable for AR overlays
from a single frame
without an explicit 3D model
p
ℳ𝑙1 𝐩
[Bunny: Stanford]
![Page 58: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/58.jpg)
SCENE COORDINATE REGRESSION (SCORE) FORESTS
RGB
Depth
…
tree 1p
ℳ𝑙1 𝐩
ptree T
ℳ𝑙𝑇 𝐩
Depth & RGB
features
SCoRe Forest
𝛿1𝐷(𝐩)
𝛿2𝐷(𝐩)
Leaf Predictions ℳ𝑙 ⊂ ℝ3
𝐩
Forest Predictions ℳ 𝐩 =
𝑡
ℳ𝑙𝑡(𝐩)
![Page 59: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/59.jpg)
TRAINING A SCORE FOREST
RGB-D frames with known camera poses 𝐻
Generate 3D pixel labels automatically:
𝐦 = 𝐻𝐱
Training Data
RGB Depth
𝐱Labels
𝐦
Learning (standard)
Greedily train tree
Reduction in spatial variance objective:
Regression, not classification
Mean shift to summarize distribution
at leaf 𝑙 into small set ℳ𝒍 ⊂ ℝ3
![Page 60: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/60.jpg)
ROBUST CAMERA POSE OPTIMIZATION
pixel index
camera pose
robust
error function
correspondences
predicted
by forest
at pixel 𝑖
Energy Function Optimization
Preemptive RANSAC
[Nistér ICCV 2003]
With pose refinement
[Chum et al. DAGM 2003]
efficient updates to means & covariancesused by Kabsch SVD
Only a small subset of pixels used
![Page 61: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/61.jpg)
INLYING FOREST PREDICTIONS
Ground truth Inferred
Test images Inliers for six hypotheses
𝐻1
𝐻2
𝐻3
𝐻4
𝐻5
𝐻6
Camera pose
![Page 62: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/62.jpg)
PREEMPTIVE RANSAC OPTIMIZATION
![Page 63: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/63.jpg)
THE 7SCENES DATASET
Heads
Pumpkin
RedKitchen
Stairs
Dataset available from authors
![Page 64: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/64.jpg)
BASELINES FOR COMPARISON
ORB matching
[Rublee et al. ICCV 2011]
FAST detector
Rotation aware BRIEF descriptor
Hashing for matching
Geometric verification
RANSAC & perspective 3 point
Final refinement given inliers
Sparse Key-Points (RGB only) Tiny-Image Key-Frames (RGB & Depth)
Downsample to 40x30 pixels
Blur
Normalized Euclidean distance
Brute-force search
Interpolation of 100 closest poses
[Klein & Murray ECCV 2008]
[Gee & Mayol-Cuevas BMVC 2012]
![Page 65: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/65.jpg)
QUANTITATIVE COMPARISON
Choice of different image features
Proportion of test frames with < 0.05m translational error and < 5○ angular error
Metric:
Results:
![Page 66: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/66.jpg)
QUALITATIVE COMPARISON
ground truth DA-RGB SCoRe forest sparse baseline closest training pose
![Page 67: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/67.jpg)
QUALITATIVE COMPARISON
ground truth DA-RGB SCoRe forest sparse baseline closest training pose
![Page 68: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/68.jpg)
TRACK VISUALIZATION VIDEOS
ground truth
DA-RGB SCoRe forest
RGB sparse baseline
single frame at a time – no tracking
![Page 69: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/69.jpg)
AR VISUALIZATION
RGB input
+ AR overlay
depth input
+ AR overlay
rendering of model
from inferred pose
single frame at a time – no tracking
[Bunny: Stanford]
![Page 70: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/70.jpg)
SIMPLE ROBUST TRACKING
Add a single extra hypothesis to optimization: the result from previous frame
Single frame
![Page 71: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/71.jpg)
AR VISUALIZATION WITH TRACKING
RGB input
+ AR overlay
depth input
+ AR overlay
rendering of model
from inferred pose
simple robust frame-to-frame tracking enabled
[Bunny: Stanford]
![Page 72: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/72.jpg)
MODEL-BASED REFINEMENT
Model-based refinement
requires 3D model of scene
run rigid ICP from our inferred pose between observed image and model
0%
20%
40%
60%
80%
100%
Chess Fire Heads Office Pumpkin RedKitchen StairsPro
po
rtio
n o
f fr
ames
co
rrec
t
Baseline: Tiny-Image Depth
Baseline: Tiny-Image RGB
Baseline: Tiny-Image RGB-D
Baseline: Sparse RGB
Ours: Depth
Ours: DA-RGB
Ours: DA-RGB + D
[Besl & McKay PAMI 1992]
![Page 73: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/73.jpg)
AR VISUALIZATION WITH TRACKING AND REFINEMENT
RGB input
+ AR overlay
depth input
+ AR overlay
rendering of model
from inferred pose
simple robust frame-to-frame tracking and ICP-based model refinement enabled
[Bunny: Stanford]
![Page 74: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/74.jpg)
Fire Scene
SCoRe Forest
(single frame at a time)
SCoRe Forest
+
simple robust
frame-to-frame tracking
SCoRe Forest
+
simple robust
frame-to-frame tracking
+
ICP refinement to 3D model
RGB input
+ AR overlay
depth input
+ AR overlay
rendering of model
from inferred pose
[Bunny:
Sta
nfo
rd]
![Page 75: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/75.jpg)
Pumpkin Scene
RGB input
+ AR overlay
depth input
+ AR overlay
rendering of model
from inferred pose
SCoRe Forest
(single frame at a time)
SCoRe Forest
+
simple robust
frame-to-frame tracking
SCoRe Forest
+
simple robust
frame-to-frame tracking
+
ICP refinement to 3D model [Bunny, A
rmad
illo: S
tanfo
rd]
![Page 76: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/76.jpg)
SCENE RECOGNITION
Ch
ess
Fire
Hea
ds
Off
ice
Pu
mp
kin
Red
Kit
chen
Stai
rs
Chess 100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
Fire 2.0% 98.0% 0.0% 0.0% 0.0% 0.0% 0.0%
Heads 0.0% 0.0% 100.0% 0.0% 0.0% 0.0% 0.0%
Office 0.0% 0.5% 4.0% 95.5% 0.0% 0.0% 0.0%
Pumpkin 0.0% 0.0% 0.0% 0.0% 100.0% 0.0% 0.0%
RedKitchen 2.8% 1.2% 3.6% 0.0% 0.0% 92.4% 0.0%
Stairs 0.0% 0.0% 10.0% 0.0% 0.0% 0.0% 90.0%
Train one SCoRe Forest per scene
Test frame against all scenes
Scene with lowest energy wins
Single frame only
![Page 77: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/77.jpg)
SCENE COORDINATE REGRESSION - SUMMARY
Scene coordinate regression forests
provide a single-step alternative to detection/description/matching pipeline
can be applied at any valid pixel, not just at interest points
allow accurate relocalization without explicit 3D model
Tracking-by-detection is approaching temporal tracking accuracy
![Page 78: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/78.jpg)
Unifying principal:
Per-pixel regression and per-image model fitting
Depth cameras are having huge impact
Decision forests + big data
WRAP UP
![Page 79: D , HUMAN POSE AND CAMERA POSEfiles.is.tue.mpg.de/jgall/tutorials/slides/CVPR...Joint work with Shahram Izadi, Richard Newcombe, David Kim, Otmar Hilliges, David Molyneaux, Pushmeet](https://reader034.vdocuments.us/reader034/viewer/2022050516/5fa015587e782429e9163989/html5/thumbnails/79.jpg)
Thank you!
With thanks to:Andrew Fitzgibbon, Jon Taylor, Ross Girshick, Mat Cook, Andrew Blake, Toby Sharp, Pushmeet Kohli, Ollie Williams, Sebastian Nowozin, Antonio Criminisi, Mihai Budiu, Duncan Robertson, John Winn, Shahram Izadi
The whole Kinect team, especially: Alex Kipman, Mark Finocchio,Ryan Geiss, Richard Moore, Robert Craig, Momin Al-Ghosien,Matt Bronder, Craig Peeper