Hand Pose Estimation
Author: Pratik Kalshet
Supervisor: Parag Chaudhuri
Department of Computer Science and Engineering, Indian Institute of Technology Bombay
RnD Project
Outline: Introduction, Problem Statement, Previous Work, Approach, Results
Applications – Human-computer interaction, Augmented and Virtual Reality, …
Hot research topic – ICCV, CVPR, SIGGRAPH. 2016
Introduction: Motivation
Robert Wang. Nimble VR 2014
Introduction: Challenges
Self-occlusion, self-similarity, noise, high degrees of freedom
Aim – Accuracy and Efficiency
Problem Statement: Hand Pose Estimation
Input – Depth Image (of hand)
Output – Joint Locations in 3-D
Model-driven (Generative) [Mak+15(CVPR)] Synthesize, optimize energy (discrepancy) to get hand pose
Advantage – accurate, valid poses
Disadvantage – slow, local minima (initialization problem)
Data-driven (Discriminative) [Sun+15(CVPR)] Direct regression function – observed image to hand pose
Advantage – fast (real-time)
Disadvantage – coarse results, violate hand geometry
Previous Work: Types of Techniques
Generative Methods
Discriminative Methods
Theobalt. “Real-time Capture of Hands in Motion”. CVPR. 2015
Hybrid [Tay+16(SIGGRAPH)] Initialization using discriminative, refinement using generative
Advantage – accurate, fast
Disadvantage – separate stages lead to sub-optimal results
Previous Work: Types of Techniques
Tompson et al. “Real-time continuous pose recovery of human hands using convolutional networks”. TOG. 2014
Hand prior
Non-linear regression
Previous Work: Issues in Existing Systems
Ge et al. “Robust 3D hand pose estimation in single depth images: from single-view CNN to multi-view CNNs”. CVPR. 2016
Approach: Overview
Input (Depth Image)
Output (3-D Joint Positions)
Deep Network (ConvNet, Kinematic Layer)
Zhou et al. “Model-based Deep Hand Pose Estimation”. IJCAI. 2016
Approach: Pre-processing
Zhang et al. “Accurate per-pixel hand detection from a single depth image”. Optical Engineering. 2017
1. Hand detection
2. Depth normalization
*This is assumed to be done.
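The depth normalization step can be sketched roughly as follows. This is only an illustration, not the pipeline used in the project: the function name `normalize_depth`, the 150 mm depth window, and the convention of mapping background pixels to 1.0 are all assumptions made for the example.

```python
def normalize_depth(patch, center_depth, half_range=150.0):
    """Map raw depth values (mm) in a hand crop to the range [-1, 1].

    Assumptions (hypothetical, for illustration only):
    - `patch` is a 2-D list of depth readings in millimetres,
    - a reading of 0 means "no measurement",
    - pixels farther than `half_range` from the hand centre are background.
    Background and missing pixels are set to 1.0 (the far plane).
    """
    normalized = []
    for row in patch:
        new_row = []
        for depth in row:
            if depth <= 0 or abs(depth - center_depth) > half_range:
                new_row.append(1.0)
            else:
                new_row.append((depth - center_depth) / half_range)
        normalized.append(new_row)
    return normalized


# Tiny 2x3 "crop" with one missing pixel (0) and one far background pixel (2000).
patch = [[0, 700, 720],
         [650, 800, 2000]]
normalized = normalize_depth(patch, center_depth=700.0)
```

Centering on the hand depth makes the network input independent of how far the hand is from the camera.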
Approach: Deep Network
Loss:
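The loss formula on this slide did not survive extraction. As a rough sketch of the idea behind a kinematic layer, the example below maps joint angles to joint positions by forward kinematics and scores them with a squared joint-location error, in the spirit of Zhou et al. Everything concrete here is an assumption for illustration: a planar three-bone finger instead of a full 26-DoF hand, hypothetical bone lengths, and the names `forward_kinematics` and `joint_loss`.

```python
import math


def forward_kinematics(angles, bone_lengths, root=(0.0, 0.0)):
    """Planar kinematic chain: each angle is relative to the previous bone.

    Returns the position of the end of every bone. Because positions are
    built from fixed bone lengths, any angle vector gives a geometrically
    valid finger.
    """
    x, y = root
    total_angle = 0.0
    joints = []
    for theta, length in zip(angles, bone_lengths):
        total_angle += theta
        x += length * math.cos(total_angle)
        y += length * math.sin(total_angle)
        joints.append((x, y))
    return joints


def joint_loss(predicted, target):
    """Half the summed squared distance between predicted and target joints."""
    return 0.5 * sum((px - tx) ** 2 + (py - ty) ** 2
                     for (px, py), (tx, ty) in zip(predicted, target))


bones = [4.0, 3.0, 2.0]                      # hypothetical bone lengths
target = forward_kinematics([0.0, math.pi / 2, 0.0], bones)   # bent finger
predicted = forward_kinematics([0.0, 0.0, 0.0], bones)        # straight finger
loss = joint_loss(predicted, target)
```

In a network, the ConvNet would output the angles and the gradient of this loss would flow back through the kinematic function, so the predicted joints can never violate the bone lengths.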
NYU Hand Pose Dataset
Training samples: 10000
Test samples: 1200
Joints: 31
DoF: 26
Results: Data
Input – Depth Image
Label – Joint Positions in 3-D
Tompson et al. “Real-time continuous pose recovery of human hands using convolutional networks”. TOG. 2014
Results: Qualitative Results
[Figure: input depth images, predictions, and ground truth]
Results: Comparative Study
[Figure: input, prediction, and ground truth, shown without and with the kinematic layer]
Results: Comparison with state-of-the-art
Technique              Error
No prior               6395.45
Existing best prior    4699.16
Kinematic prior        3079.38
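To put the table in relative terms, the short calculation below computes the fractional error reduction of the kinematic prior against the other two rows. The error values are copied from the table; the units are not stated in the slides, and the helper name `relative_reduction` is made up for this example.

```python
# Error values as reported in the comparison table (units not stated).
errors = {
    "No prior": 6395.45,
    "Existing best prior": 4699.16,
    "Kinematic prior": 3079.38,
}


def relative_reduction(baseline, new):
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


vs_no_prior = relative_reduction(errors["No prior"], errors["Kinematic prior"])
vs_best_prior = relative_reduction(errors["Existing best prior"],
                                   errors["Kinematic prior"])

print(f"vs. no prior:            {vs_no_prior:.1%}")
print(f"vs. existing best prior: {vs_best_prior:.1%}")
```

On these numbers the kinematic prior lowers the error by roughly 52% relative to no prior and roughly 34% relative to the existing best prior.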
Hand kinematic prior in a deep network
Achieved competitive results
Conclusion: Summary
Future Work
Multi-view CNN
Temporal data for tracking
Physics-based constraint layer