hand pose estimation rnd project - cse, iit bombaypratikm/projectpages/deeplearningforpo… ·...



Hand Pose Estimation

Author: Pratik Kalshet
Supervisor: Parag Chaudhuri

Department of Computer Science and Engineering
Indian Institute of Technology Bombay

RnD Project

Outline

Introduction
Problem Statement
Previous Work
Approach
Results

Introduction: Motivation

Applications – human-computer interaction, augmented and virtual reality, …
Hot research topic – ICCV, CVPR, SIGGRAPH. 2016

Robert Wang. Nimble VR. 2014

Introduction: Challenges

Self-occlusion
Self-similarity
Noise
High degree of freedom

Problem Statement: Hand Pose Estimation

Input – Depth Image (of hand)
Output – Joint Locations in 3-D

Aim – Accuracy and Efficiency

Previous Work: Types of Techniques

Model-driven (Generative) [Mak+15(CVPR)]
Synthesize, optimize energy (discrepancy) to get hand pose
Advantage – accurate, valid poses
Disadvantage – slow, local minima (initialization problem)

Data-driven (Discriminative) [Sun+15(CVPR)]
Direct regression function – observed image to hand pose
Advantage – fast (real-time)
Disadvantage – coarse results, violate hand geometry

Figure panels: Generative Methods, Discriminative Methods
Theobalt. “Real-time Capture of Hands in Motion”. CVPR. 2015
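
The initialization problem of model-driven methods can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the 1-D "depth image", the Gaussian-bump renderer `synthesize`, and the finite-difference optimizer `fit` are hypothetical and are not taken from any of the cited systems.

```python
import numpy as np

# Pixel coordinates of a toy 1-D "depth image".
XS = np.linspace(-5.0, 15.0, 401)

def synthesize(theta, width=0.5):
    """Toy renderer: a hand at position theta appears as a Gaussian bump."""
    return np.exp(-0.5 * ((XS - theta) / width) ** 2)

def energy(theta, observed):
    """Discrepancy between the synthesized image and the observed image."""
    return float(np.sum((synthesize(theta) - observed) ** 2))

def fit(theta0, observed, lr=0.01, steps=500, eps=1e-4):
    """Minimize the energy by gradient descent with central finite differences."""
    theta = theta0
    for _ in range(steps):
        grad = (energy(theta + eps, observed) - energy(theta - eps, observed)) / (2 * eps)
        theta -= lr * grad
    return theta

observed = synthesize(6.0)   # the true pose parameter is 6.0
near = fit(5.5, observed)    # started close to the true pose
far = fit(1.0, observed)     # started far from the true pose

print(f"start 5.5 -> {near:.3f}, start 1.0 -> {far:.3f}")
```

Started near the true pose, the optimizer recovers it. Started far away, the synthesized and observed bumps do not overlap, the energy is flat, and the estimate barely moves. This is why hybrid methods use a fast discriminative estimate to initialize the generative refinement.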

Previous Work: Types of Techniques

Hybrid [Tay+16(SIGGRAPH)]
Initialization using a discriminative method, refinement using a generative method
Advantage – accurate, fast
Disadvantage – separate stages lead to sub-optimal results

Tompson et al. “Real-time continuous pose recovery of human hands using convolutional networks”. TOG. 2014

Previous Work: Issues in Existing Systems

Hand prior
Non-linear regression

Ge. “Robust 3D hand pose estimation in single depth images: from single-view CNN to multi-view CNNs”. CVPR. 2016

Approach: Overview

Input (Depth Image)
Deep Network (ConvNet, Kinematic Layer)
Output (3-D Joint Positions)

Zhou et al. “Model-based Deep Hand Pose Estimation”. IJCAI. 2016
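
A kinematic layer maps pose parameters (joint angles) to joint positions through forward kinematics, so the output always respects the bone lengths of the hand model. The sketch below is a minimal illustration under strong assumptions: a single planar finger with three bones, each rotating about the z-axis. The function name `kinematic_layer` and the bone lengths are hypothetical; the layer in the cited work covers the full hand.

```python
import numpy as np

def kinematic_layer(joint_angles, bone_lengths):
    """Map flexion angles (radians) of one planar finger to 3-D joint positions.

    Each bone rotates about the z-axis relative to its parent, so the absolute
    direction of bone i is the cumulative sum of the angles up to i.
    """
    joint_angles = np.asarray(joint_angles, dtype=float)
    bone_lengths = np.asarray(bone_lengths, dtype=float)
    absolute = np.cumsum(joint_angles)
    offsets = np.stack(
        [bone_lengths * np.cos(absolute),
         bone_lengths * np.sin(absolute),
         np.zeros_like(absolute)],
        axis=1,
    )
    # Joint positions are the running sum of bone offsets from a root at the origin.
    return np.cumsum(offsets, axis=0)

lengths = [4.0, 3.0, 2.0]  # illustrative bone lengths, arbitrary units
straight = kinematic_layer([0.0, 0.0, 0.0], lengths)
bent = kinematic_layer([0.0, np.pi / 2, 0.0], lengths)

# Distances between consecutive joints equal the bone lengths for any angles.
bent_bones = np.linalg.norm(np.diff(bent, axis=0), axis=1)
print(bent.round(3))
print(bent_bones)
```

Because the mapping is built from sines, cosines, and sums, it is differentiable, which is what allows such a layer to sit at the end of a ConvNet and be trained end to end.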

Approach: Pre-processing

1. Hand detection
2. Depth normalization

*Both steps are assumed to be already done.

Zhang et al. “Accurate per-pixel hand detection from a single depth image”. Optical Engineering. 2017
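
One common way to normalize depth is sketched below. It is not necessarily the procedure used in this project: the function name `normalize_depth`, the 150 mm half extent, the use of the median as the hand center, and the convention that 0 marks a missing pixel are all assumptions.

```python
import numpy as np

def normalize_depth(crop_mm, half_extent_mm=150.0):
    """Center a cropped hand depth map on its median depth and scale to [-1, 1].

    Pixels with depth 0 are treated as missing or background and are set to
    the far plane (+1).
    """
    crop = np.asarray(crop_mm, dtype=float)
    valid = crop > 0
    if not valid.any():
        raise ValueError("crop contains no valid depth pixels")
    center = np.median(crop[valid])
    out = (crop - center) / half_extent_mm
    out[~valid] = 1.0
    return np.clip(out, -1.0, 1.0)

# A tiny 2x3 crop in millimetres; zeros stand for background pixels.
crop = np.array([[0.0, 600.0, 650.0],
                 [550.0, 600.0, 0.0]])
normalized = normalize_depth(crop)
print(normalized.round(3))
```

Normalizing relative to the hand's own depth makes the network input independent of how far the hand is from the camera.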

Approach: Deep Network

Loss:
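
For reference, the model-based network of Zhou et al. cited in the overview is trained with a loss of roughly the following form. Treating it as the loss used in this project is an assumption, and the symbols are chosen here for illustration: $\Theta$ are the pose parameters predicted by the ConvNet, $F$ is the kinematic layer, $Y$ are the ground-truth joint positions, $[\underline{\theta}_i, \overline{\theta}_i]$ is the valid range of the $i$-th joint angle, and $\lambda$ is a weighting factor.

```latex
\begin{align}
L_{\mathrm{jt}}(\Theta)  &= \tfrac{1}{2}\,\lVert F(\Theta) - Y \rVert^{2} \\
L_{\mathrm{phy}}(\Theta) &= \sum_{i} \Bigl[ \max\bigl(\underline{\theta}_i - \theta_i,\, 0\bigr)
                                      + \max\bigl(\theta_i - \overline{\theta}_i,\, 0\bigr) \Bigr] \\
L(\Theta)                &= L_{\mathrm{jt}}(\Theta) + \lambda\, L_{\mathrm{phy}}(\Theta)
\end{align}
```

The first term penalizes joint position error after the kinematic layer; the second penalizes joint angles that leave their physically valid range.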

Results: Data

NYU Hand Pose Dataset
Training samples: 10000
Test samples: 1200
Joints: 31
DoF: 26

Input – Depth Image
Label – Joint Positions in 3-D

Tompson et al. “Real-time continuous pose recovery of human hands using convolutional networks”. TOG. 2014

Results: Qualitative Results

Figure panels: Input, Prediction, Ground Truth

Results: Comparative Study

Figure panels: Input, Prediction, Ground Truth, shown Without Kinematic Layer and With Kinematic Layer

Results: Comparison with state-of-the-art

Technique              Error
No prior               6395.45
Existing best prior    4699.16
Kinematic prior        3079.38
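
The relative improvement implied by these numbers can be worked out directly. The snippet below is a small arithmetic sketch using only the three error values from the table; the helper name `relative_reduction` is hypothetical, and the error unit is not stated in the table.

```python
# Error values copied from the comparison table.
errors = {
    "No prior": 6395.45,
    "Existing best prior": 4699.16,
    "Kinematic prior": 3079.38,
}

def relative_reduction(baseline, improved):
    """Percentage by which `improved` lowers the error relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline

ours = errors["Kinematic prior"]
reductions = {
    name: relative_reduction(value, ours)
    for name, value in errors.items()
    if name != "Kinematic prior"
}

for name, pct in reductions.items():
    print(f"Kinematic prior vs. {name}: {pct:.2f}% lower error")
```

The kinematic prior lowers the error by roughly 52% relative to no prior and by roughly 34% relative to the existing best prior.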

Conclusion: Summary

Hand kinematic prior in a deep network
Achieved competitive results

Future Work

Multi-view CNN
Temporal data for tracking
Physics-based constraint layer