database-based hand pose estimation cse 6367 – computer vision vassilis athitsos university of...

35
Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

Upload: neil-daniel

Post on 15-Jan-2016

226 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

Database-Based Hand Pose Estimation

CSE 6367 – Computer VisionVassilis Athitsos

University of Texas at Arlington

Page 2: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

2

Static Gestures (Hand Poses)

Input image Articulated hand model

• Given a hand model, and a single image of a hand, estimate:– 3D hand shape (joint angles).– 3D hand orientation.

Joints

Page 3: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

3

Static Gestures

Input image Articulated hand model

• Given a hand model, and a single image of a hand, estimate:– 3D hand shape (joint angles).– 3D hand orientation.

Page 4: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

4

Goal: Hand Tracking Initialization

• Given the 3D hand pose in the previous frame, estimate it in the current frame.– Problem: no good way to automatically

initialize a tracker.

Rehg et al. (1995), Heap et al. (1996), Shimada et al. (2001),

Wu et al. (2001), Stenger et al. (2001), Lu et al. (2003), …

Page 5: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

5

Assumptions in Our Approach

• A few tens of distinct hand shapes.– All 3D orientations should be allowed.– Motivation: American Sign Language.

Page 6: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

6

Assumptions in Our Approach

• A few tens of distinct hand shapes.– All 3D orientations should be allowed.– Motivation: American Sign Language.

• Input: single image, bounding box of hand.

Page 7: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

7

Assumptions in Our Approach

• We do not assume precise segmentation!– No clean contour extracted.

input image

skin detection segmented

hand

Page 8: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

8

Approach: Database Search

• Over 100,000 computer-generated images.– Known hand pose.

input

Page 9: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

9

Why?

• We avoid direct estimation of 3D info.– With a database, we only match 2D to 2D.

• We can find all plausible estimates.– Hand pose is often ambiguous.

input

Page 10: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

10

Building the Database

26 hand shapes

Page 11: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

11

4128 images are generated for each hand shape.

Total: 107,328 images.

Building the Database

Page 12: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

12

Features: Edge Pixels

• We use edge images.– Easy to extract.– Stable under illumination

changes.

input edge image

Page 13: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

13

Chamfer Distance

input

model

Overlaying input and model

How far apart are they?

Page 14: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

14

• Input: two sets of points.– red, green.

• c(red, green):– Average distance from each

red point to nearest green point.

Directed Chamfer Distance

Page 15: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

15

• Input: two sets of points.– red, green.

• c(red, green):– Average distance from each

red point to nearest green point.

• c(green, red):– Average distance from each

red point to nearest green point.

Directed Chamfer Distance

Page 16: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

16

• Input: two sets of points.– red, green.

• c(red, green):– Average distance from each red

point to nearest green point.

• c(green, red):– Average distance from each red

point to nearest green point.

Chamfer Distance

Chamfer distance:C(red, green) = c(red, green) + c(green, red)

Page 17: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

17

Evaluating Retrieval Accuracy

• A database image is a correct match for the input if:– the hand shapes are the same,– 3D hand orientations differ by at most 30 degrees.

inputcorrect matches

incorrect matches

Page 18: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

18

Evaluating Retrieval Accuracy

• An input image has 25-35 correct matches among the 107,328 database images.– Ground truth for input images is estimated by

humans.

inputcorrect matches

incorrect matches

Page 19: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

19

Evaluating Retrieval Accuracy

• Retrieval accuracy measure: what is the rank of the highest ranking correct match?

inputcorrect matches

incorrect matches

Page 20: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

20

Evaluating Retrieval Accuracyinput

rank 1 rank 2 rank 3 rank 5 rank 6rank 4

highest rankingcorrect match

Page 21: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

21

Results on 703 Real Hand Images

Rank of highest ranking correct match

Percentage of test images

1 15%

1-10 40%

1-100 73%

Page 22: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

22

Results on 703 Real Hand Images

Rank of highest ranking correct match

Percentage of test images

1 15%

1-10 40%

1-100 73%

• Results are better on “nicer” images:– Dark background.– Frontal view.– For half the images, top match was correct.

Page 23: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

23

Examples

initial image

segmented hand

edge image

correct match

rank: 1

Page 24: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

24

Examples

initial image

segmented hand

edge image

correct match

rank: 644

Page 25: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

25

Examples

initial image

segmented hand

edge image

incorrect match

rank: 1

Page 26: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

26

Examples

initial image

segmented hand

edge image

correct match

rank: 1

Page 27: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

27

Examples

initial image

segmented hand

edge image

correct match

rank: 33

Page 28: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

28

Examples

initial image

segmented hand

edge image

incorrect match

rank: 1

Page 29: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

29

Examples

“hard” case

segmented hand

edge image

“easy” case

segmented hand

edge image

Page 30: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

30

Research Directions

• More accurate similarity measures.

• Better tolerance to segmentation errors.– Clutter.– Incorrect scale and

translation.

• Verifying top matches.

• Registration.

Page 31: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

31

Efficiency of the Chamfer Distance

• Computing chamfer distances is slow.– For images with d edge pixels, O(d log d) time.– Comparing input to entire database takes over 4 minutes.

• Must measure 107,328 distances.

input model

Page 32: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

32

The Nearest Neighbor Problem

database

Page 33: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

33

The Nearest Neighbor Problem

query

• Goal: – find the k nearest

neighbors of query q.database

Page 34: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

34

The Nearest Neighbor Problem

• Goal: – find the k nearest

neighbors of query q.

• Brute force time is linear to:– n (size of database).– time it takes to measure a

single distance.

query

database

Page 35: Database-Based Hand Pose Estimation CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington

35

• Goal: – find the k nearest

neighbors of query q.

• Brute force time is linear to:– n (size of database).– time it takes to measure a

single distance.

The Nearest Neighbor Problem

query

database