Accurate and Efficient Gesture Spotting via Pruning and Subgesture Reasoning

Computer Science Accurate and Efficient Gesture Spotting via Pruning and Subgesture Reasoning Jonathan Alon, Vassilis Athitsos, and Stan Sclaroff Computer Science Department Boston University


Post on 20-Jan-2016


Accurate and Efficient Gesture Spotting via Pruning and Subgesture Reasoning. Jonathan Alon, Vassilis Athitsos, and Stan Sclaroff Computer Science Department Boston University. Gesture Recognition Applications. Human Computer Interaction. Sign Language Analysis. Video Annotation. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Accurate and Efficient Gesture Spotting via Pruning and Subgesture Reasoning

Computer Science

Accurate and Efficient Gesture Spotting via Pruning and Subgesture Reasoning

Jonathan Alon, Vassilis Athitsos, and Stan Sclaroff

Computer Science Department, Boston University

Page 2

Gesture Recognition Applications

Human-Computer Interaction

Sign Language Analysis

Video Annotation

Command spotting to control:

• Computer applications [Lee&Kim 99, Zhu et al 02]
• TV and video games [Freeman et al 96, 99]
• Robots [Triesch 97]

UAV Guidance

Page 3

Classification of Gesture Recognition Problems

[Figure: problems ordered from isolated recognition (easier) to continuous spotting and recognition (harder)]

Page 4

Gesture Spotting Problem

Given a vocabulary of gestures, locate the start and end frames of a gesture within a long video stream (and recognize the gesture).

[Figure: example stream containing a non-gesture, a “2” gesture, and a “5” gesture; Frames 334, 403, 733, 836]

Page 5

Overview

Objective: propose an efficient and accurate gesture spotting and recognition system that enables natural human-computer interaction.

Approach:
1. A pruning method that views pruning as a classification (learning) problem.
2. A subgesture reasoning process that models the fact that a gesture may resemble part of a longer gesture.

Experiments: an order of magnitude speedup and an 18-percentage-point improvement in detection rate.

Page 6

Gesture Spotting Framework

Indirect approach: spotting is intertwined with recognition.

• Temporal matching: Continuous Dynamic Programming (CDP) [Oka 98]
• Spotting [Morguet&Lang 98, Lee&Kim 99]

[Figure: Video Stream → Hand Detection + Feature Extraction → Feature Vector → Temporal Matching (with Gesture Models + Pruning Classifiers) → Matching Costs → Spotting (with Spotting Rules + Subgesture Table) → Gesture id, start and end frames]

Page 7

Hand Detection and Feature Extraction

Hand detection: based on color and motion.

Feature: (x, y) hand centroid.

[Figure: Input Frame → Skin Likelihood and Frame Differencing → “Hand Likelihood” → Detected Hand]
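A minimal sketch of the color-plus-motion idea. It assumes (none of this is stated on the slide) that the skin-likelihood and frame-differencing maps arrive as 2-D lists of values in [0, 1], that they are combined by per-pixel multiplication, and that the hand centroid is the mean position of pixels above a hypothetical threshold of 0.5. The function names are illustrative only.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]  # row-major image map, values assumed in [0, 1]


def hand_likelihood(skin: Grid, motion: Grid) -> Grid:
    """Combine skin likelihood and frame-difference magnitude per pixel.

    Assumption: multiplication, so a pixel scores high only if it is
    both skin-colored and moving.
    """
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(skin, motion)]


def hand_centroid(likelihood: Grid, threshold: float = 0.5) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of pixels whose likelihood exceeds threshold,
    or None if no pixel does (no hand detected in this frame)."""
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(likelihood):
        for x, value in enumerate(row):
            if value > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))


if __name__ == "__main__":
    skin = [[0.9, 0.9, 0.1],
            [0.9, 0.9, 0.1],
            [0.1, 0.1, 0.1]]
    motion = [[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]]
    print(hand_centroid(hand_likelihood(skin, motion)))
```

The resulting (x, y) centroid per frame is the feature vector Qj used by the temporal matching stage.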

Page 8

Temporal Matching: Continuous Dynamic Programming (CDP)

[Figure: DP grid of model time i against input time j]

Page 10

Temporal Matching: Continuous Dynamic Programming (CDP)

[Figure: DP grid of model time i against input time j, with cell d(i,j) highlighted]

Local cost: d(i,j) = L2(Mi, Qj), the Euclidean distance between model feature Mi and input feature Qj.

Page 11

Temporal Matching: Continuous Dynamic Programming (CDP)

[Figure: DP grid of model time i against input time j, with cell D(i,j) highlighted]

Cumulative cost: D(i,j) = d(i,j) + min{D(i-1,j), D(i,j-1), D(i-1,j-1)}
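The recurrence can be evaluated one input frame at a time, keeping only the previous column. The sketch below assumes 2-D features (the (x, y) hand centroid) and a virtual row D(0, j) = 0, a common way to let a match start at any input frame; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cdp_step(model: Sequence[Point], prev: List[float], q: Point) -> List[float]:
    """Return column D(., j) given column D(., j-1) and the new input feature Q_j.

    Index 0 is a virtual row with D(0, j) = 0 for every j, so a warping path
    may begin at any input frame.
    """
    col = [0.0] + [math.inf] * len(model)
    for i in range(1, len(model) + 1):
        d = math.dist(model[i - 1], q)      # local cost d(i, j) = L2(M_i, Q_j)
        col[i] = d + min(col[i - 1],        # D(i-1, j)
                         prev[i],           # D(i, j-1)
                         prev[i - 1])       # D(i-1, j-1)
    return col


def cdp_end_costs(model: Sequence[Point], stream: Sequence[Point]) -> List[float]:
    """Matching cost D(m, j) of the complete model for every input frame j."""
    prev = [0.0] + [math.inf] * len(model)  # column before the first frame
    costs = []
    for q in stream:
        prev = cdp_step(model, prev, q)
        costs.append(prev[-1])
    return costs


if __name__ == "__main__":
    model = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    stream = [(5.0, 5.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
    print(cdp_end_costs(model, stream))
```

Only O(m) memory per model is kept, which is what lets CDP run over an unbounded input stream; D(m, j) drops to 0.0 at the frame where the embedded copy of the model ends.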

Page 12

Temporal Matching: Continuous Dynamic Programming (CDP)

[Figure: DP grid of model time i against input time j; warping path W ending at cell (m, j) with cumulative cost D(m,j)]

Page 13

Temporal Matching: Continuous Dynamic Programming (CDP)

[Figure: DP grid with two candidate end frames j1 and j2 and their costs D(m,j1) and D(m,j2)]

D(m,j2) < D(m,j1), so the match ending at frame j2 is the better one.

Page 14

Spotting: Detection of candidate gesture end point

[Figure: matching cost Dg(mg,j) of each gesture model g plotted against time j, with a detection threshold]
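A minimal sketch of threshold-based end-point detection, assuming each model g has its own detection threshold and that the per-frame costs Dg(mg, j) are already available (for example from a CDP implementation). Model names and threshold values are made up for illustration.

```python
from typing import Dict, List, Tuple


def firing_models(costs_at_j: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Models whose end-point cost D_g(m_g, j) is below their detection threshold."""
    return sorted(g for g, cost in costs_at_j.items() if cost < thresholds[g])


def candidate_end_points(cost_series: Dict[str, List[float]],
                         thresholds: Dict[str, float]) -> List[Tuple[int, str]]:
    """Scan all frames and report (j, g) whenever model g's cost dips below its threshold."""
    n_frames = len(next(iter(cost_series.values())))
    hits: List[Tuple[int, str]] = []
    for j in range(n_frames):
        costs_at_j = {g: series[j] for g, series in cost_series.items()}
        hits.extend((j, g) for g in firing_models(costs_at_j, thresholds))
    return hits


if __name__ == "__main__":
    series = {"2": [9.0, 4.0, 1.0, 6.0], "5": [8.0, 7.0, 3.0, 2.0]}
    print(candidate_end_points(series, {"2": 2.5, "5": 2.5}))
```

A threshold crossing alone often fires on several consecutive frames or for several models at once; the spotting rules later in the deck decide which candidate to report.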

Page 15

Why Pruning?

Search time for the best matching model increases linearly with the number of gesture models. This can be too expensive for:
• systems with large gesture vocabularies
• real-time applications

Efficient search methods [Gao et al 00]: fast match, N-best search, A*, …

Beam search:
• maintains promising hypotheses whose matching costs are within a “beam width” of the matching cost of the current best hypothesis;
• requires an ad hoc setting of the “beam width”.
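For comparison, a sketch of beam pruning as described above: a hypothesis survives only if its cost lies within beam_width of the current best. The hypothesis labels and costs are illustrative; the loop shows how strongly the outcome depends on the hand-picked width.

```python
from typing import Dict


def beam_prune(hyp_costs: Dict[str, float], beam_width: float) -> Dict[str, float]:
    """Keep hypotheses whose cost is within beam_width of the best (lowest) cost."""
    best = min(hyp_costs.values())
    return {h: c for h, c in hyp_costs.items() if c <= best + beam_width}


if __name__ == "__main__":
    costs = {"model 2, row 5": 10.0, "model 5, row 3": 12.0, "model 8, row 7": 20.0}
    for width in (0.0, 5.0, 15.0):
        print(width, sorted(beam_prune(costs, width)))
```

A width of 0 keeps only the current leader, while a large width keeps everything; nothing in the method itself says which value is right, which is the “ad hoc setting” problem.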

Page 16

Pruning: Novel Viewpoint

Pruning is a classification problem, so we can use any classifier, e.g., one based on:
• cumulative cost
• observation cost
• transition cost

Classifiers can be learned from training data instead of manually specifying a “beam width”.

Pruning is decoupled from recognition.

Page 17

Pruning: Motivating Example

If input feature Qj is too far from model feature Mi, i.e., d(i,j) > τi, then all paths going through cell (i,j) should be pruned.

For example, the start point of digit “5” is far from the start point of digit “2” both in terms of position and direction.

Page 18

How to Prune?

Classifier learning objective: maximize pruning (area of white cells) subject to minimizing the expected pruning of the optimal path (red).

[Figure: DP grid, Input (digit “6”) against Model (digit “6”)]

Legend: White: pruned cells, d(Mi,Qj) > τi pix; Black: visited cells; Red: optimal path

Page 19

Learning to prune: example classifier

1. Match every positive gesture example $M^p$ with model $M$.
2. For every model feature $M_i$, record all features $M^p_j$ that match it (using DTW).
3. Let $\tau_i = \max d(M_i, M^p_j)$, taken over the features $M^p_j$ recorded for $M_i$. The pruning classifier for model feature $M_i$ is:

$$C_i(Q_j) = \begin{cases} +1 & \text{if } d(M_i, Q_j) \le \tau_i \\ -1 & \text{if } d(M_i, Q_j) > \tau_i \end{cases}$$
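The three steps can be sketched as follows, assuming 2-D features, the Euclidean local cost, and a standard full DTW alignment between the model and each positive example. The function names are hypothetical. Because τi is a maximum over training matches, no cell on a training alignment is ever pruned.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dtw_alignment(a: Sequence[Point], b: Sequence[Point]) -> List[Tuple[int, int]]:
    """Optimal DTW warping path between a and b as 0-based (i, j) index pairs."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = math.dist(a[i - 1], b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m) to (1, 1), preferring the diagonal on ties.
    path = []
    i, j = n, m
    while True:
        path.append((i - 1, j - 1))
        if i == 1 and j == 1:
            break
        _, i, j = min([(D[i - 1][j - 1], i - 1, j - 1),
                       (D[i - 1][j], i - 1, j),
                       (D[i][j - 1], i, j - 1)], key=lambda t: t[0])
    path.reverse()
    return path


def learn_thresholds(model: Sequence[Point], positives: Sequence[Sequence[Point]]) -> List[float]:
    """tau_i = max distance between M_i and any positive feature aligned to it."""
    tau = [0.0] * len(model)
    for example in positives:                    # step 1: match each positive example
        for i, j in dtw_alignment(model, example):
            tau[i] = max(tau[i], math.dist(model[i], example[j]))  # steps 2-3
    return tau


def prune_classifier(model: Sequence[Point], tau: Sequence[float], i: int, q: Point) -> int:
    """C_i(Q_j): +1 keeps cell (i, j), -1 prunes it."""
    return 1 if math.dist(model[i], q) <= tau[i] else -1
```

In practice the thresholds would be learned from many positive examples per gesture; with few examples the max rule can be too tight, which is one reason the ongoing-work slide mentions cross-validation.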

Page 20

CDP with Pruning (CDPP)

[Figure: DP columns for input features Qj-1 and Qj over model features M1 … Mm]

• Sparse vector representation: black cells are stored in memory.

Page 21

CDP with Pruning (CDPP)

C1(Qj) = ?

Page 22

CDP with Pruning (CDPP)

C1(Qj) = +1

Page 23

CDP with Pruning (CDPP)

C2(Qj) = ?

Page 24

CDP with Pruning (CDPP)

C2(Qj) = +1

Page 25

CDP with Pruning (CDPP)

C3(Qj) = ?

Page 26

CDP with Pruning (CDPP)

C3(Qj) = -1

Skip to the next cell that has a neighbor.
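A sketch of one CDPP column update, assuming the threshold classifiers Ci from the learning slide and a sparse column stored as a Python dict that holds only unpruned cells (a simple stand-in for the preallocated sparse vectors described under Implementation Details). Instead of jumping ahead, this version skips cells that are pruned or have no unpruned neighbor, which yields the same column. Names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Column = Dict[int, float]  # model row i (1-based) -> D(i, j); pruned or unreachable cells absent


def cdpp_step(model: Sequence[Point], tau: Sequence[float], prev: Column, q: Point) -> Column:
    """One CDP-with-pruning column update for input feature Q_j."""
    col: Column = {}
    for i in range(1, len(model) + 1):
        d = math.dist(model[i - 1], q)
        if d > tau[i - 1]:
            continue                  # C_i(Q_j) = -1: prune cell (i, j)
        if i == 1:
            best = 0.0                # virtual row D(0, .) = 0 (costs are non-negative): a match may start here
        else:
            neighbors = [c for c in (col.get(i - 1), prev.get(i), prev.get(i - 1)) if c is not None]
            if not neighbors:
                continue              # no unpruned predecessor can reach (i, j)
            best = min(neighbors)
        col[i] = d + best
    return col


def cdpp_run(model: Sequence[Point], tau: Sequence[float], stream: Sequence[Point]) -> List[Column]:
    columns: List[Column] = []
    prev: Column = {}
    for q in stream:
        prev = cdpp_step(model, tau, prev, q)
        columns.append(prev)
    return columns


if __name__ == "__main__":
    model = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    stream = [(5.0, 5.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    for col in cdpp_run(model, [1.0, 1.0, 1.0], stream):
        print(col)
```

On this toy stream only 7 of the 12 cells are ever stored, while the end cost D(3, 4) = 0.0 equals what unpruned CDP computes for the same input.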

Page 27

Spotting

[Figure: INPUT → Spotting Rules → OUTPUT]

INPUT
• Matching costs in current frame j
• Current candidate gesture list
• Matching cost
• Duration
Optional:
• Frame index of last detected gesture
• Response time

OUTPUT
• Detected gesture and gesture end point
OR
• New candidate gesture list

Page 28

Nested Gestures

Which gesture to recognize? 5 or 8? 7 or 3? 1 or 9?

Page 29

Nested Gestures

[Figure: matching cost against time j]

Which gesture to recognize? 5 or 8?

Page 30

Nested Gestures

Which gesture to recognize? 5 or 8?

Solution: subgesture table

Subgesture    Supergestures
0             9
1             4, 7, 9
4             2, 5, 6, 8, 9
5             8
7             2, 3, 9

• If a gesture is firing and at least one of its supergestures is also firing, wait; otherwise, recognize it.
• If a gesture is firing and it has no supergestures, recognize it.
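The two rules collapse into one check: a firing gesture waits exactly when one of its supergestures is also firing. A sketch with the subgesture table encoded as a dict; the function name is hypothetical.

```python
from typing import Dict, Set

# Subgesture table: gesture -> its supergestures.
SUPERGESTURES: Dict[str, Set[str]] = {
    "0": {"9"},
    "1": {"4", "7", "9"},
    "4": {"2", "5", "6", "8", "9"},
    "5": {"8"},
    "7": {"2", "3", "9"},
}


def subgesture_decision(gesture: str, firing: Set[str]) -> str:
    """Apply the nested-gesture rules to a firing gesture: 'wait' or 'recognize'."""
    supers = SUPERGESTURES.get(gesture, set())
    if supers & firing:        # at least one supergesture is also firing
        return "wait"
    return "recognize"         # no supergesture firing, or no supergestures at all


if __name__ == "__main__":
    print(subgesture_decision("5", {"5", "8"}))
    print(subgesture_decision("5", {"5"}))
```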

Page 31

Spotting Algorithm (1)

Update the candidate gesture list:

1. Find all firing models.
2. Conduct subgesture competitions among firing models.
3. Find the best firing model.
4. For every candidate, perform overlapping and subgesture tests with respect to the best firing model.
5. Remove a candidate if it fails any test.
6. Add the best firing model if it passes all tests.
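The slide does not define the overlapping and subgesture tests, so the sketch below assumes, hypothetically, that two hypotheses compete only if their frame intervals overlap, that the cheaper one wins, and that a hypothesis also loses to one of its own supergestures. Candidate, fails_tests, and update_candidates are illustrative names, not the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Candidate:
    gesture: str
    start: int    # start frame j' (e.g., from backtracking)
    end: int      # end frame j
    cost: float   # matching cost D_g(m_g, j)


def overlaps(a: Candidate, b: Candidate) -> bool:
    return a.start <= b.end and b.start <= a.end


def fails_tests(cand: Candidate, other: Candidate, supers: Dict[str, Set[str]]) -> bool:
    """Hypothetical tests: only temporally overlapping hypotheses compete."""
    if not overlaps(cand, other):
        return False
    if other.cost < cand.cost:                                # overlapping test: cheaper one wins
        return True
    return other.gesture in supers.get(cand.gesture, set())  # subgesture test


def update_candidates(candidates: List[Candidate], firing: List[Candidate],
                      supers: Dict[str, Set[str]]) -> List[Candidate]:
    if not firing:                                            # step 1 found no firing model
        return list(candidates)
    names = {f.gesture for f in firing}
    # Step 2: a firing model loses its subgesture competition if a supergesture also fires.
    survivors = [f for f in firing if not (supers.get(f.gesture, set()) & names)] or firing
    best = min(survivors, key=lambda c: c.cost)               # step 3
    kept = [c for c in candidates if not fails_tests(c, best, supers)]  # steps 4-5
    if best not in kept and not any(fails_tests(best, c, supers) for c in kept):
        kept.append(best)                                     # step 6
    return kept
```

With a “5” candidate pending and both “5” and “8” firing over the same frames, the “5” loses the subgesture competition, the candidate “5” is removed, and “8” becomes the new candidate.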

Page 32

Spotting Algorithm (2)

Spot a candidate gesture if any of the following holds:

1. All of its active supergesture models started after the candidate's end frame j*.
2. All current active paths started after the candidate's end frame j*.
3. A specified number of frames has elapsed since the candidate's end frame j*.
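A sketch of the three conditions as a single predicate, assuming the start frames of the active supergesture models and of all active paths are available (for instance by tracking start frames during matching). max_delay is a hypothetical name for the “specified number of frames”.

```python
from typing import Iterable


def should_spot(j_star: int, j: int, super_starts: Iterable[int],
                path_starts: Iterable[int], max_delay: int) -> bool:
    """Decide at frame j whether to report the candidate that ended at frame j*."""
    # Rule 1: every active supergesture model started after j*. This is vacuously
    # true if none is active, matching "no firing supergesture, so recognize".
    rule1 = all(s > j_star for s in super_starts)
    # Rule 2: every currently active path started after j*.
    rule2 = all(s > j_star for s in path_starts)
    # Rule 3: enough frames have elapsed since j* (bounds the response time).
    rule3 = j - j_star >= max_delay
    return rule1 or rule2 or rule3


if __name__ == "__main__":
    print(should_spot(40, 45, [10], [10, 44], 30))
    print(should_spot(40, 75, [10], [10], 30))
```

Rules 1 and 2 let the system answer early once no longer gesture can still contain the candidate; rule 3 is a fallback that caps the delay.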

Page 33

Experiments

• Models: 2 users × 10 digits × 3 examples per digit.
• Test: 2 users × 3 long sequences × 10 digits.
• Sequence length: input 1000–1500 frames; digit 30–90 frames.

Example Sequence

Page 34

Results

Accuracy:

Method           CDP     CDPP    CDPPS
Detection rate   78.3%   85.0%   96.7%
False matches    13      9       2

CDP = Continuous Dynamic Programming, CDPP = CDP with Pruning, CDPPS = CDP with Pruning and Subgesture Reasoning.

Speedup: CDPP is 10 times faster than CDP.
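The accuracy gains quoted in the conclusions can be checked directly against the table; they are differences in percentage points. The sketch also assumes, as an inference not stated on the slides, that the test set contains 2 × 3 × 10 = 60 gestures; the reported rates are then consistent with 47, 51, and 58 correct detections.

```python
# Detection rates (%) from the results table.
rates = {"CDP": 78.3, "CDPP": 85.0, "CDPPS": 96.7}

gain_pruning = round(rates["CDPP"] - rates["CDP"], 1)       # pruning vs. plain CDP
gain_subgesture = round(rates["CDPPS"] - rates["CDPP"], 1)  # subgesture reasoning on top
gain_total = round(rates["CDPPS"] - rates["CDP"], 1)        # overall

# Assumption: 2 users x 3 sequences x 10 digits = 60 test gestures.
n_test = 2 * 3 * 10
implied_counts = {m: round(r / 100 * n_test) for m, r in rates.items()}

print(gain_pruning, gain_subgesture, gain_total)
print(implied_counts)
```

So “7%” and “12%” correspond to 6.7 and 11.7 percentage points, and the overall “18%” to 18.4 points.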

Page 35

Conclusions

Pruning is a classification problem. CDPP is an order of magnitude faster than CDP and about 7 percentage points more accurate (detection rate 85.0% vs. 78.3%).

Reasoning about nested gestures improves recognition accuracy: CDPPS improves the detection rate by about 12 additional percentage points (96.7% vs. 85.0%).

Both pruning and subgesture reasoning can be applied to other dynamic models (e.g., HMMs).

Page 36

Thank you

Page 37

Ongoing Work

Learn:
1. Pruning classifiers using cross-validation.
2. Subgesture table.
3. Gesture verifiers.

Compare the pruning method to beam search. Handle multiple candidate hand hypotheses.

Apply the methods to automatic sign language transcription.

Page 38

Towards Automatic Annotation of American Sign Language

Additional challenges:
• Users are not cooperative: fast gesture speeds; variation between users.
• Significant variation in hand shape and appearance.
• Different types of gestures: finger spelling, one- vs. two-handed.

Page 39

Gesture Spotting: Related Work

Direct approach [Kang et al 04, Kahol et al 04]

Spotting precedes recognition.

1. Compute low-level motion parameters, such as velocity, acceleration, trajectory curvature.

2. Look for abrupt changes (zero-crossings) in those parameters to find candidate gesture boundaries.
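The two steps of the direct approach can be sketched as follows. This is an illustration of the idea only, not the method of the cited papers: it assumes the hand-centroid trajectory as input, speed as the only motion parameter, and a local speed minimum (acceleration crossing zero from negative to non-negative) as the boundary cue.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def speed_profile(track: Sequence[Point]) -> List[float]:
    """speed[t] = distance moved between frames t and t+1 (a velocity magnitude)."""
    return [math.dist(track[t + 1], track[t]) for t in range(len(track) - 1)]


def candidate_boundaries(track: Sequence[Point]) -> List[int]:
    """Frames where the change in speed crosses zero from negative to
    non-negative, i.e. where the hand slows to a local speed minimum.
    Reports frame t+1, the end of the slowest step."""
    speed = speed_profile(track)
    return [t + 1 for t in range(1, len(speed) - 1)
            if speed[t - 1] > speed[t] <= speed[t + 1]]


if __name__ == "__main__":
    track = [(float(x), 0.0) for x in (0, 2, 4, 5, 5, 6, 8, 10)]
    print(candidate_boundaries(track))
```

Such boundaries are found before any model is consulted, which is exactly why this family is called direct: recognition runs afterwards on the segments between boundaries.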

Indirect approach [Morguet&Lang 98, Lee&Kim 99]

Spotting is intertwined with recognition.

1. Compute input-to-model matching costs.

2. Look for low cost to detect candidate gesture end point. (Gesture start point can be found by backtracking the optimal dynamic programming path).

Page 40

Approach: Continuous Dynamic Programming (CDP) [Oka 98]

[Figure: the input stream matched against Model “0”, Model “2”, and Model “9”; each cell (Mi, Qj) depends on cells (Mi-1, Qj), (Mi, Qj-1), and (Mi-1, Qj-1)]

D(i,j) = d(i,j) + min{D(i-1,j), D(i,j-1), D(i-1,j-1)}

d(i,j): distance between model feature Mi and input feature Qj.
D(i,j): cumulative distance between model M(1:i) and input subsequence Q(j’:j).
Continuous and monotonic warping path.

Page 41

Approach: Continuous Dynamic Programming (CDP)

[Figure: cells Mi-1, Mi against Qj-1, Qj; DP tables for models “0”, “2”, and “9” with an accept threshold; optimal warping path (continuous and monotonic) from gesture start to gesture end, model time i against input time j]

D(i,j) = d(i,j) + min{D(i-1,j), D(i,j-1), D(i-1,j-1)}

d(i,j): distance between model feature Mi and input feature Qj.
D(i,j): cumulative distance between model M(1:i) and input subsequence Q(j’:j).

Page 42

Conclusions

CDPP is an order of magnitude faster than CDP and about 7 percentage points more accurate.

CDPPS improves accuracy by about 12 additional percentage points. Both pruning and subgesture reasoning can be applied to Hidden Markov Models (HMMs).

Future Work:

Learn:
1. Subgesture table.
2. Gesture transition classifiers and subsequence classifiers.
3. Gesture verifiers.

Apply the methods to spot signs in American Sign Language (ASL) sequences (e.g., utterances, stories, and dialogs).

Page 43

Gesture Types (Channels)

Head gesture, body gesture, hand gesture

Page 44

Gesture Spotting: Related Work

Indirect approach [Morguet&Lang 98, Lee&Kim 99]

Spotting is intertwined with recognition.

0. Detect hands and extract features.
1. Compute input-to-model matching costs.
2. Look for low cost to detect a candidate gesture end point.

Page 45

Pruning: Motivation

Detection and tracking: where (in the image) is the gesture performed?
Spotting: when does the gesture start and end?
Recognition: what gesture is it?

Search complexity can be high: |Where| × |When| × |What|.

Page 46

Gesture End Point Detection and Gesture Recognition

The algorithm is invoked for every input frame j and consists of two steps:

1. Update the current list of candidate gesture models.
2. Apply a set of spotting rules to decide whether a gesture was spotted and, if so, which gesture model.

Page 47

End Point Detection Definitions

Paths
• Complete path W(M1:m, Qj’:j): a legal warping path matching the input subsequence Qj’:j with the complete model M1:m.
• Partial path W(M1:i, Qj’:j): a legal warping path matching the input subsequence Qj’:j with part of the model M1:m.
• Active path: a partial path that has not been pruned.

Page 48

End Point Detection Definitions

Models
• Active model g: a model that has a complete path ending at the current input frame j.
• Firing model g: an active model with a cost below the detection acceptance threshold.
• Subgesture relationship: a gesture g1 is a subgesture of gesture g2 if it is properly contained in g2. In this case, g2 is a supergesture of g1.

Page 49

Spotting Rules (1)

Zhu et al 02 (spotting rules), based on Baudel&Beaudouin-Lafon’s interaction model:
1. A moving hand appears in the sequence.
2. The moving hand is the dominant moving object.
3. The movement of the hand follows a three-stage process: preparation, stroke, and retraction [Kendon].
4. The duration of the stroke T is bounded, T1 ≤ T ≤ T2, for a given sampling rate.

Page 50

Spotting Rules (3)

Lee&Kim 99 (End-point detection):

Page 51

Gesture Spotting: Applications

Command spotting for controlling computer applications [Lee&Kim 99, Zhu et al 02], TV and video games [Freeman et al 96, 99], and robots [Triesch 97]

Sign Language Analysis [Starner&Pentland 95, Vogler&Metaxas 99, …]

[Cui&Weng 96, Yang&Ahuja 99, Bowden et al 04, folks at UCF]

Page 52

Implementation Details

We use a circular buffer of fixed length (e.g., 150 frames) to implement the sliding window concept.

We use a sparse vector representation that enables fast access to individual elements (as opposed to fast matrix-vector operations, as in Matlab).

We trade memory for speed and avoid fragmentation by preallocating each sparse vector to its maximum capacity (the model length).
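A minimal sketch of such a sliding window, assuming it stores one item per frame (for example a feature vector or a DP column) and is addressed by absolute frame index; the class and method names are hypothetical.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class CircularBuffer(Generic[T]):
    """Fixed-length sliding window indexed by absolute frame number."""

    def __init__(self, capacity: int = 150) -> None:
        self.capacity = capacity
        self._slots: List[Optional[T]] = [None] * capacity  # allocated once, then reused
        self._next = 0  # absolute index of the next frame to be written

    def push(self, item: T) -> None:
        self._slots[self._next % self.capacity] = item       # overwrites the oldest frame
        self._next += 1

    def in_window(self, frame: int) -> bool:
        return max(0, self._next - self.capacity) <= frame < self._next

    def get(self, frame: int) -> Optional[T]:
        if not self.in_window(frame):
            raise IndexError(f"frame {frame} has left the window")
        return self._slots[frame % self.capacity]


if __name__ == "__main__":
    buf: CircularBuffer[int] = CircularBuffer(3)
    for f in range(5):
        buf.push(f * 10)
    print(buf.get(4), buf.in_window(1))
```

Because the slots are allocated once, pushing a frame never allocates memory, which fits the preallocation policy described above.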

Page 53

Sparse Vector Representation

[Figure: sparse vector for frame j, drawn as arrays indj and listj; entries D1j, D2j, D5j are labeled i=1, i=2, i=3; the remaining entries are nil]
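One plausible reading of the figure, stated as an assumption: indj maps each model row i to a slot in listj (or nil), while listj stores only the unpruned costs contiguously, so D1j, D2j, D5j occupy the first three slots. The sketch preallocates both arrays to the model length, as the Implementation Details slide describes; rows are 0-based here, and all names are hypothetical.

```python
from typing import List, Optional, Tuple


class SparseColumn:
    """Sparse DP column with arrays preallocated to the model length m."""

    def __init__(self, m: int) -> None:
        self.ind: List[Optional[int]] = [None] * m  # row i -> slot in values, None if pruned
        self.rows: List[int] = [0] * m              # slot -> row i (for iteration and clearing)
        self.values: List[float] = [0.0] * m        # slot -> D(i, j), packed at the front
        self.size = 0                               # number of occupied slots

    def set(self, i: int, value: float) -> None:
        slot = self.ind[i]
        if slot is None:                            # first write to row i: take the next free slot
            slot = self.size
            self.ind[i] = slot
            self.rows[slot] = i
            self.size += 1
        self.values[slot] = value

    def get(self, i: int) -> Optional[float]:
        slot = self.ind[i]
        return None if slot is None else self.values[slot]

    def items(self) -> List[Tuple[int, float]]:
        return [(self.rows[k], self.values[k]) for k in range(self.size)]

    def clear(self) -> None:
        """Reset for reuse in O(size) time without reallocating."""
        for k in range(self.size):
            self.ind[self.rows[k]] = None
        self.size = 0
```

Random access by row goes through ind in constant time, while iteration over the unpruned cells only touches the packed front of values.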

Page 54

CDPP in a picture

[Figure: sparse vectors indj-1 and listj-1 of the previous frame, with several nil entries]

Ci(Qj) = ?

Page 55

Example Spotting Rules

Morguet&Lang 98 (peak finding rules):
1. Cost must be a local minimum inside an interval centered at the current frame.
2. Cost must be smaller than a model-dependent threshold.
3. Cost must be the lowest compared to all other model costs.
4. Cost must have a minimum temporal distance to the last valid peak found.
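A sketch of the four rules as a single check, assuming the cost history of every model is stored as a list indexed by frame. The window half-width, thresholds, and minimum gap are illustrative parameters, and the function name is hypothetical.

```python
from typing import Dict, List, Optional


def is_valid_peak(costs: Dict[str, List[float]], g: str, j: int, half_window: int,
                  thresholds: Dict[str, float], last_peak: Optional[int], min_gap: int) -> bool:
    """Check the four peak-finding rules for model g at frame j."""
    series = costs[g]
    c = series[j]
    window = series[max(0, j - half_window): j + half_window + 1]
    if c > min(window):                      # 1. local minimum inside the interval
        return False
    if c >= thresholds[g]:                   # 2. below the model-dependent threshold
        return False
    if any(other[j] < c for name, other in costs.items() if name != g):
        return False                         # 3. lowest among all model costs
    if last_peak is not None and j - last_peak < min_gap:
        return False                         # 4. far enough from the last valid peak
    return True


if __name__ == "__main__":
    costs = {"2": [5.0, 3.0, 1.0, 2.0, 4.0], "5": [6.0, 5.0, 4.0, 3.0, 2.0]}
    print(is_valid_peak(costs, "2", 2, 1, {"2": 2.5, "5": 2.5}, None, 10))
```

Rule 1 looks half_window frames into the future, so a peak at frame j can only be confirmed at frame j + half_window; that look-ahead is the response delay these rules trade for robustness.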

Page 56

Temporal Matching: Continuous Dynamic Programming

[Figure: DP grid of model features M1 … Mm (time i) against input time j, with cell d(i,j) highlighted]

Qj = (xj, yj), Mi = (xi, yi)

Local cost: d(i,j) = L2(Mi, Qj)

Page 57

Temporal Matching: Continuous Dynamic Programming

[Figure: DP grid of model features M1 … Mm (time i) against input time j, with cell D(i,j) highlighted]

Mi = (xi, yi), Qj = (xj, yj)

Cumulative cost: D(i,j) = d(i,j) + min{D(i-1,j), D(i,j-1), D(i-1,j-1)}

Page 58

Temporal Matching: Continuous Dynamic Programming

[Figure: DP grid of model features M1 … Mm (time i) against input time j, with a warping path from cell (1, j’) at input feature Qj’ to cell (i, j)]

Mi = (xi, yi), Qj = (xj, yj)

Warping path: W(i,j) = ((1,j’), …, (i,j))
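Recovering j’ by backtracking needs the warping-path history. A common alternative, shown here as an assumption rather than the authors' method, propagates the start frame along with each cumulative cost, so the start of the best path ending at (m, j) is known immediately. The sketch assumes 2-D features and the same recurrence; function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cdp_step_with_start(model: Sequence[Point], prev: List[float], prev_start: List[int],
                        q: Point, j: int) -> Tuple[List[float], List[int]]:
    """Column update that also records the start frame j' of each cell's best path."""
    m = len(model)
    col = [0.0] + [math.inf] * m
    start = [j] + [-1] * m
    # Row 1: the virtual row D(0, .) = 0 is always a cheapest predecessor (costs are
    # non-negative), so the best path through (1, j) starts at frame j.
    col[1] = math.dist(model[0], q)
    start[1] = j
    for i in range(2, m + 1):
        options = [(col[i - 1], start[i - 1]),        # from D(i-1, j)
                   (prev[i], prev_start[i]),          # from D(i, j-1)
                   (prev[i - 1], prev_start[i - 1])]  # from D(i-1, j-1)
        best_cost, best_start = min(options, key=lambda opt: opt[0])
        col[i] = math.dist(model[i - 1], q) + best_cost
        start[i] = best_start
    return col, start


def spot_with_start(model: Sequence[Point], stream: Sequence[Point]) -> List[Tuple[float, int]]:
    """(D(m, j), j') for every input frame j (frames numbered from 1)."""
    prev = [0.0] + [math.inf] * len(model)
    prev_start = [-1] * (len(model) + 1)
    out = []
    for j, q in enumerate(stream, start=1):
        prev, prev_start = cdp_step_with_start(model, prev, prev_start, q, j)
        out.append((prev[-1], prev_start[-1]))
    return out
```

For example, with model [(0, 0), (1, 0), (2, 0)] and input [(5, 5), (0, 0), (1, 0), (2, 0), (5, 5)], frame 4 yields cost 0.0 with start frame 2, i.e., the gesture spans frames 2 to 4.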

Page 59

Temporal Matching: Continuous Dynamic Programming

[Figure: DP grid of model features M1 … Mm (time i) against input time j, with cell D(m,j) highlighted]

Mi = (xi, yi), Qj = (xj, yj)

Cumulative cost D(m,j) is used for spotting and recognition.

Page 62

Temporal Matching: Continuous Dynamic Programming

[Figure: DP grid of model features M1 … Mm against input time j; warping path W from start frame Qjs to cell (m, j) with cost D(m,j); cell d(i,j) highlighted]

Qj = (xj, yj), Mi = (xi, yi)

Local cost: d(i,j) = L2(Mi, Qj)

Page 65

Pruning: Common Practice

Beam search [Jelinek 97, Gao et al 00]. Idea: only maintain promising hypotheses whose cumulative costs are within a “beam width” of the cumulative cost of the current best hypothesis.

Works well in practice, but requires an ad hoc setting of the beam width parameter.

Page 66

Objective

Propose an accurate and efficient gesture spotting and recognition system that enables natural human-computer interaction.

Page 67

Overview

Introduction: classification of gesture recognition problems; gesture spotting problem definition; objective; applications.

Approach: related work (Continuous Dynamic Programming); pruning as a classification problem; subgesture reasoning.

Experiments: an order of magnitude speedup; an 18-percentage-point improvement in detection rate.

Page 68

Pruning: Approach

[Figure: three DP-grid panels for model “6” against input time j]

• Likely cells: black = likely, d(i,j) ≤ τi; white = unlikely and pruned, d(i,j) > τi.
• Visited cells: black = visited; white = pruned.
• Visited likely cells: black = likely and visited; white = unlikely and pruned; gray = likely but pruned.

84% of cells pruned, i.e., a 6.25× speedup.
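If running time is roughly proportional to the number of visited cells (an assumption that ignores the cost of evaluating the pruning classifiers), the two numbers on this slide are related by:

$$\text{speedup} \approx \frac{1}{1 - 0.84} = \frac{1}{0.16} = 6.25$$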

Page 75

Spotting: Detection of candidate gesture end point

[Figure: DP tables for models “0”, “2”, and “9”, and their matching costs Dg(mg,j) against a detection threshold]

Page 77

Nested Gestures

[Figure: matching cost against time j]

Page 84

How to Prune?

[Figure: DP grid, Input (digit “6”) against Model (digit “6”)]

Legend: White: pruned cells, d(Mi,Qj) > 100 pix; Black: visited cells; Red: optimal path

Page 85

How to Prune?

[Figure: DP grid, Input (digit “6”) against Model (digit “6”)]

Legend: White: pruned cells, d(Mi,Qj) > 40 pix; Black: visited cells; Red: optimal path

Page 86

How to Prune?

[Figure: DP grid, Input (digit “6”) against Model (digit “6”)]

Legend: White: pruned cells, d(Mi,Qj) > 30 pix; Black: visited cells; Red: optimal path

Page 87

How to Prune?

[Figure: DP grid, Input (digit “6”) against Model (digit “6”)]

Legend: White: pruned cells, d(Mi,Qj) > 20 pix; Black: visited cells; Red: optimal path

Page 89

How to Prune?

[Figure: DP grid, Input (digit “6”) against Model (digit “6”)]

Legend: White: pruned cells, d(Mi,Qj) > 15 pix; Black: visited cells; Red: optimal path

Page 90

How to Prune?

[Figure: DP grid, Input (digit “6”) against Model (digit “6”)]

Legend: White: pruned cells, d(Mi,Qj) > 10 pix; Black: visited cells; Red: optimal path

Page 91

How to Prune?

Answer: learn classifiers (thresholds τi): maximize pruning subject to minimizing the expected pruning of the optimal path.

[Figure: DP grid, Input (digit “6”) against Model (digit “6”)]

Legend: White: pruned cells, d(Mi,Qj) > τi pix; Black: visited cells; Red: optimal path

Page 95

Demo