Gesture Recognition
Interactive Systems Laboratories, Universität Karlsruhe (TH)
Lecture "Visuelle Perzeption für Mensch-Maschine Interaktion" (Visual Perception for Human-Machine Interaction)
Rainer Stiefelhagen
Kai Nickel
2009-01-16
Overview
• Introduction
• Hidden Markov Models
• Applications
  - American Sign Language
  - Pointing Gestures
  - Gestures and Speech
• Miscellaneous
Definition
Gesture:
• a movement usually of the body or limbs that expresses or emphasizes an idea, sentiment, or attitude
• the use of motions of the limbs or body as a means of expression
[Merriam-Webster Online Dictionary]
Automatic Gesture Recognition
• A gesture recognition system generates a semantic description for certain body motions
• Gesture recognition exploits the power of non-verbal communication, which is very common in human-human interaction
• Gesture recognition is often built on top of a human motion tracker
• Related topics: Human Activity Analysis, Intention Recognition, Facial Expression Recognition
Some Applications
• Virtual Environments
  - manipulation of objects
• Multimodal Rooms
  - interaction with room facilities
  - analysis of a user's actions
  - context-aware services
• Human-Robot Interaction
  - communicative gestures
  - programming by demonstration
• Understanding Human-Human Interaction
  - level of arousal
  - turn-taking
Types of Gestures
• Hand & arm gestures
  - Pointing gestures
  - Sign language
  - Hello, thumbs up/down, victory sign, the finger, call-me, … (Wikipedia lists around 40 hand gestures)
• Head gestures
  - Nodding, head shaking, turning, pointing
• Body gestures
  - Don't know / shrug
• Gestures are mostly culture-specific
  - Pointing gesture probably one of the exceptions
• Some gestures closely coordinated with speech
Gesture Taxonomy
• static vs. dynamic gestures
• different phases of a gesture: preparation, peak, retraction
[Pavlović97]
A Purpose Taxonomy of Gestural Interaction [Quek, ICMI05]
• Manipulative Gestures
  - "Put that there" [Bolt 1980]
  - Navigation in virtual spaces, game control, …
  - Robot control
• Semaphoric Gestures
  - Precisely defined, specific symbols of an alphabet
  - Static: finger menu selection, hand pose classification, …
  - Dynamic: crane operator signals, editing gestures for interactive whiteboards, …
• Conversational Gestures
  - Naturally performed in the course of human multimodal communication
Spontaneous gestures [Cassell98]
(Communicative Gestures)
• Iconic gestures: depict by the form of the gesture some feature/action/event being described
• Metaphoric gestures: also representational, but the concept they represent has no physical form; the form of the gesture comes from a common metaphor. Example: "the meeting went on and on" + hand indicating a rolling motion.
• Deictics: spatialize, or locate in the physical space in front of the narrator, aspects of the discourse.
• Beat gestures: small baton-like movements; do not change in form with the content of speech. Pragmatic function, occurring with comments on one's own linguistic contribution, speech repairs and reported speech. Example: "she talked first, I mean second" + hand flicking down and up.
See also: McNeill, D. 1992. Hand and Mind: What Gestures Reveal About Thought. The University of Chicago Press, Chicago, IL.
Spatial Models
• 3D model-based
  - 3D hand model. Parameters: angles, palm position
• Appearance-based
  - Parameters: pixel templates, image geometry parameters, image motion parameters, fingertip position & motion, …
  - Use templates for modelling gestures
[Pavlović]
Parameter Computation
Vary the model parameters until the model's features match the image's features
[Pavlović]
Gesture Recognition
Feature acquisition:
• attached sensors
• visually: markers, color, motion, shape, …
Classifiers:
• Hidden Markov Models (HMM)
• Neural Networks (ANN)
• Finite State Machines (FSM)
• Support Vector Machines (SVM)
• …
Hidden Markov Models
• Gesture recognition → temporal pattern matching
• HMMs
  - represent templates in a person-independent way (when trained with examples from many persons)
  - provide an inherently "soft" time-alignment mechanism
  - are based on a sound mathematical foundation
• Introduction to HMMs → [Rabiner89]
Motivation
[Figure: three hidden states s1, s2, s3 with transitions a12, a23 (non-observable), emitting observations x1, x2, x3 via the distributions b1, b2, b3 (observable)]
• the term "hidden" refers to seeing only the observations and drawing conclusions without knowing the hidden sequence of states
• Markov assumption (1st order): the next state depends only on the current state (not on the complete state history)
HMM Definition
A Hidden Markov Model is a five-tuple (S, π, A, B, V). It consists of:
• the set of states S = {s1, s2, ..., sn}
• the initial probability distribution π, where π(si) is the probability of si being the first state of a state sequence
• the matrix of state transition probabilities A = (aij), where aij is the probability of state sj following si
• the set of emission probability distributions/densities B = {b1, b2, ..., bn}, where bi(x) is the probability of observing x when the system is in state si
• the observable feature space V, which can be discrete, V = {x1, x2, ..., xv}, or continuous, V = R^d
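To make the five-tuple concrete, here is a minimal sketch of a discrete HMM as plain NumPy arrays (class and variable names are mine, not from the lecture); the components map directly onto array shapes, with S implicit as the row indices and V as the column indices of B:

```python
import numpy as np

# Minimal sketch of a discrete HMM (S, pi, A, B, V) as NumPy arrays.
class DiscreteHMM:
    def __init__(self, pi, A, B):
        self.pi = np.asarray(pi)  # pi[i]   = P(first state is s_i),       shape (n,)
        self.A  = np.asarray(A)   # A[i, j] = P(s_j follows s_i),          shape (n, n)
        self.B  = np.asarray(B)   # B[i, x] = P(observing symbol x in s_i), shape (n, v)

# A 2-state, 3-symbol example; the pi and each row of A and B sum to 1
hmm = DiscreteHMM(pi=[1.0, 0.0],
                  A=[[0.7, 0.3],
                     [0.0, 1.0]],          # left-to-right topology
                  B=[[0.8, 0.1, 0.1],
                     [0.1, 0.2, 0.7]])
```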
Some Properties of HMMs
• for the initial probabilities we have: Σi π(si) = 1
• often things are simplified by setting π(s1) = 1 and π(si) = 0 for i > 1
• obviously: Σj aij = 1 for all i
• often aij = 0 for most j, except for a few states
• when V = {x1, x2, ..., xv}, the bi are discrete probability distributions; such HMMs are called discrete HMMs
• when V = R^d, the bi are continuous probability density functions; such HMMs are called continuous (density) HMMs
HMM Topologies
[Figure: HMM topologies - left-to-right, alternative paths, ergodic]
The Observation Model
Most popular: Gaussian mixture models

bj(x) = Σk=1..nj cjk · N(x; µjk, Σjk)

nj: number of Gaussians (in state j)
cjk: mixture weight for k-th Gaussian (in state j)
µjk: means of k-th Gaussian (in state j)
Σjk: covariance matrix of k-th Gaussian (in state j)
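As a sketch of how such a mixture density could be evaluated for one state j (function name and looping style are mine; a real system would vectorize and work in log-space):

```python
import numpy as np

def gmm_density(x, c, mu, sigma):
    """Evaluate b_j(x) = sum_k c[k] * N(x; mu[k], sigma[k]) for one state j.
    x: (d,) feature vector; c: (n_j,) mixture weights summing to 1;
    mu: (n_j, d) means; sigma: (n_j, d, d) covariance matrices."""
    d = x.shape[0]
    total = 0.0
    for ck, mk, Sk in zip(c, mu, sigma):
        diff = x - mk
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sk))
        total += ck * np.exp(-0.5 * diff @ np.linalg.solve(Sk, diff)) / norm
    return total
```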
Three main problems of HMMs
(That can be solved!)
Given an HMM λ and an observation x1,x2,...,xT
• The evaluation problem: compute the probability of the observation p(x1,x2,...,xT | λ)
• The decoding problem: compute the most likely state sequence sq1, sq2, ..., sqT, i.e. argmaxq1,..,qT p(q1,..,qT | x1,x2,...,xT, λ)
• The learning/optimization problem: find an HMM λ' such that p(x1,x2,...,xT | λ') > p(x1,x2,...,xT | λ)
The Evaluation Problem
• We define αt( j) as the probability of observing the partial observation x1,x2,...,xt and then being in state sj:
αt( j) = p(x1,x2,...,xt, qt=j | λ)
• p(x1,x2,...,xT | λ) = Σj=1..n αT( j)
• Recursive calculation:
αt( j) = bj(xt) · Σi=1..n aij αt-1(i)
α1( j) = π(sj) bj(x1)
The Forward Algorithm
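A direct transcription of this recursion into Python (names are mine; for long sequences one would scale the α values or switch to log-probabilities to avoid numerical underflow):

```python
import numpy as np

def forward(pi, A, b, X):
    """p(x_1..x_T | lambda) via the forward algorithm.
    b(j, x) returns the emission probability b_j(x)."""
    n, T = len(pi), len(X)
    alpha = np.zeros((T, n))
    for j in range(n):                 # alpha_1(j) = pi(s_j) * b_j(x_1)
        alpha[0, j] = pi[j] * b(j, X[0])
    for t in range(1, T):              # alpha_t(j) = b_j(x_t) * sum_i a_ij * alpha_{t-1}(i)
        for j in range(n):
            alpha[t, j] = b(j, X[t]) * np.dot(A[:, j], alpha[t - 1])
    return alpha[-1].sum()             # sum_j alpha_T(j)
```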
The Decoding Problem
• Find the most likely state sequence, i.e. argmaxq1,..,qT p(q1,..,qT, x1,...,xT | λ)
• Define zt( j) = maxq1,..,qt-1 p(x1,...,xt, qt=j | λ)
zt( j) = maxi(zt-1(i) aij) · bj(xt)
z1( j) = π(sj) bj(x1)
• To retrieve the optimal state sequence, we have to remember for every state its optimal predecessor: rt( j) = argmaxi(zt-1(i) aij)
The Viterbi Algorithm
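A sketch of the corresponding Python, mirroring the forward algorithm but with max/argmax in place of the sum, plus a backtracking pass over the remembered predecessors rt( j) (names are mine):

```python
import numpy as np

def viterbi(pi, A, b, X):
    """Most likely state sequence (indices 0..n-1) for observation X."""
    n, T = len(pi), len(X)
    z = np.zeros((T, n))        # z_t(j)
    r = np.zeros((T, n), int)   # r_t(j): optimal predecessor of state j at time t
    for j in range(n):
        z[0, j] = pi[j] * b(j, X[0])
    for t in range(1, T):
        for j in range(n):
            scores = z[t - 1] * A[:, j]
            r[t, j] = np.argmax(scores)            # remember best predecessor
            z[t, j] = scores[r[t, j]] * b(j, X[t])
    path = [int(np.argmax(z[-1]))]                 # backtrack from best final state
    for t in range(T - 1, 0, -1):
        path.append(int(r[t, path[-1]]))
    return path[::-1]
```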
The Learning / Optimization Problem
• Train parameters of the HMM
• Tune λ to maximize P(O | λ)
• No efficient algorithm for the global optimum
• An efficient iterative algorithm finds a local optimum
• Viterbi training
  - Compute the Viterbi path using the current model
  - Re-estimate the parameters using the labels assigned by Viterbi
• Baum-Welch (forward-backward) re-estimation
  - See [Rabiner89]
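As an illustration of the Viterbi-training loop for a discrete HMM (a simplification: π is kept fixed; the names, smoothing floor, and iteration count are mine; it reuses the viterbi sketch above):

```python
import numpy as np

def viterbi_training(pi, A, B, sequences, iterations=10):
    """Alternate (1) labeling every frame with its Viterbi state under the
    current model and (2) re-estimating A and B from the label counts."""
    n, v = B.shape
    for _ in range(iterations):
        A_cnt = np.full((n, n), 1e-6)   # small floor avoids all-zero rows
        B_cnt = np.full((n, v), 1e-6)
        for X in sequences:             # X: list of symbol indices
            path = viterbi(pi, A, lambda j, x: B[j, x], X)
            for t, x in enumerate(X):
                B_cnt[path[t], x] += 1
                if t > 0:
                    A_cnt[path[t - 1], path[t]] += 1
        A = A_cnt / A_cnt.sum(axis=1, keepdims=True)   # renormalize rows
        B = B_cnt / B_cnt.sum(axis=1, keepdims=True)
    return A, B
```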
Sign Language Recognition [Starner98]
• American Sign Language (ASL)
  - 6000 gestures describe persons, places and things
  - Exact meaning and strong rules of context and grammar for each
• Sign recognition
  - Previous work used instrumented gloves and neural nets
  - HMMs are ideal for the complex and structured hand gestures of ASL
ASL Test Lexicon
Vocabulary size: 40 words

Part of speech | Vocabulary
Pronoun | I, you, he, we, you (pl), they
Verb | want, like, lose, dontwant, dontlike, love, pack, hit, loan
Noun | box, car, book, table, paper, pants, bicycle, bottle, can, wristwatch, umbrella, coat, pencil, shoes, food, magazine, fish, mouse, pill, bowl
Adjective | red, brown, black, gray, yellow
Feature Extraction
• Camera located either for a 1st-person (wearable) or a 2nd-person (desk) view
• Segment hand blobs by a skin color model
• Extracted features: x, y, ∆x, ∆y, size, blob "angle", eccentricity of ellipse (see the sketch below)
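As a sketch of how such blob features can be obtained from a binary skin mask via image moments (the function name and the angle/eccentricity computation via an eigendecomposition of the second central moments are my reconstruction, not necessarily Starner's exact implementation):

```python
import numpy as np

def blob_features(skin_mask, prev_xy):
    """Features of one hand blob: x, y, dx, dy, size, angle, eccentricity.
    skin_mask: 2D bool array from a skin-color classifier; assumes a
    non-empty, non-degenerate blob (no error handling in this sketch)."""
    ys, xs = np.nonzero(skin_mask)
    size = xs.size
    x, y = xs.mean(), ys.mean()                   # blob centroid
    # second central moments -> ellipse orientation and eccentricity
    cov = np.cov(np.stack([xs, ys]))
    evals, evecs = np.linalg.eigh(cov)            # eigenvalues ascending
    angle = np.arctan2(evecs[1, 1], evecs[0, 1])  # major-axis direction
    ecc = np.sqrt(1.0 - evals[0] / evals[1])
    dx, dy = x - prev_xy[0], y - prev_xy[1]       # motion since last frame
    return np.array([x, y, dx, dy, size, angle, ecc])
```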
HMM for American Sign Language
• Four-state HMM for each word
• Training:
  - Automatic segmentation of sentences into five portions
  - Initial estimates by iterative Viterbi alignment
  - Then Baum-Welch re-estimation
  - No context used
• Recognition
  - With and without part-of-speech grammar (pronoun, verb, noun, adjective, pronoun)
  - All features / only relative features used
ASL Results: Desk-based
• 348 training and 94 test sentences, no context used
• Accuracy = (N - D - S - I) / N, with
  N: #words, D: #deletions, S: #substitutions, I: #insertions
Experiment | Training set | Independent test set
All features | 94.1% | 91.9%
Relative features | 89.6% | 87.2%
All features & unrestricted grammar | 81.0% (D=31, S=287, I=137, N=2390) | 74.5% (D=3, S=76, I=41, N=470)
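The percentages in the unrestricted-grammar row follow directly from the error counts via the accuracy formula above; a two-line check:

```python
def word_accuracy(N, D, S, I):
    return (N - D - S - I) / N

print(word_accuracy(2390, 31, 287, 137))  # 0.8096... ~ 81.0% (training set)
print(word_accuracy(470, 3, 76, 41))      # 0.7446... ~ 74.5% (test set)
```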
ASL Results: Wearable-based
• 400 training sentences and 100 for testing
• Test sentences are 5 words long
• Restricted and unrestricted grammar give similar results!
Experiment | Training set | Independent test set
Part-of-speech | 99.3% | 97.8%
5-word sentence | 98.2% | 97.8%
Unrestricted | 96.4% (D=24, S=32, I=45, N=2500) | 96.8% (D=4, S=6, I=6, N=500)
Pointing Gesture Recognition
• Pointing gestures
  - are used to specify objects and locations
  - can be needed to resolve ambiguities in verbal statements: "Put that there!"
• Here: pointing gesture = movement of the arm towards a pointing target
• Tasks:
  - Detect the occurrence of human pointing gestures in natural arm movements
  - Extract the 3D pointing direction
Application: Multimodal Dialogue for Human-Robot Interaction

[Figure: robot generations from 2004, ca. 2005, and 2006]
[Video]
Pointing Gesture Phases
• Looking at persons performing pointing gestures, one can identify 3 phases in the movement of the pointing hand: begin, hold, end

Average duration of pointing gesture phases:
Phase | µ [sec] | σ [sec]
Complete gesture | 1.75 | 0.48
Begin | 0.52 | 0.17
Hold | 0.76 | 0.40
End | 0.47 | 0.12
• For estimating the pointing direction, it is crucial to detect the hold phase precisely!
Modeling the Gesture
• Separate models for each of the 3 phases: continuous HMMs with 2 Gaussians per state
• A "garbage model" acts as a reference for the phase models' output
• Models are trained on hand-labeled sequences using the Baum-Welch re-estimation equations [Rabiner89]
Classification
• The length of the phases varies strongly → classify a series of subsequences s1..n in order to find the best length
• Train and evaluate the models with time-reversed sequences, in order to exploit the recursive nature of the Viterbi algorithm (resp. the forward algorithm) [Becker97], as sketched below
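A sketch of why the time reversal helps (function names are mine): all candidate subsequences end at the current frame, so after reversal they share a common start, and a single pass of the forward recursion yields a score for every candidate length at once.

```python
import numpy as np

def scores_per_length(pi, A, b, window):
    """Score all subsequences that END at the newest frame of `window`.
    Assumes the model was trained on time-reversed sequences: we run the
    forward recursion over the reversed window, and the alpha-sum after
    frame t is the likelihood of the length-(t+1) candidate."""
    X = window[::-1]                      # newest frame first
    n = len(pi)
    alpha = np.array([pi[j] * b(j, X[0]) for j in range(n)])
    scores = [alpha.sum()]                # candidate of length 1
    for x in X[1:]:
        alpha = np.array([b(j, x) * np.dot(A[:, j], alpha) for j in range(n)])
        scores.append(alpha.sum())        # one more frame included
    return scores
```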
Detection
[Figure: output of the phase models during a sequence of 2 pointing gestures (with P0 subtracted)]
• the likelihood P0 of the garbage model M0 is subtracted from the gesture model likelihoods (P1, P2, P3)
• P0 thus serves as a threshold
• → Detect a pointing gesture whenever there are 3 points in time tB < tH < tE, so that
  - PE(tE) > PB(tE) and PE(tE) > 0
  - PB(tB) > PE(tB) and PB(tB) > 0
  - PH(tH) > 0
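Spelled out as code, the rule looks like the following brute-force sketch (an online detector would scan causally instead; names are mine, and PB/PH/PE are assumed to be per-frame likelihood arrays with P0 already subtracted):

```python
def detect_pointing(PB, PH, PE):
    """Return (tB, tH, tE) for the first frame triple satisfying the
    begin/hold/end conditions above, or None if no gesture is found."""
    T = len(PB)
    for tB in range(T):
        if not (PB[tB] > PE[tB] and PB[tB] > 0):
            continue                                  # begin-phase condition
        for tH in range(tB + 1, T):
            if PH[tH] <= 0:
                continue                              # hold-phase condition
            for tE in range(tH + 1, T):
                if PE[tE] > PB[tE] and PE[tE] > 0:
                    return tB, tH, tE                 # end-phase condition met
    return None
```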
Features
Hand position (r, ∆θ, ∆y)
• Coordinates of the hand in a cylindrical, head-centered coordinate system

Head orientation (|θHead - θHand|, |ΦHead - ΦHand|)
• Difference between the head's and the hand's azimuth/elevation angle
• 0, if head and hand are "in-line"
• Measured with a magnetic sensor, or visually estimated (ANN)
[see also Campbell, 96]
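A sketch of the cylindrical hand features (assuming y is the vertical axis and head/hand are 3D points delivered by the tracker; names and axis convention are mine):

```python
import numpy as np

def hand_features(head, hand, prev_hand):
    """(r, dtheta, dy): hand position in a cylindrical, head-centered
    coordinate system plus frame-to-frame motion."""
    rel = hand - head
    r = np.hypot(rel[0], rel[2])          # horizontal distance to the head axis
    theta = np.arctan2(rel[2], rel[0])    # azimuth around the vertical axis
    y = rel[1]                            # height relative to the head
    prev_rel = prev_hand - head
    dtheta = theta - np.arctan2(prev_rel[2], prev_rel[0])  # ignoring angle wrap
    dy = y - prev_rel[1]
    return np.array([r, dtheta, dy])
```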
Pointing Direction
Head-hand line
• Line of sight between head and hand
• Provided by the tracker
Forearm orientation
• PCA of the point cloud around the hand → forearm orientation

Head orientation
• Visually estimated (ANNs)
• (Magnetic sensor)
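A sketch of the forearm estimate via PCA (the neighborhood radius and names are illustrative; `points` would come from, e.g., a stereo point cloud):

```python
import numpy as np

def forearm_direction(points, hand, radius=0.2):
    """Principal axis of the 3D points within `radius` (m) of the hand;
    the eigenvector of the largest eigenvalue approximates the forearm
    orientation."""
    near = points[np.linalg.norm(points - hand, axis=1) < radius]
    centered = near - near.mean(axis=0)
    cov = centered.T @ centered / len(near)
    evals, evecs = np.linalg.eigh(cov)    # eigenvalues ascending
    return evecs[:, -1]                   # direction of largest variance
```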
Experiments and Results
Scenario: Household Robot
• 118 gestures
• 8 targets
[Figure: 3D plot of the 8 pointing targets in room coordinates, x/y/z in m]
[Video]
• Head and hand tracking will be discussed later (→ Tracking II). (Nickel & Stiefelhagen, 2004)
Results Pointing Direction
Accuracy of pointing direction estimation
on manually labeled hold-phases
Method | Error angle | Targets identified | Availability
Head-hand line | 25° | 90% | 98%
Forearm line | 39° | 73% | 78%
Head orientation | 22° | 75% | (100%)
Results in Gesture Recognition
Performance of the automatic gesture recognition and pointing direction estimation:
Features | Recall | Precision | Error angle*
No head orientation | 79.8% | 73.6% | 19.4°
Visually estimated head orientation | 78.3% | 87.1% | 16.9°
True (sensor) head orientation | 78.3% | 86.3% | 16.8°
*Head-hand line
Multimodal Human-Robot Interaction
Multimodal Interaction in a Household Scenario
• Person Tracking & Gesture Recognition
[Example dialogue: "Take the cup!" / "Which cup do you want me to take?" / "This one!"]
• Speech Recognition
• Multimodal Dialog Manager
  - Initiates a clarification dialog for missing parameters
• Speech Synthesis
• Integrated on a Mobile Robot
  - able to find & follow a person
SFB 588 "Humanoide Roboter - Lernende und kooperierende multimodale Roboter" (Humanoid Robots - Learning and Cooperating Multimodal Robots)
Gesture and Speech [Poddar98]
• Classify a weatherman's gestures: contour, area, point, [rest]
• Features: hand motion (color segmentation)
• Classification: HMM
Gesture and Speech [Poddar98]
• Certain keywords co-occur with certain gestures → boost a gesture when the corresponding keyword has been detected
• Method (sketched below): replace a gesture according to the co-occurrence matrix, if
  - the recognition probability is small
  - a keyword is spotted in the gesture time window
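A toy sketch of this rescoring rule (the keyword-to-gesture mapping, threshold value, and names are invented for illustration; [Poddar98] learn the co-occurrences from data):

```python
# Invented keyword -> gesture co-occurrence mapping for illustration only
COOC = {"front": "contour", "over": "area", "here": "point"}

def rescore(gesture, prob, keywords, threshold=0.5):
    """If the HMM is unsure and a keyword was spotted inside the gesture's
    time window, substitute the gesture that co-occurs with the keyword."""
    if prob < threshold:                  # recognition probability is small
        for kw in keywords:               # keywords spotted in the window
            if kw in COOC:
                return COOC[kw]           # replace per co-occurrence matrix
    return gesture
```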
Related Topics
• Human Motion Analysis
  - Gait: classification and person identification
  - Classification of motion figures in dance, sport, …
• Facial expression [Yam02]
• Synthesizing gestures and facial expressions
  - Talking heads
  - Avatars

Efros et al., Recognizing Action at a Distance, ICCV'03
Summary
• Gesture Taxonomy
  - Manipulative - semaphoric - conversational
  - Static vs. dynamic
• Hidden Markov Models
  - "The 3 problems"
  - Gaussian mixture models
• Examples
  - American Sign Language recognition
  - Pointing gesture recognition
  - Gesture and speech
References
* [Pavlović97] V.I. Pavlović, R. Sharma, T.S. Huang: Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.
* [Rabiner89] L.R. Rabiner: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proc. IEEE, 77(2), 257-286, 1989.
* [Starner98] T. Starner, J. Weaver, A. Pentland: Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1371-1375, 1998.
* [Nickel04] K. Nickel, R. Stiefelhagen: 3D-Tracking of Heads and Hands for Pointing Gesture Recognition in a Human-Robot Interaction Scenario. Sixth Int. Conf. on Face and Gesture Recognition, May 2004, Seoul, Korea.

Additional:
[Becker97] D.A. Becker: Sensei: A Real-Time Recognition, Feedback and Training System for T'ai Chi Gestures. M.I.T. Media Lab Perceptual Computing Group Technical Report No. 426, 1997.
[Cassell98] J. Cassell: A Framework For Gesture Generation And Interpretation. In R. Cipolla and A. Pentland (eds.), Computer Vision in Human-Machine Interaction, pp. 191-215. New York: Cambridge University Press, 1998.
[Poddar98] I. Poddar, Y. Sethi, E. Ozyildiz, R. Sharma: Toward Natural Gesture/Speech HCI: A Case Study of Weather Narration. Proc. Workshop on Perceptual User Interfaces, pages 1-6, November 1998.
[Xiong03] Y. Xiong, F. Quek, D. McNeill: Hand Motion Gestural Oscillations and Multimodal Discourse. ICMI 2003.