Activity Analysis of Sign Language Video Generals exam Neva Cherniavsky


Page 1: Activity Analysis of Sign Language Video

Activity Analysis of Sign Language Video

Generals exam

Neva Cherniavsky

Page 2: Activity Analysis of Sign Language Video

MobileASL goal:
• ASL communication using video cell phones over current U.S. cell phone network

Challenges:
• Limited network bandwidth
• Limited processing power on cell phones
• FAQ

Page 8: Activity Analysis of Sign Language Video

Activity Analysis and MobileASL

• Use qualities unique to sign language
– Signing/Not signing/Finger spelling
– Information at beginning and ending of signs

• Decrease cost of sending video
– Maximum bandwidth
– Total data sent and received
– Power consumption
– Processing cost

Page 9: Activity Analysis of Sign Language Video

One Approach: Variable Frame Rate

Page 10: Activity Analysis of Sign Language Video

Variable Frame Rate

• Decrease frame rate during “listening”

• Goal: reduce cost while maintaining or increasing intelligibility
– Maximum bandwidth?
– Total data sent and received?
– Power consumption?
– Processing cost?

YES NO YES YES
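
The idea of lowering the frame rate during "listening" can be sketched in a few lines. This is only an illustration under stated assumptions: the per-frame labels are taken as given, and the function name `select_frames` and the knob `listening_keep_every` are hypothetical, not part of MobileASL.

```python
def select_frames(labels, listening_keep_every=5):
    """Return the indices of frames to transmit.

    labels: per-frame activity labels, "signing" or "listening".
    listening_keep_every: hypothetical knob; during a run of listening
    frames, only one frame out of every N is kept.
    """
    kept = []
    listening_run = 0  # consecutive listening frames seen so far
    for i, label in enumerate(labels):
        if label == "signing":
            listening_run = 0
            kept.append(i)  # full frame rate while signing
        else:
            if listening_run % listening_keep_every == 0:
                kept.append(i)  # sparse frames while listening
            listening_run += 1
    return kept
```

With this scheme the peak rate during signing is unchanged; only the total number of frames sent goes down.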

Page 11: Activity Analysis of Sign Language Video

Demo

Page 12: Activity Analysis of Sign Language Video

The story so far...

• Showed variable frame rate can reduce cost (25% savings in bit rate)

• Conducted user studies to determine intelligibility of variable frame rate videos
– Quality of each frame held constant (data transmitted decreased with decreased frame rate)
– Lowering frame rate did not affect intelligibility
– Freeze frame thought unnatural

Page 13: Activity Analysis of Sign Language Video

Outline

1. Introduction

2. Completed Activity Analysis Research
   a. Feature extraction
   b. Classification

3. Proposed Activity Analysis Research

4. Timeline to complete dissertation

Page 14: Activity Analysis of Sign Language Video

Activity Analysis, big picture

[Diagram: Raw Data → Feature Extraction → Classification Engine → Classification → Modification]

Page 15: Activity Analysis of Sign Language Video

Activity Analysis, thus far

[Diagram: Feature Extraction → Classification → Signing, Listening]

Page 16: Activity Analysis of Sign Language Video

Features

H.264 information:

Type of macroblock

Motion vectors

Page 17: Activity Analysis of Sign Language Video

Features cont.

Features:

(x,y) motion vector face

(x,y) motion vector left

(x,y) motion vector right

# of I blocks
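
As a rough sketch, the four features above can be packed into a seven-element vector per frame. The slides do not say how the motion vectors within a region are summarized, so the averaging helper here is an assumption, and the names `FrameFeatures` and `mean_motion_vector` are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


def mean_motion_vector(vectors: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Average the (x, y) motion vectors of one region (assumed summary)."""
    if not vectors:
        return (0.0, 0.0)
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


class FrameFeatures(NamedTuple):
    face_mv: Tuple[float, float]    # (x, y) motion vector, face region
    left_mv: Tuple[float, float]    # (x, y) motion vector, left region
    right_mv: Tuple[float, float]   # (x, y) motion vector, right region
    num_i_blocks: int               # number of I (intra-coded) macroblocks

    def to_vector(self) -> List[float]:
        """Flatten into the 7 numbers handed to a classifier."""
        return [*self.face_mv, *self.left_mv, *self.right_mv,
                float(self.num_i_blocks)]
```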

Page 18: Activity Analysis of Sign Language Video

Classification

• Train via labeled examples

• Training can be performed offline, testing must be real-time

• Support vector machines

• Hidden Markov models

Page 19: Activity Analysis of Sign Language Video

Support vector machines

• More accurately called support vector classifier

• Separates training data into two classes so that they are maximally apart

Page 20: Activity Analysis of Sign Language Video

Maximum margin hyperplane

[Figure: small margin vs. large margin; support vectors]

Page 21: Activity Analysis of Sign Language Video

What if it’s non-linear?

Page 22: Activity Analysis of Sign Language Video

Implementation notes

• May not be separable
– Use linear separation, but allow training errors
– Higher cost for errors = more accurate model, may not generalize

• libsvm, publicly available Matlab library
– Exhaustive search on training data to choose best parameters
– Radial basis kernel function

• As originally published, no temporal information
– Use “sliding window”, keep track of classification
– Majority vote gives result
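
The sliding-window majority vote can be sketched as below. This assumes a trailing window over the most recent per-frame classifier outputs; the slides do not say whether the window is trailing or centered, and the function name `smooth_labels` is hypothetical.

```python
from collections import Counter, deque


def smooth_labels(frame_labels, window=3):
    """Replace each per-frame label by the majority vote over a trailing window.

    frame_labels: raw classifier outputs, one per frame.
    window: number of most recent frames that vote (3, 4, or 5 in the slides).
    """
    recent = deque(maxlen=window)  # holds the last `window` raw labels
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        # On a tie, Counter keeps the label seen earliest in the window.
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed
```

The vote suppresses isolated misclassifications at the cost of a short delay when the activity really does change.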

Page 25: Activity Analysis of Sign Language Video

SVM Classification Accuracy

Test video   SVM     SVM 3 frame   SVM 4 frame   SVM 5 frame
gina1        87.8%   88.8%         87.9%         88.7%
gina2        85.2%   87.4%         90.3%         88.3%
gina3        90.6%   91.3%         91.1%         91.3%
gina4        86.6%   87.1%         87.6%         87.6%
Average      87.6%   88.7%         89.2%         89.0%

Page 26: Activity Analysis of Sign Language Video

Hidden Markov models

• Markov model: finite state model, obeys Markov property

  $\Pr[X_n = x \mid X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, \ldots, X_1 = x_1] = \Pr[X_n = x \mid X_{n-1} = x_{n-1}]$

• Current state depends only on previous state

• Hidden Markov model: states are hidden, infer through observations

Page 27: Activity Analysis of Sign Language Video

[Figure: example Markov model diagram with probability labels]

Page 28: Activity Analysis of Sign Language Video

Different models

[Figure: diagrams of different models, each with its own probability labels]

Page 29: Activity Analysis of Sign Language Video

Two ways to solve recognition

1. Given observation sequence O and a choice of models λ, maximize Pr(O|λ)
   Speech recognition: which word produced observation?

2. Given observation sequence and model, find the most likely state sequence.
   Has been used for continuous sign recognition.

Page 31: Activity Analysis of Sign Language Video

Two ways to solve recognition

1. Given observation sequence O and model λ, what is Pr(O|λ)?
   Speech recognition: which word produced observation?

2. Given observation sequence and model, find the most likely state sequence.
   Has been used for continuous sign recognition [Starner95].
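
The first approach, computing Pr(O|λ) for each candidate model and keeping the best, is usually done with the forward algorithm. The sketch below assumes a discrete-observation HMM given as plain lists (start, transition, and emission probabilities); the function names and the model layout are illustrative only and are not taken from the slides or from any library.

```python
def forward_likelihood(obs, start, trans, emit):
    """Pr(O | model) for a discrete HMM via the forward algorithm.

    obs:   non-empty list of observation symbol indices
    start: start[s] = probability of starting in state s
    trans: trans[p][s] = probability of moving from state p to state s
    emit:  emit[s][o] = probability of emitting symbol o in state s
    """
    n = len(start)
    # alpha[s] = Pr(observations so far, current state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


def classify(obs, models):
    """Pick the model name whose HMM gives the observations highest likelihood.

    models: dict mapping a name to a (start, trans, emit) tuple.
    """
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))
```

For long sequences the raw probabilities underflow, so practical implementations work in log space or rescale `alpha` at every step.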

Page 32: Activity Analysis of Sign Language Video

Implementation notes

• Use htk, publicly available library written in C

• Model signing/not signing as “words”
– Other possibility is to trace state sequence
– Each is a 3 state model, no backward transitions

• Must include some temporal info, else degenerate (biased coin flip)

• Use 3, 4, and 5 frame window
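
A 3 state model with no backward transitions has a transition matrix that is zero below the diagonal. The sketch below builds one such matrix; the self-loop probability `stay_prob`, the absence of skip transitions, and the absorbing last state are all assumptions made for illustration, and this is not how htk itself stores a model.

```python
def left_to_right_transitions(num_states=3, stay_prob=0.6):
    """Transition matrix with self-loops and single forward steps only.

    stay_prob is a hypothetical initial value; training would re-estimate it.
    """
    trans = [[0.0] * num_states for _ in range(num_states)]
    for s in range(num_states):
        if s + 1 < num_states:
            trans[s][s] = stay_prob            # remain in the same state
            trans[s][s + 1] = 1.0 - stay_prob  # advance to the next state
        else:
            trans[s][s] = 1.0                  # last state absorbs in this sketch
    return trans
```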

Page 34: Activity Analysis of Sign Language Video

HMM Classification Accuracy

Test video   HMM 3 frame   HMM 4 frame   HMM 5 frame   Best SVM
gina1        87.3%         88.4%         88.4%         88.8%
gina2        85.4%         86.0%         86.8%         90.3%
gina3        87.3%         88.6%         89.2%         91.3%
gina4        82.6%         82.5%         81.4%         87.6%
Average      85.7%         86.4%         86.5%         89.2%

Page 35: Activity Analysis of Sign Language Video

Outline

1. Motivation

2. Completed Activity Analysis Research

3. Proposed Activity Analysis Research
   a. Recognize finger spelling
   b. Recognize movement epenthesis

4. Timeline to complete dissertation

Page 36: Activity Analysis of Sign Language Video

Activity Analysis, thus far

[Diagram: Feature Extraction → Classification → Signing, Listening]

Page 37: Activity Analysis of Sign Language Video

Activity Analysis, proposed

[Diagram: Feature Extraction → Classification → Signing, Listening, Finger spelling, Movement epenthesis]

Page 38: Activity Analysis of Sign Language Video

Proposed Research

• Recognize new activity
– Finger spelling
– Movement epenthesis (= sign segmentation)

• Questions
– Why is this valuable?
– Is it feasible?
– How will it be solved?

Page 39: Activity Analysis of Sign Language Video

Why? Finger spelling

Believe that increased frame rate will increase intelligibility

Will confirm optimal frame rate through user studies

Page 40: Activity Analysis of Sign Language Video

Why? Movement epenthesis

• Choose frames so that a low frame rate is more intelligible

• Potentially first step in continuous sign language recognition engine

• Irritation must not outweigh savings; verify through user studies

Page 41: Activity Analysis of Sign Language Video

Is it feasible?

• Previous (somewhat successful) work:
– Direct measure device
– Rules-based
    • Change in motion trajectory, low motion [Sagawa00]
    • Finger flexion [Liang98]

• Previous very successful work (98.8%)
– Neural Network + direct measure device
– Frame classified as left boundary, right boundary, or interior [Fang01]

Page 42: Activity Analysis of Sign Language Video

Is it feasible?

• Previous (somewhat successful) work:
– Direct measure device
– Rules-based
    • Change in motion trajectory, low motion [Sagawa00]
    • Finger flexion [Liang98]

• Previous very successful work (98.8%)
– Neural Network + direct measure device
– Frame classified as beginning of sign, end of sign, or interior [Fang01]

Page 43: Activity Analysis of Sign Language Video

How?

• Improved feature extraction
– Use the part of sign to inform extraction
– See what works from the sign recognition literature

• Improved classification

Page 44: Activity Analysis of Sign Language Video

Parts of sign

• Handshape
– Most work in sign language recognition focused here
– Includes expensive techniques (time, power)

• Movement
– We only use this right now!
– Often implicitly recognized in machine learning

• Location
• Palm orientation
• Nonmanual signals (facial expression)

Page 47: Activity Analysis of Sign Language Video

Add center of gravity to features


Page 48: Activity Analysis of Sign Language Video

Parts of sign recognized by center of gravity

• Handshape
• Movement
• Location
• Palm orientation
• Nonmanual signals (facial expression)

Page 49: Activity Analysis of Sign Language Video

Accurate COG

• Bayesian filters
– Very similar to hidden Markov models
– What state are we in, given the (noisy) observations?
– Find posterior pdf of state
– Kalman filter, particle filter

• Viola and Jones [01] object detection

Page 50: Activity Analysis of Sign Language Video

Bayesian filters

Update
• Kalman: assume linear system, minimize MSE; measure
• Particle: sum of weighted samples; measure, update weights

Predict
• Kalman: add in noise, guess state
• Particle: add in noise, guess particle location
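
The predict and update steps of a Kalman filter can be sketched for a single coordinate, for example the x position of the center of gravity. This is a minimal one-dimensional version under a constant-position model; the noise variances and function names are hypothetical values chosen for illustration, not taken from the talk.

```python
def kalman_predict(mean, var, process_var):
    """Predict: guess the next state and add process noise to the uncertainty."""
    # Constant-position model: the state estimate itself does not move.
    return mean, var + process_var


def kalman_update(mean, var, measurement, measurement_var):
    """Update: blend the prediction with a noisy measurement."""
    gain = var / (var + measurement_var)  # Kalman gain, between 0 and 1
    new_mean = mean + gain * (measurement - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


def track(measurements, init_mean=0.0, init_var=1.0,
          process_var=0.01, measurement_var=0.25):
    """Run predict/update over a sequence of noisy 1-D measurements."""
    mean, var = init_mean, init_var
    estimates = []
    for z in measurements:
        mean, var = kalman_predict(mean, var, process_var)
        mean, var = kalman_update(mean, var, z, measurement_var)
        estimates.append(mean)
    return estimates
```

A particle filter follows the same predict/update cycle but represents the posterior with weighted samples instead of a mean and variance.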

Page 51: Activity Analysis of Sign Language Video

How?

• Improved feature extraction

• Improved machine learning
– 3 class SVM for finger spelling
– State sequence HMM
– AdaBoost [Freund97]

Page 52: Activity Analysis of Sign Language Video

AdaBoost (adaptive boosting)

Page 53: Activity Analysis of Sign Language Video

AdaBoost Algorithm

• In each round t = 1 to T:
– Train a “weak learner” on weighted data
– $h_t : \text{features} \to \{\text{signing}, \text{listening}\}$; error is sum of weights of misclassified examples
– $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \text{error}}{\text{error}}\right)$
– Reweight based on error, normalize weights

• Answer is $\mathrm{sign}\left(\sum_t \alpha_t h_t\right)$
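
The rounds above can be sketched with threshold "stumps" on a single feature as the weak learners. The choice of stumps, the encoding of signing as +1 and listening as -1, and all function names are assumptions for illustration; the slides do not specify the weak learner.

```python
import math


def train_stump(xs, ys, weights):
    """Find the threshold stump with the lowest weighted error on 1-D inputs."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for p, y, w in zip(preds, ys, weights) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def stump_predict(x, threshold, polarity):
    return polarity if x >= threshold else -polarity


def adaboost(xs, ys, rounds=5):
    """ys are +1 (signing) or -1 (listening). Returns (alpha, threshold, polarity) triples."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all weak learners."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1
```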

Page 54: Activity Analysis of Sign Language Video

Outline

1. Motivation

2. Completed Research

3. Proposed Research

4. Timeline to complete dissertation

Page 55: Activity Analysis of Sign Language Video

Timeline

• October 2007 - March 2008: Recognize signing/listening/finger spelling

• Deadline: Automatic Face and Gesture Recognition, March 28, 2008
1. Bayesian filters for better features.
2. Viola and Jones’s object detection.
3. Improve hidden Markov model.
4. Evaluate three class support vector machine.
5. Implement AdaBoost, cascade.
6. Experiment with combining these techniques.

Page 56: Activity Analysis of Sign Language Video

Timeline, cont.

• April 2008 - May 2008: Run user study to evaluate optimal frame rate for finger spelling.

• Deadline: ASSETS 2008, May 25, 2008

• June 2008 - December 2008: Apply techniques to the problem of sign segmentation.
1. Evaluate feature set and improve.
2. Conduct a user study to evaluate intelligibility of dropping frames during movement epenthesis.
3. Improve machine learning techniques; implement combination via decision trees.

• Early 2009: Complete dissertation and defend.

Page 57: Activity Analysis of Sign Language Video

Questions?