Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform. Presented in Partial Fulfillment of the Requirements of the Degree of Master of Science in the School of Communication and Information Technology. Fadwa Fawzy Fouad. Supervisor: Dr. Moataz M. Abdelwahab

Upload: fadwa-fouad

Post on 10-May-2015


Page 1: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Human Action Recognition in Videos Employing

2DPCA on 2DHOOF and Radon Transform

Presented in Partial Fulfillment of the Requirements of the Degree of Master of Science in the School of Communication and Information Technology

Fadwa Fawzy Fouad

Supervisor: Dr. Moataz M. Abdelwahab

Page 2: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Agenda

Introduction

Quick overview

2DHOOF/2DPCA Contour Based Optical Flow Algorithm

Human Gesture Recognition Employing Radon Transform/2DPCA

Page 3: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Introduction

• Importance & Applications

• Action vs. Activity

• Challenges & characteristics of the domain

Page 4: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Importance & Applications

Human action/activity recognition is one of the most promising applications of computer vision. Interest in this topic is motivated by the promise of many applications, including:

• character animation for games and movies

• advanced intelligent user interfaces

• biomechanical analysis of actions for sports and medicine

• automatic surveillance

Page 5: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Action vs. Activity

Action

Simple motion pattern

Single person

Short time duration

Activity

Complex sequence of actions

Single/ multiple person(s)

Long time duration

Page 6: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Challenges and characteristics of the domain

The difficulty of the recognition process is associated with multiple sources of variation:

Inter- and intra-class variations

Environmental Variations and Capturing conditions

Temporal variations

Page 7: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

• Intra-class variations (variations within a single class)

Variations in the performance of a given action due to anthropometric differences between individuals. For example, running movements can differ in speed and stride length.

• Inter-class variations (variations between different classes)

Overlap between different action classes due to the similarity in how the actions are performed.

Page 8: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

• Environmental variations

Distractions originating from the actor’s surroundings, including dynamic or cluttered environments, illumination variation, and body occlusion.

• Capturing conditions

These depend on the method used to capture the scene: single or multiple, static or dynamic camera systems.

• Temporal variations

These include changes in the performance rate from one person to another, as well as changes in the recording rate (frames/sec).

Page 9: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Agenda

Introduction

Quick overview

2DHOOF/2DPCA Contour Based Optical Flow Algorithm

Human Gesture Recognition Employing Radon Transform/2DPCA

Page 10: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Overview

The main structure of action recognition system

Page 11: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

The main structure of action recognition system

The structure of the action recognition system is typically hierarchical.

Start → Capture the input video → Human detection & segmentation → Extraction of the action descriptors → Action classification → End

Page 12: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Capture the input video

With a single camera, the scene is captured from only one viewpoint, so a poor viewpoint may not provide enough information about the performed action. A single camera also cannot handle the occlusion problem.

Video 1

Video 2

Video 3 Video 4

Page 13: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Multi-camera systems can capture the same scene from different viewpoints, so they provide enough information to alleviate the occlusion problem.

Camera 0 Camera 1

Camera 2 Camera 3

Page 14: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

The Kinect depth camera can be used to capture the performed actions. The device has an RGB camera, a depth sensor, and a multi-array microphone.

It provides full-body 3D motion capture, facial recognition and voice recognition capabilities. Furthermore, depth information can be used for segmentation.

Kinect depth camera

RGB information

Depth information

Page 15: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Human detection & segmentation

It is the first step of the full process of human sequence evaluation.

Techniques can be divided into:

• Background Subtraction techniques

• Motion Based techniques

• Appearance Based techniques

• Depth Based Segmentation

Page 16: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Extraction of the action descriptors

Input videos consist of massive amounts of information in the form of spatio-temporal pixel intensity variations. But most of this information is not directly relevant to the task of understanding and identifying the activity occurring in the video.

In this work we used non-parametric approaches, in which a set of features is extracted per video frame; these features are then accumulated and matched to stored templates.

Example: Motion Energy Image & Motion History Image
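As a rough illustration of how per-frame features can be accumulated into a single template, the sketch below computes a Motion Energy Image (MEI) and a Motion History Image (MHI) from frame differences. The function name `motion_templates`, the `diff_threshold` value, and the use of simple frame differencing as the motion mask are assumptions made for this example, not details taken from the slides.

```python
import numpy as np

def motion_templates(frames, diff_threshold=25, tau=None):
    """Return (MEI, MHI) for a sequence of equally sized greyscale frames.

    Assumption: motion is detected by thresholding the absolute difference
    of successive frames; tau defaults to the number of frame pairs.
    """
    frames = [np.asarray(f, dtype=float) for f in frames]
    tau = tau if tau is not None else len(frames) - 1
    mhi = np.zeros_like(frames[0])
    for prev, curr in zip(frames[:-1], frames[1:]):
        moving = np.abs(curr - prev) > diff_threshold  # binary motion mask
        # Moving pixels are reset to tau; all other pixels decay by one step.
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    mei = mhi > 0  # pixels that moved within the last tau frame pairs
    return mei, mhi
```

The MEI records where motion occurred, while the MHI also encodes how recently it occurred: brighter values correspond to more recent motion.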

Page 17: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Action classification

When the extracted features are available for an input video, human action recognition becomes a classification problem.

Dimensionality reduction is a common step before the actual classification and is discussed first.

Dimensionality reduction

Image representations are often high-dimensional, which makes the matching task computationally more expensive. The representation might also contain noisy features. This problem triggered the idea of obtaining a more compact, robust feature representation by projecting the image representation into a lower-dimensional space.

Example: one- or two-dimensional Principal Component Analysis (PCA, 2DPCA)
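A minimal sketch of 2DPCA may help make the dimensionality-reduction step concrete. It follows the standard 2DPCA formulation (an image covariance matrix built directly from 2D matrices, without flattening them into vectors); the function names `fit_2dpca` and `project_2dpca` are hypothetical, and the sketch does not claim to reproduce the exact per-layer procedure used in this work.

```python
import numpy as np

def fit_2dpca(images, num_vectors):
    """Return the num_vectors dominant eigenvectors (as columns) of the image covariance matrix."""
    A = np.asarray(images, dtype=float)        # shape (M, rows, cols)
    centered = A - A.mean(axis=0)              # subtract the mean image
    # G = (1/M) * sum_m (A_m - mean)^T (A_m - mean), shape (cols, cols)
    G = np.einsum('mij,mik->jk', centered, centered) / len(A)
    eigvals, eigvecs = np.linalg.eigh(G)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_vectors]
    return eigvecs[:, order]

def project_2dpca(image, dominant_vectors):
    """Project one image onto the dominant vectors: (rows, cols) -> (rows, num_vectors)."""
    return np.asarray(image, dtype=float) @ dominant_vectors
```

Because the covariance matrix is only cols x cols, it is much smaller than the covariance matrix of conventional PCA on flattened images, which is the usual argument for 2DPCA.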

Page 18: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Nearest neighbor classification

k-Nearest neighbor (NN) classifiers use the distance between the features of an observed sequence and those in a training set. The most common label among the k closest training sequences is chosen as the classification.

NN classification can be either performed at the frame level, or for the whole video sequences. In the latter case, issues with different frame lengths need to be resolved.

In our work we used 1-NN with Euclidean distance to classify the tested actions.
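The 1-NN rule with Euclidean distance can be sketched in a few lines. The function name `nearest_neighbour_label` and the flattening of each feature matrix into a vector before computing the distance are assumptions of this example.

```python
import numpy as np

def nearest_neighbour_label(test_feature, train_features, train_labels):
    """Return (label, distance) of the training sample closest to test_feature."""
    test = np.asarray(test_feature, dtype=float).ravel()
    distances = [
        np.linalg.norm(test - np.asarray(f, dtype=float).ravel())
        for f in train_features
    ]
    best = int(np.argmin(distances))  # index of the smallest Euclidean distance
    return train_labels[best], distances[best]
```

For example, `nearest_neighbour_label([0, 0], [[3, 4], [1, 0]], ["run", "walk"])` selects "walk", since its distance (1.0) is smaller than that of "run" (5.0).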


Page 19: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Agenda

Introduction

Quick overview

2DHOOF/2DPCA Contour Based Optical Flow Algorithm

Human Gesture Recognition Employing Radon Transform/2DPCA

Page 20: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

2DHOOF/2DPCA Contour Based Optical Flow Algorithm

• Dense vs. Sparse OF

• Alignment issues with OF

• The Calculation of 2D Histogram of Optical Flow (2DHOOF)

• Overall System Description

• Experimental Results

Page 21: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Dense vs. Sparse OF

In practice, dense OF is not the best choice for computing the OF. Besides its high computational complexity, it is not accurate for homogeneous moving objects (the aperture problem).

Page 22: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Alignment issues with OF

We had two choices for the order of actor alignment:

• Align actor then calculate OF

• Calculate OF then align it

Page 23: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Jumping & Transition effects in Running action

Page 24: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Align actor then calculate OF Calculate OF then Align OF

Page 25: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

The Calculation of 2D Histogram of Optical Flow (2DHOOF)

Calculated OF

Histogram layers: W/m x H/m x n

Page 26: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

An example to obtain the n-layers 2DHOOF for any two successive frames

Page 27: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Accumulated 2D-HOOF that represents the whole video
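The slides give only the output size of the histogram (W/m x H/m x n), so the following is a sketch under stated assumptions rather than the exact procedure: the frame is divided into m x m blocks, the flow direction is quantised into n equal orientation bins over [0, 2π), and each layer counts the non-zero flow vectors of one orientation bin per block. The names `hoof_2d` and `accumulate_hoof` are hypothetical.

```python
import numpy as np

def hoof_2d(flow, m, n):
    """flow: (H, W, 2) array of (dx, dy). Returns an (H//m, W//m, n) histogram."""
    H, W, _ = flow.shape
    rows, cols = H // m, W // m
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    angle = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    # Orientation bin of every pixel, clipped so that angle == 2*pi stays in range.
    layer = np.minimum((angle / (2 * np.pi) * n).astype(int), n - 1)
    hist = np.zeros((rows, cols, n))
    # Only pixels with non-zero flow inside the block grid contribute.
    ys, xs = np.nonzero(magnitude[:rows * m, :cols * m] > 0)
    np.add.at(hist, (ys // m, xs // m, layer[ys, xs]), 1)
    return hist

def accumulate_hoof(flows, m, n):
    """Sum the per-frame-pair histograms into one histogram for the whole video."""
    return sum(hoof_2d(f, m, n) for f in flows)
```

Unlike a 1D histogram, each layer keeps the spatial position of the flow vectors, which is what allows actions with similar orientation statistics but different moving body parts to be told apart.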

Page 28: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

1DHOOF vs. 2DHOOF

Page 29: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Confusion between Wave and Bend actions when using 1DHOOF

Wave

Bend

Page 30: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Overall System Description

Training Mode: Segmentation & Contour Extraction → Sparse OF → 2DHOOF → 2DPCA → Extract the dominant vectors → Store extracted features

Testing Mode: Segmentation & Contour Extraction → Sparse OF → 2DHOOF → Projection on the dominant vectors → Classification and Voting Scheme

Page 31: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Training Mode

Segmentation & Contour Extraction → Sparse OF → 2DHOOF → 2DPCA → Extract the dominant vectors → Store extracted features

Page 32: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Segmentation & Contour Extraction (Method 1)

• Geodesic segmentation

Input Video Frame

Face Detection

Initial Stroke

Blob Extraction

Final Contour

GD

Where x_i: stroke pixels (black); x: other pixels (white); I: image intensity

Page 33: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Segmentation & Contour Extraction (Method 2)

• Contour extraction from Magnitude dense OF

An edge pixel is identified by specific criteria based on its 3 x 3 neighboring pixels.

Page 34: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Applying the edge criteria to the magnitude of the dense OF

Page 35: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Steps of contour extraction from dense OF

Page 36: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Training Mode

Segmentation & Contour Extraction → Sparse OF → 2DHOOF → 2DPCA → Extract the dominant vectors → Store extracted features

Page 37: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

2DHOOF-2DPCA Features Extraction

2DHOOF of Training Videos → Mean/Layer → Covariance/Layer → Dominant Vectors/Layer → Projection → Final Features

Page 38: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Training Mode

Segmentation & Contour Extraction → Sparse OF → 2DHOOF → 2DPCA → Extract the dominant vectors → Store extracted features

Page 39: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Testing Mode

Segmentation & Contour Extraction → Sparse OF → 2DHOOF → Projection on the dominant vectors → Classification and Voting Scheme

Page 40: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Projection on the dominant vectors

Page 41: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Classification

D1

D2

D3

Dj

Final Decision based on the minimum D value

Page 42: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Experimental Results

Two experiments were conducted to evaluate the performance of the proposed algorithm.

• For the first experiment, the Weizmann dataset was used to measure the performance of the low-resolution, single-camera operation.

• For the second experiment, the IXMAS multi-view dataset was used to evaluate the performance of the parallel camera structure.

The two experiments were conducted using the Leave-One-Actor-Out (LOAO) technique to be consistent with the most recent algorithms.

Both datasets provide RGB frames and the actors’ silhouettes.
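The Leave-One-Actor-Out protocol can be sketched as follows: in each fold, all videos of one actor are held out for testing and the remaining actors are used for training. The sample layout `(actor_id, label, feature)` and the function name `leave_one_actor_out` are assumptions made for illustration.

```python
def leave_one_actor_out(samples):
    """Yield (held_out_actor, train_samples, test_samples) for every actor.

    samples: list of (actor_id, label, feature) tuples (hypothetical layout).
    """
    actors = sorted({actor for actor, _, _ in samples})
    for held_out in actors:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test
```

The reported accuracy is then the fraction of correctly classified test videos over all folds, so no actor ever appears in both the training and the testing set of the same fold.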

Page 43: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Weizmann dataset

The Weizmann dataset consists of 90 low-resolution video sequences showing 9 different actors, each performing 10 natural actions such as walk, run, jump forward, gallop sideways, bend, wave with one hand (wave1), wave with two hands (wave2), jump in place (Pjump), jump-jack, and skip.

Bend Run Jump Jump-jack Gallop

Page 44: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

The confusion matrix for this experiment shows that the average recognition accuracy is 97.78%, and eight actions were 100% accurate.

2DHOOF / 2DPCA

Page 45: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

On the other hand, using 1DHOOF with 1DPCA decreases the accuracy to 63.34% because of the large confusion between actions (as discussed before).

1DHOOF / 1DPCA

Page 46: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Comparison with the most recent algorithms:

• Recognition Accuracy

Method                  Accuracy
Previous Contribution   98.89%
Our Algorithm           97.78%
Shah et al.             95.57%
Yang et al.             92.8%
Yuan et al.             92.22%

• Average Testing Time

Method                  Average Runtime
Our Algorithm           66.11 msec
Previous Contribution   113.00 msec
Shah et al.             18.65 sec
Blank et al.            30 sec

Page 47: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Samples from the calculated contour OF

Walk Skip P-jump

Page 48: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

IXMAS Dataset

The proposed parallel structure algorithm was applied to the IXMAS multi-view dataset. Each camera is treated as an independent system, and a voting scheme is then carried out between the four cameras to obtain the final decision.

Camera 0 → Our Algorithm
Camera 1 → Our Algorithm
Camera 2 → Our Algorithm
Camera 3 → Our Algorithm

All four outputs → Voting Scheme → Final Decision
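The slides do not state the exact voting rule, so the sketch below assumes a simple majority vote over the per-camera decisions, with ties broken by the smallest classification distance. Both the rule and the name `vote` are assumptions for illustration only.

```python
from collections import Counter

def vote(camera_decisions):
    """camera_decisions: list of (label, distance) pairs, one per camera."""
    counts = Counter(label for label, _ in camera_decisions)
    top = max(counts.values())
    tied = {label for label, count in counts.items() if count == top}
    # Assumed tie-break: among the most-voted labels, pick the smallest distance.
    return min((d, label) for label, d in camera_decisions if label in tied)[1]
```

With four cameras a 2-2 split is possible, which is why some tie-breaking rule is needed; using the classifier distance is one natural option.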

In this dataset, 5 cameras capture the scene. There are 12 actors, each performing 13 natural actions 3 times, and the actors are free to change their orientation in each scenario.

The actions include check watch, cross arms, scratch head, sit down, get up, turn around, walk, wave, punch, kick, and pick up and throw.

Page 49: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Example on IXMAS multi-camera dataset. Action: Pick up and Throw

Camera 0 Camera 1

Camera 2 Camera 3

Page 50: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

The confusion matrix for the IXMAS dataset shows that the average accuracy is 87.12%, where SH = Scratch head, CW = Check watch, CA = Cross arms, SD = Sit down, GU = Get up, TA = Turn around, PU = Pick up.

Page 51: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Method                  Actors #  Cam(0) %  Cam(1) %  Cam(2) %  Cam(3) %  Overall Vote %
Proposed Algorithm      12        97.29     79.04     72.47     78.53     87.12
Previous Contribution   12        78.9      78.61     80.93     77.38     84.59
Weinland et al.         10        65.04     70.00     54.30     66.00     81.30
Srivastava et al.       10        N/A       N/A       N/A       N/A       81.40
Shah et al.             12        72.00     53.00     68.00     63.00     78.00

Comparison with the best reported accuracies shows that we achieved the highest overall accuracy, an improvement of about 2.5 percentage points over the next best result (84.59%).

Bold indicates the best performance; N/A = not available in published reports.

Page 52: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Samples from the calculated contour OF

Walk Sit down Kick

Page 53: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Published Paper

F. Fawzy, M. Abdelwahab, and W. Mikhael. 2DHOOF-2DPCA Contour Based Optical Flow Algorithm for Human Activity Recognition. IEEE International Midwest Symposium on Circuits and Systems (MWSCAS 2013), Ohio, USA.

Page 54: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Agenda

Introduction

Quick overview

2DHOOF/2DPCA Contour Based Optical Flow Algorithm

Human Gesture Recognition Employing Radon Transform/2DPCA

Page 55: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Human Gesture Recognition Employing Radon Transform/2DPCA

• Radon Transform (RT)

• Overall system description

Page 56: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Radon Transform

The RT computes projections of an image matrix along specified directions. A projection of a two-dimensional function f(x,y) is a set of line integrals along parallel paths, or beams.

Page 57: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

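Projections can be computed along any angle $\theta$ by using the general equation of the Radon Transform:

$$R(\rho, \theta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,\delta(\rho - x\cos\theta - y\sin\theta)\,dx\,dy$$

where $\delta(\cdot)$ is the delta function, which is nonzero only when its argument equals 0, $\rho$ is the position along the projection direction, and $\theta$ is the orientation of this direction.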
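A discrete version of this definition can be sketched directly: every pixel contributes its value to the bin of the line on which it lies, with the line position measured from the image centre. The function name `radon_projection` and the nearest-bin rounding are assumptions of this example; library implementations typically rotate and interpolate the image instead.

```python
import numpy as np

def radon_projection(image, theta_deg):
    """Discrete Radon projection of a 2D array at angle theta_deg (degrees).

    Each pixel (x, y) is added to the bin of the line
    x*cos(theta) + y*sin(theta) = rho that passes through it.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Centre the coordinates so that rho = 0 passes through the image centre.
    xc = xs - (w - 1) / 2.0
    yc = ys - (h - 1) / 2.0
    theta = np.deg2rad(theta_deg)
    rho = xc * np.cos(theta) + yc * np.sin(theta)
    half = int(np.ceil(np.hypot(h, w) / 2.0))
    # Round rho to the nearest integer bin and shift to non-negative indices.
    bins = np.floor(rho + 0.5).astype(int) + half
    return np.bincount(bins.ravel(), weights=image.ravel(), minlength=2 * half + 1)
```

With this convention, the projection at 0 degrees reduces to the column sums of the image (projection on the x-axis) and the projection at 90 degrees to the row sums (projection on the y-axis).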

Page 58: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Overall system description

The proposed system is designed and tested for gesture recognition and can be extended to regular action recognition.

We have two modes for this algorithm:

• Training Mode

• Testing Mode

Both have a pre-processing step before feature extraction.

Page 59: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Training Mode

Page 60: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Pre-processing Step: 1) Input videos

The One Shot Learning ChaLearn Gesture Dataset was used for this experiment. In this dataset, a single user facing a fixed Kinect™ camera was captured while interacting with a computer by performing gestures.

Videos are represented by RGB and depth images.

Each actor has 8 to 15 different gestures (the vocabulary) for training, and 47 input videos, each containing 1 to 5 gestures, for testing.

We applied our algorithm to a subset of this dataset consisting of 37 different actors.

Page 61: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

The dataset can be divided into two main groups: standing actors and sitting actors. In this experiment we used a subset of the standing-actor group, in which actors use their whole body to perform the gesture and make motion significant enough to be captured by the MEI and MHI.

Standing actors Sitting actors

Page 62: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Also, we used only the depth videos as input. Depth information makes the segmentation task easier than with RGB or gray videos, especially when the actor's clothes have the same color as the background or when the background is textured.

Page 63: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Training Mode

Page 64: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Pre-processing Step: 2) Segmentation & Blob extraction

We used the Basic Global Thresholding Algorithm to extract the actor's blob.

1. Select an initial estimate for the threshold $T$ (typically the average grey level in the image).

2. Segment the image using $T$ into two groups of pixels: $G_1$, consisting of pixels with grey levels $> T$, and $G_2$, consisting of pixels with grey levels $\le T$.

3. Compute the average grey levels of the pixels in $G_1$ to give $\mu_1$ and in $G_2$ to give $\mu_2$.

4. Compute a new threshold value: $T = \frac{1}{2}(\mu_1 + \mu_2)$. Repeat steps 2-4 until the change in $T$ between successive iterations is less than 1 or the total number of iterations exceeds 10.
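The iteration above can be sketched as follows. The function name `basic_global_threshold` and its parameter names are hypothetical; the stopping values (a change below 1, at most 10 iterations) follow the steps listed above.

```python
import numpy as np

def basic_global_threshold(image, max_iterations=10, min_change=1.0):
    """Iteratively estimate a global threshold T for a greyscale or depth image."""
    img = np.asarray(image, dtype=float)
    t = img.mean()                                   # step 1: initial estimate
    for _ in range(max_iterations):
        high, low = img[img > t], img[img <= t]      # step 2: split into two groups
        if high.size == 0 or low.size == 0:
            break                                    # constant image: nothing to split
        new_t = 0.5 * (high.mean() + low.mean())     # steps 3-4: mean of the group means
        converged = abs(new_t - t) < min_change
        t = new_t
        if converged:
            break
    return t
```

The blob is then obtained by comparing the depth image with the returned threshold; whether the actor lies above or below it depends on how the depth values are encoded.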

Page 65: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Page 66: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

In some cases the resulting blob includes other objects. This noise comes from objects that were at the same depth as the actor.

Case 1

Case 2

Case 3

Page 67: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

In this situation we perform a noise elimination step.

Case 1

Case 2

Case 3

Page 68: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Training Mode

Page 69: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Alignment using RT of the First Frame

• Vertical alignment using the projection on the y-axis (90° from RT)

Page 70: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

• Horizontal alignment using the projection on the x-axis (0° from RT)
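The slides do not spell out how the two projections are turned into an alignment, so the following is only one plausible sketch: the extent of the non-zero part of each projection of the first frame's blob gives the actor's bounding box, which is then used to crop every frame. The names `actor_bounds` and `crop_to_bounds` are hypothetical, and the projections at 0° and 90° are taken as the column and row sums of the binary blob.

```python
import numpy as np

def actor_bounds(blob):
    """Return (top, bottom, left, right) of the actor from the two projections."""
    mask = np.asarray(blob) > 0
    proj_x = mask.sum(axis=0)   # projection on the x-axis (RT at 0 degrees)
    proj_y = mask.sum(axis=1)   # projection on the y-axis (RT at 90 degrees)
    cols = np.nonzero(proj_x)[0]
    rows = np.nonzero(proj_y)[0]
    if cols.size == 0:
        raise ValueError("blob is empty")
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

def crop_to_bounds(frame, bounds):
    """Crop a frame to the bounds computed from the first frame."""
    top, bottom, left, right = bounds
    return np.asarray(frame)[top:bottom + 1, left:right + 1]
```

Computing the bounds once from the first frame and reusing them for the whole video keeps the actor's motion relative to a fixed window.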

Page 71: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Training Mode

Page 72: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Calculate the MEI and MHI

MEI MHI MEI MHI

Whole Body Body Parts

Page 73: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Training Mode

Page 74: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Get Radon Transform for MEI and MHI

Page 75: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

The main difference between the RT of the whole body and the RT of the body parts is the white portion in the center, which represents the projection of the actor's body.

Page 76: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Training Mode

Page 77: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Testing Mode

Page 78: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Video Chopping

As mentioned, the testing videos may contain from 1 to 5 different gestures per video, so we need to split them into one gesture per video before testing our system.

We do this in two main steps:

1. Calculate the plot that represents the moving area per frame.

2. Apply the local minima criteria to this plot.

Page 79: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

1. Calculate the plot that represents the moving area/frame

Page 80: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

2. Apply the Local minima criteria

We are searching for a frame i that satisfies the following conditions:

a) The number of frames before i is greater than or equal to the Frame Threshold.

b) The decrease in the area at i is greater than 50% of the peak value.

c) The areas at i-1 and i+1 are greater than the area at i, to ensure that i is a local minimum between two peaks.
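The three conditions can be sketched as a single pass over the moving-area plot. Two details are assumptions of this example rather than statements from the slides: the frame count in condition (a) is measured from the previous cut (or from the start of the video), and the peak in condition (b) is the maximum area within the current segment. The name `find_cut_frames` is hypothetical.

```python
def find_cut_frames(area, frame_threshold, drop_ratio=0.5):
    """Return the frame indices at which a multi-gesture video is split.

    area: moving area per frame (one number per frame).
    """
    cuts = []
    start = 0  # first frame of the current gesture segment
    for i in range(1, len(area) - 1):
        if i - start < frame_threshold:                  # condition (a)
            continue
        peak = max(area[start:i])
        deep_enough = (peak - area[i]) > drop_ratio * peak              # condition (b)
        local_min = area[i - 1] > area[i] and area[i + 1] > area[i]     # condition (c)
        if deep_enough and local_min:
            cuts.append(i)
            start = i
    return cuts
```

For the plot `[0, 5, 10, 5, 1, 5, 10, 6, 2, 0]` with a frame threshold of 3, only frame 4 satisfies all three conditions, so the video would be split there into two gestures.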

Page 81: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Good Results

Page 82: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Bad Results

Page 83: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Experimental Results

We ran four One-Shot Learning (OSL) experiments:

• Experiments I, II: Radon Transform features, classified with 2DPCA and with direct correlation

• Experiments III, IV: MEI/MHI features, classified with 2DPCA and with direct correlation

Page 84: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Recognition accuracy (%) of the four experiments

Features   Experiment   Whole Body MEI   Whole Body MHI   Body Parts MEI   Body Parts MHI
RT         I            71               69               82               81.5
RT         II           70               70               81.7             81.6
MEI/MHI    III          70               68               82               81.7
MEI/MHI    IV           71.24            68.7             83.33            82.9

Comparison between using RT and using MEI/MHI directly without RT (2DPCA)

Features   % Maintained Energy   Storage Requirements
RT         99%                   72 Mbytes
MEI/MHI    88%                   102 Mbytes

With 2DPCA, RT is better on both counts: it maintains more energy and requires about 30% less storage.

Page 85: Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform

Thank You