Human Activity Recognition from Video
TRANSCRIPT
-
8/3/2019 Human Activity Recognition From Video
1/24
Guide: Prof. Vijay Bhosale
Presented by: Ms. Rajashri S., Ms. Bhagyashri S., Ms. Dipashri S.
HUMAN ACTIVITY RECOGNITION from VIDEO
-
Human Activity Recognition
Focus is on three fundamental issues:
1. Design of a classifier & data modeling for activity recognition
2. How to perform feature selection
3. How to define the structure of the classifier
-
Introduction
Human movement at different levels:
Analysis of the movement of body parts
Single person activities
Over increasing temporal windows: large-scale interaction
Common tasks in human motion analysis:
Person detection & tracking
Activity classification
Behavior interpretation & person identification
-
Human action interpretation
Three approaches:
1. Generic model recovery: try to fit a 3D model to the person's pose
2. Appearance-based model: based on extraction of a 2D shape model directly from the image
3. Motion-based model: relies on characteristics of people's motion
-
Bobick and Davis [1] used Motion Energy and Motion History Images (MEI and MHI) to classify aerobic-type exercises.
Efros et al. [3] compute optical flow measurements in a spatio-temporal volume to recognize human activities in a nearest-neighbor framework.
The CAVIAR [13] sequences are used in [7] to recognize a set of activities, scenarios and roles. The approach generates a list of features and automatically chooses the smallest set that accurately identifies the desired class.
For the design of the classifier, we use a Bayesian classifier.
The likelihood functions are modeled as Gaussian mixtures.
-
Low level activities & features
The activities can be detected from relatively short video sequences and are described below:

id  #frames  Activity  Description
1   3,211    Inactive  A static person/object
2   1,974    Active    Person making movements but without translating in the image
3   9,831    Walking   There are movements & overall image translation
4   297      Running   As in walking but with larger translation
5   594      Fighting  Large quantities of movement with little translation
-
Features
There are two large sets of features, each organized in several subgroups.
1. A subset of features codes the instantaneous position & velocity of the tracked subject.
Organized in 3 groups:
i) instantaneous measurements
ii) average speed/velocity-based features
iii) 2nd-order moments/energy-related indicators
-
2. A subset based on estimates of the optic flow, or instantaneous pixel motion, inside the bounding box.
Organized in 4 subgroups:
i) instantaneous measurements
ii) spatial 2nd-order moments
iii) temporally averaged quantities
iv) temporal 2nd-order moments/energy-related indicators
-
Feature Selection & Recognition
1. The recognition strategy:
Given a set of activities Aj, j = 1, ..., n, the posterior probability of a certain activity taking place can be computed using Bayes' rule:

P(Aj | F(t)) = p(F(t) | Aj) P(Aj) / p(F(t))

where
P(Aj | F(t)) is the posterior probability of activity Aj given the features F(t),
p(F(t) | Aj) is the likelihood of the features given activity Aj,
P(Aj) is the prior probability of the same activity, and
p(F(t)) is the probability of observing F(t), irrespective of the underlying activity.
-
To build the Bayesian classifier, estimate the likelihood function of the features, given each class.
The likelihood function is approximated by a mixture of Gaussians:

p(F(t) | Ak) ≈ Σj αj N(μj, Σj)

where
N(μj, Σj) denotes a normal distribution, and
αj represents the weight of that Gaussian in the mixture for each listed activity.
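Under this model, classification reduces to picking the activity with the highest posterior. A minimal sketch in Python, where the 2-D features, mixture weights, means, and covariances are all illustrative placeholders (in the real system these would come from training):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Class-conditional likelihood p(F | Ak) as a Gaussian mixture.
def gmm_likelihood(f, weights, means, covs):
    return sum(w * multivariate_normal.pdf(f, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

# Two hypothetical activities with 2-D features; parameters are made up.
classes = {
    "walking": dict(weights=[0.6, 0.4],
                    means=[np.array([1.0, 1.0]), np.array([2.0, 1.5])],
                    covs=[np.eye(2), np.eye(2)]),
    "running": dict(weights=[1.0],
                    means=[np.array([4.0, 3.0])],
                    covs=[np.eye(2)]),
}
priors = {"walking": 0.5, "running": 0.5}

def classify(f):
    # Bayes rule: the posterior is proportional to likelihood x prior;
    # the evidence p(F(t)) is common to all classes and can be dropped.
    scores = {a: gmm_likelihood(f, **p) * priors[a] for a, p in classes.items()}
    return max(scores, key=scores.get)
```

A feature vector near the "walking" mixture means is assigned to walking; one near the "running" mean goes to running.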
-
2. Selecting promising features:
Three approaches:
1. Brute search
2. Lite search
3. Lite-lite search

The following table summarizes the cost of these different methods, for M = 29:

Nf  Brute search: C(M,Nf) = M!/(Nf!(M-Nf)!)  Lite search: M+(M-1)+... (Nf terms)  Lite-lite: M+1
1   29                                        29                                   29
2   406                                       57                                   30
3   3654                                      84                                   30
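The cost columns can be reproduced directly. A short sketch, where the lite-lite reading (rank all M features once, then one extra joint evaluation) is an assumption inferred from the table values:

```python
from math import comb

M = 29  # total number of candidate features

def brute(nf):
    # Evaluate every Nf-element subset of the M features.
    return comb(M, nf)

def lite(nf):
    # Greedy search: M trials, then M-1, ... (Nf terms in total).
    return sum(M - k for k in range(nf))

def lite_lite(nf):
    # Assumed reading: rank features once (M trials), then one joint test.
    return M if nf == 1 else M + 1

for nf in (1, 2, 3):
    print(nf, brute(nf), lite(nf), lite_lite(nf))
```

The printed rows match the table: 29/29/29, 406/57/30, and 3654/84/30.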
-
The Relief algorithm creates a weight vector over all features to quantify their quality.
This vector is updated according to:

w_i = w_i + (x_i - nearmiss(x)_i)^2 - (x_i - nearhit(x)_i)^2

where
w_i represents the weight of the ith feature,
x_i is the ith feature of data point x, and
nearhit(x) & nearmiss(x) denote the nearest point to x from the same & a different class, respectively.
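The update above can be sketched directly. A minimal single-pass Relief, assuming Euclidean distance and a toy dataset (both choices are illustrative, not from the slides):

```python
import numpy as np

def relief_weights(X, y):
    # One pass of w_i += (x_i - nearmiss(x)_i)^2 - (x_i - nearhit(x)_i)^2
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                     # never pick the point itself
        same = y == y[i]
        hit = X[np.argmin(np.where(same, dists, np.inf))]    # same class
        miss = X[np.argmin(np.where(~same, dists, np.inf))]  # other class
        w += (X[i] - miss) ** 2 - (X[i] - hit) ** 2
    return w

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = np.array([[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 1.1]])
y = np.array([0, 0, 1, 1])
w = relief_weights(X, y)
```

On this toy data the discriminative feature ends up with a positive weight and the noise feature with a negative one, which is exactly how Relief ranks feature quality.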
-
The following table shows the results obtained using these different feature search criteria, for 1, 2, or 3 features:

     Brute search       Lite search        Lite-lite search   Relief
Nf   Feat.     R. rate  Feat.    R. rate   Feat.    R. rate   Feat.     R. rate
1    7         83.9%    7        83.9%     7        83.9%     14        46.8%
2    9 18      93.5%    7 25     89.8%     7 18     89.6%     14 18     59.2%
3    3 9 20    94.0%    7 19 25  92.1%     7 18 23  86.7%     14 18 23  57.1%
-
Classifier Structure

Group activities in subsets & perform classification in a hierarchical manner.
The figure shows a binary hierarchical classifier:
Classifier 1 (95.5%): {Inactive, Active} vs. {Walking, Running, Fighting}
Classifier 2 (99.3%): Inactive vs. Active
Classifier 3 (98.8%): Fighting vs. {Walking, Running}
Classifier 4 (100%): Walking vs. Running
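The tree can be sketched as nested binary decisions. The node tests below use hypothetical thresholds on made-up "movement" and "translation" features, standing in for the trained classifiers 1-4 of the figure:

```python
# Hierarchical classification sketch; thresholds and feature names are
# illustrative assumptions, not the trained classifiers from the slides.
def hierarchical_classify(movement, translation):
    if movement < 0.2:                               # classifier 1: static vs. moving
        return "inactive" if movement < 0.05 else "active"    # classifier 2
    if movement > 0.8 and translation < 0.2:         # classifier 3: fighting split
        return "fighting"
    return "running" if translation > 0.5 else "walking"      # classifier 4
```

One design note: resolving the easy coarse distinction first (static vs. moving) lets each node solve a simpler two-way problem, which is consistent with the high per-node rates in the figure.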
-
Human Activity Analysis

Categories:
1. Gestures: elementary movements of a person's body parts; the atomic components describing the meaningful motion of a person.
E.g.: stretching an arm, raising a leg
2. Actions: single-person activities that may be composed of multiple gestures organized temporally, such as walking, waving and punching.
3. Interactions: involve 2 or more persons or objects.
E.g.: 2 persons fighting
4. Group activities: performed by a conceptual group composed of multiple persons or objects.
E.g.: a group having a meeting, 2 groups fighting
-
Human Activity Recognition methodologies
-
1. Single-layered approach
Represent and recognize human activities directly based on sequences of images.
Analyze sequential movements of humans such as walking, jumping and waving.
Categorized into 2 classes:
Space-time approach
i) Space-time volume
ii) Space-time trajectories
iii) Space-time features
Sequential approach
i) Exemplar-based approach
ii) State model-based approach
-
Space-time approach
Approaches that recognize human activities by analyzing space-time volumes of activity videos.
The video volumes are constructed by concatenating image frames along a time axis, and are compared to measure their similarities.
Figure 4 shows example 3-D XYT volumes corresponding to a human action of 'punching'.
-
Various recognition algorithms using space-time representations:
Template matching, which constructs a representative model (i.e. a volume) per action using training data.
Neighbor-based matching, in which the system maintains a set of sample volumes (or trajectories) to describe an activity.
Statistical modeling algorithms, which match videos by explicitly modeling a probability distribution of an activity.
-
1. Action recognition with space-time volumes:
The core of the recognition is the similarity measurement between two volumes.
Bobick & Davis constructed a real-time action recognition system using template matching. It represents each action with a template composed of two 2-dimensional images: a binary motion-energy image (MEI) and a scalar-valued motion-history image (MHI).
These images are constructed from a sequence of foreground images, which essentially are weighted 2-D (XY) projections of the original 3-D XYT space-time volume.
Shechtman and Irani estimated motion flows from a 3-D space-time volume to recognize human actions.
Rodriguez analyzed 3-D space-time volumes by synthesizing filters: the maximum average correlation height (MACH) filters used for the analysis of images (e.g. object recognition) were adopted to solve the action recognition problem.
Disadvantage: difficulty in recognizing actions when multiple persons are present in the scene.
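The MEI/MHI templates are easy to sketch from frame differencing. A minimal version, assuming grayscale frames and a simple absolute-difference threshold in place of proper foreground extraction:

```python
import numpy as np

def mei_mhi(frames, tau=10, thresh=30):
    # MEI: binary union of everywhere motion occurred.
    # MHI: recent motion gets value tau; older motion decays by 1 per frame.
    mei = np.zeros(frames[0].shape, dtype=bool)
    mhi = np.zeros(frames[0].shape, dtype=float)
    for prev, cur in zip(frames, frames[1:]):
        moved = np.abs(cur.astype(int) - prev.astype(int)) > thresh
        mhi = np.where(moved, tau, np.maximum(mhi - 1, 0))
        mei |= moved
    return mei, mhi

# Tiny synthetic clip: a bright pixel moves one column per frame.
frames = [np.zeros((4, 4), dtype=np.uint8) for _ in range(3)]
for t, f in enumerate(frames):
    f[1, t] = 255
mei, mhi = mei_mhi(frames, tau=5)
```

After the clip, the MEI marks every pixel that ever moved, while the MHI is brightest where motion happened most recently, encoding how (not just where) the motion occurred.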
-
2. Action recognition with space-time trajectories:
Interpret an activity as a set of space-time trajectories.
A person is generally represented as a set of 2-dimensional (XY) or 3-dimensional (XYZ) points corresponding to his/her joint positions.
Advantage: ability to analyze detailed levels of human movements.
-
3. Action recognition using space-time local features:
Approaches using local features extracted from 3-dimensional space-time volumes to represent and recognize activities.
Focus is on three aspects:
i) what 3-D local features the approaches extract,
ii) how they represent an activity in terms of the extracted features, and
iii) what methodology they use to classify activities.
Advantages: by its nature, background subtraction or other low-level components are generally not required, and the local features are scale, rotation, and translation invariant in most cases.
Suitable for recognizing simple periodic actions such as 'walking' and 'waving'.
-
Sequential approaches
Recognize human activities by analyzing sequences of features.
Two categories:
1. Exemplar-based recognition approaches
2. State model-based recognition approaches
Exemplar-based sequential approaches describe classes of human actions using training samples directly.
State model-based sequential approaches represent a human action by constructing a model which is trained to generate sequences of feature vectors corresponding to the activity.
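Exemplar-based matching needs a distance between feature sequences of different lengths; dynamic time warping is one common choice, used here purely for illustration (the surveyed systems may use other measures). A minimal sketch with made-up 1-D feature tracks:

```python
import numpy as np

def dtw(a, b):
    # Dynamic time warping distance between two 1-D feature sequences.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Exemplar-based recognition: label a query with its nearest exemplar.
exemplars = {"walking": [0, 1, 0, 1, 0], "waving": [0, 2, 0, 2, 0]}
query = [0, 1, 1, 0, 1, 0]
label = min(exemplars, key=lambda k: dtw(query, exemplars[k]))
```

Because DTW allows time-axis stretching, the six-frame query still aligns perfectly with the five-frame walking exemplar, which is why exemplar methods tolerate variations in execution speed.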
-
*** THANK YOU ***