cvpr2011: human activity recognition - part 5: description based
TRANSCRIPT
![Page 1: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/1.jpg)
Frontiers of
Human Activity Analysis
J. K. Aggarwal
Michael S. Ryoo
Kris M. Kitani
![Page 2: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/2.jpg)
2
Description-based
approaches
2000s
![Page 3: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/3.jpg)
3
Approach paradigm
Description-based approach
We represent the structure of the activities,
and recognize activities using semantic matching.
Hand shake = “two persons do shake-action (stretches, stays
stretched, withdraw) simultaneously, while touching”.
Recognition by finding observations satisfying the definition.
Humans conceptual
knowledge on
human activity
Human activity
representation
Input sequence Atomic action
(e.g. arm stretch)
recognition Low-level
processing
High-level
activity
recognition
![Page 4: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/4.jpg)
4
Comparisons
Approaches Levels of hierarchy
Complex temporal relations
Complex logical con-catenations
Recognition of recursive activities
Handle imperfect low-
levels
Statistical limited
(depends on data amount)
√
Syntactic unlimited √ √
Siskind 2001 unlimited a sub-event participates only once
√
Hongeng et al. 2004
limited (3-levels)
√ √
Vu et al. 2003
unlimited √ conjunctions
only
Ryoo and Aggarwal
2009 unlimited √ √ √ √
Gupta et al. 2009
limited (2-levels)
√ network form
only √
![Page 5: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/5.jpg)
5
Description-based
approaches
Human interactions
2000s
![Page 6: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/6.jpg)
6
Recognition of human interactions
Input sequences
Body-part layer
Pose layer
Gesture layer
Semantic layer
1 : Arm withdrawn
8 : Arm somewhat stretched
13 : Arm fully stretched
1 : Arm withdrawn
8 : Arm withdrawn
13 : Arm withdrawn
<1,20> : Facing right
<1,20> : Arm staying
<1,20> : Leg staying
<1,20> : Facing left
<4,20> : Arm stretching
<1,20> : Leg staying
Interaction
Gesture Elementary
movement of a person
Pose Abstract status
of a body part.
Body-part feature. Numerical status
of a body part.
Person “pushed” by
Interaction:
Person in time interval
<4, 20>
Ryoo and Aggarwal,
CVPR 2006
![Page 7: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/7.jpg)
7
Atomic actions
Operation triplets <agent, motion, target>
Gesture together with subject and object information.
Unit human activity.
Computed based on gestures.
Ex> person1 stretches his/her arm
<p1’s arm, stretch, null>
Time intervals
Ex> Time intervals detected for Pointing action
P1:Head :222222222222222222
P1:ArmV :322021221111122333
P1:ArmH :100022222222221100
Sequences of poses
<p1‟s arm, stretch, p2>
<p1‟s arm, stay stret., p2>
<p1‟s arm, withdraw, null>
Time
Time intervals of operation triplets
![Page 8: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/8.jpg)
Punching is a sequence of hand
stretch and withdrawal.
Semantic layer recognition
8
… …
Video observations Time intervals of gestures
<p2‟s arm, stretch, p1>
<p2‟s arm, stay stret., p1>
<p2‟s arm, withdraw, null>
<p2‟s arm, stay withd., null>
<p2‟s head, face left, null>
<p1‟s arm, withdraw, nulll>
Time
Similar?
Machine-understandable
representation of Punching
![Page 9: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/9.jpg)
9
Human activity representation
Semantics
Knowledge on the structure
of an activity.
Punching is a sequence of
hand stretch and withdrawal.
Time intervals
Allen‟s temporal predicates
Syntax
Rules to construct formal
representation.
Organizes a set of
vocabularies to describe
the activities‟ structure.
Context-free grammar
Conceptual/verbal description
x=arm_
stretch(p1)
y=arm_
withdraw(p1)
this=Punching_action (p1) CFG
Machine-understandable language
Punching_action(i) = (
list( def(„x‟, Arm_Stretch(i)),
def(„y‟, Arm_Withdraw(i)) ),
and( meets(„x‟, „y‟),
and( starts(„x‟, „this‟),
finishes(„y‟, „this‟)) ) );
![Page 10: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/10.jpg)
10
Hierarchical activity representation
Representation of the „shake-hands‟ interaction
Description-based
ShakeHandsInteractions(i, j) = (
list( def(„x‟, ShakeHandsAction(i)),
list( def(„y‟, ShakeHandsAction(j)),
def(„z‟, TouchingInteraction(i, j))) ),
and( and( during(„z‟, „x‟), during(„z‟, „y‟)),
and( starts(„z‟, „this‟), finishes(„z‟, „this‟)))
);
ShakeHandsActions(i) = (
list( def(„x‟, Arm_Stretch(i)),
list( def(„y‟, Arm_Stay_Stretched(i)),
def(„z‟, Arm_withdraw(i))) ),
and( and( meets(„x‟, „y‟), meets(„y‟, „z‟)),
and( starts(„x‟, „this‟), finishes(„z‟, „this‟)))
);
TouchingInteraction(i, j)=(null, touch(i, j, 0));
Shake hands
interaction
CFG
Syntax Hand shake = “two persons do
shake-action (stretches, stays stretched,
withdraw)
simultaneously, while touching”.
![Page 11: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/11.jpg)
11
Hierarchical recognition algorithm
Recognition process of the „Shake-hands‟ interaction.
![Page 12: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/12.jpg)
12
Fighting Interaction(i, j)
Negative Interaction(i, j)
Fighting Interaction(i, j)
Fighting Interaction(i, j)
Negative Interaction(i, j)
Base Case
Fighting Interaction(i, j)
Negative Interaction(i, j)
Continued and recursive activities
Interaction „fighting‟
Composed of multiple negative interactions
Punching + kicking + pushing + punching + …
Iterative approach is taken.
![Page 13: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/13.jpg)
13
Experiments - Simple interactions
Recognized 8 types of simple interactions, which were
recognized in Park and Aggarwal, 2004
(approach, depart, point, shake-hands, hug, punch, kick, and
push)
A videos of a sequence of interactions are taken. (continuous
executions)
Interactions are described in more detailed and formal way,
resulting better recognition accuracy.
![Page 14: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/14.jpg)
14
Example Experiment - Fighting
Poses: Gestures
and
activities:
Input video: Processed video:
![Page 15: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/15.jpg)
Past-Now-Future networks
Pinhanez and Bobick 1998
PNF networks to represent temporal structure
of an activity.
Kitchen activities:
15
[Pinhanez, C. S. and Bobick, A. F., Human action detection using PNF propagation
of temporal constraints. CVPR 1998]
![Page 16: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/16.jpg)
Event logic
Siskind 2001
Logical concatenations of predicates
Time intervals?
16
[Siskind, J. M., Grounding the lexical semantics of verbs in visual perception using force
dynamics and event logic. Journal of Artificial Intelligence Research (JAIR) 15, 2001]
![Page 17: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/17.jpg)
Representation languages
Nevatia, Zhao,
and Hongeng 2003
VERL - language
Vu, Bremond,
Thonnat 2003
Similar to Nevatia
et al. 2003
Recursive? Uncertainties? 17
![Page 18: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/18.jpg)
Stochastic approaches
Limitations of the conventional description-
based approaches
Uncertainties? – stochastic recognition
Probabilistic framework needed
[Ryoo and Aggarwal, IJCV 2009]
[Tran and Davis, ECCV 2008] - MLN
18
![Page 19: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/19.jpg)
19
MoveR(s1) MoveR(s1) MoveR(p1) MoveR(p1) equals equals
Hierarchical matching algorithm
Recognition process tree of „steal(p1, s1, p2)‟
P Obj: (person p1, suitcase s1) P Obj: (person p2, suitcase s1)
Steal(p1, s1, p2)
Carry(p1, s1)
Parameter Objects:
(person p1, suitcase s1, person p2)
Carry(p2, s1) meets meets
MoveL(s1) MoveL(p2)
Recognition results from the object and motion layers
Stay(s1)
Recognition results from the object and motion layers
MoveL(p2) MoveL(s1)
Carry(p1, s1) Stay(s1) Carry(p2, s1)
Steal(p1, s1, p2)
P(A1 | O1, M1) = 0.8 P(A2 | O2, M2) = 0.7
P(A3 | O3, M3) = 0.5
P(A4 | O4, M4) = 0.9 P(A5 | O5, M5) = 0.9
P(S | O1, M1, …) = 0.8
![Page 20: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/20.jpg)
Probabilistic recognition
Probability of the activity given observation
20
Gesture detection
confidence
Description-based
.
.
.
.
Structural
similarity
![Page 21: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/21.jpg)
21
Experiments
Recognized following six types of interactions. Each activity was tested with at least 10 sequences.
Carrying a box, leaving a box, placing a box into a trash bin.
Carrying a suitcase, leaving a suitcase, stealing the suitcase.
Object and Motion layer trained with 5 sequences.
Time
Carry(Person1, SuitCase1) :
Stay(SuitCase1) :
Carry(Person2, SuitCase1) :
Steal(Person1, SuitCase1, Person2) :
![Page 22: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/22.jpg)
22
Experiments
Example
a person placing a box into a trash bin
Time
Move(Person1, right) :
Move(Box1, right) :
Move(Person1, left) :
Move(Box1, down) :
Carry(Person1, Box1) :
Trash(Person1, Box1, TrashBin1) :
![Page 23: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/23.jpg)
23
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Recognition accuracy (true positives): Compared with a multi-object version of previous works.
False positives rates are almost zero for all activities.
Experiments - Performance
Carry -Suitcase
Leave -Suitcase
Steal -Suitcase
Carry -Box
Leave -Box
Trash -Box
Total
2 atomics
1 spatials
3 atomics
1 spatials
5 atomics
2 spatials
2 atomics
1 spatials
3 atomics
1 spatials
3 atomics
3 spatials
P
![Page 24: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/24.jpg)
24
Advantages
Ability to represent and recognize an activity composed of concurrent sub-events. Ex> “touching occurred during pushing”
Ability to represent and recognize „recursive activities‟ Ex> Fighting = Fighting + another negative interaction.
Less data required for training. „Structure of activities‟ are encoded based on human knowledge.
High recognition accuracy?
this==Pushing_interactions(p1,p2)
i=Arm_Stretch(p1)
j=Arm_Stay_Stretched(p1)
l =Depart(p2, p1)
k =Touch(p1, p2)
![Page 25: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/25.jpg)
25
Description-based
approaches
Group activities
2008, 2009
![Page 26: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/26.jpg)
Group activity
Events performed by
groups
Various types of
complex activities
Group-person interaction
Group-group interaction
Uncertain nature
Varying # of
participants
Dynamic spatial
relation
26 26
Group Assault (grp vs. per)
A person with red shirts is
taking the laptop on the table
while the others are talking
Group Stealing (grp vs. grp)
Ryoo and Aggarwal,
CVPR-SIG 09, IJCV 11
![Page 27: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/27.jpg)
Formal representation:
1. Member variables
2. Time intervals
3. Predicates
Formal representation:
1. ∃ a in Thieves, ∀ b in Owners,
∃ c in Thieves
2. def(t1, TakeObject(a)),
def(t2, Distract(c, b))
3. equals (t1, this),
during (t1, t2)
27
Representation
Group stealing
Distract! Distract!
Distract!
∀ ∀
∃
∃ ∃
Take object!
∃ Distract (r1, g1)
Distract (r2, g1)
Distract (r3, g2)
TakeObject (r4)
Stealing(R, G)
Time
Time intervals of activities of
individual members
Owners
Thieves
Object
a
b b
c c c
during
equals
t2
t1
this
![Page 28: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/28.jpg)
28
Recognition overview
3 key components
Recognition:
Obtain a pool of group member candidates
with non-zero probability.
Not many persons perform sub-events.
Who?
Distract Distract
Distract
Take
Stand
What? When?
Distract
Take
Stand
Distract
Distract
Generates a pool of member candidates
)|)((maxarg* Mt
M OMGPM
NP-hard. Approximation required.
![Page 29: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/29.jpg)
29
Temporal constraints
Hierarchical temporal constraint matching.
Steal(T, O)
Approach(c, b)
Member variables:
∃a in Thieves, ∀b in Owners, ∃c in Thieves
TakeObject(a) before during
Video inputs
Distract(c, b)
Video inputs
Steal(T, O)
Distract(c, b) Approach(c, b) TakeObject(a)
![Page 30: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/30.jpg)
30
Group candidates
Among possible groupings,
Find a set of group members:
which maximizes the overall
probability.
Bayesian formulation
where
)|)((max)|( Mt
M
t OMGPOGP
)()(
)(max
MM
M
GG
GM
))(())(|()( MGPMGOPM tt
MG
},...,,{ ||21 MmmmM
![Page 31: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/31.jpg)
31
Bayesian formulation
Ci: persons performing ith sub-event.
Essential and anti-essential relations: Ki, Li
1
2
3
4
5
6
1
2
3
4
5
6
1
2
3
4
5
6
1
2
3
4
5
6
Case1:
∃∃ Case2:
∀∃
Case3:
∃∀ Case4:
∀∀
ii CKk
iti
iii OkSECK ]|)([||
tnn
t SS
ttn
n
ttn
n
t
M
t
M MGSSPSSQMOPMGOP,...,
1
1
1
11
1
))(|,...,(),...,,,|())(|(
tnn
t iiiiiiSS
t
i
KCLKCK
i
tti
i
rel
ba
GMGed
MGSPSSrelP
M,...,
|)|/||||/|(|1
1)(
))(|(),|(
)(
ii CLl
iti
iii OlSECL ]|)([||
Represented relations Represented sub-events
Structural similarity
![Page 32: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/32.jpg)
32
Markov chain Monte Carlo
MCMC-based probability estimation.
Provides a set of samples from the distribution.
Models the probability distribution.
Metropolis-Hastings algorithm
Actions:
Add:
Remove:
)',()(
),'()'(
MMqM
MMqMa
G
G
),1min()',( 1 aMMP t
it CmmMM where}{' 1
}{' 1 mMM t
Add
Add Add Remove
Add
![Page 33: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/33.jpg)
33
Experimental setting
We have tested 45 sequences of 8 activities. 320*240 with 10 fps
CCTV videos download from YouTube. Group stealing in Malaysia and group arresting in UK.
Videos that we have taken with 10 participants in various environments. A group of people carrying a large object.
A group of people assaulting a person.
Videos of real human activities
![Page 34: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/34.jpg)
34
Experiments - stealing
Group stealing
One of thieves
steals a laptop,
while the other
thieves are
distracting the
shop owner.
Laptop
Thieves
Owners
![Page 35: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/35.jpg)
35
Experiments - arresting
Group arresting
A group of
policemen
arresting a group
of suspicious
persons.
Color histogram
Pedestrians
Policemen
Criminal
candidates
![Page 36: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/36.jpg)
36
Experiments – group assault
Highly stochastic
There may be (and may not be) attackers whose
guarding the area, or just watching.
10 videos.
![Page 37: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/37.jpg)
37
Experiments – group assault
Highly stochastic
There may be (and may not be) attackers whose
guarding the area, or just watching.
10 videos.
![Page 38: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/38.jpg)
38
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Experimental results
Recognition accuracy
False positive rates are almost 0 because of the detailed
representations: previous, deterministic, stochastic.
Move G
Carry G
CarryCmd GP
Fight IG
Steal GG
Arrest GG
Total
∀ ∃∀
Fight GG
∀∃ ∃∃ ∃∀∃ ∀∃ ∃∀
Assault GG
∀∃∃
![Page 39: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/39.jpg)
39
Spatio-Temporal
Relationship Match
2009
![Page 40: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/40.jpg)
40
Description-based vs. Space-time
Description-based
High-level activities
Hierarchical
Semantic structures
Difficult to cope with
noise
Space-time
Reliable under noise
Difficult to model
complex activities
Miss semantic
structures
t
![Page 41: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/41.jpg)
41
Space-time approaches
Video classification
Each video is represented as a histogram
Limitation:
Unable to model complex activities
Pushing
Hugging Shaking hands
Punching
t
Spatio-temporal features
Laptev 04,
Dollar et al. 05
![Page 42: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/42.jpg)
42
Spatio-temporal relations (STRs)
t
t
before(17, 12)
x
y
t
12
35
17
5
overlaps(12, 35)
overlaps(5, 17)
...
x
y
t
12
35
17
5
overlaps(12, 35)
before(17, 12)
...
equals(5, 17)
Videos Feature relations Ryoo and Aggarwal,
ICCV 2009
![Page 43: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/43.jpg)
43
Histogram of STRs
before(17, 12)
x
y
t
12
35
17
5
overlaps(12, 35)
overlaps(5, 17)
...
x
y
t
12
35
17
5
overlaps(12, 35)
before(17, 12)
...
equals(5, 17)
![Page 44: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/44.jpg)
44
STR-match learning
Supervised learning
Videos with activity labels are provided.
Shaking hands
Hugging Pushing
Punching
Feature space
Histogram of Relationships
?
![Page 45: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/45.jpg)
45
STR equations
STR match considers distributions of pair-
wise relationships among features.
Histogram construction
STR distance:
![Page 46: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/46.jpg)
46
expected_starting(vtest)
STR-match activity detection
Must detect starting time and ending time
Models starting XYT location of an activity.
Each feature pair in a matching training video
makes a vote.
x
y
t
12
35
17 5
vtrstart
x
y
t
12
35
17
5
Original starting Expected starting
![Page 47: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/47.jpg)
47
Hierarchical recognition
Atomic action detections as new features
Localization ability enables hierarchical
recognition
![Page 48: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/48.jpg)
48
Experiments
KTH dataset
Public dataset composed of simple actions
Walking, jogging, running, waving, …
![Page 49: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/49.jpg)
49
Experiments: high-level activities
High-level human activity detection results
Changing backgrounds, lighting conditions, …
![Page 50: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/50.jpg)
50
STR-match summary
Detection from continuous videos
Localization using voting-based method
Noisy observations
Different backgrounds/lightings
Uncertainties
Human-human interactions
Hierarchical recognition
Future work
Hierarchy learning algorithm
![Page 51: cvpr2011: human activity recognition - part 5: description based](https://reader033.vdocuments.us/reader033/viewer/2022042815/557e78f4d8b42a4d108b4c8a/html5/thumbnails/51.jpg)
Description-based: References Allen, J. F. and Ferguson, G., Actions and events in interval temporal logic. Journal of Logic and
Computation 1994.
Pinhanez, C. S. and Bobick, A. F., Human action detection using PNF propagation of temporal
constraints. CVPR 1998.
Siskind, J. M., Grounding the lexical semantics of verbs in visual perception using force dynamics
and event logic. Journal of Artificial Intelligence Research (JAIR) 15, 2001.
Nevatia, R., Zhao, T., and Hongeng, S., Hierarchical language-based representation of events in
video streams. In IEEE Workshop on Event Mining 2003.
Vu, V.-T., Bremond, F., and Thonnat, M., Automatic video interpretation: A novel algorithm for
temporal scenario recognition. IJCAI 2003.
Ryoo, M. S. and Aggarwal, J. K., Recognition of composite human activities through context-free
grammar based representation. CVPR 2006.
Ryoo, M. S. and Aggarwal, J. K., Hierarchical recognition of human activities interacting with objects.
CVPR-SLAM 2007.
Tran, S. D. and Davis, L. S., Event modeling and recognition using markov logic networks. ECCV
2008.
Ryoo, M. S. and Aggarwal, J. K., Semantic representation and recognition of continued and
recursive human activities. IJCV, 2009.
Ryoo, M. S. and Aggarwal, J. K., Spatio-temporal relationship match: Video structure comparison
for recognition of complex human activities. ICCV 2009.
Ryoo, M. S. and Aggarwal, J. K., Stochastic representation and recognition of high-level group
activities. IJCV, 2010.
51