presented by: shahab helmi fall 2015. title: activity inference in wide aerial video surveillance...
WAAS
Presented by: Shahab Helmi
Fall 2015
Papers Info
Title:
Activity Inference in Wide Aerial Video Surveillance Using Entity Relationship Models
Authors: Jongmoo Choi, University of Southern California
Yann Dumortier, DXO Labs
Jan Prokaj, University of Southern California
Gérard Medioni, University of Southern California
Publication: ACM Transactions on Spatial Algorithms and Systems, Vol. 1, No. 1, Article 1, Publication date:
January 2014.
Papers Info (2)
Title:
Learning Symbolic Descriptions of Activities from Examples in WAAS
Authors: Jongmoo Choi and Gérard Medioni
Institute for Robotics and Intelligent Systems
University of Southern California, USA
{jongmooc,medioni}@usc.edu
Publication: ACM Transactions on Spatial Algorithms and Systems, Vol. 1, No. 1, Article 1, Publication date:
January 2014.
Outline
1. Introduction and Approach Overview
2. Related Work
3. ERM-Based Activity Recognition
4. Experimental Results
1. Introduction
What is WAAS?
An activity recognition system for wide aerial video surveillance in which segmented vehicle tracks are the essential components. The system leverages the computational features of a relational database management system (RDBMS), which provides efficient data structures for the entity relationship model (ERM), so that an activity can be defined as an SQL query.
It is demonstrated that different types of activities, with hierarchical structure, multiple actors, and (geo) context information, can be effectively defined and inferred using the ERM framework.
The approach is validated on noisy visual tracks estimated from real data using a state-of-the-art tracker for wide area aerial imagery [Prokaj et al. 2011].
1. System Overview
Geospatial data source: OpenStreetMap
2. Related Work
Activity Recognition
Single motion patterns by a single actor can be represented by:
Non-parametric methods (e.g. template matching and dimensionality reduction)
Volumetric methods (e.g. space-time filtering and tensors)
Parametric methods (e.g. Hidden Markov Models)
Linear dynamical systems
Complex sequences of actions are represented using:
Graphical models (e.g. Dynamic Bayesian Networks and Petri nets)
Syntactic methods (e.g. Context-Free Grammars and Attribute Grammars)
Knowledge-based methods (e.g. Constraint Satisfaction, Logic Rules, and Ontologies)
2. Related Work (2)
Activity recognition in wide area surveillance:
Reilly et al.: the scene is divided into grid cells and the tracking problem is solved within each cell using bipartite graph matching and then tracks are linked across cells.
Pollard et al. present activity detection results using a complex probabilistic framework, but only a single activity (convoys) is proposed and geospatial constraints are not considered.
Pozdnoukhov proposed a new framework for semi-supervised nonlinear embedding methods, built on a neural network that optimizes a graph-based cost function, in order to analyze large-scale spatio-temporal network data. Large sets of mobile objects' trajectories are distributed to a network of database servers by using Space-Partitioned Moving Objects Databases.
3. ERM-Based Activity Recognition (Outline)
1. Computing tracks from imagery
2. Tracklets from tracks
3. Geo-registration of tracks
4. Activity Representation Using ERM
5. Activity Inference
6. Scalability
3.1. Computing Tracks from Imagery
The proposed implementation uses a state-of-the-art real-time tracker for wide area imagery introduced in [Prokaj et al. 2011].
Uses a sliding window of 16 frames.
Removes the background using background subtraction, so the track of a vehicle is lost when the vehicle stops.
The algorithm runs at real-time speeds (> 2 fps) on large-format imagery (2K × 2K).
3.2. Tracklets from Tracks
Tracklet: the atomic spatio-temporal unit, a segmented portion of a track representing the vehicle's "instantaneous" motion, like going straight, turning left, or turning right, with intensity.
Each tracklet has a collection of attributes $x_i = \{\lambda_1, \lambda_2, \ldots, \lambda_m\}$, where each element $\lambda_k$ represents a physical property such as time, location, or speed.
The trajectory is segmented optimally using a classic dynamic programming algorithm, "segmented least squares".
Only the turning points are kept, together with their attributes such as speed and heading; all other points are replaced by the line segments connecting the turning points.
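The segmentation step can be sketched as follows. This is a minimal illustration of the textbook segmented least squares dynamic program, not the authors' implementation. The `(t, x, y)` point layout, the names `segment_track` and `penalty`, fitting `x` and `y` as linear functions of time, and letting neighboring segments share their boundary point are all assumptions made for the example. The error table is built naively in O(n^3) time, which is fine for short tracks.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (t, x, y): hypothetical track-point layout


def _line_fit_error(ts: Sequence[float], vs: Sequence[float]) -> float:
    """Sum of squared residuals of the least-squares line v = a*t + b."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_v = sum(vs) / n
    stt = sum((t - mean_t) ** 2 for t in ts)
    stv = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs))
    svv = sum((v - mean_v) ** 2 for v in vs)
    if stt == 0.0:
        return svv
    return max(svv - stv * stv / stt, 0.0)


def segment_track(points: List[Point], penalty: float) -> List[int]:
    """Return the indices of the points kept as segment boundaries."""
    n = len(points)
    if n == 0:
        return []
    ts = [p[0] for p in points]
    xs = [p[1] for p in points]
    ys = [p[2] for p in points]

    # err[i][j]: cost of describing points i..j by one constant-velocity segment.
    err = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            err[i][j] = (_line_fit_error(ts[i:j + 1], xs[i:j + 1])
                         + _line_fit_error(ts[i:j + 1], ys[i:j + 1]))

    # best[j]: minimum cost of segmenting points 0..j; each segment pays `penalty`.
    best = [0.0] + [float("inf")] * (n - 1)
    prev = [0] * n
    for j in range(1, n):
        for i in range(j):
            cost = best[i] + err[i][j] + penalty
            if cost < best[j]:
                best[j], prev[j] = cost, i

    # Walk back from the last point to recover the boundary indices.
    cuts = [n - 1]
    while cuts[-1] != 0:
        cuts.append(prev[cuts[-1]])
    return cuts[::-1]


# Example: drive east for 4 s, then north for 4 s (a right-angle turn at t = 4).
track = ([(float(t), float(t), 0.0) for t in range(5)]
         + [(float(t), 4.0, float(t - 4)) for t in range(5, 9)])
turning_points = segment_track(track, penalty=0.1)  # -> [0, 4, 8]
```

A larger `penalty` yields fewer, longer tracklets; a smaller one follows the noisy track more closely.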
3.3. Geo-registration of Tracks
Geo-tagging the tracking data.
Estimating camera pose allows us to transform the position of a computed track with respect to the image plane to the corresponding geo-registered track point with respect to the world system.
This produces an accuracy of about 5 m (∼10 pixels) for the CLIF dataset [CLIF2006 2006].
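As a rough illustration of the image-to-world mapping, the sketch below assumes a locally flat ground plane, so that the estimated camera pose reduces to a 3 × 3 homography `H`. The slides do not say how the transform is parameterized, so `H`, `image_to_world`, and the numbers are hypothetical; the 0.5 m per pixel scale simply mirrors the "5 m ∼ 10 pixels" figure above.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def image_to_world(H: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Map pixel (u, v) to ground-plane coordinates with a homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0.0:
        raise ValueError("pixel maps to a point at infinity")
    return x / w, y / w


# Hypothetical homography: 0.5 m per pixel plus an offset of (100 m, 200 m).
H = [[0.5, 0.0, 100.0],
     [0.0, 0.5, 200.0],
     [0.0, 0.0, 1.0]]
world_xy = image_to_world(H, 10.0, 20.0)  # -> (105.0, 210.0)
```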
3.4. Activity Representation Using ERM
We represent tracks $\{o\}$, tracklets $\{x\}$, and track points $\{p\}$ as entities and link the three entities: $\{o\} \supset \{x\} \supset \{p\}$.
The same representation is used for spatial objects. For example, an entity “road” is a collection of road segments and each segment has a set of attributes such as type, name, and speed-limit.
An activity $a_j$ is defined as a collection of tracklets obeying certain properties: $a_j = \{x \mid x \in \Omega_j,\ C_j(x) > \theta_j\}$, where $\Omega_j$, $C_j(x) \in [0, 1]$, and $\theta_j$ represent the relationship associated with the activity, the confidence function, and the recognition threshold, respectively.
For example, Speeding can be seen as an activity defined by the relationship between the attributes of tracklets (e.g. speed) and geospatial objects (e.g. speed-limit): $\text{speeding} := \{x \mid r \in G_{\text{road}},\ x.\text{roadID} = r.\text{ID},\ x.s > r.s,\ C(x.s, r.s) > \theta\}$
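The Speeding definition can be tried out on a toy database. The sketch below uses Python's built-in `sqlite3` module; the table and column names, the sample rows, and in particular the confidence function `1 - speed_limit / speed` are assumptions for illustration, since the slides do not define `C`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: road segments and tracklets as ERM entities.
CREATE TABLE road (id INTEGER PRIMARY KEY, name TEXT, speed_limit REAL);
CREATE TABLE tracklet (
    id INTEGER PRIMARY KEY,
    track_id INTEGER,
    road_id INTEGER REFERENCES road(id),
    speed REAL
);
INSERT INTO road VALUES (1, 'Main St', 15.0), (2, 'Highway', 30.0);
INSERT INTO tracklet VALUES
    (1, 7, 1, 14.0),   -- below the limit
    (2, 7, 1, 20.0),   -- clearly above the limit
    (3, 8, 2, 31.0);   -- barely above the limit, low confidence
""")

THETA = 0.1  # recognition threshold (assumed value)

# Assumed confidence C(x.s, r.s) = 1 - r.s / x.s, which lies in (0, 1) when x.s > r.s.
SPEEDING_SQL = """
SELECT x.id, 1.0 - r.speed_limit / x.speed AS conf
FROM tracklet AS x
JOIN road AS r ON x.road_id = r.id
WHERE x.speed > r.speed_limit
  AND (1.0 - r.speed_limit / x.speed) > ?
ORDER BY x.id
"""
speeding = conn.execute(SPEEDING_SQL, (THETA,)).fetchall()  # -> [(2, 0.25)]
```

Tracklet 3 exceeds its limit but is filtered out because its confidence falls below the threshold, which is the role $\theta$ plays in the definition.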
3.5. Activity Inference
Example I: Simple activity: Loop
A Loop is defined as a segmented track in which there exist two tracklets $\{x_i, x_j\}$ whose Euclidean distance $\lVert x_i.\text{pos} - x_j.\text{pos} \rVert$ is smaller than the traveling distance:
$\text{Loop} = \{x_i, x_j \mid (1 - \frac{\lVert x_i.\text{pos} - x_j.\text{pos} \rVert}{x_j.\text{acc} - x_i.\text{acc}}) > \theta,\ i < j,\ x_i.\text{ID} = x_j.\text{ID}\}$, where $(x_j.\text{acc} - x_i.\text{acc})$ represents the traveling distance between $x_i$ and $x_j$.
SELECT * FROM T1, T2
WHERE T1.track_id = T2.track_id
  AND T1.id < T2.id  -- i < j, as in the definition
  AND (1 - (dist(T1.pos, T2.pos) / (T2.acc - T1.acc))) > θ
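A runnable version of the Loop query might look like the following sketch. It assumes positions are stored as separate `x` and `y` columns, that `acc` holds the accumulated travel distance along the track, and that `dist` is registered as a user-defined SQL function; the schema, the sample data, and the threshold value are all hypothetical.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register a Euclidean distance function so the query can call dist(...).
conn.create_function(
    "dist", 4, lambda x1, y1, x2, y2: math.hypot(x2 - x1, y2 - y1)
)
conn.executescript("""
CREATE TABLE tracklet (
    id INTEGER PRIMARY KEY, track_id INTEGER, x REAL, y REAL, acc REAL
);
-- Track 1 drives around a block and ends near its starting point.
INSERT INTO tracklet VALUES
    (1, 1,   0.0,   0.0,   0.0),
    (2, 1, 100.0,   0.0, 100.0),
    (3, 1, 100.0, 100.0, 200.0),
    (4, 1,   0.0, 100.0, 300.0),
    (5, 1,   0.0,  10.0, 390.0);
-- Track 2 drives in a straight line.
INSERT INTO tracklet VALUES
    (6, 2,   0.0, 0.0,   0.0),
    (7, 2, 100.0, 0.0, 100.0),
    (8, 2, 200.0, 0.0, 200.0);
""")

THETA = 0.9  # recognition threshold (assumed value)

LOOP_SQL = """
SELECT T1.id, T2.id
FROM tracklet AS T1
JOIN tracklet AS T2
  ON T1.track_id = T2.track_id AND T1.id < T2.id
WHERE T2.acc > T1.acc
  AND (1 - dist(T1.x, T1.y, T2.x, T2.y) / (T2.acc - T1.acc)) > ?
ORDER BY T1.id, T2.id
"""
loops = conn.execute(LOOP_SQL, (THETA,)).fetchall()  # -> [(1, 5)]
```

For the straight track the Euclidean distance equals the traveled distance, so the score is 0 and no pair is returned.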
3.5. Activity Inference (2)
Example II: Composite Activity: Visit
Suppose that we have three independent events identified as three entity sets: Entry (aEn), Stay (aSt) and Exit (aEx). Visit is a composite activity that can be described as a combination of these events.
SELECT * FROM T1, T2, T3, En, St, Ex
WHERE T1.track_id = T2.track_id
  AND T2.track_id = T3.track_id
  AND T1.id + 1 = T2.id
  AND T2.id + 1 = T3.id
  AND T1.id = En.id
  AND T2.id = St.id
  AND T3.id = Ex.id
  AND (En.conf * St.conf * Ex.conf) > θ
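The composite Visit query can be exercised the same way. In this sketch the three event sets are stored as tables keyed by tracklet id with a confidence column, and tracklet ids are assumed to be consecutive within a track, as the `id + 1` conditions in the query imply. Table names, sample data, and the threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracklet (id INTEGER PRIMARY KEY, track_id INTEGER);
CREATE TABLE entry_event (id INTEGER PRIMARY KEY, conf REAL);
CREATE TABLE stay_event  (id INTEGER PRIMARY KEY, conf REAL);
CREATE TABLE exit_event  (id INTEGER PRIMARY KEY, conf REAL);

INSERT INTO tracklet VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 2);
-- Track 1: confident Entry, Stay, Exit on consecutive tracklets 1, 2, 3.
INSERT INTO entry_event VALUES (1, 0.9), (5, 0.9);
INSERT INTO stay_event  VALUES (2, 0.8), (6, 0.4);  -- weak Stay on track 2
INSERT INTO exit_event  VALUES (3, 0.9), (7, 0.9);
""")

THETA = 0.5  # recognition threshold (assumed value)

VISIT_SQL = """
SELECT T1.track_id, T1.id, T2.id, T3.id,
       En.conf * St.conf * Ex.conf AS conf
FROM tracklet AS T1, tracklet AS T2, tracklet AS T3,
     entry_event AS En, stay_event AS St, exit_event AS Ex
WHERE T1.track_id = T2.track_id
  AND T2.track_id = T3.track_id
  AND T1.id + 1 = T2.id
  AND T2.id + 1 = T3.id
  AND T1.id = En.id
  AND T2.id = St.id
  AND T3.id = Ex.id
  AND (En.conf * St.conf * Ex.conf) > ?
"""
visits = conn.execute(VISIT_SQL, (THETA,)).fetchall()
visiting_tracks = [row[0] for row in visits]  # -> [1]
```

Track 2 has the same Entry, Stay, Exit pattern, but the product of its confidences (0.9 × 0.4 × 0.9) falls below the threshold, so only track 1 is reported as a Visit.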
4. Experimental Results
1. Datasets
2. Activities (Queries)
3. Scalability