VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS:
A VIDEO MINING SYSTEM FOR RETAIL MARKETING
PhD Thesis by:
Alex Leykin, Indiana University
Motivation
• Automated tracking and activity recognition is missing from marketing research
• The hardware is already in place
• Visual information can reveal a lot about how humans interact with each other
• Helps in making intelligent marketing decisions
Goals
• Process visual information to obtain a formal representation of human locations (Visual Tracking)
• Extract semantic information from the tracks (Activity Analysis)
Related Work: Detection and Tracking
• Yacoob and Davis, “Learned models for estimation of rigid and articulated human motion from stationary or moving camera”, IJCV 2000
• Zhao and Nevatia “Tracking multiple humans in crowded environment” CVPR 2004
• Haritaoglu, Harwood, and Davis “W-4: Real-time surveillance of people and their activities” PAMI 2000
• J. Deutscher, B. North, B. Bascle and A. Blake “Tracking through singularities and discontinuities by random sampling”, ICCV 1999
• A. Elgammal and L. S. Davis, “Probabilistic Framework for Segmenting People Under Occlusion”, ICCV 2001.
• M. Isard, J. MacCormick, “BraMBLe: a Bayesian multiple-blob tracker”, ICCV 2001
Related Work: Activity Recognition
• Haritaoglu and Flickner “Detection and tracking of shopping groups in stores” CVPR 2001
• Oliver, Rosario, and Pentland “A bayesian computer vision system for modeling human interactions” PAMI 2000
• Buzan, Sclaroff, and Kollios “Extraction and clustering of motion trajectories in video” ICPR 2004
• Hongeng, Nevatia, and Bremond “Video-based event recognition: activity representation and probabilistic recognition methods” CVIU 2004
• Bobick and Ivanov “Action recognition using probabilistic parsing” CVPR 1998
System Components
Low-level Processing
Camera Model
Obstacle Model
Foreground Segmentation
Head Detection
Background Modeling
Each codeword stores a color mean μRGB and brightness bounds Ilow and Ihi; each pixel keeps a codebook (a list of codewords).
Adaptive Background Update
Match pixel p to the codebook b. A codeword matches when:
• I(p) > Ilow and I(p) < Ihigh
• (RGB(p) · μRGB) < TRGB
• t(p)/thigh > Tt1 and t(p)/tlow > Tt2
If there is no match:
– if the codebook is saturated, then the pixel is foreground
– else create a new codeword
Else, update the matching codeword with the new pixel information.
If there is more than one match, merge the matching codewords.
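A per-pixel sketch of this match-and-update loop, assuming a simplified codeword layout (color mean plus brightness bounds; the temporal fields, exact thresholds, and merge rule of the thesis are omitted or simplified here):

```python
import math

def matches(codeword, pixel, t_rgb=0.1):
    """A pixel matches a codeword when its brightness lies within the codeword's
    [I_low, I_hi] bounds and its color lies close to the codeword mean mu_RGB."""
    brightness = sum(pixel) / 3.0
    if not (codeword["i_low"] <= brightness <= codeword["i_hi"]):
        return False
    dot = sum(p * m for p, m in zip(pixel, codeword["mu_rgb"]))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_m = math.sqrt(sum(m * m for m in codeword["mu_rgb"]))
    if norm_p == 0.0 or norm_m == 0.0:
        return False
    cos2 = min(1.0, (dot / (norm_p * norm_m)) ** 2)
    color_dist = norm_p * math.sqrt(1.0 - cos2)   # distance to the codeword color axis
    return color_dist < t_rgb * norm_p

def classify(codebook, pixel, max_codewords=8):
    """One adaptive-update step for a single pixel's codebook."""
    matched = [cw for cw in codebook if matches(cw, pixel)]
    if not matched:
        if len(codebook) >= max_codewords:        # codebook saturated -> foreground
            return "foreground"
        b = sum(pixel) / 3.0                      # else create a new codeword
        codebook.append({"mu_rgb": list(pixel), "i_low": 0.8 * b, "i_hi": 1.2 * b})
        return "background"
    cw = matched[0]                               # update with new pixel information
    cw["mu_rgb"] = [0.9 * m + 0.1 * p for m, p in zip(cw["mu_rgb"], pixel)]
    for extra in matched[1:]:                     # >1 matches -> merge codewords
        codebook.remove(extra)
    return "background"
```

Repeated colors reinforce an existing codeword, while a new color either spawns a codeword or, once the codebook is saturated, is flagged as foreground.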
Background Subtraction
Head Detection: Vanishing Point Projection (VPP) Histogram
Vanishing Point in Z-direction
Camera Setup
• Two camera types: perspective and spherical
• A mixture of indoor and outdoor scenes
• Color and thermal image sensors
• Varying lighting conditions (daylight, cloud cover, incandescent, etc.)
Camera Modeling: Perspective and Spherical Projection
Perspective: recover X, Y, Z from [sx; sy; s] = P·[X; Y; Ż; 1] using SVD, where P is the 3×4 projection matrix. Assumption: floor plane Zf = 0.
Spherical:
X = cos(θ)·tan(π−φ)·(Zc−Ż)
Y = sin(θ)·tan(π−φ)·(Zc−Ż)
Z = Ż
[Figure: perspective and spherical camera geometry; camera at (Xc, Yc, Zc), world axes X, Y, Z; spherical angles given as latitude/longitude]
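The spherical-projection formulas above can be sketched as a small helper; the angle conventions (θ as azimuth, φ such that π−φ is the depression angle) and the function name are assumptions:

```python
import math

def floor_point(theta, phi, z_c, z_dot=0.0):
    """Project a ray at spherical angles (theta, phi) from a camera at height z_c
    onto the horizontal plane Z = z_dot, following X = cos(theta)*tan(pi-phi)*(Zc-Z.)."""
    r = math.tan(math.pi - phi) * (z_c - z_dot)   # horizontal range to the plane hit
    return (math.cos(theta) * r, math.sin(theta) * r, z_dot)
```

For example, a camera 3 m above the floor looking 45° below the horizon hits the floor 3 m out along the viewing azimuth.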
Tracking
Goal: find a correspondence between the bodies already detected in the current frame and the bodies that appear in the next frame.
Apply Markov Chain Monte Carlo (MCMC) to estimate the next state.
[Figure: hidden-state diagram with previous state xt−1, current state xt, and observation zt]
Jump-diffuse transitions:
• Add body
• Delete body
• Recover deleted
• Change size
• Move
Tracking
The location of each pedestrian is estimated probabilistically based on: the current image, the previous state of the system, and physical constraints.
The goal of our tracking system is to find the candidate state x´ (a set of bodies along with their parameters) which, given the last known state x, will best fit the current observation z
P(x′ | z, x) = L(z | x′) · P(x′ | x)
where L(z | x′) is the observation likelihood and P(x′ | x) is the state prior probability.
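A generic Metropolis-style sampler over this posterior might look like the following sketch; the proposal, likelihood, and prior callables stand in for the jump-diffuse moves and the terms L(z|x′) and P(x′|x), and are placeholders rather than the thesis implementation:

```python
import math
import random

def metropolis_hastings(x0, propose, log_likelihood, log_prior, n_iter=1000, seed=0):
    """Sample candidate states x' and accept each with probability
    min(1, [L(z|x')*P(x'|x)] / [L(z|x)*P(x|...)]); track the best state seen."""
    rng = random.Random(seed)
    x = x0
    log_p = log_likelihood(x) + log_prior(x)
    best, best_lp = x, log_p
    for _ in range(n_iter):
        cand = propose(x, rng)                      # e.g. add/delete/move a body
        cand_lp = log_likelihood(cand) + log_prior(cand)
        # Accept if log(u) < log posterior ratio (symmetric proposal assumed)
        if math.log(rng.random() + 1e-300) < cand_lp - log_p:
            x, log_p = cand, cand_lp
            if log_p > best_lp:
                best, best_lp = x, log_p
    return best
```

The sketch assumes a symmetric proposal; the actual jump-diffuse sampler would weight the add/delete moves by their proposal ratios.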
Tracking: Priors
N(hμ, hσ2) and N(wμ,wσ
2) body width and height
U(x)R and U(y)R body coordinates are weighted uniformly within the rectangular region R of the floor map.
d(wt, wt−1) and d(ht, ht−1) variation from the previous size
d(xt, x’t−1) and d(y, y’t−1) variation from Kalman predicted position
N(μdoor, σdoor) distance to the closest door (for new bodies)
Constraints on the body parameters:
Temporal continuity:
Tracking Likelihoods: Distance Weight Plane
Problem: blob trackers ignore blob position in 3D (see Zhao and Nevatia, CVPR 2004).
Solution: employ a “distance weight plane” Dxy = |Pxyz − Cxyz|, where P and C are the world coordinates of the camera and the reference point, respectively.
Tracking Likelihoods: Z-buffer
Each pixel is labeled with body order: 0 = background, 1 = furthermost body, 2 = next closest body, etc.
Let I be the set of all blob pixels and O the set of body pixels. Implementing the z-buffer (Z) together with the distance weight plane (D) allows the multiple-body configuration likelihood to be computed in one computationally efficient step, by accumulating Dxy over the pixels of I and O at which Z = 0.
Tracking Likelihoods: Color Histogram
The color observation likelihood is based on the Bhattacharyya distance between the candidate and observed color histograms:
Pcolor = 1 − wcolor · B(ct, ct−1)
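The Bhattacharyya comparison between two color histograms can be sketched as follows (bin layout and normalization are assumptions):

```python
import math

def bhattacharyya_distance(h1, h2):
    """Bhattacharyya distance between two histograms over the same bins:
    d = sqrt(1 - sum_i sqrt(p1[i] * p2[i])) after normalizing each histogram.
    Identical histograms give 0; non-overlapping ones give 1."""
    s1, s2 = float(sum(h1)), float(sum(h2))
    bc = sum(math.sqrt((a / s1) * (b / s2)) for a, b in zip(h1, h2))
    return math.sqrt(max(0.0, 1.0 - bc))
```

A small distance means the candidate body's colors match the previously observed ones, so the color likelihood is high.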
Tracking: Anisotropic Weighted Mean Shift
[Figure: classic mean shift vs. our mean shift; kernel H tracks the body from frame t−1 to frame t]
Actors and Events
• Shopper groups are formed by individual shoppers who shop together for some amount of time:
– more than a fleeting crossing of paths
– dwelling together
– splitting and uniting after a period of time
Swarming
• Shopper groups are detected based on the “swarming” idea in reverse:
– Swarming is used in graphics to generate flocking behaviour in animations.
– Rules define the flocking behaviour:
• Avoid collisions with neighbors.
• Maintain a fixed distance from neighbors.
• Coordinate the velocity vector with neighbors.
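A minimal boids-style update implementing these three rules might look like this sketch; all gains, radii, and the time step are made-up illustration values:

```python
def swarm_step(positions, velocities, neighbor_radius=2.0, sep=0.5, dt=0.1):
    """One flocking update in 2-D: each agent (i) repels from too-close neighbors,
    (ii) coheres toward a preferred separation, (iii) aligns velocity with neighbors."""
    new_v = []
    for i, (p, v) in enumerate(zip(positions, velocities)):
        steer = [0.0, 0.0]   # combined separation/cohesion force
        align = [0.0, 0.0]   # summed neighbor velocities
        n = 0
        for j, (q, u) in enumerate(zip(positions, velocities)):
            if i == j:
                continue
            dx, dy = q[0] - p[0], q[1] - p[1]
            dist = (dx * dx + dy * dy) ** 0.5
            if 0.0 < dist < neighbor_radius:
                n += 1
                gain = (dist - sep) / dist   # negative (repel) when closer than sep
                steer[0] += gain * dx
                steer[1] += gain * dy
                align[0] += u[0]
                align[1] += u[1]
        if n:
            v = (0.8 * v[0] + 0.1 * steer[0] + 0.1 * align[0] / n,
                 0.8 * v[1] + 0.1 * steer[1] + 0.1 * align[1] / n)
        new_v.append(v)
    new_p = [(p[0] + v[0] * dt, p[1] + v[1] * dt) for p, v in zip(positions, new_v)]
    return new_p, new_v
```

Running the rules in reverse, as the slides describe, means asking which observed tracks are consistent with agents following such rules, rather than generating motion from them.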
Tracking Customer Groups
• We treat customers as swarming agents, acting according to simple rules (e.g. stay together with swarm members)
Customer groups
Terminology
• Actors: shoppers (bodies detected in tracking): (x, y, id)
• Swarming events are defined as short-time activity sequences of multiple agents interacting with each other.
– They could be fleeting (crossing paths).
– Later analysis sorts this out and ignores chance encounters.
Swarming
• The actors that best fit this model signal a Swarming Event
• Multiple swarming events are further clustered with fuzzy weights to find out shoppers in the same group over long periods.
• Two actors come sufficiently close according to some distance measure, based on:
– the relative position pi = (xi, yi) of actor i on the floor
– body orientations αi
– the dwelling state δi = {T, F}
Event detection
The distance between two agents is a linear combination of co-location, co-ordination, and co-dwelling:
d(bi, bj) = w1·|pi, pj| + w2·|αi, αj| + w3·|δi, δj|
Event detection
Perform agglomerative clustering of actors a into clusters C:
• Initialize: N singleton clusters.
• Do: merge the two closest clusters.
• While not: the validity index I reaches its maximum.
The validity index I combines an isolation term and a compactness term.
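The clustering loop above can be sketched as follows; the validity index is supplied by the caller, since the exact isolation and compactness terms are not spelled out in the transcript, and the 1-D single-linkage distance is an illustration choice:

```python
def agglomerate(points, validity):
    """Agglomerative clustering sketch over 1-D points: start from singletons,
    repeatedly merge the two closest clusters (single linkage), and return the
    clustering with the best validity index seen along the way."""
    clusters = [[p] for p in points]              # initialize: N singleton clusters
    best, best_score = [list(c) for c in clusters], validity(clusters)
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest inter-point distance
        pair, d_min = (0, 1), float("inf")
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if d < d_min:
                    pair, d_min = (i, j), d
        i, j = pair
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        score = validity(clusters)                # stop criterion: index maximum
        if score > best_score:
            best, best_score = [list(c) for c in clusters], score
    return best
```

Any index that rewards well-separated (isolated) and tight (compact) clusters can be plugged in as `validity`.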
Event detection
Final events
Activity Detection
• Shopper group detection is accomplished by clustering the short-term events over long time periods.
– The events may be separated in time, but they belong to the same shopper group if the actors are the same (the first term of the similarity measure).
Activity detection
• Higher-level activities (shopper groups) are detected using these events as building blocks over longer time periods.
• Some definitions:
– Bei = {b ∈ ei}: the set of all bodies taking part in an event ei.
– τei and τej: the average times at which events ei and ej happen.
Activity Detection
Define a measure of similarity D(ei, ej) between two events, combining two terms:
• the overlap between the two sets of actors, |Bei ∩ Bej| / |Bei ∪ Bej|
• the separation in time, |τei − τej|²
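A sketch of an event dissimilarity built from these two terms; the exact weighting in the slide is not recoverable from the transcript, so a simple additive combination is assumed here:

```python
def event_distance(bodies_i, times_i, bodies_j, times_j):
    """Dissimilarity between two events: small when the same actors take part
    (large Jaccard overlap of body-ID sets) and the events are close in time."""
    bi, bj = set(bodies_i), set(bodies_j)
    overlap = len(bi & bj) / len(bi | bj)       # actor-set overlap in [0, 1]
    tau_i = sum(times_i) / len(times_i)         # average event times
    tau_j = sum(times_j) / len(times_j)
    gap = (tau_i - tau_j) ** 2                  # separation in time
    return gap + (1.0 - overlap)
```

Two events sharing all actors at the same average time get distance 0, so clustering on this distance groups recurring encounters of the same shoppers.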
Activity Detection
• Perform fuzzy agglomerative clustering.
• Minimize an objective function, where wij are fuzzy weights, ρ(·) is the loss function from robust statistics, and ψ(·) is the weight function; asymmetric variants of Tukey’s biweight estimators are used.
• Adaptively choose only strong fuzzy clusters.
• Label the remaining clusters as activities.
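Tukey's biweight loss ρ and its weight function ψ can be sketched as below; this is the symmetric textbook form with the conventional cutoff, whereas the thesis uses asymmetric variants:

```python
def tukey_rho(r, c=4.685):
    """Tukey's biweight loss: roughly quadratic near zero, constant beyond c,
    so outlying residuals contribute a bounded penalty."""
    if abs(r) >= c:
        return c * c / 6.0
    t = 1.0 - (r / c) ** 2
    return c * c / 6.0 * (1.0 - t ** 3)

def tukey_psi(r, c=4.685):
    """Influence (weight) function, the derivative of rho: zero for |r| >= c,
    so gross outliers have no pull on the fuzzy cluster centers."""
    if abs(r) >= c:
        return 0.0
    return r * (1.0 - (r / c) ** 2) ** 2
```

The bounded loss is what makes the clustering robust: an event that fits no cluster well simply saturates ρ instead of dragging a cluster toward it.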
Results: Swarming Activities Detected in Space-Time
• Dot location: average event location
• Dot size: validity
• Dots of the same color belong to the same activity
Group Detection Results
Quantitative Results: Tracking

Sequence   Frames   People   People missed   False hits   Identity switches
1          1054     15       3               1            3
2          0601     8        0               0            0
3          1700     16       5               1            2
4          1506     3        0               0            0
5          2031     2        0               0            0
6          1652     4        0               0            0
Total (%)  8544     48       12.5            4.1          10.4
Group Detection

Sequence   Groups   P+    P−     Partial
1          20       0     7      0
2          17       1     3      1
3          17       0     7      0
Total      54       1     12     2
Percent    100      1.8   22.2   3.7

Groups: ground truth (manually determined); P+: false positives; P−: false negatives (groups missed); Partial: partially identified groups (≥2 people in the group correctly identified).
Qualitative Assessments
• Longer paths provide better group detection (p-value << 1).
• Two-person groups are the easiest to detect.
• Simple one-step clustering of trajectories is not sufficient for long-term group detection.
• Employee tracks pose a significant problem and have to be excluded.
• Several groups were missed by the operator in the initial ground truth: the system caught groups missed by the human expert after inspection of the results.
Contributions
– Background subtraction based on a codebook (RGB + thermal)
– Introduced a head-candidate selection method based on the VPP histogram
– Resolved track-initialization ambiguity and non-unique body-blob correspondence
– Informed jump-diffuse transitions in the MCMC tracker
– Weight plane and z-buffer improve likelihood estimation
– Anisotropic mean shift with an obstacle model
– Two-layer formal framework for high-level activity detection
– Implemented robust fuzzy clustering to group events into activities
Future Work
• Improved tracking (via feature points)
• Demographic analysis
• Focus of attention
• Sensor fusion
• Other types of swarming activities
Questions?
Thank you!