![Page 1: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/1.jpg)
VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS:
A VIDEO MINING SYSTEM FOR RETAIL MARKETING
PhD Thesis by: Alex Leykin, Indiana University
![Page 2: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/2.jpg)
Motivation
• Automated tracking and activity recognition is missing from marketing research
• Hardware is already there
• Visual information can reveal a lot about human interactions with each other
• Help in making intelligent marketing decisions
![Page 3: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/3.jpg)
Goals
Extract semantic information from the tracks (Activity Analysis)
Process visual information to get a formal representation of human locations (Visual Tracking)
![Page 4: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/4.jpg)
Related Work: Detection and Tracking
• Yacoob and Davis “Learned models for estimation of rigid and articulated human motion from stationary or moving camera” IJCV 2000
• Zhao and Nevatia “Tracking multiple humans in crowded environment” CVPR 2004
• Haritaoglu, Harwood, and Davis “W-4: Real-time surveillance of people and their activities” PAMI 2000
• J. Deutscher, B. North, B. Bascle and A. Blake “Tracking through singularities and discontinuities by random sampling”, ICCV 1999
• A. Elgammal and L. S. Davis, “Probabilistic Framework for Segmenting People Under Occlusion”, ICCV 2001.
• M. Isard, J. MacCormick, “BraMBLe: a Bayesian multiple-blob tracker”, ICCV 2001
![Page 5: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/5.jpg)
Related Work: Activity Recognition
• Haritaoglu and Flickner “Detection and tracking of shopping groups in stores” CVPR 2001
• Oliver, Rosario, and Pentland “A bayesian computer vision system for modeling human interactions” PAMI 2000
• Buzan, Sclaroff, and Kollios “Extraction and clustering of motion trajectories in video” ICPR 2004
• Hongeng, Nevatia, and Bremond “Video-based event recognition: activity representation and probabilistic recognition methods” CVIU 2004
• Bobick and Ivanov “Action recognition using probabilistic parsing” CVPR 1998
![Page 6: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/6.jpg)
System Components
Low-level Processing
Camera Model
Obstacle Model
Foreground Segmentation
Head Detection
![Page 7: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/7.jpg)
Background Modeling
• Codeword (color): μRGB, Ilow, Ihi
• Codebook: a collection of codewords
![Page 8: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/8.jpg)
Adaptive Background Update
Match pixel p to the codebook b:
• I(p) > Ilow
• I(p) < Ihigh
• (RGB(p)∙ μRGB) < TRGB
• t(p)/thigh > Tt1
• t(p)/tlow > Tt2
If there is no match:
• if the codebook is saturated, then the pixel is foreground
• else create a new codeword
Else update the codeword with the new pixel information
If >1 matches, then merge the matching codewords
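The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the names `Codeword`, `matches`, `update_pixel`, and the parameters `t_rgb`, `slack`, and `max_codewords` are hypothetical. The color test uses a plain Euclidean distance to the mean color in place of the dot-product test on the slide, and the thermal conditions are left out.

```python
from dataclasses import dataclass


@dataclass
class Codeword:
    mean_rgb: tuple   # running mean color (stands in for μRGB)
    i_low: float      # lowest intensity absorbed so far (Ilow)
    i_high: float     # highest intensity absorbed so far (Ihigh)
    count: int = 1    # number of pixels absorbed so far


def intensity(rgb):
    return sum(rgb) / 3.0


def matches(cw, rgb, t_rgb=30.0, slack=10.0):
    """Assumed match test: intensity inside the widened [Ilow, Ihigh] band
    and Euclidean color distance to the mean below t_rgb."""
    color_dist = sum((a - b) ** 2 for a, b in zip(rgb, cw.mean_rgb)) ** 0.5
    in_band = cw.i_low - slack < intensity(rgb) < cw.i_high + slack
    return in_band and color_dist < t_rgb


def merge(a, b):
    """Combine two codewords: count-weighted mean color, union of bounds."""
    n = a.count + b.count
    mean = tuple((x * a.count + y * b.count) / n
                 for x, y in zip(a.mean_rgb, b.mean_rgb))
    return Codeword(mean, min(a.i_low, b.i_low), max(a.i_high, b.i_high), n)


def update_pixel(codebook, rgb, max_codewords=4):
    """Update the codebook in place; return True if the pixel is foreground."""
    i = intensity(rgb)
    sample = Codeword(tuple(float(c) for c in rgb), i, i)
    hits = [cw for cw in codebook if matches(cw, rgb)]
    if not hits:
        if len(codebook) >= max_codewords:   # saturated: report foreground
            return True
        codebook.append(sample)              # otherwise start a new codeword
        return False
    merged = sample                          # fold the pixel into every match,
    for cw in hits:                          # which also merges multiple matches
        merged = merge(merged, cw)
    codebook[:] = [cw for cw in codebook if not any(cw is h for h in hits)]
    codebook.append(merged)
    return False
```

Each pixel would keep its own `codebook` list; a frame is segmented by calling `update_pixel` once per pixel and collecting the `True` results as the foreground mask.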
![Page 9: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/9.jpg)
Background Subtraction
![Page 10: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/10.jpg)
Head Detection
Vanishing Point Projection (VPP) Histogram
Vanishing Point in Z-direction
![Page 11: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/11.jpg)
Camera Setup
• Two camera types: perspective and spherical
• Mixtures of indoor and outdoor scenes
• Color and thermal image sensors
• Varying lighting conditions (daylight, cloud cover, incandescent, etc.)
![Page 12: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/12.jpg)
Camera Modeling
Perspective Projection:
• X, Y, Z from [sx; sy; s] = P [X; Y; Ż; 1] using SVD, where P is the 3x4 projection matrix
Spherical Projection:
• X = cos(θ) tan(π-φ)(Zc-Ż)
• Y = sin(θ) tan(π-φ)(Zc-Ż)
• Z = Ż
Assumption: floor plane Zf = 0
[Figure: perspective and spherical camera geometry, showing world axes X, Y, Z, image axes x, y, camera position [Xc, Yc, Zc], and the Lat and Lon angles]
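Both back-projections can be illustrated with a short Python sketch. The function names and the example matrix are hypothetical. The spherical case follows the formulas as written on the slide. The perspective case solves the two linear equations for X and Y directly with Cramer's rule once Ż is known, which is a simpler stand-in for the SVD solution mentioned above.

```python
import math


def spherical_to_world(theta, phi, z_cam, z_known=0.0):
    """X, Y, Z from the spherical-camera angles, as written on the slide."""
    scale = math.tan(math.pi - phi) * (z_cam - z_known)
    return math.cos(theta) * scale, math.sin(theta) * scale, z_known


def perspective_to_world(P, x_img, y_img, z_known=0.0):
    """Solve [s*x, s*y, s] = P [X, Y, Z, 1] for X and Y with Z known.

    Eliminating s gives two linear equations in X and Y:
      (P0 - x*P2) . [X, Y, Z, 1] = 0
      (P1 - y*P2) . [X, Y, Z, 1] = 0
    """
    r1 = [P[0][k] - x_img * P[2][k] for k in range(4)]
    r2 = [P[1][k] - y_img * P[2][k] for k in range(4)]
    b1 = -(r1[2] * z_known + r1[3])
    b2 = -(r2[2] * z_known + r2[3])
    det = r1[0] * r2[1] - r1[1] * r2[0]
    if abs(det) < 1e-12:
        raise ValueError("image point does not determine a unique floor point")
    X = (b1 * r2[1] - r1[1] * b2) / det
    Y = (r1[0] * b2 - b1 * r2[0]) / det
    return X, Y, z_known
```

With the floor-plane assumption, `z_known=0.0` maps a foot pixel to a floor position; passing an assumed body height instead maps a head pixel.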
![Page 13: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/13.jpg)
Tracking
Goal: find a correspondence between the bodies already detected in the current frame and the bodies that appear in the next frame.
Apply Markov Chain Monte Carlo (MCMC) to estimate the next state
[Figure: transition from state xt-1 to state xt given observation zt]
Transitions:
• Add body
• Delete body
• Recover deleted
• Change size
• Move
![Page 14: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/14.jpg)
Tracking
Location of each pedestrian is estimated probabilistically based on:
• Current image
• Previous state of the system
• Physical constraints
The goal of our tracking system is to find the candidate state x′ (a set of bodies along with their parameters) which, given the last known state x, will best fit the current observation z
P(x′ | z, x) = L(z | x′) · P(x′ | x)
where L(z | x′) is the observation likelihood and P(x′ | x) is the state prior probability
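The search for the best candidate state can be illustrated with a generic Metropolis-Hastings loop. This is only a sketch under simplifying assumptions: the state is a single number rather than a set of bodies, the proposal is symmetric, and `toy_log_posterior` is a made-up Gaussian stand-in for log L(z | x′) + log P(x′ | x).

```python
import math
import random


def metropolis_hastings(x0, log_posterior, propose, n_iter, rng):
    """Return the highest-posterior state visited by the chain."""
    x, lp = x0, log_posterior(x0)
    best, best_lp = x, lp
    for _ in range(n_iter):
        cand = propose(x, rng)
        cand_lp = log_posterior(cand)
        # Accept with probability min(1, posterior ratio); symmetric proposal assumed.
        if rng.random() < math.exp(min(0.0, cand_lp - lp)):
            x, lp = cand, cand_lp
            if lp > best_lp:
                best, best_lp = x, lp
    return best


def toy_log_posterior(x):
    log_likelihood = -0.5 * ((x - 3.0) / 0.5) ** 2   # stands in for log L(z | x')
    log_prior = -0.5 * ((x - 2.0) / 2.0) ** 2        # stands in for log P(x' | x)
    return log_likelihood + log_prior


def gaussian_step(x, rng):
    return x + rng.gauss(0.0, 0.5)


best_state = metropolis_hastings(0.0, toy_log_posterior, gaussian_step,
                                 n_iter=2000, rng=random.Random(0))
```

In the tracker, `propose` would instead pick one of the transitions (add, delete, recover, resize, move) and the state would hold every body's parameters; that part is not shown here.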
![Page 15: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/15.jpg)
Tracking: Priors
Constraints on the body parameters:
• N(hμ, hσ²) and N(wμ, wσ²): body height and width
• U(x)R and U(y)R: body coordinates are weighted uniformly within the rectangular region R of the floor map
Temporal continuity:
• d(wt, wt−1) and d(ht, ht−1): variation from the previous size
• d(xt, x′t−1) and d(yt, y′t−1): variation from the Kalman-predicted position
• N(μdoor, σdoor): distance to the closest door (for new bodies)
![Page 16: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/16.jpg)
Tracking Likelihoods: Distance weight plane
Problem: blob trackers ignore blob position in 3D (see Zhao and Nevatia CVPR 2004)
Solution: employ a “distance weight plane” Dxy = |Pxyz, Cxyz|, where P and C are the world coordinates of the camera and the reference point, respectively, and Pz = h/2
![Page 17: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/17.jpg)
Tracking Likelihoods: Z-buffer
0 = background, 1=furthermost body, 2 = next closest body, etc
![Page 18: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/18.jpg)
Tracking Likelihoods: Color Histogram
$P_{color} = 1 - w_{color} \, (1 - B(c_t, c_{t-1}))$
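[Equations: two likelihood terms computed from I, O, the z-buffer entries with Z > 0, and the distance weight plane Dxy, one normalized by |I| and the other by |O|]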
Implementation of the z-buffer (Z) and the distance weight plane (D) allows the multiple-body configuration to be computed in one computationally efficient step.
Let:
• I: set of all blob pixels
• O: set of body pixels
Color observation likelihood is based on the Bhattacharyya distance between candidate and observed color histograms
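A histogram comparison of this kind can be sketched as follows. The sketch assumes that B is the Bhattacharyya coefficient of the two normalized histograms and that the likelihood has the form 1 − w·(1 − B); the function names and the default weight `w_color=0.5` are hypothetical.

```python
import math


def normalize(hist):
    """Scale a non-empty histogram with positive total so its bins sum to 1."""
    total = float(sum(hist))
    return [h / total for h in hist]


def bhattacharyya_coefficient(p, q):
    """1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def color_likelihood(hist_t, hist_prev, w_color=0.5):
    """Assumed form: 1 - w_color * (1 - B(c_t, c_{t-1}))."""
    b = bhattacharyya_coefficient(normalize(hist_t), normalize(hist_prev))
    return 1.0 - w_color * (1.0 - b)
```

A body whose candidate histogram matches the one observed in the previous frame scores 1.0, and the score drops toward 1 − `w_color` as the histograms diverge.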
![Page 19: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/19.jpg)
Tracking: Anisotropic Weighted Mean Shift
[Figure: Classic Mean-Shift vs. Our Mean-Shift, showing the shift from frame t-1 to frame t; label H]
![Page 20: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/20.jpg)
![Page 21: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/21.jpg)
Actors and events
• Shopper groups are formed by individual shoppers who shop together for some amount of time
– More than fleeting crossing of paths
– Dwelling together
– Splitting and uniting after a period of time
![Page 22: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/22.jpg)
Swarming
• Shopper groups detected based on “swarming” idea in reverse
– Swarming is used in graphics to generate flocking behaviour in animations.
– Rules define flocking behaviour:
• Avoid collisions with the neighbors.
• Maintain fixed distance with neighbors.
• Coordinate velocity vector with neighbors.
![Page 23: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/23.jpg)
Tracking Customer Groups
• We treat customers as swarming agents, acting according to simple rules (e.g. stay together with swarm members)
[Figure: customer groups]
![Page 24: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/24.jpg)
Terminology
• Actors: shoppers (bodies detected in tracking)
– (x, y, id)
• Swarming events defined as short time activity sequences of multiple agents interacting with each other.
– Could be fleeting (crossing paths)
– Later analysis sorts this out and ignores chance encounters.
![Page 25: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/25.jpg)
Swarming
• The actors that best fit this model signal a Swarming Event
• Multiple swarming events are further clustered with fuzzy weights to find out shoppers in the same group over long periods.
![Page 26: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/26.jpg)
Event detection
• Two actors come sufficiently close according to some distance measure:
– Relative position pi=(xi, yi) of actor i on the floor
– Body orientations αi
– Dwelling state δi={T,F}
Distance between two agents is a linear combination of co-location, co-ordination and co-dwelling
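A linear combination of this kind could look as follows in Python. The per-term metrics are assumptions chosen for illustration: Euclidean distance for co-location, wrapped angular difference for co-ordination, and a 0/1 mismatch for co-dwelling. The `Actor` type and the weights `w1`, `w2`, `w3` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Actor:
    x: float          # floor position
    y: float
    alpha: float      # body orientation in radians
    dwelling: bool    # dwelling state


def pair_distance(a, b, w1=1.0, w2=0.5, w3=0.5):
    """Weighted sum of co-location, co-ordination and co-dwelling terms."""
    co_location = math.hypot(a.x - b.x, a.y - b.y)
    diff = abs(a.alpha - b.alpha) % (2 * math.pi)
    co_ordination = min(diff, 2 * math.pi - diff)   # smallest angle between headings
    co_dwelling = 0.0 if a.dwelling == b.dwelling else 1.0
    return w1 * co_location + w2 * co_ordination + w3 * co_dwelling
```

Two actors standing side by side, facing the same way and both dwelling, get a distance near zero; the weights set how much each cue matters.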
![Page 27: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/27.jpg)
Event detection
Perform agglomerative clustering of actors a into clusters C
• Initialize: N singleton clusters
• Do: merge two closest clusters
• While not: validity index I reaches its maximum
I consists of isolation Ini and compactness Inc
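The merge loop can be sketched as below. Since the exact definitions of Ini and Inc are not given here, the sketch substitutes simple stand-ins: isolation is the smallest gap between clusters, compactness is the largest within-cluster distance, and the index is their difference. It uses single linkage on 2D points and, instead of stopping at the first maximum, scores every level of the hierarchy and returns the best partition. All names are hypothetical.

```python
import math
from itertools import combinations


def linkage(c1, c2):
    """Single-linkage distance: closest pair of points across two clusters."""
    return min(math.dist(a, b) for a in c1 for b in c2)


def validity(clusters):
    """Stand-in index: isolation (smallest gap) minus compactness (largest diameter)."""
    isolation = min(linkage(c1, c2) for c1, c2 in combinations(clusters, 2))
    compactness = max(
        (math.dist(a, b) for c in clusters for a, b in combinations(c, 2)),
        default=0.0,
    )
    return isolation - compactness


def cluster_actors(points):
    """Agglomerative clustering; returns the partition with the highest index."""
    clusters = [[p] for p in points]
    if len(clusters) < 2:
        return clusters
    best, best_score = [list(c) for c in clusters], validity(clusters)
    while len(clusters) > 2:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]   # merge the two closest clusters
        del clusters[j]                           # j > i, so index i stays valid
        score = validity(clusters)
        if score > best_score:
            best, best_score = [list(c) for c in clusters], score
    return best
```

For example, `cluster_actors([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)])` separates the three nearby points from the two distant ones.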
![Page 28: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/28.jpg)
Event detection
[Figure: two plots against iteration number, and the final events]
![Page 29: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/29.jpg)
Activity Detection
• The shopper group detection is accomplished by clustering the short term events over long time periods.
– The events could be separated in time, but they will be part of the same shopper group if the actors are the same (the first term).
![Page 30: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/30.jpg)
Activity detection
• Higher level activities (shopper groups) detected using these events as building blocks over longer time periods
• Some definitions:
– Bei = {b ∈ ei}: the set of all bodies taking part in an event ei.
– τei and τej are the average times of events ei and ej happening.
![Page 31: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/31.jpg)
Activity detection
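Define a measure of similarity between two events:
$D_e(e_i, e_j) = w_1 \frac{|(B_{e_i} \setminus B_{e_j}) \cup (B_{e_j} \setminus B_{e_i})|}{|B_{e_i} \cup B_{e_j}|} + w_2 \, (|\tau_{e_i} - \tau_{e_j}|)^2$
The first term measures the overlap between the two sets of actors; the second term measures the separation in time.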
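A measure with these two ingredients can be sketched in Python. The exact form is an assumption: the actor term is the size of the symmetric difference of the two body sets divided by the size of their union, the time term is the squared difference of the average event times, and the weights `w1` and `w2` are hypothetical.

```python
def event_distance(bodies_i, t_i, bodies_j, t_j, w1=1.0, w2=0.01):
    """Small when two events share their actors and happen close in time."""
    bi, bj = set(bodies_i), set(bodies_j)
    union = bi | bj
    # Fraction of actors that appear in only one of the two events.
    actor_term = len(bi ^ bj) / len(union) if union else 0.0
    time_term = (t_i - t_j) ** 2
    return w1 * actor_term + w2 * time_term
```

With a small `w2`, two events involving the same bodies stay close even when they are far apart in time, which is what lets separated events join the same shopper group.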
![Page 32: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/32.jpg)
Activity detection
• Perform fuzzy agglomerative clustering
• Minimize objective function
• where wij are fuzzy weights
• and asymmetric variants of Tukey’s biweight estimators:
• ρ(.) is the loss function from robust statistics.
• ψ(.) is the weight function
Adaptively choose only strong fuzzy clusters
Label remaining clusters as activities
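For reference, the standard symmetric Tukey biweight is sketched below; the asymmetric variants used in this work are not specified here and are not reproduced. The tuning constant `c = 4.685` is the usual choice for residuals with unit scale, and `robust_mean` is a hypothetical helper showing how the weights suppress outliers.

```python
def tukey_rho(r, c=4.685):
    """Tukey biweight loss: grows like r^2/2 near zero, flat beyond |r| >= c."""
    if abs(r) >= c:
        return c * c / 6.0
    u = 1.0 - (r / c) ** 2
    return (c * c / 6.0) * (1.0 - u ** 3)


def tukey_weight(r, c=4.685):
    """Weight applied to a residual: 1 at zero, falling to 0 at |r| >= c."""
    if abs(r) >= c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2


def robust_mean(values, c=4.685, n_iter=20):
    """Iteratively reweighted mean, started from the median."""
    m = sorted(values)[len(values) // 2]
    for _ in range(n_iter):
        weights = [tukey_weight(v - m, c) for v in values]
        total = sum(weights)
        if total == 0.0:
            break
        m = sum(w * v for w, v in zip(weights, values)) / total
    return m
```

Points far from a cluster receive zero weight, so a few stray events do not pull a cluster away from its core members.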
![Page 33: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/33.jpg)
Results: Swarming activities detected in space-time
• Dot location: average event location
• Dot size: validity
• Dots of same color: belong to same activity
![Page 34: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/34.jpg)
Group Detection Results
![Page 35: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/35.jpg)
Quantitative Results
![Page 36: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/36.jpg)
Tracking
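| Sequence number | Frames | People | People missed | False hits | Identity switches |
|---|---|---|---|---|---|
| 1 | 1054 | 15 | 3 | 1 | 3 |
| 2 | 601 | 8 | 0 | 0 | 0 |
| 3 | 1700 | 16 | 5 | 1 | 2 |
| 4 | 1506 | 3 | 0 | 0 | 0 |
| 5 | 2031 | 2 | 0 | 0 | 0 |
| 6 | 1652 | 4 | 0 | 0 | 0 |
| Total / % | 8544 | 48 | 12.5% | 4.1% | 10.4% |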
![Page 37: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/37.jpg)
Group Detection

| Sequence | Groups | P+ | P− | Partial |
|---|---|---|---|---|
| 1 | 20 | 0 | 7 | 0 |
| 2 | 17 | 1 | 3 | 1 |
| 3 | 17 | 0 | 7 | 0 |
| Total | 54 | 1 | 12 | 2 |
| Percent | 100 | 1.8 | 22.2 | 3.7 |

• Groups: ground truth (manually determined)
• P+: false positives
• P−: false negatives (groups missed)
• Partial: partially identified groups (≥2 people in the group correctly identified)
![Page 38: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/38.jpg)
Qualitative Assessments
• Longer paths provide better group detection (pval << 1)
• Two-people groups are easiest to detect
• Simple one-step clustering of trajectories is not sufficient for long-term group detection
• Employee tracks pose a significant problem and have to be excluded
• Several groups were missed by the operator in the initial ground truth
– System caught groups missed by the human expert after inspection of results.
![Page 39: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/39.jpg)
Contributions
– BG subtraction based on codebook (RGB+thermal)
– Introduced head candidate selection method based on VPP histogram
– Resolving track initialization ambiguity and non-unique body-blob correspondence
– Informed jump-diffuse transitions in MCMC tracker
– Weight plane and z-buffer improve likelihood estimation
– Anisotropic mean-shift with obstacle model
– Two-layer formal framework for high-level activity detection
– Implemented robust fuzzy clustering to group events into activities
![Page 40: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/40.jpg)
Future Work
• Improved Tracking (via feature points)
• Demographical analysis
• Focus of Attention
• Sensor Fusion
• Other Types of Swarming Activities
![Page 41: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/41.jpg)
Questions?
Thank you!
![Page 42: VISUAL HUMAN TRACKING AND GROUP ACTIVITY ANALYSIS: A VIDEO MINING SYSTEM FOR RETAIL MARKETING](https://reader035.vdocuments.us/reader035/viewer/2022062521/56816945550346895de0d0c1/html5/thumbnails/42.jpg)
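Event detection distance between agents bi and bj:
$d(b_i, b_j) = w_1 |p_i, p_j| + w_2 |\alpha_i, \alpha_j| + w_3 |\delta_i, \delta_j|$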