![Page 1: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/1.jpg)
Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications*
Edward Chang² and Yuan-Fang Wang¹
²Department of Electrical and Computer Engineering
¹Department of Computer Science
University of California, Santa Barbara, CA 93106
*Supported in part by NSF Career, ITR, IDM, and Infrastructure grants, and a gift from Proximex Corp.
![Page 2: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/2.jpg)
Problem Statement
- Video surveillance with
  - Multiple cameras
  - Mobile, wireless networks
  - Online data processing
  - Intelligent, computer-assisted content analysis
- Focus of current work
  - Event sensing for detection, representation, and recognition of motion events
  - Sensor network management for bandwidth and power resource conservation
![Page 3: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/3.jpg)
Potential Applications and Needs
- Applications
  - Emergency search and rescue in natural disasters
  - Deterrence of cross-border illegal activities
  - Reconnaissance and intelligence gathering in digital battlefields
- Needs
  - Rapid deployment, dynamic configuration, and continuous operations
  - Robust and real-time data fusion and analysis
  - Intelligent event modeling and recognition
![Page 4: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/4.jpg)
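Validation Scenario
[Figure: m cameras with local coordinate frames $(x_1, y_1, z_1), (x_2, y_2, z_2), \ldots, (x_m, y_m, z_m)$ observe a target moving in the world frame $(X, Y, Z)$. The world trajectory is $\mathbf{P}(t) = (X(t), Y(t), Z(t))^T$ and its image in camera 1 is $\mathbf{p}_1(t) = (x_1(t), y_1(t))^T$. Slave stations are connected to a master station over the Internet.]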
![Page 5: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/5.jpg)
Research and Development Framework
- Event detection
  - Far-field coordination and update
  - Near-field sensor data fusion
- Event representation
  - Hierarchical: multiple levels of detail
  - Invariant: insensitive to incidental changes
- Event recognition
  - Temporally correlated event signature
  - Imbalanced training set
![Page 6: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/6.jpg)
Event Detection: Near-field Sensor Data Fusion
- Sensing coordination and intelligent data fusion
- Two-level hierarchy of Kalman filters
  - Bottom level (feed forward)
    - Summarize trajectories in local state vectors
    - Merge state vectors from multiple cameras through registration parameters
  - Top level (feed backward)
    - Fill in missing or occluded trajectory pieces
    - Camera pose and frame rate control
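[Figure: slave stations $0, \ldots, i, \ldots, m-1$ each receive measurements $z^{(i)}(t)$ and maintain a local state vector $\mathbf{x}^{(i)}$ built from $\mathbf{p}^{(i)}$ terms. The master fusion station maintains the world state $\mathbf{X}$ built from $\mathbf{P}$ terms. Registration transforms relate the two, $\mathbf{x}^{(i)} = T^{(i)}_{world \to real}\,\mathbf{X}$ and $\mathbf{X} = T^{(i)}_{real \to world}\,\mathbf{x}^{(i)}$, and the stations communicate over the Internet.]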
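The bottom-level merge step can be illustrated with a minimal sketch. It assumes that each camera reports a 2-D position estimate with a scalar variance, that the registration parameters of a camera are a planar rotation angle and a translation, and that estimates are combined by inverse-variance weighting. The slides do not specify the actual merge rule, and the names `to_world` and `fuse` are hypothetical.

```python
import math

def to_world(p, theta, t):
    """Map a local 2-D point into the world frame (assumed rotation theta, translation t)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])

def fuse(estimates):
    """Inverse-variance weighted average of (world_point, variance) pairs."""
    wsum = sum(1.0 / var for _, var in estimates)
    x = sum(p[0] / var for p, var in estimates) / wsum
    y = sum(p[1] / var for p, var in estimates) / wsum
    return (x, y), 1.0 / wsum  # fused point and fused variance

# Two cameras observing the same target; camera 1 is rotated by 90 degrees.
cam0 = (to_world((1.0, 2.0), 0.0, (0.0, 0.0)), 0.5)
cam1 = (to_world((2.0, -1.0), math.pi / 2, (0.0, 0.0)), 0.5)
fused, var = fuse([cam0, cam1])
```

Both cameras agree on the world point (1, 2) here, so the fused estimate stays there while its variance halves. A full Kalman formulation would carry covariance matrices and velocity terms rather than scalars.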
![Page 7: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/7.jpg)
Event Detection: Far-field Coordination and Update
- Minimize bandwidth and power consumption under pre-specified accuracy constraints
- Dual Kalman filters
  - An update is necessary only when the predictions diverge
  - Cache dynamic algorithms instead of static data
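The dual-filter idea can be sketched as follows, under stated assumptions: the sensor and the server run identical copies of a predictor, and the sensor transmits a measurement only when it differs from the shared prediction by more than a tolerance. A simple constant-velocity predictor stands in for the Kalman filter, and `ConstantVelocityPredictor`, `run`, and `tol` are hypothetical names.

```python
class ConstantVelocityPredictor:
    """Shared motion model; a simplified stand-in for a Kalman filter."""

    def __init__(self):
        self.x, self.v = 0.0, 0.0

    def predict(self):
        self.x += self.v
        return self.x

    def correct(self, z):
        # Snap to the measurement and absorb the innovation into the velocity.
        innovation = z - self.x
        self.x = z
        self.v += innovation

def run(measurements, tol):
    sensor, server = ConstantVelocityPredictor(), ConstantVelocityPredictor()
    sent, server_track = 0, []
    for z in measurements:
        sensor.predict()
        server.predict()
        if abs(z - sensor.x) > tol:   # predictions diverge: transmit an update
            sensor.correct(z)
            server.correct(z)         # models the transmitted update arriving
            sent += 1
        server_track.append(server.x)
    return sent, server_track

# Target moves at unit speed, then stops.
measurements = [1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0]
sent, track = run(measurements, tol=0.5)
```

In this run only two of the seven measurements are transmitted, and the server's track never deviates from the measurements by more than `tol`, which is the accuracy constraint in this sketch.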
![Page 8: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/8.jpg)
Event Representation
- Hierarchical
  - Multiple levels of description
    - Syntactic level
    - Semantic level
- Invariant
  - Descriptors unaffected by incidental changes of environmental factors and camera pose
- Consequences
  - Able to perform both “intra-class” and “inter-class” recognition
  - Recognize syntactic similarity (the same trajectory) and semantic similarity (the same type of trajectory)
![Page 9: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/9.jpg)
Event Representation: Syntactic Level
- Normalization against
  - Viewpoint (affine or perspective)
  - Speed
- To derive an invariant signature
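One common way to normalize a trajectory against speed is to resample it at equal arc-length intervals, so that the result depends only on the path shape and not on how fast it was traversed. The sketch below assumes this approach; the slides do not say which normalization is used, and `resample_by_arc_length` is a hypothetical name. Viewpoint normalization is not covered here.

```python
import math

def resample_by_arc_length(points, n):
    """Return n >= 2 points equally spaced along the polyline's arc length."""
    # Cumulative arc length at each input vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out, j = [], 0
    for i in range(n):
        s = total * i / (n - 1)
        # Advance to the segment that contains arc length s.
        while j < len(cum) - 2 and cum[j + 1] < s:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = 0.0 if seg == 0 else (s - cum[j]) / seg
        out.append((points[j][0] + t * (points[j + 1][0] - points[j][0]),
                    points[j][1] + t * (points[j + 1][1] - points[j][1])))
    return out

# The same straight path sampled at a steady speed and at a varying speed.
steady = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
varying = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (4.0, 0.0)]
a = resample_by_arc_length(steady, 5)
b = resample_by_arc_length(varying, 5)
```

After resampling, the two recordings yield the same sequence of points up to floating-point error.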
![Page 10: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/10.jpg)
Event Representation: Semantic Level
- Segmentation based on acceleration
- Segment characterization
- Markov chain representation
[Figure: decision tree for segment characterization. Starting from "Start", a series of yes/no tests on the velocity and acceleration vectors (whether a vector is zero, whether a quantity is constant, whether two vectors are parallel, and the signs of products and time derivatives) leads to the leaf labels: Stopped, Constant velocity, Slow down, Quick accelerate, Quick start, Emergency stop, Left turn, Right turn, Left half turn, Right half turn, Left inward turn, Left outward turn, Right inward turn, Right outward turn, Left spiral, and Right spiral, with "w. acc" and "w. decel" variants of the turns.]
![Page 11: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/11.jpg)
Event Representation: Semantic Level (cont.)
[Figure: four panels, each listing the same set of segment states: Constant velocity, Speed up, Slow down, Left half turn, Left outward spiral, and Left inward spiral, with "w. acc" and "w. decel" variants of the half turn and spirals.]
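A Markov chain over segment labels can be estimated from a labeled trajectory by counting label-to-label transitions. The sketch below is a minimal maximum-likelihood estimate; the example sequence is made up for illustration and `transition_probabilities` is a hypothetical name.

```python
from collections import Counter, defaultdict

def transition_probabilities(labels):
    """Estimate first-order transition probabilities from a label sequence."""
    counts = defaultdict(Counter)
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

# Hypothetical segment labels for one trajectory.
seq = ["Constant velocity", "Slow down", "Left half turn", "Speed up",
       "Constant velocity", "Slow down", "Constant velocity"]
P = transition_probabilities(seq)
```

Here `P["Slow down"]` assigns probability 0.5 each to "Left half turn" and "Constant velocity", since each follows "Slow down" once in the sequence.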
![Page 12: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/12.jpg)
Event Recognition: Sequence Data Learning
- Similarity measurement is difficult
  - Sequence data with temporal correlation may not have a vector space representation
- However, kernel methods (e.g., SVM) are applicable
  - The lack of a vector space representation is acceptable
  - A feature space representation is still required
- Use a dynamic programming (DP) algorithm for the feature space distance metric
- Use hierarchical kernel recognition and fusion
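As an illustration of a DP-based distance between sequences of different lengths, the sketch below uses dynamic time warping (DTW) and turns the distance into a similarity with an exponential. DTW is an assumption here, since the slides do not name the DP algorithm, and `dtw_distance`, `dtw_kernel`, and `gamma` are hypothetical names. A kernel built this way is not guaranteed to be positive semi-definite.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of the best alignment of a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def dtw_kernel(a, b, gamma=1.0):
    """Similarity derived from the DTW distance (assumed exponential form)."""
    return math.exp(-gamma * dtw_distance(a, b))

same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])   # time-stretched copy
different = dtw_distance([1, 2, 3], [1, 2, 4])
```

The time-stretched copy has zero distance to the original, while changing the last value costs exactly the difference between the two values.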
![Page 13: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/13.jpg)
Event Recognition: Imbalanced Data Set
- Negative samples significantly outnumber positive samples
- The Bayesian risk associated with a false negative significantly outweighs that of a false positive
- Adaptive conformal mapping at the decision boundary
![Page 14: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/14.jpg)
Event Recognition: Statistical Modeling
- An HMM is expensive to build
- Not all behaviors are structured (e.g., loitering behaviors)
- It may not be necessary to understand individual activities before recognizing interaction
- Distinguish interaction patterns
  - Following
  - Following-and-gaining
  - Stalking
![Page 15: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/15.jpg)
![Page 16: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/16.jpg)
Experimental Results: Syntactic Matching
![Page 17: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/17.jpg)
Experimental Results: Semantic Indexing
![Page 18: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/18.jpg)
Experimental Results: Biased Learning
[Figure labels: sensitivity = TP/(TP+FN); specificity = TN/(TN+FP); threshold; penalty]
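The two ratios on this slide, TP/(TP+FN) and TN/(TN+FP), are the standard definitions of sensitivity and specificity. A minimal sketch of computing them from binary labels follows; the data are made up for illustration and `sensitivity_specificity` is a hypothetical name.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (TP/(TP+FN), TN/(TN+FP)) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Imbalanced toy set: two positives, four negatives.
sens, spec = sensitivity_specificity([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

With an imbalanced set, a classifier can reach high specificity while missing half of the positives, which is why both ratios are reported.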
![Page 19: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/19.jpg)
Experimental Results: Statistical Learning
![Page 20: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/20.jpg)
![Page 21: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/21.jpg)
Results
![Page 22: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/22.jpg)
Relevant Publications
- Many details are omitted
  - Sensor registration (spatial and temporal)
  - Object tracking (Kalman and multi-state)
  - Power management and routing
1. L. Jiao, G. Wu, Y. Wu, E. Y. Chang, and Y. F. Wang, “The Anatomy of A Multi-Camera Video Surveillance System,” to appear in the ACM Multimedia System Journal.
2. K. Wu, J. Long, D. Han, and Y. F. Wang, “Human Activity Detection and Recognition for Video Surveillance,” Proceedings of IEEE International Conference on Multimedia Computing and Systems, 2004.
3. E. Chang and Y. F. Wang, “Toward Building a Robust and Intelligent Video Surveillance System: A Case Study,” (invited paper) Proceedings of the IEEE Multimedia and Expo Conference, Taipei, Taiwan, 2004.
4. R. Rangaswami, Z. Dimitrijevic, K. Kakligian, E. Chang, and Y. F. Wang, “The SfinX Video Surveillance System,” Proceedings of the IEEE Multimedia and Expo Conference, Taipei, Taiwan, 2004.
5. G. Wu, Y. Wu, L. Jiao, Y. F. Wang, and E. Y. Chang, “Multi-camera Spatio-temporal Fusion and Biased Sequence-data Learning for Security Surveillance,” Proceedings of ACM Multimedia Conference, Berkeley, CA, 2003.
6. K. Wu, J. Long, D. Han, and Y. F. Wang, “Real-Time Multi-person Tracking in Video Surveillance,” Proceedings of the Pacific Rim Multimedia Conference, Singapore, 2003.
7. Y. Wu, L. Jiao, G. Wu, E. Chang, and Y. F. Wang, “Invariant Feature Extraction and Biased Statistical Inference for Video Surveillance,” Proceedings of the IEEE International Conference on Advanced Video and Signal-based Surveillance, Miami, FL, 2003.
![Page 23: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/23.jpg)
Focus of This Seminar
Video-based face tracking, modeling and recognition
Human activity and interaction analysis
![Page 24: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/24.jpg)
Video-Based Face Tracking & Recognition
- Image-based
  - Image normalization
  - Feature selection
  - Face recognition
- Video-based
  - Face region detection
  - Tracking
  - Face modeling and recognition
![Page 25: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/25.jpg)
Difficulties
- Quality of video is low
  - Large illumination and pose variation, occlusion
- Face images are small
  - Compared to still-image-based systems
- Model construction and fitting
  - Generic vs. person-specific
  - 2D vs. 3D
![Page 26: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/26.jpg)
Proposed Approach: Resolution Enhancement
- Exploit multiple image frames and spatial coherency
  - Single-camera super-resolution (digital zoom)
  - Multi-camera (master-slave) face region detection and zooming (optical zoom)
  - Need feature appearance (PCA + LDA) and geometrical relations
![Page 27: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/27.jpg)
General Framework: Visual Servoing
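- A feedback control mechanism
- Reference and real signals are computed from images

[Figure: feedback loop. Feature detection on each new image yields the real signal; the error signal (reference signal minus real signal) passes through $J^{-1}$ to produce the control signal for camera control; an external disturbance enters before the new image is captured.]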
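The loop on this slide can be sketched numerically. The example assumes a fixed, known 2x2 Jacobian `J` relating camera control (for instance pan and tilt) to image feature motion, a proportional gain, and a simulated camera that responds exactly according to `J`. None of these specifics come from the slides, and `servo`, `inv2`, and `matvec` are hypothetical names.

```python
def inv2(J):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = J
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def matvec(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def servo(reference, real, J, gain=0.5, steps=30):
    """Drive the measured feature position toward the reference position."""
    J_inv = inv2(J)
    for _ in range(steps):
        error = (reference[0] - real[0], reference[1] - real[1])
        control = tuple(gain * u for u in matvec(J_inv, error))
        move = matvec(J, control)          # simulated camera response
        real = (real[0] + move[0], real[1] + move[1])
    return real

# Feature starts far from the image center and is driven to (160, 120).
final = servo(reference=(160.0, 120.0), real=(60.0, 70.0),
              J=((2.0, 0.5), (-0.3, 1.5)))
```

With an exact Jacobian the error shrinks by the factor `1 - gain` at each step. In practice the Jacobian is only approximate and the external disturbance keeps the loop active.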
![Page 28: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/28.jpg)
Master-Slave Combo Setup
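[Figure: master and slave camera frames, each with axes (X, Y, Z), and the world frame. The master is related to the world by $T^{master}_{world}$ and the slave by $T^{slave}_{world}$, which takes four arguments including $f$ and $z$ (the remaining symbols are not recoverable from the transcript); $\mathbf{p}_{slave}$ is the image point in the slave camera.]

$$\mathbf{p}_{master} = T^{master}_{world}\,\mathbf{P}_{world}$$

$$\mathbf{p}_{slave} = T^{slave}_{world}\,\mathbf{P}_{world} = T^{slave}_{world}\,\bigl(T^{master}_{world}\bigr)^{-1}\,\mathbf{p}_{master}$$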
![Page 29: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/29.jpg)
Master: Anatomy-Guided Face Modeling
- Face region localization based on anatomy
- Face region detection based on skin color segmentation
- Face region modeling based on ellipse fitting
- Face region tracking using a mean-shift tracker

[Figure: master camera frame and world frame, each with axes (X, Y, Z), related by $T^{master}_{world}$.]
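The mean-shift tracking step can be sketched on a small weight map. The example assumes a flat kernel over a square window and a per-pixel weight that stands for skin-color likelihood; the slides do not describe the actual tracker configuration, and `mean_shift` is a hypothetical name.

```python
def mean_shift(weights, center, radius, max_iter=20):
    """Move a square window to the weighted centroid until it stops moving.

    weights: 2-D list of non-negative values (e.g. skin-color likelihood).
    center:  (row, col) starting position of the window.
    """
    r, c = center
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        total = sr = sc = 0.0
        for i in range(max(0, int(r - radius)), min(rows, int(r + radius) + 1)):
            for j in range(max(0, int(c - radius)), min(cols, int(c + radius) + 1)):
                w = weights[i][j]
                total += w
                sr += w * i
                sc += w * j
        if total == 0:
            break                      # nothing to track inside the window
        nr, nc = sr / total, sc / total
        if abs(nr - r) < 1e-3 and abs(nc - c) < 1e-3:
            break                      # converged
        r, c = nr, nc
    return r, c

# 10x10 map with a 2x2 blob of high weight at rows 6-7, columns 6-7.
weights = [[0.0] * 10 for _ in range(10)]
for i in (6, 7):
    for j in (6, 7):
        weights[i][j] = 1.0
center = mean_shift(weights, center=(4, 4), radius=3)
```

Starting from (4, 4), the window moves onto the blob and settles at its centroid, (6.5, 6.5).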
![Page 30: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/30.jpg)
Slave: Master-Guided Zooming
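[Figure: master, slave, and world frames, each with axes (X, Y, Z). The master-to-world relation $T^{master}_{world}$ and the slave transform $T^{slave}_{world}$ (a function of four parameters including $f$ and $z$) determine the slave image point $\mathbf{p}_{slave}$ used to guide zooming.]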
![Page 31: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/31.jpg)
What’s Next?
- View-based recognition
- Frontal-view detection
- Multi-frame evidence aggregation
- 3D model (?)
![Page 32: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/32.jpg)
Single-Camera Super-Resolution
- Treat multiple, spatially coherent frames as down-sampled, low-resolution (LR) images of an original high-resolution (HR) image
![Page 33: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/33.jpg)
Mathematically
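$$I^{(k)} = D^{(k)} B^{(k)} T^{(k)} I + n^{(k)}$$

Here $I^{(k)}$ is the $k$-th LR frame, an $m \times n$ array with entries $I^{(k)}_{11}, \ldots, I^{(k)}_{mn}$; $I$ is the HR image, a $cm \times cn$ array; and $n^{(k)}$ is an additive noise term.

- Three components:
  - Spatial registration function (T)
  - Blurring function (B)
  - Down-sampling function (D)
- c: down-sampling factor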
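The three components can be illustrated on a 1-D signal. The sketch below assumes an integer translation for T, a small normalized kernel for B, plain decimation for D, and no noise term; these are simplifications chosen for illustration, and `shift`, `blur`, `downsample`, and `observe` are hypothetical names.

```python
def shift(signal, k):
    """T: integer translation by k samples, replicating the edge values."""
    n = len(signal)
    return [signal[min(max(i - k, 0), n - 1)] for i in range(n)]

def blur(signal, kernel):
    """B: filtering with a normalized, symmetric, odd-length kernel."""
    n, h = len(signal), len(kernel) // 2
    return [sum(w * signal[min(max(i + j - h, 0), n - 1)]
                for j, w in enumerate(kernel))
            for i in range(n)]

def downsample(signal, c):
    """D: keep every c-th sample."""
    return signal[::c]

def observe(hr, k, kernel, c):
    """One low-resolution observation of the high-resolution signal."""
    return downsample(blur(shift(hr, k), kernel), c)

hr = [0, 0, 4, 4, 0, 0, 0, 0]
lr = observe(hr, k=1, kernel=[0.25, 0.5, 0.25], c=2)
```

Each LR frame with a different shift `k` samples the HR signal at different positions, which is what makes it possible in principle to recover detail lost in any single frame.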
![Page 34: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/34.jpg)
Spatial Registration Function
- Modeled as an affine transform
  - Captures translation, rotation, and zooming
- In reality, only translational motion has been successfully demonstrated

$$T_k = \begin{pmatrix} a_x & b_x & c_x \\ a_y & b_y & c_y \end{pmatrix}$$
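A 2x3 affine matrix of this form can be applied to a pixel coordinate as sketched below. The sketch assumes that the `a` and `b` columns form the linear part and the `c` column is the translation; the slide does not state the convention, and `apply_affine` and `similarity` are hypothetical names.

```python
import math

def apply_affine(T, point):
    """Apply a 2x3 affine matrix ((ax, bx, cx), (ay, by, cy)) to (x, y)."""
    (ax, bx, cx), (ay, by, cy) = T
    x, y = point
    return (ax * x + bx * y + cx, ay * x + by * y + cy)

def similarity(scale, angle, tx, ty):
    """Affine matrix for zoom by `scale`, rotation by `angle`, then translation."""
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    return ((c, -s, tx), (s, c, ty))

# Pure translation, the case the slide reports as working in practice.
moved = apply_affine(((1, 0, 2), (0, 1, -1)), (3, 4))
# Zoom by 2 combined with a 90-degree rotation.
zoomed = apply_affine(similarity(2.0, math.pi / 2, 0.0, 0.0), (1.0, 0.0))
```

Pure translation leaves the linear part as the identity, so only `cx` and `cy` need to be estimated in that case.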
![Page 35: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/35.jpg)
Blurring Function
- Modeled as a Gaussian kernel
- Caveats:
  - The point spread (blurring) function may not be known and is wavelength dependent
  - The diffraction effect induces ripples and is better modeled with Bessel functions
![Page 36: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/36.jpg)
Numerical Solution
- Large system of equations
  - Requires preconditioning
- Not sure that it will work in the real world
  - Simpler mechanisms (e.g., bilinear interpolation) exist, with inferior performance
- Optical zoom instead of digital zoom
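For reference, the simpler bilinear interpolation mentioned here can be sketched in a few lines. The example assumes a grayscale image stored as a list of rows with at least two rows and two columns, and an integer up-sampling factor `c`; `bilinear` and `upsample` are hypothetical names.

```python
def bilinear(img, y, x):
    """Interpolate img at fractional position (y, x); img must be at least 2x2."""
    h, w = len(img), len(img[0])
    y0, x0 = min(int(y), h - 2), min(int(x), w - 2)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x0 + 1]
    bottom = (1 - dx) * img[y0 + 1][x0] + dx * img[y0 + 1][x0 + 1]
    return (1 - dy) * top + dy * bottom

def upsample(img, c):
    """Up-sample by an integer factor c, keeping the original pixels in place."""
    h, w = len(img), len(img[0])
    return [[bilinear(img, i / c, j / c) for j in range((w - 1) * c + 1)]
            for i in range((h - 1) * c + 1)]

big = upsample([[0, 2], [4, 6]], 2)
```

Interpolation of this kind only smooths between existing pixels. It uses a single frame and cannot add detail, which is consistent with the inferior performance noted on the slide.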
![Page 37: Distributed Video Data Fusion, Analysis, and Mining for Video Surveillance Applications* Edward Chang 2 and Yuan-Fang Wang 1 Department of Electrical and](https://reader035.vdocuments.us/reader035/viewer/2022070400/56649f135503460f94c26829/html5/thumbnails/37.jpg)
Schedule
- 9/29: Overview
- 10/6: Dan: face recognition overview
- 10/13: No meeting (research travel)
- 10/20: Dr. Kang
- 10/27:
- 11/3:
- 11/10:
- 11/17:
- 11/24:

- Video-based face modeling and recognition
- Super resolution
  - Multiple images
  - Space-time
- Human activity/interaction analysis