Rogerio Feris, April 17, 2014 EECS 6890 – Topics in Information Processing
Spring 2014, Columbia University http://rogerioferis.com/VisualRecognitionAndSearch2014
Class 11: Smart Surveillance
Visual Recognition And Search Columbia University, Spring 2014
Deadlines
Final Project Presentation
Project Paper
What we have seen so far
Part I: From Low-level to Semantic Visual Representations
• Low-Level Features
• Feature Coding and Pooling
• Encoding Structure: Part-based Models
• Attributes and Semantic Features
Part II: Tools for Large-Scale Image Classification and Retrieval
• Deep Learning
• Similarity-based Image Search
• Learning-based Hashing for Large-Scale Image Search
• Large-scale Active Learning
Case Studies
Part III: Case Studies
• IBM Smart Surveillance System [Today] (second half of the class: project update II)
• IBM Multimedia Analysis and Retrieval [Next Class]
Video Analytics for Smart Surveillance
Video Capture / Encoding & Management: DVR records & streams video
Real-time alerts
• Perimeter violation
• Tailgating attempt
• Red car on service road
User-driven queries
• Find red cars
• Find tailgating incidents involving this person
Sensors & Transactions
Analytics & Framework
Watches the video for alerts & events
• Analytics modules:
- Object tracking and classification
- Face capture and recognition
- License Plate Recognition
- Many others
• Gathers event meta-data & makes it searchable
• Provides plug and play framework for analytics
IBM Smart Vision Suite (SVS)
Mode of Operation: Real-time Alerts
Examples of user-configurable real-time alerts:
• Tripwire: triggers on the cat crossing the blue line
• Directional Motion: triggers on right turns, when the cars move in the direction of the arrow
• Removed Object: triggers when the object outlined in blue is removed from its position
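A tripwire alert can be reduced to a segment-intersection test between the user-drawn line and each step of a tracked object's trajectory. The sketch below is only an illustration of that idea, not the SVS implementation; the function names and the representation of a track as a list of (x, y) centroids are assumptions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Signed area test: > 0 if c lies to the left of a->b, < 0 if to the right."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def crosses_tripwire(prev: Point, curr: Point, wire_a: Point, wire_b: Point) -> bool:
    """True if the move prev->curr strictly crosses the segment wire_a-wire_b."""
    d1 = _orient(wire_a, wire_b, prev)
    d2 = _orient(wire_a, wire_b, curr)
    d3 = _orient(prev, curr, wire_a)
    d4 = _orient(prev, curr, wire_b)
    # The two endpoints of each segment must lie on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


def first_crossing(track: List[Point], wire_a: Point, wire_b: Point) -> Optional[int]:
    """Index of the first track point reached by a crossing step, or None."""
    for i in range(1, len(track)):
        if crosses_tripwire(track[i - 1], track[i], wire_a, wire_b):
            return i
    return None


# Hypothetical track moving left to right across a vertical wire at x = 1.5.
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
alert_index = first_crossing(track, (1.5, -1.0), (1.5, 1.0))
```

The sign of `d1` before the crossing also tells which side the object came from, which is one way a directional tripwire could be built on the same test.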
Mode of Operation: Search After the Fact
“Show me all large vehicles with yellow color that crossed this road in the past 5 days” [Finds DHL delivery trucks]
“Show me all events with duration greater than 30 seconds” [Finds people loitering]
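Once event meta-data has been indexed, the two example queries above amount to filters over stored records. A minimal sketch follows, assuming a hypothetical `Event` record; the field names and values are illustrative and do not come from the actual system schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    object_class: str   # e.g. "vehicle" or "person" (assumed labels)
    size: str           # e.g. "small" or "large"
    color: str
    duration_s: float   # how long the object was tracked
    age_days: float     # how long ago the event happened


def large_yellow_vehicles(events: List[Event], max_age_days: float = 5.0) -> List[Event]:
    """Large yellow vehicles seen within the last `max_age_days` days."""
    return [e for e in events
            if e.object_class == "vehicle" and e.size == "large"
            and e.color == "yellow" and e.age_days <= max_age_days]


def long_events(events: List[Event], min_duration_s: float = 30.0) -> List[Event]:
    """Events lasting longer than `min_duration_s` seconds (e.g. loitering)."""
    return [e for e in events if e.duration_s > min_duration_s]


# Hypothetical index contents.
index = [
    Event("vehicle", "large", "yellow", 8.0, 2.0),
    Event("vehicle", "small", "red", 5.0, 1.0),
    Event("person", "small", "black", 95.0, 0.5),
    Event("vehicle", "large", "yellow", 7.0, 9.0),
]
trucks = large_yellow_vehicles(index)
loiterers = long_events(index)
```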
Traditional Pipeline: Blob-Based Analytics
Background Subtraction → Blob Tracking → High-Level Processing
Background Subtraction: Moving Object Detection (Blobs)
Most existing smart surveillance systems on the market rely on blob-based video analysis. They are efficient and work well in low-activity scenarios.
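The first two stages of a blob-based pipeline can be sketched in a few lines. This is a simplified illustration under stated assumptions: the background is a single fixed grayscale image (a stand-in for a learned background model), the threshold value is arbitrary, and blobs are 4-connected components of the foreground mask.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def foreground_mask(frame: Image, background: Image, threshold: int = 30) -> List[List[bool]]:
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [[abs(f - b) > threshold for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def find_blobs(mask: List[List[bool]]) -> List[Box]:
    """Bounding boxes of 4-connected foreground components, in row-major scan order."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one component and track its extent.
            stack = [(r, c)]
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Note that two objects whose foreground pixels touch end up in the same component, which is why this style of analysis degrades in crowded scenes.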
Background Modeling: Challenges
Pixel-wise noise
Example: Latecki et al
Background Modeling: Challenges
Lighting changes
• Gradual
• Sudden
Shadows and reflections
Camouflage / low-contrast
Crowded Scenes
Removed objects
[Figure panels: Shadow, Reflection, Crowded Scene]
Background Modeling: Challenges
Scatter plots of red and green values of a single pixel over time
Multimodal Backgrounds (swaying trees, water, flickering, …)
Background Modeling
Gaussian Mixture Model (GMM) for each pixel location
Stauffer and Grimson, “Adaptive Background Mixture Models for Real-time Tracking,” CVPR 1999
Background Modeling
Given a new video frame, the task is to classify each pixel as foreground or background
Each pixel has an associated GMM with K Gaussians (usually K ranges from 3 to 5)
Consider a single pixel as an example, with the associated model containing 5 Gaussians
Let’s assume for now that three Gaussians correspond to the background and the other two correspond to the foreground
Background Modeling
Matching Process: is the pixel foreground or background?
Check which Gaussian best represents the pixel value:
• If it is a “background” Gaussian, classify the pixel as “background” and update the Gaussian parameters.
• If it is a “foreground” Gaussian, classify the pixel as “foreground” and update the Gaussian parameters.
• If no Gaussian matches, classify the pixel as “foreground” and replace the least probable distribution with a new Gaussian (centered at the pixel value, with low weight and high variance)
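The matching step can be sketched for a single grayscale pixel as follows. The 2.5-standard-deviation match test follows Stauffer and Grimson; everything else (the `Gaussian` record, the initial weight and variance of a replacement component, choosing the closest match, and using the lowest weight as "least probable") is an assumption made for illustration. Parameter updates and weight renormalization are left out of this sketch.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gaussian:
    weight: float
    mean: float
    var: float
    is_background: bool


def classify_pixel(value: float, model: List[Gaussian], match_sigmas: float = 2.5,
                   init_weight: float = 0.05, init_var: float = 900.0
                   ) -> Tuple[str, Optional[Gaussian]]:
    """Return the label and the matched Gaussian (None if no component matched)."""
    best, best_dist = None, match_sigmas
    for g in model:
        dist = abs(value - g.mean) / math.sqrt(g.var)  # distance in standard deviations
        if dist <= best_dist:
            best, best_dist = g, dist
    if best is not None:
        return ("background" if best.is_background else "foreground"), best
    # No match: replace the lowest-weight component with a wide, low-weight Gaussian.
    weakest = min(range(len(model)), key=lambda i: model[i].weight)
    model[weakest] = Gaussian(init_weight, value, init_var, is_background=False)
    return "foreground", None


# Hypothetical three-component model for one pixel.
model = [Gaussian(0.6, 100.0, 25.0, True),
         Gaussian(0.3, 180.0, 100.0, True),
         Gaussian(0.1, 30.0, 400.0, False)]
label_a, _ = classify_pixel(103.0, model)   # close to the first background mode
label_b, _ = classify_pixel(35.0, model)    # close to the foreground mode
label_c, match_c = classify_pixel(250.0, model)  # matches nothing
```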
Background Modeling
Adaptive GMMs
Important to handle gradual lighting changes
Once a pixel value “matches” a Gaussian, the corresponding Gaussian parameters are updated:
• Prior weight
• Mean
• Variance
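The update rules were lost from this transcript. The sketch below uses the form given in the cited Stauffer and Grimson paper for a scalar pixel value: every weight decays by a learning rate alpha, the matched component's weight is increased, and the matched mean and variance move toward the new value with rate rho = alpha times the Gaussian density of the value. The function names, list-based layout, and default alpha are assumptions.

```python
import math
from typing import List


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def update_model(weights: List[float], means: List[float], variances: List[float],
                 matched: int, x: float, alpha: float = 0.05) -> None:
    """In-place update after pixel value x matched component `matched`."""
    # Prior weights: w_k <- (1 - alpha) * w_k + alpha * M_k, with M_k = 1 only for the match.
    for k in range(len(weights)):
        indicator = 1.0 if k == matched else 0.0
        weights[k] = (1.0 - alpha) * weights[k] + alpha * indicator
    # Second learning rate, scaled by how well x fits the matched Gaussian.
    rho = alpha * gaussian_pdf(x, means[matched], variances[matched])
    # Mean, then variance (the variance uses the already-updated mean).
    means[matched] = (1.0 - rho) * means[matched] + rho * x
    variances[matched] = (1.0 - rho) * variances[matched] + rho * (x - means[matched]) ** 2


# Hypothetical model state for one pixel.
weights = [0.6, 0.3, 0.1]
means = [100.0, 180.0, 30.0]
variances = [25.0, 100.0, 400.0]
update_model(weights, means, variances, matched=0, x=104.0)
```

Because the unmatched components only have their weights decayed, a mode that stops being observed gradually loses support, which is what lets the model follow gradual lighting changes.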
Background Modeling
Background and Foreground Gaussians
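The Gaussians are ordered by the ratio of weight to standard deviation, so that distributions with high support and low variance come first
Then the first B distributions are labeled as “Background”, where B is the smallest number of distributions whose cumulative weight exceeds a threshold T (Stauffer and Grimson, 1999)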
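In the cited Stauffer and Grimson paper, the components are sorted by weight divided by standard deviation, and B is the smallest prefix of that ordering whose cumulative weight exceeds a threshold T. A minimal sketch of that selection is shown below; the function name and the default value of T are assumptions.

```python
import math
from typing import List


def background_components(weights: List[float], variances: List[float],
                          T: float = 0.7) -> List[int]:
    """Indices of the components labeled as background."""
    # Rank by w / sigma: high support and low variance first.
    order = sorted(range(len(weights)),
                   key=lambda k: weights[k] / math.sqrt(variances[k]),
                   reverse=True)
    selected, cumulative = [], 0.0
    for k in order:
        selected.append(k)
        cumulative += weights[k]
        if cumulative > T:  # smallest prefix whose total weight exceeds T
            break
    return selected


# Hypothetical four-component model.
bg = background_components([0.5, 0.3, 0.15, 0.05], [25.0, 16.0, 400.0, 900.0])
```

A small T keeps only the dominant mode as background, while a larger T admits several modes and so accommodates multimodal backgrounds such as swaying trees.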
Blob-Based Analytics: Limitations
Dealing with Crowded Scenes
Objects close to each other are clustered into a single blob
Environmental Conditions
Quick lighting changes, reflections, and shadows cause spurious blobs
[Figure panels: Original Video, Background Subtraction, Tracking]
Object-Centric Video Analytics
Vehicle Detection in Crowded Scenes
Object-Centric Video Analytics
Pedestrian Detection and Tracking in Crowded Scenes
Object-Centric Video Analytics
Limitations / Challenges
Detector accuracy: dealing with appearance variations
Different object poses, lighting changes, etc.
Detector efficiency / cost
State-of-the-art approaches usually run at low frame rates
How many object classes are needed?
Large-Scale Detector Learning
[Feris et al, Large-Scale Vehicle Detection, Indexing, and Search in Urban Surveillance Videos, IEEE Transactions on Multimedia, 2012]
Semi-Automatic Training Data Collection
User-defined Region of Interest (ROI)
Prior information about motion direction and size of cars in the region
A classifier based on motion direction and blob shape (from background modeling, with no appearance cues) is applied, and only high-confidence samples are selected
[Figure panels: Original Video, Captured Samples]
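The sample-collection rule can be approximated by a simple gate on each foreground blob: keep it only if its motion direction and size agree with the priors set for the region of interest. This is a rough sketch, not the authors' classifier; the `Blob` fields, tolerance, and size limits are all assumed for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Blob:
    width: float
    height: float
    direction_deg: float  # motion direction of the tracked blob


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_samples(blobs: List[Blob], expected_dir_deg: float, dir_tol_deg: float,
                   min_size: Tuple[float, float], max_size: Tuple[float, float]) -> List[Blob]:
    """Keep blobs whose direction and (width, height) agree with the ROI priors."""
    kept = []
    for b in blobs:
        ok_dir = angle_diff(b.direction_deg, expected_dir_deg) <= dir_tol_deg
        ok_size = (min_size[0] <= b.width <= max_size[0]
                   and min_size[1] <= b.height <= max_size[1])
        if ok_dir and ok_size:
            kept.append(b)
    return kept


# Hypothetical blobs from one ROI where cars are expected to move at about 90 degrees.
blobs = [Blob(60, 40, 92),     # plausible single car
         Blob(200, 120, 88),   # too large: probably several merged vehicles
         Blob(55, 38, 270)]    # moving the wrong way
samples = select_samples(blobs, 90.0, 15.0, (40, 25), (90, 60))
```

Strict gates discard many valid cars, but that is acceptable here because hours of video supply far more candidates than are needed.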
Semi-Automatic Training Data Collection
[Training data: ~5 hours of video]
Synthetic Occlusion Generator
Huge Vehicle Dataset
Nearly one million images (50+ cameras)! The largest public dataset to date has ~5000 images
Automatic Dataset Semantic Partitioning
Large variations in pose cause drastic appearance variations that make learning difficult
Clustering based on motion direction (related to vehicle pose) → motionlet clusters
Multiple detectors are learned (for each motionlet cluster) rather than a single monolithic detector
Clustering Based on Motionlets
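One simple way to picture the partitioning is to group training samples by quantized motion direction, so that each group holds vehicles in roughly the same pose. The sketch below uses fixed angular bins as a stand-in for the clustering used in the paper; the bin count and function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def motionlet_bin(direction_deg: float, n_bins: int = 12) -> int:
    """Index of the angular bin centered nearest to the direction (bin 0 is centered at 0 degrees)."""
    width = 360.0 / n_bins
    return int(((direction_deg % 360.0) + width / 2.0) // width) % n_bins


def partition(samples: List[Tuple[str, float]], n_bins: int = 12) -> Dict[int, List[str]]:
    """Group (sample_id, direction_deg) pairs by direction bin."""
    clusters = defaultdict(list)
    for sample_id, direction in samples:
        clusters[motionlet_bin(direction, n_bins)].append(sample_id)
    return dict(clusters)


# Hypothetical samples: two cars heading roughly at 0 degrees, one at roughly 90 degrees.
clusters = partition([("a", 2.0), ("b", 358.0), ("c", 91.0)])
```

A separate detector would then be trained on the samples of each cluster.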
Core Detector Model
Cascade of AdaBoost Classifiers with Haar-like Features
A feature pool containing a huge set (on the order of millions) of feature configurations is generated over multiple feature planes
Similar to integral channel features (Dollár et al.), but instead of randomization, we use massively parallel feature selection to select a compact set of discriminative features through AdaBoost learning
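The two building blocks named above can be sketched compactly: a Haar-like feature computed in constant time from an integral image, and a cascade that rejects a window as soon as one stage's boosted score falls below its threshold. This is a generic Viola-Jones-style sketch on a single grayscale plane; it does not reproduce the multi-plane feature pool or the parallel feature selection described above, and all names and values are assumptions.

```python
from typing import Callable, List, Tuple

Integral = List[List[int]]
# A weak classifier: (feature function, threshold, polarity, vote weight alpha).
Weak = Tuple[Callable[[Integral], float], float, int, float]


def integral_image(img: List[List[int]]) -> Integral:
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Integral, top: int, left: int, height: int, width: int) -> int:
    """Sum of pixels in a rectangle using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_left_minus_right(ii: Integral, top: int, left: int, height: int, width: int) -> int:
    """Two-rectangle Haar-like feature: left half minus right half (width must be even)."""
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)


def cascade_accepts(ii: Integral, stages: List[Tuple[List[Weak], float]]) -> bool:
    """Evaluate stages in order; reject at the first stage whose score is too low."""
    for weak_classifiers, stage_threshold in stages:
        score = 0.0
        for feature, threshold, polarity, alpha in weak_classifiers:
            if polarity * feature(ii) < polarity * threshold:  # weak classifier fires
                score += alpha
        if score < stage_threshold:
            return False  # early rejection keeps the average cost per window low
    return True


# Hypothetical one-stage cascade that looks for a bright-left / dark-right pattern.
stages = [([(lambda ii: haar_left_minus_right(ii, 0, 0, 2, 4), 20.0, -1, 1.0)], 0.5)]
edge = integral_image([[10, 10, 1, 1], [10, 10, 1, 1]])
flat = integral_image([[5, 5, 5, 5], [5, 5, 5, 5]])
```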
Deep Cascade Detectors
Significant accuracy improvement by training deep cascades with a huge number of bootstrapped negative samples [200,000 negative samples]
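Bootstrapping negatives means repeatedly running the current detector over a pool of known negatives, collecting the ones it wrongly accepts, and retraining with them added. The loop below is a toy sketch of that idea with caller-supplied `train` and `score` functions; it retrains a single model rather than adding cascade stages, and the one-dimensional data is purely hypothetical.

```python
from typing import Callable, List, Tuple


def bootstrap_train(train: Callable[[List[float], List[float]], float],
                    score: Callable[[float, float], float],
                    positives: List[float], negative_pool: List[float],
                    n_initial: int, per_round: int, max_rounds: int
                    ) -> Tuple[float, List[float]]:
    """Retrain with the highest-scoring false positives until none remain."""
    negatives = list(negative_pool[:n_initial])
    remaining = list(negative_pool[n_initial:])
    model = train(positives, negatives)
    for _ in range(max_rounds):
        # Negatives the current model accepts, hardest (highest score) first.
        false_pos = sorted((x for x in remaining if score(model, x) > 0),
                           key=lambda x: score(model, x), reverse=True)[:per_round]
        if not false_pos:
            break
        for x in false_pos:
            remaining.remove(x)
        negatives.extend(false_pos)
        model = train(positives, negatives)
    return model, negatives


# Toy 1-D "detector": the model is a decision threshold, samples above it are accepted.
def train_threshold(pos: List[float], neg: List[float]) -> float:
    return (min(pos) + max(neg)) / 2.0


def score_threshold(model: float, x: float) -> float:
    return x - model


model, used_negatives = bootstrap_train(
    train_threshold, score_threshold,
    positives=[5.0, 6.0, 7.0],
    negative_pool=[0.0, 1.0, 4.0, 2.0, 3.5, 0.5],
    n_initial=2, per_round=2, max_rounds=5)
```

In the toy run the threshold tightens from 3.0 to 4.5 after the two hard negatives are added, and the loop stops once the remaining pool produces no false positives.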
Large-Scale Multi-Pose Vehicle Detection
100+ frames per second!
Other Visual Analytics Modules
Abandoned Object Detection
[Q. Fan et al, Relative Attributes For Large-scale Abandoned Object Detection, ICCV 2013]
[Y. L. Tian, R. S. Feris, and A. Hampapur. Real-time detection of abandoned and removed objects in complex environments, VS 2008]
Main Issues
Approach
Attribute-based People Search
[Feris et al, Indexing and searching according to attributes of a person, US Patent 20100106707, 2008]
[Feris et al, Attribute-based People Search: Lessons Learnt from a Practical Surveillance System, ICMR 2014]
[B. Siddiquie, R. S. Feris, and L. Davis, Image Ranking and Retrieval Based on Multi-Attribute Queries, CVPR 2011 (Oral)]
[D. Vaquero, R. S. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk, Attribute-based people search in surveillance environments, WACV 2009]
Query Example: “Show me all people entering IBM last month with a beard, dark skin, and sunglasses, wearing a red jacket and blue pants”
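A multi-attribute query like this one can be answered by ranking indexed people according to how confidently each requested attribute was detected. The sketch below multiplies per-attribute confidences as if they were independent, which is a simplifying assumption (the cited multi-attribute ranking work models correlations between attributes); the attribute names and scores are made up.

```python
from typing import Dict, List


def rank_by_attributes(people: Dict[str, Dict[str, float]], query: List[str],
                       top_k: int = 3) -> List[str]:
    """Return the ids of the top_k people ranked by the product of queried attribute confidences."""
    def score(confidences: Dict[str, float]) -> float:
        s = 1.0
        for attribute in query:
            s *= confidences.get(attribute, 0.0)  # a missing attribute counts as not detected
        return s

    ranked = sorted(people.items(), key=lambda item: score(item[1]), reverse=True)
    return [person_id for person_id, _ in ranked[:top_k]]


# Hypothetical attribute-detector outputs for three tracked people.
people = {
    "p1": {"beard": 0.9, "sunglasses": 0.8, "red_jacket": 0.7},
    "p2": {"beard": 0.2, "sunglasses": 0.9, "red_jacket": 0.9},
    "p3": {"beard": 0.95, "sunglasses": 0.1},
}
results = rank_by_attributes(people, ["beard", "sunglasses", "red_jacket"], top_k=2)
```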
Attribute-based Vehicle Search
[Feris et al, Attribute-based Vehicle Search in Crowded Surveillance Environments, ICMR 2011]
Surveillance Event Detection (SED)
[Qiang Chen et al, CMU-IBM-NUS@TRECVID 2012: Surveillance Event Detection, 2012]
We ranked 1st in 4 out of 7 surveillance event detection tasks
Sweethearting Detection
[Fan et al, Recognition of Repetitive Sequential Human Activity, CVPR 2009]
Resources
AMOS: the archive of many outdoor scenes (http://amos.cse.wustl.edu/)
Many public traffic cameras available online! For example, you can check: http://www.chart.state.md.us/travinfo/trafficcams.php#
Project Update II
1) Flower Recognition (Wenqian Liu & Shun-Xuan Wang)
2) Axon Segmentation (Mo Zhou & John Bowler)
3) Safer Driving Through Gesture Control (Kartik Darapuneni, Jianze Wang, Shuheng Gong)
4) Recognition of Animal Skin Texture Attributes in the Wild (Amey Dharwadker & Kai Zhang)
5) Identifying Animals in the Wild (Chia Kang Chao & Yen-Cheng Chou)
6) From ImageNet to Serengeti: Recognizing Animals in Wild Scenes (Guangnan Ye & Maja Rudolph)