class 11: smart surveillance - rogerio...

Rogerio Feris, April 17, 2014

EECS 6890 – Topics in Information Processing, Spring 2014, Columbia University

http://rogerioferis.com/VisualRecognitionAndSearch2014

Class 11: Smart Surveillance

Visual Recognition And Search Columbia University, Spring 2014

Deadlines


Final Project Presentation


Project Paper


What we have seen so far

Part I: From Low-level to Semantic Visual Representations

• Low-Level Features

• Feature Coding and Pooling

• Encoding Structure: Part-based Models

• Attributes and Semantic Features

Part II: Tools for Large-Scale Image Classification and Retrieval

• Deep Learning

• Similarity-based Image Search

• Learning-based Hashing for Large-Scale Image Search

• Large-scale Active Learning


Case Studies

Part III: Case Studies

• IBM Smart Surveillance System [Today] (second half of the class: project update II)

• IBM Multimedia Analysis and Retrieval [Next Class]

Video Analytics for Smart Surveillance

Video Capture / Encoding & Management

DVR: records & streams video

Real-time alerts

• Perimeter violation

• Tailgating attempt

• Red car on service road

User-driven queries

• Find red cars

• Find tailgating incidents involving this person

Sensors & Transactions

Analytics & Framework

Watches the video for alerts & events

• Analytics modules:

- Object tracking and classification

- Face capture and recognition

- License Plate Recognition

- Many others

• Gathers event meta-data & makes it searchable

• Provides plug and play framework for analytics

IBM Smart Vision Suite (SVS)


Mode of Operation: Real-time Alerts

Examples of user-configurable real-time alerts

• Tripwire: triggers on the cat crossing the blue line

• Directional Motion: triggers on right turns, when the cars move in the direction of the arrow

• Removed Object: triggers when the object outlined in blue is removed from its position
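A tripwire alert of this kind can be sketched as a segment-intersection test between the tripwire and the step a tracked object takes between two frames. This is only an illustrative sketch, not the SVS implementation: the function names, the use of object centroids, and the example coordinates are all assumptions.

```python
def side(p, a, b):
    """Sign of the cross product: which side of the line a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crosses_tripwire(prev_pos, curr_pos, wire_a, wire_b):
    """True if the move prev_pos -> curr_pos strictly crosses the segment wire_a-wire_b."""
    d1 = side(prev_pos, wire_a, wire_b)
    d2 = side(curr_pos, wire_a, wire_b)
    d3 = side(wire_a, prev_pos, curr_pos)
    d4 = side(wire_b, prev_pos, curr_pos)
    # The two segments intersect when each one separates the other's endpoints.
    return d1 * d2 < 0 and d3 * d4 < 0

def tripwire_alerts(track, wire_a, wire_b):
    """Indices of the track steps at which the tracked centroid crosses the wire."""
    return [i for i in range(1, len(track))
            if crosses_tripwire(track[i - 1], track[i], wire_a, wire_b)]

# Hypothetical centroid track (one point per frame) and a vertical tripwire at x = 3.
track = [(0, 0), (2, 1), (4, 2), (6, 3)]
wire = ((3, -5), (3, 5))
alerts = tripwire_alerts(track, *wire)
print(alerts)  # → [2]
```

The sign of `d1` before the crossing also tells on which side the object started, which is one way a directional variant of the alert could be built.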


Mode of Operation: Search After the Fact

“Show me all large vehicles with yellow color that crossed this road in the past 5 days” [Finds DHL delivery trucks]

“Show me all events with duration greater than 30 seconds” [Finds people loitering]
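Queries like the two above amount to filtering stored event metadata. The sketch below assumes a hypothetical `Event` record with made-up field names and values; the actual metadata schema of the system is not described in these slides.

```python
from dataclasses import dataclass

@dataclass
class Event:
    # Hypothetical metadata fields, for illustration only.
    object_type: str
    color: str
    size: str
    duration_s: float
    age_days: float

def find_events(events, predicate):
    """Return the events whose metadata satisfies the query predicate."""
    return [e for e in events if predicate(e)]

events = [
    Event("vehicle", "yellow", "large", 4.0, 1.5),
    Event("vehicle", "red", "small", 3.0, 0.5),
    Event("person", "n/a", "small", 45.0, 2.0),
    Event("vehicle", "yellow", "large", 5.0, 9.0),
]

# "All large vehicles with yellow color in the past 5 days"
large_yellow = find_events(
    events,
    lambda e: e.object_type == "vehicle" and e.size == "large"
    and e.color == "yellow" and e.age_days <= 5,
)

# "All events with duration greater than 30 seconds"
long_events = find_events(events, lambda e: e.duration_s > 30)

print(len(large_yellow), len(long_events))  # → 1 1
```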


Traditional Pipeline: Blob-Based Analytics

Background Subtraction → Blob Tracking → High-Level Processing

Background Subtraction: Moving Object Detection (Blobs)

Most existing smart surveillance systems in the market rely on blob-based video analysis. They are efficient and work well in low-activity scenarios.
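To make the notion of a blob concrete, the sketch below groups the foreground pixels of a binary mask into 4-connected regions and returns one bounding box per region. It is a minimal illustration under assumed conventions (a mask given as nested lists of 0/1, boxes as `(x_min, y_min, x_max, y_max)`), not the pipeline of any particular product.

```python
from collections import deque

def find_blobs(mask):
    """Return one bounding box (x_min, y_min, x_max, y_max) per 4-connected foreground region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

# Hypothetical foreground mask with two separate moving objects.
mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
blobs = find_blobs(mask)
print(blobs)  # → [(1, 0, 2, 1), (5, 1, 5, 2)]
```

If the two objects touched in the mask, the flood fill would return a single box covering both.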


Background Modeling: Challenges

• Pixel-wise noise (example: Latecki et al)

• Lighting changes: gradual and sudden

• Shadows and reflections

• Camouflage / low contrast

• Crowded scenes

• Removed objects

• Multimodal backgrounds (swaying trees, water, flickering, …)

Figure caption: scatter plots of red and green values of a single pixel over time

Figure panels: Shadow, Reflection, Crowded Scene


Background Modeling

Gaussian Mixture Model (GMM) for each pixel location

Stauffer and Grimson, “Adaptive Background Mixture Models for Real-time Tracking,” CVPR 1999


Background Modeling

Given a new video frame, the task is to classify each pixel as foreground or background

Each pixel has an associated GMM with K Gaussians (usually K ranges from 3 to 5)

Consider a single pixel as an example, with the associated model containing 5 Gaussians

Let’s assume for now 3 Gaussians correspond to the background and the two others correspond to the foreground


Background Modeling

Check which Gaussian best represents the pixel value:

• If “background” Gaussian, then classify the pixel as “background” and update Gaussian parameters.

• If “foreground” Gaussian, then classify the pixel as “foreground” and update Gaussian parameters.

• If none of them, classify the pixel as “foreground” and replace the least probable distribution with a new Gaussian (centered at the pixel value, with low weight and high variance)

Is the pixel foreground or background?

Matching Process
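The three cases of the matching process can be sketched for a single grayscale pixel as follows. Several details are assumptions that do not come from the slide: a value "matches" a Gaussian when it lies within 2.5 standard deviations of the mean (the criterion used in the cited Stauffer and Grimson paper), "least probable" is taken to mean lowest weight, and the initial weight and variance of a new Gaussian are arbitrary illustrative values. The parameter update of a matched Gaussian is left out of this sketch.

```python
from dataclasses import dataclass

@dataclass
class Gaussian:
    weight: float
    mean: float
    var: float
    is_background: bool

MATCH_SIGMAS = 2.5   # assumed match threshold, in standard deviations
INIT_WEIGHT = 0.05   # hypothetical low weight for a new component
INIT_VAR = 900.0     # hypothetical high variance for a new component

def classify_pixel(value, model):
    """Classify one pixel value against its mixture; replaces a component if nothing matches."""
    matches = [g for g in model if abs(value - g.mean) <= MATCH_SIGMAS * g.var ** 0.5]
    if matches:
        # Pick the matching Gaussian whose mean is closest in units of its own std.
        best = min(matches, key=lambda g: abs(value - g.mean) / g.var ** 0.5)
        return "background" if best.is_background else "foreground"
    # No match: replace the lowest-weight component with a new Gaussian at the pixel value.
    weakest = min(range(len(model)), key=lambda i: model[i].weight)
    model[weakest] = Gaussian(INIT_WEIGHT, float(value), INIT_VAR, False)
    total = sum(g.weight for g in model)
    for g in model:
        g.weight /= total  # keep the weights summing to 1
    return "foreground"

# Five Gaussians: three labeled background, two labeled foreground.
model = [
    Gaussian(0.40, 100.0, 25.0, True),
    Gaussian(0.30, 120.0, 36.0, True),
    Gaussian(0.20, 140.0, 49.0, True),
    Gaussian(0.06, 30.0, 100.0, False),
    Gaussian(0.04, 220.0, 100.0, False),
]
labels = [classify_pixel(v, model) for v in (102, 35, 180)]
print(labels)  # → ['background', 'foreground', 'foreground']
```

The value 180 matches none of the five Gaussians, so the component with weight 0.04 is replaced by a new one centered at 180.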


Background Modeling

Adaptive GMMs

Important to handle gradual lighting changes

Once a pixel value “matches” a Gaussian, the corresponding Gaussian parameters are updated:

Prior Weight

Mean

Variance
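For reference, the cited Stauffer and Grimson paper gives updates of the following form. The notation is taken from that paper rather than from the slide: α is the learning rate, M_{k,t} is 1 for the matched Gaussian k and 0 for the others, X_t is the current pixel value, and η is the Gaussian density.

```latex
\begin{align}
\omega_{k,t} &= (1-\alpha)\,\omega_{k,t-1} + \alpha\,M_{k,t} \\
\mu_{t}      &= (1-\rho)\,\mu_{t-1} + \rho\,X_t \\
\sigma_t^{2} &= (1-\rho)\,\sigma_{t-1}^{2} + \rho\,(X_t-\mu_t)^{\top}(X_t-\mu_t) \\
\rho         &= \alpha\,\eta(X_t \mid \mu_k, \sigma_k)
\end{align}
```

The weight update is applied to every Gaussian, so the weights of unmatched components decay, while the mean and variance change only for the matched one. A small α makes the model adapt slowly, which suits gradual lighting changes.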


Background Modeling

Background and Foreground Gaussians

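The Gaussians are ordered by ω/σ, so components with high support (weight) and low variance come first

Then the first B distributions are labeled as “Background”, where B is the smallest number of top-ranked Gaussians whose weights sum to more than a threshold T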
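A minimal sketch of this labeling step, following the cited Stauffer and Grimson paper: components are ranked by weight divided by standard deviation, and B is the smallest prefix of that ranking whose weights sum to more than a threshold T. The threshold value and the example components are illustrative assumptions.

```python
def label_background(components, threshold):
    """components: list of (weight, sigma) pairs. Returns the indices labeled background."""
    # Rank by weight / sigma: high support and low variance first.
    order = sorted(range(len(components)),
                   key=lambda i: components[i][0] / components[i][1],
                   reverse=True)
    background = []
    cumulative = 0.0
    for i in order:
        background.append(i)
        cumulative += components[i][0]
        if cumulative > threshold:
            break  # smallest prefix whose total weight exceeds the threshold
    return background

# Hypothetical mixture for one pixel: (weight, standard deviation).
components = [(0.40, 5.0), (0.30, 6.0), (0.20, 7.0), (0.06, 10.0), (0.04, 10.0)]
background = label_background(components, threshold=0.75)
print(background)  # → [0, 1, 2]
```

A larger threshold admits more components into the background model, which helps with multimodal backgrounds at the cost of absorbing slow-moving objects sooner.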


Blob-Based Analytics: Limitations

Dealing with Crowded Scenes

Objects close to each other are clustered into a single blob

Environmental Conditions

Quick lighting changes, reflections, and shadows cause spurious blobs

Figure panels: Original Video, Background Subtraction, Tracking


Object-Centric Video Analytics

Vehicle Detection in Crowded Scenes




Pedestrian Detection and Tracking in Crowded Scenes



Limitations / Challenges

Detector accuracy: dealing with appearance variations

Different object poses, lighting changes, etc.

Detector efficiency / cost

State-of-the-art approaches usually run at low frame rates

How many object classes are needed?


Large-Scale Detector Learning

[Feris et al, Large-Scale Vehicle Detection, Indexing, and Search in Urban Surveillance Videos, IEEE Transactions on Multimedia, 2012]


Semi-Automatic Training Data Collection

User-defined Region of Interest (ROI)

Prior information about motion direction and size of cars in the region

A classifier based on motion direction and blob shape (obtained via background modeling, without appearance features) is applied, and only high-confidence samples are selected
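The selection rule can be sketched as a filter over candidate blobs inside the ROI. Everything specific here is an assumption made for illustration: the blob fields, the linear confidence score on the direction error, and the threshold values. The actual classifier used for data collection is not specified in the slides.

```python
import math

def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def direction_confidence(blob, expected_dir_deg, tol_deg):
    """Hypothetical score: 1 at the expected direction, 0 at or beyond the tolerance."""
    heading = math.degrees(math.atan2(blob["dy"], blob["dx"]))
    return max(0.0, 1.0 - angle_diff_deg(heading, expected_dir_deg) / tol_deg)

def select_training_samples(blobs, expected_dir_deg, tol_deg, min_area, max_area, min_conf):
    """Keep only blobs whose size and motion direction agree with the ROI priors."""
    selected = []
    for blob in blobs:
        area = blob["w"] * blob["h"]
        if not (min_area <= area <= max_area):
            continue  # size inconsistent with a car in this region
        if direction_confidence(blob, expected_dir_deg, tol_deg) >= min_conf:
            selected.append(blob["id"])
    return selected

# Hypothetical blobs: motion vector (dx, dy) and bounding-box size (w, h) in pixels.
blobs = [
    {"id": 1, "dx": 10, "dy": 0.5, "w": 60, "h": 40},    # right direction, right size
    {"id": 2, "dx": -8, "dy": 0, "w": 60, "h": 40},      # moving the wrong way
    {"id": 3, "dx": 9, "dy": 1, "w": 200, "h": 150},     # too large
    {"id": 4, "dx": 5, "dy": 4, "w": 55, "h": 38},       # direction too far off
]
selected = select_training_samples(blobs, expected_dir_deg=0, tol_deg=30,
                                   min_area=1000, max_area=5000, min_conf=0.8)
print(selected)  # → [1]
```

A strict confidence threshold discards many true cars, which is acceptable when hours of video are available and label purity matters more than recall.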

Figure panels: Original Video, Captured Samples


Semi-Automatic Training Data Collection

Training data: ~5 hours of video


Synthetic Occlusion Generator


Huge Vehicle Dataset

Nearly one million images (50+ cameras)! The largest public dataset to date has ~5,000 images


Automatic Dataset Semantic Partitioning

Large variations in pose cause drastic appearance variations, which make learning difficult

Clustering based on motion direction (related to vehicle pose) yields motionlet clusters

Multiple detectors are learned (one for each motionlet cluster) rather than a single monolithic detector
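As a rough illustration of partitioning by motion direction, the sketch below assigns each training sample to one of a fixed number of angular sectors. Fixed angular binning is a simplification swapped in here for clarity; the clustering procedure in the cited paper may differ, and the sample names and bin count are hypothetical.

```python
import math
from collections import defaultdict

def direction_bin(dx, dy, n_bins=8):
    """Index of the angular sector (centered on multiples of 360/n_bins) for a motion vector."""
    angle = math.degrees(math.atan2(dy, dx)) % 360
    width = 360 / n_bins
    return int(((angle + width / 2) % 360) // width)

def partition_by_direction(samples, n_bins=8):
    """samples: (name, dx, dy) tuples. Returns {bin index: [sample names]}."""
    clusters = defaultdict(list)
    for name, dx, dy in samples:
        clusters[direction_bin(dx, dy, n_bins)].append(name)
    return dict(clusters)

# Hypothetical vehicle samples with their dominant motion vectors.
samples = [("a", 1, 0), ("b", 1, 0.1), ("c", 0, 1), ("d", -1, 0), ("e", -1, -0.05)]
clusters = partition_by_direction(samples)
print(clusters)  # → {0: ['a', 'b'], 2: ['c'], 4: ['d', 'e']}
```

One detector would then be trained per cluster, so each detector sees a narrower range of vehicle poses.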

Clustering Based on Motionlets


Core Detector Model

Cascade of AdaBoost Classifiers with Haar-like Features

A feature pool containing a huge set (on the order of millions) of feature configurations is generated over multiple feature planes

Similar to integral channel features (Dollár et al.), but instead of randomization, we use massively parallel feature selection to select a compact set of discriminative features through AdaBoost learning
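Haar-like features are cheap to evaluate because any rectangle sum can be read from an integral image with four lookups. The sketch below shows that mechanism for a single two-rectangle feature on one feature plane; it is a generic illustration, and the function names and the tiny example image are assumptions rather than part of the described detector.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column: ii[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Hypothetical 4x4 feature plane: bright left half, dark right half (a vertical edge).
img = [[10, 10, 1, 1] for _ in range(4)]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 4, 4)
edge_response = haar_two_rect_horizontal(ii, 0, 0, 4, 4)
print(total, edge_response)  # → 88 72
```

Varying the position, size, and feature plane of such rectangles is what produces a pool with millions of candidate configurations.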


Deep Cascade Detectors

Significant accuracy improvement from training deep cascades with a huge number of bootstrapped negative samples [200,000 negative samples]
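The idea of a cascade with bootstrapped negatives can be sketched as follows: a window must pass every stage to be accepted, and negatives that the current cascade still accepts are the ones collected for training the next stage. This is a generic Viola-Jones-style sketch under assumed data structures (stumps as `(feature_index, threshold, polarity, alpha)` tuples); it is not the training code of the cited system.

```python
def stage_score(features, stumps):
    """Weighted vote of decision stumps; each stump is (feature_index, threshold, polarity, alpha)."""
    return sum(alpha for idx, thr, pol, alpha in stumps
               if pol * features[idx] >= pol * thr)

def cascade_accepts(features, stages):
    """A window is accepted only if it passes every stage; most windows exit early."""
    for stumps, stage_threshold in stages:
        if stage_score(features, stumps) < stage_threshold:
            return False
    return True

def bootstrap_negatives(negatives, stages, limit):
    """Collect negatives the current cascade wrongly accepts (hard negatives)."""
    hard = [f for f in negatives if cascade_accepts(f, stages)]
    return hard[:limit]

# Hypothetical two-stage cascade over 3-dimensional feature vectors.
stages = [
    ([(0, 5.0, 1, 1.0)], 0.5),                        # stage 1: feature 0 >= 5
    ([(1, 2.0, -1, 0.7), (2, 1.0, 1, 0.6)], 1.0),     # stage 2: feature 1 <= 2 and feature 2 >= 1
]
negatives = [[6, 1, 3], [4, 0, 0], [7, 3, 2], [9, 2, 1]]
hard_negatives = bootstrap_negatives(negatives, stages, limit=10)
print(hard_negatives)  # → [[6, 1, 3], [9, 2, 1]]
```

Repeating this collection after each new stage is what lets a deep cascade concentrate on progressively harder false positives.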


Large-Scale Multi-Pose Vehicle Detection

100+ frames per second!


Other Visual Analytics Modules


Abandoned Object Detection

[Q. Fan et al, Relative Attributes For Large-scale Abandoned Object Detection, ICCV 2013]

[Y. L. Tian, R. S. Feris, and A. Hampapur. Real-time detection of abandoned and removed objects in complex environments, VS 2008]

Main Issues

Approach


Attribute-based People Search

[Feris et al, Indexing and searching according to attributes of a person, US Patent 20100106707, 2008]

[Feris et al, Attribute-based People Search: Lessons Learnt from a Practical Surveillance System, ICMR 2014]

[B. Siddiquie, R. S. Feris, and L. Davis. Image Ranking and Retrieval Based on Multi-Attribute Queries, CVPR 2011 (Oral), USA, 2011]

[D. Vaquero, R. S. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk, Attribute-based people search in surveillance environments, WACV 2009]

Query Example: “Show me all people entering IBM last month with beard, dark skin, using sunglasses, wearing a red jacket and blue pants”


Attribute-based Vehicle Search

[Feris et al, Attribute-based Vehicle Search in Crowded Surveillance Environments, ICMR 2011]


Surveillance Event Detection (SED)

[Qiang Chen et al, CMU-IBM-NUS@TRECVID 2012: Surveillance Event Detection, 2012]

We ranked 1st in 4 out of 7 surveillance event detection tasks


Sweethearting Detection

[Fan et al, Recognition of Repetitive Sequential Human Activity, CVPR 2009]


Resources

AMOS: the archive of many outdoor scenes (http://amos.cse.wustl.edu/)

Many public traffic cameras available online! For example, you can check: http://www.chart.state.md.us/travinfo/trafficcams.php#


Project Update II

1) Flower Recognition (Wenqian Liu & Shun-Xuan Wang)

2) Axon Segmentation (Mo Zhou & John Bowler)

3) Safer Driving Through Gesture Control (Kartik Darapuneni, Jianze Wang, Shuheng Gong)

4) Recognition of Animal Skin Texture Attributes in the Wild (Amey Dharwadker & Kai Zhang)

5) Identifying Animals in the Wild (Chia Kang Chao & Yen-Cheng Chou)

6) From ImageNet to Serengeti: Recognizing Animals in Wild Scenes (Guangnan Ye & Maja Rudolph)
