Pedestrian Detection in Crowded Scenes. Dhruv Batra, ECE, CMU
Pedestrian Detection in Crowded Scenes
1. Pedestrian Detection in Crowded Scenes. Bastian Leibe, Edgar Seemann, and Bernt Schiele. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, June 2005.
2. An Evaluation of Local Shape-Based Features for Pedestrian Detection. Edgar Seemann, Bastian Leibe, Krystian Mikolajczyk, and Bernt Schiele. In British Machine Vision Conference (BMVC'05), Oxford, UK, September 2005.
3. Combined Object Categorization and Segmentation with an Implicit Shape Model. Bastian Leibe, Ales Leonardis, and Bernt Schiele. In ECCV'04 Workshop on Statistical Learning in Computer Vision, Prague, May 2004.
Theme of the Paper
Probabilistic top-down/bottom-up formulation of segmentation/recognition
Basic Premise: “[Such a] problem is too difficult for any type of feature or model alone”
Open Question: How would you do pedestrian detection/segmentation?
Solution: integrate as many cues as possible from many sources
Original image
Support of segmentation from local features
Segmentation from local features
Support of segmentation from global features (Chamfer Matching)
Segmentation from global features (Chamfer Matching)
Goal: Localize AND count pedestrians in a given image
Datasets
Training Set: 35 people walking parallel to the image plane
Testing Set (Much harder!): 209 images of 595 annotated pedestrians
Initial Recognition Approach
First Step: Generate hypotheses from local features (Implicit Shape Model)
Training: Codebook Approach (with spatial information)
Training:
Lowe’s DoG Detector
3x 3 patches
Resize to 25 x 25
Training: Agglomerative Clustering
Codebook entries also store figure-ground masks
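The clustering step can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: patches are flattened into plain vectors, Euclidean distance stands in for whatever patch similarity the paper uses, and the names `cluster_patches` and `max_dist` are hypothetical.

```python
from math import dist

def centroid(cluster):
    """Mean vector of a cluster of equal-length patch vectors."""
    n = len(cluster)
    return [sum(v[i] for v in cluster) / n for i in range(len(cluster[0]))]

def average_link(a, b):
    """Average pairwise distance between two clusters (assumed linkage)."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

def cluster_patches(patches, max_dist):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    until the closest pair is farther apart than max_dist."""
    clusters = [[p] for p in patches]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_link(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # Each cluster center becomes one codebook entry.
    return [centroid(c) for c in clusters]

# Toy 2-D "patches"; real patches would be 25 x 25 = 625-D vectors.
patches = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
codebook = cluster_patches(patches, max_dist=1.0)
```

The threshold decides how compact each codebook entry is: a small `max_dist` gives many specific entries, a large one gives few generic entries.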
Training: But wait! We just lost spatial information … Run again
Lowe’s DoG Detector
3x 3 patches
Resize to 25 x 25
Find codebook patches
Learn Spatial Distribution
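The second training pass can be sketched as follows. The data layout, the Euclidean matching rule, and the names `learn_occurrences` and `max_dist` are assumptions made for illustration. The idea is only that every codebook entry remembers where the object center was, relative to each training patch that matched it.

```python
from math import dist

def learn_occurrences(training_patches, codebook, max_dist):
    """Record, per codebook entry, the offsets from matching patches
    to the annotated object center.

    training_patches: list of (vector, (x, y), (cx, cy)) tuples, where
    (x, y) is the patch position and (cx, cy) the object center
    (hypothetical layout chosen for this sketch).
    """
    occurrences = {i: [] for i in range(len(codebook))}
    for vec, (x, y), (cx, cy) in training_patches:
        for i, entry in enumerate(codebook):
            # A patch may activate several entries, not only the best one.
            if dist(vec, entry) <= max_dist:
                occurrences[i].append((cx - x, cy - y))
    return occurrences

toy_codebook = [[0.0, 0.0], [5.0, 5.0]]
toy_training = [
    ([0.1, 0.0], (10, 20), (15, 40)),  # e.g. a head-like patch above the center
    ([5.0, 5.1], (20, 60), (15, 40)),  # e.g. a foot-like patch below the center
]
toy_occurrences = learn_occurrences(toy_training, toy_codebook, max_dist=1.0)
```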
Testing: Initial Hypothesis: Overall
Testing: Initial Hypothesis (Probabilistic Hough Voting Procedure)
Measuring similarity between patch and codebook entry
Learnt from spatial distributions of codebook entries
Search for maximum in probability space
Using a fixed-size search window
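A minimal sketch of the voting step, assuming a 2-D voting space (scale is left out for brevity) and uniform weights: each matched entry gets weight 1/(number of matches) and each stored offset 1/(number of offsets). The function names and the data layout are hypothetical.

```python
def cast_votes(matches, occurrences):
    """Turn matched patches into weighted votes for the object center.

    matches: list of ((x, y), [codebook entry ids]) per test patch.
    occurrences: dict entry id -> list of (dx, dy) offsets to the center.
    """
    votes = []
    for (x, y), entries in matches:
        if not entries:
            continue
        p_match = 1.0 / len(entries)          # similarity term, assumed uniform
        for e in entries:
            offsets = occurrences.get(e, [])
            for dx, dy in offsets:
                # spatial term, assumed uniform over stored occurrences
                votes.append((x + dx, y + dy, p_match / len(offsets)))
    return votes

def best_center(votes, window):
    """Score every vote position with a fixed-size square window and
    return the position with the largest summed weight."""
    half = window / 2.0
    best, best_score = None, 0.0
    for cx, cy, _ in votes:
        score = sum(w for x, y, w in votes
                    if abs(x - cx) <= half and abs(y - cy) <= half)
        if score > best_score:
            best, best_score = (cx, cy), score
    return best, best_score

occ = {0: [(5, 20)], 1: [(-5, -20)]}
matches = [((10, 20), [0]), ((20, 60), [1]), ((50, 50), [0])]
votes = cast_votes(matches, occ)
center, score = best_center(votes, window=10)
```

In the toy data two patches agree on the same center while the third votes elsewhere, so the agreeing pair wins.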
Testing: Initial Hypothesis: found as maxima in 3D voting space
Maxima computed using Mean Shift Mode Estimation over a balloon density estimator
Uniform cubical kernel
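The mode search can be sketched as below: a plain mean shift over (x, y, scale) votes with a uniform cubical kernel, so every vote inside the cube counts with its weight and every vote outside counts as zero. One simplification is assumed: the cube has a fixed half-width, whereas a balloon estimator would let the kernel size vary with the query point. `mean_shift_mode` and its parameters are hypothetical names.

```python
def mean_shift_mode(votes, start, half_width, max_iter=100, tol=1e-6):
    """Climb to a local density maximum in the 3-D voting space.

    votes: list of (x, y, s, weight); start: initial (x, y, s).
    """
    center = tuple(start)
    for _ in range(max_iter):
        # Uniform cubical kernel: keep votes within half_width on every axis.
        inside = [v for v in votes
                  if all(abs(v[k] - center[k]) <= half_width for k in range(3))]
        total = sum(v[3] for v in inside)
        if total == 0:
            break
        # Shift the window to the weighted mean of the votes it contains.
        new_center = tuple(sum(v[k] * v[3] for v in inside) / total
                           for k in range(3))
        shift = max(abs(new_center[k] - center[k]) for k in range(3))
        center = new_center
        if shift < tol:
            break
    return center

toy_votes = [(10, 10, 1.0, 1.0), (12, 10, 1.0, 1.0),
             (11, 12, 1.0, 1.0), (50, 50, 2.0, 1.0)]
mode = mean_shift_mode(toy_votes, start=(10, 10, 1.0), half_width=5)
```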
Testing: Probabilistic top-down segmentation
Intermediate Goal: Find this
start here
Assumption: Uniform Priors
Estimate from training data
From similarity measure
Marginalized over all patches in image
Substitute this here
to get this
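The slide equations did not survive in this transcript. A hedged reconstruction of the derivation, following the Implicit Shape Model formulation of reference 3, is given below. The symbols are assumptions: o_n is the object category, x the hypothesized position, e an image patch at location ℓ, I a codebook entry, and p a pixel.

```latex
% Vote of a patch, expanded over matching codebook entries:
p(o_n, x \mid e, \ell) = \sum_{I} p(o_n, x \mid I, \ell)\, p(I \mid e)

% Invert with Bayes' rule ("start here"); p(e, \ell) is the uniform prior:
p(e, \ell \mid o_n, x) = \frac{p(o_n, x \mid e, \ell)\, p(e, \ell)}{p(o_n, x)}

% Marginalize over all patches (e, \ell) that contain pixel \mathbf{p}
% and substitute the two lines above:
p(\mathbf{p} = \text{figure} \mid o_n, x)
  = \sum_{(e, \ell) \ni \mathbf{p}} \sum_{I}
    p(\mathbf{p} = \text{figure} \mid o_n, x, I, \ell)\,
    \frac{p(o_n, x \mid I, \ell)\, p(I \mid e)\, p(e, \ell)}{p(o_n, x)}
```

Here p(I | e) comes from the similarity measure, and the per-entry figure term is read from the figure-ground masks stored with the codebook.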
Second Step: Segmentation-based Verification (Minimum Description Length)
Savings that can be achieved by explaining part of the image with a particular hypothesis
Number of pixels N explained by h
Model complexity
Cost of describing the error made by hypothesis h
Sum over all pixels hypothesized as figure
Probability of being background
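The three labeled terms suggest a savings score of the form data minus weighted model cost minus weighted error cost. A minimal sketch follows, assuming a constant model cost of 1 and no area normalization. The weights `k1` and `k2` and the function name are hypothetical.

```python
def hypothesis_savings(p_figure, k1, k2):
    """MDL-style savings of one hypothesis h.

    p_figure: p(figure | h) for every pixel hypothesized as figure.
    k1, k2: assumed weights for model cost and error cost.
    """
    s_data = len(p_figure)                   # number of pixels N explained by h
    s_model = 1.0                            # model complexity (assumed constant)
    s_error = sum(1.0 - p for p in p_figure) # summed probability of being background
    return s_data - k1 * s_model - k2 * s_error

savings = hypothesis_savings([0.9, 0.8, 1.0], k1=0.5, k2=2.0)
```

A hypothesis whose pixels are confidently figure keeps almost all of its N; one built on uncertain pixels pays for it through the error term.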
With this framework we can resolve conflicts between overlapping hypotheses
Relative importance assigned to support of hypothesis
Bias term
Voila! It works
Caveat: it leads to another set of problems
ISM doesn’t know a person doesn’t have three legs!
Or four legs and three arms
Global Cues are needed
Assimilation of Global Cues
Distance Transform, Chamfer Matching
Get Feature Image by an edge detector
Get DT image by computing distance to nearest feature point
Chamfer Distance between template and DT image
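The three steps above can be sketched in a few lines. This is a brute-force illustration on a tiny binary edge map, not an efficient distance transform, and it assumes the map contains at least one edge pixel. The function names and the `offset` parameter are hypothetical.

```python
from math import hypot

def distance_transform(edges):
    """Euclidean distance from every pixel to the nearest edge pixel.

    edges: 2-D list of 0/1 values (the feature image from an edge detector).
    Brute force, for illustration only.
    """
    points = [(r, c) for r, row in enumerate(edges)
              for c, v in enumerate(row) if v]
    return [[min(hypot(r - pr, c - pc) for pr, pc in points)
             for c in range(len(edges[0]))]
            for r in range(len(edges))]

def chamfer_distance(template_points, dt, offset=(0, 0)):
    """Mean DT value under the template's edge points placed at offset.
    Lower means the template lies closer to image edges."""
    orow, ocol = offset
    values = [dt[r + orow][c + ocol] for r, c in template_points]
    return sum(values) / len(values)

edges = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
dt = distance_transform(edges)
template = [(0, 0), (0, 1)]                       # a two-pixel horizontal edge
on_edge = chamfer_distance(template, dt, offset=(1, 1))
off_edge = chamfer_distance(template, dt, offset=(0, 1))
```

Computing the DT once per image makes each template placement cheap: matching reduces to looking up and averaging a handful of DT values.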
Assimilation of Global Cues (Attempt 1)
Distance Transform, Chamfer Matching
Initial hypothesis generated by local features
Use scale estimate to cut out surrounding region
Apply Canny detector and compute DT
Chamfer distance-based matching
Yellow is highest Chamfer score
Assimilation of Global Cues (Attempt 2)
Maximize Chamfer Score AND overlap with hypothesized segmentation instead of pure Chamfer Score
Overlap expressed as Bhattacharyya coefficient
Joint score is a linear combination of the two
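A minimal sketch of the joint score, assuming both the segmentation and the template are given as flat lists of non-negative values over the same pixels and that the Chamfer score is already scaled so that higher is better. The mixing weight `alpha` and the function names are hypothetical.

```python
from math import sqrt

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two non-negative maps, each normalized
    to sum to 1. Returns 1.0 for identical maps, 0.0 for disjoint ones."""
    sp, sq = sum(p), sum(q)
    if sp == 0 or sq == 0:
        return 0.0
    return sum(sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))

def joint_score(chamfer_score, seg_map, template_mask, alpha):
    """Linear combination of Chamfer score and segmentation overlap.
    alpha is an assumed weight in [0, 1]."""
    overlap = bhattacharyya(seg_map, template_mask)
    return alpha * chamfer_score + (1.0 - alpha) * overlap

same = bhattacharyya([1, 1, 0, 0], [1, 1, 0, 0])
disjoint = bhattacharyya([1, 0], [0, 1])
score_joint = joint_score(0.8, [1, 1, 0, 0], [1, 1, 0, 0], alpha=0.5)
```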
Assimilation of Global Cues (Attempt 3)
Apply the hypothesis-savings MDL method again
Boolean quadratic formulation
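One way to read the Boolean quadratic formulation: pick an indicator vector m over hypotheses that maximizes m^T Q m, where the diagonal of Q holds each hypothesis's own savings and the off-diagonal entries hold (negative) interaction costs for overlapping pairs. The greedy search below is an assumed approximation for illustration, and the matrix values are made up.

```python
def total_savings(selected, Q):
    """Value of m^T Q m for the hypotheses listed in selected."""
    return sum(Q[i][j] for i in selected for j in selected)

def greedy_select(Q):
    """Greedily add the hypothesis with the largest positive gain in
    total savings; stop when no addition helps."""
    selected = []
    remaining = set(range(len(Q)))
    current = 0.0
    while remaining:
        gains = {i: total_savings(selected + [i], Q) - current
                 for i in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        selected.append(best)
        remaining.remove(best)
        current += gains[best]
    return sorted(selected)

# Hypotheses 0 and 1 overlap heavily (interaction -2.0 on each side);
# hypothesis 2 is independent of both.
Q = [[3.0, -2.0, 0.0],
     [-2.0, 2.5, 0.0],
     [0.0, 0.0, 1.0]]
chosen = greedy_select(Q)
```

In the toy matrix the weaker of the two overlapping hypotheses is rejected because the overlap penalty outweighs its own savings.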