cmu-informedia @ trecvid 2011 surveillance event detection speaker: lu jiang longfei zhang, lu...
TRANSCRIPT
![Page 1: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/1.jpg)
CMU-Informedia @ TRECVID 2011 Surveillance Event Detection
Speaker: Lu Jiang
Longfei Zhang , Lu Jiang , Lei Bao, Shohei Takahashi, Yuanpeng Li,
Alexander Hauptmann
Carnegie Mellon University
![Page 2: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/2.jpg)
SED11 Team Team members:
Longfei Lu Lei Shohei Yuanpeng
Alex
![Page 3: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/3.jpg)
Outline Framework MoSIFT based Action Recognition
MoSIFT feature Spatial Bag of Word Tackling highly imbalanced datasets
Experiment Results
![Page 4: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/4.jpg)
Framework
Vi deo Person Detection
CascadeSVM
Filtering
Spatio-TemporalFeature
Detection
BackgroundSubtraction
Spatial Bag-of-Word
Slid
ing
wind
ow
Random Forest
Visual vocabularyK-means
(k = 3000)
Hot Region detection
Classification
• Augmented Boosted Cascade
![Page 5: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/5.jpg)
Framework
Vi deo Person Detection
CascadeSVM
Filtering
Spatio-TemporalFeature
Detection
BackgroundSubtraction
Spatial Bag-of-Word
Slid
ing
wind
ow
Random Forest
Visual vocabularyK-means
(k = 3000)
Hot Region detection
Classification
• Augmented Boosted Cascade
![Page 6: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/6.jpg)
MoSIFT• Given pairs of video frames, detect spatio-temporal interest points at
multiple scales.• SIFT point detection with sufficient optical flow.• Describing SIFT points through SIFT descriptor and optical flow.
![Page 7: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/7.jpg)
Spatial Bag of Words• Each frame is divided into a set of non-overlapping rectangular tiles.• The resulting BoW features are derived by concatenating the BoW features
captured in each tile.• Encode the spatial (tile) information in BoW.
![Page 8: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/8.jpg)
Hot Region Detection
• Person Detection: Person detection based on Histogram of Oriented Gradient (HOG) features.
• Background subtraction.
![Page 9: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/9.jpg)
Spatial Bag of Features• Each frame is divided into a set of rectangular tiles or grids.• The resulting Bow features are derived by concatenating the BoW features
captured in each grid.• Encode the adjusted spatial information in BoW.
1×1 2×2 1×3
720
576
720 720
173
173
230
![Page 10: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/10.jpg)
Spatial Bag of Features• Each frame is divided into a set of rectangular tiles or grids.• The resulting Bow features are derived by concatenating the BoW features
captured in each grid.• Encode the adjusted spatial information in BoW.
![Page 11: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/11.jpg)
Tackling the highly imbalanced data• Augmented Cascade SVM.• Bagging classification method except it adopts
probabilistic sampling to select negative samples in a sequential manner.
Training Dataset Sub Dataset 1
Classifier 1
Positive Samples
Negative Samples
![Page 12: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/12.jpg)
Tackling the highly imbalanced data• Augmented Cascade SVM.• Bagging classification method except it adopts
probabilistic sampling to select negative samples in a sequential manner.
Training Dataset Sub Dataset 1
Classifier 1
![Page 13: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/13.jpg)
Tackling the highly imbalanced data• Augmented Cascade SVM.• Bagging classification method except it adopts
probabilistic sampling to select negative samples in a sequential manner.
Training Dataset Sub Dataset 1
Classifier 1
0.80.70.30.90.10.1
![Page 14: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/14.jpg)
Tackling the highly imbalanced data• Augmented Cascade SVM.• Bagging classification method except it adopts
probabilistic sampling to select negative samples in a sequential manner.
Training Dataset Sub Dataset 1
Classifier 1
0.80.20.30.90.10.1
Sub Dataset 2
Classifier 2
… …
![Page 15: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/15.jpg)
Tackling the highly imbalanced data• Augmented Cascade SVM.• Bagging classification method except it adopts
probabilistic sampling to select negative samples in a sequential manner. N = 10 layers.
Training Dataset Sub Dataset 1
Classifier 1
Sub Dataset 2
Classifier 2
Sub Dataset N
Classifier N
… …
![Page 16: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/16.jpg)
Tackling highly imbalanced dataBagging Ensemble of Random Forests
• Random Forest is a forest of decision trees.• Two parameters:
– n is the number of trees in the forest.– m the number of features in each decision tree.
• Build each decision tree by randomly selecting m features and use C4.5.
• Each tree is grown without pruning.
![Page 17: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/17.jpg)
Tackling highly imbalanced dataBagging Random Forest: Ensemble of Random Forests
• Random Forest is a forest of decision trees.• Two parameters:
– n is the number of trees in the forest.– m the number of features in each decision tree.
• Build each decision tree by randomly selecting m features • Each tree is grown without pruning.
![Page 18: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/18.jpg)
Cascade SVM vs. Bagging Random ForestCascade SVM(chi2 kernel)
Bagging Random Forest
Effectiveness Most Effective Usually 3-8% less in Average PrecisionEfficiency Time consuming Usually tens to hundreds of times faster
Sensitive to Parameter
settings
Sensitive Relatively insensitive
![Page 19: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/19.jpg)
Results
• 8 Submissions:• The first 6 runs use cascade SVM with different
sliding window sizes and parameter sets.• Last 2 runs use bagging random forest method.
![Page 20: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/20.jpg)
Results• Results for Primary run:
Inputs Actual DCR Minimum DCR
#Targ #NTarg #Sys #CorDet #CorDet #FA #Miss DCR DCR
CellToEar 194 127 128 1 0 127 193 1.0365 1.0003Embrace 175 657 715 58 0 657 117 0.8840 0.8658ObjectPut 621 57 58 1 0 57 620 1.0171 1.0003PeopleMeet 449 336 381 45 0 336 404 1.0100 0.9724PeopleSplitUp 187 115 118 3 0 115 184 1.0217 1.0003PersonRuns 107 413 439 26 0 413 81 0.8924 0.8370Pointing 1063 1960 2092 132 0 1960 931 1.5186 1.0001
![Page 21: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/21.jpg)
Results
Compared with our primary run with those of other teams. We have the best Min DCR in 3 out of 6 events.
![Page 22: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/22.jpg)
Results
Compared with our primary run with those of other teams. We have the best Actual DCR in 3 out of 7 events.
![Page 23: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/23.jpg)
Results
Compared with our last year’s result, we get improvement in terms of MIN DCR in 5 events “Embrace”, “People Meet”, “People Slit up”, “Person Runs” and “Pointing”.• Best event results over all CMU runs
Min DCR Cell ToEarEmbrace ObjectPut PeopleMeet People Split Up PersonRuns Pointing
2010 CMU 1.0003 0.9838 1.0003 0.9793 0.9889 0.9477 1.0003
2010 Overall Best Event
1 0.9663 0.9971 0.9787 0.9889 0.6818 0.996
2011 CMU 1.0003 0.8658 1.0003 0.9684 0.7838 0.837 0.9996
![Page 24: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/24.jpg)
Min DCR Cell ToEarEmbrace ObjectPut PeopleMeet People Split Up PersonRuns Pointing
2010 CMU 1.0003 0.9838 1.0003 0.9793 0.9889 0.9477 1.0003
2010 Overall Best Event
1 0.9663 0.9971 0.9787 0.9889 0.6818 0.996
2011 CMU 1.0003 0.8658 1.0003 0.9684 0.7838 0.837 0.9996
Results
Compared with the best event results in TRECVID 2010, for event “Embrace”, “PeopleMeet” and “People Split Up” ours are the best system.
![Page 25: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/25.jpg)
Cascade SVM vs. Random Forest• Comparison between Run 1 (Cascade SVM) and Run
7 (Random Forest) in terms of Min DCR.
![Page 26: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/26.jpg)
Threshold Search• Searching for Min DCR using cross validation.• Actual DCR provides reasonable estimates of Min
DCR on all runs.
Primary Run
![Page 27: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/27.jpg)
Impact of sliding window size
• Results for all events with sliding window size 25 frames (Run 3).
![Page 28: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/28.jpg)
Impact of sliding window size
• Results for all events with sliding window size 60 (Run 5).
![Page 29: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/29.jpg)
Event-specific sliding window size• For PersonRuns, CellToEar, Embrace and Pointing a good sliding window is small.• For Embrace, ObjectPut and PeopleMeet a good sliding window size is larger.
![Page 30: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/30.jpg)
Conclusions Observations:
MoSIFT feature captures salient motions in videos. Spatial Bag of Words can boost the performance over last
year’s result. Event-specific sliding window size impacts the final result. Both cascade SVM and bagging random forest can handle
highly imbalanced data sets. Random forest is much faster.
![Page 31: CMU-Informedia @ TRECVID 2011 Surveillance Event Detection Speaker: Lu Jiang Longfei Zhang, Lu Jiang, Lei Bao, Shohei Takahashi, Yuanpeng Li, Alexander](https://reader035.vdocuments.us/reader035/viewer/2022070306/55178c625503460e6e8b57ee/html5/thumbnails/31.jpg)
THANK YOU.Q&A?