TRANSCRIPT
Seminar Heidelberg University Mobile Human Detection Systems
Pedestrian Detection by Stereo Vision on Mobile Robots
Philip Mayer Matrikelnummer: 3300646 06.03.2017
Motivation
06.03.2017, Philip Mayer, Seminar Mobile Human Detection Systems, Heidelberg University
Fig.1: Pedestrians Within Bounding Box [6] Fig.2: Car Pedestrian Detection [7]
Outline
1. Problem Formulation
2. Solution Approach
3. Stereo Vision
4. Methods
5. Results
6. Summary and Conclusion
1. Problem Formulation
Given:
• Stereo vision depth image
• Mobile robot
• Unknown background
• Cluttered environment
• Crowded places

Required:
• Pedestrian detection, even when pedestrians are partially occluded
2. Solution Approach
Fig.3: Depth Image [1]
Fig.4: Segmented Regions [1]
Fig.5: Candidates [1]
Fig.6: Detected Humans [1]
Fig.7: Block Diagram of the Solution Approach
3. Stereo Vision
Fig.10: Stereo Vision – Geometric Setup [3]
Quantities in the figure: a real-world point P_real = (x_real, y_real, z_real), its image coordinates (x, y) and (x′, y′), and the focal distance λ of each lens.
• A, A′ – optical axes
• O, O′ – lens centers
• B – baseline
• P_real – point in real space
• P – projection of P_real onto image 1
• P′ – projection of P_real onto image 2
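The geometry above yields depth by triangulation: for a horizontal stereo pair, the standard relation is z = λ · B / d, where d = x − x′ is the disparity between the two projections. The slide does not write this out; the sketch below is a minimal illustration, and all parameter names are choices of this sketch.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Triangulate depth z = focal * baseline / disparity from the
    horizontal offset between the two projections P and P' of P_real."""
    disparity = x_left - x_right  # in pixels
    if disparity <= 0:
        return float("inf")  # no match / point effectively at infinity
    return focal_px * baseline_m / disparity

# Example: 500 px focal length, 10 cm baseline, 25 px disparity -> 2 m
print(depth_from_disparity(500.0, 0.10, 320.0, 295.0))
```

Note how depth resolution degrades with distance: at small disparities, a one-pixel error changes z substantially, which is why the depth images below become unreliable beyond a few meters.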
Fig.8: Color Image 1 – Left Lens [5]
Fig.9: Color Image 2 – Right Lens [5]
Distance to camera (color scale): 0.5 m to 8 m, plus "undefined" for pixels without a stereo match.

Fig.11: Depth Image 1 – Left Lens [5]
Fig.12: Depth Image 2 – Right Lens [5]
4. Methods: Graph-Based Segmentation
Fig.3: Depth Image [1] Fig.4: Segmented Regions [1]
The depth image E is overlaid with a grid of square cells of side length α; cell indices (i, j) run from (0, 0) to (i_max, j_max) with

i_max = image width w / cell width α
j_max = image height h / cell height α
Fig.13: Depth Image With Grid [1]
Fig.14: Random Pixel Selection Within Depth Image Grid Cell
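The grid construction and the per-cell random pixel selection of Fig.14 can be sketched as follows; representing the depth image as a row-major list of rows and the helper names are assumptions of this sketch.

```python
import random

def grid_dimensions(image_w, image_h, alpha):
    """Number of grid cells: i_max = w / alpha, j_max = h / alpha."""
    return image_w // alpha, image_h // alpha

def sample_cell(depth, i, j, alpha, rng):
    """Pick one random pixel inside grid cell (i, j) of the depth image."""
    x = i * alpha + rng.randrange(alpha)
    y = j * alpha + rng.randrange(alpha)
    return depth[y][x]

# e.g. a 640x480 depth image with 8-pixel cells
i_max, j_max = grid_dimensions(640, 480, 8)
print(i_max, j_max)  # 80 60
```

Sampling one pixel per cell instead of using every pixel is what makes the subsequent graph small enough for real-time segmentation.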
Each grid cell is mapped to a point in 3D space:

E_{i,j} → P_{i,j} = (p_{i,j}^x, p_{i,j}^y, p_{i,j}^z)
Fig.15: Depth Image With Grid Points For Depth And Normals Graph [1]
For a point P_{i,j} and each of its 4-neighbors (P_{i±1,j}, P_{i,j±1}), the edge weight w in the depth graph is

w_depth = |z_1 − z_2|

with z_1 = depth of P_{i,j} and z_2 = depth of the neighbor, e.g. P_{i+1,j}.
Fig.16: Depth Graph Weights Calculation
The 8 neighbors of P_{i,j} (P_{i±1,j}, P_{i,j±1}, P_{i±1,j±1}) together with P_{i,j} itself give:
• 9 samples of P in 3D space
• a least-squares plane fit → plane normals n_{i,j}

Fig.17: Depth Graph Normals Calculation
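A least-squares plane fit over the 9 samples is commonly done with an SVD of the centered points; the slide does not prescribe an implementation, so the following is one reasonable sketch.

```python
import numpy as np

def plane_normal(points):
    """Least-squares plane fit through sampled 3D points: the unit normal
    is the right singular vector with the smallest singular value."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    n = vt[-1]
    return n / np.linalg.norm(n)

# 9 samples (P_ij and its 8 neighbors) taken from the plane z = 2x + 3y
pts = [(x, y, 2 * x + 3 * y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
n = plane_normal(pts)
print(n / n[2])  # proportional to (2, 3, -1), i.e. (-2, -3, 1)
```

Dividing by the z-component removes the sign ambiguity of the SVD, which is also why normals are usually re-oriented (e.g. toward the camera) before comparing them.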
For the same neighborhood, the edge weight w in the normals graph is the angle between the normals:

w_normal = cos⁻¹(v · u)

with u = normal of P_{i,j} and v = normal of the neighbor, e.g. P_{i+1,j}.
Fig.18: Normals Graph Weights Calculation
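Both edge weights are cheap to compute. A minimal sketch (clamping the dot product before arccos is a numerical-safety detail added here, not stated on the slide):

```python
import math

def depth_weight(z1, z2):
    """w_depth = |z1 - z2|: depth difference between neighboring grid points."""
    return abs(z1 - z2)

def normal_weight(u, v):
    """w_normal = arccos(u . v): angle between the unit normals of
    neighboring grid points."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

print(depth_weight(2.0, 2.5))                         # 0.5
print(round(normal_weight((1, 0, 0), (0, 1, 0)), 4))  # 1.5708 (90 degrees)
```

A segmentation threshold on w_depth separates objects at different distances, while w_normal additionally separates surfaces at the same depth but with different orientation (e.g. a person standing against a wall).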
• Regions r_i ∈ R
• A region r_i consists of region points in G_depth and region points in G_normal
• Minimal size of a region is β → filtering noise

Fig.19: Region Condition
4. Methods: Filtering and Merging
Fig.5: Candidates [1] Fig.4: Segmented Regions [1]
Each region r_i is enclosed by a bounding box from (x_1, y_1) to (x_2, y_2) with width w and height h:

w = x_2 − x_1
h = y_2 − y_1
μ_x = w / 2
μ_y = h / 2
μ_z = mean depth z(r_i)
Fig.20: Region Attributes Calculation
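In code, the region attributes reduce to min/max and mean computations; representing a region as a list of (x, y, z) tuples is an assumption of this sketch.

```python
def region_attributes(points):
    """Bounding box and mean attributes of a region r_i, following the
    slide: w = x2 - x1, h = y2 - y1, mu_x = w/2, mu_y = h/2,
    mu_z = mean depth. `points` is a list of (x, y, z) tuples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    zs = [p[2] for p in points]
    w = max(xs) - min(xs)
    h = max(ys) - min(ys)
    return w, h, w / 2, h / 2, sum(zs) / len(zs)

print(region_attributes([(0, 0, 2.0), (4, 6, 4.0)]))  # (4, 6, 2.0, 3.0, 3.0)
```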
1. Select 3 points randomly, n times, from r_i → hypothesis plane π_k
2. Choose the plane with the maximum number of points fitting π_k:

max_{k=1…n} |{ p ∈ r_i : distance of p to π_k < ε }|

Fig.21: Hypothesis Plane – the 3 randomly selected points define π_k; the region's remaining points lie above π_k, below π_k, or within distance ε of π_k.
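The two steps above can be sketched as a small RANSAC loop. The plane representation (unit normal n and offset d) and the seeded RNG are choices of this sketch, not prescribed by the slide.

```python
import random

def fit_plane_ransac(points, n_iters=100, eps=0.05, rng=None):
    """Draw 3 random points n times, build a hypothesis plane pi_k,
    and keep the plane with the most points closer than eps."""
    rng = rng or random.Random(0)
    best_plane, best_inliers = None, -1
    for _ in range(n_iters):
        a, b, c = rng.sample(points, 3)
        # plane normal = cross product of two in-plane edge vectors
        u = [b[i] - a[i] for i in range(3)]
        v = [c[i] - a[i] for i in range(3)]
        n = [u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0]]
        norm = sum(x * x for x in n) ** 0.5
        if norm == 0:  # degenerate (collinear) sample
            continue
        n = [x / norm for x in n]
        d = -sum(n[i] * a[i] for i in range(3))
        inliers = sum(
            1 for p in points
            if abs(sum(n[i] * p[i] for i in range(3)) + d) < eps
        )
        if inliers > best_inliers:
            best_plane, best_inliers = (n, d), inliers
    return best_plane, best_inliers

# 20 points on the plane z = 0 plus 3 outliers well above it
pts = [(float(x), float(y), 0.0) for x in range(4) for y in range(5)]
pts += [(0.0, 0.0, 5.0), (1.0, 2.0, 5.0), (3.0, 1.0, 5.0)]
(plane_n, plane_d), inliers = fit_plane_ransac(pts)
print(inliers)
```

The inlier count relative to the region size gives the "inlier fraction" used in the filtering rule below: regions that are almost perfectly planar are more likely walls or floor than people.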
Find a rule specifying valid ranges for:
• mean depth
• height
• width
• minimum inlier fraction

The rule is derived from positive examples in the training set; it eliminates regions that cannot be humans.
A region may be too small but planar:
• Size(r_i) < β
• high number of points fitting π_k
• mean-depth rule satisfied

Such a region is merged with a region r_j if the merging condition holds:

|μ_xz(r_i) − μ_xz(r_j)| < δ_xz and |μ_y(r_i) − μ_y(r_j)| < δ_y

This step is important because segmentation can split parts of a person into detached regions.
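The merging condition can be sketched as follows; interpreting μ_xz as the (x, z) mean position of a region and the comparison as a Euclidean distance in the ground plane is an assumption of this sketch.

```python
def should_merge(mu_i, mu_j, delta_xz, delta_y):
    """Merging condition: two regions are merged when their mean positions
    are close in the ground (x-z) plane and in height (y).
    mu_* = (mu_x, mu_y, mu_z) of a region."""
    dist_xz = ((mu_i[0] - mu_j[0]) ** 2 + (mu_i[2] - mu_j[2]) ** 2) ** 0.5
    dist_y = abs(mu_i[1] - mu_j[1])
    return dist_xz < delta_xz and dist_y < delta_y

# a small planar region 0.1 m away in x-z and 0.4 m higher than its neighbor
print(should_merge((1.0, 1.6, 3.0), (1.1, 1.2, 3.0), 0.3, 0.5))  # True
```

Using the x-z distance (rather than image distance) makes the test independent of how far the person is from the camera.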
• Set of regions → set of (unscaled) candidates
• Classification needs scaled candidates

The pixels of each region are copied into a candidate image of fixed size w_c × h_c:
• if a pixel is copied → raw depth pixel
• undefined otherwise
• candidate c_i = candidate image + bounding box

Output: candidate set C
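Copying a region into the fixed-size candidate image can be sketched with nearest-neighbor scaling of its bounding box; the scaling method and the `undefined` sentinel are assumptions of this sketch (the slide only fixes the target size w_c × h_c).

```python
def make_candidate_image(depth, region_pixels, wc, hc, undefined=None):
    """Copy a region's raw depth pixels into a wc x hc candidate image;
    pixels not covered by the region stay 'undefined'."""
    xs = [x for x, y in region_pixels]
    ys = [y for x, y in region_pixels]
    x1, y1 = min(xs), min(ys)
    w = max(xs) - x1 + 1
    h = max(ys) - y1 + 1
    region = set(region_pixels)
    out = [[undefined] * wc for _ in range(hc)]
    for v in range(hc):
        for u in range(wc):
            # map each candidate pixel back into the bounding box
            sx = x1 + u * w // wc
            sy = y1 + v * h // hc
            if (sx, sy) in region:
                out[v][u] = depth[sy][sx]
    return out

depth = [[y * 4 + x for x in range(4)] for y in range(4)]
region = [(x, y) for x in range(4) for y in range(4)]
print(make_candidate_image(depth, region, 2, 2))  # [[0, 2], [8, 10]]
```

Keeping raw depth values (rather than rescaled intensities) preserves the gradients that the HOD descriptor computes in the next step.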
4. Methods: Candidate Classification
Fig.4: Segmented Regions [1] Fig.6: Detected Humans [1]
Inside the bounding box, the candidate image is divided into 8×8-pixel cells (2×2 cells form a block). For each pixel a depth gradient is computed, e.g.:

ΔDepth_x = 222 − 55 = 167
ΔDepth_y = 235 − 33 = 202

Gradient vector v_G = (ΔDepth_x, ΔDepth_y) = (167, 202)

Fig.23: Candidate Image With Bounding Box And Fixed Size [1]
Fig.22: Gradient Vector Calculation [2]
Each gradient votes into a histogram of magnitudes over angle (in degrees):

Magnitude M = √(167² + 202²) ≈ 262.1
Gradient angle Θ = arctan(202 / 167) ≈ 50.4°
Fig.24: Histogram Of Oriented Depth
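The magnitude/angle computation and the binning into an orientation histogram can be sketched as follows; the 9-bin unsigned-orientation layout follows the usual HOG/HOD convention and is an assumption here.

```python
import math

def gradient_polar(d_depth_x, d_depth_y):
    """Magnitude and orientation (degrees) of a depth-gradient vector v_G."""
    magnitude = math.hypot(d_depth_x, d_depth_y)
    angle_deg = math.degrees(math.atan2(d_depth_y, d_depth_x))
    return magnitude, angle_deg

def hod_histogram(gradients, n_bins=9):
    """Accumulate gradient magnitudes into n_bins orientation bins over
    [0, 180) degrees (unsigned orientation), one histogram per cell."""
    hist = [0.0] * n_bins
    for dx, dy in gradients:
        m, a = gradient_polar(dx, dy)
        hist[int((a % 180.0) / 180.0 * n_bins) % n_bins] += m
    return hist

# the slide's example gradient (167, 202)
m, a = gradient_polar(167, 202)
print(round(m, 1), round(a, 1))  # 262.1 50.4
```

Voting with the magnitude (not just a count) lets strong depth discontinuities, such as a person's silhouette against the background, dominate the descriptor.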
Blocks of 2×2 cells slide over the candidate image with 50 % box overlap (yellow: initial step, green: following step). The 4 cell histograms of each block are normalized together; the vector of all histograms forms the candidate descriptor for the SVM.

Fig.26: Candidate Image With Blocks For Normalization [1]
Fig.27: Linear Support Vector Machine [4]
(The figure shows positive and negative examples and two separating hyperplanes A and B; the SVM selects the hyperplane with the maximum margin.)
• Depth image frames from the training set
• Candidates labeled as positive or negative
• Input: set of candidates C → output: set of humans H

Fig.28: Support Vector Machine Scheme [2]
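As an illustration of the training stage, here is a minimal linear SVM trained by sub-gradient descent on the hinge loss. This is a sketch on toy 2-D "descriptors", not the solver used in the paper; labels are +1 (human) and −1 (non-human).

```python
def train_linear_svm(samples, labels, lr=0.01, lam=0.01, epochs=200):
    """Minimal linear SVM via sub-gradient descent on the hinge loss."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge loss active: push the margin out
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:           # inside the margin: only regularize
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# toy descriptors: positives cluster at (2, 2), negatives at (-2, -2)
X = [(2, 2), (2.5, 1.5), (1.5, 2.5), (-2, -2), (-2.5, -1.5), (-1.5, -2.5)]
Y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, Y)
print([classify(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

In practice the descriptors are the high-dimensional HOD vectors from the previous step, and an off-the-shelf SVM solver would be used instead.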
5. Results
                               „Hallway“        „Café“
Distances                      0.5 – 8 m        0.5 – 5 m
Occlusion level                varying          often
Environment                    not cluttered    cluttered
Ergonomic position of people   upright          various poses
Two sets of experiments:
1. Recall & precision
2. Impact of a varying number of training examples on recall & precision
Fig.29: Accuracy Results, (a) Hallway Data Set, (b) Café Data Set [1]

Equal Error Rate (EER), the point where recall equals precision: ≈ 84 % on the Hallway data set, ≈ 75 % on the Café data set.
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)

TP = true positives, FN = false negatives, FP = false positives
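The two measures can be checked with a tiny helper; the example counts below are illustrative (chosen to show an equal-error-rate point where recall = precision), not the paper's raw numbers.

```python
def recall_precision(tp, fn, fp):
    """Recall = TP/(TP+FN), Precision = TP/(TP+FP)."""
    return tp / (tp + fn), tp / (tp + fp)

# e.g. 84 detected humans, 16 missed, 16 false alarms -> recall = precision
r, p = recall_precision(84, 16, 16)
print(r, p)  # 0.84 0.84
```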
Fig.30: Impact On Accuracy By Reduction Of Positive Training Examples, (a) Hallway Data Set, (b) Café Data Set [1]
6. Summary & Conclusion
• Stereo vision
• Segmentation algorithm
• Filtering and merging
• HOD descriptor
• SVM
• Precision and recall
• Impact on precision and recall of reducing the SVM training data

Points of critique:
• Missing information: impact of resolution loss
• Comparison of the data sets: different environments, different ergonomic positions
• Presented depth images: no reference for how the depth information is encoded
• No units of measure in the data-set table
References

1. Fast Human Detection for Indoor Mobile Robots Using Depth Images, 2013 IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 6–10, 2013
2. L. Spinello and K. Arras, "People Detection in RGB-D Data," in Proceedings of IROS 2011, pp. 3838–3843, perma-link: http://ref.scielo.org/cmkfvr
4. Web page: https://en.wikipedia.org/wiki/Support_vector_machine
5. Web page: http://vision.middlebury.edu/stereo/data/scenes2003/
6. Web page: https://www.nextplatform.com/wp-content/uploads/2015/08/ped_det.png
7. Web page: https://www.extremetech.com/wp-content/uploads/2016/04/Autoliv-pedestrian-detection-640x395.jpg