TRANSCRIPT
Seminar Heidelberg University Mobile Human Detection Systems
Pedestrian Detection by Stereo Vision on Mobile Robots
Philip Mayer Matrikelnummer: 3300646 06.03.2017
Motivation
06.03.2017, Philip Mayer, Seminar Mobile Human Detection Systems, Heidelberg University
Fig.1: Pedestrians Within Bounding Box [6] Fig.2: Car Pedestrian Detection [7]
Outline
1. Problem Formulation
2. Solution Approach
3. Stereo Vision
4. Methods
5. Results
6. Summary and Conclusion
1. Problem Formulation
Given:
• Stereo vision depth image
• Mobile robot
• Unknown background
• Cluttered environment
• Crowded places

Required:
• Pedestrian detection, even when pedestrians are partially occluded
2. Solution Approach
Fig.3: Depth Image [1]
Fig.4: Segmented Regions [1]
Fig.5: Candidates [1]
Fig.6: Detected Humans [1]
Fig.7: Block Diagram of the Solution Approach
3. Stereo Vision
Fig.10: Stereo Vision – Geometric Setup [3]
Quantities in the figure: a real-world point P_real = (x_real, y_real, z_real), its image coordinates (x, y) and (x′, y′), and the focal distance λ of each lens.
• A, A′ – optical axes
• O, O′ – lens centers
• B – baseline
• P_real – point in real space
• P – projection of P_real onto image 1
• P′ – projection of P_real onto image 2
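The geometry above yields depth by triangulation: for a horizontal stereo pair, the standard relation is z = λ · B / d, where d = x − x′ is the disparity between the two projections. The slide does not write this out; the sketch below is a minimal illustration, and all parameter names are choices of this sketch.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Triangulate depth z = focal * baseline / disparity from the
    horizontal offset between the two projections P and P' of P_real."""
    disparity = x_left - x_right  # in pixels
    if disparity <= 0:
        return float("inf")  # no match / point effectively at infinity
    return focal_px * baseline_m / disparity

# Example: 500 px focal length, 10 cm baseline, 25 px disparity -> 2 m
print(depth_from_disparity(500.0, 0.10, 320.0, 295.0))
```

Note how depth resolution degrades with distance: at small disparities, a one-pixel error changes z substantially, which is why the depth images below become unreliable beyond a few meters.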
Fig.8: Color Image 1 – Left Lens [5]
Fig.9: Color Image 2 – Right Lens [5]
Distance to camera (color scale): 0.5 m to 8 m, plus "undefined" for pixels without a stereo match.

Fig.11: Depth Image 1 – Left Lens [5]
Fig.12: Depth Image 2 – Right Lens [5]
4. Methods: Graph-Based Segmentation
Fig.3: Depth Image [1] Fig.4: Segmented Regions [1]
The depth image E is overlaid with a grid of square cells of side length α; cell indices (i, j) run from (0, 0) to (i_max, j_max) with

i_max = image width w / cell width α
j_max = image height h / cell height α
Fig.13: Depth Image With Grid [1]
Fig.14: Random Pixel Selection Within Depth Image Grid Cell
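The grid construction and the per-cell random pixel selection of Fig.14 can be sketched as follows; representing the depth image as a row-major list of rows and the helper names are assumptions of this sketch.

```python
import random

def grid_dimensions(image_w, image_h, alpha):
    """Number of grid cells: i_max = w / alpha, j_max = h / alpha."""
    return image_w // alpha, image_h // alpha

def sample_cell(depth, i, j, alpha, rng):
    """Pick one random pixel inside grid cell (i, j) of the depth image."""
    x = i * alpha + rng.randrange(alpha)
    y = j * alpha + rng.randrange(alpha)
    return depth[y][x]

# e.g. a 640x480 depth image with 8-pixel cells
i_max, j_max = grid_dimensions(640, 480, 8)
print(i_max, j_max)  # 80 60
```

Sampling one pixel per cell instead of using every pixel is what makes the subsequent graph small enough for real-time segmentation.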
Each grid cell is mapped to a point in 3D space:

E_{i,j} → P_{i,j} = (p_{i,j}^x, p_{i,j}^y, p_{i,j}^z)
Fig.15: Depth Image With Grid Points For Depth And Normals Graph [1]
For a point P_{i,j} and each of its 4-neighbors (P_{i±1,j}, P_{i,j±1}), the edge weight w in the depth graph is

w_depth = |z_1 − z_2|

with z_1 = depth of P_{i,j} and z_2 = depth of the neighbor, e.g. P_{i+1,j}.
Fig.16: Depth Graph Weights Calculation
The 8 neighbors of P_{i,j} (P_{i±1,j}, P_{i,j±1}, P_{i±1,j±1}) together with P_{i,j} itself give:
• 9 samples of P in 3D space
• a least-squares plane fit → plane normals n_{i,j}

Fig.17: Depth Graph Normals Calculation
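A least-squares plane fit over the 9 samples is commonly done with an SVD of the centered points; the slide does not prescribe an implementation, so the following is one reasonable sketch.

```python
import numpy as np

def plane_normal(points):
    """Least-squares plane fit through sampled 3D points: the unit normal
    is the right singular vector with the smallest singular value."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    n = vt[-1]
    return n / np.linalg.norm(n)

# 9 samples (P_ij and its 8 neighbors) taken from the plane z = 2x + 3y
pts = [(x, y, 2 * x + 3 * y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
n = plane_normal(pts)
print(n / n[2])  # proportional to (2, 3, -1), i.e. (-2, -3, 1)
```

Dividing by the z-component removes the sign ambiguity of the SVD, which is also why normals are usually re-oriented (e.g. toward the camera) before comparing them.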
For the same neighborhood, the edge weight w in the normals graph is the angle between the normals:

w_normal = cos⁻¹(v · u)

with u = normal of P_{i,j} and v = normal of the neighbor, e.g. P_{i+1,j}.
Fig.18: Normals Graph Weights Calculation
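Both edge weights are cheap to compute. A minimal sketch (clamping the dot product before arccos is a numerical-safety detail added here, not stated on the slide):

```python
import math

def depth_weight(z1, z2):
    """w_depth = |z1 - z2|: depth difference between neighboring grid points."""
    return abs(z1 - z2)

def normal_weight(u, v):
    """w_normal = arccos(u . v): angle between the unit normals of
    neighboring grid points."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

print(depth_weight(2.0, 2.5))                         # 0.5
print(round(normal_weight((1, 0, 0), (0, 1, 0)), 4))  # 1.5708 (90 degrees)
```

A segmentation threshold on w_depth separates objects at different distances, while w_normal additionally separates surfaces at the same depth but with different orientation (e.g. a person standing against a wall).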
• Regions r_i ∈ R
• A region r_i consists of region points in G_depth and region points in G_normal
• Minimal size of a region is β → filtering noise

Fig.19: Region Condition
4. Methods: Filtering and Merging
Fig.5: Candidates [1] Fig.4: Segmented Regions [1]
Each region r_i is enclosed by a bounding box from (x_1, y_1) to (x_2, y_2) with width w and height h:

w = x_2 − x_1
h = y_2 − y_1
μ_x = w / 2
μ_y = h / 2
μ_z = mean depth z(r_i)
Fig.20: Region Attributes Calculation
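In code, the region attributes reduce to min/max and mean computations; representing a region as a list of (x, y, z) tuples is an assumption of this sketch.

```python
def region_attributes(points):
    """Bounding box and mean attributes of a region r_i, following the
    slide: w = x2 - x1, h = y2 - y1, mu_x = w/2, mu_y = h/2,
    mu_z = mean depth. `points` is a list of (x, y, z) tuples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    zs = [p[2] for p in points]
    w = max(xs) - min(xs)
    h = max(ys) - min(ys)
    return w, h, w / 2, h / 2, sum(zs) / len(zs)

print(region_attributes([(0, 0, 2.0), (4, 6, 4.0)]))  # (4, 6, 2.0, 3.0, 3.0)
```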
1. Select 3 points randomly, n times, from r_i → hypothesis plane π_k
2. Choose the plane with the maximum number of points fitting π_k:

max_{k=1…n} |{ p ∈ r_i : distance of p to π_k < ε }|

Fig.21: Hypothesis Plane – the 3 randomly selected points define π_k; the region's remaining points lie above π_k, below π_k, or within distance ε of π_k.
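The two steps above can be sketched as a small RANSAC loop. The plane representation (unit normal n and offset d) and the seeded RNG are choices of this sketch, not prescribed by the slide.

```python
import random

def fit_plane_ransac(points, n_iters=100, eps=0.05, rng=None):
    """Draw 3 random points n times, build a hypothesis plane pi_k,
    and keep the plane with the most points closer than eps."""
    rng = rng or random.Random(0)
    best_plane, best_inliers = None, -1
    for _ in range(n_iters):
        a, b, c = rng.sample(points, 3)
        # plane normal = cross product of two in-plane edge vectors
        u = [b[i] - a[i] for i in range(3)]
        v = [c[i] - a[i] for i in range(3)]
        n = [u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0]]
        norm = sum(x * x for x in n) ** 0.5
        if norm == 0:  # degenerate (collinear) sample
            continue
        n = [x / norm for x in n]
        d = -sum(n[i] * a[i] for i in range(3))
        inliers = sum(
            1 for p in points
            if abs(sum(n[i] * p[i] for i in range(3)) + d) < eps
        )
        if inliers > best_inliers:
            best_plane, best_inliers = (n, d), inliers
    return best_plane, best_inliers

# 20 points on the plane z = 0 plus 3 outliers well above it
pts = [(float(x), float(y), 0.0) for x in range(4) for y in range(5)]
pts += [(0.0, 0.0, 5.0), (1.0, 2.0, 5.0), (3.0, 1.0, 5.0)]
(plane_n, plane_d), inliers = fit_plane_ransac(pts)
print(inliers)
```

The inlier count relative to the region size gives the "inlier fraction" used in the filtering rule below: regions that are almost perfectly planar are more likely walls or floor than people.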
Find a rule specifying valid ranges for:
• mean depth
• height
• width
• minimum inlier fraction

The rule is derived from positive examples in the training set; it eliminates regions that cannot be humans.
A region may be too small but planar:
• Size(r_i) < β
• high number of points fitting π_k
• mean-depth rule satisfied

Such a region is merged with a region r_j if the merging condition holds:

|μ_xz(r_i) − μ_xz(r_j)| < δ_xz and |μ_y(r_i) − μ_y(r_j)| < δ_y

This step is important because segmentation can split parts of a person into detached regions.
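The merging condition can be sketched as follows; interpreting μ_xz as the (x, z) mean position of a region and the comparison as a Euclidean distance in the ground plane is an assumption of this sketch.

```python
def should_merge(mu_i, mu_j, delta_xz, delta_y):
    """Merging condition: two regions are merged when their mean positions
    are close in the ground (x-z) plane and in height (y).
    mu_* = (mu_x, mu_y, mu_z) of a region."""
    dist_xz = ((mu_i[0] - mu_j[0]) ** 2 + (mu_i[2] - mu_j[2]) ** 2) ** 0.5
    dist_y = abs(mu_i[1] - mu_j[1])
    return dist_xz < delta_xz and dist_y < delta_y

# a small planar region 0.1 m away in x-z and 0.4 m higher than its neighbor
print(should_merge((1.0, 1.6, 3.0), (1.1, 1.2, 3.0), 0.3, 0.5))  # True
```

Using the x-z distance (rather than image distance) makes the test independent of how far the person is from the camera.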
• Set of regions → set of (unscaled) candidates
• Classification needs scaled candidates

The pixels of each region are copied into a candidate image of fixed size w_c × h_c:
• if a pixel is copied → raw depth pixel
• undefined otherwise
• candidate c_i = candidate image + bounding box

Output: candidate set C
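Copying a region into the fixed-size candidate image can be sketched with nearest-neighbor scaling of its bounding box; the scaling method and the `undefined` sentinel are assumptions of this sketch (the slide only fixes the target size w_c × h_c).

```python
def make_candidate_image(depth, region_pixels, wc, hc, undefined=None):
    """Copy a region's raw depth pixels into a wc x hc candidate image;
    pixels not covered by the region stay 'undefined'."""
    xs = [x for x, y in region_pixels]
    ys = [y for x, y in region_pixels]
    x1, y1 = min(xs), min(ys)
    w = max(xs) - x1 + 1
    h = max(ys) - y1 + 1
    region = set(region_pixels)
    out = [[undefined] * wc for _ in range(hc)]
    for v in range(hc):
        for u in range(wc):
            # map each candidate pixel back into the bounding box
            sx = x1 + u * w // wc
            sy = y1 + v * h // hc
            if (sx, sy) in region:
                out[v][u] = depth[sy][sx]
    return out

depth = [[y * 4 + x for x in range(4)] for y in range(4)]
region = [(x, y) for x in range(4) for y in range(4)]
print(make_candidate_image(depth, region, 2, 2))  # [[0, 2], [8, 10]]
```

Keeping raw depth values (rather than rescaled intensities) preserves the gradients that the HOD descriptor computes in the next step.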
4. Methods: Candidate Classification
Fig.4: Segmented Regions [1] Fig.6: Detected Humans [1]
Inside the bounding box, the candidate image is divided into 8×8-pixel cells (2×2 cells form a block). For each pixel a depth gradient is computed, e.g.:

ΔDepth_x = 222 − 55 = 167
ΔDepth_y = 235 − 33 = 202

Gradient vector v_G = (ΔDepth_x, ΔDepth_y) = (167, 202)

Fig.23: Candidate Image With Bounding Box And Fixed Size [1]
Fig.22: Gradient Vector Calculation [2]
Each gradient votes into a histogram of magnitudes over angle (in degrees):

Magnitude M = √(167² + 202²) ≈ 262.1
Gradient angle Θ = arctan(202 / 167) ≈ 50.4°
Fig.24: Histogram Of Oriented Depth
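The magnitude/angle computation and the binning into an orientation histogram can be sketched as follows; the 9-bin unsigned-orientation layout follows the usual HOG/HOD convention and is an assumption here.

```python
import math

def gradient_polar(d_depth_x, d_depth_y):
    """Magnitude and orientation (degrees) of a depth-gradient vector v_G."""
    magnitude = math.hypot(d_depth_x, d_depth_y)
    angle_deg = math.degrees(math.atan2(d_depth_y, d_depth_x))
    return magnitude, angle_deg

def hod_histogram(gradients, n_bins=9):
    """Accumulate gradient magnitudes into n_bins orientation bins over
    [0, 180) degrees (unsigned orientation), one histogram per cell."""
    hist = [0.0] * n_bins
    for dx, dy in gradients:
        m, a = gradient_polar(dx, dy)
        hist[int((a % 180.0) / 180.0 * n_bins) % n_bins] += m
    return hist

# the slide's example gradient (167, 202)
m, a = gradient_polar(167, 202)
print(round(m, 1), round(a, 1))  # 262.1 50.4
```

Voting with the magnitude (not just a count) lets strong depth discontinuities, such as a person's silhouette against the background, dominate the descriptor.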
Blocks of 2×2 cells slide over the candidate image with 50 % box overlap (yellow: initial step, green: following step). The 4 cell histograms of each block are normalized together; the vector of all histograms forms the candidate descriptor for the SVM.

Fig.26: Candidate Image With Blocks For Normalization [1]
Fig.27: Linear Support Vector Machine [4]
(The figure shows positive and negative examples and two separating hyperplanes A and B; the SVM selects the hyperplane with the maximum margin.)
• Depth image frames from the training set
• Candidates labeled as positive or negative
• Input: set of candidates C → output: set of humans H

Fig.28: Support Vector Machine Scheme [2]
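As an illustration of the training stage, here is a minimal linear SVM trained by sub-gradient descent on the hinge loss. This is a sketch on toy 2-D "descriptors", not the solver used in the paper; labels are +1 (human) and −1 (non-human).

```python
def train_linear_svm(samples, labels, lr=0.01, lam=0.01, epochs=200):
    """Minimal linear SVM via sub-gradient descent on the hinge loss."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge loss active: push the margin out
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:           # inside the margin: only regularize
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# toy descriptors: positives cluster at (2, 2), negatives at (-2, -2)
X = [(2, 2), (2.5, 1.5), (1.5, 2.5), (-2, -2), (-2.5, -1.5), (-1.5, -2.5)]
Y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, Y)
print([classify(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

In practice the descriptors are the high-dimensional HOD vectors from the previous step, and an off-the-shelf SVM solver would be used instead.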
5. Results
                               „Hallway“        „Café“
Distances                      0.5 – 8 m        0.5 – 5 m
Occlusion level                varying          often
Environment                    not cluttered    cluttered
Ergonomic position of people   upright          various poses
Two sets of experiments:
1. Recall & precision
2. Impact of a varying number of training examples on recall & precision
Fig.29: Accuracy Results, (a) Hallway Data Set, (b) Café Data Set [1]

Equal Error Rate (EER), the point where recall equals precision: ≈ 84 % on the Hallway data set, ≈ 75 % on the Café data set.
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)

TP = true positives, FN = false negatives, FP = false positives
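The two measures can be checked with a tiny helper; the example counts below are illustrative (chosen to show an equal-error-rate point where recall = precision), not the paper's raw numbers.

```python
def recall_precision(tp, fn, fp):
    """Recall = TP/(TP+FN), Precision = TP/(TP+FP)."""
    return tp / (tp + fn), tp / (tp + fp)

# e.g. 84 detected humans, 16 missed, 16 false alarms -> recall = precision
r, p = recall_precision(84, 16, 16)
print(r, p)  # 0.84 0.84
```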
Fig.30: Impact On Accuracy By Reduction Of Positive Training Examples, (a) Hallway Data Set, (b) Café Data Set [1]
6. Summary & Conclusion
• Stereo vision
• Segmentation algorithm
• Filtering and merging
• HOD descriptor
• SVM
• Precision and recall
• Impact on precision and recall of reducing the SVM training data

Points of critique:
• Missing information: impact of resolution loss
• Comparison of the data sets: different environments, different ergonomic positions
• Presented depth images: no reference for how the depth information is encoded
• No units of measure in the data-set table
References

1. Fast Human Detection for Indoor Mobile Robots Using Depth Images, 2013 IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, May 6–10, 2013
2. L. Spinello and K. Arras, "People Detection in RGB-D Data," in Proceedings of IROS 2011, pp. 3838–3843, perma-link: http://ref.scielo.org/cmkfvr
4. Web page: https://en.wikipedia.org/wiki/Support_vector_machine
5. Web page: http://vision.middlebury.edu/stereo/data/scenes2003/
6. Web page: https://www.nextplatform.com/wp-content/uploads/2015/08/ped_det.png
7. Web page: https://www.extremetech.com/wp-content/uploads/2016/04/Autoliv-pedestrian-detection-640x395.jpg