![Page 1: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/1.jpg)
3D Object Representations for Recognition
Yu Xiang
Computational Vision and Geometry Lab
Stanford University
1
![Page 2: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/2.jpg)
2D Object Recognition
2
Image classification/tagging/annotation (Ordonez et al., ICCV’13)
Object detection (Ren et al., NIPS’15)
Object segmentation (Long et al., CVPR’15)
Image description generation (Karpathy et al., CVPR’15): "man in black shirt is playing guitar."
![Page 3: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/3.jpg)
Applications of 2D Object Recognition
3
Photo editing
Biometrics authentication
Image search/indexing
Visual surveillance
![Page 4: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/4.jpg)
These are all great, but…
4
2D recognition is NOT enough!
![Page 5: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/5.jpg)
Applications that need 3D Object Recognition
5
Autonomous Driving
Augmented Reality Gaming
Robotics
Any application that requires interaction with the 3D world!
![Page 6: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/6.jpg)
Goal: Infer the 3D World
6
The 3D world
A 2D image
• Interaction
• Control
• Decision making
• Navigation
• Etc.
![Page 7: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/7.jpg)
Goal: Infer the 3D World
7
Blocks World: Larry Roberts, 1965
Marr’s Theory: David Marr, 1978
Surface Layout/Normal Estimation: Hoiem et al., ICCV’05; Fouhey et al., ICCV’13, ECCV’14
Room Layout Estimation: Lee et al., CVPR’09; Hedau et al., ICCV’09; Mallya & Lazebnik, ICCV’15
Blocks World Revisited: Gupta et al., ECCV’10
3D Object Recognition: Kar et al., ICCV’15; Tulsiani & Malik, CVPR’15
![Page 8: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/8.jpg)
8
The image is from the KITTI detection benchmark (Geiger et al. CVPR’12)
My Work: 2D Object Detection
![Page 9: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/9.jpg)
9
My Work: 2D Object Detection
![Page 10: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/10.jpg)
10
My Work: 2D Segmentation and 3D Pose Estimation
![Page 11: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/11.jpg)
11
My Work: Occlusion Reasoning
![Page 12: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/12.jpg)
12
My Work: 3D Localization
![Page 13: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/13.jpg)
Contribution: 3D Object Representations
13
A 2D image
3D Object Representation
The 3D world
![Page 14: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/14.jpg)
Related Work: 2D Object Representations
14
Deformable part model (Felzenszwalb et al., TPAMI’10)
2D detection 3D pose Occlusion 3D location
• Viola & Jones, IJCV’01
• Fergus et al., CVPR’03
• Leibe et al., ECCVW’04
• Hoiem et al., CVPR’06
• Vedaldi et al., ICCV’09
• Maji & Malik, CVPR’09
• Felzenszwalb et al., TPAMI’10
• Malisiewicz et al., ICCV’11
• Divvala et al., ECCVW’12
• Dollár et al., TPAMI’14
• Etc.
![Page 15: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/15.jpg)
Related Work: 2.5D Object Representations
15
Savarese & Fei-Fei ICCV’07
2D detection 3D pose Occlusion 3D location
• Thomas et al., CVPR’06
• Savarese & Fei-Fei, ICCV’07
• Kushal et al., CVPR’07
• Su et al., ICCV’09
• Sun et al., CVPR’10
• Etc.
![Page 16: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/16.jpg)
Related Work: 3D Object Representations
16
3DDPM (Pepik et al., CVPR’12)
2D detection 3D pose Occlusion 3D location
• Yan et al., ICCV’07
• Hoiem et al., CVPR’07
• Liebelt et al., CVPR’08, CVPR’10
• Glasner et al., ICCV’11
• Pepik et al., CVPR’12
• Xiang & Savarese, CVPR’12
• Hejrati & Ramanan, NIPS’12
• Fidler et al., NIPS’12
• Etc.
![Page 17: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/17.jpg)
Contribution: 3D Object Representations
17
2D detection 3D pose Occlusion 3D location
![Page 18: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/18.jpg)
Outline
•3D Aspect Part Representation
•3D Voxel Pattern Representation
•A Benchmark for 3D Object Recognition in the Wild
•Conclusion and Future Work
18
![Page 19: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/19.jpg)
Outline
•3D Aspect Part Representation
•3D Voxel Pattern Representation
•A Benchmark for 3D Object Recognition in the Wild
•Conclusion and Future Work
19
![Page 20: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/20.jpg)
3D Aspect Part Representation
20
Viewpoint Variation
![Page 21: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/21.jpg)
21
Viewpoint: Azimuth 315˚, Elevation 30˚, Distance 2
Y. Xiang and S. Savarese. Estimating the aspect layout of object categories. In CVPR, 2012.
3D Aspect Part Representation
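A viewpoint given as azimuth, elevation, and distance can be read as a camera position on a sphere around the object. The sketch below is only an illustration under assumed conventions that the slides do not state: the object sits at the origin, the z axis points up, and azimuth is measured in the x-y plane. The function name `camera_position` is hypothetical.

```python
import math

def camera_position(azimuth_deg, elevation_deg, distance):
    """Camera center in object coordinates (assumed: object at origin, z up)."""
    a = math.radians(azimuth_deg)
    e = math.radians(elevation_deg)
    x = distance * math.cos(e) * math.cos(a)
    y = distance * math.cos(e) * math.sin(a)
    z = distance * math.sin(e)
    return (x, y, z)

# The viewpoint quoted on the slide: azimuth 315, elevation 30, distance 2.
pos = camera_position(315.0, 30.0, 2.0)
```

With these conventions the camera stays at the stated distance from the object for any azimuth and elevation, so the three numbers fully determine where the camera is but not the in-plane rotation of the image.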
![Page 22: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/22.jpg)
3D Aspect Parts from 3D CAD Models
22
…
…
Mean Shape
![Page 23: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/23.jpg)
3D Aspect Part Representation
23
![Page 24: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/24.jpg)
Aspect Layout Model
24
An input image
3D aspect part representation
Aspect Layout Model
Viewpoint: Azimuth 315˚, Elevation 30˚, Distance 2
Output
![Page 25: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/25.jpg)
Aspect Layout Model
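•Posterior distribution
$$P(Y, L, O, V \mid I) \propto \exp\big(E(Y, L, O, V, I)\big)$$
$$L = (\mathbf{l}_1, \ldots, \mathbf{l}_n), \quad \mathbf{l}_i = (x_i, y_i)$$
where $Y$ is the object label, $L$ the 2D part center coordinates, $O$ the 3D object, and $V$ the viewpoint.
25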
![Page 26: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/26.jpg)
Aspect Layout Model
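• Energy function
$$E(Y, L, O, V, I) = \begin{cases} \sum_{i} V_1(\mathbf{l}_i, O, V, I) + \sum_{(i,j)} V_2(\mathbf{l}_i, \mathbf{l}_j, O, V), & \text{if } Y = +1 \\ 0, & \text{if } Y = -1 \end{cases}$$
The first sum collects the unary potentials and the second sum the pairwise potentials.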
26
![Page 27: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/27.jpg)
Aspect Layout Model
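•Unary potential
$$V_1(\mathbf{l}_i, O, V, I) = \begin{cases} \mathbf{w}_i^T \phi(\mathbf{l}_i, O, V, I), & \text{if unoccluded} \\ \alpha_i, & \text{if self-occluded} \end{cases}$$
where $\mathbf{w}_i$ is the part template, $\phi(\mathbf{l}_i, O, V, I)$ the feature vector, and $\alpha_i$ the self-occlusion weight.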
27
![Page 28: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/28.jpg)
Aspect Layout Model
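Parts: Front, Head, Left, Right, Back, Tail
Root templates: Root1, Root2, Root3, Root4, Root5, Root6, Root7, Root8
$$V_1(\mathbf{l}_i, O, V, I) = \begin{cases} \mathbf{w}_i^T \phi(\mathbf{l}_i, O, V, I), & \text{if unoccluded} \\ \alpha_i, & \text{if occluded} \end{cases}$$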
28
![Page 29: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/29.jpg)
Aspect Layout Model
•Pairwise potential
3D world
2D projection
2D observation
$$V_2(\mathbf{l}_i, \mathbf{l}_j, O, V) = -w_x \big(x_i - x_j + d_{ij,O,V} \cos(\theta_{ij,O,V})\big)^2 - w_y \big(y_i - y_j + d_{ij,O,V} \sin(\theta_{ij,O,V})\big)^2$$
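To make the energy concrete, here is a minimal sketch that evaluates it for one fixed object and viewpoint. It assumes the unary term is either a precomputed template response or an occlusion weight, and that the pairwise term is a negative quadratic penalty on the deviation from the projected part offset (distance `d`, angle `theta`). All function names, the toy values, and the default weights are illustrative assumptions, not the authors' implementation.

```python
import math

def unary(score, occluded, alpha):
    """V1: template response if the part is visible, else the occlusion weight."""
    return alpha if occluded else score

def pairwise(li, lj, d, theta, wx=1.0, wy=1.0):
    """V2: negative quadratic penalty on deviation from the projected offset."""
    dx = li[0] - lj[0] + d * math.cos(theta)
    dy = li[1] - lj[1] + d * math.sin(theta)
    return -wx * dx * dx - wy * dy * dy

def energy(label, locs, scores, occluded, alphas, edges):
    """E(Y, L, O, V, I) for one fixed object O and viewpoint V."""
    if label != 1:
        return 0.0  # background hypothesis has zero energy
    total = sum(unary(scores[i], occluded[i], alphas[i]) for i in range(len(locs)))
    for (i, j), (d, theta) in edges.items():
        total += pairwise(locs[i], locs[j], d, theta)
    return total

# Toy example: two parts, the second one self-occluded, offset matches exactly.
locs = [(10.0, 10.0), (20.0, 10.0)]
e_pos = energy(1, locs, scores=[1.5, 0.7], occluded=[False, True],
               alphas=[0.0, -0.2], edges={(0, 1): (10.0, 0.0)})
e_neg = energy(-1, locs, [1.5, 0.7], [False, True], [0.0, -0.2], {(0, 1): (10.0, 0.0)})
```

In the toy example the part offset equals the expected projection, so the pairwise penalty vanishes and the energy is the visible part's score plus the occluded part's weight.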
29
![Page 30: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/30.jpg)
Aspect Layout Model
•Training with Structural SVM [1]
$$\min_{\theta} \; \frac{1}{2}\|\theta\|^2 + \lambda \sum_{t=1}^{N} \Big[ \max_{Y,L,O,V} \big( \theta^T \Psi_{t,Y,L,O,V} + \Delta_{t,Y,L,O,V} \big) - \theta^T \Psi_{t,Y_t,L_t,O_t,V_t} \Big]$$
• Inference
  • Loop over discretized viewpoints
  • Run Belief Propagation [2] under each viewpoint to predict part locations
$$(Y^*, L^*, O^*, V^*) = \arg\max_{Y,L,O,V} E(Y, L, O, V, I \mid \theta)$$
30
[1] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In ICML, 2004.
[2] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. In Exploring artificial intelligence in the new millennium, 2003.
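The inference procedure (loop over discretized viewpoints, find the best part locations under each) can be sketched as follows. This is a simplified stand-in: it replaces belief propagation with an exhaustive search over a small candidate set per part, which gives the same maximizer on toy problems but does not scale. The names `infer`, `toy_score`, `expected_gap`, and `bias` are hypothetical.

```python
from itertools import product

def infer(viewpoints, candidates, score_fn):
    """Return (best_energy, best_viewpoint, best_part_locations)."""
    best = (float("-inf"), None, None)
    for v in viewpoints:                    # loop over discretized viewpoints
        for locs in product(*candidates):   # exhaustive stand-in for belief propagation
            e = score_fn(v, locs)
            if e > best[0]:
                best = (e, v, locs)
    return best

# Toy scoring: each viewpoint expects a different horizontal gap between two parts.
expected_gap = {0: 3, 90: 1}
bias = {0: 0.0, 90: 0.5}

def toy_score(v, locs):
    gap = locs[1][0] - locs[0][0]
    return bias[v] - abs(gap - expected_gap[v])

candidates = [[(0, 0), (1, 0)], [(2, 0), (3, 0)]]
result = infer([0, 90], candidates, toy_score)
```

The outer loop is what makes the viewpoint a discrete latent variable: each viewpoint fixes the expected part geometry, so the inner search only has to solve for part locations.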
![Page 31: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/31.jpg)
Aspect Layout Model
31
Cars from 3D Object dataset [Savarese & Fei-Fei ICCV’07]

| Method | Ours | [1] | [2] | [3] | [4] | [5] | [6] |
|---|---|---|---|---|---|---|---|
| Viewpoint (cars) | 93.4% | 85.4 | 85.3 | 81 | 70 | 67 | 48.5 |

Cars from EPFL dataset [Ozuysal et al. CVPR’09]

| Method | Ours | Ours - baseline | DPM [7] | [8] |
|---|---|---|---|---|
| Viewpoint (cars) | 64.8% | 58.1 | 56.6 | 41.6 |

Chairs, tables, sofas and beds from ImageNet [Deng et al. CVPR’09]

| Method | Ours | Ours - baseline | DPM [7] |
|---|---|---|---|
| Viewpoint | 63.4% | 34.0 | 49.5 |

[1] N. Payet and S. Todorovic. From contours to 3d object detection and pose estimation. In ICCV, 2011.
[2] D. Glasner, M. Galun, S. Alpert, R. Basri, and G. Shakhnarovich. Viewpoint-aware object detection and pose estimation. In ICCV, 2011.
[3] M. Stark, M. Goesele, and B. Schiele. Back to the future: Learning shape models from 3d cad data. In BMVC, 2010.
[4] J. Liebelt and C. Schmid. Multi-view object class detection with a 3D geometric model. In CVPR, 2010.
[5] H. Su, M. Sun, L. Fei-Fei, and S. Savarese. Learning a dense multiview representation for detection, viewpoint classification. In ICCV, 2009.
[6] M. Arie-Nachimson and R. Basri. Constructing implicit 3d shape models for pose estimation. In ICCV, 2009.
[7] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
[8] M. Ozuysal, V. Lepetit, and P. Fua. Pose estimation for category specific multiview object localization. In CVPR, 2009.
• Best results upon publication in pose estimation and 3D part estimation
![Page 32: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/32.jpg)
Aspect Layout Model
32
Prediction: a=225, e=30, d=7
Prediction: a=330, e=15, d=7
Prediction: a=150, e=15, d=7
Prediction: a=300, e=45, d=23
Prediction: a=45, e=90, d=5
Prediction: a=240, e=45, d=11
![Page 33: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/33.jpg)
Aspect Layout Model
33
ImageNet dataset [Deng et al. 2010]
Prediction: a=30, e=15, d=2.5
Prediction: a=0, e=15, d=1.5
Prediction: a=0, e=30, d=7
Prediction: a=345, e=15, d=3.5
Prediction: a=60, e=30, d=2.5
Prediction: a=315, e=30, d=2
Prediction: a=60, e=15, d=2
![Page 34: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/34.jpg)
Wrong examples
34
Prediction: a=45, e=15, d=1.5
Prediction: a=225, e=30, d=7
Prediction: a=0, e=30, d=7
Prediction: a=345, e=15, d=2.5
![Page 35: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/35.jpg)
Application I: Object Co-detection with 3D Aspect Parts
35
Single Image Detector
Co-detector
Shoe model
S. Bao, Y. Xiang and S. Savarese. Object Co-detection. In ECCV, 2012.
![Page 36: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/36.jpg)
Application II: Multiview Object Tracking with 3D Aspect Parts
36
Azimuth=315.48, Elevation=4.56, Distance=4.98
Azimuth=1.34, Elevation=2.78, Distance=6.58
Azimuth=25.17, Elevation=3.60, Distance=3.56
Azimuth=89.12, Elevation=3.73, Distance=2.34
Frame 9, Frame 29, Frame 48, Frame 69
Y. Xiang, C. Song, R. Mottaghi and S. Savarese. Monocular Multiview Object Tracking with 3D Aspect Parts. In ECCV, 2014.
![Page 37: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/37.jpg)
Application II: Multiview Object Tracking with 3D Aspect Parts
37
Compared trackers: MIL, L1, TLD, Struct
Ours: Multiview tracker
[MIL] Babenko, B., Yang, M.H., Belongie, S.: Robust object tracking with online multiple instance learning. TPAMI, 2011.
[L1] Bao, C., Wu, Y., Ling, H., Ji, H.: Real time robust l1 tracker using accelerated proximal gradient approach. In CVPR, 2012.
[TLD] Kalal, Z., Mikolajczyk, K., Matas, J.: Tracking-learning-detection. TPAMI, 2012.
[Struct] Hare, S., Saffari, A., Torr, P.H.: Struck: Structured output tracking with kernels. In ICCV, 2011.
![Page 38: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/38.jpg)
How to handle occlusion?
38
Occlusion changes the appearances of objects.
?
![Page 39: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/39.jpg)
3D Aspectlet Representation
39
3D Aspect Parts Atomic 3D Aspect Parts
Y. Xiang and S. Savarese. Object Detection by 3D Aspectlets and Occlusion Reasoning. In 3dRR, 2013.
![Page 40: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/40.jpg)
3D Aspectlet Representation
40
3D Aspectlets
![Page 41: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/41.jpg)
3D Aspectlet Representation
41
![Page 42: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/42.jpg)
Object Detection Experiments
42
Dataset Outdoor-scene Indoor-scene
% occlusion < 0.3 0.3 – 0.6 > 0.6 <0.2 0.2-0.4 >0.4
# images 66 68 66 77 111 112
ALM [1] 72.3 42.9 35.5 38.5 25.0 20.2
DPM [2] 75.9 58.6 44.6 38.0 22.9 21.9
Ours 3D Aspectlets 80.2 63.3 52.9 45.9 34.5 28.0
[1] Y. Xiang and S. Savarese. Estimating the aspect layout of object categories. In CVPR, 2012.[2] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
![Page 43: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/43.jpg)
Object Detection Experiments
43
Outdoor Scenes Indoor Scenes
![Page 44: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/44.jpg)
Outline
•3D Aspect Part Representation
•3D Voxel Pattern Representation
•A Benchmark for 3D Object Recognition in the Wild
•Conclusion and Future Work
44
![Page 45: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/45.jpg)
45
What are the 3D aspect parts for aeroplane and bottle?
![Page 46: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/46.jpg)
Data-Driven 3D Voxel Patterns
46
2D detection 3D pose Occlusion 3D location
Y. Xiang, W. Choi, Y. Lin and S. Savarese. Data-Driven 3D Voxel Patterns for Object Category Recognition. In CVPR, 2015.
![Page 47: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/47.jpg)
Training Pipeline Overview
47
1. Align 2D images with 3D CAD models
2. 3D voxel exemplars
3. 3D voxel patterns
4. Training 3D voxel pattern detectors
![Page 48: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/48.jpg)
1. Align 2D Images with 3D CAD Models
Projection of 3D CAD models
Depth ordering
3D annotations
3D CAD models
…
48
A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, 2012.
![Page 49: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/49.jpg)
2. Building 3D Voxel Exemplars
Depth ordering
2D mask labeling: occluded, visible, truncated
3D CAD model
Voxelization
3D voxel model: occluded, self-occluded, visible, truncated
49
![Page 50: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/50.jpg)
A 3D voxel exemplar
50
2. Building 3D Voxel Exemplars
$$E_i = (I_i, M_i, V_i)$$
where $I_i$ is the 2D image, $M_i$ the 2D segmentation mask, and $V_i$ the 3D voxel model.
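One way to picture a 3D voxel exemplar is as a small record holding the three components together. The sketch below is a hypothetical data structure: the label names follow the slide (visible, self-occluded, occluded, truncated), while the `EMPTY` label, the integer encoding, and the nested-list storage are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List

class Visibility(IntEnum):
    EMPTY = 0          # assumed label for voxels/pixels outside the object
    VISIBLE = 1
    SELF_OCCLUDED = 2
    OCCLUDED = 3
    TRUNCATED = 4

@dataclass
class VoxelExemplar:
    image: List[List[float]]        # I_i: 2D image patch
    mask: List[List[int]]           # M_i: 2D segmentation mask with visibility labels
    voxels: List[List[List[int]]]   # V_i: 3D voxel grid with visibility labels

def visible_fraction(exemplar):
    """Fraction of occupied voxels that are labeled visible."""
    labels = [v for plane in exemplar.voxels for row in plane for v in row]
    occupied = [v for v in labels if v != Visibility.EMPTY]
    if not occupied:
        return 0.0
    return sum(1 for v in occupied if v == Visibility.VISIBLE) / len(occupied)

example = VoxelExemplar(
    image=[[0.1, 0.2], [0.3, 0.4]],
    mask=[[1, 1], [3, 0]],
    voxels=[[[1, 1], [2, 3]], [[0, 0], [4, 1]]],
)
frac = visible_fraction(example)
```

Keeping the image, mask, and voxel grid in one record mirrors the triple above and makes it straightforward to carry the mask and 3D shape along when an exemplar is matched at test time.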
![Page 51: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/51.jpg)
3D Voxel Patterns (3DVPs)
Clustering in 3D voxel space
3D Voxel Exemplars
51
3. Discovering 3D Voxel Patterns
…
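Clustering in 3D voxel space needs some notion of similarity between two voxel models. The sketch below is an illustration only: it scores two flattened voxel grids by the fraction of voxels with matching labels and groups exemplars with a simple greedy threshold rule. Both the metric and the greedy procedure are stand-ins chosen for brevity; the slides do not specify the actual clustering algorithm or similarity measure.

```python
def voxel_similarity(a, b):
    """Fraction of voxels with identical labels in two equally sized flattened grids."""
    assert len(a) == len(b)
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def greedy_cluster(exemplars, threshold):
    """Group exemplars; the first member of each cluster acts as its representative."""
    clusters = []
    for idx, vox in enumerate(exemplars):
        for members in clusters:
            if voxel_similarity(exemplars[members[0]], vox) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])  # no similar cluster found, start a new one
    return clusters

# Three toy exemplars with labels 1=visible, 2=self-occluded, 3=occluded.
toy = [[1, 1, 2, 2], [1, 1, 2, 3], [3, 3, 3, 3]]
groups = greedy_cluster(toy, threshold=0.75)
```

Because the labels encode visibility as well as shape, exemplars end up grouped by joint appearance factors such as viewpoint and occlusion pattern rather than by shape alone.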
![Page 52: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/52.jpg)
4. Training 3D Voxel Pattern detectors
• Train an ACF detector for each 3DVP.
P. Dollár, R. Appel, S. Belongie, and P. Perona. Fast feature pyramids for object detection. TPAMI, 2014.
52
𝐷1
𝐷2
𝐷K
…
…
![Page 53: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/53.jpg)
Testing Pipeline Overview
Input 2D image
1. Apply 3DVP detectors
2. Transfer meta-data
3. Occlusion reasoning
4. Backproject to 3D
2D detection
2D segmentation
3D localization
53
![Page 54: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/54.jpg)
1. Apply 3DVP Detectors
54
![Page 55: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/55.jpg)
1. Apply 3DVP Detectors
55
![Page 56: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/56.jpg)
56
3D Voxel Patterns
2. Transfer Meta-Data
![Page 57: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/57.jpg)
57
2. Transfer Meta-Data
![Page 58: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/58.jpg)
3. Occlusion Reasoning
58
Occlusion reasoning: find a set of visibility-compatible detections
$$E = \sum_{i} \big(\psi_{\text{detection\_score}} + \psi_{\text{truncation}}\big) + \sum_{ij} \psi_{\text{occlusion}}$$
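Finding a visibility-compatible set of detections can be sketched as subset selection under this energy. The example below assumes the energy is maximized, folds the detection-score and truncation terms into one unary value per detection, and searches all subsets exhaustively, which is only feasible for a handful of detections. The function name and the toy potentials are hypothetical, and the slides do not state which optimizer is actually used.

```python
from itertools import combinations

def select_detections(unary, pairwise):
    """Return (best_energy, best_subset) over all subsets of detections.

    unary[i]        : detection-score plus truncation term of detection i
    pairwise[(i,j)] : occlusion compatibility of detections i < j (0 if absent)
    """
    n = len(unary)
    best_energy, best_subset = 0.0, ()   # the empty set has zero energy
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            e = sum(unary[i] for i in subset)
            e += sum(pairwise.get((i, j), 0.0) for i, j in combinations(subset, 2))
            if e > best_energy:
                best_energy, best_subset = e, subset
    return best_energy, best_subset

# Detections 0 and 1 claim incompatible visibility; 0 and 2 support each other.
best_e, best_set = select_detections(
    unary=[2.0, 1.5, 0.5],
    pairwise={(0, 1): -3.0, (0, 2): 0.2},
)
```

In the toy case the strongly negative occlusion term keeps detections 0 and 1 from being selected together, even though each has a good individual score.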
![Page 59: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/59.jpg)
3. Occlusion Reasoning
59
![Page 60: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/60.jpg)
3. Occlusion Reasoning
60
![Page 61: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/61.jpg)
4. 3D Localization
61
Backprojection
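Backprojection from the image to 3D can be illustrated with a standard pinhole camera model. The sketch assumes known intrinsics (`fx`, `fy`, `cx`, `cy`) and a known depth for the pixel; how the depth is obtained is not stated on the slide, so it is treated here as a given input. All parameter names and numbers are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) at the given depth to a 3D point in camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Toy example: a detection center at pixel (740, 250), 10 units from the camera.
point = backproject(740.0, 250.0, 10.0, fx=700.0, fy=700.0, cx=600.0, cy=180.0)
```

A pixel alone only defines a viewing ray; the depth (or an equivalent constraint such as known object size) is what pins the object to a single 3D location along that ray.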
![Page 62: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/62.jpg)
Car Detection and Orientation Estimation on KITTI
Object Detection (AP) and Object Detection and Orientation Estimation (AOS)

| Method | AP Easy | AP Moderate | AP Hard | AOS Easy | AOS Moderate | AOS Hard |
|---|---|---|---|---|---|---|
| ACF [1] | 55.89 | 54.77 | 42.98 | N/A | N/A | N/A |
| DPM [2] | 68.02 | 56.48 | 44.18 | 67.27 | 55.77 | 43.59 |
| DPM-VOC+VP [3] | 74.95 | 64.71 | 48.76 | 72.28 | 61.84 | 46.54 |
| OC-DPM [4] | 74.94 | 65.95 | 53.86 | 73.50 | 64.42 | 52.40 |
| SubCat [5] | 84.14 | 75.46 | 59.71 | 83.41 | 74.42 | 58.83 |
| Regionlets [6] | 84.75 | 76.45 | 59.70 | N/A | N/A | N/A |
| AOG [7] | 84.80 | 75.94 | 60.70 | 33.79 | 30.77 | 24.75 |
| Ours 3DVP | 84.81 | 73.02 | 63.22 | 84.31 | 71.99 | 62.11 |

[1] P. Dollár, R. Appel, S. Belongie, and P. Perona. Fast feature pyramids for object detection. TPAMI, 2014.
[2] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
[3] B. Pepik, M. Stark, P. Gehler, and B. Schiele. Multi-view and 3d deformable part models. TPAMI, 2015.
[4] B. Pepik, M. Stark, P. Gehler, and B. Schiele. Occlusion patterns for object class detection. In CVPR, 2013.
[5] E. Ohn-Bar and M. M. Trivedi. Learning to detect vehicles by clustering appearance patterns. T-ITS, 2015.
[6] X. Wang, M. Yang, S. Zhu, and Y. Lin. Regionlets for generic object detection. In ICCV, 2013.
[7] B. Li, T. Wu, and S.-C. Zhu. Integrating context and occlusion for car detection by hierarchical and-or model. In ECCV, 2014.
62
![Page 63: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/63.jpg)
Car Detection and Orientation Estimation on KITTI
Object Detection (AP) and Object Detection and Orientation Estimation (AOS)

| Method | AP Easy | AP Moderate | AP Hard | AOS Easy | AOS Moderate | AOS Hard |
|---|---|---|---|---|---|---|
| ACF [1] | 55.89 | 54.77 | 42.98 | N/A | N/A | N/A |
| DPM [2] | 68.02 | 56.48 | 44.18 | 67.27 | 55.77 | 43.59 |
| DPM-VOC+VP [3] | 74.95 | 64.71 | 48.76 | 72.28 | 61.84 | 46.54 |
| OC-DPM [4] | 74.94 | 65.95 | 53.86 | 73.50 | 64.42 | 52.40 |
| SubCat [5] | 84.14 | 75.46 | 59.71 | 83.41 | 74.42 | 58.83 |
| Regionlets [6] | 84.75 | 76.45 | 59.70 | N/A | N/A | N/A |
| AOG [7] | 84.80 | 75.94 | 60.70 | 33.79 | 30.77 | 24.75 |
| Ours 3DVP | 84.81 | 73.02 | 63.22 | 84.31 | 71.99 | 62.11 |
| Ours Occlusion | 87.46 | 75.77 | 65.38 | 86.92 | 74.59 | 64.11 |

[1] P. Dollár, R. Appel, S. Belongie, and P. Perona. Fast feature pyramids for object detection. TPAMI, 2014.
[2] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
[3] B. Pepik, M. Stark, P. Gehler, and B. Schiele. Multi-view and 3d deformable part models. TPAMI, 2015.
[4] B. Pepik, M. Stark, P. Gehler, and B. Schiele. Occlusion patterns for object class detection. In CVPR, 2013.
[5] E. Ohn-Bar and M. M. Trivedi. Learning to detect vehicles by clustering appearance patterns. T-ITS, 2015.
[6] X. Wang, M. Yang, S. Zhu, and Y. Lin. Regionlets for generic object detection. In ICCV, 2013.
[7] B. Li, T. Wu, and S.-C. Zhu. Integrating context and occlusion for car detection by hierarchical and-or model. In ECCV, 2014.
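The AOS columns combine detection quality with orientation accuracy. As a rough illustration of the orientation part, the sketch below computes the per-detection similarity term (1 + cos Δθ) / 2 used by the KITTI benchmark and averages it over a set of detections, counting unmatched detections as zero. It deliberately omits the averaging over recall levels, so it is not a full AOS implementation; the function name is hypothetical.

```python
import math

def orientation_similarity(angle_errors, is_true_positive):
    """Mean of (1 + cos(delta)) / 2 over detections; unmatched detections score 0."""
    if not angle_errors:
        return 0.0
    total = 0.0
    for delta, matched in zip(angle_errors, is_true_positive):
        if matched:
            total += (1.0 + math.cos(delta)) / 2.0
    return total / len(angle_errors)

# Perfect, 90-degree-off, and 180-degree-off orientation estimates.
s_all = orientation_similarity([0.0, math.pi / 2, math.pi], [True, True, True])
s_fp = orientation_similarity([0.0, 0.0], [True, False])
```

Since each term is at most 1 and false positives contribute 0, a method's AOS cannot exceed its AP, which is why a detector with strong AP but poor orientation estimates shows a large gap between the two groups of columns.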
63
![Page 64: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/64.jpg)
Apply 3DVPs in CNN-based Object Detection
• 3DVPs as subcategories
• Two-stage detection framework
64
Pipeline (figure): An input image → Region proposals → CNN → Car, Cat, Dog, Person, …
• R. Girshick et al., CVPR’14
• R. Girshick, ICCV’15
• S. Ren et al., NIPS’15
• S. Gidaris and N. Komodakis, CoRR’15
![Page 65: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/65.jpg)
Subcategory-aware Region Proposal Network
65
Pipeline (figure): Input image (image pyramid) → Conv layers (feature extraction) → Feature map → Subcategory conv filters → Heatmaps → Region proposals
RoI Generating Layer
• Training: hard positives and hard negatives
• Testing: high score boxes
Feature Extrapolating Layer
• Generate features in nearby scales by extrapolating
arXiv:1604.04693
![Page 66: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/66.jpg)
66
Subcategory-aware Detection Network
arXiv:1604.04693
Pipeline (figure): Input image → 5 conv layers (feature extraction) → RoI pooling layer [1], applied to the region proposals → FC6 (4096) → FC7 (4096) → FC8 (K+1)
Losses:
• Class loss
• Bounding box regression loss
• Subcategory classification loss
[1] R. Girshick. Fast R-CNN. ICCV, 2015.
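One way the three losses on this slide could be combined into a single training objective is sketched below. It assumes softmax cross-entropy for the two classification terms, the smooth L1 penalty from Fast R-CNN [1] for box regression, equal weights, and label 0 as background. None of these details are stated on the slide, and all function names are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Negative log-probability of `label` under a softmax over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def smooth_l1(pred, target):
    """Smooth L1 penalty summed over box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def multitask_loss(class_logits, class_label,
                   subcat_logits, subcat_label,
                   box_pred, box_target):
    """Sum of class, subcategory and box regression losses for one RoI.

    Assumption: label 0 is background, and background RoIs skip the
    box regression term. Equal loss weights are also an assumption.
    """
    loss = softmax_cross_entropy(class_logits, class_label)
    loss += softmax_cross_entropy(subcat_logits, subcat_label)
    if class_label != 0:
        loss += smooth_l1(box_pred, box_target)
    return loss
```

Summing the terms lets a single backward pass update the shared convolutional features for all three tasks.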
![Page 67: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/67.jpg)
Car Detection and Orientation Estimation on KITTI
AP: Object Detection. AOS: Object Detection and Orientation Estimation.

| Method | AP Easy | AP Moderate | AP Hard | AOS Easy | AOS Moderate | AOS Hard |
|---|---|---|---|---|---|---|
| ACF [1] | 55.89 | 54.77 | 42.98 | N/A | N/A | N/A |
| DPM [2] | 68.02 | 56.48 | 44.18 | 67.27 | 55.77 | 43.59 |
| DPM-VOC+VP [3] | 74.95 | 64.71 | 48.76 | 72.28 | 61.84 | 46.54 |
| OC-DPM [4] | 74.94 | 65.95 | 53.86 | 73.50 | 64.42 | 52.40 |
| SubCat [5] | 84.14 | 75.46 | 59.71 | 83.41 | 74.42 | 58.83 |
| Regionlets [6] | 84.75 | 76.45 | 59.70 | N/A | N/A | N/A |
| AOG [7] | 84.80 | 75.94 | 60.70 | 33.79 | 30.77 | 24.75 |
| Mono3D [8] | 92.33 | 88.66 | 78.96 | 91.01 | 86.62 | 76.84 |
| Ours 3DVP | 84.81 | 73.02 | 63.22 | 84.31 | 71.99 | 62.11 |
| Ours Occlusion | 87.46 | 75.77 | 65.38 | 86.92 | 74.59 | 64.11 |
| Ours CNN | 90.86 | 89.32 | 79.33 | 90.72 | 88.97 | 78.83 |

[1] P. Dollár, R. Appel, S. Belongie, and P. Perona. Fast feature pyramids for object detection. TPAMI, 2014.
[2] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
[3] B. Pepik, M. Stark, P. Gehler, and B. Schiele. Multi-view and 3d deformable part models. TPAMI, 2015.
[4] B. Pepikj, M. Stark, P. Gehler, and B. Schiele. Occlusion patterns for object class detection. In CVPR, 2013.
[5] E. Ohn-Bar and M. M. Trivedi. Learning to detect vehicles by clustering appearance patterns. T-ITS, 2015.
[6] X. Wang, M. Yang, S. Zhu, and Y. Lin. Regionlets for generic object detection. In ICCV, 2013.
[7] B. Li, T. Wu, and S.-C. Zhu. Integrating context and occlusion for car detection by hierarchical and-or model. In ECCV, 2014.
[8] X. Chen, K. Kundu, Z. Zhang, H. Ma, S. Fidler, and R. Urtasun. Monocular 3D object detection for autonomous driving. In CVPR, 2016.
67
Detection: Rank 3; Pose: Rank 1
![Page 68: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/68.jpg)
3D Voxel Patterns from PASCAL3D+ [1]
68
12 Rigid Categories
[1] Y. Xiang, R. Mottaghi, and S. Savarese. Beyond PASCAL: A benchmark for 3D object detection in the wild. In WACV, 2014.
![Page 69: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/69.jpg)
Detection and Pose Estimation on PASCAL3D+
69
| Method | Detection (AP) |
|---|---|
| DPM [1] | 29.6 |
| R-CNN [2] | 56.9 |
| Ours CNN | 60.7 |

| Method | 4 Views (AVP) | 8 Views (AVP) | 16 Views (AVP) | 24 Views (AVP) |
|---|---|---|---|---|
| VDPM [3] | 19.5 | 18.7 | 15.6 | 12.1 |
| DPM-VOC+VP [4] | 24.5 | 22.2 | 17.9 | 14.4 |
| Ours CNN | 47.5 | 31.9 | 24.5 | 19.3 |

[1] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
[2] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524, 2013.
[3] Y. Xiang, R. Mottaghi, and S. Savarese. Beyond PASCAL: A benchmark for 3D object detection in the wild. In WACV, 2014.
[4] B. Pepik, M. Stark, P. Gehler, and B. Schiele. Multi-view and 3d deformable part models. TPAMI, 2015.
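AVP (average viewpoint precision) counts a detection as correct only when both the bounding box and the discretized viewpoint are right, so a finer viewpoint discretization makes the test stricter. The sketch below illustrates that check under stated assumptions: equal-width azimuth bins with the first bin centered on 0°, and an IoU threshold of 0.5. The function names are hypothetical and this is not the benchmark's evaluation code.

```python
def azimuth_to_bin(azimuth_deg, num_views):
    """Map an azimuth in degrees to one of `num_views` equal-width bins.

    Assumption: bin 0 is centered on 0 degrees, so it covers
    [-width/2, width/2) and wraps around at 360.
    """
    width = 360.0 / num_views
    shifted = (azimuth_deg % 360.0) + width / 2.0
    return int(shifted // width) % num_views


def is_correct(pred_azimuth, gt_azimuth, iou, num_views, iou_threshold=0.5):
    """True if the box overlaps enough and the viewpoint bins match."""
    same_view = (azimuth_to_bin(pred_azimuth, num_views)
                 == azimuth_to_bin(gt_azimuth, num_views))
    return iou > iou_threshold and same_view
```

For example, a prediction of 10° against a ground truth of 40° falls in the same bin with 4 views but in different bins with 24 views, so the same detection passes the coarse test and fails the fine one.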
![Page 70: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/70.jpg)
70
![Page 71: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/71.jpg)
71
![Page 72: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/72.jpg)
72
![Page 73: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/73.jpg)
73
![Page 74: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/74.jpg)
74
![Page 75: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/75.jpg)
Application: Online Multi-Object Tracking
75
Y. Xiang, A. Alahi and S. Savarese. Learning to Track: Online Multi-Object Tracking by Decision Making. In ICCV, 2015.
![Page 76: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/76.jpg)
Outline
•3D Aspect Part Representation
•3D Voxel Pattern Representation
•A Benchmark for 3D Object Recognition in the Wild
•Conclusion and Future Work
76
![Page 77: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/77.jpg)
•Build a large-scale dataset for 3D object recognition in the wild
Viewpoint parameters (figure labels): azimuth, elevation, distance
77
Goal
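The three viewpoint parameters on this slide (azimuth, elevation, distance) locate the camera on a sphere around the object. A minimal sketch of that relationship follows, assuming a standard spherical convention in which the azimuth is measured in the x-y plane from the x axis and the elevation from that plane towards z. The axis convention and the function name are assumptions for illustration; a particular dataset may define them differently.

```python
import math


def camera_center(azimuth_deg, elevation_deg, distance):
    """Camera position in object coordinates for a given viewpoint.

    Assumed convention: azimuth rotates about the z axis starting from
    the x axis, elevation lifts the camera from the x-y plane towards z.
    """
    a = math.radians(azimuth_deg)
    e = math.radians(elevation_deg)
    x = distance * math.cos(e) * math.cos(a)
    y = distance * math.cos(e) * math.sin(a)
    z = distance * math.sin(e)
    return (x, y, z)
```

Whatever the axis convention, the returned point always lies at the given distance from the object center, so the three numbers fully describe where the camera sits relative to the object.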
![Page 78: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/78.jpg)
3D Object Dataset
[1] S. Savarese and L. Fei-Fei. 3d generic object categorization, localization and pose estimation. In ICCV, 2007.
| Dataset | #category | #instance | Non-centered objects | Dense viewpoint | 3D Shape |
|---|---|---|---|---|---|
| 3D Object [1] | 10 | 100 | | | |
78
![Page 79: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/79.jpg)
EPFL Car Dataset
[2] M. Ozuysal, V. Lepetit, and P. Fua. Pose estimation for category specific multiview object localization. In CVPR, 2009.
| Dataset | #category | #instance | Non-centered objects | Dense viewpoint | 3D Shape |
|---|---|---|---|---|---|
| 3D Object [1] | 10 | 100 | | | |
| EPFL Car [2] | 1 | 20 | | | |
79
![Page 80: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/80.jpg)
PASCAL VOC Dataset
[3] M. Everingham, L. Van Gool, C. K. I.Williams, J.Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. IJCV, 2010.
| Dataset | #category | #instance | Non-centered objects | Dense viewpoint | 3D Shape |
|---|---|---|---|---|---|
| 3D Object [1] | 10 | 100 | | | |
| EPFL Car [2] | 1 | 20 | | | |
| PASCAL VOC [3] | 20 | 27,450 | | | |
80
![Page 81: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/81.jpg)
KITTI Dataset
[4] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In CVPR, 2012.
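| Dataset | #category | #instance | Non-centered objects | Dense viewpoint | 3D Shape |
|---|---|---|---|---|---|
| 3D Object [1] | 10 | 100 | | | |
| EPFL Car [2] | 1 | 20 | | | |
| PASCAL VOC [3] | 20 | 27,450 | | | |
| KITTI [4] | 3 | 80,256 | | | |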
81
![Page 82: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/82.jpg)
Our Contribution: PASCAL3D+
82
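| Dataset | #category | #instance | Non-centered objects | Dense viewpoint | 3D Shape |
|---|---|---|---|---|---|
| 3D Object [1] | 10 | 100 | | | |
| EPFL Car [2] | 1 | 20 | | | |
| PASCAL VOC [3] | 20 | 27,450 | | | |
| KITTI [4] | 3 | 80,256 | | | |
| PASCAL3D+ (Ours) | 12 | 35,672 | | | |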
Y. Xiang, R. Mottaghi and S. Savarese. Beyond PASCAL: A benchmark for 3D object detection in the wild. In WACV, 2014.
![Page 83: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/83.jpg)
83
3D Annotation: 2D-3D Alignment
![Page 84: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/84.jpg)
84
PASCAL3D+: A Benchmark for 3D Object Recognition
![Page 85: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/85.jpg)
Our Contribution: ObjectNet3D
85
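| Dataset | #category | #instance | Non-centered objects | Dense viewpoint | 3D Shape |
|---|---|---|---|---|---|
| 3D Object [1] | 10 | 100 | | | |
| EPFL Car [2] | 1 | 20 | | | |
| PASCAL VOC [3] | 20 | 27,450 | | | |
| KITTI [4] | 3 | 80,256 | | | |
| PASCAL3D+ (Ours) | 12 | 35,672 | | | 79 |
| ObjectNet3D (Ours) | 100 | 178,633 | | | 44,147 |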
Under Review
![Page 86: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/86.jpg)
86
ObjectNet3D: A Large Scale Database for 3D Object Recognition
Images from ImageNet
3D Shapes from 3D Warehouse and ShapeNet
2D-3D alignment
100 rigid object categories
![Page 87: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/87.jpg)
Annotation Demo
87
![Page 88: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/88.jpg)
Viewpoint Distribution
Panels (figure): bed, car, cup, hammer, laptop, mouse, shoe, teapot
88
![Page 89: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/89.jpg)
Image-based 3D Shape Retrieval
89
H. O. Song, Y. Xiang, S. Jegelka and S. Savarese. Deep Metric Learning via Lifted Structured Feature Embedding. In CVPR, 2016.
![Page 90: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/90.jpg)
Conclusion
•3D Aspect Part Representation
•3D Voxel Pattern Representation
•A Benchmark for 3D Object Recognition in the Wild
  • PASCAL3D+ and ObjectNet3D
90
![Page 91: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/91.jpg)
Future Work: Generalization of Object Recognition
91
Training Data (with annotations)
Testing Data (with few annotations or even no annotation)
• How to achieve generalization across domains?
• Can 3D object representations improve generalization?
• Can we find better 3D object representations for recognition?
![Page 92: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/92.jpg)
92
Future Work: Joint Recognition and Reconstruction
Utilize large scale 3D shape data
ShapeNet
Karsch et al., CVPR’13
![Page 93: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/93.jpg)
93
Future Work: End-to-End Multi-Object Tracking
Tracking by Detection: Video frames → Object Detector → Object Tracker → Object Trajectories
Learning to Detect and Track: Video frames → Object Detection and Tracking → Object Trajectories
![Page 94: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/94.jpg)
• 3D object recognition and scene geometry understanding
• Holistic 3D scene understanding
94
Future Work: Putting Objects in the Scene
![Page 95: 3D Object Representations for Recognition16 3DDPM Pepik et al., CVPR’12 2D detection 3D pose Occlusion 3D location • Yan et al., ICCV’07 • Hoiem et al., CVPR’07 • Liebelt](https://reader033.vdocuments.us/reader033/viewer/2022053018/5f1ff64af13c0645a11ff731/html5/thumbnails/95.jpg)
Conclusion
•3D aspect part representation
•3D voxel pattern representation
•A Benchmark for 3D Object Recognition in the Wild
95
Thank you!