scene understanding with 3d deep networksfunk/nips16.pdf · conv class conv 3d box softmax l1 sdf 1...
TRANSCRIPT
![Page 1: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/1.jpg)
Scene Understanding
with 3D Deep Networks
Thomas Funkhouser
Princeton University
![Page 2: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/2.jpg)
Disclaimer: I am talking about the work of these people …
Shuran Song Andy Zeng Fisher Yu
Angela Dai Matthias Niessner Matt FisherJianxiong Xiao
Maciej HalberYinda Zhang
![Page 3: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/3.jpg)
Goal
Understanding indoor scenes observed in RGB-D images
• Robotics
• Augmented reality
• Virtual tourism
• Surveillance
• Home remodeling
• Real estate
• Telepresence
• Forensics
• Games
• etc.
![Page 4: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/4.jpg)
Goal
Understanding indoor scenes observed in RGB-D images
Input RGB-D Image(s)
Semantic Segmentation
![Page 5: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/5.jpg)
Goal
Understanding indoor scenes observed in RGB-D images in 3D
3D Scene Understanding
Input RGB-D Image(s)
Semantic Segmentation
![Page 6: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/6.jpg)
Goal
Understanding indoor scenes observed in RGB-D images in 3D
• Surface reconstruction
• Amodal object detection
• Object relationships
• Materials, lights, etc.
• Physical properties
• Novel views
• Info sharing
• Spatial inference
• Simulation
• etc.
Semantic Segmentation
![Page 7: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/7.jpg)
Goal for This Talk
Learn ConvNets to recognize patterns in voxels
• Local shape descriptor
• Amodal object detection
• Semantic scene completion
![Page 8: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/8.jpg)
Talk Outline
Local shape descriptor
Amodal object detection
Semantic scene completion
Scale
Small
Large
![Page 9: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/9.jpg)
Talk Outline
Local shape descriptor
Amodal object detection
Semantic scene completion
Scale
Small
LargeA. Zeng, S. Song, M. Niessner, M. Fisher, J. Xiao, T. Funkhouser,
“3DMatch: Learning Local Geometric Descriptors from 3D Reconstructions,”
submitted to CVPR 2017
![Page 10: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/10.jpg)
Local Shape Descriptor
Goal: train a discriminating 3D local shape descriptor from data
Local shape descriptor Local shape descriptor
…0.58 0.21 0.92 0.67 0.04 0.53
Match!
0.58 0.21 0.92 0.67 0.04 0.53 …
![Page 11: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/11.jpg)
Local Shape Descriptor
Challenge: where to get training data?
![Page 12: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/12.jpg)
Local Shape Descriptor: “3D Match”
Approach: train on wide-baseline correspondences in RGB-D reconstructions
“Ground truth” match between
RGB-D Images from different views
![Page 13: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/13.jpg)
Local Shape Descriptor: “3D Match”
Approach: train on wide-baseline correspondences in RGB-D reconstructions
![Page 14: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/14.jpg)
Local Shape Descriptor: “3D Match”
Method: sample true/false correspondences from RGB-D reconstructions,
train Siamese network
![Page 15: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/15.jpg)
Local Shape Descriptor: “3D Match”
Result: learns to discriminate local shapes found in real-world data
![Page 16: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/16.jpg)
Local Shape Descriptor: “3D Match” Results
Result 1: learned feature descriptor predicts RGB-D point correspondences
more accurately than hand-tuned descriptors
Match classification error at 95% recall
Fragment Alignment Success Rate
![Page 17: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/17.jpg)
Local Shape Descriptor: “3D Match” Results
Result 2: feature descriptor learned from RGB-D reconstructions provides
matching for recognizing poses of small objects in Amazon Picking Challenge
Predicting pose of 3D object model in RGB-D scan
Object pose prediction accuracy
![Page 18: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/18.jpg)
Local Shape Descriptor: “3D Match” Results
Result 3: feature descriptor learned from RGB-D reconstructions provides
discriminative matching of semantic correspondences on 3D meshes
![Page 19: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/19.jpg)
Talk Outline
Local Shape Descriptor
Amodal object detection
Semantic scene completion
Scale
Small
LargeS. Song and J. Xiao,
“Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images,”
CVPR 2016
![Page 20: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/20.jpg)
Object Detection
Goal: given a RGB-D image, find objects (labeled 3D amodal bounding boxes)
Input: Single RGB-D Output: labeled 3D Amodal Boxes
![Page 21: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/21.jpg)
[CVPR13] Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
[IJCV14] Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and semantic segmentation
[ECCV14] Object Detection and Segmentation using Semantically Rich Image and Depth Features
[CVPR15] Aligning 3D Models to RGB-D Images of Cluttered Scenes
[CVPR16] Cross Modal Distillation for Supervision Transfer
2D Operations
2D Instance
Segmentation
Coarse Pose
Classification
Point Cloud
Alignment
2D Contour
Detection
2D Region
Proposal
2D Object
Detection
Encode Depth Map
as Extra Channels
3D Amodal
Detection Result
Depth Map
Image
3D Output3D Input 3D
Object Detection
Most previous work:
![Page 22: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/22.jpg)
3D Deep Learning
Object Detection: “Deep Sliding Shapes”
Approach:
3D Amodal
Detection Result
Depth Map
Image
3D Operations 3D Output3D Input
![Page 23: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/23.jpg)
bed
Object Detection: “Deep Sliding Shapes”
Object Recognition NetworkRegion Proposal Network
RGB-D Image
![Page 24: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/24.jpg)
bed
Object Detection: “Deep Sliding Shapes”
Object Recognition NetworkRegion Proposal Network
RGB-D Image
![Page 25: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/25.jpg)
Object Detection: “Deep Sliding Shapes”
Data encoding:
1) Estimate
major directions
of room
2) Compute
TSDF
![Page 26: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/26.jpg)
Object Detection: “Deep Sliding Shapes”
Data encoding:
1) Estimate
major directions
of room
2) Compute
TSDF
2.5 m
5.2 m
5.2 m
![Page 27: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/27.jpg)
Object Detection: “Deep Sliding Shapes”
Data encoding:
1) Estimate
major directions
of room
2) Compute
TSDF
![Page 28: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/28.jpg)
Region
Proposal
Network
TSDF 3D Region Proposals
Object Detection: “Deep Sliding Shapes”
3D region proposal network:
![Page 29: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/29.jpg)
×3
×50Pixel Area
Physical Size
Object Detection: “Deep Sliding Shapes”
3D region proposal network:
![Page 30: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/30.jpg)
Object Detection: “Deep Sliding Shapes”
Multiscale 3D region proposal network:
![Page 31: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/31.jpg)
Inp
ut:
TS
DF
Con
v 1
ReL
U +
Po
ol
Con
v 2
ReL
U +
Po
ol
Object Detection: “Deep Sliding Shapes”
Multiscale 3D region proposal network:
![Page 32: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/32.jpg)
Inp
ut:
TS
DF
Con
v 1
ReL
U +
Po
ol
Con
v 2
ReL
U +
Po
ol
Con
v 3
ReL
U +
Po
ol
Conv
Class
Conv
3D Box
Softmax
L1
Smooth
Object Detection: “Deep Sliding Shapes”
Multiscale 3D region proposal network:
Receptive field: 0.4 m3
![Page 33: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/33.jpg)
Level 1 Anchors
0.6×0.2×0.4 m
0.5×0.5×0.2 m
0.6×0.2×0.4 m
Inp
ut:
TS
DF
Con
v 1
ReL
U +
Po
ol
Con
v 2
ReL
U +
Po
ol
Con
v 3
ReL
U +
Po
ol
Conv
Class
Conv
3D Box
Softmax
L1
Smooth
Object Detection: “Deep Sliding Shapes”
Multiscale 3D region proposal network:
Receptive field: 0.4 m3
![Page 34: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/34.jpg)
Con
v 4
ReL
U +
Po
ol
Conv
Class
Conv
3D Box
Softmax
L1
Smooth
Inp
ut:
TS
DF
Con
v 1
ReL
U +
Po
ol
Con
v 2
ReL
U +
Po
ol
Con
v 3
ReL
U +
Po
ol
Conv
Class
Conv
3D Box
Softmax
L1
Smooth
Object Detection: “Deep Sliding Shapes”
Multiscale 3D region proposal network:
Receptive field: 1 m3Receptive field: 0.4 m3
![Page 35: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/35.jpg)
Level 2 Anchors
Con
v 4
ReL
U +
Po
ol
Conv
Class
Conv
3D Box
Softmax
L1
Smooth
Object Detection: “Deep Sliding Shapes”
Receptive field: 1 m3
![Page 36: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/36.jpg)
bed
Object Detection: “Deep Sliding Shapes”
Object Recognition NetworkRegion Proposal Network
RGB-D Image
![Page 37: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/37.jpg)
project to 2D
Object Detection: “Deep Sliding Shapes”
Joint object recognition network:
![Page 38: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/38.jpg)
TSDF
Image Patch
Object Detection: “Deep Sliding Shapes”
Joint object recognition network:
![Page 39: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/39.jpg)
Object Detection: “Deep Sliding Shapes”
Joint object recognition network:
![Page 40: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/40.jpg)
Co
nv 1
Re
LU
+ P
oo
l
Co
nv 2
Re
LU
+ P
oo
l
Co
nv 3
Re
LU
FC
2
2D VGG on ImageNet
Con
ca
ten
atio
n
FC
3
FC
Cla
ss
FC
3D
Bo
x
So
ftm
ax
L1
Sm
oo
th
3D ConvNet
Object Detection: “Deep Sliding Shapes”
Joint object recognition network:
![Page 41: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/41.jpg)
Object Detection: “Deep Sliding Shapes” Experiments
Train and test on amodal boxes provided in SUN RGB-D
S. Song, S. Lichtenberg, and J. Xiao, “SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite,” CVPR 2015
![Page 42: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/42.jpg)
2D Deep Learning
3D Deep Learning
3D Non-Deep Learning
Object Detection: “Deep Sliding Shapes” Results
Quantitative comparisons:
Object detection accuracy on NYU v2 dataset (mAP)
![Page 43: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/43.jpg)
Sliding Shapes: sofa Ours: bathtub
Object Detection: “Deep Sliding Shapes” Results
Qualitative comparisons:
![Page 44: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/44.jpg)
Sliding Shapes: chair Ours: sofa
Object Detection: “Deep Sliding Shapes” Results
Qualitative comparisons:
![Page 45: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/45.jpg)
Sliding Shapes: table Ours: bed
Object Detection: “Deep Sliding Shapes” Results
Qualitative comparisons:
![Page 46: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/46.jpg)
Sliding Shapes: miss Ours: table and chairs
Object Detection: “Deep Sliding Shapes” Results
Qualitative comparisons:
![Page 47: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/47.jpg)
Sliding Shapes: toilet Ours: garbage bin+bed
Object Detection: “Deep Sliding Shapes” Results
Qualitative comparisons:
![Page 48: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/48.jpg)
Talk Outline
Local Shape Descriptor
Amodal object detection
Semantic scene completion
Scale
Small
Large S. Song, F. Yu, A. Zeng, A. Chang, M. Savva, and T. Funkhouser,
“Semantic Scene Completion from a Single Depth Image,”
submitted to CVPR 2017
![Page 49: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/49.jpg)
Input: Single view depth map Output: Semantic scene completion
Semantic Scene Completion
Goal: given an RGB-D image, label all voxels by semantic class
![Page 50: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/50.jpg)
3D Scene
visible surface
free space
occluded space
outside view
outside room
Semantic Scene Completion
Goal: given an RGB-D image, label all voxels by semantic class
![Page 51: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/51.jpg)
visible surface
free space
occluded space
outside view
outside room
3D Scene
Semantic Scene Completion
Goal: given an RGB-D image, label all voxels by semantic class
![Page 52: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/52.jpg)
semantic scene completion
This paper
scene completion Firman et al.
surface segmentation Silberman et al.
The occupancy and the object identity
are tightly intertwined !
3D Scene
Semantic Scene Completion
Prior work: segmentation OR completion
![Page 53: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/53.jpg)
Prediction: N+1 classes
SSCNet
Input: Single view depth map Output: Semantic scene completion
3D ConvNet
Semantic Scene Completion: “SSCNet”
Approach: end-to-end deep network
![Page 54: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/54.jpg)
Semantic Scene Completion : “SSCNet”
![Page 55: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/55.jpg)
Semantic Scene Completion : “SSCNet”
![Page 56: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/56.jpg)
Encode 3D space using flipped TSDF
Semantic Scene Completion : “SSCNet”
![Page 57: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/57.jpg)
Encode 3D space using flipped TSDF
Voxel size: 0.02 m
Semantic Scene Completion : “SSCNet”
![Page 58: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/58.jpg)
Local geometry
Receptive field: 0.98 m
Semantic Scene Completion : “SSCNet”
![Page 59: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/59.jpg)
High-level 3D context
via big receptive field
provided by
dilated convolution
Receptive field: 2.26
Semantic Scene Completion : “SSCNet”
![Page 60: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/60.jpg)
Multi-scale aggregation
Receptive field: 0.98 m Receptive field:1.62 m Receptive field: 2.26 m
Semantic Scene Completion : “SSCNet”
![Page 61: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/61.jpg)
Semantic Scene Completion: “SSCNet” Experiments
Where to get training data?
![Page 62: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/62.jpg)
Semantic Scene Completion: “SSCNet” Experiments
Where to get training data?
No dense volumetric ground truth with semantic labels for a complete scene
SUN3D: No semantic labelsNYU: only visible surfaces
![Page 63: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/63.jpg)
Semantic Scene Completion: “SSCNet” Experiments
SUNCG dataset
![Page 64: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/64.jpg)
Semantic Scene Completion: “SSCNet” Experiments
SUNCG dataset
• 46K houses
• 50K floors
• 400K rooms
• 5.6M object instances
![Page 65: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/65.jpg)
Semantic Scene Completion: “SSCNet” Experiments
SUNCG dataset
synthetic camera views depth
ground truth
semantic scene
completion
![Page 66: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/66.jpg)
Semantic Scene Completion: “SSCNet” Experiments
SUNCG dataset
![Page 67: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/67.jpg)
Train on SUNCG Test on NYU
Semantic Scene Completion: “SSCNet” Experiments
![Page 68: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/68.jpg)
Semantic Scene Completion: “SSCNet” Results
Result: better than previous volumetric completion algorithms
Comparison to previous algorithms for volumetric completion
![Page 69: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/69.jpg)
Zhang et al.
Ground Truth
Ours(SSCNet)
Color Image Observed Surface
Firman et al.
![Page 70: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/70.jpg)
Semantic Scene Completion: “SSCNet” Results
Result: better than previous 3D model fitting algorithms
Comparison to previous algorithms for 3D model fitting
![Page 71: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/71.jpg)
Ours(SSCNet)Geiger and WangLin et al.
Color Image Observed Surface Ground Truth
![Page 72: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/72.jpg)
Ours(SSCNet)Geiger and WangLin et al.
Color Image Observed Surface Ground Truth
![Page 73: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/73.jpg)
Ours(SSCNet)Geiger and WangLin et al.
Color Image Observed Surface Ground Truth
![Page 74: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/74.jpg)
Summary
Three projects where ConvNets are trained to recognize patterns in voxels
with different …
• Tasks
• Scales
• Training data
• Loss functions
• Network architectures
• Training protocols
![Page 75: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/75.jpg)
Future Challenges
Acquiring larger data sets
Leveraging geometric structure
Leveraging semantic structure
Better integration RGB and D
Better surface parameterizations
Finer-grained categories
Higher resolution
etc.
![Page 76: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/76.jpg)
Future Challenges
Acquiring larger data sets
Leveraging geometric structure
Leveraging semantic structure
Better integration RGB and D
Better surface parameterizations
Finer-grained categories
Higher resolution
etc.
![Page 77: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/77.jpg)
Future Challenges
Acquiring larger data sets
Leveraging geometric structure
Leveraging semantic structure
Better integration RGB and D
Better surface parameterizations
Finer-grained categories
Higher resolution
etc.
1,500 surface reconstructions 36,213 labeled objects
A. Dai, A. Chang, M. Savva,
M. Halber, T. Funkhouser, and M. Niessner,
“ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,”
submitted to CVPR 2017.
![Page 78: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/78.jpg)
Future Challenges
Acquiring larger data sets
Leveraging geometric structure
Leveraging semantic structure
Better integration RGB and D
Better surface parameterizations
Finer-grained categories
Higher resolution
etc.
M. Halber, T. Funkhouser,
“Fine-to-Coarse Registration of RGB-D Scans,”
submitted to CVPR 2017
![Page 79: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/79.jpg)
Future Challenges
Acquiring larger data sets
Leveraging geometric structure
Leveraging semantic structure
Better integration RGB and D
Better surface parameterizations
Finer-grained categories
Higher resolution
etc.
M. Halber, T. Funkhouser,
“Fine-to-Coarse Registration of RGB-D Scans,”
submitted to CVPR 2017
![Page 80: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/80.jpg)
Future Challenges
Acquiring larger data sets
Leveraging geometric structure
Leveraging semantic structure
Better integration RGB and D
Better surface parameterizations
Finer-grained categories
Higher resolution
etc.
M. Halber, T. Funkhouser,
“Fine-to-Coarse Registration of RGB-D Scans,”
submitted to CVPR 2017
![Page 81: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/81.jpg)
Future Challenges
Acquiring larger data sets
Leveraging geometric structure
Leveraging semantic structure
Better integration RGB and D
Better surface parameterizations
Finer-grained categories
Higher resolution
etc.
Sleeping Area
ottoman
bed
sofadresser with mirror
dresser
nightstand
lamp
wall
dresser
dresser with mirror
Y. Zhang, M. Bai, J. Xiao, P. Kohli, and S. Izadi,
“DeepContext: Context-Encoding Neural Pathways
for 3D Holistic Scene Understanding,”
submitted to CVPR 2017
![Page 82: Scene Understanding with 3D Deep Networksfunk/nips16.pdf · Conv Class Conv 3D Box Softmax L1 SDF 1 l 2 l 3 l Smooth Conv Class Conv 3D Box Softmax L1 Smooth Object Detection: “Deep](https://reader034.vdocuments.us/reader034/viewer/2022050106/5f447dcff9036826f916552b/html5/thumbnails/82.jpg)
Acknowledgments
Princeton:• Angel Chang, Maciej Halber, Manolis Savva, Elena Sizikova,
Shuran Song, Jianxiong Xiao, Fisher Yu, Yinda Zhang, Andy Zeng
Collaborators:• Angela Dai, Matt Fisher, Matthias Niessner, Ersin Yumer
Data:• SUN3D, 7-Scenes, Analysis-by-Synthesis, NYU, Trimble, Planner5D
Funding:• Intel, NSF, Adobe
Thank You!