Indoor Segmentation and Support Inference from RGBD Images

Nathan Silberman, Derek Hoiem, Pushmeet Kohli, Rob Fergus

Goal: Infer Support for Every Region
- Lamp supported by nightstand
- Nightstand supported by floor

Given an image, we parse it into regions and determine, for each region, which other region supports it.

Why infer physical support?
- Interacting with objects may have physical consequences!

Why infer physical support: Recognition
- Object on top of desk
- Object hanging from file cabinet

Working with RGB+Depth
- Captured with Microsoft Kinect
- Restricted to indoor scenes
- Depth makes the problem easier: it gives access to scene geometry, so we can concentrate on representation

NYU Depth Dataset Version 2.0
- Newly collected, much larger than NYU Depth 1.0
- 464 scenes
- 1449 densely labeled frames
- Over 400,000 unlabeled frames
- Over 800 semantic classes
- Full videos available
- Larger variation in scenes
- Dense labels of much higher quality

http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html

High Quality Semantic Labels
- Example labels: Bed, Pillow 1, Pillow 2, Pillow 3, Headboard, Nightstand, Lamp, Window, Dresser, Picture 1, Picture 2, Picture 3, Wall, Wall 1, Doll 1, Doll 2, Floor
- All of the images have dense, high-quality object and instance labels

High Quality Support Labels
- Support from behind
- Support from below
- Support from hidden region

Overview: RGBD Image, Segmentation, Support Inference
We want to go from an RGBD image to a set of regions, a segmentation from which we can predict support relations.

Scene Parsing
- Panels: Input, Aligned Point Cloud, Surface Normals, Major Surfaces, Segmentation
- We fit surface normals and find the major surfaces
- We then segment the image into regions so that we can infer the support of the resulting regions
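The slides do not say how the surface normals are fit, so the following is only a minimal sketch of one common approach, under the assumption that the aligned point cloud is organized as a grid with one 3D point per pixel. It takes the cross product of central differences between neighboring points. The function name `estimate_normals` and the grid layout are hypothetical, not taken from the authors' code.

```python
import math


def _sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def estimate_normals(points):
    """Estimate unit normals for an organized point cloud.

    points: grid (list of rows) of (x, y, z) tuples, one per pixel.
    Returns a grid of unit normals; border pixels are left as None.
    The sign of each normal depends on the axis convention (assumption).
    """
    rows, cols = len(points), len(points[0])
    normals = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            du = _sub(points[r][c + 1], points[r][c - 1])  # horizontal tangent
            dv = _sub(points[r + 1][c], points[r - 1][c])  # vertical tangent
            n = _cross(du, dv)
            length = math.sqrt(sum(x * x for x in n))
            if length > 0:
                normals[r][c] = tuple(x / length for x in n)
    return normals


# A flat surface at constant depth: the normal points along the depth axis.
flat = [[(float(c), float(r), 2.0) for c in range(3)] for r in range(3)]
flat_normal = estimate_normals(flat)[1][1]
```

Pixels whose normals agree could then be grouped into candidate major surfaces; that grouping step is not shown here.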

Hierarchical Segmentation
- Hierarchical agglomerative segmentation
- Segmentation scheme similar to: D. Hoiem, A.N. Stein, A.A. Efros, and M. Hebert, "Recovering Occlusion Boundaries from a Single Image," ICCV 2007.
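To make the idea of agglomerative merging concrete, here is a toy sketch. It is not the scheme from the cited paper: the boundary strength here is simply the difference between the mean values of two adjacent regions (for example mean depth), which is an assumption made for illustration. The names `agglomerate` and `stop_at` are hypothetical.

```python
def agglomerate(means, sizes, edges, stop_at):
    """Greedily merge adjacent regions across the weakest boundary.

    means, sizes: dicts mapping region id -> mean value / pixel count.
    edges: iterable of (a, b) pairs of adjacent region ids.
    stop_at: stop once the weakest remaining boundary exceeds this value.
    Returns (merges, means): the merge order and the surviving regions.
    """
    means, sizes = dict(means), dict(sizes)
    edges = {frozenset(e) for e in edges}
    merges = []

    def strength(edge):
        a, b = tuple(edge)
        return abs(means[a] - means[b])  # toy boundary strength (assumption)

    while edges:
        weakest = min(edges, key=lambda e: (strength(e), sorted(e)))
        if strength(weakest) > stop_at:
            break
        keep, drop = sorted(weakest)
        # Merge `drop` into `keep`, updating the size-weighted mean.
        total = sizes[keep] + sizes[drop]
        means[keep] = (means[keep] * sizes[keep] + means[drop] * sizes[drop]) / total
        sizes[keep] = total
        del means[drop], sizes[drop]
        # Redirect edges that touched `drop`; discard the merged boundary.
        rewired = set()
        for e in edges:
            if e == weakest:
                continue
            e2 = frozenset(keep if r == drop else r for r in e)
            if len(e2) == 2:
                rewired.add(e2)
        edges = rewired
        merges.append((keep, drop))
    return merges, means


# Four regions in a row: two near depth 1 and two near depth 5.
merges, remaining = agglomerate(
    means={1: 1.0, 2: 1.1, 3: 5.0, 4: 5.2},
    sizes={1: 1, 2: 1, 3: 1, 4: 1},
    edges=[(1, 2), (2, 3), (3, 4)],
    stop_at=0.5,
)
```

Recording the merge order is what makes the result hierarchical: each prefix of `merges` corresponds to one level of the segmentation.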


Support Inference

Modeling Choice #1
All objects are supported by a single object, except the floor, which requires no support.
- Image regions: 1. Chair, 2. Picture, 3. Wall, 4. Floor
- (Inverted) tree representation
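The inverted tree can be stored as a map from each region to its single supporter. The sketch below uses the four regions named on the slide; the particular support links (chair on floor, picture on wall, wall on floor) are an assumption about the figure, and `support_chain` is a hypothetical helper.

```python
# Region ids as listed on the slide.
CHAIR, PICTURE, WALL, FLOOR = 1, 2, 3, 4

# Each region has exactly one supporter; the floor needs none (None).
# The specific links are assumed for illustration.
supported_by = {CHAIR: FLOOR, PICTURE: WALL, WALL: FLOOR, FLOOR: None}


def support_chain(supported_by, region):
    """Follow support links from a region until one needs no support."""
    chain = [region]
    seen = {region}
    while supported_by[chain[-1]] is not None:
        nxt = supported_by[chain[-1]]
        if nxt in seen:
            raise ValueError("support relations contain a cycle")
        chain.append(nxt)
        seen.add(nxt)
    return chain


picture_chain = support_chain(supported_by, PICTURE)
```

Because every region has one supporter, every chain is a single path; in this example all chains end at the floor.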

Modeling Choice #2
All objects are supported either by another region in the image or by a hidden region.
- Deodorant supported by counter
- Cabinet supported by hidden region

Modeling Choice #3
Every object is supported either from below or from behind.
- Deodorant supported from below
- Mirror supported from behind

Modeling Support: Structure Classes
Structure classes encode high-level prior knowledge about support: (1) Ground, (2) Furniture, (3) Prop, or (4) Structure.
Solving the full recognition problem is too hard; these high-level classes allow us to inject prior knowledge into support inference.

Modeling Support
Goal: for each region, infer:
- Supporting region
- Support type
- Structure class

The formal problem per image: find the supporting regions S, support types T, and structure classes M that minimize the energy E(S, T, M | I).

Joint Energy
The joint energy factorizes into three terms: local support, local structure class, and prior. Throughout, S_i is the supporting region, T_i the support type, and M_i the structure class of region i.

Local Support Energy
- Comes from a logistic regressor trained on pairwise features

Local Structure Class Energy
- Comes from a logistic regressor trained on features from each individual region

Prior (1/4): Transitions
- A region's structure class helps predict its support
- Figure labels: Structure, Furniture, Structure, Floor

Prior (2/4): Support Consistency
- Supporting regions should be nearby
- Note: vertical distance vs. horizontal distance

Prior (3/4): Ground Consistency
- A region requires no support if and only if its structure class is floor

Prior (4/4): Global Ground Consistency
- A region is unlikely to be the floor if another floor region is lower than it
- Figure labels: Floor, Floor, Floor, Prop
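The two ground priors can be read as simple checks on a candidate labeling. The sketch below only flags violations; it does not reproduce the actual prior energies, whose form and weights are not given in the slides. The data layout, the `tolerance` parameter, and both function names are assumptions.

```python
FLOOR = "floor"
NO_SUPPORT = None  # marks a region that requires no support


def ground_consistency_violations(support, structure_class):
    """Regions that break: 'no support if and only if class is floor'."""
    return [r for r in support
            if (support[r] is NO_SUPPORT) != (structure_class[r] == FLOOR)]


def global_ground_violations(structure_class, height, tolerance=0.05):
    """Floor-labeled regions that sit above another floor-labeled region.

    height: region -> height of the region's lowest point (assumed units: m).
    tolerance: slack before a higher region is flagged (assumed value).
    """
    floors = [r for r, c in structure_class.items() if c == FLOOR]
    if not floors:
        return []
    lowest = min(height[r] for r in floors)
    return [r for r in floors if height[r] > lowest + tolerance]


# A rug wrongly marked as needing no support while labeled as a prop.
support = {"floor": NO_SUPPORT, "rug": NO_SUPPORT, "lamp": "floor"}
classes = {"floor": FLOOR, "rug": "prop", "lamp": "prop"}
bad_ground = ground_consistency_violations(support, classes)

# A table top wrongly labeled as floor although a lower floor region exists.
classes2 = {"floor": FLOOR, "table_top": FLOOR, "lamp": "prop"}
heights = {"floor": 0.0, "table_top": 0.7, "lamp": 0.9}
bad_global = global_ground_violations(classes2, heights)
```

In a soft formulation each flagged region would add a penalty to the prior term rather than being rejected outright.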

Integer Program Formulation
- Relaxed to a linear program, which we solve with the Gurobi LP solver
- For 99% of images, the LP solution is within 1% of the global optimum

Experiments

Evaluating Support
Accuracy = (# of correctly labeled support relationships) / (# of total labeled support relationships)

Evaluation with features extracted from:
- Regions from ground truth labels
- Regions from segmentation
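The accuracy measure is a plain ratio over labeled support relationships. A minimal sketch follows, assuming each relationship is stored as a (supporter, support type) pair per region. The `match_type` switch, which decides whether the support type must also agree, is an assumption added for illustration, as are the example labels.

```python
def support_accuracy(predicted, labeled, match_type=True):
    """Fraction of labeled support relationships that are predicted correctly.

    predicted, labeled: dicts mapping region -> (supporter, support_type).
    match_type: if True, the support type must match as well (assumption).
    """
    if not labeled:
        raise ValueError("no labeled support relationships")
    correct = 0
    for region, (supporter, support_type) in labeled.items():
        if region not in predicted:
            continue  # a missing prediction counts as incorrect
        p_supporter, p_type = predicted[region]
        if p_supporter == supporter and (not match_type or p_type == support_type):
            correct += 1
    return correct / len(labeled)


labeled = {
    "lamp": ("nightstand", "below"),
    "nightstand": ("floor", "below"),
    "mirror": ("wall", "behind"),
    "cabinet": ("hidden", "below"),
}
predicted = {
    "lamp": ("nightstand", "below"),
    "nightstand": ("floor", "below"),
    "mirror": ("wall", "below"),     # right supporter, wrong type
    "cabinet": ("floor", "below"),   # wrong supporter
}
strict = support_accuracy(predicted, labeled)
relaxed = support_accuracy(predicted, labeled, match_type=False)
```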

Baseline #1: Image Plane Rules
- Heuristic: look at neighboring regions for support

Baseline #2: Structure Class Rules
- Heuristic: support is deterministic given structure classes (Floor, Furniture, Prop, Structure)

Baseline #3: Support Classifier
- Use only the output of the support classifier

Evaluating Support (Regions from Ground Truth Labels)

Examples of manually labeled regions. We evaluate with ground truth regions first. The support type says whether a region is supported from below or from behind.

Results: Ground Truth Regions
- Structure class legend: Floor, Furniture, Prop, Structure
- Prediction legend: correct prediction, incorrect prediction
- Support legend: support from below, support from behind, support from hidden region

Evaluating Support (Regions from Segmentation)
Examples of regions from segmentation.

Results: Automatically Segmented Regions
- Prediction legend: correct prediction, incorrect prediction
- Support legend: support from below, support from behind, support from hidden region
- Mistakes: structure class error (towel), rug as floor error

Conclusion
- Algorithm for inferring physical support
- Novel integer program formulation
- 3D cues for segmentation

Dataset: http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html
Code: http://cs.nyu.edu/~silberman/projects/indoor_scene_seg_sup.html

Formal problem per image, with R the number of regions:

\[
\{S^*, T^*, M^*\} = \arg\min_{S,T,M} E(S, T, M \mid I), \qquad S_i \in \{1..R, \text{hidden}, \text{ground}\}
\]

Joint energy, where S_i is the supporting region, T_i the support type, and M_i the structure class of region i:

\[
E(S, T, M) = \sum_{i=1}^{R} \left[ E_S(S_i, T_i) + E_M(M_i) \right] + E_P(S, T, M)
\]

The local support term is based on P(S_i, T_i) and the local structure class term on P(M_i).

Integer program:

\[
\begin{aligned}
\{s^*, m^*, w^*\} = \arg\min_{s,m,w} \quad & \sum_{i,j} \theta^{s}_{i,j}\, s_{i,j} + \sum_{i,u} \theta^{m}_{i,u}\, m_{i,u} + \sum_{i,j,u,v} \theta^{w}_{i,j,u,v}\, w^{u,v}_{i,j} \\
\text{s.t.} \quad & \sum_{j} s_{i,j} = 1, \quad \sum_{u} m_{i,u} = 1 \quad \forall i \\
& \sum_{j,u,v} w^{u,v}_{i,j} = 1 \quad \forall i \\
& s_{i,2R+1} = m_{i,1} \quad \forall i \\
& \sum_{u,v} w^{u,v}_{i,j} = s_{i,j} \quad \forall i, j \\
& \sum_{j,v} w^{u,v}_{i,j} \le m_{i,u} \quad \forall i, u \\
& s_{i,j},\; m_{i,u},\; w^{u,v}_{i,j} \in \{0, 1\} \quad \forall i, j, u, v
\end{aligned}
\]
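To see what the integer program selects, the sketch below solves a tiny instance by exhaustive enumeration. This is deliberately not the LP relaxation used in the talk, and it makes several simplifying assumptions: the pairwise cost depends only on the two structure classes, support type is omitted, a region may not support itself, and the last support option stands for "requires no support". All names and cost values are hypothetical.

```python
from itertools import product


def brute_force_support(theta_s, theta_m, theta_w, floor_class=0):
    """Exhaustively minimize unary plus pairwise costs on a toy instance.

    theta_s[i][j]: cost of region i choosing support option j; the last
                   option means "requires no support" (assumption).
    theta_m[i][u]: cost of region i taking structure class u.
    theta_w(u, v): cost of a class-u region resting on a class-v region
                   (simplified: independent of i and j).
    Returns (cost, S, M) for the cheapest feasible assignment.
    """
    n_regions = len(theta_s)
    n_options = len(theta_s[0])
    ground = n_options - 1
    n_classes = len(theta_m[0])
    best = (float("inf"), None, None)
    for S in product(range(n_options), repeat=n_regions):
        if any(S[i] == i for i in range(n_regions)):
            continue  # no self-support (assumption)
        for M in product(range(n_classes), repeat=n_regions):
            # Ground consistency: no support if and only if class is floor.
            if any((S[i] == ground) != (M[i] == floor_class)
                   for i in range(n_regions)):
                continue
            cost = sum(theta_s[i][S[i]] + theta_m[i][M[i]]
                       for i in range(n_regions))
            cost += sum(theta_w(M[i], M[S[i]])
                        for i in range(n_regions) if S[i] != ground)
            if cost < best[0]:
                best = (cost, S, M)
    return best


# Two regions; options are [region 0, region 1, no support];
# classes are [floor, prop]. Costs are made up for illustration.
theta_s = [[9.0, 2.0, 0.1],
           [0.2, 9.0, 3.0]]
theta_m = [[0.1, 2.0],
           [2.5, 0.3]]


def theta_w(u, v):
    """Cheap only when a prop (1) rests on the floor (0)."""
    return 0.0 if (u, v) == (1, 0) else 1.0


cost, S, M = brute_force_support(theta_s, theta_m, theta_w)
```

Picking exactly one support option and one class per region mirrors the two sum-to-one constraints, and the feasibility check mirrors the constraint tying the no-support option to the floor class. Enumeration grows exponentially with the number of regions, which is why a relaxation is needed in practice.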
