NAGARAJA et al.: SEARCHING FOR OBJECTS USING STRUCTURE IN INDOOR SCENES

Searching for Objects using Structure in Indoor Scenes - Supplementary Material

Varun K. Nagaraja [email protected]
Vlad I. Morariu [email protected]
Larry S. Davis [email protected]

University of Maryland, College Park, MD, USA.

1 Experiments and Results

Table 1 shows the operating point of the region classifier. The thresholds for the detectors are set based on the best F1 point on the validation curves. We work with 18 categories and do not include the box category, as its performance values are very low, with an average precision of 1.4%.

           per-category values (18 categories)                                                        mean
AP (%)     22.9 52.9 27.0 16.7 29.1  3.8 13.0 14.6 16.2  9.7 10.4 11.6 25.2 20.2 33.3 11.5 18.5 29.4  20.3
Prec (%)   87.5 79.8 52.7 65.1 69.8 16.7 37.8 41.7 56.0 65.7 75.0 48.0 57.5 78.6 73.6 44.1 27.8 90.9  59.3
Rec (%)    23.3 54.5 34.8 20.7 35.3 11.7 19.5 17.5 18.7 11.7 12.4 15.2 32.5 21.2 37.6 18.0 28.9 29.4  24.6
F1 (%)     36.8 64.8 41.9 31.4 46.9 13.8 25.7 24.7 28.0 19.8 21.2 23.1 41.5 33.3 49.7 25.5 28.3 44.4  34.8

Table 1: Operating characteristics of RCNN-depth. The table shows the performance of the region classification module RCNN-depth on the testing set of the NYU depth v2 dataset. We use the top 100 region proposals, and the threshold for the detectors is chosen such that we obtain the best F1 score on the validation set. RCNN-depth is called by our search strategy after exploring a region.

© 2015. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.
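The threshold selection described for Table 1 (pick the detector threshold with the best F1 on validation data) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy validation scores, and the choice to try one candidate threshold per distinct score are all assumptions made for the example. The last helper only restates the standard F1 formula and checks it against the first column of Table 1.

```python
def precision_recall_f1(scores, labels, threshold):
    """Precision, recall, and F1 when detections with score >= threshold are kept."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def best_f1_threshold(scores, labels):
    """Try each distinct score as a threshold and return the one with the highest F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: precision_recall_f1(scores, labels, t)[2])


def f1_from_percent(precision_pct, recall_pct):
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


# Hypothetical validation data for one category (1 = query object, 0 = not).
val_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
val_labels = [1, 1, 0, 1, 0, 0]
threshold = best_f1_threshold(val_scores, val_labels)

# First column of Table 1: Prec = 87.5, Rec = 23.3.
table_f1 = round(f1_from_percent(87.5, 23.3), 1)
```

On the toy data the selected threshold keeps the four highest-scoring detections, and the Table 1 check reproduces the reported F1 of 36.8 from the reported precision and recall.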




[Figure 1 plots: 18 panels, one per category (bathtub, bed, bookshelf, chair, counter, desk, door, dresser, garbage-bin, lamp, monitor, night-stand, pillow, sink, sofa, table, television, toilet). x-axis: Number of Image Regions (10 to 100); y-axis: Average Precision. Curves: Proposal Rank, Scene+Objects Context (RandBgndSubset), Scene+Objects Context (OEDBgndSubset).]

Figure 1: Scene+Objects Context: Comparison of background selection techniques. The search strategy trained with a background subset selected using determinant maximization performs as well as or better than the strategy trained with a randomly selected background subset. The main advantage of determinant-maximization-based subset selection is the repeatability of experiments, unlike random subset selection.
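To make the repeatability point concrete, here is a small sketch of deterministic subset selection by determinant maximization. The text does not say how the maximization is solved, so the greedy heuristic below is swapped in purely for illustration; the objective (determinant of a ridge-regularized sum of outer products), the `ridge` parameter, the function names, and the toy feature vectors are all assumptions.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def information_matrix(features, chosen, ridge):
    """ridge * I plus the sum of outer products x x^T over the chosen rows."""
    d = len(features[0])
    m = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for idx in chosen:
        x = features[idx]
        for i in range(d):
            for j in range(d):
                m[i][j] += x[i] * x[j]
    return m


def greedy_det_max_subset(features, k, ridge=1e-3):
    """Greedily add the sample that most increases the determinant (assumed heuristic)."""
    chosen = []
    remaining = list(range(len(features)))
    for _ in range(k):
        best = max(
            remaining,
            key=lambda i: determinant(information_matrix(features, chosen + [i], ridge)),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical 2-D features of five background regions.
background = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.1, 0.0], [0.1, 0.9]]
subset = greedy_det_max_subset(background, k=2)
```

Because nothing in the procedure is random, running it twice on the same features returns the same subset, whereas a random subset changes with the seed. On the toy data the two selected samples point in nearly orthogonal directions, which is what a determinant objective rewards.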


[Figure 2 plots: 18 panels, one per category (bathtub, bed, bookshelf, chair, counter, desk, door, dresser, garbage-bin, lamp, monitor, night-stand, pillow, sink, sofa, table, television, toilet). x-axis: Number of Image Regions (10 to 100); y-axis: Average Precision. Curves: Proposal Rank, Scene Context (RandBgndSubset), Scene Context (OEDBgndSubset).]

Figure 2: Scene Context: Comparison of background selection techniques. The performance of the classifier trained with a background subset selected using determinant maximization is comparable to that of the classifier trained with a random background subset. The main advantage of determinant-maximization-based subset selection is the repeatability of experiments, unlike random subset selection.
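The curves in Figures 1 and 2 plot average precision against the number of image regions processed. A rough sketch of how one point on such a curve could be computed is given below. The exact evaluation protocol is not spelled out here, so this assumes that only the first k regions of a search sequence are scored by the detector and that positives not yet reached count as misses; the function names and the toy scores are hypothetical.

```python
def average_precision(scores, labels, total_positives):
    """Mean of the precision values at each true positive, over all positives."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_positives if total_positives else 0.0


def ap_after_k_regions(order, scores, labels, k):
    """AP using only the first k regions of a search sequence (assumed protocol)."""
    processed = order[:k]
    total_positives = sum(labels)  # unprocessed positives count as misses
    return average_precision(
        [scores[i] for i in processed],
        [labels[i] for i in processed],
        total_positives,
    )


# Hypothetical detector scores and labels for six regions (1 = query class).
scores = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3]
labels = [1, 0, 0, 0, 1, 0]

# Two hypothetical search sequences over the same regions.
proposal_order = [1, 3, 5, 0, 2, 4]  # reaches the query objects late
context_order = [0, 4, 2, 1, 3, 5]   # reaches the query objects early

ap_proposal_3 = ap_after_k_regions(proposal_order, scores, labels, 3)
ap_context_3 = ap_after_k_regions(context_order, scores, labels, 3)
ap_proposal_6 = ap_after_k_regions(proposal_order, scores, labels, 6)
ap_context_6 = ap_after_k_regions(context_order, scores, labels, 6)
```

In this toy setup the two sequences reach the same average precision once all regions are processed, and differ only in how quickly they get there, which is the quantity the x-axis of the plots exposes.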


[Figure 3 images: four example scenes, each shown in four columns: Groundtruth, Proposal Rank, Scene Context, Scene+Objects Context. Boxes in the images carry class labels.]

(a) Searching for lamp. Number of regions processed = 35 (box labels: lamp, night-stand, chair)
(b) Searching for table. Number of regions processed = 15 (box labels: table, chair)
(c) Searching for chair. Number of regions processed = 45 (box labels: chair, table, sofa)
(d) Searching for counter. Number of regions processed = 20 (box labels: counter, sink)

Figure 3: Search results for different queries. We compare three strategies: the ranked sequence obtained from the region proposal technique (unaware of the query class), the ranked sequence obtained from a classifier trained for a query class using scene context features alone, and the sequence produced by a search strategy trained for a query class using both scene context and object-object context features. Red boxes indicate regions labeled as the query class, yellow boxes indicate regions labeled as classes other than the query class, and blue boxes indicate regions labeled as background. The images show the state of each method's search sequence after a given number of regions has been processed. Our strategy, which uses both scene context and object-object context, can locate an object of the query class earlier than the other methods.