large-scale image parsing

58
LARGE-SCALE IMAGE PARSING Joseph Tighe and Svetlana Lazebnik University of North Carolina at Chapel Hill road building car sky

Upload: alagan

Post on 24-Feb-2016

30 views

Category:

Documents


0 download

DESCRIPTION

sky. building. car. Large-Scale Image Parsing. road. Joseph Tighe and Svetlana Lazebnik University of North Carolina at Chapel Hill. Small-scale image parsing Tens of classes, hundreds of images. Figure from Shotton et al. (2009). - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Large-Scale Image  Parsing

LARGE-SCALE IMAGE PARSING

Joseph Tighe and Svetlana LazebnikUniversity of North Carolina at Chapel Hill

road

building

car

sky

Page 2: Large-Scale Image  Parsing

Small-scale image parsingTens of classes, hundreds of images

He et al. (2004), Hoiem et al. (2005), Shotton et al. (2006, 2008, 2009), Verbeek and Triggs (2007), Rabinovich et al. (2007), Galleguillos et al. (2008), Gould et al. (2009), etc.

Figure from Shotton et al. (2009)

Page 3: Large-Scale Image  Parsing

Large-scale image parsingHundreds of classes, tens of thousands of images

build

ing floor seawater

sand

perso

nsky

scrap

er sign

mirror

pillow

founta

inflo

wersho

pcou

nter to

ppa

per

furnit

urecrane pot

arcad

ebri

dge

windshi

eld brick

clock

drawer fan

dishw

asher

vase

closet

hand

lebo

ttleou

tlet

bag

tail li

ght

lights

witch

02000000400000060000008000000

1000000012000000

Non-uniform class frequencies

Page 4: Large-Scale Image  Parsing

Large-scale image parsingHundreds of classes, tens of thousands of images

Evolving training set

http://labelme.csail.mit.edu/

Non-uniform class frequencies

Page 5: Large-Scale Image  Parsing

Challenges What’s considered important for small-

scale image parsing? Combination of local cues Multiple segmentations, multiple scales Context

How much of this is feasible for large-scale, dynamic datasets?

Page 6: Large-Scale Image  Parsing

Our first attempt: A nonparametric approach

Lazy learning: do (almost) nothing up front

To parse (label) an image we will: Find a set of similar images Transfer labels from the similar images by

matching pieces of the image (superpixels)

Page 7: Large-Scale Image  Parsing

Finding Similar Images

Page 8: Large-Scale Image  Parsing

Ocean

Open Field

Highway

Street

Forest

Mountain

Inner City

Tall BuildingWhat is depicted in this image?

Which image is most similar?

Then assign the label from the most similar image

Page 9: Large-Scale Image  Parsing

Pixels are a bad measure of similarity

Most similar according to pixel distance Most similar according to “Bag of Words”

Page 10: Large-Scale Image  Parsing

Origin of the Bag of Words model

Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)

US Presidential Speeches Tag Cloudhttp://chir.ag/phernalia/preztags/

Page 11: Large-Scale Image  Parsing

What are words for an image?

Page 12: Large-Scale Image  Parsing
Page 13: Large-Scale Image  Parsing
Page 14: Large-Scale Image  Parsing
Page 15: Large-Scale Image  Parsing
Page 16: Large-Scale Image  Parsing

Wing Tail

WheelBuildingPropeller

Page 17: Large-Scale Image  Parsing

Wing TailWheelBuildingPropeller Jet Engine

Page 18: Large-Scale Image  Parsing

Wing TailWheelBuildingPropeller Jet Engine

Page 19: Large-Scale Image  Parsing

Wing TailWheelBuildingPropeller Jet Engine

Page 20: Large-Scale Image  Parsing

But where do the words come from?

Page 21: Large-Scale Image  Parsing
Page 22: Large-Scale Image  Parsing

Then where does the dictionary come from?

Page 23: Large-Scale Image  Parsing

Example Dictionary

Source: B. Leibe

Page 24: Large-Scale Image  Parsing

Another dictionary

………

Source: B. Leibe

Page 25: Large-Scale Image  Parsing

Fei-Fei et al. 2005

Page 26: Large-Scale Image  Parsing

Outline of the Bag of Words method

Divide the image into patches Assign a “word” for each patch Count the number of occurrences of

each “word” in the image

Page 27: Large-Scale Image  Parsing

Does this work for our problem?

65,536 Pixels 256 Dimensions

Page 28: Large-Scale Image  Parsing

Which look the most similar?

Page 29: Large-Scale Image  Parsing

Which look the most similar?

building

roadcar

sky

building

roadcar

sky

building

roadcar

sky

building

roadcar

sky

building

building

roadcar

sky

tree

sky

sky sky

sky

tree

tree

buildingsand

mountain

carroad

Page 30: Large-Scale Image  Parsing

Step 1: Scene-level matchingGist

(Oliva & Torralba, 2001)

Spatial Pyramid(Lazebnik et al.,

2006)Color

Histogram

Retrieval set: Source of possible labelsSource of region-level

matches

Page 31: Large-Scale Image  Parsing

Step 2: Region-level matching

Page 32: Large-Scale Image  Parsing

Step 2: Region-level matching

Superpixels(Felzenszwalb & Huttenlocher, 2004)

Page 33: Large-Scale Image  Parsing

Step 2: Region-level matching

Snow

Road

Tree

BuildingSky

Pixel Area (size)

Page 34: Large-Scale Image  Parsing

Road

Sidewalk

Step 2: Region-level matching

Absolute mask(location)

Page 35: Large-Scale Image  Parsing

Step 2: Region-level matching

Road

SkySnowSidewalk

Texture

Page 36: Large-Scale Image  Parsing

Step 2: Region-level matching

Building

Sidewalk

Road

Color histogram

Page 37: Large-Scale Image  Parsing

Step 2: Region-level matching

Superpixels(Felzenszwalb & Huttenlocher, 2004)

Superpixel features

Page 38: Large-Scale Image  Parsing

Region-level likelihoods Nonparametric estimate of class-conditional

densities for each class c and feature type k:

Per-feature likelihoods combined via Naïve Bayes:

),(#))),(((#)|)((ˆ

cDcrfNcrfP ik

ik kth feature

type of ith region

Features of class c within some radius

of riTotal features of

class c in the dataset

k

iki crfPcrPfeatures

)|)((ˆ)|(ˆ

Page 39: Large-Scale Image  Parsing

Region-level likelihoods

Building Car Crosswalk

SkyWindowRoad

Page 40: Large-Scale Image  Parsing

Step 3: Global image labeling

How do we resolve issues like this?

sky

tree

sand

road

searoad

Original imageMaximum likelihood labeling sky

sand

sea

Page 41: Large-Scale Image  Parsing

Step 3: Global image labeling

Compute a global image labeling by optimizing a Markov random field (MRF) energy function:

i ji

jijiii cccccrLE,

),(][),(log)( cLikelihood score for region ri and label ci

Co-occurrence penalty

Vector of

region labels

Regions Neighboring regions

Smoothing penalty

Page 42: Large-Scale Image  Parsing

Step 3: Global image labeling

Compute a global image labeling by optimizing a Markov random field (MRF) energy function:

Maximum likelihood labeling

Edge penalties Final labeling Final edge penalties

road

building

carwindow

sky

road

building

car

sky

i ji

jijiii cccccrLE,

),(][),(log)( cLikelihood score for region ri and label ci

Co-occurrence penalty

Vector of

region labels

Regions Neighboring regions

Smoothing penalty

Page 43: Large-Scale Image  Parsing

Step 3: Global image labeling

Compute a global image labeling by optimizing a Markov random field (MRF) energy function:

sky

tree

sand

road

searoad

sky

sand

sea

Original imageMaximum likelihood labeling

Edge penalties MRF labeling

i ji

jijiii cccccrLE,

),(][),(log)( cLikelihood score for region ri and label ci

Co-occurrence penalty

Vector of

region labels

Regions Neighboring regions

Smoothing penalty

Page 44: Large-Scale Image  Parsing

Joint geometric/semantic labeling

Semantic labels: road, grass, building, car, etc. Geometric labels: sky, vertical, horizontal

Gould et al. (ICCV 2009)

sky

tree car

road

sky

horizontal

vertical

Original image Semantic labeling Geometric labeling

Page 45: Large-Scale Image  Parsing

Joint geometric/semantic labeling

Objective function for joint labeling:

ir

ii gcEEF regions

),()()(),( gcgc

Geometric/semantic consistency penalty

Semantic labels

Geometric labels

Cost of semantic labeling

Cost of geometric labeling

sky

tree car

road

sky

horizontal

vertical

Original image Semantic labeling Geometric labeling

Page 46: Large-Scale Image  Parsing

Example of joint labeling

Page 47: Large-Scale Image  Parsing

Understanding scenes on many levels

To appear at ICCV 2011

Page 48: Large-Scale Image  Parsing

DatasetsTraining images Test images Labels

SIFT Flow (Liu et al., 2009) 2,488 200 33

Barcelona (Russell et al., 2007) 14,871 279 170

LabelMe+SUN 50,424 300 232

Page 49: Large-Scale Image  Parsing

Datasets

wall

book

s... plate

chair

bed

keyb

oard

utensi

l

scree

nde

sk

cupbo

ardna

pkin

placem

at cuppic

ture

counte

r... lamp

toilet

1001000

10000100000

1000000

# o

f Su-

perp

ixel

s

buildi

ng tree

road car

window riv

er rock

sandde

sert

perso

nfen

ce

awnin

g

crossw

alk boat po

le cow moon

1001000

10000100000

1000000

Log

Scal

e (x

1000

)

Training images Test images Labels

SIFT Flow (Liu et al., 2009) 2,488 200 33

Barcelona (Russell et al., 2007) 14,871 279 170

LabelMe+SUN 50,424 300 232

Page 50: Large-Scale Image  Parsing

Overall performance

SIFT Flow Barcelona LabelMe + SUNSemantic Geom

.Semantic Geom. Semantic Geom.

Base 73.2 (29.1)

89.8 62.5 (8.0) 89.9 46.8 (10.7)

81.5

MRF 76.3 (28.8)

89.9 66.6 (7.6) 90.2 50.0 (9.1) 81.0

MRF + Joint 76.9 (29.4)

90.8 66.9 (7.6) 90.7 50.2 (10.5)

82.2LabelMe + SUN Indoor LabelMe + SUN OutdoorSemantic Geom. Semantic Geom.

Base 22.4 (9.5) 76.1 53.8 (11.0) 83.1MRF 27.5 (6.5) 76.4 56.4 (8.6) 82.3MRF + Joint 27.8 (9.0) 78.2 56.6 (10.8) 84.1

*SIFT Flow: 74.75

Page 51: Large-Scale Image  Parsing

Per-class classification rates

0%25%50%75%

100%SiftFlow Barcelona LM + Sun

0%25%50%75%

100%

Page 52: Large-Scale Image  Parsing

Results on SIFT Flow dataset

Page 53: Large-Scale Image  Parsing

55.3 92.2 93.6

Results on LM+SUN datasetImage Ground truth

Initial semantic Final semantic Final geometric

Page 54: Large-Scale Image  Parsing

58.9 93.057.3

Results on LM+SUN datasetImage Ground truth

Initial semantic Final semantic Final geometric

Page 55: Large-Scale Image  Parsing

11.6

0.0

60.3 93.0

Image Ground truth

Initial semantic Final semantic Final geometric

Results on LM+SUN dataset

Page 56: Large-Scale Image  Parsing

65.6 75.8 87.7

Image Ground truth

Initial semantic Final semantic Final geometric

Results on LM+SUN dataset

Page 57: Large-Scale Image  Parsing

Running times

SIFT Flow

Barcelona

dataset

Page 58: Large-Scale Image  Parsing

Conclusions Lessons learned

Can go pretty far with very little learning Good local features, and global (scene)

context is more important than neighborhood context

What’s missing A rich representation for

scene understanding The long tail Scalable, dynamic

learningroad

building

car

sky