using the forest to see the trees: a computational model relating features, objects and scenes
DESCRIPTION
Using the Forest to see the Trees: A computational model relating features, objects and scenes. Antonio Torralba CSAIL-MIT. Joint work with. Aude Oliva, Kevin Murphy, William Freeman Monica Castelhano, John Henderson. SceneType 2 {street, office, …}. S. O 1. O 1. O 1. O 1. O 2. O 2. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/1.jpg)
Using the Forest to see the Trees: A computational model relating
features, objects and scenes
Antonio Torralba
CSAIL-MIT
Joint work with
Aude Oliva, Kevin Murphy, William Freeman
Monica Castelhano, John Henderson
![Page 2: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/2.jpg)
From objects to scenes
ImageI
Local features L L L L
O2 O2 O2 O2
SSceneType 2 {street, office, …}
O1 O1 O1 O1Object localization
Riesenhuber & Poggio (99); Vidal-Naquet & Ullman (03); Serre & Poggio, (05); Agarwal & Roth, (02), Moghaddam, Pentland (97), Turk, Pentland (91),Vidal-Naquet, Ullman, (03) Heisele, et al, (01), Agarwal & Roth, (02), Kremp, Geman, Amit (02), Dorko, Schmid, (03) Fergus, Perona, Zisserman (03), Fei Fei, Fergus, Perona, (03), Schneiderman, Kanade (00), Lowe (99)
![Page 3: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/3.jpg)
From scenes to objects
SSceneType 2 {street, office, …}
Image
Local features L L L
I
O1 O1 O1 O1 O2 O2 O2 O2
L
Global gistfeatures
GObject localization
![Page 4: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/4.jpg)
From scenes to objects
SSceneType 2 {street, office, …}
Image
Local features L L L
I
O1 O1 O1 O1 O2 O2 O2 O2
L
Global gistfeatures
GObject localization
![Page 5: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/5.jpg)
The context challenge
2
What do you think are the hidden objects?
1
Biederman et al 82; Bar & Ullman 93; Palmer, 75;
![Page 6: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/6.jpg)
The context challengeWhat do you think are the hidden objects?
Answering this question does not require knowing how the objects look like. It is all about context.
Chance ~ 1/30000
![Page 7: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/7.jpg)
From scenes to objects
SSceneType 2 {street, office, …}
Image
Local features L L L
I
L
Global gistfeatures
G
![Page 8: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/8.jpg)
Scene categorization
Office Corridor Street
Oliva & Torralba, IJCV’01; Torralba, Murphy, Freeman, Mark, CVPR 03.
![Page 9: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/9.jpg)
Place identificationOffice 610 Office 615
Draper street
59 other places…
Scenes are categories, places are instances
![Page 10: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/10.jpg)
Supervised learning
Vg , Office}
Vg ,
{
{ Office}
Vg ,{ Corridor}
Vg ,{ Street}
…
Classifier
![Page 11: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/11.jpg)
Supervised learning
Vg , Office}
Vg ,
{
{ Office}
Vg ,{ Corridor}
Vg ,{ Street}
…
Classifier
Which feature vector for a whole image?
![Page 12: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/12.jpg)
Global features (gist)First, we propose a set of features that do not encode specific object information
Oliva & Torralba, IJCV’01; Torralba, Murphy, Freeman, Mark, CVPR 03.
![Page 13: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/13.jpg)
| vt | PCA
80 features
Global features (gist)
V = {energy at each orientation and scale} = 6 x 4 dimensions
First, we propose a set of features that do not encode specific object information
G
Oliva & Torralba, IJCV’01; Torralba, Murphy, Freeman, Mark, CVPR 03.
![Page 14: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/14.jpg)
Example visual gists
I
I’
Global features (I) ~ global features (I’)
Cf. “Pyramid Based Texture Analysis/Synthesis”, Heeger and Bergen, Siggraph, 1995
![Page 15: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/15.jpg)
Learning to recognize places
• Hidden states = location (63 values)
• Observations = vGt (80 dimensions)
• Transition matrix encodes topology of environment
• Observation model is a mixture of Gaussians centered on prototypes (100 views per place)
Office 610 Corridor 6b Corridor 6c Office 617
We use annotated sequences for training
![Page 16: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/16.jpg)
Wearable test-bed v1
Kevin Murphy
![Page 17: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/17.jpg)
Wearable test-bed v2
![Page 18: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/18.jpg)
Place/scene recognition demo
![Page 19: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/19.jpg)
From scenes to objects
SSceneType 2 {street, office, …}
Image
Local features L L L
I
O1 O1 O1 O1 O2 O2 O2 O2
L
Global gistfeatures
GObject localization
![Page 20: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/20.jpg)
Global scene features predicts object location
vg
New image
Image regions likely to contain the target
![Page 21: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/21.jpg)
Vg ,1{ }X1
Vg ,2{ }X2
Vg ,3{ }X3
Vg ,4{ }X4
…
Training set (cars)
Global scene features predicts object location
The goal of the training is to learn the association between the location of the target and the global scene features
![Page 22: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/22.jpg)
Global scene features predicts object location
vg X
Results for predicting the vertical location of people
Estimated Y
Tru
e Y
Results for predicting the horizontal location of people
Estimated X
Tru
e X
![Page 23: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/23.jpg)
The layered structure of scenes
p(x2|x1)p(x)
In a display with multiple targets present, the location of one target constraints the ‘y’ coordinate of the remaining targets, but not the ‘x’ coordinate.
![Page 24: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/24.jpg)
Global scene features predicts object location
Stronger contextual constraints can be obtained using other objects.
vg X
![Page 25: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/25.jpg)
![Page 26: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/26.jpg)
![Page 27: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/27.jpg)
1
![Page 28: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/28.jpg)
1
![Page 29: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/29.jpg)
Attentional guidance
Localfeatures
Saliency
Saliency models: Koch & Ullman, 85; Wolfe 94; Itti, Koch, Niebur, 98; Rosenholtz, 99
![Page 30: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/30.jpg)
Attentional guidance
Localfeatures
Globalfeatures
Saliency
Scene prior
TASK
Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003
![Page 31: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/31.jpg)
Attentional guidance
Localfeatures
Globalfeatures
Saliency
Objectmodel
TASK
Scene prior
![Page 32: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/32.jpg)
Comparison regions of interest
Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003
![Page 33: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/33.jpg)
Comparison regions of interest
Saliencypredictions
Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003
10%
20%
30%
![Page 34: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/34.jpg)
Comparison regions of interest
Saliencypredictions
Saliency and Global scene priors
Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003
10%
20%
30%
![Page 35: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/35.jpg)
Comparison regions of interest
Dots correspond to fixations 1-4
Saliencypredictions
10%
20%
30%
Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003
![Page 36: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/36.jpg)
Comparison regions of interest
Dots correspond to fixations 1-4
Saliencypredictions
Saliency and Global scene priors
10%
20%
30%
Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003
![Page 37: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/37.jpg)
Results
Saliency Region Contextual Region
1 2 3 4
Chance level: 33 %
100
90
80
70
60
50
% offixationsinsidethe region
Fixation number1 2 3 4
100
90
80
70
60
50
Scenes without people Scenes with people
Fixation number
![Page 38: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/38.jpg)
Task modulation
Localfeatures
Globalfeatures
Saliency
Scene prior
TASK
Torralba, 2003; Oliva, Torralba, Castelhano, Henderson. ICIP 2003
![Page 39: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/39.jpg)
Task modulation
Mug search Painting search
Saliencypredictions
Saliency and Global scene priors
![Page 40: Using the Forest to see the Trees: A computational model relating features, objects and scenes](https://reader034.vdocuments.us/reader034/viewer/2022042721/56815432550346895dc2313b/html5/thumbnails/40.jpg)
• From the computational perspective, scene context can be derived from global image properties and predict where objects are most likely to be.
• Scene context considerably improves predictions of fixation locations. A complete model of attention guidance in natural scenes requires both saliency and contextual pathways
Discussion