Attention in Computer Vision
Mica Arie-Nachimson and Michal Kiwkowitz
May 22, 2005, Advanced Topics in Computer Vision
Weizmann Institute of Science
Problem definition – Search Order
(figure: an image patch is passed to object recognition; answer: NO)
• Vision applications apply “expensive” algorithms (e.g. recognition) to image patches
• The selection of patches is mostly naïve
• The selection of patches determines the number of calls to the “expensive” algorithm
Problem Definition - Search Order
(figure: patches passed to object recognition; answers: NO, YES)
• A more sophisticated selection of patches would mean fewer calls to the “expensive” algorithm
• Attention is used to focus efficiently on incoming data (better use of limited processing capacity)
Problem Definition - Search Order
(figure: object recognition applied to candidate patches in priority order 1 to 6)
Outline
• What is Attention
• Attention in Object Recognition
• Saliency Model: Feature Integration Theory; Saliency Algorithm; Saliency & Object Recognition; Comparison
• Inner Scene Similarity Model: Biological Motivation; Difficulty of Search Tasks; Algorithms (FLNN, VSLE)
Attention
• Attention implies allocating resources, perceptual or cognitive, to some things at the expense of not allocating them to others.
What is Attention
• You are sitting in class listening to a lecture.
• Two people behind you are talking. – Can you hear the lecture?
• One of them mentions the name of a friend of yours. – How did you know?
Attention in Other Applications
• Face Detection (feature selection)
• Video Analysis (temporal block selection)
• Robot Navigation (select locations)
• …
Attention is Directed by:
Bottom-up:
• From small to large units of meaning
• Rapid
• Task-independent
Attention is Directed by:
Top-down:
• Uses higher levels (context, expectation) to process incoming information (guess)
• Slower
• Task-dependent
http://www.rybak-et-al.net/nisms.html
When is information selected (filtered)?
• Early selection (Broadbent, 1958)
• Cocktail party phenomenon (Moray, 1959)
• Late selection (Treisman, 1960): attenuation
– All information is sent to perceptual systems for processing
– Some is selected for complete processing
– Some is more likely to be selected
Parallel Search

Is there a green O?
A. Treisman, G. Gelade, 1980
Conjunction Search
Is there a green N ?
A. Treisman, G. Gelade, 1980
Results
A. Treisman, G. Gelade, 1980
Conjunction Search
A. Treisman, G. Gelade, 1980
Color map Orientation map
A. Treisman, G. Gelade, 1980
Conjunction Search
A. Treisman, G. Gelade, 1980
Primitives

Pop-out feature dimensions: intensity, orientation, color, curvature, line end, movement.

(figures: in each dimension, a single odd element pops out among uniform distractors)
Feature Integration Theory

Attention in two stages:
• Pre-attention: parallel processing, low-level features, fast, parallel search
• Attention: serial processing, localized focus, slower, conjunctive search

How is the focus found & shifted?
A. Treisman, G. Gelade, 1980
Shifts in Attention

“Shifts in selective visual attention: towards the underlying neural circuitry”, C. Koch and S. Ullman, 1985

(figure: feature maps for orientation, color, curvature, line end, and movement feed a saliency map, which drives the central representation and attention)
Saliency
“A model of saliency-based visual attention for rapid scene analysis”, L. Itti, C. Koch, and E. Niebur, 1998
• Salient - stands out
• Example – telephone & road sign have high saliency
From C. Koch; L. Itti, C. Koch, and E. Niebur, 1998
Intensity
L. Itti, C. Koch, and E. Niebur, 1998
Cells in the retina
Intensity

Create spatial scales 0 to 8 using Gaussian pyramids.
L. Itti, C. Koch, and E. Niebur, 1998
Intensity

Center-surround difference operator:
• Sensitive to local spatial discontinuities
• A principal computation in the retina & primary visual cortex
• Subtract the coarse scale (surround) from the fine scale (center)

L. Itti, C. Koch, and E. Niebur, 1998
Toy Example

Fine level:
0 0 0
0 255 0
0 0 0

Coarse level (Gaussian pyramid, interpolated back to the fine resolution):
0 0 0
0 0 0
0 0 0

Point-by-point subtraction:
0 0 0
0 255 0
0 0 0

An isolated bright point survives the center-surround difference.
Toy Example

Fine level:
255 255 255
255 255 255
255 255 255

Coarse level (Gaussian pyramid, interpolated back to the fine resolution):
255 255 255
255 255 255
255 255 255

Point-by-point subtraction:
0 0 0
0 0 0
0 0 0

A uniform region produces no center-surround response.
Intensity

Compute center-surround maps

I(c, s) = |I(c) ⊖ I(s)|,  c ∈ {2, 3, 4},  s = c + δ,  δ ∈ {3, 4}

giving 6 intensity maps, e.g.

I(2, 5) = |I(2) ⊖ I(5)|,  I(2, 6) = |I(2) ⊖ I(6)|,  I(3, 6) = |I(3) ⊖ I(6)|

Different center/surround ratios give multiscale feature extraction.
L. Itti, C. Koch, and E. Niebur, 1998
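The center-surround intensity maps can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: it assumes a 5-tap binomial filter for the Gaussian pyramid and crude nearest-neighbor interpolation back to the center scale (the paper interpolates more carefully); all function names are ours.

```python
import numpy as np

def pyramid_step(img):
    """One Gaussian-pyramid level: separable 5-tap binomial blur, then 2x subsample."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    b = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    b = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, b)
    return b[::2, ::2]

def upsample_to(img, shape):
    """Crude nearest-neighbor interpolation of a coarse map up to a finer shape."""
    fy = -(-shape[0] // img.shape[0])  # ceiling division
    fx = -(-shape[1] // img.shape[1])
    big = np.repeat(np.repeat(img, fy, axis=0), fx, axis=1)
    return big[:shape[0], :shape[1]]

def intensity_maps(image):
    """I(c, s) = |I(c) - interp(I(s))| for c in {2, 3, 4}, s = c + delta, delta in {3, 4}."""
    pyr = [np.asarray(image, dtype=float)]
    for _ in range(8):                     # scales 0..8
        pyr.append(pyramid_step(pyr[-1]))
    maps = {}
    for c in (2, 3, 4):
        for delta in (3, 4):
            s = c + delta
            maps[(c, s)] = np.abs(pyr[c] - upsample_to(pyr[s], pyr[c].shape))
    return maps
```

A 256×256 input yields pyramid levels down to 1×1, and `maps[(2, 5)]` through `maps[(4, 8)]` are the six intensity feature maps, each at its center scale's resolution.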
Color
Same c and s as with intensity → 12 color maps
Kandel et al. (2000). Principles of Neural Science. McGraw-Hill/Appleton & Lange
L. Itti, C. Koch, and E. Niebur, 1998
Orientation
Same c and s as with intensity → 24 orientation maps

θ ∈ {0°, 45°, 90°, 135°}

O(c, s, θ) = |O(c, θ) ⊖ O(s, θ)|
From Visual system presentation by S. Ullman
L. Itti, C. Koch, and E. Niebur, 1998
From C. Koch; L. Itti, C. Koch, and E. Niebur, 1998
Normalization Operator N(·): promotes feature maps with a few strong peaks and suppresses maps with many comparable peaks
L. Itti, C. Koch, and E. Niebur, 1998
Saliency Map
S = (1/3) · [N(I) + N(C) + N(O)]  (summed over the three conspicuity maps)
L. Itti, C. Koch, and E. Niebur, 1998
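A sketch of the map-combination step. The normalization operator N(·) is approximated here: the idea is to weight each map by (M − m̄)², where M is its global maximum and m̄ the average of its other local maxima; we approximate the local maxima by block maxima, which is our simplification rather than the paper's exact procedure.

```python
import numpy as np

def normalize_map(m, block=16):
    """Approximate N(.): rescale to [0, 1], then weight by
    (M - mean_of_other_local_maxima)^2, with local maxima taken over blocks."""
    m = np.asarray(m, dtype=float)
    m = (m - m.min()) / (m.max() - m.min() + 1e-12)
    h, w = m.shape
    maxima = [m[i:i + block, j:j + block].max()
              for i in range(0, h, block) for j in range(0, w, block)]
    big = max(maxima)
    others = [v for v in maxima if v < big]
    m_bar = float(np.mean(others)) if others else 0.0
    return m * (big - m_bar) ** 2

def saliency(i_bar, c_bar, o_bar):
    """S = (1/3) * (N(I) + N(C) + N(O)) over the three conspicuity maps."""
    return (normalize_map(i_bar) + normalize_map(c_bar) + normalize_map(o_bar)) / 3.0
```

A map with one strong peak keeps its weight, while a map with many comparable peaks is suppressed, so a single pop-out feature can dominate the saliency map.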
Algorithm (up to now)
1. Extract feature maps
2. Compute center-surround maps (42 in total): Intensity I (6), Color C (12), Orientation O (24)
3. Combine each channel into a conspicuity map
4. Compute saliency by summing and normalizing the maps
Laurent Itti, Christof Koch, and Ernst Niebur, 1998
Winner Takes All selection of the FOA (Focus Of Attention)
• Leaky integrate-and-fire neurons
• “Inhibition of return”

L. Itti, C. Koch, and E. Niebur, 1998
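The selection loop reduces to a few lines if we replace the leaky integrate-and-fire dynamics with a plain argmax, which captures the behavior (winner-take-all, then inhibition of return) but not the neural timing; `foa_radius` is an illustrative parameter of ours.

```python
import numpy as np

def scan_foa(saliency_map, n_shifts=3, foa_radius=4):
    """Repeatedly pick the most salient location (winner takes all), then
    suppress a disc around it (inhibition of return) so attention moves on."""
    s = np.asarray(saliency_map, dtype=float).copy()
    yy, xx = np.indices(s.shape)
    fixations = []
    for _ in range(n_shifts):
        y, x = np.unravel_index(np.argmax(s), s.shape)   # winner takes all
        fixations.append((int(y), int(x)))
        # inhibition of return: knock out a disc around the current FOA
        s[(yy - y) ** 2 + (xx - x) ** 2 <= foa_radius ** 2] = -np.inf
    return fixations
```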
Results
• FOA shifts: 30–70 ms
• Inhibition: 500–900 ms (then inhibition of return ends)
L. Itti, C. Koch, and E. Niebur, 1998
Results
Compared against Spatial Frequency Content (SFC), Reinagel & Zador, 1997

(figure rows: Image, SFC map, Saliency map, Output)
L. Itti, C. Koch, and E. Niebur, 1998
Results
(figure panels a–d: Image, SFC map, Saliency map, Output)

L. Itti, C. Koch, and E. Niebur, 1998; Spatial Frequency Content: Reinagel & Zador, 1997
Attention & Object Recognition
• “Is bottom-up attention useful for object recognition?”, U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Computer recognition: segmented images, labeled objects.
Human recognition: cluttered scenes, non-labeled objects.
Attention bridges the gap.
Object Recognition
saliency model
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Grow a region in the strongest map → pass it to object recognition (Lowe)
Attention & Object Recognition
Learning inventories – “grocery cart problem”
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Real-world scenes: 1 image for training (15 fixations); 2–5 images for testing (20 fixations)
(figure: salient regions from the training image are matched to the testing images by object recognition)
“Grocery Cart” Problem
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
(figure: one training image and two testing images of the same scene)
“Grocery Cart” Problem
Downsides:
• Bias of human photography
• Small image set

Solution: use a robot as the acquisition tool

U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Robot - Landmark Learning
Objective: how many objects are found and classified correctly?
Navigation: a simple obstacle-avoidance algorithm using infrared sensors
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Object recognition
< 3 key points
Landmark Learning with Attention
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Landmark Learning with Random Selection
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Landmark Learning - Results
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Saliency-Based Object Recognition
• Biologically motivated
• Uses bottom-up information; allows combining top-down information
• Segmentation: cluttered scenes, unlabeled objects, multiple objects in a single image
• Static priority map
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Comparison
“Comparing attention operators for learning landmarks”, R. Sim, S. Polifroni, G. Dudek, June 2003

Other attention operators for low-level features
Comparison
R. Sim, S. Polifroni, G. Dudek , June 2003
• Edge density
• Radial symmetry
• Smallest eigenvalue
• Caltech saliency
Comparison
• Landmark learning
• Training: learn landmarks with a known camera pose
• Testing: determine the camera pose from the landmarks (pose estimation)
R. Sim, S. Polifroni, G. Dudek , June 2003
Comparison - Results
• All operators did better than random
• Radial symmetry gave the worst results
• The Caltech (saliency) operator performs similarly to the edge-density and eigenvalue operators, BUT it is more complex to implement and needs more computing time, making it the less preferred candidate in practice
R. Sim, S. Polifroni, G. Dudek , June 2003
The Problem
Object recognition
(figure: object recognition applied to candidate patches in priority order 1 to 6)
Biological Motivation
• An alternative approach: continuous search difficulty
• Based on similarity:– Between Targets and Non-Targets in the scene– Between Non-Targets and Non-Targets in the scene
• Similar structural units do not need separate treatment
• Structural units similar to a possible target get high priority
Duncan & Humphreys [89]
Biological Motivation

(figure: search difficulty increases with target/non-target similarity and decreases with non-target/non-target similarity)

Duncan & Humphreys [89]
Biological Motivation
• Explains pop-out vs. serial search phenomenon
(figure: a target among non-targets)
Duncan & Humphreys [89]
Biological Motivation

• Explains the pop-out vs. serial search phenomenon: a target that is dissimilar from homogeneous non-targets pops out; a target that is similar to heterogeneous non-targets requires serial search

(figure: the two example displays placed on the search-difficulty axes)

Duncan & Humphreys [89]
Using Inner-scene Similarities
• Every candidate is characterized by a vector of n attributes
• n-dimensional metric space:
– A candidate is a point in the space
– Some distance function d is associated with the space
Avraham & Lindenbaum [04]; Avraham & Lindenbaum [05]
Using Inner-scene Similarities: Example
• One feature only: object area
• d: regular Euclidean distance

(figure: the candidates placed in the feature space)
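A minimal sketch of this setup: candidates become points in a feature space (here 1-D, the object area) equipped with a Euclidean distance d. Helper names are ours.

```python
import numpy as np

def build_candidates(areas):
    """One feature per candidate (object area) -> points in a 1-D metric space."""
    return np.asarray(areas, dtype=float).reshape(-1, 1)

def d(a, b):
    """The distance function associated with the space (Euclidean)."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
```

With more features (color, texture, …) the same candidates simply become points in a higher-dimensional space; d is unchanged.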
Difficulty of Search
• The difficulty measure is the number of queries until the first target is found
• Two main factors:
– Distance between targets and non-targets
– Distance between non-targets and non-targets
Feature space
Difficulty of Search: Cover
Feature space
c: the number of circles in the cover
Difficulty of Search
c (the number of circles) will be our measure of the search difficulty.

We need some constraint on the circles’ size!
Difficulty of Search

dt: the max-min target distance
Difficulty of Search

dt-cover: a cover of the candidates by circles of diameter dt
Difficulty of Search

Minimum dt-cover: c is the number of circles (of diameter dt) in the minimal dt-cover
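Computing the minimal dt-cover exactly is hard (the deck notes it is NP-complete), but a greedy pass gives an upper bound on c and conveys the idea. This sketch assumes circles of diameter dt, i.e. radius dt/2; it is our illustration, not the authors' algorithm.

```python
import numpy as np

def greedy_dt_cover(points, dt):
    """Greedy approximation of a dt-cover: repeatedly center a circle of
    diameter dt on an uncovered point and mark everything within dt/2 as
    covered. Returns the number of circles used (an upper bound on c)."""
    pts = np.asarray(points, dtype=float)
    uncovered = np.ones(len(pts), dtype=bool)
    c = 0
    while uncovered.any():
        center = pts[np.argmax(uncovered)]          # first still-uncovered point
        dist = np.linalg.norm(pts - center, axis=1)
        uncovered &= dist > dt / 2.0                # cover a circle of diameter dt
        c += 1
    return c
```

Tight clusters collapse into one circle each, so c counts "groups of mutually similar candidates" rather than raw candidates.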
Difficulty of Search

Example: the minimal dt-cover here has c = 7 circles
Difficulty of Search

Insects example (feature space): c = 3
Difficulty of Search

Example, easy search: c = 2
Difficulty of Search

Example, hard search: c = the number of candidates
Define the Difficulty using c
• Lower bound: in the worst case, every search algorithm needs c calls to the oracle before finding the first target
• Upper bound: there is an algorithm that needs at most c calls to the oracle to find the first target, for all search tasks
Difficulty of Search
Lower bound
Every search algorithm needs c calls to the oracle before finding the first target in the worst case
Difficulty of Search
(figure: a worst-case query order, 1 to 5, over the circles of the cover)
Upper bound
There is an algorithm that needs at most c calls to the oracle to find the first target, for all search tasks:
FLNN: Farthest Labeled Nearest Neighbor
Efficient Algorithms

FLNN: Farthest Labeled Nearest Neighbor

(figure: the FLNN query order, 1 to 5, on the example candidates)

c is a tight bound!
Difficulty of Search

How do we compute c?
– Need to know dt
– Compute the minimal dt-cover
– Count the number of circles (c = 7 in the example)
How do we compute c?
– Need to know dt: to know the exact dt we would need to know all the targets and non-targets, but that is exactly what we are looking for…
– Compute the minimal dt-cover: NP-complete!
– Count the number of circles (= c): OK, that one is easy…
Upper & Lower Bounds on c
• Upper bounds:
– The number of candidates
– Knowing that dt is larger than some d0, we can approximate the cover size
• Lower bounds:
– FLNN worst case
– Knowing that dt is larger than some d0, we can approximate the cover size
Difficulty of Search
Improving FLNN
• What’s wrong with FLNN?
– It relates only to the nearest known neighbor
– It finds only the first target efficiently
– It cannot be easily extended to include top-down information
Efficient Algorithms
VSLE: Visual Search using Linear Estimation
• Each candidate has a probability of being a target
• Query the candidate with the highest probability
• Update the other candidates’ probabilities according to the known results: every known target/non-target affects the other candidates in inverse relation to its distance
• If we know the results for candidates 1,…,m, the remaining labels are estimated linearly (see the appendix)
• Dynamic priority map
Efficient Algorithms
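A runnable sketch of the VSLE loop. To keep it short we replace the linear estimation with a kernel-weighted average of the known labels (our simplification; the actual algorithm solves the linear system described on the appendix slides), but the control flow is the same: query the most probable candidate, then re-estimate the rest. `p0` and `sigma` are illustrative parameters.

```python
import numpy as np

def vsle_search(points, oracle, p0=0.5, sigma=1.0):
    """Return the query order up to and including the first target found."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    prob = np.full(n, p0)
    known = {}                              # index -> label (0 = non-target)
    order = []
    while len(known) < n:
        masked = np.where([j in known for j in range(n)], -1.0, prob)
        i = int(np.argmax(masked))          # most probable unqueried candidate
        order.append(i)
        if oracle(i):
            return order
        known[i] = 0.0
        ks = np.array(sorted(known))
        labels = np.array([known[k] for k in ks])
        for j in range(n):                  # dynamic priority map update
            if j in known:
                continue
            w = np.exp(-np.array([np.linalg.norm(pts[j] - pts[k]) for k in ks]) ** 2
                       / (2.0 * sigma ** 2))
            prob[j] = (w @ labels + p0) / (w.sum() + 1.0)
    return order
```

Candidates near known non-targets sink toward 0 while candidates far from everything stay near the prior p0; pre-loading `known` with previously seen targets (label 1) is one way to inject top-down information.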
Efficient Algorithms

VSLE: Visual Search using Linear Estimation

(figure: the candidates’ initial target probabilities)
Efficient Algorithms

VSLE: Visual Search using Linear Estimation

(figure: probabilities after several queries; candidates similar to known non-targets drop toward 0, candidates similar to known targets rise toward 1)
Combining Top-Down Information
• Simply set the initial probabilities to match previously known data
• Add known target objects to the space; this alters the probabilities accordingly and speeds up the search
Efficient Algorithms
Experiment 1: COIL-100Efficient Algorithms
Columbia Object Image Library [96]
Experiment 1: COIL-100
• Features: 1st, 2nd, and 3rd Gaussian derivatives → 9 basis filters, at 5 scales → 9×5 = 45 features
• Euclidean distance
Efficient Algorithms
Rao & Ballard [95]
Experiment 1: COIL-100Efficient Algorithms
(plots: number of queries for 10 cars and for 10 cups)
Experiment 2: hand segmentedEfficient Algorithms
• Every large segment is a candidate
• 24 candidates
• 4 targets
Berkeley hand segmented DB
Martin, Fowlkes, Tal & Malik [01]
Experiment 2: hand segmented
• Features: color histograms, separated into 8 bins each → 64 features
• Euclidean distance
Efficient Algorithms
Experiment 3: automatic color segmentation
• Automatic color segmented image for face detection
Efficient Algorithms
Experiment 3: color segmentation
• 146 candidates
• 4 features: segment size, mean value of red, green and blue
• Euclidean distance
Efficient Algorithms
Combining top-down information
• Add known targets to the space
Efficient Algorithms
(plots: number of queries, without vs. with additional known targets)
Summary: Saliency Model vs. Similarity Model

Saliency model:
• Biologically motivated
• Uses bottom-up information; allows combining top-down information
• Segmentation
• Static priority map

Similarity model:
• Biologically motivated
• Uses bottom-up information; allows combining top-down information
• No segmentation
• Dynamic priority map
• Measures the search difficulty
Summary
• What is attention
• Attention aids object recognition tasks by choosing the areas of interest
• Two approaches: saliency model and similarity model– Biological motivation– Algorithms
Thank You!
Linearly Estimating l(x_k)

The label l(x_k) is estimated linearly from the known labels; the weights are chosen to minimize the expected squared estimation error, and solving a set of equations gives the estimation.
Linearly Estimating l(x_k)

The estimate is a weighted combination of the vector of known labels; the weights are computed from a set of equations over i, j = 1,…,m. R and r depend only on the distances, so they can be computed once in advance.
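The equations on these two appendix slides were images and did not survive conversion. A standard linear-estimation reconstruction that matches the surrounding text (a weighted sum of known labels, squared error minimized, R and r depending only on distances) would read as follows; treat the exact notation, in particular the distance-based correlation model ρ(·), as our assumption rather than the authors' own:

```latex
\hat{l}(x_k) = \sum_{i=1}^{m} a_i\, l(x_i),
\qquad
\min_{a_1,\dots,a_m}\; E\!\left[\bigl(\hat{l}(x_k) - l(x_k)\bigr)^{2}\right];
\qquad
R\,\mathbf{a} = \mathbf{r},
\quad
R_{ij} = \rho\bigl(d(x_i, x_j)\bigr),
\quad
r_i = \rho\bigl(d(x_i, x_k)\bigr),
\quad i, j = 1, \dots, m.
```

Under this reading the estimate is l̂(x_k) = rᵀR⁻¹L, with L the vector of known labels; since R and r depend only on pairwise distances, they can indeed be computed once in advance.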