Street Smarts: Visual Attention on the Go
Alexander Patrikalakis
May 13, 2009
6.xxx
TRANSCRIPT
Vision of Attention
• For machines to recreate human visual attention, we must accept that humans:
– Maintain multi-scale orientation, intensity, and color feature neuronal maps in parallel
– Combine multi-scale features into a central conspicuity (saliency) map
– Maintain a Winner-Take-All neural network that saccades to and subsequently inhibits decreasingly salient points (a code sketch of this pipeline follows below)
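The following is a minimal sketch of the intensity channel of such a pipeline, written against the modern OpenCV C++ API rather than the 2009-era interfaces; the scale choices and function names are illustrative assumptions, not the author's original code. It builds a Gaussian pyramid, takes center-surround differences across scales, and sums them into a saliency-like map.

// Minimal sketch (not the author's code): multi-scale intensity
// conspicuity via a Gaussian pyramid and center-surround differences.
#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat intensitySaliency(const cv::Mat& bgr) {
    cv::Mat gray;
    cv::cvtColor(bgr, gray, cv::COLOR_BGR2GRAY);
    gray.convertTo(gray, CV_32F, 1.0 / 255.0);

    // Multi-scale representation: a Gaussian pyramid of the intensity map.
    std::vector<cv::Mat> pyr{gray};
    for (int i = 1; i < 5; ++i) {
        cv::Mat down;
        cv::pyrDown(pyr.back(), down);
        pyr.push_back(down);
    }

    // Center-surround differences: |fine scale - coarse scale|, with the
    // coarse map upsampled back to the fine resolution before subtraction.
    cv::Mat saliency = cv::Mat::zeros(gray.size(), CV_32F);
    for (int c = 0; c < 2; ++c) {          // "center" scales
        for (int s = c + 2; s < 5; ++s) {  // "surround" scales
            cv::Mat surround, diff;
            cv::resize(pyr[s], surround, pyr[c].size(), 0, 0, cv::INTER_LINEAR);
            cv::absdiff(pyr[c], surround, diff);
            cv::resize(diff, diff, gray.size(), 0, 0, cv::INTER_LINEAR);
            saliency += diff;              // accumulate conspicuity
        }
    }
    cv::normalize(saliency, saliency, 0.0, 1.0, cv::NORM_MINMAX);
    return saliency;  // orientation and color channels would be added analogously
}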
Example
Object recognition at every point of an image is computationally infeasible
Visual attention allows us to find the interesting points quickly
Ullman agrees: “Recognition over the whole scene leads to a combinatorial explosion.”
Implementation Steps
• Analyzed previous work done by Ullman, Itti, and Koch on visual attention
• Implemented visual saliency model in C++ using Intel OpenCV, IPP, and TBB
• Implemented FOA shifting by saccading to points in order of decreasing saliency-map value, achieving the same effect as a 2D neuronal matrix (sketched below)
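A minimal sketch of that FOA-shifting loop, under the assumption that inhibition of return is modeled by zeroing a disk around each winner; the radius and the OpenCV calls are illustrative, not the author's implementation.

// Minimal sketch (not the author's code): winner-take-all focus-of-attention
// shifts with inhibition of return over a precomputed saliency map.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Point> focusShifts(const cv::Mat& saliency, int numFixations,
                                   int inhibitRadius = 25) {
    std::vector<cv::Point> fixations;
    cv::Mat work = saliency.clone();  // keep the original map intact
    for (int i = 0; i < numFixations; ++i) {
        double maxVal;
        cv::Point maxLoc;
        cv::minMaxLoc(work, nullptr, &maxVal, nullptr, &maxLoc);
        if (maxVal <= 0.0) break;     // map fully inhibited, stop saccading
        fixations.push_back(maxLoc);
        // Inhibition of return: suppress the attended region so the next
        // saccade lands on the next-most-salient point.
        cv::circle(work, maxLoc, inhibitRadius, cv::Scalar(0), cv::FILLED);
    }
    return fixations;
}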
Results
• Tested algorithm on 13 geometric scenes, and obtained plausible salient winners in each
• Tested algorithm on 40 natural scenes (roads and highways) and found that signs and signals are very salient (usually saccaded to first)
• Algorithm resilient to noise and takes advantage of multi-scale analysis
Itti: Normalization
• Promote maps with small numbers of strong maxima
• Suppress maps with large numbers of equally strong maxima
• Method: multiply each map by the squared difference between its global maximum and the mean of its other local maxima
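Written out (following Itti, Koch, and Niebur's 1998 formulation; the exact exponent is taken from that paper rather than from this slide), the normalization operator is

\mathcal{N}(\mathcal{M}) = \mathcal{M} \cdot \left( M - \bar{m} \right)^{2}

where M is the global maximum of the map after rescaling to a fixed range [0, M] and \bar{m} is the mean of its other local maxima, so maps with one dominant peak are promoted and maps with many comparable peaks are suppressed.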
Contributions
• Reviewed past work done on biologically inspired visual attention models
• Identified Itti’s algorithm as a candidate for saliency detection in natural scenes involving road signs
• Demonstrated algorithm’s effectiveness on many natural scenes involving road signs
• Created a prototype saliency heuristic for evaluating sign effectiveness