![Page 1: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/1.jpg)
Attention Model Based SIFT Keypoints Filtration for Image Retrieval
Ke Gao 1,2, Shouxun Lin 1, Yongdong Zhang 1, Sheng Tang 1, Huamin Ren 1,3
1 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, 100080
2 Graduate University of the Chinese Academy of Sciences, Beijing, China, 100080
3 Beijing University of Chinese Medicine
Seventh IEEE/ACIS International Conference on Computer and Information Science 2008
![Page 2: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/2.jpg)
Outline
• Introduction
• Review of Attention Model
• SIFT Keypoints Filtration using Attention Model
  – SIFT Keypoints Extraction
  – Attention Model based Keypoints Filtration
• Experiment Evaluation
• Conclusion
![Page 3: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/3.jpg)
Introduction
• Local image descriptors are applied in object recognition and image retrieval [1], [2]
  – They are distinctive and robust, and do not require segmentation

[1] K. Mikolajczyk, C. Schmid, “A Performance Evaluation of Local Descriptors”. IEEE Trans. Pattern Analysis and Machine Intelligence, 2005, 27(10), p1615-1630
[2] V. Ferrari, T. Tuytelaars, L. Van Gool, “Simultaneous Object Recognition and Segmentation by Image Exploration”. Proc. Eighth European Conf. Computer Vision, 2004, p40-54
![Page 4: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/4.jpg)
Introduction
• Two considerations when using local image descriptors
  – keypoints should be placed at local peaks in a scale-space search (so that they remain stable over transformations)
  – the description of each keypoint must be distinctive, concise, and invariant over transformations caused by changes in camera pose and lighting
![Page 5: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/5.jpg)
Introduction

[3] D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”. Int’l J. Computer Vision, 2004, vol. 60, no. 2, p91-110
[4] A. E. Abdel-Hakim, A. A. Farag, “CSIFT: A SIFT Descriptor with Color Invariant Characteristics”. Computer Vision and Pattern Recognition, 2006, Vol. 2, p1978-1983
[5] T. Tuytelaars, L. Van Gool, “Matching Widely Separated Views Based on Affine Invariant Regions”. Int’l J. Computer Vision, 2004, vol. 59, no. 1, p61-85
![Page 6: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/6.jpg)
Introduction
• Scale Invariant Feature Transform (SIFT)
  – the most robust among the local invariant feature descriptors
  – It combines a scale-invariant region detector and a descriptor based on the gradient distribution in the detected regions
  – The descriptor is represented by a 3D histogram of gradient locations and orientations
![Page 7: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/7.jpg)
Introduction
![Page 8: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/8.jpg)
Introduction
• PCA-SIFT has been developed based on the SIFT algorithm [6]
  – It applies Principal Components Analysis (PCA) to the normalized image gradient patch
  – It accelerates matching by reducing the feature dimension from 128 to 36 for each patch

[6] Yan Ke, Rahul Sukthankar, “PCA-SIFT: A More Distinctive Representation for Local Image Descriptors”. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2004, Vol. 2, p506-513
![Page 9: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/9.jpg)
Introduction
• Shortcoming
  – On a typical image, SIFT returns a large number of features
  – This is especially a problem when the object appears small in the image
• This paper proposes a novel method to filter the SIFT keypoints based on an attention model
![Page 10: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/10.jpg)
Review of Attention Model
• Attention is at the nexus between cognition and perception
• Select a subset of the available sensory information before further processing
• A number of computational attention models have been developed, such as the models proposed in [7], [8]

[7] J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, et al, “Modeling visual attention via selective tuning”. Artificial Intelligence, 1995, 78, p507-545
[8] L. Itti, C. Gold, C. Koch, “Visual attention and target detection in cluttered natural scenes”. Optical Engineering, 2001, 40(9), p1784-1793
![Page 11: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/11.jpg)
Review of Attention Model
• saliency-based attention model for scene analysis [8]
• A “saliency region” is a region that contrasts clearly with its surroundings
![Page 12: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/12.jpg)
Review of Attention Model
![Page 13: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/13.jpg)
SIFT Keypoints Filtration using Attention Model
• Content-based image retrieval can be viewed as the problem of transforming the image into a set of feature vectors
• For good retrieval performance, the extracted features should satisfy two criteria
  – Distinctiveness
  – Matching speed
![Page 14: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/14.jpg)
SIFT Keypoints Filtration using Attention Model
• SIFT descriptors are accurate enough
• Too many keypoints are extracted from each image
  – most of them are “noise points” coming from the background
• This paper uses an attention model to filter SIFT keypoints
![Page 15: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/15.jpg)
SIFT Keypoints Filtration using Attention Model
• SIFT Keypoints Extraction
• Attention Model based Keypoints Filtration
![Page 16: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/16.jpg)
SIFT Keypoints Extraction
• Four major stages of SIFT:
  1. scale-space peak selection
  2. keypoint localization
  3. orientation assignment
  4. keypoint descriptor
![Page 17: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/17.jpg)
SIFT Keypoints Extraction
• In the first stage
  – potential interest points are identified by scanning over all possible scales and image locations
  – the only possible scale-space kernel is the Gaussian function
  – the scale space of an image is defined as a function $L(x, y, \sigma)$

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2}$$

where $*$ denotes convolution and $I(x, y)$ is the input image
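As a rough illustration of the scale-space definition on this slide, the following sketch samples the Gaussian kernel and convolves it with a small image stored as a list of rows. The function names, the 3-sigma kernel radius, the renormalization of the sampled kernel, and the edge-replicating border handling are all assumptions made for this example, not details taken from the slides.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Sample G(x, y, sigma) on a square grid and renormalize it to sum to 1."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))  # assumed 3-sigma support
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            g = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            row.append(g / (2 * math.pi * sigma ** 2))
        kernel.append(row)
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def scale_space(image, sigma):
    """Compute L = G * I for one sigma, replicating edge pixels at the border."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out
```

Because the sampled kernel is renormalized, a constant image stays constant after smoothing, which is a quick sanity check for the convolution loop.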
![Page 18: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/18.jpg)
SIFT Keypoints Extraction
– To efficiently detect stable keypoint locations in scale space, a series of difference-of-Gaussian (DoG) images is built
– The DoG function provides a close approximation to the scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$
– The maxima and minima of $\sigma^2 \nabla^2 G$ produce the most stable image features compared with other image functions, e.g. the gradient, Hessian, or Harris corner function

$$\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$$
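The DoG approximation on this slide can be checked numerically. Rearranging the finite-difference relation gives G(x, y, kσ) − G(x, y, σ) ≈ (k − 1)σ²∇²G, and the sketch below compares both sides using the closed-form Laplacian of the 2D Gaussian. The function names and the sample values of σ and k are chosen only for illustration.

```python
import math

def gaussian(x, y, sigma):
    """2D Gaussian G(x, y, sigma)."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def dog(x, y, sigma, k):
    """Difference of Gaussians at scales k*sigma and sigma."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

def scaled_log(x, y, sigma, k):
    """(k - 1) * sigma^2 * Laplacian(G), with the Laplacian in closed form."""
    laplacian = (x * x + y * y - 2 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)
    return (k - 1) * sigma ** 2 * laplacian

# Compare the two quantities at a sample point for a k close to 1.
d = dog(1.0, 0.5, 1.6, 1.01)
s = scaled_log(1.0, 0.5, 1.6, 1.01)
relative_error = abs(d - s) / abs(s)
```

The approximation is first order in (k − 1), so the agreement degrades as k grows and near the zero crossing of the Laplacian, where the relative error is not meaningful.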
![Page 19: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/19.jpg)
SIFT Keypoints Extraction
• In the second stage, candidate keypoints are localized to sub-pixel accuracy and eliminated if found to be unstable
• The third stage identifies the dominant orientations for each keypoint based on its local image patch
• The final stage builds a local image descriptor for each keypoint, based upon the image gradients in its local neighborhood
![Page 20: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/20.jpg)
SIFT Keypoints Extraction
• The dimension of the standard SIFT descriptor for each keypoint is 128
• PCA-SIFT reduces the dimension to 36
• This work is based on the first three stages, and further uses the attention model to filter these keypoints
  – This provides benefits in both retrieval accuracy and matching speed
![Page 21: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/21.jpg)
Attention Model based Keypoints Filtration
• After SIFT keypoint extraction, the attention model is used to generate a saliency map
• Fuzzy growing [9] is performed to find all of the saliency regions in the original image
• To limit computational complexity, the number of saliency regions per image is limited to 3
[9] Ma Y F, Zhang H J, “Contrast-based image attention analysis by using fuzzy growing”. Proceedings of the 11th ACM International Conference on Multimedia. Berkeley, CA, USA: ACM, 2003, p374 – 381
![Page 22: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/22.jpg)
Attention Model based Keypoints Filtration
![Page 23: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/23.jpg)
Attention Model based Keypoints Filtration
• Saliency regions (SR) in the saliency map can have arbitrary shapes
• Rectangles are used for simplicity
• It is assumed that no rectangles overlap with each other
• SR is defined as

$$SR = \{Center\_x,\ Center\_y,\ Width,\ Height\}$$

where $(Center\_x, Center\_y)$ represents the center and $(Width, Height)$ denotes the size of the SR
![Page 24: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/24.jpg)
Attention Model based Keypoints Filtration
• Based on the definition of SR, each SIFT keypoint is assigned a saliency weight $KP_{weight}$

$$KP_{weight} = KPR_{dis} \cdot SR_{weight}$$

$$KPR_{dis} = 1 - \frac{2\sqrt{(x - Center\_x)^2 + (y - Center\_y)^2}}{\sqrt{Width^2 + Height^2}}$$

$$SR_{weight} = R_{area} \cdot R_{pos}$$

• $KPR_{dis}$ is 1 if $(x, y)$ is at the center of the SR, and 0 if $(x, y)$ does not belong to any SR
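A minimal sketch of the keypoint weighting described on this slide follows. It assumes the distance term falls linearly from 1 at the region center to 0 at the rectangle corners, and that the keypoint weight is the product of that term and the region weight. The extracted slide text does not preserve the operators, so both choices, along with the `SaliencyRegion` type and the function names, are assumptions made for illustration.

```python
import math
from typing import NamedTuple, Sequence

class SaliencyRegion(NamedTuple):
    center_x: float
    center_y: float
    width: float
    height: float
    weight: float  # saliency weight of the region (SR_weight)

def contains(sr: SaliencyRegion, x: float, y: float) -> bool:
    """True if (x, y) lies inside the rectangle of the saliency region."""
    return (abs(x - sr.center_x) <= sr.width / 2
            and abs(y - sr.center_y) <= sr.height / 2)

def kpr_dis(sr: SaliencyRegion, x: float, y: float) -> float:
    """Distance term: 1 at the region center, 0 at its corners or outside it."""
    if not contains(sr, x, y):
        return 0.0
    d = math.hypot(x - sr.center_x, y - sr.center_y)
    return 1.0 - 2.0 * d / math.hypot(sr.width, sr.height)

def kp_weight(regions: Sequence[SaliencyRegion], x: float, y: float) -> float:
    """Keypoint weight, assuming non-overlapping regions and a product rule."""
    for sr in regions:
        if contains(sr, x, y):
            return kpr_dis(sr, x, y) * sr.weight
    return 0.0
```

Because the regions are assumed not to overlap, the first containing region is the only one, so the loop can return as soon as it finds a match.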
![Page 25: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/25.jpg)
Attention Model based Keypoints Filtration
• $SR_{weight}$ denotes the saliency weight of this saliency region
• The importance of a detected region is usually reflected by its region area weight $R_{area}$ and position weight $R_{pos}$
• If a region is too small to provide any useful information, it is not considered
  – only regions bigger than 5% of the image are ranked
  – only the top 3 regions are kept as SRs
![Page 26: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/26.jpg)
Attention Model based Keypoints Filtration
• The area weight $R_{area}$ of the current SR is calculated as

$$R_{area} = \frac{area_{current}}{\sum_{i=1}^{n} area_i}$$

• Since people often pay more attention to the region near the image center
  – a normalized Gaussian template is used to assign the position weight $R_{pos}$
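The two region weights on this slide could be computed along the following lines. The area weight follows the stated ratio of the current region's area to the total area of all saliency regions. The slides do not give the parameters of the Gaussian template, so `position_weight`, its `sigma_frac` parameter, and the choice to scale the Gaussian to a peak of 1 at the image center are hypothetical.

```python
import math

def area_weights(areas):
    """Area weight of each region: its area divided by the sum of all region areas."""
    total = sum(areas)
    return [a / total for a in areas]

def position_weight(cx, cy, img_w, img_h, sigma_frac=0.25):
    """Gaussian weight of a region center (cx, cy), peaking at 1 in the image center.

    sigma_frac is a hypothetical parameter: the Gaussian standard deviation
    expressed as a fraction of the image width and height.
    """
    sx, sy = sigma_frac * img_w, sigma_frac * img_h
    dx, dy = cx - img_w / 2, cy - img_h / 2
    return math.exp(-(dx * dx / (2 * sx * sx) + dy * dy / (2 * sy * sy)))
```

For example, three regions with areas 200, 100, and 100 receive area weights 0.5, 0.25, and 0.25, and a region centered in the image receives the maximum position weight.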
![Page 27: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/27.jpg)
Attention Model based Keypoints Filtration
![Page 28: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/28.jpg)
Attention Model based Keypoints Filtration
• The saliency weight $KP_{weight}$ of each SIFT keypoint is generated
• All keypoints in an image are ranked by their $KP_{weight}$
• Only the top N keypoints are kept for SIFT descriptor extraction
• N is chosen as a trade-off between retrieval accuracy and speed
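The ranking step above amounts to a top-N selection by saliency weight. A small sketch, with a hypothetical function name and keypoints represented by arbitrary Python objects:

```python
def top_n_keypoints(keypoints, weights, n):
    """Return the n keypoints with the largest saliency weights, highest first."""
    order = sorted(range(len(keypoints)), key=lambda i: weights[i], reverse=True)
    return [keypoints[i] for i in order[:n]]
```

Descriptors would then be computed only for the returned keypoints, which is where the reduction in matching cost comes from.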
![Page 29: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/29.jpg)
Experiment Evaluation
Data sets
• The data set consists of three categories
  1. The same object with different backgrounds or under different viewpoints
  2. Video frames extracted from some movies
  3. Ordinary images with different sizes and content
• Most of the original photos are downloaded from
  – ALOI (http://staff.science.uva.nl/~aloi/)
  – Caltech (http://vision.caltech.edu/Image_Datasets/Caltech256/)
![Page 30: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/30.jpg)
Experiment Evaluation
![Page 31: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/31.jpg)
Experiment Evaluation
• Some geometric and photometric transformations have been made to evaluate the algorithm under different conditions
• According to different objects, the data set is divided into about 50 classes, and each class has more than 20 relevant images
• 6,000 images and 7,240,000 standard SIFT keypoints
![Page 32: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/32.jpg)
Experiment Evaluation
• Evaluation Metrics
  – The Bag of Words method proposed in [10] is used
  – It vector-quantizes the SIFT descriptors into clusters using k-means
  – It represents an image as a bag of “words”
  – ‘Term frequency’ is used as the standard weighting
  – All of the images are organized as an inverted file

[10] J. Sivic, A. Zisserman, “Video Google: A Text Retrieval Approach to Object Matching in Videos”. Proceedings of the International Conference on Computer Vision, 2003, p1470-1477
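The Bag of Words pipeline on this slide can be sketched as follows, assuming the cluster centers have already been obtained (for instance by k-means, which is not shown here). Each descriptor is mapped to its nearest center, an image becomes a term-frequency vector over those visual words, and the inverted file maps each word to the images containing it. All function names and the dictionary-based data layout are illustrative assumptions.

```python
from collections import Counter, defaultdict

def nearest_word(descriptor, centroids):
    """Index of the centroid closest to the descriptor (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for idx, center in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, center))
        if dist < best_dist:
            best, best_dist = idx, dist
    return best

def term_frequencies(descriptors, centroids):
    """Map an image's descriptors to a {word: term frequency} dictionary."""
    counts = Counter(nearest_word(d, centroids) for d in descriptors)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def build_inverted_file(image_tfs):
    """Build {word: [(image_id, tf), ...]} from {image_id: {word: tf}}."""
    index = defaultdict(list)
    for image_id, tf in image_tfs.items():
        for word, value in tf.items():
            index[word].append((image_id, value))
    return dict(index)
```

At query time only the posting lists of the words present in the query image need to be visited, which is what makes the inverted file fast for sparse vectors.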
![Page 33: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/33.jpg)
Experiment Evaluation
• Evaluation Metrics
  – Image matching is based on the cosine between these quantized vectors
  – This method can ensure in-time retrieval, and has proven to be very useful
  – If the cosine similarity between two image vectors is larger than the chosen threshold, this pair of images is called a match
  – All of the images are ranked by matching degree
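The matching rule above can be sketched with sparse term-frequency vectors stored as dictionaries. The cosine is treated here as a similarity score, so a larger value means a closer match. The function names and the dictionary representation are assumptions made for this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors given as {word: value}."""
    dot = sum(value * v.get(word, 0.0) for word, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_matches(query, database, threshold):
    """Return (image_id, score) pairs scoring above threshold, best match first."""
    scored = [(image_id, cosine_similarity(query, vec))
              for image_id, vec in database.items()]
    matches = [pair for pair in scored if pair[1] > threshold]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches
```

The threshold value is left to the caller, since the slides only say that it is chosen and do not state it.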
![Page 34: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/34.jpg)
Experiment Evaluation
• Evaluation Metrics
  – To describe the image ranking sequence of image retrieval in this data set, the average retrieval precision is adopted
  – $q$ is the query image, $p_i$ denotes each image in the ranking result, and $n$ is 20
![Page 35: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/35.jpg)
Experiment Evaluation
• Evaluation Metrics
![Page 36: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/36.jpg)
Experiment Evaluation
• Experimental Results and Discussion
  – The attention model based SIFT keypoints filtration algorithm (AF-SIFT) is compared to the standard SIFT and PCA-SIFT
  – The dimension of standard SIFT is 128
    • A 4*4 array of histograms, each with 8 orientation bins
  – The PCA-SIFT descriptor dimension for each keypoint is 36
![Page 37: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/37.jpg)
Experiment Evaluation
• Experimental Results and Discussion
  – Two variants are used to compare its performance
    1. AF-SIFT1 uses 128-dimension descriptors in the standard way
    2. AF-SIFT2 uses a 2*2 array with 8 orientation bins, and its dimension is 32
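The descriptor dimensions quoted for the two variants follow directly from the histogram layout: a grid of histograms, each with a fixed number of orientation bins. A short worked check, with a hypothetical helper name:

```python
def sift_descriptor_dim(grid_size, orientation_bins=8):
    """Descriptor length for a grid_size x grid_size array of orientation histograms."""
    return grid_size * grid_size * orientation_bins

standard_dim = sift_descriptor_dim(4)  # 4*4 array, 8 bins each
af_sift2_dim = sift_descriptor_dim(2)  # 2*2 array, 8 bins each
```

So the 2*2 layout stores a quarter of the values of the standard 4*4 layout per keypoint.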
![Page 38: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/38.jpg)
Experiment Evaluation
• Experimental Results and Discussion
– The filtering algorithm is somewhat time-consuming, but this processing is completed off-line
– It effectively reduces the background features, so it in fact decreases the overall computation time
![Page 39: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/39.jpg)
Experiment Evaluation
• Experimental Results and Discussion
  – ALOI has little background clutter
    • images vary in illumination or viewpoint
  – Movie frames show little obvious difference between foreground and background
  – Coral Gallery consists of nature photos with cluttered backgrounds
![Page 40: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/40.jpg)
Experiment Evaluation
• Experimental Results and Discussion
![Page 41: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/41.jpg)
Experiment Evaluation
• Experimental Results and Discussion
  – The filtered SIFT keypoints carry information about the saliency regions
  – The ranking of keypoints is based on the global distribution, and does not rely only on local patches
  – The most distinctive keypoints are kept
  – Interference from background features is avoided
  – The clustering result becomes more exact
![Page 42: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/42.jpg)
Experiment Evaluation
• Experimental Results and Discussion
  – Fig. 9 shows how the matching reliability varies as a function of N
  – N denotes the number of SIFT keypoints remaining after the filtration
  – A good tradeoff between accuracy and speed should be achieved
![Page 43: Attention Model Based SIFT Keypoints Filtration for Image Retrieval](https://reader035.vdocuments.us/reader035/viewer/2022062813/568164c3550346895dd6d930/html5/thumbnails/43.jpg)
Conclusion
• AF-SIFT provides an effective alternative to standard SIFT
• Region-based image retrieval
• Seeking ways to apply this idea to large image database retrieval