
CISC 7610 Lecture 9: Image retrieval

Topics:

● How hard is computer vision?

● Image retrieval tasks

● Indexing methods

● Query by image: near-exact match

● Classical image classification

● Convolutional neural network classification

● Image retrieval corpora

How hard is computer vision?

Zitnik, U. Washington, CSE P 576: Computer Vision, Lecture 1, https://courses.cs.washington.edu/courses/csep576/11sp/pdf/Intro.pdf

Marvin Minsky, MIT. Turing Award, 1969

“In 1966, Minsky hired a first-year undergraduate student and assigned him a problem to solve over the summer: connect a television camera to a computer and get the machine to describe what it sees.” (Crevier 1993, pg. 88)


How hard is computer vision?

Gerald Sussman, MIT. Panasonic Professor of Electrical Engineering

“You’ll notice that Sussman never worked in vision again!” – Berthold Horn


Image retrieval tasks

● Query by description

● Query by image: near-exact matches

● Query by image: similar images

● Desired properties of an image retrieval system

Query by image: similar images – Google image search

Query by image: near-exact matches – Amazon A9 Flow

Desired properties of an image retrieval system

● Invariance to – rotation, scaling, cropping

● Decoupling of – illumination, pose, background, occlusion, intra-class variability, viewpoint

Lexing Xie, Columbia EE6882 Lecture 2http://www.ee.columbia.edu/~sfchang/course/svia/slides/lecture2.pdf

Image indexing methods

● Text around images
– Captions, articles, descriptions, metadata

● Folksonomy / human tags
– Provided by people to organize their own photos

● Games with a purpose
– Provide additional incentive for humans to label images

● Autotagging: automatically classify images
– Hardest, but most scalable

Text around images

Games with a purpose: ESP Game, Google Image Labeler

Autotagging: automatic classification – Behold image search

Query by image: near-exact match – SIFT features

● Compute salient points in image

● Characterize them with invariant features

● Index them with a text search engine

● Enforce geometric constraints after retrieval

Rueger, “Multimedia Information Retrieval” Lecture 2 www.nii.ac.jp/userimg/lectures/20120319/Lecture2.pdf
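The four steps above can be sketched as a bag-of-visual-words index: quantize each local descriptor to its nearest "visual word" in a codebook, then index images with an inverted file exactly as a text search engine indexes terms. This is a minimal sketch with a hypothetical toy codebook; the final geometric-verification step is omitted.

```python
# Bag-of-visual-words indexing sketch. The codebook, image ids, and
# 2-D "descriptors" below are toy stand-ins for real SIFT descriptors.
from collections import defaultdict

def quantize(descriptor, codebook):
    """Map a descriptor to the id of its nearest codebook entry (a visual word)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(descriptor, codebook[i]))

def build_inverted_index(image_descriptors, codebook):
    """image_descriptors: {image_id: [descriptor, ...]} -> {word_id: {image_id, ...}}"""
    index = defaultdict(set)
    for image_id, descriptors in image_descriptors.items():
        for d in descriptors:
            index[quantize(d, codebook)].add(image_id)
    return index

def query(descriptors, codebook, index):
    """Rank images by how many of the query's visual words they contain."""
    votes = defaultdict(int)
    for d in descriptors:
        for image_id in index.get(quantize(d, codebook), ()):
            votes[image_id] += 1
    return sorted(votes, key=votes.get, reverse=True)
```

In a full system the top-ranked candidates would then be re-checked with geometric constraints (consistent rotation/scale between matched keypoints) before being returned.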

SIFT: Scale-Invariant Feature Transform

● Image features that can be used to match different views of the same object

● Robust to substantial changes in illumination, scale, rotation, viewpoint, noise

● Lowe, D.G. (2004). “Distinctive Image Features from Scale-Invariant Keypoints.” International Journal of Computer Vision, 60, 2, pp. 91-110.

SIFT Algorithm

● Detect scale space extrema

● Localize candidate keypoints

● Assign an orientation to each keypoint

● Produce keypoint descriptor

Detect scale space extrema: Scale space


● Representation of the image at progressively smaller sizes

● Provides invariance to size of object / image

● Repeatedly smooth and shrink image
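The smooth-and-shrink loop can be sketched directly. A 3x3 box blur stands in here for the Gaussian blur SIFT actually uses, and the number of levels is an arbitrary assumption.

```python
# Minimal scale-space pyramid: repeatedly smooth the image, then halve it.
import numpy as np

def box_blur(img):
    """3x3 box blur with edge padding -- a crude stand-in for Gaussian smoothing."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy: 1 + dy + img.shape[0],
                          1 + dx: 1 + dx + img.shape[1]]
    return out / 9.0

def pyramid(img, levels=3):
    """Return a list of progressively smoothed-and-shrunk images."""
    scales = [img.astype(float)]
    for _ in range(levels - 1):
        blurred = box_blur(scales[-1])
        scales.append(blurred[::2, ::2])  # keep every other row and column
    return scales
```

Because each level is half the size of the previous one, a feature that fills the original image occupies only a few pixels at the coarsest level, which is what provides the invariance to object size.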


Detect scale space extrema: Example smoothed images

Detect scale space extrema: Compute differences between scales

[Figure: Gaussian images at increasing blur within one scale octave, and the difference-of-Gaussian images formed by subtracting adjacent pairs]

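The subtraction itself is one line: each difference-of-Gaussians image is an adjacent pair of smoothed images subtracted. Here `smoothed` is assumed to be a list of equal-sized images at increasing blur levels within one octave.

```python
# Difference-of-Gaussians: subtract adjacent blur levels within an octave.
import numpy as np

def difference_of_gaussians(smoothed):
    """smoothed: list of equal-sized images at increasing blur. Return adjacent differences."""
    return [smoothed[i + 1] - smoothed[i] for i in range(len(smoothed) - 1)]
```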

Detect scale space extrema: Example difference images



Localize candidate keypoints

● Seek extrema in x and y, but also in scale

● That is, the scale just before a feature is blurred away by the smoothing

● Find points greater than all of their neighbors
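The "greater than all neighbors" test compares each point against the 26 neighbors in the 3x3x3 block around it in the difference-of-Gaussians stack (x, y, and scale). This sketch checks maxima only; full SIFT handles minima symmetrically.

```python
# Candidate keypoints: strict local maxima over x, y, and scale.
import numpy as np

def find_maxima(dog):
    """dog: 3D array (scale, y, x). Return list of (s, y, x) strict local maxima."""
    keypoints = []
    S, H, W = dog.shape
    for s in range(1, S - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                block = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                center = dog[s, y, x]
                # keep only points strictly greater than all 26 neighbors
                if center == block.max() and (block == center).sum() == 1:
                    keypoints.append((s, y, x))
    return keypoints
```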


Assign an orientation to each keypoint and produce descriptor

● Find “orientation” at each pixel

● Compute histogram of these orientations over pixels around the keypoint

● Align it to the dominant direction

● Provides robustness to rotation, pose, lighting
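The orientation steps above can be sketched as follows: estimate a gradient direction at each pixel in a patch around the keypoint, accumulate a magnitude-weighted histogram of those directions, and take the dominant bin. The 36-bin choice follows Lowe (2004); the rest is a simplified sketch.

```python
# Orientation assignment sketch: dominant gradient direction of a patch.
import math

def dominant_orientation(patch, bins=36):
    """patch: 2D list of intensities. Return dominant gradient direction in degrees."""
    hist = [0.0] * bins
    h, w = len(patch), len(patch[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]  # central differences
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle // (360.0 / bins)) % bins] += magnitude
    return hist.index(max(hist)) * (360.0 / bins)
```

Rotating the patch so this dominant direction points "up" before computing the descriptor is what makes the final descriptor rotation-invariant.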

SIFT Retrieval example

(Lowe, 2004)

Classical image tagging: Features

● Color features
– Color histograms

– Color histograms in other color spaces

● Texture features
– Tamura texture features

Grayscale histograms

Rueger, “Multimedia Information Retrieval” Lecture 5 www.nii.ac.jp/userimg/lectures/20120319/Lecture5.pdf

3D Color histograms

● Count how many times each color appears

● Usually want to quantize colors first

● Ignores where in the image each color appears

Rueger, “Multimedia Information Retrieval” Figure 3.3. Morgan & Claypool: 2010.
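Both steps above (quantize, then count, ignoring position) fit in a few lines. The choice of 4 quantization levels per channel is an assumption for illustration.

```python
# 3D color histogram: quantize each RGB channel, count quantized colors.
from collections import Counter

def color_histogram_3d(pixels, levels=4):
    """pixels: iterable of (r, g, b) with values 0-255. Return Counter over 3D bins."""
    step = 256 // levels
    return Counter(
        (min(r // step, levels - 1),
         min(g // step, levels - 1),
         min(b // step, levels - 1))
        for r, g, b in pixels
    )
```

Note that two images with the same colors in completely different arrangements produce identical histograms, which is exactly the position-invariance (and the weakness) mentioned above.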

Color histogram example

● Draw a 3D color histogram for the following image

● Draw a color histogram for each channel

● Which one better characterizes the content?

R   G   B
0   0   0    black
255 0   0    red
0   255 0    green
0   0   255  blue
0   255 255  cyan
255 0   255  magenta
255 255 0    yellow
255 255 255  white


Color histograms in other color spaces: HSL, HSV

● Hue-Saturation-Lightness / Value

● Separates color into more meaningful axes

● Hue: color

● Saturation: purity / intensity of the color

● Lightness / Value: black / white balance
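The RGB-to-HSV mapping can be demonstrated with the standard library's colorsys module, which works on channel values in [0, 1]; the wrapper below rescales from 0-255 inputs and reports hue in degrees.

```python
# RGB (0-255) -> HSV using the standard library.
import colorsys

def rgb255_to_hsv(r, g, b):
    """Return (hue in degrees, saturation, value) for 0-255 RGB inputs."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v
```

Pure red maps to hue 0, pure green to hue 120, both fully saturated at full value, illustrating how hue isolates "which color" from "how light" and "how pure".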

Tamura texture features

● Texture is a property of image regions, not pixels

● Perceptual experiments yielded a small set of descriptors that capture how people see texture

● Can attempt to replicate those computationally


Tamura texture features

● Compute texture features on image

● Create 3D histogram like color histogram

Rueger, “Multimedia Information Retrieval” Figure 3.5. Morgan & Claypool: 2010.

[Figure: example textures varying in coarseness, contrast, and directionality]
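As a concrete example, one of the Tamura descriptors, contrast, is commonly formulated as sigma / kurtosis^(1/4) over a region's intensities: large when intensities are spread out, zero for flat regions. This sketch computes only contrast; coarseness and directionality are omitted.

```python
# Tamura contrast sketch: sigma / kurtosis**(1/4) over a region's intensities.
import math

def tamura_contrast(values):
    """values: iterable of grayscale intensities for one image region."""
    vals = list(values)
    n = len(vals)
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / n
    if var == 0:
        return 0.0  # flat region: no contrast
    mu4 = sum((v - mean) ** 4 for v in vals) / n
    kurtosis = mu4 / var ** 2
    return math.sqrt(var) / kurtosis ** 0.25
```

Computing such a feature per region and histogramming the results gives the 3D texture histogram described above, directly analogous to the color histogram.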

Classical image tagging: Classification

Shih-Fu Chang, Columbia EE6882 Lecture 1http://www.ee.columbia.edu/~sfchang/course/svia/slides/lecture1.pdf

Modern image tagging: Convolutional neural networks

● Combination of filtering with pooling

● Filters are learned to optimize classification

● Online demos:
– http://yann.lecun.com/exdb/lenet/

– http://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html
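The "filtering with pooling" combination can be sketched in a few lines. The 3x3 vertical-edge filter below is hand-picked for illustration; in a real CNN these weights are learned to optimize classification.

```python
# One CNN building block: convolve with a filter, then max-pool.
import numpy as np

def conv2d(img, kernel):
    """Valid 2D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

def max_pool(img, size=2):
    """Non-overlapping max pooling; trims edges that don't fill a window."""
    H, W = img.shape
    H, W = H - H % size, W - W % size
    return img[:H, :W].reshape(H // size, size, W // size, size).max(axis=(1, 3))
```

Pooling keeps the strongest filter response in each window, so the feature map shrinks while the evidence that an edge was present somewhere in the window survives, giving some tolerance to small translations.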

Image retrieval corpora

● Pascal visual object classes (VOC)

● Imagenet

● MS common objects in context (COCO)

● Places2

Pascal visual object classes

● 20 Categories

● 50k images

● Localize and classify objects

● Ran 2007-2012

ImageNet – http://www.image-net.org

● 1000 categories

● 1.2 million images

● Images of nouns in WordNet

● Several related challenges

MS Common Objects in Context (COCO) – http://mscoco.org/

● 91 object types that would be easily recognizable by a 4-year-old

● 330k images, 2.5 million labeled instances

● Objects in real context

Places2 – http://places2.csail.mit.edu/

● Recognize places / scenes, not objects

● Setting for where objects will appear

● 400 scene types, 10M images

Summary

● Computer vision is hard

● Labels can come directly from humans or via autotagging models

● Fingerprinting supports near-exact matching

● Classical image classification uses hand-designed features with a learned classifier

● Convolutional neural networks learn both the features and the classifiers

● Several large image retrieval corpora have recently been released
