global and high-level image descriptions
![Page 1: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/1.jpg)
Global and high-level image descriptions
Presented by Yao Pan
![Page 2: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/2.jpg)
Low-level descriptor
• Pixel
• Patch
• Mathematical transformation (Laplace, Gauss) of pixel and patch (SIFT, GIST…)
![Page 3: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/3.jpg)
High-level task: object recognition, scene classification
Semantic gap
Low-level image feature: pixel intensity, gradient
![Page 4: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/4.jpg)
Analogy to text analysis
We want to determine which author wrote an article, or what type of content it is (science, politics, entertainment).
Levels of description, from high to low: meaning group, sentence, phrase, word, letter frequency
![Page 5: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/5.jpg)
Overview
• Modeling the shape of the scene: a Holistic Representation of the Spatial Envelope. A. Oliva and A. Torralba. IJCV 2001
• Efficient Object Category Recognition Using Classemes. Lorenzo Torresani, Martin Szummer, Andrew Fitzgibbon. ECCV 2010
• Objects as Attributes for Scene Classification. Li-Jia Li*, Hao Su*, Yongwhan Lim, Li Fei-Fei. ECCV 2010
• Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification. L-J. Li, H. Su, E. Xing, L. Fei-Fei. NIPS 2010
![Page 6: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/6.jpg)
Modeling the shape of the scene: a Holistic Representation of the Spatial Envelope
A. Oliva and A. Torralba. IJCV 2001
![Page 7: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/7.jpg)
Motivation
Scene categorization
One way of doing this is to segment and detect the objects in the picture, then classify the scene according to which objects the picture contains.
But segmentation and object detection are hard problems.
Picture from: J. Yao, S. Fidler and R. Urtasun
![Page 8: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/8.jpg)
Motivation
Experiment in cognitive psychology (Mary C. Potter, 1975, Science)
• Subjects were presented a target scene picture or a scene name beforehand.
• Then they were presented a sequence of pictures at rates up to 8 per second. They were asked to press the button when they saw the target.
• Detection rates were surprisingly high (more than 90%).
![Page 9: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/9.jpg)
Motivation
• Subsequent experiments imply that object information might be ignored during rapid categorization of scenes.
• Humans use holistic visual features (spatial layout, spatial structure, shape of the scene...).
• In this paper, the authors term them the Spatial Envelope.
• Scenes belonging to the same category share a similar spatial structure that can be extracted without segmenting the image.
![Page 10: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/10.jpg)
Approach
What is a scene?
• Traditionally: an unconstrained configuration of objects.
• In this paper: treated as an individual object.
![Page 11: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/11.jpg)
What exactly is the spatial envelope?
Spatial Envelope Properties
• Naturalness
  • Straight horizontal and vertical lines in man-made scenes vs. textured zones in natural landscapes.
• Openness
• Roughness
• Expansion
• Ruggedness
Goal: find a low-dimensional scene space in which scenes of the same category are projected close together.
![Page 12: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/12.jpg)
Naturalness
Natural vs. man-made
Slides credit: scene understanding seminar
![Page 13: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/13.jpg)
Openness
• Decreases as the number of boundary elements increases
Slides credit: scene understanding seminar
![Page 14: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/14.jpg)
Roughness
• Size of elements at each spatial scale
Slides credit: scene understanding seminar
![Page 15: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/15.jpg)
Expansion (mainly for man-made scenes)
A flat view of a building would have a low degree of Expansion. A street with long vanishing lines would have a high degree of Expansion.
Slides credit: scene understanding seminar
![Page 16: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/16.jpg)
Expansion
Follow-up: Depth estimation from image structure. A. Torralba, A. Oliva, 2003
![Page 17: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/17.jpg)
Ruggedness (mainly for natural scenes)
• Deviation of the ground relative to the horizon
Slides credit: scene understanding seminar
![Page 18: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/18.jpg)
Approach
How can these abstract concepts be translated into computable mathematical values?
• Discrete Fourier transform (DFT)
• Windowed DFT
• PCA
![Page 19: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/19.jpg)
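Approach
Discrete Fourier transform (DFT)
DFT of an N × N image:
$$I(f_x, f_y) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} i(x, y)\, e^{-j 2\pi (f_x x + f_y y)/N}$$
where i(x, y) is the intensity distribution of the image.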
![Page 20: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/20.jpg)
What is the Fourier transform of an image?
[Figure: Fourier transform of a signal; Fourier transform of an image, showing the original image and its amplitude spectrum]
![Page 21: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/21.jpg)
Fourier transform of an image
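Polar form: $I(f_x, f_y) = A(f_x, f_y)\, e^{j\Phi(f_x, f_y)}$, with amplitude spectrum A and phase Φ.
The origin represents the DC (zero-frequency) component.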
![Page 22: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/22.jpg)
Fourier transform of an image
Keep low frequencies only: image detail is lost
Keep high frequencies only: smooth intensity gradients are lost
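The effect of keeping only one frequency band can be sketched with NumPy's FFT. This is a minimal illustration, assuming a grayscale image stored as a 2-D array; the function name `frequency_filter`, the circular cutoff mask, and the synthetic test image are all hypothetical choices, not taken from the slides.

```python
import numpy as np

def frequency_filter(image, cutoff, keep="low"):
    """Keep only the low or the high spatial frequencies of a 2-D grayscale image."""
    rows, cols = image.shape
    # Shift the spectrum so that the DC component sits at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Radial distance of every frequency bin from the DC component.
    fy = np.arange(rows) - rows // 2
    fx = np.arange(cols) - cols // 2
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    mask = radius <= cutoff if keep == "low" else radius > cutoff
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real

# Synthetic image (an assumption): a smooth ramp plus a fine checkerboard texture.
y, x = np.mgrid[0:64, 0:64]
image = x / 63.0 + 0.2 * ((x + y) % 2)

low = frequency_filter(image, cutoff=8, keep="low")    # blurred: fine texture removed
high = frequency_filter(image, cutoff=8, keep="high")  # detail only: mean intensity removed
```

Because the two masks are complementary, `low + high` reproduces the original image, which makes the split easy to inspect.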
![Page 23: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/23.jpg)
Approach
Different scene categories have different spectral signatures
• Amplitude captures roughness
• Orientation captures dominant edges
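One rough way to see how orientation shows up in the spectrum is to sum the spectral energy per orientation bin. The sketch below is only an illustration: the function `orientation_energy`, the four-bin split, and the striped test images are assumptions, and this is not the discriminant spectral template used in the paper.

```python
import numpy as np

def orientation_energy(image, n_bins=4):
    """Fraction of spectral energy in each orientation bin of the amplitude spectrum."""
    rows, cols = image.shape
    # Remove the mean so the DC component does not dominate the energy.
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(image - image.mean())))
    fy = (np.arange(rows) - rows // 2)[:, None]
    fx = (np.arange(cols) - cols // 2)[None, :]
    # Orientation of each frequency bin, folded into [0, pi) because the
    # spectrum of a real image is symmetric about the origin.
    theta = np.mod(np.arctan2(fy, fx), np.pi)
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    energy = np.bincount(bins.ravel(), weights=(amplitude ** 2).ravel(), minlength=n_bins)
    return energy / energy.sum()  # assumes the image is not constant

y, x = np.mgrid[0:64, 0:64]
vertical_stripes = np.sin(2 * np.pi * 4 * x / 64)    # intensity varies along x
horizontal_stripes = np.sin(2 * np.pi * 4 * y / 64)  # intensity varies along y

sig_v = orientation_energy(vertical_stripes)    # energy concentrates near theta = 0
sig_h = orientation_energy(horizontal_stripes)  # energy concentrates near theta = pi/2
```

Vertical edges put their energy on the horizontal frequency axis and vice versa, so the two stripe patterns yield clearly different signatures.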
![Page 24: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/24.jpg)
Experiment
• 8 categories.
  • Natural: Coast, Country, Forest, Mountain
  • Man-made: Highway, Street, Close-up, Tall building
• Choose 400 target images from the database, with the first 7 neighbors retrieved for each.
• Neighbors are defined by Euclidean distance between attribute vectors.
![Page 25: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/25.jpg)
![Page 26: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/26.jpg)
Experiment
• Scenes were considered correctly recognized when at least 4 of the 7 neighbors had the same category membership.
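The recognition criterion can be sketched as a nearest-neighbor vote. This is a minimal sketch under assumptions: the function name `recognized_by_neighbors`, the toy 2-D attribute vectors, and the labels are hypothetical and only illustrate the "at least 4 of 7 neighbors" rule with Euclidean distance.

```python
import numpy as np

def recognized_by_neighbors(attributes, labels, target_idx, k=7, min_agree=4):
    """True if at least `min_agree` of the k nearest neighbors share the target's label."""
    distances = np.linalg.norm(attributes - attributes[target_idx], axis=1)
    distances[target_idx] = np.inf  # exclude the target itself
    neighbors = np.argsort(distances)[:k]
    agree = np.sum(labels[neighbors] == labels[target_idx])
    return bool(agree >= min_agree)

# Toy data (an assumption): two well-separated clusters of 8 points each.
cluster_a = np.column_stack([0.1 * np.arange(8), np.zeros(8)])
cluster_b = np.column_stack([10 + 0.1 * np.arange(8), np.full(8, 10.0)])
attributes = np.vstack([cluster_a, cluster_b])

clean_labels = np.array([0] * 8 + [1] * 8)  # labels follow the clusters
mixed_labels = np.array([0, 1] * 8)         # labels alternate inside each cluster

ok = recognized_by_neighbors(attributes, clean_labels, target_idx=0)
not_ok = recognized_by_neighbors(attributes, mixed_labels, target_idx=0)
```

With the clean labels all 7 neighbors agree with the target; with the alternating labels only 3 do, so the target counts as misrecognized.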
![Page 27: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/27.jpg)
Experiment
![Page 28: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/28.jpg)
Experiment
Confusion matrix for natural scenes
Confusion matrix for man-made scenes
Average accuracy: WDST: 92%, DST: 86%
![Page 29: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/29.jpg)
Limitation
• Primarily for man-made vs. natural differences.
• Coarse-grained classification
![Page 30: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/30.jpg)
Efficient Object Category Recognition Using Classemes
Lorenzo Torresani, Martin Szummer, Andrew Fitzgibbon
![Page 31: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/31.jpg)
Motivation
Large-scale object category recognition
Requirements:
• Novel categories
  • Zero-shot learning: a possible category may not appear in the training examples. There are countless object categories, so it is impossible to cover them all in the training dataset.
• Compact descriptor
  • Disk vs. memory
• Simple classifier
![Page 32: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/32.jpg)
Motivation
Existing system:
• Attribute approach
• Categories are described by a set of boolean attributes

| | Has beak | Has tail | Near water |
|---|---|---|---|
| duck | √ | × | √ |

• Drawback: humans must label the training data, and for some categories it is hard to define attributes.
![Page 33: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/33.jpg)
Approach
• Classeme
• Represent an object as a combination of other object classes (classemes) to which it is related.
• These classemes are extracted automatically and do not necessarily carry semantic meaning.
![Page 34: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/34.jpg)
Approach
• Classeme learning
• Choose a set of category labels from the Large Scale Concept Ontology for Multimedia (LSCOM): C = 2659 categories in total.
• Learn a one-versus-all classifier for each category (by multiple kernel learning).
• An image x is represented as the vector $f(x) = [\phi_1(x), \ldots, \phi_C(x)]$ of classifier outputs.
• To achieve compactness, the vector is not stored in double precision but quantized to Q levels (1 to 4 bits).
• After getting the representation, apply a classification method such as SVM…
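The descriptor construction and quantization can be sketched as follows. This is a simplified illustration, not the paper's implementation: linear scorers with random weights stand in for the multiple-kernel classifiers, the input feature dimension of 16 is arbitrary, and the names `classeme_vector` and `quantize` are hypothetical.

```python
import numpy as np

def classeme_vector(x, weights, biases):
    """Stack the outputs of C one-versus-all classifiers into one vector f(x)."""
    # Assumption: linear scorers replace the kernel classifiers for brevity.
    return weights @ x + biases

def quantize(f, bits=1):
    """Quantize real-valued classifier outputs to 2**bits levels."""
    if bits == 1:
        return (f > 0).astype(np.uint8)  # keep only the sign of each output
    levels = 2 ** bits
    lo, hi = f.min(), f.max()
    if hi == lo:
        return np.zeros(f.shape, dtype=np.uint8)
    scaled = (f - lo) / (hi - lo) * (levels - 1)
    return np.round(scaled).astype(np.uint8)

rng = np.random.default_rng(0)
C, D = 2659, 16                        # C from the slide; D is an arbitrary assumption
weights = rng.standard_normal((C, D))  # random stand-ins for trained classifiers
biases = np.zeros(C)
x = rng.standard_normal(D)             # stand-in for low-level image features

f = classeme_vector(x, weights, biases)
f_1bit = quantize(f, bits=1)
f_4bit = quantize(f, bits=4)
```

The quantized vector can then be fed to any standard classifier in place of the real-valued one.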
![Page 35: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/35.jpg)
Approach
![Page 36: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/36.jpg)
Approach
Example classemes: sky, crying
Get 150 training images for each classeme from the bing.com image search engine.
![Page 37: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/37.jpg)
Experiment 1: Multiclass classification
Dataset: Caltech256
• 256 categories, 30607 images.
Competitors:
• Multiclass SVM
• Neural network
• Decision forests
• Nearest neighbour
• LP-β
  • Combines multiple complementary features (color-based, shape-based, texture-based) and learns the weights for the different features.
![Page 38: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/38.jpg)
Experiment 1: Multiclass classification
Accuracy comparison
• Accuracy: 36% for classemes versus 42% for the competing method
• But classemes are much faster
![Page 39: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/39.jpg)
Experiment 1: Multiclass classification
Accuracy comparison
![Page 40: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/40.jpg)
Experiment 1: Multiclass classification
Speed comparison
Over two orders of magnitude faster!
![Page 41: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/41.jpg)
Experiment 1: Multiclass classification
Over two orders of magnitude faster!
Compactness comparison
![Page 42: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/42.jpg)
Objects as Attributes for Scene Classification
Li-Jia Li*, Hao Su*, Yongwhan Lim, Li Fei-Fei
Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
L-J. Li, H. Su, E. Xing, L. Fei-Fei.
![Page 43: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/43.jpg)
Motivation
More of an action recognition task than a scene classification task?
![Page 44: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/44.jpg)
High-level task: object recognition, scene classification
Semantic gap
Low-level image feature: pixel intensity, gradient
![Page 45: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/45.jpg)
Approach
After we get the OB representation for each image, we can use any machine learning method for classification. (In this paper, SVM and logistic regression are chosen.)
![Page 46: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/46.jpg)
Approach
Spatial pyramid representation
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid and J. Ponce
![Page 47: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/47.jpg)
Implementation Detail
How to choose the object bank? From where? How many?
Choose the most frequent objects from popular datasets (LabelMe, ESP, ImageNet, Flickr) and take their intersection. This results in 200 objects.
![Page 48: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/48.jpg)
Implementation Detail
The large number of objects brings the curse of dimensionality.
The second paper mainly deals with this computational problem.
In this paper, N = 200 object detectors compute responses at S = 12 scales over L = 3 spatial pyramid levels. So the total response for each image is: 200 × 12 × (1 + 4 + 16) = 50,400 ≈ 50,000 dimensions.
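The dimension count and the pooling over pyramid cells can be sketched as below. This is an illustrative sketch: taking the maximum response per cell is an assumption about the pooling step, and `pyramid_max_pool` and the 8 × 8 stand-in response map are hypothetical.

```python
import numpy as np

def pyramid_max_pool(response, levels=3):
    """Max-pool a 2-D detector response map over a spatial pyramid (1x1, 2x2, 4x4, ...)."""
    rows, cols = response.shape
    pooled = []
    for level in range(levels):
        cells = 2 ** level  # cells per side at this level
        row_edges = np.linspace(0, rows, cells + 1).astype(int)
        col_edges = np.linspace(0, cols, cells + 1).astype(int)
        for r in range(cells):
            for c in range(cells):
                cell = response[row_edges[r]:row_edges[r + 1],
                                col_edges[c]:col_edges[c + 1]]
                pooled.append(cell.max())
    return np.array(pooled)

n_objects, n_scales, n_levels = 200, 12, 3
cells_per_map = sum(4 ** level for level in range(n_levels))  # 1 + 4 + 16
total_dims = n_objects * n_scales * cells_per_map

# Stand-in for one detector's response map at one scale (an assumption).
response = np.arange(64, dtype=float).reshape(8, 8)
features = pyramid_max_pool(response, levels=n_levels)
```

Each (object, scale) response map contributes one such 21-value vector, and concatenating all of them gives the full descriptor.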
![Page 49: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/49.jpg)
Experiment
Datasets:
• 15-Scene: 15 natural scene classes
• LabelMe: 9 classes
• MIT Indoor: 67 indoor scenes
• UIUC sports: 8 complex event classes
![Page 50: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/50.jpg)
Experiment
15-Scene
![Page 51: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/51.jpg)
Experiment
LabelMe: 9 categories
beach, mountain, bathroom, church, garage, office, sail, street, forest
![Page 52: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/52.jpg)
Experiment
MIT indoor (67 categories):
![Page 53: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/53.jpg)
Experiment
UIUC sports (8 categories):
![Page 54: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/54.jpg)
Experiment
![Page 55: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/55.jpg)
Experiment
The performance gain is larger on the MIT Indoor and UIUC Sports datasets because these two are more complex.
• Similar texture (low-level) but different objects (semantic information).
• Confirms the effectiveness of Object Bank in high-level tasks.
![Page 56: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/56.jpg)
Experiment
Accuracy with growing object bank
Classification performance continuously increases when more objects are incorporated in the OB representation.
![Page 57: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/57.jpg)
Experiment
Comparison with classemes in object recognition
Not a fair comparison, because classemes were designed for speed.

| | OB | Classeme |
|---|---|---|
| Caltech256 | 39% | 36% |
![Page 58: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/58.jpg)
Efficiency comparison of Classeme and Object bank
| | Classeme | Object bank | Spatial envelope |
|---|---|---|---|
| Feature extraction per image | 0.4s | 7.2s | 0.1s |
| Feature descriptor size | 12KB for continuous, 626 bytes for binary | 236KB | 2KB |

Dataset: 50 images randomly selected from Caltech256.
What if we binarize the continuous values of object bank?
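As a rough sketch of what binarization would do to storage, the snippet below thresholds a descriptor to one bit per value and packs the bits. It assumes 32-bit floats and a zero threshold, uses random numbers in place of real detector responses, and says nothing about the effect on accuracy; `binarize` is a hypothetical helper.

```python
import numpy as np

def binarize(descriptor, threshold=0.0):
    """Threshold each value to one bit and pack 8 bits into each byte."""
    return np.packbits(descriptor > threshold)

rng = np.random.default_rng(0)
# Random stand-in with the object bank dimensionality 200 * 12 * (1 + 4 + 16).
descriptor = rng.standard_normal(200 * 12 * 21).astype(np.float32)

packed = binarize(descriptor)
ratio = descriptor.nbytes / packed.nbytes  # storage reduction factor
```

Under these assumptions the binary form is 32 times smaller than the 32-bit float form.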
![Page 59: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/59.jpg)
Spatial Envelope on MIT indoor
Dataset: a subset of MIT indoor containing 8 categories
Average classification accuracy

| Spatial envelope | By chance |
|---|---|
| 17.5% | 12.5% |
![Page 60: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/60.jpg)
Retrospect
Spatial envelope (2001): ignores object information.
• For coarse-grained scenes. Fast scene recognition.
Classemes (2010), Object bank (2010): utilize object information.
• For fine-grained scenes or object categories. Increased accuracy.
![Page 61: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/61.jpg)
Acknowledgement
• Dr. Devi Parikh
![Page 62: Global and high-level image descriptions](https://reader036.vdocuments.us/reader036/viewer/2022062501/5681636e550346895dd44a7c/html5/thumbnails/62.jpg)
• Questions?