Global and high-level image descriptions
Presented by Yao Pan
Low-level descriptor
• Pixel
• Patch
• Mathematical transformation (Laplace, Gauss) of pixel and patch (SIFT, GIST…)
High-level task: object recognition, scene classification
Semantic gap
Low-level image feature: pixel intensity, gradient
Analogy to text analysis
We want to classify which author wrote this article, or what type of content it is (science, politics, entertainment).
Letter frequency
• Meaning group
• Sentence
• Phrase
• Word
• Letter frequency
Overview
• Modeling the shape of the scene: a Holistic Representation of the Spatial Envelope. A. Oliva and A. Torralba. IJCV 2001
• Efficient Object Category Recognition Using Classemes. Lorenzo Torresani, Martin Szummer, Andrew Fitzgibbon. ECCV 2010
• Objects as Attributes for Scene Classification. Li-Jia Li*, Hao Su*, Yongwhan Lim, Li Fei-Fei. ECCV 2010
• Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification. L-J. Li, H. Su, E. Xing, L. Fei-Fei. NIPS 2010
Modeling the shape of the scene: a Holistic Representation of the Spatial Envelope
A. Oliva and A. Torralba. IJCV 2001
Motivation
Scene categorization
One way of doing this is to segment and detect the objects in the picture, then classify the scene according to which objects the picture contains.
But segmentation and object detection are hard problems.
Picture from: J. Yao, S. Fidler and R. Urtasun
Motivation
Experiment in cognitive psychology (Mary C. Potter, 1975, Science)
• Subjects were shown a target scene picture or a scene name beforehand.
• Then they were shown a sequence of pictures at rates of up to 8 per second and were asked to press a button when they saw the target.
• Detection rates were surprisingly high (more than 90%).
Motivation
• Subsequent experiments imply that object information might be ignored during rapid categorization of scenes.
• Humans use holistic visual features (spatial layout, spatial structure, shape of the scene...).
• In this paper, the authors term them the Spatial Envelope.
• Scenes belonging to the same category share a similar spatial structure that can be extracted without segmenting the image.
Approach
What is a scene?
• Traditionally: an unconstrained configuration of objects.
• In this paper: treated as an individual object.
What exactly is the spatial envelope?
Spatial Envelope Properties
• Naturalness
  • Straight horizontal and vertical lines in man-made scenes vs. textured zones in natural landscapes.
• Openness
• Roughness
• Expansion
• Ruggedness
Goal: find a low-dimensional scene space in which scenes of the same category are projected close together.
Naturalness
Natural vs. man-made
Slides credit: scene understanding seminar
Openness
• Decreases as the number of boundary elements increases
Slides credit: scene understanding seminar
Roughness
• Size of the elements at each spatial scale
Slides credit: scene understanding seminar
Expansion (mainly for man-made scenes)
A flat view of a building has a low degree of expansion. A street with long vanishing lines has a high degree of expansion.
Slides credit: scene understanding seminar
Expansion
Follow-up: Depth estimation from image structure. A. Torralba, A. Oliva, 2003
Ruggedness (mainly for natural scenes)
• Deviation of the ground relative to the horizon
Slides credit: scene understanding seminar
Approach
How do we translate these abstract concepts into computable mathematical values?
• Discrete Fourier transform (DFT)
• Windowed DFT
• PCA
Approach
Discrete Fourier transform (DFT)
DFT of an image:
where i(x,y) is the intensity distribution
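A minimal sketch of the idea, assuming the textbook 2-D DFT I(fx, fy) = Σ i(x,y) · exp(-j2π(fx·x + fy·y)/N) over an N x N image; the exact form used in the paper (for example, any windowing) may differ. The function names `dft2` and `amplitude` and the toy image are illustrative only.

```python
import cmath
import math

def dft2(image):
    """Naive 2-D DFT of a square N x N image given as a list of rows."""
    n = len(image)
    spectrum = [[0j] * n for _ in range(n)]
    for fy in range(n):
        for fx in range(n):
            total = 0j
            for y in range(n):
                for x in range(n):
                    angle = -2 * math.pi * (fx * x + fy * y) / n
                    total += image[y][x] * cmath.exp(1j * angle)
            spectrum[fy][fx] = total
    return spectrum

def amplitude(spectrum):
    """Amplitude spectrum: the magnitude of each complex coefficient."""
    return [[abs(value) for value in row] for row in spectrum]

# Toy 4 x 4 image with vertical stripes: intensity varies only along x.
image = [[1, 0, 1, 0] for _ in range(4)]
amp = amplitude(dft2(image))

# Energy sits at the DC term amp[0][0] and at the horizontal
# frequency fx = 2 (amp[0][2]); every other coefficient is ~0.
for row in amp:
    print([round(value, 3) for value in row])
```

The naive quadruple loop is only practical for tiny images; it is used here to keep the definition visible rather than for speed.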
What is the Fourier transform of an image?
Fourier transform of a signal
Fourier transform of an image
• Original image
• Fourier transform (amplitude spectrum)
Fourier transform of an image
Polar form:
The origin represents the DC (zero-frequency) component
Fourier transform of an image
• Keep low frequencies only: image detail is lost
• Keep high frequencies only: smooth gradients are lost
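The effect of keeping only part of the spectrum can be sketched on a toy image. This is an illustration under stated assumptions: a naive DFT and its inverse, a radial cutoff of 1 chosen arbitrarily, and hypothetical helper names (`dft2`, `keep`).

```python
import cmath
import math

def dft2(data, inverse=False):
    """Naive 2-D DFT (or inverse DFT) of a square N x N array."""
    n = len(data)
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            total = 0j
            for y in range(n):
                for x in range(n):
                    total += data[y][x] * cmath.exp(sign * 2j * math.pi * (u * x + v * y) / n)
            out[v][u] = total / (n * n) if inverse else total
    return out

def keep(spectrum, predicate):
    """Zero every coefficient whose radial frequency fails the predicate."""
    n = len(spectrum)
    def freq(k):
        return min(k, n - k)  # wrap-around distance from the DC term
    return [[spectrum[v][u] if predicate(math.hypot(freq(u), freq(v))) else 0j
             for u in range(n)] for v in range(n)]

def real_image(spectrum):
    """Inverse transform, keeping the (rounded) real part."""
    return [[round(p.real, 6) for p in row] for row in dft2(spectrum, inverse=True)]

image = [[1, 0, 1, 0] for _ in range(4)]   # vertical stripes
spectrum = dft2(image)

low = real_image(keep(spectrum, lambda r: r < 1))    # low frequencies only
high = real_image(keep(spectrum, lambda r: r >= 1))  # high frequencies only

print(low[0])   # [0.5, 0.5, 0.5, 0.5]: the stripes (detail) are gone
print(high[0])  # [0.5, -0.5, 0.5, -0.5]: stripes remain, mean brightness is gone
```

On this toy image the two reconstructions add back up to the original, which is one way to see that the split loses nothing overall.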
Approach
Different scene categories have different spectral signatures
• Amplitude captures roughness
• Orientation captures dominant edges
Experiment
• 8 categories
  • Natural: Coast, Country, Forest, Mountain
  • Man-made: Highway, Street, Close-up, Tall building
• Choose 400 target images from the database, with the first 7 neighbors for each.
• Neighbors are defined by the Euclidean distance between attributes.
Experiment
• A scene was considered correctly recognized when at least 4 of its neighbors had the same category membership.
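The evaluation rule above can be sketched as follows. This is a minimal illustration, assuming each image is a pair of (attribute vector, category label); the two-dimensional attribute values and the function names are made up for the example and are not taken from the paper.

```python
import math

def nearest_neighbors(target, database, k=7):
    """Return the k database entries closest to the target in Euclidean distance."""
    ranked = sorted(database, key=lambda item: math.dist(target[0], item[0]))
    return ranked[:k]

def correctly_recognized(target, database, k=7, min_agree=4):
    """True when at least min_agree of the k neighbors share the target's category."""
    neighbors = nearest_neighbors(target, database, k)
    agree = sum(1 for _, label in neighbors if label == target[1])
    return agree >= min_agree

# Hypothetical attribute vectors, for illustration only.
database = [
    ((0.10, 0.20), "coast"), ((0.20, 0.10), "coast"),
    ((0.15, 0.25), "coast"), ((0.30, 0.20), "coast"),
    ((0.25, 0.30), "mountain"),
    ((0.90, 0.80), "street"), ((0.80, 0.90), "street"),
    ((0.85, 0.70), "street"), ((0.70, 0.80), "street"),
]

print(correctly_recognized(((0.2, 0.2), "coast"), database))     # True
print(correctly_recognized(((0.2, 0.2), "mountain"), database))  # False
```

`math.dist` requires Python 3.8 or later.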
Experiment
Experiment
Confusion matrix for natural scenes
Confusion matrix for man-made scenes
Average accuracy: WDST 92%, DST 86%
Limitation
• Primarily captures man-made vs. natural differences.
• Coarse-grained classification
Efficient Object Category Recognition Using Classemes
Lorenzo Torresani, Martin Szummer, Andrew Fitzgibbon
Motivation
Large-scale object category recognition
Requirements:
• Novel categories
  • Zero-shot learning: a possible category may not appear among the training examples. With countless object categories, it is impossible to cover them all in the training dataset.
• Compact descriptor
  • Disk vs. memory
• Simple classifier
Motivation
Existing systems:
• Attribute approach
  • Categories are described by a set of boolean attributes

          Has beak   Has tail   Near water
  duck    √          ×          √

• Drawback: humans are needed to label the training data, and for some categories it is hard to extract attributes.
Approach
• Classeme
  • Represent an object as a combination of other object classes (classemes) to which it is related.
  • These classemes are extracted automatically and do not necessarily carry semantic meaning.
Approach
• Classeme learning
  • Choose a set of category labels from the Large Scale Concept Ontology for Multimedia: C = 2659 categories in total.
  • Learn a one-versus-all classifier $\phi_c$ for each category (by multiple kernel learning).
  • An image x is represented as the vector $f(x) = [\phi_1(x), \ldots, \phi_C(x)]$.
  • To achieve compactness, the vector is not stored in double precision but quantized to Q levels (1 bit to 4 bits).
  • After obtaining the representation, apply a classification method such as SVM…
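A minimal sketch of building and compressing such a descriptor. It makes several assumptions: the per-category classifiers are stand-in linear scorers (the slides say the real ones are learned by multiple kernel learning), 1-bit quantization is done by thresholding at 0, and all function names and weights are hypothetical.

```python
def make_linear_classifier(weights, bias):
    """Stand-in for a trained one-versus-all classifier (illustrative only)."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias

def classeme_vector(x, classifiers):
    """Stack the outputs of all C classifiers into one descriptor."""
    return [phi(x) for phi in classifiers]

def binarize(vector, threshold=0.0):
    """1-bit quantization: keep only whether each score exceeds the threshold."""
    return [1 if value > threshold else 0 for value in vector]

def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 entries per byte."""
    packed = bytearray((len(bits) + 7) // 8)
    for index, bit in enumerate(bits):
        if bit:
            packed[index // 8] |= 1 << (index % 8)
    return bytes(packed)

# Three toy classifiers over a 2-D low-level feature (C = 3 here).
classifiers = [
    make_linear_classifier([1.0, 0.0], -0.5),
    make_linear_classifier([0.0, -1.0], 1.0),
    make_linear_classifier([1.0, 1.0], -2.0),
]

x = [1.0, 2.0]
scores = classeme_vector(x, classifiers)   # [0.5, -1.0, 1.0]
bits = binarize(scores)                    # [1, 0, 1]
packed = pack_bits(bits)                   # one byte holds all three bits
print(scores, bits, packed)
```

The packed bytes can then be fed, after unpacking, to any standard classifier such as a linear SVM.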
Approach
Figure labels: sky, crying
Get 150 training images for each classeme from the bing.com image search engine.
Experiment 1: Multiclass classification
Dataset: Caltech256
• 256 categories, 30608 images.
Competitors:
• Multiclass SVM
• Neural network
• Decision forests
• Nearest neighbour
• LP-β
  • Combines multiple complementary features (color-based, shape-based, texture-based) and learns the weights for the different features.
Accuracy comparison
• Accuracy: 36% versus 42%
• But much faster
Speed comparison
Over two orders of magnitude faster!
Compactness comparison
Objects as Attributes for Scene Classification
Li-Jia Li*, Hao Su*, Yongwhan Lim, Li Fei-Fei
Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
L-J. Li, H. Su, E. Xing, L. Fei-Fei.
Motivation
More of an action recognition task than scene classification?
High-level task: object recognition, scene classification
Semantic gap
Low-level image feature: pixel intensity, gradient
Approach
After we get the OB representation for each image, we can use any machine learning method for classification (in this paper, SVM and logistic regression are chosen).
Approach
Spatial pyramid representation
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid and J. Ponce
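A rough sketch of how one detector response map could be summarized over a spatial pyramid. It assumes max pooling within each grid cell, which the slides do not state explicitly, and a square map whose side is divisible by the finest grid size; `pyramid_max_pool` is an illustrative name.

```python
def pyramid_max_pool(response, levels=3):
    """Max-pool a 2-D response map over a spatial pyramid.

    Level l splits the map into 2**l x 2**l cells, so 3 levels give
    1 + 4 + 16 = 21 pooled values. Assumes the map height and width
    are divisible by 2**(levels - 1).
    """
    height, width = len(response), len(response[0])
    features = []
    for level in range(levels):
        cells = 2 ** level
        cell_h, cell_w = height // cells, width // cells
        for cy in range(cells):
            for cx in range(cells):
                block = [response[y][x]
                         for y in range(cy * cell_h, (cy + 1) * cell_h)
                         for x in range(cx * cell_w, (cx + 1) * cell_w)]
                features.append(max(block))
    return features

# Toy 4 x 4 response map with values 0..15 in row-major order.
response = [[4 * row + col for col in range(4)] for row in range(4)]
features = pyramid_max_pool(response)

print(len(features))   # 21
print(features[:5])    # [15, 5, 7, 13, 15]: whole map, then the four quadrants
```

Repeating this for every object detector and every scale, then concatenating the results, would give one long descriptor per image.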
Implementation Detail
How to choose the object bank? From where? How many?
Choose the most frequent objects from popular datasets (LabelMe, ESP, ImageNet, Flickr) and take their intersection, resulting in 200 objects.
Implementation Detail
The large number of objects brings the curse of dimensionality.
The second paper mainly deals with the computation problem.
In this paper, N = 200 object detectors compute responses at S = 12 scales and L = 3 spatial pyramid levels. So the total response for each image has 200 × 12 × (1 + 4 + 16) = 50,400 ≈ 50,000 dimensions.
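The dimension count can be checked with a few lines, assuming pyramid level l contributes 4^l grid cells (1, 4, 16 for three levels); the function name is illustrative.

```python
def object_bank_dimension(num_objects, num_scales, pyramid_levels):
    """Descriptor length: one value per object, scale, and pyramid cell."""
    cells = sum(4 ** level for level in range(pyramid_levels))  # 1 + 4 + 16
    return num_objects * num_scales * cells

dimension = object_bank_dimension(num_objects=200, num_scales=12, pyramid_levels=3)
print(dimension)  # 50400
```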
Experiment
Datasets:
• 15-Scene: 15 natural scene classes
• LabelMe: 9 classes
• MIT Indoor: 67 indoor scenes
• UIUC Sports: 8 complex event classes
Experiment
15-Scene
Experiment
LabelMe: 9 categories
beach, mountain, bathroom, church, garage, office, sail, street, forest
Experiment
MIT Indoor (67 categories)
Experiment
UIUC Sports (8 categories)
Experiment
The performance gain is larger on the MIT Indoor and UIUC Sports datasets because these two are more complex.
• Similar texture (low-level) but different objects (semantic information)
• Confirms the effectiveness of Object Bank in high-level tasks.
Experiment
Accuracy with growing object bank
Classification performance increases continuously as more objects are incorporated into the OB representation.
Experiment
Comparison with classemes in object recognition
Not a fair comparison, because classemes were proposed for speed

              OB     Classeme
Caltech256    39%    36%
Efficiency comparison of Classeme and Object Bank

                               Classeme                                   Object Bank   Spatial envelope
Feature extraction per image   0.4 s                                      7.2 s         0.1 s
Feature descriptor size        12 KB (continuous), 626 bytes (binary)     236 KB        2 KB

Dataset: 50 images randomly selected from Caltech256.
What if we binarize the continuous values of Object Bank?
Spatial Envelope on MIT Indoor
Dataset: a subset of MIT Indoor containing 8 categories

Average classification accuracy
Spatial envelope    17.5%
By chance           12.5%
Retrospect
Spatial envelope (2001): ignores object information.
• For coarse-grained scenes. Fast scene recognition.
Classemes (2010), Object Bank (2010): utilize object information.
• For fine-grained scenes or object categories. Increased accuracy.
Acknowledgement
• Dr. Devi Parikh
Questions?