-
Ontology-based image representation and inference
Ning Xu
Advisor: Thomas Huang
UIUC
Many Slides from Shen-Fu Tsai, Derek Hoiem
-
Outline
• Traditional image representation and inference
• Ontology-based image representation and inference
  – What is ontology
  – Why use ontology
  – Research on ontology
    • Semantic hierarchic classifiers [Schmid’07]
    • Album event recognition [Tsai’11]
    • Ontological image annotation [Tsai’12]
• Conclusion and future work
-
Traditional image representation
• Labels/categories are treated independently.
Ex1. One-label problem: dog; bicycle; motorbike
Ex2. Multi-label problem:
– office, desk, chair, computer
– bedroom, bed, mirror, drawer
– outdoor, sunlight, tree, sky
-
Traditional image inference
[Figure: Examples → Image Features (LAB histogram, textons, bag of SIFT, HOG) → Classifier → Category label]
Slide from Derek Hoiem
-
Training phase
[Diagram: Training Images → Image Features; Image Features + Training Labels → Classifier Training → Trained Classifier]
Slide from Derek Hoiem
-
Testing phase
[Diagram: Training: Training Images → Image Features; Image Features + Training Labels → Classifier Training → Trained Classifier. Testing: Test Image → Image Features → Trained Classifier → Prediction (Outdoor)]
Slide from Derek Hoiem
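The training and testing phases can be sketched as follows. This is only a toy illustration, not the pipeline used in the slides: the feature is a coarse intensity histogram and the classifier is a nearest-centroid model, both chosen here as assumptions to keep the example self-contained. The function names (histogram, train, predict) are hypothetical.

```python
def histogram(pixels, bins=4):
    """Image features: normalized intensity histogram of values in [0, 256)."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // 256] += 1
    return [c / len(pixels) for c in counts]


def train(images, labels):
    """Training phase: average the feature vectors of each label (nearest-centroid model)."""
    sums, counts = {}, {}
    for image, label in zip(images, labels):
        feat = histogram(image)
        acc = sums.setdefault(label, [0.0] * len(feat))
        for k, value in enumerate(feat):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, image):
    """Testing phase: extract features, then return the label of the closest centroid."""
    feat = histogram(image)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feat, centroid))

    return min(model, key=lambda label: sq_dist(model[label]))


# Toy "images" given as flat lists of pixel intensities (made-up data).
train_images = [[200, 220, 250, 240], [210, 230, 255, 190],
                [10, 30, 50, 20], [40, 60, 5, 25]]
train_labels = ["outdoor", "outdoor", "indoor", "indoor"]
model = train(train_images, train_labels)
print(predict(model, [230, 245, 200, 180]))
```

Note that each label gets its own centroid and nothing links the labels to each other, which is the independence assumption the ontology-based methods try to remove.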
-
What is ontology
• Ontology
– Prior human knowledge, domain knowledge
– A set of concepts and their relations (part-of, is-a, co-occurrence, etc.) in some domain
Slide from Shen-Fu Tsai
Parmenides was among the first to propose an ontological characterization of the fundamental nature of reality.
http://en.wikipedia.org/wiki/Parmenides
-
General ontology structure
• scene: indoor, outdoor
• object: natural, artifact
• event: sports, social
Slide from Shen-Fu Tsai
-
Why use ontology
• Scalability
– Without an ontology, N concepts need N(N-1)/2 one-versus-one classifiers or N one-versus-rest classifiers;
– With an ontology, only about ceil(log2 N) classifiers need to be evaluated per image for N concepts;
– N can be quite large in real datasets (ImageNet, Flickr, etc.).
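The three counts can be compared numerically. The sketch below assumes a balanced binary hierarchy, so that ceil(log2 N) is the number of classifiers evaluated along one root-to-leaf path; the function name classifier_counts is hypothetical.

```python
def classifier_counts(n):
    """Return (one-vs-one, one-vs-rest, hierarchical) classifier counts for n concepts."""
    one_vs_one = n * (n - 1) // 2
    one_vs_rest = n
    # ceil(log2(n)) computed with integers to avoid floating-point rounding.
    hierarchical = (n - 1).bit_length()
    return one_vs_one, one_vs_rest, hierarchical


for n in (10, 1000):
    print(n, classifier_counts(n))
```

For N = 1000 this gives 499500 one-versus-one classifiers, 1000 one-versus-rest classifiers, and 10 evaluations along a path in the hierarchy.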
-
Why use ontology
• Independently trained concept classifiers are limited and can even be erroneous
[Figure: scatter plot of samples from three classes (x, o, Δ), with annotations 1 and 2]
-
Why use ontology
• An ontology makes the system more knowledgeable
– If we know object A is a sedan, then we also know A is a car, a vehicle, and a means of transportation, without training a classifier for each of these.
– If we can’t confidently say whether A is a sedan or an SUV, we can still label A as a car.
– It bridges the gap between low-level concepts and high-level ones.
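Both ideas (implied ancestors and backing off to a parent concept) can be sketched with a small is-a table. The table, the 0.7 confidence threshold, and the function names are assumptions made for illustration only.

```python
# Hypothetical is-a hierarchy (child -> parent); concept names follow the slide.
IS_A = {
    "sedan": "car",
    "suv": "car",
    "car": "vehicle",
    "vehicle": "means of transportation",
}


def ancestors(concept):
    """Return every concept implied by `concept` through is-a links, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def label_with_backoff(scores, threshold=0.7):
    """Return the best leaf label if confident, else the nearest ancestor shared by all candidates."""
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best
    shared = set.intersection(*(set(ancestors(c)) for c in scores))
    for concept in ancestors(best):
        if concept in shared:
            return concept
    return None


print(ancestors("sedan"))
print(label_with_backoff({"sedan": 0.55, "suv": 0.45}))
```

With the scores above, neither leaf reaches the threshold, so the label backs off to "car".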
-
Semantic hierarchies for image classification [Schmid’07]
• Basic idea: use semantic hierarchies to reflect the similarity among categories in terms of visual appearance.
-
First step: feature extraction
• Harris-Laplace detector and Laplacian detector
• SIFT descriptor and hue color descriptor (128D + 36D = 164D)
• Bag of words (1000D dictionary)
-
Choice of classifiers
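• SVM classifier with extended Gaussian kernel $K(H_i, H_j) = \exp\left(-\frac{1}{A} D(H_i, H_j)\right)$,
where $D(H_i, H_j) = \frac{1}{2} \sum_k \frac{(h_{ik} - h_{jk})^2}{h_{ik} + h_{jk}}$ is called the $\chi^2$ distance. $H_i = \{h_{ik}\}$ and $H_j = \{h_{jk}\}$ are the dictionary histograms of images i and j. A is the mean value of the distances between all training images.
• $D = \sum_n D_n$, where n indexes the channels.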
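A minimal sketch of this kernel follows. It assumes that D is the χ² distance between two histograms, and it takes the normalizer A as an argument rather than computing it from a training set; the function names are hypothetical.

```python
import math


def chi2_distance(h_i, h_j):
    """Chi-square distance between two histograms of equal length (assumed form of D)."""
    total = 0.0
    for a, b in zip(h_i, h_j):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def multi_channel_distance(channels_i, channels_j):
    """D = sum over channels n of D_n, one histogram per channel."""
    return sum(chi2_distance(a, b) for a, b in zip(channels_i, channels_j))


def extended_gaussian_kernel(distance, mean_distance):
    """K = exp(-D / A), where A is the mean distance between all training images."""
    return math.exp(-distance / mean_distance)


d = chi2_distance([1.0, 0.0], [0.0, 1.0])
print(d, extended_gaussian_kernel(d, mean_distance=1.0))
```

Identical histograms have distance 0 and kernel value 1; the kernel decays toward 0 as the histograms diverge.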
-
Second step: extract semantic graph
• WordNet contains over 80,000 noun synonym sets, called synsets.
• Two kinds of semantic relations are used: hypernymy/hyponymy (is-a) and holonymy/meronymy (part-of).
Wordnet: http://wordnetweb.princeton.edu/perl/webwn
-
Extracted subgraphs
-
Semantic graph pruning
• The part-of relation may permit reasoning that is incorrect from the point of view of visual appearance. E.g., a car has fuel, which is an organic material, but this does not imply similarity to a living organism such as a cat.
• Pruning: starting from the base node, reject the nodes that are not connected through the is-a relation graph.
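One way to read the pruning rule is as a reachability test over is-a edges only. The sketch below takes that reading as an assumption; the edge-list format, the toy graph, and the function name prune_to_is_a are all hypothetical.

```python
from collections import deque


def prune_to_is_a(edges, base):
    """Keep only the nodes reachable from `base` by following is-a edges.

    edges: iterable of (parent, child, relation) triples.
    """
    children = {}
    for parent, child, relation in edges:
        if relation == "is-a":
            children.setdefault(parent, []).append(child)
    kept = {base}
    queue = deque([base])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in kept:
                kept.add(child)
                queue.append(child)
    return kept


# Toy graph: "fuel" hangs off "car" only through a part-of edge, so it is rejected.
toy_edges = [
    ("entity", "vehicle", "is-a"),
    ("vehicle", "car", "is-a"),
    ("car", "fuel", "part-of"),
]
print(sorted(prune_to_is_a(toy_edges, "entity")))
```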
-
Third step: construct semantic hierarchic classifier
• Define the support of concept A as
• Train each Bi|A classifier, defined along the is-a and part-of relations, as a binary SVM classifier.
• Base node is supported by all training images.
• When support(A) = support(Bi), generate a trivial classifier with only one label.
-
Inference
• Given a test image, start from the base node;
• Descend to the linked concept when the classifier returns a positive answer.
• There may be multiple paths to one concept in the ontology; the final decision value is defined as
$d(c) = \max_{p \in P} \min_{e \in p} d(e)$
where c is the concept, v is the concept set containing c, s is the base node, P is the set of possible paths from s to v, e ranges over the edges of a path p in P, and d(e) is the decision value of the classifier on edge e. In other words, the maximum decision value over all possible paths is returned, whereas for a given path the minimum decision value over its edges is chosen.
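The max-over-paths, min-over-edges rule can be sketched as a recursion over the hierarchy. The sketch assumes the hierarchy is a directed acyclic graph and that each edge already carries the decision value of its classifier; the edge values and the function name decision_value are made up for illustration.

```python
def decision_value(edge_scores, base, target):
    """Maximum over all base-to-target paths of the minimum edge score on the path.

    edge_scores: dict mapping (parent, child) to the classifier decision value.
    Returns None if `target` cannot be reached from `base`.
    """
    children = {}
    for (parent, child), score in edge_scores.items():
        children.setdefault(parent, []).append((child, score))

    def best_from(node):
        if node == target:
            return float("inf")  # empty remainder of the path imposes no bottleneck
        best = None
        for child, score in children.get(node, []):
            rest = best_from(child)
            if rest is None:
                continue
            value = min(score, rest)
            if best is None or value > best:
                best = value
        return best

    return best_from(base)


# Two paths from s to c: s-a-c (minimum 0.2) and s-b-c (minimum 0.4).
scores = {("s", "a"): 0.9, ("a", "c"): 0.2, ("s", "b"): 0.5, ("b", "c"): 0.4}
print(decision_value(scores, "s", "c"))
```

Here the second path wins, because its weakest edge (0.4) is stronger than the weakest edge of the first path (0.2).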
-
Inference
Test image
-
Complexity
• Define complexity = the number of binary classifiers evaluated for a test image.
• It is difficult to measure the complexity, since it depends not only on the structure of the hierarchy but also on the number of paths considered.
• A rough estimate on VOC’06 is $O(N^{0.64})$, which is better than the $O(N)$ of the traditional one-versus-rest classifier.
-
Experimental results
• Image dataset (VOC’06):
– 10 concepts: bike, bus, car, cat, cow, dog, horse, motorbike, person, sheep;
– 1277 training images, 1341 testing images
-
Experimental results
• Compared algorithms:
– OAR: One-Against-Rest classifier;
– AVH: Automatically constructed Visual Hierarchy, a binary tree obtained by iteratively merging the categories with the smallest average distance;
– SSH: Simple Semantic Hierarchy, which only considers the is-a relation;
– ESH: Extended Semantic Hierarchy, which considers both is-a and part-of relations.
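The AVH construction resembles average-linkage agglomerative clustering. The sketch below takes that reading as an assumption; the distance table is made up, and the function name build_visual_hierarchy is hypothetical.

```python
def build_visual_hierarchy(names, dist):
    """Build a binary tree by repeatedly merging the two closest clusters.

    names: list of category names.
    dist:  dict mapping frozenset({a, b}) to the distance between categories a and b.
    Returns nested 2-tuples of category names.
    """
    # Each cluster is (member names, subtree built so far).
    clusters = [((name,), name) for name in names]

    def avg(m1, m2):
        return sum(dist[frozenset((a, b))] for a in m1 for b in m2) / (len(m1) * len(m2))

    while len(clusters) > 1:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: avg(clusters[ij[0]][0], clusters[ij[1]][0]))
        (m1, t1), (m2, t2) = clusters[i], clusters[j]
        merged = (m1 + m2, (t1, t2))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][1]


toy_dist = {
    frozenset(("car", "bus")): 1.0,
    frozenset(("car", "cat")): 5.0,
    frozenset(("bus", "cat")): 6.0,
}
print(build_visual_hierarchy(["car", "bus", "cat"], toy_dist))
```

In this toy case car and bus are merged first, and cat joins at the root.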
-
Experimental results
-
Experimental results
• A: low-level concepts in VOC’06;
– SH methods are generally better than OAR: they improve efficiency with no loss of accuracy;
– SH methods are generally better than AVH, meaning that apparent visual similarity may not generalize well to object classes, while semantic knowledge helps more;
• B: high-level concepts in VOC’06;
– SH methods are capable of reasoning about high-level concepts;
• C: images from an external dataset, obtained by querying “vehicle window”, “windscreen”, “windshield” in Google;
– Tests the generalization ability of the classifiers;
– SSH cannot work here, since it only considers the is-a relation;
– For OAR and AVH, simple reasoning is applied: if there is a car or a bus, then there is a window.
-
Publication
• Marszalek, Marcin, and Cordelia Schmid. "Semantic hierarchies for visual object recognition." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.
-
Album event recognition [Tsai’11]
• Goal: recognize the event/topic of a given album (a set of images);
-
Basic idea
• Use co-occurrence relation to identify typical concepts for each event;
– Event hiking:
• Positive concepts: mountain, people walking, outdoor etc.;
• Negative concepts: bedroom, indoor etc.;
– Event Valentine’s day:
• Positive concepts: chocolate, heart, candy;
• Negative concepts: turkey, green clothes etc.;
-
Framework of album event classification
-
What is the object pattern: imperfect object detection
Discovered patterns: {cloud(5/7), sky(6/7), mountain(5/7), indoor(1/7)}
With imperfect detection: let’s discretize the continuous-valued scores of detector output:
Quantized detection: {cloud(5/7), sky(6/7), mountain(5/7), indoor(1/7), person(2/7)} {cloud(5/7), sky(6/7), mountain(5/7), indoor(1/7), person(5/7)}
-
Dataset construction: select popular holidays using Flickr
-
Dataset construction: picking up relevant objects
• For each tag T, Flickr provides some relevant tags
• Take the union of the relevant object tags of all 10 holidays: 500 tags
• For each holiday H
– Rank each tag T by R(H, T) = |I(H and T)| / |I(H or T)|
– Pick the top 50 tags
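The ranking score R(H, T) has the form of a Jaccard index. The sketch below assumes that I(X) denotes the set of images carrying tag X, which the slide does not state explicitly; the image IDs and function names are made up.

```python
def relevance(images_h, images_t):
    """R(H, T) = |I(H and T)| / |I(H or T)|, with I(.) taken as sets of image IDs."""
    union = images_h | images_t
    return len(images_h & images_t) / len(union) if union else 0.0


def top_tags(images_h, tag_images, k=50):
    """Rank candidate tags by relevance to holiday H and keep the top k."""
    ranked = sorted(tag_images, key=lambda t: relevance(images_h, tag_images[t]), reverse=True)
    return ranked[:k]


# Made-up image IDs: "pumpkin" overlaps heavily with the holiday, "beach" barely.
halloween = {1, 2, 3, 4}
candidates = {"pumpkin": {1, 2, 3}, "beach": {4, 5, 6, 7}}
print(top_tags(halloween, candidates, k=1))
```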
-
List of 38 object detectors
Holiday | Positively relevant objects
Christmas | Christmas tree, gift
Easter | Easter egg, basket, rabbit, church
Halloween | Attire, pumpkin, jack-o-lantern
Independence Day | American flag, firework, crowd
Mardi Gras | Mask, necklace, attire, feather boa
Memorial Day | American flag, uniform, military uniform, music band
New Year’s Eve | Champagne, firework, crowd
St. Patrick’s Day | Music band, crowd
Thanksgiving | Food, dinner, turkey, pumpkin
Valentine’s Day | Heart, bouquet
Other objects | Accordion, bassoon, child, cross, drum, euphonium, flag, french horn, light source, room light, shopping basket, soil, stage, table
-
Some Mined Patterns
-
Some Mined Patterns
-
Pattern ranking for album event classification
• Let f(p) = percentage of photos containing pattern p in an album
• For each event E
– For each pattern p
• Try predicting E using f(p)
• Measure the prediction performance by Average Precision(AP)
– Rank all patterns by their APs with respect to E
• Take the union of top patterns for all events
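The ranking loop can be sketched as follows. It assumes that a pattern is a set of objects, that a photo contains a pattern when all of the pattern's objects are detected in it, and that f(p) is used directly as the album score when computing AP. The data and function names are made up for illustration.

```python
def pattern_frequency(album, pattern):
    """f(p): fraction of photos in the album whose detected objects include the pattern."""
    return sum(pattern <= photo for photo in album) / len(album)


def average_precision(scores, labels):
    """AP of ranking albums by score; labels are True for albums of the event."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def rank_patterns(albums, events, patterns, event):
    """Rank patterns by how well f(p) alone predicts `event`, measured by AP."""
    labels = [e == event for e in events]
    aps = {p: average_precision([pattern_frequency(a, p) for a in albums], labels)
           for p in patterns}
    return sorted(aps, key=aps.get, reverse=True)


# Two toy albums; each photo is the set of objects detected in it.
albums = [[{"mountain", "sky"}, {"mountain"}], [{"indoor"}, {"indoor", "table"}]]
events = ["hiking", "potluck"]
patterns = [frozenset({"indoor"}), frozenset({"mountain"})]
print(rank_patterns(albums, events, patterns, "hiking"))
```

For the event hiking, the pattern {mountain} ranks the hiking album first and so ends up ahead of {indoor}.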
-
Experimental results
• Datasets: 1) a small dataset with 3 topics: potluck, hiking, concert; 2) albums of 10 holidays collected from Flickr;
• Compared algorithms:
– Image-based multiclass AdaBoost (SAMME)
• J. Yuan, J. Luo, and Y. Wu. Mining compositional features for boosting. In IEEE CVPR 2008;
• Differences: 1) patterns are mined from the whole dataset; 2) the result is the majority vote of the image labels of the given album.
– Compositional object pattern with non-flexible pattern (COPF_base)
– Compositional object pattern with flexible pattern (COPF)
-
Classification results of small dataset
-
Classification results of 10 holiday dataset
-
Publication
• Tsai, Shen-Fu, et al. "Compositional object pattern: a new model for album event recognition." Proceedings of the 19th ACM international conference on Multimedia. ACM, 2011.
-
Ontological image annotation [Tsai’12]
• Input: image I; concepts C1, C2, …, Cn; output values x1, x2, …, xn from coarse detectors;
• Output: y1, y2, …, yn, where yi is 1 or -1 which indicates whether Ci is present in the image or not.
[Diagram: the image is passed to coarse detectors for C1 to C5; their outputs, together with a graph over the concepts C1 to C5, yield refined detections for C1 to C5]
-
Basic idea
• Joint inference of concepts, considering their subclass and co-occurrence relations
[Diagram: concepts of interest + WordNet → subclass extraction → subclass relation; training images → co-occurrence learner → co-occurrence relation; both relations feed the inference]
-
Formulation
[Equation: potential function, composed of a unary potential and a pairwise potential]
-
Relation constraints
• Subclass constraint (hard constraint)
– If Ca (dog) is a subclass of Cb (animal), then yb ≥ ya
– Relation obtained from WordNet
• Co-occurrence reward/penalty (soft constraint)
– E.g. reward the (indoor, table) pair
– E.g. penalize the (computer, beach) pair
– Learned from the training set
– Only positive pairs are considered
-
Inference
• Find the assignment y that satisfies all constraints with the highest score
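For a handful of concepts, this search can be sketched by brute force. The score used below (sum of x_i * y_i plus a reward for each co-occurring pair that is jointly present) is an assumed stand-in for the potential function, which is not recoverable from the transcript; the function name infer and the numbers are made up.

```python
from itertools import product


def infer(x, subclass, cooccur):
    """Return the assignment y in {-1, 1}^n with the highest score among feasible ones.

    x:        coarse detector outputs, one per concept.
    subclass: list of (a, b) index pairs meaning concept a is a subclass of concept b,
              which imposes the hard constraint y[b] >= y[a].
    cooccur:  dict mapping (a, b) index pairs to a reward granted when both are present.
    """
    best_y, best_score = None, float("-inf")
    for y in product((-1, 1), repeat=len(x)):
        if any(y[b] < y[a] for a, b in subclass):
            continue  # violates a subclass constraint
        score = sum(xi * yi for xi, yi in zip(x, y))  # assumed unary term
        score += sum(w for (a, b), w in cooccur.items() if y[a] == 1 and y[b] == 1)
        if score > best_score:
            best_y, best_score = y, score
    return best_y


# Concept 0 = dog, concept 1 = animal. The animal detector is slightly negative,
# but the subclass constraint forces animal to be present once dog is.
print(infer([0.8, -0.1], subclass=[(0, 1)], cooccur={}))
```

Enumeration grows as 2^n, so this only illustrates the objective and the constraints, not a practical solver.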
-
Subclass relation
• Indoor
– Bedroom
– Office
• Outdoor
• Light
– Room light
– Street light
• Computer
– Laptop
– Desktop computer
[Diagram: WordNet subclass relations among entity, artifact, device, structure/construction, personal computer, source of illumination, bedroom, office, laptop, desktop computer, room light, street light]
-
Final subclass relation
-
Ontological learning
-
Baseline algorithms
• RAW: raw output of initial detectors
• Semantic Hierarchy (SH): conditional classifier on each subclass/part-of link
• SVM fusion
-
Results: AUC with 50% training
[Figure: bar chart of AUC (axis from 0.6 to 0.9) for OI, SH, SVM, and RAW on indoor, outdoor, bedroom, office, light, room light, streetlight, computer, laptop, desktop, and the average]
-
AUC vs. #training
[Figure: mean AUC (axis from 0.68 to 0.78) versus percentage of training data (20, 35, 50%) for OI, SH, SVM, and RAW]
-
Publication
• Tsai, Shen-Fu, et al. "Ontological Inference Framework with Joint Ontology Construction and Learning for Image Understanding." Multimedia and Expo (ICME), 2012 IEEE International Conference on. IEEE, 2012.
-
Conclusion
• Advantages:
– Joint inference;
– Scalability;
– More robust and accurate classifiers;
– Bridging low-level and high-level semantics;
• Disadvantages:
– Harder to understand than traditional methods
– Sometimes prior knowledge is wrong;
– Efficiency and accuracy usually trade off against each other;
-
Future work
• Explore ontologies in more depth to see how much accuracy and efficiency can be improved;
• Explore ontologies more broadly by applying them to other domains such as medical imaging, healthcare, AI, etc.;
• Explore how to construct ontologies automatically or semi-automatically;
-
Thank you !!