object recognition
![Page 1: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/1.jpg)
Object recognition
![Page 2: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/2.jpg)
Object Classes
![Page 3: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/3.jpg)
Individual Recognition
![Page 4: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/4.jpg)
Is this a dog?
![Page 5: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/5.jpg)
Variability of Airplanes Detected
![Page 6: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/6.jpg)
Variability of Horses Detected
![Page 7: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/7.jpg)
Class Non-class
![Page 8: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/8.jpg)
Class Non-class
![Page 9: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/9.jpg)
![Page 10: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/10.jpg)
Recognition with 3-D primitives
Geons
![Page 11: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/11.jpg)
Visual Class: Common Building Blocks
![Page 12: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/12.jpg)
Optimal Class Components?
• Large features are too rare
• Small features are found everywhere
Find features that carry the highest amount of information
![Page 13: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/13.jpg)
Entropy
Entropy:

$$H = -\sum_i p(x_i) \log_2 p(x_i)$$

| p(x = 0) | p(x = 1) | H |
|---|---|---|
| 0.5 | 0.5 | ? |
| 0.1 | 0.9 | 0.47 |
| 0.01 | 0.99 | 0.08 |
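A minimal sketch of the entropy formula, assuming probabilities are given as a plain list and logs are base 2 (bits). The function name `entropy` is illustrative, not from the slides. Running it on the three distributions in the table fills in the H column.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    # Terms with p = 0 contribute nothing (the limit of p*log p is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

for dist in [(0.5, 0.5), (0.1, 0.9), (0.01, 0.99)]:
    print(dist, round(entropy(dist), 2))
```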
![Page 14: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/14.jpg)
Mutual Information I(x,y)
X alone: p(x) = 0.5, 0.5; H = 1.0
X given Y:
Y = 0: p(x) = 0.8, 0.2; H = 0.72
Y = 1: p(x) = 0.1, 0.9; H = 0.47
H(X|Y) = 0.5*0.72 + 0.5*0.47 = 0.595
I(X,Y) = H(X) – H(X|Y) = 1 – 0.595 = 0.405
![Page 15: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/15.jpg)
Mutual information
I(C;F) = H(C) – H(C|F)
$$H(C) = -\sum_c P(c) \log P(c)$$
[Diagram: H(C) overall, compared with H(C) when F=1 and H(C) when F=0]
![Page 16: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/16.jpg)
Mutual Information II
$$I(X,Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
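A small sketch of this sum, assuming the joint distribution is given as a nested list `joint[x][y]` whose entries sum to 1, with logs in base 2. The function name and the two toy tables are illustrative. Independent variables should give zero information, and two identical fair bits should give one bit.

```python
import math

def mutual_information(joint):
    """I(X,Y) in bits from a joint table joint[x][y] whose entries sum to 1."""
    px = [sum(row) for row in joint]          # marginal p(x)
    py = [sum(col) for col in zip(*joint)]    # marginal p(y)
    total = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                total += pxy * math.log2(pxy / (px[x] * py[y]))
    return total

independent = [[0.25, 0.25], [0.25, 0.25]]
identical = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent), mutual_information(identical))
```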
![Page 17: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/17.jpg)
Computing MI from Examples
• Mutual information can be measured from examples:

| | 100 Faces | 100 Non-faces |
|---|---|---|
| Feature detected | 44 times | 6 times |

H(C) = 1, H(C|F) = 0.8475
Mutual information: 0.1525
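The face example can be checked with a short sketch that computes H(C), H(C|F) and their difference directly from the detection counts. The function name and argument names are hypothetical. The result should land close to the slide's figures; small differences in the last digits are likely rounding.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def feature_class_mi(n_class, n_nonclass, hits_class, hits_nonclass):
    """Return (H(C), H(C|F), I(C;F)) in bits for a binary feature F and class C."""
    n = n_class + n_nonclass
    h_c = entropy([n_class / n, n_nonclass / n])
    h_c_given_f = 0.0
    # First group: images where the feature fired (F=1); second: where it did not (F=0).
    for in_class, in_nonclass in [
        (hits_class, hits_nonclass),
        (n_class - hits_class, n_nonclass - hits_nonclass),
    ]:
        group = in_class + in_nonclass
        if group > 0:
            h_c_given_f += (group / n) * entropy([in_class / group, in_nonclass / group])
    return h_c, h_c_given_f, h_c - h_c_given_f

h_c, h_c_f, mi = feature_class_mi(100, 100, 44, 6)
print(round(h_c, 4), round(h_c_f, 4), round(mi, 4))
```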
![Page 18: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/18.jpg)
Full KL Classification Error
[Diagram: class C and feature F, labeled with p(C), p(F|C) and q(C|F)]
![Page 19: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/19.jpg)
Optimal classification features
• Theoretically: maximizing delivered information minimizes classification error
• In practice: informative object components can be identified in training images
![Page 20: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/20.jpg)
Selecting Fragments
[Plot: Mutual Info vs. Threshold. Mutual information against detection threshold for the fragments forehead, hairline, mouth, eye, nose, nosebridge, long_hairline, chin, twoeyes]
![Page 21: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/21.jpg)
Adding a New Fragment (max-min selection)
ΔMI = MI[F_i, F_k; class] – MI[F_k; class]
Select: max_i min_k ΔMI(F_i, F_k)
(Min. over existing fragments, max. over the entire pool)
$$\max_i \min_k \left[ MI(C; F_i, F_k) - MI(C; F_k) \right]$$
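A rough sketch of the max-min rule as a greedy loop, assuming each fragment is represented by a list of 0/1 detections over the training images and that the first fragment is simply the one with the highest individual MI (that starting rule is an assumption, not stated on the slide). All function names are illustrative.

```python
import math
from collections import Counter

def mutual_info(features, labels):
    """I(C; F) in bits; features[n] is a tuple of feature values for example n."""
    n = len(labels)
    joint = Counter(zip(features, labels))
    pf, pc = Counter(features), Counter(labels)
    return sum(
        (cnt / n) * math.log2(cnt * n / (pf[f] * pc[c]))
        for (f, c), cnt in joint.items()
    )

def select_fragments(pool, labels, k):
    """Greedy max-min selection; pool[i] is the 0/1 detection list of fragment i."""
    k = min(k, len(pool))
    single = [mutual_info([(v,) for v in col], labels) for col in pool]
    # Assumed starting point: the single most informative fragment.
    chosen = [max(range(len(pool)), key=lambda i: single[i])]
    while len(chosen) < k:
        best, best_gain = None, -1.0
        for i in range(len(pool)):
            if i in chosen:
                continue
            # Smallest added information of i relative to any already chosen fragment.
            gain = min(
                mutual_info(list(zip(pool[i], pool[j])), labels) - single[j]
                for j in chosen
            )
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
    return chosen

labels = [1, 1, 1, 1, 0, 0, 0, 0]
pool = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # informative fragment
    [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate, adds nothing new
    [0, 0, 0, 1, 0, 0, 0, 0],  # weak alone, but complements the first
]
print(select_fragments(pool, labels, 2))
```

In this toy pool the duplicate fragment has high MI on its own but zero added information, so the rule prefers the weaker, complementary fragment.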
![Page 22: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/22.jpg)
Highly Informative Face Fragments
![Page 23: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/23.jpg)
Intermediate Complexity
[Plots, panels a and b: relative mutual info. and 100 x merit, weight, plotted against relative object size and relative resolution]
![Page 24: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/24.jpg)
Decision
Combine all detected fragments Fk:
∑wk Fk > θ
![Page 25: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/25.jpg)
Optimal Separation
SVM, Perceptron
∑wk Fk = θ is a hyperplane
![Page 26: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/26.jpg)
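Combining fragments linearly
Conditional independence:
P(F1,F2 | C) = p(F1|C) p(F2|C)
$$\frac{p(F \mid C)}{p(F \mid NC)} > \theta$$
$$\prod_i \frac{p(F_i \mid C)}{p(F_i \mid NC)} > \theta$$
$$w(F_i) = \log \frac{p(F_i \mid C)}{p(F_i \mid NC)}$$
$$\sum_i w(F_i) > \theta$$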
![Page 27: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/27.jpg)
• Σw(Fi) > θ
If Fi=1 take $\log \frac{p(F_i=1 \mid C)}{p(F_i=1 \mid NC)}$
If Fi=0 take $\log \frac{p(F_i=0 \mid C)}{p(F_i=0 \mid NC)}$
Instead: Σ wi > θ on all the detected fragments only
With: wi = w(Fi=1) – w(Fi=0)
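A brief sketch of why summing only over detected fragments is enough: the full sum and the detected-only sum differ by a constant, Σ w(Fi=0) over all fragments, which can be absorbed into the threshold. The detection probabilities below are hypothetical, and the function names are illustrative.

```python
import math

def log_ratio(p_class, p_nonclass):
    return math.log(p_class / p_nonclass)

def fragment_weights(p_det_class, p_det_nonclass):
    """For each fragment return (w(F=1), w(F=0)) from its detection probabilities."""
    return [
        (log_ratio(pc, pn), log_ratio(1 - pc, 1 - pn))
        for pc, pn in zip(p_det_class, p_det_nonclass)
    ]

def full_score(weights, detections):
    """Sum of w(F_i) over all fragments, detected or not."""
    return sum(w1 if f else w0 for (w1, w0), f in zip(weights, detections))

def detected_score(weights, detections):
    """Sum of w_i = w(F_i=1) - w(F_i=0) over detected fragments only."""
    return sum(w1 - w0 for (w1, w0), f in zip(weights, detections) if f)

# Hypothetical values of p(F_i=1 | C) and p(F_i=1 | NC) for three fragments.
weights = fragment_weights([0.44, 0.6, 0.3], [0.06, 0.2, 0.1])
offset = sum(w0 for _, w0 in weights)  # constant, independent of the image
detections = [1, 0, 1]
print(full_score(weights, detections), detected_score(weights, detections) + offset)
```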
![Page 28: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/28.jpg)
Class II
![Page 29: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/29.jpg)
Class Non-class
![Page 30: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/30.jpg)
Fragments with positions
∑wk Fk > θ
On all detected fragments within their regions
![Page 31: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/31.jpg)
Horse-class features
![Page 32: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/32.jpg)
Examples of Horses Detected
![Page 33: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/33.jpg)
Interest points (Harris), SIFT Descriptors
$$\sum \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$
![Page 34: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/34.jpg)
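Harris Corner Operator
$$H = \begin{pmatrix} \langle I_x^2 \rangle & \langle I_x I_y \rangle \\ \langle I_x I_y \rangle & \langle I_y^2 \rangle \end{pmatrix}$$
Averages within a neighborhood.
Corner: The two eigenvalues λ1, λ2 are large
Indirectly:
‘Corner’ = det(H) – k trace²(H)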
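A minimal pure-Python sketch of the corner measure at a single pixel, assuming central-difference gradients, a square (2·radius+1) averaging window and k = 0.04. These choices and the function name are assumptions for illustration. The pixel must be at least radius+1 away from the image border.

```python
def harris_response(img, r, c, radius=1, k=0.04):
    """Harris measure det(H) - k*trace(H)^2 at pixel (r, c) of a 2-D list image."""
    sxx = sxy = syy = 0.0
    count = 0
    for i in range(r - radius, r + radius + 1):
        for j in range(c - radius, c + radius + 1):
            # Central-difference gradients: Ix along columns, Iy along rows.
            ix = (img[i][j + 1] - img[i][j - 1]) / 2.0
            iy = (img[i + 1][j] - img[i - 1][j]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            count += 1
    # Entries of H are the neighborhood averages <Ix^2>, <IxIy>, <Iy^2>.
    a, b, d = sxx / count, sxy / count, syy / count
    return (a * d - b * b) - k * (a + d) ** 2

# Toy image: a bright square occupying the lower-right part of a 10x10 grid.
img = [[1.0 if (i >= 5 and j >= 5) else 0.0 for j in range(10)] for i in range(10)]
print(harris_response(img, 5, 5))  # at the square's corner
print(harris_response(img, 7, 5))  # on a straight edge
print(harris_response(img, 2, 2))  # in a flat region
```

The measure is positive at the corner, negative on the edge (one large eigenvalue only) and zero in the flat region.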
![Page 35: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/35.jpg)
Harris Corner Examples
![Page 36: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/36.jpg)
SIFT descriptor
David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110
Example:
4*4 sub-regions
Histogram of 8 orientations in each
V = 128 values:
$$g_{1,1}, \ldots, g_{1,8}, \ldots, g_{16,1}, \ldots, g_{16,8}$$
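A simplified sketch of the 4*4 × 8 layout only, assuming a 16×16 patch. It leaves out the Gaussian weighting, interpolation between bins, rotation normalization and clipping described in Lowe's paper, so it is not the full SIFT descriptor. The function name is illustrative.

```python
import math

def sift_like_descriptor(patch):
    """128-value descriptor from a 16x16 patch (list of 16 lists of 16 numbers)."""
    size, cells, bins = 16, 4, 8
    cell = size // cells
    hist = [0.0] * (cells * cells * bins)
    for r in range(size):
        for c in range(size):
            # Gradients by differences, clamping indices at the patch border.
            dx = patch[r][min(c + 1, size - 1)] - patch[r][max(c - 1, 0)]
            dy = patch[min(r + 1, size - 1)][c] - patch[max(r - 1, 0)][c]
            mag = math.hypot(dx, dy)
            if mag == 0:
                continue
            angle = math.atan2(dy, dx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins   # orientation bin 0..7
            region = (r // cell) * cells + (c // cell)     # sub-region 0..15
            hist[region * bins + b] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

# Toy patch: intensity increases left to right, so every gradient points along +x.
ramp = [[float(c) for c in range(16)] for _ in range(16)]
desc = sift_like_descriptor(ramp)
print(len(desc))
```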
![Page 37: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/37.jpg)
SIFT
![Page 38: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/38.jpg)
Constellation of Patches
Using interest points
Fergus, Perona, Zisserman 2003
Six-part motorcycle model, joint Gaussian
![Page 39: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/39.jpg)
Bag of words and Unsupervised Classification
Object
Bag of ‘words’
![Page 40: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/40.jpg)
Bag of visual words
A large collection of image patches
1. Feature detection and representation
• Regular grid
– Vogel & Schiele, 2003
– Fei-Fei & Perona, 2005
![Page 41: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/41.jpg)
Each class has its own word histogram
![Page 42: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/42.jpg)
pLSA
Classify documents automatically, find related documents, etc., based on word frequency.
Documents contain different ‘topics’ such as Economics, Sports, Politics, France… Each topic has its typical word frequency. Economics will have a high occurrence of ‘interest’, ‘bonds’, ‘inflation’, etc.
We observe the probabilities p(wi | dn) of words in documents.
Each document contains several topics, zk.
A word has different probabilities in each topic, p(wi | zk).
A given document has a mixture of topics: p(zk | dn)
The word-frequency model is:
p(wi | dn) = Σk p(wi|zk) p(zk | dn)
pLSA was used to discover topics, and arrange documents according to their topics.
![Page 43: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/43.jpg)
pLSA
The word-frequency model is:
p(wi | dn) = Σk p(wi|zk) p(zk | dn)
We observe p(wi | dn) and find the best p(wi|zk) and p(zk | dn) to explain the data
pLSA was used to discover topics, and then arrange documents
according to their topics.
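One common way to find p(wi|zk) and p(zk|dn) is expectation-maximization. The sketch below assumes a small word-count matrix `counts[d][w]`, random initialization with a fixed seed, and a tiny smoothing constant to avoid division by zero. All names and the toy corpus are hypothetical.

```python
import math
import random

def normalized(row):
    s = sum(row)
    return [v / s for v in row]

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM. Returns (p_w_z, p_z_d): p_w_z[k][w]=p(w|z_k), p_z_d[d][k]=p(z_k|d)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_w_z = [normalized([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]
    p_z_d = [normalized([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_docs)]
    for _ in range(n_iter):
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]  # smoothed accumulators
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior p(z_k | d, w) proportional to p(w|z_k) p(z_k|d).
                post = normalized([p_w_z[k][w] * p_z_d[d][k] for k in range(n_topics)])
                # M-step accumulation: expected word counts per topic.
                for k in range(n_topics):
                    new_w_z[k][w] += counts[d][w] * post[k]
                    new_z_d[d][k] += counts[d][w] * post[k]
        p_w_z = [normalized(row) for row in new_w_z]
        p_z_d = [normalized(row) for row in new_z_d]
    return p_w_z, p_z_d

def log_likelihood(counts, p_w_z, p_z_d):
    """Sum over (d, w) of n(d, w) * log p(w|d) under the mixture model."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                total += n * math.log(sum(p_w_z[k][w] * p_z_d[d][k] for k in range(len(p_w_z))))
    return total

# Hypothetical toy corpus: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
counts = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
p_w_z, p_z_d = plsa(counts, n_topics=2)
print(log_likelihood(counts, p_w_z, p_z_d))
```

EM only finds a local optimum, so results can depend on the initialization.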
![Page 44: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/44.jpg)
Discovering objects and their location in images
Sivic, Russell, Efros, Zisserman & Freeman, ICCV 2005
Uses simple ‘visual words’ for classification
Not the best classifier, but obtains unsupervised classification, using pLSA
![Page 45: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/45.jpg)
Visual words – unsupervised classification
• Four classes: faces, cars, airplanes, motorbikes, and non-class. Training images are mixed.
• Allowed 7 topics: one per class, plus 3 for the background.
• Visual words: local patches using SIFT descriptors.
– (say local 10*10 patches)
Codewords dictionary
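A short sketch of how an image becomes a word histogram, assuming a codeword dictionary is already available (for example from clustering training descriptors, which is not shown) and that each descriptor is assigned to its nearest codeword in Euclidean distance. Names and the tiny 2-D dictionary are hypothetical.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to descriptor in squared Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(descriptor, codebook[i]))

def word_histogram(descriptors, codebook):
    """Normalized histogram p(w_i | I) of visual words in one image."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(codebook)

# Hypothetical 2-D "descriptors" and a two-word dictionary, for illustration only.
codebook = [[0, 0], [10, 10]]
descriptors = [[1, 0], [0, 1], [9, 10], [1, 1]]
print(word_histogram(descriptors, codebook))
```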
![Page 46: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/46.jpg)
Learning
• Data: the matrix Dij = p(wi | Ij)
• During learning – discover ‘topics’ (classes + background)
• p(wi | Ij) = Σ p(wi | Tk) p(Tk | Ij)
• Optimize over p(wi | Tk), p(Tk | Ij)
• The topics are expected to discover classes
• Got mainly one topic per class image.
![Page 47: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/47.jpg)
Results of learning
![Page 48: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/48.jpg)
Classifying a new image
• New image I:
• Measure p(wi | I)
• Find topics for the new image:
• p(wi | I) = Σ p(wi | Tk) p(Tk | I)
• Optimize over the topics Tk
• Find the largest (non-background) topic
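A sketch of these steps, assuming the learned p(wi | Tk) is held fixed and only the mixture p(Tk | I) is re-estimated by EM for the new image. The topic table, the background index and the function names below are hypothetical.

```python
def infer_topics(p_w_image, p_w_t, n_iter=100):
    """Estimate p(T_k | I) for a new image with the learned p(w_i | T_k) held fixed."""
    n_topics, n_words = len(p_w_t), len(p_w_image)
    p_t = [1.0 / n_topics] * n_topics
    for _ in range(n_iter):
        new_p_t = [0.0] * n_topics
        for w in range(n_words):
            if p_w_image[w] == 0:
                continue
            joint = [p_w_t[k][w] * p_t[k] for k in range(n_topics)]
            s = sum(joint)
            if s == 0:
                continue
            # Share the observed mass of word w among topics by their posterior.
            for k in range(n_topics):
                new_p_t[k] += p_w_image[w] * joint[k] / s
        total = sum(new_p_t)
        if total == 0:
            break
        p_t = [v / total for v in new_p_t]
    return p_t

def classify(p_w_image, p_w_t, background_topics=()):
    """Index of the largest non-background topic for the image."""
    p_t = infer_topics(p_w_image, p_w_t)
    candidates = [k for k in range(len(p_t)) if k not in background_topics]
    return max(candidates, key=lambda k: p_t[k])

# Hypothetical learned topics over four visual words; topic 2 plays the background.
p_w_t = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
image = [0.4, 0.4, 0.1, 0.1]
print(classify(image, p_w_t, background_topics=(2,)))
```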
![Page 49: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/49.jpg)
Classifying a new image
![Page 50: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/50.jpg)
On general model learning
• The goal is to classify C using a set of features F.
• F have been selected (must have high MI(C;F))
• The next goal is to use F to decide on the class C.
• Probabilistic approach:
• Use observations to learn the joint distribution p(C,F)
• In a new image, F is observed, find the most likely C,
• Max (C) p(C,F)
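In its simplest form the probabilistic approach can be sketched by estimating p(C,F) from counts and picking the class with the largest joint probability for the observed F. This assumes a small discrete feature space with enough examples per combination. The names and toy data are hypothetical.

```python
from collections import Counter

def learn_joint(examples):
    """Estimate p(C, F) by counting; examples is a list of (class_label, feature_tuple)."""
    counts = Counter(examples)
    n = len(examples)
    return {key: cnt / n for key, cnt in counts.items()}

def classify(joint, features, classes):
    """Return the class C maximizing p(C, F) for the observed feature tuple."""
    return max(classes, key=lambda c: joint.get((c, features), 0.0))

# Hypothetical training set with two binary features per image.
examples = [
    ("face", (1, 1)), ("face", (1, 0)), ("face", (1, 1)),
    ("non", (0, 0)), ("non", (1, 0)), ("non", (0, 0)),
]
joint = learn_joint(examples)
print(classify(joint, (1, 1), ["face", "non"]))
```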
![Page 51: Object recognition](https://reader035.vdocuments.us/reader035/viewer/2022062409/56814636550346895db3456f/html5/thumbnails/51.jpg)
General model learning
• To learn the joint distribution p(C,F):
• The model is of the form pθ(C,F)
– Or: pθ(C,X,F)
• For example we had
– words in documents:
– p(w,D) = Π p(wi,D)
– p(wi | D) = Σ p(wi | Tk) p(Tk | D)
• Training examples used to determine optimal θ by maximizing pθ(data)
– max (C,X, θ) pθ(C,X,F)
• When θ known, classify new example:
– max (C,X) pθ(C,X,F)