Discovering Collocation Patterns: from Visual Words to Visual Phrases
Junsong Yuan, Ying Wu and Ming Yang, CVPR'07
1
Discovering Collocation Patterns: from Visual Words to Visual Phrases
Junsong Yuan, Ying Wu and Ming Yang, CVPR'07
2
Discovering Visual Collocation
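[Figure: processing pipeline from images to visual collocations. Labels: Images; Feature Extraction; Visual Primitives; Feature Quantization; "Bag of words"; Pattern Discovery; Visual Collocations. Visual words are shown as letters (A, B, C, ...), with recurring combinations such as AB and AC marked as collocations.]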
3
An exciting idea: detour
• Related Work: J. Sivic et al. CVPR04, B. C. Russell et al. CVPR06, G. Wang et al. CVPR06, T. Quack et al. CIVR06, S. C. Zhu et al. IJCV05, …
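[Figure: analogy between text data and image data. Text and transaction data are mined with PLSA, LDA and FIM to obtain text patterns. Can image data, through a visual word lexicon, be mined in the same way to obtain visual patterns?]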
4
Confrontation
• Spatial characteristics of images
– over-counting co-occurrence frequency
• Uncertainty in visual patterns
– continuous visual features are quantized into words
– visual synonymy and polysemy
[Figure: table with columns ID and Group, listing spatial groups G12, G13, G67, G78, G79, G112, G113, G198, G215, G216, ... The word pair AB recurs across the listed groups.]
5
Our Approach
[Figure: overview of the approach. Labels: Visual Primitive Dataset; Clustering (e.g. K-means); Visual word lexicon; Pattern Discovery (e.g. FIM) over word groups such as {A, B}, ...; Visual phrase lexicon; Pattern Summarization; Meaningful Patterns (wheels, car bodies); Metric Learning (e.g. NCA); New Metric.]
6
Selecting visual phrases
• Visual collocations may occur by chance
• Selecting phrases by a likelihood ratio test:
– H0: occurrence of phrase P is randomly generated
– H1: phrase P is generated by a hidden pattern
• Prior:
• Likelihood:
• Check whether words are co-located by chance or in a statistically meaningful way
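As a minimal sketch of the test above, assume that under H0 the words of a phrase occur independently of each other. The ratio then reduces to the observed frequency of the phrase divided by the product of the individual word frequencies. The independence model, the function name `likelihood_ratio`, and the toy transactions are illustrative assumptions, not the prior and likelihood used in the slides.

```python
from math import prod

def likelihood_ratio(phrase, transactions):
    """Observed phrase frequency divided by the frequency expected if the
    words were independent (a hypothetical stand-in for L(P))."""
    n = len(transactions)
    phrase = frozenset(phrase)
    # Fraction of transactions that contain every word of the phrase.
    observed = sum(phrase <= t for t in transactions) / n
    # Expected fraction under independence: product of single-word frequencies.
    expected = prod(sum(w in t for t in transactions) / n for w in phrase)
    return observed / expected if expected > 0 else 0.0

# Toy transactions (one word group per string), chosen for illustration.
transactions = [frozenset(t) for t in ("ABFP", "CDES", "ABFT", "CDEX", "ABDK")]
ratio_ab = likelihood_ratio("AB", transactions)  # A and B co-occur often
ratio_ad = likelihood_ratio("AD", transactions)  # A and D rarely co-occur
```

A ratio well above 1 suggests the words co-occur more often than chance would predict; a ratio below 1 suggests the opposite.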
7
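Discovery of visual phrases
[Figure: discovery pipeline.
Transactions: {A, B, F, P}, {C, D, E, S}, {A, B, F, T}, {C, D, E, X}, {A, B, D, K}, ……
Closed FIM yields the frequent word-sets ( |P| >= 2 ): AB, ABF, ABE, CD, CDE, DE, CE, AE, BE, BF, AF.
A pair-wise Student t-test is applied and the phrases are ranked by L(P): AB, ABF, BF, AF, CD.
Likelihood ratios shown: 15.7, 14.3, 12.2, 10.9, 9.7.
Resulting visual phrases: {{A,B}, {A,B,F}, {A,F}, {B,F}, {C,D}}.]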
8
Frequent Itemset Mining (FIM)
• If an itemset is frequent then all of its subsets must also be frequent
[Figure: itemset lattice over items A to E.
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
Itemsets pruned if {A,B} is NOT frequent: every superset of {A,B}.]
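The pruning rule above is the basis of level-wise (Apriori-style) mining. The following is a minimal sketch of that idea, not the closed-FIM algorithm used in the slides; the function name `apriori`, the toy transactions, and the support threshold are assumptions chosen for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets ...
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate with an infrequent (k-1)-subset.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

# Toy transactions for illustration.
frequent = apriori(["ABFP", "CDES", "ABFT", "CDEX", "ABDK"], min_support=2)
```

Because {A,D} appears only once in this toy data, it is dropped at level 2 and none of its supersets is ever counted.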
9
Phrase Summarization
• Measuring the similarity between visual phrases by KL-divergence (Yan et al., SIGKDD 05)
• Clustering visual phrases by Normalized-cut
[Figure: the visual phrases {A,B}, {A,B,F}, {A,F}, {B,F} and {C,D} are grouped by normalized cut.]
$S(P_i, P_j) = \frac{|G(P_i) \cap G(P_j)|}{|G(P_i) \cup G(P_j)|}$
[Figure: G({A,B}) and G({B,F}) mark, within the transactions {A,B,F,P}, {C,D,E,S}, {A,B,F,T}, {C,D,E,X}, {A,B,D,K}, ……, the groups that contain the respective phrase.]
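To make the role of G(P) concrete, the sketch below computes the set of groups that contain each phrase and compares two phrases by the overlap of those sets. The slide names KL-divergence as the similarity measure; the Jaccard-style overlap used here is a simpler stand-in and an assumption, as are the function names `supporting_groups` and `overlap_similarity`. The normalized-cut clustering step itself is not shown.

```python
def supporting_groups(phrase, transactions):
    """Indices of the transactions (groups) that contain every word of the phrase."""
    phrase = frozenset(phrase)
    return {i for i, t in enumerate(transactions) if phrase <= frozenset(t)}

def overlap_similarity(p, q, transactions):
    """Jaccard-style overlap of the supporting group sets of two phrases."""
    gp = supporting_groups(p, transactions)
    gq = supporting_groups(q, transactions)
    union = gp | gq
    return len(gp & gq) / len(union) if union else 0.0

# Transactions as listed in the figure on this slide.
transactions = ["ABFP", "CDES", "ABFT", "CDEX", "ABDK"]
phrases = ["AB", "ABF", "AF", "BF", "CD"]
# Pairwise similarity matrix that a clustering step could take as input.
similarity = [[overlap_similarity(p, q, transactions) for q in phrases] for p in phrases]
```

In this toy data the phrases built from A, B and F share most of their supporting groups, while {C,D} shares none with them, so a graph-cut clustering would be expected to separate the two sets.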
10
Pattern Summarization Results
Car database: summarizing top-10 phrases into 2 semantic phrase patterns
H1: wheels. Prc.: 97.5%, Rec.: 22.8%
H2: car bodies. Prc.: 71.3%, Rec.: N.A.
Face database: summarizing top-10 phrases into 6 semantic phrase patterns
11
Partition of visual word lexicon
[Figure: the codebook Ω = {W_i} is partitioned into foreground words and background words. Individual words (w1, w2, w3, w4, w5, w20, w77, w168, ..., wk) are shown alongside the visual phrases Ψ = {P_1, P_2, ..., P_M}.]
Metric learning method:
• Neighborhood component analysis (NCA), Goldberger et al., NIPS05
– improves the leave-one-out performance of the nearest neighbor classifier
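The quantity that NCA maximizes can be written down directly: the expected number of points that a stochastic nearest-neighbor classifier labels correctly under leave-one-out, after a linear transform A. The sketch below only evaluates that objective for a given A; it does not perform the gradient-based optimization. The function name `nca_objective` and the toy data are assumptions for illustration.

```python
from math import exp

def nca_objective(points, labels, A):
    """Expected number of correctly classified points for stochastic 1-NN
    under leave-one-out, after mapping each point x to A x."""
    projected = [[sum(a * x for a, x in zip(row, p)) for row in A] for p in points]
    total = 0.0
    for i, zi in enumerate(projected):
        # Softmax weights over all other points; a point never picks itself.
        weights = [
            0.0 if j == i else exp(-sum((u - v) ** 2 for u, v in zip(zi, zj)))
            for j, zj in enumerate(projected)
        ]
        norm = sum(weights)
        if norm > 0:
            # Probability that point i picks a neighbor with its own label.
            total += sum(w for w, lab in zip(weights, labels) if lab == labels[i]) / norm
    return total

# Toy data: the first coordinate separates the classes, the second is noise.
points = [(0, 0), (0, 5), (1, 0), (1, 5)]
labels = [0, 0, 1, 1]
identity = [[1, 0], [0, 1]]
stretched = [[3, 0], [0, 0]]  # amplify the useful axis, drop the noisy one
score_identity = nca_objective(points, labels, identity)
score_stretched = nca_objective(points, labels, stretched)
```

The transform that suppresses the noisy coordinate scores close to 4 (all points correct), while the identity scores close to 0, which is the kind of improvement a learned metric is meant to provide.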
12
Evaluation
• K-NN spatial group: K=5
• Two image category databases: car (123 images) and face (435 images)
• Precision of visual phrase lexicon
– the percentage of visual phrases Pi ∈ Ψ that are located in the foreground object
• Precision of background word lexicon
– the percentage of background words Wi ∈ Ω− that are located in the background
• Percentage of images that are retrieved:
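One plausible reading of the K-NN spatial group is that each visual primitive forms a group with its K nearest neighbors in the image plane, and the words in that group become one transaction. The sketch below follows that reading; the function name `knn_groups`, the use of Euclidean distance, and the toy coordinates are assumptions rather than details given in the slides.

```python
def knn_groups(points, words, k=5):
    """For each point, return the set of words carried by the point itself
    and its k nearest neighbors (squared Euclidean distance in the image)."""
    groups = []
    for i, (xi, yi) in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort the other points by distance to point i and keep the k closest.
        others.sort(key=lambda j: (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2)
        groups.append(frozenset([words[i]] + [words[j] for j in others[:k]]))
    return groups

# Toy example: five feature locations on a line, one word label each.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
words = ["A", "B", "C", "D", "E"]
groups = knn_groups(points, words, k=2)
```

Neighboring points produce overlapping or even identical groups, which is the source of the over-counted co-occurrence frequency mentioned on the Confrontation slide.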
13
Results: visual phrases from car category
Visual phrase pattern 1: wheels
(Different colors represent different semantic meanings.)
Visual phrase pattern 2: car bodies
14
Results: visual phrases from face category
15
Comparison