TRANSCRIPT
Andrey V. Savchenko, National Research University Higher School of Economics. Email: [email protected]
Co-authors: Vladimir Milov (N. Novgorod State Technical University), Natalya Belova (NRU HSE, Moscow)
National Research University Higher School of Economics, Nizhny Novgorod
THE 4TH INTERNATIONAL CONFERENCE ON ANALYSIS OF IMAGES, SOCIAL NETWORKS, AND TEXTS
Sequential Hierarchical Image Recognition based on the Pyramid Histograms of Oriented Gradients
with Small Samples
Outline
1. Overview. Rough sets. Three-way decisions
2. Hierarchical image recognition. Pyramid HOG
3. Sequential three-way decisions in image recognition
4. Experimental results. Face recognition
5. Conclusion and future work
Overview. Rough sets
Pawlak, Zdzisław. Rough Sets: Theoretical Aspects of Reasoning About Data, 1991
Conferences:
1. JRS (Joint Rough Set Symposium):
   - RSEISP: Rough Sets and Emerging Intelligent Systems Paradigms
   - RSCTC: Conference on Rough Sets and Current Trends in Computing
2. International Joint Conference on Rough Sets (IJCRS)
3. Rough Set Theory Workshop (RST)…
Key idea: a set is represented by its lower and upper approximations.
Three regions of a target set S in the universal set U:
1. Positive region POS(S)
2. Negative region NEG(S)
3. Boundary region: U − POS(S) − NEG(S)
Three-way decisions (TWD) and binary classification
Yiyu Yao, Three-way decisions with probabilistic rough sets, Information Sciences, 2010
“Rules constructed from the three regions are associated with different actions and decisions, which immediately leads to the notion of three-way decision rules. A positive rule makes a decision of acceptance, a negative rule makes a decision of rejection, and a boundary rule makes a decision of abstaining”
Pattern recognition: a query object X must be assigned to one of C classes specified by the database of reference (model) objects. It is assumed that the class label $c(r) \in \{1, \dots, C\}$ of the rth model object is known.
In the case of binary classification (C = 2):
1. Positive decision: accept the first class.
2. Negative decision: reject the first class and accept the second class.
3. Boundary decision: delay the final decision and accept neither the first nor the second class.
Three-way decisions and multi-class recognition
An obvious extension to multi-class recognition is (C+1)-way decisions:
1. Accept class c = 1
2. Accept class c = 2
…
C. Accept class c = C
C+1. Boundary decision: delay the decision process if the classification result is unreliable
A known way to reject unreliable decisions is Chow's rule (Chow C. On optimum recognition error and reject tradeoff // IEEE Trans. Inf. Theory, 1970):
$$\max_{c \in \{1,\dots,C\}} P(c \mid X) \le p_0, \qquad p_0 = \frac{\Pi_{10} - \Pi_{ro}}{\Pi_{01} + \Pi_{10} - \Pi_{ro}},$$
where:
1) Π10 – the loss of an incorrect decision that has not been rejected,
2) Π01 – the loss of rejecting a correct decision,
3) Πro – the cost of the reject option (Πro ≤ Π10).
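As an illustration, here is a minimal Python sketch of Chow's rule with this threshold; the loss values and the posterior vectors are hypothetical examples:

```python
import numpy as np

def chow_rule(posteriors, loss_error=1.0, loss_missed=0.5, cost_reject=0.2):
    """(C+1)-way decision by Chow's rule: return the accepted class index,
    or None for the reject (boundary) decision.
    loss_error, loss_missed, cost_reject stand for Pi_10, Pi_01, Pi_ro."""
    # Threshold derived from the losses as in the formula above
    p0 = (loss_error - cost_reject) / (loss_missed + loss_error - cost_reject)
    c = int(np.argmax(posteriors))
    return c if posteriors[c] > p0 else None

# A confident decision is accepted, an ambiguous one is delayed
print(chow_rule(np.array([0.90, 0.05, 0.05])))  # -> 0
print(chow_rule(np.array([0.40, 0.35, 0.25])))  # -> None (reject option)
```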
Sequential three-way decisions and granular computing
Key question: how to make a decision if the reject option was chosen?
Yao Y. Granular Computing and Sequential Three-Way Decisions // Proc. of Rough Sets and Knowledge Technology, LNCS, 2013:
"Objects with a non-commitment decision may be further investigated by using fine-grained granules"
Issues to address:
1) How can granularity levels be defined in a general way for practically important composite objects, so that high (coarse) granularity levels are processed faster than the low (fine) ones?
2) The most reliable decision is not necessarily obtained at the lowest granularity level. How should the final decision be made if the reject option is chosen at the last level?
3) Is it possible to apply sequential TWD at each granularity level?
[Diagram: sequential three-way decisions. The query object X is fed to the classifier at level 1; if the decision is unreliable, it is passed to the classifier at the next level, …, level (L−1), and finally level L, whose classifier outputs the resulting class.]
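A minimal Python sketch of this sequential scheme, assuming per-level `classify` functions ordered from coarsest to finest and an abstract `is_reliable` check (both hypothetical placeholders):

```python
def sequential_twd(x, classifiers, is_reliable):
    """Run classifiers from the coarsest (fastest) granularity level 1
    to the finest level L; stop as soon as a decision is reliable."""
    decision = None
    for level, classify in enumerate(classifiers, start=1):
        decision = classify(x)            # candidate class at this level
        if is_reliable(decision, level):  # positive/negative region: stop
            return decision, level
    return decision, len(classifiers)     # boundary even at the last level
```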
Histograms of Oriented Gradients (HOG). Proposed in (Dalal and Triggs, 2005).
Criterion:
$$\min_{r \in \{1,\dots,R\}}\ \min_{|\Delta_1| \le \Delta,\ |\Delta_2| \le \Delta}\ \frac{UV}{K_1 K_2} \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \rho\big(H(k_1, k_2),\ H_r(k_1 + \Delta_1,\ k_2 + \Delta_2)\big)$$
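A direct (unoptimized) NumPy sketch of this criterion; the segment-HOG array layout, the squared-Euclidean `rho`, and skipping segments shifted outside the grid are assumptions:

```python
import numpy as np

def hog_distance(query, model, delta=1, rho=lambda a, b: np.sum((a - b) ** 2)):
    """query, model: arrays of shape (K1, K2, N) of segment HOGs.
    Minimum over alignments (d1, d2), |d1|, |d2| <= delta, of the summed
    segment distances over the overlapping part of the grid."""
    k1, k2, _ = query.shape
    best = np.inf
    for d1 in range(-delta, delta + 1):
        for d2 in range(-delta, delta + 1):
            total = 0.0
            for i in range(k1):
                for j in range(k2):
                    ii, jj = i + d1, j + d2
                    if 0 <= ii < k1 and 0 <= jj < k2:
                        total += rho(query[i, j], model[ii, jj])
            best = min(best, total)
    return best

def nearest_model(query, models, delta=1):
    """Criterion: the model with the minimal aligned distance."""
    return min(range(len(models)),
               key=lambda r: hog_distance(query, models[r], delta))
```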
Image descriptors:
1. Local (SIFT, SURF, etc.):
   a) Keypoint extraction
   b) Descriptor extraction
2. Global (color histograms, HOG):
   a) Object detection
   b) Descriptor extraction
[Figure: gradient orientation histogram (from (Lowe, 2004))]
Statistical classification
Since the class distributions are unknown, let us estimate them with the Gaussian Parzen kernel. Thus, for group-choice classification of a segment under the naïve assumption of feature independence inside each segment, the generalized PNN is used.
The classification task is reduced to testing simple hypotheses. The density of the segment X(k1, k2) under the hypothesis W_r is estimated as
$$f\big(X(k_1, k_2) \mid W_r(\tilde k_1, \tilde k_2)\big) = \prod_{j=1}^{n} \frac{1}{n_r} \sum_{j_r=1}^{n_r} K\big(x_j(k_1, k_2),\ x^{(r)}_{j_r}(\tilde k_1, \tilde k_2)\big)$$
In the case of equal prior probabilities, the Bayesian rule yields the final classifier under the assumption of segment independence (a production layer is added to the traditional PNN structure):
$$\max_{r \in \{1,\dots,R\}} p_r \cdot f\big(X(k_1, k_2) \mid W_r(\tilde k_1, \tilde k_2)\big),$$
which, with the Parzen estimates and segment alignment, becomes
$$\max_{r \in \{1,\dots,R\}} p_r \cdot \min_{|\Delta_1| \le \Delta,\ |\Delta_2| \le \Delta} \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \prod_{j=1}^{n} \frac{1}{n_r} \sum_{j_r=1}^{n_r} K\big(x_j(k_1, k_2),\ x^{(r)}_{j_r}(k_1 + \Delta_1,\ k_2 + \Delta_2)\big).$$
Unfortunately, if the distribution estimates are used instead of the unknown distributions, such an approach is not optimal. It is necessary to test a complex hypothesis.
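To make the Parzen machinery concrete, a small Python sketch for one segment with scalar features; the kernel width `sigma` and the data layout are assumptions:

```python
import numpy as np

def parzen_density(x, sample, sigma=0.1):
    """Gaussian Parzen estimate f(x) = (1/n_r) * sum_j K(x, x_j)."""
    return np.mean(np.exp(-((x - sample) ** 2) / (2 * sigma ** 2))
                   / (np.sqrt(2 * np.pi) * sigma))

def pnn_decide(segment, references, priors=None):
    """Bayes rule for one segment: maximize p_r * prod_j f(x_j | W_r),
    computed in the log domain for numerical stability."""
    R = len(references)
    priors = priors if priors is not None else np.full(R, 1.0 / R)
    scores = [np.log(priors[r])
              + sum(np.log(parzen_density(x, references[r])) for x in segment)
              for r in range(R)]
    return int(np.argmax(scores))
```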
Segment homogeneity testing. Idea [Borovkov, 1984]: it is necessary to test the complex hypothesis $W_r(k_1, k_2, \tilde k_1, \tilde k_2)$ of homogeneity of the feature samples. The distribution under each hypothesis is estimated from the united sample $\{X(k_1, k_2), X_r(\tilde k_1, \tilde k_2)\}$, as is done in the Lehmann-Rosenblatt test. The following criterion is known to be asymptotically minimax:
Homogeneity-Testing PNN (HT-PNN) for piecewise-regular object recognition (Savchenko // Proc. of ANNPR, LNAI, 2012)
$$\max_{r \in \{1,\dots,R\}}\ \sup_{f^{(1)},\dots,f^{(R)}} f\big(X(k_1, k_2),\ X_1(\tilde k_1, \tilde k_2),\ \dots,\ X_R(\tilde k_1, \tilde k_2) \mid W_r(k_1, k_2, \tilde k_1, \tilde k_2)\big)$$
With the Parzen estimates, this criterion takes the form
$$\max_{r \in \{1,\dots,R\}}\ \min_{|\Delta_1| \le \Delta,\ |\Delta_2| \le \Delta}\ \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \Bigg[\ \prod_{j=1}^{n} \frac{\frac{1}{n+n_r}\Big(\sum_{j'=1}^{n} K\big(x_j, x_{j'}\big) + \sum_{j_r=1}^{n_r} K\big(x_j, x^{(r)}_{j_r}\big)\Big)}{\frac{1}{n}\sum_{j'=1}^{n} K\big(x_j, x_{j'}\big)} \times \prod_{j_r=1}^{n_r} \frac{\frac{1}{n+n_r}\Big(\sum_{j=1}^{n} K\big(x^{(r)}_{j_r}, x_j\big) + \sum_{j_r'=1}^{n_r} K\big(x^{(r)}_{j_r}, x^{(r)}_{j_r'}\big)\Big)}{\frac{1}{n_r}\sum_{j_r'=1}^{n_r} K\big(x^{(r)}_{j_r}, x^{(r)}_{j_r'}\big)}\ \Bigg],$$
with the shorthand $x_j = x_j(k_1, k_2)$ and $x^{(r)}_{j_r} = x^{(r)}_{j_r}(k_1 + \Delta_1, k_2 + \Delta_2)$.
Discrete features. Computing efficiency of the PNN and the HT-PNN
The brute-force matching requires $O\big((2\Delta + 1)^2 \cdot UV \cdot \sum_{r=1}^{R} U_r V_r\big)$ operations.
If there are only N feature values, the segments are described by their histograms (HOGs) $\{w_i(k_1, k_2)\}$ and $\{\theta^{(r)}_i(\tilde k_1, \tilde k_2)\}$, $i = 1, \dots, N$, smoothed by the kernel:
$$w_{i;K}(k_1, k_2) = \sum_{j=1}^{N} K_{ij}\, w_j(k_1, k_2), \qquad \theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2) = \sum_{j=1}^{N} K_{ij}\, \theta^{(r)}_j(\tilde k_1, \tilde k_2),$$
with the united-sample histogram
$$\tilde\theta^{(r)}_{\Sigma i}(k_1, k_2, \tilde k_1, \tilde k_2) = \frac{n \cdot w_{i;K}(k_1, k_2) + n_r \cdot \theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2)}{n + n_r}.$$
PNN (equivalent to the Kullback-Leibler divergence if the smoothing parameter σ → 0):
$$\rho_{\mathrm{PNN}}(X, X_r) = \frac{n}{K_1 K_2} \min_{|\Delta_1| \le \Delta,\ |\Delta_2| \le \Delta} \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \sum_{i=1}^{N} w_{i;K}(k_1, k_2)\, \ln\frac{w_{i;K}(k_1, k_2)}{\theta^{(r)}_{i;K}(k_1 + \Delta_1, k_2 + \Delta_2)}\ \to\ \min_{r \in \{1,\dots,R\}}$$
HT-PNN (a generalization of the Jensen-Shannon divergence):
$$\rho_{\mathrm{HT\text{-}PNN}}\big(X(k_1, k_2), X_r(\tilde k_1, \tilde k_2)\big) = \sum_{i=1}^{N} \Bigg( n\, w_{i;K}(k_1, k_2)\, \ln\frac{w_{i;K}(k_1, k_2)}{\tilde\theta^{(r)}_{\Sigma i}(k_1, k_2, \tilde k_1, \tilde k_2)} + n_r\, \theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2)\, \ln\frac{\theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2)}{\tilde\theta^{(r)}_{\Sigma i}(k_1, k_2, \tilde k_1, \tilde k_2)} \Bigg)$$
Approximate HT-PNN, A-HT-PNN (a generalization of the chi-square distance):
$$\rho_{\mathrm{A\text{-}HT\text{-}PNN}}\big(X(k_1, k_2), X_r(\tilde k_1, \tilde k_2)\big) = \sum_{i=1}^{N} \Bigg( n\, w_{i;K}(k_1, k_2)\, \frac{w_{i;K}(k_1, k_2) - \tilde\theta^{(r)}_{\Sigma i}(\cdot)}{\tilde\theta^{(r)}_{\Sigma i}(\cdot)} + n_r\, \theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2)\, \frac{\theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2) - \tilde\theta^{(r)}_{\Sigma i}(\cdot)}{\tilde\theta^{(r)}_{\Sigma i}(\cdot)} \Bigg)$$
See details in (Savchenko // Neural Networks, 2013).
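These three discrete distances are straightforward to compute from the (normalized) segment histograms; a NumPy sketch, with the kernel smoothing K_ij omitted and a small `eps` guarding the logarithms:

```python
import numpy as np

def united(w, theta, n, n_r):
    """United-sample histogram theta_Sigma = (n*w + n_r*theta) / (n + n_r)."""
    return (n * w + n_r * theta) / (n + n_r)

def rho_pnn(w, theta, eps=1e-10):
    """Kullback-Leibler divergence (PNN output as sigma -> 0)."""
    return np.sum(w * np.log((w + eps) / (theta + eps)))

def rho_ht_pnn(w, theta, n, n_r, eps=1e-10):
    """HT-PNN: a generalization of the Jensen-Shannon divergence."""
    ts = united(w, theta, n, n_r)
    return (n * np.sum(w * np.log((w + eps) / (ts + eps)))
            + n_r * np.sum(theta * np.log((theta + eps) / (ts + eps))))

def rho_a_ht_pnn(w, theta, n, n_r, eps=1e-10):
    """A-HT-PNN: a chi-square-like approximation of the HT-PNN."""
    ts = united(w, theta, n, n_r)
    return (n * np.sum(w * (w - ts) / (ts + eps))
            + n_r * np.sum(theta * (theta - ts) / (ts + eps)))
```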
Definition of granularity levels. Hierarchical image recognition. Pyramid HOG
Proposed in (Bosch, Zisserman, Munoz // CIVR, 2007)
Objects are divided into L pyramid levels.
We focus on the small sample size (SSS) problem. Criterion: the nearest neighbor rule with a weighted sum of distances:
$$\rho_{\mathrm{PHOG}}(X, X_r) = \sum_{l=1}^{L} w^{(l)} \cdot \rho^{(l)}(X, X_r)$$
Key issue: insufficient performance. This criterion requires $\Big(\sum_{l=1}^{L} K_1^{(l)} K_2^{(l)}\Big) \big/ \big(K_1^{(1)} K_2^{(1)}\big)$ times more calculations than the conventional HOG.
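The PHOG distance itself is a one-liner once the per-level distances are available; a sketch assuming lists of per-level descriptors and a level-wise distance `rho`:

```python
def rho_phog(query_levels, model_levels, weights, rho):
    """Weighted sum over L pyramid levels:
    rho_PHOG = sum_l w(l) * rho(query at level l, model at level l).
    All three lists are ordered from the coarsest to the finest level."""
    return sum(w * rho(q, m)
               for w, q, m in zip(weights, query_levels, model_levels))
```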
Sequential three-way decisions and granular computing in image recognition
1. The nearest neighbor rule is used at each granularity level l.
2. The posterior probability is estimated based on the properties of the HT-PNN.
3. If the decision is still unreliable at the finest granularity level, the final decision is made by choosing the least unreliable level.
$$\nu(l) = \arg\min_{r \in \{1,\dots,R\}} \rho^{(l)}(X, X_r)$$
[Diagram: the query image X is classified by the nearest neighbor classifier at level 1; if the decision is unreliable, it is passed to the next level, …, up to the nearest neighbor classifier at level L, which outputs the resulting class c(ν).]
Classifier fusion:
$$l^* = \arg\max_{l \in \{1,\dots,L\}} \hat P_l\big(W_{\nu(l)} \mid X\big), \qquad \hat P_l\big(W_{\nu(l)} \mid X\big) = \frac{\exp\big({-}UV(l) \cdot \rho^{(l)}(X, X_{\nu(l)})\big)}{\sum_{r=1}^{R} \exp\big({-}UV(l) \cdot \rho^{(l)}(X, X_r)\big)}$$
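Combining the posterior estimate with the level-by-level reject option gives the following Python sketch; the threshold `p0` and the `scale` factor (standing in for UV(l)) are assumptions:

```python
import numpy as np

def posteriors(distances, scale=1.0):
    """P_hat(W_r | X) ~ exp(-scale * rho), normalized over the database
    (a numerically stable softmax of the negated distances)."""
    e = np.exp(-scale * (distances - distances.min()))
    return e / e.sum()

def sequential_phog(distance_per_level, p0=0.85):
    """distance_per_level: list of length L; element l is the array of
    distances rho^(l)(X, X_r) to all R models at granularity level l."""
    best = []  # (posterior of the top model, its index) per level
    for dists in distance_per_level:
        p = posteriors(np.asarray(dists))
        nu = int(np.argmax(p))
        if p[nu] > p0:            # reliable decision: stop at this level
            return nu
        best.append((p[nu], nu))
    # unreliable even at the finest level: choose the least unreliable level
    return max(best)[1]
```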
Sequential analysis at each granularity level
If a distance falls into the negative region, it is not a distance between objects from different classes. Hence, it must be a distance between objects of the same class, and there is no need to continue matching with the other models!
Warning: the performance of the nearest neighbor rule is insufficient if the number of classes is high.
Solution: for each rth model object, check whether the hypothesis Wr can be accepted without further verification of the remaining models.
A probabilistic rough set of the distances between objects from different classes is created:
1. Positive region: $POS^{(l)} = \big\{\rho^{(l)}(X_1, X_2)\ \big|\ \rho^{(l)}(X_1, X_2) > \rho_1^{(l)}\big\}$
2. Negative region: $NEG^{(l)} = \big\{\rho^{(l)}(X_1, X_2)\ \big|\ \rho^{(l)}(X_1, X_2) < \rho_0^{(l)}\big\}$
3. Boundary region: $BND^{(l)} = \big\{\rho^{(l)}(X_1, X_2)\ \big|\ \rho_0^{(l)} \le \rho^{(l)}(X_1, X_2) \le \rho_1^{(l)}\big\}$
The matching terminates as soon as $\rho^{(l)}(X, X_r) < \rho_0^{(l)}$.
This is a termination condition used in approximate nearest neighbor methods.
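A Python sketch of the region check and the resulting early termination during model enumeration; the thresholds `rho0`, `rho1` are hypothetical level-specific constants:

```python
def region(dist, rho0, rho1):
    """Three-way decision for a single distance at one granularity level."""
    if dist < rho0:
        return "negative"   # same-class distance: accept immediately
    if dist > rho1:
        return "positive"   # different-class distance: reject this model
    return "boundary"       # delay the decision

def nn_with_termination(dists, rho0, rho1):
    """Scan the models; stop as soon as a distance falls into the negative
    region, otherwise fall back to the exhaustive nearest neighbor."""
    best_r, best_d = None, float("inf")
    for r, d in enumerate(dists):
        if region(d, rho0, rho1) == "negative":
            return r                      # terminate: X_r is close enough
        if d < best_d:
            best_r, best_d = r, d
    return best_r
```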
Real-time recognition with a large database
The k-NN rule requires a brute-force search over the whole database.
Small training sample: C ≈ R.
- Small database (tens of classes).
- Medium-sized DB: problems with the accuracy and computational speed of automatic real-time recognition.
- Very large DB: automated content-based object retrieval (approximate k-NN).
Solutions:
• Modern hardware
• Parallel computing
• Simplification of the similarity measure or its parameters
• Approximate nearest neighbor methods:
  1. ANN library (Arya, Mount, et al. // Journal of the ACM, 1998): kd-trees; only Minkowski distances are supported.
  2. Hashing techniques, LSH (Locality-Sensitive Hashing) (Gionis, Indyk, Motwani // Proc. of VLDB, 1999); applications in Google Correlate (Vanderkam, Schonberger, Rowley, Kumar. Nearest Neighbor Search in Google Correlate // Tech. report, 2013).
  3. FLANN library (Muja, Lowe // Proc. of VISAPP, 2009).
  4. NonMetricSpaceLib (Boytsov, Bilegsaikhan // Proc. of SISAP, LNCS, 2013).
Medium-sized databases. Maximum-Likelihood Directed Enumeration Method (DEM)
1. Asymptotically (n, nr → ∞), the number of distance calculations in the DEM is constant (it does not depend on the DB size R).
2. The DEM is the optimal greedy algorithm for the HT-PNN.
Idea: at each step, the next model is selected so as to maximize the likelihood of the previously calculated distances.
Initialization: r1 is chosen to maximize the average probability of obtaining a correct decision at the step k = 2:
$$r_1 = \arg\max_{\mu \in \{1,\dots,R\}} \frac{1}{R} \sum_{\nu=1}^{R} \prod_{r=1}^{R} \Phi\Bigg(\sqrt{\frac{Kn}{2}} \cdot \frac{\rho_{r,\nu} - \rho_{\mu,\nu}}{2\,\rho_{\mu,\nu}}\Bigg)$$
At the (k+1)th step, the next model maximizes the likelihood of the previously calculated distances:
$$r_{k+1} = \arg\max_{\mu \in \{1,\dots,R\} \setminus \{r_1,\dots,r_k\}} \prod_{i=1}^{k} f\big(\rho_{\mathrm{HT\text{-}PNN}}(X, X_{r_i}) \mid W_\mu\big)$$
Based on the asymptotic properties of the HT-PNN, this rule is equivalent to
$$r_{k+1} = \arg\min_{\mu \in \{1,\dots,R\} \setminus \{r_1,\dots,r_k\}} \sum_{i=1}^{k} \varphi_{\mu, r_i}, \qquad \varphi_{\mu, r_i} \approx \frac{\big(\rho_{\mathrm{HT\text{-}PNN}}(X, X_{r_i}) - \rho_{\mu, r_i}\big)^2}{2\,\rho_{\mu, r_i}},$$
where $\rho_{\mu,\nu}$ is the precomputed distance between the μth and νth models.
Details in (Savchenko // Pattern Recognition, 2012), (Savchenko // Proc. of ICVS, 2013), and (Savchenko // Proc. of PReMI, LNCS, 2015, accepted).
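A greedy Python sketch of the directed enumeration under these formulas; the precomputed model-to-model distance matrix, the fixed starting model, and the reuse of the ρ0 termination threshold are assumptions:

```python
import numpy as np

def ml_dem(query_dist, rho_models, rho0, k_max=None):
    """query_dist(r) -> rho(X, X_r); rho_models[mu, r] -> rho(X_mu, X_r).
    At each step pick the model mu minimizing sum_i phi(mu, r_i), with
    phi ~ (rho(X, X_ri) - rho(mu, ri))^2 / (2 * rho(mu, ri))."""
    R = rho_models.shape[0]
    k_max = k_max or R
    checked, dists = [], []
    mu = 0  # fixed start; the slide derives a likelihood-based r_1 instead
    for _ in range(k_max):
        d = query_dist(mu)
        if d < rho0:                      # negative region: accept W_mu
            return mu
        checked.append(mu)
        dists.append(d)
        phi = np.zeros(R)
        for r_i, d_i in zip(checked, dists):
            m = rho_models[:, r_i]
            phi += (d_i - m) ** 2 / (2 * m + 1e-10)
        phi[checked] = np.inf             # never revisit a checked model
        mu = int(np.argmin(phi))
    return checked[int(np.argmin(dists))] # fallback: best model seen so far
```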
Experimental study. Face recognition
Parameters:
1. L = 2 granularity levels (10x10 and 20x20)
2. Alignment of HOGs: Δ = 1
3. Threshold p0 = 0.85
Testing: 20 repetitions of random subsampling cross-validation
Essex dataset (323 persons, training set: 5187 photos, test set: 1224 photos)
[Preprocessing pipeline: image → face detection (OpenCV LBP) → conversion to grayscale → gamma correction → 3x3 median filter → contrast equalization → segmentation (10x10 grid) with gradient magnitude/orientation estimation → PHOG descriptor.]
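A rough OpenCV (Python) sketch of this preprocessing chain; the cascade file name, the gamma value, and the single-face assumption are hypothetical:

```python
import cv2
import numpy as np

def preprocess(image_bgr, cascade_path="lbpcascade_frontalface.xml"):
    """Image -> detected face -> grayscale -> gamma correction ->
    3x3 median filter -> contrast equalization, as in the pipeline above."""
    detector = cv2.CascadeClassifier(cascade_path)
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    x, y, w, h = faces[0]                           # assume a single face
    face = gray[y:y + h, x:x + w]
    gamma = np.uint8(255.0 * (face / 255.0) ** 0.5)  # gamma correction
    smoothed = cv2.medianBlur(gamma, 3)              # 3x3 median filter
    return cv2.equalizeHist(smoothed)                # contrast equalization
```

The gradient estimation and PHOG extraction would then run on the returned face image.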
Experimental results (1a). Sequential three-way decisions. HT-PNN
Findings:
1. The error rate of the hierarchical approach is 0.6-1% lower than the error rate at the ground level (20x20 grid).
2. The average recognition time of the sequential TWD is 3.5-4.5 times lower than that of the PHOG.
3. The recognition time of the sequential TWD with database enumeration is 25-45% lower than that of the plain sequential TWD.
[Two charts: error rate (%) and average recognition time (ms) vs. the number of models R ∈ {323, 500, 700, 1000}, comparing L=1 (10x10 grid), L=1 (20x20 grid), Pyramid HOG (PHOG), sequential TWD with Chow's rule, and sequential TWD with database enumeration.]
Experimental results (1b). Sequential three-way decisions. Euclidean metric
Findings:
1. The error rate is 1-2.5% higher than for the HT-PNN.
2. The sequential TWD with database enumeration is 2-2.5 times faster than the PHOG and 20-30% faster than the original sequential TWD.
[Two charts: error rate (%) and average recognition time (ms) vs. the number of models R ∈ {323, 500, 700, 1000}, comparing L=1 (10x10 grid), L=1 (20x20 grid), Pyramid HOG (PHOG), sequential TWD with Chow's rule, and sequential TWD with database enumeration.]
Experimental results (2). OpenCV face recognition
Compared methods:
1. SVM classifier (libSVM) on HOGs (10x10 grid)
2. Eigenfaces (Turk, Pentland // CVPR, 1991)
3. Fisherfaces (Belhumeur et al. // IEEE Trans. on PAMI, 1997)
4. Histograms of Local Binary Patterns (LBP) (Ahonen et al. // ECCV, 2004)
[Two charts: error rate (%) and average recognition time (ms) vs. the number of models R ∈ {323, 500, 700, 1000} for SVM with HOG (10x10), Eigenfaces, Fisherfaces, and LBP.]
In the case of one image per person, the SVM's accuracy is 3% and 5.5% lower than the accuracy of the Euclidean distance and of the HT-PNN with the same features (10x10 grid), respectively. If the number of images per class is higher (3-4, R=1000), the SVM is expectedly better.
Experimental results (3). ML-DEM
Average recognition time (ms) for the original training set (R=5187, error rate 0.164%) and a reduced training set of 880 medoids (R=881, error rate 0.573%):
[Two charts: average recognition time (ms) for T=1 and T=8 threads, comparing brute force, randomized KD tree, ordering permutation, ML-DEM, and Pivot ML-DEM.]
The ML-DEM is compared with the following approximate NN methods:
1. Randomized kd-tree (Silpa-Anan C., Hartley R. Optimised KD-trees for fast image descriptor matching // CVPR, 2008)
2. Ordering permutation (Gonzalez E.C., Figueroa K., Navarro G. Effective Proximity Retrieval by Ordering Permutations // IEEE Trans. on PAMI, 2008)
Experimental results (4a). Sequential three-way decisions. HT-PNN
[Two charts: error rate (%) and average recognition time (ms) vs. the number of models R ∈ {65, 75, 150, 225}, comparing 10x10, 20x20, PHOG (10x10+20x20), sequential PHOG (10x10+20x20), PHOG (10x10+15x15+20x20), and sequential PHOG (10x10+15x15+20x20).]
Dataset: AT&T+Yale+JAFFE (C=65 classes, 778 images)
Experimental results (4b). Sequential three-way decisions. Euclidean metric
Findings:
1. L = 3 granularity levels increase the accuracy.
2. The sequential TWD speeds up the recognition procedure by a factor of 2-4 in comparison with the PHOG.
[Two charts: error rate (%) and average recognition time (ms) vs. the number of models R ∈ {65, 75, 150, 225}, comparing 10x10, 20x20, PHOG (10x10+20x20), sequential PHOG (10x10+20x20), PHOG (10x10+15x15+20x20), and sequential PHOG (10x10+15x15+20x20).]
Conclusion
1. The insufficient performance of hierarchical image recognition methods is highlighted.
2. The possibility of applying rough set theory, three-way decision theory, and granular computing to image recognition is explored.
3. A fast decision method of sequential analysis for Pyramid HOG features is proposed.
4. We experimentally demonstrated that the proposed approach can be efficiently applied even with the conventional Euclidean metric.
5. If the number of classes C is large, the brute-force solution is not computationally efficient; hence, approximate nearest neighbor algorithms can be applied.
Future work
1. Explore modern features extracted with deep neural networks for unconstrained face recognition.
2. Experimental study of more sophisticated segmentation methods.
3. Application of the proposed approach to other pattern recognition tasks, e.g., speech recognition.
4. Apply our approach with other classifiers for which a reject option is available, e.g., the one-against-all multi-class support vector machine.