TRANSCRIPT
Andrey V. Savchenko, National Research University Higher School of Economics. Email: [email protected]
Co-authors: Vladimir Milov (N. Novgorod State Technical University), Natalya Belova (NRU HSE, Moscow)
National Research University Higher School of Economics, Nizhny Novgorod
THE 4TH INTERNATIONAL CONFERENCE ON ANALYSIS OF IMAGES, SOCIAL NETWORKS, AND TEXTS
Sequential Hierarchical Image Recognition based on the Pyramid Histograms of Oriented Gradients
with Small Samples
Outline
1. Overview. Rough sets. Three-way decisions
2. Hierarchical image recognition. Pyramid HOG
3. Sequential three-way decisions in image recognition
4. Experimental results. Face recognition
5. Conclusion and future work
Overview. Rough sets
Pawlak, Zdzisław. Rough Sets: Theoretical Aspects of Reasoning About Data, 1991
Conferences:
1. JRS (Joint Rough Set Symposium):
   - RSEISP: Rough Sets and Emerging Intelligent Systems Paradigms
   - RSCTC: Conference on Rough Sets and Current Trends in Computing
2. International Joint Conference on Rough Sets (IJCRS)
3. Rough Set Theory Workshop (RST)…
Key idea: a set is represented by its lower and upper approximations.
Three regions of a target set S in the universal set U:
1. Positive region POS(S)
2. Negative region NEG(S)
3. Boundary region: U − POS(S) − NEG(S)
Three-way decisions (TWD) and binary classification
Yiyu Yao, Three-way decisions with probabilistic rough sets, Information Sciences, 2010
“Rules constructed from the three regions are associated with different actions and decisions, which immediately leads to the notion of three-way decision rules. A positive rule makes a decision of acceptance, a negative rule makes a decision of rejection, and a boundary rule makes a decision of abstaining”
Pattern recognition: a query object X must be assigned to one of C classes specified by the database of reference (model) objects. It is assumed that the class label $c(r) \in \{1, \dots, C\}$ of the rth model object is known.
In the case of binary classification (C = 2):
1. Positive decision: accept the first class.
2. Negative decision: reject the first class and accept the second class.
3. Boundary decision: delay the final decision and accept neither the first nor the second class.
Three-way decisions and multi-class recognition
An obvious extension to multi-class recognition is (C+1)-way decisions:
1. Accept class c = 1
2. Accept class c = 2
…
C. Accept class c = C
C+1. Boundary decision: delay the decision process if the classification result is unreliable
A known way to reject unreliable decisions is Chow's rule (Chow C. On optimum recognition error and reject tradeoff // IEEE Trans. Inf. Theory, 1970):
$$\max_{c \in \{1,\dots,C\}} P(c \mid X) \le p_0, \qquad p_0 = \frac{\Pi_{10} - \Pi_{ro}}{\Pi_{01} + \Pi_{10} - \Pi_{ro}},$$
where:
1) Π10 – the loss of an incorrect decision that has not been rejected,
2) Π01 – the loss of rejecting a correct decision,
3) Πro – the cost of the reject option (Πro ≤ Π10).
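As an illustration, here is a minimal Python sketch of Chow's rule with this threshold; the loss values and the posterior vectors are hypothetical examples:

```python
import numpy as np

def chow_rule(posteriors, loss_error=1.0, loss_missed=0.5, cost_reject=0.2):
    """(C+1)-way decision by Chow's rule: return the accepted class index,
    or None for the reject (boundary) decision.
    loss_error, loss_missed, cost_reject stand for Pi_10, Pi_01, Pi_ro."""
    # Threshold derived from the losses as in the formula above
    p0 = (loss_error - cost_reject) / (loss_missed + loss_error - cost_reject)
    c = int(np.argmax(posteriors))
    return c if posteriors[c] > p0 else None

# A confident decision is accepted, an ambiguous one is delayed
print(chow_rule(np.array([0.90, 0.05, 0.05])))  # -> 0
print(chow_rule(np.array([0.40, 0.35, 0.25])))  # -> None (reject option)
```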
Sequential three-way decisions and granular computing
Key question: how to make a decision if the reject option was chosen?
Yao Y. Granular Computing and Sequential Three-Way Decisions // Proc. of Rough Sets and Knowledge Technology, LNCS, 2013:
"Objects with a non-commitment decision may be further investigated by using fine-grained granules"
Issues to address:
1) How can granularity levels be defined in a general way for practically important composite objects, so that high (coarse) granularity levels are processed faster than the low (fine) ones?
2) The most reliable decision is not necessarily obtained at the lowest granularity level. How should the final decision be made if the reject option is chosen at the last level?
3) Is it possible to apply sequential TWD at each granularity level?
[Diagram: sequential three-way decisions. The query object X is fed to the classifier at level 1; if the decision is unreliable, it is passed to the classifier at the next level, …, level (L−1), and finally level L, whose classifier outputs the resulting class.]
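A minimal Python sketch of this sequential scheme, assuming per-level `classify` functions ordered from coarsest to finest and an abstract `is_reliable` check (both hypothetical placeholders):

```python
def sequential_twd(x, classifiers, is_reliable):
    """Run classifiers from the coarsest (fastest) granularity level 1
    to the finest level L; stop as soon as a decision is reliable."""
    decision = None
    for level, classify in enumerate(classifiers, start=1):
        decision = classify(x)            # candidate class at this level
        if is_reliable(decision, level):  # positive/negative region: stop
            return decision, level
    return decision, len(classifiers)     # boundary even at the last level
```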
Histograms of Oriented Gradients (HOG). Proposed in (Dalal and Triggs, 2005).
Criterion:
$$\min_{r \in \{1,\dots,R\}}\ \min_{|\Delta_1| \le \Delta,\ |\Delta_2| \le \Delta}\ \frac{UV}{K_1 K_2} \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \rho\big(H(k_1, k_2),\ H_r(k_1 + \Delta_1,\ k_2 + \Delta_2)\big)$$
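A direct (unoptimized) NumPy sketch of this criterion; the segment-HOG array layout, the squared-Euclidean `rho`, and skipping segments shifted outside the grid are assumptions:

```python
import numpy as np

def hog_distance(query, model, delta=1, rho=lambda a, b: np.sum((a - b) ** 2)):
    """query, model: arrays of shape (K1, K2, N) of segment HOGs.
    Minimum over alignments (d1, d2), |d1|, |d2| <= delta, of the summed
    segment distances over the overlapping part of the grid."""
    k1, k2, _ = query.shape
    best = np.inf
    for d1 in range(-delta, delta + 1):
        for d2 in range(-delta, delta + 1):
            total = 0.0
            for i in range(k1):
                for j in range(k2):
                    ii, jj = i + d1, j + d2
                    if 0 <= ii < k1 and 0 <= jj < k2:
                        total += rho(query[i, j], model[ii, jj])
            best = min(best, total)
    return best

def nearest_model(query, models, delta=1):
    """Criterion: the model with the minimal aligned distance."""
    return min(range(len(models)),
               key=lambda r: hog_distance(query, models[r], delta))
```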
Image descriptors:
1. Local (SIFT, SURF, etc.):
   a) Keypoint extraction
   b) Descriptor extraction
2. Global (color histograms, HOG):
   a) Object detection
   b) Descriptor extraction
[Figure: gradient orientation histogram (from (Lowe, 2004))]
Statistical classification
Since the class distributions are unknown, let us estimate them with the Gaussian Parzen kernel. Thus, for group-choice classification of a segment under the naïve assumption of feature independence inside each segment, the generalized PNN is used.
The classification task is reduced to testing simple hypotheses. The density of the segment X(k1, k2) under the hypothesis W_r is estimated as
$$f\big(X(k_1, k_2) \mid W_r(\tilde k_1, \tilde k_2)\big) = \prod_{j=1}^{n} \frac{1}{n_r} \sum_{j_r=1}^{n_r} K\big(x_j(k_1, k_2),\ x^{(r)}_{j_r}(\tilde k_1, \tilde k_2)\big)$$
In the case of equal prior probabilities, the Bayesian rule yields the final classifier under the assumption of segment independence (a production layer is added to the traditional PNN structure):
$$\max_{r \in \{1,\dots,R\}} p_r \cdot f\big(X(k_1, k_2) \mid W_r(\tilde k_1, \tilde k_2)\big),$$
which, with the Parzen estimates and segment alignment, becomes
$$\max_{r \in \{1,\dots,R\}} p_r \cdot \min_{|\Delta_1| \le \Delta,\ |\Delta_2| \le \Delta} \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \prod_{j=1}^{n} \frac{1}{n_r} \sum_{j_r=1}^{n_r} K\big(x_j(k_1, k_2),\ x^{(r)}_{j_r}(k_1 + \Delta_1,\ k_2 + \Delta_2)\big).$$
Unfortunately, if the distribution estimates are used instead of the unknown distributions, such an approach is not optimal. It is necessary to test a complex hypothesis.
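To make the Parzen machinery concrete, a small Python sketch for one segment with scalar features; the kernel width `sigma` and the data layout are assumptions:

```python
import numpy as np

def parzen_density(x, sample, sigma=0.1):
    """Gaussian Parzen estimate f(x) = (1/n_r) * sum_j K(x, x_j)."""
    return np.mean(np.exp(-((x - sample) ** 2) / (2 * sigma ** 2))
                   / (np.sqrt(2 * np.pi) * sigma))

def pnn_decide(segment, references, priors=None):
    """Bayes rule for one segment: maximize p_r * prod_j f(x_j | W_r),
    computed in the log domain for numerical stability."""
    R = len(references)
    priors = priors if priors is not None else np.full(R, 1.0 / R)
    scores = [np.log(priors[r])
              + sum(np.log(parzen_density(x, references[r])) for x in segment)
              for r in range(R)]
    return int(np.argmax(scores))
```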
Segment homogeneity testing. Idea [Borovkov, 1984]: it is necessary to test the complex hypothesis $W_r(k_1, k_2, \tilde k_1, \tilde k_2)$ of homogeneity of the feature samples. The distribution under each hypothesis is estimated from the united sample $\{X(k_1, k_2), X_r(\tilde k_1, \tilde k_2)\}$, as is done in the Lehmann-Rosenblatt test. The following criterion is known to be asymptotically minimax:
Homogeneity-Testing PNN (HT-PNN) for piecewise-regular object recognition (Savchenko // Proc. of ANNPR, LNAI, 2012)
$$\max_{r \in \{1,\dots,R\}}\ \sup_{f^{(1)},\dots,f^{(R)}} f\big(X(k_1, k_2),\ X_1(\tilde k_1, \tilde k_2),\ \dots,\ X_R(\tilde k_1, \tilde k_2) \mid W_r(k_1, k_2, \tilde k_1, \tilde k_2)\big)$$
With the Parzen estimates, this criterion takes the form
$$\max_{r \in \{1,\dots,R\}}\ \min_{|\Delta_1| \le \Delta,\ |\Delta_2| \le \Delta}\ \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \Bigg[\ \prod_{j=1}^{n} \frac{\frac{1}{n+n_r}\Big(\sum_{j'=1}^{n} K\big(x_j, x_{j'}\big) + \sum_{j_r=1}^{n_r} K\big(x_j, x^{(r)}_{j_r}\big)\Big)}{\frac{1}{n}\sum_{j'=1}^{n} K\big(x_j, x_{j'}\big)} \times \prod_{j_r=1}^{n_r} \frac{\frac{1}{n+n_r}\Big(\sum_{j=1}^{n} K\big(x^{(r)}_{j_r}, x_j\big) + \sum_{j_r'=1}^{n_r} K\big(x^{(r)}_{j_r}, x^{(r)}_{j_r'}\big)\Big)}{\frac{1}{n_r}\sum_{j_r'=1}^{n_r} K\big(x^{(r)}_{j_r}, x^{(r)}_{j_r'}\big)}\ \Bigg],$$
with the shorthand $x_j = x_j(k_1, k_2)$ and $x^{(r)}_{j_r} = x^{(r)}_{j_r}(k_1 + \Delta_1, k_2 + \Delta_2)$.
Discrete features. Computing efficiency of the PNN and the HT-PNN
The brute-force matching requires $O\big((2\Delta + 1)^2 \cdot UV \cdot \sum_{r=1}^{R} U_r V_r\big)$ operations.
If there are only N feature values, the segments are described by their histograms (HOGs) $\{w_i(k_1, k_2)\}$ and $\{\theta^{(r)}_i(\tilde k_1, \tilde k_2)\}$, $i = 1, \dots, N$, smoothed by the kernel:
$$w_{i;K}(k_1, k_2) = \sum_{j=1}^{N} K_{ij}\, w_j(k_1, k_2), \qquad \theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2) = \sum_{j=1}^{N} K_{ij}\, \theta^{(r)}_j(\tilde k_1, \tilde k_2),$$
with the united-sample histogram
$$\tilde\theta^{(r)}_{\Sigma i}(k_1, k_2, \tilde k_1, \tilde k_2) = \frac{n \cdot w_{i;K}(k_1, k_2) + n_r \cdot \theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2)}{n + n_r}.$$
PNN (equivalent to the Kullback-Leibler divergence if the smoothing parameter σ → 0):
$$\rho_{\mathrm{PNN}}(X, X_r) = \frac{n}{K_1 K_2} \min_{|\Delta_1| \le \Delta,\ |\Delta_2| \le \Delta} \sum_{k_1=1}^{K_1} \sum_{k_2=1}^{K_2} \sum_{i=1}^{N} w_{i;K}(k_1, k_2)\, \ln\frac{w_{i;K}(k_1, k_2)}{\theta^{(r)}_{i;K}(k_1 + \Delta_1, k_2 + \Delta_2)}\ \to\ \min_{r \in \{1,\dots,R\}}$$
HT-PNN (a generalization of the Jensen-Shannon divergence):
$$\rho_{\mathrm{HT\text{-}PNN}}\big(X(k_1, k_2), X_r(\tilde k_1, \tilde k_2)\big) = \sum_{i=1}^{N} \Bigg( n\, w_{i;K}(k_1, k_2)\, \ln\frac{w_{i;K}(k_1, k_2)}{\tilde\theta^{(r)}_{\Sigma i}(k_1, k_2, \tilde k_1, \tilde k_2)} + n_r\, \theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2)\, \ln\frac{\theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2)}{\tilde\theta^{(r)}_{\Sigma i}(k_1, k_2, \tilde k_1, \tilde k_2)} \Bigg)$$
Approximate HT-PNN, A-HT-PNN (a generalization of the chi-square distance):
$$\rho_{\mathrm{A\text{-}HT\text{-}PNN}}\big(X(k_1, k_2), X_r(\tilde k_1, \tilde k_2)\big) = \sum_{i=1}^{N} \Bigg( n\, w_{i;K}(k_1, k_2)\, \frac{w_{i;K}(k_1, k_2) - \tilde\theta^{(r)}_{\Sigma i}(\cdot)}{\tilde\theta^{(r)}_{\Sigma i}(\cdot)} + n_r\, \theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2)\, \frac{\theta^{(r)}_{i;K}(\tilde k_1, \tilde k_2) - \tilde\theta^{(r)}_{\Sigma i}(\cdot)}{\tilde\theta^{(r)}_{\Sigma i}(\cdot)} \Bigg)$$
See details in (Savchenko // Neural Networks, 2013).
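These three discrete distances are straightforward to compute from the (normalized) segment histograms; a NumPy sketch, with the kernel smoothing K_ij omitted and a small `eps` guarding the logarithms:

```python
import numpy as np

def united(w, theta, n, n_r):
    """United-sample histogram theta_Sigma = (n*w + n_r*theta) / (n + n_r)."""
    return (n * w + n_r * theta) / (n + n_r)

def rho_pnn(w, theta, eps=1e-10):
    """Kullback-Leibler divergence (PNN output as sigma -> 0)."""
    return np.sum(w * np.log((w + eps) / (theta + eps)))

def rho_ht_pnn(w, theta, n, n_r, eps=1e-10):
    """HT-PNN: a generalization of the Jensen-Shannon divergence."""
    ts = united(w, theta, n, n_r)
    return (n * np.sum(w * np.log((w + eps) / (ts + eps)))
            + n_r * np.sum(theta * np.log((theta + eps) / (ts + eps))))

def rho_a_ht_pnn(w, theta, n, n_r, eps=1e-10):
    """A-HT-PNN: a chi-square-like approximation of the HT-PNN."""
    ts = united(w, theta, n, n_r)
    return (n * np.sum(w * (w - ts) / (ts + eps))
            + n_r * np.sum(theta * (theta - ts) / (ts + eps)))
```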
Definition of granularity levels. Hierarchical image recognition. Pyramid HOG
Proposed in (Bosch, Zisserman, Munoz // CIVR, 2007)
Objects are divided into L pyramid levels.
We focus on the small sample size (SSS) problem. Criterion: the nearest neighbor rule with a weighted sum of distances:
$$\rho_{\mathrm{PHOG}}(X, X_r) = \sum_{l=1}^{L} w^{(l)} \cdot \rho^{(l)}(X, X_r)$$
Key issue: insufficient performance. This criterion requires $\Big(\sum_{l=1}^{L} K_1^{(l)} K_2^{(l)}\Big) \big/ \big(K_1^{(1)} K_2^{(1)}\big)$ times more calculations than the conventional HOG.
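The PHOG distance itself is a one-liner once the per-level distances are available; a sketch assuming lists of per-level descriptors and a level-wise distance `rho`:

```python
def rho_phog(query_levels, model_levels, weights, rho):
    """Weighted sum over L pyramid levels:
    rho_PHOG = sum_l w(l) * rho(query at level l, model at level l).
    All three lists are ordered from the coarsest to the finest level."""
    return sum(w * rho(q, m)
               for w, q, m in zip(weights, query_levels, model_levels))
```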
Sequential three-way decisions and granular computing in image recognition
1. The nearest neighbor rule is used at each granularity level l.
2. The posterior probability is estimated based on the properties of the HT-PNN.
3. If the decision is still unreliable at the finest granularity level, the final decision is made by choosing the least unreliable level.
$$\nu(l) = \arg\min_{r \in \{1,\dots,R\}} \rho^{(l)}(X, X_r)$$
[Diagram: the query image X is classified by the nearest neighbor classifier at level 1; if the decision is unreliable, it is passed to the next level, …, up to the nearest neighbor classifier at level L, which outputs the resulting class c(ν).]
Classifier fusion:
$$l^* = \arg\max_{l \in \{1,\dots,L\}} \hat P_l\big(W_{\nu(l)} \mid X\big), \qquad \hat P_l\big(W_{\nu(l)} \mid X\big) = \frac{\exp\big({-}UV(l) \cdot \rho^{(l)}(X, X_{\nu(l)})\big)}{\sum_{r=1}^{R} \exp\big({-}UV(l) \cdot \rho^{(l)}(X, X_r)\big)}$$
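Combining the posterior estimate with the level-by-level reject option gives the following Python sketch; the threshold `p0` and the `scale` factor (standing in for UV(l)) are assumptions:

```python
import numpy as np

def posteriors(distances, scale=1.0):
    """P_hat(W_r | X) ~ exp(-scale * rho), normalized over the database
    (a numerically stable softmax of the negated distances)."""
    e = np.exp(-scale * (distances - distances.min()))
    return e / e.sum()

def sequential_phog(distance_per_level, p0=0.85):
    """distance_per_level: list of length L; element l is the array of
    distances rho^(l)(X, X_r) to all R models at granularity level l."""
    best = []  # (posterior of the top model, its index) per level
    for dists in distance_per_level:
        p = posteriors(np.asarray(dists))
        nu = int(np.argmax(p))
        if p[nu] > p0:            # reliable decision: stop at this level
            return nu
        best.append((p[nu], nu))
    # unreliable even at the finest level: choose the least unreliable level
    return max(best)[1]
```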
Sequential analysis at each granularity level
If a distance falls into the negative region, it is not a distance between objects from different classes. Hence, it must be a distance between objects of the same class, and there is no need to continue matching with the other models!
Warning: the performance of the nearest neighbor rule is insufficient if the number of classes is high.
Solution: for each rth model object, check whether the hypothesis Wr can be accepted without further verification of the remaining models.
A probabilistic rough set of the distances between objects from different classes is created:
1. Positive region: $POS^{(l)} = \big\{\rho^{(l)}(X_1, X_2)\ \big|\ \rho^{(l)}(X_1, X_2) > \rho_1^{(l)}\big\}$
2. Negative region: $NEG^{(l)} = \big\{\rho^{(l)}(X_1, X_2)\ \big|\ \rho^{(l)}(X_1, X_2) < \rho_0^{(l)}\big\}$
3. Boundary region: $BND^{(l)} = \big\{\rho^{(l)}(X_1, X_2)\ \big|\ \rho_0^{(l)} \le \rho^{(l)}(X_1, X_2) \le \rho_1^{(l)}\big\}$
The matching terminates as soon as $\rho^{(l)}(X, X_r) < \rho_0^{(l)}$.
This is a termination condition used in approximate nearest neighbor methods.
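A Python sketch of the region check and the resulting early termination during model enumeration; the thresholds `rho0`, `rho1` are hypothetical level-specific constants:

```python
def region(dist, rho0, rho1):
    """Three-way decision for a single distance at one granularity level."""
    if dist < rho0:
        return "negative"   # same-class distance: accept immediately
    if dist > rho1:
        return "positive"   # different-class distance: reject this model
    return "boundary"       # delay the decision

def nn_with_termination(dists, rho0, rho1):
    """Scan the models; stop as soon as a distance falls into the negative
    region, otherwise fall back to the exhaustive nearest neighbor."""
    best_r, best_d = None, float("inf")
    for r, d in enumerate(dists):
        if region(d, rho0, rho1) == "negative":
            return r                      # terminate: X_r is close enough
        if d < best_d:
            best_r, best_d = r, d
    return best_r
```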
Real-time recognition with a large database
The k-NN rule requires a brute-force search over the whole database.
Small training sample: C ≈ R.
- Small database (tens of classes).
- Medium-sized DB: problems with the accuracy and computational speed of automatic real-time recognition.
- Very large DB: automated content-based object retrieval (approximate k-NN).
Solutions:
• Modern hardware
• Parallel computing
• Simplification of the similarity measure or its parameters
• Approximate nearest neighbor methods:
  1. ANN library (Arya, Mount, et al. // Journal of the ACM, 1998): kd-trees; only Minkowski distances are supported.
  2. Hashing techniques, LSH (Locality-Sensitive Hashing) (Gionis, Indyk, Motwani // Proc. of VLDB, 1999); applications in Google Correlate (Vanderkam, Schonberger, Rowley, Kumar. Nearest Neighbor Search in Google Correlate // Tech. report, 2013).
  3. FLANN library (Muja, Lowe // Proc. of VISAPP, 2009).
  4. NonMetricSpaceLib (Boytsov, Bilegsaikhan // Proc. of SISAP, LNCS, 2013).
Medium-sized databases. Maximum-Likelihood Directed Enumeration Method (DEM)
1. Asymptotically (n, nr → ∞), the number of distance calculations in the DEM is constant (it does not depend on the DB size R).
2. The DEM is the optimal greedy algorithm for the HT-PNN.
Idea: at each step, the next model is selected so as to maximize the likelihood of the previously calculated distances.
Initialization: r1 is chosen to maximize the average probability of obtaining a correct decision at the step k = 2:
$$r_1 = \arg\max_{\mu \in \{1,\dots,R\}} \frac{1}{R} \sum_{\nu=1}^{R} \prod_{r=1}^{R} \Phi\Bigg(\sqrt{\frac{Kn}{2}} \cdot \frac{\rho_{r,\nu} - \rho_{\mu,\nu}}{2\,\rho_{\mu,\nu}}\Bigg)$$
At the (k+1)th step, the next model maximizes the likelihood of the previously calculated distances:
$$r_{k+1} = \arg\max_{\mu \in \{1,\dots,R\} \setminus \{r_1,\dots,r_k\}} \prod_{i=1}^{k} f\big(\rho_{\mathrm{HT\text{-}PNN}}(X, X_{r_i}) \mid W_\mu\big)$$
Based on the asymptotic properties of the HT-PNN, this rule is equivalent to
$$r_{k+1} = \arg\min_{\mu \in \{1,\dots,R\} \setminus \{r_1,\dots,r_k\}} \sum_{i=1}^{k} \varphi_{\mu, r_i}, \qquad \varphi_{\mu, r_i} \approx \frac{\big(\rho_{\mathrm{HT\text{-}PNN}}(X, X_{r_i}) - \rho_{\mu, r_i}\big)^2}{2\,\rho_{\mu, r_i}},$$
where $\rho_{\mu,\nu}$ is the precomputed distance between the μth and νth models.
Details in (Savchenko // Pattern Recognition, 2012), (Savchenko // Proc. of ICVS, 2013), and (Savchenko // Proc. of PReMI, LNCS, 2015, accepted).
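A greedy Python sketch of the directed enumeration under these formulas; the precomputed model-to-model distance matrix, the fixed starting model, and the reuse of the ρ0 termination threshold are assumptions:

```python
import numpy as np

def ml_dem(query_dist, rho_models, rho0, k_max=None):
    """query_dist(r) -> rho(X, X_r); rho_models[mu, r] -> rho(X_mu, X_r).
    At each step pick the model mu minimizing sum_i phi(mu, r_i), with
    phi ~ (rho(X, X_ri) - rho(mu, ri))^2 / (2 * rho(mu, ri))."""
    R = rho_models.shape[0]
    k_max = k_max or R
    checked, dists = [], []
    mu = 0  # fixed start; the slide derives a likelihood-based r_1 instead
    for _ in range(k_max):
        d = query_dist(mu)
        if d < rho0:                      # negative region: accept W_mu
            return mu
        checked.append(mu)
        dists.append(d)
        phi = np.zeros(R)
        for r_i, d_i in zip(checked, dists):
            m = rho_models[:, r_i]
            phi += (d_i - m) ** 2 / (2 * m + 1e-10)
        phi[checked] = np.inf             # never revisit a checked model
        mu = int(np.argmin(phi))
    return checked[int(np.argmin(dists))] # fallback: best model seen so far
```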
Experimental study. Face recognition
Parameters:
1. L = 2 granularity levels (10x10 and 20x20)
2. Alignment of HOGs: Δ = 1
3. Threshold p0 = 0.85
Testing: 20 repetitions of random subsampling cross-validation
Essex dataset (323 persons, training set: 5187 photos, test set: 1224 photos)
[Preprocessing pipeline: image → face detection (OpenCV LBP) → conversion to grayscale → gamma correction → 3x3 median filter → contrast equalization → segmentation (10x10 grid) with gradient magnitude/orientation estimation → PHOG descriptor.]
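A rough OpenCV (Python) sketch of this preprocessing chain; the cascade file name, the gamma value, and the single-face assumption are hypothetical:

```python
import cv2
import numpy as np

def preprocess(image_bgr, cascade_path="lbpcascade_frontalface.xml"):
    """Image -> detected face -> grayscale -> gamma correction ->
    3x3 median filter -> contrast equalization, as in the pipeline above."""
    detector = cv2.CascadeClassifier(cascade_path)
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    x, y, w, h = faces[0]                           # assume a single face
    face = gray[y:y + h, x:x + w]
    gamma = np.uint8(255.0 * (face / 255.0) ** 0.5)  # gamma correction
    smoothed = cv2.medianBlur(gamma, 3)              # 3x3 median filter
    return cv2.equalizeHist(smoothed)                # contrast equalization
```

The gradient estimation and PHOG extraction would then run on the returned face image.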
Experimental results (1a). Sequential three-way decisions. HT-PNN
Findings:
1. The error rate of the hierarchical approach is 0.6-1% lower than the error rate at the ground level (20x20 grid).
2. The average recognition time of the sequential TWD is 3.5-4.5 times lower than that of the PHOG.
3. The recognition time of the sequential TWD with database enumeration is 25-45% lower than that of the plain sequential TWD.
[Two charts: error rate (%) and average recognition time (ms) vs. the number of models R ∈ {323, 500, 700, 1000}, comparing L=1 (10x10 grid), L=1 (20x20 grid), Pyramid HOG (PHOG), sequential TWD with Chow's rule, and sequential TWD with database enumeration.]
Experimental results (1b). Sequential three-way decisions. Euclidean metric
Findings:
1. The error rate is 1-2.5% higher than for the HT-PNN.
2. The sequential TWD with database enumeration is 2-2.5 times faster than the PHOG and 20-30% faster than the original sequential TWD.
[Two charts: error rate (%) and average recognition time (ms) vs. the number of models R ∈ {323, 500, 700, 1000}, comparing L=1 (10x10 grid), L=1 (20x20 grid), Pyramid HOG (PHOG), sequential TWD with Chow's rule, and sequential TWD with database enumeration.]
Experimental results (2). OpenCV face recognition
Compared methods:
1. SVM classifier (libSVM) on HOGs (10x10 grid)
2. Eigenfaces (Turk, Pentland // CVPR, 1991)
3. Fisherfaces (Belhumeur et al. // IEEE Trans. on PAMI, 1997)
4. Histograms of Local Binary Patterns (LBP) (Ahonen et al. // ECCV, 2004)
[Two charts: error rate (%) and average recognition time (ms) vs. the number of models R ∈ {323, 500, 700, 1000} for SVM with HOG (10x10), Eigenfaces, Fisherfaces, and LBP.]
In the case of one image per person, the SVM's accuracy is 3% and 5.5% lower than the accuracy of the Euclidean distance and of the HT-PNN with the same features (10x10 grid), respectively. If the number of images per class is higher (3-4, R=1000), the SVM is expectedly better.
Experimental results (3). ML-DEM
Average recognition time (ms) for the original training set (R=5187, error rate 0.164%) and a reduced training set of 880 medoids (R=881, error rate 0.573%):
[Two charts: average recognition time (ms) for T=1 and T=8 threads, comparing brute force, randomized KD tree, ordering permutation, ML-DEM, and Pivot ML-DEM.]
The ML-DEM is compared with the following approximate NN methods:
1. Randomized kd-tree (Silpa-Anan C., Hartley R. Optimised KD-trees for fast image descriptor matching // CVPR, 2008)
2. Ordering permutation (Gonzalez E.C., Figueroa K., Navarro G. Effective Proximity Retrieval by Ordering Permutations // IEEE Trans. on PAMI, 2008)
Experimental results (4a). Sequential three-way decisions. HT-PNN
[Two charts: error rate (%) and average recognition time (ms) vs. the number of models R ∈ {65, 75, 150, 225}, comparing 10x10, 20x20, PHOG (10x10+20x20), sequential PHOG (10x10+20x20), PHOG (10x10+15x15+20x20), and sequential PHOG (10x10+15x15+20x20).]
Dataset: AT&T+Yale+JAFFE (C=65 classes, 778 images)
Experimental results (4b). Sequential three-way decisions. Euclidean metric
Findings:
1. L = 3 granularity levels increase the accuracy.
2. The sequential TWD speeds up the recognition procedure by a factor of 2-4 in comparison with the PHOG.
[Two charts: error rate (%) and average recognition time (ms) vs. the number of models R ∈ {65, 75, 150, 225}, comparing 10x10, 20x20, PHOG (10x10+20x20), sequential PHOG (10x10+20x20), PHOG (10x10+15x15+20x20), and sequential PHOG (10x10+15x15+20x20).]
Conclusion
1. The insufficient performance of hierarchical image recognition methods is highlighted.
2. The possibility of applying rough set theory, three-way decision theory, and granular computing to image recognition is explored.
3. A fast decision method of sequential analysis for Pyramid HOG features is proposed.
4. We experimentally demonstrated that the proposed approach can be efficiently applied even with the conventional Euclidean metric.
5. If the number of classes C is large, the brute-force solution is not computationally efficient; hence, approximate nearest neighbor algorithms can be applied.
Future work
1. Explore modern features extracted with deep neural networks for unconstrained face recognition.
2. Experimental study of more sophisticated segmentation methods.
3. Application of the proposed approach to other pattern recognition tasks, e.g., speech recognition.
4. Apply our approach with other classifiers for which a reject option is available, e.g., the one-against-all multi-class support vector machine.