Shape Context Matching For Efficient OCR

Sudeep Pillai

May 14, 2012


Table of contents

1 Background & Motivation
    Motivation
    Background

2 Shape Context
    What is a Shape Context?
    Matching Shape Contexts
    Similarity Measure

3 Fast Matching
    Dimensionality Reduction
    Matching Shape Contexts via Pyramid Matching
    Efficient Matching



Motivation

Automatic translation/transcription of handwritten/printed text

Printed text has several geometric constraints that can be utilized for improved performance

Significant push for accuracy so far, with much less attention to optimization




Optical Character Recognition

MNIST database performance

Digits size-normalized and centered in a fixed-size image

60,000 training examples, 10,000 test examples

Classifier                                     Preprocessing              Test Error Rate (%)

Linear Classifiers
  Linear classifier (1-layer NN)               None                       12.0
  Pairwise linear classifier                   Deskewing                  7.6

K-Nearest Neighbors
  K-NN, Euclidean (L2)                         None                       3.09
  K-NN, L3                                     Deskewing, noise removal   1.22
  K-NN, Shape context matching                 Shape context extraction   0.63

SVMs
  SVM, Gaussian kernel                         None                       1.4
  Virtual SVM, deg-9 poly, 2-pixel jittered    None                       0.56

Neural Nets
  Deep convex net, unsup. pre-training         None                       0.83

Convolutional Nets
  Committee of 35 conv. nets                   Normalization              0.23





Figure: A few digits from the MNIST database








What is a Shape Context?

Definition (Shape)

A shape is represented as a sequence of boundary points:

$P = \{p_1, \ldots, p_n\}, \quad p_i \in \mathbb{R}^2$

Definition (Shape Context)

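The shape context of an interest point $p_i$ is a descriptor of that point: a histogram of the relative positions of the remaining points,

$$h_i(k) = \#\{\, p_j : j \neq i,\ p_j - p_i \in \mathrm{bin}(k) \,\}$$

in which the bins are uniformly divided in log-polar space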
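A minimal sketch of the histogram above, using only the Python standard library. The function name `shape_context`, the 5 radial by 12 angular layout (60 bins in total) and the radius limits `r_inner` and `r_outer` are illustrative assumptions, not values taken from these slides.

```python
import math

def shape_context(points, i, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of the points around points[i] (assumed parameters).

    Returns a flat list of n_r * n_theta counts. Distances are divided by the
    mean distance over all point pairs, so the counts do not change when the
    shape is translated or uniformly scaled.
    """
    n = len(points)
    # Mean distance over all n^2 ordered pairs, used as the scale normaliser.
    mean_dist = sum(math.dist(p, q) for p in points for q in points) / (n * n)

    # Radial bin edges, uniformly spaced in log(r).
    log_lo, log_hi = math.log(r_inner), math.log(r_outer)
    edges = [math.exp(log_lo + (log_hi - log_lo) * t / n_r) for t in range(n_r + 1)]

    hist = [0] * (n_r * n_theta)
    xi, yi = points[i]
    for j, (xj, yj) in enumerate(points):
        if j == i:
            continue
        r = math.hypot(xj - xi, yj - yi) / mean_dist
        if r >= edges[-1]:
            continue  # beyond the outermost ring: not counted
        # Points closer than r_inner are counted in the innermost ring.
        r_bin = 0
        while r >= edges[r_bin + 1]:
            r_bin += 1
        theta = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
        t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        hist[r_bin * n_theta + t_bin] += 1
    return hist
```

Because the radial edges are spaced in log(r), nearby points are binned finely and distant points coarsely, which is what makes the descriptor more sensitive to local structure.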




Shape Context Representation

Figure: Graphical representation of shape context bins




Shape Context Histogram

Figure: Graphical representation of shape context histograms in $\mathbb{R}^{60}$




Matching Shape Contexts

The cost of matching point $p_i$ on the first shape to point $q_j$ on the second shape (chi-square distance):

$$C_{ij} = \frac{1}{2} \sum_{k=1}^{K} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}$$

Minimize the total matching cost:

$$\sum_i C(p_i, q_{\pi(i)})$$

Optimal matching

One possible technique to solve this problem is the Hungarian method, which runs in $O(n^3)$ time
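The cost and the assignment problem can be sketched as follows. The helper names are hypothetical, and the matcher shown is an exhaustive search over permutations, not the Hungarian method: it returns the same optimal assignment but takes factorial time, so it is only usable for a handful of points. In practice an $O(n^3)$ solver such as SciPy's `scipy.optimize.linear_sum_assignment` would take its place.

```python
from itertools import permutations

def chi2_cost(h_i, h_j):
    """Chi-square distance between two histograms; empty bin pairs add 0."""
    total = 0.0
    for a, b in zip(h_i, h_j):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

def cost_matrix(hists_p, hists_q):
    """C[i][j] = cost of matching point i of shape P to point j of shape Q."""
    return [[chi2_cost(hp, hq) for hq in hists_q] for hp in hists_p]

def optimal_matching_bruteforce(C):
    """Try every permutation pi and keep the one with the lowest total cost."""
    n = len(C)

    def total(pi):
        return sum(C[i][pi[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)
```

For example, with three toy histograms per shape where the second shape lists the same histograms in a different order, the matcher recovers that reordering at zero total cost.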




Properties of shape contexts

Invariant to translation and scale (distances are normalized by the mean distance over the $n^2$ point pairs)

Can be made invariant to rotation (local tangent orientation)

Tolerant to small affine distortion (log-polar bins, spatial blur proportional to $r$)




Similarity Measure

Definition

After aligning the two shapes with a cubic spline transformation $T$, their similarity can be measured via a weighted sum

$$D = a D_{ac} + D_{sc} + b D_{be}$$

$D_{sc}$: shape context distance

$D_{ac}$: appearance cost

$D_{be}$: bending energy, or transformation cost



Dimensionality Reduction

Approximate matching is possible with the full shape context feature

A low-dimensional feature descriptor is desirable for performance purposes

With a uniform bin approximation, matching accuracy declines as the feature dimension $d$ grows

Multiple modalities are representable even with a reduced subspace

Use Principal Components Analysis to determine the bases that define this shape context subspace

Approximate matching can be performed faster once all $\mathbb{R}^{60}$ vectors are projected onto $\mathbb{R}^3$
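The projection step can be sketched with the standard library alone. This version finds the leading principal directions by power iteration with deflation, which is a stand-in chosen to keep the example dependency-free; a numerical library's SVD or eigendecomposition would be the usual choice. The names `pca_basis` and `project` and the iteration count are assumptions.

```python
import random

def pca_basis(vectors, n_components=3, n_iter=200, seed=0):
    """Return (mean, basis) where basis holds the top principal directions."""
    rng = random.Random(seed)
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    centered = [[v[k] - mean[k] for k in range(d)] for v in vectors]
    basis = []
    for _ in range(n_components):
        w = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(n_iter):
            # Deflation: remove the parts of w along directions already found.
            for b in basis:
                dot = sum(x * y for x, y in zip(w, b))
                w = [x - dot * y for x, y in zip(w, b)]
            # Apply the (unnormalised) covariance as X^T (X w).
            proj = [sum(x * y for x, y in zip(row, w)) for row in centered]
            new_w = [sum(p * row[k] for p, row in zip(proj, centered)) for k in range(d)]
            norm = sum(x * x for x in new_w) ** 0.5
            if norm == 0.0:
                break  # no variance left; keep the current w
            w = [x / norm for x in new_w]
        basis.append(w)
    return mean, basis

def project(v, mean, basis):
    """Coordinates of v in the subspace spanned by basis, after centering."""
    return [sum((x - m) * b for x, m, b in zip(v, mean, bk)) for bk in basis]
```

With 60-bin shape context histograms as input and `n_components=3`, `project` yields the 3-D coordinates on which the approximate matching operates.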





Figure: Projecting histograms of contour points onto the shape context subspace. The points on the human figure on the right are colored according to their 3-D shape context subspace feature values





Figure: Visualization of the feature subspace constructed from shape context histograms for two different data sets. The RGB channels of each point on the contours are colored according to its histogram's 3-D PCA coefficient values. Set matching in this feature space means that contour points of similar color have a low matching cost, while highly contrasting colors incur a high matching cost




Dimensionality Reduction Tradeoffs

The larger $d$ is:

    the smaller the PCA reconstruction error
    the larger the distortion induced by the L1 embedding
    the larger the complexity of computing the embedding

Do we really need an $\mathbb{R}^{60}$ feature vector to represent a shape?

    Shapes are almost never similar
    Approximate measures make more sense
    Extract only the most discriminating dimensions as the descriptor




Pyramid Matching

$X$ and $Y$ are two sets of vectors in an $\mathbb{R}^d$ feature space

Find an approximate correspondence between X and Y




Pyramid Matching Overview




Pyramid Matching Kernels

Construct a sequence of grids at resolutions $0, \ldots, L$, where the grid at resolution $l$ has $D = 2^{dl}$ cells.

Compute the histograms $H^l_X$ and $H^l_Y$, where

    $H^l_X$ and $H^l_Y$ are the histograms of $X$ and $Y$ at resolution $l$
    $H^l_X(i)$ and $H^l_Y(i)$ are the numbers of points of $X$ and $Y$ in the $i$th cell

Compute the number of matches at each resolution using:

$$I(H^l_X, H^l_Y) = \sum_{i=1}^{D} \min\left(H^l_X(i),\ H^l_Y(i)\right)$$
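The grid histograms and their intersection can be sketched as below. The function names are hypothetical, and the points are assumed to be pre-scaled into the unit cube $[0, 1)^d$; the histograms are stored sparsely, so only occupied cells take memory.

```python
from collections import Counter

def grid_histogram(points, level):
    """Count points per cell of a grid with 2**level cells along each axis.

    With d-dimensional points this grid has 2**(d * level) cells in total.
    Points are assumed to lie in the unit cube [0, 1)^d.
    """
    side = 2 ** level
    return Counter(tuple(min(int(x * side), side - 1) for x in p) for p in points)

def intersection(hx, hy):
    """Histogram intersection: sum over cells of min(count_x, count_y)."""
    return sum(min(count, hy[cell]) for cell, count in hx.items())

# Usage: match counts for two small 2-D point sets at resolutions 0, 1 and 2.
X = [(0.1, 0.1), (0.6, 0.6), (0.9, 0.2)]
Y = [(0.2, 0.2), (0.7, 0.9), (0.4, 0.8)]
matches = [intersection(grid_histogram(X, l), grid_histogram(Y, l)) for l in range(3)]
```

In this toy case every point is matched in the single cell of the coarsest grid, and fewer pairs still share a cell as the grid is refined.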




Pyramid Matching Kernels

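Sum the $I^l$, giving more weight to matches found at high resolution:

$$K(X, Y) = I^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}} \left(I^l - I^{l+1}\right) = \frac{1}{2^L} I^0 + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} I^l$$

where $I^l = I(H^l_X, H^l_Y)$ and $I^l - I^{l+1}$ is the number of new matches found at resolution $l$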
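As a quick check of the kernel, the two forms of the sum can be evaluated side by side. This sketch takes the per-resolution match counts as a plain list `I`, with `I[l]` standing for $I^l$; the function names and the sample counts are illustrative assumptions.

```python
def pyramid_match_kernel(I):
    """First form: finest-level matches plus down-weighted new matches."""
    L = len(I) - 1
    return I[L] + sum((I[l] - I[l + 1]) / 2 ** (L - l) for l in range(L))

def pyramid_match_kernel_expanded(I):
    """Second form: a weighted sum of the per-level match counts."""
    L = len(I) - 1
    return I[0] / 2 ** L + sum(I[l] / 2 ** (L - l + 1) for l in range(1, L + 1))

# Usage: L = 2 with 3 matches at the coarsest level and 1 at the finest.
counts = [3, 2, 1]
k_first = pyramid_match_kernel(counts)
k_second = pyramid_match_kernel_expanded(counts)
```

Both forms give 1.75 here: the match that survives at the finest level counts fully, the one first found at level 1 counts one half, and the one first found at level 0 counts one quarter.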




Pyramid Matching (l = 0)




Pyramid Matching




Comparison with Optimal Matching




Vocabulary-guided Matching

Figure: The bins are concentrated on decomposing the space where features cluster, particularly for high-dimensional features (in this figure, $\mathbb{R}^2$). Features are small points in red, bin centers are larger black points, and blue lines denote bin boundaries. The vocabulary-guided bins are irregularly shaped Voronoi cells.




Performance

Computing partial matching

    Earth Mover's Distance    $O(dm^3 \log m)$
    Hungarian method          $O(dm^3)$
    Greedy matching           $O(dm^2 \log m)$
    Pyramid match             $O(dmL)$

for sets with $O(m)$ features in $\mathbb{R}^d$ and pyramids with $L$ levels




Affine Constraints - RANSAC

Figure: Interest points computed on image 1

Figure: Interest points computed on image 2





Figure: Find correspondences between interest points





Figure: Outlier removal via RANSAC (Random Sample Consensus)




Additional improvements

RANSAC gives an initial estimate of the affine transformation between the canonical set of points and the query points

Utilize the affine transformation estimate to perform vocabulary-guided or geometrically guided searching and matching

Could use MLESAC/PROSAC to perform probabilistic searching

Ability to add constraints to the pyramid matching scheme to reduce query time and improve robustness to partial matching
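The initial affine estimate from RANSAC can be sketched as follows. This is a bare-bones version under stated assumptions: correspondences are given as index-aligned lists `src` and `dst`, each hypothesis is fitted exactly to three sampled pairs with Cramer's rule, and the iteration count and inlier tolerance `tol` are arbitrary illustrative values. All function names are hypothetical.

```python
import math
import random

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    """Solve m x = rhs by Cramer's rule; None if m is (near) singular."""
    d = det3(m)
    if abs(d) < 1e-12:
        return None
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        out.append(det3(mc) / d)
    return out

def fit_affine(src, dst):
    """Exact affine map through 3 correspondences; None if src is collinear."""
    m = [[x, y, 1.0] for x, y in src]
    row_x = solve3(m, [u for u, _ in dst])
    row_y = solve3(m, [v for _, v in dst])
    if row_x is None or row_y is None:
        return None
    return row_x, row_y

def apply_affine(A, p):
    (a, b, c), (d, e, f) = A
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)

def ransac_affine(src, dst, n_iter=200, tol=0.1, seed=0):
    """Keep the 3-point hypothesis that explains the most correspondences."""
    rng = random.Random(seed)
    idx = range(len(src))
    best_A, best_inliers = None, []
    for _ in range(n_iter):
        sample = rng.sample(idx, 3)
        A = fit_affine([src[i] for i in sample], [dst[i] for i in sample])
        if A is None:
            continue  # degenerate sample, draw again
        inliers = [i for i in idx if math.dist(apply_affine(A, src[i]), dst[i]) < tol]
        if len(inliers) > len(best_inliers):
            best_A, best_inliers = A, inliers
    return best_A, best_inliers
```

The returned inlier set is what removes the outlying correspondences; a common follow-up, not shown here, is to refit the transformation by least squares on all inliers.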




Conclusions

Investigated and implemented a shape descriptor invariant to rotation and scale

Integrated an approximate matching scheme that has linear time complexity

The scheme scales well as the database of descriptors grows

Significant improvement in speed with little tradeoff in accuracy

Source code available soon




Conclusions

Thanks!

