Background & Motivation | Shape Context | Fast Matching
Shape Context Matching For Efficient OCR
Sudeep Pillai
May 14, 2012
Sudeep Pillai Shape Context Matching For Efficient OCR
Table of contents
1 Background & Motivation: Motivation; Background
2 Shape Context: What is a Shape Context?; Matching Shape Contexts; Similarity Measure
3 Fast Matching: Dimensionality Reduction; Matching Shape Contexts via Pyramid Matching; Efficient Matching
Motivation
Automatic translation/transcription of handwritten and printed text
Printed text has several geometric constraints that can be exploited to improve performance
Prior work has pushed hard on accuracy, with comparatively little emphasis on optimizing speed
Optical Character Recognition
MNIST database performance
Digits are size-normalized and centered in a fixed-size image
60,000 training examples, 10,000 test examples
Classifier | Preprocessing | Test error rate (%)
Linear classifiers
  Linear classifier (1-layer NN) | None | 12.0
  Pairwise linear classifier | Deskewing | 7.6
K-nearest neighbors
  K-NN, Euclidean (L2) | None | 3.09
  K-NN, L3 norm | Deskewing, noise removal | 1.22
  K-NN, shape context matching | Shape context extraction | 0.63
Optical Character Recognition
MNIST database performance (continued)
Classifier | Preprocessing | Test error rate (%)
SVMs
  SVM, Gaussian kernel | None | 1.4
  Virtual SVM, deg-9 poly, 2-pixel jittered | None | 0.56
Neural nets
  Deep convex net, unsupervised pre-training | None | 0.83
Convolutional nets
  Committee of 35 conv. nets | Normalization | 0.23
Optical Character Recognition
Figure: A few digits from the MNIST database
What is a Shape Context?
Definition (Shape)
A shape is represented as a sequence of boundary points:
$P = \{p_1, \dots, p_n\}, \quad p_i \in \mathbb{R}^2$
Definition (Shape Context)
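The shape context of a point $p_i$ is a descriptor of that point: a histogram of the positions of the other $n - 1$ points relative to $p_i$,
$h_i(k) = \#\{\, p_j : j \neq i,\ p_j - p_i \in \mathrm{bin}(k) \,\}$,
in which the bins are uniformly divided in log-polar space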
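A minimal NumPy sketch of $h_i(k)$ for every point of a shape follows. It assumes 5 radial by 12 angular bins (60 bins, consistent with the $\mathbb{R}^{60}$ histograms mentioned later), radial edges spaced logarithmically between 1/8 and 2 times the mean pairwise distance, and angles measured from the absolute x-axis. The function name `shape_contexts` and these defaults are illustrative assumptions, not necessarily the settings used in this work; points beyond the outermost radius are simply not counted.

```python
import numpy as np


def shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar shape context histogram h_i(k) for every point p_i.

    Hypothetical helper: bin counts and radial limits are assumptions.
    Returns an (n, n_r * n_theta) integer array.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]      # diff[i, j] = p_j - p_i
    dist = np.hypot(diff[..., 0], diff[..., 1])
    off_diag = ~np.eye(n, dtype=bool)             # exclude j == i
    dist = dist / dist[off_diag].mean()           # normalize by mean distance
    theta = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    # Radial edges uniform in log(r); angular bins uniform in theta.
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    r_bin = np.digitize(dist, r_edges) - 1        # -1 or n_r means out of range
    t_bin = np.minimum((theta * n_theta / (2 * np.pi)).astype(int), n_theta - 1)

    valid = off_diag & (r_bin >= 0) & (r_bin < n_r)
    rows, _ = np.nonzero(valid)                   # row-major, same order as r_bin[valid]
    bins = r_bin[valid] * n_theta + t_bin[valid]
    hist = np.zeros((n, n_r * n_theta), dtype=int)
    np.add.at(hist, (rows, bins), 1)              # count points per (i, bin)
    return hist


# Usage: 20 points sampled on a circle.
angles = np.linspace(0, 2 * np.pi, 20, endpoint=False)
circle = np.c_[np.cos(angles), np.sin(angles)]
h = shape_contexts(circle)
print(h.shape)  # (20, 60)
```

Because only differences $p_j - p_i$ enter the histogram, translating the whole shape leaves every $h_i$ unchanged.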
Shape Context Representation
Figure: Graphical representation of shape context bins
Shape Context Histogram
Figure: Graphical representation of shape context histograms in $\mathbb{R}^{60}$
Matching Shape Contexts
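The cost of matching point $p_i$ on the first shape to point $q_j$ on the second shape is the chi-square ($\chi^2$) distance between their $K$-bin histograms:
$C_{ij} = C(p_i, q_j) = \frac{1}{2} \sum_{k=1}^{K} \frac{[h_i(k) - h_j(k)]^2}{h_i(k) + h_j(k)}$
Minimize the total matching cost over all one-to-one assignments $\pi$:
$\sum_i C(p_i, q_{\pi(i)})$
Optimal matching
This is a weighted bipartite matching (assignment) problem; one way to solve it exactly is the Hungarian method, which runs in $O(n^3)$ time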
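As a sketch under stated assumptions, the $\chi^2$ cost matrix and the optimal one-to-one matching can be computed with NumPy and SciPy. `chi2_cost` and `optimal_matching` are hypothetical helper names, and bins that are empty in both histograms are assumed to contribute zero (the formula is 0/0 there). SciPy's `linear_sum_assignment` solves the same assignment problem exactly, although its documentation describes a modified Jonker-Volgenant algorithm rather than the Hungarian method itself.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def chi2_cost(h1, h2):
    """C[i, j] = 1/2 * sum_k (h1[i,k] - h2[j,k])**2 / (h1[i,k] + h2[j,k])."""
    a = np.asarray(h1, dtype=float)[:, None, :]   # (n, 1, K)
    b = np.asarray(h2, dtype=float)[None, :, :]   # (1, m, K)
    num = (a - b) ** 2
    den = a + b
    # A bin that is empty in both histograms contributes 0 instead of 0/0.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return 0.5 * terms.sum(axis=2)


def optimal_matching(h1, h2):
    """Assignment pi minimizing sum_i C(p_i, q_pi(i)), plus that total cost."""
    cost = chi2_cost(h1, h2)
    rows, cols = linear_sum_assignment(cost)      # exact assignment solver
    return cols, cost[rows, cols].sum()


# Usage: the second shape has the same histograms in shuffled order.
rng = np.random.default_rng(1)
h_p = rng.integers(0, 5, size=(6, 60))
h_q = h_p[rng.permutation(6)]
pi, total = optimal_matching(h_p, h_q)
print(total)  # 0.0
```

Since identical histograms have zero cost, the optimal assignment pairs each $p_i$ with the shuffled copy of its own histogram.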
Properties of shape contexts
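Invariant to translation (only relative positions $p_j - p_i$ are used) and to scale (radial distances are normalized by the mean distance of the $n^2$ point pairs)
Can be made invariant to rotation by measuring angles relative to the local tangent orientation at each point
Tolerant to small affine distortions: log-polar bins grow with $r$, so the spatial blur is proportional to $r$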
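One way to realize the rotation-invariant variant is to measure each angle from the local tangent at $p_i$ instead of from the x-axis. The sketch below assumes an ordered, closed contour and estimates the tangent from the two neighbours $p_{i+1} - p_{i-1}$; both the estimator and the name `tangent_relative_angles` are hypothetical choices for illustration.

```python
import numpy as np


def tangent_relative_angles(contour):
    """Angle of p_j - p_i measured from the local tangent at p_i (radians).

    Assumes `contour` is an ordered, closed sequence of 2-D points; the
    tangent at p_i is estimated as p_{i+1} - p_{i-1} (hypothetical choice).
    """
    pts = np.asarray(contour, dtype=float)
    tangent = np.roll(pts, -1, axis=0) - np.roll(pts, 1, axis=0)
    tangent_angle = np.arctan2(tangent[:, 1], tangent[:, 0])      # (n,)
    diff = pts[None, :, :] - pts[:, None, :]                      # p_j - p_i
    absolute = np.arctan2(diff[..., 1], diff[..., 0])             # (n, n)
    # Subtract each source point's tangent angle and wrap to [0, 2*pi).
    return (absolute - tangent_angle[:, None]) % (2 * np.pi)


# Usage: the relative angles of an ellipse do not change when it is rotated.
t = np.linspace(0, 2 * np.pi, 25, endpoint=False)
ellipse = np.c_[2 * np.cos(t), np.sin(t)]
c, s = np.cos(0.7), np.sin(0.7)
rotated = ellipse @ np.array([[c, -s], [s, c]]).T
delta = tangent_relative_angles(ellipse) - tangent_relative_angles(rotated)
delta = (delta + np.pi) % (2 * np.pi) - np.pi      # wrap differences to [-pi, pi)
off_diagonal = ~np.eye(len(t), dtype=bool)
print(np.abs(delta[off_diagonal]).max() < 1e-9)    # True
```

These relative angles would replace the absolute `theta` when binning, so the whole histogram rotates with the shape.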
Similarity Measure
Definition
After aligning the two shapes with a thin-plate spline transformation $T$, their dissimilarity can be measured via a weighted sum
$D = a\,D_{ac} + D_{sc} + b\,D_{be}$
$D_{sc}$: shape context distance
$D_{ac}$: appearance cost
$D_{be}$: bending energy, or transformation cost
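The shape context term can be sketched from a cost matrix between the points of $P$ and the transformed points $T(q)$. This assumes the symmetric form commonly used for $D_{sc}$: each point's cheapest match, averaged over $P$ and then over $Q$. $D_{ac}$ and $D_{be}$ are taken as precomputed inputs, the weights $a$ and $b$ are free parameters to be tuned, and the function names are hypothetical.

```python
import numpy as np


def shape_context_distance(cost):
    """D_sc from an (n, m) matrix cost[i, j] = C(p_i, T(q_j)).

    Assumed form: mean over P of the cheapest match in Q, plus
    mean over Q of the cheapest match in P.
    """
    cost = np.asarray(cost, dtype=float)
    return cost.min(axis=1).mean() + cost.min(axis=0).mean()


def total_distance(d_ac, d_sc, d_be, a, b):
    """D = a * D_ac + D_sc + b * D_be (a and b are tuning weights)."""
    return a * d_ac + d_sc + b * d_be


# Usage with a toy 2 x 3 cost matrix (made-up numbers).
cost = np.array([[0.0, 2.0, 4.0],
                 [1.0, 0.5, 3.0]])
d_sc = shape_context_distance(cost)   # (0 + 0.5) / 2 + (0 + 0.5 + 3) / 3
print(total_distance(d_ac=0.2, d_sc=d_sc, d_be=0.1, a=1.0, b=1.0))
```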
Dimensionality Reduction
Approximate matching is possible with the full shape context feature
A low-dimensional feature descriptor is desirable for performance
With uniform bins, the approximation makes matching accuracy decline as the feature dimension $d$ grows
Multiple modalities remain representable even in a reduced subspace
Use Principal Components Analysis (PCA) to determine the bases that define this shape context subspace
Approximate matching can be performed faster once all $\mathbb{R}^{60}$ vectors are projected onto $\mathbb{R}^3$
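A minimal NumPy sketch of the PCA step, assuming the shape context histograms of a training set are stacked as the rows of a matrix. `fit_pca` and `project` are hypothetical names, and the Poisson-distributed matrix below is random stand-in data, not real shape contexts.

```python
import numpy as np


def fit_pca(features, n_components=3):
    """Return (mean, basis) spanning the top principal subspace of `features`."""
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(features, mean, basis):
    """Coordinates of each feature in the low-dimensional subspace."""
    return (np.asarray(features, dtype=float) - mean) @ basis.T


# Usage: project 60-bin histograms onto a 3-D subspace.
rng = np.random.default_rng(2)
hists = rng.poisson(2.0, size=(500, 60)).astype(float)  # stand-in data
mean, basis = fit_pca(hists, n_components=3)
coords = project(hists, mean, basis)
print(coords.shape)  # (500, 3)
```

The basis is learned once offline; at query time each 60-bin histogram costs only a subtraction and a 60 x 3 matrix product.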
Dimensionality Reduction
Figure: Projecting histograms of contour points onto the shape context subspace. The points on the human figure on the right are colored according to their 3-D shape context subspace feature values
Dimensionality Reduction
Figure: Visualization of the feature subspace constructed from shape context histograms for two different data sets. Each contour point is colored by mapping its histogram's 3-D PCA coefficients to the RGB channels. Set matching in this feature space means that contour points of similar color have a low matching cost, while highly contrasting colors incur a high matching cost
Dimensionality Reduction Tradeoffs
The larger $d$ is:
the smaller the PCA reconstruction error
the larger the distortion induced by the L1 embedding
the higher the complexity of computing the embedding
Do we really need an $\mathbb{R}^{60}$ feature vector to represent a shape?
Two shapes are almost never exactly alike, so approximate measures make more sense
Extract only the most discriminating dimensions as the descriptor
Pyramid Matching
X and Y are two sets of vectors in a feature space $\mathbb{R}^d$
Find an approximate correspondence between X and Y
Pyramid Matching Overview
Pyramid Matching Kernels
Construct a sequence of grids at resolutions $0, \dots, L$, where the grid at resolution $l$ has $D = 2^{dl}$ cells
Compute the histograms $H_X^l$ and $H_Y^l$, where
$H_X^l$ and $H_Y^l$ are the histograms of $X$ and $Y$ at resolution $l$
$H_X^l(i)$ and $H_Y^l(i)$ are the numbers of points of $X$ and $Y$ in the $i$-th cell
Compute the number of matches at each resolution using:
$I(H_X^l, H_Y^l) = \sum_{i=1}^{D} \min\big(H_X^l(i), H_Y^l(i)\big)$
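A small sketch of the per-level histograms and their intersection. It stores only non-empty cells in a `Counter` (the full grid has $2^{dl}$ cells) and assumes every feature coordinate has been rescaled into [0, 1), for example after the PCA projection; `level_histogram` and `intersection` are hypothetical names, and the points are toy data.

```python
from collections import Counter

import numpy as np


def level_histogram(points, level):
    """Sparse histogram H^l: number of points per grid cell at resolution l.

    Assumes every coordinate lies in [0, 1); at resolution l each dimension
    is split into 2**l equal intervals, giving D = 2**(d*l) cells in total.
    """
    cells = np.floor(np.asarray(points, dtype=float) * 2 ** level).astype(int)
    return Counter(map(tuple, cells))


def intersection(hx, hy):
    """I(H_X^l, H_Y^l) = sum_i min(H_X^l(i), H_Y^l(i)); empty cells add 0."""
    return sum(min(count, hy[cell]) for cell, count in hx.items())


# Usage: three toy points per set in [0, 1)^2.
X = np.array([[0.10, 0.10], [0.40, 0.80], [0.90, 0.20]])
Y = np.array([[0.20, 0.15], [0.60, 0.70], [0.85, 0.30]])
for l in range(3):
    print(l, intersection(level_histogram(X, l), level_histogram(Y, l)))
# prints: 0 3 / 1 2 / 2 1 (one pair per line)
```

The match count can only shrink as the grid gets finer, which is why the differences between consecutive levels count new matches.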
Pyramid Matching Kernels
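Sum the matches $I^l = I(H_X^l, H_Y^l)$ over all levels, giving more importance to the higher (finer) resolutions:
$K(X, Y) = I^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}} \left(I^l - I^{l+1}\right) = \frac{1}{2^L}\, I^0 + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}}\, I^l$
where $I^l - I^{l+1}$ is the number of new matches found at level $l$ that were not already found at the finer level $l+1$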
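Both forms of the kernel can be checked numerically from a list of per-level match counts $[I^0, \dots, I^L]$. The sketch below is a plain-Python illustration with hypothetical function names; the counts [3, 2, 1] are a made-up three-level example.

```python
def pyramid_match_kernel(intersections):
    """K(X, Y) from per-level match counts [I^0, ..., I^L] (first form).

    Finest-level matches count fully; the new matches I^l - I^{l+1}
    found at a coarser level l are weighted by 1 / 2**(L - l).
    """
    L = len(intersections) - 1
    k = float(intersections[L])
    for l in range(L):
        k += (intersections[l] - intersections[l + 1]) / 2 ** (L - l)
    return k


def pyramid_match_kernel_closed_form(intersections):
    """Equivalent form: I^0 / 2**L + sum_{l=1..L} I^l / 2**(L - l + 1)."""
    L = len(intersections) - 1
    return intersections[0] / 2 ** L + sum(
        intersections[l] / 2 ** (L - l + 1) for l in range(1, L + 1))


counts = [3, 2, 1]  # I^0, I^1, I^2 for a hypothetical 3-level pyramid
print(pyramid_match_kernel(counts), pyramid_match_kernel_closed_form(counts))
# 1.75 1.75
```

Expanding the first sum shows each $I^l$ with $1 \le l \le L$ receives weight $2^{-(L-l)} - 2^{-(L-l+1)} = 2^{-(L-l+1)}$, which is the closed form.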
Pyramid Matching (l = 0)
Pyramid Matching (l = 1)
Pyramid Matching (l = 2)
Pyramid Matching
Comparison with Optimal Matching
Vocabulary-guided Matching
Figure: The bins concentrate on partitioning the regions of feature space where features cluster, which matters most for high-dimensional features (shown here in $\mathbb{R}^2$). Features are small red points, bin centers are larger black points, and blue lines denote bin boundaries. The vocabulary-guided bins are irregularly shaped Voronoi cells.
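A single level of vocabulary-guided binning can be sketched by assigning each feature to its nearest bin center (its Voronoi cell) and intersecting the resulting histograms. In practice the centers would come from clustering training features and a full vocabulary-guided pyramid repeats this over a hierarchy of levels; here the centers are fixed by hand, and `voronoi_histogram` and `vocabulary_matches` are hypothetical names.

```python
import numpy as np


def voronoi_histogram(features, centers):
    """Count features per Voronoi cell (nearest bin center, Euclidean)."""
    f = np.asarray(features, dtype=float)
    c = np.asarray(centers, dtype=float)
    d2 = ((f[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)   # (n, k)
    return np.bincount(d2.argmin(axis=1), minlength=len(c))


def vocabulary_matches(x, y, centers):
    """Matches at one level: sum over cells of the smaller count."""
    return int(np.minimum(voronoi_histogram(x, centers),
                          voronoi_histogram(y, centers)).sum())


# Usage with three hand-picked centers in R^2.
centers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
x = [[0.1, 0.0], [0.9, 0.1], [0.8, -0.1]]
y = [[0.0, 0.2], [1.1, 0.0], [0.1, 0.9]]
print(voronoi_histogram(x, centers), vocabulary_matches(x, y, centers))
```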
Performance
Complexity of computing a partial matching:
Method | Complexity
Earth Mover's Distance | $O(d m^3 \log m)$
Hungarian method | $O(d m^3)$
Greedy matching | $O(d m^2 \log m)$
Pyramid match | $O(d m L)$
for sets of $O(m)$ features in $\mathbb{R}^d$ and pyramids with $L$ levels
Affine Constraints - RANSAC
Figure: Interest points computed on image 1
Figure: Interest points computed on image 2
Affine Constraints - RANSAC
Figure: Find correspondences between interest points
Affine Constraints - RANSAC
Figure: Outlier removal via RANSAC (RANdom SAmple Consensus)
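A minimal RANSAC sketch for estimating a 2-D affine map from putative correspondences `src[i] <-> dst[i]` (for example, pairs produced by shape context matching). The 3-point minimal sample, the inlier threshold `tol`, the iteration count, and the helper names are illustrative assumptions; the final estimate is a least-squares refit on the largest consensus set.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares affine map with dst ~ [src, 1] @ A, where A is (3, 2)."""
    X = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A


def apply_affine(A, pts):
    return np.hstack([pts, np.ones((len(pts), 1))]) @ A


def ransac_affine(src, dst, n_iters=500, tol=0.05, seed=0):
    """Fit to random 3-point samples and keep the largest consensus set."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(len(src), size=3, replace=False)  # minimal sample
        A = fit_affine(src[sample], dst[sample])
        residual = np.linalg.norm(apply_affine(A, src) - dst, axis=1)
        inliers = residual < tol
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on the whole consensus set for the final estimate.
    return fit_affine(src[best], dst[best]), best


# Usage: 40 synthetic correspondences, the first 10 corrupted by large shifts.
rng = np.random.default_rng(3)
src = rng.random((40, 2))
A_true = np.array([[1.2, 0.1], [-0.2, 0.9], [0.5, -0.3]])
dst = apply_affine(A_true, src)
dst[:10] += 1.0 + rng.random((10, 2))
A_est, inliers = ransac_affine(src, dst)
print(inliers.sum())
```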
Additional improvements
RANSAC gives an initial estimate of the affine transformation between the canonical set of points and the query points
Use this affine estimate to guide vocabulary-based or geometric search and matching
MLESAC or PROSAC could be used to perform probabilistic search
Constraints could be added to the pyramid matching scheme to reduce query time and improve robustness to partial matches
Conclusions
Investigated and implemented a shape descriptor invariant to rotation and scale
Integrated an approximate matching scheme with linear time complexity in the number of features
The scheme scales well as the database of descriptors grows
Significant improvement in speed with little loss in accuracy
Source code available soon
Conclusions
Thanks!