DESCRIPTION

Presentation based on IEEE paper titled "Combining Structure and Parameter Adaptation of HMM for Printed Text Recognition".

TRANSCRIPT

Page 1: HMM for Printed Text Recognition Slides

INTRODUCTION RELATED WORK MAIN TOPIC RESULTS CONCLUSION REFERENCES

Combining Structure and Parameter Adaptation of HMM for Printed Text Recognition

ANOOP R, MTech Computational Linguistics

Government Engineering College, Sreekrishnapuram

December 3, 2014

ANOOP R, MTech Computational Linguistics · Combining Structure and Parameter Adaptation of HMM for Printed Text Recognition

Page 2: HMM for Printed Text Recognition Slides


CONTENTS

1 INTRODUCTION

2 RELATED WORK

3 MAIN TOPIC

4 RESULTS

5 CONCLUSION

6 REFERENCES


Page 3: HMM for Printed Text Recognition Slides


OCR

Uses?

- Indexing multimedia
- Reduced storage size
- Google Books, Google Goggles
- Reading aids for blind people ...

Challenges?

- Different fonts
- Similar characters
- Noise in the documents
- Text orientation

How it works?

- Segmentation
- Generic vs. specialized fonts
- Training vs. adaptation


Page 4: HMM for Printed Text Recognition Slides


HMM Introduction

What are models?

- Boolean, vector, probabilistic ...
- Dynamic Bayesian networks, finite stochastic automata

Who is Markov?

- Russian mathematician Andrey Markov
- Markov property: one can make predictions for the future of the process based solely on its present state.

What is hidden?

- Two stochastic processes: an unobservable process linked to an observable one.

Why should we know?

- Speech, handwriting and gesture recognition, part-of-speech tagging, musical score following, partial discharges, and bioinformatics.


Page 5: HMM for Printed Text Recognition Slides


HMM Fundamentals

Model: λ = {A, B, π}

- A: state transition probabilities; a_ij = probability of moving from state i to state j
- B: emission probabilities; b_j(k) = probability of emitting observation k while in state j
- π: initial state probabilities; π_i = probability of starting in state i

Two types, depending on the observations:

- Discrete model
- Continuous model

In most cases, when continuous-density HMMs (CDHMMs) are used, the data distribution associated with each state is represented by a Gaussian mixture model (GMM).
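The λ = {A, B, π} notation above can be made concrete with a small numerical sketch. The matrices below are invented for illustration (they are not from the paper); the forward recursion shown is the standard way to compute the likelihood of an observation sequence under a discrete left-right HMM.

```python
import numpy as np

A = np.array([[0.7, 0.3],     # a_ij: state transition probabilities
              [0.0, 1.0]])    # left-right: no backward transitions
B = np.array([[0.9, 0.1],     # b_j(k): prob. of emitting symbol k in state j
              [0.2, 0.8]])
pi = np.array([1.0, 0.0])     # left-right models always start in state 0

def forward_likelihood(obs):
    """P(observation sequence | λ) via the forward recursion."""
    alpha = pi * B[:, obs[0]]                 # initialization
    for k in obs[1:]:
        alpha = (alpha @ A) * B[:, k]         # induction step
    return alpha.sum()                        # termination

p = forward_likelihood([0, 0, 1])
```

The same recursion underlies Baum-Welch training and, with a max in place of the sum, the Viterbi alignment mentioned later in the slides.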


Page 6: HMM for Printed Text Recognition Slides


GMM

(Figure) Overview of the extraction of GMM parameters from the speech signal [1].


Page 7: HMM for Printed Text Recognition Slides


HMM Structure optimization

- Training HMMs with the Baum-Welch and Viterbi algorithms requires specifying all the structure hyper-parameters.

- A maximum a posteriori estimation scheme uses an entropic prior to determine the best HMM structure together with the emission distribution parameters.

- The left-right topology is known to be the best choice for OCR, speech and handwriting recognition, because its linear structure is the most suitable for the sequential nature of the data.

- Apart from general-purpose methods like cross-validation, the HMM structure is generally determined using one of the two main sets of methods: heuristic estimation or model selection.


Page 8: HMM for Printed Text Recognition Slides


Heuristic Approaches

- Heuristic methods are task-specific.

- A typical solution in speech and text recognition uses the width of the unit models (phonemes, characters, ...), estimated on the training data set, as a criterion to determine the number of states of the model.

- The Bakis procedure sets the number of states of a left-right model to a fraction α of the average "width" of the unit models.

- A validation set is used to choose the optimal value of α that minimizes the error rate.
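The Bakis-style heuristic above can be sketched in a few lines. The average widths and the value of α below are invented for illustration; in practice the widths are measured on the training data and α is tuned on the validation set.

```python
# Heuristic state-count selection: number of states of each left-right
# character model = alpha * that character's average width (in frames).

def bakis_num_states(avg_width_frames, alpha):
    """Number of HMM states = alpha * average unit width, at least 1."""
    return max(1, round(alpha * avg_width_frames))

# Hypothetical average character widths measured on training data
avg_widths = {"a": 12.0, "m": 20.0, "i": 6.0}
alpha = 0.5  # would normally be chosen to minimize validation error
states = {ch: bakis_num_states(w, alpha) for ch, w in avg_widths.items()}
```

Note how wide characters such as "m" naturally get more states than narrow ones such as "i", matching the sequential left-right structure.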


Page 9: HMM for Printed Text Recognition Slides


Model Selection Approaches

- Train several HMM models with various structure configurations, and then compare them using a criterion that is generally estimated on the training data set.

- A key point of model selection is the exploration strategy of the HMM structure "search space".

- A first set of methods applies a global search where all candidate models are fully trained and then compared using a given criterion.

- The other set of methods explores the search space in an iterative, greedy-search manner.

- Each modification in the HMM structure must be followed by a complete re-estimation of the model parameters on the training data.


Page 10: HMM for Printed Text Recognition Slides


Model Selection Approaches contd.

- The choice of the criterion that controls model selection is crucial.

- The maximum likelihood (ML) criterion has been widely used.

- Most HMM model selection approaches use the Bayesian framework: model selection = likelihood term + penalization term (Occam's razor principle of parsimony).

- The above criteria have been derived under the assumption that the number of samples is much larger than the number of estimated model parameters.

- Even though compact, such a model is not the best, because it grants no interest to inter-class discrimination.

- The discriminative information criterion selects the most discriminant models, resulting in higher performance compared to the Bayesian criteria, at the cost of larger model structures.

Page 11: HMM for Printed Text Recognition Slides


HMM Adaptation

Like training, adaptation is basically a procedure for model parameter estimation, but it differs from training mainly on two points:

- the training procedure assumes that the available amount of data is large enough, while the scarcity of data is considered explicitly by adaptation algorithms, either by clustering the parameters to adapt (MLLR) or by introducing prior knowledge (MAP);

- when training new models, one starts from a heuristically or randomly initialized model, while adaptation starts from an existing (already trained) baseline generic model.

MLLR can be considered a specialization of MAP (with large adaptation data).


Page 12: HMM for Printed Text Recognition Slides


HMM Adaptation contd.

- Specialization of a generic CDHMM is done by increasing the likelihood of the adaptation data given the models.

- It was found that adapting the covariance matrices of the Gaussians had little influence on the final result; therefore, only the Gaussian centroids are generally adapted.

- Most adaptation methods use a linear transformation to adapt the set of Gaussian parameters.

- MAP does not use a linear transformation but separately updates the parameters of each Gaussian in an iterative manner, so as to converge to the maximum a posteriori estimates of the Gaussian parameters.
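The two adaptation ideas above can be sketched side by side: MLLR applies one shared affine transform to all means in a regression cluster, while MAP interpolates each prior mean with the occupancy-weighted average of the adaptation frames. The shapes, the τ value and the toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def mllr_adapt_means(means, W, b):
    """MLLR: one shared transform mu' = W @ mu + b for every
    Gaussian mean in a regression cluster."""
    return [W @ mu + b for mu in means]

def map_adapt_mean(mu0, frames, gammas, tau):
    """MAP mean update: prior mean mu0 interpolated with the
    occupancy-weighted average of the adaptation frames.
    tau (a hypothetical prior weight) controls trust in the prior."""
    occ = np.sum(gammas)
    weighted = np.sum(gammas[:, None] * frames, axis=0)
    return (tau * mu0 + weighted) / (tau + occ)

mu0 = np.array([0.0, 0.0])
frames = np.array([[1.0, 2.0], [3.0, 2.0]])   # adaptation observations
gammas = np.array([1.0, 1.0])                 # state/Gaussian occupancies
mu_map = map_adapt_mean(mu0, frames, gammas, tau=2.0)
```

With abundant adaptation data the occupancy term dominates and the MAP estimate approaches the plain data average, which is the sense in which the amount of adaptation data governs the behavior of these updates.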


Page 13: HMM for Printed Text Recognition Slides


Combining Structure & Parameter Adaptation

To combine structure and parameter adaptation into a single framework, an optimization scheme is derived that can handle the scarcity of labeled data. To overcome this difficulty:

- a semi-supervised adaptation framework is used, where the adaptation data set serves for the re-estimation of the parameters of the Gaussian mixtures, while the unlabeled validation data set is used to optimize the structure modification operations, by estimating the criterion used by each algorithm;

- a strategy is determined to explore the HMM structure search space.

The parameter adaptation stage is based on supervised MAP or MLLR.

The structure adaptation stage involves two basic operators on HMM states: merging and splitting.


Page 14: HMM for Printed Text Recognition Slides


Basic Operations Used for HMM Structure Adaptation

Merging operations:

- Merge the two successive states in the model with the closest emission probability densities.
- This doubles the number of Gaussian components, so an iterative mixture-collapsing algorithm is applied.

Splitting operations:

- Proceed with a temporal split on the state whose GMM has the highest variance.
- Create two new states identical to the initial state.
- Each new state is the result of merging s_split and s_adj.
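The merging operator above can be sketched as pooling the two states' mixtures. Halving each weight keeps the result a valid mixture while doubling the component count, which is why a collapsing step follows. The toy mixtures are illustrative, not taken from the paper.

```python
import numpy as np

def merge_state_gmms(weights1, means1, weights2, means2):
    """Pool two GMMs into one: each source mixture contributes half,
    so the merged weights still sum to 1."""
    weights = np.concatenate([0.5 * weights1, 0.5 * weights2])
    means = np.concatenate([means1, means2])
    return weights, means

# Two toy 1-D mixtures from the states being merged
w1, m1 = np.array([0.4, 0.6]), np.array([[0.0], [1.0]])
w2, m2 = np.array([1.0]), np.array([[2.0]])
w, m = merge_state_gmms(w1, m1, w2, m2)
```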


Page 15: HMM for Printed Text Recognition Slides


Model Selection Based Structure Adaptation

- This iterative algorithm directs the adaptation of the HMMs so as to maximize a likelihood criterion.

- At each iteration of the algorithm, all the HMM models are adapted separately.

- For each model, the effectiveness of three alternative models is compared: the current model, the model with one state added, and the model with one state removed.

- The effectiveness of a model is estimated by its likelihood, computed over the validation data set.

- The parameter adaptation of the modified HMM is performed after each structure modification, to re-estimate its parameters based on the new data alignment.
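The per-model comparison above can be sketched as a small selection step. Here `validation_loglik`, `add_state` and `remove_state` are hypothetical stand-ins for the real adaptation machinery, and a "model" is reduced to its state count just for the toy example.

```python
def select_structure(model, validation_loglik, add_state, remove_state):
    """Compare current / +1-state / -1-state variants on validation
    likelihood; keep the best. If the unchanged model wins, it is
    frozen and no longer modified in later iterations."""
    candidates = [model, add_state(model), remove_state(model)]
    scores = [validation_loglik(m) for m in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    frozen = (best == 0)
    return candidates[best], frozen

# Toy run: the validation likelihood (arbitrarily) peaks at 5 states,
# so a 4-state model grows by one state and is not frozen.
best, frozen = select_structure(
    4,
    validation_loglik=lambda n: -(n - 5) ** 2,
    add_state=lambda n: n + 1,
    remove_state=lambda n: n - 1,
)
```

In the full algorithm each accepted modification would be followed by MAP/MLLR re-estimation of the parameters, as the slide notes.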


Page 16: HMM for Printed Text Recognition Slides


Model Selection Based Structure Adaptation contd.

- If, for a given model m, the modified models are not better than the current model, the structure of m will no longer be modified in further iterations.

- At the end of each iteration of the algorithm, the average likelihood is computed.

- This average likelihood is compared to the one of the previous iteration. If the likelihood variation is below a threshold, the algorithm is stopped.

- The complexity of each iteration is O(T·N·S²), where N is the number of HMM models, S the total number of states of all models, and T the total number of observations in the validation data set.


Page 17: HMM for Printed Text Recognition Slides


State Duration Based Structure Adaptation

- This algorithm uses the average width of the characters as a criterion to determine the optimal structure of a character model.

- The set of states to be split or merged is determined a priori, in a non-iterative manner, according to their empirical average duration.

- The criterion is the relative difference between the width of the HMM state on the training data set and its width on the new data:

Δi = (D′i − Di) / Di

Δi > Threshold⁺ then split

Δi < Threshold⁻ then merge
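The duration criterion above amounts to a simple per-state decision rule. The widths and the two threshold values below are illustrative assumptions; the slides describe deriving the actual thresholds from quantiles.

```python
def duration_decision(d_train, d_new, thr_plus=0.5, thr_minus=-0.4):
    """Split, merge, or keep a state based on the relative change
    of its average width between training and new data."""
    delta = (d_new - d_train) / d_train
    if delta > thr_plus:
        return "split"    # state lasts much longer on the new data
    if delta < thr_minus:
        return "merge"    # state lasts much shorter on the new data
    return "keep"

# A state averaging 10 frames on training data, seen at three new widths
decisions = [duration_decision(10.0, d) for d in (18.0, 5.0, 11.0)]
```

Because the rule is applied once per state, the whole structure pass is non-iterative, which is the source of this algorithm's low cost relative to the model-selection variant.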


Page 18: HMM for Printed Text Recognition Slides


State Duration Based Structure Adaptation contd.

- Quantiles are used to determine the threshold level Threshold⁺ so that the increase in the number of state occurrences caused by all the state splittings performed matches the desired average increase.

- Threshold⁻ is determined in the same way.

- This algorithm has a complexity of O(G²·D²·S) (G: number of Gaussians per mixture, D: dimension of the feature space, S: total number of states of all HMMs).


Page 19: HMM for Printed Text Recognition Slides


Results

The evaluations, performed both on synthetic data (3,100,000 characters) and on real data (1,120,000 characters) to adapt a set of 89 HMM character models, have shown that these structure adaptation algorithms, especially the heuristic-based one (SD-SA), have a real impact on the effectiveness of the system and outperform the state-of-the-art adaptation algorithms (MLLR and MAP).

The proposed recognition approach also compares favorably with a commercial OCR engine on both synthetic and real data.


Page 20: HMM for Printed Text Recognition Slides


Conclusion

- The state-of-the-art algorithms only consider adaptation of the model parameters.

- Structure optimization procedures make it possible to find the best structure when training HMM models.

- MS-SA and SD-SA are designed for left-right-topology CDHMMs, so as to adapt a set of generic models to new data.

- These algorithms adapt the parameters of the HMM models using little labeled data, together with an adaptation of the HMM structure that is directed so as to optimize a statistical criterion estimated on a moderate amount of unlabeled data.


Page 21: HMM for Printed Text Recognition Slides


Future Scope

The algorithms can be improved by:

- adapting the number of components of each Gaussian mixture to take into account the statistics of the new data;
- including contextual state-splitting procedures;
- generalizing these algorithms to other types of HMM topology.

Another shortcoming is their complexity (which is relatively high for the MS-SA algorithm) and the fact that some labeled data is still required.


Page 22: HMM for Printed Text Recognition Slides


References I

[1] K. Ait-Mohand, T. Paquet, and N. Ragot, "Combining Structure and Parameter Adaptation of HMMs for Printed Text Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 9, pp. 1716–1732, Sep. 2014.

[2] J. Zhang and R. Kasturi, "A Novel Text Detection System Based on Character and Link Energies," IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 4187–4198, Sep. 2014.

[3] "Optical character recognition - Wikipedia, the free encyclopedia." [Online]. Available: http://en.wikipedia.org/wiki/Optical_character_recognition. [Accessed: 20-Dec-2014].

[4] R. Dugad and U. Desai, "A tutorial on hidden Markov models," Signal Processing and Artificial Neural Networks Laboratory, Department of Electrical Engineering, Indian Institute of Technology, 1996.


Page 23: HMM for Printed Text Recognition Slides


References II

[5] "Hidden Markov model - Wikipedia, the free encyclopedia." [Online]. Available: http://en.wikipedia.org/wiki/Hidden_Markov_model. [Accessed: 20-Dec-2014].

[6] "SFU CMPT 413: HMM2 Ngrams versus HMMs - YouTube." [Online]. Available: https://www.youtube.com/watch?v=sxziC8Zh8Kw. [Accessed: 20-Dec-2014].

[7] M. N. Stuttle, "A Gaussian mixture model spectral representation for speech recognition," Hughes Hall and Cambridge University Engineering Department, 2003.

[8] R. Farnoosh and B. Zarpak, "Image segmentation using Gaussian mixture model," IUST International Journal of Engineering Science, vol. 19, no. 1, pp. 29–32, 2008.

