HMM for Printed Text Recognition (slides)

DESCRIPTION
Presentation based on the IEEE paper "Combining Structure and Parameter Adaptation of HMMs for Printed Text Recognition".

TRANSCRIPT
INTRODUCTION RELATED WORK MAIN TOPIC RESULTS CONCLUSION REFERENCES
Combining Structure and Parameter Adaptation of HMM for Printed Text Recognition
ANOOP R, MTech Computational Linguistics
Government Engineering College, Sreekrishnapuram
December 3, 2014
CONTENTS
1 INTRODUCTION
2 RELATED WORK
3 MAIN TOPIC
4 RESULTS
5 CONCLUSION
6 REFERENCES
OCR
Uses?
Indexing multimedia
Reduced size
Google Books, Goggles
Reading for blind people ...
Challenges?
Different fonts
Similar characters
Noise in the documents
Text orientation
How does it work?
Segmentation
Generic vs specialized font
Training vs adaptation
HMM Introduction
What are models?
Boolean, Vector, Probabilistic ...
Dynamic Bayesian networks, finite stochastic automata
Who is Markov?
Russian mathematician Andrey Markov. One can make predictions for the future of the process based solely on its present state.
What is hidden?
Two stochastic processes: an unobservable process linked to an observable one.
Why should we know?
Speech, handwriting and gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics.
HMM Fundamentals
Model: λ = {A, B, π}
A: State transition probability
a_ij = probability of changing from state i to state j
B: Emission probability
b_j(k) = probability of observing symbol k in state j
π: Initial state probability
π_i = probability of starting in state i
Two types depending on observation
Discrete Model / Continuous Model
In most cases, when continuous density HMMs (CDHMMs) are used, the data distribution associated to each state is represented with a mixture of Gaussians (GMM).
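The λ = {A, B, π} model above can be made concrete with a minimal discrete HMM and a forward-algorithm pass that computes the probability of an observation sequence. All the numbers below are illustrative, not taken from the paper:

```python
# Minimal discrete HMM (two states, two observable symbols) and the
# forward algorithm. Illustrative numbers, not from the paper.
A  = [[0.7, 0.3],   # A[i][j]: probability of moving from state i to state j
      [0.4, 0.6]]
B  = [[0.9, 0.1],   # B[j][k]: probability of emitting symbol k in state j
      [0.2, 0.8]]
pi = [0.5, 0.5]     # initial state probabilities

def forward_likelihood(obs):
    """P(obs | lambda) computed with the forward algorithm."""
    # Initialization: alpha_1(j) = pi_j * b_j(o_1)
    alpha = [pi[j] * B[j][obs[0]] for j in range(len(pi))]
    # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(len(alpha))) * B[j][o]
                 for j in range(len(pi))]
    # Termination: sum over final states
    return sum(alpha)

print(round(forward_likelihood([0, 1, 0]), 4))   # → 0.0994
```

For a CDHMM, each row of B would be replaced by a GMM density evaluated on the feature vector, but the recursion is the same.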
GMM
Overview of the extraction of GMM parameters from the speech signal [1].
HMM Structure optimization
Training HMMs with the Baum-Welch and Viterbi algorithms requires specifying the whole structure (hyper-parameters) beforehand.
A maximum a posteriori estimation scheme can use an entropic prior to determine the best HMM structure together with the emission distribution parameters.
The left-right topology is known to be the best choice for OCR, speech and handwriting recognition, because its linear structure is the most suitable for the sequential nature of the data.
Apart from general-purpose methods like cross-validation, the HMM structure is generally determined using one of two main sets of methods: heuristic estimation or model selection.
Heuristic Approaches
Heuristic methods are task-specific.
A typical solution in speech and text recognition uses the width of the unit models (phonemes, characters, ...), estimated on the training data set, as a criterion to determine the number of states of the model.
The Bakis procedure sets the number of states of a left-right model to a fraction α of the average "width" of the unit models.
A validation set is used to choose the optimal value of α that minimizes the error rate.
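The Bakis-style heuristic above can be sketched in a few lines; the widths and the α value are illustrative assumptions, and in practice α would be tuned on a validation set as the slide notes:

```python
def bakis_num_states(widths, alpha):
    """Bakis-style heuristic (illustrative sketch): number of states of a
    left-right model = fraction alpha of the average unit width,
    with at least one state."""
    avg_width = sum(widths) / len(widths)
    return max(1, round(alpha * avg_width))

# e.g. a character observed with widths of 10, 12 and 14 frames
# on the training set (average width 12):
print(bakis_num_states([10, 12, 14], alpha=0.5))   # → 6
```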
Model Selection Approaches
Train several HMM models with various structure configurations, and then compare them using a criterion that is generally estimated on the training data set.
A key point of model selection is the exploration strategy of the HMM structure "search-space".
A first set of methods applies a global search where all candidate models are fully trained and then compared using a given criterion.
The other set of methods explores the search-space in an iterative and greedy-search manner.
Each modification in the HMM structure must be followed by a complete re-estimation of the model parameters on the training data.
ANOOP RMTech Computational Linguistics Combining Structure and Parameter Adaptation of HMM for Printed Text Recognition
INTRODUCTION RELATED WORK MAIN TOPIC RESULTS CONCLUSION REFERENCES
Model Selection Approaches contd.
The choice of the criterion that controls model selection is crucial.
The maximum likelihood (ML) criterion has been widely used.
Most HMM model selection approaches use the Bayesian framework:
model selection = likelihood term + penalization term (Occam's razor principle of parsimony).
The above criteria have been derived under the assumption that the number of samples is much larger than the number of estimated model parameters.
Even though compact, such a model is not the best one, because it grants no interest to inter-class discrimination.
The discriminative information criterion selects the most discriminant models, resulting in a higher performance compared to Bayesian criteria, at the cost of larger model structures.
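A standard example of the "likelihood term + penalization term" form is the Bayesian Information Criterion (BIC); the numbers below are illustrative, not from the paper:

```python
import math

def bic(log_likelihood, num_params, num_samples):
    """Bayesian Information Criterion: a likelihood term plus a penalty
    that grows with model size (Occam's razor). Lower is better."""
    return -2.0 * log_likelihood + num_params * math.log(num_samples)

# A larger model must improve the likelihood enough to pay its penalty.
small = bic(log_likelihood=-1200.0, num_params=20, num_samples=500)
large = bic(log_likelihood=-1195.0, num_params=60, num_samples=500)
print(small < large)   # → True: the small model wins here
```

Note the `math.log(num_samples)` penalty is exactly what assumes many samples per parameter, as the slide points out.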
HMM Adaptation
Like training, adaptation is basically a procedure for model parameter estimation, but it differs from training mainly on two points:
The training procedure assumes that the available amount of data is large enough, while the scarcity of data is considered explicitly by adaptation algorithms, either by clustering the parameters to adapt (MLLR) or by introducing prior knowledge (MAP).
When training new models, one starts from a heuristically or randomly initialized model, while adaptation uses an existing (already trained) baseline generic model.
MLLR can be considered as a specialization of MAP (with large adaptation data).
HMM Adaptation contd.
Specialization of a generic CDHMM is done by increasing the likelihood of the adaptation data conditionally to the models.
It is found that adapting the covariance matrices of the Gaussians has little influence on the final result; therefore, only the Gaussian centroids are generally adapted.
Most adaptation methods use a linear transformation to adapt the set of Gaussian parameters.
MAP does not make use of a linear transformation but separately updates the parameters of each Gaussian in an iterative manner, so as to converge to the maximum a posteriori estimates of the Gaussian parameters.
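The linear-transformation idea can be sketched in a very reduced form: with one Gaussian per state and fixed covariances, estimating a single shared transform μ′ = a·μ + b that moves every generic centroid toward the adaptation data reduces to a least-squares fit. This is an illustrative 1-D toy, not the paper's MLLR estimator:

```python
def fit_global_mean_transform(generic_means, adapted_means):
    """Least-squares fit of mu' = a*mu + b over all states, so that every
    Gaussian centroid is moved by one shared linear transform
    (1-D toy version of the MLLR mean update; covariances untouched)."""
    n = len(generic_means)
    mx = sum(generic_means) / n
    my = sum(adapted_means) / n
    sxx = sum((x - mx) ** 2 for x in generic_means)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(generic_means, adapted_means))
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Generic means shifted and scaled by a hypothetical "new font":
a, b = fit_global_mean_transform([0.0, 1.0, 2.0], [0.5, 1.6, 2.7])
print(round(a, 3), round(b, 3))   # → 1.1 0.5
```

Because all states share one transform, even states with no adaptation samples get updated, which is exactly why MLLR copes with scarce data.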
Combining Structure & Parameter Adaptation
To combine structure as well as parameter adaptation into a single framework, an optimization scheme is derived that can handle the scarcity of labeled data.
To overcome this difficulty, a semi-supervised adaptation framework is used: the labeled adaptation data set is used for the re-estimation of the parameters of the Gaussian mixtures, while the unlabeled validation data set is used to optimize the structure modification operations, by estimating the criterion used by each algorithm and determining a strategy to explore the HMM structure search-space.
The parameter adaptation stage is based on supervised MAP or MLLR.
The structure adaptation stage involves two basic operators on HMM states: merging and splitting.
Basic Operations Used for HMM Structure Adaptation
Merging operations:
Merge the two successive states in the model with the closest emission probability densities.
Merging the two GMMs doubles the number of Gaussian components.
An iterative mixture collapsing algorithm reduces the components back.
Splitting operations:
Proceed with a temporal split on the state whose GMM has the higher variance.
Create two new states identical to the initial state.
The new state is the result of merging s_split and s_adj.
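A building block of the mixture collapsing step is merging two Gaussian components into one. The standard moment-matching merge, shown here for weighted 1-D Gaussians as a generic sketch (the paper applies this component-by-component on full GMMs), preserves the combined weight, mean and variance:

```python
def merge_gaussians(w1, mu1, var1, w2, mu2, var2):
    """Moment-matching merge of two weighted 1-D Gaussians into a single
    Gaussian with the same total weight, mean and variance."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    # pooled variance = weighted within-component variance
    #                 + spread between the two means
    var = (w1 * (var1 + (mu1 - mu) ** 2)
         + w2 * (var2 + (mu2 - mu) ** 2)) / w
    return w, mu, var

print(merge_gaussians(0.5, 0.0, 1.0, 0.5, 2.0, 1.0))   # → (1.0, 1.0, 2.0)
```

Merging the components with the smallest resulting information loss, repeatedly, brings the doubled mixture back to its original size.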
Model Selection Based Structure Adaptation
This iterative algorithm directs the adaptation of the HMM in order to maximize a likelihood criterion.
At each iteration of the algorithm, all the HMM models are adapted separately.
For each of the models, compare the effectiveness of three alternatives: the current model, the model with one state added, and the model with one state removed.
The effectiveness of a model is estimated by the likelihood computed over the validation data set.
The parameter adaptation of the modified HMM is performed after each structure modification, to re-estimate its parameters based on the new data alignment.
Model Selection Based Structure Adaptation contd.
If, for a given model m, the modified models are not better than the current model, the structure of m will no longer be modified in further iterations.
At the end of each iteration of the algorithm, compute the average likelihood.
This average likelihood is compared to that of the previous iteration. If the likelihood variation is below a threshold, the algorithm is stopped.
The complexity of each iteration is O(T N S^2), where N is the number of HMM models, S the total number of states of all models, and T the total number of observations in the validation data set.
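The iteration described above can be sketched as a toy greedy loop. Here a model is abstracted to its state count and `score` stands in for the validation-set likelihood (a hypothetical stand-in for the paper's adapt-and-score step):

```python
def select_structures(models, score, max_iter=20, tol=1e-6):
    """Toy sketch of the iterative structure-selection scheme: each model
    (here reduced to its state count) is compared against its +1-state and
    -1-state variants using a validation score; a model whose variants are
    not better is frozen for the remaining iterations, and the loop stops
    when the average score no longer changes."""
    frozen = [False] * len(models)
    prev_avg = sum(score(m) for m in models) / len(models)
    for _ in range(max_iter):
        for i, m in enumerate(models):
            if frozen[i]:
                continue
            candidates = [m, m + 1, max(1, m - 1)]
            best = max(candidates, key=score)
            if best == m:
                frozen[i] = True      # no variant beats the current model
            models[i] = best
        avg = sum(score(m) for m in models) / len(models)
        if abs(avg - prev_avg) < tol: # score variation below threshold
            break
        prev_avg = avg
    return models

# Toy validation score peaking at 6 states: both models converge there.
print(select_structures([3, 9], score=lambda n: -(n - 6) ** 2))   # → [6, 6]
```

In the real algorithm each candidate is a full HMM whose parameters are re-adapted (MAP/MLLR) before scoring, which is where the O(T N S^2) per-iteration cost comes from.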
State Duration Based Structure Adaptation
This algorithm uses the average width of the characters as a criterion to determine the optimal structure of a character model.
The set of states to be split or merged is determined a priori, in a non-iterative manner, according to their empirical average duration.
The criterion is the relative difference between the width of the HMM state on the training data set and its width on the new data:
∆_i = (D′_i − D_i) / D_i
If ∆_i > Threshold+ then split.
If ∆_i < Threshold− then merge.
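The decision rule reads directly as code; the threshold values below are illustrative (the paper derives them from quantiles, as the next slide explains):

```python
def duration_decision(d_train, d_new, thr_plus=0.3, thr_minus=-0.3):
    """SD-SA decision rule: the relative change in average state width,
    Delta = (D' - D) / D, decides whether a state is split or merged.
    Threshold values here are illustrative placeholders."""
    delta = (d_new - d_train) / d_train
    if delta > thr_plus:
        return "split"
    if delta < thr_minus:
        return "merge"
    return "keep"

# A state that lasts 40% longer on the new data gets split:
print(duration_decision(10.0, 14.0))   # → split
```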
State Duration Based Structure Adaptation contd.
Quantiles are used to determine the threshold level Threshold+, so that the increase in the number of state occurrences caused by all the state splittings performed matches the average increase.
Threshold− is determined in the same way.
This algorithm has a complexity of O(G^2 D^2 S), where G is the number of Gaussians per mixture, D the dimension of the feature space, and S the total number of states of all HMMs.
Results
The evaluations, performed both on synthetic data (3,100,000 characters) and on real data (1,120,000 characters) to adapt a set of 89 HMM character models, have shown that these structure adaptation algorithms, especially the heuristic-based one (SD-SA), have a real impact on the effectiveness of the system, and are better than the state-of-the-art adaptation algorithms (MLLR and MAP).
The proposed recognition approach also compares favorably with a commercial OCR engine, on synthetic as well as real data.
Conclusion
The state-of-the-art algorithms only consider adaptation of the model parameters, not of the structure.
Structure optimization procedures make it possible to find the best structure when training HMM models.
MS-SA & SD-SA are designed for left-right topology CDHMMs, so as to adapt a set of generic models to new data.
These algorithms adapt the parameters of the HMM models using little labeled data, together with an adaptation of the HMM structure that is directed so as to optimize a statistical criterion estimated on a moderate amount of unlabeled data.
Future Scope
The algorithms can be improved by:
Adapting the number of components of each Gaussian mixture to take into account the statistics of the new data
Including contextual state splitting procedures
Generalizing these algorithms to other types of HMM topology
Another shortcoming is their complexity (which is relatively high for the MS-SA algorithm) and the fact that some labeled data is still required.
References I
[1] K. Ait-Mohand, T. Paquet, and N. Ragot, "Combining Structure and Parameter Adaptation of HMMs for Printed Text Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 9, pp. 1716–1732, Sep. 2014.
[2] J. Zhang and R. Kasturi, "A Novel Text Detection System Based on Character and Link Energies," IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 4187–4198, Sep. 2014.
[3] "Optical character recognition - Wikipedia, the free encyclopedia." [Online]. Available: http://en.wikipedia.org/wiki/Optical_character_recognition. [Accessed: 20-Dec-2014].
[4] R. Dugad and U. Desai, "A tutorial on hidden Markov models," Signal Processing and Artificial Neural Networks Laboratory, Department of Electrical Engineering, Indian Institute of Technology, 1996.
References II
[5] "Hidden Markov model - Wikipedia, the free encyclopedia." [Online]. Available: http://en.wikipedia.org/wiki/Hidden_Markov_model. [Accessed: 20-Dec-2014].
[6] "SFU CMPT 413: HMM2 Ngrams versus HMMs - YouTube." [Online]. Available: https://www.youtube.com/watch?v=sxziC8Zh8Kw. [Accessed: 20-Dec-2014].
[7] M. N. Stuttle, "A Gaussian mixture model spectral representation for speech recognition," Hughes Hall and Cambridge University Engineering Department, 2003.
[8] R. Farnoosh and B. Zarpak, "Image segmentation using Gaussian mixture model," IUST International Journal of Engineering Science, vol. 19, no. 1, pp. 29–32, 2008.