Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching, Iddo Aviram
![Page 1: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/1.jpg)
Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching
Iddo Aviram
![Page 2: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/2.jpg)
OCR- a Brief Review
• Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text.
• OCR is a task, and not a mathematically defined problem.
![Page 3: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/3.jpg)
OCR- a Brief Review
• People draw on many disciplines for OCR: Fourier transforms, pattern matching, machine learning, differential geometry, computer vision, neural networks, expert systems, optimization problems, topology, and decision making.
• We will show just one simple, non-representative approach that deals with part of the OCR task.
![Page 4: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/4.jpg)
OCR- a Brief Review
• The task can be very hard, and state-of-the-art algorithms might not be good enough for some practical purposes. In several cases, however, OCR tools can perform well and be useful.

| Harder | Easier |
|---|---|
| Handwritten | Printed |
| Cursive | Block letters |
| Free handwritten | Scribe script |
| Offline | Online |
| Connected writing | Non-connected writing |
| Degraded manuscripts | Well-preserved manuscripts |
| Unrestricted writing | Restricted writing |
![Page 5: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/5.jpg)
OCR- a Brief Review
• The human brain does amazingly well with OCR tasks, so computer results are usually evaluated by comparison with manually created ground-truth data.
• However, sometimes even humans are not capable of recognition.
![Page 6: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/6.jpg)
OCR- a Brief Review
• Can you read these scripts?
(Top: real estate in Petah Tikva, from yad1.co.il, 2012. Bottom: from "Ha-Havatzelet", 1912.)
![Page 7: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/7.jpg)
OCR- a Brief Review
• Can you read this script?
(Meir Ariel, "Shir Ke'ev" ("Song of Pain") plus an early version of the song "Mayim Rabim", late 1970s.)
![Page 8: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/8.jpg)
OCR- a Brief Review
• Can you read this script?
. ? : את“ תן �ועת לקוס � והברכתך �את הש�לם לבלבל אמר למלך אמר ] [ ] [ חמר ] [ י פן קוס בח מז על אל ז ע והרם אחאמה עמד אשר ה�אכל
ה�אכל.”
(Inscription on pottery (ostracon), Horvat Uza. Iron Age II, 7th century BCE. Ink on pottery. Israel Antiquities Authority.)
![Page 9: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/9.jpg)
OCR- Motivation for Graphical Matching
• Using graphical tools for object recognition.
• A possible scheme:
– Binarization
– Segmentation by connected components
– Thinning
– Graphical modeling
– Graphical matching
– Rule-Based Selection
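The first two steps of the scheme can be sketched in a few lines. This is only an illustration under assumptions that are not in the slides: a fixed global threshold (`threshold=128`), 4-connectivity, and a tiny hand-made grayscale image. The names `binarize` and `connected_components` are hypothetical.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Return a 0/1 image: 1 (ink) where the pixel is darker than the threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def connected_components(binary):
    """Group 4-connected ink pixels; return one list of (row, col) per component."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue, comp = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(comp)
    return components

# Made-up 3x5 grayscale patch with two dark vertical strokes.
gray = [
    [250,  20, 250, 250,  30],
    [250,  25, 250, 250,  35],
    [250, 250, 250, 250,  40],
]
binary = binarize(gray)
parts = connected_components(binary)
print(len(parts), "components")
```

Each component would then be thinned and turned into a graph before matching; those steps are not sketched here.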
![Page 10: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/10.jpg)
OCR- Motivation for Graphical Matching
• Binarization:
![Page 11: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/11.jpg)
OCR- Motivation for Graphical Matching
• Segmentation → Thinning → Graphical modeling:
![Page 12: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/12.jpg)
OCR- Motivation for Graphical Matching
• Given a historical manuscript, a blessing of Brit Milah:
![Page 13: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/13.jpg)
OCR- Motivation for Graphical Matching
• We’re interested in finding the occurrences of the letter Mem (not final):
![Page 14: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/14.jpg)
OCR- Motivation for Graphical Matching
• By subgraph matching we can find candidates:
[Figure: graphical modeling, followed by graphical matching.]
![Page 15: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/15.jpg)
Subgraph Isomorphism Problem
• Given two graphs $H$ and $G$ as input, the problem is whether $H$ has a subgraph that is isomorphic to $G$.
[Figure: graph $H$ with vertices 1 to 4 and graph $G$ with vertices 1 to 3.]
• In this example the answer is ‘yes’, since there is an isomorphic correspondence: $1_G \leftrightarrow 1_H$, $2_G \leftrightarrow 3_H$, $3_G \leftrightarrow 2_H$. (There are additional isomorphic correspondences.)
![Page 16: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/16.jpg)
Subgraph Isomorphism Problem
• Graph isomorphism
– Graphs $G(V_G, E_G)$ and $H(V_H, E_H)$ are isomorphic if $|V_G| = |V_H|$ and there is an invertible function $F$ from $V_G$ to $V_H$ such that for all nodes $u$ and $v$ in $V_G$, $(u,v) \in E_G$ if and only if $(F(u),F(v)) \in E_H$.
– Such a function $F$ is said to be an isomorphic correspondence.
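The definition of graph isomorphism can be checked literally by trying every invertible function $F$. The sketch below does exactly that on two small made-up graphs. It assumes 0-based vertex numbering and adjacency matrices as input; the names `is_isomorphism` and `find_isomorphism` are hypothetical, and the approach is only practical for very small graphs.

```python
from itertools import permutations

def is_isomorphism(adj_g, adj_h, f):
    """True if (u,v) in E_G  <=>  (F(u),F(v)) in E_H for the bijection f."""
    n = len(adj_g)
    return all(adj_g[u][v] == adj_h[f[u]][f[v]]
               for u in range(n) for v in range(n))

def find_isomorphism(adj_g, adj_h):
    """Return a tuple f with f[u] = F(u), or None if the graphs are not isomorphic."""
    if len(adj_g) != len(adj_h):          # |V_G| must equal |V_H|
        return None
    for f in permutations(range(len(adj_g))):
        if is_isomorphism(adj_g, adj_h, f):
            return f
    return None

# Two drawings of a 3-vertex path: the middle vertex is 1 in G and 0 in H.
path_g = [[0, 1, 0],
          [1, 0, 1],
          [0, 1, 0]]
path_h = [[0, 1, 1],
          [1, 0, 0],
          [1, 0, 0]]
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

f = find_isomorphism(path_g, path_h)
print(f)                                   # a correspondence mapping middle to middle
print(find_isomorphism(path_g, triangle))  # None: a path is not a triangle
```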
![Page 17: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/17.jpg)
Subgraph Isomorphism Problem
• The subgraph isomorphism problem is NP-complete.
• There is a very simple reduction: $\text{CLIQUE} \le_P \text{Subgraph Isomorphism}$.
• However, for many specific types of practical problems (even with ‘big’ inputs), algorithms answer quickly.
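The reduction can be made concrete: a graph $H$ has a clique of size $k$ exactly when the complete graph $K_k$ is subgraph isomorphic to $H$. The sketch below illustrates this with a brute-force subgraph test (not Ullmann's algorithm). The example graph and the names `complete_graph`, `is_subgraph_isomorphic` and `has_clique` are assumptions made for illustration.

```python
from itertools import permutations

def complete_graph(k):
    """Adjacency matrix of K_k: every pair of distinct vertices is adjacent."""
    return [[0 if i == j else 1 for j in range(k)] for i in range(k)]

def is_subgraph_isomorphic(adj_g, adj_h):
    """Brute force: try every injective map from V_G into V_H."""
    n_g, n_h = len(adj_g), len(adj_h)
    for f in permutations(range(n_h), n_g):
        # Every edge of G must land on an edge of H.
        if all(adj_h[f[u]][f[v]] == 1
               for u in range(n_g) for v in range(n_g) if adj_g[u][v] == 1):
            return True
    return False

def has_clique(adj_h, k):
    """Map the CLIQUE instance (H, k) to the subgraph instance (K_k, H)."""
    return is_subgraph_isomorphic(complete_graph(k), adj_h)

# Made-up example: a triangle 0-1-2 with an extra vertex 3 attached to 2.
h = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
print(has_clique(h, 3), has_clique(h, 4))
```

Building $K_k$ takes polynomial time, which is what makes this a polynomial-time reduction; the exponential cost sits entirely in the subgraph test.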
![Page 18: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/18.jpg)
The Ullman Algorithm
• J. R. Ullmann, "An Algorithm for Subgraph Isomorphism", Journal of the ACM, 1976.
• Although old, this algorithm is still very popular and gives good results in practice.
![Page 19: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/19.jpg)
The Ullman Algorithm
• There are algebraic formulations for graph isomorphism and subgraph isomorphism, which we will make use of.
• The adjacency matrix $A_H$ of a graph $H$ has entry $(i,j)$ equal to 1 if vertices $i$ and $j$ are adjacent in $H$, and 0 otherwise.
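As a small illustration, an adjacency matrix can be built from an edge list. The sketch assumes an undirected graph with vertices numbered from 1, and it assumes that $H$ is the four-vertex example whose matrix digits appear on a later slide (edges 1–3, 2–3 and 3–4). The name `adjacency_matrix` is hypothetical.

```python
def adjacency_matrix(n, edges):
    """Symmetric 0/1 matrix for an undirected graph on vertices 1..n."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] = 1
        a[v - 1][u - 1] = 1   # undirected: mirror the entry
    return a

# Assumed edge list for the example graph H.
A_H = adjacency_matrix(4, [(1, 3), (2, 3), (3, 4)])
for row in A_H:
    print(row)
```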
![Page 20: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/20.jpg)
The Ullman Algorithm
• We will use the notion of a permutation matrix.
• Any permutation matrix is equivalent to an isomorphic correspondence.
[Figure: an isomorphic correspondence $F$ shown next to its permutation matrix $M'$, written $F \sim M'$.]
![Page 21: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/21.jpg)
The Ullman Algorithm
• Two graphs $H$ and $H_2$ are isomorphic with a correspondence $F$ iff $A_{H_2}$ is similar to $A_H$, and the similarity matrix is $M' \sim F$.
[Figure: an isomorphic correspondence $F$ shown next to its permutation matrix $M'$, with $F \sim M'$.]
Isomorphism criterion:
$$A_{H_2} = M' A_H M'^{-1}$$
iff $H_2$ is isomorphic to $H$, with a correspondence $F \sim M'$.
![Page 22: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/22.jpg)
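The Ullman Algorithm
• We can develop this equation that defines an isomorphism:
$$A_{H_2} = M' A_H M'^{-1} = M' A_H M'^{T} = M' A_H^{T} M'^{T} = M' (M' A_H)^{T}$$
The second equality holds since $M'$ is an orthonormal matrix, thus $M' M'^{T} = I$. The third holds since $A_H$ is a symmetric matrix.
Isomorphism criterion:
$$A_{H_2} = M' (M' A_H)^{T}$$
iff $H_2$ is isomorphic to $H$, with a correspondence $F \sim M'$.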
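The criterion can be tried out numerically. The sketch below builds a permutation matrix from a correspondence and tests $A_{H_2} = M'(M'A_H)^T$ with plain Python lists. The two small graphs, the 0-based numbering, and the helper names (`matmul`, `transpose`, `permutation_matrix`, `satisfies_isomorphism_criterion`) are assumptions made for illustration.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def permutation_matrix(f):
    """M'[i][f[i]] = 1: vertex i of H_2 corresponds to vertex f[i] of H."""
    n = len(f)
    return [[1 if j == f[i] else 0 for j in range(n)] for i in range(n)]

def satisfies_isomorphism_criterion(a_h2, a_h, m):
    """Check A_H2 == M' (M' A_H)^T."""
    return a_h2 == matmul(m, transpose(matmul(m, a_h)))

# A 3-vertex path drawn twice: middle vertex is 1 in H and 0 in H_2.
A_H  = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_H2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]

good = permutation_matrix((1, 0, 2))   # middle of H_2 -> middle of H
bad = permutation_matrix((0, 1, 2))    # identity: middle mapped to an endpoint
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(matmul(good, transpose(good)) == identity)          # M' M'^T = I
print(satisfies_isomorphism_criterion(A_H2, A_H, good))
print(satisfies_isomorphism_criterion(A_H2, A_H, bad))
```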
![Page 23: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/23.jpg)
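The Ullman Algorithm
• In a similar fashion (without proof) we have an algebraic criterion for a subgraph isomorphism.
Isomorphic correspondence $F$: $1_G \leftrightarrow 1_H$, $2_G \leftrightarrow 3_H$, $3_G \leftrightarrow 2_H$, with vertex 4 left unmatched ($4 \leftrightarrow \varnothing$). Here $F \sim M'$, where $M'$ is a rectangular permutation matrix.
Subgraph isomorphism criterion:
$$A_G = M' (M' A_H)^{T}$$
iff $G$ is subgraph isomorphic to $H$, with a correspondence $F \sim$ rectangular $M'$.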
![Page 24: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/24.jpg)
The Ullman Algorithm
• We have a graph $G$ and a graph $H$, and we want to know whether $G$ is subgraph isomorphic to $H$.
• So we search for a permutation matrix $M^*$ of size $|V_G| \times |V_H|$ that satisfies the subgraph isomorphism criterion.
• We enumerate candidate permutation matrices of the same size, denoting a candidate by $M'$, from a set of candidates that satisfies:
$$\{\text{all } M^*\text{-s}\} \subseteq \{\text{all } M'\text{-s}\}$$
• During the enumeration, we check the subgraph isomorphism criterion on each candidate. If a candidate satisfies the criterion, we return ‘yes’. If no candidate satisfies it, we return ‘no’.
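The enumeration described above can be sketched in its most naive form, where the candidate set is simply every rectangular permutation matrix. The sketch uses the equality form of the criterion, $A_G = M'(M'A_H)^T$, as stated on the slides. The 0-based numbering, the helper names and the example matrices (taken from the digit strings on a later slide) are assumptions.

```python
from itertools import permutations

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def find_m_star(a_g, a_h):
    """Return the first |V_G| x |V_H| permutation matrix meeting the criterion, or None."""
    n_g, n_h = len(a_g), len(a_h)
    # Each injective assignment f (row i -> column f[i]) gives one candidate M'.
    for f in permutations(range(n_h), n_g):
        m = [[1 if j == f[i] else 0 for j in range(n_h)] for i in range(n_g)]
        if matmul(m, transpose(matmul(m, a_h))) == a_g:
            return m          # answer 'yes'
    return None               # answer 'no'

A_G = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
A_H = [[0, 0, 1, 0], [0, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(find_m_star(A_G, A_H))
print(find_m_star(triangle, A_H))
```

For these inputs there are $4 \cdot 3 \cdot 2 = 24$ candidates; the point of the later slides is to shrink that set before checking.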
![Page 25: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/25.jpg)
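The Ullman Algorithm
• Ullmann’s algorithm I
– Construction of another matrix $M^{(0)}$ with the same size as the $M'$-s:
$$m^{(0)}_{i,j} = \begin{cases} 1 & \text{if } \deg(V_{H_j}) \ge \deg(V_{G_i}) \\ 0 & \text{otherwise} \end{cases}, \qquad m_{i,j} \in \{0,1\}$$
– Generation of all $M'$-s by setting to 0 all but one 1 in each row of $M^{(0)}$.
– A subgraph isomorphism has been found if $M'$ satisfies:
$$\forall i,j:\ (a^{G}_{i,j} = 1) \Rightarrow (p_{i,j} = 1), \qquad P = M'(M' A_H)^{T}$$
[Figure: graph $H$ with vertices 1 to 4 and graph $G$ with vertices 1 to 3.]
$$A_G = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \qquad A_H = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad M^{(0)} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$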
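The degree rule for $M^{(0)}$ is easy to reproduce. The sketch below computes it for the example matrices $A_G$ and $A_H$ of this slide; the function names `degrees` and `initial_matrix` are hypothetical.

```python
def degrees(adj):
    """Degree of each vertex = sum of its row in the adjacency matrix."""
    return [sum(row) for row in adj]

def initial_matrix(a_g, a_h):
    """m0[i][j] = 1 iff deg(j-th vertex of H) >= deg(i-th vertex of G)."""
    deg_g, deg_h = degrees(a_g), degrees(a_h)
    return [[1 if dh >= dg else 0 for dh in deg_h] for dg in deg_g]

A_G = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
A_H = [[0, 0, 1, 0], [0, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]

M0 = initial_matrix(A_G, A_H)
for row in M0:
    print(row)
```

The first vertex of $G$ has degree 2, so only the degree-3 vertex of $H$ qualifies for it, which is why the first row of $M^{(0)}$ contains a single 1.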
![Page 26: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/26.jpg)
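The Ullman Algorithm
• Ullmann’s algorithm I
– Example (search tree; each matrix is written row by row, with rows separated by semicolons):
Root – $M^{(0)}$: [0010; 1111; 1111]
Inner nodes – $M$-s: [0010; 1111; 0001], [0010; 1111; 0010], [0010; 1111; 0100], [0010; 1111; 1000]
Leaves – $M'$-s: [0010; 0100; 0001], [0010; 1000; 0001], [0010; 0001; 0100], [0010; 1000; 0100], [0010; 0001; 1000], [0010; 0100; 1000]
For a leaf $M'$:
$$P = M'(M' A_H)^{T} = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \quad \text{compared with} \quad A_G = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$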
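The search tree of this example can be reproduced with a short depth-first enumeration. The sketch assumes rows are fixed from top to bottom and that no column is used twice (the slide's tree may fix rows in a different order, but the set of leaves is the same). Helper names are hypothetical; $A_G$, $A_H$ and $M^{(0)}$ are the example matrices.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def leaves(m0):
    """Yield every M' that keeps exactly one 1 per row of m0, no column twice."""
    n_g, n_h = len(m0), len(m0[0])

    def extend(chosen):
        row = len(chosen)
        if row == n_g:   # all rows fixed: this is a leaf
            yield [[1 if j == chosen[i] else 0 for j in range(n_h)]
                   for i in range(n_g)]
            return
        for j in range(n_h):
            if m0[row][j] == 1 and j not in chosen:
                yield from extend(chosen + [j])

    yield from extend([])

def satisfies(m, a_g, a_h):
    """(a_G[i][j] = 1) => (p[i][j] = 1), with P = M' (M' A_H)^T."""
    p = matmul(m, transpose(matmul(m, a_h)))
    n = len(a_g)
    return all(a_g[i][j] == 0 or p[i][j] == 1
               for i in range(n) for j in range(n))

def algorithm_one(a_g, a_h, m0):
    return any(satisfies(m, a_g, a_h) for m in leaves(m0))

A_G = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
A_H = [[0, 0, 1, 0], [0, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
M0 = [[0, 0, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1]]

all_leaves = list(leaves(M0))
print(len(all_leaves), algorithm_one(A_G, A_H, M0))
```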
![Page 27: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/27.jpg)
The Ullman Algorithm
• Ullmann’s algorithm II
– Construction of another matrix $M^{(0)}$ with the same size as the $M'$-s:
$$m^{(0)}_{i,j} = \begin{cases} 1 & \text{if } \deg(V_{H_j}) \ge \deg(V_{G_i}) \\ 0 & \text{otherwise} \end{cases}, \qquad m_{i,j} \in \{0,1\}$$
– Generation of all $M'$-s by setting to 0 all but one 1 in each row of $M^{(0)}$. However, in this version we also prune every inner node $M$ that has at least one 1 entry that does not comply with the refinement rule (to be defined). We are still guaranteed to end up with the right answer, since it still holds that:
$$\{\text{all } M^*\text{-s}\} \subseteq \{\text{all } M'\text{-s}\}$$
– A subgraph isomorphism has been found if there is an $M'$ that satisfies:
$$\forall i,j:\ (a^{G}_{i,j} = 1) \Rightarrow (p_{i,j} = 1)$$
![Page 28: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/28.jpg)
The Ullman Algorithm
• Ullmann’s refinement rule for pruning the search tree:
• Observation:
• If a vertex of $G$, $v_G$, corresponds to a vertex of $H$, $v_H$, then for each vertex adjacent to $v_G$ in $G$, denoted $v_{AG}$, there must be a vertex in $H$, denoted $v_{AH}$, such that:
• A. $v_{AH}$ is adjacent to $v_H$ in $H$
• B. $v_{AH}$ corresponds to $v_{AG}$
![Page 29: Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram](https://reader036.vdocuments.us/reader036/viewer/2022062310/56816682550346895dda2677/html5/thumbnails/29.jpg)
The Ullman Algorithm
• Algebraic notation:
• For all $m_{i,j} = 1$ (that is already fixed):
$$\forall x:\ (a^{G}_{i,x} = 1) \Rightarrow \exists y:\ (m_{x,y} \cdot a^{H}_{y,j} = 1)$$
• Any inner node $M$ that does not satisfy this rule is pruned, because none of its descendants is an $M^*$.
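A minimal sketch of the pruned search follows, applying the refinement rule as worded on these slides: it is checked only for rows that are already fixed, and a node that fails it is abandoned. Rows are assumed to be fixed top to bottom, vertices are 0-based, and the names `complies`, `criterion` and `search` are hypothetical.

```python
def complies(m, a_g, a_h, fixed_rows):
    """For each fixed m[i][j] = 1, every G-neighbour x of i needs a column y
    with m[x][y] = 1 and y adjacent to j in H."""
    n_g, n_h = len(m), len(m[0])
    for i in fixed_rows:
        j = m[i].index(1)
        for x in range(n_g):
            if a_g[i][x] == 1 and not any(
                    m[x][y] == 1 and a_h[y][j] == 1 for y in range(n_h)):
                return False
    return True

def criterion(m, a_g, a_h):
    """(a_G[i][j] = 1) => (p[i][j] = 1), with P = M' (M' A_H)^T."""
    n_g, n_h = len(m), len(m[0])
    mb = [[sum(m[i][k] * a_h[k][j] for k in range(n_h)) for j in range(n_h)]
          for i in range(n_g)]                       # M' A_H
    p = [[sum(m[i][k] * mb[j][k] for k in range(n_h)) for j in range(n_g)]
         for i in range(n_g)]                        # M' (M' A_H)^T
    return all(a_g[i][j] == 0 or p[i][j] == 1
               for i in range(n_g) for j in range(n_g))

def search(m, a_g, a_h, row=0, used=frozenset()):
    """Depth-first search from node m; returns a matching M' or None."""
    n_g, n_h = len(m), len(m[0])
    if not complies(m, a_g, a_h, range(row)):
        return None                                  # pruned inner node
    if row == n_g:
        return m if criterion(m, a_g, a_h) else None
    for j in range(n_h):
        if m[row][j] == 1 and j not in used:
            child = [r[:] for r in m]
            child[row] = [1 if y == j else 0 for y in range(n_h)]
            found = search(child, a_g, a_h, row + 1, used | {j})
            if found is not None:
                return found
    return None

A_G = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
A_H = [[0, 0, 1, 0], [0, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
M0 = [[0, 0, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
print(search(M0, A_G, A_H))

# A triangle cannot be found in H; its M(0) allows only the degree-3 column.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
M0_triangle = [[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
print(search(M0_triangle, triangle, A_H))
```

In the triangle case the first child is already pruned: once the first row is fixed to the third column, the neighbouring rows offer only that same column, which is not adjacent to itself in $H$.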