The Greedy Prepend Algorithm for Decision List Induction
Deniz Yuret
Michael de la Maza
Overview
• Decision Lists
• Greedy Prepend Algorithm
• Opus search and UCI problems
• Version space search and secondary structure prediction
• Limited look-ahead search and Turkish morphology disambiguation
Introduction to Decision Lists
• Prototypical machine learning problem:
– Decide democrat or republican for 435 representatives based on 16 votes.
Class Name: 2 (democrat, republican)
1. handicapped-infants: 2 (y,n)
2. water-project-cost-sharing: 2 (y,n)
3. adoption-of-the-budget-resolution: 2 (y,n)
4. physician-fee-freeze: 2 (y,n)
5. el-salvador-aid: 2 (y,n)
6. religious-groups-in-schools: 2 (y,n)
…
16. export-administration-act-south-africa: 2 (y,n)
Introduction to Decision Lists
• Prototypical machine learning problem:
– Decide democrat or republican for 435 representatives based on 16 votes.
1. If adoption-of-the-budget-resolution = y and anti-satellite-test-ban = n and water-project-cost-sharing = y then democrat
2. If physician-fee-freeze = y then republican
3. If TRUE then democrat
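A decision list is evaluated top to bottom, and the first rule whose conditions hold decides the class. The sketch below encodes the three rules from the slide; the data layout (a dict of votes, rules as condition/class pairs) and the name `classify` are illustrative assumptions, not taken from the talk.

```python
# Each rule is (conditions, class). An empty condition dict plays the
# role of "If TRUE" and therefore acts as the default rule.
DECISION_LIST = [
    ({"adoption-of-the-budget-resolution": "y",
      "anti-satellite-test-ban": "n",
      "water-project-cost-sharing": "y"}, "democrat"),
    ({"physician-fee-freeze": "y"}, "republican"),
    ({}, "democrat"),
]


def classify(votes, decision_list=DECISION_LIST):
    """Return the class of the first rule whose conditions all hold."""
    for conditions, label in decision_list:
        if all(votes.get(attr) == val for attr, val in conditions.items()):
            return label
    raise ValueError("decision list has no default rule")


if __name__ == "__main__":
    # Hypothetical representative, only one vote recorded.
    print(classify({"physician-fee-freeze": "y"}))
```

Because order matters, a representative who matches both rule 1 and rule 2 is labelled by rule 1.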
Alternative Representations
• Decision trees:
Alternative Representations
• CNF:
• DNF:
Alternative Representations
• For 0 < k < n and n > 2, k-CNF(n) ∪ k-DNF(n) is a subset of k-DL(n)
• For 0 < k < n and n > 2, k-DT(n) is a subset of k-CNF(n) ∩ k-DNF(n)
• k-DT(n) is a subset of k-DL(n)
(Rivest 1987)
Overview
• Decision Lists
• Greedy Prepend Algorithm
• Opus search and UCI problems
• Version space search and secondary structure prediction
• Limited look-ahead search and Turkish morphology disambiguation
Decision List Induction
• Start with an empty decision list or a default rule.
• Keep adding the best rule that covers the unclassified and misclassified cases.
Design Decisions:
• Where to add the new rules (front, back)
• Criteria for best rule
• Search algorithm for best rule
The Greedy Prepend Algorithm
GPA(data)
  dlist = NIL
  default-class = most-common-class(data)
  rule = [ if true then default-class ]
  while gain(rule, dlist, data) > 0
    do dlist = prepend(rule, dlist)
       rule = max-gain-rule(dlist, data)
  return dlist
The Greedy Prepend Algorithm
• Starts with a default rule that picks the most common class
• Prepends subsequent rules to the front of the decision list
• The best rule is the one with maximum gain (increase in number of correctly classified instances)
• Several search algorithms implemented
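The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: instances are (attribute dict, class) pairs, a rule is a (conditions, class) pair, and `max_gain_rule` does a brute-force search over conjunctions of at most `max_conds` attribute=value tests. That brute-force search is a stand-in chosen for brevity; it is not one of the search algorithms used in the talk.

```python
from collections import Counter
from itertools import combinations


def matches(conds, x):
    return all(x.get(attr) == val for attr, val in conds)


def predict(dlist, x):
    """First matching rule wins; None if no rule matches."""
    for conds, cls in dlist:
        if matches(conds, x):
            return cls
    return None


def num_correct(dlist, data):
    return sum(predict(dlist, x) == y for x, y in data)


def gain(rule, dlist, data):
    """Increase in correctly classified instances if rule is prepended."""
    return num_correct([rule] + dlist, data) - num_correct(dlist, data)


def max_gain_rule(dlist, data, max_conds=2):
    # Assumption: exhaustive search over small conjunctions.
    literals = sorted({(a, v) for x, _ in data for a, v in x.items()})
    classes = sorted({y for _, y in data})
    best, best_gain = None, float("-inf")
    for k in range(1, max_conds + 1):
        for conds in combinations(literals, k):
            for cls in classes:
                g = gain((conds, cls), dlist, data)
                if g > best_gain:
                    best, best_gain = (conds, cls), g
    return best


def gpa(data, max_conds=2):
    dlist = []
    default_class = Counter(y for _, y in data).most_common(1)[0][0]
    rule = ((), default_class)  # "if true then default-class"
    while rule is not None and gain(rule, dlist, data) > 0:
        dlist = [rule] + dlist  # prepend
        rule = max_gain_rule(dlist, data, max_conds)
    return dlist
```

Each prepended rule strictly increases the number of correctly classified training instances, so the loop terminates after at most as many iterations as there are instances.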
Rule Search
• The default rule predicts all instances to belong to the most common category
[Figure: the training set (+ and - instances), partitioned with respect to the base rule into correct assignments and false assignments]
Rule Search
• At each step add the maximum gain rule
[Figure: the training set (+ and - instances), partitioned with respect to the decision list and with respect to the next rule]
Overview
• Decision Lists
• Greedy Prepend Algorithm
• Opus search and UCI problems
• Version space search and secondary structure prediction
• Limited look-ahead search and Turkish morphology disambiguation
Opus Search: Simple tree
Opus Search: Fixed order tree
Opus Search: Optimal pruning
GPA-Opus on UCI Problems
Overview
• Decision Lists
• Greedy Prepend Algorithm
• Opus search and UCI problems
• Version space search and secondary structure prediction
• Limited look-ahead search and Turkish morphology disambiguation
A Generic Prediction Algorithm: Sequence to Structure
[Animation frames: the sequence is shown once, followed by the predicted structure string at each step; ? marks positions not yet predicted]
MRRWFHPNITGVEAENLLLTRGVDGSFLARPSKSNPGD
??????????????????????????????????????
-?????????????????????????????????????
--????????????????????????????????????
---???????????????????????????????????
----H-----????????????????????????????
----H-----H???????????????????????????
----H-----HH??????????????????????????
----H-----HHHHHHHHHH------EEEEE------?
----H-----HHHHHHHHHH------EEEEE-------
GPA Rules
• The first three rules of the sequence-to-structure decision list: 58.86% performance (of 66.36% for the full list)
GPA Rule 1
• Everything => Loop
GPA Rule 2
HELIX
L4 L3 L2 L1 0 R1 R2 R3 R4
* * !GLY !GLY !ASN !GLY !PRO !PRO !PRO
!PRO !GLY !PRO
!PRO
!SER
(Non-polar or large)
GPA Rule 3
STRAND
L4 L3 L2 L1 0 R1 R2 R3 R4
!LEU !ALA !ASP !ALA CYS !PRO !ARG !LEU !LEU
!LEU
!GLN
!ASP ILE !GLN !MET !MET
!GLU
!GLY LEU !GLU
!PRO PHE !LYS
TRP !PRO
TYR
(Non-Polar and Not Charged)
VAL
(Non-polar)
GPA-Opus not feasible for secondary structure prediction
• 9 positions
• 20 possible amino-acids per position
• Size of rule space:
– With only pos=val type attributes: 21^9
– If we include disjunctions: 2^180
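The two counts can be reproduced with a few lines of arithmetic. The reading assumed here is that 21 stands for the 20 amino acids plus an "unconstrained" option at each of the 9 positions, and that 2^180 comes from allowing any subset of the 20 amino acids at each position (2^20 choices per position, 9 positions). The variable names are illustrative.

```python
POSITIONS = 9     # window positions L4 .. R4
AMINO_ACIDS = 20  # possible values per position

# pos=val rules: each position is one amino acid or left unconstrained.
single_value_rules = (AMINO_ACIDS + 1) ** POSITIONS

# Disjunctive rules: each position accepts any subset of amino acids.
disjunctive_rules = (2 ** AMINO_ACIDS) ** POSITIONS

if __name__ == "__main__":
    print(f"pos=val rules:     {single_value_rules:.3e}")
    print(f"disjunctive rules: {disjunctive_rules:.3e}")
```

Either space is far too large to enumerate, which is why a different search strategy is needed for this problem.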
GPA Version Space Search
Searching for a candidate rule:
• Pick a random instance
• If the instance is currently misclassified and candidate rule corrects it: generalize candidate rule to include instance
• If the instance is currently correct and candidate rule changes classification: specialize candidate rule to exclude instance
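One way to picture these steps is the sketch below. It assumes a candidate rule is a list of allowed-value sets, one per window position, and it fills in details the slide leaves open: how to generalize (add the instance's value at every position) and how to specialize (drop the instance's value at the position with the most allowed values). Those choices, and all function names, are hypothetical.

```python
import random


def covers(rule, x):
    """rule: list of allowed-value sets, one per position."""
    return all(v in allowed for allowed, v in zip(rule, x))


def generalize(rule, x):
    """Widen every position just enough to cover instance x."""
    return [allowed | {v} for allowed, v in zip(rule, x)]


def specialize(rule, x):
    """Exclude a covered x by removing its value at one position.

    Assumption: the position with the most allowed values is narrowed.
    """
    i = max(range(len(rule)), key=lambda j: len(rule[j]))
    return [allowed - {v} if j == i else set(allowed)
            for j, (allowed, v) in enumerate(zip(rule, x))]


def search_step(rule, rule_class, x, y, current_pred):
    if current_pred != y and rule_class == y:
        return generalize(rule, x)  # rule would correct x
    if current_pred == y and rule_class != y and covers(rule, x):
        return specialize(rule, x)  # rule would break x
    return rule


def search(seed, rule_class, data, current_preds, steps=100, rng=None):
    """Refine a rule seeded from one instance by random sampling."""
    rng = rng or random.Random(0)
    rule = [{v} for v in seed]
    for _ in range(steps):
        i = rng.randrange(len(data))
        x, y = data[i]
        rule = search_step(rule, rule_class, x, y, current_preds[i])
    return rule
```

Here `current_preds[i]` is the class the existing decision list assigns to instance `i`, so the search only needs one pass over sampled instances rather than an enumeration of the rule space.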
GPA Secondary Structure Prediction Results
• PHD 72.3
• NNSSP 71.7
• GPA 69.2
• DSC 69.1
• Predator 69.0
Overview
• Decision Lists
• Greedy Prepend Algorithm
• Opus search and UCI problems
• Version space search and secondary structure prediction
• Limited look-ahead search and Turkish morphology disambiguation
Morphological Analyzer for Turkish
masalı
• masal+Noun+A3sg+Pnon+Acc (= the story)
• masal+Noun+A3sg+P3sg+Nom (= his story)
• masa+Noun+A3sg+Pnon+Nom^DB+Adj+With (= with tables)
• Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and Linguistic Computing.
• Oflazer, K., Hakkani-Tür, D. Z., and Tür, G. (1999). Design for a Turkish treebank. EACL'99.
• Beesley, K. R. and Karttunen, L. (2003). Finite State Morphology. CSLI Publications.
Features, IGs and Tags
• 126 unique features
• 9129 unique IGs
• ∞ unique tags
• 11084 distinct tags observed in 1M word training corpus
masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
[Figure labels on the example above: stem (masa); features; inflectional groups (IGs): Noun+A3sg+Pnon+Nom and Adj+With; derivational boundary (^DB); tag]
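To make the terminology concrete, the sketch below splits an analysis string into its stem and inflectional groups. It assumes that the stem is everything before the first "+", that "^DB+" separates IGs, and that the tag is the part after the stem; the function name `parse_analysis` is illustrative.

```python
def parse_analysis(analysis):
    """Split 'stem+Feat+...^DB+Feat+...' into (stem, list of IGs).

    Each IG is returned as a list of feature names. The derivational
    boundary marker ^DB is treated as a separator, not as a feature.
    """
    stem, _, tag = analysis.partition("+")
    igs = [ig.split("+") for ig in tag.split("^DB+")]
    return stem, igs


if __name__ == "__main__":
    stem, igs = parse_analysis("masa+Noun+A3sg+Pnon+Nom^DB+Adj+With")
    print(stem)  # the stem
    for ig in igs:
        print(ig)  # one inflectional group per line
```

For the example on the slide this yields the stem masa and two IGs, the first with four features and the second with two.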
Morphological disambiguation
• Task: pick correct parse given context
1. masal+Noun+A3sg+Pnon+Acc
2. masal+Noun+A3sg+P3sg+Nom
3. masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
– Uzun masalı anlat: Tell the long story
– Uzun masalı bitti: His long story ended
– Uzun masalı oda: Room with long table
Morphological disambiguation
• Task: pick correct parse given context
1. masal+Noun+A3sg+Pnon+Acc
2. masal+Noun+A3sg+P3sg+Nom
3. masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
Key Idea
Build a separate classifier for each feature.
GPA on Morphological Disambiguation
1. If (W = çok) and (R1 = +DA)
Then W has +Det
2. If (L1 = pek)
Then W has +Det
3. If (W = +AzI)
Then W does not have +Det
4. If (W = çok)
Then W does not have +Det
5. If TRUE
Then W has +Det
• “pek çok alanda”(R1)
• “pek çok insan”(R2)
• “insan çok daha”(R4)
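The three example phrases can be traced through the +Det decision list with a short sketch. It assumes that the annotations (R1), (R2), (R4) name the rule that fires for the middle word çok, that each position (L1, W, R1) carries a set of attribute values, and that alanda carries the suffix attribute +DA. The attribute sets are written by hand here; no real feature extractor is implied.

```python
# (conditions, has_det): conditions map a window position to a value
# that must be among that position's attributes. {} means "If TRUE".
DET_RULES = [
    ({"W": "çok", "R1": "+DA"}, True),
    ({"L1": "pek"}, True),
    ({"W": "+AzI"}, False),
    ({"W": "çok"}, False),
    ({}, True),
]


def first_matching_rule(attrs, rules=DET_RULES):
    """Return (rule number, has_det) for the first rule that matches."""
    for number, (conds, has_det) in enumerate(rules, start=1):
        if all(val in attrs.get(pos, set()) for pos, val in conds.items()):
            return number, has_det
    raise ValueError("decision list has no default rule")


# Hand-built attribute sets for the word "çok" in each phrase.
EXAMPLES = {
    "pek çok alanda": {"L1": {"pek"}, "W": {"çok"}, "R1": {"alanda", "+DA"}},
    "pek çok insan": {"L1": {"pek"}, "W": {"çok"}, "R1": {"insan"}},
    "insan çok daha": {"L1": {"insan"}, "W": {"çok"}, "R1": {"daha"}},
}

if __name__ == "__main__":
    for phrase, attrs in EXAMPLES.items():
        print(phrase, first_matching_rule(attrs))
```

Under these assumptions the phrases are decided by rules 1, 2 and 4 respectively, which agrees with the annotations on the slide.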
GPA-Opus not feasible
Attributes for a five word window:
• The exact word string (e.g. W=Ali'nin)
• The lowercase version (e.g. W=ali'nin)
• All suffixes (e.g. W=+n, W=+In, W=+nIn, W=+'nIn, etc.)
• Character types (e.g. Ali'nin would be described with W=UPPER-FIRST, W=LOWER-MID, W=APOS-MID, W=LOWERLAST)
Average 40 features per instance.
GPA limited look-ahead search
• New rules are restricted to adding one new feature to existing rules in the decision list
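The restriction can be read as a candidate generator: every candidate is an existing rule plus exactly one extra feature. The sketch below assumes a rule is a (frozenset of features, class) pair, that the default rule has an empty feature set, and that the new rule may predict any class; the slide does not specify the last point, and the function name is illustrative.

```python
def lookahead_candidates(dlist, features, classes):
    """Yield rules that extend some rule in dlist by one new feature."""
    seen = set()
    for conds, _ in dlist:
        for feature in features:
            if feature in conds:
                continue  # must add a feature the rule lacks
            new_conds = frozenset(conds) | {feature}
            for cls in classes:
                candidate = (new_conds, cls)
                if candidate not in seen:
                    seen.add(candidate)
                    yield candidate


if __name__ == "__main__":
    default_only = [(frozenset(), "+Det")]
    for cand in lookahead_candidates(
            default_only, ["W=çok", "L1=pek"], ["+Det", "-Det"]):
        print(sorted(cand[0]), cand[1])
```

With only the default rule in the list, all candidates have a single feature; longer conjunctions become reachable only after a shorter rule has been prepended. The +Det list shown earlier is consistent with this: its first rule extends the W = çok rule by one feature.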
GPA Turkish morphological disambiguation results
• Test corpus: 1000 words, hand tagged
• Accuracy: 95.87% (conf. int: 94.57-97.08)
• Better than the training data !?
Contributions and Future Work
• Established GPA as a competitive alternative to SVMs, C4.5, etc.
• Need theory on why the best-gain rule does well.
• Need to study robustness to irrelevant or redundant attributes.
• Need to speed up the application of the resulting decision lists (convert to FSM?)