Rule Induction with Extension Matrices Dr. Xindong Wu
Journal of the American Society for Information Science, Vol. 49, No. 5, 1998
Presented by Peter Duval
[Concept map: EM, NEM, PE, NE, EMD, path, inevitable selector, eliminable selector, redundant selector, HFL, HCV, MFL, MCV, S1 Fast Strategy, S2 Precedence Strategy, S3 Elimination Strategy, S4 Least Frequency Selector, persistent, intersecting group]
Context
• HFL/HCV offers a rule-induction alternative to decision trees.
• HCV can be used as a benchmark for rule induction.
Context
• This paper condenses Dr. Wu’s Ph.D. dissertation on the extension matrix approach and HFL/HCV algorithm.
• See the work of J. Wong and R. S. Michalski at the University of Illinois, which led to HFL/HCV.
• HFL/HCV appears to be under-cited in the literature.
Overview
1. Represent the negative training data as row vectors in a matrix.
2. Process the positive examples one at a time; for each, eliminate the elements of the negative examples that fail to distinguish them from that positive example.
3. Read conjunctive rules from the resulting matrix.
4. Simplify and clean up the rules.
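Positive and Negative Examples
• A positive example (PE):
$$e_k = (v_1^k, \ldots, v_a^k) = (\text{overcast}, \text{mild}, \text{high}, \text{windy}) \quad \text{Play}$$
• A negative example (NE):
$$e_k = (v_1^k, \ldots, v_a^k) = (\text{rainy}, \text{hot}, \text{high}, \text{windy}) \quad \text{Don't Play}$$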
Negative example matrix (NEM)
Gather the negative examples as row vectors in the NEM:
rainy hot high windy
rainy cool normal windy
sunny hot normal windy
sunny mild high windy
Positive Example (PE) Against NEM
A positive example written as a row vector:
rainy hot high windy
rainy cool normal windy
sunny hot normal windy
sunny mild high windy
overcast mild high windy
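The comparison above can be sketched in a few lines of Python. This is a minimal illustration, assuming each example is a tuple of attribute values in a fixed column order; the helper name `match_mask` is hypothetical. For every NEM row it marks which columns share the positive example's value.

```python
# Negative examples (Don't_Play), one NEM row each, in a fixed column order.
NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]

# The positive example (Play) written as a row vector.
PE = ("overcast", "mild", "high", "windy")


def match_mask(pe, nem):
    """For each NEM row, return a tuple of booleans that is True wherever
    the row has the same value as the positive example in that column."""
    return [tuple(n == p for n, p in zip(row, pe)) for row in nem]


for row, mask in zip(NEM, match_mask(PE, NEM)):
    print(row, mask)
```

The True positions are exactly the elements the next step deletes.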
Extension Matrices
• Delete every NEM element that matches the positive example's value in the same column.
rainy hot high windy
rainy cool normal windy
sunny hot normal windy
sunny mild high windy
overcast mild high windy
Extension Matrices
• We construct one Extension Matrix per Positive Example
rainy hot * *
rainy cool normal *
sunny hot normal *
sunny * * *
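A minimal Python sketch of building one extension matrix per positive example, assuming `None` stands in for the dead element `*` and examples are tuples in a fixed column order; `extension_matrix` and `show` are hypothetical helper names, not part of HCV. The three positive examples are the ones used on these slides.

```python
# Dead elements ('*' on the slides) are represented here as None.
DEAD = None

# Negative examples (Don't_Play), one NEM row each.
NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]

# Positive examples (Play) used on the slides.
POSITIVES = [
    ("overcast", "mild", "high", "windy"),
    ("overcast", "mild", "normal", "calm"),
    ("rainy", "hot", "high", "calm"),
]


def extension_matrix(pe, nem):
    """Copy the NEM, replacing every element that equals the positive
    example's value in the same column with a dead element."""
    return [tuple(DEAD if n == p else n for n, p in zip(row, pe)) for row in nem]


def show(matrix):
    """Print a matrix using '*' for dead elements, as on the slides."""
    for row in matrix:
        print(" ".join("*" if v is DEAD else v for v in row))


for k, pe in enumerate(POSITIVES, start=1):
    print(f"EM{k} for PE {' '.join(pe)}:")
    show(extension_matrix(pe, NEM))
```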
Extension Matrices
Let’s make a second extension matrix:
rainy hot high windy
rainy cool normal windy
sunny hot normal windy
sunny mild high windy
overcast mild normal calm
Extension Matrices
The second extension matrix:
rainy hot high windy
rainy cool * windy
sunny hot * windy
sunny * high windy
Extension Matrices
Finally let’s make a third extension matrix:
rainy hot high windy
rainy cool normal windy
sunny hot normal windy
sunny mild high windy
rainy hot high calm
Extension Matrices
The third extension matrix:
* * * windy
* cool normal windy
sunny * normal windy
sunny mild * windy
Dead Elements
• Dead elements, *, take the place of attribute values that fail to distinguish the negative example from the corresponding positive example.
rainy hot high calm
* * * windy
* cool normal windy
sunny * normal windy
sunny mild * windy
Matrix Disjunction (EMD)
• If any of the extension matrices has a dead element in a given position, the EMD also has a dead element in that position.
EM1 | EM2 | EM3
rainy hot * * | rainy hot high windy | * * * windy
rainy cool normal * | rainy cool * windy | * cool normal windy
sunny hot normal * | sunny hot * windy | sunny * normal windy
sunny * * * | sunny * high windy | sunny mild * windy
“OR” the dead elements
EMD of EM1 and EM2:
rainy hot * *
rainy cool * *
sunny hot * *
sunny * * *
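The disjunction rule can be sketched in Python as follows. This is a minimal illustration under the same assumptions as before (`None` marks a dead element; `extension_matrix` and `emd` are hypothetical helpers). It ORs the dead elements of the first two extension matrices.

```python
DEAD = None  # stands in for the dead element '*'

NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
PE1 = ("overcast", "mild", "high", "windy")
PE2 = ("overcast", "mild", "normal", "calm")


def extension_matrix(pe, nem):
    """Replace NEM elements that match the positive example with DEAD."""
    return [tuple(DEAD if n == p else n for n, p in zip(row, pe)) for row in nem]


def emd(*ems):
    """Extension matrix disjunction: a position is dead if it is dead in any
    of the extension matrices. A non-dead position holds the original NEM
    value, which is identical in every matrix, so the first copy is kept."""
    result = []
    for rows in zip(*ems):  # the same NEM row taken from each matrix
        result.append(tuple(
            DEAD if any(v is DEAD for v in column) else column[0]
            for column in zip(*rows)
        ))
    return result


for row in emd(extension_matrix(PE1, NEM), extension_matrix(PE2, NEM)):
    print(" ".join("*" if v is DEAD else v for v in row))
```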
Partitions
• If adding an extension matrix to the current EMD would create a dead row (a row made entirely of dead elements), start a new EMD.
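EM1 | EM2 | EM3
rainy hot * * | rainy hot high windy | * * * windy
rainy cool normal * | rainy cool * windy | * cool normal windy
sunny hot normal * | sunny hot * windy | sunny * normal windy
sunny * * * | sunny * high windy | sunny mild * windy
EM1 OR EM2 OR EM3 (the first row is dead):
* * * *
* cool * *
sunny * * *
sunny * * *
Partition 1 (EM1, EM2) | Partition 2 (EM3)
…
EMD of Partition 1:
rainy hot * *
rainy cool * *
sunny hot * *
sunny * * *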
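A greedy Python sketch of the partitioning rule, assuming positive examples are processed in input order and each new extension matrix is compared only with the most recent EMD. That ordering is an assumption of this sketch; HCV's own grouping into intersecting groups may differ. `partition`, `emd`, and `has_dead_row` are hypothetical helper names.

```python
DEAD = None  # stands in for the dead element '*'

NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
POSITIVES = [
    ("overcast", "mild", "high", "windy"),
    ("overcast", "mild", "normal", "calm"),
    ("rainy", "hot", "high", "calm"),
]


def extension_matrix(pe, nem):
    """Replace NEM elements that match the positive example with DEAD."""
    return [tuple(DEAD if n == p else n for n, p in zip(row, pe)) for row in nem]


def emd(a, b):
    """OR the dead elements of two matrices of the same shape."""
    return [tuple(DEAD if x is DEAD or y is DEAD else x for x, y in zip(ra, rb))
            for ra, rb in zip(a, b)]


def has_dead_row(matrix):
    """True if some row consists only of dead elements."""
    return any(all(v is DEAD for v in row) for row in matrix)


def partition(positives, nem):
    """OR each new extension matrix into the current EMD unless that would
    create a dead row; in that case start a new EMD (a new partition).
    Returns a list of (indices of positive examples, EMD) pairs."""
    groups = []
    for i, pe in enumerate(positives):
        em = extension_matrix(pe, nem)
        if groups:
            members, current = groups[-1]
            merged = emd(current, em)
            if not has_dead_row(merged):
                groups[-1] = (members + [i], merged)
                continue
        groups.append(([i], em))
    return groups


for members, _ in partition(POSITIVES, NEM):
    print("partition with PEs", [m + 1 for m in members])
```

On the slide data this groups PE1 and PE2 together and puts PE3 in a second partition, because OR-ing EM3 into the first EMD kills the first row.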
Matrix Disjunction (EMD)
• Let’s construct the EMD using just the first two extension matrices.
EM1 | EM2
rainy hot * * | rainy hot high windy
rainy cool normal * | rainy cool * windy
sunny hot normal * | sunny hot * windy
sunny * * * | sunny * high windy
“OR” the dead elements
EMD:
rainy hot * *
rainy cool * *
sunny hot * *
sunny * * *
Matrix Disjunction (EMD)
• The EMD strips out much of the superfluous information: 9 of its 16 elements are dead, versus 7 in EM1 and 3 in EM2.
EM1 | EM2
rainy hot * * | rainy hot high windy
rainy cool normal * | rainy cool * windy
sunny hot normal * | sunny hot * windy
sunny * * * | sunny * high windy
“OR” the dead elements
EMD:
rainy hot * *
rainy cool * *
sunny hot * *
sunny * * *
Paths
• Choose one non-dead element from each row. This is called a path.
• We can create paths in EMs and EMDs.
rainy hot * *
rainy cool * *
sunny hot * *
sunny * * *
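Enumerating every path is a Cartesian product over the non-dead positions of each row. A minimal Python sketch, assuming `None` marks a dead element; `paths` is a hypothetical helper name, and `itertools.product` is from the standard library.

```python
from itertools import product

# EMD of the first two extension matrices; None marks a dead element.
EMD = [
    ("rainy", "hot", None, None),
    ("rainy", "cool", None, None),
    ("sunny", "hot", None, None),
    ("sunny", None, None, None),
]


def paths(matrix):
    """Yield every path as a tuple of column indices, one non-dead element
    per row. A dead row offers no choice, so then no path exists at all."""
    choices = [[j for j, v in enumerate(row) if v is not None] for row in matrix]
    yield from product(*choices)


all_paths = list(paths(EMD))
print(len(all_paths))  # 8
for path in all_paths[:2]:
    print(path, [row[j] for row, j in zip(EMD, path)])
```

Three rows offer two choices and the last row only one, hence 2 × 2 × 2 × 1 = 8 paths.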
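Path Cover ≡ Conjunctive Formula
• The path corresponds to a conjunctive formula expressed in variable-valued logic. Taking the Outlook element from every row gives:
$$[\text{Outlook} \neq \{\text{rainy}, \text{sunny}\}]$$
which is satisfied by the Play examples of this partition and by no Don't_Play example.
rainy hot * *
rainy cool * *
sunny hot * *
sunny * * *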
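Path = Cover ≡ Conjunctive Formula
Taking Temperature from the first two rows and Outlook from the last two gives:
$$[\text{Outlook} \neq \text{sunny}] \wedge [\text{Temperature} \neq \{\text{hot}, \text{cool}\}]$$
which covers the same Play examples and again excludes every Don't_Play example.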
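Reading a formula off a path amounts to collecting, per column, the NEM values the path passes through and excluding them. A minimal Python sketch, assuming the attribute names below (the slides show only values) and `None` for dead elements; `path_to_formula`, `covers`, and `render` are hypothetical helpers, and the printed notation only imitates variable-valued logic.

```python
ATTRIBUTES = ("outlook", "temperature", "humidity", "wind")  # assumed names

NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
# Positive examples of the first partition (PE1, PE2).
PARTITION_1 = [
    ("overcast", "mild", "high", "windy"),
    ("overcast", "mild", "normal", "calm"),
]
# EMD of the first two extension matrices (None = dead element).
EMD = [
    ("rainy", "hot", None, None),
    ("rainy", "cool", None, None),
    ("sunny", "hot", None, None),
    ("sunny", None, None, None),
]


def path_to_formula(matrix, path):
    """Turn a path (one column index per row) into a conjunction of
    selectors, stored as {column: set of excluded values}."""
    formula = {}
    for row, j in zip(matrix, path):
        if row[j] is None:
            raise ValueError("a path may only use non-dead elements")
        formula.setdefault(j, set()).add(row[j])
    return formula


def covers(formula, example):
    """An example satisfies the conjunction if it avoids every excluded value."""
    return all(example[j] not in values for j, values in formula.items())


def render(formula):
    parts = []
    for j, values in sorted(formula.items()):
        parts.append("[%s != {%s}]" % (ATTRIBUTES[j], ", ".join(sorted(values))))
    return " ^ ".join(parts)


for path in [(0, 0, 0, 0), (1, 1, 0, 0)]:
    f = path_to_formula(EMD, path)
    print(render(f),
          "covers partition:", all(covers(f, pe) for pe in PARTITION_1),
          "covers a negative:", any(covers(f, ne) for ne in NEM))
```

Every negative row contributes at least one excluded value, so no negative example can satisfy the result. Every positive in the partition differs from the NEM at all non-dead EMD positions, so each positive satisfies it.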
HFL
Wu developed HFL to find good rules. HFL applies four strategies to find a compact disjunction of conjunctions:
1. Fast
2. Precedence
3. Elimination
4. Least Frequency
[Figure: example extension matrix over binary attributes X1 to X4, with * for dead elements; cell positions lost in transcription]
HFL Strategies: Fast
• The selector [X3 ≠ 1] covers all the negative examples (every row of the matrix).
• Therefore [X3 ≠ 1] ⇒ positive class, and processing can stop.
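[Figure: extension matrix over X1 to X3 in which every row has a non-dead 1 in column X3; cell positions lost in transcription]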
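The Fast strategy looks for a single column that is non-dead in every row. A minimal Python sketch, assuming `None` marks a dead element and columns are 0-indexed (so X3 is index 2); `fast_strategy` and the small binary matrix are hypothetical illustrations, since the slide's actual cells were lost.

```python
def fast_strategy(matrix):
    """Return (column, excluded values) for the first column that is
    non-dead in every row, i.e. one selector covering all rows;
    return None when no such column exists."""
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        if all(v is not None for v in column):
            return j, set(column)
    return None


# Hypothetical binary matrix over X1, X2, X3 where X3 is live in every row.
BINARY = [
    (1, None, 1),
    (None, 0, 1),
    (1, 0, 1),
]
print(fast_strategy(BINARY))  # column 2 (X3), values {1}: [X3 != 1]

# The golf EMD from the earlier slides: Outlook is live in every row.
EMD = [
    ("rainy", "hot", None, None),
    ("rainy", "cool", None, None),
    ("sunny", "hot", None, None),
    ("sunny", None, None, None),
]
print(fast_strategy(EMD))
```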
HFL Strategies: Precedence
• [X1 ≠ 1] and [X3 ≠ 1] are inevitable selectors: each is the only non-dead element of some row.
• Record the conjunction and label the rows it covers as covered.
• Below, a path is formed. All rows are covered, so we are done.
$$[X_1 \neq 1] \wedge [X_3 \neq 1] \Rightarrow \text{Positive\_Class}$$
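A minimal Python sketch of the Precedence strategy, assuming a selector is identified with its column and covers every row where that column is non-dead (`None` marks a dead element). `precedence_strategy` and the binary matrix are hypothetical; the matrix is constructed so that X1 and X3 (indices 0 and 2) come out as inevitable, mirroring the slide.

```python
def precedence_strategy(matrix):
    """Find inevitable selectors (columns holding the only non-dead element
    of some row) and the set of row indices those selectors cover."""
    inevitable = set()
    for row in matrix:
        live = [j for j, v in enumerate(row) if v is not None]
        if len(live) == 1:
            inevitable.add(live[0])
    covered = {i for i, row in enumerate(matrix)
               if any(row[j] is not None for j in inevitable)}
    return sorted(inevitable), covered


# Hypothetical binary matrix over X1, X2, X3 (None = dead element).
M = [
    (1, None, None),   # only X1 is live: X1 is inevitable
    (None, None, 1),   # only X3 is live: X3 is inevitable
    (1, 0, None),
    (None, 0, 1),
]
print(precedence_strategy(M))  # X1 and X3 together cover all four rows
```

When the covered set contains every row, the inevitable selectors already form a path and HFL can stop.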
HFL Strategies: Elimination
• Redundant selectors in attribute X2 can be eliminated because non-dead X3 values cover all of the rows covered by X2.
• All elements in column X2 become dead elements.
[Figure: extension matrix over X1 to X4 before elimination, the same matrix with column X2 made dead, and a reduced matrix over X1 to X3; cell positions lost in transcription]
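A minimal Python sketch of the Elimination strategy, assuming a column is redundant when the rows it covers are a subset of the rows covered by another still-live column (`None` marks a dead element). The left-to-right order and the tie-breaking are assumptions of this sketch; `eliminate_redundant` and the matrix are hypothetical, built so that X2 is eliminated because X3 covers its rows, as on the slide.

```python
def covered_rows(matrix, j):
    """Row indices where column j holds a non-dead element."""
    return {i for i, row in enumerate(matrix) if row[j] is not None}


def eliminate_redundant(matrix):
    """Make a column dead when every row it covers is also covered by another
    live column. Columns are checked left to right, and an eliminated column
    cannot justify eliminating another, so of two identical columns one stays.
    This never creates a dead row: the other column still covers those rows."""
    alive = [j for j in range(len(matrix[0])) if covered_rows(matrix, j)]
    eliminated = []
    for j in list(alive):
        rows_j = covered_rows(matrix, j)
        if any(k != j and rows_j <= covered_rows(matrix, k) for k in alive):
            alive.remove(j)
            eliminated.append(j)
    reduced = [tuple(None if j in eliminated else v for j, v in enumerate(row))
               for row in matrix]
    return reduced, eliminated


# Hypothetical binary matrix over X1, X2, X3: X2 covers rows {0, 1},
# which X3 (rows {0, 1, 3}) also covers.
M = [
    (1, 0, 1),
    (None, 0, 1),
    (1, None, None),
    (None, None, 1),
]
reduced, eliminated = eliminate_redundant(M)
print("eliminated columns:", eliminated)
for row in reduced:
    print(row)
```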
HFL Strategies: Least Frequency
• Attribute X1 selectors are least frequent and can be eliminated.
• Other strategies must be applied before applying Least Frequency again.
[Figure: extension matrix over X1 to X3 illustrating the Least Frequency strategy; cell positions lost in transcription]
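A minimal Python sketch of the Least Frequency strategy: make dead the live column whose non-dead elements cover the fewest rows. The guard that skips any column whose removal would create a dead row is an assumption added here, not something stated on the slide; `least_frequency` and the matrix are hypothetical, built so that X1 (index 0) is the least frequent column.

```python
def covered_rows(matrix, j):
    """Row indices where column j holds a non-dead element."""
    return {i for i, row in enumerate(matrix) if row[j] is not None}


def least_frequency(matrix):
    """Make dead the live column covering the fewest rows. Assumption: skip
    columns that are the only non-dead element of some row, since removing
    them would leave a dead row. Returns (reduced matrix, column or None)."""
    n_cols = len(matrix[0])
    candidates = []
    for j in range(n_cols):
        rows_j = covered_rows(matrix, j)
        if not rows_j:
            continue  # column is already entirely dead
        safe = all(
            any(row[k] is not None for k in range(n_cols) if k != j)
            for i, row in enumerate(matrix) if i in rows_j
        )
        if safe:
            candidates.append(j)
    if not candidates:
        return matrix, None
    j_min = min(candidates, key=lambda j: len(covered_rows(matrix, j)))
    reduced = [tuple(None if k == j_min else v for k, v in enumerate(row))
               for row in matrix]
    return reduced, j_min


# Hypothetical binary matrix over X1, X2, X3 (None = dead element).
M = [
    (1, 0, None),
    (None, 0, 1),
    (None, 0, 1),
    (None, None, 1),
]
reduced, dropped = least_frequency(M)
print("dropped column:", dropped)
```

As the slide notes, the other strategies would then be rerun on the reduced matrix before Least Frequency is applied again.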
HCV Algorithm
• HCV improves on HFL:
1. Partition the positive examples into intersecting groups.
2. Apply HFL to each partition.
3. OR the conjunctive formulae from each partition.
Well described in: http://www.cs.uvm.edu/~xwu/Publication/JASIS.ps
See Wu’s 1993 Ph.D. dissertation for more background: http://www.era.lib.ed.ac.uk/bitstream/1842/581/3/1993-xindongw.pdf
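The three HCV steps can be strung together in a short Python sketch on the slide data. This is a simplified stand-in, not Wu's implementation: partitioning is greedy in input order, and step 2 replaces HFL's four strategies with a plain greedy cover that repeatedly picks the column covering the most uncovered rows. Attribute names and all helper names are assumptions, and the output is not expected to match the HCV run shown later.

```python
DEAD = None
ATTRIBUTES = ("outlook", "temperature", "humidity", "wind")  # assumed names

NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
POSITIVES = [
    ("overcast", "mild", "high", "windy"),
    ("overcast", "mild", "normal", "calm"),
    ("rainy", "hot", "high", "calm"),
]


def extension_matrix(pe, nem):
    return [tuple(DEAD if n == p else n for n, p in zip(row, pe)) for row in nem]


def emd(a, b):
    return [tuple(DEAD if x is DEAD or y is DEAD else x for x, y in zip(ra, rb))
            for ra, rb in zip(a, b)]


def has_dead_row(matrix):
    return any(all(v is DEAD for v in row) for row in matrix)


def partitions(positives, nem):
    """Step 1 (greedy, order-dependent): one EMD per partition."""
    result = []
    for pe in positives:
        em = extension_matrix(pe, nem)
        if result and not has_dead_row(emd(result[-1], em)):
            result[-1] = emd(result[-1], em)
        else:
            result.append(em)
    return result


def cover(matrix):
    """Step 2 stand-in for HFL: pick the column covering the most uncovered
    rows and exclude all of that column's non-dead values."""
    uncovered = set(range(len(matrix)))
    formula = {}
    while uncovered:
        best = max(range(len(matrix[0])),
                   key=lambda j: sum(matrix[i][j] is not DEAD for i in uncovered))
        newly = {i for i in uncovered if matrix[i][best] is not DEAD}
        if not newly:
            raise ValueError("dead row: this partition cannot be covered")
        formula[best] = {row[best] for row in matrix if row[best] is not DEAD}
        uncovered -= newly
    return formula


def render(formula):
    return " ^ ".join("[%s != {%s}]" % (ATTRIBUTES[j], ", ".join(sorted(v)))
                      for j, v in sorted(formula.items()))


# Step 3: OR the conjunctive formulae of all partitions.
rules = [cover(m) for m in partitions(POSITIVES, NEM)]
print(" OR ".join(render(r) for r in rules), "--> Play")
```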
HCV Software
• Features many refinements and switches.
• Works with C4.5 data.
• Can be run through a web interface: HCV Online Interface.
• Is described in Appendix A of Wu’s textbook, and online: HCV Manual.
Golf
Rules for the 'Play' class (Covering 3 examples):
The 1st conjunctive rule:
[ temperature != { cool } ] ^ [ outlook != { sunny } ] --> the 'Play' class (Positive examples covered: 3)
Rules for the 'Don't_Play' class (Covering 4 examples):
The 2nd conjunctive rule:
[ outlook != { overcast } ] ^ [ wind = { windy } ] --> the 'Don't_Play' class (Positive examples covered: 4)
The total number of conjunctive rules is: 2
The default class is: 'Don't_Play' (Examples in class: 4)
Time taken for induction (seconds): 0.0 (real), 0.0 (user), 0.0 (system)
Rule file or preprocessed test file not found. Skipping deduction
HCV
• HCV is competitive with decision-tree and other rule-producing algorithms.
• HCV generally produces more compact rules.
• HCV outputs variable-valued logic.
• HCV handles noise and discretization.
• HCV guarantees a “conjunctive rule for a concept”.
Ideas
• Can HFL/HCV be applied to chess? Bratko did this with ID3. [Crevier 1993, 177]
• How can HCV be parallelized?
• How does the extension matrix approach work in closed-world situations?
• Is HCV 2.0 a good candidate for automated parameter tuning by a genetic algorithm or another evolutionary technique?
Exam Questions
• Definitions:
Extension Matrix: a matrix of negative examples as row vectors in which, for a given positive example, the elements that match the positive example are replaced with dead elements, denoted ‘*’.
Dead Element: an element of a negative example that cannot be used to distinguish a given positive example from the negative example.
Path: a set of non-dead elements, one from each row of an extension matrix.
Exam Questions
• Four stages of HFL:
1. Fast: find a single selector that covers all rows.
2. Precedence: favor selectors that are the only non-dead element of a row (inevitable selectors).
3. Elimination: remove redundant selectors.
4. Least Frequency: eliminate the column whose non-dead values cover the fewest rows.
See slides labeled “HFL Strategies”