Rule Induction with Extension Matrices
Leslie Damon, based on slides by Yuen F. Helbig
Dr. Xindong Wu, 1998
Outline
•Extension matrix approach for rule induction
•The MFL and MCV optimization problems
•The HCV solution
•Noise handling and discretization in HCV
•Comparison of HCV with ID3-like algorithms, including C4.5 and C4.5rules
Attribute-based induction algorithms
Attribute-based induction concentrates on symbolic and heuristic computations
•doesn't require built-in knowledge
Best known are the ID3-like algorithms
•low-order polynomial in time and space
Alternatively, the extension matrix approach
•developed by Hong et al. at the University of Illinois in 1985
•uses the extension matrix as its mathematical basis
A positive example is an example that belongs to the class of interest, say 'Play'.
All the other examples are called negative examples.
Positive and Negative Examples

    e_k+ = (v_1k+, ..., v_ak+)    e.g., (overcast, mild, high, windy) => Play
    e_k− = (v_1k−, ..., v_ak−)    e.g., (rainy, hot, high, windy)    => Don't Play
Negative Example Matrix

The negative example matrix is defined as

    NEM = (e_1−, ..., e_n−)^T = (r_ij)_{n×a}

e.g.,

    [ rainy  hot   high    windy ]
    [ rainy  cool  normal  windy ]
    [ sunny  hot   normal  windy ]
    [ sunny  mild  high    windy ]
Extension Matrix

The extension matrix (EM) of a positive example e_k+ against the NEM is defined as

    EM_k = (r_ij^k)_{n×a},  k = 1, ..., p

where

    r_ij^k = *  (a dead element),  when v_jk+ = NEM_ij
    r_ij^k = NEM_ij,               when v_jk+ ≠ NEM_ij

A dead element cannot be used to distinguish a positive example from negative examples.
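The NEM and EM constructions above are mechanical enough to sketch in code. Below is a minimal Python sketch (not the original HCV implementation; the function names are my own), using the weather data from the slides:

```python
# Minimal sketch: building the NEM and the EM of one positive example.
DEAD = "*"  # dead-element marker

def build_nem(negative_examples):
    """The NEM is just the negative examples stacked row by row."""
    return [list(e) for e in negative_examples]

def build_em(positive, nem):
    """EM_k: an NEM element becomes dead wherever it equals the positive
    example's value for that attribute, since it cannot separate them."""
    return [[DEAD if row[j] == positive[j] else row[j]
             for j in range(len(positive))]
            for row in nem]

nem = build_nem([
    ("rainy", "hot",  "high",   "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot",  "normal", "windy"),
    ("sunny", "mild", "high",   "windy"),
])
em = build_em(("overcast", "mild", "high", "windy"), nem)
for row in em:
    print(row)  # last row: ['sunny', '*', '*', '*']
```

Running this reproduces the example extension matrix shown on the next slides.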
Example Extension Matrix

Negative Example Matrix (NEM):

    [ rainy  hot   high    windy ]
    [ rainy  cool  normal  windy ]
    [ sunny  hot   normal  windy ]
    [ sunny  mild  high    windy ]

Positive Example: [overcast  mild  high  windy]
Example Extension Matrix

Extension Matrix (EM) of the positive example [overcast  mild  high  windy]:

    [ rainy  hot   *       * ]
    [ rainy  cool  normal  * ]
    [ sunny  hot   normal  * ]
    [ sunny  *     *       * ]
Paths in Extension Matrices

A set of n non-dead elements that come from the n different rows is called a path in an extension matrix.

         X1  X2  X3
       ⎛ 1   *   * ⎞
       ⎜ *   0   1 ⎟
       ⎝ 1   0   * ⎠

e.g., {X1=1, X2=0, X1=1} and {X1=1, X3=1, X2=0} are paths in the extension matrix above.
Conjunctive Formulas

A path {r_1j1, ..., r_njn} in the EM_k of positive example k against NEM corresponds to a conjunctive formula or cover

    L = ∧_{i=1..n} [X_ji ≠ r_iji]

Path: {X1=1, X2=0, X1=1}
Formula: [X1≠1] ∧ [X2≠0] ∧ [X1≠1]

Path: {X1=1, X3=1, X2=0}
Formula: [X1≠1] ∧ [X3≠1] ∧ [X2≠0]
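The path-to-formula correspondence can be checked directly in code. In this sketch (my own helper names, not HCV code), the path runs down the first column of the weather EM, and attribute indices stand in for X1..X4:

```python
# Sketch: a path picks one non-dead element per EM row; its cover is the
# conjunction of selectors [X_j != value].

def covers(formula, example):
    """A conjunctive formula covers an example iff every selector holds."""
    return all(example[j] != v for j, v in formula)

path = [(0, "rainy"), (0, "rainy"), (0, "sunny"), (0, "sunny")]
formula = sorted(set(path))  # duplicate selectors collapse

positive = ("overcast", "mild", "high", "windy")
negatives = [("rainy", "hot", "high", "windy"),
             ("rainy", "cool", "normal", "windy"),
             ("sunny", "hot", "normal", "windy"),
             ("sunny", "mild", "high", "windy")]

print(formula)                                     # [(0, 'rainy'), (0, 'sunny')]
print(covers(formula, positive))                   # True
print(any(covers(formula, n) for n in negatives))  # False
```

As the definition promises, the formula holds on the positive example and fails on every negative example.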
Extension Matrix Disjunction

The disjunction matrix EMD = (r_ij)_{n×a} of a set of positive examples {e_i1+, ..., e_ik+} against NEM is defined by

    r_ij = *,         when ∃k1 ∈ {i1, ..., ik}: EM_k1(i,j) = *
    r_ij = NEM(i,j),  when EM_k2(i,j) = NEM(i,j) for all k2 ∈ {i1, ..., ik}

A path in the EMD of {e_i1+, ..., e_ik+} against NEM corresponds to a conjunctive formula or cover

    L = ∧_{i=1..n} [X_ji ≠ r_iji]

which covers all of {e_i1+, ..., e_ik+} against NEM, and vice versa.
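The EMD is an element-wise intersection of the individual EMs. A hedged Python sketch (function names are my own, not from HCV):

```python
# Sketch of the disjunction matrix: an EMD element is kept only if it is
# non-dead in the EMs of ALL grouped positive examples.
DEAD = "*"

def build_em(positive, nem):
    return [[DEAD if row[j] == positive[j] else row[j]
             for j in range(len(positive))] for row in nem]

def build_emd(positives, nem):
    ems = [build_em(p, nem) for p in positives]
    return [[nem[i][j] if all(em[i][j] != DEAD for em in ems) else DEAD
             for j in range(len(nem[0]))] for i in range(len(nem))]

nem = [["rainy", "hot", "high", "windy"],
       ["rainy", "cool", "normal", "windy"],
       ["sunny", "hot", "normal", "windy"],
       ["sunny", "mild", "high", "windy"]]
emd = build_emd([("overcast", "mild", "high", "windy"),
                 ("overcast", "mild", "normal", "calm")], nem)
for row in emd:
    print(row)  # first row: ['rainy', 'hot', '*', '*']
```

The output matches the EMD shown in the example slides that follow.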
EMD Example

Negative Example Matrix (NEM):

    [ rainy  hot   high    windy ]
    [ rainy  cool  normal  windy ]
    [ sunny  hot   normal  windy ]
    [ sunny  mild  high    windy ]
EMD Example

Extension Matrix (EM) of the positive example [overcast  mild  high  windy]:

    [ rainy  hot   *       * ]
    [ rainy  cool  normal  * ]
    [ sunny  hot   normal  * ]
    [ sunny  *     *       * ]
EMD Example

Positive Example: [overcast  mild  normal  calm]

Extension Matrix Disjunction (EMD):

    [ rainy  hot   *  * ]
    [ rainy  cool  *  * ]
    [ sunny  hot   *  * ]
    [ sunny  *     *  * ]
EMD Example

Positive Example: [rainy  hot  high  calm]

Extension Matrix Disjunction (EMD):

    [ *      *     *  * ]
    [ *      cool  *  * ]
    [ sunny  *     *  * ]
    [ sunny  *     *  * ]
MFL and MCV (1)

The minimum formula problem (MFL): generating a conjunctive formula that covers a positive example, or an intersecting group of positive examples, against the NEM and has the minimum number of different conjunctive selectors.

The minimum cover problem (MCV): seeking a cover that covers all positive examples in PE against the NEM and has the minimum number of conjunctive formulae, with each conjunctive formula being as short as possible.
MFL and MCV (2)

Both problems are NP-hard.

Two complete algorithms are designed to solve them when each attribute domain Di (i = 1, ..., a) satisfies |Di| = 2:
•O(na·2^a) for MFL
•O(n²a·4^a + pa²·4^a) for MCV

When |Di| > 2, the domain can be decomposed into several domains, each having base 2.
What is HCV?

HCV is an extension-matrix-based rule induction algorithm which is
•Heuristic
•Attribute-based
•Noise-tolerant

It divides the positive examples into intersecting groups, uses the HFL heuristics to find a conjunctive formula that covers each intersecting group, and has low-order polynomial time complexity at induction time.
What is HFL?

HFL finds a heuristic conjunctive formula which corresponds to a path in an extension or disjunction matrix. It consists of 4 strategies, applied in turn, and has a time complexity of O(na³).
HFL - Fast Strategy

Selector [X5 ≠ {normal, dry-peep}] can be a possible selector, which will cover all 5 rows:

    [ absent  slight  strip  *     normal   ]
    [ *       *       hole   fast  dry-peep ]
    [ low     slight  strip  *     normal   ]
    [ absent  slight  spot   fast  dry-peep ]
    [ low     medium  *      fast  normal   ]
HFL - Precedence

Selectors [X1 ≠ 1] and [X3 ≠ 1] are two inevitable selectors in the extension matrix below:

    ⎛ 1  *  * ⎞
    ⎜ *  0  1 ⎟
    ⎜ 1  0  * ⎟
    ⎜ *  0  1 ⎟
    ⎜ 1  0  * ⎟
    ⎝ *  *  1 ⎠
HFL - Elimination

Attribute X2 can be eliminated by X3 in the extension matrix below:

    ⎛ 0  *  1 ⎞
    ⎜ 1  0  1 ⎟
    ⎜ 1  *  0 ⎟
    ⎜ 1  0  1 ⎟
    ⎜ 0  1  0 ⎟
    ⎝ 1  *  1 ⎠
HFL - Least Frequency

Attribute X1 can be eliminated and there still exists a path:

    ⎛ *  1  0 ⎞
    ⎜ *  0  1 ⎟
    ⎜ *  1  0 ⎟
    ⎜ *  0  1 ⎟
    ⎜ *  1  0 ⎟
    ⎝ 1  1  * ⎠
HFL Algorithm (1)

Procedure HFL(EM; Hfl)
S0: Hfl := {}
S1: /* the fast strategy */
    Try the fast strategy on all the rows which haven't been covered;
    If successful, add a corresponding selector to Hfl and return(Hfl)
S2: /* the precedence strategy */
    Apply the precedence strategy to the uncovered rows;
    If some inevitable selectors are found, add them to Hfl, label all the rows they cover, and go to S1

HFL Algorithm (2)

S3: /* the elimination strategy */
    Apply the elimination strategy to those attributes that have neither been selected nor eliminated;
    If an eliminable selector is found, reset all the elements in the corresponding column to *, and go to S2
S4: /* the least frequency strategy */
    Apply the least frequency strategy to those attributes that have neither been selected nor eliminated, and find a least frequency selector;
    Reset all the elements in the corresponding column to *, and go to S2
Return(Hfl)
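The control flow of S1 and S2 can be sketched in code. This is a simplified illustration, not the real HFL: only the fast and precedence strategies are implemented (S3 and S4 are omitted), and selectors are represented as assumed `(attribute_index, set_of_values)` pairs:

```python
# Simplified HFL sketch: fast (S1) and precedence (S2) strategies only.
DEAD = "*"

def hfl(em):
    """Return selectors as (attribute_index, set_of_values) pairs."""
    uncovered = set(range(len(em)))
    selectors = []
    while uncovered:
        # S1, fast strategy: a column with no dead element among the
        # uncovered rows yields one set-valued selector covering them all.
        for j in range(len(em[0])):
            vals = {em[i][j] for i in uncovered}
            if DEAD not in vals:
                selectors.append((j, vals))
                return selectors
        # S2, precedence: a row whose only non-dead element is (j, v)
        # forces the inevitable selector [Xj != v].
        progress = False
        for i in sorted(uncovered):
            if i not in uncovered:
                continue  # already covered by an earlier inevitable selector
            live = [(j, em[i][j]) for j in range(len(em[0])) if em[i][j] != DEAD]
            if len(live) == 1:
                j, v = live[0]
                selectors.append((j, {v}))
                uncovered = {r for r in uncovered if em[r][j] != v}
                progress = True
        if not progress:
            break  # the real HFL would fall through to S3 / S4 here
    return selectors

# Fast strategy fires: the third column is non-dead in every row.
print(hfl([["*", "*", "normal"],
           ["x", "*", "dry-peep"],
           ["*", "y", "normal"]]))  # [(2, {'normal', 'dry-peep'})]
```

The example matrices here are made up for illustration; on a matrix where no single column is fully non-dead, the precedence branch picks up rows with a unique surviving element.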
HCV Algorithm

HCV:
•partitions the PEs into intersecting groups
•calls HFL to find the Hfl for each group
•builds a covering formula by doing a logical OR of the Hfls
•returns the covering formula Hcv
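A much-simplified sketch of this driver loop follows. It is not the real HCV: grouping uses only a necessary condition for a path (every EMD row keeps a non-dead element), and a trivial first-non-dead selector per row stands in for the HFL heuristics; all names are my own:

```python
# Hedged, simplified HCV driver over the weather data.
DEAD = "*"

def build_emd(positives, nem):
    return [[nem[i][j] if all(p[j] != nem[i][j] for p in positives) else DEAD
             for j in range(len(nem[0]))] for i in range(len(nem))]

def has_path(emd):
    # Necessary condition for a path: every row keeps a non-dead element.
    return all(any(v != DEAD for v in row) for row in emd)

def hcv(positives, nem):
    # Greedily partition the positive examples into intersecting groups.
    groups, current = [], []
    for p in positives:
        if current and not has_path(build_emd(current + [p], nem)):
            groups.append(current)
            current = [p]
        else:
            current.append(p)
    groups.append(current)
    # One conjunctive formula per group, OR-ed together as Hcv.
    rules = []
    for g in groups:
        emd = build_emd(g, nem)
        formula = sorted({next((j, row[j]) for j in range(len(row))
                               if row[j] != DEAD)
                          for row in emd})
        rules.append(formula)
    return rules

nem = [["rainy", "hot", "high", "windy"],
       ["rainy", "cool", "normal", "windy"],
       ["sunny", "hot", "normal", "windy"],
       ["sunny", "mild", "high", "windy"]]
rules = hcv([("overcast", "mild", "high", "windy"),
             ("overcast", "mild", "normal", "calm")], nem)
print(rules)  # [[(0, 'rainy'), (0, 'sunny')]]
```

Both positive examples fit in one intersecting group, so Hcv here is a single conjunctive formula: [X1 ≠ rainy] ∧ [X1 ≠ sunny].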
Complexity of HCV

Worst-case time complexity:

    O(pna³ + p²na)

Space requirement: 2na
HCV Example
NEM for Pneumonia:

    [ absent  slight  strip  normal  normal   ]
    [ high    heavy   hole   fast    dry-peep ]
    [ low     slight  strip  normal  normal   ]
    [ absent  slight  spot   fast    dry-peep ]
    [ low     medium  flack  fast    normal   ]
HCV Example

Positive Example 1: [high  heavy  flack  normal  bubble-like]

EM1:

    [ absent  slight  strip  *     normal   ]
    [ *       *       hole   fast  dry-peep ]
    [ low     slight  strip  *     normal   ]
    [ absent  slight  spot   fast  dry-peep ]
    [ low     medium  *      fast  normal   ]
HCV Example

Positive Example 2: [medium  heavy  flack  normal  bubble-like]

EM2:

    [ absent  slight  strip  *     normal   ]
    [ high    *       hole   fast  dry-peep ]
    [ low     slight  strip  *     normal   ]
    [ absent  slight  spot   fast  dry-peep ]
    [ low     medium  *      fast  normal   ]
HCV Example

Positive Example 3: [low  slight  spot  normal  dry-peep]

EM3:

    [ absent  *       strip  *     normal ]
    [ high    heavy   hole   fast  *      ]
    [ *       *       strip  *     normal ]
    [ absent  *       *      fast  *      ]
    [ *       medium  flack  fast  normal ]
HCV Example

Positive Example 4: [high  medium  flack  normal  bubble-like]

EM4:

    [ absent  slight  strip  *     normal   ]
    [ *       heavy   hole   fast  dry-peep ]
    [ low     slight  strip  *     normal   ]
    [ absent  slight  spot   fast  dry-peep ]
    [ low     *       *      fast  normal   ]
HCV Example

Positive Example 5: [medium  slight  flack  normal  bubble-like]

EM5:

    [ absent  *       strip  *     normal   ]
    [ high    heavy   hole   fast  dry-peep ]
    [ low     *       strip  *     normal   ]
    [ absent  *       spot   fast  dry-peep ]
    [ low     medium  *      fast  normal   ]
HCV Example

EM1 ∩ EM2:

    [ absent  slight  strip  *     normal   ]
    [ *       *       hole   fast  dry-peep ]
    [ low     slight  strip  *     normal   ]
    [ absent  slight  spot   fast  dry-peep ]
    [ low     medium  *      fast  normal   ]
HCV Example

EM1 ∩ EM2 ∩ EM3:

    [ absent  *       strip  *     normal ]
    [ *       *       hole   fast  *      ]
    [ *       *       strip  *     normal ]
    [ absent  *       *      fast  *      ]
    [ *       medium  *      fast  normal ]
HCV Example

EM1 ∩ EM2 ∩ EM3 ∩ EM4:

    [ absent  *  strip  *     normal ]
    [ *       *  hole   fast  *      ]
    [ *       *  strip  *     normal ]
    [ absent  *  *      fast  *      ]
    [ *       *  *      fast  normal ]
HCV Example

EM1 ∩ EM2 ∩ EM3 ∩ EM4 ∩ EM5:

    [ absent  *  strip  *     normal ]
    [ *       *  hole   fast  *      ]
    [ *       *  strip  *     normal ]
    [ absent  *  *      fast  *      ]
    [ *       *  *      fast  normal ]
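The intersection just shown can be recomputed directly from the slides' data: an NEM element survives into the EMD only if it differs from the corresponding attribute value of every positive example. A quick Python check (a sketch, not HCV code):

```python
# Recomputing the pneumonia EMD (EM1 through EM5 intersected).
DEAD = "*"

nem = [["absent", "slight", "strip", "normal", "normal"],
       ["high",   "heavy",  "hole",  "fast",   "dry-peep"],
       ["low",    "slight", "strip", "normal", "normal"],
       ["absent", "slight", "spot",  "fast",   "dry-peep"],
       ["low",    "medium", "flack", "fast",   "normal"]]

positives = [["high",   "heavy",  "flack", "normal", "bubble-like"],
             ["medium", "heavy",  "flack", "normal", "bubble-like"],
             ["low",    "slight", "spot",  "normal", "dry-peep"],
             ["high",   "medium", "flack", "normal", "bubble-like"],
             ["medium", "slight", "flack", "normal", "bubble-like"]]

emd = [[v if all(p[j] != v for p in positives) else DEAD
        for j, v in enumerate(row)] for row in nem]
for row in emd:
    print(row)  # first row: ['absent', '*', 'strip', '*', 'normal']
```

The printed matrix matches the EMD on the slide above, which is the input to the HFL walkthrough that follows.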
HCV Example

HFL Step 1: Fast Strategy

    [ absent  *  strip  *     normal ]
    [ *       *  hole   fast  *      ]
    [ *       *  strip  *     normal ]
    [ absent  *  *      fast  *      ]
    [ *       *  *      fast  normal ]

HFL Rules = {}
HCV Example

HFL Step 2: Precedence

    [ absent  *  strip  *     normal ]
    [ *       *  hole   fast  *      ]
    [ *       *  strip  *     normal ]
    [ absent  *  *      fast  *      ]
    [ *       *  *      fast  normal ]

HFL Rules = {}
HCV Example

HFL Step 3: Elimination

    [ absent  *  strip  *     normal ]
    [ *       *  hole   fast  *      ]
    [ *       *  strip  *     normal ]
    [ absent  *  *      fast  *      ]
    [ *       *  *      fast  normal ]

HFL Rules = {}
HCV Example

HFL Step 4: Least-Frequency

    [ absent  *  strip  *     normal ]
    [ *       *  hole   fast  *      ]
    [ *       *  strip  *     normal ]
    [ absent  *  *      fast  *      ]
    [ *       *  *      fast  normal ]

HFL Rules = {}
HCV Example

HFL Step 4: Least-Frequency — the least-frequent attribute's column is reset to dead elements:

    [ *  *  strip  *     normal ]
    [ *  *  hole   fast  *      ]
    [ *  *  strip  *     normal ]
    [ *  *  *      fast  *      ]
    [ *  *  *      fast  normal ]

HFL Rules = {}
HCV Example

HFL Step 2: Precedence — row 4 now has a single non-dead element, giving an inevitable selector:

    [ *  *  strip  *     normal ]
    [ *  *  hole   fast  *      ]
    [ *  *  strip  *     normal ]
    [ *  *  *      fast  *      ]
    [ *  *  *      fast  normal ]

HFL Rules = {ESR ≠ fast}
HCV Example

HFL Step 2: Precedence — the rows covered by [ESR ≠ fast] are labeled:

    [ *  *  strip  *  normal ]
    [ *  *  *      *  *      ]
    [ *  *  strip  *  normal ]
    [ *  *  *      *  *      ]
    [ *  *  *      *  *      ]

HFL Rules = {ESR ≠ fast}, go to S1
HCV Example

HFL Step 1: Fast Strategy — a single selector covers the remaining rows:

    [ *  *  strip  *  normal ]
    [ *  *  *      *  *      ]
    [ *  *  strip  *  normal ]
    [ *  *  *      *  *      ]
    [ *  *  *      *  *      ]

HFL Rules = {ESR ≠ fast, AUSCULTATION ≠ normal}
HCV Example

HCV generated rule: [ESR ≠ fast] ∧ [AUSCULTATION ≠ normal], compared against the rule generated by C4.5rules for the same data.
HCV Noise Handling

•Don't-care values are treated as dead elements
•Approximate partitioning: the partitioning of PE into groups can be approximate rather than strict
•Stopping criteria: similar to the -c option of C4.5
Real-Valued Attributes

HCV uses the information gain heuristic to select cut points.

Stop-splitting criteria:
•Stop if the information gain at all cut points is the same
•Stop if the number of examples to split is less than a certain number
•Limit the total number of intervals
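The information gain heuristic for a single real-valued attribute can be sketched as follows (the data and thresholds are illustrative, not from HCV):

```python
# Illustrative information-gain cut-point search for one real-valued
# attribute: try midpoints between adjacent sorted values.
import math

def entropy(labels):
    n = len(labels)
    counts = (labels.count(l) for l in set(labels))
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def best_cut(values, labels):
    """Return the cut point with the highest information gain, or None
    if no split improves on the unsplit entropy."""
    pairs = sorted(zip(values, labels))
    base = entropy([l for _, l in pairs])
    best, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no midpoint between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= cut]
        right = [l for v, l in pairs if v > cut]
        gain = base - (len(left) * entropy(left)
                       + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best, best_gain = cut, gain
    return best

print(best_cut([64, 65, 68, 69, 70, 71],
               ["y", "n", "y", "y", "n", "n"]))  # 69.5
```

The stopping rules on the slide would simply terminate this search early, e.g. when all candidate gains are equal or too few examples remain.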
Comparison (1)

Table 1: Number of rules and conditions using the Monk 1, 2 and 3 datasets as training sets 1, 2 and 3 respectively

                             Training Set 1      Training Set 2      Training Set 3
    Algorithm                rules  conditions   rules  conditions   rules  conditions
    ID3                      53     216          105    498          30     98
    C4.5                     60     262          113    566          27     89
    C4.5 with grouping       9      31           55     353          20     102
    C4.5rules                31     101          97     374          23     65
    C4.5rules with grouping  8      19           46     188          11     35
    NewID                    21     143          59     401          18     101
    HCV                      7      16           39     168          18     62
Comparison (2)

Table 2: Accuracy

    Algorithm                Test Set 1   Test Set 2   Test Set 3
    ID3                      83.3%        68.3%        94.4%
    C4.5                     82.4%        69.7%        90.3%
    C4.5 with grouping       100%         82.4%        93.1%
    C4.5rules                92.4%        75.7%        85.4%
    C4.5rules with grouping  100%         81.0%        91.4%
    NewID                    93%          78%          89%
    HCV                      100%         81.7%        90.3%
Conclusions

•Rules generated by HCV take the form of variable-valued logic rules, rather than decision trees
•HCV generates very compact rules in low-order polynomial time
•HCV provides noise handling and discretization
•Predictive accuracy is comparable to the ID3 family of algorithms, viz. C4.5 and C4.5rules
Extension Matrix Terminology

    a            Number of attributes
    Xa           The a-th attribute
    e+           Vector of a positive example
    e−           Vector of a negative example
    v_ak+        Value of the a-th attribute in the k-th positive example
    n            Number of negative examples
    p            Number of positive examples
    (r_ij)_axb   The ij-th element of an a×b matrix
    A(i,j)       The ij-th element of matrix A