Rule Induction with Extension Matrices
Dr. Xindong Wu
Journal of the American Society for Information Science, Vol. 49, No. 5, 1998
Presented by Peter Duval


[Title-slide concept map: EM, NEM, PE, NE, EMD, path; inevitable, eliminable, and redundant selectors; S1 Fast Strategy, S2 Precedence Strategy, S3 Elimination Strategy, S4 Least Frequency Selector; HFL, HCV, MFL, MCV; Persistent; Intersecting Group.]


Context

• HFL/HCV offers a rule-induction alternative to decision trees.

• HCV can be used as a benchmark for rule induction.


Context

• This paper condenses Dr. Wu’s Ph.D. dissertation on the extension matrix approach and the HFL/HCV algorithms.

• For the work leading to HFL/HCV, look to J. Wong and R.S. Michalski at the University of Illinois.

• HFL/HCV appears to be under-cited in the literature.


Overview

1. Represent the negative training data as row vectors in a matrix.

2. Process the positive examples one at a time, eliminating attribute values in the negative examples that cannot distinguish them.

3. Read conjunctive rules from the resulting matrix.

4. Simplify and clean up the rules.


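Positive and Negative Examples

• A positive example (PE):

$e^{k} = (v^{k}_{1}, \ldots, v^{k}_{a}) = (\text{overcast}, \text{mild}, \text{high}, \text{windy}) \rightarrow \text{Play}$

• A negative example (NE):

$e^{k} = (v^{k}_{1}, \ldots, v^{k}_{a}) = (\text{rainy}, \text{hot}, \text{high}, \text{windy}) \rightarrow \text{Don't\_Play}$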


Negative example matrix (NEM)

Gather the negative examples as row vectors in the NEM:

rainy  hot   high    windy
rainy  cool  normal  windy
sunny  hot   normal  windy
sunny  mild  high    windy
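The later steps are easier to follow with this data in Python. The sketch below is minimal and rests on a few assumptions. The attribute names come from the classic golf data set: the slides name outlook, temperature, and wind, but not the third column, which is assumed here to be humidity. `Example` is a hypothetical helper type.

```python
from typing import NamedTuple, Tuple

# Assumed attribute names; "humidity" is not named anywhere in the slides.
ATTRIBUTES = ("outlook", "temperature", "humidity", "wind")


class Example(NamedTuple):
    """A training example: one value per attribute plus its class label."""
    values: Tuple[str, ...]
    label: str


examples = [
    Example(("overcast", "mild", "high", "windy"), "Play"),
    Example(("rainy", "hot", "high", "windy"), "Don't_Play"),
    Example(("rainy", "cool", "normal", "windy"), "Don't_Play"),
    Example(("sunny", "hot", "normal", "windy"), "Don't_Play"),
    Example(("sunny", "mild", "high", "windy"), "Don't_Play"),
]

# The NEM: each negative example becomes one row vector.
nem = [ex.values for ex in examples if ex.label == "Don't_Play"]

for row in nem:
    print("  ".join(row))
```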


Positive Example (PE) Against NEM

A positive example written as a row vector:

NEM:
rainy  hot   high    windy
rainy  cool  normal  windy
sunny  hot   normal  windy
sunny  mild  high    windy

PE:
overcast  mild  high  windy


Extension Matrices

• Delete (mark as dead) every NEM element that matches the positive example's value for the same attribute.

NEM:
rainy  hot   high    windy
rainy  cool  normal  windy
sunny  hot   normal  windy
sunny  mild  high    windy

PE:
overcast  mild  high  windy


Extension Matrices

• We construct one Extension Matrix per Positive Example

EM1 (PE: overcast mild high windy):
rainy  hot   *       *
rainy  cool  normal  *
sunny  hot   normal  *
sunny  *     *       *
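Building an extension matrix means comparing each row of the NEM element by element against a single positive example. The sketch below is minimal. It uses `None` to stand in for the dead element `*`; that encoding and the helper names `extension_matrix` and `show` are assumptions of the sketch.

```python
DEAD = None  # stands in for the dead element '*'


def extension_matrix(positive, nem):
    """Copy the NEM, replacing each element that equals the positive
    example's value in the same column with a dead element."""
    return [
        tuple(DEAD if neg == pos else neg for neg, pos in zip(row, positive))
        for row in nem
    ]


def show(matrix):
    """Print a matrix with dead elements written as '*'."""
    for row in matrix:
        print("  ".join("*" if v is DEAD else v for v in row))


nem = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
pe1 = ("overcast", "mild", "high", "windy")

em1 = extension_matrix(pe1, nem)
show(em1)
```

Calling `extension_matrix` with the second and third positive examples produces EM2 and EM3 the same way.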


Extension Matrices

Let’s make a second extension matrix:

NEM:
rainy  hot   high    windy
rainy  cool  normal  windy
sunny  hot   normal  windy
sunny  mild  high    windy

PE:
overcast  mild  normal  calm


Extension Matrices

The second extension matrix:

EM2 (PE: overcast mild normal calm):
rainy  hot   high    windy
rainy  cool  *       windy
sunny  hot   *       windy
sunny  *     high    windy


Extension Matrices

Finally let’s make a third extension matrix:

NEM:
rainy  hot   high    windy
rainy  cool  normal  windy
sunny  hot   normal  windy
sunny  mild  high    windy

PE:
rainy  hot  high  calm


Extension Matrices

The third extension matrix:

EM3 (PE: rainy hot high calm):
*      *     *       windy
*      cool  normal  windy
sunny  *     normal  windy
sunny  mild  *       windy


Dead Elements

• Dead elements, *, take the place of attribute values that fail to distinguish the negative example from the corresponding positive example.

PE:
rainy  hot  high  calm

EM3:
*      *     *       windy
*      cool  normal  windy
sunny  *     normal  windy
sunny  mild  *       windy


Extension Matrix Disjunction (EMD)

• If any of the extension matrices has a dead element in a given position, the EMD has a dead element there too.

EM1:
rainy  hot   *       *
rainy  cool  normal  *
sunny  hot   normal  *
sunny  *     *       *

EM2:
rainy  hot   high    windy
rainy  cool  *       windy
sunny  hot   *       windy
sunny  *     high    windy

EM3:
*      *     *       windy
*      cool  normal  windy
sunny  *     normal  windy
sunny  mild  *       windy

“OR” the dead elements:

EMD(EM1, EM2, EM3):
*      *     *       *
*      cool  *       *
sunny  *     *       *
sunny  *     *       *

Row 1 of this EMD is a dead row.
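The disjunction can be computed one element at a time: an EMD element is dead if the element in that position is dead in any input matrix. The sketch below is minimal and assumes the same `None`-for-`*` encoding as before. The helpers `emd` and `dead_rows` are hypothetical names.

```python
DEAD = None  # stands in for the dead element '*'

NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
POSITIVES = [
    ("overcast", "mild", "high", "windy"),   # PE1
    ("overcast", "mild", "normal", "calm"),  # PE2
    ("rainy", "hot", "high", "calm"),        # PE3
]


def extension_matrix(positive, nem):
    """Mark as dead every NEM element equal to the positive example's value."""
    return [
        tuple(DEAD if neg == pos else neg for neg, pos in zip(row, positive))
        for row in nem
    ]


def emd(*matrices):
    """Extension matrix disjunction: an element is dead if it is dead in
    any of the input extension matrices ("OR" the dead elements)."""
    result = []
    for rows in zip(*matrices):        # row i taken from every matrix
        merged = []
        for cells in zip(*rows):       # element (i, j) taken from every matrix
            # Non-dead cells all hold the same negative-example value.
            merged.append(DEAD if DEAD in cells else cells[0])
        result.append(tuple(merged))
    return result


def dead_rows(matrix):
    """Indices of rows in which every element is dead."""
    return [i for i, row in enumerate(matrix) if all(v is DEAD for v in row)]


ems = [extension_matrix(pe, NEM) for pe in POSITIVES]
all_three = emd(*ems)
for row in all_three:
    print("  ".join("*" if v is DEAD else v for v in row))
print("dead rows:", dead_rows(all_three))
```

The first row (index 0) is reported as dead, which matches the EMD above.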


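Partitions

• Once a dead row would be created, start a new EMD.

Partition 1 (PE1, PE2):

EM1:
rainy  hot   *       *
rainy  cool  normal  *
sunny  hot   normal  *
sunny  *     *       *

EM2:
rainy  hot   high    windy
rainy  cool  *       windy
sunny  hot   *       windy
sunny  *     high    windy

EMD(EM1, EM2):
rainy  hot   *       *
rainy  cool  *       *
sunny  hot   *       *
sunny  *     *       *

Adding EM3 to this EMD would turn row 1 into a dead row:
*      *     *       *
*      cool  *       *
sunny  *     *       *
sunny  *     *       *

Partition 2 (PE3):

EM3:
*      *     *       windy
*      cool  normal  windy
sunny  *     normal  windy
sunny  mild  *       windy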
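The rule on this slide can be sketched as a single greedy pass over the positive examples, in the order they arrive. This is an illustrative assumption and not necessarily how HCV actually forms its intersecting groups. The helper names are hypothetical.

```python
DEAD = None  # stands in for the dead element '*'

NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
POSITIVES = [
    ("overcast", "mild", "high", "windy"),
    ("overcast", "mild", "normal", "calm"),
    ("rainy", "hot", "high", "calm"),
]


def extension_matrix(positive, nem):
    return [tuple(DEAD if n == p else n for n, p in zip(row, positive)) for row in nem]


def emd(a, b):
    """Pairwise EMD: dead wherever either matrix is dead."""
    return [
        tuple(DEAD if DEAD in (x, y) else x for x, y in zip(ra, rb))
        for ra, rb in zip(a, b)
    ]


def has_dead_row(matrix):
    return any(all(v is DEAD for v in row) for row in matrix)


def partition(positives, nem):
    """Fold each positive example's EM into the current EMD; when that
    would create a dead row, start a new partition instead."""
    parts = []  # each entry: [indices of member positive examples, EMD]
    for i, pe in enumerate(positives):
        em = extension_matrix(pe, nem)
        if parts:
            merged = emd(parts[-1][1], em)
            if not has_dead_row(merged):
                parts[-1] = [parts[-1][0] + [i], merged]
                continue
        parts.append([[i], em])
    return parts


parts = partition(POSITIVES, NEM)
for members, _ in parts:
    print("partition:", [f"PE{i + 1}" for i in members])
```

Because the pass is sequential, a different ordering of the positive examples can yield different partitions.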


Extension Matrix Disjunction (EMD)

• Let’s construct the EMD using just the first two extension matrices.

EM1:
rainy  hot   *       *
rainy  cool  normal  *
sunny  hot   normal  *
sunny  *     *       *

EM2:
rainy  hot   high    windy
rainy  cool  *       windy
sunny  hot   *       windy
sunny  *     high    windy

“OR” the dead elements:

EMD(EM1, EM2):
rainy  hot   *       *
rainy  cool  *       *
sunny  hot   *       *
sunny  *     *       *


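Extension Matrix Disjunction (EMD)

• The EMD has removed much of the superfluous information. Nine of its 16 elements are dead, and only the Outlook and Temperature columns still hold values.

EM1:
rainy  hot   *       *
rainy  cool  normal  *
sunny  hot   normal  *
sunny  *     *       *

EM2:
rainy  hot   high    windy
rainy  cool  *       windy
sunny  hot   *       windy
sunny  *     high    windy

“OR” the dead elements:

EMD(EM1, EM2):
rainy  hot   *       *
rainy  cool  *       *
sunny  hot   *       *
sunny  *     *       *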


Paths

• Choose one non-dead element from each row. This is called a path.

• We can create paths in EMs and EMDs.

EMD(EM1, EM2):
rainy  hot   *       *
rainy  cool  *       *
sunny  hot   *       *
sunny  *     *       *
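One way to enumerate every path in an EMD is to take the Cartesian product of each row's non-dead column indices. The sketch below makes that assumption and uses `itertools.product` for the product. The `paths` helper is a hypothetical name.

```python
from itertools import product

DEAD = None  # stands in for the dead element '*'

# EMD(EM1, EM2) from the slides.
emd12 = [
    ("rainy", "hot", DEAD, DEAD),
    ("rainy", "cool", DEAD, DEAD),
    ("sunny", "hot", DEAD, DEAD),
    ("sunny", DEAD, DEAD, DEAD),
]


def paths(matrix):
    """Yield every path as a tuple of column indices, one per row,
    each pointing at a non-dead element of that row."""
    choices = [[j for j, v in enumerate(row) if v is not DEAD] for row in matrix]
    yield from product(*choices)


all_paths = list(paths(emd12))
print(len(all_paths), "paths")
for p in all_paths[:3]:
    print([emd12[i][j] for i, j in enumerate(p)])
```

The number of paths is the product of the per-row choice counts (2 × 2 × 2 × 1 = 8 here). Exhaustive enumeration therefore only suits small matrices, and the HFL strategies later in the deck pick a good path heuristically instead.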


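Path = Cover ≡ Conjunctive Formula

• The path corresponds to a conjunctive formula expressed in variable-valued logic. Here the path rainy, rainy, sunny, sunny runs down the Outlook column:

EMD(EM1, EM2):
rainy  hot   *       *
rainy  cool  *       *
sunny  hot   *       *
sunny  *     *       *

Every negative example satisfies $[\text{Outlook} = \{\text{rainy}, \text{sunny}\}]$ (Don't_Play), so the positive examples of this partition satisfy:

$[\text{Outlook} \neq \{\text{rainy}, \text{sunny}\}] \rightarrow \text{Play}$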


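Path = Cover ≡ Conjunctive Formula

EMD(EM1, EM2):
rainy  hot   *       *
rainy  cool  *       *
sunny  hot   *       *
sunny  *     *       *

A second path takes hot (row 1), cool (row 2), and sunny (rows 3 and 4), giving a formula over two attributes:

$[\text{Outlook} \neq \text{sunny}] \wedge [\text{Temperature} \neq \{\text{hot}, \text{cool}\}] \rightarrow \text{Play}$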
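Reading a formula off a path amounts to grouping the chosen elements by column, so that each column becomes one selector [attribute ≠ {values}]. The sketch below is minimal and makes three assumptions: the attribute names (humidity in particular), the helper names, and the HCV-style `!=` rendering. It also checks that both formulas from these slides cover the partition's positive examples and none of the negative ones.

```python
DEAD = None  # stands in for the dead element '*'
ATTRIBUTES = ("outlook", "temperature", "humidity", "wind")  # assumed names

emd12 = [
    ("rainy", "hot", DEAD, DEAD),
    ("rainy", "cool", DEAD, DEAD),
    ("sunny", "hot", DEAD, DEAD),
    ("sunny", DEAD, DEAD, DEAD),
]
NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
PARTITION_1 = [("overcast", "mild", "high", "windy"), ("overcast", "mild", "normal", "calm")]


def path_to_formula(matrix, path):
    """Collect the chosen elements per column: {column: {excluded values}}."""
    formula = {}
    for row, j in zip(matrix, path):
        if row[j] is DEAD:
            raise ValueError("a path may only pass through non-dead elements")
        formula.setdefault(j, set()).add(row[j])
    return formula


def satisfies(example, formula):
    """True if the example meets every selector [attribute != {values}]."""
    return all(example[j] not in values for j, values in formula.items())


def render(formula):
    return " ^ ".join(
        "[ %s != { %s } ]" % (ATTRIBUTES[j], ", ".join(sorted(values)))
        for j, values in sorted(formula.items())
    )


for path in [(0, 0, 0, 0), (1, 1, 0, 0)]:
    print(render(path_to_formula(emd12, path)), "--> Play")
```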


HFL

Wu developed HFL to find good rules. An algorithm with four strategies, it finds a compact disjunction of conjunctions:

1. Fast
2. Precedence
3. Elimination
4. Least Frequency


HFL Strategies: Fast

[Slide figure: an extension matrix over binary attributes X1 to X4 in which every row has a non-dead X3 = 1.]

• [X3 ≠ 1] covers all negative examples, so [X3 ≠ 1] ⇒ positive class.

• We can stop processing.
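The Fast check asks whether a single non-dead value fills an entire column. The sketch below assumes a matrix stored as a list of rows with `None` for dead elements. The binary matrix it uses is hypothetical, because the slide's own matrix did not survive extraction; it was built so that X3 = 1 appears, non-dead, in every row.

```python
DEAD = None  # stands in for the dead element '*'


def fast_selector(matrix):
    """Return (column, value) if one non-dead value fills an entire column,
    i.e. a single selector [X != value] covers every row; otherwise None."""
    for j in range(len(matrix[0])):
        column = {row[j] for row in matrix}
        if len(column) == 1 and DEAD not in column:
            return j, next(iter(column))
    return None


# Hypothetical extension matrix; columns are X1, X2, X3, X4.
m = [
    (0, DEAD, 1, 1),
    (1, 0, 1, DEAD),
    (DEAD, 1, 1, 0),
    (1, 1, 1, 0),
]

hit = fast_selector(m)
if hit is not None:
    j, v = hit
    print(f"[X{j + 1} != {v}] covers every row")
```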


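HFL Strategies: Precedence

[Slide figure: an extension matrix over binary attributes X1 to X3.]

• [X1 ≠ 1] and [X3 ≠ 1] are inevitable selectors: each is the only non-dead element of some row.

• Record the conjunction and label those rows as covered.

• Here the inevitable selectors already form a path. All rows are covered, so we are done:

$[X_1 = 1] \vee [X_3 = 1] \rightarrow \text{Negative\_Class}$

$[X_1 \neq 1] \wedge [X_3 \neq 1] \rightarrow \text{Positive\_Class}$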
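The Precedence step has two parts: it collects every selector that is the only non-dead element of its row, then checks which rows those selectors cover. The sketch below assumes a hypothetical matrix over X1 to X3 (the slide's matrix was lost in extraction) and the `None`-for-`*` encoding. The helper names are also hypothetical.

```python
DEAD = None  # stands in for the dead element '*'


def inevitable_selectors(matrix):
    """Selectors (column, value) that are the only non-dead element of a row."""
    found = set()
    for row in matrix:
        live = [(j, v) for j, v in enumerate(row) if v is not DEAD]
        if len(live) == 1:
            found.add(live[0])
    return found


def covered_rows(matrix, selectors):
    """Rows in which at least one chosen selector appears as a non-dead element."""
    return {
        i for i, row in enumerate(matrix)
        if any(row[j] is not DEAD and row[j] == v for j, v in selectors)
    }


# Hypothetical extension matrix; columns are X1, X2, X3.
m = [
    (1, DEAD, DEAD),
    (DEAD, DEAD, 1),
    (1, 0, 1),
]

chosen = inevitable_selectors(m)
if covered_rows(m, chosen) == set(range(len(m))):
    rule = " ^ ".join(f"[X{j + 1} != {v}]" for j, v in sorted(chosen))
    print(rule, "-> Positive_Class")
```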


HFL Strategies: Elimination

• Redundant selectors in attribute X2 can be eliminated because the non-dead X3 values cover all of the rows covered by X2.

• All elements in column X2 become dead elements.

[Slide figure: the binary extension matrix over X1 to X4 before and after elimination; afterwards every element in column X2 is dead.]
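Elimination can be sketched with set comparisons. Each selector (column, value) covers a set of rows, and a selector is redundant when another selector covers a strict superset of those rows. The matrix below is hypothetical and was chosen so that only the X2 selectors turn out redundant. Applying the strict-subset test selector by selector is this sketch's reading of the slide, not a confirmed detail of HFL.

```python
from collections import defaultdict

DEAD = None  # stands in for the dead element '*'


def coverage(matrix):
    """Map each selector (column, value) to the set of rows it covers."""
    cov = defaultdict(set)
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v is not DEAD:
                cov[(j, v)].add(i)
    return cov


def eliminate_redundant(matrix):
    """Kill every selector whose rows are a strict subset of another
    selector's rows; return the reduced matrix and the eliminated selectors."""
    cov = coverage(matrix)
    redundant = {
        s for s, rows in cov.items()
        if any(rows < other for t, other in cov.items() if t != s)
    }
    reduced = [
        tuple(DEAD if (j, v) in redundant else v for j, v in enumerate(row))
        for row in matrix
    ]
    return reduced, redundant


# Hypothetical extension matrix; columns are X1, X2, X3, X4.
m = [
    (DEAD, 0, 1, 0),
    (DEAD, 1, 1, DEAD),
    (1, DEAD, 1, DEAD),
    (1, DEAD, DEAD, 0),
]

reduced, gone = eliminate_redundant(m)
print("eliminated:", sorted(gone))
```

The code leaves alone any selectors whose coverage is identical. Choosing between them would require a tie-breaking rule that this sketch does not attempt.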


HFL Strategies: Least Frequency

[Slide figure: binary extension matrices over X1 to X3.]

• Attribute X1 selectors are least frequent and can be eliminated.

• Other strategies must be applied before applying Least Frequency again.
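Least Frequency, as summarized on the exam slide, removes the column whose non-dead values cover the fewest rows. The sketch below uses a hypothetical matrix over X1 to X3. It adds a guard of its own: elimination is refused if it would leave a row with no non-dead element, because such a row could no longer be covered by any path.

```python
DEAD = None  # stands in for the dead element '*'


def least_frequent_column(matrix):
    """Column with the fewest non-dead elements, ignoring columns that are
    already entirely dead; ties go to the leftmost column."""
    counts = {
        j: sum(row[j] is not DEAD for row in matrix)
        for j in range(len(matrix[0]))
    }
    live = {j: c for j, c in counts.items() if c > 0}
    return min(live, key=lambda j: (live[j], j)) if live else None


def kill_column(matrix, j):
    """Mark column j dead; return None instead if that would create a dead
    row (an assumed guard, not taken from the slides)."""
    reduced = [row[:j] + (DEAD,) + row[j + 1:] for row in matrix]
    if any(all(v is DEAD for v in row) for row in reduced):
        return None
    return reduced


# Hypothetical extension matrix; columns are X1, X2, X3.
m = [
    (1, 0, 0),
    (DEAD, 0, 1),
    (DEAD, 1, 0),
]

j = least_frequent_column(m)
reduced = kill_column(m, j)
print(f"eliminate X{j + 1}:", reduced)
```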


HCV Algorithm

• HCV improves on HFL:

1. Partition the positive examples into intersecting groups.
2. Apply HFL to each partition.
3. OR the conjunctive formulae from each partition.

Well described in: http://www.cs.uvm.edu/~xwu/Publication/JASIS.ps

See Wu’s 1993 Ph.D. dissertation for more background: http://www.era.lib.ed.ac.uk/bitstream/1842/581/3/1993-xindongw.pdf
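The three steps can be strung together in a short sketch, under two assumptions. First, partitions come from the greedy dead-row rule on the Partitions slide. Second, on each partition a plain greedy set cover (pick the selector covering the most uncovered rows) stands in for HFL's four strategies. This illustrates the outline only; it is not Wu's implementation, and its rules need not match the ones HCV prints on the Golf slide. The attribute names are also assumed.

```python
DEAD = None  # stands in for the dead element '*'
ATTRIBUTES = ("outlook", "temperature", "humidity", "wind")  # assumed names

NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
POSITIVES = [
    ("overcast", "mild", "high", "windy"),
    ("overcast", "mild", "normal", "calm"),
    ("rainy", "hot", "high", "calm"),
]


def extension_matrix(pe, nem):
    return [tuple(DEAD if n == p else n for n, p in zip(row, pe)) for row in nem]


def emd(a, b):
    return [tuple(DEAD if DEAD in (x, y) else x for x, y in zip(ra, rb))
            for ra, rb in zip(a, b)]


def has_dead_row(m):
    return any(all(v is DEAD for v in row) for row in m)


def partitions(positives, nem):
    """Step 1: greedy grouping; start a new EMD when a dead row would appear."""
    parts = []
    for pe in positives:
        em = extension_matrix(pe, nem)
        merged = emd(parts[-1], em) if parts else None
        if merged is not None and not has_dead_row(merged):
            parts[-1] = merged
        else:
            parts.append(em)
    return parts


def greedy_cover(m):
    """Step 2 (stand-in for HFL): repeatedly pick the selector covering the
    most uncovered rows; returns {column: {excluded values}}."""
    cov = {}
    for i, row in enumerate(m):
        for j, v in enumerate(row):
            if v is not DEAD:
                cov.setdefault((j, v), set()).add(i)
    uncovered, formula = set(range(len(m))), {}
    while uncovered:
        best = max(cov, key=lambda s: len(cov[s] & uncovered))
        if not cov[best] & uncovered:
            raise ValueError("a dead row cannot be covered")
        formula.setdefault(best[0], set()).add(best[1])
        uncovered -= cov[best]
    return formula


def hcv_sketch(positives, nem):
    """Step 3: the rule set is the OR of one conjunction per partition."""
    return [greedy_cover(m) for m in partitions(positives, nem)]


def render(formula):
    return " ^ ".join("[ %s != { %s } ]" % (ATTRIBUTES[j], ", ".join(sorted(vs)))
                      for j, vs in sorted(formula.items()))


rules = hcv_sketch(POSITIVES, NEM)
for rule in rules:
    print(render(rule), "--> Play")
```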


HCV Software

• Features many refinements and switches.

• Works with C4.5 data.

• Can be run through a web interface: HCV Online Interface

• Is described in Appendix A of Wu’s textbook, and online: HCV Manual


Golf

Rules for the 'Play' class (Covering 3 examples):
The 1st conjunctive rule:
[ temperature != { cool } ] ^ [ outlook != { sunny } ] --> the 'Play' class (Positive examples covered: 3)

Rules for the 'Don't_Play' class (Covering 4 examples):
The 2nd conjunctive rule:
[ outlook != { overcast } ] ^ [ wind = { windy } ] --> the 'Don't_Play' class (Positive examples covered: 4)

The total number of conjunctive rules is: 2
The default class is: 'Don't_Play' (Examples in class: 4)
Time taken for induction (seconds): 0.0 (real), 0.0 (user), 0.0 (system)
Rule file or preprocessed test file not found. Skipping deduction
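A small script can check the two printed rules against the examples. The check rests on three assumptions. The HCV run is taken to have used exactly the seven golf examples from the earlier slides, which fits the class sizes in the printout (three Play, four Don't_Play). The third attribute is taken to be humidity. The rule encoding is a hypothetical one, not HCV's rule-file format.

```python
ATTRIBUTES = ("outlook", "temperature", "humidity", "wind")  # "humidity" assumed

EXAMPLES = [
    (("overcast", "mild", "high", "windy"), "Play"),
    (("overcast", "mild", "normal", "calm"), "Play"),
    (("rainy", "hot", "high", "calm"), "Play"),
    (("rainy", "hot", "high", "windy"), "Don't_Play"),
    (("rainy", "cool", "normal", "windy"), "Don't_Play"),
    (("sunny", "hot", "normal", "windy"), "Don't_Play"),
    (("sunny", "mild", "high", "windy"), "Don't_Play"),
]

# Each rule is a list of (attribute, operator, values) selectors, as printed above.
PLAY_RULE = [("temperature", "!=", {"cool"}), ("outlook", "!=", {"sunny"})]
DONT_PLAY_RULE = [("outlook", "!=", {"overcast"}), ("wind", "=", {"windy"})]


def matches(rule, values):
    """True if the example satisfies every selector in the conjunction."""
    for attribute, op, allowed in rule:
        v = values[ATTRIBUTES.index(attribute)]
        if (op == "=" and v not in allowed) or (op == "!=" and v in allowed):
            return False
    return True


def coverage(rule, target):
    """(examples of the target class matched, examples of other classes matched)."""
    pos = sum(matches(rule, x) for x, y in EXAMPLES if y == target)
    neg = sum(matches(rule, x) for x, y in EXAMPLES if y != target)
    return pos, neg


print("Play:", coverage(PLAY_RULE, "Play"))
print("Don't_Play:", coverage(DONT_PLAY_RULE, "Don't_Play"))
```

Under these assumptions, the 'Play' rule covers all three Play examples, as the printout says. It also matches one Don't_Play example, (rainy, hot, high, windy), which the printout's positive-only count does not show.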


HCV

• HCV is competitive with other decision-tree and rule-producing algorithms.

• HCV generally produces more compact rules.

• HCV outputs variable-valued logic.

• HCV handles noise and discretization.

• HCV guarantees a “conjunctive rule for a concept”.


Ideas

• Can HFL/HCV be applied to chess? Bratko did this with ID3. [Crevier 1993, 177]

• How can HCV be parallelized?

• How does the extension matrix approach work in closed-world situations?

• Is HCV 2.0 a good candidate for automated parameter tuning by a genetic algorithm or another evolutionary technique?


The End.

• Presentation based on slides by Leslie Damon.

• Questions?


Exam Questions

• Definitions:

Extension Matrix: a matrix of negative examples as row vectors in which, for a given positive example, the elements that match the positive example are replaced with dead elements, denoted ‘*’.

Dead Element: an element of a negative example that cannot be used to distinguish a given positive example from that negative example.

Path: a set of non-dead elements, one from each row of an extension matrix.


Exam Questions

• Four stages of HFL:

1. Fast: find a single attribute value that covers all rows.
2. Precedence: favor selectors that are the only non-dead element of a row.
3. Elimination: get rid of redundant elements.
4. Least Frequency: get rid of the columns whose non-dead values cover the fewest rows.

See the slides labeled “HFL Strategies”.


Exam Questions

• The Pneumonia/Tuberculosis problem is worked through in the paper and in Leslie Damon’s slides. Here is the EMD:

[Slide figure: the EMD for the Pneumonia/Tuberculosis problem; its non-dead elements include the values absent, strip, hole, normal, and fast.]