
Page 1: RIPPER Fast Effective Rule Induction

RIPPER: Fast Effective Rule Induction

Machine Learning 2003

Merlin Holzapfel & Martin Schmidt

mholzapf@uos.de martisch@uos.de

Page 2: RIPPER Fast Effective Rule Induction

Rule Sets - advantages

easy to understand
usually better than decision tree learners
representable in first-order logic
– easy to implement in Prolog

prior knowledge can be added

Page 3: RIPPER Fast Effective Rule Induction

Rule Sets - disadvantages

scale poorly with training set size
problems with noisy data
– likely in real-world data

goal:
– develop a rule learner that is efficient on noisy data
– competitive with C4.5 / C4.5rules

Page 4: RIPPER Fast Effective Rule Induction

Problem with Overfitting

overfitting: the concept also covers the noisy cases
underfitting: the concept is too general

solution: pruning
– reduced error pruning (REP)
– post pruning
– pre pruning

Page 5: RIPPER Fast Effective Rule Induction

Post Pruning (C4.5)

overfit & simplify (bottom-up):
– construct a tree that overfits
– convert the tree to rules
– prune every rule separately
– sort the rules according to accuracy
– consider the order when classifying

Page 6: RIPPER Fast Effective Rule Induction

Pre Pruning

some examples are ignored during concept generation

final concept does not classify all training data correctly

can be implemented in the form of stopping criteria

Page 7: RIPPER Fast Effective Rule Induction

Reduced Error Pruning

separate and conquer:
– split the data into a training and a validation set
– construct an overfitting tree
– until further pruning would reduce accuracy:

• evaluate the impact of pruning a rule on the validation set
• remove the rule whose removal improves accuracy most
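To make the loop concrete, here is a minimal Python sketch, assuming two hypothetical helpers: accuracy(rules, data) scores a rule set on a data set, and prune_candidates(rules) enumerates every one-step simplification (e.g. a rule set with one condition or rule removed). It illustrates the greedy validation-set loop above, not Cohen's actual code.

```python
def reduced_error_prune(rules, validation_set, accuracy, prune_candidates):
    """Greedy REP loop: repeatedly apply the pruning step that most
    improves accuracy on the validation set; stop when no step helps.
    `accuracy` and `prune_candidates` are hypothetical helpers."""
    while True:
        best_score = accuracy(rules, validation_set)
        best_rules = None
        for candidate in prune_candidates(rules):  # every one-step simplification
            score = accuracy(candidate, validation_set)
            if score > best_score:                 # keep strictly better ones only
                best_score, best_rules = score, candidate
        if best_rules is None:                     # pruning would reduce accuracy
            return rules
        rules = best_rules
```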

Page 8: RIPPER Fast Effective Rule Induction

Time Complexity

REP has a time complexity of O(n⁴)
– the initial overfitting phase alone has a complexity of O(n²)

alternative concept Grow:
– faster in benchmarks
– time complexity still O(n⁴) with noisy data

Page 9: RIPPER Fast Effective Rule Induction

Incremental Reduced Error Pruning - IREP

by Fürnkranz & Widmer (1994)
competitive error rates
faster than REP and Grow

Page 10: RIPPER Fast Effective Rule Induction

How IREP Works

iterative application of REP
random split of the sets
– a bad split has a negative influence (but not as bad as with REP)

pruning immediately after a rule is grown (top-down approach)
– no overfitting

Page 11: RIPPER Fast Effective Rule Induction

Cohen's IREP Implementation

build rules until a new rule results in a too large error rate:
– divide the data (randomly) into a growing set (2/3) and a pruning set (1/3)
– grow a rule from the growing set
– immediately prune the rule:

• delete a final sequence of conditions
• repeatedly delete the condition that maximizes the function v, until no deletion improves the value of v

– add the pruned rule to the rule set
– delete every example covered by the rule (p/n)

v(Rule, prunePos, pruneNeg) = (p + (N - n)) / (P + N)

where P and N are the numbers of positive and negative examples in the pruning set, and p and n are the positives and negatives covered by the rule.
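Combining the loop and the metric, here is a minimal Python sketch of the procedure. It represents a rule as a list of predicate conditions; grow_rule(grow_pos, grow_neg) is a hypothetical helper that greedily grows an overfitting rule on the growing set. The pruning step here greedily drops the final condition, a simplification of deleting the best final sequence. This is an illustration, not Cohen's implementation.

```python
import random

def v(p, n, P, N):
    # IREP's rule-value metric on the pruning set: covered positives
    # plus excluded negatives, over all pruning examples.
    return (p + (N - n)) / (P + N) if (P + N) > 0 else 0.0

def covers(rule, x):
    # A rule is a conjunction of conditions (predicates over an example).
    return all(cond(x) for cond in rule)

def irep(pos, neg, grow_rule):
    """Sketch of Cohen's IREP loop; `grow_rule` is a hypothetical helper."""
    ruleset, pos, neg = [], list(pos), list(neg)
    while pos:
        random.shuffle(pos)
        random.shuffle(neg)
        cut_p, cut_n = 2 * len(pos) // 3, 2 * len(neg) // 3
        grow_pos, prune_pos = pos[:cut_p], pos[cut_p:]
        grow_neg, prune_neg = neg[:cut_n], neg[cut_n:]
        rule = grow_rule(grow_pos, grow_neg)
        P, N = len(prune_pos), len(prune_neg)

        def value(r):
            p = sum(covers(r, x) for x in prune_pos)
            n = sum(covers(r, x) for x in prune_neg)
            return v(p, n, P, N)

        # Immediately prune: drop the final condition while that does not worsen v.
        while len(rule) > 1 and value(rule[:-1]) >= value(rule):
            rule = rule[:-1]

        p = sum(covers(rule, x) for x in prune_pos)
        n = sum(covers(rule, x) for x in prune_neg)
        if p + n == 0 or n / (p + n) >= 0.5:
            return ruleset            # error rate of the new rule too large: stop
        ruleset.append(rule)
        # Delete every example covered by the rule.
        pos = [x for x in pos if not covers(rule, x)]
        neg = [x for x in neg if not covers(rule, x)]
    return ruleset
```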

Page 12: RIPPER Fast Effective Rule Induction

Cohen's IREP - Algorithm

Page 13: RIPPER Fast Effective Rule Induction

IREP and Multiple Classes

order the classes according to increasing prevalence (C1, ..., Ck)

– find a rule set to separate C1 from the other classes:

IREP(PosData = C1, NegData = C2, ..., Ck)

– remove all instances covered by the rule set

– find a rule set to separate C2 from C3, ..., Ck

...

– Ck remains as the default class
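A minimal sketch of this one-vs-rest scheme, assuming a two-class learner irep(pos, neg) and a predicate covers(rule, x) as hypothetical helpers:

```python
def ruleset_covers(ruleset, x, covers):
    # An example is covered if any rule in the set fires on it.
    return any(covers(rule, x) for rule in ruleset)

def irep_multiclass(data_by_class, irep, covers):
    """Sketch: `data_by_class` maps class label to its examples; classes
    are ordered by increasing prevalence, so C1 is the rarest class and
    Ck becomes the default. `irep` and `covers` are hypothetical."""
    classes = sorted(data_by_class, key=lambda c: len(data_by_class[c]))
    remaining = {c: list(xs) for c, xs in data_by_class.items()}
    ordered_rulesets = []
    for c in classes[:-1]:                    # the last class is the default
        pos = remaining.pop(c)
        neg = [x for xs in remaining.values() for x in xs]
        ruleset = irep(pos, neg)
        ordered_rulesets.append((c, ruleset))
        # Remove every instance covered by the new rule set.
        for cls in remaining:
            remaining[cls] = [x for x in remaining[cls]
                              if not ruleset_covers(ruleset, x, covers)]
    return ordered_rulesets, classes[-1]      # ordered rule sets + default class
```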

Page 14: RIPPER Fast Effective Rule Induction

IREP and Missing Attributes

handling missing attributes:

– for all tests involving an attribute A:

• if attribute A of an instance is missing, the test fails
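A one-function illustration of this convention, with instances as dicts and a hypothetical predicate standing in for the test:

```python
def test(instance, attribute, predicate):
    """Evaluate one rule condition. Following IREP's convention, any
    test involving a missing attribute fails."""
    if attribute not in instance:   # attribute A is missing: the test fails
        return False
    return predicate(instance[attribute])

# Hypothetical example: "temperature >= 30" fails when temperature is absent.
print(test({"humidity": 0.7}, "temperature", lambda t: t >= 30))  # False
```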

Page 15: RIPPER Fast Effective Rule Induction

Differences Cohen <> Original

– pruning: delete a final sequence of conditions <> delete a single final condition

– stopping condition: error rate exceeds 50% <> accuracy(rule) < accuracy(empty rule)

– application: missing attributes, numerical variables, multiple classes <> two-class problems only

Page 16: RIPPER Fast Effective Rule Induction

Time Complexity

IREP: O(m log² m), with m = number of examples (for a fixed percentage of classification noise)

Page 17: RIPPER Fast Effective Rule Induction

37 Benchmark Problems

Page 18: RIPPER Fast Effective Rule Induction

Generalization Performance

IREP performs worse on the benchmark problems than C4.5rules

won-lost-tie ratio: 11-23-3

error ratio = (error rate of IREP) / (error rate of C4.5rules)
– 1.13 excluding mushroom
– 1.52 including mushroom

Page 19: RIPPER Fast Effective Rule Induction

Improving IREP

three modifications:

– alternative metric in pruning phase

– new stopping heuristics for rule adding

– post pruning of the whole rule set (non-incremental pruning)

Page 20: RIPPER Fast Effective Rule Induction

The Rule-Value Metric

the old metric is not intuitive:
R1: p1 = 2000, n1 = 1000
R2: p2 = 1000, n2 = 1
the metric prefers R1 (for fixed P, N)

this leads to occasional failure to converge

new metric (IREP*):

v*(Rule, prunePos, pruneNeg) = (p - n) / (p + n)
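The numbers above make the contrast concrete. The following snippet, with hypothetical fixed totals P = 2000 and N = 1000 chosen to fit the example, evaluates both metrics: v barely prefers R1, while v* strongly prefers the far more precise R2.

```python
def v(p, n, P, N):
    # original IREP metric: covered positives plus excluded negatives
    return (p + (N - n)) / (P + N)

def v_star(p, n):
    # IREP* metric: precision-like score over the covered examples
    return (p - n) / (p + n)

P, N = 2000, 1000              # hypothetical totals, fixed for both rules
r1 = (2000, 1000)              # R1: p1 = 2000, n1 = 1000
r2 = (1000, 1)                 # R2: p2 = 1000, n2 = 1

print(v(*r1, P, N), v(*r2, P, N))  # 0.667 vs 0.666: v (barely) prefers R1
print(v_star(*r1), v_star(*r2))    # 0.333 vs 0.998: v* clearly prefers R2
```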

Page 21: RIPPER Fast Effective Rule Induction

Stopping Condition

the 50% heuristic often stops too soon with moderate-sized example sets

– sensitive to the 'small disjunct problem'

solution: an MDL (Minimum Description Length) heuristic

– after a rule is added, compute the total description length of the rule set and its misclassifications (DL = C + E)

– if DL is more than d bits larger than the smallest description length seen so far, stop (min(DL) + d < DL_current)

– d = 64 in Cohen's implementation
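As a minimal illustration of the stopping rule (not Cohen's code), the check is a comparison against the best description length seen so far:

```python
def should_stop(dl_history, d=64):
    """MDL stopping heuristic: stop adding rules once the current total
    description length DL = C + E exceeds the smallest DL seen so far
    by more than d bits (d = 64 in Cohen's implementation)."""
    return dl_history[-1] > min(dl_history) + d

# Hypothetical description lengths (in bits) after each added rule:
print(should_stop([500, 430, 410, 470]))  # False: 470 <= 410 + 64
print(should_stop([500, 430, 410, 480]))  # True: 480 > 410 + 64
```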

Page 22: RIPPER Fast Effective Rule Induction

IREP*

IREP* is IREP improved by the new rule-value metric and the new stopping condition

won-lost-tie ratio: 28-8-1 against IREP, 16-21-0 against C4.5rules

error ratio: 1.06 excluding mushroom (IREP: 1.13), 1.04 including mushroom (IREP: 1.52)

Page 23: RIPPER Fast Effective Rule Induction

Rule Optimization

post-prunes the rules produced by IREP*:
– the rules are considered in turn
– for each rule Ri, two alternatives are constructed:

• Ri': a new replacement rule grown from scratch

• Ri'': a revision grown from Ri

– the final rule is chosen according to MDL
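The MDL-based choice at the end of this loop can be sketched as below, with dl(ruleset) standing in as a hypothetical description-length scorer; how Ri' and Ri'' are grown is left out:

```python
def best_variant(rule, replacement, revision, ruleset, dl):
    """Pick, among the original rule Ri, its replacement Ri', and its
    revision Ri'', the variant that gives the whole rule set the
    smallest total description length (`dl` is hypothetical)."""
    def with_variant(variant):
        # The rule set with `rule` swapped out for `variant`.
        return [variant if r is rule else r for r in ruleset]
    return min((rule, replacement, revision),
               key=lambda variant: dl(with_variant(variant)))
```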

Page 24: RIPPER Fast Effective Rule Induction

RIPPER

1. IREP* is used to obtain a rule set

2. rule optimization takes place

3. IREP* is used to cover remaining positive examples

Repeated Incremental Pruning to Produce Error Reduction

Page 25: RIPPER Fast Effective Rule Induction

RIPPERk

apply steps 2 and 3 k times
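Putting the three steps together, a top-level sketch of RIPPERk, with all helpers hypothetical: irep_star learns a rule set, optimize post-prunes it, and uncovered returns the positive examples the rule set misses.

```python
def ripper_k(pos, neg, k, irep_star, optimize, uncovered):
    """Top-level RIPPERk sketch: IREP* once, then k rounds of rule
    optimization plus re-covering of the remaining positives."""
    ruleset = irep_star(pos, neg)                 # 1. obtain an initial rule set
    for _ in range(k):
        ruleset = optimize(ruleset, pos, neg)     # 2. rule optimization
        missed = uncovered(ruleset, pos)
        if missed:                                # 3. cover remaining positives
            ruleset += irep_star(missed, neg)
    return ruleset
```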

Page 26: RIPPER Fast Effective Rule Induction

RIPPER Performance

won-lost-tie ratio: 28-7-2 against IREP*

Page 27: RIPPER Fast Effective Rule Induction

Error Rates

RIPPER is clearly competitive

Page 28: RIPPER Fast Effective Rule Induction

Efficiency of RIPPERk

modifications do not change complexity

Page 29: RIPPER Fast Effective Rule Induction

Reasons for Efficiency

find a model with IREP* and then improve it:
– efficient first model with the right size
– optimization takes linear time

C4.5 has an expensive optimization/improvement process
– too large an initial model

RIPPER is especially more efficient on large noisy datasets

Page 30: RIPPER Fast Effective Rule Induction

Conclusions

IREP is an efficient rule learner for large noisy datasets but performs worse than C4.5

IREP was improved to IREP*
IREP* was improved to RIPPER
RIPPER iterated k times is RIPPERk
RIPPERk is more efficient and performs better than C4.5

Page 31: RIPPER Fast Effective Rule Induction

References

William W. Cohen (1995). Fast Effective Rule Induction.

J. Fürnkranz & G. Widmer (1994). Incremental Reduced Error Pruning.

William W. Cohen (1993). Efficient Pruning Methods.