boolean decision rules via column generation06-09... · boolean decision rules via column...

14
IBM Research AI / December 6, 2018 / © 2018 IBM Corporation Boolean Decision Rules via Column Generation Sanjeeb Dash* Oktay Günlük* Dennis Wei* IBM Research 1

Upload: others

Post on 30-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

Boolean Decision Rules via Column Generation—

Sanjeeb Dash*Oktay Günlük*Dennis Wei*

IBM Research

1

Page 2: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Problem Statement

Learn Boolean rules for binary classification• Disjunctive normal form (DNF, OR of ANDs)• Conjunctive normal form (CNF, AND of ORs)

Rules with few clauses and conditions are interpretable

Optimize accuracy vs. simplicity using integer programming (IP)

Non-smoker OR Cholesterol < 160 Blood pressure < 140AND

IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

Heart disease risk < 5%

# accounts < 5 # accounts ≥ 7 Debt > $1000ANDOR Credit risk = high

2

Page 3: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Related Models

IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

DNF Boolean rule = Decision rule setIF A THEN Y=1IF B AND C THEN Y=1IF D AND E THEN Y=1ELSE Y=0

Decision listIF A THEN Y=1ELSE IF B AND C THEN Y=1ELSE IF D AND E THEN Y=1ELSE Y=0

Decision tree

3

Page 4: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

Assume non-binary features have been binarized• Categorical: “one-hot” coding (e.g. color=red, color=blue)• Numerical: comparison with thresholds (e.g. blood pressure ≤ 130, >130)

4

Preliminaries

Page 5: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Main Challenge

IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

Exponentially many possible clauses• e.g. # accounts, # accounts AND debt, # accounts AND debt AND months since delinquency, …

Previous works limited search using heuristics

5

Page 6: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Column Generation

IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

Select clauses from exponentially large set

clause complexity costs

clause data matrix

Master IP/LP

6

Page 7: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Column Generation

IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

Solve only over small subsets

clause complexity costs

clause data matrix

Restricted Master LP

7

Page 8: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Column Generation

IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

Solve only over small subsets

clause complexity costs

clause data matrix

Restricted Master LPAugment with improving clauses (columns)

8

Page 9: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Column Generation

IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

Solve only over small subsets

clause complexity costs

clause data matrix

Restricted Master LP

Pricing IP

Augment with improving clauses (columns)

9Also generate columns using heuristic

Page 10: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Procedure and Optimality Guarantees

IBM Research AI / December 6, 2018 / © 2018 IBM Corporation 10

Improving column

generated?

Re-solve Restricted MLP

Yes

No

IPs solved using CPLEX

5 min time limit overall

Page 11: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Procedure and Optimality Guarantees

IBM Research AI / December 6, 2018 / © 2018 IBM Corporation 11

Improving column

generated?

Non-existence proven?

Solution integral?

Solution matches

lower bound?

Re-solve Restricted MLP

Obtain feasible solution to MIP

MIP solved

MLP solved

Lower bound on MIP

Weaker lower bound on MIP

Yes Yes

Yes

Yes

No No

No

Report solution and gap

No

Page 12: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Accuracy-Complexity Trade-Off

12IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

Lines connect Pareto-efficient points

Column generation (CG) dominates on 8 of 16 datasets and is close on 2 others

Page 13: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Accuracy Maximization

13IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

CG competitive with RIPPER [Cohen 1995]

CG can find simpler rules that are no less accurate(adult, bank, magic, FICO)

dataset CG BRS AM BCD RIPPER CART RF

adult 83.5 81.7 83.0 82.4 83.6 83.1 84.7

bank 90.0 87.4 90.0 89.7 89.9 89.1 88.7

gas 98.0 92.2 97.6 97.0 99.0 95.4 99.7

magic 85.3 82.5 80.7 80.3 84.5 82.8 86.6

mushroom 100.0 99.7 99.9 99.9 100.0 96.2 99.9

musk 95.6 93.3 96.9 92.1 95.9 90.1 86.2

FICO 71.7 71.2 71.2 70.9 71.8 70.9 73.1

adult 88.0 39.1 15.0 13.2 133.3 95.9

bank 9.9 13.2 6.8 2.1 56.4 3.0

gas 123.9 22.4 62.4 27.8 145.3 104.7

magic 93.0 97.2 11.5 9.0 177.3 125.5

mushroom 17.8 17.5 15.4 14.6 17.0 9.3

musk 123.9 33.9 101.3 24.4 143.4 17.0

FICO 13.3 23.2 8.7 4.8 88.1 155.0

accuracy

complexity

Page 14: Boolean Decision Rules via Column Generation06-09... · Boolean Decision Rules via Column Generation ... Column generation to efficiently search space of rules without restrictions

Conclusion

14IBM Research AI / December 6, 2018 / © 2018 IBM Corporation

Accurate and interpretable Boolean classification rules

Column generation to efficiently search space of rules without restrictions

Optimality guarantees on training set

Superior accuracy-simplicity trade-offs

Poster #79, Room 210, 10:45 – 12:45 today