Page 1: A Genetic Algorithm for Learning Bayesian Network Adjacency Matrices from Data

Ben Perry – M.S. Thesis Defense

Benjamin B. Perry
Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences
Kansas State University

http://www.kddresearch.org
http://www.cis.ksu.edu/~bbp9857

Page 2: Overview

• Bayesian Network
  – Definitions and examples
  – Inference and learning
• Genetic Algorithms
• Structure Learning Background
  – Problem
  – K2 algorithm
  – Sparse Candidate
• Improving K2: Permutation Genetic Algorithm (GASLEAK)
  – Shortcoming: greedy, sensitive to ordering
  – Permutation GA
• Master’s thesis: Adjacency Matrix GA (SLAM GA)
  – Rationale
• Evaluation with Known Bayesian Networks
• Summary

Page 3: Bayesian Belief Networks (BBNs): Definition

• Bayesian Network
  – Directed acyclic graph
  – Vertices (nodes): denote events, or states of affairs (each a random variable)
  – Edges (arcs, links): denote conditional dependencies, causalities
  – Model of conditional dependence assertions (or CI assumptions)
• Example (“Ben’s Presentation” BBN) (sprinkler)
  – X1, Sleep: Narcoleptic, Well, Bad, All-nighter
  – X2, Appearance: Good, Bad
  – X3, Memory: Elephant, Good, Bad, None
  – X4, Ben is nervous: Extremely, Yes, No
  – X5, Ben’s presentation: Good, Not so good, Failed miserably
  – P(Well, Good, Good, No, Good) = P(W) · P(G | W) · P(G | W) · P(N | G, G) · P(G | N)
• General Product (Chain) Rule for BBNs

  P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{parents}(X_i))
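The chain rule can be checked numerically on the example network. The following is a minimal sketch, not part of the original slides: the parent sets are read off the factorization in the example (Sleep is the parent of Appearance and Memory, which are the parents of Nervous, which is the parent of Presentation), and every probability value in `CPT` is made up purely for illustration.

```python
from math import prod

# Parent sets inferred from the factorization on the slide (an assumption).
PARENTS = {
    "sleep": (),
    "appearance": ("sleep",),
    "memory": ("sleep",),
    "nervous": ("appearance", "memory"),
    "presentation": ("nervous",),
}

# Hypothetical conditional probability entries; only the rows needed for the
# example query are listed. None of these numbers come from the presentation.
CPT = {
    "sleep": {(): {"well": 0.5}},
    "appearance": {("well",): {"good": 0.8}},
    "memory": {("well",): {"good": 0.6}},
    "nervous": {("good", "good"): {"no": 0.7}},
    "presentation": {("no",): {"good": 0.9}},
}


def joint_probability(assignment, parents=PARENTS, cpt=CPT):
    """Multiply P(X_i | parents(X_i)) over all variables (the BBN chain rule)."""
    return prod(
        cpt[var][tuple(assignment[p] for p in parents[var])][assignment[var]]
        for var in parents
    )


example = {
    "sleep": "well",
    "appearance": "good",
    "memory": "good",
    "nervous": "no",
    "presentation": "good",
}
p = joint_probability(example)  # 0.5 * 0.8 * 0.6 * 0.7 * 0.9
```

Each factor is looked up by the values of the variable's parents, so the joint probability needs only five small table lookups rather than one table over all five variables.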

Page 4: Graphical Models of Probability Distributions

• Idea
  – Want: model that can be used to perform inference
  – Desired properties
      • Correlations among variables
      • Ability to represent functional, logical, stochastic relationships
      • Probability of certain events
• Inference: Decision Support Problems
  – Diagnosis (medical, equipment)
  – Pattern recognition (image, speech)
  – Prediction
• Want to Learn: Most Likely Model that Generates Observed Data
  – Under certain assumptions (Causal Markovity), it has been shown that we can do it
  – Given: data D (tuples or vectors containing observed values of variables)
  – Return: directed graph (V, E) expressing target CPTs
  – NEXT: Genetic algorithms

Page 5: Genetic Algorithms

• Idea
  – Emulate natural process of survival of the fittest (Example: Roaches adapt)
  – Each generation has many diverse individuals
  – Each individual competes for the chance to survive
  – Most common approach: best individuals live to the next generation and mate
      • Produce children with traits from both parents
      • If parents are strong, children might be stronger
• Major components (operators)
  – Fitness function
  – Chromosome manipulation
      • Cross-over (Not the “John Edward” type!), mutation
• From (Educated?) Guess to Gold
  – Initial population typically random or not much better than random (bad scores)
  – Performs well with a non-deceptive search space and good genetic operators
  – Ability to escape local optima with mutations
  – Not guaranteed to get the best answer, but usually gets close
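The loop described above can be sketched in a few lines. This is a generic illustration under stated assumptions, not the implementation used in the thesis: the function names, the population size, the "top half breeds" selection rule, and the toy bit-counting problem are all hypothetical choices.

```python
import random


def run_ga(fitness, new_individual, crossover, mutate,
           pop_size=30, generations=40, elite=2, seed=0):
    """Elitist GA: the best individuals survive unchanged, the rest are bred."""
    rng = random.Random(seed)
    population = [new_individual(rng) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        survivors = population[: pop_size // 2]   # fitter half may mate
        next_population = population[:elite]      # elitism: copy the best as-is
        while len(next_population) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            next_population.append(mutate(crossover(mom, dad, rng), rng))
        population = next_population
    return max(population, key=fitness), best_per_generation


# Toy problem (an assumption for illustration): maximize the number of 1 bits.
def new_bits(rng, n=20):
    return [rng.randint(0, 1) for _ in range(n)]


def one_point_crossover(mom, dad, rng):
    cut = rng.randrange(1, len(mom))
    return mom[:cut] + dad[cut:]


def flip_mutation(bits, rng, rate=0.05):
    # Returns a new list so that elite individuals are never modified in place.
    return [b ^ 1 if rng.random() < rate else b for b in bits]


best, history = run_ga(sum, new_bits, one_point_crossover, flip_mutation)
```

Because the elite individuals are carried over untouched, the best fitness recorded in `history` can never decrease from one generation to the next, which is the behavior the "best fitness per generation" plots later in the talk rely on.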

Page 6: Learning Structure: K2 Algorithm

• Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
    FOR i ← 1 to n DO // arbitrary ordering of variables {x1, x2, …, xn}
        WHILE (Parents[xi].Size < Max-Parents) DO // find best candidate parent
            Best ← argmax j>i (P(D | xj ∪ Parents[xi])) // max Dirichlet score
            IF ((Parents[xi] + Best).Score > Parents[xi].Score) THEN Parents[xi] += Best
            ELSE BREAK // no candidate improves the score
    RETURN ({Parents[xi] | i ∈ {1, 2, …, n}})
• ALARM: A Logical Alarm Reduction Mechanism [Beinlich et al, 1989]
  – BBN model for patient monitoring in surgical anesthesia
  – Vertices (37): findings (e.g., esophageal intubation), intermediates, observables
  – K2: found BBN different in only 1 edge from gold standard (elicited from expert)

[Figure: ALARM network graph with vertices numbered 1 to 37]
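The greedy search can be sketched in Python as follows. This is an illustration under assumptions, not the code used in the thesis: the score is the log of the Cooper-Herskovits (K2) metric, values are assumed to be coded 0..r-1, and candidate parents are taken from the variables that precede a node in the ordering (the usual statement of K2; the slide writes its index convention as j > i). The toy data set is made up.

```python
from collections import Counter
from math import lgamma


def log_k2_score(data, child, parents, arity):
    """Log of the Cooper-Herskovits (K2) metric for one node and one parent set."""
    r = arity[child]
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    per_config = Counter(tuple(row[p] for p in parents) for row in data)
    score = 0.0
    # Parent configurations that never occur contribute 0, so they are skipped.
    for config, n_ij in per_config.items():
        score += lgamma(r) - lgamma(n_ij + r)
        score += sum(lgamma(joint[(config, value)] + 1) for value in range(r))
    return score


def k2(data, order, arity, max_parents):
    """Greedily add the best-scoring parent to each node until nothing improves."""
    parents = {x: [] for x in order}
    for position, x in enumerate(order):
        candidates = list(order[:position])   # only predecessors in the ordering
        best = log_k2_score(data, x, parents[x], arity)
        while candidates and len(parents[x]) < max_parents:
            scored = [(log_k2_score(data, x, parents[x] + [c], arity), c)
                      for c in candidates]
            top_score, top = max(scored)
            if top_score <= best:
                break                         # no candidate improves the score
            parents[x].append(top)
            candidates.remove(top)
            best = top_score
    return parents


# Toy data (hypothetical): variable 1 copies variable 0, variable 2 is unrelated.
rows = [(a, a, c) for a in (0, 1) for c in (0, 1)] * 10
structure = k2(rows, order=[0, 1, 2], arity={0: 2, 1: 2, 2: 2}, max_parents=2)
```

Restricting candidates to predecessors in the ordering is what keeps the result acyclic without any explicit cycle check, and it is also why the output depends so strongly on the ordering supplied.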

Page 7: Learning Structure: K2 Downfalls

• Greedy (may fall into local maxima)
• Highly dependent upon node ordering
• Optimal node ordering must be given
• If the optimal ordering is already known, an expert could probably create the network
• The number of possible node orderings grows factorially (n! for n nodes)
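To make the size of the ordering space concrete, here is a small worked calculation for the two networks whose sizes are stated in this talk (8 variables for Asia, 37 vertices for ALARM). The helper name is hypothetical.

```python
from math import factorial


def count_orderings(n_nodes):
    """Number of distinct node orderings (permutations) of n_nodes variables."""
    return factorial(n_nodes)


asia_orderings = count_orderings(8)     # small enough to enumerate exhaustively
alarm_orderings = count_orderings(37)   # far too many to enumerate
```

Exhaustively scoring every ordering is feasible for Asia but not for ALARM, which is the motivation for searching the ordering space with a GA instead.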

Page 8: Learning Structure: Sparse Candidate

• General Idea:
  – Inspect the k best parent candidates at a time (K2 only inspects one)
  – k is typically very small: 5 ≤ k ≤ 15
  – Exponential in k
• Algorithm:
    Loop until no improvement or the iteration limit is exceeded:
        For each node, select the top k parent candidates (mutual information or m_disc) [Restrict phase]
        Build a network by manipulating parents (add, remove, or reverse edges from each node’s candidate set). Only accept changes that maximize the network score (Minimum Description Length) [Maximize phase]
• Must handle cycles, which is expensive
  – K2 gives this to us for free
  – Next: Improving K2
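The Restrict phase can be illustrated with empirical mutual information as the ranking measure. This is a minimal sketch of that one step only, assuming discrete data stored as tuples; the function names and the toy data are hypothetical, and the Maximize phase is not shown.

```python
from collections import Counter
from math import log


def mutual_information(data, a, b):
    """Empirical mutual information (in nats) between columns a and b."""
    n = len(data)
    count_a = Counter(row[a] for row in data)
    count_b = Counter(row[b] for row in data)
    count_ab = Counter((row[a], row[b]) for row in data)
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def restrict(data, variables, k):
    """For each variable, keep the k other variables with the highest MI."""
    return {
        x: sorted(
            (v for v in variables if v != x),
            key=lambda v: mutual_information(data, x, v),
            reverse=True,
        )[:k]
        for x in variables
    }


# Toy data (hypothetical): column 1 copies column 0, column 2 is unrelated.
rows = [(a, a, c) for a in (0, 1) for c in (0, 1)] * 10
candidates = restrict(rows, variables=[0, 1, 2], k=1)
```

Keeping only k candidates per node is what bounds the search in the Maximize phase, at the risk of discarding a true parent whose pairwise dependence looks weak.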

Page 9: GASLEAK: A Permutation GA for Variable Ordering

GASLEAK: Genetic Algorithm for Structure Learning from Evidence, AIS, and K2

[Figure: GASLEAK block diagram with the following labeled components]

• D: Training Data
  – Dtrain (Structure Learning)
  – Dval (Inference)
• I_e: Evidence Specification
• [1] Permutation Genetic Algorithm
  – α: Candidate Ordering
  – α̂: Optimized Ordering
• [2] Representation Evaluator for Bayesian Network Structure Learning Problems
  – f(α): Ordering Fitness

Page 10: Properties of the Genetic Algorithm

• Elitist
• Chromosome representation
  – Integer permutation ordering
  – Sample chromosome in a BBN of 5 nodes might look like: 3 1 2 0 4
• Seeding
  – Random shuffle
• Operators
  – Order crossover
  – Swap mutation
• Fitness
  – RMSE
• Job farm
  – Java-based; utilizes many machines regardless of OS
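The two permutation operators named above can be sketched as follows. This is an illustration rather than the GASLEAK code: the crossover shown is a simplified order-crossover variant that fills the open positions left to right (the classic operator starts filling after the second cut point), and all names are hypothetical.

```python
import random


def order_crossover(mom, dad, rng):
    """Copy a slice from mom, then fill the gaps with dad's genes in dad's order."""
    n = len(mom)
    lo, hi = sorted(rng.sample(range(n + 1), 2))
    child = [None] * n
    child[lo:hi] = mom[lo:hi]
    kept = set(mom[lo:hi])
    fill = [gene for gene in dad if gene not in kept]
    for i in list(range(0, lo)) + list(range(hi, n)):
        child[i] = fill.pop(0)
    return child


def swap_mutation(ordering, rng):
    """Exchange two randomly chosen positions; the input list is left unchanged."""
    i, j = rng.sample(range(len(ordering)), 2)
    mutated = ordering[:]
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


rng = random.Random(0)
mom = [3, 1, 2, 0, 4]        # the sample chromosome from the slide
dad = [0, 1, 2, 3, 4]
child = order_crossover(mom, dad, rng)
mutant = swap_mutation(mom, rng)
```

Both operators always return a valid permutation, so every chromosome remains a legal node ordering that can be handed to K2.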

Page 11: GASLEAK Results

[Figure: Frequency of Validation Set Fitness. Histogram of estimated fitness for all 8! = 40320 permutations of Asia variables; frequency axis from 0 to 1400, fitness values from 0.802 to 0.996.]

• Not encouraging
  – Bad fitness function or bad evidence b.v.
  – Many graph errors

Page 12: Master’s Thesis: SLAM GA

• SLAM GA – Structure Learning Adjacency Matrix Genetic Algorithm
• Initial population: tried several approaches
  – Completely random Bayesian networks (Box-Muller, max parents)
      • Many illegal structures; wrote fixCycles algorithm
  – Random networks generated from parents pre-selected by the Restrict phase of Sparse Candidate
      • Performed better than random
  – Aggregate of k learned networks from K2 given random orderings (cycles eliminated)
      • Best approach
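The slides do not describe how fixCycles works. One simple way to repair an illegal structure, shown here purely as an assumed stand-in, is to run a depth-first search and delete every back edge it encounters. The convention `adj[i][j] == 1` meaning "i is a parent of j" and both function names are hypothetical.

```python
def fix_cycles(adj):
    """Return a copy of adj with every DFS back edge removed, leaving a DAG."""
    n = len(adj)
    adj = [row[:] for row in adj]
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n

    def visit(u):
        color[u] = GRAY
        for v in range(n):
            if adj[u][v]:
                if color[v] == GRAY:
                    adj[u][v] = 0      # edge back into the current path: drop it
                elif color[v] == WHITE:
                    visit(v)
        color[u] = BLACK

    for u in range(n):
        if color[u] == WHITE:
            visit(u)
    return adj


def is_acyclic(adj):
    """Kahn's algorithm: a graph is a DAG if every node can be peeled off."""
    n = len(adj)
    indegree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ready = [j for j in range(n) if indegree[j] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in range(n):
            if adj[u][v]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return removed == n


cyclic = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # 0 -> 1 -> 2 -> 0
repaired = fix_cycles(cyclic)
```

Which edge gets removed depends on the visiting order, so a repair of this kind is arbitrary with respect to the score; a score-aware choice of edge would be a natural refinement.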

Page 13: Aggregator Instantiater

[Figure: the K2 Manager runs K2 on the training data D under k random orderings (1, 2, …, k); each run produces a BBN, and the Aggregator combines them into an aggregate BBN.]

For small networks, k=1 is best. For larger networks, k=2 is best.
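The data flow in the diagram can be sketched as below. How the Aggregator combines the k networks is not stated on the slide; this sketch assumes an edge-wise union, and it uses a trivial stand-in learner in place of K2 so that the example is self-contained. All names are hypothetical, and the cycle-elimination step that follows aggregation is not shown.

```python
import random


def aggregate(networks):
    """Edge-wise union of adjacency matrices (an assumed aggregation rule)."""
    n = len(networks[0])
    return [
        [int(any(net[i][j] for net in networks)) for j in range(n)]
        for i in range(n)
    ]


def instantiate(learn_structure, n_nodes, k, rng):
    """Run the structure learner on k random orderings and merge the results."""
    networks = []
    for _ in range(k):
        order = list(range(n_nodes))
        rng.shuffle(order)
        networks.append(learn_structure(order))
    return aggregate(networks)


def chain_learner(order):
    """Stand-in for K2: links consecutive variables of the ordering into a chain."""
    n = len(order)
    adj = [[0] * n for _ in range(n)]
    for parent, child in zip(order, order[1:]):
        adj[parent][child] = 1
    return adj


seed_network = instantiate(chain_learner, n_nodes=5, k=2, rng=random.Random(0))
```

A union of networks learned under different orderings can contain cycles (for example, 0 → 1 from one run and 1 → 0 from another), which is why the aggregate has to be repaired before it seeds the population.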

Page 14: SLAM GA

• Chromosome representation
  – Edge matrix: n^2 bits
  – Each bit represents a parent edge to a node
  – 1 = parent, 0 = not parent
• Operators
  – Crossover: Swap parents, fix cycles.
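The edge-matrix chromosome can be illustrated with a small encode/decode pair. This is a sketch under assumptions: the row-major bit layout and the convention that `adj[i][j] == 1` means "node i is a parent of node j" are choices made here, not details given on the slide.

```python
def encode(adj):
    """Flatten an n x n parent matrix into a string of n^2 bits (row-major)."""
    return "".join(str(bit) for row in adj for bit in row)


def decode(bits, n):
    """Rebuild the n x n matrix from its n^2-bit string."""
    return [[int(bits[i * n + j]) for j in range(n)] for i in range(n)]


def parents_of(adj, node):
    """List the parents of `node` by reading its column of the matrix."""
    return [i for i in range(len(adj)) if adj[i][node]]


network = [
    [0, 1, 1],   # node 0 is a parent of nodes 1 and 2
    [0, 0, 1],   # node 1 is a parent of node 2
    [0, 0, 0],
]
chromosome = encode(network)
```

With this layout a node's whole parent set is a single column, which is convenient for an operator that exchanges parent sets between two individuals.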

Page 15: SLAM GA: Crossover
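The crossover figure did not survive in this transcript. One plausible reading of "swap parents, fix cycles" is sketched here as an assumption: for every node, the child inherits that node's entire parent set from one of the two individuals, and the result is then passed to a cycle-repair step. The function name and the `repair` parameter are hypothetical, and `repair` defaults to doing nothing so that the example stays self-contained.

```python
import random


def swap_parents_crossover(mom, dad, rng, repair=lambda adj: adj):
    """Build a child whose parent set for each node comes from mom or dad."""
    n = len(mom)
    child = [[0] * n for _ in range(n)]
    for node in range(n):
        donor = mom if rng.random() < 0.5 else dad
        for i in range(n):
            child[i][node] = donor[i][node]   # copy the whole parent column
    return repair(child)                      # combined columns may form a cycle


mom = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]   # 0 -> 1, 0 -> 2, 1 -> 2
dad = [[0, 0, 0], [1, 0, 0], [1, 1, 0]]   # 1 -> 0, 2 -> 0, 2 -> 1
child = swap_parents_crossover(mom, dad, random.Random(0))
```

Copying whole parent sets keeps each node's local structure intact, so the per-node score of an inherited column is unchanged unless the repair step has to delete one of its edges.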

Page 16: SLAM GA

• Chromosome representation
  – Edge matrix: n^2 bits
  – Each bit represents a parent edge to a node
  – 1 = parent, 0 = not parent
• Operators
  – Crossover: Swap parents, fix cycles.
  – Mutation: Reverse, delete, or add a random number of edges. Fix cycles.
• Fitness
  – Total Bayesian Dirichlet equivalence score for each node
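The mutation operator can be sketched as a random number of edge edits. This is an illustration, not the thesis code: the cap `max_changes`, the uniform choice among the three edit types, and the guard against two-node cycles are assumptions made here. Longer cycles are not checked, so a cycle-repair step would still follow.

```python
import random


def mutate(adj, rng, max_changes=3):
    """Apply between 1 and max_changes random add/delete/reverse edge edits."""
    n = len(adj)
    adj = [row[:] for row in adj]             # never modify the input in place
    for _ in range(rng.randint(1, max_changes)):
        i, j = rng.sample(range(n), 2)        # distinct nodes, so no self-loops
        op = rng.choice(("add", "delete", "reverse"))
        if op == "add":
            if not adj[j][i]:                 # skip if it would make i <-> j
                adj[i][j] = 1
        elif op == "delete":
            adj[i][j] = 0
        elif adj[i][j]:                       # reverse only an existing edge
            adj[i][j], adj[j][i] = 0, 1
    return adj


chain = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
mutant = mutate(chain, random.Random(0))
```

Edits that happen to be no-ops (deleting a missing edge, reversing a non-edge) are simply skipped, so an individual may occasionally come back unchanged.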

Page 17: Results - Asia

[Figure: three network diagrams]

• Best of first generation: 15 graph errors
• Actual
• Learned network: 1 graph error
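The slides report graph errors without defining them. A common way to count them, assumed here for illustration only, is to compare the learned and actual networks pair by pair and count one error for every pair of nodes whose edge is missing, extra, or reversed. The function name and the example matrices are hypothetical.

```python
def graph_errors(learned, gold):
    """Count node pairs whose edge differs (missing, extra, or reversed)."""
    n = len(gold)
    errors = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (learned[i][j], learned[j][i]) != (gold[i][j], gold[j][i]):
                errors += 1
    return errors


gold = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]      # 0 -> 1 -> 2
learned = [[0, 0, 1], [1, 0, 1], [0, 0, 0]]   # 1 -> 0 (reversed), 0 -> 2 (extra)
errors = graph_errors(learned, gold)
```

Under this definition a reversed edge costs one error rather than two, which matters when comparing counts across papers that use different conventions.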

Page 18: Results – Asia

[Figure: Best fitness per generation. x-axis: Generation (1 to 97); y-axis: Best Fitness of Generation (3300 to 3750); series: K2x1, K2x2, Rnd.]

Page 19: Results - Poker

[Figure: three network diagrams]

• Best of first generation: 11 graph errors
• Actual
• Learned network: 2 graph errors

Page 20: Results - Poker

[Figure: Best fitness per generation. x-axis: Generations (1 to 97); y-axis: Best Fitness of Generation (0 to 2500); series: K2x1, K2x2, Rnd.]

Page 21: Results - Golf

[Figure: three network diagrams]

• Best of first generation: 11 graph errors
• Actual
• Learned network: 4 graph errors

Page 22: Results - Golf

[Figure: Best fitness per generation. x-axis: Generation (1 to 97); y-axis: Best Fitness of Generation (0 to 3500); series: K2x1, K2x2, Rnd.]

Page 23: Results – Boerlage92

[Figure: three network diagrams: Initial, Actual, Learned network]

Page 24: Results - Boerlage92

[Figure: Boerlage92. x-axis: Generation (1 to 96); y-axis: Best Fitness of Generation (0 to 1600); series: K2x1, K2x2, Rnd.]

Page 25: Results - Alarm

[Figure: Best network per generation. x-axis: Generation (1 to 97); y-axis: Best Fitness of Generation (0 to 8000); series: K2x1, K2x2, Rnd.]

Page 26: Final Fitness Values

          Asia        Poker      Golf       Boerlage92   Alarm
K2x1      3722.084    1999.395   3081.16    1228.621     5006.827
K2x2      3720.6069   2011.54    3220.985   1429.355     7095.658
Random    3722.249    2001.884   3214.614   1459.587     6861.285

Page 27: K2 vs. SLAM GA

• K2:
  – Very good if ordering is known
      • Ordering is often not known
  – Greedy, very dependent on ordering
• SLAM GA
  – Stochastic; can escape the local optima trap
  – Can improve on bad structures learned by K2
  – Takes much longer than K2

Page 28: GASLEAK vs. SLAM GA

• GASLEAK:
  – Gold network never recovered
  – Much more computationally expensive
      • K2 is run on each [new] individual each generation
      • Each chromosome must be scored
  – Final network has many graph errors
• SLAM GA
  – For small networks, gold standard network often recovered
  – Relatively few graph errors for final network
  – Less computationally intensive
      • Initial population most expensive
      • Each chromosome must be scored

Page 29: SLAM GA: Ramifications

• Effective structure learning algorithm
  – Ideal for small networks
• Improvement over GASLEAK
  – SLAM GA faster in spite of same GA parameters
  – SLAM GA more accurate
• Improvement over K2
• Aggregate algorithm produces better initial population
• Parent-swapping crossover technique effective
  – Diversifies search space while retaining past information

Page 30: SLAM GA: Future Work

• Parameter tweaking
• Better fitness function
  – Several ‘bad’ structures score better than gold standard
  – GA works fine
• ‘Intelligent’ mutation operator
  – Add edges from pre-qualified set of candidate parents
• New instantiation methods
  – Use GASLEAK
  – Other structure-learning algorithms
• Scalability
  – Job farm

Page 31: Summary

• Bayesian Network
• Genetic Algorithms
• Learning Structure: K2, Sparse Candidate
• GASLEAK
• SLAM GA