Kansas State University
Department of Computing and Information Sciences

A Genetic Algorithm for Learning Bayesian Network Adjacency Matrices from Data

Benjamin B. Perry – M.S. Thesis Defense
Laboratory for Knowledge Discovery in Databases
Kansas State University
http://www.kddresearch.org
http://www.cis.ksu.edu/~bbp9857
Overview

• Bayesian Networks
  – Definitions and examples
  – Inference and learning
• Genetic Algorithms
• Structure Learning Background
  – Problem
  – K2 algorithm
  – Sparse Candidate
• Improving K2: Permutation Genetic Algorithm (GASLEAK)
  – Shortcoming: greedy, sensitive to ordering
  – Permutation GA
• Master's thesis: Adjacency Matrix GA (SLAM GA)
  – Rationale
• Evaluation with Known Bayesian Networks
• Summary
Bayesian Belief Networks (BBNs): Definition

• Bayesian Network
  – Directed acyclic graph
  – Vertices (nodes): denote events or states of affairs (each a random variable)
  – Edges (arcs, links): denote conditional dependencies, causalities
  – Model of conditional dependence assertions (or CI assumptions)
• Example ("Ben's Presentation" BBN)
  [Figure: five-node BBN X1–X5]
  – X1 Sleep: Narcoleptic, Well, Bad, All-nighter
  – X2 Appearance: Good, Bad
  – X3 Memory: Elephant, Good, Bad, None
  – X4 Ben is nervous: Extremely, Yes, No
  – X5 Ben's presentation: Good, Not so good, Failed miserably
  P(Well, Good, Good, No, Good) = P(W) · P(G | W) · P(G | W) · P(N | G, G) · P(G | N)
• General Product (Chain) Rule for BBNs
  P(X1, X2, …, Xn) = Π_{i=1}^{n} P(Xi | parents(Xi))
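The chain rule above can be sketched as a product of CPT lookups. The probability values below are invented for illustration; only the factorization structure (which node conditions on which parents) comes from the example network.

```python
# Minimal sketch of the BBN chain rule for the "Ben's Presentation" example.
# All CPT numbers are made up; only the parent structure follows the slide.

cpt = {
    "Sleep":        {("Well",): 0.6},               # P(Sleep = Well)
    "Appearance":   {("Good", "Well"): 0.8},        # P(Appearance = Good | Sleep = Well)
    "Memory":       {("Good", "Well"): 0.7},        # P(Memory = Good | Sleep = Well)
    "Nervous":      {("No", "Good", "Good"): 0.9},  # P(Nervous = No | App = G, Mem = G)
    "Presentation": {("Good", "No"): 0.75},         # P(Presentation = Good | Nervous = No)
}

def joint(factors):
    """Multiply P(Xi | parents(Xi)) over every node, per the chain rule."""
    p = 1.0
    for node, key in factors:
        p *= cpt[node][key]
    return p

p = joint([
    ("Sleep", ("Well",)),
    ("Appearance", ("Good", "Well")),
    ("Memory", ("Good", "Well")),
    ("Nervous", ("No", "Good", "Good")),
    ("Presentation", ("Good", "No")),
])
print(round(p, 4))  # 0.6 * 0.8 * 0.7 * 0.9 * 0.75 = 0.2268
```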
Graphical Models of Probability Distributions

• Idea
  – Want: a model that can be used to perform inference
  – Desired properties:
    • Correlations among variables
    • Ability to represent functional, logical, stochastic relationships
    • Probability of certain events
• Inference: decision support problems
  – Diagnosis (medical, equipment)
  – Pattern recognition (image, speech)
  – Prediction
• Want to learn: the most likely model that generates the observed data
  – Under certain assumptions (Causal Markovity), it has been shown that we can do this
  – Given: data D (tuples or vectors containing observed values of variables)
  – Return: directed graph (V, E) expressing target CPTs
  – Next: genetic algorithms
Genetic Algorithms

• Idea
  – Emulate the natural process of survival of the fittest (example: roaches adapt)
  – Each generation has many diverse individuals
  – Each individual competes for the chance to survive
  – Most common approach: the best individuals live to the next generation and mate
  – Produce children with traits from both parents
  – If the parents are strong, the children might be stronger
• Major components (operators)
  – Fitness function
  – Chromosome manipulation: crossover (not the "John Edward" type!), mutation
• From (educated?) guess to gold
  – Initial population typically random or not much better than random – bad scores
  – Performs well with a non-deceptive search space and good genetic operators
  – Ability to escape local optima with mutations
  – Not guaranteed to find the best answer, but usually gets close
Learning Structure: K2 Algorithm

• Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
  FOR i ← 1 to n DO                              // given an ordering of variables {x1, x2, …, xn}
    WHILE (Parents[xi].Size < Max-Parents) DO    // find best candidate parent
      Best ← argmax_{j<i} (P(D | xj ∪ Parents[xi]))   // max Dirichlet score
      IF ((Parents[xi] + Best).Score > Parents[xi].Score) THEN Parents[xi] += Best
      ELSE break
  RETURN ({Parents[xi] | i ∈ {1, 2, …, n}})
• A Logical Alarm Reduction Mechanism (ALARM) [Beinlich et al., 1989]
  – BBN model for patient monitoring in surgical anesthesia
  – Vertices (37): findings (e.g., esophageal intubation), intermediates, observables
  – K2: found a BBN differing in only 1 edge from the gold standard (elicited from an expert)
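The greedy parent search of K2 can be sketched as follows. The scoring function is a stand-in (the real K2 uses the Bayesian Dirichlet / CH score); the ordering, max-parents bound, and greedy stop-on-no-improvement rule follow the pseudocode.

```python
# Sketch of K2's greedy parent search. `score(node, parents, data)` is a
# placeholder for the Bayesian Dirichlet score used by the real algorithm.

def k2(order, data, max_parents, score):
    parents = {x: [] for x in order}
    for i, x in enumerate(order):
        candidates = set(order[:i])          # only earlier nodes may be parents
        best = score(x, parents[x], data)
        while len(parents[x]) < max_parents and candidates:
            # best candidate parent to add next
            y = max(candidates, key=lambda c: score(x, parents[x] + [c], data))
            s = score(x, parents[x] + [y], data)
            if s > best:                     # keep only score-improving additions
                parents[x].append(y)
                candidates.remove(y)
                best = s
            else:
                break                        # greedy: stop at first non-improvement
    return parents

# Toy run with a hand-made score that rewards the single edge A -> B.
toy_score = lambda x, ps, d: 1.0 if (x == "B" and ps == ["A"]) else 0.0
result = k2(["A", "B"], None, 1, toy_score)
print(result)  # {'A': [], 'B': ['A']}
```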
[Figure: the 37-node ALARM network.]
Learning Structure: K2 Downfalls

• Greedy (may fall into local maxima)
• Highly dependent upon node ordering
• Optimal node ordering must be given
• If the optimal order is already known, an expert could probably create the network
• The number of possible node orderings is n! (worse than exponential)
Learning Structure: Sparse Candidate

• General idea
  – Inspect the k best parent candidates at a time (K2 inspects only one)
  – k is typically very small (5 ≤ k ≤ 15)
  – Exponential in k
• Algorithm
  Loop until no improvement or the iteration limit is exceeded:
    [Restrict phase] For each node, select the top k parent candidates (by mutual information or M_disc)
    [Maximize phase] Build a network by manipulating parents (add, remove, reverse from each node's candidate set), accepting only changes that improve the network score (Minimum Description Length)
• Must handle cycles – expensive
  – K2 gives this to us for free
  – Next: improving K2
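The Restrict phase above can be sketched with empirical mutual information as the relevance measure. This is a simplified, single-pass version; the full algorithm re-restricts candidates using the current network between iterations.

```python
# Sketch of Sparse Candidate's Restrict phase: for each variable, keep the k
# other variables with highest empirical mutual information as parent candidates.
from collections import Counter
from math import log

def mutual_info(xs, ys):
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def restrict(data, k):
    """data: dict variable -> list of observed values (columns of D)."""
    cands = {}
    for x in data:
        others = [y for y in data if y != x]
        others.sort(key=lambda y: mutual_info(data[x], data[y]), reverse=True)
        cands[x] = others[:k]
    return cands

# Toy data: B exactly copies A, C is unrelated to both.
D = {"A": [0, 0, 1, 1], "B": [0, 0, 1, 1], "C": [0, 1, 0, 1]}
cands = restrict(D, 1)
print(cands)
```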
GASLEAK: A Permutation GA for Variable Ordering

[Figure: system diagram. [1] A permutation genetic algorithm (genetic algorithm for structure learning from evidence, AIS, and K2) proposes candidate orderings α, receives ordering fitnesses f(α), and converges on an optimized ordering α̂. [2] A representation evaluator for Bayesian network structure learning problems computes f(α) from training data D and an evidence specification e, splitting D into Dtrain (structure learning) and Dval (inference).]
Properties of the Genetic Algorithm

• Elitist
• Chromosome representation
  – Integer permutation ordering
  – A sample chromosome for a 5-node BBN might look like: 3 1 2 0 4
• Seeding
  – Random shuffle
• Operators
  – Order crossover
  – Swap mutation
• Fitness
  – RMSE
• Job farm
  – Java-based; utilizes many machines regardless of OS
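The two permutation operators named above can be sketched directly. This is a generic order-crossover (OX) and swap-mutation pair over integer orderings; the cut points and random choices are illustrative.

```python
# Sketch of order crossover (OX) and swap mutation for permutation chromosomes
# such as the ordering 3 1 2 0 4.
import random

def order_crossover(p1, p2, lo, hi):
    """Copy p1[lo:hi] into the child, then fill the rest in p2's order."""
    child = [None] * len(p1)
    child[lo:hi] = p1[lo:hi]
    fill = [g for g in p2 if g not in p1[lo:hi]]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def swap_mutation(perm, rng=random):
    """Exchange the genes at two random positions."""
    i, j = rng.sample(range(len(perm)), 2)
    perm = list(perm)
    perm[i], perm[j] = perm[j], perm[i]
    return perm

child = order_crossover([3, 1, 2, 0, 4], [0, 1, 2, 3, 4], 1, 3)
print(child)  # [0, 1, 2, 3, 4]: segment [1, 2] kept, rest filled in p2's order
```

Both operators always yield a valid permutation, which is why they are preferred over naive bit-style crossover for ordering problems.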
GASLEAK Results

[Figure: frequency of validation-set fitness. Histogram of estimated fitness (roughly 0.80–1.00) for all 8! = 40,320 permutations of the Asia variables.]

• Not encouraging
  – Bad fitness function or bad evidence b.v.
  – Many graph errors
Master's Thesis: SLAM GA

• SLAM GA – Structure Learning Adjacency Matrix Genetic Algorithm
• Initial population – tried several approaches:
  – Completely random Bayesian networks (Box-Muller, max parents)
    • Many illegal structures; wrote the fixCycles algorithm
  – Random networks generated from parents pre-selected by the Restrict phase of Sparse Candidate
    • Performed better than random
  – Aggregate of k networks learned by K2 given random orderings (cycles eliminated)
    • Best approach
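The slide names a fixCycles routine for repairing illegal (cyclic) random structures. Its exact repair rule is not given in this transcript; the sketch below shows one simple possibility: find a back edge by depth-first search and delete it, repeating until the adjacency matrix is acyclic.

```python
# Hypothetical sketch of a cycle-repair routine in the spirit of fixCycles:
# repeatedly locate a directed cycle via DFS and drop its closing back edge.

def find_cycle_edge(adj):
    """adj[u][v] == 1 means an edge u -> v. Return a back edge, or None."""
    n = len(adj)
    color = [0] * n                     # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(u):
        color[u] = 1
        for v in range(n):
            if adj[u][v]:
                if color[v] == 1:
                    return (u, v)       # back edge: closes a cycle
                if color[v] == 0:
                    e = dfs(v)
                    if e:
                        return e
        color[u] = 2
        return None
    for s in range(n):
        if color[s] == 0:
            e = dfs(s)
            if e:
                return e
    return None

def fix_cycles(adj):
    e = find_cycle_edge(adj)
    while e:
        adj[e[0]][e[1]] = 0             # break the cycle by dropping the edge
        e = find_cycle_edge(adj)
    return adj

cyclic = [[0, 1, 0],   # edges: 0 -> 1
          [0, 0, 1],   #        1 -> 2
          [1, 0, 0]]   #        2 -> 0  (a 3-cycle)
fix_cycles(cyclic)
print(cyclic)  # one edge of the cycle has been removed
```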
Aggregator Instantiater

For small networks, k = 1 is best. For larger networks, k = 2 is best.

[Figure: the K2 Manager feeds training data D to k instances of K2 (1, 2, …, k), each given a random order; an Aggregator combines the k learned BBNs into a single aggregate BBN.]
SLAM GA

• Chromosome representation
  – Edge (adjacency) matrix – n² bits
  – Each bit represents a parent edge to a node: 1 = parent, 0 = not a parent
• Operators
  – Crossover: swap parents, fix cycles
SLAM GA: Crossover
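The crossover figure did not survive transcription. As a minimal sketch of the parent-swapping idea: each node in the child inherits its whole parent set (one row of the adjacency matrix) from one of the two parent chromosomes. The 50/50 per-row selection rule here is an assumption for illustration; cycle repair (fixCycles) follows in the real operator.

```python
# Hypothetical sketch of parent-swapping crossover on adjacency matrices.
# Row i of a matrix encodes the parent set of node i.
import random

def crossover(mat_a, mat_b, rng=random):
    """Child row i (parents of node i) is copied from mat_a or mat_b."""
    return [list(ra) if rng.random() < 0.5 else list(rb)
            for ra, rb in zip(mat_a, mat_b)]

a = [[0, 1], [0, 0]]   # network A: node 0 has parent 1
b = [[0, 0], [1, 0]]   # network B: node 1 has parent 0
child = crossover(a, b)
print(child)
```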
SLAM GA

• Chromosome representation
  – Edge (adjacency) matrix – n² bits
  – Each bit represents a parent edge to a node: 1 = parent, 0 = not a parent
• Operators
  – Crossover: swap parents, fix cycles
  – Mutation: reverse, delete, or add a random number of edges; fix cycles
• Fitness
  – Total Bayesian Dirichlet equivalence score over all nodes
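The mutation operator above can be sketched on a single edge. The real operator touches a random number of edges and runs cycle repair afterwards; this simplified single-edge version shows the three edit types.

```python
# Sketch of SLAM GA-style mutation: add, delete, or reverse one randomly
# chosen potential edge in the adjacency matrix (adj[i][j] == 1: edge i -> j).
import random

def mutate(adj, rng=random):
    n = len(adj)
    i, j = rng.sample(range(n), 2)           # a random ordered pair of nodes
    op = rng.choice(["add", "delete", "reverse"])
    if op == "add":
        adj[i][j] = 1
    elif op == "delete":
        adj[i][j] = 0
    elif adj[i][j]:                          # reverse, if the edge exists
        adj[i][j], adj[j][i] = 0, 1
    return adj

m = mutate([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
print(m)
```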
Results - Asia

[Figure: best of first generation (15 graph errors), learned network (1 graph error), and actual network.]
Results – Asia

[Figure: best fitness per generation (roughly 3300–3750) over ~100 generations, for the K2x1, K2x2, and Rnd initial populations.]
Results - Poker

[Figure: best of first generation (11 graph errors), learned network (2 graph errors), and actual network.]
Results - Poker

[Figure: best fitness per generation (0–2500) over ~100 generations, for the K2x1, K2x2, and Rnd initial populations.]
Results - Golf

[Figure: best of first generation (11 graph errors), learned network (4 graph errors), and actual network.]
Results - Golf

[Figure: best fitness per generation (0–3500) over ~100 generations, for the K2x1, K2x2, and Rnd initial populations.]
Results – Boerlage92

[Figure: initial, learned, and actual networks.]
Results - Boerlage92

[Figure: best fitness per generation (0–1600) over ~100 generations, for the K2x1, K2x2, and Rnd initial populations.]
Results - Alarm

[Figure: best network fitness per generation (0–8000) over ~100 generations, for the K2x1, K2x2, and Rnd initial populations.]
Final Fitness Values

          Asia       Poker     Golf      Boerlage92  Alarm
K2x1      3722.084   1999.395  3081.16   1228.621    5006.827
K2x2      3720.6069  2011.54   3220.985  1429.355    7095.658
Random    3722.249   2001.884  3214.614  1459.587    6861.285
K2 vs. SLAM GA

• K2
  – Very good if the ordering is known
  – The ordering is often not known
  – Greedy, very dependent on ordering
• SLAM GA
  – Stochastic; escapes the local-optima trap
  – Can improve on bad structures learned by K2
  – Takes much longer than K2
GASLEAK vs. SLAM GA

• GASLEAK
  – Gold network never recovered
  – Much more computationally expensive
    • K2 is run on each [new] individual each generation
    • Each chromosome must be scored
  – Final network has many graph errors
• SLAM GA
  – For small networks, the gold-standard network is often recovered
  – Relatively few graph errors in the final network
  – Less computationally intensive
    • Initial population is the most expensive step
    • Each chromosome must be scored
SLAM GA: Ramifications

• Effective structure-learning algorithm
  – Ideal for small networks
• Improvement over GASLEAK
  – SLAM GA is faster in spite of the same GA parameters
  – SLAM GA is more accurate
• Improvement over K2
• Aggregate algorithm produces a better initial population
• Parent-swapping crossover technique is effective
  – Diversifies the search space while retaining past information
SLAM GA: Future Work

• Parameter tweaking
• Better fitness function
  – Several 'bad' structures score better than the gold standard
  – The GA itself works fine
• 'Intelligent' mutation operator
  – Add edges from a pre-qualified set of candidate parents
• New instantiation methods
  – Use GASLEAK
  – Other structure-learning algorithms
• Scalability
  – Job farm
Summary

• Bayesian Networks
• Genetic Algorithms
• Learning Structure: K2, Sparse Candidate
• GASLEAK
• SLAM GA