Kansas State University
Department of Computing and Information Sciences

A Genetic Algorithm for Learning Bayesian Network Adjacency Matrices from Data

Benjamin B. Perry – M.S. Thesis Defense
Laboratory for Knowledge Discovery in Databases
Kansas State University
http://www.kddresearch.org
http://www.cis.ksu.edu/~bbp9857
Overview

• Bayesian Networks
  – Definitions and examples
  – Inference and learning
• Genetic Algorithms
• Structure Learning Background
  – Problem
  – K2 algorithm
  – Sparse Candidate
• Improving K2: Permutation Genetic Algorithm (GASLEAK)
  – Shortcoming: greedy, sensitive to ordering
  – Permutation GA
• Master's thesis: Adjacency Matrix GA (SLAM GA)
  – Rationale
• Evaluation with Known Bayesian Networks
• Summary
Bayesian Belief Networks (BBNs): Definition

• Bayesian Network
  – Directed acyclic graph
  – Vertices (nodes): denote events or states of affairs (each a random variable)
  – Edges (arcs, links): denote conditional dependencies, causalities
  – Model of conditional dependence assertions (or CI assumptions)
• Example ("Ben's Presentation" BBN)
  [Figure: five-node BBN X1–X5]
  – X1 Sleep: Narcoleptic, Well, Bad, All-nighter
  – X2 Appearance: Good, Bad
  – X3 Memory: Elephant, Good, Bad, None
  – X4 Ben is nervous: Extremely, Yes, No
  – X5 Ben's presentation: Good, Not so good, Failed miserably
  P(Well, Good, Good, No, Good) = P(W) · P(G | W) · P(G | W) · P(N | G, G) · P(G | N)
• General Product (Chain) Rule for BBNs
  P(X1, X2, …, Xn) = Π_{i=1}^{n} P(Xi | parents(Xi))
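The chain rule above can be sketched as a product of CPT lookups. The probability values below are invented for illustration; only the factorization structure (which node conditions on which parents) comes from the example network.

```python
# Minimal sketch of the BBN chain rule for the "Ben's Presentation" example.
# All CPT numbers are made up; only the parent structure follows the slide.

cpt = {
    "Sleep":        {("Well",): 0.6},               # P(Sleep = Well)
    "Appearance":   {("Good", "Well"): 0.8},        # P(Appearance = Good | Sleep = Well)
    "Memory":       {("Good", "Well"): 0.7},        # P(Memory = Good | Sleep = Well)
    "Nervous":      {("No", "Good", "Good"): 0.9},  # P(Nervous = No | App = G, Mem = G)
    "Presentation": {("Good", "No"): 0.75},         # P(Presentation = Good | Nervous = No)
}

def joint(factors):
    """Multiply P(Xi | parents(Xi)) over every node, per the chain rule."""
    p = 1.0
    for node, key in factors:
        p *= cpt[node][key]
    return p

p = joint([
    ("Sleep", ("Well",)),
    ("Appearance", ("Good", "Well")),
    ("Memory", ("Good", "Well")),
    ("Nervous", ("No", "Good", "Good")),
    ("Presentation", ("Good", "No")),
])
print(round(p, 4))  # 0.6 * 0.8 * 0.7 * 0.9 * 0.75 = 0.2268
```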
Graphical Models of Probability Distributions

• Idea
  – Want: a model that can be used to perform inference
  – Desired properties:
    • Correlations among variables
    • Ability to represent functional, logical, stochastic relationships
    • Probability of certain events
• Inference: decision support problems
  – Diagnosis (medical, equipment)
  – Pattern recognition (image, speech)
  – Prediction
• Want to learn: the most likely model that generates the observed data
  – Under certain assumptions (Causal Markovity), it has been shown that we can do this
  – Given: data D (tuples or vectors containing observed values of variables)
  – Return: directed graph (V, E) expressing target CPTs
  – Next: genetic algorithms
Genetic Algorithms

• Idea
  – Emulate the natural process of survival of the fittest (example: roaches adapt)
  – Each generation has many diverse individuals
  – Each individual competes for the chance to survive
  – Most common approach: the best individuals live to the next generation and mate
  – Produce children with traits from both parents
  – If the parents are strong, the children might be stronger
• Major components (operators)
  – Fitness function
  – Chromosome manipulation: crossover (not the "John Edward" type!), mutation
• From (educated?) guess to gold
  – Initial population typically random or not much better than random – bad scores
  – Performs well with a non-deceptive search space and good genetic operators
  – Ability to escape local optima with mutations
  – Not guaranteed to find the best answer, but usually gets close
Learning Structure: K2 Algorithm

• Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)
  FOR i ← 1 to n DO                              // given an ordering of variables {x1, x2, …, xn}
    WHILE (Parents[xi].Size < Max-Parents) DO    // find best candidate parent
      Best ← argmax_{j<i} (P(D | xj ∪ Parents[xi]))   // max Dirichlet score
      IF ((Parents[xi] + Best).Score > Parents[xi].Score) THEN Parents[xi] += Best
      ELSE break
  RETURN ({Parents[xi] | i ∈ {1, 2, …, n}})
• A Logical Alarm Reduction Mechanism (ALARM) [Beinlich et al., 1989]
  – BBN model for patient monitoring in surgical anesthesia
  – Vertices (37): findings (e.g., esophageal intubation), intermediates, observables
  – K2: found a BBN differing in only 1 edge from the gold standard (elicited from an expert)
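The greedy parent search of K2 can be sketched as follows. The scoring function is a stand-in (the real K2 uses the Bayesian Dirichlet / CH score); the ordering, max-parents bound, and greedy stop-on-no-improvement rule follow the pseudocode.

```python
# Sketch of K2's greedy parent search. `score(node, parents, data)` is a
# placeholder for the Bayesian Dirichlet score used by the real algorithm.

def k2(order, data, max_parents, score):
    parents = {x: [] for x in order}
    for i, x in enumerate(order):
        candidates = set(order[:i])          # only earlier nodes may be parents
        best = score(x, parents[x], data)
        while len(parents[x]) < max_parents and candidates:
            # best candidate parent to add next
            y = max(candidates, key=lambda c: score(x, parents[x] + [c], data))
            s = score(x, parents[x] + [y], data)
            if s > best:                     # keep only score-improving additions
                parents[x].append(y)
                candidates.remove(y)
                best = s
            else:
                break                        # greedy: stop at first non-improvement
    return parents

# Toy run with a hand-made score that rewards the single edge A -> B.
toy_score = lambda x, ps, d: 1.0 if (x == "B" and ps == ["A"]) else 0.0
result = k2(["A", "B"], None, 1, toy_score)
print(result)  # {'A': [], 'B': ['A']}
```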
[Figure: the 37-node ALARM network.]
Learning Structure: K2 Downfalls

• Greedy (may fall into local maxima)
• Highly dependent upon node ordering
• Optimal node ordering must be given
• If the optimal order is already known, an expert could probably create the network
• The number of possible node orderings is n! (worse than exponential)
Learning Structure: Sparse Candidate

• General idea
  – Inspect the k best parent candidates at a time (K2 inspects only one)
  – k is typically very small (5 ≤ k ≤ 15)
  – Exponential in k
• Algorithm
  Loop until no improvement or the iteration limit is exceeded:
    [Restrict phase] For each node, select the top k parent candidates (by mutual information or M_disc)
    [Maximize phase] Build a network by manipulating parents (add, remove, reverse from each node's candidate set), accepting only changes that improve the network score (Minimum Description Length)
• Must handle cycles – expensive
  – K2 gives this to us for free
  – Next: improving K2
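The Restrict phase above can be sketched with empirical mutual information as the relevance measure. This is a simplified, single-pass version; the full algorithm re-restricts candidates using the current network between iterations.

```python
# Sketch of Sparse Candidate's Restrict phase: for each variable, keep the k
# other variables with highest empirical mutual information as parent candidates.
from collections import Counter
from math import log

def mutual_info(xs, ys):
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def restrict(data, k):
    """data: dict variable -> list of observed values (columns of D)."""
    cands = {}
    for x in data:
        others = [y for y in data if y != x]
        others.sort(key=lambda y: mutual_info(data[x], data[y]), reverse=True)
        cands[x] = others[:k]
    return cands

# Toy data: B exactly copies A, C is unrelated to both.
D = {"A": [0, 0, 1, 1], "B": [0, 0, 1, 1], "C": [0, 1, 0, 1]}
cands = restrict(D, 1)
print(cands)
```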
GASLEAK: A Permutation GA for Variable Ordering

[Figure: system diagram. [1] A permutation genetic algorithm (genetic algorithm for structure learning from evidence, AIS, and K2) proposes candidate orderings α, receives ordering fitnesses f(α), and converges on an optimized ordering α̂. [2] A representation evaluator for Bayesian network structure learning problems computes f(α) from training data D and an evidence specification e, splitting D into Dtrain (structure learning) and Dval (inference).]
Properties of the Genetic Algorithm

• Elitist
• Chromosome representation
  – Integer permutation ordering
  – A sample chromosome for a 5-node BBN might look like: 3 1 2 0 4
• Seeding
  – Random shuffle
• Operators
  – Order crossover
  – Swap mutation
• Fitness
  – RMSE
• Job farm
  – Java-based; utilizes many machines regardless of OS
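The two permutation operators named above can be sketched directly. This is a generic order-crossover (OX) and swap-mutation pair over integer orderings; the cut points and random choices are illustrative.

```python
# Sketch of order crossover (OX) and swap mutation for permutation chromosomes
# such as the ordering 3 1 2 0 4.
import random

def order_crossover(p1, p2, lo, hi):
    """Copy p1[lo:hi] into the child, then fill the rest in p2's order."""
    child = [None] * len(p1)
    child[lo:hi] = p1[lo:hi]
    fill = [g for g in p2 if g not in p1[lo:hi]]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def swap_mutation(perm, rng=random):
    """Exchange the genes at two random positions."""
    i, j = rng.sample(range(len(perm)), 2)
    perm = list(perm)
    perm[i], perm[j] = perm[j], perm[i]
    return perm

child = order_crossover([3, 1, 2, 0, 4], [0, 1, 2, 3, 4], 1, 3)
print(child)  # [0, 1, 2, 3, 4]: segment [1, 2] kept, rest filled in p2's order
```

Both operators always yield a valid permutation, which is why they are preferred over naive bit-style crossover for ordering problems.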
GASLEAK Results

[Figure: frequency of validation-set fitness. Histogram of estimated fitness (roughly 0.80–1.00) for all 8! = 40,320 permutations of the Asia variables.]

• Not encouraging
  – Bad fitness function or bad evidence b.v.
  – Many graph errors
Master's Thesis: SLAM GA

• SLAM GA – Structure Learning Adjacency Matrix Genetic Algorithm
• Initial population – tried several approaches:
  – Completely random Bayesian networks (Box-Muller, max parents)
    • Many illegal structures; wrote the fixCycles algorithm
  – Random networks generated from parents pre-selected by the Restrict phase of Sparse Candidate
    • Performed better than random
  – Aggregate of k networks learned by K2 given random orderings (cycles eliminated)
    • Best approach
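The slide names a fixCycles routine for repairing illegal (cyclic) random structures. Its exact repair rule is not given in this transcript; the sketch below shows one simple possibility: find a back edge by depth-first search and delete it, repeating until the adjacency matrix is acyclic.

```python
# Hypothetical sketch of a cycle-repair routine in the spirit of fixCycles:
# repeatedly locate a directed cycle via DFS and drop its closing back edge.

def find_cycle_edge(adj):
    """adj[u][v] == 1 means an edge u -> v. Return a back edge, or None."""
    n = len(adj)
    color = [0] * n                     # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(u):
        color[u] = 1
        for v in range(n):
            if adj[u][v]:
                if color[v] == 1:
                    return (u, v)       # back edge: closes a cycle
                if color[v] == 0:
                    e = dfs(v)
                    if e:
                        return e
        color[u] = 2
        return None
    for s in range(n):
        if color[s] == 0:
            e = dfs(s)
            if e:
                return e
    return None

def fix_cycles(adj):
    e = find_cycle_edge(adj)
    while e:
        adj[e[0]][e[1]] = 0             # break the cycle by dropping the edge
        e = find_cycle_edge(adj)
    return adj

cyclic = [[0, 1, 0],   # edges: 0 -> 1
          [0, 0, 1],   #        1 -> 2
          [1, 0, 0]]   #        2 -> 0  (a 3-cycle)
fix_cycles(cyclic)
print(cyclic)  # one edge of the cycle has been removed
```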
Aggregator Instantiater

For small networks, k = 1 is best. For larger networks, k = 2 is best.

[Figure: the K2 Manager feeds training data D to k instances of K2 (1, 2, …, k), each given a random order; an Aggregator combines the k learned BBNs into a single aggregate BBN.]
SLAM GA

• Chromosome representation
  – Edge (adjacency) matrix – n² bits
  – Each bit represents a parent edge to a node: 1 = parent, 0 = not a parent
• Operators
  – Crossover: swap parents, fix cycles
SLAM GA: Crossover
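The crossover figure did not survive transcription. As a minimal sketch of the parent-swapping idea: each node in the child inherits its whole parent set (one row of the adjacency matrix) from one of the two parent chromosomes. The 50/50 per-row selection rule here is an assumption for illustration; cycle repair (fixCycles) follows in the real operator.

```python
# Hypothetical sketch of parent-swapping crossover on adjacency matrices.
# Row i of a matrix encodes the parent set of node i.
import random

def crossover(mat_a, mat_b, rng=random):
    """Child row i (parents of node i) is copied from mat_a or mat_b."""
    return [list(ra) if rng.random() < 0.5 else list(rb)
            for ra, rb in zip(mat_a, mat_b)]

a = [[0, 1], [0, 0]]   # network A: node 0 has parent 1
b = [[0, 0], [1, 0]]   # network B: node 1 has parent 0
child = crossover(a, b)
print(child)
```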
SLAM GA

• Chromosome representation
  – Edge (adjacency) matrix – n² bits
  – Each bit represents a parent edge to a node: 1 = parent, 0 = not a parent
• Operators
  – Crossover: swap parents, fix cycles
  – Mutation: reverse, delete, or add a random number of edges; fix cycles
• Fitness
  – Total Bayesian Dirichlet equivalence score over all nodes
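The mutation operator above can be sketched on a single edge. The real operator touches a random number of edges and runs cycle repair afterwards; this simplified single-edge version shows the three edit types.

```python
# Sketch of SLAM GA-style mutation: add, delete, or reverse one randomly
# chosen potential edge in the adjacency matrix (adj[i][j] == 1: edge i -> j).
import random

def mutate(adj, rng=random):
    n = len(adj)
    i, j = rng.sample(range(n), 2)           # a random ordered pair of nodes
    op = rng.choice(["add", "delete", "reverse"])
    if op == "add":
        adj[i][j] = 1
    elif op == "delete":
        adj[i][j] = 0
    elif adj[i][j]:                          # reverse, if the edge exists
        adj[i][j], adj[j][i] = 0, 1
    return adj

m = mutate([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
print(m)
```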
Results - Asia

[Figure: best of first generation (15 graph errors), learned network (1 graph error), and actual network.]
Results – Asia

[Figure: best fitness per generation (roughly 3300–3750) over ~100 generations, for the K2x1, K2x2, and Rnd initial populations.]
Results - Poker

[Figure: best of first generation (11 graph errors), learned network (2 graph errors), and actual network.]
Results - Poker

[Figure: best fitness per generation (0–2500) over ~100 generations, for the K2x1, K2x2, and Rnd initial populations.]
Results - Golf

[Figure: best of first generation (11 graph errors), learned network (4 graph errors), and actual network.]
Results - Golf

[Figure: best fitness per generation (0–3500) over ~100 generations, for the K2x1, K2x2, and Rnd initial populations.]
Results – Boerlage92

[Figure: initial, learned, and actual networks.]
Results - Boerlage92

[Figure: best fitness per generation (0–1600) over ~100 generations, for the K2x1, K2x2, and Rnd initial populations.]
Results - Alarm

[Figure: best network fitness per generation (0–8000) over ~100 generations, for the K2x1, K2x2, and Rnd initial populations.]
Final Fitness Values

          Asia       Poker     Golf      Boerlage92  Alarm
K2x1      3722.084   1999.395  3081.16   1228.621    5006.827
K2x2      3720.6069  2011.54   3220.985  1429.355    7095.658
Random    3722.249   2001.884  3214.614  1459.587    6861.285
K2 vs. SLAM GA

• K2
  – Very good if the ordering is known
  – The ordering is often not known
  – Greedy, very dependent on ordering
• SLAM GA
  – Stochastic; escapes the local-optima trap
  – Can improve on bad structures learned by K2
  – Takes much longer than K2
GASLEAK vs. SLAM GA

• GASLEAK
  – Gold network never recovered
  – Much more computationally expensive
    • K2 is run on each [new] individual each generation
    • Each chromosome must be scored
  – Final network has many graph errors
• SLAM GA
  – For small networks, the gold-standard network is often recovered
  – Relatively few graph errors in the final network
  – Less computationally intensive
    • Initial population is the most expensive step
    • Each chromosome must be scored
SLAM GA: Ramifications

• Effective structure-learning algorithm
  – Ideal for small networks
• Improvement over GASLEAK
  – SLAM GA is faster in spite of the same GA parameters
  – SLAM GA is more accurate
• Improvement over K2
• Aggregate algorithm produces a better initial population
• Parent-swapping crossover technique is effective
  – Diversifies the search space while retaining past information
SLAM GA: Future Work

• Parameter tweaking
• Better fitness function
  – Several 'bad' structures score better than the gold standard
  – The GA itself works fine
• 'Intelligent' mutation operator
  – Add edges from a pre-qualified set of candidate parents
• New instantiation methods
  – Use GASLEAK
  – Other structure-learning algorithms
• Scalability
  – Job farm
Summary

• Bayesian Networks
• Genetic Algorithms
• Learning Structure: K2, Sparse Candidate
• GASLEAK
• SLAM GA