Particle Swarm Optimization based method for Bayesian Network Structure Learning

Saoussen Aouay∗, Salma Jamoussi∗, Yassine Ben Ayed∗
∗MIRACL, Multimedia InfoRmation system and Advanced Computing Laboratory

Higher Institute of Computer Science and Multimedia, Sfax, BP 1030 - Tunisia

Abstract—Bayesian Networks (BNs) are good tools for representing knowledge and reasoning under conditions of uncertainty. In general, learning Bayesian Network structure from a data-set is considered an NP-hard problem, due to the complexity of the search space. A novel structure-learning method, based on PSO (Particle Swarm Optimization) and the K2 algorithm, is presented in this paper. To learn the structure of a Bayesian network, PSO is used here to search the space of orderings. The fitness of each ordering is then calculated by running the K2 algorithm and returning the score of the network consistent with it. The experimental results demonstrate that our approach performs better than other BN structure-learning algorithms.

Keywords-Bayesian Networks; Particle Swarm Optimization; K2 Algorithm; Structure Learning

I. INTRODUCTION

Bayesian networks, also called probability networks or belief networks [1], are a kind of graphical model widely used to model and manipulate uncertain information [2]. They arose from the marriage of probability theory and graph theory. For this reason, they have been used extensively in many domains where uncertainty plays an important role, such as medical diagnosis, classification systems, and forecasting [3].

They consist of two components: a structure and a set of parameters. The structure is a directed acyclic graph (DAG) whose nodes represent random variables and whose edges represent the direct dependencies between these variables; the parameters specify a joint probability distribution over the random variables.

Over the past decade, learning BN structure from data has become the focus of an increasingly active area of research. Although experts can occasionally build good BNs from their personal knowledge, this becomes a very hard task for large domains. In fact, the number of possible structures grows exponentially with the number of variables. Therefore, learning the model structure from data by exhaustively considering all possible structures is infeasible for a large number of variables, and the problem is NP-hard [4]. As a result, many researchers have sought effective algorithms for learning Bayesian networks from data, and the heuristic K2 search approach is among the most successful. However, this solution requires an ordering of the problem variables.

Our work falls within this area of research: we examine the use of a bio-inspired approach based on Particle Swarm Optimization (PSO) to search the space of orderings rather than the full space of structures. The best structure returned is then the one produced by K2 from the best ordering found by this search.

The rest of the paper is organized as follows. In section 2, we give an overview of the concept of BNs. In section 3, we present the different methods of structural learning and detail the K2 algorithm. We give a brief introduction to classical PSO and then discuss the details of the proposed algorithm in section 4. Results and discussions are presented in section 5. Finally, conclusions and some perspectives are summarized in the last section.

II. OVERVIEW OF BAYESIAN NETWORKS

Essentially, a BN can be defined as a pair (G, P), where G = (V, E) is a directed acyclic graph over a finite set of nodes (or vertices) V, interconnected by directed links (or edges) E, and P = {P(X_i | Pa(X_i))} is a set of Conditional Probability Distributions (CPDs), one for each variable X_i (a node of the graph), where Pa(X_i) is the set of parents of node X_i in G [1]. The joint probability distribution associated with a network (G, P) is defined by:

P(X_1, ..., X_n) = \prod_{i=1}^{n} P(X_i | Pa(X_i))    (1)
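To make the factorization in (1) concrete, the following minimal Java sketch multiplies one conditional term per node; the key-based CPD encoding and all identifiers here are our own illustration, not part of the paper.

import java.util.List;
import java.util.Map;

public class JointProbability {
    // Sketch of equation (1): P(X1,...,Xn) is the product over i of P(Xi | Pa(Xi)).
    // cpds.get(i) maps a key "parentValues|ownValue" to the corresponding
    // conditional probability; this encoding is a hypothetical illustration.
    static double joint(int[] assignment, List<int[]> parents,
                        List<Map<String, Double>> cpds) {
        double p = 1.0;
        for (int i = 0; i < assignment.length; i++) {
            StringBuilder key = new StringBuilder();
            for (int pa : parents.get(i)) key.append(assignment[pa]).append(',');
            key.append('|').append(assignment[i]);
            p *= cpds.get(i).getOrDefault(key.toString(), 0.0);
        }
        return p;
    }
}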

From Figure 1 we can see that there are two types of learning problems in Bayesian networks:

• Parameter learning, where the network structure is assumed fixed and we estimate the conditional probabilities of each node.

• Structure learning, where the goal is to find the best graph representing the task to solve.

III. LEARNING BAYESIAN NETWORK STRUCTURE

Learning the structure involves finding the network that best matches the data. Various BN learning algorithms have been developed to learn the optimal graph. These methods can be classified into two categories: constraint-based and score-based algorithms.


Figure 1. Humidity Bayesian Network

A. Constraint-based approaches

These methods construct networks by analyzing dependency relationships among nodes. The dependency relationships are measured using statistical tests such as the χ² statistic or the mutual information criterion. These tests determine a set of constraints on the structure of the Bayesian network: independence between two variables results in the absence of an arc between them, while a conditional dependence corresponds to a V-structure [5]. Well-known algorithms in this category include:

• The PC (Peter and Clark) algorithm [6]: it starts with a fully connected graph and, whenever conditional independence is detected by a statistical test, the corresponding arc is removed. The statistical tests are performed in an order given beforehand on the variables.

• The IC (Inductive Causation) algorithm [7]: it starts from an empty structure and links pairs of variables whenever a conditional dependency is detected.

B. Score-based approaches

In contrast to the previously described methods, which look for statistical dependencies between variables, score-based approaches look for the structure that maximizes a certain score. These methods pose learning as an optimization problem: they define a statistically motivated score that describes the fitness of each possible structure to the observed data, and the network returned is the graph that best represents the data. Many heuristic search techniques have been applied in this way, including Hill Climbing [8], Simulated Annealing [9], and K2 [10]. Many nature-inspired techniques have also been applied to the problem of BN structure learning, including Genetic Algorithms (GA) [11], Ant Colony Optimization (ACO) [12], and Particle Swarm Optimization (PSO) [13]. The scores used by these algorithms include BD [10], BIC [14], AIC [15], and MDL [16].

This paper falls within the second group. Research conducted in this area reveals that K2 is one of the most successful algorithms for the structure-learning task. It is widely used even though it has the disadvantage of requiring a node ordering as an input parameter. Our method is therefore based on the K2 algorithm and tries to overcome this drawback by using a bio-inspired concept, PSO. In the next paragraph we give some details about the K2 algorithm.

K2 Algorithm: K2 is one of the best-known algorithms for building and evaluating a BN from a database of cases once an ordering of the variables is given [10]. This approach starts from an empty graph and then tests the inclusion of parents as follows: the first node in the order cannot have parents, and for each subsequent node the set of parents is chosen greedily from the nodes preceding it in the order, adding at each step the candidate that most increases the score. The idea of K2 is to maximize the probability of the structure given the data, over the space of DAGs consistent with the order. This technique uses the Bayesian score originally called BD, the formula of Cooper and Herskovits, also known as the (K2-CH) metric [10]. This metric can be expressed as:

Score_BD = P(G, D) = P(G) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!    (2)

where:
q_i: the number of possible configurations of the parents of X_i.
r_i: the number of possible values that X_i can take.
N_{ijk}: the number of cases in the dataset D in which X_i takes its k-th value and its parents take their j-th configuration.
N_{ij}: the sum of N_{ijk} over all values that X_i can take.
Assuming a uniform prior for P(G), we get the K2 metric:

Score_BD = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!    (3)

In practice, the logarithm of this equation is usually employed because it is easier to compute numerically. This score is decomposable: it can be written as a product of individual node scores, each depending only on a node and its parents. The decomposed score can be expressed by the following equation:

g(i, Pa(i)) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!    (4)
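In code, the log form of (4) can be computed directly from the count statistics; below is a minimal Java sketch, assuming the tables N_ij and N_ijk have already been tallied from the dataset (the class and method names are ours, not the paper's).

public class K2Score {
    // Returns log(n!), computed by summing logs to avoid overflow.
    static double logFactorial(int n) {
        double s = 0.0;
        for (int m = 2; m <= n; m++) s += Math.log(m);
        return s;
    }

    // Log of equation (4): ri = number of values of X_i, qi = number of
    // parent configurations, Nijk[j][k] = count of cases where X_i takes its
    // k-th value while its parents take their j-th configuration, and
    // Nij[j] = sum over k of Nijk[j][k].
    static double logG(int ri, int qi, int[][] Nijk, int[] Nij) {
        double score = 0.0;
        for (int j = 0; j < qi; j++) {
            score += logFactorial(ri - 1) - logFactorial(Nij[j] + ri - 1);
            for (int k = 0; k < ri; k++) score += logFactorial(Nijk[j][k]);
        }
        return score;
    }
}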

The K2 algorithm is very powerful if one can determine the topological ordering of the nodes. Unfortunately, this is not usually possible.

K2 Algorithm
Input: A set of n nodes, an ordering on the nodes, an upper bound u on the number of parents a node may have, and a database D containing m cases.
Output: For each node, a printout of the parents of the node.

for i = 1 to n do
    π_i := ∅
    P_old := g(i, π_i)
    OKToProceed := true
    while OKToProceed and |π_i| < u do
        let z be the node in Pred(x_i) − π_i that maximizes g(i, π_i ∪ {z})
        P_new := g(i, π_i ∪ {z})
        if P_new > P_old then
            P_old := P_new
            π_i := π_i ∪ {z}
        else
            OKToProceed := false
        end if
    end while
    write(Node: x_i, Parents of x_i: π_i)
end for

Algorithm 1. K2 Algorithm
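A compact Java rendering of Algorithm 1 might look as follows; the local score g is abstracted behind an interface (the counting of N_ij and N_ijk against the database D is left out), and all identifiers are our own sketch rather than the paper's implementation.

import java.util.ArrayList;
import java.util.List;

public class K2Search {
    // score.of(node, parents) is assumed to evaluate log g(node, parents)
    // against the database D, as in equation (4).
    interface LocalScore { double of(int node, List<Integer> parents); }

    // order[i] is the node at rank i; u is the maximum number of parents.
    static List<List<Integer>> k2(int[] order, int u, LocalScore score) {
        int n = order.length;
        List<List<Integer>> parentSets = new ArrayList<>();
        for (int i = 0; i < n; i++) parentSets.add(new ArrayList<>());
        for (int i = 0; i < n; i++) {
            int xi = order[i];
            List<Integer> pi = parentSets.get(xi);
            double pOld = score.of(xi, pi);
            boolean okToProceed = true;
            while (okToProceed && pi.size() < u) {
                // Find the predecessor z that maximizes g(i, pi ∪ {z}).
                int bestZ = -1;
                double pNew = Double.NEGATIVE_INFINITY;
                for (int j = 0; j < i; j++) {
                    int z = order[j];
                    if (pi.contains(z)) continue;
                    pi.add(z);
                    double s = score.of(xi, pi);
                    pi.remove(Integer.valueOf(z));
                    if (s > pNew) { pNew = s; bestZ = z; }
                }
                if (bestZ >= 0 && pNew > pOld) {
                    pOld = pNew;
                    pi.add(bestZ);       // keep z only if the score improves
                } else {
                    okToProceed = false; // no parent improves the score: stop
                }
            }
        }
        return parentSets; // parentSets.get(x) holds the parents of node x
    }
}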

To solve this problem, we propose a novel bio-inspired method to find such an ordering, based on the Particle Swarm Optimization paradigm.

IV. CONSTRUCTING BAYESIAN NETWORKS USING PSO

A. Basic Particle Swarm Optimization principles

PSO is a meta-heuristic optimization method proposed by Eberhart and Kennedy [17], inspired by the behavior of flocks of birds and schools of fish searching for a food source. Initially, a flock of birds may set off in random directions searching for food. As each bird moves along its path, it may discover food in different places; it therefore memorizes its 'personal best' position, where it found the most food. At the same time, its movement from one position to another takes into account the 'global best' position found by the whole swarm. Through this process, the flock heads towards the best location, the one with the largest amount of food.

The PSO algorithm turns this behavior into an optimization procedure. The approach consists of a set of particles, which play the role of the birds. Each particle has a set of attributes: its current velocity, its current position, the best position it has discovered so far, and the best position discovered by its neighbors so far; this is the local version of PSO (lbest). There is another version of PSO, called the global version (gbest) [18], in which all particles are considered neighbors of each other. In order to find an optimal or near-optimal solution to the problem, PSO updates at each generation the information about the best solution obtained by every particle and by the entire swarm. All particles start with randomly initialized velocities and positions. Then, the d-th dimension of the new velocity of the i-th particle is calculated using the following equation:

v_{id}^{t+1} = w v_{id}^{t} + c_1 r_1 (p_{id} - x_{id}^{t}) + c_2 r_2 (p_{gd} - x_{id}^{t})    (5)

where w is the inertia weight [18], c_1 and c_2 are constants called learning or acceleration factors, r_1 and r_2 are random values in the range [0, 1], p_g is the best position found so far by all particles, and p_i is the best position discovered so far by the corresponding particle. From (5), we can see that the displacement of a particle is influenced by the following three components:

• A physical component, w v_{id}^{t}: the particle tends to follow its current direction of movement;

• A cognitive component, c_1 r_1 (p_{id} - x_{id}^{t}): the particle tends to return towards the best position it has already visited;

• A social component, c_2 r_2 (p_{gd} - x_{id}^{t}): the particle tends to move towards the best location already attained by its neighbors.

Once the particle's new velocity is calculated, the particle moves from its current position to a new one, defined by the following equation:

x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}    (6)

At each iteration of the algorithm, each particle is moved according to (5) and (6). Once the particles have been displaced, the new positions are evaluated and p_i and p_g are updated. The algorithm iterates until a stopping condition is satisfied or a maximum number of evaluations or iterations is reached.

Particle Swarm Optimization is simpler, both in formulation and in computer implementation, than GA. Thanks to this simple concept, the method has been used in various applications in different fields to solve continuous optimization problems, and many researchers have taken up the challenge of applying the technique to combinatorial/discrete optimization problems.

For the problem of learning the structure of a BN, PSO has been adopted only recently. To our knowledge, there are few works dedicated to deriving the optimal BN structure that adopt the PSO algorithm. These studies try to find the optimal structure by searching in the full space of graphs, using the binary version of PSO in which every particle position corresponds to a DAG represented by a binary matrix. The results show that this method provides good performance compared to genetic algorithms [19]. Hence, we propose to apply a novel discrete PSO algorithm to find an optimal ordering of the variables rather than investigating the full space of BN structures. We believe that representing the particle position vector as a node ordering can improve the convergence of BN structure learning and give better results.
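For reference, one update step of the classical continuous PSO of equations (5) and (6) can be sketched as follows; the velocity clamping to [-Vmax, Vmax] and all parameter values are illustrative assumptions.

import java.util.Random;

public class PsoStep {
    static final Random RNG = new Random();

    // x: current position, v: velocity, pBest: the particle's best position,
    // gBest: the swarm's best position. Arrays are updated in place.
    static void update(double[] x, double[] v, double[] pBest, double[] gBest,
                       double w, double c1, double c2, double vMax) {
        for (int d = 0; d < x.length; d++) {
            double r1 = RNG.nextDouble(), r2 = RNG.nextDouble();
            v[d] = w * v[d]                       // physical (inertia) component
                 + c1 * r1 * (pBest[d] - x[d])    // cognitive component
                 + c2 * r2 * (gBest[d] - x[d]);   // social component
            v[d] = Math.max(-vMax, Math.min(vMax, v[d])); // clamp velocity
            x[d] += v[d];                         // equation (6)
        }
    }
}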

B. Constructing Bayesian networks using PSO

Our PSO-K2 algorithm applies the PSO search technique to look for the best ordering. Each ordering is evaluated by applying the K2 algorithm and finding the DAG that maximizes the score. The best global structure, the one with the highest score, is returned as the final result. In what follows we present the different components of our algorithm:

Position of particles: In this work, the position of a particle represents the node ordering that will be given as an input parameter to the K2 algorithm. This ordering is encoded as a vector X = [x_1, x_2, ..., x_D], where D is the number of variables. The elements of this vector, called position values, are integers in the interval [1, D]. Each component x_i gives the rank of the i-th variable in the ordering in question.

Node d:          1  2  3  4  5
Position:        4  1  3  5  2
Ordering nodes:  2  5  3  1  4

Figure 2. Position values and the corresponding node ordering

We can see from Figure 2 that the first rank is taken by node number 2, the second by variable number 5, and so on.
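In code, recovering the ordering from a position vector is a simple inverse-permutation step; the small helper below is our own sketch, assuming the 1-based numbering of Figure 2.

public class OrderingDecode {
    // positions[node] = rank of that node (1-based); returns ordering where
    // ordering[rank] = the node occupying that rank.
    static int[] orderingFromPosition(int[] positions) {
        int[] ordering = new int[positions.length];
        for (int node = 0; node < positions.length; node++) {
            ordering[positions[node] - 1] = node + 1; // nodes numbered from 1
        }
        return ordering;
    }
    // Example: {4, 1, 3, 5, 2} yields {2, 5, 3, 1, 4}, as in Figure 2.
}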

Velocity of particles: V_i^t is the velocity of particle i at iteration t. It is defined as a 1 × D vector whose elements are real numbers in the interval [−V_max, V_max].

Velocity updating: To avoid converging to local optima, we propose to combine the two variants of PSO (lbest and gbest). Each particle is thus influenced by:

• P_i = [p_{i1}, p_{i2}, ..., p_{iD}]: the best personal position (pbest) found by the particle up to now.

• P_l = [p_{l1}, p_{l2}, ..., p_{lD}]: the best local position (lbest) obtained in the local neighborhood of the particle.

• P_g = [p_{g1}, p_{g2}, ..., p_{gD}]: the best global position (gbest) achieved so far in the whole swarm.

Therefore, the new velocity of each particle is calculated as follows:

v_{id}^{t+1} = c_1 v_{id}^{t} + c_2 (p_{id} - x_{id}^{t}) + c_3 (p_{ld} - x_{id}^{t}) + c_4 (p_{gd} - x_{id}^{t})    (7)

The coefficients c_1, c_2, c_3, c_4 belong to the interval [0, 1], with c_1 + c_2 + c_3 + c_4 = 1. To avoid an arbitrary choice of these values, we determine them automatically according to the score and position of each particle. For this, we fix the parameter c_1 and calculate the others as follows:

c_2 = |score_p - score_i| / ((P_i - X_i) + γ) + υ

c_3 = |score_l - score_i| / ((P_l - X_i) + γ) + υ

c_4 = |score_g - score_i| / ((P_g - X_i) + γ) + υ

where:
• γ, υ > 0;
• score_i, score_p, score_l, and score_g denote the scores of X_i, P_i, P_l, and P_g;
• P_i - X_i is the number of components on which P_i and X_i agree, and similarly for P_l - X_i and P_g - X_i.
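Under our reading of these formulas, each coefficient can be computed as below: the score gap to the corresponding guide position divided by the positional similarity to it, shifted by the constants γ and υ. This is a sketch with illustrative names, not the paper's code.

public class AdaptiveCoefficients {
    // Number of components at which two position vectors agree,
    // i.e. the quantity written P - X above.
    static int similarity(int[] a, int[] b) {
        int same = 0;
        for (int d = 0; d < a.length; d++) if (a[d] == b[d]) same++;
        return same;
    }

    // c = |score_guide - score_own| / (similarity + gamma) + upsilon,
    // evaluated with the personal, local, or global best as the guide.
    static double coefficient(double guideScore, double ownScore,
                              int[] guidePos, int[] ownPos,
                              double gamma, double upsilon) {
        return Math.abs(guideScore - ownScore)
               / (similarity(guidePos, ownPos) + gamma) + upsilon;
    }
}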

Position updating: The results of the expression (v_{id}^{t+1} + x_{id}^{t}) are continuous values. To map these values back to a valid particle position, we use a scheduling rule: we assign the first rank to the variable with the smallest value, the second rank to the variable with the second smallest value, and so on. The following figure illustrates this rule.

Dimension d:                   1     2     3     4     5
v_{id}^{t+1} + x_{id}^{t}:    0.4  -2.6   1.8  -5.5   0.23
x_{id}^{t+1} (new position):   4     2     5     1     3

Figure 3. The update of the position values
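This scheduling rule is essentially an argsort of the continuous values; a minimal Java sketch follows, assuming 1-based ranks as in Figure 3.

import java.util.Arrays;
import java.util.Comparator;

public class RankTransform {
    // The dimension with the smallest continuous value v + x receives rank 1,
    // the next smallest rank 2, and so on.
    static int[] toRanks(double[] continuous) {
        Integer[] dims = new Integer[continuous.length];
        for (int d = 0; d < dims.length; d++) dims[d] = d;
        Arrays.sort(dims, Comparator.comparingDouble(d -> continuous[d]));
        int[] ranks = new int[continuous.length];
        for (int r = 0; r < dims.length; r++) ranks[dims[r]] = r + 1;
        return ranks;
    }
    // Example: {0.4, -2.6, 1.8, -5.5, 0.23} maps to {4, 2, 5, 1, 3},
    // reproducing Figure 3.
}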

Neighborhood topology: To determine the neighbors of a particle, we require that two particles are considered neighbors only if they satisfy the following two constraints:

• the number of nodes having the same rank in their respective positions is greater than T_pos;

• the difference between their scores does not exceed a certain threshold T_score.

Swap Operator: We propose to include a mutation operator in our approach. Mutation maintains variety in the population by preventing the positions in the swarm from becoming too similar. The operator consists of choosing two nodes, with a low probability P_mut, and exchanging their position values.

Dimension d:     1  2  3  4  5
Position value:  4  2  1  5  3
                      ⇓
Dimension d:     1  2  3  4  5
Position value:  4  5  1  2  3

Figure 4. Swap Operator
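A minimal sketch of this operator follows; applying it once per particle with probability P_mut is our assumption about how it is wired into the main loop.

import java.util.Random;

public class SwapMutation {
    static final Random RNG = new Random();

    // With probability pMut, exchange the position values of two randomly
    // chosen dimensions (if the two indices coincide, the swap is a no-op).
    static void mutate(int[] position, double pMut) {
        if (RNG.nextDouble() >= pMut) return;
        int a = RNG.nextInt(position.length);
        int b = RNG.nextInt(position.length);
        int tmp = position[a];
        position[a] = position[b];
        position[b] = tmp;
    }
}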

PSO-K2 Algorithm
Initialize the entire swarm
repeat
    for each particle i do
        Find all neighbors
        Update the local best position P_l
        Update the velocity
        Update the position
        Apply mutation
        Apply the K2 algorithm to the new position and compute the AIC score of the resulting graph
        Update the personal best position P_i
        Update the global best position P_g
    end for
until the stopping criterion is satisfied
Return the best structure, obtained from the overall best position P_g

Algorithm 2. PSO-K2 Algorithm

V. EXPERIMENTAL RESULTS

To study the performance of the proposed method, we performed experiments on different data sets, varying their size and comparing the proposed approach to other methods, namely HC (the Hill Climbing algorithm) and GAWK (the Genetic Algorithm Wrapper for K2 described in [20]). For this comparison we adopted three data sets: the Asia network [21], a simple network with 8 binary nodes; the Car network [22], an 18-variable network; and the Alarm network [23], which consists of 37 nodes.

We generated an initial population of particles (positions and velocities) randomly. The algorithm was implemented in Java. To evaluate the generated Bayesian networks, we adopted two evaluation criteria: the logarithm of the AIC score function and the structural differences between the obtained structure and the original graph.

We ran each algorithm five times and recorded the average score obtained by each algorithm as well as its best score. The results obtained for the Asia, Car, and Alarm networks are presented in Tables I, II, and III, respectively. Analyzing Table I, we note that the networks found by PSO-K2 have higher scores than those obtained by the HC and GAWK algorithms. Similarly, the experiments carried out on the Car and Alarm networks (see Tables II and III) show that the proposed method gives the best performance whatever the size of the database. We can thus conclude that our approach remains effective even for large databases.

         Asia500             Asia1000            Asia2500            Asia5000
         AVG       BEST      AVG       BEST      AVG       BEST      AVG        BEST
GAWK     -1130.74  -1125.85  -2262.98  -2260.98  -5572.79  -5566.30  -11047.81  -11043.40
HC       -1129.70  -1127.11  -2256.39  -2248.05  -5562.73  -5553.49  -11050.45  -11048.11
PSO-K2   -1121.99  -1121.46  -2245.30  -2245.30  -5552.15  -5551.90  -11037.14  -11036.85

Table I. AIC SCORES OF THE ASIA NETWORK

         Car1000             Car2500             Car5000               Car10000
         AVG       BEST      AVG       BEST      AVG        BEST       AVG        BEST
GAWK     -2515.93  -2461.87  -5826.56  -5808.04  -11467.86  -11451.85  -23013.39  -22992.73
HC       -2483.07  -2479.00  -5847.56  -5841.58  -11498.15  -11480.55  -23024.02  -23007.00
PSO-K2   -2452.69  -2451.84  -5801.18  -5797.99  -11450.65  -11447.93  -22968.23  -22964.29

Table II. AIC SCORES OF THE CAR NETWORK

         Alarm1000             Alarm2500             Alarm5000             Alarm10000
         AVG        BEST       AVG        BEST       AVG        BEST       AVG        BEST
GAWK     -10753.08  -10582.39  -26059.64  -25536.62  -49295.42  -48740.59  -96970.62  -96141.17
HC       -10654.91  -10615.35  -25153.32  -25132.68  -48686.16  -48560.13  -95724.07  -95340.95
PSO-K2   -10421.01  -10365.49  -24920.38  -24854.66  -48479.35  -48424.19  -95255.86  -94925.23

Table III. AIC SCORES OF THE ALARM NETWORK

The datasets used in this paper are standard benchmarks with well-defined structures. For this reason, we compared each obtained network (the one with the highest score) to the reference structure by counting the numbers of correct (C), reversed (R), added (A), and deleted (D) edges.

The results in Table IV show that our method gives promising results compared to the other methods. Indeed, PSO-K2 discovers more than 60% of the exact links for the Alarm network and more than 75% for both the Asia and Car benchmarks. Comparing the number of reversed edges found by the different algorithms, we note that PSO-K2 reverses the fewest arcs. We note also that the graphs learned by our algorithm have the lowest number of missing arcs. Finally, we can conclude that the structures produced by PSO-K2 are simpler and less complex: these graphs contain fewer superfluous arcs than those added by the other algorithms.

VI. CONCLUSION

In this paper, we have proposed a new search-and-score algorithm for learning BN structures. This method uses the PSO metaheuristic to guide the search over node orderings.

        Asia2500     Asia5000     Car5000      Car10000     Alarm5000     Alarm10000
        C  R  A  D   C  R  A  D   C  R  A  D   C  R  A  D   C  R  A  D    C  R  A  D
GAWK    4  3  5  1   3  5  3  0  13  0  2  4  20  3  7  1  20 19 21  7   22 21 34  5
HC      5  2  1  1   1  7  4  0   6  8  5  3  25  8 11  1  26 15 25  5   23 21 30  2
PSO-K2  6  2  1  0   7  1  1  0  14  0  1  3  28  3  3  1  29 16 12  2   29 16 21  1

Table IV. STRUCTURAL DIFFERENCES OF THE THREE NETWORKS

We applied our approach to three benchmark datasets of varying sizes. We also compared our method to two existing approaches (the HC and GAWK algorithms). Two performance measures were used in the comparison: the AIC score and the structural differences between the learned and the original networks. The experimental results are encouraging and show that the PSO-K2 algorithm is more effective than the other algorithms.

In future work, we plan to use PSO-K2 with other score functions. In addition, we intend to evaluate this technique on other datasets. Furthermore, we want to study the structure-learning problem in the presence of missing data.

REFERENCES

[1] F. V. Jensen, Bayesian Networks and Decision Graphs. Springer-Verlag New York, 2001.

[2] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1997.

[3] K. B. Laskey and P. C. G. da Costa, "Uncertainty representation and reasoning in complex systems," in Complex Systems in Knowledge-based Environments, 2009, pp. 7–40.

[4] D. M. Chickering, D. Geiger, and D. Heckerman, "Learning Bayesian networks is NP-hard," Microsoft Research, Tech. Rep. MSR-TR-94-17, 1994.

[5] K. Murphy, "A brief introduction to graphical models and Bayesian networks," http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html, 1998.

[6] P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search, 2nd ed. The MIT Press, 2001.

[7] J. Pearl and T. Verma, "A theory of inferred causation," in Principles of Knowledge Representation and Reasoning, 1991, pp. 441–452.

[8] I. Tsamardinos, L. E. Brown, and C. F. Aliferis, "The max-min hill-climbing Bayesian network structure learning algorithm," Machine Learning, 2006.

[9] T. Wang, J. W. Touchman, and G. Xue, "Applying two-level simulated annealing on Bayesian structure learning to infer genetic networks," in Proceedings of the IEEE Computational Systems Bioinformatics Conference, 2004, pp. 647–648.

[10] G. F. Cooper and E. Herskovits, "A Bayesian method for the induction of probabilistic networks from data," Machine Learning, vol. 9, pp. 309–347, 1992.

[11] P. Larrañaga, R. Murga, M. Poza, and C. Kuijpers, "Structure learning of Bayesian networks by hybrid genetic algorithms," in AI and Statistics V, Lecture Notes in Statistics 112. Springer-Verlag, 1995, pp. 165–174.

[12] L. M. de Campos, J. M. Fernández-Luna, J. A. Gámez, and J. M. Puerta, "Ant colony optimization for learning Bayesian networks," International Journal of Approximate Reasoning, vol. 31, pp. 291–311, 2002.

[13] J. Cowie, L. Oteniya, and R. Coles, "Particle swarm optimisation for learning Bayesian networks," in World Congress on Engineering, 2007, pp. 71–76.

[14] G. Schwarz, "Estimating the dimension of a model," Annals of Statistics, vol. 6, pp. 461–464, 1978.

[15] L. M. de Campos and J. F. Huete, "A new approach for learning belief networks using independence criteria," International Journal of Approximate Reasoning, vol. 24, pp. 11–37, 2000.

[16] W. Lam and F. Bacchus, "Learning Bayesian belief networks: An approach based on the MDL principle," Computational Intelligence, vol. 10, pp. 269–293, 1994.

[17] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.

[18] F. van den Bergh, "An analysis of particle swarm optimizers," PhD thesis, University of Pretoria, 2001.

[19] X.-L. Li, "A particle swarm optimization and immune theory-based algorithm for structure learning of Bayesian networks," International Journal of Database Theory and Application, vol. 3, 2010.

[20] W. H. Hsu, H. Guo, B. B. Perry, and J. A. Stilson, "A permutation genetic algorithm for variable ordering in learning Bayesian networks from data," in GECCO. Morgan Kaufmann, 2002, pp. 383–390. [Online]. Available: http://dblp.uni-trier.de/db/conf/gecco/gecco2002.html#HsuGPS02

[21] S. L. Lauritzen and D. J. Spiegelhalter, Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Morgan Kaufmann Publishers Inc., 1990, pp. 415–448.

[22] "Bayesian network software," http://www.bayesia.com.

[23] I. Beinlich, G. Suermondt, R. Chavez, and G. Cooper, "The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks," in Proceedings of the 2nd European Conference on AI and Medicine. Springer-Verlag, 1989, pp. 247–256.