
Page 1

Introduction to Bayesian Learning

Davide Bacciu

Dipartimento di Informatica, Università di Pisa
[email protected]

Apprendimento Automatico: Fondamenti - A.A. 2016/2017

Page 2

Outline

Lecture 1 - Introduction: Probabilistic reasoning and learning (Bayesian Networks)

Lecture 2 - Parameter Learning (I): Learning fully observed Bayesian models

Lecture 3 - Parameter Learning (II): Learning with hidden variables

If time allows, we will also cover some application examples of Bayesian learning and Bayesian Networks

Page 3

Course Information

Reference Webpage:

http://pages.di.unipi.it/bacciu/teaching/apprendimento-automatico-fondamenti/

Here you can find
- Course information
- Lecture slides
- Articles and course materials

Office Hours
Where? Dipartimento di Informatica - 2nd Floor - Room 367
When? Monday 17-19
When? Basically anytime, if you send me an email beforehand.

Page 4

Lecture Outline

1 Introduction

2 Probability Theory: Probabilities and Random Variables; Bayes Theorem and Independence

3 Learning as Bayesian Inference: Hypothesis Selection; Bayesian Classifiers

4 Bayesian Networks: Representation; Learning and Inference

Page 5

Bayesian Learning

Why Bayesian? Easy, because of the frequent use of Bayes theorem...
Bayesian Inference: a powerful approach to probabilistic reasoning
Bayesian Networks: an expressive model for describing probabilistic relationships

Why bother?
The real world is uncertain
- Data (noisy measurements and partial knowledge)
- Beliefs (concepts and their relationships)
Probability as a measure of our beliefs
- A conceptual framework for describing uncertainty in world representations
- Learning and reasoning become matters of probabilistic inference
- Probabilistic weighting of the hypotheses

Page 6

Part I

Probability and Learning

Page 7

Random Variables

A Random Variable (RV) is a function describing the outcome of a random process by assigning unique values to all possible outcomes of the experiment

Random Process =⇒ Coin Toss
Discrete RV =⇒ X = 0 if heads, 1 if tails

The sample space S of a random process is the set of all possible outcomes, e.g. S = {heads, tails}
An event e is a subset e ⊆ S, i.e. a set of outcomes, that may occur or not as a result of the experiment

Random variables are the building blocks for representing our world
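As a toy illustration (my own, not from the slides), the coin-toss random variable can be written out in a few lines of Python; all names below are hypothetical.

```python
import random

# Sample space of the coin-toss experiment
sample_space = ["heads", "tails"]

# The random variable X maps each outcome to a unique value
X = {"heads": 0, "tails": 1}

# An event is a subset of the sample space, e.g. "the coin shows tails"
tails_event = {"tails"}

# Simulate the random process a few times
for _ in range(5):
    outcome = random.choice(sample_space)   # run the experiment
    occurred = outcome in tails_event       # did the event occur?
    print(outcome, "-> X =", X[outcome], "| tails event occurred:", occurred)
```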

Page 8

Probability Functions

A probability function P(X = x) ∈ [0, 1] (P(x) in short) measures the probability of the RV X attaining the value x, i.e. the probability of event x occurring
If the random process is described by a set of RVs X1, . . . , XN, then the joint probability writes

P(X1 = x1, . . . , XN = xN) = P(x1 ∧ · · · ∧ xN)

Definition (Sum Rule)
Probabilities of all the events must sum to one:

∑_x P(X = x) = 1

Page 9

Product Rule and Conditional Probabilities

Definition (Product Rule a.k.a. Chain Rule)

P(x1, . . . , xi, . . . , xN | y) = ∏_{i=1}^{N} P(xi | x1, . . . , xi−1, y)

P(x | y) is the conditional probability of x given y
Reflects the fact that the realization of an event y may affect the occurrence of x
Marginalization: the sum and product rules together yield the complete probability equation

P(X1 = x1) = ∑_{x2} P(X1 = x1, X2 = x2) = ∑_{x2} P(X1 = x1 | X2 = x2) P(X2 = x2)
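To make the marginalization concrete, here is a small sketch (my own, with made-up numbers) that recovers P(X1) and a conditional from a hand-specified joint table over two binary RVs.

```python
# Joint distribution P(X1, X2) over two binary RVs, stored as a table.
# The numbers are illustrative and sum to 1.
joint = {
    (0, 0): 0.3, (0, 1): 0.2,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Marginalization (sum rule): P(X1 = x1) = sum over x2 of P(X1 = x1, X2 = x2)
def marginal_x1(x1):
    return sum(p for (v1, v2), p in joint.items() if v1 == x1)

# Conditioning (product rule rearranged): P(X2 = x2 | X1 = x1) = P(x1, x2) / P(x1)
def conditional_x2_given_x1(x2, x1):
    return joint[(x1, x2)] / marginal_x1(x1)

print(marginal_x1(0))                  # 0.5
print(conditional_x2_given_x1(1, 0))   # 0.2 / 0.5 = 0.4
```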

Page 10

Bayes Rule

Given hypothesis hi ∈ H and observations d

P(hi | d) = P(d | hi) P(hi) / P(d) = P(d | hi) P(hi) / ∑_j P(d | hj) P(hj)

P(hi) is the prior probability of hi
P(d | hi) is the conditional probability of observing d given that hypothesis hi is true (likelihood)
P(d) is the marginal probability of d
P(hi | d) is the posterior probability that the hypothesis is true given the data and the previous belief about the hypothesis
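A minimal sketch of Bayes rule as written above, assuming a finite hypothesis space; the helper name and numbers are mine, purely illustrative.

```python
# Priors P(h_i) and likelihoods P(d | h_i) for three hypotheses (illustrative numbers)
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.7]   # probability of the observed data d under each h_i

def posterior(priors, likelihoods):
    # Numerators P(d | h_i) P(h_i); the denominator P(d) is their sum (marginalization)
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

print(posterior(priors, likelihoods))   # posteriors P(h_i | d), summing to 1
```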

Page 11

So, Is This All About Bayes???

Frequentists Vs Bayesians

Figure: Credit goes to Mike West @ Duke University

Page 12

Independence and Conditional Independence

Two RVs X and Y are independent if knowledge about X does not change the uncertainty about Y and vice versa

I(X, Y) ⇔ P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X) = P(X) P(Y)

Two RVs X and Y are conditionally independent given Z if, once Z is known, knowledge about X does not change the uncertainty about Y and vice versa

I(X, Y | Z) ⇔ P(X, Y | Z) = P(X | Y, Z) P(Y | Z) = P(Y | X, Z) P(X | Z) = P(X | Z) P(Y | Z)
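As an illustration of both definitions (my own construction, not from the slides), one can verify them numerically on a small joint table built to be conditionally independent given Z.

```python
from itertools import product

# P(X, Y, Z) for binary X, Y, Z, built so that X and Y are conditionally
# independent given Z: P(x, y, z) = P(z) P(x | z) P(y | z) (illustrative numbers)
p_z = {0: 0.6, 1: 0.4}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}   # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}

joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in product([0, 1], repeat=3)}

def p(**fixed):
    # Marginal probability of any partial assignment, obtained by summing the joint
    return sum(pr for (x, y, z), pr in joint.items()
               if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items()))

# Conditional independence: P(x, y | z) == P(x | z) P(y | z) for all values
ci = all(abs(p(x=x, y=y, z=z) / p(z=z)
             - (p(x=x, z=z) / p(z=z)) * (p(y=y, z=z) / p(z=z))) < 1e-12
         for x, y, z in product([0, 1], repeat=3))

# Marginal independence: P(x, y) == P(x) P(y)?
mi = all(abs(p(x=x, y=y) - p(x=x) * p(y=y)) < 1e-12
         for x, y in product([0, 1], repeat=2))

print("conditionally independent given Z:", ci)   # True by construction
print("marginally independent:", mi)              # False for these numbers
```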

Page 13

Representing Probabilities with Discrete Data

Joint Probability Distribution Table

X1 . . . Xi . . . Xn     P(X1, . . . , Xn)
x′1 . . . x′i . . . x′n     P(x′1, . . . , x′n)
. . .
x^l_1 . . . x^l_i . . . x^l_n     P(x^l_1, . . . , x^l_n)

Describes P(X1, . . . ,Xn) for all the RV instantiations x1, . . . , xn

In general, any probability of interest can be obtained starting from the Joint Probability Distribution P(X1, . . . , Xn)

Page 14

Wrapping Up....

We know how to represent the world and the observations
- Random Variables =⇒ X1, . . . , XN
- Joint Probability Distribution =⇒ P(X1 = x1, . . . , XN = xN)

We have rules for manipulating the probabilistic knowledge
- Sum-Product
- Marginalization
- Bayes
- Conditional Independence

It is about time that we do some learning... in other words, let's discover the values for P(X1 = x1, . . . , XN = xN)

Page 15

Bayesian Learning

Statistical learning approaches calculate the probability of each hypothesis hi given the data D, and select hypotheses / make predictions on that basis
Bayesian learning makes predictions using all hypotheses, weighted by their probabilities

P(X | D = d) = ∑_i P(X | d, hi) P(hi | d) = ∑_i P(X | hi) · P(hi | d)

(new prediction = hypothesis predictions P(X | hi) weighted by the posteriors P(hi | d))
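A minimal sketch of this weighted prediction (helper name and numbers are mine; the second step uses the slide's assumption that X is independent of d given the hypothesis).

```python
def bayes_predict(pred_given_h, posterior):
    # P(X | d) = sum_i P(X | h_i) * P(h_i | d)
    return sum(px * ph for px, ph in zip(pred_given_h, posterior))

# Illustrative numbers: P(X = x | h_i) for each hypothesis, and posteriors P(h_i | d)
pred_given_h = [0.0, 0.25, 0.5, 0.75, 1.0]
posterior = [0.0, 0.1, 0.4, 0.3, 0.2]

print(bayes_predict(pred_given_h, posterior))   # 0.65
```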

Page 16

Single Hypothesis Approximation

Computational and Analytical Tractability Issue
Bayesian Learning requires a (possibly infinite) summation over the whole hypothesis space

Maximum a-Posteriori (MAP) predicts P(X | hMAP) using the most likely hypothesis hMAP given the training data

hMAP = arg max_{h∈H} P(h | d) = arg max_{h∈H} P(d | h) P(h) / P(d) = arg max_{h∈H} P(d | h) P(h)

Assuming uniform priors P(hi) = P(hj) yields the Maximum Likelihood (ML) estimate P(X | hML)

hML = arg max_{h∈H} P(d | h)
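A short sketch contrasting the two point estimates on an illustrative finite hypothesis space (names and numbers are mine).

```python
# Illustrative priors P(h_i) and likelihoods P(d | h_i) over a finite hypothesis space
priors = [0.1, 0.2, 0.4, 0.2, 0.1]
likelihoods = [0.0, 0.25, 0.5, 0.75, 1.0]   # probability of the observed data under each h_i

# MAP: maximize P(d | h) P(h); ML: maximize P(d | h) only (uniform prior)
h_map = max(range(len(priors)), key=lambda i: likelihoods[i] * priors[i])
h_ml = max(range(len(priors)), key=lambda i: likelihoods[i])

print("h_MAP = h%d, h_ML = h%d" % (h_map + 1, h_ml + 1))   # h_MAP = h3, h_ML = h5 here
```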

Page 17

All Too Abstract?

Let’s go to the Cinema!!!

How do I choose the next movie (prediction)?
I might ask my friends for their favorite choice given their personal taste (hypothesis)
Select the movie

Bayesian advice? Take a vote over all the friends' suggestions, weighted by how often they go to the cinema and by their taste
MAP advice? Ask the friend who often goes to the cinema and whose taste I trust
ML advice? Ask the friend who goes to the cinema most often

Page 18

The Candy Box Problem

A candy manufacturer produces 5 types of candy boxes (hypotheses) that are indistinguishable in the darkness of the cinema

h1: 100% cherry flavor
h2: 75% cherry and 25% lime flavor
h3: 50% cherry and 50% lime flavor
h4: 25% cherry and 75% lime flavor
h5: 100% lime flavor

Given a sequence of candies d = d1, . . . , dN extracted and reinserted in a box (observations), what is the most likely flavor for the next candy (prediction)?

Page 19

Candy Box Problem: Hypothesis Posterior

First, we need to compute the posterior for each hypothesis (Bayes)

P(hi | d) = α P(d | hi) P(hi)

The manufacturer is kind enough to provide us with the production shares (prior) for the 5 boxes

(P(h1), P(h2), P(h3), P(h4), P(h5)) = (0.1, 0.2, 0.4, 0.2, 0.1)

The data likelihood can be computed under the assumption that observations are independently and identically distributed (i.i.d.)

P(d | hi) = ∏_{j=1}^{N} P(dj | hi)

Page 20

Candy Box Problem: Hypothesis Posterior Computation

Suppose that the box is an h5 and consider a sequence of 10 observed lime candies

Figure: posterior probability P(hi | d) of each hypothesis as a function of the number of samples in d

Hyp   d0     d1     d2
h1    0.1    0      0
h2    0.2    0.1    0.03
h3    0.4    0.4    0.30
h4    0.2    0.3    0.35
h5    0.1    0.2    0.31

(posteriors after 0, 1 and 2 observed lime candies)

P(hi | d) = α P(hi) P(d = l | hi)^N

Posteriors of the false hypotheses eventually vanish
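A sketch reproducing this computation (the helper name is mine; the priors are those given on the previous slide and the per-candy lime probabilities follow from the box compositions).

```python
priors = [0.1, 0.2, 0.4, 0.2, 0.1]     # P(h1) ... P(h5)
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]   # P(d_j = lime | h_i) for each box type

def posterior_after_n_limes(n):
    # P(h_i | d) ∝ P(h_i) * P(lime | h_i)^n, normalized over the hypotheses
    unnorm = [p * (l ** n) for p, l in zip(priors, p_lime)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

for n in range(3):
    print(n, [round(p, 2) for p in posterior_after_n_limes(n)])
# 0 [0.1, 0.2, 0.4, 0.2, 0.1]
# 1 [0.0, 0.1, 0.4, 0.3, 0.2]
# 2 [0.0, 0.04, 0.31, 0.35, 0.31]   (matches the table above up to rounding)
```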

Page 21

Candy Box Problem: Comparing Predictions

Bayesian learning seeks

P(d11 = l | d1 = l, . . . , d10 = l) = ∑_{i=1}^{5} P(d11 = l | hi) P(hi | d)

Figure: probability that the next candy is lime as a function of the number of samples in d
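A self-contained sketch of this prediction, repeating the candy numbers (helper name is mine).

```python
priors = [0.1, 0.2, 0.4, 0.2, 0.1]     # P(h_i)
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]   # P(lime | h_i)

def p_next_lime(n):
    # Posterior after n observed limes, then the weighted prediction for the next draw:
    # P(d_{n+1} = lime | d) = sum_i P(lime | h_i) * P(h_i | d)
    unnorm = [p * (l ** n) for p, l in zip(priors, p_lime)]
    total = sum(unnorm)
    post = [u / total for u in unnorm]
    return sum(l * q for l, q in zip(p_lime, post))

for n in range(11):
    print(n, round(p_next_lime(n), 3))
# rises from 0.5 (the prior expectation) towards 1.0 as more lime candies are observed
```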

Page 22

Observations

Both ML and MAP are point estimates, since they only make predictions based on the most likely hypothesis
MAP predictions are approximately Bayesian if P(X | d) ∼ P(X | hMAP)

MAP and Bayesian predictions become closer as more data gets available
ML is a good approximation to MAP if the dataset is large and there are no a priori preferences on the hypotheses

ML is fully data-driven
For large datasets, the influence of the prior becomes irrelevant
ML has problems with small datasets

Page 23

Regularization

P(h) introduces a preference across hypotheses
Penalize complexity
- Complex hypotheses have a lower prior probability
- The hypothesis prior embodies a trade-off between complexity and degree of fit

MAP hypothesis hMAP

max_h P(d | h) P(h) ≡ min_h − log2 P(d | h) − log2 P(h)

Number of bits required to specify h =⇒ choosing the hypothesis that provides maximum compression
MAP is a regularization of the ML estimation

Page 24

Bayes Optimal Classifier

Suppose we want to classify a new observation as one of the classes cj ∈ C

The most probable Bayesian classification is

c∗OPT = arg max_{cj∈C} P(cj | d) = arg max_{cj∈C} ∑_i P(cj | hi) P(hi | d)

Bayesian Optimal Classifier
No other classification method (using the same hypothesis space and prior knowledge) can outperform a Bayesian Optimal Classifier on average.

Page 25

Gibbs Algorithm

The Bayes Optimal Classifier provides the best results, but can be extremely expensive

Use approximate methods =⇒ the Gibbs Algorithm
1 Choose a hypothesis hG from H at random, drawing it from the current posterior probability distribution P(hi | d)
2 Use hG to predict the classification of the next instance

Under certain conditions, the expected number of misclassifications is at most twice that of a Bayesian Optimal Classifier
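A minimal sketch of the two steps (names are mine; random.choices draws the hypothesis index according to the posterior weights).

```python
import random

posterior = [0.0, 0.1, 0.4, 0.3, 0.2]   # P(h_i | d), e.g. from the candy example
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]    # P(lime | h_i)

# Step 1: draw one hypothesis at random according to the posterior
i = random.choices(range(len(posterior)), weights=posterior)[0]

# Step 2: classify the next candy using only the sampled hypothesis
prediction = "lime" if p_lime[i] >= 0.5 else "cherry"
print("sampled hypothesis: h%d, prediction: %s" % (i + 1, prediction))
```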

Page 26

Naive Bayes Classifier

One of the simplest, yet popular, tools based on a strong probabilistic assumption

Consider the setting
- Target classification function f : X −→ C
- Each instance x ∈ X is described by a set of attributes x = ⟨a1, . . . , al, . . . , aL⟩

Seek the MAP classification

cNB = arg max_{cj∈C} P(cj | a1, . . . , aL)

Page 27

Naive Bayes Assumption

The MAP classification can be rewritten as

cNB = arg max_{cj∈C} P(cj | a1, . . . , aL) = arg max_{cj∈C} P(a1, . . . , aL | cj) P(cj)

Naive Bayes: Assume conditional independence between the attributes al given the classification cj

P(a1, . . . , aL | cj) = ∏_{l=1}^{L} P(al | cj)

Naive Bayes Classification

cNB = arg max_{cj∈C} P(cj) ∏_{l=1}^{L} P(al | cj)
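A small sketch of this decision rule for binary attributes (the class names and CPT numbers are mine and purely illustrative; the probabilities are assumed given rather than learned).

```python
classes = ["spam", "ham"]
prior = {"spam": 0.4, "ham": 0.6}                        # P(c_j)
# P(a_l = 1 | c_j) for L = 3 binary attributes
p_attr = {"spam": [0.8, 0.6, 0.1], "ham": [0.2, 0.3, 0.4]}

def naive_bayes(x):
    # c_NB = argmax_c P(c) * prod_l P(a_l | c)
    def score(c):
        s = prior[c]
        for l, a in enumerate(x):
            p1 = p_attr[c][l]
            s *= p1 if a == 1 else (1 - p1)
        return s
    return max(classes, key=score)

print(naive_bayes([1, 1, 0]))   # -> "spam" with these illustrative numbers
```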

Page 28

Observations on Naive Bayes

Use it when
- Dealing with moderate or large training sets
- The attributes that describe the instances are seemingly conditionally independent

In reality, conditional independence is often violated...
...but Naive Bayes does not seem to know it. It often works surprisingly well!!

- Text classification
- Medical diagnosis
- Recommendation systems

Next class we’ll see how to train a Naive Bayes classifier

Page 29

Part II

Bayesian Networks

Page 30

Representing Conditional Independence

Bayes Optimal Classifier (BOC)
- No independence information between RVs
- Computationally expensive

Naive Bayes (NB)
- Full independence given the class
- Extremely restrictive assumption

Figure: example Bayesian Network over the variables Burglary, Earthquake, Alarm, Radio Announcement and Neighbour Call

Bayesian Networks (BNs) describe conditional independence between subsets of RVs by a graphical model

I(X, Y | Z) ⇔ P(X, Y | Z) = P(X | Z) P(Y | Z)

They combine a-priori information concerning RV dependencies with observed data

Page 31

Bayesian Network - Directed Representation

Directed Acyclic Graph (DAG)
Nodes represent random variables
- Shaded ⇒ observed
- Empty ⇒ un-observed
Edges describe the conditional independence relationships
Every variable is conditionally independent of its non-descendants, given its parents

Conditional Probability Tables (CPTs), local to each node, describe the probability distribution given its parents

P(Y1, . . . , YN) = ∏_{i=1}^{N} P(Yi | pa(Yi))
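A sketch of this factorization on a tiny three-node network (structure, names and CPT numbers are mine, loosely inspired by the burglary-alarm example above).

```python
# Tiny DAG: Burglary -> Alarm <- Earthquake, all binary (True/False)
p_b = {True: 0.01, False: 0.99}                        # P(B)
p_e = {True: 0.02, False: 0.98}                        # P(E)
p_a = {                                                # CPT P(A = True | B, E)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}

def joint(b, e, a):
    # P(B, E, A) = P(B) * P(E) * P(A | B, E), i.e. the product over P(Y_i | pa(Y_i))
    pa_true = p_a[(b, e)]
    return p_b[b] * p_e[e] * (pa_true if a else 1 - pa_true)

print(joint(True, False, True))   # P(B=1, E=0, A=1) = 0.01 * 0.98 * 0.94
```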

Page 32

Naive Bayes as a Bayesian Network

Figure: Naive Bayes represented as a Bayesian Network, with class node c pointing to attribute nodes a1, a2, . . . , aL, and the same model drawn in plate notation with plates over the L attributes and the D instances

The Naive Bayes classifier can be represented as a Bayesian Network
A more compact representation =⇒ Plate Notation
It allows specifying more (Bayesian) details of the model

Page 33

Plate Notation

Figure: Bayesian Naive Bayes in plate notation, with observed attribute nodes al, class node c, parameter nodes βj and π, and plates over the L attributes, the D instances and the C classes

Boxes denote replication for a number of times denoted by the letter in the corner
Shaded nodes are observed variables
Empty nodes denote un-observed latent variables
Black seeds identify constant terms (e.g. the prior distribution π over the classes)

Page 34

Inference in Bayesian Networks

Inference - How can one determine the distribution of the values of one or several network variables, given the observed values of others?

Bayesian Networks contain all the needed information (the joint probability)
An NP-hard problem in general
It is easily solved for a single unobserved variable...
...otherwise
- Exact inference on small DAGs
- Approximate inference (variational methods)
- Simulate the network (Monte Carlo)
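As a sketch of the easy single-unobserved-variable case (using the same illustrative network as in the factorization sketch above): enumerate the joint over the values of the unobserved variable and normalize.

```python
# Same illustrative network as before: Burglary -> Alarm <- Earthquake
p_b = {True: 0.01, False: 0.99}
p_e = {True: 0.02, False: 0.98}
p_a = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a):
    pa = p_a[(b, e)]
    return p_b[b] * p_e[e] * (pa if a else 1 - pa)

def posterior_burglary(alarm, earthquake):
    # Single unobserved variable B: enumerate the joint over its values and normalize
    scores = {b: joint(b, earthquake, alarm) for b in (True, False)}
    total = sum(scores.values())
    return {b: s / total for b, s in scores.items()}

print(posterior_burglary(alarm=True, earthquake=False))
# B=True jumps from its 1% prior to roughly 90% once the alarm is observed
```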

Page 35

Learning with Bayesian Networks

Structure
                    Fixed Structure                 Fixed Variables
                    (X → Y, learn P(Y | X))         (X, Y, learn P(X, Y))

Data
Complete            Naive Bayes                     Discover dependencies from the data
                    Calculate frequencies (ML)      Structure search, independence tests

Incomplete          Latent variables                Difficult problem
                    EM algorithm (ML)               Structural EM
                    MCMC, VBEM (Bayesian)

                    Parameter Learning              Structure Learning
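A minimal sketch of the complete-data, fixed-structure cell: with fully observed data, the ML estimate of a CPT entry is a relative frequency (toy dataset and names are mine; parameter learning is the topic of the next lecture).

```python
from collections import Counter

# Fully observed samples of (X, Y) for the two-node network X -> Y
data = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 0), (0, 1), (1, 1), (0, 0)]

# ML estimate of P(Y = y | X = x) = count(x, y) / count(x)
pair_counts = Counter(data)
x_counts = Counter(x for x, _ in data)
p_y_given_x = {(x, y): pair_counts[(x, y)] / x_counts[x]
               for x in (0, 1) for y in (0, 1)}

print(p_y_given_x)   # e.g. P(Y=1 | X=1) = 3/4, P(Y=1 | X=0) = 1/4
```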

Page 36

Take Home Messages

- Bayesian techniques formulate learning as probabilistic inference
- ML learning selects the hypothesis that maximizes the data likelihood
- MAP learning selects the most likely hypothesis given the data (ML regularization)
- Conditional independence relations can be expressed by a Bayesian Network

- Parameter Learning (next lecture)
- Structure Learning