
Page 1: Lecture 17 Slides, May 30th, 2006

University of Washington, Department of Electrical Engineering

EE512 Spring, 2006: Graphical Models

Jeff A. Bilmes <[email protected]>

Lecture 17 Slides

May 30th, 2006

Page 2: Announcements

• READING:
  – M. Jordan: Chapters 13, 14, 15 (on Gaussians and Kalman)
• Reminder: TA discussions and office hours:
  – Office hours: Thursdays 3:30-4:30, Sieg Ground Floor Tutorial Center
  – Discussion sections: Fridays 9:30-10:30, Sieg Ground Floor Tutorial Center Lecture Room

• No more homework this quarter, concentrate on final projects!!

• Makeup class, tomorrow Wednesday, 5-7pm, room TBA (watch email).


Page 3: Class Road Map

• L1: Tues, 3/28: Overview, GMs, Intro BNs
• L2: Thur, 3/30: semantics of BNs + UGMs
• L3: Tues, 4/4: elimination, probs, chordal I
• L4: Thur, 4/6: chordal, sep, decomp, elim
• L5: Tue, 4/11: chordal/elim, MCS, triang, CI props
• L6: Thur, 4/13: MST, CI axioms, Markov props
• L7: Tues, 4/18: Mobius, HC-thm, (F)=(G)
• L8: Thur, 4/20: phylogenetic trees, HMMs
• L9: Tue, 4/25: HMMs, inference on trees
• L10: Thur, 4/27: inference on trees, start polytrees
• L11: Tues, 5/2: polytrees, start JT inference
• L12: Thur, 5/4: inference in JTs
• Tues, 5/9: away
• Thur, 5/11: away
• L13: Tue, 5/16: JT, GDL, Shenoy-Shafer
• L14: Thur, 5/18: GDL, search, Gaussians I
• L15: Mon, 5/22: laptop crash
• L16: Tues, 5/23: search, Gaussians I
• L17: Thur, 5/25: Gaussians
• Mon, 5/29: Holiday
• L18: Tue, 5/30
• L19: Thur, 6/1: final presentations


Page 4: Final Project Milestone Due Dates

• L1: Tues, 3/28:
• L2: Thur, 3/30:
• L3: Tues, 4/4:
• L4: Thur, 4/6:
• L5: Tue, 4/11:
• L6: Thur, 4/13:
• L7: Tues, 4/18:
• L8: Thur, 4/20: Team Lists, short abstracts I
• L9: Tue, 4/25:
• L10: Thur, 4/27: short abstracts II
• L11: Tues, 5/2:
• L12: Thur, 5/4: abstract II + progress
• L--: Tues, 5/9:
• L--: Thur, 5/11: 1-page progress report
• L13: Tue, 5/16:
• L14: Thur, 5/18: 1-page progress report
• L15: Tues, 5/23:
• L16: Thur, 5/25: 1-page progress report
• L17: Tue, 5/30: Today
• L18: Wed, 5/31:
• L19: Thur, 6/1: final presentations
• L20: Tue, 6/6: 4-page papers due (like a conference paper); only .pdf versions accepted


• Team lists, abstracts, and progress reports must be turned in, in class, on paper (dead-tree versions only).

• Final reports must be turned in electronically in PDF (no other formats accepted).

• No need to repeat what was on previous progress reports/abstracts; I have those available to refer to.

• Progress reports must report who did what so far!!

Page 5: Summary of Last Time

• Gaussian Graphical Models


Page 6: Outline of Today's Lecture

• Other forms of inference
• Structure learning in graphical models


Page 7: Books and Sources for Today

• Jordan, chapters 13-15
• Other references contained in the presentation

Page 8: Graphical Models

1. We start with some probability distribution P.
   1. P could be specified as a given, or more likely we have training data consisting of some number of samples. The goal is to learn P, or some approximation to it (training), and then use P in some way (inference for making decisions, such as the most probable assignment, the max-product semiring, etc.).
2. The graph G=(V,E) represents "structure" in P.
3. The graph can provide an efficient representation of, and efficient computational inference for, P.
4. There can be multiple graphs that represent a given P (e.g., the complete graph represents all P).
5. Goal: find a computationally cheap exact or approximate graph cover for P.
6. Once we do this, we just compute probabilities using the junction tree algorithm, a search algorithm, etc.

Page 9: Graphical Models & Tree-width

1. Tree-width is the complexity parameter for G=(V,E).

2. Def: k-tree: start with k nodes forming a clique of size k; for each new node (the nth node, n > k), connect it to k previous, mutually connected nodes.

3. Example: a 4-tree (note: all separators are of size 4).
   Figures: a 4-tree with 4 nodes, a 4-tree with 5 nodes, a 4-tree with 6 nodes.
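To make the recipe above concrete, here is a minimal sketch (illustrative code, not from the slides) that builds a k-tree by exactly this process: start from a k-clique and repeatedly attach each new node to k existing, mutually connected nodes. All names are made up for the example.

import itertools
import random

def random_k_tree(n, k, seed=0):
    # Build adjacency sets of a k-tree on n >= k nodes: start with a k-clique,
    # then attach each new node to k existing, mutually connected nodes.
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(k), 2):
        adj[u].add(v)
        adj[v].add(u)
    cliques = [tuple(range(k))]          # k-cliques available to attach to
    for new in range(k, n):
        base = rng.choice(cliques)       # k mutually connected existing nodes
        for v in base:
            adj[new].add(v)
            adj[v].add(new)
        # every (k-1)-subset of base, together with the new node, is a fresh k-clique
        for sub in itertools.combinations(base, k - 1):
            cliques.append(sub + (new,))
    return adj

# Example: 4-trees with 4, 5, and 6 nodes, as in the figures
# (the specific attachment choices here are random).
for n in (4, 5, 6):
    print(n, random_k_tree(n, 4))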

Page 10: Graphical Models & Tree-width

1. Def: partial k-tree: any subgraph of a k-tree.

2. Def: the tree-width of a graph G is the smallest k such that G is a partial k-tree.

3. Thm: the tree-width decision problem is NP-complete.
   1. We mentioned this before; proven by Arnborg et al.

4. Thm: exact probabilistic inference (computing probabilities, etc.) is exponential in the tree-width.
   1. Time-space tradeoffs can help here, but what if all of the points in the achievable region are intolerably computationally expensive?

5. The big question: what if exact inference is too expensive?
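Since deciding the tree-width is NP-complete, practice typically settles for an upper bound from an elimination-order heuristic. Below is a minimal sketch of the common greedy min-fill heuristic (illustrative, not material from the slides); the induced width it reports upper-bounds the true tree-width.

def min_fill_order(adj):
    # adj: dict mapping node -> set of neighbors (undirected graph).
    # Returns (elimination order, induced width); the induced width is an
    # upper bound on the tree-width of the graph.
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    order, width = [], 0

    def fill_edges(v):
        # number of edges that eliminating v would add among its neighbors
        nbrs = list(adj[v])
        return sum(1 for i in range(len(nbrs)) for j in range(i + 1, len(nbrs))
                   if nbrs[j] not in adj[nbrs[i]])

    while adj:
        v = min(adj, key=fill_edges)       # eliminate the cheapest node first
        nbrs = adj[v]
        width = max(width, len(nbrs))
        for a in nbrs:                     # connect v's neighbors into a clique
            adj[a].update(nbrs - {a})
            adj[a].discard(a)
            adj[a].discard(v)
        del adj[v]
        order.append(v)
    return order, width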

Page 11: When exact inference is too expensive

1. Two general approaches: either an exact solution to an approximate problem, or an approximate solution to an exact problem.

2. Exact solution to an approximate problem:
   1. Structure learning: find a low tree-width (or otherwise "cheap") graphical model that is still "high-quality" in some way, and then perform exact inference on the approximate model.
   2. This can be easy or hard depending on the tree-width, on the measure of "high-quality", and on the learning paradigm.

3. Approximate solution to an exact problem:
   1. Approximate inference tries to approximate in some way what must be computed: loopy belief propagation, sampling/pruning, variational/mean-field methods, and hybrids of the above.

Page 12: Finding k-trees

1. How do we score a k-tree?
   1. Maximum likelihood, or a conditional score.

2. May we assume that truth itself is a k-tree?
   1. Sometimes simplifications can be made if we assume that the truth is part of a known model class, such as a k-tree for some fixed constant k independent of n=|V|, the number of nodes.

3. How do we find the best 1-tree?

Page 13: Finding 1-trees

1. Given P, the goal is to find the best 1-tree approximation of P in a maximum likelihood sense.
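The classical answer here (Chow & Liu, 1968) is that the maximum-likelihood 1-tree is a maximum-weight spanning tree whose edge weights are the pairwise empirical mutual informations, with the empirical pairwise marginals placed on its edges. A minimal sketch, assuming discrete data in a NumPy array; all function names are illustrative.

import numpy as np
from itertools import combinations

def empirical_mutual_information(x, y):
    # Empirical mutual information (in nats) between two discrete columns.
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(x == a) * np.mean(y == b)))
    return mi

def max_spanning_tree(weights, n_vars):
    # Prim's algorithm: maximum-weight spanning tree over nodes 0..n_vars-1.
    in_tree, edges = {0}, []
    while len(in_tree) < n_vars:
        best = max((e for e in weights if (e[0] in in_tree) != (e[1] in in_tree)),
                   key=lambda e: weights[e])
        edges.append(best)
        in_tree.update(best)
    return edges

def chow_liu_tree(data):
    # Best ML 1-tree: max-weight spanning tree with MI edge weights.
    # data: (num_samples, num_vars) array of discrete values.
    n_vars = data.shape[1]
    weights = {(i, j): empirical_mutual_information(data[:, i], data[:, j])
               for i, j in combinations(range(n_vars), 2)}
    return max_spanning_tree(weights, n_vars)

(The same helpers are reused in the EAR-based sketch under the "Conditional mutual information?" slide below.)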

Pages 14-19: Finding 1-trees (continued)

Page 20: Plethora of negative results

• Chickering 1996; Chickering, Meek & Heckerman 2003: learning Bayesian networks in the ML sense is NP-hard ("is there a BN with a fixed upper bound on in-degree that achieves a given ML score?").

• Dasgupta 1999: learning polytrees in the ML sense is NP-hard ("is there a polytree with fixed upper-bound in-degree with a given ML score?"), and worse, there is a constant c such that it is NP-complete to decide whether there is a polytree with score <= c*OPT_score.

• Meek 2001: learning even a path (a sub-class of trees) in the ML sense is NP-hard.

Page 21: Plethora of negative results

• Srebro & Karger 2001: learning k-trees in the ML sense is hard.
• So, generative model structure learning is likely to be a difficult problem (unless k=1, or P=NP).
• We next spend a bit of time talking about the Srebro/Karger result.

Pages 22-27: Optimal ML k-trees is NP-complete

Page 28: Some good news …

• PAC framework: the key difference is that we assume the graph is in the concept class (we learn the class of k-trees). This means that if we have sampled data, we assume the sampled data comes from a truth which is itself a k-tree.

• Hoeffgen '93: can robustly (with a number of samples polynomial in n, 1/ε, 1/δ) PAC-learn bounded tree-width graphical models, and can robustly and efficiently (algorithm polynomial in the same quantities) PAC-learn 1-trees.

• Narasimhan & Bilmes 2004: can robustly and efficiently PAC-learn bounded tree-width graphical models.

Page 29: More good news …

• Abbeel, Koller & Ng 2005: can robustly and efficiently PAC-learn bounded-degree factor graphs.
  – Note: this does not come with a complexity guarantee. E.g., grids have bounded degree but not bounded tree-width; a star has unbounded degree but bounded tree-width. Tree-width is what is crucial for computation in general.

Page 30: How to PAC-learn such graphs …

• Mutual information is symmetric submodular
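In set-function terms, the standard statement is that for a fixed joint distribution over X_V, the cut function below is symmetric and submodular:

f(A) = I(X_A ; X_{V \setminus A}), \qquad f(A) = f(V \setminus A) \quad \text{(symmetry)}

f(A) + f(B) \;\ge\; f(A \cup B) + f(A \cap B) \quad \text{for all } A, B \subseteq V \quad \text{(submodularity)}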

Page 31: How to PAC-learn such graphs …

• Submodularity and Optimization (Narasimhan & Bilmes, 2004)

Page 32: Another positive result

• Since mutual information is symmetric submodular, we can find optimal partitions.
• This has implications for clustering (Narasimhan, Jojic & Bilmes '05) and also for structure learning (we can find an optimal 1-step graph decomposition by finding the optimal k-separator).
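One way to write the optimal-partition problem, assuming the objective is the mutual information across the cut: because this objective is symmetric submodular, it can be minimized exactly (e.g., with Queyranne's algorithm) using O(|V|^3) evaluations of the objective.

A^{*} \;=\; \mathop{\arg\min}_{\emptyset \neq A \subsetneq V} \; I\big(X_A ; X_{V \setminus A}\big)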

Page 33: Finding ML decompositions …

• Optimal to one level

Page 34: Discriminative structure

• Goal might be classification using a generative model.
• Distinction between parameters & structure.
• Two possible goals:
  – 1) find one global structure that classifies well
  – 2) find class-specific structure (one per class)
• In either case, finding a good discriminative structure may render discriminative parameter learning less necessary.

Page 35: Optimal discriminative structure procedure …

• Choose the parameter value (for now, let's just assume it equals 1).
• Find the tree that best satisfies the resulting objective:

Page 36: Properties

• Options:
  – Can fix the structure and train parameters using either maximum likelihood (generative) or maximum conditional likelihood (discriminative).
  – Can learn a discriminative structure, and can train either generatively or discriminatively.
  – In all cases, assume appropriate regularization.

• Bad news: KL-divergence not decomposable w.r.t. tree in the discriminative case.

• Goal: identify a local discriminative measure on edges in a graph (analogous to mutual information for generative case).

Page 37: EAR measure

• EAR (explaining-away residual) measure (Bilmes '98).
• Goal is to maximize EAR:
  – Intuition: prefer edges between variables that are dependent class-conditionally, but otherwise independent.
• EAR is an approximation to the expected log conditional posterior. It is exact for independent "auxiliary" variables.
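Written out for a candidate edge between features X_i and X_j with class variable C, the EAR measure is the class-conditional mutual information minus the marginal mutual information:

\mathrm{EAR}(X_i, X_j) \;=\; I(X_i ; X_j \mid C) \;-\; I(X_i ; X_j)

It is large exactly when the edge captures dependence that appears only once the class is known, which is the intuition stated above.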

Page 38: Conditional mutual information?

• Conditional mutual information is not guaranteed to discriminate well.

• Building an MST using the conditional mutual information I(X_i ; X_j | C) as edge weights will not necessarily produce a tree with good classification properties. EAR fixes this in certain cases.

• Example: 3 features and a class variable.
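A minimal sketch of one way to operationalize this comparison, reusing empirical_mutual_information and max_spanning_tree from the Chow-Liu sketch above: build one spanning tree with conditional-mutual-information weights and one with EAR weights. This is an illustration of the idea, not the exact procedure from the course; names are made up.

import numpy as np
from itertools import combinations

def conditional_mutual_information(x, y, c):
    # Empirical I(X;Y|C): class-prior-weighted MI computed within each class.
    return sum(np.mean(c == cls) *
               empirical_mutual_information(x[c == cls], y[c == cls])
               for cls in np.unique(c))

def pairwise_weights(data, labels, use_ear):
    # Edge weights over all feature pairs: CMI, or EAR = CMI - MI.
    w = {}
    for i, j in combinations(range(data.shape[1]), 2):
        cmi = conditional_mutual_information(data[:, i], data[:, j], labels)
        mi = empirical_mutual_information(data[:, i], data[:, j])
        w[(i, j)] = cmi - mi if use_ear else cmi
    return w

def discriminative_tree(data, labels, use_ear=True):
    # Spanning tree over the features maximizing summed EAR (or summed CMI).
    return max_spanning_tree(pairwise_weights(data, labels, use_ear),
                             data.shape[1])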

Pages 39-40: Generative training/structure

Page 41: General Structure Learning