a bayesian probability calculus for density matricesmanfred/pubs/c76talk.pdf · 2013-11-03 · a...

85
A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin University of California - Santa Cruz Web: .com ”manfred” The Technion, Haifa, Israel, Nov 4, 2013 1 / 85

Upload: others

Post on 27-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

A Bayesian Probability Calculus for Density Matrices

Manfred K. Warmuth and Dima Kuzmin

University of California - Santa Cruz

Web: .com ”manfred”

The Technion, Haifa, Israel, Nov 4, 2013

1 / 85

Page 2: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Outline

1 Multiplicative updates - the genesis of this research

2 Conventional and Generalized Probability Distributions

3 Conventional and generalized Bayes rule

4 Bounds, derivation, calculus

2 / 85

Page 3: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Outline

1 Multiplicative updates - the genesis of this research

2 Conventional and Generalized Probability Distributions

3 Conventional and generalized Bayes rule

4 Bounds, derivation, calculus

3 / 85

Page 4: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Multiplicative updates

A set of experts i learns something online

Maintain one weight wi per expert i

Multiplicative update

wi :=wi e−η Li

normalization

Bayes rule is special casewhen η = 1, Li := − log(P(y |Mi )), and wi = P(Mi )

P(Mi |y) :=P(Mi ) P(y |Mi )

normalization

Motivated by using a relative entropy regularization

Weight vector is uncertainty vector over experts / models

4 / 85

Page 5: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Matrix version of multiplicative updates

Maintain density matrix W as a parameter

Multiplicative update

W :=exp(logW − η L)

normalization

Motivated by using a quantum relative entropy regularization

Density matrix is uncertainty over rank 1 subspaces

Capping the eigenvalues at 1k : uncertainty over rank k subspaces

Today: What corresponds to Bayes ???

5 / 85

Page 6: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Matrix version of multiplicative updates

Maintain density matrix W as a parameter

Multiplicative update

W :=exp(logW − η L)

normalization

Motivated by using a quantum relative entropy regularization

Density matrix is uncertainty over rank 1 subspaces

Capping the eigenvalues at 1k : uncertainty over rank k subspaces

Today: What corresponds to Bayes ???

6 / 85

Page 7: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Visualisations: ellipses

We illustrate symmetric matrices as ellipses- affine transformations of the unit ball:

Ellipse = {Su : ‖u‖2 = 1}Dotted lines connect point u on unit ball with point Su on ellipse

7 / 85

Page 8: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Ellipses cont.

For symmetric matrices, the eigenvectors form the axes of the ellipseand eigenvalues their lengths

Su = σu, u is an eigenvector, σ is an eigenvalue

8 / 85

Page 9: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Dyads

One eigenvalue one

All others zero

9 / 85

Page 10: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

From vectors to matrices

Real vectors x =∑

i xi︸︷︷︸real coefficients

ei

Symmetric matrices S =∑

i σi︸︷︷︸real coefficients

sis>i︸︷︷︸

dyads

i.e. linear combinations of unit vectors / dyads

Probability vectors ω =∑

i ωi︸︷︷︸mixture coefficients

ei

Density matrices W =∑

i ωi︸︷︷︸mixture coefficients

wiw>i︸ ︷︷ ︸

pure density matrices

i.e. mixtures of unit vectors / dyads

Matrices are generalized distributions

10 / 85

Page 11: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

From vectors to matrices

Real vectors x =∑

i xi︸︷︷︸real coefficients

ei

Symmetric matrices S =∑

i σi︸︷︷︸real coefficients

sis>i︸︷︷︸

dyads

i.e. linear combinations of unit vectors / dyads

Probability vectors ω =∑

i ωi︸︷︷︸mixture coefficients

ei

Density matrices W =∑

i ωi︸︷︷︸mixture coefficients

wiw>i︸ ︷︷ ︸

pure density matrices

i.e. mixtures of unit vectors / dyads

Matrices are generalized distributions

11 / 85

Page 12: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

From vectors to matrices

Real vectors x =∑

i xi︸︷︷︸real coefficients

ei

Symmetric matrices S =∑

i σi︸︷︷︸real coefficients

sis>i︸︷︷︸

dyads

i.e. linear combinations of unit vectors / dyads

Probability vectors ω =∑

i ωi︸︷︷︸mixture coefficients

ei

Density matrices W =∑

i ωi︸︷︷︸mixture coefficients

wiw>i︸ ︷︷ ︸

pure density matrices

i.e. mixtures of unit vectors / dyads

Matrices are generalized distributions

12 / 85

Page 13: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Outline

1 Multiplicative updates - the genesis of this research

2 Conventional and Generalized Probability Distributions

3 Conventional and generalized Bayes rule

4 Bounds, derivation, calculus

13 / 85

Page 14: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional probability theory

Space is set A of n elementary events / points

{a1, a2, a3, a4, a5}

Event is subset

{a2, a3, a5} = (0, 1, 1, 0, 1)>

Distribution is probability vector

(.1, .2, .3, .1, .3)>

Probability of event 0 · .1 + 1 · .2 + 1 · .3 + 0 · .1 + 1 · .3 = .8

14 / 85

Page 15: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional case in matrix notation

The n elementary events are matrices with a single one on diagonal(0 0 0 0 00 0 0 0 00 0 0 0 00 0 0 1 00 0 0 0 0

)= e4e

>4

Event is diagonal matrix P with {0, 1} entries

P =

(0 0 0 0 00 1 0 0 00 0 1 0 00 0 0 0 00 0 0 0 1

)= e2e

>2 + e3e

>3 + e5e

>5

Distribution is diagonal matrix Wwith probability distribution along the diagonal

W =

(.1 0 0 0 00 .2 0 0 00 0 .3 0 00 0 0 .1 00 0 0 0 .3

)Probability of event P wrt distribution W is tr(WP) = W • P

15 / 85

Page 16: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Elementary events = states in quantum physics

Conventional: finitely many states

{eie>i : 1 ≤ i ≤ n}

Generalized: continuously many states uu> called dyads

{uu> : u ∈ Rn, ||u||2 = 1}

Degenerate ellipsesuu> one-dimensional projection matrix onto direction u

Prefer to use the dyads uu> as our statesinstead of the unit vectors u

16 / 85

Page 17: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Generalized probabilites over Rn

Elementary events are the dyads uu>

Event is symmetric matrix P with {0, 1} eigenvalues

P = U(

0 0 0 0 00 1 0 0 00 0 1 0 00 0 0 0 00 0 0 0 1

)U> = u2u

>2 + u3u

>3 + u5u

>5

U orthogonal eigensystem, ui columns/eigenvectorsP projection matrix onto arbitrary subspace of Rn: P2 = P

Distribution is density matrix W:

W = U(.1 0 0 0 00 .2 0 0 00 0 .3 0 00 0 0 .1 00 0 0 0 .3

)U> =

∑i

ωi uiu>i︸ ︷︷ ︸

mixture of dyads

Eigenvalues ωi form probability vector

Event probabilities become traces (later)

17 / 85

Page 18: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Density matrices = mixtures of dyads

Many mixtures lead to same density matrix

There always exists a decomposition into n dyads that correspond toeigenvectors

Alternate defnitions of density matrices?

Symmetric: W> = W

Positive definite: u>Wu ≥ 0 ∀uTrace one: sum of diagonal elements is one

18 / 85

Page 19: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Variance

View the symmetric positive definite matrix C as a covariance matrixof some random cost vector c ∈ Rn, i.e.

C = E(

(c− E(c))(c− E(c))>)

The variance along any vector u is

V(c>u) = E((c>u− E(c>u)

)2)

= E((

(c> − E(c>)) u)2

)

= u> E(

(c− E(c))(c− E(c))>)

︸ ︷︷ ︸C

u

Variance as trace

u>Cu = tr(u>Cu) = tr(C uu>) = C • uu> ≥ 0

19 / 85

Page 20: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Plotting the variance

Curve of the ellipse is plot of vector Cu , where u is unit vectorThe outer figure eight is direction u times the variance u>CuFor an eigenvector, this variance equals the eigenvalueand touches the ellipse

20 / 85

Page 21: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

3 dimensional variance plots

21 / 85

Page 22: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Assignment of generalized probabilities

Density matrix W assigns generalized probabilityu>Wu = tr(Wuu>) to dyad uu>

Sum of probabilities over an orthornormal basis ui is 1

For any two orthogonal directions:u>1 Wu1 + u>2 Wu2 = 1

a + b + c = 1

22 / 85

Page 23: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Sum of probabilities over an orthornormal basis

∑i

tr(Wuiu>i ) = tr(W

∑i

uiu>i ) = tr(WUU>︸ ︷︷ ︸

I

) = tr(W) = 1

Uniform density matrix: 1n I

tr(1

nI uu>) =

1

ntr(uu>) =

1

n

All dyads have generalized probability 1n

Generalized probability of n orthogonal dyadssums to 1

Classical: for any probability vector ω∑i

tr(diag(ω) eie>i ) =

∑i

ωi = 1

23 / 85

Page 24: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Total variance along orthogonal set of directions

A density matrixFor any two orthogonal directions

u>1 Au1 + u>2 Au2 = 1

a + b + c = 1

24 / 85

Page 25: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Gleason’s Theorem

Definition

Scalar function µ(u) from unit vectors u in Rn to Ris called generalized probability measure if:

∀u, 0 ≤ µ(u) ≤ 1

If u1, . . . ,un form an orthonormal basis for Rn,then

∑µ(ui ) = 1

Theorem

Let n ≥ 3. Then any generalized probability measure µ on Rn has theform:

µ(u) = tr(Wuu>)

for a uniquely defined density matrix W

25 / 85

Page 26: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Key generalization

Disjoint events now correspond to orthogonal events

When events have same eigensystem, then

tr

U(

1 0 0 0 00 0 0 0 00 0 0 0 00 0 0 0 00 0 0 0 1

)U> U

(0 0 0 0 00 1 0 0 00 0 1 0 00 0 0 0 00 0 0 0 0

)U>︸ ︷︷ ︸

orthogonal

= 0 iff

(10001

(01100

)︸ ︷︷ ︸

disjoint

= 0

26 / 85

Page 27: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Probability of events

Generalized probability of event P is

tr(WP) = tr(∑i

ωi wiw>i P) =

∑i

probability︷︸︸︷ωi

variance︷ ︸︸ ︷w>i Pwi︸ ︷︷ ︸

expected variance

Random variable is symmetric matrix SExpectation of S is

tr(WS) = tr(W∑i

σisis>i ) =

∑i

outcome︷︸︸︷σi

probability︷ ︸︸ ︷s>i W si︸ ︷︷ ︸

expected outcome

27 / 85

Page 28: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Quantum measurement

Quantum measurement for mixture state W =∑

i ωiwiw>i

and observable S =∑

i σisis>i :

After measurement, state collapses into one of {s1s>1 , . . . , sns>n }

Successor state is sis>i with probability si

>W si

The expected state is again a density matrix

W −→∑i

sis>i W sis

>i =

∑i

probability︷ ︸︸ ︷s>i W si sis

>i︸ ︷︷ ︸

expected state

Successor state sis>i associated with outcome σi

Expected outcome tr(WS) =∑

i σi s>i W si

We use the trace computations but our density matrices are updateddifferently

28 / 85

Page 29: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Outline

1 Multiplicative updates - the genesis of this research

2 Conventional and Generalized Probability Distributions

3 Conventional and generalized Bayes rule

4 Bounds, derivation, calculus

29 / 85

Page 30: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional setup

Model Mi is chosen with prior probability P(Mi )

Datum y is generated with probability P(y |Mi )

P(y) =∑i

P(Mi )P(y |Mi )︸ ︷︷ ︸expected likelihood

30 / 85

Page 31: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional Bayes Rule

P(Mi |y) =P(Mi )P(y |Mi )

P(y)

4 updates with the same data likelihood

Update maintains uncertainty information about maximum likelihood

Soft max

31 / 85

Page 32: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional Bayes Rule

P(Mi |y) =P(Mi )P(y |Mi )

P(y)

4 updates with the same data likelihood

Update maintains uncertainty information about maximum likelihood

Soft max

32 / 85

Page 33: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional Bayes Rule

P(Mi |y) =P(Mi )P(y |Mi )

P(y)

4 updates with the same data likelihood

Update maintains uncertainty information about maximum likelihood

Soft max

33 / 85

Page 34: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional Bayes Rule

P(Mi |y) =P(Mi )P(y |Mi )

P(y)

4 updates with the same data likelihood

Update maintains uncertainty information about maximum likelihood

Soft max

34 / 85

Page 35: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Bayes Rule for density matrices

D(M|y) =exp (logD(M) + logD(y|M))

tr (above matrix)

1 update with datalikelyhood matrixD(y|M)

Update maintainsuncertainty informationabout maximumeigenvalue

Soft max eigenvaluecalculation

35 / 85

Page 36: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Bayes Rule for density matrices

D(M|y) =exp (logD(M) + logD(y|M))

tr (above matrix)

2 updates with samedata likelyhood matrixD(y|M)

Update maintainsuncertainty informationabout maximumeigenvalue

Soft max eigenvaluecalculation

36 / 85

Page 37: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Bayes Rule for density matrices

D(M|y) =exp (logD(M) + logD(y|M))

tr (above matrix)

3 updates with samedata likelyhood matrixD(y|M)

Update maintainsuncertainty informationabout maximumeigenvalue

Soft max eigenvaluecalculation

37 / 85

Page 38: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Bayes Rule for density matrices

D(M|y) =exp (logD(M) + logD(y|M))

tr (above matrix)

4 updates with samedata likelyhood matrixD(y|M)

Update maintainsuncertainty informationabout maximumeigenvalue

Soft max eigenvaluecalculation

38 / 85

Page 39: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Bayes Rule for density matrices

D(M|y) =exp (logD(M) + logD(y|M))

tr (above matrix)

10 updates with samedata likelyhood matrixD(y|M)

Update maintainsuncertainty informationabout maximumeigenvalue

Soft max eigenvaluecalculation

39 / 85

Page 40: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Bayes Rule for density matrices

D(M|y) =exp (logD(M) + logD(y|M))

tr (above matrix)

20 updates with samedata likelyhood matrixD(y|M)

Update maintainsuncertainty informationabout maximumeigenvalue

Soft max eigenvaluecalculation

40 / 85

Page 41: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Many iterations of conventional Bayes w. same datalikelihood

Plot of posterior probability as a function of the iteration number

(P(Mi )) = (.29, .4, .3, .01) (P(y |Mi )) = (.7, .84, .85, .9)

Initially .85 overtakes .84Eventually .9 overtakes bothLargest likelihood: sigmoid smallest likelihood: reverse sigmoid

41 / 85

Page 42: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Many iterations of generalized Bayes Rule

Prior D(M) is diagonalized prior (P(Mi )) of previous plotData likelihood D(y |M) = U diag((P(y |Mi ))UT ,

where the eigensystem U is a random rotation matrixPlot of projections of the posterior onto the four eigendirections ui

42 / 85

Page 43: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Forward and backward

diagonal with rotation

−150 −100 −50 0 50 100 1500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

−150 −100 −50 0 50 100 1500

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

43 / 85

Page 44: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional rule as special case

If D(M) and D(y|M) have the same eigensystem, then generalizedBayes rule specializes the the conventional case

. . . 0

P(Mi |y)

0. . .

=

. . . 0

P(Mi )

0. . .

. . . 0P(y |Mi )

0. . .

tr( · · · )

44 / 85

Page 45: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Diagonal case in dimension 2

Sequence of 2-dim ellipses for diagonal case

45 / 85

Page 46: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

General case in dimension 2

46 / 85

Page 47: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

What’s the upshot

Nifty generalization of Bayes rule

Lots of evidence that it is the right generalization

We don’t have an application yet where the calculus is needed for theanalysis

47 / 85

Page 48: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Properties D(M) and D(y|M)

D(M|y) =exp (logD(M) + logD(y|M))

tr (above matrix)

D(M)

symmetric positive definite

eigenvalues sum to one

D(y|M)

symmetric positive definite

all eigenvalues in range [0..1]

Later we discuss where these matrices come from

48 / 85

Page 49: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Non-commutative Bayes Rule?

The product of two symmetric positive matrices can be neithersymmetric nor positive definite

D(M|y) =

sym.pos.def︷ ︸︸ ︷exp(

sym.︷ ︸︸ ︷log

sym.pos.def︷ ︸︸ ︷D(M) +

sym.︷ ︸︸ ︷log

sym.pos.def︷ ︸︸ ︷D(y|M)

)tr (· · · )

49 / 85

Page 50: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Intersection properties of Bayes

Conventional Bayes:

P(Mi ) P(y |Mi ) P(Mi |y)

0 0 0a 0 00 b 0

a b abP(y)

Computes intersection of two sets

50 / 85

Page 51: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Intersection properties on new Bayes Rule

exp(log A + log B)︸ ︷︷ ︸

Result lies in intersection of both spans:here a degenerate ellipse of dimension one

51 / 85

Page 52: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Same eigensystem, then trace dot product of eigenvectors

W and S have the same eigensystem

tr(WS) =∑i

ωiσi

The ωi can track the high σi

52 / 85

Page 53: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Rotated matrices are “invisible”

W: S1 : S2 :

W is any matrix with eigensystem I

S1,S2 are any matrices with rotated eigensystem andtr(S1) = tr(S2), then

tr(WS1) = tr(WS2)

So W cannot distinguish S1 and S2

53 / 85

Page 54: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Hadamard matrices

(1 11 −1

) ( 1 1 1 11 −1 1 −11 1 −1 −11 −1 −1 1

)Hn =

(Hn−1 Hn−1

Hn−1 −Hn−1

)Columns orthogonal

1√nH orthonormal - here called rotated eigensystem

Dyads hh> formed by any column h of 1√nH has 1

n along diagonal

1

n

( 1 −1 1 −1−1 1 −1 1

1 −1 1 −1−1 1 −1 1

)

Diagonal of hh> consists of all ones

54 / 85

Page 55: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Rotated versus unrotated in n dimensions

If W has eigensystem Iand S has rotated eigensystem:

W: S :

tr(∑i

ωieie>i︸ ︷︷ ︸

W

∑j

σj1

nhjh>j︸ ︷︷ ︸

S

) =1

n

∑i ,j

ωiσj tr(eie>i hjh

>j )︸ ︷︷ ︸

1

=1

ntr(W)tr(S)

A density matrix W with unit eigen system does not see much detailabout matrices in the rotated system

55 / 85

Page 56: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Avoiding logs of zeros

Rewriteexp(log A + log B) as lim

n→∞(A1/nB1/n)n

Lie-Trotter Formula

Limit always exists and well behaved

Short hand for new product

A�B := limn→∞

(A1/nB1/n)n

Generalized Bayes Rule becomes

D(M|y) =D(M)�D(y|M)

tr(D(M)�D(y|M))

56 / 85

Page 57: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

� - commutative matrix product

Plain matrix product is non-commutative and can violate symmetry andpositive definiteness. � does not have these drawbacks.

57 / 85

Page 58: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Behaviour of the limit for �

“Ears” indicating negative definiteness are smaller for (A1/2B1/2)2

compared to AB

Non-commuting part shrinks as well

58 / 85

Page 59: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Operating on bunnies with �

Commutative combination of linear transformations

c©Marc Alexa

59 / 85

Page 60: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

� Properties

1 Commutative, associative, identity matrix as neutral elmt,preserves symmetry and positive definiteness

2 A� B = AB iff A and B commute

3 range(A� B) = range(A) ∩ range(B)

4 tr(A� B) ≤ tr(AB) with equality when A and B commute

5 For any unit direction u ∈ range(A),

uu> � A = eu>(log+ A)u uu>

6 det(A� B) = det(A) det(B), as for the regular matrix product

7 Typically A� (B + C) 6= A� B + A� C

60 / 85

Page 61: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional setup again

Model Mi is chosen with prior probability P(Mi )Datum y is generated with probability P(y |Mi )

P(y) =∑i

P(Mi )P(y |Mi )︸ ︷︷ ︸expected likelihood

Theorem of Total Probability

61 / 85

Page 62: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Generalized setup

D(y) = tr(D(M)�D(y|M)) ≤ tr(D(M)D(y|M))

=∑i

probability︷︸︸︷δi

variance︷ ︸︸ ︷d>i D(y|M)di︸ ︷︷ ︸

expected variance

=∑i

probability︷ ︸︸ ︷u>i D(M)ui

outcome︷︸︸︷λi︸ ︷︷ ︸

expected measurement

where D(y|M) =∑

i λiuiu>i

Upper bounds similar to Theorem of Total Probability

Only decouples when D(M) and D(y|M) have same eigensystem

≤ becomes =Probabilities → P(Mi ) and variance/outcome → P(y |Mi )

62 / 85

Page 63: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Outline

1 Multiplicative updates - the genesis of this research

2 Conventional and Generalized Probability Distributions

3 Conventional and generalized Bayes rule

4 Bounds, derivation, calculus

63 / 85

Page 64: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional bound ito MAP

Probability domain

P(y) =∑i

P(y |Mi )P(Mi ) ≥ P(y |Mi )P(Mi )

Log domain

− log P(y) = − log∑i

P(y |Mi )P(Mi )

≤ mini

(− log P(y |Mi )− log P(Mi ))

64 / 85

Page 65: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Generalized bound ito MAP

Only in the log domain

− logm>Am ≤ −m> log(A)m, for any unit vector m and symmetricpositive definite matrix A

− logD(y) =

= − log tr(D(y|M)�D(M))

≤ minm

(− logm>(D(y|M)�D(M))m)

≤ minm

(−m> log(D(y|M)�D(M))m)

= minm

(−m> logD(y|M)m−m> logD(M)m)

65 / 85

Page 66: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Derivation of updates [KW97]

infw

∆(w,w0)︸ ︷︷ ︸divergence to w0

+ η︸︷︷︸learning rate

Loss(w)

Can derive large variety of updates by varying divergence, lossfunction and learning rate

Examples: Gradient descent update, exponentiated gradient update,Ada-Boost (η →∞)

Here we will derive Bayes rule with this framework

66 / 85

Page 67: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional Bayes Rule

Mixture coefficients ωi

Prior P(Mi )

inf∑i ωi=1

1

η

∑i

ωi logωi

P(Mi )︸ ︷︷ ︸rel.entropy

−∑i

ωi log P(y |Mi )︸ ︷︷ ︸expected loss

η =∞: maximum likelihood

η = 1: Bayes Rule- Soft max

Special case of Exponentiated Gradient update

67 / 85

Page 68: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Minimization of ω

Lagrangian:

L(ω) =1

η

∑i

ωi logωi

P(Mi )−∑i

ωi log P(y |Mi ) + λ((∑i

ω)− 1)

∂L(ω)

∂ωi=

1

η

(log

ωi

P(Mi )+ 1

)− log P(y |Mi ) + λ

Setting partials zero:

ω∗i = P(Mi ) exp(λ− 1 + η log P(y |Mi ))

Enforcing sum constraint:

ω∗i =P(Mi )P(y |Mi )

η∑j P(Mj)P(y |Mj)η

η = 1: Conventional Bayes rule68 / 85

Page 69: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Conventional Bayes again

inf∑i ω=1

1

η

∑i

ωi logωi︸ ︷︷ ︸neg.entropy

− 1

η

∑i

ωi log P(Mi )︸ ︷︷ ︸initial data

−∑i

ωi log P(y |Mi )

Prior and data treated the same when η = 1

Commutativity

69 / 85

Page 70: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Bayes Rule for density matrices

Parameter is density matrix W

Prior is density matrix D(M)

inftr(W)=1

1

ηtr(W(logW − logD(M))︸ ︷︷ ︸

Quantum rel. entr.

− tr(WlogD(y|M))︸ ︷︷ ︸Fancier mixture loss

η =∞: minimized when W is dyad uu> and u is the eigenvectorbelonging to a minimum eigenvalue of − logD(y|M)

η = 1: Generalized Bayes Rule- Soft maximum eigenvalue calculation

Special case of Matrix Exponentiated Gradient update

70 / 85

Page 71: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Generalized Bayes Rule again

inftr(W)=1

1

ηtr(W(logW))︸ ︷︷ ︸quantum entropy

− 1

ηtr(WlogD(M))︸ ︷︷ ︸

initial data

− tr(WlogD(y|M))

Von Neumann Entropy is just entropy of eigenvalues

Prior and all data treated the same when η = 1

Commutativity

71 / 85

Page 72: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

D(y|M)?

Where does data likelyhood matrix D(y|M) come from?

From a joint distribution on space (Y,M)

72 / 85

Page 73: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Joint distributions

Conventional joints:

Two sets of elementary events - A and B

Joint space A× B

Elementary events are pairs (ai , bj)

Joint distribution is a probability vector over pairs

Generalized joints:

Two real vector spaces: A and B of dimension nA and nB

Joint space: tensor product A⊗ B - real space of dimension nAnB

Elementary events are dyads of joint space

Joint distribution is a density matrix over joint space

73 / 85

Page 74: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Joint probability?

Given joint density matrix D(A,B)a dyad aa> from space Aa dyad bb> from space BWhat’s the joint probability of aa> and bb>?

D(a,b) =?Recall D(a) = tr(D(A) aa>).Thus D(a,b) = tr(D(A,B) ?)

Conventional: look up probability of jointly specified event (ai , bj) in jointtable

What is a jointly specified dyad?

We will use Kronecker product74 / 85

Page 75: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Kronecker Product

Kronecker product of n ×m matrix E and p × q matrix F is a np ×mqmatrix E⊗ F which in block form is given as:

E⊗ F =

e11F e12F . . . e1mFe21F e22F . . . e2mF. . . . . . . . . . . . . . . . . . . . . .en1F en2F . . . enmF

Properties:

(E⊗ F)(G⊗H) = EG⊗ FH

tr(E⊗ F) = tr(E)tr(F)

If D(A) and D(B) are density matrices,then so is D(A)⊗D(B)

(a⊗ b)(a⊗ b)> = aa> ⊗ bb>

is a dyad of space (A,B)

75 / 85

Page 76: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Joint probability

Use aa> ⊗ bb> as jointly specified dyad

Joint probability: D(a,b) = tr(D(A,B)(aa> ⊗ bb>))

Not every dyad on the joint space can be written as aa> ⊗ bb>!!!

This issue in quantum physics is known as entanglement

76 / 85

Page 77: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

More!

ConditionalsMarginalizationTheorem of total probability

Need additional Kronecker product propertiesPartial trace, etc

Goes beyond the scope of this talk

Many subtle quantum physics issues show up in the calculus

77 / 85

Page 78: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Sample calculus rules

D(A) = trB(D(A,B))D(A,b) = trB(D(A,B)(IA ⊗ bb>))Marginalization

D(A|B) = D(A,B)� (IA ⊗D(B))−1

Conditional in terms of the jointIntroduced by Cerf and Adami

D(A) = trB(D(A|B)� (IA ⊗D(B)))Theorem of total probability

D(M|y) = D(M)�D(y|M)tr(D(M)�D(y|M))

Our Bayes rule

D(b|A) = D(b)D(A|b)� (D(A|B)� (IA ⊗D(B)))−1

Another Bayes rule

78 / 85

Page 79: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Summary

We maintain uncertainty about direction of maximum variance with adensity matrix

Update generalizes conventional Bayes’s rule

Motivate the update based on a maxent principle

Probability calculus that retains conventional probabilities as a specialcase

79 / 85

Page 80: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Outlook

Calculus for other matrix classes

On-line update for PCA :-)

Applications that need the calculus

Connections to quantum computation

80 / 85

Page 81: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Open problem 1

You can implement the conventional Bayes Rule in the tube- in vitro selection with 1020 variable

Recall Generalized Bayes Rule?

D(M|y) =exp (logD(M) + logD(y|M))

tr (above matrix)

Is there a physical realization?

81 / 85

Page 82: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Open problem 1

You can implement the conventional Bayes Rule in the tube- in vitro selection with 1020 variable

Recall Generalized Bayes Rule?

D(M|y) =exp (logD(M) + logD(y|M))

tr (above matrix)

Is there a physical realization?

82 / 85

Page 83: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Open problem 1

You can implement the conventional Bayes Rule in the tube- in vitro selection with 1020 variable

Recall Generalized Bayes Rule?

D(M|y) =exp (logD(M) + logD(y|M))

tr (above matrix)

Is there a physical realization?

83 / 85

Page 84: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Open problem 2

The Generalized Bayes Rule retains the Conventional Bayes Rule asthe hardest case

Learning the eigensystem for free

Find other cases where there is a “free matrix lunch”

84 / 85

Page 85: A Bayesian Probability Calculus for Density Matricesmanfred/pubs/C76talk.pdf · 2013-11-03 · A Bayesian Probability Calculus for Density Matrices Manfred K. Warmuth and Dima Kuzmin

Open problem 2

The Generalized Bayes Rule retains the Conventional Bayes Rule asthe hardest case

Learning the eigensystem for free

Find other cases where there is a “free matrix lunch”

85 / 85