Adaptive annealing: a near-optimal connection between sampling and counting

Adaptive annealing: a near-optimal connection between sampling and counting. Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech)


TRANSCRIPT

Page 1: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Adaptive annealing: a near-optimal connection between

sampling and counting

Daniel Štefankovič (University of Rochester)

Santosh Vempala, Eric Vigoda (Georgia Tech)

Page 2: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Adaptive annealing: a near-optimal connection between

sampling and counting

Daniel Štefankovič (University of Rochester)

Santosh Vempala, Eric Vigoda (Georgia Tech)

If you want to count using MCMC then statistical physicsis useful.

Page 3: Adaptive annealing:  a near-optimal  connection between  sampling and counting

1. Counting problems

2. Basic tools: Chernoff, Chebyshev

3. Dealing with large quantities (the product method)

4. Statistical physics

5. Cooling schedules (our work)

6. More…

Outline

Page 4: Adaptive annealing:  a near-optimal  connection between  sampling and counting

independent sets

spanning trees

matchings

perfect matchings

k-colorings

Counting

Page 5: Adaptive annealing:  a near-optimal  connection between  sampling and counting

independent sets

spanning trees

matchings

perfect matchings

k-colorings

Counting

Page 6: Adaptive annealing:  a near-optimal  connection between  sampling and counting

spanning trees

Compute the number of

Page 7: Adaptive annealing:  a near-optimal  connection between  sampling and counting

spanning trees

Compute the number of

Kirchhoff’s Matrix Tree Theorem: # spanning trees = det of (D – A) with row v and column v deleted (any cofactor of the Laplacian).

Example (4-cycle):

A =
0 1 0 1
1 0 1 0
0 1 0 1
1 0 1 0

D =
2 0 0 0
0 2 0 0
0 0 2 0
0 0 0 2

det of the 3×3 minor of D – A:

det
 2 -1  0
-1  2 -1
 0 -1  2
= 4
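The Matrix Tree Theorem can be checked directly on the 4-cycle from the slide; a minimal sketch in Python (the adjacency-matrix encoding and helper name are mine, not from the talk), using exact arithmetic so the determinant is an integer:

```python
from fractions import Fraction

def spanning_trees(adj):
    """Count spanning trees: det of the Laplacian D - A with row 0 and
    column 0 removed, via Gaussian elimination over exact Fractions."""
    n = len(adj)
    L = [[Fraction((sum(adj[i]) if i == j else 0) - adj[i][j])
          for j in range(1, n)] for i in range(1, n)]
    det = Fraction(1)
    m = n - 1
    for col in range(m):
        pivot = next((r for r in range(col, m) if L[r][col] != 0), None)
        if pivot is None:
            return 0            # singular minor: graph is disconnected
        if pivot != col:
            L[col], L[pivot] = L[pivot], L[col]
            det = -det          # row swap flips the sign
        det *= L[col][col]
        for r in range(col + 1, m):
            factor = L[r][col] / L[col][col]
            for c in range(col, m):
                L[r][c] -= factor * L[col][c]
    return int(det)

c4 = [[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]]  # the slide's 4-cycle
print(spanning_trees(c4))  # 4
```

The complete graph K4 gives 16, matching Cayley's formula 4^(4-2).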

Page 8: Adaptive annealing:  a near-optimal  connection between  sampling and counting

spanning trees

Compute the number of

G → [polynomial-time algorithm] → number of spanning trees of G

Page 9: Adaptive annealing:  a near-optimal  connection between  sampling and counting

independent sets

spanning trees

matchings

perfect matchings

k-colorings

Counting ?

Page 10: Adaptive annealing:  a near-optimal  connection between  sampling and counting

independent sets

Compute the number of

(hard-core gas model)

independent set of a graph = subset S of vertices, no two in S are neighbors

Page 11: Adaptive annealing:  a near-optimal  connection between  sampling and counting

# independent sets = 7

independent set = subset S of vertices no two in S are neighbors

Page 12: Adaptive annealing:  a near-optimal  connection between  sampling and counting

# independent sets of the path graphs G1, G2, G3, ..., Gn-2, Gn-1, Gn = ?

Page 13: Adaptive annealing:  a near-optimal  connection between  sampling and counting

# independent sets of the path graphs G1, G2, G3, ..., Gn-2, Gn-1, Gn:

2, 3, 5, ..., Fn-1, Fn, Fn+1

(the Fibonacci numbers: each count is the sum of the previous two)
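The Fibonacci pattern on the path graphs can be verified by brute force; a small sketch (function names and the brute-force check are mine):

```python
from itertools import combinations

def count_is_path(n):
    """# independent sets of the n-vertex path, via the Fibonacci
    recurrence: vertex n is either excluded (count for n-1 vertices)
    or included (count for n-2 vertices)."""
    a, b = 1, 2  # counts for the 0-vertex and 1-vertex paths
    for _ in range(n - 1):
        a, b = b, a + b
    return b if n >= 1 else a

def count_is_brute(n):
    """Enumerate all vertex subsets of the n-vertex path directly."""
    edges = [(i, i + 1) for i in range(n - 1)]
    total = 0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            s = set(S)
            if all(u not in s or v not in s for u, v in edges):
                total += 1
    return total

print([count_is_path(n) for n in range(1, 8)])  # [2, 3, 5, 8, 13, 21, 34]
```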

Page 14: Adaptive annealing:  a near-optimal  connection between  sampling and counting

# independent sets = 5598861

independent set = subset S of vertices no two in S are neighbors

Page 15: Adaptive annealing:  a near-optimal  connection between  sampling and counting

independent sets

Compute the number of

G → [polynomial-time algorithm] → number of independent sets of G

?

Page 16: Adaptive annealing:  a near-optimal  connection between  sampling and counting

independent sets

Compute the number of

G → [polynomial-time algorithm] → number of independent sets of G

! (unlikely)

Page 17: Adaptive annealing:  a near-optimal  connection between  sampling and counting

graph G → # independent sets in G is #P-complete,

even for 3-regular graphs (Dyer, Greenhill, 1997)

[diagram: P ⊆ NP, FP ⊆ #P]

Page 18: Adaptive annealing:  a near-optimal  connection between  sampling and counting

graph G # independent sets in G

approximation

randomization

?

Page 19: Adaptive annealing:  a near-optimal  connection between  sampling and counting

graph G # independent sets in G

approximation

randomization

? which is more important?

Page 20: Adaptive annealing:  a near-optimal  connection between  sampling and counting

graph G # independent sets in G

approximation

randomization

? which is more important?

My world-view: (true) randomness is important conceptually but NOT computationally (i.e., I believe P = BPP).

Approximation makes problems easier (i.e., I believe #P = BPP).

Page 21: Adaptive annealing:  a near-optimal  connection between  sampling and counting

We would like to know Q

Goal: random variable Y such that

P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1-δ

“Y gives a (1±ε)-estimate”

Page 22: Adaptive annealing:  a near-optimal  connection between  sampling and counting

We would like to know Q

Goal: random variable Y such that

P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1-δ

FPRAS (fully polynomial randomized approximation scheme):

G, ε, δ → [polynomial-time algorithm] → Y

Page 23: Adaptive annealing:  a near-optimal  connection between  sampling and counting

1. Counting problems

2. Basic tools: Chernoff, Chebyshev

3. Dealing with large quantities (the product method)

4. Statistical physics

5. Cooling schedules (our work)

6. More...

Outline

Page 24: Adaptive annealing:  a near-optimal  connection between  sampling and counting

We would like to know Q.

1. Get an unbiased estimator X, i.e., E[X] = Q.

2. “Boost the quality” of X:

Y = (X1 + X2 + ... + Xn) / n

Page 25: Adaptive annealing:  a near-optimal  connection between  sampling and counting

The Bienaymé-Chebyshev inequality:

P( Y gives a (1±ε)-estimate ) ≥ 1 - (1/ε²) · V[Y]/E[Y]²

Page 26: Adaptive annealing:  a near-optimal  connection between  sampling and counting

The Bienaymé-Chebyshev inequality:

P( Y gives a (1±ε)-estimate ) ≥ 1 - (1/ε²) · V[Y]/E[Y]²

Y = (X1 + X2 + ... + Xn) / n

V[Y]/E[Y]² = (1/n) · V[X]/E[X]²   (V[X]/E[X]² = squared coefficient of variation, SCV)

Page 27: Adaptive annealing:  a near-optimal  connection between  sampling and counting

The Bienaymé-Chebyshev inequality:

Let X1,...,Xn,X be independent, identically distributed random variables, Q = E[X]. Let

Y = (X1 + X2 + ... + Xn) / n

Then

P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - (1/ε²) · V[X] / (n·E[X]²)

Page 28: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Chernoff’s bound:

Let X1,...,Xn,X be independent, identically distributed random variables, 0 ≤ X ≤ 1, Q = E[X]. Let

Y = (X1 + X2 + ... + Xn) / n

Then

P( Y gives a (1±ε)-estimate of Q ) ≥ 1 – e^(-ε²·n·E[X]/3)

Page 29: Adaptive annealing:  a near-optimal  connection between  sampling and counting
Page 30: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Number of samples to achieve precision ε with confidence δ:

Chebyshev:  n ≥ (1/ε²) · (V[X]/E[X]²) · (1/δ)

Chernoff (0 ≤ X ≤ 1):  n ≥ (1/ε²) · (1/E[X]) · 3 ln(1/δ)

Page 31: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Number of samples to achieve precision ε with confidence δ:

Chebyshev:  n ≥ (1/ε²) · (V[X]/E[X]²) · (1/δ)   (1/δ: BAD)

Chernoff (0 ≤ X ≤ 1):  n ≥ (1/ε²) · (1/E[X]) · 3 ln(1/δ)   (ln(1/δ): GOOD, 1/E[X]: BAD)

Page 32: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Median “boosting trick”

Y = (X1 + X2 + ... + Xn) / n

BY BIENAYMÉ-CHEBYSHEV: with n ≥ (1/ε²) · (1/E[X]) · 4,

P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 3/4

Page 33: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Median trick – repeat 2T times

BY BIENAYMÉ-CHEBYSHEV: each Y lands in [(1-ε)Q, (1+ε)Q] with probability ≥ 3/4.

BY CHERNOFF: P( > T out of 2T land in [(1-ε)Q, (1+ε)Q] ) ≥ 1 - e^(-T/4)

P( the median is in [(1-ε)Q, (1+ε)Q] ) ≥ 1 - e^(-T/4)
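The boosting recipe of the last two slides can be simulated; a toy sketch (the Bernoulli sampler and all constants are my choices, not the talk's), averaging n ≥ 4/(ε²·E[X]) samples per batch and then taking a median of batch means:

```python
import random

def median_of_means(sampler, n, batches):
    """Average n samples per batch (Chebyshev: each batch mean is a
    (1 +/- eps)-estimate with prob >= 3/4), then take the median of the
    batch means (Chernoff: failure prob drops exponentially in batches)."""
    means = sorted(sum(sampler() for _ in range(n)) / n
                   for _ in range(batches))
    return means[batches // 2]

random.seed(0)
q = 0.5                                  # Q = E[X] for a Bernoulli sampler
eps = 0.1
n = int(4 / (eps ** 2 * q)) + 1          # slide bound: n >= 4/(eps^2 E[X])
est = median_of_means(lambda: random.random() < q, n, batches=11)
print(est)                               # should land in (1 +/- eps) * q
```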

Page 34: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Chebyshev + median trick:  n ≥ (1/ε²) · (V[X]/E[X]²) · 32 ln(1/δ)

Chernoff (0 ≤ X ≤ 1):  n ≥ (1/ε²) · (1/E[X]) · 3 ln(1/δ)   (1/E[X]: BAD)

Page 35: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Creating an “approximator” from X:

n ≥ (1/ε²) · (V[X]/E[X]²) · ln(1/δ) samples  (ε = precision, δ = confidence)

Page 36: Adaptive annealing:  a near-optimal  connection between  sampling and counting

1. Counting problems

2. Basic tools: Chernoff, Chebyshev

3. Dealing with large quantities (the product method)

4. Statistical physics

5. Cooling schedules (our work)

6. More...

Outline

Page 37: Adaptive annealing:  a near-optimal  connection between  sampling and counting

(approx) counting ⇐ sampling: Valleau, Card’72 (physical chemistry), Babai’79 (for matchings and colorings), Jerrum, Valiant, V.Vazirani’86

the outcome of the JVV reduction: random variables X1, X2, ..., Xt such that

1) E[X1 X2 ... Xt] = “WANTED”

2) V[Xi]/E[Xi]² = O(1)   (squared coefficient of variation, SCV)

and the Xi are easy to estimate.

Page 38: Adaptive annealing:  a near-optimal  connection between  sampling and counting

(approx) counting ⇐ sampling

1) E[X1 X2 ... Xt] = “WANTED”

2) V[Xi]/E[Xi]² = O(1), and the Xi are easy to estimate

Theorem (Dyer-Frieze’91): O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε) estimator of “WANTED” with probability ≥ 3/4.

Page 39: Adaptive annealing:  a near-optimal  connection between  sampling and counting

JVV for independent sets

GOAL: given a graph G, estimate the number of independent sets of G.

P( S = ∅ ) = 1 / (# independent sets)   (S a uniformly random independent set)

Page 40: Adaptive annealing:  a near-optimal  connection between  sampling and counting

JVV for independent sets

P( S = ∅ ) = P(A1) · P(A2 | A1) · P(A3 | A1∩A2) · P(A4 | A1∩A2∩A3) = X1 X2 X3 X4

using P(A∩B) = P(A)·P(B|A), with Ai the event “vertex i is not in S”.

Xi ∈ [0,1] and E[Xi] ≥ ½  ⇒  V[Xi]/E[Xi]² = O(1)


Page 42: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Self-reducibility for independent sets

P( v ∉ S ) = 5/7   (S a uniformly random independent set)

Page 43: Adaptive annealing:  a near-optimal  connection between  sampling and counting

P( v ∉ S ) = 5/7

Self-reducibility for independent sets

Page 44: Adaptive annealing:  a near-optimal  connection between  sampling and counting

P( v ∉ S ) = 5/7 = (# independent sets of G - v) / (# independent sets of G)

Self-reducibility for independent sets

Page 45: Adaptive annealing:  a near-optimal  connection between  sampling and counting

P( v' ∉ S' ) = 3/5   (now for a uniformly random independent set of G - v)

Self-reducibility for independent sets

Page 46: Adaptive annealing:  a near-optimal  connection between  sampling and counting

P( v' ∉ S' ) = 3/5 = (# independent sets of G - v - v') / (# independent sets of G - v)

Self-reducibility for independent sets

Page 47: Adaptive annealing:  a near-optimal  connection between  sampling and counting

(3/5) · (5/7) = 3/7

(2/3) · (3/5) · (5/7) = 2/7  ⇒  # independent sets = 7

Self-reducibility for independent sets
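The telescoping above can be run exactly on a small example; a sketch (the 5-vertex path, vertex ordering, and helper names are mine): with Gi the subgraph induced on the first i vertices, a uniformly random independent set S of Gi avoids vertex i with probability |IS(Gi-1)| / |IS(Gi)|, so the product of the reciprocals recovers |IS(G)|.

```python
from itertools import combinations

def independent_sets(n, edges):
    """All independent sets of the graph on vertices 0..n-1."""
    sets = []
    for r in range(n + 1):
        for S in combinations(range(n), r):
            s = set(S)
            if all(u not in s or v not in s for u, v in edges):
                sets.append(s)
    return sets

n = 5
edges = [(i, i + 1) for i in range(n - 1)]   # 5-vertex path
count = 1.0                                  # |IS(G0)| = 1 (empty graph)
for i in range(1, n + 1):
    sub = [(u, v) for u, v in edges if u < i and v < i]
    sets_i = independent_sets(i, sub)
    # exact probability that vertex i-1 is avoided by a random IS of Gi
    p_avoid = sum(1 for s in sets_i if (i - 1) not in s) / len(sets_i)
    count /= p_avoid                         # telescoping product
print(round(count))  # 13, the Fibonacci count for the 5-vertex path
```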

Page 48: Adaptive annealing:  a near-optimal  connection between  sampling and counting

JVV: If we have a sampler oracle:

graph G → [SAMPLER ORACLE] → random independent set of G

then FPRAS using O(n²) samples.

Page 49: Adaptive annealing:  a near-optimal  connection between  sampling and counting

JVV: If we have a sampler oracle:

graph G → [SAMPLER ORACLE] → random independent set of G

then FPRAS using O(n²) samples.

ŠVV: If we have a sampler oracle:

graph G → [SAMPLER ORACLE] → set from gas-model Gibbs distribution at β

then FPRAS using O*(n) samples.

Page 50: Adaptive annealing:  a near-optimal  connection between  sampling and counting

O*( |V| ) samples suffice for counting

Application – independent sets

Cost per sample (Vigoda’01, Dyer-Greenhill’01): time = O*( |V| ) for graphs of degree ≤ 4.

Total running time: O* ( |V|2 ).

Page 51: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Other applications (total running time):

matchings O*(n²m) (using Jerrum, Sinclair’89)

spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri’95)

k-colorings O*(n²) for k > 2Δ (using Jerrum’95)

Page 52: Adaptive annealing:  a near-optimal  connection between  sampling and counting

1. Counting problems

2. Basic tools: Chernoff, Chebyshev

3. Dealing with large quantities (the product method)

4. Statistical physics

5. Cooling schedules (our work)

6. More…

Outline

Page 53: Adaptive annealing:  a near-optimal  connection between  sampling and counting

easy = hot

hard = cold

Page 54: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Hamiltonian

[figure: configurations arranged at energy levels H = 4, 2, 1, 0]

Page 55: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Hamiltonian H : Ω → {0,...,n}

Big set = Ω

Goal: estimate |H⁻¹(0)|

|H⁻¹(0)| = E[X1] · ... · E[Xt]

Page 56: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Distributions between hot and cold (Gibbs distributions):

μβ(x) ∝ exp(-β·H(x)),   β = inverse temperature

β = 0: hot, uniform on Ω;   β = ∞: cold, uniform on H⁻¹(0)

Page 57: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Distributions between hot and cold:

μβ(x) = exp(-β·H(x)) / Z(β)

Normalizing factor = partition function:   Z(β) = Σx exp(-β·H(x))

Page 58: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Partition function

Z(β) = Σx exp(-β·H(x))

have: Z(0) = |Ω|;   want: Z(∞) = |H⁻¹(0)|

Page 59: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Partition function - example

Z(β) = Σx exp(-β·H(x));  have: Z(0) = |Ω|, want: Z(∞) = |H⁻¹(0)|

[figure: 16 configurations at energy levels 4, 2, 1, 0]

Z(β) = 1·e^(-4β) + 4·e^(-2β) + 4·e^(-1β) + 7·e^(-0β)

Z(0) = 16,   Z(∞) = 7
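The example above is small enough to evaluate directly; a minimal sketch (the dictionary encoding of the level sizes a_k is mine):

```python
import math

# The slide's toy Hamiltonian: a_k = |H^-1(k)| states at energy k,
# i.e. one state at energy 4, four at 2, four at 1, seven at 0.
a = {4: 1, 2: 4, 1: 4, 0: 7}

def Z(beta):
    """Partition function Z(beta) = sum_k a_k * exp(-beta*k)."""
    return sum(ak * math.exp(-beta * k) for k, ak in a.items())

print(Z(0))    # 16 = |Omega|
print(Z(50))   # ~7 = |H^-1(0)|, the beta -> infinity limit
```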

Page 60: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Assumption: we have a sampler oracle for μβ(x) = exp(-β·H(x)) / Z(β):

graph G → [SAMPLER ORACLE] → subset of V from μβ

Page 61: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Assumption: we have a sampler oracle for μβ(x) = exp(-β·H(x)) / Z(β)

W ~ μβ

Page 62: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Assumption: we have a sampler oracle for μβ(x) = exp(-β·H(x)) / Z(β)

W ~ μβ,   X = exp(H(W)·(β - β'))

Page 63: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Assumption: we have a sampler oracle for μβ(x) = exp(-β·H(x)) / Z(β)

W ~ μβ,   X = exp(H(W)·(β - β'))

E[X] = Σs μβ(s)·X(s) = Z(β') / Z(β)

⇒ we can obtain the following ratio: Z(β') / Z(β)
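The identity E[X] = Z(β')/Z(β) can be checked exactly on the toy example, since the expectation is a finite sum over energy levels (the encoding of a_k is mine):

```python
import math

# Toy Hamiltonian from the slides: a_k = |H^-1(k)|.
a = {4: 1, 2: 4, 1: 4, 0: 7}

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in a.items())

def expected_X(beta, beta2):
    """E[exp(H(W)*(beta - beta2))] for W ~ Gibbs at beta, computed as an
    exact sum over energy levels k with P(H(W)=k) = a_k e^{-beta k}/Z(beta)."""
    z = Z(beta)
    return sum((ak * math.exp(-beta * k) / z) * math.exp(k * (beta - beta2))
               for k, ak in a.items())

beta, beta2 = 0.3, 0.7
print(expected_X(beta, beta2), Z(beta2) / Z(beta))  # the two agree
```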

Page 64: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Our goal restated

Partition function:  Z(β) = Σx exp(-β·H(x))

Goal: estimate Z(∞) = |H⁻¹(0)|:

Z(∞) = (Z(β1)/Z(β0)) · (Z(β2)/Z(β1)) · ... · (Z(βt)/Z(βt-1)) · Z(β0)

where β0 = 0 < β1 < β2 < ... < βt = ∞
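The telescoping product can be run end to end on the toy example; a sketch (the schedule, sample counts, and exact level-sampler are my choices, not the talk's):

```python
import math, random

# Toy example from the slides: a_4=1, a_2=4, a_1=4, a_0=7,
# so Z(0)=16 and Z(inf)=7.
a = {4: 1, 2: 4, 1: 4, 0: 7}

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in a.items())

def sample_energy(beta):
    """Exact sample of H(W) for W ~ Gibbs at beta (tiny state space)."""
    r = random.random() * Z(beta)
    acc = 0.0
    for k, ak in a.items():
        acc += ak * math.exp(-beta * k)
        if r <= acc:
            return k
    return k

random.seed(1)
betas = [i / 10 for i in range(51)]          # crude schedule 0.0 .. 5.0
est = Z(0)                                   # start from the known Z(0)
for b0, b1 in zip(betas, betas[1:]):
    xs = [math.exp(sample_energy(b0) * (b0 - b1)) for _ in range(1000)]
    est *= sum(xs) / len(xs)                 # estimates Z(b1)/Z(b0)
print(est)                                   # estimate of Z(5.0) (~7.03)
```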

Page 65: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Our goal restated

Z(∞) = (Z(β1)/Z(β0)) · (Z(β2)/Z(β1)) · ... · (Z(βt)/Z(βt-1)) · Z(β0)

Cooling schedule:  β0 = 0 < β1 < β2 < ... < βt = ∞

How to choose the cooling schedule? Minimize length, while satisfying

E[Xi] = Z(βi)/Z(βi-1),   V[Xi]/E[Xi]² = O(1)


Page 67: Adaptive annealing:  a near-optimal  connection between  sampling and counting

1. Counting problems

2. Basic tools: Chernoff, Chebyshev

3. Dealing with large quantities (the product method)

4. Statistical physics

5. Cooling schedules (our work)

6. More...

Outline

Page 68: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Parameters: A and n

Z(0) = A,   H: Ω → {0,...,n}

Z(β) = Σx exp(-β·H(x)) = Σ(k=0..n) ak·e^(-βk),  where ak = |H⁻¹(k)|

Page 69: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Parameters

Z(0) = A,   H: Ω → {0,...,n}

                    A       n
independent sets    2^|V|   |E|
matchings           |V|!    |V|
perfect matchings   |V|!    |V|
k-colorings         k^|V|   |E|

Page 70: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Parameters

Z(0) = A,   H: Ω → {0,...,n}

                    A       n
independent sets    2^|V|   |E|
matchings           |V|!    |V|
perfect matchings   |V|!    |V|
k-colorings         k^|V|   |E|

matchings = # ways of marrying them so that no unhappy couple



Page 73: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Parameters

Z(0) = A,   H: Ω → {0,...,n}

                    A       n
independent sets    2^|V|   |E|
matchings           |V|!    |V|
perfect matchings   |V|!    |V|
k-colorings         k^|V|   |E|

perfect matchings: marry ignoring “compatibility”; hamiltonian = number of unhappy couples


Page 75: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Previous cooling schedules

Z(0) = A,   H: Ω → {0,...,n}

Cooling schedule:  β0 = 0 < β1 < β2 < ... < βt = ∞

“Safe steps” (Bezáková, Štefankovič, Vigoda, V.Vazirani’06):
β → β + 1/n
β → β·(1 + 1/ln A)
β → ∞  (once β ≥ ln A)

Cooling schedules of length O( n ln A ) and O( (ln n)(ln A) ) (Bezáková, Štefankovič, Vigoda, V.Vazirani’06)


Page 77: Adaptive annealing:  a near-optimal  connection between  sampling and counting

“Safe steps” (Bezáková, Štefankovič, Vigoda, V.Vazirani’06):
β → β + 1/n
β → β·(1 + 1/ln A)
β → ∞  (once β ≥ ln A)

Z(β) = Σ(k=0..n) ak·e^(-βk);   W ~ μβ,  X = exp(H(W)·(β - β'))

For the step β → β + 1/n:  1/e ≤ X ≤ 1, hence E[X] ≥ 1/e and V[X]/E[X]² ≤ e

Page 78: Adaptive annealing:  a near-optimal  connection between  sampling and counting

“Safe steps” (continued):

For the step β → ∞ (from β ≥ ln A):  Z(∞) = a0 ≥ 1,  Z(ln A) ≤ a0 + 1,  so E[X] ≥ 1/2

Page 79: Adaptive annealing:  a near-optimal  connection between  sampling and counting

“Safe steps” (continued):

For the step β → β·(1 + 1/ln A):  E[X] ≥ 1/(2e)

Page 80: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Previous cooling schedules

“Safe steps”: β → β + 1/n;  β → β·(1 + 1/ln A);  β → ∞ once β ≥ ln A  (Bezáková, Štefankovič, Vigoda, V.Vazirani’06)

Cooling schedules of length O( n ln A ) and O( (ln n)(ln A) )

e.g.: 1/n, 2/n, 3/n, ..., (ln A)/n, ..., ln A
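The multiplicative safe-step schedule can be generated directly; a sketch (the function and the specific n, A are mine), showing the O((ln n)(ln A)) length against the n·ln A additive alternative:

```python
import math

def nonadaptive_schedule(n, A):
    """Safe-step schedule in the style of the slides: start at 1/n,
    multiply by (1 + 1/ln A) until beta exceeds ln A, then a final
    jump to beta = infinity.  Length is O((ln n)(ln A))."""
    lnA = math.log(A)
    betas = [0.0, 1.0 / n]
    while betas[-1] < lnA:
        betas.append(betas[-1] * (1 + 1.0 / lnA))
    betas.append(math.inf)
    return betas

s = nonadaptive_schedule(n=100, A=2 ** 100)
# additive schedule 1/n, 2/n, ..., ln A would have ~ n*ln A ~ 6900 steps
print(len(s))
```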

Page 81: Adaptive annealing:  a near-optimal  connection between  sampling and counting

No better fixed schedule possible

Z(0) = A,   H: Ω → {0,...,n}

THEOREM: Let Za(β) = (A/(1+a)) · (1 + a·e^(-βn)), with a ∈ [0, A-1].

A schedule that works for all Za has LENGTH Ω( (ln n)(ln A) ).

Page 82: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Parameters

Z(0) = A,   H: Ω → {0,...,n}

Previously: non-adaptive schedules of length Ω*( ln A )

Our main result: can get an adaptive schedule of length O*( (ln A)^(1/2) )

Page 83: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Related work

can get adaptive schedule of length O*( (ln A)^(1/2) )

Lovász-Vempala: volume of convex bodies in O*(n⁴), with a schedule of length O(n^(1/2)) (non-adaptive cooling schedule, using specific properties of the “volume” partition functions)

Page 84: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Existential part

for every partition function there exists a cooling schedule of length O*((ln A)1/2)

Lemma:

can get adaptive scheduleof length O* ( (ln A)1/2 )

there exists

Page 85: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Cooling schedule (definition refresh)

Z(∞) = (Z(β1)/Z(β0)) · (Z(β2)/Z(β1)) · ... · (Z(βt)/Z(βt-1)) · Z(β0)

Cooling schedule:  β0 = 0 < β1 < β2 < ... < βt = ∞

How to choose the cooling schedule? Minimize length, while satisfying

E[Xi] = Z(βi)/Z(βi-1),   V[Xi]/E[Xi]² = O(1)

Page 86: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Express the SCV using the partition function (going from β to β'):

W ~ μβ,   X = exp(H(W)·(β - β'))

E[X] = Z(β') / Z(β)

E[X²]/E[X]² = Z(2β'-β)·Z(β) / Z(β')² =: C

V[X]/E[X]² + 1 = E[X²]/E[X]²

Page 87: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Proof: let f(β) = ln Z(β).

E[X²]/E[X]² = Z(2β'-β)·Z(β) / Z(β')² ≤ C

⇔  (f(2β'-β) + f(β))/2 ≤ (ln C)/2 + f(β')

[graph of f: the midpoint of the chord over [β, 2β'-β] lies at most C’ = (ln C)/2 above f(β')]

Page 88: Adaptive annealing:  a near-optimal  connection between  sampling and counting

f(β) = ln Z(β);  f is decreasing;  f is convex;  f’(0) ≥ –n;  f(0) ≤ ln A

Properties of partition functions

Page 89: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Properties of partition functions:

f(β) = ln Z(β);  f is decreasing;  f is convex;  f’(0) ≥ –n;  f(0) ≤ ln A

f(β) = ln Σ(k=0..n) ak·e^(-βk)

f’(β) = - Σ(k=0..n) ak·k·e^(-βk) / Σ(k=0..n) ak·e^(-βk)

(using (ln Z)’ = Z’/Z)

Page 90: Adaptive annealing:  a near-optimal  connection between  sampling and counting

f(β) = ln Z(β);  f is decreasing, f is convex, f’(0) ≥ –n, f(0) ≤ ln A

GOAL: proving Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^(1/2))

Proof idea: either f or f’ changes a lot. Let K := the drop in f over a step. Then the drop in ln |f’| is ≥ 1/K.

Page 91: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Proof: Let K := the drop in f. Then the drop in ln |f’| is ≥ 1/K.

c := (a+b)/2,  Δ := b-a;  have f(c) = (f(a)+f(b))/2 – 1

f is convex:  (f(a) – f(c)) / |f’(a)| ≤ Δ/2 ≤ (f(c) – f(b)) / |f’(b)|

[picture: a, c, b]

Page 92: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Let K := the drop in f. Then the drop in ln |f’| is ≥ 1/K.

c := (a+b)/2,  Δ := b-a;  have f(c) = (f(a)+f(b))/2 – 1

f is convex:  (f(a) – f(c)) / |f’(a)| ≤ (f(c) – f(b)) / |f’(b)|

⇒  |f’(b)| ≤ |f’(a)| · (1 – 1/K) ≤ |f’(a)| · e^(-1/K)

Page 93: Adaptive annealing:  a near-optimal  connection between  sampling and counting

f: [a,b] → R, convex, decreasing, can be “approximated” using

( ln( f’(a)/f’(b) ) · (f(a) - f(b)) )^(1/2)

segments

Page 94: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Proof:

Technicality: getting to 2β'-β   [figure: β, β', 2β'-β]

Page 95: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Proof:

Technicality: getting to 2β'-β   [figure: βi, βi+1]

Page 96: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Proof:

Technicality: getting to 2β'-β   [figure: βi, βi+1, βi+2]

Page 97: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Proof:

Technicality: getting to 2β'-β   [figure: βi, βi+1, βi+2, βi+3]

≤ ln ln A extra steps

Page 98: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Existential → Algorithmic

there exists an adaptive schedule of length O*( (ln A)^(1/2) )

→ can get an adaptive schedule of length O*( (ln A)^(1/2) )

Page 99: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Algorithmic construction

Our main result: using a sampler oracle for μβ(x) = exp(-β·H(x))/Z(β),

we can construct a cooling schedule of length ≤ 38·(ln A)^(1/2)·(ln ln A)·(ln n)

Total number of oracle calls: ≤ 10⁷ (ln A) (ln ln A + ln n)⁷ ln(1/δ)

Page 100: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Algorithmic construction

current inverse temperature β; ideally move to β' such that

B1 ≤ E[X²]/E[X]² ≤ B2,  where E[X] = Z(β')/Z(β)

Page 101: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Algorithmic construction

current inverse temperature β; ideally move to β' such that

B1 ≤ E[X²]/E[X]² ≤ B2,  where E[X] = Z(β')/Z(β)

(≤ B2: X is “easy to estimate”)

Page 102: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Algorithmic construction

current inverse temperature β; ideally move to β' such that

B1 ≤ E[X²]/E[X]² ≤ B2,  where E[X] = Z(β')/Z(β)

(≥ B1: we make progress, where B1 > 1)

Page 103: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Algorithmic construction

current inverse temperature β; ideally move to β' such that

B1 ≤ E[X²]/E[X]² ≤ B2,  where E[X] = Z(β')/Z(β)

need to construct a “feeler” for this

Page 104: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Algorithmic construction

current inverse temperature β; ideally move to β' such that B1 ≤ E[X²]/E[X]² ≤ B2, where E[X] = Z(β')/Z(β)

need to construct a “feeler” for

E[X²]/E[X]² = (Z(β)/Z(β')) · (Z(2β'-β)/Z(β'))

Page 105: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Algorithmic construction

current inverse temperature β; ideally move to β' such that B1 ≤ E[X²]/E[X]² ≤ B2, where E[X] = Z(β')/Z(β)

need to construct a “feeler” for

E[X²]/E[X]² = (Z(β)/Z(β')) · (Z(2β'-β)/Z(β'))   (a bad “feeler”)

Page 106: Adaptive annealing:  a near-optimal  connection between  sampling and counting

estimator for Z(β')/Z(β)

Z(β) = Σ(k=0..n) ak·e^(-βk)

For W ~ μβ we have P(H(W)=k) = ak·e^(-βk) / Z(β)

Page 107: Adaptive annealing:  a near-optimal  connection between  sampling and counting

estimator for Z(β')/Z(β)

Z(β) = Σ(k=0..n) ak·e^(-βk)

For W ~ μβ we have P(H(W)=k) = ak·e^(-βk) / Z(β)

For U ~ μβ' we have P(H(U)=k) = ak·e^(-β'k) / Z(β')

If H(X)=k is likely at both temperatures → estimator


Page 109: Adaptive annealing:  a near-optimal  connection between  sampling and counting

estimator for Z(β')/Z(β)

For W ~ μβ we have P(H(W)=k) = ak·e^(-βk) / Z(β)

For U ~ μβ' we have P(H(U)=k) = ak·e^(-β'k) / Z(β')

(P(H(U)=k) / P(H(W)=k)) · e^(k(β'-β)) = Z(β)/Z(β')

Page 110: Adaptive annealing:  a near-optimal  connection between  sampling and counting

estimator for Z(β')/Z(β)

For W ~ μβ we have P(H(W)=k) = ak·e^(-βk) / Z(β)

For U ~ μβ' we have P(H(U)=k) = ak·e^(-β'k) / Z(β')

(P(H(U)=k) / P(H(W)=k)) · e^(k(β'-β)) = Z(β)/Z(β')

PROBLEM: P(H(W)=k) can be too small

Page 111: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Rough estimator for Z(β')/Z(β): use an interval instead of a single value

Z(β) = Σ(k=0..n) ak·e^(-βk)

For W ~ μβ we have P(H(W)∈[c,d]) = Σ(k=c..d) ak·e^(-βk) / Z(β)

For U ~ μβ' we have P(H(U)∈[c,d]) = Σ(k=c..d) ak·e^(-β'k) / Z(β')

Page 112: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Rough estimator for Z(β')/Z(β)

If |β'-β|·|d-c| ≤ 1 then

e^(c(β'-β)) · (P(H(U)∈[c,d]) / P(H(W)∈[c,d])) estimates Z(β)/Z(β') within a factor of e:

(Σ(k=c..d) ak·e^(-β'k)) · e^(c(β'-β)) = Σ(k=c..d) ak·e^(-β'(k-c))·e^(-βc), and each term is within a factor e of ak·e^(-β(k-c))·e^(-βc).

We also need P(H(U)∈[c,d]) and P(H(W)∈[c,d]) to be large.

Page 113: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Split {0,1,...,n} into h ≤ 4·(ln n)·(ln A) intervals [0], [1], [2], ..., [c, c(1+1/ln A)], ...

We will show: for any inverse temperature β there exists an interval I with P(H(W)∈I) ≥ 1/(8h).

We say that I is HEAVY for β.


Page 115: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Algorithm: repeat

find an interval I which is heavy for the current inverse temperature β

see how far I stays heavy (until some β*)

use the interval I for the feeler: Z(β')/Z(β) and Z(2β'-β)/Z(β')

ANALYSIS: in each iteration we
* either make progress,
* or eliminate the interval I,
* or make a “long move”

Page 116: Adaptive annealing:  a near-optimal  connection between  sampling and counting

[figure: distribution of H(X) for X ~ μβ; I = a heavy interval at β]

Page 117: Adaptive annealing:  a near-optimal  connection between  sampling and counting

[figure: distribution of H(X) at a larger inverse temperature; I = a heavy interval at β, but no longer heavy here!]

Page 118: Adaptive annealing:  a near-optimal  connection between  sampling and counting

[figure: distribution of H(X) for X ~ μβ’; I = a heavy interval at β, still heavy at β’, not heavy beyond]

Page 119: Adaptive annealing:  a near-optimal  connection between  sampling and counting

[figure: I is heavy on a range of inverse temperatures, NOT heavy beyond β*]

use binary search to find β* (up to additive precision 1/(2n): find β** with β* ≤ β** ≤ β* + 1/(2n))

I = [a,b],   δ = min{ 1/(b-a), ln A }

Page 120: Adaptive annealing:  a near-optimal  connection between  sampling and counting

[figure: I is heavy on a range of inverse temperatures, NOT heavy beyond β*; binary search finds β* ≤ β** ≤ β* + 1/(2n);  I = [a,b],  δ = min{ 1/(b-a), ln A }]

How do you know that you can use binary search?

Page 121: Adaptive annealing:  a near-optimal  connection between  sampling and counting

How do you know that you can use binary search?

I is h-heavy at β  ⇔  P(H(X)∈I) ≥ 1/(8h) for X ~ μβ  ⇔  Σ(k∈I) ak·e^(-βk) ≥ (1/(8h)) · Σ(k=0..n) ak·e^(-βk)

Lemma: the set of inverse temperatures for which I is h-heavy is an interval.

Page 122: Adaptive annealing:  a near-optimal  connection between  sampling and counting

How do you know that you can use binary search?

Σ(k∈I) ak·e^(-βk) ≥ (1/(8h)) · Σ(k=0..n) ak·e^(-βk)

Substitute x = e^(-β); the difference of the two sides is a polynomial c0·x⁰ + c1·x¹ + c2·x² + ... + cn·xⁿ.

Descartes’ rule of signs: number of positive roots ≤ number of sign changes (e.g. the pattern + - + + has 2 sign changes).

Page 123: Adaptive annealing:  a near-optimal  connection between  sampling and counting

How do you know that you can use binary search?

Σ(k∈I) ak·e^(-βk) ≥ (1/(8h)) · Σ(k=0..n) ak·e^(-βk)

Descartes’ rule of signs (x = e^(-β)): number of positive roots ≤ number of sign changes.

Examples: -1 + x + x² + x³ + ... + xⁿ has one sign change; 1 + x + x² ≥ 0 has none.

Page 124: Adaptive annealing:  a near-optimal  connection between  sampling and counting

How do you know that you can use binary search?

Σ(k∈I) ak·e^(-βk) ≥ (1/(8h)) · Σ(k=0..n) ak·e^(-βk)

The coefficient sequence c0·x⁰ + c1·x¹ + ... + cn·xⁿ (x = e^(-β)) has one sign change, so by Descartes’ rule of signs there is at most one positive root: the set of β where I is h-heavy is an interval.

Page 125: Adaptive annealing:  a near-optimal  connection between  sampling and counting

[figure: I is heavy on [β, β*], NOT heavy beyond β* + 1/(2n)]

I = [a,b]: can roughly compute the ratio Z(β)/Z(β') for β' ∈ [β, β*] if |β-β'|·|b-a| ≤ 1

Page 126: Adaptive annealing:  a near-optimal  connection between  sampling and counting

[figure: I is heavy on [β, β*], NOT heavy beyond β* + 1/(2n)]

I = [a,b]: can roughly compute the ratio Z(β)/Z(β') for β' ∈ [β, β*] if |β-β'|·|b-a| ≤ 1

find the largest β' such that Z(β)·Z(2β'-β) / Z(β')² ≤ C:

1. success (make progress)

2. eliminate interval

3. long move

Page 127: Adaptive annealing:  a near-optimal  connection between  sampling and counting
Page 128: Adaptive annealing:  a near-optimal  connection between  sampling and counting

if we have sampler oracles for μβ, then we can get an adaptive schedule of length t = O*( (ln A)^(1/2) )

independent sets O*(n²) (using Vigoda’01, Dyer-Greenhill’01)

matchings O*(n²m) (using Jerrum, Sinclair’89)

spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri’95)

k-colorings O*(n²) for k > 2Δ (using Jerrum’95)

Page 129: Adaptive annealing:  a near-optimal  connection between  sampling and counting

1. Counting problems

2. Basic tools: Chernoff, Chebyshev

3. Dealing with large quantities (the product method)

4. Statistical physics

5. Cooling schedules (our work)

6. More...

Outline

Page 130: Adaptive annealing:  a near-optimal  connection between  sampling and counting

6. More… a) proof of Dyer-Frieze

b) independent sets revisited

c) warm starts

Outline

Page 131: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Appendix – proof of:

Theorem (Dyer-Frieze’91): given X1, ..., Xt with

1) E[X1 X2 ... Xt] = “WANTED”

2) V[Xi]/E[Xi]² = O(1), and the Xi easy to estimate,

O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε) estimator of “WANTED” with probability ≥ 3/4.

Page 132: Adaptive annealing:  a near-optimal  connection between  sampling and counting

How precise do the Xi have to be? First attempt – term by term:

Main idea:  (1 ± ε/t)·(1 ± ε/t)·...·(1 ± ε/t) ≈ 1 ± ε

so estimate each term to precision ε/t, using n ≥ (1/(ε/t)²) · (V[X]/E[X]²) · ln(1/δ) samples:

each term Θ(t²/ε²) samples ⇒ Θ(t³/ε²) total

Page 133: Adaptive annealing:  a near-optimal  connection between  sampling and counting

How precise do the Xi have to be? Analyzing the SCV is better (Dyer-Frieze’1991):

P( X gives a (1±ε)-estimate ) ≥ 1 - (1/ε²) · V[X]/E[X]²   (V[X]/E[X]² = squared coefficient of variation, SCV)

GOAL: SCV(X) ≤ ε²/4 for X = X1 X2 ... Xt

Page 134: Adaptive annealing:  a near-optimal  connection between  sampling and counting

How precise do the Xi have to be? Analyzing the SCV is better (Dyer-Frieze’1991)

SCV(X) = V[X]/E[X]² = E[X²]/E[X]² - 1

Main idea:

SCV(X) = (1+SCV(X1)) · ... · (1+SCV(Xt)) - 1

SCV(Xi) = O(ε²/t)  ⇒  SCV(X) < ε²/4

Page 135: Adaptive annealing:  a near-optimal  connection between  sampling and counting

How precise do the Xi have to be?

Analyzing SCV is better (Dyer-Frieze’1991)

SCV(X) = V[X]/E[X]² = E[X²]/E[X]² − 1

Main idea: SCV(Xi) ≤ ε²/(5t) for every i ⇒ SCV(X) < ε²/4

proof: SCV(X) = (1+SCV(X1)) ... (1+SCV(Xt)) − 1

X1, X2 independent ⇒ E[X1 X2] = E[X1] E[X2]

X1, X2 independent ⇒ X1², X2² independent

X1, X2 independent ⇒ SCV(X1 X2) = (1+SCV(X1))(1+SCV(X2)) − 1
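The product rule in the proof can be checked exactly on two small hand-picked discrete distributions (the values and probabilities below are arbitrary):

```python
from itertools import product

def moments(dist):
    # dist: list of (value, prob) pairs; returns (E[X], E[X^2])
    m1 = sum(v * p for v, p in dist)
    m2 = sum(v * v * p for v, p in dist)
    return m1, m2

def scv(dist):
    m1, m2 = moments(dist)
    return m2 / m1**2 - 1          # SCV = E[X^2]/E[X]^2 - 1

X1 = [(1.0, 0.5), (3.0, 0.5)]
X2 = [(2.0, 0.25), (4.0, 0.75)]

# exact distribution of the product of independent X1, X2
prod_dist = [(v1 * v2, p1 * p2) for (v1, p1), (v2, p2) in product(X1, X2)]

lhs = scv(prod_dist)
rhs = (1 + scv(X1)) * (1 + scv(X2)) - 1
```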

Page 136: Adaptive annealing:  a near-optimal  connection between  sampling and counting

How precise do the Xi have to be?

Analyzing SCV is better (Dyer-Frieze’1991)

X = X1 X2 ... Xt

Main idea: SCV(Xi) ≤ ε²/(5t) for every i ⇒ SCV(X) < ε²/4

each term ≈ (t/ε²) samples ⇒ (t²/ε²) total

Page 137: Adaptive annealing:  a near-optimal  connection between  sampling and counting

6. More… a) proof of Dyer-Frieze

b) independent sets revisited

c) warm starts

Outline

Page 138: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Hamiltonian (figure: example configurations with H = 0, 1, 2, 4)

Page 139: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Hamiltonian – many possibilities

(figure: configurations with H = 0, 1, 2, ... for the hardcore lattice gas model)

Page 140: Adaptive annealing:  a near-optimal  connection between  sampling and counting

What would be a natural hamiltonian for planar graphs?

Page 141: Adaptive annealing:  a near-optimal  connection between  sampling and counting

What would be a natural hamiltonian for planar graphs?

H(G) = number of edges

natural MC: pick u,v uniformly at random

if {u,v} ∉ G: with probability λ/(1+λ) try G + {u,v}

if {u,v} ∈ G: with probability 1/(1+λ) try G − {u,v}

Page 142: Adaptive annealing:  a near-optimal  connection between  sampling and counting

natural MC: pick u,v uniformly at random

if {u,v} ∉ G: with probability λ/(1+λ) try G + {u,v}

if {u,v} ∈ G: with probability 1/(1+λ) try G − {u,v}

G → G’ = G + {u,v}:  P(G,G’) = λ / ( (1+λ) n(n−1)/2 )

G’ → G:  P(G’,G) = 1 / ( (1+λ) n(n−1)/2 )

Page 143: Adaptive annealing:  a near-optimal  connection between  sampling and counting

P(G,G’) = λ / ( (1+λ) n(n−1)/2 )   for G’ = G + {u,v}

P(G’,G) = 1 / ( (1+λ) n(n−1)/2 )

π(G) ∝ λ^(number of edges of G) satisfies the detailed balance condition

π(G) P(G,G’) = π(G’) P(G’,G)

(λ = exp(−β))
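The detailed balance claim can be verified exhaustively on a tiny instance. Every graph on 4 vertices is planar, so the chain below (a sketch of the add/remove dynamics, with graphs encoded as bitmasks over the 6 vertex pairs) is exactly the planar-graph chain for n = 4:

```python
from itertools import combinations

lam = 0.7                                 # lam = exp(-beta); any positive value works
pairs = list(combinations(range(4), 2))   # 6 vertex pairs; every 4-vertex graph is planar
states = range(1 << len(pairs))           # graphs as bitmasks over the 6 pairs

def weight(g):
    # pi(G) proportional to lam^{number of edges}
    return lam ** bin(g).count("1")

def P(g, h):
    # transition probability of the add/remove chain between adjacent graphs
    diff = g ^ h
    if bin(diff).count("1") != 1:
        return 0.0
    if h & diff:                          # edge added going g -> h
        return (1 / len(pairs)) * lam / (1 + lam)
    return (1 / len(pairs)) * 1 / (1 + lam)   # edge removed

ok = all(abs(weight(g) * P(g, g ^ (1 << i))
             - weight(g ^ (1 << i)) * P(g ^ (1 << i), g)) < 1e-12
         for g in states for i in range(len(pairs)))
```

Since π(G) P(G,G’) = π(G’) P(G’,G) for every pair of adjacent graphs, λ^{|E(G)|}/Z is stationary for the chain.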

Page 144: Adaptive annealing:  a near-optimal  connection between  sampling and counting

6. More… a) proof of Dyer-Frieze

b) independent sets revisited

c) warm starts

Outline

Page 145: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Mixing time: τmix = smallest t such that |μt − π|TV ≤ 1/e

Relaxation time: τrel = 1/(1−λ2)

τrel ≤ τmix ≤ τrel ln(1/πmin)

example: τmix = Θ(n ln n) vs τrel = Θ(n) (figure: n=3)

(discrepancy may be substantially bigger for, e.g., matchings)

Page 146: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Mixing time: τmix = smallest t such that |μt − π|TV ≤ 1/e

Relaxation time: τrel = 1/(1−λ2)

Estimating π(S)

Y = 1 if X ∈ S, 0 otherwise

E[Y] = π(S)

METHOD 1: independent samples X1, X2, X3, ..., Xs (each from a separate run of ≈ τmix steps)

Page 147: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Mixing time: τmix = smallest t such that |μt − π|TV ≤ 1/e

Relaxation time: τrel = 1/(1−λ2)

Estimating π(S)

Y = 1 if X ∈ S, 0 otherwise

E[Y] = π(S)

METHOD 1: independent samples X1, X2, X3, ..., Xs (each from a separate run of ≈ τmix steps)

METHOD 2 (Gillman’98, Kahale’96, ...): average Y over consecutive states X1 X2 X3 ... Xs of a single run

Page 148: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Mixing time: τmix = smallest t such that |μt − π|TV ≤ 1/e

Relaxation time: τrel = 1/(1−λ2)

Further speed-up

METHOD 2 (Gillman’98, Kahale’96, ...): average Y over consecutive states X1 X2 X3 ... Xs of a single run

|μt − π|TV ≤ exp(−t/τrel) Varπ(μ0/π)^{1/2}

μ0 with ( Σx π(x)(μ0(x)/π(x) − 1)² )^{1/2} small is called a warm start
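A numeric check of the warm-start bound on a small reversible chain (a lazy walk on a 5-vertex path; the construction is illustrative, not from the talk): starting from a point mass, the TV distance stays below exp(−t/τrel) · Varπ(μ0/π)^{1/2} at every step:

```python
import numpy as np

# lazy random walk on a 5-vertex path: reversible, all eigenvalues in [0, 1]
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W /= W.sum(axis=1, keepdims=True)
P = (np.eye(n) + W) / 2

pi = np.array([1, 2, 2, 2, 1], dtype=float) / 8        # stationary (proportional to degree)

# relaxation time from the spectrum of the pi-symmetrized kernel
D = np.diag(np.sqrt(pi))
eigs = np.sort(np.linalg.eigvalsh(D @ P @ np.linalg.inv(D)))
tau_rel = 1.0 / (1.0 - eigs[-2])

mu0 = np.zeros(n)
mu0[0] = 1.0                                            # cold start: point mass in a corner
chi0 = np.sqrt(np.sum(pi * (mu0 / pi - 1.0) ** 2))      # Var_pi(mu0/pi)^{1/2}

results = []
mu = mu0.copy()
for t in range(1, 31):
    mu = mu @ P                                         # one step of the chain
    tv = 0.5 * np.abs(mu - pi).sum()
    results.append((tv, np.exp(-t / tau_rel) * chi0))
```

From a warm start chi0 is O(1) instead of ≈ (1/πmin)^{1/2}, which is exactly where the speed-up comes from.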

Page 149: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Mixing time: τmix = smallest t such that |μt − π|TV ≤ 1/e

Relaxation time: τrel = 1/(1−λ2)

Further speed-up

METHOD 2 (Gillman’98, Kahale’96, ...): average Y over consecutive states X1 X2 X3 ... Xs of a single run

|μt − π|TV ≤ exp(−t/τrel) Varπ(μ0/π)^{1/2}

μ0 with ( Σx π(x)(μ0(x)/π(x) − 1)² )^{1/2} small is called a warm start

a sample at β can be used as a warm start for β’

⇐ cooling schedule can step from β’ to β

Page 150: Adaptive annealing:  a near-optimal  connection between  sampling and counting

a sample at β can be used as a warm start for β’

⇐ cooling schedule can step from β’ to β

β0 → β1 → β2 → β3 → .... → βm = “well mixed” states

m = O( (ln n)(ln A) )

Page 151: Adaptive annealing:  a near-optimal  connection between  sampling and counting

β0 → β1 → β2 → β3 → .... → βm = “well mixed” states

METHOD 2: X1 X2 X3 ... Xs

run our cooling-schedule algorithm with METHOD 2, using the “well mixed” states as starting points

Page 152: Adaptive annealing:  a near-optimal  connection between  sampling and counting

Output of our algorithm: β0, β1, ..., βk with k = O*( (ln A)^{1/2} )

small augmentation (so that we can use a sample at the current β as a warm start at the next)

still O*( (ln A)^{1/2} ):  β0 → β1 → β2 → β3 → .... → βm

Use an analogue of Dyer-Frieze for independent samples from vector variables with slightly dependent coordinates.

Page 153: Adaptive annealing:  a near-optimal  connection between  sampling and counting

if we have sampler oracles for μβ ∝ exp(−β H(x))

then we can get an adaptive schedule of length t = O*( (ln A)^{1/2} )

independent sets O*(n²) (using Vigoda’01, Dyer-Greenhill’01)

matchings O*(n²m) (using Jerrum, Sinclair’89)

spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri’95)

k-colorings O*(n²) for k > 2Δ (using Jerrum’95)