Page 1: Fragmentation Coagulation Processescbl.eng.cam.ac.uk/pub/Intranet/MLG/ReadingGroup/fcp_rcc.pdf · Talk based on Modelling Genetic Variations using Fragmentation-Coagulation Processes

Fragmentation Coagulation Processes

Konstantina Palla

Reading and Communication Club

Konstantina Palla 1 / 20


Talk based on

• Modelling Genetic Variations using Fragmentation-Coagulation Processes. Y. W. Teh, C. Blundell and L. T. Elliott. NIPS 2011.

• Scalable Imputation of Genetic Data with a Discrete Fragmentation-Coagulation Process. L. T. Elliott and Y. W. Teh. NIPS 2012.


Partition-valued Processes

Mechanisms

Discrete Fragmentation Coagulation Process

Continuous Fragmentation Coagulation Process


PARTITION-VALUED PROCESSES

[Figure: four partitions π1, π2, π3, π4 of the objects 1 to 9, observed at times t1, t2, t3, t4.]

Let:

• $[n]$ denote the natural numbers $1, \ldots, n$.

• $\Pi_{[n]}$ is the set of unlabelled partitions of $[n]$.

• Each $\pi \in \Pi_{[n]}$ is a set of disjoint non-empty clusters; partitions are indexed by a covariate (time/location), written $\pi_i$.

• Markov chain whose states are partitions of the natural numbers.

• When $t_{i+1} = t_i + dt$ and $dt \to 0$: Markov jump process.
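To make the state space concrete, here is a minimal sketch that enumerates every partition of [n] for a small n. The function name `set_partitions` and the list-of-lists representation are illustrative choices, not something taken from the talk.

```python
from typing import Iterator, List

def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `items` as a list of clusters (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing cluster in turn...
        for k in range(len(smaller)):
            yield smaller[:k] + [[first] + smaller[k]] + smaller[k + 1:]
        # ...or give it a cluster of its own.
        yield [[first]] + smaller

n = 4
all_partitions = list(set_partitions(list(range(1, n + 1))))
print(len(all_partitions))  # 15, the number of partitions of [4]
```

The count grows very quickly with n, which is why the processes in the talk are defined through local transition rules rather than by listing states.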


PARTITIONING MECHANISMS

1. Fragmentation

2. Coagulation

3. Fragmentation/Coagulation


FRAGMENTATION MECHANISM

• $\pi(0)$: all objects belong to the same cluster.

• Define: $\pi_{t+1} \mid \pi_t \sim \mathrm{FRAG}_{\alpha,\theta_f}(\pi_t)$

• Partition each cluster $c_i$ of $\pi_t$ further according to $\mathrm{CRP}_{c_i}(\alpha, \theta_f)$, where $i = 1, \ldots, K$, $K$ is the number of clusters in $\pi_t$, and $\theta_f$ is a parameter governing the fragmentation rate.

• Then $\pi_{t+1}$ is a refinement of $\pi_t$.

• Top-down generative process for trees.

[Figure: fragmentation tree over the objects 1 to 9, showing the partitions πt, πt+1, πt+2, πt+3, πt+4, from a single cluster down to singletons.]

Let $\rho_i$ be the random partitions drawn from each $\mathrm{CRP}_{c_i}(0, \theta_f)$ at state $\pi_t$.

• $P(\pi_{t+1} \mid \pi_t) = \prod_{i=1}^{K} p(\rho_i)$

• For a single cluster in $\pi_t$:

$$p(\rho_i) = \mathrm{CRP}_{c_i}(0, \theta_f) = \frac{\Gamma(\theta_f)}{\Gamma(\theta_f + n_i)}\, \theta_f^{K_i} \prod_{j=1}^{K_i} \Gamma(n_{ij}),$$

where $K_i$ is the number of clusters in partition $\rho_i$.

• Example of discrete time case: nested CRP [Blei, Griffiths, Jordan, 2003]
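The transition probability above can be evaluated directly from cluster sizes. The following is a small sketch under that formula; the function names `log_crp0` and `log_frag_transition` and the example sizes are illustrative assumptions, not part of the talk.

```python
from math import exp, lgamma, log

def log_crp0(cluster_sizes, theta):
    """log CRP(0, theta) probability of a partition with the given cluster sizes."""
    n = sum(cluster_sizes)
    k = len(cluster_sizes)
    return (lgamma(theta) - lgamma(theta + n) + k * log(theta)
            + sum(lgamma(s) for s in cluster_sizes))

def log_frag_transition(refinements, theta_f):
    """log P(pi_{t+1} | pi_t).

    refinements[i] lists the sizes of the pieces that cluster c_i of pi_t
    is split into (a single entry means the cluster stays whole).
    """
    return sum(log_crp0(sizes, theta_f) for sizes in refinements)

# Hypothetical step: clusters of sizes 4, 3 and 2; the first splits 2 + 2,
# the other two stay whole. With theta_f = 1: 1/24 * 1/3 * 1/2 = 1/144.
print(exp(log_frag_transition([[2, 2], [3], [2]], theta_f=1.0)))
```

Working in log space with `lgamma` avoids overflow in the Gamma functions once clusters contain more than a few dozen objects.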


FRAGMENTATION MECHANISM - CONTINUOUS CASE

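• Continuous-indexed analogue of nCRP.

• Use $\theta_f\,dt$ as the parameter and compute $P(\pi(t+dt) \mid \pi(t))$ as $dt \to 0$.

• For a single cluster in $\pi(t)$:

$$p(\rho_i) = \frac{\theta_f^{K_i-1}}{\Gamma(n_i)} \prod_{j=1}^{K_i} \Gamma(n_{ij}) \lim_{dt \to 0} dt^{K_i-1}$$

• Fragmentation rate of a single cluster to $K_i$ clusters:

$$\theta_f^{K_i-1}\, \frac{\prod_{j=1}^{K_i} \Gamma(n_{ij})}{\Gamma(n_i)}$$

• Rate of a single cluster $c$ in $\pi(t)$ fragmenting to two clusters $a$ and $b$ ($c = a \cup b$), i.e. $K_i = 2$:

$$\theta_f\, \frac{\Gamma(n_a)\,\Gamma(n_b)}{\Gamma(n_c)}$$

• Example: Dirichlet Diffusion Trees [Neal, 2001]. At a branch point:

  • $P(\text{following branch } k) = \frac{n_k}{m}$

  • $P(\text{diverging}) = \frac{\theta_f\,dt}{m}$

  • $n_k$: number of objects which previously took branch $k$

  • $K$: current number of branches from this branch point

  • $m = \sum_{k=1}^{K} n_k$: number of samples which previously took the current path

• For the example in the figure, $P(\pi(t+dt) \mid \pi(t)) = 1 \cdot \frac{\theta_f\,dt}{2} \cdot \frac{1}{3}$

[Figure: the cluster {1, 2, 3, 4} fragmenting into {1, 2} and {3, 4}.]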
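The step from the discrete formula to the continuous one is not spelled out on the slide. A short derivation, assuming only the standard small-argument behaviour $\Gamma(x) \approx 1/x$ as $x \to 0$, goes as follows:

```latex
\begin{align*}
p(\rho_i)
  &= \frac{\Gamma(\theta_f\,dt)}{\Gamma(\theta_f\,dt + n_i)}\,(\theta_f\,dt)^{K_i}
     \prod_{j=1}^{K_i}\Gamma(n_{ij}) \\
  &\approx \frac{1}{\theta_f\,dt}\cdot\frac{1}{\Gamma(n_i)}\,(\theta_f\,dt)^{K_i}
     \prod_{j=1}^{K_i}\Gamma(n_{ij})
  && \text{since } \Gamma(x)\approx 1/x \text{ as } x\to 0 \\
  &= \frac{\theta_f^{K_i-1}}{\Gamma(n_i)}\prod_{j=1}^{K_i}\Gamma(n_{ij})\;dt^{K_i-1}.
\end{align*}
```

For $K_i = 2$ the coefficient of $dt$ is the binary fragmentation rate. As a sanity check, a cluster of four objects splitting into two pairs gives $\theta_f\,\Gamma(2)\Gamma(2)/\Gamma(4) = \theta_f/6$.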


COAGULATION MECHANISM

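• $\pi(0)$: each object is in its own cluster.

• Define: $\pi_{t+1} \mid \pi_t \sim \mathrm{COAG}_{\alpha,\theta_c}(\pi_t)$

• Partition the set of clusters of $\pi_t$ according to a $\mathrm{CRP}_{\pi_t}(\alpha, \theta_c)$, then replace each cluster with the union of its elements.

• $\pi_{t+1}$ is coarser than $\pi_t$.

• Bottom-up generative process for trees.

• For $\alpha = 0$, with $K$ clusters in $\pi_{t+1}$ and $n$ in $\pi_t$:

$$P(\pi_{t+1} \mid \pi_t) = \mathrm{CRP}_{\pi_t}(\alpha, \theta_c) = \frac{\Gamma(\theta_c)}{\Gamma(\theta_c + n)}\, \theta_c^{K} \prod_{i=1}^{K} \Gamma(n_i)$$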
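The coagulation operator for α = 0 can be sketched in a few lines: run a CRP over the clusters of the current partition, then merge each group. The helper names `sample_crp0` and `coagulate`, the list-of-lists representation and the example partition are assumptions made for illustration.

```python
import random

def sample_crp0(n_items, theta, rng):
    """Sample a CRP(0, theta) partition of range(n_items) as a list of blocks."""
    blocks = []
    for i in range(n_items):
        # Existing block b is chosen with probability len(b) / (theta + i),
        # a new block with probability theta / (theta + i).
        u = rng.random() * (theta + i)
        for b in blocks:
            u -= len(b)
            if u < 0:
                b.append(i)
                break
        else:
            blocks.append([i])
    return blocks

def coagulate(partition, theta_c, rng):
    """One COAG_{0, theta_c} step: cluster the clusters, then take unions."""
    groups = sample_crp0(len(partition), theta_c, rng)
    return [sorted(x for k in g for x in partition[k]) for g in groups]

rng = random.Random(0)
pi_t = [[1, 8], [3, 4], [2, 7, 9], [5, 6]]  # hypothetical current partition
pi_next = coagulate(pi_t, theta_c=1.0, rng=rng)
print(pi_next)
```

Because whole clusters are the units being seated, every cluster of `pi_t` ends up inside exactly one cluster of `pi_next`, which is what "coarser" means here.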

[Figure: coagulation tree over the objects 1 to 9, showing the partitions πt, πt+1, πt+2, πt+3, πt+4, from singletons up to a single cluster.]


COAGULATION MECHANISM - CONTINUOUS CASE

• Continuous-indexed analogue of the discrete coagulation tree.

• Kingman’s coalescent [Kingman, 1982]

• Define coagulation parameter $\theta_c / dt$ and

$$\pi(t+dt) \mid \pi(t) \sim \mathrm{COAG}_{\alpha,\, \theta_c/dt}(\pi(t))$$

• $K$ and $n$: the number of clusters in $\pi(t+dt)$ and $\pi(t)$.

• As $dt \to 0$:

$$P(\pi(t+dt) \mid \pi(t)) = \theta_c^{K-n} \Big[\prod_{i=1}^{K} \Gamma(n_i)\Big] \lim_{dt \to 0} \frac{dt^{-K+1}}{dt^{-n+1}}$$

• Rate of two clusters in $\pi(t)$ coagulating to one in $\pi(t+dt)$, for $K = n - 1$: $\theta_c^{-1}$
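A constant pairwise merge rate is all that is needed to simulate this process. The sketch below assumes each pair of clusters merges at rate 1/θc, as stated on the slide; the function name `kingman_coalescent` and the returned history format are illustrative.

```python
import random

def kingman_coalescent(n_objects, theta_c, rng):
    """Simulate merges until one cluster remains; each pair merges at rate 1/theta_c.

    Returns a list of (time, partition) snapshots, starting from singletons.
    """
    clusters = [[i] for i in range(1, n_objects + 1)]
    t = 0.0
    history = [(t, [list(c) for c in clusters])]
    while len(clusters) > 1:
        k = len(clusters)
        total_rate = k * (k - 1) / 2 / theta_c   # number of pairs times pair rate
        t += rng.expovariate(total_rate)          # waiting time to the next merge
        a, b = sorted(rng.sample(range(k), 2))    # uniformly chosen pair of clusters
        merged = clusters[a] + clusters[b]
        clusters = [c for j, c in enumerate(clusters) if j not in (a, b)] + [merged]
        history.append((t, [list(c) for c in clusters]))
    return history

for time, partition in kingman_coalescent(5, theta_c=1.0, rng=random.Random(0)):
    print(round(time, 3), partition)
```

Each event reduces the number of clusters by exactly one, so a run over n objects always records n - 1 merges.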

[Figure: coagulation tree over the objects 1 to 9, showing the partitions πt, πt+1, πt+2, πt+3, πt+4, from singletons up to a single cluster.]


FRAGMENTATION - COAGULATION DUALITY

Duality between Pitman-Yor fragmentations and coagulations.

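Theorem [Pitman, 1999]. For all $0 < \alpha < 1$, $0 \le \beta < 1$ and $\theta > -\alpha\beta$, the following statements are equivalent:

$$\pi \sim \mathrm{CRP}_{[n]}(\alpha\beta, \theta) \quad\text{and}\quad F \mid \pi \sim \mathrm{FRAG}_{\alpha,-\alpha\beta}(c)$$

$$\eta \sim \mathrm{CRP}_{[n]}(\alpha, \theta) \quad\text{and}\quad \rho \mid \eta \sim \mathrm{COAG}_{\beta,\, \theta/\alpha}(\eta)$$

Equivalent: $P(\pi = S, F = T) = P(\rho = S, \eta = T)$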

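[Figure: a coarse partition {1, 2, 3, 5, 8}, {4, 6, 9}, {7} and a finer partition {1, 3, 8}, {2, 5}, {4, 6, 9}, {7} with clusters labelled A, B, C, D. Fragmentation takes π to F = (F1, F2, F3); coagulation takes η to ρ.]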

• $P(\pi) = P(\rho) = \mathrm{CRP}(\alpha\beta, \theta)$

• $P(F) = P(\eta) = \mathrm{CRP}(\alpha, \theta)$

Fragmentation (/coagulation) is the time reversal of coagulation (/fragmentation), with appropriately chosen parameters.


DISCRETE FRAGMENTATION COAGULATION PROCESS - DFCP

[Figure: DFCP chain πt, πt+1, πt+2, each marginally CRP[n](0, θ), with intermediate fragmented partitions, each marginally CRP[n](R, θ); nodes labelled A to E.]

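• A Markov chain over partitions.

• Transition using fragmentation followed by coagulation.

• Assume $T$ steps on the Markov chain. Define $(R_t)_{t=1}^{T-1}$; these control the dependence between $\pi_t$ and $\pi_{t+1}$.

• Duality theorem for $\beta = 0$, $\alpha = R$ at each step:

$$\pi \sim \mathrm{CRP}_{[n]}(0, \theta) \quad\text{and}\quad F_c \mid \pi \sim \mathrm{FRAG}_{R,0}(c)\ \ \forall c \in \pi$$

$$\eta \sim \mathrm{CRP}_{[n]}(R, \theta) \quad\text{and}\quad \rho \mid \eta \sim \mathrm{COAG}_{0,\, \theta/R}(\eta)$$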
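One DFCP transition can be sketched as "fragment every cluster with CRP(R, 0), then coagulate the fragments with CRP(0, θ/R)". The code below is a minimal illustration under the standard sequential seating rule for the two-parameter CRP; the names `sample_crp` and `dfcp_step`, and the choice of a single fixed R with 0 < R < 1, are assumptions rather than anything specified in the talk.

```python
import random

def sample_crp(items, alpha, theta, rng):
    """Two-parameter CRP(alpha, theta) partition of `items` by sequential seating."""
    blocks = []
    for i, x in enumerate(items):
        if i == 0:
            blocks.append([x])  # the first item always opens a block
            continue
        # Block b has weight len(b) - alpha; a new block has weight
        # theta + alpha * (number of blocks). The weights sum to theta + i.
        u = rng.random() * (theta + i)
        for b in blocks:
            u -= len(b) - alpha
            if u < 0:
                b.append(x)
                break
        else:
            blocks.append([x])
    return blocks

def dfcp_step(partition, R, theta, rng):
    """pi_t -> pi_{t+1}: fragment with CRP(R, 0), then coagulate with CRP(0, theta / R)."""
    fragments = [f for c in partition for f in sample_crp(c, R, 0.0, rng)]
    groups = sample_crp(list(range(len(fragments))), 0.0, theta / R, rng)
    return [sorted(x for k in g for x in fragments[k]) for g in groups]

rng = random.Random(1)
pi = sample_crp(list(range(1, 10)), 0.0, 1.0, rng)  # initial state ~ CRP(0, theta)
for t in range(3):
    pi = dfcp_step(pi, R=0.5, theta=1.0, rng=rng)
    print(pi)
```

Values of R near 0 make fragmentation rare and the coagulation parameter θ/R large, so successive partitions stay similar; values near 1 shatter clusters more aggressively before they are re-merged.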


DFCP - CONT.

[Figure: DFCP chain πt, πt+1, πt+2, each marginally CRP[n](0, θ), with intermediate fragmented partitions, each marginally CRP[n](R, θ); nodes labelled A to E.]

• Stationary distribution: $\mathrm{CRP}(0, \theta)$

• Show detailed balance: $P(A)\,P(A \to B) = P(B)\,P(B \to A)$

• Hint: $P(A)\,P(A \to B) = \sum_{C} P(A)\,P(B \mid C)\,P(C \mid A)$, then apply duality twice.
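One way to carry out the hint is sketched below. It assumes $A$ and $B$ are $\mathrm{CRP}(0, \theta)$ states, $C$ is the intermediate fragmented partition with marginal $\mathrm{CRP}(R, \theta)$, and uses the subscripts frag and coag (notation introduced here) to mark which conditional is meant.

```latex
\begin{align*}
P(A)\,P(A \to B)
  &= \sum_{C} P(A)\,P_{\mathrm{frag}}(C \mid A)\,P_{\mathrm{coag}}(B \mid C) \\
  &= \sum_{C} P(C)\,P_{\mathrm{coag}}(A \mid C)\,P_{\mathrm{coag}}(B \mid C)
  && \text{duality: } P(A)\,P_{\mathrm{frag}}(C \mid A) = P(C)\,P_{\mathrm{coag}}(A \mid C) \\
  &= \sum_{C} P(B)\,P_{\mathrm{frag}}(C \mid B)\,P_{\mathrm{coag}}(A \mid C)
  && \text{duality: } P(C)\,P_{\mathrm{coag}}(B \mid C) = P(B)\,P_{\mathrm{frag}}(C \mid B) \\
  &= P(B)\,P(B \to A).
\end{align*}
```

The middle line is symmetric in $A$ and $B$, which is what makes the chain reversible with respect to $\mathrm{CRP}(0, \theta)$.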


DFCP - CONT.

[Figure: DFCP chain with states marginally CRP[n](0, θ) and intermediate fragmented partitions marginally CRP[n](R, θ); nodes labelled A to E.]

• Reversible Markov chain, due to F/C duality

• Exchangeable: each $\pi_t$ has a CRP marginal

• Projective: the CRP is projective


CONTINUOUS FCP

• Continuum limit of DFCP: $\pi(t+dt) \mid \pi(t)$ when $dt \to 0$.

• Continuous-time Markov process over partitions, an exchangeable fragmentation-coalescence process [Berestycki, 2004].

• Binary events only: at most one fragmentation or one coagulation at each time.

  • One cluster fragments to two, OR

  • Two clusters coagulate to one.

• Parameters: $\theta = R/\alpha$, where $\alpha > 0$ and $R > 0$ are parameters governing the rate of coagulation and fragmentation respectively.

[Figure: one CFCP step from πt to πt+dt; A and B are marginally CRP[n](0, θ), and the intermediate partition C is marginally CRP[n](R dt, θ).]


CONTINUOUS FCP - CONT.

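Why binary events?

Fragmentation: for each cluster $c_i$ in $\pi(t)$

• $\theta_f = R$, probability of fragmenting to $K_i$ clusters in $\pi(t+dt)$:

$$P(\rho_i) = \frac{R^{K_i-1}}{\Gamma(n_i)} \prod_{j=1}^{K_i} \Gamma(n_{ij}) \lim_{dt \to 0} dt^{K_i-1}
= \begin{cases} O(1), & K_i = 1 \\ O(dt), & K_i = 2 \\ O(dt^2), & K_i > 2 \end{cases}$$

Coagulation: $K$ clusters in $\pi(t+dt)$, $n$ in $\pi(t)$

• Set $\theta_c = 1/\alpha$:

$$P(\pi(t+dt) \mid \pi(t)) = \alpha^{n-K} \prod_{i=1}^{K} \Gamma(n_i) \lim_{dt \to 0} \frac{dt^{-K+1}}{dt^{-n+1}}
= \begin{cases} O(1), & K = n \\ O(dt), & K = n-1 \\ O(dt^2), & K < n-1 \end{cases}$$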


CONTINUOUS FCP - CONT.

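• CFCP: continuous-time Markov process $\pi = (\pi(t),\ t \in [0, T])$

• Space of partitions (states) is finite → Markov jump process

• Transition rate matrix $Q = [q_{ij}]$: $q_{ij}$ is the transition rate from state $i$ to $j$

• Total transition rate from state $i$: $q_i = \sum_{j \ne i} q_{ij}$

$$q_i = \text{fragrate} + \text{coagrate} = R \sum_{c} H_{n_c - 1} + \alpha\, \frac{n(n-1)}{2}$$

where $H_m$ is the $m$-th harmonic number, $n_c$ is the size of cluster $c$, and $n$ is the number of clusters in the current state.

• Homogeneous Poisson process: initial state $\pi(0) = s$, mean number of events $q_s T$

• Interarrival time $\sim \mathrm{Exp}(q_s)$

• Equilibrium distribution: $\mathrm{CRP}(0, R/\alpha)$.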
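The harmonic-number term in the total rate can be checked numerically: summing the binary fragmentation rate Γ(n_a)Γ(n_b)/Γ(n_c) over all unordered splits of a cluster of size n_c gives H_{n_c - 1}. The sketch below does this with exact fractions; it reads n in the coagulation term as the number of clusters in the current state, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def harmonic(m):
    """m-th harmonic number as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, m + 1)), Fraction(0))

def total_rate(cluster_sizes, R, alpha):
    """q_i = R * sum_c H_{n_c - 1} + alpha * n (n - 1) / 2, n = number of clusters."""
    n = len(cluster_sizes)
    frag = R * sum((harmonic(s - 1) for s in cluster_sizes), Fraction(0))
    coag = alpha * Fraction(n * (n - 1), 2)
    return frag + coag

def brute_force_frag_rate(size):
    """Sum Gamma(n_a) Gamma(n_b) / Gamma(n_c) over all unordered binary splits."""
    items = list(range(size))
    total = Fraction(0)
    for k in range(1, size):
        for a in combinations(items, k):
            if 0 in a:  # pin item 0 to side `a` so each split {a, b} is counted once
                total += Fraction(factorial(k - 1) * factorial(size - k - 1),
                                  factorial(size - 1))
    return total

print(brute_force_frag_rate(5), harmonic(4))   # both 25/12
print(total_rate([3, 2, 4], R=1, alpha=1))     # 22/3
```

Given q_i, a simulation would draw the holding time in state i from an exponential distribution with that rate and then pick a specific fragmentation or coagulation in proportion to its individual rate.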


SEQUENTIAL GENERATIVE PROCESS IN CFCP

• Incremental construction: describe the law of the path of object $i$ given the paths of objects $1, \ldots, i-1$.

• Let $c_i(t)$ denote the cluster object $i$ belongs to at time $t$.

• $c_1(t)$ = constant

• $t = 0$: $\mathrm{CRP}_n(0, R/\alpha)$

• $t > 0$: if the $i$th object is in an existing cluster $c$, $c_i(t^-) = c$:

  • If $c$ fragments to $a$ and $b$:

$$P(c_i(t) \mid c_i(t^-)) = \begin{cases} \dfrac{|a|}{|c| - 1}, & c_i(t) = a \\[1ex] \dfrac{|b|}{|c| - 1}, & c_i(t) = b \end{cases}$$

  Same as DFT’s probability of following a path.

  • If $c$ coagulates at time $t$ into a merged cluster $c'$: $P(c_i(t) = c') = 1$.

  • If no F/C event involving $c$ takes place, $c$ fragments with rate $\dfrac{R}{|c| - 1}$. Same as DFT’s branching probability.

• $t > 0$: if the $i$th object is alone in its cluster, $c_i(t^-) = \emptyset$, it joins a given existing cluster with rate $\alpha$, and hence some existing cluster with total rate $\alpha\,|\pi|_{[i-1]}(t)|$. Same as Kingman’s coalescent.


CONDITIONAL DISTRIBUTION IN CFCP - CONT.

[Figure: example CFCP path over the objects 1 to 9, annotated with the probability θ/(θ+n), a coagulation event (rate α), a fragmentation event (rate R/2), and branch probabilities 3/5 and 2/5.]

(figure taken from Yee Whye Teh’s slides, MLSS 2011)


CFCP - PROPERTIES

• Rate of fragmentation is the same as for Dirichlet diffusion trees (with constant rate).

• Rate of coagulation is the same as for Kingman’s coalescent.

• The Markov process is

  • reversible: fragmentation (DFT) is precisely the converse of coagulation (KC).

  • exchangeable: CRP marginals.


Thank you!
