Considerate Approaches to ABC Model Selection
Talk given at ISBA 2012 in the Approximate Bayesian Computation Special Topic Session.
Michael P.H. Stumpf, Christopher Barnes, Sarah Filippi, Thomas Thorne
Theoretical Systems Biology Group
26/06/2012
Considerate Approaches to ABC Model Selection Stumpf et al. 1 of 15
Evolving Networks
[Figure, four panels: (a) duplication attachment; (b) duplication attachment with complementarity; (c) linear preferential attachment; (d) general scale-free. Node labels: wi, wj.]
Inference and Model Selection
We have observed data, D, that were generated by some system that we seek to describe by a mathematical model. In principle we can have a model set, M = {M1, …, Mν}, where each model Mi has an associated parameter θi.
We may know the different constituent parts of the system, Xi, and have measurements for some or all of them under some experimental designs, T.
\[
\underbrace{\Pr(M_i \mid T, D)}_{\text{Model Posterior}}
= \frac{\overbrace{\Pr(D \mid M_i, T)}^{\text{Likelihood}}\;\overbrace{\pi(M_i)}^{\text{Prior}}}
       {\underbrace{\sum_{j=1}^{\nu} \Pr(D \mid M_j, T)\,\pi(M_j)}_{\text{Evidence}}}
\]
For complicated models and/or detailed data the likelihood evaluation can become prohibitively expensive.
Approximate Inference
We can approximate the likelihood and/or the models. The “true” model is unlikely to be in M anyway.
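As a minimal sketch of the model-posterior formula on this slide, the snippet below normalises likelihood times prior over a small model set. The function name `model_posterior` and the three log-likelihood values are hypothetical and chosen only for illustration; they are not results from the talk.

```python
import math

def model_posterior(log_likelihood, prior):
    """Return Pr(M_i | T, D) for each model.

    log_likelihood[i] is log Pr(D | M_i, T); prior[i] is pi(M_i).
    """
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihood, prior)]
    shift = max(log_joint)                      # subtract the maximum to avoid underflow
    weights = [math.exp(v - shift) for v in log_joint]
    total = sum(weights)                        # the evidence, up to the common shift
    return [w / total for w in weights]

# Hypothetical values for three models with a uniform model prior.
posterior = model_posterior([-10.0, -12.0, -11.0], [1 / 3, 1 / 3, 1 / 3])
print(posterior)
```

Working on the log scale is a design choice: for detailed data the likelihood values are typically far too small to represent directly.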
Approximate Bayesian Computation
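We can define the posterior as
\[
p(\theta_i \mid x) = \frac{f(x \mid \theta_i)\,\pi(\theta_i)}{p(x)}
\]
Here f(x|θi) is the likelihood, which is often hard to evaluate; consider for example
\[
y = \max[0,\; y + g_1 + y \times g_2] \quad \text{with } g_1, g_2 \sim N(0, \sigma_{1/2}) \quad \text{and} \quad \frac{dy}{dt} = g(y; \theta).
\]
But we can still simulate from the data-generating model, whence
\[
p(\theta_i \mid x) = \int_X \frac{\mathbb{1}(y = x)\, f(y \mid \theta_i)\,\pi(\theta_i)}{p(x)}\, dy
\approx \int_X \frac{\mathbb{1}(\Delta(y, x) < \varepsilon)\, f(y \mid \theta_i)\,\pi(\theta_i)}{p(x)}\, dy
\]
Solutions for Complex Problems (?)
Approximate (i) data, (ii) model or (iii) distance.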
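The approximation above corresponds to a simple rejection scheme: draw θ from the prior, simulate y, and keep θ whenever Δ(y, x) < ε. The sketch below is one possible illustration, not the implementation used in the talk. The normal simulator, the uniform prior on [−4, 4], the sorted-sample distance `sorted_rms` and the value ε = 0.6 are all assumptions made for this example.

```python
import numpy as np

def abc_rejection(x_obs, simulate, sample_prior, distance, eps, n_draws, rng):
    """Keep prior draws whose simulated data lie within eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)          # theta ~ pi(theta)
        y = simulate(theta, rng)           # y ~ f(y | theta)
        if distance(y, x_obs) < eps:       # indicator 1(Delta(y, x) < eps)
            accepted.append(theta)
    return np.array(accepted)

def sorted_rms(y, x):
    """Assumed distance on the full data: RMS difference of the sorted samples."""
    return float(np.sqrt(np.mean((np.sort(y) - np.sort(x)) ** 2)))

rng = np.random.default_rng(0)
x_obs = rng.normal(1.0, 1.0, size=20)      # toy "observed" data
samples = abc_rejection(
    x_obs,
    simulate=lambda theta, r: r.normal(theta, 1.0, size=20),
    sample_prior=lambda r: r.uniform(-4.0, 4.0),
    distance=sorted_rms,
    eps=0.6,
    n_draws=20000,
    rng=rng,
)
print(len(samples), samples.mean())
```

Smaller ε gives a better approximation at the price of fewer accepted draws, which is the trade-off behind approximating the data, the model or the distance.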
ABC with Summary Statistics
If the data, D, are very complex and detailed, direct comparison between real and simulated data becomes prohibitive. In such situations, which originally motivated ABC approaches, summary statistics of the data are compared. We then have
\[
p_{S,\varepsilon}(\theta_i \mid D) \propto \int_X \mathbb{1}\big(\Delta(S(x), S(y_\theta)) < \varepsilon\big)\, f(y \mid \theta_i)\,\pi(\theta_i)\, dy
\]
Sufficient Statistics
This only works if the statistic S(.) is sufficient, i.e. if for s = S(x) we have
\[
p(x \mid s, \theta) = p(x \mid s)
\]
Sufficiency for Model Selection
If S(.) is sufficient for parameter estimation (in all models i considered) it is not necessarily sufficient for model selection (Robert et al., PNAS (2011)).
ABC with Summary Statistics
Generate data X ∼ N(1, 1) and use ABC to infer µ (assuming that σ² = 1 is known).
[Figure: four histograms of p(θ) against θ (from −4 to 4), one per summary statistic; panels: mean, min, var, max.]
Role of Summary Statistics
• Mean (sufficient) correctly infers µ.
• Max/Min capture some information on µ.
• Var fails to capture any information on µ.
We need a way of constructing sets of statistics that together are (approximately) sufficient.
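The experiment on this slide can be reproduced in a few lines. The sketch below is an illustration under stated assumptions rather than the original code: the sample size of 50, the uniform prior on [−4, 4], and the choice of setting ε implicitly by keeping the closest 1% of simulations are all assumed here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_draws, keep = 50, 20000, 200
x_obs = rng.normal(1.0, 1.0, size=n)                        # data from N(1, 1)

theta = rng.uniform(-4.0, 4.0, size=n_draws)                # prior draws for mu
sims = rng.normal(theta[:, None], 1.0, size=(n_draws, n))   # one data set per draw

statistics = {"mean": np.mean, "min": np.min, "max": np.max, "var": np.var}
posterior_sd = {}
for name, stat in statistics.items():
    dist = np.abs(stat(sims, axis=1) - stat(x_obs))         # Delta(S(x), S(y))
    accepted = theta[np.argsort(dist)[:keep]]               # closest 1% of draws
    posterior_sd[name] = float(accepted.std())

print(posterior_sd)
```

A narrow spread of accepted values indicates that the statistic carries information on µ; for the variance, the accepted values should remain roughly as spread out as the prior.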
A Closer Look at Summary Statistics
We interpret a summary statistic as a function,
\[
S : \mathbb{R}^d \longrightarrow \mathbb{R}^w, \qquad S(x) = s.
\]
If S is sufficient then (we include the model indicator variable in θ)
\[
p(\theta \mid x) = p(\theta \mid s)
\]
Information Theoretical Perspective
A summary statistic is an information compression device. Now let S be a set of statistics which together are sufficient. Then the mutual information satisfies
\[
I(\Theta; X) = \int_\Omega \int_X p(\theta, x) \log \frac{p(\theta, x)}{p(\theta)\,p(x)}\, d\theta\, dx = I(\Theta; S)
\]
Constructing Minimally Sufficient Summary Statistics
We seek the set U ⊆ S with minimal cardinality such that I(Θ; S) = I(Θ; U).
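A small discrete example makes the identity I(Θ; X) = I(Θ; S) concrete. The sketch below computes the mutual information exactly by enumeration for three coin flips with a two-point prior on the bias. The prior, the number of flips and the three statistics compared are hypothetical choices for illustration only.

```python
import itertools
import math
from collections import defaultdict

thetas = {0.2: 0.5, 0.8: 0.5}   # assumed two-point prior on the coin bias
n = 3                           # number of Bernoulli observations

def mutual_information(statistic):
    """Exact I(Theta; statistic(X)) by enumerating all 2^n data sets."""
    joint = defaultdict(float)                       # p(theta, s)
    for theta, prior in thetas.items():
        for x in itertools.product([0, 1], repeat=n):
            p_x = math.prod(theta if xi else 1 - theta for xi in x)
            joint[(theta, statistic(x))] += prior * p_x
    marg_s = defaultdict(float)                      # p(s)
    for (theta, s), p in joint.items():
        marg_s[s] += p
    return sum(p * math.log(p / (thetas[theta] * marg_s[s]))
               for (theta, s), p in joint.items() if p > 0)

mi_full = mutual_information(lambda x: x)       # the full data
mi_sum = mutual_information(sum)                # number of heads: sufficient
mi_first = mutual_information(lambda x: x[0])   # first flip only: not sufficient
print(mi_full, mi_sum, mi_first)
```

The number of heads preserves all the information about the bias, whereas the first flip alone loses some of it, so it would not qualify as the set U.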
Constructing Sufficient Statistics
Proposition
Let X be a random variable generated according to f(·|θ). Let S be a summary statistic and U and T two subsets of S such that U = U(X), T = T(X) and S = S(X) satisfy U ⊂ T ⊂ S. We have
\[
I(\Theta; S \mid T) = I(\Theta; S \mid U) - I(\Theta; T \mid U).
\]
In order to construct a subset T of S such that I(Θ; S|T) = 0, it is thus sufficient to add statistics from S one by one until the condition holds. If we denote by S(k) the kth statistic to be added (with k ≤ w) we have S(k) = S(k)(X), and then
\[
I(\Theta; S \mid S_{(1)}, \ldots, S_{(k+1)}) \le I(\Theta; S \mid S_{(1)}, \ldots, S_{(k)}).
\]
Constructing Sufficient Statistics
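\[
\begin{aligned}
I(\Theta; S \mid U) &= \int_\Omega \int_X p(\theta, S(x), U(x)) \log \frac{p(\theta, S(x) \mid U(x))}{p(\theta \mid U(x))\, p(S(x) \mid U(x))}\, dx\, d\theta \\
&= \int_X p(S(x)) \left[ KL\big(p(\Theta \mid S(x)) \,\|\, p(\Theta \mid U(x))\big) \right] dx \\
&= E_{p(X)} \left[ KL\big(p(\Theta \mid S(X)) \,\|\, p(\Theta \mid U(X))\big) \right]
\end{aligned}
\]
An Impossible Algorithm
• for all subsets u∗ ⊆ s∗, perform ABC to obtain estimates pε(Θ|u∗)
• determine the set A = {u∗ ⊂ s∗ such that KL(pε(Θ|s∗) || pε(Θ|u∗)) = 0},
• the desired subset is argmin over u∗ ∈ A of |u∗|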
Constructing Sufficient Statistics
input: a sufficient set of statistics whose values on the dataset are s∗ = {s∗1, …, s∗w}, a threshold δ
output: a subset v∗ of s∗

choose randomly u∗ in s∗
v∗ ← u∗
q∗ ← s∗ \ v∗
repeat
    repeat
        if q∗ = Ø then
            return v∗
        end if
        choose randomly u∗ in q∗
        q∗ ← q∗ \ u∗
        perform ABC to obtain pε(Θ|v∗, u∗)
    until KL(pε(Θ|v∗, u∗) || pε(Θ|v∗)) > δ
    optionally: v∗ ← OrderDependency(v∗, u∗)
    v∗ ← v∗ ∪ u∗
    q∗ ← s∗ \ v∗
until q∗ = Ø
return v∗
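The control flow of this pseudocode can be sketched as follows. This is a simplified illustration, not the authors' implementation: the ABC runs and the KL estimate are replaced by a user-supplied function `kl_gain`, the optional OrderDependency step is omitted, and the table `INFO` with the toy gain function `toy_kl_gain` is entirely hypothetical.

```python
import random

def select_statistics(names, kl_gain, delta, rng):
    """Greedy subset selection following the pseudocode on this slide.

    kl_gain(v, u) stands in for KL(p(theta | v, u) || p(theta | v)), which in
    practice would be estimated from two ABC runs.
    """
    v = [rng.choice(names)]                      # random starting statistic
    q = [s for s in names if s not in v]
    while q:
        added = None
        while q and added is None:               # inner repeat ... until KL > delta
            u = rng.choice(q)
            q.remove(u)
            if kl_gain(v, u) > delta:
                added = u
        if added is None:                        # q exhausted without any gain
            return v
        v.append(added)
        q = [s for s in names if s not in v]     # rejected candidates are retried
    return v

# Hypothetical information content of each statistic about the parameter.
INFO = {"mean": 1.0, "max": 0.3, "var": 0.0}

def toy_kl_gain(v, u):
    """Toy stand-in: u helps only if it is more informative than everything in v."""
    return max(0.0, INFO[u] - max(INFO[s] for s in v))

selected = select_statistics(list(INFO), toy_kl_gain, delta=0.05, rng=random.Random(0))
print(selected)
```

Because the starting statistic is drawn at random, an uninformative statistic can end up in the result when it happens to be chosen first; the optional OrderDependency step in the pseudocode is the place where such a statistic could be removed again.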
Examples: Normal Distributions
y1, …, yd ∼ N(µ, σ1²) and y1, …, yd ∼ N(µ, σ2²)
[Figure: two panels; axes: Run (ticks 20 to 100) against the statistics mean, S2, range, max, random.]
Examples: Normal Distributions
y1, …, yd ∼ N(µ, σ1²) and y1, …, yd ∼ N(µ, σ2²)
[Figure: two panels plotting log(BF) ABC against log(BF) predicted.]
Examples: Population Genetics
[Figure: three panels, one per model: Constant Population Size; Exponential Population Growth; Two-Island Model with Migration. Axes: Run (ticks 20 to 100) against the statistics S1 to S11.]
[S1] Number of Segregating Sites; [S2] Number of Distinct Haplotypes; [S3] Haplotype Homozygosity; [S4] Average SNP Homozygosity; [S5] Number of occurrences of most common haplotype; [S6] Mean number of pair-wise differences between haplotypes; [S7] Number of Singleton Haplotypes; [S8] Number of Singleton SNPs; [S9] Linkage Disequilibrium.
Summary Statistic Choice
The choice of summary statistics appears to depend subtly on the true data-generating model. In light of coalescent processes this is, however, to be expected.
Examples: Random Walks
[Figure: three panels, one per model: Classical Random Walk; Persistent Random Walk; Biased Random Walk. Axes: Run (ticks 20 to 100) against the statistics S1 to S5.]
[S1] Mean square displacement; [S2] Mean x and y displacement; [S3] Mean square x and y displacement; [S4] Straightness index; [S5] Eigenvalues of gyration tensor.
Parameter Sufficiency for Complex Problems
Here all statistics that have been chosen for parameter estimation are also chosen for model selection.
Conditioning on Information
[Figure: diagram with labels s1, s2, s3, x and Θ.]
Statistics
Sufficient: Implicates same area as full data.
Ancillary: Implicates all values of θ equally.
What is the meaning of p(θ|s0, s1, …, sn)?
Let s = (s0, s1, …, sn), and assume I(θ, s) < I(θ, x) but ε → 0. This can happen for sufficient and ancillary s. In the latter case we obtain
p(θ|s) = π(θ).
How about
p(t|s)
if s is not (quite) sufficient?
Model Selection vs. Model Checking
Model Selection: Several models M ∈ M are compared and one or more are chosen in light of the data: find models which are better than others.
Model Checking: The quality of a model Mi is assessed against the available data: determine if a model is actually ‘good’.
Alternative Approach: ABCµ [Ratmann et al., PNAS].
Posterior Predictive Checks
We are interested in the posterior predictive distribution,
\[
p(t(X) \mid s(X)) = \int_\Theta p(t(X) \mid \theta)\, p(\theta \mid s(X))\, d\theta.
\]
In particular we have
\[
p(s(X) \mid s(X)) \neq p(s(X) \mid X)
\]
unless s(X) is sufficient.
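A posterior predictive check of this kind can be sketched by Monte Carlo: draw θ from p(θ|s(X)), simulate replicate data, and compare t on the replicates with t on the observed data. The example below is a sketch under strong assumptions that are not from the talk: a normal model with known unit variance, s(X) the sample mean, t(X) the sample maximum, and an exact flat-prior posterior used in place of an ABC posterior.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_rep = 30, 5000
x_obs = rng.normal(1.0, 1.0, size=n)          # toy observed data

# p(theta | s(X)) for s = sample mean, sigma = 1 known, flat prior (assumption):
# theta | s ~ N(mean(x), 1 / n).
theta_post = rng.normal(x_obs.mean(), 1.0 / np.sqrt(n), size=n_rep)

# Draw replicate data sets from p(X | theta) and evaluate t on each.
x_rep = rng.normal(theta_post[:, None], 1.0, size=(n_rep, n))
t_rep = x_rep.max(axis=1)

# Posterior predictive p-value for t(X) = max(X).
p_value = float(np.mean(t_rep >= x_obs.max()))
print(p_value)
```

Values of the p-value close to 0 or 1 would suggest that the model conditioned on s(X) does not reproduce the feature of the data measured by t(X).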
ABC on Network Data
[Figure, four panels: (e) duplication attachment; (f) duplication attachment with complementarity; (g) linear preferential attachment; (h) general scale-free. Node labels: wi, wj.]
Summarizing Networks
• Data are noisy and incomplete.
• We can simulate models of network evolution, but likelihoods can be calculated only for very trivial models.
• There is also no sufficient statistic that would allow us to summarize networks, so ABC approaches require some thought.
• Many possible summary statistics of networks are expensive to calculate.
Full likelihood: Wiuf et al., PNAS (2006).
ABC: Ratmann et al., PLoS Comp. Biol. (2008).
ABC (better): Thorne & Stumpf, J. Roy. Soc. Interface (2012).
Stumpf & Wiuf, J. Roy. Soc. Interface (2010).
Spectral Distances
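[Figure: example graph on the nodes a, b, c, d, e, with adjacency matrix (rows and columns ordered a, b, c, d, e):]
\[
A = \begin{pmatrix}
0 & 1 & 1 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]
Graph Spectra
Given a graph G with nodes N and edges (i, j) ∈ E with i, j ∈ N, the adjacency matrix, A, of the graph is defined by
\[
a_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E, \\ 0 & \text{otherwise.} \end{cases}
\]
The eigenvalues, λ, of this matrix provide one way of defining the graph spectrum.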
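For the five-node example on this slide the spectrum can be computed directly. The sketch below uses NumPy's `eigvalsh`, which applies because the adjacency matrix of an undirected graph is symmetric; the variable names are illustrative.

```python
import numpy as np

# Adjacency matrix of the example graph, rows and columns ordered a, b, c, d, e.
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
])

# eigvalsh returns the real eigenvalues of a symmetric matrix in ascending order.
spectrum = np.linalg.eigvalsh(A)
print(spectrum)
```

Two quick sanity checks are that the eigenvalues sum to zero (the graph has no self-loops, so the trace of A is zero) and that their squares sum to twice the number of edges.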
Spectral Distances
A simple distance measure between graphs having adjacency matrices A and B, known as the edit distance, is to count the number of edges that are not shared by both graphs,
\[
D(A, B) = \sum_{i,j} (a_{i,j} - b_{i,j})^2.
\]
However, for unlabelled graphs we require some mapping h from i ∈ N_A to i′ ∈ N_B that minimizes the distance
\[
D(A, B) \ge D'_h(A, B) = \sum_{i,j} (a_{i,j} - b_{h(i),h(j)})^2.
\]
Given a spectrum (which is relatively cheap to compute) we have
\[
D'(A, B) = \sum_{l} \left( \lambda^{(\alpha)}_l - \lambda^{(\beta)}_l \right)^2
\]
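The three distances on this slide can be compared on a tiny example. In the sketch below the function names and the two three-node graphs are hypothetical; the minimisation over mappings h is done by brute force over all node permutations, which is only feasible for very small graphs and is shown purely to illustrate why the spectral distance is attractive.

```python
import itertools
import numpy as np

def edit_distance(A, B):
    """D(A, B): sum over i, j of (a_ij - b_ij)^2 for a fixed node labelling."""
    return int(((A - B) ** 2).sum())

def min_edit_distance(A, B):
    """D'_h(A, B) minimised over all relabellings h, by brute force (n! cost)."""
    n = len(A)
    return min(edit_distance(A, B[np.ix_(p, p)])
               for p in itertools.permutations(range(n)))

def spectral_distance(A, B):
    """D'(A, B): squared differences between the sorted adjacency eigenvalues."""
    return float(((np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B)) ** 2).sum())

# Two labellings of the same graph: a single edge plus one isolated node.
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
B = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]])

print(edit_distance(A, B), min_edit_distance(A, B), spectral_distance(A, B))
```

Note that the sum over all ordered pairs (i, j) counts every undirected edge twice. The labelled edit distance is nonzero here although the graphs are identical up to relabelling, while the spectral distance recognises them as equal without any search over mappings.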
Protein Interaction Network Data
Species           Proteins   Interactions   Genome size   Sampling fraction
S. cerevisiae     5035       22118          6532          0.77
D. melanogaster   7506       22871          14076         0.53
H. pylori         715        1423           1589          0.45
E. coli           1888       7008           5416          0.35
[Figure: model probability (axis from 0.0 to 0.5) for the models DA, DAC, LPA, SF, DACL and DACR, by organism: S. cerevisiae, D. melanogaster, H. pylori, E. coli.]
Model Selection
• Inference here was based on all the data, not summary statistics.
• Duplication models receive the strongest support from the data.
• Several models receive support and no model is chosen unambiguously.
Protein Interaction Network Data
[Figure: panels for the parameters of the models DA (δ, α), DAC (δ, α), DACL (δ, α, p, m) and DACR (δ, α, p, m); legend: S. cerevisiae, D. melanogaster, H. pylori, E. coli.]
Considerate Use of ABC
• ABC is a tool for situations where conventional statistical approaches fail or are too cumbersome.
• If all the data are used then this is (relatively) unproblematic; if the data are compressed/corrupted then caution is required.
• Some of the issues arising in ABC mirror those also encountered in “conventional” statistics:
    “Any Bayesian inference uses the data only via the minimal sufficient statistic. This is because the calculation of the posterior distribution involves multiplying the likelihood by the prior and normalizing. Any factor of the likelihood that is a function of y alone will disappear after normalization.” D. Cox (2006).
• In other cases it seems prudent to accept the additional (and considerable) computational cost of constructing suitable summary statistics (such as in Barnes et al., Stat&Comp 2012).