Estimating Mutation Rates from Clonal Tree Data
(using modeling to understand the immune system)
Steven H. Kleinstein
Department of Computer Science
Princeton University
The Immune System
• Protects the body from dangerous pathogens
  – viruses, bacteria, parasites
• Provides basis for vaccines (e.g., flu)
• Implicated in disease:
  – Autoimmune (Lupus, MS, Rheumatoid Arthritis)
  – Sepsis, Cancer
Relatively new science, began with Jenner in 1796
Understanding will lead to better diagnostics and therapies
Why Model the Immune System?
• Immune response involves the collective and coordinated response of 10^12 cells and molecules
• Distributed throughout body
  – blood, lymph nodes, spleen, thymus, bone marrow, etc.
• Interactions involve feedback loops and non-linear dynamics
• Experiments often require artificial constructs
• High variability observed in experimental results
Somatic Hypermutation: important component of response
Experiments provide only a static window onto the real dynamics of immunity
Talk Outline
• Background Immunology
1. What are clonal trees?
2. What can we learn from them?
3. Method for estimating the hypermutation rate
   – Simulation Method
   – Analytical Method
4. Results on simulation and experimental data
• Summary, Conclusions and Open Problems
Overview of Humoral Immune Response
[Figure: schematic with labels Foreign Pathogen (Antigen, Ag), B cells, Germinal Center, Effector Cells]
Germinal Centers are the Site of Affinity Maturation (selection for cells with affinity-increasing mutations)
More about Germinal Centers
• Hundreds of germinal centers in spleen
• Composed of many cell types (B, T, FDC)
• Spatial structure that changes over time
• Driven by stochastic mutation process
• Observation involves sacrifice of animal
While experiments have elucidated basic mechanisms, how these fit together is not well understood
How do we learn about somatic hypermutation?
1. Inject mouse with antigen to provoke an immune response
2. Harvest spleen, cross-section and stain to identify proliferating cells
[Figure: stained spleen section showing B cells and T cells]
3. Microdissect groups of cells
4. Sequence DNA coding for B cell receptor
Experimental Microdissection Data
How can we estimate the rate of mutation?
Estimating the B Cell Mutation Rate
Counting mutations is not sufficient: the number of cell divisions at time of measurement is unknown
Also, problems of mutation schedule and positive/negative selection
[Figure: Number of Mutations (y-axis) vs. Number of Cell Divisions (x-axis), with lines for High Mutation Rate and Low Mutation Rate and a level marking the Observed Number of Mutations]
Clonal Tree Data from Micro-Dissection
Clonal trees from sequence data based on pattern of shared mutations
Germline  GGGATTCTC
1         -C-----G-
2         -------G-
3         A------GA
4         A---C--GA
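The pattern of shared mutations in this alignment can be recovered mechanically. The following is a minimal Python sketch, assuming that "-" means "same base as the germline" and that positions are numbered from 1; the names `mutations`, `GERMLINE` and `SEQUENCES` are illustrative and not from the talk.

```python
GERMLINE = "GGGATTCTC"

# Aligned sequences from the slide; "-" is assumed to mean "same as germline".
SEQUENCES = {
    1: "-C-----G-",
    2: "-------G-",
    3: "A------GA",
    4: "A---C--GA",
}


def mutations(aligned, germline=GERMLINE):
    """Return the set of (position, germline_base, new_base) differences."""
    return {
        (pos, g, base)
        for pos, (g, base) in enumerate(zip(germline, aligned), start=1)
        if base != "-" and base != g
    }


muts = {name: mutations(seq) for name, seq in SEQUENCES.items()}

# Mutations carried by every sequence belong on the branch nearest the germline.
shared_by_all = set.intersection(*muts.values())

for name in sorted(muts):
    print(name, sorted(muts[name]))
print("shared by all:", sorted(shared_by_all))
```

Here mutation 8 (T to G) is carried by all four sequences, mutations 1 (G to A) and 9 (C to A) are shared by sequences 3 and 4, and mutations 2 (G to C) and 5 (T to C) are private to sequences 1 and 4. Grouping sequences by the mutations they share is what fixes the branching order of the tree.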
Clonal tree ‘shapes’ reflect underlying dynamics
[Figure: clonal tree rooted at Germline for sequences 1 to 4, with branch labels 8(T→G), 2(G→C), 1(G→A), 9(C→A), 5(T→C)]
Clonal Tree ‘Shape’ Reflects Underlying Dynamics
[Figure: example clonal trees grown from an Initial Sequence, with sampled cells labeled A, B, C, D and branch mutations labeled a, c, g, t]
Investigate with computer simulation of B cell clonal expansion
Simulate clonal trees with known mutation rate and number of divisions
Compare:
  Rate of 0.2 division^-1 for 14 divisions
  Rate of 0.4 division^-1 for 7 divisions
Use simulation to identify relevant shape measures
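A toy version of such a simulation can be sketched in Python. This is not the simulation used in the talk: it assumes, purely for illustration, that every cell divides synchronously, that each daughter independently gains one new unique mutation with probability equal to the rate, and that there is no selection or cell death. Under that assumption both settings being compared have the same expected number of mutations per sequence (0.2 × 14 = 0.4 × 7 = 2.8), which is why counting alone cannot separate them. The function names are hypothetical.

```python
import random


def simulate_clone(rate, divisions, rng):
    """Expand one founder cell for `divisions` synchronous rounds of division.

    Assumption (not from the talk): at each division every daughter cell
    independently gains one new, unique mutation with probability `rate`.
    A cell is a tuple of the mutation IDs inherited along its lineage.
    """
    cells = [()]
    next_id = 0
    for _ in range(divisions):
        daughters = []
        for cell in cells:
            for _ in range(2):
                if rng.random() < rate:
                    daughters.append(cell + (next_id,))
                    next_id += 1
                else:
                    daughters.append(cell)
        cells = daughters
    return cells


def sample_summary(cells, n_sequences, rng):
    """Sample sequences from the clone and return simple summary measures."""
    sample = rng.sample(cells, n_sequences)
    mean_mutations = sum(len(c) for c in sample) / n_sequences
    unique_mutations = len({m for c in sample for m in c})
    unique_sequences = len(set(sample))
    return mean_mutations, unique_mutations, unique_sequences


if __name__ == "__main__":
    rng = random.Random(1)
    for rate, divisions in [(0.2, 14), (0.4, 7)]:
        clone = simulate_clone(rate, divisions, rng)
        print(rate, divisions, "expected mutations/sequence:",
              round(rate * divisions, 2), sample_summary(clone, 6, rng))
```

Measures computed on the sample beyond the mean mutation count (here, the number of unique mutations and of unique sequences) are the kind of quantity that can differ between the two settings even when the mean does not.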
[Figure: bar chart of Average Number (y-axis, 0 to 11) for each candidate shape measure: Edges, Vertices, Nodes, Leaves, Leaf Cells, Intermediate Vertices, Intermediate Cells, Repeat Nodes, Repeat Cells, Intermediate Nodes, Intermediate Node Cells, Germline, Unique Mutations, Total Mutations, Root Cells, Any Root]
Intermediate Vertices is Useful Measure
Compare:
  Rate of 0.2 division^-1 for 14 divisions
  Rate of 0.4 division^-1 for 7 divisions
[Figure (simulation data): Intermediate Vertices (y-axis, 0.2 to 1.4) vs. Total Mutations Per Sequence (x-axis, 1.5 to 2.5)]
Shape measures can supplement information from mutation counting
Method for Estimating Mutation Rate
Find mutation rate that produces distribution of tree ‘shapes’ most equivalent to observed set of trees
[Figure: Likelihood (log scale, 1.E-39 to 1.E-25) vs. Mutation Rate (per division), 0.10 to 0.50]
Assumes equivalent mutation rate in all trees, although number of divisions may differ
Also developed analytical method based on same underlying idea (Kleinstein, Louzoun and Shlomchik, The Journal of Immunology, In Press)
[Diagram: Experimental Observations give a Set of Observed Tree Shapes; a Mutation Rate fed into the Simulation of B cell expansion gives a Distribution of Tree Shapes; the two are compared (=). Shape measures: number of mutations, intermediate vertices, sequences at root]
Simulation Method
For each mutation rate (μ) and lethal fraction (λ)…
1. Run simulation many times to fill in equivalent and observable matrices
   (rows: Tree 1 … Tree n; columns: # divisions 1 … d)
   Observable, O(t,d): reasonable clone size?
   Equivalent, E(t,d): same tree shape?
2. Likelihood of experimentally observed tree t:
   L(t \mid \mu, \lambda) = \frac{\sum_d E(t,d)}{\sum_d O(t,d)}
3. Likelihood of experimental dataset:
   L(\mu, \lambda) = \prod_t L(t \mid \mu, \lambda)
Used version of Golden Section Search to optimize mutation rate (μ)
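The likelihood steps and the one-dimensional search can be sketched in Python. This is a minimal illustration under stated assumptions: E(t,d) and O(t,d) are taken to be count matrices stored as lists of rows (one row per tree, one column per number of divisions), the per-tree likelihood is the ratio of their sums over divisions, and the dataset likelihood is the product over trees, accumulated in log space to avoid underflow. The golden section routine is the textbook algorithm for a unimodal function, not necessarily the variant used in the talk, and all function names are hypothetical.

```python
import math


def tree_likelihood(equivalent_row, observable_row):
    """Likelihood of one tree: sum_d E(t,d) / sum_d O(t,d)."""
    total_observable = sum(observable_row)
    if total_observable == 0:
        return 0.0
    return sum(equivalent_row) / total_observable


def dataset_log_likelihood(E, O):
    """Log of the product over trees of the per-tree likelihoods."""
    total = 0.0
    for e_row, o_row in zip(E, O):
        lik = tree_likelihood(e_row, o_row)
        if lik == 0.0:
            return float("-inf")
        total += math.log(lik)
    return total


def golden_section_maximize(f, lo, hi, tol=1e-4):
    """Textbook golden section search for the maximum of a unimodal f on [lo, hi]."""
    invphi = (math.sqrt(5) - 1) / 2
    c = hi - invphi * (hi - lo)
    d = lo + invphi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            # The maximum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - invphi * (hi - lo)
            fc = f(c)
        else:
            # The maximum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + invphi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2
```

In use, `f` would rerun the simulation at a candidate mutation rate, refill the two matrices, and return `dataset_log_likelihood(E, O)`. Because each evaluation is a noisy simulation estimate, the unimodality that golden section search relies on holds only approximately.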
Validating the Simulation Method
Use simulation to construct artificial data sets with limited number of trees/sequences reflecting currently available experimental data
[Figure: Predicted Mutation Rate (per division) vs. Actual Mutation Rate (per division), both axes 0 to 0.8; linear fit y = 1.2073x, R^2 = 0.9149]
Method works even with limited number of clonal trees and sequences
Results for simulation method
Estimate of method precision (SD = 0.035 division^-1)
Analytical Method
Formulas to approximate tree shapes…
Minimize error X(μ) over all experimentally observed trees (t)
X(\mu) = \sum_t \min_d \left[ \frac{(M_t - \hat{M})^2}{\mathrm{VAR}(M)} + \frac{(R_t - \hat{R})^2}{\mathrm{VAR}(R)} + \frac{(P_t - \hat{P})^2}{\mathrm{VAR}(P)} \right]
(M_t, R_t, P_t: observed shape of tree t; \hat{M}, \hat{R}, \hat{P}: calculated shape for d divisions)
For each observed tree, choose number of divisions to minimize error
Shape measures:
• M: the average number of mutations per sequence
• R: the average number of sequences present at the root of the tree
• P: total number of sequences in nodes with repeated sequences
[Equations for the calculated M, R and P are not legible in this transcript]
Validating the Analytical Method
[Figure: Predicted Mutation Rate (per division) vs. Actual Mutation Rate (per division), both axes 0 to 0.8; linear fit y = 1.075x, R^2 = 0.9435]
Use simulation to construct artificial data sets with limited number of trees/sequences reflecting currently available experimental data
Method works even with limited number of clonal trees and sequences
Experimental Data from Autoimmune Response
Data set consists of 31 trees from 7 mice, average 6 sequences / tree
Data set particularly well-suited for estimating mutation rate
• Defined pick sizes provide upper-bound on clone size
• Small picks (< 50 cells) minimize positive selection
Micro-dissection from extra-follicular areas in MRL/lpr AM14 heavy chain transgenic mice (William, Euler, Christensen, and Shlomchik. Science. 2002)
[Figure: stained section showing B cells (Idiotype+ RF), T cells (CD3+) and single micro-dissections]
Stems Pruned to Reflect Local Expansion
Closely related cells remain in close spatial proximity
Mutation information retained in local branching structure
Long stems may be artifact of local micro-dissection
[Figure: tree drawn over Time from the Initial Sequence down to the Pick; the Stem is removed, leaving a New tree root]
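The pruning step can be sketched in Python under an assumed reading of "stem": the chain of nodes leading from the initial sequence that have no sampled sequences and exactly one child. The tree representation (a dict mapping node name to a record with `sequences` and `children`) and the function name are hypothetical, chosen only for illustration.

```python
def prune_stem(tree, root):
    """Walk down from `root` past unsampled single-child nodes.

    Returns (new_root, removed_nodes). A node is treated as part of the stem
    when no sampled sequence sits on it and it has exactly one child; this
    definition is an assumption, not taken from the talk.
    """
    removed = []
    node = root
    while tree[node]["sequences"] == 0 and len(tree[node]["children"]) == 1:
        removed.append(node)
        node = tree[node]["children"][0]
    return node, removed


example = {
    "initial": {"sequences": 0, "children": ["a"]},
    "a": {"sequences": 0, "children": ["b"]},
    "b": {"sequences": 2, "children": ["c", "d"]},
    "c": {"sequences": 1, "children": []},
    "d": {"sequences": 3, "children": []},
}

new_root, stem = prune_stem(example, "initial")
print("new root:", new_root, "stem removed:", stem)
```

After pruning, mutations accumulated along the stem no longer count toward the tree, so only the local branching structure within the pick informs the estimate.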
Mutation Rate in an Autoimmune Response
[Figure: Estimated Mutation Rate (0 to 1.2E-03) vs. Fraction of FWR Replacement Mutations Lethal (λ), 0 to 0.6; series: Simulation Estimate, Analytical Estimate, Analytical Estimate (10,000)]
Consider range of values for lethal mutation frequency (λ)
Likely value based on FWR R:S Ratios
Estimated mutation rate is 0.7 x 10^-3 to 0.9 x 10^-3 base-pair^-1 division^-1
Mutation Rate in the Primary NP Response
Data set consists of 23 trees, average 7 sequences / tree
(Jacob et al., 1991), (Jacob and Kelsoe, 1992), (Jacob et al., 1993) and (Radmacher et al., 1998)
Estimated mutation rate is 0.9 x 10^-3 to 1.1 x 10^-3 base-pair^-1 division^-1
Data based on many large picks:
• Clone sizes may be very large. In estimation, limited to:
  – 10^3 in simulation estimate
  – 10^4 in analytical estimate
• Positive selection clearly a factor
Still evaluating validity of methods for this NP data set
[Figure: Estimated Mutation Rate (0 to 1.2E-03) vs. Fraction of FWR Replacement Mutations Lethal (λ), 0 to 0.6; series: Simulation Estimate, Analytical Estimate (10,000), Analytical Estimate (15,000)]
Summary for Estimating Mutation Rate
• Mutation rate reflected in clonal tree ‘shapes’
• Developed simulation and analytical methods
• Synthetic datasets to estimate precision
• Applied to autoimmune response & NP response
• Most accurate mutation rate estimate to date
• Future improvements with additional data
Noisy Optimization
[Figure: Likelihood (2.50E-27 to 5.00E-27) vs. Simulation Runs per Likelihood Evaluation (0 to 320,000)]
Can we adapt the number of runs at each parameter setting to the minimum required to determine which direction to look?
How many runs should be used for each likelihood evaluation?
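One way to get a rough feel for this question is the standard error of a Monte Carlo proportion. The sketch below is a simplification, not the talk's analysis: it treats a single matrix entry as a binomial proportion p estimated from independent runs, whereas the full likelihood is a product of many such ratios. The function names are hypothetical.

```python
import math


def standard_error(p, runs):
    """Standard error of a proportion p estimated from `runs` independent runs."""
    return math.sqrt(p * (1.0 - p) / runs)


def runs_needed(p, relative_error):
    """Smallest run count whose standard error is at most relative_error * p."""
    return math.ceil((1.0 - p) / (p * relative_error ** 2))


if __name__ == "__main__":
    p = 0.001  # illustrative probability that a simulated tree matches an observed shape
    for runs in (64000, 128000, 256000):
        print(runs, standard_error(p, runs) / p)
```

Since the error falls only as the inverse square root of the number of runs, halving the noise costs four times the simulation effort, which is the motivation for spending runs adaptively rather than uniformly.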
Bias in Simulation Method
[Figure: Estimated Mutation Rate (0.25 to 0.45) vs. Number of Sequences per Dataset (0 to 60)]
Can we remove the bias from the simulation method?
No bias is present in the analytical method
[Figure: Predicted vs. Actual Mutation Rate (per division) for the simulation method; linear fit y = 1.2073x, R^2 = 0.9149]
Acknowledgements
For more information: [email protected], www.cs.princeton.edu/~stevenk
Yoram Louzoun (Bar-Ilan University)
Mark Shlomchik (Yale University)
(Kleinstein, Louzoun and Shlomchik, The Journal of Immunology, In Press)
PICASso
Program in Integrative Information, Computer and Application Sciences
• Interdisciplinary Computational Seminars
  – Graduate student-oriented
  – Wednesdays at noon
• PICSciE Computational Colloquium
  – Tuesdays at 4pm
• Information and mailing list (see website)
www.cs.princeton.edu/picasso
Summary of Experimental Data
An  Total number of cells used to generate the sequences
Un  Number of unique sequences used to create the tree
Tn  The average number of mutations per sequence
Mn  The number of unique mutations in the tree
In  The number of intermediate nodes (with sampled sequences)
Rn  The number of sampled sequences at the root of the tree

Autoimmune Response
Mouse  Pick/Tree   An  Un    Tn  Mn  In  Rn
2205   5a2,3       30   7  1.38  11   0   2
2205   5a5         20   6  1.00   7   0   2
2205   5f1,2.K    100   2  0.17   1   0   5
2205   5f1,2.L    100   1  0.00   0   0   5
2205   5f1,2.M    100   1  0.00   0   0   2
2540   11f,g       85   6  2.17   7   2   0
2540   11g1        30   2  1.20   3   0   0
4270   10j1        10   1  0.00   0   0   6
4270   10j2        20   3  1.83   6   1   0
5281   14c1        30   3  6.00   3   1   0
5281   14a1,3      20   3  0.18   2   0   9
5281   14c2        10   3  2.00   4   1   2
7976   16b3.A      15   3  2.17   6   1   0
7976   16b3.B      15   2         1   0   1
7976   16c1        50   1  0.00   0   0   5
7976   16c2.A      20   3  0.29   2   0   5
7976   16c2.B      20   1  0.00   0   0   7
7976   16d1        20   3  0.25   2   0   6
7976   16d2        20   2  0.14   1   0   6
7976   16c3.A      10   2  0.75   1   0   1
7976   16c3.B      10   1  0.00   0   0   1
7983   17a4.A      10   2  0.33   2   0   3
7983   17a4.B      10   2  0.25   1   0   2
7983   17a4.C      10   1  0.00   0   0   1
7983   17a5.A      50   6  3.25  15   1   1
7983   17a5.B      50   1  0.00   0   0   1
4641   12a2,b1     17   3         2   1   2
4641   12c1        10   1  0.00   0   0   6
4641   12c2         6   2  0.14   1   0   6
4641   12d2        50   4  2.80   9   1   1
4641   12e1,2      20   2  0.09   1   0  10

Primary NP Response
Germinal Center  Day  An  Un  Mn  In  Rn
61AM40             8   ?   7  11   1   1
61AM41.A           8   ?   5   7   1   0
61AM41.B           8   ?   3   3   1   0
61AM14             8   ?   6  10   0   7
61AM16             8   ?  11  20   1   2
61AB08            10   ?   2   1   0   1
B12               10   ?   3   3   0   1
B17.A             10   ?   2   1   0   2
B17.B             10   ?   3   2   0   1
B17.C             10   ?   2   4   0   0
L1AB01            10   ?   2   1   0   8
L1AB02            10   ?   2   1   0   9
L1AB03            10   ?   2   1   0   1
L1AD01.A          14   ?   5  11   1   4
L1AD01.B          14   ?   2   1   0   1
L1AD02            14   ?   7   9   3   3
L1AD03            14   ?   4   9   2   0
L1AD05            14   ?   7  10   4   0
61AD01            14   ?   7   9   3   1
61AD02            14   ?  10  51   4   0
61AA02            16   ?   5   7   1   0
GC8               16   ?  10  23   6   0
GC24              16   ?  14  38   6   0