Dynamic modelling of microarray data
Martino Barenco, Institute of Child Health / UCL
Goal: predict targets of a known transcription factor in a complex response using dynamic models and time course microarray data.
HVDM: Hidden Variable Dynamic Modelling
Outline
1) Principle + Results (Genome Biology 2006)
2) Techniques (R/Bioconductor implementation: rHVDM)
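Gene expression model
Transcript concentration $X_j(t)$:
$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$
- $B_j$, $S_j$: transcription rates (basal rate and sensitivity)
- $D_j$: degradation rate
- $f(t)$: transcription factor activity
[Figure: transcription factor activity $f(t)$ and the resulting $X_j(t)$ profiles. Reference case: $B_j = 0$, $S_j = 3$, $D_j = 1$. Variations shown: $B_j$ = 0 or 10, $D_j$ = 1 or 0.1, $S_j$ = 3 or 6.]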
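The effect of the kinetic parameters on the transcript profile can be explored numerically. The sketch below integrates the model equation with a classical Runge-Kutta scheme. The activity profile `activity` is a made-up pulse chosen for illustration, not the p53 activity from the talk, and the function name `simulate` is hypothetical.

```python
import math

def simulate(B, S, D, f, x0, t_end, dt=0.01):
    """Integrate dX/dt = B + S*f(t) - D*X with classical RK4."""
    def rhs(t, x):
        return B + S * f(t) - D * x
    n_steps = int(round(t_end / dt))
    t, x = 0.0, x0
    trajectory = [(t, x)]
    for _ in range(n_steps):
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, x + dt * k1 / 2)
        k3 = rhs(t + dt / 2, x + dt * k2 / 2)
        k4 = rhs(t + dt, x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
        trajectory.append((t, x))
    return trajectory

def activity(t):
    """Hypothetical transcription factor activity: a pulse peaking at t = 2 h."""
    return t * math.exp(-t / 2.0)

# Reference parameter values from the slide, then a slower degradation rate.
fast = simulate(B=0.0, S=3.0, D=1.0, f=activity, x0=0.0, t_end=12.0)
slow = simulate(B=0.0, S=3.0, D=0.1, f=activity, x0=0.0, t_end=12.0)
```

With the smaller degradation rate the transcript accumulates and decays more slowly, so the profile lags further behind the activity pulse.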
Algorithm principle
I) Training step
Inputs:
- Previous biological knowledge: known targets of the transcription factor
- Expression values of those targets
Outputs:
- Transcription factor activity (the hidden variable)
- Kinetic parameters for the training genes
II) Screening step (for each single gene)
Inputs:
- Transcription factor activity
- Expression profile of the gene
Output:
- Dependency status of the gene: target or not?
Both steps use the same model equation:
$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$
- Training step: j runs over the training genes
- Screening step: j is the individual gene being screened
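The p53 network
[Figure: diagram of the p53 network. Nodes shown: DNA damage, ATM, active ATM, CHK2, p53, active p53, MDM2, p19Arf, E2F1, Rb, Rb/E2F1, CDK4, p73, 14-3-3, Jun-B, p21, Bax, p53AIF, Puma, Fas, Pidd, DR5, bcl2, myb, Jun. Outcomes shown: cell cycle G1/S arrest, G2/M arrest, survival, death receptor apoptosis, mitochondrial apoptosis.]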
Experimental setup
Human T cells (MOLT4, p53 wild-type) were subjected to 5 Gy irradiation.
mRNA was harvested 2, 4, 6, 8, 10 and 12 hours after irradiation, and just before irradiation (0 h time point).
Affymetrix microarrays (HG-U133) were then run.
The experiment was run in triplicate.
Results of training step: activity profile of p53
Screening
Q: which other genes are activated by p53?
Putative p53 targets must both:
a) Fit the model well
b) Have a sensitivity coefficient $S_j > 0$
$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$
Gene title | Gene symbol | Affymetrix identifier | Model score M | Sensitivity (Z-score)
damage-specific DNA binding protein 2, 48kDa | DDB2 | 203409_at | 18.74 | 18.24
CD38 antigen (p45) | CD38 | 205692_s_at | 36.69 | 14.77
ferredoxin reductase | FDXR | 207813_s_at | 79.82 | 13.19
hypothetical protein FLJ22457 | FLJ22457 | 221081_s_at | 60.45 | 11.01
tripartite motif-containing 22 | TRIM22 | 213293_s_at | 41.36 | 10.99
carnitine O-octanoyltransferase | CROT | 204573_at | 84.4 | 10.98
glutaminase 2 (liver, mitochondrial) | GLS2 | 205531_s_at | 42.83 | 10.28
leucine-rich repeats and death domain containing | LRDD | 219019_at | 78.8 | 9.9
hect domain and RLD 5 | HERC5 | 219863_at | 37.65 | 9.55
cyclin G1 | CCNG1 | 208796_s_at | 17.04 | 9.37
BCL2-interacting killer | BIK | 205780_at | 19.43 | 9.35
activating signal cointegrator 1 complex subunit 3 | ASCC3 | 212815_at | 60.34 | 9.26
sestrin 1 | SESN1 | 218346_s_at | 8.37 | 9.25
p53 target zinc finger protein | WIG1 | 219628_at | 41.33 | 9.19
tumor necrosis factor receptor superfamily, member 10b | TNFRSF10B | 209295_at | 27.34 | 9.05
chromosome 6 open reading frame 4 | C6orf4 | 215411_s_at | 86.45 | 8.81
cyclin-dependent kinase inhibitor 1A (p21) | CDKN1A | 202284_s_at | 24.98 | 8.4
etoposide induced 2.4 mRNA | EI24/PIG8 | 216396_s_at | 88.04 | 8.2
mitogen-activated protein kinase kinase kinase kinase 4 | MAP4K4 | 206571_s_at | 62.88 | 7.54
lymphoid-restricted membrane protein | LRMP | 204674_at | 26.92 | 7.36
xeroderma pigmentosum, group C | XPC | 209375_at | 43.09 | 7.36
TNF (ligand) superfamily, member 4 (Ox40L) | TNFSF4 | 207426_s_at | 34.73 | 7.15
Human cleavage/polyadenylation specificity factor | CPSF1 | 33132_at | 77.75 | 7.09
AMP-activated protein kinase, beta 1 subunit | PRKAB1 | 201834_at | 25.72 | 7.01
transducer of ERBB2, 1 | TOB1 | 202704_at | 92.69 | 6.79
p53-inducible cell-survival factor | P53CSV | 218403_at | 48.33 | 6.5
sortilin-related receptor, L(DLR class) | SORL1 | 203509_at | 15.66 | 6.34
Fas (TNF receptor superfamily, member 6) | FAS | 216252_x_at | 44.31 | 6.23
ribonucleotide reductase M1 polypeptide | RRM1 | 201477_s_at | 46.58 | 6.19
archaemetzincins-2 | AMZ2 | 218167_at | 37.48 | 6.16
galactose-3-O-sulfotransferase 4 | GAL3ST4 | 219815_at | 38.62 | 5.97
growth arrest and DNA-damage-inducible, alpha | GADD45A | 203725_at | 84.23 | 5.89
hypothetical protein FLJ11259 | FLJ11259 | 218627_at | 7.23 | 5.87
major histocompatibility complex, class I, B | HLA-B | 209140_x_at | 89.77 | 5.79
testis specific, 10 | TSGA10 | 220623_s_at | 20.85 | 5.67
hypothetical protein MDS025 | MDS025 | 218288_s_at | 31.35 | 5.66
TP53 activated protein 1 | TP53AP1 | 209917_s_at | 22.22 | 5.65
leukemia inhibitory factor | LIF | 205266_at | 14.86 | 5.62
interferon stimulated exonuclease gene 20kDa-like 1 | ISG20L1 | 219361_s_at | 48.55 | 5.56
p21: part of the training set
CD38: uncovered by screening
Verification experiment
siRNA knock-down of p53:
HVDM predictions:
Ingredients needed
1) ODE integrator:
$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t) \;+\; \text{parameter values} \;\longrightarrow\; X_{j,\mathrm{MODEL}}(t)$$
2) Model fitting: find a set of parameter values such that
$$X_{j,\mathrm{MODEL}}(t) \cong X_{j,\mathrm{DATA}}(t)$$
3) Want to take measurement noise in the data into account
4) Specifically for the Bioconductor implementation: be reasonably quick
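1) ODE integration
[Figure: $X_{j,\mathrm{MODEL}}(t)$ plotted against time, 0 to 12 h.]
- Want to estimate the slope of $X_{j,\mathrm{MODEL}}(t)$ at t=6
- Slope = weighted sum of time points around t=6:
$$\frac{d\mathbf{X}_{j,\mathrm{MODEL}}(t)}{dt} \cong A\,\mathbf{X}_{j,\mathrm{MODEL}}(t)$$
- The model equation becomes
$$A\,\mathbf{X}_j(t) = B_j \mathbf{1} + S_j \mathbf{f}(t) - D_j \mathbf{X}_j(t)$$
i.e. the ODE is turned into a system of linear equations
Formal solution:
$$\mathbf{X}_{j,\mathrm{MODEL}}(t) = (A + D_j I)^{-1}\,(B_j \mathbf{1} + S_j \mathbf{f}(t))$$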
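The formal solution above can be sketched in a few lines. The slides do not specify how the differentiation matrix A is built, so this sketch assumes second-order finite differences on a uniform time grid. The function names and the linear activity profile are hypothetical.

```python
def differentiation_matrix(n, h):
    """Second-order finite-difference matrix A with (A x)[i] ~ dx/dt at t_i."""
    A = [[0.0] * n for _ in range(n)]
    # One-sided stencils at the two ends of the time course.
    A[0][0], A[0][1], A[0][2] = -3 / (2 * h), 4 / (2 * h), -1 / (2 * h)
    A[-1][-3], A[-1][-2], A[-1][-1] = 1 / (2 * h), -4 / (2 * h), 3 / (2 * h)
    # Central differences in the interior.
    for i in range(1, n - 1):
        A[i][i - 1], A[i][i + 1] = -1 / (2 * h), 1 / (2 * h)
    return A

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def model_profile(B, S, D, f_values, h):
    """Return X_model = (A + D I)^-1 (B 1 + S f) on a uniform time grid."""
    n = len(f_values)
    A = differentiation_matrix(n, h)
    lhs = [[A[i][j] + (D if i == j else 0.0) for j in range(n)] for i in range(n)]
    rhs = [B + S * f for f in f_values]
    return solve(lhs, rhs)

times = [0, 2, 4, 6, 8, 10, 12]        # sampling times of the experiment (hours)
f_values = [float(t) for t in times]   # hypothetical activity, linear in time
profile = model_profile(B=10.0, S=2.0, D=1.0, f_values=f_values, h=2.0)
```

No step-by-step integration is needed: one linear solve per gene gives the model profile at all measured time points, which helps with the speed requirement.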
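2) Model fitting
1) Start with a "random" set of parameters:
$$p = \{B_1, \ldots, B_m,\; S_1, \ldots, S_m,\; D_1, \ldots, D_m,\; f\}$$
2) Compute a solution:
$$\mathbf{X}_{j,\mathrm{MODEL}}(t) = (A + D_j I)^{-1}\,(B_j \mathbf{1} + S_j \mathbf{f}(t)), \qquad j = 1, \ldots, m$$
3) Compare with data using a merit function, in which each residual is normalised by the measurement error $\sigma(X_j(t_i))$:
$$M(p) = \sum_{j=1}^{m} \sum_{i=1}^{n} \left( \frac{\hat{X}_j(t_i) - X_j(t_i)}{\sigma(X_j(t_i))} \right)^2$$
with m genes (j) and n time points (i).
4) Vary p systematically until a minimum value for M(p) is reached.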
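The merit function in step 3 is a weighted sum of squared residuals over genes and time points. A minimal sketch, with made-up numbers for two genes at three time points (the function name `merit` is hypothetical):

```python
def merit(model, data, sigma):
    """Sum over genes j and time points i of ((model - data) / sigma)^2."""
    total = 0.0
    for model_j, data_j, sigma_j in zip(model, data, sigma):
        for m, d, s in zip(model_j, data_j, sigma_j):
            total += ((m - d) / s) ** 2
    return total

# Rows are genes, columns are time points (illustrative values only).
data = [[10.0, 14.0, 12.0], [5.0, 5.5, 5.2]]
sigma = [[1.0, 2.0, 1.0], [0.5, 0.5, 0.1]]
model = [[11.0, 12.0, 12.0], [5.0, 6.0, 5.3]]
score = merit(model, data, sigma)
```

Dividing by sigma means a residual only counts heavily when it is large compared with the measurement error at that point.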
Fitting algorithms:
- Originally used a simplex-based method (Nelder-Mead) (GB paper)
- Followed by an MCMC step to determine confidence intervals (GB paper)
- rHVDM (Bioconductor) uses Levenberg-Marquardt (gradient-based). A by-product is the Hessian, which allows confidence intervals to be computed.
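To illustrate how a Hessian leads to confidence intervals, the sketch below treats a simplified sub-problem rather than rHVDM's own code. For a fixed degradation rate and a known activity, the model is linear in $B_j$ and $S_j$, so $X = B\,u + S\,v$ for two basis profiles u and v. The profiles and data used here are made up, and the 95% interval assumes Gaussian measurement errors.

```python
import math

def fit_with_intervals(u, v, data, sigma):
    """Weighted least squares for data ~ B*u + S*v, with 95% half-widths.

    The merit function is M = sum(((B*u_i + S*v_i - data_i) / sigma_i)**2).
    Its Hessian is H = 2 * N with N = J^T W J, and the Gaussian
    approximation to the parameter covariance is N^-1 = 2 * H^-1.
    """
    w = [1.0 / s ** 2 for s in sigma]
    n_uu = sum(wi * ui * ui for wi, ui in zip(w, u))
    n_uv = sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))
    n_vv = sum(wi * vi * vi for wi, vi in zip(w, v))
    g_u = sum(wi * ui * di for wi, ui, di in zip(w, u, data))
    g_v = sum(wi * vi * di for wi, vi, di in zip(w, v, data))
    det = n_uu * n_vv - n_uv ** 2
    # Invert the 2x2 normal matrix N explicitly.
    cov = [[n_vv / det, -n_uv / det], [-n_uv / det, n_uu / det]]
    B = cov[0][0] * g_u + cov[0][1] * g_v
    S = cov[1][0] * g_u + cov[1][1] * g_v
    half_width = [1.96 * math.sqrt(cov[0][0]), 1.96 * math.sqrt(cov[1][1])]
    return (B, S), half_width

u = [1.0, 1.0, 1.0, 1.0]          # hypothetical basal-rate profile
v = [0.0, 1.0, 2.0, 3.0]          # hypothetical activity-driven profile
data = [2.0, 5.0, 8.0, 11.0]      # noise-free data generated with B=2, S=3
(B_fit, S_fit), half_width = fit_with_intervals(u, v, data, [1.0] * 4)
```

A sensitivity whose interval comfortably excludes zero is evidence of dependence on the transcription factor. An interval that straddles zero is not.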
Difference between MCMC and LM confidence intervals.
[Figure: four bar-chart panels. Basal rates, sensitivity and degradation are shown for probes 203409_at, 218346_s_at, 209295_at, 202284_s_at and 205780_at. Transcription factor activity (sample 1) is shown at time points 1 to 7.]
Importance of confidence intervals
Biological data is inherently noisy. We do not want to assume that measurements are exact.
Example: genes with a flat profile would be a good fit to the equation (with $S_j = 0$).
It is essential to identify these situations in order to detect targets of the transcription factor.
$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$
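Parameter count reduction / identifiability
$$\frac{dX_j(t)}{dt} = B_j + S_j f(t) - D_j X_j(t)$$
Replace $f(t)$ with $f(t) = \alpha g(t) + \beta$:
$$\begin{aligned}
\frac{dX_j(t)}{dt} &= B_j + S_j(\alpha g(t) + \beta) - D_j X_j(t) \\
&= (B_j + S_j \beta) + \alpha S_j g(t) - D_j X_j(t) \\
&= \tilde{B}_j + \tilde{S}_j g(t) - D_j X_j(t)
\end{aligned}$$
The rescaled activity $g(t)$ with parameters $\tilde{B}_j$ and $\tilde{S}_j$ fits the data exactly as well as $f(t)$ with $B_j$ and $S_j$.
Solution: let $S_{p21} = 1$ (removes the $\alpha$ ambiguity) and $f(0) = 0$ (removes the $\beta$ ambiguity). The parameter count is reduced by 2.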
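The ambiguity can be checked numerically: rescaling and shifting the activity, and absorbing the change into the basal rate and sensitivity, leaves the right-hand side of the model unchanged. All values below are made up for illustration.

```python
def rhs(B, S, D, activity, x):
    """Right-hand side of the model: B + S*f - D*x."""
    return B + S * activity - D * x

# Illustrative parameter values and activity samples.
B, S, D = 2.0, 1.5, 0.4
alpha, beta = 3.0, 0.7
g = [0.0, 1.0, 2.5, 1.2]                # rescaled activity g(t)
f = [alpha * gi + beta for gi in g]     # f(t) = alpha*g(t) + beta

B_tilde = B + S * beta                  # offset absorbed into the basal rate
S_tilde = alpha * S                     # scale absorbed into the sensitivity

x = 4.0                                 # any transcript level
original = [rhs(B, S, D, fi, x) for fi in f]
rescaled = [rhs(B_tilde, S_tilde, D, gi, x) for gi in g]
max_gap = max(abs(a - b) for a, b in zip(original, rescaled))
```

Since both parameter sets give the same derivative at every time point, the data alone cannot tell them apart. That is why one sensitivity and one activity value have to be fixed.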
Importance of confidence intervals II
Solution: measure one of the kinetic parameters independently and integrate that measurement into the fitting:
Initial fitting:
Measurement error
Algorithmic speed
Parameter identifiability
Parameter count reduction
Confidence intervals
Acknowledgements
- Sonia Shah (Bloomsbury Centre for Bioinformatics)
- Dan Brewer (Institute of Cancer Research)
- Crispin Miller (Paterson Institute for Cancer Research)
- Daniela Tomescu (ICH)
- Mike Hubank (ICH)
- Robin Callard (ICH)
- Jaroslav Stark (CISBIC, Imperial College)