Analysis and Modeling of AfCS Data

What do we need and how do we get it?

What do we need?

Statistical Data Analysis

Mechanistic Modeling

Thinkin’ about it.

Paul

Rama, Madhu

Michal, Gil, Lily, Madhu

The fabled sweet spot.

Some basic vocabulary

• Input: an experimental condition and/or treatment.
– Ligands
– siRNA
– Toxins

• Observables
– Calcium concentration
– RNA expression
– Chemotactic index

• Features
– Peak calcium concentration
– Basal calcium concentration

Statistical Data Analysis

• Did an observable change significantly during a treatment?

• What features of an observable change significantly?

• What groups of observables change in a correlated fashion across a number of treatments/measurements?

• Which inputs and observables are statistical prerequisites for other observables?

Mechanistic Models

• “Mechanistic” Models are a series of assertions about the causal structure and dynamics of a system.

• Examples

– Logical: A is necessary for B to occur
– Statistical: P(A,B) != P(A)*P(B)
– Deterministic: dB/dt = k1*A – k2*B
– Stochastic: dP(X,t)/dt = Σ_X' [ W_X'X(X',t) P(X',t) – W_XX'(X,t) P(X,t) ]

(I won’t talk about spatial today.)
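To make the deterministic and stochastic readings of the same mechanism concrete, here is a minimal sketch for the example dB/dt = k1*A – k2*B. It assumes A is held constant and treats the stochastic version as a birth-death process simulated with the Gillespie algorithm. The function names and parameter values are illustrative assumptions, not part of the original slides.

```python
import math
import random


def deterministic_b(t, a, k1, k2, b0=0.0):
    """Closed-form solution of dB/dt = k1*A - k2*B for constant A."""
    b_ss = k1 * a / k2  # steady state, where production balances decay
    return b_ss + (b0 - b_ss) * math.exp(-k2 * t)


def gillespie_b(t_end, a, k1, k2, b0=0, rng=None):
    """One stochastic realization of the same scheme as a birth-death process.

    B is produced at rate k1*A and each B molecule decays at rate k2.
    Returns the integer copy number of B at time t_end.
    """
    rng = rng or random.Random()
    t, b = 0.0, b0
    while True:
        birth, death = k1 * a, k2 * b
        total = birth + death
        if total == 0.0:
            return b  # no reaction can fire any more
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            return b
        if rng.random() * total < birth:
            b += 1
        else:
            b -= 1


def mean_gillespie_b(n_runs, t_end, a, k1, k2, seed=0):
    """Average many stochastic runs; for this linear scheme the mean
    follows the deterministic solution."""
    rng = random.Random(seed)
    runs = (gillespie_b(t_end, a, k1, k2, rng=rng) for _ in range(n_runs))
    return sum(runs) / n_runs
```

Individual stochastic runs scatter around the deterministic curve. For this linear scheme the average over runs agrees with the ODE, which is not generally true for nonlinear mechanisms.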

What do we need?

• Most Alliance data is geared to statistical data analysis

• For modeling we get:
– What is there? (e.g. Ryanodine receptors and IP3 receptors?) (Transcription)
– Input/Output relations between ligands/siRNA and “outputs” (protein phosphorylation, Ca2+ traces as a function of ligand and knock down)
– Measures of uncertainty in these relations.

We have an unprecedented set of quality-controlled data to get started.

Capturing Mechanistic Knowledge

• The FXM is using PathwayBuilder to capture “Mechanistic” knowledge
– But there are still necessary and useful abstractions that are used

• We NEED highly-curated, biochemically reasonable pathways
– Molecule pages are making great strides

• We MUST annotate model uncertainties
– Confidences in the existence of an interaction
– Confidences in the type of mechanism

• We MUST annotate (relative) parameter value ranges

Pathway Builder Representation

• Initially very abstract
• But every “process” may be assigned a model
• Right now
– different pathways must be made for each mechanistic hypothesis.
• But shortly we will be able to encode parameter uncertainty.

Abstract concepts can be modeled

Levels of Abstraction

• In the current AfCS release of PathwayBuilder the “Futile Cycle” is just graphical.

• But it is possible to assign an abstract model to that box to encode a phenomenological model of all the interactions within.

FXM Map

[Figure: FXM map; labeled elements: C5a, UDP, cytosolic calcium]

Paring down models

• Current calcium models do NOT capture the path between receptors and calcium.

• But they do set the fundamental “response circuit” for calcium dynamics
– Given an initial change in IP3 and DAG, what calcium dynamics do you expect given the expression of different channels and of calcium and IP3 receptors?

• So what ARE all the paths from receptors to effects on the calcium transients?

• How do they regulate each other and interact?

Current calcium measurements

• Current calcium models don’t explain the differences between ligand responses or their variability.

[Figure: C5a response and UDP response calcium traces, annotated with candidate features: peak height, peak width, final calcium, upslope, downslope, downslope variability?]
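The features named on this slide can be computed directly from a sampled trace. The sketch below is one possible set of definitions, not the ones the AfCS analysis uses. It assumes the stimulus arrives right after the first `n_baseline` samples, that the response is a single peak, and that width is measured at half of the rise above baseline. The function name and arguments are hypothetical.

```python
def trace_features(times, values, n_baseline=5, n_final=3):
    """Reduce one calcium trace to a few scalar features.

    Assumes the stimulus arrives right after the first n_baseline samples
    and that the response is a single peak above baseline.
    """
    basal = sum(values[:n_baseline]) / n_baseline
    final = sum(values[-n_final:]) / n_final
    i_peak = max(range(len(values)), key=lambda i: values[i])
    peak = values[i_peak]

    # Peak width: span of samples at or above half of the rise over baseline.
    half = basal + 0.5 * (peak - basal)
    above = [i for i, v in enumerate(values) if v >= half]
    width = times[above[-1]] - times[above[0]]

    # Mean slopes from the last baseline sample up to the peak, and from
    # the peak down to the end of the trace.
    t_stim, t_peak, t_end = times[n_baseline - 1], times[i_peak], times[-1]
    upslope = (peak - basal) / (t_peak - t_stim) if t_peak > t_stim else float("nan")
    downslope = (final - peak) / (t_end - t_peak) if t_end > t_peak else float("nan")

    return {
        "basal": basal,
        "peak_height": peak,
        "peak_width": width,
        "final": final,
        "upslope": upslope,
        "downslope": downslope,
    }
```

Applying the same function to every trace gives one feature vector per experiment, which is the form the statistical questions earlier in the talk need.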

Single Cell Data

Q: oscillation: C5a (high dose); UDP (low dose)

Stolen from Lily Jiang

New Feature

• Fraction of cells with each CLASS of response
• Particularities of each response

Modeling Issues

• How do we explain data variability with models?
• Mathematical representation of models

Model explanations

dX/dt = f(X, p, w)

The time-dependent behavior of X depends on:
• Initial conditions of X
• Exact values of p
• The nature of the uncertainty, w

E.g. a pair of coupled equations for dx/dt and dy/dt in x and y, with the rate constants k0, k1, k2, k3 as the parameters p.
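The three dependencies listed for dX/dt = f(X, p, w) can be separated in a small simulation. This is a sketch under stated assumptions: the uncertainty w is modeled as additive white noise of strength `sigma`, integration uses the Euler-Maruyama scheme, and `relaxation` is a hypothetical one-variable model chosen for illustration, not the two-variable example from the slide.

```python
import math
import random


def simulate(f, x0, p, sigma=0.0, t_end=10.0, dt=0.01, seed=0):
    """Euler-Maruyama integration of dX = f(X, p) dt + sigma dW.

    Returns the list of X values at t = 0, dt, 2*dt, ..., t_end.
    """
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(int(round(t_end / dt))):
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        xs.append(xs[-1] + f(xs[-1], p) * dt + noise)
    return xs


def relaxation(x, p):
    """Hypothetical model: constant production k_in, linear decay k_out."""
    k_in, k_out = p
    return k_in - k_out * x


# Same parameters, different initial conditions: the transients differ.
from_low = simulate(relaxation, 0.0, (2.0, 1.0))
from_high = simulate(relaxation, 5.0, (2.0, 1.0))

# Same initial condition, different parameters: the end state differs.
other_p = simulate(relaxation, 0.0, (4.0, 1.0))

# Same model and start, different noise realizations: the traces differ.
noisy_a = simulate(relaxation, 0.0, (2.0, 1.0), sigma=0.5, seed=1)
noisy_b = simulate(relaxation, 0.0, (2.0, 1.0), sigma=0.5, seed=2)
```

Explaining data variability with a model then amounts to deciding how much of the spread to attribute to each of these three sources.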

Bistability: Parameter dependence

A simple model of the positive feedback

[Figure: stationary state [FAK-I] plotted against kc, with monostable, weakly bistable, and irreversibly bistable regimes; kc = 1.6 is marked. The reaction scheme involves A, A-p, and B-p.]

kc – catalytic constant for the trans-autophosphorylation.
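How the number of stationary states depends on a feedback parameter can be illustrated with a generic positive-feedback rate law. The equations of the trans-autophosphorylation model on this slide are not given in the text, so the sketch below swaps in a hypothetical one-variable model (basal production, Hill-type self-activation scaled by `kc`, linear loss). Its parameter values are not comparable to the kc = 1.6 of the slide.

```python
def rate(x, kc, basal=0.05):
    """dx/dt for a hypothetical positive-feedback loop (not the FAK model)."""
    return basal + kc * x * x / (1.0 + x * x) - x


def steady_states(kc, basal=0.05, x_max=10.0, n=10000):
    """Locate steady states by scanning for sign changes of dx/dt on a grid.

    Returns a list of (x, is_stable) pairs; a state is stable when dx/dt
    crosses zero from positive to negative as x increases.
    """
    states = []
    prev_x, prev_f = 0.0, rate(0.0, kc, basal)
    for i in range(1, n + 1):
        x = x_max * i / n
        fx = rate(x, kc, basal)
        if prev_f * fx < 0.0:  # a sign change brackets a root
            states.append((0.5 * (prev_x + x), prev_f > 0.0))
        prev_x, prev_f = x, fx
    return states


weak_feedback = steady_states(1.6)    # one stable state: monostable
strong_feedback = steady_states(2.5)  # stable, unstable, stable: bistable
```

Sweeping `kc` and recording the states traces out a bifurcation diagram of the kind shown on the slide, with the bistable window appearing once the feedback is strong enough.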

Exogenous noise

kc=1.6

Endogenous Noise

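[Figure: stationary state Xss versus E on logarithmic axes (E from 0.3 to 2, Xss from 0.005 to 1), comparing the deterministic curve (det) with curves for p = 0, p = ½, and p = 1; E0, E½, and E1 are marked.]

[Equations: a stochastic differential equation for X* with a noise term f(E) dB_t, its stationary-state condition, and the noise amplitude f(E) written in terms of E_ss and p.]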

Dynamical Noise Effects

[Figure: X* versus E, with tiny noise on E+ and without noise on E+]

We are NOT talking about space

• Though we could…

Model Sensitivity and Features

Can be used to bound parameter values

Model Building and (In)Validation with Data Collaboration

Matt Onsum, Ryan Feeley, Michael Frenklach & Andrew Packard

Given a set of mechanistic models, we can determine which model is the most consistent with data.

Set of Models

Parameter Uncertainty

Data

Check for Consistency

1. Consistent Models

2. Invalidated models

3. Information on constrained data/parameters

Model Invalidation

An experiment consists of:
– Measured observable, D: features of the data
– Experimental tolerance in measuring the observable, e
– Mathematical model, M(ρ), showing dependency on the active variables ρ ∈ R^n
– A set of acceptable values for ρ. Since each parameter of the model has uncertainty, there exists a hypercube, H, of possible values for ρ.

The experiment actually asserts an inequality constraint among the active variables:

|M(ρ) − D| < e, ρ ∈ H.

Therefore we set up the following constrained optimization:

Subject to
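The consistency question can be sketched as a search over the hypercube for a parameter vector that satisfies every assertion |M − D| < e. The brute-force grid search below is only a stand-in for the constrained optimization used in the actual method: finding a point shows consistency, but failing to find one on a grid does not by itself prove invalidation. `toy_model`, the grid size, and all numbers are illustrative assumptions.

```python
import itertools


def find_consistent(experiments, bounds, n_grid=41):
    """Search the hypercube H for a rho satisfying every |M(rho) - D| < e.

    experiments: list of (model, D, e) triples, model being a function of rho.
    bounds: list of (low, high) pairs, one per parameter, defining H.
    Returns the first grid point that satisfies all assertions, or None.
    """
    axes = [
        [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
        for lo, hi in bounds
    ]
    for rho in itertools.product(*axes):
        if all(abs(model(rho) - d) < e for model, d, e in experiments):
            return rho
    return None


def toy_model(rho):
    """Hypothetical stand-in for M(rho)."""
    return rho[0] + rho[1]


H = [(-1.0, 1.0), (-1.0, 1.0)]

# One experiment the toy model can explain, and one it cannot.
consistent = find_consistent([(toy_model, 0.5, 0.1)], H)
unreachable = find_consistent([(toy_model, 3.0, 0.1)], H)

# Two assertions that are each satisfiable but not jointly.
conflicting = find_consistent(
    [(toy_model, 0.5, 0.1), (toy_model, 1.0, 0.1)], H
)
```

The third case is the interesting one for a shared dataset: each experiment alone is compatible with the model, and only the combination rules it out.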

Model/Data Consistency

Much can be accomplished in this optimization framework

• Check the consistency of the assertions. Does there exist a parameter vector ρ satisfying all of the assertions?
– Invalidate proposed mechanisms
– Quick tests to indicate likely sources of inconsistency
– Subsets of the assertions may be readily considered
• The (deterministic) experimental uncertainties are directly transferred into prediction uncertainties.
• Generate a “best fit” parameter (more on this in the next slide)

Typical Data Processing

Given:
– A priori knowledge: −1 ≤ ρ_k ≤ 1 for each component ρ_k of the parameter vector ρ ∈ R^n.
– An experiment: (M(ρ), D, e) with ρ ∈ R^n.

From this, all that can be concluded is |M(ρ) − D| < e.

But, typically the procedure is:
– Freeze all parameters except one at the nominal value: ρ_k = 0 for k ≠ k0
– Find the range of the investigated (unfrozen) parameter:

max/min ρ_k0
subject to: ρ_k = 0 for k ≠ k0
−1 ≤ ρ_k0 ≤ 1
|M(ρ) − D| < e

The reported range is a subset of what can actually be inferred from (M(ρ), D, e), but the implied higher-dimensional cube (the new, in-literature feasible set) neither contains, nor is a subset of, the feasible parameter set.

[Figure: the feasible set |M(ρ) − D| ≤ e inside the square −1 ≤ ρ_1, ρ_2 ≤ 1]
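The claim that the one-at-a-time cube neither contains nor is contained in the true feasible set can be checked on a two-parameter toy case. The sketch assumes a hypothetical model M(ρ) = ρ_1 + ρ_2 with D = 0 and e = 0.5, so that the feasible set is a diagonal band; these choices are illustrative and not taken from the slides.

```python
D, E = 0.0, 0.5  # hypothetical measured value and tolerance


def model(rho):
    """Hypothetical M(rho) in which the two parameters compensate each other."""
    return rho[0] + rho[1]


def feasible(rho):
    """True if rho lies in the prior cube and satisfies |M(rho) - D| < e."""
    return all(-1.0 <= r <= 1.0 for r in rho) and abs(model(rho) - D) < E


def one_at_a_time_range(k0, n_params=2, n_grid=2001):
    """Range of rho[k0] with every other parameter frozen at its nominal 0."""
    ok = []
    for i in range(n_grid):
        rho = [0.0] * n_params
        rho[k0] = -1.0 + 2.0 * i / (n_grid - 1)
        if feasible(rho):
            ok.append(rho[k0])
    return min(ok), max(ok)


def in_reported_box(rho, box):
    """True if every component of rho lies inside its reported range."""
    return all(lo <= r <= hi for r, (lo, hi) in zip(rho, box))


# The "in-literature" cube built from the two separately reported ranges.
box = [one_at_a_time_range(k) for k in range(2)]

inside_box_but_infeasible = (0.4, 0.4)   # the sum 0.8 violates the data
feasible_but_outside_box = (1.0, -1.0)   # the sum 0.0 fits the data
```

Each reported range is about (−0.5, 0.5). The point (0.4, 0.4) sits inside the resulting box but contradicts the experiment, while (1, −1) agrees with the experiment but would be excluded by the box.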

Mistakes in Isolation

[Figure: two panels, labeled E66 and E67, each plotting C against A on axes from −1 to 1]

Related work: Consistency of Methane Combustion Database

• GRI-Mech has 300+ elementary reactions, 53 species, and 102 “active” parameters
• The community needed a database containing all relevant experimental info for methane combustion to determine the “right” kinetic parameters
• It was realized that the best-fit parameter values did not give the combustion model good predictive power
• Needed a way to better incorporate uncertainty about the parameter values

Pathway diagram for methane combustion [Turns]

Michael Frenklach, Andrew Packard, Pete Seiler and Ryan Feeley, “Collaborative data processing in developing predictive models of complex reaction systems,” International Journal of Chemical Kinetics, vol. 36, issue 1, pp. 57-66, 2004.

Michael Frenklach, Andy Packard and Pete Seiler, “Prediction uncertainty from models and data,” 2002 American Control Conference, pp. 4135-4140, Anchorage, Alaska, May 8-10, 2002.

Sensitivity of Data Set Consistency to Assertions

[Figure: sensitivity of dataset consistency to the lower and upper bounds of each assertion. Two panels plot sensitivity (scale ×10^-3) against parameter number (up to 100); two panels plot sensitivity (0 to 0.4) against dataset unit number, i.e. feature number (up to 60).]

Modeling goal for the AfCS data

• Distinguish and rank competing mechanistic models
• Propose experiments that will further distinguish competing models
• Identify structural problems with current models

• In the examples that follow pathway 1 will be the “true model”

• Data were generated by simulating the true model with random initial conditions. Each initial condition was assumed to be Gaussian with the nominal mean and a variance of 0.01. This was run 1000 times and the resulting trajectories were averaged to give trace data.

• We then try to find the maximum parameter uncertainty that still allows us to identify the true pathway.
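The data-generation recipe above (Gaussian initial conditions with variance 0.01, 1000 runs, averaged) can be sketched as follows. The toy pathways themselves are not specified in the text, so the "true model" here is a hypothetical linear cascade A → B → (loss) with rate constants `k1` and `k2`, solved in closed form; only the sampling and averaging steps mirror the description.

```python
import math
import random


def simulate_b(a0, b0, k1, k2, times):
    """B(t) for the hypothetical cascade A -> B -> loss (requires k1 != k2)."""
    return [
        b0 * math.exp(-k2 * t)
        + a0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
        for t in times
    ]


def averaged_trace(times, nominal=(1.0, 0.0), variance=0.01, n_runs=1000,
                   k1=1.0, k2=0.5, seed=0):
    """Average B(t) over runs whose initial conditions are Gaussian.

    Each initial condition is drawn around its nominal mean with the given
    variance. Samples are not clipped, so slightly negative initial values
    can occur.
    """
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    total = [0.0] * len(times)
    for _ in range(n_runs):
        a0 = rng.gauss(nominal[0], sd)
        b0 = rng.gauss(nominal[1], sd)
        for i, b in enumerate(simulate_b(a0, b0, k1, k2, times)):
            total[i] += b
    return [s / n_runs for s in total]


times = [0.5 * i for i in range(21)]
trace = averaged_trace(times)
```

Features such as peak value and rise time would then be read off `trace` and compared against each candidate pathway over its allowed parameter ranges.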

Distinguishing similar pathways

Example 1: Test for degenerate solutions

Features:
4- Peak value

Example 2: Test for complex formation

Feature:

Example 3: Test for reversibility

Features:
1- Rise Time (i.c. 1,1)
2- Peak value
3- Rise Time (i.c. 2,2)
5- Rise Time after pretreatment

Example 4: Test for missing intermediate

Summary of Toy Examples

1. For large parameter and experimental uncertainty, multiple models can fit the same data.

2. Some experiments provide tighter constraints than others.

3. Repeats of these experiments (reduction of uncertainty) improve our ability to distinguish similar pathways.

Initial data is dose response

[Figure: calcium dose-response traces for C5a at 1 uM, 0.5, 0.25, 0.1, and 0.05; vertical axis from 0 to 60, horizontal axis from 1 to 155]

Can we use legacy models to explain AfCS data?

We began with the model by Goldbeter

[Figure: panels for steady-state Ca2+ and peak output, with upper-bound and lower-bound bars; labeled parameters: formation of active G-protein, inactivation of G-protein]

Weisner model fits a single response

Simulations of base model show two sensitive parameters.

However, we could not fit the model to the dose response data.

The model was not able to reproduce the change in steady state values

[Figure: steady-state calcium level, with lower-bound and upper-bound bars for passive ER Ca2+ leak, Km ion pump, Vmax ion exchanger, and Vmax Ca2+ ATPase pump]

Conclusions

• Method for model validation

• Showed that it can distinguish between canonical pathways even with high uncertainty

• We have begun invalidating literature models

So what do we need? (Experiment)

• A number of well-chosen knock-downs upstream of calcium and in “independent” parts of the different receptor pathways. (Accurate assessment of loss of function, induction)

• Ways of separating exogenous variability from endogenous variability from measurement noise.

• Measurement of the dose-response of intermediates (not just calcium) for the single FXM ligands and

• Determination of a set of physiologically relevant and significantly affected features.

• Similarly for double-ligand responses.

• Single cell assays should be expanded!

So what do we need? (Analysis)

• Identification of important features in the data that we wish to explain.

• Determination of value and variance of significantly changing features.

• Figure out a consistent way to classify single cell responses.

So what do we need? (Modeling)

• Biochemists and geneticists editing the maps and making hypotheses
– Perhaps we should have a model hypothesis page as an addendum to Henry’s?

• An initial “frozen” data set to be the test bed for all initial modeling discussions.

• An initial “frozen” analysis thereof

• A series of “minimal” pathways derived from the FXM maps that are believed to be the significant determinants of our output signals.

• Choice of mathematical picture and inference about the significance of the single cell responses.

• A direct way of driving experiments from models.

Acknowledgements

• Matt Onsum
• Ryan Feeley
• Andrew Packard
• Michael Frenklach
• Michael Samoilov
• Alex Gilman

[Photos: Matt, Andy, Mike]

The Alliance and especially the FXM

Lily Jiang, Madhu Natarajan, Gil Sambrano
