
Theoretical Physics Meets Cellular Biology: A Few Case Studies

Some theoretical physicists are asking whether general physical principles can constrain and explain biological behavior.

Their goal is to find common principles, applicable to systems ranging in complexity from bacteria to brains.

They use physics insights to provide novel analyses of biological data (and suggest new types of experiment).

Curtis Callan, Physics Department, Princeton University

9/3/2009, Founder's Day, IOP Bhubaneswar


A Sampling of Promising Research Themes

• Small-number fluctuations impose limits on biochemical signaling:
  • Information capacity of gene regulatory elements (on/off or better?)
  • Accuracy of pattern formation in embryogenesis (fruit fly studies)

• Biological systems evolve in high-dimensional spaces; what reduced-dimension quantities should we measure and attempt to explain?
  • Worm (C. elegans) motions span a low-dimensional shape space
  • We can infer a full multi-variable probability distribution from measured pairwise correlations (by maximum entropy). This actually works in:
    • Neurobiology (multi-neuron firing patterns in the retina)
    • Cellular signaling (activation of protein kinases in a cascade)
    • Evolution (what distribution on amino acids defines a protein)

• Statistical inference (in the sense of particle physics and cosmology) can be used to fully exploit modern “high-throughput” biological experiments:
  • Use data on transcription factor binding to infer a probability distribution on binding energy model parameters, rather than on genomic binding loci
  • Design new gene regulation experiments to fully exploit statistical inference
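The pairwise maximum-entropy idea can be illustrated with a minimal sketch for a handful of binary units (for instance neurons coded 0 = silent, 1 = firing). It assumes exact enumeration of all 2^n states, which is only feasible for small n, and plain gradient ascent on the log-likelihood; analyses of larger data sets need Monte Carlo estimates of the model moments. The function names are hypothetical, not taken from the talk.

```python
import itertools

import numpy as np


def all_states(n):
    """All 2**n configurations of n binary units coded 0/1 (e.g. silent/firing)."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)


def model_moments(h, J, states):
    """<x_i> and <x_i x_j> under P(x) proportional to exp(h.x + x.J.x / 2)."""
    log_w = states @ h + 0.5 * np.einsum("si,ij,sj->s", states, J, states)
    p = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    p /= p.sum()
    means = p @ states
    pair = states.T @ (p[:, None] * states)
    return means, pair


def fit_pairwise_maxent(target_means, target_pair, n_iter=5000, lr=0.5):
    """Gradient ascent on the log-likelihood: adjust h, J until model moments match data.

    J is kept symmetric with zero diagonal (x_i**2 == x_i, so the diagonal is redundant with h).
    """
    n = len(target_means)
    states = all_states(n)
    h = np.zeros(n)
    J = np.zeros((n, n))
    for _ in range(n_iter):
        means, pair = model_moments(h, J, states)
        h += lr * (target_means - means)
        step = lr * (target_pair - pair)
        np.fill_diagonal(step, 0.0)
        J += step
    return h, J
```

Given measured means ⟨x_i⟩ and pairwise moments ⟨x_i x_j⟩, the returned h and J define the least-structured distribution consistent with them; its predictions for higher-order statistics (e.g. how often k units fire together) are what can then be compared with data.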


Transcription Factors Turn Genes On/Off:

Special proteins (transcription factors, TFs) in the cytosol bind to promoter DNA to help (or hinder) RNA polymerase (RNAP) copy a gene into mRNA.

The RNAP protein complex makes an mRNA copy of the gene. The ribosome translates triplets of bases into amino acids via the “genetic code”.

[Figure: TF proteins bind here, to a short (~20 bp) DNA sequence.]

TF-DNA binding energy depends on sequence. Generic, non-specific binding is weak; a TF “scans” the genome for strong binding sequences. These become occupied at the lowest [TF], and this determines which gene(s) a particular TF regulates.

A TF can have many binding sites: think “individuals” in a diverse “population”. The binding site sequence is the “genotype” and the associated energy is a quantitative “phenotype”. If we know the map between them, binding site evolution becomes a physics problem!


TF Binding Sites: Statistical At Best


Much is Known About TF Binding Energies:

We need energy as a function of sequence being “read” by the TF:

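[Figure: the TF reads successive windows of a genomic sequence, ….AACGCTTGCATAGCTACTGGACTACTTACATAGGTACCCCTG…., and each window gets an energy (example windows: E = 0.4 kT, E = 0.2 kT).]

Energy matrix (energy of each base at each of the six positions of the site; the consensus base at each position scores 0.0):

Position   1    2    3    4    5    6
A         1.2  1.8  5.6  2.5  0.0  6.0
C         3.7  0.0  0.5  1.2  5.2  0.0
G         2.9  0.1  0.8  0.6  1.3  1.4
T         0.0  3.0  0.0  0.0  3.1  3.2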

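A surprisingly simple, additive “energy matrix” (one energy per base at each position, summed over the site) captures sequence-dependent TF binding energy remarkably well: E(TGTGAC) = 0.0 + 0.1 + 0.0 + 0.6 + 0.0 + 0.0 = 0.7 in the example.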

We can use binding assay data from chip technology to infer the parameters of the energy matrix (we could even do massive direct energy measurements). From now on, we’ll study TFs where E(seq) is “known”.

[Figure: spot segments of intergenic DNA on a chip, bind a dye-labeled TF, and read out the fluorescence.]
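The additive energy-matrix model can be written in a few lines. This sketch hard-codes the 6-bp example matrix from this slide (its 24 values, read position by position in the order A, C, G, T, reproduce the quoted E(TGTGAC) = 0.7; units presumably kT, as in the figure labels). The helper names `site_energy` and `scan` are hypothetical, and the scan looks at one DNA strand only.

```python
# Hypothetical transcription of the slide's 6-bp example energy matrix.
# ENERGY[base][k] = energy cost of `base` at position k (0-based); the consensus base scores 0.
ENERGY = {
    "A": [1.2, 1.8, 5.6, 2.5, 0.0, 6.0],
    "C": [3.7, 0.0, 0.5, 1.2, 5.2, 0.0],
    "G": [2.9, 0.1, 0.8, 0.6, 1.3, 1.4],
    "T": [0.0, 3.0, 0.0, 0.0, 3.1, 3.2],
}
SITE_LEN = 6


def site_energy(site: str) -> float:
    """Additive energy-matrix model: sum the per-position energies of the bases in `site`."""
    if len(site) != SITE_LEN:
        raise ValueError(f"expected a {SITE_LEN}-bp site, got {len(site)} bp")
    return sum(ENERGY[base][k] for k, base in enumerate(site.upper()))


def scan(sequence: str) -> list:
    """Score every 6-bp window of one strand of `sequence` (reverse complement ignored)."""
    return [
        (i, sequence[i:i + SITE_LEN], site_energy(sequence[i:i + SITE_LEN]))
        for i in range(len(sequence) - SITE_LEN + 1)
    ]


print(round(site_energy("TGTGAC"), 2))  # → 0.7, the value quoted on the slide
print(site_energy("TCTTAC"))            # → 0.0 (consensus site)
hits = scan("AACGCTTGCATAGCTACTGGACTACTTACATAGGTACCCCTG")
print(min(hits, key=lambda hit: hit[2]))  # lowest-energy (strongest) window
```

Low energy means strong binding, so the windows a TF occupies first at low [TF] are those with the smallest scores.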


Evolution Through the Lens of Energy

Binding site function is governed largely by energy, so energy, not sequence, should be what is conserved in evolution. We study this by comparing “orthologous” binding sites between aligned intergenic sequences. Here’s a bacterial example:

[Figure: CRP binding sites in E. coli and Salmonella. CRP site sequence varies (a lot) between closely related species.]

Is energy more conserved than sequence? Yeast ABF1 is a great test case: 100s of binding sites, well-diverged family of sister species, accurate energy matrix is known.

Scatter plot of all orthologous energy pairs for all intergenic regions of S. cerevisiae vs. S. bayanus (40% sequence divergence).

For E > 1, ABF1 site energy randomizes between species; for E < 1 it is conserved (for hundreds of sites): selection acts on the energy phenotype.


ABF1 Binding Energy Evolves in a Systematic Way

The phylogenetic tree of well-sequenced yeasts enables a detailed study of site energy evolution statistics.

Order Scer sites by E, and show how sites of similar E evolve into E-clouds in other species. Do this for increasing divergence times (Scer to Spar, Sbay, …). Simulate infinite divergence time by randomly pairing Scer sites with each other. The actual data seem to relax toward this equilibrium. Binding sites do a random walk in E, like particles in a confining potential: why?

[Figure: E-clouds for Scer-Spar (tevo = 0.4), Scer-Sbay (tevo = 1.0) and randomly paired Scer-Scer sites (tevo = infinite).]

We find functional ABF1 sites by demanding conservation of orthologous site E across four species, which gives a clean sample of ~600 sites.


Population Genetics of Binding Site Evolution (I)

Consider two different “alleles” of a specific TF binding site (red, blue). Base changes arise by mutation:

Symbol represents a single yeast; color stands for allele of a particular Abf1 site.

Clonal population of yeast organisms, of fixed size N, living, reproducing, dying:

Background mutations occasionally cause a mutant allele to appear in one organism in the population of N.

Mutant allele usually dies out quite quickly.

It can, rarely, “sweep” the population (for finite N) even if the two alleles have the same fitness!

As time marches on, different sequence states of this allele sweep the population. The neutral rate for site a to go to site b is μ_ab. How do fitness effects change the rate?
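The statement that a neutral mutant can still sweep a finite population can be checked with a Moran model, a standard textbook model of a fixed-size clonal population (assumed here; the slide does not name a specific model). For a neutral allele the fixation probability of a single new mutant is the classical 1/N; the optional relative fitness `r` is a hypothetical parameter showing how selection changes it.

```python
import random


def moran_fixes(N, r, rng):
    """One Moran process from a single mutant; returns True if the mutant takes over.

    Each step one individual reproduces (chosen with probability proportional to
    fitness: mutant r, resident 1) and one individual, chosen uniformly, dies and is
    replaced by the offspring, so the population size stays exactly N.
    """
    i = 1  # current number of mutants
    while 0 < i < N:
        parent_is_mutant = rng.random() < r * i / (r * i + (N - i))
        dying_is_mutant = rng.random() < i / N
        i += int(parent_is_mutant) - int(dying_is_mutant)
    return i == N


def fixation_probability(N, r=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that one new mutant sweeps the population."""
    rng = random.Random(seed)
    return sum(moran_fixes(N, r, rng) for _ in range(trials)) / trials


N = 10
print(fixation_probability(N))  # close to the neutral value 1/N = 0.1
r = 1.5
print(fixation_probability(N, r), (1 - 1 / r) / (1 - r ** -N))  # simulation vs exact Moran result
```

Most runs end with the mutant lost after a few steps, matching “Mutant allele usually dies out quite quickly”; the rare sweeps are what turn the mutation rate into a substitution rate.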


Population Genetics of Binding Site Evolution (II)

Mutations will change site energies: whether they “fix” depends on “fitness”. Assume that the “fitness” of a site depends on its sequence s through its energy: F(E(s)).

Kimura-Ohta population genetics result: if mutation a → b leads to a fitness change ΔF_ab = F(b) – F(a), the fixation rate in a population of size N (how big?) becomes

The null site sequence distribution P0(s) satisfies detailed balance under the background rate μ_ab.

The functional site distribution Q(s) satisfies detailed balance under the K-O rate:

Q(s) = exp(2NF(s)) P0(s) or, since F depends only on E:

Q(E) = exp(2NF(E)) P0(E)

The fitness 2NF(E) can be read off from the E distribution of TF binding sites in one species! A neat idea of Mustonen & Lässig (2005).

To exploit this, we assume all sites {s_n} of a TF have the same F(E). Then the phenotype distribution of one site over time equals its distribution over “space” (the genomic sites of the TF).
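The fixation-rate formula itself is missing from this transcript. As an assumption (not necessarily the slide's exact expression), take the standard diffusion-limit Kimura form for a new mutant in a population of size N with small ΔF_ab; it reduces to the neutral rate μ_ab as ΔF_ab → 0, and it shows how Q = exp(2NF) P0 follows from detailed balance:

$$
r_{ab} = \mu_{ab}\,\frac{2N\,\Delta F_{ab}}{1 - e^{-2N\,\Delta F_{ab}}}, \qquad \Delta F_{ab} = F(b) - F(a).
$$

Since $\Delta F_{ba} = -\Delta F_{ab}$, writing $x = 2N\,\Delta F_{ab}$,

$$
\frac{r_{ab}}{r_{ba}} = \frac{\mu_{ab}}{\mu_{ba}}\cdot\frac{x/(1-e^{-x})}{-x/(1-e^{x})} = \frac{\mu_{ab}}{\mu_{ba}}\,e^{x}.
$$

Combining this with the null detailed balance $P_0(a)\,\mu_{ab} = P_0(b)\,\mu_{ba}$:

$$
Q(a)\,r_{ab} = Q(b)\,r_{ba} \quad\text{for}\quad Q(s) \propto e^{2NF(s)}\,P_0(s),
$$

because $\dfrac{Q(a)\,r_{ab}}{Q(b)\,r_{ba}} = e^{-2N\Delta F_{ab}}\cdot\dfrac{P_0(a)\,\mu_{ab}}{P_0(b)\,\mu_{ba}}\cdot e^{2N\Delta F_{ab}} = 1$ (the normalization constant can be absorbed into F).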


Specific Application to Yeast ABF1

• Run the energy matrix over all intergenic sites to get the E histogram (dots)

• Run it over a random genome to generate the null model P0(E) (blue line)

• Approx. 600 “excess” low-E sites give the functional distribution Q(E) (bars)

• Functional sites are confirmed by multi-genome E conservation

• Compute the E-dependent fitness via the relation 2NF(E) = log(Q(E)/P0(E))

• Using any genome on the yeast tree gives the same simple result (next panel)

• The Kimura-Ohta method implies that sites undergo stochastic evolution in a “potential well” U = -2NF(E)

• The stochastic “temperature” is set by the background point mutation rate.

[Figure: P0(E), Q(E), and the fitness log(Q(E)/P0(E)) plotted against E. Finite well depth lets sites “boil off”.]
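The recipe on this slide (histogram the functional-site energies, histogram a null model, take the log ratio) can be sketched with NumPy. This is a minimal sketch assuming both samples were already scored with the same energy matrix and are binned on common edges; empty bins are returned as NaN rather than smoothed, and the function name is hypothetical.

```python
import numpy as np


def fitness_from_energies(functional_E, null_E, bins):
    """Estimate 2N F(E) = ln(Q(E) / P0(E)) on common energy bins.

    functional_E: energies of functional sites (a sample from Q(E))
    null_E:       energies of windows in a random genome (a sample from P0(E))
    bins:         bin edges (or a bin count) passed to np.histogram
    Returns (bin centres, 2N F(E)); bins where either histogram is empty give NaN.
    """
    q, edges = np.histogram(functional_E, bins=bins, density=True)
    p0, _ = np.histogram(null_E, bins=edges, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore", invalid="ignore"):
        two_n_f = np.where((q > 0) & (p0 > 0), np.log(q / p0), np.nan)
    return centres, two_n_f


# The "potential well" on the slide is then U(E) = -2N F(E), e.g.:
# centres, two_n_f = fitness_from_energies(functional, null, bins=np.linspace(0, 10, 21))
# U = -two_n_f
```

Because only the ratio Q/P0 enters, the estimate is insensitive to an overall normalization, which is consistent with F being defined only up to an additive constant.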


Simulated ABF1 Binding Site Evolution

• Start with a list of functional ABF1 sites in the initial genome, {s}_Q1.

• Derive the point mutation rate matrix μ_ba from the average Scer/Sbay substitution rate in intergenic regions (or from synonymous codon usage).

• Use the Gillespie algorithm to evolve the sites in {s}_Q1 over time t, using K-O rates based on μ_ba and the fitness F(E) to assess attempted single-base substitutions.

• Generate a simulated sample {s}_Q2 of ABF1 sites in the evolved genome. Simulate the phylogenetic tree by cloning sites at tree nodes and using branch times.

• Individual simulated site pairs carry no useful information, but statistics of site ensembles can be usefully compared with data.

[Figure: diagnostic histograms of ΔE values for evolved site pairs (panels: Scer-Spar, Scer-Smik, Scer-Sbay). Solid line: simulated data. Dark bars: real data.]

Parameter-free account of site evolution as a process dominated by energy!
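The Gillespie step described above can be sketched for a single site. Several assumptions differ from the real analysis and are labeled in the code: a uniform neutral rate `mu` for every base change (instead of the measured μ_ba matrix), the 6-bp example matrix from the energy-matrix slide as a stand-in for the ABF1 matrix, a hypothetical square-well fitness `well`, and the same diffusion-limit Kimura rate assumed earlier. All function names are hypothetical.

```python
import math
import random

BASES = "ACGT"

# Stand-in energy matrix (the 6-bp example from the energy-matrix slide, NOT the real ABF1 matrix).
ENERGY = {
    "A": [1.2, 1.8, 5.6, 2.5, 0.0, 6.0],
    "C": [3.7, 0.0, 0.5, 1.2, 5.2, 0.0],
    "G": [2.9, 0.1, 0.8, 0.6, 1.3, 1.4],
    "T": [0.0, 3.0, 0.0, 0.0, 3.1, 3.2],
}


def site_energy(site):
    return sum(ENERGY[b][k] for k, b in enumerate(site))


def well(E, depth=6.0, edge=1.0):
    """Hypothetical fitness 2N F(E): a square well of finite depth for E below `edge`."""
    return depth if E < edge else 0.0


def ko_rate(mu, x):
    """Substitution rate mu * x / (1 - exp(-x)) with x = 2N dF; equals mu when x == 0."""
    if x == 0.0:
        return mu
    if x > 0:
        return mu * x / -math.expm1(-x)          # 1 - e^{-x} = -expm1(-x)
    return mu * x * math.exp(x) / math.expm1(x)  # same expression, overflow-safe for x << 0


def gillespie_site(site, mu, t_max, rng, energy=site_energy, two_n_fitness=well):
    """Evolve one site by single-base substitutions until time t_max; return the final site."""
    site, t = list(site), 0.0
    while True:
        f_now = two_n_fitness(energy(site))
        moves, rates = [], []
        for k, old in enumerate(site):
            for new in BASES:
                if new != old:
                    trial = site[:k] + [new] + site[k + 1:]
                    x = two_n_fitness(energy(trial)) - f_now
                    moves.append((k, new))
                    rates.append(ko_rate(mu, x))
        t += rng.expovariate(sum(rates))                # waiting time to the next substitution
        if t > t_max:
            return "".join(site)
        k, new = rng.choices(moves, weights=rates)[0]   # pick which substitution happens
        site[k] = new


rng = random.Random(0)
start = ["TCTTAC"] * 200                      # toy "clean sample" of identical sites
evolved = [gillespie_site(s, mu=1.0, t_max=2.0, rng=rng) for s in start]
dE = [site_energy(b) - site_energy(a) for a, b in zip(start, evolved)]
```

As on the slide, individual simulated pairs are not meaningful; only the histogram of `dE` over the ensemble would be compared with the observed ΔE distribution.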


What Have We Learned About Evolution?

• Binding sites of broad-acting TFs like yeast ABF1 are a favorable arena for studying evolution.
• A quantitative phenotype (energy) and its relation to genotype (site sequence) are available.
• The existence of hundreds of independent binding sites makes statistical comparisons meaningful.
• A quantitative fitness function derived from simplistic assumptions (universality/time independence) gives sensible results.
• The stochastic picture suggests informative experiments: direct energy measurement on a mass scale, directed genetic modification, …
• Our fitness function implies that the maximum selective advantage of a functional site is quite small: 2NΔF ~ 10 … but N is surely large!
• Modern sequencing technology might enable experiments to test whether tiny changes have stably observable effects (TBD).
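To see why “quite small” and “N is surely large” go together, the relation Q(E) = exp(2NF(E)) P0(E) can be turned around. In this worked arithmetic, E_in (bottom of the well) and E_out (outside it) are labels introduced here, and the population size is an illustrative assumption, not a value from the talk:

$$
\frac{Q(E_{\text{in}})/P_0(E_{\text{in}})}{Q(E_{\text{out}})/P_0(E_{\text{out}})} = e^{2N\Delta F} \approx e^{10} \approx 2.2\times 10^{4},
\qquad
\Delta F \approx \frac{10}{2N} = \frac{5}{N}.
$$

So functional sites are strongly enriched relative to the null model, yet for an assumed $N = 10^{8}$ the per-site fitness difference is only $\Delta F \approx 5\times 10^{-8}$.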


Fluctuations, Noise & Genetic Switches

• Many cellular processes depend on the presence (absence) of a small number of actors (TFs, signaling molecules, photons, …)

• The associated fluctuations and noise have a critical influence on how things work (not always fully appreciated):
  • Sensing chemical gradients (chemotaxis)
  • Flipping of genetic switches (development)
  • Can a gene do more than just be on or off?

• We have been exploring all of these questions, especially the development issue. We’ll take a quick tour of something neat that can be done with the fruit fly embryo.


Cell Fate in Fruit Fly Embryo Development

Nuclei need to know ‘where they are’ in order to make a cell fate choice (thorax vs. abdomen). They sense the concentration of Bicoid protein (Bcd) diffusing from the head of the embryo: if [Bcd] is big enough, they express Hb (red); if it is too small, Hb is not expressed (green). This is the mechanism for the cell fate choice.

[Figure: [Bcd] gradient from maternal Bcd RNA at the head; thorax ([Hb] high), abdomen ([Hb] low).]

Until generation 13, the nuclei live in a ‘syncytium’ without cell walls.
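How accurately a threshold readout can locate a boundary follows from one line of calculus. As an assumption (the slide only says Bcd diffuses from the head), take an exponential profile $c(x) = c_0\,e^{-x/\lambda}$ with decay length $\lambda$ and a Hb switching threshold $\theta$:

$$
x^{*} = \lambda \ln\frac{c_0}{\theta},
\qquad
\delta x = \left|\frac{dx}{dc}\right|\delta c = \lambda\,\frac{\delta c}{c}.
$$

For illustrative values $\delta c/c = 0.1$ and $\lambda = 0.2\,L$ (L the embryo length; both numbers are assumptions), the boundary is placed to within $\delta x \approx 0.02\,L$, so the precision of the positional decision is set directly by the fractional noise in reading [Bcd].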


[Figure: embryo stained for Hb, Bcd and DNA.]

Observations on Triply Immuno-stained Drosophila Embryos (Gregor et al.):

Positions of nuclei in embryo identified by DNA signal (blue)

Infer [Hb] and [Bcd] within each nucleus from R/G signals

Thousands (10³s) of nuclear data points give the input/output relation for the expression control system: a powerful tool for studying noise and fluctuations in gene expression.

Observe diffusion gradient of primary morphogen [Bcd]: above some threshold, it turns on [Hb]

Measure [Bcd] and [Hb] in each and every nucleus of a cycle-14 embryo with ~2300 nuclei.


Convert measurements to expression pdf:

Normalized noise in the readout of [g = Hb] from the input [c = Bcd]. Note the approx. 15% accuracy in the decision region. It would be hard to do better than this because of small-number fluctuations in the Bcd molecules arriving at the genomic site where expression of Hb is controlled.

[Figure panels: input vs. output; noise vs. input.]


Mutual information and the genetic switch

Can a noisy switch do more than just be “on” or “off”? How many bits can it set? The Shannon information between output [g] and input [c] is the canonical measure:

Non-negative, measured in “bits”: 0 if g and c are statistically independent; 1 if two levels can be sensed, ...

The input/output function P(g|c) reflects the physics of TF binding and transcription. We take it as fixed. It is probabilistic and defined by the measured mean and variance.

The quantities needed to evaluate P(g|c) have all been measured .. No need to know why it has the form it does.
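For discretized input and output levels, the canonical measure is I(c; g) = Σ P(c,g) log2[P(c,g) / (P_TF(c) P_exp(g))]. A minimal NumPy sketch follows; building P(g|c) as a Gaussian with the measured mean and variance at each c is an assumption about the noise shape, and the example response curve and noise level are hypothetical numbers.

```python
import numpy as np


def gaussian_channel(mean_g, sigma_g, g_grid):
    """P(g|c) on a grid of output levels, assuming Gaussian noise with the measured
    mean and standard deviation of g at each input level c (one row per c)."""
    mean_g = np.asarray(mean_g, dtype=float)[:, None]
    sigma_g = np.asarray(sigma_g, dtype=float)[:, None]
    g = np.asarray(g_grid, dtype=float)[None, :]
    w = np.exp(-0.5 * ((g - mean_g) / sigma_g) ** 2)
    return w / w.sum(axis=1, keepdims=True)


def mutual_information_bits(p_c, p_g_given_c):
    """I(c; g) = sum_{c,g} P(c,g) log2[ P(c,g) / (P_TF(c) P_exp(g)) ] for discretised levels."""
    p_c = np.asarray(p_c, dtype=float)
    channel = np.asarray(p_g_given_c, dtype=float)
    joint = p_c[:, None] * channel           # P(c, g)
    p_g = joint.sum(axis=0)                  # P_exp(g)
    mask = joint > 0                         # terms with P(c, g) = 0 contribute 0
    indep = (p_c[:, None] * p_g[None, :])[mask]
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep)))


# Hypothetical example: sigmoidal mean response, constant noise, uniform input over 50 levels.
c = np.linspace(0.0, 1.0, 50)
mean_response = 1 / (1 + np.exp(-(c - 0.5) / 0.05))
channel = gaussian_channel(mean_response, np.full(50, 0.1), np.linspace(-0.5, 1.5, 200))
print(mutual_information_bits(np.full(50, 1 / 50), channel))
```

A perfect on/off switch with equally likely states gives exactly 1 bit; noise that blurs the two output levels together pushes the value below 1.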


Information Maximization

The distributions PTF(c) and Pexp(g) are related through the fixed channel P(g|c). Take I(c,g) to be a functional of PTF(c) and maximize it. This gives the maximum mutual information for the regulated gene and identifies the distributions of input c and output g that achieve this optimum. The variational solution is very simple:

Results for MI are impressive: Iopt=1.7 bits, Idata=1.5 bits

Optimal distribution matches observed distribution of [Bcd]

Black is data, red is optimal
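The slide's variational solution is not preserved in this transcript. As a generic numerical stand-in (not the method used in the talk), the Blahut-Arimoto algorithm performs the same maximization: it finds the input distribution PTF(c) that maximizes I(c; g) for a fixed, discretized channel P(g|c), for instance one built from the measured mean and variance. The function name is hypothetical.

```python
import numpy as np


def blahut_arimoto(channel, n_iter=5000, tol=1e-12):
    """Maximise I(c; g) over the input distribution for a fixed channel P(g|c).

    channel: array of shape (n_c, n_g); row c is P(g|c).
    Returns (optimal input distribution, maximal mutual information in bits).
    """
    W = np.asarray(channel, dtype=float)

    def divergences(p):
        # D[c] = KL( P(g|c) || P(g) ) in nats, where P(g) = sum_c p(c) P(g|c)
        q = p @ W
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = np.where(W > 0, W * np.log(W / q), 0.0)
        return terms.sum(axis=1)

    p = np.full(W.shape[0], 1.0 / W.shape[0])  # start from a uniform input
    for _ in range(n_iter):
        p_new = p * np.exp(divergences(p))     # Blahut-Arimoto update
        p_new /= p_new.sum()
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    info_bits = float(p @ divergences(p)) / np.log(2)  # I = sum_c p(c) D[c]
    return p, info_bits
```

At the optimum every input level with nonzero probability has the same divergence D[c], which is the discrete analogue of the variational condition; comparing the returned `p` with the measured [Bcd] distribution mirrors the comparison shown on the slide.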


Bottom Line on Information Transmission

• Transcriptional regulation <-> communication channel, noise <-> measure of ‘regulatory power’

• Assuming optimality, we can predict the distribution of expression levels (Drosophila); without assuming optimality, we get (nontrivial) bounds on I

• Simple elements: around 1 to 3 bits

• Future: multiple genes
