mathematics and the life sciences in the 21 st century selection dynamics

Post on 22-Jan-2016

19 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

- PowerPoint PPT Presentation

TRANSCRIPT

Some Mathematical Challenges from Life Sciences

Part III

Peter Schuster, Universität WienPeter F.Stadler, Universität Leipzig

Günter Wagner, Yale University, New Haven, CTAngela Stevens, Max-Planck-Institut für Mathematik in den

Naturwissenschaften, Leipzigand

Ivo L. Hofacker, Universität Wien

Oberwolfach, GE, 16.-21.11.2003

1. Mathematics and the life sciences in the 21st century

2. Selection dynamics

3. RNA evolution in silico and optimization of structure and properties

OCH2

OHO

O

PO

O

O

N1

OCH2

OHO

PO

O

O

N2

OCH 2

OHO

PO

O

O

N3

OCH2

OHO

PO

O

O

N4

N A U G Ck = , , ,

3' - end

5 ' - end

Na

Na

Na

Na

5 '-end 3 ’-endGCG GAU AUUCG CUUA AG UUG GG A G CUG AAG A AGGUC UUCGAUC A ACCAGCUC GAG C CCAGA UCUGG CUGUG CACAG

3 '-end

5 ’-end

70

60

50

4030

20

10

Definition of RNA structure

0

421 8 16

10

19

9

14

6

13

5

11

3

7

12

21

17

22

18

25

20

26

24

28

272315 29 30

31

B in a r y seq u en c es a re e n co d edb y th e ir d ec im a l e q u iv a len ts :

= 0 a n d = 1 , fo r ex a m p le ,

" 0 " 0 0 0 0 0 =

" 1 4 " 0 111 0 = ,

" 2 9 " 111 0 1 = , e tc .

,

C

C CC CC

C C

C

G

GGG

GGG G

M u ta n t c la ss

0

1

2

3

4

5

Sequence space of binary sequences of chain lenght n=5

Population and population support in sequence space: The master sequence

s p ac e

S eq uence

Con

cent

rati

on

M a s te r s e q u e n ce

P o p u la tio n s u p p o r t

M a s te r s e q u e n ce

Population and population support in sequence space: The quasi-species

spac e

S equ enc e

Con

cent

ratio

n

M aste r seq uen ce

P o p u la tio n su pp o rt

M aste r seq uen ce

The increase in RNA production rate during a serial transfer experiment

Decrease in m ean fitnessdue to quasispecies form ation

Sk I. = ( )fk f S k = ( )

S e q u e n c e s p a c e S tru c tu re sp a c e R e a l n u m b e rs

Mapping from sequence space into structure space and into function

Folding of RNA sequences into secondary structures of minimal free energy, G0300

G

G

G

G

GG

G GG

G

GG

G

G

G

G

U

U

U

U

UU

U

U

U

UU

A

A

AA

AA

AA

A

A

A

A

U

C

C

CC

C

C

C

C

C

C

C

C

5 ’-en d 3 ’-en d

GGCGCGCCCGGCGCC

GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA

UGGUUACGCGUUGGGGUAACGAAGAUUCCGAGAGGAGUUUAGUGACUAGAGG

RNAStudio.lnk

The Hamming distance between structures in parentheses notation forms a metric

in structure space

H a m m in g d is ta n c e d (S ,S ) = H 1 2 4

d (S ,S ) = 0H 1 1

d (S ,S ) = d (S ,S )H H1 2 2 1

d (S ,S ) d (S ,S ) + d (S ,S )H H H1 3 1 2 2 3

( i)

( ii )

( ii i)

f0 f

f1

f2

f3

f4

f6

f5f7

Replication rate constant:

fk = / [ + dS (k)]

dS (k) = dH(Sk,S)

Evaluation of RNA secondary structures yields replication rate constants

The flowreactor as a device for studies of evolution in vitro and in silico

Replication rate constant:

fk = / [ + dS (k)]

dS (k) = dH(Sk,S)

Selection constraint:

# RNA molecules is controlled by the flow

Stock S olution Reaction M ixture

NNtN )(

spaceS equen ce

Con

cent

rati

on

M a s te r s e q u e n c e

M u ta n t c lo u d

“ O ff - th e -c lo u d ” m u ta tio n s

The molecular quasispecies in sequence space

S = ( )I

f S = ( )

S

f

I M

utat

ion

G e n o ty p e-P h e n o ty p e M a p p in g

E va lua t ion o f th e

P hen o ty pe

Q jI1

I2

I3

I4 I5

In

Q

f1

f2

f3

f4 f5

fn

I1

I2

I3

I4

I5

I

In + 1

f1

f2

f3

f4

f5

f

f n + 1

Q

Evolutionary dynamics including molecular phenotypes

In silico optimization in the flow reactor: Trajectory (biologists‘ view)

Tim e (arbitrary units)

Ave

rage

dis

tanc

e fr

om in

itial

str

uctu

re 5

0 -

d

S

500 750 1000 12502500

50

40

30

20

10

0

Evolutionary trajectory

In silico optimization in the flow reactor: Trajectory (physicists‘ view)

Tim e (arbitrary units)

Ave

rage

str

uctu

re d

ista

nce

to ta

rget

d

S

500 750 1000 12502500

50

40

30

20

10

0

Evolutionary trajectory

Movies of optimization trajectories over the AUGC and the GC alphabet

AUGC GC

Statistics of the lengths of trajectories from initial structure to target (AUGC-sequences)

R u n tim e o f tra je c to rie s

Fre

quen

cy

0 10 00 20 00 30 00 40 00 50 000

0 .0 5

0 .1

0 .1 5

0 .2

44A

ve

rag

e s

tru

ctu

re d

ista

nc

e t

o t

arg

et

d

S

Evolutionary trajectory

1250

10

0

44

42

40

38

36Relay steps

Nu

mb

er o

f rela

y s

tep

T ime

Endconformation of optimization

4443A

ve

rag

e s

tru

ctu

re d

ista

nc

e t

o t

arg

et

d

S

Evolutionary trajectory

1250

10

0

44

42

40

38

36Relay steps

Nu

mb

er o

f rela

y s

tep

T ime

Reconstruction of the last step 43 44

444342A

ve

rag

e s

tru

ctu

re d

ista

nc

e t

o t

arg

et

d

S

Evolutionary trajectory

1250

10

0

44

42

40

38

36Relay steps

Nu

mb

er o

f rela

y s

tep

T ime

Reconstruction of last-but-one step 42 43 ( 44)

Reconstruction of step 41 42 ( 43 44)

44434241A

ve

rag

e s

tru

ctu

re d

ista

nc

e t

o t

arg

et

d

S

Evolutionary trajectory

1250

10

0

44

42

40

38

36Relay steps

Nu

mb

er o

f rela

y s

tep

T ime

Reconstruction of step 40 41 ( 42 43 44)

4443424140A

ve

rag

e s

tru

ctu

re d

ista

nc

e t

o t

arg

et

d

S

Evolutionary trajectory

1250

10

0

44

42

40

38

36Relay steps

Nu

mb

er o

f rela

y s

tep

T ime

444342414039

Evolutionary process

Reconstruction

Av

era

ge

str

uc

ture

dis

tan

ce

to

ta

rge

t

dS

Evolutionary trajectory

1250

10

0

44

42

40

38

36Relay steps

Nu

mb

er o

f rela

y s

tep

T ime

Reconstruction of the relay series

Change in RNA sequences during the final five relay steps 39 44

Transition inducing point mutations Neutral point mutations

In silico optimization in the flow reactor: Trajectory and relay steps

Tim e (arbitrary units)

Ave

rage

str

uctu

re d

ista

nce

to ta

rget

d

S

500 750 1000 12502500

50

40

30

20

10

0

Evolutionary trajectory

Relay steps

S kS j

kP

kP

P

Birth-and-death process with immigration

NN-10 1 2 3 4 5 6 7 8 9 10

x

(x) = x + ( -x) N (x) = x

T1,0 T0,1

Tim e t

Par

ticle

nu

mbe

r

(t)

X

0

2

4

6

8

10

12

Calculation of transition probabilities by means of a birth-and-death process with immigration

1008

1214

Tim e (arbitrary units)

Av

era

ge

str

uc

ture

dis

tan

ce

to

ta

rge

t

dS

5002500

20

10

Uninterrupted presence

Evolutionary trajectory

Nu

mb

er o

f relay

ste

p

Neutral genotype evolution during phenotypic stasis

Neutral point mutationsTransition inducing point mutations

28 neutral point mutations during a long quasi-stationary epoch

18

19

20

21

26

28

29 31

A random sequence of minor or continuous transitions in the relay series

Tim e (arbitrary units)

750 1000 1250Av

era

ge

str

uc

ture

dis

tan

ce

to

ta

rget

d

S

30

20

10

Uninterrupted presence

Evolutionary trajectory

35

30

25

20 Nu

mb

er o

f relay

step

A random sequence of minor or continuous transitions in the relay series

18

192527

202224

2123

2630

28

29 31

Tim e (arbitrary units)

750 1000 1250Ave

rag

e st

ruct

ure

dis

tan

ce t

o t

arg

et

dS

30

20

10

Uninterrupted presence

Evolutionary trajectory

35

30

25

20 Nu

mb

er of relay step

A random sequence of minor or continuous transitions in the relay series

100

101

102

103

104

105

Rank

10-6

10-5

10-4

10-3

10-2

10-1

Fre

quen

cy o

f occ

urre

nce

5 '-E nd

3 '-E nd

7 0

6 0

5 0

4 03 0

2 0

1 01 0

25

Probability of occurrence of different structures in the mutational neighborhood of tRNAphe

Rare neighbors

Main transitions

Frequent neighbors

Minor transitions

In silico optimization in the flow reactor: Main transitions

M ain transitionsRelay steps

Tim e (arbitrary units)

Ave

rage

str

uctu

re d

ista

nce

to ta

rget

d

S

500 750 1000 12502500

50

40

30

20

10

0

Evolutionary trajectory

00 09 31 44

Three important steps in the formation of the tRNA clover leaf from a randomly chosen initial structure corresponding to three main transitions.

Statistics of the numbers of transitions from initial structure to target (AUGC-sequences)

N u m b er o f tra ns ition s

Freq

uenc

y

0 20 40 60 80 1000

0 .05

0 .1

0 .15

0 .2

0 .25

0 .3

A ll tran sitio n s

M ain tra n sit ion s

Alphabet Runtime Transitions Main transitions No. of runs

AUGC 385.6 22.5 12.6 1017 GUC 448.9 30.5 16.5 611 GC 2188.3 40.0 20.6 107

Statistics of trajectories and relay series (mean values of log-normal distributions)

S 1(j)

S k(j)

S 2(j)

S 3(j)

S m( j )

k

k

k k

k

P

P

P P

P

P

Transition probabilities determining the presence of phenotype Sk(j) in the population

S 1(j)

S k(j)

S 2(j)

S 3(j)

S m( j )

k

k

k k

k

P

P

P P

P

P

N = sa t(j)

p . . < >l (j)

1

Statistics of evolutionary trajectories

Population size

N

Number of replications

< n >rep

Number of transitions

< n >tr

Number of main transitions

< n >dtr

The number of main transitions or evolutionary innovations is constant.

1008

1214

Tim e (arbitrary units)

Av

era

ge

str

uc

ture

dis

tan

ce

to

ta

rge

t

dS

5002500

20

10

Uninterrupted presence

Evolutionary trajectory

Nu

mb

er o

f relay

ste

p

Neutral genotype evolution during phenotypic stasis

Neutral point mutationsTransition inducing point mutations

28 neutral point mutations during a long quasi-stationary epoch

Variation in genotype space during optimization of phenotypes

Mean Hamming distance within the population and drift velocity of the population center in sequence space.

Spread of population in sequence space during a quasistationary epoch: t = 150

Spread of population in sequence space during a quasistationary epoch: t = 170

Spread of population in sequence space during a quasistationary epoch: t = 200

Spread of population in sequence space during a quasistationary epoch: t = 350

Spread of population in sequence space during a quasistationary epoch: t = 500

Spread of population in sequence space during a quasistationary epoch: t = 650

Spread of population in sequence space during a quasistationary epoch: t = 820

Spread of population in sequence space during a quasistationary epoch: t = 825

Spread of population in sequence space during a quasistationary epoch: t = 830

Spread of population in sequence space during a quasistationary epoch: t = 835

Spread of population in sequence space during a quasistationary epoch: t = 840

Spread of population in sequence space during a quasistationary epoch: t = 845

Spread of population in sequence space during a quasistationary epoch: t = 850

Spread of population in sequence space during a quasistationary epoch: t = 855

Sk I. = ( )fk f S k = ( )

S e q u e n c e s p a c e S tru c tu re sp a c e R e a l n u m b e rs

Mapping from sequence space into structure space and into function

Sk I. = ( )fk f S k = ( )

S e q u e n c e s p a c e S tru c tu re sp a c e R e a l n u m b e rs

Sk I. = ( )fk f S k = ( )

S e q u e n c e s p a c e S tru c tu re sp a c e R e a l n u m b e rs

The pre-image of the structure Sk in sequence space is the neutral network Gk

Neutral networks are sets of sequences forming the same structure.

Gk is the pre-image of the structure Sk in sequence space:

Gk = -1(Sk) {Ij | (Ij) = Sk}

The set is converted into a graph by connecting all sequences of Hamming distance one.

Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA

sequences of a given chain length. This number, N=4n , becomes very large with increasing length, and is prohibitive for numerical computations.

Neutral networks can be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.

Mean degree of neutrality and connectivity of neutral networks

j = 2 7 = 0 .4 4 4 ,/1 2 k = (k )j

| |G k

c r = 1 - -1 ( 1 )/ -

k c r . . . .>

k c r . . . .<

n e tw o rk is c o n n e c te dG k

n e tw o rk is c o n n e c te dn o tG k

C o n n e c tiv i ty th re sh o ld :

A lp h a b e t s iz e : = 4 AUG C

G S Sk k k= ( ) | ( ) = -1 I Ij j

c r

2 0 .5

3 0 .4 2 3

4 0 .3 7 0

GC,AU

GUC,AUG

AUGC

A connected neutral network

G ian t C om po n en t

A multi-component neutral network

5 '-E n d5 '-E n d 5 '-E n d5 '-E n d

3 '-E n d3 '-E n d 3 '-E n d3 '-E n d

7070 7070

60606060

5050 5050

4040 4040 3030 3030

2020 20

20

1010 1010

A lp h a b e t D eg ree o f n eu tra lity

AU

AUG

AUGC

UGC

GC

- -

- -

0 .2 7 5 0 .0 6 4

0 .2 6 3 0 .0 7 1

0 .0 5 2 0 .0 3 3

- -

0 .2 1 7 0 .0 5 1

0 .2 7 9 0 .0 6 3

0 .2 5 7 0 .0 7 0

0 .0 5 7 0 .0 3 4

0 .0 7 3 0 .0 3 2

0 .2 0 1 0 .0 5 6

0 .3 1 3 0 .0 5 8

0 .2 5 0 0 .0 6 4

0 .0 6 8 0 .0 3 4

Degree of neutrality of cloverleaf RNA secondary structures over different alphabets

Reference for postulation and in silico verification of neutral networks

Stru ctu re

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG

C o m p a tib le seq u en ceStru ctu re

5 ’-en d

3 ’-en d

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG

G

G

G

G

G

G

G

C

C

C

C G

G

G

G

C

C

C

C

C

C

C

U A

U

U

G

U

AA

A

A

U

C o m p a tib le seq u en ceStru ctu re

5 ’-en d

3 ’-en d

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG

G

G

C

C

C

C G

G

G

G

C

C

G

G

G

G

G

C

C

C

C

C

U A

U

U

G

U

AA

A

A

U

C o m p a tib le seq u en ceStru ctu re

5 ’-en d

3 ’-en d

B a se p a ir s : AU , UAG C , CGG U , UG

S in g le n u c le o tid e s : A U G C, , ,

The compatible set Ck of a structure Sk consists of all sequences which form

Sk as its minimum free energy structure (the neutral network Gk) or one of its

suboptimal structures.

G kN e u tra l N e tw o rk

S truc tu re S k

G k C k

C o m p a tib le S e t C k

The intersection of two compatible sets is always non empty: C0 C1

S tru c tu re S 0

S tru c tu re S 1

Reference for the definition of the intersection and the proof of the intersection theorem

A sequence at the intersection of two neutral networks is compatible with both structures

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG

3 ’ -e n d

Min

imum

fre

e en

ergy

con

form

atio

n S

0

Sub

opti

mal

con

form

atio

n S 1

G

G

G

G

G

G

G

G

G

G

G

GC

C

C

C

U

U

U

U

C

C

C

C

C

C

U

A

A

A

A

A

C

G

G

G

G

G

G

C

C

C

C

U

U

G

G

G

G

G

C

C

C

C

C

C

C

U

U

AA

A

A

A

U

G

5.10

5.9

02

8

1415

1817

23

19

2722

38

45

25

3633

3940

4341

3.30

7.40

5

3

7

4

109

6

1312

3.1

0

11

21

20

16

2829

26

3032

424644

24

353437

49

31

4748

S 0S 1

b asin '1 '

lo n g liv in gm e tastab le s tru c tu re

b as in '0 '

m in im u m free en erg y s tru c tu re

Barrier tree for two long living structures

A ribozyme switch

E.A.Schultes, D.B.Bartel, Science 289 (2000), 448-452

Two ribozymes of chain lengths n = 88 nucleotides: An artificial ligase (A) and a natural cleavage ribozyme of hepatitis--virus (B)

The sequence at the intersection:

An RNA molecules which is 88 nucleotides long and can form both structures

Two neutral walks through sequence space with conservation of structure and catalytic activity

Examples of smooth landscapes on Earth

Massif Central

Mount Fuji

Examples of rugged landscapes on Earth

Dolomites

Bryce Canyon

G en o ty p e S p ac e

Fitn

ess

Sta r t o f W a lk

E n d o f W a lk

Evolutionary optimization in absence of neutral paths in sequence space

Evolutionary optimization including neutral paths in sequence space

G e n o ty p e S p a ce

Fitn

ess

Sta r t o f W a lk

E n d o f W a lk

R an d o m D rif t P e rio d s

A d ap tiv e P erio d s

Example of a landscape on Earth with ‘neutral’ ridges and plateaus

Grand Canyon

Neutral ridges and plateus

top related