RNA Secondary Structure What is RNA? Definition of RNA secondary Structure RNA molecule evolution Algorithms for base pair maximisation Chomsky’s Linguistic Hierarchy Stochastic Context Free Grammars & Evolution Miscellaneous topics

Upload: julie-hopkins

Post on 29-Dec-2015

Page 1

RNA Secondary Structure

What is RNA?

Definition of RNA secondary Structure

RNA molecule evolution

Algorithms for base pair maximisation

Chomsky’s Linguistic Hierarchy

Stochastic Context Free Grammars & Evolution

Miscellaneous topics

Page 2

Base Pairing (from Przytycka)

[Figure: chemical structures of the four RNA bases, with hydrogen-bond donor and acceptor sites marked. Pyrimidines: cytosine, uracil. Purines: adenine, guanine.]

Page 3

An Example: t-RNA

From Paul Higgs

Page 4

Known RNAs

t-RNA (transfer-)

m-RNA (messenger-)

mi-RNA (micro-)

sn-RNA (small nuclear)

RNAi (interfering)

Srp-RNA (Signal Recognition Particle)

5S RNA

16S RNA

23S RNA

RNA viruses: Retroviruses (HIV), Coronavirus (SARS), ...

Page 5

Functions of RNAs

Information Transfer: mRNA

Codon -> Amino Acid adapter: tRNA

Enzymatic Reactions:

Other base pairing functions: ???

Structural:

Metabolic: ???

Regulatory: RNAi

Page 6

Known RNA Structures

http://www.rnabase.org/metaanalysis/

http://www.sanger.ac.uk/Software/rfam

http://www.scor.lbl.gov

Figure 1: The cumulative number of publicly available RNA containing structures determined by x-ray crystallography (red), nmr spectroscopy (purple) or all techniques combined (blue) has been steadily increasing since the first RNA containing structure was released in 1978. There has been a substantial acceleration in RNA structure determinations since the mid-1990s.

Figure 2: In a positive new trend, the average number of conformational map outliers per residue solved has shown a consistent downtrend recently. Interestingly, most of the improvement can be attributed to structures determined by x-ray crystallography. There has been no consistent trend for structures determined by NMR spectroscopy.

Rfam: database of RNA alignments and secondary structure models

SCOR: database of experimentally solved RNA structures

Page 7

RNA SS: recursive definition

Nussinov (1978), remade from Durbin et al., 1997

[Figure: the four cases for a structure on the interval [i,j]. i,j pair: i pairs with j, enclosing i+1 ... j-1. i unpaired: the structure lies on i+1 ... j. j unpaired: the structure lies on i ... j-1. Bifurcation: two substructures on i ... k and k+1 ... j.]

Secondary Structure: set of paired positions on the interval [i,j].

A-U and C-G can base pair. Some other pairings can occur, and triple interactions exist.

Pseudoknot – non-nested pairing: i < j < k < l with pairs i-k and j-l.
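
The definition above can be turned into a small check. The following is a minimal Python sketch, not taken from the slides: the function name `is_secondary_structure` and the representation of a structure as a list of 1-based `(i, j)` tuples are illustrative assumptions. It rejects any position used twice and any crossing (pseudoknot) pair i < j < k < l with i-k and j-l paired.

```python
def is_secondary_structure(pairs):
    """Return True if `pairs` is a nested (pseudoknot-free) set of base pairs.

    `pairs` is a list of (i, j) position tuples with i < j (hypothetical format).
    """
    used = set()
    for i, j in pairs:
        # Each position may take part in at most one pair.
        if i >= j or i in used or j in used:
            return False
        used.update((i, j))
    for a, (i, j) in enumerate(pairs):
        for k, l in pairs[a + 1:]:
            # Crossing pairs: one pair starts inside the other and ends outside it.
            if i < k < j < l or k < i < l < j:
                return False
    return True


print(is_secondary_structure([(1, 9), (2, 8), (6, 7)]))  # nested pairs
print(is_secondary_structure([(1, 5), (3, 8)]))          # crossing pairs
```

The pairwise comparison is quadratic in the number of pairs, which is fine for illustration; a stack-based scan would do the same job in linear time.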

Page 8

RNA Secondary Structure

The number of secondary structures (Waterman, 1978):

$$T(n+1) = T(n) + \sum_{k=0}^{n-2} T(k)\,T(n-1-k), \qquad T(0) = T(1) = T(2) = 1$$

$$T(n) \sim \sqrt{\frac{15 + 7\sqrt{5}}{8\pi}}\; n^{-3/2} \left(\frac{3+\sqrt{5}}{2}\right)^{n}$$

[Figure: bracket diagrams on the sequence N1 ... NL illustrating the recursion, including the split of the sequence into N1 ... Nk and Nk+1 ... NL.]
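
To make the counting result concrete, here is a minimal Python sketch of the Waterman recursion and its asymptotic form. The function names `count_structures` and `asymptotic_count` are illustrative, and the starting values T(0) = T(1) = T(2) = 1 are the standard ones for this recursion (assumed here, since the slide's initial conditions did not survive extraction cleanly).

```python
import math


def count_structures(n_max):
    """Return [T(0), ..., T(n_max)] using T(n+1) = T(n) + sum_{k=0}^{n-2} T(k) T(n-1-k)."""
    T = [1, 1, 1]  # assumed starting values T(0), T(1), T(2)
    for n in range(2, n_max):
        T.append(T[n] + sum(T[k] * T[n - 1 - k] for k in range(0, n - 1)))
    return T[: n_max + 1]


def asymptotic_count(n):
    """Asymptotic approximation sqrt((15 + 7*sqrt(5)) / (8*pi)) * n^(-3/2) * ((3 + sqrt(5)) / 2)^n."""
    s5 = math.sqrt(5)
    return math.sqrt((15 + 7 * s5) / (8 * math.pi)) * n ** -1.5 * ((3 + s5) / 2) ** n


counts = count_structures(30)
for n in (5, 10, 20, 30):
    print(n, counts[n], round(asymptotic_count(n), 1))
```

The exact counts grow roughly like 2.6^n, which is why enumeration is hopeless for realistic lengths and dynamic programming is used instead.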

Page 9

RNA: Matching Maximisation (remade from Durbin et al., 1997)

Example: GGGAAAUCC (A-U & G-C)

i\j    G  G  G  A  A  A  U  C  C
       1  2  3  4  5  6  7  8  9
1 G    0  0  0  0  0  0  1  2  3
2 G    0  0  0  0  0  0  1  2  3
3 G       0  0  0  0  0  1  2  2
4 A          0  0  0  0  1  1  1
5 A             0  0  0  1  1  1
6 A                0  0  1  1  1
7 U                   0  0  0  0
8 C                      0  0  0
9 C                         0  0

Initialisation: γ(i,i-1) = 0 & γ(i,i) = 0

Recursion:

$$\gamma(i,j) = \max \begin{cases} \gamma(i+1,j) \\ \gamma(i,j-1) \\ \gamma(i+1,j-1) + \delta(i,j) \\ \max_{i<k<j}\,[\gamma(i,k) + \gamma(k+1,j)] \end{cases}$$

where δ(i,j) = 1 if positions i and j can pair and 0 otherwise.

[Figure: folded drawings of GGGAAAUCC.]
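
The matching maximisation for the example GGGAAAUCC can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `can_pair` and `nussinov` are made up for this sketch, indices are 0-based, only A-U and G-C pairs are allowed (as in the example), and no minimum loop length is enforced.

```python
def can_pair(a, b):
    """True for the A-U and G-C pairs used in the example."""
    return {a, b} in ({"A", "U"}, {"G", "C"})


def nussinov(seq):
    """Fill the table gamma[i][j] = maximum number of base pairs on seq[i..j] (0-based)."""
    n = len(seq)
    gamma = [[0] * n for _ in range(n)]  # the diagonal and everything below it stay 0
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # i unpaired, or j unpaired
            best = max(gamma[i + 1][j], gamma[i][j - 1])
            # i pairs with j, enclosing i+1 .. j-1 (empty when j == i + 1)
            inner = gamma[i + 1][j - 1] if i + 1 <= j - 1 else 0
            best = max(best, inner + (1 if can_pair(seq[i], seq[j]) else 0))
            # bifurcation into i .. k and k+1 .. j
            for k in range(i + 1, j):
                best = max(best, gamma[i][k] + gamma[k + 1][j])
            gamma[i][j] = best
    return gamma


table = nussinov("GGGAAAUCC")
print(table[0][-1])  # maximum number of pairs for the whole sequence
```

The three nested loops over i, j and k give running time proportional to L^3 and memory proportional to L^2 for a sequence of length L.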

Page 10

RNA Secondary Structure Evolution

From Durbin et al. (1998) Biological Sequence Analysis

Page 11

Inference about hidden structure

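[Figure: a small stem-loop. The sequence is observable; the structure is unobservable.]

$$P(\text{Sequence}, \text{Structure}) = P(\text{Sequence} \mid \text{Structure})\,P(\text{Structure}) = P(\text{Structure} \mid \text{Sequence})\,P(\text{Sequence})$$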

Goldman, Thorne & Jones, 96

Knudsen & Hein, 99

Pedersen & Hein, 03

Page 12

Goldman, Thorne & Jones: "Structure" + "Evolution"

1 A S D F G H J K L P
2 A S D F G H J K L P
3 D S D F G K J K L C
4 D S D F G K J K L C

HMM x x x x x x x L x x x

[Figure: tree relating sequences 1, 2, 3 and 4.]

Page 13

Three Questions

What is the probability of the data?

What is the most probable "hidden" configuration?

What is the probability of a specific "hidden" state?

Training: Given a set of instances, find the parameters that make them most probable, treating the instances as independent.

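[Figure: lattice with observations O1 ... O10 as columns and hidden states H1, H2, H3 as rows.]

$$P^{O_5}_{H_5=2} = P(O_5 \mid H_5 = 2) \sum_{H_4 = j} P^{O_4}_{H_4 = j}\; p_{j,i}$$

where p_{j,i} is the probability of moving from hidden state j to hidden state i (here i = 2).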

Page 14

The Basic Calculations

What is the most probable "hidden" configuration?

[Figure: lattice of observations O1 ... O10 against hidden states H1, H2, H3.]

What is the probability of a specific "hidden" state?

[Figure: lattice of observations O1 ... O10 against hidden states H1, H2, H3.]

The time required for these calculations is proportional to K^2 * L, where K is the number of hidden states and L is the length of the sequence.
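
The K^2 * L cost can be seen directly in code. Below is a minimal Python sketch of two of the basic HMM calculations: the probability of the data (forward recursion) and the most probable hidden configuration (Viterbi). The function names, the argument layout (`init`, `trans`, `emit`) and the toy two-state model are all hypothetical and chosen only for illustration.

```python
def forward(obs, init, trans, emit):
    """Probability of the observations, summed over all hidden paths."""
    K = len(init)
    f = [init[i] * emit[i][obs[0]] for i in range(K)]
    for o in obs[1:]:
        # K states times K predecessors per position: K^2 work, repeated L times.
        f = [emit[i][o] * sum(f[j] * trans[j][i] for j in range(K)) for i in range(K)]
    return sum(f)


def viterbi(obs, init, trans, emit):
    """Most probable hidden path and its joint probability with the observations."""
    K = len(init)
    v = [init[i] * emit[i][obs[0]] for i in range(K)]
    back = []
    for o in obs[1:]:
        ptr = [max(range(K), key=lambda j: v[j] * trans[j][i]) for i in range(K)]
        v = [emit[i][o] * v[ptr[i]] * trans[ptr[i]][i] for i in range(K)]
        back.append(ptr)
    state = max(range(K), key=lambda i: v[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1], max(v)


# Toy model with two hidden states and alphabet {A, C} (made-up numbers).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [{"A": 0.9, "C": 0.1}, {"A": 0.2, "C": 0.8}]
print(forward("AACC", init, trans, emit))
print(viterbi("AACC", init, trans, emit))
```

For long sequences a practical implementation would work with logarithms or rescale at each position to avoid numerical underflow; that is omitted here for brevity.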

Page 15

Empirical Doublet Models

Partial Doublet Model

      AU     UA     GC     CG     UG     GU
AU  -1.16   .18    .5     .12    .02    .27
UA   .18   -1.16   .12    .5     .27    .02
GC   .33    .08   -.82    .13    .02    .23
CG   .08    .33    .13   -.82    .23    .02
UG   .08   1.00    .1    1.26  -2.56    .04
GU  1.00    .08   1.26    .1     .04  -2.56

Singlet/Marginalized Doublet Model

       A            C            G            U
A  -.75/-1.15    .16/.13      .32/.79      .26/.23
C   .4/.09      -1.57/-.84    .24/.16      .93/.59
G   .55/.45      .17/.13     -.96/-.7      .24/.11
U   .35/.18      .51/.70      .19/.16     -1.05/-1.03

Alignment of slowly N related molecules – L long

AUUGCAUUCCAAUUGCAUUCCA
AUUGCAUUCCAAUUGCAUUCCA
AUUGCAUUCCAAUUGCAUUCCA
AUUGCAUUCCAAUUGCAUUCCA

$$r_{N_1,N_2} = \frac{\#(N_1 \to N_2,\; N_2 \to N_1)}{N_{P/U}\,(N_{P/U}-1)/2}, \qquad N_1 \neq N_2$$

where N_{P/U} is the number of paired/unpaired positions in the alignment, and

$$r'_{N_1,N_2} = \frac{\#N_1 \cdot r_{N_1,N_2}}{\#N_2}$$

Page 16

Doublet Evolution

From Bjarne Knudsen

Page 17

Structure Dependent Evolution: RNA

[Figure: alignment rows of the sequence U A C A C C G U with numbered columns, drawn next to a small stem-loop structure. The figure is annotated with the quantities P(PairedHistory_{i,j}) and P(UnpairedHistory_{i,j}).]

Page 18

Structure Dependent Evolution: RNA

Page 19

Grammars: Finite Set of Rules for Generating Strings

i. A starting symbol:

ii. A set of substitution rules applied to variables in the present string:

Regular

Context Free

Context Sensitive

General (also erasing)

finished – no variables

Page 20

Chomsky Linguistic Hierarchy

Source: Biological Sequence Analysis

W is a nonterminal sign, a is any sign, and α, β, γ are strings, where β is not the null string. ε is the empty string.

Regular Grammars: W --> aW', W --> a

Context-Free Grammars: W --> β

Context-Sensitive Grammars: α1Wα2 --> α1βα2

Unrestricted Grammars: α1Wα2 --> γ

The listing above is in order of increasing power of string generation. For instance, context-free grammars can generate every language that regular grammars can, and some more.

Page 21

Simple String Generators

Terminals (small) --- Non-Terminals (capital)

i. Start with S. S --> aT | bS, T --> aS | bT | ε

One sentence (odd # of a's):

S -> aT -> aaS -> aabS -> aabaT -> aaba

ii. S --> aSa | bSb | aa | bb

One sentence (even length palindromes):

S -> aSa -> abSba -> abaaba
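
The two generators can be mirrored by two small membership tests. This is a minimal Python sketch with illustrative function names; it assumes that grammar i lets T terminate with the empty string (so S and T track an even or odd count of a's), and that grammar ii is S --> aSa | bSb | aa | bb.

```python
def in_odd_a_language(s):
    """Grammar i: the regular grammar behaves like a two-state automaton (S, T)."""
    state = "S"
    for ch in s:
        if ch == "a":
            state = "T" if state == "S" else "S"  # S --> aT, T --> aS
        elif ch != "b":                           # S --> bS, T --> bT keep the state
            return False
    return state == "T"                           # only T may terminate


def in_even_palindrome_language(s):
    """Grammar ii: peel matching outer letters, S --> aSa | bSb, until aa or bb remains."""
    if len(s) < 2 or set(s) - {"a", "b"}:
        return False
    if s[0] != s[-1]:
        return False
    if len(s) == 2:
        return True
    return in_even_palindrome_language(s[1:-1])


print(in_odd_a_language("aaba"), in_even_palindrome_language("abaaba"))
```

The first test needs only a fixed amount of memory (the current state), while the second has to match letters at both ends of the string, which is the nested dependency that a regular grammar cannot express and that base pairing in RNA shares.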

Page 22

Stochastic Grammars

The grammars above classify every string as belonging to the language or not.

Every variable has a finite set of substitution rules. Assigning probabilities to the use of each rule assigns probabilities to the strings in the language.

If a string has a unique (1-1) derivation, the probability of the string is the product of the probabilities of the applied rules.

i. Start with S. S --> (0.3)aT | (0.7)bS, T --> (0.2)aS | (0.4)bT | (0.2)ε

S -> aT -> aaS -> aabS -> aabaT -> aaba

Rule probabilities: 0.3 * 0.2 * 0.7 * 0.3 * 0.2

ii. S --> (0.3)aSa | (0.5)bSb | (0.1)aa | (0.1)bb

S -> aSa -> abSba -> abaaba

Rule probabilities: 0.3 * 0.5 * 0.1
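
The product rule for a unique derivation can be written out as a short calculation. This is a minimal Python sketch: the dictionary layout and the name `derivation_probability` are illustrative, the rule probabilities are copied as they appear on the slide, and the rule T --> ε (empty right-hand side) is assumed from the derivation that ends in aaba.

```python
from math import prod

# (left-hand side, right-hand side) -> rule probability, as listed on the slide.
GRAMMAR_I = {
    ("S", "aT"): 0.3, ("S", "bS"): 0.7,
    ("T", "aS"): 0.2, ("T", "bT"): 0.4, ("T", ""): 0.2,
}
GRAMMAR_II = {
    ("S", "aSa"): 0.3, ("S", "bSb"): 0.5, ("S", "aa"): 0.1, ("S", "bb"): 0.1,
}


def derivation_probability(grammar, derivation):
    """Multiply the probabilities of the rules used, in order of application."""
    return prod(grammar[rule] for rule in derivation)


# S -> aT -> aaS -> aabS -> aabaT -> aaba
p_i = derivation_probability(
    GRAMMAR_I, [("S", "aT"), ("T", "aS"), ("S", "bS"), ("S", "aT"), ("T", "")]
)
# S -> aSa -> abSba -> abaaba
p_ii = derivation_probability(GRAMMAR_II, [("S", "aSa"), ("S", "bSb"), ("S", "aa")])
print(p_i, p_ii)
```

When a string has several derivations, its probability is the sum of such products over all of them, which is what the SCFG analogue of the forward calculation computes.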

Page 23

Secondary Structure Generators

S --> LS | L      (.869, .131)
F --> dFd | LS    (.788, .212)
L --> s | dFd     (.895, .105)
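
To illustrate how such a grammar generates structures, here is a minimal Python sketch that samples from the three rules with the listed probabilities. It is only a sketch under stated assumptions: the name `sample_structure` is made up, a single position s is written as "." and a doublet d...d as a matching "(" and ")" (dot-bracket notation, which the slide does not use explicitly), and nucleotides are not emitted at all.

```python
import random

# Each nonterminal maps to (probability, right-hand side) alternatives.
RULES = {
    "S": [(0.869, ["L", "S"]), (0.131, ["L"])],
    "F": [(0.788, ["(", "F", ")"]), (0.212, ["L", "S"])],
    "L": [(0.895, ["."]), (0.105, ["(", "F", ")"])],
}


def sample_structure(rng):
    """Expand the leftmost symbol repeatedly, starting from S, using an explicit stack."""
    stack = ["S"]
    out = []
    while stack:
        sym = stack.pop()
        if sym in RULES:
            probs, bodies = zip(*RULES[sym])
            body = rng.choices(bodies, weights=probs)[0]
            stack.extend(reversed(body))  # leftmost symbol ends up on top of the stack
        else:
            out.append(sym)
    return "".join(out)


print(sample_structure(random.Random(1)))
```

An explicit stack is used instead of recursion so that an occasional long sample does not hit Python's recursion limit. Every sampled string has balanced brackets by construction, because "(" and ")" are only ever produced together by a dFd rule.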

Page 24

SCFG Analogue to HMM calculations (Durbin et al., 1998)

What is the probability of the data?

What is the most probable "hidden" configuration?

What is the probability of a specific "hidden" state?

[Figure: a nonterminal W covering positions i to j of a sequence 1 to L, expanded into W_L and W_R with inner positions i' and j'; S is the start symbol.]

The time required for these calculations is proportional to K^2 * L^3, where K is the number of hidden states and L is the length of the sequence.

Page 25

Page 26

RNA Secondary Structure

Knudsen & Hein, 03

Page 27

From Knudsen & Hein (1999)

1. Accuracy as certainty threshold is increased.

2. Accuracy as function of sequence number:

Page 28

RNA Secondary Structure

Knudsen & Hein, 03

Page 29

Observing Evolution has 2 parts

P(x):

P(Further history of x):

[Figure: a small stem-loop structure, labelled x.]

Page 30

RNA Structure Prediction and Alignment

Sankoff, 1985 Combined RNA secondary structure & alignment

Gorodkin 1997 Foldalign – only hairpins

2002 Dynalign

Perriquet 2002 Carnac

Can only align molecules of same type.

Page 31

RNA Structure Representations

From Fontana, 2003

Moulton et al.,2002

Mountains

Circle with chords

Ordered Tree

Balanced Nested Parentheses

Full Description

Page 32

RNA Structure Evolution

Insertion-deletion process of

Doublets

Singlets

There are methods of tree alignment that could probably be extended to statistical tree alignment.

Page 33

Metrics on RNA Structures

Moulton, 2000

Base Pair Metrics

Tree Metrics

Mountain Metrics

Page 34

Population Genetics of Coupled Mutations

W. Stephan, 96 & P. Higgs, 98

Possible separation of long term and short term evolution

Creation of Linkage Disequilibrium of paired sites.

Page 35

Singlet Doublet Models

Kirby et al., 95; Tillier et al., 98; Savill et al., 01

Jukes-Cantor with bias toward base pairing:

$$R_{i,j} = \begin{cases} 1/4, & \text{1 difference, pairing gained} \\ 1/4, & \text{1 difference, pairing unchanged} \\ 1/4, & \text{1 difference, pairing lost} \\ 0, & \text{2 differences} \end{cases}$$

Page 36

Contagious Dependencies: Overlapping Reading Frames & CG frequencies

Pedersen & Jensen, 01

n n n n n n n n n n n

Page 37

Doublet Tetraplet Models

Nerman & Durbin at B. Knudsen’s exam 02

N2 N4

N2N1

In principle a 4^4 times 4^4 matrix (65,536 entries!!) is needed, but proper parametrisation and symmetries could reduce this substantially.

Stacking:

Page 38

RNA + Protein Structure Dependent Molecular Evolution

Singlet

Straightforward, no interference from the RNA level.

Doublets

What seems to be needed is a parametrisation of how base pairing creates a departure from an independent singlet, singlet model.

Page 39

Miscellaneous Topics

RNA Folding

Molecular Dynamics of RNA Structures

RNA Structure – Sequence Landscapes

RNA Homology Modelling & Threading

RNA Gene Finding

Close to Optimal Structures

Constraint Satisfaction Modelling

Page 40

Literature & www-sites

Eddy, S. (2001) Non-coding RNA genes and the modern RNA world. Nat Rev Genet. 2(12):919-29. Review.

Eddy, S. (2002) Computational genomics of noncoding RNA genes. Cell 109(2):137-40. Review.

Fontana (2002) Modelling "evo-devo" with RNA. BioEssays 24(12):1164-77.

Knudsen, B. and J.J. Hein (1999) Using stochastic context free grammars and molecular evolution to predict RNA secondary structure. Bioinformatics 15(6):446-454.

Knudsen, B. and J.J. Hein (2003) Practical RNA Folding (In Press, RNA).

Moore (1999) Structural Motifs in RNA. Ann.Rev.Biochem. 68:287-300.

Moulton et al. (2000) Metrics on RNA Secondary Structures. J.Compu.Biol. 7(1/2):277-

Perriquet et al. (2003) Finding the common structure shared by two homologous RNAs. Bioinformatics 19(1):108-116.

http://www.imb-jena.de/RNA.html

http://scor.lbl.gov/index.html

http://www.rnabase.org/metaanalysis/