Post on 20-Jan-2018

Rapid ab initio RNA Folding Including Pseudoknots via Graph Tree Decomposition

Jizhen Zhao, Liming Cai, Russell Malmberg

Computer Science and Plant Biology, University of Georgia

Why another RNA folding algorithm?

• The need for RNA analysis tools has increased because of the number of recently found functional RNAs (i.e., ncRNAs).

• RNA folding algorithms are not completely satisfactory in spite of having been intensively studied for more than 25 years.

Increased number of ncRNAs

• ncRNAs have functions other than coding for proteins, e.g., as structural, catalytic, and regulatory factors

• ncRNA genes lack strong statistical signals such as ORFs or polyadenylation, except that transcribed ncRNA molecules can fold into stable (and unique) secondary or tertiary structures

Increased number of ncRNAs

• rRNAs and tRNAs

• RNA maturation: snRNA in recognizing splicing sites

• RNA modification: snoRNA converting uridine to pseudo-uridine

• Regulation of gene expression and translation: e.g., miRNAs

• DNA replication: e.g., telomerase RNAs - template for addition of telomeric repeats

• Etc.

In introns, intergenic regions, or 5’ and 3’ UTRs,

Increased number of ncRNAs (Bompfunewerer et al., 2005)

Class | Size (nt) | Function | Phylogenetic distribution
tRNA | 70-80 | Translation | ubiquitous
rRNA: 16S/18S; 28S+5.8S/23S; 5S | 1.5K; 3K; 130 | Translation | ubiquitous
RNase P; MRP | 220-440; 250-350 | tRNA maturation | ubiquitous; Eukarya
snoRNA; telomerase | 130; 400-550 | Pseudouridinylation; addition of repeats |
snRNA; U1 ~ U6 | 100-600; 130-140 | Spliceosome; mRNA maturation | Eukarya; Eukarya, archaea
U7; 7SK | ~65; ~300 | Histone mRNA maturation; translational regulation | Eukaryotes; Vertebrata
tmRNA | 300-400 | Tags protein for proteolysis | Bacteria
miRNA | ~22 | Post-transcriptional regulation | Multicellular organisms

Long history of RNA folding

• First simple RNA folding algorithm (Nussinov 1978)

• Thermodynamic based (Zuker & Stiegler 1981)

• Zuker’s (1989)

• mFOLD 3.2

• RNAfold (a part of Vienna Package 1.6.1)

• Not all that accurate on a single sequence

• Inherent computational complexity from DP

• Unable to predict pseudoknots

Background

• Base pairings allow RNA to fold

Watson-Crick base pairs: A-U, C-G

Wobble pair: G-U

non-canonical pairs are also possible
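The pairing rules above can be written as a small lookup. This is a minimal sketch in Python; the name `can_pair` is chosen here for illustration, and non-canonical pairs are deliberately left out.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
# Non-canonical pairs are not modeled in this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def can_pair(x, y):
    """Return True if nucleotides x and y form a Watson-Crick or wobble pair."""
    return (x.upper(), y.upper()) in PAIRS
```

For example, `can_pair("g", "u")` is true while `can_pair("A", "G")` is false.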

[Figure: chemical structures of the cytosine-guanine and uracil-adenine base pairs, with the example sequence 5’-u-u-c-c-g-a-a-g-c-u-c-a-a-c-g-g-g-a-a-a-u-g-a-g-c-u-3’]

Secondary structure is important to tertiary structure

[Figure: secondary structure elements: hairpin loop, junction (multiloop), bulge loop, single-stranded region, interior loop, stem. Image: Wuchty]

Pseudoknot

stem (double helix): stacked base pairs

loop: strand of unpaired bases

[Figure: an example sequence drawn with stems and loops, then with an additional stem that crosses the first]

Pseudoknots: crossing patterns of stems
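The crossing condition can be stated on pairs of positions: base pairs (i, j) and (k, l) cross when i < k < j < l. A minimal Python sketch follows; the function names are illustrative and base pairs are assumed to be given as tuples of 0-based positions.

```python
def crosses(p, q):
    """True if base pairs p and q interleave, i.e. i < k < j < l."""
    (i, j), (k, l) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    return i < k < j < l


def has_pseudoknot(pairs):
    """True if any two base pairs in the structure cross each other."""
    pairs = list(pairs)
    return any(
        crosses(p, q)
        for a, p in enumerate(pairs)
        for q in pairs[a + 1:]
    )
```

Nested pairs such as (0, 10) and (2, 8) do not cross, whereas (0, 10) and (5, 15) do.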

terminates translation errors

Bacterial tmRNA consensus structure (Felden et al. 2001. NAR 29)

Pseudoknots in TMV 3’ UTR

Promotes efficient translation

Binds EF1A, cooperates with 5’UTR

(Leathers et al. 1993 MCB 13; Zeenko et al. 2002 JVI 76)

Previous work (Nussinov’s)

• maximizing the number of base pairs (Nussinov et al, 1978)

simple case: each base pair (i, j) scores 1
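A minimal Python sketch of base-pair maximization in the style of Nussinov, assuming every allowed pair scores 1. The `min_loop` parameter (minimum number of unpaired bases enclosed by a pair) and the inclusion of G-U pairs are assumptions made here, not details taken from the slide.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, each pair scoring 1."""
    allowed = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"),
               ("G", "U"), ("U", "G")}
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: position i stays unpaired
            # case 2: position i pairs with some k, splitting the interval
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in allowed:
                    left = best[i + 1][k - 1] if k - 1 >= i + 1 else 0
                    right = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, 1 + left + right)
            best[i][j] = score
    return best[0][n - 1]
```

The triple loop gives O(n^3) time and the table O(n^2) space. Because the recursion only combines an enclosed interval with an interval to its right, crossing pairs are never produced.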

Previous work (Zuker’s)

• Thermodynamic energy based method (Zuker and Stiegler 1981)

• Energy minimization algorithm: find the secondary structure that minimizes the free energy (ΔG)

ΔG is calculated as the sum of individual contributions of:

– loops

– base pairs

– secondary structure elements

Previous work (Zuker’s)

• Free-energy values (kcal/mole at 37 °C)

• Energies of stems calculated as stacking contributions between neighboring base pairs
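The stacking rule can be sketched as a sum over neighboring base pairs in a stem. In this Python sketch the energy lookup is passed in as a function, since no parameter values are given here; `stem_energy` and `stack_energy` are illustrative names, and the constant used in the usage note is a toy value, not a measured free energy.

```python
def stem_energy(seq, pairs, stack_energy):
    """Sum stacking contributions between neighboring base pairs of a stem.

    pairs: base pairs (i, j) listed from the outermost to the innermost.
    stack_energy: function taking the outer and inner pair (as two-letter
    strings) and returning their stacking energy; supplied by the caller.
    """
    total = 0.0
    for (i, j), (k, l) in zip(pairs, pairs[1:]):
        total += stack_energy(seq[i] + seq[j], seq[k] + seq[l])
    return total
```

With a toy lookup that returns -1.0 for every stack, a stem of three base pairs contains two stacks and gets a total of -2.0.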

Previous work (Zuker’s)

MFOLD: computing loop dependent energies

Previous work (Zuker’s)

Difficult issues

• Energy associated with any position is only influenced by local sequence and structure

• mFOLD does not predict pseudoknots

• PKnots (Rivas and Eddy 1999): predicts restricted cases of pseudoknots, O(n^6) time and O(n^4) space

• Min energy-based pseudoknot prediction is NP-hard (Lyngsø and Pedersen 2000)

Pseudoknots drastically increase the complexity

Heuristic RNA folding algorithms

ILM (Ruan et al 2004)

HotKnots (Ren et al 2005)

• Fast, but sometimes slow

• Unlimited class of pseudoknots

• Do not guarantee the optimality of the predicted structure

This work

• Graph-theoretic, avoids nucleotide-level DP

• Unlimited pseudoknot structures

• Optimal solutions

• Fast

• Comparable performance in accuracy

This work (summary)

1. Model: similar to ILM, without loop energy

2. Approach: find all stable stems and construct a stem graph; reduce folding to an independent set problem

3. Techniques: tree-decompose the stem graph; DP to obtain an optimal solution
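The first step of the approach, finding candidate stems, might look like the following Python sketch. It is not the authors' procedure: a minimum stem length stands in for the stability criterion, and `min_len`, `min_loop`, and the `(i, j, length)` stem encoding are assumptions made for illustration.

```python
def find_stems(seq, min_len=3, min_loop=3):
    """List maximal stems as (i, j, length): pairs (i+d, j-d) for d < length."""
    allowed = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"),
               ("G", "U"), ("U", "G")}
    seq = seq.upper()
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in allowed:
                continue
            # Skip pairs that only extend a longer stem starting further out.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in allowed:
                continue
            length = 0
            # Extend inward while bases pair and the enclosed loop is long enough.
            while (i + length < j - length - min_loop
                   and (seq[i + length], seq[j - length]) in allowed):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems
```

On the sequence GGGAAACCC with the default thresholds this returns the single stem (0, 8, 3).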

This work (approach)

A set of non-overlapping stems corresponds to an independent set of the stem graph.

The weight of each vertex is related to the energy of the corresponding stem.
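A minimal Python sketch of the reduction: stems become vertices, two stems are joined by an edge when they share a nucleotide, and a fold is a maximum weight independent set. The `(i, j, length)` stem encoding, the use of stem length as the vertex weight, and the brute-force search are assumptions for illustration only; the brute force is exponential and suitable for tiny inputs.

```python
from itertools import combinations


def stem_positions(stem):
    """Nucleotide positions covered by a stem (i, j, length)."""
    i, j, length = stem
    return set(range(i, i + length)) | set(range(j - length + 1, j + 1))


def build_stem_graph(stems):
    """Edges (a, b), a < b, between stems that overlap in at least one position."""
    edges = set()
    for a, b in combinations(range(len(stems)), 2):
        if stem_positions(stems[a]) & stem_positions(stems[b]):
            edges.add((a, b))
    return edges


def best_independent_set(weights, edges):
    """Brute-force maximum weight independent set over vertex indices."""
    best_weight, best_set = 0, ()
    n = len(weights)
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            if any((a, b) in edges for a, b in combinations(subset, 2)):
                continue  # two chosen stems overlap
            weight = sum(weights[v] for v in subset)
            if weight > best_weight:
                best_weight, best_set = weight, subset
    return best_weight, best_set
```

Crossing stems are not in conflict here as long as they share no nucleotide, which is what lets the independent set contain pseudoknots.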

This work (techniques)

A tree decomposition of the stem graph

Tree width t = 4

Find an approximate tree decomposition of width t

A maximum weight independent set (MWIS) can be found in time O(2^t N), N = O(n^2), by DP over the tree

Time can be improved to O(e^{t/e}) = O(1.44^t), since e^{1/e} ≈ 1.44
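The DP over the tree can be sketched as follows in Python. This is the plain version that enumerates all independent subsets of each bag, not the improved bound and not the authors' implementation; the tree decomposition is assumed to be given, and the input encoding (`bags` as a list of sets, `tree` as an adjacency dict over bag indices) is chosen here for illustration.

```python
from itertools import combinations


def mwis_tree_decomposition(weights, edges, bags, tree, root=0):
    """Maximum weight of an independent set, by DP over a tree decomposition."""
    adj = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def independent_subsets(bag):
        members = sorted(bag)
        for r in range(len(members) + 1):
            for sub in combinations(members, r):
                if all(b not in adj[a] for a, b in combinations(sub, 2)):
                    yield frozenset(sub)

    def solve(node, parent):
        # table[s] = best weight in this subtree when exactly s is chosen in the bag
        table = {s: sum(weights[v] for v in s)
                 for s in independent_subsets(bags[node])}
        for child in tree[node]:
            if child == parent:
                continue
            child_table = solve(child, node)
            shared = bags[node] & bags[child]
            # Best child value per choice on the shared vertices, with the
            # shared weight removed so it is not counted twice.
            best = {}
            for s_child, value in child_table.items():
                key = s_child & shared
                gain = value - sum(weights[v] for v in key)
                if gain > best.get(key, float("-inf")):
                    best[key] = gain
            for s in table:
                table[s] += best[s & shared]
        return table

    return max(solve(root, None).values())
```

Each bag of width t contributes at most 2^(t+1) subsets, which is where the exponential factor in the running time comes from. For a path a-b-c-d with weights 1, 4, 5, 4 and bags {a, b}, {b, c}, {c, d}, the result is 8 (vertices b and d).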

This work (experimental results)

Data sets:

50 tRNAs (length 71 - 79)

50 pseudoknots (23 - 113)

11 large RNAs (210 - 412)

Compared with:

PKnots (DP, optimal, restricted pseudoknots)

ILM (heuristic, unrestricted)

HotKnots (heuristic, unrestricted)

Measure:

sensitivity = TP/Real total

specificity = TP/(TP+FP)

Time
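The two accuracy measures can be computed directly from sets of base pairs. A minimal Python sketch, assuming that predicted and real structures are given as collections of (i, j) position pairs and that TP counts predicted pairs that also appear in the real structure; the function name is illustrative.

```python
def pair_accuracy(predicted, real):
    """Return (sensitivity, specificity) for predicted vs. real base pairs."""
    predicted, real = set(predicted), set(real)
    tp = len(predicted & real)   # predicted pairs that are real
    fp = len(predicted - real)   # predicted pairs that are not real
    sensitivity = tp / len(real) if real else 0.0
    specificity = tp / (tp + fp) if predicted else 0.0
    return sensitivity, specificity
```

With four real pairs and three predicted pairs of which two are correct, sensitivity is 2/4 and specificity is 2/3.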

This work (experimental results)

Conclusion

• A new graph-theoretic algorithm for RNA folding

• Performance comparable with the best in both accuracy and speed

• With much room for improvement

• Applications in multiple structure alignment as well as in folding single sequence

• A part of NIH project for ncRNA gene search
