Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004

Page 1

Non-coding RNA

William Liu

CS374: Algorithms in Biology

November 23, 2004

Page 2

Non-Coding RNA

Background Basics
• Biology Overview
• Why ncRNA - Central Dogma?
• Problem Space
• HMM/sCFG Solution

Paper: Pair HMMs on Tree Structures
• Alignment of Trees, Structural Alignment
• Experimental Evaluation

Conclusion

Page 3

Central Dogma of Molecular Biology

Page 4

Biology Overview

Traditional view:
• RNA merely plays an accessory role
• Complexity is defined by proteins encoded in the genome

Page 5

Biology Overview

• Non-coding RNA (ncRNA) is an RNA molecule that functions without being translated into a protein
• Most prominent examples: Transfer RNA (tRNA), Ribosomal RNA (rRNA)

Page 6

Why Non-coding RNA

• Protein-coding genes can’t account for all complexity
• ncRNA is important!
• Gene regulators

Source: Genome Biol. 2002; Beyond The Proteome: Non-coding Regulatory RNAs

Page 7

Non-coding RNA Problems

• Finding ncRNA genes in the genome: locate these genes
• Finding homologs of ncRNA: figure out what they do

Page 8

Finding ncRNA Genes

Protein Approaches
• Statistically biased (codon triplets)
• Open Reading Frames

ncRNA Approaches
• High GC content (hyperthermophiles)
• Promoter/Terminator identification (E. coli)
• Comparative Genome Analysis
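The GC-content approach listed above can be sketched as a sliding-window scan. This is a minimal illustration, not any published gene finder: the function names, window size, step, and threshold are all assumptions chosen for the example.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_rich_windows(genome: str, window: int = 50, step: int = 10,
                    threshold: float = 0.6):
    """Return (start, gc_fraction) for every window at or above threshold.

    The window size, step, and threshold are illustrative defaults only.
    A genome shorter than the window is scored as one partial window.
    """
    hits = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        frac = gc_content(genome[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


if __name__ == "__main__":
    # Toy sequence: an AT-rich stretch followed by a GC-rich stretch.
    print(gc_rich_windows("AAAAGGGG", window=4, step=4))
```

A scan like this only flags candidate regions; it is useful mainly where the background genome is AT-rich, as in the hyperthermophile case the slide mentions.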

Page 9

Genetic Code
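Protein gene finders lean on the genetic code through open reading frames. A minimal sketch of an ORF scan follows, assuming the standard start codon ATG and stop codons TAA, TAG, TGA, and looking at the forward DNA strand only. `find_orfs` and `min_codons` are hypothetical names introduced for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna: str, min_codons: int = 2):
    """Return sorted (start, end) half-open spans of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame through the next
    in-frame stop codon (stop included). Reverse-strand ORFs are ignored
    to keep the sketch short.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == START_CODON:
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATGACC"))
```

ncRNA genes are not translated, so they need not contain any such frame, which is why this signal does not carry over to ncRNA gene finding.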

Page 10

Similarity Searching

Proteins
• BLAST, Sequence Alignment (DP)
• Genes that code for proteins are conserved across genomes (e.g. low rate of mutation)

ncRNA
• Secondary structure usually conserved
• Alignment scoring based on structure is imperative
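The sequence alignment (DP) mentioned for proteins can be sketched as a standard global alignment score in the Needleman-Wunsch style. The scoring values (+1 match, -1 mismatch, -1 gap) and the function name are assumptions for illustration; real tools use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of a and b by dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,                     # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGU", "AGU"))
```

This score looks only at the symbols in each column. It has no notion of which positions base-pair, which is the limitation the ncRNA bullets point to.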

Page 11

ncRNA: Sequence vs Structure

Page 12

Alignment Approaches

• sCFGs: Modeling secondary structure, scoring sequences
• HMM for scoring of sequence and secondary structure alignment
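To make the sCFG idea concrete, here is a toy stochastic grammar with a single nonterminal S and the rules S → x S y (base pair), S → x S, S → S x (unpaired base), and S → ε, scored with a Viterbi-style CYK recursion. The grammar, all probabilities, and the function names are assumptions invented for this sketch; it is not the model from any of the cited works, and it has no bifurcation rule, so it can only describe a single stem with bulges.

```python
import math

# Rule probabilities for S (assumed values; they sum to 1).
PAIR_P, LEFT_P, RIGHT_P, END_P = 0.4, 0.25, 0.25, 0.1

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
SINGLE_EMISSION = 0.25  # uniform over A, C, G, U


def pair_emission(x: str, y: str) -> float:
    """Assumed pair emissions: 4*0.2 + 2*0.05 + 10*0.01 = 1."""
    if (x, y) in WATSON_CRICK:
        return 0.2
    if (x, y) in WOBBLE:
        return 0.05
    return 0.01


def best_parse_log_prob(w: str) -> float:
    """Log-probability of the most likely derivation of w from S."""
    n = len(w)
    # best[i][j] scores the substring w[i:j] (0-based, half-open).
    best = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        best[i][i] = math.log(END_P)  # S -> epsilon
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            options = [
                math.log(LEFT_P * SINGLE_EMISSION) + best[i + 1][j],   # S -> x S
                math.log(RIGHT_P * SINGLE_EMISSION) + best[i][j - 1],  # S -> S x
            ]
            if length >= 2:  # S -> x S y
                pair = PAIR_P * pair_emission(w[i], w[j - 1])
                options.append(math.log(pair) + best[i + 1][j - 1])
            best[i][j] = max(options)
    return best[0][n]


if __name__ == "__main__":
    # A sequence whose ends can pair scores higher than one whose ends cannot.
    print(best_parse_log_prob("GGGAAACCC") > best_parse_log_prob("GGGAAAGGG"))
```

Because the pairing rule emits both ends of a stem together, the score rewards complementary ends regardless of which bases they are. That is the sense in which a grammar models structure rather than sequence alone.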

Page 13

Pair HMMs on Tree Structures

Outline
• Alignment on Trees
• Structural Alignment
  • Secondary Structure Representation
  • Hidden Markov Model
  • Recurrence Relations
• Experimental Evaluation
• Future Work

Page 14

Alignment on Trees

[Figure: two trees, each with nodes labeled a through i]

Page 15

Structural Alignment

Problem: Given an RNA sequence with known secondary structure and an RNA sequence of unknown structure, obtain the optimal alignment of the two.

[Figure: an unfolded RNA sequence (A U C G A A A G A U G) next to a folded RNA with known secondary structure]

Page 16

Structural Representation

Skeletal Tree
• (, ): Branch Structure
• (X, , Y): Base-pairs
• (X, ) or (, Y): Unpaired bases
• X, Y ∈ {A, U, G, C}
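One way to picture a tree of this kind is to build it from a sequence and its dot-bracket structure string. The sketch below rests on several assumptions: `None` stands in for the empty slot of a label, the root is an artificial branch node, and every unpaired base is stored as `(X, None)` rather than distinguishing the left and right forms. `Node` and `build_tree` are hypothetical names, and the exact tree shape used in the paper may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Label = Tuple[Optional[str], Optional[str]]


@dataclass
class Node:
    label: Label
    children: List["Node"] = field(default_factory=list)


def build_tree(seq: str, structure: str) -> Node:
    """Build a labeled tree from a sequence and a dot-bracket string.

    '(' opens a base pair, ')' closes it, '.' is an unpaired base.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    root = Node((None, None))          # artificial branch node
    stack = [root]
    for base, sym in zip(seq, structure):
        if sym == "(":
            node = Node((base, None))  # right base is filled in at ')'
            stack[-1].children.append(node)
            stack.append(node)
        elif sym == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()
            node.label = (node.label[0], base)
        elif sym == ".":
            stack[-1].children.append(Node((base, None)))
        else:
            raise ValueError(f"unexpected structure symbol {sym!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


if __name__ == "__main__":
    tree = build_tree("GGGAAACCC", "(((...)))")
    print(tree.children[0].label)
```

In this layout a stem becomes a chain of pair nodes, a loop becomes a run of unpaired leaves, and a node with more than one pair child marks a branch point.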

Page 17

Hidden Markov Model

• M: Match state, I: Insertion state, D: Deletion state
• State transition probability from X to Y, for X, Y ∈ {M, I, D}
• Initial probability of state X
• P^O_X(x, y): Emission probability for pair x, y in state X
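The three parameter sets of a pair HMM (initial, transition, and emission probabilities over the states M, I, D) can be sketched as plain tables, together with the log-probability of one given alignment path. All numeric values are made up for illustration. The convention that D emits `(x, '-')` and I emits `('-', y)` is an assumption, and this is the ordinary pair HMM on sequences, not the tree-structured version from the paper.

```python
import math

# Assumed initial probabilities over the states (sum to 1).
INITIAL = {"M": 0.8, "I": 0.1, "D": 0.1}

# Assumed transition probabilities; each row sums to 1.
TRANSITION = {
    "M": {"M": 0.8, "I": 0.1, "D": 0.1},
    "I": {"M": 0.6, "I": 0.3, "D": 0.1},
    "D": {"M": 0.6, "I": 0.1, "D": 0.3},
}


def emission(state: str, x: str, y: str) -> float:
    """Emission probability of the column (x, y) in the given state.

    Match: 4 identical pairs at 0.16 plus 12 mismatches at 0.03 sum to 1.
    Insert/Delete: one real base, uniform over A, C, G, U; the other
    side of the column is the gap symbol '-'.
    """
    if state == "M":
        return 0.16 if x == y else 0.03
    return 0.25


def alignment_log_prob(columns) -> float:
    """Log-probability of a path given as (state, x, y) alignment columns."""
    if not columns:
        raise ValueError("alignment must have at least one column")
    log_p = 0.0
    prev = None
    for state, x, y in columns:
        step = INITIAL[state] if prev is None else TRANSITION[prev][state]
        log_p += math.log(step) + math.log(emission(state, x, y))
        prev = state
    return log_p


if __name__ == "__main__":
    path = [("M", "A", "A"), ("D", "C", "-"), ("M", "G", "G")]
    print(alignment_log_prob(path))
```

Finding the best alignment means maximizing this quantity over all paths, which is what the dynamic-programming recurrences for the Match, Delete, and Insert states compute.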

Page 18

Notation

• Let w = a_1 a_2 … a_n be an unfolded RNA sequence of length n
• Let w[i] denote the i-th symbol in w
• Let w[i,j] denote the substring a_i a_(i+1) … a_j of w
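The sequence notation is 1-indexed and inclusive at both ends, which differs from Python's 0-based, half-open slices. A small sketch of the mapping, with hypothetical helper names `symbol` and `substring`:

```python
def symbol(w: str, i: int) -> str:
    """w[i] in the slide's notation: the i-th symbol, counting from 1."""
    if not 1 <= i <= len(w):
        raise IndexError(f"position {i} is outside 1..{len(w)}")
    return w[i - 1]


def substring(w: str, i: int, j: int) -> str:
    """w[i,j] in the slide's notation: symbols i through j, inclusive."""
    if not (1 <= i and j <= len(w)):
        raise IndexError(f"range {i}..{j} is outside 1..{len(w)}")
    return w[i - 1:j]  # empty when j < i


if __name__ == "__main__":
    w = "AUCGA"
    print(symbol(w, 1), substring(w, 2, 4))
```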

Page 19

Notation

• Let T be a skeletal tree representing a folded RNA sequence (known structure)
• Let v(j) denote the label of node j in tree T
• Let T[j] denote the subtree rooted at node j in tree T
• Let j_n denote the n-th child of node j in tree T
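The tree notation maps onto a small indexed data structure. The sketch below assumes nodes are identified by integers and children are kept in left-to-right order. The class name, method names, and the toy labels are hypothetical, chosen only to mirror v(j), T[j], and the n-th child.

```python
class SkeletalTree:
    """Tree addressed by node id, mirroring the slide's notation."""

    def __init__(self, labels, children):
        self.labels = labels      # node id -> label
        self.children = children  # node id -> ordered list of child ids

    def v(self, j):
        """v(j): the label of node j."""
        return self.labels[j]

    def child(self, j, n):
        """The n-th child of node j, counting from 1."""
        kids = self.children.get(j, [])
        if not 1 <= n <= len(kids):
            raise IndexError(f"node {j} has no child number {n}")
        return kids[n - 1]

    def subtree(self, j):
        """T[j]: ids of all nodes in the subtree rooted at j, in preorder."""
        nodes = [j]
        for kid in self.children.get(j, []):
            nodes.extend(self.subtree(kid))
        return nodes


if __name__ == "__main__":
    tree = SkeletalTree({1: "a", 2: "b", 3: "c", 4: "d"}, {1: [2, 3], 3: [4]})
    print(tree.v(1), tree.child(1, 2), tree.subtree(3))
```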

Page 20

Recurrence Relation (Match)

Page 21

Recurrence Relation (Delete)

Page 22

Recurrence Relation (Insert)

Page 23

Structural Alignment

Intuition: Given the ncRNA sequence b with unknown structure, generate a predicted folded structure for b, then align the resulting tree with the ncRNA a, whose secondary structure is known.

Complexity: O(K M N^3)
• K = number of states in the pair HMM
• M = size of the skeletal tree
• N = length of the unfolded sequence
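To get a feel for the stated bound, which is linear in K and M and cubic in N, the sketch below evaluates the product K · M · N³ for illustrative values. The numbers are assumptions, not figures from the paper: K = 3 for the Match, Insert, and Delete states, a hypothetical tree of 60 nodes, and a sequence of 76 bases (roughly tRNA-sized). Constant factors are ignored, so the result is a scaling estimate only.

```python
def dp_cost(k_states: int, tree_size: int, seq_len: int) -> int:
    """Leading-order operation count K * M * N^3 (constants ignored)."""
    return k_states * tree_size * seq_len ** 3


if __name__ == "__main__":
    base = dp_cost(3, 60, 76)
    doubled = dp_cost(3, 60, 152)
    print(base)             # estimate for the illustrative values
    print(doubled // base)  # growth factor when only N doubles
```

The cubic term dominates: doubling the length of the unfolded sequence multiplies the estimate by eight, while doubling the tree size only doubles it.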

Page 24

Experimental Evaluation

• Dynamic programming to calculate the recurrence relations; prototype system to execute the algorithm
• Experiments on 2 families of RNA: transfer RNAs and hammerhead ribozymes

Page 25

Parameters

Gorodkin et al. (1997)

Page 26

Results: tRNA

Page 27

Results: Hammerhead Ribozyme

Page 28

Future Work

• Because the method is based on dynamic programming (of pairwise alignment), many DP techniques can apply
• Refine emission probabilities, relate score matrix (reliable alignment for RNA families)

Page 29

Conclusions

• ncRNA space is quite open - no really great techniques yet
• How many ncRNA genes are there?
• Absence of evidence ≠ evidence of absence
• Eddy’s call to arms: “it is time for RNA computational biologists to step up”

Page 30

Thanks!

Page 31

References

• Sakakibara, Y., “Pair Hidden Markov Models on Tree Structures”, Bioinformatics, 19:232-240, 2003
• Eddy, S., “Computational Genomics of Noncoding RNA Genes”, Cell, 109:137-140, 2002
• Szymanski, M., Barciszewski, J., “Beyond The Proteome: Non-coding Regulatory RNAs”, Genome Biol., 2002