Non-coding RNA
William Liu
CS374: Algorithms in Biology
November 23, 2004
Non-Coding RNA
Background Basics
• Biology Overview
• Why ncRNA - Central Dogma?
• Problem Space
• HMM/sCFG Solution
Paper: Pair HMMs on Tree Structures
• Alignment of Trees, Structural Alignment
• Experimental Evaluation
Conclusion
Central Dogma of Molecular Biology
Biology Overview
RNA merely plays an accessory role
Complexity is defined by proteins encoded in the genome
Biology Overview
Non-coding RNA (ncRNA) is an RNA molecule that functions without being translated into a protein
Most prominent examples: Transfer RNA (tRNA), Ribosomal RNA (rRNA)
Genome Biol. 2002; Beyond The Proteome: Non-coding Regulatory RNAs
Why Non-coding RNA
Protein-coding genes can’t account for all complexity
ncRNA is important!
Gene regulators
Non-coding RNA Problems
Finding ncRNA genes in the genome: locate these genes
Finding Homologs of ncRNA: figure out what they do
Finding ncRNA Genes
Protein Approaches
• Statistically biased (codon triplets)
• Open Reading Frames
ncRNA Approaches
• High GC content (hyperthermophiles)
• Promoter/Terminator identification (E. coli)
• Comparative Genome Analysis
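The GC-content approach listed above can be sketched as a sliding-window scan. This is a minimal illustration, not a method taken from the talk: the function names, the window size, and the threshold are all assumptions chosen for the example.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_rich_windows(genome: str, window: int, threshold: float):
    """Yield (start, gc) for each window whose GC fraction meets the threshold.

    `window` and `threshold` are illustrative knobs, not published values.
    """
    for start in range(len(genome) - window + 1):
        gc = gc_content(genome[start:start + window])
        if gc >= threshold:
            yield start, gc
```

For example, `list(gc_rich_windows("ATATGCGCAT", 4, 1.0))` reports only the window starting at index 4 (`GCGC`). A real screen would still need a background model for the genome in question.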
Comparative Genome Analysis
Genetic Code
Similarity Searching
Proteins
• BLAST, Sequence Alignment (DP)
• Genes that code for proteins are conserved across genomes (e.g. low rate of mutation)
ncRNA
• Secondary structure usually conserved
• Alignment scoring based on structure is imperative
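As a reminder of what "Sequence Alignment (DP)" means for plain sequences, here is a minimal sketch of Needleman-Wunsch global alignment scoring. The match, mismatch, and gap values are arbitrary assumptions for illustration, and the function name is hypothetical.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]
```

This score looks only at the symbols themselves. Two RNAs whose paired bases have mutated together (for example a G-C pair replaced by an A-U pair) can keep the same fold while scoring poorly here, which is why the slide calls for structure-based scoring.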
ncRNA: Sequence vs Structure
Alignment Approaches
sCFGs: Modeling secondary structure, scoring sequences
HMM for scoring of sequence and secondary structure alignment
Pair HMMs on Tree Structures
Outline
Alignment on Trees
Structural Alignment
• Secondary Structure Representation
• Hidden Markov Model
• Recurrence Relations
Experimental Evaluation
Future Work
Alignment on Trees
[Figure: two trees, each with nodes labeled a through i]
Structural Alignment
Problem: Given an RNA sequence with known secondary structure and an RNA sequence of unknown structure, obtain the optimal alignment of the two
[Figure: an RNA sequence with known secondary structure and an unfolded RNA sequence]
Structural Representation
Skeletal Tree
(, ): Branch Structure
(X, , Y): Base-pairs
(X, ) or (, Y): Unpaired bases
X, Y ∈ {A, U, G, C}
Hidden Markov Model
M: Match state, I: Insertion state, D: Deletion state
XY: State transition probability from X to Y
X: Initial probability
POX(x,y): Emission probability for pair x,y
X, Y ∈ {M, I, D}
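To make the three states concrete, here is a minimal sketch of Viterbi decoding for an ordinary pair HMM on two linear sequences, worked in log space. It is not the tree-structured recurrence of the paper, it omits an explicit end state, and every name in it (`init`, `trans`, `emit_pair`, `emit_single`) is a hypothetical stand-in for the initial, transition, and emission probabilities listed above.

```python
import math

NEG_INF = float("-inf")
STATES = ("M", "I", "D")


def pair_hmm_viterbi(x, y, init, trans, emit_pair, emit_single):
    """Log-probability of the best state path aligning x and y.

    init[s]          initial probability of state s
    trans[s][t]      transition probability from state s to state t
    emit_pair(a, b)  probability that M emits the aligned pair (a, b)
    emit_single(a)   probability that I or D emits the single symbol a
    """
    n, m = len(x), len(y)
    # v[s][i][j]: best log-probability of aligning x[:i] with y[:j], ending in s
    v = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in STATES}

    def log(p):
        return math.log(p) if p > 0 else NEG_INF

    def best_prev(state, i, j):
        # At the start of the alignment, use the initial distribution.
        if i == 0 and j == 0:
            return log(init[state])
        return max(v[p][i][j] + log(trans[p][state]) for p in STATES)

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # M consumes one symbol from each sequence
                v["M"][i][j] = (best_prev("M", i - 1, j - 1)
                                + log(emit_pair(x[i - 1], y[j - 1])))
            if i > 0:            # D consumes a symbol of x only
                v["D"][i][j] = best_prev("D", i - 1, j) + log(emit_single(x[i - 1]))
            if j > 0:            # I consumes a symbol of y only
                v["I"][i][j] = best_prev("I", i, j - 1) + log(emit_single(y[j - 1]))
    return max(v[s][n][m] for s in STATES)
```

Storing back-pointers alongside `v` would recover the alignment itself. The tree version replaces the index `i` over one sequence with nodes of the skeletal tree.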
Notation
Let w = a_1 a_2 … a_n be an unfolded RNA sequence of length n
Let w[i] denote the ith symbol in w
Let w[i,j] denote the substring a_i a_(i+1) … a_j of w
Notation
Let T be a skeletal tree representing a folded RNA sequence (known structure)
Let v(j) denote the label of node j in tree T
Let T[j] denote the subtree rooted at node j in tree T
Let j_n denote the nth child of node j in tree T
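The notation above maps onto a few lines of code. This is only an illustrative sketch: the `Node` class and helper names are assumptions, and the label is kept as a plain string; the paper's actual node-label format is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


def symbol(w: str, i: int) -> str:
    """w[i]: the i-th symbol of w, counting from 1 as on the slide."""
    return w[i - 1]


def substring(w: str, i: int, j: int) -> str:
    """w[i,j]: the substring a_i ... a_j, inclusive at both ends."""
    return w[i - 1:j]


@dataclass
class Node:
    """One node j of a tree T: v(j) is `label`, j_n is `child(n)`."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def child(self, n: int) -> "Node":
        """j_n: the n-th child, counting from 1."""
        return self.children[n - 1]

    def subtree_size(self) -> int:
        """Number of nodes in T[j], the subtree rooted at this node."""
        return 1 + sum(c.subtree_size() for c in self.children)
```

The 1-based helpers exist only to match the slide's indexing; Python's own slices are 0-based and exclude the end index.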
Recurrence Relation (Match)
Recurrence Relation (Delete)
Recurrence Relation (Insert)
Structural Alignment
Intuition: Given the ncRNA sequence b with unknown structure, generate a predicted folded structure for b, then align the resulting tree with the ncRNA a, whose secondary structure is known.
Complexity: O(K M N^3)
• K = number of states in the pair HMM
• M = size of the skeletal tree
• N = length of the unfolded sequence
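To get a feel for how the bound scales, the product can be evaluated directly. The helper below is a hypothetical illustration that ignores constant factors, and the input sizes in the usage note are made up rather than taken from the experiments.

```python
def dp_cell_updates(num_states: int, tree_size: int, seq_len: int) -> int:
    """Rough operation count K * M * N^3, ignoring constant factors."""
    return num_states * tree_size * seq_len ** 3
```

With K = 3 (M, I, D), M = 10, and N = 20 this gives 240,000; doubling N to 40 multiplies the count by 8, so the length of the unfolded sequence dominates the cost.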
Experimental Evaluation
Dynamic programming to calculate the recurrence relations, prototype system to execute the algorithm
Experiments on 2 families of RNA: transfer RNAs and hammerhead ribozymes
Parameters
Gorodkin et al. (1997)
Results: tRNA
Results: Hammerhead Ribozyme
Future Work
Since the method is based on dynamic programming (as in pairwise alignment), many DP techniques can be applied
Refine emission probabilities and the related score matrix (reliable alignment for RNA families)
Conclusions
ncRNA space is quite open - no really great techniques yet
How many ncRNA genes are there?
Absence of evidence ≠ evidence of absence
Eddy’s call to arms: “it is time for RNA computational biologists to step up”
Thanks!
References
Sakakibara, Y., “Pair Hidden Markov Models on Tree Structures”, Bioinformatics, 19:232-240, 2003
Eddy, S., “Computational Genomics of Noncoding RNA Genes”, Cell, 109:137-140, 2002
Szymanski, M., Barciszewski, J., “Beyond The Proteome: Non-coding Regulatory RNAs”, Genome Biol., 2002