Computational prediction of target genes of microRNAs
by
M. Hossein Radfar
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2014 by M. Hossein Radfar
Abstract
Computational prediction of target genes of microRNAs
M. Hossein Radfar
Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
2014
MicroRNAs (miRNAs) are a class of short (21-25 nt) non-coding endogenous RNAs
that mediate the expression of their direct target genes post-transcriptionally. The goal
of this thesis is to identify the target genes of miRNAs using computational methods.
The most popular computational target prediction methods rely on sequence-based
determinants to predict targets. However, these determinants are neither sufficient nor
necessary to identify functional target sites, and commonly ignore the cellular conditions
in which miRNAs interact with their targets in vivo. Since miRNA activity reduces the
steady-state abundance of mRNA targets, the main goal of this thesis is to use large-scale
gene expression profiles as a supplement to sequence-based computational miRNA
target prediction techniques. We develop two computational miRNA target prediction
methods: InMiR and BayMiR; in addition, we study the interaction between miRNAs
and lncRNAs using long RNA expression data.
InMiR is a computational method that predicts the targets of intronic miRNAs based
on the expression profiles of their host genes across a large number of datasets. InMiR can
also predict which host genes have expression profiles that are good surrogates for those
of their intronic miRNAs. Host genes that InMiR predicts to be bad surrogates contain
significantly more miRNA target sites in their 3’ UTRs and are significantly more likely
to have predicted Pol II and Pol III promoters in their introns.
We also develop BayMiR, which scores miRNA-mRNA pairs based on the endogenous
footprint of miRNAs on gene expression on a genome-wide scale. BayMiR provides an
“endogenous target repression” index that identifies the contribution of each miRNA to
repressing a target gene in the presence of other targeting miRNAs.
This thesis also addresses the interactions between miRNAs and lncRNAs. Our analysis
of the expression abundance of long RNA transcripts (mRNA and lncRNA) shows that
the lncRNA target sets of some miRNAs have relatively low abundance in the tissues
in which these miRNAs are highly active. We also found that lncRNAs and mRNAs that
share many targeting miRNAs are significantly positively correlated, indicating that this
set of highly expressed lncRNAs may act as miRNA sponges to promote mRNA regulation.
Acknowledgements
I would like to thank my supervisors and members of my committee Willy Wong, Quaid
Morris, and Zhaolei Zhang. I would like to thank Willy Wong for all his friendly guidance
and support throughout my PhD study. I also would like to thank Quaid Morris whose
curiosity, enthusiasm, support, and suggestions have greatly improved the quality of my
research. Additionally, I thank the members of Morris Lab and Sensory Communications,
both past and present, for valuable discussions and advice. Finally, I would like to thank
Nastaran, my wife, for her endless support, encouragement, and patience.
Contents
1 Introduction
2 Background and Literature Review
2.1 Small RNAs
2.2 miRNA biogenesis
2.2.1 Mechanisms of miRNA-mediated gene regulation
2.3 Identification of miRNA targets
2.3.1 miRNA over-expression experiments
2.3.2 miRNA knockdown (antagonism) experiments
2.3.3 Prediction based on HITS/PAR-CLIP
2.3.4 Target prediction using luciferase reporters
2.3.5 Measuring the protein output after miRNA over-expression or antagonism
2.4 Computational miRNA target prediction methods
2.4.1 TargetScan
2.4.2 PicTar
2.4.3 miRSVR-miRanda
2.4.4 GenMiR
2.4.5 HOCTAR
2.4.6 COMETA
2.4.7 Sylamer
2.4.8 MIRZA
2.5 Conclusions
3 Intronic miRNAs and prediction of their targets
3.1 Intronic miRNAs
3.2 Method: InMiR
3.2.1 Computing weights for putative miRNA regulators on individual datasets
3.2.2 Mapping host gene weights to miRNA weights
3.2.3 Combining multiple datasets to predict functional targets
3.2.4 Determining a cutoff value for significant interactions
3.2.5 Predicting miRNA targets using inverse correlation (CORR method)
3.2.6 Processing hosts and targets data
3.3 Results
3.3.1 Data set
3.3.2 Detecting good host gene surrogate
3.3.3 Targeting of host genes by miRNAs partially explains their predicted surrogacy
3.3.4 Correlation measurements are not good indicators of surrogacy
3.4 Discussion
4 BayMiR: a computational miRNA target prediction method
4.1 Introduction
4.2 Results
4.2.1 BayMiR method
4.2.2 BayMiR identifies highly repressed targets on miRNA over-expression assays
4.2.3 miRNA activity and expression profiles are significantly correlated
4.2.4 mRNAs harboring miRNA target sites near both ends of the 3’ UTR have higher endogenous down-regulation signals
4.3 Materials and Methods
4.3.1 BayMiR model
4.3.2 Processing mRNA expression data
4.3.3 miRNA-mRNA interaction analysis
4.3.4 Enrichment analysis
4.3.5 Availability of BayMiR and supporting data
4.4 Discussion
5 Impact of miRNAs on long non-coding RNAs
5.1 Background
5.2 Results
5.2.1 lncRNA targets of some miRNAs have relatively low expression in the tissues in which the miRNAs are highly active
5.2.2 lncRNAs that are significantly positively correlated with mRNAs may decoy their common targeting miRNAs
5.2.3 Highly expressed lncRNAs in the cytoplasm contain significantly fewer seed match sites than those in the nucleus
5.2.4 High relative number of lncRNA targets in allosomes and chromosomes 20-22
5.2.5 lncRNAs that contain seed match sites have significantly higher expression compared to those that lack seed match sites
5.2.6 Discussion
5.3 Materials
5.3.1 Microarray data
5.3.2 Measuring correlation between mRNAs and lncRNAs
5.3.3 Hyper-geometric test analysis
5.3.4 Identifying the complementary seed match sites in the lncRNA transcripts
6 Conclusions and Future Work
6.1 Conclusions
6.2 Future directions
6.2.1 A Bayesian approach to decipher the TF-miRNA-mRNA-lncRNA regulatory network
6.2.2 Identifying lncRNA binding sites complementary to mRNA sequences
6.2.3 Using sequence and expression evidence in parallel
Bibliography
List of Tables
3.1 The description of symbols used in this chapter
3.2 InMiR procedure
5.1 Comparison between the expression level of cytoplasmic and nuclear lncRNAs
List of Figures
2.1 miRNA biogenesis
3.1 Interaction between hosts, targets, and intronic miRNAs using DAG
3.2 Simplified interaction between hosts, targets, and intronic miRNAs
3.3 Plots a-d: the CDFs of the weights wigk
3.4 Receiver Operating Characteristic (ROC) curve analysis
3.5 Boxplots of weights obtained from the procedure described in Table I
3.6 The interaction network of target and host genes of intronic miRNAs
3.7 Distribution of number of putative targets of intronic miRNAs
3.8 The scatter plot of good and bad surrogate host genes
3.9 Venn diagrams showing overlap between good and bad surrogates
3.10 The CDF of the number of miRNAs targeting host (blue) and non-host genes
3.11 Number of intergenic and intronic miRNAs
3.12 Host genes targeted by intronic miRNAs of other hosts
3.13 Correlation coefficients averaged over five correlation datasets
3.14 Scatter plots of five correlation datasets
3.15 Intronic miRNAs comprise a significant portion of miRNAs in species
3.16 Regulatory mechanisms of intronic miRNAs
3.17 The host genes targeted by their own intronic miRNAs
3.18 Host gene and intronic miRNA resemble a “biological switch”
4.1 BayMiR method
4.2 BayMiR performance in the miRNA over-expression experiments
4.3 Cumulative distribution of scores for the validated targets
4.4 Comparing BayMiR and Cometa bar plots
4.5 Comparing BayMiR and Cometa score CDFs
4.6 Validated KEGG pathways
4.7 Enrichment Map
4.8 WNT signaling pathway
4.9 KEGG “Pathways in cancer”
4.10 miRNA targeting
4.11 mRNAs harboring miRNA target sites near both ends of the 3’ UTR
4.12 BayMiR and position contribution scores
4.13 BayMiR predicts down-regulated genes in samples not included in training data. Blue circled line: prediction error on training data; red circled line: prediction error on test data.
4.14 Estimated (red) and actual (blue) expression profiles of nine genes across 28 test samples
4.15 The 3’ UTRs of mRNAs harbor many conserved seed matches
4.16 Example of combinatorial regulation masking inverse correlation
4.17 Gene expression variability increases as the number of target sites increases
5.1 lncRNA targets have low expression in some tissues
5.2 lncRNA targets have low expression in some tissues
5.3 Highly expressed lncRNAs in the cytoplasm
5.4 Distribution of lncRNAs on the human chromosome
5.5 Abundance of target and non-target lncRNAs in 26 different tissues
6.1 Proposed approach
Chapter 1
Introduction
MicroRNAs (miRNAs) are short (21-25 nt) non-coding RNAs that repress the expression
of their direct target mRNAs [1, 2]. miRNAs play critical roles in a wide range of
normal and disease-related biological processes [3, 4]. miRNA biogenesis can be concisely
described as follows (for details see Chapter 2). Primary miRNAs (pri-miRNAs)
are transcribed from intragenic or intergenic genomic loci and cleaved by Drosha to form
approximately 70-nt hairpin precursors (called pre-miRNAs) that are subsequently cleaved
by the RNase III enzyme, Dicer, to generate miRNA duplexes. One strand of the duplex,
the mature miRNA, is loaded into the RNA-induced silencing complex (RISC) [5] and
guides it to recognize mRNA targets through partial base pairing with the 3’ UTRs of
targets [6].
The presence of target sites with perfect complementarity to the seed region (positions
2-7 from the 5’ end) of miRNAs is a strong predictor of targeting, but it is neither
sufficient nor necessary [6–10]. Over the last decade many other sequence determinants
have been proposed to specify efficient mRNA-miRNA duplexes, including: AU composition
flanking target sites [7], thermodynamic stability of binding sites [11], evolutionary
conservation of the seed [12–14], secondary structure accessibility [5, 15–17], target-site
abundance [18, 19], seed-pairing stability [18], 3’ pairing contribution [7], a loop in positions
9-12 of miRNA-mRNA hybrids [9], and the binding location in the 3’ UTR [7, 17]. Due
to the limited number of validated miRNA targets, the exact specificity and sensitivity
of current determinants are unclear [20–23]; however, estimates of precision of these de-
terminants, alone or together, are typically reported to be about 50% at a sensitivity of
6-12% [24, 25], suggesting that sequence-based prediction methods are not fully capturing
miRNA target preferences.
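As an illustration of the seed-match determinant described above, the following minimal Python sketch scans a 3’ UTR for sites that are perfectly complementary to positions 2-7 of a miRNA. The function names and the toy sequences are assumptions introduced here for illustration only; the sketch checks canonical Watson-Crick pairing and does not model any of the other determinants listed above.

```python
from typing import List

# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_positions(mirna: str, utr: str) -> List[int]:
    """Return 0-based start positions in `utr` of perfect seed matches.

    The seed is taken as positions 2-7 (1-based) from the 5' end of
    the miRNA; a site matches if it equals the seed's reverse complement.
    """
    seed = mirna[1:7]
    site = reverse_complement(seed)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions


# Toy example (hypothetical UTR fragment).
hits = seed_match_positions("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAGGG")
```

A seed match found this way is only a candidate site; as noted above, such a match is neither sufficient nor necessary for functional targeting.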
Popular computational miRNA target prediction techniques use these sequence fea-
tures to determine the functional miRNA target sites [2, 7, 8, 12–15, 26–30]. These
techniques however ignore the cellular conditions in which miRNAs interact with their
targets in vivo. Gene expression data are rich resources that can complement sequence
features to take into account the context dependency of miRNAs. In mammals, it is
estimated that miRNAs primarily and dominantly repress the steady-state expression
level of their targets [31–39]. Therefore, down-regulation of an mRNA’s expression when
the miRNA is active is evidence of a functional target site on the gene in vivo. Although
many methods have been introduced to incorporate mRNA and miRNA expression data
into miRNA target predictions, existing methods either require paired miRNA-mRNA
data [40–53], have only been tested in miRNA transfection assays [33, 34, 54], or do not
consider the combinatorial impact of multiple miRNAs on mRNA expression [55, 56].
The main objective of this thesis is to improve the prediction accuracy of sequence-based
miRNA target prediction methods by incorporating large amounts of mRNA expression
data into miRNA target prediction. We develop computational methods that do not
require miRNA expression data because not all miRNAs are profiled, and the profiles that
exist are noisy, have insufficient replicates, and are measured inconsistently across labs
or methods [57].
Recent studies have shown that miRNAs may regulate long non-coding RNAs (lncRNAs)
and vice versa, and so indirectly impact mRNA regulation [58–62]. In this thesis, we also
study the interactions among miRNAs, mRNAs, and lncRNAs using the expression abundance
of mRNAs and lncRNAs that contain miRNA target sites in a wide range of tissues.
This thesis consists of three parts. The first part concerns predicting the mRNA
targets of miRNAs located in the introns of protein-coding genes (the so-called intronic
miRNAs). We develop InMiR, a computational method that not only predicts the targets
of intronic miRNAs but also determines whether a host mRNA level is a good surrogate for
the intronic miRNA level. Because some intronic miRNAs are co-expressed with their host
genes (they share the same transcriptional machinery), their expression levels may be
highly correlated [43, 45, 63–72]. Accordingly, an inverse correlation between the host and
target mRNAs of an intronic miRNA may be an indication of targeting. InMiR incorporates
this notion into a linear regression model in which the expression abundance of a target
mRNA is expressed in terms of the expression levels of the host mRNAs whose intronic
miRNAs have seed match sites in the target mRNA. InMiR identifies 1,935 mRNA targets
for 22 intronic miRNAs and determines that at least 30% of miRNAs are co-expressed with
their host mRNAs.
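The regression idea can be sketched in a few lines of Python. The sketch below fits an ordinary least-squares model of one target's expression on the expression profiles of candidate host genes; a negative fitted weight is then read as evidence of repression by that host's intronic miRNA. Ordinary least squares is used here only as a stand-in: the actual InMiR weighting and the combination across datasets are described in Chapter 3, and the function names and toy data are assumptions for illustration.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("host profiles are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def host_weights(target: List[float], hosts: List[List[float]]) -> List[float]:
    """Least-squares weights of each host profile for one target profile.

    `hosts[k]` is the expression of host gene k across the same samples
    as `target`. Profiles are mean-centered, so no intercept is fitted.
    """
    def centered(values: List[float]) -> List[float]:
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    y = centered(target)
    xs = [centered(h) for h in hosts]
    gram = [[sum(p * q for p, q in zip(xi, xj)) for xj in xs] for xi in xs]
    rhs = [sum(p * q for p, q in zip(xi, y)) for xi in xs]
    return solve_linear_system(gram, rhs)


# Toy data: the target falls as host A rises and rises slightly with host B.
host_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
host_b = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
target = [10.0 - 2.0 * a + 0.5 * b for a, b in zip(host_a, host_b)]
weights = host_weights(target, [host_a, host_b])
```

In this toy case the weight for host A is negative, which under the reasoning above would make its intronic miRNA a candidate regulator of the target.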
In the second part of the thesis, we develop BayMiR, a Bayesian method that scores
miRNA-mRNA pairs based on the endogenous footprint of miRNAs on genome-wide
gene expression. BayMiR provides an “endogenous target repression” score which identifies
the contribution of each miRNA to repressing a target gene in the presence of other
targeting miRNAs. BayMiR relates the changes in the log-transformed expression level
of mRNAs to the activity level of miRNAs. Since miRNA and target mRNA expression
data are anti-correlated [73], for each miRNA, BayMiR uses the negative mean of target
expression levels as an estimate of the activity level of the miRNA. We applied BayMiR
to 1,539 human miRNAs and the expression levels of 13,303 genes measured
in 5,372 microarray experiments; it predicts that approximately 60% of miRNA-mRNA
duplexes with matched conserved target sites have a detectable down-regulation signal in
gene expression.
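The activity estimate described above, the negative mean of target expression levels, can be written as a short Python sketch. The data layout (a dictionary from gene name to a list of log-expression values, one per sample) and the function name are assumptions made for illustration; any normalization applied before averaging is not specified in this chapter and is left out.

```python
from typing import Dict, List


def mirna_activity(log_expr: Dict[str, List[float]],
                   target_genes: List[str]) -> List[float]:
    """Estimate per-sample miRNA activity as the negative mean of the
    log-expression of its predicted targets.

    Genes missing from `log_expr` are skipped.
    """
    profiles = [log_expr[g] for g in target_genes if g in log_expr]
    if not profiles:
        raise ValueError("no expression data for any target gene")
    n_samples = len(profiles[0])
    return [
        -sum(profile[s] for profile in profiles) / len(profiles)
        for s in range(n_samples)
    ]


# Toy example with two samples; gene names are hypothetical.
expression = {"g1": [1.0, 3.0], "g2": [3.0, 5.0], "g3": [0.0, 0.0]}
activity = mirna_activity(expression, ["g1", "g2"])
```

Here the targets are more highly expressed in the second sample, so the estimated activity of the miRNA is lower there.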
In the third part, we study the interactions between miRNAs and lncRNAs as well
as the impact of these interactions on mRNAs. lncRNAs are suggested to act as miRNA
sponges and consequently reduce miRNA functionality. In addition, some studies indicate
that miRNAs can regulate lncRNAs post-transcriptionally in a manner similar to that of
mRNAs. Our analysis of the expression abundance of 7,535 RNA transcripts (mRNA and
lncRNA) across 27 tissues shows that the lncRNA target sets of some miRNAs have
relatively low abundance in the tissues in which these miRNAs are highly active, suggesting
that miRNAs may modulate the expression of these lncRNAs in some specific tissues. We
also found that lncRNAs and mRNAs that share many targeting miRNAs are significantly
positively correlated, indicating that this set of highly expressed lncRNAs may sponge
the miRNAs to promote mRNA regulation. Our analysis also showed that the lncRNAs
that are highly expressed in the cytoplasm are under selective pressure to have fewer target
sites than those highly expressed in the nucleus, suggesting that miRNAs may
regulate only cytoplasm-specific lncRNAs.
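The correlation analysis mentioned above can be illustrated with a minimal Python sketch that computes the Pearson correlation between an lncRNA and an mRNA profile and counts the miRNAs predicted to target both. This is only an assumed illustration of the kind of computation involved; the actual procedure and significance testing are given in Chapter 5, and the names and toy values below are hypothetical.

```python
import math
from typing import Iterable, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shared_mirnas(lnc_targeting: Iterable[str],
                  mrna_targeting: Iterable[str]) -> Set[str]:
    """miRNAs predicted to target both transcripts."""
    return set(lnc_targeting) & set(mrna_targeting)


# Toy profiles across four tissues and toy miRNA sets.
lnc_profile = [1.0, 2.0, 3.0, 4.0]
mrna_profile = [2.0, 4.0, 6.0, 8.0]
r = pearson(lnc_profile, mrna_profile)
common = shared_mirnas(["miR-a", "miR-b", "miR-c"], ["miR-b", "miR-c", "miR-d"])
```

A pair with many shared miRNAs and a strongly positive correlation would be a candidate for the sponge relationship discussed above.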
This thesis is organized into six chapters as follows:
• Chapter 2 provides background on miRNA biogenesis and describes popular experimental
and computational miRNA target prediction methods. We also discuss the
pros and cons of these methods.
• Chapter 3 describes InMiR. We verify InMiR performance and compare it with
HOCTAR, the only available intronic miRNA target prediction method. Using InMiR, we
analyze 140 Affymetrix datasets from Gene Expression Omnibus and build a network
of 19,926 interactions among 57 intronic miRNAs and 3,864 targets. InMiR
also predicts which host genes have expression profiles that are good surrogates for
those of their intronic miRNAs. We show that host genes that InMiR predicts to be bad
surrogates contain significantly more miRNA target sites in their 3’ UTRs and are
significantly more likely to have predicted Pol II and Pol III promoters in their introns.
By combining our results with previous reports, we distinguish three classes
of intronic miRNAs: those that are tightly regulated with their host gene; those
that are likely to be expressed from the same promoter but whose host gene is
highly regulated by miRNAs; and those likely to have independent promoters.
• In Chapter 4 we introduce BayMiR, a computational Bayesian method that scores
an miRNA-mRNA pair based on the endogenous repression of the mRNA induced
by the miRNA in the presence of all other miRNAs that have conserved seed match sites
in the 3’ UTR of the mRNA. We show that BayMiR assigns higher scores to predicted
miRNA targets that are more down-regulated in miRNA over-expression assays,
more enriched for independently validated targets, and more consistently annotated with
GO and KEGG terms than high-scoring TargetScan and Cometa targets.
In this chapter we also show that validated miRNA targets exhibit high
expression variability and suggest that gene expression variation can also be used
as a score for predicting miRNA targets.
• Chapter 5 addresses the possible interaction between lncRNAs and miRNAs by
analyzing lncRNA expression data measured across 26 tissues. We investigate
whether miRNAs can repress the expression of lncRNAs and whether lncRNAs can sponge
miRNAs to mediate their functions.
• Finally, Chapter 6 summarizes the achievements of the thesis and the biological
importance of the research, and gives some directions for future research in this field.
Chapter 2
Background and Literature Review
2.1 Small RNAs
The human genome encodes a wide variety of functional elements with either defined
products (e.g., mRNAs) or reproducible biochemical signatures (e.g., transcription fac-
tor binding sites). Non-coding RNAs are a class of RNAs that are distinguished from
messenger RNAs in that they are not translated into proteins. So far approximately
19,000 non-coding RNAs have been annotated in the human genome [74]. Non-coding
RNAs are divided into two sub-categories: long non-coding RNAs (>200 nt) and small
non-coding RNAs. Small ncRNAs are short (18-200 nt) RNAs with roles in almost every
aspect of biology of animals, plants, and fungi. The main small RNAs are:
• Transfer RNAs (tRNAs) are typically 73 to 93 nucleotides in length and carry amino
acids to the translation machinery.
• Small nucleolar RNAs (snoRNAs) guide chemical modifications of other RNAs, such
as methylation; they are highly involved in ribosomal RNA nucleotide modification.
• Small interfering RNAs (siRNAs), also known as silencing RNAs, are 20-25 nt
double-stranded exogenous RNAs. They interfere with the expression of
genes with complementary nucleotide sequences. There is a subtle difference
between miRNAs and siRNAs in that siRNAs are exogenous: they are either taken up by
cells or enter via vectors such as viruses. In addition, siRNAs typically bind perfectly
to their targets. siRNAs are used to validate gene function through transfection
experiments.
• Small nuclear ribonucleic acids (snRNAs) are involved in transcription, splicing,
and formation of precursor mRNAs. They are also associated with small nuclear
ribonucleoproteins (snRNPs).
• Piwi-interacting RNAs (piRNAs) interact with Piwi proteins and silence
genes. Their function remains largely unknown, but recent studies suggest that they
protect the genome from invasive transposable elements in the germline and are
expressed primarily in the testes [75].
• MicroRNAs (miRNAs) impact many biological processes through post-transcriptional
modulation of gene expression. In the following, we describe the biogenesis of miRNAs
in detail.
2.2 miRNA biogenesis
MicroRNAs (miRNAs) are a class of short (21-25 nt) non-coding RNAs that play important
roles in post-transcriptional modulation of gene expression in animals and plants.
miRNAs were first discovered in the Ambros lab in 1993 as regulators of developmental
timing in C. elegans [76] by screening for mutant phenotypes, but they were not
recognized as a distinct class of regulatory genes until 2000. miRNAs are involved
in various aspects of animal development and can function as tumor suppressors or
oncogenes; their phenotypic signatures have been found in various studies during the
past 13 years [4, 65, 77–83].
Herein, we describe the biogenesis of miRNAs in eukaryotic organisms, especially
in humans. miRNAs are encoded at different loci in the human genome, in both
genic and intergenic regions [84]. The biogenesis of miRNAs proceeds in four or
five steps and takes place in both the nucleus and the cytoplasm (Fig. 2.1). Apart from
some miRNAs residing in the introns of protein-coding genes, the process occurs
as follows: long primary miRNAs (pri-miRNAs) are transcribed from intragenic or intergenic
genomic loci by Polymerase II (Pol II) in the nucleus. Primary miRNAs are capped and
polyadenylated to maintain stability and then cleaved by an enzyme called Drosha and
its co-factor Pasha to form approximately 70-nt hairpin precursors (pre-miRNAs). Some
microRNA precursors are modified by enzymes such as TUTases [85]. The pre-miRNAs are
transported into the cytoplasm by exportin-5 and subsequently cleaved by the RNase III
enzyme, Dicer, to generate a 19-25 nt double-stranded duplex. This duplex is loaded into
the RNA-induced silencing complex (RISC) [5]. The entire composition of RISC is not yet
known, but Argonaute 1-4 proteins (AGO1-4) along with the mature miRNA have been shown
to be the main contributors to gene silencing [86]. The mature miRNA guides RISC to
recognize mRNA targets through partial base pairing with the 3’ UTRs of targets [6].
Finally, the miRNA is released and takes part in another round of regulation. The
process for some miRNAs residing in introns (i.e., intronic miRNAs) is
slightly different. This group of intronic miRNAs is processed from the spliced introns
of their host genes. In this case, introns fold into either long or short hairpin
structures; in the latter case, they directly form the precursor miRNAs and bypass
Drosha processing. This latter group is called mirtrons [87]. We discuss intronic
miRNAs in Chapter 3.
[Figure 2.1 image, reproduced from [88] (Annu. Rev. Cell Dev. Biol. 2007. 23:175-205). Original caption: miRNA biogenesis. An miRNA gene is transcribed, generally by RNA polymerase II (Pol II), generating the primary miRNA (pri-miRNA). In the nucleus, the RNase III endonuclease Drosha and the double-stranded RNA-binding domain (dsRBD) protein DGCR8/Pasha cleave the pri-miRNA to produce a 2-nt 3′ overhang containing the ∼70-nt precursor miRNA (pre-miRNA). Exportin-5 transports the pre-miRNA into the cytoplasm, where it is cleaved by another RNase III endonuclease, Dicer, together with the dsRBD protein TRBP/Loquacious, releasing the 2-nt 3′ overhang containing a ∼21-nt miRNA:miRNA∗ duplex. The miRNA strand is loaded into an Argonaute-containing RNA-induced silencing complex (RISC), whereas the miRNA∗ strand is typically degraded.]
Figure 2.1: (figure copied from [88]) Primary miRNAs (pri-miRNAs) are transcribed from intra/intergenic genomic loci and cleaved by Drosha to form approximately 70-nt hairpin precursors (pre-miRNAs) that are subsequently cleaved by the RNase III enzyme, Dicer, to generate miRNA duplexes. One strand of the duplex, the mature miRNA, is loaded into the RNA-induced silencing complex (RISC) and guides it to recognize mRNA targets through partial base pairing with the 3’ UTRs of targets.
2.2.1 Mechanisms of miRNA-mediated gene regulation
miRNA target recognition differs slightly between animals and plants. In plants, miRNAs
cleave and degrade the mRNA target through nearly perfect Watson-Crick base pairing
to the 3’ UTR region of the mRNA target. In animals, by contrast, miRNAs pair
imperfectly to their targets and the mechanism by which they interfere with gene expression
is not well understood. Overall, two mechanisms for miRNA-mediated regulation of genes have
been suggested: translation inhibition and mRNA degradation [31, 33, 35–37]. MiRNAs
can inhibit the translation of an mRNA to a protein in the initiation, elongation, or
termination stages. Initially, it was thought that translational inhibition is the primary
mode of miRNA regulation in animals and that miRNAs destabilize the mRNA target only if
perfect complementarity occurs. Later, however, several independent studies showed that
a significant portion of the reduction in the protein product (> 84%) is due to
miRNA-induced changes at the transcript level [31, 36, 37]. Accordingly, in mammals,
miRNAs primarily and dominantly repress the steady-state expression level of their
targets. Several mechanisms for mRNA destabilization by miRNAs have been suggested.
mRNA degradation is initiated by a shortening of the mRNA poly(A) tail, which eventually
leads to mRNA deadenylation followed by decapping and subsequent exonucleolytic
digestion [35]. mRNA degradation often takes place in P-bodies, which are enriched
in enzymes involved in mRNA turnover [89]. One of the most abundant elements in
P-bodies is a protein called GW182 [90], which recruits the deadenylase and decapping
complexes and brings about mRNA destabilization. Deadenylation is the primary effect in
post-transcriptional regulation of mRNAs; miRNAs are known to promote mRNA
destabilization [91]. In this process, GW182 interacts with Argonaute proteins, which
together promote the recruitment of the CAF1-CCR4-NOT1 deadenylase complex,
followed by decapping and exonucleolytic digestion [92]. MiRNAs may also up-regulate the
expression of a gene indirectly by targeting genes that down-regulate this gene.
2.3 Identification of miRNA targets
Perhaps the most challenging and important issue in the study of miRNAs is identifying
the bona fide mRNA targets in animals. The function of a miRNA is specified
by its targets. During the past decade, numerous efforts have been made to improve
miRNA target identification, but relatively few mRNA targets have been experimentally
validated. There are several reasons for this incomplete specification of miRNA targets.
First, sequence complementarity between miRNAs and their targets is imperfect; the short
length of miRNAs (19-25 nt) as well as imperfect base pairing with their targets makes
hundreds of genes candidate targets for each miRNA, many of which are false positives:
approximately 70% of known genes are predicted to be putative targets. Also, it is unclear
how RISC elements are recruited and interact to silence the targets. Moreover, miRNA
regulation is often situation-, time-, or tissue-specific. As such, a gene might only be a
functional target of a miRNA at a specific time and in a specific tissue, even when there is
sequence complementarity between the target and the miRNA. Finally, many of the short RNA
sequences reported as miRNAs are not actually miRNAs.
There is a wide variety of experimental and bioinformatic methods to determine
miRNA targets. In the following, we briefly describe these techniques.
2.3.1 miRNA over-expression experiments
In this method, miRNAs are transfected into cells and changes in the expression levels of
transcripts are measured using mRNA expression profiling. The transcripts whose expression
significantly decreases after miRNA transfection are declared targets. One notable
breakthrough was the experiment conducted by Lim et al. [33], who showed that transfecting
a tissue-specific miRNA into HeLa cells shifts the expression profile towards that of the tissue
where the miRNA is preferentially expressed. miRNA transfection experiments have
subsequently been used extensively to evaluate the sequence features proposed for target
identification and to validate the functional targets predicted by computational methods
[7, 37, 93–95]. Exogenous miRNA transfection, however, perturbs the expression levels of
the targets of miRNAs endogenously expressed in the cell [39]. In particular, miRNA transfection
may cause up-regulation of endogenous miRNA targets, probably due to competition
for a limited number of RISCs, many of which are taken up by the transfected miRNA in the
cell under experiment [39].
2.3.2 miRNA knockdown (antagonism) experiments
In miRNA knockdown experiments, the expression of miRNAs is inhibited using different
strategies, and significantly up-regulated transcripts are subsequently treated as
targets of the inhibited miRNAs [96]. One approach to inhibiting a miRNA is to use
synthetic miRNA targets, the so-called antimirs [97–99]. Antimirs are chemically modified,
single-stranded nucleic acids designed to specifically bind to and inhibit miRNAs. These
ready-to-use inhibitors can be introduced into cells using tr Another approach is to interfere
with RISC formation, a vital part of the miRNA regulation machinery, and measure the
change in the expression level of transcripts in a tissue in which some miRNAs are highly
expressed; those up-regulated are possibly functional targets. The up-regulation
signal in the target set in knockdown experiments is weak compared to the down-regulation
signal in over-expression experiments, making the latter a better choice [98].
2.3.3 Prediction based on HITS/PAR-CLIP
High-throughput sequencing of RNA isolated by crosslinking immunoprecipitation
(HITS-CLIP) [100] and photoactivatable-ribonucleoside-enhanced crosslinking and
immunoprecipitation (PAR-CLIP) [101] have been applied to determine the binding sites of RISC
proteins, mainly AGO/EIF2C1-4. In HITS-CLIP, Argonaute-bound RNAs are isolated,
purified, and sequenced to identify sequence regions complementary to nucleotides 2-7
from the 5’ end of miRNAs (seed regions). Alternatively, the PAR-CLIP method provides
high-resolution crosslinking by incorporating photoreactive ribonucleoside analogs into in
vivo RNA transcripts. PAR-CLIP improves the separation of UV-crosslinked target RNA
segments from background when compared with solely CLIP-based methods. In contrast
to previous findings, PAR-CLIP data suggest that CDS regions are also highly enriched
for RBP sites associated with RISCs.
There are however technical difficulties associated with implementing these methods;
furthermore, limited data are available (target sites for 25 miRNAs), and surprisingly,
targets predicted using CLIP-based methods overlap poorly with the target sets predicted
by popular target prediction methods. In addition, CLIP-based experiments can only
identify target regions (about 100 nt) and not the precise position of miRNA binding
sites; moreover, CLIP assays can only be used in cell lines.
2.3.4 Target prediction using luciferase reporters
Luciferase-based vectors are commonly used to monitor the expression change of miRNA
targets [102, 103]. Luciferase vectors contain luciferase genes from Renilla or firefly, whose
products emit light. In this method, the 3’ UTR segment of a gene that includes miRNA target
sites is inserted between the luciferase coding sequence and the poly(A) signal. Next,
the luciferase vectors are transfected into a cell line and luciferase activity is measured and
compared to that of an analogous reporter with a mutant 3’ UTR sequence. Luciferase
vectors have been extensively used to validate functional targets [7, 33]. Luciferase assays
are, however, costly and labour intensive and lack reproducibility between samples, making this
approach unlikely to scale to genome-wide determination of miRNA target sites
[104].
2.3.5 Measuring the protein output after miRNA over-expression or antagonism
Since miRNAs inhibit the translation of mRNA to protein, over-expression or loss of a
miRNA in a cell should, respectively, decrease or increase the protein levels of its target mRNAs.
Two recent, concurrent studies quantified the impact of miRNAs on
protein output [36, 37]. In these studies, miRNAs are over-expressed or knocked down
in cultured cells, and stable isotope labeling by amino acids in cell culture (SILAC)
followed by mass spectrometry is used to measure protein levels. Together these
works found that: (i) the genes encoding the most down-regulated proteins are enriched in
hepta-nucleotide motifs associated with the seed regions of the transfected miRNAs; (ii) the
opposite effect is observed when miRNAs are knocked down, but to a much lesser extent
than with over-expression; and (iii) the change in the expression level of mRNAs correlates well
with that of their proteins, suggesting that mRNA degradation may be the primary effect
of miRNAs on gene regulation. Proteomic experiments are, however, very expensive and
time consuming and only handle a small fraction of proteins at a time. Moreover, they
cannot be used to study the impact of miRNAs on non-coding transcripts.
2.4 Computational miRNA target prediction methods
In metazoa, miRNAs pair imperfectly to almost all known transcripts; partial base pairing
makes the identification of bona fide targets very difficult. Computational methods
mostly exploit the attributes identified using experimental methods to provide a
genome-wide prediction of the targets of all known miRNAs. Many computational methods
apply the following determinants: perfect Watson-Crick base pairing with the
miRNA seed region (nucleotides 2-7 from the 5’ end of a miRNA) [6], AU composition
of the surrounding sequence [7], thermodynamic stability of binding sites [11], evolutionary
conservation of the seed [12, 13], accessibility of binding sites [5, 7, 15–17], target-site
abundance [18], seed-pairing stability [18], 3’ pairing contribution [7], and binding position
in the 3’ UTR [7]. In the following, we describe some of the most popular computational
methods that use the above determinants to predict miRNA targets.
2.4.1 TargetScan
The first version of TargetScan was introduced in a collaboration between the Bartel and Burge
labs at MIT in 2003 [6]. TargetScan has been frequently updated. Release 6.2, launched
in June 2012, predicts targets in nine mammalian species including human [7, 12, 18].
In addition to target prediction, TargetScan provides a wide range of information and
options about miRNA and transcript sequences; here we focus on the target prediction
aspect of TargetScan. TargetScan predicts a protein-coding gene to be a target of a
miRNA if the 3’ UTR of the gene harbors a conserved 7mer or 8mer motif that can
base pair with the seed region (nucleotides 2-7 from the 5’ end of a miRNA). Two types of
7mer motifs are defined: those with an exact match to the seed plus position 8 of the miRNA,
and those with an exact match to the seed followed by an A. 8mer motifs are those that
match the seed plus position 8 of the miRNA followed by an A. The 7mer and 8mer motifs are
commonly called target sites. A target site is declared to be conserved if its conserved
branch length score is above a threshold as defined in [12]. In TargetScanS, a refinement
of TargetScan, the efficacy of a target site is specified by eight determinants. The weights
assigned to these scores are obtained from miRNA over-expression experiments, where each
score reflects the correlation between the log-fold change of down-regulated transcripts
after miRNA transfection and the presence or absence of the given determinant in the
down-regulated transcripts. The scores are as follows:
• Site-type contribution, which determines the score for 7mer and 8mer motifs; 8mer
motifs are assigned a higher score since transcripts containing an 8mer are more
down-regulated than those with a 7mer.
• The 3’ contribution: complementarity of the target sequence to a region outside
of the seed (especially nucleotides 13-17 towards the 3’ end of the miRNA) improves
down-regulation.
• Local A+U rich content: the flanking sequences of the seed in down-regulated
targets are enriched for A and U nucleotides.
• Target site position contribution: the farther a target site is from the centre of
the 3’ UTR, the greater the down-regulation achieved.
• Target abundance: miRNAs whose target sites occur in many transcripts are
weak regulators because their effect on each target transcript is diluted.
• Seed pairing stability (SPS): the stability of a miRNA-target duplex determines the
efficacy of targeting; a weaker SPS decreases miRNA targeting.
• Conserved branch length for each site.
TargetScanS adds all individual scores except the conservation score and denotes the
aggregate as the context+ score. These determinants represent the sequence and location
characteristics of miRNA target sites. TargetScan scans approximately 18,000 genes
(30,000 transcripts) for conserved and non-conserved target sites matching the seeds of
about 1,200 miRNA families (1,500 individual miRNAs annotated by miRBase [105]).
TargetScan identifies half a million conserved target sites for all 1,500 miRNAs. Exploring
the interactions identified by TargetScan shows that each transcript harbors on average 25 target
sites and each miRNA targets on average 324 transcripts; at the extreme, miRNA
hsa-miR-3163 has seed-match target sites in 2,575 transcripts and the gene TNRC6B contains
507 seed-match sites.
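The site definitions above can be made concrete with a minimal sketch. The function below scans a 3’ UTR for the 7mer and 8mer seed-match sites of a single miRNA. The function names and the toy UTR are illustrative assumptions; the labels `7mer-m8`, `7mer-A1` and `8mer` follow the common nomenclature for the three site types described above. Conservation filtering and the context scores are deliberately left out, so this is not TargetScan's implementation.

```python
# Sketch only: classify seed-match sites in a 3' UTR as described in the text.
# Function names and the example sequences below are illustrative assumptions.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr):
    """Return (start, site_type) for each 7mer/8mer site; start is 0-based
    and points at the first UTR base paired with the seed (miRNA nt 2-7)."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed_match = reverse_complement(mirna[1:7])  # pairs with miRNA nt 2-7
    m8_partner = COMPLEMENT[mirna[7]]            # pairs with miRNA nt 8
    sites = []
    start = utr.find(seed_match)
    while start != -1:
        has_m8 = start > 0 and utr[start - 1] == m8_partner
        has_a1 = utr[start + 6:start + 7] == "A"  # A opposite miRNA nt 1
        if has_m8 and has_a1:
            sites.append((start, "8mer"))
        elif has_m8:
            sites.append((start, "7mer-m8"))
        elif has_a1:
            sites.append((start, "7mer-A1"))
        # a bare 6-nt seed match is not reported as a target site here
        start = utr.find(seed_match, start + 1)
    return sites


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAA" + "CUACCUCA" + "GGG" + "GUACCUCA" + "GG" + "CUACCUCG"
print(find_seed_sites(let7a, toy_utr))
# [(4, '8mer'), (15, '7mer-A1'), (25, '7mer-m8')]
```

Anchoring on the 6-nt seed match and then inspecting the two flanking positions keeps the three site types mutually exclusive, which mirrors how they are defined in the text.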
2.4.2 Pictar
Pictar was developed by Rajewsky’s group in 2005 [106]. Analogous to TargetScan,
Pictar uses the 5’ end of the miRNA to identify targets but with minor differences.
Pictar defines the seed as a sequence of length 7 nt starting at position 1 or 2 from the
5’ end of the miRNA. In addition, imperfect base pairing is allowed between the miRNA
and regions in the 3’ UTR of the target (one deletion or one insertion). Pictar applies two
filters to all perfect and imperfect predicted target sites. The first filter retains all target
sites that are conserved across human, chimpanzee, mouse, rodent, chicken, and fish.
The second filters out the target sites for which the free energy of the entire miRNA:mRNA
duplex is above a threshold. The perfect and imperfect target sites that pass these two
filters are assigned probabilities $p = 0.8$ and $(1-p)/(\#\text{ of imperfect sites})$, respectively. The target site
probabilities are then used in a hidden Markov model (HMM) to compute the posterior
probability that a 3’ UTR is generated by motifs complementary to the seeds of a set
of miRNAs. The states of the HMM are associated with miRNA binding sites and
background. In this way, Pictar scores reflect the combinatorial targeting of miRNAs. In
this regard, Pictar is, in essence, similar to Ahab, a method developed by the same group
to determine transcription factor binding sites [107]. Pictar has not been updated
since it was first introduced; it predicts 42,073 interactions among 6,108 mRNAs and
132 miRNAs, far fewer than TargetScan. Despite its limited coverage of the genome, Pictar
predicts approximately 27 percent of 1,129 experimentally validated targets, which places
it second after TargetScan based on our analysis. Pictar has many tuning parameters,
the sensitivity of its predictions to these parameters is unclear, and the software is not
publicly available.
2.4.3 miRSVR-miRanda
As explained earlier, TargetScan scores reflect the correlation between the presence of a miRNA
target site with particular attributes in the target sequence and the down-regulation
level of the target after over-expressing the miRNA. Inspired by this, miRSVR [8], a
prediction method developed by Leslie’s group in 2010, uses miRanda sequence determinants
(as input) and mRNA expression data after miRNA over-expression (as output) to
train a support vector regression (SVR) model for prediction [108]. Given a set of determinants,
miRSVR then predicts the expected down-regulation of the targets. miRSVR uses
predicted targets identified by miRanda, an algorithm that uses dynamic programming
to score a 3’ UTR-miRNA duplex based on maximum complementary alignment.
miRanda applies the following sequence alignment scores: +5 for G:C and A:U pairs, +2 for
G:U wobble pairs, and -3 for mismatch pairs; the gap-open and gap-elongation parameters
are set to -8.0 and -2.0, respectively. Moreover, miRanda scales the scores of subsequences
complementary to the 5’ end of the miRNA by a factor of 2 to account for the importance of
the seed match. Target sites selected by dynamic programming that pass the filters on free
energy of duplex formation and conservation are declared targets. The miRSVR analysis shows that some targets with
non-conserved, imperfectly complementary seed matches are significantly down-regulated in
the transfection assays; moreover, the authors showed that experimentally validated
targets are assigned high scores by miRSVR. Although miRSVR claims that it borrows
its strength from the SVR model, Garcia et al. did not gain any performance improvement
when they replaced their simple regression model with an SVM-type model [18].
miRSVR scores 680,066 interactions among 17,467 mRNAs and 248 miRNAs; only 710 of
these interactions are experimentally validated, and the false positive rate is not clear.
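To illustrate how the pair scores quoted above combine, the following sketch scores an ungapped miRNA:site duplex. It is a simplification under stated assumptions: the dynamic programming step and the gap penalties are omitted, and the set of miRNA positions that receive the factor-2 scaling (here positions 2-8) is a hypothetical choice, since the text only says "the 5’ end". The function names are illustrative and are not part of miRanda.

```python
# Sketch only: ungapped duplex scoring with the pair scores quoted in the text.
# The scaled region (miRNA positions 2-8) is an assumption made for illustration.
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_score(mirna_base, target_base):
    """+5 for G:C and A:U, +2 for G:U wobble, -3 for any other pair."""
    if (mirna_base, target_base) in WATSON_CRICK:
        return 5
    if (mirna_base, target_base) in WOBBLE:
        return 2
    return -3


def ungapped_duplex_score(mirna, site, scaled_positions=range(1, 8), scale=2):
    """Score a miRNA (5'->3') against an equally long site (5'->3').

    miRNA position k (0-based) pairs with the k-th base from the 3' end of
    the site; scores at scaled_positions are multiplied by scale.
    """
    if len(mirna) != len(site):
        raise ValueError("ungapped scoring needs equal-length sequences")
    total = 0
    for k, base in enumerate(mirna):
        partner = site[len(site) - 1 - k]
        score = pair_score(base, partner)
        if k in scaled_positions:
            score *= scale
        total += score
    return total


print(ungapped_duplex_score("UGAGGUAG", "CUACCUCA"))  # perfect pairing: 75
print(ungapped_duplex_score("UGAGGUAG", "UUACCUCA"))  # one G:U wobble: 69
```

A full implementation would wrap these pair scores in a local alignment with the gap-open and gap-elongation penalties and then apply the free energy and conservation filters described above.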
2.4.4 GenMiR
GenMiR was developed in Morris and Frey labs in 2005 [51, 52, 109]. GenMiR integrates
matched mRNA-miRNA expression data into sequence-based prediction methods using
a probabilistic model. GenMiR computes the posterior probability that a target is bona
fide using the product of prior probability and likelihood. The prior probability is ini-
tially obtained from TargetScan predicted targets and learnt when fitting the model. The
likelihood is computed from the expression data using a linear model that relates the
change in the expression level of the target to those of targeting miRNAs predicted by
TargetScan. GenMiR uses the expectation-maximization algorithm to learn the parame-
ters of the model and to infer the posterior probabilities. Because computing the posterior
probability is intractable, GenMiR applies a variational Bayesian method to replace the
posterior probability with a simpler probability. GenMiR++ is the latest version of the
GenMiR which includes more model parameters to account for the tissue specificity of
miRNAs. GenMiR is the first prediction method that takes into account the multiple tar-
geting effect of miRNAs when incorporating paired miRNA-mRNA expression data into
sequence prediction algorithms. This development had a distinct advantage over meth-
ods that use pairwise correlation between miRNA and mRNA expression vectors since
miRNAs have been shown to co-operatively regulate gene expression. GenMiR however
needs a large number of matched miRNA-mRNA expression data sets to attain an ac-
curate prediction. Moreover since GenMiR applies variational inference, the posterior
probability is simplified to a uni-modal probability which may not capture the variation
in the actual posterior. GenMiR provides high confidence scores for approximately 1,500
mRNAs when applied to 104 miRNAs and 88 paired expression data sets. GenMiR
successfully predicts experimentally validated targets of the let-7 family, one of the most
well-studied miRNA families.
2.4.5 HOCTAR
About half of the discovered miRNAs reside in the introns of protein-coding genes;
these are the so-called intronic miRNAs [110]. A number of mRNA-miRNA expression profiling
experiments have shown that the expression levels of some intronic miRNAs are positively
correlated with those of their host genes, suggesting that they may share the same
transcriptional elements and hence be co-expressed [43, 45, 64, 67]. HOCTAR, developed in
Banfi’s lab in 2009, scores the target genes of intronic miRNAs based on the anti-correlation
between the expression vectors of the host and target genes of intronic miRNAs. HOCTAR
works as follows. It uses 160 mRNA expression microarray data sets from the
Affymetrix HG-U133A platform. For each intronic miRNA, in each data set HOCTAR measures
the correlation coefficients between the expression vector of the host gene and those of
the putative targets of the intronic miRNA. The putative targets of the intronic miRNA are
obtained by taking the union of the targets predicted by TargetScan, miRanda, and PicTar.
HOCTAR then selects the top 3 percent of the most negatively correlated targets in each
data set. This process is repeated for all 160 data sets and each target is assigned a score
based on its cumulative occurrence among the selected targets (top 3 percent) across data
sets. As a validation test, the authors showed that the top 50 percent of the ranked list of
predicted targets is highly enriched for experimentally validated targets. Although
using host gene expression levels as a surrogate for the expression levels of intronic
miRNAs is a notable novelty of HOCTAR, the method has several shortcomings. HOCTAR
assumes that all intronic miRNAs are co-expressed with their host genes, whereas recent
studies have shown that only 20-40% of intronic miRNAs are [111–118]. In addition, HOCTAR
computes the correlation coefficients between the individual probes of a host and a target
gene; as such, a host-target pair may have several different correlation coefficients. HOCTAR then
selects the probe with the most negative coefficient and ignores the others. Using the mean
or median of the probe expression vectors of a gene, which is common practice, may better
reflect the actual correlation between the expression levels of host and target genes.
Furthermore, the threshold used in HOCTAR (top 3 percent) is not statistically justified.
Lastly, HOCTAR pools together the target sets predicted by three different methods that
have been shown to overlap little, which can potentially increase the false positive rate [20].
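The scoring procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the HOCTAR code: each data set is represented as a dictionary from gene name to expression vector (one value per gene, so the probe-selection step is skipped), the ranking is done among the putative targets only, and the function name `hoctar_like_scores` and the `top_fraction` parameter are hypothetical.

```python
# Sketch only: count how often each putative target falls among the most
# negatively correlated targets of the host gene across data sets.
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def hoctar_like_scores(datasets, host, candidates, top_fraction=0.03):
    """Return {target: number of data sets in which it is selected}."""
    counts = {gene: 0 for gene in candidates}
    for data in datasets:
        if host not in data:
            continue
        corr = {g: pearson(data[host], data[g]) for g in candidates if g in data}
        ranked = sorted(corr, key=corr.get)  # most negative correlation first
        n_top = max(1, int(round(top_fraction * len(ranked))))
        for gene in ranked[:n_top]:
            counts[gene] += 1
    return counts


toy_datasets = [
    {"HOST": [1, 2, 3, 4], "A": [4, 3, 2, 1], "B": [1, 2, 3, 4], "C": [1, 3, 2, 4]},
    {"HOST": [2, 1, 4, 3], "A": [1, 2, 3, 4], "B": [3, 4, 1, 2], "C": [2, 1, 4, 3]},
]
print(hoctar_like_scores(toy_datasets, "HOST", ["A", "B", "C"], top_fraction=0.34))
# {'A': 1, 'B': 1, 'C': 0}
```

With only three candidates the toy example uses a larger `top_fraction` so that exactly one target is selected per data set; with realistic target lists the 3 percent cutoff from the text would apply.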
2.4.6 COMETA
COMETA, developed by Banfi’s group in 2012, is a prediction method that applies a procedure
similar to HOCTAR but is not limited to intronic miRNAs [55]. COMETA is
based on the assumption that the functional targets of a miRNA tend to be co-expressed,
and identifies co-expressed target sets as follows. Similar to HOCTAR, COMETA pools
together the targets predicted by TargetScan, miRanda, and Pictar and uses 217 microarray
gene expression data sets to determine the co-expressed targets. For each target gene
of a given miRNA and in each data set, COMETA computes Pearson correlation coefficients
between the target and all other genes on the assay and generates a ranked list based
on these coefficients. This process is repeated across all data sets and each gene is assigned
a score based on its cumulative occurrence in the top 3 percent of the ranked
lists of the targets. Carrying out this procedure for all targets of a selected miRNA yields
co-expression lists consisting of the co-expression scores of all genes with significant
positive correlation with the targets. Finally, COMETA averages the co-expression scores
of a gene across co-expression lists to obtain a co-rank list of targets for each miRNA.
The authors showed that experimentally validated targets are ranked significantly above the
median in the rank lists and concluded that co-expressed targets are possibly functional.
They also observed that, when grouping the target sets into two clusters using hierarchical
clustering of co-expression scores, the co-rank list of one cluster is significantly different from
a co-rank list obtained from a same-size random subset of targets. They showed that genes in
this cluster are more down-regulated than those in the other cluster after over-expressing
miRNA-26 and miRNA-98. Using this finding, they built target gene networks for 755
miRNAs and showed that some of these networks are enriched in particular biological processes.
Analogous to HOCTAR, when making the co-expression rank list, COMETA considers
only the probe with the highest correlation among the multiple probe sets representing the same
gene. Moreover, when using the union of targets predicted by three prediction methods, a
miRNA target list contains a large number of genes (of order 300-1000); within such a set,
it is highly likely to find co-expressed genes that participate in some biological process.
For instance, within a set of 1000 co-expressed genes, 500 may be targets of a miRNA;
this does not imply that these 500 genes are co-expressed because they are miRNA targets.
Finally, when performing hierarchical clustering, it is unclear why the targets are always
clustered into two groups and why one group has a more consistent co-rank list than the
other.
2.4.7 Sylamer
Sylamer is a prediction method that uses the hypergeometric test to identify if a miRNA
seed match is overrepresented in the top/bottom of a gene list ranked based on their
expression levels after over-expressing/knocking down the miRNA [54]. In other words,
Sylamer is a systematic approach to identify if the down/up-regulated genes harbor
excessively the target sites matched to the seed region of a given miRNA after transfecting
the miRNA into a cell line and profiling the mRNAs. Sylamer works as follows. Let $N$
denote the number of genes ranked based on their expression levels in a miRNA
over-expression experiment. Let $M_i$ denote the number of genes whose expression levels are less
than an incremental cutoff $i \times T$, where $i = 1, 2, \ldots$ is incremented until $M_i$ reaches $N$. Let
$S_N$ and $S_{M_i}$ denote the number of miRNA seed matches (i.e. motifs in the 3’ UTRs of genes
complementary to the seed of a given miRNA) in all ($N$) and selected ($M_i$) genes,
respectively. For each $i$, Sylamer computes a P-value $P_i$ using a hypergeometric test with
input parameters $N$, $M_i$, $S_N$, and $S_{M_i}$ to identify whether the $S_{M_i}$ seed matches are significantly
over-represented in the set of $M_i$ genes compared to the $S_N$ seed matches present in the $N$ genes.
Finally, Sylamer generates a curve using the computed $P_i$ values and searches for a peak on the
curve. The occurrence of a peak at the top of the ranked gene list implies that the most
down-regulated target sequences are significantly enriched for the seed match and
may therefore be functional targets of the over-expressed miRNA. The procedure for miRNA
knockdown experiments is the same but the genes are ranked in descending order based
on their expression levels. Sylamer enrichment plots confirmed the overrepresentation
of motifs complementary to seed of miR-155 and miR-430 in the down-regulated genes
after miR-155 and miR-430 over-expression, respectively. Sylamer usage is limited to
miRNA transfection experiments, in which the common method for identifying targets
is to compare the cumulative distribution of the expression levels of genes harboring the seed
match with that of all other genes [7]. Whether Sylamer outperforms methods based on
cumulative distribution comparison is unclear. In addition, Sylamer does not propose a way to
recognize a significant peak other than visually inspecting the enrichment plot.
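The enrichment test described above can be sketched as follows. This is an illustration under simplifying assumptions rather than Sylamer's implementation: each gene is recorded only as containing or not containing a seed match, the cutoff is moved in fixed steps of rank instead of expression-level increments, and the function names are hypothetical. The P-value is the upper tail of the hypergeometric distribution, computed with the standard library.

```python
# Sketch only: hypergeometric enrichment of seed matches along a ranked gene list.
from math import comb


def hypergeom_upper_tail(N, S_N, M, S_M):
    """P(X >= S_M) when M genes are drawn from N genes, S_N of which have a seed match."""
    total = comb(N, M)
    upper = min(S_N, M)
    favourable = sum(comb(S_N, j) * comb(N - S_N, M - j) for j in range(S_M, upper + 1))
    return favourable / total


def enrichment_curve(ranked_has_seed, step):
    """Return [(M_i, P_i)] for cutoffs M_i = step, 2*step, ... along the ranking.

    ranked_has_seed lists genes from most to least down-regulated; each entry
    is True if the gene's 3' UTR contains a seed match.
    """
    N = len(ranked_has_seed)
    S_N = sum(ranked_has_seed)
    curve = []
    for M in range(step, N + 1, step):
        S_M = sum(ranked_has_seed[:M])
        curve.append((M, hypergeom_upper_tail(N, S_N, M, S_M)))
    return curve


# Toy ranking: all five seed-containing genes are the most down-regulated.
toy_ranking = [True] * 5 + [False] * 15
for cutoff, p_value in enrichment_curve(toy_ranking, step=5):
    print(cutoff, p_value)
```

Plotting the negative logarithm of each $P_i$ against the cutoff gives a curve of the kind described in the text; a pronounced peak near the top of the ranking corresponds to a very small P-value at the first cutoffs.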
2.4.8 MIRZA
Given a set of mRNA fragments cross-linked in AGO-clip experiments and a set of miRNA
sequences, MIRZA models the mRNA-miRNA hybrid structures and estimates the model
parameters. MIRZA infers the model parameters by maximizing the binding probability
of mRNA fragments in Ago-CLIP data [9]. For each miRNA $\mu$ and mRNA fragment
$m$, MIRZA defines the target quality $R(m|\mu)$ as the ratio of the probability that $m$ is bound to $\mu$
among other target sites to the abundance of $m$. MIRZA links $R(m|\mu)$ to $E(\mu, m, \sigma)$,
defined as the free binding energy of RISC-loaded $\mu$ bound to $m$ where the hybrid has the
configuration $\sigma$. $E(\mu, m, \sigma)$ consists of two parts: (i) $E_{str}(\sigma)$, which depends on the
hybrid structure, such as the energy of a symmetric loop, and (ii) the sum of the energies of each
hybridized pair. To infer the energy parameters, MIRZA maximizes the likelihood with
respect to these parameters; the likelihood is defined as
$\prod_i \sum_\mu R(m_i|\mu)\,\pi_\mu$, where $\pi_\mu$ is the
probability that a bound RISC is loaded with $\mu$. The model fit by MIRZA shows the highest
tendency for hybridization at positions 2-7 from the 5’ end of the miRNA (seed match),
followed by positions 13-16 and 18-19; position 9 is not supported for hybridization by
MIRZA since it opens a symmetric loop of length 3 nucleotides followed by a bulge.
The authors used 2,988 51-nucleotide-long mRNA fragments that were cross-linked in Ago-CLIP
data. Using the estimated parameters, they predicted the miRNA $\mu$ that binds to $m$ by
maximizing the energy term $E(\mu, m, \sigma)$. They showed that MIRZA predicts targets as well as
other popular methods when evaluated on miRNA-induced log-fold changes in transfection data.
MIRZA predicts that many non-canonical target sites might be effective and that the efficacy of
miRNA targeting depends on miRNA expression levels, i.e. low expression requires a perfect
seed match whereas high expression does not. MIRZA is in fact a special case of a pair HMM
designed for sequence alignment [119]. The hybrid structure predicted by MIRZA was
previously addressed by other groups, and the finding that non-canonical sites might be
effective was observed in miRSVR [7, 8]. Nonetheless, MIRZA is the first method that
estimates the energy parameters of miRNA-mRNA hybrids using a probabilistic approach.
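For reference, the likelihood described above can be written out together with its logarithm. The symbols are those of the text, with $i$ indexing the cross-linked fragments $m_i$; working with the logarithm is an assumption made here for convenience, since it has the same maximizer as the likelihood itself, and it is not stated in the text that MIRZA is implemented this way.

```latex
\mathcal{L} \;=\; \prod_{i} \sum_{\mu} R(m_i \mid \mu)\,\pi_\mu ,
\qquad
\log \mathcal{L} \;=\; \sum_{i} \log\!\Big( \sum_{\mu} R(m_i \mid \mu)\,\pi_\mu \Big),
\qquad
\sum_{\mu} \pi_\mu = 1 .
```

The constraint on $\pi_\mu$ follows from its definition as the probability that a bound RISC is loaded with $\mu$.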
2.5 Conclusions
Experimental miRNA target prediction approaches are unable to provide genome-wide
prediction of miRNA targeting. As the number of identified miRNAs grows, using
experimental approaches becomes more limiting since these methods are costly, time consuming,
and not comprehensive. Bioinformatic methods, on the other hand, can provide a genome-wide
prediction of miRNA targets. During the past decade many miRNA target prediction
methods have been developed. The vast majority of these methods use sequence deter-
minants to predict the target genes of miRNAs. Many performance evaluation studies
have shown that current sequence features alone cannot provide accurate prediction of
miRNA targeting. Using mRNA and miRNA expression data can supplement the sequence
features to obtain more accurate prediction. Unfortunately, not all miRNAs are
profiled, and even with the advent of high-throughput sequencing techniques, measuring
the abundance of mature miRNAs accurately remains a challenge. On the other hand, mRNA
expression data are abundant, less noisy, and available for a wide range of biological
samples. Therefore, augmenting sequence based determinants with mRNA expression
data is a promising notion that can improve prediction of miRNA targets. In this the-
sis, we devise computational miRNA target prediction methods that incorporate mRNA
expression data into sequence prediction methods. We show that our proposed methods
provide better predictive estimates than those reported by the state-of-the-art target
prediction methods. Almost all popular miRNA target prediction methods score the
strength of a miRNA-mRNA pair using sequence evidence. Although these methods
show that these scores correlate with down-regulation of targets in miRNA transfection
experiments, this does not necessarily imply down-regulation of the targets in vivo
as estimated from mRNA expression data. We, on the other hand, devise computational
methods that score miRNA-mRNA pairs based on the down-regulation impact of miRNAs
in vivo. In addition, our scoring strategies include a large number of samples, in contrast
to miRNA transfection scoring methods, which are limited to a small number of miRNAs
and biological conditions. One of the areas that has not been explored in miRNA
target prediction studies is devising computational methods that measure the impact of
miRNAs on non protein coding genes, especially long non-coding RNAs. In this the-
sis, we also try to predict the lncRNAs that are targeted by miRNAs using mRNA and
lncRNA expression data sets.
Chapter 3
Intronic miRNAs and prediction of
their targets
3.1 Intronic miRNAs
Approximately half of mammalian miRNAs are hosted within the introns of protein-
coding genes, so it may be possible to predict the targets of some of these intronic miRNAs
without having to measure their expression level. Indeed, many intronic miRNAs
appear to lack their own promoters and are processed out of introns [43, 45, 63–72].
Estimates for the proportion of intronic miRNAs whose expression profiles are significantly
correlated with their host gene vary between 34% (25/74 [43]) and 71% (22/31 [67]). If
these co-expression relationships can be detected without having to measure the miRNA
expression, then host gene expression levels can be used as a surrogate for the miRNA
levels when doing target prediction [56]. There are substantial advantages to doing this.
First, host gene expression levels are measured at the same time and on the same plat-
form as the target gene expression levels, thus removing the need to model platform and
laboratory-based effects. Also, there are hundreds of suitable Gene Expression Omnibus
datasets for well-studied model organisms that can be used for target prediction, thus
adding considerable statistical power to any target predictions.
However, not all host gene expression profiles are useful for predicting the targets of
their intronic miRNAs. Some of these intronic miRNAs show evidence of having their own
promoter [111–118]. For example, two independent studies found putative promoters for
one-third of intronic miRNAs [111, 112]. Furthermore, host gene mRNAs may themselves
be under post-transcriptional regulation by other miRNAs. As such, it is important to
distinguish host genes with expression profiles that are good surrogates for those of their
intronic miRNAs from those that are not.
In this chapter, we propose a new method that both identifies intronic miRNAs
whose host genes’ expression provides a good surrogate for their expression level and
predicts the mRNA targets of these miRNAs. Our method takes as input a set of
potential miRNA target sites based on sequence comparisons and then, among these sites,
identifies those likely to be functional based on the degree to which the host gene’s
expression is predictive of down-regulation of the mRNA. When predicting regulators
of a particular mRNA, we consider the combined effect of all of its potential regulators
because most mRNAs are regulated by multiple miRNAs [20, 51, 106, 109, 120]. Our
method can use any mRNA expression profiles; here, however, we use 140 gene expression
data series chosen for their size and their use of the same microarray platform. We
distinguish between good and bad host gene surrogates based on the proportion of their
hosted miRNA’s potential targets that we predict to be functional. Host genes that we
deem to be bad surrogates based on this test have more predicted Pol II/III promoters
in their introns as well as more predicted miRNA binding sites in their 3’ UTRs.
3.2 Method: InMiR
We modeled the change of an mRNA’s expression level in a sample by a linear combina-
tion of the host gene expression levels of a subset of the miRNAs with potential target
sites in the 3’ UTR of the mRNA. We distinguished the functional and non-functional
target sites by fitting this linear model to expression profiling data from a large number of
studies and then examining the distributions of weights assigned to each potential miRNA
regulator.
This linear modeling approach differs from previous ones [51, 106, 109] in a number
of important aspects. First, we use host gene expression levels as surrogates for miRNA
expression levels. Also, we predict functional and non-functional sites by integrating
evidence from multiple profiling studies rather than a single study. This change allows us
to employ a much simpler linear model for each individual dataset because we need not
rely upon prior assumptions to detect statistical signals of regulation. The parameters
of our model can be easily estimated using ordinary least squares linear regression. In
the following, we describe our methodology and the results obtained in detail.
3.2.1 Computing weights for putative miRNA regulators on in-
dividual datasets
Table 3.1: Description of the symbols used in this chapter
Symbol                      Description
$g$                         gene index
$k$                         miRNA index
$i$                         dataset index
$G$                         # of target genes
$K_g$                       # of putative targeting miRNAs for gene $g$
$T$                         # of samples
$\mathbf{n}^i$              noise vector corresponding to dataset $i$
$\mathbf{x}^i_g$            expression of gene $g$ in dataset $i$
$\mathbf{H}^i_g$            a matrix containing the expressions of host genes in dataset $i$
$\mathbf{h}^i_{kg}$         expression of the gene hosting miRNA $k$ that targets gene $g$ in dataset $i$
$\Delta\mathbf{x}^i_g$      change in expression level of gene $g$ in dataset $i$
$\mathbf{w}^i_g$            regulatory weights of miRNAs targeting gene $g$ in dataset $i$
Our linear model is as follows. Given $N$ gene expression datasets $D_i$, $i = 1, \ldots, N$
(see materials and Table S1), let $\Delta\mathbf{x}^i_g = \{\Delta x^i_{tg}\}_{t=1}^{T}$ denote a $T$-element vector whose
elements correspond to the decrease in the expression level of the $g$th target gene over
$T$ samples in the $i$th dataset. We model this vector as a linear function of $K_g$ intronic
miRNAs whose host gene expression levels are denoted by $\mathbf{h}^i_{kg} = \{h^i_{tkg}\}_{t=1}^{T}$, $k = 1, \ldots, K_g$.
These intronic miRNAs represent putative regulators of the mRNA identified by
a sequence-based miRNA target prediction algorithm, such as TargetScan. Based on the above
assumptions and definitions, we build the following model:
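\[
\underbrace{\begin{bmatrix} \Delta x^i_{1g} \\ \Delta x^i_{2g} \\ \vdots \\ \Delta x^i_{Tg} \end{bmatrix}}_{\text{target gene}}
=
\underbrace{
w^i_{1g} \begin{bmatrix} h^i_{11g} \\ h^i_{21g} \\ \vdots \\ h^i_{T1g} \end{bmatrix}
+ w^i_{2g} \begin{bmatrix} h^i_{12g} \\ h^i_{22g} \\ \vdots \\ h^i_{T2g} \end{bmatrix}
+ \dots
+ w^i_{K_g g} \begin{bmatrix} h^i_{1K_g g} \\ h^i_{2K_g g} \\ \vdots \\ h^i_{TK_g g} \end{bmatrix}
}_{\text{the contribution of the intronic miRNAs}}
+
\underbrace{\begin{bmatrix} n^i_1 \\ n^i_2 \\ \vdots \\ n^i_T \end{bmatrix}}_{\text{noise}}
\tag{3.1}
\]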
where $w^i_{kg}$, $k = 1, \ldots, K_g$, is a weight that represents the contribution of the $k$th intronic
miRNA in regulating the target gene $g$ and $\mathbf{n}^i = \{n^i_t\}_{t=1}^{T}$ represents modeling error or
noise. Typically, we cannot measure $\Delta x^i_{tg}$ directly, so we approximate it by the difference
between the mean mRNA expression level in the sample and the measured level $x^i_{tg}$,
i.e., $\Delta x^i_{tg} = -\left(x^i_{tg} - \frac{1}{G}\sum_{g'=1}^{G} x^i_{tg'}\right)$, where $G$ denotes the number of genes in the dataset.
We also assume that the noise vector is sampled from a multivariate Gaussian distribution
whose covariance matrix is proportional to the identity matrix, i.e., is spherical. Equation
(3.1) can be written in matrix-vector notation as
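\[
\Delta\mathbf{x}^i_g = \mathbf{H}^i_g \mathbf{w}^i_g + \mathbf{n}^i, \quad i = 1, \ldots, N \tag{3.2}
\]
in which $\mathbf{H}^i_g = [\mathbf{h}^i_{1g}\ \mathbf{h}^i_{2g}\ \ldots\ \mathbf{h}^i_{K_g g}]$ denotes the expression data of $K_g$ host genes over $T$
samples.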
In the model, a positive weight $w^i_{kg}$ indicates that host gene $k$ contributes to
decreasing the expression level of the target gene $g$ (a positive contribution to $\Delta\mathbf{x}^i_g$). Analogously, a negative
weight $w^i_{kg}$ indicates that host gene $k$ contributes to increasing the expression
level of the target gene $g$. We call this the unconstrained linear model (ULM)
to distinguish it from previous models [51, 109] that constrain the weights $\mathbf{w}^i$ to be
positive, thereby insisting that miRNAs act only to down-regulate the expression of their
target genes. We relax this constraint for convenience because doing so simplifies the
fitting procedure without impacting the predictions of the model. In this chapter, we
focus on the down-regulation role of miRNAs as only a few miRNAs have been reported
to up-regulate target gene expression [121, 122].
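Under these assumptions, we can estimate $\mathbf{w}^i_g$ using ordinary least squares linear
regression, i.e., we minimize the squared error (equivalently, the root mean squared error) between the reconstruction of
the mRNA down-regulation profile based on the miRNA estimates and the observed one:
\[
\hat{\mathbf{w}}^i_g = \arg\min_{\mathbf{w}^i_g} (\Delta\mathbf{x}^i_g - \mathbf{H}^i_g\mathbf{w}^i_g)^\top (\Delta\mathbf{x}^i_g - \mathbf{H}^i_g\mathbf{w}^i_g) \tag{3.3}
\]
where $\top$ denotes the matrix transpose operation. Note that the solution to equation
(3.3) corresponds to the maximum likelihood estimate of $\mathbf{w}^i_g$. The maximum likelihood
estimate of $\mathbf{w}^i_g$ is given by
\[
\hat{\mathbf{w}}^i_g = \arg\max_{\mathbf{w}^i_g} p(\Delta\mathbf{x}^i_g \mid \mathbf{w}^i_g, \mathbf{H}^i_g). \tag{3.4}
\]
The noise vector $\mathbf{n}^i$ is modeled as zero-mean white Gaussian noise of the form
\[
p_n(\mathbf{n}^i) = \mathcal{N}(\mathbf{n}^i; \mathbf{0}, \Sigma_n) = \frac{1}{|2\pi\Sigma_n|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{n}^i)^\top \Sigma_n^{-1} \mathbf{n}^i\right). \tag{3.5}
\]
If we assume that the noise process has a diagonal covariance matrix of the form $\Sigma_n = \sigma^2 \mathbf{I}$,
where $\mathbf{I}$ denotes the $T \times T$ identity matrix, then the likelihood function is given by
\[
p(\Delta\mathbf{x}^i_g \mid \mathbf{w}^i_g, \mathbf{H}^i_g) = \frac{1}{(2\pi\sigma^2)^{T/2}} \exp\left(-\frac{1}{2\sigma^2} (\Delta\mathbf{x}^i_g - \mathbf{H}^i_g\mathbf{w}^i_g)^\top (\Delta\mathbf{x}^i_g - \mathbf{H}^i_g\mathbf{w}^i_g)\right). \tag{3.6}
\]
Thus, maximizing the log of $p(\Delta\mathbf{x}^i_g \mid \mathbf{w}^i_g, \mathbf{H}^i_g)$ is equivalent to
\[
\hat{\mathbf{w}}^i_g = \arg\min_{\mathbf{w}^i_g} (\Delta\mathbf{x}^i_g - \mathbf{H}^i_g\mathbf{w}^i_g)^\top (\Delta\mathbf{x}^i_g - \mathbf{H}^i_g\mathbf{w}^i_g) \tag{3.7}
\]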
We solved (3.3) individually in each dataset to obtain $N$ weight vectors $\mathbf{w}^i_g$ for the target
gene $g$. In order to be able to compare weights across datasets, we rescaled the weights
for each mRNA within each dataset by dividing each element in $\mathbf{w}^i_g$ by the sum of the
absolute values of its elements, i.e., $\sum_{k=1}^{K_g} |w^i_{kg}|$, thus ensuring that $-1 \le w^i_{kg} \le 1$, $\forall i, k$.
In the next section we describe how we combine weights from multiple datasets to make
a single prediction for each putative miRNA and mRNA interaction.
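As an illustration of the per-dataset fit, the following minimal Python sketch solves the least squares problem in (3.3) for one target gene in one dataset and then applies the rescaling described above. It is not the thesis implementation (which used MATLAB); the function name `fit_ulm_weights`, the synthetic data, and the use of NumPy are assumptions made for this example.

```python
import numpy as np

def fit_ulm_weights(delta_x, H):
    """Unconstrained least squares weights for one target gene in one dataset.

    delta_x : (T,) decrease in target expression over T samples
    H       : (T, K) host gene expression profiles, one column per host gene
    Returns the weights divided by the sum of their absolute values.
    """
    w, *_ = np.linalg.lstsq(H, delta_x, rcond=None)
    total = np.sum(np.abs(w))
    return w / total if total > 0 else w

# Synthetic example (illustrative values only): host 0 represses the target,
# host 1 has no effect, host 2 is associated with higher target expression.
rng = np.random.default_rng(0)
T, K = 50, 3
H = rng.normal(size=(T, K))
true_w = np.array([0.6, 0.0, -0.2])
delta_x = H @ true_w + 0.01 * rng.normal(size=T)

w = fit_ulm_weights(delta_x, H)
print(w)  # rescaled weights; their absolute values sum to 1
```

Because the weights are divided by the sum of their absolute values, each rescaled weight lies in [-1, 1], which is what makes weights from datasets with different expression scales comparable.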
3.2.2 Mapping host gene weights to miRNA weights
Our model uses host gene expression as a surrogate for the expression level(s) of its
intronic miRNAs. This requires us to resolve some of the host gene / intronic miRNA
relationships that are not one-to-one, because some host genes contain multiple intronic
miRNAs and some intronic miRNAs are duplicated in more than one host gene. Fig. 3.1
shows a directed acyclic graph (DAG) representing these relationships for eight intronic
miRNAs that are possible regulators for the expression of gene LSM12 whose protein
product accumulates in stress granules [123]. This DAG can be interpreted as a graphical
model in which the expression patterns of intronic miRNAs are hidden. Because our goal
is not only to predict miRNA targets but also to determine which host genes are good
surrogates for their intronic miRNAs, we assign weights directly to host genes rather
than miRNAs. So, the host genes of duplicated miRNAs get separate weights. Also,
when a host gene contains more than one intronic miRNA with putative targets in a
given mRNA, we assign this host gene weight to each of these miRNAs. The host gene /
target mRNA model that we fit for LSM12 after making these adjustments is shown in
Fig. 3.2.
Figure 3.1: Interaction between hosts, targets, and intronic miRNAs using a DAG. A directed acyclic graph (DAG) that represents interactions between host genes, intronic miRNAs, and the target. The top nodes represent the host genes. The middle layer represents the intronic miRNAs located in the introns of the host genes in the first layer. The bottom layer denotes the target gene. In this DAG, the gene LSM12 is targeted by the intronic miRNAs miR-19a, miR-19b, miR-26a, miR-26b, miR-27b, miR-214, miR-340, and miR-874, which are located in the introns of CTDSP2, CTDSPL, MIRHG1, CTDSP1, C9orf3, RNF130, DNM3, and KLHL3.
Figure 3.2: The simplified DAG of Fig. 3.1 in which host genes have a direct interaction with the target.
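The two non-one-to-one cases described in this section can be sketched in a few lines of Python. The host gene / miRNA pairs below follow the LSM12 example (the assignment of miR-19a and miR-19b to MIRHG1 is inferred by elimination from the figure caption and is an assumption), and the weight values are hypothetical.

```python
from collections import defaultdict

# Host gene -> intronic miRNAs with putative sites in the target (LSM12 example).
host_to_mirnas = {
    "CTDSP1": ["miR-26b"],
    "CTDSP2": ["miR-26a"],
    "CTDSPL": ["miR-26a"],             # miR-26a is duplicated in two host genes
    "MIRHG1": ["miR-19a", "miR-19b"],  # one host gene with two intronic miRNAs (assumed pairing)
}

# Hypothetical host gene weights for one dataset (illustrative values only).
host_weights = {"CTDSP1": 0.40, "CTDSP2": 0.25, "CTDSPL": 0.30, "MIRHG1": -0.05}

def mirna_weights(host_to_mirnas, host_weights):
    """Assign each host gene weight to every miRNA located in that host gene."""
    out = defaultdict(dict)
    for host, mirnas in host_to_mirnas.items():
        for mirna in mirnas:
            out[mirna][host] = host_weights[host]
    return dict(out)

per_mirna = mirna_weights(host_to_mirnas, host_weights)
print(per_mirna["miR-26a"])  # one separate weight per host copy
print(per_mirna["miR-19a"], per_mirna["miR-19b"])  # both inherit the MIRHG1 weight
```

Keeping the weights keyed by host gene, rather than collapsing them per miRNA, preserves the information needed later to judge each host gene as a surrogate.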
3.2.3 Combining multiple datasets to predict functional targets
We make our predictions of functional targets by comparing the distribution of weights
assigned to a host gene / mRNA pair across the datasets to a distribution in which the
association between host genes and their expression profiles is randomized. Specifically,
we generate a null distribution of weights by permuting the labels of the host genes
and re-calculating the weights for all putative pairs in every dataset. All of the weights
calculated during this process comprise the empirical null distribution. Then, for each
host gene / mRNA pair, we compare the distribution of weights for this pair against this
null distribution by calculating the two-sided Wilcoxon-Mann-Whitney (WMW) rank-sum
P-value; we call this value $P_{kg}$ for the $k$-th host gene and the $g$-th mRNA. We also record
whether the mean of the distribution of real weights for a given pair is larger or smaller
than the mean of the null distribution. A mean weight that is
larger than random reflects a prediction by our model that a miRNA associated with the
host gene is down-regulating the target mRNA. As we will describe later, we use host
gene / mRNA pairs whose weights are smaller than random when distinguishing good
and bad host gene surrogates.
We interpret $P_{kg}$ as an enrichment measure and determine a cutoff value, for both
positive and negative enrichment, by comparing it to P-values calculated for host gene /
mRNA pairs that are unlikely to interact. We generated P-values for these likely negative
examples by calculating a two-tailed WMW P-value, $Q_{kg}$, for each putative host gene
/ mRNA pair as described above, except that we replace the actual weight distribution
with the one we computed after permuting the host gene labels. Formally, we define $P_{kg}$
and $Q_{kg}$ as follows:
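\[
P_{kg} = \mathrm{WMW}\left(\{w^i_{kg}\}_{i=1}^{N},\ \{\{q^i_{kg}\}_{k=1}^{K}\}_{i=1}^{N}\right) \tag{3.8}
\]
\[
Q_{kg} = \mathrm{WMW}\left(\{q^i_{kg}\}_{i=1}^{N},\ \{\{q^i_{kg}\}_{k=1}^{K}\}_{i=1}^{N}\right) \tag{3.9}
\]
where $\mathrm{WMW}(S, S')$ is a function that calculates a two-tailed WMW P-value for sets $S$
and $S'$, and $\{q^i_{kg}\}$ is the set of weights fit to the permuted data.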
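The comparison in (3.8) and (3.9) can be sketched as follows. This is a simplified stand-in, not the thesis code: the helper `wmw_two_sided_p` uses the normal approximation to the rank-sum statistic without tie or continuity correction, and the weight values are hypothetical.

```python
import math

def wmw_two_sided_p(sample, null):
    """Two-sided Wilcoxon-Mann-Whitney P-value (normal approximation).

    Simplification: no tie or continuity correction is applied.
    """
    n1, n2 = len(sample), len(null)
    # U counts pairs in which the sample value exceeds the null value (ties count 1/2).
    u = sum((a > b) + 0.5 * (a == b) for a in sample for b in null)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical weights of one host gene / mRNA pair across N = 10 datasets,
# and a pooled null of weights refit after permuting the host gene labels.
real_weights = [0.30 + 0.01 * i for i in range(10)]
perm_weights = [-0.10 + 0.01 * j for j in range(20)]

P_kg = wmw_two_sided_p(real_weights, perm_weights)       # real vs pooled null, cf. (3.8)
Q_kg = wmw_two_sided_p(perm_weights[::2], perm_weights)  # permuted vs pooled null, cf. (3.9)

# Direction of enrichment: is the mean real weight larger than the null mean?
mean_above_null = (sum(real_weights) / len(real_weights)
                   > sum(perm_weights) / len(perm_weights))
print(P_kg, Q_kg, mean_above_null)
```

In this toy example the real weights are clearly shifted above the null, so $P_{kg}$ is small, while the permuted weights resemble the pooled null and $Q_{kg}$ is large.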
Fig. 3.3a-d shows the CDFs of weights (i.e., $w^i_{kg}$ and $q^i_{kg}$, $\forall k$) for all host genes whose
intronic miRNAs have potential target sites in LSM12. The CDF of the pooled weights
obtained from the permuted data (the thick gray line) is also shown. These weights
were obtained from two methods: ULM (Fig. 3.3a-b) and a method that sets weights
by correlation (Fig. 3.3c-d) (the CORR method, see materials for details). Recently,
the HOCTAR method was introduced, which uses inverse correlation with host genes to
detect intronic miRNA targets [56]; here we use the CORR method to demonstrate how
well inverse correlation performs within our framework. From Fig. 3.3c-d, we see that
the distributions obtained by CORR from the actual and permuted data are almost
indistinguishable, suggesting that CORR is underpowered and/or prone to misclassification
compared to ULM. Moreover, these observations also confirm the cooperative impact
of miRNAs on target genes. By contrast, the distributions of three host genes, namely
CTDSP1, CTDSP2, and CTDSPL, obtained from ULM (and also from the constrained linear
model (CLM); Fig. S4) are significantly different from their permuted counterparts and
the pooled distribution. The table at the bottom of Fig. 3.3 lists $P_{kg}$ and $Q_{kg}$ for each
interaction. In the next subsection we specify a cutoff point in order to determine the
significant interactions that we will be using to make predictions about targets.
3.2.4 Determining a cutoff value for significant interactions
We apply ROC analysis to determine a cutoff point for specifying significant Pkg. Fig. 3.4
shows the ROC curves for the ULM and CORR methods when we use − logPkg as the
discriminant values for the positive examples and − logQkg for the negative examples.
By using a cutoff of 0.01 for the ULM Pkg values, we are able to achieve a sensitivity
of 32% at 100% predicted specificity. In other words, 32% of interactions predicted by
TargetScan are assigned weights whose distributions are more distinguishable from a
random distribution than any of those assigned to the permuted host gene / mRNA pairs.
If we insist on 100% specificity, CORR only recovers 17% of the TargetScan predicted
host gene / mRNA interactions; achieving 32% sensitivity with CORR requires lowering
the specificity to 94%. The corresponding cumulative distribution of these log P-values
[Figure 3.3, panels a-d: CDF plots; x-axis: weights; y-axis: CDF. Panel titles: (a) ULM, (b) permuted ULM, (c) Corr, (d) permuted Corr.]

Host gene   miRNA       ULM           Perm-ULM      Corr          Perm-Corr
C9orf3      miR-27b     1.2 × 10^-2   2.5 × 10^-2   6.6 × 10^-1   5.3 × 10^-1
CTDSP1      miR-26b     3.1 × 10^-8   2.9 × 10^-1   6.5 × 10^-1   3.5 × 10^-2
CTDSP2      miR-26a-1   1.7 × 10^-4   9.6 × 10^-1   9.8 × 10^-1   2.0 × 10^-1
CTDSPL      miR-26a-2   2.1 × 10^-5   1.2 × 10^-1   2.3 × 10^-1   1.7 × 10^-1
DNM3        miR-214     3.4 × 10^-2   3.1 × 10^-1   3.5 × 10^-1   8.4 × 10^-1
KLHL3       miR-847     3.8 × 10^-1   7.3 × 10^-1   2.5 × 10^-2   3.3 × 10^-1
RNF130      miR-340     3.1 × 10^-1   2.3 × 10^-1   5.0 × 10^-2   5.9 × 10^-1
PermMean    -           -             -             -             -
P_cutoff = 10^-2; target gene: LSM12

Figure 3.3: Plots a-d: the CDFs of the weights $w^i_{kg}$ (a and b) and the correlations $\rho^i_{gk}$ (c and d), $\forall i, g$, for seven host genes obtained from ULM (a and b) and CORR (c and d) with the actual (a and c) and permutation setups (b and d). The thick gray line in each plot is the CDF obtained from the pooled permutation data for each method. The table lists the P-values (Wilcoxon rank-sum test) for testing whether the weight or correlation data are drawn from the pooled permuted data (see (3.8) and (3.9) for details). P-values marked in red are predicted to be significant (P < 0.01). Note that the host gene MIRHG1 was excluded from the analysis since expression data for this host gene did not exist in the retrieved datasets.
[Figure 3.4: ROC curve; x-axis: false positive rate (1 − specificity); y-axis: true positive rate (sensitivity); curves: ULM, CORR, random; marked point: cutoff = 0.01.]
Figure 3.4: Receiver Operating Characteristic (ROC) curve analysis to determine the cutoff point. We set the cutoff point to 0.01 ($-\log_{10} 0.01 = 2$) to identify significant host-target interactions. The blue, red, and black curves show the ROC associated with ULM, CORR, and random, respectively.
is shown in Fig. S1-2. In the example in Fig. 3.3, we detect significant interactions between
CTDSP1 and LSM12 (P-value = $3.1 \times 10^{-8}$, ULM), between CTDSP2 and LSM12 (P-value
= $1.7 \times 10^{-4}$, ULM), and between CTDSPL and LSM12 (P-value = $2.1 \times 10^{-5}$,
ULM). Fig. 3.5 shows the boxplots of weights of 7 host genes whose intronic
miRNAs putatively target LSM12.
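To make the cutoff logic concrete, the sketch below evaluates a single ROC operating point using only the ULM and Perm-ULM columns of the table in Fig. 3.3 (target gene LSM12). The thesis ROC analysis pools all host gene / mRNA pairs, so this small example is illustrative only; the function name `roc_point` is an assumption.

```python
def roc_point(p_values, q_values, cutoff):
    """Sensitivity and specificity when P < cutoff is called significant.

    p_values : P-values of putative (positive) pairs
    q_values : P-values of permuted (negative) pairs
    """
    sensitivity = sum(p < cutoff for p in p_values) / len(p_values)
    specificity = sum(q >= cutoff for q in q_values) / len(q_values)
    return sensitivity, specificity

# ULM column of the table in Fig. 3.3, keyed by host gene.
P = {"C9orf3": 1.2e-2, "CTDSP1": 3.1e-8, "CTDSP2": 1.7e-4, "CTDSPL": 2.1e-5,
     "DNM3": 3.4e-2, "KLHL3": 3.8e-1, "RNF130": 3.1e-1}
# Perm-ULM column of the same table.
Q = [2.5e-2, 2.9e-1, 9.6e-1, 1.2e-1, 3.1e-1, 7.3e-1, 2.3e-1]

sens, spec = roc_point(list(P.values()), Q, cutoff=0.01)
significant = sorted(host for host, p in P.items() if p < 0.01)
print(sens, spec, significant)
```

At the cutoff of 0.01, none of the permuted pairs in this example is called significant, and the three host genes that pass are the ones reported as significant in the text.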
3.2.5 Predicting miRNA targets using inverse correlation (CORR method)
Gennarino and colleagues [56] recently described an algorithm, HOCTAR, that predicts
intronic microRNA targets based on inverse correlation of their host genes with other
Table 3.2: InMiR procedure
for g = 1 : G (number of target genes)
    - find all intronic miRNAs that putatively target g using TargetScan
    - map intronic miRNAs to their host genes, k = 1, ..., K_g
    for i = 1 : N (number of gene expression datasets)
        - extract the expression data of the host genes, $\mathbf{H}^i_g$
        - extract the expression data of the target gene, $\mathbf{x}^i_g$
        - solve $\mathbf{w}^i_g = \arg\min_{\mathbf{w}^i_g} \|\Delta\mathbf{x}^i_g - \mathbf{H}^i_g\mathbf{w}^i_g\|$
        - permute the rows using a permutation matrix, $\mathbf{M}$, to get $\mathbf{M}\mathbf{H}^i_g$
        - solve $\mathbf{q}^i_g = \arg\min_{\mathbf{q}^i_g} \|\Delta\mathbf{x}^i_g - \mathbf{M}\mathbf{H}^i_g\mathbf{q}^i_g\|$
    end
    for k = 1 : K_g
        - compute the P-values:
          $P_{kg} = \mathrm{WMW}(\{w^i_{kg}\}_{i=1}^{N}, \{\{q^i_{kg}\}_{k=1}^{K}\}_{i=1}^{N})$
          $Q_{kg} = \mathrm{WMW}(\{q^i_{kg}\}_{i=1}^{N}, \{\{q^i_{kg}\}_{k=1}^{K}\}_{i=1}^{N})$
    end
end
- set two classes of data, I: $\{P_{kg} \mid \forall g, k\}$ and II: $\{Q_{kg} \mid \forall g, k\}$
- plot the ROC curve and determine a cutoff point ($P_{cutoff}$) to get almost zero false positives
- declare the interaction between host gene k and target gene g significant if $P_{kg} < P_{cutoff}$
[Figure 3.5: boxplots of weights for the target gene LSM12; y-axis: weight values; x-axis (miRNA---host gene): miR-26b---CTDSP1, miR-26a-1---CTDSP2, miR-340---RNF130, miR-214---DNM3, miR-847---KLHL3, miR-27b---C9orf3, miR-26a-2---CTDSPL; dashed line: median of permuted data; two boxes are marked with asterisks.]
Figure 3.5: Boxplots of the weights obtained from the procedure described in Table 3.2. The significant negative interactions, i.e., those with $P < P_{cutoff}$ and mean$_{gk}$ > random, are marked with asterisks. The horizontal dashed line indicates the median of the weights obtained from the permutation test.
mRNAs across a large number of datasets. As we have previously demonstrated [52],
linear models that consider the impact of multiple potential miRNA regulators generate
more accurate target predictions than simple correlations, consistent with recent
observations of miRNA-target interactions [20, 120]. To assess whether these observations
hold for target predictions based on host gene expression, we also assessed a version of
our method in which we replace the weights with correlations. The resulting algorithm
is very similar to HOCTAR.
In particular, we denote the correlation coefficient by $\rho^i_{gk} = \mathrm{corr}(\mathbf{x}^i_g, \mathbf{h}^i_k)$, $\forall i, k, g$,
where corr(·, ·) represents the Pearson correlation coefficient. We then use these
correlations $\rho^i_{gk}$ for real and permuted datasets in place of the weights to calculate the P-value
based enrichment measures as described in Section 3.2.3. We call this method CORR.
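A minimal sketch of the CORR scores for one target gene in one dataset is given below. The function name `corr_scores` and the synthetic data are assumptions for illustration; in the procedure described above, the resulting correlations simply take the place of the ULM weights.

```python
import numpy as np

def corr_scores(x_g, H):
    """Pearson correlation between a target profile and each host gene profile.

    x_g : (T,) expression of the target gene over T samples
    H   : (T, K) host gene expression, one column per host gene
    """
    return np.array([np.corrcoef(x_g, H[:, k])[0, 1] for k in range(H.shape[1])])

# Synthetic example (illustrative only): the target falls as host gene 0 rises,
# and host gene 1 is unrelated to the target.
rng = np.random.default_rng(1)
T = 40
h0 = rng.normal(size=T)
h1 = rng.normal(size=T)
x_g = -0.8 * h0 + 0.1 * rng.normal(size=T)

rho = corr_scores(x_g, np.column_stack([h0, h1]))
print(rho)  # strongly negative for host gene 0
```

Unlike the linear model, each correlation is computed for one host gene at a time, so the combined effect of several regulators on the same target is not taken into account.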
3.2.6 Processing hosts and targets data
We retrieved the miRBase v16 gene context repository and extracted all human intronic
miRNA-host gene associations. We also downloaded 140 gene expression datasets (GDS
V.2011) from the Gene Expression Omnibus (GEO) that were built on the Affymetrix
HG-U133 microarray platform [56] using the MATLAB function getgeodata.m (Table S1 and
materials). Only those probe IDs that could be mapped to gene symbols (according to
HGNC) were considered for analysis. We averaged the expression levels of all transcripts
per gene. We used the list of putatively predicted target genes (9448) and their intronic
miRNAs (134) from the TargetScan (release 5.1) repository.
3.3 Results
3.3.1 Data set
140 curated gene expression datasets, called GDS, were downloaded from the Gene Expression
Omnibus (GEO) using the MATLAB Bioinformatics Toolbox function getgeodata.m.
The list of these GDSs is given in Table S1. Each dataset was then processed as follows.
First, we excluded those genes for which we have missing values. Then we filtered out
genes with absolute values less than the 10th percentile using the MATLAB function
genelowvalfilter.m. The expression profiles of the host genes are normalized so that all
have length one. Mathematically, this means $\mathbf{h}^i_{gk} \leftarrow \mathbf{h}^i_{gk} / \|\mathbf{h}^i_{gk}\|$, $\forall i, k, g$. For the target genes,
we obtain the decrease in expression level as $\Delta x_g = \bar{x}_g - x_g$, where $\bar{x}_g = \frac{1}{K_g}\sum_{k=1}^{K_g} x_{gk}$, $\forall g$.
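The two normalization steps can be sketched as follows. This is an assumption-laden illustration rather than the thesis code: the function names are hypothetical, and the decrease in expression is computed with the per-sample mean over genes as defined in Section 3.2.1.

```python
import numpy as np

def normalize_hosts(H):
    """Scale each host gene profile (column of H, shape T x K) to unit length."""
    norms = np.linalg.norm(H, axis=0, keepdims=True)
    return H / np.where(norms == 0, 1.0, norms)

def expression_decrease(X):
    """Decrease in expression for every gene: per-sample mean minus the gene's level.

    X has shape (T samples, G genes); the mean is taken over genes in each sample.
    """
    return X.mean(axis=1, keepdims=True) - X

# Tiny worked example with made-up values.
H = np.array([[3.0, 1.0],
              [4.0, 1.0]])
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 2.0, 8.0]])

Hn = normalize_hosts(H)      # first column becomes [0.6, 0.8]
dX = expression_decrease(X)  # rows: [1, 0, -1] and [2, 2, -4]
print(Hn)
print(dX)
```

Unit-length host profiles put all host genes on a common scale, so the fitted weights are not dominated by host genes with large absolute expression values.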
3.3.2 Detecting good host gene surrogates
Using the method described in the last section, we defined for each host gene a set of
significant interactions between the host gene's expression level and those of the predicted
targets of its associated intronic miRNAs (i.e., those for which $P_{kg} < P_{cutoff}$).
Furthermore, we know whether an interaction is a "negative" one, when the mean of the weights
over all datasets (i.e., $\mathrm{mean}(w_{kg}) = \frac{1}{N}\sum_{i=1}^{N} w^i_{kg}$) is larger than random expectation, or
a "non-negative" one, when the mean is smaller than random expectation. When we
examine all the significant interactions between a host (or equivalently its miRNA) and
its predicted targets, we find that these interactions are almost exclusively negative or
almost exclusively non-negative.
We retrieved and processed the expression profiles of 75 host genes and 3864 target
genes (see materials and Table S3) over 140 datasets. For all target genes (G = 3864),
we carried out the procedure given in Materials subsection 5 to obtain p-values for the
ULM, CLM, and CORR methods. All of these p-values are available in Table S3. We
report the results for ULM; the significant interactions from CLM are similar and, as
we described in the last section, using CORR reduces our sensitivity or specificity or
both. After applying the cutoff at P = 0.01, we find that 22 (29%) host genes have more
negative interactions than positive ones. Those host genes and their 1935 target genes
are shown in Fig. 3.6.
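The counting step behind this result can be sketched as follows, assuming that each host gene comes with a P-value and a mean weight for every predicted target. The function name `count_interactions` and all numerical values are hypothetical.

```python
def count_interactions(p_values, mean_weights, null_mean, cutoff=0.01):
    """Count significant negative and non-negative interactions for one host gene.

    An interaction is significant if its P-value is below the cutoff. It is
    "negative" if its mean weight exceeds the null mean, "non-negative" otherwise.
    Returns (negative, non_negative, more_negative_than_not).
    """
    negative = non_negative = 0
    for p, m in zip(p_values, mean_weights):
        if p < cutoff:
            if m > null_mean:
                negative += 1
            else:
                non_negative += 1
    return negative, non_negative, negative > non_negative

# Hypothetical host gene with four predicted targets.
p_values = [1e-5, 3e-3, 0.2, 4e-4]
mean_weights = [0.12, 0.08, 0.30, -0.05]

result = count_interactions(p_values, mean_weights, null_mean=0.01)
print(result)  # (2, 1, True)
```

The third target is ignored because its P-value does not pass the cutoff, even though its mean weight is large.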
Fig. 3.7 shows the number of TargetScan-predicted targets for each of these 22 host
genes, along with the number of significant interactions for these predicted targets and
the number of these significant interactions that are negative. As shown, for 21 out of 22
host genes, almost all interactions are negative (equal light green and yellow bars). We
take this as evidence that the host gene expression level is a good surrogate for that of
its intronic miRNAs. Indeed, when we consider all of the host genes with any significant
interactions, we find that they fall into two main classes: those whose interactions are
almost exclusively negative and those whose interactions are non-negative (Fig. 3.8). Furthermore,
those that are non-negative are highly enriched for those with possible promoters, as
predicted by sequence analysis in [111], for their intronic miRNAs (Fig. 3.8 and Fig. 3.9).
We also observe that significantly negatively enriched host genes have, on average, high
mean $-\log_{10}$ p-values (blue circles in Fig. 3.8). For instance, 7 out of 8 host genes, namely HNRNPK,
COPZ1, HUWE1, PANK2, ACADVL, LARP7, and IARS2, appear at the top of the
ranked mean p-value list. Thus, significant negative interactions and high mean
$-\log_{10}$ p-values are two determinants that may provide strong evidence for detecting co-expressed
host-intronic miRNA pairs.
Figure 3.6: A gene-gene interaction network of target and host genes of intronic miRNAs with significant negative interactions. Each green and red node shows a host and target gene, respectively. An edge indicates that there is a significant negative interaction between two nodes, i.e., mean$_{gk}$ > random and $P_{kg} < P_{cutoff}$. The size of each host node is proportional to the number of edges connected to it. Host–intronic miRNA pairs are: MCM7–miR-106b/93/25, LARP7–miR-367/302a/302b, LARP7–miR-302c/d, RNF130–miR-340, PPIL2–miR-130b/301b, HUWE1–miR-98/let-7f, CTDSP2–miR-26a, CTDSP1–miR-26b, RCL1–miR-101, COPZ1–miR-148b, PANK2–miR-103, TRPM3–miR-204, DNM2–miR-199a/638, IARS2–miR-215/194, HNRNPK–miR-7, SREBF2–miR-33a, WWP2–miR-140, DALRD3–miR-425/191, EVL–miR-342, LPP–miR-28, ACADVL–miR-324, KIAA1797–miR-491, C3orf60–miR-191.
[Figure 3.7: grouped bar chart; y-axis: number of target genes (0 to 1200); x-axis (host gene--intronic miRNAs): MCM7--miR-106b/93/25, LARP7--miR-367/302a/302b, LARP7--miR-302c/d, RNF130--miR-340, PPIL2--miR-130b/301b, HUWE1--miR-98/let-7f, CTDSP2--miR-26a, CTDSP1--miR-26b, RCL1--miR-101, COPZ1--miR-148b, PANK2--miR-103, TRPM3--miR-204, DNM2--miR-199a/638, IARS2--miR-215/194, HNRNPK--miR-7, SREBF2--miR-33a, WWP2--miR-140, DALRD3--miR-425/191, EVL--miR-342, LPP--miR-28, ACADVL--miR-324, KIAA1797--miR-491, C3orf60--miR-191. Legend as printed: # of putative targets (using TargetScan); # of putative targets with P-value < P_cutoff; # of putative targets with P-value < P_cutoff and mean(w) < mean(rw).]
Figure 3.7: Each dark green bar shows the number of putative targets (obtained from TargetScan) of the intronic miRNAs of the corresponding host gene labeled on the x-axis. Light green bars indicate the number of putative targets that satisfy the condition $P_{gk} < P_{cutoff}$ (significantly regulated). The numbers of putative targets that meet both conditions, $P_{gk} < P_{cutoff}$ and mean$_{gk}$ > random (significantly negatively regulated), are shown by yellow bars.
3.3.3 Targeting of host genes by miRNAs partially explains their predicted surrogacy
Even if a host gene and intronic miRNA are expressed from the same promoter, they
could have different expression levels due to different post-transcriptional regulation. To
investigate this, we examined the predicted miRNA targets within the 3' UTRs of host
genes. We found that host genes are targeted by miRNAs much more than non-host genes
[Figure 3.8: scatter plot; x-axis: percentage of negatively enriched targets (0% to 100%); y-axis: $-\log_{10}$ p-values; legend: good surrogate hosts, bad surrogate hosts, hosts whose intronic miRNAs have predicted promoters.]
Figure 3.8: Each circle, associated with a host, shows the mean of the $-\log_{10}$ p-values of the enriched genes versus the percentage of negatively enriched genes targeted by the intronic miRNAs of host genes. The blue and red circles are associated with good and bad surrogate host genes, respectively. The circles corresponding to the hosts whose intronic miRNAs have predicted promoters are marked by yellow triangles.
($P < 10^{-22}$, Wilcoxon rank-sum test), though we were unable to detect a preference for
targeting by intronic versus intergenic miRNAs (Fig. S5b). However, we found that
negatively enriched host genes have significantly fewer (P < 0.02, Wilcoxon rank-sum
test) miRNA targets than non-negatively enriched hosts (Fig. 3.11). So, down-regulation
of the host gene by other miRNAs could provide another possible explanation for why
some host expression levels are bad surrogates for those of their intronic miRNAs. The
pattern of interactions among host genes and their intronic miRNAs suggests that there
may be some hierarchical structure in intronic miRNA-based regulation (Fig. 3.12).
[Figure 3.9: Venn diagram; sets: good surrogate hosts, bad surrogate hosts, hosts whose intronic miRNAs have independent promoters; region counts as extracted: 16 143 39 13 4.]
Figure 3.9: Venn diagrams showing the overlap between good and bad surrogate host genes and hosts whose intronic miRNAs have predicted promoters.
[Figure 3.10: CDF plot; x-axis: number of putative targeting miRNAs per base pair (0 to 0.2); y-axis: CDF; legend: host genes, non-host genes; annotation: P < 7.2723e-22.]
Figure 3.10: The CDF of the number of miRNAs targeting host (blue) and non-host genes (red) per base; that is, number of targets / 3' UTR length. The CDFs are obtained from analyzing 367 host genes and 17000 non-host genes.
[Figure 3.11: bar chart; y-axis: number of putative targeting miRNAs (0 to 60); x-axis (host genes): CLCN5, CTDSPL, WDR82, KLHL3, C9orf5, LPP, CALCR, PANK3, CTDSP2, ANK1, HOXA9, CHRM2, TLN2, AATK, DNM2, HTR2C, MEST, ASTN1, DNM3, GABRE, NR6A1, WWP2, APOLD1, CTDSP1, HNRNPK, GPC1, HOXC5, PTPRN2, TRPM3, ARRB1, C17orf91, CHM, COPZ1, MEGF6, ACADVL, EML2, FGF13, PDE2A, RCL1, VPS13B, C9orf3, DALRD3, DNM1, HUWE1, SLIT3, SMC4, SREBF1, AC106864.1, C3orf60, C6orf155, DLEU2, EVL, HOXC4, IARS2, KIAA1797, LARP7, MCM7, MIRHG1, MYH7B, PANK2, PPIL2, PTPRN, RNF130, SLIT2, SREBF2, TOP3B, TRPM1; legend: intronic, intergenic, genes predicted to be good surrogates.]
Figure 3.11: Number of intergenic and intronic miRNAs that putatively target our set of host genes. Bars marked by red circles are associated with the genes predicted to be good surrogates.
Figure 3.12: Host genes targeted by intronic miRNAs of other hosts. The nodes correspondingto the hosts predicted to be good surrogates are shown in red.
3.3.4 Correlation measurements are not good indicators of surrogacy
Correlation between the expression patterns of the host genes and their intronic miRNAs
in a single dataset is not a good indicator of surrogacy. We observed that the correlation
measurements reported by five different groups overlap little and are somewhat
inconsistent. Only 11 host-miRNA pairs show high positive correlation (ρ > 0.4) in at
least two of these five datasets (Fig. 3.13). Out of these 11 host genes, 4
are predicted to be good surrogates by our model. While the intronic miRNAs of none
of these 4 hosts have promoters, 6 out of the 7 hosts predicted to be bad surrogates have
intronic miRNAs with promoters (Fig. 3.13). Thus, 7 highly correlated host-intronic
miRNA pairs fail our criteria, and 6 of them also fail the promoterless condition.
We collected the correlation results reported by Wang et al. [43], Liang et al. [67],
Baskerville et al. [64], and Ruike et al. [45]. Wang's data, reported in terms of p-values,
were transformed to Pearson correlation coefficients to be consistent with the other data. The
transformation is based on the significance test for a correlation coefficient [124].
In addition, we used the data given in [125] and [126] and computed the correlation
between the matched host genes and intronic miRNAs; we refer to this dataset as Rad. In
order to compute the correlation between the expression profiles of miRNAs and mRNAs
in the Rad data, we analyzed the data collected in [20]. Ritchie et al. analyzed miRNA
expression data cloned by Landgraf et al. [126]. After downloading and
processing their data, we obtained the expression of 117 human miRNAs and of
22283 genes over 117 samples. We then computed the correlation between all miRNA-
mRNA pairs.
In this way, we obtain correlation coefficients for 84 host-intronic miRNA pairs from
five different datasets. We expect that co-expressed host-miRNA pairs show strong corre-
lation in at least two of these five datasets. The scatter plots (Fig. 3.14) of the correlation
[Figure 3.13: bar chart; y-axis: Pearson correlation coefficient (0 to 1); x-axis (host gene--miRNA): EVL--miR-342, LPP--miR-28, CTDSP2--miR-26a, CTDSP1--miR-26b, MEST--miR-335, C9orf3--miR-24, GABRE--miR-452, PDE2A--miR-139, GPC1--miR-149, SLIT3--miR-218, SLIT2--miR-218; legend: hosts whose intronic miRNAs have promoters, good surrogates, bad surrogates.]
Figure 3.13: Pearson correlation coefficients averaged over five correlation datasets. Only those host-intronic miRNA pairs that are significant (P < 0.05) in at least two datasets and overlap with our host gene list are considered. The hosts marked with a yellow triangle contain intronic miRNAs with predicted independent promoters.
data, however, show that the datasets overlap little and are somewhat inconsistent,
suggesting that relying solely on correlation data may not be sufficient to declare a host-
intronic miRNA pair co-transcribed.
3.4 Discussion
InMiR models the combinatorial effect of miRNAs using a simple and biologically plausi-
ble linear model. Because we use ordinary linear regression for target prediction, InMiR
is fast and easy to update to incorporate new mRNA expression data. We used data
[Figure 3.14, panels a-d: scatter plots of correlation coefficients; x-axes: correlation in the first dataset (Rad in a, Liang in b, Wang in c, Ruike in d); y-axes: correlation in the second dataset; legend for the second dataset: Liang, Wang, Ruike, Baskerville.]
Figure 3.14: Scatter plots of five correlation datasets (Table S4). (a) The scatter plot of Rad's data versus Liang's, Wang's, Ruike's, and Baskerville's data. (b) The scatter plot of Liang's data versus Wang's, Ruike's, and Baskerville's data. (c) The scatter plot of Wang's data versus Ruike's and Baskerville's data. (d) The scatter plot of Ruike's data versus Baskerville's data.
from ∼1,500 gene expression arrays to predict interactions in human between 57 in-
tronic miRNAs and 3,864 potential targets. InMiR can also be readily applied to other
species beside human because intronic miRNAs constitute a large portion of the miRNA
complement of a variety of species (Fig. 3.15).
Unlike previously described methods, InMiR does not assume that all host genes have
expression levels that are equally good surrogates. The set of host genes predicted by
InMiR to be bad surrogates is enriched for hosts with predicted intronic promoters and
for hosts with a larger number of miRNA target sites in their 3’ UTRs.
[Figure 3.15 plot: stacked bar chart; y-axis: number of miRNAs (0 to 1000); x-axis: species, namely Homo sapiens (hsa), Bos taurus (bta), Pan troglodytes (ptr), Mus musculus (mmu), Pongo pygmaeus (ppy), Gallus gallus (gga), Macaca mulatta (mml), Danio rerio (dre), Equus caballus (eca), Ornithorhynchus anatinus (oan), Ciona intestinalis (cin), Rattus norvegicus (rno), Canis familiaris (cfa), Taeniopygia guttata (tgu), Xenopus tropicalis (xtr), Sus scrofa (ssc), Caenorhabditis elegans (cel), Drosophila melanogaster (dme), Monodelphis domestica (mdo), Tetraodon nigroviridis (tni); legend: intergenic, intron, 3’ UTR, exon; inset pie chart for Homo sapiens with segments of 38%, 55%, 2%, and 5%.]
Figure 3.15: Intronic miRNAs comprise a significant portion of identified miRNAs in other species. Stacked bars show the number of miRNAs located in exons (brown), 3’ UTRs (yellow), introns (cyan), and intergenic regions (blue) in 20 species for which more than 100 miRNAs have been detected. Data are retrieved from miRBase (v.15).
As shown in Fig. 3.16, our observations suggest at least three types of regulatory
relationships between host genes and their intronic miRNAs: (a) an intronic miRNA
and its host gene are transcribed from the same promoter; the mature miRNA is then
processed from the intron, before or after splicing, either by Drosha or independently
of it (mirtrons), and the resulting steady-state expression levels of the host and
intronic miRNA are highly correlated (Fig. 3.16a); (b) an intronic miRNA has its own
promoter and is transcribed independently of the host gene at least some of the time
(Fig. 3.16b); (c) the intronic miRNA and host are transcribed from the same promoter,
but the post-transcriptional regulation of the host gene differs from that of the miRNA
(Fig. 3.16c). For example, a host gene could be down-regulated by its own intronic miRNA
(we found three self-regulated hosts, all of which were predicted to be bad surrogates
by InMiR; Fig. 3.17), or it could be down-regulated by other co-expressed miRNAs.
[Figure 3.16 diagram: panels A, B, and C, titled "Co-expressed host and intronic miRNA", "Independent-transcribed and expressed intronic miRNA", and "Co-transcribed host and intronic miRNA but not co-expressed"; labeled elements: transcriptional start site, exon, intron, UTR, miRNA processing, RISC, miRNA, mRNA, repressed mRNA, targeted by a miRNA.]
Figure 3.16: Regulatory mechanisms. Three possible scenarios for the transcription and expression of a host and its intronic miRNA.
The host gene/intronic miRNA interactions that we observe suggest a variety of new
regulatory mechanisms. For example, tightly coupled host gene and intronic miRNA
expression could support a rapid “biological switch” in cellular state, in which
expression of the host gene also produces an intronic miRNA that immediately
down-regulates genes expressed in the competing state (Fig. 3.18). Our observations
raise a number of interesting questions. Are intronic miRNAs with their own promoter
ever expressed from the host gene’s promoter? How is this decision regulated? How does
the independent transcription of an intronic miRNA affect host gene transcription? Does
the processing of intronic miRNA interfere with splicing? This may depend on whether
Drosha cleaves the pre-miRNA before or after splicing. Kim and Kim [71] speculated that
both mechanisms may occur, but no firm conclusions can be drawn yet. Answers to these
questions would provide a clearer picture of intronic miRNA biogenesis.
Figure 3.17: The host genes targeted by their own intronic miRNAs. The host genes in our dataset that are targeted by their own intronic miRNAs. All of these hosts are predicted to be bad surrogates.
[Figure 3.18 diagram: timeline in three stages, "Gene 1 is already expressed", "Gene 2 is being expressed with its intronic miRNA", and "Gene 1 is repressed and Gene 2 is expressed"; title: "Host and its intronic miRNA cooperatively resemble a Biological Switch".]
Figure 3.18: Tightly coupled host gene and intronic miRNA expression could support a rapid “biological switch” in cellular state in which host gene expression also expresses an intronic miRNA that immediately down-regulates genes expressed in the competing state.
Chapter 4
BayMiR: inferring evidence for
endogenous miRNA-induced gene
repression from mRNA expression
profiles
4.1 Introduction
In the previous chapter, we introduced InMiR, a computational method for predicting the
target genes of intronic miRNAs. Although we showed that InMiR can successfully predict
the targets of intronic miRNAs, many miRNAs are not intronic, and many intronic
miRNAs are not co-expressed with their host genes, a prerequisite for using host genes
as surrogates of intronic miRNAs in the InMiR model. Therefore, we need a prediction
method that works for all miRNAs and does not depend on host genes. In this chapter,
we introduce BayMiR, a new computational method that predicts the functionality
of potential miRNA target sites using the activity level of the miRNAs inferred from
genome-wide mRNA expression profiles [127]. For each mRNA-miRNA pair, BayMiR
computes an “endogenous target repression” score that quantifies the contribution of each
miRNA to repressing the target mRNA’s expression in the presence of other targeting
miRNAs that are active in the same cellular contexts. We also found that validated miRNA
targets exhibit high expression variability, suggesting that an index of mRNA expression
variation can also be used as another score for predicting miRNA targets. We bench-
marked BayMiR, the expression variation index, Cometa, and the TargetScan “context
scores” on two tasks: predicting independently validated miRNA targets and predicting
the decrease in mRNA abundance in miRNA overexpression assays. BayMiR performed
better than all other methods in both benchmarks and, surprisingly, the variation index
performed better than Cometa and some individual determinants of the TargetScan context
scores. Furthermore, BayMiR-predicted miRNA target sets are more consistently
annotated with GO and KEGG terms than similarly sized random subsets of genes with
conserved miRNA seed regions. We have thus refined the functional classification of
miRNAs by assigning them functions based on the enrichment of their BayMiR-predicted targets
miRNAs is important to predict endogenous, miRNA-induced decreases in steady-state
mRNA abundance.
BayMiR infers the activity level of each miRNA from the expression profiles of its putative
targets (predicted on the basis of conserved seed matches) and then refines these
target predictions using the regression model. We also found that expression variability
is significantly higher among mRNAs with more miRNA target sites and, furthermore,
that it can be used to identify more likely targets. Accordingly, we used the variance
of gene expression levels across a wide range of samples, including different cell types,
cell lines, and diseased and healthy tissues, as another mRNA-miRNA scoring scheme. We
call this score the “gene variation” index.
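As a minimal sketch of the gene variation index, the variance of each gene's expression profile can be computed and used to rank genes. The gene names and values are invented, and the choice of the population variance (rather than the sample variance) is an assumption, since the text does not specify the estimator.

```python
from statistics import pvariance


def gene_variation_index(expr):
    """Map each gene to the variance of its expression across samples.

    Population variance is an assumption of this sketch.
    """
    return {gene: pvariance(levels) for gene, levels in expr.items()}


# Toy log-expression profiles over four samples (invented numbers).
expr = {
    "GENE_A": [5.0, 5.0, 5.0, 5.0],  # flat profile
    "GENE_B": [2.0, 4.0, 6.0, 8.0],  # variable profile
}
index = gene_variation_index(expr)
# Genes with the most variable expression come first.
ranked = sorted(index, key=index.get, reverse=True)
```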
The BayMiR analysis was conducted on 1,539 human miRNAs and the expression levels of
13,303 genes measured in 5,372 microarray experiments. It predicts that approximately
60% of miRNA-mRNA duplexes with matched conserved target sites have a detectable
down-regulation signal in gene expression. We evaluated the proposed scores against
eight TargetScan scores (a collection of the most important sequence-based features)
and against Cometa scores (an mRNA-expression-based miRNA target prediction method)
using miRNA over-expression experiments, validated targets, and GO and KEGG enrichment
analysis. On these benchmarks, the BayMiR scores consistently outperform both the
sequence-based and the expression-based scores and indicate to what extent genes
down-regulated across a global set of microarrays are under the control of miRNAs.
4.2 Results
4.2.1 BayMiR method
BayMiR (Fig. 4.1) calculates the degree to which mRNA down-regulation inferred from a
large set of microarrays can be explained by inferred miRNA activity. BayMiR makes this
prediction by integrating sequence and expression evidence. Because many targets are
under the control of multiple miRNAs [20, 51, 106, 120], BayMiR applies a linear model
that relates the target expression vector (measured variable) to a weighted combination
of the miRNA activity vectors (regressor variables). BayMiR infers the activity vector of
a given miRNA by averaging the normalized expression vectors of its predicted mRNA
targets based on sequence-based prediction methods. These miRNA activity vectors are
then used as regressors in a Bayesian linear regression model of the “down-regulation”
expression vector of each mRNA. The resulting regression coefficients of each miRNA are
interpreted as the strength of miRNA-mediated repression of the target mRNA.
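As a rough illustration of the activity-inference step, the sketch below standardizes the expression vector of each predicted target and averages them sample by sample. The z-scoring and the sign flip (so that lower target expression means higher activity, in line with the "inverse expression" wording used later in this chapter) are assumptions of this sketch, and all names and values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(v):
    """Standardize a vector to zero mean and unit variance."""
    mu, sd = mean(v), pstdev(v)
    return [(x - mu) / sd for x in v]


def mirna_activity(target_genes, expr):
    """Average the standardized profiles of a miRNA's predicted targets,
    then negate so that repressed targets imply high activity.
    The negation and the z-scoring are assumptions of this sketch."""
    profiles = [zscore(expr[g]) for g in target_genes if g in expr]
    n_samples = len(profiles[0])
    return [-mean(p[s] for p in profiles) for s in range(n_samples)]


# Toy data: two predicted targets measured in three samples.
expr = {"T1": [1.0, 2.0, 3.0], "T2": [10.0, 20.0, 30.0], "OTHER": [3.0, 1.0, 2.0]}
activity = mirna_activity(["T1", "T2"], expr)
```

A target with a completely flat profile would need special handling here, because its standard deviation is zero.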
We also considered the variability in gene expression of a target mRNA as a deter-
minant to distinguish functional and non-functional targets of a given miRNA. The gene
variation index for each mRNA is computed as the variance of gene expression levels
across all samples.
Each element of an expression vector is the transcript abundance of the target in one
of 392 biological samples collected from 5,372 microarray experiments. We determine the
coefficients of the regression model using a penalized likelihood approach called elastic
net regression [128] (see 4.3.1), modified to assign only non-negative coefficients. With
this regression model, each sequence-predicted miRNA-mRNA interaction is assigned
one coefficient; this coefficient represents how much the inferred activity profile of that
miRNA contributes to predicting that mRNA’s “down-regulation” profile (see 4.3.1) when
considering the activity profiles of all other miRNAs predicted to target the mRNA. We
call these coefficients “BayMiR scores” and interpret a zero BayMiR score as representing
a lack of evidence in the expression data for regulation of the mRNA by that miRNA.
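To make the non-negative elastic net step concrete, the following is a minimal sketch based on cyclic coordinate descent. It is not the solver used for BayMiR; the objective scaling, the parameter names `lam` and `alpha`, and the toy data are assumptions chosen for illustration. The rows of `W` are samples and its columns are miRNA activity vectors.

```python
def nonneg_elastic_net(W, y, lam=0.1, alpha=0.5, n_iter=200):
    """Minimize (1/2M)*||y - W h||^2 + lam*(alpha*|h|_1 + (1-alpha)/2*|h|^2)
    subject to h >= 0, by cyclic coordinate descent (illustrative only)."""
    M, K = len(W), len(W[0])
    h = [0.0] * K
    resid = list(y)  # residual y - W h, with h = 0 initially
    for _ in range(n_iter):
        for k in range(K):
            col = [W[m][k] for m in range(M)]
            z = sum(c * c for c in col) / M
            if z == 0.0:
                continue
            # Correlation of column k with the residual that excludes h[k].
            rho = sum(c * (r + c * h[k]) for c, r in zip(col, resid)) / M
            # Soft-threshold, then clip at zero to keep the coefficient non-negative.
            new = max(0.0, rho - lam * alpha) / (z + lam * (1.0 - alpha))
            delta = new - h[k]
            if delta:
                resid = [r - c * delta for r, c in zip(resid, col)]
                h[k] = new
    return h


# Toy example: y = 2*a - 1*b for two orthogonal activity vectors a and b.
a = [1.0, -1.0, 1.0, -1.0]
b = [1.0, 1.0, -1.0, -1.0]
W = [[a[m], b[m]] for m in range(4)]
y = [2.0 * a[m] - 1.0 * b[m] for m in range(4)]
scores = nonneg_elastic_net(W, y)
```

In this toy case the second coefficient is clipped to zero because its unconstrained value would be negative, which corresponds to interpreting a zero score as a lack of evidence for regulation.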
[Figure 4.1 diagram: an mRNA expression data set (13,303 mRNAs by 5,372 samples); for each miRNA (miRNA1 ... miRNAK), the target set is identified using sequence determinants and the expression vectors of the target set are averaged into a miRNA activity vector (w1, w2, ..., wK); the activity vectors and the target expression vector yg enter a Bayesian linear regression, yg = h1w1 + h2w2 + ... + hKwK + eg, which yields the miRNA-mRNA scores hg = [h1, h2, ..., hK].]
Figure 4.1: BayMiR method. Flowchart of the BayMiR algorithm. For each miRNA, BayMiR first identifies the set of targets based on the presence of conserved complementary sites to the seed region of the miRNA in the 3’ UTR of the target. Next, for each miRNA, BayMiR extracts the mRNA expression vectors associated with the selected targets from the mRNA gene expression data set, and averages them to obtain the miRNA activity vector. These miRNA activity vectors are used as regressors in a Bayesian linear regression model to explain the down-regulation in the expression level of the target. Finally, BayMiR infers scores (the regression coefficients) using a penalized likelihood method called elastic net regression. Each score indicates the strength of miRNA-mediated repression on the target genes.
4.2.2 BayMiR identifies highly repressed targets on miRNA
over-expression assays
To evaluate whether the BayMiR scores reflect the strength of miRNA-mediated repression
of mRNA targets, we measured the consistency between the BayMiR scores and the
relative down-regulation of targets in a set of miRNA over-expression experiments. One
expects high-scoring targets to be down-regulated more in miRNA over-expression
experiments. We note that a similar metric has previously been used to evaluate the
efficacy of TargetScan scores [7, 18], and that this set of miRNA over-expression assays
was not used in BayMiR to obtain the scores; thus, we are not influencing the results of
our evaluation either by selecting biased metrics or by evaluating our model on the
training data. We downloaded the data collected by Khan et al. [39], in which 23 miRNAs
were transfected into seven different cell types and the log-fold changes in mRNA
expression levels were measured. To examine the degree to which our scores can predict
the log-fold change of mRNAs in the miRNA over-expression arrays, for each score, we
binned mRNAs into five bins based on their scores and computed the mean of the mRNA
log-fold changes in each bin. We observed that negative log-fold repression levels decrease
consistently as scores decrease for both determinants (Fig. 4.2, top). In total, 3,867 out
of 10,125 mRNAs are down-regulated in the miRNA over-expression experiments. We
then asked whether our scoring schemes can detect repressed targets better than the
individual components of the TargetScan context score [7]. When comparing negative mean
log-fold changes for messages whose scores were greater than the median score for the
corresponding miRNA, BayMiR scores outperform all TargetScan scores, even the
context+ score, which is a combination of all individual TargetScan scores (Fig. 4.2, middle).
In addition, when we combined BayMiR scores and the TargetScan context+ score, the
performance improved further (Wilcoxon-Mann-Whitney test: P < 0.001), indicating
that BayMiR can augment the TargetScan scoring system. Target site conservation is
another scoring scheme used by TargetScan, so we also compared BayMiR scores with
conservation scores for all conserved target sites of all conserved miRNA families and
found similar improvements (Fig. 4.2, bottom). Our analysis also shows that the gene
variation score was a better predictor of log-fold change than seed pairing stability,
relative location of seed match in the 3’ UTR, and target abundance; however, it is worse
than the other components of the context score on this assay (Fig. 4.2, middle).
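The binning evaluation described above can be sketched as follows. Equal-sized bins by score rank (that is, 20% of the mRNAs per bin) are an assumption suggested by the "score percentage" axis of Fig. 4.2, and the gene names and values are invented.

```python
def mean_lfc_by_score_bin(scores, lfc, n_bins=5):
    """Sort mRNAs by score, split them into `n_bins` equal-sized groups
    (lowest scores first), and return the mean log-fold change per group."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    means = []
    for b in range(n_bins):
        members = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        means.append(sum(lfc[g] for g in members) / len(members))
    return means


# Toy data: ten mRNAs whose repression grows with the score.
scores = {f"g{i}": i / 10 for i in range(10)}
lfc = {f"g{i}": -0.1 * i for i in range(10)}
bin_means = mean_lfc_by_score_bin(scores, lfc)
```

If the scores are informative, the mean log-fold change should become more negative from the lowest-scoring bin to the highest-scoring bin, as it does in this toy example.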
High-scoring BayMiR targets are enriched for validated targets
To test whether experimentally validated targets are enriched among high-scoring
BayMiR targets, we measured the significance of the overlap between the targets
with scores greater than the median and the experimentally validated targets retrieved
from TarBase [129]. The hypergeometric test showed that validated targets are enriched
in the sets of high-scoring genes for both BayMiR and gene variation predicted targets
(P < 10^-5 and P < 10^-4, respectively). A cumulative distribution analysis is also shown
in Fig. 4.3. Together, these observations support the hypothesis that targets repressed
under endogenous conditions are more likely to be functional targets.
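For reference, the upper-tail hypergeometric probability used in this kind of overlap test can be computed directly from binomial coefficients. The function and argument names below are hypothetical, and the counts are a toy example rather than the TarBase data.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` genes are sampled without replacement from
    `population` genes, of which `successes` are validated targets."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


# Toy example: 10 putative targets, 4 validated, 5 high-scoring, 3 in the overlap.
p_value = hypergeom_upper_tail(3, population=10, successes=4, draws=5)
```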
BayMiR predicts miRNA-induced repression better than Cometa
Next, we used the same evaluation strategy to compare BayMiR scores with another
mRNA-miRNA scoring method that also uses large-scale gene expression data. Recently,
Gennarino et al. [55] showed that the targets of a miRNA tend to be co-expressed and,
based on this property, proposed Cometa, a computational method that scores each
sequence-based miRNA target prediction according to how correlated it is with other
predicted targets of the miRNA. Examining the down-regulated targets in the miRNA
over-expression assays shows that negative mean log-fold expression changes for targets
selected by our scoring schemes are significantly higher than for those selected by Cometa
scores (P < 10^-40, Fig. 4.5). Moreover, our methods’ high-scoring targets are significantly
more down-regulated than Cometa’s high-scoring targets in the over-expression assays
(P < 10^-60, Fig. 4.4). Although Cometa targets are also enriched for validated targets,
this enrichment is weaker than for BayMiR high-scoring targets (P < 0.01 vs. P < 10^-5).
[Figure 4.2 plot: three bar-chart panels (top, middle, bottom); y-axis in each panel: average fold decrease in abundance (log2); top panel x-axis: score percentage (0-20, 20-40, 40-60, 60-80, 80-100) for BayMiR and gene variation; middle panel bars include context+ score + BayMiR, BayMiR score, context+ score, site type, target abundance, seed pairing stability, local AU, position contribution, gene variation, and 3’ UTR contribution; bottom panel bars: BayMiR, conservation; legend: mean log-fold change for mRNAs whose scores > median of all mRNA scores; mean log-fold change for targets in the transfection experiments.]
Figure 4.2: (top) mRNAs in the miRNA over-expression assays are grouped into five bins based on their BayMiR and gene variation scores; the mean log-fold change of the mRNAs in each bin is plotted as a bar. There are two groups of bars; the left- and right-hand groups correspond to BayMiR and gene variation, respectively. (middle) Comparing BayMiR and gene variation scores with seven sequence scores from TargetScan. Each bar represents the negative mean log-fold change for mRNAs whose scores are greater than the median of all mRNA scores for the selected determinant in the miRNA over-expression assays. The left-most group is obtained by combining the context+ scores with BayMiR scores. The dashed line shows the mean log-fold change for all targets in the miRNA over-expression assays. (bottom) Comparing BayMiR scores with the conservation scores as measured by TargetScan. The conservation scores are given only for the targets with conserved target sites complementary to the seed regions of the conserved miRNA families. Error bars indicate 95% confidence intervals for the estimated means.
[Figure 4.3 plot: two CDF panels; left x-axis: BayMiR scores (0 to 1), annotated P < 10^-8; right x-axis: gene variation (0 to 10), annotated P < 10^-4; legend: validated, all.]
Figure 4.3: Cumulative distribution of scores for the validated targets. Validated targets are assigned higher BayMiR scores and gene variation scores compared to the other putative targets. Shown are the cumulative distributions of BayMiR scores (left plot) and gene variation scores (right plot) for validated targets (blue) and all putative targets (red).
[Figure 4.4 plot: bar chart; y-axis: average fold decrease in abundance (log2); bars: BayMiR, gene variation, Cometa; legend: mean log-fold change for mRNAs whose scores > median of all scores; mean log-fold change for targets in the transfection experiments.]
Figure 4.4: BayMiR high-scoring targets are more down-regulated in miRNA over-expression assays than Cometa high-scoring targets. Each bar represents the mean of the negative log-fold change after miRNA over-expression for genes with scores greater than the median.
BayMiR target sets have more consistent GO-BP and KEGG annotations
Many miRNAs participate in the coordinate regulation of biological processes [130]; as
such, we should expect that, in general, better target prediction methods would generate
miRNA target sets that have higher enrichment [109]. To test whether BayMiR-predicted
targets are more consistently annotated with GO (release 2012-2-19) and KEGG (release
2012-02-14) terms than TargetScan targets, we used Fisher’s exact test with an FDR
multiple-test correction (see Materials and Methods) to score the enrichment of 1,233 GO-BP
terms and 259 KEGG pathways within the target sets of each of 1,264 miRNA families.
We found a nearly three-fold increase in enriched terms and pathways (FDR < 0.1)
within BayMiR-predicted target sets compared to equally sized random subsets of
TargetScan targets (31,976 vs. 11,890, P < 10^-200). Examination of the enriched GO-BP terms
and KEGG pathways revealed a wide diversity of biological processes regulated by miRNAs
(Table S1, FDR < 0.1 and Table S2, FDR < 0.1). We found that 35% of the miRNAs that
have BayMiR target sets are enriched for the GO term “regulation of expression”,
suggesting that miRNAs have substantial influence on gene regulation through their control
of other gene regulators.
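The multiple-testing step can be illustrated with the Benjamini-Hochberg procedure. The text above states only that an FDR correction was applied, so the choice of Benjamini-Hochberg is an assumption, and the p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values for four term/miRNA-family tests (invented).
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
enriched = [i for i, q in enumerate(qvals) if q < 0.1]
```

With the FDR < 0.1 cutoff used in the text, all four toy tests would be called enriched.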
[Figure 4.5 plot: cumulative fraction (y-axis, 0 to 1) versus fold change (log2; x-axis, -2 to 1.5); curves: BayMiR, gene variation, Cometa.]
Figure 4.5: Comparing BayMiR and Cometa. BayMiR high-scoring targets are more down-regulated in miRNA over-expression assays than Cometa high-scoring targets. The cumulative distribution of log-fold change for high-scoring mRNAs; blue, red, and black represent graphs associated with BayMiR, gene variation, and Cometa.
We also searched for miRNAs with known functions among the miRNAs enriched in
our pathway analysis. A list of miRNAs with experimentally supported functions among
their enriched pathways is given in Table S3. Notably, the miR-17 family is frequently
seen in the list. This family has been extensively studied and shown to play an important
role in many cancer-related processes and pathways ([80, 81] and references in Fig. 4.17).
miRNAs | Pathways | PMID
miR-17/20ab/93/106ab/519d | Pathways in cancer | 16461460; 18485879
miR-17/20ab/93/106ab/519d | Pathways in cancer | 18328430; 20101220
miR-124/124ab/506 | Axon guidance | 18619591
miR-138/138ab | Pathways in cancer | 18201269; 20332227
miR-155 | T-cell-receptor signaling | 17463289; 19877012
miR-17/20ab/93/106ab/519d | Pathways in cancer | 18596939; 19135980
miR-15abc16abc/195/424/497 | p53 signaling pathway | 19626115
miR-17/20ab/93/106ab/519d | Pathways in cancer | 17608773; 19066217
miR-17/20ab/93/106ab/519d | MAPK signaling pathway | 18700987
miR-17/20ab/93/106ab/519d | p53 signaling pathway | 19696742
miR-200bc/429/548a | Pathways in cancer | 19671845; 18829540
miR-200bc/429/548a | Pathways in cancer | 17804704; 18376396
miR-1ab/206/613 | Pathways in cancer | 18593897; 19684618
miR-17/20ab/93/106ab/519d | Pathways in cancer | 19597473; 16461460
miR-25/32/92abc/363/367 | Phosphoinositide signali | 20388916
miR-302abcde/372/373/520 | Pathways in cancer | 17695719; 18193036
miR-29abcd | Pathways in cancer | 19247375; 19818597
miR-29abcd | Focal adhesion | 19956414
let-7/98/4458/4500 | bladder cancer | 21993544
miR-29abcd | Small cell lung cancer | 17890317
miR-302abcde/372/373/520 | Cell cycle | 18328430
miR-133abc | Apoptosis | 17715156
miR-17/20ab/93/106ab/519d | Cell cycle | 18700987
miRNA Family | GO-BP terms | PMID
miR-17/20ab/93/106ab/519d | G1-S transition of mitotic cell cycle | 19153141; 20404090
miR-17/20ab/93/106ab/519d | Cell-cycle phase | 18212054
miR-155 | TGFb receptor signaling pathway | 19701459
miR-155 | Immune system development | 17463290; 18291670
miR-17/20ab/93/106ab/519d | G1/S transition of mitotic cell cycle | 18836483; 18700987
miR-17/20ab/93/106ab/519d | Apoptosis | 19696742; 17384677
miR-181abcd/4262 | Regulation of apoptosis | 20145152
miR-17/20ab/93/106ab/519d | Apoptosis | 19696742; 17881434
miR-17/20ab/93/106ab/519d | Cell-cycle process | 17881434; 18836483
miR-17/20ab/93/106ab/519d | G1-S transition of mitotic cell cycle | 18836483
miR-221/222/222ab/1928 | Regulation of cell-matrix adhesion | 20110463
miR-221/222/222ab/1928 | Induction of apoptosis | 17616664; 19730150
miR-224 | Regulation of apoptosis | 18319255
miR-27abc/27a-3p | Muscle cell differentiation | 20388916
miR-29abcd | Extracellular matrix organisation | 18390668
miR-34abc/449ab | cell cycle arrest | 17554337
miR-302abcde/372/373/520 | regulation of cell cycle | 18328430
miR-124/124ab/506 | neuron differentiation | 17679093; 17344415
miR-125ab/4319 | apoptosis | 19293287
miR-125ab/4319 | neurogenesis | 16227573
miR-133abc | regulation of apoptosis | 17715156
miR-146ac/146b-5p | immune response | 16885212
miR-15abc16abc/195/424/497 | regulation of cell cycle | 17242205; 18701644
miR-17/20ab/93/106ab/519d | regulation of cell cycle | 18700987
miR-221/222/222ab/1928 | hemopoiesis | 16330772
Figure 4.6: Validated KEGG pathways. List of miRNAs with proposed functions found in our enriched KEGG list; the third column gives the PubMed IDs of the references.
The enrichment map of the 30 most frequently enriched GO-BP terms/pathways is depicted
in Fig. 4.7. When we examined the mRNAs in KEGG pathways targeted by miRNAs, we
found that although there is extensive co-regulation of mRNAs by multiple miRNAs, a
handful of miRNAs appeared to be responsible for most of the regulation. For example,
in the WNT signaling pathway, five miRNAs target 32 out of 46 genes predicted to be
targeted by any of the 45 miRNAs with targets in this pathway (Fig. 4.8). Similarly,
the 106 genes in “Pathways in cancer” are targeted by 83 miRNAs, but only 10 of these
miRNAs collectively target more than 75% of these genes (Fig S.3). Although some of this
consolidation of targeting can be explained by the large variability in the number of mRNA
targets per miRNA, there is significantly more consolidation than we would expect by
chance (Fig. 4.10, P < 10^-19).
These observations suggest that important miRNA regulators of specific biological
processes can be identified in silico through gene set enrichment analysis of BayMiR
target sets.
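The consolidation of targeting discussed above can be quantified by asking what share of all targeted mRNAs is covered by the miRNAs with the largest target sets. The sketch below uses invented target sets; the function name and the rounding of the top fraction are assumptions made for illustration.

```python
def coverage_by_top_mirnas(targets_by_mirna, top_fraction):
    """Fraction of all targeted mRNAs covered by the union of the target
    sets of the top `top_fraction` of miRNAs, ranked by target-set size."""
    ranked = sorted(targets_by_mirna.values(), key=len, reverse=True)
    n_top = max(1, round(top_fraction * len(ranked)))
    all_targets = set().union(*ranked)
    covered = set().union(*ranked[:n_top])
    return len(covered) / len(all_targets)


# Toy pathway (invented): one broad miRNA and three narrow ones.
targets_by_mirna = {
    "miR-a": {"g1", "g2", "g3", "g4", "g5", "g6"},
    "miR-b": {"g1", "g7"},
    "miR-c": {"g2"},
    "miR-d": {"g8"},
}
share = coverage_by_top_mirnas(targets_by_mirna, 0.25)
```

Evaluating the function over a grid of `top_fraction` values gives a cumulative curve of the kind described for Fig. 4.10.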
Figure 4.7: Enrichment map for the top 30 most frequent KEGG pathways; each node indicates a pathway; there is an edge between two pathways if they share more than ten miRNAs; the edge thickness is proportional to the number of shared miRNAs; the size of each node is proportional to the number of miRNAs enriched in the corresponding pathway. Note that we say a miRNA is enriched in a pathway when the predicted targets of the miRNA are over-represented in the pathway based on a statistical test.
4.2.3 miRNA activity and expression profiles are significantly
correlated
To test whether the miRNA activities obtained using the BayMiR procedure are correlated
with miRNA expression profiles, we downloaded miRNA expression data from the
mimiRNA repository [57] and computed the correlation between matched activity and
expression vectors. After excluding miRNA expression data that are not consistent across
multiple resources (according to P > 0.05 reported in the mimiRNA resource) and mapping
the biological samples of the miRNA expression data to our biological groups, we
obtained paired matches for 48 miRNAs. Interestingly, we found that 96% of the pairs
(46 out of 48) have Pearson correlation coefficients greater than 0.35, compared to 4%
obtained from a similar analysis with permuted activity vectors (P < 0.05 and
Table S.4). This correlation analysis shows that miRNA activities inferred from the mean
of the inverse expression of their targets are highly correlated with expression data for
those miRNAs.
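The comparison against permuted activity vectors can be sketched as below. The 0.35 cutoff comes from the text; the data, the function names, and the use of a single seeded shuffle per miRNA are assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fraction_above(activity, expression, cutoff=0.35, permute=False, seed=0):
    """Share of miRNAs whose activity/expression correlation exceeds `cutoff`.
    With permute=True, each activity vector is shuffled across samples first."""
    rng = random.Random(seed)
    hits = 0
    for mirna, act in activity.items():
        act = list(act)
        if permute:
            rng.shuffle(act)
        hits += pearson(act, expression[mirna]) > cutoff
    return hits / len(activity)


# Toy matched vectors over four biological groups (invented numbers).
activity = {"miR-a": [0.1, 0.5, 0.9, 1.3], "miR-b": [2.0, 1.0, 0.5, 0.2]}
expression = {"miR-a": [1.0, 2.0, 3.0, 4.0], "miR-b": [8.0, 5.0, 3.0, 1.0]}
observed = fraction_above(activity, expression)
null = fraction_above(activity, expression, permute=True)
```

In practice the permutation would be repeated many times to estimate a null distribution rather than a single value.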
Figure 4.8: WNT signaling pathway: 32 targets of 5 miRNAs are involved in the pathway (red boxes). 14 mRNAs targeted by the remaining miRNAs are colored in yellow, and 23 mRNAs involved in the pathway were excluded from the BayMiR target list because their expression variability across arrays was very low (white boxes). The miRNA family IDs: miR-518a-5p/520d-5p/524-5p, miR-556-3p, miR-4514/4692, miR-548aeajamx, miR-135ab/135a-5p.
4.2.4 mRNAs harboring miRNA target sites near the both ends
of the 3’ UTR have higher endogenous down-regulation
signals
To investigate any association between endogenous target repression scores provided by
BayMiR and sequence and gene variation determinants, we measured the correlation
between the scores of all paired determinants(Fig. 4.11). The heat map shows that
BayMiR scores correlate most highly with the position contribution scores. In addition,
when we ranked all mRNA-miRNA pairs based on their BayMiR scores, the top 50
percentile of the ranked list have higher position contribution scores than the bottom
50 percentile (P < 10−200, Wilcoxon-Mann-Whitney test and Fig. 4.12). The position
contribution scores provide estimate of expected repression in terms of the distance of
Chapter 4. BayMiR: a computational miRNA target prediction method66
Figure 4.9: KEGG “Pathways in cancer”: 68 targets of 10 miRNAs are involved in thepathway (red boxes). 38 genes targeted by the other miRNAs are colored in yellow; and62 genes involved in the pathway were excluded from the BayMiR target list since theirexpression variabilities across arrays were very low (white boxes). The miRNA family IDs:miR-17/17-5p/20ab/20b-5p/93/106ab/427/518a-3p/519d,miR-548ah/3609,miR-4729,miR-203,miR-548p,miR-3647-3p,miR-300/381/539-3p,miR-142-5p,miR-545,miR-125a-5p/125b-5p/351/670/4319’
targets sites from the both end of the 3’ UTR; target sites near to the ORF or the
poly(A) tail are more effective [7] and more conserved than those in the middle of the
3’ UTR [131]. To further investigate this, we located 1,567,294 conserved target sites
matched to the seed region of 1,032 miRNAs on the 3’ UTR of 17,840 mRNAs retrieved
from TargetScan 6.2. The start position of each target site was divided by the length
of the 3’ UTR to obtain the relative position of miRNAs on the 3’ UTRs, denoted by
0 < LmiRNA < 1. We found that target sites located on the both end of 3’ UTRs
(LmiRNA < 0.25 or LmiRNA > 0.75) are assigned higher BayMiR scores than those on the
middle (P < 10−200, Wilcoxon-Mann-Whitney test). Furthermore, we found that target
sites located in the terminus close to the poly(A) tail (LmiRNA > 0.75) are assigned higher
BayMiR scores than to those located on the other terminus (LmiRNA < 0.25, P < 10−5,
Wilcoxon-Mann-Whitney test). Poly(A) shortening is known as one of the mechanisms
Chapter 4. BayMiR: a computational miRNA target prediction method67
0 10 20 30 40 50 60 70 80 90 1000
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
on average 10% of most targeting miRNAsin each pathway target 88 % of all targets
top N % of miRNAs sorted by target set size
prop
ortio
n of
mR
NA
s ta
rget
ed
KEGG pathway targetsall BayMiR targets
Figure 4.10: miRNA targeting. A small percent of all targeting miRNAs collectively target alarge portion of miRNA targets. The figure shows two cumulative distributions. (red) Propor-tion of set of all mRNAs with BayMiR targets that are covered by union of the target sets ofthe top N% miRNAs (sorted by number of targets) where N increases along the x-axis. (blue)Average of cumulative distributions for all enriched KEGG pathways (n = 108), where eachdistribution was created as per the red line but restricted to targeted mRNA associated withthe KEGG pathways, and their targeting miRNAs.
of mRNA degradation; this mechanism strongly favors miRNA target sites near the end
of the 3’ UTR, close to the poly(A) tail, for recruiting mRNA deadenylase
complexes [132]. Together, these lines of evidence underline the importance of target site
position in miRNA targeting.
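The position analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: the function names and the toy (start, length, score) triples are hypothetical, and only the Mann-Whitney U statistic is computed (obtaining a p-value would require a statistics library).

```python
def relative_position(site_start, utr_length):
    """Start position of a target site divided by the 3' UTR length."""
    return site_start / utr_length


def is_near_end(l_mirna, margin=0.25):
    """True when the site lies in the first or last quarter of the 3' UTR."""
    return l_mirna < margin or l_mirna > 1.0 - margin


def mann_whitney_u(xs, ys):
    """U statistic for xs versus ys; each tie contributes one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical (site start, 3' UTR length, BayMiR score) triples.
sites = [(50, 1000, 0.9), (900, 1000, 0.8), (500, 1000, 0.2),
         (450, 1000, 0.3), (120, 800, 0.7), (400, 800, 0.1)]

end_scores, middle_scores = [], []
for start, length, score in sites:
    near_end = is_near_end(relative_position(start, length))
    (end_scores if near_end else middle_scores).append(score)

# U equals len(end_scores) * len(middle_scores) when every end-site score
# exceeds every middle-site score.
u_stat = mann_whitney_u(end_scores, middle_scores)
```

In this toy input every site near an end outscores every site in the middle, so U reaches its maximum; real data would give an intermediate value to be converted into a p-value.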
BayMiR scores are also significantly correlated with gene variation scores (Pearson
correlation 0.18; Fig. 4.11), suggesting that mRNAs with high expression variability are
under selective pressure to be miRNA targets.
                              (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
(1) context+ score           1.00   0.11   0.40   0.58   0.05  -0.29   0.55   0.12  -0.02
(2) 3' UTR                   0.11   1.00  -0.05   0.00   0.00  -0.04   0.00   0.00   0.00
(3) local AU                 0.40  -0.05   1.00  -0.04  -0.03   0.24   0.02  -0.07   0.02
(4) position contribution    0.58   0.00  -0.04   1.00  -0.01   0.00  -0.11   0.18  -0.02
(5) target abundance         0.05   0.00  -0.03  -0.01   1.00  -0.20  -0.26   0.04  -0.03
(6) seed pairing stability  -0.29  -0.04   0.24   0.00  -0.20   1.00   0.07  -0.02   0.02
(7) site type                0.55   0.00   0.02  -0.11  -0.26   0.07   1.00   0.03  -0.01
(8) BayMiR score             0.12   0.00  -0.07   0.18   0.04  -0.02   0.03   1.00   0.18
(9) gene variation          -0.02   0.00   0.02  -0.02  -0.03   0.02  -0.01   0.18   1.00
Figure 4.11: The heat map shows the Pearson correlation coefficients between each pair of nine determinants. The correlation coefficients for pairs labeled by 0 are not significant (i.e., P > 0.05).
4.3 Materials and Methods
4.3.1 BayMiR model
BayMiR applies the following linear model to relate the changes in the log-transformed
expression level of mRNAs to the activity level of miRNAs:
\[ \underset{M \times 1}{\mathbf{y}_i} = \underset{M \times K}{\mathbf{W}}\; \underset{K \times 1}{\mathbf{h}_i} + \underset{M \times 1}{\boldsymbol{\varepsilon}} \]
where $\mathbf{y}_i \in \mathbb{R}^{M}$ denotes the change in the expression level of the ith mRNA measured
across M samples and is obtained by subtracting the mean; $\mathbf{W} = [w_{m,k}]_{M \times K}$ denotes the
activity levels of K miRNAs across M samples; each element of $\mathbf{h}_i \in \mathbb{R}_{+}^{K}$ represents
the contribution of the corresponding miRNA in down-regulating the expression of the
ith mRNA; and $\boldsymbol{\varepsilon}$ models the error. In our problem, K = 1,252, M = 369, and i = 1, ..., 13,000.
In this linear equation, $\mathbf{y}_i$ and $\mathbf{W}$ are observed; $\mathbf{h}_i$ is the desired unknown variable.
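BayMiR infers h by maximizing the posterior probability of h given y and W:
\[ \hat{\mathbf{h}} = \arg\max_{\mathbf{h}} \log p(\mathbf{h} \mid \mathbf{y}, \mathbf{W}). \]
Using Bayes’s rule,
\[ \hat{\mathbf{h}} = \arg\max_{\mathbf{h}} \log p(\mathbf{h} \mid \mathbf{y}, \mathbf{W}) = \arg\max_{\mathbf{h}} \frac{p(\mathbf{y}, \mathbf{h}, \mathbf{W})}{\int_{\mathbf{h}} p(\mathbf{y}, \mathbf{h}, \mathbf{W})\, d\mathbf{h}} = \arg\max_{\mathbf{h}} \frac{p(\mathbf{y} \mid \mathbf{h}, \mathbf{W})\, p(\mathbf{h})}{\int p(\mathbf{y} \mid \mathbf{h}, \mathbf{W})\, p(\mathbf{h})\, d\mathbf{h}}. \]
Since the denominator is not a function of h,
\[ \hat{\mathbf{h}} = \arg\max_{\mathbf{h}}\; p(\mathbf{y} \mid \mathbf{h}, \mathbf{W})\, p(\mathbf{h}), \]
where
\[ p(\mathbf{y} \mid \mathbf{h}, \mathbf{W}) = \frac{1}{(2\pi)^{K/2} \sigma_n^{K}} \exp\left( -\frac{1}{2}\, \frac{\sum_{m} (y_m - \mathbf{w}_{m,:}\mathbf{h})^2}{\sigma_n^2} \right). \]
We assume that the prior probability p(h) is a compromise between Gaussian and Laplace
distributions given by
\[ p_{\alpha_1, \alpha_2}(\mathbf{h}) = C(\alpha_1, \alpha_2) \exp\left( -\alpha_1 \|\mathbf{h}\|_1 - \alpha_2 \|\mathbf{h}\|_2^2 \right), \]
where $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the norm one and norm two, respectively. Since h appears only in
the argument of exponential functions in the above probabilities and since the exponential
function is monotonic, maximizing the posterior probability is equivalent to minimizing
the negated argument of the exponential function; hence
\[ \hat{\mathbf{h}} = \arg\min_{\mathbf{h}}\; \frac{1}{2\sigma_n^2} \sum_{m} (y_m - \mathbf{w}_{m,:}\mathbf{h})^2 + \alpha_1 \|\mathbf{h}\|_1 + \alpha_2 \|\mathbf{h}\|_2^2. \tag{4.1} \]
Multiplying this expression by $\sigma_n^2$ and letting $\lambda_1 = \sigma_n^2 \alpha_1$ and $\lambda_2 = \sigma_n^2 \alpha_2$, this Bayesian
inference problem can be written in the form of a penalized linear regression optimization
given by
\[ \hat{\mathbf{h}} = \arg\min_{\mathbf{h}}\; \frac{1}{2} \sum_{m} (y_m - \mathbf{w}_{m,:}\mathbf{h})^2 + \lambda_1 \sum_{k} |h_k| + \lambda_2 \sum_{k} h_k^2, \tag{4.2} \]
where $\lambda_1$ and $\lambda_2$ are two tuning parameters and $\mathbf{w}_{m,:}$ is a row vector representing the
activity of the miRNAs in the mth sample.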
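We solve this optimization using the coordinate-descent method [128], in which the
objective function is partially optimized with respect to each individual coefficient in an
iterative manner, described as follows.
The objective in (4.2) can be rewritten as
\[ f(\mathbf{h}) = \frac{1}{2} \sum_{m=1}^{M} \Big( y_m - \sum_{k \neq j} w_{m,k} h_k - w_{m,j} h_j \Big)^2 + \lambda_1 \sum_{k \neq j} |h_k| + \lambda_1 |h_j| + \lambda_2 \sum_{k \neq j} h_k^2 + \lambda_2 h_j^2. \tag{4.3} \]
We minimize f(h) with respect to $h_j$ by setting the derivative of f(h) to zero. If $h_j > 0$,
\[ \frac{\partial f(\mathbf{h})}{\partial h_j} = -\sum_{m=1}^{M} w_{m,j} \Big( y_m - \sum_{k \neq j} w_{m,k} h_k \Big) + \sum_{m=1}^{M} w_{m,j}^2 h_j + \lambda_1 + 2\lambda_2 h_j. \tag{4.4} \]
Therefore
\[ h_j = \begin{cases} \dfrac{\sum_{m=1}^{M} \big( y_m - \sum_{k \neq j} w_{m,k} h_k \big) w_{m,j} - \lambda_1}{\sum_{m=1}^{M} w_{m,j}^2 + 2\lambda_2}, & \sum_{m=1}^{M} \big( y_m - \sum_{k \neq j} w_{m,k} h_k \big) w_{m,j} > \lambda_1, \\ 0, & \text{otherwise.} \end{cases} \tag{4.5} \]
Likewise, if $h_j < 0$,
\[ h_j = \begin{cases} \dfrac{\sum_{m=1}^{M} \big( y_m - \sum_{k \neq j} w_{m,k} h_k \big) w_{m,j} + \lambda_1}{\sum_{m=1}^{M} w_{m,j}^2 + 2\lambda_2}, & \sum_{m=1}^{M} \big( y_m - \sum_{k \neq j} w_{m,k} h_k \big) w_{m,j} < -\lambda_1, \\ 0, & \text{otherwise.} \end{cases} \tag{4.6} \]
In a compact form, the above expressions can be rewritten as
\[ h_j = \frac{S\Big( \sum_{m=1}^{M} \big( y_m - \sum_{k \neq j} w_{m,k} h_k \big) w_{m,j},\; \lambda_1 \Big)}{\sum_{m=1}^{M} w_{m,j}^2 + 2\lambda_2}, \tag{4.7} \]
where S(x, t) is the soft-threshold operator defined as $\mathrm{sign}(x)(|x| - t)_+$, where $(y)_+ = 0$
if y < 0 and $(y)_+ = y$ if y ≥ 0 [133].
The optimization is based on pathwise coordinate descent, where we solve a sequence
of scalar minimization subproblems given in the following routine:
Algorithm (Pathwise coordinate analysis)
    itr ← 0, c ← 0
    while c < K and itr < maxitr
        c ← 0
        for j = 1 : K
            $h_j \leftarrow S\big( \sum_{m=1}^{M} ( y_m - \sum_{k \neq j} w_{m,k} h_k ) w_{m,j},\; \lambda_1 \big) \big/ \big( \sum_{m=1}^{M} w_{m,j}^2 + 2\lambda_2 \big)$
            if $|(h_j^{old} - h_j) / h_j^{old}| < \epsilon$ then c ← c + 1
            $h_j^{old} \leftarrow h_j$
        end
        itr ← itr + 1
    end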
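The routine above can be sketched in plain Python as follows. This is an illustrative sketch rather than the BayMiR implementation: the function names are hypothetical, and the stopping rule is swapped for a simpler one (largest absolute coefficient change in a pass below a tolerance) instead of counting converged coefficients. The code minimizes ½ Σ_m (y_m − w_{m,:} h)² + λ1 Σ_k |h_k| + λ2 Σ_k h_k², for which the exact coordinate-wise minimizer has 2λ2 in the denominator.

```python
def soft_threshold(x, t):
    """S(x, t) = sign(x) * max(|x| - t, 0)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def coordinate_descent(y, W, lam1, lam2, max_iter=1000, tol=1e-8):
    """Minimize 0.5*sum_m (y_m - W[m].h)^2 + lam1*sum|h_k| + lam2*sum h_k^2.

    y is a list of M values; W is a list of M rows with K entries each.
    """
    M, K = len(W), len(W[0])
    h = [0.0] * K
    residual = list(y)  # equals y - W h, since h starts at zero
    col_sq = [sum(W[m][j] ** 2 for m in range(M)) for j in range(K)]
    for _ in range(max_iter):
        largest_change = 0.0
        for j in range(K):
            # 2*lam2 is the derivative of lam2*h_j^2 with respect to h_j.
            denom = col_sq[j] + 2.0 * lam2
            if denom == 0.0:
                continue  # all-zero column and no ridge penalty: keep h_j = 0
            # Inner product of column j with the residual that excludes h_j.
            rho = sum(W[m][j] * (residual[m] + W[m][j] * h[j]) for m in range(M))
            new_hj = soft_threshold(rho, lam1) / denom
            delta = new_hj - h[j]
            if delta != 0.0:
                for m in range(M):
                    residual[m] -= W[m][j] * delta
                h[j] = new_hj
            largest_change = max(largest_change, abs(delta))
        if largest_change < tol:
            break
    return h


# Toy usage with an identity activity matrix, where each coefficient
# has a closed form: S(y_j, lam1) / (1 + 2*lam2).
h_toy = coordinate_descent([3.0, 0.5], [[1.0, 0.0], [0.0, 1.0]], lam1=1.0, lam2=0.5)
```

Keeping the running residual up to date makes each coordinate update cost O(M) instead of O(MK), which is the usual reason coordinate descent is fast on problems of this shape.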
Since miRNA and target mRNA expression data are anti-correlated [73], for each
miRNA, BayMiR uses the negative mean of the target expression levels as an estimate of the
activity level of the miRNA:
\[ \mathbf{w}_k = -\frac{1}{N_k} \sum_{i=1}^{N_k} \mathbf{y}_i, \tag{4.8} \]
where $N_k$ is the number of target genes of the kth miRNA; each activity vector is then
normalized, $\mathbf{w}_k \leftarrow \mathbf{w}_k / \|\mathbf{w}_k\|$. As such, the activity of the
miRNA will be deemed to be positive when its sequence-predicted targets are below
their mean expression level. BayMiR considers a gene as a potential target of a miRNA
if it contains a conserved match site complementary to the seed region of the miRNA.
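As a minimal sketch of the activity estimate in (4.8), assuming the expression profiles are already mean-centred and log-transformed; the function name and the dictionary layout are illustrative choices, not taken from the BayMiR package:

```python
import math


def estimate_activity(expr, targets):
    """Negative mean profile of a miRNA's sequence-predicted targets, unit-normalized.

    expr: dict mapping gene symbol -> list of M mean-centred log-expression values.
    targets: iterable of gene symbols predicted as targets of one miRNA.
    """
    present = [g for g in targets if g in expr]
    if not present:
        raise ValueError("no target gene has expression data")
    n_samples = len(expr[present[0]])
    w = [-sum(expr[g][m] for g in present) / len(present) for m in range(n_samples)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else w


# Two hypothetical targets that are above their mean in sample 1 and below it in sample 2.
toy_expr = {"GENE_A": [1.0, -1.0], "GENE_B": [3.0, -3.0]}
w_toy = estimate_activity(toy_expr, ["GENE_A", "GENE_B", "GENE_MISSING"])
```

The estimated activity is positive in the second sample, where the targets sit below their mean expression level, matching the sign convention described above.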
We tested whether BayMiR suffers from over-fitting. We divided the biological samples
into training (340 samples) and test (28 samples) sets and predicted the scores using only
the training data. We then used the predicted scores to estimate the gene expression
profiles of the test set and compared them with the original test data. Fig. 4.13 illustrates the
training and test errors for different values of the penalties.
The difference in prediction error between training and test data is about 0.2, indicating that
BayMiR can predict new profiles with reasonable accuracy. To see how well the
predicted profiles approximate the actual profiles, we plotted the actual down-regulated
profiles along with the predicted profiles for 9 randomly selected genes (Fig. 4.14). We
note that BayMiR considers only down-regulated genes as potential targets for miRNAs.
These results show that BayMiR does not suffer from over-fitting and can predict targets
that are down-regulated in samples not included in the training data.
4.3.2 Processing mRNA expression Data
The mRNA expression data were downloaded from the ArrayExpress Atlas repository at
EMBL-EBI [134], available at www.ebi.ac.uk/gxa/experiment/E-MTAB-62. The
data consist of 5,372 samples profiled on HG-U133A array platforms. As described in
[134], the data were normalized and manually labeled into 369 biological groups covering
a wide range of healthy/cancer tissues, conditions, and cell lines. We processed the
retrieved expression data as follows. All probe sets with no gene symbols were
excluded. The samples belonging to each biological group were averaged; the samples
within one biological group are highly correlated (ρ > 0.85). Lower and upper thresholds,
defined by lth = Q2 − 1.5(Q4 − Q2) and uth = Q4 + 1.5(Q4 − Q2), respectively, where Q2
and Q4 represent the second and fourth quartiles, were specified to detect and modify the
extreme outliers. The outliers were then replaced with lth or uth. The gene symbol lists
in both the expression and sequence datasets were updated based on the latest release of
the HUGO Gene Nomenclature Committee (HGNC) (Feb. 2012) to have consistent gene
symbols.
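The group averaging and outlier replacement steps can be sketched as below. The function names are hypothetical, and because the exact way the two quartiles are computed is not spelled out here, they are passed in as arguments rather than derived inside the function.

```python
def average_groups(samples, labels):
    """Average expression vectors (one list of gene values per sample) within each group."""
    sums, counts = {}, {}
    for vec, lab in zip(samples, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def clip_outliers(values, q_low, q_high):
    """Replace values outside [lth, uth] with the nearest threshold.

    lth = q_low - 1.5 * (q_high - q_low), uth = q_high + 1.5 * (q_high - q_low),
    where q_low and q_high are the two quartile values supplied by the caller.
    """
    spread = q_high - q_low
    lth = q_low - 1.5 * spread
    uth = q_high + 1.5 * spread
    return [min(max(v, lth), uth) for v in values]


# Toy usage: three samples in two hypothetical biological groups.
group_means = average_groups([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]], ["a", "a", "b"])
clipped = clip_outliers([-10.0, 0.0, 5.0, 100.0], q_low=2.0, q_high=6.0)
```

Replacing extreme values with the threshold, rather than dropping them, keeps every gene-by-group entry defined, which the regression in Section 4.3.1 needs.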
4.3.3 MiRNA-mRNA interaction analysis
We downloaded the list of 19,055 protein-coding gene symbols from the HGNC database
and the list of 1,537 miRNA IDs from miRBase v.19. We then built seven 19,055 ×
1,532 binary connectivity matrices based on the mRNA-miRNA interactions given by
TargetScan v6.1, [6] and TarBase [129]. All miRNAs are grouped into 1,251 miRNA
families as defined by TargetScan, i.e., miRNAs sharing the same seed region. Conserved
target sites are also retrieved from the TargetScan repository.
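A small sketch of the two data structures described above: grouping miRNAs into families by shared seed, and filling a binary gene-by-miRNA connectivity matrix from a list of interactions. The function names, miRNA labels, seed strings, and gene symbols in the usage lines are toy placeholders, not values from TargetScan or TarBase.

```python
def group_by_seed(mirna_seeds):
    """Map each seed sequence to the sorted list of miRNA IDs that share it."""
    families = {}
    for mirna, seed in mirna_seeds.items():
        families.setdefault(seed, []).append(mirna)
    return {seed: sorted(ids) for seed, ids in families.items()}


def connectivity_matrix(genes, mirnas, interactions):
    """Binary genes x miRNAs matrix; an entry is 1 when (gene, miRNA) is listed."""
    row = {g: i for i, g in enumerate(genes)}
    col = {m: j for j, m in enumerate(mirnas)}
    matrix = [[0] * len(mirnas) for _ in genes]
    for gene, mirna in interactions:
        if gene in row and mirna in col:  # silently skip unknown identifiers
            matrix[row[gene]][col[mirna]] = 1
    return matrix


# Toy usage with placeholder identifiers.
toy_families = group_by_seed({"miR-x1": "AAAGUGC", "miR-x2": "AAAGUGC", "miR-y": "UGAAAUG"})
toy_matrix = connectivity_matrix(
    ["GENE_A", "GENE_B"],
    ["miR-x1", "miR-y"],
    [("GENE_A", "miR-x1"), ("GENE_B", "miR-y"), ("GENE_C", "miR-y")],
)
```

One such matrix would be built per interaction source, with rows and columns kept in a fixed order so the matrices can be compared entry by entry.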
4.3.4 Enrichment analysis
Gene Ontology biological process (GO-BP) annotations were downloaded from the Gene
Ontology website on April 15th, 2012. The file contains 14,000 annotations for 15,000
genes. The enrichment analysis was performed using Fisher’s exact test on the
BayMiR-predicted targets of each miRNA family. The enrichment
p-values were corrected using the Benjamini-Hochberg procedure [135], and an FDR cutoff of
0.1 was chosen to select significantly enriched categories. The KEGG enrichment
analysis was carried out in a similar manner: the list of 253 human KEGG pathways with
their associated genes was downloaded from http://www.genome.jp/kegg/, and Fisher’s exact test
was used to find enriched pathways for the BayMiR targets of all miRNA families.
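The enrichment test and the multiple-testing correction can be sketched with the standard library alone. The sketch assumes a one-sided test for over-representation (whether the thesis used a one-sided or two-sided Fisher test is not stated), and the function names are illustrative.

```python
from math import comb


def enrichment_p(n_total, n_annotated, n_selected, n_overlap):
    """One-sided Fisher's exact test: P(overlap >= n_overlap) under the hypergeometric null.

    n_total: genes in the background; n_annotated: genes in the category;
    n_selected: predicted targets; n_overlap: targets that are in the category.
    """
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_selected)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy usage: keep categories whose adjusted p-value is below the FDR cutoff of 0.1.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.1]
```

The backward pass with a running minimum enforces that adjusted p-values never decrease as the raw p-values increase, which is what the step-up procedure requires.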
4.3.5 Availability of BayMiR and supporting data
The code for BayMiR is available at morrislab.med.utoronto.ca/BayMiR. The package
includes scripts and instructions to re-generate BayMiR scores from the “E-MTAB-62” file
and sequence information; a pre-computed version of the BayMiR scores is
also uploaded.
4.4 Discussion
Large-scale mRNA expression profiling datasets provide a rich resource to study the
regulatory impact of miRNAs. Here, we showed that the impact of miRNAs on targets is
detectable in normal tissue and unperturbed cell line data. Given a list of miRNAs with
partial complementarity to a particular mRNA, our computational technique, BayMiR,
scores the relative regulatory impact of each miRNA among the other predicted targeting
miRNAs. We showed that BayMiR estimates of miRNA regulatory impact better reflect
independent measures of this impact than the TargetScan context scores; furthermore,
we showed that the context scores and BayMiR can be combined to generate even better
estimates.
BayMiR has several features that make it particularly useful for estimating the potential
regulatory impact of a miRNA. BayMiR models the combinatorial effect of multiple
regulatory miRNAs on a single target, which is critical, as most mRNAs are likely to
be targeted by multiple miRNAs (Fig. 4.15). BayMiR is fast; its runtime is less than a
minute in the current version (10,345 mRNAs, 1,123 miRNAs and 359 biological groups),
so it is easily applied to a subset of, or all, available gene expression data. Because BayMiR
estimates the activity of miRNAs based on mRNA expression data, there is no need
for matching miRNA expression profiles. As such, BayMiR predictions can be easily
extended when new miRNAs are found, and the current version of BayMiR incorporates
all miRNAs retrieved from the latest release of miRBase (v.19).
Combinatorial regulation by multiple miRNAs has been described for particular mRNAs
[7] and is likely to play a large role in mRNA expression regulation [51]. Indeed,
human 3’ UTRs contain conserved seed matches for 33 miRNAs on average (median =
16) (Fig. 4.15). This combinatorial regulation may explain the observations that inverse
correlation between miRNA and mRNA expression under endogenous conditions does
not provide strong and consistent evidence of targeting [57, 110] and that the impact of
miRNA regulation on mRNA levels can only be seen within the context of other miRNA
regulations [51, 110]. Fig. 4.16 shows a toy example where combinatorial regulation
masks inverse correlation between miRNA regulators and their targets.
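The masking effect can be reproduced with a small numerical toy. The numbers below are made up for illustration and are not those behind Fig. 4.16: three miRNAs are switched on in different combinations of eight samples, and the target level is the negative of their summed activity.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical on/off activity of three miRNAs across eight samples.
mir1 = [0, 0, 0, 0, 1, 1, 1, 1]
mir2 = [0, 0, 1, 1, 0, 0, 1, 1]
mir3 = [0, 1, 0, 1, 0, 1, 0, 1]

combined = [a + b + c for a, b, c in zip(mir1, mir2, mir3)]
target = [-v for v in combined]  # each active miRNA lowers the target by one unit

individual = [pearson(m, target) for m in (mir1, mir2, mir3)]
joint = pearson(combined, target)
```

Each miRNA taken alone shows only a moderate negative correlation with the target, while the combined activity explains the target profile exactly, which is why a model that considers all targeting miRNAs jointly can recover relationships that pairwise correlation misses.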
There are a large number of other methods [54–56, 110, 136–144] that either infer
miRNA activity or predict miRNA targets based on the expression levels of their
sequence-predicted targets; however, no method both infers miRNA activity and predicts
miRNA targets while considering the impact of other miRNAs. For example, Cometa
attempts to predict miRNA targets by identifying tight, co-expressed clusters of
sequence-predicted targets [55]; however, it does not account for combinatorial regulation by multiple
miRNAs and provides no estimate of miRNA activity. Other methods, such as Sylamer
[54] and a number of web-based applications [138–140], identify miRNA seed regions
that are significantly enriched in the 3’ UTRs of down-regulated transcripts as a way of
assessing the miRNA activity level in a tissue. Sylamer does not, however, take into
account the multiple targeting effect of miRNAs and has not been used to score individual
miRNA-mRNA pairs. Other methods use paired miRNA-mRNA expression patterns
to augment sequence-based target prediction [40–53]. These methods typically require
paired miRNA and mRNA measurements in a large number of samples to generate
reliable predictions. This type of paired expression data is, however, rare and unavailable for
some miRNAs [145]. On the other hand, there is a very large amount of mRNA
expression data available for BayMiR. Two intronic miRNA target prediction methods, InMiR
and Hoctar [56, 110], predict intronic miRNA targets using the expression levels of
their host genes, and can thus also incorporate large mRNA expression data.
However, these methods can only be applied to intronic miRNAs and only to those
miRNAs whose host gene expression is a good surrogate for their activity. Many host gene
expression levels are not good surrogates [110–113].
Our analysis also reveals that mRNAs with more target sites have higher expression
variation than a random subset of genes, and that expression variance consistently
increases as the number of target sites increases (P < 10^−33, Fig. 4.17). These observations suggest
that mRNAs with highly variable expression levels are much more likely to be regulated
by miRNAs; our finding is consistent with recent reports that genes regulated by miRNAs
have higher expression variability among humans and between human and other primate
species [146].
miRNA transfection experiments have suggested that the degree of mRNA repression
induced by two seeds is equivalent to the product of repression induced by the seeds
individually [7]. We have observed a similar effect. The version of BayMiR described
here implicitly assumes multiplicative interactions because it log-transforms the mRNA
expression levels before performing regression. Applying BayMiR to non-transformed
expression levels assumes additive interactions and this version of BayMiR performs
much worse in our benchmarks (data not shown).
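The link between multiplicative repression and the log transform can be written out in one line. The notation is assumed for illustration only: $x_0$ is the unrepressed level of an mRNA and $r_k \in (0, 1]$ is the fold change caused by the kth targeting miRNA acting alone.

```latex
\[
x = x_0 \prod_{k} r_k
\quad\Longrightarrow\quad
\log x = \log x_0 + \sum_{k} \log r_k ,
\qquad \log r_k \le 0 .
\]
```

After the log transform each miRNA contributes an additive, non-positive term, which is the form a linear regression on log-expression can capture; on the untransformed scale the same effects would have to be products.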
In this chapter, we introduced BayMiR and demonstrated its merits when compared
to two state-of-the-art computational miRNA target prediction methods. BayMiR applies a
more relevant biological model and uses a large collection of gene expression data to decipher
the impact of miRNAs on gene expression. We measured this impact in terms of
endogenous target repression scores for about half a million miRNA-mRNA duplexes.
This new scoring strategy can be used alone or along with other sequence determinants
to predict functional miRNA-mRNA interactions.
Figure 4.12: Cumulative distributions of position contribution scores (x-axis: position contribution scores; y-axis: cumulative fraction; P < 10^−100). Blue: the position contribution scores of miRNA-mRNA pairs whose BayMiR scores > median BayMiR score. Red: the position contribution scores of miRNA-mRNA pairs whose BayMiR scores < median BayMiR score.
Figure 4.13: BayMiR predicts down-regulated genes in samples not included in training data (x-axis: penalty λ, from 1 down to 0; y-axis: RMS error). Blue circled line: prediction error on training data (training biological groups); red circled line: prediction error on test data (test biological groups).
Figure 4.14: Estimated (red) and actual (blue) expression profiles of nine genes across 28 test samples (each panel: x-axis, samples; y-axis, intensity).
Figure 4.15: The 3’ UTRs of mRNAs harbor many conserved seed matches. Shown is the cumulative distribution of the number of seed matches in the 3’ UTRs of 14,816 mRNA transcripts with at least one miRNA seed match (x-axis: number of distinct seed matches, log-scaled; y-axis: cumulative frequency; mean = 33, median = 16).
Figure 4.16: Example of combinatorial regulation masking inverse correlation (x-axis: sample; y-axis: expression level). Shown in green is the expression level of a target gene and in red the expression levels of three targeting miRNAs. The negative correlation of each individual miRNA with the target is insignificant (Corr = −0.25 for each of miRNA 1, miRNA 2, and miRNA 3), but when considered together (miRNA1+miRNA2+miRNA3, Corr = −1) they explain perfectly the down-regulation impact of the miRNAs.
Figure 4.17: Gene expression variability increases as the number of target sites increases in the 3’ UTR of genes (both panels: x-axis, number of targeting miRNAs; y-axis, CDF). (top) miRNA targets have high expression variation: gene set with high variation versus a same-size random gene set (P < 10^−33). (bottom) Red and blue demonstrate the cumulative distributions of genes whose variance is larger than the median (P < 10^−11) and the 75th percentile (P < 10^−17), respectively. Dark: cumulative distribution of variances corresponding to all genes.
Chapter 5
Impact of miRNAs on long
non-coding RNAs
5.1 Background
Long non-coding RNAs (lncRNAs) are RNA transcripts, 200 to 100,000 nucleotides long,
that do not encode protein, most likely due to the lack of open reading frames. GENCODE
v14 has annotated 21,271 lncRNA transcripts (about 9,000 genes) located in genic
(antisense or intronic) and intergenic regions [147]. lncRNAs are expressed in the cytoplasm
and nucleus, can be spliced or unspliced, polyadenylated or not, and host many short
RNAs, notably snoRNAs and miRNAs [148]. In general, lncRNA abundance is low
compared to that of mRNAs, but surprisingly high in some tissues, suggesting that lncRNAs may interact
with cell-specific protein complexes to regulate cell-specific gene expression [59]. Indeed,
recent studies have confirmed that lncRNAs participate in mRNA post-transcriptional
regulation by various mechanisms [58, 149], such as protecting or decaying mRNAs [150],
enhancing or inhibiting mRNA translation [61, 62], and perturbing miRNA activities in
the cell [60]. The latter is the subject of this study.
miRNAs are short non-coding RNAs that partially base pair to many regions in the
genome containing functional elements [101]. Many studies have shown that miRNAs
recruit the ribonucleoprotein complex called RISC to repress the expression of mRNAs
when the 5’ end of the miRNA has nearly perfect base pairing to the 3’ UTR region of the mRNA
[1, 3, 6]. During the past decade, miRNA studies have mainly focused on their impact
on mRNA regulation, whereas the interaction between miRNAs and lncRNAs is relatively
unknown.
New studies have suggested that the excessive abundance of lncRNAs may act as a decoy
for miRNAs [150], establishing different hypotheses about the functional role of lncRNAs,
including: (i) lncRNAs may inhibit miRNA function by sequestering them; (ii) lncRNAs
may increase the expression levels of some mRNAs by acting as a sponge for miRNAs
targeting these mRNAs; (iii) miRNAs repress lncRNAs by recruiting RISC in
a manner similar to how they target mRNAs. Recent genome-wide annotation and profiling of
long non-coding RNAs has created an unprecedented opportunity to explore the role of
lncRNAs in post-transcriptional gene regulation. In this study, we analyzed the expression
abundance of 7,535 RNA transcripts, including 2,132 antisense lncRNAs (residing on the
opposite strand of protein-coding genes), 2,986 lincRNAs (residing in intergenic regions),
241 sense-intronic lncRNAs, and 2,176 mRNAs, across 26 tissues and 5 cell lines. We
investigated whether miRNAs can have any impact on lncRNA abundance and whether
lncRNAs can sponge miRNAs to promote mRNA regulation. This study is the first
that explores RNA expression data across a large number of tissues to identify possible
interactions between lncRNAs and miRNAs. Juan et al. detected some lncRNAs that are
significantly anti-correlated with miRNAs that have seed match sites in these lncRNAs in
normal and tumor breast samples [151]. Guttman et al. employed lentiviral shRNAs to
silence 147 lncRNAs at an average efficacy of 75%, demonstrating that lncRNAs in
general are susceptible to regulation by Argonaute-small RNA complexes despite frequent
nuclear localization.
Our work revealed several important biological insights about interactions between
lncRNAs, mRNAs, and miRNAs. We found that the lncRNA target sets of some
miRNAs have relatively low abundance in the tissues in which these miRNAs are highly active,
suggesting that miRNAs may modulate the expression of these lncRNAs in some specific
tissues, similar to cell-type-specific miRNA-induced mRNA repression [34]. We also found
that lncRNAs and mRNAs that share many targeting miRNAs are significantly positively
correlated, indicating that this set of highly expressed lncRNAs may decoy the miRNAs
to promote mRNA regulation. Our analysis also showed that the lncRNAs that are highly
expressed in the cytoplasm are under selective pressure to have fewer target sites
compared to those highly expressed in the nucleus, suggesting that miRNAs may regulate
only cytoplasm-specific lncRNAs.
5.2 Results
5.2.1 lncRNA targets of some miRNAs have relatively low expression in the tissues in which the miRNAs are highly active
We tested whether the target set of a conserved miRNA family is repressed in a tissue
in which at least one member of the miRNA family is highly expressed compared to other
tissues. We extracted the list of lncRNA targets of 87 conserved miRNA families from
miRcode [152] (see Materials). For each miRNA family, we ranked the mean of the
expression levels of its targets across all tissues. We also ranked the expression levels of
all lncRNA targets across the tissues. We then computed the element-wise ratio of these
two ranked vectors (i.e., the rank of the target set divided by the rank of all targets) and
sorted the tissues in ascending order based on their ratio scores; thus, for a given miRNA,
the tissue with the smallest score is the tissue in which the target set of the miRNA has
the lowest expression level relative to the other tissues. Interestingly, among the
41 miRNA families considered in this test (number of targets > 10), we found that the lncRNA
targets of 13 miRNA families have the lowest expression ranks in a tissue in which a
member of these miRNA families is highly expressed, suggesting that these miRNAs may
have repressed the expression of their targets in that tissue (Figure 5.1a-h).
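One possible reading of the rank-ratio score is sketched below. The function names, the tie handling, and the toy values are assumptions made for illustration; the thesis does not specify them.

```python
def ranks(values):
    """Rank 1 for the smallest value; ties are broken by position (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def relative_rank_scores(target_means, all_means):
    """Per-tissue ratio: rank of the miRNA's target set divided by rank of all lncRNA targets."""
    return [a / b for a, b in zip(ranks(target_means), ranks(all_means))]


def tissues_by_score(tissues, target_means, all_means):
    """Tissues sorted by ascending score; the first entry has the most repressed target set."""
    scores = relative_rank_scores(target_means, all_means)
    return sorted(zip(tissues, scores), key=lambda pair: pair[1])


# Toy usage: mean expression of one family's targets and of all lncRNA targets in three tissues.
toy_order = tissues_by_score(
    ["liver", "heart", "lung"],
    target_means=[0.2, 1.5, 0.9],
    all_means=[1.0, 0.8, 0.9],
)
```

Dividing by the rank of all lncRNA targets corrects for tissues in which lncRNAs are globally low, so a small score reflects low expression of this particular target set rather than of lncRNAs in general.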
We found that the target set of miR-375 has the lowest expression in esophagus
compared to other tissues (Figure 5.1a); Li et al. recently found that the miR-375 expression
level in normal esophagus is significantly higher than that in cancerous esophagus
[153], supporting our finding that highly expressed miR-375 in esophagus may regulate
its lncRNA target set in this tissue.
We also found that the target set of miR-101 has the lowest expression score in skeletal
muscle compared to other tissues (Figure 5.1b); Thomsen et al. profiled the expression
levels of 212 miRNAs using deep sequencing and found that miR-101 is the second
most highly expressed miRNA in this tissue after miR-1 [154], supporting evidence that
miR-101 may mediate its target set in skeletal muscle. Another example is miR-122,
which was shown to be preferentially expressed in liver [64], where we found that the target set
of this miRNA has the lowest rank (Figure 5.1c).
We also found that the target set of miR-383 is repressed in liver compared to other
tissues (Figure 5.1d); miR-383 was shown to be expressed in liver resident stem cells
(HLSCs) [155]. In addition, our analysis shows that miR-145 may down-regulate the
expression level of its targets in heart (Figure 5.1e); Li et al.’s experiments showed that miR-145 plays
an important role in regulating the mitochondrial apoptotic pathway in heart [156].
The target set of miR-34 has low expression levels in testis and lung, where miR-34
has been measured to be highly expressed [157].
Using quantitative real-time RT-PCR assays, Wang et al. found that miR-23 is highly
expressed in liver, skeletal muscle, lung, heart, and kidney [158]. Interestingly, we found
that the target set of miR-23 has low relative expression scores in skeletal muscle (rank
1), kidney (rank 2), and heart (rank 4) (Figure 5.1f), suggesting that miR-23 mediates the
expression of its potential targets in these tissues. Additionally, miR-129 is known
to be a cerebellum-specific miRNA [159], and in our list, cerebellum is the tissue in
which the lncRNA targets of miR-129 have the lowest expression compared to other tissues
(Figure 5.1g). Also, miR-203 has been shown to be expressed in normal bladder tissue [160],
and we found that the target set of miR-203 has the lowest relative expression in bladder
compared to other tissues (Figure 5.1h). Recently, Gou et al. [161] detected, using Northern
blot experiments, that miR-148a expression is relatively high in intestine, stomach, heart,
colon, and liver. Our analysis shows that the target set of miR-148a has
the lowest expression in intestine compared to the other tissues (Figure 5.1i). Our results also show
low relative expression of the target set of miR-148 in brain-related tissues, where Gou
et al.’s experiment could not find a high expression level for miR-148. Another miRNA we
found in our list is miR-133, whose targets have low relative expression scores in skeletal
muscle (rank 1) and brain (ranks 3 and 4) (Figure 5.1j). Hon et al. found that miR-107
and miR-133 are indeed strongly expressed in brain and muscle [162]. Finally,
miR-125 is expressed in normal bladder and suppresses the development of bladder cancer
by targeting E2F3 [163]. The miR-125 target set has a low expression score (rank 1) in our
analysis (Figure 5.1k).
5.2.2 lncRNAs that are significantly positively correlated with mRNAs may decoy their common targeting miRNAs
Some lncRNAs that contain miRNA target sites are suggested to compete with mRNAs to
bind to miRNAs and thereby indirectly interfere in the post-transcriptional regulation
of mRNAs [150]. In this case, lncRNAs are said to act as miRNA sponges. For example,
the lncRNA linc-MD1 has been shown to positively regulate the expression of the mRNAs MAML1
and MEF2C by acting as a sponge of their targeting miRNAs, miR-133 and miR-135
[164]. We investigated whether mRNA-lncRNA pairs that share common miRNA target
Figure 5.1: lncRNA targets have low expression in tissues where their targeting miRNAs are highly expressed. Each subplot shows the relative expression score (y-axis: relative rank score) of lncRNA targets for given miRNAs across 26 tissues (x-axis). Panels, with the first tissue on the x-axis in parentheses: miR-375 (Esophagus), miR-101/101ab (Skeletal Muscle), miR-383 (Liver), miR-145 (Heart), miR-22/22-3p (Colon), miR-34ac/34bc-5p/449abc/449c-5p (Kidney).
sites are enriched for significantly positively or negatively correlated pairs. We computed
the correlation coefficients between all lncRNAs and mRNAs in the dataset and excluded
insignificantly correlated pairs (i.e., those with P > 0.05). The mRNA-lncRNA pairs were
sorted based on the relative number of shared miRNA seed matches, which is computed as
the number of common miRNA seed match sites between the lncRNA and the mRNA divided
by the length of the lncRNA or the length of the 3’ UTR of the mRNA, whichever is greater.
We observed that almost half of the significantly correlated pairs share at least one target site
(Figure 5.2a). We performed a hypergeometric test (see Materials) to examine whether highly
positively correlated pairs are enriched at either end of the sorted list. Figure 5.2b shows
the enrichment plot of positively correlated pairs in the top M set of the sorted list as
M increases. We observed that the top 400 pairs in the sorted list (those that share about 20%
of targeting miRNAs) are significantly enriched for positively correlated
mRNA-lncRNA pairs (P < 0.01). We did not observe any significant enrichment when
M > 400. We also observed the same enrichment pattern when we did not divide the
number of common miRNA target sites by the transcript length (Figure 5.2c). To test
whether pairs with shared miRNAs are enriched for negatively correlated pairs, we repeated
the analysis, this time searching for negatively correlated pairs; however, we could not
find any enrichment for the set of negatively correlated lncRNA-mRNA pairs in the top M
pairs of the list (data not shown). Although the enrichment is not highly
significant (P < 0.01), this analysis may suggest some biological insights about the possible
role of lncRNAs as miRNA sponges, as described in the following. Earlier we discussed
that lncRNAs may sponge miRNAs and so indirectly increase the expression levels of those
mRNAs that otherwise would have been the targets of the sponged miRNAs. However, what
happens to lncRNAs after sponging miRNAs is not clear. In order for a lncRNA to
act as an effective miRNA sponge, it should be highly expressed in the cell [60]; the lncRNA
transcript can be degraded after sponging miRNAs; if this occurs, the expression levels
of lncRNAs and mRNAs that compete for miRNAs should be negatively correlated. Our
[Figure 5.2 plots. (a) x-axis: mRNA-lncRNA pairs sorted by relative number of shared miRNAs; y-axis: relative number of shared miRNAs. (b) x-axis: lincRNAs sorted by relative number of shared miRNAs with mRNAs; y-axis: -log10 P (hyper-geometric test). (c) x-axis: lncRNAs sorted by number of shared miRNAs with mRNAs; y-axis: -log10 P (hyper-geometric test). Legend in (b) and (c): positive correlation, negative correlation, cutoff line.]
Figure 5.2: (a) Relative number of common miRNA target sites in all significantly correlated lncRNA-mRNA pairs. (b) Enrichment plot for the set of positively correlated pairs in the set of lncRNA-mRNA pairs sorted by the relative number of common miRNA target sites. (c) Same as (b), but the number of common miRNA target sites is not divided by the length of the transcripts. The gray horizontal line depicts the cut-off line, i.e. P = 0.05.
analysis, however, does not support this hypothesis, since we found that lncRNA-mRNA
pairs that share many seed match sites are more often positively than negatively correlated.
In conclusion, our analysis in this section supports the following mechanism for lncRNAs
as miRNA sponges. When an mRNA and a lncRNA share more than 20% of their targeting
miRNAs, the lncRNA may act as a miRNA sponge. In this mode, the lncRNA positively
and indirectly regulates the mRNA expression. Additionally, the lncRNA transcript is
not degraded by binding miRNAs, possibly because these miRNAs do not recruit RISC
to mediate the expression of lncRNAs.
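The ranking used in this section can be sketched as follows. This is a minimal Python illustration, not the code used in the thesis: the function name, the toy transcripts, and the simplification that each transcript is represented by the set of miRNA families with a seed match site are all assumptions made for the example.

```python
def relative_shared_sites(lnc_families, mrna_families, lnc_length, utr_length):
    """Shared miRNA families divided by the longer of the two sequences.

    Simplifying assumption: each transcript is summarized by the set of
    miRNA families that have at least one seed match site in it.
    """
    shared = len(set(lnc_families) & set(mrna_families))
    return shared / max(lnc_length, utr_length)


# Hypothetical toy data:
# (lncRNA families, mRNA families, lncRNA length, 3' UTR length)
pairs = {
    ("lnc1", "mrnaA"): ({"miR-1", "miR-34", "let-7"}, {"miR-1", "let-7"}, 2000, 1000),
    ("lnc2", "mrnaB"): ({"miR-21"}, {"miR-21", "miR-155"}, 500, 800),
    ("lnc3", "mrnaC"): ({"miR-9"}, {"miR-124"}, 1500, 1200),
}

# Score every pair and sort from the most to the least shared sites.
scores = {pair: relative_shared_sites(*args) for pair, args in pairs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Dividing by the longer sequence keeps long transcripts from ranking high merely because they contain more sites by chance.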
5.2.3 Highly expressed lncRNAs in the cytoplasm contain significantly
fewer seed match sites than those in the nucleus
Since mature miRNAs are formed in the cytoplasm, we tested whether cytoplasm-specific
lncRNAs are under stronger selective pressure to base pair with miRNAs than
nucleus-specific lncRNAs. We found that lncRNAs that are highly expressed in the cytoplasm
have fewer seed match sites than those highly expressed in the nucleus (Figure 5.3). We
used the RNA abundance in the cytoplasm and nucleus measured by RNA-seq in six
cell lines: GM12878, HepG2, HUVEC, K562, NHEK, and HeLaS3 [147]. To restrict the analysis
to reliable RNA-seq measurements, we excluded transcripts whose RPKM < 1 in both the
cytoplasm and the nucleus. We declared a transcript highly expressed in the nucleus if the
ratio of its RPKM in the nucleus to its RPKM in the cytoplasm is greater than 10, and
analogously for transcripts highly expressed in the cytoplasm; we obtained 33 cytoplasmic
lncRNAs and 104 nuclear lncRNAs out of a total of 866 RNA-seq-measured transcripts in the
six cell lines. To test the possible repression of lncRNAs by miRNAs in each compartment,
we compared the expression levels of target and non-target lncRNAs in the cytoplasm and
nucleus. We reasoned that if mature miRNAs are formed in the cytoplasm and if miRNAs
repress lncRNAs in the cytoplasm, the target transcripts should have lower expression levels
than non-target transcripts. Surprisingly, we found the opposite. First, we
found that lncRNA targets have higher median expression in the cytoplasm than
lncRNA non-targets, and the opposite in the nucleus. However, the higher expression is not
statistically significant except in the HeLaS3 cell line (Table 5.1, third row). In conclusion,
we could not find any significant difference between the expression levels of target
and non-target lncRNAs expressed in the cytoplasm and nucleus. Surprisingly, however,
for one cell line, HeLaS3, we found that the relative expression of targets is lower than that
of non-targets in the nucleus, where mature miRNAs are not thought to be expressed.
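The compartment assignment described above (an RPKM filter followed by a ten-fold ratio rule) can be written as a small function. The sketch below is a hedged Python illustration; the function name, the label strings, and the handling of transcripts that pass the filter but meet neither ratio rule are assumptions, not details given in the text.

```python
def classify_compartment(rpkm_nucleus, rpkm_cytoplasm, min_rpkm=1.0, ratio=10.0):
    """Assign a transcript to a compartment from its two RPKM values.

    Thresholds follow the text (RPKM < 1 in both compartments is excluded;
    a more than ten-fold ratio marks compartment-specific expression).
    """
    if rpkm_nucleus < min_rpkm and rpkm_cytoplasm < min_rpkm:
        return "excluded"
    # Multiplying instead of dividing avoids division by zero.
    if rpkm_nucleus > ratio * rpkm_cytoplasm:
        return "nuclear"
    if rpkm_cytoplasm > ratio * rpkm_nucleus:
        return "cytoplasmic"
    return "unassigned"  # hypothetical label for transcripts with no clear bias


# Hypothetical (nucleus, cytoplasm) RPKM values.
examples = [(0.5, 0.2), (50.0, 2.0), (0.0, 12.0), (5.0, 4.0)]
labels = [classify_compartment(n, c) for n, c in examples]
```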
[Figure 5.3 plot. x-axis: number of targeting miRNAs per lncRNA length; y-axis: cumulative distribution; series: nucleus lncRNA targets, cytoplasm lncRNA targets; annotation: p-value = 0.027691.]
Figure 5.3: Highly expressed lncRNAs in the cytoplasm have fewer seed match sites than those expressed in the nucleus. Shown are the cumulative distributions of the number of seed match sites per transcript length for lncRNAs expressed in the nucleus (red) and cytoplasm (green).

Cell line   Median expr.   Median expr.   R1     R2     R2/R1   P-value     P-value
            cytoplasm      nucleus                              cytoplasm   nucleus
GM12878     1.55           1.70           1.20   0.93   0.77    0.13        0.31
HeLaS3      1.35           2.15           1.32   0.86   0.65    0.01        0.0098
HepG2       1.40           1.64           1.30   1.17   0.89    0.01        0.71
HUVEC       1.50           1.65           1.20   1.02   0.84    0.11        0.59
K562        2.10           2.32           0.90   0.81   0.90    0.60        0.25
NHEK        1.95           1.56           1.11   1.05   0.95    0.71        0.63

R1 = ratio of target to non-target expression in the cytoplasm; R2 = ratio of target to non-target expression in the nucleus. Each P-value compares target and non-target expression in the given compartment. Expression is in RPKM.
Table 5.1: Comparison between the expression levels of cytoplasmic and nuclear lncRNAs (columns II-IV) and the statistical significance of the comparison (columns V-VII); each row is associated with one cell line.
5.2.4 High relative number of lncRNA targets in allosomes and
chromosomes 20-22
Our analysis showed that the distribution of annotated lncRNAs across the chromosomes is
very similar to that of mRNAs (Figure 5.4a and b). We found that the expression of
lncRNA targets is independent of their genomic loci. To check this, we plotted the
sorted expression levels along with chromosomal locations for all lncRNAs. Overall, we
observed that the average expression level of target lincRNAs across samples is independent
of their location in the genome (Figure 5.4c). Interestingly, although lncRNA targets are far
fewer than lncRNA non-targets (targets are about 14% of lncRNAs), their
relative number in some chromosomes is much higher than that of non-targets, especially
in chromosomes 20-22, X, and Y (Figure 5.4d). More than 10% of detected miRNAs
are located on chromosome X, suggesting that miRNAs and lncRNAs may interact at
this genomic locus more than at others. To check whether any category of lncRNA targets
is predominantly located on a specific chromosome, we bar-plotted the distribution of each
category across chromosomes and found no propensity for any lncRNA target category
to be located on any specific chromosome (Figure 5.4e).
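The relative numbers compared above are per-chromosome counts divided by the total count for the same group (targets or non-targets). A minimal Python sketch, with a hypothetical function name and made-up chromosome labels, is:

```python
from collections import Counter


def relative_counts(chromosomes):
    """Fraction of a group's transcripts that fall on each chromosome."""
    counts = Counter(chromosomes)
    total = sum(counts.values())
    return {chrom: n / total for chrom, n in counts.items()}


# Hypothetical chromosome labels, one per lncRNA.
target_chroms = ["1", "X", "X", "21"]
non_target_chroms = ["1", "1", "1", "2", "2", "X", "3", "4"]

target_fraction = relative_counts(target_chroms)
non_target_fraction = relative_counts(non_target_chroms)
```

Normalizing each group by its own total is what allows the much smaller target set to be compared with the non-target set chromosome by chromosome.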
5.2.5 LncRNAs that contain seed match sites have significantly
higher expression compared to those that lack seed match
sites
Since miRNAs tend to repress the expression levels of their targets, one way to check
the activity of miRNAs is to compare the expression of target and non-target transcripts
in different tissues. We compared the expression levels of lncRNAs that contain seed
match sites (target lncRNAs) and those that lack seed match sites (non-target lncRNAs)
within 26 tissues. We observed that target lncRNAs are expressed at significantly higher
levels than non-target lncRNAs in each of the 26 tissues (Figure 5.5a), suggesting either
that miRNAs have a fine-tuning impact on highly expressed lncRNAs or that the binding
miRNAs do not participate in post-transcriptional regulation of lncRNAs. For the former
scenario, it is difficult to quantify this impact without miRNA-induced repression
experiments similar to those available for mRNAs [33]. We also analyzed the distribution
of expression levels of mRNAs, lincRNAs, antisense lncRNAs, and sense intronic lncRNAs.
As shown in the
[Figure 5.4 plots. Panels plot the number of lncRNAs, the fraction of lncRNAs, and # of lncRNAs against chromosome no., with legends non-target/target and lincRNA/antisense lncRNA/sense intronic lncRNA. One panel, titled "Association between lincRNAs and number of miRNA targeting sites", plots expression level and chromosome location against lincRNAs.]
Figure 5.4: Distribution of lncRNAs on the human chromosomes. (a) The distribution of all lncRNAs. (b) Sorted expression levels of lncRNA targets superimposed by the chromosomal locations of each lncRNA. (c) The distribution of mRNAs (from wiki). (d) The distribution of targets and non-targets. (e) The distribution of the relative number of target and non-target lncRNAs; relative numbers are computed as # of (non-)targets in each chromosome / # of all (non-)targets. (f) The distribution of all categories of lncRNA targets.
cumulative distributions in Figure 5.5b, mRNAs are expressed at far higher levels than all
types of lncRNAs; among lncRNAs, lincRNAs are more highly expressed than the overlapping
lncRNAs. Next, we compared the expression levels of genes harboring miRNA target sites
with those lacking sites for all four classes of RNAs: mRNAs, lincRNAs, antisense lncRNAs,
and sense intronic lncRNAs. We found that, in all classes, RNAs harboring target sites are
expressed at a significantly higher level (Figure 5.5c). Only 14% of lncRNAs contain miRNA
target sites, far fewer than the 90% of mRNAs that contain miRNA target sites.
[Figure 5.5 plots. (a) x-axis: tissues; y-axis: mean expression level; series: target lncRNAs, non-target lncRNAs. (b) x-axis: log expression abundance; y-axis: cumulative distribution; series: lincRNA, antisense lncRNA, sense intronic lncRNA, mRNA.]
Figure 5.5: Abundance of target and non-target lncRNAs in 26 different tissues. (a) Bar plot of the mean expression level of target and non-target lncRNAs in each tissue. (b) The cumulative distributions of the expression levels of lncRNAs and mRNAs measured in the microarray study.
5.2.6 Discussion
The functional roles of long non-coding RNAs are still debated, but recent studies show
that their loss of function impacts gene regulation, at least in some tissues [58, 59]. Long
non-coding RNAs encode many miRNAs and contain target sites matching the seed region
of many miRNAs, suggesting interaction between long and short non-coding RNAs. In
this study, using microarray data, we explored the possible interaction between miRNAs
and lncRNAs and the impact of this interaction on mRNAs. We found that the lncRNA
target sets of 11 miRNA families have the lowest relative expression in the tissue in which
a member of the miRNA family is highly expressed. This observation, however, does not
apply to all tissue-specific miRNAs. For instance, miR-1 is reported to be highly
expressed in heart and skeletal muscle, but in our ranked list these tissues place 19th
and 22nd, respectively. This suggests that high expression of a miRNA does not necessarily
imply that it is functional, as observed in other studies [144]. Our analysis is similar
to that of Sood et al. [34], conducted on 87 tissues to predict the mRNA target sets of
miRNAs in tissues. They found that the mRNA target sets of eight highly tissue-specific
miRNAs are expressed at significantly lower levels in the tissue where the miRNA is
reported to be highly expressed than in other tissues.
Studies conducted on mRNAs found no significant difference
between the expression levels of target and non-target mRNAs [34], but for lncRNAs
we surprisingly found that lncRNA targets have significantly higher expression than
non-targets in each individual tissue. One possible explanation might be that highly
expressed lncRNAs are also highly conserved and so contain more conserved target
sites. We rule out conservation, however, since miRcode identifies all seed match sites
regardless of conservation. miRNAs are suggested to have a fine-tuning impact on
their targets, which may explain why highly expressed lncRNAs contain more target sites.
In this scenario, miRNAs participate in the post-transcriptional regulation of lncRNAs to
adjust their expression levels. Therefore, lncRNAs that are not expressed or have low
expression do not
require miRNAs to mediate their regulation.
Our study is not conclusive enough to show whether miRNAs target lncRNAs in the
cytoplasm or the nucleus. We could not find any significant difference between the
expression levels of lncRNA targets in the nucleus and cytoplasm. Nonetheless, we found that
highly expressed lncRNAs in the cytoplasm tend to have fewer target sites than
those in the nucleus. This suggests the hypothesis that, since mature miRNAs are generated
in the cytoplasm, they target lncRNAs there, so highly expressed lncRNAs have been
evolutionarily selected to have fewer target sites to escape post-transcriptional repression
by miRNAs. Lastly, we found that the relative number of lncRNAs containing seed match
sites is high in chromosomes 20-22, X, and Y. Interestingly, X-chromosome miRNAs, which
constitute 10% of all identified human miRNAs, are suggested to have potential roles in the
human immune system and schizophrenia [165, 166]. The high number of lncRNAs alongside
miRNAs may indicate that both modulate the other cis functional elements
on this chromosome.
5.3 Materials
5.3.1 Microarray data
We processed the expression levels of 7,535 transcripts (52,375 unique probes) across
31 samples from the GSE34894 microarray dataset, which uses the GPL15094 platform. The
transcripts include 2,132 antisense lncRNAs (residing on the opposite strand of
protein-coding genes), 2,986 lincRNAs (residing in intergenic regions), 241 sense-intronic
lncRNAs, and 2,176 mRNAs. Many lncRNAs have low expression levels or are not expressed
at all, so we excluded them from the analysis. To do this, we averaged probe expression
levels per transcript, computed the coefficient of variation (CV) of expression levels
across the samples, and
excluded transcripts with CV < 0.1. After applying this filter, 791 transcripts were left
including 603 mRNAs, 111 lincRNAs, 70 antisense lncRNAs, and 7 sense lncRNAs. The
coefficient of variation of each transcript is defined as the ratio of the standard deviation
to the mean of its expression levels across samples.
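The probe averaging and CV filter can be sketched as below. This is an illustrative Python sketch rather than the thesis code: the function names and toy values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def average_probes(probe_rows):
    """Average several probes (each a list over samples) into one profile."""
    return [mean(column) for column in zip(*probe_rows)]


def coefficient_of_variation(values):
    """Standard deviation divided by the mean of one transcript's profile."""
    m = mean(values)
    if m == 0:
        return 0.0  # assumption: treat an all-zero profile as non-varying
    return stdev(values) / m  # sample standard deviation (assumption)


def filter_by_cv(expression, min_cv=0.1):
    """Keep transcripts whose CV across samples is at least min_cv."""
    return {name: profile for name, profile in expression.items()
            if coefficient_of_variation(profile) >= min_cv}


# Hypothetical transcripts measured in three samples.
expression = {
    "flat_lncRNA": average_probes([[10, 10, 10], [10, 10, 10]]),
    "variable_mRNA": average_probes([[4, 9, 14], [6, 11, 16]]),
}
kept = filter_by_cv(expression)
```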
5.3.2 Measuring correlation between mRNAs and lncRNAs
We computed the Pearson correlation coefficients between the expression levels of all
transcript pairs and sorted the pairs by correlation coefficient in ascending order.
For each pair of transcripts, we obtained the correlation coefficient, the number of
targeting miRNA families, and the number of shared targeting miRNAs. We focused on
mRNA-lncRNA pairs that are significantly negatively or positively correlated (3,066 pairs
with Pearson correlation coefficient < −0.351 and P < 0.05). To examine which lncRNA
group (lincRNA, sense lncRNA, or antisense lncRNA) is more positively or negatively
co-expressed with mRNAs, we compared the cumulative distributions of correlation
coefficients associated with mRNA-lincRNA (14,472), mRNA-antisense lncRNA (3,618), and
mRNA-sense lncRNA (1,809) pairs; we observed that sense lncRNAs are less negatively
correlated with mRNAs than the two other groups, which exhibit the same correlation
patterns with mRNAs.
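A minimal Python sketch of the pairwise correlation step follows. It is not the thesis implementation. The function names and toy profiles are hypothetical, and applying 0.351 as a cutoff on the absolute correlation (in place of an explicit P-value computation) is an assumption made to keep the example dependency-free.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(mrnas, lncrnas, min_abs_r=0.351):
    """Return (mRNA, lncRNA, r) with |r| >= min_abs_r, sorted by ascending r."""
    pairs = []
    for m_id, m_profile in mrnas.items():
        for l_id, l_profile in lncrnas.items():
            r = pearson(m_profile, l_profile)
            if abs(r) >= min_abs_r:
                pairs.append((m_id, l_id, r))
    return sorted(pairs, key=lambda pair: pair[2])


# Hypothetical expression profiles over four samples.
mrnas = {"m1": [1, 2, 3, 4]}
lncrnas = {"l1": [2, 4, 6, 8], "l2": [4, 3, 2, 1], "l3": [1, -1, -1, 1]}
result = correlated_pairs(mrnas, lncrnas)
```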
5.3.3 Hyper-geometric test analysis
The test was carried out as follows. Let N and C denote the total number
of mRNA-lncRNA pairs and the number of positively correlated pairs in the list,
respectively. We chose the top M pairs in the sorted list and counted the number of
positively correlated pairs (say K) in this set. We then tested the statistical significance
of observing K positively correlated pairs in the set of top M pairs, compared to C
correlated pairs in the set of N pairs, using the hyper-geometric test. M was set to 400
and increased in steps of 400 to cover all pairs in the list.
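With the symbols N, C, M, and K defined above, the upper-tail hyper-geometric probability and the sweep over M can be sketched in Python as follows. The function names are hypothetical and the one-sided tail P(X >= K) is an assumption, since the text does not state which tail was used.

```python
from math import comb


def hypergeom_sf(K, N, C, M):
    """P(X >= K) when M pairs are drawn from N, of which C are positive."""
    upper = min(C, M)
    favourable = sum(comb(C, k) * comb(N - C, M - k) for k in range(K, upper + 1))
    return favourable / comb(N, M)


def enrichment_sweep(is_positive, step=400):
    """P-value for the top M pairs of a sorted list, for M = step, 2*step, ..."""
    N = len(is_positive)
    C = sum(is_positive)
    results = []
    for M in range(step, N + 1, step):
        K = sum(is_positive[:M])  # positively correlated pairs among the top M
        results.append((M, hypergeom_sf(K, N, C, M)))
    return results


# Hypothetical sorted list: all positive pairs sit at the top.
toy = [True] * 4 + [False] * 4
sweep = enrichment_sweep(toy, step=4)
```

Exact integer binomials keep the computation stable for lists with thousands of pairs, at the cost of some speed.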
5.3.4 Identifying the complementary seed match sites in the
lncRNA transcripts
We used the miRcode repository to obtain the list of lncRNAs that have sites
complementary to the seed regions of 87 highly conserved miRNA families. miRcode scans
the entire genome, using the GENCODE V10 annotation and the TargetScan protocol
for defining seed matches, to identify the target sets for lncRNAs, mRNAs, and pseudogenes.
miRcode identifies 1,048,575 target sites residing in 25,973 transcripts complementary to
the seed regions of 87 highly conserved miRNA families.
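To make the notion of a seed match concrete, the sketch below searches a transcript for one common site type, the 7mer-m8 site (a perfect match to miRNA nucleotides 2-8). It is a simplified Python illustration and not the miRcode or TargetScan implementation. The function names and the two sequences are illustrative inputs chosen for the example, and other site types (for instance 7mer-A1 or 8mer) are not handled.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_match_7mer_m8(mirna):
    """Target-site sequence pairing with miRNA nucleotides 2-8 (5' to 3')."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement of the seed


def find_seed_sites(transcript, mirna):
    """0-based start positions of every 7mer-m8 site in the transcript."""
    site = seed_match_7mer_m8(mirna)
    rna = transcript.upper().replace("T", "U")
    positions = []
    start = rna.find(site)
    while start != -1:
        positions.append(start)
        start = rna.find(site, start + 1)  # allow overlapping sites
    return positions


# Illustrative 22-nt miRNA and a made-up transcript fragment.
mirna = "UGGAAUGUAAAGAAGUAUGUAU"
transcript = "GGACAUUCCAAACAUUCCU"
sites = find_seed_sites(transcript, mirna)
```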
Chapter 6
Conclusions and Future Work
6.1 Conclusions
miRNAs participate in many aspects of cell biology by regulating gene expression.
miRNAs can be anti-oncogenic, as they regulate some oncogenes. Since 2000, scientists have
been collecting evidence to understand how miRNAs recognize their targets. Unfortunately,
even with recent advances in technology (e.g. PAR-CLIP) for identifying the
binding sites of microRNA-containing ribonucleoprotein complexes, we are still unable
to identify genuine miRNA targets genome-wide; the elements involved in the miRNA-induced
regulation machinery are well identified, but it is unclear under which conditions
this regulatory machinery is active. Experimental methods only identify the target sets
under specific conditions and for a limited number of miRNAs; it is therefore necessary to
develop computational high-throughput methods. Initially it was thought that mRNA-miRNA
sequence-based determinants could provide accurate and comprehensive prediction
of targets; later, however, many performance evaluation studies showed that these methods
have low specificity and sensitivity and, surprisingly, generate inconsistent target sets,
necessitating the use of other evidence to augment sequence-based determinants. If miRNAs
have a detectable impact on mRNA regulation under endogenous conditions, mRNA
expression data are the most abundant and informative resource that can be explored
to track the footprint of miRNAs. In this thesis, we used gene expression data to refine
target sets of miRNAs predicted by sequence-based prediction methods. In contrast
to many computational prediction methods, we considered the multiple targeting effect
of miRNAs and did not use any miRNA expression data for prediction. Given an
mRNA having multiple conserved seed match sites to one or more miRNAs,
our methods score each individual miRNA based on its impact on repressing the mRNA
expression measured under endogenous conditions and in the presence of other targeting
miRNAs. The BayMiR and InMiR packages are available and can be easily applied to new
datasets. We used experimentally validated target sets and miRNA over-expression
experiments, two widely used benchmarks, to evaluate the merits of our methods in
comparison with the best available methods. We introduced InMiR, which predicts the
target sets of intronic miRNAs and estimates possible co-expression of an intronic miRNA
and its host mRNA. We found that 22 out of 57 tested intronic miRNAs are co-expressed
with their host genes. We showed that the targets predicted by InMiR are highly enriched
for validated targets.
We developed BayMiR, a Bayesian model that predicts miRNA targets using a large
set of mRNA expression data. BayMiR obviates the need for miRNA expression data, which
are not available globally. We showed that scores provided by BayMiR better reflect miRNA
targeting impact than sequence features, namely nine determinants provided by
TargetScan, one of the most widely used prediction techniques. In addition, we showed that
BayMiR outperforms CoMeTa, a recent advanced prediction method that uses gene expression
data.
In this thesis, we also studied the possible interaction between miRNAs and lncRNAs.
We found some evidence that supports the role of lncRNAs as miRNA sponges, as well as of
miRNAs as condition-specific regulators of lncRNAs. The human genome encodes a large
number of lncRNAs, many of which are functional. Our work is the first that incorporates
the large set of lncRNAs to explore the interaction between miRNAs and lncRNAs and
their indirect impact on mRNAs.
6.2 Future directions
6.2.1 A Bayesian approach to decipher the TF-miRNA-mRNA-
lncRNA regulatory network
The function of a vast region of the human genome, which consists of nearly three billion
bases, is still unknown. Researchers have already identified the protein-coding regions,
which comprise less than 2% of the genome. Recently, the ENCODE project released an
unprecedentedly expansive resource of genomic data that illuminates the possible
functional elements of 80% of the human genome, much of which is transcribed into functional
non-coding RNAs [167]. This new resource has not only transformed biologists' view
of the genome but also presents new computational and data-analysis challenges in
genomics. With such a resource in hand, I propose a Bayesian graphical model to study
the interaction between mRNAs, lncRNAs, miRNAs, and TFs. Our proposed network
consists of four functional elements: transcription factors (TFs), protein-coding RNAs
(mRNAs), long non-coding RNAs (lncRNAs), and miRNAs. Fig. 6.1 shows a graphical
representation of the proposed model and the presumed causal relationships between
variables. In this model, TF activities control the transcription rate of mRNAs, lncRNAs,
and miRNAs. Subsequently, miRNAs regulate the expression of mRNAs and lncRNAs.
This model provides a wiring diagram for a cell with which we ultimately hope to predict
the impact of post-transcriptional elements on unexplored sequences, expanding insight
into the function of lncRNAs. Some lncRNAs have been shown to encode defined products
whose sequence variants are linked to human disease. Additionally, these data sets allow
us to explore the biological function of these RNAs in major cellular sub-compartments and
different cell lines. Unlike mRNAs, lncRNAs have restricted expression in only a
subpopulation of cells. Therefore, our model will be tuned to work under individual cell
types and cellular compartments. The model aims at computing the posterior probability of
binary variables that indicate the impact of TFs/miRNAs on mRNA/lncRNA
regulation. In this model, inference is carried out using variational/stochastic
sampling methods, which we have used in our previous work. The components of the
posterior probability, i.e. the likelihood and the prior probabilities, are obtained
respectively from the ENCODE catalogue and from the sequence matching determinants already
used for miRNA-mRNA pairs. Although many aspects of post-transcriptional regulation are
yet to be fully explored, we hope this model sheds more light on the global impact of miRNAs
on major functional elements of every cell of every person and across time.
[Figure 6.1 diagram. Nodes: TF, mRNA, miRNA, lncRNA.]
Figure 6.1: A graphical representation of the proposed method
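For a single binary interaction variable, the posterior mentioned above reduces to Bayes' rule. The Python sketch below shows only that arithmetic; it is not the proposed graphical model, which would couple many such variables and require variational or sampling-based inference. The function name and the numbers are hypothetical.

```python
def posterior_interaction(prior, lik_if_targeted, lik_if_not_targeted):
    """Posterior probability that a regulator acts on a transcript.

    prior: sequence-based prior probability of the interaction.
    lik_if_targeted / lik_if_not_targeted: likelihood of the observed
    expression data under each value of the binary variable.
    """
    numerator = prior * lik_if_targeted
    evidence = numerator + (1.0 - prior) * lik_if_not_targeted
    return numerator / evidence if evidence > 0 else 0.0


# Hypothetical values: an uninformative prior and data favoring targeting.
p = posterior_interaction(prior=0.5, lik_if_targeted=0.8, lik_if_not_targeted=0.2)
```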
6.2.2 Identifying lncRNA binding sites complementary to mRNA
sequences
Recent studies have shown that lncRNAs partially bind to mRNAs and promote or inhibit
their regulation. There is, however, no genome-wide bioinformatic method available to
provide a map of these interactions; such a network can provide biological insight
into the possible regulatory impact of lncRNAs on mRNAs. We have implemented a local
sequence alignment method that can be readily applied to identify local binding regions.
Since we have no information about the length and strength of these complementary
regions, the penalty scores for gaps, gap extensions, wobbles, and mismatches should be
carefully tuned. We expect to obtain a large number of hits, so we need to perform
statistical tests to refine these hits. How to perform these statistical tests is unclear.
Identifying these interactions provides a lncRNA-mRNA network. This network can be
analyzed in various ways. For instance, we can perform enrichment analysis to explore
whether the set of mRNAs targeted by a lncRNA is enriched in a particular pathway or process.
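A local alignment of this kind can be sketched with a Smith-Waterman recursion that rewards base pairing rather than identity. The Python sketch below is an illustration under stated assumptions, not the method implemented for the thesis: the function names are hypothetical, the score values are arbitrary placeholders that would need the tuning discussed above, and a single linear gap penalty stands in for separate gap-open and gap-extension penalties.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_score(a, b, match=2, wobble=1, mismatch=-3):
    """Score one base pair; the values are placeholders, not tuned penalties."""
    if (a, b) in WATSON_CRICK:
        return match
    if (a, b) in WOBBLE:
        return wobble
    return mismatch


def best_local_duplex(lncrna, mrna, gap=-4):
    """Best local score for an antiparallel duplex between two RNAs."""
    x = lncrna.upper().replace("T", "U")
    # Reverse the mRNA so that pairing partners line up position by position.
    y = mrna.upper().replace("T", "U")[::-1]
    previous = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        current = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            current[j] = max(
                0,                                                # start a new region
                previous[j - 1] + pair_score(x[i - 1], y[j - 1]),  # pair two bases
                previous[j] + gap,                                # bulge in the lncRNA
                current[j - 1] + gap,                             # bulge in the mRNA
            )
            best = max(best, current[j])
        previous = current
    return best


# A perfectly complementary toy pair: six Watson-Crick pairs.
score = best_local_duplex("AAGGCC", "GGCCUU")
```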
6.2.3 Using sequence and expression evidence in parallel
Many functional elements of the human genome are potential targets of miRNAs because,
in mammals, miRNA-induced regulation requires only partial base pairing. Although
perfect base pairing to the seed region of miRNAs has so far been the most prominent
feature in recognizing miRNA targets, many validated targets contain mismatches and
wobbles in their seed match sites. Moreover, many studies have demonstrated the
importance of other sequence structure and base pairing beyond seed match sites, including
symmetric loops in the centre of the duplex and sequence matches at the 3' end of miRNAs.
Current methods commonly apply dynamic programming sequence alignment techniques
to score the strength of mRNA-miRNA duplexes. These scoring techniques use a set
of heuristically tuned gap, gap extension, and mismatch penalties to obtain a final score;
in addition, they use a heuristically determined threshold to filter out low-scoring duplexes.
Since these parameters (e.g. threshold, gap, gap extension, and mismatch penalties) are
heuristically set and globally applied, they may not reflect the strength of mRNA-miRNA
duplexes under specific conditions. For instance, one particular miRNA may have a lower
alignment score than another but be functional under a specific condition; in this case
using a set of fixed parameters may not reveal actual targets. These shortcomings
necessitate the use of probabilistic models that can effectively learn and infer
condition-specific mRNA-miRNA interactions that encompass complicated and case-specific base
pairing. One important aspect of our model is that, in contrast to previous models that
use expression data as a way to refine the interactions determined by sequence evidence,
our model uses the expression data in parallel with sequence evidence.
Since many miRNAs share the same targets, participate in the same pathways, or have
similar structure, we can compare miRNA-mRNA probabilistic models to decipher these
similarities. Using this model, we can incorporate all information pertinent to
miRNA-gene interactions (sequence, expression, and context determinants) to obtain a
reliable prediction.
Bibliography
[1] D.P. Bartel. MicroRNAs: target recognition and regulatory functions. Cell,
136(2):215–233, 2009.
[2] B. John, A.J. Enright, A. Aravin, T. Tuschl, C. Sander, D.S. Marks, et al. Human
microRNA targets. PLoS Biol, 2(11):e363, 2004.
[3] N. Rajewsky. microRNA target predictions in animals. Nature genetics, 38:S8–S13,
2006.
[4] B. Zhang, X. Pan, G.P. Cobb, and T.A. Anderson. microRNAs as oncogenes and
tumor suppressors. Developmental biology, 302(1):1–12, 2007.
[5] S.L. Ameres, J. Martinez, and R. Schroeder. Molecular basis for target RNA
recognition and cleavage by human RISC. Cell, 130(1):101–112, 2007.
[6] B.P. Lewis, I. Shih, et al. Prediction of mammalian microRNA targets. Cell,
115(7):787–798, 2003.
[7] A. Grimson, K.K.H. Farh, W.K. Johnston, P. Garrett-Engele, L.P. Lim, and D.P.
Bartel. MicroRNA targeting specificity in mammals: determinants beyond seed
pairing. Molecular cell, 27(1):91–105, 2007.
[8] D. Betel, A. Koppal, P. Agius, C. Sander, and C. Leslie. Comprehensive modeling
of microRNA targets predicts functional non-conserved and non-canonical sites.
Genome biology, 11(8):R90, 2010.
[9] Mohsen Khorshid, Jean Hausser, Mihaela Zavolan, and Erik van Nimwegen. A bio-
physical mirna-mrna interaction model infers canonical and noncanonical targets.
Nature Methods, 2013.
[10] Doron Betel, Manda Wilson, Aaron Gabow, Debora S Marks, and Chris Sander.
The microRNA.org resource: targets and expression. Nucleic acids research,
36(suppl 1):D149–D153, 2008.
[11] M. Rehmsmeier, P. Steffen, M. Hochsmann, and R. Giegerich. Fast and effective
prediction of microRNA/target duplexes. Rna, 10(10):1507, 2004.
[12] R.C. Friedman, K.K.H. Farh, C.B. Burge, and D.P. Bartel. Most mammalian
mRNAs are conserved targets of microRNAs. Genome Research, 19(1):92, 2009.
[13] C.B. Nielsen, N. Shomron, R. Sandberg, E. Hornstein, J. Kitzman, and C.B. Burge.
Determinants of targeting by endogenous and exogenous microRNAs and siRNAs.
Rna, 13(11):1894, 2007.
[14] D. Gaidatzis, E. Van Nimwegen, J. Hausser, and M. Zavolan. Inference of miRNA
targets using evolutionary conservation and pathway analysis. BMC bioinformatics,
8(1):69, 2007.
[15] M. Kertesz, N. Iovino, U. Unnerstall, U. Gaul, and E. Segal. The role of site
accessibility in microRNA target recognition. Nature genetics, 39(10):1278–1284,
2007.
[16] H. Tafer, S.L. Ameres, G. Obernosterer, C.A. Gebeshuber, R. Schroeder, J. Mar-
tinez, and I.L. Hofacker. The impact of target site accessibility on the design of
effective siRNAs. Nature biotechnology, 26(5):578–583, 2008.
[17] W.H. Majoros and U. Ohler. Spatial preferences of microRNA targets in 3’ un-
translated regions. BMC genomics, 8(1):152, 2007.
[18] D.M. Garcia, D. Baek, C. Shin, G.W. Bell, A. Grimson, and D.P. Bartel. Weak
seed-pairing stability and high target-site abundance decrease the proficiency of lsy-
6 and other micrornas. Nature structural & molecular biology, 18(10):1139–1146,
2011.
[19] A. Arvey, E. Larsson, C. Sander, C.S. Leslie, and D.S. Marks. Target mrna abun-
dance dilutes microrna and sirna activity. Molecular systems biology, 6(1), 2010.
[20] W. Ritchie, S. Flamant, and J.E.J. Rasko. Predicting microRNA targets and func-
tions: traps for the unwary. Nature Methods, 6(6):397–398, 2009.
[21] C. Barbato, I. Arisi, M.E. Frizzo, R. Brandi, L. Da Sacco, and A. Masotti. Com-
putational challenges in mirna target predictions: to be or not to be a true target?
Journal of biomedicine and biotechnology, 2009, 2009.
[22] T. Saito and P. Sætrom. Micrornas–targeting and target prediction. New biotech-
nology, 27(3):243–249, 2010.
[23] M. Hammell. Computational methods to identify miRNA targets. In Seminars in
Cell & Developmental Biology. Elsevier, 2010.
[24] P. Alexiou, M. Maragkakis, G.L. Papadopoulos, M. Reczko, and A.G. Hatzigeor-
giou. Lost in translation: an assessment and perspective for computational mi-
crorna target identification. Bioinformatics, 25(23):3049–3055, 2009.
[25] H. Min and S. Yoon. Got target?: computational methods for microrna target
prediction and their extension. Experimental & molecular medicine, 42(4):233,
2010.
[26] S. Griffiths-Jones, R. J. Grocock, S. van Dongen, A. Bateman, and A.J. Enright.
miRBase: microRNA sequences, targets and gene nomenclature. NAR, 34:140–
144, 2006.
[27] S. Griffiths-Jones, H. K. Saini, S. van Dongen, and A. J. Enright. miRBase: tools
for microRNA genomics. Nucleic Acids Research, 36:154–158, 2008.
[28] S. Lall, D. Grun, A. Krek, K. Chen, Y.L. Wang, C.N. Dewey, P. Sood, T. Colombo,
N. Bray, P. MacMenamin, et al. A genome-wide map of conserved microRNA
targets in C. elegans. Current biology, 16(5):460–471, 2006.
[29] I. Ioshikhes, S. Roy, and C.K. Sen. Algorithms for mapping of mRNA targets for
microRNA. DNA and Cell Biology, 26(4):265–272, 2007.
[30] J. Hausser, P. Berninger, C. Rodak, Y. Jantscher, S. Wirth, and M. Zavolan. MirZ:
an integrated microRNA expression atlas and target prediction resource. Nucleic
Acids Research, 37(Web Server issue):W266, 2009.
[31] H. Guo, N.T. Ingolia, J.S. Weissman, and D.P. Bartel. Mammalian microRNAs
predominantly act to decrease target mRNA levels. Nature, 466(7308):835–840,
2010.
[32] S. Mukherji, M.S. Ebert, G.X.Y. Zheng, J.S. Tsang, P.A. Sharp, and A. van Oude-
naarden. Micrornas can generate thresholds in target gene expression. Nature
genetics, 43(9):854–859, 2011.
[33] L.P. Lim, N.C. Lau, P. Garrett-Engele, A. Grimson, J.M. Schelter, J. Castle, D.P.
Bartel, P.S. Linsley, and J.M. Johnson. Microarray analysis shows that some mi-
croRNAs downregulate large numbers of target mRNAs. Nature, 433(7027):769–
773, 2005.
[34] P. Sood, A. Krek, M. Zavolan, G. Macino, and N. Rajewsky. Cell-type-specific
signatures of microRNAs on target mRNA expression. Proceedings of the National
Academy of Sciences of the United States of America, 103(8):2746, 2006.
[35] W. Filipowicz, S.N. Bhattacharyya, and N. Sonenberg. Mechanisms of post-
transcriptional regulation by microRNAs: are the answers in sight? Nature Reviews
Genetics, 9(2):102–114, 2008.
[36] D. Baek, J. Villen, C. Shin, F.D. Camargo, S.P. Gygi, and D.P. Bartel. The impact
of microRNAs on protein output. Nature, 455(7209):64–71, 2008.
[37] M. Selbach, B. Schwanhausser, N. Thierfelder, Z. Fang, R. Khanin, and N. Ra-
jewsky. Widespread changes in protein synthesis induced by microRNAs. Nature,
455(7209):58–63, 2008.
[38] D.T. Humphreys, B.J. Westman, D.I.K. Martin, and T. Preiss. MicroRNAs control
translation initiation by inhibiting eukaryotic initiation factor 4E/cap and poly (A)
tail function. Proceedings of the National Academy of Sciences of the United States
of America, 102(47):16961, 2005.
[39] A.A. Khan, D. Betel, M.L. Miller, C. Sander, C.S. Leslie, and D.S. Marks. Transfec-
tion of small RNAs globally perturbs gene regulation by endogenous microRNAs.
Nature biotechnology, 27(6):549–555, 2009.
[40] J. Vivek, M. David, and Y. Yee. Identification of microrna-mrna modules using
microarray data. BMC Genomics, 12.
[41] B. Liu, L. Liu, A. Tsykin, G.J. Goodall, J.E. Green, M. Zhu, C.H. Kim, and J. Li.
Identifying functional mirna–mrna regulatory modules with correspondence latent
dirichlet allocation. Bioinformatics, 26(24):3105–3111, 2010.
[42] G. Sales, A. Coppe, A. Bisognin, M. Biasiolo, S. Bortoluzzi, and C. Romualdi.
Magia, a web-based tool for mirna and genes integrated analysis. Nucleic acids
research, 38(suppl 2):W352–W359, 2010.
[43] W. Yu-Ping and L. Kuo-Bin. Correlation of expression profiles between microRNAs
and mRNA targets using NCI-60 data. BMC Genomics, 10.
[44] V. Jayaswal, M. Lutherborrow, D.D.F. Ma, and Y.H. Yang. Identification of mi-
crornas with regulatory potential using a matched microrna-mrna time-course data.
Nucleic acids research, 37(8):e60–e60, 2009.
[45] Y. Ruike, A. Ichimura, S. Tsuchiya, K. Shimizu, R. Kunimoto, Y. Okuno, and
G. Tsujimoto. Global correlation analysis for micro-RNA and mRNA expression
profiles in human cell lines. Journal of human genetics, 53(6):515–523, 2008.
[46] X. Li, R. Gill, N.G.F. Cooper, J.K. Yoo, and S. Datta. Modeling microrna-mrna
interactions using pls regression in human colon cancer. BMC medical genomics,
4(1):44, 2011.
[47] A. Muniategui, R. Nogales-Cadenas, M. Vazquez, X.L. Aranguren, X. Agirre,
A. Luttun, F. Prosper, A. Pascual-Montano, and A. Rubio. Quantification of
mirna-mrna interactions. PloS one, 7(2):e30766, 2012.
[48] G.T. Huang, C. Athanassiou, and P.V. Benos. mirconnx: condition-specific mrna-
microrna network integrator. Nucleic acids research, 39(suppl 2):W416–W423,
2011.
[49] S. Nam, M. Li, K. Choi, C. Balch, S. Kim, and K.P. Nephew. Microrna and
mrna integrated analysis (mmia): a web tool for examining biological functions of
microrna expression. Nucleic acids research, 37(suppl 2):W356–W362, 2009.
[50] S. Wuchty, D. Arjona, A. Li, Y. Kotliarov, J. Walling, S. Ahn, A. Zhang, D. Maric,
R. Anolik, J.C. Zenklusen, et al. Prediction of associations between micrornas and
gene expression in glioma biology. PLoS One, 6(2):e14681, 2011.
[51] J. C. Huang, T. Babak, T. W. Corson, G. Chua, S. Khan, B. L. Gallie, T. R.
Hughes, B. J. Blencowe, B. J. Frey, and Q. D. Morris. Using expression profiling
data to identify human microRNA target. Nature Methods, 4:1045–1049, 2007.
[52] J. Huang, Q. Morris, and B. Frey. Detecting microRNA targets by linking sequence,
microRNA and gene expression data. In Research in Computational Molecular
Biology, pages 114–129. Springer, 2006.
[53] JC Huang, BJ Frey, and QD Morris. Comparing sequence and expression for. In
Pacific Symposium on Biocomputing, volume 13, pages 52–63, 2008.
[54] S. van Dongen, C. Abreu-Goodger, and A.J. Enright. Detecting microrna binding
and sirna off-target effects from expression data. Nature methods, 5(12):1023–1025,
2008.
[55] V. A. Gennarino et al. Identification of microrna-regulated gene networks by
expression analysis of target genes. Genome Research, 2012.
[56] V. A. Gennarino, M. Sardiello, R. Avellino, N. Meola, V. Maselli, S. Anand, L. Cu-
tillo, A. Ballabio, and S. Banfi. MicroRNA target prediction by expression analysis
of host genes. Genome Res, 19:481–490, Dec. 2008.
[57] W. Ritchie, S. Flamant, and J.E.J. Rasko. mimirna: a microrna expression pro-
filer and classification resource designed to identify functional correlations between
micrornas and their targets. Bioinformatics, 26(2):223–227, 2010.
[58] Je-Hyun Yoon, Kotb Abdelmohsen, and Myriam Gorospe. Post-transcriptional
gene regulation by long noncoding rna. Journal of molecular biology, 2012.
[59] Mitchell Guttman, Julie Donaghey, Bryce W Carey, Manuel Garber, Jennifer K
Grenier, Glen Munson, Geneva Young, Anne Bergstrom Lucas, Robert Ach, Lau-
rakay Bruhn, et al. lincrnas act in the circuitry controlling pluripotency and dif-
ferentiation. Nature, 477(7364):295–300, 2011.
[60] Margaret S Ebert and Phillip A Sharp. Microrna sponges: progress and possibili-
ties. Rna, 16(11):2043–2050, 2010.
[61] Huidong Wang, Anna Iacoangeli, Daisy Lin, Keith Williams, Robert B Denman,
Christopher UT Hellen, and Henri Tiedge. Dendritic bc1 rna in translational control
mechanisms. The Journal of cell biology, 171(5):811–821, 2005.
[62] Maite Huarte, Mitchell Guttman, David Feldser, Manuel Garber, Magdalena J
Koziol, Daniela Kenzelmann-Broz, Ahmad M Khalil, Or Zuk, Ido Amit, Michal
Rabani, et al. A large intergenic noncoding rna induced by p53 mediates global
gene repression in the p53 response. Cell, 142(3):409–419, 2010.
[63] A. Rodriguez, S. Griffiths-Jones, J.L. Ashurst, and A. Bradley. Identification
of mammalian microRNA host genes and transcription units. Genome research,
14(10a):1902, 2004.
[64] S. Baskerville and D. P. Bartel. Microarray profiling of microRNAs reveals frequent
coexpression with neighboring miRNAs and host genes. RNA, 11(3):241–247, 2005.
[65] J. Lu, G. Getz, E.A. Miska, E. Alvarez-Saavedra, J. Lamb, D. Peck, A. Sweet-
Cordero, B.L. Ebert, R.H. Mak, A.A. Ferrando, et al. MicroRNA expression profiles
classify human cancers. Nature, 435(7043):834–838, 2005.
[66] R. Bargaje, M. Hariharan, V. Scaria, and B. Pillai. Consensus miRNA expres-
sion profiles derived from interplatform normalization of microarray data. RNA,
16(1):16, 2010.
[67] Y. Liang, D. Ridzon, L. Wong, and C. Chen. Characterization of microRNA ex-
pression profiles in normal human tissues. BMC genomics, 8(1):166, 2007.
[68] P.E. Blower, J.S. Verducci, S. Lin, J. Zhou, J.H. Chung, et al. MicroRNA
expression profiles for the NCI-60 cancer cell panel. Molecular Cancer Therapeutics,
6(5):1483, 2007.
[69] D. Wang, M. Lu, J. Miao, T. Li, E. Wang, and Q. Cui. Cepred: predicting the
co-expression patterns of the human intronic microRNAs with their host genes.
PLoS One, 4(2), 2009.
[70] D. Ronchetti, M. Lionetti, L. Mosca, L. Agnelli, A. Andronache, S. Fabris, G.L.
Deliliers, and A. Neri. An integrative genomic approach reveals coordinated ex-
pression of intronic miR-335, miR-342, and miR-561 with deregulated host genes
in multiple myeloma. BMC Medical Genomics, 1(1):37, 2008.
[71] Y. K. Kim and V. N. Kim. Processing of intronic microRNAs. The EMBO Journal,
26:775–783, 2007.
[72] S.C. Li, P. Tang, and W.C. Lin. Intronic microRNA: discovery and biological
implications. DNA and Cell Biology, 26(4):195–207, 2007.
[73] J. Piriyapongsa, L. Mariño-Ramírez, and I.K. Jordan. Origin and evolution of
human micrornas from transposable elements. Genetics, 176(2):1323–1337, 2007.
[74] J. Khatun. An integrated encyclopedia of dna elements in the human genome.
Nature, 2012.
[75] H. Ishizu, H. Siomi, and M.C. Siomi. Biology of piwi-interacting rnas: new insights
into biogenesis and function inside and outside of germlines. Genes & Development,
26(21):2361–2373, 2012.
[76] R.C. Lee, R.L. Feinbaum, V. Ambros, et al. The c. elegans heterochronic gene lin-4
encodes small rnas with antisense complementarity to lin-14. Cell, 75(5):843–854,
1993.
[77] A. Esquela-Kerscher and F.J. Slack. Oncomirs micrornas with a role in cancer.
Nature Reviews Cancer, 6(4):259–269, 2006.
[78] K. Steffy, C. Allerson, and B. Bhat. Perspectives in microrna therapeutics. Phar-
maceutical Technology, 35:a18–s24, 2011.
[79] G.A. Calin and C.M. Croce. Microrna signatures in human cancers. Nature Reviews
Cancer, 6(11):857–866, 2006.
[80] S. Volinia, G.A. Calin, C.G. Liu, S. Ambs, A. Cimmino, F. Petrocca, R. Visone,
M. Iorio, C. Roldo, M. Ferracin, et al. A microrna expression signature of human
solid tumors defines cancer gene targets. Proceedings of the National Academy of
Sciences of the United States of America, 103(7):2257–2261, 2006.
[81] A.G. Uren, J. Kool, K. Matentzoglu, J. De Ridder, J. Mattison, M. Van Uitert,
W. Lagcher, D. Sie, E. Tanger, T. Cox, et al. Large-scale mutagenesis in
p19ARF- and p53-deficient mice identifies cancer genes and their collaborative
networks. Cell, 133(4):727–741, 2008.
[82] C.M. Croce and G.A. Calin. mirnas, cancer, and stem cell division. Cell, 122(1):6–7,
2005.
[83] J.A. Chan, A.M. Krichevsky, and K.S. Kosik. Microrna-21 is an antiapoptotic
factor in human glioblastoma cells. Cancer research, 65(14):6029, 2005.
[84] S. Djebali, C.A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer,
J. Lagarde, W. Lin, F. Schlesinger, et al. Landscape of transcription in human cells.
Nature, 489(7414):101–108, 2012.
[85] Inha Heo, Chirlmin Joo, Young-Kook Kim, Minju Ha, Mi-Jeong Yoon, Jun Cho,
Kyu-Hyeon Yeom, Jinju Han, and V Narry Kim. Tut4 in concert with lin28 sup-
presses microrna biogenesis through pre-microrna uridylation. Cell, 138(4):696–708,
2009.
[86] G. Hutvagner and M.J. Simard. Argonaute proteins: key players in rna silencing.
Nature Reviews Molecular Cell Biology, 9(1):22–32, 2008.
[87] J.G. Ruby, C.H. Jan, and D.P. Bartel. Intronic microrna precursors that bypass
drosha processing. Nature, 448(7149):83–86, 2007.
[88] Natascha Bushati and Stephen M Cohen. microrna functions. Annu. Rev. Cell
Dev. Biol., 23:175–205, 2007.
[89] R. Parker and U. Sheth. P bodies and the control of mrna translation and degra-
dation. Molecular cell, 25(5):635–646, 2007.
[90] A. Eulalio, F. Tritschler, and E. Izaurralde. The gw182 protein family in animal
cells: new insights into domains required for mirna-mediated gene silencing. Rna,
15(8):1433–1442, 2009.
[91] Antonio J Giraldez, Yuichiro Mishima, Jason Rihel, Russell J Grocock, Stijn
Van Dongen, Kunio Inoue, Anton J Enright, and Alexander F Schier. Ze-
brafish mir-430 promotes deadenylation and clearance of maternal mrnas. science,
312(5770):75–79, 2006.
[92] A. Eulalio, E. Huntzinger, T. Nishihara, J. Rehwinkel, M. Fauser, and E. Izaurralde.
Deadenylation is a widespread effect of mirna regulation. Rna, 15(1):21–32, 2009.
[93] L. He and G.J. Hannon. Micrornas: small rnas with a big role in gene regulation.
Nature Reviews Genetics, 5(7):522–531, 2004.
[94] P.S. Linsley, J. Schelter, J. Burchard, M. Kibukawa, M.M. Martin, S.R. Bartz, J.M.
Johnson, J.M. Cummins, C.K. Raymond, H. Dai, et al. Transcripts targeted by
the microrna-16 family cooperatively regulate cell cycle progression. Molecular and
cellular biology, 27(6):2240–2252, 2007.
[95] T.C. Chang, E.A. Wentzel, O.A. Kent, K. Ramachandran, M. Mullendore, K.H.
Lee, G. Feldmann, M. Yamakuchi, M. Ferlito, C.J. Lowenstein, et al. Transactiva-
tion of mir-34a by p53 broadly influences gene expression and promotes apoptosis.
Molecular cell, 26(5):745, 2007.
[96] J. Krutzfeldt, N. Rajewsky, R. Braich, K.G. Rajeev, T. Tuschl, M. Manoharan, and
M. Stoffel. Silencing of micrornas in vivo with antagomirs. Nature, 438(7068):685–
689, 2005.
[97] B. Gentner, G. Schira, A. Giustacchini, M. Amendola, B.D. Brown, M. Ponzoni,
and L. Naldini. Stable knockdown of microrna in vivo by lentiviral vectors. Nature
methods, 6(1):63–66, 2008.
[98] M.S. Ebert, J.R. Neilson, and P.A. Sharp. Microrna sponges: competitive inhibitors
of small rnas in mammalian cells. Nature methods, 4(9):721–726, 2007.
[99] Y.G. Li, P.P. Zhang, K.L. Jiao, and Y.Z. Zou. Knockdown of microrna-181 by
lentivirus mediated sirna expression vector decreases the arrhythmogenic effect of
skeletal myoblast transplantation in rat with myocardial infarction. Microvascular
research, 78(3):393–404, 2009.
[100] S.W. Chi, J.B. Zang, A. Mele, and R.B. Darnell. Argonaute hits-clip decodes
microrna–mrna interaction maps. Nature, 460(7254):479–486, 2009.
[101] M. Hafner, M. Landthaler, L. Burger, M. Khorshid, J. Hausser, P. Berninger,
A. Rothballer, M. Ascano Jr, A.C. Jungkamp, M. Munschauer, et al.
Transcriptome-wide identification of rna-binding protein and microrna target sites
by par-clip. Cell, 141(1):129–141, 2010.
[102] F.E. Nicolas. Experimental validation of microrna targets using a luciferase reporter
system. Methods Mol Biol, 732:139–52, 2011.
[103] W. Van Leeuwen, M.J.M. Hagendoorn, T. Ruttink, R. Van Poecke, L.H.W. Van
Der Plas, and A.R. Van Der Krol. The use of the luciferase reporter system for in
planta gene expression studies. Plant Molecular Biology Reporter, 18(2):143–144,
2000.
[104] Ellen Siebring-van Olst, Christie Vermeulen, Renee X de Menezes, Michael How-
ell, Egbert F Smit, and Victor W van Beusechem. Affordable luciferase reporter
assay for cell-based high-throughput screening. Journal of biomolecular screening,
18(4):453–461, 2013.
[105] A. Kozomara and S. Griffiths-Jones. mirbase integrating microrna annotation and
deep-sequencing data. Nucleic acids research, 39(suppl 1):D152–D157, 2011.
[106] A. Krek, D. Grun, M.N. Poy, R. Wolf, L. Rosenberg, E.J. Epstein, P. MacMenamin,
I. da Piedade, K.C. Gunsalus, M. Stoffel, et al. Combinatorial microRNA target
predictions. Nature genetics, 37(5):495–500, 2005.
[107] N. Rajewsky, M. Vergassola, U. Gaul, and E.D. Siggia. Computational detection of
genomic cis-regulatory modules applied to body patterning in the early drosophila
embryo. BMC bioinformatics, 3(1):30, 2002.
[108] A.J. Enright, B. John, U. Gaul, T. Tuschl, C. Sander, D.S. Marks, et al. Microrna
targets in drosophila. Genome biology, 5(1):1–1, 2004.
[109] J. C. Huang, Q. D. Morris, and B. J. Frey. Bayesian inference of microRNA targets
from sequence and expression data. Journal of Computational Biology, 14:550–563,
2007.
[110] M.H. Radfar, W. Wong, and Q. Morris. Computational prediction of intronic
microrna targets using host gene expression reveals novel regulatory mechanisms.
PLoS One, 6(6):e19312, 2011.
[111] A.M. Monteys, R.M. Spengler, J. Wan, L. Tecedor, K.A. Lennox, Y. Xing, and B.L.
Davidson. Structure and activity of putative intronic miRNA promoters. RNA,
16(3):495, 2010.
[112] F. Ozsolak, L.L. Poling, Z. Wang, H. Liu, X.S. Liu, R.G. Roeder, X. Zhang, J.S.
Song, and D.E. Fisher. Chromatin structure analyses identify miRNA promoters.
Genes & development, 22(22):3172, 2008.
[113] N.J. Martinez, M.C. Ow, J.S. Reece-Hoyes, M.I. Barrasa, V.R. Ambros, and A.J.M.
Walhout. Genome-scale spatiotemporal analysis of Caenorhabditis elegans mi-
croRNA promoter activity. Genome research, 18(12):2005, 2008.
[114] X. Wang, Z. Xuan, X. Zhao, Y. Li, and M.Q. Zhang. High-resolution human
core-promoter prediction with CoreBoost HM. Genome research, 19(2):266, 2009.
[115] D. Golan, C. Levy, B. Friedman, and N. Shomron. Biased hosting of intronic
microRNA genes. Bioinformatics, 26(8):992, 2010.
[116] J. Ernst, H.L. Plasterer, I. Simon, and Z. Bar-Joseph. Integrating multiple evidence
sources to predict transcription factor binding in the human genome. Genome
research, 20(4):526, 2010.
[117] D.L. Corcoran, K.V. Pandit, B. Gordon, A. Bhattacharjee, N. Kaminski, and P.V.
Benos. Features of mammalian microRNA promoters emerge from polymerase II
chromatin immunoprecipitation data. PLoS One, 4(4):e5279, 2009.
[118] X. Zhou, J. Ruan, G. Wang, and W. Zhang. Characterization and identification
of microRNA core promoters in four model species. PLoS Comput Biol, 3(3):e37,
2007.
[119] Richard Durbin, Sean R Eddy, Anders Krogh, and Graeme Mitchison. Biological
sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge
university press, 1998.
[120] ME Peter. Targeting of mRNAs by multiple miRNAs: the next step. Oncogene,
29(15):2161–2164, 2010.
[121] S. Vasudevan, Y. Tong, and J.A. Steitz. Switching from repression to activation:
microRNAs can up-regulate translation. Science, 318(5858):1931, 2007.
[122] S. Vasudevan and J.A. Steitz. AU-rich-element-mediated upregulation of transla-
tion by FXR1 and Argonaute 2. Cell, 128(6):1105–1118, 2007.
[123] K.D. Swisher and R. Parker. Localization to, and Effects of Pbp1, Pbp4, Lsm12,
Dhh1, and Pab1 on Stress Granules in Saccharomyces cerevisiae. 2010.
[124] R. Lowry. Concepts and applications of inferential statistics. VassarStats: Web
Site for Statistical Computation, 2005.
[125] B. John, C. Sander, D.S. Marks, et al. Prediction of human microRNA targets.
Methods in Molecular Biology, 342:101,
2006.
[126] P. Landgraf, M. Rusu, R. Sheridan, A. Sewer, N. Iovino, A. Aravin, S. Pfeffer,
A. Rice, A.O. Kamphorst, M. Landthaler, et al. A mammalian microRNA expres-
sion atlas based on small RNA library sequencing. Cell, 129(7):1401–1414, 2007.
[127] M. H. Radfar, W. Wong, and Q. Morris. BayMiR: inferring evidence for endogenous
miRNA-induced gene repression from mRNA expression profiles. BMC Genomics,
21(23):3135–3148, 2013.
[128] J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized
linear models via coordinate descent. Journal of statistical software, 33(1):1, 2010.
[129] G.L. Papadopoulos, M. Reczko, V.A. Simossis, P. Sethupathy, and A.G. Hatzige-
orgiou. The database of experimentally supported targets: a functional update of
tarbase. Nucleic acids research, 37(suppl 1):D155–D158, 2009.
[130] I. Ulitsky, L.C. Laurent, and R. Shamir. Towards computational prediction of
microrna function and activity. Nucleic acids research, 38(15):e160–e160, 2010.
[131] R. C. Friedman et al. Most mammalian mRNAs are conserved targets of mi-
croRNAs. Genome Res., 19:92–105, 2009.
[132] Yuji Funakoshi, Yusuke Doi, Nao Hosoda, Naoyuki Uchida, Masanori Osawa, Ichio
Shimada, Masafumi Tsujimoto, Tsutomu Suzuki, Toshiaki Katada, and Shin-ichi
Hoshino. Mechanism of mrna deadenylation: evidence for a molecular interplay
between translation termination factor erf3 and mrna deadenylases. Genes & de-
velopment, 21(23):3135–3148, 2007.
[133] J. Friedman, T. Hastie, H. Hofling, and R. Tibshirani. Pathwise coordinate opti-
mization. The Annals of Applied Statistics, 1(2):302–332, 2007.
[134] M. Lukk, M. Kapushesky, J. Nikkila, H. Parkinson, A. Goncalves, W. Huber,
E. Ukkonen, and A. Brazma. A global map of human gene expression. Nature
biotechnology, 28(4):322–324, 2010.
[135] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical
and powerful approach to multiple testing. Journal of the Royal Statistical Society.
Series B (Methodological), pages 289–300, 1995.
[136] C. Cheng and L.M. Li. Inferring microrna activities by combining gene expression
with microrna target prediction. PLoS One, 3(4):e1989, 2008.
[137] C. Cheng, X. Fu, P. Alves, M. Gerstein, et al. mrna expression profiles show
differential regulatory effects of micrornas between estrogen receptor-positive and
estrogen receptor-negative breast cancer. Genome Biol, 10(9):R90, 2009.
[138] Z. Liang, H. Zhou, Z. He, H. Zheng, and J. Wu. miract: a web tool for evaluating
microrna activity based on gene expression data. Nucleic acids research, 39(suppl
2):W139–W144, 2011.
[139] P. Alexiou, M. Maragkakis, G.L. Papadopoulos, V.A. Simmosis, L. Zhang, and
A.G. Hatzigeorgiou. The diana-mirextra web server: from gene expression data to
microrna function. PLoS One, 5(2):e9171, 2010.
[140] K. Le Brigand, K. Robbe-Sermesant, B. Mari, and P. Barbry. Mirontop: min-
ing micrornas targets across large scale gene expression studies. Bioinformatics,
26(24):3131–3132, 2010.
[141] S. Volinia, R. Visone, M. Galasso, E. Rossi, and C.M. Croce. Identification of
microrna activity by targets’ reverse expression. Bioinformatics, 26(1):91–97, 2010.
[142] A. Arora and D.A.C. Simpson. Individual mrna expression profiles reveal the effects
of specific micrornas. Genome biology, 9(5):R82, 2008.
[143] Z. Yu, Z. Jian, S.H. Shen, E. Purisima, and E. Wang. Global analysis of mi-
crorna target gene expression reveals that mirna targets are lower expressed in
mature mouse and drosophila tissues than in the embryos. Nucleic acids research,
35(1):152–164, 2007.
[144] Z. Liang, H. Zhou, H. Zheng, and J. Wu.
Expression levels of micrornas are not associated with their regulatory activities.
Biology direct, 6(1):1–4, 2011.
[145] V. Jayaswal, M. Lutherborrow, and Y.H. Yang. Measures of association for iden-
tifying microrna-mrna pairs of biological interest. PloS one, 7(1):e29612, 2012.
[146] J. Lu and A.G. Clark. Impact of microrna regulation on variation in human gene
expression. Genome Research, 22(7):1243–1254, 2012.
[147] Thomas Derrien, Rory Johnson, Giovanni Bussotti, Andrea Tanzer, Sarah Dje-
bali, Hagen Tilgner, Gregory Guernec, David Martin, Angelika Merkel, David G
Knowles, et al. The gencode v7 catalog of human long noncoding rnas: Analysis of
their gene structure, evolution, and expression. Genome research, 22(9):1775–1789,
2012.
[148] Qing-Fei Yin, Li Yang, Yang Zhang, Jian-Feng Xiang, Yue-Wei Wu, Gordon G
Carmichael, and Ling-Ling Chen. Long noncoding rnas with snorna ends. Molecular
Cell, 2012.
[149] Mitchell Guttman, Ido Amit, Manuel Garber, Courtney French, Michael F Lin,
David Feldser, Maite Huarte, Or Zuk, Bryce W Carey, John P Cassady, Moran N
Cabili, et al. Chromatin signature reveals over a thousand highly conserved large
non-coding rnas in mammals. Nature, 458(7235):223–227, 2009.
[150] Chenguang Gong and Lynne E Maquat. lncrnas transactivate stau1-mediated mrna
decay by duplexing with 3’ UTRs via Alu elements. Nature, 470(7333):284–288,
2011.
[151] Liran Juan, Guohua Wang, Milan Radovich, Bryan P Schneider, Susan E Clare,
Yadong Wang, and Yunlong Liu. Potential roles of micrornas in regulating long
intergenic noncoding rnas. BMC medical genomics, 6(Suppl 1):S7, 2013.
[152] Ashwini Jeggari, Debora S Marks, and Erik Larsson. mircode: a map of puta-
tive microrna target sites in the long non-coding transcriptome. Bioinformatics,
28(15):2062–2063, 2012.
[153] Jiangchao Li, Xiaodong Li, Yan Li, Hong Yang, Lijing Wang, Yanru Qin, Haibo
Liu, Li Fu, and Xin-Yuan Guan. Cell-specific detection of mir-375 downregulation
for predicting the prognosis of esophageal squamous cell carcinoma by mirna in
situ hybridization. PloS one, 8(1):e53582, 2013.
[154] M Nielsen, JH Hansen, J Hedegaard, RO Nielsen, F Panitz, C Bendixen, and
B Thomsen. Microrna identity and abundance in porcine skeletal muscles deter-
mined by deep sequencing. Animal genetics, 41(2):159–168, 2010.
[155] Federica Collino, Maria Chiara Deregibus, Stefania Bruno, Luca Sterpone, Giulia
Aghemo, Laura Viltono, Ciro Tetta, and Giovanni Camussi. Microvesicles derived
from adult human bone marrow and tissue specific mesenchymal stem cells shuttle
selected pattern of mirnas. PLoS One, 5(7):e11803, 2010.
[156] Ruotian Li, Guijun Yan, Qiaoling Li, Haixiang Sun, Yali Hu, Jianxin Sun, and
Biao Xu. Microrna-145 protects cardiomyocytes against hydrogen peroxide (h2o2)-
induced apoptosis through targeting the mitochondria apoptotic pathway. PloS
one, 7(9):e44907, 2012.
[157] Carla P Concepcion, Yoon-Chi Han, Ping Mu, Ciro Bonetti, Evelyn Yao, Aleco
D’Andrea, Joana A Vidigal, William P Maughan, Paul Ogrodowski, and Andrea
Ventura. Intact p53-dependent responses in mir-34–deficient mice. PLoS Genetics,
8(7):e1002797, 2012.
[158] Li Wang, Xin Chen, Yanyan Zheng, Fen Li, Zheng Lu, Chen Chen, Jin Liu,
Yu Wang, Yajing Peng, Zhongliang Shen, et al. Mir-23a inhibits myogenic differen-
tiation through down regulation of fast myosin heavy chain isoforms. Experimental
Cell Research, 2012.
[159] Mariana Lagos-Quintana, Reinhard Rauhut, Abdullah Yalcin, Jutta Meyer, Win-
fried Lendeckel, and Thomas Tuschl. Identification of tissue-specific micrornas from
mouse. Current Biology, 12(9):735–739, 2002.
[160] Juanjie Bo, Guoliang Yang, Kailing Huo, Haifeng Jiang, Lianhua Zhang, Dongming
Liu, and Yiran Huang. microrna-203 suppresses bladder cancer development by
repressing bcl-w expression. Febs Journal, 278(5):786–792, 2011.
[161] Shui-Long Guo, Zheng Peng, Xue Yang, Kai-Ji Fan, Hui Ye, Zhen-Hua Li, Yan
Wang, Xiao-Li Xu, Jun Li, You-Liang Wang, et al. mir-148a promoted cell prolif-
eration by targeting p27 in gastric cancer cells. International journal of biological
sciences, 7(5):567, 2011.
[162] Lawrence S Hon, Zemin Zhang, et al. The roles of binding site arrangement and
combinatorial targeting in microrna repression of gene expression. Genome Biol,
8(8):R166, 2007.
[163] Li Huang, Junhua Luo, Qingqing Cai, Qiuhui Pan, Hong Zeng, Zhenghui Guo, Wen
Dong, Jian Huang, and Tianxin Lin. Microrna-125b suppresses the development
of bladder cancer by targeting e2f3. International Journal of Cancer, 128(8):1758–
1769, 2011.
[164] Marcella Cesana, Davide Cacchiarelli, Ivano Legnini, Tiziana Santini, Olga
Sthandier, Mauro Chinappi, Anna Tramontano, and Irene Bozzoni. A long noncod-
ing rna controls muscle differentiation by functioning as a competing endogenous
rna. Cell, 147(2):358–369, 2011.
[165] Jinong Feng, Guihua Sun, Jin Yan, Katie Noltner, Wenyan Li, Carolyn H Buzin,
Jeff Longmate, Leonard L Heston, John Rossi, and Steve S Sommer. Evidence
for x-chromosomal schizophrenia associated with microrna alterations. PLoS One,
4(7):e6121, 2009.
[166] Iris Pinheiro, Lien Dejager, and Claude Libert. X-chromosome-located micrornas in
immunity: Might they explain male/female differences? Bioessays, 33(11):791–802,
2011.
[167] Ian Dunham, Ewan Birney, Bryan R Lajoie, Amartya Sanyal, Xianjun Dong,
Melissa Greven, Xinying Lin, Jie Wang, Troy W Whitfield, Jiali Zhuang, et al.
An integrated encyclopedia of dna elements in the human genome. 2012.