Sequencing Data Report
microRNA Discovery Sequencing Service

On
sample_47362344

For
Dr. Researcher
Life Sciences University of USA

Prepared by
LC Sciences, LLC
Jan. 1, 2009
I. PROJECT INFORMATION

Project related information is listed in Table 1.

Table 1. Sample, service, and project tracking information

Project Information
Customer Sample Name:       sample_47362344
Sample Type:                Human total RNA
Date Sample Received:       12/15/2009
Service Requested:          microRNA Discovery Sequencing Service
Data Analysis Requested:    Standard Data Analysis
LCS Project Number:         47362344
LCS Sample ID:              sample1

II. DATA REPORT

The received RNA sample was processed to generate a cDNA library, which was then used for deep sequencing. The data generated were analyzed and the full data files of 2-3 Gb were saved onto a DVD disc, which is included in this report. Experimental procedures and analysis methods are described in Section III of this report. The statistics of the data analysis are given in file Data_summary_sample1.xls and a summary is presented in Table 2. The detailed data files, which may be in tens of Mbs, and the recommended software programs for reviewing the data are given in Table 3.

Terminologies Used

Sequ Seq: Raw sequencing reads generated after image extraction and base-calling
Unique Seq: Family of sequ seqs with the same sequence
Copy Number: Number of sequ seqs in the same unique seq family
Count: Number of sequ seqs in the same unique seq family
Mapping: Aligning a sequence to a reference database
Mir: pre-miRNA registered in miRBase
miR: mature miRNAs registered in miRBase
Table 2. A summary of standard data analysis results

                                          #SequSeq   %SequSeq   #UniqueSeq   %UniqueSeq
Raw                                      6,870,727    100.00%    1,445,905      100.00%
Mappable                                 4,764,254     69.34%       40,890        2.83%
Mapped to miRbase (including nohit 1)    4,200,913     61.14%        8,971        0.62%
Mapped to Cluster I                      4,181,077     60.85%        7,930        0.55%
Mapped to Cluster II                           110      0.00%            9        0.00%
Mapped to Cluster III                        2,116      0.03%          134        0.01%
Mapped to Cluster IV                        13,191      0.19%        1,128        0.08%
Mapped (total)                           4,196,494     61.08%        9,201        0.64%
Nohit (including nohit 1 and nohit 2)      567,760      8.26%       31,689        2.19%

Flow chart of sequencing data analysis of a single sequencing reaction through various filters and the number of miRNAs detected.

[Flow chart not reproduced. Labels recovered from the figure:
 Removed by filters: 7.42% mRNA, RFam, repbase filter; 1.35% ADT filter; 0.05% Junk filter; 0.07% Sequence pattern filter
 9,031,007 reads (91%) are mappable
 Removed by optional filter: 6.07% length < 15 or > 26; 5.96% copy# < 3
 7,944,551 reads (88%) passed optional filter
 4,722,478 reads (59%) are mapped to or are miRNA candidates; 41% unmapped
 39% (group 1), 300 miRNAs detected; 4% (group 2), 27 miRNAs detected; 19% (group 3), 149 miRNAs detected; 38% (group 4), 292 miRNAs detected]
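The percentage columns of Table 2 follow directly from the read counts. The short Python sketch below recomputes a few %SequSeq values from the #SequSeq column and checks that mapped and nohit reads add up to the mappable total. It is only an illustration of the arithmetic: the variable and function names are made up for this example, and rounding to two decimals is an assumption about how the table was prepared.

```python
# Counts copied from the #SequSeq column of Table 2.
raw = 6_870_727
counts = {
    "Mappable": 4_764_254,
    "Mapped (total)": 4_196_494,
    "Nohit": 567_760,
}


def percent_of_raw(count: int, total: int = raw) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


percentages = {name: percent_of_raw(n) for name, n in counts.items()}

# Mapped and nohit reads should together account for every mappable read.
balanced = counts["Mapped (total)"] + counts["Nohit"] == counts["Mappable"]
```

The same check can be repeated on the #UniqueSeq column by replacing the counts and the total.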
Table 3. Data files delivered and programs recommended for reviewing

Folder          Data Files                         Description                                                              Reviewing Program
Raw Data        sample1_RawData.txt                Sequencing sequences (sequ seqs) as obtained from sequencer              Wordpad
Filtered Data   sample1_FHG_unique.txt             Sequ seqs listed by family (unique seqs)                                 Wordpad
                sample1_FHG_pass.txt               Unique seqs passed digital filters                                       Wordpad
                sample1_FHG_long.txt               Unique seqs with length >= 15                                            Wordpad
                sample1_FHG_short.txt              Unique seqs with length < 15                                             Wordpad
                sample1_FHG_hc.txt                 Unique seqs with copy number >= 3                                        Wordpad
                sample1_FHG_lc.txt                 Unique seqs with copy number < 3                                         Wordpad
                sample1_FHG_db.fa                  Final mappable unique seqs                                               Wordpad
Mapped Data     sample1_FHG_gp1_Align.txt          Cluster I: see Table 4                                                   Wordpad
                sample1_FHG_gp1_miRlist.txt                                                                                 Excel
                sample1_FHG_gp1_Sum.txt                                                                                     Excel
                sample1_FHG_gp2_Align.txt          Cluster II: see Table 4                                                  Wordpad
                sample1_FHG_gp2_miRlist.txt                                                                                 Excel
                sample1_FHG_gp2_Sum.txt                                                                                     Excel
                sample1_FHG_gp3_Align.txt          Cluster III: see Table 4                                                 Wordpad
                sample1_FHG_gp3_miRlist.txt                                                                                 Excel
                sample1_FHG_gp3_Sum.txt                                                                                     Excel
                sample1_FHG_gp4_Align.txt          Cluster IV: see Table 4                                                  Wordpad
                sample1_FHG_gp4_miRlist.txt                                                                                 Excel
                sample1_FHG_gp4_Sum.txt                                                                                     Excel
                sample1_FHG_uni_miRs.txt           The list of all unique seqs from Cluster I to IV                         Wordpad
                sample1_FHG_nohit.txt              Unique seqs having no hit with reference libraries or the genome         Wordpad
                sample1_FHG_clusterPosition.txt    Genomic chromosomal positions of the mapped unique seqs                  Excel
                sample1_FHG_miRdistribution.png    Plot of position of mapped unique seqs inside genome                     Paint
Summary         Data_summary_sample1.xls           Statistics of data analysis at the various steps and the final results   Excel
III. METHODS AND EXPERIMENTS

A. Small RNA Library Construction

A small RNA library was generated from the customer sample according to Illumina's sample preparation instruction [1]. A summary of the procedures performed is briefly described below.

1. Small RNA Isolation by Denaturing PAGE Gel

The received total RNA sample was size-fractionated on a 15% tris-borate-EDTA-Urea polyacrylamide gel. The RNA fragments of length 15-50 nts were isolated, quantified following gel elution, and ethanol precipitated.

2. 5' and 3' Adapter Ligation

The SRA 5' adapter (Illumina) was ligated to the aforementioned RNA fragments with T4 RNA ligase (Promega). The ligated RNAs were size-fractionated on a 15% tris-borate-EDTA-Urea polyacrylamide gel and the RNA fragments of size ~41-76 nts were isolated. The SRA 3' adapter (Illumina) ligation was then performed, followed by a second size-fractionation using the same gel condition as described above. The RNA fragments of size ~64-99 nts were isolated through gel elution and ethanol precipitation.

3. Reverse Transcription and PCR Amplification

The ligated RNA fragments were reverse transcribed to single-stranded cDNAs using M-MLV (Invitrogen) with RT-primers recommended by Illumina. The cDNAs were amplified with pfx DNA polymerase (Invitrogen) in 20 cycles of PCR using Illumina's small RNA primers set.

4. Purification of Amplified cDNA Library for Sequencing

The PCR products were purified on a 12% TBE polyacrylamide gel and a slice of gel of ~80-115 bps was excised. This fraction was eluted and the recovered cDNAs were precipitated and quantified on Nanodrop (Thermo Scientific) and on TBS-380 mini-fluorometer (Turner Biosystems) using Picogreen® dsDNA quantization reagent (Invitrogen). The concentration of the sample was adjusted to ~10 nM and a total of 10 µL was used in the sequencing reaction.

B. Deep Sequencing

The purified cDNA library was used for cluster generation on Illumina's Cluster Station and then sequenced on Illumina GAIIx following the vendor's instruction for running the instrument. Raw sequencing reads were obtained using Illumina's Pipeline v1.5 software
following sequencing image analysis by Pipeline Firecrest Module and base-calling by Pipeline Bustard Module. The extracted sequencing reads were stored in file sample1_RawData.txt and were then used in the standard data analysis, which is described in the next Section.

C. Standard Data Analysis

A proprietary software package, ACGT101-miR v3.x (LC Sciences), was used for standard data analysis. The key functions performed by this software and the relevant analysis results are described here.

5. Obtaining Mappable Sequences from Raw Sequencing Data

After the raw sequence reads, or sequenced sequences (sequ seqs), were extracted from image data, a series of digital filters (LC Sciences) were employed to remove various un-mappable sequencing reads. A Fasta file named sample1_FHG_db.fa was generated and used for mapping.

a. Generating Unique Families of Sequ Seqs by Sorting Raw Sequencing Reads

In this step, identical sequ seqs in the raw data file were counted and a unique family of sequences (unique seqs) file, sample1_FHG_unique.txt, was generated. An example of a typical entry of this file is shown below:

23 TTTGTCGG...CAATGGTG 18560

where 23 is the index of this sequence, followed by the sequ seq, and 18560 is the count (copy number) of the sequ seq.

b. Generating Mappable Sequ Seqs

In this step, the "impurity" sequences due to sample preparation, sequencing chemistry and processes, and the optical digital resolution of the sequencer detector were removed to give sequ seqs which were used to map with the reference database files. Those remaining sequ seqs were grouped by families (unique seqs) and stored in file sample1_FHG_pass.txt.

c. Filtering Unique Seqs by Length

In this step, unique seqs were separated into two groups based on their sequence lengths. Unique seqs with sequence length at or above a cut-off length (default = 15 nts for microRNA discovery) were saved in the file named sample1_FHG_long.txt, while those of shorter length were saved in the file named sample1_FHG_short.txt.
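Steps a and c above amount to counting identical reads and then splitting the resulting families by length. The Python sketch below illustrates that idea on a handful of made-up reads. It is not the ACGT101-miR implementation: the function names are hypothetical, and treating a length equal to the cut-off as "long" is an assumption taken from the "length >= 15" description in Table 3.

```python
from collections import Counter


def collapse_reads(reads):
    """Group identical reads; return (sequence, count) pairs, most frequent first."""
    counts = Counter(reads)
    # Sort by descending count, then by sequence so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def split_by_length(unique_seqs, cutoff=15):
    """Separate unique seqs into (long, short) lists using the length cut-off."""
    long_seqs = [(seq, n) for seq, n in unique_seqs if len(seq) >= cutoff]
    short_seqs = [(seq, n) for seq, n in unique_seqs if len(seq) < cutoff]
    return long_seqs, short_seqs


# Made-up reads for illustration only.
reads = ["ACGTACGTACGTACGTAC", "ACGT", "ACGTACGTACGTACGTAC", "TTTTGGGGCCCCAAAAT"]
unique = collapse_reads(reads)
long_seqs, short_seqs = split_by_length(unique)
```

Each (sequence, count) pair corresponds to one entry of the unique seqs file, with the count playing the role of the copy number.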
d. Filtering Unique Seqs by Copy Number

In this step, unique seqs were further sorted based on their copy numbers. Those with copy numbers at or above a predefined cut-off number (default = 3) were stored in the file named sample1_FHG_hc.txt, while those with fewer copies were stored in the file named sample1_FHG_lc.txt, where hc means high copy and lc means low copy.

e. Removing Unique Seqs from Certain Known RNA Reference Databases

Standard procedures were employed to remove those unique seqs which were mapped to mRNA, RFam and Repbase. The unique seqs which passed the filter at this step were saved in the file called sample1_FHG_db.fa.

6. Mapping Mappable Unique Seqs to Mirs and Genome

In this Section, various "mappings" were performed on unique seqs against pre-miRNA (mir) and mature miRNA (miR) sequences listed in the latest release of miRBase [2, 3, 4], or against the genome based on the public releases for the appropriate species. Mappings were also done on mirs of interest against genome sequence. Methods and criteria used for various mappings were documented in the ACGT-101 User's Manual [5]. Brief descriptions of the analyses are presented below and the characteristics of various groups of unique seqs are summarized in Table 4.

a. Mapping Unique Seqs to Mirs in miRBase

The cleaned unique seqs in sample1_FHG_db.fa were blasted against mirs in miRBase. The mapped unique seqs were grouped as "unique seqs mapped to mirs in miRBase", while the remaining ones were grouped as "unique seqs un-mapped to mirs in miRBase".

b. Mapping Mirs Mapped by Unique Seqs to Genome

The mirs to which the unique seqs in the "unique seqs mapped to mirs in miRBase" group were mapped were further blasted against the genome. The mirs mapped to the genome were sorted out and the unique seqs associated with these mirs were grouped as "unique seqs mapped to mirs that further mapped to genome". This group of unique seqs was categorized as Cluster I and saved in file sample1_FHG_gp1_miRlist.txt. Their alignments were presented in file sample1_FHG_gp1_Align.txt. A summary file was also generated and saved as sample1_FHG_gp1_Sum.txt.

The unique seqs mapped to the mirs that were not mapped to the genome were grouped as "unique seqs mapped to mirs that un-mapped to genome".
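The first split in step 6.a, mapped versus un-mapped to mirs, can be pictured with a small Python sketch. The report states that the real comparison was done by blasting against miRBase; the sketch below deliberately swaps in exact substring matching as a simplified stand-in, so it tolerates no mismatches. The function name and the two "mir-demo" precursor sequences are invented for this illustration.

```python
def partition_by_mir_hit(unique_seqs, mirs):
    """Split unique seqs into those found in at least one precursor and the rest.

    Exact substring matching is used here as a simplified stand-in for the
    BLAST-based mapping described in the report.
    """
    mapped = {}
    unmapped = []
    for seq in unique_seqs:
        hits = sorted(name for name, precursor in mirs.items() if seq in precursor)
        if hits:
            mapped[seq] = hits
        else:
            unmapped.append(seq)
    return mapped, unmapped


# Hypothetical precursors and unique seqs, for illustration only.
mirs = {
    "mir-demo-1": "GGCTAAATTGTCGTCCGAACGACCCATT",
    "mir-demo-2": "AAGTCTGACCGTCGTATAAGCC",
}
unique_seqs = ["TAAATTGTCGTCCGAACGACC", "GACCGTCG", "TTTTTTTTTTTTTTTTTT"]
mapped, unmapped = partition_by_mir_hit(unique_seqs, mirs)
```

The `mapped` dictionary plays the role of the "unique seqs mapped to mirs in miRBase" group, and `unmapped` the role of the remaining group that is later compared with the genome directly.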
c. Mapping Unique Seqs with Mirs Un-mapped to Genome to Genome

The unique seqs in the group "unique seqs mapped to mirs that un-mapped to genome" were blasted against the genome. The unique seqs that mapped to the genome here were grouped as "unique seqs mapped to mirs and genome but mirs un-mapped to genome" and were categorized as Cluster II and saved in file sample1_FHG_gp2_miRlist.txt. Their alignments were presented in file sample1_FHG_gp2_Align.txt. A summary file was also generated and saved as sample1_FHG_gp2_Sum.txt.

The remaining unique seqs were grouped as "unique seqs mapped to mirs but neither unique seqs nor their mirs mapped to genome".

d. Mapping Unique Seqs to miRs

The unique seqs in the group "unique seqs mapped to mirs but neither unique seqs nor their mirs mapped to genome" were further categorized based on whether the unique seqs were mapped to any mature miRNAs (miRs) in the mirs to which the unique seqs were mapped. All unique seqs in this group that were mapped to miRs were grouped as "unique seqs mapped to mirs and miRs but neither unique seqs nor their mirs mapped to genome" and were categorized as Cluster III and saved in file sample1_FHG_gp3_miRlist.txt. Their alignments were presented in file sample1_FHG_gp3_Align.txt. A summary file was also generated and saved as sample1_FHG_gp3_Sum.txt.

The rest of the unique seqs in the group, which were un-mapped to miRs, were further grouped as "unique seqs mapped to mirs but not to miR and neither unique seqs nor their mirs mapped to genome" and termed "unique seqs nohit 1".

e. Mapping Unique Seqs Un-mapped to Mirs to Genome

Unique seqs in the group "unique seqs un-mapped to mirs in miRBase" were blasted against the genome directly and those mapped to the genome were identified. The extended sequences of the mapped genome sequences were tested for possible formation of stable hairpins. When stable hairpins were predicted, their associated unique seqs were grouped as "unique seqs un-mapped to mirs but mapped to genome with possible hairpin formation". These unique seqs were categorized as Cluster IV and saved in file sample1_FHG_gp4_miRlist.txt. Their alignments were presented in file sample1_FHG_gp4_Align.txt. A summary file was also generated and saved as sample1_FHG_gp4_Sum.txt.

All unique seqs in Cluster I to IV were listed in sample1_FHG_uni_miRs.txt as mapped miRs or predicted miRs.
The unique seqs that were mapped neither to mirs in miRBase nor to the genome were grouped as "unique seqs un-mapped to mirs and genome" and were termed "unique seqs nohit 2".

The unique seqs in both groups, "unique seqs nohit 1" and "unique seqs nohit 2", were combined as "unique seqs nohit" and saved in file sample1_FHG_nohit.txt.

f. Plot of the Chromosome Genomic Positions of the Mapped Unique Seqs

The genomic positions of the Cluster I to IV sequences were mapped to chromosomes and the results were saved in the file named sample1_FHG_clustposition.txt and displayed in the plot file named sample1_FHG_miRdistribution.png.

Table 4. Summary of mapping of unique seqs to mirs, miRs, and genome*

                                                                                                  Unique seqs Mapped*
Clusters            Group Description                                                             mir   Genome   miR   Comments
Cluster I           Unique seqs mapped to mirs that further mapped to genome                      √     √
Cluster II          Unique seqs mapped to mirs and genome but mirs un-mapped to genome            √     √              mirs un-mapped to genome
Cluster III         Unique seqs mapped to mirs and miRs but neither unique seqs nor their mirs    √     ×        √     mirs un-mapped to genome
                    mapped to genome
Cluster IV          Unique seqs un-mapped to mirs but mapped to genome with possible hairpin      ×     √
                    formation
Unique seqs nohit   Unique seqs mapped to mirs but not to miR and neither unique seqs nor their   √     ×        ×     mirs un-mapped to genome
                    mirs mapped to genome (unique seqs nohit 1)
                    Unique seqs un-mapped to mirs and genome (unique seqs nohit 2)                ×     ×

* Note:
√ indicates that a unique seq was mapped to the mir, miR, or genome.
× indicates that a unique seq was not mapped to the mir, miR, or genome.
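The grouping summarized in Table 4 can be read as a short decision procedure over the mapping outcomes for each unique seq. The Python sketch below is one way to express it, assuming each outcome is already available as a yes/no flag. The function and parameter names are hypothetical. The report does not say how a unique seq is labeled when it maps to the genome without a mir hit and without a predicted hairpin, so the sketch returns "unclassified" for that case as a placeholder.

```python
def classify_unique_seq(seq_hits_mir, mir_hits_genome, seq_hits_genome,
                        seq_hits_mature, hairpin_predicted):
    """Assign a unique seq to a Table 4 group from its mapping outcomes."""
    if seq_hits_mir:
        if mir_hits_genome:
            return "Cluster I"   # mir itself maps to the genome
        if seq_hits_genome:
            return "Cluster II"  # unique seq maps to the genome, its mir does not
        if seq_hits_mature:
            return "Cluster III"  # maps to a mature miR, nothing maps to the genome
        return "nohit 1"
    if seq_hits_genome:
        # Cluster IV requires a predicted stable hairpin around the genome hit.
        # The no-hairpin case is not specified in the report (placeholder label).
        return "Cluster IV" if hairpin_predicted else "unclassified"
    return "nohit 2"


example = classify_unique_seq(
    seq_hits_mir=True, mir_hits_genome=False, seq_hits_genome=False,
    seq_hits_mature=True, hairpin_predicted=False,
)
```

Here `example` evaluates to "Cluster III", matching the row of Table 4 with √ for mir, × for genome, and √ for miR.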
Length Distribution of Mappable Data

Length (nt)   # of Reads   % of Total Reads   # of Unique Seqs   Reads # / Unique seqs
15               105,314               1.33              2,990                    35.2
16                81,131               1.02              1,344                    60.4
17                77,672               0.98              1,154                    67.3
18               189,776               2.39              1,519                   124.9
19               200,639               2.53              2,003                   100.2
20               377,824               4.76              2,797                   135.1
21             2,203,539              27.74              5,134                   429.2
22             3,533,567              44.48              6,370                   554.7
23               926,897              11.67              5,290                   175.2
24               184,713               2.33              2,989                    61.8
25                42,930               0.54              1,762                    24.4
26                20,549               0.26              1,119                    18.4
Total          7,944,551                100             34,471                   230.5

[Bar chart not reproduced: number of reads (vertical axis, 0 to 4,000,000) at each length from 15 to 26 nt (horizontal axis), with the same values as the # of Reads column above.]
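The two derived columns of the length distribution, percentage of total reads and reads per unique seq, can be recomputed from the two count columns. The Python sketch below does this with the counts from the table. The variable names are chosen for this example, and the rounding (two decimals and one decimal) is an assumption about how the table was prepared.

```python
# Read counts and unique seq counts by length (nt), copied from the table.
reads = {
    15: 105_314, 16: 81_131, 17: 77_672, 18: 189_776, 19: 200_639, 20: 377_824,
    21: 2_203_539, 22: 3_533_567, 23: 926_897, 24: 184_713, 25: 42_930, 26: 20_549,
}
unique_seqs = {
    15: 2_990, 16: 1_344, 17: 1_154, 18: 1_519, 19: 2_003, 20: 2_797,
    21: 5_134, 22: 6_370, 23: 5_290, 24: 2_989, 25: 1_762, 26: 1_119,
}

total_reads = sum(reads.values())
total_unique = sum(unique_seqs.values())

# Share of all reads at each length, in percent.
percent_of_total = {
    length: round(100.0 * n / total_reads, 2) for length, n in reads.items()
}

# Average number of reads per unique seq at each length.
reads_per_unique = {
    length: round(reads[length] / unique_seqs[length], 1) for length in reads
}
```

The totals reproduce the last row of the table, and the peak at 21 to 22 nt is visible in both derived columns.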
microRNA Discovery Sequencing Data Report – sample_47362344
LC Sciences, LLC | www.LCSciences.com | support@LCSciences.com
2575 W. Bellfort, Suite 270, Houston, Texas 77054 Tel. 713 664-7087, Fax 713 664-8181
Chromosomal location of pre-miRNAs. The relative locations of individual pre-miRNAs (mir) are shown across the 19 chromosomes. MID (Maximum Inter-Distance) is the maximum distance between any two pre-miRNAs on the same chromosome that are considered to be in the same cluster. Fifty-nine clusters (black dots) are obtained when the MID is limited to 50 kb.
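One way to picture the MID rule is to sort the pre-miRNA positions on each chromosome and start a new cluster whenever the gap to the previous position exceeds the MID. The Python sketch below does exactly that. It rests on assumptions: the caption does not say whether the distance is measured between neighbours or across the whole cluster, nor whether a lone pre-miRNA counts as a cluster, and the function name and coordinates are invented for the example.

```python
from collections import defaultdict


def cluster_by_mid(positions, mid=50_000):
    """Group (chromosome, coordinate) pairs into clusters.

    Neighbouring positions on the same chromosome stay in one cluster as long
    as the gap between them is at most `mid` bases (an assumed reading of MID).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in positions:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        for pos in sorted(by_chrom[chrom]):
            # A gap larger than the MID closes the current cluster.
            if current and pos - current[-1] > mid:
                clusters.append((chrom, current))
                current = []
            current.append(pos)
        if current:
            clusters.append((chrom, current))
    return clusters


# Made-up coordinates for illustration only.
positions = [("chr1", 1_000), ("chr1", 30_000), ("chr1", 200_000), ("chr2", 500)]
clusters = cluster_by_mid(positions)
```

With these inputs the first two chr1 positions fall into one cluster, while the third chr1 position and the chr2 position each stand alone.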
IV. REFERENCE

[1] "Preparing Samples for Analysis of Small RNA", Illumina Inc., Part # 1004239 Rev. A, 2008;
[2] Griffiths-Jones, S., Saini, H.K., van Dongen, S., Enright, A.J., miRBase: tools for microRNA genomics, 2008, Nucleic Acids Research, 36, D154-D158;
[3] Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., Enright, A.J., miRBase: microRNA sequences, targets and gene nomenclature, 2006, Nucleic Acids Research, 34, D140-D144;
[4] Griffiths-Jones, S., The microRNA Registry, 2004, Nucleic Acids Research 32, D109-D111;
[5] LC Sciences ACGT 101 manual;
Example printouts of the included sequence data files are attached.
The data files included with each sample report are listed in Table 3.
These printouts represent truncated sample data files.
sample1_RawData
>ILLUMINA-57021F:7:1:3:1127#0/1
ACATTGGGTTCTCATTCAAATATACTTTTGAAGTATGTGC
>ILLUMINA-57021F:7:1:3:352#0/1
ACGAAGAGGGAGCGCAATNNNTCAGTATATATTGAAGGAC
>ILLUMINA-57021F:7:1:3:804#0/1
TCTGAGCAGTGACTAGNACCCGTAATANGAGGTGAGCAGC
>ILLUMINA-57021F:7:1:3:2003#0/1
CAACAAGGGTAAGTTAATGCAATCGCCCCTCCNNAAAGGG
>ILLUMINA-57021F:7:1:3:399#0/1
TTAAGGTAATTAGCGTGGGCGGTAGCGCTCTGTATAAGCT
>ILLUMINA-57021F:7:1:3:921#0/1
TAAGCAGGCCATGCCGCTACGNNGGGGATAAATCTGGCTG
>ILLUMINA-57021F:7:1:3:1598#0/1
TATTGGGAGGGAATAGATCGCTGNCCAGCCTGATNTAAGG
>ILLUMINA-57021F:7:1:3:1981#0/1
GCCGCCTGCCTAGGTCTTCTTATCTTGAGAATGAGTCAAG
>ILLUMINA-57021F:7:1:3:118#0/1
TGAGCACACATTACATAATGCGGCTACTGTTTGACAAAGT
>ILLUMINA-57021F:7:1:3:1347#0/1
CTCGATGTAAGAGATCACTATTTCGCCACATGGTATTCCG
>ILLUMINA-57021F:7:1:3:185#0/1
TAAGGGACCTATTGTCAGCGGATCACAATGTCTTGAAGGA
>ILLUMINA-57021F:7:1:3:824#0/1
TTTGCAAACCGATGTCAGTGGCACTGCAAATGTCCACTGT
>ILLUMINA-57021F:7:1:3:1603#0/1
CAAAACAAACGTAATACGGCGGTATCCCACTAGAGTTGTC
>ILLUMINA-57021F:7:1:3:613#0/1
TCCCATGAATCAGGCCNGACTAGGGAAANTTCAATCAGAC
>ILLUMINA-57021F:7:1:3:1459#0/1
GCCCGCCGTTATGGACAATAAGTAAATTGCTACAATTGAC
>ILLUMINA-57021F:7:1:3:494#0/1
GCTGTTCTAGAAAATGGTTTATCTATTCCTGCGTCAATCT
>ILLUMINA-57021F:7:1:3:1747#0/1
TGGAAAACTCTATTAGAGTCTAACTTATCCAATGCGCACG
>ILLUMINA-57021F:7:1:3:1574#0/1
AGTAAATNATANAAATAAAATTAAAAAAAAAANAAAAAAA
>ILLUMINA-57021F:7:1:3:1448#0/1
CTATGTACAGCCACTCTCTTGATGGCGGGAAATATTTATT
>ILLUMINA-57021F:7:1:3:149#0/1
GGTGCTGGATTCCCGTTTTGCGTATTTTGGGAGAGGTCCA
>ILLUMINA-57021F:7:1:3:460#0/1
TAGGCTGTTTGCTACATTTTGAGACAAACTGTATAGAGTG
>ILLUMINA-57021F:7:1:3:1436#0/1
AATGGCGGAGCGATTTATAGGGAGAGGGGCGATTGGCTCG
>ILLUMINA-57021F:7:1:3:1847#0/1
GACTATCTGCCTGTAGCGGATAAGGCAGCATCCAACCTAA
>ILLUMINA-57021F:7:1:3:909#0/1
sample1_FHG_unique
# of input sequences: 6375923
# of families: 1436566; 1436566/6375923=21.0%
Index seq count
1 ATTATGGTACTTGTATTTAACAGGCTCACT 912465
2 CCTCTTATGTAGACCGTTGTCCAGTGGTGA 606708
3 ATTTTTATAGTACCAAGAGGCTACGCAGT 560130
4 GTAAAAAGTCTATCGCCGCACTGTCGTCA 243986
5 CTTAACGGTTCTAACTATTCACCGGTAAAG 185486
6 CAAAGAAGCCATAGGCGCCCGGGAACACC 172620
7 AACCGTGAGGCGCTGGAAAGACGCTAGAAG 177233
8 AGTAGTACTGGCGTACACATTCTCCACGG 171389
9 CATCCTATTCTAGCAATCAGGAGAACATTC 165265
10 AGGCCCATATCAAGAAGTAGAACTATCGA 137449
11 ACATGAATGGCGAATGCTTCCCGTGATA 128553
12 GTTAGCTAGTGCCCGGTTTTATCAAGCCC 115221
13 CGCGAATGTCTTCGTATGCTCAGGTAGC 124749
14 CATGAGAGGTCGAGGGACTTGATTCCTAC 114996
15 CTTCTGGCACCGTGGGCCAGCGGAAGGACA 113310
16 TGCCCCAACGACGCGGAAAATCAAGCGAGC 106560
17 CTTCTGGAAGCCAACGCTCTGGCGGGATCC 111243
18 AAACAAATTAACTCGACGACCTCTCCTCT 107082
19 AAACTATGCATTATTTCCCCCTAAGATCT 99460
20 CATATCTGTCTCCTACCAGATTATCACCCC 105833
21 TGTTTGCTGTGGCATTTTTCCCATGGATTA 102013
22 ATTATTAACGTGGTGTGGTAAATAGAGGGT 92352
23 GGTTCATGCCTAAATTGCATCTATAATA 81208
24 GAGTCGTAACGCTACCCTATACGAAGCG 95005
25 CCTTGAATGTGACCCTGAGGCTTCTATTAG 80527
26 AAGAAGCGTCAACCCCCACGTCAAGACGT 84791
27 TCCGTTGCTAGCCGGAGGACCTCCTGGT 86030
28 CCGCAGAGCCAGGCACTATGTCAGGGGCTA 91730
29 CATATGGCAAAAGACCGGACTGGACGCGA 89980
30 ACATCAAGAATCCTCAATCCTACGTGGACG 84201
31 AAGAGCCGGAGAACACATGATGGAGGCGAC 86202
32 TGCGATAAAAACGGTGATGACCAAAGAACA 72819
33 TGATCACGAAAAGTTGCTTGACAAGGTT 77665
34 CTGGTTAAGCACCCCCTGGTGGTGCTGCCT 82949
35 TGGGGCTACCGGGGCCTACACGCACCCAT 83437
36 GTTTATACTATTAATATGCAATGGTGACT 80270
37 GCCCCAAGCGTAGGTTGGGGGTCCGTTCG 69748
38 GCCCGGAGTCAGATGACTCGCTTACGTG 66301
39 CACGCTAGAAGGTGCTAGGGCTAGCTCTTT 81592
40 GTCATGGGAGCATTCATGCCGCGACGCAC 87826
41 CGCCTTTACTCCTGGAAGATATGACATGA 75275
42 GTAAACCCCGGGCGGTGCAACCACAGGCGG 64396
43 TGGTCTGGGCATTGTGCTTGAGCACACTTA 67972
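The printout above has a simple layout: a few summary lines starting with "#", a column header, and then one "index sequence count" entry per line. A minimal Python sketch for reading such entries is given below. The function name is hypothetical, and the handling of a thousands separator in the count is an assumption that only matters if the counts are ever written with commas.

```python
def parse_unique_lines(lines):
    """Return (index, sequence, count) tuples from unique-seq file lines."""
    entries = []
    for line in lines:
        fields = line.split()
        # Skip summary lines, blank lines, and the "Index seq count" header.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        index, seq, count = fields
        entries.append((int(index), seq, int(count.replace(",", ""))))
    return entries


# First lines of the printout above.
sample_lines = [
    "# of input sequences: 6375923",
    "# of families: 1436566; 1436566/6375923=21.0%",
    "Index seq count",
    "1 ATTATGGTACTTGTATTTAACAGGCTCACT 912465",
    "2 CCTCTTATGTAGACCGTTGTCCAGTGGTGA 606708",
]
entries = parse_unique_lines(sample_lines)
```

Each tuple can then be filtered by sequence length or copy number in the same way as the long/short and hc/lc files listed in Table 3.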
sample1_FHG_pass
# of input seq: 63847563
# of input family: 984732
# of seq after filter: 5876673; 5876673/6184267=95.0%
# of family after filter: 826122; 826122/987972=83.6%
# of seq with repeat fragment: 307594; 307594/6184267=5.0%
# of family with repeat fragment: 161850; 161850/987972=16.4%
# of seq of >=7A,>=8C,>=6G,>=7T: 307563; 307563/6184267=5.0%
# of family of >=7A,>=8C,>=6G,>=7T: 161819; 161819/987972=16.4%
# of seq of >10 dimer: 5; 5/6184267=0.0%
# of family of >10 dimer: 5; 5/987972=0.0%
# of seq of >6 trimer: 18; 18/6184267=0.0%
# of family of >6 trimer: 18; 18/987972=0.0%
# of seq of >5 tetramer: 8; 8/6184267=0.0%
# of family of >5 tetramer: 8; 8/987972=0.0%
Index seq count
1 TAGCACTCAAGTGTTTTGCACTGG 870337
2 AGAACGGTTTGCTATTTCTG 559024
3 TAACGGGGGTCACCTTCGGCAG 519382
4 CTCGAATTTTTCCAATCAC 198458
5 GTATGCATAATTGCAAGCACAT 135313
6 TTGCCTCACTCGTACAAAAGGCC 124018
7 GGATCAGGACCCGACTCCACATTAG 120386
8 AGAGGCCACCAAGATCTTAGGCC 115000
9 AAACAGCGTCTCAGTGTAATTG 99591
10 TCTGTCTATCTTCTTAAT 87246
11 CTGTCCCCCTTGTCTGATACA 78864
12 ACAAATGGCGGACGCGAATC 64239
13 GGCAAGTCTTCCTGCGA 59002
14 TGATGGAGGAACGTAAACGT 57130
15 GTCCACATCATCCCGGGGTCG 52244
16 TCGCGTATTGCTATTTAGGA 51227
17 ATCAACAACCAATCTCGATAT 49693
18 TATCAGTAACGTACATGCCCCCGAT 49008
19 ACTCCTGGAGGTCTCGCTCGTCTA 45497
20 AATCGAATCTTCGATACGTCGT 39503
21 CGTGGAGGGAGGCCGTCAGTTT 37584
22 GTCCATCTAGCCAATAGGC 32341
23 AGTAACCAACCGTGAGAGTGTTGGC 30809
24 GCTGGTTTTTGGAGCATG 30123
25 AGTAGGTCTGTAAGGGGT 29548
26 CAGTAAACGAAAGGACCGAGACT 29963
27 TTACACAAACATATCAGCGAT 27799
28 CTCGTTATTAGTAATACTC 24727
29 TAAGTGGTAAATTAACCGTTACACC 24531
30 TCGCGAGCTGACCAGTATCACG 23074
sample1_FHG_gp1_Align
input file: sample1/mirAlign/sample1_FHG_Align.txt for mirs
input file: sample1/db/sample1_FHG_db.fa for sequ seq
input file: sample1/output/sample1_FHG_gp1.txt for cluster & genomeSeq
Conventions:
1. the . in alignments means that the base is same as that in reference.
2. the * in alignments means that the base is same as that in reference, but the * is the mature part of precursor. The capital bases in * region also belong to mature.
3. the * in #error means that this sequseq has deletion compared with reference.
4. the + in #error means that this sequseq is without 3ADT cut, and the previous part of the sequseq is mapped to the reference and the other part is removed.
5. full length precursor and sequseq (except for the sequseq without 3ADT cut, which is indicated by + in #error) are listed below.
clusterNo=1 chr=1 gi=NT_032977 strand=1 #mirs=7 #copy(all)=3832 #family(all)=37 #copy(0error)=1167 #family(0error)=12 #copy(1error)=2665 #family(1error)=25
genome 11191944 GTATGCCTTAACAGCAAGCGCAGTAGCGTAGCGACTGGGCATGAACGCGACGTTGATGAACTCGTAAGTTCTTCCACAAGTCTGACCGTCGTATAAG 11192040 #error
hsa-mir-XXe 1 ................**********************....................**********************............ 92 hsa-miR-XXe,hsa-miR-XXe* ptr-mir-XXe 1 ...............**********************...................................................... 91 ptr-miR-XXe mml-mir-XXe 1 ..........a.....**********************...................................................... 92 mml-miR-XXe mmu-mir-XXe 1 ................**********************............c.......**********************............ 92 mmu-miR-XXe,mmu-miR-XXe* bta-mir-XXe 1 ................************************......c...cga....................................... 92 bta-miR-XXe-5p oan-mir-XXe 1 .t.............***********************..g...... c.ga.....**********************.............a.. 94 oan-miR-XXe,oan-miR-XXe* cfa-mir-XXe 1 ................................a.......g.********************** 64 cfa-miR-XXe 275_count=837 1 ........................ 24 0 2659_count=62 1 ....................... 23 0 4472_count=32 1 ....................... 23 0 10481_count=12 1 ...................... 22 0 16177_count=7 1 ......................... 25 0 16949_count=7 1 ........................ 24 0 28999_count=4 1 ......................... 25 0 189_count=1390 1 ............c........... 24 1 336_count=643 1 ............c.......... 23 1 990_count=194 1 ............c......... 22 1 3693_count=41 1 ........................t 25 1 6354_count=21 1 ...........c........... 23 1 7153_count=19 1 ........................a 25 1 8769_count=15 1 ............c........ 21 1 15029_count=8 1 ...........c............ 24 1 18204_count=7 1 ...........c.......... 22 1 23103_count=5 1 .......................a 24 1 27729_count=4 1 .............t.......... 24 1 40840_count=3 1 ..........t............. 24 1 47403_count=3 1 ............c...... 19 1 38914_count=3 1 .................t...... 24 1 37319_count=3 1 ............c....... 20 1 1341_count=133 1 ...................... 22 0 2672_count=61 1 ..................... 21 0 27623_count=4 1 ................... 
19 0 28829_count=4 1 ..................... 21 0 32112_count=4 1 .................... 20 0 1111_count=168 1 .....................t 22 1 2077_count=80 1 .................g.... 22 1 6959_count=19 1 .................g... 21 1 10215_count=12 1 ................g.... 21 1 12905_count=9 1 ......................a 23 1
19877_count=6 1 ................g... 20 1 42447_count=3 1 ....................t 21 1 clusterNo=2 chr=1 gi=NT_032977 strand=1 #mirs=5 #copy(all)=156 #family(all)=7 #copy(0error)=137 #family(0error)=4 #copy(1error)=19 #family(1error)=3 genome 11194868 CTCTTGCGAAAAATAAATAAACGCTCAATTAGATGGCGGCGGATTGGGTCCCCCCTAGAAGCGACAGGGTTGCTGCTGAACTCGGTGGTTCTGTGAG 11194968 #error
bta-mir- XXc 4 a..g.........c........***********************.........g............................................g. 104 bta-miR-XXc hsa-mir-XXc-1 1 ................***********************................**********************............ 89 hsa-miR-XXc ptr-mir-XXc-1 1 ...............***********************.................................................. 88 ptr-miR-XXc mmu-mir-XXc-1 1 ......t.........***********************................**********************............ 89 mmu-miR-XXc cfa-mir-XXc-1 1 ************************..................................... 61 cfa-miR-XXc 1630_count=106 1 ........................ 24 0 7100_count=19 1 ....................... 23 0 16195_count=7 1 ...................... 22 0 11836_count=10 1 .......................a 24 1 18340_count=6 1 ........................a 25 1 44145_count=3 1 ........................t 25 1 23678_count=5 1 ...................... 22 0 clusterNo=3 chr=1 gi=NT_019273 strand=1 #mirs=5 #copy(all)=22 #family(all)=2 #copy(0error)=19 #family(0error)=1 #copy(1error)=3 #family(1error)=1 genome 6049197 TCATAAAAATGTCGAGGAATGGCGGCTCGCGTAGACCCGCACCCCACCCCTTCGAAGCTCATTGCGTCAGTTCCACGATTC 6049277 #error
mmu-mir-XXX 1 .................................................**********************......... 80 mmu-miR-XXX bta-mir-XXX 1 .................................................**********************.g....a.. 80 bta-miR-XXX hsa-mir-XXX 1 ...............................................**********************...... 75 hsa-miR-XXX mne-mir-XXX 1 ...............................................*******************G**...... 75 mne-miR-XXX cfa-mir-XXX 1 .......................................********************** 61 cfa-miR-XXX 6971_count=19 1 ...................... 22 0 48948_count=3 1 .......................a 24 1 clusterNo=4 chr=1 gi=NT_019273 strand=1 #mirs=2 #copy(all)=33402 #family(all)=49 #copy(0error)=481 #family(0error)=12 #copy(1error)=32921 #family(1error)=37 genome 13122055 GCTCCCCTATAAGAAGCGCGGAAGCCGGCTTATATGTTTCCCCATTATCATCGAACTTTCGATTGGGCCCCGTAACTCT 13122133 #error
ptr-mir-XXX-1 1 ......................................**********************.................. 78 ptr-miR-XXX hsa-mir-XXX-1 1 ......................................**********************.................. 78 hsa-miR-XXX 1326_count=135 1 ...................... 22 0 1529_count=115 1 .................... 20 0 1624_count=107 1 ................... 19 0 4246_count=35 1 ..................... 21 0 4579_count=31 1 ....................... 23 0 6227_count=22 1 .................. 18 0 27689_count=4 1 .................... 20 0 45599_count=3 1 ...................... 22 0 29_count=20119 1 ....................g. 22 1 42_count=8560 1 ....................g.. 23 1 217_count=1182 1 ....................g 21 1 771_count=273 1 .....................g. 23 1 2650_count=62 1 ....................t. 22 1 2976_count=54 1 .....................t 22 1 3379_count=46 1 .....................g 22 1 3565_count=43 1 ...................a 20 1 4289_count=34 1 ......................t 23 1 5056_count=28 1 ....................g... 24 1 5748_count=24 1 .....................g.. 24 1 5990_count=23 1 ....................t 21 1
Sample
Data
sample1_FHG_gp1_miRlist
#sequ_seq_ID seq length
clusterNo=1: miR‐40e‐3p‐456
275_count=837 AAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA 3
2659_count=62 AAATTGTCGTCCGAACGACCCA 23 CTAAATTGTCGTCCGAACGACCCA 3
4472_count=32 AAATTGTCGTCCGAACGACCCA 23 CTAAATTGTCGTCCGAACGACCCA 3
10481_count=12 CTAAATTGTCGTCCGAACGACC 22 CTAAATTGTCGTCCGAACGACCCA 1
16177_count=7 CTAAATTGTCGTCCGAACGACCCA 25 CTAAATTGTCGTCCGAACGACCCA 1
18204_count=7 CTAAATTGTCGTCCGAACGACC 22 CTAAATTGTCGTCCGAACGACCCA 1
23103_count=5 TAAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA 2
27729_count=4 AAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA 3
40840_count=3 TAAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA 2
47403_count=3 AAATTGTCGTCCGAACGAC 19 CTAAATTGTCGTCCGAACGACCCA 3
38914_count=3 AAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA 3
37319_count=3 CTAAATTGTCGTCCGAACGA 20 CTAAATTGTCGTCCGAACGACCCA 1
3
clusterNo=1: miR‐40e‐5p‐4567 3
1341_count=133 AAGAGTGCGTTGATTGTGGGTA 22 TCAAGAGTGCGTTGATTGTGGGTA 3
2672_count=61 CAAGAGTGCGTTGATTGTGGG 21 TCAAGAGTGCGTTGATTGTGGGTA 2
27623_count=4 AAGAGTGCGTTGATTGTGG 19 TCAAGAGTGCGTTGATTGTGGGTA 3
28829_count=4 AAGAGTGCGTTGATTGTGGGT 21 TCAAGAGTGCGTTGATTGTGGGTA 3
32112_count=4 AAGAGTGCGTTGATTGTGGG 20 TCAAGAGTGCGTTGATTGTGGGTA 3
1111_count=168 CAAGAGTGCGTTGATTGTGGGT 22 TCAAGAGTGCGTTGATTGTGGGTA 2
2077_count=80 AAGAGTGCGTTGATTGTGGGTA 22 TCAAGAGTGCGTTGATTGTGGGTA 3
6959_count=19 AAGAGTGCGTTGATTGTGGGT 21 TCAAGAGTGCGTTGATTGTGGGTA 3
10215_count=12 TCAAGAGTGCGTTGATTGTGG 21 TCAAGAGTGCGTTGATTGTGGGTA 1
12905_count=9 TCAAGAGTGCGTTGATTGTGGGT 23 TCAAGAGTGCGTTGATTGTGGGTA 1
19877_count=6 AAGAGTGCGTTGATTGTGGG 20 TCAAGAGTGCGTTGATTGTGGGTA 3
42447_count=3 CAAGAGTGCGTTGATTGTGGG 21 TCAAGAGTGCGTTGATTGTGGGTA 2
35894_count=3 AAGAGTGCGTTGATTGTGGGTA 22 TCAAGAGTGCGTTGATTGTGGGTA 3
38078_count=3 AAGAGTGCGTTGATTGTGGGTA 23 TCAAGAGTGCGTTGATTGTGGGTA 3
41722_count=3 AAGAGTGCGTTGATTGTGGGTA 23 TCAAGAGTGCGTTGATTGTGGGTA 3
3
clusterNo=2: miR‐40c‐5p‐4564 2
1630_count=106 AGTGGAGAGTGCCGCGTGTCTCG 24 GAGTGGAGAGTGCCGCGTGTCTCG 2
7100_count=19 GAGTGGAGAGTGCCGCGTGTCTC 23 GAGTGGAGAGTGCCGCGTGTCTCG 1
16195_count=7 AGTGGAGAGTGCCGCGTGTCTC 22 GAGTGGAGAGTGCCGCGTGTCTCG 2
11836_count=10 AGTGGAGAGTGCCGCGTGTCTCG 24 GAGTGGAGAGTGCCGCGTGTCTCG 2
18340_count=6 AGTGGAGAGTGCCGCGTGTCTCG 25 GAGTGGAGAGTGCCGCGTGTCTCG 2
44145_count=3 GAGTGGAGAGTGCCGCGTGTCTCG 25 GAGTGGAGAGTGCCGCGTGTCTCG 1
2
clusterNo=2: miR‐40c‐3p‐57674 3
23678_count=5 ATCGCAGAATGCGCCTTGAT 22 CATCGCAGAATGCGCCTTGAT 2
1
clusterNo=3: miR‐3456‐3p‐46778 3
13218_count=9 AACTAGCGGTCTCTTTCGCGT 21 AACTAGCGGTCTCTTTCGCGTGGA 1
9642_count=13 ACTAGCGGTCTCTTTCGCGTGG 22 AACTAGCGGTCTCTTTCGCGTGGA 2
sample1_FHG_gp1_Sum
input file: sample1/mirAlign/sample1_FHG_Align.txt for mirs
input file: sample1/db/sample1_FHG_db.fa for sequ seq
input file: sample1/output/sample1_FHG_gp1.txt for cluster & genomeSeq
Title: Position clusters of mirs mapping to genome
input file: sample1/lists/sample1_FHG_Align_chrY.txt for sequence start position
input file: sample1/mirAlign/sample1_FHG_matchedMirs.fa for mapped mir IDs
input file: sample1/lists/sample1_FHG_Align_chr1.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr2.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr3.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr4.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr5.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr6.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr7.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr8.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr9.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr10.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr11.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr12.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr13.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr14.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr15.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr16.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr17.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr18.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr19.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr20.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr21.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chr22.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chrX.txt for alignment data
input file: sample1/lists/sample1_FHG_Align_chrY.txt for alignment data
unique mammalian mirs mapped by sequ seq: 6456
# of unique mammalian mirs mapped by sequ seq & genome: 5256; 5256/6456=85.6%
# of position clusters of the mapped mirs: 574
length of the genome: 26277776868
position cluster: the distance between two neighbouring positions in a cluster is < 50
For the mirs mapped by sequ seq and genome after re‐alignment:
# of position clusters: 457
# of mammalian mirs mapped to sequ seq and genome: 4839
# of unique sequ seq: 5381077
# of unique sequ family: 5330
Note: the representative miR at 5p or 3p is the sequenced sequence with the lowest error# and the highest copy# in each cluster.
The miR_name is composed of 4 parts: miR, extension number, 5p or 3p, and index of sequ seq.
The extension number is the extension number that occurs most often in the mirIDs.
# of unique miRs detected: 563; the # of unique miRs is counted based on miR_name.
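The position-cluster rule stated above (neighbouring positions in a cluster are less than 50 apart) can be sketched as a single pass over sorted start positions. This is only an illustration: the function name, the example positions, and the decision to compare consecutive sorted positions are assumptions, since the report does not describe how the clustering was actually implemented.

```python
def position_clusters(positions, max_gap=50):
    """Group start positions so that neighbouring members are < max_gap apart.

    Hypothetical helper: the report only states the < 50 rule, not the code.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < max_gap:
            # Close enough to the last member of the current cluster.
            clusters[-1].append(pos)
        else:
            # Gap of max_gap or more: start a new cluster.
            clusters.append([pos])
    return clusters


# Hypothetical start positions on one contig (not taken from the report).
example_positions = [100, 120, 165, 400, 449, 1000]
example_clusters = position_clusters(example_positions)
```

Under this reading a cluster can span more than 50 positions in total, as long as each step between neighbours stays below the threshold.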
Index  clusterNo  miR_name  miR_seq  miR_len  copy# of the isoform  copy#(all isoforms in 5p or 3p)  family#(all isoforms in 5p or 3p)  chr#  chr_seqID  strand  miR_start  miR_end  mir_start  mir_end  #mirs  mirIDs
1 1 miR‐30e‐5p‐275 TGTAAACATCCTTGACTGGAAGCT 24 5779 4212 22 1 NT_032977 + 11191961 11191984 11191944 11192040 7 hsa‐mir‐XXX
2 20 miR‐28‐3p‐2238 CACTAGATTGTGAGCTCCTGGA 22 107 167 7 3 NT_005612 + 94901772 94901793 94901718 94901803 11 hsa‐mir‐XXX
3 57 miR‐27b‐3p‐127 TTCACAGTGGCTAAGTTCTGC 21 5448 3137 152 9 NT_008470 + 5168992 5169012 5168931 5169027 6 hsa‐mir‐XXX
4 87 miR‐625‐3p‐23720 GACTATAGAACTTTCCCCCTCA 22 230 128 2 14 NT_026437 + 46937624 46937645 46937572 46937656 1 hsa‐mir‐XXX
5 98 miR‐1839‐5p‐21498 AGGGTAGATAGAACAGGTCTTG 22 140 274 3 15 NT_077661 + 545115 545136 545108 545179 3 hsa‐mir‐XXX
6 107 miR‐21‐3p‐413 CAACACCAGTCGATGGGCTGTC 22 9100 4694 129 17 NT_010783 + 16571733 16571754 16571677 16571768 6 hsa‐mir‐XXX
7 117 miR‐7e‐3p‐37414 CTATACGGCCTCCTAGCTTTCC 22 253 190 1 19 NT_011109 + 24464281 24464302 24464221 24464313 4 hsa‐mir‐XXX
8 159 miR‐101‐3p‐121 GTACAGTACTGTGATAACTGAA 22 9940 3692 81 1 NT_032977 ‐ 35496044 35496065 35496031 35496113 6 hsa‐mir‐XXX
9 168 miR‐135b‐3p‐17671 ATGTAGGGCTAAAAGCCATGGG 22 110 85 1 1 NT_004487 ‐ 55907805 55907826 55907783 55907879 6 hsa‐mir‐XXX
10 179 miR‐425‐3p‐5257 ATCGGGAATGTCGTGTCCGCC 21 222 239 277 3 NT_022517 ‐ 48997597 48997617 48997584 48997670 8 hsa‐mir‐XXX
11 196 miR‐548p‐3p‐22151 CCAAAACTGCAGTTACTTTTGC 22 245 189 1 5 NT_034772 ‐ 2567212 2567233 2567198 2567281 2 hsa‐mir‐XXX
12 225 miR‐31‐5p‐412 AGGCAAGATGCTGGCATAGCTG 22 1906 5007 272 9 NT_008413 ‐ 21502156 21502177 21502099 21502201 6 hsa‐mir‐XXX
13 237 miR‐548l‐5p‐14118 AAAAGTATTTGCGGGTTTTGTC 22 259 112 3 11 NT_008984 ‐ 6461333 6461354 6461282 6461367 2 hsa‐mir‐XXX
14 238 miR‐125b‐5p‐90 TCCCTGAGACCCTAACTTGTGA 22 11592 9859 157 11 NT_033899 ‐ 25532933 25532954 25532880 25532967 7 hsa‐mir‐XXX
15 245 miR‐1293‐5p‐5124 TCTGGGTGGTCTGGAGATTTGTG 23 207 51 16 12 NT_029419 ‐ 12771272 12771294 12771230 12771300 2 hsa‐mir‐XXX
16 248 miR‐16‐3p‐9924 CCAGTATTAACTGTGCTGCTGA 22 104 79 2 13 NT_024524 ‐ 31603122 31603143 31603108 31603199 8 hsa‐mir‐XXX
17 255 miR‐1910‐3p‐10089 GAGGCAGAAGCAGGATGACAA 21 54 181 2 16 NT_010498 ‐ 39389435 39389455 39389425 39389504 1 hsa‐mir‐XXX
18 260 miR‐33b‐5p‐3282 GTGCATTGCTGTTGCATTGCA 21 304 74 4 17 NT_010718 ‐ 17314559 17314579 17314498 17314593 5 hsa‐mir‐XXX
19 265 miR‐454‐3p‐8695 TAGTGCAATATTGCTTATAGGGTTT 25 114 259 3 17 NT_010783 ‐ 15868207 15868231 15868179 15868293 6 hsa‐mir‐XXX
20 270 miR‐320‐3p‐1624 AAAAGCTGGGTTGAGAGGG 19 107 226 9 19 NT_011109 ‐ 19480769 19480787 19480765 19480819 1 hsa‐mir‐XXX
21 276 miR‐548j‐5p‐14226 AAAAGTAATTGCGGTCTTTGGT 22 220 298 1 22 NT_011520 ‐ 6341809 6341830 6341746 6341857 1 hsa‐mir‐XXX
22 277 miR‐659‐5p‐34577 AGGACCTTCCCTGAACCAAGGA 22 82 222 1 22 NT_011520 ‐ 17634254 17634275 17634199 17634295 1 hsa‐mir‐XXX
23 278 miR‐1249‐3p‐24401 ACGCCCTTCCCCCCCTTCTTCA 22 116 103 1 22 NT_011523 ‐ 867545 867566 867540 867605 3 hsa‐mir‐XXX
24 279 miR‐221‐5p‐816 ACCTGGCATACAATGTAGATTTCT 24 256 585 299 X NT_079573 ‐ 8457414 8457437 8457351 8457460 9 hsa‐mir‐XXX
25 279 miR‐221‐3p‐5 AGCTACATTGTCTGCTGGGTTTC 23 6612 9147 231 X NT_079573 ‐ 8457375 8457397 8457351 8457460 9 hsa‐mir‐XXX
26 280 miR‐222‐5p‐4253 CTCAGTAGCCAGTGTAGATCC 21 101 271 5 X NT_079573 ‐ 8458247 8458267 8458187 8458296 6 hsa‐mir‐XXX
27 280 miR‐222‐3p‐23 AGCTACATCTGGCTACTGGGTCTC 24 6978 8658 242 X NT_079573 ‐ 8458206 8458229 8458187 8458296 6 hsa‐mir‐XXX
28 281 miR‐548i‐5p‐38320 AAAAGTACTTGCGGATTTTGC 21 247 99 1 X NT_011651 ‐ 6777114 6777134 6777067 6777143 1 hsa‐mir‐XXX
29 282 miR‐361‐5p‐1854 TTATCAGAATCTCCAGGGGTAC 22 126 273 7 X NT_011651 ‐ 8454994 8455015 8454948 8455019 4 hsa‐mir‐XXX
30 282 miR‐361‐3p‐17619 TCCCCCAGGTGTGATTCTGATTT 23 173 59 3 X NT_011651 ‐ 8454954 8454976 8454948 8455019 4 hsa‐mir‐XXX
31 283 miR‐421‐3p‐3255 ATCAACAGACATTAATTGGGCGC 23 186 54 6 X NT_011669 ‐ 11756215 11756237 11756199 11756283 6 hsa‐mir‐XXX
sample1_FHG_clusterPosition
input from sample1/0finalReport/3_sample1_FHG_gp1_Sum.txt
input from sample1/0finalReport/4_sample1_FHG_gp2_Sum.txt
input from sample1/0finalReport/6_sample1_FHG_gp4_Sum.txt
The Position refers to the start position of the miR in the human genome. ESTs are added one by one after chromosome X.
The PositionInSeq refers to the position of the miR in its own contig sequence or EST.
Refer to the three input files above for the definitions of miR_name & miR_seq.
The clusterDistance is the difference between the positions of the current and previous clusters.
Minimum c 1
Maximum c19010493
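The clusterDistance definition above (position of the current cluster minus position of the previous one) can be sketched as follows. The function name and the positions are hypothetical and chosen only to show the arithmetic; they are not values from this report.

```python
def cluster_distances(positions):
    """Distance of each cluster from the previous one.

    The first cluster has no predecessor, so its distance is None.
    """
    if not positions:
        return []
    distances = [None]
    for previous, current in zip(positions, positions[1:]):
        distances.append(current - previous)
    return distances


# Hypothetical cluster start positions, already sorted along the genome.
example_positions = [1000, 1044, 5000, 5042]
example_distances = cluster_distances(example_positions)

# Minimum and maximum are taken over the defined distances only.
defined = [d for d in example_distances if d is not None]
min_distance, max_distance = min(defined), max(defined)
```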
Index  Position  clusterDistance  #Copy (all isoforms in 5p or 3p)  #family (all isoforms in 5p or 3p)  Chr#  Strand  StartPositionInSeq  EndPositionInSeq  Type  unique_mirs  miR_name  miR_seq
1 753552 3 1 1 + 75242 75266 predict (gp4) PC‐5p‐XXXXX GCGGCACTGAGGCTTATAGCGGAA
2 866783 323529 5 1 1 ‐ 411322 411346 predict (gp4) PC‐3p‐XXXXX TGTACGGCCATCCAGCTCTAGGCC
3 1209671 2297742 5 1 1 + 933784 933805 predict (gp4) PC‐5p‐XXXXX GGAATAGCACATCAAGTAGGT
4 1435145 92222 3 1 1 ‐ 1303036 1303059 predict (gp4) PC‐5p‐XXXXX CACGGCCATTAGACGACGCCGGG
5 1623385 1738943 4 1 1 ‐ 1708079 1708102 predict (gp4) PC‐3p‐XXXXX TGTACGTAAAGTGACTCCACTAA
6 1965183 1297740 2 1 1 ‐ 2074417 2074440 predict (gp4) PC‐5p‐XXXXX TGATAGGCCCTACTGTCCATGTT
7 2345903 1377652 6 3 1 ‐ 2594777 2594798 known (gp1) hsa‐mir‐XXX miR‐XXX‐5p‐XXX TATGCCAGGCAGTTATACCAT
8 2823957 44 67 4 1 ‐ 3013390 3013415 known (gp1) hsa‐mir‐XXX miR‐XXX‐5p‐XXX CAAGCGTTTGTCAACAAAGTGTTGA
9 3358837 5278182 3 1 1 + 3184989 3185008 predict (gp4) PC‐5p‐XXXXX TTGTCCGATTATGTGCTCG
10 3839822 48398 5675 25 1 + 3621919 3621940 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐5p‐XXX CCGCGACGTTTTCGGGACCGA
11 4097536 42 456 15 1 + 3921607 3921624 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐3p‐XXX CCAAACTCGCGAACTAG
12 4440731 2887 532 6 1 + 4464521 4464546 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐5p‐XXX GAACTCTACGAATCATCCTAGTATG
13 4768272 39 2 1 1 + 4883828 4883848 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐3p‐XXX GCTGCCTCCGTACGATGCTA
14 5218520 761034 5 1 1 + 5090794 5090816 predict (gp4) PC‐5p‐XXXXX ATCTAATGTGGGTGACACTGGT
15 5508289 1445882 78 2 1 + 5337199 5337222 predict (gp4) PC‐5p‐XXXXX GGGGTTTAGGGTACCCGCTTCTG
16 6021793 1814174 2 1 1 + 5444635 5444659 predict (gp4) PC‐5p‐XXXXX TGTTGAGCGATTGCATGCAACTTA
17 6460209 2534403 56 1 1 ‐ 5666243 5666263 predict (gp4) PC‐5p‐XXXXX TGGGTGCGTGTGGTCACGTC
18 6947656 229060 21 1 1 ‐ 5768254 5768271 predict (gp4) PC‐5p‐XXXXX TGCGCGTCTTTATTATC
19 7422268 274643 4 1 1 ‐ 6044844 6044868 predict (gp4) PC‐5p‐XXXXX CCGTGATTGGACCGTCGCGTTCGT
20 7922269 1548217 56 1 1 ‐ 6305595 6305619 predict (gp4) PC‐5p‐XXXXX CACTGCCGAACGATCTGTGATTCC
21 8458859 3960774 7 1 1 + 6681929 6681953 predict (gp4) PC‐3p‐XXXXX TTCACGCTGGGTTATATCTCTCGC
22 8775378 8753269 8 1 1 ‐ 7051908 7051926 predict (gp4) PC‐5p‐XXXXX CCTCTCCTGGTTAGTCCA
23 9051642 36 8 1 1 + 7435397 7435420 predict (gp4) PC‐3p‐XXXXX CCAAGCAGTCTGGCATCTTATGC
24 9312384 2979623 4689 90 1 ‐ 7740471 7740495 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐3p‐XXX GTATCGCTATCGCCCAGAGCGTCG
25 9486554 3393776 2 1 1 + 7895151 7895168 predict (gp4) PC‐5p‐XXXXX CACCCAAGGACCCCGCC
26 9802033 2615459 45 1 1 ‐ 8392277 8392297 known (gp1) ssc‐mir‐XXX miR‐XXX‐3p‐XXX GCGCCTTCCGCCGATTTTGT
27 10066816 9061509 7 1 1 ‐ 8528349 8528366 predict (gp4) PC‐3p‐XXXXX ACGATACTGTACTCGGG
28 10365246 7862322 34 1 1 ‐ 8997137 8997156 predict (gp4) PC‐5p‐XXXXX CCAAGAGGTGTGTTGAGCA
29 10746251 371399 76 1 1 ‐ 9300237 9300258 predict (gp4) PC‐5p‐XXXXX AGTTTTCGCACGGCGTGTCAT
30 11034399 5033786 8 1 1 + 9700728 9700750 predict (gp4) PC‐5p‐XXXXX TGCTTATGCAGCTTTGTAGCCT
31 11125629 450060 34 3 1 + 9804588 9804611 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐5p‐XXX AAGGCGGGTCTACTAAGGGGAGC
32 11355321 4198426 76 1 1 ‐ 10112901 10112920 predict (gp4) PC‐5p‐XXXXX GAAACCAGCTAAGCAATGC
33 11850572 785 4 1 1 ‐ 10612103 10612123 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐5p‐XXX GGGCATAACTGTGGGCTGAC
34 11947742 7938479 6456 6 1 + 11156668 11156693 predict (gp4) PC‐5p‐XXXXX TTGAGGTCCGTTCCTCAGTCGACCT
35 12107675 906267 3 1 1 ‐ 11576747 11576770 predict (gp4) PC‐3p‐XXXXX TAGAGGTAGCCACAAGGATAGCG
Table 1 - Data summary

Raw Data                              #SequSeq                #UniqueSeq
# Raw Sequ seq                       9,922,513                 1,592,666

Data Processing                       #SequSeq   %SequSeq     #UniqueSeq   %UniqueSeq
1. “impurity” sequences filtered       948,723       9.6%        623,412        39.1%
2. Copy#<3 filtered                    913,248       9.2%        845,666        53.1%
3. Length < 15 filtered                204,858       2.1%         56,788         3.6%
4. mRNA,RFam,Repbase filtered          362,645       3.7%         11,345         0.7%
5. Final Mappable                    7,493,039      75.5%         55,455         3.5%
Total                                9,922,513       100%      1,592,666         100%
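As a quick check of how the %SequSeq column in Table 1 relates to the counts, the sketch below recomputes each processing step as a share of the raw reads and confirms that the steps add up to the raw total. The variable and function names are illustrative; only the counts come from the table.

```python
RAW_READS = 9_922_513  # "# Raw Sequ seq" from Table 1

# #SequSeq per data-processing step, copied from Table 1.
step_reads = {
    "impurity filtered": 948_723,
    "copy# < 3 filtered": 913_248,
    "length < 15 filtered": 204_858,
    "mRNA/RFam/Repbase filtered": 362_645,
    "final mappable": 7_493_039,
}


def shares(counts, total):
    """Percentage of `total` for each entry, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


read_shares = shares(step_reads, RAW_READS)

# Every raw read ends up in exactly one row, so the rows sum to the total.
accounted_reads = sum(step_reads.values())
```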
Table 2 - Length distribution of mappable data

Length             #SequSeq   %FinalMappable   #UniqueSeq   %FinalMappable   #SequSeq/
                                     SequSeq                     UniqueSeq   #UniqueSeq
15                   52,742             0.7%          945             1.7%         55.8
16                   60,120             0.8%        1,159             2.1%         51.9
17                   64,128             0.9%          978             1.8%         65.6
18                   57,106             0.8%        1,005             1.8%         56.8
19                   53,125             0.7%          883             1.6%         60.2
20                   78,317             1.0%        1,579             2.8%         49.6
21                  454,462             6.1%        3,907             7.0%        116.3
22                3,438,602            45.9%       16,261            29.3%        211.5
23                2,810,150            37.5%        8,934            16.1%        314.5
24                  249,047             3.3%        2,037             3.7%        122.3
25                   41,676             0.6%          947             1.7%         44.0
26                    3,941             0.1%          445             0.8%          8.9
27                    2,570             0.0%          103             0.2%         25.0
28                    2,592             0.0%           94             0.2%         27.6
29                    4,980             0.1%          120             0.2%         41.5
30                    1,823             0.0%           82             0.1%         22.2
40                  117,658             1.6%       15,976            28.8%          7.4
Final Mappable    7,493,039           100.0%       55,455           100.0%        135.1
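The last column of Table 2, #SequSeq/#UniqueSeq, is the average number of reads per unique sequence at a given length. A minimal sketch of that division, using three rows and the Final Mappable row from the table (the names are illustrative):

```python
# (length) -> (#SequSeq, #UniqueSeq), copied from Table 2.
length_rows = {
    15: (52_742, 945),
    22: (3_438_602, 16_261),
    23: (2_810_150, 8_934),
}


def reads_per_unique(n_reads, n_unique):
    """Average copy number per unique sequence, rounded to one decimal."""
    return round(n_reads / n_unique, 1)


ratios = {length: reads_per_unique(*counts) for length, counts in length_rows.items()}

# Same ratio for the whole mappable set (Final Mappable row).
overall_ratio = reads_per_unique(7_493_039, 55_455)
```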
Table 3 - Unique seq mapped to unique mammalian mirs

                                                     Sequ Seq                               Unique Seq
                                                     #SequSeq   %SequSeq   %FinalMappable   #UniqueSeq   %UniqueSeq   %FinalMappable
# Known mammalian unique mir in miRBase v14.0           3,924
# Known mammalian unique miR in miRBase v14.0           2,656
# Unique mir in miRBase mapped                          1,599
# Unique miR in miRBase mapped                          2,273
Mapped to miRBase                                   6,013,453     60.60%           80.25%       12,456        0.78%           22.46%

# Known hsa mir in miRBase v14.0                          717
# Known hsa miR in miRBase v14.0                          894
# Unique hsa mir in miRBase mapped                        286
# Unique hsa miR in miRBase mapped                        399
Mapped to hsa of miRBase                            5,937,837     59.84%           79.24%        8,392        0.53%           15.13%

Table 4 - Cluster I: Sequ seq mapped to mammalian mirs that further mapped to genome

Mapping to mammalian:
                                                     Sequ Seq                               Unique Seq
                                                     #SequSeq   %SequSeq   %FinalMappable   #UniqueSeq   %UniqueSeq   %FinalMappable
Cluster I                                           5,734,742     57.80%           76.53%        8,335        0.52%           15.03%
# Alignment-cluster in Cluster I                          389
# Unique mir in Cluster I                               1,456
# Unique miR in Cluster I                                 309
FileName Mapped_Data/sample1_FHG_gp1_Align.txt
FileName Mapped_Data/sample1_FHG_gp1_Sum.txt
FileName Mapped_Data/sample1_FHG_gp1_miRlist.txt

Mapping to species hsa:
                                                     Sequ Seq                               Unique Seq
                                                     #SequSeq   %SequSeq   %FinalMappable   #UniqueSeq   %UniqueSeq   %FinalMappable
Cluster I                                           5,734,597     57.79%           76.53%        8,323        0.52%           15.01%
# Unique hsa mir in Cluster I                             295
# Unique hsa miR in Cluster I                             399
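Tables 3 and 4 report two percentages per count: one relative to the raw data (%SequSeq, %UniqueSeq) and one relative to the final mappable set (%FinalMappable). The sketch below recomputes the "Mapped to miRBase" row of Table 3 with both denominators; the dictionary keys and helper name are assumptions made for readability.

```python
# Denominators from Table 1.
RAW = {"reads": 9_922_513, "unique": 1_592_666}
MAPPABLE = {"reads": 7_493_039, "unique": 55_455}

# "Mapped to miRBase" counts from Table 3.
mapped_to_mirbase = {"reads": 6_013_453, "unique": 12_456}


def pct(count, total):
    """Percentage rounded to two decimals, as printed in the tables."""
    return round(100 * count / total, 2)


mirbase_row = {
    "%SequSeq": pct(mapped_to_mirbase["reads"], RAW["reads"]),
    "%FinalMappable (Sequ Seq)": pct(mapped_to_mirbase["reads"], MAPPABLE["reads"]),
    "%UniqueSeq": pct(mapped_to_mirbase["unique"], RAW["unique"]),
    "%FinalMappable (Unique Seq)": pct(mapped_to_mirbase["unique"], MAPPABLE["unique"]),
}
```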
Table 5 - Cluster II: Sequ seq mapped to both mammalian mirs and genome, but the mirs unmapped to genome:

Mapping to mammalian:
                                                     Sequ Seq                               Unique Seq
                                                     #SequSeq   %SequSeq   %FinalMappable   #UniqueSeq   %UniqueSeq   %FinalMappable
Cluster II                                                145      0.00%            0.00%           12        0.00%            0.02%
# Alignment-cluster in Cluster II                           7
# Unique mir in Cluster II                                  3
# Unique miR in Cluster II                                  4
FileName Mapped_Data/sample1_FHG_gp2_Align.txt
FileName Mapped_Data/sample1_FHG_gp2_Sum.txt
FileName Mapped_Data/sample1_FHG_gp2_miRlist.txt

Table 6 - Cluster III: Sequ seq mapped to mammalian mirs, but the mirs unmapped to genome (sequence cluster):

Mapping to mammalian:
                                                     Sequ Seq                               Unique Seq
                                                     #SequSeq   %SequSeq   %FinalMappable   #UniqueSeq   %UniqueSeq   %FinalMappable
Cluster III                                             2,753      0.03%            0.04%          145        0.01%            0.26%
# Alignment-cluster in Cluster III                         23
# Unique mir in Cluster III                                37
# Unique miR in Cluster III                                18
FileName Mapped_Data/sample1_FHG_gp3_Align.txt
FileName Mapped_Data/sample1_FHG_gp3_Sum.txt
FileName Mapped_Data/sample1_FHG_gp3_miRlist.txt

Mapping to species hsa:
                                                     Sequ Seq                               Unique Seq
                                                     #SequSeq   %SequSeq   %FinalMappable   #UniqueSeq   %UniqueSeq   %FinalMappable
Cluster III                                                 0      0.00%            0.00%            0        0.00%            0.00%
# Unique hsa mir in Cluster III                             0
# Unique hsa miR in Cluster III                             0
Table 7 - Cluster IV: Sequ seq mapped to genome, but unmapped to mammalian mirs (predict new hairpin by mfold):

                                                     Sequ Seq                               Unique Seq
                                                     #SequSeq   %SequSeq   %FinalMappable   #UniqueSeq   %UniqueSeq   %FinalMappable
Cluster IV                                             18,422      0.19%            0.25%        1,402        0.09%            2.53%
# Alignment-cluster in Cluster IV                       1,095
# Unique miR in Cluster IV                                703
FileName Mapped_Data/sample1_FHG_gp4_Align.txt
FileName Mapped_Data/sample1_FHG_gp4_Sum.txt
FileName Mapped_Data/sample1_FHG_gp4_miRlist.txt

Table 8 - Unmapped Sequ Seq

                                                     Sequ Seq                               Unique Seq
                                                     #SequSeq   %SequSeq   %FinalMappable   #UniqueSeq   %UniqueSeq   %FinalMappable
Nohit                                                 847,339      8.54%           11.31%       40,012        2.51%           72.15%
FileName Mapped_Data/sample1_FHG_nohit.txt

Table 9 - Mapping summary

                                                     #SequSeq   %SequSeq   #UniqueSeq   %UniqueSeq
Raw                                                 9,922,513    100.00%    1,592,666      100.00%
Mappable                                            7,493,039     75.52%       55,455        3.48%
Mapped to miRBase (including nohit 1)               6,013,453     60.60%       12,456        0.78%
Mapped to Cluster I                                 5,734,742     57.80%        8,335        0.52%
Mapped to Cluster II                                      145      0.00%           12        0.00%
Mapped to Cluster III                                   2,753      0.03%          145        0.01%
Mapped to Cluster IV                                   18,422      0.19%        1,402        0.09%
Mapped (total)                                      5,756,062     58.01%        9,894        0.62%
Nohit (including nohit 1 and nohit 2)                 847,339      8.54%       40,012        2.51%
Note: Mapped (total) + Nohit should equal Mappable

Table 10 - Detected miR summary

# of unique miRs detected in Cluster I                    309
# of unique miRs detected in Cluster II                     4
# of unique miRs detected in Cluster III                   18
# of unique miRs detected in Cluster IV                   703
Total                                                   1,034
FileName Mapped_Data/sample1_FHG_uni_miRs.txt
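The totals in Tables 9 and 10 are sums over the four clusters, and the note under Table 9 states a further identity (Mapped (total) + Nohit = Mappable). The following sketch recomputes the cluster sums from the table values and provides a small helper for checking the note's identity on any report; all names are illustrative and the helper is not part of the delivered analysis.

```python
# Cluster -> (#SequSeq, #UniqueSeq), copied from Table 9.
cluster_counts = {
    "I": (5_734_742, 8_335),
    "II": (145, 12),
    "III": (2_753, 145),
    "IV": (18_422, 1_402),
}

# "Mapped (total)" is the column-wise sum over Clusters I to IV.
mapped_reads = sum(reads for reads, _ in cluster_counts.values())
mapped_unique = sum(unique for _, unique in cluster_counts.values())

# Unique miRs detected per cluster, copied from Table 10.
detected_mirs = {"I": 309, "II": 4, "III": 18, "IV": 703}
total_detected = sum(detected_mirs.values())


def mappable_residual(mapped, nohit, mappable):
    """Mappable minus (mapped + nohit); zero when the note's identity holds."""
    return mappable - (mapped + nohit)


# Difference for the #SequSeq column of Table 9; a non-zero value
# means the rows involved should be re-checked.
residual_reads = mappable_residual(mapped_reads, 847_339, 7_493_039)
```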
Note: Definition of miR_name:
in cluster group 1: the miR_name is composed of 4 parts: miR, extension number, 5p or 3p, and index of sequ seq. The extension number is the extension number that occurs most often in the mirIDs.
in cluster group 2: the miR_name is composed of 4 parts: PC, extension number, 5p or 3p, and index of sequ seq. PC means 'Predicted Candidate'. The extension number is the extension number that occurs most often in the mirIDs.
in cluster group 3: the miR_name is composed of 4 parts: PN, extension number, 5p or 3p, and index of sequ seq. PN means 'Predicted Novel miR'. The extension number is the extension number that occurs most often in the mirIDs.
in cluster group 4: the miR_name is composed of 3 parts: PC, 5p or 3p, and index of sequ seq. PC means 'Predicted Candidate'.
The miR sequence is taken from the isoform with the lowest error# and highest copy# in cluster groups 1, 2 & 4.
The miR sequence is taken from the isoform with the highest copy#, regardless of its error#, in cluster group 3.
The # of unique miRs detected in these 4 groups is counted from distinct miR_seq.
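The naming scheme described in this note can be sketched as a small parser that splits a miR_name into its parts. The regular expression, function name, and the hypothetical group 4 name used in the usage lines are assumptions for illustration; the report defines only the parts, not a parsing routine. Hyphen normalisation is included because names copied from the PDF may contain U+2010 or U+2011 hyphens instead of ASCII "-".

```python
import re

# Prefixes named in the note: miR (group 1), PC (groups 2 and 4), PN (group 3).
# The extension number is optional because group 4 names have only 3 parts.
NAME_RE = re.compile(r"^(miR|PC|PN)-(?:([0-9A-Za-z]+)-)?(5p|3p)-(\d+)$")


def parse_mir_name(name):
    """Split a miR_name into prefix, extension number, arm, and sequ seq index."""
    # Normalise typographic hyphens to ASCII before matching.
    ascii_name = name.replace("\u2010", "-").replace("\u2011", "-")
    match = NAME_RE.match(ascii_name)
    if match is None:
        raise ValueError(f"not a recognised miR_name: {name!r}")
    prefix, extension, arm, index = match.groups()
    return {"prefix": prefix, "extension": extension, "arm": arm, "index": int(index)}


known = parse_mir_name("miR-30e-5p-275")   # name taken from the gp1 summary
predicted = parse_mir_name("PC-5p-12345")  # hypothetical group 4 name
```

For group 4 names the `extension` field is `None`, which distinguishes them from group 2 names that share the PC prefix.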