Sequencing Data Report
microRNA Discovery Sequencing Service

On
sample_47362344

For
Dr. Researcher
Life Sciences University of USA

Prepared by
LC Sciences, LLC
Jan. 1, 2009

LC Sciences, LLC | www.LCSciences.com | support@LCSciences.com
2575 W. Bellfort, Suite 270, Houston, Texas 77054 Tel. 713 664-7087, Fax 713 664-8181
I. PROJECT INFORMATION

Project related information is listed in Table 1.

Table 1. Sample, service, and project tracking information

Project Information
Customer Sample Name:      sample_47362344
Sample Type:               Human total RNA
Date Sample Received:      12/15/2009
Service Requested:         microRNA Discovery Sequencing Service
Data Analysis Requested:   Standard Data Analysis
LCS Project Number:        47362344
LCS Sample ID:             sample1

II. DATA REPORT

The received RNA sample was processed to generate a cDNA library, which was then used for deep sequencing. The data generated were analyzed and the full data files of 2-3 Gb were saved onto a DVD disc, which is included in this report. Experimental procedures and analysis methods are described in Section III of this report. The statistics of the data analysis are given in file Data_summary_sample1.xls and a summary is presented in Table 2. The detailed data files, which may be in tens of Mbs, and the recommended software programs for reviewing the data are given in Table 3.

Terminologies Used

Sequ Seq: Raw sequencing reads generated after image extraction and base-calling
Unique Seq: Family of sequ seqs with the same sequence
Copy Number: Number of sequ seqs in the same unique seq family
Count: Number of sequ seqs in the same unique seq family
Mapping: Aligning a sequence to a reference database
Mir: pre-miRNA registered in miRBase
miR: mature miRNAs registered in miRBase

Table 2. A summary of standard data analysis results

                                         #SequSeq   %SequSeq   #UniqueSeq   %UniqueSeq
Raw                                     6,870,727    100.00%    1,445,905      100.00%
Mappable                                4,764,254     69.34%       40,890        2.83%
Mapped to miRbase (including nohit 1)   4,200,913     61.14%        8,971        0.62%
Mapped to Cluster I                     4,181,077     60.85%        7,930        0.55%
Mapped to Cluster II                          110      0.00%            9        0.00%
Mapped to Cluster III                       2,116      0.03%          134        0.01%
Mapped to Cluster IV                       13,191      0.19%        1,128        0.08%
Mapped (total)                          4,196,494     61.08%        9,201        0.64%
Nohit (including nohit 1 and nohit 2)     567,760      8.26%       31,689        2.19%

Flow chart of sequencing data analysis of a single sequencing reaction through various filters and the number of miRNAs detected. Values shown in the chart:

  Mappable reads: 9,031,007 reads (91%) are mappable. Removed: 7.42% mRNA, RFam, repbase filter; 1.35% ADT filter; 0.05% junk filter; 0.07% sequence pattern filter.
  Optional filter: 7,944,551 reads (88%) passed the optional filter. Removed: 6.07% length <15 or >26; 5.96% copy# <3.
  Mapping: 4,722,478 reads (59%) are mapped to or are miRNA candidates; 41% unmapped.
  miRNAs detected: 39% (group 1), 300 miRNAs; 4% (group 2), 27 miRNAs; 19% (group 3), 149 miRNAs; 38% (group 4), 292 miRNAs.
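The counts in Table 2 are related by simple sums: Clusters I to IV add up to the mapped total, and the mapped total plus nohit equals the mappable count. The sketch below checks this arithmetic and recomputes the percentage columns. It is illustrative only and is not part of the ACGT101-miR package; the dictionary layout and the function names are assumptions made for this example.

```python
# Counts as printed in Table 2: row -> (#SequSeq, #UniqueSeq).
TABLE2 = {
    "Raw": (6870727, 1445905),
    "Mappable": (4764254, 40890),
    "Cluster I": (4181077, 7930),
    "Cluster II": (110, 9),
    "Cluster III": (2116, 134),
    "Cluster IV": (13191, 1128),
    "Mapped (total)": (4196494, 9201),
    "Nohit": (567760, 31689),
}


def percent_of_raw(row, column):
    """Percentage of the Raw count (column 0 = sequ seqs, 1 = unique seqs)."""
    raw = TABLE2["Raw"][column]
    return round(100.0 * TABLE2[row][column] / raw, 2)


def check_totals(column):
    """True if Clusters I-IV sum to the mapped total and mapped + nohit = mappable."""
    clusters = sum(TABLE2["Cluster " + n][column] for n in ("I", "II", "III", "IV"))
    mapped = TABLE2["Mapped (total)"][column]
    return (clusters == mapped
            and mapped + TABLE2["Nohit"][column] == TABLE2["Mappable"][column])


if __name__ == "__main__":
    for column, label in ((0, "sequ seqs"), (1, "unique seqs")):
        print(label, check_totals(column), percent_of_raw("Mappable", column))
```

Percentages are expressed relative to the Raw row, which is how the %SequSeq and %UniqueSeq columns appear to be defined.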

Table 3. Data files delivered and programs recommended for reviewing

Folder         Data Files                        Description                                                          Reviewing Program
Raw Data       sample1_RawData.txt               Sequencing sequences (sequ seqs) as obtained from sequencer          Wordpad
Filtered Data  sample1_FHG_unique.txt            Sequ seqs listed by family (unique seqs)                             Wordpad
               sample1_FHG_pass.txt              Unique seqs passed digital filters                                   Wordpad
               sample1_FHG_long.txt              Unique seqs with length >= 15                                        Wordpad
               sample1_FHG_short.txt             Unique seqs with length < 15                                         Wordpad
               sample1_FHG_hc.txt                Unique seqs with copy number >= 3                                    Wordpad
               sample1_FHG_lc.txt                Unique seqs with copy number < 3                                     Wordpad
               sample1_FHG_db.fa                 Final mappable unique seqs                                           Wordpad
Mapped Data    sample1_FHG_gp1_Align.txt         Cluster I: see Table 4                                               Wordpad
               sample1_FHG_gp1_miRlist.txt                                                                            Excel
               sample1_FHG_gp1_Sum.txt                                                                                Excel
               sample1_FHG_gp2_Align.txt         Cluster II: see Table 4                                              Wordpad
               sample1_FHG_gp2_miRlist.txt                                                                            Excel
               sample1_FHG_gp2_Sum.txt                                                                                Excel
               sample1_FHG_gp3_Align.txt         Cluster III: see Table 4                                             Wordpad
               sample1_FHG_gp3_miRlist.txt                                                                            Excel
               sample1_FHG_gp3_Sum.txt                                                                                Excel
               sample1_FHG_gp4_Align.txt         Cluster IV: see Table 4                                              Wordpad
               sample1_FHG_gp4_miRlist.txt                                                                            Excel
               sample1_FHG_gp4_Sum.txt                                                                                Excel
               sample1_FHG_uni_miRs.txt          The list of all unique seqs from Cluster I to IV                     Wordpad
               sample1_FHG_nohit.txt             Unique seqs having no hit with reference libraries or the genome     Wordpad
               sample1_FHG_clusterPosition.txt   Genomic chromosomal positions of the mapped unique seqs              Excel
               sample1_FHG_miRdistribution.png   Plot of position of mapped unique seqs inside genome                 Paint
Summary        Data_summary_sample1.xls          Statistics of data analysis at the various steps and the final results   Excel
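The file names in Table 3 follow a regular pattern built from the LCS Sample ID, so the contents of a delivered disc can be checked against the expected list. The following is a minimal sketch under that assumption. The helper names `expected_files` and `missing_files` are hypothetical, and the folder layout on the DVD is not assumed (files are searched for recursively by name).

```python
from pathlib import Path


def expected_files(sample_id="sample1"):
    """File names listed in Table 3 for a given LCS Sample ID."""
    prefix = f"{sample_id}_FHG"
    files = [f"{sample_id}_RawData.txt"]
    # Filtered data
    files += [f"{prefix}_{suffix}" for suffix in (
        "unique.txt", "pass.txt", "long.txt", "short.txt",
        "hc.txt", "lc.txt", "db.fa")]
    # Mapped data: three files for each of Clusters I to IV (gp1..gp4)
    for group in range(1, 5):
        files += [f"{prefix}_gp{group}_{kind}.txt"
                  for kind in ("Align", "miRlist", "Sum")]
    files += [
        f"{prefix}_uni_miRs.txt",
        f"{prefix}_nohit.txt",
        f"{prefix}_clusterPosition.txt",
        f"{prefix}_miRdistribution.png",
        f"Data_summary_{sample_id}.xls",
    ]
    return files


def missing_files(root, sample_id="sample1"):
    """Names from Table 3 that are not found anywhere under `root`."""
    present = {p.name for p in Path(root).rglob("*") if p.is_file()}
    return [name for name in expected_files(sample_id) if name not in present]
```

For example, `missing_files("/path/to/disc")` would return an empty list when every file in Table 3 is present.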

III. METHODS AND EXPERIMENTS

A. Small RNA Library Construction

A small RNA library was generated from the customer sample according to Illumina's sample preparation instruction [1]. A summary of the procedures performed is briefly described below.

1. Small RNA Isolation by Denaturing PAGE Gel

The received total RNA sample was size-fractionated on a 15% tris-borate-EDTA-urea polyacrylamide gel. The RNA fragments of length 15-50 nts were isolated, quantified following gel elution, and ethanol precipitated.

2. 5' and 3' Adapter Ligation

The SRA 5' adapter (Illumina) was ligated to the aforementioned RNA fragments with T4 RNA ligase (Promega). The ligated RNAs were size-fractionated on a 15% tris-borate-EDTA-urea polyacrylamide gel and the RNA fragments of size ~41-76 nts were isolated. The SRA 3' adapter (Illumina) ligation was then performed, followed by a second size-fractionation using the same gel condition as described above. The RNA fragments of size ~64-99 nts were isolated through gel elution and ethanol precipitation.

3. Reverse Transcription and PCR Amplification

The ligated RNA fragments were reverse transcribed to single-stranded cDNAs using M-MLV (Invitrogen) with RT-primers recommended by Illumina. The cDNAs were amplified with Pfx DNA polymerase (Invitrogen) in 20 cycles of PCR using Illumina's small RNA primers set.

4. Purification of Amplified cDNA Library for Sequencing

The PCR products were purified on a 12% TBE polyacrylamide gel and a slice of gel of ~80-115 bps was excised. This fraction was eluted and the recovered cDNAs were precipitated and quantified on a Nanodrop (Thermo Scientific) and on a TBS-380 mini-fluorometer (Turner Biosystems) using Picogreen® dsDNA quantitation reagent (Invitrogen). The concentration of the sample was adjusted to ~10 nM and a total of 10 μL was used in the sequencing reaction.

B. Deep Sequencing

The purified cDNA library was used for cluster generation on Illumina's Cluster Station and then sequenced on an Illumina GAIIx following the vendor's instruction for running the instrument. Raw sequencing reads were obtained using Illumina's Pipeline v1.5 software following sequencing image analysis by the Pipeline Firecrest Module and base-calling by the Pipeline Bustard Module. The extracted sequencing reads were stored in file sample1_RawData.txt and were then used in the standard data analysis, which is described in the next Section.

C. Standard Data Analysis

A proprietary software package, ACGT101-miR v3.x (LC Sciences), was used for standard data analysis. The key functions performed by this software and the relevant analysis results are described here.

5. Obtaining Mappable Sequences from Raw Sequencing Data

After the raw sequence reads, or sequenced sequences (sequ seqs), were extracted from image data, a series of digital filters (LC Sciences) were employed to remove various un-mappable sequencing reads. A Fasta file named sample1_FHG_db.fa was generated and used for mapping.

a. Generating Unique Families of Sequ Seqs by Sorting Raw Sequencing Reads

In this step, identical sequ seqs in the raw data file were counted and a unique family of sequences (unique seqs) file, sample1_FHG_unique.txt, was generated. An example of a typical entry of this file is shown below:

23 TTTGTCGGTCTTTGGATATGCCGTGTGACAATGGTG 18560

where 23 is the index of this sequence, followed by the sequ seq, and 18560 is the count (copy number) of the sequ seq.

b. Generating Mappable Sequ Seqs

In this step, the "impurity" sequences due to sample preparation, sequencing chemistry and processes, and the optical digital resolution of the sequencer detector were removed to give sequ seqs which were used to map with the reference database files. The remaining sequ seqs were grouped by families (unique seqs) and stored in file sample1_FHG_pass.txt.

c. Filtering Unique Seqs by Length

In this step, unique seqs were separated into two groups based on their sequence lengths. Unique seqs with a sequence length greater than or equal to a cut-off length (default = 15 nts for microRNA discovery) were saved in the file named sample1_FHG_long.txt, while those of shorter length were saved in the file named sample1_FHG_short.txt.
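The family-building and length-filtering steps described above can be illustrated with a short sketch. This is not the ACGT101-miR implementation, which is proprietary; the function names are hypothetical, the ordering of families by descending count is an assumption (the sample printout of sample1_FHG_unique is only approximately sorted), and the cut-off is treated as inclusive, following Table 3 ("length >= 15").

```python
from collections import Counter


def unique_families(reads):
    """Collapse identical reads into (index, sequence, count) entries.

    Entries are ranked by descending count; ties are broken alphabetically
    so that the output is deterministic.
    """
    counts = Counter(reads)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(index, seq, count)
            for index, (seq, count) in enumerate(ranked, start=1)]


def split_by_length(families, cutoff=15):
    """Split families into (long, short) using an inclusive length cut-off."""
    long_families, short_families = [], []
    for family in families:
        sequence = family[1]
        if len(sequence) >= cutoff:
            long_families.append(family)
        else:
            short_families.append(family)
    return long_families, short_families


if __name__ == "__main__":
    reads = ["ACGTACGTACGTACGTACGT"] * 3 + ["AAAA"] * 2 + ["C" * 15]
    long_families, short_families = split_by_length(unique_families(reads))
    for index, seq, count in long_families:
        print(index, seq, count)
```

Each printed line has the same three fields as the entry format described for sample1_FHG_unique.txt: index, sequence, and count.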

d. Filtering Unique Seqs by Copy Number

In this step, unique seqs were further sorted based on their copy numbers. Those with copy numbers greater than or equal to a predefined cut-off number (default = 3) were stored in the file named sample1_FHG_hc.txt, while those with fewer copies were stored in the file named sample1_FHG_lc.txt, where hc means high copy and lc means low copy.

e. Removing Unique Seqs from Certain Known RNA Reference Databases

Standard procedures were employed to remove those unique seqs which were mapped to mRNA, RFam and Repbase. The unique seqs which passed the filter at this step were saved in the file called sample1_FHG_db.fa.

6. Mapping Mappable Unique Seqs to Mirs and Genome

In this Section, various "mappings" were performed on unique seqs against pre-miRNA (mir) and mature miRNA (miR) sequences listed in the latest release of miRBase [2, 3, 4], or genome based on the public releases of appropriate species. Mappings were also done on mirs of interest against genome sequence. Methods and criteria used for various mappings are documented in the ACGT-101 User's Manual [5]. Brief descriptions of the analyses are presented below and the characteristics of various groups of unique seqs are summarized in Table 4.

a. Mapping Unique Seqs to Mirs in miRbase

The cleaned unique seqs in sample1_FHG_db.fa were blasted against mirs in miRbase. The mapped unique seqs were grouped as "unique seqs mapped to mirs in miRbase", while the remaining ones were grouped as "unique seqs un-mapped to mirs in miRbase".

b. Mapping Mirs Mapped by Unique Seqs to Genome

The mirs to which the unique seqs in the "unique seqs mapped to mirs in miRbase" group were mapped were further blasted against genome. The mirs mapped to genome were sorted out and the unique seqs associated with these mirs were grouped as "unique seqs mapped to mirs that further mapped to genome". This group of unique seqs was categorized as Cluster I and saved in file sample1_FHG_gp1_miRlist.txt. Their alignments are presented in file sample1_FHG_gp1_Align.txt. A summary file was also generated and saved as sample1_FHG_gp1_Sum.txt.

The unique seqs mapped to the mirs that were not mapped to genome were grouped as "unique seqs mapped to mirs that un-mapped to genome".

c. Mapping Unique Seqs with Mirs Un-mapped to Genome to Genome

The unique seqs in the group "unique seqs mapped to mirs that un-mapped to genome" were blasted against genome. The unique seqs that mapped to genome here were grouped as "unique seqs mapped to mirs and genome but mirs un-mapped to genome" and were categorized as Cluster II and saved in file sample1_FHG_gp2_miRlist.txt. Their alignments are presented in file sample1_FHG_gp2_Align.txt. A summary file was also generated and saved as sample1_FHG_gp2_Sum.txt.

The remaining unique seqs were grouped as "unique seqs mapped to mirs but neither unique seqs nor their mirs mapped to genome".

d. Mapping Unique Seqs to miRs

The unique seqs in the group "unique seqs mapped to mirs but neither unique seqs nor their mirs mapped to genome" were further categorized based on whether the unique seqs were mapped to any mature miRNAs (miRs) in the mirs to which the unique seqs were mapped. All unique seqs in this group that were mapped to miRs were grouped as "unique seqs mapped to mirs and miRs but neither unique seqs nor their mirs mapped to genome" and were categorized as Cluster III and saved in file sample1_FHG_gp3_miRlist.txt. Their alignments are presented in file sample1_FHG_gp3_Align.txt. A summary file was also generated and saved as sample1_FHG_gp3_Sum.txt.

The rest of the unique seqs in the group, which were un-mapped to miRs, were further grouped as "unique seqs mapped to mirs but not to miR and neither unique seqs nor their mirs mapped to genome" and termed "unique seqs nohit 1".

e. Mapping Unique Seqs Un-mapped to Mirs to Genome

Unique seqs in the group of "unique seqs un-mapped to mirs in miRbase" were blasted against genome directly and those mapped to genome were identified. The extended sequences of the mapped genome sequences were tested for possible formation of stable hairpins. When stable hairpins were predicted, their associated unique seqs were then grouped as "unique seqs un-mapped to mirs but mapped to genome with possible hairpin formation". These unique seqs were categorized as Cluster IV and saved in file sample1_FHG_gp4_miRlist.txt. Their alignments are presented in file sample1_FHG_gp4_Align.txt. A summary file was also generated and saved as sample1_FHG_gp4_Sum.txt.

All unique seqs in Cluster I to IV were listed in sample1_FHG_uni_miRs.txt as mapped miRs or predicted miRs.

The unique seqs that were mapped neither to mirs in miRBase nor to genome were grouped as "unique seqs un-mapped to mirs and genome" and were termed "unique seqs nohit 2".

The unique seqs in both groups of "unique seqs nohit 1" and "unique seqs nohit 2" were combined as "unique seqs nohit" and saved in file sample1_FHG_nohit.txt.

f. Plot of the Chromosome Genomic Positions of the Mapped Unique Seqs

The genomic positions of the Cluster I to IV sequences were mapped to chromosomes and the results were saved in the file named sample1_FHG_clustposition.txt and displayed in the plot file named sample1_FHG_miRdistribution.png.

Table 4. Summary of mapping of unique seqs to mirs, miRs, and genome*

                                                                                                        Unique Seqs Mapped*
Clusters           Group Description                                                                    mir   Genome   miR   Comments
Cluster I          Unique seqs mapped to mirs that further mapped to genome                             √     √
Cluster II         Unique seqs mapped to mirs and genome but mirs un-mapped to genome                   √     √              mirs un-mapped to genome
Cluster III        Unique seqs mapped to mirs and miRs but neither unique seqs nor their mirs           √     ×        √     mirs un-mapped to genome
                   mapped to genome
Cluster IV         Unique seqs un-mapped to mirs but mapped to genome with possible hairpin formation   ×     √
Unique seqs nohit  Unique seqs mapped to mirs but not to miR and neither unique seqs nor their mirs     √     ×        ×     mirs un-mapped to genome
                   mapped to genome (unique seqs nohit 1)
                   Unique seqs un-mapped to mirs and genome (unique seqs nohit 2)                       ×     ×

* Note:
√ indicates that a unique seq was mapped to the mir, miR, or genome.
× indicates that a unique seq was not mapped to the mir, miR, or genome.
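The grouping rules summarized in Table 4 amount to a small decision tree over a few mapping outcomes. The sketch below encodes that tree for a single unique seq. It is an illustration rather than the ACGT101-miR logic: the function and parameter names are hypothetical, and the report does not say how a unique seq is handled when it is un-mapped to mirs and mapped to genome without a predicted stable hairpin, so that case is returned as "unassigned".

```python
def classify(mir_hit, mir_on_genome=False, seq_on_genome=False,
             mature_hit=False, hairpin=False):
    """Assign a unique seq to a group from Table 4.

    mir_hit        the unique seq maps to a pre-miRNA (mir) in miRBase
    mir_on_genome  that mir itself maps to the genome
    seq_on_genome  the unique seq maps to the genome
    mature_hit     the unique seq maps to a mature miRNA (miR) in that mir
    hairpin        a stable hairpin is predicted around the genome hit
    """
    if mir_hit:
        if mir_on_genome:
            return "Cluster I"
        if seq_on_genome:
            return "Cluster II"
        return "Cluster III" if mature_hit else "nohit 1"
    if seq_on_genome:
        # The no-hairpin case is not described in the report (assumption).
        return "Cluster IV" if hairpin else "unassigned"
    return "nohit 2"


if __name__ == "__main__":
    print(classify(mir_hit=True, mir_on_genome=True))
    print(classify(mir_hit=False, seq_on_genome=True, hairpin=True))
```

Evaluating the mir hit first mirrors the order of steps a to e, where un-mapped unique seqs are sent to the genome only after the miRBase comparison.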

Length Distribution of Mappable Data

Length (nt)   # of Reads   % of Total Reads   # of Unique Seqs   Reads # / Unique seqs
15               105,314               1.33              2,990                    35.2
16                81,131               1.02              1,344                    60.4
17                77,672               0.98              1,154                    67.3
18               189,776               2.39              1,519                   124.9
19               200,639               2.53              2,003                   100.2
20               377,824               4.76              2,797                   135.1
21             2,203,539              27.74              5,134                   429.2
22             3,533,567              44.48              6,370                   554.7
23               926,897              11.67              5,290                   175.2
24               184,713               2.33              2,989                    61.8
25                42,930               0.54              1,762                    24.4
26                20,549               0.26              1,119                    18.4
Total          7,944,551                100             34,471                   230.5

[Bar chart: # of reads (y-axis, 0 to 4,000,000) by length in nt (x-axis, 15 to 26), plotting the # of Reads column above.]
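The columns of the length distribution table can be derived from a list of unique seqs and their counts. The following is a minimal sketch of that calculation; the function name and the `(sequence, count)` input layout are assumptions, not the format used internally by the analysis software.

```python
from collections import defaultdict


def length_distribution(families):
    """Summarize (sequence, count) pairs by sequence length.

    Returns rows of (length, reads, percent_of_total_reads,
    unique_seqs, reads_per_unique_seq), sorted by length.
    """
    reads = defaultdict(int)
    uniques = defaultdict(int)
    for sequence, count in families:
        reads[len(sequence)] += count
        uniques[len(sequence)] += 1
    total_reads = sum(reads.values())
    rows = []
    for length in sorted(reads):
        rows.append((
            length,
            reads[length],
            round(100.0 * reads[length] / total_reads, 2),
            uniques[length],
            round(reads[length] / uniques[length], 1),
        ))
    return rows


if __name__ == "__main__":
    example = [("A" * 21, 90), ("C" * 21, 10), ("G" * 22, 100)]
    for row in length_distribution(example):
        print(*row)
```

The last column is simply reads divided by unique seqs for that length, which is how the "Reads # / Unique seqs" column in the table relates to the two count columns.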

microRNA Discovery Sequencing Data Report – sample_47362344

LC Sciences, LLC | www.LCSciences.com | support@LCSciences.com

2575 W. Bellfort, Suite 270, Houston, Texas 77054 Tel. 713 664-7087, Fax 713 664-8181

Chromosomal location of pre-miRNAs. The relative locations of individual pre-miRNAs (mir) are shown across the 19 chromosomes. MID (Maximum Inter-Distance) is the maximum distance between any two pre-miRNAs on the same chromosome that are considered to be in the same cluster. Fifty-nine clusters (black dots) are obtained when the MID is limited to 50 kb.
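One way to read the MID criterion is as a limit on the gap between neighbouring pre-miRNAs along a chromosome. The sketch below groups positions on a single chromosome under that reading. It is an assumption-laden illustration: the caption could also be read as limiting the distance between every pair of members, the report does not say whether a lone pre-miRNA counts as a cluster, and the function name is hypothetical.

```python
def cluster_positions(positions, mid=50_000):
    """Group positions (in bp) on one chromosome into clusters.

    A position joins the current cluster when its distance to the previous
    position is at most `mid`; otherwise it starts a new cluster.
    """
    clusters = []
    for position in sorted(positions):
        if clusters and position - clusters[-1][-1] <= mid:
            clusters[-1].append(position)
        else:
            clusters.append([position])
    return clusters


if __name__ == "__main__":
    starts = [100, 30_000, 200_000, 240_000, 1_000_000]
    for members in cluster_positions(starts):
        print(members)
```

Running the function once per chromosome and keeping only groups with two or more members would give multi-member clusters; whether that matches the count in the figure cannot be confirmed from the report.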

IV. REFERENCES

[1] "Preparing Samples for Analysis of Small RNA", Illumina Inc., Part # 1004239 Rev. A, 2008.
[2] Griffiths-Jones, S., Saini, H.K., van Dongen, S., Enright, A.J., miRBase: tools for microRNA genomics, 2008, Nucleic Acids Research, 36, D154-D158.
[3] Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., Enright, A.J., miRBase: microRNA sequences, targets and gene nomenclature, 2006, Nucleic Acids Research, 34, D140-D144.
[4] Griffiths-Jones, S., The microRNA Registry, 2004, Nucleic Acids Research, 32, D109-D111.
[5] LC Sciences ACGT 101 manual.

Example printouts of the included sequence data files are attached.

The data files included with each sample report are listed in Table 3.

These printouts represent truncated sample data files.

sample1_RawData

>ILLUMINA-57021F:7:1:3:1127#0/1
ACATTGGGTTCTCATTCAAATATACTTTTGAAGTATGTGC
>ILLUMINA-57021F:7:1:3:352#0/1
ACGAAGAGGGAGCGCAATNNNTCAGTATATATTGAAGGAC
>ILLUMINA-57021F:7:1:3:804#0/1
TCTGAGCAGTGACTAGNACCCGTAATANGAGGTGAGCAGC
>ILLUMINA-57021F:7:1:3:2003#0/1
CAACAAGGGTAAGTTAATGCAATCGCCCCTCCNNAAAGGG
>ILLUMINA-57021F:7:1:3:399#0/1
TTAAGGTAATTAGCGTGGGCGGTAGCGCTCTGTATAAGCT
>ILLUMINA-57021F:7:1:3:921#0/1
TAAGCAGGCCATGCCGCTACGNNGGGGATAAATCTGGCTG
>ILLUMINA-57021F:7:1:3:1598#0/1
TATTGGGAGGGAATAGATCGCTGNCCAGCCTGATNTAAGG
>ILLUMINA-57021F:7:1:3:1981#0/1
GCCGCCTGCCTAGGTCTTCTTATCTTGAGAATGAGTCAAG
>ILLUMINA-57021F:7:1:3:118#0/1
TGAGCACACATTACATAATGCGGCTACTGTTTGACAAAGT
>ILLUMINA-57021F:7:1:3:1347#0/1
CTCGATGTAAGAGATCACTATTTCGCCACATGGTATTCCG
>ILLUMINA-57021F:7:1:3:185#0/1
TAAGGGACCTATTGTCAGCGGATCACAATGTCTTGAAGGA
>ILLUMINA-57021F:7:1:3:824#0/1
TTTGCAAACCGATGTCAGTGGCACTGCAAATGTCCACTGT
>ILLUMINA-57021F:7:1:3:1603#0/1
CAAAACAAACGTAATACGGCGGTATCCCACTAGAGTTGTC
>ILLUMINA-57021F:7:1:3:613#0/1
TCCCATGAATCAGGCCNGACTAGGGAAANTTCAATCAGAC
>ILLUMINA-57021F:7:1:3:1459#0/1
GCCCGCCGTTATGGACAATAAGTAAATTGCTACAATTGAC
>ILLUMINA-57021F:7:1:3:494#0/1
GCTGTTCTAGAAAATGGTTTATCTATTCCTGCGTCAATCT
>ILLUMINA-57021F:7:1:3:1747#0/1
TGGAAAACTCTATTAGAGTCTAACTTATCCAATGCGCACG
>ILLUMINA-57021F:7:1:3:1574#0/1
AGTAAATNATANAAATAAAATTAAAAAAAAAANAAAAAAA
>ILLUMINA-57021F:7:1:3:1448#0/1
CTATGTACAGCCACTCTCTTGATGGCGGGAAATATTTATT
>ILLUMINA-57021F:7:1:3:149#0/1
GGTGCTGGATTCCCGTTTTGCGTATTTTGGGAGAGGTCCA
>ILLUMINA-57021F:7:1:3:460#0/1
TAGGCTGTTTGCTACATTTTGAGACAAACTGTATAGAGTG
>ILLUMINA-57021F:7:1:3:1436#0/1
AATGGCGGAGCGATTTATAGGGAGAGGGGCGATTGGCTCG
>ILLUMINA-57021F:7:1:3:1847#0/1
GACTATCTGCCTGTAGCGGATAAGGCAGCATCCAACCTAA
>ILLUMINA-57021F:7:1:3:909#0/1
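The raw data printout pairs each read identifier (a line starting with ">") with one sequence. A minimal reader for that layout is sketched below. It assumes one header line followed by one sequence line per record, which is what the printout suggests but is not stated in the report; the function names are hypothetical.

```python
def read_raw_data(lines):
    """Yield (read_id, sequence) pairs from header/sequence line pairs.

    A header with no sequence after it (for example at the end of a
    truncated printout) is skipped.
    """
    read_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            yield read_id, line
            read_id = None


def count_ambiguous(sequence):
    """Number of uncalled bases (N) in a read."""
    return sequence.upper().count("N")


if __name__ == "__main__":
    example = [
        ">ILLUMINA-57021F:7:1:3:352#0/1",
        "ACGAAGAGGGAGCGCAATNNNTCAGTATATATTGAAGGAC",
    ]
    for read_id, sequence in read_raw_data(example):
        print(read_id, len(sequence), count_ambiguous(sequence))
```

To read a delivered file, the same function can be given an open file handle, for example `read_raw_data(open("sample1_RawData.txt"))`.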


sample1_FHG_unique

# of input sequences: 6375923

# of families: 1436566; 1436566/6375923=21.0%

Index seq count

1 ATTATGGTACTTGTATTTAACAGGCTCACT 912465

2 CCTCTTATGTAGACCGTTGTCCAGTGGTGA 606708

3 ATTTTTATAGTACCAAGAGGCTACGCAGT 560130

4 GTAAAAAGTCTATCGCCGCACTGTCGTCA 243986

5 CTTAACGGTTCTAACTATTCACCGGTAAAG 185486

6 CAAAGAAGCCATAGGCGCCCGGGAACACC 172620

7 AACCGTGAGGCGCTGGAAAGACGCTAGAAG 177233

8 AGTAGTACTGGCGTACACATTCTCCACGG 171389

9 CATCCTATTCTAGCAATCAGGAGAACATTC 165265

10 AGGCCCATATCAAGAAGTAGAACTATCGA 137449

11 ACATGAATGGCGAATGCTTCCCGTGATA 128553

12 GTTAGCTAGTGCCCGGTTTTATCAAGCCC 115221

13 CGCGAATGTCTTCGTATGCTCAGGTAGC 124749

14 CATGAGAGGTCGAGGGACTTGATTCCTAC 114996

15 CTTCTGGCACCGTGGGCCAGCGGAAGGACA 113310

16 TGCCCCAACGACGCGGAAAATCAAGCGAGC 106560

17 CTTCTGGAAGCCAACGCTCTGGCGGGATCC 111243

18 AAACAAATTAACTCGACGACCTCTCCTCT 107082

19 AAACTATGCATTATTTCCCCCTAAGATCT 99460

20 CATATCTGTCTCCTACCAGATTATCACCCC 105833

21 TGTTTGCTGTGGCATTTTTCCCATGGATTA 102013

22 ATTATTAACGTGGTGTGGTAAATAGAGGGT 92352

23 GGTTCATGCCTAAATTGCATCTATAATA 81208

24 GAGTCGTAACGCTACCCTATACGAAGCG 95005

25 CCTTGAATGTGACCCTGAGGCTTCTATTAG 80527

26 AAGAAGCGTCAACCCCCACGTCAAGACGT 84791

27 TCCGTTGCTAGCCGGAGGACCTCCTGGT 86030

28 CCGCAGAGCCAGGCACTATGTCAGGGGCTA 91730

29 CATATGGCAAAAGACCGGACTGGACGCGA 89980

30 ACATCAAGAATCCTCAATCCTACGTGGACG 84201

31 AAGAGCCGGAGAACACATGATGGAGGCGAC 86202

32 TGCGATAAAAACGGTGATGACCAAAGAACA 72819

33 TGATCACGAAAAGTTGCTTGACAAGGTT 77665

34 CTGGTTAAGCACCCCCTGGTGGTGCTGCCT 82949

35 TGGGGCTACCGGGGCCTACACGCACCCAT 83437

36 GTTTATACTATTAATATGCAATGGTGACT 80270

37 GCCCCAAGCGTAGGTTGGGGGTCCGTTCG 69748

38 GCCCGGAGTCAGATGACTCGCTTACGTG 66301

39 CACGCTAGAAGGTGCTAGGGCTAGCTCTTT 81592

40 GTCATGGGAGCATTCATGCCGCGACGCAC 87826

41 CGCCTTTACTCCTGGAAGATATGACATGA 75275

42 GTAAACCCCGGGCGGTGCAACCACAGGCGG 64396

43 TGGTCTGGGCATTGTGCTTGAGCACACTTA 67972


sample1_FHG_pass

# of input seq: 63847563

# of input family: 984732

# of seq after filter: 5876673; 5876673/6184267=95.0%

# of family after filter: 826122; 826122/987972=83.6%

# of seq with repeat fragment: 307594; 307594/6184267=5.0%

# of family with repeat fragment: 161850; 161850/987972=16.4%

# of seq of >=7A,>=8C,>=6G,>=7T: 307563; 307563/6184267=5.0%

# of family of >=7A,>=8C,>=6G,>=7T: 161819; 161819/987972=16.4%

# of seq of >10 dimer: 5; 5/6184267=0.0%

# of family of >10 dimer: 5; 5/987972=0.0%

# of seq of >6 trimer: 18; 18/6184267=0.0%

# of family of >6 trimer: 18; 18/987972=0.0%

# of seq of >5 tetramer: 8; 8/6184267=0.0%

# of family of >5 tetramer: 8; 8/987972=0.0%

Index seq count

1 TAGCACTCAAGTGTTTTGCACTGG 870337

2 AGAACGGTTTGCTATTTCTG 559024

3 TAACGGGGGTCACCTTCGGCAG 519382

4 CTCGAATTTTTCCAATCAC 198458

5 GTATGCATAATTGCAAGCACAT 135313

6 TTGCCTCACTCGTACAAAAGGCC 124018

7 GGATCAGGACCCGACTCCACATTAG 120386

8 AGAGGCCACCAAGATCTTAGGCC 115000

9 AAACAGCGTCTCAGTGTAATTG 99591

10 TCTGTCTATCTTCTTAAT 87246

11 CTGTCCCCCTTGTCTGATACA 78864

12 ACAAATGGCGGACGCGAATC 64239

13 GGCAAGTCTTCCTGCGA 59002

14 TGATGGAGGAACGTAAACGT 57130

15 GTCCACATCATCCCGGGGTCG 52244

16 TCGCGTATTGCTATTTAGGA 51227

17 ATCAACAACCAATCTCGATAT 49693

18 TATCAGTAACGTACATGCCCCCGAT 49008

19 ACTCCTGGAGGTCTCGCTCGTCTA 45497

20 AATCGAATCTTCGATACGTCGT 39503

21 CGTGGAGGGAGGCCGTCAGTTT 37584

22 GTCCATCTAGCCAATAGGC 32341

23 AGTAACCAACCGTGAGAGTGTTGGC 30809

24 GCTGGTTTTTGGAGCATG 30123

25 AGTAGGTCTGTAAGGGGT 29548

26 CAGTAAACGAAAGGACCGAGACT 29963

27 TTACACAAACATATCAGCGAT 27799

28 CTCGTTATTAGTAATACTC 24727

29 TAAGTGGTAAATTAACCGTTACACC 24531

30 TCGCGAGCTGACCAGTATCACG 23074
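The header of the sample1_FHG_pass printout reports counts for sequences with repeat fragments: ">=7A, >=8C, >=6G, >=7T" and ">10 dimer", ">6 trimer", ">5 tetramer". The report does not define these filters, so the sketch below shows only one plausible reading: homopolymer runs of at least the stated length, and a 2-, 3- or 4-nt unit repeated consecutively more than the stated number of times. The function name is hypothetical and the actual ACGT101-miR rules may differ.

```python
import re

# Minimum homopolymer run length per base (one reading of ">=7A,>=8C,>=6G,>=7T").
HOMOPOLYMER_MIN = {"A": 7, "C": 8, "G": 6, "T": 7}

# (unit length, threshold): flag when a unit occurs more than `threshold`
# times in a row (one reading of ">10 dimer", ">6 trimer", ">5 tetramer").
REPEAT_LIMITS = ((2, 10), (3, 6), (4, 5))
REPEAT_PATTERNS = [
    # The captured unit followed by at least `threshold` further copies.
    re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, threshold))
    for unit, threshold in REPEAT_LIMITS
]


def has_repeat_fragment(sequence):
    """True if the sequence contains a long homopolymer or short tandem repeat."""
    sequence = sequence.upper()
    for base, run_length in HOMOPOLYMER_MIN.items():
        if base * run_length in sequence:
            return True
    return any(pattern.search(sequence) for pattern in REPEAT_PATTERNS)


if __name__ == "__main__":
    for seq in ("ACGTAAAAAAAGT", "TAGCACTCAAGTGTTTTGCACTGG", "AC" * 11):
        print(seq, has_repeat_fragment(seq))
```

Under this reading, a unique seq is removed from the pass file when `has_repeat_fragment` returns True.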


sample1_FHG_gp1_Align

input file: sample1/mirAlign/sample1_FHG_Align.txt for mirs input file: sample1/db/sample1_FHG_db.fa for sequ seq input file: sample1/output/sample1_FHG_gp1.txt for cluster & genomeSeq Conventions: 1. the . in alignments means that the base is same as that in reference. 2. the * in alignments means that the base is same as that in reference, but the * is the mature part of precursor. The capital bases in * region also belong to mature. 3. the * in #error means that this sequseq has deletion compared with reference. 4. the + in #error means that this sequseq is without 3ADT cut, and the previous part of the sequseq is mapped to the reference and the other part is removed. 5. full length precursor and sequseq (except for the sequseq without 3ADT cut, which is indicated by + in #error) are listed below. clusterNo=1 chr=1 gi=NT_032977 strand=1 #mirs=7 #copy(all)=3832 #family(all)=37 #copy(0error)=1167 #family(0error)=12 #copy(1error)=2665 #family(1error)=25 genome 11191944 GTATGCCTTAACAGCAAGCGCAGTAGCGTAGCGACTGGGCATGAACGCGACGTTGATGAACTCGTAAGTTCTTCCACAAGTCTGACCGTCGTATAAG 11192040 #error

hsa-mir-XXe 1 ................**********************....................**********************............ 92 hsa-miR-XXe,hsa-miR-XXe*
ptr-mir-XXe 1 ...............**********************...................................................... 91 ptr-miR-XXe
mml-mir-XXe 1 ..........a.....**********************...................................................... 92 mml-miR-XXe
mmu-mir-XXe 1 ................**********************............c.......**********************............ 92 mmu-miR-XXe,mmu-miR-XXe*
bta-mir-XXe 1 ................************************......c...cga....................................... 92 bta-miR-XXe-5p
oan-mir-XXe 1 .t.............***********************..g...... c.ga.....**********************.............a.. 94 oan-miR-XXe,oan-miR-XXe*
cfa-mir-XXe 1 ................................a.......g.********************** 64 cfa-miR-XXe
275_count=837 1 ........................ 24 0
2659_count=62 1 ....................... 23 0
4472_count=32 1 ....................... 23 0
10481_count=12 1 ...................... 22 0
16177_count=7 1 ......................... 25 0
16949_count=7 1 ........................ 24 0
28999_count=4 1 ......................... 25 0
189_count=1390 1 ............c........... 24 1
336_count=643 1 ............c.......... 23 1
990_count=194 1 ............c......... 22 1
3693_count=41 1 ........................t 25 1
6354_count=21 1 ...........c........... 23 1
7153_count=19 1 ........................a 25 1
8769_count=15 1 ............c........ 21 1
15029_count=8 1 ...........c............ 24 1
18204_count=7 1 ...........c.......... 22 1
23103_count=5 1 .......................a 24 1
27729_count=4 1 .............t.......... 24 1
40840_count=3 1 ..........t............. 24 1
47403_count=3 1 ............c...... 19 1
38914_count=3 1 .................t...... 24 1
37319_count=3 1 ............c....... 20 1
1341_count=133 1 ...................... 22 0
2672_count=61 1 ..................... 21 0
27623_count=4 1 ................... 19 0
28829_count=4 1 ..................... 21 0
32112_count=4 1 .................... 20 0
1111_count=168 1 .....................t 22 1
2077_count=80 1 .................g.... 22 1
6959_count=19 1 .................g... 21 1
10215_count=12 1 ................g.... 21 1
12905_count=9 1 ......................a 23 1
19877_count=6 1 ................g... 20 1
42447_count=3 1 ....................t 21 1

clusterNo=2 chr=1 gi=NT_032977 strand=1 #mirs=5 #copy(all)=156 #family(all)=7 #copy(0error)=137 #family(0error)=4 #copy(1error)=19 #family(1error)=3
genome 11194868 CTCTTGCGAAAAATAAATAAACGCTCAATTAGATGGCGGCGGATTGGGTCCCCCCTAGAAGCGACAGGGTTGCTGCTGAACTCGGTGGTTCTGTGAG 11194968 #error
bta-mir-XXc 4 a..g.........c........***********************.........g............................................g. 104 bta-miR-XXc
hsa-mir-XXc-1 1 ................***********************................**********************............ 89 hsa-miR-XXc
ptr-mir-XXc-1 1 ...............***********************.................................................. 88 ptr-miR-XXc
mmu-mir-XXc-1 1 ......t.........***********************................**********************............ 89 mmu-miR-XXc
cfa-mir-XXc-1 1 ************************..................................... 61 cfa-miR-XXc
1630_count=106 1 ........................ 24 0
7100_count=19 1 ....................... 23 0
16195_count=7 1 ...................... 22 0
11836_count=10 1 .......................a 24 1
18340_count=6 1 ........................a 25 1
44145_count=3 1 ........................t 25 1
23678_count=5 1 ...................... 22 0

clusterNo=3 chr=1 gi=NT_019273 strand=1 #mirs=5 #copy(all)=22 #family(all)=2 #copy(0error)=19 #family(0error)=1 #copy(1error)=3 #family(1error)=1
genome 6049197 TCATAAAAATGTCGAGGAATGGCGGCTCGCGTAGACCCGCACCCCACCCCTTCGAAGCTCATTGCGTCAGTTCCACGATTC 6049277 #error
mmu-mir-XXX 1 .................................................**********************......... 80 mmu-miR-XXX
bta-mir-XXX 1 .................................................**********************.g....a.. 80 bta-miR-XXX
hsa-mir-XXX 1 ...............................................**********************...... 75 hsa-miR-XXX
mne-mir-XXX 1 ...............................................*******************G**...... 75 mne-miR-XXX
cfa-mir-XXX 1 .......................................********************** 61 cfa-miR-XXX
6971_count=19 1 ...................... 22 0
48948_count=3 1 .......................a 24 1

clusterNo=4 chr=1 gi=NT_019273 strand=1 #mirs=2 #copy(all)=33402 #family(all)=49 #copy(0error)=481 #family(0error)=12 #copy(1error)=32921 #family(1error)=37
genome 13122055 GCTCCCCTATAAGAAGCGCGGAAGCCGGCTTATATGTTTCCCCATTATCATCGAACTTTCGATTGGGCCCCGTAACTCT 13122133 #error
ptr-mir-XXX-1 1 ......................................**********************.................. 78 ptr-miR-XXX
hsa-mir-XXX-1 1 ......................................**********************.................. 78 hsa-miR-XXX
1326_count=135 1 ...................... 22 0
1529_count=115 1 .................... 20 0
1624_count=107 1 ................... 19 0
4246_count=35 1 ..................... 21 0
4579_count=31 1 ....................... 23 0
6227_count=22 1 .................. 18 0
27689_count=4 1 .................... 20 0
45599_count=3 1 ...................... 22 0
29_count=20119 1 ....................g. 22 1
42_count=8560 1 ....................g.. 23 1
217_count=1182 1 ....................g 21 1
771_count=273 1 .....................g. 23 1
2650_count=62 1 ....................t. 22 1
2976_count=54 1 .....................t 22 1
3379_count=46 1 .....................g 22 1
3565_count=43 1 ...................a 20 1
4289_count=34 1 ......................t 23 1
5056_count=28 1 ....................g... 24 1
5748_count=24 1 .....................g.. 24 1
5990_count=23 1 ....................t 21 1
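The cluster header lines in this alignment listing are runs of key=value tokens, so they can be read mechanically. The following is a minimal Python sketch, assuming whitespace-separated tokens. The function names `parse_cluster_header` and `parse_read_id` are hypothetical and not part of the report's pipeline, and reading the number after `count=` as the copy number of the read is an assumption.

```python
import re


def parse_cluster_header(line):
    """Split a 'clusterNo=... chr=... ...' header into a dict of key -> value."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = int(value) if value.isdigit() else value
    return fields


def parse_read_id(read_id):
    """Split an ID such as '275_count=837' into (index, copy number).

    Assumption: the leading integer is the index of the sequ seq and the
    integer after 'count=' is its copy number.
    """
    match = re.fullmatch(r"(\d+)_count=(\d+)", read_id)
    if match is None:
        raise ValueError(f"unexpected read ID: {read_id!r}")
    return int(match.group(1)), int(match.group(2))


# Header of cluster 2, copied from the listing above.
header = parse_cluster_header(
    "clusterNo=2 chr=1 gi=NT_032977 strand=1 #mirs=5 #copy(all)=156 "
    "#family(all)=7 #copy(0error)=137 #family(0error)=4 "
    "#copy(1error)=19 #family(1error)=3"
)
```

In the headers shown here, `#copy(0error)` plus `#copy(1error)` adds up to `#copy(all)`, which gives a quick sanity check on a parsed header.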

sample1_FHG_gp1_miRlist

#sequ_seq_ID seq length

clusterNo=1: miR‐40e‐3p‐456

275_count=837 AAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA 3



10481_count=12 CTAAATTGTCGTCCGAACGACC 22 CTAAATTGTCGTCCGAACGACCCA 1

16177_count=7 CTAAATTGTCGTCCGAACGACCCA 25 CTAAATTGTCGTCCGAACGACCCA 1

18204_count=7 CTAAATTGTCGTCCGAACGACC 22 CTAAATTGTCGTCCGAACGACCCA 1

23103_count=5 TAAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA 2


40840_count=3 TAAATTGTCGTCCGAACGACCCA 24 CTAAATTGTCGTCCGAACGACCCA 2

47403_count=3 AAATTGTCGTCCGAACGAC 19 CTAAATTGTCGTCCGAACGACCCA 3


37319_count=3 CTAAATTGTCGTCCGAACGA 20 CTAAATTGTCGTCCGAACGACCCA 1

3

clusterNo=1: miR‐40e‐5p‐4567 3

1341_count=133 AAGAGTGCGTTGATTGTGGGTA 22 TCAAGAGTGCGTTGATTGTGGGTA 3

2672_count=61 CAAGAGTGCGTTGATTGTGGG 21 TCAAGAGTGCGTTGATTGTGGGTA 2

27623_count=4 AAGAGTGCGTTGATTGTGG 19 TCAAGAGTGCGTTGATTGTGGGTA 3

28829_count=4 AAGAGTGCGTTGATTGTGGGT 21 TCAAGAGTGCGTTGATTGTGGGTA 3

32112_count=4 AAGAGTGCGTTGATTGTGGG 20 TCAAGAGTGCGTTGATTGTGGGTA 3

1111_count=168 CAAGAGTGCGTTGATTGTGGGT 22 TCAAGAGTGCGTTGATTGTGGGTA 2


6959_count=19 AAGAGTGCGTTGATTGTGGGT 21 TCAAGAGTGCGTTGATTGTGGGTA 3

10215_count=12 TCAAGAGTGCGTTGATTGTGG 21 TCAAGAGTGCGTTGATTGTGGGTA 1

12905_count=9 TCAAGAGTGCGTTGATTGTGGGT 23 TCAAGAGTGCGTTGATTGTGGGTA 1

19877_count=6 AAGAGTGCGTTGATTGTGGG 20 TCAAGAGTGCGTTGATTGTGGGTA 3

42447_count=3 CAAGAGTGCGTTGATTGTGGG 21 TCAAGAGTGCGTTGATTGTGGGTA 2




3

clusterNo=2: miR‐40c‐5p‐4564 2

1630_count=106 AGTGGAGAGTGCCGCGTGTCTCG 24 GAGTGGAGAGTGCCGCGTGTCTCG 2

7100_count=19 GAGTGGAGAGTGCCGCGTGTCTC 23 GAGTGGAGAGTGCCGCGTGTCTCG 1

16195_count=7 AGTGGAGAGTGCCGCGTGTCTC 22 GAGTGGAGAGTGCCGCGTGTCTCG 2



44145_count=3 GAGTGGAGAGTGCCGCGTGTCTCG 25 GAGTGGAGAGTGCCGCGTGTCTCG 1

2

clusterNo=2: miR‐40c‐3p‐57674 3

23678_count=5 ATCGCAGAATGCGCCTTGAT 22 CATCGCAGAATGCGCCTTGAT 2

1

clusterNo=3: miR‐3456‐3p‐46778 3

13218_count=9 AACTAGCGGTCTCTTTCGCGT 21 AACTAGCGGTCTCTTTCGCGTGGA 1

9642_count=13 ACTAGCGGTCTCTTTCGCGTGG 22 AACTAGCGGTCTCTTTCGCGTGGA 2

sample1_FHG_gp1_Sum

input file: sample1/mirAlign/sample1_FHG_Align.txt for mirs

input file: sample1/db/sample1_FHG_db.fa for sequ seq

input file: sample1/output/sample1_FHG_gp1.txt for cluster & genomeSeq

Title: Position clusters of mirs mapping to genome

input file: sample1/lists/sample1_FHG_Align_chrY.txt for sequence start position

input file: sample1/mirAlign/sample1_FHG_matchedMirs.fa for mapped mir IDs

input file: sample1/lists/sample1_FHG_Align_chr1.txt for alignment data






















input file: sample1/lists/sample1_FHG_Align_chrX.txt for alignment data

input file: sample1/lists/sample1_FHG_Align_chrY.txt for alignment data

# of unique mammalian mirs mapped by sequ seq: 6456

# of unique mammalian mirs mapped by sequ seq & genome: 5256; 5256/6456=85.6%

# of position clusters of the mapped mirs: 574

length of the genome: 26277776868

position cluster: the distance between two neighboring positions in a cluster is < 50
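The position-cluster rule above can be sketched in Python as follows. This is an illustration only: it assumes the rule means that sorted start positions are chained into one cluster while each gap to the previous position is below 50. The function name `cluster_positions` and the example positions are hypothetical, and the pipeline's own implementation may differ.

```python
def cluster_positions(positions, max_gap=50):
    """Group positions so that neighbours within a cluster are < max_gap apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < max_gap:
            clusters[-1].append(pos)  # close enough to the previous position
        else:
            clusters.append([pos])    # gap too large: start a new cluster
    return clusters


# Hypothetical start positions, not taken from the report.
example = cluster_positions([100, 120, 165, 400, 449, 1000])
```

With these made-up positions the gaps of 20, 45 and 49 stay inside a cluster, while the gaps of 235 and 551 start new ones.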

For the mirs mapped by sequ seq and genome after re‐alignment:

# of position clusters: 457

# of mammalian mirs mapped to sequ seq and genome: 4839

# of unique sequ seq: 5381077

# of unique sequ family: 5330

Note: the representative miR at 5p or 3p is the sequenced sequence with the lowest error# and the highest copy# in each cluster.

The miR_name is composed of 4 parts: miR, extension number, 5p or 3p, index of sequ seq.

The extension number is the extension number in the mirIDs of highest occurrence.

# of unique miRs detected: 563; # of unique miRs is counted based on miR_name.
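The four-part miR_name described above can be split with a small parser. This is a sketch under stated assumptions: the names `MiRName` and `parse_mir_name` are hypothetical, and the pattern accepts both the ASCII hyphen and the U+2010 hyphen that appears in the names printed in this report.

```python
import re
from typing import NamedTuple


class MiRName(NamedTuple):
    prefix: str      # "miR"
    extension: str   # extension number, e.g. "30e"
    arm: str         # "5p" or "3p"
    seq_index: int   # index of the sequ seq


# Accept ASCII '-' as well as the U+2010 hyphen used in the report's tables.
_HYPHENS = "-\u2010"
_PATTERN = re.compile(
    rf"(miR)[{_HYPHENS}](\w+)[{_HYPHENS}]([35]p)[{_HYPHENS}](\d+)"
)


def parse_mir_name(name):
    """Split a name such as 'miR-30e-5p-275' into its four parts."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a four-part miR_name: {name!r}")
    prefix, extension, arm, index = match.groups()
    return MiRName(prefix, extension, arm, int(index))
```

For example, `parse_mir_name("miR-221-3p-5")` yields the extension `"221"`, the arm `"3p"` and the sequ seq index `5`.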

Index | clusterNo | miR_name | miR_seq | miR_len | copy# of the isoform | copy#(all isoforms in 5p or 3p) | family#(all isoforms in 5p or 3p) | chr# | chr_seqID | strand | miR_start | miR_end | mir_start | mir_end | #mirs | mirIDs

1 1 miR‐30e‐5p‐275 TGTAAACATCCTTGACTGGAAGCT 24 5779 4212 22 1 NT_032977 + 11191961 11191984 11191944 11192040 7 hsa‐mir‐XXX

2 20 miR‐28‐3p‐2238 CACTAGATTGTGAGCTCCTGGA 22 107 167 7 3 NT_005612 + 94901772 94901793 94901718 94901803 11 hsa‐mir‐XXX

3 57 miR‐27b‐3p‐127 TTCACAGTGGCTAAGTTCTGC 21 5448 3137 152 9 NT_008470 + 5168992 5169012 5168931 5169027 6 hsa‐mir‐XXX

4 87 miR‐625‐3p‐23720 GACTATAGAACTTTCCCCCTCA 22 230 128 2 14 NT_026437 + 46937624 46937645 46937572 46937656 1 hsa‐mir‐XXX

5 98 miR‐1839‐5p‐21498 AGGGTAGATAGAACAGGTCTTG 22 140 274 3 15 NT_077661 + 545115 545136 545108 545179 3 hsa‐mir‐XXX

6 107 miR‐21‐3p‐413 CAACACCAGTCGATGGGCTGTC 22 9100 4694 129 17 NT_010783 + 16571733 16571754 16571677 16571768 6 hsa‐mir‐XXX

7 117 miR‐7e‐3p‐37414 CTATACGGCCTCCTAGCTTTCC 22 253 190 1 19 NT_011109 + 24464281 24464302 24464221 24464313 4 hsa‐mir‐XXX

8 159 miR‐101‐3p‐121 GTACAGTACTGTGATAACTGAA 22 9940 3692 81 1 NT_032977 ‐ 35496044 35496065 35496031 35496113 6 hsa‐mir‐XXX

9 168 miR‐135b‐3p‐17671 ATGTAGGGCTAAAAGCCATGGG 22 110 85 1 1 NT_004487 ‐ 55907805 55907826 55907783 55907879 6 hsa‐mir‐XXX

10 179 miR‐425‐3p‐5257 ATCGGGAATGTCGTGTCCGCC 21 222 239 277 3 NT_022517 ‐ 48997597 48997617 48997584 48997670 8 hsa‐mir‐XXX

11 196 miR‐548p‐3p‐22151 CCAAAACTGCAGTTACTTTTGC 22 245 189 1 5 NT_034772 ‐ 2567212 2567233 2567198 2567281 2 hsa‐mir‐XXX

12 225 miR‐31‐5p‐412 AGGCAAGATGCTGGCATAGCTG 22 1906 5007 272 9 NT_008413 ‐ 21502156 21502177 21502099 21502201 6 hsa‐mir‐XXX

13 237 miR‐548l‐5p‐14118 AAAAGTATTTGCGGGTTTTGTC 22 259 112 3 11 NT_008984 ‐ 6461333 6461354 6461282 6461367 2 hsa‐mir‐XXX

14 238 miR‐125b‐5p‐90 TCCCTGAGACCCTAACTTGTGA 22 11592 9859 157 11 NT_033899 ‐ 25532933 25532954 25532880 25532967 7 hsa‐mir‐XXX

15 245 miR‐1293‐5p‐5124 TCTGGGTGGTCTGGAGATTTGTG 23 207 51 16 12 NT_029419 ‐ 12771272 12771294 12771230 12771300 2 hsa‐mir‐XXX

16 248 miR‐16‐3p‐9924 CCAGTATTAACTGTGCTGCTGA 22 104 79 2 13 NT_024524 ‐ 31603122 31603143 31603108 31603199 8 hsa‐mir‐XXX

17 255 miR‐1910‐3p‐10089 GAGGCAGAAGCAGGATGACAA 21 54 181 2 16 NT_010498 ‐ 39389435 39389455 39389425 39389504 1 hsa‐mir‐XXX

18 260 miR‐33b‐5p‐3282 GTGCATTGCTGTTGCATTGCA 21 304 74 4 17 NT_010718 ‐ 17314559 17314579 17314498 17314593 5 hsa‐mir‐XXX

19 265 miR‐454‐3p‐8695 TAGTGCAATATTGCTTATAGGGTTT 25 114 259 3 17 NT_010783 ‐ 15868207 15868231 15868179 15868293 6 hsa‐mir‐XXX

20 270 miR‐320‐3p‐1624 AAAAGCTGGGTTGAGAGGG 19 107 226 9 19 NT_011109 ‐ 19480769 19480787 19480765 19480819 1 hsa‐mir‐XXX

21 276 miR‐548j‐5p‐14226 AAAAGTAATTGCGGTCTTTGGT 22 220 298 1 22 NT_011520 ‐ 6341809 6341830 6341746 6341857 1 hsa‐mir‐XXX

22 277 miR‐659‐5p‐34577 AGGACCTTCCCTGAACCAAGGA 22 82 222 1 22 NT_011520 ‐ 17634254 17634275 17634199 17634295 1 hsa‐mir‐XXX

23 278 miR‐1249‐3p‐24401 ACGCCCTTCCCCCCCTTCTTCA 22 116 103 1 22 NT_011523 ‐ 867545 867566 867540 867605 3 hsa‐mir‐XXX

24 279 miR‐221‐5p‐816 ACCTGGCATACAATGTAGATTTCT 24 256 585 299 X NT_079573 ‐ 8457414 8457437 8457351 8457460 9 hsa‐mir‐XXX

25 279 miR‐221‐3p‐5 AGCTACATTGTCTGCTGGGTTTC 23 6612 9147 231 X NT_079573 ‐ 8457375 8457397 8457351 8457460 9 hsa‐mir‐XXX

26 280 miR‐222‐5p‐4253 CTCAGTAGCCAGTGTAGATCC 21 101 271 5 X NT_079573 ‐ 8458247 8458267 8458187 8458296 6 hsa‐mir‐XXX

27 280 miR‐222‐3p‐23 AGCTACATCTGGCTACTGGGTCTC 24 6978 8658 242 X NT_079573 ‐ 8458206 8458229 8458187 8458296 6 hsa‐mir‐XXX

28 281 miR‐548i‐5p‐38320 AAAAGTACTTGCGGATTTTGC 21 247 99 1 X NT_011651 ‐ 6777114 6777134 6777067 6777143 1 hsa‐mir‐XXX

29 282 miR‐361‐5p‐1854 TTATCAGAATCTCCAGGGGTAC 22 126 273 7 X NT_011651 ‐ 8454994 8455015 8454948 8455019 4 hsa‐mir‐XXX

30 282 miR‐361‐3p‐17619 TCCCCCAGGTGTGATTCTGATTT 23 173 59 3 X NT_011651 ‐ 8454954 8454976 8454948 8455019 4 hsa‐mir‐XXX

31 283 miR‐421‐3p‐3255 ATCAACAGACATTAATTGGGCGC 23 186 54 6 X NT_011669 ‐ 11756215 11756237 11756199 11756283 6 hsa‐mir‐XXX

sample1_FHG_clusterPosition

input from sample1/0finalReport/3_sample1_FHG_gp1_Sum.txt



The position refers to the start position of the miR in the human genome. ESTs are added one by one after chromosome X.

The PositionInSeq refers to the position of the miR in its own contig sequence or EST.

Refer to the above three input files for the definitions of miR_name & miR_seq.

The clusterDistance is the difference between the positions of the current and previous clusters.

Minimum c 1

Maximum c 19010493
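The clusterDistance definition above can be illustrated with a short Python sketch. The function name `cluster_distances` and the positions are hypothetical. The sketch assumes the positions are already sorted and leaves the first cluster without a distance, since it has no predecessor.

```python
def cluster_distances(positions):
    """Return, for each cluster, its position minus the previous position.

    The first cluster has no previous cluster, so its distance is None.
    """
    if not positions:
        return []
    distances = [None]
    for previous, current in zip(positions, positions[1:]):
        distances.append(current - previous)
    return distances


# Hypothetical, already sorted cluster positions (not taken from the report).
positions = [1000, 1040, 5000, 5001]
distances = cluster_distances(positions)

# Smallest and largest distance over the clusters that have a predecessor.
known = [d for d in distances if d is not None]
smallest, largest = min(known), max(known)
```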

Index | Position | clusterDistance | #Copy (all isoforms in 5p or 3p) | #family (all isoforms in 5p or 3p) | Chr# | Strand | StartPositionInSeq | EndPositionInSeq | Type | unique_mirs | miR_name | miR_seq

1 753552 3 1 1 + 75242 75266 predict (gp4) PC‐5p‐XXXXX GCGGCACTGAGGCTTATAGCGGAA

2 866783 323529 5 1 1 ‐ 411322 411346 predict (gp4) PC‐3p‐XXXXX TGTACGGCCATCCAGCTCTAGGCC

3 1209671 2297742 5 1 1 + 933784 933805 predict (gp4) PC‐5p‐XXXXX GGAATAGCACATCAAGTAGGT

4 1435145 92222 3 1 1 ‐ 1303036 1303059 predict (gp4) PC‐5p‐XXXXX CACGGCCATTAGACGACGCCGGG

5 1623385 1738943 4 1 1 ‐ 1708079 1708102 predict (gp4) PC‐3p‐XXXXX TGTACGTAAAGTGACTCCACTAA

6 1965183 1297740 2 1 1 ‐ 2074417 2074440 predict (gp4) PC‐5p‐XXXXX TGATAGGCCCTACTGTCCATGTT

7 2345903 1377652 6 3 1 ‐ 2594777 2594798 known (gp1) hsa‐mir‐XXX miR‐XXX‐5p‐XXX TATGCCAGGCAGTTATACCAT

8 2823957 44 67 4 1 ‐ 3013390 3013415 known (gp1) hsa‐mir‐XXX miR‐XXX‐5p‐XXX CAAGCGTTTGTCAACAAAGTGTTGA

9 3358837 5278182 3 1 1 + 3184989 3185008 predict (gp4) PC‐5p‐XXXXX TTGTCCGATTATGTGCTCG

10 3839822 48398 5675 25 1 + 3621919 3621940 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐5p‐XXX CCGCGACGTTTTCGGGACCGA

11 4097536 42 456 15 1 + 3921607 3921624 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐3p‐XXX CCAAACTCGCGAACTAG

12 4440731 2887 532 6 1 + 4464521 4464546 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐5p‐XXX GAACTCTACGAATCATCCTAGTATG

13 4768272 39 2 1 1 + 4883828 4883848 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐3p‐XXX GCTGCCTCCGTACGATGCTA

14 5218520 761034 5 1 1 + 5090794 5090816 predict (gp4) PC‐5p‐XXXXX ATCTAATGTGGGTGACACTGGT

15 5508289 1445882 78 2 1 + 5337199 5337222 predict (gp4) PC‐5p‐XXXXX GGGGTTTAGGGTACCCGCTTCTG

16 6021793 1814174 2 1 1 + 5444635 5444659 predict (gp4) PC‐5p‐XXXXX TGTTGAGCGATTGCATGCAACTTA

17 6460209 2534403 56 1 1 ‐ 5666243 5666263 predict (gp4) PC‐5p‐XXXXX TGGGTGCGTGTGGTCACGTC

18 6947656 229060 21 1 1 ‐ 5768254 5768271 predict (gp4) PC‐5p‐XXXXX TGCGCGTCTTTATTATC

19 7422268 274643 4 1 1 ‐ 6044844 6044868 predict (gp4) PC‐5p‐XXXXX CCGTGATTGGACCGTCGCGTTCGT

20 7922269 1548217 56 1 1 ‐ 6305595 6305619 predict (gp4) PC‐5p‐XXXXX CACTGCCGAACGATCTGTGATTCC

21 8458859 3960774 7 1 1 + 6681929 6681953 predict (gp4) PC‐3p‐XXXXX TTCACGCTGGGTTATATCTCTCGC

22 8775378 8753269 8 1 1 ‐ 7051908 7051926 predict (gp4) PC‐5p‐XXXXX CCTCTCCTGGTTAGTCCA

23 9051642 36 8 1 1 + 7435397 7435420 predict (gp4) PC‐3p‐XXXXX CCAAGCAGTCTGGCATCTTATGC

24 9312384 2979623 4689 90 1 ‐ 7740471 7740495 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐3p‐XXX GTATCGCTATCGCCCAGAGCGTCG

25 9486554 3393776 2 1 1 + 7895151 7895168 predict (gp4) PC‐5p‐XXXXX CACCCAAGGACCCCGCC

26 9802033 2615459 45 1 1 ‐ 8392277 8392297 known (gp1) ssc‐mir‐XXX miR‐XXX‐3p‐XXX GCGCCTTCCGCCGATTTTGT

27 10066816 9061509 7 1 1 ‐ 8528349 8528366 predict (gp4) PC‐3p‐XXXXX ACGATACTGTACTCGGG

28 10365246 7862322 34 1 1 ‐ 8997137 8997156 predict (gp4) PC‐5p‐XXXXX CCAAGAGGTGTGTTGAGCA

29 10746251 371399 76 1 1 ‐ 9300237 9300258 predict (gp4) PC‐5p‐XXXXX AGTTTTCGCACGGCGTGTCAT

30 11034399 5033786 8 1 1 + 9700728 9700750 predict (gp4) PC‐5p‐XXXXX TGCTTATGCAGCTTTGTAGCCT

31 11125629 450060 34 3 1 + 9804588 9804611 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐5p‐XXX AAGGCGGGTCTACTAAGGGGAGC

32 11355321 4198426 76 1 1 ‐ 10112901 10112920 predict (gp4) PC‐5p‐XXXXX GAAACCAGCTAAGCAATGC

33 11850572 785 4 1 1 ‐ 10612103 10612123 known (gp1) mmu‐mir‐XXX;mml‐mir‐XXX;b miR‐XXX‐5p‐XXX GGGCATAACTGTGGGCTGAC

34 11947742 7938479 6456 6 1 + 11156668 11156693 predict (gp4) PC‐5p‐XXXXX TTGAGGTCCGTTCCTCAGTCGACCT

35 12107675 906267 3 1 1 ‐ 11576747 11576770 predict (gp4) PC‐3p‐XXXXX TAGAGGTAGCCACAAGGATAGCG

Table 1 - Data summary

| Raw Data | #SequSeq | #UniqueSeq |
|---|---|---|
| # Raw Sequ seq | 9,922,513 | 1,592,666 |

| Data Processing | #SequSeq | %SequSeq | #UniqueSeq | %UniqueSeq |
|---|---|---|---|---|
| 1. "impurity" sequences filtered | 948,723 | 9.6% | 623,412 | 39.1% |
| 2. Copy#<3 filtered | 913,248 | 9.2% | 845,666 | 53.1% |
| 3. Length < 15 filtered | 204,858 | 2.1% | 56,788 | 3.6% |
| 4. mRNA,RFam,Repbase filtered | 362,645 | 3.7% | 11,345 | 0.7% |
| 5. Final Mappable | 7,493,039 | 75.5% | 55,455 | 3.5% |
| Total | 9,922,513 | 100% | 1,592,666 | 100% |
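Steps 2 and 3 of the data processing in Table 1 are simple threshold filters. The sketch below is a hedged illustration rather than the pipeline's own code: it assumes unique sequences are held in a dictionary mapping each sequence to its copy number, and the name `filter_unique_seqs` and the example reads are hypothetical. The "impurity" and database filters (steps 1 and 4) are not shown.

```python
MIN_COPY = 3     # step 2 removes unique sequences with copy# < 3
MIN_LENGTH = 15  # step 3 removes sequences shorter than 15 nt


def filter_unique_seqs(copy_numbers):
    """Keep unique sequences that pass the copy-number and length thresholds.

    copy_numbers maps each unique sequence to its copy number.
    """
    kept = {}
    for seq, copies in copy_numbers.items():
        if copies < MIN_COPY:
            continue  # too few copies
        if len(seq) < MIN_LENGTH:
            continue  # too short
        kept[seq] = copies
    return kept


# Hypothetical reads for illustration only.
reads = {
    "TGTAAACATCCTTGACTGGAAGCT": 5779,  # kept
    "TTCACAGTGGCTAAGTTCTGC": 2,        # dropped: copy# < 3
    "ACGTACGTACGT": 40,                # dropped: length 12 < 15
}
kept = filter_unique_seqs(reads)
```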

Table 2 - Length distribution of mappable data

| Length | #SequSeq | %FinalMappable SequSeq | #UniqueSeq | %FinalMappable UniqueSeq | #SequSeq/#UniqueSeq |
|---|---|---|---|---|---|
| 15 | 52,742 | 0.7% | 945 | 1.7% | 55.8 |
| 16 | 60,120 | 0.8% | 1,159 | 2.1% | 51.9 |
| 17 | 64,128 | 0.9% | 978 | 1.8% | 65.6 |
| 18 | 57,106 | 0.8% | 1,005 | 1.8% | 56.8 |
| 19 | 53,125 | 0.7% | 883 | 1.6% | 60.2 |
| 20 | 78,317 | 1.0% | 1,579 | 2.8% | 49.6 |
| 21 | 454,462 | 6.1% | 3,907 | 7.0% | 116.3 |
| 22 | 3,438,602 | 45.9% | 16,261 | 29.3% | 211.5 |
| 23 | 2,810,150 | 37.5% | 8,934 | 16.1% | 314.5 |
| 24 | 249,047 | 3.3% | 2,037 | 3.7% | 122.3 |
| 25 | 41,676 | 0.6% | 947 | 1.7% | 44.0 |
| 26 | 3,941 | 0.1% | 445 | 0.8% | 8.9 |
| 27 | 2,570 | 0.0% | 103 | 0.2% | 25.0 |
| 28 | 2,592 | 0.0% | 94 | 0.2% | 27.6 |
| 29 | 4,980 | 0.1% | 120 | 0.2% | 41.5 |
| 30 | 1,823 | 0.0% | 82 | 0.1% | 22.2 |
| 40 | 117,658 | 1.6% | 15,976 | 28.8% | 7.4 |
| Final Mappable | 7,493,039 | 100.0% | 55,455 | 100.0% | 135.1 |
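The last column of Table 2 is the number of reads divided by the number of unique sequences at each length. A minimal Python sketch of how such a table could be tabulated follows. It assumes a dictionary of unique sequence to copy number; the name `length_distribution` and the input data are hypothetical.

```python
from collections import defaultdict


def length_distribution(copy_numbers):
    """Return {length: (#SequSeq, #UniqueSeq, #SequSeq/#UniqueSeq)}."""
    reads = defaultdict(int)
    uniques = defaultdict(int)
    for seq, copies in copy_numbers.items():
        reads[len(seq)] += copies   # every copy counts as one read
        uniques[len(seq)] += 1      # each distinct sequence counts once
    return {
        length: (
            reads[length],
            uniques[length],
            round(reads[length] / uniques[length], 1),
        )
        for length in sorted(reads)
    }


# Hypothetical input: two unique 15-mers and one unique 22-mer.
table = length_distribution({"A" * 15: 10, "C" * 15: 4, "G" * 22: 100})

# The same ratio computed from two rows of Table 2.
ratio_len_15 = round(52742 / 945, 1)
ratio_final = round(7493039 / 55455, 1)
```

The two ratios computed from the table values agree with the 55.8 and 135.1 printed in the length-15 and Final Mappable rows.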

Table 3 - Unique seq mapped to unique mammalian mirs

# Known mammalian unique mir in miRBase v14.0: 3,924
# Known mammalian unique miR in miRBase v14.0: 2,656
# Unique mir in miRBase mapped: 1,599
# Unique miR in miRBase mapped: 2,273

| | #SequSeq | %SequSeq | %FinalMappable (Sequ Seq) | #UniqueSeq | %UniqueSeq | %FinalMappable (Unique Seq) |
|---|---|---|---|---|---|---|
| Mapped to miRBase | 6,013,453 | 60.60% | 80.25% | 12,456 | 0.78% | 22.46% |

# Known hsa mir in miRBase v14.0: 717
# Known hsa miR in miRBase v14.0: 894
# Unique hsa mir in miRBase mapped: 286
# Unique hsa miR in miRBase mapped: 399

| | #SequSeq | %SequSeq | %FinalMappable (Sequ Seq) | #UniqueSeq | %UniqueSeq | %FinalMappable (Unique Seq) |
|---|---|---|---|---|---|---|
| Mapped to hsa of miRBase | 5,937,837 | 59.84% | 79.24% | 8,392 | 0.53% | 15.13% |

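Table 4 - Cluster I: Sequ seq mapped to mammalian mirs that further mapped to genome

Mapping to mammalian:

| | #SequSeq | %SequSeq | %FinalMappable (Sequ Seq) | #UniqueSeq | %UniqueSeq | %FinalMappable (Unique Seq) |
|---|---|---|---|---|---|---|
| Cluster I | 5,734,742 | 57.80% | 76.53% | 8,335 | 0.52% | 15.03% |

# Alignment-cluster in Cluster I: 389
# Unique mir in Cluster I: 1,456
# Unique miR in Cluster I: 309

FileName Mapped_Data/sample1_FHG_gp1_Align.txt
FileName Mapped_Data/sample1_FHG_gp1_Sum.txt
FileName Mapped_Data/sample1_FHG_gp1_miRlist.txt

Mapping to species hsa:

| | #SequSeq | %SequSeq | %FinalMappable (Sequ Seq) | #UniqueSeq | %UniqueSeq | %FinalMappable (Unique Seq) |
|---|---|---|---|---|---|---|
| Cluster I | 5,734,597 | 57.79% | 76.53% | 8,323 | 0.52% | 15.01% |

# Unique hsa mir in Cluster I: 295
# Unique hsa miR in Cluster I: 399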


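Table 5 - Cluster II: Sequ seq mapped to both mammalian mirs and genome, but the mirs unmapped to genome:

Mapping to mammalian:

| | #SequSeq | %SequSeq | %FinalMappable (Sequ Seq) | #UniqueSeq | %UniqueSeq | %FinalMappable (Unique Seq) |
|---|---|---|---|---|---|---|
| Cluster II | 145 | 0.00% | 0.00% | 12 | 0.00% | 0.02% |

# Alignment-cluster in Cluster II: 7
# Unique mir in Cluster II: 3
# Unique miR in Cluster II: 4

Table 6 - Cluster III: Sequ seq mapped to mammalian mirs, but the mirs unmapped to genome (sequence cluster):

Mapping to mammalian:

| | #SequSeq | %SequSeq | %FinalMappable (Sequ Seq) | #UniqueSeq | %UniqueSeq | %FinalMappable (Unique Seq) |
|---|---|---|---|---|---|---|
| Cluster III | 2,753 | 0.03% | 0.04% | 145 | 0.01% | 0.26% |

# Alignment-cluster in Cluster III: 23
# Unique mir in Cluster III: 37
# Unique miR in Cluster III: 18

Mapping to species hsa:

| | #SequSeq | %SequSeq | %FinalMappable (Sequ Seq) | #UniqueSeq | %UniqueSeq | %FinalMappable (Unique Seq) |
|---|---|---|---|---|---|---|
| Cluster III | 0 | 0.00% | 0.00% | 0 | 0.00% | 0.00% |

# Unique hsa mir in Cluster III: 0
# Unique hsa miR in Cluster III: 0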


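Table 7 - Cluster IV: Sequ seq mapped to genome, but unmapped to mammalian mirs (predict new hairpin by mfold):

| | #SequSeq | %SequSeq | %FinalMappable (Sequ Seq) | #UniqueSeq | %UniqueSeq | %FinalMappable (Unique Seq) |
|---|---|---|---|---|---|---|
| Cluster IV | 18,422 | 0.19% | 0.25% | 1,402 | 0.09% | 2.53% |

# Alignment-cluster in Cluster IV: 1,095
# Unique miR in Cluster IV: 703

Table 8 - Unmapped Sequ Seq

| | #SequSeq | %SequSeq | %FinalMappable (Sequ Seq) | #UniqueSeq | %UniqueSeq | %FinalMappable (Unique Seq) |
|---|---|---|---|---|---|---|
| Nohit | 847,339 | 8.54% | 11.31% | 40,012 | 2.51% | 72.15% |

FileName Mapped_Data/sample1_FHG_nohit.txt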

Table 9 - Mapping summary

| | #SequSeq | %SequSeq | #UniqueSeq | %UniqueSeq |
|---|---|---|---|---|
| Raw | 9,922,513 | 100.00% | 1,592,666 | 100.00% |
| Mappable | 7,493,039 | 75.52% | 55,455 | 3.48% |
| Mapped to miRbase (including nohit 1) | 6,013,453 | 60.60% | 12,456 | 0.78% |
| Mapped to Cluster I | 5,734,742 | 57.80% | 8,335 | 0.52% |
| Mapped to Cluster II | 145 | 0.00% | 12 | 0.00% |
| Mapped to Cluster III | 2,753 | 0.03% | 145 | 0.01% |
| Mapped to Cluster IV | 18,422 | 0.19% | 1,402 | 0.09% |
| Mapped (total) | 5,756,062 | 58.01% | 9,894 | 0.62% |
| Nohit (including nohit 1 and nohit 2) | 847,339 | 8.54% | 40,012 | 2.51% |

Note: Mapped (total) + Nohit should equal Mappable.
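The "Mapped (total)" row of Table 9 can be reproduced by summing the four cluster rows, and its percentages by dividing by the Raw row. The Python sketch below uses the counts printed in the table; the variable names are hypothetical.

```python
# (#SequSeq, #UniqueSeq) pairs copied from Table 9.
RAW = (9_922_513, 1_592_666)
CLUSTERS = {
    "Cluster I": (5_734_742, 8_335),
    "Cluster II": (145, 12),
    "Cluster III": (2_753, 145),
    "Cluster IV": (18_422, 1_402),
}

# Column-wise sum over the four clusters.
mapped_total = tuple(sum(column) for column in zip(*CLUSTERS.values()))

# Share of the raw data, in percent, rounded to two decimals as in the table.
pct_of_raw = tuple(
    round(100 * mapped / raw, 2) for mapped, raw in zip(mapped_total, RAW)
)
```

The sums come to 5,756,062 sequ seq and 9,894 unique seq, that is 58.01% and 0.62% of the raw data, matching the "Mapped (total)" row.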

Table 10 - Detected miR summary

| | Count |
|---|---|
| # of unique miRs detected in Cluster I | 309 |
| # of unique miRs detected in Cluster II | 4 |
| # of unique miRs detected in Cluster III | 18 |
| # of unique miRs detected in Cluster IV | 703 |
| Total | 1,034 |

FileName Mapped_Data/sample1_FHG_uni_miRs.txt


Note: Definition of miR_name:

in cluster group1: the miR_name is composed of 4 parts: miR, extension number, 5p or 3p, index of sequ seq. The extension number is the extension number in the mirIDs of highest occurrence.

in cluster group2: the miR_name is composed of 4 parts: PC, extension number, 5p or 3p, index of sequ seq. PC means 'Predicted Candidate'. The extension number is the extension number in the mirIDs of highest occurrence.

in cluster group3: the miR_name is composed of 4 parts: PN, extension number, 5p or 3p, index of sequ seq. PN means 'Predicted Novel miR'. The extension number is the extension number in the mirIDs of highest occurrence.

in cluster group4: the miR_name is composed of 3 parts: PC, 5p or 3p, index of sequ seq. PC means 'Predicted Candidate'.

The miR sequence is from the isoform of lowest error# and highest copy# in cluster groups 1, 2 & 4.

The miR sequence is from the isoform of highest copy#, regardless of its error#, in cluster group 3.

# of unique miRs detected in these 4 groups is counted from different miR_seq.
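The rule for choosing the miR sequence of a cluster can be sketched in Python. This is one reading of the note, not a confirmed implementation: for cluster groups 1, 2 & 4 it assumes the lowest error# is applied first and the highest copy# breaks ties, and for cluster group 3 it takes the highest copy# alone. The function name `pick_representative` and the isoform tuples are hypothetical.

```python
def pick_representative(isoforms, ignore_errors=False):
    """Pick the representative isoform from (sequence, error#, copy#) tuples.

    ignore_errors=False: lowest error#, then highest copy# (groups 1, 2 & 4).
    ignore_errors=True:  highest copy# regardless of error# (group 3).
    """
    if ignore_errors:
        return max(isoforms, key=lambda iso: iso[2])
    return min(isoforms, key=lambda iso: (iso[1], -iso[2]))


# Hypothetical isoforms of one cluster: (sequence, error#, copy#).
isoforms = [
    ("TCAAGAGTGCGTTGATTGTGG", 0, 61),
    ("TCAAGAGTGCGTTGATTGTGGG", 0, 133),
    ("TCAAGAGTGCGTTGATTGTGGGT", 1, 168),
]

default_choice = pick_representative(isoforms)
group3_choice = pick_representative(isoforms, ignore_errors=True)
```

With these made-up isoforms the default rule selects the error-free isoform with 133 copies, while the group 3 rule selects the 168-copy isoform despite its single error.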
