

Page 1: Bioinformatics - Karolinska Institutetsandberg.cmb.ki.se/.../courses/bioinfocell/NGS_bioinformatics_2013.pdf · Bioinformatics in next generation ... 14 6 21 27 5 18 6 15 22 27 18

Bioinformatics in next generation sequencing projects

Rickard Sandberg
Assistant Professor
Department of Cell and Molecular Biology
Karolinska Institutet

May 2013

Thursday, May 16, 13


Standard sequence library generation


Illumina Sequencing Technology


Illumina (Solexa) Sequencing


Illumina paired-end and index-read sequencing


Once sequenced, the problem becomes computational

Computational analysis is the bottleneck
• Rapid improvement in sequencing
• Most projects still need customized analysis


Overview of computational analyses

• Image analysis, base calling
• Primary analyses: mapping (assembly)
• Data type-specific analyses (e.g. peak calling, calculating expression)
• Custom project-specific analyses

[Figure labels: ChIP-Seq peak calling; RNA-Seq expression levels; genome sequence; assembled contig]


Preliminary Analyses

Raw images (TB)

Platform-specific analysis using the vendor's programs (Real Time Analysis)

Sequences and quality scores, text file (GB)


Sequenced reads

Fasta file:

>EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC

Fastq file:

@HWI-EAS269:1:120:1786:18#0/1                          (read identifier)
GAACTCTGCCTTTTTCAGTGATGAGGAAAGGAGTTCTCTCTGGTCCCCAG
+HWI-EAS269:1:120:1786:18#0/1
aaab^_U_aa[U[_Z]a`WU_^X`GT^_\TM^^\___\Z\YQVVXUBBBB     (quality scores)

SOLiD, csfasta file:

>1_39_146_F3
T22100200202311030112002022222002021
>1_39_194_F3
T11022322003020303320012223122202221

SOLiD, QV file:

>1_39_146_F3
14 6 21 27 5 18 6 15 22 27 18 17 14 18 26 15 24 19 18 18 8 20 17 12 20 6 14 13 23 6 11 12 7 13 4
>1_39_194_F3
26 27 16 27 23 22 23 25 22 10 5 21 4 17 20 26 26 17 25 27 23 25 14 24 26 4 4 4 4 4 4 4 4 4 14
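The four-line Fastq layout on this slide (identifier, sequence, "+" line, quality string) can be read with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the quality offset of 64 is an assumption (it fits older Illumina pipelines; many other files use an offset of 33). The record is the one from the slide, with the identifier written without spaces.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: str


def parse_fastq(lines: List[str]) -> Iterator[FastqRecord]:
    """Yield one record per four lines: @id, sequence, +[id], quality."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        yield FastqRecord(header[1:], seq, qual)


def phred_scores(quality: str, offset: int = 64) -> List[int]:
    """Decode an ASCII quality string; offset=64 is an assumption, see above."""
    return [ord(ch) - offset for ch in quality]


# Record from the slide (raw string, because the quality line contains backslashes).
EXAMPLE = [
    "@HWI-EAS269:1:120:1786:18#0/1",
    "GAACTCTGCCTTTTTCAGTGATGAGGAAAGGAGTTCTCTCTGGTCCCCAG",
    "+HWI-EAS269:1:120:1786:18#0/1",
    r"aaab^_U_aa[U[_Z]a`WU_^X`GT^_\TM^^\___\Z\YQVVXUBBBB",
]

record = next(parse_fastq(EXAMPLE))
scores = phred_scores(record.quality)
print(record.identifier, len(record.sequence), scores[:5])
```

Under the offset-64 assumption, the trailing run of "B" characters decodes to very low scores, which is the kind of 3' quality drop the FastQC plots later in the deck are meant to reveal.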


Phred Quality Score, Q

Each base call has an estimate of the probability of being wrong (error probability, p)

Q = -10 * log10(p)

Phred Quality Score    Probability of incorrect base call    Base call accuracy
10                     1 in 10                               90 %
20                     1 in 100                              99 %
30                     1 in 1000                             99.9 %
40                     1 in 10000                            99.99 %
50                     1 in 100000                           99.999 %
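The relation Q = -10 * log10(p) and its inverse p = 10^(-Q/10) can be checked numerically. The sketch below uses illustrative function names and reproduces the rows of the table.

```python
import math


def phred_from_error(p: float) -> float:
    """Q = -10 * log10(p), with p the probability that the base call is wrong."""
    return -10.0 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Inverse relation: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


for q in (10, 20, 30, 40, 50):
    p = error_from_phred(q)
    print(f"Q{q}: 1 in {round(1 / p)} wrong, accuracy {100 * (1 - p):.3f} %")
```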


FastQ encodings


Fastq quality control (FastQC)

Video tutorial: http://www.youtube.com/watch?v=bz93ReOv87Y


Quality scores for each sequence position


Quality scores for each sequence position: a good run


GC for reads


Percent A,C,G,T at each position
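The quantity plotted on this slide can be computed directly from the read sequences. The following is a rough sketch, not FastQC's own code; the function name and the toy reads are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def base_composition(reads: List[str]) -> List[Dict[str, float]]:
    """Percent A, C, G, T at each read position (reads may differ in length)."""
    longest = max((len(r) for r in reads), default=0)
    result = []
    for pos in range(longest):
        counts = Counter(r[pos].upper() for r in reads if len(r) > pos)
        total = sum(counts[b] for b in "ACGT")  # N and other symbols are ignored
        result.append(
            {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return result


toy_reads = ["ACGT", "ACGA", "TCGA", "ACG"]  # hypothetical reads
composition = base_composition(toy_reads)
print(composition[0])
```

In a random library the four percentages should stay roughly flat along the read; strong position-specific deviations are worth a closer look.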


Relative enrichment of k-mers


Overview of computational analyses

• Image analysis, base calling
• Primary analyses: mapping, assembly
• Data type-specific analyses (e.g. peak calling, calculating expression)
• Custom project-specific analyses

[Figure labels: ChIP-Seq peak calling; RNA-Seq expression levels; genome sequence; assembled contig]


Short Read Assembly

Velvet and SOAPdenovo: de novo genomic assemblers specially designed for short-read sequencing technologies

Nature 2009


Two principal approaches for transcriptome reconstruction


Genome-independent transcriptome reconstruction

Grabherr et al. Nature Biotechnology, July 2011

Default k = 25
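To make the role of k concrete: genome-independent assemblers break every read into overlapping words of length k and link words that follow each other. The sketch below is a generic de Bruijn-style illustration with made-up function names, not the implementation from the cited paper; only the default k = 25 is taken from the slide.

```python
from collections import defaultdict
from typing import Dict, List, Set


def kmers(read: str, k: int = 25) -> List[str]:
    """All overlapping substrings of length k (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def kmer_graph(reads: List[str], k: int = 25) -> Dict[str, Set[str]]:
    """Link each k-mer to the k-mer that follows it (k-1 overlap) in some read."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        parts = kmers(read, k)
        for left, right in zip(parts, parts[1:]):
            graph[left].add(right)
    return dict(graph)


print(kmers("ACGTAC", k=4))
print(kmer_graph(["ACGTA", "CGTAC"], k=4))
```

A smaller k gives more overlaps between reads (helpful at low coverage) but also more ambiguous links in repeats, which is why k is exposed as a parameter.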


Finding novel non-annotated genes or transcript variants


Mapping of millions of short reads

Task: Map millions of short sequences (25-100 nt) onto a genome (3,000 Mbp) or transcriptome

Mismatches (sequencing errors and SNPs)

Unique / Repetitive matches

Indels (Normal variation, CNVs)

Large rearrangements (translocations)

BLAST and BLAT were not designed for these tasks


Mapping of RNA-Seq reads

Garber et al. 2011 Nat Methods

STAR


Mapping of splice junctions

1. Compile sets of junctions

Exon n GTAAGT-----------AG Exon n+1

2. Map reads towards genome + junction compilation

Genome chromosome Fasta files + known and putative splice junctions Fasta file
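Step 1, compiling a junction set, amounts to joining the end of one exon to the start of a downstream exon and writing the result to a Fasta file. The sketch below shows the idea under stated assumptions: exon coordinates are 0-based and half-open, all exons lie on the same strand, and each flank is read_length - 1 bases so that a read can only match if it actually crosses the junction. The function name and the toy genome are hypothetical.

```python
from typing import Dict, List, Tuple


def junction_library(
    genome: str, exons: List[Tuple[int, int]], read_length: int
) -> Dict[str, str]:
    """Join each exon end to the start of every downstream exon.

    exons: (start, end) pairs, 0-based half-open, sorted by start.
    Flanks are clipped at exon boundaries (a simplification).
    """
    flank = read_length - 1
    library = {}
    for i, (donor_start, donor_end) in enumerate(exons):
        left = genome[max(donor_end - flank, donor_start):donor_end]
        for acceptor_start, acceptor_end in exons[i + 1:]:
            right = genome[acceptor_start:min(acceptor_start + flank, acceptor_end)]
            library[f"{donor_end}_{acceptor_start}"] = left + right
    return library


# Toy example: exon, intron starting GTAAGT and ending AG, exon.
toy_genome = "AAAACC" + "GTAAGTTTTTAG" + "GGTTTT"
toy_exons = [(0, 6), (18, 24)]
print(junction_library(toy_genome, toy_exons, read_length=4))
```

Reads are then mapped against the genome plus these junction sequences (step 2), and hits to a junction entry are translated back to genomic coordinates using the key.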


TopHat first method: identifying the transcriptome

• Identify candidate exons (A, B, C) via genomic mapping
• Generate possible pairings of exons
• Align “unmappable” reads to possible junctions


Longer reads

>HWI-EAS229_75_30DY0AAXX:7:1:0:949
GATGTTCTCAGTGTCC GATGTAATCAGTGTCC AACCCTCTCAGTGTCC

Very long (100 kb+) intron

By segmenting the long reads, and mapping the segments independently, we can look harder for junctions we might have missed with shorter reads

Running time independent of intron size
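The segmentation step itself is simple; a minimal sketch follows. The function name is hypothetical and the 16-nt segment length is just what the example read on this slide appears to use, not a recommended setting.

```python
from typing import List


def split_read(read: str, segment_length: int) -> List[str]:
    """Cut a read into consecutive segments; the last one may be shorter."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [read[i:i + segment_length] for i in range(0, len(read), segment_length)]


long_read = "GATGTTCTCAGTGTCCGATGTAATCAGTGTCCAACCCTCTCAGTGTCC"
segments = split_read(long_read, 16)
print(segments)
```

Each segment is then mapped on its own; when neighbouring segments land far apart on the same chromosome, the gap between them points to a candidate intron.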


Mapping to transcriptome

[Figure: gene model (exons, introns, 5'UTR, 3'UTR) on the W and C strands of the DNA (genome); transcription yields pre-mRNA; RNA processing (splicing, polyadenylation) yields mRNA with a poly(A) tail (AAAAA)]


Microexons and junction coverage

[Figure: gene model (exons, introns, 5'UTR, 3'UTR) on the W and C strands of the DNA (genome); panels: in-house mapping, tophat mapping]

2 or more splice junctions within the same read


Microexons and junction coverage

[Figure: gene model (exons, introns, 5'UTR, 3'UTR) on the W and C strands of the DNA (genome); panels: in-house mapping, tophat mapping]

2 or more splice junctions within the same read

Different read lengths will have different problems!


Example of STAR aligned single-cell RNA-Seq data

Mapping speed         308 M reads / hour
% uniquely mapping    60
% multimapping        25
% unmapped            15

281 719 splice junctions
279 356 with GT/AG
2 123 with GC/AG
215 with AT/AC
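The junction counts are easier to judge as fractions of the total. A short calculation with the numbers from this slide (variable names are illustrative):

```python
total_junctions = 281_719
motif_counts = {"GT/AG": 279_356, "GC/AG": 2_123, "AT/AC": 215}

# Share of all detected junctions carrying each intron motif.
motif_percent = {m: 100.0 * n / total_junctions for m, n in motif_counts.items()}

# Junctions not covered by the three motifs listed on the slide.
other_junctions = total_junctions - sum(motif_counts.values())

for motif, pct in motif_percent.items():
    print(f"{motif}: {pct:.2f} %")
print(f"other: {other_junctions}")
```

The overwhelming majority of junctions carry the canonical GT/AG motif, which is the expected picture for a clean spliced alignment.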


Storing mapped Alignments

Formats for storing alignments should include:

genomic coordinates

mismatches, insertion, deletions etc.

quality information


Samtools

Sequence Alignment Map (SAM)

Generic Alignment format

Supports long and short reads

Human readable, flexible and compact

Emerging standard

http://samtools.sourceforge.net/

Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]


SAM Example

HWI-EAS269:1:114:1242:1582#0  16  chrY  616000  255  22M731N28M  *  0  0  ATTTCGACCATGATCATCGAACCTTCCCCTGGATCCACTTCCACGATCAC  #9;-7+2@4:2=20-14=:><?<;:BB?:4<BB?ABBBBABCBBBBC=BB  NM:i:0  XS:A:-

16: Bit field, where 16 means reverse strand

616000: Start position

22M731N28M: Alignment structure. Here: 22 aligned bases, then a 731-base intron, then 28 aligned bases
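A SAM record is a single tab-separated line with eleven mandatory columns followed by optional tags. The sketch below splits the record from this slide into named fields and tests the 0x10 (decimal 16) flag bit for the reverse strand. The function name and the dictionary keys are illustrative choices, and the read name is written without spaces.

```python
from typing import Any, Dict

SAM_COLUMNS = ("qname", "flag", "rname", "pos", "mapq", "cigar",
               "rnext", "pnext", "tlen", "seq", "qual")


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Split one alignment line into the 11 mandatory columns plus optional tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    record: Dict[str, Any] = dict(zip(SAM_COLUMNS, fields[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    record["tags"] = fields[11:]
    record["reverse_strand"] = bool(record["flag"] & 0x10)  # bit 16
    return record


sam_line = "\t".join([
    "HWI-EAS269:1:114:1242:1582#0", "16", "chrY", "616000", "255",
    "22M731N28M", "*", "0", "0",
    "ATTTCGACCATGATCATCGAACCTTCCCCTGGATCCACTTCCACGATCAC",
    "#9;-7+2@4:2=20-14=:><?<;:BB?:4<BB?ABBBBABCBBBBC=BB",
    "NM:i:0", "XS:A:-",
])
sam_record = parse_sam_line(sam_line)
print(sam_record["rname"], sam_record["pos"], sam_record["reverse_strand"])
```

For real data a maintained library is the safer choice; the point here is only that every field on the slide maps to a fixed column.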


CIGAR Format

M, match/mismatch

I, insertion

D, deletion

S, softclip

...

Ref:   GCATTCAGATGCAGTACGC

Read:    ccTCAG--GCAGTAgtg

Pos: 5

CIGAR: 2S4M2D6M3S

A 50-base read aligned without gaps: 50M
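A CIGAR string is a sequence of (length, operation) pairs, so the number of reference bases an alignment covers and the read length can both be derived from it. The sketch below uses illustrative function names and follows the usual convention that M, D, N, = and X consume the reference while M, I, S, = and X consume the read. The spliced alignment 22M731N28M from the SAM example serves as the test case.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Turn e.g. '22M731N28M' into [(22, 'M'), (731, 'N'), (28, 'M')]."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"unparsable CIGAR string: {cigar!r}")
    return ops


def reference_span(cigar: str) -> int:
    """Number of reference bases covered, including deletions and introns."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def read_length(cigar: str) -> int:
    """Number of read bases described, including soft-clipped ones."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


print(parse_cigar("22M731N28M"))
print(reference_span("22M731N28M"), read_length("22M731N28M"))
```

With a start position of 616000, the reference span gives the last aligned position as start + span - 1.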


Samtools for SAM/BAM files

Library and software package (C, Java)

Creating, sorting, indexing SAM & BAM

Visualizing alignments on the command line

SNP calling

Short indel detection

BAM (Binary representation of SAM) ~25% file size reduction


Read mapping statistics

e.g. using RSeQC (package)

[Figure: density of reads vs. GC content (%); nucleotide frequency (A, T, G, C) vs. position of read]


Read mapping statistics: read mapping across genes

[Figure: read number vs. percentile of gene body (5'->3')]


Read mapping statistics

Splicing junctions:

known 89%

complete_novel 9%

partial_novel 2%


Read mapping statistics: duplicate and unique reads

[Figure: number of reads (log10) and reads % vs. frequency of occurrence; series: sequence-based and mapping-based duplication]


Read mapping statistics: q values on mapped reads

[Figure: Phred quality score vs. position of read]


Overview of computational analyses

• Image analysis, base calling
• Primary analyses: mapping, assembly
• Data type-specific analyses (e.g. peak calling, calculating expression)
• Custom project-specific analyses

[Figure labels: ChIP-Seq peak calling; RNA-Seq expression levels; genome sequence; assembled contig]


Visualization

Integrative Genomics Viewer, IGV (Broad Inst.)

Custom tracks at UCSC Genome Browser


Peak characteristics differ with signal


Peak characteristics differ with signal

H3K4me3: Sharp promoter peaks

H3K36me3: Broad transcription elongation signal


Important file formats

Sequences: FastQ

Aligned reads: SAM/BAM

Genome annotations: Bed, Gff

Coverage: Wig (Tdf)

http://genome.ucsc.edu/FAQ/FAQformat.html


BED format

chrom - The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).

chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.

chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature.

For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.

http://genome.ucsc.edu/FAQ/FAQformat.html

track name=pairedReads description="Clone Paired Reads" useScore=1
chr22 1000 5000
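Because chromStart is 0-based and chromEnd is excluded, the feature length is simply chromEnd - chromStart, and converting to the 1-based inclusive coordinates used by formats such as SAM only shifts the start. A small sketch with illustrative function names:

```python
from typing import Tuple


def parse_bed3(line: str) -> Tuple[str, int, int]:
    """Read chrom, chromStart, chromEnd from a BED line (extra columns ignored)."""
    chrom, start, end = line.split()[:3]
    return chrom, int(start), int(end)


def feature_length(start: int, end: int) -> int:
    """Half-open interval: no +1 needed."""
    return end - start


def to_one_based(start: int, end: int) -> Tuple[int, int]:
    """0-based half-open -> 1-based inclusive (the end value is unchanged)."""
    return start + 1, end


chrom, start, end = parse_bed3("chr22 1000 5000")
print(chrom, feature_length(start, end), to_one_based(start, end))
```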


BED continued

strand - Defines the strand - either '+' or '-'.

thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays).

thickEnd - The ending position at which the feature is drawn thickly (for example, the stop codon in gene displays).

itemRgb - An RGB value of the form R,G,B (e.g. 255,0,0). If the track line itemRgb attribute is set to "On", this RGB value will determine the display color of the data contained in this BED line. NOTE: It is recommended that a simple color scheme (eight colors or less) be used with this attribute to avoid overwhelming the color resources of the Genome Browser and your Internet browser.

blockCount - The number of blocks (exons) in the BED line.

blockSizes - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.

blockStarts - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.

track name=pairedReads description="Clone Paired Reads" useScore=1
chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601
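Since blockStarts are relative to chromStart, genomic block (exon) coordinates are recovered by adding chromStart to each block start and then adding the block size. The following sketch uses a hypothetical function name and assumes a full 12-column BED line such as the cloneB example on this slide.

```python
from typing import List, Tuple


def bed12_blocks(line: str) -> List[Tuple[int, int]]:
    """Return genomic (start, end) of every block, 0-based half-open."""
    fields = line.split()
    if len(fields) < 12:
        raise ValueError("expected a 12-column BED line")
    chrom_start = int(fields[1])
    block_count = int(fields[9])
    # Lists may carry a trailing comma, as in '433,399,'.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    return [(chrom_start + s, chrom_start + s + size)
            for s, size in zip(starts, sizes)]


clone_b = "chr22 2000 6000 cloneB 900 - 2000 6000 0 2 433,399, 0,3601"
print(bed12_blocks(clone_b))
```

The last block ends exactly at chromEnd (6000), which is a quick consistency check on any BED12 line.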


WIG format (coverage format)

Wiggle format (WIG) allows the display of continuous-valued data in a track format

Variable step:

variableStep chrom=chr2
300701 12.5
300702 12.5
300703 12.5
300704 12.5
300705 12.5

is equivalent to:

variableStep chrom=chr2 span=5
300701 12.5

Fixed step:

fixedStep chrom=chr3 start=400601 step=100
11
22
33
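The equivalence of the two variableStep forms can be verified by expanding each data line into one value per covered base. The sketch below handles only variableStep sections (no track lines, no fixedStep); the function name is made up for illustration.

```python
from typing import List, Tuple


def expand_variable_step(lines: List[str]) -> List[Tuple[str, int, float]]:
    """Return (chrom, position, value) for every base covered, 1-based positions."""
    chrom, span = None, 1
    covered = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("variableStep"):
            params = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))  # span defaults to 1
        else:
            if chrom is None:
                raise ValueError("data line before a variableStep declaration")
            start, value = line.split()
            covered.extend((chrom, int(start) + i, float(value)) for i in range(span))
    return covered


long_form = expand_variable_step([
    "variableStep chrom=chr2",
    "300701 12.5", "300702 12.5", "300703 12.5", "300704 12.5", "300705 12.5",
])
short_form = expand_variable_step([
    "variableStep chrom=chr2 span=5",
    "300701 12.5",
])
print(long_form == short_form)
```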


Data Repositories

Short Read Archive (fastq) [discontinued!]
http://www.ncbi.nlm.nih.gov/sra

European Nucleotide Archive

Gene Expression Omnibus (bed, wig, fastq)
http://www.ncbi.nlm.nih.gov/geo/


SEQAnswers, an active forum for discussions on next-generation sequencing methods and bioinformatics

http://seqanswers.com/


Grabherr et al. Nature Biotechnology, July 2011

Genome-independent transcriptome reconstruction: accuracy and coverage


Genome-independent transcriptome reconstruction: accuracy and coverage

Grabherr et al. Nature Biotechnology, July 2011
