
Page 1: Sanger vs Next-Gen Sequencing … · Next-Gen Sequencing Workflow Source: Lu and Shen, 2016, Biochemistry, Genetics and Molecular Biology. DOI: 10.5772/61657 • Genome • Whole

Fall, 2017 GCBA/MGCB/BMI 815

Tools and Algorithms in Bioinformatics
GCBA815/MCGB815/BMI815, Fall 2017

Week-8: Next-Gen Sequencing

RNA-seq Data Analysis

Babu Guda, Ph.D.
Professor, Genetics, Cell Biology & Anatomy

Director, Bioinformatics and Systems Biology Core

University of Nebraska Medical Center

Sanger vs Next-Gen Sequencing

Source: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwj356Gaj-zWAhXEzFQKHZrlCh0QjRwIBw&url=http%3A%2F%2Fslideplayer.com%2Fslide%2F11674461%2F&psig=AOvVaw3BHyDsG4jHY9z4Y3Jc11IY&ust=1507933065294289

Next-Gen Sequencing

Source: https://bloggenohub.files.wordpress.com/2015/01/slide1.jpg

• No in vivo cloning

Cost of Human Genome Sequencing

Source: http://blog.dnanexus.com/wp-content/uploads/2017/04/Screen-Shot-2017-04-24-at-11.40.38-AM.png

Next-Gen Sequencing Workflow

Source: Lu and Shen, 2016, Biochemistry, Genetics and Molecular Biology. DOI: 10.5772/61657

Applications of NGS

• Genome
  • Whole genome sequencing
  • Whole exome sequencing
  • Targeted gene panels (cancer, newborns, autism, etc.)
• Transcriptome
  • Whole RNA sequencing
  • mRNA transcriptome (poly-A selection)
  • Small RNA analysis (siRNA, snoRNA, lincRNA, etc.)
  • Gene expression profiling for selected target genes
• Metagenome
  • Bulk sequencing of many types of bacteria
  • Examples: human gut microbiome, soil samples, food contamination, extremophiles, etc.
• Epigenome
  • Chromatin Immunoprecipitation Sequencing (ChIP-Seq)
  • Methylation Sequencing (Methyl-Seq)

Different Sequencing Libraries

Source: http://slideplayer.com/7847747/25/images/7/Types+of+Sequencing+Libraries.jpg

Paired-end Sequencing

Source: https://assets.illumina.com/content/dam/illumina-marketing/images/science/v2/web-graphic/paired-end-vs-single-read-seq-web-graphic.jpg

FASTQ Files from Paired-end Sequencing

Source: https://bioinf-galaxian.erasmusmc.nl/galaxy/

Demultiplexing Mixed Samples

Source: https://www.illumina.com/content/dam/illumina-marketing/images/technology/multiplexing-overview-figure.gif

Different File Types in NGS analysis

• FASTQ file – generated by the sequencer, contains NGS reads
• SAM file – Sequence Alignment/Map (generated by aligning the NGS reads with the reference genome)
• BAM file – Binary version of the SAM file (SAMtools is used to manipulate SAM/BAM files)
• GFF file – General Feature Format, used to hold genome annotation (chromosome, strand, frame, exon, CDS, etc.)
• GTF file – Gene Transfer Format (contains all the info as in GFF and, in addition, gene annotation information)
• VCF file – Variant Call Format (used to store variant data such as SNPs, InDels, short structural rearrangements)

Row 1: Information from the sequencer about the location of this read on the plate
Row 2: The sequence
Row 3: Metadata provided by the sequencing team
Row 4: Quality scores pertaining to each nucleotide in the sequence

[email protected]/1
GAGGCTATAGCATGGTCAAGGCACAAGAAGATCACTGGACTGCCCTCGCTCAGCCCTCAGCTACTG
+
>>?>?@>?>@@>?@@=@@@@@??>??@??@?@A?>@@@?>@@???A@:@A@@A@@@A@@AAB@@BB
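The four-row layout described above can be read with a few lines of Python. This is a minimal sketch, not a replacement for an established parser: it assumes each record occupies exactly four lines (no wrapped sequences), and the names `FastqRecord`, `read_fastq`, and the read ID `example_read/1` are hypothetical, introduced only for illustration. The sequence and quality strings are the ones from the slide.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str   # Row 1, without the leading "@"
    sequence: str  # Row 2
    comment: str   # Row 3, the text after "+" (often empty)
    quality: str   # Row 4, one ASCII character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, plus[1:], quality)


# Hypothetical read ID; sequence and quality copied from the slide.
EXAMPLE = (
    "@example_read/1\n"
    "GAGGCTATAGCATGGTCAAGGCACAAGAAGATCACTGGACTGCCCTCGCTCAGCCCTCAGCTACTG\n"
    "+\n"
    ">>?>?@>?>@@>?@@=@@@@@??>??@??@?@A?>@@@?>@@???A@:@A@@A@@@A@@AAB@@BB\n"
)

records = list(read_fastq(io.StringIO(EXAMPLE)))
for rec in records:
    print(rec.read_id, len(rec.sequence), "bases")
```

The length check reflects Row 4: every base in Row 2 must have exactly one quality character.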

FASTQ format:

FASTQ is based on the popular FASTA format for sequences

FASTA format
>sequence_ID; header in one line
AGTTGTAGTCCGTGATAGTCGGATCGG

FASTQ format provides additional information that includes the quality score
@20FUKAAXX100202:1:64:10634:114560/1
TTGTATTTTTAGTAGAGACGGAGTTTCGCCATGTTGGTCAGGCTGGCCTCGAATTCCTGACCTCAAGTGATCCGCCCGCCTCGGCCTCCCAACGTTTTGG
+
?=@7=>B==;;BB?<B?=8539<6?6>8>=BB<<B=08:9@5;:A@@?@9:BAAA<?;8;@AC@BBBBBA?<9-@B@;CAA77<:BEB<BB@07?@=<?84

ASCII code for Quality score (Phred score, ranges from 0-50)

ASCII codes for quality scores, in increasing order (! is the worst and ~ is the best)
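As a rough illustration of how the ASCII characters map to quality scores, the sketch below subtracts a fixed offset from each character code. It assumes the Phred+33 (Sanger) encoding, in which `!` (ASCII 33) is Q0 and `~` (ASCII 126) is Q93; some older Illumina pipelines used an offset of 64 instead, so the offset should be checked for a given data set. The function name `phred_scores` is hypothetical.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to integer Phred scores.

    Assumes Phred+33 encoding by default (an assumption, not stated on the slide).
    """
    return [ord(ch) - offset for ch in quality]


# '!' is the lowest printable quality character and '~' the highest.
scores = phred_scores("!5?I~")
print(scores)
```

Applied to a full Row 4 string, the same function returns one score per base, which is what per-position quality plots are built from.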

Sequence Alignment/Map (SAM/BAM)

Similar to the FASTQ file in that it contains the raw sequence and its quality scores.

It also tells you where the sequence aligned to the genome, and how well (this score is also Phred-scaled).

In this case, this read aligned to chromosome 22, position 17445857, and has a mapping quality score of 60 (or a 1 in 1,000,000 chance of being placed incorrectly).

SRR098401.104031357 83 chr22 17445857 60 76M = 17445512 -421 ACTGTTACCAGATCAAGAACTGATAGGGACAGGGATCATTATTCCCCCTTTACAGATGAGAAGGCCGTCACGCCTC @@>>B@@@BBAAAB9A@@>:@@?=A@?@?@A???>?@??=???@@@@@>@>>@@@><??@>@>@@8?>?=:@>?>> BD:Z:NOJKPQQQQMONOMKKKLNOMNLLLJLMINLJLMLMLKKKKJLJJJMKCKLINJMMLJKKKMOOMNNOLPQSNMKK PG:Z:MarkDuplicates RG:Z:NA12878 BI:Z:OOMLRRPPRPPQQONOLOPOONOOOKLNMONJKMNONMMMMLMKKKMLGMNLNMMNNJMJLNOMLNMPNONONNMM NM:i:0 MQ:i:60 AS:i:76 XS:i:0
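To make the fields of the alignment line concrete, here is a minimal sketch that splits one SAM record into its eleven mandatory columns, decodes the FLAG bits, and converts the mapping quality to an error probability. It assumes the record is tab-delimited, as in real SAM files (the slide shows spaces), and it keeps only two of the optional tags for brevity. The helper names `parse_sam_line`, `decode_flag`, and `mapq_to_error_probability` are hypothetical; for real work a library such as pysam or the SAMtools command line would be the usual choice.

```python
from typing import Dict, List

# The eleven mandatory SAM columns, in order.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# A subset of the FLAG bits defined by the SAM format.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def parse_sam_line(line: str) -> Dict[str, object]:
    """Split one tab-delimited alignment line into named fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < len(SAM_COLUMNS):
        raise ValueError("a SAM alignment line needs at least 11 fields")
    record: Dict[str, object] = dict(zip(SAM_COLUMNS, fields[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(fields[SAM_COLUMNS.index(key)])
    record["TAGS"] = fields[11:]  # optional TAG:TYPE:VALUE fields
    return record


def decode_flag(flag: int) -> List[str]:
    """List the names of the FLAG bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def mapq_to_error_probability(mapq: int) -> float:
    """MAPQ is Phred-scaled: P(wrong placement) = 10^(-MAPQ/10)."""
    return 10 ** (-mapq / 10)


# The slide's record, rebuilt with tabs and only two of its optional tags.
sam_line = "\t".join([
    "SRR098401.104031357", "83", "chr22", "17445857", "60", "76M",
    "=", "17445512", "-421",
    "ACTGTTACCAGATCAAGAACTGATAGGGACAGGGATCATTATTCCCCCTTTACAGATGAGAAGGCCGTCACGCCTC",
    "@@>>B@@@BBAAAB9A@@>:@@?=A@?@?@A???>?@??=???@@@@@>@>>@@@><??@>@>@@8?>?=:@>?>>",
    "NM:i:0", "MQ:i:60",
])

record = parse_sam_line(sam_line)
print(record["RNAME"], record["POS"], decode_flag(record["FLAG"]))
print(mapq_to_error_probability(record["MAPQ"]))
```

A FLAG of 83 is 64 + 16 + 2 + 1, i.e. the read is the first of a properly paired pair and aligned to the reverse strand, which is consistent with the negative template length (-421) in the record.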

Variant Call Format (VCF)

RNA-Seq Data Analysis

Computational Analysis of RNA-Seq Data

Source: Conesa et al., Genome Biology, 2016, 17:13

RNA-Seq Data Analysis Workflow

FastQC, FQTrim

STAR, HISAT, TopHat, Sailfish, Salmon

Cufflinks, EdgeR, DESeq

CuffDiff, DESeq, EdgeR, Limma

GSEA, IPA, DAVID, GO, etc.

Illumina, Ion Torrent, PacBio

Input Files for RNA-seq Analysis

Download Test Data file from the Course Page and unzip the folder

Galaxy Server
https://usegalaxy.org/

• A large compilation of open-source NGS data analysis tools that are accessible to users on web-based platforms

• Data can be uploaded from a PC/Mac and computing can be done on the cloud

• No need to install tools and maintain servers locally

• In-depth tutorials are available to use Galaxy services

• A list of Public Galaxy Servers can be found at

• https://galaxyproject.org/public-galaxy-servers/

• Today’s RNA-seq analysis will be performed from the following link

• https://bioinf-galaxian.erasmusmc.nl/galaxy/

Phred Score (Q) explained

Phred score (Q) vs error probability (P)

$Q = -10 \log_{10} P$
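The relationship between Phred score and error probability can be checked numerically. The sketch below converts in both directions using Q = -10 log10(P); the function names are hypothetical. For orientation, Q20 corresponds to a 1 in 100 chance of a wrong base call and Q30 to 1 in 1,000.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """P = 10^(-Q/10): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Q = -10 * log10(P), defined for 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


# Print a small lookup table of score, error probability, and accuracy.
for q in (10, 20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: P = {p:g}, base-call accuracy = {100 * (1 - p):.2f}%")
```

Because the scale is logarithmic, every increase of 10 in Q lowers the error probability by a factor of ten.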

Base Sequence Quality Interpretation

Bad Quality

Bad Quality

Quality drops at the tail end

Excellent Quality

Read Mapping and Assembly

Source: https://home.cc.umanitoba.ca/~frist/PLNT7690/lec12/lec12.3.html

Downstream Analysis of RNA-seq Results

IPA: Ingenuity Pathway Analysis

Source: Yoo et al., Nature Genetics, 2014

Hierarchical Clustering

GSEA: Gene Set Enrichment Analysis

Source: Graner et al., Front. Oncology, 2015

Source: Li et al., Scientific Reports, 2015

Source: Bee et al., PLoS ONE, 2011