The Genome Access Course
November 2011
Analysis of Next Generation Sequence Data
Illumina HiSeq 2000: 600 Gbp (6 billion reads) in ~11 days
Typical Next Gen Experiments
• Genome sequencing
  – Novel genomes
  – Resequencing
• Transcriptome sequencing (RNA-seq)
  – Characterize transcripts with or without reference genome
    • Typical length
    • Short (microRNAs, …)
  – Find differentially expressed transcripts
• Other
  – Methyl-seq
  – ChIP-seq
  – RIP-seq
  – …
Types of Sequencing Libraries
Single-End Reads - 5’ or 3’ (random)
Paired-End Reads - 5’ and 3’ ends of 200-500 bp fragments
Mate-Pair Reads - 5’ and 3’ ends of 2-5 kbp fragments
What Does the Data Look Like?
FASTQ File Format
Sequence
Quality (ASCII character for each base)
> 80 million reads in one lane
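As a minimal sketch of the format described above, each FASTQ record can be read as four lines (identifier, sequence, separator, quality string), with each quality character decoded to a Phred score. The function name `parse_fastq` and the example read are hypothetical, and the ASCII offset of 33 is an assumption: some older Illumina pipelines used an offset of 64.

```python
from typing import Iterator, List, Tuple


def parse_fastq(lines: List[str], offset: int = 33) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from a list of FASTQ text lines."""
    # Each record spans four lines: @id, sequence, +, quality string.
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.strip() for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        # One ASCII character per base; subtract the offset to get the Phred score.
        scores = [ord(ch) - offset for ch in qual]
        yield header[1:], seq, scores


# Hypothetical single-read example: 'I' decodes to 40, '#' decodes to 2.
example = ["@read1", "GATTACA", "+", "IIIIII#"]
records = list(parse_fastq(example))
```

A Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so the last base in this example is far less reliable than the others.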
Quality Control Analysis of Reads
Trim Sequences Prior To Analysis
• Make sure sequencing adapters are removed
• Trim ends of sequence based on quality scores
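The two trimming steps can be sketched as follows. This is an illustration only, not how any particular trimming tool works: the function names, the `min_overlap` and `threshold` defaults, and the example read are assumptions, and only exact adapter matches at the 3’ end are handled.

```python
from typing import List, Tuple


def trim_adapter(seq: str, scores: List[int], adapter: str,
                 min_overlap: int = 5) -> Tuple[str, List[int]]:
    """Remove an exact adapter match, or an adapter prefix hanging off the 3' end."""
    idx = seq.find(adapter)
    if idx == -1:
        # Look for the longest adapter prefix that ends the read.
        for k in range(min(len(adapter), len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                idx = len(seq) - k
                break
    if idx == -1:
        return seq, scores
    return seq[:idx], scores[:idx]


def trim_low_quality_3prime(seq: str, scores: List[int],
                            threshold: int = 20) -> Tuple[str, List[int]]:
    """Drop bases from the 3' end while their Phred score is below the threshold."""
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], scores[:end]


# Hypothetical read that runs into the start of an adapter sequence.
adapter = "AGATCGGAAGAGC"
read = "ACGTACGTAGATCGG"
quals = [30, 30, 30, 30, 30, 30, 12, 8, 30, 30, 30, 30, 30, 30, 30]

seq1, q1 = trim_adapter(read, quals, adapter)
seq2, q2 = trim_low_quality_3prime(seq1, q1)
```

Adapter removal is applied first here so that the quality trim acts on the true 3’ end of the insert.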
Sequence Composition Diagnostics
Unbiased Reads
Biased Reads
First Position Nearly Always “T”
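One way to produce this kind of diagnostic is to tabulate the fraction of each base at every read position. The sketch below assumes reads are plain strings; `base_composition` and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def base_composition(reads: List[str]) -> List[Dict[str, float]]:
    """Return, for each read position, the fraction of reads carrying each base."""
    length = max(len(read) for read in reads)
    counts = [Counter() for _ in range(length)]
    for read in reads:
        for pos, base in enumerate(read):
            counts[pos][base] += 1
    # Normalize by the number of reads that actually cover each position.
    return [{base: n / sum(c.values()) for base, n in c.items()} for c in counts]


# Toy biased library: every read starts with "T".
biased_reads = ["TACG", "TGCA", "TTAG", "TCGA"]
composition = base_composition(biased_reads)
```

In an unbiased library the four fractions stay roughly constant along the read; a position dominated by one base, as in the first position here, points to a library or primer artifact.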
Genome Sequencing
Workflows for Genome Sequencing
Novel Genome Sequencing
• de novo assembly
  – Generate contigs and scaffolds using overlapping reads
• If applicable, align reads from a sample back to consensus to examine variation
Resequencing
• Align reads from a sample to a reference genome assembly to examine variation
  – BWA mapping software
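To illustrate the idea of building contigs from overlapping reads, here is a toy greedy merger that repeatedly joins the pair of sequences with the longest suffix-to-prefix overlap. It is only a sketch under strong assumptions (error-free reads, a single strand, no repeats); production assemblers use more sophisticated graph-based methods. The function names and reads are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge the best-overlapping pair until no overlaps of min_len remain."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Three toy reads tiling a 13 bp sequence with 4 bp overlaps.
toy_reads = ["ATGGCGT", "GCGTGCA", "TGCAATT"]
contigs = greedy_assemble(toy_reads)
```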
Sequence Alignment/Map (SAM) Format
Common file format to store reads and their alignment to a reference sequence
Generated by most next gen analysis software
samtools software package
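A SAM alignment line is tab-delimited with eleven mandatory fields, so it can be inspected with a few lines of code. The sketch below parses one line and checks two FLAG bits (0x4 for unmapped, 0x10 for reverse strand). The record shown is made up for illustration, and optional tag fields after the eleventh column are ignored.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # ASCII-encoded base qualities


def parse_sam_line(line: str) -> SamRecord:
    """Parse the eleven mandatory fields of one SAM alignment line."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)


def is_reverse_strand(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)


# Made-up record: a 7 bp read aligned to the reverse strand of chr17.
line = "read1\t16\tchr17\t30700000\t60\t7M\t*\t0\t0\tGATTACA\tIIIIIII"
rec = parse_sam_line(line)
```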
Binary Alignment/Map (BAM) Files
• SAM (text file) → BAM (binary file)
  – Not human-readable
  – Smaller file sizes
• BAM is widely used:
  – Often deposited to Gene Expression Omnibus (GEO) at NCBI
  – UCSC Genome Browser can display alignments as a track
UCSC Genome Browser with 1,000 Genomes Project Data
LookSeq at Sanger Mouse Genomes Project
Glo1 CNV Present in Mouse Genomes Data for A/J
Proximal Flank (Chr17: 30.5 Mb, 50 kb): max ~50x coverage
Glo1 Locus (Chr17: 30.7 Mb, 50 kb): max >100x coverage
Distal Flank (Chr17: 31.2 Mb, 50 kb): max ~50x coverage
Glo1 CNV Not Present in Mouse Genomes Data for NZO
Proximal Flank (Chr17: 30.5 Mb, 50 kb): max ~25x coverage
Glo1 Locus (Chr17: 30.7 Mb, 50 kb): max ~25x coverage
Distal Flank (Chr17: 31.2 Mb, 50 kb): max ~25x coverage
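The contrast between the two strains can be expressed as a ratio of coverage at the locus to coverage in the flanking regions. The sketch below uses the approximate maximum coverage values from the two slides as stand-ins for depth (A/J is reported as ">100x", so 100 is a lower bound). The function names and the 1.5 threshold are assumptions for illustration, not a validated CNV caller.

```python
from typing import List


def copy_ratio(locus_depth: float, flank_depths: List[float]) -> float:
    """Coverage at the locus relative to the mean coverage of its flanks."""
    baseline = sum(flank_depths) / len(flank_depths)
    return locus_depth / baseline


def looks_amplified(ratio: float, threshold: float = 1.5) -> bool:
    """Flag a locus whose relative coverage exceeds an assumed threshold."""
    return ratio >= threshold


# Approximate values read from the slides (locus, [proximal flank, distal flank]).
aj_ratio = copy_ratio(100, [50, 50])    # A/J: at least ~2x the flanking coverage
nzo_ratio = copy_ratio(25, [25, 25])    # NZO: same coverage as the flanks
```

Normalizing to the flanks matters because the two strains were sequenced to different overall depths (~50x versus ~25x), so raw coverage alone is not comparable between them.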
RNA-seq Data Analysis
RNA-Seq
Reads are randomly sampled fragments from RNA sample
Proportion of reads for a transcript reflects the expression level of that transcript
Lots of reads needed to construct models for every alternatively spliced transcript
Garber et al, Nat Methods (2011)
Experimental Design
Auer & Doerge Genetics (2010) 185: 405-416
Marioni et al, Genome Res (2008) 18(9):1509-17
Comparison of Affy and RNA-seq
Marioni et al, Genome Res (2008) 18(9):1509-17
Comparison of Affy and RNA-seq
Marioni et al, Genome Res (2008) 18(9):1509-17
Marioni et al, Genome Res (2008) 18(9):1509-17
Shendure, Nat Methods (2008) 5(7): 585-7
Workflows for RNA-seq
Novel Transcriptome Sequencing
• QC reads
• de novo assembly
• Align reads from each sample/group to assembly
  – Statistics for each transcript contig
• Analyze counts
Transcriptome Sequencing with Reference Genome
• QC reads
• Align reads from each sample/group to genome
  – Statistics for each transcript model
  – Examine isoforms
• Analyze counts
de novo Transcriptome Assembly
Rarefaction Plot
How much sequencing is enough?
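A rarefaction curve can be sketched by subsampling the reads at increasing depths and counting how many distinct transcripts have been seen at each depth; when the curve flattens, additional sequencing yields few new transcripts. The input here is a hypothetical list giving the transcript each read was assigned to; the function name and toy data are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def rarefaction_curve(read_assignments: Sequence[str], depths: Sequence[int],
                      seed: int = 0) -> List[Tuple[int, int]]:
    """Return (depth, distinct transcripts seen) for nested random subsamples."""
    rng = random.Random(seed)
    shuffled = list(read_assignments)
    rng.shuffle(shuffled)
    # Taking prefixes of one shuffle gives nested subsamples, so the
    # curve can never decrease as depth grows.
    return [(n, len(set(shuffled[:n]))) for n in depths]


# Toy library: one abundant transcript and progressively rarer ones.
assignments = ["t1"] * 50 + ["t2"] * 30 + ["t3"] * 15 + ["t4"] * 5
curve = rarefaction_curve(assignments, depths=[10, 25, 50, 100])
```

Rare transcripts such as `t4` in this toy library are the ones that require the deepest sampling before they appear.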
Mapping Reads
Align reads to a reference
• Genome assembly
• Transcriptome assembly
Commonly used aligners:
• bwa
• bowtie
RNAseq Workflow With Reference Genome
Langmead et al. Genome Biology (2010), 11:R83
Map Reads & Obtain Read Counts Per Gene
Both utilize a reference genome
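Once reads are mapped to the reference genome, per-gene counts amount to asking which annotated interval each alignment falls in. The sketch below is a simplified illustration: it uses only the alignment start position, ignores strand and splicing, and scans every gene for every read. The gene names and coordinates are made up.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def count_reads_per_gene(alignments: List[Tuple[str, int]],
                         genes: Dict[str, Gene]) -> Dict[str, int]:
    """Count alignments whose start position lies inside each gene interval."""
    counts = {name: 0 for name in genes}
    for chrom, pos in alignments:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


# Hypothetical annotation and mapped read positions.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 800, 1200)}
alignments = [("chr1", 150), ("chr1", 499), ("chr1", 900),
              ("chr1", 650), ("chr2", 150)]
gene_counts = count_reads_per_gene(alignments, genes)
```

Reads that fall outside every annotated gene, like the last two here, are simply left uncounted.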
Bowtie/TopHat
Trapnell, Pachter, Salzberg. Bioinformatics (2009) 25(9):1105-1111
Bowtie uses Burrows-Wheeler indexing for rapid mapping
TopHat uses Initially Un-Mapped (IUM) reads to find novel splice sites
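To give a feel for why Burrows-Wheeler indexing allows rapid lookup, the toy below builds the transform of a short text and counts exact occurrences of a pattern by backward search, one pattern character at a time. It is a sketch only: a real index is built far more efficiently, stores precomputed occurrence tables rather than rescanning the string, and a real aligner also tolerates mismatches. The function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations; text must end with '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact matches of pattern in the original text by backward search."""
    # first[c] = index of the first row (in sorted order) that starts with c.
    first = {}
    for i, ch in enumerate(sorted(bwt_str)):
        first.setdefault(ch, i)

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in the first i characters of the transform.
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)  # half-open range of matching rows
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ(ch, lo)
        hi = first[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana$")
hits = count_occurrences(transformed, "ana")
```

The search cost depends on the pattern length rather than on scanning the whole reference, which is what makes this family of indexes attractive for mapping millions of short reads.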
Cufflinks
FPKM = Fragments Per Kilobase of transcript per Million fragments mapped
Trapnell et al. Nature Biotech (2010) 28(5):511-515
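The FPKM unit defined above can be written out as simple arithmetic: fragments for a transcript, divided by the transcript length in kilobases, divided by the total mapped fragments in millions. The sketch below only illustrates the unit; the statistical estimation that assigns fragments to isoforms is not shown, and the function name and numbers are made up.

```python
def fpkm(fragments: float, transcript_length_bp: int,
         total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million fragments mapped."""
    # Equivalent to fragments / (length / 1e3) / (total / 1e6).
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2 kb transcript, 10 million mapped.
value = fpkm(500, 2000, 10_000_000)
```

Normalizing by length and by library size lets expression be compared between a long and a short transcript, and between samples sequenced to different depths.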
Galaxy
Can be used to upload FASTQ files and then run a number of QC tools and many other tools:
• bwa
• bowtie
• tophat
• cufflinks
• …
Third Generation Sequencing