Introduction to second generation sequencing

The Queensland Brain Institute | Introduction to 2GS data analysis Drink faster ! 6/14/22 [MIT]

Upload: denis-bauer

Post on 11-Jun-2015

DESCRIPTION

An introduction to second generation sequencing with a focus on basic production informatics: raw data conversion and quality control.

TRANSCRIPT

Page 1: Introduction to second generation sequencing

The Queensland Brain Institute |

Introduction to 2GS data analysis

Drink faster!

April 13, 2023

[MIT]

Page 2: Introduction to second generation sequencing

The Queensland Brain Institute | April 13, 2023

Production Informatics and Bioinformatics

Stage | Task | Product | Time (per one-flowcell project)
Basic Production Informatics | Produce raw sequence reads | fastq | 5 days
Advanced Production Informatics | Map to genome and generate raw genomic features (e.g. SNPs) | bam, vcf,… | 3 weeks
Bioinformatics Research | Analyze the data; uncover the biological meaning | paper | >6 months

Page 3: Introduction to second generation sequencing

Brief history of sequencing

• First Generation: Sanger sequencing

• Second Generation: amplified molecule sequencing

• Third Generation: single molecule sequencing

* Discussion about category

Page 4: Introduction to second generation sequencing

What steps are involved in sequencing ?

• sequencing by synthesis (SBS) technology
  – Fragmentation
  – Library generation
  – Amplification
  – Sequencing
  – Analysis

Illumina Marketing: “3h 10 minutes wet-lab, 30 minutes dry lab”

Page 5: Introduction to second generation sequencing

Illumina sequencing: Library + Amplification

“Illumina Sequencing Technology” booklet

Page 6: Introduction to second generation sequencing

Illumina Sequencing: Synthesis + Imaging

“Illumina Sequencing Technology” booklet

Page 7: Introduction to second generation sequencing

Output: 1.5 terabytes of data

Inspired by anzska information booklet

Page 8: Introduction to second generation sequencing

Sequencer Output Conversion: Production Informatics

• 1.5 TB data : 6 billion clusters with 100 bp reads = 600 billion data points
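
The data-point figure is a simple product of cluster count and read length. A small sketch of that arithmetic follows; the variable names are chosen here for illustration, and treating 1 TB as 10^12 bytes is an assumption made only to get a rough output volume per data point.

```python
clusters = 6_000_000_000   # clusters per run, from the slide
read_length = 100          # bases (sequencing cycles) per read, from the slide
total_bytes = 1.5e12       # 1.5 TB, assuming 1 TB = 10**12 bytes

# One data point per cluster per cycle.
data_points = clusters * read_length

# Rough output volume per data point implied by the two slide figures.
bytes_per_point = total_bytes / data_points

print(f"{data_points:,} data points")
print(f"{bytes_per_point:.1f} bytes of output per data point")
```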

For HiSeq: images are converted to flat files (*.bcl or *.cif)

[Diagram: HiSeq, CASAVA, …× read length. Image credits: visualpharm.com, Maysoft]

Page 9: Introduction to second generation sequencing

Multiplexing

• 6 billion reads:
  – 750 million reads per lane

• Currently 12-plex (soon 96-plex):
  – One run
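
The numbers on this slide imply how many reads each sample receives. The sketch below works that out, assuming reads are spread evenly across the samples in a lane, which real runs only approximate; the function name is hypothetical.

```python
total_reads = 6_000_000_000    # reads per run, from the slide
reads_per_lane = 750_000_000   # reads per lane, from the slide

# Number of lanes implied by the two figures above.
lanes = total_reads // reads_per_lane


def reads_per_sample(plex, lane_reads=reads_per_lane):
    """Reads per sample if a lane is shared evenly between `plex` samples."""
    return lane_reads / plex


for plex in (12, 96):
    print(f"{plex}-plex: {reads_per_sample(plex):,.0f} reads per sample, "
          f"{plex * lanes} samples per run")
```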

Oliver Twardowski

Page 10: Introduction to second generation sequencing

Demultiplexing

[Diagram: CASAVA, …× samples, …× read length. Image credit: visualpharm.com]
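
Conceptually, demultiplexing assigns each read to a sample by comparing its index read with the known sample barcodes. The following is a minimal sketch of that idea only, not CASAVA's implementation; the barcodes, sample names, reads, and the one-mismatch tolerance are all illustrative assumptions.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group reads by sample.

    reads    -- iterable of (index_sequence, read_sequence) pairs
    barcodes -- dict mapping sample name to barcode sequence
    Reads that match no barcode, or more than one, go to "Undetermined".
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        hits = [
            sample
            for sample, barcode in barcodes.items()
            if len(barcode) == len(index_seq)
            and hamming(barcode, index_seq) <= max_mismatches
        ]
        sample = hits[0] if len(hits) == 1 else "Undetermined"
        bins[sample].append(read_seq)
    return dict(bins)


# Hypothetical 6-base barcodes and toy reads.
barcodes = {"sample_A": "ATCACG", "sample_B": "CGATGT"}
reads = [
    ("ATCACG", "CCATAAGG"),  # exact match to sample_A
    ("CGATGA", "TTGCAAGC"),  # one mismatch to sample_B
    ("GGGGGG", "ACGTACGT"),  # matches neither barcode
]
bins = demultiplex(reads, barcodes)
print(bins)
```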

Page 11: Introduction to second generation sequencing

CASAVA 1.8.0 program call

configureBclToFastq.pl \
  --input-dir Data/Intensities/BaseCalls/ \
  --output-dir Data/Unaligned \
  --sample-sheet SampleSheet.csv \
  --use-bases-mask y100,I6nn,Y100 > file.log 2>&1

cd Data/Unaligned

qsub -pe make 16 -j y -v $MYPATH -o qsub.out -cwd -N fastq -b y \
  make -j 16

Runtime: ~ 6h
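
The `--use-bases-mask` value describes, read by read, what to do with each sequencing cycle. The sketch below expands the mask from the call into one letter per cycle, on the understanding that Y marks a base to use, I an index base, N a base to ignore, and a number repeats the preceding letter. The helper is hypothetical and does not handle the `*` wildcard.

```python
import re


def expand_bases_mask(mask):
    """Expand a mask such as 'y100,I6nn,Y100' into one string per read,
    with one upper-case letter (Y, I or N) per sequencing cycle."""
    reads = []
    for part in mask.split(","):
        expanded = ""
        for letter, count in re.findall(r"([YyNnIi])(\d*)", part):
            expanded += letter.upper() * (int(count) if count else 1)
        reads.append(expanded)
    return reads


reads = expand_bases_mask("y100,I6nn,Y100")
total_cycles = sum(len(r) for r in reads)

print([len(r) for r in reads])  # cycles in read 1, index read, read 2
print(reads[1])                 # six index bases followed by two ignored cycles
print(total_cycles)
```

Under that reading, the mask describes a paired-end 100 bp run with a six-base index read whose last two cycles are discarded.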

Page 12: Introduction to second generation sequencing

Fastq files

@HWI-ST301_0112:1:1:1169:2044#0/1
CCATAAGGCCACGTATTTTGCAAGCTATTTAACTGGCGGCGAT
+HWI-ST301_0112:1:1:1169:2044#0/1
dddc\dd^dd`acacdacd`ecdedabdcdddcc\``\`bTa\

36 36 36 35 28 …

ASCII   @ .. ~
DEC     64 .. 126
PHRED   0 .. 62

Cock PJ, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010 Apr;38(6):1767-71. PMID:20015970

Phred scores are estimates only !
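
The quality line encodes one Phred score per base as a single ASCII character. Here is a minimal sketch of decoding the record from this slide, using the offset of 64 given in the ASCII/PHRED table; the function names are hypothetical. The conversion to an error probability uses the standard Phred definition Q = -10 log10(P), and those probabilities are themselves only estimates.

```python
# The four lines of one FASTQ record, copied from the slide.
RECORD = "\n".join([
    "@HWI-ST301_0112:1:1:1169:2044#0/1",
    "CCATAAGGCCACGTATTTTGCAAGCTATTTAACTGGCGGCGAT",
    "+HWI-ST301_0112:1:1:1169:2044#0/1",
    "dddc\\dd^dd`acacdacd`ecdedabdcdddcc\\``\\`bTa\\",  # backslashes escaped for Python
])


def parse_record(text):
    """Split one four-line FASTQ record into (name, sequence, quality)."""
    header, sequence, separator, quality = text.splitlines()
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return header[1:], sequence, quality


def phred_scores(quality, offset=64):
    """Decode a quality string; offset 64 is the encoding shown on the slide."""
    return [ord(char) - offset for char in quality]


def error_probability(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


name, sequence, quality = parse_record(RECORD)
scores = phred_scores(quality)

print(scores[:5])  # 36 36 36 35 28, matching the slide
print(f"{error_probability(scores[0]):.2e}")
```

Files written with the Sanger encoding use an offset of 33 instead (see Cock et al.), so the offset should be checked before decoding.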

Page 13: Introduction to second generation sequencing

Fastq – PHRED quality

• Pathological

Page 14: Introduction to second generation sequencing

Fastq: Quality control

• Base-pair quality score

• Adapter contamination

• Uneven Amplification
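
All three checks can be approximated directly from the reads and quality strings in a fastq file. The sketch below is illustrative only: the toy reads are made up, the adapter prefix is an assumption to be replaced with the sequence for the library kit in use, and the exact-duplicate fraction is just a crude proxy for uneven amplification.

```python
from collections import Counter

# Assumed adapter start; replace with the adapter of the kit actually used.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def mean_quality_per_position(quals, offset=64):
    """Mean Phred score at each read position (offset 64 encoding)."""
    length = max(len(q) for q in quals)
    means = []
    for i in range(length):
        column = [ord(q[i]) - offset for q in quals if i < len(q)]
        means.append(sum(column) / len(column))
    return means


def adapter_fraction(seqs, adapter=ADAPTER_PREFIX):
    """Fraction of reads that contain the adapter prefix."""
    return sum(adapter in s for s in seqs) / len(seqs)


def duplicate_fraction(seqs):
    """Fraction of reads that are exact copies of an earlier read."""
    counts = Counter(seqs)
    return sum(n - 1 for n in counts.values()) / len(seqs)


# Toy data: two identical reads, one read running into adapter, one clean read.
seqs = ["ACGTACGTAC", "ACGTACGTAC", "TTAGATCGGAAGAGCAA", "GGCCTTAAGG"]
quals = ["dddddddddd", "ddddddddcc", "dddddaaaaaTTTTTTT", "hhhhhhhhhh"]

means = mean_quality_per_position(quals)
print([round(m, 1) for m in means])
print(adapter_fraction(seqs))
print(duplicate_fraction(seqs))
```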

Page 15: Introduction to second generation sequencing

Three things to remember

1. Don’t be fooled by marketing
2. Fastq files are not directly usable
3. Basic-run QC can be made from fastq file

“All modern genomics projects are now bottlenecked at the stage of data analysis rather than data production”

Ewan Birney, European Bioinformatics Institute

Wellcome Trust

David S. Roos Bioinformatics--Trying to Swim in a Sea of Data; Science 16 February 2001: Vol. 291 no. 5507 pp. 1260-1261 DOI: 10.1126/science.291.5507.1260

Page 16: Introduction to second generation sequencing

Next Week:

Abstract: This session will focus on identifying SNPs from whole genome, exome capture or targeted resequencing data. The approaches of mapping, local realignment, recalibration, SNP calling, and SNP recalibration will be introduced and quality metrics discussed.

Page 17: Introduction to second generation sequencing

Walk-in-clinic

Page 18: Introduction to second generation sequencing

Brief history of sequencing

• First Generation: Sanger sequencing

• Second Generation: amplified molecule sequencing

• Third Generation: single molecule sequencing

* Discussion about category

Page 19: Introduction to second generation sequencing

Helicos

• true Single Molecule Sequencing (tSMS)™ technology
  – Sequencing by synthesis, but much more sensitive, so no amplification

Page 20: Introduction to second generation sequencing

Life Technologies - Ion Torrent

• A hydrogen ion is released when a nucleotide is incorporated, and this is measured by a semiconductor

• The base is identified by which nucleotide wash cycle the signal coincides with
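
Because only one nucleotide type is washed over the chip at a time, the base is known from the wash cycle and the signal strength indicates how many copies were incorporated. This toy sketch decodes a series of such signals; the flow order, the signal values, and the rounding rule are assumptions for illustration, not the vendor's base caller.

```python
def decode_flows(signals, flow_order="TACG"):
    """Turn per-wash signal strengths into a base sequence.

    signals    -- one number per wash cycle, roughly proportional to the
                  number of nucleotides incorporated in that cycle
    flow_order -- assumed repeating order in which nucleotides are washed in
    """
    bases = []
    for i, signal in enumerate(signals):
        copies = round(signal)                      # 0 means nothing was incorporated
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * copies)
    return "".join(bases)


# Hypothetical signals for one well over four wash cycles: T, no A, CC, G.
sequence = decode_flows([1.03, 0.02, 1.96, 0.91])
print(sequence)
```

A run of identical bases gives one stronger signal rather than several separate ones, so the decoder has to infer the run length from the signal strength.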

Page 21: Introduction to second generation sequencing

PacBio

• Immobilized polymerase at the bottom of a well

• Fluorescent nucleotides float around and if they are incorporated they are held still for tens of milliseconds, which is the signal that is recorded

• No upper limit on the length

http://www.pacificbiosciences.com/smrt-biology/smrt-technology?page=4

Page 22: Introduction to second generation sequencing

Nanopore

• The molecule is pulled through a pore, and the change in the membrane charge caused by the different nucleotides is recorded.

http://www.nanoporetech.com/sections/index/82