Basic Principles of Whole Genome Sequencing © ESCMID eLibrary by author

Post on 03-Apr-2022

Disclosure of speaker’s interests

(Potential) conflict of interest: none

Potentially relevant company relationships in connection with event: none

Sponsorship or research funding; fee or other (financial) payment; shareholder; other relationship, i.e. …: none

Disclosure slide for speaker at EUCIC Advanced module for Infection Prevention and Control

Bacterial Diversity and Typing

Why?

• Define differences between isolates

• Outbreak situations & transmission chains

• Endemic situations – monitoring/surveillance

• Diagnosis (re-infection?)

• Nomenclature

• Distinguish epidemic clones with special characteristics

• Virulence factors, antibiotic resistance patterns

• Requirements: discriminatory, specific, reproducible

• Fingerprinting
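The "discriminatory" requirement is often quantified with the Hunter-Gaston (Simpson's) index of diversity, D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)), where N is the number of isolates and n_j the number of isolates of type j. D is the probability that two isolates drawn at random are assigned different types. The slide does not name a specific measure, so this Python sketch shows one common choice; the six typing results are hypothetical.

```python
from collections import Counter


def discriminatory_index(types):
    """Hunter-Gaston (Simpson's) index of diversity.

    types: one type label per isolate (e.g. sequence types).
    Returns the probability that two isolates drawn at random
    (without replacement) carry different types.
    """
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are needed")
    # Ordered pairs of isolates that share a type, summed over all types.
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical typing results for six isolates.
isolates = ["ST22", "ST22", "ST8", "ST5", "ST5", "ST5"]
print(round(discriminatory_index(isolates), 3))  # 0.733
print(discriminatory_index(["ST22"] * 6))  # 0.0
```

A value near 1 means the method separates isolates well. If every isolate belongs to one epidemic clone (e.g. all ST22), D drops to 0, which is exactly the situation in which typing reaches its limits.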

How?

• Non-molecular methods

  • Phage typing

  • Phenotyping

  • Serology

• Molecular methods

  • DNA banding patterns (e.g. AFLP)

  • Sequence-based analyses (e.g. MLST, spa typing)

  • PCR detection

Sequence-based molecular typing

• Multi-locus sequence typing (MLST)

  • Based on sequence differences in 7 housekeeping genes

  • The combination of alleles defines the sequence type (ST), like a phone number:

    • ST1 – 1-1-1-1-1-1-1

    • ST22 – 7-6-1-5-8-8-6

  • Sequence types sharing 5 or more of the 7 alleles are grouped into clonal complexes (CC), like an area code:

    • CC22

      • ST22 – 7-6-1-5-8-8-6

      • ST2371 – 258-6-1-5-8-8-6

      • ST337 – 215-6-184-5-8-8-6

      • ST3924 – 411-6-1-5-8-8-6
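The phone-number/area-code analogy can be made concrete. This Python sketch looks up an ST by exact allele-profile match and, as a simplification, places an ST into a clonal complex when it shares at least 5 of 7 alleles with a chosen founder ST (here ST22). Published grouping algorithms such as eBURST use more elaborate rules. The profile table contains only the STs shown on the slide, and the function names are hypothetical.

```python
# Allele profiles (7 loci) of the STs listed on the slide.
PROFILES = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST22": (7, 6, 1, 5, 8, 8, 6),
    "ST2371": (258, 6, 1, 5, 8, 8, 6),
    "ST337": (215, 6, 184, 5, 8, 8, 6),
    "ST3924": (411, 6, 1, 5, 8, 8, 6),
}


def shared_alleles(a, b):
    """Count loci at which two 7-locus profiles carry the same allele."""
    return sum(x == y for x, y in zip(a, b))


def lookup_st(profile, profiles=PROFILES):
    """Exact match of the full profile (the 'phone number'); None if unknown."""
    for st, p in profiles.items():
        if p == tuple(profile):
            return st
    return None


def clonal_complex(founder, profiles=PROFILES, min_shared=5):
    """STs sharing >= min_shared alleles with the founder (simplified 'area code')."""
    ref = profiles[founder]
    return sorted(st for st, p in profiles.items() if shared_alleles(ref, p) >= min_shared)


print(lookup_st([7, 6, 1, 5, 8, 8, 6]))  # ST22
print(clonal_complex("ST22"))  # ['ST22', 'ST2371', 'ST337', 'ST3924']
```

ST337 differs from ST22 at two loci and still falls into CC22, while ST1 shares only one allele with ST22 and stays outside.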

With epidemic clones circulating, typing reaches its limits

• e.g. EMRSA-15, Klebsiella pneumoniae ST258, Escherichia coli ST131

@torstenseemann

[Figure: Phylogeny – Staphylococcus sp. Phylogenetic tree of isolates (tip labels omitted; scale bar 0.05) with clades labelled CC5, CC1, CC8, CC22 – EMRSA-15, CC45, CC30 – EMRSA-16 and MSSA Addenbrooke’s.]


Whole genome sequencing (WGS)

• Species

• Subtypes

• Antibiotic resistance

  • Known genes & mutations; future predictions possible

• Virulence factors

• Phylogenetic relationships

Overview of Sequencing Generations

• 1st generation – “old school”

  Sanger sequencing – 1977

  ABI Prism 3700

Background: DNA

[Figure: DNA as a double-stranded molecule; nucleotide (nucleoside monophosphate + base) and chain; labels “RNA only” and “DNA only”.]

Background: DNA

[Figure: DNA double strand; base pairing]
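Base pairing (A with T, G with C) means each strand fully determines the other. A minimal Python sketch, assuming plain A/C/G/T input without ambiguity codes, derives the opposite strand written 5' to 3' (the reverse complement). The example sequence is arbitrary.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, written 5' to 3'.

    Complementing pairs each base with its partner; reversing is needed
    because the two strands run in opposite directions (antiparallel).
    """
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("TGCTACGAT"))  # ATCGTAGCA
```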

Background: Sanger Method

Chain-termination synthesis

Sugar of the nucleotide at the 2' and 3' positions:

RNA (ribonucleic acid), 2'-hydroxy:        2'-OH, 3'-OH
DNA (deoxyribonucleic acid), 2'-deoxy:     2'-H,  3'-OH
ddNTP (dideoxynucleotide), di-deoxy:       2'-H,  3'-H

A dideoxynucleotide lacks the 3'-OH group needed to attach the next nucleotide, so DNA synthesis stops wherever one is incorporated.

Background: Sanger Method

https://lerninhalte.blogspot.com/2013/04/dna-sequenzierung-kettenabbruchmethode.html

[Figure: chain-termination sequencing; labels: sequence to analyse, radioactively labelled primer, electrophoresis, reading direction.]
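How a chain-termination experiment is read can be sketched in a few lines of Python. This is a toy model under stated assumptions: the template is given 3' to 5', so its complement reads directly 5' to 3'; every possible termination point yields exactly one labelled fragment; and electrophoresis is modelled as sorting fragments by length. The function names are hypothetical.

```python
import random

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_fragments(template_3to5):
    """Return (length, terminal ddNTP) for every possible termination point.

    The new strand is synthesized complementary to the template, starting at
    the primer; each fragment stops where a labelled ddNTP (no 3'-OH) was built in.
    """
    new_strand = "".join(PAIR[b] for b in template_3to5)
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def read_gel(fragments):
    """Electrophoresis separates by length; reading short to long gives the sequence."""
    return "".join(base for _, base in sorted(fragments))


frags = chain_termination_fragments("TACGGA")
random.Random(1).shuffle(frags)  # fragments arise in no particular order
print(read_gel(frags))  # ATGCCT
```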

1st generation: ABI 3700 (1998)

Human Genome Project (1990-2003)

Capillary sequencing instead of gel electrophoresis (ca. 1998)

[Figure: LiCor and ABI Prism 3700 instruments]

Overview of Sequencing Generations

• 2nd generation – “next-generation” sequencing

  Illumina MiSeq/HiSeq

  454 (✝), IonTorrent, Qiagen GeneReader

  Massively parallel shotgun sequencing

Background: Shotgun Sequencing

DNA fragmentation, PCR amplification

BAC (bacterial artificial chromosome) cloning: amplification bias (systematic error)

Shotgun: amplification via PCR

Additionally:

• New assembly methods

• Existing reference genomes

[adapted from Shendure and Ji, 2008; Ansorge, 2009; Metzker, 2010]
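Shotgun sequencing breaks the genome into many random fragments and reads each one, so coverage varies along the genome. A minimal Python sketch under simplifying assumptions (uniformly random fragment positions, a single strand, no PCR bias, no sequencing errors) draws reads from a random 1,000 bp test genome and counts the per-base depth. The function names are hypothetical.

```python
import random


def shotgun_reads(genome, n_reads, read_len, seed=0):
    """Draw reads of fixed length from uniformly random start positions."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [(s, genome[s:s + read_len]) for s in starts]


def depth_per_base(genome_len, reads):
    """Count how many reads cover each genome position."""
    depth = [0] * genome_len
    for start, read in reads:
        for pos in range(start, start + len(read)):
            depth[pos] += 1
    return depth


rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(1000))  # random test genome
reads = shotgun_reads(genome, n_reads=50, read_len=100)
depth = depth_per_base(len(genome), reads)
mean_depth = sum(depth) / len(depth)  # 50 reads x 100 bp / 1,000 bp = 5.0
uncovered = sum(1 for d in depth if d == 0)
print(mean_depth, uncovered)
```

The mean depth is fixed by reads × read length / genome size, but individual positions can still remain uncovered, which is one reason assemblies contain gaps.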

2nd generation: Illumina (MiSeq, HiSeq, NextSeq)

Benchtop sequencers in a “handy” format, also suitable for smaller laboratories

Background: Illumina Technology

[Figure: flow cell; > 100 million clusters per flow cell; scale bars 100 microns and 20 microns; bases A, C, G, T]

[Figure: image panels 1–9 and resulting reads, e.g. TGCTACGAT…, TTTTTTGT…]
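Illumina reads are built up one base per cycle: each cluster is imaged after every incorporation step and the base is read from its fluorescence signal. A toy Python sketch assumes one fluorescence channel per base (A, C, G, T) and simply calls the brightest channel in each cycle. Real base callers also correct for effects such as channel cross-talk and phasing and assign quality scores. The intensities are hypothetical.

```python
CHANNELS = "ACGT"  # assumption: one fluorescence channel per base


def call_bases(cycles):
    """Call one base per cycle by picking the brightest channel.

    cycles: list of (A, C, G, T) intensity tuples for one cluster.
    """
    read = []
    for intensities in cycles:
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical intensities for one cluster over four cycles.
cluster = [
    (0.1, 0.2, 0.1, 0.9),
    (0.1, 0.1, 0.8, 0.2),
    (0.2, 0.7, 0.1, 0.1),
    (0.9, 0.1, 0.1, 0.2),
]
print(call_bases(cluster))  # TGCA
```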

Background: Comparison Sanger vs Illumina

[adapted from Shendure and Ji, 2008; Ansorge, 2009; Metzker, 2010]


Background: Problems of shotgun data

Short reads

• Good for detecting variants (single nucleotide polymorphisms, SNPs) relative to a closely related reference genome

• Problematic: repeat elements (e.g. IS elements) and phages

• Assembly (putting the genome back together) is difficult and leaves gaps

• Plasmids are often mosaic and fluid in structure
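Repeats cause trouble for short reads because a read lying entirely inside a repeated element matches the reference at every copy and cannot be placed. A read long enough to span the repeat and its unique flanks has a single placement. A small Python sketch shows this with a hypothetical 40 bp reference containing two copies of an 11 bp repeat (a stand-in for an IS element, which is much longer in reality), using exact string matching instead of a real read aligner.

```python
def placements(reference, read):
    """All start positions where `read` matches `reference` exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


repeat = "GGATCCTTAGC"  # hypothetical repeat element (two copies)
reference = "ACGTTA" + repeat + "CCGATA" + repeat + "TTGACA"

short_read = "ATCCTTA"  # lies entirely inside the repeat
long_read = reference[4:21]  # spans the first repeat copy plus unique flanks
print(placements(reference, short_read))  # [8, 25]
print(placements(reference, long_read))  # [4]
```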

Overview of Sequencing Generations

• 3rd generation

  PacBio RSII

  Oxford Nanopore Technologies MinION

  Long reads, single molecule

3rd generation: PacBio

Pacific Biosciences: sequencing by polymerases attached to the bottom of a well

Advantages:

• Long reads for de novo assembly

Disadvantages:

• Large footprint

• High error rate, but repeated circular reading of the same molecule reduces it

• Limited throughput, high costs
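Because the same circular molecule is read several times, independent errors in individual passes can be outvoted. A minimal Python sketch of this circular-consensus idea assumes equal-length passes containing only substitution errors. Real circular-consensus calling first aligns the passes, since single-pass errors also include insertions and deletions. The passes are hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Per-position majority vote over passes of the same circular molecule."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("this sketch only handles equal-length passes")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


passes = [
    "ACGTTGCA",
    "ACGATGCA",  # substitution at position 3
    "ACGTTGCT",  # substitution at position 7
    "TCGTTGCA",  # substitution at position 0
    "ACGTTCCA",  # substitution at position 5
]
print(majority_consensus(passes))  # ACGTTGCA
```

Every single pass above contains an error, yet the consensus is error-free because no two passes err at the same position.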

3rd generation: Nanopore

Oxford Nanopore Technologies (ONT): the sequence of a single molecule is read from the electric signal as it passes through a pore

Advantage:

• Small & handy, e.g. for use “in the field”, as during the Ebola outbreak

Disadvantages:

• Not fully mature yet, under constant development

• Error rate: the electric signal reflects “words” (k-mers) whose length depends on the model; homopolymers are problematic
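Homopolymers are hard because the pore signal at any moment reflects the k-mer currently inside it. Inside a long run of one base that k-mer does not change, so the signal offers no step from which to count the bases. A toy Python sketch assumes k = 5 (an arbitrary choice, since the length depends on the model) and ignores dwell time and signal noise. It shows that runs of 7 and 8 T produce the same series of distinct k-mer states.

```python
def kmer_states(seq, k):
    """Successive k-mers 'seen' by the pore as the strand moves one base at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def distinct_steps(states):
    """Collapse consecutive identical states: a constant signal gives no step to count."""
    collapsed = []
    for s in states:
        if not collapsed or collapsed[-1] != s:
            collapsed.append(s)
    return collapsed


seq7 = "AC" + "T" * 7 + "GCA"
seq8 = "AC" + "T" * 8 + "GCA"
same = distinct_steps(kmer_states(seq7, 5)) == distinct_steps(kmer_states(seq8, 5))
print(same)  # True
```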

Background: Comparison Sanger, Illumina vs Nanopore

[adapted from Shendure and Ji, 2008; Ansorge, 2009; Metzker, 2010; Goodwin, McPherson and McCombie, 2016]

Background: Comparison of sequencers

                  Sanger        Illumina             PacBio                                   ONT
Read length       500-1000 bp   150-250 bp (PE)      ~20,000 bp                               <200,000 bp
Throughput        0.0003 Gb     4.5-5.1 Gb           500 Mb-1 Gb                              <1.5 Gb
Samples/run       96/384        20-24 (96)           1                                        1-6
Runtime           1-2 h         24 h                 4 h                                      48 h (real-time analysis possible)
Error profile     -             0.1%, substitution   13% single pass, <1% circular consensus  ~12%, indel
Instrument cost   -             $100,000             $695,000                                 $1,000
Cost/Gb           -             $212                 $1,000                                   $750

PE = paired-end

[adapted from Goodwin, McPherson, and McCombie, 2016]
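The throughput row translates into coverage per isolate: mean coverage ≈ throughput / (genome size × number of samples). This back-of-the-envelope Python sketch assumes a 2.8 Mb genome (roughly S. aureus size, an assumption not taken from the slides), the lower Illumina throughput from the table (4.5 Gb), 24 samples per run and an illustrative 50× coverage target. It ignores reads lost to quality filtering and uneven pooling.

```python
def mean_coverage(throughput_gb, genome_mb, samples):
    """Average fold-coverage per sample if the run output is split evenly."""
    return (throughput_gb * 1e9) / (genome_mb * 1e6 * samples)


def max_samples(throughput_gb, genome_mb, target_coverage):
    """How many samples fit on one run at the target coverage (rounded down)."""
    return int((throughput_gb * 1e9) // (genome_mb * 1e6 * target_coverage))


# Assumptions: 2.8 Mb genome, 4.5 Gb Illumina run, 24 samples, 50x target.
print(round(mean_coverage(4.5, 2.8, 24), 1))  # 67.0
print(max_samples(4.5, 2.8, 50))  # 32
```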

Genome Analysis

Mapping

• Comparison to a reference genome

• Identification of SNPs (mutations)

• Phylogenetic analysis

Assembly

• De novo reconstruction, no reference needed

• Analysis of what is not in the reference (antibiotic resistance genes, virulence factors, plasmids, phages,…)
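After mapping, isolates are commonly compared by counting SNP differences between their reference-based sequences, and these distances feed phylogenetic and transmission analyses. A minimal Python sketch assumes a hypothetical three-isolate alignment in which positions without a confident call are coded as N and skipped. The SNP distance that counts as "closely related" depends on organism and setting and is not given in the slides.

```python
from itertools import combinations

BASES = set("ACGT")


def snp_distance(a, b):
    """Count alignment columns where both isolates have a base call and they differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(1 for x, y in zip(a, b) if x != y and x in BASES and y in BASES)


def distance_matrix(alignment):
    """Pairwise SNP distances for every pair of isolates."""
    return {
        (i, j): snp_distance(alignment[i], alignment[j])
        for i, j in combinations(sorted(alignment), 2)
    }


# Hypothetical reference-based sequences; N = no confident base call.
alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTTC",
    "isolate_3": "ACTTACCTAN",
}
for (i, j), d in distance_matrix(alignment).items():
    print(i, j, d)
```

This prints distances of 1, 2 and 3 SNPs for the pairs (1, 2), (1, 3) and (2, 3). The N in isolate_3 is not counted as a difference.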