Basic principles of whole genome sequencing
TRANSCRIPT
Disclosure of speaker’s interests
(Potential) conflict of interest: none
Potentially relevant company relationships in connection with event: none
Sponsorship or research funding; fee or other (financial) payment; shareholder; other relationship, i.e. …: none
Disclosure slide for speaker at EUCIC Advanced module for Infection Prevention and Control
© ESCMID eLibrary by author
Why?
• Define differences
• Outbreak situation & transmission chains
• Endemic situation – Monitoring/Surveillance
• Diagnosis (Re-Infection?)
• Nomenclature
• Differentiate epidemic clones with special characteristics
• Virulence factors, antibiotic resistance pattern
• Requirements: Discriminatory, specific, reproducible
• Fingerprint
Bacterial Diversity and Typing
How?
• Non-molecular methods
• Phage-Typing
• Phenotyping
• Serology
• Molecular methods
• DNA banding pattern (e.g. AFLP)
• Sequence-based analyses (e.g. MLST, spa-typing)
• PCR detection
Bacterial Diversity and Typing
Sequence-based molecular typing
• Multi-locus sequence typing (MLST)
• Differences in 7 housekeeping genes
• The combination of alleles defines the sequence type (ST), comparable to a phone number
• ST1 – 1-1-1-1-1-1-1
• ST22 – 7-6-1-5-8-8-6
• Sequence types sharing 5 or more alleles are grouped into clonal complexes (CC), comparable to an area code
• CC22
• ST22 7-6-1-5-8-8-6
• ST2371 258-6-1-5-8-8-6
• ST337 215-6-184-5-8-8-6
• ST3924 411-6-1-5-8-8-6
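The grouping rule above can be checked with a few lines of code. The following is a minimal sketch, not a real MLST tool: the allele profiles are copied from the slide, while the function names and the choice of ST22 as the reference profile are assumptions made for illustration. Real clonal complex assignment uses dedicated algorithms and curated databases.

```python
from typing import Dict, Tuple

Profile = Tuple[int, ...]

# Allele profiles from the slide: one allele number per housekeeping gene.
PROFILES: Dict[str, Profile] = {
    "ST22": (7, 6, 1, 5, 8, 8, 6),
    "ST2371": (258, 6, 1, 5, 8, 8, 6),
    "ST337": (215, 6, 184, 5, 8, 8, 6),
    "ST3924": (411, 6, 1, 5, 8, 8, 6),
    "ST1": (1, 1, 1, 1, 1, 1, 1),
}


def shared_alleles(a: Profile, b: Profile) -> int:
    """Count the loci at which two profiles carry the same allele."""
    return sum(x == y for x, y in zip(a, b))


def same_clonal_complex(a: Profile, b: Profile, min_shared: int = 5) -> bool:
    """Apply the '5 or more shared alleles' rule from the slide."""
    return shared_alleles(a, b) >= min_shared


# ST22 is used as the reference here (an assumption for this example).
reference = PROFILES["ST22"]
for name, profile in PROFILES.items():
    n = shared_alleles(reference, profile)
    print(f"{name}: {n}/7 shared with ST22, same CC: "
          f"{same_clonal_complex(reference, profile)}")
```

With these profiles, ST2371, ST337 and ST3924 all pass the threshold against ST22, whereas ST1 shares only one allele and falls outside CC22.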
Bacterial Diversity and Typing
With epidemic clones circulating, typing is at its limit
• EMRSA-15, K. pneumoniae ST258, E. coli ST131
@torstenseemann
Bacterial Diversity and Typing
[Figure: Phylogeny – Staphylococcus sp. Phylogenetic tree with scale bar 0.05; individual isolate tip labels (e.g. 7229_5#2) omitted. Labelled clades: CC5, CC1, CC8, CC22 – EMRSA-15, CC45, CC30 – EMRSA-16, MSSA Addenbrooke’s]
@torstenseemann
Bacterial Diversity and Typing
With epidemic clones circulating, typing is at its limit
• EMRSA-15, K. pneumoniae ST258, E. coli ST131
Whole genome sequencing (WGS)
• Species
• Subtypes
• Antibiotic resistance
• Known genes & mutations, future predictions possible
• Virulence factors
• Phylogenetic relationships
Bacterial Diversity and Typing
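One common way to turn WGS data into phylogenetic relationships is to count the single nucleotide differences between isolates. The sketch below assumes the sequences are already aligned to the same length; the isolate names and toy sequences are hypothetical and serve only to show the calculation.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences differ.

    Only positions where both sequences have A, C, G or T are compared,
    so gaps and ambiguous bases (e.g. 'N', '-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b)
               if x != y and x in "ACGT" and y in "ACGT")


def pairwise_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """SNP distance for every pair of isolates in the alignment."""
    return {(n1, n2): snp_distance(s1, s2)
            for (n1, s1), (n2, s2) in combinations(alignment.items(), 2)}


# Hypothetical toy alignment, for illustration only.
ALIGNMENT = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "ACGAACGTTT",
}

for pair, distance in pairwise_distances(ALIGNMENT).items():
    print(pair, distance)
```

A distance matrix like this is the usual input for tree building and for judging whether isolates are close enough to belong to one transmission chain.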
• 1st generation – “old school”
Sanger Sequencing – 1977
ABI Prism 3700
Overview of Sequencing Generations
Background: DNA
Double-stranded molecule
Chain of nucleoside monophosphates (sugar + phosphate + base)
(RNA only) (DNA only)
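Synthesis vs. chain termination
Background: Sanger Method
• RNA (ribonucleic acid): 2’-hydroxy; 2’ OH and 3’ OH
• DNA (deoxyribonucleic acid): 2’-deoxy; 2’ H and 3’ OH, so synthesis can continue
• dDNA (dideoxyribonucleic acid): dideoxy; 2’ H and 3’ H, so the chain terminates
[Figure: sugar-phosphate structure of each nucleotide with base (B), 5’ and 3’ ends marked]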
Background: Sanger Method
https://lerninhalte.blogspot.com/2013/04/dna-sequenzierung-kettenabbruchmethode.html
Reading direction
Sequence to analyse
Radioactively labelled primer
Electrophoresis
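How the sequence is read off the gel can be mimicked in code. This is an idealised sketch, assuming that each of the four reactions (one per dideoxy base) produces exactly one terminated fragment for every position of that base in the newly synthesised strand; the function names are made up for this example.

```python
from typing import Dict, List


def chain_termination_lanes(synthesised: str) -> Dict[str, List[int]]:
    """Fragment lengths per lane, one lane for each dideoxy base.

    A fragment of length n ends up in lane X if the n-th base of the
    synthesised strand is X (synthesis stopped at that dideoxy base).
    """
    lanes: Dict[str, List[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesised, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Read the gel from the shortest to the longest fragment."""
    by_length = {length: base
                 for base, lengths in lanes.items()
                 for length in lengths}
    return "".join(by_length[length] for length in sorted(by_length))


lanes = chain_termination_lanes("GATTACA")
print(lanes)
print(read_gel(lanes))
```

Electrophoresis sorts the fragments by length, so walking through the bands from short to long and noting the lane of each band recovers the sequence.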
Human Genome Project (1990-2003)
Capillary sequencing instead of gel electrophoresis (ca. 1998)
1st generation: ABI 3700 – 1998
LiCor ABI Prism 3700
• 2nd generation – “next-generation” sequencing
Illumina MiSeq/HiSeq
454 (✝), IonTorrent, Qiagen GeneReader
Massively parallel shotgun sequencing
Overview of Sequencing Generations
DNA fragmentation, PCR amplification
Background: Shotgun sequencing
BAC: amplification bias (systematic error)
Shotgun: amplification via PCR
Additionally:
• New assembly methods
• Existing reference genomes
[adapted from Shendure and Ji, 2008; Ansorge, 2009; Metzker, 2010]
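The idea of shotgun sequencing, random fragments that together cover the genome several times, can be illustrated with a small simulation. This is a sketch under simplifying assumptions: reads are error-free, all the same length, and start at uniformly random positions; all names and parameters are chosen for this example only. The expected coverage uses the standard approximation: number of reads times read length divided by genome length.

```python
import random
from typing import List, Tuple


def shotgun_reads(genome: str, read_length: int, n_reads: int,
                  seed: int = 0) -> List[Tuple[int, str]]:
    """Draw reads of fixed length from random start positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_length
    starts = [rng.randrange(last_start + 1) for _ in range(n_reads)]
    return [(s, genome[s:s + read_length]) for s in starts]


def coverage_per_position(genome_length: int,
                          reads: List[Tuple[int, str]]) -> List[int]:
    """Count how many reads cover each genome position."""
    depth = [0] * genome_length
    for start, seq in reads:
        for i in range(start, start + len(seq)):
            depth[i] += 1
    return depth


def expected_coverage(genome_length: int, read_length: int,
                      n_reads: int) -> float:
    """Mean coverage: reads x read length / genome length."""
    return n_reads * read_length / genome_length


rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(1000))  # toy genome
reads = shotgun_reads(genome, read_length=50, n_reads=200)
depth = coverage_per_position(len(genome), reads)
print("expected mean coverage:", expected_coverage(len(genome), 50, 200))
print("observed mean coverage:", sum(depth) / len(depth))
print("uncovered positions:", depth.count(0))
```

Even when the mean coverage looks comfortable, individual positions can end up with little or no coverage, which is one source of gaps in an assembly.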
Sequencing machines in a compact format, also suitable for smaller laboratories
2nd generation: Illumina (MiSeq, HiSeq, NextSeq)
[Figure: flow cell clusters; bases A, C, G, T; > 100 million clusters per flow cell; scale bars 100 and 20 microns]
Background: Illumina technology
[Figure: sequencing cycles 1 to 9]
Background: Illumina technology
TGCTACGAT…
TTTTTTGT…
Background: Comparison Sanger vs Illumina
[adapted from Shendure and Ji, 2008; Ansorge, 2009; Metzker, 2010]
Short Reads
• Good for detecting variants (single nucleotide polymorphisms, SNPs) relative to closely related reference genomes
• Repeat elements, e.g. IS elements
• Phages
• Assembly (putting the genome back together) is difficult and leaves gaps
• Plasmids are often mosaic and fluid in structure
Background: Problems of shotgun data
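Why repeat elements cause trouble for short reads can be shown with a toy example. In this sketch the genome, the repeat (a stand-in for an IS element) and the reads are all invented for illustration: a read that lies entirely inside the repeat matches in two places, while a longer read that reaches into the unique flanking sequence can be placed unambiguously.

```python
from typing import List

# Hypothetical toy genome: the same repeat occurs twice between unique regions.
REPEAT = "GGTCTAGACC"
GENOME = "ATGCCGTA" + REPEAT + "TTAGCAAC" + REPEAT + "CGATTGCA"


def find_all(genome: str, read: str) -> List[int]:
    """Return every start position at which the read matches exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


short_read = "TCTAGA"            # lies entirely inside the repeat
long_read = "CGTAGGTCTAGACCTT"   # spans the repeat plus unique flanks

print("short read matches at:", find_all(GENOME, short_read))
print("long read matches at:", find_all(GENOME, long_read))
```

An assembler faces the same ambiguity: it cannot tell which copy of the repeat a short read came from, so contigs tend to break at repeats.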
• 3rd generation
PacBio RSII
Oxford Nanopore Technologies MinION
Long reads, single molecule
Overview of Sequencing Generations
Pacific Biosciences
Sequencing via polymerases attached to the bottom of a well
Advantages:
• Long reads for de novo assembly
Disadvantages:
• Large footprint
• High error rate, although repeated circular reading of the molecule reduces it
• Limited throughput, high costs
3rd generation: PacBio
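The effect of reading the same molecule several times can be sketched with a simple majority vote. This is a deliberately simplified model, not PacBio's actual consensus algorithm: it assumes substitution errors only, so that all passes stay the same length, whereas real single-molecule reads contain insertions and deletions and must be aligned first. Function names and the 13% error rate used in the demo are assumptions for illustration.

```python
import random
from collections import Counter
from typing import List


def noisy_pass(template: str, error_rate: float, rng: random.Random) -> str:
    """One read of the template with random substitution errors."""
    out = []
    for base in template:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def consensus(passes: List[str]) -> str:
    """Majority vote per position across equally long passes."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*passes))


def error_fraction(read: str, template: str) -> float:
    """Fraction of positions at which the read differs from the template."""
    return sum(a != b for a, b in zip(read, template)) / len(template)


rng = random.Random(42)
template = "".join(rng.choice("ACGT") for _ in range(2000))
passes = [noisy_pass(template, 0.13, rng) for _ in range(9)]
print("single pass error:", error_fraction(passes[0], template))
print("consensus error:  ", error_fraction(consensus(passes), template))
```

Because the errors in each pass fall at different random positions, they are outvoted by the correct bases from the other passes.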
Oxford Nanopore Technologies (ONT)
Reading the sequence of a single molecule as it passes through a pore – electric signal
Advantage:
Small and portable, e.g. use “in the field” – Ebola
Disadvantage:
Not fully mature yet, currently under constant development
Error rate – the electric signal reads “words” (k-mers) whose length depends on the model; homopolymers are problematic
3rd generation: Nanopore
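Why homopolymers are difficult can be seen by listing the k-mers that sit in the pore one after another. In the sketch below, k = 5 and the example sequences are assumptions for illustration (the slide only states that the length depends on the model). Within a long run of the same base, successive k-mers are identical, so in this idealised view the signal does not change and the length of the run has to be inferred from how long the signal lasts.

```python
from typing import List


def kmers(sequence: str, k: int) -> List[str]:
    """All overlapping words of length k, in reading order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def unchanged_steps(sequence: str, k: int) -> int:
    """Count steps where the k-mer in the pore stays the same."""
    words = kmers(sequence, k)
    return sum(1 for prev, cur in zip(words, words[1:]) if prev == cur)


mixed = "ACGTACGTACGTA"          # no homopolymer run
homopolymer = "ACGTTTTTTTGCA"    # run of seven T

for name, seq in (("mixed", mixed), ("homopolymer", homopolymer)):
    print(name, kmers(seq, 5), "unchanged steps:", unchanged_steps(seq, 5))
```

Every unchanged step is a base that moved through the pore without producing a new word, which is why insertions and deletions cluster in homopolymer regions.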
Background: Comparison Sanger, Illumina vs Nanopore
[adapted from Shendure and Ji, 2008; Ansorge, 2009; Metzker, 2010; Goodwin, McPherson and McCombie, 2016]
Background: Comparison sequencers
Platform | Sanger | Illumina | PacBio | ONT
Read length | 500-1000 bp | 150-250 bp (PE) | ~20,000 bp | <200,000 bp
Throughput | 0.0003 Gb | 4.5-5.1 Gb | 500 Mb-1 Gb | <1.5 Gb
Samples / run | 96/384 | 20-24 (96) | 1 | 1-6
Runtime | 1-2 h | 24 h | 4 h | 48 h (real-time analysis possible)
Error profile | - | 0.1%, substitution | 13% single pass, <1% circular consensus | ~12%, indel
Instrument cost | - | $100,000 | $695,000 | $1,000
Cost / Gb | - | $212 | $1,000 | $750
[adapted from Goodwin, McPherson, and McCombie, 2016]