Page 1: Introduction to single-isolates, single CGE services · 2017-03-20 · Workshop on Whole Genome Sequencing and Analysis, 27-29 Mar. 2017 Introduction to single-isolates, single CGE

Workshop on Whole Genome Sequencing and Analysis, 27-29 Mar. 2017

Introduction to single-isolates, single CGE services

Page 2

Learning objective:

After this lecture and exercise, you should be able to…

…describe how the methods from Center for Genomic Epidemiology for identifying species, Multilocus Sequence Type, plasmids, and antimicrobial resistance genes work

…use the above-mentioned methods as stand-alone services and interpret the results

Page 3
Page 4

Tools for species identification

Name of Service | Description | Status | Publication
SpeciesFinder | Species identification using 16S rRNA | Online | Published Feb 2014, PMID: 24574292
KmerFinder | Species identification using overlapping 16mers | Online | Published Jan 2014, PMID: 24172157
TaxonomyFinder | Taxonomy identification using functional protein domains | Under development | Published in PMID: 24574292 + Oksana Lukjancenko's PhD thesis
Reads2Type | Species identification on client computer | Online | Published Feb 2014, PMID: 24574292

Page 5

PMID: 24574292

Page 6

Training data
◇ 1,647 completed / almost completed genomes downloaded from NCBI in 2011 (1,009 different species)

Evaluation data
◇ NCBI draft genomes
• 695 isolates from species that overlap with training set (151 species)
◇ SRA draft genomes
• 10,407 sets of short reads from Illumina (168 species)
• 10,407 draft genomes from Illumina data (168 species)

Page 7

16S rRNA

• 16S rRNA sequencing has dominated molecular taxonomy of prokaryotes for more than 30 years (Fox et al., Int. J. Syst. Bacteriol., 1977)
• Tremendous amounts of 16S rRNA sequence data are available in databases

Concerns:
• Low resolution
• Some genomes contain several copies of the 16S rRNA gene with inter-gene variation
• The 16S rRNA gene represents only about 0.1% of the coding part of a microbial genome

Page 8

SpeciesFinder: CGE implementation of 16S species identification

Reference database
• 16S rRNA genes are isolated from genomes in training data using RNAmmer (Lagesen, NAR, 2007).

Method
• Input genomes are BLASTed against 16S rRNA genes in reference database.
• Best hit is selected based on a combination of coverage, % identity, bitscore, number of mismatches and number of gaps in the alignments.
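
The best-hit selection can be pictured as a ranking over BLAST hits. The slides do not say how the five criteria are combined, so the ordering below is an assumption made only for illustration: coverage first, then identity, then bit score, with fewer mismatches and gaps as tie-breakers. The `Hit` fields and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    species: str
    coverage: float    # fraction of the reference 16S gene covered by the alignment
    identity: float    # percent identity of the alignment
    bitscore: float
    mismatches: int
    gaps: int


def best_hit(hits):
    """Pick one hit using an ASSUMED priority order (not taken from the slides)."""
    # Higher coverage, identity and bit score rank higher;
    # mismatches and gaps are negated so that fewer is better.
    return max(
        hits,
        key=lambda h: (h.coverage, h.identity, h.bitscore, -h.mismatches, -h.gaps),
    )


# Hypothetical hits for one query genome.
hits = [
    Hit("Escherichia coli", 1.00, 99.8, 2800.0, 3, 0),
    Hit("Shigella flexneri", 1.00, 99.6, 2790.0, 6, 0),
    Hit("Salmonella enterica", 0.98, 97.1, 2500.0, 40, 2),
]
print(best_hit(hits).species)
```

The near-tie between the first two hypothetical hits mirrors the low-resolution concern about 16S rRNA: closely related species can differ by only a few positions.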

Page 9

KmerFinder: Using (almost) all information in the WGS data

• Genomes in training data are chopped into 16mers:

A T G A C G T A T G A C T G A T G G C G T A G T A G T C C

• Downsampling
• Only 16mers with specific prefix (ATGAC) are kept
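
The chopping and prefix-based downsampling can be sketched in a few lines of Python. This is a minimal illustration using the example sequence from the slide, not the KmerFinder source code; the function names are hypothetical.

```python
def kmers(seq, k=16):
    """Return every overlapping k-mer in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def downsample(kmer_list, prefix="ATGAC"):
    """Keep only the unique k-mers that start with the given prefix."""
    return {kmer for kmer in kmer_list if kmer.startswith(prefix)}


seq = "ATGACGTATGACTGATGGCGTAGTAGTCC"   # sequence shown on the slide
all_kmers = kmers(seq)
kept = downsample(all_kmers)

print(len(all_kmers))   # number of overlapping 16mers
print(sorted(kept))     # 16mers that survive downsampling
```

For this 29-base sequence there are 14 overlapping 16mers, and only the two that begin with ATGAC are kept.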

Page 10

Reference db bacteria of known species (template):
Bact1 -> E. coli
Bact2 -> S. enterica
Bact3 -> K. pneumoniae
Bact4 -> S. aureus

Query bacteria of unknown species: ?????

Prediction: The query bacterium is S. aureus

Page 11

Three other methods were evaluated

TaxonomyFinder: Performs its predictions based on the presence of protein profiles that are specific to particular taxonomic groups.

Reads2Type: Performs its predictions based on species-specific 50mers in the 16S rRNA or gyrB gene (for Enterobacteriaceae).

rMLST: Performs its predictions based on up to 53 ribosomal genes. Implemented in collaboration with Keith Jolley from Oxford (MLST).

Page 12

Results

(16S rRNA)

Page 13

Summary of taxonomy benchmark study

• KmerFinder had the highest accuracy and was the fastest method.

• SpeciesFinder (16S rRNA-based) had the lowest accuracy.

• Methods that only sample genomic loci (16S, Reads2Type, rMLST) had difficulty distinguishing species that diverged only recently, especially when the main difference is a plasmid.

Page 14

“Standard” when aiming at determining the species of one isolate

“Winner takes it all” if you have a mixed sample or suspect you have a mixed sample

Page 15

KmerFinder statistics

Query coverage: $\frac{S}{q_u}$
S: Score (total number of unique kmers in query sequence that match kmers in template sequence)
$q_u$: Total number of unique kmers in query sequence

Template coverage: $\frac{S}{l_u}$
S: Score (total number of unique kmers in query sequence that match kmers in template sequence)
$l_u$: Total number of unique kmers in template sequence (database sequence)

[Figure: overlap between kmers in query ($q_u$) and kmers in template (database) genome ($l_u$); the shared kmers are $S$]
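
The two coverage values follow directly from set operations on unique kmers. The sketch below assumes the kmers are already extracted; the function name and the three-letter toy "kmers" are hypothetical and only keep the example readable.

```python
def coverage_stats(query_kmers, template_kmers):
    """Score, query coverage (S / q_u) and template coverage (S / l_u)."""
    query = set(query_kmers)        # unique kmers in the query; its size is q_u
    template = set(template_kmers)  # unique kmers in the template; its size is l_u
    score = len(query & template)   # S: unique query kmers also found in the template
    return {
        "score": score,
        "query_coverage": score / len(query) if query else 0.0,
        "template_coverage": score / len(template) if template else 0.0,
    }


# Toy data: 3-letter strings stand in for 16mers.
query = ["AAA", "AAC", "ACG", "CGT", "AAC"]
template = ["AAC", "ACG", "CGT", "GTT", "TTA"]
stats = coverage_stats(query, template)
print(stats)
```

Here the query has 4 unique kmers and the template has 5, of which 3 are shared, so query coverage is 3/4 and template coverage is 3/5.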

Page 16

More KmerFinder statistics

Depth (Depth of Coverage). Only relevant when uploading raw reads.

Average number of times each position is covered by a kmer.

$\text{Depth} = \frac{N \cdot L}{G}$

N = total no. of kmers that match the template (not the same as score)
L = 16 (length of kmer)
G = Total no. of unique kmers in template
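
A small sketch of the depth calculation, reading the slide's formula as N times L divided by G. N is counted with multiplicity here, which is why it differs from the score (unique kmers only). The function name and toy data are hypothetical, and the toy example uses 3-letter kmers instead of 16mers.

```python
from collections import Counter


def depth(read_kmers, template_kmers, k=16):
    """Depth = N * L / G, with N counted over all (non-unique) read kmers."""
    template = set(template_kmers)
    counts = Counter(read_kmers)
    # N: total number of read kmers that match the template, repeats included.
    n = sum(count for kmer, count in counts.items() if kmer in template)
    # G: number of unique kmers in the template.
    g = len(template)
    return n * k / g


read_kmers = ["AAC", "AAC", "ACG", "ACG", "ACG", "GGG"]
template_kmers = ["AAC", "ACG", "CGT"]
print(depth(read_kmers, template_kmers, k=3))
```

In the toy data five read kmers match a template of three unique kmers, giving a depth of 5 * 3 / 3 = 5.0.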

Page 17

KmerFinder output – standard scoring method

Page 18

“Winner takes it all”

Query (input)
Raw reads from urine sample are split into 16mers
Only unique 16mers are kept

Template/reference database
E. coli
P. mirabilis
S. aureus

In the “total” values the kmers are allowed to match more than one template

[Figure labels: 4493, 3320, Depth]
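
The difference between the "total" values and "winner takes it all" can be illustrated with sets. The sketch below is an assumed reading of the slide, not the KmerFinder implementation: the best-matching template claims its kmers, those kmers are removed from the query, and the remaining kmers are scored again. Function names and the single-letter toy kmers are hypothetical.

```python
def standard_scores(query_kmers, templates):
    """'Total' scoring: a kmer may count for more than one template."""
    query = set(query_kmers)
    return {name: len(query & kmers) for name, kmers in templates.items()}


def winner_takes_all(query_kmers, templates):
    """Assumed scheme: each query kmer is credited to one template only."""
    remaining = set(query_kmers)
    results = []
    while remaining:
        name, hits = max(
            ((name, remaining & kmers) for name, kmers in templates.items()),
            key=lambda pair: len(pair[1]),
            default=(None, set()),
        )
        if not hits:
            break                      # no template matches the leftover kmers
        results.append((name, len(hits)))
        remaining -= hits              # the winner takes these kmers out of play
    return results


# Toy database: single letters stand in for 16mers.
templates = {
    "E. coli": {"a", "b", "c", "d"},
    "P. mirabilis": {"c", "d", "e"},
    "S. aureus": {"x"},
}
query = {"a", "b", "c", "d", "e"}

print(standard_scores(query, templates))
print(winner_takes_all(query, templates))
```

With total scoring, P. mirabilis gets credit for the two kmers it shares with E. coli. With winner takes it all, it keeps only the one kmer that E. coli cannot explain, which makes a second species in a mixed sample easier to judge.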

Page 19

Tools for further typing

Name of Service | Description | Publication
MLST | Multilocus sequence typing | Published Apr 2012, PMID: 22238442
PlasmidFinder | Identification of plasmids in Enterobacteriaceae (and Gram-positives) | Published Apr 2014, PMID: 24777092
pMLST | pMLST of plasmids in Enterobacteriaceae | Published Apr 2014, PMID: 24777092

Page 20

Multilocus Sequence Typing (MLST)

First developed in 1998 for Neisseria meningitidis (Maiden et al. PNAS 1998. 95:3140-3145)

• The nucleotide sequence of internal regions of approx. 7 housekeeping genes is determined by PCR followed by Sanger sequencing

• Different alleles are each assigned an arbitrary number

• The unique combination of alleles is the sequence type (ST)
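
The last step, going from allele numbers to a sequence type, is a table lookup. The sketch below uses hypothetical locus names, allele numbers and ST numbers; a real scheme would take its profile table from the MLST database.

```python
# Hypothetical 7-locus scheme: locus names, profiles and ST numbers are made up.
LOCI = ("locus1", "locus2", "locus3", "locus4", "locus5", "locus6", "locus7")

ST_TABLE = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 1, 2, 1, 1, 1, 1): 2,
    (3, 4, 2, 1, 5, 1, 1): 3,
}


def sequence_type(alleles):
    """Return the ST for a dict of locus -> allele number, or None if unknown."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return ST_TABLE.get(profile)


isolate = dict(zip(LOCI, (1, 1, 2, 1, 1, 1, 1)))
print(sequence_type(isolate))

novel = dict(zip(LOCI, (9, 9, 9, 9, 9, 9, 9)))
print(sequence_type(novel))   # an allele combination missing from the table
```

A profile that differs at a single locus is a different ST, and a combination that is absent from the table has no ST assigned.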

Page 21

Using WGS data for MLST

Download of the MLST data from pubmlst.org

Page 22

Input types:
• Assembled genome
• 454 – single end reads
• 454 – paired end reads
• Illumina – single end reads
• Illumina – paired end reads
• Ion Torrent
• SOLiD – single end reads
• SOLiD – mate pair reads

Organisms:
Acinetobacter baumannii #1, Acinetobacter baumannii #2, Arcobacter, Borrelia burgdorferi, Bacillus cereus, Brachyspira hyodysenteriae, Bifidobacterium, Brachyspira intermedia, Bordetella, Burkholderia pseudomallei, Brachyspira, Burkholderia cepacia complex, Campylobacter jejuni, Clostridium botulinum, Clostridium difficile #1, Clostridium difficile #2, Campylobacter helveticus, Campylobacter insulaenigrae, Clostridium septicum, C. diphtheriae, Campylobacter fetus, Chlamydiales, Campylobacter lari, Cronobacter, C. upsaliensis, Escherichia coli #1, Escherichia coli #2, Enterococcus faecalis, Enterococcus faecium, F. psychrophilum, Haemophilus influenzae, Haemophilus parasuis, Helicobacter pylori, Klebsiella pneumoniae, Lactobacillus casei, Lactococcus lactis, Leptospira, Listeria, Listeria monocytogenes, Moraxella catarrhalis, Mannheimia haemolytica, Neisseria, P. gingivalis, P. acnes, Pseudomonas aeruginosa, Pasteurella multocida, Pasteurella multocida, Staphylococcus aureus, Streptococcus agalactiae, Salmonella enterica, Staphylococcus epidermidis, S. maltophilia, Streptococcus pneumoniae, Streptococcus oralis, S. zooepidemicus, Streptococcus pyogenes, Streptococcus suis, Streptococcus thermophilus, Streptomyces, Streptococcus uberis, Vibrio parahaemolyticus, Vibrio vulnificus, Wolbachia, Xylella fastidiosa, Y. pseudotuberculosis

Page 23
Page 24

Extended Output

Page 25

Extended Output

aro: WARNING, Identity: 100%, HSP/Length: 349/498, Gaps: 0, aro_122 is the best match for aro

Page 26

PlasmidFinder and pMLST

The PlasmidFinder database contains replicons, not entire plasmids.

Page 27
Page 28

Tools for phenotyping - ResFinder

NGS (Illumina, Ion Torrent, 454, ...) -> Assembly pipeline -> ResFinder (BLAST) -> Resistance gene profile

Resistance gene profile:
• List of genes
• Accession numbers
• Theoretical resistance phenotype

Page 29
Page 30

ResFinder output

Page 31

◇ 200 isolates from 4 different species (Salmonella Typhimurium, Escherichia coli, Enterococcus faecalis and Enterococcus faecium)

◇ ResFinder, 98% ID, 60% length coverage

◇ Phenotypic tests, 3,051 in total
• 482 Resistant
• 2,569 Susceptible

=> 99.74% of the results were in agreement between ResFinder and the phenotypic tests

23 discrepancies -> 16, typically in relation to spectinomycin in E. coli
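
The two thresholds used in this evaluation (98% identity, 60% length coverage) act as a simple filter on BLAST hits. The sketch below is illustrative only: the gene names and hit values are hypothetical, and length coverage is assumed to mean alignment (HSP) length divided by the length of the database gene.

```python
def passes_thresholds(identity, hsp_length, gene_length,
                      min_identity=98.0, min_coverage=0.60):
    """True if a hit meets both the identity and the length-coverage threshold."""
    coverage = hsp_length / gene_length   # assumed definition of length coverage
    return identity >= min_identity and coverage >= min_coverage


# Hypothetical hits: (gene name, % identity, HSP length, gene length).
hits = [
    ("geneA", 99.5, 850, 861),   # passes both thresholds
    ("geneB", 97.0, 800, 800),   # fails on identity
    ("geneC", 100.0, 300, 900),  # fails on length coverage
]

reported = [name for name, ident, hsp, length in hits
            if passes_thresholds(ident, hsp, length)]
print(reported)
```

Lowering either threshold reports more distant or more fragmented genes, so the values chosen affect how well the predicted profile agrees with phenotypic tests.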

Page 32
Page 33

Handling sequence data? Watch out!

Same FASTA file in Word

This should be fine…

Page 34

Handling sequence data? Watch out!

What your data actually looks like!

Oh no! This won't work…

Use “pure” text editors
Examples:
• Notepad (Win)
• TextEdit (Mac)
• Sublime Text (all)

Save files in “txt” format.
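
A quick check before uploading can catch a file that was saved in a word-processor format rather than as plain text. The sketch below is a rough, hypothetical helper: it assumes a nucleotide FASTA file and accepts only A, C, G, T and N in sequence lines, which is stricter than the full FASTA convention.

```python
def fasta_problems(data: bytes):
    """Return a list of problems found in the raw bytes of a FASTA file."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return ["file contains non-ASCII bytes (probably not saved as plain text)"]

    problems = []
    lines = text.splitlines()
    if not lines or not lines[0].startswith(">"):
        problems.append("first line is not a '>' header")

    allowed = set("ACGTNacgtn")   # assumption: plain nucleotide codes only
    for number, line in enumerate(lines, start=1):
        if line.startswith(">") or not line:
            continue              # skip header lines and blank lines
        if not set(line) <= allowed:
            problems.append(f"line {number} has unexpected characters")
    return problems


good = b">seq1\nATGACGTATGACTGAT\n"
binary = b"\xd0\xcf\x11\xe0 not plain text"
print(fasta_problems(good))
print(fasta_problems(binary))
```

To check a file on disk, read it in binary mode, for example `fasta_problems(open(path, "rb").read())`, where `path` is the file you intend to upload.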

Page 35

A word on browsers

• Browsers we like: Chrome, Firefox, (Safari)

• Browsers we don’t like: Explorer, Edge

Page 36

And now…

Page 37

goseqit.com/Results