Center for Genomic Epidemiology
Aim:
• To provide the scientific foundation for future internet-based solutions, where a central database will enable simplification of total genome sequence information and comparison to all other sequenced isolates, including spatial-temporal analysis.
• To develop algorithms for rapid analyses of whole genome DNA sequences, tools for analyses and extraction of information from the sequence data, and internet/web interfaces for using the tools in the global scientific and medical community.
Tools for species identification
Name of Service | Description | URL (cge.cbs.dtu.dk/services/) | Status | Publication
SpeciesFinder | Species identification using 16S rRNA | SpeciesFinder | Online | Published Feb 2014, PMID: 24574292
KmerFinder | Species identification using overlapping 16mers | KmerFinder | Online | Published Jan 2014, PMID: 24172157
TaxonomyFinder | Taxonomy identification using functional protein domains | TaxonomyFinder | - | Published in PMID: 24574292 + Oksana's PhD thesis
Reads2Type | Species identification on client computer | Reads2Type | Online | Published Feb 2014, PMID: 24574292
Training data
• 1,647 completed / almost completed genomes downloaded from NCBI in 2011 (1,009 different species)
Evaluation data
NCBI draft genomes
• 695 isolates from species that overlap with training set (151 species)
SRA draft genomes
• 10,407 sets of short reads from Illumina (168 species)
• 10,407 draft genomes from Illumina data (168 species)
16S rRNA
• 16S rRNA sequencing has dominated molecular taxonomy of prokaryotes for more than 30 years (Fox et al, Int. J. Syst. Bacteriol., 1977)
• Tremendous amounts of 16S rRNA sequence data are available in databases
Concerns:
• Low resolution
• Some genomes contain several copies of the 16S rRNA gene with inter-gene variation
• The 16S rRNA gene represents only about 0.1% of the coding part of a microbial genome
Reference database
• 16S rRNA genes are isolated from genomes in training data using RNAmmer (Lagesen, NAR, 2007).
Method
• Input genomes are BLASTed against 16S rRNA genes in the reference database.
• The best hit is selected based on a combination of coverage, % identity, bitscore, number of mismatches and number of gaps in the alignments.
CGE implementation of 16S species identification
SpeciesFinder
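The best-hit selection described in the Method bullets can be sketched in Python. The slides list the criteria but not how they are combined, so the priority order below (coverage, then identity, then bitscore, then fewest mismatches and gaps) is an assumption, and the `Hit` type and reference names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST alignment against a reference 16S rRNA gene (hypothetical type)."""
    subject: str      # identifier of the reference 16S sequence
    coverage: float   # fraction of the reference gene covered, 0..1
    identity: float   # percent identity of the alignment
    bitscore: float
    mismatches: int
    gaps: int


def rank_key(hit):
    # Assumed priority order. Higher coverage, identity and bitscore are
    # better (negated for min), fewer mismatches and gaps are better.
    return (-hit.coverage, -hit.identity, -hit.bitscore, hit.mismatches, hit.gaps)


def best_hit(hits):
    """Return the top-ranked hit, or None when there are no hits."""
    return min(hits, key=rank_key) if hits else None


hits = [
    Hit("ref_A", 1.00, 99.2, 2700.0, 12, 0),
    Hit("ref_B", 1.00, 99.8, 2750.0, 3, 0),
    Hit("ref_C", 0.85, 100.0, 2300.0, 0, 0),
]
print(best_hit(hits).subject)  # ref_B
```

A weighted score would be an equally plausible reading of "a combination of"; the lexicographic key is used here only because it is easy to inspect.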
KmerFinder
• Genomes in the training data are chopped into overlapping 16mers:
A T G A C G T A T G A T T G A T G A C G T A G T A G T C C
• Immune system inspired downsampling (figure labels: MHC-I, 9mer)
• Only 16mers with a specific prefix are kept
Unknown isolate
Unique 16mers:
ATGAATGTGTGAGTGA
ATGACTGTGCCCCTGA
ATGAAAAAAAAAAAA
16mer database
ATGAATGTGTGAGTGA: CP001921 (Acinetobacter baumannii), CP000521 (Acinetobacter baumannii), CP002522 (Acinetobacter baumannii)
ATGACTGTGCCCCTGA: CP001921 (Acinetobacter baumannii), CP002301 (Buchnera aphidicola)
Species | Match | No. of Kmer hits
Acinetobacter baumannii | CP001921 | 2
Acinetobacter baumannii | CP000521 | 1
Acinetobacter baumannii | CP002521 | 1
Buchnera aphidicola | CP002301 | 1
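A minimal Python sketch of the 16mer approach outlined above: reference genomes are chopped into overlapping 16mers, only those starting with a fixed prefix are kept, and an unknown isolate is scored by counting shared 16mers per reference. The prefix `ATGA` is inferred from the example 16mers on the slide, and the sequences and the names `ref1` and `ref2` are made up for illustration.

```python
from collections import Counter, defaultdict

K = 16
PREFIX = "ATGA"  # assumption: inferred from the slide's example 16mers


def kmers(seq, k=K, prefix=PREFIX):
    """Return the set of overlapping k-mers in seq that start with prefix."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if seq.startswith(prefix, i)
    }


def build_database(genomes):
    """Map each retained k-mer to the set of references that contain it."""
    db = defaultdict(set)
    for accession, seq in genomes.items():
        for kmer in kmers(seq):
            db[kmer].add(accession)
    return db


def count_hits(query_seq, db):
    """Count, per reference, how many unique query k-mers it shares."""
    hits = Counter()
    for kmer in kmers(query_seq):
        for accession in db.get(kmer, ()):
            hits[accession] += 1
    return hits


# Toy references (hypothetical names and sequences).
genomes = {
    "ref1": "CC" + "ATGAATGTGTGAGTGA" + "GG" + "ATGACTGTGCCCCTGA",
    "ref2": "ATGACTGTGCCCCTGA" + "TT",
}
db = build_database(genomes)
query = "ATGAATGTGTGAGTGA" + "ATGACTGTGCCCCTGA"
print(count_hits(query, db).most_common())  # [('ref1', 2), ('ref2', 1)]
```

Keeping only prefixed 16mers shrinks the database roughly by a factor of 4 to the power of the prefix length, at the cost of sampling fewer positions per genome.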
KmerFinder is very robust – it only needs one 16mer!
Desulfovibrio piger GOR1 SRR097356
>NODE 4 length 92 cov 23.119566
TAGGACGTGGAATATGGCAAGAAAACTGAAAATCATGGAAAATGAGAAACATCCACTTGA
CGACTTGAAAAATGACGAAATCACTAAAAAACGTGAAAAATGAGAAATGC
>NODE 15 length 82 cov 2.792683
AGCGAAAAATGTCATAACAACGATCACGACCGATAACCATCTTTGGTCCAAACTTACTCA
CGCAGCAGGCGTATAACTCGCGCATACCAGCTTTGGGCAT
N50 = 110
Total no. of bp: 210
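The two contigs shown contain 110 and 100 bases, which gives the stated total of 210 bp. The N50 value can be checked with a short Python sketch using the common definition (the length of the contig at which the running sum of lengths, longest first, reaches half the assembly); the function name is illustrative.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum reaches half the total."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


print(n50([110, 100]))  # 110
```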
Prediction
Species | Match | No. of Kmer hits
Flavobacterium psychrophilum | AM398681 | 1
TaxonomyFinder
• Input set of prokaryotic genomes
• Gene prediction
– Prodigal gene prediction
– User submitted genes
• Whole genome proteome scan against 3 HMM-based databases
– PfamA
– TIGRFAM
– Superfamily
• CD-HIT clustering of all CDSs with no hits to any HMM-database
• Gene grouping based on functional domain profiles
– Example: protein MTGENLPPELPATAQAWRASVLYGQHLQLIRHLCVTCPRWSQSTSR with domains A, B and C gives Profile: A-B-C
• Whole genome functional profile formation (for each genome)
• Specific-profile finding
– Phyla-specific domains
– Species-specific domains
• Taxonomy level-specific gene database creation
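The profile steps above can be sketched in Python: each gene's domain hits are ordered along the protein and joined into a profile such as A-B-C, and a profile is treated as specific to a taxon when every genome of that taxon has it and no genome outside the taxon does. That definition of "specific" is an assumption, as are the function names and the toy data.

```python
from collections import defaultdict


def domain_profile(domain_hits):
    """domain_hits: list of (start_position, domain_name) -> e.g. 'A-B-C'."""
    return "-".join(name for _, name in sorted(domain_hits))


def genome_profiles(genes):
    """genes: dict gene_id -> domain hits. Returns the genome's set of profiles."""
    return {domain_profile(hits) for hits in genes.values() if hits}


def taxon_specific_profiles(profiles_by_genome, taxon_of):
    """Profiles shared by all genomes of a taxon and absent from all others."""
    genomes_by_taxon = defaultdict(list)
    for genome, taxon in taxon_of.items():
        genomes_by_taxon[taxon].append(genome)
    specific = {}
    for taxon, members in genomes_by_taxon.items():
        shared = set.intersection(*(profiles_by_genome[g] for g in members))
        outside = set().union(
            *(profiles_by_genome[g] for g in profiles_by_genome if taxon_of[g] != taxon)
        )
        specific[taxon] = shared - outside
    return specific


# Toy input (hypothetical genomes, genes and domains).
profiles_by_genome = {
    "g1": genome_profiles({"gene1": [(5, "A"), (60, "B"), (120, "C")], "gene2": [(10, "D")]}),
    "g2": genome_profiles({"gene1": [(60, "B"), (5, "A"), (120, "C")], "gene2": [(3, "E")]}),
    "g3": genome_profiles({"gene1": [(8, "D")], "gene2": [(1, "F"), (50, "G")]}),
}
taxon_of = {"g1": "species1", "g2": "species1", "g3": "species2"}
print(taxon_specific_profiles(profiles_by_genome, taxon_of))
# {'species1': {'A-B-C'}, 'species2': {'F-G'}}
```

The same function works at phylum level by passing a genome-to-phylum mapping instead of a genome-to-species mapping.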
Reads2Type
• Reads2Type pushes the analysis to the user; the server provides the 50-mer database
• SuffixTree: efficient data structure for string matching
• Narrow Down Approach:
– Reads2Type compares 50-mers of combined marker genes against raw reads
– Shared Probes vs Unique Probe
• Definition: Quick & dirty taxonomy identification of single isolates
• 50-mer marker gene DB
– 16S rRNA: Training data genomes RNAmmer (other)
– ITS: Training data (Mycobacterium)
– GyrB: Training data (Enterobacteriaceae)
– Resulting database ~5 MB
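One way to picture the narrow-down approach is the Python sketch below: probes shared by several species shrink the candidate set, and a probe unique to one species settles the call. It is not the Reads2Type implementation. Plain substring search stands in for the suffix tree, the probes are 10 bases long instead of 50 to keep the example readable, and all names and sequences are hypothetical.

```python
def narrow_down(reads, probes):
    """probes: dict probe_sequence -> set of species carrying that probe.

    Returns the set of species still compatible with the reads.
    """
    candidates = None
    # Try widely shared probes first, then more specific ones.
    for probe, species in sorted(probes.items(), key=lambda kv: -len(kv[1])):
        if candidates is not None and not (species & candidates):
            continue  # this probe cannot refine the current candidates
        if any(probe in read for read in reads):
            candidates = set(species) if candidates is None else candidates & species
            if len(candidates) == 1:
                break  # a single species remains
    return candidates or set()


probes = {
    "ACGTACGTAC": {"species1", "species2"},  # shared probe
    "TTTTGGGGCC": {"species1"},              # unique probe
    "CCCCAAAATT": {"species2"},              # unique probe
}
reads = ["GGACGTACGTACGG", "AATTTTGGGGCCAA"]
print(narrow_down(reads, probes))  # {'species1'}
```

Because only the small probe database has to be downloaded, this kind of matching can run on the client while the reads stay local.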
rMLST
CGE implementation
• For each genome in the training data, the 53 ribosomal genes were extracted.
• Genomes in the evaluation sets were aligned using BLAT to each gene collection (only hits with at least 95% identity and 95% coverage were considered as a potential match).
• The closest match among the training genomes was selected based on a combination of coverage, % identity, bitscore, number of mismatches and number of gaps in the alignments across all genes.
Jolley KA, Bliss CM, Bennett JS, Bratcher HB, Brehony C, Colles FM, Wimalarathna H, Harrison OB, Sheppard SK, Cody AJ, Maiden MC. Ribosomal multilocus sequence typing: universal characterization of bacteria from domain to strain. Microbiology. 2012 Apr;158(Pt 4):1005-15.
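The filtering and selection steps of the CGE rMLST implementation can be sketched in Python. The 95% thresholds come from the slide; how the criteria are combined across genes is not stated, so ranking training genomes by the number of genes matched and then by summed bitscore is an assumption, and the gene and genome names are placeholders.

```python
from collections import defaultdict

MIN_IDENTITY = 95.0  # percent, from the slide
MIN_COVERAGE = 95.0  # percent, from the slide


def closest_genome(hits):
    """hits: iterable of (gene, training_genome, identity, coverage, bitscore)."""
    best = {}
    for gene, genome, identity, coverage, bitscore in hits:
        if identity < MIN_IDENTITY or coverage < MIN_COVERAGE:
            continue  # not a potential match
        key = (genome, gene)
        best[key] = max(best.get(key, 0.0), bitscore)  # best hit per gene

    # Per training genome: [number of genes matched, summed bitscore].
    summary = defaultdict(lambda: [0, 0.0])
    for (genome, _gene), bitscore in best.items():
        summary[genome][0] += 1
        summary[genome][1] += bitscore
    if not summary:
        return None
    return max(summary, key=lambda g: tuple(summary[g]))


hits = [
    ("gene1", "genomeA", 99.0, 100.0, 900.0),
    ("gene2", "genomeA", 98.0, 97.0, 500.0),
    ("gene1", "genomeB", 96.0, 99.0, 850.0),
    ("gene2", "genomeB", 94.0, 100.0, 480.0),  # fails the identity threshold
]
print(closest_genome(hits))  # genomeA
```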
Isolates in the NCBIdrafts set for which all four methods predict the species to be different from the annotated one. * NZAEPO00000000 has been re-annotated as S. oralis since we downloaded the data.
[Figure panels listing species names (for example Acinetobacter baumannii and Bacillus anthracis); the labels and axis ticks are otherwise unreadable in this copy.]
Speed
Method Estimated speed (mm:ss)
16S 00:13*
KmerFinder 00:09*
TaxonomyFinder 11:33*
rMLST 00:45*
Reads2Type 00:55**
*Estimation based on draft genomes
**Estimation based on short reads
Summary of taxonomy benchmark study
• KmerFinder had the highest accuracy and was the fastest method.
• SpeciesFinder (16S rRNA-based) had the lowest accuracy.
• Methods that only sample genomic loci (16S, Reads2Type, rMLST) had difficulties distinguishing species that only recently diverged, especially when the main difference is a plasmid.
Tools for further typing
Name of Service | Description | URL (https://cge.cbs.dtu.dk/services/) | Publication
MLST | Multilocus sequence typing | MLST | Published Apr 2012, PMID: 22238442
PlasmidFinder | Identification of plasmids in Enterobacteriaceae | PlasmidFinder | Published Apr 2014, PMID: 24777092
pMLST | pMLST of plasmids in Enterobacteriaceae | pMLST | Published Apr 2014, PMID: 24777092
Multilocus Sequence Typing (MLST)
First developed in 1998 for Neisseria meningitidis (Maiden et al. PNAS 1998. 95:3140-3145)
The nucleotide sequences of internal regions of approximately 7 housekeeping genes are determined by PCR followed by Sanger sequencing
Each distinct allele is assigned an arbitrary number
The unique combination of alleles is the sequence type (ST)
www.cbs.dtu.dk/services/MLST
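The allele and sequence type logic described above amounts to two lookups, sketched here in Python. The locus names, allele sequences and profile table are invented for illustration and do not come from any real MLST scheme.

```python
LOCI = ["locus1", "locus2", "locus3", "locus4", "locus5", "locus6", "locus7"]


def call_allele(sequence, known_alleles):
    """Return the allele number for an exact sequence match, else None (novel)."""
    return known_alleles.get(sequence)


def sequence_type(allele_calls, profiles, loci=LOCI):
    """Look up the ST for the combination of allele numbers, else None."""
    key = tuple(allele_calls.get(locus) for locus in loci)
    return profiles.get(key)


# Hypothetical scheme data.
known_alleles_locus2 = {"ACGTTGCA": 1, "ACGTTGGA": 2}
profiles = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 2, 1, 1, 3, 1, 1): 2,
}

calls = {locus: 1 for locus in LOCI}
calls["locus2"] = call_allele("ACGTTGGA", known_alleles_locus2)
calls["locus5"] = 3
print(sequence_type(calls, profiles))  # 2
```

An isolate with a novel allele, or with a known set of alleles in a combination not yet in the table, returns `None` and would need a new allele number or ST to be assigned by the scheme curators.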
Input data types: Assembled genome; 454, single end reads; 454, paired end reads; Illumina, single end reads; Illumina, paired end reads; Ion Torrent; SOLiD, single end reads; SOLiD, mate pair reads
MLST schemes: Acinetobacter baumannii #1, Acinetobacter baumannii #2, Arcobacter, Borrelia burgdorferi, Bacillus cereus, Brachyspira hyodysenteriae, Bifidobacterium, Brachyspira intermedia, Bordetella, Burkholderia pseudomallei, Brachyspira, Burkholderia cepacia complex, Campylobacter jejuni, Clostridium botulinum, Clostridium difficile #1, Clostridium difficile #2, Campylobacter helveticus, Campylobacter insulaenigrae, Clostridium septicum, C. diphtheriae, Campylobacter fetus, Chlamydiales,
Campylobacter lari, Cronobacter, C. upsaliensis, Escherichia coli #1, Escherichia coli #2, Enterococcus faecalis, Enterococcus faecium, F. psychrophilum, Haemophilus influenzae, Haemophilus parasuis, Helicobacter pylori, Klebsiella pneumoniae, Lactobacillus casei, Lactococcus lactis, Leptospira, Listeria, Listeria monocytogenes, Moraxella catarrhalis, Mannheimia haemolytica, Neisseria, P. gingivalis, P. acnes,
Pseudomonas aeruginosa, Pasteurella multocida, Pasteurella multocida, Staphylococcus aureus, Streptococcus agalactiae, Salmonella enterica, Staphylococcus epidermidis, S. maltophilia, Streptococcus pneumoniae, Streptococcus oralis, S. zooepidemicus, Streptococcus pyogenes, Streptococcus suis, Streptococcus thermophilus, Streptomyces, Streptococcus uberis, Vibrio parahaemolyticus, Vibrio vulnificus, Wolbachia, Xylella fastidiosa, Y. pseudotuberculosis
Extended Output
aro: WARNING, Identity: 100%, HSP/Length: 349/498, Gaps: 0, aro_122 is the best match for aro
What is the MLST web-service used for?
MLST schemes usage
A. baumannii #1 4%
A. baumannii #2 6%
A. chronobacter 2%
Campylobacter 6%
E. coli #1 21%
E. coli #2 4%
E. faecalis 2%
E. faecium 3%
K. pneumoniae 8%
Leptospira #1 2%
Leptospira #2 5%
L. monocytogenes 3%
P. aeruginosa 2%
S. agalactiae 2%
S. aureus 7%
S. enterica 6%
S. pneumoniae 7%
Other 9%
Tools for phenotyping
Name of Service | Description | URL (https://cge.cbs.dtu.dk/services/) | Publication
ResFinder | Identification of acquired antibiotic resistance genes | ResFinder | Published Nov 2012, PMID: 22782487
VirulenceFinder | Identification of virulence genes in E. coli (and S. aureus and Enterococcus) | VirulenceFinder | E. coli published Feb 2014, PMID: 24574290
MyDbFinder | Identification of genes from the user's own database | MyDbFinder | Will be published in book chapter
PathogenFinder | Prediction of pathogenic potential | PathogenFinder | Published Oct 2013, PMID: 24204795
ResFinder
• Input: NGS reads (Illumina, Ion Torrent, 454, ...), Sanger sequences, FASTA files
• Assembly pipeline
• ResFinder (BLAST)
• Output: resistance gene profile (list of genes, accession numbers) and theoretical resistance phenotype
200 isolates from 4 different species (Salmonella Typhimurium, Escherichia coli, Enterococcus faecalis and Enterococcus faecium)
ResFinder, 98% identity, 60% length coverage
Phenotypic tests, 3,051 in total
• 482 Resistant
• 2,569 Susceptible
=> 99.74% of the results were in agreement between ResFinder and the phenotypic tests
23 discrepancies -> 16, typically in relation to spectinomycin in E. coli
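A small Python sketch of the kind of comparison summarized above: BLAST hits are kept when they pass the two thresholds from the slide, and genotype-based predictions are compared with phenotypic test results as a percentage of agreeing tests. The gene names, isolates and results are toy values, not study data, and the function names are illustrative.

```python
MIN_IDENTITY = 98.0  # percent identity, from the slide
MIN_COVERAGE = 60.0  # percent of the gene length covered, from the slide


def resistance_genes(hits):
    """hits: iterable of (gene, identity, coverage). Returns genes passing both thresholds."""
    return sorted({
        gene for gene, identity, coverage in hits
        if identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE
    })


def agreement(predicted, observed):
    """Percent of tests where prediction and phenotype agree.

    Both arguments map (isolate, antimicrobial) -> True if resistant.
    """
    keys = predicted.keys() & observed.keys()
    if not keys:
        raise ValueError("no tests in common")
    same = sum(predicted[k] == observed[k] for k in keys)
    return 100.0 * same / len(keys)


hits = [("geneA", 99.5, 100.0), ("geneB", 97.0, 100.0), ("geneC", 100.0, 55.0)]
print(resistance_genes(hits))  # ['geneA']

predicted = {("iso1", "drugX"): True, ("iso1", "drugY"): False,
             ("iso2", "drugX"): False, ("iso2", "drugY"): False}
observed = {("iso1", "drugX"): True, ("iso1", "drugY"): False,
            ("iso2", "drugX"): False, ("iso2", "drugY"): True}
print(agreement(predicted, observed))  # 75.0
```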
Unpublished or uncategorized
Name of Service | Description | URL (https://cge.cbs.dtu.dk/services/) | Status | Publication
PanFunPro | Groups homologous proteins based on functional domain content | PanFunPro | Online | Published in F1000Research 2013, 2:265
SerotypeFinder | Identification of serotypes | SerotypeFinder-1.0 | Online | Not yet published
Restriction-ModificationFinder | Identification of RM system genes | Restriction-ModificationFinder | Online | Will only be published in book chapter
HostPhinder | Prediction of the host of a bacteriophage | HostPhinder | Online, but under development | Not yet published
MetaVirFinder | Identification of virus in metagenomic data | MetaVirFinder | Online, but under development | Not yet published
MGmapper | Identifies the content of metagenomic samples | MGmapper | Online, but under development | Not yet published
Tools for phylogeny
Name of Service | Description | URL (cge.cbs.dtu.dk/services) | Status | Publication
SnpTree | Creation of phylogenetic trees based on SNPs | snpTree | Online | Published Dec 2012, PMID: 23281601
CSIPhylogeny | Creation of phylogenetic trees based on SNPs | CSIPhylogeny | Online | Planned
NDtree | Creation of phylogenetic trees | NDtree | Online | Published in Feb 2014, PMID: 24505344
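Web-service usage
[Chart of usage per web service. Services shown: SerotypeFinder, MGmapper, VirulenceFinder, Restriction, NDtree, SpeciesFinder, KmerFinder, HostPhinder, PathogenBuster, Assembler, pMLST, PlasmidFinder, snpTree, CGEPrimerFinder, ResFinder, PathogenFinder, MLST, MetaVirFinder. Value labels: 0.1, 0.6, 5.4, 0.3, 2.3, 3.7, 0.2, 12.1, 10.4, 4.8, 34.1, 2.7, 31.6; their pairing with services is not recoverable in this copy.]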
Type of data uploaded to MLST web-service
454, single reads
454, paired-end
Ion torrent
Illumina, single reads
Illumina, paired-end reads
Assembled draft genomes