Rapid generation of E. coli O104:H4 PCR diagnostics

Outbreak of E. coli O104:H4 heralds a new paradigm in responding to disease threats

Nicola J. Holden, Leighton Pritchard

Upload: leighton-pritchard

Post on 10-May-2015

DESCRIPTION

Presentation delivered on 29th October 2012 at the CoZee workshop in Dundee (see the CoZee zoonosis network site for more information: http://www.cozee-zoonosis.net/). [For clarity: our diagnostics work did not at the time form part of the excellent E. coli O104:H4 genome analysis crowd-sourcing consortium work, which can be found at https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki. We talked about it here because it was good work, and without their efforts we couldn't have done what we did.]

TRANSCRIPT

Page 1: Rapid generation of E.coli O104:H4 PCR diagnostics

Outbreak of E. coli O104:H4 heralds a new paradigm in responding to disease threats

Nicola J. Holden, Leighton Pritchard

Page 3: Rapid generation of E.coli O104:H4 PCR diagnostics

EHEC O104:H4 outbreak, Europe 2011

Unprecedented:

scale of outbreak (3950 affected, 53 deaths; multiple import restrictions)

emerging pathogen (one previous case in S. Korea)

rapid production of sequence data

crowd-sourcing of assembly and annotation via collaborative revision control site: GitHub https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki

Page 4: Rapid generation of E.coli O104:H4 PCR diagnostics

EHEC O104:H4 outbreak – timeline

1st May: onset of outbreak

26th May: strain characteristics (Scheutz et al., 2012 Eurosurveill)

30th May: diagnostic laboratory information released (Muenster)

2nd June: first draft assembly available (GitHub)

9th to 21st June: additional sequences announced

22nd June: microbiological characteristics published (Bielaszewska et al., 2011 LID)

26th July: official end of the outbreak (RKI)

refs: https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/wiki; RKI; Institute of Hygiene, Muenster

Page 7: Rapid generation of E.coli O104:H4 PCR diagnostics

EHEC O104:H4 outbreak – timeline

27th July: publication of open-source genomic analysis

Page 8: Rapid generation of E.coli O104:H4 PCR diagnostics

A changing paradigm?

Kwan et al. (2011) http://precedings.nature.com/documents/6663/version/1

Page 10: Rapid generation of E.coli O104:H4 PCR diagnostics

Meanwhile: diagnostics

27th June – 6th July

1. Outbreak isolate-specific, sub-serotype diagnostics

2. Exploit rapid sequencing: work directly from incomplete and unordered draft genome sequences

3. Rapidly generated (perhaps ahead of the biology?)

4. Validated (good estimates of error rates)

5. Easy to use and distribute

6. Cheap(er than sequencing everything)

Alignment-free PCR primer design: no need to identify conserved signature sequences prior to primer design

Page 11: Rapid generation of E.coli O104:H4 PCR diagnostics

Alignment-free primer design: strategy

‘Positive’ genome set: 11 genome assemblies of 9 EHEC O104:H4 outbreak isolates (GitHub crowdsourcing)

‘Negative’ genome set: 31 genomes of E. coli and E. fergusonii (GenBank)

Design many (>1000) primers to positive genome set: target CDS; optimise for qRT; 20-mers; 100 bp amplicons; TA = 58 °C

Filter primers in silico:

Exclude sets with predicted productive amplification in negative genomes.

Screen primers to exclude sets with strong sequence similarity to any of a larger set of off-target genomes (GenBank Enterobacteriaceae).

Page 13: Rapid generation of E.coli O104:H4 PCR diagnostics

Automation

https://github.com/widdowquinn/find_differential_primers

Page 14: Rapid generation of E.coli O104:H4 PCR diagnostics

Alignment-free primer design

[Figure: pipeline schematic showing positive and negative genome sets, with stages labelled I to V]

1. Process configuration files: locations and classes of input sequence files.

2. Convert to single (pseudo)chromosomes: concatenate draft genome sequence.

3. Genome feature locations: from GBK file or predicted with Prodigal.
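Step 2 can be illustrated with a minimal sketch. It assumes the draft assembly is a multi-record FASTA file and that contigs are joined in file order with a run of Ns between them; the spacer, the function names and the toy sequences are hypothetical and are not taken from the find_differential_primers code.

```python
from typing import Dict, List, Tuple

# Hypothetical spacer: a run of Ns so that no primer spans a contig join.
# The spacer used by the real pipeline is not stated in the slides.
SPACER = "N" * 50


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record_id: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID is the first whitespace-delimited token of the header.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def concatenate_contigs(
    contigs: Dict[str, str], spacer: str = SPACER
) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Join contigs in input order into one sequence.

    Also returns each contig's (start, end) offsets in the joined sequence,
    so that features or primers can be mapped back to the original contig.
    """
    pieces: List[str] = []
    offsets: Dict[str, Tuple[int, int]] = {}
    position = 0
    for name, seq in contigs.items():
        if pieces:
            pieces.append(spacer)
            position += len(spacer)
        offsets[name] = (position, position + len(seq))
        pieces.append(seq)
        position += len(seq)
    return "".join(pieces), offsets


# Toy draft assembly with two contigs (illustrative sequences only).
draft = ">contig_1\nACGTACGT\n>contig_2 len=4\nGGCC\n"
pseudo, offsets = concatenate_contigs(parse_fasta(draft), spacer="NNNNN")
print(pseudo)
print(offsets)
```

Keeping the offsets is a design choice of this sketch: it lets any primer location reported on the pseudochromosome be traced back to a contig in the unordered draft.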

Page 15: Rapid generation of E.coli O104:H4 PCR diagnostics

Primer prediction (on positive set)

[Figure: pipeline schematic showing positive and negative genome sets, with stages labelled I to V]

4. Predict primer locations: >1000 thermodynamically plausible primer sets on each (pseudo)chromosome, using Primer3.
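To give a rough feel for what "thermodynamically plausible" might mean in step 4, the sketch below applies a crude pre-filter to candidate 20-mers. It is not Primer3's model: it swaps in the rule-of-thumb Wallace formula, Tm = 2(A+T) + 4(G+C), and the thresholds and function names are hypothetical. It also ignores hairpins and primer dimers, which a real design tool would check.

```python
from typing import Tuple


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def is_plausible(
    primer: str,
    length: int = 20,                           # 20-mers, as on the strategy slide
    tm_range: Tuple[int, int] = (56, 64),       # hypothetical window
    gc_range: Tuple[float, float] = (0.4, 0.6), # hypothetical window
) -> bool:
    """Crude plausibility check: length, unambiguous bases, Tm and GC windows."""
    if len(primer) != length or set(primer.upper()) - set("ACGT"):
        return False
    tm = wallace_tm(primer)
    gc = gc_fraction(primer)
    return tm_range[0] <= tm <= tm_range[1] and gc_range[0] <= gc <= gc_range[1]


candidates = [
    "ACGTACGTACGTACGTACGT",  # balanced composition
    "ATATATATATATATATATAT",  # no G or C
    "GCGCGCGCGCGCGCGCGCGC",  # all G and C
    "ACGTNCGTACGTACGTACGT",  # contains an ambiguous base
]
kept = [p for p in candidates if is_plausible(p)]
print(kept)
```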

Page 16: Rapid generation of E.coli O104:H4 PCR diagnostics

Test cross-amplification in silico

[Figure: pipeline schematic showing positive and negative genome sets, with stages labelled I to V]

5. Check cross-amplification: all primer sets tested against other organisms, using PrimerSearch.

6. BLAST screen: all primers screened for off-target sequences with BLAST: 7 possible primer sets
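The idea behind the cross-amplification check in step 5 can be sketched in a few lines. This is a simplified stand-in for PrimerSearch, not a reimplementation: it requires exact primer matches, whereas a real tool tolerates mismatches. The function names, the maximum product length and the toy sequences are all hypothetical.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_all(seq: str, sub: str) -> List[int]:
    """Start positions of every (possibly overlapping) occurrence of sub."""
    positions, start = [], seq.find(sub)
    while start != -1:
        positions.append(start)
        start = seq.find(sub, start + 1)
    return positions


def predicted_amplicons(
    genome: str, forward: str, reverse: str, max_len: int = 500
) -> List[Tuple[int, int]]:
    """(start, end) of products predicted by exact matching.

    Both orientations are checked: the primers as given, and swapped,
    which corresponds to a product on the opposite strand.
    """
    genome = genome.upper()
    products = []
    for fwd, rev in ((forward.upper(), reverse.upper()),
                     (reverse.upper(), forward.upper())):
        rev_site = reverse_complement(rev)
        for start in find_all(genome, fwd):
            for rev_start in find_all(genome, rev_site):
                end = rev_start + len(rev_site)
                # Reverse site must lie downstream, within max_len of the start.
                if rev_start >= start + len(fwd) and end - start <= max_len:
                    products.append((start, end))
    return products


def passes_negative_screen(
    primer_set: Tuple[str, str], negative_genomes: Dict[str, str], max_len: int = 500
) -> bool:
    """True if no product is predicted in any negative genome."""
    fwd, rev = primer_set
    return not any(
        predicted_amplicons(g, fwd, rev, max_len) for g in negative_genomes.values()
    )


# Toy sequences (illustrative only).
fwd, rev = "ACGGTCAT", "TTGACCGA"
positive = "CC" + fwd + "GGATCCGGAT" + reverse_complement(rev) + "CC"
negative = "CC" + fwd + "GGATCCGGAT" + "GGATCCGGAT" + "CC"  # lacks the reverse site

print(predicted_amplicons(positive, fwd, rev))
print(passes_negative_screen((fwd, rev), {"neg": negative}))
```

Exact matching makes the screen too permissive, since a primer with one mismatch may still amplify in practice. That is one reason a mismatch-tolerant tool and a separate BLAST screen are used in the pipeline described here.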

Page 17: Rapid generation of E.coli O104:H4 PCR diagnostics

Classify primers and validation

[Figure: pipeline schematic with stages labelled I to V, and a panel with columns labelled III, IV, V, +ve and -ve]

7. Classify primers: primer sets classified according to their ability to amplify specific classes of input sequence.

8. Validate primers: primer sets validated on positive and negative targets in vitro.

5 target sequences: prophage gp20 (2); hypothetical CDS (2); impB (1)
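The classification in step 7 can be read as set logic. In this sketch a primer set is assigned to a class when it is predicted to amplify every genome in that class and no genome outside it. This is an assumed reading of the slide, and the genome names, class labels and primer set names are hypothetical.

```python
from typing import Dict, List, Set


def classify_primer_sets(
    amplifies: Dict[str, Set[str]], genome_class: Dict[str, str]
) -> Dict[str, List[str]]:
    """Map each class label to the primer sets specific to that class.

    amplifies:    primer set name -> genomes with a predicted product
    genome_class: genome name -> class label
    """
    # Group genomes by class label.
    members: Dict[str, Set[str]] = {}
    for genome, label in genome_class.items():
        members.setdefault(label, set()).add(genome)

    result: Dict[str, List[str]] = {label: [] for label in members}
    for primer_set, hits in amplifies.items():
        for label, genomes in members.items():
            # Specific to a class: hits every member and nothing else.
            if hits == genomes:
                result[label].append(primer_set)
    return result


# Hypothetical inputs for illustration.
genome_class = {
    "outbreak_1": "positive",
    "outbreak_2": "positive",
    "k12": "negative",
    "fergusonii": "negative",
}
amplifies = {
    "set_A": {"outbreak_1", "outbreak_2"},         # specific to the positive class
    "set_B": {"outbreak_1", "outbreak_2", "k12"},  # cross-amplifies a negative
    "set_C": {"outbreak_1"},                       # misses one positive
}
classified = classify_primer_sets(amplifies, genome_class)
print(classified)
```

Because classes are just labels in the input configuration, the same logic extends to more than two classes.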

Page 18: Rapid generation of E.coli O104:H4 PCR diagnostics

Validation

In silico, diagnostic primers are just another classifier

Validation on unseen data is critical

(avoid overfitting, estimation of performance)

Direct experimental validation of primer candidates (Münster):

‘Positive’ set = 21 clinical outbreak isolates

‘Negative’ set = 32 HUSEC / EPEC isolates

Positive control = LB 226692

Page 19: Rapid generation of E.coli O104:H4 PCR diagnostics

Primer design: validated in vitro

[Figure: in vitro validation results, with panels labelled positive and negative]

Page 20: Rapid generation of E.coli O104:H4 PCR diagnostics

Alignment-free primer design: summary

Individual primer sets: 100 % sensitivity; 82–94 % specificity; 9% < FDR < 22%

Combining primers: 100 % sensitivity and specificity

A minimal combination of two primer sets discriminated absolutely between outbreak O104:H4 isolates and non-outbreak E. coli isolates, including HUSEC 041

Flexibility in strategy allows for targeted design, e.g. multiplex PCR, different organisms, large gene families, etc.

Same approach used for:

resolving Dickeya plant pathogens

discriminating between RxLR effectors in Phytophthora infestans
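The performance figures in this summary are standard classifier metrics: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) and FDR = FP/(TP+FP). The sketch below shows how they are computed, and how requiring two primer sets to agree can remove false positives without losing sensitivity. The isolate names and calls are hypothetical, not the validation data, and the "both must amplify" rule is an assumption, since the slides do not state how the primer sets were combined.

```python
from typing import Dict, Set


def performance(
    called_positive: Set[str], true_positive: Set[str], true_negative: Set[str]
) -> Dict[str, float]:
    """Sensitivity, specificity and false discovery rate for one classifier."""
    tp = len(called_positive & true_positive)
    fp = len(called_positive & true_negative)
    fn = len(true_positive - called_positive)
    tn = len(true_negative - called_positive)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fdr": fp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical validation panel: 4 outbreak and 5 non-outbreak isolates.
outbreak = {"p1", "p2", "p3", "p4"}
non_outbreak = {"n1", "n2", "n3", "n4", "n5"}

# Hypothetical calls: each primer set has one (different) false positive.
calls_set_1 = outbreak | {"n1"}
calls_set_2 = outbreak | {"n2"}

# Assumed combination rule: positive only if both primer sets amplify.
calls_combined = calls_set_1 & calls_set_2

single = performance(calls_set_1, outbreak, non_outbreak)
combined = performance(calls_combined, outbreak, non_outbreak)
print(single)
print(combined)
```

The combination only helps when the primer sets make different false-positive calls. If both cross-amplify the same non-outbreak isolate, an AND rule cannot remove that error.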

Page 21: Rapid generation of E.coli O104:H4 PCR diagnostics

Alignment-free primer design: summary

Bypass the need for:

multiple genomic alignments

biological justification for primer choice (maybe even reveal biology…)

Produce diagnostic primers for any subgroup of organisms (possibly…)

Limitations

Scaling issue: PrimerSearch is slow (modular pipeline allows use of alternative programs)

Low specificity of primers -> use qPCR

Very similar organisms may not be distinguished

Time from genomes to primer sets: 90 hours

Possibility for improvements as collaborative bioinformatics projects (speed up off-target primer mapping, make into a user-friendly tool…)

Page 22: Rapid generation of E.coli O104:H4 PCR diagnostics

Acknowledgements


Thanks to Nadine Brandt, Kath Wright and Sean Chapman

Page 23: Rapid generation of E.coli O104:H4 PCR diagnostics

Sprouted seeds as a source of infections

Page 24: Rapid generation of E.coli O104:H4 PCR diagnostics

‘Sproutbreak’ - Jimmy John's restaurant

Page 25: Rapid generation of E.coli O104:H4 PCR diagnostics

Colonisation of spinach by VTEC O157:H7 Sakai (vt-)