pipeline or pipe dream - midlands micro meeting uk - mon 15 sep 2014

Post on 06-Aug-2015


Pipeline or pipe dream?

Transitioning a public health microbiology laboratory network to WGS & bioinformatics

Dr Torsten Seemann

1st Midlands Molecular Microbiology Meeting - Mon 15 Sep 2014 - Birmingham, UK

Introduction

Only 4 hours until drinks.

(Very) South East Midlands

About me

● Previous life
  o B.Sc - computer science, data compression
  o B.E - elec & comp sys engineering (abandoned)
  o Ph.D - digital image processing
● Bioinformatician: microbial genomics
  o primarily bacterial pathogens
  o genomics data analysis
  o tool development: Prokka, Nesoni, VelvetOptimiser

Nomadic bioinformatics

Microbial Diagnostic Unit

● Oldest public health lab in Australia
  o established 1897 in Melbourne
  o large historical isolate collection back to 1950s
● National reference laboratory
  o Salmonella, Listeria, EHEC
● WHO regional reference lab
  o vaccine preventable invasive bacterial pathogens

New director

● Professor Ben Howden
  o clinician, microbiologist, pathologist
  o early adopter of genomics and bioinformatics
  o long term collaborator on MRSA and VRE
● Mandate
  o modernise service delivery
  o enhance research output and collaboration
  o nationally lead the conversion to WGS

Transitioning

“If you want to make enemies, try to change something.”

Existing workflow

Traditional typing

● PFGE - Pulsed Field Gel Electrophoresis
● MLST - Multi-Locus Sequence Typing
● MLVA - Multi-Locus VNTR Analysis

Drawbacks

● Low resolution
  o only gives rough idea of relationship
● Labour intensive
  o lots of tedious lab work
● Relatively expensive
  o in time and consumables

A single assay

● Whole Genome Sequencing (WGS)
  o backward compatible with most existing typing
● Complete snapshot
  o all variation: SNVs, insertions, structural changes
  o plasmids, phage, resistance & virulence genes
● High throughput
  o now cheaper (<£50) and faster (<24h)

New workflow

Refocussing

● Small scale → big scale
  o orders of magnitude increase in samples processed
● Manual → automated
  o robots for colony picking & library preparation
● Benchtop → desktop
  o bioinformatics, visualization, interpretation
● Paper → electronic
  o data storage, backups, LIMS system

Implementation

Warning: may contain some bioinformatics

The WGS assay

● Millions of DNA sequences
● Typically 50-300 bp each
● Includes quality information
● File size ~ 1 gigabyte
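
The slide does not name a file format. Reads with per-base quality are commonly stored as FASTQ, so the sketch below assumes a four-line-per-record FASTQ stream with Phred+33 quality encoding. The function name `parse_fastq` and the example record are made up for illustration.

```python
import io


def parse_fastq(handle):
    """Yield (name, sequence, qualities) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of stream
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        # Assumption: Phred+33 encoding, so quality = ASCII code minus 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


# Made-up example record, not real sequencing data.
EXAMPLE = "@read1\nACGT\n+\nII5!\n"
records = list(parse_fastq(io.StringIO(EXAMPLE)))
```

Each record pairs a base string with one quality score per base; a real file would simply contain millions of such records.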

Using short reads

● Read mapping
  o align all reads to a reference genome
● De novo assembly
  o reconstruct the source replicons into contigs
● Alignment-free methods
  o examine the nucleotide content of the reads directly

Read mapping

● Choose an existing reference genome
● Find best fit for each read on the reference
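
To make "best fit" concrete, here is a toy sketch that slides each read along the reference and keeps the placement with the fewest mismatches. It is a brute-force, ungapped stand-in chosen for brevity, not how production read mappers work. `best_fit`, `hamming`, and the sequences are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_fit(read, reference):
    """Return (position, mismatches) of the best ungapped placement of read."""
    best_pos, best_mm = None, None
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        # Strict '<' keeps the leftmost placement when there is a tie.
        if best_mm is None or mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


# Made-up reference and reads; the third read carries one mismatch.
REFERENCE = "ACGTTGCATGCCGA"
hits = [best_fit(r, REFERENCE) for r in ["TTGCA", "TGCCG", "CATCC"]]
```

The mismatch count returned for each read is the raw material for calling SNVs once many reads cover the same position.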

Use case: read mapping

Genome deletions

● Regions in reference where no reads align
● DNA not present in sequenced isolate
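
A minimal sketch of the idea, assuming alignments are already available as (start, length) pairs on the reference: compute per-base depth, then report runs of zero depth as candidate deletions. The helper names, the `min_len` threshold, and the alignments are illustrative assumptions. In practice a zero-depth region can also reflect low coverage rather than a true deletion.

```python
def coverage(ref_len, alignments):
    """Per-base read depth from (start, length) alignments, 0-based starts."""
    depth = [0] * ref_len
    for start, length in alignments:
        for i in range(start, min(start + length, ref_len)):
            depth[i] += 1
    return depth


def zero_regions(depth, min_len=1):
    """Return half-open (start, end) intervals where depth is zero."""
    regions, start = [], None
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i
        elif d != 0 and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a zero-depth run that extends to the end of the reference.
    if start is not None and len(depth) - start >= min_len:
        regions.append((start, len(depth)))
    return regions


# Made-up alignments on a 20 bp reference; nothing covers positions 8-11.
ALIGNMENTS = [(0, 5), (3, 5), (12, 5), (15, 5)]
depth = coverage(20, ALIGNMENTS)
gaps = zero_regions(depth, min_len=3)
```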

De novo assembly

Like a jigsaw puzzle, except

● we don’t have the box (unknown target)
● missing pieces (coverage bias)
● broken pieces (sequencing errors)
● duplicate pieces (repeats)
● disconnected sub-puzzles (multiple replicons)
● random pieces from another puzzle (contamination)
● no corner or edge pieces (circular genomes)
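
The jigsaw analogy can be sketched with a toy greedy overlap merger: repeatedly join the two pieces with the longest suffix/prefix overlap until nothing overlaps any more. This is a deliberately simple stand-in; real assemblers use graph-based methods and must cope with errors and repeats, which this sketch ignores. All names, the `min_len` default, and the reads are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads by largest overlap first; return the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # no overlaps left to merge
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Three made-up reads from one "genome" plus one unrelated "contaminant".
READS = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT", "CCCCCC"]
contigs = greedy_assemble(READS)
```

The unrelated read never overlaps anything, so it survives as its own contig, which mirrors the "pieces from another puzzle" case above.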

Use case: de novo assembly

● Novel DNA
  o mobile elements
  o plasmids
  o phage
● Structural changes
  o inversions & rearrangements
  o large insertions & deletions
  o plasmid integration

Mutant

Wildtype

k-mer analysis

● Build a “signature” from all sub-reads of length k
● Compare signature to database of signatures of known genomes

k=4
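
A minimal sketch of the two steps, assuming the signature is simply the set of distinct k-mers and that signatures are compared by Jaccard similarity. Both choices are assumptions made for brevity; real tools differ in how they build and compare signatures. The genome names and sequences are made up.

```python
def kmers(sequence, k):
    """Set of all substrings of length k in sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def signature(reads, k=4):
    """Union of the k-mer sets of all reads."""
    sig = set()
    for read in reads:
        sig |= kmers(read, k)
    return sig


def jaccard(a, b):
    """Shared k-mers divided by total distinct k-mers."""
    return len(a & b) / len(a | b) if a | b else 0.0


def classify(reads, database, k=4):
    """Name of the database entry whose signature is most similar."""
    sig = signature(reads, k)
    return max(database, key=lambda name: jaccard(sig, database[name]))


# Hypothetical two-genome database built with k=4, as on the slide.
DATABASE = {
    "genome_A": signature(["ACGTACGTTAGC"]),
    "genome_B": signature(["GGGCCCGGGAAA"]),
}
best = classify(["ACGTACG", "CGTTAGC"], DATABASE)
```

No alignment is performed at any point: the reads are only chopped into k-mers and looked up, which is what makes this family of methods fast.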

Use case: k-mer analysis

     %   clade   taxon  rank    taxid  name
  1.04    1046    1046  U           0  unclassified
 98.96   99624     142  -           1  root
 98.81   99473       1  -      131567  cellular organisms
 98.81   99472     194  D           2  Bacteria
 98.57   99233     111  P        1224  Proteobacteria
 98.45   99110     318  C        1236  Gammaproteobacteria
 98.07   98728       0  O       91347  Enterobacteriales
 98.07   98728   52477  F         543  Enterobacteriaceae
 44.95   45256     665  G         561  Escherichia
 44.20   44498   33391  S         562  Escherichia coli
  8.84    8899    8899  -     1274814  Escherichia coli APEC O78
  0.29     287       0  -      244319  Escherichia coli O26:H11
  0.29     287     287  -      573235  Escherichia coli O26:H11 str 11368
  0.21     216     216  -      316401  Escherichia coli ETEC H10407
  0.19     193       0  -      168807  Escherichia coli O127:H6
  0.19     193     193  -      574521  Escherichia coli O127:H6 str E2348/69

http://ccb.jhu.edu/software/kraken
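
A report like the one on this slide can be read programmatically. The sketch below assumes six whitespace-separated columns: percentage of reads in the clade, reads in the clade, reads assigned directly to the taxon, rank code, NCBI taxonomy ID, and name. The field names are my own labels, and the rows are copied from the slide.

```python
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent: float     # % of reads in this clade
    clade_reads: int   # reads in this clade (taxon plus descendants)
    taxon_reads: int   # reads assigned directly to this taxon
    rank: str          # rank code, e.g. F, G, S, or '-' for no rank
    taxid: int         # NCBI taxonomy ID
    name: str


def parse_row(line):
    """Split one report line; the name may itself contain spaces."""
    pct, clade, taxon, rank, taxid, name = line.split(None, 5)
    return ReportRow(float(pct), int(clade), int(taxon), rank, int(taxid), name.strip())


# A few rows taken from the slide.
REPORT = """\
 1.04 1046 1046 U 0 unclassified
98.07 98728 52477 F 543 Enterobacteriaceae
44.95 45256 665 G 561 Escherichia
44.20 44498 33391 S 562 Escherichia coli
"""
rows = [parse_row(line) for line in REPORT.splitlines() if line.strip()]
species = [r for r in rows if r.rank == "S"]
```

Filtering on the rank code gives a quick species call for the sample, here Escherichia coli.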

Progress

Implement. Deploy. ?????. Profit $$$.

Current status

● Sequencer
  o replacing MiSeq with NextSeq-500 + robots
● Software
  o most components written, some incomplete
  o not a fully automated pipeline yet
  o no friendly user interface yet
● Need a project name!

● Need a project name!

A vision for Australia

● A common online system for all labs
  o upload samples
  o automated analysis pipelines (some customization for each genus)
  o easy submission to ENA and GenBank
● Access control
  o each lab controls their own data
  o jurisdictions can share data in outbreaks

Cooperation

● International
  o Global Microbial Identifier consortium
● UK
  o PHE, ngMicrobes, CLIMB, ENA
● USA
  o FDA GenomeTrakr, NCBI SRA
● NZ
  o New Zealand already participating in our PHLN

Conclusion

Your post-prandial blood glucose is now peaking.

Resistance to change

● Protecting empires
  o “this is how we’ve always done it”, job redundancies
● Expense of instruments
  o capital purchase, maintenance, new staff
● Fear of the unknown
  o lack bioinformatics, infrastructure, software, training
● Legal requirements
  o must do PFGE, validation, accreditation

Upcoming .au meetings

● Lorne Infection & Immunity
  o Feb 2015 @ Lorne (beach, near Melbourne)
● Australian ASM
  o Jul 2015 @ Canberra (near Parliament)
● BacPath
  o Sep 2015 @ San Remo (beach, near Melbourne)

Opportunities

● Warwick-Monash Alliance
  o seed funding, joint positions, shared PhD students
  o www.monashwarwick.org
● Birmingham-Melbourne
  o both members of Universitas 21
  o www.universitas21.com

Acknowledgements

● Uni Brum: Nick Loman, Ian Henderson, Cathy Wardius, Allie Hardwick
● Loman family
● My family
● Monash Uni: David Powell, Dieter Bulach, Ross Coppel
● Uni Melb: Tim Stinear, Jason Kwong
● MDU: Ben Howden, Kim Barton
● VLSCI: Andrew Lonie, Helen Gardiner

Contact

● Email: torsten.seemann@gmail.com
● Twitter: @torstenseemann
● Blog: TheGenomeFactory.blogspot.com
● Web: bioinformatics.net.au
