

A Federated Platform Enabling Richer Genomic Epidemiology Analysis in a Public Health Environment.

Franklin Bristow2, Josh Adam2, João André Carriço5, Mélanie Courtot3,1,10, Bhavjinder Dhillon1, Damion Dooley3, Emma Griffiths1, Judy Isaac-Renton3, Alex Keddy8, Peter Kruczkiewicz7, Matthew Laird1, Andrew MacArthur11, Thomas Matthews2, Aaron Petkau2, Lynn Schriml6, Julie Shay1, Eduardo Taboada7, Patrick Tang2, Joel Thiessen2,9, Geoff Winsor1, Robert G. Beiko8, Morag Graham2,9, Gary Van Domselaar2,9, William Hsiao3,4, The IRIDA Consortium and Fiona Brinkman1.

www.irida.ca

1 Simon Fraser University, Burnaby, BC, Canada; 2 National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada; 3 BC Public Health Microbiology and Reference Laboratory, Vancouver, BC, Canada; 4 University of British Columbia, Vancouver, BC, Canada; 5 Faculty of Medicine, University of Lisbon, Lisbon, Portugal; 6 University of Maryland School of Medicine, Baltimore, MD, USA; 7 National Microbiology Laboratory, Public Health Agency of Canada, Lethbridge, AB, Canada; 8 Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada; 9 University of Manitoba, Winnipeg, MB, Canada; 10 European Bioinformatics Institute, Hinxton, Cambridge, UK; 11 McMaster University, Hamilton, ON, Canada.

The Goal of IRIDA

To support real-time infectious disease outbreak investigations in Canada.

We are developing an open-source, interactive, standards-compliant resource for public health agencies that aims to release de-identified genome data rapidly and to enable rich analysis of integrated epidemiological, clinical, laboratory and genomic data in a public health environment.

Design Philosophy

• Dynamic integration of data sources (clinical, epidemiological and laboratory-sourced)
• Privacy protection
• Suite of bioinformatics tools for researchers and public health workers
• Data standardization enabling value-added activities

Data Management and Analysis

• Easy-to-use tools providing rich analyses in a timely manner
• Controlled parameters and workflows for reproducibility
• Automated raw read uploader
• Data organized by Project and Sample
• SNV phylogenies with the SNVPhyl Galaxy workflow, and de novo assembly and annotation using SPAdes and Prokka
• Easy-to-interpret QA/QC and revision tracking
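An SNV phylogeny is usually read alongside a pairwise SNV distance matrix, i.e. the number of variant positions at which each pair of isolates differs. The sketch below illustrates that idea on an alignment of variant sites. It is not SNVPhyl's implementation: the input format (a dict of equal-length, upper-case sequences keyed by hypothetical sample names) and the rule of skipping positions marked `N` or `-` are assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, Tuple

# Characters treated as missing data; an assumption for this sketch.
MISSING = {"N", "-"}


def snv_distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return pairwise SNV counts for an alignment of variant sites.

    `alignment` maps sample names to equal-length, upper-case sequences.
    A position counts as a difference only if both bases are called
    (not in MISSING) and they differ.
    """
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("all sequences must have the same length")

    distances: Dict[Tuple[str, str], int] = {}
    for name in alignment:
        distances[(name, name)] = 0  # a sample never differs from itself
    for a, b in combinations(sorted(alignment), 2):
        diff = sum(
            1
            for x, y in zip(alignment[a], alignment[b])
            if x not in MISSING and y not in MISSING and x != y
        )
        distances[(a, b)] = distances[(b, a)] = diff  # the matrix is symmetric
    return distances


alignment = {"sample1": "ACGTA", "sample2": "ACGTT", "sample3": "TCNTT"}
matrix = snv_distance_matrix(alignment)
print(matrix[("sample1", "sample3")])  # 2: positions 1 and 5 differ; position 3 is skipped
```

Skipping uncalled positions rather than counting them as differences keeps low-coverage samples from looking artificially distant; other tools may handle missing data differently.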


Data Security and Sharing

• Stakeholders maintain local, physical ownership of sample data
• Data sharing over an HTTP REST API, with authorization via OAuth2
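The poster does not document the API itself, so the sketch below only shows the general OAuth2 pattern: obtain an access token (RFC 6749), then send it as a Bearer token (RFC 6750) on each REST call. It uses only Python's standard library. The base URL, the `oauth/token` and `projects` paths, and the choice of the password grant are hypothetical placeholders, not IRIDA's documented endpoints.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; replace with the address of a real instance.
BASE_URL = "https://irida.example.org/api"


def build_token_request(client_id, client_secret, username, password):
    """Build (but do not send) an OAuth2 password-grant token request.

    Client credentials go in the form body here, which RFC 6749 section 2.3.1
    permits; HTTP Basic authentication is the more common alternative.
    """
    body = urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
    }).encode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/oauth/token",  # hypothetical token endpoint
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_resource_request(path, access_token):
    """Build a GET request for a REST resource, authorized with a Bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        headers={"Authorization": f"Bearer {access_token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_json(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build the requests without touching the network.
req = build_resource_request("/projects", "example-token")
print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the authorization logic testable offline; in practice the token from `fetch_json(build_token_request(...))` would be read from the `access_token` field of the JSON response.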

Application Ontology and User Interface

• No single existing ontology can adequately describe all the domains required for genomic epidemiology
• The IRIDA ontology leverages existing resources and OBO standards for interoperability
• Metadata gaps identified through extensive user consultation
• IRIDA ontology available as an OWL file
• Dynamic, ontology-driven user interface, with selective de-identification and line list sharing
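Selective de-identification for line list sharing can be pictured as an allowlist of shareable fields applied to every record before it leaves the local site. In this minimal sketch the field names (`case_id`, `serotype`, and so on) are hypothetical, and replacing the case identifier with a salted SHA-256 pseudonym is a technique chosen for illustration, not necessarily how IRIDA does it.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def deidentify_line_list(
    records: Iterable[Dict[str, str]],
    shareable_fields: Set[str],
    *,
    salt: str,
    id_field: str = "case_id",  # hypothetical identifier column
) -> List[Dict[str, str]]:
    """Return copies of `records` containing only allowlisted fields.

    The identifier is replaced with a truncated, salted SHA-256 pseudonym so
    that rows from the same case stay linkable without exposing the raw ID.
    """
    shared = []
    for record in records:
        row = {k: v for k, v in record.items() if k in shareable_fields}
        if id_field in record:
            digest = hashlib.sha256((salt + record[id_field]).encode("utf-8")).hexdigest()
            row[id_field] = digest[:12]
        shared.append(row)
    return shared


line_list = [
    {"case_id": "C001", "name": "Jane Doe", "serotype": "Enteritidis", "province": "MB"},
    {"case_id": "C002", "name": "John Roe", "serotype": "Heidelberg", "province": "BC"},
]
for row in deidentify_line_list(line_list, {"serotype", "province"}, salt="keep-this-local"):
    print(row)
```

Because the pseudonym is deterministic for a given salt, repeated rows for one case still link together; if the salt leaks, short or guessable identifiers could be recovered by brute force, so the salt must stay with the data owner.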

Background

• Most tools/pipelines require considerable technical knowledge
• Existing tools cannot easily integrate rich epidemiologic data

IslandViewer: identifies genomic islands (GIs), virulence factors and antibiotic resistance genes in complete and draft genomes

GenGIS: integrates sequence data and geospatial information for phylogeographic analysis

RGI (Resistance Gene Identifier): AMR gene annotation tool for whole genomes or genome assemblies, integrated within the Comprehensive Antibiotic Resistance Database (CARD)

[Figure: IRIDA Platform Design. Components: Data Sources (clinical, epidemiology and laboratory data sources); Database & API with user access control and data security; Analysis Modules (quality control, comparative genomics, molecular typing, visualization, pathogenomics, genomic epi/geo-tagging); Programming Interface; End Users (lab technician/medical microbiologist, epidemiologist, researcher) with customized output; Value Added Activities (epi-analysis-ready data, epidemiology investigation results, outbreak investigation).]

Acknowledgements

Funded by grants from Genome Canada, Genome BC, the Genomics R&D Initiative (GRDI), Cystic Fibrosis Canada and Compute Canada.

[Figure: genomic epidemiology data domains, shown as laboratory (test-centric), clinical (patient-centric) and epidemiology (case-centric). Terms shown: Genomics, Pathogen Taxonomy, SOPs, Diagnostic Test, Result Report, Host Taxonomy, Symptoms, Demographics, Treatment, Vaccines, Drugs, Geography, Public Health Intervention, Exposure, Contact, Food, Travel, Environment, Temporal Info.]