ontology support for influenza and surveillance
DESCRIPTION
This is a presentation about the construction of the influenza ontology to support influenza research and surveillance, as part of the MITRE-sponsored Genomics for Bio-forensics research.
TRANSCRIPT
© 2006 The MITRE Corporation. All rights reserved
Ontology Support for Influenza Research and Surveillance
Joanne Luciano, PhD, Lynette Hirschman, PhD, Marc Colosimo, PhD
Approved for Public Release; Distribution Unlimited.
28 April 2008 Case Number 08-0738
© 2008 The MITRE Corporation. All rights reserved Approved for Public Release; Distribution Unlimited. 28 April 2008 Case Number 08-0738
Case Study 1: Indonesia
■ Possible human-to-human transmission of H5N1 (May 2006)
■ Samples were collected and epidemiological data obtained
– Know who got sick and their relationship to each other
– Know when they got sick and if they died
– Have some public sequence data from that time
■ It is not known if these samples are from these people!
Public Sequence Data (30 Aug 2006)
A/Indonesia/CDC595/2006 (2006-05-09)
A/Indonesia/CDC594/2006 (2006-05-10)
….
A/Indonesia/CDC625L/2006 (2006-05-22)
A/Indonesia/CDC644/2006 (2006-05-30)
GenBank: isolation_source="gender:M; age:32; Lung Aspirate"
Metadata: WHO (23 May 2006); Nature (12 Jul 2006)
Same person?
Butler, "Family tragedy spotlights flu mutations," Nature 442, 114-115 (12 Jul 2006), News
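The strain names and the GenBank isolation_source string above carry metadata only as free text. A minimal sketch of how such lines could be split into fields before they are compared with epidemiological reports follows. The function names and the "specimen" key are hypothetical, and the pattern assumes the four-part human-isolate form shown on the slide (type/location/isolate/year); animal isolates add a host segment that this sketch does not handle.

```python
import re
from datetime import date

# Matches lines such as "A/Indonesia/CDC595/2006 (2006-05-09)".
# Assumption: four-part name (type/location/isolate/year) plus an ISO date.
STRAIN_RE = re.compile(
    r"^(?P<type>[A-C])/(?P<location>[^/]+)/(?P<isolate>[^/]+)/(?P<year>\d{4})"
    r"\s*\((?P<collected>\d{4}-\d{2}-\d{2})\)$"
)


def parse_strain(line):
    """Split a strain line into its named parts (hypothetical helper)."""
    m = STRAIN_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized strain line: {line!r}")
    return {
        "type": m["type"],
        "location": m["location"],
        "isolate": m["isolate"],
        "year": int(m["year"]),
        "collected": date.fromisoformat(m["collected"]),
    }


def parse_isolation_source(value):
    """Split 'key:value; key:value; free text' into a dict.

    Parts without a colon are stored under the hypothetical key 'specimen'.
    """
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            key, _, val = part.partition(":")
            fields[key.strip()] = val.strip()
        else:
            fields["specimen"] = part
    return fields


strain = parse_strain("A/Indonesia/CDC595/2006 (2006-05-09)")
source = parse_isolation_source("gender:M; age:32; Lung Aspirate")
```

Parsing alone does not answer the "Same person?" question. It only puts age, gender, and collection date into comparable fields so that a match against a case report can be attempted.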
Case Study 2: UK
■ Outbreak of H5N1 in the UK at a turkey farm, Feb 1, 2007
■ What is the source of the outbreak?
– Contact with infected wild birds?
  ■ But turkeys were in an enclosed "biosecure" unit
  ■ No H5N1 detected in the region in the 2 previous months
– Govt. veterinarian suggested turkey meat from Hungary might be source of infection
  ■ Turkey farm is adjacent to a poultry packing plant that had processed poultry products from Hungary
  ■ Hungary had reported an H5N1 outbreak 2 weeks earlier
■ Sequence data showed that strain infecting the turkeys was 99.96% identical to strain that had infected Hungarian birds
■ Conclusion: Infected Hungarian poultry was source of H5N1 infection
– Open question (relevant to food defense): how did H5N1 spread from processing plant to live turkeys?
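The slide does not say how the 99.96% figure was computed or over how many positions. As an illustration only, percent identity between two sequences that are already aligned can be sketched as below. The function name and the toy sequences are hypothetical, and skipping gap columns is an assumption made for this sketch. For scale, 99.96% identity corresponds to about 4 differing positions per 10,000 compared.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two pre-aligned sequences.

    Assumptions: both sequences have the same aligned length, '-' marks a
    gap, and columns containing a gap are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy sequences (not real H5N1 data): one mismatch in ten positions.
toy_identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```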
Research Agenda
■ Reference Database (Sequences & Metadata)
– What metadata to collect?
– Where to find data and how to connect different sources (bridging the gap)?
Research Question: Bridging the Gap - Connecting Genomics and Epidemiology
Diagram labels:
■ Genomics: Genes of Pathogen
■ Epidemiology: Occurrence of Disease in Host
■ Influenza Ontology
■ Other labels: Genomic Sequence Data, Systems Biology, Demographic data, Clinical data, Geospatial data, Temporal data, Pathogenicity, Host
Influenza Ontology: Development
■ Identify the right collaborators
■ Collect metadata terms
■ Identify resources that include these terms
■ Regularize metadata
– Generate a controlled vocabulary (terms)
■ Validate subset with BioHealthBase CEIRS data
■ Iterate, review with community, publish
■ Integrate Influenza ontology into workflow
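To make the "regularize metadata" step concrete, here is a minimal sketch of mapping free-text values onto preferred controlled-vocabulary terms. The synonym table, field names, and function names are all hypothetical and are not taken from the actual influenza ontology. Values with no mapping are set aside for curator review instead of being guessed.

```python
# Hypothetical synonym table: field -> {lowercased raw value: preferred term}.
SYNONYMS = {
    "gender": {"m": "male", "male": "male", "f": "female", "female": "female"},
    "specimen": {"lung aspirate": "lung aspirate", "lung asp.": "lung aspirate"},
}


def regularize(field, raw):
    """Return the preferred term for a raw value, or None if it is unmapped."""
    table = SYNONYMS.get(field, {})
    key = " ".join(raw.strip().lower().split())  # trim and collapse whitespace
    return table.get(key)


def regularize_record(record):
    """Split a record into regularized fields and fields needing review."""
    clean = {}
    unmapped = {}
    for field, raw in record.items():
        term = regularize(field, raw)
        if term is None:
            unmapped[field] = raw
        else:
            clean[field] = term
    return clean, unmapped


clean, unmapped = regularize_record(
    {"gender": "M", "specimen": "Lung  Aspirate", "age": "32"}
)
```

In this sketch the "age" field has no vocabulary and is returned as unmapped. A fuller version would treat numeric fields separately from vocabulary-backed ones.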
Influenza Ontology First Draft: Community
Collaboration with BioHealthBase and Gemina
■ BioHealthBase: NIAID Influenza Database. Point of contact for
– Centers of Excellence for Influenza Research and Surveillance (CEIRS)
  ■ Research: Emory, Mt Sinai, St. Jude, Univ. of Rochester
  ■ Surveillance: St. Jude, UCLA, Univ. of Minnesota
– Los Alamos National Laboratory (LANL)
■ Gemina: Category A-C Pathogen Database. Point of contact for
– Children’s Hospital Boston
– Johns Hopkins University
■ MITRE
Influenza Ontology First Draft: Identify metadata
200 controlled vocabulary terms covering several fields
Influenza Ontology First Draft: Metadata resources
Reuse of existing ontologies & metadata standards
OBI – Ontology for Biomedical Investigations
EnvO – Environmental Ontology (habitat of pathogen)
GAZ – Gazetteer (geographic locations)
FMA – Foundational Model of Anatomy
DC – Dublin Core (publication metadata)
PATO – Phenotype
SO – Sequence Ontology (sequence features)
Cell – Cell Ontology (types of cells)
DO – Disease Ontology
IDO – Infectious Disease Ontology
Influenza Ontology First Draft
Excel Spreadsheet
Initial steps:
• Collect metadata terms
• Map and align terms
• Group related information
• Identify and define relationships
• Identify external ontologies
Formalize
OBO-Edit: Ontology Editing Tool
Formalize:
• Normalize terms into a CV
• Issue unique identifiers
• Instantiate class hierarchy
• Define properties and values
• Link to external ontology terms
Status: We have just started the formalization step.
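The formalization steps can be illustrated with a small sketch that turns spreadsheet-style rows into OBO-format [Term] stanzas of the kind OBO-Edit reads. It is not the project's actual tooling: the "FLU" ID prefix, the example rows, and the "EXT:0000001" cross-reference are placeholders, and the function names are hypothetical.

```python
from itertools import count


def normalize(label):
    """Lowercase a label and collapse whitespace (one possible CV rule)."""
    return " ".join(label.strip().lower().split())


def build_terms(rows, prefix="FLU"):
    """Issue sequential IDs for (label, parent label or None, xref or None) rows.

    The 'FLU' prefix is a placeholder, not an assigned ontology namespace.
    Rows whose normalized label was already seen are skipped.
    """
    ids = {}
    terms = []
    counter = count(1)
    for label, parent, xref in rows:
        name = normalize(label)
        if name in ids:
            continue
        ids[name] = f"{prefix}:{next(counter):07d}"
        terms.append({
            "id": ids[name],
            "name": name,
            "parent": normalize(parent) if parent else None,
            "xref": xref,
        })
    return terms, ids


def to_obo(terms, ids):
    """Render terms as OBO [Term] stanzas; every parent must have its own row."""
    lines = []
    for term in terms:
        lines += ["[Term]", f"id: {term['id']}", f"name: {term['name']}"]
        if term["parent"]:
            lines.append(f"is_a: {ids[term['parent']]} ! {term['parent']}")
        if term["xref"]:
            lines.append(f"xref: {term['xref']}")
        lines.append("")  # blank line between stanzas
    return "\n".join(lines)


# Example rows; the xref is a placeholder for a link to an external ontology term.
rows = [
    ("Specimen", None, None),
    (" Lung  Aspirate", "specimen", "EXT:0000001"),
    ("lung aspirate", "specimen", None),  # duplicate after normalization
]
terms, ids = build_terms(rows)
obo_text = to_obo(terms, ids)
```

Issuing the identifier separately from the label means a term can later be renamed without breaking annotations that refer to its ID.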
Future Work
Ontology development
■ Complete formalization process
■ Validate subset with data from BioHealthBase
– Circulate for review and comments
■ Use ontology to annotate influenza data
Team
■ BioHealthBase (UT Southwestern Medical Center)
– Burke Squires
– Richard Scheuermann
■ Institute for Genome Sciences/Gemina (U. Maryland Baltimore)
– Lynn Schriml
■ MITRE
– Joanne Luciano
– Lynette Hirschman
– Marc Colosimo