databases and tools to study the genomes of hundreds of pathogens, plants, and mammals

Databases and tools to study the genomes of hundreds of pathogens, plants, and mammals. Richard H. Scheuermann, Ph.D., Director of Informatics, J. Craig Venter Institute (JCVI)


Post on 22-Feb-2016




TRANSCRIPT

Slide 1

Databases and tools to study the genomes of hundreds of pathogens, plants, and mammals

Richard H. Scheuermann, Ph.D.
Director of Informatics
J. Craig Venter Institute (JCVI)

Thank the organizers for the invitation to speak. I was asked to speak about databases and tools to study the genomes of pathogens, plants, and mammals, which is a pretty daunting task. So I'll begin with a very brief overview of the current state of genomics resources and the role that JCVI has played in driving the field. I'll end with an example of how we are using these kinds of genomics resources to understand the nature of human infectious disease outbreaks.

Slide 2

A brief history of genomics

ΦX174: 5,375 bp (1977)
H. influenzae: 1.8 x 10^6 bp (1995)
H. sapiens: 3 x 10^9 bp (2001)
H. sapiens: 3 x 10^9 bp, phased (2007)

The field of genomics began with the publication of the complete genome sequence of the small bacterial virus phiX174 by a group led by Fred Sanger using his dideoxy chain termination methodology. The next big milestone occurred almost 20 years later, in 1995, with the publication of the first complete genome for a free-living organism, the bacterium Haemophilus influenzae, by investigators at The Institute for Genomic Research (TIGR), the predecessor to JCVI. These advances in genomics then culminated in the publication of the first human genome sequences by two groups in 2001, including a group led by Craig Venter and his colleagues at Celera.

Slide 3

Sequencing Costs

NGS

The next big advance in the field was methodological. The determination of the first human genome sequences was a costly endeavor, estimated to be in the hundreds of millions of dollars. While costs slowly began to drop, the development of a variety of next-generation sequencing (NGS) methods, in which capillary electrophoresis was replaced by array-based approaches, had a dramatic effect on sequencing costs, such that the $1,000 human genome is well within reach. Thus, the era of human genomics is upon us, much earlier than people expected 10 years ago.

Slide 4

Database resources: human genomics

http://cancergenome.nih.gov

http://www.1000genomes.org

http://genome.ucsc.edu

http://huref.jcvi.org

http://www.ncbi.nlm.nih.gov/SNP/

http://www.ncbi.nlm.nih.gov/clinvar/

http://www.knome.com

human genomics resources

To make genomics data and the associated analysis and visualization tools broadly accessible, a number of database resources have been developed and made freely available. In the next few slides I list only a subset of these as examples. In some cases, like the HuRef and UCSC genome browsers, the resources are focused on single human genomes. Other resources, like dbSNP and the 1000 Genomes Project, are designed to support general population sequence variation data. Still other resources, like ClinVar and TCGA, are focused on sequence variations of clinical relevance.

Slide 5

Database resources: plant genomics

plant genomics resources

http://www.plantgdb.org

http://www.gramene.org

https://arabidopsis.org

http://www.jcvi.org/cgi-bin/medicago/overview.cgi

http://www.iplantcollaborative.org

Likewise, a variety of database resources focused on plant genomics are available, including TAIR, Gramene, the Medicago resource at JCVI, and the iPlant Collaborative.

Slide 6

Database resources: human microbiome and metagenomics

www.hmpdacc.org

http://camera.calit2.net

One of the relatively new areas of genomics is the evaluation of populations of organisms in defined ecosystems, so-called metagenomics, with data available through the Human Microbiome Project's Data Analysis and Coordination Center and the CAMERA portal at Calit2.

Slide 7

Database resources: pathogen genomics

NIAID Bioinformatics Resource Centers (BRCs)

www.viprbrc.org
www.fludb.org
www.patricbrc.org
www.eupathdb.org
www.vectorbase.org

And finally, a large number of database resources focused on microbes are also available, including five Bioinformatics Resource Centers funded by the U.S. National Institute of Allergy and Infectious Diseases that are focused on human pathogens and their invertebrate vectors. I'll come back to these later in the talk.

Slide 8

Research at JCVI

JCVI and its predecessor TIGR have been leaders in the field since its inception, most recently determining the first (and only) human diploid genome and driving the metagenomics and human microbiome fields. Our research activities have been divided into functional groups, including Genomic Medicine, Infectious Disease, Microbial and Environmental Genomics, Plant Genomics, Synthetic Biology, and Public Policy. In addition to these functional groups, a number of cross-cutting platforms for Informatics, IT, and Sequencing have been established as organizational units.

Slide 9

Infectious Disease

As an example, the research within the Infectious Disease group includes a Center for Structural Genomics, in which the 3D atomic structures of pathogen proteins are determined; a Genome Sequencing Center, in which genome sequences are determined; and a microbial pathogenesis program, in which the molecular determinants of microbial pathogenesis are identified.

Slide 10

Viral Genomics

In terms of Viral Genomics, a number of important viral species are being evaluated, including adenovirus, the coronaviruses causing SARS and MERS, influenza, and rotavirus.

Slide 11

As an example, out of a total of 15,557 complete genome sequences for flu A and 16,064 for all flu in IRD as of October 2, 2013, more than 12,000 were determined by the Viral Genomics group at JCVI.

Slide 12

In addition to these research programs focused on biological areas, a number of research programs are focused on informatics research.

Slide 13

In many cases, these bioinformatics research programs produce analysis and visualization tools that are routinely made available to the broader research community as open source software packages.

Hopefully this brief summary has given you a sense of the state of the genomics field and the role that JCVI has played in it.

Slide 14

Database resources: pathogen genomics

NIAID Bioinformatics Resource Centers (BRCs)

www.viprbrc.org
www.fludb.org
www.patricbrc.org
www.eupathdb.org
www.vectorbase.org

For the remainder of my time, I'd like to focus on one example of how we use these genomics resources to better understand human disease. As I mentioned earlier, the NIAID has supported the development of five BRCs focused on data related to the major categories of human pathogens and vectors. One of the key biological questions that they are intended to support relates to outbreaks of new infectious diseases, which frequently arise through the spillover of a pathogen from an animal species into humans.

Zoonosis Summary

- A zoonosis is an infectious disease that is transmitted between species (sometimes by a vector) from animals other than humans to humans or from humans to other animals.
- Of the 1415 recognized species of human pathogens, 61% are of zoonotic origin [Taylor 2001].
- These include Hendra, Nipah, Machupo, Ebola, Influenza A, SARS-CoV, Yersinia pestis, Borrelia burgdorferi, Plasmodium knowlesi.
- Use of comparative genomics to understand zoonotic spillover: what are the genetic determinants that allow an animal virus to adapt to humans?

H7N9 use case

- In February and March 2013, several human cases of influenza virus A H7N9 subtype were identified in Shanghai, China and surrounding provinces.
- As of August 12, 2013, a total of 135 human cases have been laboratory confirmed, including 44 deaths, for a case fatality rate of 33%.
- A search for H7 influenza strains in IRD (data accessed from www.fludb.org on April 8, 2013) returned a total of 1485 strains, with 1306 from birds, 102 from environmental samples (usually bird droppings), 33 from horses, and only 17 from humans prior to the recent outbreak.
- Of the 17 human H7 isolates, 12 were H7N7 from England 1996 and the Netherlands 2003. None were H7N9.

Questions

- Where is the reservoir source of this newly emerging human pathogen?
- What are the genetic determinants allowing for human adaptation?

Virus pathogen genomics: IRD

www.fludb.org
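The arithmetic behind the H7N9 use case figures can be checked with a few lines of Python. This is only a sketch using the numbers quoted in the transcript; nothing is re-queried from IRD, and the variable names are illustrative. It shows that 44 deaths out of 135 confirmed cases works out to about 32.6%, which rounds to the 33% quoted, and that the four host categories listed account for 1458 of the 1485 strains, leaving 27 whose host is not stated in the transcript.

```python
# Figures quoted in the H7N9 use case (transcript values, not re-queried from IRD).
confirmed_cases = 135  # laboratory-confirmed human cases as of August 12, 2013
deaths = 44

case_fatality_rate = deaths / confirmed_cases
print(f"Case fatality rate: {case_fatality_rate:.1%}")

# Host breakdown of the H7 strains returned by the IRD search on April 8, 2013.
reported_total = 1485
by_host = {"bird": 1306, "environment": 102, "horse": 33, "human": 17}

listed = sum(by_host.values())
for host, count in by_host.items():
    print(f"{host:<12}{count:>6}  {count / reported_total:.1%}")

# The four listed categories do not add up to the reported total; the
# transcript does not say which hosts account for the remainder.
unlisted = reported_total - listed
print(f"Listed: {listed}, unlisted: {unlisted}")
```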

H7N9 human HA query

H7N9 human HA query result (screenshots with numbered callouts 1 to 4)

Phylogenetic analysis: PhyML in IRD

Sequence alignment

Statistical comparative genomics: Meta-CATS in IRD

Sequence features affected

3D structure
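To give a feel for what a statistical comparative genomics step can look like, here is a minimal sketch of a per-column chi-square test of independence between two groups of aligned sequences. It is not the Meta-CATS implementation; the function name, the two toy groups, and the five-residue sequences are all hypothetical, and treating this test as representative of the tool's internals is an assumption. The sketch returns only the statistic and degrees of freedom for each variable alignment column.

```python
from collections import Counter


def chi_square_by_column(group_a, group_b):
    """Return (position, statistic, degrees_of_freedom) for each variable column.

    group_a and group_b are non-empty lists of aligned, equal-length sequences.
    Positions are 1-based. Fully conserved columns are skipped.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one sequence")
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("sequences must be aligned to the same length")

    n_a, n_b = len(group_a), len(group_b)
    total = n_a + n_b
    results = []
    for pos in range(length):
        counts_a = Counter(seq[pos] for seq in group_a)
        counts_b = Counter(seq[pos] for seq in group_b)
        residues = sorted(set(counts_a) | set(counts_b))
        if len(residues) < 2:
            continue  # conserved column, nothing to test
        stat = 0.0
        for residue in residues:
            column_total = counts_a[residue] + counts_b[residue]
            for observed, group_size in ((counts_a[residue], n_a),
                                         (counts_b[residue], n_b)):
                expected = column_total * group_size / total
                stat += (observed - expected) ** 2 / expected
        # 2 groups x k residues gives (2 - 1) * (k - 1) degrees of freedom.
        results.append((pos + 1, stat, len(residues) - 1))
    return results


# Hypothetical toy alignment, not real HA sequences.
avian = ["MKTQL", "MKTQL", "MKTQL", "MKSQL"]
human = ["MKTLL", "MKTLL", "MKTLL", "MKTQL"]

results = chi_square_by_column(avian, human)
for position, statistic, dof in results:
    print(f"position {position}: chi-square = {statistic:.3f}, df = {dof}")
```

With only four sequences per group the expected counts are far too small for the chi-square approximation to be trusted, so the toy output illustrates the mechanics only. A real analysis would use many sequences per group and correct for testing many positions at once.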

H7N9 and JCVI

- New isolate sequencing
- Public release of sequence data
- Data Analysis Visualization - Integration
- Comparative genomics analysis
- Synthetic genomics of vaccine seed strains

JCVI Core Competencies

- Human, microbial, and plant genomics
- Microbiome and metagenomics
- Synthetic genomics
- Data management, analysis, and mining
- Novel computational methods development
- Informatics infrastructure development
- Web applications
- High performance computing
- Cloud computing