Data Integration, Gene Ontology, and the Mouse*. Joel Richardson, Ph.D., Mouse Genome Informatics Group, The Jackson Laboratory, Bar Harbor, Maine 04609. (* Not necessarily in that order.)

Page 1: Data Integration, Gene Ontology, and the Mouse*

Joel Richardson, Ph.D.
Mouse Genome Informatics Group
The Jackson Laboratory
Bar Harbor, Maine 04609

* Not necessarily in that order.

Page 2: We have the human sequence: OK, now what?

- One species is not enough:
  - model organisms (one strain is not enough)
  - comparative studies
- The sequence is just the beginning
  - sequence variants
  - gene regulation and interaction networks
  - non-coding functional elements
  - environmental effects
- Genotype to phenotype

Page 3: The Mouse

- the premier animal model for studying human disease
- > 95% same genes
- same diseases, similar reasons (e.g., cancer, hypertension, diabetes, osteoporosis, …)
- 1000s lab strains, diff. characteristics
- precise genetic control

Page 4: The Jackson Laboratory

- Private nonprofit research institution (est. 1929)
- Studying mouse as a model of human biology and disease
- National Cancer Research Center
- Supplier of laboratory strains to researchers worldwide
- Areas: metabolism, development, cancer, immune response

www.jax.org

Page 5: Bar Harbor, ME 04609

Page 6: Mouse Genome Informatics (MGI)

- Consortium of NIH-funded projects
- Housed at TJL
- Integrates and disseminates public data resources covering selected aspects of mouse biology
- First program project funding 1989
- > $10M/y total, >60 people
- Online since 1994.

Page 7: www.informatics.jax.org

Page 8: MGI Concept Map

Diagram nodes: Genes and other loci; Expression Data; Mapping Data; Molecular Fragments; DNA and Protein Sequences; Strains; Phenotypes; Anatomy; Genotypes; Alleles; References; Accession IDs; Variants

Page 9: Integration in MGI

- Identifying objects.
- Resolving or noting discrepancies.
- Integration is key to knowledge discovery in the age of genomics
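The two steps named on this slide (identifying objects, then resolving or noting discrepancies) can be illustrated with a minimal Python sketch. The record layout, the IDs, and the two sources are hypothetical placeholders for illustration, not MGI's actual schema or data, and keeping the first value seen is just one possible resolution policy.

```python
from collections import defaultdict

# Hypothetical records from two providers describing the same objects.
# IDs, field names, and values are illustrative placeholders only.
SOURCE_A = [
    {"id": "X:0001", "symbol": "Abc1", "chromosome": "2"},
    {"id": "X:0002", "symbol": "Def2", "chromosome": "7"},
]
SOURCE_B = [
    {"id": "X:0001", "symbol": "Abc1", "chromosome": "2"},
    {"id": "X:0002", "symbol": "Def2", "chromosome": "11"},
]


def integrate(*sources):
    """Group records by shared ID, merge their fields, and note conflicts."""
    # Step 1: identify objects. Records with the same ID are the same object.
    by_id = defaultdict(list)
    for records in sources:
        for record in records:
            by_id[record["id"]].append(record)

    # Step 2: merge, keeping the first value seen and noting any disagreement.
    merged, discrepancies = {}, []
    for obj_id, records in by_id.items():
        combined = {}
        for record in records:
            for field, value in record.items():
                if field in combined and combined[field] != value:
                    # Note the conflict instead of silently overwriting.
                    discrepancies.append((obj_id, field, combined[field], value))
                else:
                    combined.setdefault(field, value)
        merged[obj_id] = combined
    return merged, discrepancies


merged, discrepancies = integrate(SOURCE_A, SOURCE_B)
print(discrepancies)  # [('X:0002', 'chromosome', '7', '11')]
```

In a curated resource the noted discrepancies would presumably go to a human for review rather than being resolved automatically; the sketch only shows where such a list could come from.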

Page 10: The Power Of Integration: Queries

- What transcription factors are expressed in a 2-cell embryo and not in a blastocyst?
  - integration of multiple expression assay data sets and data types.
  - standardization of anatomical references and developmental stages
- What development QTLs contain these TFs?
  - integration of expression data and mapping data
  - genetic map result of integrating lots of mapping data
- What strains are distinguished by SNPs in this region?
- And so on…
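To make the first two queries concrete, here is a small Python sketch that chains them once expression, function, and mapping data share gene identifiers and standardized stage names. All gene names, positions, and QTL intervals are invented placeholders, and the in-memory tables stand in for whatever storage the real system uses.

```python
# Hypothetical hand-made records: gene names, stages, map positions, and QTL
# intervals are placeholders, not data from MGI.
EXPRESSION = [
    # (gene, standardized anatomical structure, detected?)
    ("GeneA", "2-cell embryo", True),
    ("GeneA", "blastocyst", False),
    ("GeneB", "2-cell embryo", True),
    ("GeneB", "blastocyst", True),
    ("GeneC", "2-cell embryo", True),
]
TRANSCRIPTION_FACTORS = {"GeneA", "GeneB"}  # from functional annotation
GENE_POSITION = {  # gene -> (chromosome, map position)
    "GeneA": ("1", 12.0),
    "GeneB": ("1", 40.0),
    "GeneC": ("3", 5.0),
}
QTLS = {  # QTL -> (chromosome, interval start, interval end)
    "Qtl1": ("1", 10.0, 20.0),
    "Qtl2": ("3", 0.0, 10.0),
}


def expressed_in(structure):
    """Genes with at least one positive expression result in the structure."""
    return {g for g, s, detected in EXPRESSION if s == structure and detected}


def tfs_in_first_not_second(first, second):
    """Transcription factors detected in `first` but not detected in `second`."""
    return (expressed_in(first) - expressed_in(second)) & TRANSCRIPTION_FACTORS


def qtls_containing(genes):
    """Map each QTL to the given genes that fall inside its interval."""
    hits = {}
    for qtl, (chrom, start, end) in QTLS.items():
        inside = sorted(
            g for g in genes
            if GENE_POSITION[g][0] == chrom and start <= GENE_POSITION[g][1] <= end
        )
        if inside:
            hits[qtl] = inside
    return hits


tfs = tfs_in_first_not_second("2-cell embryo", "blastocyst")
print(tfs)                   # {'GeneA'}
print(qtls_containing(tfs))  # {'Qtl1': ['GeneA']}
```

Note that this sketch treats "no positive result in the blastocyst" as "not expressed"; a stricter version would require an explicit negative assay result. The set operations only work because every table uses the same gene names and the same stage vocabulary, which is the point the slide makes about standardization.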

Page 11: The MGI System (from 40,000 feet)

Diagram labels: MGI RDBMS; Web; Files; Data Downloads; Literature Curation; SQL; Load scripts; Editing Interface; Servlets; CGI Scripts; Files; Report Scripts

Page 12: MGI in Context

Diagram: MGI db at the center, linked to: Scientific Literature; Mutagenesis Centers; GenBank; LocusLink; Unigene; TIGR; DoTS; OMIM; Ensembl; GO; Interpro; SwissProt; ATCC; RIKEN; Anatomy; RPCI; RatMap; NIA; MGC; I.M.A.G.E.; NCBI; RefSeq

Page 13: Integration relies on Standard Vocabularies

- Structured vocabularies
  - The common semantic frameworks
  - Structured into is-a/part-of hierarchies
- Evidence-based annotation
  - Associations of vocabulary terms with objects
  - Evidence (codes), citations, etc., decorate the associations
- Structured annotations and queries
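As a rough sketch of how these pieces fit together, the Python below models a tiny part-of hierarchy, annotations decorated with an evidence code and a reference, and a query that also returns objects annotated to more specific terms. The three-term hierarchy, the gene names, and the reference IDs are made up for illustration. IDA and ISS are real GO evidence codes, but nothing here reflects how MGI actually stores or queries annotations.

```python
from dataclasses import dataclass

# Hypothetical mini-vocabulary: child term -> list of (relationship, parent term).
PARENTS = {
    "nucleus": [("part-of", "cell")],
    "nucleolus": [("part-of", "nucleus")],
    "mitochondrion": [("part-of", "cell")],
}


@dataclass(frozen=True)
class Annotation:
    obj: str        # the annotated object, e.g. a gene (placeholder names)
    term: str       # vocabulary term
    evidence: str   # evidence code decorating the association
    reference: str  # citation decorating the association (placeholder IDs)


ANNOTATIONS = [
    Annotation("GeneA", "nucleolus", "IDA", "Ref:1"),
    Annotation("GeneB", "mitochondrion", "ISS", "Ref:2"),
    Annotation("GeneC", "nucleus", "IDA", "Ref:3"),
]


def ancestors(term):
    """All terms reachable from `term` by following is-a/part-of links upward."""
    seen, stack = set(), [term]
    while stack:
        for _relationship, parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotated_to(term, evidence=None):
    """Annotations to `term` or any of its descendants, optionally by evidence code."""
    return [
        a for a in ANNOTATIONS
        if (a.term == term or term in ancestors(a.term))
        and (evidence is None or a.evidence == evidence)
    ]


print([a.obj for a in annotated_to("nucleus")])      # ['GeneA', 'GeneC']
print([a.obj for a in annotated_to("cell", "ISS")])  # ['GeneB']
```

The hierarchy is what lets a query for "nucleus" find an object annotated only to "nucleolus", and the evidence code on each association is what lets a user filter by how well supported an annotation is.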

Page 14: Structured Vocabularies in MGI

- Gene Ontology (GO)
  - Functional gene annotations
- Mammalian Phenotype (MP)
  - Annotations to genotypes (e.g. knockouts)
- Mouse Anatomical Dictionary
  - Annotations of expression
- Other standardized, non-structured vocabularies
  - Mouse strains
  - cell lines
  - clone libraries
  - tissues
  - lots of smaller ones

Page 15: Challenges

- Domain very difficult to frame
- Huge variability, variety of data, formats, providers, update schedules & semantics, etc…
- Biologists and Computer Scientists think differently.
  - communication is paramount, but difficult
- Rapid changes, e.g., in last 10 years:
  - genetic crosses -> YAC/BAC mapping -> RH mapping -> genome sequence
  - northern blots -> microarrays -> MPSS

Page 16: System Evolution

- The system is a software ecosystem
- Maintenance is the cost of success
- Changes and cost/benefit
  - If it ain’t broke, don’t fix it
  - Commitments/agenda/priorities

Page 17: Credits

Richard Baldarelli, Matt Baya, Jon Beal, Dale Begley, Judy Blake, John Boddy, Dirck Bradt, Carol Bult, Nancy Butler, Donna Burkart, Jeff Campbell, Lori Corbani, Rebecca Corey, Sharon Cousins, Diane Dahmen, Harold Drabkin, Janan Eppig, Jackie Finger, David Garippa

Lucette Glass, Carroll Goldsmith, Pat Grant, Terry Hayamizu, David Hill, Jim Kadin, Ben King, Debbie Krupke, Moyha Lennon-Pierce, Jill Lewis, Ira Lu, Cathy Lutz, Lois Maltais, Prita Mani, Mike McCrossin, Louise McKenzie, David Miers, Daniel Modrusan, Dieter Naf

Li Ni, Janice Ormsby, Sridhar Ramachandran, Deborah Reed, Joel Richardson, Martin Ringwald, David Shaw, Bob Sinclair, Cynthia Smith, Connie Smith, Paul Szauter, Leslie Trombley, Pierre Vanden Borre, Michael Walker, Linda Washburn, Josh Winslow, Iry Witham, Sophia Zhu