Databases to Support Disease-Focused Research
Type 1 Diabetes, Huntington’s Disease
Nat Goodman
Institute for Systems Biology
January 2003
Slide 2 - VanBUG January, 2003 - Nat Goodman
The Basic Idea
- Database (website) to support research of scientists working on diseases of interest
- Key challenge: make it useful!
- Data must be
  - relevant to current research
  - rigorously accurate
  - timely
  - coordinated with other databases
- Steering committee provides scientific direction
- Also, easy-to-use, yadda, yadda, yadda
Slide 3
What Else is Like This?
- Other disease-focused websites
  - Alzheimer Research Forum (Alzforum) http://www.alzforum.org
  - ALS Therapy Development Foundation (ALS-TDF)
- Technology disease databases
  - Stanford breast cancer microarray website
  - Any others?
- Model organism databases
  - MGD, FlyBase, WormBase, TAIR, SGD, …
- Protein family databases
  - GPCRs, cytochrome P450s, …
- Locus-specific databases
  - HLA, CF, …
- Alliance for Cellular Signaling (AfCS)-Nature Gateway
Slide 4
Potential Data Scope
- Genomic regions
- Genes & proteins
  - functional summaries
  - curated sequences, genomic context, structures
  - orthologs, families, multiple alignments
- Microarray results
- Genotypes
- Protein-protein interactions
- Pathway models
- Empirical results on hot topics
- Reagents
  - antibodies, mouse models, clones, constructs, …
- Therapeutic studies
  - drug, transplantation, gene transfer
  - molecular, cellular, lower organism, mouse, other mammals
  - clinical
- Patient information
  - clinical & pathologic features
- Biomarkers
- Literature scanning and alerting
- Reports of negative and “ho-hum” results
- Lay explanations
Slide 5
Practical Concerns
- Too much data: prioritize!
  - Steering committee to the rescue
- Too much overlap: collaborate!
  - Alzforum, RefSeq, GO, Stanford HOPES!!!, OMIM, BIND, ?? MGD, MEDLINE
- Too much software: reuse!
  - Alzforum, other collaborating databases, PubCrawler, GBrowse, BioPerl, Generic Model Organism Database (GMOD)
Slide 6
Some Differences Between Projects
Data Type | Type 1 Diabetes | HD
Genomics | ~17 susceptibility regions | Single gene disorder
Genes | Several hundred genes in susceptibility regions | ~40 huntingtin (Htt) interactors; ~100 genes of interest
Microarray | A few datasets available | Hereditary Disease Array Group led by Jim Olson; Others?
Genotyping | Consortium for fine-scale mapping | Two efforts to map age-of-onset modifiers
Therapies | Coordinated program for islet cell transplantation; Gene & drug therapy; Pharma, too! | Semi-coordinated program for drug screening; Separate clinical studies; Orphan disease
Slide 7
First Data Scope for HD Website
Data Type | Details
Large scale datasets | Mouse & molecular drug screening; Protein-protein interactions (Hughes, Myriad Proteomics); Protein abundance in cerebrospinal fluid (Watts, ISB)
Gene list | Human, mouse, rat orthologs; Sequences; Functional summaries
Empirical results | Example: Htt interaction with transcription factors - binding, transcriptional activity, cell death
Reagents | Antibodies; Genetic constructs
Pathway models | Hypothesized disease mechanisms; Example: Htt & CREB-mediated transcription
Slide 8
Pathway Model (Wild type)
Normal CREB-mediated transcription
Software: VisualCell™ from Gene Network Sciences
Slide 9
Pathway Model (Diseased)
Software: VisualCell™ from Gene Network Sciences
Slide 10
Steering Committee Response
Slide 11
Steering Committee Guidelines
- Peer-review!
- Connect everything to literature
- Rigorously scrutinized, but diverse, science
- Data – “just the facts, Ma’am” – not conjecture
- Hypotheses presented as such – not as fact
Slide 12
My Response
Hmm… this is kinda narrow for a community website
Slide 13
Compromise
- Community information: non-reviewed
- Primary datasets: non-reviewed
- Core: reviewed scientific material, tied to literature
- Steering committee in charge!
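
One way to read the core / non-reviewed split is as a publication rule applied to every entry. The sketch below is a minimal illustration in Python (the talk itself names Perl / CGI scripts). The `Entry` class, its field names, and the `tier` and `display_label` functions are hypothetical and are not taken from the actual site.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """One item on the site. All field names are illustrative assumptions."""
    title: str
    kind: str                                            # "data" or "hypothesis"
    citations: List[str] = field(default_factory=list)   # literature references
    peer_reviewed: bool = False                          # passed committee review


def tier(entry: Entry) -> str:
    """Return "core" only for reviewed material that is tied to literature.

    Everything else (community information, primary datasets) is treated
    as non-reviewed satellite content.
    """
    if entry.peer_reviewed and entry.citations:
        return "core"
    return "satellite"


def display_label(entry: Entry) -> str:
    """Mark hypotheses explicitly so they are never presented as fact."""
    prefix = "Hypothesis: " if entry.kind == "hypothesis" else ""
    return f"[{tier(entry)}] {prefix}{entry.title}"
```

Under these assumptions, a reviewed and cited pathway model would be labeled as a core hypothesis, while an uncited primary dataset would stay in the satellite tier until it is reviewed.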
Slide 14
Current Core Data Scope
Data Type | Details
Comprehensive bibliography | Milestone papers; Annotation by curators & committee; User comments
Published drug screens in mouse | Bibliography & dataset
Mouse models | Bibliography & dataset
Antibodies | Bibliography & dataset
Published microarray studies | Bibliography, lists of changed genes, links to full datasets
Gene list | Bibliography; Human, mouse, rat orthologs; Sequences; Htt interactions; Short functional descriptions
Slide 15
Current Core Services
- Genome / gene browser
  - View genes in human, mouse, rat syntenic regions
  - Accesses UC Santa Cruz DAS server plus local databases
  - All standard Santa Cruz information visible here, too
  - Based on GBrowse – collaboration with L. Stein
- Literature alerting
  - Specify MEDLINE queries
  - Can include our bibliographies
  - System runs periodically to get new hits
  - Based on PubCrawler – collaboration with K. Wolfe, K. Hokamp
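
The literature-alerting loop (stored queries, periodic runs, report only new hits) can be sketched as follows. This is a minimal Python illustration and not PubCrawler's implementation. The function names and the `search` callback are hypothetical, and the NCBI E-utilities `esearch` URL is shown only as one plausible way to run a stored MEDLINE query.

```python
from typing import Callable, Dict, Iterable, List, Set
from urllib.parse import urlencode

# NCBI E-utilities search endpoint, used here only to show how a stored
# query could become a request; the talk does not say how queries were run.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query: str, retmax: int = 100) -> str:
    """Build a PubMed search URL for one stored query."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


def new_hits(current_ids: Iterable[str], seen_ids: Set[str]) -> List[str]:
    """Return IDs not reported before, in search order, without duplicates."""
    fresh: List[str] = []
    for pmid in current_ids:
        if pmid not in seen_ids and pmid not in fresh:
            fresh.append(pmid)
    return fresh


def run_alerts(
    queries: Dict[str, str],
    seen: Dict[str, Set[str]],
    search: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """One periodic run: execute each stored query and collect its new hits.

    `search` is a hypothetical callback that returns citation IDs for a
    query; `seen` is updated in place so the next run skips these hits.
    """
    alerts: Dict[str, List[str]] = {}
    for name, query in queries.items():
        already_seen = seen.setdefault(name, set())
        fresh = new_hits(search(query), already_seen)
        already_seen.update(fresh)
        if fresh:
            alerts[name] = fresh
    return alerts
```

Passing the search step in as a callback keeps the bookkeeping testable without network access; a scheduler such as cron would call `run_alerts` at whatever interval the site chooses.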
Slide 16
Current Satellite Data Scope
Data Type | Details
News | Like news in Science and Nature
Forum | Interviews with leading scientists; Live discussions on hot topics with subsequent transcripts; Web delivery of presentations; Mini-reviews derived from above
Calendar of events | Conferences, etc.
Contact info for HD researchers | With permission!
Lay explanations | For major sections, at least
Primary datasets | Protein-protein interactions (Hughes); Protein abundance in CSF (Watts)
Slide 17
Help From Our Friends
Data Type | Who | What
All bibliographies | Alzforum | citation database
Comprehensive bibliography | Alzforum | scanning & librarian
Mouse models | Alzforum | database
Antibodies | Alzforum | database & curator
Published microarray studies | HDAG | data & review
Gene list | MGD | orthologs (we hope)
Gene list | RefSeq | sequences, descriptions
Gene list | GO | annotations
Gene list | BIND | Htt interactions
News, forum, calendar, contacts | Alzforum |
Lay explanations | HOPES |
Primary datasets | Myriad, ISB | data
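
The gene list draws each field from a different collaborator, so one way to assemble an entry is to merge per-source records while recording where each field came from. The Python sketch below is an assumption-laden illustration: `build_gene_entry`, the record layout, and every value in `EXAMPLE_SOURCES` are hypothetical placeholders, not real data or real database formats.

```python
from typing import Dict


def build_gene_entry(symbol: str, sources: Dict[str, Dict[str, dict]]) -> dict:
    """Merge per-source records for one gene and keep provenance per field.

    `sources` maps a source name (e.g. "RefSeq") to that source's records,
    keyed by gene symbol. Sources with no record for the gene are listed
    under "missing". If two sources supply the same field name, the later
    source in `sources` wins.
    """
    entry = {"symbol": symbol, "fields": {}, "missing": []}
    for source_name, records in sources.items():
        record = records.get(symbol)
        if record is None:
            entry["missing"].append(source_name)
            continue
        for field_name, value in record.items():
            entry["fields"][field_name] = {"value": value, "source": source_name}
    return entry


# Placeholder records only; none of these values are real identifiers.
EXAMPLE_SOURCES = {
    "MGD": {"HTT": {"orthologs": ["<mouse ortholog>"]}},
    "RefSeq": {"HTT": {"sequence_id": "<accession>", "description": "<text>"}},
    "GO": {"HTT": {"annotations": ["<GO term>"]}},
    "BIND": {},  # no interaction record supplied yet
}

example_entry = build_gene_entry("HTT", EXAMPLE_SOURCES)
```

Tracking the `missing` list makes gaps visible, which matters when a source is still tentative, as with the "we hope" note on MGD orthologs.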
Slide 18
Software Architecture
Perl / CGI scripts
RefSeq
MGD (?)
BIND
local databases
citations
antibodies
mouse models
news & things
Alzforum
Other friends
Delivery by FTP & API, too
web delivery
Slide 19
Genome Browser Screenshot
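
The genome browser is described under Current Core Services as reading from a DAS server plus local databases. As a rough sketch of what a DAS client does, the Python below builds a DAS 1 style `features` request and parses a DASGFF response. The base URL, data source name, and the embedded `SAMPLE` document are made up for illustration; this is not the GBrowse code and not real UC Santa Cruz output.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class Feature(NamedTuple):
    feature_id: str
    type_id: str
    start: int
    end: int


def features_url(base: str, dsn: str, ref: str, start: int, stop: int) -> str:
    """Build a DAS features request for one segment of a reference sequence."""
    return f"{base}/{dsn}/features?segment={ref}:{start},{stop}"


def parse_features(xml_text: str) -> List[Feature]:
    """Extract id, type, start and end from each FEATURE in a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    for feat in root.iter("FEATURE"):
        features.append(Feature(
            feature_id=feat.get("id", ""),
            type_id=feat.find("TYPE").get("id", ""),
            start=int(feat.findtext("START").strip()),
            end=int(feat.findtext("END").strip()),
        ))
    return features


# Hand-written sample response; coordinates and ids are placeholders.
SAMPLE = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features">
    <SEGMENT id="4" start="1" stop="1000">
      <FEATURE id="feature-1" label="example gene">
        <TYPE id="gene">gene</TYPE>
        <METHOD id="curated">curated</METHOD>
        <START>100</START>
        <END>900</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

sample_features = parse_features(SAMPLE)
```

A browser could overlay features parsed this way on top of records from its local databases, so that remote annotation and local curation appear in one view.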
Slide 20
Alzforum Home Page
Slide 21
Alzforum Papers of the Week
Slide 22
Alzforum Mouse Model List
Slide 23
A Few Words About IP
- Open source
- Open data
- Strong privacy
Slide 24
Four Rules for a Successful Website
1. Too much data
   - Prioritize!
   - What will be most useful?
   - Rely on scientific experts
2. Too much software
   - Reuse!
   - Lots of great software available
   - Developers willing to help
3. Too much overlap
   - Collaborate!
   - Many databases welcome this
   - Less work – better product – more fun!
4. Obsess on quality
   - Bad data wastes everyone’s time
Slide 25
Acknowledgements
- ISB Project Team: George Lake, Michelle Whiting, Paul Edlefsen, Robert Hubley
- HDF: Carl Johnson, Minka van Beuzekom
- Steering Committee
  - Carl Johnson and Minka van Beuzekom, HDF
  - Dan Goldowitz, University of Tennessee
  - Emma Hockly, Guy’s Hospital
  - Bruce Kristal, Cornell University
  - Marcy MacDonald, Massachusetts General Hospital
  - Ray Truant, McMaster University
- Alzforum: June Kinoshita
- RefSeq: Kim Pruitt
- GO Consortium: Evelyn Camon
- HOPES: Bill Durham
- HDAG: Jim Olson
- Myriad Proteomics: Bob Hughes
- ISB: Julian Watts