
University of Illinois at Urbana-Champaign
INSTITUTE FOR GENOMIC BIOLOGY

BeeSpace: An Interactive Environment
for Functional Analysis of Social Behavior

Bruce Schatz, Principal Investigator
Graduate School of Library & Information Science (GSLIS)
Department of Computer Science, Program in Neuroscience
[email protected], www.canis.uiuc.edu

Theme for Genomics of Neural and Behavioral Plasticity

www.beespace.uiuc.edu

IGB Thematic Research Seminar, November 2, 2004


Bee Counted – Vote Today!


BeeSpace FIBR Project
BeeSpace project is an NSF FIBR flagship
- Frontiers in Integrative Biological Research, $5M for 5 years at University of Illinois

Nature-Nurture using honey bee as model
Genome technologies in wet lab and dry lab biology

Localized Gene Expression for Normal Social Behavior
- Gene Robinson, Entomology (behavioral expressions)
- Susan Fahrbach, Entomology (anatomical localization)
- Sandra Rodriguez-Zas, Animal Sciences (data analysis)

Interactive Information System for Functional Analysis
- Bruce Schatz, Library & Information Science (info systems)
- ChengXiang Zhai, Computer Science (text analysis)
- Chip Bruce, Library & Information Science (user support)


Post-Genome Informatics
Classical Organisms have extensive Genetic Descriptions
There will be NO more classical organisms beyond Mice and Men, Worms and Flies, Yeasts and Weeds.

So we must use comparative genomics against the classical organisms, via sequence homologies and literature analysis.

Automatic annotation of genes to standard classifications, such as the Gene Ontology, via sequence homology.

Automatic analysis of functions in the scientific literature, such as concept spaces via text mining.

Descriptions in Literature MUST be used for future interactive environments for functional analysis!
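As a minimal sketch of homology-based annotation, the Python below copies Gene Ontology terms from the best-scoring homolog of each bee gene. The `Hit` record, the `transfer_go_terms` helper, the E-value cutoff of 1e-10, and all gene identifiers are hypothetical; real annotation pipelines usually apply stricter orthology criteria than a single best hit.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One homology-search hit (hypothetical record layout)."""
    query: str     # bee gene identifier
    subject: str   # model-organism gene identifier
    evalue: float  # alignment E-value; smaller is more significant


def transfer_go_terms(hits, go_by_gene, max_evalue=1e-10):
    """Annotate each query gene with the GO terms of its best homolog.

    Hits weaker than max_evalue are ignored; among the remaining hits,
    the one with the smallest E-value wins.
    """
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return {
        query: sorted(go_by_gene.get(hit.subject, set()))
        for query, hit in best.items()
    }


# Toy inputs, invented for illustration.
hits = [
    Hit("bee_gene_1", "fly_gene_A", 1e-40),
    Hit("bee_gene_1", "worm_gene_B", 1e-12),
    Hit("bee_gene_2", "fly_gene_C", 0.5),  # too weak to trust
]
go_by_gene = {"fly_gene_A": {"GO:0008150"}, "worm_gene_B": {"GO:0003674"}}
annotations = transfer_go_terms(hits, go_by_gene)
print(annotations)  # {'bee_gene_1': ['GO:0008150']}
```

The cutoff is the main design lever: loosening it annotates more genes but transfers more spurious functions.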


Informational Science
Computational Science is widely accepted as the Third Branch of Science (beyond Experimental and Theoretical)

Genes are Computed, Proteins are Computed, Sequence “equivalences” are Computed.

Informational Science is coming to be accepted as the Fourth Branch of Science

Based on Information Science technologies for Functional Mining of Information Sources

Comparative Analysis within the Dry Lab of Biological Knowledge


Conceptual Navigation in BeeSpace

[Figure: information sources (Neuroscience Literature, Molecular Biology Literature, Bee Literature, FlyBase and WormBase, Bee Genome, Brain Region Localization, Brain Gene Expression Profiles) and users (Behavioral Biologist, Molecular Biologist, Neuroscientist)]


Biology: The Model Organism
The Western Honey Bee, Apis mellifera, has become a primary model for social behavior

Complex social behavior in controllable urban environment
- Normal Behavior – honey bees live in the wild
- Controllable Environment – hives can be modified

Small size manageable with current genomic technology
- Capture bees on-the-fly during normal behavior
- Record gene expressions for whole-brain or brain-region


Informatics: From Bases to Spaces
data Bases support genome data
- e.g. FlyBase has sequences and maps
- Genes annotated by Gene Ontology and linked to literature

- BeeBase (Christine Elsik, Texas A&M) uses computed homologies to annotate genes

information Spaces support biomedical literature
- e.g. BeeSpace uses automatically generated conceptual relationships to navigate functions


BeeSpace Software Environment
Will build a Concept Space of Biomedical Literature for Functional Analysis of Bee Genes

- Partition Literature into Community Collections
- Extract and Index Concepts within Collections
- Navigate Concepts within Documents
- Follow Links from Documents into Databases

Locate Candidate Genes in Related Literatures, then follow links into Genome Databases
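The four steps on this slide can be sketched as a toy pipeline: documents are partitioned into community collections, concepts are indexed per collection, and a concept lookup leads to documents and then to gene records. The document fields, the helpers `build_communities`, `index_concepts`, and `candidate_genes`, and the in-memory `gene_db` standing in for a genome database are all hypothetical.

```python
from collections import defaultdict


def build_communities(docs):
    """Partition documents into community collections keyed by label."""
    communities = defaultdict(list)
    for doc in docs:
        for label in doc["communities"]:
            communities[label].append(doc)
    return dict(communities)


def index_concepts(collection):
    """Inverted index for one collection: concept -> ids of documents using it."""
    index = defaultdict(set)
    for doc in collection:
        for concept in doc["concepts"]:
            index[concept].add(doc["id"])
    return dict(index)


def candidate_genes(collection, index, concept, gene_db):
    """Follow a concept to its documents, then to the genes those documents cite."""
    doc_ids = index.get(concept, set())
    genes = set()
    for doc in collection:
        if doc["id"] in doc_ids:
            genes.update(doc["genes"])
    # Keep only genes that resolve to a database record.
    return {gene: gene_db[gene] for gene in sorted(genes) if gene in gene_db}


# Toy inputs, invented for illustration.
docs = [
    {"id": "d1", "communities": ["foraging"],
     "concepts": ["foraging behavior"], "genes": ["gene_X"]},
    {"id": "d2", "communities": ["foraging", "brain"],
     "concepts": ["mushroom bodies"], "genes": ["gene_Y"]},
]
gene_db = {"gene_X": "record for gene_X", "gene_Y": "record for gene_Y"}

communities = build_communities(docs)
foraging = communities["foraging"]
index = index_concepts(foraging)
print(candidate_genes(foraging, index, "foraging behavior", gene_db))
# {'gene_X': 'record for gene_X'}
```

Indexing each collection separately keeps concept statistics specific to one community, which is what makes later switching between collections meaningful.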


BeeSpace Software Implementation

Natural Language Processing

- Identify noun phrases
- Recognize biological entities
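A minimal sketch of these two steps, assuming the text has already been part-of-speech tagged with Penn Treebank tags (the tagger itself is not shown): `noun_phrases` keeps maximal runs of adjectives and nouns that end in a noun, and `tag_entities` recognizes biological entities by lookup in a hypothetical lexicon. Both helpers are illustrative; the slide does not say which NLP methods BeeSpace used.

```python
def noun_phrases(tagged):
    """Return maximal adjective/noun runs that end in a noun.

    tagged is a list of (word, Penn Treebank tag) pairs.
    """
    phrases, current = [], []
    # A sentinel tag flushes the final run.
    for word, tag in tagged + [("", "END")]:
        if tag.startswith(("JJ", "NN")):
            current.append((word, tag))
            continue
        # Drop trailing adjectives so each phrase ends in a noun.
        while current and not current[-1][1].startswith("NN"):
            current.pop()
        if current:
            phrases.append(" ".join(w for w, _ in current))
        current = []
    return phrases


def tag_entities(phrases, lexicon):
    """Label phrases found in an entity lexicon (case-insensitive)."""
    return {p: lexicon[p.lower()] for p in phrases if p.lower() in lexicon}


tagged = [
    ("The", "DT"), ("foraging", "NN"), ("behavior", "NN"), ("of", "IN"),
    ("worker", "NN"), ("honey", "NN"), ("bees", "NNS"), ("is", "VBZ"),
    ("regulated", "VBN"), ("by", "IN"), ("juvenile", "JJ"), ("hormone", "NN"),
]
phrases = noun_phrases(tagged)
print(phrases)  # ['foraging behavior', 'worker honey bees', 'juvenile hormone']
lexicon = {"juvenile hormone": "chemical"}  # hypothetical entity lexicon
print(tag_entities(phrases, lexicon))  # {'juvenile hormone': 'chemical'}
```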

Statistical Information Retrieval
- Compute statistical contexts
- Support conceptual navigation
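One simple way to compute statistical contexts for conceptual navigation is document co-occurrence. The sketch below uses an asymmetric weight, w(a -> b) = df(a, b) / df(a), the fraction of documents containing concept a that also contain b. This is an assumed simplification for illustration; the concept-space computations behind BeeSpace may weight terms differently, and the toy documents are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(doc_concepts):
    """Asymmetric weights w(a -> b) = df(a, b) / df(a) over a document collection."""
    df = Counter()       # documents containing each concept
    pair_df = Counter()  # documents containing each ordered concept pair
    for concepts in doc_concepts:
        unique = sorted(set(concepts))
        df.update(unique)
        for a, b in combinations(unique, 2):
            pair_df[(a, b)] += 1
            pair_df[(b, a)] += 1
    return {(a, b): n / df[a] for (a, b), n in pair_df.items()}


def related_concepts(weights, term, top_k=3):
    """Concepts most strongly associated with term, strongest first."""
    scored = [(b, w) for (a, b), w in weights.items() if a == term]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Toy collection, invented for illustration: one concept list per abstract.
docs = [
    ["foraging", "juvenile hormone", "mushroom bodies"],
    ["foraging", "juvenile hormone"],
    ["foraging", "octopamine"],
    ["mushroom bodies", "learning"],
]
weights = cooccurrence_weights(docs)
for concept, weight in related_concepts(weights, "foraging"):
    print(f"{concept}: {weight:.2f}")
```

The asymmetry matters for navigation: every toy document mentioning juvenile hormone also mentions foraging, so w(juvenile hormone -> foraging) is 1.0, while w(foraging -> juvenile hormone) is only about 0.67.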

Network Information System
- Concept switch across community collections
- Semantic Links into biological databases
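Concept switching can be pictured as carrying a term's context from one community collection into another. In this hypothetical sketch, a term and its strongest associates in the source collection act as weighted seeds, and concepts in the target collection are ranked by how strongly those seeds point to them. The scoring rule, the helper names, and the toy bee and fly collections are assumptions for illustration, not the system's documented method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(doc_concepts):
    """Asymmetric weights w(a -> b) = df(a, b) / df(a)."""
    df, pair_df = Counter(), Counter()
    for concepts in doc_concepts:
        unique = sorted(set(concepts))
        df.update(unique)
        for a, b in combinations(unique, 2):
            pair_df[(a, b)] += 1
            pair_df[(b, a)] += 1
    return {(a, b): n / df[a] for (a, b), n in pair_df.items()}


def concept_switch(term, source_docs, target_docs, top_k=3):
    """Rank target-collection concepts related to term's source-collection context."""
    source = cooccurrence_weights(source_docs)
    target = cooccurrence_weights(target_docs)
    # Seeds: the term itself (weight 1.0) plus its source-side associates.
    seeds = {term: 1.0}
    for (a, b), w in source.items():
        if a == term:
            seeds[b] = w
    # Score each non-seed target concept by the seed weights that reach it.
    scores = Counter()
    for (a, b), w in target.items():
        if a in seeds and b not in seeds:
            scores[b] += seeds[a] * w
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy collections, invented for illustration.
bee_docs = [["foraging", "juvenile hormone"], ["foraging", "mushroom bodies"]]
fly_docs = [
    ["juvenile hormone", "ecdysone"],
    ["mushroom bodies", "learning"],
    ["mushroom bodies", "learning", "olfaction"],
]
print(concept_switch("foraging", bee_docs, fly_docs))
# [('ecdysone', 0.5), ('learning', 0.5), ('olfaction', 0.25)]
```

Note that "foraging" never appears in the fly collection; the switch works only because its bee-side associates do.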


BeeSpace Information Sources

Biomedical Literature
- Medline (medicine)
- Biosis (biology)
- Agricola, CAB Abstracts, Agris (agriculture)

Model Organisms (heredity)
- Gene Descriptions (FlyBase, WormBase)

Natural Histories (environment)
- Beekeeping Books (Cornell Library, Harvard Press)


Worm Community System (1991)

WCS Information Sources
- Literature: Biosis, Medline, newsletters, meetings

- Data: genes, maps, sequences, strains, cells

WCS Interactive Environment
- Browsing: search, navigation
- Filtering: selection, analysis
- Sharing: linking, publishing

WCS: 250 users at 50 labs across the Internet (1991)

Flagship in NSF National Collaboratory program


WCS Molecular


WCS Cellular


WCS PPCS

demo


Medical Concept Spaces (1998)

Obtain discipline-scale collection: Medline from NLM, 10M bibliographic abstracts
Human classification: Medical Subject Headings (MeSH)

Partition discipline into Community Repositories

4 core terms per abstract for MeSH classification
32K nodes with core terms (classification tree)

Community is all abstracts classified by core term
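The partition on this slide, where a community is every abstract classified under a given core term, can be sketched in a few lines. Because each abstract carries several core terms, it joins several communities; counting memberships rather than unique abstracts may explain why the reported total (40M) is four times the 10M-abstract collection, though the slide does not say so explicitly. The abstract ids and the `min_size` threshold are invented; the MeSH headings serve only as labels.

```python
from collections import defaultdict


def partition_by_core_terms(abstracts):
    """Map each core term to the ids of all abstracts classified under it."""
    communities = defaultdict(list)
    for abstract_id, core_terms in abstracts.items():
        for term in core_terms:
            communities[term].append(abstract_id)
    return dict(communities)


def large_communities(communities, min_size):
    """Core terms whose community holds more than min_size abstracts."""
    return sorted(term for term, ids in communities.items() if len(ids) > min_size)


# Toy input, invented for illustration: abstract id -> core MeSH terms.
abstracts = {
    "a1": ["Arthritis, Rheumatoid", "Analgesics"],
    "a2": ["Analgesics", "Gastrointestinal Hemorrhage"],
    "a3": ["Arthritis, Rheumatoid"],
}
communities = partition_by_core_terms(abstracts)
memberships = sum(len(ids) for ids in communities.values())
print(len(abstracts), "abstracts,", memberships, "community memberships")
print(large_communities(communities, min_size=1))
```

Here 3 abstracts yield 5 memberships, and only the two communities with more than one abstract pass the size filter, mirroring the slide's count of repositories above a size threshold.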

40M abstracts containing 280M concepts
Computation took 2 days on NCSA Origin 2000

Simulating World of Medical Communities
10K repositories with > 1K abstracts (1K with > 10K)


Navigation in MedSpace
For a patient with Rheumatoid Arthritis

Find a drug that reduces the pain (analgesic) but does not cause stomach (gastrointestinal) bleeding
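The search in this scenario combines two concept contexts: concepts associated with analgesics, minus those associated with gastrointestinal bleeding. The set-based sketch below only illustrates that logic on invented documents; `drug_a` to `drug_c` are placeholders, and plain co-occurrence stands in for the statistical weights an actual concept space would use.

```python
def cooccurring(docs, term):
    """Concepts appearing in at least one document together with term."""
    found = set()
    for concepts in docs:
        if term in concepts:
            found.update(c for c in concepts if c != term)
    return found


def candidate_drugs(docs, wanted, avoided, is_drug):
    """Drugs linked to the wanted concept but never to the avoided one."""
    linked = cooccurring(docs, wanted) - cooccurring(docs, avoided)
    return sorted(c for c in linked if is_drug(c))


# Toy documents from a hypothetical rheumatoid arthritis community.
docs = [
    {"drug_a", "analgesic", "rheumatoid arthritis"},
    {"drug_b", "analgesic"},
    {"drug_b", "gastrointestinal bleeding"},
    {"drug_c", "analgesic", "gastrointestinal bleeding"},
]
print(candidate_drugs(docs, "analgesic", "gastrointestinal bleeding",
                      is_drug=lambda c: c.startswith("drug_")))
# ['drug_a']
```

A hard set difference discards a drug after a single co-mention with bleeding; a weighted version would instead rank drugs by how much more strongly they associate with analgesia than with bleeding.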

Choose Domain


Concept Search


Concept Navigation


Retrieve Document


Biomedical Session


Categories and Concepts


Concept Switching


Document Retrieval


Biological Concept Spaces (2005)
Compute concept spaces for All of Biology
BioSpace across entire biomedical literature

50M abstracts across 50K repositories

Use Gene Ontology to partition literature into biological communities for functional analysis

GO same scale as MeSH, but adequate coverage?
GO light on social behavior (biological process)


Interactive Functional Analysis
BeeSpace will enable users to navigate a uniform space of diverse databases and literature sources for hypothesis development and testing, with a software system that goes beyond a searchable database, using statistical literature analyses to discover functional relationships between genes and behavior.
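As one example of a statistical literature analysis relating genes to behaviors, the sketch below scores gene-behavior pairs by pointwise mutual information over abstracts, PMI(g, b) = log2(P(g, b) / (P(g) P(b))), where each probability is a fraction of documents. PMI is a swapped-in, commonly used association measure chosen for illustration; the slide does not specify BeeSpace's statistics, and the gene and behavior names are placeholders.

```python
import math
from collections import Counter


def pmi_scores(doc_terms, genes, behaviors):
    """PMI (in bits) for every gene-behavior pair that co-occurs in some document."""
    n_docs = len(doc_terms)
    df = Counter()     # documents mentioning each term
    joint = Counter()  # documents mentioning each gene-behavior pair
    for terms in doc_terms:
        present = set(terms)
        df.update(present)
        for g in genes & present:
            for b in behaviors & present:
                joint[(g, b)] += 1
    # P(g, b) / (P(g) P(b)) simplifies to n_gb * N / (n_g * n_b).
    return {
        (g, b): math.log2(n * n_docs / (df[g] * df[b]))
        for (g, b), n in joint.items()
    }


# Toy abstracts, invented for illustration.
docs = [
    {"gene_1", "foraging"},
    {"gene_1", "foraging"},
    {"gene_2", "foraging"},
    {"gene_2", "aggression"},
]
scores = pmi_scores(docs, genes={"gene_1", "gene_2"},
                    behaviors={"foraging", "aggression"})
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Positive scores mean a pair co-occurs more often than independence would predict, negative scores less often; with tiny counts PMI is noisy, so real analyses usually add minimum-frequency thresholds.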

- Genes to Behaviors
- Behaviors to Genes
- Concepts to Concepts
- Clusters to Clusters
- Navigation across Sources


BeeSpace Information Sources
General for All Spaces:

Scientific Literature
- Medline, Biosis, Agricola, Agris, CAB Abstracts
- partitioned by organisms and by functions

Model Organisms
- Gene Descriptions (FlyBase, WormBase, MGI, SGD, TAIR)

Special Sources for BeeSpace:
- Natural History Books (Cornell Library, Harvard Press)


XSpace Information Sources
- Organize Genome Databases (XBase)
- Compute Gene Descriptions from Model Organisms
- Partition Scientific Literature for Organism X
- Compute XSpace using Semantic Indexing Technology

Boost the Functional Analysis from Special Sources
- Collecting Useful Data about Natural Histories
- e.g. CowSpace Leverage in AIPL Databases


Beyond BeeSpace
The Analysis Environment technology is GENERAL!

BirdSpace? BehaviorSpace? BrainSpace? SoySpace? CowSpace? IGBSpace?

BioSpace

Internet will evolve into Interspace…