Page 1: Analysis Environments For Scientific Communities From Bases to Spaces Bruce R. Schatz Institute for Genomic Biology University of Illinois at Urbana-Champaign

Analysis Environments For Scientific Communities

From Bases to Spaces

Bruce R. Schatz
Institute for Genomic Biology

University of Illinois at Urbana-Champaign
www.beespace.uiuc.edu

Baker Center for Bioinformatics
Iowa State University

October 6, 2006

What are Analysis Environments?

Functional Analysis
- Find the underlying mechanisms of genes, behaviors, diseases

Comparative Analysis
- Top-down data mining (vs. bottom-up)
- Multiple sources, especially literature

Building Analysis Environments

Manual, by Humans
- Interaction: user navigation
- Classification: collection indexing

Automatic, by Computers
- Federation: search bridges
- Integration: result links

Trends in Analysis Environments

Central versus Distributed Viewpoints

The 1990s, Pre-Genome: Entrez (NIH NCBI) versus WCS (NSF, Arizona)

The 2000s, Post-Genome: GO (NIH curators) versus BeeSpace (NSF, Illinois)

Pre-Genome Environments

Focused on Syntax, pre-Web

WCS (Worm Community System)
- Search words across sources
- Follow links across sources
- Words automatic, links manual

Towards Integrated Searching
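The WCS pattern above can be illustrated with a minimal sketch. It assumes a plain whitespace tokenizer and a hand-maintained link table. Words from every source go into one index automatically, while links between sources are entered by hand. The source names, document ids, sample text, and `MANUAL_LINKS` table are hypothetical. The deck does not describe how WCS was actually implemented.

```python
from collections import defaultdict

# Illustrative stand-ins for two WCS-style sources (made-up sample text, not real WCS data).
SOURCES = {
    "literature": {
        "lit1": "touch response in unc-86 mutants",
        "lit2": "mec-3 expression in touch receptor neurons",
    },
    "genes": {
        "gene:unc-86": "unc-86 gene record",
        "gene:mec-3": "mec-3 gene record",
    },
}

# "Links manual": cross-source links are curated by hand (hypothetical table).
MANUAL_LINKS = {"lit1": ["gene:unc-86"], "lit2": ["gene:mec-3"]}


def build_index(sources):
    """'Words automatic': one inverted index built over every source at once."""
    index = defaultdict(set)
    for source, docs in sources.items():
        for doc_id, text in docs.items():
            for word in text.lower().split():
                index[word].add((source, doc_id))
    return index


def search(index, word):
    """Return sorted (source, doc_id) hits for a word across all sources."""
    return sorted(index.get(word.lower(), set()))


def follow_links(doc_id):
    """Follow the hand-curated links from one document into other sources."""
    return MANUAL_LINKS.get(doc_id, [])


index = build_index(SOURCES)
print(search(index, "unc-86"))  # [('genes', 'gene:unc-86'), ('literature', 'lit1')]
print(follow_links("lit1"))     # ['gene:unc-86']
```

A single word query reaches both the literature and the gene records, which is the "search words across sources" half of the design. Crossing from a paper to a gene record still depends on someone having entered the link.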

Post-Genome Environments

Focused on Semantics, post-Web

BeeSpace (Honey Bee Inter Space)
- Navigate concepts across sources
- Integrate data across sources
- Concepts automatic, links automatic

Towards Conceptual Navigation

Worm Community System

WCS Information:
- Literature: BIOSIS, MEDLINE, newsletters, meetings
- Data: genes, maps, sequences, strains, cells

WCS Functionality:
- Browsing: search, navigation
- Filtering: selection, analysis
- Sharing: linking, publishing

WCS: 250 users at 50 labs across the Internet (1991)

WCS Molecular

WCS Cellular

WCS invokes gm

WCS vis-à-vis acedb

Towards the Interspace

from Objects to Concepts
from Syntax to Semantics

Infrastructure is Interaction with Abstraction

Internet is packet transmission across computers
Interspace is concept navigation across repositories

THE THIRD WAVE OF NET EVOLUTION

PACKETS

OBJECTS

CONCEPTS

LEVELS OF INDEXES

[Figure: levels of indexes, from FORMAL (manual) to INFORMAL (automatic). Labels: Technology, Engineering, Electrical, IEEE; communities, groups, individuals]

Post-Genome Informatics I

Comparative Analysis within the Dry Lab of Biological Knowledge

Classical Organisms have Genetic Descriptions. There will be NO more classical organisms beyond Mice and Men, Worms and Flies, Yeasts and Weeds.

So we must use comparative genomics on classical organisms, via sequence homologies and literature analysis.

Post-Genome Informatics II

Functional Analysis within the Dry Lab of Biological Knowledge

Automatic annotation of genes to standard classifications, e.g. Gene Ontology via homology on computed protein sequences.
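Annotation transfer by homology can be sketched in a few lines. This sketch assumes BLAST-style hit tuples and a best-hit rule with a hypothetical E-value cutoff of 1e-10. Under that rule, each bee protein inherits the GO terms of its best model-organism hit. Gene names and GO labels are placeholders. Real annotation pipelines usually apply stricter orthology checks than a single best hit.

```python
# Hypothetical BLAST-style hits: (query bee protein, subject fly protein, E-value).
HITS = [
    ("bee_gene_1", "fly_gene_A", 1e-50),
    ("bee_gene_1", "fly_gene_B", 1e-12),
    ("bee_gene_2", "fly_gene_C", 0.5),  # too weak to trust under the default cutoff
]

# Hypothetical GO annotations already curated for the model organism.
FLY_GO = {
    "fly_gene_A": {"GO:term-1", "GO:term-2"},
    "fly_gene_B": {"GO:term-3"},
    "fly_gene_C": {"GO:term-4"},
}


def transfer_annotations(hits, model_go, max_evalue=1e-10):
    """Copy GO terms from each query's best hit whose E-value passes the cutoff."""
    best = {}  # query -> (subject, evalue) of the strongest acceptable hit
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # skip weak homologies
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {q: sorted(model_go.get(s, set())) for q, (s, _) in best.items()}


print(transfer_annotations(HITS, FLY_GO))  # {'bee_gene_1': ['GO:term-1', 'GO:term-2']}
```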

Automatic analysis of functions in the scientific literature, e.g., concept spaces via text extraction. Thus we must use the functions given in literature descriptions.
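The deck does not say how concept spaces are computed. One common approach, assumed here, is to link concepts that co-occur in the same documents. This sketch takes the concepts as already extracted. It weights each link asymmetrically: the weight from a to b is the fraction of documents containing a that also contain b. The sample documents are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of a text-extraction step: the concepts found in each abstract.
DOCS = [
    {"foraging", "juvenile hormone", "brain"},
    {"foraging", "juvenile hormone"},
    {"foraging", "mushroom bodies", "brain"},
    {"nursing", "brood"},
]


def concept_space(docs):
    """Build weighted links between concepts from document co-occurrence."""
    df = Counter()    # number of documents containing each concept
    pair = Counter()  # number of documents containing both concepts of a pair
    for concepts in docs:
        df.update(concepts)
        for a, b in combinations(sorted(concepts), 2):
            pair[(a, b)] += 1
    space = {}
    for (a, b), n in pair.items():
        # Asymmetric weights: share of a's documents that mention b, and vice versa.
        space.setdefault(a, {})[b] = n / df[a]
        space.setdefault(b, {})[a] = n / df[b]
    return space


def related(space, concept, k=3):
    """Rank the concepts most strongly linked to `concept` (ties broken by name)."""
    links = space.get(concept, {})
    return sorted(links.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


space = concept_space(DOCS)
print(related(space, "juvenile hormone"))  # [('foraging', 1.0), ('brain', 0.5)]
```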

Informatics: From Bases to Spaces

Data Bases support genome data, e.g., FlyBase has sequences and maps; genes are annotated by Gene Ontology and linked to the biological literature.

Information Spaces support the biological literature, e.g., BeeSpace uses automatically generated conceptual relationships to navigate functions.

BeeSpace FIBR Project

The BeeSpace project is an NSF FIBR (Frontiers in Integrative Biological Research) flagship: $5M for 5 years at the University of Illinois.

Analyzing Nature and Nurture in Societal Roles, using the honey bee as a model

(Functional Analysis of Social Behavior)

Genomic technologies in wet lab and dry lab:
- Bee [Biology]: gene expression
- Space [Informatics]: concept navigation

System Architecture

Concept Navigation in BeeSpace

[Figure: concept navigation across sources. Labels: Neuroscience Literature, Molecular Biology Literature, Bee Literature, FlyBase/WormBase, Bee Genome, Brain Region Localization, Brain Gene Expression Profiles; users: Behavioral Biologist, Molecular Biologist, Neuroscientist]

V1 BeeSpace Community Collections

- Organism: Honey Bee, Fruit Fly, Song Bird, Soy Bean
- Behavior: Social, Territorial, Foraging, Nesting
- Development: Behavioral Maturation, Insect Development, Insect Communication
- Structure: Fly Genetics, Fly Biochemistry, Fly Physiology, Insect Neurophysiology

CONCEPT SWITCHING

“Concept” versus “Term”: a concept is a set of “semantically” equivalent terms

Concept switching: a region-to-region (set-to-set) match

[Figure: a concept space in which individual terms group into semantic regions]
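Concept switching can be sketched as a set-to-set match. This sketch assumes Jaccard overlap as the similarity measure and a hypothetical cutoff of 0.1; both are stand-ins for whatever measure BeeSpace actually used. Each concept is a set of terms treated as equivalent within one community's space. Switching maps a whole region to the best-overlapping region in another space. The region names and term sets are invented.

```python
# Hypothetical concept regions: each concept is a set of terms treated as
# semantically equivalent within one community's concept space.
BEE_SPACE = {
    "bee:foraging": {"foraging", "nectar collection", "pollen collection", "forager"},
    "bee:nursing": {"nursing", "brood care", "nurse bee"},
}
FLY_SPACE = {
    "fly:feeding": {"feeding", "foraging", "food search", "forager"},
    "fly:courtship": {"courtship", "mating song"},
}


def jaccard(a, b):
    """Set overlap: 0.0 for disjoint term sets, 1.0 for identical ones."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def switch_concept(region, target_space, min_score=0.1):
    """Match a whole region (a set of terms) to the most similar region in another space."""
    if not target_space:
        return None
    score, name = max((jaccard(region, terms), name) for name, terms in target_space.items())
    return (name, score) if score >= min_score else None


print(switch_concept(BEE_SPACE["bee:foraging"], FLY_SPACE))  # ('fly:feeding', 0.3333333333333333)
print(switch_concept(BEE_SPACE["bee:nursing"], FLY_SPACE))   # None
```

A lookup of the single term "nectar collection" finds nothing in `FLY_SPACE`. Matching the whole region still reaches `fly:feeding` through the terms the two regions share.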

BeeSpace Analysis Environment

Build a Concept Space of Biomedical Literature for Functional Analysis of Bee Genes

- Partition Literature into Community Collections
- Extract and Index Concepts within Collections
- Navigate Concepts within Documents
- Follow Links from Documents into Databases

Locate Candidate Genes in Related Literatures, then follow links into Genome Databases
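The four steps above can be sketched as one small pipeline. It assumes each document already carries a collection label, its extracted concepts, and the gene symbols it mentions. The documents, the `GENE_LINKS` table, and the database record ids are hypothetical placeholders, not BeeSpace's actual data model.

```python
from collections import defaultdict

# Hypothetical documents: collection label, extracted concepts, and gene symbols mentioned.
DOCS = [
    {"id": "d1", "collection": "bee behavior", "concepts": {"foraging"}, "genes": {"for"}},
    {"id": "d2", "collection": "fly genetics", "concepts": {"foraging", "larval behavior"}, "genes": {"for"}},
    {"id": "d3", "collection": "fly genetics", "concepts": {"courtship"}, "genes": {"fru"}},
]

# Hypothetical link table from gene symbols to genome-database record ids.
GENE_LINKS = {"for": "db:record-for", "fru": "db:record-fru"}


def partition(docs):
    """Step 1: split the literature into community collections."""
    collections = defaultdict(list)
    for doc in docs:
        collections[doc["collection"]].append(doc)
    return collections


def index_concepts(docs):
    """Step 2: index concepts within one collection (concept -> documents)."""
    index = defaultdict(list)
    for doc in docs:
        for concept in doc["concepts"]:
            index[concept].append(doc)
    return index


def candidate_genes(collections, collection, concept):
    """Steps 3-4: navigate a concept to its documents, then follow gene links into the database."""
    index = index_concepts(collections.get(collection, []))
    genes = set()
    for doc in index.get(concept, []):
        genes |= doc["genes"]
    return {g: GENE_LINKS.get(g) for g in sorted(genes)}


collections = partition(DOCS)
print(candidate_genes(collections, "fly genetics", "foraging"))  # {'for': 'db:record-for'}
```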

Well Characterized Gene

Poorly Characterized Gene

Gene Summarization, BeeSpace V2

Collaboration across Users

Category Browse (Collection)

Category Browse (Search)

PlantSpace Examples

Interactive Functional Analysis

BeeSpace will enable users to navigate a uniform space of diverse databases and literature sources for hypothesis development and testing. It is a software system that goes beyond a searchable database, using literature analyses to discover functional relationships between genes and behavior.

- Genes to Behaviors
- Behaviors to Genes
- Concepts to Concepts
- Clusters to Clusters
- Navigation across Sources

BeeSpace Information Sources

General for All Spaces:
- Scientific Literature: Medline, Biosis, CAB Abstracts
- Genome Databases: GenBank, ProteinDataBank, ArrayExpress

Special for BeeSpace:
- Model Organisms (heredity): Gene Descriptions (FlyBase, WormBase)
- Natural Histories (environment): BeeKeeping Books (Cornell, Harvard)

XSpace Information Sources

- Organize Genome Databases (XBase)
- Compute Gene Descriptions from Model Organisms
- Partition Scientific Literature for Organism X
- Compute XSpace using Semantic Indexing

Boost the Functional Analysis from Special Sources:
- Collecting Useful Data about Natural Histories
- e.g., CowSpace: Leverage in AIPL Databases

Towards SoySpace

- Organize Genome Databases (SoyBase)
- Partition Scientific Literature for Soybean
- Gene Descriptions from Models (TAIR)
- Natural Histories from Population Databases

The Key to Functional Analysis is Special Sources:
- Collecting Appropriate Text about Genes
- Extracting Adequate Data about Histories
- Leverage: National Archives of germplasm and Historical Records for soybean crops

Towards the Interspace

The Analysis Environment technology is GENERAL!

BirdSpace? BeeSpace? PigSpace? CowSpace? BehaviorSpace? BrainSpace? SoySpace? PlantSpace?

BioSpace… Interspace