NIF Vocabulary Server
Maryann Martone, Ph. D.
NIF Technical Team
•Perry Miller, Yale
•Luis Marenco, Yale
•Yuli Li, Yale
•Arun Rangarajun, Cal Tech
•Hans-Michael Muller, Cal Tech
•Sredevi Polavarum, George Mason
•Jeff Grethe, UCSD
•Brian Sanders, UCSD
•Vadim Astakhov, UCSD
•Amarnath Gupta, UCSD
•Xufei Qian, UCSD
•Bill Bug, UCSD
•Maryann Martone, UCSD
Basic Architecture
•The same architecture and workflow apply to the registration process
Role of NIF Terminologies
NIF terminologies provide a shared vocabulary for annotation of neuroscience data
NIF terminologies provide the shared semantics for accessing resources and data through the NIF interface
Semantic enrichment of terms enables more targeted and meaningful queries
Ultimately, NIF terminologies are critical for data and database interoperability
Building the NIF Terminologies
NIF Basic: Daniel Gardner held a series of workshops with neuroscientists to obtain sets of terms that are useful for neuroscientists
NIFSTD (NIF Standardized): Bill Bug built a set of expanded vocabularies using the structure of BIRNLex and the import of existing terminological resources
•Provides enhanced coverage of domains in NIF Basic
•Provides coverage of domains not included in NIF but covered by existing resources, e.g., molecules
•Encoded in OWL/RDF
•Provides mapping to source terminologies, including NIF Basic
•Provides synonyms, lexical variants, abbreviations
Registering a Resource to NIF
Level 1: NIF Registry
•High-level descriptions from NIF vocabularies supplied by human curators
Level 2***: Discovery mechanism for hidden content (Disco or SiteMaps.org)
Level 3: Direct query of web-accessible database
•Automated registration
•Mapping of database content to NIF vocabulary by a human
***Not yet implemented
Level 1 Registration
•Sites are entered by curators
•Annotation with NIF basic vocabulary + free text
•May be searched with NIFSTD terms
Level 2 Registration
•Automated or semi-automated discovery and indexing of web sites
•Index of web sites registered to NIF registry
•Web content is indexed against the NIFSTD vocabularies
•Discovery mechanism planned (Luis)
•XML will utilize NIFSTD
Level 3: NIF Data Federation
•Allows deep query of database content through a single interface
•Limited number of resources registered for Phase 2: proof of concept and demonstration of deep search via database mediation
•Registration process:
•Create wrapper to allow remote NIF mediator query
•Map content to NIFSTD
•Semi-automatic process based on high-level mapping of fields and data values:
•e.g., SumsDB geography maps to NIFSTD regional part of brain
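A minimal Python sketch of the high-level field mapping step, for illustration only. The field name `geography`, the table `FIELD_TO_NIFSTD` and the function `map_record` are assumptions made for this example and are not part of the NIF mediator; only the pairing of SumsDB geography with NIFSTD "regional part of brain" comes from the slide.

```python
# Hypothetical high-level mapping: (source database, field) -> NIFSTD category.
# Only the SumsDB "geography" pairing is taken from the slide.
FIELD_TO_NIFSTD = {
    ("SumsDB", "geography"): "regional part of brain",
}


def map_record(source, record):
    """Split a record into fields with a NIFSTD category and fields left for a curator."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        category = FIELD_TO_NIFSTD.get((source, field))
        if category is None:
            unmapped[field] = value  # no high-level mapping yet: needs human review
        else:
            mapped[field] = {"value": value, "nifstd_category": category}
    return mapped, unmapped


mapped, unmapped = map_record(
    "SumsDB", {"geography": "hippocampus", "notes": "left hemisphere"}
)
print(mapped)
print(unmapped)
```

Fields without a high-level mapping are returned separately, which mirrors the semi-automatic nature of the process: the automatic part handles known fields and the remainder goes to a curator.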
Mapping to Level 3: Concept Mapping Tool
•Java Web Start application
•Retrieves database schema + data from mediator registry
•Maps data to NIFSTD values
•Provides term mapping to mediator Term Index Source (TIS)
Why is this done by a human at the moment?
•Abbreviations, ambiguous terms, non-standard names, e.g.,
•LPF (**if this were mapped as an abbreviation in NIFSTD, it wouldn’t be a problem)
•Anterior cingulate: Gyrus? Sulcus?
•Frontal subgyral = frontal subgyral white matter?
Your definition-My definition?
Hippocampus (SUMS)= hippocampus (NIFSTD)?
•can’t tell just by the string; must look at the definition
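The problem cases above can be sketched as a simple triage step in Python. This is an illustration under stated assumptions: the vocabulary contents, the `ex:` identifiers and the functions `candidates` and `triage` are hypothetical and do not describe the Concept Mapping Tool itself.

```python
# Hypothetical vocabulary: label -> identifier. Labels and IDs are illustrative only.
VOCABULARY = {
    "anterior cingulate gyrus": "ex:0001",
    "anterior cingulate sulcus": "ex:0002",
    "hippocampus": "ex:0003",
}


def candidates(term, vocabulary=VOCABULARY):
    """Return every vocabulary label that contains the source term (case-insensitive)."""
    needle = term.strip().lower()
    return sorted(label for label in vocabulary if needle in label)


def triage(term):
    """Classify a source term as 'unmatched', 'ambiguous' or 'check definition'."""
    found = candidates(term)
    if not found:
        return "unmatched", found
    if len(found) > 1:
        return "ambiguous", found
    # A single string match says nothing about whether the two definitions agree.
    return "check definition", found


for source_term in ("LPF", "Anterior cingulate", "Hippocampus"):
    print(source_term, triage(source_term))
```

Even the best case returns "check definition" rather than "matched", since string equality alone cannot confirm that two sources mean the same thing.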
BIRNLex Components
[Diagram: BIRNLex and its component ontologies]
•Common Anatomy Reference Ontology (CARO)
•Phenotypic Qualities (PATO)
•Subcellular Anatomy Ontology
•OBO Cell Type
•NIF Nerve Cell
•OBI
•NIF Molecule
•OBO Sequence
•Organism Taxonomy
•Sensory, Behavior, Cognition
•Disease
•Investigation
•Anatomy
Building NIFSTD
•OBO Foundry principles and best practices
•NIFSTD is built from a set of modular ontologies
  •Anatomy: Neuronames (via BIRNLex)
  •Taxonomy: NCBI taxonomy (via BIRNLex)
  •Molecule: IUPHAR + PDSP Ki + SwissProt (neuro)
  •Cell: NIF (SenseLab, NeuroMorpho, CCDB)
  •Subcellular anatomy: GO + SAO
  •Disease: MeSH/UMLS + NINDS + OMIM (neuro)
  •Resource descriptors: NIF, NITRC, NCBC, OBI
  •Technique: NIF + Ontology for Biomedical Investigation (OBI)
  •Behavior: NIF, BIRN, BrainMap
  •Attributes: PATO
•Each is mapped to a unique identifier
•Single inheritance with minimal assignment of properties
•Each file is imported separately, but integrated through the Basic Formal Ontology into a single vocabulary
•Imported using manual, semi-automated and automated means
  •Degree of intervention depends on the vocabulary
  •At this point, a large degree of manual intervention is often necessary
  •Link back to source ID is maintained
•Encoded in OWL/RDF
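The single-inheritance constraint mentioned above can be checked mechanically. The following Python sketch assumes the is-a hierarchy is available as a list of (child, parent) pairs; the function `multiple_parents` and the sample edges are hypothetical, and one edge is deliberately invented to trigger the check.

```python
from collections import defaultdict


def multiple_parents(is_a_edges):
    """Return {child: sorted parents} for every class with more than one is-a parent."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


# Illustrative edges only; the second parent of "Pyramidal Cell" is made up
# so that the check has something to report.
edges = [
    ("Purkinje Cell", "Neuron"),
    ("Pyramidal Cell", "Neuron"),
    ("Pyramidal Cell", "Spiny Cell"),
]
print(multiple_parents(edges))
```

An empty result means every class has at most one asserted parent, which is what a single-inheritance "is a" tree requires.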
Adding to and amending the BIRN lexically-enhanced ontology
Batch modifications (alpha)
•prefLabel
•synonym
•abbrev
•acronym
•tax scientific name
•tax common name
•GENBANK common name
•NCBI BLAST name
•antiquated label
•misspelling
•IMSR standard name
Batch modification example
IUPHAR V-gated Ion Channels (NIF)
•Row = class
•Column = related property (annotations & objects)
•Parent property: required to place the class in the BIRNLex hierarchy
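The row-per-class, column-per-property layout could be read as in the Python sketch below. This is an assumption-laden illustration: the CSV format, the column names, the sample rows and the function `load_batch` are invented for the example and do not reproduce the actual batch tool.

```python
import csv
import io

# Hypothetical batch sheet: one row per class, one column per property.
# Column names and row values are invented for illustration.
SHEET = """prefLabel,parent,synonym,abbrev
Example channel A,Voltage-gated ion channel,Channel A,ECA
Example channel B,,Channel B,ECB
"""


def load_batch(text):
    """Return (accepted, rejected) rows; a row without a parent cannot be placed in the hierarchy."""
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Keep only the properties that actually have a value in this row.
        props = {k: v.strip() for k, v in row.items() if v and v.strip()}
        (accepted if "parent" in props else rejected).append(props)
    return accepted, rejected


accepted, rejected = load_batch(SHEET)
print("accepted:", accepted)
print("rejected:", rejected)
```

Rows lacking the parent property are set aside rather than silently imported, reflecting the requirement that the parent is needed to place a class in the hierarchy.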
Citations & Mappings
Maintain link back to external knowledge source
For terms/concepts and for definitions
Mappings provide parsable representation of cross terminology synonymies
Citations & Mappings: External IDs
•Generic: externalSourceId
•Specific (for common sources)
  •Neuroanatomy: neuronamesID/bamsID
  •Organism taxonomy: ncbiTaxID/itisID/gbifID/jaxMiceID/tacMiceID
  •Cells/Tissue: atccID
  •Disease: UmlsCui/MeSH
•URL templates: use IDs to link to external source
•URL references (when available): automatically add ref links into tools using BIRNLex (TIS, BONFIRE, etc.)
•Definition citations as well, including URIs & publication references
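A URL template of this kind can be sketched in Python as follows. The property names `ncbiTaxID`, `neuronamesID` and `atccID` come from the slide; the `example.org` addresses, the `URL_TEMPLATES` table and the function `external_links` are placeholders assumed for illustration, not the real source URLs or any BIRNLex API.

```python
# Hypothetical URL templates keyed by external-ID property name.
# The example.org addresses are placeholders, not the real source URLs.
URL_TEMPLATES = {
    "ncbiTaxID": "https://example.org/taxonomy/{id}",
    "neuronamesID": "https://example.org/neuronames/{id}",
}


def external_links(annotations, templates=URL_TEMPLATES):
    """Build a link for every external ID that has a known URL template."""
    return {
        prop: templates[prop].format(id=value)
        for prop, value in annotations.items()
        if prop in templates
    }


# 10090 is the NCBI Taxonomy ID for Mus musculus; the atccID value is made up.
links = external_links({"ncbiTaxID": "10090", "atccID": "hypothetical-id"})
print(links)
```

IDs without a template are skipped, so a tool can add reference links only where a source URL pattern is known.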
Use Case: Cell Types
•A cell type ontology already exists, but it has poor coverage of neuronal cells and is generally agreed by the community to be “problematic”
•SenseLab, CCDB, NeuroMorpho.org and NIF collated cell type terminologies
•Produced a master list in an Excel spreadsheet with defined properties: neurotransmitter, anatomical location, morphology, molecular constituent, circuit type
•Using Jena code written by BB, imported the contents directly into Protégé OWL, matching strings against existing content, e.g., anatomy, molecules
[Diagram: flat hierarchy in which each of the following cell types is-a Neuron]
•Cerebellar Granule Cell
•Purkinje Cell
•Photoreceptor Cell
•Chandelier Cell
•Cerebellar Basket Cell
•Double Bouquet Cell
•Globular Bushy Cell
•Medium Spiny Cell
•Pyramidal Cell
•Dentate Gyrus Granule Cell
•Olfactory Granule Cell
•Cortical Spiny Stellate Cell
[Diagram: the same twelve cell types under Neuron, now connected by is-a relations through intermediate classes]
•Intermediate classes: GABAergic Neuron, Glutamatergic Neuron, Spiny Cell, Granule Cell
Maintaining NIFSTD
Maintenance of NIFSTD at this point will probably require the use of a human curator, although several of the functions can be automated
Community can contribute to NIF Basic; human curator will be needed to migrate much of the content to NIFSTD
Availability of NIFSTD
NIFSTD OWL file available from http://purl.org/nif/ontology/nif.owl
NIFSTD available through Bonfire (1 and 2) for programmatic access
Bonfire
NIF vocabularies are served by the vocabulary server built by BIRN: Bonfire
•Oracle database
•Cross mappings between different vocabularies
•Basic graph queries (neighborhood, shortest path)
•Web services were developed for NIF
•Based on the structure of UMLS
•User interface for graph visualization and queries (not planned for NIF delivery)
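The two basic graph queries named for Bonfire, neighborhood and shortest path, can be illustrated with a small in-memory Python sketch. It assumes an undirected term graph stored as an adjacency dictionary; the graph contents and the functions `neighborhood` and `shortest_path` are hypothetical and say nothing about how Bonfire implements these queries over its database.

```python
from collections import deque

# Hypothetical undirected term graph; edges are invented for illustration.
GRAPH = {
    "Neuron": {"Purkinje Cell", "Granule Cell"},
    "Purkinje Cell": {"Neuron", "Cerebellum"},
    "Granule Cell": {"Neuron", "Cerebellum"},
    "Cerebellum": {"Purkinje Cell", "Granule Cell", "Brain"},
    "Brain": {"Cerebellum"},
}


def neighborhood(graph, term, radius=1):
    """Return all terms within `radius` edges of `term`, excluding the term itself."""
    seen, frontier = {term}, {term}
    for _ in range(radius):
        frontier = {n for t in frontier for n in graph.get(t, ())} - seen
        seen |= frontier
    return seen - {term}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None if unreachable."""
    queue, previous = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):  # sorted for a deterministic result
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(neighborhood(GRAPH, "Neuron"))
print(shortest_path(GRAPH, "Neuron", "Brain"))
```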
Bonfire 2
•Optimized for NIF vocabularies
•Postgres RDBMS + ontology access functions that we have built
•e.g., given a term, produce its ancestry graph by following the edge labels (subclass-of OR part-of)
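The ancestry example can be sketched in Python as an upward traversal restricted to selected edge labels. This is a minimal illustration, not the OntoQuest implementation: the edge list, the `has-neurotransmitter` label and the function `ancestry` are assumptions made for the example.

```python
# Hypothetical labeled edges: (child, label, parent). Invented for illustration.
EDGES = [
    ("Purkinje Cell", "subclass-of", "Neuron"),
    ("Neuron", "subclass-of", "Cell"),
    ("Purkinje Cell", "part-of", "Cerebellar Cortex"),
    ("Cerebellar Cortex", "part-of", "Cerebellum"),
    ("Purkinje Cell", "has-neurotransmitter", "GABA"),
]


def ancestry(term, edges=EDGES, labels=("subclass-of", "part-of")):
    """Collect every edge reachable upward from `term` through the allowed labels."""
    result, stack, visited = [], [term], {term}
    while stack:
        current = stack.pop()
        for child, label, parent in edges:
            if child == current and label in labels:
                result.append((child, label, parent))
                if parent not in visited:  # guard against revisiting shared ancestors
                    visited.add(parent)
                    stack.append(parent)
    return result


for edge in ancestry("Purkinje Cell"):
    print(edge)
```

Passing `labels=("subclass-of",)` restricts the result to the pure is-a ancestry; edges with any other label, such as the made-up `has-neurotransmitter`, are never followed.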
NIF Application Architecture for OntoQuest (Bonfire 2)
[Diagram components: Web Client; Application Logic; App. Configuration; Fed. DB Registry; OntoQuest; Ontology Database; Lucene Index; XML NIF Registry; Text Engine; Term Mapper and Indexer; Ontologies; Neuroscience Web sites; External web sites; External Database-1; Docs]
What’s next
NIFSTD: Comprehensive “is a” hierarchy, but relations sparse - e.g., “part of”, “binds ligand”, “sequence of”, etc.
Continue to build pipeline from loosely structured to formal ontology
•Continue to add domains
•Add relationships and definitions
•Generate additional hierarchies
•Incorporate more of the semantics into the NIF search
Evolution of Terminologies
•NIF Basic vocabulary
  •Contributed by panels of experts
  •Coarse granularity but broad coverage
  •Loose hierarchy
  •XML
•NIFSTD
  •Imports existing terminologies developed by other communities
  •Modular design
  •Normalizes structure according to Basic Formal Ontology (BFO)
  •Creates single inheritance “is a” tree
  •Provides mapping between NIF and NIFSTD
  •Provides synonyms, abbreviations and lexical variants
  •OWL/RDF
•NIF Plus
  •Relates classes through “part of” and other OBO relations
  •Consistent human and machine-readable definitions
NIF Phase I and II
Current Status and Future Work
•Prototype interfaces built upon Bonfire 1 and 2
•NIFSTD 1.0 in Bonfire 1
•NIFSTD 1.1 in Bonfire 2
•Will update Bonfire 1 content after this demonstration
•Implementation and testing of vocabulary services using Bonfire 2
•Better use of lexical variants, synonyms etc.
•Mapping of NIF Registry and NIF data federation with NIFSTD
•All resources registered will mark up more content
•Coverage of behavior (sensory, motor) and behavioral assessments will be added
•More lexical variants will be used in searches
•Improved access to annotation properties through Concept Mapper
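Using lexical variants in search amounts to expanding a query term into all of its known forms before matching. The Python sketch below shows one way this might look; the `LEXICON` contents and the function `expand_query` are hypothetical, and the variants listed are illustrative rather than taken from NIFSTD.

```python
# Hypothetical lexical entries; the variants listed are illustrative only.
LEXICON = {
    "Purkinje Cell": {"synonym": ["Purkinje neuron"], "abbrev": ["PC"]},
    "Cerebellar Granule Cell": {"synonym": ["granule cell of cerebellum"], "abbrev": []},
}


def expand_query(term, lexicon=LEXICON):
    """Return the preferred label plus all known variants, matching case-insensitively."""
    wanted = term.strip().lower()
    for label, variants in lexicon.items():
        forms = [label] + [v for group in variants.values() for v in group]
        if wanted in (f.lower() for f in forms):
            return forms
    return [term]  # unknown term: search for it unchanged


print(expand_query("pc"))
print(expand_query("unknown term"))
```

A search for any one form then retrieves records annotated with any of the others, which is the practical benefit of carrying synonyms and abbreviations in the vocabulary.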