semantic media wiki open terminology development - initial steps - frank hartel, ph.d. associate...
Semantic Media Wiki
Open Terminology Development
- Initial Steps -
Frank Hartel, Ph.D.
Associate Director, Enterprise Vocabulary Services
National Cancer Institute, Center for Biomedical Informatics
10th Open Forum on Metadata Registries
July 10, 2007
2
Biomedical Grid Terminology (BiomedGT©)
• BiomedGT© is a project of the National Cancer Institute
  • Initially an effort of the NCI Center for Biomedical Informatics and Information Technology
  • Transition rapidly to biomedical community partnership
• BiomedGT© features
  • Open content development by biomedical community volunteers
  • Federated ontology structure
  • Infrastructure support by NCI
  • Free, open license
  • Coverage of translational research domain
  • Clear distinction between ontology and thesaurus components
3
Why Open Content Development?
• Open content development by members of the community using the terminology has been proven to work, maturing rapidly
  • Gene Ontology (GO) http://www.geneontology.org/
  • Ontology for Biomedical Investigations (OBI) http://obi.sourceforge.net/
  • And many others – see
    • Open Biomedical Ontologies Organization http://obofoundry.org/
    • National Center for Biomedical Ontology http://bioontology.org/
• Advantages
  • Community acceptance, content is relevant to community needs
  • Fast publication cycle
• Disadvantages
  • Development tools vs representational sophistication
  • Long term viability of purely voluntary efforts
4
Why Federation?
• Biomedicine is a huge domain
  • Coverage in a single ontology isn’t in the cards
    • Overwhelms tools, editorial and production techniques
    • Slows production cycles, runs up costs (partially due to redundant modeling)
    • Complexity and size overwhelm users
• Federation should help
  • Ontologies scoped to reflect scientific domains
    • Scale better to tools, editorial and production techniques
    • Easier to structure and extend, easier for users to understand
    • More attractive to subject matter experts (SMEs)
  • Many fine ontologies exist: reuse should help control costs
  • Easier to federate (in theory)
• Federation is an active area of research and development
  • OBO Foundry has one approach
  • Alan Rector and the Manchester Group have another
  • W3C Health Care and Life Sciences Special Interest Group (HCLSIG) is working on another
5
Open Content Development in BiomedGT
• Semantic Media Wiki and NCI Protégé/OWL
  • Wiki supports biomedical SMEs
    • Easy to learn tool
    • Registration for SMEs who wish to edit
    • View current BiomedGT content (from LexBIG)
    • Make changes that are stored in Wiki database
    • Anonymous read only access for others
  • Protégé used by ontologists
    • Renders SME input as OWL DL 1.1
  • Protégé Workflow is the touch point
    • SME input “harvested” from Wiki database
    • Incorporated into work assignments to ontologists
    • Wiki and BiomedGT GForge Project support SME-ontologist dialog
• Publication via caCORE/LexBIG services
  • NCBO Portal-based NCI Portal supports download & interactive search/display
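The harvesting step above (SME input collected from the Wiki database and turned into work assignments for ontologists) can be pictured with a minimal sketch. Everything named here is an assumption made for illustration: the `Proposal` record, the `harvest` function, the domain-to-ontologist mapping, and the concept codes are hypothetical and do not describe the actual Wiki schema or the Protégé Workflow API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """One SME edit captured in the wiki database (hypothetical shape)."""
    concept_code: str  # made-up code of the concept the SME commented on
    domain: str        # subject matter domain of that concept
    author: str        # registered SME who made the change
    text: str          # the proposed change, as free text


def harvest(proposals, ontologist_for_domain):
    """Group pending proposals into one worklist per ontologist."""
    worklists = defaultdict(list)
    for p in proposals:
        # Domains with no assigned ontologist go to a shared triage queue.
        owner = ontologist_for_domain.get(p.domain, "triage")
        worklists[owner].append(p)
    return dict(worklists)


# Illustrative data only; none of these values come from BiomedGT.
proposals = [
    Proposal("X001", "anatomy", "sme_a", "Add synonym"),
    Proposal("X002", "genes", "sme_b", "Fix parent class"),
    Proposal("X003", "anatomy", "sme_c", "Revise definition"),
    Proposal("X004", "devices", "sme_a", "New concept request"),
]
assignments = harvest(
    proposals, {"anatomy": "ontologist_1", "genes": "ontologist_2"}
)
for owner, items in sorted(assignments.items()):
    print(owner, [p.concept_code for p in items])
```

Grouping by domain mirrors the idea that ontologists work on scoped subject areas, while the triage queue keeps unassigned input from being silently dropped.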
6
Open Collaboration Infrastructure and Process
[Figure: process diagram. Labels: Subject Matter Expert; NCI Semantic Media Wiki; Wiki DB; Work Flow Integration; Protégé/OWL Curation (Editing, Consultation); NCI Ontology; Promotion of Release Candidate to QA; QA caCORE/LexBIG; Promotion to Production; Production caCORE/LexBIG; Terminology Portal; caCORE API; Other Apps & services; EVS Users & SMEs; EVS Users; Review, critique and enhancement]
7
Context of Open Development – NCI Enterprise Vocabulary Services
[Figure: EVS Product overview. Row labels: Current Production Operations; Infrastructure Development; Emerging Infrastructure Operations. Component labels: TDE; MEME; NCIt Releases; NCI Meta Releases; DTS; DTS-RPC; caCORE 3.2; Metaphrase; NCI Meta Browser; NCI Term Browser; Semantic Media Wiki; Classification Services; NCI BioPortal; caCORE 4.0; LexBIG; Open Content Development; UMLS Meta Releases; Other Terminology; NCI Protégé/OWL; Workflow; Other open ontologies; BiomedGT]
8
BiomedGT Collaborators and Preparations
• All biomedical communities will be welcome
  • Initial collaboration is with UK Cancer Grid (Clinical Trials Ontology)
  • Additional early collaborations within the cancer Biomedical Informatics Grid
• Completion of ongoing development of collaboration infrastructure will precede large scale open development
  • Semantic Media Wiki modifications
  • Protégé Workflow
  • Integration of Wiki database and workflow tool
• Meanwhile, preliminary conversion of NCI Thesaurus data is under way
  • Provides structural basis for subject matter domains within BiomedGT
  • Provides basis to distinguish ontological classes (class represents a real-world entity/process) from thesaurus classes (class provides browsing/navigation support)
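The ontology/thesaurus distinction described above can be sketched as a simple tag on each class. This is only an illustration under stated assumptions: the `Kind` enum, the `TermClass` record, the `split_components` helper, and the sample class names are all hypothetical, and the tags are assigned by hand rather than by any rule taken from the BiomedGT conversion.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    ONTOLOGY = "ontology"    # class represents a real-world entity or process
    THESAURUS = "thesaurus"  # class exists to support browsing/navigation


@dataclass
class TermClass:
    name: str
    kind: Kind  # tag assigned by a curator in this sketch


def split_components(classes):
    """Partition tagged classes into ontology and thesaurus components."""
    ontology = [c.name for c in classes if c.kind is Kind.ONTOLOGY]
    thesaurus = [c.name for c in classes if c.kind is Kind.THESAURUS]
    return ontology, thesaurus


# Illustrative class names; not claimed to be actual NCI Thesaurus content.
tagged = [
    TermClass("Lung", Kind.ONTOLOGY),
    TermClass("Cell Division", Kind.ONTOLOGY),
    TermClass("Diseases by Site", Kind.THESAURUS),
]
ontology, thesaurus = split_components(tagged)
print("ontology:", ontology)
print("thesaurus:", thesaurus)
```

Keeping the tag explicit on every class is one way to let the two components be published or browsed separately without maintaining two copies of the hierarchy.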
9
Initial Construction of BiomedGT
[Figure: timeline. Labels: NCIT Seed (upper level ontology; initial binning of NCIT top concepts; initial tagging: thesaurus vs ontology); BiomedGT Beta; publish to Media Wiki; worklists to Protégé; community development; DL modeling; baselines; BiomedGT]
10
Reusing Terminology Components – Federation in Protégé 3.x
[Figure: owl:imports of both terminologies (GO and NCIT) into the Protégé editor]
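As a rough illustration of what importing both terminologies amounts to at the file level, the sketch below builds an RDF/XML ontology header containing two `owl:imports` statements. It uses only the Python standard library. The `importing_ontology` helper and all three IRIs are placeholders assumed for the example; they are not the real locations of the GO or NCIT OWL files, and Protégé normally writes this header itself.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
# Ask ElementTree to serialize with the conventional rdf: and owl: prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)


def importing_ontology(ontology_iri, imported_iris):
    """Return RDF/XML for an ontology header that owl:imports each IRI."""
    root = ET.Element(f"{{{RDF}}}RDF")
    onto = ET.SubElement(
        root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": ontology_iri}
    )
    for iri in imported_iris:
        # One owl:imports statement per reused terminology.
        ET.SubElement(onto, f"{{{OWL}}}imports", {f"{{{RDF}}}resource": iri})
    return ET.tostring(root, encoding="unicode")


# Placeholder IRIs: substitute the locations of the real GO and NCIT files.
xml_text = importing_ontology(
    "http://example.org/biomedgt-demo.owl",
    ["http://example.org/go.owl", "http://example.org/ncit.owl"],
)
print(xml_text)
```

With both imports in place, classes defined in the importing ontology can refer to classes from either terminology without copying them, which is the reuse the federation approach relies on.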
11
DL Modeling in BiomedGT with Gene Ontology Terms/Concepts
12
DL Modeling in BiomedGT with Gene Ontology Terms/Concepts