Semantic Media Wiki Open Terminology Development - Initial Steps - Frank Hartel, Ph.D. Associate Director, Enterprise Vocabulary Services National Cancer Institute, Center for Biomedical Informatics 10th Open Forum on Metadata Registries July 10, 2007


Page 1

Semantic Media Wiki
Open Terminology Development

- Initial Steps -

Frank Hartel, Ph.D.
Associate Director, Enterprise Vocabulary Services
National Cancer Institute, Center for Biomedical Informatics

10th Open Forum on Metadata Registries
July 10, 2007

Page 2

Biomedical Grid Terminology ( BiomedGT© )

• BiomedGT© is a project of the National Cancer Institute
  • Initially an effort of the NCI Center for Biomedical Informatics and Information Technology
  • Transition rapidly to biomedical community partnership

• BiomedGT© features
  • Open content development by biomedical community volunteers
  • Federated ontology structure
  • Infrastructure support by NCI
  • Free, open license
  • Coverage of translational research domain
  • Clear distinction between ontology and thesaurus components

Page 3

Why Open Content Development?

• Open content development by members of the community using the terminology has been proven to work and is maturing rapidly
  • Gene Ontology (GO) http://www.geneontology.org/
  • Ontology for Biomedical Investigations (OBI) http://obi.sourceforge.net/
  • And many others; see:
    • Open Biomedical Ontologies Organization http://obofoundry.org/
    • National Center for Biomedical Ontology http://bioontology.org/

• Advantages
  • Community acceptance; content is relevant to community needs
  • Fast publication cycle

• Disadvantages
  • Development tools vs representational sophistication
  • Long term viability of purely voluntary efforts

Page 4

Why Federation?

• Biomedicine is a huge domain
  • Coverage in a single ontology isn’t in the cards
    • Overwhelms tools, editorial and production techniques
    • Slows production cycles, runs up costs (partially due to redundant modeling)
    • Complexity and size overwhelm users

• Federation should help
  • Ontologies scoped to reflect scientific domains
    • Scales better to tools, editorial and production techniques
    • Easier to structure and extend, easier for users to understand
    • More attractive to subject matter experts (SMEs)
    • Many fine ontologies exist: reuse should help control costs
    • Easier to federate (in theory)

• Federation is an active area of research and development
  • OBO Foundry has one approach
  • Alan Rector and the Manchester Group have another
  • W3C Health Care and Life Sciences Special Interest Group (HCLSIG) is working on another

Page 5

Open Content Development in BiomedGT

• Semantic Media Wiki and NCI Protégé/OWL

• Wiki supports biomedical SMEs
  • Easy to learn tool
  • Registration for SMEs who wish to edit
  • View current BiomedGT content (from LexBIG)
  • Make changes that are stored in Wiki database
  • Anonymous read only access for others

• Protégé used by ontologists
  • Renders SME input as OWL DL 1.1

• Protégé Workflow is the touch point
  • SME input “harvested” from Wiki database
  • Incorporated into work assignments to ontologists
  • Wiki and BiomedGT GForge Project support SME-ontologist dialog

• Publication via caCORE/LexBIG services
  • NCBO Portal-based NCI Portal supports download & interactive search/display
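
The harvest step described on this slide (SME input pulled from the Wiki database and turned into work assignments for ontologists) might be sketched as follows. This is only an illustration, not the NCI implementation: the `WikiProposal` and `WorkAssignment` records, their fields, the concept codes, and the round-robin assignment rule are all assumptions introduced here, since the slides do not describe the Wiki database schema or how Protégé Workflow distributes work.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class WikiProposal:
    """Hypothetical record of one SME edit stored in the Wiki database."""
    concept_code: str
    sme: str
    text: str
    harvested: bool = False


@dataclass
class WorkAssignment:
    """Hypothetical work item handed to one ontologist."""
    ontologist: str
    proposals: list = field(default_factory=list)


def harvest(proposals, ontologists):
    """Collect unharvested proposals, group them by concept, and hand each
    concept's proposals to one ontologist in round-robin order (assumed policy)."""
    by_concept = {}
    for p in proposals:
        if not p.harvested:
            by_concept.setdefault(p.concept_code, []).append(p)
            p.harvested = True  # mark so a later harvest skips it

    assignments = {name: WorkAssignment(name) for name in ontologists}
    for name, code in zip(cycle(ontologists), sorted(by_concept)):
        assignments[name].proposals.extend(by_concept[code])
    return list(assignments.values())


# Made-up proposals and ontologist names, for illustration only
props = [
    WikiProposal("C1", "sme_a", "add synonym"),
    WikiProposal("C2", "sme_b", "fix definition"),
    WikiProposal("C1", "sme_c", "new parent"),
    WikiProposal("C3", "sme_a", "already handled", harvested=True),
]
result = harvest(props, ["ann", "bob"])
```

Grouping by concept keeps all SME comments on one concept with a single ontologist, which seems a natural fit for the SME-ontologist dialog the slide mentions, but any other assignment rule could be substituted.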

Page 6

Open Collaboration Infrastructure and Process

[Figure: process diagram. Components: Subject Matter Expert; EVS Users & SMEs; NCI Semantic Media Wiki; Wiki DB; Work Flow Integration; Protégé/OWL Curation; NCI Ontology; QA (caCORE / LexBIG); Production (caCORE / LexBIG); Terminology Portal; EVS Users; caCORE API; Other Apps & services. Labeled steps: Editing; Consultation; Review, critique and enhancement; Promotion of Release Candidate to QA; Promotion to Production.]
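
The two promotion steps labeled in the diagram (release candidate to QA, then QA to production) can be pictured as a small state progression. The sketch below is an assumption-laden illustration: the `Stage` names mirror the diagram labels, but the `approved` gate and the rule that production is terminal are introduced here and are not stated in the slides.

```python
from enum import Enum


class Stage(Enum):
    CURATION = 1    # Protégé/OWL Curation
    QA = 2          # QA on caCORE / LexBIG
    PRODUCTION = 3  # Production on caCORE / LexBIG


def promote(stage, approved):
    """Return the next stage when approved; otherwise stay put.
    Production is treated as the final stage (assumption)."""
    if not approved or stage is Stage.PRODUCTION:
        return stage
    return Stage(stage.value + 1)


# A release candidate that passes both gates
stage = promote(Stage.CURATION, approved=True)  # now in QA
stage = promote(stage, approved=True)           # now in production
```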

Page 7

Context of Open Development – NCI Enterprise Vocabulary Services

[Figure: EVS product map. Column heading: EVS Product. Row labels: Emerging Infrastructure Operations; Infrastructure Development; Current Production Operations. Items shown: TDE; MEME; NCIt Releases; NCI Meta Releases; UMLS Meta Releases; Other Terminology; DTS; DTS-RPC; caCORE 3.2; caCORE 4.0; LexBIG; Metaphrase; NCI Meta Browser; NCI Term Browser; Semantic Media Wiki; Classification Services; NCI BioPortal; NCI Protégé/OWL; Workflow; Open Content Development; Other open ontologies; BiomedGT.]

Page 8

BiomedGT Collaborators and Preparations

• All biomedical communities will be welcome
• Initial collaboration is with UK Cancer Grid (Clinical Trials Ontology)
• Additional early collaborations within the cancer Biomedical Informatics Grid
• Completion of ongoing development of collaboration infrastructure will precede large scale open development
  • Semantic Media Wiki modifications
  • Protégé Workflow
  • Integration of Wiki database and workflow tool

• Meanwhile, preliminary conversion of NCI Thesaurus data is under way
  • Provides structural basis for subject matter domains within BiomedGT
  • Provides basis to distinguish ontological classes (class represents a real-world entity/process) from thesaurus classes (class provides browsing/navigation support)
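
The distinction drawn here between ontological classes and thesaurus classes amounts to a tag on each class. A minimal sketch of such tagging follows; the `Role` enum, the `partition` helper, and the example class names are all illustrative assumptions and are not drawn from the slides or from NCI Thesaurus data.

```python
from enum import Enum


class Role(Enum):
    ONTOLOGY = "ontology"    # class represents a real-world entity or process
    THESAURUS = "thesaurus"  # class exists for browsing/navigation support


def partition(tags):
    """Split a {class_name: Role} mapping into two sorted lists of names."""
    ontological = sorted(n for n, r in tags.items() if r is Role.ONTOLOGY)
    navigational = sorted(n for n, r in tags.items() if r is Role.THESAURUS)
    return ontological, navigational


# Invented class names, tagged by hand for the example
tags = {
    "Cell Division": Role.ONTOLOGY,
    "Miscellaneous Terms": Role.THESAURUS,
    "Neoplasm": Role.ONTOLOGY,
}
ontological, navigational = partition(tags)
```

Keeping the tag separate from the class hierarchy means a navigation-only grouping can stay in place for browsing while being excluded from formal modeling.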

Page 9

Initial Construction of BiomedGT

[Figure: timeline. Stage labels: NCIT; Seed BiomedGT; Beta; BiomedGT. Annotations: upper level ontology; initial binning of NCIT top concepts; initial tagging: thesaurus vs ontology; publish to Media Wiki; worklists to Protégé; Community development; DL Modeling; baselines . . .]

Page 10

Reusing Terminology Components – Federation in Protégé 3.x

[Figure: screenshot with callout "owl:imports of both terminologies into the Protégé editor". Labels: GO; NCIT.]
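
For readers unfamiliar with `owl:imports`, the sketch below generates the kind of RDF/XML ontology header that imports two terminologies. It uses only the Python standard library. The IRIs are placeholders under `example.org`, not the real locations of the GO or NCIT OWL files, and the `ontology_header` helper is a name made up for this example; the slides do not show the actual header used in Protégé.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def ontology_header(ontology_iri, imported_iris):
    """Build an RDF/XML document whose owl:Ontology element imports each IRI."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("owl", OWL)
    root = ET.Element(f"{{{RDF}}}RDF")
    onto = ET.SubElement(
        root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": ontology_iri}
    )
    for iri in imported_iris:
        ET.SubElement(onto, f"{{{OWL}}}imports", {f"{{{RDF}}}resource": iri})
    return ET.tostring(root, encoding="unicode")


# Placeholder IRIs: substitute the real locations of the GO and NCIT OWL files.
xml_text = ontology_header(
    "http://example.org/biomedgt-demo",
    ["http://example.org/go.owl", "http://example.org/ncit.owl"],
)
print(xml_text)
```

An editor that honors `owl:imports` loads the imported ontologies alongside the importing one, so classes from both can be referenced when modeling without copying either terminology.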

Page 11

DL Modeling in BiomedGT with Gene Ontology Terms/Concepts

Page 12

DL Modeling in BiomedGT with Gene Ontology Terms/Concepts

Page 13

Q & A