1
Developing Ontologies (and more)
Peter Fox (NCAR)
ESIP Winter Meeting (TIWG)
January 9, 2008, Washington, D.C.
2
Ontology Spectrum
[Figure: spectrum from informal to formal]
• Catalog/ID
• Terms/glossary
• Thesauri: “narrower term” relation
• Informal is-a
• Formal is-a
• Formal instance
• Frames (properties)
• Value restrictions
• Selected logical constraints (disjointness, inverse, …)
• General logical constraints
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
3
Ontology - declarative knowledge
• The triple: {subject-predicate-object}
interferometer is-a optical instrument
Fabry-Perot is-a interferometer
Optical instrument has focal length
Optical instrument is-a instrument
Instrument has instrument operating mode
Data archive has measured parameter
SO2 concentration is-a concentration
Concentration is-a parameter
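The statements above can be sketched as plain (subject, predicate, object) tuples. This is only an illustrative sketch in Python, not an RDF or OWL encoding: the helper names `ancestors` and `inherited_properties` are hypothetical, and the idea that a "has" property is inherited down the is-a chain is an assumption made for the example.

```python
# The slide's statements as (subject, predicate, object) triples.
triples = {
    ("interferometer", "is-a", "optical instrument"),
    ("Fabry-Perot", "is-a", "interferometer"),
    ("optical instrument", "has", "focal length"),
    ("optical instrument", "is-a", "instrument"),
    ("instrument", "has", "instrument operating mode"),
    ("data archive", "has", "measured parameter"),
    ("SO2 concentration", "is-a", "concentration"),
    ("concentration", "is-a", "parameter"),
}


def ancestors(term, facts):
    """Return every class reachable from `term` by following is-a links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in facts:
            if s == current and p == "is-a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inherited_properties(term, facts):
    """Collect 'has' properties declared on `term` or on any ancestor.

    Assumption for this sketch: properties are inherited along is-a.
    """
    classes = {term} | ancestors(term, facts)
    return {o for s, p, o in facts if p == "has" and s in classes}


if __name__ == "__main__":
    print(sorted(ancestors("Fabry-Perot", triples)))
    print(sorted(inherited_properties("Fabry-Perot", triples)))
```

Under these assumptions, a Fabry-Perot is found to be an interferometer, an optical instrument and an instrument, and so picks up both a focal length and an instrument operating mode without either being stated for it directly.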
4
Semantic Web Layers
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
5
Terminology
• Ontology (n.d.). The Free On-line Dictionary of Computing. http://dictionary.reference.com/browse/ontology
– An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.
• Semantic Web
– An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation, www.semanticweb.org
– Primer: http://www.ics.forth.gr/isl/swprimer/
• Languages
– OWL 1.0 (Lite, DL, Full) - Web Ontology Language (W3C)
– RDF - Resource Description Framework (W3C)
– OWL-S/SWSL - Web Services (W3C)
– WSMO/WSML - Web Services (EC/W3C)
– SWRL - Semantic Web Rule Language, RIF - Rule Interchange Format
– Editors: Protégé, SWOOP, CoE, VOM, Medius, SWeDE, …
6
OWL and RDF
• OWL
– Lite
– DL
– Full
• RDF
• Services
– OWL-S
– SWSL
– WSML
– SAWSDL - (WSDL-S)
• Rules
– SWRL
7
Developing Ontologies
• Approach:
– Bottom-up
– Top-down (upper-level or foundational)
– Mid-level (use case)
• Using tools
• Coding and testing
• Iterating
• Maintaining and evolving (curation, preservation)
8
GRDDL - bottom up
• GRDDL - Gleaning Resource Descriptions from Dialects of Languages
• In essence: XML or XHTML (for example) transformed into RDF via XSLT
• Good support, e.g. Jena
• Handles microformats
• Active community
• How to categorize, use, re-use (parts of)?
9
Collecting
• RDFa extends XHTML by:
– extending the link and meta elements to include child elements
– adding metadata to any element (a bit like the class in micro-formats, but via dedicated properties)
– It is very similar to micro-formats, but with more rigor:
• it is a general framework (instead of an “agreement” on the meaning of, say, a class attribute value)
• terminologies can be mixed more easily
• ATOM (used with RSS)
10
Foundational Ontologies
CONTENTS
General concepts and relations that apply in all domains: physical object, process, event, …, inheres, participates, …
Rigorously defined: formal logic, philosophical principles, highly structured
Examples: DOLCE, BFO, GFO, SUMO, CYC, (Sowa)
Courtesy: Boyan Brodaric
11
Foundational Ontologies
PURPOSE: help integrate domain ontologies
Geophysics ontology
Marine ontology
Water ontology
Planetary ontology
Geology ontology
Struc ontology
Rock ontology
“…and then there was one…”
Foundational ontology
Courtesy: Boyan Brodaric
12
Foundational Ontologies
PURPOSE: help organize domain ontologies
“…a place for everything, and everything in its place…”
Foundational ontology
shale rock formation
lithification
Courtesy: Boyan Brodaric
13
Problem scenario
Little work done on linking foundational ontologies with geoscience ontologies
Such linkage might benefit various scenarios requiring cross-disciplinary knowledge, e.g.:
water budgets: groundwater (geology) and surface water (hydro)
hazards risk: hazard potential (geology, geophysics) and items at threat (infrastructure, people, environment, economic)
health: toxic substances (geochemistry) and people, wildlife
many others…
Courtesy: Boyan Brodaric
14
DOLCE
15
DOLCE + SWEET
DOLCE | SWEET (= or <)
Physical-body | BodyofGround, BodyofWater, …
Material-Artifact | Infrastructure, Dam, Product, …
Physical-Object | LivingThing, MarineAnimal
Amount-of-Matter | Substance
Activity | HumanActivity
Physical-Phenomenon | Phenomena
Process | Process
State | StateOfMatter
Quality | Quantity, Moisture, …
Physical-Region | Basalt, …
Temporal-Region | Ordovician, …
Benefits
• full coverage
• rich relations
• home for orphans
• single superclasses
Issues
• individuals (e.g. Planet Earth)
• roles (contaminant)
• features (SeaFloor)
Courtesy: Boyan Brodaric
16
Conclusions
Surprisingly good fit amongst ontologies: so far, no show-stopper conflicts, a few difficult conflicts
DOLCE richness benefits geoscience ontologies: good conceptual foundation helps clear some existing problems
Unresolved issues in modeling science entities: modeling classifications, interpretations, theories, models, …
Courtesy: Boyan Brodaric
Same procedure with GeoSciML
17
SUMO - Suggested Upper Merged Ontology
• Physical
  • Object
    • SelfConnectedObject
      • ContinuousObject
      • CorpuscularObject
    • Collection
  • Process
• Abstract
  • SetClass
    • Relation
  • Proposition
  • Quantity
    • Number
    • PhysicalQuantity
  • Attribute
18
19
20
Using SNAP/ SPAN
21
GeoSciOnt?
22
23
Using SWEET
• Plug-in (import) domain detailed modules
• Lots of classes, few relations (properties)
24
Mix-n-Match
• The IRI example:
– Collect a lot of different ontologies representing different terms, levels of concepts, etc. into a base form: RDF
– See Benno’s talk in session 1b.
• MMI
• Others
25
[Figure labels: CF attributes, SWEET Ontologies (OWL), Search Terms, CF Standard Names (RDF object), IRIDL Terms, NC basic attributes, IRIDL attributes/objects, SWEET as Terms, CF Standard Names as Terms, Gazetteer Terms, CF data objects, Location. Credit: Blumenthal]
26
IRI RDF Architecture
[Figure labels: Data Servers, Ontologies, MMI, JPL, Standards Organizations, Start Point, RDF Crawler, RDFS Semantics, OWL Semantics, SWRL Rules, SeRQL CONSTRUCT, Search Queries, Location Canonicalizer, Time Canonicalizer, Sesame, Search Interface, bibliography. Credit: Blumenthal]
27
Mid-Level: Developing ontologies
• Use cases and small team (7-8; 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe)
• Identify classes and properties (leverage controlled vocab.)
– Start with narrower terms, generalize when needed or possible
– Adopt a suitable conceptual decomposition (e.g. SWEET)
– Import modules when concepts are orthogonal
• Review, vet, publish
• Only code them (in RDF or OWL) when needed (CMAP, …)
• Ontologies: small and modular
28
Use Case example
• Plot the neutral temperature from the Millstone-Hill Fabry Perot, operating in the vertical mode during January 2000 as a time series.
• Objects:
– Neutral temperature is a (temperature is a) parameter
– Millstone Hill is a (ground-based observatory is a) observatory
– Fabry-Perot is a interferometer is a optical instrument is a instrument
– Vertical mode is a instrument operating mode
– January 2000 is a date-time range
– Time is a independent variable/coordinate
– Time series is a data plot is a data product
29
Class and property example
• Parameter
– Has coordinates (independent variables)
• Observatory
– Operates instruments
• Instrument
– Has operating mode
• Instrument operating mode
– Has measured parameters
• Date-time interval
• Data product
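Before committing to RDF or OWL, the classes and properties above can be tried out in ordinary code. The following is a minimal sketch using Python dataclasses; the field names and the helper `parameters_measured` are hypothetical, and the instance data is taken from the Millstone Hill use case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    # "Has coordinates (independent variables)"
    coordinates: List[str] = field(default_factory=list)


@dataclass
class OperatingMode:
    name: str
    # "Has measured parameters"
    measured_parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Instrument:
    name: str
    # "Has operating mode"
    operating_modes: List[OperatingMode] = field(default_factory=list)


@dataclass
class Observatory:
    name: str
    # "Operates instruments"
    instruments: List[Instrument] = field(default_factory=list)


def parameters_measured(observatory, instrument_name, mode_name):
    """Walk observatory -> instrument -> operating mode -> parameter names."""
    return [
        parameter.name
        for instrument in observatory.instruments
        if instrument.name == instrument_name
        for mode in instrument.operating_modes
        if mode.name == mode_name
        for parameter in mode.measured_parameters
    ]


# Instances drawn from the use case on the earlier slide.
neutral_temperature = Parameter("neutral temperature", coordinates=["time"])
vertical = OperatingMode("vertical", [neutral_temperature])
fabry_perot = Instrument("Fabry-Perot", [vertical])
millstone_hill = Observatory("Millstone Hill", [fabry_perot])

if __name__ == "__main__":
    print(parameters_measured(millstone_hill, "Fabry-Perot", "vertical"))
```

Walking the chain of properties is what answers the use case: the observatory and instrument named in the request lead, through the operating mode, to the parameter to plot and its independent variable.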
30
31
32
33
Higher level use case
• Find data which represents the state of the neutral atmosphere above 100km, toward the arctic circle at any time of high geomagnetic activity
34
Translating the Use-Case - non-monotonic?
Input
Physical properties: State of neutral atmosphere
Spatial:
• Above 100km
• Toward arctic circle (above 45N)
Conditions:
• High geomagnetic activity
Action: Return Data
Specification needed for query to CEDARWEB
Instrument
Parameter(s)
Operating Mode
Observatory
Date/time
Return-type: data
GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
Kp is a GeophysicalIndex
hasTemporalDomain: “daily”
hasHighThreshold: xsd_number = 8
Date/time when Kp >= 8
35
Translating the Use-Case - ctd.
Input
Physical properties: State of neutral atmosphere
Spatial:
Above 100km
Toward arctic circle (above 45N)
Conditions:
High geomagnetic activity
Action: Return Data
Specification needed for query to CEDARWEB
Instrument
Parameter(s)
Operating Mode
Observatory
Date/time
Return-type: data
NeutralAtmosphere is a subRealm of TerrestrialAtmosphere
hasPhysicalProperties: NeutralTemperature, Neutral Wind, etc.
hasSpatialDomain: [0,360],[0,180],[100,150]
hasTemporalDomain:
NeutralTemperature is a Temperature (which) is a Parameter
FabryPerotInterferometer is a Interferometer, (which) is a Optical Instrument (which) is a Instrument
hasFilterCentralWavelength: Wavelength
hasLowerBoundFormationHeight: Height
ArcticCircle is a GeographicRegion
hasLatitudeBoundary:
hasLatitudeUpperBoundary:
GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
Kp is a GeophysicalIndex
hasTemporalDomain: “daily”
hasHighThreshold: xsd_number = 8
Date/time when Kp >= 8
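The translation step can be sketched in code: "high geomagnetic activity" is replaced by its proxy (the Kp index and its high threshold), which yields the concrete dates the query needs. This is a hedged sketch only. The Kp values are invented for illustration, and the query field names are hypothetical; they mirror the slide and are not a real CEDARWEB interface.

```python
from datetime import date

# hasHighThreshold from the slide.
KP_HIGH_THRESHOLD = 8

# Hypothetical daily Kp values, invented purely for illustration.
daily_kp = {
    date(2000, 1, 10): 3,
    date(2000, 1, 11): 8,
    date(2000, 1, 12): 9,
    date(2000, 1, 13): 5,
}


def high_activity_dates(kp_by_day, threshold=KP_HIGH_THRESHOLD):
    """Return the days on which the Kp proxy meets or exceeds the threshold."""
    return sorted(day for day, kp in kp_by_day.items() if kp >= threshold)


def build_query(kp_by_day):
    """Assemble a query specification from the translated use case.

    The keys are illustrative names, not an actual CEDARWEB API.
    """
    return {
        "parameters": ["NeutralTemperature", "NeutralWind"],
        "min_altitude_km": 100,   # "above 100km"
        "min_latitude_deg": 45,   # "toward arctic circle (above 45N)"
        "dates": high_activity_dates(kp_by_day),
        "return_type": "data",
    }


if __name__ == "__main__":
    print(build_query(daily_kp))
```

The point of the sketch is that the user never names Kp or a date; the ontology supplies the proxy and the threshold, and the dates follow from them.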
36
Tools - Using Protégé
37
Creating Ontologies - visual
• UML - new release of ODM/MOF
– Ontology Definition Metamodel/Meta Object Facility (OMG) for UML
– Provides standardized notation
• CMAP Ontology Editor (concept mapping tool from IHMC)
– Drag/drop visual development of classes, subclass (is-a) and property relationships
– Reads and writes OWL
– Formal convention (OWL/RDF tags, etc.)
• White board, text file
38
Using CMAP/COE
39
40
Is OWL the only option? No…
• SKOS - Simple Knowledge Organization System
• Annotations (RDFa)
• Atom
• Natural Language (read results from a web search and transform to a usable form)
– CL (common logic)
– Rabbit, e.g. ShellfishCourse is a Meal Course that (if has drink) always has drink Potable Liquid that has Full body and which either has Moderate or Strong flavour
– PENG (processable English)
41
Is OWL the only option II? No…
• Natural Language (NL)
– Read results from a web search and transform to a usable form
– Find/filter out inconsistencies, concepts/relations that cannot be represented
• Popular options
– CLCE (Common Logic Controlled English)
– Rabbit, e.g. ShellfishCourse is a Meal Course that (if has drink) always has drink Potable Liquid that has Full body and which either has Moderate or Strong flavour
– PENG (processable English)
• Really need PSCI - process-able science
42
Creating Ontologies - verbal
• Translating use cases
• E.g. Find data which represents the state of the neutral atmosphere above 100km, toward the arctic circle at any time of high geomagnetic activity
• Can this be expressed as an ontology?
– CLCE, Rabbit, PENG, Sydney syntax
• Notice something about the next examples?
43
Sydney syntax
If X has Y as a father then Y is the only father of X.
The class person is equivalent to male or female, and male and female are mutually exclusive.
is equivalent to:
The classes male and female are mutually exclusive. The class person is fully defined as anything that is a male or a female.
44
PENG - Processable English
1. If X is a research programmer then X is a programmer.
2. Bill Smith is a research programmer who works at the CLT.
3. Who is a programmer and works at the CLT?
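The three sentences above map onto a rule, a pair of facts and a query. The following is a minimal sketch of that reading in Python, not PENG's actual processing: the tuple encoding, the relation names and the helpers `forward_chain` and `who` are all assumptions made for illustration.

```python
# Sentence 2 as facts: (individual, relation, value).
facts = {
    ("Bill Smith", "is-a", "research programmer"),
    ("Bill Smith", "works-at", "CLT"),
}

# Sentence 1 as a rule: if X is a research programmer then X is a programmer.
subclass_rules = [("research programmer", "programmer")]


def forward_chain(known, rules):
    """Apply the subclass rules repeatedly until no new fact is derived."""
    derived = set(known)
    changed = True
    while changed:
        changed = False
        for narrower, broader in rules:
            for s, p, o in list(derived):
                new_fact = (s, "is-a", broader)
                if p == "is-a" and o == narrower and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


def who(derived, cls, workplace):
    """Sentence 3 as a query: who is a `cls` and works at `workplace`?"""
    return sorted(
        s
        for s, p, o in derived
        if p == "is-a" and o == cls and (s, "works-at", workplace) in derived
    )


closure = forward_chain(facts, subclass_rules)
answer = who(closure, "programmer", "CLT")

if __name__ == "__main__":
    print(answer)
```

The answer is never stated as a fact; it exists only because the rule in sentence 1 is applied to the fact in sentence 2, which is the behaviour the controlled-English sentences are meant to specify.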
45
CLCE - Common Logic Controlled English
CLCE: If a set x is the set of (a cat, a dog, and an elephant), then the cat is an element of x, the dog is an element of x, and the elephant is an element of x.
PC: ~(∃x:Set)(∃x1:Cat)(∃x2:Dog)(∃x3:Elephant)(Set(x,x1,x2,x3) ∧ ~(x1∈x ∧ x2∈x ∧ x3∈x))
46
Use Case
• Provide a decision support capability for an analyst to determine an individual’s susceptibility to avian flu without having to be precise in terminology (-nyms)
47
48
49
Using ThManager
50
Services
• Ontologies of services answer three questions:
– What does the service provide for prospective clients? The answer to this question is given in the "profile," which is used to advertise the service. To capture this perspective, each instance of the class Service presents a ServiceProfile.
– How is it used? The answer to this question is given in the "process model." This perspective is captured by the ServiceModel class. Instances of the class Service use the property describedBy to refer to the service's ServiceModel.
– How does one interact with it? The answer to this question is given in the "grounding." A grounding provides the needed details about transport protocols. Instances of the class Service have a supports property referring to a ServiceGrounding.
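The three perspectives can be pictured as one record with three links. The sketch below uses Python dataclasses purely to show the shape. The class and property names (Service, ServiceProfile, ServiceModel, ServiceGrounding, presents, describedBy, supports) come from the text above; the field names, the example service and its endpoint are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceProfile:
    """What the service provides; used to advertise it."""
    summary: str


@dataclass
class ServiceModel:
    """How the service is used (the process model)."""
    steps: List[str]


@dataclass
class ServiceGrounding:
    """How to interact with the service (transport details)."""
    protocol: str
    endpoint: str


@dataclass
class Service:
    name: str
    presents: ServiceProfile      # the "profile"
    described_by: ServiceModel    # the "process model"
    supports: ServiceGrounding    # the "grounding"


def as_triples(service):
    """List the three links from a service to its descriptions."""
    return [
        (service.name, "presents", "ServiceProfile"),
        (service.name, "describedBy", "ServiceModel"),
        (service.name, "supports", "ServiceGrounding"),
    ]


# Hypothetical example service; the endpoint is a placeholder.
reprojection = Service(
    name="ReprojectionService",
    presents=ServiceProfile("Reprojects gridded model output"),
    described_by=ServiceModel(["read grid", "transform", "write grid"]),
    supports=ServiceGrounding(protocol="HTTP", endpoint="http://example.org/reproject"),
)

if __name__ == "__main__":
    for triple in as_triples(reprojection):
        print(triple)
```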
51
Developing a service ontology
• Use case: find and display in the same projection, sea surface temperature and land surface temperature from a global climate model.
• Classes/concepts:
– Temperature
– Surface (sea/land)
– Model
– Climate
– Global
– Projection
– Display …
52
Service ontology
• Climate model is a model
• Model has domain
• Climate Model has component representation
• Land surface is-a component representation
• Ocean is-a component representation
• Sea surface is part of ocean
• Model has spatial representation (and temporal)
• Spatial representation has dimensions
• Latitude-longitude is a horizontal spatial representation
• Displaced pole is a horizontal spatial representation
• Ocean model has displaced pole representation
• Land surface model has latitude-longitude representation
• Lambert conformal is a geographic spatial representation
• Reprojection is a transform between spatial representations
• …
53
Service ontology
• A sea surface model has grid representation displaced pole and land surface model has grid representation latitude-longitude and both must be transformed to Lambert conformal for display
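The conclusion on this slide follows mechanically once the grid representations are known. The following is a small sketch in Python, assuming a simple lookup table: the names `GRID_REPRESENTATION` and `reprojection_plan` are hypothetical, and no actual reprojection is performed.

```python
# Grid representation of each model, as stated on the slide.
GRID_REPRESENTATION = {
    "sea surface model": "displaced pole",
    "land surface model": "latitude-longitude",
}

# Target representation for display, as stated on the slide.
DISPLAY_PROJECTION = "Lambert conformal"


def reprojection_plan(models, target=DISPLAY_PROJECTION):
    """List (model, source, target) for every model not already in the target."""
    plan = []
    for model in models:
        source = GRID_REPRESENTATION[model]
        if source != target:
            plan.append((model, source, target))
    return plan


if __name__ == "__main__":
    for step in reprojection_plan(["sea surface model", "land surface model"]):
        print("reproject %s: %s -> %s" % step)
```

Because neither model is stored in the display projection, both appear in the plan; a service ontology lets a workflow tool reach that conclusion without the user stating it.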
54
Best practices
• Ontologies/vocabularies must be shared and reused - swoogle.umbc.edu, www.planetont.org
• Examine ‘core vocabularies’ to start with
– SKOS Core: about knowledge systems
– Dublin Core: about information resources, digital libraries, with extensions for rights, permissions, digital rights management
– FOAF: about people and their organizations
– DOAP: on the descriptions of software projects
– DOLCE seems the most promising to match science ontologies
• Go “Lite” as much as possible, then DL and only if you have to Full - balancing expressibility vs. implementability
• Minimal properties to start, add only when needed
55
Tutorial Summary
• Many different options for ontology development and encoding
• Tools are in reasonable shape, no killer-tool
• Best practices DO exist
– PLEASE DO NOT just start coding OWL!
• Use case should drive the functional requirements of both your ontology and how you will ‘build’ one
• PARTNER with someone already familiar
56
More information
• OWL-S - http://www.w3.org/Submission/OWL-S
• SWSO/F/L - Semantic Web Services Ontology/Framework/Language - http://www.w3.org/Submission/SWSF/
• WSMO/X/L - Web Services Modeling Ontology/Execution/Language - http://www.w3.org/Submission/WSMX/ www.wsmo.org, www.wsmx.org
• SAWSDL - (WSDL-S)
57
Other tools
• Reasoners
– Pellet, Racer, Medius KBS, FACT++, fuzzyDL, KAON2, MSPASS, QuOnto
• Query Languages
– SPARQL, XQUERY, SeRQL, OWL-QL, RDFQuery
• Other Tools for Semantic Web
– Search: SWOOGLE swoogle.umbc.edu
– Collaboration: www.planetont.org
– Other: Jena, SeSAME/SAIL, Mulgara, Eclipse, KOWARI
– Semantic wiki: OntoWiki, SemanticMediaWiki
58
Editors
• Protégé (http://protege.stanford.edu)
• SWOOP (http://mindswap.org/2004/SWOOP)
• Altova SemanticWorks (http://www.altova.com/download/semanticworks/semantic_web_rdf_owl_editor.html)
• SWeDE (http://owl-eclipse.projects.semwebcentral.org/InstallSwede.html), goes with Eclipse
• Medius
• TopBraid Composer and other commercial tools
• Visual Ontology Modeler (VOM) - Sandpiper
• CMAP Ontology Editor (COE) (http://cmap.ihmc.us/coe)
59
What about Earth Science?
• SWEET (Semantic Web for Earth and Environmental Terminology)
– http://sweet.jpl.nasa.gov
– based on GCMD terms
– modular using faceted and integrative concepts
• VSTO (Virtual Solar-Terrestrial Observatory)
– http://vsto.hao.ucar.edu
– captures observational data (from instruments)
– modular using domains
• MMI
– http://marinemetadata.org
– captures aspects of marine data, ocean observing systems
– partly modular, mostly by developed project
• GeoSciML
– http://www.opengis.net/GeoSciML/
– is a GML (Geography ML) application language for Geoscience
– modular, in ‘packages’