Dictionaries, Vocabularies, Namespaces, Thesauri, Ontologies, and all that
Rob Raskin, NASA/Jet Propulsion Laboratory
June 21, 2011
Why care about data semantics?
Current data may need to be archived for decades or centuries
Global change analysis requires consistent comparisons across decades or centuries
Synonyms: multiple words, same meaning
Homonyms: same word, multiple meanings
Measurement ambiguities: sea “surface” temperature, measured at what “height”?
Let’s eat, Grandma. / Let’s eat Grandma.
Time flies like an arrow. / Fruit flies like a pie.
Semantic Understanding is Difficult!
LA Times headline
“Mission accomplished. Major combat operations in Iraq have ended”
-Pres. Bush, 2003
Variable t: temperature vs. Variable t: time
Data quality = 5 vs. Data quality = 3
Surface wind: measured 3 m above surface vs. Surface wind: measured at surface
Semantic Spectrum (figure: a Semantics axis running from Vocabulary, human-readable, to Ontology, machine-readable)
Catalog: list of controlled words
Informal Hierarchy: terms classified by categories (e.g., GCMD)
Formal Hierarchy: terms inherit properties/meaning of parent
Formal Hierarchy w/ Relations: relations between children defined
Scope of Representation
Parameter names
Scientific units
Spatial/temporal extent/resolution
Data quality
Data provenance
Data type
Data services
CF
What is an Ontology?
An approach to store knowledge
Machine-readable and human-readable
Provides definitions of words or phrases expressed relative to other terms
Offers shared understanding of concepts and knowledge reuse
Provides semantics for machine-to-machine (or human-to-human) communications
Practically, an ontology is a framework for classifying knowledge
It ensures there is a “place” to store components of knowledge
Ontology Languages: RDF and OWL
W3C has adopted XML-based languages: Resource Description Framework (RDF) and Web Ontology Language (OWL)
The languages predefine specific tags; RDF (with RDF Schema): class, subclass, property, subproperty
The class-property model is similar to the entity-relationship model of database (DBMS) theory
RDF Class and Subclass
Class: the basic element, “thing”, or “noun”
Subclass: inherits all attributes of its parent class
Typically adds properties to distinguish the subclass from its parent
Can have multiple parent classes
Example (figure): Cat is a Animal; Cat has Legs 4
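A minimal sketch of class/subclass inheritance, assuming plain Python dicts stand in for an RDF store. The second parent "Pet" and the properties "isAlive" and "livesWithHumans" are hypothetical additions used to show multiple parents; "Cat", "Animal", and "hasLegs 4" come from the slide.

```python
from typing import Dict, List, Set

# Direct parent classes of each class (multiple parents are allowed).
# "Pet" is a hypothetical second parent added for illustration.
parents: Dict[str, List[str]] = {
    "Animal": [],
    "Pet": [],
    "Cat": ["Animal", "Pet"],
}

# Properties declared directly on each class (hypothetical except hasLegs).
own_properties: Dict[str, Dict[str, object]] = {
    "Animal": {"isAlive": True},
    "Pet": {"livesWithHumans": True},
    "Cat": {"hasLegs": 4},
}

def ancestors(cls: str) -> Set[str]:
    """All classes reachable by following subclass links upward."""
    seen: Set[str] = set()
    stack = list(parents[cls])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen

def all_properties(cls: str) -> Dict[str, object]:
    """Inherited properties first; the subclass's own properties win on conflict."""
    props: Dict[str, object] = {}
    for a in ancestors(cls):
        props.update(own_properties[a])
    props.update(own_properties[cls])
    return props

print(sorted(ancestors("Cat")))          # ['Animal', 'Pet']
print(all_properties("Cat")["hasLegs"])  # 4
```

The subclass ends up with everything its parents declare plus its own distinguishing property, which is the pattern the slide describes.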
RDF Property & Subproperty
Property: a “verb”
Examples: measures, hasLocation, hasArea, northOf
Properties can have attributes: domain, range, transitive, …
Subproperty: inherits parent attributes such as domain and range
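A sketch of property attributes, assuming a small dataclass in place of a real RDF library. A subproperty falls back to its parent's domain and range. Transitivity is deliberately *not* inherited, because in OWL a subproperty of a transitive property is not automatically transitive. "Place", "directlyNorthOf", and the city facts are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

@dataclass
class Property:
    name: str
    domain: Optional[str] = None
    range: Optional[str] = None
    transitive: bool = False
    parent: Optional["Property"] = None

    def effective(self, key: str) -> Optional[str]:
        """Look up domain/range, falling back to the parent property.

        Only domain and range are resolved this way; transitivity is
        not inherited by subproperties in OWL.
        """
        value = getattr(self, key)
        if value is None and self.parent is not None:
            return self.parent.effective(key)
        return value

def closure(pairs: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Transitive closure: from (a, b) and (b, c), add (a, c) until stable."""
    result = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(result):
            for (c, d) in list(result):
                if b == c and (a, d) not in result:
                    result.add((a, d))
                    changed = True
    return result

north_of = Property("northOf", domain="Place", range="Place", transitive=True)
directly_north_of = Property("directlyNorthOf", parent=north_of)  # hypothetical

print(directly_north_of.effective("domain"))  # Place
facts = {("Oslo", "Berlin"), ("Berlin", "Rome")}
print(("Oslo", "Rome") in closure(facts))     # True
```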
OWL Language
Extends RDF to predefine further tags:
cardinality
transitive relations
inverse relations
same as, different from
union, intersection
domain, range
import (from one ontology to another, to enable sharing and reuse of the work of others)
…
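A sketch of two of these features, inverse relations and cardinality, over plain (subject, property, object) triples. The property names, the satellite facts, and the limit of 1 are hypothetical. One caveat: the cardinality check below is a closed-world count of asserted values. An OWL reasoner works under open-world semantics and, without a "different from" assertion, would instead infer that the two orbit values denote the same individual.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical inverse declaration: hasPart is the inverse of isPartOf.
inverse_of: Dict[str, str] = {"hasPart": "isPartOf"}

def add_inverses(triples: Set[Triple]) -> Set[Triple]:
    """For each (s, p, o) whose p has a declared inverse q, also assert (o, q, s)."""
    out = set(triples)
    for s, p, o in triples:
        if p in inverse_of:
            out.add((o, inverse_of[p], s))
    return out

def max_cardinality_violations(triples: Set[Triple], prop: str, limit: int) -> List[str]:
    """Subjects with more than `limit` distinct asserted values for `prop` (closed-world check)."""
    counts = Counter(s for s, p, _ in triples if p == prop)
    return sorted(s for s, n in counts.items() if n > limit)

kb = {
    ("Satellite1", "hasPart", "Radiometer"),
    ("Satellite1", "hasOrbit", "Polar"),
    ("Satellite1", "hasOrbit", "Geostationary"),
}
print(("Radiometer", "isPartOf", "Satellite1") in add_inverses(kb))  # True
print(max_cardinality_violations(kb, "hasOrbit", 1))                 # ['Satellite1']
```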
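OWL Ontology Example (RDF/XML syntax; the owl:, rdf:, and rdfs: prefixes are assumed to be declared in the enclosing document)
<owl:Class rdf:about="#WaterPollution">
  <rdfs:subClassOf rdf:resource="#Pollution"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasSubstance"/>
      <owl:allValuesFrom rdf:resource="#Water"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>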
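A sketch of what the WaterPollution axiom lets a reasoner conclude. First, every WaterPollution is a Pollution (subclass). Second, every hasSubstance value of a WaterPollution is a Water (allValuesFrom). The predicate "type" stands in for rdf:type, and "event1" and "substanceA" are hypothetical individuals. A single pass is enough for this one axiom.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def infer(triples: Set[Triple]) -> Set[Triple]:
    """Apply the subclass and allValuesFrom consequences of the WaterPollution axiom."""
    out = set(triples)
    for s, p, o in triples:
        if p == "type" and o == "WaterPollution":
            out.add((s, "type", "Pollution"))          # subClassOf Pollution
            for s2, p2, o2 in triples:
                if s2 == s and p2 == "hasSubstance":
                    out.add((o2, "type", "Water"))     # allValuesFrom Water
    return out

facts = {("event1", "type", "WaterPollution"), ("event1", "hasSubstance", "substanceA")}
inferred = infer(facts)
print(("event1", "type", "Pollution") in inferred)   # True
print(("substanceA", "type", "Water") in inferred)   # True
```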
Statements about Statements
OWL allows us to make statements about statements, such as:
Degree of belief
Timestamps
Provenance / Lineage
Probability / Uncertainty
Security issues
Author / Source / Community
Community dialect
…
Example (figure): ObservedFeature is a Corn Crop; that statement has Source Landsat and has Probability 0.75
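A sketch of one standard way to do this in RDF, reification. The statement "ObservedFeature is a Corn Crop" becomes a node of its own, and further triples can then describe it. The rdf:Statement, rdf:subject, rdf:predicate, and rdf:object terms are real RDF vocabulary. The node name "stmt1" and the predicates "isA", "hasSource", and "hasProbability" are hypothetical short names.

```python
from typing import List, Tuple

Triple = Tuple[str, str, object]

def reify(stmt_id: str, s: str, p: str, o: str) -> List[Triple]:
    """The four standard RDF reification triples for the statement (s, p, o)."""
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]

# Statement node plus metadata about the statement (source, probability).
kb = reify("stmt1", "ObservedFeature", "isA", "CornCrop")
kb += [("stmt1", "hasSource", "Landsat"), ("stmt1", "hasProbability", 0.75)]

def about(stmt_id: str, triples: List[Triple]) -> dict:
    """Collect every predicate/value recorded for one statement node."""
    return {p: o for s, p, o in triples if s == stmt_id}

meta = about("stmt1", kb)
print(meta["hasSource"], meta["hasProbability"])  # Landsat 0.75
```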
Ontologies provide a common namespace
Documents, web pages, data, people, and other resources can be mapped/categorized to this namespace
Anybody can create or extend the namespace
Why are Ontologies Useful? (1)
Dictionary
Concepts in the namespace are not just “listed” (a taxonomy) but “defined” (in terms of others)
Concepts are defined via specializations of broader concepts, with properties that distinguish each child from the broader parent concept
Reductionist approach of science
Arbitrary levels of specialization are possible, as with the Library of Congress and Dewey Decimal numbering systems
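A sketch of "defined, not just listed". Each concept names its broader parent plus the properties that distinguish it, and a definition is read off the hierarchy. The concepts and distinguishing properties below are hypothetical illustrations, not SWEET content.

```python
from typing import Dict, List, Optional, Tuple

# concept -> (broader parent concept, distinguishing properties); hypothetical entries
concepts: Dict[str, Tuple[Optional[str], List[str]]] = {
    "Phenomenon": (None, []),
    "WeatherPhenomenon": ("Phenomenon", ["occurs in the atmosphere"]),
    "Storm": ("WeatherPhenomenon", ["has strong winds"]),
    "Hurricane": ("Storm", ["forms over tropical ocean"]),
}

def define(term: str) -> str:
    """Definition as 'a <parent> that <distinguishing properties>'."""
    parent, differentiae = concepts[term]
    if parent is None:
        return term
    return f"{term}: a {parent} that " + " and ".join(differentiae)

def lineage(term: str) -> List[str]:
    """Chain from the term up to the root; any depth of specialization works."""
    chain = [term]
    parent = concepts[term][0]
    while parent is not None:
        chain.append(parent)
        parent = concepts[parent][0]
    return chain

print(define("Hurricane"))   # Hurricane: a Storm that forms over tropical ocean
print(lineage("Hurricane"))  # ['Hurricane', 'Storm', 'WeatherPhenomenon', 'Phenomenon']
```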
Why are Ontologies Useful? (2)
Disambiguation: reduces semantic mismatch
Synonym support (multiple terms with the same meaning): a label is available to indicate the preferred term for each community
Homonym support (multiple meanings of the same term): separate namespaces (President:Bush vs. Plant:Bush)
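A sketch of both kinds of disambiguation. Homonyms in separate namespaces expand to distinct identifiers. Synonyms resolve to one concept, which carries a preferred label per community. The example.org namespace URIs, the community names, and the label table are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical namespace prefixes mapped to base URIs.
namespaces: Dict[str, str] = {
    "President": "http://example.org/president#",
    "Plant": "http://example.org/plant#",
}

def expand(curie: str) -> str:
    """Expand a prefix:name term into a full identifier."""
    prefix, local = curie.split(":", 1)
    return namespaces[prefix] + local

# Synonyms: several labels point at one concept.
concept_of: Dict[str, str] = {
    "SST": "SeaSurfaceTemperature",
    "sea surface temperature": "SeaSurfaceTemperature",
}

# Preferred label per (concept, community); hypothetical communities.
preferred: Dict[Tuple[str, str], str] = {
    ("SeaSurfaceTemperature", "oceanographers"): "sea surface temperature",
    ("SeaSurfaceTemperature", "modelers"): "SST",
}

def preferred_label(term: str, community: str) -> str:
    return preferred[(concept_of[term], community)]

print(expand("President:Bush") == expand("Plant:Bush"))  # False
print(preferred_label("SST", "oceanographers"))          # sea surface temperature
```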
Why are Ontologies Useful? (3)
Why are Ontologies Useful? (4)
Machine readable
Ontologies are generally stored in a format (XML) that is readable by both humans and computers
Computer accessibility enables automated reasoning
Knowledge retention
Corporations use knowledge management to ensure institutional memory over time, as personnel come and go
Climate disciplines can do the same!
Facts/data can be represented and related in a consistent manner
Common sense knowledge is captured
Instrument characteristics
Why are Ontologies Useful? (5)
Ontology Representation (1): Knowledge Base of Triples
Noun-Verb-Noun representation
Parent-child relations:
Flood is a Weather Phenomena
GeoTIFF is a File Format
Soil Type is a Physical Property
Pacific Ocean is a Ocean
Or create your own relations:
Ocean has substance Water
Sensor measures Temperature
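A sketch of a knowledge base built from the triples on this slide. A wildcard query finds matching triples. A simple one-level rule then lets a subject pick up the relations of whatever it "is a". The query function and the inheritance rule are illustrative assumptions, not the behavior of a specific reasoner.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

kb: Set[Triple] = {
    ("Flood", "is a", "Weather Phenomena"),
    ("GeoTIFF", "is a", "File Format"),
    ("Soil Type", "is a", "Physical Property"),
    ("Pacific Ocean", "is a", "Ocean"),
    ("Ocean", "has substance", "Water"),
    ("Sensor", "measures", "Temperature"),
}

def query(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(t for t in kb
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

def inherited(subject: str) -> Set[Triple]:
    """Relations a subject picks up from its direct 'is a' parents (one level)."""
    out: Set[Triple] = set()
    for _, _, parent in query(subject, "is a"):
        for _, p, o in query(parent):
            if p != "is a":
                out.add((subject, p, o))
    return out

print(query(p="is a", o="Ocean"))  # [('Pacific Ocean', 'is a', 'Ocean')]
print(inherited("Pacific Ocean"))  # {('Pacific Ocean', 'has substance', 'Water')}
```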
Ontology Representation (2): Visual
Ontology Representation (3): XML, RDF, and OWL
W3C has adopted XML-based standard ontology languages
Resource Description Framework (RDF)
Web Ontology Language (OWL)
Languages predefine specific tags:
RDF: class, subclass, property, subproperty, …
OWL: extends RDF to predefine further tags, such as cardinality
Three flavors of OWL (Lite, DL, and Full)
Use of standard languages makes it easy to extend (specialize) the work of others
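Standard formats are what make the work of others easy to reuse. As a small illustration, here is a sketch that writes triples in N-Triples, one of the W3C's standard line-based RDF serializations (RDF/XML and Turtle are others). The http://example.org/onto# base IRI is a hypothetical placeholder. The rdfs:subClassOf IRI is the real RDF Schema one.

```python
from typing import List, Tuple

BASE = "http://example.org/onto#"  # hypothetical base IRI
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

def iri(name: str) -> str:
    """Wrap a full IRI in angle brackets, or resolve a local name against BASE."""
    return f"<{name}>" if name.startswith("http") else f"<{BASE}{name}>"

def to_ntriples(triples: List[Tuple[str, str, str]]) -> str:
    """One 'subject predicate object .' line per triple."""
    return "\n".join(f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples)

doc = to_ntriples([
    ("Cat", RDFS_SUBCLASS, "Animal"),
    ("Sensor", "measures", "Temperature"),
])
print(doc)
```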
Global Warming Query in the Semantic Web
Find data which demonstrates global warming at high latitudes during summertime and plot warming rate.
Extract information from the use case (encode knowledge)
Translate this into a complete query for data (inference and integration of data from instruments, indices, and models)
“Global warming” = trend of increasing temperature over large spatial scales
“High latitude” = |Latitude| > 60 degrees
“Summertime” = June-Aug (NH) and Jan-Mar (SH)
“Find data” = locate datasets using catalogs, then access and read them
“Plot warming rate” = display temperature vs. time
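A sketch of turning these definitions into executable filters. It assumes each record is a hypothetical (year, month, latitude, temperature) tuple. It also assumes "warming rate" is the ordinary least-squares slope of temperature against time. The season months follow the definitions above as written.

```python
from typing import List, Tuple

Record = Tuple[int, int, float, float]  # (year, month, latitude, temperature); hypothetical layout

def is_high_latitude(lat: float) -> bool:
    return abs(lat) > 60

def is_summertime(month: int, lat: float) -> bool:
    # June-Aug in the Northern Hemisphere, Jan-Mar in the Southern Hemisphere
    return month in (6, 7, 8) if lat >= 0 else month in (1, 2, 3)

def select(records: List[Record]) -> List[Tuple[float, float]]:
    """Keep high-latitude summertime records as (time, temperature) points."""
    return [(year, temp) for year, month, lat, temp in records
            if is_high_latitude(lat) and is_summertime(month, lat)]

def warming_rate(points: List[Tuple[float, float]]) -> float:
    """Least-squares slope of temperature vs. time (degrees per year)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den

records = [
    (2000, 7, 70.0, 1.0), (2001, 7, 70.0, 1.5), (2002, 7, 70.0, 2.0),
    (2001, 7, 45.0, 9.9),    # excluded: not high latitude
    (2001, 12, 70.0, -20.0), # excluded: not NH summertime
]
print(warming_rate(select(records)))  # 0.5
```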
Semantic Web for Earth and Environmental Terminology (SWEET)
Concept space written in OWL
Initial focus: assist search for data resources
Funded by NASA
Later focus: serve as a community standard (upper-level Earth system science ontology)
Enables scalable classification of Earth system science and associated data concepts
Specialists can further refine SWEET concepts
SWEET 2.2 has 6600 concepts in 200 modular ontologies
http://sweet.jpl.nasa.gov
SWEET Top-Level View
CF vs SWEET Representation
CF (traditional single-attribute parameter name): tendency_of_mole_concentration_of_dissolved_inorganic_phosphorus_in_sea_water_due_to_biological_processes
SWEET (multi-attribute parameter name):
Quantity = mole_concentration
Transformation = tendency
State = dissolved, inorganic
Substance = phosphorus
Medium = sea_water
Process = biological_processes
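A sketch of how the two representations relate: the SWEET-style facets are assembled into the CF-style single name. The assembly pattern is inferred from this one example only. It is not taken from the CF standard-name guidelines, and other names may order their parts differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Facets:
    quantity: str
    substance: str
    medium: str
    process: str
    state: List[str] = field(default_factory=list)
    transformation: Optional[str] = None

def cf_name(f: Facets) -> str:
    """Assemble a CF-style name; the pattern is inferred from a single example."""
    what = "_".join(f.state + [f.substance])
    name = f"{f.quantity}_of_{what}_in_{f.medium}_due_to_{f.process}"
    return f"{f.transformation}_of_{name}" if f.transformation else name

phosphorus = Facets(quantity="mole_concentration", substance="phosphorus",
                    medium="sea_water", process="biological_processes",
                    state=["dissolved", "inorganic"], transformation="tendency")
print(cf_name(phosphorus))
```

Keeping the facets separate means a search can match on one attribute, for example every parameter whose medium is sea_water. A single long CF-style name would instead have to be parsed to get at that attribute.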
SWEET Data Ontology
Dataset characteristics: format, data model, dimensions, …
Provenance: source, processing history, …
Parameters: scale factors, offsets, …
Data services: subsetting, reprojection, …
Quality measures
Special values: missing, land, sea, ice, ...
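A sketch of how this metadata might be used to read packed values: apply the scale factor and offset, and translate special values into labels. The attribute names and the formula follow the common netCDF/CF packing convention, value = packed * scale_factor + add_offset. The particular special-value codes are hypothetical.

```python
from typing import Dict, List, Union

meta = {
    "scale_factor": 0.01,
    "add_offset": 273.15,
    # hypothetical codes for the special values listed above
    "special_values": {-999: "missing", -998: "land", -997: "sea", -996: "ice"},
}

def unpack(raw: List[int], meta: Dict) -> List[Union[float, str]]:
    """Map special codes to labels; unpack everything else with scale and offset."""
    specials: Dict[int, str] = meta["special_values"]
    return [specials[v] if v in specials
            else v * meta["scale_factor"] + meta["add_offset"]
            for v in raw]

print(unpack([150, -999, -998], meta))
```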
Best Practices
Keep ontologies small and modular
Use higher-level ontologies where possible
Identify a hierarchy of concept spaces
Try to keep dependencies unidirectional
Gain community buy-in
Involve respected leaders
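A sketch of checking "keep dependencies unidirectional". Each ontology module's imports are treated as graph edges, and a depth-first search reports an import cycle if one exists. The module names are hypothetical.

```python
from typing import Dict, List, Optional

def find_cycle(imports: Dict[str, List[str]]) -> Optional[List[str]]:
    """Depth-first search over module imports; returns one cycle, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {m: WHITE for m in imports}
    path: List[str] = []

    def visit(m: str) -> Optional[List[str]]:
        color[m] = GRAY
        path.append(m)
        for dep in imports.get(m, []):
            state = color.get(dep, WHITE)
            if state == GRAY:                   # back edge: dep is on the current path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found is not None:
                    return found
        path.pop()
        color[m] = BLACK
        return None

    for m in imports:
        if color[m] == WHITE:
            found = visit(m)
            if found is not None:
                return found
    return None

# Hypothetical module layouts.
layered = {"phenomena": ["process", "property"], "process": ["property"], "property": []}
cyclic = {"phenomena": ["process"], "process": ["phenomena"]}
print(find_cycle(layered))  # None
print(find_cycle(cyclic))   # ['phenomena', 'process', 'phenomena']
```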
Ontology Development Tools: CMAP
Free, downloadable tool for knowledge representation and ontology development
Visual language with import/export to OWL
Supports a subset of the OWL language
http://cmap.ihmc.us/coe
Resources
ESIP Semantic Web Cluster: monthly telecons, tutorials, ontology development (datatypes, data services)
SWEET http://sweet.jpl.nasa.gov
Rob Raskin