1
Foundations I: Methodologies, Knowledge Representation
Deborah McGuinness and Peter Fox (NCAR)
CSCI-6962-01
Week 2, 2008
Review of reading Assignment 1
• Ontologies 101, Semantic Web, e-Science, RDFS, Common Logic
• Any comments, questions?
2
Contents
• Review of methodologies
• Elements of KR in semantic web context
• And in e-Science
• Choices of representation, models
• Examples of KR
• Encoding and understanding representations
• Assignment 1
3
4
Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
Use Case
Small Team, mixed skills
Analysis
Adopt Technology Approach
Leverage Technology
Infrastructure
Rapid Prototype
Open World: Evolve, Iterate,
Redesign, Redeploy
Use Tools
Science/Expert Review & Iteration
Develop model/
ontology
KR and methodologies
• Procedural Knowledge: Knowledge is encoded in functions/procedures. For example:
    function Person(X) return boolean is
      if (X = "Socrates") or (X = "Hillary") then return true else return false;
  Or
    function Mortal(X) return boolean is return Person(X);
• Networks: A compromise between declarative and procedural schemes. Knowledge is represented in a labeled, directed graph whose nodes represent concepts and entities, while its arcs represent relationships between these entities and concepts.
• Frames: Much like a semantic network, except that each node represents a prototypical concept and/or situation. Each node has several property slots whose values may be specified or inherited by default.
• Logic: A way of declaratively representing knowledge. For example:
  – person(Socrates).
  – person(Hillary).
  – forall X [person(X) ---> mortal(X)]
  – DL, FOL, SOL
5
KR and methodologies
• Decision Trees: Concepts are organized in the form of a tree.
• Statistical Knowledge: The use of certainty factors, Bayesian Networks, Dempster-Shafer Theory, Fuzzy Logics, etc.
• Rules: The use of Production Systems to encode condition-action rules (as in expert systems).
• Parallel Distributed Processing: The use of connectionist models.
• Subsumption Architectures: Behaviors are encoded (represented) using layers of simple (numeric) finite-state machine elements.
• Hybrid Schemes: Any representation formalism employing a combination of KR schemes.
6
Remember, in science!
• Some of the knowledge is lost when it is placed into any particular structure, or may not be reusable (e.g. Frames)
• So, you may ask something that cannot be answered or inferred
• Knowledge evolves, i.e. changes
• Knowledge and understanding are very often context dependent (and discipline, language, and skill-level dependent, and …)
7
And, if you are used to logic
• You are working mostly within the world of logic, whereas we are trying to represent knowledge with logic and we are usually dealing with tangible objects, such as trees, clouds, rock, storms, etc.
• Because of this, we have to be very careful when translating real things into logical symbols - this can, surprisingly, be a difficult challenge.
• Consider your method of representation (yes, we do want to compute with it)
8
Thus
• A person who wants to encode knowledge needs to decouple the ambiguities of interpretation from the mathematical certainty of (any form of) logic.
• The nature of interpretation is critical in formal knowledge representation and is carefully formalized by KR scientists in order to guarantee that no ambiguity exists in the logical structure of the represented knowledge.
9
Representing Knowledge With Objects
• Take all individuals that we need to keep track of and place them into different buckets based on how similar they are to each other. Each bucket is given a descriptive name based on what objects it contains.
• Since the individuals in a given bucket are at least somewhat similar, we can avoid needing to describe every inconsequential detail about each individual. Instead, properties that are common to all individuals in a bucket can just be assigned to the entire bucket at once. Properties are typically either primitive values (such as numbers or text strings) or may be references to other buckets.
10
Representing Knowledge With Objects
• Some buckets will be more similar to each other than others and we can arrange the buckets into a hierarchy based on the similarity.
• If all buckets in a branch in the tree of buckets share a property, the information can be further simplified by assigning the property only to the parent bucket. Other buckets (and individuals) are said to inherit that property.
• Buckets may have different names: e.g. Classes, Frames, or Nodes
• BUT, once we move to (e.g.) DL, not all object rules apply, e.g. cannot override properties
• Multiple inheritance is not always obvious to people
11
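The bucket hierarchy and property inheritance described on these two slides can be sketched as follows. The class name Bucket, its get method, and the example buckets are hypothetical, chosen only for illustration.

```python
class Bucket:
    """A named bucket (class/frame/node) with an optional parent bucket."""

    def __init__(self, name, parent=None, **properties):
        self.name = name
        self.parent = parent
        self.properties = dict(properties)

    def get(self, prop):
        """Look up a property locally, then along the parent chain."""
        bucket = self
        while bucket is not None:
            if prop in bucket.properties:
                return bucket.properties[prop]
            bucket = bucket.parent
        raise KeyError(prop)


# Hypothetical buckets, used only to show inheritance.
instrument = Bucket("Instrument", has_operating_mode=True)
optical = Bucket("OpticalInstrument", parent=instrument, has_focal_length=True)
interferometer = Bucket("Interferometer", parent=optical)

# The property is assigned once, on the parent bucket, and inherited.
print(interferometer.get("has_operating_mode"))
```

In this sketch a value set on a child bucket would shadow the parent's value. That is frame-style overriding, which, as the slide notes, does not carry over to DL.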
Re-enter Semantic Web
• At its core, the Semantic Web can be thought of as a methodology for linking up pieces of structured and unstructured information into commonly shared description-logic ontologies.
12
13
Semantic Web Layers
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
14
Elements of KR in Semantic Web
• Declarative Knowledge
• Statements as triples: {subject-predicate-object}
interferometer is-a optical instrument
Fabry-Perot is-a interferometer
Optical instrument has focal length
Optical instrument is-a instrument
Instrument has instrument operating mode
Instrument has measured parameter
Instrument operating mode has measured parameter
NeutralTemperature is-a temperature
Temperature is-a parameter
• A query: select all optical instruments which have operating mode vertical
• An inference: infer operating modes for a Fabry-Perot Interferometer which measures neutral temperature
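A minimal sketch of how the triples on this slide support a query and a simple inference. The identifiers are the slide's terms written without spaces; the helper functions ancestors and inherited_properties are assumptions of this sketch, and a real system would use an RDF store and a reasoner instead.

```python
# The slide's statements as (subject, predicate, object) tuples.
triples = [
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("FabryPerot", "is-a", "Interferometer"),
    ("OpticalInstrument", "has", "FocalLength"),
    ("OpticalInstrument", "is-a", "Instrument"),
    ("Instrument", "has", "InstrumentOperatingMode"),
    ("Instrument", "has", "MeasuredParameter"),
    ("InstrumentOperatingMode", "has", "MeasuredParameter"),
    ("NeutralTemperature", "is-a", "Temperature"),
    ("Temperature", "is-a", "Parameter"),
]


def ancestors(term):
    """All classes reachable from term by following is-a arcs."""
    found, frontier = set(), [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "is-a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inherited_properties(term):
    """Objects of 'has' triples on term or on any of its ancestors."""
    classes = {term} | ancestors(term)
    return {o for s, p, o in triples if p == "has" and s in classes}


# Query: every term that is, directly or indirectly, an optical instrument.
terms = {s for s, _, _ in triples}
optical = sorted(t for t in terms if "OpticalInstrument" in ancestors(t))
print(optical)

# Inference: a Fabry-Perot has an operating mode although no triple says so.
print(sorted(inherited_properties("FabryPerot")))
```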
15
Ontology Spectrum
(from least to most expressive)
Catalog/ID
Terms/glossary
Thesauri “narrower term” relation
Informal is-a
Formal is-a
Formal instance
Frames (properties)
Value Restrs.
Selected Logical Constraints (disjointness, inverse, …)
General Logical constraints
Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
16
OWL or RDF or SWRL?
• In representing knowledge you will need to balance expressivity with implementability
  – OWL (Lite, DL, Full)
  – RDF and RDFS
  – Rules, e.g. SWRL
• You will need to consider the sources of your knowledge
• You will need to consider what you want to do with the represented knowledge
17
The knowledge base
• Using, Re-using, Re-purposing, Extending, Subsetting
• Approach:
  – Bottom-up (instance level or vocabularies)
  – Top-down (upper-level or foundational)
  – Mid-level (use case)
• Coding and testing (understanding)
• Using tools (some this class, more over the next two classes)
• Iterating (later)
• Maintaining and evolving (curation, preservation) (later)
18
‘Collecting’ the ‘data’
• Part of the (meta)data information is present in tools ... but thrown away at output
  – e.g., a business chart can be generated by a tool: it ‘knows’ the structure, the classification, etc. of the chart, but, usually, this information is lost
  – storing it in web data would be easy!
• SW-aware tools are around (even if you do not know it...), though more would be good:
  – Photoshop CS stores metadata in RDF in, say, jpg files (using XMP)
  – RSS 1.0 feeds are generated by (almost) all blogging systems (a huge amount of RDF data!)
• Scraping - different tools, services, etc., come around every day:
  – get RDF data associated with images, for example: service to get RDF from flickr images
  – service to get RDF from XMP
  – XSLT scripts to retrieve microformat data from XHTML files
  – RSS scraping in use in VO projects in Japan
  – scripts to convert spreadsheets to RDF
• SQL - A huge amount of data in Relational Databases
  – Although tools exist, it is not feasible to convert that data into RDF
  – Instead: SQL ⇋ RDF ‘bridges’ are being developed: a query to RDF data is transformed into SQL on-the-fly
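The idea of a SQL ⇋ RDF bridge, where a triple pattern is translated into SQL at query time instead of converting the whole database, can be sketched with the standard sqlite3 module. The table, its columns, and the predicate-to-column mapping are hypothetical; real bridges use far richer mappings.

```python
import sqlite3

# Hypothetical relational table; the names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instrument (id TEXT PRIMARY KEY, kind TEXT, site TEXT)")
conn.executemany(
    "INSERT INTO instrument VALUES (?, ?, ?)",
    [("fpi1", "FabryPerot", "MillstoneHill"), ("radar1", "Radar", "MillstoneHill")],
)

# Assumed mapping: each predicate corresponds to one column.
COLUMN_FOR_PREDICATE = {"kind": "kind", "site": "site"}


def match(predicate, obj=None):
    """Answer the pattern (?s, predicate, obj) by generating SQL on the fly."""
    column = COLUMN_FOR_PREDICATE[predicate]  # whitelisted, safe to interpolate
    sql = f"SELECT id, {column} FROM instrument"
    params = ()
    if obj is not None:
        sql += f" WHERE {column} = ?"
        params = (obj,)
    sql += " ORDER BY id"
    # Rows come back as triples, but the data never leaves the database.
    return [(s, predicate, o) for s, o in conn.execute(sql, params)]


print(match("kind", "FabryPerot"))
```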
19
More Collecting
• RDFa (formerly known as RDF/A) extends XHTML by:
  – extending the link and meta to include child elements
  – adding metadata to any elements (a bit like the class in microformats, but via dedicated properties)
• It is very similar to microformats, but with more rigor:
  – it is a general framework (instead of an “agreement” on the meaning of, say, a class attribute value)
  – terminologies can be mixed more easily
• GRDDL - Gleaning Resource Descriptions from Dialects of Languages
• ATOM (used with RSS)
20
GRDDL - bottom up
• GRDDL - Gleaning Resource Descriptions from Dialects of Languages
• Pretty much = “XML/XHTML (for e.g.) into RDF via XSLT”
• Good support, e.g. Jena
• Handles microformats
• Active community
• How to categorize, use, re-use (parts of)?
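The "XHTML into RDF" step can be illustrated with a small Python sketch. Note the substitution: GRDDL itself names an XSLT transformation, whereas this sketch uses the standard-library ElementTree parser to glean triples from a microformat-style snippet. The markup, the function glean, and the person and organization names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment using hCard-style class names.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="vcard" id="p1">
      <span class="fn">Example Person</span>
      <span class="org">Example Org</span>
    </div>
  </body>
</html>"""

NS = "{http://www.w3.org/1999/xhtml}"


def glean(xhtml):
    """Turn each class-annotated span inside a vcard div into a triple."""
    root = ET.fromstring(xhtml)
    triples = []
    for card in root.iter(NS + "div"):
        if card.get("class") != "vcard":
            continue
        subject = "#" + card.get("id")
        for span in card.iter(NS + "span"):
            triples.append((subject, span.get("class"), span.text))
    return triples


print(glean(XHTML))
```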
21
22
Foundational Ontologies
CONTENTS
General concepts and relations that apply in all domains: physical object, process, event, …, inheres, participates, …
Rigorously defined: formal logic, philosophical principles, highly structured
Examples: DOLCE, BFO, GFO, SUMO, CYC, (Sowa)
Courtesy: Boyan Brodaric
23
Foundational Ontologies
PURPOSE: help integrate domain ontologies
Geophysics ontology
Marine ontology
Water ontology
Planetary ontology
Geology ontology
Struc ontology
Rock ontology
“…and then there was one…”
Foundational ontology
Courtesy: Boyan Brodaric
24
Foundational Ontologies
PURPOSE: help organize domain ontologies
“…a place for everything, and everything in its place…”
Foundational ontology
shale rock formation lithification
Courtesy: Boyan Brodaric
25
Problem scenario
Little work done on linking foundational ontologies with geoscience ontologies
Such linkage might benefit various scenarios requiring cross-disciplinary knowledge, e.g.:
water budgets: groundwater (geology) and surface water (hydro)
hazards risk: hazard potential (geology, geophysics) and items at threat (infrastructure, people, environment, economic)
health: toxic substances (geochemistry) and people, wildlife
many others…
Courtesy: Boyan Brodaric
26
DOLCE
27
SUMO - Suggested Upper Merged Ontology
• Physical
  • Object
    • SelfConnectedObject
      • ContinuousObject
      • CorpuscularObject
    • Collection
  • Process
• Abstract
  • SetClass
    • Relation
  • Proposition
  • Quantity
    • Number
    • PhysicalQuantity
  • Attribute
28
29
30
Using SNAP/ SPAN
31
DOLCE + SWEET
DOLCE    = SWEET    < SWEET
Physical-body BodyofGround, BodyofWater,…
Material-Artifact Infrastructure, Dam, Product,…
Physical-Object LivingThing, MarineAnimal
Amount-of-Matter Substance
Activity HumanActivity
Physical-Phenomenon Phenomena
Process Process
State StateOfMatter
Quality Quantity, Moisture,…
Physical-Region Basalt,…
Temporal-Region Ordovician,…
Benefits
  full coverage
  rich relations
  home for orphans
  single superclasses
Issues
  individuals (e.g. Planet Earth)
  roles (contaminant)
  features (SeaFloor)
Courtesy: Boyan Brodaric
32
Conclusions
Surprisingly good fit amongst ontologies: so far, no show-stopper conflicts, a few difficult conflicts
DOLCE richness benefits geoscience ontologies: good conceptual foundation helps clear some existing problems
Unresolved issues in modeling science entities: modeling classifications, interpretations, theories, models, …
Courtesy: Boyan Brodaric
Same procedure with GeoSciML
33
34
SWEET 2.0 Modular Design
Math, Time, Space
Basic Science
Geoscience Processes
Geophysical Phenomena
Applications
importation
• Supports easy extension by domain specialists
• Organized by subject (theoretical to applied)
• Reorganization of classes, but no significant changes to content
• Importation is unidirectional
35
SWEET 2.0 Ontologies
36
Using SWEET
• Plug-in (import) domain detailed modules
• Lots of classes, few relations (properties)
• Version 2.0 is re-usable and extensible
37
GeoSciOnt?
38
Mix-n-Match
• The hybrid example:
– Collect a lot of different ontologies representing different terms, levels of concepts, etc. into a base form: RDF
39
CF attributes
SWEET Ontologies (OWL)
Search Terms
CF Standard Names (RDF object)
IRIDL Terms
NC basic attributes
IRIDL attributes/objects
SWEET as Terms
CF Standard Names as Terms
Gazetteer Terms
CF data objects
Location
Blumenthal
40
Data Servers
Ontologies
MMI
JPL
Standards Organizations
Start Point
RDF Crawler
RDFS Semantics
OWL Semantics
SWRL Rules
SeRQL CONSTRUCT
Search Queries
LocationCanonicalizer
TimeCanonicalizer
Sesame
Search Interface
bibliography
IRI RDF Architecture
Blumenthal
41
Mid-Level: Developing ontologies
• Use cases and small team (7-8; 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe)
• Identify classes and properties (leverage controlled vocab.)
  – Start with narrower terms, generalize when needed or possible
  – Adopt a suitable conceptual decomposition (e.g. SWEET)
  – Import modules when concepts are orthogonal
• Review, vet, publish
• Only code them (in RDF or OWL) when needed (CMAP, …)
• Ontologies: small and modular
42
Use Case example
• Plot the neutral temperature from the Millstone-Hill Fabry-Perot, operating in the vertical mode during January 2000 as a time series.
• Plot the neutral temperature from the Millstone-Hill Fabry-Perot, operating in the non-vertical mode during January 2000 as a time series.
• Objects:
  – Neutral temperature is a (temperature is a) parameter
  – Millstone Hill is a (ground-based observatory is a) observatory
  – Fabry-Perot is a interferometer is a optical instrument is a instrument
  – Non-vertical mode is a instrument operating mode
  – January 2000 is a date-time range
  – Time is a independent variable/coordinate
  – Time series is a data plot is a data product
43
Class and property example
• Parameter
  – Has coordinates (independent variables)
• Observatory
  – Operates instruments
• Instrument
  – Has operating mode
• Instrument operating mode
  – Has measured parameters
• Date-time interval
• Data product
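The classes and properties listed on this slide can be written down as Python dataclasses to see how they connect. This is a sketch only: the attribute names and the helper modes_measuring are assumptions, and the instances are drawn from the preceding use case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    coordinates: List[str] = field(default_factory=list)  # independent variables


@dataclass
class OperatingMode:
    name: str
    measured_parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Instrument:
    name: str
    operating_modes: List[OperatingMode] = field(default_factory=list)


@dataclass
class Observatory:
    name: str
    instruments: List[Instrument] = field(default_factory=list)


# Instances taken from the use case.
neutral_temperature = Parameter("NeutralTemperature", coordinates=["time"])
vertical = OperatingMode("vertical", [neutral_temperature])
fabry_perot = Instrument("FabryPerot", [vertical])
millstone_hill = Observatory("MillstoneHill", [fabry_perot])


def modes_measuring(observatory, parameter_name):
    """Return (instrument, mode) name pairs that measure the named parameter."""
    return [
        (instrument.name, mode.name)
        for instrument in observatory.instruments
        for mode in instrument.operating_modes
        if any(p.name == parameter_name for p in mode.measured_parameters)
    ]


print(modes_measuring(millstone_hill, "NeutralTemperature"))
```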
44
45
46
47
Higher level use case
• Find data which represents the state of the neutral atmosphere above 100km, toward the arctic circle at any time of high geomagnetic activity
48
Extending the KR for a purpose
Input
Physical properties: State of neutral atmosphere
Spatial:
• Above 100km
• Toward arctic circle (above 45N)
Conditions:
• High geomagnetic activity
Action: Return Data
Specification needed for query to CEDARWEB
Instrument
Parameter(s)
Operating Mode
Observatory
Date/time
Return-type: data
GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
Kp is a GeophysicalIndex
hasTemporalDomain: “daily”
hasHighThreshold: xsd_number = 8
Date/time when Kp >= 8
49
Translating the Use-Case - ctd.
Input
Physical properties: State of neutral atmosphere
Spatial:
Above 100km
Toward arctic circle (above 45N)
Conditions:
High geomagnetic activity
Action: Return Data
Specification needed for query to CEDARWEB
Instrument
Parameter(s)
Operating Mode
Observatory
Date/time
Return-type: data
NeutralAtmosphere is a subRealm of TerrestrialAtmosphere
hasPhysicalProperties: NeutralTemperature, Neutral Wind, etc.
hasSpatialDomain: [0,360],[0,180],[100,150]
hasTemporalDomain:
NeutralTemperature is a Temperature (which) is a Parameter
FabryPerotInterferometer is a Interferometer, (which) is a Optical Instrument (which) is a Instrument
hasFilterCentralWavelength: Wavelength
hasLowerBoundFormationHeight: Height
ArcticCircle is a GeographicRegion
hasLatitudeBoundary:
hasLatitudeUpperBoundary:
GeoMagneticActivity has ProxyRepresentation
GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere)
Kp is a GeophysicalIndex
hasTemporalDomain: “daily”
hasHighThreshold: xsd_number = 8
Date/time when Kp >= 8
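The translation step on these slides, from "high geomagnetic activity" to "dates when Kp >= 8", can be sketched as below. The daily Kp values are invented for illustration, and the query field names are assumptions; they are not the CEDARWEB interface.

```python
from datetime import date

KP_HIGH_THRESHOLD = 8  # hasHighThreshold from the slide

# Hypothetical daily Kp values, invented for this sketch.
daily_kp = {
    date(2000, 1, 10): 3,
    date(2000, 1, 11): 8,
    date(2000, 1, 12): 9,
    date(2000, 1, 13): 5,
}


def high_activity_dates(kp_by_day, threshold=KP_HIGH_THRESHOLD):
    """Dates on which the proxy index meets or exceeds the threshold."""
    return sorted(day for day, kp in kp_by_day.items() if kp >= threshold)


def build_query(kp_by_day):
    """Assemble the fields a data query would need (field names are assumed)."""
    return {
        "parameter": "NeutralTemperature",
        "min_altitude_km": 100,   # "above 100km"
        "min_latitude_deg": 45,   # "toward arctic circle (above 45N)"
        "dates": high_activity_dates(kp_by_day),
        "return_type": "data",
    }


print(build_query(daily_kp)["dates"])
```

The vague condition in the use case becomes a concrete date filter only because the ontology records Kp as a proxy for geomagnetic activity together with its threshold.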
50
Knowledge representation - visual
• UML – Unified Modeling Language
  – Ontology Definition Metamodel/Meta Object Facility (OMG) for UML
  – Provides standardized notation
• CMAP Ontology Editor (concept mapping tool from IHMC - http://cmap.ihmc.us/coe )
  – Drag/drop visual development of classes, subclass (is-a) and property relationships
  – Reads and writes OWL
  – Formal convention (OWL/RDF tags, etc.)
• White board, text file
51
52
Representing processes
53
Is OWL/RDF the only option? No…
• SKOS - Simple Knowledge Organization System, for taxonomies http://www.w3.org/2004/02/skos/
• Annotations (RDFa) – for un- or semi-structured information sources http://www.w3.org/TR/xhtml-rdfa-primer/ http://rdfa.info
• Atom (and RSS) – for representing syndication feeds – structured http://tools.ietf.org/html/rfc4287
54
Use Case
• Provide a decision support capability for an analyst to determine an individual’s susceptibility to avian flu without having to be precise in terminology (-nyms)
55
56
57
Building SKOS
• ThManager
• Protégé (4) plugin for SKOS
58
Is OWL the only option II? No…
• Natural Language (NL)
  – Read results from a web search and transform to a usable form
  – Find/filter out inconsistencies, concepts/relations that cannot be represented
• Popular options
  – CLCE (Common Logic Controlled English)
  – Rabbit, e.g. ShellfishCourse is a Meal Course that (if has drink) always has drink Potable Liquid that has Full body and which either has Moderate or Strong flavour
  – PENG (Processable English)
• Really need PSCI - process-able science but that’s another story (research project)
59
Sydney syntax
If X has Y as a father then Y is the only father of X.
The class person is equivalent to male or female, and male and female are mutually exclusive.
is equivalent to
The classes male and female are mutually exclusive. The class person is fully defined as anything that is a male or a female.
60
PENG - Processable English
1. If X is a research programmer then X is a programmer.
2. Bill Smith is a research programmer who works at the CLT.
3. Who is a programmer and works at the CLT?
61
CLCE - Common Logic Controlled English
CLCE: If a set x is the set of (a cat, a dog, and an elephant), then the cat is an element of x, the dog is an element of x, and the elephant is an element of x.
PC: ~(∃x:Set)(∃x1:Cat)(∃x2:Dog)(∃x3:Elephant)(Set(x,x1,x2,x3) ∧ ~(x1∈x ∧ x2∈x ∧ x3∈x))
62
Rules (aka ‘Logic’)
• OWL-DL and OWL-Lite are based on Description Logic
• There are things that DL cannot express (though there are things that are difficult to express with rules and easy in DL...)
  – A well-known example is Horn rules (e.g., the ‘uncle’ relationship): (P1 ∧ P2 ∧ ...) → C
  – e.g.: parent(?x,?y) ∧ brother(?y,?z) ⇒ uncle(?x,?z)
  – Or, for any X, Y and Z: if Y is a parent of X, and Z is a brother of Y, then Z is an uncle of X
63
Examples from http://www.w3.org/Submission/SWRL/
• A simple use of these rules would be to assert that the combination of the hasParent and hasBrother properties implies the hasUncle property. Informally, this rule could be written as:
  – hasParent(?x1,?x2) ∧ hasBrother(?x2,?x3) ⇒ hasUncle(?x1,?x3)
• In the abstract syntax the rule would be written like:
  – Implies(Antecedent(hasParent(I-variable(x1) I-variable(x2)) hasBrother(I-variable(x2) I-variable(x3))) Consequent(hasUncle(I-variable(x1) I-variable(x3))))
• From this rule, if John has Mary as a parent and Mary has Bill as a brother then John has Bill as an uncle.
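The effect of the hasUncle rule can be reproduced with a few lines of Python. This is a hand-written join over fact tuples, not a SWRL engine; the function apply_uncle_rule is an assumption of this sketch.

```python
# Facts from the slide: John has Mary as a parent, Mary has Bill as a brother.
facts = {
    ("hasParent", "John", "Mary"),
    ("hasBrother", "Mary", "Bill"),
}


def apply_uncle_rule(facts):
    """hasParent(x1, x2) and hasBrother(x2, x3) imply hasUncle(x1, x3)."""
    inferred = set()
    for p1, x1, x2 in facts:
        if p1 != "hasParent":
            continue
        for p2, y, x3 in facts:
            # The shared variable ?x2 forces the join on the parent.
            if p2 == "hasBrother" and y == x2:
                inferred.add(("hasUncle", x1, x3))
    return inferred


print(apply_uncle_rule(facts))
```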
64
Examples
• An even simpler rule would be to assert that Students are Persons, as in
  – Student(?x1) ⇒ Person(?x1).
  – Implies(Antecedent(Student(I-variable(x1))) Consequent(Person(I-variable(x1))))
  – However, this kind of use for rules in OWL just duplicates the OWL subclass facility. It is logically equivalent to write instead
    • Class(Student partial Person) or
    • SubClassOf(Student Person)
  – which would make the information directly available to an OWL reasoner.
65
Semantic Web with Rules
• Metalog
• RuleML
• SWRL
• RIF
• WRL
• Cwm
• Jess - rules engine
66
Query
• Querying knowledge representations in OWL and/or RDF
• OWL-QL (for OWL) http://projects.semwebcentral.org/projects/owl-ql/
• SPARQL for RDF http://www.sparql.org/ and http://www.w3.org/TR/rdf-sparql-query/
• XQuery (for XML)
• SeRQL (for Sesame)
• RDFQuery (RDF)
• Few as yet for natural language representations
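What these RDF query languages have in common is matching triple patterns with variables and joining on shared variables. A minimal pure-Python sketch of that idea follows. It is not SPARQL or SeRQL syntax; the functions match_pattern and query and the "?name" variable convention are assumptions of this sketch.

```python
def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Solve a list of triple patterns by joining on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


triples = [
    ("FabryPerot", "is-a", "Interferometer"),
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("OpticalInstrument", "has", "FocalLength"),
]

# Which ?x is a kind of something that is itself an optical instrument?
print(query([("?x", "is-a", "?y"), ("?y", "is-a", "OpticalInstrument")], triples))
```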
67
Developing a service ontology
• Use case: find and display in the same projection, sea surface temperature and land surface temperature from a global climate model.
• Classes/concepts:
  – Temperature
  – Surface (sea/land)
  – Model
  – Climate
  – Global
  – Projection
  – Display …
68
Service ontology
• Climate model is a model
• Model has domain
• Climate Model has component representation
• Land surface is-a component representation
• Ocean is-a component representation
• Sea surface is part of ocean
• Model has spatial representation (and temporal)
• Spatial representation has dimensions
• Latitude-longitude is a horizontal spatial representation
• Displaced pole is a horizontal spatial representation
• Ocean model has displaced pole representation
• Land surface model has latitude-longitude representation
• Lambert conformal is a geographic spatial representation
• Reprojection is a transform between spatial representations
• ….
69
Service ontology
• A sea surface model has grid representation displaced pole and land surface model has grid representation latitude-longitude and both must be transformed to Lambert conformal for display
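Once the grid representation of each model component is recorded, the transforms needed for a common display follow mechanically. A small sketch, in which the dictionary GRID_OF and the function plan_reprojections are illustrative assumptions:

```python
# Grid representation per model component, taken from the slide.
GRID_OF = {
    "sea surface model": "displaced pole",
    "land surface model": "latitude-longitude",
}
DISPLAY_PROJECTION = "Lambert conformal"


def plan_reprojections(components, target=DISPLAY_PROJECTION):
    """List the transforms needed to show all components in one projection."""
    return [
        (name, GRID_OF[name], target)
        for name in components
        if GRID_OF[name] != target  # no transform needed if already on target
    ]


for name, source, target in plan_reprojections(["sea surface model", "land surface model"]):
    print(f"reproject {name}: {source} -> {target}")
```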
70
Best practices (some)
• Ontologies/vocabularies must be shared and reused - swoogle.umbc.edu, www.planetont.org
• Examine ‘core vocabularies’ to start with
  – SKOS Core: about knowledge systems
  – Dublin Core: about information resources, digital libraries, with extensions for rights, permissions, digital rights management
  – FOAF: about people and their organizations
  – DOAP: on the descriptions of software projects
  – DOLCE seems the most promising to match science ontologies
• Go “Lite” as much as possible, then increase the logic - balancing expressivity vs. implementability
• Minimal properties to start, add only when needed
71
Summary
• The science of knowledge representation has, throughout its history, consisted of a compromise between pragmatism, scientific rigor, and accessibility to domain experts
• Many different options for ontology development and encoding, i.e. knowledge representation
• Sometimes, your choice of representation may need to change based on language and tools availability/capability…
• Balancing expressivity and implementability means we favor an object-type, e.g. DL, representation (but also suggests the need for a meta-representation: e.g. KIF – Knowledge Interchange Format)
• Next class (3) – ontology engineering
• Use cases should drive the functional requirements of both your ontology and how you will ‘build’ one (see class 4)
72
Assignments for Week 2
• Reading: OWL Guide
• Assignment 1: Representing Knowledge and Understanding Representations
73
74
Developing ontologies
• Use cases and small team (7-8; 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe)
• Identify classes and properties (leverage controlled vocab.)
  – Start with narrower terms, generalize when needed or possible
  – Data integration - often requires broader terms
  – Adopt a suitable conceptual decomposition (e.g. SWEET)
  – Import modules when concepts are orthogonal
• Minimal properties to start, add only when needed
• Mid-level to depth - i.e. neither top-down nor bottom-up
• Review, review, review, vet, vet, vet, publish - www.planetont.org (experiences, results, lessons learned, AND your ontologies AND discussions)
• Only code them (in RDF or OWL) when needed (CMAP, …)
• Ontologies: small and modular
75
SW != ontologies on the web (!)
• Ontologies are important, but use them only when necessary as identified by use cases
• The Semantic Web is about integrating data on the Web; ontologies (and/or rules) are tools to achieve that when necessary
• SW ontologies != some big (central) ontology
  – The ethos of the Semantic Web is on sharing, i.e., sharing possibly many small ontologies
  – A huge, central ontology could be difficult to manage in terms of maintenance.
  – Semantic web languages such as OWL contain primitives for equivalence and disjointness of terms and meta primitives for versioning info
• The practice:
  – SW applications using ontologies mix a large number of ontologies and vocabularies (FOAF, DC, and others)
  – the real advantage comes from this mix: that is also how new relationships may be discovered
• This dictates the type of knowledge representation required
• One readable background article from the metadata world is available at: http://www.metamodel.com/article.php?story=20030115211223271
76
Editors
• Protégé (http://protege.stanford.edu) with the SWRL plug-in
• SWOOP (http://mindswap.org/2004/SWOOP)
• Ipedo Visual Rules editor
• TopBraid Composer and other commercial tools