Introduction to Ontologies
Adding Meaning to Metadata
Brian Lowe
Metadata Working Group
February 16, 2007
So… what the heck are we talking about, exactly?
Ontologies can be really, really simple.
Thing
Person --eats--> Food
Ontologies can also be really, really complex.
We store data and metadata in all kinds of ways.
We’re probably all familiar with a database record:
Record
Record Number: 289425
Title: Metamorphosis
Type: book
Author: Kafka, Franz
Publication date: 1946
Publisher: Vanguard Press
Let’s back up a bit
What do we do when we want to express something else?
We need to add another field.
Record
Record Number: 289425
Title1: Metamorphosis
Title2: Die Verwandlung
Type: book
Author: Kafka, Franz
Publication date: 1946
Publisher: Vanguard Press
Say we want unlimited titles.
We need to add another table.
Thing
Record Number: 289425
Type: book
Author: Kafka, Franz
Publication date: 1946
Publisher: Vanguard Press
Title
289425 Metamorphosis
289425 Die Verwandlung
20027 Dr. Strangelove
Well-designed databases tend to deal with lots of relationships
between different elements of data.
The way the relationships are set up is called the data model
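A minimal sketch of the "another table" approach, using Python's built-in sqlite3 module. The table and column names (thing, title, record_number and so on) are assumptions made for illustration; the slides only show the data.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (
        record_number INTEGER PRIMARY KEY,
        type TEXT, author TEXT, publication_date TEXT, publisher TEXT
    );
    CREATE TABLE title (
        record_number INTEGER REFERENCES thing(record_number),
        title TEXT
    );
""")
conn.execute(
    "INSERT INTO thing VALUES (?, ?, ?, ?, ?)",
    (289425, "book", "Kafka, Franz", "1946", "Vanguard Press"),
)
conn.executemany(
    "INSERT INTO title VALUES (?, ?)",
    [
        (289425, "Metamorphosis"),
        (289425, "Die Verwandlung"),
        (20027, "Dr. Strangelove"),
    ],
)

# One record can now carry any number of titles.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM title WHERE record_number = ? ORDER BY title",
        (289425,),
    )
]
print(titles)
```

The relationship between the two tables (one thing, many titles) is exactly the kind of decision that makes up the data model.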
Relational databases are great.
Until you want to share your data with someone else who isn’t running the same database software or who doesn’t understand what you’ve done.
OK, no problem. Why don’t we just create a standardized way of shipping data around?
Let’s call this standard XML.
<?xml version="1.0" encoding="UTF-8"?>
<things>
  <thing id="289425">
    <title>Metamorphosis</title>
    <title noindex="4">Die Verwandlung</title>
    <author>Kafka, Franz</author>
    <publisher>Vanguard Press</publisher>
    <publicationDate>1946</publicationDate>
    <type>book</type>
  </thing>
  <thing id="20027">
    <title>Dr. Strangelove</title>
  </thing>
</things>
XML is great.
• we can use standardized tools
• XML is readable by both machines and humans (in theory)
• we can create rich schemas that will let us check whether an XML document is valid
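As one example of the "standardized tools" point, here is a small sketch that reads a cut-down version of the things document with Python's standard xml.etree.ElementTree module. The sample string is shortened and closed off so that it parses; it is not the full file.

```python
import xml.etree.ElementTree as ET

# Cut-down sample of the "things" document, closed off so that it parses.
doc = """<things>
  <thing id="289425">
    <title>Metamorphosis</title>
    <title>Die Verwandlung</title>
    <author>Kafka, Franz</author>
  </thing>
  <thing id="20027">
    <title>Dr. Strangelove</title>
  </thing>
</things>"""

root = ET.fromstring(doc)

# Walk the tree: every <thing> is a child of the root, every <title> a child of a <thing>.
titles_by_id = {
    thing.get("id"): [t.text for t in thing.findall("title")]
    for thing in root.findall("thing")
}
print(titles_by_id)
```

Note that the code only ever moves from parent to child, which is what "XML alone is all about trees" means in practice.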
XML alone is all about trees.
But sometimes trees aren’t enough.
What about all those complex relationships?
<eml:eml packageId="gss1.37.2" system="knb"
    xsi:schemaLocation="eml://ecoinformatics.org/eml-2.0.1 eml.xsd" scope="system">
  <dataset scope="document">
    <title>Test GIS data upload</title>
    <creator id="1170948373895" scope="document">
      <individualName>
        <surName>steinhart</surName>
      </individualName>
      <organizationName>mann</organizationName>
      <positionName>librarian</positionName>
    </creator>
    <abstract>
      <para>test upload of a 58MB GIS data file w/eml record</para>
    </abstract>
    <contact scope="document">
      <references system="document">1170948373895</references>
    </contact>
one (nonstandard) way of breaking out of a tree
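A rough sketch of what software has to do to follow such a reference: build an index of elements by id, then look the referenced id up. The sample is a simplified stand-in for the EML record (namespaces and most attributes are dropped, which is an assumption made to keep it short).

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the EML record; namespaces are omitted for brevity.
eml = """<dataset>
  <creator id="1170948373895">
    <individualName><surName>steinhart</surName></individualName>
  </creator>
  <contact>
    <references>1170948373895</references>
  </contact>
</dataset>"""

root = ET.fromstring(eml)

# Index every element that carries an id attribute.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

# The contact is not a child of the creator; it points sideways at it.
ref = root.find("contact/references").text
contact_surname = by_id[ref].find("individualName/surName").text
print(contact_surname)
```

The lookup step is application-specific: nothing in XML itself says that the text inside references names another element.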
Let’s use one standard data model.
RDF: Resource Description Framework
[Graph: Dataset1.37.2 --creator--> Gail; Dataset1.37.2 --contact--> Gail; Gail --position--> librarian; Gail --organization--> Mann Library]
graphs instead of trees
Everything is expressed as statements or triples
Subject   Property (predicate)   Object
Thing289435 title “Metamorphosis”
Thing289435 title “Die Verwandlung”
Thing289435 author “Kafka, Franz”
Thing289435 type book
If everything’s a triple, we can store new things very easily.
Subject   Property (predicate)   Object
Thing289435 title “Metamorphosis”
Thing289435 title “Die Verwandlung”
Thing289435 author “Kafka, Franz”
Thing289435 type book
Thing289435 comment “This is the one where Gregor Samsa wakes up as a cockroach.”
Thing289435 callNumber “PT2621.A25 V5 1946”
“Triple Stores”
[Diagram: many subject-property-object (S-P-O) triples collected in one triple store]
There are various query languages for RDF, similar to SQL.
We can select all the triples where the subject is Thing289435
Or select all the triples where the property is “title.”
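To make the idea concrete, here is a toy in-memory "triple store" in Python. It is only a sketch: the match function is a hypothetical helper, not a real RDF query language, and plain strings stand in for resources.

```python
# Each statement is a (subject, property, object) tuple.
store = [
    ("Thing289435", "title", "Metamorphosis"),
    ("Thing289435", "title", "Die Verwandlung"),
    ("Thing289435", "author", "Kafka, Franz"),
    ("Thing289435", "type", "book"),
]

# New kinds of statement need no schema change: just append them.
store.append(("Thing289435", "callNumber", "PT2621.A25 V5 1946"))


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


about_thing = match(store, s="Thing289435")          # subject is Thing289435
titles = [o for _, _, o in match(store, p="title")]  # property is "title"
print(len(about_thing), titles)
```

Leaving a position as None plays the role of a wildcard, much like an unconstrained column in an SQL WHERE clause.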
RDF: Resource Description Framework
What’s a resource?
Something we assign a specific identifier or URI
(Uniform Resource Identifier).
http://www.somerandomlibrary.org/ourthings/Thing289435
We use this URI as the subject or object of a triple.
We can now mash this up with a whole bunch of other triples and not get confused about which thing we’re describing.
RDF: Resource Description Framework
We also assign URIs to the properties.
http://purl.org/dc/elements/1.1/title
Subject http://www.somerandomlibrary.org/ourthings/Thing289435
Property http://purl.org/dc/elements/1.1/title
Object “Metamorphosis”
Now, anything that understands what a Dublin Core title is can find the title of our book.
RDF: Resource Description Framework
We even use URIs with things that aren’t resources.
Subject http://www.somerandomlibrary.org/ourthings/Thing289435
Property http://purl.org/dc/elements/1.1/title
Object “Metamorphosis”^^http://www.w3.org/2001/XMLSchema#string
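One common way to write such a statement down is as an N-Triples-style line, with URIs in angle brackets and the datatype attached to the literal. The helper below is a hypothetical sketch, not part of any RDF library, and it only escapes backslashes and double quotes.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def to_ntriples(subject, prop, value, datatype=XSD_STRING):
    """Format one statement with a typed literal as an N-Triples-style line."""
    # Minimal escaping only; newlines and other control characters are not handled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{prop}> "{escaped}"^^<{datatype}> .'


line = to_ntriples(
    "http://www.somerandomlibrary.org/ourthings/Thing289435",
    "http://purl.org/dc/elements/1.1/title",
    "Metamorphosis",
)
print(line)
```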
“Semantics”
This is “semantic” metadata in its simplest sense.
We’ve explicitly stated what kind of relationship exists between two things.
But it’s still up to software or humans to understand what the different properties actually “mean”
Ontologies
• describe what we mean in some ways that machines can understand.
• are a standardized way of modeling the ways the different pieces of data relate to one another
• have been around for decades, but there is increasing interest in sharing them over the Web.
So how do we make an ontology?
• We need to decide what kinds of things we want to talk about (classes)
• We also need to describe what kind of relationships they can have (properties)
Class hierarchy
(also called the taxonomy or the terminology box (“TBox”))
Thing
  Person
    Employee
      Academic Employee
        Faculty member
        Librarian
      Non-Academic Employee
        Cataloger
        Programmer
Class hierarchy
Arrows represent “subclass of” or “is a” relationships
Thing
  Person
    Employee
      Academic Employee
        Faculty member
        Librarian
      Non-Academic Employee
        Cataloger
        Programmer
• A faculty member is an academic employee
• A faculty member is an employee
• A faculty member is a person
• (A faculty member is a thing.)
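The chain of "is a" statements above can be computed by walking up the hierarchy. This sketch assumes each class has a single parent, and the placement of the four leaf classes under the two employee classes is a reading of the flattened diagram rather than something the slide text states.

```python
# child -> parent; a single parent per class is assumed for simplicity.
SUBCLASS_OF = {
    "Person": "Thing",
    "Employee": "Person",
    "Academic Employee": "Employee",
    "Non-Academic Employee": "Employee",
    "Faculty member": "Academic Employee",
    "Librarian": "Academic Employee",
    "Cataloger": "Non-Academic Employee",
    "Programmer": "Non-Academic Employee",
}


def superclasses(cls):
    """Follow the subclass-of arrows from cls up to the root."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


faculty_is_a = superclasses("Faculty member")
print(faculty_is_a)
```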
Class hierarchy
The classes here are not disjoint.
Thing
  Person
    Employee
      Academic Employee
        Faculty member
        Librarian
      Non-Academic Employee
        Cataloger
        Programmer
We can assert that someone is a librarian.
We can assert that the same individual is also a faculty member, and that’s not a problem.
Class hierarchy
Let’s make some classes disjoint.
Thing
  Person
    Employee
      Academic Employee
        Faculty member
        Librarian
      Non-Academic Employee
        Cataloger
        Programmer
Now if we try to assert that something is both a faculty member and a cow, the ontology will tell us that these statements are inconsistent with our model.
Farm Animal
Cow
disjoint
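A sketch of the consistency check a reasoner performs here. It assumes that the disjointness is declared between Person and Farm Animal; the slide does not say exactly which pair of classes carries the declaration, so treat that as an illustration.

```python
# child -> parent, trimmed to the classes needed for the example.
SUBCLASS_OF = {
    "Faculty member": "Academic Employee",
    "Academic Employee": "Employee",
    "Employee": "Person",
    "Person": "Thing",
    "Cow": "Farm Animal",
    "Farm Animal": "Thing",
}

# Assumption: the disjointness axiom sits on this pair of classes.
DISJOINT = {frozenset({"Person", "Farm Animal"})}


def ancestors_and_self(cls):
    """Return cls together with all of its superclasses."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def is_consistent(asserted_types):
    """False if the asserted types imply membership in two disjoint classes."""
    all_types = set()
    for cls in asserted_types:
        all_types |= ancestors_and_self(cls)
    return not any(pair <= all_types for pair in DISJOINT)


print(is_consistent({"Faculty member"}))
print(is_consistent({"Faculty member", "Cow"}))
```

Declaring the disjointness high in the hierarchy is enough: every subclass of Person inherits the conflict with every subclass of Farm Animal.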
Making a class hierarchy can be tricky
How do we model an organization?
Organization charts are typically organized by what things are part of.
Cornell University
  CUL
    LTS
    IRIS
  CALS
    Plant Pathology
    Crop & Soil Sciences
  A&S
    Asian Studies
Making a class hierarchy can be tricky
This is not a valid class hierarchy. Why not?
University
  Library System
    Library Department
  College
    College Department
Making a class hierarchy can be tricky
This is not a valid class hierarchy. Why not?
University
  Library System
    Library Department
  College
    College Department
      Plant Biology
Plant Biology is a College Department.
Plant Biology is a College. (NO!)
Plant Biology is a University. (NO!)
Making a class hierarchy can be tricky
Let’s try this instead.
Organization
  University
  College
  Library System
  Department
(siblings disjoint)
Maybe not the best model, but it works.
Let’s add a property
subunitOf
Organization
  University
  College
  Library System
  Department
(siblings disjoint)
Plant Biology
subunitOf
Let’s add a property
subunitOf
Now we can assert things like:
subject property object
CALS subunitOf Cornell
CUL subunitOf Cornell
Arts&Sciences subunitOf Cornell
Plant Biology subunitOf CALS
LTS subunitOf CUL
and model our organization chart.
Property hierarchies
As with classes, properties can be arranged in a hierarchy.
subunitOf subpropertyOf partOf
Property hierarchies
subunitOf
Now if we assert statements like:
subject property object
CALS subunitOf Cornell
CUL subunitOf Cornell
Arts&Sciences subunitOf Cornell
Plant Biology subunitOf CALS
LTS subunitOf CUL
Our ontology tells us these statements must also be true:
subject property object
CALS partOf Cornell
CUL partOf Cornell
Arts&Sciences partOf Cornell
Plant Biology partOf CALS
LTS partOf CUL
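The step from the asserted statements to the implied ones is mechanical. A minimal sketch, assuming a single level of subproperty (subunitOf under partOf) and plain strings for the organizations:

```python
# subproperty -> superproperty
SUBPROPERTY_OF = {"subunitOf": "partOf"}

asserted = [
    ("CALS", "subunitOf", "Cornell"),
    ("CUL", "subunitOf", "Cornell"),
    ("Arts&Sciences", "subunitOf", "Cornell"),
    ("Plant Biology", "subunitOf", "CALS"),
    ("LTS", "subunitOf", "CUL"),
]

# Every statement made with a subproperty also holds for its superproperty.
inferred = [
    (s, SUBPROPERTY_OF[p], o) for s, p, o in asserted if p in SUBPROPERTY_OF
]
for triple in inferred:
    print(*triple)
```

A deeper property hierarchy would need the same step repeated until no new statements appear.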
Property hierarchies
Another example.
headOf subpropertyOf memberOf
Things that are tricky to do with ontologies / statements
What if we want to express things that aren’t simple subject-predicate-object statements?
Mike took a picture of a moose with a Nikon camera in Maine.
Event-based ontologies
ABC Ontology / Harmony Project
http://metadata.net/harmony/
- events
- participants in events
- tools used in events
- outcomes of events
W3C “Technologies”
RDF
Gives us the simple standard data model that lets us draw graphs and show how things are related to one another
RDF Schema (RDFS)
Lets us construct basic ontologies and build class and property hierarchies
Web Ontology Language (OWL)
Lets us do significantly more complex things.
RDF Schema Inferencing
To make an inference is to add new statements based on existing ones.
Software that understands RDF Schema can make the kinds of simple inferences we’ve seen so far:
From:
Dr. Smith type Faculty Member
Joe Jones headOf Finance Committee
RDFS inferencing adds:
Dr. Smith type Person
Joe Jones memberOf Finance Committee
Why?
Faculty Member is a subclass of Person.
headOf is a subproperty of memberOf.
RDF Schema Limitations
Usually when we relate two things with a property, it’s very useful if the relationship is bidirectional.
David Skorton presidentOf Cornell University
implies
Cornell University hasPresident David Skorton
RDF Schema doesn’t come with a very good way of handling this.
“RoleNoun”
One way of dealing with this is to use a naming convention.
president
is president of
Software that assumes this convention can automatically generate the text to display for the inverse property.
The Dublin Core properties are largely compatible with this convention:
publisher
is publisher of
contributor
is contributor of (Doesn’t work!)
Inferencing
More complex inferencing with OWL usually requires a separate inference engine (also known as a reasoner or classifier).
Flavors of OWL
OWL “Tiny”
OWL Lite
OWL DL (Description Logics)
OWL Full
Moving down this list, inference engines get increasingly complex.
With OWL Full, inference engines choke: very expressive, but bad for reasoning.
OWL Basics
Object Properties
relate resources to other resources
Datatype Properties
relate resources to literals
Most software supports only string and integer datatypes.
Classes overlap by default
Must specify which classes are disjoint.
(But can’t do this if we’re using OWL Lite!)
Stuff OWL Gets Us
Explicit Inverse Properties
hasPresident
presidentOf
OWL allows us to specify that these two properties are inverses of each other.
Cornell University hasPresident David Skorton
OWL inferencing automatically adds:
David Skorton presidentOf Cornell University
Stuff OWL Gets Us
Transitive Properties
partOf
If Ithaca is part of Tompkins County
and
Tompkins County is part of New York State
and
New York State is part of the United States
then Ithaca is part of the United States.
Stuff OWL Gets Us
Transitive Properties
partOf
OWL lets us specify that this property is transitive.
If we assert these statements…
Ithaca partOf Tompkins County
Tompkins County partOf New York State
New York State partOf United States
Stuff OWL Gets Us
Transitive Properties
an OWL reasoner fills in these additional statements:
Ithaca partOf New York State
Ithaca partOf United States
Tompkins County partOf United States
Stuff OWL Gets Us
Transitive Properties
This time, let’s also say that partOf and hasPart are inverses of each other.
Again, we’ll assert:
Ithaca partOf Tompkins County
Tompkins County partOf New York State
New York State partOf United States
Stuff OWL Gets Us
Transitive Properties
Now the OWL reasoner adds these:
Ithaca partOf New York State
Ithaca partOf United States
Tompkins County partOf United States
United States hasPart New York State
United States hasPart Tompkins County
United States hasPart Ithaca
New York State hasPart Tompkins County
New York State hasPart Ithaca
Tompkins County hasPart Ithaca
Stuff OWL Gets Us
Transitive Properties
We put in three statements manually and got nine more for free.
What good is this?
Makes it easier to query the data in different ways. We sacrifice some space (store more stuff) to make it faster to get the answer we want.
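The three-in, nine-out arithmetic can be checked with a small sketch. This is not how a real OWL reasoner is implemented; it simply applies the two rules (partOf is transitive, hasPart is its inverse) to a set of tuples until nothing new appears.

```python
asserted = {
    ("Ithaca", "partOf", "Tompkins County"),
    ("Tompkins County", "partOf", "New York State"),
    ("New York State", "partOf", "United States"),
}


def transitive_closure(triples, prop):
    """Add (a, prop, c) whenever (a, prop, b) and (b, prop, c) are present."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for a, p1, b in list(closed):
            for c, p2, d in list(closed):
                if p1 == p2 == prop and b == c and (a, prop, d) not in closed:
                    closed.add((a, prop, d))
                    changed = True
    return closed


def add_inverses(triples, prop, inverse):
    """Add (o, inverse, s) for every (s, prop, o)."""
    return triples | {(o, inverse, s) for s, p, o in triples if p == prop}


everything = add_inverses(
    transitive_closure(asserted, "partOf"), "partOf", "hasPart"
)
inferred = everything - asserted
print(len(asserted), "asserted,", len(inferred), "inferred")
```

Storing the inferred statements up front is the space-for-speed trade described above.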
Stuff OWL Gets Us
Transitive Properties
Say we want to get all the towns in New York State
We could crawl around the graph, or we could ask for all the triples that match
x partOf New York State AND
x type Town
This is easier and faster. The reasoner has already done the heavy lifting for us ahead of time.
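With the inferred statements already stored, the query becomes a simple filter. In this sketch the type statements (Town, County) are assumptions added for illustration; the slides do not assert them.

```python
# Statements after reasoning; the "type" triples are illustrative assumptions.
triples = {
    ("Ithaca", "partOf", "Tompkins County"),
    ("Ithaca", "partOf", "New York State"),
    ("Tompkins County", "partOf", "New York State"),
    ("Ithaca", "type", "Town"),
    ("Tompkins County", "type", "County"),
}

# x partOf New York State AND x type Town
towns = sorted(
    s
    for s, p, o in triples
    if p == "partOf"
    and o == "New York State"
    and (s, "type", "Town") in triples
)
print(towns)
```

Without the inferred "Ithaca partOf New York State" statement, the same question would require walking from Ithaca through Tompkins County at query time.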
Something we can’t yet do in OWL
Transitive “over”
Who at Cornell is doing work in Africa?
Dr. Jones conductsResearchIn Lomé
Lomé partOf Togo
Togo partOf Africa
Let’s make conductsResearchIn “transitive over” partOf
Something we can’t yet do in OWL
Transitive “over”
Who at Cornell is doing work in Africa?
Because partOf is transitive, the reasoner adds
Lomé partOf Africa
Because Dr. Jones conducts research in Lomé,
Dr. Jones conductsResearchIn Africa
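Outside OWL, the "transitive over" rule is easy to state in code: if x conductsResearchIn y and y partOf z, then x conductsResearchIn z. The function below is a hypothetical sketch of that rule applied until no new statements appear.

```python
facts = {
    ("Dr. Jones", "conductsResearchIn", "Lomé"),
    ("Lomé", "partOf", "Togo"),
    ("Togo", "partOf", "Africa"),
}


def apply_transitive_over(triples, prop, over):
    """If (x, prop, y) and (y, over, z) hold, add (x, prop, z); repeat to a fixpoint."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for x, p, y in list(result):
            if p != prop:
                continue
            for y2, q, z in list(result):
                if q == over and y2 == y and (x, prop, z) not in result:
                    result.add((x, prop, z))
                    changed = True
    return result


result = apply_transitive_over(facts, "conductsResearchIn", "partOf")
places = sorted(o for s, p, o in result if p == "conductsResearchIn")
print(places)
```

Because the rule is applied repeatedly, it reaches Africa by way of Togo even without first computing the transitive closure of partOf.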
More OWL Constructs
Symmetric Properties
John friendOf Kate
implies
Kate friendOf John
Functional Properties
Tom birthdate 1978-03-12
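A sketch of what these two constructs buy us. A symmetric property lets the reverse statement be added automatically; a functional property allows at most one value per subject, so two different literal values signal a problem. The helper names and the second birthdate are hypothetical, added only to show a conflict.

```python
facts = {
    ("John", "friendOf", "Kate"),
    ("Tom", "birthdate", "1978-03-12"),
}


def add_symmetric(triples, prop):
    """For a symmetric property, add (o, prop, s) for every (s, prop, o)."""
    return triples | {(o, p, s) for s, p, o in triples if p == prop}


def functional_conflicts(triples, prop):
    """Return subjects that have more than one distinct value for prop."""
    values = {}
    for s, p, o in triples:
        if p == prop:
            values.setdefault(s, set()).add(o)
    return {s: v for s, v in values.items() if len(v) > 1}


with_symmetric = add_symmetric(facts, "friendOf")

# Hypothetical second birthdate, to show what a violation looks like.
conflicts = functional_conflicts(
    facts | {("Tom", "birthdate", "1979-01-01")}, "birthdate"
)
print(("Kate", "friendOf", "John") in with_symmetric, conflicts)
```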
More Complex OWL Inferencing
Classes
So far, we’ve only dealt with primitive classes.
We name a class and then assert that an individual is a member of that class.
But OWL lets us make defined classes where a reasoner automatically computes the membership.
More Complex OWL Inferencing
Defined Classes
We can specify what kinds of properties the members of a class need to have.
OWL Pizzas
VegetarianPizza
hasTopping allValuesFrom
(union of CheeseTopping and VegetableTopping)
CheesyPizza
hasTopping someValuesFrom
CheeseTopping
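A rough sketch of how membership in these two defined classes could be computed. The pizzas and toppings are made-up examples, and the check treats the listed toppings as complete (a closed-world shortcut); an OWL reasoner works under the open-world assumption and would need that completeness stated explicitly before classifying something as a VegetarianPizza.

```python
# Illustrative data: topping -> topping class, pizza -> set of toppings.
TOPPING_TYPE = {
    "mozzarella": "CheeseTopping",
    "tomato": "VegetableTopping",
    "ham": "MeatTopping",
}
pizzas = {
    "margherita": {"mozzarella", "tomato"},
    "ham_and_cheese": {"mozzarella", "ham"},
    "marinara": {"tomato"},
}


def is_cheesy(toppings):
    """hasTopping someValuesFrom CheeseTopping: at least one cheese topping."""
    return any(TOPPING_TYPE[t] == "CheeseTopping" for t in toppings)


def is_vegetarian(toppings):
    """hasTopping allValuesFrom (CheeseTopping or VegetableTopping)."""
    allowed = {"CheeseTopping", "VegetableTopping"}
    return all(TOPPING_TYPE[t] in allowed for t in toppings)


cheesy = sorted(name for name, tops in pizzas.items() if is_cheesy(tops))
vegetarian = sorted(name for name, tops in pizzas.items() if is_vegetarian(tops))
print(cheesy, vegetarian)
```

Nobody asserted that the margherita is vegetarian or cheesy; both memberships follow from its toppings.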
More Complex OWL Inferencing
Vet School example
Say we want to get lists of teaching faculty versus clinical faculty.
We could create TeachingFaculty and ClinicalFaculty as primitive classes,
and assert who is a member of each class.
Or, we could define:
TeachingFaculty = faculty who teach at least one course.
ClinicalFaculty = faculty who have a medical appointment in the animal hospital
We can get these properties from external databases.
Reasoner can assign membership in our defined classes based on those properties.
Inferences with Rules
Semantic Web Rules Language (SWRL)
Still in a fairly experimental stage. Not all of SWRL is compatible with OWL-DL reasoners.
OWL reasoning focuses mainly on classifying things. Rules also let us add new property instances.
If x type AcademicDepartment
and x hasFacultyMember y
and y memberOfGraduateField z
Then x hasAssociatedGraduateField z
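The rule above can be read as three nested pattern matches. This Python sketch is not SWRL syntax, and the individuals (PlantBiology, DrSmith, PlantBreeding) are hypothetical examples.

```python
# Hypothetical individuals, used only to exercise the rule.
facts = {
    ("PlantBiology", "type", "AcademicDepartment"),
    ("PlantBiology", "hasFacultyMember", "DrSmith"),
    ("DrSmith", "memberOfGraduateField", "PlantBreeding"),
}


def apply_rule(facts):
    """x type AcademicDepartment, x hasFacultyMember y, y memberOfGraduateField z
    => x hasAssociatedGraduateField z"""
    new = set()
    for x, p1, t in facts:
        if p1 != "type" or t != "AcademicDepartment":
            continue
        for x2, p2, y in facts:
            if x2 != x or p2 != "hasFacultyMember":
                continue
            for y2, p3, z in facts:
                if y2 == y and p3 == "memberOfGraduateField":
                    new.add((x, "hasAssociatedGraduateField", z))
    return new - facts


new_facts = apply_rule(facts)
print(new_facts)
```

Unlike classification, the result is a new property statement between two individuals, which is the extra power rules add.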
How might more complex reasoning be useful?
• Evolution of VIVO
New directions in VIVO
VIVO uses “flags” that exist outside the ontology and filter entities for display.
Initially manually applied; now also automatically set. Time-consuming and error-prone.
New directions in VIVO
We should be able to use the statements we already have instead of setting flags.
How about some defined classes?
CALSUnit
departmentOrDivisionWithin
someValuesFrom CALS
CALSPerson
participantIn (employeeIn?)
someValuesFrom CALSUnit
New directions in VIVO
Different colleges have different definitions of “faculty” and “nonfaculty”
Different colleges are interested in keeping track of different things.
A multiple-ontology integration approach might be very useful:
Model how the colleges think of things, and then infer into the ontology of how VIVO wants to think of things.
Interoperability between Ontologies
OWL Constructs
equivalentClass
(the classes have the same extension, i.e. the same members)
sameAs
(individuals are the same thing, just with different names)
Interoperability between Ontologies
Subclassing an established upper-level ontology
DOLCE (Descriptive Ontology for Linguistic and Cultural Engineering)
SUMO (Suggested Upper Merged Ontology)
AKT (Advanced Knowledge Technologies)
The Overall Picture
Uses for inferencing with ontologies and rules:
Automated metadata generation
• When we enter metadata we can focus on exactly what we know and not have to try to anticipate every way someone might want to use the metadata.
• Ontology modelers and rule writers can focus on setting things up so reasoners can add new statements and let metadata be queried in different ways.
Checking accuracy of metadata/data
• Reasoners can find the inconsistent statements
• Bad metadata / data gets flagged for review.
Open Problems
Versioning
How do we track changes in ontologies over time and ensure that existing statements don’t become useless or misinterpreted?
Provenance
How do we know where statements came from? Who provided them? Were they inferred by software or asserted by someone? Can we trust the asserter?
Conflicts
What do we do when we encounter statements saying completely contradictory things?
Open Problems
Expressivity
Does OWL allow us to express enough things to be useful in real-world applications?
- OWL 1.1: University of Manchester
Software Tools
Most tools for creating, storing, and reasoning with ontologies and statements are relatively new, rapidly changing, and likely to have lots of bugs.
How can I learn more?
Protégé-OWL
http://protege.stanford.edu/
Pellet reasoner
http://pellet.owldl.com
Pizza tutorial (on Protégé site)
http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf
Thank you