A Really Brief Crash Course in Semantic Web Technologies Rocky Dunlap Spencer Rugaber Georgia Tech



Page 1

A Really Brief Crash Course in Semantic Web Technologies

Rocky Dunlap

Spencer Rugaber

Georgia Tech

Page 2

Languages you may encounter...

XML (eXtensible Markup Language)
XML Schema
XPath (navigate an XML document)
XQuery (query an XML document)
XSLT (Extensible Stylesheet Language Transformations)
RDF (Resource Description Framework)
RDF Schema
OWL (Web Ontology Language)
SPARQL (Query language for RDF triples)
SQL (Structured Query Language – for RDBMS)
UML (Unified Modeling Language – conceptual)
SKOS (Simple Knowledge Organization System) – glossary

Page 3

Links to language specs

Name               Source      Description
RDF                W3C         Resource Description Framework
RDFS               W3C         RDF Schema
SKOS               W3C         Simple Knowledge Organization System
SPARQL             W3C         RDF/OWL Query Language
SQL                ANSI/ISO    Structured Query Language
UML                OMG         Unified Modeling Language
OWL                W3C         Web Ontology Language
XML                W3C         Extensible Markup Language
XML Schema (XSD)   W3C         XML Schema
XPath              W3C         XML Path Language
XQuery             W3C         XML Query Language
XSLT               W3C         Extensible Stylesheet Language Transformations

Page 4

XML

General-purpose markup language
Mechanism for structured data exchange between heterogeneous systems
Basically: elements (tags) and attributes
Not really for human consumption, although it is easy for us to read and write in small amounts
An XML file is often called an instance document
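To make "elements and attributes" concrete, here is a minimal Python sketch that reads a tiny instance document with the standard library. The element and attribute names are invented for illustration and are not taken from the Curator schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: the tag and attribute names below are
# assumptions made for this example only.
DOC = """
<meeting name="Curator meeting">
  <location>GFDL</location>
  <starts>18 Oct 2007</starts>
  <ends>19 Oct 2007</ends>
</meeting>
"""

root = ET.fromstring(DOC)

# Every element has a tag, a dictionary of attributes, and child elements.
print(root.tag, root.attrib)
for child in root:
    print(child.tag, "=", child.text)

# Collect the child elements into a plain dictionary for later use.
children = {child.tag: child.text for child in root}
```

The same information could have been carried in attributes instead of child elements; which one to use is a design choice that the schema normally fixes.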

Page 5

XML Schema

Defines the allowed structure of a set of instance documents
Defines a set of “types”: valid chunks of XML
Typically the schema is defined up front and applications are written to process valid or schema-conforming instance documents
The schema is a way to achieve standardization – like a contract
  - “If you provide a valid document, we’ll provide you with tools that do X, Y, and Z.”

Page 6

RDF

A knowledge representation language
Conceptual in nature
It really has nothing to do with XML
  - But, there happens to be an XML representation
A way to make statements about pretty much anything you want:
  - “The Curator meeting is at GFDL.”
  - “The Curator meeting is Oct 18-19.”
  - “Balaji works at GFDL.”

Page 7

RDF Statements

“The Curator meeting is at GFDL.”

[Diagram: Curator meeting (subject) --hasLocation (predicate)--> GFDL (object)]

Page 8

RDF Statements

“The Curator meeting is Oct 18-19.”

[Diagram: Curator meeting --hasLocation--> GFDL; Curator meeting --starts--> “18 Oct 2007”; Curator meeting --ends--> “19 Oct 2007”. Nodes are labeled as either resource or literal; the quoted dates are literals.]

Page 9

RDF Statements

“Balaji works at GFDL.”

[Diagram: Curator meeting --hasLocation--> GFDL; Curator meeting --starts--> “18 Oct 2007”; Curator meeting --ends--> “19 Oct 2007”; Balaji --worksAt--> GFDL]
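The graph built up over the last three slides can be sketched as a list of subject-predicate-object triples. This is only an illustration in plain Python, not an RDF library; the class names `Resource`, `Literal`, and `Triple` and the helper `objects` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A node that can itself be the subject of further statements."""
    name: str


@dataclass(frozen=True)
class Literal:
    """A plain value such as a date string; it has no outgoing edges."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: Resource
    predicate: str
    obj: Union[Resource, Literal]


meeting = Resource("CuratorMeeting")
gfdl = Resource("GFDL")
balaji = Resource("Balaji")

# The four statements from the slides.
graph = [
    Triple(meeting, "hasLocation", gfdl),
    Triple(meeting, "starts", Literal("18 Oct 2007")),
    Triple(meeting, "ends", Literal("19 Oct 2007")),
    Triple(balaji, "worksAt", gfdl),
]


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


# Both the meeting and Balaji point at the same GFDL node, which is what
# makes this a graph rather than a tree.
incoming = [(t.subject, t.predicate) for t in graph if t.obj == gfdl]
```

Because GFDL is a resource rather than a literal, new statements about it can be added later without touching the existing triples.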

Page 10

RDF XML Representation

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:esc="http://www.earthsystemcurator.org">

  <rdf:Description rdf:about="http://....#OctCuratorMeeting">
    <esc:hasLocation rdf:resource="http://....#GFDL"/>
    <esc:starts>18 Oct 2007</esc:starts>
    <esc:ends>19 Oct 2007</esc:ends>
  </rdf:Description>

  <rdf:Description rdf:about="http://....#Balaji">
    <esc:worksAt rdf:resource="http://....#GFDL"/>
  </rdf:Description>

</rdf:RDF>
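As a rough illustration of how this XML maps back to triples, the sketch below reads the same structure with Python's standard XML parser. It handles only the small subset of RDF/XML used on this slide and is not a substitute for a real RDF parser. The slide elides the resource URIs, so `BASE` is a placeholder chosen for this example, and `read_triples` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the slide elides the real base URI, so a placeholder is used.
BASE = "http://example.org/curator"

RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}"
         xmlns:esc="http://www.earthsystemcurator.org">
  <rdf:Description rdf:about="{BASE}#OctCuratorMeeting">
    <esc:hasLocation rdf:resource="{BASE}#GFDL"/>
    <esc:starts>18 Oct 2007</esc:starts>
    <esc:ends>19 Oct 2007</esc:ends>
  </rdf:Description>
  <rdf:Description rdf:about="{BASE}#Balaji">
    <esc:worksAt rdf:resource="{BASE}#GFDL"/>
  </rdf:Description>
</rdf:RDF>"""


def read_triples(rdf_xml):
    """Extract (subject, predicate, object) tuples from rdf:Description blocks."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}local".
            # Only the local name is kept here for readability; a real RDF
            # parser would build the full property URI instead.
            predicate = prop.tag.split("}", 1)[1]
            # rdf:resource marks a link to another resource; otherwise the
            # element text is a literal value.
            target = prop.get(f"{{{RDF_NS}}}resource")
            triples.append((subject, predicate, target if target is not None else prop.text))
    return triples


triples = read_triples(RDF_XML)
for triple in triples:
    print(triple)
```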

Page 11

RDF Schema

Defines a domain-specific data model for RDF
Includes classes and properties (along with subclasses and subproperties)
Properties are first-class (they are not defined as part of a particular class)

Page 12

RDF Schema

Classes
Event
  - Meeting
  - Flight
Person
Place

Properties
hasLocation   domain: Event    range: Place
starts        domain: Event    range: date
ends          domain: Event    range: date
worksAt       domain: Person   range: Place
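In RDF Schema, domain and range declarations are used to infer the types of the things a property connects, and subclass links pass those types upward. The following is a minimal Python sketch of that inference over the classes and properties on this slide. Treating Meeting and Flight as subclasses of Event is an assumption read from the slide layout, and `infer_types` is a hypothetical helper, not part of any RDF toolkit.

```python
# Assumption: Meeting and Flight are subclasses of Event.
SUBCLASS_OF = {"Meeting": "Event", "Flight": "Event"}

# Domain and range declarations taken from the slide. The date-valued
# properties have literal ranges, so they are left out of RANGE here.
DOMAIN = {"hasLocation": "Event", "starts": "Event", "ends": "Event", "worksAt": "Person"}
RANGE = {"hasLocation": "Place", "worksAt": "Place"}


def superclasses(cls):
    """Walk the subclass chain upward and return every ancestor class."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def infer_types(triples, declared_types):
    """Return a mapping from each resource to the set of classes it belongs to."""
    types = {name: set(classes) for name, classes in declared_types.items()}
    for subject, predicate, obj in triples:
        if predicate in DOMAIN:   # the subject must belong to the domain class
            types.setdefault(subject, set()).add(DOMAIN[predicate])
        if predicate in RANGE:    # the object must belong to the range class
            types.setdefault(obj, set()).add(RANGE[predicate])
    for classes in types.values():
        for cls in list(classes):  # add inherited classes
            classes.update(superclasses(cls))
    return types


triples = [
    ("CuratorMeeting", "hasLocation", "GFDL"),
    ("Balaji", "worksAt", "GFDL"),
]
types = infer_types(triples, {"CuratorMeeting": {"Meeting"}})
print(types)
```

Note that nothing is rejected here: a domain or range declaration adds knowledge about a resource rather than acting as a validation rule.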

Page 13

OWL (Web Ontology Language)

Builds on RDF by adding increased expressivity
Every OWL file is RDF (but not necessarily the reverse)

Page 14

RDF vs. OWL

RDF
  - Classes
  - Subclasses
  - Properties
  - Subproperties
  - Individuals

OWL
  Property constraints
    - allValuesFrom
    - someValuesFrom
    - hasValue
  Cardinality constraints on properties
    - cardinality (exact)
    - minCardinality
    - maxCardinality
  Class definitions
    - intersection
    - union
    - complement
    - equivalentClass
    - disjointWith
    - oneOf (enum)
  Transitive Properties
  Symmetric Properties
  Individuals
    - sameAs
    - differentFrom

Page 15

Things you can NOT say in RDF, but can say in OWL

The class TriangularUnstructuredGrid is at the intersection of TriangularGrid and UnstructuredGrid
UnstructuredGrid is the complement of StructuredGrid
A Dataset is generated by exactly one Model
A Model is made up of at least one Component
An AtmosphereComponent is a Component with ScienceType equal to “Atmosphere”
X subComponent Y, Y subComponent Z → X subComponent Z
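The last item describes a transitive property: from two asserted subComponent links a reasoner can derive a third. A minimal Python sketch of that derivation follows. It is a naive fixed-point loop written for clarity, not how an OWL reasoner is actually implemented, and `transitive_closure` is a name chosen for this example.

```python
def transitive_closure(pairs):
    """Return all (a, c) links implied by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# The two asserted statements from the slide.
asserted = {("X", "Y"), ("Y", "Z")}

# Everything the transitivity rule adds on top of what was asserted.
inferred = transitive_closure(asserted) - asserted
print(inferred)
```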

Page 16

Things you can NOT say in RDF, but can say in OWL

The class Model is equivalent to ConfiguredModel
ScienceType is the exact enumeration Atmosphere, Ocean, Ice, and Land
ObservationDataset is disjoint from SimulationDataset
Dataset123 is the same object as DatasetXYZ

Page 17

SPARQL

A language for querying RDF/OWL triples
Example query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?x ?name
WHERE { ?x foaf:name ?name }
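The WHERE clause is a triple pattern: terms starting with `?` are variables, everything else must match exactly. The Python sketch below imitates how that single pattern is evaluated over a handful of triples. The people and their URIs are invented sample data, and `match` is a hypothetical helper that covers only one pattern, not the SPARQL language.

```python
# foaf:name expands to the prefix URI followed by the local name.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical sample data; these resources are not from the slides.
ALICE = "http://example.org/people#alice"
BOB = "http://example.org/people#bob"
triples = [
    (ALICE, FOAF_NAME, "Alice"),
    (BOB, FOAF_NAME, "Bob"),
    (ALICE, FOAF_KNOWS, BOB),
]


def match(pattern, triples):
    """Yield one variable binding per triple that fits the pattern."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable binds to the value; if it appears twice in the
                # pattern, both positions must hold the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a constant term did not match
        else:
            yield binding


# Equivalent of: SELECT ?x ?name WHERE { ?x foaf:name ?name }
rows = [(b["?x"], b["?name"]) for b in match(("?x", FOAF_NAME, "?name"), triples)]
for row in rows:
    print(row)
```

The foaf:knows triple is skipped because its predicate does not match the constant in the pattern.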

Page 18

Curator’s Current Strategy

Curator data model written in XML Schema
Models and Datasets (Resources*) annotated with conforming XML instance documents
Portions of XML translated into RDF and exposed by CDP-Curator faceted search
This means:
  - Low level details remain in XML instance
  - Higher level concepts pulled out into the RDF
Can we confirm this strategy?

Page 19

Technical Challenges

XML to RDF translation
  - Hierarchical, low level → graph-based, conceptual
  - Is there a need to go from RDF back to XML?
  - What stays in XML? What goes to RDF?
Automation of translation
  - Schema level (e.g., schema evolution)
  - Instance level (e.g., submission of new resource to CDP-Curator)