Logics for Data and Knowledge Representation

Introduction to Semantic Web

Fausto Giunchiglia, Feroz Farazi

Semantic Web

An extension of the WWW, in which information is given well-defined meaning, better enabling computers and people to work in cooperation [T. Berners-Lee et al., 2001]

A new form of Web content that is computer comprehensible will open up a revolution of new possibilities [T. Berners-Lee et al., 2001]

An alternative approach to represent Web content in a machine-processable way, and to use intelligent techniques to take advantage of these representations [G. Antoniou and F. van Harmelen, 2004]

An extra abstraction layer, a so-called semantic layer, to be built on top of the Web [F. Giunchiglia et al., 2010]

Definitions

Semantic Web

Semantics

Data and documents are assigned semantics

Semantics are codified as metadata

Logic

Logic as a tool for expressing knowledge and semantics

Ontology

A set of terms and semantic relations among them

For example, ZIP code and postal code are equivalent terms

Language and Vocabulary

Semantic Web Languages (e.g., RDF and OWL)

Standard Vocabularies (e.g., Dublin Core and FOAF)

Keys

World Wide Web

An enormous collection of data and documents: of any kind, mixed, constantly growing, open to all

Suffers from some well-known limitations in searching, extracting, maintaining and unveiling information

Even with all these limitations, its features make it quite useful and interesting

Nevertheless, for a better user experience we want to build a more integrated and consistent Web

Dumb Web to Smart Web

Consider that you are planning a vacation to the major excavation region of Heraklion on the island of Crete

You find a list of hotels by location

The list shows that Aldemar, a hotel chain you know, has a branch there

Unfortunately, you do not see that branch on Aldemar’s website

What would you call this? Dumb? Here, by dumb we mean inconsistent

Consider that you are planning a conference trip to Crete

You find many branches of Aldemar in the surroundings of the conference venue

You want to know which one is nearest (minimum walking distance)

Many mapping sites (e.g., Google Maps) can compute the distance from the addresses given as input

You are the one spending time copying and pasting addresses into the site. Can we make it any better?

[D. Allemang and J. Hendler, 2008]

Dumb Web to Smart Web

Suppose you want to know the municipalities in the Autonomous Province of Trento

The municipalities in the province of Trento were reorganized in 2010

Their number was reduced from 223 to 217

Many sites still list the former figure instead of the latter

because the information is hard-coded in HTML pages, or retrieved from the authorities' databases and rendered on the Web

in a form meant for human consumption only

not for machines, which prevents other parties from picking up changes automatically

Considering all of the above, what should we build to get a smart Web?

Smart applications or a smart Web infrastructure?

Why

Smart Web Applications

The Web is overwhelmed with smart applications, and new ones arrive every day

Great advances have been made in implementing ideas once considered very hard or even impossible

To name a few applications:

Search engines return non-trivial matches that seem deep and intuitive

Commerce sites make intelligent recommendations based on customer purchase patterns

Mapping sites can plan routes and provide detailed geographic information

What role can the Web infrastructure play?

All these smart applications are only as smart as the data provided to them

Inconsistent data will lead to dumb results even from smart applications

The Web infrastructure needs to be improved to support better data consistency, so that smart applications can perform to their potential

Smarter Web

A Web with an infrastructure that enhances the whole Web experience by

enabling connections among data

letting users connect data to smart Web applications

not surprising us with inconsistencies

In the case of the Aldemar hotel branch in the major excavation region of Heraklion, we need coordination

at the data level between the Aldemar site and the site listing hotels by location

which would help update the list whenever the location of a hotel changes

In the mapping site scenario, we would like the site to understand

the data from the conference site and the hotel sites

without requiring human intervention in copying and pasting

Semantic Data and Web of Data

Semantic data is computer-understandable data

e.g., representing the hotels as real-world entities and their addresses as attributes, in Semantic Web languages using standard vocabularies

e.g., representing each municipality of Trento as a part (part meronym) of the province, which gives entity-to-entity connectivity within a dataset

The Semantic Web is a web of interconnected datasets where

one data element can point to another (through URIs), rather than one web page pointing to another, forming a web of data

the Web infrastructure provides a data model in which the description of a single entity can be distributed over the Web

the coherence of the data model is part of the Web infrastructure
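
A minimal sketch of the "single entity distributed over the Web" idea: two independently published datasets describe the same entity, and because both use the same URI their statements can be merged mechanically. The URIs, property names and values below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical URIs, used for illustration only.
ALDEMAR = "http://example.org/entity/aldemar-heraklion"
HERAKLION = "http://example.org/entity/heraklion"

# Statements published by a (hypothetical) hotel chain site.
chain_site = [
    (ALDEMAR, "name", "Aldemar Heraklion"),
    (ALDEMAR, "address", "Example Street 1, Heraklion"),
]

# Statements published by a (hypothetical) hotel listing site.
listing_site = [
    (ALDEMAR, "locatedIn", HERAKLION),
    (HERAKLION, "partOf", "http://example.org/entity/crete"),
]


def merge(*datasets):
    """Group statements from several datasets by the URI they describe."""
    merged = defaultdict(dict)
    for dataset in datasets:
        for subject, prop, value in dataset:
            merged[subject].setdefault(prop, set()).add(value)
    return merged


web_of_data = merge(chain_site, listing_site)
print(sorted(web_of_data[ALDEMAR]))  # ['address', 'locatedIn', 'name']
```

Neither site had to copy the other's data: sharing the identifier is enough for a consumer to assemble one description of the hotel.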

Linked Data

The Linked Data approach forms the basis of data publishing guidelines

pinpointing how data from government, public and private sectors can be made more valuable for consumers

The Linked Data approach came up with

a set of principles

the star rating system

Principles

the use of HTTP URIs as the identifiers of things (concepts, entities and attributes)

the provision of meaningful content published in RDF for each such URI reference

the production of navigable content via links
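
As a rough illustration, two of these principles can be checked mechanically on a set of statements: whether identifiers are HTTP URIs, and whether some statements link outside the publisher's own namespace. This is only a sketch; the function names and example URIs are hypothetical, `https` is accepted alongside `http` as an assumption, and the second principle (RDF content served for each URI) is not checked because it would require network access.

```python
from urllib.parse import urlparse


def is_http_uri(identifier):
    """True if the string parses as an http(s) URI with a host part."""
    parts = urlparse(identifier)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_principles(triples, own_prefix):
    """Check identifier form and outgoing links for (subject, predicate, object) triples."""
    identifiers = {term for triple in triples for term in triple[:2]}
    return {
        # Subjects and predicates must be HTTP URIs; objects may be plain values.
        "http_identifiers": all(is_http_uri(term) for term in identifiers),
        # At least one object should be a URI outside the publisher's namespace.
        "links_out": any(
            is_http_uri(obj) and not obj.startswith(own_prefix)
            for _, _, obj in triples
        ),
    }


# Hypothetical data for illustration.
DATA = [
    ("http://example.org/hotel/1", "http://example.org/vocab/name", "Aldemar"),
    ("http://example.org/hotel/1", "http://example.org/vocab/locatedIn",
     "http://other.example.net/place/heraklion"),
]
print(check_principles(DATA, "http://example.org/"))
```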

Linked Open Data

The star rating system rates published data on a scale from 1 star to 5 stars

Getting 1 star requires publishing data on the Web with an open license regardless of format, e.g., datasets can be published as images; this is also called Open Data

Producing 2-star data requires the Open Data to be made available in a structured format (e.g., Excel, which is proprietary) so that it becomes machine-readable

Producing 3-star data requires non-proprietary formats, e.g., CSV or TSV, on top of the previous rating levels

Getting 4 stars requires publishing data using W3C open standards, e.g., RDF

Achieving 5 stars, the highest level in the rating spectrum, demands establishing links to RDF datasets published by others

A dataset that reaches 5 stars is also called Linked Open Data
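
The cumulative nature of the rating can be sketched as a small function. The `Dataset` class and its field names are hypothetical; the only logic taken from the slide is that each level builds on the previous ones.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    open_license: bool      # on the Web under an open license (any format)
    structured: bool        # machine-readable structured format (e.g., a spreadsheet)
    non_proprietary: bool   # non-proprietary format (e.g., CSV or TSV)
    w3c_standards: bool     # published using W3C open standards (e.g., RDF)
    links_to_others: bool   # links to RDF datasets published by others


def star_rating(dataset):
    """Count criteria met in order, stopping at the first one that fails."""
    criteria = [
        dataset.open_license,
        dataset.structured,
        dataset.non_proprietary,
        dataset.w3c_standards,
        dataset.links_to_others,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


scanned_image = Dataset(True, False, False, False, False)
spreadsheet = Dataset(True, True, False, False, False)
linked_rdf = Dataset(True, True, True, True, True)
print(star_rating(scanned_image), star_rating(spreadsheet), star_rating(linked_rdf))  # 1 2 5
```

A dataset published as RDF but without an open license would score 0 here, since the open license is the first criterion.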

A World of Entities

Entitypedia

Linked Entities

What is an entity?

We organize our world (ground) knowledge around entities

» Entities are objects which are so important in our everyday life that they are referred to by a name
» Each entity has its own metadata (e.g., name, latitude, longitude, height, …)
» Each entity is in relation with many other entities (e.g., the Eiffel Tower is located in Paris, Fausto is a friend of Raffaella)
» There are relatively “few” commonsense entity types (person, …, event)
» There are many application/focus dependent entities (artifacts, maths, …)

Eiffel Tower

Entitypedia – the key ideas

• Clear separation between the
– knowledge (about entities/instances) and the
– language (classes/concepts) used to express the knowledge
• Knowledge as very carefully designed (2)
– Lattice of entity types (attributes, relations, services)
– … unifying most (all?) standards (de jure, de facto) (Dublin Core, FOAF, Facebook, …)
• Language as very carefully designed (1)
– Linguistic resource (WordNet + (CoreLex + homographs) + multiple NLs)
– … + a faceted domain knowledge organization infrastructure, developed using the analytico-synthetic approach (extending Library Science PMEST/DEPA frameworks)
• Direct linear time encoding into RDF/DL (3)
– but (!) with fine-tuned, very fast data structures (for search, entity matching, …)
• (Relatively) large scale bootstrapping + continuous evolution (4)
– via system-sourcing and crowd-sourcing (under study now)
• Data certification (5)
– … via quality certification pipeline (under study now)

Natural language and formal language

AUTOMOBILE CAR MACCHINA

The same concept can be expressed in different ways in the same language and

across languages

Different languages and terminology

Formal language: domains

DERA domains (D for Domain) organize the (formal concept) language into any number of domains (“any area of knowledge, chosen subjectively, that we want to reason or communicate about”). Examples: medicine, music, pop music, people, movies, skiing, my garden, …

[Figure: a fragment of the Space Domain, showing the classes LOCATION, MONUMENT, BODY OF WATER and RIVER and the entities EIFFEL TOWER, COLOSSEUM, GARDA LAKE, MISSISSIPPI and AMAZON RIVER]

» Inspired by Ranganathan's faceted approach
» Following precise design principles (analytico-synthetic approach)
» Organize entities as classes of similar objects
» Independent of the specific chosen domains
» Lattice of (overlapping) domains
» Top level domain = upper level ontology

Formal language: Facets

» A DERA Domain contains any number of facets (hierarchy of terms each denoting an atomic concept – often corresponding to a NL multiword)
» A DERA Facet is of one of three types (E for Entity, R for Relation, A for Attribute)

[Figure: a fragment of an entity facet in the Space Domain, showing the classes LOCATION, MONUMENT, BODY OF WATER and RIVER and the entities EIFFEL TOWER, COLOSSEUM, GARDA LAKE, MISSISSIPPI and AMAZON RIVER]

» Entity: see picture (classes of entities and entities)
» Relation: far, near, east, … with roles playing the double role of entity and relation
» Attribute: qualities / quantities (high, low, 23m, …), descriptive attributes (“India is a democratic country”)
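
An entity facet is a hierarchy of terms, so one simple way to hold it in memory is a child-to-parent map. The sketch below is an assumption about representation, not the Entitypedia data structure. "river" under "body of water" follows the slides; placing "monument" and "body of water" directly under "location" is inferred from the figure labels.

```python
# Entity (E) facet fragment of the Space Domain: narrower term -> broader term.
# The exact shape of the hierarchy is partly assumed from the figure labels.
ENTITY_FACET = {
    "monument": "location",
    "body of water": "location",
    "river": "body of water",
}

# Entities attached to the class they are instances of (assumed assignments).
INSTANCE_OF = {
    "Eiffel Tower": "monument",
    "Colosseum": "monument",
    "Mississippi": "river",
    "Amazon River": "river",
}


def ancestors(term, facet):
    """Walk up the facet from a term to the root, collecting broader terms."""
    chain = []
    while term in facet:
        term = facet[term]
        chain.append(term)
    return chain


def classes_of(entity):
    """All classes an entity belongs to, from most to least specific."""
    direct = INSTANCE_OF[entity]
    return [direct] + ancestors(direct, ENTITY_FACET)


print(classes_of("Amazon River"))  # ['river', 'body of water', 'location']
```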

User interface

Knowledge

» A set of entity types, each entity type defined in terms of:
˃ Attributes (e.g., height, latitude)
˃ Relations (e.g., locatedIn, friend)
˃ Services (e.g., computeAge, computeFoFs, computeInverseRelation, …)
˃ Many (categories of) meta-attributes (e.g., mandatory, identifying, permanent, timespan, provenance, …)
» Entity types organized in a lattice
˃ Coherent with the domain lattice
˃ With an ordering on <attributes, relations, services> but also subsumption, value ranges, …
» Entities:
˃ A name and a URI
˃ Etype <attributes, relations, services> plus free
˃ One reference etype and many induced etypes

Knowledge services

» CRUD on entities
» EntitySearch(“metadata of E1”) (* useful in NER *)
» EntityMatch(E1, E2)
» Etypes(“some element of an entity”)
» Extension(etype) (* same as search(etype) *)
» Navigate(E1, R) (* Navigate(Fausto, Friends) *)
» Distance(E1, E2, R) (* Distance(Fausto, Obama, Friend) *)
» …
» … many etype and application dependent services
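
The slides give only the names of these services, so the following is a sketch under assumptions: `navigate` is taken to return the entities directly related to an entity through a relation, and `distance` is taken to be the minimum number of hops along that relation, computed by breadth-first search. The relation data is hypothetical, apart from "Fausto is a friend of Raffaella", which appears earlier in the slides.

```python
from collections import deque

# Hypothetical relation store: (entity, relation) -> directly related entities.
RELATIONS = {
    ("Fausto", "friend"): {"Raffaella", "Alice"},
    ("Raffaella", "friend"): {"Fausto", "Bob"},
    ("Alice", "friend"): {"Fausto"},
    ("Bob", "friend"): {"Raffaella", "Carol"},
    ("Carol", "friend"): {"Bob"},
}


def navigate(entity, relation):
    """Entities reachable from `entity` in one step along `relation`."""
    return RELATIONS.get((entity, relation), set())


def distance(source, target, relation):
    """Minimum number of `relation` hops from source to target, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, hops = queue.popleft()
        for neighbour in navigate(current, relation):
            if neighbour == target:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None  # target not reachable through this relation


print(distance("Fausto", "Carol", "friend"))  # 3
```

Returning `None` for an unreachable target keeps "no path known" distinct from a distance of zero.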

Entity type lattice

Some examples of etypes

ENTITY
Name String [ ]
Description SString [ ]
Part Of <Entity>
Homepage URL [ ]
Start Moment
End Moment
Duration Duration

EVENT extends ABSTRACT ENTITY
Participant <Person> [ ] | <Organization> [ ]
Location <Location>
Status Enum <SString>
…

LOCATION extends PHYSICAL ENTITY
Latitude float
Longitude float
Altitude float
…

PHYSICAL ENTITY extends ENTITY
Height float
Length float
Width float
Weight float
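
The "extends" relation between etypes maps naturally onto class inheritance. The sketch below mirrors the ENTITY, PHYSICAL ENTITY and LOCATION definitions with Python dataclasses. It is an illustration, not the Entitypedia implementation: the Moment, Duration, SString and URL types are replaced by plain strings as an assumption, and the example values are approximate.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    name: List[str] = field(default_factory=list)          # Name String [ ]
    description: List[str] = field(default_factory=list)   # Description SString [ ]
    part_of: Optional["Entity"] = None                      # Part Of <Entity>
    homepage: List[str] = field(default_factory=list)      # Homepage URL [ ]
    start: Optional[str] = None                              # Start Moment
    end: Optional[str] = None                                # End Moment
    duration: Optional[str] = None                           # Duration Duration


@dataclass
class PhysicalEntity(Entity):
    height: Optional[float] = None
    length: Optional[float] = None
    width: Optional[float] = None
    weight: Optional[float] = None


@dataclass
class Location(PhysicalEntity):
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    altitude: Optional[float] = None


# Illustrative instances; the numeric values are approximate.
paris = Location(name=["Paris"])
tower = Location(name=["Eiffel Tower"], part_of=paris, height=330.0,
                 latitude=48.858, longitude=2.294)
print([cls.__name__ for cls in Location.__mro__[:3]])
```

A Location inherits every attribute of PHYSICAL ENTITY and ENTITY, which is the ordering on attributes that the etype lattice describes.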

Example of entities

[Figure: a graph of example entities. Entities: ETH Zurich, Albert Einstein, Mileva Maric, Ulm, Germany. Entity types: UNIVERSITY, SCIENTIST, PERSON, CITY, COUNTRY. Relations: part-of, birth place, spouse, affiliation]

A critical issue: dot-objects

ETH Zurich

UNIVERSITY (as organization)

UNIVERSITY (as building)

Some entities have a clear inherent polysemy (Pustejovsky)

» According to the situation either one aspect or the other (typically the physical or abstract aspect) of the entity is emphasized. This generates polysemy in language.

» Since it depends on the situation, it would be wrong to permanently disambiguate it in one or the other way

» We need a systematic way to represent these entities

Encoding into RDF

» Choose (sub)domain
» E facet translates into TBOX concept subsumption axioms (e.g., river ⊑ “body of water”)
» R facet translates into TBOX role subsumption (e.g., parentOf ⊒ fatherOf)
» A facet translates into TBOX subsumption (e.g., angularDistance ⊒ latitude)
» Entity properties translate into ABOX axioms (e.g., livesIn(Fausto, Trento))

NOTE: Used only for interoperability, open data, … Reasoning is done on native data structures, as specific purpose services
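
The translation rules on this slide can be sketched as a small encoder that emits N-Triples-style lines. This is a simplified illustration, not the actual Entitypedia encoder: the `http://example.org/` namespace and the helper names are hypothetical, and mapping attribute facets to `rdfs:subPropertyOf` is an assumption.

```python
EX = "http://example.org/"  # hypothetical namespace for illustration
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def uri(local_name):
    """Wrap a local name as a full URI reference in angle brackets."""
    return f"<{EX}{local_name.replace(' ', '_')}>"


def encode_facet(facet, facet_type):
    """Encode a facet (narrower term -> broader term) as subsumption triples.

    E facets become rdfs:subClassOf; R and A facets become rdfs:subPropertyOf
    (the choice for A facets is an assumption).
    """
    predicate = "subClassOf" if facet_type == "E" else "subPropertyOf"
    return [
        f"{uri(narrower)} <{RDFS}{predicate}> {uri(broader)} ."
        for narrower, broader in facet.items()
    ]


def encode_fact(relation, subject, obj):
    """Encode an entity property such as livesIn(Fausto, Trento) as a triple."""
    return f"{uri(subject)} {uri(relation)} {uri(obj)} ."


lines = (
    encode_facet({"river": "body of water"}, "E")       # TBox, concepts
    + encode_facet({"fatherOf": "parentOf"}, "R")       # TBox, roles
    + encode_facet({"latitude": "angularDistance"}, "A")
    + [encode_fact("livesIn", "Fausto", "Trento")]       # ABox
)
print("\n".join(lines))
```

Each facet edge yields exactly one triple, which is consistent with the linear time encoding claimed in the key ideas.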

Features of a Semantic Web

A radically new way of thinking about representing information, for better results and better management

The Web is characterized by the AAA slogan (Anyone can say Anything about Any topic)

On the Semantic Web, any individual has to be allowed to contribute a piece of data about some entity that can be linked to information from other sources

This requirement

was taken into account while designing RDF

has the consequence that there could always be one more thing to know (something new that someone will express): the Open World Assumption
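
The practical effect of the Open World Assumption shows up when asking about a statement nobody has made yet. In the simplified sketch below (hypothetical function names and data), a closed-world query treats a missing statement as false, while an open-world query treats it as unknown. Full open-world reasoning can also conclude that a statement is false when the data entails its negation; this sketch does not model that.

```python
def ask_closed_world(statement, facts):
    """Closed world: whatever is not stated is taken to be false."""
    return statement in facts


def ask_open_world(statement, facts):
    """Open world: whatever is not stated is unknown (None), not false."""
    return True if statement in facts else None


# Statements contributed so far (hypothetical data).
facts = {("Fausto", "livesIn", "Trento")}
question = ("Fausto", "friend", "Raffaella")

print(ask_closed_world(question, facts))  # False
print(ask_open_world(question, facts))    # None

# Someone else later says something new about the same entity.
facts.add(question)
print(ask_open_world(question, facts))    # True
```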

RDF

RDF (Resource Description Framework)

– A language for representing data in the Semantic Web

– A simple data model for making statements

– The capability to perform inference on the statements

Data model in RDF

– The data model in RDF is a graph data model

– An edge together with the two nodes it connects forms a triple

– The elements of a triple are subject, predicate and object

RDF representation

– URIs identify subjects, predicates and objects

– Objects can also be literals
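
A minimal sketch of this data model in plain Python, without an RDF library: URIs and literals are distinct term types, a triple is a subject, a predicate and an object, and a graph is a set of triples. The class and function names and the `example.org` URIs are hypothetical; the FOAF property URIs are the standard ones. The serializer follows the N-Triples layout but does not escape special characters in literals.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI                   # node the edge starts from
    predicate: URI                 # label of the edge
    object: Union[URI, Literal]    # node the edge points to, or a literal value


def to_ntriples(triple):
    """Serialize one triple as an N-Triples-style line (no literal escaping)."""
    def term(t):
        return f"<{t.value}>" if isinstance(t, URI) else f'"{t.value}"'
    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.object)} ."


FOAF = "http://xmlns.com/foaf/0.1/"
fausto = URI("http://example.org/entity/Fausto")        # hypothetical URI
raffaella = URI("http://example.org/entity/Raffaella")  # hypothetical URI

graph = {
    Triple(fausto, URI(FOAF + "name"), Literal("Fausto Giunchiglia")),
    Triple(fausto, URI(FOAF + "knows"), raffaella),
}

for line in sorted(to_ntriples(t) for t in graph):
    print(line)
```

Only the object position accepts a `Literal`; subjects and predicates are always URIs in this sketch.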

References

T. Berners-Lee, J. Hendler, & O. Lassila (2001, May). The Semantic Web. Scientific American 284, 34–43.

G. Antoniou & F. van Harmelen (2004). A Semantic Web Primer (Cooperative Information Systems). MIT Press, Cambridge MA, USA.

F. Giunchiglia, F. Farazi, L. Tanca, and R. D. Virgilio. The semantic web languages. In Semantic Web Information management, a model based perspective. Roberto de Virgilio, Fausto Giunchiglia, Letizia Tanca (Eds.), Springer, 2009.

D. Allemang and J. Hendler. Semantic web for the working ontologist: modeling in RDF, RDFS and OWL. Morgan Kaufmann Elsevier, Amsterdam, NL, 2008.

T. Berners-Lee. Linked Data. Design Issues for the World Wide Web - W3C, http://www.w3.org/DesignIssues/LinkedData.html, 2006.