where does it break? or: why semantic web research is not just “computer science as usual”

47
Where does it break? or: Why Semantic Web research is not just “Computer Science as usual” Frank van Harmelen AI Department Vrije Universiteit Amsterdam

Upload: bisa

Post on 14-Jan-2016

14 views

Category:

Documents


1 download

DESCRIPTION

Where does it break? or: Why Semantic Web research is not just “Computer Science as usual”. Frank van Harmelen AI Department Vrije Universiteit Amsterdam. if you can read this, you must be on steroids. But first:. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

Where does it break?or:

Why Semantic Web research is not just

“Computer Science as usual”

Frank van HarmelenAI Department

Vrije Universiteit Amsterdam

Page 2: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

2

But first:

“the Semantic Web forces us to rethink the foundations of many subfields of Computer Science”

“the Semantic Web forces us to rethink the foundations of many subfields of Computer Science”

“the challenge of the Semantic Web continues to break many often silently held and shared assumptions underlying decades of research”

“the challenge of the Semantic Web continues to break many often silently held and shared assumptions underlying decades of research”

“I will try to identify silently held assumptions which are no longer true on the Semantic Web, prompting a radical rethink of many past results”

“I will try to identify silently held assumptions which are no longer true on the Semantic Web, prompting a radical rethink of many past results”

if you can

read this,

you must

be on

steroids

Page 3: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

3

Oh no, not more “vision”…

Don’t worry,there will be lots of technical content

our plan is to invent some sort of

doohicky that everyone wants to

buy.

the visionary leadership work is

done. How long will your part take?

sales are dropping like a rock

Page 4: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

4

Grand Topics… what are the science challenges in SW? Which implicit traditional assumptions

break? Illustrated with 4 such “traditional

assumptions”

and also: “Which Semantic Web” ?

Page 5: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

Before we go on:

Which Semantic Web are we talking about?

Page 6: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

8

Which Semantic Web?Version 1:

"Semantic Web as Web of Data" (TBL)

recipe:expose databases on the web, use RDF, integrate

meta-data from: expressing DB schema semantics

in machine interpretable waysenable integration and unexpected re-

use

Page 7: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

9

Which Semantic Web?Version 2:

“Enrichment of the current Web”

recipe:Annotate, classify, index

meta-data from: automatically producing markup:

named-entity recognition, concept extraction, tagging, etc.

enable personalisation, search, browse,..

Page 8: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

10

Which Semantic Web?Version 1:

“Semantic Web as Web of Data”

Version 2:“Enrichment of the current Web”

Different use-cases Different techniques Different users

Page 9: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

Semantic Web:

Science or technology?

Page 10: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

15

Semantic Web as Technology better search & browse personalisation semantic linking semantic web services ...Semantic Web as Science

Page 11: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

4 examples of“where does it break?”

old assumptions that no longer hold, old approaches that no longer work

Page 12: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

4 examples of“where does it break?”

Traditional complexity measures

Page 13: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

18

Who cares about

decidability?

Decidability ≈ completeness guarantee to find an answer, or tell you it doesn’t exist,given enough run-time & memory

Sources of incompleteness: incompleteness of the input data insufficient run-time to wait for the answer

Completeness is unachievable in practice anyway, regardless of the completeness of the algorithm

Page 14: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

19

Who cares about undecidability?Undecidability

≠ always guaranteed not to find an answer

Undecidability = not always guaranteed to find an answer

Undecidability may be harmless in many cases;in all cases that matter

Page 15: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

20

Who cares about complexity?worst-case: may be exponentially

rareasymptotic

ignores constants

Page 16: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

21

What to do instead?Practical observations on RDF Schema:

Compute full closure of O(105) statementsPractical observations on OWL:

NEXPTIME but fine on many practical cases

Do more experimental performance profileswith realistic data

Think hard about “average case” complexity….

69

Page 17: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

4 examples of“where does it break?”

Traditional complexity measures Hard in theory, easy in practice

Page 18: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

Example:Reasoning with

Inconsistent Knowledge

This work withZhisheng Huang & Annette ten Teije

Page 19: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

24

Knowledge will be inconsistent

Because of:mistreatment of defaultshomonymsmigration from another formalismintegration of multiple sources

Page 20: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

25

New formal notions are needed

New notions: Accepted: Rejected: Overdetermined: Undetermined:

Soundness: (only classically justified results)

Page 21: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

26

Selectionfunction

Basic Idea

1. Start from the query2. Incrementally select larger parts of the

ontology that are “relevant” to the query, until:

i. you have an ontology subpart that is small enough to be consistent and large enough to answer the queryor

ii. the selected subpart is already inconsistent before it can answer the query

Page 22: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

27

General Framework

s(T,,0)s(T,,1)s(T,,2)

Page 23: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

28

More precisely:

Use selection function s(T,,k),with s(T,,k) µ s(T,,k+1)

1. Start with k=0: s(T,,0) j¼ or s(T,,0) j¼ : ?

2. Increase k, untils(T,,k) j¼ or s(T,,k) j¼ :

3. Abort when undetermined at maximal k overdetermined at some k

Page 24: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

29

Nice general framework, but...which selection function s(T,,k) to use?Simple option: syntactic distance

put all formulae in clausal form:a1 Ç a2 Ç … Ç an

distance k=1 if some clausal letters overlap a1 Ç X Ç … Ç an, b1 Ç … X Ç bn

distance k if chain of k overlapping clauses are neededa1 Ç X Ç … X1 Ç an b1 Ç X1 Ç … X2 Ç bn, ….

c1 Ç Xk Ç … X Ç cn

Page 25: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

30

Evaluation

Ontologies: Transport: 450 concepts

Communication: 200 conceptsMadcow: 55 concepts

Selection functions: symbol-relevance = axioms overlap by 1

symbol concept-relevance axioms overlap by 1

concept

Query a random set of subsumption queries:Concept1 Concept2 ?

Page 26: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

31

Evaluation: Lessons

this makes concept-relevance a high quality sound approximation(> 90% recall, 100% precision)

Page 27: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

32

Works surprisingly well

On our benchmarks,allmost all answers are “intuitive”Not well understood whyTheory doesn’t predict that this is easy

paraconsistent logic, relevance logic multi-valued logic

Hypothesis: due to “local structure of knowledge”?

Page 28: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

4 examples of“where does it break?”

Traditional complexity measures Hard in theory, easy in practice

context-specific nature of

knowledge

Page 29: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

34

Opinion pollleft

meaning of a sentence is only determined by the sentence itself,and not influenced bythe surroundingsentences, and not by the

situation in which the sentence is used

meaning of a sentence is only determined by the sentence itself,and not influenced bythe surroundingsentences, and not by the

situation in which the sentence is used

meaning of sentence is not only determinedby the sentence itself,but is also influenced

byby the surroundingsentences,and also by the

situationin which the sentenceis used

meaning of sentence is not only determinedby the sentence itself,but is also influenced

byby the surroundingsentences,and also by the

situationin which the sentenceis used

right

Page 30: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

35

Opinion pollrightleft

don’t you see what I mean?

Page 31: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

Example:Ontology mapping

with community support

This work withZharko Aleksovski &

Michel Klein

Page 32: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

37

The general idea

source target

background knowledge

anchoring anchoring

mapping

inference

Page 33: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

38

Example 1

Page 34: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

39

Example 2

Page 35: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

41

Experimental results Source & target =

flat lists of ±1400 ICU terms each Anchoring =

substring + simple germanic morphology Background = DICE (2300 concepts in DL)

Page 36: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

42

New results: more background knowledge makes

mappings better DICE (2300 concepts) MeSH (22000 concepts) ICD-10 (11000 concepts)

Monotonic improvement of qualityLinear increase of cost

0

10

20

30

40

50

60

70

80

90

100

1 2 3 4

Page 37: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

44

So…The OLVG & AMC terms get their

meaning from the context in which they are being used.

Different background knowledge would have resulted in different mappings

Their semantics is not context-freeSee also: S-MATCH by Trento

Page 38: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

4 examples of“where does it break?”

Traditional complexity measures Hard in theory, easy in practice context-specific nature of

knowledge logic vs. statistics

Page 39: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

46

Logic vs. statisticsDB schema’s & integration is

only logic, no statistics

AI is both logic and statistics,but completely disjoint

Find combinations of the two worlds? Statistics in the logic? Statistics to control the logic? Statistics to define the semantics of the

logic?

Page 40: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

47

Statistics in the logic? Fuzzy DL

(TalksByFrank v InterestingTalks) ¸ 0.7(Turkey:EuropeanCountry) · 0.2youngPerson = Person u 9 age.Young

Young(x) =

veryYoungPerson = Person u 9 age.very(Young)

10yr 30yr

1

0

10yr 30yr

1

0Umberto Straccia

Page 41: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

48

Statistics to control the logic?

query: A v B ?

B = B1 u B2 u B3 A v B1, A v B2, A v

B3 ?

B1 B3

B2

A

Page 42: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

49

Statistics to control the logic?

Use “Google distance” to decide which ones are reasonable to focus on

Google distance ≈ symmetric conditional probability of co-

occurrence≈ estimate of semantic distance≈ estimate of “contribution” to A v B1 u B2 u

B3 B1 B3

B2

A

This work by Riste Gligorov

Page 43: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

50

Statistics to define semantics?Many peers have many mappings

on many terms to many other peers

Mapping is good if results of “whispering game” are truthful

Punish mappings that contribute to bad whispering results

Network will converge to set of good mappings(or at least: consistent)

This work by Karl Aberer

Page 44: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

51

Statistics to define semantics? Meaning of terms =

relations to other terms Determined by stochastic process Meaning ≈

stable state of self-organising system statistics = getting a system to a

meaning-defining stable state logic = description of such a stable

state

Note: meaning is still binary, classical truth-value

Note: same system may havemultiple stable states…

Page 45: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

4 examples of“where does it break?” old assumptions that no longer hold, old approaches that no longer work

Traditional complexity measures don’t workcompleteness, decidability, complexity

Sometimes “hard in theory, easy in practice”Q/A over inconsistent ontologies is easy, but why?

Meaning dependent on contextmeaning determined by background knowledge

Logic versus statisticsstatistics in the logicstatistics to control the logicstatistics to determine semantics

Page 46: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

53

Final comments These 4 “broken assumptions/old methods”

were just examples. There are many more.(e.g. Hayes, Halpin on identity, equality and reference)

Notice that they are interlinked, e.g hard theory/easy practice & complexity meaning in context & logic/statistics

Working on these will not be SemWeb work per se, but

they will be inspired by SemWeb challenges they will help the SemWeb effort (either V1 or V2)

Page 47: Where does it break? or: Why Semantic Web research  is not just  “Computer Science as usual”

54

Have fun with the puzzles!