
Reasoning in the OWL 2 Full Ontology Language using First-Order Automated Theorem Proving

Michael Schneider
FZI Research Center for Information Technology, Germany

Geoff Sutcliffe
University of Miami, USA

23rd International Conference on Automated Deduction (CADE 23)
Wrocław, Poland, August 2011

Introduction

• Context: Semantic Web, OWL, First-Order Logic, ATP

• Focus: W3C ontology language OWL 2 Full
  – highly expressive ontology formalism
  – no reasoner as of today

• Aims: find out …
  – … to what extent practical OWL 2 Full reasoning can be implemented with off-the-shelf FOL reasoning technology
  – … whether FOL-based OWL 2 Full reasoning can provide added value over today's state-of-the-art OWL reasoners

PRELIMINARIES

Semantic Web

• Semantic Web (SW): extends WWW by machine-processible, interlinked resource descriptions and vocabularies
  – Resources (everything one can talk about):
    • identified by URIs
    • described by property-value pairs
    • classified by classes
    • related to other resources via properties
  – Vocabularies: define classes and properties and their (formal) semantics, e.g. that class Eagle is a subclass of class Animal, so that Harry, being an Eagle, is also an Animal

• RDF ("Resource Description Framework"): language to define "graphs" of interlinked resource descriptions

• OWL ("Web Ontology Language"): language to define vocabularies

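[Figure: example RDF graph with the class :Eagle, the resources :Harry and :Larry, and the properties age (value 3), sex (value m), and father]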

OWL Flavours

• OWL 2: family of ontology languages for the Semantic Web
  – W3C Recommendation (2009)
  – version 2 is a revised and extended version of OWL (2004)
• Two major "flavours" of OWL: OWL 2 DL and OWL 2 Full
  – OWL 2 DL: basically a description logic (SROIQ[D]) adjusted to SW needs
  – OWL 2 Full: similar to OWL 2 DL, but directly designed for an RDF-based SW
• Some observable distinctive features of OWL 2 Full:
  – can be applied to weakly-structured SW data (LOD) and RDFS vocabularies
  – no restrictions on use of OWL constructs (e.g. asymmetric transitive properties)
  – support for (semantic) metamodeling (Harry, the Eagle, has meta-class Species)
• Theoretical issue: OWL 2 Full is undecidable (practical problem?)

OWL 2 Full Semantics

• Specification: OWL 2 Full semantics is specified via a set of model-theoretic "semantic conditions"
  OWL 2 RDF-Based Semantics: http://www.w3.org/TR/owl2-rdf-based-semantics/
• Core Observation:
  – all semantic conditions have the form of standard first-order logic formulae
    → OWL 2 Full semantics is essentially a FOL theory!
  – all input RDF graphs are representable as FOL formulae
  – hence: OWL 2 Full reasoning is implementable in terms of FOL reasoning!
• Question: How well does it work in practice?
• Prior Art: very little research based on this observation so far; none for OWL 2 Full


OWL 2 Full Semantic Conditions

• Typical format of semantic conditions: if a certain semantic relationship holds, then another associated relationship also holds
  (Note: many semantic conditions are in fact if-and-only-if conditions)
• Example (1st semantic condition in Table 5.8 of OWL 2 RDF-Based Semantics): if two individuals c1 and c2 are related by the denotation of URI rdfs:subClassOf, then c1 and c2 are classes (members of set IC) and ICEXT(c1), the class extension of c1, is a subset of ICEXT(c2), the class extension of c2
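The "if" direction of this example condition can be checked mechanically on a small finite interpretation. The following is a minimal sketch only: the interpretation (IC, ICEXT and the subclass relation) is made up for illustration, and the function name is hypothetical.

```python
# Hypothetical finite interpretation; all names and members are illustrative.
IC = {"Eagle", "Animal"}                        # the set of classes
ICEXT = {                                       # class extensions
    "Eagle": {"harry"},
    "Animal": {"harry", "larry"},
}
SUBCLASS_OF = {("Eagle", "Animal")}             # pairs related by rdfs:subClassOf


def satisfies_subclass_condition(ic, icext, subclass_of):
    """Check the 'if' direction of the condition: every related pair
    (c1, c2) consists of classes with ICEXT(c1) a subset of ICEXT(c2)."""
    return all(
        c1 in ic and c2 in ic and icext.get(c1, set()) <= icext.get(c2, set())
        for (c1, c2) in subclass_of
    )


ok = satisfies_subclass_condition(IC, ICEXT, SUBCLASS_OF)
print(ok)
```

Reversing the pair to ("Animal", "Eagle") violates the condition in this interpretation, because larry is an Animal but not an Eagle.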

Prior Art in OWL Full Reasoning

• Fikes, McGuinness, Waldinger: A First-Order Logic Semantics for Semantic Web Markup Languages. TR, Stanford, 2002.
  – translation of specifications of precursors of OWL and RDF into a first-order logic (FOL) theory, and application of FOL reasoners
  – focus: checking for technical issues in specifications (less on inferencing)
• Hayes: Translating Semantic Web Languages into Common Logic. TR, Pensacola (Florida), 2005.
  – translation of OWL 1 Full into Common Logic (basically a variant of FOL)
  – no report on reasoning experiments
• Hawke: Surnia. 2003. URL: http://www.w3.org/2003/08/surnia
  – OWL 1 Full reasoner based on FOL translation using Otter FOL reasoner
  – did not perform well on W3C OWL 1 test suite
  – ad hoc implementation: does not properly follow specification; many flaws


APPROACH

Translation into FOL: General Process

OWL 2 Full Entailment Checking using FOL Reasoners and concrete FOL syntax TPTP:

[Figure: process diagram. The OWL 2 Full semantic conditions become a TPTP axiom set, the premise RDF graph becomes a TPTP axiom, and the conclusion RDF graph becomes a TPTP conjecture; all are fed to a FOL theorem prover and a FOL model finder, whose combined result is one of { theorem, counter-sat, unknown }.]

1. Input:
   • translate semantic conditions into set of TPTP axioms
   • translate premise RDF graph into TPTP axiom
   • translate conclusion RDF graph into TPTP conjecture
2. Reasoning: feed all TPTP formulae into FOL reasoners (parallel execution):
   • FOL theorem provers: used to detect positive entailments
   • FOL model-finders: used to detect non-entailments
3. Output: integrate results from FOL reasoners into single result
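Step 3 can be sketched as a small merge function. This is an illustration under assumptions, not the tool used in the evaluation: the function name is hypothetical, the status strings follow the three output labels above, and the parallel execution of the reasoners is left out.

```python
def integrate(statuses):
    """Merge per-reasoner statuses into a single verdict.

    Each status is assumed to be one of:
      "theorem"      a theorem prover found a proof (entailment holds)
      "counter-sat"  a model finder found a countermodel (non-entailment)
      "unknown"      the reasoner timed out or gave up
    """
    found = set(statuses)
    if "theorem" in found and "counter-sat" in found:
        # Sound reasoners cannot disagree; treat this as a tool error.
        raise ValueError("contradictory results from reasoners")
    if "theorem" in found:
        return "theorem"
    if "counter-sat" in found:
        return "counter-sat"
    return "unknown"


print(integrate(["unknown", "theorem"]))
```

One definite answer from any reasoner settles the question, which is why running provers and model finders side by side is attractive: each kind can only ever confirm one of the two outcomes.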

Translation into FOL: Semantic Conditions

[Figure: a model-theoretic OWL 2 Full semantic condition (Table 5.8), stated with "iff", shown next to the corresponding FOL formula (TPTP)]
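To make the translation concrete, the sketch below writes the rdfs:subClassOf condition as a TPTP first-order formula, written as a biconditional following the "iff" on this slide. The predicate names iext, ic and icext, the constant uri_rdfs_subClassOf and the helper fof() are assumptions chosen for illustration; the slide's own formula is not reproduced here.

```python
def fof(name, role, formula):
    """Wrap a formula string in TPTP 'fof' (first-order form) syntax."""
    return f"fof({name}, {role}, ( {formula} ))."


# Assumed encoding (hypothetical names):
#   iext(P, S, O)  the pair (S, O) is in the extension of property P
#   ic(C)          C is a class (member of IC)
#   icext(C, X)    X is in the class extension of C
SUBCLASS_CONDITION = (
    "! [C1, C2] : ( iext(uri_rdfs_subClassOf, C1, C2) <=> "
    "( ic(C1) & ic(C2) & ! [X] : ( icext(C1, X) => icext(C2, X) ) ) )"
)

axiom = fof("rdfs_subclassof", "axiom", SUBCLASS_CONDITION)
print(axiom)
```

The subset relationship between class extensions turns into a nested universally quantified implication, so no set-theoretic machinery is needed on the FOL side.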

Translation into FOL: RDF Graphs

[Figure: an RDF graph (Turtle) shown next to the corresponding FOL formula (TPTP)]
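A graph translation of this kind can be sketched in a few lines. The sketch assumes that each triple becomes one atom of a ternary predicate iext(P, S, O), that the atoms are conjoined, and that blank nodes become existentially quantified variables. All function names and the naming scheme for constants are hypothetical, and literals are not handled. The example triples are the conclusion graph of the test case 014_Harry_belongs_to_some_Species from this deck.

```python
import re


def term(node):
    """Map an RDF node to a TPTP term: blank nodes (_:x) become variables
    (upper-case initial), prefixed names (ex:harry) become constants."""
    if node.startswith("_:"):
        return "BNODE_" + re.sub(r"\W", "_", node[2:])
    return "uri_" + re.sub(r"\W", "_", node)


def graph_to_tptp(name, role, triples):
    """Translate (subject, predicate, object) triples into one TPTP formula."""
    atoms = [f"iext({term(p)}, {term(s)}, {term(o)})" for s, p, o in triples]
    body = " & ".join(atoms)
    bnodes = sorted({term(n) for t in triples for n in t if n.startswith("_:")})
    if bnodes:
        # Blank nodes act as existentially quantified variables.
        body = f"? [{', '.join(bnodes)}] : ( {body} )"
    return f"fof({name}, {role}, ( {body} ))."


conclusion = [
    ("ex:harry", "rdf:type", "_:x"),
    ("_:x", "rdf:type", "ex:Species"),
]
formula = graph_to_tptp("conclusion", "conjecture", conclusion)
print(formula)
```

A premise graph would be translated the same way with the role "axiom" instead of "conjecture".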

EVALUATION SETTING

Evaluation Setting: TPTP-Encoding

• FOL Axiomatization:
  – translated most normative semantic conditions of OWL 2 Full
  – excluded: datatype reasoning-related semantics
  – size of complete axiomatization: 558 FOL formulae
• RDF Graph Conversion:
  – implemented simple RDF-to-TPTP converter tool

Syntax Statistics
Number of formulae:      558 ( 196 unit )
Number of atoms:        1772 ( 90 equality )
Maximal formula depth:    27 ( 5 average )
Number of connectives:  1350 ( 136 '~' ; 35 '|' ; 758 '&' ; 126 '<=>' ; 295 '=>' )
Number of predicates:     13 ( 1 propositional ; 0-3 arity )
Number of functors:      157 ( 156 constant ; 0-2 arity )
Number of variables:     973 ( 0 sgn ; 911 '!' ; 62 '?' )
Maximal term depth:        2 ( 1 average )

Evaluation Setting: Experiments

1. Language Coverage: completeness w.r.t. OWL 2 Full specification
2. Characteristic OWL 2 Full Conclusions: semantic capabilities beyond OWL 2 DL or OWL 2 RL/RDF Rules
3. Scalability: reasoning upon large data sets
4. Model Finding: detecting consistent ontologies and non-entailments

Evaluation Setting: Reasoners

• FOL Theorem Provers:
  – Vampire 0.6 (using two modes: "auto", and with SInE strategy)
  – iProver-SInE 0.8 (iProver with SInE strategy and strategy scheduling)
• FOL Model-Finders:
  – Paradox 4.0 (finite model finder)
  – DarwinFM 1.4.5 (finite model finder)
• OWL Reasoners:
  – Pellet 2.2.2 (tableaux-based OWL 2 DL reasoner)
  – HermiT 1.3.2 (tableaux-based OWL 2 DL reasoner)
  – FaCT++ 1.5.0 (tableaux-based OWL 2 DL reasoner)
  – BigOWLIM 3.4, using "owl-rl" ruleset (RDF entailment-rule reasoner)
  – Jena 2.6.4, using OWL_MEM_RULE_INF spec (RDF framework with rule engine)
  – Parliament 2.6.9 (reasoning-enabled RDF triple store)

Evaluation Setting: Environment

• Computers:
  – CPU: Intel Pentium 4, 2.8 GHz
  – Memory: 2 GB
  – Operating System: Linux FC8
• Max. CPU time per run: 300 s

EVALUATION RESULTS: 1. LANGUAGE COVERAGE

Experiment 1: Language Coverage: Overview

• Aim: analyse completeness w.r.t. OWL 2 Full specification
• Method: check that all parts of OWL 2 Full semantics specification are covered (except for datatype reasoning)
• Test Data: dedicated OWL 2 Full coverage test suite targeted to specification level (Schneider & Mainzer, 2009):
  – at least one test case for each OWL 2 Full semantic condition
  – each test case focuses as much as possible on targeted semantic condition
  – generally easy to solve, hence failure indicates flaw or lack of coverage
• Adjustments:
  – removal of datatype-reasoning related test cases (currently unsupported)
  – only using entailment and inconsistency test cases
  – size of remaining test suite: 411 test cases

Experiment 1: Language Coverage: Example Test Case

Test case for probing coverage of the RDFS semantic condition for class subsumption:

TESTCASE rdfbased-sem-rdfs-subclass-cond
p rdfs owl2rl
The extensions of two classes related by rdfs:subClassOf
are in a subsumption relationship.
+
ex:c1 rdfs:subClassOf ex:c2 .
ex:w rdf:type ex:c1 .
+
ex:w rdf:type ex:c2 .
+

This positive entailment ('p') test case applies to RDFS ('rdfs'), the OWL 2 RL/RDF Rules ('owl2rl'), and all common semantic extensions, including OWL 2 Full. The upper RDF graph is the premise graph, the lower RDF graph is the conclusion graph.
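The expected entailment of this test case can be reproduced with a single forward-chaining rule. Note that this is not the FOL-based approach evaluated in these slides; it is a minimal rule-style sketch, with a hypothetical function name, that only covers the subclass rule needed here.

```python
def rdfs_subclass_closure(triples):
    """Apply the rule
        (c1 rdfs:subClassOf c2), (w rdf:type c1)  =>  (w rdf:type c2)
    repeatedly until no new triple is produced."""
    graph = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in graph if p == "rdfs:subClassOf"]
        for c1, c2 in subclass_pairs:
            for s, p, o in list(graph):
                inferred = (s, "rdf:type", c2)
                if p == "rdf:type" and o == c1 and inferred not in graph:
                    graph.add(inferred)
                    changed = True
    return graph


premise = {
    ("ex:c1", "rdfs:subClassOf", "ex:c2"),
    ("ex:w", "rdf:type", "ex:c1"),
}
conclusion = {("ex:w", "rdf:type", "ex:c2")}

entailed = conclusion <= rdfs_subclass_closure(premise)
print(entailed)
```

A rule of this shape is what the 'owl2rl' tag refers to; the FOL axiomatization derives the same conclusion from the rdfs:subClassOf semantic condition.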

Experiment 1: Language Coverage: Results

Number of test cases per outcome, out of 411 (shown on the slide as stacked bars from 0% to 100%):

Reasoner                             Success  Wrong  Unknown
Pellet                                   237    168        6
HermiT                                   246    157        8
FaCT++                                   190     45      176
BigOWLIM                                 282    129        0
Jena                                     129    282        0
Parliament                                14    373       24
Vampire, OWL 2 Full axioms               349      0       62
iProver-SInE, OWL 2 Full axioms          383      0       28
iProver-SInE with small axiom sets       396      0       15

Notes:
• All DL reasoners show similar results (although FaCT++ signals many more errors)
• Results of BigOWLIM and DL reasoners are very different (BigOWLIM not "better")
• "small axiom set": a manually selected subset of axioms from the complete OWL 2 Full axiom set that is small but sufficient to succeed on the given test case

Experiment 1: Language Coverage: Runtimes (sorted)

[Figure: for each reasoner, runtimes for all test cases, sorted increasingly (all runtimes are for the complete OWL 2 Full axiom set; small axiom sets are ignored)]

Notes:
• Most problems solved in less than 1s
• Vampire solves slightly fewer problems, but is generally faster
  → suggests strategy to run both reasoners in parallel

EVALUATION RESULTS: 2. CHARACTERISTIC CONCLUSIONS

Experiment 2: Characteristic Conclusions: Overview

• Aim: analyse ability to infer semantic conclusions that are characteristic for OWL 2 Full (beyond OWL 2 DL or OWL 2 RL/RDF)
• Test Data: new "Characteristic Conclusions" test suite
  – 32 test cases (manually created)
  – probes many distinctive features of OWL 2 Full, including:
    • strong logic-based reasoning
    • unrestricted use of complex properties
    • blank nodes as existentially quantified variables
    • metamodeling
    • use of data values as individuals
    • semantic annotation properties
    • reflective use of built-in vocabulary terms
  – Differences to Language Coverage test suite:
    • focus is on "emergent behaviour" of OWL 2 Full rather than on technical specification
    • most test cases depend on interplay of several OWL 2 Full semantic conditions
    • results often technically non-obvious (proof needed)

Experiment 2: Characteristic Conclusions: Example Test Case

Test case for probing metamodeling with Boolean logic reasoning and blank node semantics:

Test Case: 014_Harry_belongs_to_some_Species

Premise Graph:
ex:Eagle rdf:type ex:Species .
ex:Falcon rdf:type ex:Species .
ex:harry rdf:type [ owl:unionOf ( ex:Eagle ex:Falcon ) ] .

Conclusion Graph:
ex:harry rdf:type _:x .
_:x rdf:type ex:Species .

This positive entailment test case applies to OWL 2 Full, but neither to OWL 2 DL (requires reasoning based on metamodeling) nor to OWL 2 RL/RDF (requires strong support for class union and existential blank node semantics).

Experiment 2: Characteristic Conclusions: Results

Results per test case (columns: test cases 01 to 32, left to right; + success, - wrong, ? unknown):

Pellet                     + + + - - - - - + + - - - - + - - - - + + - - - - + - - ? - - -
HermiT                     + ? + - - ? - + + + - - - - + - - - - + + - - + ? + - - ? - - -
FaCT++                     + ? ? ? ? ? ? - ? + - - - ? + ? - - - + + ? ? ? ? + - ? ? - - ?
BigOWLIM                   + - - + - - + + - - + + - - + - - + + - - - - - - - - - - - - -
Jena                       + - - - - + + + - - + - - - - - - + - - - - + - - + - - - - - +
Parliament                 + - - - - - - + - - ? - - - - - - - ? - - - - - - - - - - ? ? -

Vampire / complete         + + + + + + + + + ? + ? ? + + + + + + ? ? ? + + ? + ? ? + + + +
iProver-SInE / complete    + + + + + + + + + + + ? ? + + + + + + ? ? + + + + + + + + + + +
Vampire / small            + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
iProver-SInE / small       + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Notes:
• OWL reasoners weak: at most 10 of 32 test cases solved (about 30%)
• DL vs RDF-rule reasoners: nearly no overlap for successful results
• FOL reasoners: much better on complete axiom set; perfect on small axiom sets

Experiment 2: Characteristic Conclusions: Runtimes (sorted)

Notes:
• FOL reasoners often slow when using complete axiom set
• Generally much faster with small axiom sets (by up to several orders of magnitude)

EVALUATION RESULTS: 3. SCALABILITY

Experiment 3: Scalability: Overview

• Aim: analyse reasoning upon large data sets when most data is not relevant for the reasoning result (simplest scenario for a start)
• Method: using existing reasoning test suite, but with large masses of "bulk" RDF data added to premise graph, where the bulk data is semantically weak and unrelated to the test suite
• Test Data:
  – Reasoning test suite: Characteristic Conclusions test suite
  – Bulk RDF data: 1 Million triples, no RDF(S)/OWL vocabulary terms, no URIs shared with reasoning test cases
• Reasoning Scenarios:
  – auto reasoning mode vs. SInE strategy
  – complete axiom set vs. small axiom sets

Experiment 3: Scalability: Example Bulk RDF Data Set

ex:si1 ex:pi1 ex:oi1 .
ex:si2 ex:pi2 ex:oi2 .
ex:si3 ex:pi3 ex:oi3 .
ex:si4 ex:pi4 ex:oi4 .
ex:si5 ex:pi5 ex:oi5 .
ex:ss ex:ps1 ex:os1 .
ex:ss ex:ps2 ex:os2 .
ex:ss ex:ps3 ex:os3 .
ex:ss ex:ps4 ex:os4 .
ex:ss ex:ps5 ex:os5 .
ex:sp1 ex:pp ex:op1 .
ex:sp2 ex:pp ex:op2 .
ex:sp3 ex:pp ex:op3 .
ex:sp4 ex:pp ex:op4 .
ex:sp5 ex:pp ex:op5 .
ex:ssp ex:psp ex:osp1 .
ex:ssp ex:psp ex:osp2 .
ex:ssp ex:psp ex:osp3 .
ex:ssp ex:psp ex:osp4 .
ex:ssp ex:psp ex:osp5 .

This is an example bulk RDF data set consisting of 20 RDF triples. The data set has no names in common with any of the test cases being used in the evaluation, nor does the bulk data refer to any built-in terms of the OWL and RDF(S) vocabularies. There are no blank nodes, i.e., the bulk data consists entirely of a "ground" RDF graph. The bulk data sets being used in the evaluation have been much larger, still having the same basic format as the example set presented here.
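Data of this shape is easy to generate. The sketch below reproduces the four patterns visible in the example (all terms distinct, shared subject, shared predicate, shared subject and predicate). The function names are hypothetical, and the generator actually used for the evaluation is not shown in the slides.

```python
def bulk_triples(n):
    """Generate 4*n ground triples following the four example patterns."""
    ks = range(1, n + 1)
    triples = []
    # 1. subject, predicate and object all distinct
    triples += [(f"ex:si{k}", f"ex:pi{k}", f"ex:oi{k}") for k in ks]
    # 2. shared subject
    triples += [("ex:ss", f"ex:ps{k}", f"ex:os{k}") for k in ks]
    # 3. shared predicate
    triples += [(f"ex:sp{k}", "ex:pp", f"ex:op{k}") for k in ks]
    # 4. shared subject and predicate
    triples += [("ex:ssp", "ex:psp", f"ex:osp{k}") for k in ks]
    return triples


def to_turtle(triples):
    """Serialize triples one per line in Turtle-style syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


example = to_turtle(bulk_triples(5))
print(example)
```

With n = 5 this yields the 20 triples listed above; under the same scheme, n = 250000 would give one million triples.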

Experiment 3: Scalability: Results

Results per test case (columns: test cases 01 to 32, left to right; + success, - wrong, ? unknown):

Vampire auto / complete    + + + ? ? ? ? ? ? ? ? ? ? ? + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Vampire SInE / complete    + + + + + + ? + ? ? + ? ? ? + + ? + + ? ? ? + ? ? + ? ? ? + ? +
iProver-SInE / complete    + + + + + + + + + + + ? ? + + + + + + ? ? + + + + + + + + + + +
Vampire auto / small       + ? + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Vampire SInE / small       + + + + + + ? + + + + + ? + + + ? + + ? ? + + + + + + ? + + ? +
iProver-SInE / small       + + + + + + + + + + + + ? + + + + + + + + + + + + + + + + + + +

Notes:
• Vampire performs very poorly with the "auto" strategy: times out in most cases
• Improvement by using SInE strategy (iProver and Vampire) on complete axiom set
• Major improvement by combining SInE strategy with removal of irrelevant OWL axioms (small axiom sets)

Experiment 3: Scalability: Runtimes (sorted)

Notes:
• General offset of ca. 20s for parsing large input data (ca. 55MB)
• SInE strategy successful (Vampire mostly fails when using "auto" mode)
• Further improvements by using small axiom sets

EVALUATION RESULTS: 4. MODEL FINDING

Experiment 4: Model-Finding: Overview

• Aim: analyse ability to detect non-entailments and consistent ontologies
• Method: Using FOL model-finders on test suite with consistent ontologies and non-entailments. Also using sub axiom sets of OWL 2 Full axiom set in order to see how well model-finding improves for smaller sublanguages of OWL 2 Full. For sub-axiom sets, some of the OWL 2 Full entailments and inconsistencies in a test suite will become non-entailments and consistent ontologies.
• Axiom Sets:
  – OWL 2 Full
  – ALCO Full: undecidable sublanguage of OWL 2 Full [Motik 05]
  – RDFS-EXT: "extensional RDFS" [RDF Semantics, Sec. 4.2]
• Test Data: Characteristic Conclusions test suite

Experiment 4: Model-Finding: Results (Summary)

• OWL 2 Full (unsuccessful!):
  – No FOL model-finder confirmed satisfiability of axiomatization (timeouts)
  – Fortunately: no theorem prover confirmed unsatisfiability
  – Good: all "small-sufficient" sub-axiomatizations of test cases satisfiable
• ALCO Full:
  – Satisfiability checking for axiomatization successful
  – Checking non-entailment/consistency successful in 2/3 of the test cases
  – Runtimes: median ca. 18s with model-finder Paradox
• RDFS:
  – Checking non-entailment/consistency always successful
  – Runtimes: ca. 1/10s for most experiments with model-finder DarwinFM

Experiment 4: Model-Finding: Results (Concrete)

Results for the probed test cases only, in test-case order (+ success, - wrong, ? unknown). Test cases shown as black cells on the slide were not probed and have no entry here, so positions in a row do not correspond to test case numbers 01 to 32.

Paradox / ALCO Full    + + + + + + ? + + + + + ? ? ? ? + ? ? ? + + ? +
Paradox / RDFS         + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
DarwinFM / RDFS        + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Notes:
• black cells are still entailments or inconsistent ontologies: not probed!
• 1st line: ALCO Full axiom set, using Paradox model-finder. Result: 15 successful detections, 9 time-outs
• 2nd/3rd line: RDFS-EXT axiom set, using Paradox/DarwinFM model-finders. Result: always successful

CONCLUSIONS

Summary

• Using ATP-based OWL 2 Full reasoning works in principle:
  – Language Coverage: basically complete (skipped datatypes)
    • for a few test cases, it was necessary to select a small axiom set from the complete OWL 2 Full axiomatization sufficient to prove the result
  – Characteristic OWL 2 Full Conclusions: all, if using small axiom sets
  – Performance: often quick (< 1/10s), if using small axiom sets
  – Scalability: works for semantically weak and unrelated "bulk" data
  – Model-Finding: works for certain fragments of OWL 2 Full
• Identified Problems (motivates future work):
  – slow or even dysfunctional on complete axiomatization (> 500 axioms)
  – no successful model-finding for complete OWL 2 Full axiomatization

Future Work

• develop automated method for selecting small axiom sets
• conduct more realistic scalability experiments
• investigate query answering with FOL ATPs
• add support for datatype reasoning
• try to manually find a model for the OWL 2 Full axiomatization
• implement a prototypical OWL 2 Full reasoner

Links

• Conference Paper:
  http://dx.doi.org/10.1007/978-3-642-22438-6_35
• Extended Version of Paper (detailed results, "Characteristic Conclusions" test suite):
  http://arxiv.org/abs/1108.0155
• Supplementary Material (all axiom sets, test data, raw results, software):
  http://www.fzi.de/downloads/ipe/schneid/cade2011-schneidsut-owlfullatp.zip
