

ICLP-09

Enabling serendipitous search on the Web of Data using Prolog

Jan Wielemaker, VU University Amsterdam

Issues addressed

• Recent developments reshaped the Web
  - The web moved from a “Web of documents” to a “Web of data” and a “Web of applications”
  - “Open” and “Linked” data make massive amounts of data available to be processed by machines
• How can we deploy Prolog in this environment?

Overview

• Introducing the semantic search engine “ClioPatria” and the problem it addresses
• Why (not) use Prolog for semantic web applications?
• Processing RDF data
• Applying Prolog in web servers
• Creating interactive web applications
• Wrap-up

PART I

The ClioPatria use-case:

Integrate the digital collections of multiple museums and connect them to background knowledge

Collection and Meta-data

Schema

Vocabularies

Background knowledge

The Web: documents and links

[Figure: two URLs connected by a web link (an untyped hyperlink)]

The Semantic, or Data Web: data and links

[Figure: two URLs connected by a web link. The painting “Green Stripe (Mme Matisse)” (Royal Museum of Fine Arts, Copenhagen) is linked through the Dublin Core relation “creator” to the painter “Henri Matisse” (Getty ULAN)]

… nice graph, but ...

What about semantics? What about structure?

Semantic Web data model: RDF

• 1 fact = R(O1, O2) = <O1, R, O2> = 1 “triple”
• Many facts = labelled graph = RDF
• URIs as identifiers, typed relations between typed objects
• Has many different syntaxes (XML (W3C), N3, Turtle, graphical, etc.). This doesn’t matter: it’s a data model

Slide by Frank van Harmelen
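The "1 fact = 1 triple" idea maps directly onto Prolog facts. The following is a minimal sketch that uses plain facts rather than an RDF library; the predicate name triple/3 and all resource names are hypothetical and chosen for illustration only.

```prolog
% Hypothetical triples stored as plain Prolog facts:
% triple(Subject, Predicate, Object).
triple(green_stripe,  type,    painting).
triple(green_stripe,  creator, henri_matisse).
triple(green_stripe,  title,   literal('Green Stripe (Mme Matisse)')).
triple(henri_matisse, type,    painter).
triple(henri_matisse, label,   literal('Henri Matisse')).

% Follow two edges of the labelled graph:
% which titles belong to works created by the painter with this label?
painting_by(PainterLabel, Title) :-
    triple(Painter, label, literal(PainterLabel)),
    triple(Work, creator, Painter),
    triple(Work, title, literal(Title)).
```

A query such as `?- painting_by('Henri Matisse', T).` walks the graph from the label to the painter and on to the work.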

Semantic Web data model: RDF Schema

• Hierarchy of types, hierarchy of relations, domain/range constraints
• Simple: no negation, disjunction or universal quantification

Slide by Frank van Harmelen

Semantic Web data model: OWL and SWRL

• Everything you wanted to say but cannot say in RDF(S)
• Negation, disjunction, cardinality, limited universal quantification, relational algebra (transitive, symmetric)
• Still no composition of relations (DL-based)
• SWRL: rules with DL concepts as atoms

[Figure: the OWL sublanguages Full, DL and Lite]

Slide by Frank van Harmelen

From meta-data to semantic meta-data

Structure for thesauri
Structure for works of art

• Thesaurus: schema mapping (SKOS)
• Meta-data: schema mapping (VRA)
• Thesaurus alignment
• Meta-data mapping

5 collections → 11,000,000 triples

Part of a large cloud of linked data!

The challenge

• How to make use of this network for search?
• Can we search better?
• Can we present better?

ClioPatria

• A Prolog web server with an RDF store
• Developed to explore this challenge
• Explores the graph using best-first search based on semantic distance
• Clusters results based on their relation to the query
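The slides do not show how ClioPatria computes semantic distance, so the following is only a sketch of the general best-first idea under stated assumptions: edge/3 facts with made-up costs stand in for weighted RDF links, and the agenda is kept as a list sorted on accumulated cost. All names (edge/3, best_first/3) are hypothetical.

```prolog
:- use_module(library(lists)).

% edge(From, To, Cost): hypothetical weighted links between resources.
edge(matisse,      green_stripe,      1).
edge(matisse,      fauvism,           2).
edge(fauvism,      derain_painting,   1).
edge(green_stripe, copenhagen_museum, 3).

% best_first(+Start, +MaxCost, -Results)
% Results is a list of Cost-Node pairs, cheapest first, containing
% every node reachable from Start within MaxCost.
best_first(Start, MaxCost, Results) :-
    expand([0-Start], [], MaxCost, Results).

expand([], _, _, []).
expand([Cost-Node|Agenda], Visited, Max, Results) :-
    (   ( Cost > Max ; memberchk(Node, Visited) )
    ->  expand(Agenda, Visited, Max, Results)      % prune or skip
    ;   Results = [Cost-Node|Rest],
        findall(C1-Next,
                ( edge(Node, Next, C), C1 is Cost + C ),
                New),
        append(Agenda, New, Agenda1),
        keysort(Agenda1, Agenda2),                 % cheapest node first
        expand(Agenda2, [Node|Visited], Max, Rest)
    ).
```

Because results carry the path cost, a later step could group them by the relation that led to them, which is the clustering idea named on the slide.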

ClioPatria: “Matisse”

[Screenshot: results for “Matisse”, clustered by their relation to the query]
• “Matisse” in the title
• Located in “Musee Matisse”
• Created by “Matisse”
• Paintings in the same style as used by “Matisse”

Serendipitous?

Serendipity is the effect by which one accidentally discovers something fortunate, especially while looking for something else entirely unrelated (Wikipedia).

• The search is not based on any schema
• It can find results through unexpected paths
• It often finds many unintended results (i.e., it answers multiple “graph” queries)
• This remains manageable due to clustering
  → “Post-query disambiguation”

Serendipitous … “Picasso”

Things made from “Picasso marble”

ClioPatria fact-sheet

Prolog          246 files, 67,500 lines
Developers      3 core, about 10 occasional
Triples loaded  Used with up to 22,000,000; scales to 300,000,000 in 64 GB of memory
Usage           Known to be in use in 6 projects

http://e-culture.multimedian.nl/software/ClioPatria.shtml

Part II

Using Prolog for the Semantic Web

The neaties vs. the scruffies

Neaties: (DL-)logic background
• In search of expressive logics and of correct and efficient resolution techniques
• LP: F-Logic, ASP, ALP, FO(.), … (Marc Denecker)

Scruffies: webby background
• In search of doing something useful with huge amounts of shallow and inconsistent facts
• Simple logics; techniques need be neither sound nor complete

Why NOT Prolog?

The core concepts in the Web community are:
• Networking
• Concurrency
• Web-page generation
• Internationalization
• ...

These are typically not associated with Prolog

Why Prolog?

• RDF fits nicely with the relational model of Prolog
  - With a little work it does everything SPARQL can … but it is much more flexible
• Most languages in the SW community can be translated into Horn clauses:
  - OWL (large subset)
  - Rule languages: SWRL, RIF ...

The Semantic Web seen from Prolog

Pure predicate rdf/3:

    rdf(?Subject, ?Predicate, ?Object) is nondet.

• URI → Atom
• Literal →
    literal(Atom)
    literal(lang(Code, Atom))
    literal(type(URI, Atom))
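As an illustration of this representation, here is a small sketch using SWI-Prolog's library(semweb/rdf_db). The `ex` prefix, the resources under it and the helper names load_demo/0 and label_in/3 are assumptions made for this example; `rdfs` and `dc` are prefixes that the library predefines.

```prolog
:- use_module(library(semweb/rdf_db)).

% Hypothetical example prefix.
:- rdf_register_prefix(ex, 'http://example.org/').

% Assert a plain literal, a language-tagged literal and a resource link.
load_demo :-
    rdf_assert(ex:matisse, rdfs:label, literal('Matisse, Henri')),
    rdf_assert(ex:matisse, rdfs:label, literal(lang(fr, 'Henri Matisse'))),
    rdf_assert(ex:green_stripe, dc:creator, ex:matisse).

% label_in(?Resource, +Lang, ?Text): labels tagged with language Lang.
label_in(Resource, Lang, Text) :-
    rdf(Resource, rdfs:label, literal(lang(Lang, Text))).
```

After calling `load_demo`, the goal `label_in(R, fr, Text)` binds `R` to the full URI (an atom) and `Text` to the French label.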

URI: XML Namespaces

Namespaces are expanded at compile time by means of rules for goal_expansion/2, so

    rdf(S, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
           'http://e-culture.multimedian.nl/ns/getty/ulan#Person').

can be written as

    rdf(S, rdf:type, ulan:'Person').

Toplevel and debugger results are made readable again using portray/1
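The mapping between the short and the full form can also be inspected at run time. This sketch assumes a current SWI-Prolog, where rdf_register_prefix/2 and rdf_global_id/2 are available from library(semweb/rdf_db); the helper name person_class/1 is hypothetical.

```prolog
:- use_module(library(semweb/rdf_db)).

% Register the ULAN prefix used on the slide (rdf: is predefined).
:- rdf_register_prefix(ulan, 'http://e-culture.multimedian.nl/ns/getty/ulan#').

% person_class(-URI): the full URI behind ulan:'Person'.
person_class(URI) :-
    rdf_global_id(ulan:'Person', URI).
```

Calling `person_class(URI)` yields the long atom that the compile-time expansion would otherwise insert for `ulan:'Person'`.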

A simple example

    ?- module(rdfs_entailment).

    rdfs_entailment:
    ?- rdf(X, rdf:type, ulan:'Person'),
       rdf(X, rdfs:label, literal('Matisse, Henri')),
       rdf(Work, dc:creator, X).

Optimising

    ?- In = ( rdf(X, rdf:type, ulan:'Person'),
              rdf(X, rdfs:label, literal('Matisse, Henri')),
              rdf(Work, dc:creator, X)
            ),
       rdf_optimise(In, Goal).

    Goal = ( rdf(X, rdfs:label, literal('Matisse, Henri')),
             rdfql_carthesian([ bag([], rdf(X, rdf:type, ulan:'Person')),
                                bag([Work], rdf(Work, dc:creator, X))
                              ])
           ).

Advantages of Prolog over SPARQL

• Flexibility and reuse:
  - We can mix with arbitrary Prolog code
  - We can name and combine queries
  - We can do recursion
• This is similar to SQL vs. Prolog, but processing RDF involves pattern-matching, rules and recursion, while datatypes are less important.
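Naming, combining and recursion can be sketched as follows. The data, the `ex` prefix and the predicate names (demo_data/0, work_by/2, broader_than/2) are invented for this example; `rdfs`, `dc` and `skos` are prefixes predefined by library(semweb/rdf_db). The recursive rule assumes the skos:broader links are acyclic.

```prolog
:- use_module(library(semweb/rdf_db)).

:- rdf_register_prefix(ex, 'http://example.org/').   % hypothetical prefix

demo_data :-
    rdf_assert(ex:matisse, rdfs:label, literal('Matisse, Henri')),
    rdf_assert(ex:green_stripe, dc:creator, ex:matisse),
    rdf_assert(ex:fauvism, skos:broader, ex:modern_art),
    rdf_assert(ex:modern_art, skos:broader, ex:art).

% A named query: a graph pattern wrapped in a reusable predicate.
work_by(Label, Work) :-
    rdf(Artist, rdfs:label, literal(Label)),
    rdf(Work, dc:creator, Artist).

% Recursion: the transitive closure of skos:broader.
broader_than(X, Y) :-
    rdf(X, skos:broader, Y).
broader_than(X, Z) :-
    rdf(X, skos:broader, Y),
    broader_than(Y, Z).
```

Both predicates can be combined freely with other Prolog goals, for example `work_by(L, W), broader_than(Style, ex:art)` in one conjunction.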

Prolog ↔ SPARQL

[Bar chart: Prolog vs. SPARQL for three categories of queries; vertical axis from 0 to 1.2]
• Fits SPARQL: one SPARQL query to get the result
• Partially fits SPARQL: multiple SPARQL queries
• Does not fit SPARQL: fetch triple-by-triple and process in the client

Reasoning

• Reasoning is connected to a language (RDFS, OWL, SWRL, …)
• Reasoning derives facts from the triple store that are not explicitly provided in the dataset.

Options for Reasoning (I)

• Reasoning adds (virtual) triples (entailment): the only API is rdf(S,P,O)
• Forward reasoning
  - Easy to implement
  - Difficult to handle database updates
  - Can explode using richer languages (e.g., OWL)
• Backward reasoning
  - Non-termination under SLD resolution
  - Need for optimization of conjunctions
  - Easy to provide alternative reasoners

Alternative entailment reasoners as Prolog modules

[Figure: modules Core RDF-DB, RDFS, OWL-Horst, ..., each providing rdf/3]

Options for Reasoning (II)

• Based on abstract syntax: dedicated high-level API
• Forward reasoning
  - Transformation (Thea OWL(-2) library)
• Backward reasoning

Thea: http://www.semanticweb.gr/TheaOWLLib/
By Vangelis Vassiliadis and Chris Mungall

Reasoning with Abstract Syntax API

[Figure: on top of the core RDF-DB (rdf/3):
 • Forward (transformation): Thea (OWL-2), subClassOf/2, ...
 • Backward (Prolog rules): RDFS, rdfs_individual_of/2, rdfs_subclass_of/2, ...]

Options for reasoning (summary)

• Entailment-based
  - Uniform query API → application can switch entailment
  - Query API is low-level
  - (Using forward reasoning) the entailed graph is added to the database → difficult to deal with multiple languages
• Abstract-syntax based
  - Each language has its own query API
  - Query API is high-level
  - Easy to deal with multiple languages

A closer look at the RDF store: requirements

• Efficient in any instantiation pattern (full indexing)
• Deal with the property hierarchy
• Deal with owl:sameAs
• Literal indexing (prefix, full-text, ...)
• Scalable to 10-100 M triples

Options for rdf/3 (I: Using Prolog)

• Prolog dynamic database
  - We need multiple indexes (e.g., YAP)
  - Cannot exploit domain-specific aspects:
      - Property-hierarchy matching
      - Facts are ground, unordered and support limited types
  - Hard to provide statistics for the optimizer because they are also domain-specific

Options for rdf/3 (II: Using an external store)

• External store
  - Slow connection (need to intern/extern URI-as-atom)
  - We do not want (most of) the reasoning

Options for rdf/3 (III: Dedicated C)

• Using a dedicated C library
  - Can optimize for space based on limited datatypes
  - Use atom handles in the database (no intern/extern)
  - Sort literals in an AVL tree (prefix search)
  - Keep counts (for query optimization)
  - Fast binary load/save format

RDF Processing (summary)

• Expressing graph patterns mixed with auxiliary Prolog is easy
• This is enough for a large part of RDF processing in semantic web applications
• Reasoning
  - Forward closure (easy, big, no changes)
  - Backward: termination issues (tabling can help)
  - Extending rdf/3 ↔ using an abstract language
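To illustrate the termination issue and how tabling can help, here is a minimal sketch. It assumes a Prolog with tabling support (for example a recent SWI-Prolog; the talk lists tabling as future work, so this is not what ClioPatria did at the time). The facts sub_class/2 and the rule subclass_of/2 are hypothetical and deliberately include a cycle and left recursion, both of which loop under plain SLD resolution.

```prolog
:- table subclass_of/2.

% Hypothetical direct subclass links, including a cycle.
sub_class(painting, artwork).
sub_class(artwork,  object).
sub_class(object,   artwork).      % cycle: artwork <-> object

% Transitive closure, written left-recursively on purpose.
% Without the table directive this does not terminate.
subclass_of(X, Y) :-
    sub_class(X, Y).
subclass_of(X, Z) :-
    subclass_of(X, Y),
    sub_class(Y, Z).
```

With the table directive in place, `findall(C, subclass_of(painting, C), Cs)` terminates and returns each superclass once, in unspecified order.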


Part III

Web-Applications

Web-Application Reference Architecture (Three Tier Model)

[Figure: three tiers (Database, Application Logic, Presentation generation) serving a Web Browser. Web 2.0 adds JavaScript in the browser; Web 3.0 (Semantic Web) adds RDF sources and Linked Data]

Protocols and Standards

[Figure: the RDF Database and the Application Logic are connected through HTTP and SPARQL. Open question: how does Prolog connect to HTTP?]

Prolog-to-HTTP

[Figure: Web-Server (Tomcat, .NET, ...) ↔ Interface (JPL, InterProlog, PrologBeans, ...) ↔ Application (Prolog), with HTTP on the web-server side]

• Need to program in Tomcat/.NET/... and Prolog
• Difficult deployment
• JPL: one process (JNI/C interface): fast, but hard to debug
• InterProlog/PrologBeans/...: (proprietary) network connection

Prolog-to-HTTP

[Figure: Web-Server, Interface and Application all in Prolog, using a Prolog HTTP library]

• Easy debugging
• Easily extend the HTTP interface
• Not “industry standard”
• But … many languages provide an HTTP server library

Deployment

Using Apache as reverse proxy and load balancer:

    <VirtualHost *>
        ServerName www.swi-prolog.org
        ProxyPass / http://localhost:3040/
    </VirtualHost>

[Figure: Apache listens on port 80 and forwards to Prolog on port 3040; the Prolog console is reachable through VNC]


VNC server console

/api/search?q=picasso&count=100

    :- use_module(library(http/http_dispatch)).
    :- use_module(library(http/http_parameters)).
    :- use_module(library(http/http_json)).

    % Route requests for /api/search to search/1.
    :- http_handler('/api/search', search, []).

    % Decode the query parameters, run the search and reply as JSON.
    search(Request) :-
        http_parameters(Request,
                        [ q(Q, []),
                          start(S, [default(0)]),
                          count(C, [default(25)])
                        ]),
        search(Q, S, C, Results),
        reply_json(Results).
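The slide leaves out search/4 and the code that starts the server. The following self-contained variant fills those gaps under clearly invented assumptions: title/1 is a toy data set, search/4 is a substring match, slice/4 implements the start/count window, and server/1 is a hypothetical entry point. The `integer` option is added here so that start and count arrive as numbers.

```prolog
:- use_module(library(lists)).
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_parameters)).
:- use_module(library(http/http_json)).

:- http_handler('/api/search', search, []).

% server(+Port): start the HTTP server, e.g. server(3040).
server(Port) :-
    http_server(http_dispatch, [port(Port)]).

search(Request) :-
    http_parameters(Request,
                    [ q(Q, []),
                      start(S, [integer, default(0)]),
                      count(C, [integer, default(25)])
                    ]),
    search(Q, S, C, Results),
    reply_json(json([query=Q, results=Results])).

% Toy data set (hypothetical).
title('picasso marble vase').
title('portrait by picasso').
title('green stripe').

% search(+Query, +Start, +Count, -Results): substring match on titles.
search(Q, Start, Count, Results) :-
    findall(T, ( title(T), once(sub_atom(T, _, _, _, Q)) ), All),
    slice(All, Start, Count, Results).

% slice(+List, +Start, +Count, -Slice): at most Count items from Start.
slice(List, Start, Count, Slice) :-
    length(Prefix, Start),
    (   append(Prefix, Rest, List) -> true ; Rest = [] ),
    length(Rest, Len),
    N is min(Count, Len),
    length(Slice, N),
    append(Slice, _, Rest).
```

After `?- server(3040).`, a request for /api/search?q=picasso&count=100 would be dispatched to search/1.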

Summary HTTP support

Writing the HTTP server in Prolog gives us:
• Good single-language development environment
• Incremental compilation: live-updating the server
• Deployment can be direct or through a proxy

Not so big: 12,000 lines for
• Core HTTP client and server
• HTML and JSON read/write
• Parameters, sessions, authorization, logging

Part IV

Creating Interactive Web Applications using Prolog

Web of Documents (original drawing by Tim Berners-Lee)

Interactive Web-Applications

• Server needs to keep track of the client (sessions)
• Client needs light-weight updates of the interface
• … but HTTP is stateless …

Introducing State

• Negotiate a session key between client and server
• Server associates state with this key
• Client modifies the interface using JavaScript
  → AJAX
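On the server side, associating state with a session key could look like the sketch below. It assumes SWI-Prolog's library(http/http_session), which negotiates a session cookie once it is loaded; the /count location and the handler count/1 are hypothetical names for this example.

```prolog
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_session)).

:- http_handler('/count', count, []).

% count(+Request): per-session visit counter.
% The counter is stored as a fact attached to the current session.
count(_Request) :-
    (   http_session_retract(visits(N0))
    ->  true
    ;   N0 = 0                       % first request in this session
    ),
    N is N0 + 1,
    http_session_assert(visits(N)),
    format('Content-type: text/plain~n~n'),
    format('Visit ~d in this session~n', [N]).
```

Each browser gets its own counter, because the stored fact is looked up through the session key carried by the cookie.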


What is AJAX not?

Case

• Create a web interface for the N-queens problem
• Interaction
  - Select size of board
  - Select implementation (Prolog ↔ clp(FD))
  - Get first solution
  - Get next solution or stop
• State in backtrackable Prolog program
• By Torbjörn Lager, Markus Triska, Jan Wielemaker

Step I: create initial page

[Figure: the web application server (HTTP) sends the initial HTML page plus JavaScript to the browser, which builds the initial DOM]

Step II: Add local interaction

[Figure: as in Step I; in addition, JavaScript in the browser handles local interaction with the DOM]

Options ...

    <input type="button" id='opts' name="options" value="Options …"
           onClick="showOptions(true)">

    function showOptions(show) {
      document.getElementById("options").style.display = show ? "block" : "none";
    }

OK: applyOptions()

    function applyOptions() {
      var size = parseInt(document.getElementById("size").value, 10);

      // Set client state in global variables
      if ( document.getElementById("queens").checked == true ) {
        algorithm = "queens";
      } else {
        algorithm = "clpfd_queens";
      }

      // isNaN() catches non-numeric input, which would otherwise pass the range test
      if ( isNaN(size) || size < 2 || size > 40 ) {
        alert("Size must be in the range 2..40");
      } else {
        boardsize = size;
        showOptions(false);
        // Update the interface by changing the DOM
        document.getElementById("N").innerHTML = size;
        document.getElementById("who").innerHTML =
            (algorithm == "queens" ? "Prolog" : "clp(FD)");
        document.getElementById("board").innerHTML = board(boardsize, boardwidth);
      }
    }

→ NO server interaction

Step-III: Add server interaction

[Figure: as in Step II; in addition, JavaScript in the browser interacts with the web application server (HTTP)]

First ...

    <input type="button" id='first' name="first" value="First" onClick="first()">

    function first() {
      working();

      // Server request
      YAHOO.util.Connect.asyncRequest(
          'GET',
          "/prolog/first?goal=" + algorithm + "(" + boardsize + ",L)",
          { success: update });      // what to do when the server responds
    }

Client code-fragment: handle response

    function update(o) {
      // Process as JSON
      var solution = YAHOO.lang.JSON.parse(o.responseText);

      // Update DOM based on JSON reply
      if ( solution.solution ) {
        setButtons(solution.next == true);
        clearBoard();
        setQueens(solution.solution.args[1].value);
        document.getElementById("msg").innerHTML =
            "CPU: " + solution.time.toPrecision(2) + " sec.";
      } else if ( solution.error ) {
        setButtons(false);
        document.getElementById("msg").innerHTML =
            "<span class='error'>" + solution.error + "</span>";
      } else {
        setButtons(false);
        document.getElementById("msg").innerHTML = "There are no more solutions.";
      }
    }

setQueens()

    // Replace DOM fragment
    function setQueens(squareList) {
      for (var i = 1; i <= boardsize; i++) {
        var id = i + "-" + (squareList[i-1].value);
        document.getElementById(id).innerHTML =
            "<img src='/queen' class='square-img'/>";
      }
    }

Backtracking state in the server

[Figure: an HTTP worker thread receives GET /prolog/next with session-id=1 and passes it to the thread of that session (one thread per session: session-1 … session-N). The session thread holds the state, backtracks, and returns a Prolog term, which is sent back as a JSON document]

Backtracking state

    solve(Goal, Bindings, ThreadID) :-
        thread_self(Me),
        thread_statistics(Me, cputime, T0a),
        State = client(ThreadID, T0a),
        solve_2(Goal, Bindings, Solution),          % (guarded) actual goal
        State = client(Client, T0),
        thread_statistics(Me, cputime, T1),
        Time is T1 - T0,
        solution_time(Solution, Time),
        nb_setarg(2, State, T1),
        debug(prolog_server, 'Sending: ~q', [Solution]),
        thread_send_message(Client, Solution),      % send reply
        solution_type(Solution, Type),
        (   Type == last
        ->  true
        ;   Type == true
        ->  catch(thread_get_message(command(From, Command)),   % wait for user
                  _, Command = stop),
            debug(prolog_server, 'Command: ~q', [Command]),
            nb_setarg(1, State, From),
            Command == stop
        ;   true
        ).
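The core of this pattern (send a solution, wait for a command, fail to backtrack into the goal) can be shown in a much smaller form. This is a simplified sketch, not the server code above: it omits timing, guarding and HTTP, and the names solver/3, demo/1 and collect/2 as well as the messages solution/1, next, stop and no_more are invented for the example.

```prolog
:- use_module(library(lists)).

% solver(:Goal, ?Template, +Client)
% Runs Goal in its own thread. After each solution it sends
% solution(Template) to Client and waits for a command.
% Any command other than `stop` makes the condition fail,
% which backtracks into Goal for the next solution.
solver(Goal, Template, Client) :-
    (   call(Goal),
        thread_send_message(Client, solution(Template)),
        thread_get_message(Command),
        Command == stop
    ->  true
    ;   thread_send_message(Client, no_more)
    ).

% demo(-Solutions): drive the solver from the calling thread.
demo(Solutions) :-
    thread_self(Me),
    thread_create(solver(member(X, [a,b,c]), X, Me), Id, []),
    collect(Id, Solutions),
    thread_join(Id, _).

collect(Id, Solutions) :-
    thread_get_message(Msg),
    (   Msg = solution(S)
    ->  Solutions = [S|Rest],
        thread_send_message(Id, next),    % ask for the next solution
        collect(Id, Rest)
    ;   Solutions = []                    % received no_more
    ).
```

The solver thread keeps its choice points between messages, which is how the state of the search survives between two HTTP requests in the design shown on the slide.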

AJAX has many architectures

From http://www.openajax.org/member/wiki/Whitepaper_20060730

Where does the JavaScript come from?

• Widget library: AjaxAnywhere, MochiKit, YUI, ...
• User code
  - Instantiation
  - Set attributes
  - Refine methods

Options for generating application JavaScript

• Write a JavaScript file and link it from the HTML page
  - Code is in two places → good split if the API is stable
  - Poor for prototyping and often-changing APIs
• Write JavaScript in Prolog strings and include it in the page
  - Messy syntax (Python """long string""")

Generate from Prolog terms?

• Works well for HTML (e.g., html_write, PiLLoW)
• But JavaScript customization often places code fragments in object properties
• No simple interface such as, e.g., XPCE: create / set property / call method
• A full mapping of JS code to Prolog syntax is probably not transparent enough for users
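For the HTML side, generating a page from Prolog terms can be sketched with SWI-Prolog's library(http/html_write). The page content and the names board_page//1 and print_page/1 are made up for this example; the JavaScript call is still embedded as a plain atom, which illustrates the limitation discussed above.

```prolog
:- use_module(library(http/html_write)).

% board_page(+N)//: HTML fragment described as a Prolog term.
board_page(N) -->
    html([ h1(['Queens on a ', N, ' x ', N, ' board']),
           button([ type(button), id(first), onclick('first()') ],
                  'First')
         ]).

% print_page(+N): emit the fragment on the current output.
print_page(N) :-
    phrase(board_page(N), Tokens),
    print_html(Tokens).
```

Calling `print_page(8)` writes the heading and the button as HTML text; attribute values and text content are quoted by the library.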

Wrap-Up

• The “Web of data” is out there
  - Prolog is an excellent tool for processing RDF
• The interactive “Web 2.0” is out there
  - Web 2.0 is (relatively) language independent
  - Prolog is a suitable server component for Web 2.0

Future Directions

• Enhance RDF support:
  - Improve scalability
  - Higher-level reasoning: provide tabling, generalise optimizers
• Enhance web-programming support
  - Explore cleaner integration with AJAX
• Merge into the Prolog-Commons initiative

Links

• http://www.swi-prolog.org
• http://e-culture.multimedian.nl/software/ClioPatria.shtml
• http://www.swi-prolog.org/Publications.html

• Part of the Dutch knowledge-economy project MultimediaN

• Partners: VU, CWI, UvA, DEN, ICN

• People: Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jacco van Ossenbruggen, Guus Schreiber, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga

• Artchive.com, RKD, Rijksmuseum Amsterdam, Dutch ethnology museums (Amsterdam, Leiden), National Library (Bibliopolis)

http://e-culture.multimedian.nl
