ICLP-09
Enabling serendipitous search on the Web of Data using Prolog
Jan Wielemaker, VU University Amsterdam
Issues addressed
• Recent developments have reshaped the Web
  - The Web moved from a “Web of documents” to a “Web of data” and a “Web of applications”
  - “Open” and “Linked” data make massive amounts of data available for processing by machines
• How can we deploy Prolog in this environment?
Overview
• Introducing the semantic search engine “ClioPatria” and the problem it addresses
• Why (not) use Prolog for semantic web applications?
• Processing RDF data
• Applying Prolog in web servers
• Creating interactive web applications
• Wrap-up
Part I
The ClioPatria use-case:
Integrate the digital collections of multiple museums and connect them to background knowledge
Collection and meta-data
[Diagram labels: Schema; Vocabularies]

Background knowledge

The Web: documents and links
[Diagram: URL → URL, connected by a Web link (untyped hyperlink)]
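The Semantic, or Data Web: data and links
[Diagram: the two URLs joined by a Web link are now typed resources. The painting “Green Stripe (Mme Matisse)” (Royal Museum of Fine Arts, Copenhagen) is linked through the Dublin Core relation “creator” to the painter “Henri Matisse” (Getty ULAN)]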
… nice graph, but ...
What about semantics? What about structure?
Semantic Web data model: RDF
• 1 fact = R(O1, O2) = <O1, R, O2> = 1 “triple”
• Many facts = labelled graph = RDF
• URIs as identifiers, typed relations between typed objects
• Has many different syntaxes (XML (W3C), N3, Turtle, graphical, etc.). That doesn't matter: it's a data model
Slide by Frank van Harmelen
Semantic Web data model: RDF Schema
• Hierarchy of types, hierarchy of relations, domain/range constraints
• Simple: no negation, disjunction or universal quantification
Slide by Frank van Harmelen
Semantic Web data model: OWL and SWRL
• OWL: everything you wanted to say but cannot say in RDF(S)
  - Negation, disjunction, cardinality, limited universal quantification, relational algebra (transitive, symmetric)
  - Still no composition of relations (DL-based)
• SWRL: rules with DL concepts as atoms
[Diagram: the OWL species Full, DL and Lite]
Slide by Frank van Harmelen
From meta-data to semantic meta-data
[Diagram labels: Structure for thesauri; Structure for works of art; Thesaurus schema mapping (SKOS); Meta-data schema mapping (VRA); Thesaurus alignment; Meta-data mapping]
5 collections → 11,000,000 triples
Part of a large cloud of linked data!
The challenge
• How to make use of this network for search?
• Can we search better?
• Can we present better?
ClioPatria
• A Prolog web server with an RDF store
• Developed to explore this challenge
  - Explore the graph using best-first search based on semantic distance
  - Cluster results based on their relation to the query
ClioPatria: “Matisse”
[Screenshot: results for “Matisse” clustered by their relation to the query: “Matisse” in the title; located in “Musee Matisse”; created by “Matisse”; paintings in the same style as used by “Matisse”]
Serendipitous?
Serendipity is the effect by which one accidentally discovers something fortunate, especially while looking for something else entirely unrelated (Wikipedia).
• The search is not based on any schema
• It can find results through unexpected paths
• It often finds many unintended results (i.e., it answers multiple “graph” queries)
• This remains manageable due to clustering
→ “Post-query disambiguation”
Serendipitous … “Picasso”
[Screenshot: a result cluster of things made from “Picasso marble”]
ClioPatria fact-sheet
Prolog           246 files, 67,500 lines
Developers       3 core, about 10 occasional
Triples loaded   Used with up to 22,000,000; scales to 300,000,000 in 64 GB of memory
Usage            Known to be in use in 6 projects
http://e-culture.multimedian.nl/software/ClioPatria.shtml
Part II
Using Prolog for the Semantic Web
The neats vs. the scruffies
• The neats: (DL-)logic background
  - In search of expressive logics and of correct and efficient resolution techniques
  - LP: F-Logic, ASP, ALP, FO(.), … (Marc Denecker)
• The scruffies: Webby background
  - In search of something useful to do with huge amounts of shallow and inconsistent facts
  - Simple logics; techniques need be neither sound nor complete
Why NOT Prolog?
• The core concepts in the Web community are:
  - Networking
  - Concurrency
  - Web-page generation
  - Internationalization
  - ...
• These are typically not associated with Prolog
Why Prolog?
• RDF fits nicely with the relational model of Prolog
  - With a little work it does everything SPARQL can
  - … but it is much more flexible
• Most languages in the SW community can be translated into Horn clauses:
  - OWL (large subset)
  - Rule languages: SWRL, RIF, ...
The Semantic Web seen from Prolog
• Pure predicate rdf/3:
    rdf(?Subject, ?Predicate, ?Object) is nondet.
• URI → Atom
• Literal →
    literal(Atom)
    literal(lang(Code, Atom))
    literal(type(URI, Atom))
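The three literal forms can be tried out without the RDF library. The sketch below uses a plain Prolog predicate triple/3 as a stand-in for rdf/3; the predicate names triple/3 and literal_text/2 and the shortened identifiers (matisse, label, xsd_gyear, ...) are made up for the illustration and are not part of ClioPatria.

```prolog
% Toy stand-in for the triple store. In the real system the data lives
% in the RDF store and is queried through rdf/3; the identifiers here
% are shortened, invented atoms rather than full URIs.
triple(matisse,      label,   literal('Matisse, Henri')).            % plain literal
triple(matisse,      label,   literal(lang(fr, 'Henri Matisse'))).   % language-tagged
triple(matisse,      born,    literal(type(xsd_gyear, '1869'))).     % typed literal
triple(green_stripe, creator, matisse).                              % resource as object

% literal_text(+Literal, -Text): the text of a literal in any of the
% three representations.
literal_text(literal(lang(_, Text)), Text) :- !.
literal_text(literal(type(_, Text)), Text) :- !.
literal_text(literal(Text), Text).
```

A query such as ?- triple(matisse, label, L), literal_text(L, Text). then enumerates the label texts on backtracking, regardless of which literal form was stored.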
URI: XML Namespaces
• Namespaces are expanded at compile-time by means of rules for goal_expansion/2, so
    rdf(S, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://e-culture.multimedian.nl/ns/getty/ulan#Person').
  can be written as
    rdf(S, rdf:type, ulan:'Person').
• Toplevel and debugger results are made readable again using portray/1
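A small sketch of how such a prefix can be declared, assuming a current SWI-Prolog in which library(semweb/rdf_db) provides rdf_register_prefix/2 (older releases called it rdf_register_ns/2). The helper full_uri/2 is a hypothetical name used only here; the ulan URI is the one shown on the slide.

```prolog
:- use_module(library(semweb/rdf_db)).

% The rdf prefix is predefined by the library; ulan is not, so it is
% registered here with the namespace URI from the slide.
:- rdf_register_prefix(ulan, 'http://e-culture.multimedian.nl/ns/getty/ulan#').

% full_uri(+Prefixed, -URI): hypothetical helper that expands a
% Prefix:Local term into the full URI atom at runtime.
full_uri(Prefixed, URI) :-
    rdf_global_id(Prefixed, URI).
```

For example, ?- full_uri(ulan:'Person', URI). should bind URI to the long atom used in the first rdf/3 call above.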
A simple example
?- module(rdfs_entailment).

rdfs_entailment: ?- rdf(X, rdf:type, ulan:'Person'),
                    rdf(X, rdfs:label, literal('Matisse, Henri')),
                    rdf(Work, dc:creator, X).
Optimising
?- In = ( rdf(X, rdf:type, ulan:'Person'),
          rdf(X, rdfs:label, literal('Matisse, Henri')),
          rdf(Work, dc:creator, X)
        ),
   rdf_optimise(In, Goal).

Goal = ( rdf(X, rdfs:label, literal('Matisse, Henri')),
         rdfql_carthesian([ bag([], rdf(X, rdf:type, ulan:'Person')),
                            bag([Work], rdf(Work, dc:creator, X))
                          ])
       ).
Advantages of Prolog over SPARQL
• Flexibility and reuse:
  - We can mix with arbitrary Prolog code
  - We can name and combine queries
  - We can do recursion
• This is similar to SQL vs. Prolog, but processing RDF involves pattern-matching, rules and recursion, while datatypes are less important.
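The points about naming, combining and recursion can be sketched with a few toy facts. Everything below is illustrative: triple/3 stands in for rdf/3, and the data and predicate names (work_by/2, broader_transitive/2, work_in_tradition/3) are invented for the example.

```prolog
% Toy triples as plain facts; identifiers and the concept hierarchy
% are invented for this example.
triple(matisse,      label,   literal('Matisse, Henri')).
triple(green_stripe, creator, matisse).
triple(green_stripe, style,   fauvism).
triple(fauvism,      broader, avant_garde).
triple(avant_garde,  broader, modern_art).

% A named query: works created by the artist with the given label.
work_by(Label, Work) :-
    triple(Artist, label, literal(Label)),
    triple(Work, creator, Artist).

% Recursion: follow broader links transitively.
% Assumes the hierarchy is acyclic.
broader_transitive(Concept, Ancestor) :-
    triple(Concept, broader, Parent),
    (   Ancestor = Parent
    ;   broader_transitive(Parent, Ancestor)
    ).

% Combining both: works by an artist whose style falls under Concept.
work_in_tradition(Label, Concept, Work) :-
    work_by(Label, Work),
    triple(Work, style, Style),
    (   Style = Concept
    ;   broader_transitive(Style, Concept)
    ).
```

The recursive step is the part that a single fixed graph pattern cannot express; here it is an ordinary two-clause-style predicate that any other query can reuse.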
Prolog ↔ SPARQL
[Bar chart comparing Prolog and SPARQL on three kinds of task; vertical axis from 0 to 1.2]
• Fits SPARQL: one SPARQL query to get the result
• Partially fits SPARQL: multiple SPARQL queries
• Does not fit SPARQL: fetch triple-by-triple and process in the client
Reasoning
Reasoning is connected to a language (RDFS, OWL, SWRL, …)
Reasoning derives facts from the triple store that are not explicitly provided in the dataset.
Options for Reasoning (I)
• Reasoning adds (virtual) triples (entailment): the only API is rdf(S,P,O)
• Forward reasoning
  - Easy to implement
  - Difficult to handle database updates
  - Can explode using richer languages (e.g., OWL)
• Backward reasoning
  - Non-termination under SLD resolution
  - Need for optimization of conjunctions
  - Easy to provide alternative reasoners
Alternative entailment reasoners as Prolog modules
[Diagram: modules that each provide rdf/3: Core RDF-DB, RDFS, OWL-Horst, ...]
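The idea of one rdf/3 per entailment module can be sketched as follows. The module names demo_core and demo_rdfs, the predicate query/4 and the data are invented, and the "RDFS" module implements only a single subclass step; it is a sketch of the layering, not of the actual ClioPatria modules.

```prolog
% Core store: explicit triples only (invented data).
demo_core:rdf(matisse, type,       painter).
demo_core:rdf(painter, subClassOf, artist).

% A second module offers the same rdf/3 interface but adds entailed
% triples by backward reasoning. Only one subClassOf step is handled.
demo_rdfs:rdf(S, P, O) :-
    demo_core:rdf(S, P, O).
demo_rdfs:rdf(S, type, Class) :-
    demo_core:rdf(S, type, Class0),
    demo_core:rdf(Class0, subClassOf, Class).

% The application chooses the entailment by choosing the module.
query(Entailment, S, P, O) :-
    Entailment:rdf(S, P, O).
```

With this layout, query(demo_core, matisse, type, C) only returns the stored type, while query(demo_rdfs, matisse, type, C) also returns the superclass.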
Options for Reasoning (II)
• Based on Abstract Syntax: dedicated high-level API
• Forward reasoning
  - Transformation (Thea OWL(-2) library)
• Backward reasoning
• Thea: http://www.semanticweb.gr/TheaOWLLib/
  By Vangelis Vassiliadis and Chris Mungall
Reasoning with Abstract Syntax API
[Diagram: on top of the Core RDF-DB (rdf/3), Thea (OWL-2) provides subClassOf/2 through forward transformation, and RDFS provides rdfs_individual_of/2, rdfs_subclass_of/2, ... through backward Prolog rules]
Options for reasoning (summary)
• Entailment-based
  - Uniform query API → app can switch entailment
  - Query API is low-level
  - (Using forward reasoning) the entailed graph is added to the database → difficult to deal with multiple languages
• Abstract-syntax based
  - Each language has its own query API
  - Query API is high-level
  - Easy to deal with multiple languages
A closer look at the RDF store: requirements
• Efficient in any instantiation pattern (full indexing)
• Deal with property hierarchy
• Deal with owl:sameAs
• Literal indexing (prefix, full-text, ...)
• Scalable to 10-100 M triples
Options for rdf/3 (I: Using Prolog)
• Prolog dynamic database
  - We need multiple indexes (e.g., YAP)
  - Cannot exploit domain-specific aspects:
    - Property-hierarchy matching
    - Facts are ground, unordered and support limited types
  - Hard to provide statistics for the optimizer because they are also domain-specific
Options for rdf/3 (II: Using an external store)
• External store
  - Slow connection (need to intern/extern URI-as-atom)
  - We do not want (most of) the reasoning
Options for rdf/3 (III: Dedicated C)
• Using a dedicated C library
  - Can optimize for space based on limited datatypes
  - Use atom handles in the database (no intern/extern)
  - Sort literals in an AVL tree (prefix search)
  - Keep counts (for query optimization)
  - Fast binary load/save format
RDF Processing (summary)
• Expressing graph patterns mixed with auxiliary Prolog is easy
  - This is enough for a large part of RDF processing in semantic web applications
• Reasoning
  - Forward closure (easy, big, no changes)
  - Backward: termination issues (tabling can help)
  - Extending rdf/3 ↔ using an abstract language
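The termination issue and the effect of tabling can be shown on a tiny cyclic graph. This sketch assumes a SWI-Prolog version that provides the table/1 directive, which postdates this talk; link/2, same_as/2 and the data are invented for the illustration.

```prolog
% Tabling must be declared before the clauses of the predicate.
:- table same_as/2.

% Invented owl:sameAs-style links; note the cycle a -> b -> a.
link(a, b).
link(b, a).
link(b, c).

% Symmetric and transitive closure of link/2. Without the table/1
% directive the left-recursive third clause loops forever under plain
% SLD resolution; with tabling every answer is produced once.
same_as(X, Y) :- link(X, Y).
same_as(X, Y) :- link(Y, X).
same_as(X, Y) :- same_as(X, Z), same_as(Z, Y).
```

The order in which tabled answers are returned is not specified, so callers that need a stable order should sort the collected results.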
Part III
Web-Applications
Web-Application Reference Architecture (Three Tier Model)
[Diagram: the three tiers Database, Application Logic and Presentation generation, serving a Web Browser. Web 2.0 adds JavaScript in the browser; Web 3.0 (Semantic Web) adds many RDF sources as Linked Data]
Protocols and Standards
[Diagram: RDF Database ↔ Application Logic; labels: HTTP, SPARQL, Prolog, HTTP ?]
Prolog-to-HTTP
[Diagram: Web-Server (Tomcat, .NET, ...) ↔ Interface (JPL, InterProlog, PrologBeans, ...) ↔ Application (Prolog); the web server speaks HTTP]
• Need to program in Tomcat/.NET/... & Prolog
• Difficult deployment
• JPL: one process (JNI/C interface); fast, but hard to debug
• InterProlog/PrologBeans/...: (proprietary network)
Prolog-to-HTTP
[Diagram: Web-Server (Prolog HTTP library) ↔ Interface ↔ Application (Prolog)]
• Easy debugging
• Easily extend the HTTP interface
• Not `industry standard'
• But … many languages provide an HTTP server library
Deployment
Using Apache as reverse proxy and load balancer:
    <VirtualHost *>
        ServerName www.swi-prolog.org
        ProxyPass / http://localhost:3040/
    </VirtualHost>
[Diagram: Apache listens on port 80 and forwards to Prolog on port 3040; the Prolog console runs under VNC]
VNC server console
/api/search?q=picasso&count=100
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_parameters)).
:- use_module(library(http/http_json)).

:- http_handler('/api/search', search, []).

% Decode the query parameters, run the search and reply with JSON.
% search/4 is the application's search predicate (not shown).
search(Request) :-
    http_parameters(Request,
                    [ q(Q, []),
                      start(S, [default(0)]),
                      count(C, [default(25)])
                    ]),
    search(Q, S, C, Results),
    reply_json(Results).
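A handler only becomes reachable once a server is started. The following is a minimal sketch, assuming the multi-threaded server from library(http/thread_httpd); the path /api/ping and the predicates ping/1 and server/1 are hypothetical names introduced for the example.

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_json)).

% A trivial handler, registered the same way as /api/search.
:- http_handler('/api/ping', ping, []).

ping(_Request) :-
    reply_json(json([status=ok])).

% server(+Port): start an HTTP server that routes every request
% through the handlers registered with http_handler/3.
server(Port) :-
    http_server(http_dispatch, [port(Port)]).
```

Running ?- server(3040). would match the port used in the Apache proxy configuration shown under Deployment.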
Summary HTTP support
• Writing the HTTP server in Prolog gives us:
  - A good single-language development environment
  - Incremental compilation: live-updating the server
  - Deployment can be direct or through a proxy
• Not so big: 12,000 lines for
  - Core HTTP client and server
  - HTML and JSON read/write
  - Parameters, sessions, authorization, logging
Part IV
Creating Interactive Web Applications using Prolog
Web of Documents (Original drawing by Tim Berners-Lee)
Interactive Web-Applications
• Server needs to keep track of client (sessions)
• Client needs light-weight updates of the interface
• … but HTTP is stateless …
Introducing State
• Negotiate a session key between client and server
• Server associates state with this key
• Client modifies the interface using JavaScript
→ AJAX
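On the server side, associating state with a session key can be sketched with library(http/http_session), which negotiates the key through a cookie. The path /api/visits and the predicate visits/1 are hypothetical; the example merely counts requests per session and is not taken from the N-queens demo.

```prolog
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_session)).
:- use_module(library(http/http_json)).

:- http_handler('/api/visits', visits, []).

% visits(+Request): keep a per-session counter. The session data is
% stored as a term associated with the session of the current request.
visits(_Request) :-
    (   http_session_retract(visits(N0))
    ->  true
    ;   N0 = 0                      % first request in this session
    ),
    N is N0 + 1,
    http_session_assert(visits(N)),
    reply_json(json([visits=N])).
```

Two browsers hitting the same URL would each see their own counter, because each holds a different session key.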
What is AJAX not?
Case
• Create a web interface for the N-queens problem
• Interaction
  - Select size of board
  - Select implementation (Prolog ↔ clp(FD))
  - Get first solution
  - Get next solution or stop
• State in a backtrackable Prolog program
• By Torbjörn Lager, Markus Triska, Jan Wielemaker
Step I: create initial page
[Diagram: the WEB application server (HTTP) sends the initial HTML page plus JavaScript to the browser, which builds the initial DOM]
Step II: Add local interaction
[Diagram: the initial HTML page plus JavaScript builds the initial DOM in the browser; in addition, the JavaScript now updates the DOM through local interaction]
Options ...
<input type="button" id='opts' name="options" value="Options …" onClick="showOptions(true)">

function showOptions(show) {
  document.getElementById("options").style.display = show ? "block" : "none";
}
OK: applyOptions()
function applyOptions() {
  var size = parseInt(document.getElementById("size").value);

  // Set client state in global variables
  if ( document.getElementById("queens").checked == true ) {
    algorithm = "queens";
  } else {
    algorithm = "clpfd_queens";
  }

  if ( size < 2 || size > 40 ) {
    alert("Size must be in the range 2..40");
  } else {
    boardsize = size;
    showOptions(false);
    // Update the interface by changing the DOM
    document.getElementById("N").innerHTML = size;
    document.getElementById("who").innerHTML = (algorithm == "queens" ? "Prolog" : "clp(FD)");
    document.getElementById("board").innerHTML = board(boardsize, boardwidth);
  }
}
→ NO server interaction
Step-III: Add server interaction
[Diagram: the initial HTML page plus JavaScript builds the initial DOM and handles local interaction in the browser; in addition, the JavaScript now interacts with the WEB application server (HTTP)]
First ...
<input type="button" id='first' name="first" value="First" onClick="first()">

function first() {
  working();

  // Server request
  YAHOO.util.Connect.asyncRequest(
    'GET',
    "/prolog/first?goal="+algorithm+"("+boardsize+",L)",
    { success: update });   // What to do when the server responds
}
Client code-fragment: handle response
function update(o) {
  // Process as JSON
  var solution = YAHOO.lang.JSON.parse(o.responseText);
  // Update DOM based on JSON reply
  if (solution.solution) {
    if ( solution.next == true ) {
      setButtons(true);
    } else {
      setButtons(false);
    }
    clearBoard();
    setQueens(solution.solution.args[1].value);
    document.getElementById("msg").innerHTML =
      "CPU: " + solution.time.toPrecision(2) + " sec.";
  } else if ( solution.error ) {
    setButtons(false);
    document.getElementById("msg").innerHTML =
      "<span class='error'>"+solution.error+"</span>";
  } else {
    setButtons(false);
    document.getElementById("msg").innerHTML = "There are no more solutions.";
  }
}
setQueens()
function setQueens(squareList) {
  for (var i = 1; i <= boardsize; i++) {
    var id = i + "-" + (squareList[i-1].value);
    // Replace DOM fragment
    document.getElementById(id).innerHTML = "<img src='/queen' class='square-img'/>";
  }
}
Backtracking state in the server
[Diagram: each session (session-1 … session-N) has its own Prolog thread that holds the backtracking state. A request such as GET /prolog/next with session-id=1 is handled by an HTTP worker thread, which asks the session thread to backtrack, receives the result as a Prolog term and replies with a JSON document]
Backtracking state
solve(Goal, Bindings, ThreadID) :-
    thread_self(Me),
    thread_statistics(Me, cputime, T0a),
    State = client(ThreadID, T0a),
    solve_2(Goal, Bindings, Solution),          % (Guarded) actual goal
    State = client(Client, T0),
    thread_statistics(Me, cputime, T1),
    Time is T1 - T0,
    solution_time(Solution, Time),
    nb_setarg(2, State, T1),
    debug(prolog_server, 'Sending: ~q', [Solution]),
    thread_send_message(Client, Solution),      % Send reply
    solution_type(Solution, Type),
    (   Type == last
    ->  true
    ;   Type == true
    ->  catch(thread_get_message(command(From, Command)),   % Wait for user
              _, Command = stop),
        debug(prolog_server, 'Command: ~q', [Command]),
        nb_setarg(1, State, From),
        Command == stop
    ;   true
    ).
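The core of this technique, keeping a goal suspended in its own thread and backtracking into it on request, can be reduced to a few lines. The sketch below is a simplification and not the demo's code: start_solver/3, solver_loop/3 and solver_command/3 are invented names, there is no timing, and unlike the guarded goal above an exception in Goal is not reported to the client.

```prolog
% start_solver(+Template, +Goal, -Thread): run Goal in a new thread
% that hands out one instance of Template per `next` command.
start_solver(Template, Goal, Thread) :-
    thread_self(Client),
    thread_create(solver_loop(Template, Goal, Client), Thread, []).

solver_loop(Template, Goal, Client) :-
    thread_get_message(command(First)),
    (   First == next,
        call(Goal),                                    % actual goal
        thread_send_message(Client, reply(solution(Template))),
        thread_get_message(command(Command)),          % wait for user
        Command == stop      % any other command fails: backtrack into Goal
    ->  true
    ;   true                 % Goal has no (more) solutions
    ),
    thread_send_message(Client, reply(done)).

% solver_command(+Thread, +Command, -Reply): send `next` or `stop`
% and wait for solution(Term) or done. Must be called from the thread
% that started the solver; joins the solver once it is done.
solver_command(Thread, Command, Reply) :-
    thread_send_message(Thread, command(Command)),
    thread_get_message(reply(Reply)),
    (   Reply == done
    ->  thread_join(Thread, _)
    ;   true
    ).
```

For instance, after start_solver(X, member(X, [a,b]), T), three successive solver_command(T, next, R) calls yield solution(a), solution(b) and done. The choice points of Goal live on the solver thread's stacks between requests, which is what lets a stateless HTTP request ask for the next solution.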
AJAX has many architectures
From http://www.openajax.org/member/wiki/Whitepaper_20060730
Where does the JavaScript come from?
• Widget library: AjaxAnywhere, MochiKit, YUI, ...
• User code
  - Instantiation
  - Set attributes
  - Refine methods
Options for generating application JavaScript
• Write a JavaScript file and link it from the HTML page
  - Code is in two places → good split if the API is stable
  - Poor for prototyping and often-changing APIs
• Write JavaScript in Prolog strings and include it in the page
  - Messy syntax (Python """long string""")
Generate from Prolog terms?
• Works well for HTML (e.g., html_write, PiLLoW)
• But JavaScript customization often places code fragments in object properties
  - No simple interface such as, e.g., XPCE: create/set property/call method
• A full mapping of JS code to Prolog syntax is probably not transparent enough for users
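To illustrate why term-based generation works well for HTML, here is a minimal sketch using html//1 and print_html/1 from library(http/html_write). It generates an empty N-by-N board table; the predicate names and the assumption that cells carry ids of the form "Row-Col" (inferred from setQueens()) are mine, not taken from the demo's board() function.

```prolog
:- use_module(library(http/html_write)).

% board(+N)//: HTML tokens for an N-by-N table of empty cells.
board(N) -->
    { numlist(1, N, Indexes) },
    html(table(class(board), [\rows(Indexes, Indexes)])).

rows([], _) --> [].
rows([Row|Rows], Cols) -->
    html(tr([\cells(Cols, Row)])),
    rows(Rows, Cols).

cells([], _) --> [].
cells([Col|Cols], Row) -->
    { format(atom(Id), '~d-~d', [Row, Col]) },   % assumed id scheme
    html(td(id(Id), [])),
    cells(Cols, Row).

% board_html(+N, -HTML): render the board to a string.
board_html(N, HTML) :-
    phrase(board(N), Tokens),
    with_output_to(string(HTML), print_html(Tokens)).
```

The page structure is an ordinary Prolog term, so it can be built, inspected and composed with normal Prolog code, which is the property that is harder to obtain for JavaScript fragments.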
Wrap-Up
• The “Web of data” is out there
  - Prolog is an excellent tool for processing RDF
• The interactive “Web 2.0” is out there
  - Web 2.0 is (relatively) language independent
  - Prolog is a suitable server component for Web 2.0
Future Directions
• Enhance RDF support:
  - Improve scalability
  - Higher-level reasoning
    - Provide tabling
    - Generalise optimizers
• Enhance web-programming support
  - Explore cleaner integration with AJAX
• Merge into the Prolog-Commons Initiative
Links
• http://www.swi-prolog.org
• http://e-culture.multimedian.nl/software/ClioPatria.shtml
• http://www.swi-prolog.org/Publications.html
• Part of the Dutch knowledge-economy project MultimediaN
• Partners: VU, CWI, UvA, DEN, ICN
• People: Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jacco van Ossenbruggen, Guus Schreiber, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga
• Artchive.com, RKD, Rijksmuseum Amsterdam, Dutch ethnology musea (Amsterdam, Leiden), National Library (Bibliopolis)
http://e-culture.multimedian.nl