a division of Publishing Technology
Facet: Building Web Pages With SPARQL
SWIG-UK Event, HP Labs, November 23rd 2007
Leigh Dodds, Chief Technology Officer, Ingenta
Problem Statement
Where’s my RDF-native Web Framework?!
There is no good system for integrating RDF repositories with an existing web framework (in Java)
Design Constraints
[Figure: diagram with nodes labelled Journal, Issue, Article, 2007 Index, 2006 Index, and New Article Index]
Design Constraints
• A web page presents data that is a sub-graph (i.e. a view) over a larger RDF store (the data model)
• The extent of the sub-graph may vary for different presentations of the data, and may contain arbitrary properties
• The description of the sub-graph (a lens) should be declarative
• That sub-graph is “rooted” on a single primary resource (e.g. a Journal)
• The identifier of the primary resource can be derived from the request URL, e.g. by rewriting the URI, and vice versa
– Therefore, we don’t support blank nodes as primary resources
– Or fragment identifiers in URIs!
• The sub-graph should be serializable into an object graph for presentation to the templating system
Facet Request Handling
To return a response we need to answer three questions…
1. What lens are we going to apply?
2. What data model are we going to apply it to?
3. What’s the identifier of the primary resource?
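The three questions can be sketched as a single resolution step. This is a minimal illustration only: the class name, the prefix-to-lens routing table, the lens names, the default model name, and the base URI are all assumptions made for the example, not Facet's actual API or configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: class, field, and lens names are illustrative, not Facet's API.
class FacetRequestSketch {

    /** Holds the answers to the three questions for one request. */
    static final class Resolved {
        final String lens;        // 1. which lens to apply
        final String dataModel;   // 2. which data model to apply it to
        final String resourceId;  // 3. identifier of the primary resource

        Resolved(String lens, String dataModel, String resourceId) {
            this.lens = lens;
            this.dataModel = dataModel;
            this.resourceId = resourceId;
        }
    }

    // Assumed routing table: URL path prefix -> lens name.
    static final Map<String, String> LENS_BY_PREFIX = new LinkedHashMap<>();
    static {
        LENS_BY_PREFIX.put("/content/journal/", "journal-lens");
        LENS_BY_PREFIX.put("/content/article/", "article-lens");
    }

    static final String DEFAULT_MODEL = "default-model";      // assumed name
    static final String ID_BASE = "http://data.example.org";  // assumed base URI

    /** Returns the resolved triple, or null when no lens matches the path. */
    static Resolved resolve(String path) {
        for (Map.Entry<String, String> e : LENS_BY_PREFIX.entrySet()) {
            if (path.startsWith(e.getKey())) {
                return new Resolved(e.getValue(), DEFAULT_MODEL, ID_BASE + path);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Resolved r = resolve("/content/journal/18168116");
        System.out.println(r.lens + " / " + r.dataModel + " / " + r.resourceId);
    }
}
```

A request whose path matches no prefix yields null here, which a real handler would presumably turn into an error response.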
Lenses
Describing views of RDF data
A Simple Lens
PREFIX dc: <http://purl.org/dc/elements/1.1/>
CONSTRUCT {
  ?item dc:title ?title .
  ?item dc:language ?language .
}
WHERE {
  ?item dc:title ?title .
  OPTIONAL { ?item dc:language ?language . }
}
Configuring Lenses
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ja="http://jena.hpl.hp.com/2005/11/Assembler#"
         xmlns:view="http://metastore.ingenta.com/facet/lens/">
  <rdf:Description rdf:about="http://metastore.ingenta.com/facet/lens/Sparql">
    <ja:assembler>com.ingenta.facet….</ja:assembler>
  </rdf:Description>
  <view:Sparql rdf:about="http://oecd.metastore.ingenta.com/views/just-the-title">
    <view:query>sparql/just-the-title.rq</view:query>
  </view:Sparql>
</rdf:RDF>
Data Model
Configuring RDF graphs
Data Model Configuration
• Jena Assembler API
• Add notion of application level default data model
– Uses well-known URI
• Lenses may be configured to apply to a specific data model
– Allows “sharding” of data models
<view:Sparql
rdf:about="http://oecd.metastore.ingenta.com/views/just-the-title">
<view:appliesTo rdf:resource="…"/>
</view:Sparql>
Resource Identifiers
Mapping Resource Identifiers
• In a RESTful application, each resource should have a single primary location
• Allows resource identifiers to be derived using URL rewriting
http://test.sourceoecd.org/oecd/content/journal/18168116
http://oecd.metastore.ingenta.com/content/journal/18168116
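A minimal sketch of the rewriting, using the two URLs above. It assumes the first is the request URL and the second the resource identifier, and that the mapping is a plain prefix swap; the class and method names are hypothetical, not Facet's.

```java
// Hypothetical sketch of two-way URL rewriting; names are illustrative only.
class IdentifierMapper {

    // Assumption: the web site prefix and the identifier prefix are fixed strings.
    static final String SITE_BASE = "http://test.sourceoecd.org/oecd";
    static final String ID_BASE = "http://oecd.metastore.ingenta.com";

    /** Derives the primary resource identifier from a request URL. */
    static String toIdentifier(String requestUrl) {
        return swapPrefix(requestUrl, SITE_BASE, ID_BASE);
    }

    /** Derives the single primary location from a resource identifier. */
    static String toRequestUrl(String identifier) {
        return swapPrefix(identifier, ID_BASE, SITE_BASE);
    }

    private static String swapPrefix(String url, String from, String to) {
        if (!url.startsWith(from + "/")) {
            throw new IllegalArgumentException("Unexpected prefix: " + url);
        }
        // Fragment identifiers are rejected, in line with the design constraints.
        if (url.indexOf('#') >= 0) {
            throw new IllegalArgumentException("Fragment identifiers not supported: " + url);
        }
        return to + url.substring(from.length());
    }

    public static void main(String[] args) {
        System.out.println(toIdentifier(
                "http://test.sourceoecd.org/oecd/content/journal/18168116"));
    }
}
```

Because the mapping is a pure prefix swap, it is reversible, which is what allows the identifier and the request URL to be derived from each other.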
Serialization
Mapping an RDF sub-graph to a Java object model
Serialization
• Primary resource is a ContentItem
– Has an identifier and Map of properties
• Walk through graph, beginning at “root” resource, mapping RDF statements to Map entries
• Mapping of property names is configurable
– Default based on namespace prefix, e.g. dc_title
• Mapping of objects of each statement to a suitable Java object
– ContentItem, Map, List, Integer, etc
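The walk described above might look roughly like this. It is a sketch under assumptions: the Statement class is a stand-in for real RDF statements, ContentItem here is a simplified guess at the shape named on the slide, and the cycle guard is an addition for the example.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a simplified stand-in for Facet's serialization step.
class SerializationSketch {

    /** Minimal stand-in for an RDF statement with a prefixed predicate name. */
    static final class Statement {
        final String subject;
        final String predicate;          // e.g. "dc:title"
        final Object object;             // literal value, or a resource id
        final boolean objectIsResource;

        Statement(String subject, String predicate, Object object, boolean objectIsResource) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsResource = objectIsResource;
        }
    }

    /** An identifier plus a Map of properties, as on the slide. */
    static final class ContentItem {
        final String id;
        final Map<String, Object> properties = new LinkedHashMap<>();

        ContentItem(String id) {
            this.id = id;
        }
    }

    /** Default naming: namespace prefix plus local name, e.g. dc:title -> dc_title. */
    static String propertyName(String prefixedName) {
        return prefixedName.replace(':', '_');
    }

    static ContentItem serialize(String rootId, List<Statement> graph) {
        return walk(rootId, graph, new HashSet<String>());
    }

    private static ContentItem walk(String id, List<Statement> graph, Set<String> visited) {
        ContentItem item = new ContentItem(id);
        if (!visited.add(id)) {
            return item; // already expanded elsewhere: return a bare reference to stop cycles
        }
        for (Statement s : graph) {
            if (!s.subject.equals(id)) {
                continue;
            }
            // Resource objects become nested ContentItems; literals are stored as-is.
            Object value = s.objectIsResource
                    ? walk((String) s.object, graph, visited)
                    : s.object;
            item.properties.put(propertyName(s.predicate), value);
        }
        return item;
    }
}
```

A template can then read values such as `item.properties.get("dc_title")` without knowing anything about RDF.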
Serialization (special cases)
• Multilingual properties
– Special casing (i.e. a hack!) to modify naming, e.g. dc_title_fr
• Repeated properties, e.g. dc:subject
– Use schema annotation to indicate these, and then serialize to a List
• XML Literals & multilingual data
– E.g. multilingual abstracts (dc:description) that contain XHTML markup
– Use schema annotation, parse and create separate Map entries
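The first two special cases can be sketched as follows. The REPEATABLE set is an assumed stand-in for the schema annotation, and the class and method names are hypothetical; XML literal handling is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the naming and repeated-property special cases.
class SpecialCasesSketch {

    // Assumption: stands in for the schema annotation that marks repeatable properties.
    static final Set<String> REPEATABLE = new HashSet<>(Arrays.asList("dc_subject"));

    /** Appends the language tag to the property name, e.g. dc_title + fr -> dc_title_fr. */
    static String keyFor(String name, String lang) {
        return (lang == null || lang.isEmpty()) ? name : name + "_" + lang;
    }

    /** Stores a value, collecting repeatable properties into a List. */
    @SuppressWarnings("unchecked")
    static void put(Map<String, Object> props, String name, String lang, Object value) {
        String key = keyFor(name, lang);
        if (REPEATABLE.contains(name)) {
            List<Object> values = (List<Object>) props.get(key);
            if (values == null) {
                values = new ArrayList<>();
                props.put(key, values);
            }
            values.add(value);
        } else {
            props.put(key, value); // single-valued: a later value overwrites an earlier one
        }
    }
}
```

Keeping single-valued properties as plain values and only annotated ones as Lists means templates do not have to index into a list for the common case.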
Additional Features
• “MultiLens”, applying multiple queries in series to build results
• Automatic availability of URL parameters as SPARQL query parameters
• Integral API support
– RDF output for free; JSON output trivial
• Simple content lifecycle, mapping to HTTP resource statuses
– E.g. Content Not Found, Moved, Gone
– Add type (life:Deleted) and properties (life:newLocation) to data
• Support for URL Aliasing based on property values
– /content/issn/1234-5678 -> /content/journal/abcdef
– <prism:issn>
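As an illustration of the content lifecycle item, the sketch below maps lifecycle data to an HTTP status. The specific status codes (404, 301, 410) and the precedence of "moved" over "deleted" are assumptions chosen for the example; only life:Deleted and life:newLocation come from the slide, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: status codes and precedence are assumptions, not Facet's rules.
class LifecycleSketch {

    /**
     * @param found      whether the data model holds any data for the resource
     * @param types      rdf:type values of the resource, as prefixed names
     * @param properties lifecycle properties of the resource, as prefixed names
     */
    static int statusFor(boolean found, Set<String> types, Map<String, String> properties) {
        if (!found) {
            return 404; // Content Not Found
        }
        if (properties.containsKey("life:newLocation")) {
            return 301; // Moved: redirect to the value of life:newLocation
        }
        if (types.contains("life:Deleted")) {
            return 410; // Gone
        }
        return 200;
    }
}
```

Because the lifecycle lives in the data as ordinary types and properties, retiring or moving content becomes a data update rather than a code or configuration change.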
Summary
Pros
• By embracing a few limitations on RDF modelling (e.g. on identifiers), Facet provides a very flexible means of building web pages from an RDF repository
• Reliance on SPARQL and Jena API features provides a great deal of configuration options
• Good integration with existing web templating environments
• Quick to learn
Cons
• Model limitations mean it’s not suited to all RDF “in the wild”