Finding the Story — Generating Large-Scale Document Structure in Semantics-to-Hypermedia Transformation

Lloyd Rutledge, CWI, Amsterdam



The Topia Project

● Principles and Goals
– Topiary Hypermedia: plant once and trim
– Presentation generation
– User-controlled and on-demand
– Automated propagation of each author change
– Structure-focussed
– Domain-independence and facilitated specificity

Topia project partners: Telematica Instituut, Technische Universiteit Eindhoven

Document Request à la Google

find existing document

generate new multimedia

The Topia Demo

Document Engineering “Triangle”

[Figure: a triangle around the document, with search and retrieval. Labels: structure (the engineer), style (the stylist), content (the archivist), topic (the user)]

Document Engineering History

[Figure: timeline of how documents are authored and read. Labels, grouped by extraction order:
– the past 5000 years: paper author; read; find
– the past 10 years: Web author; choose style; surf; read; universally applicable
– the past 5 years: search database; post/archive; browse results; select archive; enter query
– by the end of this talk: presentation creator; select/control (three times); presentation generated; universally applicable
– remaining labels: user, topic, engineer, clustering, stylist, style sheet, archivist, selection, structure, presentation, semantics]

The Topia Architecture

online picks

Principles

● User Control
– pick expert
– set options
– become author
● Cross-applicability
– each expert’s contribution applies to any from the others
● Show what and why
– why archivist selected content for user request
– why engineer put concept where it is in structure
– why stylist picked each media for its concept

Archivist’s Responsibilities

● To user
– reasonable (amount of) content for reasonable requests
● To engineer
– enough relations between subset to derive structure
● To stylist
– media for presenting concepts in different structural contexts
● Node-based interaction with all levels

<rdf:Description rdf:about='&ARIA;#ArtefactSK-A-2670'>
  <rdf:type rdf:resource='&ARIA;#Artefact'/>
  <dc:title>Pinks in the Breakers</dc:title>
  <topia:artefactImage rdf:resource='&RM;SK/Org/SK-A-2670.org.jpg'/>
  <dc:creator rdf:resource='&ARIA;#Artist11960'/>
  <dc:date>1875-1885</dc:date>
  <dc:description> ... along the beach by horses. Scheveningen did not ... </dc:description>
  <topia:artefactMaterial>Oil on canvas</topia:artefactMaterial>
  <topia:artefactSize>90 x 181 cm</topia:artefactSize>
  ...
</rdf:Description>

Archival RDF Code
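A record like the one above can be turned into (subject, property, value) triples before any selection or clustering happens. The following is a minimal sketch using only Python's standard library, not the Topia implementation. The `ARIA` base URI is a hypothetical stand-in for the `&ARIA;` entity, whose real value the slides do not show, and the record is shortened.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
TOPIA = "http://www.telin.nl/rdf/topia#"
ARIA = "http://example.org/aria"  # hypothetical stand-in for the &ARIA; entity

# Shortened copy of the slide's record with the entities expanded by hand.
RECORD = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:topia="{TOPIA}">
  <rdf:Description rdf:about="{ARIA}#ArtefactSK-A-2670">
    <rdf:type rdf:resource="{ARIA}#Artefact"/>
    <dc:title>Pinks in the Breakers</dc:title>
    <dc:creator rdf:resource="{ARIA}#Artist11960"/>
    <dc:date>1875-1885</dc:date>
    <topia:artefactMaterial>Oil on canvas</topia:artefactMaterial>
  </rdf:Description>
</rdf:RDF>"""


def read_triples(rdf_xml):
    """Return (subject, property, value) triples for each rdf:Description."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # A property points either at another resource or at literal text.
            target = prop.get(f"{{{RDF}}}resource")
            value = target if target is not None else (prop.text or "").strip()
            # ElementTree tags look like "{namespace}local"; rejoin them as one URI.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, value))
    return triples


triples = read_triples(RECORD)
```

Each triple is a node-to-node or node-to-text link, which is the form the later selection and clustering steps work on.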

[Figure labels: concept; concept; text; media; #; text <rdfs:label>; property type; concept type]

ARIA Concept Map

User's Request Interface

<tableQueryResult>
  <header>
    <columnName>Concept</columnName>
    <columnName>Property</columnName>
    <columnName>String</columnName>
  </header>
  ...
  <tuple>
    <uri>&ARIA;#ArtefactSK-A-2670</uri>
    <uri>http://purl.org/dc/elements/1.1/#description</uri>
    <literal> ... along the beach by horses. Scheveningen did not ... </literal>
  </tuple>
  ...
</tableQueryResult>

Selection SeRQL Result Code
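A result table in this shape can be read into one dictionary per tuple, keyed by the column names in the header. This is a minimal sketch with Python's standard library. The concept URI and the literal text are placeholders, because the slide elides them.

```python
import xml.etree.ElementTree as ET

# Same shape as the slide's result; the concept URI and literal are placeholders.
RESULT = """<tableQueryResult>
  <header>
    <columnName>Concept</columnName>
    <columnName>Property</columnName>
    <columnName>String</columnName>
  </header>
  <tuple>
    <uri>http://example.org/aria#ArtefactSK-A-2670</uri>
    <uri>http://purl.org/dc/elements/1.1/#description</uri>
    <literal>placeholder description text</literal>
  </tuple>
</tableQueryResult>"""


def read_rows(result_xml):
    """Return one dict per <tuple>, keyed by the header's column names."""
    root = ET.fromstring(result_xml)
    names = [column.text for column in root.find("header")]
    rows = []
    for tuple_el in root.findall("tuple"):
        # Cells appear in the same order as the column names.
        cells = [(cell.text or "").strip() for cell in tuple_el]
        rows.append(dict(zip(names, cells)))
    return rows


rows = read_rows(RESULT)
```

Every row names a selected concept, the property that matched and the matching string, so the reason for the selection travels with the result.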

Clustering for Structure

[Figure labels: original selection; cluster node; recurrence; sequence; parent-child; hierarchical nodes; leaf nodes; from clustering; from user query; form introduction and summary displays; form main displays]

Document Structure

Proximity Principle

● Proximity Matrix
– each pair of selected concepts has a proximity measure
● Matching conceptual and structural proximity
– grouping, sequence and recurrence convey proximity
● Let’s not forget why
– presentation should convey why structurally proximate concepts were measured as proximate
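One way to picture a proximity matrix is to score each pair of selected concepts by the property values they share, and to keep those shared values as the "why". The sketch below assumes a Jaccard overlap as the measure, which the slides do not specify. The property sets are simplified from identifiers that appear elsewhere in this talk and are illustrative only.

```python
from itertools import combinations


def proximity_matrix(properties):
    """Score every pair of concepts by Jaccard overlap of their property values."""
    matrix = {}
    for a, b in combinations(sorted(properties), 2):
        shared = properties[a] & properties[b]
        union = properties[a] | properties[b]
        matrix[(a, b)] = len(shared) / len(union) if union else 0.0
    return matrix


def why(properties, a, b):
    """Return the shared property values that explain a pair's proximity."""
    return sorted(properties[a] & properties[b])


# Illustrative property sets, not the archive's actual data.
properties = {
    "SK-A-2670": {
        ("artefactThema", "Thema6292"),
        ("artefactMaterial", "Oil on canvas"),
        ("type", "TopTerm4"),
    },
    "RP-P-OB-4635": {("artefactThema", "Thema6292")},
    "SK-A-4868": {
        ("artefactMaterial", "Oil on canvas"),
        ("type", "TopTerm4"),
    },
}

matrix = proximity_matrix(properties)
reasons = why(properties, "SK-A-2670", "SK-A-4868")
```

Keeping `why` next to the score matters for the last bullet: the stylist can only show the reason for a grouping if the shared properties survive the clustering step.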

Engineer's Interface

Concept Lattices

Attributes (columns):
(C1) “Water”
(C2) Genre: Water ice and snow
(C3) Genre: Buildings in landscapes
(C4) Genre: Field meadows
(C5) Genre: Dutch Landscapes
(C6) Artist: Jacob van Ruisdael
(C7) Genre: Buildings in landscapes
(C8) Genre: Riverscapes
(C9) Artist: Paul Joseph Constantin Gabriel

Objects (rows), with X marks as extracted; their column positions are lost:
“A watercourse at Abcoude” (A1) X X X X
“Watercourse near ’s-Graveland” (A2) X X
“Mountainous landscape with waterfall” (A3) X X X X
“A water mill” (A4) X X X
“Landscape with waterfall” (A5) X X X X
“Water mill” (A6) X X
“Windmill on a polder waterway” (A7) X X X X
“A waterside ruin in Italy” (A8) X
“The battle of Waterloo, 18 June 1815” (A9) X
Concept Size 5 3 3 3 3 3 3 2
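A concept lattice is built from formal concepts: pairs of an object set (extent) and the attribute set shared by exactly those objects (intent). The sketch below enumerates them with a simple intersection-closure method. The incidence data is made up for illustration and does not reproduce the X marks of the table above, whose column positions are not recoverable.

```python
def formal_concepts(context):
    """Enumerate formal concepts of a context {object: set of attributes}.

    Every concept intent is an intersection of object attribute sets (or the
    full attribute set), so we close the object intents under intersection.
    """
    all_attributes = frozenset().union(*context.values())
    intents = {all_attributes}
    for attributes in context.values():
        attributes = frozenset(attributes)
        intents |= {attributes & intent for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that carries all attributes of the intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Largest extents first, i.e. from the top of the lattice downwards.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


# Hypothetical incidence: which object carries which attribute.
context = {
    "A1": {"Water", "Dutch landscapes", "Riverscapes"},
    "A2": {"Water", "Dutch landscapes"},
    "A3": {"Water"},
}

concepts = formal_concepts(context)
```

In this toy context the concepts form a chain: everything shares "Water", two objects also share "Dutch landscapes", and one adds "Riverscapes". Each step down the lattice is a candidate cluster node for the document structure.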

Beyond Lattices

● Inferred properties to beef up the link metrics
– we use art genre sub-class inheritance
– rules provided by archivist as domain-specific
● Relational clustering
– property-based (i.e. lattice) clustering is a functional subset of relational clustering
– can infer relations just like properties
● Axial (numeric) clustering
– creates virtual group nodes, without RDF resource
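The first bullet can be illustrated with a small sketch: if genre terms form a sub-class hierarchy, each artefact also inherits the broader terms of its explicit genre, which gives clustering more shared properties to work with. The hierarchy and the artefact names below are hypothetical, and the function names are illustrative rather than Topia's.

```python
def ancestors(term, broader):
    """Follow broader-term links upward and return every ancestor in order."""
    found = []
    visited = {term}
    while term in broader:
        term = broader[term]
        if term in visited:  # guard against cyclic data
            break
        visited.add(term)
        found.append(term)
    return found


def add_inherited_types(types, broader):
    """Extend each artefact's explicit terms with all inherited broader terms."""
    return {
        artefact: set(terms) | {a for t in terms for a in ancestors(t, broader)}
        for artefact, terms in types.items()
    }


# Hypothetical genre hierarchy: narrower term -> broader term.
broader = {"Riverscapes": "Landscapes", "Dutch landscapes": "Landscapes"}
types = {"A1": {"Riverscapes"}, "A2": {"Dutch landscapes"}}

enriched = add_inherited_types(types, broader)
```

Before inference the two artefacts share nothing. Afterwards both carry "Landscapes", so they can fall under a common cluster node.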

<node>
  <concept literal="beach"/>
  <node>
    <concept property="artefactThema" resource="&ARIA;#Thema6292"/>
    <node><concept resource="&ARIA;#ArtefactRP-P-OB-4635"/></node>
    <node><concept resource="&ARIA;#ArtefactSK-A-2670"/></node>
  </node>
  <node>
    <concept property="type" resource="&ARIA;#Term26402"/>
    <concept property="type" resource="&ARIA;#BroaderTerm24480"/>
    <node><concept resource="&ARIA;#ArtefactRP-P-FM-1157-A"/></node>
    <node>
      <concept literal="Oil on canvas" property="artefactMaterial"/>
      <concept property="type" resource="&ARIA;#TopTerm4"/>
      <node><concept resource="&ARIA;#ArtefactSK-A-4868"/></node>
      <node><concept resource="&ARIA;#ArtefactSK-A-2670"/></node>
      <node><concept resource="&ARIA;#ArtefactSK-A-3602"/></node>
      <node><concept resource="&ARIA;#ArtefactSK-A-3597"/></node>
      <node><concept resource="&ARIA;#ArtefactSK-A-4644"/></node>
    </node>
  </node>
</node>

Structure Code
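A structure tree like the one above can be walked to produce an outline and to spot recurrence, that is, leaf concepts that appear under more than one cluster node. This is a minimal sketch on a shortened copy of the tree. The `&ARIA;` prefix is dropped from the resource values because the entity's expansion is not shown in the slides.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened copy of the structure code, with the &ARIA; prefix dropped.
STRUCTURE = """<node>
  <concept literal="beach"/>
  <node>
    <concept property="artefactThema" resource="#Thema6292"/>
    <node><concept resource="#ArtefactRP-P-OB-4635"/></node>
    <node><concept resource="#ArtefactSK-A-2670"/></node>
  </node>
  <node>
    <concept literal="Oil on canvas" property="artefactMaterial"/>
    <node><concept resource="#ArtefactSK-A-4868"/></node>
    <node><concept resource="#ArtefactSK-A-2670"/></node>
  </node>
</node>"""


def label(node):
    """Describe a node by the resources or literals of its own concepts."""
    parts = [c.get("resource") or c.get("literal") or "?"
             for c in node.findall("concept")]
    return ", ".join(parts)


def outline(node, depth=0):
    """Return the tree as indented outline lines, parents before children."""
    lines = ["  " * depth + label(node)]
    for child in node.findall("node"):
        lines.extend(outline(child, depth + 1))
    return lines


def recurring_leaves(root):
    """Return leaf concepts that occur under more than one cluster node."""
    leaves = [n for n in root.iter("node") if n.find("node") is None]
    counts = Counter(label(n) for n in leaves)
    return sorted(name for name, count in counts.items() if count > 1)


root = ET.fromstring(STRUCTURE)
outline_lines = outline(root)
recurring = recurring_leaves(root)
```

Here `#ArtefactSK-A-2670` recurs, as it does in the full structure code, where it sits both under the theme node and under the "Oil on canvas" node.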

Make it Presentable

Stylist Responsibilities

● Good presentation of each concept
– retrieval of good media
● Good presentation of structure
– global view and local context
● Use media, layout and timing to show why
– why primary content in presentation
– why structure was chosen
– group, sequence, (adjacency) and recurrence

One Example of Style

[Figure: annotated example presentation. Labels: outline (structure); main display (node); original user request; default progression; contextual recurrence access; seen; current; recurrence; <dc:title> (twice); <dc:creator>; <dc:date>; <dc:description>]

Media for the Stylist

● Dublin Core for Main Display Text
– title, description, date, creator
● Media URIs for Main Display
● Titles and thumbnails for outline and context
● <rdfs:label> for why
– describes what type of concept a concept is
– describes property types, thus relations
– “Titus is the son of the painter Rembrandt”: “Titus” (concept), “is the son of” (property type), “the painter” (concept type), “Rembrandt” (concept)
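The "why" sentence in the last bullet can be assembled from four pieces: the two concepts' titles, plus the labels of the property type and the concept type. The sketch below assumes those labels have already been looked up from `<rdfs:label>`. The `ex:` identifiers and the function name are hypothetical.

```python
# Hypothetical identifiers mapped to the label text each would carry.
LABELS = {
    "ex:sonOf": "is the son of",  # label of a property type
    "ex:Painter": "the painter",  # label of a concept type
}


def why_sentence(subject_title, property_type, concept_type, object_title, labels):
    """Join concept, property type label, concept type label and concept."""
    return " ".join([
        subject_title,
        labels[property_type],
        labels[concept_type],
        object_title,
    ])


sentence = why_sentence("Titus", "ex:sonOf", "ex:Painter", "Rembrandt", LABELS)
```

Because the sentence is built only from labels, the same template applies to any domain whose properties and types are labelled.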

<xsl:template match="*" mode="getDesc">
  <xsl:variable name="server" select='sesame:setServer("http://media.cwi.nl:8080/sesame/")'/>
  <xsl:variable name="rep" select='sesame:setRepository("topia")'/>
  <xsl:variable name="handle" select="@resource"/>
  <xsl:variable name="desc">
    <sesame:serql query="
      SELECT DISTINCT desc
      FROM {<!{$handle}>} <dc:description> {desc}
      USING NAMESPACE topia = <!http://www.telin.nl/rdf/topia#>
    "/>
  </xsl:variable>
  <xsl:text> </xsl:text>
  <xsl:apply-templates select="xalan:nodeset($desc)/tableQueryResult/tuple"/>
  <xsl:text> </xsl:text>
</xsl:template>

character escaping removed

Media Selection XSLT

New Topia Domain: Google

New Topia Interface: Spectacle

DISC: Domain-specific Discourse

SampLe: More User Control

Topia Take-home Message

● Content/Style/Structure all separate
– defined apart and interchangeable
– full user control from selection as such
● Structure is current challenge for generation
– can be defined apart and domain-independent
– facilitated user/engineer control
● Result is user-controlled on-demand hypermedia generation