Post on 20-Dec-2015
Finding the Story — Generating Large-Scale Document Structure in
Semantics-to-Hypermedia Transformation
Lloyd Rutledge, CWI, Amsterdam
The Topia Project
● Principles and Goals
– Topiary Hypermedia: plant once and trim
– Presentation generation
– User-controlled and on-demand
– Automated propagation of each author change
– Structure-focussed
– Domain-independence and facilitated specificity
Topia project partners:
Telematica Instituut, Technische Universiteit Eindhoven
Document Engineering “Triangle”
[Figure: document, search, retrieval. Roles: structure (the engineer), style (the stylist), content (the archivist), topic (the user)]
Document Engineering History
[Figure: timeline of document engineering]
– paper author: read, find (the past 5000 years)
– Web author: choose style, surf, read (the past 10 years; universally applicable)
– search database: post/archive, enter query, select archive, browse results (the past 5 years)
– presentation creator: select/control; presentation generated (by the end of this talk; universally applicable)
The Topia Architecture
[Figure: user (topic), engineer (clustering), stylist (style sheet), archivist; stages labelled semantics, selection, structure, presentation; online picks]
Principles
● User Control
– pick expert
– set options
– become author
● Cross-applicability
– each expert’s contribution applies to any contribution from the others
● Show what and why
– why archivist selected content for user request
– why engineer put concept where it is in structure
– why stylist picked each medium for its concept
Archivist’s Responsibilities
● To user
– reasonable (amount of) content for reasonable requests
● To engineer
– enough relations within the selected subset to derive structure
● To stylist
– media for presenting concepts in different structural contexts
● Node-based interaction with all levels
Archival RDF Code
<rdf:Description rdf:about='&ARIA;#ArtefactSK-A-2670'>
  <rdf:type rdf:resource='&ARIA;#Artefact'/>
  <dc:title>Pinks in the Breakers</dc:title>
  <topia:artefactImage rdf:resource='&RM;SK/Org/SK-A-2670.org.jpg'/>
  <dc:creator rdf:resource='&ARIA;#Artist11960'/>
  <dc:date>1875-1885</dc:date>
  <dc:description> ... along the beach by horses. Scheveningen did not ... </dc:description>
  <topia:artefactMaterial>Oil on canvas</topia:artefactMaterial>
  <topia:artefactSize>90 x 181 cm</topia:artefactSize>
  ...
</rdf:Description>
[Slide callouts: concept, concept, text, media, #; <rdfs:label> text for property type and concept type]
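The archival record can be read as a set of (concept, property, value) statements. The following is a minimal sketch in Python using only the standard library. It assumes a simplified record in which the entity references (such as &ARIA;) are replaced by a hypothetical example.org stand-in, and the helper name `statements` is illustrative only.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical, simplified record: entity references such as &ARIA; are
# replaced by an example.org stand-in so the snippet parses on its own.
RECORD = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/aria#ArtefactSK-A-2670">
    <dc:title>Pinks in the Breakers</dc:title>
    <dc:creator rdf:resource="http://example.org/aria#Artist11960"/>
    <dc:date>1875-1885</dc:date>
  </rdf:Description>
</rdf:RDF>
"""

def statements(xml_text):
    """Yield (subject, property, value) for each property element."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # An rdf:resource attribute points at another concept;
            # otherwise the element text is a literal (text) value.
            value = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            yield subject, prop.tag, value

triples = list(statements(RECORD))
for triple in triples:
    print(triple)
```

Property names come back in ElementTree's `{namespace}local` form, which keeps Dublin Core and Topia properties distinguishable without a prefix table.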
Selection SeRQL Result Code
<tableQueryResult>
  <header>
    <columnName>Concept</columnName>
    <columnName>Property</columnName>
    <columnName>String</columnName>
  </header>
  ...
  <tuple>
    <uri>&ARIA;#ArtefactSK-A-2670</uri>
    <uri>http://purl.org/dc/elements/1.1/#description</uri>
    <literal> ... along the beach by horses. Scheveningen did not ... </literal>
  </tuple>
  ...
</tableQueryResult>
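A result table of this shape can be turned into one record per tuple, keyed by the header's column names. This is a minimal sketch with the Python standard library. The sample document is a hypothetical, shortened copy with the entity reference replaced by an example.org stand-in and a shortened literal, and `rows` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened result document for illustration only.
RESULT = """
<tableQueryResult>
  <header>
    <columnName>Concept</columnName>
    <columnName>Property</columnName>
    <columnName>String</columnName>
  </header>
  <tuple>
    <uri>http://example.org/aria#ArtefactSK-A-2670</uri>
    <uri>http://purl.org/dc/elements/1.1/#description</uri>
    <literal>along the beach by horses</literal>
  </tuple>
</tableQueryResult>
"""

def rows(xml_text):
    """Return one dict per <tuple>, keyed by the header column names."""
    root = ET.fromstring(xml_text)
    names = [col.text for col in root.find("header")]
    result = []
    for tup in root.findall("tuple"):
        # Cells are positional: the n-th cell belongs to the n-th column.
        values = [(cell.text or "").strip() for cell in tup]
        result.append(dict(zip(names, values)))
    return result

table = rows(RESULT)
print(table[0]["Concept"], "->", table[0]["String"])
```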
Document Structure
[Figure: tree with parent-child, sequence and recurrence relations]
– hierarchical nodes: from clustering; form introduction and summary displays
– leaf nodes: from user query; form main displays
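The distinction between hierarchical nodes, leaf nodes and recurrence can be sketched as a small tree data structure. This is a minimal Python sketch, not the Topia implementation. The class `Node`, the helpers `walk` and `recurring`, and the sample concepts are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    concept: str
    children: list = field(default_factory=list)

    @property
    def is_leaf(self):
        # Leaf nodes carry selected concepts; nodes with children group them.
        return not self.children

def walk(node):
    """Yield the node and all its descendants in document (sequence) order."""
    yield node
    for child in node.children:
        yield from walk(child)

def recurring(root):
    """Return the leaf concepts that appear more than once in the structure."""
    counts = Counter(n.concept for n in walk(root) if n.is_leaf)
    return sorted(c for c, k in counts.items() if k > 1)

# Hypothetical structure: concept "B" is placed under two groups.
tree = Node("beach", [
    Node("theme", [Node("A"), Node("B")]),
    Node("oil on canvas", [Node("B"), Node("C")]),
])

print(recurring(tree))
```

A presentation can use the recurring concepts as cross-links between the places where they occur.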
Proximity Principle
● Proximity Matrix
– each pair of selected concepts has a proximity measure
● Matching conceptual and structural proximity
– grouping, sequence and recurrence convey proximity
● Let’s not forget why
– presentation should convey why structurally proximate concepts were measured as proximate
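A proximity matrix over the selected concepts can be sketched as follows. The measure used here, the number of shared properties, is an assumption chosen for illustration, since the slides do not fix a particular metric. The concept names and property sets are hypothetical.

```python
from itertools import combinations

# Hypothetical selected concepts with the properties each one carries.
PROPERTIES = {
    "x": {"p1", "p2", "p3"},
    "y": {"p1", "p2"},
    "z": {"p4"},
}

def proximity_matrix(props):
    """Map each pair of concepts to the number of properties they share."""
    matrix = {}
    for a, b in combinations(sorted(props), 2):
        shared = len(props[a] & props[b])
        # Store both orders so lookups are symmetric.
        matrix[a, b] = matrix[b, a] = shared
    return matrix

matrix = proximity_matrix(PROPERTIES)
print(matrix["x", "y"], matrix["x", "z"])
```

Keeping the shared properties themselves, rather than only their count, would also give the presentation the material it needs to explain why two concepts were placed together.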
Concept Lattices
[Table: artefacts matching “Water” (C1), marked X for each further concept they share]
Concepts (columns):
– (C1) “Water”
– (C2) Genre: Water ice and snow
– (C3) Genre: Buildings in landscapes
– (C4) Genre: Field meadows
– (C5) Genre: Dutch Landscapes
– (C6) Artist: Jacob van Ruisdael
– (C7) Genre: Buildings in landscapes
– (C8) Genre: Riverscapes
– (C9) Artist: Paul Joseph Constantin Gabriel
Artefacts (rows):
– “A watercourse at Abcoude” (A1): X X X X
– “Watercourse near ’s-Graveland” (A2): X X
– “Mountainous landscape with waterfall” (A3): X X X X
– “A water mill” (A4): X X X
– “Landscape with waterfall” (A5): X X X X
– “Water mill” (A6): X X
– “Windmill on a polder waterway” (A7): X X X X
– “A waterside ruin in Italy” (A8): X
– “The battle of Waterloo, 18 june 1815” (A9): X
Concept Size: 5 3 3 3 3 3 3 2
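How a concept lattice arises from such an object-by-property table can be sketched with formal concept analysis: a concept pairs a set of objects (its extent) with exactly the properties they all share (its intent). This is a minimal Python sketch over a hypothetical three-object context, not the table from the slide. It enumerates object subsets, so it is only suitable for very small selections.

```python
from itertools import combinations

# Hypothetical context: object -> set of properties it has.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "c"},
}

def extent(context, attrs):
    """All objects that have every property in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)

def intent(context, objs):
    """All properties shared by every object in objs."""
    shared = set().union(*context.values())
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def concepts(context):
    """Enumerate all (extent, intent) pairs by closing each object subset."""
    found = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for group in combinations(objects, size):
            shared = intent(context, group)
            found.add((extent(context, shared), shared))
    return found

lattice = concepts(CONTEXT)
for ext, itt in sorted(lattice, key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

The size of each extent corresponds to what the slide calls the concept size.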
Beyond Lattices
● Inferred properties to beef up the link metrics
– we use art genre sub-class inheritance
– rules provided by archivist as domain-specific
● Relational clustering
– property-based (i.e. lattice) clustering is a functional subset of relational clustering
– can infer relations just like properties
● Axial (numeric) clustering
– creates virtual group nodes, without RDF resource
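Inferring properties through sub-class inheritance can be sketched as expanding each genre with its ancestors before measuring overlap. In this minimal Python sketch, the genre hierarchy in `SUBCLASS_OF` is hypothetical and the helper name `with_inherited` is illustrative. In Topia such rules would come from the archivist.

```python
# Hypothetical sub-class hierarchy: genre -> its direct super-class.
SUBCLASS_OF = {
    "Riverscapes": "Landscapes",
    "Dutch Landscapes": "Landscapes",
    "Landscapes": "Genre",
}

def with_inherited(genres, subclass_of):
    """Return the genres plus every super-class reachable from them."""
    result = set()
    for genre in genres:
        # Climb the hierarchy; stop at the top or at an ancestor already
        # collected (this also guards against accidental cycles).
        while genre is not None and genre not in result:
            result.add(genre)
            genre = subclass_of.get(genre)
    return result

a = with_inherited({"Riverscapes"}, SUBCLASS_OF)
b = with_inherited({"Dutch Landscapes"}, SUBCLASS_OF)
print(sorted(a & b))
```

Without the expansion the two artefacts share nothing. With it they share the inherited classes, which strengthens the link between them.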
Structure Code
<node>
  <concept literal="beach"/>
  <node>
    <concept property="artefactThema" resource="&ARIA;#Thema6292"/>
    <node><concept resource="&ARIA;#ArtefactRP-P-OB-4635"/></node>
    <node><concept resource="&ARIA;#ArtefactSK-A-2670"/></node>
  </node>
  <node>
    <concept property="type" resource="&ARIA;#Term26402"/>
    <concept property="type" resource="&ARIA;#BroaderTerm24480"/>
    <node><concept resource="&ARIA;#ArtefactRP-P-FM-1157-A"/></node>
    <node>
      <concept literal="Oil on canvas" property="artefactMaterial"/>
      <concept property="type" resource="&ARIA;#TopTerm4"/>
      <node><concept resource="&ARIA;#ArtefactSK-A-4868"/></node>
      <node><concept resource="&ARIA;#ArtefactSK-A-2670"/></node>
      <node><concept resource="&ARIA;#ArtefactSK-A-3602"/></node>
      <node><concept resource="&ARIA;#ArtefactSK-A-3597"/></node>
      <node><concept resource="&ARIA;#ArtefactSK-A-4644"/></node>
    </node>
  </node>
</node>
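A nested node structure of this kind maps directly onto an indented outline. This is a minimal Python sketch with the standard library. The sample is a hypothetical excerpt in which the &ARIA; entity is replaced by an `aria:` stand-in so that it parses without a DTD, and `label` and `outline` are illustrative helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a structure document; "aria:" stands in for the
# &ARIA; entity reference.
STRUCTURE = """
<node>
  <concept literal="beach"/>
  <node>
    <concept property="artefactThema" resource="aria:Thema6292"/>
    <node><concept resource="aria:ArtefactRP-P-OB-4635"/></node>
    <node><concept resource="aria:ArtefactSK-A-2670"/></node>
  </node>
</node>
"""

def label(node):
    """Join the literals or resources of the node's own <concept> children."""
    parts = []
    for concept in node.findall("concept"):
        parts.append(concept.get("literal") or concept.get("resource"))
    return ", ".join(parts)

def outline(node, depth=0):
    """Return outline lines, indenting two spaces per nesting level."""
    lines = ["  " * depth + label(node)]
    for child in node.findall("node"):
        lines.extend(outline(child, depth + 1))
    return lines

lines = outline(ET.fromstring(STRUCTURE))
print("\n".join(lines))
```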
Stylist Responsibilities
● Good presentation of each concept
– retrieval of good media
● Good presentation of structure
– global view and local context
● Use media, layout and timing to show why
– why primary content in presentation
– why structure was chosen
– group, sequence, (adjacency) and recurrence
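One Example of Style
[Figure: screen layout with an outline (structure) and a main display (node)]
– outline shows: original user request, default progression, contextual recurrence access; entries marked seen, current, recurrence; entries labelled with <dc:title>
– main display shows: <dc:title>, <dc:creator>, <dc:date>, <dc:description>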
Media for the Stylist
● Dublin Core for Main Display Text
– title, description, date, creator
● Media URIs for Main Display
● Titles and thumbnails for outline and context
● <rdfs:label> for why
– describes what type of concept a concept is
– describes property types, thus relations
– “Titus is the son of the painter Rembrandt”: Titus (concept), is the son of (property type), the painter (concept type), Rembrandt (concept)
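Composing such a "why" sentence from labels can be sketched as simple concatenation: concept label, property type label, concept type label, concept label. In this minimal Python sketch, the identifiers prefixed `ex:`, the label table and the function `why` are hypothetical. In Topia the labels would come from <rdfs:label> values in the repository.

```python
# Hypothetical label and type tables standing in for <rdfs:label> lookups.
LABELS = {
    "ex:Titus": "Titus",
    "ex:Rembrandt": "Rembrandt",
    "ex:sonOf": "is the son of",
    "ex:Painter": "the painter",
}
TYPES = {"ex:Rembrandt": "ex:Painter"}

def why(subject, prop, obj, labels=LABELS, types=TYPES):
    """Build: concept, property type, [concept type], concept."""
    words = [labels[subject], labels[prop]]
    obj_type = types.get(obj)
    if obj_type is not None:
        # Only mention the concept type when one is known for the object.
        words.append(labels[obj_type])
    words.append(labels[obj])
    return " ".join(words)

sentence = why("ex:Titus", "ex:sonOf", "ex:Rembrandt")
print(sentence)
```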
Media Selection XSLT
<xsl:template match="*" mode="getDesc">
  <xsl:variable name="server" select='sesame:setServer("http://media.cwi.nl:8080/sesame/")'/>
  <xsl:variable name="rep" select='sesame:setRepository("topia")'/>
  <xsl:variable name="handle" select="@resource"/>
  <xsl:variable name="desc">
    <sesame:serql query="
      SELECT DISTINCT desc
      FROM {<!{$handle}>} <dc:description> {desc}
      USING NAMESPACE topia = <!http://www.telin.nl/rdf/topia#>
    "/>
  </xsl:variable>
  <xsl:text> </xsl:text>
  <xsl:apply-templates select="xalan:nodeset($desc)/tableQueryResult/tuple"/>
  <xsl:text> </xsl:text>
</xsl:template>
(character escaping removed)
Topia Take-home Message
● Content/Style/Structure all separate
– defined apart and interchangeable
– full user control from selection as such
● Structure is current challenge for generation
– can be defined apart and domain-independent
– facilitated user/engineer control
● Result is user-controlled on-demand hypermedia generation