ontopia tutorial
DESCRIPTION
A full-day tutorial covering all modules in the Ontopia Topic Maps engine.
TRANSCRIPT
1
Ontopia Tutorial
TMRA 2010-09-29
Lars Marius Garshol & Geir Ove Grønmo
2
Agenda
• About you
  – who are you?
• About Ontopia
• The product
• The future
• Participating in the project
3
Some background
About Ontopia
4
Brief history
• 1999-2000
  – private hobby project for Geir Ove
• 2000-2009
  – commercial software sold by Ontopia AS
  – lots of international customers in diverse fields
• 2009-
  – open source project
5
The project
• Open source hosted at Google Code
• Contributors
  – Lars Marius Garshol, Bouvet
  – Geir Ove Grønmo, Bouvet
  – Thomas Neidhart, SpaceApps
  – Lars Heuer, Semagia
  – Hannes Niederhausen, TMLab
  – Stig Lau, Bouvet
  – Baard H. Rehn-Johansen, Bouvet
  – Peter-Paul Kruijssen, Morpheus
  – Quintin Siebers, Morpheus
  – Matthias Fischer, HTW Berlin
6
Recent work
• Ontopia/Liferay integration
  – Matthias Fischer & LMG
• Various fixes and optimizations
  – everyone
• Tropics (RESTful web service interface)
  – SpaceApps
• Porting build system to Maven2
  – Morpheus
7
Architecture and modules
Product overview
8
The big picture
[Diagram: the Engine surrounded by Ontopoly, Portlet support, CMS integration (OKP, Escenic, other CMSs), Data integration (DB2TM, TMSync), Auto-classification, Taxonomy import, and Web service modules, with "A.N.other" boxes marking room for further integrations]
9
The engine
• Core API
• TMAPI 2.0 support
• Import/export
• RDF conversion
• TMSync
• Fulltext search
• Event API
• tolog query language
• tolog update language
[Diagram: Engine]
10
The backends
• In-memory
  – no persistent storage
  – thread-safe
  – no setup
• RDBMS
  – transactions
  – persistent
  – thread-safe
  – uses caching
  – clustering
• Remote
  – uses web service
  – read-only
  – unofficial
[Diagram: Engine on top of three backends: Memory, RDBMS, Remote]
11
DB2TM
• Upconversion to TMs
  – from RDBMS via JDBC
  – or from CSV
• Uses XML mapping
  – can call out to Java
• Supports sync
  – either full rescan
  – or change table
[Diagram: DB2TM, TMRAP, Nav, and Classify modules added on top of the Engine and its Memory, RDBMS, and Remote backends]
12
TMRAP
• Web service interface
  – via SOAP
  – via plain HTTP
• Requests
  – get-topic
  – get-topic-page
  – get-tolog
  – delete-topic
  – ...
13
Navigator framework
• Servlet-based API
  – manage topic maps
  – load/scan/delete/create
• JSP tag library
  – XSLT-like
  – based on tolog
  – JSTL integration
14
Automated classification
• Undocumented
  – experimental
• Extracts text
  – autodetects format
  – Word, PDF, XML, HTML
• Processes text
  – detects language
  – stemming, stop-words
• Extracts keywords
  – ranked by importance
  – uses existing topics
  – supports compound terms
15
Vizigator
• Graphical visualization
• VizDesktop
  – Swing app to configure
  – filter/style/...
• Vizlet
  – Java applet for web
  – uses configuration
  – loads via TMRAP
  – uses "Remote" backend
[Diagram: Viz and Ontopoly added on top of the DB2TM, TMRAP, Nav, and Classify modules]
16
Ontopoly
• Generic editor
  – web-based, AJAX
  – meta-ontology in TM
• Ontology designer
  – create types and fields
  – control user interface
  – build views
  – incremental dev
• Instance editor
  – guided by ontology
17
Typical deployment
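[Diagram: an application server runs the Engine, the frameworks, a viewing application (for users), and Ontopoly (for editors); the Engine sits on a DB backend; DB2TM connects to other DBs; an external application connects over HTTP using TMRAP]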
18
APIs
The engine
19
Core APIs
• net.ontopia.topicmaps.core.*
• Fairly direct mapping from TMDM
  – TopicIF
  – AssociationIF
  – TopicMapIF
  – ...
• Set/get methods reflect TMDM properties
20
TopicIF
• Interface, not a class
  – getTopicNames()
  – addTopicName(TopicNameIF)
  – removeTopicName(TopicNameIF)
  – getOccurrences() + add + remove
  – getSubjectIdentifiers() + add + remove
  – getItemIdentifiers() + add + remove
  – getSubjectLocators() + add + remove
  – getRoles()
  – getRolesByType(TopicIF)
21
Core interfaces
TopicMapStoreIF TopicMapIF
TopicIF AssociationIF
TopicNameIF OccurrenceIF AssociationRoleIF
VariantNameIF
22
How to get a TopicMapIF
• Create one directly
  – new net...impl.basic.InMemoryTopicMapStore()
• Load one from file
  – using an importer (next slide)
• Connect to an RDBMS
  – covered later
• Use a topic map repository
  – covered later
23
TopicMapReaderIF
import net.ontopia.topicmaps.core.TopicMapIF;
import net.ontopia.topicmaps.core.TopicMapReaderIF;
import net.ontopia.topicmaps.utils.ImportExportUtils;

public class TopicCounter {
  public static void main(String[] argv) throws Exception {
    // pick a reader based on the file name given on the command line
    TopicMapReaderIF reader = ImportExportUtils.getReader(argv[0]);
    TopicMapIF tm = reader.read();
    System.out.println("TM contains " + tm.getTopics().size() + " topics");
  }
}

[larsga@c716c5ac1 tmp]$ java TopicCounter ~/data/bilder/privat/metadata.xtm
TM contains 17035 topics
[larsga@c716c5ac1 tmp]$
24
Supported syntaxes
Syntax    Export  Import
XTM 1.0   Y       Y
XTM 2.0   Y       Y
XTM 2.1   Y       Y
CXTM      Y
LTM 1.3   Y       Y
CTM               Y
TM/XML    Y       Y
JTM 1.0   Y       Y
RDF       Y       Y
25
The utility classes
• A set of classes outside the core interfaces that perform common tasks
  – a number of these utilities are obsolete now that tolog is here
• They are all built on top of the core interfaces
• Some important utilities
  – ImportExportUtils: creates readers and writers
  – MergeUtils: merges topics and topic maps
  – PSI: contains important PSIs
  – DeletionUtils: cascading delete of topics
  – DuplicateSuppressionUtils: removes duplicates
  – TopicStringifiers: find names for topics
26
Topic Maps repository
• Uses a set of topic map sources to build a set of topic maps
  – topic maps can be looked up by ID
• Many kinds of sources
  – scan directory for files matching pattern
  – download from URL
  – connect to RDBMS
  – ...
• Configurable using an XML file
  – tm-sources.xml
• Used by Navigator Framework
27
Event API
• Allows clients to receive notification of changes
• Must implement TopicMapListenerIF

static class TestListener extends AbstractTopicMapListener {
  public void objectAdded(TMObjectIF snapshot) {
    System.out.println("Topic added: " + snapshot.getObjectId());
  }

  public void objectModified(TMObjectIF snapshot) {
    System.out.println("Topic modified: " + snapshot.getObjectId());
  }

  public void objectRemoved(TMObjectIF snapshot) {
    System.out.println("Topic removed: " + snapshot.getObjectId());
  }
}
28
Using the API
// register to listen for events
TestListener listener = new TestListener();
TopicMapEvents.addTopicListener(ref, listener);

// get the store through the reference so the listener is registered
ref.createStore(false);

// let's add a topic
System.out.println("Off we go");
TopicMapBuilderIF builder = tm.getBuilder();
TopicIF newbie = builder.makeTopic(tm);
System.out.println("Let's name this topic");
builder.makeTopicName(newbie, "Newbie topic");

// then let's remove it
System.out.println("And now, the exit");
DeletionUtils.remove(newbie);
System.out.println("Goodbye, short-lived topic");

[larsga@dhcp-98 tmp]$ java EventTest bkclean.xtm
Off we go
Topic added: 3409
Let's name this topic
Topic modified: 3409
Topic modified: 3409
And now, the exit
Topic removed: 3409
Goodbye, short-lived topic
29
For more information
• See the Engine Developer's Guide
  – http://www.ontopia.net/doc/current/doc/engine/devguide.html
30
Persistence & transactions
RDBMS backend
31
RDBMS backend
• Stores Topic Maps in an RDBMS
  – generic schema
  – access via JDBC
• Provides full ACID
  – transactions
  – concurrency
  – ...
• Supports several databases
  – Oracle, MySQL, PostgreSQL, MS SQL Server, hsql
• Clustering support
32
Core API implementation
• Implements same API as in-memory impl
  – theoretically, a switch requires only config change
• Lazy loading of objects
  – objects loaded from DB as needed
• Considerable internal caching
  – for performance reasons
• Separate objects for separate transactions
  – in order to provide isolation
• Shared cache between transactions
33
Configuration
• A Java property file
• Specifies
  – database type
  – JDBC URL
  – username + password
  – cache settings
  – clustering settings
  – ...
34
jdbcspy
• A built-in SQL profiler
• Useful for identifying cause of performance issues
35
tolog
The Query Engine
36
tolog
• A logic-based query language
  – a mix of Prolog and SQL
  – effectively equivalent to Datalog
• Two parts
  – queries (data retrieval)
  – updates (data modification)
• Developed by Ontopia
  – not an ISO standard
  – eventually to be replaced by TMQL
37
tolog
• The recommended way to interact with the data
  – API programming is slow and cumbersome
  – tolog queries perform better
• Available via
  – Java API
  – Web service API
  – Forms interface in Omnigator
• tolog queries return API objects
38
Finding all operas by a composer
Collection operas = new ArrayList();
TopicIF composer = getTopicById("puccini");
TopicIF composed_by = getTopicById("composed-by");
TopicIF work = getTopicById("work");
TopicIF creator = getTopicById("composer");

for (AssociationRoleIF role1 : composer.getRolesByType(creator)) {
  AssociationIF assoc = role1.getAssociation();
  if (assoc.getType() != composed_by)
    continue;

  for (AssociationRoleIF role2 : assoc.getRoles()) {
    if (role2.getType() != work)
      continue;

    operas.add(role2.getPlayer());
  }
}
39
Finding all operas by a composer
• composed-by(puccini : composer, $O : work)?
• composed-by($C : composer, tosca : work)?
• composed-by($C : composer, $O : work)?
• composed-by(puccini : composer, tosca : work)?
40
Features
• Access all aspects of a topic map
• Generic queries independent of ontology
• AND, OR, NOT, OPTIONAL
• Count
• Sort
• LIMIT/OFFSET
• Reusable inference rules
41
Chaining predicates (AND)
• Predicates can be chained
  – born-in($PERSON : person, $PLACE : place),
    located-in($PLACE : containee, italy : container)?
• The comma between the predicates means AND
• This query finds all the people born in Italy
  – It first builds a two-column table of all born-in associations
  – Then, those rows where the place is not located-in Italy are removed
  – (Note that when the PLACE variable is reused above that means that the birthplace and the location must be the same topic in each match)
• Any number of predicates can be chained
• Their order is insignificant
  – Actually, the optimizer reorders the predicates
  – It will start with located-in because it has a topic constant
Thinking in predicates
• Most of you are probably used to functions, which work like this:
  – function(arg1, arg2, arg3) → result
• Predicates, however, are in a sense bidirectional, because of the way the pattern matching works
  – predicate(topic : role1, $VAR : role2)
  – predicate($VAR : role1, topic : role2)
• The order of the roles is, on the other hand, insignificant
  – predicate(topic : role1, $VAR : role2)
  – predicate($VAR : role2, topic : role1)
43
Projection
• Sometimes queries make use of temporary variables that we are not really interested in
• The way to get rid of unwanted variables is projection
• Syntax:
  select $variable1, $variable2, ... from <query>?
• The query is first run, then projected down to the requested variables
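As a rough illustration of "run first, then project", the sketch below takes finished result rows and keeps only the selected variables. The class name Projection and the row representation (a map from variable name to value) are assumptions made for this example, as is the removal of duplicate rows after projection.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Projection {

    // Projects each row down to the selected variables.
    // Rows that become identical after projection are kept only once
    // (an assumption of this sketch).
    static List<Map<String, String>> project(List<Map<String, String>> rows,
                                             List<String> selected) {
        Set<Map<String, String>> projectedRows = new LinkedHashSet<>();
        for (Map<String, String> row : rows) {
            Map<String, String> projected = new LinkedHashMap<>();
            for (String variable : selected) {
                projected.put(variable, row.get(variable));
            }
            projectedRows.add(projected);
        }
        return new ArrayList<>(projectedRows);
    }
}
```

For the born-in query, projecting onto PERSON alone drops the temporary PLACE column from every row.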
44
The instance-of predicate
• instance-of has the following form:
  – instance-of ( instance, class )
  – NOTE: the order of the arguments is significant
• Like players, instance and class may be specified in two ways:
  – using a variable ($name)
  – using a topic reference
  – e.g. instance-of ( $A, city )
• instance-of makes use of the superclass-subclass associations in the topic map
  – this means that composers will be considered musicians, and musicians will be considered persons
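The effect of the superclass-subclass associations can be sketched as a walk up the type hierarchy. The class InstanceOf, its method, and the map of direct superclasses are hypothetical names for this example; the real predicate works on the topic map itself.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class InstanceOf {

    // True if a topic whose direct type is 'type' counts as an instance of 'cls',
    // that is, if 'type' is 'cls' itself or a direct or indirect subclass of it.
    static boolean isInstanceOf(String type, String cls,
                                Map<String, Set<String>> directSuperclasses) {
        Deque<String> toVisit = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        toVisit.push(type);
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            if (current.equals(cls)) {
                return true;
            }
            if (!visited.add(current)) {
                continue; // already seen; guards against cycles in the hierarchy
            }
            Set<String> supers = directSuperclasses.get(current);
            for (String sup : supers == null ? Collections.<String>emptySet() : supers) {
                toVisit.push(sup);
            }
        }
        return false;
    }
}
```

With composer declared as a subclass of musician, and musician as a subclass of person, a composer matches instance-of($A, person), while a person does not match instance-of($A, composer).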
45
Cities with the most premieres
using o for i"http://psi.ontopedia.net/"
select $CITY, count($OPERA) from
  instance-of($CITY, o:City),
  { o:premiere($OPERA : o:Work, $CITY : o:Place) |
    o:premiere($OPERA : o:Work, $THEATRE : o:Place),
    o:located_in($THEATRE : o:Containee, $CITY : o:Container) }
order by $OPERA desc?
46
All non-hidden photos
select $PHOTO from
  instance-of($PHOTO, op:Photo),
  not(ph:hide($PHOTO : ph:hidden)),
  not(ph:taken-at($PHOTO : op:Image, $PLACE : op:Place),
      ph:hide($PLACE : ph:hidden)),
  not(ph:taken-during($PHOTO : op:Image, $EVENT : op:Event),
      ph:hide($EVENT : ph:hidden)),
  not(ph:depicted-in($PHOTO : ph:depiction, $PERSON : ph:depicted),
      ph:hide($PERSON : ph:hidden))?
47
Demo
• Show running queries in Omnigator
• Also show query tracing
  – Shakespeare
  – /* #OPTION: optimizer.reorder = false */
48
tologspy
• tolog query profiler
  – shares code with jdbcspy
49
Using the query engine API
The query engine API is really simple to use
1. get a QueryProcessorIF object
2. run a query in the QueryProcessorIF and get a QueryResultIF
3. loop over the results and use them
4. close the result object
5. go back to step 2, or do something else
There are two different QueryProcessorIF implementations
  – the API lets you write code without worrying about that, however
  – the two implementations behave identically
50
Running a query with the API
TopicMapIF tm = ...;
QueryProcessorIF processor = QueryUtils.getQueryProcessor(tm);
QueryResultIF result = processor.execute("instance-of($P, person)?");
try {
  while (result.next()) {
    TopicIF person = (TopicIF) result.getValue(0);
    // do something useful with 'person'
  }
} finally {
  // always release the result, even if processing fails
  result.close();
}
51
Advanced options
It is possible to parse a query once, and then run it many times
  – the processor returns a ParsedQueryIF object, which can be executed
  – parameters can be passed to the query on each execution
It is possible to make declarations and use them across executions
52
Using a parsed query
ParsedQueryIF parsedQuery = processor.parse("instance-of($P, %TYPE%)?");
// 'person' is the TopicIF bound to the %TYPE% parameter
Map params = Collections.singletonMap("TYPE", person);
QueryResultIF result = parsedQuery.execute(params);
try {
  while (result.next()) {
    // ...
  }
} finally {
  result.close();
}
53
QueryWrapper
• Designed to make all of this easier
• QueryWrapper qw = new QueryWrapper(tm);
• TopicIF topic = qw.queryForTopic(...);
• List topics = qw.queryForList(...);
• List<Person> people = qw.queryForList(..., mapper);
54
tolog updates
• Greatly simplifies TM modification
• Also means you can do modification without API programming
  – useful with RDBMS topic maps
  – useful with TMs in running web servers
• By performing a sequence of updates, just about any change can be made
• Potentially allows much more powerful architecture
55
DELETE
• Static form
  – delete lmg
• Dynamic form
  – delete $person from instance-of($person, person)
• Delete a value
  – delete subject-identifier(topic, "http://ex.org/tst")
56
MERGE
• Static form
  – MERGE topic1, topic2
• Dynamic form
  – MERGE $p1, $p2 FROM
      instance-of($p1, person),
      instance-of($p2, person),
      email($p1, $email),
      email($p2, $email)
57
INSERT
• Static form
  INSERT lmg isa person; - "Lars Marius Garshol" .
• Dynamic form
  INSERT tmcl:belongs-to-schema(tmcl:container : theschema, tmcl:containee: $c)
  FROM instance-of($c, tmcl:constraint)
58
INSERT again
INSERT ?y $psi . event-in-year(event: $e, year: ?y)
FROM start-date($e, $date),
     str:substring($y, $date, 4),
     str:concat($psi, "http://psi.semagia.com/iso8601", $y)
59
UPDATE
• Static form
  – UPDATE value(@3421, "New name")
• Dynamic form
  – UPDATE value($TN, "Ontopia") FROM topic-name(oks, $TN)
60
More information
• Look at sample queries in Omnigator
• tolog tutorial
  – http://www.ontopia.net/doc/current/doc/query/tutorial.html
• tolog built-in predicate reference
  – http://www.ontopia.net/doc/current/doc/query/predicate-reference.html
61
Conversion from RDBMS data
DB2TM
62
DB2TM
• Upconversion of relational data
  – either from CSV files or
  – over JDBC
• Based on an XML file describing the mapping
  – very highly configurable
• Support for
  – all of Topic Maps (except variants)
  – value transformations
  – synchronization
63
Standard use case
• Pull in data from external source
  – turn it into Topic Maps following some ontology
• Enrich it
  – usually manually, but not necessarily
• Resync from source at intervals
64
DB2TM example
ID  Name            Website
1   Ontopia         http://www.ontopia.net
2   United Nations  http://www.un.org
3   Bouvet          http://www.bouvet.no

<relation name="organizations.csv" columns="id name url">
  <topic type="ex:organization">
    <item-identifier>#org${id}</item-identifier>
    <topic-name>${name}</topic-name>
    <occurrence type="ex:homepage">${url}</occurrence>
  </topic>
</relation>

[Diagram: table + mapping = three organization topics named Ontopia, United Nations, and Bouvet]
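To make the ${column} references in the mapping concrete, here is a small Java sketch that expands such a template against one row of the table. It is not DB2TM code: the class ColumnTemplate and its methods are invented for this illustration, and the comma splitting is deliberately naive (no quoting support).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColumnTemplate {
    // Matches references of the form ${columnName}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces every ${column} in the template with that column's value.
    static String expand(String template, Map<String, String> row) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = row.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unknown column: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Pairs the space-separated column names with one comma-separated line.
    static Map<String, String> row(String columns, String csvLine) {
        String[] names = columns.trim().split("\\s+");
        String[] values = csvLine.split(",", -1);
        if (names.length != values.length) {
            throw new IllegalArgumentException("Expected " + names.length + " values");
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            row.put(names[i], values[i].trim());
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> r = row("id name url", "1,Ontopia,http://www.ontopia.net");
        System.out.println(expand("#org${id}", r));
        System.out.println(expand("${name}", r));
    }
}
```

For the first row of the table, the item identifier template #org${id} becomes #org1 and the topic name template becomes Ontopia.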
65
Creating associations
<relation name="people.csv" columns="id given family employer phone">
  <topic id="employer">
    <item-identifier>#org${employer}</item-identifier>
  </topic>
  <topic type="ex:person">
    <item-identifier>#person${id}</item-identifier>
    <topic-name>${given} ${family}</topic-name>
    <occurrence type="ex:phone">${phone}</occurrence>
    <player atype="ex:employed-by" rtype="ex:employee">
      <other rtype="ex:employer" player="#employer"/>
    </player>
  </topic>
</relation>
66
Value transformations
<relation name="SCHEMATA" columns="SCHEMA_NAME">
  <function-column name='SCHEMA_ID'
                   method='net.ontopia.topicmaps.db2tm.Functions.makePSI'>
    <param>${SCHEMA_NAME}</param>
  </function-column>
  <topic type="mysql:schema">
    <item-identifier>#${SCHEMA_ID}</item-identifier>
    <topic-name>${SCHEMA_NAME}</topic-name>
  </topic>
</relation>
67
Running DB2TM
• java net.ontopia.topicmaps.db2tm.Execute
  – command-line tool
  – also works with RDBMS topic maps
• net.ontopia.topicmaps.db2tm.DB2TM
  – API class to run transformations
  – methods "add" and "sync"
68
More information
• DB2TM User's Guide
  – http://www.ontopia.net/doc/current/doc/db2tm/user-guide.html
69
Synchronizing with other sources
TMSync
70
TMSync
• Configurable module for synchronizing one TM against another
  – define subset of source TM to sync (using tolog)
  – define subset of target TM to sync (using tolog)
  – the module handles the rest
• Can also be used with non-TM sources
  – create a non-updating conversion from the source to some TM format
  – then use TMSync to sync against the converted TM instead of directly against the source
71
How TMSync works
• Define which part of the target topic map you want,
• Define which part of the source topic map it is the master for, and
• The algorithm does the rest
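The slides do not describe the algorithm itself. Purely to illustrate what "the rest" has to achieve, the sketch below treats the two selected parts as sets of statements and computes what must be added to and removed from the target. This is a generic set-difference sync under that assumption, not TMSync's actual implementation; the class SyncSketch and the string form of the statements are made up.

```java
import java.util.Set;
import java.util.TreeSet;

class SyncSketch {

    // Statements the source (the master) has but the target lacks.
    static Set<String> toAdd(Set<String> source, Set<String> target) {
        Set<String> result = new TreeSet<>(source);
        result.removeAll(target);
        return result;
    }

    // Statements in the selected part of the target that the source no longer has.
    static Set<String> toRemove(Set<String> source, Set<String> target) {
        Set<String> result = new TreeSet<>(target);
        result.removeAll(source);
        return result;
    }
}
```

Statements outside the selected part of the target never enter either set, which is how manual enrichment of the target can survive a resync.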
72
If the source is not a topic map
• Simply do a normal one-time conversion
  – let TMSync do the update for you
• In other words, TMSync reduces the update problem to a conversion problem
[Diagram: source.xml, convert.xslt, TMSync]
73
The City of Bergen use case
[Diagram labels: LOS, Service, Unit, Person; City of Bergen, LOS, Norge.no]
74
Web service interface
TMRAP
75
TMRAP basics
• Abstract interface
  – that is, independent of any particular technology
  – coarse-grained operations, to reduce network traffic
• Protocol bindings exist
  – plain HTTP binding
  – SOAP binding
• Supports many syntaxes
  – XTM 1.0
  – LTM
  – TM/XML
  – custom tolog result-set syntax
76
get-topic
• Retrieves a single topic from the remote server
  – topic map may optionally be specified
  – syntax likewise
• Main use
  – to build client-side fragments into a bigger topic map
  – to present information about a topic on a different server
77
get-topic
• Parameters
  – identifier: a set of URIs (subject identifiers of wanted topic)
  – subject: a set of URIs (subject locators of wanted topic)
  – item: a set of URIs (item identifiers of wanted topic)
  – topicmap: identifier for topic map being queried
  – syntax: string identifying desired Topic Maps syntax in response
  – view: string identifying TM-Views view used to define fragment
• Response
  – topic map fragment representing topic in requested syntax
  – default is XTM fragment with all URI identifiers, names, occurrences, and associations
  – in default view types and scopes on these constructs are only identified by one <*Ref xlink:href="..."/> XTM element
  – the same goes for associated topics
78
get-topic-page
• Returns link information about a topic
  – that is, where does the server present this topic
  – mainly useful for realizing the portal integration scenario
  – result information contains metadata about server setup
79
get-topic-page
• Parameters
  – identifier: a set of URIs (subject identifiers of wanted topic)
  – subject: a set of URIs (subject locators of wanted topic)
  – item: a set of URIs (item identifiers of wanted topic)
  – topicmap: identifier for topic map being queried
  – syntax: string identifying desired Topic Maps syntax in response
• Response is a topic map fragment

[oks : tmrap:server = "OKS Samplers local installation"]
[opera : tmrap:topicmap = "The Italian Opera Topic Map"]
{opera, tmrap:handle, [[opera.xtm]]}
tmrap:contained-in(oks : tmrap:container, opera : tmrap:containee)
tmrap:contained-in(opera : tmrap:container, view : tmrap:containee)
tmrap:contained-in(opera : tmrap:container, edit : tmrap:containee)
[view : tmrap:view-page %"http://localhost:8080/omnigator/models/..."]
[edit : tmrap:edit-page %"http://localhost:8080/ontopoly/enter.ted?..."]
[russia = "Russia" @"http://www.topicmaps.org/xtm/1.0/country.xtm#RU"]
80
get-tolog
• Returns query results
  – main use is to extract larger chunks of the topic map to the client for presentation
  – more flexible than get-topic
  – can achieve more with less network traffic
81
get-tolog
• Parameters
  – tolog: tolog query
  – topicmap: identifier for topic map being queried
  – syntax: string identifying desired syntax of response
  – view: string identifying TM-Views view used to define fragment
• Response
  – if syntax is "tolog"
    • an XML representation of the query result
    • useful if the order of results matters
  – otherwise, a topic map fragment containing multiple topics is returned
    • as for get-topic
82
add-fragment
• Adds information to topic map on the server
  – does this by merging in a fragment
• Parameters
  – fragment: topic map fragment
  – topicmap: identifier for topic map being added to
  – syntax: string identifying syntax of request fragment
• Result
  – fragment imported into named topic map
83
update-topic
• Can be used to update a topic
  – add-fragment only adds information
  – update sets the topic to exactly the uploaded information
• Parameters
  – topicmap: the topic map to update
  – fragment: fragment containing the new topic
  – syntax: syntax of the uploaded fragment
  – identifier: a set of URIs (subject identifiers of wanted topic)
  – subject: a set of URIs (subject locators of wanted topic)
  – item: a set of URIs (item identifiers of wanted topic)
• Update happens using TMSync
84
delete-topic
• Removes a topic from the server
• Parameters
  – identifier: a set of URIs (subject identifiers of wanted topic)
  – subject: a set of URIs (subject locators of wanted topic)
  – item: a set of URIs (item identifiers of wanted topic)
  – topicmap: identifier for topic map being queried
• Result
  – deletes the identified topic
    • includes all names, occurrences, and associations
85
tolog-update
• Runs a tolog update statement
• Parameters
  – topicmap: topic map to update
  – statement: tolog statement to run
• Runs the statement & commits the change
86
HTTP binding basics
• The mapping requires a base URL
  – e.g. http://localhost:8080/tmrap/
• This is used to send requests
  – http://localhost:8080/tmrap/method?param1=value1&...
  – GET is used for requests that do not cause state changes
  – POST for requests that do
• Responses returned in response body
87
Exercise #1: Retrieve a topic
• Use the get-topic request to retrieve a topic from the server
  – base URL is http://localhost:8080/tmrap/
  – find the identifying URI in Omnigator
  – just print the retrieved fragment to get a look at it
• Note: you must escape the "#" character in URIs
  – otherwise it is interpreted as the anchor and not transmitted at all
  – escape sequence: %23
• Note: you must specify the topic map ID
  – otherwise results will only be returned from loaded topic maps
  – in other words: if the topic map isn't loaded, you get no results
88
Solution #1 (in Python)
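import urllib.request

BASE = "http://localhost:8080/tmrap/tmrap/"
# the "#" in the PSI is escaped as %23
psi = "http://www.topicmaps.org/xtm/1.0/country.xtm%23RU"

# written for Python 3 (urllib.request, print as a function)
with urllib.request.urlopen(BASE + "get-topic?identifier=" + psi) as inf:
    print(inf.read().decode("utf-8"))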
89
Solution #1 (response)
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
xmlns:xlink="http://www.w3.org/1999/xlink">
<topic id="id458">
<instanceOf>
<subjectIndicatorRef xlink:href="http://psi.ontopia.net/geography/#country"/>
</instanceOf>
<subjectIdentity>
<subjectIndicatorRef xlink:href="http://www.topicmaps.org/xtm/1.0/country.xtm#RU"/>
<topicRef xlink:href="file:/.../WEB-INF/topicmaps/geography.xtmm#russia"/>
</subjectIdentity>
<baseName>
<baseNameString>Russia</baseNameString>
</baseName>
</topic>
90
Processing XTM with XSLT
• This is possible, but unpleasant
  – the main problem is that the XML is phrased in terms of Topic Maps, not in domain terms
  – this means that all the XPath will talk about "topic", "association", ... and not "person", "works-for" etc
• The structure is also complicated
  – this makes queries complicated
  – for example, the XPath to traverse an association looks like this:

    //xtm:association
      [xtm:member[xtm:roleSpec / xtm:topicRef / @xlink:href = '#employer']
                 [xtm:topicRef / @xlink:href = concat('#', $company)]]
      [xtm:instanceOf / xtm:topicRef / @xlink:href = '#employed-by']
91
TM/XML
• Non-standard XML syntax for Topic Maps
  – defined by Ontopia (presented at TMRA'05)
  – implemented in the OKS
• XSLT-friendly
  – much easier to process with XSLT than XTM
  – can be understood by developers who do not understand Topic Maps
  – dynamic domain-specific syntaxes instead of generic syntax
  – predictable (can generate XML Schema from TM ontology)
92
TM/XML example
<topicmap ... reifier="tmtopic">
  <topicmap id="tmtopic">
    <iso:topic-name><tm:value>TM/XML example</tm:value></iso:topic-name>
    <dc:description>An example of the use of TM/XML.</dc:description>
  </topicmap>

  <person id="lmg">
    <iso:topic-name><tm:value>Lars Marius Garshol</tm:value>
      <tm:variant scope="core:sort">garshol, lars marius</tm:variant>
    </iso:topic-name>
    <homepage datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://www.garshol.priv.no</homepage>
    <created-by role="creator" topicref="tmtopic" otherrole="work"/>
    <presentation role="presenter">
      <presented topicref="tmxml"/>
      <event topicref="tmra05"/>
    </presentation>
  </person>
</topicmap>
93
tmphoto
• A topic map to organize my personal photos
  – contains ~15,000 photos
• A web gallery runs on Ontopia
  – on www.garshol.priv.no
[Diagram: topic types Photo, Person, Event, Category, Location]
http://www.garshol.priv.no/tmphoto/
94
tmtools
• An index of Topic Maps tools
  – organized as shown on the right
• Again, web application for browsing
  – screenshots below
[Diagram: topic types Software product, Person, Organization, Platform, Category, Technology]
http://www.garshol.priv.no/tmtools/
95
The person page
Boring! No content.
96
And in tmphoto...
97
get-illustration
• A web service in tmphoto
  – receives the PSI of a person
  – then automatically picks a suitable photo of that person
• Based on
  – vote score for photos,
  – categories (portrait),
  – other people in photo
  – ...
• The service returns
  – a topic map fragment with links to the person page and a few different sizes of the selected photo
http://www.garshol.priv.no/blog/183.html
98
get-illustration
[Diagram: tmtools asks tmphoto "Do you have a photo of http://psi.ontopedia.net/Benjamin_Bock ?" by requesting http://www.garshol.priv.no/tmphoto/get-illustration?identifier=http://psi.on.... ; tmphoto considers scores, categories, people in photo, ... and returns a topic map fragment]
99
Voila...
100
Points to note
• No hard-wiring of links
  – just add identifiers when creating people topics
  – photos appear automatically
  – if a better photo is added later, it's replaced automatically
• No copying of data
  – no duplication, no extra maintenance
• Very loose binding
  – nothing application-specific
• Highly extensible
  – once the identifiers are in place we can easily pull in more content from other sources
101
My blog
• Has more content about
  – people (tmphoto & tmtools),
  – events (tmphoto),
  – tools (tmtools),
  – technologies (tmtools)
• Should be available in those applications
102
Solution
• My blog posts are tagged
  – but the tags are topics, which can have PSIs
  – these PSIs are used in tmphoto and tmtools, too
• The get-topic-page request lets tmphoto & tmtools ask the blog for links to relevant posts
  – given identifiers for a topic, returns links to pages about that topic
http://www.garshol.priv.no/blog/145.html
103
get-topic-page
[Diagram: tmphoto asks the blog "Do you have pages about http://psi.ontopedia.net/TMRA_2008 ?" by requesting http://www.garshol.priv.no/blog/get-topic-page?identifier=http://psi.on.... ; the blog returns a topic map fragment with topics linking to individual blog posts]
104
In tmphoto
105
Making web applications
Navigator Framework
106
Ontopia Navigator Framework
• Java API for interacting with TM repository
• JSP tag library
  – based on tolog
  – kind of like XSLT in JSP with tolog instead of XPath
  – has JSTL integration
• Undocumented parts
  – web presentation components
  – some wrapped as JSP tags
  – want to build proper portlets from them
107
How it works
[Diagram: several browsers connect to a web server with a JSP container (e.g. Apache Tomcat); inside it, the JSP pages use the tag libraries, which use the Topic Map Engine, which holds the topic map]
108
The two tag libraries
• tolog
  – makes up nearly the entire framework
  – used to extract information from topic maps
  – lets you execute tolog queries to extract information from the topic map
  – looping and control flow structures
• template
  – used to create template pages
  – separates layout and structure from content
  – not Topic Maps-aware
  – optional, but recommended
109
How the tag libraries work
• The topic map engine holds a registry of topic maps
  – collected from the tm-sources.xml configuration file
  – each topic map has its own id (usually the file name)
• Each page also holds a set of variable bindings
  – each variable holds a collection of objects
  – objects can be topics, base names, locators, strings, ...
• Tags access variables
  – some tags set the values of variables, while others use them
110
Building a JSP page
• The <%@ taglib ... %> tags declare your tag libraries
  – Tells the page which tag library to include and binds it to a prefix
  – Prefixes are used to qualify the tags (and avoid name collisions)
• Use the <tolog:context> tag around the entire page
  – The "topicmap" attribute specifies the ID of the current topic map
• The first time you access the page in your browser the page gets compiled
  – If you modify the page then it will be recompiled the next time it is accessed
111
http://www.ontopia.net/operamap
112
Navigator tag library example
<%-- assume variable 'composer' is already set --%>
<p><b>Operas:</b><br/>
<tolog:foreach query="composed-by(%composer% : composer, $OPERA : opera),
                      { premiere-date($OPERA, $DATE) }?">
  <li>
    <a href="opera.jsp?id=<tolog:id var="OPERA"/>"><tolog:out var="OPERA"/></a>
    <tolog:if var="DATE">
      <tolog:out var="DATE"/>
    </tolog:if>
  </li>
</tolog:foreach></p>
113
Elmer Preview
114
116
117
Possible configuration
Application directories

webapps/
  myApp/
    *.jsp
    WEB-INF/
      web.xml
      config/      *.xml
      topicmaps/   *.xtm, *.ltm
  omnigator/
  i18n/
118
The navigator configuration files
• web.xml
  – where to find the other files, plus plug-ins
• tm-sources.xml
  – tells the navigator where to find topic maps
• log4j.properties
  – configuration of the log4j logging
• More details in the "Configuration Guide" document
119
More information
• Navigator Framework Configuration Guide
  – http://www.ontopia.net/doc/current/doc/navigator/config.html
• Navigator Framework Developer's Guide
  – http://www.ontopia.net/doc/current/doc/navigator/navguide.html
• Navigator Framework Tag Library Reference
  – http://www.ontopia.net/doc/current/doc/navigator/tolog-taglib.html
120
...
Automated classification
121
What is automated classification?
• Create parts of a topic map automatically
  – using the text in existing content as the source
  – not necessarily 100% automatic; user may help out
• A hard task
  – natural language processing is very complex
  – result is never perfect
• However, it's possible to achieve some results
122
Why automate classification?
• Creating a topic map requires intellectual effort
  – that is, it requires work by humans
• Human effort = cost
  – added value must be sufficient to justify the cost
  – in some cases either
    • the cost is too high, or
    • the value added is too limited
• The purpose of automation is to lower the cost
  – this increases the number of cases where the use of Topic Maps is justified
123
Automatable tasks
• Ontology
  – hard
  – depends on requirements
  – one time only
• Instance data
  – hard
  – usually exists in other sources
• Document keywords
  – easier
  – frequent operation
  – usually no other sources
[Diagram: types Project, Person, and Department with instances XYZ Project, Jane Doe, and IT group, connected by "worked on" and "employed in" associations]
124
Two kinds of categorization
• Broad categorization
  – categories are broadly defined
  – include many different subjects
• Narrow categorization
  – uses very specific keywords
  – each keyword is a single subject
Broad: Environment, Crisis management
Narrow: Water, Norway, drought, Drought Act, Cloud seeding, Morecambe Bay
125
What it does
• Extract keywords from content
– goal is to use these for classification
• Not entity recognition
– we only care about identifying what the content is about
• Uses statistical approach
– no attempt at full formal parsing of the text
126
Steps of operation
• Identify format
– then, extract the text
• Identify language
– then, remove stop words
– stem remaining words
• Classify
– can use terms from preexisting Topic Maps
– exploits knowledge of the language
• Return proposed keywords
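The steps above can be sketched in a few lines of Java. This is a toy illustration only, not the code in net.ontopia.topicmaps.classify: the class name KeywordSketch, the tiny stop-word list, and the plural-stripping "stemmer" are all assumptions made for the example, and it assumes Java 8 or later. It skips format and language detection and goes straight from plain English text to scored terms, with the most frequent term scaled to 1.0.

```java
import java.util.*;

/** Toy keyword extractor: stop-word removal, crude stemming, frequency scoring. */
class KeywordSketch {
    // Tiny illustrative stop-word list (assumption); real lists are longer and per-language.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "for", "in", "is", "of", "the", "to", "with"));

    /** Very crude stemmer (assumption): strips a plural "s". Real stemmers do far more. */
    static String stem(String word) {
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Returns stem -> score, where the most frequent stem scores 1.0. */
    static Map<String, Double> score(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter, then drop stop words and stem the rest.
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            counts.merge(stem(token), 1, Integer::sum);
        }
        int max = 0;
        for (int c : counts.values()) max = Math.max(max, c);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            scores.put(e.getKey(), e.getValue() / (double) max);
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = score(
            "Topic maps are a standard for metadata. Topic maps and metadata "
            + "work with a taxonomy of topics.");
        // Print terms from highest to lowest score.
        scores.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .forEach(e -> System.out.printf("%s %.2f%n", e.getKey(), e.getValue()));
    }
}
```

A real classifier would also weight terms by how unusual they are in the language and could boost terms that already exist as topic names; the sketch only counts occurrences.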
127
Example of keyword extraction
• topic maps 1.0
• metadata 0.57
• subject-based class. 0.42
• Core metadata 0.42
• faceted classification 0.34
• taxonomy 0.22
• monolingual thesauri 0.19
• controlled vocabulary 0.19
• Dublin Core 0.16
• thesauri 0.16
• Dublin 0.15
• keywords 0.15
128
Example #2
• Automated classification 1.0 5
• Topic Maps 0.51 14
• XSLT 0.38 11
• compound keywords 0.29 2
• keywords 0.26 20
• Lars 0.23 1
• Marius 0.23 1
• Garshol 0.22 1
• ...
129
So how could this be used?
• To help users classify new documents in a CMS interface
– suggest appropriate keywords, screened by user before approval
• Automate classification of incoming documents
– this means lower quality, but also lower cost
• Get an overview of interesting terms in a document corpus
– classify all documents, extract the most interesting terms
– this can be used as the starting point for building an ontology
– (keyword extraction only)
130
Example user interface
• The user creates an article
– this screen is then used to add keywords
– user adjusts the proposals from the classifier
131
Interfaces
• java net.ontopia.topicmaps.classify.Chew
– <topicmapuri>
– <inputfile>
– produces textual output only
• net.ontopia.topicmaps.classify.SimpleClassifier
– classify(uri, topicmap) -> TermDatabase
– classify(uri) -> TermDatabase
132
Supported formats and languages
Formats:
• XML (any schema)
• HTML (non-XML)
• PDF
• Word (.doc, .docx)
• PowerPoint (.ppt, .pptx)
• Plain text
Languages:
• English
• Norwegian
133
Visualization of Topic Maps
Vizigator
134
The Vizigator
• Graphical visualization of Topic Maps
• Two parts
– VizDesktop: Swing desktop app for configuration
– Vizlet: Java applet for web deployment
• Configuration stored in XTM file
135
The uses of visualization
• Not really suitable for navigation– doesn't work for all kinds of data
• Great for seeing the big picture
136
Without configuration
137
With configuration
138
VizDesktop
139
The Vizigator
• The Vizigator uses TMRAP
– the Vizlet runs in the browser (on the client)
– a fragment of the topic map is downloaded from the server
– the fragment is grown as needed
ServerTMRAP
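To make "the fragment is grown as needed" concrete, here is a minimal Java sketch of selecting everything within a given number of hops of a focus node. It is an illustration under stated assumptions, not the Vizlet's actual algorithm or the TMRAP protocol: the class LocalitySketch and its adjacency-map representation are made up, and reading the applet's locality parameter as a hop count is itself an assumption. It requires Java 8 or later.

```java
import java.util.*;

/** Toy illustration of growing a displayed fragment around a focus node. */
class LocalitySketch {
    /** Returns all nodes within `locality` hops of `start`, found breadth-first. */
    static Set<String> fragment(Map<String, List<String>> graph, String start, int locality) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(start);
        List<String> frontier = Collections.singletonList(start);
        for (int hop = 0; hop < locality; hop++) {
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                for (String neighbour : graph.getOrDefault(node, Collections.emptyList())) {
                    // Set.add returns true only for nodes not seen before.
                    if (seen.add(neighbour)) next.add(neighbour);
                }
            }
            frontier = next;
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical data: topics as nodes, associations as undirected edges.
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("Jane Doe", Arrays.asList("XYZ Project", "IT group"));
        graph.put("XYZ Project", Arrays.asList("Jane Doe", "John Smith"));
        graph.put("IT group", Arrays.asList("Jane Doe"));
        graph.put("John Smith", Arrays.asList("XYZ Project"));
        System.out.println(fragment(graph, "Jane Doe", 1));
        System.out.println(fragment(graph, "Jane Doe", 2));
    }
}
```

Raising the hop count by one adds only the next ring of neighbours, which is why a client can start small and fetch more from the server when the user expands a node.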
140
Embedding the Vizlet
• Set up TMRAP service
• Add ontopia-vizlet.jar
• Add necessary HTML
<applet code="net.ontopia.topicmaps.viz.Vizlet.class" archive="ontopia-vizlet.jar">
  <param name="tmrap" value="/omnigator/plugins/viz/">
  <param name="config" value="/omnigator/plugins/viz/config.jsp?tm=<%= tmid %>">
  <param name="tmid" value="<%= tmid %>">
  <param name="idtype" value="<%= idtype %>">
  <param name="idvalue" value="<%= idvalue %>">
  <param name="propTarget" value="VizletProp">
  <param name="controlsVisible" value="true">
  <param name="locality" value="1">
  <param name="max-locality" value="5">
</applet>
141
Topic Maps debugger
Omnigator
142
Omnigator
• Generic Topic Maps browser
– very useful for seeing what's in a topic map
– the second-oldest part of Ontopia
• Contains other features beyond simple browsing
– statistics
– management console
– merging
– tolog querying/updates
– export
143
Ontology designer and editor
Ontopoly
144
Ontopoly
• A generic Topic Maps editor, in two parts
– ontology editor: used to create the ontology and schema
– instance editor: used to enter instances based on ontology
• Features
– works with both XTM files and topic maps stored in RDBMS backend
– supports access control to administrative functions, ontology, and instance editors
– existing topic maps can be imported
– parts of the ontology can be marked as read-only, or hidden
145
Ontology designer
• Create ontology based on
– topic, association, name, occurrence, and role types
• Supports iterative ontology development
– modify and prototype the ontology until it's right
• Supports ontology annotation
– add fields to topic types, for example
• Supports views
– define restricted views of certain topic types
146
Instance editor
• Configured by the ontology editor
– shows topics as defined by the ontology
• Has several ways to pick associations
– drop-down list
– by search
– from hierarchy
• Avoids conflicts
– pages viewed by one user are locked to others
147
Ontopoly is embeddable
• The Ontopoly instance editor can be embedded
– basically, the main panel can be inserted into another web application
– uses an iframe
• Requires only ID of topic being edited
– can also be restricted to a specific view
• Makes it possible to build easier-to-use editors
– so users don't have to learn all of Ontopoly
148
Adding content features
CMS integrations
149
CMS integration
• The best way to add content functionality to Ontopia
– the world doesn’t need another CMS
– better to reuse those which already exist
• So far two integrations exist
– Escenic
– OfficeNet Knowledge Portal
– more are being worked on
150
Implementation
• A CMS event listener
– the listener creates topics for new CMS articles, folders, etc
– the mapping is basically the design of the ontology used by this listener
• Presentation integration
– it must be possible to list all topics attached to an article
– conversely, it must be possible to list all articles attached to a topic
– how close the integration needs to be here will vary, as will the difficulty of the integration
• User interface integration
– it needs to be possible to attach topics to an article from within the normal CMS user interface
– this can be quite tricky
• Search integration
– the Topic Maps search needs to also search content in the CMS
– can be achieved by writing a tolog plug-in
151
Articles as topics
• Goal: associate articles with topics
– mainly to say what they are about
– typically also want to include other metadata
• Need to create topics for the articles to do this
– in fact, a general CMS-to-TM mapping is needed
– must decide what metadata and structures to include
Example: “New city council appointed” is about “Elections”
152
Mapping issues
• Article topics
– what topic type to use?
– title becomes name? (do you know the title?)
– include author? include last modified? include workflow state?
– should all articles be mapped?
• Folders/directories/sections/...
– should these be mapped, too?
– one topic type for all folders/.../.../...?
– if so, use associations to connect articles to folders
– use associations to reproduce hierarchical folder structure
• Multimedia objects
– should these be included?
– what topic type? what name? ...
153
Two styles of mappings
Articles as articles
• Topic represents only the article
• Topic type is some subclass of “article”
• “Is about” association connects article into topic map
• Fields are presentational
– title, abstract, body
Articles as concepts
• Topic represents some real-world subject (like a person)
– article is just the default content about that subject
• Type is the type of the subject (person)
• Semantic associations to the rest of the topic map
– works in department, has competence, ...
• Fields can be semantic
– name, phone no, email, ...
154
Article as article
• Article about building of a new school
• Is about association to “Primary schools”
• Topic type is “article”
155
Article as concept
Article about a sports hall
Article really represents the hall
Topic type is “Location”
Associations to
– city borough
– events in the location
– category “Sports”
156
157
158
159
160
161
Ontopia/Liferay
• An integration with the Liferay CMS and portal is in progress– presented Friday 1130-1150 in Schiller 2
162
Two projects
Real-life usage
163
The project
• A new citizens’ portal for the city administration
– strategic decision to make portal main interface for interaction with citizens
– as many services as possible are to be moved online
• Big project
– started in late 2004, to continue at least into 2008
– ~5 million Euro spent by launch date
– 1.7 million Euro budgeted for 2007
– Topic Maps development is a fraction of this (less than 25%)
• Many companies involved
– Bouvet/Ontopia
– Avenir
– KPMG
– Karabin
– Escenic
164
Simplified original ontology
Externalresource
Category
Subject Department
Service
Employee
Borough
FormArticle
nearlyeverything
LOSService catalog
Payroll++
Escenic (CMS)
165
Data flow
OntopiaEscenic LOS
Fellesdata
Payroll(Agresso)Dexter/Extens Service
Catalog
DB2TM
TMSync
Ontopoly
Integration
166
Conceptual architecture
Ontopia Escenic
Application
Oracle Portal
Oracle Database
Datasources
167
The portal
168
Technical architecture
169
NRK/Skole
• Norwegian National Broadcasting (NRK)
– media resources from the archives
– published for use in schools
– integrated with the National Curriculum
• In production
– delayed by copyright wrangling
• Technologies
– OKS
– Polopoly CMS
– MySQL database
– Resin application server
170
Curriculum-based browsing (1)
Curriculum
Social studies
High school
171
Curriculum-based browsing (2)
Gender roles
172
Curriculum-based browsing (3)
Feminist movement in the 70s and 80s
Changes to the family in the 70s
The prime minister’s husband
Children choosing careers
Gay partnerships in 1993
173
One video (prime minister’s husband)
Metadata
Description
Subject
Person
Relatedresources
174
Conceptual architecture
Polopoly
Ontopia
MySQL
MediaDBGrep
RDBMS backend
DB2TMTMSync
HTTP
Editors
175
Implementation
• Domain model in Java
– Plain old Java objects built on Ontopia’s Java API and tolog
• JSP for presentation
– using JSTL on top of the domain model
• Subversion for the source code
• Maven2 to build and deploy
• Unit tests
176
What we’d like to see
The future
177
The big picture
Engine
Ontopoly
Portlet support
CMSintegration
Data integration
OKP
Escenic
A.N.other
A.N.other
OtherCMSs
DB2TM
XML2TM
A.N.other
A.N.other
Auto-class.
Taxon.import
Webservice
178
CMS integrations
• The more of these, the better
• Candidate CMSs
– Liferay (being worked on at Bouvet)
– Alfresco
– Magnolia
– Inspera
– JSR-170 Java Content Repository
– CMIS (OASIS web service standard)
179
Portlet toolkit
• Subversion contains a number of “portlets”
– basically, Java objects doing presentation tasks
– some have JSP wrappers as well
• Examples
– display tree view
– list of topics filterable by facets
– show related topics
– get-topic-page via TMRAP component
• Not ready for prime-time yet
– undocumented
– incomplete
180
Ontopoly plug-ins
• Plugins for getting more data from external sources
– TMSync import plugin
– DB2TM plugin
– Subj3ct.com plugin
– adapted RDF2TM plugin
– classify plugin
– ...
• Plugins for ontology fragments
– menu editor, for example
181
TMCL
• Now implementable
• We’d like to see
– an object model for TMCL (supporting changes)
– a validator based on the object model
– Ontopoly import/export from TMCL (initially)
– refactor Ontopoly API to make it more portable
– Ontopoly ported to use TMCL natively (eventually)
182
Things we’d like to remove
• OSL support
– Ontopia Schema Language
• Web editor framework
– unfortunately, still used by some major customers
• Fulltext search
– the old APIs for this are not really of any use
183
Management interface
• Import topic maps (to file or RDBMS)
184
What do you think?
• Suggestions?
• Questions?
• Plans?
• Ideas?
185
Setting up the developer environment
Getting started
186
If you are using Ontopia...
• ...simply download the zip, then
– unzip,
– set the classpath,
– start the server, ...
• ...and you’re good to go
187
If you are developing Ontopia...
• You must have
– Java 1.5 (not 1.6 or 1.7 or ...)
– Ant 1.6 (or later)
– Ivy 2.0 (or later)
– Subversion
• Then
– check out the source from Subversion
svn checkout http://ontopia.googlecode.com/svn/trunk/ ontopia-read-only
– ant bootstrap
– ant dist.jar.ontopia
– ant test
– ant dist.ontopia
188
Beware
• This is fun, because
– you can play around with anything you want
– e.g., my build has a faster TopicIF.getRolesByType
– you can track changes as they happen in svn
• However, you’re on your own
– if it fails it’s kind of hard to say why
– maybe it’s your changes, maybe not
• For production use, official releases are best
189
Participating etc
The project
190
Our goal
• To provide the best toolkit for building Topic Maps-based applications
• We want it to be
– actively maintained,
– bug-free,
– scalable,
– easy to use,
– well documented,
– stable,
– reliable
191
Our philosophy
• We want Ontopia to provide as much useful more-or-less generic functionality as possible
• New contributions are generally welcome as long as
– they meet the quality requirements, and
– they don’t cause problems for others
192
The sandbox
• There’s a lot of Ontopia-related code which does not meet those requirements
– some of it can be very useful,
– someone may pick it up and improve it
• The sandbox is for these pieces
– some are in Ontopia’s Subversion repository,
– others are maintained externally
• To be “promoted” into Ontopia a module needs
– an active maintainer,
– to be generally useful, and
– to meet certain quality requirements
193
Communications
• Join the mailing list(s)!
– http://groups.google.com/group/ontopia
– http://groups.google.com/group/ontopia-dev
• Google Code page
– http://code.google.com/p/ontopia/
– note the “updates” feed!
• Blog
– http://ontopia.wordpress.com
• Twitter
– http://twitter.com/ontopia
194
Committers
• These are the people who run the project
– they can actually commit to Subversion
– they can vote on decisions to be made etc
• Everyone else can
– use the software as much as they want,
– report and comment on issues,
– discuss on the mailing list, and
– submit patches for inclusion
195
How to become a committer
• Participate in the project!
– that is, get involved first
– let people get to know you, show some commitment
• Once you’ve gotten some way into the project you can ask to become a committer
– best if you have provided some patches first
• Unless you’re going to commit changes there’s no need to be a committer
196
Finding a task to work on
• Report bugs!
– they exist; if you find any, please report them
• Look at the open issues
– there is always testing/discussion to be done
• Look for issues marked “newbie”
– http://code.google.com/p/ontopia/issues/list?q=label:Newbie
• Look at what’s in the sandbox
– most of these modules need work
• Scratch an itch
– if there’s something you want fixed/changed/added...
197
How to fix a bug
• First figure out why you think it fails
• Then write a test case
– based on your assumption
– make sure the test case fails (test before you fix)
• Then fix the bug
– follow the coding guidelines (see wiki)
• Then run the test suite
– verify that you’ve fixed the bug
– verify that you haven’t broken anything
• Then submit the patch
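The "test before you fix" step is worth seeing in miniature. The sketch below is not Ontopia code: stripFragment is a made-up helper and the bug is invented for the example. It shows a regression test that fails against the buggy version (which throws when an address has no '#') and passes against the fix. A real patch would put the test in the test suite rather than in a boolean helper. It requires Java 8 or later.

```java
import java.util.function.UnaryOperator;

/** Test-first bug fix in miniature; stripFragment is a made-up helper, not Ontopia code. */
class BugFixSketch {
    /** Buggy original: indexOf returns -1 when there is no '#', so substring throws. */
    static String stripFragmentBuggy(String address) {
        return address.substring(0, address.indexOf('#'));
    }

    /** Fixed version: addresses without a fragment are returned unchanged. */
    static String stripFragment(String address) {
        int hash = address.indexOf('#');
        return hash < 0 ? address : address.substring(0, hash);
    }

    /** The regression test, written before the fix; returns true if it passes. */
    static boolean testNoFragment(UnaryOperator<String> impl) {
        try {
            return "http://example.org/tm".equals(impl.apply("http://example.org/tm"));
        } catch (StringIndexOutOfBoundsException e) {
            return false; // this is the reported bug
        }
    }

    public static void main(String[] args) {
        System.out.println("before fix: " + testNoFragment(BugFixSketch::stripFragmentBuggy));
        System.out.println("after fix:  " + testNoFragment(BugFixSketch::stripFragment));
    }
}
```

Seeing the test fail first confirms that it really exercises the bug; a test that passes both before and after the change proves nothing about the fix.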
198
The test suite
• Lots of *.test packages in the source tree
– 3795 test cases as of right now
– test data in ontopia/src/test-data
– some tests are generators based on files
– some of the test files come from cxtm-tests.sf.net
• Run with
– ant test
– java net.ontopia.test.TestRunner src/test-data/config/tests.xml test-group
199
Source tree structure
• net.ontopia.
– utils: various utilities
– test: various test support code
– infoset: LocatorIF code + cruft
– persistence: OR-mapper for RDBMS backend
– product: cruft
– xml: various XML-related utilities
– topicmaps: next slides
200
Source tree structure
• net.ontopia.topicmaps.
– core: core engine API
– impl: engine backends + utils
– utils: utilities (see next slide)
– cmdlineutils: command-line tools
– entry: TM repository
– nav + nav2: navigator framework
– query: tolog engine
– viz
– classify
– db2tm
– webed: cruft
201
Source tree structure
• net.ontopia.topicmaps.utils
– *: various utility classes
– ltm: LTM reader and writer
– ctm: CTM reader
– rdf: RDF converter (both ways)
– tmrap: TMRAP implementation
202
Let’s write some code!
203
The engine
• The core API corresponds closely to the TMDM
– TopicMapIF, TopicIF, TopicNameIF, ...
• Compile with
– ant init compile.ontopia
– .class files go into ontopia/build/classes
– ant dist.jar.ontopia # makes a jar
204
The importers
• Main class implements TopicMapReaderIF
– usually, this lets you set up configuration, etc
– then uses other classes to do the real work
• XTM importers
– use an XML parser
– main work done in XTM(2)ContentHandler
– some extra code for validation and format detection
• CTM/LTM importers
– use Antlr-based parsers
– real code in ctm.g/ltm.g
• All importers work via the core API
205
Find an issue in the issue tracker
• Picking one labelled “Newbie” might be good
– but isn’t necessary
• Get set up
– check out the source code
– build the code
– run the test suite
• Then dig in
– we’ll help you with any questions you have
• At the end, submit a patch to the issue tracker
– remember to use the test suite!