2008 © Martin Dzbor, 34th SofSem Conf., Slovakia
Best of Both
“Using Semantic Web Technologies to Enrich User Interaction with the Web, and Vice-Versa”
Martin Dzbor
Knowledge Media Institute, The Open University (UK)
23 Jan 2008
Outline
Motivation, gaps in current tools
Value for the users
  Exposing implicit semantics of legacy and public data
  Taking advantage of (semantic) data redundancy
  Revyu.com case: linking open data project
  Watson case: gateway capable of analyzing and finding SW data
  PowerMagpie case: bringing implicit semantics to the user
  User interaction case: revisiting familiar GUI-s with semantics
Wrap up, next generation semantic web tools
The Web and Meaning
Size of the information pool can be intimidating
  2000: 7 million unique sites [OCLC report]
  2005: 11-19.5 billion documents [Gulli, Signorini + Yahoo!]
  2007: ~30 billion pages + 1 billion users [Netcraft report]
It’s not the pages that carry the bulk of meaning
  The number of facts and assertions is many times larger
  Effect of large and complex systems applies
  Meaning of a page ≠ sum of meanings of embedded facts
Even more meaning is in links
  Links and relations pose substantial challenges
Semantic View
[Figure: concept map connecting Publications, Sources, Centres, Projects, Languages, Atomic Concepts, Authors, Technologies and Research Issues. Relation labels: co-occur, publish, discussed_in, situated_in, expert_in, implemented_in, has_src, relates_to, criticizes, coauthor, investigates, has_key, active_in, researched_by. Class labels: xyz:Author, foaf:Person, abc:Institute, abc:University, skos:Document, dolce:Activity, prj:Task, skos:Language, xyz:Relation. Example assertions: OWL <extends> XML; OWL <is_a> Markup_Lang; ‘Class’ represents a collection of entities.]
Meaning and Interpretation
Constructivist view of knowledge on the Web
  “Most of our intelligent behaviour relies on the capability to see and make connections.” [Vannevar Bush, 1946]
Yet connections are subjective
  Meaning on the Web thus arises in the eyes of the user, the reader
  Fact-based knowledge retrieval does not necessarily match the established meaning
[Figure: a document with the terms ‘truth’ and ‘holocaust’, but not ‘the truth’]
Beyond Retrieval Queries
Human task is typically much more than a query
  Few activities we carry out can be directly & uniquely translated to (formal) queries…
  Often, multiple queries need to be connected & data from them interpreted, contextualized…
Interpretations are often imprecise
  Queries are hard to (re)formulate & expand by the users
Where Semantic Web can help
  Embed initial queries into potential exploratory paths
  Don’t just respond to the queries
  Give alternatives, suggest what next/else can be done, and why/why not
Comes Semantic Web
“Semantic Web is a Web of data” [Berners-Lee 2001]
  Actually, of connected and exposed data… [Altova.com]
  Ideally, of interchangeable data… [W3C SW Activity]
Where is the meaning?
  Interchange enabled by committing data/facts to the same thing
Our interests
  Expose connections and commitments also in places where they are so far implicit and hidden,…
  …using the existing web content as an enabling asset rather than something intimidating,…
  …to support ordinary users in exploring and effectively making sense of this vast information space.
From Motivation to Strategy
Key differences from other similar work:
  Limited manual handcrafting: aiming for scale and automated KA
  Designing for users: not only support knowledge sharing but also doing something, using that knowledge
Software development approach
  Based on formative evaluations (w/real users)
  Tapping into legacy data sources, often DB-s (e.g. DBLP)
  Using information extraction techniques to enrich and validate potential meanings of gleaned data (e.g. Corder, ExpertSearch)
  Exposing the semantics of (hypothesized and validated) relations rather than individual ‘tags’
Sample Findings from One Study
Product: ASPL [KnowledgeWeb, Magpie projects]
  Web-based platform and plug-in to support learners on the Web to perform knowledge-intensive analyses in a given domain
Some key findings we had to address:
  Users not keen on ‘declarative semantics’
    e.g. showing query results is insufficient; users wanted to know what (else) can be done with results, how to use them to learn something
  Users (esp. more experienced) expect the semantic system to ‘know the domain’
    e.g. individuals are often better characterized by the research communities they belong to, by abstraction
  Resource finding and retrieval are not ‘selling points’
    e.g. there are tools finding information more efficiently (e.g. Google); value is in supporting exploratory, interactive and customizable interaction
Example: generalizations and abstractions
  Interpret aggregations over a simple property in DBLP to formalize semantically richer relationships; e.g.:
    Research community membership,
    Expertise and leadership in a particular research area, etc.
  Based on information retrieval but the automated composition of partial findings provides richer means to navigate/explore
  Opportunity to conceptualize results as new knowledge assertions
    Going beyond mere retrieval/search …
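
The aggregation idea above can be sketched roughly as follows. This is a minimal illustration, not ASPL's actual rules: the records, the `member_of_community` relation name and the two-paper threshold are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical DBLP-like records: (author, venue). Invented for illustration.
records = [
    ("alice", "ISWC"), ("alice", "ISWC"), ("alice", "EKAW"),
    ("bob", "ISWC"), ("bob", "SIGMOD"), ("bob", "SIGMOD"),
]

def community_assertions(records, min_papers=2):
    """Aggregate venue counts per author and emit membership triples."""
    per_author = defaultdict(Counter)
    for author, venue in records:
        per_author[author][venue] += 1
    triples = []
    for author in sorted(per_author):
        for venue, n in sorted(per_author[author].items()):
            if n >= min_papers:  # assumed threshold, not taken from the talk
                triples.append((author, "member_of_community", venue))
    return triples

assertions = community_assertions(records)
print(assertions)
```

Each emitted triple is a new assertion derived from a plain count, which is the sense in which the result goes beyond retrieval.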
Positioning Semantic Tools
[Figure: three position charts placing Masque, Blinkx, Google, TextDigger, Precise, AquaLog, AskNow, Ask, Hakia, Ilqua and ASPL/DBLP++]
Position Chart 1: Result ranking function (sources, style)
  Axis labels: semantic relevance; popularity, statistics, etc.; none
  Axis labels: full automation; automated + user choice; manual + interactive
Position Chart 2: Explanatory function in post processing (technique, style)
  Axis labels: ordering; classification + clustering; visual clustering + labelling; summaries
  Axis labels: automatically embedded explanations; explanation upon request; explanation not present
Position Chart 3: Query formulation (support, means)
  Axis labels: full support; partial support; limited use
  Axis labels: keywords; phrases; NL sentences; examples
Rich GUI to ASPL content
Faceted DBLP (http://dblp.l3s.de)
Facets shown correspond to the same data as presented by ASPL in different screens/services; here they also act as query modifiers
Data record enables further navigation as a means to query refinement and actual data access (e.g. PDF, BibTeX, DOI)
Modularity + Semantics Sustainability
All produced user end points remain after the end of the project that funded their development
  Hosted by their institutions and/or ongoing projects
  REASE and other RDF content is interesting for W3C SWEO for education and outreach purposes
ASPL functionalities continue to be extended
  E.g. in the context of an independent collaboration between OU and FAO’s Knowledge Systems Division
  Essentially, the entire ASPL ‘pipework’ can be reused, only the domain ontology has to reflect FAO needs
  Further use of the ASPL technology currently explored (e.g. bioinformatics, biological pathogens, etc.)
Magpie/ASPL in practice
  annotations equal to user choosing ontological view
  FAO’s Agrovoc layered over an (arbitrary) web page
  semantic browsing intertwined with ‘classic’ browsing and shown as ‘taggings’
[Figure legend: semantic proximity link; Web link]
Semantics of Redundancy
Reusing (often non-semantic) data to produce semantic annotations and interchangeability
  Previous examples reuse non-semantic data (SQL DB)
  While data content in DB is not semantic, certain composite queries have a well-defined semantic interpretation
  Hence, such queries act as if they were feeding semantic annotations onto the singular Web resources
More importantly, this approach to exposing semantics takes advantage of the Web nature
  Information on the Web is captured redundantly
  Law of large numbers & statistical correlations
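
To make the point about composite queries concrete, here is a small sketch using an in-memory SQLite database. The schema and rows are hypothetical; the only claim is that a self-join over plain relational data yields rows that can be read as a `coauthor` relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE paper  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (paper_id INTEGER, name TEXT);
    INSERT INTO paper  VALUES (1, 'Paper A'), (2, 'Paper B');
    INSERT INTO author VALUES (1, 'alice'), (1, 'bob'), (2, 'alice'), (2, 'carol');
""")

# Composite query: two different authors attached to the same paper.
# The tables hold no semantics, but the result reads as a typed relation.
rows = conn.execute("""
    SELECT DISTINCT a.name, b.name
    FROM author AS a JOIN author AS b ON a.paper_id = b.paper_id
    WHERE a.name < b.name
    ORDER BY a.name, b.name
""").fetchall()

coauthor_triples = [(x, "coauthor", y) for x, y in rows]
print(coauthor_triples)
```

The query itself carries the interpretation, so the database content can stay untouched.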
Same Idea in Linking Open Data
Some information is captured redundantly in several places
Use it as a standard ‘JOIN’ in SQL queries…
Say that the two statements are about the same thing…
…which gives access to additional information from (e.g.) specialized data sets
Handy in:
  eliminating the eternal bane of data sharing = form filling
  seeding the data input forms with ‘obvious knowledge’
Source: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
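
The ‘JOIN on sameness’ idea can be sketched as below. The identifiers and records are invented, and the list of pairs stands in for `owl:sameAs` statements in RDF; this is an illustration of the principle, not the Linking Open Data tooling.

```python
# Two hypothetical data sets describing the same book under different identifiers.
reviews = {"revyu:item42": {"rating": 5}}
catalogue = {"shop:isbn-123": {"title": "Example Book", "publisher": "Example Press"}}

# Statements that two identifiers denote the same thing (cf. owl:sameAs in RDF).
same_as = [("revyu:item42", "shop:isbn-123")]

def merge_linked(left, right, links):
    """Join two data sets on identifiers declared to be the same."""
    merged = {}
    for left_id, right_id in links:
        if left_id in left and right_id in right:
            # Values from the left record win if both sides define a key.
            merged[left_id] = {**right[right_id], **left[left_id]}
    return merged

linked = merge_linked(reviews, catalogue, same_as)
print(linked)
```

A form that only asks for a rating can then be pre-filled with the title and publisher from the linked record.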
Revyu.com in Linking Open Data
Linked data gleaned from (e.g.) Amazon
User’s data from a minimalist data form
Expressing the sameness in formal RDF
Returning the mash-up back (to the Web)
Semantic data disguised into a folksonomy
Novelty vs. Familiarity
Novelty of technologies like Semantic Web has drawbacks
  Hard to sustain over a longer period
  Creates resistance to the proposed change
Technology (Semantic Web) is not the sole new thing the user has to cope with!
  Many tools assume new user roles
  Many tools assume new interaction modalities
Try a different view…
  Instead of pushing new technology, we need to improve the overall user experience
  ‘How can Semantic Web make the task I’m doing different = easier, faster, simpler,…’
Getting Hold of Semantic Web
[Figure labels: NG SW Application; Semantic Web; Smart Feature]
New applications need to exploit SW at large
  Dynamically retrieving relevant semantic data
  Combining several, heterogeneous models (ontologies)
Need tools and infrastructures to efficiently access the knowledge available on SW: a Gateway…
Why Gateway?
Functionality beyond discovering, indexing, and retrieving is necessary
  Because of heterogeneity in terms of data quality…
  Because of heterogeneity in terms of data coverage…
  Because of a substantial degree of knowledge duplication…
Watson Case: http://watson.kmi.open.ac.uk
Analyzing Semantic Content
Have to deal with heterogeneity; great variety in:
  Size
  Coverage
  Richness
  Etc.
Watson Architecture
[Figure: Watson architecture. Stages: Collecting, Analyzing, Querying. Components: Crawling (discovers content on the WWW), Parsing (Jena), Validation/Analysis and Indexing, which populate the repository (URLs, metadata, indexes); Keyword Search, SPARQL Query and Ontology Exploration, which send queries to it.]
Collecting and Analyzing Knowledge
Web and DB retrieval techniques crawl through data repositories and pages with semantic content
  E.g. in October 2007 Watson collected tens of thousands of semantic documents
  That represents millions of RDF entities (most of them being instances)
Yet, in terms of models…
  Conceptually ‘same’ data often occur in numerous duplicates and near duplicates
  Which affects reliability of reasoning
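
One simple way to spot near duplicates, shown here only as a sketch, is to compare documents as sets of triples with Jaccard similarity. The talk does not say how Watson detects duplicates; the documents, the measure and the 0.6 threshold are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical semantic documents, each reduced to a set of triples.
docs = {
    "doc1": {("Human", "subClassOf", "Mammal"), ("Human", "has", "Name")},
    "doc2": {("Human", "subClassOf", "Mammal"), ("Human", "has", "Name"),
             ("Human", "has", "Age")},
    "doc3": {("Tuna", "subClassOf", "Fish")},
}

def near_duplicates(docs, threshold=0.6):
    """Return pairs of document ids whose triple sets overlap above the threshold."""
    ids = sorted(docs)
    return [(x, y) for i, x in enumerate(ids) for y in ids[i + 1:]
            if jaccard(docs[x], docs[y]) >= threshold]

pairs = near_duplicates(docs)
print(pairs)
```

Here `doc1` and `doc2` share two of three distinct triples, so they are flagged, while `doc3` is unrelated.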
A Gateway to the Semantic Web
Watson & Multiple Ontologies in Use
In rapid (ontology) prototyping and modelling
Near-duplicate models of a term (e.g. ‘Human’)
Chunk of a model around selected node reused by acknowledging its redundancy
Thus, new ontology created from reused fragments of existing (often tested) models:
  Engineering process is much faster
  The outcome is a less error-prone model
Watson & Multiple Ontologies in Use
Embedding semantics into ordinary web pages, plain text, and other content currently without it
Purpose
  Impose a particular interpretative frame onto a web page to bias its interpretation
  Highlight conceptual entities that are key in a particular context
  Enable user navigation and browsing in (a part of) Semantic Web knowledge space
Case: my Magpie framework [Dzbor et al. 2007 in JWS]
  Usually, I draw people’s attention to what can be done with it
  But today let’s look at limitations
Magpie vs. Semantic Web Browsing
Magpie:
  User picks one ontology to annotate pages
    Precision: ; recall: (?)
  Annotated entities carry one meaning only
    E.g. Virus Comp_Program
  Entity-based approach to annotating only
    Usually instances like ‘OWL’ as members of categories
    Visual presentation limited to entity highlighting
Semantic Web browsing:
  System offers a range of ontologies applicable
    Let users balance P/R
  In reality, it’s useful to see and use also other senses
    E.g. what if Virus Organism
  Sometimes other views are better in a specific context
    Concept senses view
    Ontology network view
    Topic view, etc.
Towards PowerMagpie
[Figure: PowerMagpie cycle. Web page: calculate page ‘fingerprint’ (characteristics of the web page); use ‘fingerprint’ to discover SW content. Relevant SW content: enrich by redundancy; rank, filter,…; customize presentation; visualize in GUI-s; send to user. User: let user select; browse via semantic links; adapt ‘fingerprint’; improve SW content retrieval.]
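
The ‘fingerprint’ step might be approximated as below. This is a rough sketch under stated assumptions: the fingerprint is taken to be the most frequent terms on a page, and ontologies are ranked by label overlap. Neither choice is described in the talk, and `fingerprint` and `rank_ontologies` are hypothetical names, not PowerMagpie functions.

```python
import re
from collections import Counter

def fingerprint(text, top_n=5):
    """Approximate a page 'fingerprint' as its most frequent lowercase terms."""
    terms = re.findall(r"[a-z]+", text.lower())
    return {term for term, _ in Counter(terms).most_common(top_n)}

def rank_ontologies(page_fp, ontologies):
    """Rank ontologies by how many of their labels appear in the fingerprint."""
    scores = {name: len(page_fp & labels) for name, labels in ontologies.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))

# Hypothetical page text and ontology label sets.
page = "tuna fishing tuna habitat fishing quota tuna"
ontologies = {
    "fisheries": {"tuna", "fishing", "quota"},
    "computing": {"virus", "program"},
}
ranking = rank_ontologies(fingerprint(page), ontologies)
print(ranking)
```

User selections could then adjust the fingerprint, for example by adding or removing terms, before the next retrieval round.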
Selecting Multiple Ontologies
[Figure: semantic content arranged around a core ontology, with the way each kind is identified]
core ontology
ontology extension: by declaration, e.g. Carnivor Animal & eats.Meat
different ont. frames: by inconsistency, e.g. YellowFin AtlanticHabitat
ontology specialization: by reference, e.g. Albacore Tuna
analogous ontology: by mapping, e.g. Contamination Polution
other relevant semantic data: by classification, e.g. SoilAcidification
known irrelevant semantic data: by classification, e.g. Baltic
Example: Entities in Ontology Networks
None of the following views is ‘typically ontological’
  Entities are presented in a more familiar ‘tag style’
  Node positions reflect semantic proximity, similarity, ‘sameness’
These are truly from one ontology: proximity = ontological distance
These are collated from multiple sources: proximity = repetitive contextual co-occurrence
Source: Cipher project, 2005
Source: NeOn project, 2008
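
Proximity by repeated contextual co-occurrence can be illustrated with a simple pair count. The contexts below are invented, and how counts would map to node positions in a view is left open; this only shows the counting step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical contexts (e.g. pages or ontology fragments) and the entities in them.
contexts = [
    {"Tuna", "Albacore", "Habitat"},
    {"Tuna", "Albacore"},
    {"Tuna", "Contamination"},
]

def cooccurrence_counts(contexts):
    """Count in how many contexts each unordered pair of entities co-occurs."""
    counts = Counter()
    for entities in contexts:
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(contexts)
print(counts[("Albacore", "Tuna")])
```

A pair that co-occurs in more contexts would be drawn closer together than one seen only once.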
Example: Selecting Ontologies
A concept occurring in more than one ontology: redundancy of occurrence
A concept occurring in more than one role, sense: redundancy of meaning
A concept from topically more distant ontology: divergence into new frame
Statistics of the ontology or entity: provenance of information
Versions of the same ontology discovered by PowerMagpie: temporality of occurrence
PowerMagpie analyzes a web page, proposes and justifies relevant entities
  Supporting divergent navigation
  Supporting time snapshots
  Acknowledging multiple meanings
  Exposing redundancy of occurrences
  Etc.
Next step: improve the user interaction, GUI
  Making the ontology-driven interaction more serendipitous, natural and embedded in standard browsing
Where is ‘Best of Both’?
Essentially in two contradictory properties:
  Ontologies are expressive, well-defined
  But they are fairly sparse in terms of content
It’s the sparseness that makes meaningful browsing and navigation difficult
  In any single ontology we are merely performing graph/tree navigation = often falling into the ‘closed world assumption’
Semantic Web affords more flexibility
  It may not enable us to tell which sense of a term we see
  But it is sufficiently connected to enable us to tell the difference between the senses:
    On the level of ontology networks but also individuals
Comparing SW and KBS
Why should it work now if it hasn’t in the past?
There are some key changes in play:
Property          Classic KBS    SW Systems
Representation    'Clean'        'Good Enough'
Size              Small/Medium   Extra Huge
Repr. Schema      Homogeneous    Heterogeneous
Quality           High           Very Variable
Degree of trust   High           Very Variable
Key Paradigm Shift
Intelligence
  Classic KBS: a function of sophisticated, task-centric problem solving
  SW Systems: a side-effect of size and heterogeneity (Collective Intelligence)
Is due to information and data redundancy
  There are not only numerous documents - with little formal semantic structure, but also…
  numerous formal take-ons trying to conceptualize user views
How Far Are We?
Solutions working in a new dynamic context (run-time rather than design-time)
Example: Ontology Mapping
  So far: mostly design-time mapping of (two) complete ontologies
  Mapping many partial, incomplete ontologies, ontological modules?
Example: Ontology Selection
  So far: largely by querying, user-mediated ontology retrieval
  Selecting networks of non-contradictory partial ontologies?
Example: Ontology Modularization
  So far: by and large has the user in the loop, consistency-driven
  Many diverse drivers (access right, trust, scale, summarization,…)
The context of the above tasks is changing
Next Generation SW Tools
Do away with singular data sources
  Incl. ontologies, classification trees, maps,…
For them ontology becomes a dynamic notion
  Ontology = a selection of modules appropriate to a particular context, situation, user, task,…
New challenges arise
  Discovering semantic content and relations in it
  Modularizing large sources of semantic content
  Selecting (parts of) semantic content or ontologies
  Supporting user interaction on such a large scale
  Etc.
Some Reading
Dzbor, M. - Motta, E. - Domingue, J.B. (2007). Magpie: Experiences in supporting Semantic Web browsing. Journal of Web Semantics, Vol.5, No.3, pp.204-222. Elsevier Publishers, The Netherlands.
Dzbor, M. - Motta, E. (2007). Semantic Web Technology to Support Learning about the Semantic Web. In 13th Intl. Conf. on Artificial Intelligence in Education (AIED). July 2007, California, US.
Sabou, M. - Lopez, V. - Motta, E. (2006). Ontology Selection for the Real Semantic Web: How to Cover the Queen’s Birthday Dinner? Proc. of the EKAW 2006 Conf., Podebrady, Czech Republic.
D'Aquin, M. - Sabou, M. - Motta, E. (2006). Modularization: A key for the dynamic selection of relevant knowledge components. ISWC 2006 Workshop on Ontology Modularization, Georgia, US.
Motta, E. (2006). Knowledge Publishing and Access on the Semantic Web: A Socio-Technological Analysis. IEEE Intelligent Systems, Vol.21, No.3, pp.88-90. IEEE Press, US.
Acknowledgements and Web Sites
Work presented has been developed in the context of the following projects and activities:
  Magpie (info, demo, download): http://kmi.open.ac.uk/projects/magpie
  PowerMagpie (info, demo, download): http://powermagpie.open.ac.uk
  Watson (info, UI, API download): http://watson.kmi.open.ac.uk
  NeOn project: http://www.NeOn-project.org
  OpenKnowledge project: http://www.openk.org
  Papers cited and personal pages: http://kmi.open.ac.uk
Acknowledging funding from European Commission’s Framework 6 (NeOn & OpenKnowledge), EPSRC and NERC
Also thanks to Laurian Gridinoc, Enrico Motta, Joerg Diederich, etc. for their input to some of the ideas presented