2008 © Martin Dzbor, 34th SofSem Conf., Slovakia

Best of Both

“Using Semantic Web Technologies to Enrich User Interaction with the Web, and Vice-Versa”

Martin Dzbor

Knowledge Media Institute, The Open University (UK)

23 Jan 2008

Slide 2

Outline

Motivation, gaps in current tools

Value for the users
  Exposing implicit semantics of legacy and public data

Taking advantage of (semantic) data redundancy
  Revyu.com case: linking open data project
  Watson case: gateway capable of analyzing and finding SW data
  PowerMagpie case: bringing implicit semantics to the user
  User interaction case: revisiting familiar GUI-s with semantics

Wrap up, next generation semantic web tools

Slide 3

The Web and Meaning

Size of the information pool can be intimidating
  2000: 7 million unique sites [OCLC report]
  2005: 11-19.5 billion documents [Gulli, Signorini + Yahoo!]
  2007: ~30 billion pages + 1 billion users [Netcraft report]

It’s not the pages that carry the bulk of meaning
  The number of facts and assertions is many times larger
  Effect of large and complex systems applies
  Meaning of a page ≠ sum of meanings of embedded facts

Even more meaning is in links
  Links and relations pose substantial challenges

[Figure: Semantic View. Node labels: Publications, Sources, Centres, Projects, Languages, Atomic Concepts, Authors, Technologies, Research Issues. Untyped link labels: co-occur, publish. Typed relations: discussed_in, situated_in, expert_in, implemented_in, has_src, relates_to, criticizes, coauthor, investigates, has_key, active_in, researched_by. Type annotations: xyz:Author, foaf:Person, abc:Institute, abc:University, skos:Document, skos:Language, xyz:Relation, dolce:Activity, prj:Task. Sample assertions: OWL <extends> XML; OWL <is_a> Markup_Lang; ‘Class’ represents a collection of entities.]

Slide 5

Meaning and Interpretation

Constructivist view of knowledge on the Web
  “Most of our intelligent behaviour relies on the capability to see and make connections.” [Vannevar Bush, 1946]

Yet connections are subjective
  Meaning on the Web thus arises in the eyes of the user, the reader
  Fact-based knowledge retrieval does not necessarily match the established meaning

[Figure: a document with the terms ‘truth’ and ‘holocaust’, but not ‘the truth’]

Slide 6

Beyond Retrieval Queries

Human task is typically much more than a query
  Few activities we carry out can be directly & uniquely translated to (formal) queries…
  Often, multiple queries need to be connected & data from them interpreted, contextualized…

Interpretations are often imprecise
  Queries are hard to (re)formulate & expand by the users

Where Semantic Web can help
  Embed initial queries into potential exploratory paths

Don’t just respond to the queries
  Give alternatives, suggest what next/else can be done, and why/why not

Slide 7

Comes Semantic Web

“Semantic Web is a Web of data” [Berners-Lee 2001]
  Actually, of connected and exposed data… [Altova.com]
  Ideally, of interchangeable data… [W3C SW Activity]

Where is the meaning?
  Interchange enabled by committing data/facts to the same thing

Our interests
  Expose connections and commitments also in places where they are so far implicit and hidden,…
  …using the existing web content as an enabling asset rather than something intimidating,…
  …to support ordinary users in exploring and effectively making sense of this vast information space.

Slide 8

From Motivation to Strategy

Key differences from other similar work:
  Limited manual handcrafting: aiming for scale and automated KA
  Designing for users: not only support knowledge sharing but also doing something, using that knowledge

Software development approach
  Based on formative evaluations (w/real users)
  Tapping into legacy data sources, often DB-s (e.g. DBLP)
  Using information extraction techniques to enrich and validate potential meanings of gleaned data (e.g. Corder, ExpertSearch)
  Exposing the semantics of (hypothesized and validated) relations rather than individual ‘tags’

Slide 9

Sample Findings from One Study

Product: ASPL [KnowledgeWeb, Magpie projects]
  Web-based platform and plug-in to support learners on the Web to perform knowledge-intensive analyses in a given domain

Some key findings we had to address:
  Users not keen on ‘declarative semantics’
    e.g. showing query results is insufficient; users wanted to know what (else) can be done with results, how to use them to learn something
  Users (esp. more experienced) expect the semantic system to ‘know the domain’
    e.g. individuals are often better characterized by the research communities they belong to, by abstraction
  Resource finding and retrieval are not ‘selling points’
    e.g. there are tools finding information more efficiently (e.g. Google); value is in supporting exploratory, interactive and customizable interaction

Example: generalizations and abstractions
  Interpret aggregations over a simple property in DBLP to formalize semantically richer relationships; e.g.:
    Research community membership,
    Expertise and leadership in a particular research area, etc.
  Based on information retrieval but the automated composition of partial findings provides richer means to navigate/explore
  Opportunity to conceptualize results as new knowledge assertions
    Going beyond mere retrieval/search …
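The idea of turning an aggregation over a simple property into a richer assertion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the venue-to-community mapping, the `member_of` relation name and the `min_papers` threshold are all hypothetical, and nothing here reflects how ASPL actually queries DBLP.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made records in the spirit of DBLP: (author, venue).
PUBLICATIONS = [
    ("alice", "ISWC"), ("alice", "ISWC"), ("alice", "EKAW"),
    ("bob", "ISWC"), ("bob", "VLDB"), ("bob", "VLDB"), ("bob", "VLDB"),
    ("carol", "EKAW"),
]

# Assumed mapping from venue to research community (not part of DBLP).
VENUE_COMMUNITY = {
    "ISWC": "SemanticWeb",
    "EKAW": "SemanticWeb",
    "VLDB": "Databases",
}


def community_assertions(publications, venue_community, min_papers=2):
    """Aggregate a simple property (venue of a paper) into member_of triples."""
    counts = defaultdict(Counter)
    for author, venue in publications:
        community = venue_community.get(venue)
        if community is not None:
            counts[author][community] += 1

    triples = []
    for author in sorted(counts):
        for community, n in sorted(counts[author].items()):
            # The threshold is an arbitrary illustration, not a validated rule.
            if n >= min_papers:
                triples.append((author, "member_of", community))
    return triples


if __name__ == "__main__":
    for triple in community_assertions(PUBLICATIONS, VENUE_COMMUNITY):
        print(triple)
```

Each returned triple is a new knowledge assertion derived from plain retrieval results, which a tool could then offer as a navigation link rather than as a raw result list.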


Slide 11

Positioning Semantic Tools

[Figure: three position charts placing the tools Masque, Blinkx, Google, TextDigger, Precise, AquaLog, AskNow, Ask, Hakia, Ilqua and ASPL/DBLP++; the individual positions are not recoverable from the transcript.

Position Chart 1: Result ranking function (sources, style). Sources: None; Popularity, statistics, etc.; Semantic relevance. Style: Full automation; Automated + user choice; Manual + interactive.

Position Chart 2: Explanatory function in post processing (technique, style). Technique: Ordering; Classification + clustering; Visual clustering + labelling; Summaries. Style: Explanation not present; Explanation upon request; Automatically embedded explanations.

Position Chart 3: Query formulation (support, means). Support: Limited use; Partial support; Full support. Means: Keywords; Phrases; NL sentences; Examples.]

Slide 13

Rich GUI to ASPL content

Faceted DBLP (http://dblp.l3s.de)

Facets shown correspond to the same data as presented by ASPL in different screens/services; here they also act as query modifiers

Data record enables further navigation as a means to query refinement and actual data access (e.g. PDF, BibTeX, DOI)

Slide 14

Modularity + Semantics, Sustainability

All produced user end points remain after the end of the project that funded their development
  Hosted by their institutions and/or ongoing projects
  REASE and other RDF content is interesting for W3C SWEO for education and outreach purposes

ASPL functionalities continue to be extended
  E.g. in the context of an independent collaboration between OU and FAO’s Knowledge Systems Division
  Essentially, the entire ASPL ‘pipework’ can be reused, only the domain ontology has to reflect FAO needs
  Further use of the ASPL technology currently explored (e.g. bioinformatics, biological pathogens, etc.)

Magpie/ASPL in practice
  annotations equal to user choosing ontological view
  FAO’s Agrovoc layered over an (arbitrary) web page
  semantic browsing intertwined with ‘classic’ browsing and shown as ‘taggings’

[Figure legend: Semantic proximity link; Web link]

Slide 16

Semantics of Redundancy

Reusing (often non-semantic) data to produce semantic annotations and interchangeability
  Previous examples reuse non-semantic data (SQL DB)
  While data content in DB is not semantic, certain composite queries have a well-defined semantic interpretation
  Hence, such queries act as if they were feeding semantic annotations onto the singular Web resources

More importantly, this approach to exposing semantics takes advantage of the Web nature
  Information on the Web is captured redundantly
  Law of large numbers & statistical correlations

Slide 17

Same Idea in Linking Open Data

Some information is captured redundantly in several places

Use it as a standard ‘JOIN’ in SQL queries…

Say that the two statements are about the same thing…

…which gives access to additional information from (e.g.) specialized data sets

Handy in:
  eliminating the eternal bane of data sharing = form filling
  seeding the data input forms with ‘obvious knowledge’

Source: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
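The JOIN analogy above can be made concrete with a small Python sketch. It is a toy under stated assumptions: both data sets, all identifiers and the choice of ISBN as the redundantly captured property are invented for illustration, and `owl:sameAs` is used here only as one common way of stating that two resources are the same thing.

```python
# Two hypothetical data sets describing books; all identifiers are invented.
REVIEWS = {
    "rev:item/1": {"isbn": "9780000000001", "rating": 4},
    "rev:item/2": {"isbn": "9780000000002", "rating": 5},
}
CATALOGUE = {
    "shop:book/a": {
        "isbn": "9780000000001",
        "title": "An Example Title",
        "author": "A. Writer",
    },
    "shop:book/c": {
        "isbn": "9780000000003",
        "title": "Unreviewed",
        "author": "C. Nobody",
    },
}


def same_as_links(left, right, key):
    """Join two data sets on a redundantly captured property, like a SQL JOIN."""
    index = {record[key]: uri for uri, record in right.items()}
    return [
        (uri, "owl:sameAs", index[record[key]])
        for uri, record in sorted(left.items())
        if record[key] in index
    ]


def prefill_form(uri, left, right, links):
    """Seed an input form with what the linked data set already knows."""
    form = dict(left[uri])
    for subject, _, obj in links:
        if subject == uri:
            for field, value in right[obj].items():
                # Keep the user's own values; only fill in missing fields.
                form.setdefault(field, value)
    return form


if __name__ == "__main__":
    links = same_as_links(REVIEWS, CATALOGUE, "isbn")
    print(links)
    print(prefill_form("rev:item/1", REVIEWS, CATALOGUE, links))
```

Once the sameness link exists, the reviewer only has to supply the rating; title and author arrive from the other data set, which is the form-filling saving the slide refers to.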

Slide 18

Revyu.com in Linking Open Data

[Figure callouts: linked data gleaned from (e.g.) Amazon; user’s data from a minimalist data form; expressing the sameness in formal RDF; returning the mash-up back (to the Web); semantic data disguised as a folksonomy]

Slide 19

Novelty vs. Familiarity

Novelty of technologies like Semantic Web has drawbacks
  Hard to sustain over a longer period
  Creates resistance to the proposed change

Technology (Semantic Web) is not the sole new thing the user has to cope with!
  Many tools assume new user roles
  Many tools assume new interaction modalities

Try a different view…
  Instead of pushing new technology, we need to improve the overall use experience
  ‘How can Semantic Web make the task I’m doing different = easier, faster, simpler,…’

Slide 20

Getting Hold of Semantic Web

[Figure labels: NG SW Application; Smart Feature; Semantic Web]

New applications need to exploit SW at large
  Dynamically retrieving relevant semantic data
  Combining several, heterogeneous models (ontologies)

Need tools and infrastructures to efficiently access the knowledge available on SW: a Gateway…

Slide 21

Why Gateway?

Functionality beyond discovering, indexing, and retrieving is necessary
  Because of heterogeneity in terms of data quality…
  Because of heterogeneity in terms of data coverage…
  Because of a substantial degree of knowledge duplication…

Watson Case: http://watson.kmi.open.ac.uk

Slide 22

Analyzing Semantic Content

Have to deal with heterogeneity; great variety in:
  Size
  Coverage
  Richness
  Etc.

Slide 23

Watson Architecture

[Figure: Watson architecture. Stages: Collecting; Analyzing; Querying. Components: Crawling; Parsing (Jena); Validation/Analysis; Indexing. Stores: Repository; URLs; Metadata; Indexes. Query interfaces: Keyword Search; SPARQL Query; Ontology Exploration. Arrow labels: discover (from the WWW), populate, use, extract, retrieve, request, queries.]

Watson Web User Interface: http://watson.kmi.open.ac.uk/WatsonWUI

Slide 25

Collecting and Analyzing Knowledge

Web and DB retrieval techniques crawl through data repositories and pages with semantic content
  E.g. in October 2007 Watson collected tens of thousands of semantic documents
  That represents millions of RDF entities (most of them being instances)

Yet, in terms of models…
  Conceptually ‘same’ data often occur in numerous duplicates and near duplicates
  Which affects reliability of reasoning
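One simple way to spot duplicates and near duplicates among collected semantic documents is to compare their sets of triples. The sketch below uses Jaccard similarity purely as an illustration; the documents, the triples and the 0.6 threshold are hypothetical, and the slides do not say which technique Watson itself applies.

```python
from itertools import combinations

# Hypothetical semantic documents, each reduced to a set of (s, p, o) triples.
DOCUMENTS = {
    "doc_a": {("Human", "subClassOf", "Mammal"), ("Human", "hasPart", "Head")},
    "doc_b": {("Human", "subClassOf", "Mammal"), ("Human", "hasPart", "Head")},
    "doc_c": {
        ("Human", "subClassOf", "Mammal"),
        ("Human", "hasPart", "Head"),
        ("Human", "hasPart", "Arm"),
    },
    "doc_d": {("Tuna", "subClassOf", "Fish")},
}


def jaccard(a, b):
    """Share of triples two documents have in common (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def duplicate_pairs(documents, threshold=0.6):
    """Label document pairs as duplicates or near duplicates.

    The threshold is an assumed cut-off chosen for this toy data only.
    """
    pairs = []
    for left, right in combinations(sorted(documents), 2):
        score = jaccard(documents[left], documents[right])
        if score == 1.0:
            pairs.append((left, right, "duplicate"))
        elif score >= threshold:
            pairs.append((left, right, "near-duplicate"))
    return pairs


if __name__ == "__main__":
    for pair in duplicate_pairs(DOCUMENTS):
        print(pair)
```

Flagging such pairs before reasoning matters because repeated statements can otherwise be mistaken for independent evidence.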

Slide 26

A Gateway to the Semantic Web

Slide 27

Watson & Multiple Ontologies in Use

In rapid (ontology) prototyping and modelling
  Near-duplicate models of a term (e.g. ‘Human’)
  Chunk of a model around selected node reused by acknowledging its redundancy

Thus, new ontology created from reused fragments of existing (often tested) models:
  Engineering process is much faster
  The outcome is a less error-prone model

Slide 28

Watson & Multiple Ontologies in Use

Embedding semantics into ordinary web pages, plain text, and other content currently without it

Purpose
  Impose a particular interpretative frame onto a web page to bias its interpretation
  Highlight conceptual entities that are key in a particular context
  Enable user navigation and browsing in (a part of) Semantic Web knowledge space

Case: my Magpie framework [Dzbor et al. 2007 in JWS]
  Usually, I draw people’s attention to what can be done with it
  But today let’s look at limitations

Magpie (Dzbor et al. 2007): http://kmi.open.ac.uk/projects/magpie

Slide 30

Magpie vs. Semantic Web Browsing

Magpie:
  User picks one ontology to annotate pages
    Precision: ; recall: (?)
  Annotated entities carry one meaning only
    E.g. Virus Comp_Program
  Entity-based approach to annotating only
    Usually instances like ‘OWL’ as members of categories
  Visual presentation limited to entity highlighting

Semantic Web browsing:
  System offers a range of ontologies applicable
    Let users balance P/R
  In reality, it’s useful to see and use also other senses
    E.g. what if Virus Organism
  Sometimes other views are better in a specific context
    Concept senses view
    Ontology network view
    Topic view, etc.

Slide 31

Towards PowerMagpie

[Figure: PowerMagpie processing loop. Labels: Web Page; characteristics of the web page; calculate page ‘fingerprint’; use ‘fingerprint’ to discover SW content; relevant SW content; rank, filter,…; customize presentation; visualize (GUI-s); send to user; User; let user select; browse via semantic links; adapt ‘fingerprint’; improve SW content retrieval; enrich by redundancy]
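The first steps of this loop (calculate a page ‘fingerprint’, then use it to discover and rank relevant semantic content) might be sketched as follows. Everything here is an assumption made for illustration: the fingerprint is taken to be a weighted list of frequent terms, the ontologies are reduced to hypothetical sets of labels, and the scoring is a plain sum of matching term weights. The slides do not describe how PowerMagpie actually computes either.

```python
import re
from collections import Counter

# Small, assumed stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "in", "is", "to", "by", "are"}

# Hypothetical ontologies, each reduced to a set of lower-case entity labels.
ONTOLOGIES = {
    "fisheries": {"tuna", "albacore", "habitat", "fish"},
    "computing": {"virus", "program", "network"},
    "biology": {"virus", "organism", "fish"},
}


def page_fingerprint(text, size=10):
    """Characterize a page by its most frequent non-stop-word terms."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return Counter(words).most_common(size)


def rank_ontologies(fingerprint, ontologies):
    """Score each ontology by the weight of fingerprint terms it covers."""
    scores = {}
    for name, labels in ontologies.items():
        score = sum(weight for term, weight in fingerprint if term in labels)
        if score:
            scores[name] = score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    page = (
        "Albacore tuna and yellowfin tuna are fish. "
        "The tuna habitat is in the Atlantic."
    )
    print(rank_ontologies(page_fingerprint(page), ONTOLOGIES))
```

The ranked list is what the user would then select from; adapting the fingerprint after each selection would close the loop shown in the figure.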

Slide 32

Selecting Multiple Ontologies

core ontology

ontology extension
  by declaration, e.g. Carnivor Animal & eats.Meat

different ont. frames
  by inconsistency, e.g. YellowFin AtlanticHabitat

ontology specialization
  by reference, e.g. Albacore Tuna

analogous ontology
  by mapping, e.g. Contamination Polution

Other relevant semantic data
  by classification, e.g. SoilAcidification

Known irrelevant semantic data
  by classification, e.g. Baltic

Slide 33

Example: Entities in Ontology Networks

None of the following views is ‘typically ontological’
  Entities are presented in a more familiar ‘tag style’
  Node positions reflect semantic proximity, similarity, ‘sameness’

These are truly from one ontology: proximity = ontological distance

These are collated from multiple sources: proximity = repetitive contextual co-occurrence

Source: Cipher project, 2005

Source: NeOn project, 2008

Slide 34

Example: Selecting Ontologies

A concept occurring in more than one ontology: redundancy of occurrence
A concept occurring in more than one role, sense: redundancy of meaning
A concept from a topically more distant ontology: divergence into new frame
Statistics of the ontology or entity: provenance of information
Versions of the same ontology discovered by PowerMagpie: temporality of occurrence

PowerMagpie analyzes a web page, proposes and justifies relevant entities
  Supporting divergent navigation
  Supporting time snapshots
  Acknowledging multiple meanings
  Exposing redundancy of occurrences
  Etc.

Next step: improve the user interaction, GUI
  Making the ontology-driven interaction more serendipitous, natural and embedded in standard browsing

Slide 35

Where is ‘Best of Both’?

Essentially in two contradictory properties:
  Ontologies are expressive, well-defined
  But they are fairly sparse in terms of content

It’s the sparseness that makes meaningful browsing and navigation difficult
  In any single ontology we are merely performing graph/tree navigation = often falling into the ‘closed world assumption’

Semantic Web affords more flexibility
  It may not enable us to tell which sense of a term we see
  But it is sufficiently connected to enable us to tell the difference between the senses:
    On the level of ontology networks but also individuals

Slide 36

Comparing SW and KBS

Why should it work now if it hasn’t in the past?
  There are some key changes in play:

                   Classic KBS      SW Systems
  Representation   'Clean'          'Good Enough'
  Size             Small/Medium     Extra Huge
  Repr. Schema     Homogeneous      Heterogeneous
  Quality          High             Very Variable
  Degree of trust  High             Very Variable

Slide 37

Key Paradigm Shift

Intelligence
  Classic KBS: a function of sophisticated, task-centric problem solving
  SW Systems: a side-effect of size and heterogeneity (Collective Intelligence)

Is due to information and data redundancy
  There are not only numerous documents with little formal semantic structure, but also…
  …numerous formal takes trying to conceptualize user views

Slide 38

How Far Are We?

Solutions working in a new dynamic context (run-time rather than design-time)
  Example: Ontology Mapping
    So far: mostly design-time mapping of (two) complete ontologies
    Mapping many partial, incomplete ontologies, ontological modules?
  Example: Ontology Selection
    So far: largely by querying, user-mediated ontology retrieval
    Selecting networks of non-contradictory partial ontologies?
  Example: Ontology Modularization
    So far: by and large has the user in the loop, consistency-driven
    Many diverse drivers (access rights, trust, scale, summarization,…)

The context of the above tasks is changing

Slide 39

Next Generation SW Tools

Do away with singular data sources
  Incl. ontologies, classification trees, maps,…

For them ontology becomes a dynamic notion
  Ontology = a selection of modules appropriate to a particular context, situation, user, task,…

New challenges arise
  Discovering semantic content and relations in it
  Modularizing large sources of semantic content
  Selecting (parts of) semantic content or ontologies
  Supporting user interaction on such a large scale
  Etc.

Slide 40

Some Reading

Dzbor, M., Motta, E., Domingue, J.B. (2007). Magpie: Experiences in supporting Semantic Web browsing. Journal of Web Semantics, Vol.5, No.3, pp.204-222. Elsevier Publishers, The Netherlands.

Dzbor, M., Motta, E. (2007). Semantic Web Technology to Support Learning about the Semantic Web. In 13th Intl. Conf. on Artificial Intelligence in Education (AIED), July 2007, California, US.

Sabou, M., Lopez, V., Motta, E. (2006). Ontology Selection for the Real Semantic Web: How to Cover the Queen’s Birthday Dinner? Proc. of the EKAW 2006 Conf., Podebrady, Czech Republic.

D'Aquin, M., Sabou, M., Motta, E. (2006). Modularization: A key for the dynamic selection of relevant knowledge components. ISWC 2006 Workshop on Ontology Modularization, Georgia, US.

Motta, E. (2006). Knowledge Publishing and Access on the Semantic Web: A Socio-Technological Analysis. IEEE Intelligent Systems, Vol.21, No.3, pp.88-90. IEEE Press, US.

Slide 41

Acknowledgements and Web Sites

Work presented has been developed in the context of the following projects and activities:
  Magpie (info, demo, download): http://kmi.open.ac.uk/projects/magpie
  PowerMagpie (info, demo, download): http://powermagpie.open.ac.uk
  Watson (info, UI, API download): http://watson.kmi.open.ac.uk
  NeOn project: http://www.NeOn-project.org
  OpenKnowledge project: http://www.openk.org
  Papers cited and personal pages: http://kmi.open.ac.uk

Acknowledging funding from European Commission’s Framework 6 (NeOn & OpenKnowledge), EPSRC and NERC

Also thanks to Laurian Gridinoc, Enrico Motta, Joerg Diederich, etc. for their input to some of the ideas presented