

Stefano Mazzocchi, Researcher at MIT, Application Catalyst at Metaweb Technologies, Inc.

Stephen Garland, Principal Research Scientist, Emeritus, MIT Computer Science and Artificial Intelligence Laboratory

Ryan Lee, W3C Research Engineer

January 26, 2005 - XML.com

Massachusetts Institute of Technology (MIT) Research Activity

Alireza Abbasi

Technology Management, Economics and Policy Program (TEMEP), College of Eng., SNU


SIMILE Project

Focused on collecting and publishing Semantic Web data to the (non-Semantic) Web.

Researching solutions to data interoperability problems for digital libraries using semantic web technologies.

RDF-based tools: Longwell, Gadget, RDFizer, Welkin, Fresnel, Timeline (new), Referee (new), Crowbar (new), Piggy Bank (new), Solvent (new)


Introduction

The digital libraries' problem: browsing digital libraries is a difficult process of navigating through different interfaces and different terminologies for each collection.

SIMILE Project [Semantic Interoperability of Metadata In unLike Environments]: make it easier to wander from collection to collection and, more generally, to find your way around the Semantic Web.

Motivated by DSpace, a repository for storing, indexing, preserving, and redistributing digital assets. DSpace manages metadata about the content and distributes it on the web.


DSpace

Jointly developed by HP Research Labs and the MIT Libraries (open source software).

Used by many research-producing organizations, often by their libraries, to manage digital data and to let researchers find that data.


DSpace (2)

Needs to support additional metadata schemas for a variety of purposes:

finding digital research material described in various, domain-specific ways;

managing that digital content over time in order to preserve it.

As DSpace expands to use new metadata schemas, it will have to deal with the problem of interoperability.


Incentive of SIMILE

The Semantic Web core stack (RDF, RDFS, and OWL) enables people to create ontologies that describe their specialized metadata and to make them generally reusable.

But most people are not trained Semantic Web developers.

So they need tools to do this and to assess whether they did the job correctly.


Goals of SIMILE

To extend DSpace, enhancing support for arbitrary schemas and metadata and providing an architecture for disseminating digital assets

To create the tools that metadata specialists (e.g., librarians) need to produce good-quality RDF. These are needed because expertise in defining ontologies, creating RDF, and converting existing XML-based metadata into RDF is limited.

To make metadata interoperability easier for digital libraries by providing useful tools for browsing, searching, and mapping heterogeneous metadata in RDF.


SIMILE – Delivered Components

Tools for Metadata Managers

Gadget - XML inspector

RDFizers - Batch tools to transform existing XML data into RDF

Solvent* - Firefox extension for JavaScript screen scraping

Welkin - Graphical tool to inspect/edit RDF graph

Tools for End-Users

Longwell - Web-based RDF faceted metadata browser

Fresnel - Vocabulary for specifying how RDF graphs are presented

Piggy Bank* - Firefox extension for personal information management of metadata in RDF

Semantic Bank* - Web-based server that allows data publishing and sharing by individuals, groups, or communities

Exhibit* - Lightweight structured data publishing framework

Timeline* - AJAXy widget for visualizing time-based events

*: new tools after the paper


SIMILE: Tools for Metadata Managers

RDFizers - Batch tools to transform existing XML data into RDF

Gadget - XML inspector

Welkin - Graphical tool to inspect/edit RDF graph

Solvent* - Firefox extension for JavaScript screen scraping

*: new tools after the paper


RDFizers: Transform XML data into RDF

RDF’s strength is "defining models in the highly distributed nature".

But the RDF/XML serialization is a very unfriendly compromise.

So the RDFizers project was created to build and catalog software tools and scripts that transform data from existing syntaxes into RDF. This allows people to explore their existing data in the available RDF browsing tools.

It helps to resolve the Semantic Web chicken-and-egg problem: "not much RDF data will be created without a killer app, but no killer app will be created without more RDF data".

Solution: make it easier for specialists (like librarians and other metadata experts) to convert popular and widely available metadata sources into RDF.


RDFizers (2)

Done with XSLT style sheets and simple scripts.

Each one needs its own RDF "ontology" to be defined.

List of RDFizers in SIMILE, each converting the named source into RDF: MARC/MODS, OAI-PMH, OCW, EMail, BibTeX, Flat, Weather, Java, Javadoc, Jira, Subversion, Random
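To make the idea of an RDFizer concrete, here is a minimal sketch of one written as a simple script. It is not one of the SIMILE RDFizers: the record layout, the `http://example.org/` URIs, and the function names are assumptions made up for illustration. Each XML record becomes one subject, and each child element becomes one triple in N-Triples syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary namespace and sample data, for illustration only.
EX = "http://example.org/terms/"

SAMPLE_XML = """
<records>
  <record id="r1">
    <title>Linear Algebra</title>
    <creator>Jane Doe</creator>
  </record>
  <record id="r2">
    <title>Signals and Systems</title>
  </record>
</records>
"""


def escape_literal(text):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rdfize(xml_text, base="http://example.org/record/"):
    """Map each <record> to a subject URI and each child element to a triple."""
    root = ET.fromstring(xml_text)
    triples = []
    for record in root.iter("record"):
        subject = f"<{base}{record.get('id')}>"
        for child in record:
            value = (child.text or "").strip()
            if value:  # skip empty elements
                predicate = f"<{EX}{child.tag}>"
                triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


for line in rdfize(SAMPLE_XML):
    print(line)
```

The mapping from element names to predicates is the part that corresponds to defining an "ontology" for each source format; a real converter would choose established vocabulary terms rather than generating predicates from tag names.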


Gadget: XML inspector

Problem in transforming existing XML datasets into RDF: a lack of tools that give you an at-a-glance overview of an XML dataset (or a collection of XML documents).

Gadget helps data managers understand the structure of an XML dataset by providing a summary of the count, unique values, and percentage of unique values for XML attributes.

Works on any well-formed XML.

Used for: data exploration and understanding; data migration and transformation; data cleanup; complexity evaluation; understanding schema adherence; schema emergence (if none is provided).
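The kind of summary described above can be sketched in a few lines. This is not Gadget's code or output format; the path notation, the function name, and the sample data are assumptions for illustration. For every attribute, keyed by the path of the element that carries it, the sketch reports the count, the number of unique values, and the percentage of unique values.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample dataset, for illustration only.
SAMPLE_XML = (
    '<courses>'
    '<course dept="6" level="U"/>'
    '<course dept="18" level="U"/>'
    '<course dept="6" level="G"/>'
    '<course dept="8" level="U"/>'
    '</courses>'
)


def summarize_attributes(xml_text):
    """Return {path/@attribute: (count, unique, percent_unique)}."""
    values = defaultdict(list)

    def walk(element, path):
        here = f"{path}/{element.tag}"
        for name, value in element.attrib.items():
            values[f"{here}/@{name}"].append(value)
        for child in element:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")

    summary = {}
    for key, seen in values.items():
        count = len(seen)
        unique = len(set(seen))
        summary[key] = (count, unique, 100.0 * unique / count)
    return summary


for path, (count, unique, percent) in summarize_attributes(SAMPLE_XML).items():
    print(f"{path}: count={count} unique={unique} ({percent:.1f}% unique)")
```

A low percentage of unique values suggests a controlled vocabulary (a good candidate for a facet), while a value near 100% suggests an identifier or free text.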


Gadget: sample


OCW: 2,002,015 Lines of XML


Welkin: Graphical tool to inspect/edit RDF graph

Configuring tools like Longwell requires a thorough understanding of the structure of the data being examined.

It is hard to get a global overview of an RDF model, and there are few tools for summarizing RDF and giving a quick mental model of the data being manipulated with a browser.

So Welkin was created: an interactive graphical RDF browser that

visualizes any RDF model without requiring prior configuration (like Knowle, but unlike Longwell);

displays RDF as a clustered set of nodes and arcs;

is useful for understanding and mining the layout of unfamiliar datasets;

tries to empower the user with an interactive approach, allowing users to mine, zoom, drag, select, cluster, filter, and highlight nodes and arcs.
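The data structure behind a nodes-and-arcs view can be sketched as follows. This is not how Welkin is implemented: the triples are invented, and connected components are used here only as a simple stand-in for whatever clustering the tool performs. The sketch turns triples into an undirected graph, optionally filters arcs by predicate, and reports node degrees and clusters.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object), for illustration only.
TRIPLES = [
    ("ex:course1", "ex:taughtBy", "ex:alice"),
    ("ex:course2", "ex:taughtBy", "ex:alice"),
    ("ex:course1", "ex:partOf", "ex:dept6"),
    ("ex:course2", "ex:partOf", "ex:dept6"),
    ("ex:course3", "ex:partOf", "ex:dept18"),
]


def build_graph(triples, keep_predicates=None):
    """Build an undirected adjacency map; optionally keep only some predicates."""
    adjacency = defaultdict(set)
    for subject, predicate, obj in triples:
        if keep_predicates is not None and predicate not in keep_predicates:
            continue
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


def degrees(adjacency):
    """Number of distinct neighbours per node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def clusters(adjacency):
    """Connected components, found with an iterative depth-first search."""
    seen, found = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(adjacency[node] - cluster)
        seen |= cluster
        found.append(cluster)
    return found


graph = build_graph(TRIPLES)
print(degrees(graph))
print(clusters(graph))
```

Passing `keep_predicates={"ex:taughtBy"}` to `build_graph` corresponds to filtering the view down to a single kind of arc.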


Welkin: Graphical tool to inspect/edit RDF graph


Solvent (new*): Easier Scraping to RDF

A Firefox extension that helps write JavaScript screen scrapers for Piggy Bank.

Motivation: turn a regular web page into a semantic web page, freeing the data from the page/site that contains it.

Unfortunately, not many web pages embed or link to RDF information, and Piggy Bank needs web pages to embed information in RDF.

To work around this, Piggy Bank can execute a particular screen scraper on particular pages in order to "extract" the information it needs.
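The screen-scraping step can be illustrated with a small, self-contained sketch. Solvent and Piggy Bank scrapers are written in JavaScript against the browser; the Python below only shows the same idea and does not use their API. The page markup, class names, and `http://example.org/` URIs are assumptions for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, for illustration only.
PAGE = """
<ul>
  <li class="book" id="b1"><span class="title">Example Title One</span>
      by <span class="author">A. Author</span></li>
  <li class="book" id="b2"><span class="title">Example Title Two</span></li>
</ul>
"""


class BookScraper(HTMLParser):
    """Collect (subject, predicate, object) triples from <li class="book"> items."""

    FIELDS = ("title", "author")

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.triples = []
        self._subject = None  # URI of the item currently being read
        self._field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li" and attributes.get("class") == "book":
            self._subject = f"{self.page_url}#{attributes.get('id')}"
        elif tag == "span" and self._subject and attributes.get("class") in self.FIELDS:
            self._field = attributes["class"]

    def handle_data(self, data):
        text = data.strip()
        if self._subject and self._field and text:
            predicate = f"http://example.org/terms/{self._field}"
            self.triples.append((self._subject, predicate, text))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li":
            self._subject = None


scraper = BookScraper("http://example.org/books")
scraper.feed(PAGE)
for triple in scraper.triples:
    print(triple)
```

The scraper is tied to one page layout, which is why a scraper has to be written for particular pages or sites.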


Solvent (example)


SIMILE: Tools for End-Users

Longwell - Web-based RDF faceted metadata browser

Fresnel - Vocabulary for specifying how RDF graphs are presented

Piggy Bank* - Firefox extension for personal information management of metadata in RDF

Semantic Bank* - Web-based server that allows data publishing and sharing by individuals, groups, or communities

Exhibit* - Lightweight structured data publishing framework

*: new tools after the paper


Longwell: RDF faceted metadata browser

RDF browsing for library users:

Longwell, a web-based, RDF-powered, highly configurable faceted browser, targets users by hiding the presence of the underlying RDF model.

Knowle (shipped as part of the Longwell distribution), a node-focused graph navigation browser, is targeted at people who want to see or debug the underlying RDF model.

The browsing suite is written as Java servlets and is built around HP's Jena2 Semantic Web toolkit.


Longwell (sample)


Haystack: extensible "universal information client"

Enables users to manage diverse sources of information (e.g., email, calendars, address books, and web pages) by defining whichever arrangements of, connections between, and views of information they find most effective.

The interaction offered by a web-browser interface is too limited, so the Haystack project is exploring a "rich client" interface that allows RDF data to be manipulated as well as navigated.

Unlike Welkin, which displays information as a graph, Haystack aims for a Longwell-like presentation of information that is natural for end users. It uses standard primitives like drag and drop and context menus to give users access to various operations on the data being viewed at any given time.

It is currently being repackaged as a plugin in the Eclipse platform.


Fresnel: vocabulary for specifying how RDF graphs are presented

In working on RDF browsing for both SIMILE and Haystack, the developers found that it is better to have a general ontology governing how to display RDF: a kind of stylesheet for RDF that lets one indicate how some abstract data should be presented to the user.

Together with other members of the Semantic Web development community, SIMILE is working on putting together Fresnel, a generic ontology for describing how to render RDF in a human-friendly manner.


23 ©MIT CNI Spring 2006

Piggy Bank*: information management of metadata in RDF

Firefox extension for managing metadata

Loads RDF into a local Longwell server

Search and faceted browse of local RDF

Views defined by the library or other users

Users can find, collect, and annotate RDF

Can then publish for access by others


Piggy Bank* (Sample)


Semantic Bank*: Web-based server that allows data publishing and sharing by individuals, groups, or communities

To persist remotely, share, and publish data on a server

For individuals, groups, and communities, e.g. conference proceedings

Ability to tag resources

Longwell faceted browsing view of published information


Exhibit*: create web pages with support for sorting, filtering, and rich visualizations


SIMILE Categories of Work


Projects after this Paper - Done

Timeplot: a cross-browser DHTML (canvas-based) time series plotting widget.

Timeline: a DHTML AJAX timeline widget for visualizing temporal information.


Projects after this Paper - ongoing

Piggy Bank: an extension to Firefox that turns it into a Semantic Web browser, letting you make use of existing information on the Web in more useful and flexible ways not offered by the original Web sites.

Semantic Bank: the server companion of Piggy Bank that lets you persist, share, and publish data collected by individuals, groups, or communities.

Solvent: a Firefox extension that helps you write JavaScript screen scrapers for Piggy Bank.

jsTeX: a JavaScript library that can interpret some (basic) TeX encodings and transform them into HTML definitions directly on a web page.

Citeline: a web application that facilitates the web publishing of bibliographies and citation collections as interactive exhibits, and the sharing of this type of data.

Zotz: a Firefox add-on giving you the ability to publish citations from your Zotero to an Exhibit (via Citeline) in one step.


Projects after this Paper – ongoing (2)

Referee: reads your web server logs, crawls your referrers (the links that point to your pages), and extracts metadata from those pages and from the text around the links that point to your pages.

Babel: lets you convert between various data formats.

Exhibit: lets you create web pages with support for sorting, filtering, and rich visualizations by writing only HTML and optionally some CSS and JavaScript code.

Appalachian: a Firefox add-on that adds the ability to manage and use several OpenIDs to ease the login parts of your browsing experience.

Seek: adds faceted browsing features to Mozilla Thunderbird and lets you search through your email more effectively.
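As an illustration of the first step in Referee's description (reading web server logs to find referrers), here is a minimal sketch. It is not Referee's code: it assumes logs in the common "combined" format, uses made-up log lines, and stops before the crawling and metadata-extraction steps.

```python
import re
from collections import Counter, defaultdict

# Matches the request, status, size, and referrer fields of a combined-format line.
LOG_PATTERN = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)

# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2006:13:55:36 -0700] "GET /welkin/ HTTP/1.1" 200 2326 '
    '"http://example.org/blog/post" "Mozilla/5.0"',
    '192.0.2.2 - - [10/Oct/2006:13:56:01 -0700] "GET /welkin/ HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0"',
    '192.0.2.3 - - [10/Oct/2006:13:57:12 -0700] "GET /longwell/ HTTP/1.1" 200 512 '
    '"http://example.org/blog/post" "Mozilla/5.0"',
]


def referrers_by_page(lines):
    """Count, for each requested path, how often each referrer URL appears."""
    result = defaultdict(Counter)
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # not a combined-format line
        referrer = match.group("referrer")
        if referrer in ("", "-"):
            continue  # request carried no referrer
        result[match.group("path")][referrer] += 1
    return result


for path, counts in referrers_by_page(SAMPLE_LOG).items():
    print(path, dict(counts))
```

The referrer URLs collected this way are the pages that a follow-up crawl would fetch in order to read the text around the incoming links.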


An Incomplete Picture

For metadata specialists and system developers:

What about editing RDF?

http://www.altova.com/features_RDF.html

http://www.cs.rpi.edu/~puninj/rdfeditor

http://rhodonite.angelite.nl

What about building new ontologies? The Universidad Politécnica de Madrid’s School of Computing (FIUPM) has developed a new method for building multilingual ontologies that can be applied to the Semantic Web.

What about storing vast quantities of (potentially distributed) RDF and accessing it efficiently?

http://tucana.es.northropgrumman.com/solutions/technology.htm

What about using performance-enhancing techniques (such as caching) for RDF?

What about quickly inferencing over RDF data?

For users:

Can we design faceted browsing interfaces that scale to dozens of RDF ontologies?

How about improving navigation across the linkages between ontologies?

How can we support searching that will start in one domain/ontology and expand into relevant related domains/ontologies?


References

"SIMILE: Practical Metadata for the Semantic Web" by Stefano Mazzocchi, Stephen Garland, and Ryan Lee, XML.com, January 26, 2005. http://www.xml.com/pub/a/2005/01/26/simile.html

http://simile.mit.edu/

http://en.wikipedia.org/wiki/SIMILE

"MIT’s SIMILE Project: Demonstrating Practical Value of Semantic Web Technology for Digital Libraries" by MacKenzie Smith, MIT Libraries.

"Tutorial – Semantic Digital Libraries, Comparison and the Future" by Sebastian R. Kruk, Bernhard Haslhofer, Philipp Nußbaumer, Sandy Payette, and Tomasz Woroniecki, Univ. of Vienna, 2007.


Faceted browsing

A technique for accessing a collection of information represented using a faceted classification, allowing users to explore by filtering the available information.

Displays only the metadata fields that are configured to be 'facets' (i.e., to be important for the user browsing data in one or more specific domains), using values for those fields as a means for zooming into a collection by selecting those items with a particular field-value pair (e.g., 26 works of art in the example dataset have a subject of Abstract Expressionism).

Provides a mechanism that allows users to explore different schemas from different domains with a unified interface and to discover the synergies across them. For example, the interface can be designed to show users that one schema uses a "subject" facet while another uses a "topic" facet for similar information.
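The mechanics of faceted browsing described above can be sketched briefly. This is not Longwell's implementation: the items, facet names, and the "topic" to "subject" alias table are assumptions for illustration. The sketch counts the values of each configured facet, zooms into the collection by selecting a field-value pair, and recomputes the counts for the narrowed set.

```python
from collections import Counter

# Hypothetical items; the last one comes from a schema that says "topic".
ITEMS = [
    {"title": "Work A", "subject": "Abstract Expressionism", "medium": "Oil"},
    {"title": "Work B", "subject": "Abstract Expressionism", "medium": "Acrylic"},
    {"title": "Work C", "subject": "Cubism", "medium": "Oil"},
    {"title": "Work D", "topic": "Cubism", "medium": "Collage"},
]

FACETS = ("subject", "medium")   # fields configured to act as facets
ALIASES = {"topic": "subject"}   # hypothetical mapping between two schemas


def normalize(item):
    """Rename aliased fields so both schemas share one facet name."""
    return {ALIASES.get(field, field): value for field, value in item.items()}


def facet_counts(items, facets):
    """For each facet, count how many items carry each value."""
    return {
        facet: Counter(item[facet] for item in items if facet in item)
        for facet in facets
    }


def select(items, **constraints):
    """Keep only the items matching every given field-value pair."""
    return [
        item for item in items
        if all(item.get(field) == value for field, value in constraints.items())
    ]


items = [normalize(item) for item in ITEMS]
print(facet_counts(items, FACETS))

narrowed = select(items, subject="Cubism")
print(facet_counts(narrowed, FACETS))
```

Each count is what a faceted interface would show next to a facet value, and each call to `select` corresponds to the user clicking one of those values.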


Welkin (sample)

Welkin is used to browse a fragment of the MIT OpenCourseWare metadata converted to RDF.


Timeline*: visualizing temporal information


Behind the Curtain

Four groups support SIMILE: HP Research Labs, the W3C, MIT Libraries, and MIT CSAIL.

The principal investigators have included Mick Bass, Eric Miller, MacKenzie Smith, and David Karger.

The developers are Stefano Mazzocchi, Stephen Garland, and Ryan Lee.

Mark Butler (bootstrapper of the Longwell project)
