
ENRICHING DIGITAL LIBRARIES CONTENTS WITH SEMLIB SEMANTIC ANNOTATION SYSTEM (PUNDIT!)

Michele Nucci, Marco Grassi, Christian Morbidoni and Francesco Piazza

Semedia (Semantic Web and Multimedia)
http://semedia.dii.univpm.it

DII - Department of Information Engineering, Polytechnic University of Le Marche, Ancona, Italy

Tuesday, July 24, 2012


Enriching digital libraries contents with Pundit [email protected]

DIGITAL EVOLUTION

• Most of the resources of interest for the Humanities are:
  • in digital format (digitized or born digital)
  • available on the Web

• Information is multiplying faster and faster:
  • classification and management are increasingly complex tasks
  • well-structured metadata is a key requirement

• Semantic Web technologies in Digital Libraries:
  • publish DL content as Linked Data
  • define ontologies or vocabularies for metadata encoding (Europeana Data Model, OAI-ORE…)


THE WEB SCENARIO

• The Web (> 2.0) has become more social and interactive

• Annotation of Web content is beneficial:
  • more engaging and productive user experience
  • exploits social engagement to improve resource ranking and classification

• Annotating web content has become a common task:
  • comments and tags are widely supported by mainstream applications
    • Facebook picture tags, Flickr picture comments, etc.
  • many tools to bookmark, highlight and comment on web page fragments
    • e.g. sharedcopy.com, annotateit.org, diigo.com
  • some tools support collaborative annotations


DL SCENARIO

• Crowdsourcing experiments enrich DLs by curating content or uploading digital material of interest to the DL (BBC WW2 People’s War, …)
• Digital Libraries (DL) are no longer simple “expositions” of digital objects but offer users more interaction

[Figure: three Digital Library interaction models. Expert model: experts create contents, users consume contents. User Interaction / Social Engagement: adds tagging, linking and commenting. Crowdsourcing: adds “Add Content” and “Add Annotations”.]


SEMANTICALLY STRUCTURED ANNOTATIONS

• ... so what’s missing?

• Most existing annotation tools are limited to simple textual tags and comments:
  • limited by the ambiguity of natural language (is “orange” a fruit or a color?)
  • their semantics are not machine-interpretable

• Semantically structured annotations make smart use of such added knowledge:
  • unambiguously express semantics that can be processed by software agents (e.g. annotations can be harvested and used by recommender systems, search engines, etc.)
  • power Digital Libraries (improving browsing, search, automatic content classification, ...)
  • reuse such collaborative knowledge in different contexts and applications
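The “orange” ambiguity can be illustrated with a minimal Python sketch. The photo identifiers and the helper `find_by_concept` are hypothetical, and the two DBpedia URIs are used only as example identifiers; this is not Pundit code.

```python
# Illustrative only: the same images tagged with plain strings and with URIs.
ORANGE_FRUIT = "http://dbpedia.org/resource/Orange_(fruit)"
ORANGE_COLOUR = "http://dbpedia.org/resource/Orange_(colour)"

plain_tags = {
    "photo-1": ["orange"],  # fruit or color? a program cannot tell
    "photo-2": ["orange"],
}

semantic_tags = {
    "photo-1": [ORANGE_FRUIT],
    "photo-2": [ORANGE_COLOUR],
}


def find_by_concept(tags, concept):
    """Return the sorted ids of all resources tagged with `concept`."""
    return sorted(rid for rid, values in tags.items() if concept in values)


# The plain tag matches both photos; the URI matches only the fruit photo.
print(find_by_concept(plain_tags, "orange"))
print(find_by_concept(semantic_tags, ORANGE_FRUIT))
```

With plain tags a search for “orange” cannot separate the two meanings, while a URI-based tag selects exactly one of them.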


SEMANTICALLY STRUCTURED ANNOTATIONS

Users create knowledge graphs in which web content fragments, concepts and entities are meaningfully connected.


SEMANTICALLY STRUCTURED ANNOTATIONS

• Rely on controlled vocabularies and ontologies:
  • share the same terminology and “talk about the same things”
  • annotations can be meaningfully mashed up

• Link to the emerging Web of Data:
  • software can automatically retrieve additional, useful semantic data (e.g. date and place of birth, pictures, citations, multi-language data)

Augmenting the information of the original annotation content supports smarter applications.

Ex.: we have discovered that the two images contain American film actors showing the emotion of anger!
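The image example can be sketched in Python as follows. Everything here is an assumption made for illustration: the `ex:` identifiers, the property names and the `LINKED_DATA` dictionary, which stands in for data that software could retrieve from the Web of Data.

```python
# Hypothetical local stand-in for facts available on the Web of Data.
LINKED_DATA = {
    "ex:ActorA": {"rdf:type": "ex:AmericanFilmActor"},
    "ex:ActorB": {"rdf:type": "ex:AmericanFilmActor"},
}

# User annotations as (subject, property, object) statements.
annotations = [
    ("ex:image1", "ex:depicts", "ex:ActorA"),
    ("ex:image1", "ex:showsEmotion", "ex:Anger"),
    ("ex:image2", "ex:depicts", "ex:ActorB"),
    ("ex:image2", "ex:showsEmotion", "ex:Anger"),
]


def images_matching(statements, person_type, emotion):
    """Find images that depict a person of `person_type` showing `emotion`."""
    depicted, emotions = {}, {}
    for subject, prop, obj in statements:
        if prop == "ex:depicts":
            depicted.setdefault(subject, set()).add(obj)
        elif prop == "ex:showsEmotion":
            emotions.setdefault(subject, set()).add(obj)
    matches = []
    for image, people in depicted.items():
        # The type of each person comes from the linked data, not the annotation.
        has_type = any(
            LINKED_DATA.get(person, {}).get("rdf:type") == person_type
            for person in people
        )
        if has_type and emotion in emotions.get(image, set()):
            matches.append(image)
    return sorted(matches)


print(images_matching(annotations, "ex:AmericanFilmActor", "ex:Anger"))
```

No annotation states that an image shows an American film actor; that fact only emerges once the annotation is combined with the external data.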


• Pundit is a novel semantic annotation tool:
  • developed by: Semedia (Semantic Web and Multimedia), http://semedia.dii.univpm.it, with the collaboration of NET7
  • funded by: Semlib Project (EU Project), http://www.semlibproject.eu/
  • supported and further developed in: DM2E EU Project, http://dm2e.edu/, and AGORA EU Project, http://project-agora.eu/


SEMLIB PROJECT

• R&D project supported by EU FP7, Theme: Research for SMEs (no. FP7-SME-2010-01-262301 - SEMLIB)
• 24 months (commenced in January 2011, currently at month 19)

Semlib Project: Semantic Web Tools for DL
http://www.semlibproject.eu/

www.netseven.it/
www.knowledgehives.com/
www.liberologico.com/
www.in-two.com
www.semedia.dii.univpm.it/
www.deri.ie/


ANNOTATION MODEL

• Based on the Open Annotation Collaboration (OAC) ontology*

[Figure: annotation model diagram with three parts: Contextual Information, Annotation Content, Semantically Structured Content]

[Figure: the same annotation model, with the Annotation Content represented as a Named Graph]

• SPARQL support to query slices of knowledge
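The idea of querying one slice of knowledge through named graphs can be sketched in plain Python. This is a toy quad store, not SPARQL and not Pundit's storage layer; the class `QuadStore` and all `ex:` identifiers are hypothetical.

```python
from collections import defaultdict


class QuadStore:
    """Toy store that keeps each annotation's triples in its own named graph."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph, subject, prop, obj):
        self._graphs[graph].add((subject, prop, obj))

    def slice(self, graph):
        """Return the triples of one named graph, sorted.

        Roughly what this SPARQL query would select:
        SELECT ?s ?p ?o WHERE { GRAPH <graph> { ?s ?p ?o } }
        """
        return sorted(self._graphs.get(graph, set()))


store = QuadStore()
store.add("ex:annotation-1", "ex:fragment-7", "ex:talksAbout", "ex:Dante")
store.add("ex:annotation-2", "ex:image-3", "ex:depicts", "ex:Florence")

print(store.slice("ex:annotation-1"))
```

Keeping each annotation in its own graph means a query can address one annotation, or any group of them, without touching the rest of the knowledge base.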


NOTEBOOKS

• Annotations are collected in notebooks

[Figure: a notebook identified by NotebookURI, with the properties:
  dcterms:creator (the creating user)
  dcterms:created "2011-01-27 10:30:56"
  rdfs:label "My Example Notebook"
  rdfs:comment "An Example Notebook used to show the model"]

• Provide users with the capability to organize their annotations:
  • each user has a default notebook
  • users can create more
• Group annotations so that they can be retrieved and queried together
• Different UNIX-style read/write privileges (from private to completely public)
• Identified by a URI
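A notebook as described above can be sketched as a small Python data structure. The field names mirror the properties shown on the slide; the URI, the creator id and the two boolean permission flags are assumptions made for illustration and say nothing about how Pundit actually encodes privileges.

```python
from dataclasses import dataclass, field


@dataclass
class Notebook:
    uri: str                     # the notebook is identified by a URI
    creator: str                 # dcterms:creator
    created: str                 # dcterms:created
    label: str                   # rdfs:label
    comment: str = ""            # rdfs:comment
    public_read: bool = False    # hypothetical UNIX-like "others can read" flag
    public_write: bool = False   # hypothetical UNIX-like "others can write" flag
    annotations: list = field(default_factory=list)

    def can_read(self, user):
        return user == self.creator or self.public_read

    def can_write(self, user):
        return user == self.creator or self.public_write


notebook = Notebook(
    uri="http://example.org/notebooks/1",   # hypothetical URI
    creator="user:alice",                   # hypothetical user id
    created="2011-01-27 10:30:56",
    label="My Example Notebook",
    comment="An Example Notebook used to show the model",
    public_read=True,
)

print(notebook.can_read("user:bob"), notebook.can_write("user:bob"))
```

Here the notebook is readable by everyone but writable only by its creator, one point on the range from private to completely public.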


NOTEBOOKS

• Notebooks allow annotation sharing

[Figure: the NotebookURI is shared (SHARE) at three levels: SINGLE USER, COMMUNITIES and PUBLIC; the diagram also shows a WIKI]

• Sharing a notebook is as easy as sharing its URL on the web (similar to popular file sharing platforms)


USER AUTHENTICATION

• Authentication is based on OpenID:
  • no need to store users’ credentials
  • already implemented by mainstream companies (Google, Yahoo, ...)
  • may spare users multiple registrations (waste of time, yet another password)
  • a single identity can be used across different Pundit-enabled Digital Libraries
  • adding an OpenID provider is easy and transparent to the Pundit server


ANNOTATION SHARING SCENARIO

[Figure: Annotation Clients send structured annotations through the Annotation Authoring API to the Annotation Server, which stores them in a COLLECTIVE KB; Annotation Clients and Third Party Applications read structured annotations through the Annotation Consuming API; selected annotations are published back as trusted/official annotations]

• Create structured annotations
• Store them in a single knowledge base...
• ...whose slices can be accessed not only by their creator...
• ...but also by other users and third party applications!
• The DL administrator can select annotations and publish them back as trusted annotations to enrich DL content
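The sharing scenario can be sketched as an in-memory Python class. All names here (`AnnotationServer`, `create`, `query`, `mark_trusted`, the `ex:` identifiers) are hypothetical and only mirror the roles in the diagram; they are not the real Pundit REST API.

```python
class AnnotationServer:
    """In-memory stand-in for an annotation server with a collective knowledge base."""

    def __init__(self):
        self._kb = {}        # annotation id -> annotation record
        self._next_id = 1

    def create(self, author, notebook, statement):
        """Authoring side: store one structured annotation and return its id."""
        annotation_id = self._next_id
        self._next_id += 1
        self._kb[annotation_id] = {
            "author": author,
            "notebook": notebook,
            "statement": statement,
            "trusted": False,
        }
        return annotation_id

    def mark_trusted(self, annotation_id):
        """Administrator action: publish an annotation back as trusted."""
        self._kb[annotation_id]["trusted"] = True

    def query(self, notebook=None, trusted_only=False):
        """Consuming side: return the statements in one slice of the knowledge base."""
        return [
            record["statement"]
            for record in self._kb.values()
            if (notebook is None or record["notebook"] == notebook)
            and (not trusted_only or record["trusted"])
        ]


server = AnnotationServer()
first = server.create("alice", "nb-1", ("ex:fragment-7", "ex:talksAbout", "ex:Dante"))
second = server.create("bob", "nb-2", ("ex:image-3", "ex:depicts", "ex:Florence"))
server.mark_trusted(second)

print(server.query(notebook="nb-1"))      # one user's slice
print(server.query(trusted_only=True))    # what the DL would publish back
```

Any client or third party application that can call `query` sees the same collective knowledge base, which is the point of separating the authoring and consuming sides.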


NAMED CONTENT

• DLs change over time:
  • presentation can be restyled and content can be re-organized
  • the same content can appear in different pages
  • some parts of the page should not be annotated (menus, ...)

• Specific markup can be added to the pages to allow Pundit to:
  • identify atomic pieces of content (by means of a URI)
  • attach annotations to such content
  • avoid annotating accessory page components

<div class="pundit-content" about="http://example.org/contents/123">
  <!-- HTML goes here. -->
  <p>This is a named content and contains both text and a picture</p>
  <img src="http://example.org/pictires/pictire123.png" />
  <p><em>Caption:</em> this is a caption.</p>
</div>
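How a program might locate named contents in a page can be sketched with Python's standard `html.parser` module. This is only an illustration of the markup convention shown above; the Pundit client itself is written in JavaScript, and the names `NamedContentFinder` and `find_named_contents` are hypothetical.

```python
from html.parser import HTMLParser


class NamedContentFinder(HTMLParser):
    """Collect the `about` URI of every element with class "pundit-content"."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "pundit-content" in classes and attributes.get("about"):
            self.uris.append(attributes["about"])


def find_named_contents(html):
    finder = NamedContentFinder()
    finder.feed(html)
    return finder.uris


page = """
<div id="menu">Home | Search</div>
<div class="pundit-content" about="http://example.org/contents/123">
  <p>This is a named content and contains both text and a picture</p>
</div>
"""

print(find_named_contents(page))
```

The menu is skipped because it carries no `pundit-content` class, and the content is identified by its URI rather than by the page it happens to appear in.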


NAMED CONTENT

The same content in different pages shows the same annotations!


PUNDIT ARCHITECTURE

SERVER

• Open Source RESTful Web Service (Java Jersey framework)
• Cross-origin requests:
  • CORS (Cross-Origin Resource Sharing)
  • JSONP
• Sesame triple store:
  • SPARQL and inference
  • different SAILs are provided to implement different storage back ends (BigOWLIM, MySQL, PostgreSQL, Virtuoso ...)
• MySQL for user data

CLIENT

• Set of JavaScript modules (Dojo framework):
  • easily extendable
  • highly customizable
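Cross-origin support matters because the JavaScript client usually runs on a Digital Library page served from a different origin than the annotation server. A minimal Python sketch of the CORS side follows; the function `cors_headers`, the origin values and the chosen method list are assumptions, and the real server is a Java Jersey service, so this only illustrates the standard response headers.

```python
def cors_headers(request_origin, allowed_origins):
    """Return CORS response headers for an allowed origin, else an empty dict."""
    if request_origin in allowed_origins:
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE, OPTIONS",
            "Access-Control-Allow-Headers": "Content-Type",
            "Vary": "Origin",
        }
    return {}


# Hypothetical Digital Library origins allowed to call the annotation server.
allowed = {"http://library.example.org", "http://archive.example.org"}

print(cors_headers("http://library.example.org", allowed))
print(cors_headers("http://unknown.example.com", allowed))
```

A browser only lets the page read the response when the returned `Access-Control-Allow-Origin` value matches the page's origin.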


DIFFERENT ANNOTATABLE CONTENTS

• Pundit allows the annotation of different types of content at different levels of granularity:
  • text fragments
  • images
  • image fragments (under development)
  • videos and video fragments (experimented with in Semtube)


• Semtube: semantic annotation of YouTube videos (alpha stage), based on the Pundit JavaScript libraries and annotation server

http://semedia.dii.univpm.it/semtube


DIFFERENT TYPES OF ANNOTATIONS

Annotations with different levels of expressivity and structure.

Comment/Tag Panel:
• Textual comments
• Semantic tags:
  • automatically extracted from textual comments (DBpedia Spotlight)
  • chosen from popular Linked Data services (DBpedia, Freebase, WordNet, ...)
  • define your own (SPARQL endpoint)

Triple Composer:
• Semantic relations:
  • subject-property-object statements
  • drag & drop and suggestions
  • connect different resources (user selections, Linked Data entities, ...) with semantically defined properties
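A subject-property-object statement of the kind built with the Triple Composer can be sketched in Python. The `Statement` type, the `compose` helper and every URI under `example.org` are hypothetical; the check against a list of allowed properties is one possible way to model "semantically defined properties".

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str   # e.g. a text fragment selected by the user
    prop: str      # a property taken from a vocabulary
    obj: str       # e.g. a Linked Data entity


# Hypothetical vocabulary of properties offered to the user.
ALLOWED_PROPERTIES = {
    "http://example.org/vocab/talksAbout",
    "http://example.org/vocab/cites",
}


def compose(subject, prop, obj, allowed=ALLOWED_PROPERTIES):
    """Build a statement, rejecting properties outside the vocabulary."""
    if prop not in allowed:
        raise ValueError(f"unknown property: {prop}")
    return Statement(subject, prop, obj)


statement = compose(
    "http://example.org/contents/123#fragment-1",
    "http://example.org/vocab/talksAbout",
    "http://dbpedia.org/resource/Dante_Alighieri",
)
print(statement)
```

Restricting the property to a known vocabulary is what keeps statements from different users comparable.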


CUSTOM VOCABULARIES

• Pundit allows the use of custom vocabularies/taxonomies (and relations):
  • create a JSONP file (manually or automatically from an ontology)
  • put it online
  • add its URL to the configuration to import and use it
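The first step, creating a JSONP file, can be sketched in Python. The slides do not show the file layout Pundit expects, so the payload structure (`name`, `items`, `label`, `uri`), the callback name `loadVocabulary` and the example terms are all assumptions; only the general JSONP shape, a function call wrapping a JSON value, is standard.

```python
import json


def vocabulary_to_jsonp(name, terms, callback="loadVocabulary"):
    """Serialize (label, uri) pairs as a JSONP document.

    The payload layout is hypothetical, not Pundit's documented format.
    """
    payload = {
        "name": name,
        "items": [{"label": label, "uri": uri} for label, uri in terms],
    }
    return f"{callback}({json.dumps(payload, indent=2)});"


terms = [
    ("Manuscript", "http://example.org/vocab/Manuscript"),
    ("Letter", "http://example.org/vocab/Letter"),
]

jsonp_text = vocabulary_to_jsonp("Document types", terms)
print(jsonp_text)
```

The resulting text would then be saved to a file, published online, and its URL added to the client configuration as the slide describes.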


CROSS PAGE / DOMAIN ANNOTATIONS

• A special bookmarklet allows launching Pundit on any Web page to perform annotations
• Selected resources (text fragments, images, ...) on different pages and domains can be added to “My Items”, stored on the server and reused on other pages

[Figure: a resource is added to “My Items” on one page and used on another page to create a cross-page semantic relation (e.g. “cites”)]


DEMO TIME!

http://thepund.it

THANK YOU!

http://thepund.it

Semedia (Semantic Web and Multimedia)
http://semedia.dii.univpm.it

Semlib Project (EU Project)
http://www.semlibproject.eu/

DM2E EU Project
http://dm2e.edu/

AGORA EU Project
http://project-agora.eu/