Introduction to the Topaz OTM framework and the Ambra publishing system

Introduction to the Topaz OTM framework and the Ambra publishing platform
Richard Cave – rcave at plos.org
Russell Uman – ruman at plos.org

Upload: richard-cave

Post on 09-May-2015


DESCRIPTION

This presentation is an introduction to Topaz, an Open Source content modeling and storage framework that uses the Fedora Service Framework and Mulgara semantic technology as the core engine, and Ambra, a publishing application built on the Topaz framework.

TRANSCRIPT

Page 1: Introduction to the Topaz OTM framework and the Ambra publishing system

Introduction to the Topaz OTM framework

and the Ambra publishing platform

Richard Cave – rcave at plos.org

Russell Uman – ruman at plos.org

Page 2: Introduction to the Topaz OTM framework and the Ambra publishing system

open repositories 2009 www.plos.org

the Public Library of Science (PLoS)

non-profit, Open Access publisher

mission: open the doors to the world's library of scientific knowledge by giving any scientist, physician, patient, or student - anywhere in the world - unlimited access to the latest scientific research

currently publish seven journals
– PLoS Biology, PLoS Medicine, PLoS Pathogens, PLoS Computational Biology, PLoS NTDs, PLoS Genetics, PLoS ONE

largest journal is PLoS ONE
– 5532 articles as of May 15, 2009

using Ambra / Topaz since December ’06
– PLoS ONE first journal on Ambra / Topaz
– all journals migrated to Ambra / Topaz as of May 12, ’09
– ~13,000 articles published on Ambra / Topaz platform

Page 3: Introduction to the Topaz OTM framework and the Ambra publishing system

journey down the yellow Topaz road…

originally intended to be an end-to-end online publishing system
– peer review system
– composition system
– journal publishing system

open source platform for many types of publishing
– scholarly communications / Open Access
– eScience / eScholarship
– education
– libraries / museums

Page 4: Introduction to the Topaz OTM framework and the Ambra publishing system

big ideas for transforming journal publishing

open source!

provide features for post-publication annotation and discussion allowing for a “living” document

mine the unknown (semantic) relationships in research articles

journal publishing system as first application

– publish a high volume of research articles

– stability

– high performance

© by wales.nhs.uk

Page 5: Introduction to the Topaz OTM framework and the Ambra publishing system

freedom of choice

open source “publishing” platforms available in 2006
- Rhaptos / Connexions (based on Zope/Plone)
- Open Journal System (based on Drupal)
- DPubS (based on Fedora)
- ePrints
- Apache Lenya

© by David Pescovitz

but no system offered a high performance semantic repository with a “Web 2.0” user interface…

Page 6: Introduction to the Topaz OTM framework and the Ambra publishing system

behind door #1 – Fedora + Kowari (Mulgara)

© by That Guy Called Ben

Page 7: Introduction to the Topaz OTM framework and the Ambra publishing system

…at the end of the road

semantic publishing platform based on Fedora and Mulgara

Topaz (back-end glue)

– Object to Triple Mapping (OTM)

– Object Query Language (OQL)

© by Michael James

Ambra journal publishing system (front-end user interface)

Page 8: Introduction to the Topaz OTM framework and the Ambra publishing system

Ambra / Topaz journal publishing platform

[Diagram labels: Apache, CAS, Ambra, Topaz, Topaz OTM, Fedora + Mulgara, RDF Store, Files]

Fedora is used to store digital objects (XML, PDF, images, etc.)

article metadata, annotations (Annotea) and user information (FOAF) are stored as triples in Mulgara

Topaz is used for storage and retrieval of the digital objects and triples through the Objects to Triples Mapping (OTM)

Ambra (user interface)

CAS single sign-on service

Apache webhead

Page 9: Introduction to the Topaz OTM framework and the Ambra publishing system

under the hood of Topaz (1)

an Object-Triples-Mapping (OTM) library
– modeled after Hibernate Object-Relational Mapping (ORM)
– except the database is made of RDF triples instead of a relational database

provides a query language based on objects (OQL)
– an "object" based query syntax
– makes life a bit easier for developers

OQL example
– select all articles with a given title:
select a.id, a.author from Article a where a.title = 'Hello Dolly';
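
To make the OQL example above concrete, here is a minimal Python sketch of what such an object query amounts to when the data lives in triples. It is not Topaz code: the in-memory triple list, the predicate names ("rdf:type", "title", "author") and the sample values are all assumptions made for illustration.

```python
# Hypothetical in-memory triple store; subjects, predicates and values are
# invented for illustration and are not taken from Topaz or Ambra.
TRIPLES = [
    ("article:1", "rdf:type", "Article"),
    ("article:1", "title", "Hello Dolly"),
    ("article:1", "author", "A. Author"),
    ("article:2", "rdf:type", "Article"),
    ("article:2", "title", "Another Title"),
    ("article:2", "author", "B. Writer"),
]


def objects(subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def select_id_author_by_title(title):
    """Rough equivalent of:
    select a.id, a.author from Article a where a.title = '<title>';
    """
    rows = []
    for s, p, o in TRIPLES:
        # "from Article a": only subjects typed as Article are candidates.
        if p == "rdf:type" and o == "Article" and title in objects(s, "title"):
            # "select a.id, a.author": one row per author value.
            for author in objects(s, "author"):
                rows.append((s, author))
    return rows


print(select_id_author_by_title("Hello Dolly"))
```

The point of the sketch is only that the developer writes the query in terms of an Article object and its fields, while the lookup underneath is pattern matching over triples.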

Page 10: Introduction to the Topaz OTM framework and the Ambra publishing system

under the hood of Topaz (2)

defines Java classes and maps the classes into RDF
– Ambra defines models which are mapped into sets of triples in various graphs
– such as the “article”, “annotation”, etc. models defined in Ambra

provides support for storing files to a separate blob store (Fedora and/or Akubra)

provides storage and retrieval of files and triples in a single transaction
– necessary to render an article with associated metadata (e.g. notes, ratings, etc.)
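
The class-to-triples mapping described above can be sketched roughly as follows. Topaz itself maps Java classes; this Python version only illustrates the idea, and the Article fields, the predicate names and the example identifier are hypothetical, not Topaz's actual mapping configuration.

```python
from dataclasses import dataclass

# Hypothetical field-to-predicate map. The real Topaz mapping configuration
# is not shown in the slides, so these names are assumptions.
PREDICATES = {"title": "dc:title", "author": "dc:creator"}


@dataclass
class Article:
    id: str
    title: str
    author: str


def to_triples(article):
    """Flatten an Article object into (subject, predicate, object) triples."""
    triples = [(article.id, "rdf:type", "Article")]
    for field_name, predicate in PREDICATES.items():
        triples.append((article.id, predicate, getattr(article, field_name)))
    return triples


def from_triples(subject, triples):
    """Rebuild an Article from the triples that share the given subject."""
    values = {p: o for s, p, o in triples if s == subject}
    kwargs = {name: values[pred] for name, pred in PREDICATES.items()}
    return Article(id=subject, **kwargs)


article = Article("article:example", "Hello Dolly", "A. Author")
stored = to_triples(article)
print(stored)
print(from_triples("article:example", stored) == article)
```

As with an ORM, application code works with the object, and the mapping layer decides which triples represent it.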

Page 11: Introduction to the Topaz OTM framework and the Ambra publishing system

why Objects to Triples Mapping (OTM)?

allows for retrieving collections of objects (fast) with one query instead of a single object at a time (slow)

as an online-only publisher, we need fast
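
A toy illustration of the difference, assuming a store where each query is a round trip: fetching a collection in one query costs one round trip, while fetching objects one at a time costs one per object. The CountingStore class and its data are hypothetical and only count queries; they say nothing about actual Mulgara or Topaz timings.

```python
# Hypothetical sample data: three articles with one title triple each.
TRIPLES = [
    ("article:1", "title", "First"),
    ("article:2", "title", "Second"),
    ("article:3", "title", "Third"),
]


class CountingStore:
    """Toy triple store that counts how many queries it receives."""

    def __init__(self, triples):
        self.triples = triples
        self.queries = 0

    def query(self, predicate, subject=None):
        """Return (subject, object) pairs, optionally limited to one subject."""
        self.queries += 1
        return [(s, o) for s, p, o in self.triples
                if p == predicate and (subject is None or s == subject)]


def titles_one_at_a_time(store, ids):
    """One query per article: the number of round trips grows with the list."""
    return {i: store.query("title", subject=i)[0][1] for i in ids}


def titles_in_one_query(store, ids):
    """A single query for the whole collection, filtered afterwards."""
    wanted = set(ids)
    return {s: o for s, o in store.query("title") if s in wanted}


ids = ["article:1", "article:2", "article:3"]
slow, fast = CountingStore(TRIPLES), CountingStore(TRIPLES)
titles_one_at_a_time(slow, ids)
titles_in_one_query(fast, ids)
print(slow.queries, fast.queries)
```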

Page 12: Introduction to the Topaz OTM framework and the Ambra publishing system

Ambra

first application built on Topaz application framework

“Web 2.0” features
– uses the FreeMarker templating engine to display the content received from the Topaz service
– uses the Dojo JavaScript toolkit to handle complex user interactions like annotations, ratings, etc.
– provides social networking features (in-line notes, comments, trackbacks)
– turns a reader of scientific articles into a knowledge contributor, knowledge that can be used by other users
– living document!

Page 13: Introduction to the Topaz OTM framework and the Ambra publishing system

Ambra features

[Diagram: Ambra features]
– article ingestion
– search
– annotations
– discussions
– security mgmt
– ratings
– user profile / preferences
– atom feeds
– multiple journals
– trackbacks
– SignOn Server: CAS single sign-on
– article publication
– CrossRef registration
– DOI resolver
– Cache for web content and digital objects

Page 14: Introduction to the Topaz OTM framework and the Ambra publishing system

Ambra – what it’s not

CMS
– only NLM DTD XML

workflow engine

peer-review system

scientific social site

out-of-the-box solution for journal publishing

© by roflrazzi.com

Page 15: Introduction to the Topaz OTM framework and the Ambra publishing system

Ambra – new development

article level metrics

article usage data (COUNTER)

tags and better discoverability

semantic enhancement

automatic file transfer to external sources
– PubMed Central
– other repositories

Page 16: Introduction to the Topaz OTM framework and the Ambra publishing system

article level metrics

“impact” of an article outside of citations
– notes, comments, ‘star ratings’ and trackbacks

in March ’09, we launched:

1. number of Citations
– PubMed Central and Scopus

2. amount of Blog coverage
– Postgenomic, Nature Blogs and Bloglines

3. number of Social Bookmarks
– CiteULike and Connotea

Page 17: Introduction to the Topaz OTM framework and the Ambra publishing system

Page 18: Introduction to the Topaz OTM framework and the Ambra publishing system

article usage data (COUNTER)

in mid-2009, we will add usage data to every article
– HTML Page Views
– PDF Downloads
– XML Downloads

article usage data to be displayed numerically and graphically
– includes historical data
– in the context of other articles within the journal

Page 19: Introduction to the Topaz OTM framework and the Ambra publishing system

Page 20: Introduction to the Topaz OTM framework and the Ambra publishing system

semantic enhancement of content

add value to the content of a research article

highlight text for selected terms
– protein names
– genus / species
– disease
– location / habitat
– etc.

provide links to external sources to create new user interactions
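
As a rough sketch of the term highlighting described above, the Python below wraps known terms in a span carrying their category. The term list, the category names and the span markup are assumptions for illustration; the slides do not say how Ambra implements or plans to implement this, and the sketch assumes plain-text input rather than article XML.

```python
import re

# Hypothetical term dictionary mapping each term to a category.
TERMS = {
    "Plasmodium falciparum": "species",
    "malaria": "disease",
}

# Case-insensitive lookup table for the matched text.
LOOKUP = {term.lower(): kind for term, kind in TERMS.items()}

# Longest terms first so multi-word terms win over shorter overlaps;
# word boundaries avoid matching inside longer words.
PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def highlight(text):
    """Wrap every known term in a span whose class is the term's category."""
    def wrap(match):
        term = match.group(0)
        kind = LOOKUP[term.lower()]
        return f'<span class="{kind}">{term}</span>'

    return PATTERN.sub(wrap, text)


print(highlight("Malaria is caused by Plasmodium falciparum."))
```

A link to an external source could be attached in the same wrap step, for example by emitting an anchor instead of a span once a target URL per category is known.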

Page 21: Introduction to the Topaz OTM framework and the Ambra publishing system

© by David Shotton

Page 22: Introduction to the Topaz OTM framework and the Ambra publishing system

future development…?

© by David Shotton

ingest and publish many types of content / data
– structured and unstructured

access triple store
– SPARQL endpoint

REST-based API

Page 23: Introduction to the Topaz OTM framework and the Ambra publishing system

future development…?

© by David Shotton

Page 24: Introduction to the Topaz OTM framework and the Ambra publishing system

system requirements

minimum - single server (Linux) with 8 GB RAM

…better (based on PLoS journals):
– 1 server for Fedora and Mulgara with 8 GB RAM
– 1 server for Ambra and Topaz with 8 GB RAM
– 1 server for Apache and CAS with 4 GB RAM

PLoS journals on Ambra / Topaz
– 800k visits / month
– 1.8 million pageviews / month

Amazon AMI to test Ambra / Topaz available soon!

Page 25: Introduction to the Topaz OTM framework and the Ambra publishing system

where are they now?

funding for Topaz project has concluded but it’s still an active project

(not a deathstar)

Topaz moved to Fedora Commons
– Paul Gearon, Fedora Commons, technical lead of semantic technologies projects

Ambra stewardship moved to PLoS
– PLoS fully committed to Ambra / Topaz platform
– small development team working on new features

© by nydailynews.com

Page 26: Introduction to the Topaz OTM framework and the Ambra publishing system

interested?

Ambra project site launched
– www.ambraproject.org
– documentation in progress
– we need your input!
– we need models for other types of content

Open Access Publishing solution community at Fedora Commons
– “steward” = Richard Cave, PLoS
– “knowledgebase gardener” = Chris Freeland, Biodiversity Heritage Library

Page 27: Introduction to the Topaz OTM framework and the Ambra publishing system

resources

Topaz website: http://www.topazproject.org/

Topaz manual: http://www.topazproject.org/trac/wiki/Topaz/Manual

Ambra website: http://www.ambraproject.org/

Ambra mailing lists:
http://lists.topazproject.org/mailman/listinfo/ambra-users
http://lists.topazproject.org/mailman/listinfo/ambra-dev

Richard Cave – rcave at plos.org