The IMPACT Interoperability Framework: Workflows for OCR and beyond
Clemens Neudecker, KB National Library of the Netherlands
2nd IMPACT Conference, British Library, London, 24/25 October 2011


Presented at "Better, faster, cheaper. Solutions of the IMPACT Centre of Competence and future challenges", The British Library, 24-25 October 2011, London, United Kingdom.

Page 2: The IMPACT Interoperability Framework - Workflows for OCR and beyond

Background

> 20 individual software components for specific challenges

Prototyping new algorithms, improving commercial solutions

Different frameworks (C, C++, Java, etc.), platforms (Win/Linux)

Extensible with 3rd party applications

IMPACT Interoperability Framework (IIF)

Page 3: The IMPACT Interoperability Framework - Workflows for OCR and beyond

Architecture

Java

Web Services

Apache

Taverna

Open Source available on https://github.com/impactcentre

Free Hackathon 14/15 November, University of Manchester: http://impact-mygrid-taverna-hackathon.wikispaces.com/

Page 4: The IMPACT Interoperability Framework - Workflows for OCR and beyond

Integration

Only requirement: command line executable

Generic command line wrapper produces web service

Web service exposed as workflow module with documentation

Quick & easy integration: developers can focus on their application and have to worry less about integration = higher quality software
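
The wrapping idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the IIF wrapper itself: the class name `CommandLineTool`, its methods, and the tool name in the usage note are hypothetical. It only shows how a command line executable can be called from Java; a web service layer would sit on top of `run`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: wraps an arbitrary command line executable.
class CommandLineTool {
    private final List<String> baseCommand;

    CommandLineTool(List<String> baseCommand) {
        this.baseCommand = List.copyOf(baseCommand);
    }

    // Base command followed by the arguments of a single request.
    List<String> buildCommand(List<String> requestArgs) {
        List<String> command = new ArrayList<>(baseCommand);
        command.addAll(requestArgs);
        return command;
    }

    // Runs the tool and returns its output (stderr is merged into stdout).
    String run(List<String> requestArgs) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(requestArgs))
                .redirectErrorStream(true)
                .start();
        String output = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Tool exited with code " + exitCode + ": " + output);
        }
        return output;
    }
}
```

For example, `new CommandLineTool(List.of("ocr-tool", "--lang", "nl")).run(List.of("page.tif"))` would execute a hypothetical `ocr-tool` and return whatever it prints.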

Page 5: The IMPACT Interoperability Framework - Workflows for OCR and beyond

Workflows

OCR workflow = data pipeline

Building blocks = processing modules (nodes)

Integration = interaction between nodes (mashups)
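
The "data pipeline" view can be sketched in a few lines. The sketch assumes each node is a plain function from input to output; the class name `Pipeline` and the node names in the usage note are hypothetical, and real IIF workflows are built in Taverna rather than in code like this.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: a workflow as a chain of processing nodes.
class Pipeline {
    // Chains the nodes so that each node's output is the next node's input.
    static <T> Function<T, T> of(List<Function<T, T>> nodes) {
        Function<T, T> pipeline = Function.identity();
        for (Function<T, T> node : nodes) {
            pipeline = pipeline.andThen(node);
        }
        return pipeline;
    }
}
```

With three placeholder nodes (say binarisation, segmentation and recognition), `Pipeline.of(nodes).apply(input)` passes the input through them in list order.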

Collaboration with

Page 6: The IMPACT Interoperability Framework - Workflows for OCR and beyond
Page 7: The IMPACT Interoperability Framework - Workflows for OCR and beyond

Evaluation features

Text comparison of result with ground truth, using Levenshtein distance method

Word evaluation (with reading order)

Layout based comparison of result with ground truth, using the Page Analysis And Ground Truth Elements Framework
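
As an illustration of the text comparison, here is a minimal sketch of the standard Levenshtein distance between an OCR result and its ground truth. The class name `TextEvaluation` and the derived character error rate are assumptions added for illustration; the slides do not say which measure the IIF evaluation tools report.

```java
// Hypothetical sketch: edit distance between OCR output and ground truth.
class TextEvaluation {
    // Dynamic-programming Levenshtein distance, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed measure: edit distance divided by ground-truth length.
    static double characterErrorRate(String ocr, String groundTruth) {
        if (groundTruth.isEmpty()) {
            return ocr.isEmpty() ? 0.0 : 1.0;
        }
        return (double) levenshtein(ocr, groundTruth) / groundTruth.length();
    }
}
```

The comparison works on Java `char` values, so characters outside the Basic Multilingual Plane count as two units.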

Page 8: The IMPACT Interoperability Framework - Workflows for OCR and beyond

Community

Web2.0 style workflow registry

Ready-to-use and documented resources

Community of experts

Sharing of experiments and know how

Page 9: The IMPACT Interoperability Framework - Workflows for OCR and beyond

Local client: Taverna Workbench

Background: BioSciences

Developed and maintained by myGrid, UK

Open source

GUI for design and execution of web services & workflows

Page 10: The IMPACT Interoperability Framework - Workflows for OCR and beyond

Remote client: Portal

SOAP/REST API

Remote execution of web services & workflows

Page 11: The IMPACT Interoperability Framework - Workflows for OCR and beyond

Results Repository

Custom service for IMPACT: automatic storage of workflow outputs and provenance via WebDAV

Fully interoperable, since HTTP-based

Configurable storage of result sets

Create reports using POI
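
Because WebDAV is HTTP-based, storing a workflow output comes down to an HTTP PUT. The sketch below only builds such a request with the JDK's `java.net.http` API (Java 11 or later). The class name `ResultUpload`, the URL layout and the file name are hypothetical; the actual layout of the IMPACT results repository is not described in the slides.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: store one text output in a WebDAV collection.
class ResultUpload {
    // Builds a PUT request for <collectionUrl>/<fileName> with the given content.
    // fileName is assumed to be URL-safe already (no spaces, no reserved characters).
    static HttpRequest putRequest(String collectionUrl, String fileName, String content) {
        String base = collectionUrl.endsWith("/") ? collectionUrl : collectionUrl + "/";
        return HttpRequest.newBuilder()
                .uri(URI.create(base + fileName))
                .header("Content-Type", "text/plain; charset=utf-8")
                .PUT(HttpRequest.BodyPublishers.ofString(content, StandardCharsets.UTF_8))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding())`; authentication and creation of missing collections are left out of this sketch.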

Page 12: The IMPACT Interoperability Framework - Workflows for OCR and beyond

Scalability

Central ESB proxy manages multiple service copies

Process parallelization, load distribution, failover, security

Served >2M requests

Throughput improvements of 94% with every additional instance

Tested on Dutch Supercomputing Cloud (“Enlighten Your Research”)
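
Load distribution and failover across several copies of one service can be sketched as a round-robin dispatcher. This is an illustration only: the class name `RoundRobinProxy` is hypothetical, service copies are modelled as plain functions, and the slides do not state which strategy the ESB proxy actually uses.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: spread calls over service copies, skipping failed ones.
class RoundRobinProxy<I, O> {
    private final List<Function<I, O>> instances;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinProxy(List<Function<I, O>> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = List.copyOf(instances);
    }

    // Starts at the next instance in turn; on failure tries the following ones.
    O call(I input) {
        int start = Math.floorMod(next.getAndIncrement(), instances.size());
        RuntimeException last = null;
        for (int k = 0; k < instances.size(); k++) {
            Function<I, O> instance = instances.get((start + k) % instances.size());
            try {
                return instance.apply(input);
            } catch (RuntimeException e) {
                last = e; // fail over to the next copy
            }
        }
        throw new IllegalStateException("all instances failed", last);
    }
}
```

Each call advances the starting position by one, so healthy copies receive requests in turn, while a request that hits a failing copy is retried on the next one.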

Page 13: The IMPACT Interoperability Framework - Workflows for OCR and beyond

Outlook

Online service for testing/evaluation

Specification & Guidelines

Extending the scope:

Workflows for linguistic analysis: CLARIN

Workflows for preservation: SCAPE

Even better scalability: Map/Reduce

Supported by a community of developers & practitioners

Page 14: The IMPACT Interoperability Framework - Workflows for OCR and beyond
Page 15: The IMPACT Interoperability Framework - Workflows for OCR and beyond

“Anyway, the thing about progress is that it always seems greater than it really is.”

Ludwig Wittgenstein, Philosophical Investigations (quoting Johann Nestroy)

xkcd.com/688