The MINT Mapping Tool and the MoRe Aggregator


Upload: locloud

Post on 15-Apr-2017


TRANSCRIPT

The MINT Mapping Tool
The MoRe Aggregator

Vassilis Tzouvaras, Dimitris Gavrilis

National Technical University of Athens
Digital Curation Unit - IMIS, Athena Research Center

LoCloud is funded by the European Commission's ICT Policy Support Programme

Cultural Heritage Content

• Diversity of cultural heritage content
  – Numerous metadata schemas to annotate content (LIDO, CIDOC-CRM, EAD, METS)
• Massive digitization and annotation activities are in progress
• Need for interoperability

MINT Mapping Tool

• Lets users map their own metadata schemas to reference domain models
• Follows a typical web-based architecture
• It was developed for ATHENA, but it is currently used for EUScreen, CARARE, Judaica, ECLAP, DCA and Linked Heritage
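As a minimal sketch of what such a mapping amounts to, the snippet below copies fields from a provider record into a target record using a field-to-field table. The provider field names, the target element names, and `apply_mapping` are illustrative assumptions; this is not MINT's actual mapping format.

```python
# Hypothetical mapping table: provider field -> target schema element.
FIELD_MAPPING = {
    "titel": "dc:title",
    "maker": "dc:creator",
    "jaar": "dc:date",
}


def apply_mapping(record, mapping):
    """Return a new record keyed by target elements; unmapped fields are dropped."""
    mapped = {}
    for source_field, value in record.items():
        target = mapping.get(source_field)
        if target is not None:
            # A target element may receive several values, so collect them in a list.
            mapped.setdefault(target, []).append(value)
    return mapped


provider_record = {"titel": "Stadhuis", "maker": "Unknown", "inventaris": "A-17"}
mapped_record = apply_mapping(provider_record, FIELD_MAPPING)
```

Here `inventaris` has no entry in the table, so it does not appear in `mapped_record`.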

MINT 2 – What’s new?

• The backend was reconstructed for better performance
  – File size for imports is extended
• The frontend was updated
  – New interface
  – Workflow is integrated in UI
  – Facilitated browsing of input and target schema

MoRe Overall Architecture

[Architecture diagram. Components shown: Registry; Apache Cassandra cluster; Fedora-commons; Temporary storage; Vocabulary services; Storage; JMS logging; Messaging; Core services; Enrichment service management; Entity matching / NLP; Geocoding / Historic place names; REST; External enrichment services; Publish service management; OAI-PMH; RDF Store; Elastic Search; Archive]

Cloud architecture

• Decentralized
• Scalable
• Four cloud environments
  – Storage
  – Monitoring & logging
  – Core services deployment
  – Enrichment services deployment

Distributed

• Enrichment services run in:
  – Austria
  – Spain
  – Greece
  – Lithuania
  – Slovenia
  – Norway

• Scalability can be facilitated through a virtualization infrastructure

Workflow

[Workflow diagram. Labels shown: OAI-PMH, LoCloud Collections, Wikimedia, MINT, Omeka; Harvest, Ingest, Transform, Enrich, Validate, Index, Publish, Delete, Reject; OAI-PMH, Archive, RDF Store, SolR]

Intermediate Schemas

[Diagram with two groups of schemas. First group: Dublin Core, LIDO, CARARE, EAD, ESE, EDM. Second group: Dublin Core, LIDO, CARARE, EAD, ESE, EDM, OMEKA-XML, OGD]

Core services

• Harvesting
• Validation
• Ingestion
• Transformation
• Enrichment
• Previewing
• Publishing

Core services – Harvesting

Harvests content from metadata sources
• OAI-PMH repository
• MINT
• LoCloud Collections
• Wikimedia

Multiple schemas are supported
• OAI_DC
• CARARE
• CARARE 2.0
• LIDO
• EAD
• EDM
• ESE
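For the OAI-PMH source, a harvest boils down to issuing `ListRecords` requests and following the resumption token until none is returned. The sketch below builds such a request URL and parses one response using only the standard library. The base URL, the sample response, and the function names are assumptions for illustration; this is not MoRe's harvester.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL as defined by OAI-PMH 2.0."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from one response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response; a real harvester would fetch this over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://example.org/oai")
identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

A full harvest would repeat the request with `next_url` until `token` comes back as `None`.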

Core services – Validation

Validates incoming information packages
• Executes validation schemes
• Validation micro-services
  – Structure
  – Schema
  – Linking
  – Schematron rules
• Flexible

How it is used in MoRe:
• Pre-validation
• Post-validation
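One way to picture a validation scheme is as an ordered list of small checks that each return a list of error messages. The sketch below is an assumption-laden illustration: the three checks are simplified stand-ins (the "schema" check only looks for required child elements, and Schematron is not covered), and none of the names come from MoRe.

```python
import xml.etree.ElementTree as ET


def check_structure(xml_text):
    """Structure check: the package must be well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return []
    except ET.ParseError as exc:
        return [f"structure: {exc}"]


def check_required_elements(xml_text, required=("title", "identifier")):
    """Simplified stand-in for schema validation: required children must exist."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [f"schema: missing <{name}>" for name in required if name not in present]


def check_links(xml_text):
    """Linking check: every <link> element must carry an http(s) URL."""
    root = ET.fromstring(xml_text)
    return [
        f"linking: bad URL {el.text!r}"
        for el in root.iter("link")
        if not (el.text or "").startswith(("http://", "https://"))
    ]


def validate(xml_text, checks):
    """Run the checks in order; stop early if the package is not well-formed."""
    errors = []
    for check in checks:
        errors.extend(check(xml_text))
        if check is check_structure and errors:
            break
    return errors


# Hypothetical validation scheme applied before ingestion.
PRE_VALIDATION = [check_structure, check_required_elements, check_links]

package = "<record><title>Old mill</title><link>ftp://x</link></record>"
errors = validate(package, PRE_VALIDATION)
```

The same `validate` function could be given a different list of checks after transformation, which is one plausible reading of pre- and post-validation.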

Core services – Ingestion

Ingests content into storage
• Uses storage layer API
• Pluggable drivers for attaching different technologies / repositories
  – Apache Cassandra
  – Filesystem-based
  – Fedora-commons
• Versioning support
• Complex digital object support
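The pluggable-driver idea can be sketched as a small abstract interface with interchangeable backends. The `StorageDriver` interface and both drivers below are hypothetical; an in-memory and a filesystem backend stand in for the technologies listed above, and nothing here reflects MoRe's real storage API.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageDriver(ABC):
    """Hypothetical storage layer API that every backend implements."""

    @abstractmethod
    def put(self, object_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, object_id: str) -> bytes: ...


class MemoryDriver(StorageDriver):
    """Keeps objects in a dictionary; useful for tests."""

    def __init__(self):
        self._items = {}

    def put(self, object_id, data):
        self._items[object_id] = data

    def get(self, object_id):
        return self._items[object_id]


class FilesystemDriver(StorageDriver):
    """Stores each object as one file under a root directory."""

    def __init__(self, root):
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, object_id, data):
        (self._root / object_id).write_bytes(data)

    def get(self, object_id):
        return (self._root / object_id).read_bytes()


def ingest(driver: StorageDriver, object_id: str, data: bytes) -> bytes:
    """Ingest through the API only, so the backend can be swapped freely."""
    driver.put(object_id, data)
    return driver.get(object_id)


payload = b"<record/>"
from_memory = ingest(MemoryDriver(), "pkg-1", payload)
with tempfile.TemporaryDirectory() as tmp:
    from_disk = ingest(FilesystemDriver(tmp), "pkg-1", payload)
```

Because `ingest` only depends on the interface, adding another backend means writing one more subclass rather than changing the ingestion code.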

Core services – Content Model

Digital objects comprise data streams

Each data stream can hold any kind of information
• XML/RDF, Image, Video, Documents, etc.

Each different representation of an information object is stored as a different data stream

Each curation action generates a new version
• Transformation, Enrichment
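The content model can be illustrated with two small data classes: a digital object that holds named data streams, and a data stream that keeps every version of its content. Tracking versions per data stream, and all class and field names, are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DataStream:
    """One representation of the object; keeps every version of its content."""

    name: str
    versions: list = field(default_factory=list)

    def add_version(self, content, action):
        # Each curation action appends a version instead of overwriting.
        self.versions.append({"action": action, "content": content})

    @property
    def latest(self):
        return self.versions[-1]["content"]


@dataclass
class DigitalObject:
    """A digital object is a set of named data streams."""

    object_id: str
    streams: dict = field(default_factory=dict)

    def stream(self, name):
        # Create the data stream on first use, then return the same instance.
        return self.streams.setdefault(name, DataStream(name))


obj = DigitalObject("pkg-1")
obj.stream("native").add_version("<lido/>", action="ingest")
edm = obj.stream("edm")
edm.add_version("<edm/>", action="transformation")
edm.add_version("<edm enriched='true'/>", action="enrichment")
```

After these calls the object has two data streams, and the `edm` stream still holds the pre-enrichment content as its first version.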

Core services – Transformation

Transforms entire information packages into the Europeana Data Model (EDM), or any other schema

Multiple transformation routines
• Per schema
• Per project
• Per provider

Users can attach a rights statement
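With routines defined per schema, per project, and per provider, something has to decide which one applies to a given package. The sketch below assumes a most-specific-wins order (provider, then project, then schema); that precedence, the registry layout, and the routine names are all assumptions, since the slides do not say how MoRe chooses.

```python
# Hypothetical registry keyed by (scope, key); values name a routine.
ROUTINES = {
    ("provider", "museum-x"): "museum_x_to_edm",
    ("project", "locloud"): "locloud_to_edm",
    ("schema", "LIDO"): "lido_to_edm",
}


def select_routine(schema, project=None, provider=None):
    """Pick the most specific routine: provider, then project, then schema."""
    candidates = (("provider", provider), ("project", project), ("schema", schema))
    for scope, key in candidates:
        routine = ROUTINES.get((scope, key))
        if routine is not None:
            return routine
    raise LookupError(f"no transformation routine for schema {schema!r}")


by_provider = select_routine("LIDO", project="locloud", provider="museum-x")
by_project = select_routine("LIDO", project="locloud")
by_schema = select_routine("LIDO")
```

A package in a schema with no registered routine raises `LookupError` instead of silently passing through untransformed.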

Core services – Enrichment

The generic enrichment service facilitates the execution of the enrichment micro-services
• Hides the complexity from the user by using enrichment plans
• Provides seamless integration with the MoRe UI

Virtual Enrichment driver
• Allows developers/creative industries to create their own enrichment services and declare/use them within MoRe

Core services – Previewing

Preview the XML record information for all datastreams

Preview the record in HTML (using the Europeana style sheet)

Core services – Publishing

Publish transformed / enriched information
• Internal OAI-PMH provider
• XML export
• Publish directly to RDF repositories
  – Sesame
  – Virtuoso
• SolR index server

Enrichment micro-services

• Thematic
  – Thesauri collections
  – Vocabulary matching
  – Background links
• Spatial
  – Geo normalization
  – Geo coding
  – Reverse geo-coding
  – Historic place names
• Other
  – Language identification

[Diagram. Resources shown: SKOS Thesauri, GeoNames, DBpedia, Wikipedia]

Enrichment Plan

• Enrichment micro-services are used within enrichment workflows:
  – Enrichment plans

• Each enrichment plan applies to a specific schema

• Each enrichment plan executes enrichment micro-services in a specific order

[Diagram. An enrichment plan running, in order: Language identification, Vocabulary matching, Geo-normalization, Geo-coding]
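An enrichment plan, as described above, is an ordered list of micro-services bound to one schema. The sketch below models that with plain functions applied in sequence. The three services are toy placeholders (the language check is not a real detector and the vocabulary has two made-up terms), and every name is an assumption.

```python
def identify_language(record):
    # Placeholder heuristic, not a real language detector.
    record["language"] = "en" if record["title"].isascii() else "und"
    return record


def match_vocabulary(record):
    # Hypothetical two-term vocabulary mapping terms to concept identifiers.
    vocabulary = {"church": "vocab:church", "mill": "vocab:mill"}
    title = record["title"].lower()
    record["subjects"] = [uri for term, uri in vocabulary.items() if term in title]
    return record


def normalize_place(record):
    # Toy geo-normalization: trim whitespace and fix capitalization.
    record["place"] = record["place"].strip().title()
    return record


# Each plan applies to one schema and fixes the order of execution.
ENRICHMENT_PLANS = {
    "EDM": [identify_language, match_vocabulary, normalize_place],
}


def run_plan(schema, record):
    """Apply the plan's micro-services to a copy of the record, in order."""
    enriched = dict(record)
    for service in ENRICHMENT_PLANS[schema]:
        enriched = service(enriched)
    return enriched


original = {"title": "Old mill near the church", "place": "  athens "}
result = run_plan("EDM", original)
```

Working on a copy keeps the original record intact, which fits the idea that each curation action produces a new version.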

Enrichment Plan

• Each enrichment plan defines run-time parameters for specific services
  – Content based

[Diagram. An enrichment plan running Language identification, Vocabulary matching, Geo-normalization and Geo-coding, annotated with an example rule: "Add subject collection A only if term X or Y are matched"]
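The example rule can be read as a content-based condition: a subject collection is added only when at least one of its trigger terms was matched. A minimal sketch follows, with the rule table, identifiers, and function name all assumed for illustration.

```python
def add_subject_collections(matched_terms, rules):
    """Return the collections whose trigger terms overlap the matched terms."""
    matched = set(matched_terms)
    return [
        collection
        for collection, triggers in rules.items()
        if matched & set(triggers)
    ]


# Hypothetical rule mirroring the slide: add collection A only if X or Y matched.
RULES = {"subject-collection-A": ["term-X", "term-Y"]}

with_match = add_subject_collections(["term-Y", "term-Z"], RULES)
without_match = add_subject_collections(["term-Z"], RULES)
```

In the first call `term-Y` triggers the rule, so collection A is added; in the second call nothing matches and the list stays empty.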

Dashboard

Packages organization

Package overview

Package lifecycle overview

Preview

Metadata completeness & statistics

Enrichment services overview

Direct access to 27 thesauri
Create & (re)use subject collections