procedures for data integration through cidoc crm.pdf

Page 1: Procedures for Data Integration through CIDOC CRM.pdf

Session II: Chair Luc Van Eycken

Encoding Cultural Heritage Information for the Semantic Web

Procedures for Data Integration through CIDOC CRM Mapping

Ø. Eide*, A. Felicetti**, C. E. Ore*, A. D’Andrea*** and J. Holmen*

* Unit for Digital Documentation, University of Oslo - Norway

** PIN Scarl VASTLAB, University of Florence - Italy

*** CISA, University of Naples “L’Orientale” - Italy

EPOCH Conference on OPEN DIGITAL CULTURAL HERITAGE SYSTEMS - Rome 25th feb 2008

Page 2: Procedures for Data Integration through CIDOC CRM.pdf

The BABEL tower….


Page 3: Procedures for Data Integration through CIDOC CRM.pdf

The BABEL tower….

In cultural heritage (CH) circles the BABEL tower is a metaphor for the lack of communication, or incommunicability, caused by too many languages and different cultures.

Languages, Terminologies, Thesauri, Standards, Knowledge……

Page 4: Procedures for Data Integration through CIDOC CRM.pdf

The BABEL tower….

As regards archaeological documentation (forms, reports, etc.) we have different streams:

• National Rules
• Standards
• Archaeological Backgrounds
• Best Practices
• Local Experiences
• Legal Obligations

Page 5: Procedures for Data Integration through CIDOC CRM.pdf

The BABEL tower….

A paradoxical situation could be:

“Io sono un preistorico e scavo seguendo le superfici…” (Italian: “I am a prehistorian and I dig following the surfaces…”)

“I’m a classical archaeologist. I dig following inscriptions and texts….”

“Je ne fais pas de fouilles archéologiques... mais je rassemble la documentation suivant le Standard ICOMOS….” (French: “I do not carry out archaeological excavations... but I gather the documentation following the ICOMOS standard….”)

Here is what can happen in the same archaeological area…..

Page 6: Procedures for Data Integration through CIDOC CRM.pdf

The BABEL tower….

Could technology help us overcome these issues? SURE!!!!

Page 7: Procedures for Data Integration through CIDOC CRM.pdf

The BABEL tower….

Languages, Terminologies, Thesauri, Standards: + Human approach

National Rules, Legal Obligations, Standards, Guidelines: + Legal approach

Programming Languages, Formats, Software, Platforms: + ICT

Resources on-line with scarce access, not always useful and usable because of the lack of interoperability

Page 8: Procedures for Data Integration through CIDOC CRM.pdf

A new challenge

A new challenge: the reconciliation of these disparate sources, stored separately from each other (we also wanted to try to reconcile different disciplinary groups, separated by cultures as much as by content: methods, theories…)

A SEMANTIC APPROACH: the integration of knowledge through ontologies

The use of CIDOC-CRM as a sort of interlingua to guarantee the integration of data and schemas among cultural heritage information/archives/repositories

Page 9: Procedures for Data Integration through CIDOC CRM.pdf

What is the CIDOC CRM?

• It is not a metadata standard
• It is a Conceptual Reference Model for the analysis and design of cultural information systems
• It does not define the terminology used to document these data structures
• It is meant to explain the logic of what they “do” in documentation

Page 10: Procedures for Data Integration through CIDOC CRM.pdf

WP 3.3 Common Infrastructure

As part of EPOCH, a complete framework for the mapping and management of cultural heritage information in a semantic web context has been developed by:

• PIN (University of Florence, Italy)
• CISA (University of Naples “L’Orientale”, Italy)
• EDD (University of Oslo, Norway)

AMA (Archive Mapper for Archaeology) is one of the NEWTON projects (NEW TOols Needed)

Page 11: Procedures for Data Integration through CIDOC CRM.pdf

Background

BUT!!!

Mapping requires skills and knowledge which are uncommon among cultural heritage professionals

Page 12: Procedures for Data Integration through CIDOC CRM.pdf

AMA Project

The aim of the AMA project was to develop tools for semi-automated mapping of cultural heritage data to CIDOC-CRM (ISO 21127:2006).

The reason for this investment was that such tools can enhance interoperability among the different archives and datasets produced in the field of Cultural Heritage.

We created a tool able to extract and encode legacy information coming from diverse sources, to store and manage this information using a semantically enabled container, and to make it available for query and reuse.

Page 13: Procedures for Data Integration through CIDOC CRM.pdf

AMA project: Two key-words

The tool relied upon two concepts:

Mapping = the rules that allow different schemas to be expressed and reconciled

Template = the instantiation of the abstract mapping between the source data structure and the mapping target (CIDOC-CRM compliant). Templates capture the semantic structure of the sources and their transformations, ensuring the modularity and future maintenance of the system.
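
To make the two concepts concrete, here is a minimal Python sketch; it is not AMA's actual implementation. The source field names (`compiler`, `year`), the `MappingRule` and `Template` types and the `ex:` identifiers are hypothetical. E21 Person and E52 Time-Span are real CIDOC-CRM classes, but pairing them with these fields is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """One rule: a field of the source schema corresponds to a CIDOC-CRM class."""
    source_field: str   # hypothetical column/element name in the legacy schema
    crm_class: str      # target CIDOC-CRM class


@dataclass(frozen=True)
class Template:
    """A named, reusable set of rules for one concrete source schema."""
    name: str
    rules: tuple

    def apply(self, record_id: str, record: dict) -> list:
        """Return (subject, predicate, object) triples for one source record."""
        triples = []
        for rule in self.rules:
            value = record.get(rule.source_field)
            if value is None:
                continue  # field missing in this record: nothing to emit
            node = f"ex:{record_id}/{rule.source_field}"
            triples.append((node, "rdf:type", rule.crm_class))
            triples.append((node, "rdfs:label", str(value)))
        return triples


# Hypothetical template for an excavation-form schema.
form_template = Template(
    name="excavation-form",
    rules=(
        MappingRule("compiler", "crm:E21_Person"),
        MappingRule("year", "crm:E52_Time-Span"),
    ),
)

triples = form_template.apply("form42", {"compiler": "A. Rossi", "year": 1998})
for triple in triples:
    print(triple)
```

Keeping the rules in a separate, named template object is what would let the same mapping be reused on every record of an archive and maintained independently of the data.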

Page 14: Procedures for Data Integration through CIDOC CRM.pdf

A schema for a mapping-tool

[Diagram: a template links the schema to be mapped (labels: US, Site, CompilationEvent, Compiler, YearForm) to CIDOC-CRM classes and properties (labels: E31_InformationCarrier, E65_Event, E39_actor, E21_person, E50_date; P94_was created by, P14_carried out by, P70 is documented in; “is a” links).]

Page 15: Procedures for Data Integration through CIDOC CRM.pdf

AMATool

The tool set developed in the AMA project includes:

• A powerful mapping application for the creation of mappings from existing datasets
• A tool for mapping cultural heritage information contained in free text into a CIDOC-CRM compliant data model
• Templates describing relations between the structure of existing archives and CIDOC-CRM
• A semantic framework to store, manage and browse the encoded information, providing user-friendly interfaces

Page 16: Procedures for Data Integration through CIDOC CRM.pdf

AMATool: partners

• CISA (Italy)
• PIN (Italy)
• UNIREL (Italy)
• CIMEC (Romania)
• IAA (Israel)
• Oxford ArchDigital (UK)
• University of Kent (UK)
• Paveprime Ltd (UK)
• University of Oslo (Norway)
• ROB (Netherlands)
• VARTEC (Belgium)

Page 17: Procedures for Data Integration through CIDOC CRM.pdf

Tools developed

AMA
• Current release
• New online application
• Future development
• Software release after EPOCH

MAD
• Review release
• Development after review
• Future development
• Software release after EPOCH

• Complementary tools

Page 18: Procedures for Data Integration through CIDOC CRM.pdf

AMA: Archive Mapper for Archaeology

Creation of an Open Source tool for mapping existing archaeological datasets to CIDOC-CRM compliant structures

Enhance data interchange and interoperability between existing and future repositories

http://www.epoch-net.org/AMA/

Page 19: Procedures for Data Integration through CIDOC CRM.pdf

AMA: the new online application

A new and powerful online version of the application is under development (PHP)

Beta version available at http://ama.ilbello.com


Page 20: Procedures for Data Integration through CIDOC CRM.pdf

AMA: new version features

Possibility to upload both the starting schema and the target ontology for any kind of mapping (not only towards CIDOC-CRM)

Ontology-to-ontology mapping capabilities

Advanced mapping features for complex mapping definitions (i.e. creation of new entities and properties to represent implicit elements and relations)

Creation of shortcuts to simplify the mapping process

Generation of mapping templates to be applied to information stored in databases in order to obtain fully converted semantic archives

Graphic visualization of simple and complex relations obtained during the mapping process
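
The idea of creating new entities for implicit elements can be illustrated with a small sketch. It is only an assumption about how such a conversion might look, not the AMA code: the `form` table, its columns and the `ex:` identifiers are hypothetical, while the CIDOC-CRM names (E31 Document, E65 Creation, E21 Person, E27 Site, P94, P14, P70) are real classes and properties chosen for illustration. The table has no column for the act of compiling a form, so the conversion introduces that event explicitly.

```python
import sqlite3

# Hypothetical legacy table: each row is an excavation form.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE form (id INTEGER PRIMARY KEY, site TEXT, compiler TEXT)")
conn.executemany(
    "INSERT INTO form (id, site, compiler) VALUES (?, ?, ?)",
    [(1, "Cuma", "A. Rossi"), (2, "Cuma", "B. Bianchi")],
)


def row_to_triples(row_id, site, compiler):
    """Map one row, creating an event node that the table only implies."""
    doc = f"ex:form/{row_id}"
    event = f"ex:form/{row_id}/compilation"   # implicit entity, not a column
    person = f"ex:person/{compiler.replace(' ', '_')}"
    place = f"ex:site/{site}"
    return [
        (doc, "rdf:type", "crm:E31_Document"),
        (event, "rdf:type", "crm:E65_Creation"),
        (event, "crm:P94_has_created", doc),
        (event, "crm:P14_carried_out_by", person),
        (person, "rdf:type", "crm:E21_Person"),
        (doc, "crm:P70_documents", place),
        (place, "rdf:type", "crm:E27_Site"),
    ]


triples = []
for row in conn.execute("SELECT id, site, compiler FROM form ORDER BY id"):
    triples.extend(row_to_triples(*row))
conn.close()

for triple in triples:
    print(triple)
```

A "shortcut" in this picture would be a single rule that hides the event node from the person doing the mapping while still emitting it in the output.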

Page 21: Procedures for Data Integration through CIDOC CRM.pdf

AMA: forthcoming developments

Implementation of a high-level mapping language to describe the mappings (under development by Martin Doerr's team in Crete)

Implementation and archiving of mapping templates to be used as mapping starting points

Integration with MAD and other tools for automatic data extraction and storage in a semantic container

Page 22: Procedures for Data Integration through CIDOC CRM.pdf

AMA releases after EPOCH

2 versions: an online application and a stand-alone application

GPL licenses

Supported formats: all XML-compliant documents, including semantic grammars (RDF and OWL) and schema formats (XSD/RDFS)
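
Since XSD is listed among the supported schema formats, the following sketch shows one way a starting schema could be read to obtain the list of elements to be mapped. It uses only the Python standard library; the sample schema and the `list_schema_elements` helper are hypothetical and do not reproduce how AMA loads schemas.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical starting schema, inlined so the example is self-contained.
XSD_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="form">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="site" type="xs:string"/>
        <xs:element name="compiler" type="xs:string"/>
        <xs:element name="year" type="xs:gYear"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def list_schema_elements(xsd_text):
    """Return (name, type) for every xs:element declared in the schema."""
    root = ET.fromstring(xsd_text.strip())
    return [(el.get("name"), el.get("type")) for el in root.iter(XS + "element")]


elements = list_schema_elements(XSD_TEXT)
for name, xsd_type in elements:
    print(name, xsd_type)
```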

Page 23: Procedures for Data Integration through CIDOC CRM.pdf

CIDOC-CRM

The most central classes and properties:


Page 24: Procedures for Data Integration through CIDOC CRM.pdf

TEI guidelines

A set of guidelines for XML encoding of cultural heritage texts. There are three primary functions of the TEI guidelines:

• guidance for individual or local practice in text creation and data capture;
• support of data interchange;
• support of application-independent local processing.

Page 25: Procedures for Data Integration through CIDOC CRM.pdf

From text to CIDOC-CRM

• Free text documents contain useful information
• This information is more useful in a computerised environment if it can be extracted into a well defined format
• This is impossible to do correctly by automatic tools
• This is very time consuming to do manually
• Therefore: A semi-automatic approach

Page 26: Procedures for Data Integration through CIDOC CRM.pdf

Amatexttool

• Developed by EDD, University of Oslo
• Based on 14 years' experience of XML content markup of texts
• Tool for semi-automatic semantic enrichment of XML markup
• Creates CIDOC-CRM compliant markup
• Information stored in XML

Page 27: Procedures for Data Integration through CIDOC CRM.pdf

What to encode in a text

• Reference to entities (e.g. person names)
• Co-reference in the text (e.g. "he" refers to the same person as the name on the previous line)
• Relations between entities (e.g. family relations between persons)
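
As an illustration of these three kinds of markup, the sketch below builds a small TEI-style fragment with the Python standard library. The markup actually produced by the tool is not shown in these slides, so the choice of the TEI elements `persName`, `rs` and `relation`, as well as the sample sentence and identifiers, are assumptions.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Reference to entities: person names get an identifier.
p = ET.Element("p")
father = ET.SubElement(p, "persName", {XML_ID: "p1"})
father.text = "Ole Olsen"
father.tail = " excavated the mound. Later "

# Co-reference: "he" points back to the person identified as p1.
pronoun = ET.SubElement(p, "rs", {"ref": "#p1"})
pronoun.text = "he"
pronoun.tail = " sent the finds to his daughter "

daughter = ET.SubElement(p, "persName", {XML_ID: "p2"})
daughter.text = "Anna"
daughter.tail = "."

# Relation between entities: p1 is a parent of p2.
relation = ET.Element(
    "relation", {"name": "parent", "active": "#p1", "passive": "#p2"}
)

paragraph_xml = ET.tostring(p, encoding="unicode")
relation_xml = ET.tostring(relation, encoding="unicode")
print(paragraph_xml)
print(relation_xml)
```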

Page 28: Procedures for Data Integration through CIDOC CRM.pdf

Search—choose—tag

• In the tool, a document can be searched for patterns
• The result list is a KWIC (keyword in context) concordance
• All or some of the results can be chosen for tagging
• Iterative process: semantic bootstrapping
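
The search, choose and tag cycle can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tool's code: it works on plain text rather than on existing XML markup, and the `kwic` and `tag_hits` helpers, the sample sentence and the `persName` element are hypothetical choices.

```python
import re


def kwic(text, pattern, width=20):
    """Return keyword-in-context hits as (start, end, left, keyword, right)."""
    hits = []
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((m.start(), m.end(), left, m.group(0), right))
    return hits


def tag_hits(text, chosen, element):
    """Wrap the chosen hits in <element>...</element>, working right to left
    so that earlier character offsets stay valid."""
    for start, end, *_ in sorted(chosen, reverse=True):
        text = f"{text[:start]}<{element}>{text[start:end]}</{element}>{text[end:]}"
    return text


sample = "Mr. Olsen found the urn. Later Mr. Berg drew it, and Mr. Olsen signed."

# Search: every "Mr. <Name>" with its context.
hits = kwic(sample, r"Mr\. [A-Z][a-z]+")

# Choose: here the selection is hard-coded; in a tool the user would pick.
chosen = [h for h in hits if h[3] == "Mr. Olsen"]

# Tag: only the chosen hits are marked up.
tagged = tag_hits(sample, chosen, "persName")
print(tagged)
```

Repeating the cycle on the tagged text, with patterns that take the new tags into account, is what makes the process iterative.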

Page 29: Procedures for Data Integration through CIDOC CRM.pdf

Amatexttool example


Page 30: Procedures for Data Integration through CIDOC CRM.pdf

MAD: Managing archaeological data

Application designed to manage structured and unstructured archaeological excavation datasets encoded in XML

Developed using Open Source technologies and entirely based on XML and W3C standards

The core: eXist Native XML Database (http://exist-db.org)

Fully XPath/XQuery and SPARQL aware

Features dynamic XSLT transformation and presentation of documents and query results

CIDOC-CRM ontology for semantic data encoding

Page 31: Procedures for Data Integration through CIDOC CRM.pdf

Data structure of MAD

MAD

• XML documents indexed and stored in a file-system-like structure of folders and subfolders
• XPath and XQuery to query XML documents
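
A rough sketch of this storage and query model follows. It does not use eXist: the collection is a plain Python dictionary whose keys imitate folder paths, the documents and element names are hypothetical, and the query relies on the limited XPath subset of the standard library's `xml.etree.ElementTree`, not on full XPath or XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: the keys mimic a folder/subfolder layout.
collection = {
    "/db/cuma/us/us001.xml":
        "<unit id='US001'><type>layer</type><period>Roman</period></unit>",
    "/db/cuma/us/us002.xml":
        "<unit id='US002'><type>wall</type><period>Greek</period></unit>",
    "/db/cuma/finds/f001.xml":
        "<find id='F001' unit='US001'><material>pottery</material></find>",
}


def query(collection, folder, xpath):
    """Run an XPath expression on every document stored under `folder`."""
    results = []
    for path in sorted(collection):
        if not path.startswith(folder):
            continue
        root = ET.fromstring(collection[path])
        # Wrap the root so that the expression can match the root element too.
        wrapper = ET.Element("doc")
        wrapper.append(root)
        results.extend((path, el) for el in wrapper.findall(xpath))
    return results


roman_units = query(collection, "/db/cuma/us/", "./unit[period='Roman']")
for path, element in roman_units:
    print(path, element.get("id"))
```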

Page 32: Procedures for Data Integration through CIDOC CRM.pdf

MAD: Testing the application

First release of MAD tested on the EPOCH Partners’ XML database (http://partners.epoch-net.org:8080/exist/partners/index.xml)

... and on an XML database of relevant European and Italian projects (http://www.epoch-net.org:8080/exist/projects/index.xml)

Page 33: Procedures for Data Integration through CIDOC CRM.pdf

MAD: The SAD extension

Implementation of the RDF language

Full support for the SPARQL and RQL query languages

Semantic Browser: intuitive set of interfaces to navigate semantic models

Full CIDOC-CRM compliance

Presented at the last EPOCH review in July 2007
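
To show what querying RDF data involves, here is a minimal pure-Python stand-in for a SPARQL basic graph pattern. It is not the SAD implementation and does not use a SPARQL engine: the `match` and `select` helpers and the `ex:` resources are hypothetical, and only conjunctions of triple patterns are handled.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def select(patterns, triples):
    """Evaluate a conjunction of triple patterns over a list of triples."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


graph = [
    ("ex:event1", "rdf:type", "crm:E7_Activity"),
    ("ex:event1", "crm:P14_carried_out_by", "ex:olsen"),
    ("ex:olsen", "rdf:type", "crm:E21_Person"),
    ("ex:event2", "crm:P14_carried_out_by", "ex:museum"),
    ("ex:museum", "rdf:type", "crm:E74_Group"),
]

# Roughly equivalent SPARQL:
#   SELECT ?event ?actor WHERE {
#     ?event crm:P14_carried_out_by ?actor .
#     ?actor rdf:type crm:E21_Person .
#   }
results = select(
    [
        ("?event", "crm:P14_carried_out_by", "?actor"),
        ("?actor", "rdf:type", "crm:E21_Person"),
    ],
    graph,
)
print(results)
```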

Page 34: Procedures for Data Integration through CIDOC CRM.pdf

MAD: Real Life applications

Archaeological excavation dataset of Cuma, containing information on stratigraphic units and other related resources, converted from Syslat (HyperCard) and managed using MAD

The MAD framework was used to create an online application for the complete management of coin collections for the COINS Project (http://www.coins-project.eu)

Some features of MAD will be used to create a wide XML repository of ancient Iberian pottery (University of Jaen - Spain)

Page 35: Procedures for Data Integration through CIDOC CRM.pdf

MAD: The future

Full integration of GML geospatial features within the CIDOC-CRM framework

The native XML support of MAD can be used in the field of digital preservation for setting up annotation repositories and creating co-reference resolution services

Possible COLLADA/X3D support for 3D objects represented in XML-based formats

AJAX integration and XForms support for easy web application development and complex XML information deployment and integration

Development of interfaces to query complex semantic data in a visual and user-friendly way

Page 36: Procedures for Data Integration through CIDOC CRM.pdf

MAD releases

2 versions: an online application and a downloadable version to be used as a stand-alone application or integrated in a client-server context (i.e. using distributed archives)

GPL License

Supported formats: all XML-compliant documents, including semantic grammars (RDF and OWL), graphic formats (SVG), geographic formats (GML) and 3D formats (COLLADA/X3D)