TEI, CIDOC-CRM and a Possible Interface between the Two?
Øyvind Eide & Christian-Emil Ore
Unit for Digital Documentation, University of Oslo, Norway
The CIDOC Conceptual Reference Model (cidoc.ics.forth.gr)
• What is the CIDOC CRM?
  – An object-oriented ontology developed by ICOM-CIDOC, 1996-2005
  – Accepted as ISO 21127 in June 2005
  – About 80 classes and 130 properties for cultural and natural history
  – CRM instances can be encoded in many forms: RDBMS, ooDBMS, XML, RDF(S), OWL
• What is the CIDOC CRM for?
  – Intellectual guide to create schemata, formats, profiles
  – Best practice guide
  – A language for analysis of existing sources and models for data integration (mapping)
  – Transportation format for data integration / migration / Internet
• Ongoing activities
  – CRM-Core
  – Harmonisation with the object-oriented version of FRBR (Functional Requirements for Bibliographic Records, IFLA); first version will be published in fall 2006
  – Extension of CRM with a categorical level, e.g. reoccurring events
The CIDOC CRM Top-level Classes relevant for Integration
[Figure: the top-level classes E39 Actors (persons, institutions), E55 Types, E28 Conceptual Objects, E18 Physical Things, E2 Temporal Entities (events), E41 Appellations, E53 Places and E52 Time-Spans, connected by the relations "participate in", "affect or refer to", "refer to / refine", "refer to / identify", "have location", "within" and "at".]
CIDOC CRM: Class hierarchy
CIDOC CRM: Events
CIDOC CRM: Things and Conceptual object
Original text (text witness)
Step 1: registration → Bibliographical record
Step 2: reproduction → Facsimile
Step 3: transcription → Text with XML mark-up: 1. structural mark-up (2. lemmatization etc.)
Step 4: content mark-up → Text with XML mark-up: information elements identified and marked up according to a simple information model (DTD)
Museum database: artefacts, excavations, referential information
Event/object oriented model (CIDOC-CRM compatible)
Motivation: Grey literature in Museums
Catalogue entry
8. Malayan dagger, taken from pirates of the Indian Oceans.
Beautiful handle, graven as a human figure above waistline. Snake winded blade. VII, IX, p, 2. Daa,O., 99.
Donated April 11 1856 from Captain Teiste.
Motivation: Grey literature in Museums
Catalogue entry with mark up
<NRPAR> <CATNR NRID="EM8">8</CATNR>.
<ARTIFDATA> <PROD><USE><PEOPLE><PLACE>Malayan</PLACE></PEOPLE></USE></PROD> <ARTIFACT>dagger</ARTIFACT>, <AQUISITION>taken from <AQUFROM>pirates</AQUFROM> of the Indian Oceans.</AQUISITION>
<DESCR>Beautiful handle, graven as a human figure above waistline. Snake winded blade. <LIT_REF>VII, IX, p, 2. Daa,O., 99.</LIT_REF></DESCR>
<AQUISITION>Donated <AQUTIME>April 11 1856</AQUTIME> from <AQUFROM>Captain Teiste</AQUFROM>.</AQUISITION> </ARTIFDATA> </NRPAR>
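As a rough illustration of why this kind of content mark-up is useful, the sketch below reads the marked-up catalogue entry and pulls the tagged information elements into a flat record. It is only a sketch: the element names come from the slide, DESCR is closed as </DESCR> so that the fragment parses, and the record layout (catalogue_id, acquired_from and so on) is a hypothetical choice made for this example.

```python
import xml.etree.ElementTree as ET

# Catalogue entry from the slide; <DESCR> is closed with </DESCR> so it is well-formed.
ENTRY = """<NRPAR><CATNR NRID="EM8">8</CATNR>.
<ARTIFDATA><PROD><USE><PEOPLE><PLACE>Malayan</PLACE></PEOPLE></USE></PROD>
<ARTIFACT>dagger</ARTIFACT>,
<AQUISITION>taken from <AQUFROM>pirates</AQUFROM> of the Indian Oceans.</AQUISITION>
<DESCR>Beautiful handle, graven as a human figure above waistline. Snake winded blade.
<LIT_REF>VII, IX, p, 2. Daa,O., 99.</LIT_REF></DESCR>
<AQUISITION>Donated <AQUTIME>April 11 1856</AQUTIME> from
<AQUFROM>Captain Teiste</AQUFROM>.</AQUISITION></ARTIFDATA></NRPAR>"""


def text_of(element):
    """Return all text inside an element with whitespace normalised."""
    return " ".join("".join(element.itertext()).split())


def extract_record(xml_source):
    """Collect the tagged information elements into a flat record (hypothetical layout)."""
    root = ET.fromstring(xml_source)
    return {
        "catalogue_id": root.find("CATNR").get("NRID"),
        "artefact": text_of(root.find(".//ARTIFACT")),
        "place": text_of(root.find(".//PLACE")),
        # iter() walks the tree in document order, so both acquisitions are kept.
        "acquired_from": [text_of(e) for e in root.iter("AQUFROM")],
        "acquisition_dates": [text_of(e) for e in root.iter("AQUTIME")],
        "literature": [text_of(e) for e in root.iter("LIT_REF")],
    }


record = extract_record(ENTRY)
for key, value in record.items():
    print(key, "=", value)
```

The two AQUISITION elements stay distinct in the output, which is the kind of event-level information a museum database or a CRM-compatible model would want to keep apart.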
Motivation: Grey literature in Museums
The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces.
Motivation: Grey literature in Museums
[Figure: CIDOC-CRM graph of the sample text.
Nodes: E31 Document; E55 Type "Archaeological report"; E7 Activity; E55 Type "Archaeological excavation"; E21 Person (actor); E82 Actor appellation "Dr. Diggey"; E52 Time span; E50 Date "2005"; E53 Place; E44 Place appellation "Wasteland"; E11 Modification "Breaking of the sword"; E22 Man-Made object "Sword"; E82 Object identifier "C50435".
Property labels: P1 is identified by; P2 has type; P4 has time-span; P7 took place at; P9 forms part of; P12 was present at; P14 carried out by; P70 documents; P78 is identified by; P87 is identified by.]
The content of the text expressed in CIDOC-CRM
• Originally, a research project within the humanities
  – Founded in 1987-88
  – Sponsored by three professional associations
  – Funded 1990-1994 by US NEH, EU LE Programme et al.
• Major influences
  – digital libraries and text collections
  – language corpora
  – scholarly datasets
• International consortium established June 1999 (see http://www.tei-c.org/)
TEI - where did it come from?
Acc. to L. Burnard
• better interchange and integration of scholarly data
• support for all texts, in all languages, from all periods
• guidance for the perplexed: what to encode; hence, a user-driven codification of existing best practice
• assistance for the specialist: how to encode; hence, a loose framework into which unpredictable extensions can be fitted
• These apparently incompatible goals result in a highly flexible, modular environment
Goals of the TEI
Acc. to L. Burnard
• A set of recommendations for text encoding, covering both generic text structures and some highly specific areas, based on (but not limited by) existing practice
• A very large collection of element definitions (400+) with associated declarations for various schema languages
• A modular system for creating personalized schemas or DTDs from the foregoing
• For the full picture see http://www.tei-c.org/TEI/Guidelines/
TEI Deliverables
Acc. to L. Burnard
• a way of looking at what ‘text’ really is
• a codification of current scholarly practice
• (crucially) a set of shared assumptions about the digital agenda:
  – focus on content and function (rather than presentation)
  – identify generic solutions (rather than application-specific ones)
Legacy of the TEI
Acc. to L. Burnard
• Elements for detailed bibliographic description:
  – File description
    • Title statement
    • Edition statement
    • Extent statement
    • Publication statement
    • Series statement
    • Notes
    • Source description
      – bibliographic elements
      – (Manuscript description)
  – Encoding description
  – Profile description
  – Revision description
• Mapping to other metadata standards
  – MARC: discussed
  – Dublin Core: unfinished
TEI - the header
• Base Tag Set for Verse
• Performance Texts
• Transcription of Speech
• Print Dictionaries
• Manuscript description
• Linking and alignment; analysis
• Feature structures
• Certainty; physical transcription; textual criticism
• Names and dates
• Graphs, networks and trees
• Graphics, figures and tables
• Language Corpora
• Representation of non-standard characters and glyphs
• Feature System Declaration
TEI additional element sets
Some “ontological” elements in TEI: Events
• History
  – groups elements describing the full history of a manuscript or manuscript part
• Origin
  – contains any descriptive or other information concerning the origin of a manuscript or manuscript part
• CustEvent
  – describes a single event during the custodial history of a manuscript
• Provenance
  – contains any descriptive or other information concerning a single identifiable episode during the history of a manuscript or manuscript part, after its creation but before its acquisition
• Acquisition
  – contains any descriptive or other information concerning the process by which a manuscript or manuscript part entered the holding institution
• Event
  – any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication, e.g. “ceiling collapses” during a recorded interview
• persEvent
  – contains a description of a particular event of significance in the life of a person
• Birth, death
  – contains information about a person's birth/death, such as its date and place
• Date
  – contains a date in any format
• Occasion
  – a temporal expression (either a date or a time) given in terms of a named occasion such as a holiday, a named time of day, or some notable event
Some “ontological” elements in TEI: Events, time appellations
• Person
  – provides information about an identifiable individual, for example a participant in a language interaction, or a person referred to in a historical source
• Hand
  – used in the header to define each distinct scribe or handwriting style
• Author
  – in a bibliographic reference, contains the name of the author(s), personal or corporate, of a work; the primary statement of responsibility for any bibliographic item
• Name
  – (name, proper noun) contains a proper noun or noun phrase
Some “ontological” elements in TEI: Actors and appellations
<person xml:id="Ovi01" sex="1" role="poet">
  <persName xml:lang="en">Ovid</persName>
  <persName xml:lang="la">Publius Ovidius Naso</persName>
  <birth date="-0044-03-20"> 20 March 43 BC
    <placeName>
      <settlement type="city">Sulmona</settlement>
      <country reg="IT">Italy</country>
    </placeName>
  </birth>
  <death notBefore="17" notAfter="18"> 17 or 18 AD
    <placeName>
      <settlement type="city">Tomis (Constanta)</settlement>
      <country reg="RO">Romania</country>
    </placeName>
  </death>
</person>
Some “ontological” elements in TEI: Person example (from P5 guidelines)
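To make the "ontological" reading of this example concrete, here is a minimal sketch that reads the person element and separates the appellations from the birth and death events. It uses only Python's standard xml.etree.ElementTree; the function name person_facts and the shape of the returned dictionary are assumptions made for this illustration, not part of TEI or CIDOC-CRM.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared, so ElementTree expands xml:id / xml:lang to this namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

PERSON = """<person xml:id="Ovi01" sex="1" role="poet">
  <persName xml:lang="en">Ovid</persName>
  <persName xml:lang="la">Publius Ovidius Naso</persName>
  <birth date="-0044-03-20"> 20 March 43 BC
    <placeName><settlement type="city">Sulmona</settlement>
      <country reg="IT">Italy</country></placeName>
  </birth>
  <death notBefore="17" notAfter="18"> 17 or 18 AD
    <placeName><settlement type="city">Tomis (Constanta)</settlement>
      <country reg="RO">Romania</country></placeName>
  </death>
</person>"""


def person_facts(xml_source):
    """Split a TEI person element into names (appellations) and life events."""
    person = ET.fromstring(xml_source)
    birth = person.find("birth")
    death = person.find("death")
    return {
        "id": person.get(XML_NS + "id"),
        # One appellation per language code.
        "names": {n.get(XML_NS + "lang"): n.text for n in person.findall("persName")},
        "birth": {"date": birth.get("date"),
                  "place": birth.findtext("placeName/settlement")},
        "death": {"not_before": death.get("notBefore"),
                  "not_after": death.get("notAfter"),
                  "place": death.findtext("placeName/settlement")},
    }


facts = person_facts(PERSON)
print(facts)
```

The result keeps the person, the names, and the two events as separate items, which is roughly the distinction between actor, appellation and event that the slide title points at.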
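A simple extension of the TEI DTD
The root CIDOC-CRM element
<!ELEMENT crm (crmClass*, crmProperty*)>
<!ATTLIST crm id ID #REQUIRED>
The class element
<!ELEMENT crmClass (#PCDATA)>
<!ATTLIST crmClass
  id ID #REQUIRED
  className CDATA #REQUIRED>
The property element
<!ELEMENT crmProperty EMPTY>
<!ATTLIST crmProperty
  id ID #REQUIRED
  propName CDATA #REQUIRED
  from IDREF #REQUIRED
  to IDREF #REQUIRED>
<!-- #REQUIRED is an assumption: the slide gives only the attribute types (ID, CDATA, IDREF), not their defaults -->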
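The declarations above rely on ID and IDREF attributes to tie each property to the two classes it connects. Python's xml.etree.ElementTree does not validate against a DTD, so the sketch below checks the same constraints by hand: unique id values, from/to pointing at an existing id, and classes listed before properties. The function name check_crm and the wording of the messages are hypothetical; the small fragment uses identifiers in the style of the sample encoding.

```python
import xml.etree.ElementTree as ET

CRM_FRAGMENT = """<crm id="crm-mod1">
  <crmClass id="ent1" className="E7_Activity"></crmClass>
  <crmClass id="ent2" className="E55_Type">archaeological excavation</crmClass>
  <crmProperty id="prop1" propName="P2_has_type" from="ent1" to="ent2"/>
</crm>"""


def check_crm(xml_source):
    """Return a list of problems; an empty list means these checks passed."""
    root = ET.fromstring(xml_source)
    if root.tag != "crm":
        return ["root element is not <crm>"]
    problems = []
    ids = set()
    # ID semantics: every id value must be present and unique in the fragment.
    for el in [root] + list(root):
        ident = el.get("id")
        if ident is None:
            problems.append(f"<{el.tag}> has no id")
        elif ident in ids:
            problems.append(f"duplicate id '{ident}'")
        else:
            ids.add(ident)
    seen_property = False
    for child in root:
        if child.tag == "crmProperty":
            seen_property = True
            # IDREF semantics: from/to must name an id that exists.
            for end in ("from", "to"):
                if child.get(end) not in ids:
                    problems.append(
                        f"{child.get('id')}: {end}='{child.get(end)}' matches no id")
        elif child.tag == "crmClass":
            # Content model (crmClass*, crmProperty*): classes come first.
            if seen_property:
                problems.append(f"{child.get('id')}: crmClass after crmProperty")
        else:
            problems.append(f"unexpected element <{child.tag}>")
    return problems


print(check_crm(CRM_FRAGMENT))
```

A validating parser with the DTD loaded would report the same kinds of errors; the hand-written check is only meant to show what the ID/IDREF typing buys.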
The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces.
The sample text revisited
The text expressed with a TEI mark-up
<p id="p1">The <rs id="e1">excavation in
<name type="place" id="n1">Wasteland</name></rs> in
<date id="d1">2005</date> was performed by
<name type="person" id="n2">Dr. Diggey</name>. He had the misfortune of
<rs id="e2">breaking <rs id="o1">the beautiful sword
<rs id="o_id1">(C50435)</rs></rs> into 30 pieces
</rs>.</p>
<crm id="crm-mod1">
  <crmClass id="ent1" className="E7_Activity"></crmClass>
  <crmClass id="ent2" className="E55_Type">archaeological excavation</crmClass>
  <crmClass id="ent3" className="E21_Person"></crmClass>
  <crmClass id="ent4" className="E82_Actor_Appellation">Dr. Diggey</crmClass>
  <crmClass id="ent5" className="E31_Document"></crmClass>
  <crmClass id="ent6" className="E52_Time-span"></crmClass>
  <crmClass id="ent7" className="E50_Date">2005</crmClass>
  <crmClass id="ent8" className="E31_Document"></crmClass>
  …
  <crmProperty id="prop1" propName="P2_has_type" from="ent1" to="ent2"/>
  <crmProperty id="prop2" propName="P14_carried_out_by" from="ent1" to="ent3"/>
  <crmProperty id="prop3" propName="P131_is_identified_by" from="ent3" to="ent4"/>
  <crmProperty id="prop4" propName="P70_is_documented_in" from="ent1" to="ent8"/>
  <crmProperty id="prop5" propName="P70_is_documented_in" from="ent4" to="ent5"/>
  <crmProperty id="prop6" propName="P4_has_time_span" from="ent1" to="ent6"/>
  <crmProperty id="prop7" propName="P78_is_identified_by" from="ent6" to="ent7"/>
  …
</crm>
<linkGrp type="TEI-CRM interface">
  <link targets="#ent5 #n2"/>
  <link targets="#ent8 #e1"/>
  …
</linkGrp>
Encoding the information in an RDF-triplet fashion
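The following sketch shows one way the RDF-like encoding could be read back: each crmProperty becomes a (subject, property, object) triple, and each link in the linkGrp is followed from the CRM entity to the marked-up stretch of text. Only the elements listed on the slides are included and the elided parts are left out. The wrapping doc element and the function names are assumptions made so that the pieces can be parsed as one document.

```python
import xml.etree.ElementTree as ET

# TEI paragraph, crm block and linkGrp from the slides, wrapped in a hypothetical <doc> root.
SAMPLE = """<doc>
<p id="p1">The <rs id="e1">excavation in
<name type="place" id="n1">Wasteland</name></rs> in
<date id="d1">2005</date> was performed by
<name type="person" id="n2">Dr. Diggey</name>. He had the misfortune of
<rs id="e2">breaking <rs id="o1">the beautiful sword
<rs id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.</p>
<crm id="crm-mod1">
 <crmClass id="ent1" className="E7_Activity"></crmClass>
 <crmClass id="ent2" className="E55_Type">archaeological excavation</crmClass>
 <crmClass id="ent3" className="E21_Person"></crmClass>
 <crmClass id="ent4" className="E82_Actor_Appellation">Dr. Diggey</crmClass>
 <crmClass id="ent5" className="E31_Document"></crmClass>
 <crmClass id="ent6" className="E52_Time-span"></crmClass>
 <crmClass id="ent7" className="E50_Date">2005</crmClass>
 <crmClass id="ent8" className="E31_Document"></crmClass>
 <crmProperty id="prop1" propName="P2_has_type" from="ent1" to="ent2"/>
 <crmProperty id="prop2" propName="P14_carried_out_by" from="ent1" to="ent3"/>
 <crmProperty id="prop3" propName="P131_is_identified_by" from="ent3" to="ent4"/>
 <crmProperty id="prop4" propName="P70_is_documented_in" from="ent1" to="ent8"/>
 <crmProperty id="prop5" propName="P70_is_documented_in" from="ent4" to="ent5"/>
 <crmProperty id="prop6" propName="P4_has_time_span" from="ent1" to="ent6"/>
 <crmProperty id="prop7" propName="P78_is_identified_by" from="ent6" to="ent7"/>
</crm>
<linkGrp type="TEI-CRM interface">
 <link targets="#ent5 #n2"/>
 <link targets="#ent8 #e1"/>
</linkGrp>
</doc>"""


def load(xml_source):
    """Parse the document and index every element that carries an id."""
    root = ET.fromstring(xml_source)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    return root, by_id


def triples(root, by_id):
    """Turn each crmProperty into (subject class, property name, object class)."""
    return [(by_id[p.get("from")].get("className"),
             p.get("propName"),
             by_id[p.get("to")].get("className"))
            for p in root.iter("crmProperty")]


def linked_text(root, by_id):
    """Map each linked CRM entity id to the text of the TEI element it points at."""
    result = {}
    for link in root.iter("link"):
        ent_ref, tei_ref = (t.lstrip("#") for t in link.get("targets").split())
        result[ent_ref] = " ".join("".join(by_id[tei_ref].itertext()).split())
    return result


root, by_id = load(SAMPLE)
for triple in triples(root, by_id):
    print(triple)
print(linked_text(root, by_id))
```

Keeping the triples in a block separate from the running text, with the linkGrp as the only bridge, means the TEI paragraph itself needs no CRM-specific mark-up beyond ordinary id attributes.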
CRM-Core – a dtd for encoding information [suggested by CRM-SIG]
<CRM_Core>
  <Category>E31 Document</Category>
  <Classification>Archaeological report</Classification>
  <Identification>Wasteland excavation 2005 report</Identification>
  <Event>
    <Role_in_Event>P70_documents</Role_in_Event>
    <Identification>Wasteland_2005_excavation</Identification>
    <Event_Type>E7_Activity</Event_Type>
    <Participant>Dr. Diggey</Participant>
    <Participant_Type>excavator</Participant_Type>
    <Thing_Present>C50435 sword</Thing_Present>
    <Date>2005</Date>
    <Place>Wasteland</Place>
  </Event>
  <Event>
    <Role_in_Event>P70_documents</Role_in_Event>
    <Identification>damage_to_artifact_C50435</Identification>
    <Event_Type>E11_Modification</Event_Type>
    <Participant>Dr. Diggey</Participant>
    <Participant_Type>excavator</Participant_Type>
    <Thing_Present>C50435 sword</Thing_Present>
    <RelatedEvent>
      <Role_in_Event>P9_forms_part_of</Role_in_Event>
      <Identification>Wasteland_2005_excavation</Identification>
    </RelatedEvent>
  </Event>
</CRM_Core>
Encoding the information in CRM Core (Factoids)
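A CRM Core record of this shape can be read as a list of events attached to one described item. The sketch below does that for the document record, using only the element names that appear in it; the function name events and the dictionary keys are hypothetical, and nothing here is taken from an official CRM Core tool.

```python
import xml.etree.ElementTree as ET

RECORD = """<CRM_Core>
  <Category>E31 Document</Category>
  <Classification>Archaeological report</Classification>
  <Identification>Wasteland excavation 2005 report</Identification>
  <Event>
    <Role_in_Event>P70_documents</Role_in_Event>
    <Identification>Wasteland_2005_excavation</Identification>
    <Event_Type>E7_Activity</Event_Type>
    <Participant>Dr. Diggey</Participant>
    <Participant_Type>excavator</Participant_Type>
    <Thing_Present>C50435 sword</Thing_Present>
    <Date>2005</Date><Place>Wasteland</Place>
  </Event>
  <Event>
    <Role_in_Event>P70_documents</Role_in_Event>
    <Identification>damage_to_artifact_C50435</Identification>
    <Event_Type>E11_Modification</Event_Type>
    <Participant>Dr. Diggey</Participant>
    <Participant_Type>excavator</Participant_Type>
    <Thing_Present>C50435 sword</Thing_Present>
    <RelatedEvent>
      <Role_in_Event>P9_forms_part_of</Role_in_Event>
      <Identification>Wasteland_2005_excavation</Identification>
    </RelatedEvent>
  </Event>
</CRM_Core>"""


def events(xml_source):
    """List the events of one CRM Core record as plain dictionaries."""
    root = ET.fromstring(xml_source)
    result = []
    for ev in root.findall("Event"):
        related = ev.find("RelatedEvent")
        result.append({
            # findtext with a bare tag looks at direct children only, so the
            # Identification inside RelatedEvent is not picked up here.
            "id": ev.findtext("Identification"),
            "role": ev.findtext("Role_in_Event"),
            "type": ev.findtext("Event_Type"),
            "participants": [p.text for p in ev.findall("Participant")],
            "things": [t.text for t in ev.findall("Thing_Present")],
            "date": ev.findtext("Date"),
            "place": ev.findtext("Place"),
            "part_of": related.findtext("Identification") if related is not None else None,
        })
    return result


for event in events(RECORD):
    print(event)
```

The part_of value of the second event repeats the identifier of the first, which is how the record expresses that the breaking of the sword forms part of the excavation.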
<CRM_Core>
  <Category>E21 Person</Category>
  <Classification>archaeologist</Classification>
  <Identification>Dr. Diggey</Identification>
  <Event>
    <Role_in_Event>P14 carried out by</Role_in_Event>
    <Identification>damage_to_artifact_C50435</Identification>
    <Event_Type>E11 Modification</Event_Type>
    <Participant_Type>excavator</Participant_Type>
    <Thing_Present>C50435 sword</Thing_Present>
  </Event>
</CRM_Core>
<CRM_Core>
  <Category>E82 Actor appellation</Category>
  <Classification>formal name</Classification>
  <Identification>mention of name</Identification>
  <Relation>
    <To>Wasteland_excavation_2005_report#n2</To>
    <Relation_Type> <referred_to_by/> </Relation_Type>
  </Relation>
</CRM_Core>
Encoding the information in CRM Core (Factoids)
Conclusions and further work
• Possible now
  – TEI extended with an RDF-like CIDOC-CRM
  – TEI extended with CRM-Core records
• Future:
  – Make a mapping from TEI elements to CRM
  – Make a mapping from the TEI header into ooFRBR
  – Create an extension of the TEI definition
  – Write guidelines for CIDOC-CRM encoding of information in TEI documents
  – Convince the TEI users