INFuture 2007, Zagreb
Towards a Digital Edition of the Slovenian Biographical Lexicon
Petra Vide Ogrin
Slovenian Academy of Sciences and Arts, Library
Tomaž Erjavec
Department of Knowledge Technologies, Jožef Stefan Institute
Overview of the talk
SBL (publication, nature, significance)
Methodology:
  TEI P5
  up-conversion into TEI-XML format
Example of TEI-XML article structure:
  skeleton
  actual XML document
Future plans: implementation of IR system
SBL
15 volumes + index, published over a long period of time (1925-1991)
Who is included? Notable figures important for Slovenian cultural life, from the beginnings up to the contemporary time
  criteria
Covers 5,031 biographical entries, over 5,100 persons
Data in the articles are checked against the relevant primary material sources
Methodology of encoding
Use of open standards and software
Use of TEI P5 Guidelines
Up-conversion from OCR source into TEI-XML
Down-conversion into XHTML
(Implementation of DL open source software → full-text and advanced searching)
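The down-conversion step can be illustrated with a minimal sketch, assuming a toy mapping (TEI <head> to XHTML <h2>, TEI <p> to XHTML <p>). The sample article, the mapping, and the function name tei_div_to_xhtml are hypothetical; the project's own stylesheets are not shown in the talk, and Python's standard library is used here only for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample article, not taken from the SBL.
SAMPLE = f"""<div xmlns="{TEI_NS}">
  <head>Example, Janez</head>
  <p>First paragraph of the article.</p>
  <p>Second paragraph of the article.</p>
</div>"""


def tei_div_to_xhtml(tei_xml: str) -> str:
    """Convert one TEI <div> into an XHTML <div> with <h2> and <p> children."""
    tei_div = ET.fromstring(tei_xml)
    out = ET.Element(f"{{{XHTML_NS}}}div")
    for child in tei_div:
        if child.tag == f"{{{TEI_NS}}}head":
            target = ET.SubElement(out, f"{{{XHTML_NS}}}h2")
        elif child.tag == f"{{{TEI_NS}}}p":
            target = ET.SubElement(out, f"{{{XHTML_NS}}}p")
        else:
            continue  # elements outside this toy mapping are skipped
        # Flatten any inline markup to plain text (an assumed simplification).
        target.text = "".join(child.itertext())
    # Serialize XHTML elements without a namespace prefix.
    ET.register_namespace("", XHTML_NS)
    return ET.tostring(out, encoding="unicode")


xhtml = tei_div_to_xhtml(SAMPLE)
print(xhtml)
```

A real down-conversion would also have to carry over inline annotations (names, dates, abbreviations) rather than flatten them.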
TEI – Text Encoding Initiative
What's TEI?
Why do we encode?
  to make explicit (to a machine) what is implicit (to a person)
  to add value by supplying annotations (structural metadata)
  to facilitate re-use of the same material
XML (eXtensible Markup Language):
  international standard
  application-, platform- and vendor-independent
  extensible
TEI P5
no backward compatibility with P4 – new possibilities for text encoding
validation of an XML document: checking against an XML schema
an XML schema (XML syntax) = project-specific combination of TEI modules
extension and generalization of modular system
interoperability and standards (ISO, W3C: Unicode, lang → xml:lang, id → xml:id)
some new elements, e.g. for biographical and prosopographical data → relevant for SBL project
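The switch to xml:id and xml:lang means these attributes live in the standard XML namespace, so generic XML tools can read them without TEI-specific knowledge. The sketch below assumes a hypothetical fragment (the identifier "sbl-example-1" and the helper read_xml_attributes are invented for illustration). It only checks well-formedness and reads the two attributes; it does not validate against a TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Namespace that the reserved xml: prefix is always bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical fragment; the identifier and the text are invented.
SAMPLE = f"""<div xmlns="{TEI_NS}" xml:id="sbl-example-1" xml:lang="sl">
  <p>Besedilo članka.</p>
</div>"""


def read_xml_attributes(tei_xml: str) -> dict:
    """Parse the fragment (raises ParseError if not well-formed)
    and return the xml:id and xml:lang of the root element."""
    root = ET.fromstring(tei_xml)
    return {
        "id": root.get(f"{{{XML_NS}}}id"),
        "lang": root.get(f"{{{XML_NS}}}lang"),
    }


attrs = read_xml_attributes(SAMPLE)
print(attrs)
```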
Up-conversion into TEI-XML
OpenOffice – TEI OO package (XSLT stylesheets) → TEI-XML document (basic structure)
(semi-)automatic encoding – to achieve the needed structure:
  Perl, XSLT
  manual intervention (correction)
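One pass of the (semi-)automatic encoding might look like the sketch below, which wraps four-digit years in a TEI <date when="..."> element. This is an assumption made for illustration: the talk names Perl and XSLT as the actual tools, and the pattern, the sample sentence, and the function name tag_years are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Four-digit numbers from 1000 to 2099 are treated as years (a crude heuristic).
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def tag_years(plain_text: str) -> str:
    """Escape XML special characters, then wrap each year in <date when="...">."""
    escaped = escape(plain_text)
    return YEAR.sub(r'<date when="\1">\1</date>', escaped)


# Hypothetical sentence, not taken from the SBL.
sample = "Born 1850 in Ljubljana & died 1912."
tagged = tag_years(sample)
print(f"<p>{tagged}</p>")
```

A heuristic like this also tags page numbers or other four-digit figures, which is why a manual correction step follows the automatic pass.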
An SBL article
Typical structure:
  biographical entry
  biography: data about birth, death, residence, occupation, important events (marriage, ordination etc.)
  representative bibliography that depicts a person's life and work
One or more paragraphs
Encyclopaedic style: dense language, many abbreviations (bibliography, authors, general: e.g. months (Sept.) etc.)
Article TEI-XML structure
<div>
  <listPerson>
    <person>
      <!--other elements for biographical data: birth, death, occupation ...-->
    </person>
  </listPerson>
  <p>
    <!--the annotated text of the article-->
  </p>
</div>
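To show how structured data could be read back out of an article encoded along these lines, here is a minimal sketch. The person, dates, and occupation are invented, the choice of <persName>, <birth>, <death>, and <occupation> children is an assumption about how the skeleton would be filled in, and summarize_person is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical article following the skeleton; all data are invented.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <listPerson>
    <person>
      <persName>Example, Janez</persName>
      <birth when="1850">1850, Ljubljana</birth>
      <death when="1912">1912, Ljubljana</death>
      <occupation>teacher</occupation>
    </person>
  </listPerson>
  <p>The annotated text of the article.</p>
</div>"""


def summarize_person(article_xml: str) -> dict:
    """Return name, birth year, death year, and occupation of the first <person>."""
    root = ET.fromstring(article_xml)
    person = root.find("tei:listPerson/tei:person", NS)

    def child(name):
        return person.find(f"tei:{name}", NS)

    def text_of(name):
        el = child(name)
        return el.text if el is not None else None

    def when_of(name):
        el = child(name)
        return el.get("when") if el is not None else None

    return {
        "name": text_of("persName"),
        "born": when_of("birth"),
        "died": when_of("death"),
        "occupation": text_of("occupation"),
    }


summary = summarize_person(SAMPLE)
print(summary)
```

Keeping the biographical data in <person> separate from the running text in <p> is what makes this kind of lookup possible without parsing the prose.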
Future plans
Implementation of an IR system – for full-text and advanced searching
Possible adoption of PhiloLogic
Exploring automatic recognition, extraction and encoding of data