Linguistic Analysis Customising the TEI Real World TEI
Talk 4: Linguistic Annotation; Customising the TEI; Real World TEI
James Cummings
19 September 2013
@jamescummings 1/81
Linguistic Analysis
associating simple analyses and interpretations with text elements
semantic or syntactic interpretations which an encoder wishes to attach to all or part of a text
mainly covering linguistic information
as often in the TEI, you can do the same thing in many ways:
  using generic <seg> (arbitrary segment: marks any phrase-level unit of text, including other <seg> elements) elements with @type attributes
  using the straightforward analyses described here
  using the more powerful and general TEI Feature Structures
<teiCorpus> reminder
Grouping documents into a corpus allows you to factor out the metadata they have in common:
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><!-- shared metadata --></teiHeader>
  <TEI>
    <teiHeader><!-- specific metadata --></teiHeader>
    <text><!-- ... --></text>
  </TEI>
  <TEI>
    <teiHeader><!-- specific metadata --></teiHeader>
    <text><!-- ... --></text>
  </TEI>
</teiCorpus>
<particDesc> example (1)
<particDesc> (participation description: describes the identifiable speakers, voices, or other participants in a linguistic interaction)
<particDesc xml:id="p2">
  <p>Female informant, well-educated, born in Shropshire UK, 12 Jan 1950, of unknown occupation. Speaks French fluently. Socio-Economic status B2 in the PEP classification scheme.</p>
</particDesc>
<particDesc> can just contain paragraphs of prose, or a more structured <person> element in <listPerson>
<particDesc> example (2)
<particDesc>
  <listPerson>
    <person xml:id="SL">
      <persName><forename>Stuart</forename><surname>Lee</surname></persName>
      <note><ref target="http://users.ox.ac.uk/~stuart/Site/About_Me.html">Stuart Lee's home page</ref></note>
      <!-- We could give more details about Stuart here -->
    </person>
    <person xml:id="IH">
      <persName><forename>Ian</forename><surname>Hislop</surname></persName>
      <note><ref target="http://en.wikipedia.org/wiki/Ian_Hislop">Ian Hislop's entry in Wikipedia</ref></note>
    </person>
  </listPerson>
</particDesc>
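Because each <person> carries an @xml:id, a @who value such as "#IH" elsewhere in a transcript can be resolved to the person it points to. A minimal sketch in Python, assuming the fragment is parsed without the TEI namespace (a real TEI file would need the namespace in every tag name); `person_index` is a hypothetical helper, not part of any TEI tool:

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the XML namespace URI
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The slide's <listPerson>, minus the notes and without the TEI namespace (a simplification)
PARTIC_DESC = """<particDesc><listPerson>
<person xml:id="SL"><persName><forename>Stuart</forename><surname>Lee</surname></persName></person>
<person xml:id="IH"><persName><forename>Ian</forename><surname>Hislop</surname></persName></person>
</listPerson></particDesc>"""


def person_index(xml_text):
    """Map '#id' (the form used in @who) to a display name built from <persName>."""
    index = {}
    for person in ET.fromstring(xml_text).iter("person"):
        forenames = [f.text for f in person.iter("forename")]
        surnames = [s.text for s in person.iter("surname")]
        index["#" + person.get(XML_ID)] = " ".join(forenames + surnames)
    return index


people = person_index(PARTIC_DESC)
print(people["#IH"])  # Ian Hislop
```

Keying the index on "#id" rather than the bare id lets a @who value be looked up directly, without stripping the pointer syntax first.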
Basic linguistic units
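To mark up text for linguistic purposes:
<s> (s-unit) contains a sentence-like division of a text
<cl> (clause) represents a grammatical clause
<phr> (phrase) represents a grammatical phrase
<w> (word) represents a grammatical (not necessarily orthographic) word
<m> (morpheme) represents a grammatical morpheme
<c> (character) represents a character
From the att.segLike class, these elements all have @type and @function attributes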
Example of linguistic markup
Compare
<u>Like a suck of one of my sweets?</u>
<u>No I don't take sweets from strangers, oh God</u>
with...
Another example of linguistic markup
<u who="PS1K5"><s n="5963"><w type="AV0">Like</w><w type="AT0">a</w><w type="NN1">suck</w><w type="PRF">of</w><w type="CRD">one</w><w type="PRF">of</w><w type="DPS">my</w><w type="NN2">sweets</w> ?</s>
</u><u trans="smooth" who="PS1BY"><s n="5964"><w type="ITJ">No </w><w type="PNP">I </w><w type="VDB">do</w><w type="XX0">n't </w><w type="VVI">take </w><w type="NN2">sweets </w><w type="PRP">from </w><w type="NN2">strangers</w><c type="PUN">, </c><w type="ITJ">oh </w><w type="NP0">God</w></s></u>
(from British National Corpus, KSV 5963)
Stand-off interpretation
When inline markup is inappropriate, the <span> (associates an interpretative annotation directly with a span of text) element can be used to make ad hoc remarks about bits of text, pointing to them by their @xml:id values. And <spanGrp> (collects together <span> elements) is available to group assertions together:
<sp>
  <speaker>CORNWALL</speaker>
  <ab xml:id="eye_start">Lest it see more, prevent it. Out, vile jelly!</ab>
  <ab>Where is thy lustre now?</ab>
</sp>
<sp>
  <speaker>GLOUCESTER</speaker>
  <ab>All dark and comfortless. Where's my son Edmund?</ab>
  <ab>Edmund, enkindle all the sparks of nature,</ab>
  <ab xml:id="eye_end">To quit this horrid act.</ab>
</sp>
<span from="#eye_start" to="#eye_end">the eye is pulled out</span>
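To see what @from and @to amount to, here is a rough sketch that collects every <ab> between the two targets, inclusive. It assumes both pointers name elements of the same kind and that the fragment has no TEI namespace; the wrapping <body> root and the function `resolve_spans` are hypothetical:

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The slide's fragment inside a hypothetical <body> root, without the TEI namespace
TEXT = """<body>
<sp><speaker>CORNWALL</speaker>
<ab xml:id="eye_start">Lest it see more, prevent it. Out, vile jelly!</ab>
<ab>Where is thy lustre now?</ab></sp>
<sp><speaker>GLOUCESTER</speaker>
<ab>All dark and comfortless. Where's my son Edmund?</ab>
<ab>Edmund, enkindle all the sparks of nature,</ab>
<ab xml:id="eye_end">To quit this horrid act.</ab></sp>
<span from="#eye_start" to="#eye_end">the eye is pulled out</span>
</body>"""


def resolve_spans(xml_text, unit="ab"):
    """Return (annotation, covered texts) for each <span>.

    Assumes @from/@to point at elements of one kind (`unit`) and treats a
    missing @to as "this element only".
    """
    root = ET.fromstring(xml_text)
    units = list(root.iter(unit))  # document order
    position = {u.get(XML_ID): i for i, u in enumerate(units) if u.get(XML_ID)}
    results = []
    for span in root.iter("span"):
        start = position[span.get("from").lstrip("#")]
        end = position[(span.get("to") or span.get("from")).lstrip("#")]
        covered = ["".join(u.itertext()) for u in units[start:end + 1]]
        results.append((span.text, covered))
    return results


for label, lines in resolve_spans(TEXT):
    print(f"{label}: {len(lines)} lines, ending {lines[-1]!r}")
```

Called with `unit="u"`, the same function should also handle the shop-dialogue <spanGrp> later in this talk, where single-utterance spans omit @to.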
Stand-off interpretation with <interp>
The <interp> (interpretation: summarizes a specific interpretative annotation which can be linked to a span of text) element is used to encode an interpretation. The global @ana attribute can point from the text to such an interpretation (or to a taxonomy classification):
<ab n="2Sam_12:14"><gap/>by this deed thou hast given great occasion to the enemies of the LORD to blaspheme, the child also that is born unto thee shall surely die.</ab>
<ab n="2Sam_12:15"><gap/>And the LORD struck the child that Uriah's wife bare unto David<gap/></ab>
<gap/>
<ab n="2Sam_12:18" ana="#infanticide">And it came to pass on the seventh day, that the child died.</ab>
<!-- elsewhere in document -->
<interp resp="#SAB" xml:id="infanticide">Infanticide: God seems to like killing children.</interp>
The <interpGrp> element is used to group interpretations together.
Interpretation example (1)
In this example:
  a set of possible interpretations is defined, using <interp> elements
  <seg> is used to mark up distinct portions of a narrative
  <s> is used to mark sentences
  the @ana attribute links sections or milestones to the appropriate interpretation
<interpGrp resp="#TMA" type="structuralUnit">
  <interp xml:id="INTRO">introduction</interp>
  <interp xml:id="CONFLICT">conflict</interp>
  <interp xml:id="CLIMAX">climax</interp>
  <interp xml:id="REVENGE">revenge</interp>
  <interp xml:id="RECONCIL">reconciliation</interp>
  <interp xml:id="AFTERM">aftermath</interp>
</interpGrp>
Interpretation example (2)
<p xml:id="PP1">
  <seg xml:id="SS1-SS3" ana="#INTRO">
    <s xml:id="SS1">Sigmund ... was a king in Frankish country.</s>
    <s xml:id="SS2">Sinfiotli was the eldest of his sons.</s>
    <s xml:id="SS3">Borghild, Sigmund's wife, had a brother ... </s>
  </seg>
  <s xml:id="SS4A" ana="#CONFLICT">But Sinfiotli ... wooed the same woman</s>
  <s xml:id="SS4B" ana="#CLIMAX">and Sinfiotli killed him over it.</s>
  <seg xml:id="SS5-SS17" ana="#CLIMAX">
    <s xml:id="SS5">And when he came home, ... she was obliged to accept it.</s>
    <s xml:id="SS6">At the funeral feast Borghild was serving beer.</s>
    <s xml:id="SS17">Sinfiotli drank it off and at once fell dead.</s>
  </seg>
</p>
<anchor xml:id="NIL1" ana="#RECONCIL"/>
<p xml:id="PP2">Sigmund carried him a long way in his arms ... </p>
<taxonomy> Example
<taxonomy xml:id="part-of-speech">
  <category xml:id="adje">
    <catDesc>adjectives</catDesc>
    <category xml:id="AJ0"><catDesc>adjective (unmarked) (e.g. GOOD, OLD)</catDesc></category>
    <category xml:id="AJC"><catDesc>comparative adjective (e.g. BETTER, OLDER)</catDesc></category>
    <category xml:id="AJS"><catDesc>superlative adjective (e.g. BEST, OLDEST)</catDesc></category>
  </category>
  <category xml:id="AT0"><catDesc>article (e.g. THE, A, AN)</catDesc></category>
  <!-- ... -->
</taxonomy>
<w ana="#AJ0">brilliant</w>
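Because <category> elements nest, an @ana pointer to "#AJ0" implicitly also places the word among the adjectives. A small sketch that recovers that path, assuming an abridged taxonomy without the TEI namespace (`category_paths` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged taxonomy from the slide, without the TEI namespace
TAXONOMY = """<taxonomy xml:id="part-of-speech">
<category xml:id="adje"><catDesc>adjectives</catDesc>
<category xml:id="AJ0"><catDesc>adjective (unmarked) (e.g. GOOD, OLD)</catDesc></category>
<category xml:id="AJC"><catDesc>comparative adjective (e.g. BETTER, OLDER)</catDesc></category>
</category>
<category xml:id="AT0"><catDesc>article (e.g. THE, A, AN)</catDesc></category>
</taxonomy>"""


def category_paths(xml_text):
    """Map '#id' to the catDesc labels from the outermost category inwards."""
    paths = {}

    def walk(node, trail):
        for cat in node.findall("category"):  # direct children only
            here = trail + [cat.findtext("catDesc")]
            paths["#" + cat.get(XML_ID)] = here
            walk(cat, here)

    walk(ET.fromstring(xml_text), [])
    return paths


paths = category_paths(TAXONOMY)
word = ET.fromstring('<w ana="#AJ0">brilliant</w>')
print(word.text, "->", " > ".join(paths[word.get("ana")]))
# brilliant -> adjectives > adjective (unmarked) (e.g. GOOD, OLD)
```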
Phrase segmentation
<s>
  <cl type="finite-declarative" function="independent">
    <phr type="NP" function="subject">It</phr>
    <phr type="VP" function="predicate">
      <phr type="V" function="verb-main">was</phr> also
      <phr type="NP" function="predicate-nom.">a crucial year for me</phr>
    </phr>
  </cl>
</s>
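One way to check nested <cl>/<phr> markup is to print it back as a labelled bracketing. This sketch assumes a namespace-free fragment and uses each element's @type as its label; `bracket` is a hypothetical helper, and @function is ignored for brevity:

```python
import xml.etree.ElementTree as ET

# The slide's sentence, without the TEI namespace
SENTENCE = """<s><cl type="finite-declarative" function="independent">
<phr type="NP" function="subject">It</phr>
<phr type="VP" function="predicate"><phr type="V" function="verb-main">was</phr> also
<phr type="NP" function="predicate-nom.">a crucial year for me</phr></phr>
</cl></s>"""


def bracket(elem):
    """Render <cl>/<phr> nesting as [LABEL ...], keeping text between children."""
    parts = []
    if elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        parts.append(bracket(child))
        if child.tail and child.tail.strip():  # text that follows the child
            parts.append(child.tail.strip())
    inner = " ".join(parts)
    label = elem.get("type")
    return f"[{label} {inner}]" if label else inner  # untyped <s> adds no brackets


tree = bracket(ET.fromstring(SENTENCE))
print(tree)
# [finite-declarative [NP It] [VP [V was] also [NP a crucial year for me]]]
```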
Words with lemmas and morphemes with types
<s xml:lang="la"><w lemma="timeo">timeo</w><w lemma="danaii">Danaos</w><w lemma="et">et</w><w lemma="donum">dona</w><w lemma="fero">ferentes</w></s>
or
<w type="adjective"><m type="prefix" baseForm="con">com</m><m type="root">fort</m><m type="suffix">able</m></w>
One could use @lemmaRef instead of @lemma to point to a word's lemma with a URI.
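Both kinds of annotation can be read back mechanically: @lemma gives a form-to-lemma list, and the <m> children should concatenate to the surface word. A sketch assuming namespace-free fragments; `lemmas` and `morphemes` are hypothetical helpers, and falling back to the surface form when @baseForm is absent is an assumption of this sketch:

```python
import xml.etree.ElementTree as ET

# The slide's two fragments, without the TEI namespace
LATIN = """<s xml:lang="la"><w lemma="timeo">timeo</w><w lemma="danaii">Danaos</w>
<w lemma="et">et</w><w lemma="donum">dona</w><w lemma="fero">ferentes</w></s>"""
WORD = """<w type="adjective"><m type="prefix" baseForm="con">com</m>
<m type="root">fort</m><m type="suffix">able</m></w>"""


def lemmas(xml_text):
    """Return (surface form, @lemma) pairs for each <w>."""
    return [(w.text, w.get("lemma")) for w in ET.fromstring(xml_text).iter("w")]


def morphemes(xml_text):
    """Return the joined surface word and (type, form, base form) triples."""
    word = ET.fromstring(xml_text)
    # Assumption: with no @baseForm, the base form is the surface form
    parts = [(m.get("type"), m.text, m.get("baseForm", m.text)) for m in word.findall("m")]
    return "".join(form for _, form, _ in parts), parts


print(lemmas(LATIN)[-1])   # ('ferentes', 'fero')
print(morphemes(WORD)[0])  # comfortable
```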
Nested <w>
<s><w>I</w><w><w>did</w><m>n't</m></w><w>do</w><w>it</w><c>.</c></s>
Word analysis
<s><w ana="#AT0">The</w><w ana="#NN1">victim</w><w ana="#POS">'s</w><w ana="#NN2">friends</w><w ana="#VVD">told</w><w ana="#NN2">police</w><w ana="#CJT">that</w><w ana="#NP0">Kruger</w><w ana="#VVD">drove</w><w ana="#PRP">into</w><w ana="#AT0">the</w><w ana="#NN1">quarry</w><w ana="#CJC">and</w><w ana="#AV0">never</w><w ana="#VVD">surfaced</w></s>
Interpretation
<interpGrp type="POS"><interp xml:id="AT0">Definite article</interp><interp xml:id="AV0">Adverb</interp><interp xml:id="CJC">Conjunction</interp><interp xml:id="CJT">Relative that</interp><interp xml:id="NN1">Noun singular</interp><interp xml:id="NN2">Noun plural</interp><interp xml:id="NP0">Proper noun</interp><interp xml:id="POS">Genitive marker</interp><interp xml:id="PRP">Preposition</interp><interp xml:id="VVD">Verb past tense</interp></interpGrp>
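With the <interpGrp> in place, the @ana values in the word analysis above can be turned into a readable tag profile. A sketch assuming namespace-free copies of these two fragments; `pos_profile` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The slides' <interpGrp> and tagged sentence, without the TEI namespace
INTERPS = """<interpGrp type="POS">
<interp xml:id="AT0">Definite article</interp><interp xml:id="AV0">Adverb</interp>
<interp xml:id="CJC">Conjunction</interp><interp xml:id="CJT">Relative that</interp>
<interp xml:id="NN1">Noun singular</interp><interp xml:id="NN2">Noun plural</interp>
<interp xml:id="NP0">Proper noun</interp><interp xml:id="POS">Genitive marker</interp>
<interp xml:id="PRP">Preposition</interp><interp xml:id="VVD">Verb past tense</interp>
</interpGrp>"""
SENTENCE = """<s><w ana="#AT0">The</w><w ana="#NN1">victim</w><w ana="#POS">'s</w>
<w ana="#NN2">friends</w><w ana="#VVD">told</w><w ana="#NN2">police</w><w ana="#CJT">that</w>
<w ana="#NP0">Kruger</w><w ana="#VVD">drove</w><w ana="#PRP">into</w><w ana="#AT0">the</w>
<w ana="#NN1">quarry</w><w ana="#CJC">and</w><w ana="#AV0">never</w><w ana="#VVD">surfaced</w></s>"""


def pos_profile(interp_xml, text_xml):
    """Count the @ana pointers on <w> and label each with its <interp> text."""
    labels = {"#" + i.get(XML_ID): i.text for i in ET.fromstring(interp_xml).iter("interp")}
    counts = Counter(w.get("ana") for w in ET.fromstring(text_xml).iter("w"))
    # Unknown pointers are kept as-is rather than raising an error
    return {labels.get(ptr, ptr): n for ptr, n in counts.most_common()}


profile = pos_profile(INTERPS, SENTENCE)
print(profile["Verb past tense"])  # 3
```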
More interpretation
<u xml:id="u1">Can I have ten oranges and a kilo of bananas please?</u><u xml:id="u2">Yes, anything else?</u><u xml:id="u3">No thanks.</u><u xml:id="u4">That'll be dollar forty.</u><u xml:id="u5">Two dollars</u><u xml:id="u6">Sixty, eighty, two dollars. Thank you.</u><spanGrp type="transactions"><span from="#u1">sale request</span><span from="#u2" to="#u3">sale compliance</span><span from="#u4">sale</span><span from="#u5">purchase</span><span from="#u6">purchase closure</span></spanGrp>
British National Corpus
a snapshot of British English, taken at the end of the 20th century
100 million words in approx. 4000 different text samples, both spoken (10%) and written (90%)
synchronic (1990-4), sampled, general-purpose corpus
available under licence; latest edition is BNC-XML (13 March 2007)
part-of-speech and lemma tagging
uses a variant of TEI XML originally called CDIF
BNC XML
<div level="1" n="1" type="leaflet"><head type="MAIN"><s n="1"><w c5="NN1" hw="factsheet" pos="SUBST">FACTSHEET</w><w c5="DTQ" hw="what" pos="PRON">WHAT</w><w c5="VBZ" hw="be" pos="VERB">IS</w><w c5="NN1" hw="aids" pos="SUBST">AIDS</w><c c5="PUN">?</c></s> </head><p><s n="2"><hi rend="bo"> <w c5="NN1" hw="aids" pos="SUBST">AIDS</w><c c5="PUL">(</c><w c5="VVN-AJ0" hw="acquire" pos="VERB">Acquired</w><w c5="AJ0" hw="immune" pos="ADJ">Immune</w><w c5="NN1" hw="deficiency" pos="SUBST">Deficiency</w><w c5="NN1" hw="syndrome" pos="SUBST">Syndrome</w><c c5="PUR">)</c></hi><w c5="VBZ" hw="be" pos="VERB">is</w><w c5="AT0" hw="a" pos="ART">a</w><w c5="NN1" hw="condition" pos="SUBST">condition</w>
<!-- ... --></s></p></div>
Spoken Texts
A spoken text may contain any of the following components:
  utterances
  pauses
  vocalized but non-lexical phenomena such as coughs
  kinesic (non-verbal, non-lexical) phenomena such as gestures
  entirely non-linguistic incidents occurring during and possibly influencing the course of speech
  writing, regarded as a special class of incident in that it can be transcribed, for example captions or overheads displayed during a lecture
  shifts or changes in vocal quality
The notion of "utterance"
<u> (utterance: a stretch of speech usually preceded and followed by silence or by a change of speaker)
  problematic, but pragmatic
  a sequence of speech from a single speaker
  may be grouped into higher-level <div>s
  or fragmented into smaller segments, <seg> or <s>
  the @who attribute points to speaker information
Transcriptions of Speech
Elements defined: <broadcast>, <equipment>, <incident>, <kinesic>, <pause>, <recording>, <recordingStmt>, <scriptStmt>, <shift>, <u>, <vocal>, <writing>
Classes defined: att.duration, model.divPart.spoken, model.global.spoken, model.recordingPart
Simple examples
Mixture of utterance and ‘paralinguistic’ information:
<u who="#Jan">This is just delicious</u>
<incident><desc>telephone rings</desc></incident>
<u who="#Kim">I'll get it</u>
<u who="#Tom">I used to <vocal><desc>coughs</desc></vocal> smoke a lot</u>
<u who="#Bob"><vocal><desc>sniffs</desc></vocal>He thinks he's tough</u>
<vocal who="#Ann"><desc>snorts</desc></vocal>
<u who="#Tom">Yeah<kinesic><desc>gives uplifted middle finger sign</desc></kinesic></u>
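Keeping paralinguistic events in their own elements means the spoken words can be separated from the descriptions. A sketch that lists each top-level event in order, assuming the fragment is wrapped in a hypothetical <div> root and has no TEI namespace; `words_only` and `events` are hypothetical helpers:

```python
import xml.etree.ElementTree as ET

# The slide's fragment inside a hypothetical <div> root, without the TEI namespace
SPOKEN = """<div><u who="#Jan">This is just delicious</u>
<incident><desc>telephone rings</desc></incident>
<u who="#Kim">I'll get it</u>
<u who="#Tom">I used to <vocal><desc>coughs</desc></vocal> smoke a lot</u>
<u who="#Bob"><vocal><desc>sniffs</desc></vocal>He thinks he's tough</u>
<vocal who="#Ann"><desc>snorts</desc></vocal>
<u who="#Tom">Yeah<kinesic><desc>gives uplifted middle finger sign</desc></kinesic></u></div>"""

NONVERBAL = {"vocal", "kinesic", "incident"}


def words_only(elem):
    """Text of an element, skipping the descriptions of non-verbal events."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in NONVERBAL:
            parts.append(words_only(child))
        parts.append(child.tail or "")  # the tail is the parent's text, so always keep it
    return " ".join("".join(parts).split())


def events(xml_text):
    """List (who, kind, content) for every top-level element, in document order."""
    result = []
    for child in ET.fromstring(xml_text):
        if child.tag == "u":
            result.append((child.get("who"), "speech", words_only(child)))
        else:
            result.append((child.get("who"), child.tag, child.findtext("desc")))
    return result


for event in events(SPOKEN):
    print(event)
```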
Back channelling
<vocal> (marks any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc.)
<u who="#a">So what could I have done <vocal who="#b"><desc>tut-tutting</desc></vocal> about it anyway?</u>
Example using other TEI elements
<u who="#mar">you never <pause/> take this cat for show and tell<pause/> meow meow</u>
<u who="#ros">yeah well I dont want to</u>
<incident><desc>toy cat has bell in tail which continues to make a tinkling sound</desc></incident>
<u who="#ros">because it is so old</u>
<u who="#mar">how <choice><orig>bout</orig><reg>about</reg></choice> <emph>your</emph> cat <pause/>yours is <emph>new</emph><kinesic><desc>shows Father the cat</desc></kinesic></u>
<u trans="pause" who="#fat">thats <pause/> darling</u>
<u who="#mar">no <emph>mine</emph> isnt old mine is just um a little dirty</u>
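Shifts in voice quality
Classic multiple hierarchy problem
can use <shift> (marks the point at which some paralinguistic feature of a series of utterances by any one speaker changes) or <milestone> (marks a boundary point between sections of a text, in terms of a standard reference system, where the sections cannot be represented by structural elements) to mark boundaries, or can use typed <seg> elements
useful also for code shifting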
<u who="#LB"><shift feature="loud" new="f"/>Elizabeth</u><u who="#EB">Yes</u><u who="#LB"><shift feature="loud"/>Come and try this <pause/><shift feature="loud" new="ff"/>come on!
</u>
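Because <shift> is an empty, milestone-like element, the loudness of any stretch has to be worked out by reading forward from the last shift. A sketch of that bookkeeping, assuming (a) the state carries over between one speaker's utterances and (b) a <shift> without @new means a return to "normal"; the <div> wrapper and `loudness_runs` are hypothetical:

```python
import xml.etree.ElementTree as ET

# The slide's example inside a hypothetical <div> root, without the TEI namespace
SPOKEN = """<div><u who="#LB"><shift feature="loud" new="f"/>Elizabeth</u>
<u who="#EB">Yes</u>
<u who="#LB"><shift feature="loud"/>Come and try this <pause/><shift feature="loud" new="ff"/>come on!</u></div>"""


def loudness_runs(xml_text, feature="loud"):
    """Split utterances into (who, level, text) runs for one paralinguistic feature.

    The level is remembered per speaker across utterances; a <shift> with no
    @new is read as a return to "normal" (an assumption of this sketch).
    """
    state = {}  # speaker -> current level
    runs = []
    for u in ET.fromstring(xml_text).iter("u"):
        who = u.get("who")
        level = state.get(who, "normal")
        chunks = [(level, u.text or "")]
        for child in u:
            if child.tag == "shift" and child.get("feature") == feature:
                level = child.get("new", "normal")
            chunks.append((level, child.tail or ""))
        state[who] = level
        merged = []  # join neighbouring chunks at the same level, drop empty ones
        for lvl, text in chunks:
            text = " ".join(text.split())
            if not text:
                continue
            if merged and merged[-1][0] == lvl:
                merged[-1] = (lvl, merged[-1][1] + " " + text)
            else:
                merged.append((lvl, text))
        runs.extend((who, lvl, text) for lvl, text in merged)
    return runs


for run in loudness_runs(SPOKEN):
    print(run)
```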
<shift> vs <incident>
<shift> (marks the point at which some paralinguistic feature of a series of utterances by any one speaker changes) vs <incident> (marks any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication). Compare:
<u who="#a">Listen to this <shift new="reading"/>The government is confident, he said, that the current economic problems will be completely overcome by June<shift/> what nonsense!</u>
and
<u who="#a">Listen to this<incident><desc>reads aloud from newspaper</desc></incident> what nonsense!</u>
<vocal> vs <u>
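<vocal> (marks any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc.) vs <u> (utterance: a stretch of speech usually preceded and followed by silence or by a change of speaker). Compare:
<vocal who="#ann"><desc>snorts</desc></vocal>
and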
<u who="#ann"><vocal><desc>snorts</desc></vocal></u>
<writing> example
<writing> (contains a passage of written text revealed to participants in the course of a spoken text)
<u who="#a">look at this</u>
<writing who="#a" type="newspaper" gradual="false">Government claims economic problems <soCalled>over by June</soCalled></writing>
<u who="#a">what nonsense!</u>
Timing issues
pausing: use the <pause> element
duration: use the @dur attribute
synchronization: use the @synch attribute
overlap: use the @trans attribute
<u>Okay <pause dur="PT2M"/>U-m<pause dur="PT75S"/>the scene opens up<pause dur="PT50S"/> with <pause dur="PT20S"/> um<pause dur="PT145S"/> you see a tree okay?</u>
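The @dur values use the W3C/ISO 8601 duration syntax, so PT2M is two minutes and PT75S is 75 seconds. A sketch that totals the pauses in the utterance above; it assumes only the time part (PTnHnMnS) is used and rejects anything else (a value such as P1D, one day, is refused), and `seconds`/`total_pause` are hypothetical helpers:

```python
import re
import xml.etree.ElementTree as ET

UTTERANCE = """<u>Okay <pause dur="PT2M"/>U-m<pause dur="PT75S"/>the scene opens up<pause dur="PT50S"/> with <pause dur="PT20S"/> um<pause dur="PT145S"/> you see a tree okay?</u>"""

# Only the time part (PTnHnMnS) of the W3C/ISO 8601 duration syntax is handled here
TIME_DURATION = re.compile(r"PT(?:(\d+(?:\.\d+)?)H)?(?:(\d+(?:\.\d+)?)M)?(?:(\d+(?:\.\d+)?)S)?")


def seconds(dur):
    """Convert a value such as 'PT1M30S' to seconds."""
    match = TIME_DURATION.fullmatch(dur)
    if match is None or dur == "PT":
        raise ValueError(f"unsupported duration: {dur!r}")
    hours, minutes, secs = (float(g) if g else 0.0 for g in match.groups())
    return hours * 3600 + minutes * 60 + secs


def total_pause(xml_text):
    """Sum @dur over every <pause> in the fragment, in seconds."""
    pauses = ET.fromstring(xml_text).iter("pause")
    return sum(seconds(p.get("dur")) for p in pauses if p.get("dur"))


print(total_pause(UTTERANCE))  # 410.0
```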
Overlap
Mutt: Have you heard the --
Jeff: the election result?
Mutt: It's a disaster!
<u who="#mutt">have you heard the</u><u trans="latching" who="#jeff">the election result</u><u who="#mutt">its a disaster</u><u who="#jeff" trans="overlap">its a miracle</u>
Synchronization
<u who="#mutt">have you heard <anchor synch="#t1"/>the</u>
<u who="#jeff" synch="#t1">the election result</u>
<u who="#mutt" synch="#t2">its a disaster</u>
<u who="#jeff" synch="#t2">its a miracle</u>
<!-- Elsewhere in Document -->
<timeline origin="#t1"><when xml:id="t1"/><when xml:id="t2"/></timeline>
<timeline> (provides a set of ordered points in time which can be linked to elements of a spoken text to create a temporal alignment of that text)
<when> (indicates a point in time, either relative to other elements in the same <timeline> or absolutely)
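Read together with the <timeline>, @synch says which elements start at the same point. A sketch that groups them by <when>, assuming the slide's fragment wrapped in a hypothetical <div> root without the TEI namespace; for an <anchor>, the speaker is taken from the enclosing <u>, and `synchronised` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The slide's example inside a hypothetical <div> root, without the TEI namespace
DIALOGUE = """<div><u who="#mutt">have you heard <anchor synch="#t1"/>the</u>
<u who="#jeff" synch="#t1">the election result</u>
<u who="#mutt" synch="#t2">its a disaster</u>
<u who="#jeff" synch="#t2">its a miracle</u>
<timeline origin="#t1"><when xml:id="t1"/><when xml:id="t2"/></timeline></div>"""


def synchronised(xml_text):
    """List each <when> point with the (element, speaker) pairs whose @synch names it."""
    root = ET.fromstring(xml_text)
    parent = {child: par for par in root.iter() for child in par}

    def speaker(elem):
        # Walk up until an element carries @who (e.g. an <anchor> inside a <u>)
        while elem is not None and elem.get("who") is None:
            elem = parent.get(elem)
        return None if elem is None else elem.get("who")

    groups = defaultdict(list)
    for elem in root.iter():
        for target in (elem.get("synch") or "").split():  # @synch may hold several pointers
            groups[target.lstrip("#")].append((elem.tag, speaker(elem)))
    return [(w.get(XML_ID), groups[w.get(XML_ID)]) for w in root.iter("when")]


for point, members in synchronised(DIALOGUE):
    print(point, members)
```

For this fragment the result pairs Mutt's anchor with the start of Jeff's first utterance at t1, and the two final utterances at t2.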
Using Elements Seen Elsewhere...
<u><del type="truncation">s</del>see <del type="repetition">you you</del> you know <del type="falseStart">it's</del> he's crazy</u>
<gap reason="passing truck" extent="5" unit="s"/>
<u who="#P1">I proposed that <foreign xml:lang="de">wir können <pause dur="PT1S"/> vielleicht</foreign> go to warsaw and <emph>vienna</emph></u>
Metadata: Participant Description
<particDesc>
  <listPerson>
    <person xml:id="P-1234" sex="2" age="mid">
      <p>Female informant, well-educated, born in Shropshire UK, 12 Jan 1950, of unknown occupation. Speaks French fluently. Socio-Economic status B2.</p>
    </person>
    <person xml:id="P-4332" sex="1">
      <persName><surname>Hancock</surname><forename>Antony</forename><forename>Aloysius</forename><forename>St John</forename></persName>
      <residence notAfter="1959"><address><street>Railway Cuttings</street><settlement>East Cheam</settlement></address></residence>
      <occupation>comedian</occupation>
    </person>
  </listPerson>
</particDesc>
Metadata: <scriptStmt> Example
<scriptStmt> (script statement: contains a citation giving details of the script used for a spoken text)
<sourceDesc><scriptStmt xml:id="CNN12"><bibl><author>CNN Network News</author><title>News headlines</title><date when="1991-06-12">12 Jun 91</date></bibl></scriptStmt></sourceDesc>
Metadata: <recordingStmt> Example
<recordingStmt> (recording statement: describes a set of recordings used as the basis for transcription of a spoken text)
<recordingStmt>
  <recording type="audio" dur="PT30M">
    <respStmt><resp>Location recording by</resp><orgName>Sound Services Ltd.</orgName></respStmt>
    <equipment><p>Multiple close microphones mixed down to stereo Digital Audio Tape, standard play, 44.1 kHz sampling frequency</p></equipment>
    <date>12 Jan 1987</date>
  </recording>
</recordingStmt>
Detailed <recording>
<recording> (recording event: provides details of an audio or video recording event used as the source of a spoken text, either directly or from a broadcast)
<recording type="audio" dur="PT10M">
  <equipment><p>Recorded from FM Radio to digital tape</p></equipment>
  <broadcast>
    <bibl>
      <title>Interview on foreign policy</title>
      <author>BBC Radio 5</author>
      <respStmt><resp>interviewer</resp><name>Robin Day</name></respStmt>
      <respStmt><resp>interviewee</resp><name>Margaret Thatcher</name></respStmt>
    </bibl>
  </broadcast>
</recording>
Customising the TEI
Every use of the TEI involves making use of a customisation.
Some terminology
The TEI encoding scheme consists of a number of modules
Each module contains a number of element specifications
Each element specification contains:
  a canonical name (<gi>) for the element, and optionally other names in other languages
  a canonical description (also possibly translated) of its function
  a declaration of the classes to which it belongs
  a definition for each of its attributes
  a definition of its content model
  usage examples and notes
A TEI schema specification (<schemaSpec>) is made by selecting modules or elements and (optionally) modifying their contents
A TEI document containing a schema specification is called an ODD (One Document Does it all)
What is a module?
A convenient way of grouping together a number of element declarations
These are usually on a related topic or specific application
Most chapters of P5 focus on elements drawn from a single module, which that chapter then defines
A TEI Schema is created by selecting modules and adding or removing elements from them as needed
Which modules exist?
Module name     Chapter
analysis        Simple Analytic Mechanisms
certainty       Certainty and Responsibility
core            Elements Available in All TEI Documents
corpus          Language Corpora
dictionaries    Dictionaries
drama           Performance Texts
figures         Tables, Formulae, and Graphics
gaiji           Non-standard Characters and Glyphs
header          The TEI Header
iso-fs          Feature Structures
linking         Linking, Segmentation, and Alignment
msdescription   Manuscript Description
namesdates      Names, Dates, People, and Places
nets            Graphs, Networks, and Trees
spoken          Transcriptions of Speech
tagdocs         Documentation Elements
tei             The TEI Infrastructure
textcrit        Critical Apparatus
textstructure   Default Text Structure
transcr         Representation of Primary Sources
verse           Verse
How do you choose?
Just choose everything (not really a good idea)
The TEI provides a small set of predefined combinations (TEI Lite, TEI Bare...)
Or you could roll your own (but then you need to know what you're choosing)
Roma: a command-line script, with a web front end, designed to make this process much easier
http://www.tei-c.org/Roma/
Roma: New
Roma: Customize
Roma: Schema
Roma: Documentation
What did we just do?
We processed a pre-existing ODD file which contained (as well as some discursive prose) the following schema specification:
<schemaSpec ident="tei_bare" start="TEI">
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="abbr" mode="delete" module="core"/>
  <elementSpec ident="add" mode="delete" module="core"/>
  <!-- ... -->
  <elementSpec ident="trailer" mode="delete" module="textstructure"/>
  <elementSpec ident="title" mode="change" module="core">
    <attList><attDef ident="level" mode="delete"/></attList>
  </elementSpec>
  <!-- ... -->
</schemaSpec>
We selected four modules, deleted loads of elements, and also deleted an attribute.
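An ODD is ordinary XML, so the choices Roma recorded can be read back. This sketch only summarises a <schemaSpec>; it does not generate a schema, which is Roma's job. It assumes the abridged, namespace-free version shown above, and `summarise` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Abridged, namespace-free copy of the slide's schemaSpec
ODD = """<schemaSpec ident="tei_bare" start="TEI">
<moduleRef key="core"/><moduleRef key="tei"/><moduleRef key="header"/><moduleRef key="textstructure"/>
<elementSpec ident="abbr" mode="delete" module="core"/>
<elementSpec ident="add" mode="delete" module="core"/>
<elementSpec ident="trailer" mode="delete" module="textstructure"/>
<elementSpec ident="title" mode="change" module="core">
<attList><attDef ident="level" mode="delete"/></attList>
</elementSpec>
</schemaSpec>"""


def summarise(xml_text):
    """Report included modules, deleted elements and deleted attributes."""
    spec = ET.fromstring(xml_text)
    element_specs = spec.findall("elementSpec")
    return {
        "modules": [m.get("key") for m in spec.findall("moduleRef")],
        "deleted_elements": [e.get("ident") for e in element_specs if e.get("mode") == "delete"],
        "deleted_attributes": [
            f"{e.get('ident')}/@{a.get('ident')}"
            for e in element_specs
            for a in e.iter("attDef")
            if a.get("mode") == "delete"
        ],
    }


for key, values in summarise(ODD).items():
    print(key, values)
```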
Roma provides an interface to the detail
The [Modules] tab shows the modules available
Selecting a module from it shows the elements within that module, and gives you the choice to
  include all of them (and then remove some)
  exclude all of them (and then put back the ones you want)
You can also change an element's attribute list, and the values its attributes permit
Roma: Modules
Roma: Change Module
The ODD advantage
We can express these constraints in our ODD meta-schema, and then generate a formal schema to enforce them using whichever schema language we like.
TEI schemas can be generated in
  ISO RELAX NG
  W3C XML Schema
  XML DTD
ODD itself defines an element's content model using a subset of RELAX NG syntax
Datatypes are defined in terms of W3C datatypes
Some facilities (e.g. alternation, namespaces) cannot be expressed in DTDs, so a RELAX NG schema is recommended
Additional constraints stored in the TEI ODD can be expressed in Schematron
Roma: selecting attributes
Roma: constraining attribute values
What did we just do?
Our ODD now includes something like this:
<elementSpec ident="div" module="textstructure" mode="change">
  <attList>
    <attDef ident="type" mode="change" usage="req">
      <valList type="closed" mode="replace">
        <valItem ident="prose"/>
        <valItem ident="verse"/>
        <valItem ident="drama"/>
        <!-- ... -->
      </valList>
    </attDef>
  </attList>
</elementSpec>
Note that we can also add documentation to the ODD:
<valItem ident="verse"><gloss>contains (parts of) a poem</gloss></valItem>
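In practice the closed value list is enforced by the schema generated from the ODD. Purely to illustrate what `usage="req"` plus a closed <valList> mean, here is a hypothetical Python check that reads the specification and reports offending <div> elements; the sample document and `check_attribute` are invented for the example, and the TEI namespace is omitted:

```python
import xml.etree.ElementTree as ET

# Namespace-free copy of the slide's elementSpec, with the documented "verse" value
SPEC = """<elementSpec ident="div" module="textstructure" mode="change">
<attList><attDef ident="type" mode="change" usage="req">
<valList type="closed" mode="replace">
<valItem ident="prose"/><valItem ident="verse"><gloss>contains (parts of) a poem</gloss></valItem>
<valItem ident="drama"/>
</valList></attDef></attList></elementSpec>"""


def check_attribute(spec_xml, doc_xml):
    """Report elements that break one attribute definition (required, closed list)."""
    spec = ET.fromstring(spec_xml)
    element = spec.get("ident")
    att = spec.find("attList/attDef")
    name = att.get("ident")
    allowed = {v.get("ident") for v in att.iter("valItem")}
    required = att.get("usage") == "req"
    closed = att.find("valList").get("type") == "closed"
    problems = []
    for elem in ET.fromstring(doc_xml).iter(element):
        value = elem.get(name)
        if value is None:
            if required:
                problems.append(f"<{element}> lacks required @{name}")
        elif closed and value not in allowed:
            problems.append(f"<{element}> has @{name}='{value}', not in {sorted(allowed)}")
    return problems


DOC = '<body><div type="verse"/><div type="letter"/><div/></body>'  # invented sample
for problem in check_attribute(SPEC, DOC):
    print(problem)
```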
Real World TEI
Main website for the TEI community
Runs versions of Roma, OxGarage, etc.
Maintains a vault of old material, and (in /Vault/P5/) of all P5 releases
URL: http://www.tei-c.org/
http://www.tei-c.org/
http://www.tei-c.org/release/doc/tei-p5-doc/ja/html/index.html
http://www.tei-c.org/release/doc/tei-p5-doc/ja/html/MS.html
http://www.tei-c.org/release/doc/tei-p5-doc/ja/html/MS.html#msdates
http://www.tei-c.org/release/doc/tei-p5-doc/ja/html/ref-history.html
Other sources of information
In addition to the TEI-C website, the TEI community has various tools and sources of information:
  an active and extremely helpful mailing list, TEI-L
  a detailed wiki: http://wiki.tei-c.org/
  a SourceForge project, tei.sourceforge.net, where you can see all the work that is done, report bugs, request features, etc.
  TEI-C Stylesheets (transform to HTML, PDF, ePub, etc.)
  TEI By Example: http://tbe.kantl.be/TBE/
There are also tools like: OxGarage
An experimental RESTful web service to manage transformations between various formats
Uses TEI P5 XML as a pivot format
Joins conversions together into a pipeline
Can be installed locally (but this is non-trivial)
http://www.oucs.ox.ac.uk/oxgarage/
TEI Wiki
Open to the TEI community for editing
Sample files, Stylesheets, Customisations
FAQs, SIGs' work, information on tools
And much more...
http://wiki.tei-c.org/
http://wiki.tei-c.org/index.php/Main_Page
TEI on SourceForge
Main place of work for the TEI Technical Council (aside from the publicly archived mailing list)
Place to submit bugs and feature requests
Place to download the official releases
Where the TEI Subversion repository is (up-to-the-minute changes)
http://tei.sourceforge.net/
http://tei.sourceforge.net
Oxford Corpus of Old Japanese
OCOJ is a long-term research project to create an annotated corpus of all Old Japanese texts
The texts come primarily from the Asuka and Nara periods (7th-8th century AD)
Parallel romanization in a phonemic transcription for the phonology of Old Japanese
OCOJ uses the Frellesvig & Whitman system of transcription
Led by Professor Bjarke Frellesvig with Dr Stephen Wright Horn and Dr Kerri L Russell from Oxford's Faculty of Oriental Studies
Created in collaboration with NINJAL in Tokyo
What I did: converted MS Word transcriptions (and transliterations) to richly marked-up TEI XML
http://vsarpj.orinst.ox.ac.uk/corpus/
OCOJ Example
“It should be better to drink saké and weep drunkenly than talking in a clever fashion”
「さかしみと 物言ふよりは 酒飲みて 酔い泣きするし まさりたるらし」
Oxford Corpus of Old Japanese
OCOJ Markup Example
OCOJ Markup Example
William Godwin Diary
William Godwin: 1756-1836, philosopher, writer, political activist, husband of Mary Wollstonecraft, father of Mary Godwin (aka Mary Wollstonecraft Shelley).
Project: Led by the Politics Department: Dr Mark Philp, Dr David O'Shaughnessy, two students + others
Objectives include:
  provide a browsable/searchable transcription with digital images of the diary
  research to identify people mentioned (over 64000 name instances) in the 48 years of diary
http://godwindiary.bodleian.ox.ac.uk/
William Godwin Diary
Wandering Jew’s Chronicle Archive
Workshop Conclusion
This workshop has only quickly covered some of the basics of the TEI, but the TEI Guidelines are all freely available online for you to read at a much slower pace!
Finally(!) remember that:
  All course materials, including:
    All slides from lectures
    All exercises
    All materials for the exercises
  are available at: http://tei.it.ox.ac.uk/Talks/
All the slides, exercises, and some materials are licensed with a Creative Commons Attribution license, which means they are freely available for re-use (though do let us know!)
Email: [email protected] or the TEI-L mailing list
After the workshop...
After the workshop, if you have questions about:
  the workshop materials or teaching other workshops: [email protected]
  the TEI generally, ask by joining: [email protected]
If you mail the TEI-L mailing list it is better because:
  we'll still try to answer as well as we would privately
  you get answers not only from us, but from TEI experts around the world
  questions from those of all levels of ability stop the list becoming too technical
  everyone benefits from having the answers be public, and you benefit by reading (and sometimes answering!) others' problems
Questions?
[email protected]
TEI-L mailing list: [email protected]
Or ask now if we have time!