Digital Medieval Data Curation
CLIR Postdoctoral Fellowship Seminar
Bryn Mawr, 2013
Benjamin Albritton, Stanford University
[email protected] @bla222
Current State: A World of Silos
• Roman de la Rose
• Parker on the Web
• e-codices
• And so on…
Data Interoperability
• Break down silos
• Separate data from applications
• Share data models and programming interfaces
• Enable interactions at the tool and repository level
Designing Modular Repositories and Tools
[Diagram labels: Repository; Repository; User Interface; 3rd-Party Tools; Image Data (Canonical); Non-image data (Canonical); Image Viewer; Discovery; Annotation; Transcription; Image Analysis; Tool X?]
Iterative Interactions
Multiple Data Sources
• Existing structured data (catalogs)
• User-added
– Comments
– Transcriptions
– Etc.
• Digital images
• Machine processing
Motivating Questions
What does this mean for medieval data?
• How do we rethink medieval object data in a shared, distributed, global space?
• How do we enable collaboration and encourage engagement?
• How do we deal with tools that are producing new data on digital surrogates that are implicitly about a real world object?
Transcribing from Digital Surrogates
La Terre de Secille
Naïve Approach: Attach Transcription to Image
One problem example: Multiple Representations
• CCC 26 f. iiiR
• Fold A Open
• Fold A and B Open
• f. iiiV
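The problem with the naïve approach can be sketched in a few lines of Python. This is only an illustration: the image file names are invented, and placing the sample text "La Terre de Secille" on one particular view is an assumption. It shows that a transcription keyed to one image file cannot be found from another photograph of the same object.

```python
# Naive model: transcriptions are stored per image file.
# File names are hypothetical; the four views are the ones named on the slide.
images = {
    "ccc026_iiiR.jpg": "CCC 26 f. iiiR",
    "ccc026_iiiR_foldA.jpg": "Fold A Open",
    "ccc026_iiiR_foldAB.jpg": "Fold A and B Open",
    "ccc026_iiiV.jpg": "f. iiiV",
}

# Sample text from the earlier slide; attaching it to this view is an assumption.
transcriptions_by_image = {
    "ccc026_iiiR.jpg": ["La Terre de Secille"],
}


def transcription_for(image_name):
    """Look up the text attached to one specific image file."""
    return transcriptions_by_image.get(image_name, [])


# The text is found from the image it was attached to ...
found = transcription_for("ccc026_iiiR.jpg")
# ... but not from another photograph of the same physical object.
missing = transcription_for("ccc026_iiiR_foldA.jpg")
```

With four representations of one leaf, the transcriber would have to either copy the text to every image or pick one image arbitrarily, which is the difficulty the slide points to.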
The Shared Canvas
• Represents a real world thing we want to “talk” about
• Has a unique name
– http://dms-data.stanford.edu/Parker/CCC026/canvas-12
Data Model: SharedCanvas
http://www.shared-canvas.org
Data is “about” a real thing
Canvas Paradigm
• A Canvas is an empty space in which to build up a display
• Makes explicit that the image is a surrogate
Open Annotation Model
• Annotation (a document)
• Body (the ‘comment’ of the annotation)
• Target (the resource the Body is ‘about’)
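The three parts of the model can be pictured as a small Python data structure. This is a sketch only: Open Annotation is an RDF model, and the class below is a simplified stand-in, not its actual serialization. The canvas URI is the one given on the earlier slide; the comment text and the helper `is_about` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """An annotation document linking a body to a target (simplified)."""
    body: str    # the 'comment' of the annotation (text, or a URI to a resource)
    target: str  # URI of the resource the body is 'about'


# Canvas URI taken from the slide; the comment text is a made-up example.
CANVAS = "http://dms-data.stanford.edu/Parker/CCC026/canvas-12"
note = Annotation(body="Example comment about this page", target=CANVAS)


def is_about(annotation, resource_uri):
    """Return True if the annotation targets the given resource."""
    return annotation.target == resource_uri
```

Because the target is just a name, the comment does not need to live in the same repository as the thing it describes.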
Model: Annotations to Paint Canvas
• The Canvas represents the empty page
• Annotation links Image with Canvas
• Annotation links Text with Canvas
Model: Missing Pages
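A rough Python sketch of painting a canvas from annotations, including a page with no image at all. The `kind` field, the `paint` function, the image URL, and the second canvas URI are hypothetical names introduced for the example; only the first canvas URI comes from the slides, and the real SharedCanvas model is expressed in RDF rather than Python tuples.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    kind: str    # "image" or "text" (a simplification for this sketch)
    body: str    # image URL or transcribed text
    target: str  # URI of the canvas being painted


CANVAS_12 = "http://dms-data.stanford.edu/Parker/CCC026/canvas-12"
CANVAS_MISSING = "http://example.org/canvas-missing"  # hypothetical URI


def paint(canvas_uri, annotations):
    """Collect everything that should be displayed on one canvas."""
    layers = {"image": [], "text": []}
    for a in annotations:
        if a.target == canvas_uri:
            layers.setdefault(a.kind, []).append(a.body)
    return layers


annotations = [
    # Image and text both point at the canvas, not at each other.
    Annotation("image", "http://example.org/images/page.jpg", CANVAS_12),
    Annotation("text", "La Terre de Secille", CANVAS_12),
    # A missing page: no image exists, but text can still be attached.
    Annotation("text", "[placeholder text for a lost leaf]", CANVAS_MISSING),
]

page = paint(CANVAS_12, annotations)
lost = paint(CANVAS_MISSING, annotations)
```

The canvas for the missing page still has a name and a place in the sequence, so it can carry text or comments even though nothing was photographed.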
Medieval Data Use-Cases: A Sampler
• Structured data from existing sources
• Transcription and glyphs
• Structured data from new sources
Structured Data from Existing Sources
A Catalog of the Manuscripts of Salisbury Cathedral Library
Drives Discovery
Transcription: T-PEN (Saint Louis University)
http://t-pen.org
• Transcription tool
• Provides image parsing
– Columns
– Lines
BNF fr. 9221 – column parsing
BNF fr. 9221 – line parsing
BNF fr. 9221 – transcription view
Drives Full-Text Search
http://t-pen.org/TPEN
… and other interfaces
http://stanford.edu/~blalbrit/v-machine-2/samples/DamedequiRF5.xml
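How line-level transcriptions can drive full-text search may be sketched with a small inverted index in Python. This is not T-PEN's implementation; the manuscript labels, folio and line numbers, and sample text are all placeholders invented for the example.

```python
from collections import defaultdict

# Hypothetical line-level transcriptions: (manuscript, folio, line) -> text.
lines = {
    ("MS A", "1r", 1): "dame de qui toute ma joie vient",
    ("MS A", "1r", 2): "je ne vous puis trop amer",
    ("MS B", "4v", 7): "toute ma joie",
}


def build_index(lines):
    """Map each word to the set of locations where it occurs."""
    index = defaultdict(set)
    for location, text in lines.items():
        for word in text.lower().split():
            index[word].add(location)
    return index


def search(index, query):
    """Return the locations that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(lines)
hits = search(index, "ma joie")
```

Each hit is a location on a page rather than a whole document, so a search result can lead straight back to the transcribed line.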
T-PEN’s PaleoTool
BNF fr. 1586 – glyph parsing
Results for “matching” glyphs
Glyphs with multiple letters
Comparing results across manuscripts
BNF fr. 1586 CCCC 324
User-created Structured Data
Beinecke MS 310, f. 1r
• Each row = 1 day (January 1, here)
• Lists the feast of the Circumcision
• Optionally provides additional information
Distributed Resources / Distributed Environments
Data capture in T-PEN
http://t-pen.org – Saint Louis University
Front-end: Exhibit
http://guillaumedemachaut.com/kalendar/sharedkalendar.html
Simple (really simple) Exhibit based on kalendar transcriptions
(Exhibit: http://www.simile-widgets.org/exhibit/)
For each record:
Enabling rapid comparison
Two mss. include the entry “Thimotheus apostel”
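The kind of comparison shown here can be sketched in Python by treating each kalendar row as a record and grouping records by feast. The record fields and the function name are assumptions, not the Exhibit data format. The Beinecke MS 310 entry comes from the slide above; the other manuscripts and the dates given for them are invented placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KalendarEntry:
    manuscript: str
    month: int
    day: int
    feast: str
    note: str = ""  # optional additional information


entries = [
    KalendarEntry("Beinecke MS 310", 1, 1, "Circumcision"),
    # Hypothetical manuscripts and dates, for illustration only.
    KalendarEntry("MS X (hypothetical)", 1, 24, "Thimotheus apostel"),
    KalendarEntry("MS Y (hypothetical)", 1, 24, "Thimotheus apostel"),
    KalendarEntry("MS Z (hypothetical)", 1, 24, "Timothei"),
]


def manuscripts_by_feast(entries):
    """Group manuscripts by the feast name exactly as transcribed."""
    groups = defaultdict(set)
    for entry in entries:
        groups[entry.feast.casefold()].add(entry.manuscript)
    return groups


groups = manuscripts_by_feast(entries)
shared = sorted(groups["thimotheus apostel"])
```

Matching on the transcribed string finds identical wording across manuscripts but misses spelling variants such as the third entry, so any real comparison would need some normalization on top of this.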
Distributed Resources / Distributed Environments
SharedCanvas Demo Implementation
http://www.shared-canvas.org/impl/demodh
A Sea of Manuscript Data
• Thousands of manuscripts currently available interoperably, with more coming rapidly
• Discovery data is a mixed bag
• Tools provide data back into the system that can be re-used
• New data drives new discovery, new interfaces, and new visualization challenges
• Management and manipulation of that “wild” data is a serious challenge