
Post on 18-Jul-2015


Digital Medieval Data Curation

CLIR Postdoctoral Fellowship Seminar

Bryn Mawr, 2013

Benjamin Albritton, Stanford University Libraries

blalbrit@stanford.edu

@bla222

Current State: A World of Silos

• Roman de la Rose

• Parker on the Web

• e-codices

• And so on…

Data Interoperability

• Break down silos

• Separate data from applications

• Share data models and programming interfaces

• Enable interactions at the tool and repository level

Designing Modular Repositories and Tools

[Diagram: Image Data (Canonical) and Non-image Data (Canonical), with modular tools: Image Viewer, Discovery, Annotation, Transcription, Image Analysis, Tool X?]

[Diagram, continued: the same components arranged in layers labeled Repository, User Interface, and 3rd-Party Tools]


Designing Modular Repositories and Tools

Iterative Interactions

Multiple Data Sources

• Existing structured data (catalogs)

• User-added

– Comments

– Transcriptions

– Etc.

• Digital images

• Machine processing

Motivating Questions

What does this mean for medieval data?

• How do we rethink medieval object data in a shared, distributed, global space?

• How do we enable collaboration and encourage engagement?

• How do we deal with tools that are producing new data on digital surrogates that are implicitly about a real world object?

Transcribing from Digital Surrogates

La Terre de Secille

Naïve Approach: Attach Transcription to Image

One problem example: Multiple Representations

CCC 26:

• f. iiiR

• Fold A Open

• Fold A and B Open

• f. iiiV

The Shared Canvas

• Represents a real world thing we want to “talk” about

• Has a unique name

– http://dms-data.stanford.edu/Parker/CCC026/canvas-12

Data Model: SharedCanvas

http://www.shared-canvas.org

Data is “about” a real thing

Canvas Paradigm

• A Canvas is an empty space in which to build up a display

• Makes explicit that the image is a surrogate

Open Annotation Model

• Annotation (a document)

• Body (the ‘comment’ of the annotation)

• Target (the resource the Body is ‘about’)
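The three parts listed above can be sketched as a small data structure. This is only an illustration of the Annotation / Body / Target shape, not the Open Annotation serialization itself; the class name, the `is_about` helper, and the `example.org` URI are assumptions, while the canvas URI is the one given earlier in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """An annotation document linking a body to the target it is 'about'."""
    uri: str     # identifier of the annotation itself (hypothetical)
    body: str    # the 'comment': a literal text or the URI of a resource
    target: str  # the resource the body is about


# Canvas name taken from the slides.
CANVAS = "http://dms-data.stanford.edu/Parker/CCC026/canvas-12"

note = Annotation(
    uri="http://example.org/anno/1",  # hypothetical identifier
    body="A comment about this page",  # placeholder body text
    target=CANVAS,
)


def is_about(annotation: Annotation, resource_uri: str) -> bool:
    """True if the annotation targets the resource (ignoring any #fragment)."""
    return annotation.target.split("#", 1)[0] == resource_uri
```

Because the target is the canvas rather than any particular image file, the comment stays attached to the real-world page however many digital surrogates exist.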

Model: Annotations to Paint Canvas

• The Canvas represents the empty page

• Annotation links Image with Canvas

• Annotation links Text with Canvas
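A minimal sketch of "painting" a canvas, assuming a toy in-memory model: the `Canvas` class, its method names, and all `example.org` URIs are hypothetical and are not part of the SharedCanvas specification. It shows several images and a text attached to one canvas, and a canvas that exists with no image at all.

```python
from dataclasses import dataclass, field


@dataclass
class Canvas:
    """An empty page that is filled in by annotations targeting it."""
    uri: str
    label: str
    annotations: list = field(default_factory=list)

    def paint(self, kind: str, body: str) -> None:
        """Record an annotation whose body (image or text) targets this canvas."""
        self.annotations.append({"kind": kind, "body": body, "target": self.uri})

    def bodies(self, kind: str) -> list:
        """Return the bodies of all annotations of the given kind, in order."""
        return [a["body"] for a in self.annotations if a["kind"] == kind]

    def has_image(self) -> bool:
        return bool(self.bodies("image"))


# One canvas, several photographs of the same leaf (image URIs are invented).
page = Canvas("http://example.org/canvas/iiiR", "CCC 26 f. iiiR")
for name in ("closed", "fold-a-open", "fold-a-b-open"):
    page.paint("image", f"http://example.org/images/iiiR-{name}.jpg")
page.paint("text", "placeholder transcription")

# A canvas can stand for a page even when no image has been attached to it.
unphotographed = Canvas("http://example.org/canvas/unphotographed", "no image yet")
```

The transcription is attached once, to the canvas, instead of being duplicated for each photograph of the leaf.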

Model: Missing Pages

Medieval Data Use-Cases: A Sampler

• Structured data from existing sources

• Transcription and glyphs

• Structured data from new sources

Structured Data from Existing Sources

A Catalog of the Manuscripts of Salisbury Cathedral Library

Drives Discovery

Transcription: T-PEN (Saint Louis University), http://t-pen.org

• Transcription tool

• Provides image parsing

– Columns

– Lines

BNF fr. 9221 – column parsing

BNF fr. 9221 – line parsing

BNF fr. 9221 – transcription view

Drives Full-Text Search

http://t-pen.org/TPEN
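One way line-level transcriptions could drive full-text search is an inverted index from words to the page regions they were transcribed from. The sketch below is an assumption about the general technique, not a description of how T-PEN works internally; the canvas URIs, the `#xywh=` region values, and all sample text other than the phrase "La Terre de Secille" are invented.

```python
from collections import defaultdict

# Hypothetical line records: (canvas region, transcribed text).
# Regions use a "#xywh=x,y,width,height" fragment on an invented canvas URI.
LINES = [
    ("http://example.org/canvas/1#xywh=100,200,800,40", "la terre de secille"),
    ("http://example.org/canvas/1#xywh=100,250,800,40", "terre placeholder"),
]


def build_index(lines):
    """Map each lower-cased word to the regions whose transcription contains it."""
    index = defaultdict(list)
    for target, text in lines:
        for token in set(text.lower().split()):
            index[token].append(target)
    return index


def search(index, term):
    """Return the regions for a single word, or an empty list if it is absent."""
    return index.get(term.lower(), [])


index = build_index(LINES)
hits = search(index, "Terre")  # both sample lines contain this word
```

Each hit points back to a region of the canvas, so a viewer could highlight the matching line on whichever image is displayed.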

T-PEN’s PaleoTool

BNF fr. 1586 – glyph parsing

Results for “matching” glyphs

Glyphs with multiple letters

Comparing results across manuscripts

BNF fr. 1586 CCCC 324

User-created Structured Data

Beinecke MS 310, f. 1r

• Each row = 1 day (January 1, here)

• Lists the feast of the Circumcision

• Optionally provides additional information

Distributed Resources / Distributed Environments

Data capture in T-PEN

http://t-pen.org – Saint Louis University

Front-end: Exhibit

http://guillaumedemachaut.com/kalendar/sharedkalendar.html

Simple (really simple) Exhibit based on kalendar transcriptions

(Exhibit: http://www.simile-widgets.org/exhibit/)

For each record:

Enabling rapid comparison

Two mss. include the entry “Thimotheus apostel”
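The comparison can be sketched as grouping kalendar rows by feast and keeping the feasts that occur in more than one manuscript. The record fields follow the row structure described earlier (one row per day, a feast, optional extra information); the manuscript labels and every entry below are invented for illustration and are not taken from the actual transcriptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KalendarEntry:
    manuscript: str  # hypothetical shelfmark label
    month: int
    day: int
    feast: str
    note: str = ""   # optional additional information


# Invented sample rows; "MS A", "MS B", "MS C" are placeholders.
ENTRIES = [
    KalendarEntry("MS A", 1, 1, "Circumcision"),
    KalendarEntry("MS B", 1, 1, "Circumcision", note="hypothetical extra detail"),
    KalendarEntry("MS A", 1, 2, "Feast X"),
    KalendarEntry("MS C", 1, 2, "Feast X"),
    KalendarEntry("MS C", 1, 3, "Feast Y"),
]


def shared_feasts(entries, minimum=2):
    """Return feasts found in at least `minimum` manuscripts, with those manuscripts."""
    by_feast = defaultdict(set)
    for entry in entries:
        by_feast[entry.feast.strip().lower()].add(entry.manuscript)
    return {
        feast: sorted(mss)
        for feast, mss in by_feast.items()
        if len(mss) >= minimum
    }


shared = shared_feasts(ENTRIES)
```

Real kalendar entries vary in spelling between manuscripts, so the exact-match normalization used here (lower-casing and trimming) would likely need to be replaced by something more forgiving.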

Distributed Resources / Distributed Environments

SharedCanvas Demo Implementation

http://www.shared-canvas.org/impl/demodh


A Sea of Manuscript Data

• Thousands of manuscripts currently available interoperably, with more coming rapidly

• Discovery data is a mixed bag

• Tools provide data back into the system that can be re-used

• New data drives new discovery, new interfaces, and new visualization challenges

• Management and manipulation of that “wild” data is a serious challenge
