Plazi: Prospects for Markup of Legacy and New Taxonomic Literature Terry Catapano TDWG Fremantle, WA October 21, 2008


Page 1:

Plazi: Prospects for Markup of Legacy and New Taxonomic Literature

Terry Catapano
TDWG, Fremantle, WA
October 21, 2008

Page 2:

NSF/DFG Grant (AMNH/University of Karlsruhe)

XML markup of taxonomic publications for extraction of:
• Treatments
• Scientific names
• Morphological characters
• Distribution data
• Collection locales/events

For:
• Open access
• Submission to databases
• Retrieval
• Ontology development

Page 3:

Markup Languages
• Provide a grammar to define document types
• Delineate and identify document elements (atoms) in text
• Syntax: structural relationships between elements (parent/child, cardinality, ordinality, id/idref, key/keyref)
• Beyond the PDF
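The structural relationships listed on this slide can be illustrated with a small check written against Python's standard `xml.etree.ElementTree` module. This is only a sketch: the element and attribute names (`article`, `treatment`, `name`, `section`, `ref`, `id`, `idref`) are hypothetical and are not taken from TaxonX or any other schema, and a real project would normally state these rules in a DTD or XML Schema rather than in code.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; all element and attribute names are illustrative.
DOC = """
<article>
  <treatment id="t1">
    <name>Aus bus</name>
    <section>Description text.</section>
    <section>Compare with <ref idref="t2"/>.</section>
  </treatment>
  <treatment id="t2">
    <name>Aus cus</name>
    <section>Distribution text.</section>
  </treatment>
</article>
"""

def check_structure(xml_text):
    """Return a list of violations of three simple structural rules."""
    root = ET.fromstring(xml_text)
    problems = []

    # id/idref: collect every declared id so references can be resolved.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}

    # Parent/child: only direct <treatment> children of the root are checked.
    for treatment in root.findall("treatment"):
        label = treatment.get("id")
        names = treatment.findall("name")
        if len(names) != 1:
            # Cardinality: exactly one <name> per treatment.
            problems.append(f"treatment {label!r} has {len(names)} name elements")
        elif treatment[0].tag != "name":
            # Ordinality: <name> must be the first child.
            problems.append(f"treatment {label!r} does not start with name")

    for ref in root.iter("ref"):
        if ref.get("idref") not in ids:
            problems.append(f"ref points to undeclared id {ref.get('idref')!r}")

    return problems

print(check_structure(DOC))
```

A document that satisfies all three rules yields an empty list; changing an `idref` to an id that is never declared produces one reported problem.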

Page 4:

Page 5:

• TaxonX schema
• GoldenGATE Editor
• 250 docs/7500 treatments
• DSpace-based Digital Object Repository (handles)
• SRS
• TAPIR (specimen data)
• Species Profile Model/RDF (descriptive data)

Page 6:

• Wildly heterogeneous
• Requires lax structuring of documents
• Need for regularization
• Requires editorial policy (reproduction: text of work or text of document)
• Defers much work of interoperability

Benefits
• Treatments + names, subsections, localities, bibliographic references
• Extraction & representation in other services

Costs
• GoldenGATE configured for testbed: 3 minutes per page
• $5 per page(?)
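The two cost figures on this slide can be turned into a rough estimate for a body of literature. The sketch below simply multiplies them out; the page count of 1000 is a hypothetical input and not a figure from the talk, and the per-page dollar figure is marked as uncertain on the slide itself.

```python
MINUTES_PER_PAGE = 3     # testbed figure quoted on the slide
DOLLARS_PER_PAGE = 5.0   # quoted on the slide with a question mark

def estimate_markup_cost(pages):
    """Return (hours, dollars) for marking up the given number of pages."""
    hours = pages * MINUTES_PER_PAGE / 60
    dollars = pages * DOLLARS_PER_PAGE
    return hours, dollars

# Hypothetical corpus size, chosen only to make the arithmetic easy to follow.
hours, dollars = estimate_markup_cost(1000)
print(f"{hours:.0f} hours, ${dollars:,.0f}")
```

At these rates a 1000-page corpus comes to 50 hours of markup time and $5,000.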

Page 7:

New Literature
• Different markup activity
• Prospective, not retrospective
• Better cost/benefit ratio?
• Strict modeling for consistent documents/data
• Increased regularization
• Increased sharing, re-use
• Decreased costs (potentially):
  • Application
  • QC
  • Adoption

Page 8:

• TDWG vocabularies supply many concepts
• NLM Journal Archiving and Interchange Tag Suite
  • DTDs for markup of journal articles
  • Archiving, Publishing, Authoring; other modules possible
  • Wide adoption by publishers and aggregators; LOC
  • Actively maintained
• Module for taxonomic treatments in Publishing

Page 9:

Inherit generic features from existing Tag Set
• Bibliographic references
• Tables
• Linking supporting material/data (xlink)
• Linking to graphic and media objects (xlink)

• Treatments
• Treatment sections
• Scientific names, geographic names, characters/states
• Specimens and other materials citations
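To make the division of labor concrete, the following sketch reads a small article fragment that combines a generic feature (an xlink reference to a graphic) with treatment-specific elements. The element names (`treatment`, `taxon-name`, `treatment-sec`, `locality`, `graphic`) and the sample content are assumptions made for illustration and should not be read as the names defined by the NLM Tag Suite or by the proposed module; only the XLink namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # standard W3C XLink namespace

# Hypothetical fragment; element names and content are illustrative only.
SAMPLE = f"""
<article xmlns:xlink="{XLINK}">
  <treatment>
    <taxon-name>Aus bus</taxon-name>
    <treatment-sec type="description">Body brown.</treatment-sec>
    <treatment-sec type="materials">
      <locality>Locality A</locality>
      <graphic xlink:href="fig1.png"/>
    </treatment-sec>
  </treatment>
</article>
"""

def summarize_treatments(xml_text):
    """Extract name, section types, localities and links per treatment."""
    root = ET.fromstring(xml_text)
    summaries = []
    for treatment in root.iter("treatment"):
        summaries.append({
            "name": treatment.findtext("taxon-name"),
            "sections": [s.get("type") for s in treatment.findall("treatment-sec")],
            "localities": [loc.text for loc in treatment.iter("locality")],
            # Namespaced attributes are addressed as {namespace-uri}localname.
            "links": [g.get(f"{{{XLINK}}}href") for g in treatment.iter("graphic")],
        })
    return summaries

for summary in summarize_treatments(SAMPLE):
    print(summary)
```

Once names, sections, and localities are explicit elements, extraction of this kind reduces to a few path lookups instead of text mining.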

Page 10:

• Plazi: NLM conversion of Zootaxa and PLOS One articles
• Apply markup at earliest stage possible
• Develop tools to assist (probably easier than for “pure” legacy literature)
• Extend codes and structures to handle electronic publication
• Shifts: “illustrated narrative” to complex digital objects
  • METS, OAI-ORE, MPEG-21/DIDL

Page 11:

[Diagram; labels: Text, Materials Description, Treatment, Image, Data, Nomenclature]

Page 12:

Linked Data
• Machines > Documents > Data
• Open documents, free data
• Reduced costs of use/re-use (e.g., SPM for EOL)
• Broaden scope of application
• Accelerate velocity of information exchange