digital archives at the national library of medicine a presentation at the mla session lighting the...

Digital Archives at the National Library of Medicine. A presentation at the MLA Session Lighting the Path: Digital Repositories in the Real World, May 24, 2004, by Diane Boehr, Cataloging Unit Head, National Library of Medicine, National Institutes of Health, Health & Human Services. [email protected]

TRANSCRIPT

Page 1: Digital Archives at the National Library of Medicine A presentation at the MLA Session Lighting the Path: Digital Repositories in the Real World May 24,

Digital Archives at the National Library of Medicine

A presentation at the MLA Session
Lighting the Path: Digital Repositories in the Real World

May 24, 2004
by Diane Boehr

Cataloging Unit Head, National Library of Medicine, National Institutes of Health, Health & Human Services
[email protected]


Scope

Historical medical works
The NLM Archive
PubMed Central


Considerations as you begin a project

It will take much longer than you anticipate
You will learn a great deal about topics outside your normal work duties
Be willing to take baby steps and make a start
It is very rewarding to see the fruits of your labor


HMD Projects

Historical Anatomies
Medicine in the Americas


Historical Anatomies

http://www.nlm.nih.gov/exhibition/historicalanatomies/home.html

Provides high-resolution downloadable scans of selected important images from illustrated anatomical atlases dating from the 15th to the 20th century
Titles and images selected by Michael North, Head of Rare Books and Early Manuscripts


Historical Anatomies

Consists of large JPEGs and zoomable digitized images from the books and a brief bibliographical and historical introduction to each title


Technical details

The imaging for this project is contracted out
The contractor makes archival quality TIFF files (800 ppi resolution), and from those, thumbnail and JPEG images are made for the site using Adobe Photoshop
Zoomifyer Pro is used to create the pan and zoom images
The TIFF files are backed up on CD-ROMs


Search and retrieval

Individual images do not have any metadata associated with them at this time
Bibliographic citations on the site match the LocatorPlus records
As the focus of the site is selected individual images from the books, rather than the entire text, there are currently no links from the LocatorPlus records for the individual titles to images on the Web site


Sample screen


Medicine in the Americas

Monographic original source materials on the development of medicine in the New World published prior to 1914 are being digitized in their entirety

(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books)


Technical details

Digitizing is being done in-house
Books are scanned, and from the initial scan a photocopy and a TIFF file are created
Photocopies are scanned to create OCR Word text files, which are then manually reviewed and cleaned up to create a searchable, downloadable PDF text in modern font
TIFF file is used to create the typeface and layout of the original published work


Technical details

Mounting of these texts on the Web and the XML coding of the Word files are done using the NLM Bookshelf platform
Bookshelf was developed by NCBI for medical texts supplied by publishers in SGML or other desktop publishing formats
The platform has an existing template that allows record creators to easily input metadata without needing to know XML


Search and Retrieval

Bookshelf site only supports keyword searching
Standard bibliographic data from LocatorPlus and brief historical data is included with the text
Catalog records have hot links to the Bookshelf site


Timeframes

Both projects went from planning to implementation in about one year, although both projects will be adding more material to their sites
Use of standard, off-the-shelf products or existing technologies made implementation easier


NLM Archives

A site to store material of permanent value that has been published on the NLM Web site, but is now outdated or superseded
Searchable, yet clearly distinguished from current material


What do we mean by permanent?

Three aspects to permanence were identified:
1) Identifier validity: The extent to which the given name or identifier will always provide access to the same resource
2) Resource availability: The extent to which a given resource is guaranteed to remain available in electronic form
3) Content invariability: The extent to which the content of the resource could change


NLM Permanence Ratings

Four categories of permanence have been defined:
1) Permanent, unchanging content: NLM has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content will not change.


NLM Permanence Ratings

2) Permanent, stable content: NLM has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content is subject only to minor corrections or additions.


NLM Permanence Ratings

3) Permanent, dynamic content: NLM has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content could be revised or replaced.


NLM Permanence Ratings

4) Permanence not guaranteed: NLM has made no commitment to retain this resource. It could become unavailable at any time. Its identifier could be changed.
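
The four ratings can be read as combinations of the three aspects of permanence (identifier validity, resource availability, content invariability). The following is a minimal Python sketch of that reading. The class, field, and enum names are illustrative assumptions and not part of any NLM system.

```python
from dataclasses import dataclass
from enum import Enum


class ContentChange(Enum):
    """How much the content of a resource may change (content invariability)."""
    NONE = "unchanging"
    MINOR = "minor corrections or additions"
    MAJOR = "may be revised or replaced"
    UNSPECIFIED = "no commitment made"


@dataclass(frozen=True)
class PermanenceRating:
    label: str
    identifier_guaranteed: bool    # identifier will always provide access
    availability_guaranteed: bool  # NLM commits to keeping the resource available
    content_change: ContentChange


# Keys 1-4 follow the numbering used on the slides.
RATINGS = {
    1: PermanenceRating("Permanent, unchanging content", True, True, ContentChange.NONE),
    2: PermanenceRating("Permanent, stable content", True, True, ContentChange.MINOR),
    3: PermanenceRating("Permanent, dynamic content", True, True, ContentChange.MAJOR),
    4: PermanenceRating("Permanence not guaranteed", False, False, ContentChange.UNSPECIFIED),
}


def is_permanent(level: int) -> bool:
    """True for the three ratings that carry an availability commitment."""
    return RATINGS[level].availability_guaranteed
```

Read this way, ratings 1 to 3 differ only in how far the content may change, while rating 4 drops the identifier and availability commitments.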


Workflows

Permanence ratings are assigned when a resource is promoted to the NLM Web site
Default permanence ratings are generated based on the category to which the resource belongs
Resource creators use a template which adds basic metadata, in addition to the category and permanence rating
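
As a rough illustration of category-based defaults, the sketch below maps a document category to a default permanence level (1 to 4). The category names, the fallback level, and the idea that a creator may override the default are all assumptions made for the example. The slides say only that defaults are generated from the category.

```python
# Hypothetical categories and defaults; the real NLM category list is not given here.
DEFAULT_PERMANENCE = {
    "historical exhibition": 1,
    "fact sheet": 3,
    "meeting announcement": 4,
}

FALLBACK_LEVEL = 4  # assumption: unknown categories get no permanence commitment


def default_permanence(category: str) -> int:
    """Look up the default level for a category, ignoring case and padding."""
    return DEFAULT_PERMANENCE.get(category.strip().lower(), FALLBACK_LEVEL)


def assign_permanence(category: str, override: int = None) -> int:
    """Use an explicit level if one is supplied, otherwise the category default."""
    if override is not None:
        if override not in (1, 2, 3, 4):
            raise ValueError("permanence level must be 1, 2, 3, or 4")
        return override
    return default_permanence(category)
```

For example, `assign_permanence("Fact Sheet")` falls back to the category default, while `assign_permanence("Fact Sheet", override=2)` keeps the explicitly chosen level.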


Templates

Metadata input template is a feature of TeamSite, our Web content management software
No knowledge of HTML is needed to use these templates
Minimal set of required fields, with default values or drop-down menus supplied wherever possible


Required metadata

1) Title
2) Heading
3) Date first published
4) Date last modified
5) Next scheduled review date
6) Publisher
7) Rights
8) Contact e-mail
9) Language
10) Document category
11) Permanence level
12) URL
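
The twelve required fields listed above lend themselves to a simple completeness check before a resource is promoted. The sketch below is one way such a check might look. The snake_case field keys and the sample record are assumptions for illustration, not the names used in the TeamSite template.

```python
# Hypothetical keys for the twelve required fields, in slide order.
REQUIRED_FIELDS = (
    "title", "heading", "date_first_published", "date_last_modified",
    "next_review_date", "publisher", "rights", "contact_email",
    "language", "document_category", "permanence_level", "url",
)


def _is_blank(value) -> bool:
    """Treat None and whitespace-only values as missing."""
    return value is None or str(value).strip() == ""


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or blank, in slide order."""
    return [name for name in REQUIRED_FIELDS if _is_blank(record.get(name))]


# Made-up sample record with placeholder values.
sample = {
    "title": "Sample fact sheet",
    "heading": "Fact Sheets",
    "date_first_published": "2004-01-15",
    "date_last_modified": "2004-05-01",
    "next_review_date": "2005-05-01",
    "publisher": "Example Library",
    "rights": "Public domain",
    "contact_email": "webmaster@example.org",
    "language": "eng",
    "document_category": "fact sheet",
    "permanence_level": 3,
    "url": "http://www.example.org/factsheet.html",
}

print(missing_fields(sample))
```

A template with defaults and drop-down menus makes most of these checks unnecessary in practice. The function simply shows what "required" means for a record.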


The NLM metadata set is based on Dublin Core, with some local adaptations
The full scheme may be seen at http://www.nlm.nih.gov/tsd/cataloging/metafilenew.html


Workflows

Every resource has the minimal metadata assigned by the resource creator
Permanent resources are routed to the Cataloging Section
  Complete MARC bibliographic records are created
  Includes standardized access points, including MeSH and an NLM classification number
  Accessible in LocatorPlus
  Distributed to the utilities and other NLM licensees


Workflows

The enhanced metadata created in Cataloging is then added back to the header information of the online resource
Preliminary metadata and the enhanced versions can be seen by clicking on "View source"
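
Because this metadata lives in the page header, it can be pictured as a set of HTML meta elements. The sketch below merges creator-supplied and cataloger-supplied values and renders them as meta tags. The `DC.` prefix follows the common convention for Dublin Core in HTML. The specific element names, the sample values, and the rule that enhanced values replace preliminary ones are assumptions, since the NLM scheme itself is not reproduced in these slides.

```python
from html import escape


def merge_metadata(preliminary: dict, enhanced: dict) -> dict:
    """Combine creator-supplied and cataloger-supplied metadata.

    Assumption: when both supply the same element, the enhanced value wins.
    """
    combined = dict(preliminary)
    combined.update(enhanced)
    return combined


def render_meta_tags(metadata: dict) -> str:
    """Render each element as one <meta> line, escaping special characters."""
    return "\n".join(
        '<meta name="{}" content="{}">'.format(escape(name), escape(str(value)))
        for name, value in metadata.items()
    )


# Made-up values for illustration only.
preliminary = {"DC.Title": "Sample fact sheet", "DC.Language": "eng"}
enhanced = {"DC.Subject": "Libraries, Medical"}

print(render_meta_tags(merge_metadata(preliminary, enhanced)))
```

Escaping the values matters here because titles and headings can contain quotation marks or ampersands that would otherwise break the header markup.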


Basic metadata


Enhanced metadata


Archive Design

Separate, distinct, but integral part of the NLM Web site
Searchable with standard NLM search software: Mindserver from Recommind


Archive contents

Out-of-date resources: older material that was once up on the site, but is no longer of current interest
Earlier versions of current documents that have undergone major revisions


Still to come

Archiving non-HTML files, such as PDF, video and audio clips, etc.
Archiving resources from areas in the library which do not get promoted through TeamSite


Impact on Cataloging

PubMed Central (PMC)
  A bibliographic record must exist in the NLM catalog before a journal is added to PMC
  Records must be created if the title is not already in the catalog
    Downloaded from OCLC
    Skeletal record created from local template
    High-priority, 24 hr. turnaround time
  Records are then fully cataloged


Impact on Cataloging

PMC
  If the title is already in the catalog, holdings must be updated
    Indicate the title is available in PMC
    Range of issues
    Any embargo periods
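
The two PMC cases (title not yet in the catalog, title already in the catalog) amount to a small decision procedure. The sketch below spells it out as a list of steps. The function name and parameters are hypothetical. Treating the OCLC download and the local template as alternatives is an assumption about how the slide's bullets relate.

```python
def pmc_cataloging_steps(title_in_catalog: bool, oclc_record_found: bool = False) -> list:
    """List the cataloging steps needed before or when a journal joins PMC."""
    if title_in_catalog:
        # Existing record: only the holdings need to change.
        return [
            "update holdings: indicate the title is available in PMC",
            "update holdings: record the range of issues",
            "update holdings: record any embargo periods",
        ]
    # No record yet: one must exist before the journal is added to PMC.
    first_step = (
        "download record from OCLC"
        if oclc_record_found
        else "create skeletal record from local template"
    )
    return [
        first_step,
        "complete within 24 hours (high priority)",
        "fully catalog the record",
    ]


for step in pmc_cataloging_steps(title_in_catalog=False, oclc_record_found=True):
    print(step)
```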


Impact on Cataloging

NLM Archive
  Cataloger creates core level MARC records for any new resource on the NLM Web site rated Permanent
    View the site, as well as utilize metadata supplied by record creator for descriptive data
    Supply MeSH and NLM classification
    Establish authorized name headings in the national authority file
    Transfer this enhanced metadata back to the resource


Impact on Cataloging

HMD projects
  Minimal impact on Cataloging
  Books being digitized already have records in the catalog
  HMD has its own cataloging staff who can make links between existing catalog records and digitized material


Impact on Cataloging

Despite the increased workload, we think archiving projects are enhanced when catalogers are involved in the projects
Catalogers increase their knowledge by becoming involved in these projects