Harvesting HathiTrust Documents: A New Model for Online Access
DESCRIPTION
Brown, Christopher C. “Harvesting HathiTrust Documents: A New Model for Online Access.” Presentation given at the 2011 Missouri Government Documents Conference, 7 June 2011, Columbia, MO.
TRANSCRIPT
Christopher C. Brown
University of Denver, Penrose Library
(303) [email protected]
2011 Missouri Government Documents Conference
Harvesting HathiTrust Documents: A New Model for Online Access
This presentation will show how Encore harvesting can be used to mitigate a space problem in a library by substituting online access for physical access to the collection. The government documents collection will be the primary focus.
Note: Encore is the next-generation catalog interface produced by Innovative Interfaces, Inc.
DR, IR, Digital Texts
Inbound Harvesting
Outbound Harvesting
Collection Downsizing?
Malpas, Constance. 2011. Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment. Dublin, Ohio: OCLC Research. http://www.oclc.org/research/publications/library/2011/2011-01.pdf.
About University of Denver
• Depository since 1909
• Historically a 70-75% selective
• Now a 4.8% selective, but receive 100% of online cataloging
• Adding URLs to historic documents
• Currently 100% of our paper documents are in storage
• We are remodeling our library.
Under the remodeling plan, all docs will remain in remote storage.
Partial Solution: Using Encore for Outbound Harvesting
• All documents off-site
• Our users are accustomed to using electronic documents
• Need to divert attention away from physical collection holdings
• Encore harvesting of Hathi Trust can do this
PD = where docs generally live
Hathi Trust Attributes
From: http://www.hathitrust.org/rights_database
Sampling Method
• I wanted to see how many government documents were in our Hathi Trust harvest
• Limit to Hathi Trust for a given year
• Examine first result on each page of 25 results (4% of results) [limitation: Encore only displays first 1,000 results]
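The arithmetic behind this sampling method can be sketched in Python. This is a minimal illustration rather than the procedure actually used for the presentation: the function names are hypothetical, and the result counts passed in are made-up examples. It lists which result positions would be examined and shows that the 1,000-result display limit caps the sample at 40 records per search.

```python
PAGE_SIZE = 25        # results per Encore page (from the slide)
DISPLAY_LIMIT = 1000  # Encore displays only the first 1,000 results (from the slide)


def sample_positions(total_results, page_size=PAGE_SIZE, limit=DISPLAY_LIMIT):
    """Return the 1-based positions of the first result on each viewable page."""
    viewable = min(total_results, limit)
    return list(range(1, viewable + 1, page_size))


def sampling_fraction(page_size=PAGE_SIZE):
    """Fraction of viewable results examined: one record per page."""
    return 1 / page_size


# Hypothetical result counts, chosen only to show the cap taking effect.
print(sample_positions(60))           # a small result set: three pages
print(len(sample_positions(2500)))    # a large result set: capped by the display limit
print(f"{sampling_fraction():.0%}")   # share of viewable results examined
```

For any search returning more than 1,000 results, the examined records all come from the first 1,000 displayed, so the sample may not represent the full result set.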
Harvesting Hathi Docs: The Stats
Date Range  Hathi Totals  Hathi All Pub Domain  Hathi pdus  DU pd Harvest  Docs Sampling  Docs Sampling
                                   (pdus + pd)                             (est. number)       (est. %)
2000-2009        505,682                14,140         726         13,369         13,340         99.78%
1990-1999        709,214                29,163         880         28,164         26,662         94.67%
1980-1989        723,657                33,753       1,204         32,321         31,370         97.06%
1970-1979        631,110                28,633       2,046         26,189         25,607         97.78%
1960-1969        546,914                21,244       1,987         18,991          7,668         40.38%
1950-1959        281,615                20,861         863         19,893          3,888         19.54%
1940-1949        184,755                17,096         600         16,253          3,771         23.21%
1930-1939        175,103                16,237         654         15,317          2,600         16.97%
1920-1929        175,226                66,563      27,108         28,854          1,529          5.30%
1910-1919        175,148               169,923      75,955         61,230          4,124          6.73%
1900-1909        179,018               153,284      70,900         47,999          2,265          4.72%
1890-1899        112,295               110,605      50,502         34,742            596          1.72%
1880-1889         83,950                82,809      38,928         23,855            699          2.93%
1870-1879         58,624                57,826      27,202         17,751            319          1.80%
1860-1869         50,907                50,337       2,273         45,790            248          0.54%
Total          4,593,218               872,474     301,828        430,718        124,686         28.95%

Statistics as of mid-March, 2011.
The Docs Sampling columns show the estimated numbers of docs per year and the estimated percentage of docs per year from the Harvest.
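As a check on how the percentage column appears to be derived, the short Python sketch below divides the estimated docs count by the DU pd Harvest count for a few rows. That relationship is inferred from the figures themselves and is not stated in the slides; the names `ROWS` and `docs_share` are illustrative.

```python
# (date range, DU pd Harvest, estimated docs), copied from the table above
ROWS = [
    ("2000-2009", 13369, 13340),
    ("1990-1999", 28164, 26662),
    ("1960-1969", 18991, 7668),
    ("Total", 430718, 124686),
]


def docs_share(harvest, docs):
    """Estimated docs as a percentage of the harvested public-domain records."""
    return round(100 * docs / harvest, 2)


for label, harvest, docs in ROWS:
    print(f"{label}: {docs_share(harvest, docs):.2f}%")
```

The computed values agree with the percentages in the table for these rows, which supports reading the last column as a share of the DU harvest and not of all HathiTrust records.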
Malpas: Docs about 3% of Hathi Total and 15% of Public Domain
GovDocs: 3% overall
GovDocs: 15% of Public Domain
Hathi Docs Usage in Proportion to Docs Distribution
Sources: 1895-1976 data: Monthly Catalog, 1895-1976 (ProQuest); 1976 onward data: CGP
% Docs in HathiTrust (est.)
Hathi Docs Links Provide Access to Docs in Storage
Stripped-Out Fields
008 fixed field data
650 subfields other than “a”
500 notes
5xx shipping list info
300 subfields after “a”
086 SuDocs number
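To make the effect of this stripping concrete, here is a small Python sketch that applies the listed rules to a toy record. Everything about it is an assumption made for illustration: the record layout (a list of tag and subfield pairs), the function name `strip_fields`, and the placeholder values. It does not reproduce Encore's harvesting code, the slides do not say at which stage the fields are lost, and only tag 500 is dropped here because the slide does not say which other 5xx fields carry shipping list information.

```python
# Toy record layout (hypothetical): list of (tag, [(subfield_code, value), ...]).
DROPPED_TAGS = {"008", "086", "500"}   # fixed-field data, SuDocs number, notes
KEEP_ONLY_A = {"650", "300"}           # only subfield "a" survives in these fields


def strip_fields(record):
    """Return a copy of the record with the listed fields and subfields removed."""
    stripped = []
    for tag, subfields in record:
        if tag in DROPPED_TAGS:
            continue
        if tag in KEEP_ONLY_A:
            # In a 300 field, $a normally comes first, so "subfields after a"
            # and "subfields other than a" are treated the same way here.
            subfields = [(code, value) for code, value in subfields if code == "a"]
        stripped.append((tag, subfields))
    return stripped


# Placeholder values only; not taken from a real catalog record.
sample = [
    ("008", [("", "fixed-field data")]),
    ("086", [("a", "SuDocs number placeholder")]),
    ("245", [("a", "Example title")]),
    ("300", [("a", "12 p. :"), ("b", "ill. ;"), ("c", "28 cm")]),
    ("500", [("a", "Shipping list note placeholder")]),
    ("650", [("a", "Water quality"), ("z", "Colorado")]),
]

for tag, subfields in strip_fields(sample):
    print(tag, subfields)
```

Losing the 086 field is the most consequential omission for this use case, since without a SuDocs number there is no direct way to isolate government documents in the harvested set.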
Use Stats for Hathi Trust?
• Statistics for all Hathi Trust records accessed, not just documents
• Spikes in usage are docs librarian (my) testing, not real users
Statistics from Google Analytics
Harvesting with Summon
Summon Harvesting of HathiTrust
Conclusions
Documents content in HathiTrust can provide a suitable surrogate for a limited subset of documents, but not a wholesale replacement.
HathiTrust documents can be used as surrogates for selected titles, especially larger serial runs. But it is difficult at this time to isolate those titles.
HathiTrust is definitely worth harvesting into local catalogs or other digital repositories.