Building the Universal Library: Introducing HathiTrust
Patricia A. Steele, Indiana University Libraries
John Price Wilkin, University of Michigan Libraries
December 8, 2008
www.hathitrust.org
The Vision
The Reasons
• Google Digitization Project
• Collective Agreement with CIC Announced in June 2007
– U of Michigan and U of Wisconsin Projects already underway
The Reasons
• Librarians value preservation
– How to ensure digital files are preserved?
The Reasons
• Librarians value access
– How to create a comprehensive and coherent body of materials?
• Librarians believe in cooperation
– How do you achieve a common goal?
The Beginning
• In 2007, CIC agreed to establish a shared digital repository
• University of Michigan and Indiana University initial leaders of this effort
The Beginning
The Name
• The name…
hathitrust.org
hathi.org
olifant.org
silverback.org
kingkong.org
toomai.org
The Name
• The meaning behind the name
– Hathi (hah-tee): Hindi for elephant
– Big, strong
– Never forgets, wise
– Secure
– Trustworthy
Banking Analogy
The Logo
The Partners
• When announced in October 2008, full partners included:
– University of California system
– CIC (Committee on Institutional Cooperation)
– University of Virginia
CIC members: University of Chicago, University of Illinois, Indiana University, University of Iowa, University of Michigan, Michigan State University, University of Minnesota, Northwestern University, Ohio State University, Pennsylvania State University, Purdue University, University of Wisconsin-Madison
vs.
The Differences
Sorting the Issues
• Cost Model
– Partners charged a one-time start-up fee based on the number of volumes added to the repository, in addition to an annual fee for the curation of those volumes.
Sorting the Issues
• Governance
Sorting the Issues
• Impact of Google settlement
– Full access to materials
– More quickly than a court
– Win would have permitted content locked up for years
HathiTrust Architecture
• Storage in Ann Arbor and Indianapolis
• Encrypted backup to 2nd AA location
• Inbound validation, standards-based object storage and related metadata
• Rights database for rights metadata
• Online catalog as source and storage for descriptive metadata
Page image and metadata repository
• Objectives:
– A guiding principle: store archival images, create deliverables on demand
– Incorporate TDR-specific practices
• Simple filesystem layout using Pairtree structure
– One directory per volume, all files inside zip w/associated METS file
– Use of a namespace allows for conflicting identifiers
– Namespaces for institutions and, if needed, types of identifiers within the institution
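To make the Pairtree layout concrete, here is a minimal Python sketch of how a namespaced identifier might map to a volume directory. It is an illustration under stated assumptions, not the repository's actual code: the function names, the `pairtree_root` directory name placed under the namespace, and the example namespace and identifier are all hypothetical, and only the three simple character swaps from Pairtree identifier cleaning are applied (the hex-escaping of other special characters is omitted).

```python
from pathlib import PurePosixPath

# Simple one-character substitutions used when cleaning identifiers for
# Pairtree. The hex-escaping of other special characters is NOT covered here.
_PAIRTREE_SWAPS = str.maketrans({"/": "=", ":": "+", ".": ","})


def clean_identifier(identifier: str) -> str:
    """Return the identifier with filesystem-unfriendly characters swapped."""
    return identifier.translate(_PAIRTREE_SWAPS)


def volume_directory(namespace: str, identifier: str) -> PurePosixPath:
    """Build a hypothetical per-volume directory path.

    The cleaned identifier is split into two-character segments, one
    directory level per segment, and the volume's own directory (named
    after the cleaned identifier) sits at the end of that chain.
    """
    cleaned = clean_identifier(identifier)
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return PurePosixPath(namespace, "pairtree_root", *pairs, cleaned)


if __name__ == "__main__":
    # Example values only: two namespaces can hold the same identifier
    # without colliding, because the namespace is the top-level directory.
    print(volume_directory("nsA", "39015012345678"))
    print(volume_directory("nsB", "39015012345678"))
```

Because the namespace is the first path component, identical identifiers from two institutions land in different trees, which is the conflict the namespace bullet describes.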
Rights database, pt. 1
• What information to store?
– Considered complexity and maintenance
– Considered using MARC directly
– Needed to accommodate both bib record-derived rights and manual overrides
• Approach: examine bib record, determine authoritative copyright status, store rights attribute, source, reason, and timestamp
• Stored in MySQL
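The stored fields can be pictured as one row per volume. The sketch below is a guess at a minimal shape, not the actual schema: the table and column names are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so the example runs without a server. The example codes (`pd`, `nobody`, `bib`, `man`, `google`) are the ones listed on the following slide.

```python
import sqlite3


def open_rights_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical rights table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE rights (
            namespace TEXT NOT NULL,
            id        TEXT NOT NULL,
            attr      TEXT NOT NULL,   -- rights attribute
            reason    TEXT NOT NULL,   -- why the attribute was assigned
            source    TEXT NOT NULL,   -- where the volume came from
            time      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (namespace, id)
        )
        """
    )
    return conn


def set_rights(conn, namespace, volume_id, attr, reason, source):
    """Insert or overwrite the single current rights row for a volume."""
    conn.execute(
        "INSERT OR REPLACE INTO rights (namespace, id, attr, reason, source) "
        "VALUES (?, ?, ?, ?, ?)",
        (namespace, volume_id, attr, reason, source),
    )


def get_rights(conn, namespace, volume_id):
    """Return (attr, reason, source) for a volume, or None if unknown."""
    return conn.execute(
        "SELECT attr, reason, source FROM rights WHERE namespace = ? AND id = ?",
        (namespace, volume_id),
    ).fetchone()


if __name__ == "__main__":
    db = open_rights_db()
    # Rights first derived from the bib record, then overridden manually.
    set_rights(db, "nsA", "0001", "pd", "bib", "google")
    set_rights(db, "nsA", "0001", "nobody", "man", "google")
    print(get_rights(db, "nsA", "0001"))
```

Keeping a single current row per volume is one way to let a manual override replace a bib-derived determination; a real system might instead keep a history of determinations.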
Rights database, pt. 2
• Each rights attribute must have a reason.
– bib: bibliographically-derived
– man: manual access control override
– ddd: due diligence documented
• Typical rights attributes in use
– pd: public domain
– pdus: public domain for US viewers*
– inc: in copyright
– nobody (override): no access
• Source (e.g., ‘google’)
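As a rough illustration of how these codes could drive an access decision, the following Python sketch maps a rights attribute and a viewer's country to a yes/no answer. The function name, the use of a two-letter country code, and the deny-by-default behavior for unlisted attributes are assumptions made for the example; the slide's asterisk on `pdus` signals a caveat that the transcript does not spell out, so the real rule is likely more nuanced.

```python
# Reason codes listed on the slide; every rights attribute must carry one.
VALID_REASONS = {"bib", "man", "ddd"}


def can_view_full_text(attr: str, reason: str, viewer_country: str) -> bool:
    """Decide, hypothetically, whether page images may be shown.

    attr           -- rights attribute: 'pd', 'pdus', 'inc', or 'nobody'
    reason         -- reason code that justifies the attribute
    viewer_country -- two-letter country code, e.g. from a GeoIP lookup
    """
    if reason not in VALID_REASONS:
        raise ValueError(f"rights attribute given without a known reason: {reason!r}")
    if attr == "pd":
        return True                            # public domain everywhere
    if attr == "pdus":
        return viewer_country.upper() == "US"  # public domain for US viewers
    # 'inc', 'nobody', and anything unrecognized: deny by default (assumption).
    return False


if __name__ == "__main__":
    print(can_view_full_text("pd", "bib", "DE"))
    print(can_view_full_text("pdus", "bib", "DE"))
    print(can_view_full_text("nobody", "man", "US"))
```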
Pageturner: page image retrieval
[Diagram. Components shown: rights database, GeoIP database, archival page image, library catalog metadata, METS XML, online page image, XSLT, XML, HTML, browser.]
HathiTrust and TRAC
• Automatic validation in GROOVE
– Check barcode check digit using Luhn algorithm
– Fixity check on JPG, TIFF, UTF8 using MD5
– Well-formedness and embedded metadata check on JPG, TIFF, UTF8 using JHOVE
– Various completeness cross-checks
– Failures retried, admin will eventually intervene
• Periodic fixity checks using MD5
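Two of the checks named above are easy to sketch in Python: a Luhn check-digit test and an MD5 fixity comparison. This is an illustrative sketch, not GROOVE's code. The function names are hypothetical, the Luhn routine is the standard algorithm applied to an all-digit string (whether the real barcodes need any preprocessing first is not stated in the slides), and the expected checksum is assumed to arrive as a hex string.

```python
import hashlib


def luhn_is_valid(digits: str) -> bool:
    """Return True if the digit string passes the standard Luhn check."""
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit (the check digit) leftwards,
    # doubling every second digit and folding two-digit results.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, expected_md5: str) -> bool:
    """Compare a file's current MD5 with the checksum recorded earlier."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

The same `fixity_ok` comparison could serve both the inbound check and the periodic re-check; only the schedule differs.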
OAIS Reference Model
[Diagram. Labels shown: GRIN; Internal Data Loading; Google; [OCA]; In-house Conversion; MARC record extensions (Aleph); Rights DB; Page Turner; HathiTrust API; OAI; GeoIP DB; CNRI Handles; [Solr]; METS/PREMIS object; TIFF G4/JPEG2000; OCR; MD5 checksums; METS object; PNG; OCR; PDF; Isilon; Site Replication; TSM; MD5 checksum validation; GROOVE (JHOVE).]
METS Object
• Why METS?
– Can serve as an Archival Information Package and a Dissemination Information Package
– Designed to record the relationship between pieces of complex digital objects
– Can be created automatically as texts are loaded or reloaded
METS Object
• What’s there?
– metsHdr with an ID and CREATEDATE
– dmdSec with a URL
– Two techMD referencing notes files
– Two fileGrps (images and OCR)
– Physical structMap tying together the files with any metadata (pg. numbers or features)
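The parts listed for the METS object can be assembled into a skeleton with Python's standard `xml.etree.ElementTree`. The sketch below only mirrors the list of parts; it is not validated against the METS schema and does not claim to reproduce the actual HathiTrust documents. The function name, ID values, `USE` and `TYPE` values, notes file names, and the choice of attributes on each element are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("METS", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

HREF = f"{{{XLINK_NS}}}href"


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(object_id, created, catalog_url, pages):
    """Build a skeleton METS tree.

    pages -- list of (image_file, ocr_file, page_label) tuples in reading order
    """
    root = ET.Element(m("mets"), {"OBJID": object_id})
    ET.SubElement(root, m("metsHdr"), {"ID": "HDR1", "CREATEDATE": created})

    # Descriptive metadata is referenced by URL rather than embedded.
    dmd = ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(dmd, m("mdRef"),
                  {"LOCTYPE": "URL", "MDTYPE": "MARC", HREF: catalog_url})

    # Two techMD sections, each pointing at a (hypothetical) notes file.
    amd = ET.SubElement(root, m("amdSec"), {"ID": "AMD1"})
    for number in (1, 2):
        tech = ET.SubElement(amd, m("techMD"), {"ID": f"TMD{number}"})
        ET.SubElement(tech, m("mdRef"),
                      {"LOCTYPE": "URL", "MDTYPE": "OTHER", HREF: f"notes{number}.txt"})

    # Two file groups: page images and OCR text.
    file_sec = ET.SubElement(root, m("fileSec"))
    images = ET.SubElement(file_sec, m("fileGrp"), {"USE": "image"})
    ocr = ET.SubElement(file_sec, m("fileGrp"), {"USE": "ocr"})

    # The physical structMap ties each page to its image and OCR files.
    struct_map = ET.SubElement(root, m("structMap"), {"TYPE": "physical"})
    volume = ET.SubElement(struct_map, m("div"), {"TYPE": "volume"})
    for order, (image_name, ocr_name, label) in enumerate(pages, start=1):
        page = ET.SubElement(volume, m("div"),
                             {"TYPE": "page", "ORDER": str(order), "ORDERLABEL": label})
        for group, prefix, name in ((images, "IMG", image_name), (ocr, "OCR", ocr_name)):
            file_id = f"{prefix}{order:08d}"
            entry = ET.SubElement(group, m("file"), {"ID": file_id})
            ET.SubElement(entry, m("FLocat"), {"LOCTYPE": "URL", HREF: name})
            ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return root
```

Calling `ET.tostring(build_mets(...), encoding="unicode")` serializes the tree; because the document is derived entirely from the file list, it can be regenerated whenever a volume is loaded or reloaded.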
HathiTrust Services
• Preservation of digital surrogate
• Access (within bounds of law and settlement)
– Viewing
– Redistribution
• Services for print-disabled users
• Section 108
• Non-consumptive research
HathiTrust Branding
Legal Status of the Books
• Outside of the Settlement
– Public domain content digitized by libraries unconstrained
– Libraries continue to do preservation-related work with in-copyright works (Sec. 108)
• Settlement
– LDC or cooperative LDC (HathiTrust)
– Services for print-disabled users
– Non-consumptive research
– Section 108 uses
– General discovery
– Sharing of Public domain
HathiTrust Future
• Expansion of partnership
• New services
• Revision of governance
• Refinement of content
Contacts, etc.
• http://www.HathiTrust.org (see sitemap)
• Patricia Steele <[email protected]>
• John Wilkin <[email protected]>
Digital library for the future