hathitrust and its role in building the universal collection john wilkin 2 june 2009
TRANSCRIPT
HathiTrust and Its Role in Building the Universal Collection
John Wilkin
2 June 2009
www.hathitrust.org
Presentation structure
• Quick background on where we are
• A few pieces of what’s in the hopper
• Development work underway
• New collaborative structures
• Explore HathiTrust as a vehicle for collaboration in the realm of collections
Mission and Goals
• to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
– materials converted from print
– improve access …to meet the needs of the co-owning institutions
– reliable and accessible electronic representations
– coordinate shared storage strategies
– “public good” … free-riders.
– simultaneously …centralized …open
current members
Governance Model
• Executive Committee
• Strategic Advisory Board
• Coordinated input from groups of members
– Hathi/CIC Steering Committee
– UC library directors
Executive Committee
• Paul Courant, University Librarian and Dean of Libraries, University of Michigan
• Laine Farley, Executive Director, California Digital Library
• Paula Kaufman, University Librarian and Dean of Libraries, University of Illinois at Urbana-Champaign
• John King, Vice Provost for Academic Information, University of Michigan
• Brian Schottlaender, University Librarian, University of California, San Diego Libraries
• Patricia Steele, Dean of Libraries, Indiana University
• Brad Wheeler, Chief Information Officer, Indiana University
• John Wilkin, Executive Director of HathiTrust and Associate University Librarian, Library Information Technology, University of Michigan
Strategic Advisory Board
– Ed Van Gemert (Chair), Director of Libraries, University of Wisconsin-Madison
– John Butler, Associate University Librarian for Information Technology, University of Minnesota
– Patricia Cruse, Director, Preservation, California Digital Library
– Robin Dale, Associate University Librarian for Collections and Library Information Systems, University of California, Santa Cruz
– R. Bruce Miller, University Librarian, University of California, Merced
– Sarah Pritchard, University Librarian, Northwestern University
– Paul Soderdahl, Director, Library Information Technology, University of Iowa
– John Wilkin, Executive Director, HathiTrust (ex officio)
Preservation: OAIS Reference Model
[Diagram: HathiTrust components mapped to the OAIS reference model. Labels in the diagram: GRIN; Internal Data Loading; Google; [OCA]; In-house Conversion; MARC record extensions (Aleph); Rights DB; Page Turner; HathiTrust API; OAI; GeoIP DB; CNRI Handles; [Solr]; METS/PREMIS object; TIFF G4/JPEG2000; OCR; MD5 checksums; METS object; PNG; OCR; PDF; Isilon; Site Replication; TSM; MD5 checksum validation; GROOVE (JHOVE)]
growth trajectory?
accomplishments to date
1. 25 partners
2. successful ingest and millions of vols online
3. mirroring and backup
4. rich access
5. collection builder
6. Catalog beta and WCL working group
What next?
• Data API and other strategies for increased openness
• Internet Archive/OCA ingest followed by misc. non-Google ingest
• Full text search over entire repository
• Extending out services through Shib
• Creating research corpus
• Deeper collaborative strategies
Where next with collaboration?
• Begin sharing actual development, cf. ingest of Internet Archive content
– Specifications
– Validation routines?
– Packaging?
• Collaboratively develop a collaborative framework
– SAB and working group charges
Working groups?
• Security
• Collection management
• Non-Consumptive Research
• Digital preservation
• Discovery (bibliographic and full text)
• Externally-facing repository APIs
• Bibliographic metadata management
• Rights Management
Universal collection
• What is a collection?
• Bibliographic identity
• Certification (and for specific or purposes)
– Object as content
– Object as artifact
Toward a Cloud Library
• Shared Print repository or repositories with all the best attributes (service, treatment, management)
• Shared digital repository with all the best attributes (compliance with TRAC, accessible in every sense, a foundation for services)
• … and even some coordination between the two
• … and even (particularly for in-copyright works where we don’t have permissions) a viewable copy in GBS
Expectations and plans?
• How would we define our requirements for satisfaction with each?
• What would the business model be?
• How would we build our local collections in light of the presence of something like this?
• What would we do on the “core” or shared collections?
Next steps for libraries
• Case study library: NYU Library
• ReCap storage facility in Princeton, NJ
• HathiTrust digital repository
• CLIR as broker and OCLC Research as agent
• Futures that depend on looking beyond the local to the shared, from the shared as “you” to the shared as “we”
Needed infrastructure
• More refined bibliographic identification
• Relationship of digital to partner print holdings, including withdrawn volumes
• Certification of digital
• Rights determination
• Rights clearance
Further info/updates
• http://www.hathitrust.org/
• RSS feed for updates
• [email protected]