NCSU Libraries
Digital Repository Projects at the North Carolina State University Libraries
James Jackson Sanborn
Jim Tuttle
Open Repositories/DSpace User Group ‘07
Early Repository Planning
• Digital Repository Planning Committee
• What it wouldn’t be (at least to start)
  – Distributed community structure
  – Open submission
  – ‘Institutional’ Repository
• What it would be (at least to start)
  – Library-managed collections
  – Building block for campus partnership
  – Learning opportunity
Repository Building Blocks
• NCSU Electronic Theses and Dissertations
  – Started 1997
  – Mandatory since 2002
  – Virginia Tech’s ETD-db
  – ~3,000 ETDs
• NCSU Authors Database
  – Started 1995
  – Access Database/ColdFusion front-end
  – ~22,000 citations
Repository Building Blocks (cont’d)
• Technical Reports Print Collection
  – Campus Institutes and Departments
  – Massive fall-off in print distribution
• Special Collections Research Center
  – Digitized texts and photographs
  – Campus Newsletters
• GIS Data
  – Library managed/acquired data collection
  – Homegrown data layer database/discovery tools
Repository Plan
• Target ‘Research’ collections first
  – Technical Reports
  – ETDs
  – Faculty Publications/Citations
• Treat each collection as its own project
• Actively pursue common technological solutions
Technical Reports
• DSpace Application
• Lightly Customized
• Library Harvested
  – Local Cataloging/Metadata database
  – Scripted Ingest Object Creation
  – Batch Ingest
• Mix of ongoing submission by institute/departmental personnel and Library capture.
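The scripted ingest object creation step could look roughly like the sketch below. It assumes the batch ingest uses DSpace's Simple Archive Format (one directory per item holding a dublin_core.xml file, the content files, and a contents file listing them); the slides do not say which ingest format was used. The function name build_ingest_item and the sample field values are hypothetical.

```python
import shutil
from pathlib import Path
from xml.etree import ElementTree as ET


def build_ingest_item(item_dir, metadata, files):
    """Write one item directory: dublin_core.xml, the files, and a contents list.

    metadata: list of (element, qualifier, value) tuples; qualifier may be None.
    files: paths of the content files to copy into the item directory.
    """
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)

    # One <dcvalue> per metadata field; unqualified fields use qualifier="none".
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy each file next to the metadata and list it in the contents file.
    names = []
    for src in map(Path, files):
        shutil.copy2(src, item_dir / src.name)
        names.append(src.name)
    (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
    return item_dir
```

A script driven by the local cataloging database would call this once per report, producing a directory of item folders that a batch import can pick up.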
Electronic Theses & Dissertations
• Partnership with Graduate School
• Hybrid System: DSpace and ETD-db
  – ETD-db submission/approval/management
  – Direct database extract for DSpace Ingest Object creation
  – Scheduled Batch Ingest process
• DSpace Considerations/Alterations
  – Metadata Mapping
  – Author Browse (exclude contributor.advisor)
  – Various interface changes
Faculty Publications
• Built on Existing Author Database
  – Rebuilt Authors DB from Access/ColdFusion to Oracle/PHP
    • Re-modeled data
    • Added Functionality
  – OpenURL
  – ‘Vita-like’ citation display
  – Full-text or submission links
  – Full-text stored in DSpace
    • Citation metadata and file exported by script
    • DSpace Identifier currently manually entered
Faculty Publications Schematic
[Diagram. Components: Scholar; Cataloging and Coll. Mgt.; Web Submission Form; Access; ISI, Ann. Reps, etc.; Oracle Faculty Publications DB (citations); Web interface (PHP); DSpace, Java/JSP (full-text only); DSpace Item Display; File System (files); PostgreSQL (metadata). Flow labels: Submit Citations and/or Text; Add/Edit data; S+R Citations; View full-text; Handle IDs.]
Repository Governance
• Internal
  – Digital Repository Planning Committee
  – Data Repository Architect
• External
  – Faculty Repository Advisory Committee
  – Partnerships with departments and institutes
NCGDAP: Overview
• NDIIPP: National Digital Information Infrastructure and Preservation Program
• Collaboration with Library of Congress
• One of eight three-year projects to study long-term (50+ years) digital preservation
• Objective: engage existing state/federal geospatial data infrastructures in preservation
• Project approaches: Technical and Social
Repository Requirements
• Dim archive with possible future access
  – minimal IR/access component
• Minimal repository imprint on data
  – repository agnostic ingest and export
• Simple digital curation functions
  – Periodic MD5 checksum validation
  – Structured metadata index
• Expected archived-data exchange
• Leverage existing investments
• Free Software with active community
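Periodic MD5 checksum validation can be sketched with the Python standard library alone. This is an illustration, not the project's script: the function names and the in-memory manifest (a dict of relative path to MD5 hex digest) are assumptions, and a real deployment would persist the manifest and run the check on a schedule.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file under root, keyed by relative path."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def validate(root, manifest):
    """Return {relative_path: problem} for files that are missing or altered."""
    root = Path(root)
    problems = {}
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems[rel] = "missing"
        elif md5_of(target) != expected:
            problems[rel] = "checksum mismatch"
    return problems
```

Running build_manifest once at ingest and validate on later passes gives a simple fixity check; an empty result means every recorded file is present and unchanged.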
Automation: Threat and format analysis, validation
Python wrappers for the following:
• Anti-virus – ClamAV
• Compressed files (tar, zip, gzip, bzip)
• At-risk formats
• Executable files (magic numbers)
• JHOVE validation
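Detecting executables and compressed files by magic number can be sketched as below. The signature table and the function name classify_by_magic are illustrative assumptions, not the project's wrappers. The table covers only a few well-known signatures; tar archives are not covered because their marker sits at an offset inside the header rather than at the start of the file.

```python
# Leading-byte signatures for a few common formats (illustrative subset).
MAGIC_SIGNATURES = [
    (b"\x7fELF", "executable (ELF)"),
    (b"MZ", "executable (DOS/Windows)"),
    (b"\x1f\x8b", "compressed (gzip)"),
    (b"BZh", "compressed (bzip2)"),
    (b"PK\x03\x04", "compressed (zip)"),
]


def classify_by_magic(path):
    """Return a label if the file starts with a known signature, else None."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    for signature, label in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return label
    return None
```

Checking content rather than file extension catches renamed files. Note that many document formats are zip containers, so a zip match calls for a closer look, not automatic rejection.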
Automation: Archive package organization
• Rule-based Python logic
  – filestem-extension relationships (multi-file format validation)
  – directory structure
• Manual intervention
• NOID assignment
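A filestem-extension rule for multi-file formats can be sketched as follows. The example uses the shapefile, a common multi-file GIS format whose .shp file must be accompanied by .shx and .dbf files with the same filestem. The rule table and function name are hypothetical; the slides do not list the project's actual rules.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical rule table: primary extension -> companion extensions required
# to share the same filestem.
REQUIRED_COMPANIONS = {
    ".shp": {".shx", ".dbf"},
}


def check_multifile_formats(directory):
    """Return {primary_filename: [missing extensions]} for incomplete file sets."""
    # Group the extensions found in the directory by filestem.
    by_stem = defaultdict(set)
    for path in Path(directory).iterdir():
        if path.is_file():
            by_stem[path.stem].add(path.suffix.lower())

    problems = {}
    for stem, extensions in by_stem.items():
        for primary, companions in REQUIRED_COMPANIONS.items():
            if primary in extensions:
                missing = companions - extensions
                if missing:
                    problems[stem + primary] = sorted(missing)
    return problems
```

Incomplete sets reported by a check like this are natural candidates for the manual intervention step.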
Metadata: Seed file form
• 'Transfer set' metadata capture in 'Seed file'
  – communicates with DSpace backend, generates XML used to inform later scripts
Metadata: Communities and Collections
• Search by type for 100+ communities
• Facilitates creation and reduces errors
Curation Processing
• At-risk format migration, original retained
• Agency-specific XML templates in ArcCatalog with synchronization flags
• Provenance and curation metadata scripted
Source Metadata Translation
• Repository agnostic approach
• Spokes for each transformation
• Facilitates export from DSpace into other repositories
• Generate DSpace QDC, METS; populate Workflow database
Extra-repository AIP management
• Workflow Management Database (WMD) populated as a spoke on the metadata/ingest hub
• External tracking of NOID, Handle, ISO keywords, other metadata for interaction with other systems
• Integrates with existing GIS Lookup tool
Repository Architecture Overview
[Diagram. Labels: repository Tomcat instance; Repository (DSpace): Technical Reports, ETDs; Faculty Publications (PHP/DSpace hybrid); Collections (DSpace): SCRC, Course Catalogs, Green ‘N’ Growing; NDIIPP (DSpace); SCRC (DSpace); Tomcat; DSpace Internal; PostgreSQL (one shared username, separate database for each app); Asset Store/ATABeast (sub-directory for each DSpace app).]
Upcoming Repository Related Projects
• Enhancements to current system
  – XTF search interface
  – Inter-archive exchange
• Digital Collections Repository
  – Special Collections Research Center
  – Other non-faculty collections
• Data Repository
  – Scientific data
  – Statistical resources
For More Information:
• James Jackson Sanborn– [email protected]
• Jim Tuttle– [email protected]