Data Publishing Service
Indiana University
Stacy Kowalczyk
April 9, 2010
Questions
• Which phases of the data life cycle are managed by your repository?
• How do data management requirements differ across the data life cycle?
• What systems do you use to support the data life cycle?
• Can you generalize the mechanisms used to migrate data between different phases of the data life cycle?
Data Publishing Service
• A new service of the IUScholarWorks institutional repository and the Scholarly Data Services
• Providing data management support and data access
• Data will have a persistent URL so it can be linked to publications
• The service will combine our DSpace repository with IU’s Scholarly Data system (formerly known as MDSS), a system that researchers already use
• Allows discovery over the Web
• Preservation – bit level
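Bit-level preservation is commonly checked by recording a checksum when a file is deposited and recomputing it later. The following is a minimal sketch of such a fixity check, not the service's actual implementation; the function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large datasets are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """True when the file's current digest matches the digest recorded at deposit."""
    return sha256_of(path) == recorded_digest.lower()
```

A mismatch only signals that the stored bits have changed; what happens next (for example, restoring from another copy) is outside this sketch.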
Current Data Lifecycle Model Implementation
Scholarly Data Service
Data creation
• research design
• data management planning
• data collection (surveying, experimentation, measuring etc.)
• data checking and cleaning
↓
Data analysis
• analysis
• derived data creation
• creation of data documentation
↓
End of research
• research outputs
• preparing data for preservation
IU ScholarWorks
Preservation of data
• storage of data
• migration to suitable format/medium
• metadata creation
↓
Distribution/publication of data
↓
Re-use of data
• by same researcher
• by other researchers
http://www.data-archive.ac.uk/sharing/lifecycle.asp
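The split of lifecycle phases between the two systems can be written down as a small lookup table. This sketch assumes the assignment implied by the slide's two labels (the first three phases under the Scholarly Data Service, the last three under IU ScholarWorks); the `Phase` enum and helper names are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Phase(Enum):
    DATA_CREATION = "Data creation"
    DATA_ANALYSIS = "Data analysis"
    END_OF_RESEARCH = "End of research"
    PRESERVATION = "Preservation of data"
    DISTRIBUTION = "Distribution/publication of data"
    REUSE = "Re-use of data"


# Assumed assignment, read from the two system labels on the slide.
MANAGED_BY: Dict[Phase, str] = {
    Phase.DATA_CREATION: "Scholarly Data Service",
    Phase.DATA_ANALYSIS: "Scholarly Data Service",
    Phase.END_OF_RESEARCH: "Scholarly Data Service",
    Phase.PRESERVATION: "IU ScholarWorks",
    Phase.DISTRIBUTION: "IU ScholarWorks",
    Phase.REUSE: "IU ScholarWorks",
}


def phases_managed_by(system: str) -> List[Phase]:
    """List the phases assigned to a system, in lifecycle order."""
    return [phase for phase in Phase if MANAGED_BY[phase] == system]


def handoff_phase() -> Phase:
    """Return the first phase handled by a different system than the phase before it."""
    phases = list(Phase)
    for earlier, later in zip(phases, phases[1:]):
        if MANAGED_BY[earlier] != MANAGED_BY[later]:
            return later
    raise ValueError("no handoff between systems")
```

Under this assumed mapping, the handoff between systems falls at the preservation phase, which is where data would move from research storage into the repository.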
Scholarly Data Service
• Massive Data Storage System
• Current system for research data storage
• Installed in 1998
• Based on IBM developed High Performance Storage System (HPSS) software
• It offers over 2.8 petabytes of disk- and tape-based storage. Distributed between Indianapolis and Bloomington campuses
Figure: HPSS architecture, distributed between IUB and IUPUI. Labeled components: IUB subsystem and IUPUI subsystem, Bloomington users and Indianapolis users, IUB campus network and IUPUI campus network, research network, TCP/IP wide area network, HPSS movers, HPSS core servers, SAN, disk arrays, tape libraries.
Data Publishing in IUScholarWorks
• Discovery and access of datasets and related publications through the IUScholarWorks Repository service
• DSpace records that are searchable, indexed, harvested, and available at stable URLs
• DSpace records that contain DSpace bitstreams for small datasets
• DSpace records that link via stable URLs to large datasets in IU MDSS
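The two storage paths above (small datasets held as bitstreams, large datasets linked by URL) might be sketched as follows. The size cutoff, the base URL, and the `ItemRecord` class are all hypothetical; the slides do not say what counts as "small" or how the MDSS URLs are formed.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical cutoff; the slides do not define "small".
SMALL_DATASET_LIMIT_BYTES = 2 * 1024**3

# Hypothetical base URL standing in for the MDSS web server.
MDSS_BASE_URL = "https://mdss.example.edu/datasets"


@dataclass
class ItemRecord:
    """Simplified stand-in for a repository item record."""
    title: str
    bitstreams: List[str] = field(default_factory=list)
    dataset_urls: List[str] = field(default_factory=list)


def attach_dataset(record: ItemRecord, filename: str, size_bytes: int) -> None:
    """Store small datasets with the record; link large ones by stable URL."""
    if size_bytes <= SMALL_DATASET_LIMIT_BYTES:
        record.bitstreams.append(filename)
    else:
        record.dataset_urls.append(f"{MDSS_BASE_URL}/{filename}")
```

Either way the record itself stays in the repository, so discovery and citation work the same for both sizes.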
IUScholarWorks Data: Linking to MDSS and delivery via HTTP
Figure: an item record with URLs of datasets in MDSS. Labeled components on the IU MDSS side: MDSS web server, HTTP server, hpssfs filesystem.
Data Publishing in IUScholarWorks
• Facilitating the submission process for both the researcher and collection manager
• We facilitate the process for submitters via the DSpace Configurable Submission system
• We facilitate the data collection manager’s process via steps in the DSpace workflow system
IUScholarWorks Data: Item submission user interface
Phase 2, automated workflow
Figure: DSpace Configurable Submission System. Labeled steps:
• Instructions and preparation
• Describe item metadata form(s)
• Review step
• File upload step
• MDSS and dataset info/form
• Finalize/Accept License
Non-interactive processing steps (IU MDSS):
• Initiate MDSS actions (move datasets, etc.)
• Query MDSS technical metadata (checksum, etc.)
• Update metadata
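The non-interactive steps named in the diagram (initiate MDSS actions, query technical metadata, update metadata) could be chained as a simple pipeline. This is only a sketch under stated assumptions: the step order, the in-memory dictionaries standing in for staging and MDSS, and every class and function name are hypothetical, and none of this is the DSpace or MDSS API.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# In-memory stand-ins (hypothetical) for a staging area and for MDSS storage.
STAGING: Dict[str, bytes] = {}
FAKE_MDSS: Dict[str, bytes] = {}


@dataclass
class Submission:
    """Hypothetical stand-in for a submitted item and its dataset."""
    dataset_name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    technical: Dict[str, str] = field(default_factory=dict)
    completed_steps: List[str] = field(default_factory=list)


def initiate_mdss_actions(sub: Submission) -> None:
    """Move the dataset from staging into the stand-in MDSS store."""
    FAKE_MDSS[sub.dataset_name] = STAGING.pop(sub.dataset_name)


def query_technical_metadata(sub: Submission) -> None:
    """Collect checksum and size for the stored dataset."""
    data = FAKE_MDSS[sub.dataset_name]
    sub.technical = {
        "checksum": hashlib.sha256(data).hexdigest(),
        "size_bytes": str(len(data)),
    }


def update_metadata(sub: Submission) -> None:
    """Copy the technical metadata into the item's descriptive metadata."""
    sub.metadata.update(sub.technical)


# Assumed order; the diagram does not make the sequence explicit.
STEPS: List[Callable[[Submission], None]] = [
    initiate_mdss_actions,
    query_technical_metadata,
    update_metadata,
]


def run_noninteractive_steps(sub: Submission) -> Submission:
    """Run each step in order and record its name on the submission."""
    for step in STEPS:
        step(sub)
        sub.completed_steps.append(step.__name__)
    return sub
```

Keeping the steps as plain functions in a list makes it easy to add or reorder steps without touching the runner.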
Planning for a More Curated Life Cycle Model
http://libraries.mit.edu/guides/subjects/data-management/cycle.html
Active and Social Curation
• Engage researchers during projects, not at the end
• Use immediate benefits to drive automatic capture and 'volunteering' of metadata
• Reduce costs by re-engineering curation processes to leverage this rich metadata and volunteered effort
Figure: Active curation and an OAIS repository federation, separated by a curation boundary. Labeled components:
• Active Data Systems: Data Acquisition, Analysis and Simulation
• Search, Browse, Annotation, Visualization Tools
• Metadata Management: DDI3, METS, PREMIS, MODS, DC, SensorML, OGC, …
• Automated Curation Workflow/Rule Engine: operates on metadata, content objects and trigger events
• Appraisal and Selection
• Ingest, AIPs
• Trusted Digital Repository Federation (OAIS compliant)
• Preservation Actions
• Migration and Emulation Tools
• Compound Objects - OAI-ORE
• Dissemination Packages
• Access Mechanisms and E-Scholarship Services
• Use, Reuse, Repurposing Tools
• Wide-Area File System
• User, Contributor