A Data Model and Architecture for Long-term Preservation
Greg Janée, Justin Mathena, James Frew
University of California at Santa Barbara
Greg Janée • JCDL 2008 2
Outline
• Project overview
• Character of geospatial data
• Observations on preservation requirements
• Architecture
• Ongoing work
Project overview
• National Geospatial Digital Archive (NGDA)
  – UCSB (Map & Imagery Laboratory)
  – Stanford (Branner Earth Sciences Library)
• Funded by Library of Congress’s NDIIPP program
How to achieve long-term preservation of geospatial data on a national scale?
Geospatial data characteristics
• Voluminous
• Sensor platforms are long-lived
• Highly structured
  – support not ubiquitous
• Requires specialized interpretation
• Tied to Earth models
Starting point
[Diagram: timeline from now to now + 100 years; content exists now, and action is taken now]
Preservation: relay across time
[Diagram: repository system, storage system, and institution shown as relays along a timeline from now to now + 100 years]
Requirement
Each archive facilitates handoff to the next
Mid-century perspective
[Diagram: timeline from now - 50 to now + 50; at "now", action is taken on ancient content, old content, and current content]
Requirement
Each archive facilitates handoff to the next
... on unfamiliar content
... such that the next archive can make the same claim
Preservation: mitigation of risk
• Preservation is an outcome
• Risk: insufficient resources and/or desire
• Risk: handoff
  – e.g., from failing institution
  – e.g., from unsupported repository system
Requirement
Each archive supports a low-cost, robust “fallback” preservation mode
Preservation: context
[Diagram: in 2008, context is captured alongside each object (data + metadata); objects and their context are then migrated forward to 2108]
Context includes:
• computing platform
• semantics
• terminology
• provenance
• provider
• quality
• appropriate usage
• community
Geospatial data context
• Complex
  – sensor, platform characteristics
• In practice, not handled as metadata
• Deep understanding of provenance required
  – to support reprocessing
Ozone reprocessing requirements
• xDRs
• Delivered IPs
• Engineering data (incl. C3S data if not in RDRs)
• Upload files
• Databases
• Software (source code)
• Calibration artifacts
  – data
  – analysis tools
  – tables
  – logs
  – notebooks
  – instrument design
• All project documentation
• All scientific papers
• All reports
Taken from: Mike Linda, “OMPS Aggregation and Packaging,” 2006 CLASS Users’ Workshop
Requirement
Context must be preserved
... and context must accommodate complex networks of objects
Architecture
[Diagram: layered architecture, top to bottom]
• archive: management, policies, services, access (domain-specific)
• logical data model: standard packaging of data, semantics (best practices/interop standard)
• physical data model: survivable, vendor-neutral representation of above (best practices/interop standard)
• storage virtualization layer: seamless movement, reliability, redundancy (interop standard)
Architecture
• Logical data model captures all information required to resurrect, reuse objects
• Includes archival of format specs, metadata, contextual information, transitive closure thereof
• NGDA: archival objects
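The "transitive closure" point on this slide can be illustrated with a minimal sketch. It assumes each archival object lists the identifiers of the things it depends on (format specifications, metadata, contextual documents). The identifiers and the `depends_on` mapping are hypothetical and are not taken from NGDA.

```python
from collections import deque

def transitive_closure(start, depends_on):
    """Return every identifier reachable from `start` via dependency links.

    `depends_on` maps an object identifier to the identifiers it needs
    (format specs, metadata, context). Hypothetical structure.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, ()):
            if dep not in seen:  # also guards against dependency cycles
                seen.add(dep)
                queue.append(dep)
    return seen

# Hypothetical example: a dataset and what is needed to resurrect it.
depends_on = {
    "dataset:cnty24k97": ["format:shapefile", "context:provider"],
    "format:shapefile": ["format:dbf"],
    "format:dbf": [],
    "context:provider": [],
}

closure = transitive_closure("dataset:cnty24k97", depends_on)
print(sorted(closure))
```

Under this reading, an archive would preserve everything in `closure`, not only the dataset itself.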
Architecture
• Physical data model fully and simply represents logical data model
• No vendor lock-in
• NGDA: files, filesystems, XML manifests
Architecture
• Storage virtualization layer supports intra- and inter-archive handoffs
• NGDA: “logistical networking”
Architecture: fallback
• Combination of complete resurrection information with a simple physical representation provides fallback mechanism
Architecture: handoffs
[Diagram: two archives, each with a logical data model and a physical data model, connected by export and ingest over a storage virtualization layer]
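One way to picture an export/ingest handoff is the sketch below. It assumes an object is a plain directory tree, uses `shutil.copytree` as a stand-in for whatever the storage virtualization layer actually does, and compares SHA-256 digests before and after. The function names and the directory layout are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def digests(root):
    """Map each file's path (relative to `root`) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def handoff(source_dir, target_dir):
    """Export an object from one archive and ingest it into the next.

    Returns True only if every file arrived with an identical digest.
    """
    exported = digests(source_dir)           # fixity as seen by the exporting archive
    shutil.copytree(source_dir, target_dir)  # stand-in for the real transfer
    ingested = digests(target_dir)           # fixity recomputed by the ingesting archive
    return exported == ingested

# Demonstration with a throwaway object in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    old_archive = Path(tmp) / "old_archive" / "object1"
    (old_archive / "data").mkdir(parents=True)
    (old_archive / "manifest.xml").write_text("<manifest/>", encoding="utf-8")
    (old_archive / "data" / "cnty24k97.dbf").write_bytes(b"example bytes")
    new_archive = Path(tmp) / "new_archive" / "object1"
    handoff_ok = handoff(old_archive, new_archive)
    copied_files = sorted(digests(new_archive))

print(handoff_ok, copied_files)
```

Because the object is only files and directories, the receiving side needs no knowledge of the sending archive's software to check that the transfer succeeded.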
Logical data model
Example archival object
Physical data model
...identifier/
    manifest.xml
    cnty24k97.xml
    data/
        source/
            cnty24k97.shp
            cnty24k97.dbf
            ...
        cnty24k97.png
• object structure
• fixity metadata
• inter- and intra-object relationships
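As a rough illustration of a manifest carrying object structure and fixity metadata, the sketch below lists every file in an object directory with a SHA-256 checksum and later re-checks those checksums. The element and attribute names (`manifest`, `file`, `path`, `sha256`) and the directory layout are assumptions for illustration, not the NGDA manifest schema.

```python
import hashlib
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def build_manifest(object_dir):
    """Build an XML manifest listing every file with a SHA-256 checksum.

    Element and attribute names are hypothetical, not the NGDA schema.
    """
    object_dir = Path(object_dir)
    manifest = ET.Element("manifest", {"identifier": object_dir.name})
    for path in sorted(object_dir.rglob("*")):
        if path.is_file() and path.name != "manifest.xml":
            ET.SubElement(manifest, "file", {
                "path": path.relative_to(object_dir).as_posix(),
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
    return manifest

def verify_manifest(object_dir, manifest):
    """Return the paths whose current checksum no longer matches the manifest."""
    object_dir = Path(object_dir)
    return [
        f.get("path") for f in manifest.iter("file")
        if hashlib.sha256((object_dir / f.get("path")).read_bytes()).hexdigest()
        != f.get("sha256")
    ]

with tempfile.TemporaryDirectory() as tmp:
    obj = Path(tmp) / "example-object"
    (obj / "data" / "source").mkdir(parents=True)
    (obj / "data" / "source" / "cnty24k97.shp").write_bytes(b"shape bytes")
    (obj / "data" / "cnty24k97.png").write_bytes(b"image bytes")
    manifest = build_manifest(obj)
    ET.ElementTree(manifest).write(
        str(obj / "manifest.xml"), encoding="utf-8", xml_declaration=True
    )
    listed = [f.get("path") for f in manifest.iter("file")]
    intact = verify_manifest(obj, manifest)   # nothing has changed yet
    (obj / "data" / "cnty24k97.png").write_bytes(b"corrupted")
    damaged = verify_manifest(obj, manifest)  # the PNG no longer matches

print(listed, intact, damaged)
```

Only the standard library is used here, which fits the idea that the physical representation should be readable without vendor-specific software.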
Storage abstraction
• Bitstreams
  – create, (delete), read, write
  – no modify
• Directories
  – create, (delete), list members
• Above identified by hierarchical pathnames
• Satisfied by filesystems, WebDAV, ...
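The abstraction above could be sketched as a small interface with a filesystem backend. This is an assumption-laden illustration, not NGDA code: the class and method names are invented, create and write are merged into one write-once call, the optional delete operations are omitted, and pathnames are assumed to be relative to the storage root.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

class Storage(ABC):
    """Minimal storage abstraction: write-once bitstreams and directories,
    both identified by hierarchical pathnames. Names are hypothetical."""

    @abstractmethod
    def create_directory(self, path): ...

    @abstractmethod
    def list_members(self, path): ...

    @abstractmethod
    def create_bitstream(self, path, data): ...

    @abstractmethod
    def read_bitstream(self, path): ...

class FilesystemStorage(Storage):
    """One possible backend: an ordinary filesystem rooted at `root`."""

    def __init__(self, root):
        self.root = Path(root)

    def create_directory(self, path):
        (self.root / path).mkdir(parents=True, exist_ok=False)

    def list_members(self, path):
        return sorted(p.name for p in (self.root / path).iterdir())

    def create_bitstream(self, path, data):
        # Mode "xb" fails if the file already exists, so an existing
        # bitstream cannot be modified in place.
        with open(self.root / path, "xb") as f:
            f.write(data)

    def read_bitstream(self, path):
        return (self.root / path).read_bytes()

with tempfile.TemporaryDirectory() as tmp:
    store = FilesystemStorage(tmp)
    store.create_directory("object1/data")
    store.create_bitstream("object1/manifest.xml", b"<manifest/>")
    members = store.list_members("object1")
    content = store.read_bitstream("object1/manifest.xml")
    try:
        store.create_bitstream("object1/manifest.xml", b"changed")
        overwrite_rejected = False
    except FileExistsError:
        overwrite_rejected = True

print(members, content, overwrite_rejected)
```

A WebDAV-backed class could implement the same four methods, which is the sense in which the abstraction is "satisfied by" more than one kind of store.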
Archive dependencies
• Filesystem
• XML
• Character set(s)
• Identifier resolution mechanism(s)
Summary
• Architecture to facilitate handoffs, reduce risk, provide fallback
  – best practices
  – interoperability potential
• Ongoing work
  – “logistical networking” for storage virtualization
  – preservation profiles for other data models
  – format registries and other archive dependencies
  – whole-archive descriptor
    • dependencies, policies
Questions?