TRANSCRIPT
ESRI User Conference, August 8, 2006
Long-term archiving of geospatial data: the NGDA project
Julie Sweetkind-Singer
John Banning
Stanford University
The Library of Congress and NDIIPP
$100 million from Congress, Dec. 2000.
1st round of funding announced Sept. 30, 2004.
– 8 grants funded for nearly $14 million.
2nd round of funding announced May 6, 2005.
– 10 awards totaling $3 million (in conjunction with NSF).
Funded Geospatial Projects
North Carolina State University: preservation of geospatial data from state and government agencies in North Carolina
– Main partner: North Carolina Center for Geographic Information & Analysis
University of California at Santa Barbara: formation of a national geospatial federated digital repository
– Main partner: Stanford University
Total of both awards: $3.1 million
What is meant by digital preservation?
“Reliable long-term access to managed digital resources to its designated communities, now and in the future.” (RLG/OCLC, 2002)
Trusted digital repository attributes
Key non-technical elements
Collection development
– Assessing scope
– Assessing risk
Contracts
– Rights / use of materials
Cost of acquiring data
Increasing the size of the collecting network
Key technical elements
Large data sets
Versioning
Variety and complexity of formats
Proprietary file formats
Need for format information and specifications
Federation
External contacts
California Spatial Information Library (CASIL)
David Rumsey Collection
California Geological Survey
Katrina Image Warehouse
Digital Globe and GeoEye
ESRI
Technical Architecture-Stanford
Technical Architecture-UCSB
[Diagram labels: storage subsystem; standard, public data model; archival system; ADL; OAI; bulk loader; databases, caches, etc.; Web access; ingest]
What is a format?
“A serialization of an abstract information model”
– A set of syntactic and semantic rules for mapping from an information model to a byte stream (and, in most instances, for mapping back).
Without knowledge of its format, a digital object is merely a collection of undifferentiated bits.
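A minimal sketch of this definition in Python. The information model (a two-coordinate point) and the byte layout are hypothetical and chosen only for illustration; they are not taken from the NGDA project. The same 16 bytes yield the original values only when read with the rules that wrote them.

```python
import struct
from dataclasses import dataclass


# Hypothetical information model: a 2-D point (not an NGDA structure).
@dataclass
class Point:
    x: float
    y: float


# Hypothetical format rule: two little-endian 64-bit floats, x then y.
POINT_FORMAT = "<dd"


def serialize(point: Point) -> bytes:
    """Map the information model to a byte stream."""
    return struct.pack(POINT_FORMAT, point.x, point.y)


def deserialize(data: bytes) -> Point:
    """Map the byte stream back to the information model."""
    x, y = struct.unpack(POINT_FORMAT, data)
    return Point(x, y)


stream = serialize(Point(-122.17, 37.43))
restored = deserialize(stream)

# Same bits, wrong rules (big-endian instead of little-endian):
# the numbers that come back are not the ones that went in.
misread = struct.unpack(">dd", stream)
```

Here `restored` equals the original point, while `misread` does not, which is the sense in which a byte stream without format knowledge is undifferentiated bits.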
What is a Format Registry?
Definition
– The registry is a central location where information is stored and maintained in a controlled manner.
– This includes: Identifiers, Responsibility, Classification, Relationships, Specifications, Signatures, Grammar, Tools, and Assessment
Why do we need one?
– Formats become obsolete over time
– Need machine-actionable validation of the format
Goals of a Format Registry
Interpret the information content of a digital object properly.
Effective use, interchange, and preservation of all digitally-encoded content.
Current Efforts in Format Registries
Global Digital Format Registry (GDFR)
Digital Formats Web (Library of Congress)
PRONOM (UK)
NGDA (geospatial)
Long Now Foundation
Geospatial Example: ESRI Shapefile
1. ESRI Shapefile Technical Description white paper
2. dBase specification
3. Reference to different geospatial metadata standards
4. Additional documentation, specifications or statements on the various files that may be used as part of shapefiles (.sbn, .sbx, .prj, .xml, .fbn, .fbx)
Geospatial Example: ESRI Shapefile
5. Identifiers – “.shp”
6. Responsibility – ESRI, 380 New York Street, Redlands, CA 92373
7. Tools – ArcGIS, ArcView 3.0, etc.
Link to existing Format Registry: http://www.ngda.org/format/
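As an illustration of what a machine-actionable signature for this format could look like, the sketch below checks two fixed fields in the 100-byte main file header described in the ESRI Shapefile Technical Description: the file code 9994 (big-endian, bytes 0-3) and the version 1000 (little-endian, bytes 28-31). The function name `looks_like_shp` is hypothetical, and nothing here is claimed to be how the NGDA registry validates files; it tests only the header, not the records that follow.

```python
import struct

SHP_HEADER_LENGTH = 100  # main file header size in bytes
SHP_FILE_CODE = 9994     # bytes 0-3, big-endian integer
SHP_VERSION = 1000       # bytes 28-31, little-endian integer


def looks_like_shp(header: bytes) -> bool:
    """Return True if the bytes carry the .shp header signature."""
    if len(header) < SHP_HEADER_LENGTH:
        return False
    (file_code,) = struct.unpack_from(">i", header, 0)
    (version,) = struct.unpack_from("<i", header, 28)
    return file_code == SHP_FILE_CODE and version == SHP_VERSION


# Synthetic header for demonstration: only the two signature fields are set.
sample = bytearray(SHP_HEADER_LENGTH)
struct.pack_into(">i", sample, 0, SHP_FILE_CODE)
struct.pack_into("<i", sample, 28, SHP_VERSION)

is_shp = looks_like_shp(bytes(sample))
is_zeros_shp = looks_like_shp(b"\x00" * SHP_HEADER_LENGTH)
```

To test a real file, one would read the first 100 bytes in binary mode and pass them to the same function.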
Goals of the Project
Create robust preservation environments
Save at-risk data
Write collection development policy
Start a geospatial format registry
Develop guidelines for preservation of geospatial materials
Agree upon guidelines for participation in the NGDA
Relevant contact information
Julie Sweetkind-Singer– [email protected]
John Banning– [email protected]
NGDA Web site– www.ngda.org
NDIIPP Web site– www.digitalpreservation.gov