DLF Spring Forum 2008
![Page 1: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/1.jpg)
The Disappearing Data Problem: Preserving Today's Geospatial Data to Meet Tomorrow's Temporal Analysis Needs
Steve Morris
Head of Digital Library Initiatives
North Carolina State University Libraries
DLF Spring Forum 2008
April 28, 2008
![Page 2: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/2.jpg)
Outline
• Background to the geospatial content domain
• Overview of the NDIIPP project
• Preservation challenges and solutions (?)
• Changes in the content domain
• Moving forward: New initiatives
![Page 3: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/3.jpg)
Geospatial Data Types – Digital Orthophotography
• All 100 NC counties with orthos
• 1-5 flight years per county
• 200-300 GB per flight
![Page 4: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/4.jpg)
Geospatial Data Types – Vector Data
• Point, line, and polygon
• Attached attribute data
• Some layers frequently updated
![Page 5: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/5.jpg)
Geospatial Data Types – Vector Data
• Cadastral (tax parcels)
• Street centerlines
• Zoning
• Topographic contours
• Public utilities
• School, sheriff, fire
• Voting precincts
• More …
Frequent Update
More detailed, current, and accurate than state/federal data sources
![Page 6: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/6.jpg)
Background: GIS at NCSU Libraries
• GIS services program since 1992
• Focus on campus-wide infrastructure, not a lab
  – Data, software, support, evangelism
• Roughly 35 academic departments with GIS activity
• History of close collaboration with state agencies
• Heavy reliance on state/local agency geospatial data
• Data discovery tool development
• Problem: Access to temporal data
![Page 7: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/7.jpg)
Example: County and City GIS Data Directories
Tracking data, map servers, and web services since 2000
Ranked 3rd in traffic among entry points to library website
Persistent identifiers
  – Usage tracking
  – IDs used in other sites
Community help in site maintenance
![Page 8: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/8.jpg)
Carrboro, NC : Population 17,797 (2005 est.)
24 downloadable GIS data layers
4 OGC WMS services (web services)
6 web mapping applications
9 downloadable PDF map layers
![Page 9: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/9.jpg)
Problem: Lack of Temporal Data
• Industry focus on “latest and greatest” data
• Industry temporally impaired in terms of data availability, software support, etc.
• “Kill and fill” as a common approach to data management (past versions of vector data are lost)
• Loss of memory about the data. Of superseded county orthophoto flights in NC:
  – Only 22% recorded in the state’s GIS inventory
  – Only 30% accessible through county map servers
• Some older inventories only available through Internet Archive
![Page 10: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/10.jpg)
Note: Percentages based on the actual number of respondents to each question
Downtown Raleigh Near State Capitol
1914 Sanborn Map
![Page 11: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/11.jpg)
Downtown Raleigh Near State Capitol
1993 DOQQ
![Page 12: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/12.jpg)
Downtown Raleigh Near State Capitol
1999 Wake County Ortho
![Page 13: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/13.jpg)
Downtown Raleigh Near State Capitol
2005 Wake County Ortho
![Page 14: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/14.jpg)
Downtown Raleigh Near State Capitol
2005 Wake County Ortho
Imagery = Durable
• Static
• Simple structure
• Mostly open formats
Vector data = Volatile
• Frequent update
• Complex structure
• Mostly proprietary formats
![Page 15: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/15.jpg)
NDIIPP Project Overview
![Page 16: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/16.jpg)
NC Geospatial Data Archiving Project
• Partnership between university library (NCSU) and NC Center for Geographic Information & Analysis
• Part of the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP)
• Focus on state and local geospatial content in North Carolina (state demonstration)
• Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories
• Objective: engage existing state/federal geospatial data infrastructures in preservation
• Serve as catalyst for discussion within industry
![Page 17: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/17.jpg)
Background to Spatial Data Infrastructure
• Ca. 1990: Response to high costs of recreating data
  – Produced data not discoverable or not reusable
• 1st: Metadata standard: 1994 (FGDC)
  – Enable data discovery and evaluation for use
• 2nd: Data clearinghouse network: 1996 (using Z39.50)
  – Search metadata encoded in SGML (later XML)
• 3rd: Cultivate content standards: late 1990s –
  – Enhance reusability, compatibility, semantic consistency
• 4th: Develop web services specifications: 2000 – (OGC)
  – Specs facilitate interoperability of data/services (e.g., WMS)
• Temporal aspects of SDI not well developed
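As an illustration of where time can enter a web services request, here is a minimal sketch that builds a WMS 1.1.1 GetMap URL with the optional TIME dimension parameter. The endpoint URL and layer name are hypothetical, and whether a server honors TIME depends on whether the layer advertises a time dimension.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, time=None, width=512, height=512):
    """Build a WMS 1.1.1 GetMap URL; `time` is the optional TIME dimension value."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        # WMS 1.1.1 bounding box order: minx, miny, maxx, maxy
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    if time is not None:
        params["TIME"] = time  # only meaningful if the layer defines a time dimension
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = build_getmap_url(
    "http://example.org/wms", "orthos", (-78.7, 35.7, -78.6, 35.8), time="2005"
)
```

Without a TIME value the same request returns whatever the server treats as current, which is the "latest and greatest" default.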
![Page 18: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/18.jpg)
Technical Challenges with Geospatial Data
• Complex vector formats: multi-file, multi-format
• No non-proprietary, well-supported format for vector data
• Shift to web services-based access
  – Data becoming more ephemeral
• Often: Inadequate or nonexistent metadata
  – Impedes discovery and use
• Increasing use of spatial databases for data management
• The whole is greater than the sum of the parts, but the whole is very hard to preserve
![Page 19: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/19.jpg)
Problems and (Elusive) Solutions
![Page 20: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/20.jpg)
Problem: Data Loss
[Figure: Jurisdictions Archiving Vector Data Snapshots. Yes: 65.3%; No: 34.7%. Legend: No response, Yes, No]
Survey of current archiving practice among NC counties and municipalities
57.6% survey response rate
![Page 21: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/21.jpg)
“All of our data is kept monthly for 1 year; i.e., September 2006 tape will be overwritten September 2007.”
“… I do a weekly backup of existing data but it is overwriting the previously saved data.”
“All of our data is archived daily, then weekly, then monthly, and yearly.”
“No emphasis on historical data here. We just try to keep from losing data completely. Very minimal hardware to work with and no money.”
Survey of current archiving practice among NC counties and municipalities
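The responses above contrast rotating backups, which eventually overwrite every copy, with tiered retention that keeps some snapshots indefinitely. A minimal sketch of the tiered approach follows. The tier sizes (7 days of dailies, 12 months of monthlies) and the function name are assumptions chosen for illustration, not a practice reported by any respondent.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, daily_days=7, monthly_months=12):
    """Return the snapshot dates to retain under a daily/monthly/yearly scheme.

    Keeps every snapshot from the last `daily_days` days, the newest snapshot
    of each of the last `monthly_months` calendar months, and the newest
    snapshot of every year (never overwritten).
    """
    keep = set()
    newest_in_month = {}
    newest_in_year = {}
    for d in sorted(snapshot_dates):
        if d <= today and today - d < timedelta(days=daily_days):
            keep.add(d)
        # Dates are visited in ascending order, so later dates overwrite earlier ones.
        newest_in_month[(d.year, d.month)] = d
        newest_in_year[d.year] = d
    for (year, month), d in newest_in_month.items():
        months_old = (today.year - year) * 12 + (today.month - month)
        if 0 <= months_old < monthly_months:
            keep.add(d)
    keep.update(newest_in_year.values())
    return sorted(keep)
```

Under this scheme a September 2006 snapshot survives past September 2007 as long as it is the newest snapshot of its year, whereas a plain 12-month rotation would overwrite it.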
![Page 22: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/22.jpg)
“We are only an emerging GIS. But it is my intention that ALL data will be archived.”
“Getting ready to implement this type of archiving of data.”
“I have not done this, but it does seem like a good idea!”
“I do not see why this can not be incorporated with disaster recovery. Don't you think you would foster greater support?”
Tremendous data producer interest in digitizing and georeferencing old analog imagery and maps
Survey of current archiving practice among NC counties and municipalities
![Page 23: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/23.jpg)
Data Loss: Looking for Solutions
• Sept. 2006: Survey of current archiving practice among NC county and municipal agencies
• Nov. 2007: NC Geographic Information Coordinating Council (GICC): Ten Recommendations in Support of Geospatial Data Sharing released
  – Recommendation: “Establish archive and long term data access strategies”
  – Suggested best practices include: “Establish a policy and procedure for the provision of access to historic data, especially for framework data layers.”
• Feb. 2008: NC GICC Archival and Long-Term Access Working Group formed
![Page 24: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/24.jpg)
Problem: Making the Business Case for Data Archiving
Use case: Land use and impervious surface change analysis
[Figure: panels labeled 1993, 1998, 1999, 2002, and 2005]
![Page 25: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/25.jpg)
Business Case: Looking for Solutions
• Harvesting use cases for older data as part of outreach
• Formal surveys of current archiving practice and business drivers
[Chart: Factors Driving Capture of Temporal Data. Y-axis: % of Respondents (0.0% to 60.0%). Categories: IT policy, Records retention policy, Tax admin rules, Land use change analysis, Resolution of legal issues, Historic mapping, Other]
Survey of current archiving practice among NC counties and municipalities
![Page 26: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/26.jpg)
Problem: Putting the Data in Motion
• Most costly part of archive development is identifying data, negotiating its acquisition, and then transferring it
• Local agency “contact fatigue” resulting from repeated state, federal, and university requests for data
• Archive development is low priority – leverage other business uses that can put the data in motion
  – Continuity of operations
  – Highway planning
  – Floodplain mapping
Objective
• Minimize direct contacts
• Document data
• Clarify rights
• Routinize transfer
![Page 27: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/27.jpg)
Putting the Data in Motion: Looking for Solutions
Orthophoto Data Distribution System
• Transfer of large quantities of imagery
Street Centerline Data Distribution System
• Efficient transfer of data from 100 counties, with metadata and clarified rights
NC GIS Inventory
• Efficient data identification
• Adding preservation elements
NC OneMap Data Download and Viewer
• Public access
• Data visualization
![Page 28: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/28.jpg)
Problem: Metadata
Survey of current archiving practice among NC counties and municipalities
Metadata is often asynchronous, inconsistently structured, incomplete, or missing.
[Chart: Metadata archived with data? Segments: 25%, 9%, 6%, 60%. Legend: FGDC, Locally Defined, NC OneMap Starter Block, None]
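Incomplete metadata can be flagged automatically at the point of receipt. The sketch below checks an FGDC CSDGM XML record for a handful of elements using the standard's short tag names. The choice of which elements count as required is an assumption made for illustration; it is not the NC OneMap Starter Block definition.

```python
import xml.etree.ElementTree as ET

# Assumed minimal set of elements to check, keyed by a readable label.
# Paths use FGDC CSDGM short names, relative to the <metadata> root.
REQUIRED_PATHS = {
    "title": "idinfo/citation/citeinfo/title",
    "abstract": "idinfo/descript/abstract",
    "time period": "idinfo/timeperd/timeinfo",
    "metadata date": "metainfo/metd",
}


def missing_elements(xml_text):
    """Return labels of required elements that are absent or contain no text."""
    root = ET.fromstring(xml_text)
    missing = []
    for label, path in REQUIRED_PATHS.items():
        node = root.find(path)
        # itertext() also collects text from nested children such as <caldate>.
        has_text = node is not None and "".join(node.itertext()).strip() != ""
        if not has_text:
            missing.append(label)
    return missing


# Hypothetical record: has a title and a time period, lacks the rest.
SAMPLE = """<metadata><idinfo>
<citation><citeinfo><title>Parcels</title></citeinfo></citation>
<timeperd><timeinfo><sngdate><caldate>20050101</caldate></sngdate></timeinfo></timeperd>
</idinfo></metadata>"""
```

The time period element matters most for archiving, since it is what lets a later user place a snapshot in sequence.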
![Page 29: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/29.jpg)
Metadata: Looking for Solutions
NC OneMap Metadata Outreach
• Workshops, support
NC OneMap Metadata Starter Block
• Starter templates for key data layers
NC GIS Inventory
• Builds minimal metadata
Emerging content exchange networks
• e.g., NCStreetMap.com
• Accrete metadata as part of submission and transfer process
![Page 30: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/30.jpg)
Problem: Content Packaging
XML Database Export
TIFF Images
• Pixel Value and Header file
• World file
• Coordinate System file
• Metadata file
Shapefiles
• Geometry file
• Index file
• Attribute file
• Metadata file
• Coordinate System file
• Spatial Index files
Potential Ingest Objects
• Complex multi-file, multi-format objects
• Shared ancillary components
• Need to add administrative & technical metadata beyond FGDC
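To make the multi-file problem concrete, here is a minimal sketch that gathers the component files of one shapefile into a manifest with checksums and reports missing mandatory parts. The extensions follow the common shapefile convention (.shp, .shx, and .dbf are mandatory). The manifest layout and function name are assumptions, not a packaging format used by the project.

```python
import hashlib
from pathlib import Path

# File suffix -> role, following the component list on the slide.
SHAPEFILE_PARTS = {
    ".shp": "geometry",
    ".shx": "index",
    ".dbf": "attributes",
    ".prj": "coordinate system",
    ".shp.xml": "metadata",
    ".sbn": "spatial index",
    ".sbx": "spatial index",
}
MANDATORY = (".shp", ".shx", ".dbf")


def build_manifest(directory, stem):
    """List the component files found for `stem`, with SHA-256 checksums."""
    directory = Path(directory)
    manifest = {"dataset": stem, "files": [], "missing": []}
    for suffix, role in SHAPEFILE_PARTS.items():
        path = directory / (stem + suffix)
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest["files"].append(
                {"name": path.name, "role": role, "sha256": digest}
            )
        elif suffix in MANDATORY:
            manifest["missing"].append(suffix)
    return manifest
```

A call such as `build_manifest("incoming/wake", "parcels")` (hypothetical path) yields one record per dataset rather than per file, and that dataset is the unit an archive would ingest.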
![Page 31: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/31.jpg)
Content Packaging: Looking for Solutions
• Open Geospatial Consortium (OGC) Data Preservation Working Group formed
  – Content packaging now a topic of discussion
• Emerging content exchange networks, e.g., NCStreetMap.com
Objective
• Automated processing of received data
• Reduce costly and error-prone human intervention
• Capture additional technical and administrative metadata
![Page 32: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/32.jpg)
Changes in the Domain
![Page 33: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/33.jpg)
Changes in the Domain: New Location-Based Content
Present-day value in location-based services and mobile applications
• Street Views
• Oblique Imagery
• 3D Images
![Page 34: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/34.jpg)
Changes in the Domain: New Location-Based Content
Future value as cultural heritage resource
More descriptive of place and function than spatial data
Ortho image
![Page 35: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/35.jpg)
Changes in the Domain: Geospatial PDF
Counterpart to analog map = datasets plus data models, symbolization, classification, annotation, etc.
More data intelligence survives in PDF documents than survives in most other “baked” formats
PDF and GeoPDF
![Page 36: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/36.jpg)
Changes in the Domain: New Network Payloads
KML
GeoRSS
GeoJSON
Tile Map Service
More …
• Lightweight
• AJAX-friendly
• Often ephemeral
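A short sketch of why these payloads are both lightweight and ephemeral: a GeoJSON feature is just a small JSON object, and it carries no date unless the producer adds one. The `captured_on` property name and the coordinates below are illustrative assumptions, not part of GeoJSON itself.

```python
import json


def make_feature(lon, lat, properties, captured_on):
    """Build a GeoJSON point Feature, stamping it with a capture date."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        # `captured_on` is a hypothetical property; GeoJSON defines no such field.
        "properties": dict(properties, captured_on=captured_on),
    }


def to_payload(features):
    """Serialize features as a GeoJSON FeatureCollection string."""
    collection = {"type": "FeatureCollection", "features": features}
    return json.dumps(collection, sort_keys=True)


# Approximate, illustrative coordinates (longitude, latitude).
feature = make_feature(-78.64, 35.78, {"name": "State Capitol"}, "2008-04-28")
payload = to_payload([feature])
```

An archive that harvests such payloads would have to record the capture date itself, since the service that generated them may return something different tomorrow.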
![Page 37: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/37.jpg)
Moving Forward: New Initiatives
![Page 38: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/38.jpg)
NC GICC Archival and Long-Term Access Committee
• Initiated by NC Geographic Information Coordinating Council in 2008 to address growing concerns of state and local agencies about long-term access to data
• Federal, state, regional, and local agency representation
• Key focus
  – Best practices for data snapshots and retention
  – State Archives processes: appraisal, selection, retention schedules, etc.
  – Who, What, Why, When, Where, How
![Page 39: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/39.jpg)
NDIIPP Multi-State Geospatial Project
• Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA) and State Archives of NC
• Partners:
  – Leading state geospatial organizations of Kentucky and Utah
  – State Archives of Kentucky and Utah
  – NCSU Libraries in catalytic/advisory role
• State-to-state and geo-to-Archives collaboration
• 2-year project: Nov. 2007-Dec. 2009
• Archives as part of Spatial Data Infrastructure
![Page 40: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/40.jpg)
Conclusion
• “Supporting temporal analysis requirements” gets more attention than “archiving and preservation”
• Leverage existing infrastructure
• Current data sharing needs drive infrastructure improvements that help archiving
• Leverage business needs that are more compelling than preservation (e.g., continuity of operations)
• Facilitate stakeholder ownership of the solutions
• Mine state and local archiving innovations
Thanks to Library of Congress and the NDIIPP Partners!
![Page 41: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/41.jpg)
Questions?
Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
ph: (919) [email protected]
http://www.lib.ncsu.edu/ncgdap
![Page 42: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/42.jpg)
Problem: Preserving Web Services Interactions
![Page 43: DLF Spring Forum 2008](https://reader033.vdocuments.us/reader033/viewer/2022051019/56814e78550346895dbc111c/html5/thumbnails/43.jpg)
“Web mash-ups” and the New Mainstream Geospatial Web Services