Archiving in the Data Environment of Heliophysics at NASA
TRANSCRIPT
Science Archives Workshop - April 25, 2007 - Page 1
Archive Policies and Implementation:
A Personal View from a NASA Heliophysics Data Policy Perspective
D. Aaron Roberts
NASA GSFC
25 April 2007
Define: Archive (some Google results)
A site containing a large number of files, possibly acquired over time, and often publicly accessible. (100 Best Web Hosting)
A function permitting users to copy one or more files to a long-term storage device. Archive copies can:
Accompany descriptive information; Imply data compression software usage; Be retrieved by archive date, file name, or description
(Tivoli Storage Manager)
Archive is a London-based Trip-hop group. (Wikipedia)
Science Data Archive Definition
Easily accessible, scientifically useable, well-documented, secure data = a good archive.
Requires:
• Open data policy
• Independently useable data
• Science input (data preparation and serving)
• Proper registration and backup
Archiving Homilies
• Archiving is a journey, not a destination
  • “Archive early, archive often” as a natural extension of serving data
• “Central” archiving is more about knowledge than acquisition
• Knowledge must be easily available: presentation matters
• The customer is always right
• Standards are only as good as the community that supports them, but they are essential: “It’s the metadata, stupid”
• Consider the legacy
Archiving is a journey
Properly described, well-documented, accessible data should easily move from one archiving stage to the next:
• NASA missions produce Active Archives (nothing is “ingested”)
  • Products, delivery, and initial long-term data plans in Project Data Management Plan
  • Virtual Observatories provide uniform descriptions and access to many such archives
• The archive continues to develop in the extended mission
  • A Mission Archive Plan provides updates to the Senior Reviews on status, plans, and actions for post-mission products and service
• After the mission, a Resident Archive can continue to serve data
  • Active upgrades of data products to be funded by other means
  • NSSDC manages the RAs
• “Permanent” archiving may just be moving the data and documentation to a more generic Resident Archive (e.g., SDAC, SPDF) for continued access
• At all stages, backups and registries maintain safety and knowledge of the data products
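The staged journey on this slide can be sketched as a small state machine. This is only an illustrative sketch, not a NASA or NSSDC procedure: the `Stage` names, the `NEXT_STAGE` table, and the `advance` function are hypothetical, and the three boolean flags simply stand in for "properly described, well-documented, accessible."

```python
from enum import Enum


class Stage(Enum):
    """Archiving stages named on the slide (labels are illustrative)."""
    ACTIVE_ARCHIVE = "active archive, prime mission"
    EXTENDED_MISSION = "active archive, extended mission"
    RESIDENT_ARCHIVE = "resident archive, post-mission"
    GENERIC_RESIDENT_ARCHIVE = "generic resident archive, 'permanent'"


# Hypothetical forward path; the slide does not prescribe a strict order.
NEXT_STAGE = {
    Stage.ACTIVE_ARCHIVE: Stage.EXTENDED_MISSION,
    Stage.EXTENDED_MISSION: Stage.RESIDENT_ARCHIVE,
    Stage.RESIDENT_ARCHIVE: Stage.GENERIC_RESIDENT_ARCHIVE,
}


def advance(stage, described, documented, accessible):
    """Return the next stage, or raise ValueError if the move is not possible."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.name} is the last stage in this sketch")
    if not (described and documented and accessible):
        raise ValueError("data are not ready to move to the next stage")
    return NEXT_STAGE[stage]
```

In this sketch a move between stages is refused only when the description, documentation, or access is missing, which mirrors the claim that well-prepared data should move easily.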
“Central” archiving
More about knowledge than acquisition: What exists? Where is it? Is it well documented? Is it safe?
New focus for NSSDC role (at least for HP): knowledge of data environment; management of RAs.
(Harvested) VO registries augmented as needed can provide a complete set of resources.
Information about the above should be available in ways that provide easy overviews as well as details.
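As a rough illustration of the "knowledge" role described on this slide, a registry entry could record the answers to the four questions (What exists? Where is it? Is it well documented? Is it safe?), and a summary function could give the easy overview. The `RegistryEntry` fields and the `overview` function are assumptions made for this sketch; they are not taken from any actual VO registry or from SPASE.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegistryEntry:
    """One data product as a central registry might know it (hypothetical fields)."""
    resource_id: str                                        # What exists?
    access_urls: List[str] = field(default_factory=list)    # Where is it?
    documented: bool = False                                # Is it well documented?
    backup_sites: List[str] = field(default_factory=list)   # Is it safe?


def overview(entries: List[RegistryEntry]) -> Dict[str, object]:
    """Summarize the registry: total count plus the products that need attention."""
    return {
        "total": len(entries),
        "unlocated": [e.resource_id for e in entries if not e.access_urls],
        "undocumented": [e.resource_id for e in entries if not e.documented],
        "unprotected": [e.resource_id for e in entries if not e.backup_sites],
    }
```

The overview lists only identifiers, so details stay in the individual entries while the summary remains short.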
The customer is always right
The community determines directions:
Peer review of VOs, RAs, Data Centers, Missions: What is working? What could be improved? What can go?
HP Data and Computing Working Group provides feedback on HQ directions
“Top down vision, bottom-up implementation”
“Market-driven” including what we want from archives
It’s the metadata, stupid
Standards that work:
• Value of sharing data
• SPASE data model provides a uniform description of data products
• SPASE description + data = “SIP”, “AIP”, and “DIP”
• Preserved data should be in common, open, supported formats (e.g., FITS, HDF, CDF, documented ASCII, …)
• Communication and other standards TBD
• Important to decide the level of description
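The idea that a description plus the data makes up an information package (SIP, AIP, and DIP are the submission, archival, and dissemination packages of the OAIS reference model) can be sketched as follows. This is a minimal sketch under stated assumptions: `build_package` and `verify_package` are hypothetical helpers, the description is an arbitrary dictionary rather than a real SPASE record, and the SHA-256 fixity check is an addition for illustration that the slide does not mention.

```python
import hashlib
import json


def build_package(description: dict, data: bytes) -> dict:
    """Bundle a metadata description with fixity information for the data."""
    return {
        "description": description,                      # stand-in for a SPASE record
        "data_length": len(data),
        "data_sha256": hashlib.sha256(data).hexdigest(),
    }


def verify_package(package: dict, data: bytes) -> bool:
    """Check that the data still match what the package recorded."""
    return (
        package["data_length"] == len(data)
        and package["data_sha256"] == hashlib.sha256(data).hexdigest()
    )


def to_manifest(package: dict) -> str:
    """Serialize the package record as documented, human-readable JSON text."""
    return json.dumps(package, indent=2, sort_keys=True)
```

Because the same description travels with the data at submission, in storage, and at delivery, one record of this kind could serve all three package roles in this simplified picture.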
Consider the legacy
Preserving and serving what matters for the long term:
• What is most useful? (If “all” is not possible)
• What works now, and what will last (and how)?
• Calibrated, best-effort products should accompany level-zero plus software/algorithms
A model Heliophysics never quite implemented
Main problems:
(1) “Planning” is a mission function (in collaboration with VOs and others)
(2) “Ingest” is replaced by “production” and “transfer”
(3) “Access” is a distributed function as are the archives in general
The New Heliophysics Mission Data Lifecycle and Framework
Summary
• Easily accessible, scientifically useable, well-documented, secure data = a good archive.
• Archiving is a journey, not a destination
• “Central” archiving is more about knowledge than acquisition
• Knowledge must be easily available: presentation matters
• The customer is always right
• Standards are only as good as the community that supports them, but they are essential: “It’s the metadata, stupid”
• Consider the legacy
Backup Slides (HP Data Policy)
The HP Data Environment
Data from the Heliophysics Great Observatory reside in a distributed environment and are served from multiple sources.
• Multimission Data Centers
  • Solar Data Analysis Center
  • Space Physics Data Facility (CDAWeb, OMNIWeb, etc.)
  • National Space Science Data Center
• Mission-level active archives: e.g. ACE, TIMED, TRACE, Cluster, etc.
• Much of our data are served from individual instrument sites.
We are moving into a new data environment of
• Virtual Observatories for convenient search and access of the distributed data, and
• Resident Archives to retain the distributed data sources even after mission termination.
We have a Data and Computing Working Group to help us move ahead.
Goals of the HP Science Data Management Policy
Improve management of and access to HP mission data.
Clarify the architecture and associated data lifecycle milestones of the data environment.
Provide guidelines for proposals, Project Data Management Plans, NRAs, peer reviews, and other activities related to the HP data environment.
Basic Philosophy
Evolve the existing HP data environment: take advantage of new computer and Internet technologies to respond to our evolving mission set and community research needs (enable the HP Great Observatory).
Blend ‘bottom-up’, ‘market-driven’ implementation approaches with a ‘top-down’ vision for an integrated data environment.
Assure that the HP science community participates in all levels of data management.
Guiding Principles
All data produced by the HP missions will be open and made available as soon as is practical.
Gurman's “Right Amount of Glue” from the Fall 2002 AGU meeting sets the philosophy [see http://lwsde.gsfc.nasa.gov], a key component of which is a standard of behavior: share one’s data with everyone.
Data will be independently scientifically usable:
• adequate documentation including uniform SPASE descriptions
• sustainable and open data formats
• easy electronic access
• provision of appropriate analysis tools
Architecture
The environment will be distributed:
• Many archives with different internal workings
• Data integration capabilities provided by discipline-based virtual observatories (“VxOs”; VSO first for x = “Solar” and now 5 others), linked by a central dictionary (“SPASE Data model”) and machine-to-machine communication routines.
Easily permits the inclusion of essential data sets from non-NASA sources.
Provides a context for services and advanced analysis tools developed under, e.g. AISRP, LWS TR&T, and the VxOs.
Policy Recommendations, Etc.
The Policy includes:
• Roles of data environment components
• “Rules of the Road” for data use
• Recommendations for Project Data Management Plans and Mission Archive Plans
• A timeline of the HP mission data lifecycle
Implementation
Use peer-review processes to assist in managing the elements of the environment.
NRAs for: (a) VxOs, (b) Data quality and access improvement, (c) Resident Archives, and (d) Value-added services.
Mission and Data Center Senior Reviews; RA reviews.
Success will be determined by community use and feedback. The process is “market-driven.”
Current Activities
Finalizing the Data Policy with community input. Our goal is to have this ready for the MIDEX AO.
Implementing a second round of VxOs and processing the next round of proposals for VxOs and related services.
Coordinating these efforts through frequent interactions and work with the SPASE group.
Implementing Resident Archives and the processes to manage these archives.
Working with new missions to incorporate the Data Policy from the start, and “retrofitting” older missions through VxOs and other means.
Working on collaboration with other NASA science divisions, other US agencies, and international partners.
Maintaining a web site for the latest news about our data environment: http://hpde.gsfc.nasa.gov.