PoDAG 25: Levels of Service
Data Set Prioritization, Maturity Matrix,
and Levels of Service
Ron Weaver (with liberal borrowing from Bruce R. Barkstrom & John J. Bates, NOAA-NESDIS-NCDC)
Outline
• Background: Why do we need data set prioritization?
• Definitions: Maturity Matrix (MM) and Levels of Service (LOS)
• Examples: How might NSIDC employ LOS and MM?
• Discussion: Role of PoDAG in the process
BACKGROUND
• Prioritization called for by multiple studies (cf.):
  - NRC Climate Data Records
  - NSB Long Lived Data Sets
  - NASA Roadmapping efforts
  - NESDIS CLASS / Science Data Stewardship
• NASA HQ is asking the DAACs to prioritize their data holdings
  - NSIDC was the first DAAC to go through a prioritization process (January 2006).
  - JPL is the second, and is using templates developed by NSIDC and ESDIS.
An Approach to Prioritization
Prioritization for what purpose?
  - Identify data that is scientifically important
  - Ingest, keep, throw away
  - Change level/type of service
[Figure: activity versus years and impact versus years, with an interval marked "? Years ?"]
Terra launched in 1999; papers by non-MODIS team members appeared in the literature only in ‘05.
An Approach to Prioritization
Data set priority might be determined from an assessment of the following:
  - Data set activity level
  - Stakeholder interest
  - Maturity Matrix
Level of Service is the outcome of the prioritization process and informs the users and stakeholders of the actions (to be) taken.
MATURITY MATRIX
MM: Objective and Approaches
• Objectives
  - Reduce difficulty and confusion in the community about scientific data stewardship
  - Produce an easily understood way of identifying the maturity of data products and hence science data stewardship requirements
• Approaches
  - Barkstrom/Bates SDS Model
  - NASA Roadmap MMI Model
MM: A Simple Maturity Model
• Represent data maturity in terms of three separate dimensions:
  - Scientific Maturity
  - Preservation Maturity
  - Societal Impact
• Total maturity is simply the length of the vector
[Figure: three axes labeled Scientific Maturity, Preservation Maturity, and Societal Impact; the resulting vector is labeled "Maturity of data for use"]
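The "length of vector" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say what scale each dimension uses, so the sketch assumes all three scores are on comparable numeric scales, and the function name and example scores are hypothetical.

```python
import math


def total_maturity(scientific: float, preservation: float, societal: float) -> float:
    """Euclidean length of the (scientific, preservation, societal) vector.

    Assumption: the three scores share a comparable numeric scale;
    the slide does not specify one.
    """
    return math.sqrt(scientific ** 2 + preservation ** 2 + societal ** 2)


# Hypothetical scores, for illustration only.
print(total_maturity(2, 3, 6))  # 7.0
```

One consequence of using vector length is that a high score in a single dimension can dominate the total, which may or may not be the intended behavior.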
MM: Questions
• How do we ensure common understanding of errors and their impact?
  - Two parts: error budget of the environmental element and of the measurement method, leading to a signal-to-noise determination
• How do we produce understandable measures of costs – including data production and long-term stewardship?
• What metrics should we use for long-term value of data?
• How do we assure that prioritizations are ‘apples to apples’?
[Figure: Measurement Maturity Index roadmap. NRC Panel on Climate Variability and Change, July 25, 2005]
Measurement Maturity Index: 1 to 7
Tracks (each rated H, M, or L): Technology, Measurements, Modeling, Decision Support
Stage labels: Instrument Incubator, Pathfinder, Operational Precursor, Operational Mission, Planned Improvement, Initial, Validated, OSSI, Pilot, Routine Use, Policy
Milestones: Technology development starts; Pathfinder Mission launch; Operational Precursor Mission launch; Pilot Program for decision support begins; Operational Mission launch; Technology Improvement begins; Decision support use demonstrated; Decision support use routine
MM: NSIDC’s Proposed Approach
• Science Maturity
  1: physical understanding of measurement process (algorithms and data sources documented)
  2: key measurement characteristics understood (instrument variability documented)
  3: data processing steps transparent and available
  4: rigorous validation (community acceptance of algorithms and validation)
• Preservation Maturity
  1: systematic approach to preservation implemented (metadata, data using known standards)
  2: threats to data loss mitigated (routine media refresh, off-site backup)
  3: long term preservation assured (funding and systems in place for multiple year curation)
• Societal Impact (very tentative)
  1: Short term predictions that impact society (health, property, etc.)
  2: Useful to determine trends that impact society
  3: Useful to characterize impacts or uncertainty in other measurements that do impact society
Levels of Service: As Defined by NSIDC
Levels of Service from a Data Center Perspective
Function Level of Service (increasing to right)
Orphan (not public)
Brokered “Bare Bones” (public)
Basic + Distribution
Full
Known to NSIDC Ingest + Archive Metadata ? ? Skinny DIF
+ Distribution User Services Referral Minimal Minimal Documentation As Provided
Examples WDC Orphans
CLP ?
Orphans
ARCSS DAAC EOS Standard Products
Levels of Service from a Data Center Perspective
• Not in the previous list, but certainly considered:
  - Production
  - Tools preparation and access
  - Long term archival issues
  - Resource demands
Downloaded presentation contains additional descriptive material derived from the NSIDC Data Policy Document
Examples: How DAAC Might Approach Prioritization
An Approach to Prioritization
Prioritization for what purpose?
  - Identify data that is scientifically important
  - Keep, throw away
  - Change level/type of service
Data set priority might be determined from an assessment of the following:
  - Data set activity level
  - Stakeholder interest
  - Maturity Matrix
Level of Service defines the outcome of the prioritization process and informs the users and stakeholders of the actions taken.
Data Set Activity Levels
• Ingest
  1. None: no ingest or production, no new data being added.
  2. Low: less than 10% of total volume being ingested or produced in a given year, or little or no staff intervention
  3. Nominal: between 10 and 80% of volume being ingested or produced in a given year and/or routine staff intervention
  4. High: greater than 80% of the volume being ingested or produced in a given year and/or significant staff intervention
• Distribution*
  1. None: no requests. Data set is archived in a steady state
  2. Low: between 1 and 5 requests per year, less than 5% of the total data volume
  3. Nominal: greater than 5 requests
  4. High: greater than 50 requests and/or greater than 100 GB per month
*Distribution impact on NSIDC is driven more by the number of requests that require user services interaction at the low end, but more by data volume at the high end.
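The activity-level thresholds above can be written as a small classifier. This is a minimal sketch, not NSIDC's procedure: the function and parameter names are hypothetical, the staff-intervention criteria and the "less than 5% of total data volume" condition are left out because they are not quantified here, boundary values (exactly 10% or 80%) are assigned to Nominal by assumption, and request counts are assumed to be per year.

```python
def ingest_level(fraction_of_volume: float) -> int:
    """Map the fraction of total volume ingested or produced in a year to level 1-4.

    Assumption: exactly 10% and exactly 80% both count as Nominal.
    Staff-intervention criteria from the slide are not modeled.
    """
    if fraction_of_volume <= 0:
        return 1  # None: no new data being added
    if fraction_of_volume < 0.10:
        return 2  # Low
    if fraction_of_volume <= 0.80:
        return 3  # Nominal
    return 4      # High


def distribution_level(requests_per_year: int, gb_per_month: float = 0.0) -> int:
    """Map request count and distributed volume to level 1-4.

    Assumption: the request thresholds (5 and 50) are per year.
    """
    if requests_per_year > 50 or gb_per_month > 100:
        return 4  # High
    if requests_per_year > 5:
        return 3  # Nominal
    if requests_per_year >= 1:
        return 2  # Low
    return 1      # None: archived in a steady state


# Hypothetical data set: 5% of volume ingested per year, 12 requests per year.
print(ingest_level(0.05), distribution_level(12))  # 2 3
```

Checking the High condition first reflects the footnote: at the high end, volume alone can drive the level even when the request count is small.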
Maturity Matrix Levels
• Science Maturity
  1: physical understanding of measurement process (algorithms and data sources documented)
  2: key measurement characteristics understood (instrument variability documented)
  3: data processing steps transparent and available
  4: rigorous validation (community acceptance of algorithms and validation)
• Preservation Maturity
  1: systematic approach to preservation implemented (metadata, data using known standards)
  2: threats to data loss mitigated (routine media refresh, off-site backup)
  3: long term preservation assured (funding and systems in place for multiple year curation)
• Societal Impact (very tentative)
  1: Short term predictions that impact society (health, property, etc.)
  2: Useful to determine trends that impact society
  3: Useful to characterize impacts or uncertainty in other measurements that do impact society
A Proposed Template
AE_DYSNO: AMSR-E/Aqua Daily L3 Global Snow Water Equivalent EASE-Grids
Description: The AMSR-E Level-3 daily snow water equivalent (SWE) data set contains SWE data and quality assurance flags mapped to Northern and Southern Hemisphere 25 km Equal-Area Scalable Earth Grids (EASE-Grids). Data are stored in HDF-EOS format.
I. Product Developers
  - Product Algorithm Theoretical Basis: AMSR Snow Water Equivalent ATBD (http://eospso.gsfc.nasa.gov/eos_homepage/for_scientists/atbd/docs/AMSR/atbd-amsr-snow.pdf)
  - Science Need (justification)
  - Quality and Accuracy Information
  - Intended or Appropriate Product Use
  - Science Value
II. DAAC and ESDIS-SOO
  - Heritage: MANDATED BY EOS, STANDARD PRODUCT
    - Rationale
    - Where data came from
    - Authorization or agreement for DAAC to manage these data
  - Current Involvement/Responsibility: MANAGED IN ECS
    - DAAC developed and/or managed
    - DAAC provided infrastructure
    - Shared responsibility with other NSIDC or external programs
    - Brokered with other institutions
  - Descriptive Metrics
    - Data volume: 2.30 GB
    - Number of granules: 1,171
Category/Level 1 2 3 4
Activity - Ingest X
Activity - Distribution X
Maturity - Science X
Maturity - Preservation X
Level of Service X
III. Proposed recommendations for <Title>
  - Science research value: (Comments on potential designation as Climate Data Record/Earth Science Data Record.)
  - NASA management priority: (Keep, move to other center, move to long term archive, other)
  - Suggestions on Level and Type of Service desired: (Raise, lower, keep the same)
Unanswered Questions in General
• Important that the prioritization strategy fit in situ data sets as well as remote sensing (global coverage) data sets. Does this framework fit both?
• How do we characterize unanticipated future uses?
• Is there a different set of questions that should be asked at initial consideration time, versus when long term retention is being considered?
• What is a data set?
  - In an ESDR framework, are the SSMIs (F-8, F-11, F-13 …) treated as data sets, or is the SSMI time series the data set?
  - Important from a naming convention point of view
Unanswered Questions for PoDAG
• How should PoDAG proceed on prioritization?
BACKUP SLIDES
Template for Data Producers
Title for specific data set (ESDT) or group of datasets
Brief Narrative Description
Product Algorithm Theoretical Basis
Science Need (justification)
Quality and Accuracy Information (cal/val, relative and absolute uncertainty, stability, maturity of algorithm)
Intended or Appropriate Product Use (also including limitations on use where appropriate)
Science Value (use of product for science, papers written, breakthroughs, multidisciplinary use)
Template for DAAC and ESDIS-SOO
• Title for specific data set (ESDT) or group of datasets
• Heritage
  - Rationale for DAAC involvement in the data set(s)
  - Where data came from
  - Authorization or agreement for DAAC to manage these data (EOS Program, DAAC User Working Group, MOUs, requests, other)
• Descriptive Metrics (as described in SOO metrics presentation)
  - Size (e.g. data volume, number of granules, etc.)
  - Activity levels
• Level and Type of Service(s)
  - Characterization of Services from DAAC
• Current Involvement/Responsibility
  - DAAC developed and/or managed
  - DAAC provided infrastructure
  - Shared responsibility with other NSIDC or external programs
  - Brokered with other institutions (meaning they are hosted at other institutions, with web presence on DAAC website)
Review Template prepared by NSIDC
• Heritage
• Justification
  - Science
  - EOSDIS
  - UWG
• DAAC Responsibility (DAAC, NSIDC Shared, Brokered)
Category/Level 1 2 3 4
Activity - Ingest
Activity - Distribution
Maturity - Science
Maturity - Preservation
Level of Service
Following slides from John Bates, NOAA NCDC
Metrics – A Maturity Model for Climate Data Records*
• Reduce difficulty and confusion in community about scientific data stewardship
• Produce an easily understood way of identifying maturity of data products and science data stewardship approaches
• Help identify areas needing improvement
* With Bruce Barkstrom, NASA
Component Maturity for Climate Data Records
• Identify key attributes of maturity in each dimension
• Develop maturity ranking for each attribute on scale of 1 to 5
• Summarize component maturity by weighting each attribute
  - Simplest weight = 1/Number of attributes
  - Develop more complex weightings after experience with approach
• Advantage: can do much of work with simple spreadsheet
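The weighting step above amounts to a weighted average of attribute rankings, which is why a spreadsheet suffices. A minimal Python sketch follows, assuming rankings on the 1 to 5 scale; the function name, the dictionary layout, the example scores, and the choice to normalize by the sum of the weights are all assumptions made for illustration.

```python
def component_maturity(rankings, weights=None):
    """Weighted average of attribute rankings (each on a 1-5 scale).

    rankings: dict mapping attribute name -> ranking.
    weights:  optional dict mapping attribute name -> weight.
              Default is the simplest weighting, 1 / number of attributes.
    Assumption: weights are normalized by their sum, so they need not add to 1.
    """
    if not rankings:
        raise ValueError("at least one attribute is required")
    for name, rank in rankings.items():
        if not 1 <= rank <= 5:
            raise ValueError(f"ranking for {name!r} must be between 1 and 5")
    if weights is None:
        weights = {name: 1 / len(rankings) for name in rankings}
    total_weight = sum(weights[name] for name in rankings)
    weighted_sum = sum(weights[name] * rankings[name] for name in rankings)
    return weighted_sum / total_weight


# Hypothetical rankings for four attributes, equal weights.
scores = {
    "physical understanding": 1,
    "instrument characteristics": 2,
    "public accessibility": 3,
    "rigorous validation": 4,
}
print(component_maturity(scores))  # 2.5
```

Passing an explicit `weights` dictionary is where the "more complex weightings" would go once there is experience with the approach.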
Scientific Maturity Key Attributes
• Physical Understanding of Measurement Process
• Measurement of Key Instrument Characteristics
• Public Accessibility of Data Processing
• Rigorous Validation
Key Attribute Assessment Areas - Public Accessibility of Data Processing
Key Assessment Area, with Levels 1 to 5:
Number of Analysis Teams/CDR
  Level 1: None; Level 2: Single; Level 3: Two; Level 4: Multiple; Level 5: Consensus benchmark
Number of Independent Observing Systems/CDR
  Level 1: None; Level 2: Single; Level 3: Two; Level 4: Multiple; Level 5: Benchmark
Reducing Model Uncertainties: Forcings/Feedbacks/Validation
  Level 1: None; Level 2: Single; Level 3: Two; Level 4: Three; Level 5: Demonstrated benchmark
Availability of technique and computer code
  Level 1: None; Level 2: Technique in one publication; Level 3: Technique in multiple publications; Level 4: Computer code available; Level 5: Computer code available and used by other groups
Preservation Maturity Key Attributes
• Systematic Approach to Guaranteeing Preservation of Data Understanding
• Systematic Reduction of Threats to Preservation
• Assurance of Preservation Cost Effectiveness
Societal Benefit Key Attributes
• Bibliometric Metrics
  - Publications and Citations
• Scientific Community Knowledge
  - Data use, including interdisciplinary data fusion and statistical studies
• Economic and Policy Utility
  - Market valuation increase
  - Reduction in time to influence policy
  - Benefit/Hazard Reduction resulting from data use
Some Caveats
• Using a Maturity Model will be exploratory and iterative
  - No expectation we’ll get it “right” the first time through
• Community Diversity must be incorporated
  - Different views of data processing, calibration, validation, need for knowledge preservation
  - Different vocabularies
• Deep Uncertainty needs to be incorporated
  - Diversity of opinions on areas of scientific controversy and value needs a common framework and disciplined discussion; openness is key
  - Including “societal benefit” is very difficult and risky
Key Benefits
• Allows us to develop an approach consistent with NRC Recommendations on Metrics
• Open Process
  - Can surface divergent needs and opinions
  - Can provide disciplined forum for discussion and resolution of differences
• Periodic Evaluation is required
  - Incorporate new information and deeper thought
  - Evaluation allows new directions