PoDAG 25: Levels of Service

Data Set Prioritization, Maturity Matrix, and Levels of Service

Ron Weaver (with liberal borrowing from Bruce R. Barkstrom & John J. Bates, NOAA – NESDIS-NCDC)

Outline

• Background: Why do we need data set prioritization?

• Definitions: Maturity Matrix (MM), Levels of Service (LOS)

• Examples: How might NSIDC employ LOS and MM?

• Discussion: Role of PoDAG in the process

BACKGROUND

• Prioritization called for by multiple studies (cf.):
  - NRC Climate Data Records
  - NSB Long Lived Data Sets
  - NASA Roadmapping efforts
  - NESDIS CLASS / Science Data Stewardship

• NASA HQ is asking the DAACs to prioritize their data holdings
  - NSIDC was the first DAAC to go through a prioritization process (January 2006).
  - JPL is the second, and is using templates developed by NSIDC and ESDIS

An Approach to Prioritization

Prioritization for what purpose?
  - Identify data that is scientifically important
  - Ingest, keep, throw away
  - Change level/type of service

[Figure: two sketches plotted against years, one of activity and one of impact, annotated "? Years ?"]

Terra launched in 1999; papers by non-MODIS team members appeared in the literature only in ‘05

An Approach to Prioritization

Data set priority might be determined from an assessment of the following:
  - Data set activity level
  - Stakeholder interest
  - Maturity Matrix

Level of Service is the outcome of the prioritization process and informs the users and stakeholders of the actions (to be) taken

MATURITY MATRIX

MM: Objective and Approaches

• Objectives
  - Reduce difficulty and confusion in community about scientific data stewardship
  - Produce an easily understood way of identifying maturity of data products and hence science data stewardship requirements

• Approaches
  - Barkstrom/Bates SDS Model
  - NASA Roadmap MMI Model

MM: A Simple Maturity Model

• Represent data maturity in terms of three separate dimensions:
  - Scientific Maturity
  - Preservation Maturity
  - Societal Impact

• Total maturity is simply length of vector

[Figure: three axes labeled Scientific Maturity, Preservation Maturity, and Societal Impact, with a vector labeled "Maturity of data for use"]
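
The "length of vector" rule can be sketched in a few lines of Python. This is only an illustration, assuming each of the three dimensions is scored on a numeric scale; the function name `total_maturity` and the example scores are hypothetical and do not come from the slides.

```python
import math


def total_maturity(scientific, preservation, societal):
    """Return the Euclidean length of the three-component maturity vector."""
    return math.sqrt(scientific ** 2 + preservation ** 2 + societal ** 2)


# Hypothetical scores, chosen only so the arithmetic is easy to check:
# sqrt(2^2 + 3^2 + 6^2) = sqrt(49)
print(total_maturity(2, 3, 6))  # 7.0
```

One consequence of using a vector length is that a high score in a single dimension dominates the total, so a data set that is strong in only one dimension can still rank above one that is moderate in all three.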

MM: Questions

• How do we ensure common understanding of errors and their impact?
  - Two parts: error budget of the environmental element and of the measurement method, leading to a signal-to-noise determination

• How do we produce understandable measures of costs – including data production and long-term stewardship?

• What metrics should we use for long-term value of data?

• How do we assure that prioritizations are ‘apples to apples’?

Measurement Maturity

[Figure: Measurement Maturity Index diagram, credited on the slide to the NRC Panel on Climate Variability and Change, July 25, 2005. Four tracks (Technology, Measurements, Modeling, Decision Support), each rated H / M / L, are laid out along a Measurement Maturity Index axis running from 1 to 7. Stage labels: Instrument Incubator, Pathfinder, Operational Precursor, Operational Mission, Planned Improvement; Initial, Validated; OSSI; Pilot, Routine Use, Policy. Milestones: Technology development starts; Pathfinder Mission launch; Operational Precursor Mission launch; Pilot Program for decision support begins; Operational Mission launch; Technology Improvement begins; Decision support use demonstrated; Decision support use routine.]

MM: NSIDC’s Proposed Approach

• Science Maturity
  1: physical understanding of measurement process (algorithms and data sources documented)
  2: key measurement characteristics understood (instrument variability documented)
  3: data processing steps transparent and available
  4: rigorous validation (community acceptance of algorithms and validation)

• Preservation Maturity
  1: systematic approach to preservation implemented (metadata, data using known standards)
  2: threats to data loss mitigated (routine media refresh, off-site backup)
  3: long term preservation assured (funding and systems in place for multiple year curation)

• Societal Impact (very tentative)
  1: Short term predictions that impact society (health, property etc.)
  2: Useful to determine trends that impact society
  3: Useful to characterize impacts or uncertainty in other measurements that do impact society

Levels of Service: As Defined by NSIDC

Levels of Service from a Data Center Perspective

Function / Level of Service (increasing to right)

Orphan (not public)

Brokered “Bare Bones” (public)

Basic + Distribution

Full

Known to NSIDC Ingest + Archive Metadata ? ? Skinny DIF

+ Distribution User Services Referral Minimal Minimal Documentation As Provided

Examples WDC Orphans

CLP ?

Orphans

ARCSS DAAC EOS Standard Products

Levels of Service from a Data Center Perspective

• Not in the previous list, but certainly considered
  - Production
  - Tools preparation and access
  - Long term archival issues
  - Resource demands

Downloaded presentation contains additional descriptive material derived from the NSIDC Data Policy Document

Examples: How DAAC Might Approach Prioritization

An Approach to Prioritization

Prioritization for what purpose?
  - Identify data that is scientifically important
  - Keep, throw away
  - Change level/type of service

Data set priority might be determined from an assessment of the following:
  - Data set activity level
  - Stakeholder interest
  - Maturity Matrix

Level of Service defines the outcome of the prioritization process and informs the users and stakeholders of the actions taken

Data Set Activity Levels

• Ingest
  1. None: no ingest or production, no new data being added.
  2. Low: less than 10% of total volume being ingested or produced in a given year or little or no staff intervention
  3. Nominal: between 10 and 80% of volume being ingested or produced in a given year and/or routine staff intervention
  4. High: greater than 80% of the volume being ingested or produced in a given year and/or significant staff intervention

• Distribution*
  1. None: no requests. Data set is archived in a steady state
  2. Low: between 1 – 5 requests per year, less than 5% of the total data volume
  3. Nominal: greater than 5 requests
  4. High: greater than 50 requests and/or greater than 100 GB per month

*Distribution impact on NSIDC is driven more by the number of requests that require user services interaction at the low end, but more by data volume at the high end.
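
The thresholds above can be sketched as two small Python functions. This is a simplified reading rather than NSIDC's actual procedure: the function and parameter names are hypothetical, the staff-intervention criteria and the "less than 5% of the total data volume" condition are left out, and the request counts are assumed to be per year.

```python
def ingest_level(fraction_per_year):
    """Map the fraction of total volume ingested or produced in a year
    (0.0 to 1.0) to an ingest activity level from 1 (None) to 4 (High).
    Staff intervention is ignored in this sketch."""
    if fraction_per_year <= 0:
        return 1  # None: no new data being added
    if fraction_per_year < 0.10:
        return 2  # Low: less than 10%
    if fraction_per_year <= 0.80:
        return 3  # Nominal: between 10 and 80%
    return 4      # High: greater than 80%


def distribution_level(requests_per_year, gb_per_month=0.0):
    """Map request count and monthly volume to a distribution activity
    level from 1 (None) to 4 (High)."""
    if requests_per_year > 50 or gb_per_month > 100:
        return 4  # High: more than 50 requests and/or more than 100 GB per month
    if requests_per_year > 5:
        return 3  # Nominal: more than 5 requests
    if requests_per_year >= 1:
        return 2  # Low: 1 to 5 requests
    return 1      # None: no requests


# Hypothetical data set: 5% of volume added this year, 12 requests.
print(ingest_level(0.05), distribution_level(12))  # 2 3
```

Checking the High distribution test first reflects the footnote: at the high end a data set can qualify on volume alone, regardless of how few requests it receives.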

Maturity Matrix Levels

• Science Maturity
  1: physical understanding of measurement process (algorithms and data sources documented)
  2: key measurement characteristics understood (instrument variability documented)
  3: data processing steps transparent and available
  4: rigorous validation (community acceptance of algorithms and validation)

• Preservation Maturity
  1: systematic approach to preservation implemented (metadata, data using known standards)
  2: threats to data loss mitigated (routine media refresh, off-site backup)
  3: long term preservation assured (funding and systems in place for multiple year curation)

• Societal Impact (very tentative)
  1: Short term predictions that impact society (health, property etc.)
  2: Useful to determine trends that impact society
  3: Useful to characterize impacts or uncertainty in other measurements that do impact society

A proposed Template

AE_DYSNO: AMSR-E/Aqua Daily L3 Global Snow Water Equivalent EASE-Grids

Description: The AMSR-E Level-3 daily snow water equivalent (SWE) data set contains SWE data and quality assurance flags mapped to Northern and Southern Hemisphere 25 km Equal-Area Scalable Earth Grids (EASE-Grids). Data are stored in HDF-EOS format.

I. Product Developers
  - Product Algorithm Theoretical Basis: AMSR Snow Water Equivalent ATBD (http://eospso.gsfc.nasa.gov/eos_homepage/for_scientists/atbd/docs/AMSR/atbd-amsr-snow.pdf)
  - Science Need (justification)
  - Quality and Accuracy Information
  - Intended or Appropriate Product Use
  - Science Value

II. DAAC and ESDIS-SOO
  - Heritage: MANDATED BY EOS, STANDARD PRODUCT
    - Rationale
    - Where data came from
    - Authorization or agreement for DAAC to manage these data
  - Current Involvement/Responsibility: MANAGED IN ECS
    - DAAC developed and/or managed
    - DAAC provided infrastructure
    - Shared responsibility with other NSIDC or external programs
    - Brokered with other institutions
  - Descriptive Metrics
    - Data volume: 2.30 GB
    - Number of granules: 1,171
  - Category/Level (1 2 3 4), one level marked X per row (column positions not preserved):
    - Activity - Ingest: X
    - Activity - Distribution: X
    - Maturity - Science: X
    - Maturity - Preservation: X
    - Level of Service: X

III. Proposed recommendations for <Title>
  - Science research value: (Comments on potential designation as Climate Data Record/Earth Science Data Record.)
  - NASA management priority: (Keep, move to other center, move to long term archive, other)
  - Suggestions on Level and Type of Service desired: (Raise, lower, keep the same)

Unanswered Questions in General

• Important that the prioritization strategy fit in situ data sets as well as remote sensing (global coverage) data sets. Does this framework fit both?

• How do we characterize unanticipated future uses?

• Is there a different set of questions that should be asked at initial consideration time, versus questions when long term retention is being considered?

• What is a data set?
  - In an ESDR framework, are the SSMIs (F-8, F-11, F-13 …) treated as data sets or is the SSMI timeseries the data set?
  - Important from a naming convention point of view

Unanswered Questions for PoDAG

• How should PoDAG proceed on prioritization?

BACKUP SLIDES

Template for Data Producers

Title for specific data set (ESDT) or group of datasets
  - Brief Narrative Description
  - Product Algorithm Theoretical Basis
  - Science Need (justification)
  - Quality and Accuracy Information (cal/val, relative and absolute uncertainty, stability, maturity of algorithm)
  - Intended or Appropriate Product Use (also including limitations on use where appropriate)
  - Science Value (use of product for science, papers written, breakthroughs, multidisciplinary use)

Template for DAAC and ESDIS-SOO

• Title for specific data set (ESDT) or group of datasets

• Heritage
  - Rationale for DAAC involvement in the data set(s)
  - Where data came from
  - Authorization or agreement for DAAC to manage these data (EOS Program, DAAC User Working Group, MOUs, requests, other)

• Descriptive Metrics (as described in SOO metrics presentation)
  - Size (e.g. data volume, number of granules, etc.)
  - Activity levels

• Level and Type of Service(s)
  - Characterization of Services from DAAC

• Current Involvement/Responsibility
  - DAAC developed and/or managed
  - DAAC provided infrastructure
  - Shared responsibility with other NSIDC or external programs
  - Brokered with other institutions (meaning they are hosted at other institutions, with web presence on DAAC website)

Review Template prepared by NSIDC

• Heritage

• Justification
  - Science
  - EOSDIS
  - UWG

• DAAC Responsibility (DAAC, NSIDC Shared, Brokered)

Category/Level 1 2 3 4

Activity - Ingest

Activity - Distribution

Maturity - Science

Maturity - Preservation

Level of Service
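
One way to hold a completed review grid in code is a small record type that rejects levels outside the 1 to 4 columns. The sketch below is an assumption about how such a record might look, not part of the NSIDC template: the class name, field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass

LEVELS = (1, 2, 3, 4)  # the grid's column headings
RESPONSIBILITIES = ("DAAC", "NSIDC Shared", "Brokered")


@dataclass
class ReviewRecord:
    """One row set of the review grid for a single data set (hypothetical)."""
    title: str
    responsibility: str
    activity_ingest: int
    activity_distribution: int
    maturity_science: int
    maturity_preservation: int
    level_of_service: int

    def __post_init__(self):
        # Reject responsibility values that are not listed on the slide.
        if self.responsibility not in RESPONSIBILITIES:
            raise ValueError(f"unknown responsibility: {self.responsibility!r}")
        # Every category must be marked in exactly one of the columns 1 to 4.
        for name in ("activity_ingest", "activity_distribution",
                     "maturity_science", "maturity_preservation",
                     "level_of_service"):
            if getattr(self, name) not in LEVELS:
                raise ValueError(f"{name} must be one of {LEVELS}")


# Made-up values for illustration only; they are not an actual review result.
record = ReviewRecord("Example data set", "DAAC", 3, 3, 2, 2, 4)
print(record.level_of_service)  # 4
```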

Following slides from John Bates, NOAA NCDC

Metrics – A Maturity Model for Climate Data Records*

• Reduce difficulty and confusion in community about scientific data stewardship

• Produce an easily understood way of identifying maturity of data products and science data stewardship approaches

• Help identify areas needing improvement

* With Bruce Barkstrom, NASA

Component Maturity for Climate Data Records

• Identify key attributes of maturity in each dimension

• Develop maturity ranking for each attribute on scale of 1 to 5

• Summarize component maturity by weighting each attribute
  - Simplest weight = 1/Number of attributes
  - Develop more complex weightings after experience with approach

• Advantage: can do much of work with simple spreadsheet
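
The weighting step can be sketched in Python as well as in a spreadsheet. This is a minimal illustration, assuming component maturity is a weighted sum of attribute rankings; the function name `component_maturity` and the example rankings are hypothetical, and the default weight is the slide's simplest case of 1/Number of attributes.

```python
def component_maturity(rankings, weights=None):
    """Combine attribute rankings (each on the 1-to-5 scale) into one score.

    With no weights given, every attribute gets the simplest weight,
    1 / number of attributes, which makes the score a plain average.
    """
    if not rankings:
        raise ValueError("at least one attribute ranking is required")
    if any(not 1 <= r <= 5 for r in rankings):
        raise ValueError("rankings must lie on the 1-to-5 scale")
    if weights is None:
        weights = [1 / len(rankings)] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need exactly one weight per attribute")
    return sum(w * r for w, r in zip(weights, rankings))


# Hypothetical rankings for four attributes of one component.
print(component_maturity([4, 3, 2, 3]))  # 3.0
```

Passing an explicit `weights` list is where the "more complex weightings" would go once there is experience with the approach; weights that sum to 1 keep the result on the same 1-to-5 scale.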

Scientific Maturity Key Attributes

• Physical Understanding of Measurement Process

• Measurement of Key Instrument Characteristics

• Public Accessibility of Data Processing

• Rigorous Validation

Key Attribute Assessment Areas - Public Accessibility of Data Processing

Key Assessment Area | Level 1 | Level 2 | Level 3 | Level 4 | Level 5
Number of Analysis Teams/CDR | None | Single | Two | Multiple | Consensus benchmark
Number of Independent Observing Systems/CDR | None | Single | Two | Multiple | Benchmark
Reducing Model Uncertainties: Forcings/Feedbacks/Validation | None | Single | Two | Three | Demonstrated benchmark
Availability of technique and computer code | None | Technique in one publication | Technique in multiple publications | Computer code available | Computer code available and used by other groups

Preservation Maturity Key Attributes

• Systematic Approach to Guaranteeing Preservation of Data Understanding

• Systematic Reduction of Threats to Preservation

• Assurance of Preservation Cost Effectiveness

Societal Benefit Key Attributes

• Bibliometric Metrics
  - Publications and Citations

• Scientific Community Knowledge
  - Data use, including interdisciplinary data fusion and statistical studies

• Economic and Policy Utility
  - Market valuation increase
  - Reduction in time to influence policy
  - Benefit/Hazard Reduction resulting from data use

Some Caveats

• Using a Maturity Model will be exploratory – and iterative
  - No expectation we’ll get it “right” the first time through

• Community Diversity must be incorporated
  - Different views of data processing, calibration, validation, need for knowledge preservation
  - Different vocabularies

• Deep Uncertainty needs to be incorporated
  - Diversity of opinions on areas of scientific controversy and value need common framework and disciplined discussion – openness a key
  - Including “societal benefit” is very difficult and risky

Key Benefits

• Allows us to develop an approach consistent with NRC Recommendations on Metrics

• Open Process
  - Can surface divergent needs and opinions
  - Can provide disciplined forum for discussion and resolution of differences

• Periodic Evaluation is required
  - Incorporate new information and deeper thought
  - Evaluation allows new directions