
Page 1:

A Probabilistic-Spatial Approach to the Quality Control of Climate Observations

Christopher Daly, Wayne Gibson, Matthew Doggett, Joseph Smith, and George Taylor

Spatial Climate Analysis Service

Oregon State University

Corvallis, Oregon, USA

Page 2:

Traditional QC Systems are Categorical and Deterministic

• Data subjected to categorical quality checks
– Designed to uncover mistakes

• Validity determined from test results
– Mistake = flag / toss
– No mistake = no flag / keep

Designed to Work With Human Observing Systems

Page 3:

Alien Electronic Devices are Invading the Climate Observing World!

They’re Everywhere!

Page 4:

Electronic Sensors and Modern Applications Create Challenges for Traditional QC Systems

Situation:
• Errors tend to be continuous drift, rather than categorical mistakes
• Increasing usage of computer applications that rely on climate observations

Need:
• Continuous estimates, rather than categorical tests, of observation validity
• Quantitative estimates of observational uncertainty, not just flags

Page 5:

More Challenges…

Situation:
• Range of applications is increasing rapidly, and each has a different tolerance for outliers
• Data are often more voluminous and disseminated in a more timely manner

Need:
• Probabilistic information from which a decision to use an obs can be made, not an up-front decision
• Automated QC methods

Page 6:

An Opportunity

Advances in climate mapping technology now make it possible to estimate a reasonably accurate “expected value” for an observation based on surrounding stations.

Assumption: Spatial consistency is related to observation validity

Page 7:

Useful Characteristics for a Next-Generation Climate QC System

- continuous
- quantitative
- probabilistic
- automated
- spatial

Page 8:

PRISM Probabilistic-Spatial QC (PSQC) System for SNOTEL Data

Uses climate mapping technology and climate statistics to provide a continuous, quantitative confidence probability for each observation, estimate a replacement value, and provide a confidence interval for that replacement.

• Start with daily max/min temperature for all SNOTEL sites, period of record

• Move to precipitation, SWE, soil temperature and moisture

• Develop automated system for near-real time operation at NRCS

Page 9:

Climatological Grid Development

– PRISM must produce a high-quality estimate of temperature at each SNOTEL station each day

– Highest interpolation skill obtained by using a high-quality predictive grid that represents the long-term climatological temperature for that day, rather than a digital elevation grid

– Climatological grid: 0.8 km resolution, 1971-2000

[Figure: grid panels labeled 4 km and 0.8 km]

Page 10:

Oregon Annual Precipitation

Leveraging Information Content of High-Quality Climatologies to Create New Maps with Fewer Data and Less Effort

Climatology used in place of DEM as PRISM predictor grid

Page 11:

PRISM Regression of “Weather vs Climate”

20 July 2000 Tmax vs 1971-2000 Mean July Tmax

[Figure: PRISM Results. Scatter plot with regression line. X-axis: 71-00 Mean July Maximum Temperature. Y-axis: Daily Maximum Temperature (C). Stations plotted: 21D12S, 21D35S, 21D13S, 353402, 21D08S, 5211C70E, 324045CC, 3240335C, 21D14S.]

Stn: 21D12S
Date: 2000-07-20
Climate: 21.53
Obs: 26.0
Prediction: 25.75
Slope: 1.4
Y-Intercept: -4.37
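The "weather vs climate" regression on this slide can be sketched as a weighted least-squares line through (climatological value, daily observation) pairs from nearby stations. This is only a minimal illustration, not PRISM itself: the function names are hypothetical, and the station weights (which PRISM derives from its knowledge base) are simply passed in. Only the slope, intercept, climate value, and observation for 21D12S are taken from the slide.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept.

    x: climatological values at nearby stations
    y: daily observations at the same stations
    w: non-negative station weights (assumed to be supplied by the caller)
    """
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(climate, slope, intercept):
    """Evaluate the regression line at the target station's climate value."""
    return slope * climate + intercept


# Values shown on the slide for station 21D12S on 2000-07-20.
slide_prediction = predict(21.53, 1.4, -4.37)
slide_residual = slide_prediction - 26.0  # residual defined as P - O
```

With the rounded slope and intercept from the slide, the line evaluates to about 25.77, close to the reported prediction of 25.75; the small gap is presumably due to rounding of the displayed coefficients.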

Page 12:

PRISM

Parameter-elevation Regressions on Independent Slopes Model

- Generates gridded estimates of climatic parameters

- Moving-window regression of climate vs. elevation for each grid cell

- Uses nearby station observations

- Spatial climate knowledge base (KBS) weights stations in the regression function by their climatological similarity to the target grid cell

Page 13:

PRISM

Parameter-elevation Regressions on Independent Slopes Model

PRISM KBS accounts for spatial variations in climate due to:

- Elevation
- Terrain orientation
- Terrain steepness
- Moisture regime
- Coastal proximity
- Inversion layer
- Long-term climate patterns

Page 14:

PRISM Moving-Window Regression Function

1961-90 Mean April Precipitation, Qin Ling Mountains, China

Weighted linear regression

Page 15:

Rain Shadows: 1961-90 Mean Annual Precipitation, Oregon Cascades

[Figure: map with north arrow, labeled Portland, Eugene, Sisters, Redmond, Bend, Mt. Hood, Mt. Jefferson, Three Sisters; precipitation annotations: 350 mm/yr, 2200 mm/yr, 2500 mm/yr]

Dominant PRISM KBS Components:

- Elevation
- Terrain orientation
- Terrain steepness
- Moisture regime

Page 16:

1961-90 Mean Annual Precipitation, Cascade Mtns, OR, USA

Page 17:

1961-90 Mean Annual Precipitation, Cascade Mtns, OR, USA

Page 18:

Coastal Effects: 1971-00 July Maximum Temperature, Central California Coast

[Figure: map with north arrow, labeled Monterey, San Francisco, San Jose, Santa Cruz, Hollister, Salinas, Stockton, Sacramento, Fremont, Oakland, Pacific Ocean; "Preferred Trajectories" marked; temperature annotations: 20°, 27°, 34°]

Dominant PRISM KBS Components:

- Elevation
- Coastal Proximity
- Inversion Layer

Page 19:

Inversions – 1971-00 July Minimum Temperature, Northwestern California

[Figure: map with north arrow, labeled Ukiah, Cloverdale, Lakeport, Willits, Clear Lake, Lake Pillsbury, Pacific Ocean; temperature annotations: 10°, 12°, 16°, 17°, 17°]

Dominant PRISM KBS Components:

- Elevation
- Inversion Layer
- Topographic Index
- Coastal Proximity

Page 20:

PRISM PSQC System Confidence Probability (CP)

Definition of CP: Given the difference between an observation and an expected value (residual), CP is the probability that another observation and expected value from the same time of year would differ by at least as much.

Residual distribution: +/- 15 day, +/- 2 year window = 5 yrs, 31 days each (N~155)

[Figure: residual frequency distribution; labels: Sx, X, P]
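The CP definition above can be illustrated as a two-tailed p-value from a z-test. The sketch below assumes the residuals in the window are roughly normal with a known mean and standard deviation; the two-tailed form and the function name are assumptions made here for illustration, not details confirmed by the slides.

```python
import math


def confidence_probability(residual, mean_residual, sigma):
    """Two-tailed z-test p-value, expressed in percent (0-100).

    residual:      prediction minus observation for the station-day
    mean_residual: mean residual over the surrounding time window
    sigma:         standard deviation of residuals over that window
    """
    z = abs(residual - mean_residual) / sigma
    # For a standard normal Z, P(|Z| >= z) = erfc(z / sqrt(2)).
    return 100.0 * math.erfc(z / math.sqrt(2.0))


# Same 4-degree deviation, two different spreads (illustrative numbers).
cp_high_skill = confidence_probability(4.0, 0.0, 2.0)  # z = 2, about 4.6%
cp_low_skill = confidence_probability(4.0, 0.0, 4.0)   # z = 1, about 31.7%
```

A residual equal to the window mean gives CP = 100%. For a fixed deviation, a larger standard deviation gives a higher CP, so a station in a region where predictions are less skillful is penalized less for the same residual.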

Page 21:

Confidence Probability Takes into Account Uncertainty in the System

[Figure: two residual frequency distributions, labeled "Low Overall Skill" and "High Overall Skill"; labels on each: Sx, X, P]

P-value is higher for a given deviation from the mean when Sx is large (low skill)

X = Residual (P-O)

Page 22:

Interpreting Confidence Probability

Continuous values from 0 – 100%

0% = highly spatially inconsistent observation, reflected in a PRISM prediction that is unusually different from the observation

100% = highly consistent observation, reflected in a PRISM prediction that is relatively close to the observation

Guidelines to date:

CP > 30: Use observation as-is

10 < CP < 30: Blend prediction and observation

CP < 10: Use prediction instead of observation
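A possible way to apply these guidelines in code is sketched below. The slides do not say how the blend is computed or how values of exactly 10 or 30 are treated, so the linear blend and the boundary handling here are assumptions for illustration only, as is the function name.

```python
def apply_cp_guidelines(obs, pred, cp):
    """Return the value to use for a station-day given its CP (in percent).

    CP > 30        -> observation as-is
    CP < 10        -> prediction
    10 <= CP <= 30 -> blend (ASSUMED linear in CP; not specified in the source)
    """
    if cp > 30.0:
        return obs
    if cp < 10.0:
        return pred
    # Assumed blend: weight on the observation rises from 0 at CP = 10
    # to 1 at CP = 30, so the result is continuous at both thresholds.
    w_obs = (cp - 10.0) / 20.0
    return w_obs * obs + (1.0 - w_obs) * pred
```

For example, with an observation of 26.0, a prediction of 24.0, and CP = 20, this assumed blend returns the midpoint, 25.0.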

Page 23:

PRISM PSQC Process

1. Create Database Records

Goal: Enter daily tmax/tmin observations for all networks into database and prepare data

Current Actions:

1. Ingest daily tmin/tmax observations from SNOTEL, COOP, RAWS, Agrimet, ASOS, and first-order networks.

2. Shift AM COOP observations of tmax to previous day (assumes standard diurnal curve, which does not always apply).

3. Convert units to degrees Celsius.
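Actions 2 and 3 above might look like the following in a preprocessing step. This is a hedged sketch: the record fields, the "AM" label for morning observers, and the function names are hypothetical and are not taken from the PSQC database.

```python
from datetime import date, timedelta


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def tmax_valid_date(report_date, network, obs_time):
    """Date to which a reported tmax is assigned.

    A COOP station read in the morning most likely recorded the previous
    afternoon's maximum, so its tmax is shifted back one day. This assumes
    a standard diurnal curve, which does not always apply.
    """
    if network == "COOP" and obs_time == "AM":
        return report_date - timedelta(days=1)
    return report_date


# Example: a morning COOP report of 78.8 F on 21 July 2000.
valid = tmax_valid_date(date(2000, 7, 21), "COOP", "AM")
tmax_c = fahrenheit_to_celsius(78.8)
```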

Page 24:

PRISM PSQC Process

2. Single-Station Checks

Goal: Take all QC actions possible at the single-station level, before entering the spatial QC process.

Current Checks:

1. Temperature observation is well above the all-time record maximum or well below the all-time record minimum for the state – flag set and CP set to 0

2. Maximum temperature is less than the minimum temperature – flag set and CP set to 0

3. First daily tmax/tmin observation after a period of missing data – flag set and CP set to 0 (COOP only?)

4. More than 10 consecutive observations with the same value (<+/-1F COOP, <+/-0.1C others), or more than 5 consecutive zero values, is a definite flatliner – flag set and CP set to 0

5. 5-10 consecutive observations with the same value is a potential flatliner, to be assessed by the spatial QC system – flag set and CP unchanged
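Checks 2, 4, and 5 can be sketched as below. The run-length thresholds come from the slide; comparing each value with the first value of its run, and the function and label names, are assumptions made for this illustration.

```python
def flatliner_flags(values, tolerance=0.1):
    """Label each observation 'ok', 'potential', or 'definite' flatliner.

    values:    daily observations in time order
    tolerance: values closer than this to the first value of a run count
               as "the same" (0.1 C here; the slide uses 1 F for COOP)
    """
    flags = ["ok"] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when a value departs
        # from the first value of the run by the tolerance or more.
        if i == len(values) or abs(values[i] - values[start]) >= tolerance:
            length = i - start
            all_zero = all(v == 0 for v in values[start:i])
            if length > 10 or (all_zero and length > 5):
                label = "definite"   # flag set, CP set to 0
            elif length >= 5:
                label = "potential"  # flag set, CP left to the spatial QC
            else:
                label = "ok"
            flags[start:i] = [label] * length
            start = i
    return flags


def single_station_cp(tmax, tmin, flat_flag):
    """Return 0.0 when a single-station check fails, else None (undecided)."""
    if tmax < tmin or flat_flag == "definite":
        return 0.0
    return None
```

Returning `None` for observations that pass leaves their CP to be estimated later by the spatial QC step.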

Page 25:

PRISM PSQC Process

3. Spatial QC System

Goal: Through a series of iterations, gradually and systematically “weed out” spatially inconsistent observations from consistent ones

Overview:

1. PRISM is run for each station location for each day, and summary statistics are accumulated

2. Once all days have been run, frequency distributions are developed and confidence probabilities (CP) for each daily station observation are estimated

3. These CP values are used to weight the daily observations in a second iteration of PRISM daily runs

4. Obs with lower CP values are given lower weight, and thus have less influence, in the second set of PRISM predictions, and are also given lower weight in the calculation of the second set of summary statistics

5. CP values are again calculated and passed back to the daily PRISM runs

6. This iterative process continues for about 5 iterations, at which time the CP values have reached equilibrium
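The iterative reweighting described in this overview can be illustrated with a toy loop. This is not PRISM: a CP-weighted mean of the other stations stands in for the PRISM prediction, a fixed standard deviation replaces the windowed summary statistics, and a single day is used. All names and numbers below are hypothetical.

```python
import math


def cp_from_residual(residual, sigma):
    """Two-tailed z-test p-value in the range 0-1 (assumed form of CP)."""
    return math.erfc(abs(residual) / (sigma * math.sqrt(2.0)))


def iterate_cp(obs, sigma=2.0, n_iter=5):
    """Toy spatial QC loop for one day of observations.

    Each station is predicted in its own absence from the other stations,
    weighted by their current CP. New CPs are computed from the residuals
    and fed back as weights in the next iteration.
    """
    cp = [1.0] * len(obs)  # first iteration: all stations trusted equally
    for _ in range(n_iter):
        new_cp = []
        for i, o in enumerate(obs):
            others = [(cp[j], obs[j]) for j in range(len(obs)) if j != i]
            total_w = sum(w for w, _ in others)
            if total_w > 0.0:
                pred = sum(w * v for w, v in others) / total_w
            else:
                # Fall back to an unweighted mean if all weights vanish.
                pred = sum(v for _, v in others) / len(others)
            new_cp.append(cp_from_residual(pred - o, sigma))
        cp = new_cp
    return cp


# Four mutually consistent stations and one outlier (made-up values, deg C).
final_cp = iterate_cp([20.0, 21.0, 19.0, 20.5, 35.0])
```

In the first pass the outlier drags down the predictions for its neighbors and lowers their CP as well. Once its own CP falls toward zero it loses influence, and the CP of the consistent stations recovers in later passes, which is the "weeding out" behavior the slide describes.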

Page 26:

QC Iteration

For each station-day:
• Run PRISM for each station location in its absence, estimating its obs for each day
• PRISM omits nearby stations, singly, and in pairs, to try to better match observation
• Prediction closest to obs is accepted
– Raw PRISM variables: Observation (O), Prediction (P), Residual (R=P-O), PRISM Regression Standard Deviation (S)

Once all station-days are run:
• Calculate summary statistics for each station for each day
– Mean and std dev of O (Os), P (Ps), R (Rs), and S (Ss)
– +/- 15 day, +/- 2 year window = 5 yrs, 31 days each (N~155)
– 5-day running Standard Deviation (RunSD) as a measure of day-to-day variability (time shifting)
– Potential flatliners: calculate V, the ratio of station’s RunSD (set to 0.3) to that of surrounding stations
• Determine “effective” standard deviation for frequency distribution
– Sigma = Max ( Rs, S, Ss, RunSD, 2 )
• Calculate probability statistics for O, P, R, S, and V for each day
– Probability statistics are p-values from z-tests
– Residual Probability (RP) used as an estimate of overall Confidence Probability (CP) for an observation
– Except in the case of potential flatliners, where CP = min(RP,VP)
• CP used to weight stations in next iteration
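Three of the steps above are concrete enough to sketch: the time window, the effective standard deviation, and the final CP rule. The function names are hypothetical, and VP (the p-value for the flatliner ratio V) is taken as an input because the slides do not spell out how it is computed.

```python
from datetime import date, timedelta


def window_dates(center, half_days=15, half_years=2):
    """Dates within +/- 15 days of the same calendar day in +/- 2 years."""
    dates = []
    for dy in range(-half_years, half_years + 1):
        try:
            anchor = center.replace(year=center.year + dy)
        except ValueError:
            # 29 February in a non-leap year: fall back to 28 February.
            anchor = center.replace(year=center.year + dy, day=28)
        for dd in range(-half_days, half_days + 1):
            dates.append(anchor + timedelta(days=dd))
    return dates


def effective_sigma(rs, s, ss, run_sd, floor=2.0):
    """Sigma = Max(Rs, S, Ss, RunSD, 2), as given on the slide."""
    return max(rs, s, ss, run_sd, floor)


def overall_cp(rp, vp=None, potential_flatliner=False):
    """CP = RP, except CP = min(RP, VP) for potential flatliners."""
    if potential_flatliner and vp is not None:
        return min(rp, vp)
    return rp


n_window = len(window_dates(date(2000, 7, 20)))  # 5 years x 31 days
```

The floor of 2 in the effective sigma keeps the distribution from becoming unrealistically narrow when all of the computed spreads are small, which would otherwise drive CP toward zero for minor residuals.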

Page 27:

Observations and CP values, Date: 1996-02-08

Drifting sensor: MCKENZIE PASS (21E07S)

Page 28:

Climatology vs Observation and Prediction, Date: 1996-02-08

Drifting sensor: MCKENZIE PASS (21E07S)

Page 29:

Warm Bias: SALT CREEK FALLS (22F04S)

Observations, Date: 2000-07-14

Page 30:

Anomalies and CP values, 7-21 July 2000

Warm Bias: SALT CREEK FALLS (22F04S)

[Figure annotation: 14 July]

Page 31:

Scatter Plot: Climatology vs Observation, 14 July 2000

Warm Bias: SALT CREEK FALLS (22F04S)

[Figure labels: 22F04S, Odell Lake COOP]

Page 32:

Tmax Observations, Date: 2000-07-14

Page 33:

Computing Obstacles

• Computing – currently takes about 60 hours to run PRISM PSQC system for SNOTEL sites in the western US
– 14-processor cluster

• Disk space – we now have > 1 TB, but will probably need more

• Funds are insufficient to “do it right”

Page 34:

Issues to Consider

• How far can the assumption be taken that spatial consistency equates with validity?

• Are continuous and probabilistic QC systems useful for manual observing systems?

• Can a high-quality QC system ever be completely automated?