Helper Variables Why do we want them? How do we create them? What to avoid with them? Tom Pagano [email protected] 503-414-3010

Post on 17-Dec-2015


Why helper variables?

The target (i.e. predictand) time series may have holes in important years or a short period of record.

If the missing data can be estimated easily, filling the gaps may lead to a better, or at least more honest, forecast.

Sargents is missing data during a hydrologically interesting period, which is also the period covered by most of our predictors (i.e. SNOTEL). Gunnison could be used to fill in the gaps.

The correlation between the two gages is very strong.
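As a rough illustration of the gap-filling idea, the sketch below fits a straight line between a helper gage and the target over the years both have data, then uses it to estimate the target in years where only the helper is available. The function names (`fit_line`, `fill_gaps`) and all numbers are made up for illustration; this is not the forecasting software's own method, and the values are not real Sargents or Gunnison data.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x. Returns (a, b, r)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / sqrt(sxx * syy)
    return a, b, r


def fill_gaps(target, helper):
    """Estimate missing target years from the helper.

    target, helper: dicts mapping year -> seasonal volume (None if missing).
    Returns (filled_target, estimated_years, r).
    """
    # Fit only on years where both series were observed.
    common = [yr for yr in target
              if target[yr] is not None and helper.get(yr) is not None]
    a, b, r = fit_line([helper[yr] for yr in common],
                       [target[yr] for yr in common])

    filled = dict(target)
    estimated = []
    for yr, value in target.items():
        if value is None and helper.get(yr) is not None:
            filled[yr] = a + b * helper[yr]
            estimated.append(yr)
    return filled, estimated, r


# Hypothetical volumes, chosen only to keep the example easy to follow.
target = {1980: 10.0, 1981: None, 1982: 14.0, 1983: 20.0, 1984: None, 1985: 8.0}
helper = {1980: 50.0, 1981: 90.0, 1982: 70.0, 1983: 100.0, 1984: 30.0, 1985: 40.0}

filled, estimated, r = fill_gaps(target, helper)
print("estimated years:", estimated, "r =", round(r, 4))
```

Keeping the list of estimated years alongside the filled series makes it easy to see how much of the record is observed and how much is estimated.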

Another example… seasonally operated gages

Correlation between Mar-Sep and Apr-Sep = 0.9996

No point in throwing away years where only March is missing.
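To make the seasonal-gage case concrete, here is a small sketch that builds Mar-Sep and Apr-Sep totals from monthly volumes and lists the years that would be lost only because March is missing. The data layout (a dict of monthly values per year, with `None` for months the gage was not operated) and the name `season_total` are assumptions made for this example, and the numbers are invented.

```python
MAR_SEP = ["Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"]
APR_SEP = MAR_SEP[1:]


def season_total(monthly, months):
    """Sum the given months; return None if any of them is missing."""
    values = [monthly.get(m) for m in months]
    if any(v is None for v in values):
        return None
    return sum(values)


# Hypothetical monthly volumes. None marks a month with no record.
records = {
    2001: {"Mar": 2.0, "Apr": 5.0, "May": 12.0, "Jun": 15.0,
           "Jul": 8.0, "Aug": 4.0, "Sep": 3.0},
    2002: {"Mar": None, "Apr": 4.0, "May": 10.0, "Jun": 13.0,
           "Jul": 7.0, "Aug": 3.0, "Sep": 2.0},
    2003: {"Mar": 1.0, "Apr": 3.0, "May": 9.0, "Jun": None,
           "Jul": 6.0, "Aug": 3.0, "Sep": 2.0},
}

# Target is the longer season; helper is the shorter season at the same station.
target = {yr: season_total(m, MAR_SEP) for yr, m in records.items()}
helper = {yr: season_total(m, APR_SEP) for yr, m in records.items()}

# Years where the target is unusable but the helper is complete.
only_march_missing = [yr for yr in records
                      if target[yr] is None and helper[yr] is not None]
print(only_march_missing)
```

In this made-up record, 2002 can be rescued with the Apr-Sep helper, while 2003 cannot because a month inside the helper season is also missing.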

Helper variable interface

Neat stuff here, but don’t touch if you don’t know what you’re doing. Default is unchecked.

Main ways to use helper variables

Different station, same months (upstream vs downstream): estimating one gage from another

Same station, different months (May-Jul vs Apr-Jul): estimating a longer time period from a shorter one

Same station, same months, different sources (USGS vs AWDB): estimating natural flow from observed

Helper not used

Helper vs target scatterplot

Helper used

Wider range of years…

More stable relationship

More consistent with nearby forecasts

Dangers of helper variables

Statistically, we do not include the imperfect relationship between helper and original target in the final forecast error bounds.

We are increasing our chances of overconfident forecasts.

Therefore, it is best to estimate only a few years, and only if the relationship is very good (e.g. r² > 0.9).
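The rule of thumb above can be written as a simple check. This is only a sketch: the r² > 0.9 threshold comes from the slide, but "a few years" is not quantified there, so the `max_fraction` cutoff of 20% of the record is an assumption chosen for illustration, as is the function name `ok_to_fill`.

```python
def ok_to_fill(r, n_estimated, n_observed, min_r2=0.9, max_fraction=0.2):
    """Return True if gap filling looks defensible under the rule of thumb.

    r            : correlation between helper and target over common years
    n_estimated  : number of target years that would be estimated
    n_observed   : number of target years actually observed
    min_r2       : minimum r-squared (0.9, per the guidance above)
    max_fraction : largest share of estimated years allowed
                   (ASSUMPTION: 0.2 is illustrative, not from the source)
    """
    r2 = r * r
    fraction = n_estimated / (n_estimated + n_observed)
    return r2 > min_r2 and fraction <= max_fraction


print(ok_to_fill(0.97, 3, 30))   # strong relationship, few estimated years
print(ok_to_fill(0.90, 3, 30))   # r = 0.90 gives r-squared = 0.81
print(ok_to_fill(0.99, 15, 15))  # half of the record would be estimated
```

Note that the check takes r and squares it, so a correlation of 0.90 fails the test even though it may look strong at first glance.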

Consider too whether the relationship between the helper and the original target is stable over time…

For example… Use observed flow as a helper to estimate natural flow. Have the regulations changed over time?
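One simple way to probe stability, sketched below, is to fit the helper-to-target slope separately for an early and a late period and compare the two. The split year, the function names and the numbers are all hypothetical; a real check would also look at the scatterplot and at what is known about changes in regulation.

```python
def slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_by_period(years, helper, target, split_year):
    """Fit the slope before and from split_year onward. Returns (early, late)."""
    early = [(h, t) for yr, h, t in zip(years, helper, target) if yr < split_year]
    late = [(h, t) for yr, h, t in zip(years, helper, target) if yr >= split_year]
    early_slope = slope([h for h, _ in early], [t for _, t in early])
    late_slope = slope([h for h, _ in late], [t for _, t in late])
    return early_slope, late_slope


# Hypothetical observed (helper) and natural (target) flows.
years = [1970, 1971, 1972, 1973, 1990, 1991, 1992, 1993]
observed = [10.0, 20.0, 30.0, 40.0, 10.0, 20.0, 30.0, 40.0]
natural = [12.0, 22.0, 32.0, 42.0, 20.0, 35.0, 50.0, 65.0]

early_slope, late_slope = slope_by_period(years, observed, natural, split_year=1980)
print("early slope:", early_slope, "late slope:", late_slope)
```

A large difference between the two slopes, as in this invented record, suggests the relationship has shifted and that a single regression over the whole period would mislead.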