Metrics, observations, and biases in quantitative assessment of seismic hazard model predictions

Edward Brooks (1), Seth Stein (1), Bruce D. Spencer (2), Antonella Peresan (3,4)

(1) Department of Earth & Planetary Sciences and Institute for Policy Research, Northwestern University, Evanston, Illinois, USA; (2) Department of Statistics and Institute for Policy Research, Northwestern University, Evanston, Illinois, USA; (3) Department of Mathematics and Geosciences, University of Trieste, Italy; (4) SAND Group, ICTP, Trieste, Italy

CSNI Workshop on Testing PSHA Results and Benefit of Bayesian Techniques for Seismic Hazard Assessment, Pavia, Italy (4-6 February 2015)


Post on 19-Jan-2016


TRANSCRIPT

Page 1:

Metrics, observations, and biases in quantitative assessment of seismic hazard model predictions

Edward Brooks (1), Seth Stein (1), Bruce D. Spencer (2), Antonella Peresan (3,4)

(1) Department of Earth & Planetary Sciences and Institute for Policy Research, Northwestern University, Evanston, Illinois, USA

(2) Department of Statistics and Institute for Policy Research, Northwestern University, Evanston, Illinois, USA

(3) Department of Mathematics and Geosciences, University of Trieste, Italy

(4) SAND Group, ICTP, Trieste, Italy

CSNI Workshop on Testing PSHA Results and Benefit of Bayesian Techniques for Seismic Hazard Assessment

Pavia, Italy (4-6 February 2015)

Page 2:

What’s going wrong with existing maps?

How can we improve forecasts?

How can we quantify their uncertainties?

How can we measure their performance?

How do we know when to update them?

How good do they have to be to be useful?

How do we make sensible policy given forecasts’ limitations?

Forecasting ground shaking: many maps… and many questions

Page 3:

Geller 2011

Geller (2011) argued that “all of Japan is at risk from earthquakes, and the present state of seismological science does not allow us to reliably differentiate the risk level in particular geographic areas,” so a map showing uniform hazard would be preferable to the existing maps.

How should we test this idea?

Page 4:

How good a baseball player was Babe Ruth?

The answer depends on the metric used.

In many seasons Ruth led the league in both home runs and in the number of times he struck out.

By one metric he did very well, and by another, very poorly.

Page 5:

From users’ perspective, what specifically should hazard maps seek to accomplish?

Different users likely want different things.

How do we measure how well they meet users’ requirements?

No agreed way yet…

Page 6:

Lessons from meteorology

Weather forecasts are routinely evaluated to assess how well their predictions matched what actually occurred: "it is difficult to establish well-defined goals for any project designed to enhance forecasting performance without an unambiguous definition of what constitutes a good forecast." (Murphy, 1993)

Information about how a forecast performs is crucial in determining how best to use it. The better a weather forecast has worked to date, the more we factor it into our daily plans.

Page 7:

Choosing appropriate metrics is crucial in assessing the performance of forecasts.

Silver (2012) shows that TV weather forecasts have a "wet bias", predicting more rain than actually occurs, probably because forecasters feel that customers accept unexpectedly sunny weather but are annoyed by unexpected rain.

Page 8:

From users’ perspective, what specifically should hazard maps seek to accomplish?

How do we measure how well they do it?

How much can we improve them?

How can we quantify their large uncertainties?

Page 9:

How to measure map performance?

Implicit probabilistic map criterion: after an appropriate time, the predicted shaking should be exceeded at only a fraction p of sites

Define the fractional site exceedance metric M0(f,p) = |f – p|, where f is the fraction of sites at which the observed shaking exceeded the predicted shaking

Ideal map has M0 = 0

M0=0
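As a minimal sketch of how M0 could be computed, the following assumes that predicted and observed shaking are available as equal-length lists of per-site values. The function name and the ten-site data are hypothetical and serve only to illustrate the definition.

```python
def fractional_exceedance_metric(predicted, observed, p):
    """M0 = |f - p|, where f is the fraction of sites whose observed
    shaking exceeded the predicted shaking."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equally sized, non-empty site lists")
    exceeded = sum(1 for s, x in zip(predicted, observed) if x > s)
    f = exceeded / len(predicted)
    return abs(f - p)


# Hypothetical example: 10 sites, all with the same predicted shaking,
# and a map whose stated exceedance fraction is p = 0.1.
predicted = [5.0] * 10
observed = [3.0, 4.0, 2.5, 6.0, 4.5, 3.5, 4.0, 2.0, 4.9, 3.0]
m0 = fractional_exceedance_metric(predicted, observed, p=0.1)
```

Here exactly one of the ten sites is exceeded, so f equals p and M0 is zero, even though nothing is said about how large the exceedance was.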

Page 10:

Fractional site exceedance is a useful metric but only tells part of the story

Both maps are successful, but…

This map exposed some sites to much greater shaking than predicted. This situation could reflect faults that had larger earthquakes than assumed.

M0=0

Page 11:

Fractional site exceedance is a useful metric but only tells part of the story

All these maps are successful, but…

This map significantly overpredicted shaking, which could arise from overestimating the magnitude of the largest earthquakes.

M0=0

Page 12:

Other metrics can provide additional information beyond the fractional site exceedance M0

Squared misfit to the data

$M1(s,x) = \sum_i (x_i - s_i)^2 / N$

measures how well the predicted shaking compares to the highest observed.

From a purely seismological view, M1 tells us more than M0 about how well a map performed.
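A minimal sketch of the squared misfit follows. It assumes per-site lists of predicted shaking s and maximum observed shaking x; the function name and the four-site values are hypothetical.

```python
def squared_misfit(predicted, observed):
    """M1 = sum_i (x_i - s_i)^2 / N over the N sites."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equally sized, non-empty site lists")
    n = len(predicted)
    return sum((x - s) ** 2 for s, x in zip(predicted, observed)) / n


# Hypothetical four-site example.
predicted = [5.0, 5.0, 5.0, 5.0]
observed = [4.0, 6.0, 5.0, 3.0]
m1 = squared_misfit(predicted, observed)  # (1 + 1 + 0 + 4) / 4 = 1.5
```

Unlike M0, this value grows with the size of each over- or underprediction, not just with the count of exceeded sites.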

Page 13:

Other metrics can provide additional information beyond the fractional site exceedance M0

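Because underprediction potentially does more harm than overprediction, we could weight underprediction more heavily.

Asymmetric squared misfit

$M2(s,x) = \sum_i w_i (x_i - s_i)^2 / N$, with $w_i = a$ for $(x_i - s_i) > 0$ and $w_i = b$ for $(x_i - s_i) \le 0$

Since $(x_i - s_i) > 0$ means the observed shaking exceeded the prediction, choosing a > b penalizes underprediction more.

More useful for hazard mitigation than M1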
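The asymmetric misfit can be sketched as below, again assuming per-site lists of predicted and maximum observed shaking. The weights a = 2 and b = 1 are illustrative choices, not values given in the talk.

```python
def asymmetric_squared_misfit(predicted, observed, a, b):
    """M2 = sum_i w_i (x_i - s_i)^2 / N, with w_i = a where the observation
    exceeds the prediction (underprediction) and w_i = b otherwise."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equally sized, non-empty site lists")
    total = 0.0
    for s, x in zip(predicted, observed):
        weight = a if (x - s) > 0 else b
        total += weight * (x - s) ** 2
    return total / len(predicted)


# Hypothetical four-site example; only the second site is underpredicted.
predicted = [5.0, 5.0, 5.0, 5.0]
observed = [4.0, 6.0, 5.0, 3.0]
m2 = asymmetric_squared_misfit(predicted, observed, a=2.0, b=1.0)
# (1*1 + 2*1 + 1*0 + 1*4) / 4 = 1.75
```

With a = b = 1 the function reduces to the symmetric squared misfit M1.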

Page 14:

Other metrics can provide additional information beyond the fractional site exceedance M0

Shaking-weighted asymmetric squared misfit

We could use larger weights for areas predicted to be the most hazardous, so the map is judged most on how it does there.

Page 15:

Other metrics can provide additional information beyond the fractional site exceedance M0

Exposure-weighted asymmetric squared misfit

We could use larger weights for areas with the largest exposure of people or property, so the map is judged most on how it does there.
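The slides do not give a formula for the shaking-weighted or exposure-weighted variants. One possible form, offered purely as an assumption, multiplies each site's asymmetric squared misfit by a site weight v_i (predicted shaking or exposure) and normalizes by the sum of the weights. All names and numbers below are hypothetical.

```python
def site_weighted_asymmetric_misfit(predicted, observed, site_weights, a, b):
    """Assumed form: sum_i v_i w_i (x_i - s_i)^2 / sum_i v_i, where v_i is a
    per-site weight (e.g. predicted shaking or exposure) and w_i is a or b
    as in the asymmetric squared misfit."""
    if not (len(predicted) == len(observed) == len(site_weights)) or not predicted:
        raise ValueError("need equally sized, non-empty site lists")
    weight_sum = sum(site_weights)
    if weight_sum <= 0:
        raise ValueError("site weights must sum to a positive value")
    total = 0.0
    for s, x, v in zip(predicted, observed, site_weights):
        w = a if (x - s) > 0 else b
        total += v * w * (x - s) ** 2
    return total / weight_sum


predicted = [5.0, 5.0, 5.0, 5.0]
observed = [4.0, 6.0, 5.0, 3.0]
exposure = [1.0, 3.0, 1.0, 1.0]  # hypothetical: site 2 has the most exposure
weighted = site_weighted_asymmetric_misfit(predicted, observed, exposure, a=2.0, b=1.0)
uniform = site_weighted_asymmetric_misfit(predicted, observed, [1.0] * 4, a=2.0, b=1.0)
```

With uniform site weights this reduces to the unweighted asymmetric misfit; passing the predicted shaking as `site_weights` would give a shaking-weighted version under the same assumption.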

Page 16:

Although no single metric fully characterizes map performance, using several metrics can provide valuable insight for assessing and improving hazard maps

Page 17:

Comparing maps could be done via the skill score

$SS(s,r,x) = 1 - M(s,x) / M(r,x)$

where M is any of the metrics, x is the maximum observed shaking, s is the map prediction, and r is the prediction of a reference map produced using a selected null hypothesis (e.g. uniform hazard).

The skill score would be positive if the map's predictions did better than those of the map made with the null hypothesis, and negative if they did worse.

We could assess how well maps have done after a certain time, and whether successive generations of maps do better.
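A short sketch of the skill score, using the squared misfit as the metric M and a uniform reference map as the null hypothesis. The function names and the site values are hypothetical; any of the other metrics could be passed in instead.

```python
def squared_misfit(predicted, observed):
    """Squared misfit: mean of (x_i - s_i)^2 over the sites."""
    return sum((x - s) ** 2 for s, x in zip(predicted, observed)) / len(predicted)


def skill_score(map_pred, ref_pred, observed, metric=squared_misfit):
    """SS(s, r, x) = 1 - M(s, x) / M(r, x)."""
    ref_value = metric(ref_pred, observed)
    if ref_value == 0:
        raise ValueError("reference map has zero misfit; skill score undefined")
    return 1.0 - metric(map_pred, observed) / ref_value


observed = [4.0, 6.0, 5.0, 3.0]    # maximum observed shaking x
map_pred = [4.0, 6.0, 5.0, 3.5]    # map prediction s
uniform_ref = [5.0, 5.0, 5.0, 5.0] # reference prediction r (uniform hazard)

ss = skill_score(map_pred, uniform_ref, observed)
ss_reversed = skill_score(uniform_ref, map_pred, observed)
```

In this made-up case the map fits the observations better than the uniform reference, so `ss` is positive; swapping the two roles gives a negative score.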

Page 18:

Nekrasova et al., 2014

217 BC – 2002 AD

Page 19:

One possible space-time sampling bias…

The probabilistic map with 2% probability of exceedance in 50 years (i.e. ground shaking with a return period of about 2475 years) significantly overestimates the shaking reported over a comparable time span (about 2200 years).

The deterministic map, which is not associated with a specific time span, also tends to overestimate the ground shaking with respect to past earthquakes.

The historical catalog is thought to be incomplete (Stucchi et al., 2004) and may underestimate the largest shaking because of space-time sampling bias.
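The 2475-year figure follows from the usual conversion between exceedance probability and return period, P = 1 - exp(-t/T), which assumes time-independent (Poisson) occurrence. The sketch below reproduces the arithmetic; the function names are hypothetical.

```python
import math


def return_period(prob_exceedance, window_years):
    """Return period T solving P = 1 - exp(-window / T) (Poisson assumption)."""
    if not 0.0 < prob_exceedance < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return -window_years / math.log(1.0 - prob_exceedance)


def exceedance_probability(return_period_years, window_years):
    """Probability of at least one exceedance within the window."""
    return 1.0 - math.exp(-window_years / return_period_years)


t_2pct = return_period(0.02, 50.0)               # about 2475 years
p_2200 = exceedance_probability(t_2pct, 2200.0)  # chance of exceedance in ~2200 years
```

Under the same assumption, the probability that a given site's mapped value is exceeded within the roughly 2200-year catalog span is `p_2200`, which is the fraction p that would enter the fractional site exceedance metric M0.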

Page 20:

Dependence of seismic hazard estimates on the time span of the input catalog: NDSHA map

Intensity differences between the NDSHA map obtained for the entire catalog (TOTAL) and the maps obtained for the time intervals (500-year catalogs): a) [1000,1500) and b) [1500,2000)

[Figure panels: a) TOTAL – [1000,1500); b) TOTAL – [1500,2000)]

Page 21:

Dependence of seismic hazard estimates on the time span of the input catalog: NDSHA map

Intensity differences between the NDSHA map obtained for the entire catalog (TOTAL) and the maps obtained, considering the seismogenic nodes, for the time intervals: a) [1000,1500) and b) [1500,2000)

[Figure panels: a) TOTAL – [1000,1500); b) TOTAL – [1500,2000)]

Page 22:

Options after an earthquake yields shaking larger than anticipated:

Either regard the high shaking as a low-probability event allowed by the map

Or, as is usually done, accept that the high shaking was not simply a low-probability event and revise the map

Page 23:

No formal or objective criteria are used to decide whether to change the map and how

Done via BOGSAT (“Bunch Of Guys Sitting Around Table”)

Challenge: a new map that better describes the past may or may not better predict the future

Page 24:

Deciding whether to remake a map is like deciding, after a coin has come up heads a number of times, whether to continue assuming that the coin is fair and the run is a low-probability event, or to change to a model in which the coin is assumed to be biased.

Changing the model may describe the future worse.

Page 25:

Bayes’ Rule: how much to change depends on one’s confidence in the prior model

Revised probability model ∝ Likelihood of observations given the prior model × Prior probability model

If you were confident that the coin was fair, you would probably not change your model. If you were given the coin at a magic show, your confidence would be lower and you would be more likely to change your model.
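The coin analogy can be made concrete with a two-hypothesis Bayes update. This is only an illustrative sketch: the biased coin's heads probability (0.9), the run of five heads, and the two prior values are assumptions chosen for the example, not numbers from the talk.

```python
def posterior_biased(prior_biased, n_heads, p_heads_biased=0.9):
    """Posterior probability that the coin is biased after n_heads heads in a row,
    comparing a fair coin (p = 0.5) with a biased coin (p = p_heads_biased)."""
    like_fair = 0.5 ** n_heads
    like_biased = p_heads_biased ** n_heads
    evidence = prior_biased * like_biased + (1.0 - prior_biased) * like_fair
    return prior_biased * like_biased / evidence


# Same observation (five heads in a row), two different levels of prior confidence.
trusted_coin = posterior_biased(prior_biased=0.01, n_heads=5)     # confident the coin is fair
magic_show_coin = posterior_biased(prior_biased=0.5, n_heads=5)   # little confidence
```

The same run of heads shifts the magic-show prior far more than the trusted-coin prior, which is the point of the slide: how much the model changes depends on the confidence placed in the prior.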


Page 26:

Assume Poisson earthquake recurrence with rate λ = 1/T = 1/50 = 0.02 per year

This estimate is assumed (prior) to have mean μ and standard deviation σ

Suppose an earthquake occurs after only 1 year

The updated forecast, described by the posterior mean, increasingly differs from the initial forecast (prior mean) when the uncertainty in the prior distribution is larger. The less confidence we have in the prior model, the more a new datum can change it.
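The slide does not state which prior distribution was used. As a sketch under the assumption of a gamma prior on the Poisson rate λ (the standard conjugate choice), a prior with mean μ and standard deviation σ has shape μ²/σ² and rate μ/σ², and observing n events in t years gives a posterior mean of (shape + n) / (rate + t). The function name and the three σ values are hypothetical.

```python
def posterior_mean_rate(prior_mean, prior_sd, n_events, elapsed_years):
    """Posterior mean of a Poisson rate under an assumed gamma prior
    parameterized by its mean and standard deviation."""
    if prior_mean <= 0 or prior_sd <= 0:
        raise ValueError("prior mean and standard deviation must be positive")
    shape = (prior_mean / prior_sd) ** 2
    rate = prior_mean / prior_sd ** 2
    return (shape + n_events) / (rate + elapsed_years)


# Prior mean 0.02 per year (T = 50 years); one earthquake observed after 1 year.
prior_mean = 0.02
posterior_means = {
    sd: posterior_mean_rate(prior_mean, sd, n_events=1, elapsed_years=1.0)
    for sd in (0.01, 0.02, 0.04)
}
```

In this sketch the posterior mean moves further from 0.02 per year as the prior standard deviation grows, matching the statement that a less certain prior is changed more by a single new datum.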

Page 27:

We need agreed ways of assessing how well hazard maps performed and thus whether one map performed better than another.

This information is crucial to tell how much confidence to have in using them for very expensive policy decisions.

Although no single metric alone fully characterizes map behavior, using several metrics can provide useful insight for comparing and improving maps.

Deciding when and how to revise hazard maps should combine BOGSAT (subjective judgement given limited information) and Bayes (ideas about parameter uncertainty).

Conclusions

Page 28:

Challenge

U.S. meteorologists (Hirschberg et al., 2011) have adopted a goal of “routinely providing the nation with comprehensive, skillful, reliable, sharp, and useful information about the uncertainty of hydrometeorological forecasts.”

Although seismologists have a tougher challenge and a longer way to go, we should try to do the same for earthquake hazards.