Data Mining of Environmental Models for Sensitivity Analysis


Upload: charles-porter

Post on 30-Dec-2015


Page 1: Data Mining of Environmental Models for Sensitivity Analysis

Data Mining of Environmental Models for Sensitivity Analysis

Tom Stockton

Paul Black, Andy Schuh, Kate Catlett, John Tauxe

Neptune and Company, Inc.

www.neptuneandco.com

Knowledge re-Discovery

Page 2: Data Mining of Environmental Models for Sensitivity Analysis

Issue

How to conduct a sensitivity analysis of a complex high dimensional probabilistic environmental model?

Page 3: Data Mining of Environmental Models for Sensitivity Analysis

Decision Modeling

1. Decision model: build and solve
   – Decision actions and outcomes
   – Utility (costs, liabilities, desires)
   – Probabilistic model
     • Scenario
     • Model
     • Parameter
2. Sensitivity analysis (knowledge re-discovery)
3. Value of information analysis (OUT-path)
4. Data collection
5. Update model (Bayesian or ad hoc)

Page 4: Data Mining of Environmental Models for Sensitivity Analysis

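Decision Modeling

$$
U(d \mid I) = \sup_d \int_S \int_M \int_{\theta_M} \int_Y
  U(d \mid y, S, M, \theta_M)\,
  p(S)\, p(M \mid S)\, p(\theta_M \mid M, S)\,
  p(I \mid \theta_M, M, S)\, p(y \mid \theta_M, M, S)\,
  dy\, dS\, dM\, d\theta_M
$$

Terms of the integrand:
  U(d | y, S, M, θ_M)    utility function
  p(S)                   scenario uncertainty
  p(M | S)               model uncertainty
  p(θ_M | M, S)          parameter uncertainty
  p(I | θ_M, M, S)       data likelihood
  p(y | θ_M, M, S)       risk predictive distribution

where:
  U = utility, loss, cost
  d = decision
  I = information/data
  y = risk
  M = model structure
  θ_M = model parameters
  S = scenario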

Page 5: Data Mining of Environmental Models for Sensitivity Analysis

Sensitivity Analysis

Given a model:

Y = f (X) [Y = GoldSim(X)]

Sensitivity analysis is aimed at describing the influence of each input variable Xi on the model response Y

Page 6: Data Mining of Environmental Models for Sensitivity Analysis

Sensitivity Measures

• One-At-A-Time (OAT)
• Differential Analysis: $\partial f(X) / \partial X_i$
• Global
  – Statistical
    • scatter plots, correlation, regression, rank transformations
  – Data mining
    • Sobol, FAST, MARS, MART

Page 7: Data Mining of Environmental Models for Sensitivity Analysis

Desirable Properties of a SA Measure

• Efficiency
  – account for all effects while being computationally affordable
• Simplicity
  – implementable and interpretable
• Model Independent
  – the method can handle non-linearity and non-monotonicity (across time and space)

K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.

Page 8: Data Mining of Environmental Models for Sensitivity Analysis

Sensitivity Measures

• OAT and Differential Analysis, for complex probabilistic models, often are
  – not efficient, and
  – not model independent

Page 9: Data Mining of Environmental Models for Sensitivity Analysis

Global Sensitivity Measures

• Sensitivity Measure

$$ S_i = \frac{\mathrm{Var}_{X_i}\left[\mathrm{E}(Y \mid x_i)\right]}{\mathrm{Var}(Y)} $$

• Build a statistical model of the model response and the model inputs using the Monte Carlo simulation results
• Decompose variance of the output and attribute to input variables
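The variance ratio S_i can be estimated directly from Monte Carlo output. The following is a minimal sketch, not the authors' method: it assumes a hypothetical two-input test model `y = 4*x1 + x2` with independent uniform inputs, and approximates E(Y | x_i) by averaging Y within equal-count bins of x_i. The function name `first_order_index` and the bin count are illustrative choices.

```python
import random
from statistics import fmean, pvariance


def first_order_index(x, y, bins=20):
    """Estimate Var[E(Y | x)] / Var(Y) from paired samples using equal-count bins of x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # sample indices sorted by x
    size = n // bins
    grand_mean = fmean(y)
    between = 0.0
    for b in range(bins):
        # The last bin absorbs any remainder.
        idx = order[b * size:] if b == bins - 1 else order[b * size:(b + 1) * size]
        bin_mean = fmean(y[i] for i in idx)  # approximates E(Y | x in this bin)
        between += len(idx) * (bin_mean - grand_mean) ** 2
    return between / (n * pvariance(y))


# Hypothetical test model (an assumption for illustration): y = 4*x1 + x2.
random.seed(1)
n = 5000
x1 = [random.random() for _ in range(n)]
x2 = [random.random() for _ in range(n)]
y = [4 * a + b for a, b in zip(x1, x2)]

s1 = first_order_index(x1, y)
s2 = first_order_index(x2, y)
print(f"S1 = {s1:.3f} (exact 16/17), S2 = {s2:.3f} (exact 1/17)")
```

For this linear test model the exact indices are 16/17 and 1/17, so the estimates can be checked by eye. The binning estimator carries a small bias that depends on the bin count and the sample size.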

Page 10: Data Mining of Environmental Models for Sensitivity Analysis

Standardized Rank Regression

SRR
– Rank Y and Xi and scale the ranks to mean of 0 and variance of 1 for convenience

Based on the ranks of Y and Xi:

$$ y = \sum_{i=1}^{p} \beta_i x_i $$

Assuming the Xi are independent:

$$ \mathrm{Var}(Y) = \sum_{i=1}^{p} \beta_i^2\, \mathrm{Var}(X_i) $$

so

$$ S_i = \beta_i^2 $$
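The SRR recipe (rank, standardize, regress, square the coefficients) can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: inputs are continuous (no tied ranks), the regression is solved through the normal equations with a small Gaussian elimination helper, and the monotonic test model `y = exp(2*x0) + x1` with an inert third input is hypothetical.

```python
import math
import random


def standardized_ranks(v):
    """Ranks of v scaled to mean 0 and variance 1 (assumes no ties)."""
    n = len(v)
    order = sorted(range(n), key=lambda i: v[i])
    ranks = [0.0] * n
    for r, i in enumerate(order):
        ranks[i] = r + 1.0
    mean = (n + 1) / 2
    sd = math.sqrt((n * n - 1) / 12)  # population SD of the integers 1..n
    return [(r - mean) / sd for r in ranks]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def srr(X, y):
    """Squared standardized rank regression coefficients, one per input column."""
    cols = [standardized_ranks(col) for col in zip(*X)]
    ry = standardized_ranks(y)
    p = len(cols)
    gram = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(p)] for i in range(p)]
    rhs = [sum(a * b for a, b in zip(cols[i], ry)) for i in range(p)]
    beta = solve(gram, rhs)
    return [b * b for b in beta]


# Hypothetical monotonic test model; the third input has no effect.
random.seed(2)
X = [[random.random() for _ in range(3)] for _ in range(1000)]
y = [math.exp(2 * row[0]) + row[1] for row in X]
s = srr(X, y)
print([round(v, 4) for v in s])
```

Because the test model is monotonic in each input, the rank transformation handles its nonlinearity. A non-monotonic response would not be captured this way.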

Page 11: Data Mining of Environmental Models for Sensitivity Analysis

Fourier Amplitude Sensitivity Test

FAST
– Explores the multidimensional space of the input factors along a search curve, using a Fourier transform.
– Handles main and interaction effects

K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.

Page 12: Data Mining of Environmental Models for Sensitivity Analysis

Issues

• Differential Analysis
  – not feasible: derivatives of complex models
• SRR and OAT
  – not model independent: trouble with nonmonotonic nonlinear models
  – not efficient: trouble with interaction effects in high dimensional models
• FAST
  – not efficient: separate model runs

Page 13: Data Mining of Environmental Models for Sensitivity Analysis

Possible Solutions

• Data mine the probabilistic model output
  – Multivariate Adaptive Regression Splines (MARS)
  – Multiple Additive Regression Trees (MART)

Page 14: Data Mining of Environmental Models for Sensitivity Analysis

Data Mining

• MARS
  – Non-parametric recursive partitioning approach that fits separate splines to distinct intervals of the predictor variables.
• MART
  – Explores the multidimensional space of the input factors using gradient boosting of additive regression models.
• Advantages
  – Search for interactions between variables, allowing any degree of interaction to be considered.
  – Tracks very complex data structures in high-dimensional data.
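Gradient boosting of regression trees can be illustrated with the simplest possible tree, a single-split stump. The sketch below is a simplified stand-in for MART, not the authors' implementation: stumps cannot represent interactions (MART uses larger trees for that), the relative influence is taken as the accumulated squared-error reduction per input scaled to sum to one, and the non-monotonic test model `y = 10*(x0 - 0.5)**2 + x1` with an inert third input is hypothetical.

```python
import random


def best_stump(X, r):
    """Best single split of residuals r: (gain, feature, threshold, left_mean, right_mean)."""
    n, total = len(r), sum(r)
    best = (0.0, None, None, 0.0, 0.0)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue
            right_sum = total - left_sum
            # Reduction in squared error relative to predicting the overall mean.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if gain > best[0]:
                best = (gain, j, 0.5 * (lo + hi), left_sum / k, right_sum / (n - k))
    return best


def boost(X, y, rounds=100, rate=0.1):
    """Boost stumps on squared loss; return (relative influence, fitted values)."""
    n, p = len(y), len(X[0])
    pred = [sum(y) / n] * n
    influence = [0.0] * p
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        gain, j, thr, left_mean, right_mean = best_stump(X, resid)
        if j is None:
            break
        influence[j] += gain
        for i in range(n):
            pred[i] += rate * (left_mean if X[i][j] <= thr else right_mean)
    total = sum(influence)
    return [v / total for v in influence], pred


# Hypothetical test model: non-monotonic in x0, linear in x1, x2 inert.
random.seed(4)
X = [[random.random() for _ in range(3)] for _ in range(400)]
y = [10 * (row[0] - 0.5) ** 2 + row[1] for row in X]
influence, pred = boost(X, y)
print([round(v, 3) for v in influence])
```

The test model is deliberately non-monotonic in its dominant input, the case a rank regression struggles with, while the boosted stumps still attribute most of the variation to `x0`.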

Page 15: Data Mining of Environmental Models for Sensitivity Analysis

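Sensitivity Indices via ANOVA decomposition

$$ \hat f(\mathbf{X}) = a_0 + \sum_{K_m = 1} f_i(x_i) + \sum_{K_m = 2} f_{ij}(x_i, x_j) + \sum_{K_m = 3} f_{ijk}(x_i, x_j, x_k) + \cdots $$

$$ \hat f_{-s}(\mathbf{X}) = a_0 + \sum_{K_m = 1,\ i \ne s} f_i(x_i) + \sum_{K_m = 2,\ s \notin \{i,j\}} f_{ij}(x_i, x_j) + \sum_{K_m = 3,\ s \notin \{i,j,k\}} f_{ijk}(x_i, x_j, x_k) + \cdots $$

$$ SPR_s = \sum \left( y - \hat f_{-s}(\mathbf{X}) \right)^2, \qquad S_s = \frac{SPR_s}{\sum_x SPR_x} $$

Sensitivity indices are calculated using basis functions not including x_s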

Page 16: Data Mining of Environmental Models for Sensitivity Analysis

Analytical Example

Sobol’ g-function

$$ y = \prod_{i=1}^{p} g_i(x_i) $$

$$ g_i(x_i) = \frac{|4 x_i - 2| + a_i}{1 + a_i} $$

$$ x_i = \frac{1}{2} + \frac{1}{\pi} \arcsin\left(\sin(\omega_i s)\right) $$

Saltelli A., Tarantola S., and Chan K.P.-S. (1999), “A Quantitative Model-Independent Method for Global Sensitivity Analysis of Model Output,” Technometrics, 41, 39-55.
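The g-function is a common benchmark because its variance decomposition has a well-known closed form for independent U(0, 1) inputs: each factor has mean 1 and variance V_i = 1 / (3 (1 + a_i)^2). The sketch below evaluates the function and these closed-form first-order and total-effect indices for the a values used in the example table; the helper names are illustrative.

```python
import math


def g_function(x, a):
    """Sobol' g-function: product of (|4*x_i - 2| + a_i) / (1 + a_i)."""
    return math.prod((abs(4 * xi - 2) + ai) / (1 + ai) for xi, ai in zip(x, a))


def analytic_indices(a):
    """Closed-form first-order and total-effect indices for independent U(0, 1) inputs."""
    v = [1.0 / (3.0 * (1.0 + ai) ** 2) for ai in a]  # partial variances V_i
    prod_all = math.prod(1.0 + vi for vi in v)
    total_var = prod_all - 1.0                       # Var(y)
    first = [vi / total_var for vi in v]
    total = [vi * prod_all / (1.0 + vi) / total_var for vi in v]
    return first, total


a = [0, 1, 4.5, 9, 99, 99, 99, 99]
first, total = analytic_indices(a)
scaled_total = [t / sum(total) for t in total]  # total effects rescaled to sum to one

for i, (f, t, st) in enumerate(zip(first, total, scaled_total), start=1):
    print(f"x{i}: first-order {f:.4f}  total {t:.4f}  scaled total {st:.4f}")
```

The slides do not say which index the "Analytic" column of the example table reports. The rescaled total-effect values from this sketch come out at roughly 0.73, 0.23 and 0.03 for x1 to x3, which is close to that column, but that reading is an inference rather than something the source states.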

Page 17: Data Mining of Environmental Models for Sensitivity Analysis

Example: Sobol’ g-function

                      Sensitivities
Input   a     ω      Analytic   MART    MARS     FAST     SRR
x1      0     23     0.73       0.565   0.733    0.773    0.0005
x2      1     55     0.23       0.281   0.224    0.193    0.0015
x3      4.5   77     0.032      0.094   0.036    0.025    0.045
x4      9     97     0.009      0.05    0.009    0.008    0.197
x5      99    107    0.0001     0.005   0.0006   0.0002   0.207
x6      99    113    0.0001     0.004   0.0000   0.0005   0.437
x7      99    121    0.0001     0.0     0.0000   0.0001   0.007
x8      99    125    0.0001     0.0     0.0000   0.0002   0.105

Saltelli A., Tarantola S., and Chan K.P.-S. (1999), “A Quantitative Model-Independent Method for Global Sensitivity Analysis of Model Output,” Technometrics, 41, 39-55.

Page 18: Data Mining of Environmental Models for Sensitivity Analysis

[Figure: decision-analysis flowchart with numbered sequence steps 1 to 8 and an iteration loop. Main elements: Contamination, Risk, Cost, Management, Uncertainty analysis, Sensitivity analysis, Value of Information. Box labels include Existing Inventory, Future Inventory, Fate & Transport, Ecosystem, MOP & IHI, Occupational, Cumulative (CA), Regulations & Guidance, Disposal Fees, Disposal Costs, Closure Costs, Monitoring Costs, ALARA Costs, Analysis Costs, Potential Liabilities, Public Benefit, Budgets, Cost-Benefit Analysis, and Management Options (Institutional Controls, Site Maintenance, Waste Acceptance, Closure, Monitoring/Surveillance). Decision node with YES and NO branches: "Can the risk be managed to regulatory thresholds at an acceptable cost with an acceptable level of uncertainty?" Outcome boxes: "Choose Management Options & Update Management Plan" (Maintenance Review, Periodic Review, Waste Acceptance Decision, Closure Decision) and "Research, Monitoring, Information & Data Collection".]

Page 19: Data Mining of Environmental Models for Sensitivity Analysis

Simulation Results

• Model Inputs ( X )
  – Inventory
  – Fate and transport
    • Upward advection
    • Biotic transport
• Model response ( Y )
  – “EPA-SUM”

Page 20: Data Mining of Environmental Models for Sensitivity Analysis

Model Response

[Figure: probability plot of the model response. Axes: EPA Sum (log scale, tick labels from 1.0e-030 to 1.0e+001) and Probability (0.0001 to 1.0000).]

Page 21: Data Mining of Environmental Models for Sensitivity Analysis

Relative Influence Plot

[Figure: relative influence (0 to 1) of each input, MART versus SRR. Inputs: Ant2 MaxDepth, Ant2 NestWidth, Dry Bulk Density, Kd Np, Kd U, Solubility U, Termite1 b, Upward Flux Rate.]

Page 22: Data Mining of Environmental Models for Sensitivity Analysis

Partial Dependence Plots

[Figure: partial dependence panels, one per input: Upward Flux Rate, Kd Np, Termite1 b, Ant2 NestWidth, Dry Bulk Density, Kd U, Solubility U, Ant2 MaxDepth. Vertical axis: partial dependence. Key: MART, SRR, Density.]

Page 23: Data Mining of Environmental Models for Sensitivity Analysis

Co-partial Dependence Plot

Page 24: Data Mining of Environmental Models for Sensitivity Analysis

Variation Explained

         Time      SRR     MART/MARS
GCD      10,000    0.91    0.99
LANL     50        0.87    0.94
         100       0.86    0.96
         500       0.75    0.91
         1,000     0.71    0.95
         10,000    0.71    0.93

Page 25: Data Mining of Environmental Models for Sensitivity Analysis

Sensitivity Convergence

[Figure: measure of relative sensitivity versus simulation size (100, 500, 1000, 2500 and 5000 simulations) for MART and SRR, one panel per input: Upward.Flux.Rate, Kd.def.Kd.Np, Ant2.Data.NestWidth, Dry.Bulk.Density, Kd.def.Kd.U, Termite1.Data.b.]

Page 26: Data Mining of Environmental Models for Sensitivity Analysis

Upward Flux OAT

[Figure: one-at-a-time response of EPA Sum (log scale, 1e-18 to 1e-10) to upward flux rate (0.00005 to 0.00035).]

Page 27: Data Mining of Environmental Models for Sensitivity Analysis

Summary

• MART and MARS appear to provide an
  – Efficient
  – Simple (?)
  – Model Independent
  approach to data mining probabilistic model results for sensitivity analysis

Page 28: Data Mining of Environmental Models for Sensitivity Analysis

Finally…

• The decision context:
  – Is the uncertainty in the model response too high?
  – Is there value in reducing input uncertainty?
  – SA and cost used to estimate the value of collecting additional information.

Page 29: Data Mining of Environmental Models for Sensitivity Analysis

FAST

$$ y = \sum_j \{ A_j \cos js + B_j \sin js \} \qquad (1) $$

where A_j and B_j are the Fourier coefficients and can be estimated via a fast Fourier transform algorithm.

The spectrum of the Fourier transform is

$$ \Lambda_j = A_j^2 + B_j^2 \qquad (2) $$

Summing all Λ_j provides an estimate of the total variance in y

$$ \hat D = \sum_{j \in Z} \Lambda_j \qquad (3) $$

Summing Λ_j over Z^0, the frequency embedded in x_i and its associated higher harmonics, provides an estimate of the variance due to the uncertainty in x_i

$$ \hat D_i = \sum_{j \in Z^0} \Lambda_j \qquad (4) $$

The sensitivity of y to x_i is then given by

$$ \hat S_i = \hat D_i / \hat D \qquad (5) $$

Page 30: Data Mining of Environmental Models for Sensitivity Analysis

MARS

• Non-parametric recursive partitioning approach that fits separate splines to distinct intervals of the predictor variables.
• Both the selected variables and the knot locations are found by an exhaustive (brute-force) search, optimized simultaneously by evaluating a "lack of fit" criterion.
• Searches for interactions between variables, allowing any degree of interaction to be considered.
• Tracks very complex data structures in high-dimensional data.

J.H. Friedman, (1991), “Multivariate Adaptive Regression Splines,” The Annals of Statistics, 19, 1-14

Software:
Trevor Hastie and Robert Tibshirani, MDA Library for R (‘GNU S’).
Ross Ihaka and Robert Gentleman, (1996) R: A Language for Data Analysis and Graphics, Journal of Computational and Graphical Statistics, 5, 3, 299-314. www.r-project.org.
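The exhaustive knot search can be illustrated on a single predictor. The sketch below is a deliberately reduced stand-in for MARS, not the algorithm in full: it fits one hinge basis function max(0, x - t), tries every observed x value as the knot t, and uses the residual sum of squares as the lack-of-fit criterion. The function names and the test data (a true knot at 0.6 and slope 5) are hypothetical.

```python
import random


def fit_hinge(x, y, knot):
    """Least-squares fit of y ~ a + b * max(0, x - knot); returns (rss, a, b)."""
    h = [max(0.0, xi - knot) for xi in x]
    n = len(x)
    mean_h, mean_y = sum(h) / n, sum(y) / n
    shh = sum((hi - mean_h) ** 2 for hi in h)
    if shh == 0.0:
        # Hinge is constant for this knot: fall back to the mean of y.
        return sum((yi - mean_y) ** 2 for yi in y), mean_y, 0.0
    b = sum((hi - mean_h) * (yi - mean_y) for hi, yi in zip(h, y)) / shh
    a = mean_y - b * mean_h
    rss = sum((yi - a - b * hi) ** 2 for hi, yi in zip(h, y))
    return rss, a, b


def best_knot(x, y):
    """Try every observed x as a knot; return (rss, a, b, knot) with the lowest rss."""
    candidates = sorted(set(x))
    return min((fit_hinge(x, y, t) + (t,) for t in candidates), key=lambda r: r[0])


# Hypothetical data: flat at 2.0 up to x = 0.6, then rising with slope 5.
random.seed(3)
x = [random.random() for _ in range(200)]
y = [2.0 + 5.0 * max(0.0, xi - 0.6) + random.gauss(0.0, 0.02) for xi in x]

rss, a, b, knot = best_knot(x, y)
print(f"knot = {knot:.3f}, intercept = {a:.3f}, slope = {b:.3f}")
```

MARS itself searches over variables and knots together, adds mirrored hinge pairs and products of hinges for interactions, and then prunes terms. This sketch covers only the innermost knot search.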

Page 31: Data Mining of Environmental Models for Sensitivity Analysis

MART

• Multiple Additive Regression Trees
  – Explores the multidimensional space of the input factors using gradient boosting of additive regression models.
  – Handles main and interaction effects.
  – Fast

K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.