
Department of Civil and Environmental Engineering

Stanford University

Report No.

The John A. Blume Earthquake Engineering Center was established to promote research and education in earthquake engineering. Through its activities our understanding of earthquakes and their effects on mankind’s facilities and structures is improving. The Center conducts research, provides instruction, publishes reports and articles, conducts seminars and conferences, and provides financial support for students. The Center is named for Dr. John A. Blume, a well-known consulting engineer and Stanford alumnus.

Address:
The John A. Blume Earthquake Engineering Center
Department of Civil and Environmental Engineering
Stanford University
Stanford CA 94305-4020
(650) 723-4150
(650) 725-9755 (fax)
[email protected]
http://blume.stanford.edu

©2010 The John A. Blume Earthquake Engineering Center

© Copyright by Nirmal Jayaram 2010

All Rights Reserved


Abstract

Lifelines are large, geographically-distributed systems that are essential support systems for any society. Probabilistic seismic risk assessment for lifelines is less straightforward than for individual structures, due to challenges in quantifying the ground-motion hazard over a region rather than at just a single site and in developing a risk assessment framework that deals with the heavy computational burden associated with lifeline performance evaluations.

Quantification of the regional ground-motion hazard requires information on the joint distribution of ground-motion intensities at multiple sites. Statistical tests are used here to examine the commonly-used assumptions of univariate normality of logarithmic intensities and multivariate normality of spatially-distributed logarithmic intensities. Further, observed and simulated ground-motion time histories are used to estimate the spatial correlation between intra-event residuals, which can be used to parameterize the joint distribution of the ground-motion intensities. Factors that affect the decay of the correlation with increasing separation distance are identified.
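
The idea of parameterizing a joint distribution with a spatial correlation can be illustrated with a minimal sketch. Everything specific in it is an assumption made for the example rather than a value taken from this report: the exponential correlation form exp(-3h/range), the 20 km range, the three site coordinates, and the function names. It draws normalized residuals from a multivariate normal distribution whose correlation decays with separation distance, then checks the sample correlations.

```python
import numpy as np

def exponential_correlation(coords_km, range_km):
    """Correlation matrix rho(h) = exp(-3 h / range) for site coordinates in km.
    The functional form and the range are assumptions for illustration."""
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise separation distances
    return np.exp(-3.0 * h / range_km)

def simulate_residuals(coords_km, range_km, n_maps, seed=0):
    """Draw n_maps vectors of zero-mean, unit-variance correlated residuals."""
    rng = np.random.default_rng(seed)
    corr = exponential_correlation(coords_km, range_km)
    # Small jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(len(coords_km)))
    z = rng.standard_normal((n_maps, len(coords_km)))
    return z @ chol.T  # each row has covariance chol @ chol.T = corr

# Hypothetical sites: two 5 km apart, a third 60 km away.
coords = np.array([[0.0, 0.0], [5.0, 0.0], [60.0, 0.0]])
eps = simulate_residuals(coords, range_km=20.0, n_maps=20000)
sample_corr = np.corrcoef(eps, rowvar=False)
```

Under these assumptions the nearby pair of sites ends up clearly correlated while the distant site is nearly independent of both, which is the qualitative behavior a distance-dependent correlation model is meant to capture.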

The study then develops a computationally-efficient lifeline risk assessment framework based on efficient sampling and data reduction techniques. The framework can be used for developing a small, but stochastically representative, catalog of spatially-correlated ground-motion intensity maps that can be used for performing lifeline risk assessments. The catalog is used to evaluate the exceedance rates of various travel-time delays on an aggregated (higher-scale) model of the San Francisco Bay Area transportation network. The risk estimates obtained are consistent with those obtained using conventional Monte Carlo simulation (MCS) that requires three orders of magnitude more ground-motion intensity maps. Therefore, the proposed technique can be used to drastically reduce the computational expense of an MCS-based risk assessment, without compromising the accuracy of the risk estimates. Further, the catalog of ground-motion intensity maps is used in conjunction with a statistical learning technique termed Multiple Additive Regression Trees (MART) in order to obtain an approximate relationship between the ground-motion intensities at lifeline component locations and the lifeline performance. The lifeline performance predicted by this relationship can be used to advantage in place of the actual lifeline performance in problems whose computational demand stems from the need for repeated lifeline performance evaluations.
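
Why a carefully sampled catalog can stand in for a much larger Monte Carlo sample can be seen in a toy problem. The sketch below is not the framework of this report: it assumes a one-dimensional standard normal variable, a hypothetical threshold of 4, and a mean-shifted sampling density, and it uses importance sampling (one of the sampling techniques listed in the contents) to estimate a small exceedance probability from a few thousand draws.

```python
import math
import numpy as np

def tail_prob_importance_sampling(threshold, shift, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by sampling from N(shift, 1).
    Toy illustration; the shift and sample size are arbitrary assumptions."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=shift, scale=1.0, size=n_samples)
    # Importance weight = target density / sampling density, in log form.
    log_w = -0.5 * x ** 2 + 0.5 * (x - shift) ** 2
    w = np.exp(log_w)
    # Weighted fraction of samples that exceed the threshold.
    return float(np.mean(w * (x > threshold)))

# Exact tail probability of a standard normal, for comparison.
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
estimate = tail_prob_importance_sampling(threshold=4.0, shift=4.0, n_samples=5000)
```

With unweighted Monte Carlo sampling, 5000 draws would almost never produce a single exceedance of this threshold, whereas the weighted estimate lands close to the exact value. The weights are what keep the estimate unbiased despite the shifted sampling density.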

Even though the above-mentioned risk assessment framework facilitates the consideration of spatial correlation between ground-motion intensities, current ground-motion models (e.g., NGA ground-motion models) that are used to predict the distribution of ground-motion intensities at individual sites are fitted assuming independence between the intra-event residuals. This study proposes a method to consider the spatial correlation in the mixed-effects regression procedure used for fitting ground-motion models, and empirically shows that the risk estimates of spatially-distributed systems can be inaccurate when using ground-motion models fitted without the consideration of spatial correlation.

Finally, the study also investigates the extension of the seismic hazard and risk assessment concepts discussed earlier to hurricane hazard and risk modeling. The focus is on quantifying the uncertainties and the spatial correlation in hurricane wind fields (using the same techniques that are used to quantify these parameters in earthquake ground-motion fields), and evaluating their impact on the hurricane risk of spatially-distributed systems. The results show that the uncertainties and the spatial correlation in the wind fields must be modeled in order to avoid introducing errors into the risk calculations of spatially-distributed systems. The results also show that the tools developed in this thesis for seismic risk assessment can also be applied to risk assessments that consider other hazards.


Acknowledgments

This work was supported by the Stanford Graduate Fellowship and the U.S. Geological Survey (USGS) via External Research Program awards 07HQGR0031 and 07HQGR0032. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the USGS.

The report was originally published as the Ph.D. dissertation of the first author. The authors would like to thank Professors Anne Kiremidjian, Sarah Billington, Kincho Law, Eric Dunham, Jerome Friedman and Dr. Paolo Bazzurro for providing constructive feedback on this work.


Contents

Abstract

Acknowledgments

1 Introduction
1.1 Motivation
1.2 Areas of contribution
1.2.1 Multi-site hazard modeling
1.2.2 Lifeline risk assessment
1.3 Organization

2 Statistical Tests of the Joint Distribution of Spectral Acceleration Values
2.1 Abstract
2.2 Introduction
2.3 Testing the univariate normality of residuals
2.3.1 Results and discussion
2.4 Testing the assumption of multivariate normality for random vectors using independent samples
2.4.1 Henze-Zirkler test
2.4.2 Mardia’s measures of kurtosis and skewness
2.5 Testing the assumption of multivariate normality for spatially distributed data
2.5.1 Check for bivariate normality
2.6 Conclusions
2.7 Data source
2.8 Appendix: Normal score transform

3 Correlation model for spatially distributed ground-motion intensities
3.1 Abstract
3.2 Introduction
3.3 Modeling correlations using semivariograms
3.4 Computation of semivariogram ranges for intra-event residuals using empirical data
3.4.1 Construction of experimental semivariograms using empirical data
3.4.2 1994 Northridge earthquake recordings
3.4.3 1999 Chi-Chi earthquake
3.4.4 Other earthquakes
3.4.5 A predictive model for spatial correlations
3.5 Isotropy of semivariograms
3.5.1 Isotropy of intra-event residuals
3.5.2 Construction of a directional semivariogram
3.5.3 Test for anisotropy using Northridge ground motion data
3.6 Comparison with previous research
3.7 Conclusions

4 Spatial correlation between spectral accelerations using simulated ground-motion time histories
4.1 Abstract
4.2 Introduction
4.3 Statistical estimation of spatial correlation
4.4 Results and discussion
4.4.1 Effect of ground-motion component orientation on the semivariogram range
4.4.2 Testing the assumption of isotropy using directional semivariograms
4.4.3 Testing the assumption of second-order stationarity
4.4.4 Effect of directivity on spatial correlation
4.5 Conclusions

5 Simulation of spatially-correlated ground-motion intensities with and without consideration of recorded intensity values
5.1 Abstract
5.2 Introduction
5.3 Simulation of correlated residuals without consideration of recorded ground motion intensities
5.3.1 Single-step simulation technique
5.3.2 Sequential simulation technique
5.4 Importance sampling of normalized intra-event residuals
5.5 Sequential simulation of correlated residuals with consideration of recorded ground motion intensities
5.6 Conclusions
5.7 Appendix: The conditional sequential simulation of heteroscedastic normalized residuals

6 Efficient sampling and data reduction techniques for probabilistic seismic lifeline risk assessment
6.1 Abstract
6.2 Introduction
6.3 Simulation of ground-motion intensity maps using importance sampling
6.3.1 Importance sampling procedure
6.3.2 Simulation of earthquake catalogs
6.3.3 Simulation of normalized intra-event residuals
6.3.4 Simulation of normalized inter-event residuals
6.4 Lifeline risk assessment
6.4.1 Risk assessment based on realizations from Monte Carlo simulation
6.4.2 Risk assessment based on realizations from importance sampling
6.5 Data reduction using K-means clustering
6.6 Application: Seismic risk assessment of the San Francisco Bay Area transportation network
6.6.1 Network data
6.6.2 Transportation network loss measure
6.6.3 Ground-motion hazard
6.6.5 Importance of modeling ground-motion uncertainties and spatial correlations
6.7 Conclusions
6.8 Appendix: Proof that the exceedance rates obtained using IS and K-means clustering are unbiased
6.9 Appendix: Improving the computational efficiency of the K-means clustering method

7 Lifeline performance assessment using statistical learning techniques
7.1 Abstract
7.2 Introduction
7.3 Brief introduction to ground-motion map sampling
7.3.1 Conventional MCS of ground-motion maps
7.3.2 Importance sampling of ground-motion maps
7.3.3 K-means clustering
7.4 Confidence intervals for lifeline risk estimates
7.4.1 Network data
7.4.2 Ground-motion hazard data
7.4.3 Statistical description of the problem
7.4.4 Confidence intervals using bootstrap
7.4.5 Approximate loss estimation using non-parametric regression
7.4.6 Bootstrap confidence intervals estimated using the exact and the approximate loss functions
7.5 Conclusions

8 Seismic risk assessment of spatially distributed systems using ground motion models fitted considering spatial correlation
8.1 Abstract
8.2 Introduction
8.2.1 Current regression algorithm
8.2.2 Should spatial correlation be considered in the regression algorithm?
8.3 Regression algorithm for mixed-effects models considering spatial correlation
8.3.1 Covariance matrix for the total residuals
8.3.2 Obtaining inter-event residuals from total residuals
8.3.3 Algorithm summary
8.3.4 Large sample standard errors of σ and τ
8.3.5 Mixed-effects regression procedure in R
8.4 Results and discussion
8.4.1 Standard deviation of residuals as a function of period
8.4.2 Estimates of spatial correlation
8.4.3 Risk assessment for a hypothetical portfolio of buildings
8.5 Conclusions

9 Hurricane risk assessment of spatially-distributed systems with consideration of wind-field uncertainties and spatial correlation
9.1 Abstract
9.2 Introduction
9.3 Spatial correlation estimation methodology
9.4 Results and Discussion
9.4.1 Data source
9.4.2 Hurricane Jeanne (2004)
9.4.3 Hurricane Frances (2004)
9.4.4 Hurricane risk assessment of a hypothetical portfolio of buildings
9.5 Limitations and research needs
9.6 Conclusions

10 Conclusions
10.1 Contributions and practical implications
10.1.1 Joint distribution of spectral acceleration values at different sites and/or different periods
10.1.2 Spatial correlation model for spectral accelerations
10.1.3 Lifeline seismic risk assessment using efficient sampling and data reduction techniques
10.1.4 Lifeline performance assessment using statistical learning techniques
10.1.5 Seismic risk assessment of spatially-distributed systems using ground-motion models fitted considering spatial correlation
10.1.6 Extension of proposed ground-motion modeling approaches to hurricane risk assessment
10.2 Limitations and future work
10.2.1 Spatial correlation model for spectral accelerations
10.2.2 Lifeline risk assessment
10.2.3 Risk management
10.2.4 Multi-hazard risk assessment
10.3 Concluding remarks

A Characterizing spatial cross-correlation between ground-motion spectral accelerations at multiple periods
A.1 Abstract
A.2 Introduction
A.3 Statistical Estimation of Spatial Cross-Correlation
A.4 Sample Results and Discussion
A.5 Conclusions

B Supporting details for the spatial correlation model developed in Chapter 3
B.1 Semivariograms of residuals estimated using the Northridge earthquake ground motions
B.2 Semivariograms of residuals estimated using Chi-Chi earthquake ground motions
B.2.1 Exact versus approximate semivariogram fit
B.2.2 Semivariograms of the residuals at seven periods ranging between 0 and 10s
B.3 Semivariograms of residuals estimated using broadband simulations for scenario earthquakes on the Puente Hills thrust fault system
B.4 Clustering of Vs30’s
B.5 Correlation between near-fault ground-motion intensities
B.6 Directional semivariograms estimated using the Northridge and the Chi-Chi earthquake records at various periods

C Deaggregation of lifeline risk: Insights for choosing deterministic scenario earthquakes
C.1 Abstract
C.2 Introduction
C.3 Deaggregation of seismic loss
C.4 Loss assessment for the San Francisco Bay Area transportation network
C.5 Results and Discussion
C.5.1 Contribution of magnitudes and faults to the lifeline losses
C.5.2 Contribution of inter- and intra-event residuals to the lifeline loss
C.6 Transportation network performance under sample scenario ground-motion maps
C.7 Conclusions

Bibliography

List of Tables

2.1 Tests on normalized intra-event residuals computed at different periods
2.2 Tests on inter-event residuals computed at different periods
2.3 Tests on residuals corresponding to two orthogonal directions (fault-normal and fault-parallel directions)
8.1 Regression coefficients for estimating median Sa(1s)
8.2 Standard deviations of residuals corresponding to Sa(1s)

List of Figures

1.1 Comparison of the risk assessment frameworks for (a) single structures and (b) lifelines.
1.2 1999 Chi-Chi earthquake: (a) recorded PGAs (b) median PGAs predicted by the Boore and Atkinson [2008] ground-motion model (c) normalized total residuals computed using Equation 1.2.
1.3 2004 Hurricane Jeanne (the line indicates the hurricane track): (a) recorded wind speeds (b) wind speeds predicted by the Batts et al. [1980] wind-speed model (c) wind-speed residuals.
1.4 Ground-motion intensity simulation for a magnitude 8 earthquake on the San Andreas fault: (a) median intensities obtained using the Boore and Atkinson [2008] ground-motion model (b) simulated values of the normalized total residuals (c) total intensities.

2.1 The normal Q-Q plots of the normalized intra-event residuals at four different periods. (a) T = 0.5 seconds (1560 samples) (b) T = 1.0 seconds (1548 samples) (c) T = 2.0 seconds (1498 samples) (d) T = 10.0 seconds (507 samples).
2.2 The histogram of the 12,194 pooled normalized intra-event residuals computed at 10 periods, with the theoretical standard normal distribution superimposed.
2.3 The normal Q-Q plot of the pooled set of normalized intra-event residuals.
2.4 The normal Q-Q plots of inter-event residuals at four different periods. (a) T = 0.5 seconds (64 samples) (b) T = 1.0 seconds (64 samples) (c) T = 2.0 seconds (62 samples) (d) T = 10.0 seconds (21 samples).
2.5 Theoretical and empirical semivariograms for residuals computed at 2 seconds: (a) results for the 0.1 quantile of the residuals from the Chi-Chi data (b) results for the 0.25 quantile of the residuals from the Chi-Chi data (c) results for the 0.5 quantile of the residuals from the Chi-Chi data (d) results for the 0.25 quantile of the residuals from the Northridge data.
2.6 Theoretical and empirical semivariograms for the 0.25 quantile of the residuals: (a) results for the residuals computed at 0.5s from the Northridge data (b) results for the residuals computed at 0.5s from the Chi-Chi data (c) results for the residuals computed at 1s from the Chi-Chi earthquake data (d) results for the residuals computed at 5s from the Chi-Chi data.

3.1 (a) Parameters of a semivariogram (b) Semivariograms fitted to the same data set using the manual approach and the method of least squares.
3.2 Range of semivariograms of ε, as a function of the period at which ε values are computed: (a) the residuals are obtained using the Northridge earthquake data (b) the residuals are obtained using the Chi-Chi earthquake data.
3.3 (a) Experimental semivariogram obtained using normalized Vs30’s at the recording stations of the Northridge earthquake. No semivariogram is fitted on account of the extreme scatter (b) Experimental semivariogram obtained using normalized Vs30’s at the recording stations of the Chi-Chi earthquake. The range of the fitted exponential semivariogram equals 25 km.
3.4 Range of semivariograms of ε, as a function of the period at which ε values are computed. The residuals are obtained using the: (a) Big Bear City earthquake data (b) Parkfield earthquake data; (c) Alum Rock earthquake data; (d) Anza earthquake data; (e) Chino Hills earthquake data.
3.5 Ranges of residuals computed using PGAs versus ranges of normalized Vs30 values.
3.6 (a) Range of semivariograms of ε, as a function of the period at which ε values are computed. The residuals are obtained from six different sets of time histories as shown in the figure; (b) Range of semivariograms of ε predicted by the proposed model as a function of the period.
3.7 (a) Parameters of a directional semivariogram. Subfigures (b), (c) and (d) show experimental directional semivariograms at discrete separations obtained using the Northridge earthquake ε values computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (b) azimuth = 0° (c) azimuth = 45° (d) azimuth = 90°.
3.8 Semivariogram obtained using residuals computed based on Chi-Chi earthquake peak ground velocities: (a) residuals from Annaka et al. [1997] and semivariogram model from Wang and Takada [2005] (b) residuals from Annaka et al. [1997] and semivariogram fitted to model the discrete values well at short separation distances (c) residuals from Annaka et al. [1997], considering random amplification factors.

4.1 Semivariogram computed using the Sa(T=2s) residuals.
4.2 Ranges of semivariograms obtained using residuals computed from the (a) 1989 Loma Prieta simulations (b) recorded ground motions [Jayaram and Baker, 2009a].
4.3 (a) Ranges are computed using residuals at different orientations (b) Omni-directional (i.e., obtained using all pairs of points, irrespective of the azimuth) and directional semivariograms computed using residuals for Sa(2s).
4.4 (a) Ranges are computed using residuals from different spatial domains (b) Ranges are computed using pulse-like and non-pulse-like near fault ground motions.

5.1 Ground-motion intensities map simulation: (a) median intensities (b) spatially correlated normalized total residuals and (c) total intensities.
5.2 Illustration of the sequential step procedure.
5.3 The alternate sampling distribution (marginal distribution) used for the importance sampling of residuals [Jayaram and Baker, 2010].

6.1 Importance sampling density functions for: (a) magnitude and (b) normalized intra-event residual; (c) recommended mean-shift as a function of the average number of sites and the average site-to-site distance normalized by the range of the spatial correlation model.
6.2 (a) San Francisco Bay Area transportation network (b) Aggregated network.
6.3 (a) Travel-time delay exceedance curves (b) Coefficient of variation of the annual exceedance rate (c) Comparison of the efficiency of MCS, IS and the combination of K-means and IS (d) Travel-time delay exceedance curve obtained using the K-means method.
6.4 (a) Mean of travel-time delays within a cluster (b) Standard deviation of travel-time delays within a cluster. With both clustering methods, cluster numbers are assigned in order of increasing mean travel-time delay within the cluster for plotting purposes.
6.5 Comparison of site hazard curves obtained at two sample sites using the sampling framework with that obtained using numerical integration. (a) Sample site 1 and (b) Sample site 2.
6.6 Exceedance curves obtained using simplifying assumptions.
6.7 Travel-time delay exceedance curve obtained using the two-step clustering technique.

7.1 Sample ground-motion map corresponding to an earthquake on the San Andreas fault. A map is a collection of ground movement levels (ground-motion intensities) at all the sites of interest. The sites of interest, in this case, are located in the San Francisco Bay Area.
7.2 (a) Stratified sampling of earthquake magnitudes (b) Importance sampling of residuals.
7.3 Four simulated ground-motion maps, two of which are reasonably similar and grouped together into one cluster.
7.4 (a) The San Francisco Bay Area transportation network (b) Aggregated model.
7.5 Exceedance rates of travel-time delays.
7.6 (a) Predicted vs. exact delay values (b) Prediction residuals.
7.7 (a) A LOESS fit to the prediction residuals (b) Predicted and exact delay values after bias correction.
7.8 Two sample exceedance curves obtained using the exact and the approximate loss functions (after bias correction).
7.9 (a) Residuals from the prediction model (b) Residuals normalized (divided) by the predicted delays.
7.10 Normal Q-Q plot of the residuals.
7.11 MART model fitted using 150 MCS maps.
7.12 Methodology for estimating bootstrap confidence intervals for the loss curves.
7.13 1000 bootstrapped exceedance curves obtained using the (a) exact loss function (b) approximate loss function.
7.14 Bootstrap confidence intervals.
7.15 Bootstrap confidence intervals.
7.16 Balanced bootstrap confidence intervals.

8.1 Comparison of predicted median Sa(1s) values obtained using the CB08 model fitted with and without the consideration of spatial correlation: (a) linear scale (b) log scale.
8.2 Effect of spatial correlation on: (a) estimated intra-event residual standard deviation (σ), (b) estimated inter-event residual standard deviation (τ), (c) estimated total residual standard deviation. (d) Ratio of inter-event residual standard deviation to total residual standard deviation.
8.3 Risk assessment results for a hypothetical portfolio of buildings performed using ground-motion models developed with and without the proposed refinement.

9.1 Hurricane Jeanne: (a) Observed wind speeds (b) Predicted wind speeds (c)

Residuals (d) Bias-corrected residuals. . . . . . . . . . . . . . . . . . . . . 195

9.2 Residuals and bias-corrected residuals versus closest distances from the

hurricane track. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196


9.3 (a) Histogram of bias-corrected residuals estimated using the Hurricane Jeanne data (b) Normal QQ plot of normalized bias-corrected residuals from Hurricane Jeanne.

9.4 Semivariogram of bias-corrected residuals estimated using the Hurricane Jeanne data.

9.5 Bias-corrected residuals estimated using the Hurricane Frances data.

9.6 (a) Histogram of bias-corrected residuals estimated using the Hurricane Frances data (b) Normal QQ plot of normalized bias-corrected residuals from Hurricane Frances.

9.7 Semivariogram of bias-corrected residuals estimated using the Hurricane Frances data.

9.8 Portfolio of five residential buildings considered in the risk assessment.

9.9 Portfolio loss exceedance probabilities.

10.1 Comparison of the risk assessment frameworks for (a) single structures and (b) lifelines.

A.1 (a) The San Francisco Bay Area transportation network and (b) Annual exceedance rates of various travel time delays on that network (results from Jayaram and Baker [2010]).

A.2 (a) Chi-Chi earthquake normalized residuals computed using spectral accelerations at 1 second (b) Chi-Chi earthquake normalized residuals computed using spectral accelerations at 2 seconds (c) Cross-semivariogram estimated using the 1s and 2s Chi-Chi earthquake residuals.

B.1 Semivariogram of ε based on the peak ground accelerations observed during the Northridge earthquake data

B.2 Semivariogram of ε computed at 0.5 seconds based on the Northridge earthquake data

B.3 Semivariogram of ε computed at 1 second based on the Northridge earthquake data


B.4 Semivariogram of ε computed at 2 seconds based on the Northridge earthquake data

B.5 Semivariogram of ε computed at 5 seconds based on the Northridge earthquake data

B.6 Semivariogram of ε computed at 7.5 seconds based on the Northridge earthquake data

B.7 Semivariogram of ε computed at 10 seconds based on the Northridge earthquake data

B.8 Experimental semivariogram of ε computed at 2 seconds based on the Chi-Chi earthquake data. Also shown in the figure are two fitted semivariogram models: (i) An accurate exponential + nugget model and (ii) An approximate exponential model

B.9 Semivariogram of ε based on the peak ground accelerations observed during the Chi-Chi earthquake data

B.10 Semivariogram of ε computed at 0.5 seconds based on the Chi-Chi earthquake data

B.11 Semivariogram of ε computed at 1 second based on the Chi-Chi earthquake data

B.12 (Approximate) Semivariogram of ε computed at 2 seconds based on the Chi-Chi earthquake data

B.13 Semivariogram of ε computed at 5 seconds based on the Chi-Chi earthquake data

B.14 Semivariogram of ε computed at 7.5 seconds based on the Chi-Chi earthquake data

B.15 Semivariogram of ε computed at 10 seconds based on the Chi-Chi earthquake data

B.16 Experimental semivariogram of ε computed at 5 seconds based on the simulated ground-motion data. Also shown in the figure are two fitted semivariogram models: (i) An accurate spherical model and (ii) An approximate exponential model


B.17 Range of semivariograms of ε, as a function of the period at which ε values are computed. The residuals are obtained using the simulated ground-motion data

B.18 Simulated multivariate normal random fields. The correlation structure is defined using an exponential semivariogram with range equaling (a) 0 km (b) 20 km and (c) 40 km.

B.19 Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are computed from peak ground accelerations

B.20 Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 0.5 seconds

B.21 Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 1 second

B.22 Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 2 seconds

B.23 Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 5 seconds

B.24 Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 7.5 seconds





B.25 Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 10 seconds

B.26 Experimental directional semivariograms at discrete separations obtained using the Northridge earthquake ε values computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

B.27 Experimental directional semivariograms at discrete separations obtained using the Chi-Chi earthquake ε values computed at 1 second. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

B.28 Experimental directional semivariograms at discrete separations obtained using the Chi-Chi earthquake ε values computed at 7.5 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

B.29 Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε values are computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

B.30 Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε values are computed at 7.5 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90



B.31 Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε values are computed at 7.5 seconds. Also shown in the figures is an anisotropic model that fits the four experimental semivariograms well (It is to be noted that an anisotropic semivariogram has different shapes in different directions.): (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

C.1 The aggregated San Francisco Bay Area transportation network.

C.2 Recurrence curve for the travel time delay obtained using the simulation-based framework.

C.3 Joint likelihoods of magnitudes and faults given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.

C.4 Level of congestion in the network as indicated by the volume/capacity ratio.

C.5 Joint likelihoods of inter-event residual given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.

C.6 Joint likelihoods of inter-event residual given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.

C.7 Mean magnitude of earthquakes producing a travel time delay exceeding a specified threshold.

C.8 (a) Average of mean intra-event residual of earthquakes producing a travel time delay exceeding a specified threshold (b) Average of inter-event residual of earthquakes producing a travel time exceeding a specified threshold.

C.9 Recurrence curves obtained without completely accounting for inter-event and intra-event residuals.

C.10 Performance of the network under three different ground-motion scenarios corresponding to three different inter-event residuals. (a) η = 3.79, (b) η = -1.64 and (c) η = 0.


Chapter 1

Introduction

1.1 Motivation


Lifelines are large, geographically-distributed systems that are essential support systems

for any society. Due to their known vulnerabilities, it is important to proactively assess

and mitigate the seismic risk of lifelines. For instance, the Northridge earthquake caused

over $1.5 billion in business interruption losses ascribed to transportation network dam-

age [Chang, 2003]. The city of Los Angeles suffered a power blackout and $75 million

of power-outage related losses as a result of the earthquake [e.g., Tanaka et al., 1997].

Lifeline seismic risk assessment is a systematic approach for quantifying the likelihood of

observing such losses during future earthquakes (pre-event risk assessment) or in the im-

mediate aftermath of an earthquake (post-event risk assessment). It is often the first step

in the process of management of the lifeline seismic risk, and is useful for several appli-

cations including the prediction (or estimation after an earthquake) of quantities such as

the monetary losses associated with structures and infrastructure owned by a corporation

or insured by an insurance company, the number of injuries and casualties in a certain area

and the probability that lifeline networks for power, water and transportation may be inter-

rupted. This knowledge is useful for decision makers interested in seismic risk mitigation

(e.g., lifeline retrofit), post-disaster management planning, post-earthquake decision mak-

ing (e.g., opening and closing of facilities such as gas pipelines) and insurance modeling.



Lifeline seismic risk assessment is a multi-disciplinary problem that involves seismol-

ogy to quantify the earthquake hazard, structural engineering to quantify the damage to

infrastructure components, statistics to handle the numerous uncertainties that are present

in the seismic environment and in the infrastructure performance, as well as tools and tech-

niques from fields such as optimization, network flow modeling and economics.

The analytical Pacific Earthquake Engineering Research Center (PEER) loss analysis

framework has been used to perform the risk assessment for a single structure at a given

site, by estimating the site ground-motion hazard and assessing probable losses using the

hazard information [Cornell and Krawinkler, 2000, Deierlein, 2004]. The risk is measured

as the exceedance rates of various loss levels, and is obtained as follows:

$$\lambda(DV) = \int\!\!\int\!\!\int G(DV \mid DM)\, dG(DM \mid EDP)\, dG(EDP \mid IM)\, \left| d\lambda(IM) \right| \tag{1.1}$$

where λ (DV ) is the exceedance rate of the decision variable (loss measure) denoted DV ,

dλ (IM) is the derivative of the exceedance rate of a ground-motion intensity measure

denoted IM (e.g., spectral acceleration, peak ground acceleration), dG(EDP|IM) is the

derivative of the probability of exceedance of an engineering demand parameter (EDP)

(e.g., inter-story drift ratio) given an IM, dG(DM|EDP) is the derivative of the probability

of exceedance of a damage measure (DM) (e.g., minor damage, severe damage) given an

EDP and G(DV |DM) is the probability of exceedance of a decision variable (DV ) (e.g.,

monetary loss) given a DM. It is to be noted that the parameters IM, EDP and DM can also

be vectors.
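As an illustration of how such an integral can be evaluated numerically for a single structure, the following is a minimal Python sketch under simplifying assumptions that are not part of this thesis: the EDP and DM steps are collapsed into a single conditional exceedance probability G(DV|IM), the hazard curve is a hypothetical power law, and all function names and parameter values (k0, k, beta, the IM integration limits) are illustrative only.

```python
import math

def hazard_rate(im, k0=1e-4, k=2.5):
    """Hypothetical power-law hazard curve: annual rate of exceeding `im`."""
    return k0 * im ** (-k)

def prob_exceed_given_im(dv, im, beta=0.5):
    """Hypothetical lognormal model for G(DV | IM): P(DV > dv | IM = im).

    The median DV is assumed equal to `im`; `beta` is the log standard deviation.
    """
    z = (math.log(dv) - math.log(im)) / beta
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)

def loss_exceedance_rate(dv, im_min=0.01, im_max=10.0, n=2000):
    """Approximate lambda(DV > dv) = integral of G(dv | im) |d lambda(im)|.

    The IM axis is truncated to [im_min, im_max] and split into log-spaced bins.
    """
    edges = [im_min * (im_max / im_min) ** (i / n) for i in range(n + 1)]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        d_lambda = hazard_rate(lo) - hazard_rate(hi)  # |d lambda| over the bin
        mid = math.sqrt(lo * hi)                      # geometric bin midpoint
        total += prob_exceed_given_im(dv, mid) * d_lambda
    return total
```

Calling `loss_exceedance_rate(1.0)` returns the approximate annual rate at which DV exceeds 1.0 under these assumed models. With a vector of IMs, the single loop over IM bins would become a nested loop over every component, which is why this approach does not scale to lifelines.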

Often, numerical integration is sufficient to estimate λ (DV ) for a single structure. Life-

line risk assessment, however, is based on a large vector of ground-motion intensities (e.g.,

intensities at all bridge locations in a transportation network). In other words, the scalar IM

in Equation 1.1 is now replaced by a large vector of IMs which adds considerable complex-

ity to the integral. The intensities also show significant spatial correlation (i.e., dependence

between the intensities at different sites), which needs to be carefully modeled in order to

accurately assess the seismic risk [e.g., Park et al., 2007, Bazzurro and Luco, 2004]. Fur-

ther, the link between the lifeline component damage measures and the performance of the

lifeline (i.e., G(DV |DM)) is usually not available in closed form. For instance, the travel


time of vehicles in a transportation network, a commonly-used performance measure, is

only obtained using an optimization procedure rather than being a closed-form function

of the ground-motion intensities and the bridge damage states. These additional complex-

ities make it difficult to use the PEER framework for lifeline risk assessment. There are

some analytical approaches that are sometimes used for lifeline risk assessment [e.g., Kang

et al., 2008], but those are generally applicable to only specific classes of lifeline reliabil-

ity problems. Hence, many past research works use Monte Carlo simulation (MCS)-based

approaches instead of analytical approaches for lifeline risk assessment [e.g., Chang et al.,

2000, Campbell and Seligson, 2003, Werner et al., 2004, Crowley and Bommer, 2006,

Kiremidjian et al., 2007, Shiraki et al., 2007]. Figure 1.1 illustrates the above-mentioned

similarities and dissimilarities between the risk assessment frameworks for single struc-

tures and lifelines. (The bold font in the figure denotes a vector. It is also to be noted that

the value of G(DM|IM) for a lifeline component can be computed using G(DM|EDP) and

G(EDP|IM) for the component, if desired.)

In an MCS-based approach, several possible future earthquakes are simulated and the

losses sustained by the lifeline due to the ground-motion intensities during these earth-

quakes are evaluated. These losses are then probabilistically combined in order to obtain

the exceedance rates of various loss levels. Basic MCS-based approaches necessitate per-

formance evaluations of the lifeline under a large number of possible future earthquake sce-

narios and are therefore highly computationally demanding. The current study addresses

these challenges and proposes a computationally-efficient MCS-based framework for as-

sessing the seismic risk of lifelines, with full consideration of the uncertainties and corre-

lations present in spatial ground-motion fields.
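The probabilistic combination of simulated losses into exceedance rates can be sketched as follows. This is a minimal illustration, assuming that each simulated scenario carries an annual occurrence rate (in a conventional MCS with equally weighted simulations, the total annual rate of earthquakes divided by the number of simulations). The function names and the numerical values are hypothetical.

```python
def equal_rate_scenarios(losses, total_annual_rate):
    """Attach an equal annual rate to each simulated loss (conventional MCS)."""
    rate = total_annual_rate / len(losses)
    return [(rate, loss) for loss in losses]

def exceedance_rate_curve(scenarios, thresholds):
    """Annual rate of exceeding each loss threshold.

    `scenarios` is a list of (annual_rate, loss) pairs; the exceedance rate of
    a threshold is the sum of the rates of all scenarios whose loss exceeds it.
    """
    return [
        sum(rate for rate, loss in scenarios if loss > threshold)
        for threshold in thresholds
    ]

# Made-up example: three scenarios with unequal rates and losses (in hours of delay).
scenarios = [(0.01, 100.0), (0.002, 5000.0), (0.001, 20000.0)]
curve = exceedance_rate_curve(scenarios, [0.0, 1000.0, 10000.0])
```

Because every scenario requires one full evaluation of the lifeline performance, the cost of this calculation is governed by the number of scenarios rather than by the summation itself.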

1.2 Areas of contribution

This thesis aims to address the challenges mentioned above. The major contributions of

this work are summarized below.


Figure 1.1: Comparison of the risk assessment frameworks for (a) single structures and (b) lifelines.


1.2.1 Multi-site hazard modeling

Challenges

Lifeline risk assessment requires knowledge about the joint distribution of a vector of

spatially-distributed ground-motion intensities during probable future earthquakes. The

distribution of the ground-motion intensity at a single site is typically predicted using a

ground-motion model, which takes the following form [e.g., Boore and Atkinson, 2008,

Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]:

$$\ln(Y_{ij}) = \ln(\bar{Y}_{ij}) + \sigma_{ij}\varepsilon_{ij} + \tau_{ij}\eta_{ij} \tag{1.2}$$

where $Y_{ij}$ denotes the ground-motion intensity parameter of interest (e.g., $S_a(T)$, the spectral

acceleration at period $T$) at site $i$ during earthquake $j$; $\bar{Y}_{ij}$ denotes the predicted (by

the ground-motion model) median ground-motion intensity, which depends on parameters

such as magnitude, distance, period and local-site conditions; $\varepsilon_{ij}$ denotes the normalized

intra-event residual and $\eta_{ij}$ denotes the normalized inter-event residual. Both $\varepsilon_{ij}$ and $\eta_{ij}$

are univariate normal random variables with zero mean and unit standard deviation. $\sigma_{ij}$

and $\tau_{ij}$ are standard deviation terms that are estimated as part of the ground-motion model

and are functions of the spectral period of interest, and in some models also functions of

the earthquake magnitude and the distance of the site from the rupture. The term $\sigma_{ij}\varepsilon_{ij}$

is called the intra-event residual and the term $\tau_{ij}\eta_{ij}$ is called the inter-event residual. The

inter-event residual is a constant across all the sites for a given earthquake. (It is to be noted

that some chapters of this thesis describe the ground-motion model directly in terms of the

inter-event and the intra-event residuals, rather than using the normalized forms.) The sum

of the inter-event residual and the intra-event residual is called the total residual. Figures

1.2a-b show, for example, the observed (i.e., $Y_{ij}$) and predicted ($\bar{Y}_{ij}$) peak ground accelerations

(PGA) during the 1999 Chi-Chi earthquake. Figure 1.2c shows the normalized total

residuals (i.e., total residuals normalized by their standard deviation) computed using the

Boore and Atkinson [2008] ground-motion model.
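The normalized total residual can be computed from an observed intensity and the model prediction as sketched below. The sketch assumes that the normalized intra-event and inter-event residuals are independent, so that the total residual has standard deviation sqrt(σ² + τ²). The function name and the numerical values are hypothetical.

```python
import math

def normalized_total_residual(observed, predicted_median, sigma, tau):
    """Total residual of ln(Y) divided by its standard deviation.

    observed         -- recorded intensity Y (e.g., PGA in g)
    predicted_median -- median intensity predicted by the ground-motion model
    sigma, tau       -- intra-event and inter-event standard deviations
    Assumes independent intra-event and inter-event residuals.
    """
    total_residual = math.log(observed) - math.log(predicted_median)
    return total_residual / math.sqrt(sigma ** 2 + tau ** 2)

# Made-up example: a recording 50% above the predicted median.
example = normalized_total_residual(0.30, 0.20, sigma=0.55, tau=0.30)
```

A positive value indicates that the recorded intensity was larger than the model's median prediction, and a negative value that it was smaller.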

While quantifying the hazard over two or more sites, the ground-motion model is used

to predict the ground-motion intensity at each site of interest. For instance, the following


Figure 1.2: 1999 Chi-Chi earthquake: (a) recorded PGAs (b) median PGAs predicted by the Boore and Atkinson [2008] ground-motion model (c) normalized total residuals computed using Equation 1.2.


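equations are used to predict the distribution of ground-motion intensity at sites i and i′:

$$\ln(Y_{ij}) = \ln(\bar{Y}_{ij}) + \sigma_{ij}\varepsilon_{ij} + \tau_{ij}\eta_{ij} \tag{1.3}$$

$$\ln(Y_{i'j}) = \ln(\bar{Y}_{i'j}) + \sigma_{i'j}\varepsilon_{i'j} + \tau_{i'j}\eta_{i'j} \tag{1.4}$$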

It is to be noted, however, that the above equations only provide information about the

marginal distribution of the ground-motion intensity at sites i and i′. Regional risk assess-

ments require knowledge about the joint distribution of the ground-motion intensity at sites

i and i′ in order to capture possible dependencies between the ground-motion intensities at

the two sites. Since the median predictions at sites i and i′ are deterministic and the values

of the inter-event residual at sites i and i′ are equal, the only additional information re-

quired to quantify the joint distribution of intensities at sites i and i′ is the joint distribution

of $\varepsilon_{ij}$ and $\varepsilon_{i'j}$. While the inter-event residual and each of the intra-event residuals during

an earthquake have been statistically seen to follow the univariate normal distribution

marginally [Abrahamson, 1988], not much is known about the joint distribution of multiple

spatially-distributed intra-event residuals. Some past research works have assumed that the

intra-event residuals follow a multivariate normal distribution [e.g., Bazzurro and Cornell,

2002, Baker and Cornell, 2006, Kiremidjian et al., 2007], though this assumption has not

been verified using recorded time history data.
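If the multivariate normal assumption holds, the joint distribution of the logarithmic intensities at two sites is fully described by the two marginal distributions and one covariance term. The short derivation below is a sketch under two stated assumptions: the normalized intra-event and inter-event residuals are independent, and the inter-event residual at the two sites is written as a common term $\tau_j\eta_j$ (consistent with it being equal at both sites). The symbols $\tau_j$, $\eta_j$ and $\rho_\varepsilon(i,i')$, the correlation between the two normalized intra-event residuals, are notation introduced here for illustration.

```latex
\begin{aligned}
\operatorname{Cov}\left(\ln Y_{ij}, \ln Y_{i'j}\right)
  &= \operatorname{Cov}\left(\sigma_{ij}\varepsilon_{ij} + \tau_j\eta_j,\;
                             \sigma_{i'j}\varepsilon_{i'j} + \tau_j\eta_j\right) \\
  &= \sigma_{ij}\,\sigma_{i'j}\,\rho_\varepsilon(i,i') + \tau_j^{2}, \\
\operatorname{Var}\left(\ln Y_{ij}\right) &= \sigma_{ij}^{2} + \tau_j^{2}, \qquad
\operatorname{Var}\left(\ln Y_{i'j}\right) = \sigma_{i'j}^{2} + \tau_j^{2}.
\end{aligned}
```

Since the standard deviation terms are provided by the ground-motion model, the only quantity left to estimate is $\rho_\varepsilon(i,i')$, the correlation between the intra-event residuals.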

Once the nature of the distribution of the residuals (and equivalently, that of the inten-

sities) is determined, the distribution needs to be parameterized so that it can be used for

forward-predicting residuals and ultimately, ground-motion intensities from future earth-

quakes. One of the challenges in parameterizing the joint distribution of the intra-event

residuals is that the intra-event residuals exhibit ‘spatial correlation’ [e.g., Boore et al.,

2003]. Spatial correlation denotes the interdependency between the intra-event residuals

at different sites in a region during an earthquake. It arises due to several reasons

including common-source effects (a part of this effect is captured by the inter-event resid-

ual) and similarity in local-site effects and propagation-path effects. The correlation is

known to be large when the sites are close to one another, and decays with increase in sep-

aration between the sites [Boore et al., 2003]. Evidence of spatial correlation can be seen

in Figure 1.2c, which shows clusters of large- and small-valued residuals (which indicates


dependence between closely-spaced residuals).
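To make the notion of distance-dependent correlation concrete, the sketch below simulates normalized intra-event residuals at a few sites from a multivariate normal distribution. It is a minimal illustration and not the procedure used in this thesis: the exponential correlation function exp(-3h/range), the 20 km range and the site coordinates are assumptions chosen for illustration, and the multivariate normality is itself an assumption.

```python
import math
import random

def exponential_correlation(h_km, range_km):
    """Assumed correlation model: decays from 1 at h = 0 to about 0.05 at h = range."""
    return math.exp(-3.0 * h_km / range_km)

def correlation_matrix(sites, range_km):
    """Correlation between residuals at every pair of sites (coordinates in km)."""
    return [
        [exponential_correlation(math.dist(a, b), range_km) for b in sites]
        for a in sites
    ]

def cholesky(matrix):
    """Lower-triangular factor L with L * L^T = matrix (sites must be distinct)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sum(lower[r][k] * lower[c][k] for k in range(c))
            if r == c:
                lower[r][c] = math.sqrt(matrix[r][r] - s)
            else:
                lower[r][c] = (matrix[r][c] - s) / lower[c][c]
    return lower

def simulate_epsilons(sites, range_km, rng):
    """One realization of spatially correlated standard normal residuals."""
    lower = cholesky(correlation_matrix(sites, range_km))
    z = [rng.gauss(0.0, 1.0) for _ in sites]  # independent standard normals
    return [sum(lower[r][k] * z[k] for k in range(r + 1)) for r in range(len(sites))]

# Hypothetical sites: two close together (5 km apart) and one far away.
sites = [(0.0, 0.0), (5.0, 0.0), (60.0, 0.0)]
sample = simulate_epsilons(sites, range_km=20.0, rng=random.Random(0))
```

Under these assumptions the residuals at the two nearby sites tend to take similar values across realizations, while the residual at the distant site is nearly independent of both, which is the kind of clustering pattern the text describes.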

The impact of this correlation on lifeline risk has only been recently studied, and has

been seen to be significant [e.g., Park et al., 2007, Lee and Kiremidjian, 2007, Straub

and Der Kiureghian, 2008, Rix et al., 2009]. The Sacramento delta levee system risk as-

sessment project is a practical example where the spatial correlation was considered in the

risk assessment process [Hanson et al., 2008, Bazzurro, 2010]. Straub and Der Kiureghian

[2008] note that the presence of spatial correlation tends to increase the reliability of se-

ries systems and decrease the reliability of parallel systems. Irrespective of the nature of

the lifeline (which is neither a series nor a parallel system), it is important to consider

the spatial correlation in the risk assessment in order to obtain unbiased estimates of the

probability of sustaining large losses (and small frequent losses).

The ground-motion models (Equation 1.2) that quantify the distribution of intensities at

a single site do not provide information about the spatial correlation between the intensities.

Researchers, in the past, have computed these correlations using ground-motion time his-

tories recorded during earthquakes [Goda and Hong, 2008, Wang and Takada, 2005, Boore

et al., 2003]. Boore et al. [2003] used observations of PGA from the 1994 Northridge

earthquake to compute the spatial correlations. Wang and Takada [2005] computed the

correlations using observations of peak ground velocities (PGV) from several earthquakes

in Japan and the 1999 Chi-Chi earthquake. Goda and Hong [2008] used the Northridge and

Chi-Chi earthquake PGAs and spectral accelerations at three periods ranging between 0.3s

and 3s. The results reported by these research works, however, differ in terms of the rate

of decay of correlation with separation distance. For instance, while Boore et al. [2003]

report that the correlation drops to essentially zero at a site separation distance of approxi-

mately 10 km, the non-zero correlations observed by Wang and Takada [2005] extend past

100 km. Further, Goda and Hong [2008] observe differences between the correlation decay

rate estimated using the Northridge earthquake records and the correlation decay rate based

on the Chi-Chi earthquake records. To date, no explanation for these differences has been

identified.

Additionally, the ground-motion models used in the development of the correlation

models and for performing risk assessments are currently calibrated using regression analy-

sis that assumes independence between the intra-event residuals [Abrahamson and Youngs,


1992]. Few works have verified the impact of considering the spatial correlation in the

development of ground-motion models. One recent work is that of Hong et al. [2009],

who investigated the influence of including spatial correlation in the regression analysis on

the ground-motion models fitted using a two-stage regression algorithm and a one-stage

algorithm of Joyner and Boore [1993]. They observed that the differences in the estimated

ground-motion model coefficients (used for predicting the median intensity) obtained with

and without the incorporation of spatial correlation were insignificant. They did not, how-

ever, investigate the impact on the variances predicted by the ground-motion models in

detail.

Contributions

In the current study, statistical tests are used to verify the commonly-used assumptions

of univariate normality of logarithmic intensities and multivariate normality of spatially-

distributed logarithmic intensities. Further, observed and simulated ground-motion time

histories are used to estimate the spatial correlation between intra-event residuals, which

can be used to parameterize the joint distribution of the ground-motion intensities. Factors

that affect the rate of decay of the correlation with separation distance are studied. Probable

explanations for the differing correlation estimates reported in the literature are provided.

Finally, the importance of considering spatial correlation in lifeline risk assessments is

illustrated.

The study also investigates the impact of incorporating spatial correlation on the ground-motion

model coefficients and on the variance of the predicted intensities. The commonly-used

mixed-effects regression algorithm of Abrahamson and Youngs [1992] is modified to

account for the spatial correlation. This modified algorithm is then used to refit a sample

ground-motion model (the Campbell and Bozorgnia [2008] model) in order to study the

impact of incorporating spatial correlation on ground-motion models and, subsequently, on

lifeline risk estimates.

Additionally, the techniques described above for quantifying the seismic hazard over

a region can be extended to other types of hazard and multi-hazard scenarios. This study

investigates extension of the regional seismic risk framework to regional hurricane hazard

modeling. Multi-site hurricane wind hazard assessment involves the simulation of possible


hurricane tracks (i.e., the point of origin, path and other properties such as the central pres-

sure and velocity) and the prediction of the wind fields (peak wind speeds at all the sites of

interest) associated with each track [Lee and Rosowsky, 2007, Vickery et al., 2009b, Legg

et al., 2010]. This is analogous to the simulation of earthquake events and the prediction

of associated ground-motion fields in the seismic hazard assessment framework. Most pre-

diction models developed in the past for predicting hurricane wind fields are deterministic,

however, and the uncertainties in wind fields have been rarely analyzed. To the author’s

knowledge, the spatial correlation in hurricane wind fields has not been studied in the liter-

ature. The current study focuses on quantifying the uncertainties and the spatial correlation

in hurricane wind fields (using the same techniques that were used to quantify these pa-

rameters in earthquake ground-motion fields), and evaluating their impact on the hurricane

risk of spatially-distributed systems. Hurricane wind-speed predictions are obtained for

two sample hurricanes, Hurricane Jeanne and Hurricane Frances, using the Batts et al.

[1980] wind-speed model, and the uncertainties in these predictions are evaluated using

actual wind-speed recordings. For instance, Figure 1.3a shows the track of the 2004 Hur-

ricane Jeanne [Landsea et al., 2004] and the observed maximum wind speeds (maximum

sustained one-minute wind speed at 10-meter height) during the hurricane [Powell et al.,

1998]. Figure 1.3b shows the corresponding wind speeds predicted by the Batts et al. [1980]

wind-speed model. Figure 1.3c shows the total residuals computed from the observed and

the predicted wind speeds. The smoothness of the residuals in Figure 1.3c indicates the

presence of spatial correlation between the residuals. This spatial correlation structure is

estimated and modeled using geostatistical tools.
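One common geostatistical tool for this purpose is the experimental semivariogram. A minimal sketch of the classical method-of-moments estimator, in which the semivariance for a separation bin is half the mean squared difference between all pairs of residuals whose separation falls in that bin, is given below. The binning scheme, function name and sample data are assumptions for illustration, and the estimator used elsewhere in this thesis may differ in its details.

```python
import math

def experimental_semivariogram(sites, values, bin_width, n_bins):
    """Semivariance per separation-distance bin.

    sites     -- list of (x, y) coordinates
    values    -- residual observed at each site
    bin_width -- width of each separation bin (same units as the coordinates)
    Returns a list of length n_bins; bins containing no pairs are None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for a in range(len(sites)):
        for b in range(a + 1, len(sites)):
            h = math.dist(sites[a], sites[b])
            k = int(h // bin_width)
            if k < n_bins:
                sums[k] += (values[a] - values[b]) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Made-up residuals at three collinear sites.
gamma = experimental_semivariogram(
    [(0, 0), (1, 0), (2, 0)], [1.0, 2.0, 4.0], bin_width=1.5, n_bins=3
)
```

Small semivariances at short separations that grow with distance are the signature of spatially correlated residuals.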

1.2.2 Lifeline risk assessment

Challenges

Lifelines are complex infrastructure systems with a large number of components. (For in-

stance, there are over 1,000 bridges in the San Francisco Bay Area transportation network

model used later in this thesis). Estimating the performance of a lifeline during an earth-

quake scenario is often extremely computationally intensive. One important challenge in

lifeline risk assessment is to devise methods to handle this large computational demand


Figure 1.3: 2004 Hurricane Jeanne (the line indicates the hurricane track): (a) recorded wind speeds (b) wind speeds predicted by the Batts et al. [1980] wind-speed model (c) wind-speed residuals.


[Der Kiureghian, 2009].

In the past, researchers have developed and used several techniques for lifeline risk as-

sessment. Numerical integration-based techniques are used in some special cases where

the components of the lifeline do not interact with each other. This is done, for instance,

while evaluating the exceedance rates of monetary losses associated with structural damage

to bridges in a transportation network [Stergiou and Kiremidjian, 2006]. This also arises

in other situations involving spatially-distributed systems such as the evaluation of the ex-

ceedance rates of monetary losses to a portfolio of buildings [Wesson et al., 2009]. These

works, however, ignore the spatial correlation between the ground-motion intensities in

order to facilitate the use of numerical integration.

Some research works use simplified lifeline performance measures in order to reduce

the computational demand. Basoz and Kiremidjian [1996], Duenas-Osorio et al. [2005],

Kang et al. [2008] and Bensi et al. [2009b] use connectivity between the nodes of a lifeline

(e.g., transportation network connectivity between a city and a hospital) as a measure of

network performance. Not only does the use of a simplified connectivity-based measure

(instead of a flow-based measure such as the travel-time delay in a transportation network)

reduce the time required to evaluate the network performance under various earthquake

scenarios, it also enables the use of computationally-efficient analytical techniques such as the

matrix-based system reliability (MSR) method of Kang et al. [2008] to evaluate the lifeline

risk. It is to be noted that the MSR method can be extended to problems using flow-based

performance measures, but is computationally expensive in such cases [Kang et al., 2008].

On account of the above-mentioned complications involved in modeling the hazard

and the lifeline performance (particularly while using flow-based measures), many past re-

search works use MCS-based approaches instead of analytical approaches for lifeline risk

assessment [e.g., Werner et al., 2000, Chang et al., 2000, Campbell and Seligson, 2003,

Crowley and Bommer, 2006, Kiremidjian et al., 2007, Shiraki et al., 2007]. One simple

MCS-based approach used in the past involves studying the performance of lifelines under

those earthquake scenarios that dominate the hazard in the region of interest [e.g., Adachi

and Ellingwood, 2008, Kiremidjian et al., 2007, Duenas-Osorio et al., 2005]. While this ap-

proach is more tractable, it does not adequately capture all the seismic hazard uncertainties.


A more comprehensive approach uses MCS to probabilistically generate ground-motion in-

tensity maps, considering all possible earthquake scenarios that could occur in the region,

and then uses these for the risk assessment [Crowley and Bommer, 2006]. Sample scenarios

are probabilistically generated by first estimating the median intensities due to a particular

earthquake using a ground-motion model, and by subsequently combining the median in-

tensities with simulated values of residuals. Figure 1.4, for instance, illustrates the MCS

of ground-motion intensities for a magnitude 8 earthquake on the San Andreas fault. The

most basic form of MCS, the conventional MCS, is computationally inefficient because

large magnitude earthquakes and above-average ground-motion intensities are considerably

more important than small magnitude earthquakes and small ground-motion intensities to

lifeline risks, but these are infrequently sampled in conventional MCS. Kiremidjian et al.

[2007] improved the MCS process by preferentially simulating large magnitude events us-

ing importance sampling (IS). Werner et al. [2004] also implemented variance-reduction

techniques in the software package REDARS (Risks from Earthquake Damage to Road-

way Systems) in order to simulate fewer earthquakes.

Chang et al. [2000] used an MCS-based approach to estimate earthquake-induced delays

in a transportation network. They generated a catalog of 47 earthquakes and corresponding

intensity maps for the Los Angeles area and assigned probabilities to these earthquakes

such that the site hazard curves obtained using this catalog match with the known local site

hazard curves obtained from PSHA. In other words, the probabilities of the scenario earth-

quakes were chosen to make the catalog hazard consistent. Only median PGAs were used

to produce the ground-motion intensity maps corresponding to the scenario earthquakes,

however, and variability about these medians was ignored, which can bias the resulting risk

estimates [e.g., Grossi and Kunreuther, 2005]. While this approach is highly computation-

ally efficient on account of the use of a small catalog of earthquakes, the selection of earth-

quakes is a somewhat subjective process, and the assignment of probabilities is based on

hazard consistency rather than on actual event likelihoods. Campbell and Seligson [2003]

proposed a more quantitative procedure to develop the hazard consistent scenarios, but the

rest of the drawbacks were not resolved.

Recently, Guikema [2009] proposed that the lifeline performance evaluations can be ex-

pedited by using an approximate regression relationship between the lifeline performance


Figure 1.4: Ground-motion intensity simulation for a magnitude 8 earthquake on the San Andreas fault: (a) median intensities obtained using the Boore and Atkinson [2008] ground-motion model (b) simulated values of the normalized total residuals (c) total intensities.


and the predictive hazard variables (e.g., ground-motion intensities at component locations)

obtained using a statistical learning technique. He, however, did not provide any risk as-

sessment examples. Another recent work is that of Bensi et al. [2009a], who explored the

use of Bayesian network models, particularly for post-earthquake lifeline risk assessment.

The computational feasibility of this approach, particularly when estimating the risk of large

lifelines, needs further investigation.

Contributions

The current study develops a computationally-efficient lifeline risk assessment framework

based on efficient sampling and data reduction techniques. The framework can be used for

developing a small, but stochastically representative catalog of spatially-correlated ground-

motion intensity maps that can be used for performing lifeline risk assessments. This tech-

nique is seen to reduce the computational demand of complex risk assessments by more

than three orders of magnitude, without compromising the accuracy of the risk estimates.

The proposed framework is used to evaluate the exceedance rates of various travel-time

delays on an aggregated (higher-scale) model of the San Francisco Bay Area transportation

network. Lifeline risk deaggregation calculations are used to illustrate the need to consider

uncertainties in the lifeline risk assessment process. Finally, the study also explores the use

of a statistical learning technique called multivariate adaptive regression trees in order to

expedite lifeline performance evaluation.

1.3 Organization

This thesis addresses several important issues related to the risk assessment of lifelines.

Chapters 2, 3 and 4 deal with the joint distribution of spatially-distributed ground-motion

intensities. Chapter 5 discusses systematic approaches to probabilistically sampling ground

motion intensity fields. Chapters 6 and 7 present new computationally-efficient lifeline risk

assessment techniques. Chapter 8 discusses the impact of considering spatial correlation

on ground-motion models and subsequently, on lifeline seismic risk. Chapter 9 explores

extending the probabilistic framework used for the seismic risk assessment of lifelines to


hurricane lifeline risk assessment.

Chapter 2 deals with the important issue of quantifying the joint distribution of spectral

accelerations, which is required for the risk assessment of lifelines. The chapter discusses

statistical tests that are used to examine the commonly-used assumptions of univariate nor-

mality of logarithmic spectral acceleration values and multivariate normality of vectors of

logarithmic spectral acceleration values computed at different sites and/or different peri-

ods. The statistical hypothesis tests carried out in this work indicate that these assumptions

are reasonable.

Chapter 3 presents a new spatial correlation model for spectral accelerations at a single

period (and the related Appendix A describes the estimation of cross-correlations between

spectral accelerations at two different periods), developed using recorded earthquake time

histories. The correlation is expressed as a function of the site separation distance, the

spectral acceleration period and the local soil conditions. The correlations predicted by the

model, along with the means and the variances provided by the ground-motion models, can

be used to completely parameterize the joint distribution of spatial spectral acceleration

fields, which is necessary for lifeline risk calculations.

Chapter 4 investigates the validity of commonly-used assumptions in spatial correlation

models such as stationarity (invariance of correlation with spatial location) and isotropy

(directional independence). Testing these assumptions, however, requires a large number

of ground-motion time histories. Since real data are sparse, this chapter uses simulated

ground-motion time histories instead. The chapter also takes advantage of the large simu-

lated ground-motion database to carry out tests to identify whether the correlations between

pulse-like ground motions that arise due to directivity effects are different from the corre-

lations between non-pulse-like ground motions. Overall, this chapter tests and provides a

basis for some of the subtle assumptions commonly used in spatial correlation models.

Chapter 5 discusses techniques for simulating ground-motion intensity maps with and

without the consideration of recorded ground-motion intensities. A ground-motion inten-

sity map is generated by combining median intensity predictions from ground-motion mod-

els with realizations of inter-event and intra-event residuals that account for the uncertainty

in the intensities. Intra-event residuals can be simulated as a correlated vector (using the

correlation model presented in Chapter 3) of multivariate normal random variables, and the


inter-event residual can be simulated as a univariate Gaussian random variable (based on the

discussion in Chapter 2). The chapter discusses two MCS techniques, termed single-step

simulation and sequential simulation, for generating residuals in the absence of recorded

ground-motion intensities. While both procedures are theoretically equivalent, it is possible

to reduce computational expense by using the sequential simulation technique. The chap-

ter also describes a sequential simulation technique for simulating residuals incorporating

knowledge about recorded ground-motion intensities. This is useful for post-earthquake

damage assessment and for determining optimal emergency response strategies.

Chapter 6 presents a novel computationally-efficient MCS procedure based on im-

portance sampling and K-means clustering, that can be used for the seismic risk assess-

ment of lifelines. The framework can be used for developing a small, but stochastically-

representative catalog of ground-motion intensity maps that can be used for performing life-

line risk assessments. The importance sampling technique is used to preferentially sample

important ground-motion intensity maps (using the MCS techniques discussed in Chapter

5), and the K-means clustering technique is used to identify and combine redundant maps.

It is shown theoretically and empirically that the risk estimates obtained using these tech-

niques are unbiased. The proposed framework is used to compute the exceedance rates of

travel-time delays (the chosen performance measure) on an aggregated form (coarse-scale

model) of the San Francisco Bay Area transportation network. The exceedance rates of

travel-time delays are obtained using a catalog of only 150 maps, and are shown to be in

good agreement with those obtained using the conventional MCS method. The proposed

method is three orders of magnitude faster (computationally) than the conventional MCS,

and therefore will potentially facilitate computationally intensive risk analysis of lifelines,

with full consideration of the uncertainties and the spatial correlation in ground-motion

intensity fields. The related Appendix C uses lifeline risk deaggregation calculations to

illustrate the need to consider these uncertainties in the lifeline risk assessment process.

Chapter 7 explores the use of statistical learning techniques to reduce the computa-

tional expense of the lifeline risk assessment problem. MCS and its variants are generally

well suited for characterizing ground motions and computing resulting losses to lifelines.

MCS-based methods are, however, highly computationally intensive, primarily because


they involve repeated evaluations of lifeline performance under a large number of sim-

ulated ground-motion intensity maps. In this study, a non-parametric statistical learning

technique termed Multivariate Adaptive Regression Trees (MART) is used to obtain an

approximate relationship between the ground-motion intensities at lifeline component lo-

cations and the lifeline performance. Non-parametric regression is used in place of clas-

sical regression since the number of predictor variables (ground-motion intensities at the

component locations) far exceeds the number of available training data points. The life-

line performance predicted by this relationship can potentially be used in place of the

actual lifeline performance (the evaluation of which is intensive) to expedite the compu-

tation of several lifeline risk-related parameters. The study illustrates this by developing

a MART-based relationship between the ground-motion intensities at bridge locations and

the network travel times in the San Francisco Bay Area transportation network, and using

it for estimating confidence intervals for the risk estimates presented in Chapter 6. More

generally, these approximate performance relationships can be used in several problems

such as prioritizing lifeline retrofits, whose computational demand stems from the need for

repeated performance evaluations.

Even though the risk assessment framework described in Chapter 6 facilitates the con-

sideration of spatial correlation between ground-motion intensities, current ground-motion

models (e.g., NGA ground-motion models) that are used to predict the distribution of

ground-motion intensities at individual sites are fitted assuming independence between

the intra-event residuals. Chapter 8 proposes a method to consider the spatial correlation

(discussed in Chapter 3) in the mixed-effects regression procedure used for fitting ground-

motion models, and illustrates the impact of considering spatial correlation on the means

and the variances predicted by the ground-motion models. It is shown using an illustrative

example that the risk estimates of spatially-distributed systems can be inaccurate when

using ground-motion models fitted without the consideration of spatial correlation.

Frameworks for the risk assessment of structures and infrastructure systems under a

variety of natural and man-made hazards share many similarities. It is conceivable, therefore,

that the techniques developed for the risk assessment under one type of natural or

man-made hazard will be applicable for the risk assessment under another hazard or multi-

hazard scenario. Chapter 9 describes an exploratory study carried out to investigate the


extension of the seismic hazard and risk assessment concepts and techniques discussed in

the earlier chapters to hurricane hazard and risk modeling. The study focuses on quantify-

ing the uncertainties and the spatial correlation in hurricane wind fields (using techniques

that are used to quantify these parameters in earthquake ground-motion fields), and evalu-

ating their impact on the hurricane risk of spatially-distributed systems.

Finally, Chapter 10 summarizes the important contributions and findings of this thesis,

and discusses future extensions of this research.

The chapters of this thesis are designed to be largely self-contained because they have

been or will be published as individual journal articles. Because of this, there is some

repetition of background material. In addition, notational conventions were chosen to be

simple and clear for the topic of each chapter rather than for the thesis as a whole; because

of this, the notational conventions may not be identical for each chapter. Apologies are

made for any distraction this causes when reading the thesis as a continuous document.

Chapter 2

Statistical Tests of the Joint Distribution of Spectral Acceleration Values

N. Jayaram and J.W. Baker (2008). Statistical tests of the joint distribution of spectral

acceleration values, Bulletin of the Seismological Society of America, 98(5), 2231-2243.

2.1 Abstract

Assessment of seismic hazard using conventional probabilistic seismic hazard analysis

(PSHA) typically involves the assumption that the logarithmic spectral acceleration values

follow a normal distribution marginally. There are, however, a variety of cases in which

a vector of ground-motion intensity measures is considered for seismic hazard analysis.

In such cases, assumptions regarding the joint distribution of the ground-motion intensity

measures are required for analysis. In this article, statistical tests are used to examine

the assumption of univariate normality of logarithmic spectral acceleration values and to

verify that vectors of logarithmic spectral acceleration values computed at different sites

and/or different periods follow a multivariate normal distribution. Multivariate normality

of logarithmic spectral accelerations is verified by testing the multivariate normality of

inter-event and intra-event residuals obtained from ground-motion models.

The univariate normality tests indicate that both inter-event and intra-event residuals


can be well represented by normal distributions marginally. No evidence is found to sup-

port truncation of the normal distribution, as is sometimes done in PSHA. The tests for

multivariate normality show that inter-event and intra-event residuals at a site, computed

at different periods, follow multivariate normal distributions. It is also seen that spatially-

distributed intra-event residuals can be well represented by the multivariate normal distri-

bution. This study provides a sound statistical basis for assumptions regarding the marginal

and the joint distribution of ground-motion parameters that must be made for a variety of

seismic hazard calculations.

2.2 Introduction

Spectral acceleration values of earthquake ground motions are widely used in seismic haz-

ard analysis. Conventional probabilistic seismic hazard analysis (PSHA) [e.g., Kramer,

1996] provides a framework for the probabilistic assessment of a single ground-motion

parameter (such as the spectral acceleration computed at a single period). When imple-

menting PSHA, it is typically assumed that the spectral acceleration follows a lognormal

distribution marginally. There are, however, cases in which knowledge about the joint

occurrence of several spectral acceleration values, corresponding to different periods, is

required for hazard assessment [Bazzurro and Cornell, 2002]. Additionally, a single earth-

quake can cause severe damage over a large area. Hence, when assessing the impact of

earthquakes on a portfolio of structures or a spatially-distributed infrastructure system, it

is necessary to study the joint occurrence of spectral acceleration values at various sites in

the region [Crowley and Bommer, 2006]. Moreover, the knowledge of a vector of ground-

motion intensity measures is useful in other practical applications that involve computation

of the seismic response of a structure dominated by more than one mode [Shome and Cor-

nell, 1999, Vamvatsikos and Cornell, 2005], or that involve joint prediction of structural

and non-structural seismic responses for loss estimation purposes, and prediction of multiple

demand parameters such as displacement and hysteretic energy. In such cases, a vector

of intensity measures needs to be considered and hence, it is necessary to study the joint

distribution of these intensity measures in observed ground motions.


Various empirical ground-motion models have been developed for estimating the re-

sponse spectrum of a given ground motion [e.g., Campbell and Bozorgnia, 2008, Boore

and Atkinson, 2008, Abrahamson and Silva, 2008, Chiou and Youngs, 2008]. A typical

ground-motion model has the form:

$$\ln(Y) = \ln(\bar{Y}) + \varepsilon + \eta \tag{2.1}$$

where Y denotes the ground-motion parameter of interest (e.g., Sa(T1), the spectral acceleration

at period T1); $\bar{Y}$ denotes the predicted (by the ground-motion model) median value of

the ground-motion parameter (which depends on parameters such as magnitude, distance,

period and local soil conditions); ε denotes the intra-event residual, which is a random variable

with zero mean and a standard deviation of σ; and η denotes the inter-event residual,

which is a random variable with zero mean and a standard deviation of τ. The standard

deviations, σ and τ, are estimated during the derivation of the ground-motion model and

are a function of the response period, and in some models a function of earthquake magnitude

and distance from the rupture. Normalized intra-event residuals ($\tilde{\varepsilon}$) are obtained by

dividing ε by σ. Similarly, η can be normalized using τ to obtain $\tilde{\eta}$.

The logarithmic spectral acceleration at a site due to an earthquake is usually assumed to

be well represented by the normal distribution marginally [e.g., Kramer, 1996]. Abraham-

son [1988] performed rigorous statistical studies to verify the assumption that logarithmic

peak ground acceleration (PGA) values follow the normal distribution marginally. Such

rigorous studies have, however, not been performed on spectral accelerations. Moreover,

the assumption of normality must be extended to the joint distribution of the logarithmic

spectral accelerations, when performing vector-valued seismic hazard analysis [Bazzurro

and Cornell, 2002, Baker and Cornell, 2006]. When multiple ground-motion parameters

are considered (for instance, Y1 and Y2), the ground-motion model equations take the fol-

lowing form:

$$\begin{aligned}
\ln(Y_1) &= \ln(\bar{Y}_1) + \varepsilon_1 + \eta_1 \\
\ln(Y_2) &= \ln(\bar{Y}_2) + \varepsilon_2 + \eta_2
\end{aligned} \tag{2.2}$$


where $\bar{Y}_1$ and $\bar{Y}_2$ denote the predicted median values of the ground-motion parameters; ε1

and ε2 denote the intra-event residuals corresponding to the two parameters; η1 and η2

denote the inter-event residuals (η1 equals η2 if Y1 and Y2 denote Sa(T ) at two sites during

the same earthquake). If Y1 and Y2 are spectral accelerations at two closely-spaced sites

or spectral accelerations at two different periods at the same site, the residuals will not be

independent [Baker and Jayaram, 2008, Baker and Cornell, 2006]. Thus, an assumption

of univariate normality does not necessarily imply joint normality between the residuals.

There is a paucity of research work that examines the validity of assuming multivariate

normality. This chapter explores the validity of these assumptions using statistical tests

for univariate and multivariate normality, and a large library of spectral acceleration values

from recorded ground motions.
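As a numerical illustration of equation 2.2, the following Python sketch simulates logarithmic spectral accelerations at two sites for a large number of earthquakes. The medians, the standard deviations σ and τ, the intra-event correlation and the function name `simulate_two_sites` are hypothetical placeholders chosen only for illustration; they are not taken from any ground-motion model. Joint normality of the residuals is also assumed here, which is precisely the assumption examined in this chapter.

```python
import numpy as np


def simulate_two_sites(ln_median, sigma, tau, rho_intra, n_events, seed=0):
    """Simulate ln(Y1), ln(Y2) at two sites; one row per earthquake.

    All inputs are illustrative assumptions, not model values.
    """
    rng = np.random.default_rng(seed)
    # Intra-event residuals: zero-mean bivariate normal with correlation rho_intra
    cov = sigma**2 * np.array([[1.0, rho_intra], [rho_intra, 1.0]])
    eps = rng.multivariate_normal(np.zeros(2), cov, size=n_events)
    # Inter-event residual: one value per earthquake, shared by both sites
    eta = rng.normal(0.0, tau, size=(n_events, 1))
    return np.asarray(ln_median) + eps + eta


sigma, tau = 0.5, 0.3  # hypothetical standard deviations
ln_y = simulate_two_sites(
    ln_median=[np.log(0.3), np.log(0.2)],
    sigma=sigma, tau=tau, rho_intra=0.0, n_events=20000,
)
corr = np.corrcoef(ln_y, rowvar=False)[0, 1]
print(f"sample correlation between ln(Y1) and ln(Y2): {corr:.2f}")
```

Even with uncorrelated intra-event residuals (`rho_intra=0.0`), the shared inter-event residual induces a correlation of τ²/(σ² + τ²) between the two logarithmic intensities, which is about 0.26 for the placeholder values used above.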

The ground-motion model of Campbell and Bozorgnia [2008] is used in this study to

compute the parameters shown in equations 2.1 and 2.2. The conclusions drawn from the

work, however, did not change when the Boore and Atkinson [2008] ground-motion model

was used as well. The spectral acceleration definition typically used in the NGA ground-

motion models is ‘GMRotI50’ (also known as ‘GMRotI’). This is the 50th percentile of

the set of geometric means of spectral accelerations at a given period, obtained by rotating

the as-recorded orthogonal horizontal motions through all possible non-redundant rotation

angles [Boore et al., 2006]. The residuals used in this work are obtained based on this

definition of the spectral acceleration.

The data for the analysis are obtained from the PEER NGA Database [2005]. In order to

exclude records whose characteristics differ from those used by the ground-motion model-

ers for data analysis, only records used by the ground-motion model authors are considered

in the tests for normality.

2.3 Testing the univariate normality of residuals

This section discusses tests performed on the assumption that logarithmic spectral acceler-

ations at a site due to a given earthquake are well represented by the normal distribution,

marginally. A practical way to test the univariate normality of a data set is to inspect the

normal Q-Q plot obtained from the data set by plotting the quantiles of the data sample


against the corresponding quantiles of the theoretical normal distribution [e.g., Johnson

and Wichern, 2007].

The following steps are involved in the construction of a normal Q-Q plot. Let x be

a collection of n data values that need to be tested for normality. The data set is ordered

(sorted in ascending order) to obtain $[x_{(1)}, x_{(2)}, \ldots, x_{(n)}]$ (such that $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$).

When these sample quantiles $x_{(k)}$ are distinct (which is a reasonable assumption for continuously

varying data), exactly k observations are less than or equal to $x_{(k)}$. The cumulative

probabilities $p_{(k)}$ of each $x_{(k)}$ can be computed as $k/n$. It has been shown, however, that a continuity

correction gives an improved $p_{(k)}$ estimate of $(k - 3/8)/(n + 1/4)$ [Johnson and Wichern, 2007]

and hence, this definition of p(k) is used in this work. The normal Q-Q plot is obtained by

plotting the ordered data samples against the theoretical normal quantiles corresponding to

each of the probabilities p(k). The theoretical normal quantile corresponding to probability

p(k) is obtained as Φ−1(p(k)), where Φ−1 denotes the inverse of the cumulative normal

distribution with the mean and the variance equaling the sample mean and the sample vari-

ance respectively. If the data sample follows a normal distribution, the normal Q-Q plot

will form a straight line with a slope of 45 ◦, passing through the origin.
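The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this work: the function name `normal_qq_points` is hypothetical, and the input is a synthetic standard normal sample rather than ground-motion residuals.

```python
import numpy as np
from scipy.stats import norm


def normal_qq_points(x):
    """Return (theoretical quantiles, ordered sample) for a normal Q-Q plot."""
    x = np.sort(np.asarray(x, dtype=float))  # ordered sample x_(1) <= ... <= x_(n)
    n = x.size
    k = np.arange(1, n + 1)
    p = (k - 3.0 / 8.0) / (n + 1.0 / 4.0)  # continuity-corrected probabilities
    # Normal quantiles using the sample mean and sample standard deviation
    q = norm.ppf(p, loc=x.mean(), scale=x.std(ddof=1))
    return q, x


rng = np.random.default_rng(1)
theo, samp = normal_qq_points(rng.normal(size=1500))  # synthetic data
slope, intercept = np.polyfit(theo, samp, 1)
print(f"fitted slope = {slope:.3f}, intercept = {intercept:.3f}")
```

For a sample that is actually normal, the fitted slope should be close to one and the intercept close to zero, which corresponds to the 45° line through the origin described above.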

2.3.1 Results and discussion

Normality tests are performed on intra-event and inter-event residuals in order to verify the

univariate normality of logarithmic spectral accelerations at a site due to an earthquake.

The intra-event and the inter-event residuals provided to the authors by the ground-motion

model authors are used in the normality tests.

Intra-event residuals

This section discusses results of the univariate standard normality tests performed on the

normalized intra-event residuals ($\tilde{\varepsilon}$). As mentioned previously, $\tilde{\varepsilon}$ values are obtained by

dividing the intra-event residuals (ε’s) by the standard deviations (σ ’s) provided by the

Campbell and Bozorgnia [2008] model.

Figure 2.1 shows the normal Q-Q plots of the normalized residuals $\tilde{\varepsilon}$ computed at four different periods

ranging between 0.5 seconds and 10 seconds, with the theoretical quantiles derived from the


standard normal distribution (normal distribution with zero mean and unit variance). Long

periods such as 10 seconds may not be used in practice as often as short periods. These

long periods are considered in this work, however, in order to cover the entire range of pe-

riods in which the ground-motion model used is applicable. Also shown in the figures are

45° lines passing through the origin. Deviation of the normal Q-Q plot from the 45° line

indicates deviation from standard normality. It can be seen from Figure 2.1 that the normal

Q-Q plots match reasonably well with the 45° lines in all the four cases. This indicates

that $\tilde{\varepsilon}$ can be considered to be univariate standard normal based on this data set. Note that

while normality of $\tilde{\varepsilon}$ is assumed in PSHA, it is often assumed that the distribution is truncated.

A typical decision would be to truncate the distribution at $\tilde{\varepsilon}$ = 2 or 3, and not allow

any larger $\tilde{\varepsilon}$ values [Bommer and Abrahamson, 2006]. The tail of the marginal distribution

needs to be studied in order to determine if this truncation of the normal distribution is

reasonable. Figure 2.1 shows that $\tilde{\varepsilon}$ values larger than 2 are observed as often as would be

expected from a non-truncated distribution. With the small data sets used, however, it is

not possible to study the tail distribution beyond $\tilde{\varepsilon}$ = 3.

A technique to obtain a larger number of samples at the tail of the distribution would

be to pool the $\tilde{\varepsilon}$ values computed at different periods. The normalized residuals computed

at various periods are shown to follow a standard normal distribution using the normal Q-

Q plots in Figure 2.1. Hence, it can be inferred that quantiles of the pooled data set will

match with the corresponding quantiles of a theoretical standard normal distribution. The

pooled set has a larger number of data points in the tail and hence, it is preferable to study

the tail properties using the pooled data set rather than the individual data sets. Hence,

12,194 $\tilde{\varepsilon}$ values computed at 10 periods ranging from 0.5 to 10 seconds are pooled together.

The histogram of the pooled data set is shown in Figure 2.2 along with a scaled plot of

the theoretical standard normal distribution. The figure shows that the data are in excellent

agreement with the standard normal distribution, as expected based on the normal Q-Q

plots shown in Figure 2.1. The normal Q-Q plot for the pooled data set is shown in Figure

2.3. It can be seen that the quantiles from the observed data match reasonably well with

the theoretical quantiles up to $\tilde{\varepsilon}$ values of 3.5 or 4. Beyond $\tilde{\varepsilon}$ = ±4, there are no longer

enough data to study possible truncation. This large data set thus contradicts claims that

a truncation of $\tilde{\varepsilon}$ at less than 4 is reasonable, and provides no evidence to support truncation


Figure 2.1: The normal Q-Q plots of the normalized intra-event residuals at four different periods. (a) T = 0.5 seconds (1560 samples) (b) T = 1.0 seconds (1548 samples) (c) T = 2.0 seconds (1498 samples) (d) T = 10.0 seconds (507 samples).


at a larger value. This is consistent with the findings of other researchers examining large

data sets [Strasser et al., 2008, Abrahamson, 2006, Bommer et al., 2004].
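The limit on how far into the tail the pooled data set can be examined follows from simple arithmetic, sketched below in Python. The only input taken from the text is the pooled sample size of 12,194; the function name `expected_exceedances` is hypothetical, and the calculation assumes independent, exactly standard normal residuals, which is an idealization because residuals pooled across periods are correlated.

```python
from scipy.stats import norm

n_pooled = 12194  # number of pooled normalized intra-event residuals (from the text)


def expected_exceedances(n, threshold):
    """Expected count of |residual| > threshold among n standard normal samples."""
    return n * 2.0 * norm.sf(threshold)  # two-sided tail probability times n


expected = {c: expected_exceedances(n_pooled, c) for c in (2.0, 3.0, 4.0)}
for c, count in expected.items():
    print(f"|residual| > {c:.0f}: about {count:.1f} values expected")
```

Under these assumptions roughly 33 values are expected beyond ±3 but fewer than one beyond ±4, which is why the pooled data set cannot inform conclusions about truncation beyond ±4.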

Inter-event residuals

According to the ground-motion model of Campbell and Bozorgnia [2008], the standard

deviation of the inter-event residuals (η) depends on the rock PGA at the sites. As a result,

while the η values computed at any particular period are identical across all the sites during

a given earthquake, the normalized inter-event residuals ($\tilde{\eta}$) vary across sites even during

a single earthquake (because the standard deviation, τ, with which they are normalized

varies from site to site). This makes it impossible to use $\tilde{\eta}$ for the normality study. It

is seen, however, using the records in the PEER NGA Database [2005] that over 90% of

the standard deviations of η’s (obtained using the ground-motion model of Campbell and

Bozorgnia [2008]) lie within a reasonably narrow interval (with an approximate range of

0.04). Hence, homoscedasticity (i.e., constant variance) of η is considered to be reasonable

and so the η values are used as such, without normalization.

Figure 2.4 shows the normal Q-Q plot obtained using the η values corresponding to

four different periods. The theoretical quantiles are obtained using a normal distribution

with zero mean and a standard deviation that equals the sample standard deviation (which

does not equal one since the η values are not normalized). It is seen from Figures 2.4a-d

that the normal Q-Q plots match reasonably well with the 45° straight lines, thereby

indicating the univariate normality of inter-event residuals.

2.4 Testing the assumption of multivariate normality for random vectors using independent samples

In this section, several statistical tests are presented that can be used with observed ground-

motion data to test the validity of the assumed multivariate normal distribution for logarith-

mic spectral accelerations.

A given ground motion will have spectral acceleration values that vary stochastically

as a function of period. Hence, for any d periods, T = [T1,T2, ...,Td], let the corresponding


Figure 2.2: The histogram of the 12,194 pooled normalized intra-event residuals computed at 10 periods, with the theoretical standard normal distribution (scaled) superimposed.


values of spectral acceleration at the sites be denoted by $S_a^{j}(T_i)$, where j is an index that

denotes a given recording, while Ti indicates a particular period. The mathematical proce-

dures explained in this section can be used to test whether the random vectors of logarithmic

spectral accelerations, [ln(Sa(T1)) , ln(Sa(T2)) , · · · , ln(Sa(Td))], are jointly normal.

Testing for multivariate normality is much more complex than testing for univariate

normality since there are many more properties in a multivariate distribution to be con-

sidered during the test. Among the many possible tests for multivariate normality of a

given data set, eight are reviewed in detail by Mecklin and Mundfrom [2003]. They exam-

ined the power of the eight tests using a Monte Carlo study for several data sets that had

pre-determined multivariate distributions. They recommend the use of the Henze-Zirkler

test [Henze and Zirkler, 1990] as a formal test of multivariate normality, complemented

by other test procedures such as Mardia’s skewness and kurtosis tests [Mardia, 1970].

Multivariate normality can also be tested using the Chi-square plot (also known as the

gamma plot) [Johnson and Wichern, 2007], which is a multivariate equivalent of the nor-

mal Q-Q plot. The procedure to obtain the Chi-square plot is similar to that used for a

normal Q-Q plot except that squared Mahalanobis distances [Mardia et al., 1979] of data

samples are used in place of the data quantiles and a theoretical Chi-square distribution is

used in place of the theoretical normal distribution. A departure from linearity indicates

departure from multivariate normality. In this work, however, only the three more quantitative

tests, namely, the Henze-Zirkler test and Mardia’s tests of skewness and of kurtosis, are

used. These three tests are described in the following paragraphs.
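Although the Chi-square plot is not used in this work, its construction can be sketched as follows for readers who wish to apply it. The function name `chi_square_plot_points` is hypothetical, the plotting positions (k − 1/2)/n are an assumed convention, and the data are synthetic multivariate normal samples rather than ground-motion residuals.

```python
import numpy as np
from scipy.stats import chi2


def chi_square_plot_points(X):
    """Return (chi-square quantiles, ordered squared Mahalanobis distances)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    diff = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))  # inverse sample covariance
    # Squared Mahalanobis distance of each sample from the sample mean
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
    d2.sort()
    p = (np.arange(1, n + 1) - 0.5) / n  # assumed plotting positions
    return chi2.ppf(p, df=d), d2


rng = np.random.default_rng(2)
q, d2 = chi_square_plot_points(rng.normal(size=(1000, 3)))  # synthetic data
r = np.corrcoef(q, d2)[0, 1]
print(f"correlation between plotted coordinates: {r:.3f}")
```

Plotting `d2` against `q` gives the Chi-square plot; a correlation close to one between the two coordinates corresponds to the near-linear pattern expected under multivariate normality.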

2.4.1 Henze-Zirkler test

Henze and Zirkler [1990] proposed a class of invariant consistent tests for testing multi-

variate normality. The test procedure is based on the computation of a defined test statistic

which is a function of the given data and whose asymptotic distribution is known if the data

follows a multivariate normal distribution. The statistic can be compared to the asymptotic

distribution to test whether the data set can be reasonably assumed to be normal. The

Henze-Zirkler test statistic is defined as follows: Let $X_1, X_2, \ldots, X_n$ be a set of n independent data samples (i.e., $X_1, X_2, \ldots, X_n$ are obtained from n independent records), each of dimension d (i.e., $X_i = \{X_{i1}, X_{i2}, \ldots, X_{id}\}$). It is to be noted that the variables $X_{i j_1}$ and $X_{i j_2}$ could be correlated.

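$$T_{n,\beta} = \frac{1}{n}\sum_{k=1}^{n}\sum_{j=1}^{n}\exp\left(-\frac{\beta^2}{2}\left\|Y_j - Y_k\right\|^2\right) - 2\left(1+\beta^2\right)^{-d/2}\sum_{j=1}^{n}\exp\left(-\frac{\beta^2}{2(1+\beta^2)}\left\|Y_j\right\|^2\right) + n\left(1+2\beta^2\right)^{-d/2} \tag{2.3}$$

where

$$\beta = \frac{1}{\sqrt{2}}\left(\frac{2d+1}{4}\right)^{\frac{1}{d+4}} n^{\frac{1}{d+4}}$$

$$\left\|Y_j - Y_k\right\|^2 = \left(X_j - X_k\right)' S^{-1} \left(X_j - X_k\right)$$

$$\left\|Y_j\right\|^2 = \left(X_j - \bar{X}_n\right)' S^{-1} \left(X_j - \bar{X}_n\right)$$

where $T_{n,\beta}$ is the test statistic; $\bar{X}_n$ is the sample mean vector of the n realizations $X_1, \ldots, X_n$ and S is the sample covariance matrix defined as

$$S = \frac{1}{n}\sum_{j=1}^{n}\left(X_j - \bar{X}_n\right)\left(X_j - \bar{X}_n\right)'$$

Henze and Zirkler [1990] also approximated the limiting distribution of $T_{n,\beta}$ (given the multivariate normality of X) with a lognormal distribution with the mean and the variance defined as follows: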

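$$E\left[T_\beta\right] = 1 - \left(1+2\beta^2\right)^{-d/2}\left[1 + \frac{d\beta^2}{1+2\beta^2} + \frac{d(d+2)\beta^4}{2\left(1+2\beta^2\right)^2}\right] \tag{2.4}$$

$$Var\left[T_\beta\right] = 2\left(1+4\beta^2\right)^{-d/2} + 2\left(1+2\beta^2\right)^{-d}\left[1 + \frac{2d\beta^4}{\left(1+2\beta^2\right)^2} + \frac{3d(d+2)\beta^8}{4\left(1+2\beta^2\right)^4}\right] - 4\,w(\beta)^{-d/2}\left[1 + \frac{3d\beta^4}{2w(\beta)} + \frac{d(d+2)\beta^8}{2w(\beta)^2}\right] \tag{2.5}$$

where $w(\beta) = \left(1+\beta^2\right)\left(1+3\beta^2\right)$.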

Based on the value of the statistic computed using the data and the asymptotic distribu-

tion of Tn,β , the p-value of the test of multivariate normality can be calculated. The p-value

is the probability of obtaining a statistic value that is at least as extreme as the statistic

computed from the data, if the null hypothesis of multivariate normality were true. The


smaller the p-value, the stronger the evidence against the null hypothesis. It is suggested

that this test be used if the sample size n is at least 20 [Henze and Zirkler, 1990].
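The computation described by Equations 2.3 to 2.5 can be sketched as follows. This is a minimal illustration assuming NumPy; the function name `henze_zirkler` is hypothetical, and the p-value is taken from the upper tail of a lognormal distribution whose mean and variance are matched to Equations 2.4 and 2.5, which is one reading of the procedure summarized above.

```python
import math
from statistics import NormalDist

import numpy as np


def henze_zirkler(X):
    """Return (T, p_value) for an (n, d) array of independent samples."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n  # sample covariance, 1/n convention
    S_inv = np.linalg.inv(S)

    beta = (1.0 / math.sqrt(2.0)) * ((2 * d + 1) * n / 4.0) ** (1.0 / (d + 4))
    b2 = beta ** 2

    # G[j, k] = (X_j - mean)' S^-1 (X_k - mean)
    G = centered @ S_inv @ centered.T
    norms = np.diag(G)                               # ||Y_j||^2
    pair = norms[:, None] + norms[None, :] - 2 * G   # ||Y_j - Y_k||^2

    # Equation 2.3
    term1 = np.exp(-b2 / 2.0 * pair).sum() / n
    term2 = 2.0 * (1 + b2) ** (-d / 2) * np.exp(-b2 / (2 * (1 + b2)) * norms).sum()
    term3 = n * (1 + 2 * b2) ** (-d / 2)
    T = float(term1 - term2 + term3)

    # Equations 2.4 and 2.5
    a = 1 + 2 * b2
    w = (1 + b2) * (1 + 3 * b2)
    mean = 1 - a ** (-d / 2) * (1 + d * b2 / a + d * (d + 2) * b2 ** 2 / (2 * a ** 2))
    var = (2 * (1 + 4 * b2) ** (-d / 2)
           + 2 * a ** (-d) * (1 + 2 * d * b2 ** 2 / a ** 2
                              + 3 * d * (d + 2) * b2 ** 4 / (4 * a ** 4))
           - 4 * w ** (-d / 2) * (1 + 3 * d * b2 ** 2 / (2 * w)
                                  + d * (d + 2) * b2 ** 4 / (2 * w ** 2)))

    # Lognormal distribution with matching mean and variance; upper-tail p-value.
    sigma2 = math.log(1 + var / mean ** 2)
    mu = math.log(mean) - sigma2 / 2
    p_value = 1 - NormalDist(mu, math.sqrt(sigma2)).cdf(math.log(T))
    return T, p_value
```

Because the statistic depends on the data only through Mahalanobis-type distances, it is unchanged by any invertible affine transformation of the samples, which gives a convenient sanity check for an implementation.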

2.4.2 Mardia’s measures of kurtosis and skewness

Mardia [1970] extended the concepts of kurtosis and skewness from the univariate case

to the multivariate case. Mardia [1970] also obtained the asymptotic distribution of the

multivariate kurtosis and skewness parameters (which is needed to test the null hypothesis

of multivariate normality).

Multivariate kurtosis

Mardia [1970] defined the multivariate kurtosis coefficient as follows:

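$$K = E\left\{\left[(\mathbf{X}-\boldsymbol{\mu})'\,\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu})\right]^2\right\} \tag{2.6}$$

where $\mathbf{X} = [X_1, X_2, \ldots, X_d]$ is the random vector whose distribution is tested; $\boldsymbol{\mu}$ is the mean vector of $\mathbf{X}$; $(\mathbf{X}-\boldsymbol{\mu})'$ refers to the transpose of $(\mathbf{X}-\boldsymbol{\mu})$ and $\boldsymbol{\Sigma}$ is the covariance matrix of $\mathbf{X}$. In practice, the value of multivariate kurtosis can be computed from the sample data as follows:

$$k = \frac{1}{n}\sum_{i=1}^{n}\left[\left(X_i - \bar{X}_n\right)' S^{-1}\left(X_i - \bar{X}_n\right)\right]^2 \tag{2.7}$$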

Mardia [1970] also showed that the asymptotic distribution of the above-defined mul-

tivariate kurtosis parameter (k) can be obtained from the following equation, if X follows

the multivariate normal distribution:

$$\frac{k - d(d+2)(n-1)/(n+1)}{\left(8d(d+2)/n\right)^{0.5}} \Rightarrow N(0,1) \tag{2.8}$$

where N(0,1) denotes the univariate standard normal distribution. The asymptotic distri-

bution can be used to test if the sample data are from a multivariate normally distributed

population, by allowing a p-value to be computed.
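Equations 2.7 and 2.8 can be evaluated as in the following minimal sketch, which assumes NumPy. The function name `mardia_kurtosis` is hypothetical, and the use of a two-sided p-value is an assumption, since the text above does not state the sidedness of the test.

```python
import math
from statistics import NormalDist

import numpy as np


def mardia_kurtosis(X):
    """Return (k, z, p_value) for an (n, d) array of independent samples."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n  # sample covariance, 1/n convention
    # Squared Mahalanobis distance of each sample from the sample mean.
    d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(S), centered)
    k = float(np.mean(d2 ** 2))                                   # Equation 2.7
    z = (k - d * (d + 2) * (n - 1) / (n + 1)) / math.sqrt(8 * d * (d + 2) / n)  # Equation 2.8
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided (assumed)
    return k, z, p_value
```

For multivariate normal data, k should lie close to d(d + 2), for example close to 15 when d = 3.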


Multivariate skewness

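Mardia [1970] and Mardia et al. [1979] defined the measure of multivariate skewness as follows:

$$S = E\left\{\left[(\mathbf{X}_1-\boldsymbol{\mu})'\,\boldsymbol{\Sigma}^{-1}(\mathbf{X}_2-\boldsymbol{\mu})\right]^3\right\} \tag{2.9}$$

where $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent and identically distributed copies of the random vector $\mathbf{X}$ whose distribution is tested. This parameter can be computed from the sample data as follows:

$$s = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left[\left(X_i - \bar{X}_n\right)' S^{-1}\left(X_j - \bar{X}_n\right)\right]^3 \tag{2.10}$$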

The asymptotic distribution of the multivariate skewness parameter (s) can be obtained

from the following equation:

$$\frac{ns}{6} \Rightarrow \chi^2_{d(d+1)(d+2)/6} \tag{2.11}$$

where $\chi^2_{d(d+1)(d+2)/6}$ is the Chi-square distribution with d(d+1)(d+2)/6 degrees of freedom. This asymptotic distribution can be used to test the null hypothesis of multivariate

normality.
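A minimal sketch of Equations 2.10 and 2.11 follows, assuming NumPy and SciPy. The function name `mardia_skewness` is hypothetical; the p-value is the upper-tail probability of the Chi-square distribution.

```python
import numpy as np
from scipy.stats import chi2


def mardia_skewness(X):
    """Return (s, statistic, dof, p_value) for an (n, d) array of independent samples."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n  # sample covariance, 1/n convention
    # G[i, j] = (X_i - mean)' S^-1 (X_j - mean)
    G = centered @ np.linalg.inv(S) @ centered.T
    s = float(np.sum(G ** 3)) / n ** 2       # Equation 2.10
    statistic = n * s / 6.0                  # left-hand side of Equation 2.11
    dof = d * (d + 1) * (d + 2) / 6.0
    p_value = float(chi2.sf(statistic, dof))
    return s, statistic, dof, p_value
```

The double sum requires an n-by-n matrix, which is inexpensive for the sample sizes of a few tens of records discussed in this chapter.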

The above procedures can be used to test the multivariate normality of any random vec-

tor using a set of independent data samples. For instance, these tests can be used to verify

the multivariate normality of intra-event residuals computed at multiple periods. In this

case, in order to obtain a set of independent data samples, each random vector (comprising intra-event residuals computed at multiple periods) must be obtained from records

that are independent of one another. A technique to obtain independent data samples is

discussed in a subsequent section.


As mentioned earlier, multivariate normality tests need to be performed on intra-event and

inter-event residuals in order to verify multivariate normality of the logarithmic spectral

accelerations. The intra-event residuals are normalized by the appropriate standard devia-

tions before use, while the inter-event residuals are used without normalization, for reasons


mentioned previously.

Normalized intra-event residuals at different periods

Let ε(T) = [ε(T1), ε(T2), · · · , ε(Td)] denote the random vector of normalized intra-event

residuals computed at d different periods. During an earthquake, different sites experience

different levels of ground motion based on their distance from the earthquake source, the

local soil conditions and other factors. These ground motions can be used to compute

samples (ej(T)) of the random vector ε(T) at site j. This section uses the samples ej(T) obtained at various sites to test whether ε(T) follows a multivariate normal distribution.

The results presented in this work are based on data from the 1994 Northridge earth-

quake and the 1999 Chi-Chi earthquake. The PEER NGA Database [2005] is used to

obtain the data and contains 160 records from the Northridge earthquake and 421 records

from the Chi-Chi earthquake (the aftershock data are not used). From these records, only

those used by the authors of the Campbell and Bozorgnia [2008] ground-motion model are

included in the analysis. Even this reduced data set cannot be used as such because the

samples will not be independent of one another on account of the spatial correlation of the

ground motion during a given earthquake. It is known, however, that the correlation between ei(Tp) and ej(Tp) decreases with increasing separation distance between the sites i and j, where Tp denotes any particular period. It is seen from the literature that the correlation coefficient drops close to zero (i.e., the ε(Tp)’s are approximately uncorrelated) when the separation distance exceeds 10 km [Boore et al., 2003]. Moreover, it is shown subsequently in this chapter that the ε(Tp)’s obtained at different sites from a single earthquake

follow a multivariate normal distribution. Hence, approximately uncorrelated ε(Tp) values

are also approximately independent, and, therefore, samples of random vectors obtained

from recordings at mutually well-separated sites would be approximately independent and

can be used in the tests described in the previous section. Therefore, in the current work,

well-separated locations (with separation distances exceeding 20km) are identified for the

Northridge earthquake and the Chi-Chi earthquake and the tests of normality are performed

on the data set obtained by combining the Chi-Chi and the Northridge earthquake data.

There are several possible combinations of recordings that would satisfy the constraints on

the minimum separation distance and the minimum sample size (as defined in Section 2.4)


and hence the tests are carried out on the various allowable configurations. Though the test

results vary slightly based on the configuration used, p-values from only a single data set

are reported in this chapter. The combined data set has around 35 records at periods less

than or equal to 2 seconds and close to 30 records at periods below 7.5 seconds, which are

reasonable sample sizes for testing the hypothesis. At 10 seconds, however, the number of

independent samples available is 22, which barely exceeds the threshold of 20, mentioned

in Section 2.4. Hence, ε values computed at 10 seconds are seldom used in the tests.
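The text does not state how the allowable configurations of well-separated recordings were identified. One simple possibility is a greedy selection, sketched below under stated assumptions: station coordinates are given in kilometres on a plane, the function name `select_well_separated` is hypothetical, and different visiting orders yield different allowable configurations.

```python
import numpy as np


def select_well_separated(coords_km, min_sep_km=20.0, rng=None):
    """Greedily pick station indices whose pairwise separations exceed min_sep_km.

    coords_km is an (n, 2) array of planar station coordinates in km (assumed).
    Passing a NumPy random Generator as rng shuffles the visiting order.
    """
    coords = np.asarray(coords_km, dtype=float)
    order = np.arange(len(coords)) if rng is None else rng.permutation(len(coords))
    chosen = []
    for i in order:
        # Keep station i only if it is far enough from every station kept so far.
        if all(np.linalg.norm(coords[i] - coords[j]) > min_sep_km for j in chosen):
            chosen.append(int(i))
    return chosen


# Hypothetical station coordinates (km), for illustration only.
stations = np.array([[0.0, 0.0], [5.0, 5.0], [30.0, 0.0], [30.0, 25.0], [31.0, 26.0]])
subset = select_well_separated(stations, min_sep_km=20.0)
```

Repeating the selection with different random orders is one way to generate several configurations on which the tests can be rerun.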

In order to strictly prove multivariate normality of ε , one must evaluate multivariate

normality of normalized residuals having all possible period combinations (i.e., all pairs,

triplets etc.). For all practical purposes, however, it is sufficient to consider the joint dis-

tribution of ε’s computed at five periods. Incidentally, if multivariate normality can be

established for such a case, it can be inferred that the lower-order combinations (i.e., sub-

sets of the five periods that are used) also follow a multivariate normal distribution and

do not have to be tested explicitly. This is because all subsets of a random vector X are

multivariate normal if X is multivariate normal [Johnson and Wichern, 2007].

Results from a set of hypothesis tests are shown in Table 2.1 and explained in the following paragraphs. The table shows the set of periods at which the ε values are computed and the p-values obtained based on the Henze-Zirkler test, Mardia’s test of skewness and Mardia’s test of kurtosis. Case 1 shown in the table corresponds to the

bivariate normality tests on the ε’s obtained at 1 second and 2 seconds. The p-values re-

ported by all three tests are statistically insignificant at the 5% significance level typically

used for testing. In Case 2, five different periods ranging between 0.5 seconds and 2 sec-

onds are chosen. The Henze-Zirkler test and the test of skewness report highly insignificant

p-values, and the test of kurtosis reports a p-value of 0.05, which is insignificant as well.

The normality tests are also performed considering long periods. In Case 3, the periods are chosen over the 0.5-7.5 seconds range, as shown in Table 2.1. The p-values reported by all three tests are highly statistically insignificant. Finally, a test is carried out considering long periods exclusively (Case 4); the p-values obtained from all the tests are also statistically insignificant. Overall, there is little evidence to reject the null hypothesis

that ε computed at different periods follows a multivariate normal distribution.


Figure 2.3: The normal Q-Q plot of the pooled set of normalized intra-event residuals.

Table 2.1: Tests on normalized intra-event residuals computed at different periods

Case  Periods (secs)              PHZ   PSK   PKT
1     T={1.0,2.0}                 0.10  0.23  0.93
2     T={0.5,0.75,1.0,1.5,2.0}    0.49  0.92  0.05
3     T={0.5,1.0,2.0,5.0,7.5}     0.69  0.90  0.42
4     T={5.0,7.5,10.0}            0.19  0.14  0.62

Explanation of abbreviations used in the table:
PHZ: p-value obtained from Henze-Zirkler test
PSK: p-value obtained from Mardia’s test of skewness
PKT: p-value obtained from Mardia’s test of kurtosis


Figure 2.4: The normal Q-Q plots of inter-event residuals at four different periods. (a) T = 0.5 seconds (64 samples) (b) T = 1.0 seconds (64 samples) (c) T = 2.0 seconds (62 samples) (d) T = 10.0 seconds (21 samples).


Inter-event residuals at different periods

This section discusses tests carried out on inter-event residuals (η) at multiple periods. The

number of inter-event residuals available for the tests ranges from 64 at 0.5 seconds to 40

at 7.5 seconds. Only 21 records are available, however, at 10 seconds.

Table 2.2 shows the hypothesis test results based on η values. In Case 1, η values at

two periods, 1 second and 2 seconds, are tested for bivariate normality. It can be seen that

the p-values reported by all three tests are highly insignificant. In Case 2, five different

periods are chosen ranging between 0.5 and 2 seconds. The table shows that the p-values

reported by all three tests are statistically significant. The authors believe, however, that

this is a result of the deviations from marginal normality due to the small sample size being

carried over to the higher-order distributions (i.e., even if the true marginal distribution is

normal, a sample from the distribution will not be exactly normal). In order to verify this,

the η values are again computed at the same set of periods as in Case 2 and are trans-

formed so that their marginal distributions are normal (in order to remove the deviations in

the sample’s univariate distribution from the normal distribution), using the normal score

transform procedure described by Deutsch and Journel [1998]. It is to be noted that the

normal score transform (or any other monotonic transform) of the univariate distribution

cannot change the basic nature of the bivariate and the other multivariate distributions.

Further, the marginal distribution of η has been shown to be normal in section 2.3 and

hence, the transformation of the marginal distribution of the sampled data does not inter-

fere with the tests for multivariate normality. This transformation procedure is described in

Appendix 2.8. The tests are performed on the transformed data (Case 3) and the p-values

corresponding to all three tests are seen to increase significantly, indicating that the statistically significant p-values in Case 2 are probably a result of the deviation of the sample’s

marginal distribution from a normal distribution rather than an indicator of non-normality

in the joint distribution.

Case 4 involves testing η values at five periods ranging from 0.5 to 7.5 seconds. The

reported p-values are, again, found to be insignificant. In Case 5, η’s at three long periods

are tested for multivariate normality. The p-values reported by the three tests are highly

statistically insignificant. It can, hence, be concluded from the results that it is reasonable to


assume that the η’s computed at different periods follow a multivariate normal distribution.

Since both the inter-event and intra-event residuals computed at multiple periods follow

multivariate normal distributions, it is concluded that the logarithmic spectral accelerations

computed at different periods, at a given site during a given earthquake, follow a multivari-

ate normal distribution.

Spectral acceleration values at different orientations

This section describes tests carried out to verify whether spectral acceleration values corre-

sponding to two different orientations at a site follow a bivariate normal distribution. The

test procedures are identical to those described in section 2.4, except that the random vector would now be written as $\left[S_a^{H1}(T_1), S_a^{H2}(T_2)\right]$, where H1 and H2 refer to two orthogonal

horizontal orientations (e.g., the fault-normal and the fault-parallel directions) and T1 and

T2 denote the periods in consideration in the two orthogonal directions.

In order to verify bivariate normality of the spectral accelerations corresponding to

two different orientations, normality tests should be carried out on the inter-event and the

intra-event residuals separately. The inter-event residuals in the fault-normal and the fault-

parallel directions, however, are not known. As a result, an approximate test for bivariate

normality of spectral accelerations in different orientations is carried out by performing

tests on normalized total residuals. Total residuals are computed based on the following

alternate formulation of the ground-motion equations:

$$\ln(Y) = \ln(\bar{Y}) + \delta \tag{2.12}$$

where $Y$ denotes the ground-motion parameter of interest; $\bar{Y}$ denotes the predicted median value of the ground-motion parameter; δ refers to the total residual, which is a random variable that represents both the inter-event and the intra-event residuals. From equations 2.1 and 2.12, it can be inferred that δ has zero mean and standard deviation $\sqrt{\sigma^2 + \tau^2}$. Hence, normalized total residuals can be obtained as $\delta / \sqrt{\sigma^2 + \tau^2}$.

In this work, δ values are computed using the fault-normal and the fault-parallel time

histories observed during the Chi-Chi and the Northridge earthquakes [Chiou et al., 2008].


As mentioned earlier, the tests described in section 2.4 require independent data samples

and hence, pairs of fault-normal and fault-parallel residuals are computed at well-separated

sites (separation distances exceeding 20km).
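As a small worked illustration of Equation 2.12 and the normalization that follows it, the sketch below computes a normalized total residual for a single observation. The function name and all numerical values are hypothetical and are not taken from the data sets used in this work.

```python
import math


def normalized_total_residual(sa_observed, sa_median, sigma, tau):
    """Total residual of Equation 2.12 divided by sqrt(sigma^2 + tau^2)."""
    delta = math.log(sa_observed) - math.log(sa_median)
    return delta / math.sqrt(sigma ** 2 + tau ** 2)


# Hypothetical values: observed 0.30 g, predicted median 0.20 g,
# intra-event standard deviation 0.5 and inter-event standard deviation 0.3.
example = normalized_total_residual(0.30, 0.20, sigma=0.5, tau=0.3)
```

An observation equal to the predicted median gives a normalized total residual of zero.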

Table 2.3 shows a sample of the multivariate normality test results obtained when δ

values are computed at different orientations (fault-normal and fault-parallel) and/or different periods. In Case 1, the δ values corresponding to the fault-normal direction and the

fault-parallel direction are computed at the same period (2 seconds). The three tests of

multivariate normality report insignificant p-values in this case. In Case 2, the δ ’s corre-

sponding to the fault-normal and the fault-parallel directions are computed at 2 different

periods. All three tests report insignificant p-values in Case 2 as well. Finally, it is intended

to check if a larger separation in the periods affects the bivariate distributional properties.

Hence, in Case 3, the fault normal δ values are computed at 0.5 seconds, while the fault-

parallel δ values are computed at 10 seconds. It can be seen from the table that the p-values

are highly insignificant in this case as well.

2.5 Testing the assumption of multivariate normality for spatially distributed data

The tests that have been described so far are only valid for testing random vectors using

independent samples. While testing spatially-distributed data from a given earthquake,

ground-motion recordings at closely-separated sites should also be considered and hence, it

is not possible to obtain independent samples using the techniques described in section 2.4.

Hence, certain other tests are needed for testing the multivariate normality assumption for

ground-motion intensities distributed over space. Multivariate normality can be ascertained

by verifying univariate normality, bivariate normality, trivariate normality etc. Goovaerts

[1997] and Deutsch and Journel [1998] described a procedure to test the assumption of

bivariate normality of spatially-distributed data whose marginal distribution is standard

normal. This test procedure can be used to verify whether pairs of residuals computed at

two different sites during a single earthquake follow a bivariate normal distribution. The

test is described in the following subsection, followed by test results from recorded ground


motions.

2.5.1 Check for bivariate normality

Let X(u) denote the random variable (for example, the residuals) in consideration at lo-

cation u and let X(u+ h) denote the random variable in consideration at location u+ h

(h denotes the spatial separation between the 2 locations). The procedure to test bivari-

ate normality [Goovaerts, 1997, Deutsch and Journel, 1998] involves the comparison of

the indicator semivariogram of the data (the experimental indicator semivariogram) to the

theoretical indicator semivariogram obtained by assuming that (X(u),X(u+h)) follows a

bivariate normal distribution.

An indicator semivariogram is a measure of spatial variability and is defined as follows:

$$\gamma_I(h; x_p) = \frac{1}{2} E\left(\left[I(X(u+h); x_p) - I(X(u); x_p)\right]^2\right) \tag{2.13}$$

where $x_p$ denotes the p-quantile of X, and $I(X(u); x_p) = 1$ if $X(u) \le x_p$; $= 0$ otherwise.

The experimental indicator semivariogram is a regression-based relationship between

γI (h;xp) and h. In this study, an exponential model is assumed as the form of the regression.

Based on an exponential model, the experimental indicator semivariogram can be defined

as follows:

$$\gamma_I(h; x_p) = a_{x_p}\left[1 - \exp\left(-3h / b_{x_p}\right)\right] \tag{2.14}$$

where $a_{x_p}$ and $b_{x_p}$ are the sill and the range of the experimental indicator semivariogram

respectively. The sill of a semivariogram equals the variance of X , while the range of a

semivariogram is defined as the separation distance h at which γI (h;xp) equals 0.95 times

the sill (for the exponential model). The range and the sill can be computed using non-

linear least squares regression based on observed values of γI (h;xp) and h. The values

(observed) of γI (h;xp) for a given data set can be obtained as follows (based on Equation

2.13):

$$\gamma_I(h; x_p) = \frac{1}{2N(h)}\sum_{\alpha=1}^{N(h)}\left[I(X(u_\alpha + h); x_p) - I(X(u_\alpha); x_p)\right]^2 \tag{2.15}$$

where N(h) is the number of pairs of data points separated by h (within some tolerance); and $(X(u_\alpha + h), X(u_\alpha))$ denotes the α-th such pair.
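Equations 2.14 and 2.15 can be sketched as follows, assuming NumPy and SciPy. The function names, the planar coordinates in kilometres, the treatment of h as a scalar separation distance (i.e., isotropy), the lag tolerance and the starting values of the fit are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit


def indicator_semivariogram(coords_km, values, p, lags_km, tol_km):
    """Observed indicator semivariogram (Equation 2.15) at the requested lags."""
    coords = np.asarray(coords_km, dtype=float)
    values = np.asarray(values, dtype=float)
    x_p = np.quantile(values, p)              # sample p-quantile of the data
    ind = (values <= x_p).astype(float)       # indicator I(X(u); x_p)
    iu, ju = np.triu_indices(len(values), k=1)  # every pair of sites, once
    dist = np.linalg.norm(coords[iu] - coords[ju], axis=1)
    sq = (ind[iu] - ind[ju]) ** 2
    gamma = []
    for h in lags_km:
        mask = np.abs(dist - h) <= tol_km     # pairs separated by h within tolerance
        gamma.append(sq[mask].sum() / (2 * mask.sum()) if mask.any() else np.nan)
    return np.array(gamma)


def exponential_model(h, sill, rng_km):
    """Exponential semivariogram model of Equation 2.14."""
    return sill * (1.0 - np.exp(-3.0 * h / rng_km))


def fit_exponential(lags_km, gamma):
    """Non-linear least squares estimate of (sill, range); NaN lags are dropped."""
    lags = np.asarray(lags_km, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    keep = ~np.isnan(gamma)
    start = (gamma[keep].max(), lags[keep].mean())  # assumed starting values
    params, _ = curve_fit(exponential_model, lags[keep], gamma[keep], p0=start)
    return params
```

For data without spatial correlation, the observed values fluctuate around p(1 - p) at every lag, so the range cannot be identified from such data; the fit is meaningful only when the observed values rise with distance.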

Theoretically, if X(u) and X(u+h) follow a bivariate normal distribution, the indicator

semivariogram is [Goovaerts, 1997]:

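$$\gamma_I(h; x_p) = p - \left[p^2 + \frac{1}{2\pi}\int_0^{\sin^{-1} C_X(h)} \exp\left(\frac{-x_p^2}{1+\sin(\theta)}\right) d\theta\right] \tag{2.16}$$

where $C_X(h)$ denotes the covariance model of X, given as follows:

$$C_X(h) = Covariance\left(X(u), X(u+h)\right) \tag{2.17}$$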

The null hypothesis that X(u) and X(u+ h) follow a bivariate normal distribution is

not rejected if the experimental indicator semivariogram compares well to the theoretical

indicator semivariogram.
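The theoretical indicator semivariogram of Equation 2.16 has no closed form in general, but it is easy to evaluate numerically. The sketch below uses only the Python standard library and a midpoint rule; the function name and the number of integration steps are assumptions. Since X has a standard normal marginal distribution, the covariance C_X(h) is passed in as a correlation value `rho` between -1 and 1.

```python
import math
from statistics import NormalDist


def theoretical_indicator_semivariogram(p, rho, n_steps=2000):
    """Evaluate Equation 2.16 for quantile level p (0 < p < 1) and covariance rho."""
    x_p = NormalDist().inv_cdf(p)      # p-quantile of the standard normal
    upper = math.asin(rho)             # upper integration limit
    step = upper / n_steps
    # Midpoint rule for the integral over theta from 0 to asin(rho).
    integral = step * sum(
        math.exp(-x_p ** 2 / (1.0 + math.sin((i + 0.5) * step)))
        for i in range(n_steps)
    )
    return p - (p ** 2 + integral / (2.0 * math.pi))
```

Two limiting cases are useful checks: with rho = 0 the function returns p(1 - p), and with rho = 1 (zero separation) it returns a value that is zero to within the integration error.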

As mentioned earlier, univariate and bivariate normality are not sufficient conditions

for multivariate normality. For realistic data sets, however, the tests for trivariate normality

and normality at other higher dimensions are impractical. This is because, for example, the

trivariate normality test requires many triplets of data points that have the same geometric

configuration (in terms of the spatial orientation of the three points), which are usually

not available. Hence, in practice, if the sample statistics do not show a violation of the

univariate and bivariate normalities, a multivariate normal model can be assumed for X

[Goovaerts, 1997].


If the spatially-distributed normalized intra-event residuals (ε) follow a multivariate normal

distribution, it can be seen from equation 2.2 that the logarithmic spectral accelerations

conditioned on the predicted median spectral accelerations will be multivariate normal as

well. This is because the inter-event residuals at any particular period are constant across

all sites, during any single earthquake. Hence, in this section, normality tests are carried

out on the normalized intra-event residuals (ε) only.


It has been shown previously that the ε values can be represented by a normal distribu-

tion marginally, and hence, only the bivariate normality test results are considered in this

section. To prevent the deviations in the sample’s univariate distribution from the normal

distribution (which can arise even if the population actually follows a univariate normal

distribution) from affecting the results of the bivariate normality test, the univariate distri-

butions of ε are transformed to the standard normal space using the normal score transform

procedure. As mentioned earlier, the normal score transform of the univariate distribution

does not change the basic nature of the bivariate distributions and hence, does not interfere

with the test of bivariate normality.

The procedure to test the bivariate normality of spatially-distributed data described by

Goovaerts [1997] involves comparing the theoretical and the experimental indicator semi-

variograms obtained based on the ε values computed at various periods and for all quantiles

xp (Equations 2.14 and 2.16). However, such an exhaustive test is practically impossible and

so a few sample periods and quantiles are tested here. Based on the symmetry of the bi-

variate normal distribution, only values of p in the interval [0, 0.5] are needed. The authors

present results corresponding to p = 0.1, 0.25 and 0.5, so as to cover the entire range. The

periods chosen for the illustrations vary over the range of periods for which the ground-

motion models are usually valid.

Figures 2.5a-c show comparisons of the theoretical and the experimental indicator semi-

variograms obtained using the Chi-Chi data set, with the ε values computed at a period of

2 seconds. It is to be noted that all records (that are usable at the chosen period) can

be part of the sample data used for obtaining the experimental indicator semivariograms

(unlike in section 2.4 where the sample data had to be independent of each other). The the-

oretical and the experimental indicator semivariograms match reasonably well in all cases.

Figure 2.5d shows the comparison of the theoretical and the experimental indicator semi-

variograms (p = 0.25) for the ε values computed at T = 2 seconds based on the Northridge

earthquake data set, and a reasonable match can be seen there as well. Similar plots are

obtained using the Northridge and the Chi-Chi earthquake data sets and are shown in Fig-

ure 2.6. In obtaining this figure, the value of p is kept constant at 0.25, while the value

of T is varied from as low as 0.5 seconds to as high as 5 seconds. A reasonably good

match between the theoretical and the experimental semivariograms can be seen in these


Figure 2.5: Theoretical and empirical semivariograms for residuals computed at 2 seconds: (a) results for the 0.1 quantile of the residuals from the Chi-Chi data (b) results for the 0.25 quantile of the residuals from the Chi-Chi data (c) results for the 0.5 quantile of the residuals from the Chi-Chi data (d) results for the 0.25 quantile of the residuals from the Northridge data. [Each panel plots the experimental and the theoretical indicator semivariograms, γI(h;xp), against distance (km) from 0 to 200 km.]


figures as well. All these results suggest that bivariate normality can be safely assumed for

spatially-distributed ε’s. Incidentally, it can be seen from Figures 2.5 and 2.6 that the sill of

the indicator semivariograms equals p(1− p), which is a consequence of the independence

between well-separated intra-event residuals [Goovaerts, 1997].
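As a brief check of this statement using Equation 2.16: at separations large enough that the covariance $C_X(h)$ vanishes, the upper limit of the integral becomes $\sin^{-1}(0) = 0$, so the integral term drops out:

$$\gamma_I(h; x_p)\Big|_{C_X(h) = 0} = p - \left[p^2 + \frac{1}{2\pi}\int_0^{0} \exp\left(\frac{-x_p^2}{1+\sin(\theta)}\right) d\theta\right] = p - p^2 = p(1-p)$$

For p = 0.1, 0.25 and 0.5, this gives sills of 0.09, 0.1875 and 0.25, respectively.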

2.6 Conclusions

Statistical tests have been used to test the assumption of joint normality of logarithmic

spectral accelerations. Joint normality of logarithmic spectral accelerations was verified

by testing the multivariate normality of inter-event and intra-event residuals. Univariate

normality of inter-event and intra-event residuals was studied using normal Q-Q plots. The

normal Q-Q plots showed strong linearity, indicating that the residuals are well represented

by a normal distribution marginally. No evidence was found to support truncation of the

marginal distribution of intra-event residuals as is sometimes done in PSHA. Using the

Henze-Zirkler test, the Mardia’s test of skewness and the Mardia’s test of kurtosis, it was

shown that inter-event and the intra-event residuals at a site, computed at different periods,

follow multivariate normal distributions. The normality test of Goovaerts was used to il-

lustrate that pairs of spatially-distributed intra-event residuals can be represented by the bi-

variate normal distribution. For a set of correlated spatially-distributed data, it is practically

impossible to ascertain the trivariate normality and the normality at higher dimensions and

hence, the presence of univariate and bivariate normalities is considered to indicate multi-

variate normality of the spatially-distributed intra-event residuals [Goovaerts, 1997]. The

results reported in this study are based on the residuals computed using the ground-motion

model of Campbell and Bozorgnia [2008], but similar results were obtained when using

the Boore and Atkinson [2008] ground-motion model. This study provides a sound statis-

tical basis for assumptions regarding the marginal and joint distribution of ground-motion

parameters that must be made for a variety of seismic hazard calculations.


Figure 2.6: Theoretical and empirical semivariograms for the 0.25 quantile of the residuals: (a) results for the residuals computed at 0.5 seconds from the Northridge data (b) results for the residuals computed at 0.5 seconds from the Chi-Chi data (c) results for the residuals computed at 1 second from the Chi-Chi earthquake data (d) results for the residuals computed at 5 seconds from the Chi-Chi data. [Each panel plots the experimental and the theoretical indicator semivariograms, γI(h;xp), against distance (km) from 0 to 200 km.]


2.7 Data source

The data for all the ground motions studied here came from the PEER NGA Database

[2005].

http://peer.berkeley.edu/nga (last accessed 18 May 2007).

2.8 Appendix: Normal score transform

The data sample can be transformed to have a standard normal distribution by a normal

score transform. The transformation involves equating the various quantiles of the data to

the corresponding quantiles of a standard normal distribution.

Let z represent the given data set and let the empirical cumulative distribution function

of the data be denoted by F(z). The F(z)-quantile of the standard normal distribution

is given by Φ−1 (F(z)), where Φ represents the standard normal cumulative distribution

function. Hence, for a given zk, the corresponding normal score value (yk) is computed as

follows:

$$y_k = \Phi^{-1}\left(F(z_k)\right) \tag{2.18}$$
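A minimal sketch of Equation 2.18 is given below, assuming NumPy. The function name is hypothetical. One practical detail is an assumption of this sketch rather than part of the description above: the empirical distribution function is evaluated as (rank - 0.5)/n instead of rank/n, because rank/n equals 1 for the largest value and the inverse normal distribution function is unbounded there.

```python
from statistics import NormalDist

import numpy as np


def normal_score_transform(z):
    """Map each data value to the standard normal quantile of its empirical CDF value."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Ranks 1..n (ties, if any, are broken by position in the sorted order).
    ranks = np.argsort(np.argsort(z)) + 1
    probs = (ranks - 0.5) / n          # assumed plotting position, strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(pk) for pk in probs])


# Hypothetical data values, for illustration only.
data = np.array([3.2, -1.0, 0.5, 10.0, 2.2])
scores = normal_score_transform(data)
```

The transform is monotonic, so the ordering of the data is preserved while the marginal distribution of the scores is made standard normal.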


Table 2.2: Tests on inter-event residuals computed at different periods

Case  Periods (secs)                     PHZ   PSK   PKT
1     T={1.0,2.0}                        0.85  0.20  0.35
2     T={0.5,0.75,1.0,1.5,2.0}           0.00  0.01  0.01
3     T={0.5,0.75,1.0,1.5,2.0; Norm.}    0.24  0.11  0.11
4     T={0.5,1.0,2.0,5.0,7.5}            0.79  0.28  0.41
5     T={5.0,7.5,10.0}                   0.68  0.18  0.31

Explanation of abbreviations used in the table:
Norm.: Data transformed to the standard normal space

Table 2.3: Tests on residuals corresponding to two orthogonal directions (fault-normal and fault-parallel directions)

Case  Periods (secs)    PHZ   PSK   PKT
1     T1=2;T2=2         0.14  0.13  0.41
2     T1=1;T2=2         0.17  0.34  0.96
3     T1=0.5;T2=10      0.94  0.80  0.22

Chapter 3

Correlation model for spatially-distributed ground-motion intensities

N. Jayaram and J.W. Baker (2009). Correlation model for spatially-distributed ground-

motion intensities, Earthquake Engineering and Structural Dynamics, 38(15), 1687-1708.

3.1 Abstract

Risk assessment of spatially-distributed building portfolios or infrastructure systems re-

quires quantification of the joint occurrence of ground-motion intensities at several sites,

during the same earthquake. The ground-motion models that are used for site-specific haz-

ard analysis do not provide information on the spatial correlation between ground-motion

intensities, which is required for the joint prediction of intensities at multiple sites. More-

over, researchers who have previously computed these correlations using observed ground-

motion recordings differ in their estimates of spatial correlation. In this chapter, ground

motions observed during seven past earthquakes are used to estimate correlations between

spatially-distributed spectral accelerations at various spectral periods. Geostatistical tools

are used to quantify and express the observed correlations in a standard format. The es-

timated correlation model is also compared to previously published results, and apparent

discrepancies among the previous results are explained.


The analysis shows that the spatial correlation reduces with increasing separation be-

tween the sites of interest. The rate of decay of correlation typically decreases with increas-

ing spectral acceleration period. At periods longer than 2 seconds, the correlations were

similar for all the earthquake ground motions considered. At shorter periods, however,

the correlations were found to be related to the local-site conditions (as indicated by site

Vs30 values) at the ground-motion recording stations. The research work also investigates

the assumption of isotropy used in developing the spatial correlation models. It is seen

using Northridge and Chi-Chi earthquake time histories that the isotropy assumption is

reasonable at both long and short periods. Based on the factors identified as influencing the

spatial correlation, a model is developed that can be used to select appropriate correlation

estimates for use in practical risk assessment problems.

3.2 Introduction

The probabilistic assessment of ground-motion intensity measures (such as spectral accel-

eration) at an individual site is a well researched topic. Several ground-motion models

have been developed to predict median ground-motion intensities as well as dispersion

about the median values [e.g., Boore and Atkinson, 2008, Abrahamson and Silva, 2008,

Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]. Site-specific hazard analysis

does not suffice, however, in many applications that require knowledge about the joint oc-

currence of ground-motion intensities at several sites, during the same earthquake. For

instance, the risk assessment of portfolios of buildings or spatially-distributed infrastruc-

ture systems (such as transportation networks, oil and water pipeline networks and power

systems) requires prediction of ground-motion intensities at multiple sites. Such joint pre-

dictions are possible, however, only if the correlations between ground-motion intensities at

different sites are known [e.g., Lee and Kiremidjian, 2007, Bazzurro and Luco, 2004]. The

correlation is known to be large when the sites are close to one another, and decays with

increase in separation between the sites. Park et al. [2007] report that ignoring or underes-

timating these correlations overestimates frequent losses and underestimates rare ones, and

hence, it is important that accurate ground-motion correlation models be developed for loss

assessment purposes. The current work analyzes correlations between the ground-motion


intensities observed in recorded ground motions, in order to identify factors that affect these

correlations, and to select a correlation model that can be used for the joint prediction of

spatially-distributed ground-motion intensities in future earthquakes.

Ground-motion models that predict intensities at an individual site i due to an earth-

quake j take the following form:

$$\ln(Y_{ij}) = \ln(\bar{Y}_{ij}) + \varepsilon_{ij} + \eta_j \qquad (3.1)$$

where $Y_{ij}$ denotes the ground-motion parameter of interest (e.g., $S_a(T)$, the spectral acceleration at period $T$); $\bar{Y}_{ij}$ denotes the median ground-motion intensity predicted by the ground-motion model (which depends on parameters such as magnitude, distance, period and local-site conditions); $\varepsilon_{ij}$ denotes the intra-event residual, which is a random variable with zero mean and standard deviation $\sigma_{ij}$; and $\eta_j$ denotes the inter-event residual, which is a random variable with zero mean and standard deviation $\tau_j$. The standard deviations, $\sigma_{ij}$ and $\tau_j$, are estimated as part of the ground-motion model and are a function of the spectral period of interest, and in some models also a function of the earthquake magnitude and the distance of the site from the rupture. During an earthquake, the inter-event residual ($\eta_j$) computed at any particular period is a constant across all the sites.
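The decomposition in equation 3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the median values and the standard deviations are hypothetical, and the intra-event residuals are drawn independently here purely as a placeholder, since their spatial correlation is the subject of the rest of this chapter.

```python
import math
import random

def simulate_ln_intensity(ln_median, sigma, tau, rng):
    """Simulate ln(Y_ij) at several sites for one earthquake (equation 3.1).

    ln_median: predicted ln-median intensity at each site
    sigma:     intra-event standard deviation at each site
    tau:       inter-event standard deviation for this earthquake
    """
    # One inter-event residual is shared by all sites in the earthquake.
    eta = rng.gauss(0.0, tau)
    # Placeholder assumption: independent intra-event residuals.
    eps = [rng.gauss(0.0, s) for s in sigma]
    ln_y = [m + e + eta for m, e in zip(ln_median, eps)]
    return ln_y, eps, eta

rng = random.Random(0)
# Hypothetical median spectral accelerations (in g) at three sites.
ln_median = [math.log(0.20), math.log(0.15), math.log(0.10)]
ln_y, eps, eta = simulate_ln_intensity(ln_median, [0.5, 0.5, 0.5], 0.3, rng)
```

Every site receives the same value of eta, which is why the inter-event residual cancels when differences between sites are taken later in the chapter.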

Chapter 2 [Jayaram and Baker, 2008] showed that a vector of spatially-distributed intra-event residuals $\boldsymbol{\varepsilon}_j = (\varepsilon_{1j}, \varepsilon_{2j}, \cdots, \varepsilon_{dj})$ follows a multivariate normal distribution. Hence, the distribution of $\boldsymbol{\varepsilon}_j$ can be completely defined using the first two moments of the distribution, namely, the mean and variance of $\boldsymbol{\varepsilon}_j$, and the correlation between all $\varepsilon_{i_1 j}$ and $\varepsilon_{i_2 j}$ pairs. (Alternatively, the distribution can be defined using the mean and the covariance of $\boldsymbol{\varepsilon}_j$, since the covariance completely specifies the variances and correlations.) Since the intra-event residuals are zero-mean random variables, the mean of $\boldsymbol{\varepsilon}_j$ is the zero vector of dimension $d$. The covariance, however, is not entirely known from the ground-motion models since the models only provide the variances of the residuals, and not the correlation between residuals at two different sites.

Researchers, in the past, have computed these correlations using ground-motion time

histories recorded during earthquakes [Goda and Hong, 2008, Wang and Takada, 2005,

Boore et al., 2003]. Boore et al. [2003] used observations of peak ground acceleration


(PGA, which equals Sa(0)) from the 1994 Northridge earthquake to compute the spatial

correlations. Wang and Takada [2005] computed the correlations using observations of

peak ground velocities (PGV) from several earthquakes in Japan and the 1999 Chi-Chi

earthquake. Goda and Hong [2008] used the Northridge and Chi-Chi earthquake ground-

motion records to compute the correlation between PGA residuals, as well as the correlation

between residuals computed from spectral accelerations at three periods between 0.3 sec-

onds and 3 seconds. The results reported by these research works, however, differ in terms

of the rate of decay of correlation with separation distance. For instance, while Boore et al.

[2003] report that the correlation drops to zero at a site separation distance of approxi-

mately 10 km, the non-zero correlations observed by Wang and Takada [2005] extend past

100 km. Further, Goda and Hong [2008] observe differences between the correlation decay

rate estimated using the Northridge earthquake records and the correlation decay rate based

on the Chi-Chi earthquake records. To date, no explanation for these differences has been

identified.

The current work uses observed ground motions to estimate correlations between spec-

tral accelerations at the same period. (Appendix A describes the estimation of cross-

correlations between spectral accelerations at two different periods.) Factors that affect

the rate of decay in the correlation with separation distance are identified. The work also

provides probable explanations for the differing results reported in the literature. In this

study, an emphasis is placed on developing a standard correlation model that can be used

for predicting spatially-distributed ground-motion intensities for risk assessment purposes.

3.3 Modeling correlations using semivariograms

Geostatistical tools are widely used in several fields for modeling spatially-distributed ran-

dom vectors (also called random functions) [Deutsch and Journel, 1998, Goovaerts, 1997].

The current research work takes advantage of this well-developed approach to model the

correlation between spatially-distributed ground-motion intensities. The needed tools are

briefly described in this section.

Let $\mathbf{Z} = (Z_{\mathbf{u}_1}, Z_{\mathbf{u}_2}, \cdots, Z_{\mathbf{u}_d})$ denote a spatially-distributed random function, where $\mathbf{u}_i$ denotes the location of site $i$; $Z_{\mathbf{u}_i}$ is the random variable of interest (in this case, $\varepsilon_{\mathbf{u}_i j}$ from equation 3.1) at site location $\mathbf{u}_i$ and $d$ denotes the total number of sites. The correlation structure of the random function $\mathbf{Z}$ can be represented by a semivariogram, which is a measure of the average dissimilarity between the data [Goovaerts, 1997]. Let $\mathbf{u}$ and $\mathbf{u}'$ denote two sites separated by $\mathbf{h}$. The semivariogram ($\gamma(\mathbf{u},\mathbf{u}')$) is computed as half the expected squared difference between $Z_{\mathbf{u}}$ and $Z_{\mathbf{u}'}$.

$$\gamma(\mathbf{u},\mathbf{u}') = \frac{1}{2} E\left[\{Z_{\mathbf{u}} - Z_{\mathbf{u}'}\}^2\right] \qquad (3.2)$$

The semivariogram defined in equation 3.2 is location-dependent and its inference re-

quires repetitive realizations of Z at locations u and u′. Such repetitive measurements of

{Zu,Zu′} are, however, never available in practice (e.g., in the current application, one

would need repeated observations of ground motions at every pair of sites of interest).

Hence, it is typically assumed that the semivariogram does not depend on site locations u

and u′, but only on their separation h. The stationary semivariogram (γ(h)) can then be

obtained as follows:

$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\{Z_{\mathbf{u}} - Z_{\mathbf{u}+\mathbf{h}}\}^2\right] \qquad (3.3)$$

Equation 3.2 can be replaced with equation 3.3 if the random function (Z) is second-

order stationary. Second-order stationarity implies that (i) the expected value of the random

variable Zu is a constant across space and (ii) the two-point statistics (measures that depend

on Zu and Zu′) depend only on the separation between u and u′, and not on the actual

locations (i.e., the statistics depend on the separation vector h between u and u′ and not on

u and u′ as such). A stationary semivariogram can be estimated from a data set as follows:

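$$\hat{\gamma}(\mathbf{h}) = \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \{z_{\mathbf{u}_\alpha} - z_{\mathbf{u}_\alpha+\mathbf{h}}\}^2 \qquad (3.4)$$

where $\hat{\gamma}(\mathbf{h})$ is the experimental stationary semivariogram (estimated from a data set); $z_{\mathbf{u}}$ denotes the data value at location $\mathbf{u}$; $N(\mathbf{h})$ denotes the number of pairs of sites separated by $\mathbf{h}$; and $\{z_{\mathbf{u}_\alpha}, z_{\mathbf{u}_\alpha+\mathbf{h}}\}$ denotes the $\alpha$'th such pair. A stationary semivariogram is said to be isotropic if it is a function of the separation distance ($h = \|\mathbf{h}\|$) rather than the separation vector $\mathbf{h}$.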

The function $\hat{\gamma}(\mathbf{h})$ provides a set of experimental values for a finite number of separation vectors $\mathbf{h}$. A continuous function must be fitted based on these experimental values in order to deduce semivariogram values for any possible separation $\mathbf{h}$. A valid (permissible) semivariogram function needs to be conditionally negative definite so that the variances and conditional variances corresponding to this semivariogram are non-negative. In order to satisfy this condition, the semivariogram functions are usually chosen to be linear combinations of basic models that are known to be permissible. These include the exponential model, the Gaussian model, the spherical model and the nugget effect model.

Figure 3.1: (a) Parameters of a semivariogram (b) Semivariograms fitted to the same dataset using the manual approach and the method of least squares.

The exponential model, in an isotropic case (i.e., the vector distance h is replaced by a

scalar separation length ‖h‖, also denoted as h), is expressed as follows:

γ(h) = a [1− exp(−3h/b)] (3.5)

where a and b are the sill and the range of the semivariogram function respectively (Figure

3.1a). The sill of a semivariogram equals the variance of Zu, while the range is defined

as the separation distance h at which γ(h) equals 0.95 times the sill of the exponential

semivariogram.


The Gaussian model is as follows:

$$\gamma(h) = a\left[1 - \exp\left(-3h^2/b^2\right)\right] \qquad (3.6)$$

The sill and the range of a Gaussian semivariogram are as defined for an exponential semi-

variogram.

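The spherical model is as follows:

$$\gamma(h) = \begin{cases} a\left[\dfrac{3}{2}\left(\dfrac{h}{b}\right) - \dfrac{1}{2}\left(\dfrac{h}{b}\right)^3\right] & \text{if } h \le b \\ a & \text{otherwise} \end{cases} \qquad (3.7)$$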

where a and b are again the sill and range of the semivariogram, respectively. The range of

a spherical semivariogram is the separation distance at which γ(h) equals a.

The nugget effect model can be described as:

γ(h) = a [I (h > 0)] (3.8)

where I (h > 0) is an indicator variable that equals 1 when h > 0 and equals 0 otherwise.
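The four basic models in equations 3.5 to 3.8 translate directly into code. The following is a minimal sketch; the function and parameter names are illustrative choices, and distances are assumed to be in km.

```python
import math

def exponential_sv(h, sill, range_km):
    """Exponential model, equation 3.5."""
    return sill * (1.0 - math.exp(-3.0 * h / range_km))

def gaussian_sv(h, sill, range_km):
    """Gaussian model, equation 3.6."""
    return sill * (1.0 - math.exp(-3.0 * h**2 / range_km**2))

def spherical_sv(h, sill, range_km):
    """Spherical model, equation 3.7."""
    if h <= range_km:
        r = h / range_km
        return sill * (1.5 * r - 0.5 * r**3)
    return sill

def nugget_sv(h, sill):
    """Nugget effect model, equation 3.8."""
    return sill if h > 0 else 0.0

# Evaluate each bounded model at a separation equal to its range.
for name, model in [("exponential", exponential_sv),
                    ("gaussian", gaussian_sv),
                    ("spherical", spherical_sv)]:
    print(name, round(model(40.0, 1.0, 40.0), 4))
```

At a separation equal to the range, the exponential and Gaussian models evaluate to 1 − exp(−3), about 0.95 of the sill, while the spherical model reaches the sill exactly, which matches the range definitions given above.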

The covariance structure of Z is completely specified by the semivariogram function

and the sill and the range of the semivariogram. It can be theoretically shown that the

following relationship holds [Goovaerts, 1997]:

γ(h) = a(1−ρ (h)) (3.9)

where ρ (h) denotes the correlation coefficient between Zu and Zu+h. It can also be shown

that the sill of the semivariogram equals the variance of Zu. Therefore, it would suffice

to estimate the semivariogram of a random function in order to determine its covariance

structure. Moreover, based on equations 3.5 (for instance) and 3.9, it can be seen that a

large range implies a small rate of increase in γ(h) and therefore, large correlations between

Zu and Zu+h. Further, it can be seen from equation 3.8 that the nugget effect model specifies

zero correlation for all non-zero separation distances.
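To illustrate how a fitted semivariogram determines the covariance structure, the sketch below converts an exponential semivariogram into correlations via equation 3.9, assembles a covariance matrix for a few sites, and draws one correlated residual vector. The site coordinates, the common standard deviation and the range are hypothetical, and the hand-written Cholesky factorization is simply one convenient way to sample a multivariate normal vector without external libraries.

```python
import math
import random

def exp_correlation(h, range_km):
    """rho(h) = 1 - gamma(h)/a for the exponential model (equations 3.5, 3.9)."""
    return math.exp(-3.0 * h / range_km)

def covariance_matrix(sites, sigma, range_km):
    """Covariance of residuals at sites (x, y in km) with common std. dev. sigma."""
    d = len(sites)
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for k in range(d):
            h = math.dist(sites[i], sites[k])
            cov[i][k] = sigma**2 * exp_correlation(h, range_km)
    return cov

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(low[i][m] * low[k][m] for m in range(k))
            if i == k:
                low[i][k] = math.sqrt(a[i][i] - s)
            else:
                low[i][k] = (a[i][k] - s) / low[k][k]
    return low

def sample_residuals(cov, rng):
    """Draw one zero-mean multivariate normal vector with covariance cov."""
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in cov]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(cov))]

sites = [(0.0, 0.0), (5.0, 0.0), (60.0, 0.0)]  # hypothetical coordinates (km)
cov = covariance_matrix(sites, sigma=0.6, range_km=40.0)
eps = sample_residuals(cov, random.Random(1))
```

With a 40 km range, the two sites 5 km apart are strongly correlated, whereas the site 60 km away is nearly independent of the other two.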

In the current work, correlations between ground-motion intensities at different sites


are represented using semivariograms. Ground-motion recordings from past earthquakes

are used to estimate ranges of semivariograms and to identify the factors that could affect

the estimates. Throughout this work, the semivariograms are assumed to be second-order

stationary. Second-order stationarity is assumed so that the data available over the entire

region of interest can be pooled and used for estimating semivariogram sills and ranges. In

the current work, like many other works involving spatial-correlation estimation, the semi-

variograms are also assumed to be isotropic. The assumptions of stationarity and isotropy

are investigated in more detail subsequently in this chapter.

3.4 Computation of semivariogram ranges for intra-event

residuals using empirical data

As mentioned earlier, the covariance of intra-event residuals can be represented using a

semivariogram, whose functional form (e.g., exponential model), sill and range need to

be determined. This section discusses the semivariograms estimated based on observed

ground-motion time histories.

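For a given earthquake, it can be seen from equation 3.1 that,

$$\varepsilon_i + \eta = \ln(Y_i) - \ln(\bar{Y}_i) \qquad (3.10)$$

Let $\tilde{\varepsilon}_i$ denote the normalized intra-event residual at site $i$. (The subscript $j$ in equation 3.1 is no longer used since the residuals used in these calculations are observed during a single earthquake.) $\tilde{\varepsilon}_i$ is computed as follows:

$$\tilde{\varepsilon}_i = \frac{\varepsilon_i}{\sigma_i} \qquad (3.11)$$

where $\sigma_i$ denotes the standard deviation of the intra-event residuals at site $i$. Further, let $\hat{\varepsilon}_i$ denote the sum of the intra-event residual ($\varepsilon_i$) and inter-event residual ($\eta$) normalized by the standard deviation of the intra-event residual ($\sigma_i$). $\hat{\varepsilon}_i$ can be computed as follows:

$$\hat{\varepsilon}_i = \frac{\varepsilon_i + \eta}{\sigma_i} = \frac{\ln(Y_i) - \ln(\bar{Y}_i)}{\sigma_i} \qquad (3.12)$$

While assessing covariances, it is convenient to work with $\tilde{\varepsilon}$'s rather than $\varepsilon$'s, since $\tilde{\varepsilon}$'s are homoscedastic (i.e., constant variance) with unit variance unlike the $\varepsilon$'s.

Since the inter-event residual ($\eta$), computed at any particular period, is a constant across all the sites during a given earthquake, the experimental semivariogram function of $\tilde{\varepsilon}$ can be obtained as follows (based on equation 3.4):

$$\begin{aligned}
\hat{\gamma}(\mathbf{h}) &= \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \left[\tilde{\varepsilon}_{\mathbf{u}_\alpha} - \tilde{\varepsilon}_{\mathbf{u}_\alpha+\mathbf{h}}\right]^2 \\
&= \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \left[\frac{\ln(Y_{\mathbf{u}_\alpha}) - \ln(\bar{Y}_{\mathbf{u}_\alpha}) - \eta}{\sigma_{\mathbf{u}_\alpha}} - \frac{\ln(Y_{\mathbf{u}_\alpha+\mathbf{h}}) - \ln(\bar{Y}_{\mathbf{u}_\alpha+\mathbf{h}}) - \eta}{\sigma_{\mathbf{u}_\alpha+\mathbf{h}}}\right]^2 \\
&\approx \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \left[\frac{\ln(Y_{\mathbf{u}_\alpha}) - \ln(\bar{Y}_{\mathbf{u}_\alpha})}{\sigma_{\mathbf{u}_\alpha}} - \frac{\ln(Y_{\mathbf{u}_\alpha+\mathbf{h}}) - \ln(\bar{Y}_{\mathbf{u}_\alpha+\mathbf{h}})}{\sigma_{\mathbf{u}_\alpha+\mathbf{h}}}\right]^2 \\
&= \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \left[\hat{\varepsilon}_{\mathbf{u}_\alpha} - \hat{\varepsilon}_{\mathbf{u}_\alpha+\mathbf{h}}\right]^2
\end{aligned} \qquad (3.13)$$

where $\hat{\varepsilon}$ is defined by equation 3.12; $(\mathbf{u}_\alpha, \mathbf{u}_\alpha+\mathbf{h})$ denotes the locations of a pair of sites separated by $\mathbf{h}$; $N(\mathbf{h})$ denotes the number of such pairs; $Y_{\mathbf{u}_\alpha}$ denotes the ground-motion intensity at location $\mathbf{u}_\alpha$; and $\sigma_{\mathbf{u}_\alpha}$ is the standard deviation of the intra-event residual at location $\mathbf{u}_\alpha$. The sill of the semivariogram of $\tilde{\varepsilon}$ (i.e., the sill of $\hat{\gamma}(h)$) should equal 1 since the $\tilde{\varepsilon}$'s have a unit variance. Hence, based on equation 3.9, it can be concluded that:

$$\hat{\gamma}(h) = 1 - \hat{\rho}(h) \qquad (3.14)$$

where $\hat{\rho}(h)$ is the estimate of $\rho(h)$.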

Incidentally, equation 3.13 shows that the covariances of intra-event residuals can be estimated without having to account for the inter-event residual $\eta$. As indicated, equation 3.13 involves an approximation due to the mild assumption that $\eta/\sigma_{\mathbf{u}_\alpha} = \eta/\sigma_{\mathbf{u}_\alpha+\mathbf{h}}$. The Boore and Atkinson [2008] model, which is used in the current work, suggests that the standard deviation of the intra-event residuals depends only on the period at which the residuals are computed, and hence, it can be inferred that this approximation is reasonable. Although the current work only uses the Boore and Atkinson [2008] ground-motion model, the results obtained were found to be similar when an alternate model, namely, the Chiou and Youngs [2008] model, was used.
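The cancellation of the inter-event residual can be checked numerically. The sketch below uses made-up residuals for three sites in a single earthquake and assumes, as in the Boore and Atkinson [2008] model, that the intra-event standard deviation is the same at every site. It compares pairwise differences of the normalized intra-event residuals (equation 3.11) with those of the normalized total residuals (equation 3.12); all numerical values and names are hypothetical.

```python
def normalized_total_residuals(ln_obs, ln_median, sigma):
    """Equation 3.12: (ln Y_i - ln Ybar_i) / sigma_i at every site."""
    return [(o - m) / s for o, m, s in zip(ln_obs, ln_median, sigma)]

# Hypothetical values for three sites in one earthquake.
eps = [0.30, -0.10, 0.25]        # intra-event residuals
eta = 0.40                       # inter-event residual, common to all sites
sigma = [0.5, 0.5, 0.5]          # depends on period only, so equal across sites
ln_median = [-1.6, -1.9, -2.3]   # predicted ln-median intensities
ln_obs = [m + e + eta for m, e in zip(ln_median, eps)]  # equation 3.1

eps_total = normalized_total_residuals(ln_obs, ln_median, sigma)
eps_intra = [e / s for e, s in zip(eps, sigma)]          # equation 3.11

# Differences between sites agree, so eta drops out of equation 3.13.
diff_total = eps_total[0] - eps_total[1]
diff_intra = eps_intra[0] - eps_intra[1]
print(diff_total, diff_intra)
```

Because only differences enter the semivariogram estimator, the observable total residuals can be used without first estimating the inter-event residual.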

The ground motion databases typically report recordings in two orthogonal horizontal

directions. For instance, the PEER NGA database [Chiou et al., 2008] provides the fault-

normal and the fault-parallel components of the ground motions for each earthquake. In

the current work, it was found that the correlations computed using both the fault-normal

and the fault-parallel time-histories were similar. Hence, only results corresponding to the

fault-normal orientation are reported here. In fact, Baker and Jayaram [2009] and Bazzurro

et al. [2008] used several sets of recorded and simulated ground motions to show that the

estimated correlations are independent of the ground-motion component used.

3.4.1 Construction of experimental semivariograms using empirical data

Figure 3.1a shows a sample semivariogram constructed from empirical data. The first step

in obtaining such a semivariogram is to compute site-to-site distances for all pairs of sites

and place them in different bins based on the separation distances. For example, the bins

could be centered at multiples of h km with bin widths of δh km (δh ≤ h). All pairs

of sites that fall in the bin centered at h km (i.e., the sites that are separated by a distance $\in \left(h - \frac{\delta h}{2}, h + \frac{\delta h}{2}\right)$) are used to compute $\hat{\gamma}(h)$ (based on equation 3.4). If δh is chosen to be

very small, it can result in few pairs of sites in the bins, which will affect the robustness of

the results obtained. On the other hand, a large value of δh will mix site pairs with differing

distances reducing the resolution of the experimental semivariograms. In the current work,

experimental semivariograms are obtained using δh = 2 km (unless stated otherwise), since

this was seen to be the smallest value that results in a reasonable number of site pairs in the

bins.
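The binning procedure described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function name and the toy data are hypothetical, and half-open bins are used (an assumption) so that a pair lying exactly on a bin edge is counted only once.

```python
import math
from itertools import combinations

def experimental_semivariogram(sites, values, bin_centers, bin_width):
    """Isotropic experimental semivariogram (equation 3.4) on distance bins.

    sites:  list of (x, y) coordinates in km
    values: residual at each site
    Returns {bin center: (gamma_hat, number of pairs)} for non-empty bins.
    """
    sums = {c: 0.0 for c in bin_centers}
    counts = {c: 0 for c in bin_centers}
    for i, k in combinations(range(len(sites)), 2):
        h = math.dist(sites[i], sites[k])
        for c in bin_centers:
            # Half-open bin [c - width/2, c + width/2).
            if c - bin_width / 2 <= h < c + bin_width / 2:
                sums[c] += (values[i] - values[k]) ** 2
                counts[c] += 1
                break
    return {c: (sums[c] / (2 * counts[c]), counts[c])
            for c in bin_centers if counts[c] > 0}

# Toy data: three sites on a line, 2 km apart.
sites = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
values = [0.0, 1.0, 3.0]
result = experimental_semivariogram(sites, values,
                                    bin_centers=[2.0, 4.0], bin_width=2.0)
```

Returning the pair count alongside each estimate makes it easy to see which bins are too sparsely populated to be trusted, which is the trade-off that governs the choice of δh.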

The semivariogram shown in Figure 3.1a has an exponential form with a sill of 1 and a

range of 40 km. This model can be expressed as follows (based on equation 3.5):

γ(h) = 1− exp(−3h/40) (3.15)

The correlation function corresponding to this model equals ρ(h) = 1− γ(h) =


exp(−3h/40) (based on equation 3.14).
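As a quick worked example of this correlation function, the snippet below evaluates ρ(h) = exp(−3h/40) at a few separation distances. The distances chosen are arbitrary.

```python
import math

def rho(h_km, range_km=40.0):
    """Correlation implied by an exponential semivariogram with unit sill."""
    return math.exp(-3.0 * h_km / range_km)

for h in (0, 5, 10, 20, 40):
    print(f"h = {h:2d} km  rho = {rho(h):.3f}")
```

The correlation falls from 1 at zero separation to roughly 0.47 at 10 km and roughly 0.05 at 40 km, the latter being consistent with the definition of the range as the distance at which the semivariogram reaches 0.95 times the sill.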

An easy and transparent method to determine the model and the model parameters

is to fit the experimental semivariogram values obtained at discrete separation distances

manually. Suppose that γ(h) can be expressed as follows:

$$\gamma(h) = c_0\gamma_0(h) + \sum_{n=1}^{N} c_n\gamma_n(h) \qquad (3.16)$$

where γ0(h) is a pure nugget effect and γn(h) is a spherical, exponential or Gaussian model

(as defined in equations 3.5-3.8); cn is the contribution of the model n to the semivariogram;

and N is the total number of models used (excluding the nugget effect). The ranges and

the contributions of the models can be systematically varied to obtain the best fit to the

experimental semivariogram values.

In the following sections, priority is placed on building models that fit the empirical data well at short separation distances, even if this requires some misfit with the empirical data at large separation distances. Large separation distances are associated with low correlations, which have relatively little effect on joint distributions of ground-motion intensities. In addition to having low correlation, widely separated sites also have little impact on each other due to an effective 'screening' of their influence by more closely-located sites [Goovaerts, 1997]. (It is to be noted that in cases where there

are fewer than 10 closely spaced points, the influence of farther away points will not be

completely screened, according to Goovaerts [1997]. In such cases, the correlation model

developed in this study might provide slightly inaccurate correlation estimates. This might,

however, be mitigated by the fact that the large separation distances are associated with low

correlations, which thus have relatively little effect on joint distributions of ground motion

intensities.) Figure 3.1b shows sample semivariograms fitted to a data set using the

manual approach and the method of least squares. It can be seen that, at small separations,

the manually-fitted semivariogram is a better model than the one fitted using the method

of least squares. More detailed discussion on the advantages of using manual-fitting rather

than least-squares fitting follows in section 3.6, where the proposed approach is also com-

pared to approaches used in previous research on this topic.
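The effect of prioritizing short separation distances can be illustrated with a simple numerical stand-in. The sketch below is not the manual procedure used in this work: it replaces visual fitting with a grid search over candidate ranges, once with equal weights (ordinary least squares) and once with weights proportional to 1/h², which is an assumed weighting chosen only to emphasize short lags. The experimental values are synthetic and all names are hypothetical.

```python
import math

def exponential_sv(h, range_km, sill=1.0):
    """Exponential model with unit sill by default (equation 3.5)."""
    return sill * (1.0 - math.exp(-3.0 * h / range_km))

def fit_range(lags, gamma_hat, candidate_ranges, weight=lambda h: 1.0):
    """Grid search for the range minimizing a weighted squared misfit."""
    def misfit(r):
        return sum(weight(h) * (g - exponential_sv(h, r)) ** 2
                   for h, g in zip(lags, gamma_hat))
    return min(candidate_ranges, key=misfit)

# Synthetic experimental values: an exponential model with a 30 km range
# at short lags, followed by scattered values at long lags.
lags = [2, 4, 6, 8, 10, 40, 60, 80]
gamma_hat = [exponential_sv(h, 30.0) for h in lags[:5]] + [1.3, 0.7, 1.4]
candidates = [float(r) for r in range(5, 101)]

ols_range = fit_range(lags, gamma_hat, candidates)
short_lag_range = fit_range(lags, gamma_hat, candidates,
                            weight=lambda h: 1.0 / h**2)
print(ols_range, short_lag_range)
```

In this synthetic case the short-lag weighting recovers the 30 km range that generated the short-distance values, whereas the equally weighted fit is pulled away from it by the scatter at long separations.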


Figure 3.2: Range of semivariograms of ε, as a function of the period at which ε values are computed: (a) the residuals are obtained using the Northridge earthquake data (b) the residuals are obtained using the Chi-Chi earthquake data.

3.4.2 1994 Northridge earthquake recordings

This section discusses the ranges of semivariograms estimated using observed Northridge

earthquake ground motions. The manual fitting approach described previously is used to

compute ranges of the semivariograms of ε’s (obtained based on the Northridge earthquake

time histories) computed at seven periods ranging between 0 seconds and 10 seconds. Of

the three functional forms considered (equations 3.5-3.7), the exponential model is found

to provide the ‘best fit’ (particularly at small separations) for experimental semivariograms

obtained using ε’s computed at several different periods, based on recordings from different

earthquakes. The constancy of the semivariogram function across periods makes it simpler

to specify a standard correlation model for the ε’s. Moreover, the use of a single model

enables a direct comparison of the correlations between residuals computed at different

periods, using only the ranges of the semivariograms. The ranges of these estimated semi-

variograms are plotted against period in Figure 3.2a. The semivariogram fits corresponding

to all the periods considered can be found in Appendix B.

It can be observed from Figure 3.2a that the estimated range of the semivariogram

tends to increase with period. As described earlier, it can be inferred that the ε values at

long periods show larger correlations than those at short periods. This is consistent with

past studies of ground-motion coherency, which has been widely researched.

Coherency can be thought of as a measure of similarity in two spatially separated

ground motion time histories. Der Kiureghian [1996] reports that coherency is reduced

by the scattering of waves during propagation, and that this reduction is greater for high

frequency waves. High-frequency waves, which have short wavelengths, tend to be more

affected by small scale heterogeneities in the propagation path, and as a result tend to be

less coherent than long period ground waves [Zerva and Zervas, 2002]. It is reasonable to

expect highly coherent ground motions to exhibit correlated peak amplitudes (i.e., spectral

accelerations) as well. Since the ε’s studied here, which quantify these peak amplitudes,

tend to show the same correlation trend with period as previous coherency studies, it may be

that a similar wave-scattering mechanism is partially responsible for the correlation trends

observed here.

The Northridge earthquake data used for the above analysis are obtained from the NGA

database. In order to exclude records whose characteristics differ from those used by the

ground-motion modelers for data analysis, in most cases, only records used by the authors

of the Boore and Atkinson [2008] ground-motion model are considered. For the purposes

of this chapter, these records are denoted ‘usable records’. The semivariograms of residuals

computed at periods of 5, 7.5 and 10 seconds, however, are obtained using all available

Northridge records in the NGA database. This is on account of the limited number of

Northridge earthquake recordings at extremely long periods. At 5 seconds, the residuals can be computed using 158 total available records, of which 66 are used by the ground-motion model authors. Since a reasonable number of records is available in both cases, a semivariogram constructed using all 158 records (denoted SV1) can be compared to one estimated from the 66 usable records (denoted SV2; in this case, the bin size was increased to 4 km to compensate for the smaller number of records). The ranges of the two semivariograms, SV1 and SV2, are 40 km and 30 km respectively. This shows that there is a slight difference in the estimated ranges, which could be due to the additional correlated systematic errors introduced by the extra records.

As mentioned in section 3.4, only correlations between intensities estimated using the fault-

normal components are discussed in this chapter. This is because the correlations obtained

using the fault-normal and the fault-parallel ground motions were found to be similar. For


example, the semivariogram of ε’s computed at 2 seconds, based on the fault-parallel

ground motions recorded during the Northridge earthquake was found to be reasonably

modeled using an exponential function with a unit sill and a range of 36 km. The corre-

sponding range for the semivariogram based on the fault-normal ground motions equals 42

km. Similar results were observed when the residuals were computed at other periods, and

using other earthquake recordings.

3.4.3 1999 Chi-Chi earthquake

In this section, the semivariogram ranges of ε’s from the Chi-Chi earthquake recordings are

presented. The Chi-Chi earthquake ground motions came from the NGA database. Only

records used by the authors of the Boore and Atkinson [2008] ground-motion model are

considered. The summary plot of the estimated ranges is shown in Figure 3.2b. (The semi-

variograms are shown in Appendix B.) The following can be observed from the figures:

(a) As seen with the Northridge earthquake data, the range of the semivariogram typically

increases with period (An exception is observed when the peak ground accelerations (PGA)

are considered, and this is explored further subsequently in this chapter.)

(b) The ranges are higher, in general, than those observed based on the Northridge earth-

quake data (Figure 3.2a). This is consistent with observations made by other researchers

considering Northridge and Chi-Chi earthquake data [e.g., Goda and Hong, 2008].

The large ranges obtained here, relative to the comparable results from Northridge, can

be explained using the Vs30 values (average shear-wave velocities in the top 30 m of the

soil) at the recording stations (The author found an empirical link between the range and

Vs30, but not between range and other earthquake- and site-related parameters such as mag-

nitude, distance. Further research using bigger datasets is necessary to quantify such links.)

The Vs30 values are commonly used in ground-motion models as indicators of the effects of

local-site conditions on the ground motion. The ε’s are affected if the predicted ground-motion

intensities are affected by inaccurate Vs30 values, or if the Vs30’s are inadequate to capture

the local-site effects entirely (i.e., the ground-motion models do not entirely capture the

local-site effects using Vs30 values).

Close to 70% of the Taiwan site Vs30 values are inferred from Geomatrix site classes,


while the rest of the Vs30 values are measured (NGA database). Since closely-spaced sites

are likely to belong to the same site class and possess similar (and unknown) Vs30 values,

errors in the inferred Vs30 values are likely to be correlated among sites that are close

to each other. Such correlated Vs30 measurement errors will result in correlated prediction

errors at all these closely-spaced sites, which will increase the range of the semivariograms.

The larger ranges of semivariograms estimated using the Chi-Chi earthquake ground

motions may also be due to possible correlation between the true Vs30 values (and not just

the correlation between the Vs30 errors). Larger correlation between the Vs30’s indicates a

more homogeneous soil (homogeneous in terms of properties that affect site effects but are not

accounted for by the ground-motion models). In such cases, if a ground-motion model does not

accurately capture the local-site effect at one site, it is likely to produce similar prediction

errors in a cluster of closely-spaced sites (on account of the homogeneity). Castellaro et al.

[2008] compared the site-dependent seismic amplification factors (Fa, the site amplification

factor is defined as the amplification of the ground-motion spectral level at a site with

respect to that at a reference ground condition [Borcherdt, 1994]) observed during the 1989

Loma Prieta earthquake to the corresponding site Vs30 values. They found substantial

scatter in the plot of Fa versus Vs30, and also found that this scatter was more pronounced

at short periods (below 0.5 seconds) than at longer periods. This suggests that ground-

motion intensity predictions based on Vs30 will have errors, particularly at periods below

0.5 seconds.

Figures 3.3a and 3.3b show semivariograms of the normalized Vs30 values (the Vs30

semivariogram is not to be confused with the ε semivariogram) at the Northridge earth-

quake recording stations and the Chi-Chi earthquake recording stations respectively (Nor-

malization involves scaling the Vs30 values so that the normalized Vs30 values have a unit

variance to enable a direct comparison of the semivariograms.) Figure 3.3a shows signif-

icant scatter at all separation distances indicating zero correlation at all separations. In

contrast, Figure 3.3b indicates that the Taiwan Vs30 values have significant spatial corre-

lation. This suggests that ε’s may have additional spatial correlation in Taiwan, due to

homogeneous site effects that cause correlated prediction errors.
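The normalization mentioned above amounts to dividing the Vs30 values by their standard deviation. A minimal sketch is given below; the Vs30 values are invented for illustration, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import statistics

def normalize_unit_variance(values):
    """Scale values by their sample standard deviation (unit variance result)."""
    sd = statistics.stdev(values)
    return [v / sd for v in values]

vs30 = [270.0, 310.0, 450.0, 520.0, 760.0]  # hypothetical Vs30 values (m/s)
vs30_norm = normalize_unit_variance(vs30)
print(statistics.variance(vs30_norm))
```

Since the semivariogram depends only on differences between values, no mean needs to be subtracted, and after scaling the sill is expected to be close to 1, so that semivariograms from different regions can be compared directly.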

As mentioned previously, one notable aberration in the plot of range versus period

(Figure 3.2b) is the large range observed when the residuals are computed at 0 seconds


Figure 3.3: (a) Experimental semivariogram obtained using normalized Vs30’s at the recording stations of the Northridge earthquake. No semivariogram is fitted on account of the extreme scatter (b) Experimental semivariogram obtained using normalized Vs30’s at the recording stations of the Chi-Chi earthquake. The range of the fitted exponential semivariogram equals 25 km.

as compared to some of the longer periods. This is not consistent with the coherency

argument of the previous section. It can, however, be explained using the relationship

between the range and the Vs30’s described in the above paragraphs. The inaccuracies in

ground-motion prediction based on Vs30’s will reflect in increased correlation between the

residuals computed at nearby sites. These inaccuracies are larger at short periods (below

0.5 seconds) [Castellaro et al., 2008], which explains the larger correlation between the

residuals (which ultimately results in the larger range observed) computed using PGAs.

One final test that was considered here was whether spatial correlations differed for

near-fault ground motions experiencing directivity. Baker [2007b] identified pulse-like

ground motions from the NGA database based on wavelet analysis. Thirty such pulses were

identified in the fault-normal components of the Chi-Chi earthquake recordings. Experi-

mental semivariograms of residuals were computed using these pulse-like ground motions,

and their ranges were estimated. It was seen that the ranges were reasonably similar to

those obtained using all usable ground motions (i.e., pulse-like and non-pulse-like). Since


the available pulse-like ground-motion data set is very small, however, the results obtained

were not considered to be sufficiently reliable, and hence not considered further in this

chapter. A more detailed analysis can be found in Appendix B and Bazzurro et al. [2008].

Based on the discussion in this section, it can be seen that the correlated Vs30 values and

the correlated Vs30 measurement errors are possible reasons for the larger ranges estimated

in section 3.4.3 than in section 3.4.2. Other factors, such as the size of the rupture areas,

may also affect the correlations. These factors could not, however, be investigated with the

limited data set available.

3.4.4 Other earthquakes

The correlations computed using data from the 2003 M5.4 Big Bear City earthquake, the

2004 M6.0 Parkfield earthquake, the 2005 M5.1 Anza earthquake, the 2007 M5.6 Alum

Rock earthquake and the 2008 M5.4 Chino Hills earthquake are presented in this section.

The time histories for these earthquakes were obtained from the CESMD database [2008].

The Vs30 data used for these computations came from the CESMD database [2008] (for the

Parkfield earthquake) and the U.S. Geological Survey Vs30 maps (for the other earthquakes)

[Global Vs30 map server, 2008].

Exponential models are fitted to experimental semivariograms of ε’s computed using

the time histories from the above-mentioned earthquakes, at periods ranging from 0 to 10

seconds. Figure 3.4 shows plots of range versus period for the Big Bear City, Parkfield,

Alum Rock, Anza and Chino Hills earthquake residuals respectively. The ranges of the

semivariograms are generally seen to increase with period, which is consistent with find-

ings from the Chi-Chi and the Northridge earthquake data. It can also be seen from the

figure that, at short periods, the ranges obtained from the Anza earthquake data are larger

than those from the other earthquakes considered. On the other hand, the ranges com-

puted using the Parkfield earthquake data are fairly small at short periods. Semivariograms

of the Vs30’s at the recording stations for all five earthquakes of interest were computed.

The semivariogram range computed using the Anza earthquake Vs30’s was found to be the

largest at 40 km, while the ranges computed from the Chino Hills, Big Bear City, Alum

Rock and Parkfield earthquake data were smaller at 35, 30, 18 and approximately 0 km


Figure 3.4: Range of semivariograms of ε, as a function of the period at which ε values are computed. The residuals are obtained using the: (a) Big Bear City earthquake data; (b) Parkfield earthquake data; (c) Alum Rock earthquake data; (d) Anza earthquake data; (e) Chino Hills earthquake data.


Figure 3.5: Ranges of residuals computed using PGAs versus ranges of normalized Vs30 values.

respectively. The estimated ranges of the semivariograms of the residuals and of the Vs30’s

reinforce the argument made previously that clustering in the Vs30 values (as indicated by

a large range of the Vs30 semivariogram) results in increased correlation among the

residuals (the low PGA-based range estimated using the Chino Hills earthquake data seems to

be an exception, however). This trend is seen in Figure 3.5, which shows the range of

PGA-based residuals plotted against the range of the Vs30’s, for the earthquakes considered

in this work. This dependence on the Vs30 range seems to be weaker at longer periods,

which is in line with the observations of Castellaro et al. [2008] that the scatter in the plot

of Fa versus Vs30 is greater at short periods than at long periods. The authors hypothesize

that the reduced dependence of range on Vs30’s at long periods could also be because the

long-period ranges are considerably influenced by factors other than Vs30 values, such as

coherency as explained in section 3.4.2 and prediction errors unrelated to Vs30’s (which

are likely since the ground-motion models are fitted using far fewer data points at long

periods). Finally, an additional advantage of considering these five additional events is that

earthquakes covering a range of magnitudes have been studied. No trends of range with

magnitude were detected.

A few research works studying spatial correlations use ground-motion recordings from


Figure 3.6: (a) Range of semivariograms of ε, as a function of the period at which ε values are computed. The residuals are obtained from six different sets of time histories as shown in the figure; (b) Range of semivariograms of ε predicted by the proposed model as a function of the period.

earthquakes in Japan, based on the data provided in the KiK Net [2007]. In this work, data

from the 2004 Mid Niigata Prefecture earthquake and the 2005 Miyagi-Oki earthquake

were explored. Though the number of sites at which the ground-motion recordings are

available is fairly large, most recording stations are far away from one another. The KiK

Net [2007] consists of 681 recording stations, of which only 19 pairs of stations are within

10 km of one another. As explained in section 3.4.2, it is important to accurately model

the semivariogram at short separation distances, particularly at separation distances below

10 km. Hence, the recordings from the KiK Net [2007] were not considered further for

studying the ranges of semivariograms.
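
The station-spacing check described above can be sketched in a few lines. This is only an illustrative screening routine, not the procedure used in this work: the function name `count_close_pairs`, the great-circle (haversine) distance and the mean Earth radius of 6371 km are assumptions introduced here.

```python
import numpy as np

def count_close_pairs(lat_deg, lon_deg, max_km=10.0, earth_radius_km=6371.0):
    """Count station pairs whose great-circle separation is at most max_km."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Indices of every unordered pair (i < j) of stations.
    i, j = np.triu_indices(lat.size, k=1)
    dlat = lat[j] - lat[i]
    dlon = lon[j] - lon[i]
    # Haversine formula for the great-circle distance.
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat[i]) * np.cos(lat[j]) * np.sin(dlon / 2.0) ** 2
    dist_km = 2.0 * earth_radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    return int((dist_km <= max_km).sum())
```

A network with few pairs below the threshold provides little information about the semivariogram at short separation distances, which is the reason given above for setting such recordings aside.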

3.4.5 A predictive model for spatial correlations

The above sections presented spatial correlations computed using recorded ground mo-

tions from several past earthquakes. In this section, these correlation estimates are used

to develop a model that can be used to select appropriate correlation estimates for risk

assessment purposes.

Figure 3.6a shows the ranges computed using various earthquake data as a function of


period. From a practical perspective, despite the wide differences in the characteristics of

the earthquakes considered, the ranges computed are quite similar, particularly at periods

longer than 2 seconds. At short periods (below 2 seconds), however, there are considerable

differences in the estimated ranges depending on the ground-motion time histories used.

The previous sections suggested empirically that differences in the correlation of ε’s are in large

part explained by the Vs30 values at the recording stations for these earthquakes. Hence,

the following cases can be considered for decision making:

Case 1: If the Vs30 values do not show or are not expected to show clustering (i.e.,

the geologic condition of the soil varies widely over the region), the smaller ranges reported

in Figure 3.6a will be appropriate. The presence of clustering can be quantified by constructing

the semivariogram of the Vs30’s as explained previously or by using a simplified visual

approach described in Section B.4.

Case 2: If the Vs30 values show or are expected to show clustering (i.e., there are

clusters of sites in which the geologic conditions of the soil are similar), the larger ranges

reported in Figure 3.6a should be chosen.

Based on these conclusions, the following model was developed to predict a suitable

range based on the period of interest:

At short periods (T < 1 second), for case 1:

b = 8.5+17.2T (3.17)

At short periods (T < 1 second), for case 2:

b = 40.7−15.0T (3.18)

At long periods (T ≥ 1 second), for both cases 1 and 2:

b = 22.0+3.7T (3.19)

where b denotes the range of the exponential semivariogram (equation 3.5), and T denotes

the period. Based on this model, the correlation between normalized intra-event residuals


separated by h km follows from equations 3.5 and 3.14:

ρ(h) = exp(−3h/b) (3.20)

It is to be noted that the correlations between intra-event residuals will exactly equal the

correlations between normalized intra-event residuals defined above.

The plot of the predicted range versus period is shown in Figure 3.6b. The model has

been developed based on only seven earthquakes, but since the trends exhibited were found

to be similar for these seven, it can be expected that the model will predict reasonable

ranges for future earthquakes.
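
As an illustration, equations 3.17 to 3.20 can be written as two small functions. The function and argument names below are hypothetical; the flag `clustered_vs30` selects case 2 (clustered Vs30 values) when true and case 1 otherwise.

```python
import math

def semivariogram_range_km(period_s, clustered_vs30):
    """Range b (km) of the exponential semivariogram, equations 3.17 - 3.19."""
    if period_s < 0.0:
        raise ValueError("period must be non-negative")
    if period_s < 1.0:
        if clustered_vs30:
            return 40.7 - 15.0 * period_s   # case 2, equation 3.18
        return 8.5 + 17.2 * period_s        # case 1, equation 3.17
    return 22.0 + 3.7 * period_s            # both cases, equation 3.19

def residual_correlation(h_km, period_s, clustered_vs30):
    """Correlation between intra-event residuals h_km apart, equation 3.20."""
    b = semivariogram_range_km(period_s, clustered_vs30)
    return math.exp(-3.0 * h_km / b)
```

Both short-period branches give b = 25.7 km as T approaches 1 second, which matches the long-period expression at T = 1 second, so the predicted range is continuous in the period.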

The predictive model can be used for simulating correlated ground-motion fields for a

particular earthquake as follows:

Step 1: Obtain median ground-motion values (denoted Yij in equation 3.1) at the sites

of interest using a ground-motion model.

Step 2: Probabilistically generate (simulate) the inter-event residual term (ηj in equation

3.1), which follows a univariate normal distribution. The mean of the inter-event residual

is zero, and its standard deviation can be obtained using ground-motion models.

Step 3: Simulate the intra-event residuals (εij in equation 3.1) using the standard deviations

from the ground-motion models and the correlations from equations 3.17 - 3.20.

Step 4: Combine the three terms generated in Steps 1 - 3 using equation 3.1 to obtain

simulated ground motions at the sites of interest.
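
The four steps can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: equation 3.1 is taken to have the additive form ln(Y) = ln(median) + ε + η, the intra-event residuals are taken to be jointly normal, site coordinates are planar and in km, and the name `simulate_ln_intensity` and its arguments are introduced here. The medians, `sigma` and `tau` are expected to come from a ground-motion model of the user's choice.

```python
import numpy as np

def simulate_ln_intensity(median, coords_km, sigma, tau, range_km, n_sims=1, seed=None):
    """Simulate ln(intensity) at several sites for one earthquake scenario.

    median    : median intensities at the sites (Step 1, from a ground-motion model)
    coords_km : (n_sites, 2) planar site coordinates in km
    sigma     : intra-event standard deviation (scalar or one value per site)
    tau       : inter-event standard deviation
    range_km  : semivariogram range b, e.g. from equations 3.17 - 3.19
    """
    rng = np.random.default_rng(seed)
    median = np.asarray(median, dtype=float)
    coords = np.asarray(coords_km, dtype=float)
    n_sites = median.size

    # Pairwise separation distances and the correlation matrix of equation 3.20.
    diff = coords[:, None, :] - coords[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))
    rho = np.exp(-3.0 * h / range_km)

    # Step 2: one inter-event residual per simulated field, shared by all sites.
    eta = rng.normal(0.0, tau, size=(n_sims, 1))

    # Step 3: correlated intra-event residuals via a Cholesky factor of rho.
    # The small diagonal term guards against round-off for near-coincident sites.
    chol = np.linalg.cholesky(rho + 1e-10 * np.eye(n_sites))
    eps = sigma * (rng.standard_normal((n_sims, n_sites)) @ chol.T)

    # Step 4: combine the median and the two residual terms.
    return np.log(median) + eps + eta
```

Each row of the returned array is one simulated field of logarithmic intensities; exponentiating it gives the intensities themselves.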

3.5 Isotropy of semivariograms

This section examines the assumption of isotropy of semivariograms using the ground mo-

tions discussed previously.

3.5.1 Isotropy of intra-event residuals

A stationary semivariogram (γ(h)) is said to be isotropic if it depends only on the separation

distance h = ‖h‖, rather than the separation vector h. Anisotropy is said to be present

when the semivariogram is also influenced by the orientation of the data locations. The


presence of anisotropy can be studied using directional semivariograms [Goovaerts, 1997].

Directional semivariograms are obtained as shown in equation 3.4 except that the estimate

is obtained using only pairs (zuα, zuα+h) for which the azimuth of the vector h equals the

specified azimuth. Since an isotropic semivariogram is independent of data

orientation, the directional semivariograms obtained considering any specific azimuth will

be identical to the isotropic semivariogram if the data is in fact isotropic. Differences

between the directional semivariograms indicate one of two different forms of anisotropy,

namely, geometric anisotropy and zonal anisotropy. Geometric anisotropy is said to be

present if directional semivariograms with differing azimuths have differing ranges. Zonal

anisotropy is indicated by a variation in the sill with azimuth.

3.5.2 Construction of a directional semivariogram

A directional semivariogram is specified by several parameters, as illustrated in Figure

3.7a. The parameters include the azimuth of the direction vector (the azimuth angle φ is

measured from the North), the azimuth tolerance (δφ), the bin separation (h) and the bin

width (δh). A semivariogram obtained using all pairs of points irrespective of the azimuth

is known as the omni-directional semivariogram, and is an accurate measure of spatial

correlation in the presence of isotropy. (The semivariograms that have been described in the

previous sections are omni-directional semivariograms.) In determining the experimental

semivariogram in any bin, only pairs of sites separated by a distance in the interval

[h − δh/2, h + δh/2], and with azimuths in the interval [φ − δφ, φ + δφ], are considered. For

example, let α be a site located in a two-dimensional region, as shown in Figure 3.7a. It is

intended to construct a directional semivariogram with an azimuth of φ (as marked in the

figure). The computation of the experimental semivariogram value (γ(h)) involves pairing

up the data values at all sites falling within the hatched region (the region that satisfies the

conditions on the separation distance and the azimuth, as mentioned above) with the data

value at site α (i.e., uα ). The area of the hatched region is defined by the azimuth tolerance

used and can be seen to increase with increase in separation distance (h) (Figure 3.7a). For

large values of h, the area of the hatched region will be undesirably large and hence, in

addition to placing constraints on the azimuth tolerance, a constraint is explicitly specified


on the bandwidth of the region of interest, as marked in the figure.

It is usually difficult to compute experimental directional semivariograms on account of

the need to obtain pairs of sites oriented along pre-specified directions. Hence, it is required

that the bin width, the azimuth tolerance and the bandwidth be specified liberally while

constructing directional semivariograms. The results reported in this chapter are obtained

by considering a bin separation of 4 km, a bin width of 4 km, an azimuth tolerance of 10°

and a bandwidth of 10 km. Directional semivariograms are plotted for azimuths of 0°, 45°

and 90° in order to capture the effects of anisotropy, if any.
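
The pair-selection rules described in this section can be sketched as follows. The sketch assumes that equation 3.4 is the classical estimator (half the mean squared difference over the selected pairs), that coordinates are planar with columns (east, north) in km, and that the direction axis has no sense, so a pair is accepted whichever of its two sites is taken as the origin. The function name and arguments are hypothetical; the defaults repeat the parameter values quoted above.

```python
import numpy as np

def directional_semivariogram(coords_km, values, azimuth_deg, lags_km,
                              bin_width_km=4.0, az_tol_deg=10.0, bandwidth_km=10.0):
    """Experimental directional semivariogram and pair counts at the given lags."""
    coords = np.asarray(coords_km, dtype=float)
    z = np.asarray(values, dtype=float)
    i, j = np.triu_indices(z.size, k=1)          # every unordered pair of sites
    d = coords[j] - coords[i]                    # separation vectors (east, north)
    dist = np.hypot(d[:, 0], d[:, 1])

    # Unit vector of the direction axis; the azimuth is measured from North.
    phi = np.radians(azimuth_deg)
    u = np.array([np.sin(phi), np.cos(phi)])
    along = d @ u                                        # component along the axis
    perp = np.abs(d[:, 0] * u[1] - d[:, 1] * u[0])       # offset from the axis

    # Angle between each separation vector and the axis, folded into [0, 90] degrees.
    safe_dist = np.where(dist > 0.0, dist, 1.0)
    angle = np.degrees(np.arccos(np.clip(np.abs(along) / safe_dist, 0.0, 1.0)))
    selected = (dist > 0.0) & (angle <= az_tol_deg) & (perp <= bandwidth_km)

    sq_diff = (z[j] - z[i]) ** 2
    gamma = np.full(len(lags_km), np.nan)
    counts = np.zeros(len(lags_km), dtype=int)
    for k, h in enumerate(lags_km):
        in_bin = selected & (np.abs(dist - h) <= bin_width_km / 2.0)
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = 0.5 * sq_diff[in_bin].mean()
    return gamma, counts
```

Returning the pair counts alongside the semivariogram values makes it easy to see which bins are supported by too few pairs, which is the practical difficulty noted above.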

3.5.3 Test for anisotropy using Northridge ground motion data

Figure 3.7b-d shows the omni-directional and the three directional experimental semivar-

iograms of the 2 second ε’s from the Northridge earthquake data. The semivariogram

function shown in the figures is the exponential model with a unit sill and a range of 42 km.

This exponential model (obtained assuming isotropy in section 3.4.2) fits all the experimen-

tal directional semivariograms reasonably well (at short separations, which are of interest).

This is a good indication that the semivariogram is isotropic. Similar results were obtained

at other periods and for other earthquakes (Appendix B and Bazzurro et al. [2008]).

3.6 Comparison with previous research

Researchers have previously computed the correlation between ground-motion intensities

using observed peak ground accelerations, peak ground velocities and spectral accelera-

tions. These works, however, differ widely in the estimated rate of decay of correlation

with separation distance. This section compares the results observed in the current work to

those in the literature and also discusses possible reasons for the apparent inconsistencies

in the previous estimates.

Wang and Takada [2005] used the ground-motion relationship of Annaka et al. [1997]

to compute the normalized auto-covariance function of residuals computed using the Chi-

Chi earthquake peak ground velocities (PGV). They used an exponential model to fit the

discrete experimental covariance values and reported a result which is equivalent to the


Figure 3.7: (a) Parameters of a directional semivariogram. Subfigures (b), (c) and (d) show experimental directional semivariograms at discrete separations obtained using the Northridge earthquake ε values computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (b) azimuth = 0°; (c) azimuth = 45°; (d) azimuth = 90°.


following semivariogram:

γ(h) = 1− exp(−3h/83.4). (3.21)

This semivariogram has a unit sill and a range of 83.4 km (from equation 3.5). The current

work does not consider the spatial correlation between PGV-based residuals. The PGVs,

however, are comparable to spectral accelerations computed at moderate periods (0.5 to 1

s), and hence, the semivariogram ranges of residuals computed from PGVs can be quali-

tatively compared to the corresponding ranges estimated in this work (Figure 3.6a). It can

be seen that the range reported by Wang and Takada [2005] is substantially higher than the

ranges observed in the current work.

In order to explain this inconsistency, the correlations computed by Wang and Takada

[2005] are recomputed in the current work using the Chi-Chi earthquake time histories

available in the NGA database and the ground-motion model of Annaka et al. [1997]. The

Annaka et al. [1997] ground-motion model does not explicitly capture the effect of local-

site conditions. To account for the local-site effects, Wang and Takada [2005] amplified the

predicted PGV at all sites by a factor of 2.0 and the same amplification is carried out here

for consistency. The observed and the predicted PGVs are used to compute residuals, and

the experimental semivariograms (at discrete separations) of these residuals are estimated

(considering a bin size of 4 km) using the procedures discussed previously in this chapter.

Figure 3.8a shows the experimental semivariogram obtained, along with an exponential

semivariogram function having a unit sill and a range of 83.4 km (there are slight differ-

ences between this experimental semivariogram and the one shown in Wang and Takada

[2005] possibly due to the differences in processing carried out on the raw data or the spe-

cific recordings used). It is clear from Figure 3.8a, as well as the results presented in Wang

and Takada [2005], that the exponential model with a range of 83.4 km does not provide an

accurate fit to the experimental semivariogram values at small separation distances. This

is because Wang and Takada [2005] minimized the fitting error over all distances to obtain

their model.

In the literature, several research works use the method of least squares (or visual methods

that attempt to minimize the fitting error over all distances, which, in effect, produce

fits similar to the least-squares fit) to fit a model to an experimental semivariogram [Goda


Figure 3.8: Semivariogram obtained using residuals computed based on Chi-Chi earthquake peak ground velocities: (a) residuals from Annaka et al. [1997] and semivariogram model from Wang and Takada [2005]; (b) residuals from Annaka et al. [1997] and semivariogram fitted to model the discrete values well at short separation distances; (c) residuals from Annaka et al. [1997], considering random amplification factors.


and Hong, 2008, Hayashi et al., 2006, Wang and Takada, 2005]. There are three major

drawbacks in using the method of least squares to fit an experimental semivariogram:

(a) As explained in section 3.4.2, it is more important to model the semivariogram struc-

ture well at short separation distances than at long separation distances. This is because of

the low correlation between intensities at well-separated sites and the screening of a far-

away site by more closely-located sites [Goovaerts, 1997]. It is, therefore, inefficient if a

fit is obtained by assigning equal weights to the data points at all separation distances, as

done in the method of least squares.

(b) The results provided by the method of least squares are highly sensitive to the

presence of outliers (because differences between the observed and predicted γ(h)’s are

squared, any observed γ(h) lying away from the general trend will have a disproportionate

influence on the fit).

(c) The least-squares fit results can be sensitive to the maximum separation distance

considered. This is of particular significance if the method of least squares is used to

determine the sill of the semivariogram in addition to its range.

Some of these drawbacks can be corrected within the framework of the least-squares

method. Drawback (a) can be partly overcome by assigning large weights to the data points

at short separation distances. The presence of outliers can be checked rigorously using

standard statistical techniques [Kutner et al., 2005] and the least-squares fit can be obtained

after eliminating the outliers in order to overcome the second drawback mentioned above.

These procedures, however, add to the complexity of the approach. For this reason, experi-

mental semivariograms are fitted manually rather than using the method of least squares in

the current work [as recommended by Deutsch and Journel, 1998]. This approach allows

one to overlook outliers and also to focus on the semivariogram model at distances that

are of practical interest. Though this method is more subjective than the method of least

squares, experience shows that the results obtained are reasonably robust.
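
The weighted least-squares remedy mentioned above for drawback (a) can be illustrated with a small sketch. It is not the fitting procedure used in this work, where the models are fitted manually. It assumes a unit-sill exponential model, searches a hypothetical grid of candidate ranges rather than using a numerical optimizer, and leaves the choice of weights to the user.

```python
import numpy as np

def fit_exponential_range(h_km, gamma_exp, weights=None, candidates=None):
    """Range (km) of the unit-sill exponential model with the least weighted squared error."""
    h = np.asarray(h_km, dtype=float)
    g = np.asarray(gamma_exp, dtype=float)
    w = np.ones_like(h) if weights is None else np.asarray(weights, dtype=float)
    if candidates is None:
        candidates = np.arange(1.0, 200.5, 0.5)   # candidate ranges in km (assumed grid)
    # Model semivariogram for every candidate range at every separation distance.
    model = 1.0 - np.exp(-3.0 * h[None, :] / candidates[:, None])
    # Weighted sum of squared fitting errors for each candidate range.
    sse = (w[None, :] * (model - g[None, :]) ** 2).sum(axis=1)
    return float(candidates[np.argmin(sse)])
```

With `weights=None` every separation distance counts equally, as in ordinary least squares. Passing larger weights at short separations (for example, below 10 km) pulls the fitted range toward the values that describe the short-distance behaviour.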

Figure 3.8b shows the experimental semivariogram (identical to the one shown in Fig-

ure 3.8a) along with an exponential function, which is manually fitted to model the experi-

mental semivariogram values well at short separation distances. The range of this exponen-

tial model equals 55 km, which is much less than the range of 83.4 km mentioned earlier,

and is closer to the results reported earlier for the Chi-Chi spectral accelerations.
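
To give a sense of what the difference between the two fitted ranges means in practice, the correlations implied by a unit-sill exponential model, ρ(h) = exp(−3h/b), can be evaluated at a separation of 10 km. The separation is chosen here only for illustration, and the values are rounded to two decimals.

```latex
\rho(10) = \exp\!\left(-\frac{3 \times 10}{83.4}\right) \approx 0.70
\quad (b = 83.4\ \text{km}),
\qquad
\rho(10) = \exp\!\left(-\frac{3 \times 10}{55}\right) \approx 0.58
\quad (b = 55\ \text{km})
```

The longer range therefore implies a noticeably stronger correlation at the short separations that matter most.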


The large range reported in Wang and Takada [2005] may also be due to inaccuracies in

modeling the local-site effects. As explained in section 3.4.2, errors in capturing the local-

site effects will cause systematic errors in the predicted ground motions that will result in

an increase in the range of the semivariogram. Using a constant amplification factor of

2.0 (without considering the actual local-site effects) will produce even larger systematic

errors in the predicted ground motions than considered previously. Consider a complemen-

tary hypothetical example in which the ground-motion amplification factor for each site is

considered to be an independent random variable, uniformly distributed between 1.0 and

2.0. Randomizing the ground-motion amplification will break up the correlation between

the prediction errors in a cluster of closely-spaced sites. The semivariogram of residuals

obtained considering such random amplification factors is shown in Figure 3.8c. The range

of this semivariogram equals 43 km, which is less than the 55 km from Figure 3.8b. The

true amplifications are neither constant at 2.0 nor totally random between 1.0 and 2.0.

Hence, the range of the semivariogram is expected to lie between 43 km and 55 km, which is

close to the range observed using short-period spectral accelerations in the current work.

Boore et al. [2003] estimated correlations between the PGA residuals computed from

the Northridge earthquake. They observed that the correlations dropped to zero when the

inter-site separation distance was approximately 10 km. This matches the range of

10 km estimated in the current work using the Northridge earthquake PGAs (Figure 3.2a).

The two sets of results thus appear to be consistent, which is notable because the two

efforts used different estimation procedures and data sets.

The observations in the current work are also consistent with those reported in Goda

and Hong [2008] who reported a more rapid decrease in correlations with distance for the

Northridge earthquake ground motions than for the Chi-Chi earthquake ground motions.

They also reported that the decay of spatial correlation of the residuals computed from

spectral accelerations is more gradual at longer periods, a feature observed and analyzed in

the current research work. The current work adds plausible physical explanations for these

empirically-observed trends.


3.7 Conclusions

Geostatistical tools have been used to quantify the correlation between spatially-distributed

ground-motion intensities. The correlation is known to decrease with increase in the separa-

tion between the sites, and this correlation structure can be modeled using semivariograms.

A semivariogram is a measure of the average dissimilarity between the data, whose func-

tional form, sill and range uniquely identify the ground-motion correlation as a function of

separation distance.

Ground motions observed during the Northridge, Chi-Chi, Big Bear City, Parkfield,

Alum Rock, Anza and Chino Hills earthquakes were used to compute the correlations be-

tween spatially-distributed spectral accelerations, at various spectral periods. The correla-

tions were computed for normalized intra-event residuals, since the normalized intra-event

residuals will be homoscedastic. The ground-motion model of Boore and Atkinson [2008]

was used for the computations, but the results did not change when the Chiou and Youngs

[2008] model was used instead.

It was seen that the rate of decay of the correlation with separation typically decreases

with increasing spectral period. It was reasoned that this could be because long period

ground motions at two different sites tend to be more coherent than short period ground

motions, on account of lesser wave scattering during propagation. It was also observed

that, at periods longer than 2 seconds, the estimated correlations were similar for all the

earthquake ground motions considered. At shorter periods, however, the correlations were

found to be related to the site Vs30 values. It was shown that the clustering of site Vs30’s is

likely to result in larger correlations between residuals. Based on these findings, a predic-

tive model was developed that can be used to select appropriate correlation estimates for

use in risk assessment of spatially-distributed building portfolios or infrastructure systems.

The research work also investigated the effect of directivity on the correlations using

pulse-like ground motions. The correlations obtained were similar to those estimated us-

ing all ground motions. The results, however, are not discussed in detail due to concerns

about the reliability of the results on account of the small data set of pulse-like ground

motions. The work also investigated the commonly-used assumption of isotropy in the cor-

relation between residuals using directional semivariograms. If directional semivariograms


computed based on different azimuths are identical to the omni-directional semivariogram

(which is obtained assuming isotropy), it can be concluded that the semivariograms (and

therefore, the correlations) are isotropic. It was seen using empirical data that the

correlations between Chi-Chi and Northridge earthquake intensities show isotropy at both short

and long periods.

The results obtained were also compared to those reported in the literature [Goda and

Hong, 2008, Wang and Takada, 2005, Boore et al., 2003]. Wang and Takada [2005] report

larger correlations using the PGVs computed using the Chi-Chi earthquake recordings than

those reported in this work for spectral accelerations. It was shown that these larger cor-

relations are a result of attempting to fit the experimental semivariogram reasonably well

over the entire range of separation distances of interest (which is a typical result of using

least-squares fits and eye-ball fits that produce results similar to least-squares fits), and of

using a ground-motion model that does not account for the effect of local-site conditions.

Typically, a semivariogram model should represent correlations accurately at small sepa-

rations since ground motions at a site are more influenced by ground motions at nearby

sites. The method of least squares assigns equal importance to all separation distances

and is, therefore, inefficient. In the current research work, semivariogram models are fitted

manually with emphasis on accurately modeling correlations at small separations.

This study illustrates various factors that affect the spatial correlation between ground-

motion intensities, and provides a basis to choose an appropriate model using empirical

data. The proposed predictive model can be used for obtaining the joint distribution of

spatially-distributed ground-motion intensities, which is necessary for a variety of seismic

hazard calculations.

Chapter 4

Spatial correlation between spectral accelerations using simulated ground-motion time histories

Jayaram, N., Park, J., Bazzurro, P. and Tothong, P. (2010). Estimation of spatial correla-

tion between spectral accelerations using simulated ground-motion time histories, 9th U.S.

National and 10th Canadian Conference on Earthquake Engineering, Toronto, Canada.

4.1 Abstract

The impact of earthquakes on a region rather than on just a single property at a specific site

is of interest to several public and private stakeholders, including government and relief

organizations that are in charge of disaster mitigation and post-disaster response planning

and management, and private organizations that insure and manage spatially-distributed as-

sets. Regional earthquake impact assessment requires knowledge about the distribution of

ground-motion intensities over the entire region. Ground-motion models that are used for

quantifying the hazard at a single site do not provide information on the spatial correlation

between ground-motion intensities, which is required for the joint prediction of intensities

at multiple sites. Statistical models that describe the spatial correlation between intensity

measures are available in the literature, and the mathematics behind models that estimate

the spatial correlation as a function of site separation distance has already been developed.

This study investigates whether a more sophisticated model of spatial correlation that in-

corporates features such as non-stationarity (variation of correlation with spatial location),

anisotropy (directional dependence) and directivity effects (different correlation models for

pulse-like and non-pulse-like ground motions) is warranted. Testing the need for these ad-

ditional features, however, requires a large number of ground-motion time histories. Since

real data are sparse, the current study uses simulated ground-motion time histories instead.

Overall, this study tests and provides a basis for some of the subtle assumptions commonly

used in spatial correlation models.

4.2 Introduction

The impact of earthquakes on a region rather than on just a single property at a specific site

is of interest to several public and private stakeholders. In the aftermath of a large event,

public entities such as government agencies and relief organizations, and private entities

such as corporations and utilities need to assess the potential damage on a regional scale

in order to plan their emergency response in a timely manner. These organizations also

need to assess regional risks from future earthquakes in order to determine risk mitigation

strategies such as retrofitting and acquiring insurance coverage.

Regional earthquake impact assessment requires knowledge about the joint ground-

motion hazard at multiple sites of interest spread over the entire region. Predictive equa-

tions have been developed for estimating the distribution of the ground-motion intensity

that an earthquake can cause at a single site [e.g., Boore and Atkinson, 2008]. Much less

attention has been devoted, however, to estimating the statistical dependence (spatial cor-

relation) between ground-motion intensities generated by an earthquake at multiple sites.

The spatial correlation between ground-motion intensity measures arises from many factors

including common source effects (e.g., a high stress-drop earthquake may generate ground-

motion intensities that are, on average, higher than the median values from events of the

same magnitude), common path effects (the seismic waves travel over a similar path from

the source to two nearby sites) and common site effects (similar non-linear amplification at

two nearby sites due to proximity). Modern ground-motion models implicitly account for


a part of the dependence via a specific inter-event error term, $\eta_i$, as follows [e.g., Boore

and Atkinson, 2008, Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and

Bozorgnia, 2008]:

$$\ln(Y_{ij}) = \ln(\bar{Y}_{ij}) + \sigma\,\varepsilon_{ij} + \tau\,\eta_i \qquad (4.1)$$

where $Y_{ij}$ denotes the ground-motion intensity of interest (e.g., the spectral

acceleration at period $T$) at site $j$ during earthquake $i$; $\bar{Y}_{ij}$ is the median value of $Y_{ij}$

predicted by the ground-motion model at site $j$ for earthquake $i$ (which depends on parameters

such as magnitude, distance of source from site, local site conditions); $\eta_i$ is the normalized

inter-event residual and $\varepsilon_{ij}$ is the normalized site-to-site intra-event residual, both of which

are standard normal; and $\tau$ and $\sigma$ are the corresponding standard deviations of the two residuals.

While the ground-motion model in Equation 4.1 partially accounts for the correlation of $Y_{ij}$

at different sites via the common $\eta_i$, there is a significant amount of unaccounted correlation

in the $\varepsilon_{ij}$’s, which is not quantified by the ground-motion models. It is of interest in

this study to further explore the properties of this correlation.

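An alternative formulation for Equation 4.1, which was common in older prediction

equations, is given by

$$\ln(Y_{ij}) = \ln(\bar{Y}_{ij}) + \tilde{\sigma}\,\tilde{\varepsilon}_{ij} \qquad (4.2)$$

where $\tilde{\varepsilon}_{ij}$ is a random variable called the normalized total residual, which represents both

the inter-event and the intra-event variability at site $j$ from earthquake $i$. Comparing

Equations 4.1 and 4.2, it is seen that

$$\tilde{\sigma} = \sqrt{\sigma^2 + \tau^2} \qquad (4.3)$$

$$\tilde{\varepsilon}_{ij} = \frac{\tau\,\eta_i + \sigma\,\varepsilon_{ij}}{\tilde{\sigma}} \qquad (4.4)$$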

This study intends to empirically estimate the correlation between the intra-event residuals ($\varepsilon_{ij}$) using ground-motion time histories. Since the inter-event residual is a constant across all sites during a given earthquake, the correlation between the $\tilde{\varepsilon}_{ij}$'s equals the correlation between the $\varepsilon_{ij}$'s [Jayaram and Baker, 2009a] (Chapter 3 of this thesis). While estimating spatial correlations, it is convenient to work directly with total residuals (Equation 4.2) since the values of $\tilde{\varepsilon}_{ij}$ can be computed directly from the ground-motion observations without knowledge of $\eta_i$.
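One way to see why the inter-event term drops out is sketched below. It writes the normalized total residual at site $u$ during earthquake $i$ as $\tilde{\varepsilon}_{iu}$ and the normalized intra-event residual as $\varepsilon_{iu}$, and it assumes (an assumption made here only for illustration) that $\sigma$ and $\tau$ take the same values at the two sites $u$ and $u'$. Substituting Equation 4.4 into the difference of two total residuals from the same earthquake gives:

```latex
\tilde{\varepsilon}_{iu} - \tilde{\varepsilon}_{iu'}
  = \frac{\tau\eta_i + \sigma\varepsilon_{iu}}{\tilde{\sigma}}
  - \frac{\tau\eta_i + \sigma\varepsilon_{iu'}}{\tilde{\sigma}}
  = \frac{\sigma}{\tilde{\sigma}}\left(\varepsilon_{iu} - \varepsilon_{iu'}\right)
```

The common term $\tau\eta_i$ cancels, so within a single earthquake the differences between total residuals are scaled differences between intra-event residuals, and they can be formed without knowing $\eta_i$.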

In the past, researchers have estimated the spatial correlations between the total residu-

als using recorded ground-motion data [e.g., Wang and Takada, 2005, Jayaram and Baker,

2009a]. Using geostatistical tools, Jayaram and Baker [2009a] identified various factors in-

fluencing the extent of the spatial correlation, and developed a predictive model that can be

used to select appropriate correlation estimates. While recorded ground motions represent

the natural source for estimating the extent of correlation between ground-motion intensi-

ties at two sites, they do not suffice for investigating the validity of assumptions such as

second-order stationarity (i.e., dependence of correlation on just the separation between

sites, and not on the actual location of the sites) and isotropy (i.e., invariance of correlation

with the orientation of the sites) that are commonly used in the spatial correlation models

developed so far. This is on account of the scarcity of ground-motion recordings for any

particular earthquake. This limitation can be partially overcome by using simulated ground

motions. Although the simulations may not be complete substitutes for recorded data, they

are still extremely useful for testing and refining existing correlation models (which re-

quires large amounts of data). This chapter describes the tests carried out to verify the

commonly-used assumptions of stationarity and isotropy using ground motions simulated

by Dr. Brad Aagaard of the United States Geological Survey based on the 1989 Loma

Prieta earthquake source model [Aagaard et al., 2008]. Further, tests carried out to verify

whether pulse-like ground motions that arise due to directivity effects and non-pulse-like

ground motions have similar correlation structures are also described. Information about

tests carried out using other sets of simulated ground motions can be found in [Bazzurro

et al., 2008].


4.3 Statistical estimation of spatial correlation

The current work uses geostatistical tools previously used by Jayaram and Baker [2009a]

to empirically estimate the spatial correlations of residuals from simulated ground-motion

time histories. These tools are described briefly in this section; a detailed discussion can

be found in, for example, Deutsch and Journel [1998] and Jayaram and Baker [2009a]

(Chapter 3 of this thesis).

Let $\tilde{\boldsymbol{\varepsilon}}$ denote the normalized total residuals distributed over space. The correlation structure of $\tilde{\boldsymbol{\varepsilon}}$ (equivalently, that of $\boldsymbol{\varepsilon}$) can be represented using a semivariogram, which is a measure of the dissimilarity between the residuals. Let $u$ and $u'$ denote two sites separated by distance vector $\mathbf{h}$. The semivariogram ($\gamma(u,u')$) is defined as follows:

$$\gamma(u,u') = \frac{1}{2} E\left[\{\tilde{\varepsilon}_u - \tilde{\varepsilon}_{u'}\}^2\right] \tag{4.5}$$

The semivariogram defined in Equation 4.5 is location-dependent and its inference requires repetitive realizations of $\tilde{\varepsilon}$ at locations $u$ and $u'$. Such repetitive measurements are, however, never available in practice. Hence, it is typically assumed that the semivariogram does not depend on site locations $u$ and $u'$, but only on their separation $\mathbf{h}$, to obtain a stationary semivariogram. The stationary semivariogram ($\gamma(\mathbf{h})$) can then be estimated as follows:

$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\{\tilde{\varepsilon}_u - \tilde{\varepsilon}_{u+h}\}^2\right] \tag{4.6}$$

A stationary semivariogram is said to be isotropic if it is a function of the separation distance ($h = \|\mathbf{h}\|$) rather than the separation vector $\mathbf{h}$. An isotropic, stationary semivariogram can be empirically estimated from a data set as follows:

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \{\tilde{\varepsilon}_{u_\alpha} - \tilde{\varepsilon}_{u_\alpha+h}\}^2 \tag{4.7}$$

where $\hat{\gamma}(h)$ is the experimental stationary isotropic semivariogram (estimated from a data set); $N(h)$ denotes the number of pairs of sites separated by $h$; and $\{\tilde{\varepsilon}_{u_\alpha}, \tilde{\varepsilon}_{u_\alpha+h}\}$ denotes the $\alpha$'th such pair.
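The estimator in Equation 4.7 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in this study: the function name, the planar (x, y) coordinates in km, and the grouping of pairs into fixed-width distance bins are all assumptions introduced here.

```python
import math

def empirical_semivariogram(coords, resid, bin_width, max_dist):
    """Return (bin centers, gamma-hat, pair counts) in the spirit of Equation 4.7.

    coords: list of (x, y) site coordinates in km (hypothetical input format).
    resid:  list of normalized residuals, one per site.
    Pairs are grouped into separation-distance bins of width bin_width.
    """
    n_bins = int(math.ceil(max_dist / bin_width))
    sq_sum = [0.0] * n_bins   # sum of squared residual differences per bin
    count = [0] * n_bins      # N(h): number of site pairs per bin
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            k = int(h // bin_width)
            if k < n_bins:
                sq_sum[k] += (resid[i] - resid[j]) ** 2
                count[k] += 1
    centers = [(k + 0.5) * bin_width for k in range(n_bins)]
    # Bins without any pairs are reported as NaN rather than zero.
    gamma = [s / (2 * c) if c else float("nan") for s, c in zip(sq_sum, count)]
    return centers, gamma, count

# Three collinear sites, 1 km apart, with made-up residuals.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
resid = [0.0, 1.0, 3.0]
centers, gamma, count = empirical_semivariogram(coords, resid, 1.0, 3.0)
# gamma[1] is 1.25 (two pairs at h = 1 km) and gamma[2] is 4.5 (one pair at h = 2 km).
```

The double loop visits every pair once, which is adequate for a sketch; a data set with tens of thousands of sites would call for a spatial index or subsampling of pairs.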


When empirically estimated, $\hat{\gamma}(h)$ only provides semivariogram values at discrete values of $h$, and hence a continuous function is usually fitted to the discrete values to obtain the semivariogram for continuous values of $h$. The exponential function shown below is commonly used for this purpose:

$$\gamma(h) = a\left[1 - \exp(-3h/b)\right] \tag{4.8}$$

where $a$ denotes the 'sill' of the semivariogram (which equals the variance of the data) and $b$ denotes the 'range' of the semivariogram (which equals the separation distance $h$ at which $\gamma(h)$ equals $0.95a$).
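As an illustration of fitting Equation 4.8 to discrete semivariogram values, the sketch below selects the range by a plain least-squares grid search. The text does not specify a fitting method, so the least-squares criterion, the candidate grid, and the function names are all assumptions made for this example.

```python
import math

def exponential_semivariogram(h, sill, range_km):
    """Equation 4.8: gamma(h) = a * (1 - exp(-3h / b))."""
    return sill * (1.0 - math.exp(-3.0 * h / range_km))

def fit_range(h_values, gamma_values, sill=1.0, candidates=None):
    """Pick the range b that minimizes the squared misfit to the discrete values.

    The default candidates (1 to 200 km in 0.5 km steps) are illustrative only.
    """
    if candidates is None:
        candidates = [0.5 * k for k in range(2, 401)]

    def misfit(b):
        return sum((exponential_semivariogram(h, sill, b) - g) ** 2
                   for h, g in zip(h_values, gamma_values))

    return min(candidates, key=misfit)

# Synthetic check: discrete values generated with a 30 km range are recovered.
h_obs = [2.0, 5.0, 10.0, 20.0, 40.0, 80.0]
g_obs = [exponential_semivariogram(h, 1.0, 30.0) for h in h_obs]
best_b = fit_range(h_obs, g_obs)
```

An unweighted fit treats all separation distances equally. If short separation distances matter most for the application, a weighted misfit or a manual fit may be preferable.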

It can be theoretically shown that the spatial correlation function (ρ(h)) for normalized

total residuals (and therefore, for normalized intra-event residuals) can be computed from

the semivariogram function as follows:

$$\gamma(h) = a\left(1 - \rho(h)\right) \tag{4.9}$$

Therefore, the correlations are completely defined by the semivariogram, which, in turn, is a function only of the range. (The sill is known to equal 1, which is the variance of the normalized residuals for which the semivariogram is constructed.) Moreover, note from Equations 4.8 and 4.9 that a larger range implies a smaller rate of increase in $\gamma(h)$ with $h$ and, consequently, a smaller rate of decay of correlation with separation distance.
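Combining the exponential model (Equation 4.8) with Equation 4.9 gives the correlation function in closed form. The worked values below use a range of $b = 30$ km, chosen here purely as an example:

```latex
\rho(h) = 1 - \frac{\gamma(h)}{a} = \exp\!\left(-\frac{3h}{b}\right),
\qquad
\rho(10\,\text{km}) = e^{-1} \approx 0.37,
\qquad
\rho(30\,\text{km}) = e^{-3} \approx 0.05
```

With this model, the correlation has decayed to about 0.05 at a separation equal to the range, which mirrors the definition of the range as the distance at which $\gamma(h)$ reaches $0.95a$.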

4.4 Results and discussion

This section describes the tests carried out to verify the commonly-used assumptions of sta-

tionarity and isotropy using ground motions simulated by Dr. Brad Aagaard of the United

States Geological Survey for the 1989 Loma Prieta earthquake source model [Aagaard

et al., 2008]. Further, tests carried out to verify whether pulse-like ground motions that

arise due to directivity effects and non-pulse-like ground motions have similar correlation

structures are also described. The simulated 1989 Loma Prieta data set contains ground-

motion time histories at 35,547 sites. Soft soil sites with Vs30 ≤ 500m/s are excluded


from the tests, due to concerns about the ability of the simulation methodology to capture

nonlinear soil behavior. Also, the current limitations in the simulation procedure allow us

to investigate the spatial correlation of spectral accelerations only at periods longer than 2s.

The total residuals, $\tilde{\varepsilon}$'s, are computed from the fault normal Sa(T) values with T = 2s, 5s,

7.5s, and 10s using the Boore and Atkinson [2008] ground-motion model. Using the geo-

statistical procedure described in the previous section, discrete semivariogram values are

estimated for these residuals, and an exponential function (Equation 4.8) is subsequently

fitted to the discrete values. Figure 4.1 shows a sample semivariogram obtained using the

residuals corresponding to Sa(T = 2s). This semivariogram has a sill of 1 and a range

of 30km. The ranges of the semivariograms obtained using the fault normal residuals at

the four different periods are plotted in Figure 4.2a. As mentioned earlier, the range is an

indicator of the extent of spatial correlation, and a larger range implies a larger amount

of spatial correlation. Figure 4.2a shows that the range, and therefore the amount of spatial correlation, increases with oscillator period. This trend is expected because the coherency between the period components of the ground motion increases with period [Der Kiureghian, 1996]. Note that the ranges obtained from this simulated 1989 Loma Prieta data set are slightly larger than those computed from recorded ground motions by Jayaram and Baker [2009a], shown in Figure 4.2b. This means that this simulated ground-motion data set is more spatially correlated than the real, recorded data sets analyzed so far. While uncovering the reasons for this apparent discrepancy is beyond the scope of this study, this finding can perhaps be used to enhance the simulation technique. Despite this limitation, it is assumed that the large number of simulated ground motions contains useful information for studying the isotropy and second-order stationarity assumptions of spatial-correlation models. These tests can be performed irrespective of the actual extent of correlations measured.

4.4.1 Effect of ground-motion component orientation on the semivariogram range

In order to test whether the extent of spatial correlation is a function of the orientation of

the ground-motion component, semivariograms of residuals are estimated using the fault


Figure 4.1: Semivariogram computed using the Sa(T=2s) residuals.

Figure 4.2: Ranges of semivariograms obtained using residuals computed from the (a) 1989 Loma Prieta simulations (b) recorded ground motions [Jayaram and Baker, 2009a].


normal, fault parallel, north-south and east-west components of the simulated ground mo-

tions. The ranges of these semivariograms are shown in Figure 4.3a. The range estimates

are essentially identical for Sa at T =2s, and do not show a significant variation with the

orientation at longer periods. Hence, most of the following analyses in this chapter are

based on the fault normal components of the simulated ground motions.

4.4.2 Testing the assumption of isotropy using directional semivariograms

Directional semivariograms of residuals [Deutsch and Journel, 1998, Jayaram and Baker, 2009a] (illustrated in Chapter 3, Appendix B) are obtained as shown in Equation 4.6, except that the estimates are obtained using only pairs $\{\tilde{\varepsilon}_{u_\alpha}, \tilde{\varepsilon}_{u_\alpha+h}\}$ such that the azimuth of the vector $\mathbf{h}$ is identical (or, strictly speaking, within a narrow band of azimuths) for all the pairs utilized. This study considers azimuth angles of 0°, 45° and 90°. If anisotropy is present in the data, the semivariograms along the pre-specified azimuths will differ from each other and from the omni-directional semivariogram (i.e., the semivariogram obtained using all pairs of points irrespective of the azimuth). Figure 4.3b compares the omni-directional semivariogram with the semivariograms obtained by considering azimuths of 0°, 45° and 90° for residuals for Sa(T = 2s). All the semivariograms are almost identical for separation distances below 10km and are reasonably close for separation distances between 10km and 20km. Recall that during the characterization of the distribution of ground-motion intensities over a region, it is more important to capture the effects of the spatial correlation at short separation distances since the extent of spatial correlation decreases rapidly with separation distance. Also, in addition to having low correlation, widely separated sites have little impact on each other due to an effective 'screening' of their influence by more closely-located sites [Deutsch and Journel, 1998]. As a result, since the semivariograms in Figure 4.3b are nearly identical at short separation distances, it can be reasonably concluded that, at least for this data set, the spatial correlations can be adequately represented using an isotropic model. Tests carried out using this Loma Prieta simulated data set for residuals computed for Sa at longer periods showed similar results [Bazzurro et al., 2008].
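The pair selection behind a directional semivariogram can be sketched as follows. This is an illustrative implementation under stated assumptions: coordinates are planar (east, north) offsets in km, the azimuth is measured clockwise from north, a pair and its reverse count as the same direction, and the tolerance `tol_deg` stands in for the "narrow band of azimuths". None of these choices is taken from the study.

```python
import math

def directional_semivariogram(coords, resid, azimuth_deg, tol_deg,
                              bin_width, max_dist):
    """Binned semivariogram using only pairs within tol_deg of azimuth_deg."""
    n_bins = int(math.ceil(max_dist / bin_width))
    sq_sum = [0.0] * n_bins
    count = [0] * n_bins
    target = azimuth_deg % 180.0          # directions are treated as axial
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dx = coords[j][0] - coords[i][0]   # east offset
            dy = coords[j][1] - coords[i][1]   # north offset
            h = math.hypot(dx, dy)
            if h == 0.0:
                continue                       # co-located sites have no azimuth
            az = math.degrees(math.atan2(dx, dy)) % 180.0
            off = abs(az - target)
            off = min(off, 180.0 - off)        # wrap-around, e.g. 179 vs 1 degree
            k = int(h // bin_width)
            if off <= tol_deg and k < n_bins:
                sq_sum[k] += (resid[i] - resid[j]) ** 2
                count[k] += 1
    gamma = [s / (2 * c) if c else float("nan") for s, c in zip(sq_sum, count)]
    return gamma, count

# Four sites on a unit square with made-up residuals.
coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
resid = [0.0, 1.0, 2.0, 4.0]
gamma_ns, count_ns = directional_semivariogram(coords, resid, 0.0, 10.0, 1.0, 2.0)
gamma_ew, count_ew = directional_semivariogram(coords, resid, 90.0, 10.0, 1.0, 2.0)
```

Setting `tol_deg` to 90 admits every pair and therefore reproduces the omni-directional estimate, which gives a simple consistency check when comparing directions.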


4.4.3 Testing the assumption of second-order stationarity

A spatial random function $Z$ is said to be second-order stationary if the random variables $Z_u$ and $Z_v$ (i.e., the random variables that represent the values of $Z$ at locations $u$ and $v$, respectively) have constant means and second-order statistics (i.e., the covariances) that depend only on the distance vector between $u$ and $v$ and not on the actual locations. In

other words, the covariance is the same between any two sites that are separated by the

same distance and direction (direction is not a concern for isotropic semivariograms), no

matter where the sites are located with respect to the causative fault. The assumption of

second-order stationarity is convenient while developing correlation models since it allows

the data available over the entire region of interest to be pooled together and because it

considerably simplifies the application of the spatial correlation models.

We know that the means of the residuals equal zero irrespective of the location of the

residuals. Therefore, second-order stationarity can be tested by comparing the spatial cor-

relation estimates obtained using residuals located in different spatial domains (i.e., using

data from two groups of sites, one close to the fault and one far from it). Similar semi-

variograms imply that the actual spatial location of the sites where the ground-motion

intensities are measured does not matter. In the current work, seven spatial domains are

defined based on the distance of the sites from the rupture: Domain 1 includes sites be-

tween 0-20km while Domains 2-7 consist of sites between 20-40km, 40-60km, 60-80km,

80-120km, 120-160km and 160-200km of the rupture, respectively. Note that, as with his-

tograms, the selection of the distance bins is somewhat arbitrary. Very narrow bins may

provide results that are both unstable because of scarcity of data and potentially influenced

by local effects (e.g., a cluster of sites with large residuals). Conversely, very broad bins

may not detect any trend in the data, even if there is one. Here, the width of the domains is

selected judiciously to avoid both the above pitfalls.
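The grouping of sites into the seven rupture-distance domains can be sketched as below. The function names are hypothetical, and assigning a site that falls exactly on a boundary (e.g., 20 km) to the nearer domain is an assumption, since the text does not state how boundary values are treated.

```python
import bisect

# Upper edges (km) of the seven rupture-distance domains described above.
DOMAIN_EDGES_KM = [20, 40, 60, 80, 120, 160, 200]

def domain_of(rupture_distance_km):
    """Return the domain number (1-7), or None for sites beyond 200 km."""
    k = bisect.bisect_left(DOMAIN_EDGES_KM, rupture_distance_km)
    return k + 1 if k < len(DOMAIN_EDGES_KM) else None

def group_by_domain(rupture_distances_km):
    """Map each domain number to the list of site indices that fall in it."""
    groups = {}
    for idx, d in enumerate(rupture_distances_km):
        dom = domain_of(d)
        if dom is not None:
            groups.setdefault(dom, []).append(idx)
    return groups

# Made-up rupture distances for five sites.
groups = group_by_domain([5.0, 30.0, 150.0, 250.0, 10.0])
```

A semivariogram would then be estimated separately from the residuals of each group, and the fitted ranges compared across domains.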

The 1989 Loma Prieta fault normal ground motions are used to compute ε values at

four different periods, namely, 2s, 5s, 7.5s and 10s. Semivariograms are constructed for

each spatial domain using only the residuals at sites that belong to that domain, and the

estimated ranges are reported in Figure 4.4a. It can be seen that the ranges estimated using

residuals at sites within 20-160km of the rupture are reasonably close to the range estimated


Figure 4.3: (a) Ranges computed using residuals at different orientations (b) Omni-directional (i.e., obtained using all pairs of points, irrespective of the azimuth) and directional semivariograms computed using residuals for Sa(T = 2s).

Figure 4.4: (a) Ranges computed using residuals from different spatial domains (b) Ranges computed using pulse-like and non-pulse-like near-fault ground motions.


using all fault normal residuals ('all-site ranges'). There are more significant differences, however, between the all-site ranges and the ranges computed using residuals at sites that are very close to or very far away from the rupture. Semivariograms computed using sites

that are farther than 160 km from the rupture show significantly smaller ranges, as do the

semivariograms computed using sites that are within 20 km of the rupture. The ground-

motion intensities at sites farther than 160 km from the rupture are generally very small

and, therefore, accounting for the reduced correlations at these extremely far-off sites is

certainly not critical. It is, however, important to further analyze the smaller correlations

observed at near-fault locations. Intuitively, it is reasonable to expect path effects and small-

scale variations to reduce spatial correlation between ground motions at near-fault sites. At

sites farther than 20km, the path effects and small-scale variations have less differential

influence, thereby resulting in larger ranges and, therefore, larger correlations.

4.4.4 Effect of directivity on spatial correlation

Ground motions at near-fault sites are often influenced by directivity effects, resulting in

large amplitude pulse-like ground motions in the forward-directivity region [Somerville

et al., 1997]. Most ground-motion models, however, do not explicitly capture this effect.

Therefore, the residuals in such cases may be more correlated because of the additional

prediction errors at sites influenced by directivity that are not captured by the ground-

motion model. This study intends to verify whether the spatial correlation between pulse-

like ground motions is different from that between non-pulse-like ground motions.

Baker [2007a] developed a technique that uses wavelet analysis to identify ground mo-

tions with pulses. Although not all the pulses identified by this technique are due to direc-

tivity effects, this approach provides a reasonable data set for studying the potential impact

of directivity. The wavelet analysis procedure of Baker [2007a] is used to identify 434

pulses in the fault normal components of 1989 Loma Prieta simulations (incidentally, the

wavelet analysis procedure also identified 121 pulses in the fault parallel direction, which

are not utilized here). Residuals at four different periods are computed based on these

ground motions and semivariograms of the residuals are developed. The estimated ranges

(shown in Figure 4.4b) of these semivariograms are smaller than those estimated based


on all the fault normal residuals, but similar to those estimated based on ground motions

at all the sites that are within 20 km of the rupture (Figure 4.4a). For comparison, Figure 4.4b also shows the ranges obtained using ground motions at all the sites that do not

have pulse-like ground motions, but are within 20 km from the rupture (called near-fault

non-pulse records in the legend). It is seen that the ranges obtained in this case are similar

to the ranges obtained using pulses. This indicates that the effect of directivity does not

substantially alter the ranges of the semivariograms. It needs to be verified whether similar

observations can be made using recorded ground motions as well.

4.5 Conclusions

This study investigates the validity of assumptions commonly used in spatial correlation models, namely stationarity (no variation of correlation with location) and isotropy (no directional dependence). Testing whether models need additional features such as non-stationarity and anisotropy, however, requires a large number of ground-motion time histories. Since real data are sparse, the tests can instead be performed using simulated ground motions. This chapter describes the tests performed using ground-motion time histories simulated by Dr. Brad Aagaard for the 1989 Loma Prieta earthquake source model. Other data sets were considered in Bazzurro et al. [2008].

Geostatistical tools were used to measure the extent of spatial correlation between spec-

tral accelerations using the simulated ground-motion data set. The correlations were esti-

mated using different orientations of the time histories, namely, fault normal, fault parallel,

north-south and east-west, and were found to be similar in all four cases. The assumption

of isotropy of spatial correlations was studied using directional semivariograms, and was

found to be reasonable. The correlations were seen to be smaller than average between

sites located extremely close to the fault rupture. Intuitively, it is reasonable to expect path

effects and small-scale variations to reduce spatial correlation between ground motions at

near-fault sites. Incidentally, the ground-motion intensities at sites very far away from the rupture were also found to be less spatially correlated than average, but this finding is not of much practical importance. It is important, however, to further investigate the smaller

correlations seen at near-fault sites. The pulse-identification algorithm of Baker [2007a]


was used for identifying pulse-like ground motions, and the correlations between pulse-

like and non-pulse-like ground motions were compared. The study, however, did not find

significant differences between the correlations in these two cases. Although some addi-

tional investigation using recorded time histories is needed, this study tests and provides a

basis for some of the subtle assumptions commonly used in spatial correlation models.

Chapter 5

Simulation of spatially-correlated ground-motion intensities with and without consideration of recorded intensity values

5.1 Abstract

Quantifying the distribution of ground-motion intensities that might exist over a spatially

distributed region during a future earthquake is important for several practical applications

such as risk assessment and risk mitigation of spatially-distributed systems. Analytically,

this is more complicated than a comparable quantification for only a single site, due to the

interdependence between the intensities at multiple sites. As a result, simulation-based

techniques are often used to quantify this distribution using probabilistically generated

representative ground-motion intensity maps for the region. This chapter discusses two

techniques, namely, single-step simulation and sequential simulation, for generating such

intensity maps.

It may also be of interest to estimate likely ground-motion intensities over a region in

the wake of an earthquake, when ground-motion intensities have been recorded at one or


more locations in the region. These intensity estimates are useful, for instance, in deter-

mining optimal post-earthquake response strategies. In such cases, it is possible to use

ground-motion intensities recorded during the earthquake to improve the ground-motion

intensity prediction at sites where recordings are not available. This chapter discusses a se-

quential simulation technique for generating ground-motion intensity maps incorporating

the information about the recorded intensities.

5.2 Introduction

Quantifying the distribution of ground-motion intensities that might exist over a spatially-

distributed region during a future earthquake is of great interest for several practical ap-

plications. This is important, for instance, to predict (or estimate after an earthquake) the

damage to portfolios of buildings and lifelines and the number of injuries and casualties

in a certain region. This is, however, more complicated than a comparable quantification

at a single site on account of the spatial correlation between the ground-motion intensities

at two different sites. Hence, the distribution of spatial ground-motion intensities is of-

ten quantified using simulation-based approaches that involve probabilistically generating

representative ground-motion intensity maps (a collection of intensities at all the sites of

interest) for future earthquakes. For a given earthquake, the intensities are predicted using

ground-motion models which take the following form [e.g., Boore and Atkinson, 2008]:

$$\ln(Sa_i) = \ln(\overline{Sa}_i) + \sigma_i\varepsilon_i + \tau_i\eta_i \tag{5.1}$$

where $Sa_i$ denotes the spectral acceleration (at the period of interest) at site $i$; $\overline{Sa}_i$ denotes the predicted (by the ground-motion model) median spectral acceleration, which depends on parameters such as magnitude, distance, period and local-site conditions; $\varepsilon_i$ denotes the normalized intra-event residual and $\eta_i$ denotes the normalized inter-event residual. Both $\varepsilon_i$ and $\eta_i$ are univariate normal random variables with zero mean and unit standard deviation.

σi and τi are standard deviation terms that are estimated as part of the ground-motion model

and are functions of the spectral period of interest, and in some models also functions of the


earthquake magnitude and the distance of the site from the rupture. The term σiεi is called

the intra-event residual and the term τiηi is called the inter-event residual. The inter-event

residual is a constant across all the sites for any particular earthquake event. The sum of

the inter-event residual and the intra-event residual is called the total residual.

For a given earthquake, ground-motion intensities can be predicted by combining the

median intensity estimate with simulated values (realizations) of the normalized inter-event

and intra-event residuals, in accordance with Equation 5.1. Past research has indicated that

the normalized intra-event residuals at two different sites are correlated, and the extent of

this correlation depends on the separation distance between the sites [e.g., Jayaram and

Baker, 2009a, Wang and Takada, 2005, Boore et al., 2003] (Chapter 3 of this thesis). Any

simulation of the normalized intra-event residuals must account for this spatial correla-

tion in order to accurately quantify the regional ground-motion hazard [e.g., Jayaram and

Baker, 2010, Park et al., 2007] (Chapter 6 of this thesis). For illustration, Figure 5.1 shows

a sample simulated ground-motion intensity map (the intensity measure used here is Sa(1s),

the spectral acceleration at a period of 1 second) for a magnitude 8 earthquake on the San

Andreas fault. Figure 5.1a shows the median Sa(1s) values estimated using the Boore and

Atkinson [2008] ground-motion model. Figure 5.1b shows a sample realization of the sum

of the inter-event and the intra-event residuals, obtained considering spatial correlation.

Figure 5.1c shows the ground-motion intensities over the region obtained by combining

the median intensities and the simulated residuals.

While the above simulation technique is used for generating ground-motion maps in

the absence of any recorded intensities (say, for a future earthquake), it is often of interest

to quantify the ground-motion intensities over a region following an earthquake (e.g., for

determining the optimal post-earthquake emergency response strategy). Ground-motion

intensity predictions in such cases can be significantly improved (in other words, the un-

certainty in the predictions can be reduced) by utilizing the knowledge about the recorded

intensities.

This chapter primarily focuses on the simulation of correlated residuals with and with-

out consideration of recorded ground-motion intensities. A single-step simulation tech-

nique and a sequential simulation technique are described for simulating residuals in the


Figure 5.1: Ground-motion intensity map simulation: (a) median intensities (b) spatially correlated normalized total residuals and (c) total intensities.


absence of recorded intensities. The sequential simulation technique is subsequently ex-

tended to incorporate information from recorded intensities.

This chapter is organized as follows. Sections 5.3.1 and 5.3.2 describe procedures for

simulating a vector of spatially-correlated intra-event residuals and the inter-event resid-

ual for future earthquakes. Section 5.4 describes an importance sampling procedure for

spatially-correlated residuals (used by Jayaram and Baker [2010] for improving the com-

putational efficiency of the lifeline risk assessment process). Section 5.5 describes a simu-

lation procedure that uses information about recorded ground-motion intensities for simu-

lating post-earthquake residuals.

5.3 Simulation of correlated normalized residuals without consideration of recorded ground-motion intensities

This section describes a single-step and a sequential simulation technique for simulating

correlated normalized residuals.

5.3.1 Single-step simulation technique

Simulation of normalized intra-event residuals

Chapter 2 [Jayaram and Baker, 2008] showed that a vector of spatially-distributed normalized intra-event residuals $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_p)$ (where $p$ denotes the total number of sites of interest) follows a multivariate normal distribution. This distribution is solely defined by the mean and the variance of the marginal distributions (i.e., the mean and the variance of $\varepsilon_i$, which are zero and one, respectively), and the correlation between all $\varepsilon_i$ and $\varepsilon_j$ pairs. The

correlation between the residuals is typically a function of the separation distance between

the residuals, and can be obtained from empirical spatial correlation models [e.g., Jayaram

and Baker, 2009a] (Chapter 3).

The single-step simulation technique makes use of this fact to simulate normalized

residuals as a vector of correlated standard normal random variables. In practice, this

is done using a computer function if available. For instance, the command ‘mvnrnd’ in


MATLAB accepts an input mean vector and covariance matrix and outputs a vector of correlated normally-distributed random variables. The mean in this case is a vector of $p$ zeros, expressed as follows:

$$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{5.2}$$

The covariance matrix of $\boldsymbol{\varepsilon}$, denoted $\boldsymbol{\Sigma}$, can be expressed as follows:

$$\boldsymbol{\Sigma} = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix} \tag{5.3}$$

where $\rho_{ij}$ is the correlation between $\varepsilon_i$ and $\varepsilon_j$. Chapter 3 [Jayaram and Baker, 2009a] expressed $\rho_{ij}$ as $\exp(-3h/b)$, where $h$ is the separation distance between sites $i$ and $j$, and $b$ is called the range parameter, which controls the rate of decay of correlation with distance.
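Assembling the matrix in Equation 5.3 from site locations can be sketched as follows. The planar coordinates in km, the function name, and the 30 km range are illustrative assumptions; in practice the range would come from a spatial-correlation model.

```python
import math

def correlation_matrix(coords, range_km):
    """Matrix with entries rho_ij = exp(-3 * h_ij / b), as in Equation 5.3."""
    p = len(coords)
    return [[math.exp(-3.0 * math.dist(coords[i], coords[j]) / range_km)
             for j in range(p)]
            for i in range(p)]

# Three hypothetical sites; sites 1 and 2 are 10 km apart, sites 1 and 3 are 30 km apart.
sites = [(0.0, 0.0), (10.0, 0.0), (0.0, 30.0)]
sigma = correlation_matrix(sites, range_km=30.0)
```

The diagonal entries equal one because each site is at zero distance from itself, and the matrix is symmetric because the separation distance does not depend on the order of the two sites.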

The random variables can also be simulated, in principle, by first simulating independent standard normal random variables (for instance, using the Box-Muller transform as described in Law and Kelton [2007]), denoted $\mathbf{n} = [n_1, n_2, \cdots, n_p]$, and by subsequently inducing the desired correlation between the independent variables using the Cholesky triangle. The procedure used to induce this correlation is described below.

$\boldsymbol{\Sigma}$ can be decomposed using the Cholesky decomposition [Law and Kelton, 2007] as follows:

$$\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^t \tag{5.4}$$

where $\mathbf{L}$ is a lower triangular matrix of size $p$ by $p$ and $(\cdot)^t$ denotes the transpose operation. The vector of independent standard normal variable realizations ($\mathbf{n}$) can be converted to a vector of correlated standard normal variables ($\mathbf{e} = [e_1, e_2, \cdots, e_p]$) as follows:

$$\mathbf{e}^t = \mathbf{L}\mathbf{n}^t \tag{5.5}$$

This vector $\mathbf{e}$ serves as a realization of $\boldsymbol{\varepsilon}$.
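Equations 5.4 and 5.5 can be sketched with a hand-written Cholesky factorization. This is an illustration rather than a recommended implementation: a numerical library routine would normally be used instead, the function names are hypothetical, and the independent normals are drawn with Python's `random.gauss` rather than an explicit Box-Muller step.

```python
import math
import random

def cholesky_lower(sigma):
    """Lower-triangular L such that sigma = L L^t (Equation 5.4)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def correlated_normals(sigma, rng):
    """One realization e = L n of the correlated residuals (Equation 5.5)."""
    L = cholesky_lower(sigma)
    n = [rng.gauss(0.0, 1.0) for _ in sigma]        # independent N(0, 1) draws
    return [sum(L[i][k] * n[k] for k in range(i + 1))
            for i in range(len(sigma))]

# Two sites with an assumed correlation of 0.5.
sigma = [[1.0, 0.5], [0.5, 1.0]]
e = correlated_normals(sigma, random.Random(0))
```

When many realizations are needed for the same set of sites, the factor `L` would be computed once and reused; it is recomputed inside `correlated_normals` here only to keep the sketch short.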

Simulation of the normalized inter-event residuals

Following standard conventions, since the inter-event residual is a constant across all the

sites during a single earthquake [e.g., Abrahamson and Youngs, 1992], the simulated nor-

malized inter-event residuals should satisfy the following relation (which does not assume

that the τi’s are equal in order to be compatible with ground-motion models such as that of

Abrahamson and Silva [2008]):

$$\eta_i = \frac{\tau_1}{\tau_i}\eta_1 \tag{5.6}$$

Thus the normalized inter-event residuals can be simulated by first simulating η1 from a

univariate normal distribution with zero mean and unit standard deviation (using randn or

mvnrnd in MATLAB for instance), and by subsequently evaluating other normalized inter-

event residuals using Equation 5.6.

Incidentally, if all the τi’s are equal, the ηi’s will be equal as well (=η). In this case,

the value of η can be simulated as a univariate normal variable with zero mean and unit

standard deviation.
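A short sketch of Equation 5.6 follows. The $\tau_i$ values are made up for illustration and do not come from any ground-motion model; the function name is likewise hypothetical.

```python
import random

def simulate_inter_event(taus, rng):
    """Normalized inter-event residuals eta_i = (tau_1 / tau_i) * eta_1 (Equation 5.6)."""
    eta_1 = rng.gauss(0.0, 1.0)               # single draw shared by all sites
    return [taus[0] / tau_i * eta_1 for tau_i in taus]

taus = [0.25, 0.5, 0.25]                      # illustrative tau_i values
etas = simulate_inter_event(taus, random.Random(42))
inter_event = [t * n for t, n in zip(taus, etas)]   # tau_i * eta_i at each site
```

By construction, every entry of `inter_event` equals $\tau_1\eta_1$, which is the requirement that the inter-event residual be constant across sites for a given earthquake.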

Summary of the steps involved

In summary, the steps involved in the single-step simulation procedure are as follows:

• Step 1: Estimate the mean (Equation 5.2) and the covariance (Equation 5.3) matrices

of the residuals. The covariances can be computed from a spatial-correlation model

such as that of Jayaram and Baker [2009a].

• Step 2: Use a computer function such as mvnrnd to generate p jointly normally-

distributed random variables using the mean and the covariance matrices. If a com-

puter function is not available, the variables can be simulated by first simulating


p independent variables, and by subsequently inducing the correlation using the Cholesky triangle, as described earlier in the section.

• Step 3: Simulate a normalized inter-event residual η1 from a univariate normal dis-

tribution with zero mean and unit standard deviation (using the same approach used

in Step 2). Estimate the other ηi’s using Equation 5.6. If all the τi’s are equal, all the

ηi’s will equal η1.

• Step 4: Obtain the spectral acceleration at all the sites by combining the medians and

the normalized inter- and intra-event residuals according to Equation 5.1.
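The four steps above can be strung together as in the sketch below. It is a minimal illustration under stated assumptions: sites are given as planar coordinates in km, the exponential correlation model with a user-supplied range is used for the covariances, and the medians and standard deviations are plain input lists that would in practice come from a ground-motion model. The function and argument names are hypothetical.

```python
import math
import random

def simulate_ln_sa(coords, ln_median, sigmas, taus, range_km, rng):
    """One realization of ln(Sa) at every site, following Steps 1 to 4."""
    p = len(coords)
    # Step 1: the mean is zero; covariances come from rho = exp(-3h / b).
    cov = [[math.exp(-3.0 * math.dist(coords[i], coords[j]) / range_km)
            for j in range(p)] for i in range(p)]
    # Step 2: Cholesky factor of the covariance, then e = L n.
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (math.sqrt(cov[i][i] - s) if i == j
                       else (cov[i][j] - s) / L[j][j])
    n = [rng.gauss(0.0, 1.0) for _ in range(p)]
    eps = [sum(L[i][k] * n[k] for k in range(i + 1)) for i in range(p)]
    # Step 3: draw eta_1 and obtain the other eta_i from Equation 5.6.
    eta_1 = rng.gauss(0.0, 1.0)
    etas = [taus[0] / t * eta_1 for t in taus]
    # Step 4: combine medians and residuals according to Equation 5.1.
    return [ln_median[i] + sigmas[i] * eps[i] + taus[i] * etas[i]
            for i in range(p)]

# Three hypothetical sites with made-up medians and standard deviations.
sites = [(0.0, 0.0), (5.0, 0.0), (0.0, 12.0)]
ln_sa = simulate_ln_sa(sites, [-1.0, -1.2, -1.5], [0.5, 0.5, 0.5],
                       [0.3, 0.3, 0.3], 30.0, random.Random(7))
sa = [math.exp(v) for v in ln_sa]             # back to spectral acceleration
```

Two sites at exactly the same location make the covariance matrix singular, so the factorization in Step 2 would fail; such duplicates would need to be merged beforehand.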

5.3.2 Sequential simulation technique

Sequential simulation of intra-event residuals

The single-step simulation technique described previously is computationally inefficient

because the Cholesky decomposition (Equation 5.4) is an $O(p^3)$ operation (which is a

problem when p is large). One alternative to the single-step simulation technique is the

sequential simulation technique [Goovaerts, 1997, Deutsch and Journel, 1998] that lends

itself to performing computationally efficient simulations. In this technique, the residu-

als are simulated one at a time, conditioned on the residuals previously simulated. This

conditioning ensures that correlation between the residuals is appropriately accounted for.

The residuals can be simulated in any order as long as each residual is conditioned on ev-

ery other previously simulated residual. The following paragraphs describe the sequential

simulation technique for obtaining p intra-event residuals.

First, obtain e1 (a realization of ε1) by sampling from a univariate normal distribution

with zero mean and unit standard deviation. The other ei’s can be obtained using the proce-

dure described below for simulating εi assuming that ε1,ε2, · · · ,ε(i−1) have been previously

simulated.

Let e1,e2, · · · ,e(i−1) denote the simulated values of the normalized intra-event residuals

ε1,ε2, · · · ,ε(i−1). Since the ε’s follow a multivariate normal distribution, εi conditioned on

[ε1,ε2, · · · ,ε(i−1)] follows a univariate normal distribution with the following conditional

mean [Johnson and Wichern, 2007]:

$$E\left[\varepsilon_i \mid \varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{(i-1)}\right] = \boldsymbol{\Sigma}_{iO}\,\boldsymbol{\Sigma}_{OO}^{-1}\,\mathbf{e}_{O} \tag{5.7}$$

where $\mathbf{e}_O = [e_1, e_2, \cdots, e_{(i-1)}]^t$, ΣOO is the covariance matrix of [ε1,ε2, · · · ,ε(i−1)], and ΣiO

is a row vector of covariances between εi and [ε1,ε2, · · · ,ε(i−1)]. The symbol O denotes

the set of sites at which the residuals have been previously simulated. ΣiO is thus defined

as follows:

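$$\boldsymbol{\Sigma}_{iO} = \begin{bmatrix} \rho_{i1} & \rho_{i2} & \cdots & \rho_{i(i-1)} \end{bmatrix} \tag{5.8}$$

and ΣOO is defined as follows:

$$\boldsymbol{\Sigma}_{OO} = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1(i-1)} \\ \rho_{21} & 1 & \cdots & \rho_{2(i-1)} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{(i-1)1} & \rho_{(i-1)2} & \cdots & 1 \end{bmatrix} \tag{5.9}$$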

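The variance of εi conditioned on [ε1,ε2, · · · ,ε(i−1)] is expressed as follows:

$$\operatorname{var}\left[\varepsilon_i \mid \varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{(i-1)}\right] = 1 - \boldsymbol{\Sigma}_{iO}\,\boldsymbol{\Sigma}_{OO}^{-1}\,\boldsymbol{\Sigma}_{Oi} \tag{5.10}$$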

where ΣOi is the transpose of ΣiO.
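As a small worked illustration of Equations 5.7 and 5.10 (the numbers are hypothetical), consider the second residual, i = 2, conditioned on the first. The set O then contains only site 1, so ΣiO = [ρ21] and ΣOO = [1], and the conditional moments reduce to:

$$E\left[\varepsilon_2 \mid \varepsilon_1\right] = \rho_{21}\, e_1, \qquad \operatorname{var}\left[\varepsilon_2 \mid \varepsilon_1\right] = 1 - \rho_{21}^2$$

With an assumed correlation ρ21 = 0.6 and a simulated value e1 = 1.5, the conditional mean is 0.9 and the conditional variance is 0.64 (a standard deviation of 0.8). A strongly correlated neighbor therefore both shifts and narrows the distribution from which e2 is drawn.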

ei is now obtained as a realization from a univariate normal distribution with the mean in

Equation 5.7 and the variance in Equation 5.10. This simulation can be performed using the

Box-Muller method [Law and Kelton, 2007] or using a computer function such as ‘randn’

in MATLAB.

As mentioned earlier, the primary reason for using the sequential simulation technique

is to achieve higher computational efficiency. The basic sequential simulation technique as

described above, however, does not provide much benefit since it requires the computation

of the inverses of several ΣOO matrices, some of which will be large if the number of

conditioning sites is large. Hence, in practice, the number of conditioning sites is always


Figure 5.2: Illustration of the sequential step procedure.

kept small even if a large number of residuals have previously been simulated. This is

typically done by conditioning εi on the q closest ε’s (closest in terms of the Euclidean

distance of the associated sites). This is reasonable because it has been observed in practice

that εi is screened by nearby ε’s from the effect of far away ε’s [Goovaerts, 1997]. Due to

this screening effect, the far-away residuals can be ignored without significantly affecting

the statistical properties of the simulated residuals. The value of q is typically chosen to be

between 10 and 30 to ensure accuracy and computational efficiency. Alternatively, we can

also condition εi on only the residuals at sites that are within a distance r from site i, as

illustrated in Figure 5.2. A typical value for r is 30 km.
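The two ways of limiting the conditioning set can be sketched as follows. This is an illustrative helper, not code from this work: the function name, the planar site coordinates in kilometers, and the convention that sites are simulated in index order are all assumptions.

```python
import math


def conditioning_set(sites, i, q=None, r_km=None):
    """Indices of previously simulated sites (0 .. i-1) used to condition site i.

    q    : keep at most the q closest previously simulated sites
    r_km : keep only previously simulated sites within r_km of site i
    """
    # Sort previously simulated sites by Euclidean distance to site i.
    dists = sorted((math.dist(sites[i], sites[k]), k) for k in range(i))
    if r_km is not None:
        dists = [(d, k) for d, k in dists if d <= r_km]
    if q is not None:
        dists = dists[:q]
    return [k for _, k in dists]
```

Either criterion (or both together) can be applied; the returned indices define the set O used in Equations 5.7 and 5.10.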

When the residuals are not conditioned on all other previously simulated residuals,

Goovaerts [1997] reports that the order in which the residuals are simulated should be

randomized during the simulation of each ground-motion intensity map to avoid any bias.

This is, however, computationally inefficient since this necessitates the computation and

inversion of many more ΣiO and ΣOO matrices (since the matrices now vary from simulated

map to simulated map). For all practical purposes, the authors’ experience shows that the

use of a single fixed order causes negligible bias in the results, and hence can be used in

order to save significant computational effort (noting that ΣiO and ΣOO⁻¹ are identical across


simulations if a fixed order is assumed).

The inter-event residual simulation is identical to that described in Section 5.3.1.

Summary of the steps involved

In summary, the steps involved in the sequential simulation technique are as follows:

• Step 1: Simulate e1 (a realization of ε1) from a univariate normal distribution with

zero mean and unit standard deviation. Set variable i = 2.

• Step 2: Simulate ei conditioned on the previously simulated residuals ε1,ε2, ..,ε(i−1)

(or just the closest q residuals or the residuals that are within a distance r from site i)

from a univariate normal distribution with the mean in Equation 5.7 and the variance

in Equation 5.10.

• Step 3: Increment i by 1. If i is less than or equal to p, go to Step 2; otherwise go to Step 4.

• Step 4: Simulate the normalized inter-event residuals as described in Section 5.3.1.

• Step 5: Obtain the spectral acceleration at all the sites by combining the medians and

normalized inter- and intra-event residuals according to Equation 5.1.
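Steps 1 to 3 of the sequential procedure can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation used in this work: the exponential correlation function exp(−3h/range) is a hypothetical stand-in for a calibrated model, sites are simulated in a fixed index order, and each residual is conditioned on the q closest previously simulated residuals. Steps 4 and 5 (inter-event residual and combination with the medians) are omitted.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def conditional_moments(rho, i, cond, e):
    """Conditional mean (Eq. 5.7) and variance (Eq. 5.10) of eps_i.

    rho(a, b) returns the correlation between sites a and b; cond lists the
    conditioning sites; e holds the residuals simulated so far.
    """
    if not cond:
        return 0.0, 1.0
    s_oo = [[rho(a, b) for b in cond] for a in cond]
    s_io = [rho(i, a) for a in cond]
    w = solve(s_oo, s_io)  # Sigma_OO^{-1} Sigma_Oi (Sigma_OO is symmetric)
    mean = sum(wk * e[a] for wk, a in zip(w, cond))
    var = 1.0 - sum(wk * s for wk, s in zip(w, s_io))
    return mean, var


def simulate_sequential(sites, range_km, q, rng):
    """Simulate normalized intra-event residuals one site at a time."""
    def rho(a, b):
        return math.exp(-3.0 * math.dist(sites[a], sites[b]) / range_km)

    e = []
    for i in range(len(sites)):
        # q closest previously simulated sites (empty for the first site).
        near = sorted(range(i), key=lambda k: math.dist(sites[i], sites[k]))[:q]
        mean, var = conditional_moments(rho, i, near, e)
        e.append(rng.gauss(mean, math.sqrt(max(var, 0.0))))
    return e
```

Solving the small linear system directly avoids forming ΣOO⁻¹ explicitly; with a fixed simulation order the weights `w` could also be computed once and reused across maps.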

5.4 Importance sampling of normalized intra-event residuals

Sometimes, it is of interest to preferentially sample ground-motion intensity maps with pos-

itive residuals in order to evaluate the performance of structures and lifelines under extreme

events [e.g., Jayaram and Baker, 2010, 2009b]. In such cases, the normalized residuals can

be sampled from an alternate distribution that produces a larger number of positive residu-

als. This procedure of using an alternate distribution for preferential sampling is known as

importance sampling [Law and Kelton, 2007].


Figure 5.3: The alternate sampling distribution (marginal distribution) used for the importance sampling of residuals [Jayaram and Baker, 2010].

Jayaram and Baker [2010] sampled from a multivariate normal distribution with a pos-

itive mean for the marginal distributions of the normalized intra-event residuals as the al-

ternate sampling distribution (Figure 5.3), in order to preferentially generate positive resid-

uals. This choice was based on the simplicity of the corresponding importance sampling

weights, a parameter (discussed subsequently) that needs to be computed as part of the

importance sampling procedure.

There are minor differences between the sampling procedures using the original (zero

mean distribution) and the alternate (positive mean distribution) sampling distributions, and

these are listed below.

In the single-step simulation technique, Equation 5.5 is replaced by the following equa-

tion:

$$\mathbf{e} = m\mathbf{1}_p + \mathbf{L}\mathbf{n}^t \tag{5.11}$$

where m is the mean of the alternate sampling distribution, and 1p denotes a column vector

of ones of size p. Since the vector n is sampled from a zero mean distribution, it can be

noted that the mean of the sampled residuals e equals the mean of the alternate sampling

distribution.


Equation 5.2 is replaced by

$$\boldsymbol{\mu} = \begin{bmatrix} m \\ m \\ \vdots \\ m \end{bmatrix} \tag{5.12}$$

If the sequential simulation technique is used, Equation 5.7 is replaced by the following

equation:

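$$E\left[\varepsilon_i \mid \varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{(i-1)}\right] = m + \boldsymbol{\Sigma}_{iO}\,\boldsymbol{\Sigma}_{OO}^{-1}\left(\mathbf{e}_{O} - m\mathbf{1}_p\right) \tag{5.13}$$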

It is to be noted that Equation 5.10 remains unaltered.

The rest of this section discusses the computation of the importance sampling weight

for this choice of the alternate sampling distribution. The importance sampling weight can

be viewed as a correction factor that accounts for the differences between the sampling

distribution and the true distribution. Suppose that we are interested in using a simulation-

based approach to compute the expected value of an arbitrary function of ε denoted q(ε).

Let f(ε) denote the probability density function (PDF) of the normalized intra-event residuals,

and g(ε) denote the alternate PDF. The expected value of q(ε) (denoted H) can be

evaluated as follows:

$$H = \int_D q(\mathbf{e})\, f(\mathbf{e})\, d\mathbf{e} \tag{5.14}$$

where D is the set of all values taken by e.

The integral can be rewritten as follows:

$$H = \int_D q(\mathbf{e})\, \frac{f(\mathbf{e})}{g(\mathbf{e})}\, g(\mathbf{e})\, d\mathbf{e} \tag{5.15}$$

Equation 5.15 shows that H can be computed using samples from the alternate PDF

in place of samples from the true PDF if the function q(e) is multiplied by the correction

factor f(e)/g(e). This correction factor is called the importance sampling weight.

In the specific application discussed in this chapter, the distributions f(e) and g(e) are

known to be multivariate normal, and are expressed as follows:


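$$f(\mathbf{e}) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}\,\mathbf{e}^t\,\boldsymbol{\Sigma}^{-1}\,\mathbf{e}\right] \tag{5.16}$$

where Σ denotes the covariance matrix of ε (Equation 5.3).

$$g(\mathbf{e}) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}\,(\mathbf{e} - m\mathbf{1}_p)^t\,\boldsymbol{\Sigma}^{-1}\,(\mathbf{e} - m\mathbf{1}_p)\right] \tag{5.17}$$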

When the single-step simulation technique is used, the importance sampling weight is

estimated as follows:

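$$\frac{f(\mathbf{e})}{g(\mathbf{e})} = \exp\left[\frac{1}{2}\,(\mathbf{e} - m\mathbf{1}_p)^t\,\boldsymbol{\Sigma}^{-1}\,(\mathbf{e} - m\mathbf{1}_p) - \frac{1}{2}\,\mathbf{e}^t\,\boldsymbol{\Sigma}^{-1}\,\mathbf{e}\right] \tag{5.18}$$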
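Expanding the quadratic form in Equation 5.18 illustrates why the weight is simple for this choice of sampling distribution. The short derivation below assumes only that Σ (and hence its inverse) is symmetric:

$$\frac{f(\mathbf{e})}{g(\mathbf{e})} = \exp\left[-m\,\mathbf{1}_p^t\,\boldsymbol{\Sigma}^{-1}\,\mathbf{e} + \frac{m^2}{2}\,\mathbf{1}_p^t\,\boldsymbol{\Sigma}^{-1}\,\mathbf{1}_p\right]$$

The quadratic terms in e cancel, so the weight depends on e only through a linear term. For a single site (p = 1, Σ = 1) the expression reduces to exp(−me + m²/2); for example, with an assumed m = 1, a sampled residual e = 2 receives a weight of exp(−1.5), or about 0.22.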

When the sequential simulation technique is used, the importance sampling weight is

computed using the following relationship:

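$$\frac{f(\mathbf{e})}{g(\mathbf{e})} = \frac{f(e_1)}{g(e_1)}\;\frac{f(e_2 \mid e_1)}{g(e_2 \mid e_1)} \cdots \frac{f(e_p \mid e_1, e_2, \cdots, e_{p-1})}{g(e_p \mid e_1, e_2, \cdots, e_{p-1})} \tag{5.19}$$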

From Equations 5.7, 5.10 and 5.13,

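$$f(e_i \mid e_1, e_2, \cdots, e_{i-1}) \sim N\left(\boldsymbol{\Sigma}_{iO}\,\boldsymbol{\Sigma}_{OO}^{-1}\,\mathbf{e}_{O},\; \Sigma_{ii} - \boldsymbol{\Sigma}_{iO}\,\boldsymbol{\Sigma}_{OO}^{-1}\,\boldsymbol{\Sigma}_{Oi}\right) \tag{5.20}$$

$$g(e_i \mid e_1, e_2, \cdots, e_{i-1}) \sim N\left(m + \boldsymbol{\Sigma}_{iO}\,\boldsymbol{\Sigma}_{OO}^{-1}\,(\mathbf{e}_{O} - m\mathbf{1}_p),\; \Sigma_{ii} - \boldsymbol{\Sigma}_{iO}\,\boldsymbol{\Sigma}_{OO}^{-1}\,\boldsymbol{\Sigma}_{Oi}\right) \tag{5.21}$$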

While the above discussion focuses on intra-event residuals, the same importance sam-

pling technique can also be used for preferentially sampling positive inter-event residuals.


5.5 Sequential simulation of correlated normalized residuals with consideration of recorded ground-motion intensities

In this section, a procedure is described for simulating normalized residuals conditioned

on recorded ground-motion intensities. Here, it is assumed for simplicity that the standard

deviations of the inter-event residual and that of the intra-event residuals are constants (i.e.,

σi = σ and τi = τ). Appendix 5.7 discusses the more general case that arises when this

assumption is not true.

In the simulation techniques described in the previous section, the inter-event residual

and the intra-event residuals are simulated separately. This is because the screening ef-

fect is more effective when the intra-event residuals are simulated separately as discussed

in more detail subsequently. When we wish to utilize the recorded intensity information,

however, it is preferable to simulate total residuals (sum of inter-event and intra-event resid-

uals) directly conditioned on total residuals computed from the recordings. This is because

the ground-motion intensity recordings only provide us with information about the total

residuals (computed as the difference between the observed logarithmic intensity and the

predicted logarithmic intensity). If the residual terms are to be simulated separately, the

recorded total residuals will first have to be split into the corresponding inter-event and

intra-event terms, which leads to statistical errors.

The recorded normalized total residual ε(t) can be computed from the recorded ground-

motion intensities as follows (using Equation 5.1):

$$\varepsilon^{(t)}_i = \frac{\sigma\varepsilon_i + \tau\eta}{\sqrt{\sigma^2 + \tau^2}} = \frac{\ln(Sa_i) - \ln(\overline{Sa}_i)}{\sqrt{\sigma^2 + \tau^2}} \tag{5.22}$$

where Sai, the observed spectral acceleration at site i, is the intensity measure considered,

and the predicted median $\overline{Sa}_i$, σ and τ are parameters computed from the ground-motion model as described

earlier. The normalizing factor in the above equation is √(σ² + τ²) since the variance of the

total residual equals the sum of the variances of the inter-event and the intra-event residual.
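Equation 5.22 translates directly into a one-line computation. The sketch below is illustrative only; the function name and the example values are assumptions, and σ and τ are taken as constants as in this section.

```python
import math


def normalized_total_residual(sa_observed, sa_median, sigma, tau):
    """Normalized total residual of Equation 5.22 for one recording station.

    sa_observed : recorded spectral acceleration at the station
    sa_median   : median spectral acceleration predicted by the ground-motion model
    sigma, tau  : intra-event and inter-event standard deviations
    """
    return (math.log(sa_observed) - math.log(sa_median)) / math.sqrt(sigma**2 + tau**2)
```

For instance, with assumed values of an observed 0.5 g against a predicted median of 0.3 g, σ = 0.5 and τ = 0.3, the normalized total residual is roughly 0.88, i.e., the recording is a little under one total standard deviation above the median.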


The sequential simulation technique for the normalized intra-event residuals described

earlier in Section 5.3.2 can be used to simulate normalized total residuals as well. The

following changes are necessary, however, since total residuals are simulated directly and

since the recorded total residuals are now considered during the simulation procedure.

(a) Each ε(t)i is conditioned on the ε(t)’s previously simulated as well as the ε(t)’s at the

recording stations. In other words, from a simulation perspective, the recorded ε(t)’s are

treated as additional previously simulated ε(t)’s.

(b) As mentioned earlier, it is reasonable to condition ε(t)i on only the q closest normal-

ized total residuals (including recorded and previously simulated total residuals). It is to be

noted, however, that the screening effect used as the basis for this simplification is slightly

less effective when total residuals are directly simulated as compared to when intra-event

residuals are simulated separately. This is because even though the spatial correlation re-

duces with distance, the minimum value of spatial correlation between ε(t)i and ε(t)j equals

τ²/(σ² + τ²) and not zero (as before). Therefore, ignoring far away residuals can cause slightly

more bias in this case than when only intra-event residuals are simulated.

(c) The conditional mean and the conditional variance of ε(t)i (analogous to the quanti-

ties in Equations 5.7 and 5.10) are now obtained as follows:

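$$E\left[\varepsilon^{(t)}_i \mid \varepsilon^{(t)}_1, \varepsilon^{(t)}_2, \cdots, \varepsilon^{(t)}_q\right] = \boldsymbol{\Sigma}^{(t)}_{iO}\left(\boldsymbol{\Sigma}^{(t)}_{OO}\right)^{-1}\mathbf{e}^{(t)}_{O} \tag{5.23}$$

$$\operatorname{Var}\left[\varepsilon^{(t)}_i \mid \varepsilon^{(t)}_1, \varepsilon^{(t)}_2, \cdots, \varepsilon^{(t)}_q\right] = 1 - \boldsymbol{\Sigma}^{(t)}_{iO}\left(\boldsymbol{\Sigma}^{(t)}_{OO}\right)^{-1}\boldsymbol{\Sigma}^{(t)}_{Oi} \tag{5.24}$$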

where the set [ε(t)1, ε(t)2, · · · , ε(t)q] comprises the q closest recorded and previously sim-

ulated normalized total residuals, and $\mathbf{e}^{(t)}_O$ denotes the realization (or recorded value as

appropriate) of [ε(t)1, ε(t)2, · · · , ε(t)q].

The covariance matrices ΣiO, ΣOO and Σii in Equations 5.8 and 5.9 were defined for

intra-event residuals only. The corresponding covariance matrices for the normalized total

residuals are obtained as follows:

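$$\boldsymbol{\Sigma}^{(t)}_{iO} = \begin{bmatrix} \dfrac{\sigma^2\rho_{i1} + \tau^2}{\sigma^2 + \tau^2} & \dfrac{\sigma^2\rho_{i2} + \tau^2}{\sigma^2 + \tau^2} & \cdots & \dfrac{\sigma^2\rho_{iq} + \tau^2}{\sigma^2 + \tau^2} \end{bmatrix} \tag{5.25}$$

$$\boldsymbol{\Sigma}^{(t)}_{OO} = \begin{bmatrix} 1 & \dfrac{\sigma^2\rho_{12} + \tau^2}{\sigma^2 + \tau^2} & \cdots & \dfrac{\sigma^2\rho_{1q} + \tau^2}{\sigma^2 + \tau^2} \\ \dfrac{\sigma^2\rho_{21} + \tau^2}{\sigma^2 + \tau^2} & 1 & \cdots & \dfrac{\sigma^2\rho_{2q} + \tau^2}{\sigma^2 + \tau^2} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\sigma^2\rho_{q1} + \tau^2}{\sigma^2 + \tau^2} & \dfrac{\sigma^2\rho_{q2} + \tau^2}{\sigma^2 + \tau^2} & \cdots & 1 \end{bmatrix} \tag{5.26}$$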

The rest of the sequential simulation technique is identical to the one described earlier.

In particular, residual ε(t)1 is first simulated as a univariate normally-distributed random

variable with zero mean and unit standard deviation. The residual ε(t)i is then simulated

conditioned on the previously simulated residuals from a normal distribution with the mean

in Equation 5.23 and the variance in Equation 5.24.
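The simplest instance of this conditioning, one target site and one recording station, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in this work: σ and τ are constant, the intra-event correlation ρ between the two sites is supplied by the caller, and the function names are hypothetical. With a single conditioning value, Equations 5.23 and 5.24 reduce to a mean of ρ(t)·e and a variance of 1 − ρ(t)², where ρ(t) is the total-residual correlation of Equation 5.25.

```python
import math
import random


def total_correlation(rho, sigma, tau):
    """Correlation between normalized total residuals (entry of Equation 5.25)."""
    return (sigma**2 * rho + tau**2) / (sigma**2 + tau**2)


def simulate_given_recording(e_recorded, rho, sigma, tau, rng):
    """Draw a normalized total residual conditioned on one recorded residual."""
    rho_t = total_correlation(rho, sigma, tau)
    mean = rho_t * e_recorded       # Equation 5.23 with a single conditioning value
    var = 1.0 - rho_t**2            # Equation 5.24 with a single conditioning value
    return rng.gauss(mean, math.sqrt(var))
```

Note that `total_correlation(0.0, sigma, tau)` returns τ²/(σ² + τ²) rather than zero, which is the floor on the total-residual correlation discussed in item (b) above.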

5.6 Conclusions

Quantifying the distribution of ground-motion intensities over a spatially-distributed re-

gion is an important task for several practical applications such as the risk assessment

and post-earthquake damage assessment of spatially-distributed systems. Often, this is

done using a simulation-based framework that involves generating probabilistic samples

of representative ground-motion intensity maps. This chapter discussed techniques for

simulating ground-motion intensity maps with and without the consideration of recorded

ground-motion intensities. A ground-motion intensity map is generated by combining me-

dian intensity predictions from ground-motion models with realizations of inter-event and

intra-event residuals that account for the uncertainty in the intensities. Intra-event residuals

can be simulated as a correlated vector of normal random variables, and the inter-event

residual can be simulated as a univariate normal random variable.

The chapter discussed two simulation techniques, namely, single-step simulation and

sequential simulation for generating residuals in the absence of recorded ground-motion

intensities. While both procedures are theoretically equivalent, it is possible to achieve

higher computational efficiency using the sequential simulation technique. The chapter also


described a sequential simulation technique for simulating residuals incorporating knowl-

edge about recorded ground-motion intensities. This is useful for post-earthquake damage

assessment and for determining optimal emergency response strategies.

5.7 Appendix: The conditional sequential simulation of

normalized heteroscedastic residuals

This section generalizes the results shown in section 5.5 for the case where the residuals

are heteroscedastic (i.e., σi and τi are not constant across all sites). The normalized total

residual is now defined as follows:

$$\varepsilon^{(t)}_i = \frac{\sigma_i\varepsilon_i + \tau_i\eta_i}{\sqrt{\sigma_i^2 + \tau_i^2}} \tag{5.27}$$

The simulation procedure is similar to that described in section 5.5 with changes to the

covariance matrices shown in Equations 5.25 and 5.26. The new matrices can be estimated

as follows:

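$$\boldsymbol{\Sigma}^{(t)}_{iO} = \begin{bmatrix} \dfrac{\sigma_i\sigma_1\rho_{i1} + \tau_i\tau_1}{\sqrt{\sigma_i^2 + \tau_i^2}\sqrt{\sigma_1^2 + \tau_1^2}} & \dfrac{\sigma_i\sigma_2\rho_{i2} + \tau_i\tau_2}{\sqrt{\sigma_i^2 + \tau_i^2}\sqrt{\sigma_2^2 + \tau_2^2}} & \cdots & \dfrac{\sigma_i\sigma_q\rho_{iq} + \tau_i\tau_q}{\sqrt{\sigma_i^2 + \tau_i^2}\sqrt{\sigma_q^2 + \tau_q^2}} \end{bmatrix} \tag{5.28}$$

$$\boldsymbol{\Sigma}^{(t)}_{OO} = \begin{bmatrix} 1 & \dfrac{\sigma_1\sigma_2\rho_{12} + \tau_1\tau_2}{\sqrt{\sigma_1^2 + \tau_1^2}\sqrt{\sigma_2^2 + \tau_2^2}} & \cdots & \dfrac{\sigma_1\sigma_q\rho_{1q} + \tau_1\tau_q}{\sqrt{\sigma_1^2 + \tau_1^2}\sqrt{\sigma_q^2 + \tau_q^2}} \\ \dfrac{\sigma_2\sigma_1\rho_{21} + \tau_2\tau_1}{\sqrt{\sigma_2^2 + \tau_2^2}\sqrt{\sigma_1^2 + \tau_1^2}} & 1 & \cdots & \dfrac{\sigma_2\sigma_q\rho_{2q} + \tau_2\tau_q}{\sqrt{\sigma_2^2 + \tau_2^2}\sqrt{\sigma_q^2 + \tau_q^2}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\sigma_q\sigma_1\rho_{q1} + \tau_q\tau_1}{\sqrt{\sigma_q^2 + \tau_q^2}\sqrt{\sigma_1^2 + \tau_1^2}} & \dfrac{\sigma_q\sigma_2\rho_{q2} + \tau_q\tau_2}{\sqrt{\sigma_q^2 + \tau_q^2}\sqrt{\sigma_2^2 + \tau_2^2}} & \cdots & 1 \end{bmatrix} \tag{5.29}$$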
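Each off-diagonal entry of Equations 5.28 and 5.29 has the same form, which can be captured in a small helper. The sketch below is illustrative; the function names are hypothetical, and the intra-event correlations are assumed to be supplied as a symmetric matrix.

```python
import math


def total_correlation_hetero(rho_ij, sigma_i, tau_i, sigma_j, tau_j):
    """Correlation between normalized total residuals at sites i and j
    when sigma and tau vary from site to site (entries of Eqs. 5.28-5.29)."""
    num = sigma_i * sigma_j * rho_ij + tau_i * tau_j
    den = math.sqrt(sigma_i**2 + tau_i**2) * math.sqrt(sigma_j**2 + tau_j**2)
    return num / den


def total_correlation_matrix(rho, sigma, tau):
    """Matrix of Equation 5.29 from an intra-event correlation matrix rho
    and per-site lists sigma and tau; the diagonal is set to one."""
    q = len(sigma)
    return [[1.0 if a == b else
             total_correlation_hetero(rho[a][b], sigma[a], tau[a], sigma[b], tau[b])
             for b in range(q)] for a in range(q)]
```

When all σi and τi are equal, each entry reduces to (σ²ρ + τ²)/(σ² + τ²), the homoscedastic form of Equations 5.25 and 5.26.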

Chapter 6

Efficient sampling and data reduction techniques for probabilistic seismic lifeline risk assessment

N. Jayaram and J.W. Baker (2010). Efficient sampling and data reduction techniques for

probabilistic seismic lifeline risk assessment, Earthquake Engineering and Structural Dy-

namics (published online).

6.1 Abstract

Probabilistic seismic risk assessment for spatially-distributed lifelines is less straightfor-

ward than for individual structures. While procedures such as the ‘PEER framework’ have

been developed for risk assessment of individual structures, these are not easily applica-

ble to distributed lifeline systems, due to difficulties in describing ground-motion inten-

sity (e.g., spectral acceleration) over a region (in contrast to ground-motion intensity at a

single site, which is easily quantified using Probabilistic Seismic Hazard Analysis), and

since the link between the ground-motion intensities and lifeline performance is usually

not available in closed form. As a result, Monte Carlo simulation and its variants are well

suited for characterizing ground motions and computing resulting losses to lifelines. This

chapter proposes a simulation-based framework for developing a small but stochastically-

representative catalog of earthquake ground-motion intensity maps that can be used for

lifeline risk assessment. In this framework, Importance Sampling is used to preferentially

sample ‘important’ ground-motion intensity maps, and K-Means Clustering is used to iden-

tify and combine redundant maps in order to obtain a small catalog. The effects of sam-

pling and clustering are accounted for through a weighting on each remaining map, so the

resulting catalog is still a probabilistically correct representation. The feasibility of the

proposed simulation framework is illustrated by using it to assess the seismic risk of a

simplified model of the San Francisco Bay Area transportation network. A catalog of just

150 intensity maps is generated to represent hazard at 1,038 sites from ten regional fault

segments causing earthquakes with magnitudes between five and eight. The risk estimates

obtained using these maps are consistent with those obtained using conventional Monte

Carlo simulation utilizing many orders of magnitude more ground-motion intensity maps.

Therefore, the proposed technique can be used to drastically reduce the computational ex-

pense of a simulation-based risk assessment, without compromising the accuracy of the

risk estimates. This will facilitate computationally intensive risk analysis of systems such

as transportation networks. Finally, the study shows that the uncertainties in the ground-

motion intensities and the spatial correlations between ground-motion intensities at various

sites must be modeled in order to obtain unbiased estimates of lifeline risk.

6.2 Introduction


Lifelines are large, geographically-distributed systems that are essential support systems

for any society. Due to their known vulnerabilities, it is important to proactively assess

and mitigate the seismic risk of lifelines. For instance, the Northridge earthquake caused

over $1.5 billion in business interruption losses ascribed to transportation network damage

[Chang, 2003]. The city of Los Angeles suffered a power blackout and $75 million of

power-outage related losses as a result of the earthquake [e.g., Tanaka et al., 1997]. Re-

cently, the analytical Pacific Earthquake Engineering Research Center (PEER) loss analysis

framework has been used to perform risk assessment for a single structure at a given site,


by estimating the site ground-motion hazard and assessing probable losses using the haz-

ard information [e.g., McGuire, 2007]. Lifeline risk assessment, however, is based on a

large vector of ground-motion intensities (e.g., spectral accelerations at all lifeline compo-

nent locations). The intensities also show significant spatial correlation, which needs to be

carefully modeled in order to accurately assess the seismic risk. Further, the link between

the ground-motion intensities at the sites and the performance of the lifeline is usually

not available in closed form. For instance, the travel time of vehicles in a transportation

network, a commonly-used performance measure, is only obtained using an optimization

procedure rather than being a closed-form function of the ground-motion intensities. These

additional complexities make it difficult to use the PEER framework for lifeline risk as-

sessment. There are some analytical approaches that are sometimes used for lifeline risk

assessment [e.g., Kang et al., 2008, Duenas-Osorio et al., 2005], but those are generally ap-

plicable to only specific classes of lifeline reliability problems. Hence, many past research

works use simulation-based approaches instead of analytical approaches for lifeline risk

assessment [e.g., Campbell and Seligson, 2003, Bazzurro and Luco, 2004, Crowley and

Bommer, 2006, Kiremidjian et al., 2007, Shiraki et al., 2007]. One simple simulation-based

approach involves studying the performance of lifelines under those earthquake scenarios

that may dominate the hazard in the region of interest [e.g., Adachi and Ellingwood, 2008].

While this approach is more tractable, it does not capture seismic hazard uncertainties in

the way a Probabilistic Seismic Hazard Analysis (PSHA)-based framework would. Fur-

ther, it is not easy to identify the earthquake scenario that dominates the hazard at the loss

levels of interest [Jayaram and Baker, 2009b]. (Appendix C uses lifeline loss deaggrega-

tion calculations to illustrate the difficulties involved in selecting a dominating earthquake

scenario.) A more comprehensive approach uses Monte Carlo simulation (MCS) to prob-

abilistically generate ground-motion intensity maps (also referred to as intensity maps in

this chapter), considering all possible earthquake scenarios that could occur in the region,

and then uses these for the risk assessment. Ground-motion intensities are generated using

an existing ground-motion model, which is described below.

The ground-motion intensity at a site is modeled as


$$\ln(Sa_{ij}) = \ln(\overline{Sa}_{ij}) + \sigma_{ij}\varepsilon_{ij} + \tau_{ij}\eta_{ij} \tag{6.1}$$

where Saij denotes the spectral acceleration (at the period of interest) at site i during earth-

quake j; $\overline{Sa}_{ij}$ denotes the predicted (by the ground-motion model) median spectral accel-

eration which depends on parameters such as magnitude, distance, period and local-site

conditions; εij denotes the normalized intra-event residual and ηij denotes the normalized

inter-event residual. Both εij and ηij are univariate normal random variables with zero

mean and unit standard deviation. σij and τij are standard deviation terms that are es-

timated as part of the ground-motion model and are functions of the spectral period of

interest, and in some models also functions of the earthquake magnitude and the distance

of the site from the rupture. The term σijεij is called the intra-event residual and the term

τijηij is called the inter-event residual. The inter-event residual is a constant across all the

sites for a given earthquake.

Crowley and Bommer [2006] describe the following MCS approach to simulate inten-

sity maps using Equation 6.1:

Step 1: Use Monte Carlo simulation to generate earthquakes of varying magnitudes on

the active faults in the region, considering appropriate magnitude-recurrence relationships

(e.g., the Gutenberg-Richter relationship).

Step 2: Using a ground-motion model (Equation 6.1), obtain the median ground-motion

intensities ($\overline{Sa}_{ij}$) and the standard deviations of the intra-event and the inter-event residuals

(σij and τij) at all the sites.

Step 3: Generate the normalized inter-event residual term (ηij) by sampling from the

univariate normal distribution.

Step 4: Simulate the normalized intra-event residuals (εij’s) using the parameters pre-

dicted by the ground-motion model. Chapter 2 [Jayaram and Baker, 2008] showed that a

vector of spatially-distributed normalized intra-event residuals εj = (ε1j, ε2j, · · · , εpj) fol-

lows a multivariate normal distribution. Hence, the distribution of εj can be completely

defined using the mean (zero) and standard deviation (one) of εij, and the correlation be-

tween all $\varepsilon_{i_1 j}$ and $\varepsilon_{i_2 j}$ pairs. The correlations between the residuals can be obtained from a

predictive model calibrated using past ground-motion intensity observations [Jayaram and

Baker, 2009a, Wang and Takada, 2005].

Step 5: Combine the median intensities, the normalized intra-event residuals and the

normalized inter-event residual for each earthquake in accordance with Equation 6.1 to

obtain ground-motion intensity maps (i.e., obtain Saj = (Sa1j, Sa2j, · · · , Sapj)).
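The five steps can be sketched end to end for a single simulated earthquake. Everything numerical here is a placeholder chosen for illustration and is not taken from this work: the earthquake source is reduced to a point, the magnitude follows a doubly truncated Gutenberg-Richter law with an assumed b-value, `toy_ln_median` is a made-up attenuation function rather than a published ground-motion model, σ and τ are constants, and the correlation function exp(−3h/range) is a hypothetical stand-in for a calibrated model.

```python
import math
import random


def sample_gr_magnitude(m_min, m_max, b_value, rng):
    """Step 1: inverse-CDF draw from a doubly truncated Gutenberg-Richter law."""
    beta = b_value * math.log(10.0)
    c = 1.0 - math.exp(-beta * (m_max - m_min))
    return m_min - math.log(1.0 - rng.random() * c) / beta


def toy_ln_median(magnitude, dist_km):
    """Step 2 placeholder: NOT a real ground-motion model, illustration only."""
    return -3.5 + 0.9 * magnitude - 1.2 * math.log(dist_km + 10.0)


def cholesky(a):
    """Lower-triangular L with L * L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_intensity_map(sites, source, sigma, tau, range_km, rng):
    """Return (magnitude, list of spectral accelerations) for one earthquake."""
    p = len(sites)
    m = sample_gr_magnitude(5.0, 8.0, 1.0, rng)                           # Step 1
    ln_med = [toy_ln_median(m, math.dist(s, source)) for s in sites]     # Step 2
    eta = rng.gauss(0.0, 1.0)                                            # Step 3
    corr = [[math.exp(-3.0 * math.dist(sites[i], sites[j]) / range_km)
             for j in range(p)] for i in range(p)]
    low = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    eps = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(p)]  # Step 4
    sa = [math.exp(ln_med[i] + sigma * eps[i] + tau * eta) for i in range(p)]  # Step 5
    return m, sa
```

Calling `simulate_intensity_map` repeatedly with the same random generator yields a catalog of intensity maps; a realistic application would replace the placeholder functions with a magnitude-recurrence relationship per fault and a calibrated ground-motion model.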

Crowley and Bommer [2006] used the above-mentioned approach to generate multiple

earthquake scenarios that were then used for the loss assessment of a portfolio of build-

ings. They found that the results differed significantly from those obtained using other

approximate approaches (e.g., using PSHA to obtain individual site hazard and loss ex-

ceedance curves, which are then heuristically combined to obtain the overall portfolio loss

exceedance curve). Crowley and Bommer [2006], however, ignored the spatial correlations

of εij’s when simulating intensity maps. Further, they used conventional MCS (i.e., brute-

force MCS or random MCS), which is computationally inefficient because large magni-

tude events and above-average ground-motion intensities are considerably more important

than small magnitude events and small ground-motion intensities while modeling lifeline

risks, but these are infrequently sampled in conventional MCS. Kiremidjian et al. [2007]

improved the simulation process by preferentially simulating large magnitudes using im-

portance sampling (IS). The normalized residuals (εij and ηij), however, were simulated

using conventional MCS.

Shiraki et al. [2007] also used an MCS-based approach to estimate earthquake-induced

delays in a transportation network. They generated a catalog of 47 earthquakes and cor-

responding intensity maps for the Los Angeles area and assigned probabilities to these

earthquakes such that the site hazard curves obtained using this catalog match with the

known local site hazard curves obtained from PSHA. In other words, the probabilities of

the scenario earthquakes were made to be hazard consistent. Only median peak ground

accelerations were used to produce the ground-motion intensity maps corresponding to the

scenario earthquakes, however, and the known variability about these medians was ignored.

While this approach is highly computationally efficient on account of the use of a small

catalog of earthquakes, the selection of earthquakes is a somewhat subjective process, and

the assignment of probabilities is based on hazard consistency rather than on actual event

likelihoods. Moreover, the procedure does not capture the effect of the uncertainties in

ground-motion intensities.


The current research work develops an importance sampling-based framework to ef-

ficiently sample important magnitudes and ground-motion residuals. It is seen that the

number of IS simulations is about two orders of magnitude smaller than the number of

Monte Carlo simulations required to obtain equally accurate lifeline loss estimates. De-

spite this improvement with respect to the performance of the conventional MCS approach,

the number of IS intensity maps required for risk assessment is still likely to be an incon-

veniently large number. As a result, the K-means clustering technique is used to further

reduce the number of intensity maps required for risk assessment by over an order of mag-

nitude. The feasibility of the proposed framework is illustrated by assessing the seismic

risk of an aggregated form of the San Francisco Bay Area transportation network using

a sampled catalog of 150 intensity maps. The resulting risk estimates are shown to be in

good agreement with those obtained using the conventional MCS approach (the benchmark

method).

6.3 Simulation of ground-motion intensity maps using importance sampling

This section provides a description of the importance sampling technique used in the cur-

rent work to efficiently simulate ground-motion intensity maps. Importance sampling (IS)

is a technique used to evaluate functions of random variables with a certain probability

density function (PDF) using samples from an alternate density function [Fishman, 2006].

This technique is explained in more detail in section 6.3.1. Sections 6.3.2, 6.3.3 and 6.3.4

describe the application of IS to the simulation of ground-motion intensity maps, which in-

volves probabilistically sampling a catalog of earthquake magnitudes and rupture locations

(which are required for computing the median ground-motion intensities), the normalized

inter-event residuals and the normalized intra-event residuals (Equation 6.1).


6.3.1 Importance sampling procedure

Let f(x) be a PDF defined over domain D for random variable X. Define an integral H as

follows:

$$H = \int_D q(x)\, f(x)\, dx \tag{6.2}$$

where q(x) is an arbitrary function of x. The integral can be rewritten as follows:

$$H = \int_D q(x)\, \frac{f(x)}{g(x)}\, g(x)\, dx \tag{6.3}$$

where g(x) is any probability density assuming non-zero values over the same domain D.

The term f(x)/g(x) is called the importance sampling weight.

Based on Equation 6.2, the integral H can be estimated using conventional MCS as

follows:

$$\hat{H} = \frac{1}{n}\sum_{i=1}^{n} q(x_i) \tag{6.4}$$

where Ĥ is an estimate of H and x1, ..., xn are n realizations of the random variable X

obtained using f(x). The IS procedure involves estimating the integral H using the alternate

density g(x) as follows (based on Equation 6.3):

$$\hat{H} = \frac{1}{r}\sum_{i=1}^{r} q(y_i)\, \frac{f(y_i)}{g(y_i)} \tag{6.5}$$

where y1, ..., yr are r realizations from g(y), and f(yi)/g(yi) is a weighting function (the impor-

tance sampling weight) that accounts for the fact that the realizations are based on the

alternate density g(y) rather than the original density f(y).

While Equations 6.4 and 6.5 provide two methods of estimating the same integral H, it

can be shown that the variance of the estimate Ĥ obtained using Equation 6.5 can be made

very small if an appropriate alternate density function g(x) is chosen [Fishman, 2006]. As a

result of this variance reduction, the required number of IS realizations (r) is much smaller

than the required number of conventional MCS realizations (n) for an equally reliable (i.e.,

same variance) estimate Ĥ.
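A one-dimensional example makes the difference between Equations 6.4 and 6.5 concrete. The sketch below is illustrative and not part of this work: it estimates the tail probability P(X > threshold) for a standard normal X, taking q(x) as the indicator of the tail, f as the standard normal density and g as a normal density with unit variance and a shifted mean (an assumed choice). For these two densities the weight f(y)/g(y) equals exp(−shift·y + shift²/2).

```python
import math
import random


def tail_probability_mcs(threshold, n, rng):
    """Equation 6.4: conventional MCS with samples drawn from f = N(0, 1)."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def tail_probability_is(threshold, shift, r, rng):
    """Equation 6.5: importance sampling with samples drawn from g = N(shift, 1)."""
    total = 0.0
    for _ in range(r):
        y = rng.gauss(shift, 1.0)
        if y > threshold:  # q(y) is the indicator of the tail event
            total += math.exp(-shift * y + 0.5 * shift**2)  # weight f(y) / g(y)
    return total / r
```

For a threshold of 3 the exact value is 0.5·erfc(3/√2), about 0.00135. Conventional MCS sees the event only about once in every 740 samples, whereas shifting the sampling mean to the threshold places roughly half of the samples in the tail, so far fewer realizations are needed for a comparable variance.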

Intuitively, the density g(x) should be such that the samples from g(x) are concentrated


in regions where the function q(x) is ‘rough’. This will ensure fine sampling in regions

that ultimately determine the accuracy of the estimate and coarse sampling elsewhere. The

challenge in implementing IS lies in choosing this alternate density g(x). Useful alternate

densities for this application are provided in the following subsections.

6.3.2 Simulation of earthquake catalogs

Let nf denote the number of active faults in the region of interest and νj denote the annual

recurrence rate of earthquakes on fault j with magnitudes exceeding a minimum magnitude

mmin. Let fj(m) denote the density function for magnitudes of earthquakes on fault j. Let

f(m) denote the density function for the magnitude of an earthquake on any of the nf

faults (i.e., this density function models the distribution of earthquakes resulting from all

the faults). Using the theorem of total probability, f (m) can be computed as follows:

$$f(m) = \frac{\sum_{j=1}^{n_f} \nu_j f_j(m)}{\sum_{j=1}^{n_f} \nu_j} \tag{6.6}$$

In the event of an earthquake of magnitude $m$ on a random fault, let $P_j(m)$ denote the probability that the earthquake rupture lies on fault $j$. The $P_j(m)$'s can be calculated using Bayes' theorem as follows:

$$P_j(m) = \frac{\nu_j f_j(m)}{\sum_{k=1}^{n_f} \nu_k f_k(m)} \tag{6.7}$$
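Equations 6.6 and 6.7 can be sketched in a few lines of Python. The per-fault density below is a hypothetical truncated exponential (bounded Gutenberg-Richter) form chosen only for illustration; the recurrence rates, maximum magnitudes and function names are likewise assumptions and are not the values used in this work.

```python
import math

def truncated_exponential_pdf(m, beta, m_min, m_max):
    """Hypothetical per-fault magnitude density f_j(m) on [m_min, m_max]."""
    if m < m_min or m > m_max:
        return 0.0
    norm = 1.0 - math.exp(-beta * (m_max - m_min))
    return beta * math.exp(-beta * (m - m_min)) / norm

def combined_density(m, rates, densities):
    """Equation 6.6: rate-weighted mixture of the per-fault densities."""
    return sum(nu * f(m) for nu, f in zip(rates, densities)) / sum(rates)

def fault_probabilities(m, rates, densities):
    """Equation 6.7: P_j(m) for every fault j, given a magnitude m.

    Assumes at least one fault can produce magnitude m (non-zero denominator).
    """
    terms = [nu * f(m) for nu, f in zip(rates, densities)]
    total = sum(terms)
    return [t / total for t in terms]

# Three hypothetical faults with different rates and maximum magnitudes.
rates = [0.05, 0.02, 0.01]
m_max_values = [7.0, 7.5, 8.0]
densities = [
    (lambda m, mx=mx: truncated_exponential_pdf(m, 2.0, 5.0, mx))
    for mx in m_max_values
]
probs_at_7p2 = fault_probabilities(7.2, rates, densities)
```

In this example the first fault cannot produce a magnitude of 7.2, so its probability is zero and the remaining probability is shared between the other two faults in proportion to $\nu_j f_j(m)$.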

A conventional MCS approach would use the density function f (m) to simulate earth-

quake magnitudes, although this approach will result in a large number of small magnitude

events since such events are considerably more probable than large magnitude events. This

is not efficient since lifeline losses due to frequent small events are less important than

those due to rare large events (although not negligible, so they cannot be ignored). It is

desirable to improve the computational efficiency of the risk assessment process without

compromising the accuracy of the estimates by using the importance sampling technique

described in section 6.3.1 to preferentially sample large events while still ensuring that the

simulated events are ‘stochastically representative’. In other words, the magnitudes are


simulated from a sampling distribution g(m) (rather than f (m)), which is chosen to have a

high probability of producing large magnitude events.

Let $m_{\min}$ and $m_{\max}$ denote the range of magnitudes of interest. This range $[m_{\min}, m_{\max}]$ can be stratified into $n_m$ partitions as follows:

$$[m_{\min}, m_{\max}] = [m_{\min}, m_2) \cup [m_2, m_3) \cup \cdots \cup [m_{n_m}, m_{\max}] \tag{6.8}$$

In the current work, the partitions are chosen such that the width of the interval (i.e., $m_{k+1} - m_k$) is large at small magnitudes and small at large magnitudes (Figure 6.1a). A single magnitude is randomly sampled from each partition using the magnitude density function $f(m)$, thereby obtaining $n_m$ realizations of the magnitudes. Since the partitions are chosen to have small widths at large magnitudes, there are naturally a larger number of realizations of large magnitude events. In this case, the sampling distribution $g(m)$ is not explicit, but rather is implicitly defined by the magnitude selection partitioning. This procedure, sometimes called stratified sampling, has the advantage of forcing the inclusion of specified subsets of the random variable while maintaining the probabilistic character of random sampling [Fishman, 2006].

The importance sampling weight $f(m)/g(m)$ can be obtained by noting that the sampling distribution assigns equal weight to all the chosen partitions ($1/n_m$), while the actual probability of a magnitude lying in a partition $[m_k, m_{k+1})$ is obtained by integrating the density function $f(m)$. Hence, the importance sampling weight for a magnitude $m$ chosen from the $k$th partition is computed as follows:

$$\frac{f(m)}{g(m)} = \frac{\int_{m_k}^{m_{k+1}} f(m)\,dm}{1/n_m} \tag{6.9}$$
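The stratified sampling of magnitudes and the weight of Equation 6.9 can be sketched as follows. This is only an illustration: it assumes a truncated exponential magnitude distribution (so that the conditional sampling within a partition can be done by inverting the cumulative distribution function), and the partition edges, the parameter `beta` and the function names are hypothetical.

```python
import math
import random

def trunc_exp_cdf(m, beta, m_min, m_max):
    """CDF of the assumed truncated exponential magnitude distribution."""
    norm = 1.0 - math.exp(-beta * (m_max - m_min))
    return (1.0 - math.exp(-beta * (m - m_min))) / norm

def trunc_exp_inv_cdf(u, beta, m_min, m_max):
    """Inverse of trunc_exp_cdf: the magnitude whose CDF value is u."""
    norm = 1.0 - math.exp(-beta * (m_max - m_min))
    return m_min - math.log(1.0 - u * norm) / beta

def stratified_magnitudes(edges, beta, rng):
    """Draw one magnitude per partition; return (magnitude, weight) pairs."""
    m_min, m_max = edges[0], edges[-1]
    n_m = len(edges) - 1
    samples = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p_lo = trunc_exp_cdf(lo, beta, m_min, m_max)
        p_hi = trunc_exp_cdf(hi, beta, m_min, m_max)
        # Sample from f(m) restricted to [lo, hi) via the inverse CDF.
        u = rng.uniform(p_lo, p_hi)
        m = trunc_exp_inv_cdf(u, beta, m_min, m_max)
        # Equation 6.9: probability of the partition under f, divided by 1/n_m.
        weight = (p_hi - p_lo) / (1.0 / n_m)
        samples.append((m, weight))
    return samples

# Hypothetical partition edges: wide at small magnitudes, narrow at large ones.
edges = [5.0, 5.6, 6.2, 6.6, 7.0, 7.2, 7.4, 7.5]
samples = stratified_magnitudes(edges, 2.0, random.Random(1))
```

Because the partitions cover the whole range, the weights average to one, and the narrow partitions at large magnitudes receive small weights that offset their over-representation in the sample.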

Once the magnitudes are sampled using IS, the rupture locations can be obtained by sampling faults using the fault probabilities $P_j(m)$ (Equation 6.7). It is to be noted that $P_j(m)$ will be non-zero only if the maximum allowable magnitude on fault $j$ exceeds $m$. Let $n_f(m)$ denote the number of such faults with non-zero values of $P_j(m)$. If $n_f(m)$ is small (around 10), a more efficient sampling approach will be to consider each of those $n_f(m)$ faults to be the source of the earthquake and consider $n_f(m)$ different earthquakes of the same


Figure 6.1: Importance sampling density functions for: (a) magnitude and (b) normalized intra-event residual; (c) recommended mean-shift as a function of the average number of sites and the average site-to-site distance normalized by the range of the spatial correlation model.


simulated magnitude. It is to be noted that this fault sampling procedure is similar to the

importance sampling of magnitudes. The importance sampling weight for fault j chosen

by this procedure is computed as follows:

$$\frac{f(j|m)}{g(j|m)} = \frac{P_j(m)}{1/n_f(m)} \tag{6.10}$$

where f ( j|m) and g( j|m) denote the original and the alternate (implicit) probability mass

functions for fault j given an earthquake of magnitude m. Once a fault is sampled, the

rupture is located randomly on the fault.
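The fault enumeration and the weight of Equation 6.10 amount to very little code. The sketch below assumes the fault probabilities $P_j(m)$ have already been computed for the sampled magnitude; the function name and the example probabilities are hypothetical.

```python
def enumerate_faults(fault_probs):
    """Return (fault index, importance sampling weight) for every fault
    with a non-zero probability P_j(m) at the sampled magnitude."""
    active = [(j, p) for j, p in enumerate(fault_probs) if p > 0.0]
    n_active = len(active)   # n_f(m) in the text
    # Equation 6.10: original probability divided by the implicit 1/n_f(m).
    return [(j, p / (1.0 / n_active)) for j, p in active]

pairs = enumerate_faults([0.0, 0.6, 0.4])
```

Each active fault is used exactly once, and the weights average to one across the active faults.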

6.3.3 Simulation of normalized intra-event residuals

The set of normalized intra-event residuals at $p$ sites of interest, $\boldsymbol{\varepsilon}_j = (\varepsilon_{1j}, \varepsilon_{2j}, \cdots, \varepsilon_{pj})$, follows a multivariate normal distribution $f(\boldsymbol{\varepsilon}_j)$ [Jayaram and Baker, 2008]. The mean of $\boldsymbol{\varepsilon}_j$ is the zero vector of size $p$, while the variance of each $\varepsilon_{ij}$ equals one. The correlation

between the residuals at two sites is a function of the separation between the sites, and

can be obtained from a spatial correlation model. In this work, the correlation coefficient

between the residuals at two sites $i_1$ and $i_2$ separated by $h$ km is computed using the following equation, which was calibrated using empirical observations (Chapter 3) [Jayaram and Baker, 2009a]:

$$\rho_{\varepsilon_{i_1 j},\,\varepsilon_{i_2 j}}(h) = \exp(-3h/R) \tag{6.11}$$

where R controls the rate of decay of spatial correlation and is called the ‘range’ of the

correlation model. The range depends on the intensity measure being used. In this work,

the intensity measure of interest is the spectral acceleration corresponding to a period of 1

second, and the corresponding value of R equals 26 km.
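As an illustration of Equation 6.11, the following sketch builds the correlation matrix of the normalized intra-event residuals for a set of sites. It assumes the site locations are given as planar coordinates in km, which is a simplification introduced here; the function name and the example coordinates are hypothetical.

```python
import math

def correlation_matrix(coords_km, range_km=26.0):
    """Apply Equation 6.11 to every pair of sites.

    coords_km: list of (x, y) site coordinates in km (assumed planar).
    range_km:  range R of the correlation model (26 km for Sa at 1 s).
    """
    p = len(coords_km)
    rho = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            h = math.hypot(coords_km[a][0] - coords_km[b][0],
                           coords_km[a][1] - coords_km[b][1])
            rho[a][b] = math.exp(-3.0 * h / range_km)
    return rho

rho = correlation_matrix([(0.0, 0.0), (26.0, 0.0), (0.0, 13.0)])
```

Since the residuals have unit variance, this correlation matrix is also their covariance matrix. At a separation equal to the range $R$, the correlation has decayed to $\exp(-3) \approx 0.05$.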

While a conventional MCS approach can be used to obtain realizations of $\boldsymbol{\varepsilon}_j$ using $f(\mathbf{e})$ [Fishman, 2006], this will result in a large number of near-zero (i.e., near-mean) residuals and few realizations from the upper and the lower tails. This is inefficient since for the purposes of lifeline risk assessment it is often of interest to study the upper tail (i.e., the $\boldsymbol{\varepsilon}_j$ values that produce large intensities), which is not sampled adequately in the conventional MCS approach. An efficient alternate sampling density $g(\mathbf{e})$ is a multivariate normal density with the same variance and correlation structure as $f(\mathbf{e})$, but with positive means for all $\varepsilon_{ij}$'s (i.e., a positive mean for the marginal distribution of each intra-event residual). In other words, the mean vector of $g(\mathbf{e})$ is the $p$-dimensional vector $\mathbf{ms}_{\text{intra}} = (ms_{\text{intra}}, ms_{\text{intra}}, \cdots, ms_{\text{intra}})$. Sampling normalized intra-event residuals from this distribution $g(\mathbf{e})$, which has a positive mean, will produce more realizations of large normalized intra-event residuals. Figure 6.1b shows the original and sampling marginal distributions for one particular $\varepsilon_{ij}$. It is to be noted that this particular choice of the sampling distribution results in importance sampling weights that are simple to estimate. The importance sampling weights can be estimated as follows:

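$$\frac{f(\mathbf{e})}{g(\mathbf{e})} = \exp\left(\frac{1}{2}\left(\mathbf{e}-\mathbf{ms}_{\text{intra}}\right)'\,\Sigma^{-1}\left(\mathbf{e}-\mathbf{ms}_{\text{intra}}\right) - \frac{1}{2}\,\mathbf{e}'\,\Sigma^{-1}\mathbf{e}\right) \tag{6.12}$$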

where $\Sigma$ denotes the covariance matrix of $\boldsymbol{\varepsilon}_j$.
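A minimal sketch of the mean-shifted sampling and of Equation 6.12 is given below. It uses a small hand-written Cholesky factorization so that it depends only on the Python standard library; a practical implementation would more likely rely on a linear algebra package. The function names and the example covariance matrix are assumptions made for this illustration.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def quad_form_inv(low, x):
    """x' Sigma^{-1} x, computed as |L^{-1} x|^2 by forward substitution."""
    y = []
    for i in range(len(x)):
        s = sum(low[i][k] * y[k] for k in range(i))
        y.append((x[i] - s) / low[i][i])
    return sum(v * v for v in y)

def sample_shifted_residuals(sigma, ms_intra, rng):
    """Draw e from g (mean ms_intra at every site); return (e, f(e)/g(e))."""
    low = cholesky(sigma)
    p = len(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    # e = ms_intra + L z has covariance Sigma and mean ms_intra at each site.
    e = [ms_intra + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
    shifted = [v - ms_intra for v in e]
    # Equation 6.12, evaluated in log form.
    log_w = 0.5 * quad_form_inv(low, shifted) - 0.5 * quad_form_inv(low, e)
    return e, math.exp(log_w)
```

For example, `sample_shifted_residuals([[1.0, 0.5], [0.5, 1.0]], 0.3, random.Random(2))` returns one realization for two sites together with its weight. Averaged over many realizations drawn from $g$, the weights have a mean of one.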

The positive mean of g(e) will ensure that the realizations from g(e) will tend to be

larger than the realizations from f (e). It is, however, important to choose a reasonable

value of the mean-shift $ms_{\text{intra}}$ to ensure adequate preferential sampling of large $\boldsymbol{\varepsilon}_j$'s, while

avoiding sets of extremely large normalized intra-event residuals that will make the simu-

lated intensity map so improbable as to be irrelevant. The process of selecting a reasonable

value of msintra is described below.

The first step in fixing the value of msintra is to note that the preferred value depends

predominantly on three factors, namely, the extent of spatial correlations (measured by the

range parameter R in Equation 6.11), the average site-to-site separation distance in the life-

line network being studied and the number of sites in the network. If sites are close to one

another and if the spatial correlations are significant, the correlations between the residuals

permit a larger mean-shift as it is reasonably likely to observe simultaneously large values

of positively-correlated random variables. Similarly, the presence of fewer sites permits

larger mean-shifts since it is more likely to observe jointly large values of residuals over a

few sites than over a large number of sites. Hence, it is intended to determine the preferred

mean-shifts as a function of the number of sites and the average site-to-site separation

distances normalized by the range parameter. This is done by simulating the normalized


intra-event residuals in hypothetical analysis cases with varying numbers of sites and vary-

ing average site separation distances, considering several feasible mean-shifts in each case.

The feasibility of the resulting residuals (i.e., whether the simulated set of residuals is rea-

sonably probable) is then studied using the resulting importance sampling weights. Based

on extensive sensitivity analysis, the authors found that the best results are obtained when

30% of the importance sampling weights fall below 0.1, if exceedance rates larger than

$10^{-6}$ are of interest. The preferred mean-shifts are determined for each case based on this

criterion, and are plotted in Figure 6.1c. This figure will enable users to avoid an extremely

computationally expensive search for an appropriate sampling distribution in a given analy-

sis case. Incidentally the figure shows that the mean-shift increases with average site sepa-

ration distance and decreases with the number of sites. This validates the above-mentioned

statement that larger site separation distances and fewer sites permit larger mean-shifts.

6.3.4 Simulation of normalized inter-event residuals

Following standard conventions, since the inter-event residual is a constant across all the

sites during a single earthquake [e.g., Abrahamson and Youngs, 1992], the simulated nor-

malized inter-event residuals should satisfy the following relation (which does not assume

that the $\tau_{ij}$'s are equal in order to be compatible with ground-motion models such as that

of Abrahamson and Silva [2008]):

$$\eta_{ij} = \frac{\tau_{1j}}{\tau_{ij}}\,\eta_{1j} \quad \forall j \tag{6.13}$$

Thus the normalized inter-event residuals can be simulated by first simulating $\eta_{1j}$ from a univariate normal distribution with zero mean and unit standard deviation, and by subsequently evaluating other normalized inter-event residuals using Equation 6.13. The IS procedure for $\eta_{1j}$ is similar to that for $\boldsymbol{\varepsilon}_j$, except that the alternate sampling distribution is univariate normal rather than multivariate normal, and has unit standard deviation and a positive mean $ms_{\text{inter}}$. The likelihood ratio in this case is

$$\frac{f(t)}{g(t)} = \exp\left(\frac{1}{2}\left(t - ms_{\text{inter}}\right)^2 - \frac{1}{2}\,t^2\right) \tag{6.14}$$


where t denotes a realization of the normalized inter-event residual.

The authors have found that values of msinter between 0.5 and 1.0 produce an appropri-

ate number of normalized inter-event residuals from the tail of the distribution.
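The simulation of normalized inter-event residuals (Equations 6.13 and 6.14) can be sketched as follows. The inter-event standard deviations passed in `taus` are hypothetical values for one earthquake, and the function name is an assumption of this example.

```python
import math
import random

def sample_inter_event(taus, ms_inter, rng):
    """Return (normalized inter-event residuals at all sites, f(t)/g(t)).

    taus[i] is the inter-event standard deviation at site i for this event;
    the first entry plays the role of tau_1j in Equation 6.13.
    """
    # Draw eta_1j from the shifted density g = N(ms_inter, 1).
    t = rng.gauss(ms_inter, 1.0)
    # Equation 6.14: likelihood ratio between N(0, 1) and N(ms_inter, 1).
    weight = math.exp(0.5 * (t - ms_inter) ** 2 - 0.5 * t ** 2)
    # Equation 6.13: scale so that tau_ij * eta_ij is the same at every site.
    etas = [taus[0] / tau_i * t for tau_i in taus]
    return etas, weight

etas, weight = sample_inter_event([0.30, 0.25, 0.35], 1.0, random.Random(3))
```

The product of each `etas[i]` with its `taus[i]` is identical across sites, which is the requirement that the inter-event residual be constant during a single earthquake.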

6.4 Lifeline risk assessment

In this chapter, it is intended to obtain the exceedance curve for a lifeline loss measure

denoted L (e.g., travel-time delay in a transportation network) considering seismic hazard.

The exceedance curve, which provides the annual exceedance rates of various values of L, is

the product of the exceedance probability curve and the total recurrence rate of earthquakes

exceeding the minimum considered magnitude on all faults.

$$\nu_{L \geq u} = \left(\sum_{j=1}^{n_f} \nu_j\right) P(L \geq u) \tag{6.15}$$

A simple way to compute the annual exceedance rates, while treating each fault separately, would be to compute $\sum_{j=1}^{n_f} \nu_j P(L_j \geq u)$, where $P(L_j \geq u)$ denotes the exceedance probability for fault $j$, and the $\nu_j$ values account for unequal recurrence rates across faults. That approach is not possible here because the importance sampling of Equation 6.9 makes separation by faults difficult. In Equation 6.15, $P(L \geq u)$ is the probability that the loss due to any earthquake event of interest (irrespective of the fault of occurrence) exceeds $u$. It can be computed using the simulated maps, and in that form already accounts for the individual $P(L_j \geq u)$ values and the $\nu_j$ values.

6.4.1 Risk assessment based on realizations from Monte Carlo simulation

If a catalog of $n$ intensity maps obtained using the conventional MCS approach is used for the risk assessment, the empirical estimate of the exceedance probabilities ($\hat{P}(L \geq u)$) can be obtained as follows (from Equation 6.4):

$$\hat{P}(L \geq u) = \frac{1}{n}\sum_{i=1}^{n} I(l_i \geq u) \tag{6.16}$$

where $l_i$ is the loss level corresponding to intensity map $i$, and $I(l_i \geq u)$ is an indicator function which equals 1 if $l_i \geq u$ and 0 otherwise.
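Equations 6.15 and 6.16 translate directly into code. In the sketch below, `losses` holds the loss computed for each MCS intensity map and `fault_rates` holds the $\nu_j$ values; both the names and the example numbers are hypothetical.

```python
def exceedance_rates_mcs(losses, thresholds, fault_rates):
    """Annual exceedance rate of each loss threshold from MCS realizations."""
    total_rate = sum(fault_rates)
    n = len(losses)
    rates = []
    for u in thresholds:
        # Equation 6.16: fraction of maps whose loss is at least u.
        p_exceed = sum(1 for loss in losses if loss >= u) / n
        # Equation 6.15: scale by the total recurrence rate on all faults.
        rates.append(total_rate * p_exceed)
    return rates

rates = exceedance_rates_mcs([0.0, 1.0, 2.0, 5.0], [1.0, 3.0], [0.1, 0.3])
```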

6.4.2 Risk assessment based on realizations from importance sampling

The summand in Equation 6.16 can be evaluated using the approach described in section 6.3. Assuming that a catalog of $r$ importance sampling-based intensity maps is used for evaluating the risk, the estimate of the exceedance probability curve can be obtained as follows (from Equation 6.5):

$$\hat{P}(L \geq u) = \frac{1}{r}\sum_{i=1}^{r} I(l_i \geq u)\,\frac{f_S(i)}{g_S(i)} \tag{6.17}$$

where $f_S(i)/g_S(i)$ is the importance sampling weight corresponding to scenario intensity map $i$, which can be evaluated as follows:

$$\frac{f_S(i)}{g_S(i)} = \frac{f(m)}{g(m)}\,\frac{f(j|m)}{g(j|m)}\,\frac{f(\mathbf{e})}{g(\mathbf{e})}\,\frac{f(t)}{g(t)} = \Lambda_i \tag{6.18}$$

where m, j, e, t denote the magnitude, fault, normalized intra-event residuals and normal-

ized inter-event residual corresponding to map i respectively. The terms in Equation 6.18

can be obtained from Equations 6.9, 6.10, 6.12 and 6.14.

Equation 6.17 shows that the exceedance probability curve is obtained by weighting

the indicator functions by the importance sampling weights for the maps. In the rest of

the chapter, this weight is denoted Λi as shown in Equation 6.18. Using this notation for

weight, Equation 6.17 can be rewritten as follows:

$$\hat{P}(L \geq u) = \frac{1}{r}\sum_{i=1}^{r} I(l_i \geq u)\,\Lambda_i = \frac{\sum_{i=1}^{r} I(l_i \geq u)\,\Lambda_i}{\sum_{i=1}^{r} \Lambda_i} \tag{6.19}$$


The second equality in the above equation comes from the fact that $\sum_{i=1}^{r} \Lambda_i = r$, as seen by substituting $u = 0$ in the equation and noting that $\hat{P}(L \geq 0) = 1$.

The variance (var) of this estimate can be shown to be

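$$\mathrm{var}\left[\hat{P}(L \geq u)\right] = \frac{\sum_{i=1}^{r}\left[I(l_i \geq u)\,\Lambda_i - \hat{P}(L \geq u)\right]^2}{\left(\sum_{i=1}^{r}\Lambda_i\right)\left(\sum_{i=1}^{r}\Lambda_i - 1\right)} \tag{6.20}$$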
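The weighted estimate of Equation 6.19 (in its normalized form) and the variance of Equation 6.20 can be sketched as follows. The function name and the example inputs are hypothetical, and the sketch assumes the weights sum to more than one so that the denominator of Equation 6.20 is positive.

```python
import math

def is_exceedance(losses, weights, u):
    """Return (P_hat, variance, CoV) of the exceedance probability at level u.

    losses[i] and weights[i] are the loss and the importance sampling
    weight (Lambda_i) of intensity map i.
    """
    total_w = sum(weights)
    hits = [(1.0 if loss >= u else 0.0) * w for loss, w in zip(losses, weights)]
    p_hat = sum(hits) / total_w                                   # Equation 6.19
    var = sum((h - p_hat) ** 2 for h in hits) / (total_w * (total_w - 1.0))  # Equation 6.20
    cov = math.sqrt(var) / p_hat if p_hat > 0.0 else float("inf")
    return p_hat, var, cov

p_hat, var, cov = is_exceedance([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], 3.0)
```

With all weights equal to one, the expressions reduce to the conventional MCS estimate and to the sample variance of the indicator divided by the number of maps.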

6.5 Data reduction using K-means clustering

The use of importance sampling causes a significant improvement in the computational

efficiency of the simulation procedure, but the number of required IS intensity maps is

still large and may pose a heavy computational burden. K-means clustering [McQueen,

1967] is thus used as a data reduction technique in order to develop a smaller catalog of

maps by ‘clustering’ simulated ground-motion intensity maps with similar properties (i.e.,

similar spectral acceleration values at the sites of interest). This data reduction procedure is

also used in machine learning and signal processing, where it is called vector quantization

[Gersho and Gray, 1991].

K-means clustering groups a set of observations into K clusters such that the dissim-

ilarity between the observations (typically measured by the Euclidean distance) within a

cluster is minimized [McQueen, 1967]. Let $\mathbf{Sa}_1, \mathbf{Sa}_2, \cdots, \mathbf{Sa}_r$ denote $r$ maps generated using importance sampling to be clustered, where each map $\mathbf{Sa}_j$ is a $p$-dimensional vector defined by $\mathbf{Sa}_j = [Sa_{1j}, Sa_{2j}, \cdots, Sa_{pj}]$. The K-means method groups these maps into clusters by minimizing $V$, which is defined as follows:

$$V = \sum_{i=1}^{K} \sum_{\mathbf{Sa}_j \in S_i} \left\|\mathbf{Sa}_j - \mathbf{C}_i\right\|^2 \tag{6.21}$$

where $K$ denotes the number of clusters, $S_i$ denotes the set of maps in cluster $i$, $\mathbf{C}_i = [C_{1i}, C_{2i}, \cdots, C_{pi}]$ is the cluster centroid obtained as the mean of all the maps in cluster $i$, and $\|\mathbf{Sa}_j - \mathbf{C}_i\|^2$ denotes the squared distance between the map $\mathbf{Sa}_j$ and the cluster centroid $\mathbf{C}_i$. If the Euclidean distance is adopted to measure dissimilarity, then the squared distance between $\mathbf{Sa}_j$ and $\mathbf{C}_i$ is computed as follows:

$$\left\|\mathbf{Sa}_j - \mathbf{C}_i\right\|^2 = \sum_{q=1}^{p}\left(Sa_{qj} - C_{qi}\right)^2 \tag{6.22}$$

In its simplest version, the K-means algorithm is composed of the following four steps:

Step 1: Pick K maps to denote the initial cluster centroids. This selection can be done

randomly.

Step 2: Assign each map to the cluster with the closest centroid.

Step 3: Recalculate the centroid of each cluster after the assignments.

Step 4: Repeat steps 2 and 3 until no more reassignments take place.
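The four steps above can be sketched in plain Python as follows. This is a bare-bones illustration of the simplest version of the algorithm, not the clustering code used in this work; the function names, the iteration cap and the handling of empty clusters are assumptions of the example.

```python
import random

def squared_distance(a, b):
    """Equation 6.22: squared Euclidean distance between two maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(maps, k, rng, max_iter=100):
    """Cluster p-dimensional maps into k groups; return (labels, centroids)."""
    # Step 1: pick k maps at random as the initial centroids.
    centroids = [list(m) for m in rng.sample(maps, k)]
    labels = [None] * len(maps)
    for _ in range(max_iter):
        # Step 2: assign each map to the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(m, centroids[c]))
            for m in maps
        ]
        # Step 4: stop once no map changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [m for m, lab in zip(maps, labels) if lab == c]
            if members:   # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

maps = [[0.0, 0.0], [0.2, 0.2], [0.4, 0.4],
        [10.0, 10.0], [10.2, 10.2], [10.4, 10.4]]
labels, centroids = k_means(maps, 2, random.Random(4))
```

In this toy example each "map" has only two sites, and the two well-separated groups end up in different clusters.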

Once all the maps are clustered, the final catalog can be developed by selecting a single

map from each cluster, which is used to represent all maps in that cluster on account of

the similarity of the maps within a cluster. In other words, if the map selected from a

cluster produces loss l, it is assumed that all other maps in the cluster produce the same

loss l by virtue of similarity. The maps in this smaller catalog can be used in place of

the maps generated using importance sampling for the risk assessment (i.e., for evaluating

P(L ≥ u)), which results in a dramatic improvement in the computational efficiency. This

is particularly useful in applications where it is practically impossible to compute the loss

measure L using more than K maps (where K equals a few hundred). In such cases, the

maps obtained using IS can be grouped using the K-means method into K clusters, and one

map can be randomly selected from each cluster in order to obtain the catalog of intensity

maps to be used for the risk assessment. This procedure allows us to select K strongly

dissimilar intensity maps as part of the catalog (since the maps eliminated are similar to

one of these K maps in the catalog), but will ensure that the catalog is ‘stochastically

representative’. Because only one map from each cluster is now used, the total weight associated with the map should be equal to the sum of the weights of all the maps in that cluster ($\sum_{i \in c} \Lambda_i$, where $c$ denotes the cluster). It is to be noted that even though the maps within a cluster are expected to be similar, for probabilistic consistency, a map must be chosen from a cluster with a probability proportional to its weight. Equation 6.19 can then be used with these sampled

maps and the total weights to compute an exceedance probability curve using the catalog


as follows:

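$$\hat{P}(L \geq u) = \frac{\sum_{c=1}^{K} I\left(l^{(c)} \geq u\right)\left(\sum_{i \in c}\Lambda_i\right)}{\sum_{c=1}^{K}\left(\sum_{i \in c}\Lambda_i\right)} \tag{6.23}$$

where $l^{(c)}$ denotes the loss measure associated with the map selected from cluster $c$.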
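The catalog selection and Equation 6.23 can be sketched as follows, assuming that cluster labels and importance sampling weights are already available for all the IS maps. The function names and the example inputs are hypothetical.

```python
import random

def select_catalog(labels, weights, k, rng):
    """Pick one map per cluster with probability proportional to its weight.

    Returns a list of (map index, total weight of the cluster) pairs.
    """
    catalog = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        member_weights = [weights[i] for i in members]
        chosen = rng.choices(members, weights=member_weights, k=1)[0]
        catalog.append((chosen, sum(member_weights)))
    return catalog

def catalog_exceedance(catalog, losses, u):
    """Equation 6.23 for one loss level u; losses[i] is the loss for map i."""
    total = sum(w for _, w in catalog)
    return sum(w for i, w in catalog if losses[i] >= u) / total

labels = [0, 0, 1, 1]
weights = [1.0, 3.0, 2.0, 2.0]
catalog = select_catalog(labels, weights, 2, random.Random(5))
```

Only the losses of the selected maps need to be evaluated in practice; the `losses` list is indexed by map number here purely for convenience.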

Appendix 6.8 shows that the exceedance probabilities obtained using Equation 6.23

will be unbiased. This and the fact that all the random variables are accounted for appropri-

ately is the reason why the catalog selected is claimed to be stochastically representative.

Incidentally, the computational efficiency of this procedure can be improved with minor

modifications to the clustering approach, as described in Appendix 6.9.

6.6 Application: Seismic risk assessment of the San Francisco Bay Area transportation network

In this section, the San Francisco Bay Area transportation network is used to illustrate

the feasibility of the proposed risk assessment framework. It is intended to show that

the seismic risk estimated using the catalog of 150 intensity maps matches well with the

seismic risk estimated using the conventional MCS framework and a much greater number

of maps (which is the benchmark approach). The catalog size of 150 is chosen since it may be tractable for a real-life lifeline risk assessment problem. If reduced accuracy and reduced emphasis on very large losses are acceptable, the number of maps could be reduced even further. Alternatively, a larger number of maps can be chosen if the computational demand

remains tractable.

6.6.1 Network data

The San Francisco Bay Area transportation network data are obtained from Stergiou and

Kiremidjian [2006]. Figure 6.2a shows the Metropolitan Transportation Commission

(MTC) San Francisco Bay Area highway network, which includes 29,804 links (roads) and

10,647 nodes. The network also consists of 1,125 bridges from the five counties of the Bay

Area. Stergiou and Kiremidjian [2006] classified these bridges based on their structural

properties in accordance with the HAZUS [1999] manual. (The HAZUS [1999] fragility


functions are used here only for illustrative purposes, and more realistic fragility functions

can be used if applicable.) This classification is useful for estimating the structural damage

to bridges due to various simulated intensity maps. The Bay Area network consists of a to-

tal of 1,120 transportation analysis zones (TAZ), which are used to predict the trip demand

in specific geographic areas. The origin-destination (OD) data provided by Stergiou and

Kiremidjian [2006] were obtained from the 1990 MTC household survey [Purvis, 1999].

Analyzing the performance of a network as large and complex as the San Francisco

Bay Area transportation network under a large number of scenarios is extremely computa-

tionally intensive. Therefore, an aggregated representation of the Bay Area network is used

for this example application. The aggregated network consists predominantly of freeways

and expressways, along with the ramps linking the freeways and expressways. The nodes

are placed at locations where links intersect or change in characteristics (e.g., change in

the number of lanes). The aggregated network comprises 586 links and 310 nodes, and

is shown in Figure 6.2b. Of the 310 nodes, 46 are denoted centroidal nodes that act as

origins and destinations for the traffic. These centroidal nodes are chosen from the cen-

troidal nodes of the original network in such a way that they are spread out over the entire

transportation network. The data from the 1990 MTC household survey are aggregated to

obtain the traffic demands at each centroidal node. The aggregation involves assigning the

traffic originating or culminating in any TAZ to its nearest centroidal node. Of the 1,125

bridges in the original network, 1,038 bridges lie on the links of the aggregated network

and are considered in the risk assessment procedure.

While the performance of the aggregated network may or may not be similar to that of

the full network, the aggregated network serves as a reasonably realistic and complex test

case for the proposed framework, to demonstrate its feasibility. The goal is to demonstrate

that the data reduction techniques proposed here produce the same exceedance curve as

the more exhaustive MCS. The simplified network is simple enough that MCS is feasible,

but still retains the spatial distribution and network effects that are characteristic of more

complex models. If the proposed techniques can be shown to be effective for this simplified

model, then they can be used with more complex models where validation using MCS is

not feasible.


6.6.2 Transportation network loss measure

A popular measure of network performance is the travel-time delay experienced by pas-

sengers in a network after an earthquake [Stergiou and Kiremidjian, 2006, Shiraki et al.,

2007]. The delay is computed as the difference between the total travel time in the network

before and after an earthquake.

Estimating travel time in the network

The total travel time (T ) in a network is estimated as follows:

$$T = \sum_{i \in \text{links}} x_i\,t_i(x_i) \tag{6.24}$$

where $x_i$ denotes the traffic flow on link $i$ and $t_i(x_i)$ denotes the travel time of an individual passenger on link $i$. The travel time on link $i$ is obtained as follows [Bureau of Public Roads, 1964]:

$$t_i(x_i) = t_i^f\left[1 + \alpha\left(\frac{x_i}{c_i}\right)^{\beta}\right] \tag{6.25}$$

where $t_i^f$ denotes the free-flow link travel time (i.e., the travel time of a passenger if link $i$ were to be empty), $c_i$ is the capacity of link $i$, and $\alpha$ and $\beta$ are calibration parameters, taken as 0.15 and 4 respectively [Shiraki et al., 2007].
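Equations 6.24 and 6.25 can be written as two short functions. The function names and the example link properties are hypothetical; the default values of `alpha` and `beta` are those stated above.

```python
def bpr_travel_time(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
    """Equation 6.25: travel time on a link carrying the given flow."""
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)

def total_travel_time(flows, free_flow_times, capacities):
    """Equation 6.24: sum over links of flow times link travel time."""
    return sum(
        x * bpr_travel_time(x, t0, c)
        for x, t0, c in zip(flows, free_flow_times, capacities)
    )

# A link operating exactly at capacity is 15% slower than in free flow.
time_at_capacity = bpr_travel_time(2000.0, 10.0, 2000.0)
```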

Travel times on transportation networks are usually computed using the user equilib-

rium principle [Beckman et al., 1956], which states that each individual user would follow

the route that will minimize his or her travel time. Based on the user-equilibrium principle,

the link flows in the network are obtained by solving the following optimization problem:

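$$\min \sum_{i \in \{\text{links}\}} \int_0^{x_i} t_i(u)\,du \tag{6.26}$$

subject to the following constraints:

$$\sum_{j \in \text{paths}} f_j^{od} = Q_{od} \quad \forall\, o \in \{\text{org}\},\ d \in \{\text{dest}\} \tag{6.27}$$

$$x_i = \sum_{o \in \text{org}}\ \sum_{d \in \text{dest}}\ \sum_{j \in \text{paths}} f_j^{od}\,\delta_{ji}^{od} \quad \forall\, i \in \{\text{links}\} \tag{6.28}$$

$$f_j^{od} \geq 0 \quad \forall\, o \in \{\text{org}\},\ d \in \{\text{dest}\},\ j \in \{\text{paths}\} \tag{6.29}$$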

where $f_j^{od}$ denotes the flow between origin $o$ and destination $d$ that passes through path $j$ (here, a path denotes a set of links through which the flow between a specified origin and a specified destination occurs), $Q_{od}$ denotes the desired flow between $o$ and $d$, $\delta_{ji}^{od}$ is an indicator variable that equals 1 if the link $i$ lies on path $j$ and 0 otherwise, org denotes the set of all origins and dest denotes the set of all destinations. The current research work uses

a popular solution technique for this optimization problem provided by Frank and Wolfe

[1956]. It is to be noted that there are also other travel time and traffic flow estimation

techniques such as the dynamic user equilibrium formulation [e.g., Friesz et al., 1993] that

could incorporate the non-equilibrium conditions which might exist after an earthquake.
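To illustrate the flavor of the Frank and Wolfe approach on the smallest possible case, the sketch below solves the user-equilibrium problem for a toy network in which a single origin-destination pair is joined by parallel links, so that each path is one link. It is not the solver used in this work: the network, the demand, the bisection line search and all names are assumptions of this example, and a real network requires a shortest-path computation for the all-or-nothing assignment.

```python
def bpr(flow, t0, cap, alpha=0.15, beta=4.0):
    """Equation 6.25 link travel time."""
    return t0 * (1.0 + alpha * (flow / cap) ** beta)

def frank_wolfe_parallel(demand, t0s, caps, iterations=50):
    """Equilibrium flows on parallel links joining one origin-destination pair."""
    n = len(t0s)
    # Start with all demand on the link with the lowest free-flow time.
    best = min(range(n), key=lambda i: t0s[i])
    x = [demand if i == best else 0.0 for i in range(n)]
    for _ in range(iterations):
        times = [bpr(x[i], t0s[i], caps[i]) for i in range(n)]
        # All-or-nothing assignment: send all demand along the fastest link.
        best = min(range(n), key=lambda i: times[i])
        y = [demand if i == best else 0.0 for i in range(n)]
        d = [y[i] - x[i] for i in range(n)]

        def slope(step):
            # Derivative of the objective (Equation 6.26) along direction d.
            return sum(d[i] * bpr(x[i] + step * d[i], t0s[i], caps[i])
                       for i in range(n))

        if slope(0.0) >= 0.0:
            break                      # no descent direction: equilibrium
        lo, hi = 0.0, 1.0
        if slope(1.0) <= 0.0:
            lo = 1.0                   # the full step is still downhill
        else:
            for _ in range(60):        # bisection line search on the slope
                mid = 0.5 * (lo + hi)
                if slope(mid) < 0.0:
                    lo = mid
                else:
                    hi = mid
        x = [x[i] + lo * d[i] for i in range(n)]
    return x

flows = frank_wolfe_parallel(3000.0, [10.0, 12.0], [2000.0, 2000.0])
```

At the solution both links are used and their travel times are equal, which is the user-equilibrium condition: no traveler can reduce his or her travel time by switching links.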

Post-earthquake network performance

The current work assumes for simplicity that the post-earthquake demands equal the pre-

earthquake demands even though this is known not to be true [Kiremidjian et al., 2003].

The changes in network performance after an earthquake are assumed to be due only to the

delay and rerouting of traffic caused by structural damage to bridges. The damage states of

the bridges are computed considering only the ground shaking, and other possible damage

mechanisms such as liquefaction are not considered. The bridge fragility curves provided

by HAZUS [1999] are used to estimate the probability of a bridge being in or exceeding a

particular damage state (no damage, minor damage etc.) based on the simulated ground-

motion intensity (spectral acceleration at 1 second) at the bridge site. These damage state

probabilities are then used to simulate the damage state of the bridge following the earth-

quake. Damaged bridges cause reduced capacity in the link containing the bridge. The

reduced capacities corresponding to the five different HAZUS damage states are 100% (no damage), 75% (slight damage/moderate damage) and 50% (extensive damage/collapse).

The non-zero capacity corresponding to the bridge collapse damage state may seem sur-

prising at first glance. This is based on the argument that there are alternate routes (apart

from the freeways and highways considered in the model) that provide reduced access to


transportation services in the event of a freeway or a highway closure [Shiraki et al., 2007].

Such redundancies are prevalent in most transportation networks.

A network can have several bridges in a single link, and in such cases, the link capacity

is a function of the damage to all the bridges in the link. The current work assumes that the

link capacity reduction equals the average of the capacity reductions attributable to each

bridge in the link. This is a simplification, and further research is needed to handle the

presence of multiple bridges in a link. The post-earthquake network performance is then

computed by solving the user-equilibrium problem using the new set of link capacities,

and a new estimate of the total travel time in the network is obtained. It is to be noted

that the current work estimates the performance of the network only immediately after an

earthquake. The changes in the performance with network component restorations are not

considered here for simplicity.
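The damage-state simulation and the link capacity rule described above can be sketched as follows. The exceedance probabilities passed to `sample_damage_state` stand in for values read from the fragility curves and are hypothetical, as are the function names and the damage state labels used as dictionary keys.

```python
import random

DAMAGE_STATES = ["none", "slight", "moderate", "extensive", "collapse"]

# Remaining capacity fraction for each damage state, as stated in the text.
CAPACITY_FRACTION = {
    "none": 1.0, "slight": 0.75, "moderate": 0.75,
    "extensive": 0.5, "collapse": 0.5,
}

def sample_damage_state(exceed_probs, rng):
    """Simulate a damage state from the probabilities of being in or
    exceeding the slight, moderate, extensive and collapse states
    (a non-increasing sequence of four values)."""
    u = rng.random()
    # The state index is the number of thresholds whose probability exceeds u.
    index = sum(1 for p in exceed_probs if u < p)
    return DAMAGE_STATES[index]

def damaged_link_capacity(original_capacity, bridge_damage_states):
    """Reduce the link capacity by the average reduction over its bridges."""
    if not bridge_damage_states:
        return original_capacity
    reductions = [1.0 - CAPACITY_FRACTION[s] for s in bridge_damage_states]
    mean_reduction = sum(reductions) / len(reductions)
    return original_capacity * (1.0 - mean_reduction)

state = sample_damage_state([0.9, 0.5, 0.2, 0.05], random.Random(6))
capacity = damaged_link_capacity(2000.0, ["none", "extensive"])
```

In the example, a link with one undamaged bridge and one extensively damaged bridge loses 25% of its capacity, the average of a 0% and a 50% reduction.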

6.6.3 Ground-motion hazard

The San Francisco Bay Area seismicity information is obtained from USGS [2003]. Ten

active faults and fault segments are considered. The characteristic magnitude-recurrence

relationship of Youngs and Coppersmith [1985] is used to model f (m) with the distribution

parameters specified by the USGS, with 5.0 considered to be the lower bound magnitude

of interest. The flattening of this magnitude distribution towards the maximum magnitude

value (Figure 6.1) is to account for the higher probability of occurrence of the characteristic

earthquake on the fault [Youngs and Coppersmith, 1985]. The ground-motion model of

Boore and Atkinson [2008] is used to obtain the median ground-motion intensities and the

standard deviations of the residuals needed in Equation 6.1.


Risk assessment using importance sampling

The IS framework requires that the parameters of the sampling distribution for the magni-

tude and the residuals be chosen reasonably in order to obtain reliable results efficiently.

The set of parameters includes the appropriate stratification for magnitudes, the mean-shift


for normalized inter-event residuals (msinter) and the mean-shift for normalized intra-event

residuals (msintra).

The stratification of the range of magnitudes is carried out so as to obtain a desired

histogram of magnitudes. The partition width is chosen to be 0.3 between 5.0 and 6.5, 0.15

between 6.5 and 7.3 and 0.05 beyond 7.3. The results obtained using the simulations are

not significantly affected by moderate variations in the partitions, suggesting that the strat-

ification will be effective as long as it is chosen to preferentially sample large magnitudes.

Normalized inter-event residuals are sampled using an msinter of 1.0. Using the procedure

described earlier, the value of msintra is fixed at 0.3.

The loss measure of interest here is the travel-time delay (i.e., the variable L denoting

loss measure in the previous section is the travel-time delay). Figure 6.3a shows the ex-

ceedance curve for travel-time delays obtained using the IS framework. This exceedance

curve is obtained by sampling 25 magnitudes, each of which is then positioned on the

active faults as described in Section 6.3.2, and 50 sets of inter and intra-event residuals

for each magnitude-location pair (resulting in a total of 12,500 maps). To validate the IS,

an exceedance curve is also estimated using the benchmark method (MCS). Strictly, the

benchmark approach should use MCS to sample the magnitudes and the ground-motion

residuals. This is computationally prohibitive, however, even for the aggregated network

and hence the benchmark approach used in the current study uses IS for generating the

magnitudes but MCS for the residuals. IS of a single random variable has been shown to be

effective in a wide variety of applications including lifeline risk assessment [Kiremidjian

et al., 2003], and so further validation is not needed. On the other hand, the simulation pro-

cedure for intra-event residuals involves the novel application of IS of a correlated vector of

random variables, and hence, is the focus of the validation study described in this section.

Figure 6.3a shows the exceedance curve obtained using IS for generating 25 magni-

tudes and MCS for generating 500 sets of inter and intra-event residuals per magnitude-

location pair, resulting in a total of 125,000 maps. As seen from the figure, the exceedance

curve obtained using the IS framework closely matches that obtained using the benchmark

method, indicating the accuracy of the results obtained using IS. This is further substan-

tiated by Figure 6.3b, which plots the estimated coefficient of variation (CoV) (computed

using Equations 6.19 and 6.20) of the exceedance rates obtained using the IS approach and


Figure 6.2: (a) San Francisco Bay Area transportation network (b) Aggregated network.

Figure 6.3: (a) Travel-time delay exceedance curves (b) Coefficient of variation of the annual exceedance rate (c) Comparison of the efficiency of MCS, IS and the combination of K-means and IS (d) Travel-time delay exceedance curve obtained using the K-means method.


the benchmark approach. It can be seen from the figure that the CoV values corresponding

to travel-time delays obtained using IS are comparable to those obtained using MCS even

though the IS uses one-tenth the number of simulations required by the MCS. Further, it

is also seen that using IS in place of MCS for simulating magnitudes typically reduces the

computational expense of the risk assessment by a factor of 10, and hence, the overall IS

framework reduces the number of computations required for the risk assessment by a fac-

tor of nearly 100. It is to be noted that IS produces unbiased risk estimates, and any minor

deviation between the IS and the MCS curves in Figure 6.3a is due to the small variances

in the risk estimates.

Risk assessment using IS and K-means clustering

The 12,500 maps obtained using IS are next grouped into 150 clusters using the K-means

method. A catalog is then developed by randomly sampling one map from each cluster

in accordance with the map weights as described in section 6.5. This catalog is used to

estimate the travel-time delay exceedance curve based on Equation 6.23, and the curve is

seen to match reasonably well with the exceedance curve obtained using the IS technique

(Figure 6.3a). Based on the authors’ experience, the deviation of this curve from the IS

curve at the large delay levels is a result of the variance of the exceedance rates rather than

any systematic deviation. The variance in the exceedance curves is a consequence of the

fact that the map sampled from each cluster is not identical to the other maps in the cluster

(although they are similar).

To ascertain the variance of the exceedance rates, the clustering and the map selection

processes are repeated several times in order to obtain multiple catalogs of 150 represen-

tative ground-motion intensities, which are then used for obtaining multiple exceedance

curves. The coefficients of variation of the exceedance rates are then computed from these

multiple exceedance curves and are plotted in Figure 6.3b. It can be seen that the CoV

values obtained using the 150 maps generated by the IS and K-means combination are

about three times larger than those obtained using the 12,500 IS maps and the 125,000

MCS maps. This is to be expected, though, on account of the large reduction in the number

of maps. The factor of three increase in the CoVs, however, is significantly smaller than

what can be expected if IS and MCS are used to obtain the 150 maps directly. This can be


seen from Figure 6.3b, which shows the large CoV values of the exceedance rates obtained

using 150 ground-motion maps selected directly using the IS and the MCS procedures. Alternatively,

the relative performances of the IS and K-means combination, the IS method and

the MCS method can also be assessed by comparing the number of maps to be simulated

using these methods in order to achieve the same CoVs. It is seen that 3,500 IS maps and

11,750 MCS maps are necessary to match the CoVs achieved using the

150 IS and K-means combination maps (Figure 6.3c).
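
The CoV comparison above can be illustrated with a minimal sketch, assuming synthetic losses and weights and a fixed grouping of maps. The thesis repeats the clustering itself for every catalog; here only the map selection is repeated. All names (`catalog_exceedance`, `coefficient_of_variation`, the noisy-proxy grouping) are hypothetical and chosen for illustration only.

```python
import numpy as np

def catalog_exceedance(losses, weights, labels, u, rng):
    """One catalog-based estimate of P(L >= u) from a fixed grouping of maps."""
    estimate = 0.0
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        w = weights[members]
        # Draw one representative map with probability proportional to its weight.
        pick = rng.choice(members, p=w / w.sum())
        # The representative stands in for the total weight of its cluster.
        estimate += float(losses[pick] >= u) * w.sum()
    return estimate / weights.sum()

def coefficient_of_variation(estimates):
    estimates = np.asarray(estimates, dtype=float)
    return estimates.std(ddof=1) / estimates.mean()

rng = np.random.default_rng(0)
n_maps, n_clusters = 2000, 50
losses = rng.lognormal(mean=0.0, sigma=1.0, size=n_maps)   # synthetic losses
weights = rng.uniform(0.5, 1.5, size=n_maps)               # synthetic map weights
# Stand-in for clustering (an assumption): rank maps by a noisy proxy of the
# loss and cut the ranking into equally sized groups.
proxy = np.log(losses) + rng.normal(scale=0.5, size=n_maps)
labels = np.argsort(np.argsort(proxy)) * n_clusters // n_maps

u = np.quantile(losses, 0.9)
p_all = np.sum((losses >= u) * weights) / weights.sum()    # estimate from all maps
estimates = [catalog_exceedance(losses, weights, labels, u, rng) for _ in range(200)]
cov = coefficient_of_variation(estimates)
print(f"all-map estimate {p_all:.4f}, catalog mean {np.mean(estimates):.4f}, CoV {cov:.2f}")
```

Repeating this for several catalog sizes would give a curve of CoV against the number of maps, which is the kind of comparison summarized in Figure 6.3c.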

Finally, Figure 6.3d shows the mean exceedance rates, along with the empirical 95%

(point-wise) confidence interval obtained using the K-means method. Also shown

in this figure is the exceedance curve obtained using the IS technique. The mean K-means

curve and the IS curve match very closely, indicating that the sampling and data reduction

procedure suggested in this work results in unbiased exceedance rates. (This is also

theoretically established in Appendix 6.8.) The width of the confidence interval turns out to

be reasonably small, especially considering that the exceedance rates have been obtained

using only 150 intensity maps.

If the K-means clustering procedure is effective, intensity maps in a cluster will be sim-

ilar to each other. Therefore, the travel-time delays associated with all the maps in a cluster

should be similar to one another, and different from the travel-time delays associated with

the maps in other clusters. In other words, the mean travel-time delays computed using all

the maps in one cluster should be different from the mean from other clusters, while the

standard deviation of the travel-time delays in a cluster should be small as a result of the

similarity within a cluster. Conversely, ‘random clustering’ in which the maps obtained

from the IS are randomly placed in clusters irrespective of their properties would be very

inefficient. Figure 6.4 compares the mean and the standard deviation of cluster travel-time

delays, obtained using K-means clustering and random clustering. The smoothly varying

cluster means obtained using K-means as compared to the nearly uniform means obtained

using random clustering shows that the K-means has been successful in separating dissim-

ilar intensity maps. Similarly, the cluster standard deviations obtained using K-means are

considerably smaller than the standard deviations obtained using random clustering for the

most part (and are large for larger cluster numbers because all delays in these clusters are

large). The occasional spikes in the standard deviations are a result of small sample sizes


in some clusters.

In summary, the exceedance curves obtained and the results from the tests for the ef-

ficiency of K-means clustering indicate that the clustering method has been successful in

identifying and grouping similar maps together. As a consequence, substantial computa-

tional savings can be achieved by eliminating redundant (similar) maps, without consider-

ably affecting the accuracy of the exceedance rates. It is to be noted that this approach is

primarily meant for modeling the upper tail of the risk curve accurately. A conventional

Monte Carlo approach might be more appropriate when more frequently exceeded losses

such as the median loss are of interest.

Hazard consistency

The proposed framework not only produces reasonably accurate loss estimates, but also

intensity maps that are hazard consistent. In other words, the site hazard curves obtained

based on the final catalog of intensity maps match the site ground-motion hazard curves

obtained from the fault and the ground-motion model using numerical integration (i.e.,

traditional PSHA). Figures 6.5a and b show the site hazard curves at two different sites

obtained using numerical integration, importance sampling (for magnitudes and residuals)

and the combination of importance sampling and K-means clustering. It can be seen that the

sampling and clustering framework reasonably reproduces the site ground-motion hazard

obtained through numerical integration.

6.6.5 Importance of modeling ground-motion uncertainties and spatial correlations

The transportation network risk assessment is repeated assuming uncorrelated intra-event

residuals, and a new exceedance curve is obtained, and plotted in Figure 6.6. It can be

seen that the risk is considerably underestimated when the spatial correlations are ignored.

Further, some past risk assessments have completely ignored the uncertainty in the ground-

motion intensities (i.e., median intensity maps are used, and inter- and intra-event residuals

are ignored). A risk assessment carried out this way, also plotted in Figure 6.6, shows that

the risk is even more substantially underestimated in this case. This happens because the


Figure 6.4: (a) Mean of travel-time delays within a cluster (b) Standard deviation of travel-time delays within a cluster. With both clustering methods, cluster numbers are assigned in order of increasing mean travel-time delay within the cluster for plotting purposes.

Figure 6.5: Comparison of site hazard curves obtained at two sample sites using the sampling framework with that obtained using numerical integration. (a) Sample site 1 and (b) Sample site 2.


possibility of observing above-median ground-motion intensities during a given earthquake

is not considered. Such simplifications clearly introduce significant errors into the risk

calculations, and should thus be avoided.

6.7 Conclusions

An efficient simulation-based framework based on importance sampling and K-means clus-

tering has been proposed, that can be used for the seismic risk assessment of lifelines.

The framework can be used for developing a small, but stochastically-representative cat-

alog of ground-motion intensity maps that can be used for performing lifeline risk as-

sessments. The importance sampling technique is used to preferentially sample important

ground-motion intensity maps, and the K-means clustering technique is used to identify

and combine redundant maps. It is shown theoretically and empirically that the risk esti-

mates obtained using these techniques are unbiased. The study proposes importance sam-

pling schemes that can be used for sampling earthquake magnitudes, rupture locations,

inter-event residuals and spatially correlated maps of intra-event residuals. Magnitudes are

sampled by first stratifying the magnitude range of interest into smaller partitions and by

selecting one magnitude from each partition. The partitions are made narrower at larger

magnitudes to ensure that larger magnitudes are preferentially sampled. The normalized

residuals are sampled from a normal distribution with a positive mean, rather than a zero

mean, to sample more large positive residuals. Techniques are also suggested to estimate

the optimal parameters of these alternate sampling density functions. The proposed frame-

work was used to evaluate the exceedance rates of various travel-time delays on an ag-

gregated form of the San Francisco Bay Area transportation network. Simplified trans-

portation network analysis models were used to illustrate the feasibility of the proposed

framework. The exceedance rates were obtained using a catalog of 150 maps generated

using the combination of importance sampling and K-means clustering, and were shown

to be in good agreement with those obtained using the conventional Monte Carlo simula-

tion method. Therefore, the proposed techniques can reduce the computational expense of

a simulation-based risk assessment by several orders of magnitude, making it practically

feasible. The efficiency of the proposed technique was compared to that of conventional


techniques using the coefficient of variation (CoV) of the exceedance rates. It was shown

that the CoVs achieved using the 150 maps obtained from the combination of importance-

sampling and K-means clustering can only be reproduced by 3,500 importance-sampling

maps and 11,750 MCS maps (conventional MCS for residuals and importance sampling for

magnitudes), thereby indicating the efficiency of the proposed technique. The study also

showed that the proposed framework automatically produces intensity maps that are hazard

consistent. Finally, the study showed that the uncertainties in ground-motion intensities and

the spatial correlations between ground-motion intensities at multiple sites must be mod-

eled in order to avoid introducing significant errors into the lifeline risk calculations. For

the network considered in this work, ignoring spatial correlations results in about a 30%

reduction in the estimated travel-time delays at small annual exceedance rates ($10^{-6}$/year),

while ignoring uncertainties results in about a 70% reduction in the estimated travel-time

delays at small exceedance rates.

6.8 Appendix: Proof that the exceedance rates obtained using IS and K-means clustering are unbiased

This section illustrates that the loss (e.g., travel-time delay) exceedance rates obtained using

a catalog of ground-motion intensities generated by the IS and K-means framework are un-

biased. Since the importance sampling procedure produces unbiased estimates [Fishman,

2006], it will suffice to establish that the exceedance rates obtained using the K-means

clustered catalog of maps are unbiased estimators of the exceedance rates obtained using

the IS maps. This proof will further support the empirical observation that the example

exceedance rates from the different procedures are equivalent.

Let $l_1, l_2, \cdots, l_r$ denote the loss measures (e.g., travel-time delay in a transportation

network) corresponding to the $r$ intensity maps obtained using importance sampling. Let

$\Lambda_1, \Lambda_2, \cdots, \Lambda_r$ denote the weights corresponding to the maps as defined in Equation 6.18.

Let $P_{IS}$ denote the exceedance probability curve obtained using the IS maps (Equation

6.19). Assume that the $r$ maps are grouped into $K$ clusters. (This proof does not require

knowledge about the clustering technique used.) Let $l^{(c)}$ be the travel-time delay in the

network corresponding to the map selected from cluster $c$. The exceedance probability curve

($P_{KM}(L \ge u)$) can be obtained from the catalog of $\left[l^{(1)}, l^{(2)}, \cdots, l^{(K)}\right]$ based on Equation

6.23.

Unbiasedness can be established by showing that the expected value of $P_{KM}(L \ge u)$

equals $P_{IS}(L \ge u)$. The expected value of $P_{KM}(L \ge u)$ is computed using the law of iterated

expectations, by first conditioning it on a possible grouping G (i.e., a possible grouping

of maps into clusters obtained using the clustering method), and then by computing the

expectation over all possible groupings. The following equations describe this procedure:

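\begin{align}
E\left[P_{KM}(L \ge u)\right]
&= E\left[\frac{\sum_{c=1}^{K} I\left(l^{(c)} \ge u\right) \sum_{i \in c} \Lambda_i}{\sum_{c=1}^{K} \sum_{i \in c} \Lambda_i}\right] \tag{6.30} \\
&= E\left[\frac{\sum_{c=1}^{K} I\left(l^{(c)} \ge u\right) \sum_{i \in c} \Lambda_i}{\sum_{i=1}^{r} \Lambda_i}\right] \notag \\
&= E_G\left\{ E\left[\left. \frac{\sum_{c=1}^{K} I\left(l^{(c)} \ge u\right) \sum_{i \in c} \Lambda_i}{\sum_{i=1}^{r} \Lambda_i} \,\right|\, G \right] \right\} \notag \\
&= E_G\left[\frac{1}{\sum_{i=1}^{r} \Lambda_i} \sum_{c=1}^{K} P\left(\left. l^{(c)} \ge u \,\right|\, G\right) \sum_{i \in c} \Lambda_i\right] \notag \\
&= E_G\left[\frac{1}{\sum_{i=1}^{r} \Lambda_i} \sum_{c=1}^{K} \frac{\sum_{j \in c} I\left(l_j \ge u\right) \Lambda_j}{\sum_{j \in c} \Lambda_j} \sum_{i \in c} \Lambda_i\right] \notag \\
&= E_G\left[\frac{1}{\sum_{i=1}^{r} \Lambda_i} \sum_{c=1}^{K} \sum_{j \in c} I\left(l_j \ge u\right) \Lambda_j\right] \notag \\
&= \frac{1}{\sum_{i=1}^{r} \Lambda_i} \sum_{c=1}^{K} \sum_{j \in c} I\left(l_j \ge u\right) \Lambda_j \notag \\
&= \frac{\sum_{i=1}^{r} I\left(l_i \ge u\right) \Lambda_i}{\sum_{i=1}^{r} \Lambda_i} \notag \\
&= P_{IS}(L \ge u) \notag
\end{align}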

This shows that the exceedance rates obtained using the small catalog of ground-motion

intensities are unbiased.
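
The result can be checked numerically on a toy example. The sketch below is not part of the original proof: it assumes six hypothetical maps with made-up losses and weights and one arbitrary grouping into three clusters, enumerates every possible catalog (one map per cluster, drawn with probability proportional to its weight), and compares the exact expectation of the catalog estimate with the estimate from all maps. Exact rational arithmetic is used so that the equality is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product

losses = [1, 4, 2, 7, 9, 3]                              # hypothetical l_1..l_r
weights = [Fraction(w) for w in (2, 1, 3, 1, 1, 2)]      # hypothetical Lambda_1..Lambda_r
clusters = [[0, 2], [1, 5], [3, 4]]                      # one possible grouping G
u = 4                                                    # loss threshold

total = sum(weights)
# Exceedance probability from all maps (last line of the derivation).
p_is = sum(w for l, w in zip(losses, weights) if l >= u) / total

# Expected catalog estimate: sum over every catalog of probability * estimate.
expected_p_km = Fraction(0)
for picks in product(*clusters):
    prob = Fraction(1)
    estimate = Fraction(0)
    for members, pick in zip(clusters, picks):
        cluster_weight = sum(weights[i] for i in members)
        prob *= weights[pick] / cluster_weight           # chance of drawing this map
        if losses[pick] >= u:
            estimate += cluster_weight                   # representative carries cluster weight
    expected_p_km += prob * estimate / total

print(p_is, expected_p_km)
```

Changing `clusters` to any other partition of the six maps leaves the two values equal, which mirrors the remark that the proof does not depend on the clustering technique.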


6.9 Appendix: Improving the computational efficiency of the K-means clustering method

Clustering a large number of intensity maps (e.g., 12,500) in a single step may be compu-

tationally prohibitive on computers with limited memory and processing ability, because

clustering involves repetitive computations of the distance between each map and the clus-

ter centroids. In such cases, the authors propose the following two-step clustering technique

in which the maps are preliminarily grouped into clusters using a simplified distance mea-

sure, followed by a rigorous final clustering step using the distance measure defined in

Equation 6.22. This two-step process is described below.

In the preliminary clustering step, the intensity maps are grouped into a small number

of preliminary clusters with the distance between map $\mathbf{Sa}_j$ and centroid $\mathbf{C}_i$ computed as

$\left(\sum_{q=1}^{p} Sa_{qj} - \sum_{q=1}^{p} C_{qi}\right)^2$. In other words, the distance measure is based on the sum of the

intensities corresponding to the intensity map. The sum of the intensities is chosen as the

basis for clustering since it has been seen in past research [Campbell and Seligson, 2003]

and in the current research work to be a reasonable indicator of the risk associated with an

intensity map. Further, the K-means method is extremely fast when the distance is based

on a single parameter.

The final clustering step is used to refine the preliminary clusters, and involves further

clustering within each preliminary cluster using the distance measure defined in Equation

6.22. If 50 preliminary clusters are used, each of these could be subdivided into 3 clusters

using the K-means method. Even though the more rigorous distance measure is used in this

step, it is much faster because the final clustering operates on the far smaller number of maps

stored within each preliminary cluster. Further, the memory demand in this case is much

smaller than when clustering is carried out in a single step.
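
A minimal sketch of the two-step idea follows. It assumes synthetic intensity maps and a plain Euclidean distance in the refinement step as a stand-in for the distance measure of Equation 6.22, which is not reproduced here. The helper names `kmeans` and `two_step_kmeans` are hypothetical.

```python
import numpy as np

def kmeans(points, k, rng, n_iter=100):
    """Plain Lloyd iterations with squared Euclidean distance; returns labels."""
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.full(len(points), -1)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = points[labels == c].mean(axis=0)
    return labels

def two_step_kmeans(maps, n_prelim, n_sub, rng):
    # Preliminary step: cluster on a single parameter, the sum of intensities per map.
    prelim = kmeans(maps.sum(axis=1, keepdims=True), n_prelim, rng)
    final = np.empty(len(maps), dtype=int)
    next_label = 0
    for c in np.unique(prelim):
        members = np.flatnonzero(prelim == c)
        # Final step: refine within the preliminary cluster using the full map vectors.
        sub = kmeans(maps[members], min(n_sub, len(members)), rng)
        _, sub = np.unique(sub, return_inverse=True)     # relabel as 0, 1, ...
        final[members] = next_label + sub
        next_label += sub.max() + 1
    return final

rng = np.random.default_rng(1)
maps = rng.lognormal(mean=-1.0, sigma=0.6, size=(600, 20))   # synthetic intensity maps
labels = two_step_kmeans(maps, n_prelim=10, n_sub=3, rng=rng)
n_final = labels.max() + 1
print(n_final)
```

The refinement step only ever forms distance arrays for the maps inside one preliminary cluster, which is where the memory saving described above comes from.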

Figure 6.7 shows the (point-wise) confidence intervals of the travel-time delay ex-

ceedance curves obtained using the two-step clustering procedure, where 50 preliminary

clusters are each subdivided into three final clusters. It can be seen from Figures 6.3d

and 6.7 that the results obtained using both the single-step and the two-step clustering ap-

proaches are essentially identical. For this application, the two-step clustering procedure is

five times faster than the single-step clustering procedure.


Figure 6.6: Exceedance curves obtained using simplifying assumptions.

Figure 6.7: Travel-time delay exceedance curve obtained using the two-step clustering technique.

Chapter 7

Lifeline performance assessment using statistical learning techniques

7.1 Abstract

Chapter 6 proposed a simulation-based method involving importance sampling and K-

means clustering to efficiently generate a small catalog of stochastically-representative

ground-motion maps that can be used for lifeline risk assessment. The current study fo-

cuses on the highly computationally demanding task of estimating the confidence interval

for the risk estimates obtained using this simulation-based method. Estimating the confi-

dence intervals is computationally intensive because it requires repetitive risk calculations

(in order to estimate a variance for the risk estimates) that in turn involve numerous life-

line performance evaluations. In order to reduce the computational demand, the catalog

of ground-motion maps generated in Chapter 6 is used in conjunction with a statistical

learning technique called Multiple Additive Regression Trees (MART) to develop an

approximate relationship between the lifeline performance and the ground-motion inten-

sities during an earthquake. The lifeline performance predicted by this relationship can

be used in place of the exact lifeline performance (the evaluation of which is intensive) to

expedite the computation of several lifeline risk-related parameters, including confidence

intervals.


Figure 7.1: Sample ground-motion map corresponding to an earthquake on the San Andreas fault. A map is a collection of ground movement levels (ground-motion intensities) at all the sites of interest. The sites of interest, in this case, are located in the San Francisco Bay Area.

7.2 Introduction

Probabilistic seismic risk assessment for lifelines is less straightforward than for individual

structures. Lifelines, by virtue of their large size and geographic spread, are affected

by earthquakes that originate on several faults, which necessitates the consideration of nu-

merous probable future earthquake scenarios. Further, lifeline risk assessment is based

on a large vector of spatially-correlated ground-motion intensities. The link between the

ground-motion intensities at the sites and the performance of the lifeline is usually not

available in closed form. These complexities make it difficult to use analytical frameworks

for lifeline risk assessment. As a result, Monte Carlo simulation (MCS)-based methods are

commonly used for characterizing spatial ground motions and for estimating lifeline risk

[e.g., Campbell and Seligson, 2003, Crowley and Bommer, 2006, Kiremidjian et al., 2007,

Shiraki et al., 2007].


In the MCS approach, several possible future ground-motion maps (which are collec-

tions of ground-motion intensities at all the sites of interest) are probabilistically gener-

ated, and the performance of the lifeline is evaluated under each intensity map. (A sample

ground-motion map due to a magnitude 8 earthquake on the San Andreas fault is shown

in Figure 7.1. This particular map has been simulated without consideration of local-site

effects purely for illustration purposes, but the studies carried out in this thesis include

local-site effects.) This approach is, however, highly computationally intensive, primarily

because it involves repeated evaluations of lifeline performance under a large number of

simulated ground-motion intensity maps. In the past, researchers have used several sim-

plifying assumptions (e.g., a single dominating scenario earthquake, deterministic ground-

motion intensities, absence of spatial correlation) in order to reduce the required number of

simulations. These simplifications can, however, lead to inaccuracies in the risk assessment

results, as discussed elsewhere in the thesis.

Chapter 6 [Jayaram and Baker, 2010] proposed a simulation-based method involving

importance sampling and K-means clustering to efficiently generate a small catalog of

stochastically-representative ground-motion maps that can be used for lifeline risk assess-

ment. Importance sampling is used to preferentially sample events with extreme ground-

motion intensities that contribute to the lifeline risk. K-means clustering is used to eliminate

redundant intensity maps (i.e., maps that are similar to other maps). They showed that the

risk estimates obtained using this small catalog are in good agreement with those obtained

using the conventional MCS that uses a much larger number of simulations.

The current study focuses on the highly computationally demanding task of estimating

the confidence intervals for the risk estimates obtained using the above-described

simulation-based method. Estimating the confidence intervals is computationally intensive

because it requires repetitive risk calculations (in order to estimate a variance for the

risk estimates) that involve numerous lifeline performance evaluations. In order to reduce

the computational demand, the catalog of ground-motion maps generated in Chapter 6 is

used in conjunction with a statistical learning technique called Multiple Additive Regression

Trees (MART) [Friedman, 1999] to develop an approximate relationship between

the lifeline performance and the ground-motion intensities during an earthquake. The life-

line performance predicted by this relationship can be used in place of the exact lifeline


performance (the evaluation of which is intensive) to expedite the computation of several

lifeline risk-related parameters. One notable work in this regard is that of Guikema [2009]

who proposed to use approximate regression relationships for evaluating the lifeline per-

formance. That work, however, is purely conceptual and does not give concrete examples.

Chapter 6 estimated the travel-time delay exceedance curves for the San Francisco Bay

Area transportation network. In this study, the performance relationship developed using

MART is used for estimating confidence intervals for these curves. It is seen that the

confidence intervals obtained using MART match well with those obtained using the exact

loss function.

7.3 Brief introduction to ground-motion map sampling

This section describes the conventional Monte Carlo ground-motion sampling procedure

as well as the importance sampling and K-means clustering procedures used in Chapter 6.

7.3.1 Conventional MCS of ground-motion maps

The distribution of the ground-motion intensity at any particular site is predicted using a

ground-motion model, which takes the following form [e.g., Boore and Atkinson, 2008,


\[ \ln(Sa_{ij}) = \ln(\overline{Sa}_{ij}) + \sigma_{ij}\varepsilon_{ij} + \tau_{ij}\eta_{ij} \tag{7.1} \]

where $Sa_{ij}$ denotes the spectral acceleration (at the period of interest) at site $i$ during

earthquake $j$; $\overline{Sa}_{ij}$ denotes the predicted (by the ground-motion model) median spectral

acceleration, which depends on parameters such as magnitude, distance, period and local-site

conditions; $\varepsilon_{ij}$ denotes the normalized intra-event residual and $\eta_{ij}$ denotes the normalized

inter-event residual. Both $\varepsilon_{ij}$ and $\eta_{ij}$ are univariate normal random variables with zero

mean and unit standard deviation. $\sigma_{ij}$ and $\tau_{ij}$ are standard deviation terms that are

estimated as part of the ground-motion model and are functions of the spectral period of

interest, and in some models also functions of the earthquake magnitude and the distance

of the site from the rupture.


Probabilistic sampling of ground-motion intensity at multiple sites involves the follow-

ing steps [Crowley and Bommer, 2006, Jayaram and Baker, 2010]:

Step 1: Use MCS to generate earthquakes of varying magnitudes on the active faults in

the region, considering appropriate magnitude-recurrence relationships (e.g., the

Gutenberg-Richter relationship).

Step 2: Using a ground-motion model (Equation 7.1), obtain the median ground-motion

intensities ($\overline{Sa}_{ij}$) and the standard deviations of the intra-event and the inter-event residuals

($\sigma_{ij}$ and $\tau_{ij}$) at all the sites.

Step 3: Generate the normalized inter-event residual term ($\eta_{ij}$) by sampling from the

univariate normal distribution.

Step 4: Simulate the normalized intra-event residuals ($\varepsilon_{ij}$'s) using the parameters

predicted by the ground-motion model. Chapter 2 [Jayaram and Baker, 2008] showed that a

vector of spatially-distributed normalized intra-event residuals $\boldsymbol{\varepsilon}_j = (\varepsilon_{1j}, \varepsilon_{2j}, \cdots, \varepsilon_{pj})$

follows a multivariate normal distribution. Hence, the distribution of $\boldsymbol{\varepsilon}_j$ can be completely

defined using the mean (zero) and standard deviation (one) of $\varepsilon_{ij}$, and the correlation

between all $\varepsilon_{i_1 j}$ and $\varepsilon_{i_2 j}$ pairs. The correlations between the residuals can be obtained from a

predictive model calibrated using past ground-motion intensity observations [Jayaram and

Baker, 2009a, Wang and Takada, 2005].

Step 5: Combine the median intensities, the normalized intra-event residuals and the

normalized inter-event residual for each earthquake in accordance with Equation 7.1 to

obtain ground-motion intensity maps (i.e., obtain $\mathbf{Sa}_j = (Sa_{1j}, Sa_{2j}, \cdots, Sa_{pj})$).
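
Steps 3 to 5 can be sketched for a single earthquake scenario as follows. This is an illustration under stated assumptions, not the thesis implementation: the medians and standard deviations are placeholder numbers rather than output of a ground-motion model, the intra-event correlation is assumed to decay exponentially with site separation, and the names `simulate_maps`, `coords_km` and `corr_range_km` are hypothetical.

```python
import numpy as np

def simulate_maps(median_sa, sigma, tau, coords_km, corr_range_km, n_maps, rng):
    """Sample ground-motion maps for one earthquake following Equation 7.1."""
    n_sites = len(median_sa)
    # Separation distances between all pairs of sites.
    dist = np.linalg.norm(coords_km[:, None, :] - coords_km[None, :, :], axis=2)
    # Assumed correlation model: exponential decay with separation distance.
    corr = np.exp(-3.0 * dist / corr_range_km)
    # Step 3: one normalized inter-event residual per map, shared by all sites.
    eta = rng.standard_normal(size=(n_maps, 1))
    # Step 4: spatially correlated normalized intra-event residuals.
    eps = rng.multivariate_normal(np.zeros(n_sites), corr, size=n_maps)
    # Step 5: combine medians and residuals in log space.
    return np.exp(np.log(median_sa) + sigma * eps + tau * eta)

rng = np.random.default_rng(4)
coords_km = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0], [0.0, 20.0], [30.0, 30.0]])
median_sa = np.array([0.30, 0.25, 0.20, 0.15, 0.10])   # placeholder medians (g)
maps = simulate_maps(median_sa, sigma=0.55, tau=0.30, coords_km=coords_km,
                     corr_range_km=40.0, n_maps=2000, rng=rng)
log_corr = np.corrcoef(np.log(maps), rowvar=False)
print(maps.shape, round(log_corr[0, 1], 2), round(log_corr[0, 4], 2))
```

In this sketch the nearby pair of sites (0 and 1) shows a higher correlation of log intensities than the distant pair (0 and 4), with the shared inter-event residual setting a floor on the correlation at large separations.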

7.3.2 Importance sampling of ground-motion maps

Most of the past research works use random MCS (based on the original distributions of

magnitudes and residuals) for simulating ground-motion maps (with the notable exception

of Kiremidjian et al. [2007] who used importance sampling for magnitudes). While small

magnitude earthquakes and average values of residuals are highly probable, they are less

interesting for risk assessment purposes, where we are interested in large values of these

random variables. Hence, Chapter 6 proposed to sample these random variables prefer-

entially from the tails of their distributions by sampling from alternate distributions. The


Figure 7.2: (a) Stratified sampling of earthquake magnitudes (b) Importance sampling of residuals.

magnitudes were simulated using stratified sampling, where the entire range of magnitudes

was stratified into bins, with the bin width being large at small magnitudes and small at

large magnitudes (Figure 7.2a), and one magnitude was selected from each bin. This en-

sures an adequate sampling of large magnitude events. The residuals were sampled from

a multivariate normal distribution with positive means for the residuals rather than zero

means (Figure 7.2b) (in order to sample large values of residuals). Overall, the large mag-

nitude events combined with large positive residuals lead to large values of ground-motion

intensities in the sampled maps. It was seen that the importance sampling procedure re-

sults in two orders of magnitude reduction in the number of samples needed for the risk

assessment.
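
The effect of sampling residuals from a shifted distribution can be seen in a one-dimensional sketch. The example below is a simplification of the multivariate case described above: it estimates the probability that a single standard normal residual exceeds 3 by sampling from a normal distribution with mean 3 and reweighting each sample by the ratio of the original to the sampling density. The shift value and the function name `tail_probability_is` are illustrative assumptions, not the optimal parameters derived in Chapter 6.

```python
import numpy as np
from math import erfc, sqrt

def tail_probability_is(threshold, shift, n_samples, rng):
    """Estimate P(Z > threshold) for Z ~ N(0, 1) by sampling from N(shift, 1)."""
    z = rng.normal(loc=shift, scale=1.0, size=n_samples)
    # Importance weight f(z) / g(z) for the two unit-variance normal densities.
    weights = np.exp(-shift * z + 0.5 * shift**2)
    return np.mean((z > threshold) * weights)

rng = np.random.default_rng(2)
threshold = 3.0
exact = 0.5 * erfc(threshold / sqrt(2.0))                 # exact tail probability
estimate_is = tail_probability_is(threshold, shift=3.0, n_samples=10_000, rng=rng)
estimate_mcs = np.mean(rng.standard_normal(10_000) > threshold)   # conventional MCS
print(exact, estimate_is, estimate_mcs)
```

With the same number of samples, conventional MCS sees only a handful of exceedances, while roughly half of the shifted samples fall in the tail and each contributes through its weight.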

7.3.3 K-means clustering

The use of importance sampling causes significant improvement in the computational ef-

ficiency of the simulation procedure, but the number of required IS intensity maps is still

large and may pose a heavy computational burden. The K-means clustering [McQueen,

1967] was used in Chapter 6 as a data reduction technique in order to develop a smaller

catalog of maps by ‘clustering’ simulated ground-motion intensity maps with similar prop-

erties (i.e., similar spectral acceleration values at the sites of interest), and subsequently

using only one map from each cluster. The clustering was performed using the K-means


algorithm, which groups a set of observations into K clusters such that the dissimilarity

between the observations (typically measured by the Euclidean distance) within a cluster is

minimized [McQueen, 1967].

In its simplest version, the K-means algorithm comprises the following four steps:

Step 1: Pick K maps to denote the initial cluster centroids (This selection can be done

randomly.)

Step 2: Assign each map to the cluster with the closest centroid.

Step 3: Recalculate the centroid of each cluster after the assignments.

Step 4: Repeat steps 2 and 3 until no more reassignments take place.
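
The four steps can be sketched in a few lines. The example assumes synthetic maps with four sites and plain Euclidean distance. The function name `kmeans_maps` is hypothetical, and the guard against empty clusters is an addition for robustness rather than part of the steps listed above.

```python
import numpy as np

def kmeans_maps(maps, k, rng, max_iter=100):
    maps = np.asarray(maps, dtype=float)
    # Step 1: pick K maps at random as the initial cluster centroids.
    centroids = maps[rng.choice(len(maps), size=k, replace=False)].copy()
    labels = np.full(len(maps), -1)
    for _ in range(max_iter):
        # Step 2: assign each map to the cluster with the closest centroid.
        dists = np.linalg.norm(maps[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop once no map changes cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its assigned maps.
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = maps[labels == c].mean(axis=0)
    return labels, centroids

rng = np.random.default_rng(3)
low = rng.normal(loc=0.1, scale=0.02, size=(50, 4))    # 50 synthetic low-intensity maps
high = rng.normal(loc=0.8, scale=0.02, size=(50, 4))   # 50 synthetic high-intensity maps
labels, centroids = kmeans_maps(np.vstack([low, high]), k=2, rng=rng)
print(labels[:5], labels[-5:])
```

For the two well-separated groups in this example, the low-intensity maps end up in one cluster and the high-intensity maps in the other.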

For instance, Figure 7.3 shows four simulated ground-motion maps, two of which can

be grouped together due to their similarity. Once all the maps are clustered, the final catalog

is developed by selecting one map from each cluster, which is used to represent all maps in

that cluster on account of the similarity of the maps within a cluster. In other words, if the

map selected from a cluster produces loss l, it is assumed that all other maps in the cluster

produce the same loss l (by virtue of similarity). The maps in this smaller catalog can be

used in place of the maps generated using importance sampling for the loss assessment,

which results in a dramatic improvement in the computational efficiency.

Both the importance sampling and the K-means clustering methods make the final set

of maps unequiprobable (i.e., each map is not equally likely). Hence, suitable weights (e.g.,

importance sampling weights) are attributed to these maps so that risk estimates obtained

using these maps are unbiased. The details of these weight calculations and a proof of

unbiasedness can be found in Chapter 6.

7.4 Confidence intervals for lifeline risk estimates

Chapter 6 used the catalog of maps generated using IS and K-means (described above) to

obtain the travel-time delay exceedance curve (i.e., rates of exceedance of various travel-

time delays) for the San Francisco Bay Area transportation network. In this work, it is of

interest to obtain the confidence intervals for the exceedance rates in a computationally-

efficient manner.


Figure 7.3: Four simulated ground-motion maps, two of which are reasonably similar and grouped together into one cluster.


Figure 7.4: (a) The San Francisco Bay Area transportation network (b) Aggregated model.

7.4.1 Network data

This section describes the properties of the San Francisco Bay Area transportation net-

work used as the sample lifeline in this work. The relevant network data were obtained

from Stergiou and Kiremidjian [2006]. Figure 7.4a shows the Metropolitan Transportation

Commission (MTC) San Francisco Bay Area highway network, which consists of 29,804

links (roads) and 10,647 nodes. The network also consists of 1,125 bridges from the five

counties of the Bay Area. The traffic demand-supply data were obtained from the 1990

MTC household survey [Purvis, 1999].

Analyzing the performance of a network as large and complete as the San Francisco Bay

Area transportation network under maps generated, in particular, by conventional MCS

is extremely computationally intensive. Therefore, an aggregated representation of the

Bay Area network is used for this example application. The aggregated network consists

predominantly of freeways and expressways, along with the ramps linking the freeways

and expressways. The nodes are placed at locations where links intersect or change in

characteristics (e.g., change in the number of lanes). The aggregated network comprises

586 links and 310 nodes (Figure 7.4b). While the performance of the aggregated network


may or may not be similar to that of the full network, the aggregated network should serve

as a reasonably realistic and complex test case for the proposed framework. If desired, the

methods developed here can be applied to the complete network as well.

7.4.2 Ground-motion hazard data

The San Francisco Bay Area seismicity information is obtained from USGS [2003]. Ten

active faults and fault segments are considered in the current work. The characteristic

magnitude-recurrence relationship of Youngs and Coppersmith [1985] is used to model the

density function for magnitudes with the distribution parameters specified by the USGS.

The ground-motion model of Boore and Atkinson [2008] is used to obtain the median

ground-motion intensities and the standard deviations of the residuals needed in Equation

7.1.

7.4.3 Statistical description of the problem

Let $\mathbf{X}$ denote the ground-motion intensities at all the sites of interest in one ground-motion map. The number of sites equals 1,125 (the number of bridges in the network), and hence $\mathbf{X}$ is a high-dimensional vector. Let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$ denote importance sampling realizations of $\mathbf{X}$. Assume that these realizations are segmented using K-means clustering into K clusters. (The clustering attempts to minimize the sum of the Euclidean distances between the vectors in each cluster and the cluster medians, as described in Section 6.5 of this thesis.) The lifeline losses are then computed using just one map sampled from each cluster, $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(K)}$, in place of the m original samples. The loss estimates are appropriately weighted (weights are denoted by $w^{(i)}$) in order to ensure statistical consistency. The complete details about the weights can be found in Chapter 6.

It is of interest to empirically estimate the exceedance curve of a loss function L (e.g.,

travel-time delay), and the corresponding confidence interval (CI). The rate of exceedance

of a loss value equals the rate of occurrence of earthquakes multiplied by the probability of

exceedance of the loss value (P(L > l)). The probability of exceedance can be estimated


Figure 7.5: Exceedance rates of travel-time delays.

as follows:

$$P(L > l) = \sum_{i=1}^{K} I\left[L(\mathbf{x}^{(i)}) > l\right] w^{(i)} \qquad (7.2)$$

where I[.] is an indicator variable and the w’s are the weights referred to earlier.
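As an illustration of Equation 7.2, the following minimal Python sketch evaluates the weighted indicator sum for a small catalog. The function name, the four loss values, the weights and the earthquake rate are hypothetical placeholders chosen for illustration and are not values from this study.

```python
import numpy as np

def exceedance_probability(losses, weights, thresholds):
    """Weighted indicator sum of Equation 7.2, evaluated at each threshold l."""
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    thresholds = np.atleast_1d(np.asarray(thresholds, dtype=float))
    # indicator[k, i] = 1 if the loss from catalog map i exceeds threshold k
    indicator = (losses[None, :] > thresholds[:, None]).astype(float)
    return indicator @ weights

# Hypothetical catalog of K = 4 maps; the weights w(i) sum to one.
losses = [0.0, 5.0, 20.0, 80.0]
weights = [0.70, 0.20, 0.08, 0.02]
earthquake_rate = 0.5  # assumed rate of occurrence of earthquakes (per year)

p_exceed = exceedance_probability(losses, weights, [1.0, 10.0, 50.0])
rate_exceed = earthquake_rate * p_exceed  # rate of exceedance of each loss level
```

Multiplying the probabilities by the rate of occurrence of earthquakes gives the exceedance rates plotted on a curve such as the one in Figure 7.5.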

A sample exceedance curve (which provides the rate of observing various levels of

travel-time delays on the aggregated transportation network) is shown in Figure 7.5. The

loss function $L(\mathbf{x})$ used in this case is the travel-time delay induced by ground-motion map $\mathbf{x}$. (The structural damage to the bridges increases the free-flow travel times in the roads,

and increases the overall travel time in the network.) The network delays are computed

using the static user-equilibrium framework [Frank and Wolfe, 1956]. This study intends

to obtain a pointwise CI for the exceedance rates of losses.

7.4.4 Confidence intervals using bootstrap

The confidence intervals (CI) for the risk estimates can be obtained by repeating the entire

risk assessment process several times in order to obtain multiple exceedance curves, and

by estimating the CIs as the quantiles of these exceedance curves. In other words, this

procedure involves repeating the IS and the K-means clustering procedures multiple times

to obtain multiple catalogs of 150 ground-motion maps each. Each catalog is used to obtain

one exceedance curve, and the CIs are estimated as the quantiles of this set of exceedance


curves.

Applying IS multiple times can be computationally inefficient; therefore, the current work uses bootstrap resampling to simplify this procedure. For simplicity, denote the collection of the original set of importance sampled maps, $(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$, as $\mathbf{x}$. The first step in the procedure is to obtain B bootstrap realizations of $\mathbf{x}$ (denoted $\mathbf{x}^*_b$ for $b = 1, \ldots, B$). A bootstrap realization of $\mathbf{x}$ is a set of maps sampled with replacement from $(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$ [Efron and Tibshirani, 1997]. (In other words, the sets $\mathbf{x}^*_b$ are obtained by bootstrapping the original set, rather than by resampling using IS.) The second step is to cluster $\mathbf{x}^*_b$ into 150 clusters, pick one map from each cluster, and compute $\theta^*_{(b)}(l) = P\left[L(\mathbf{x}^*_{(b)}) > l\right]$ (where $\mathbf{x}^*_{(b)}$ denotes the 150 maps obtained after clustering $\mathbf{x}^*_b$ and selecting one map from each cluster), for all b and all l values of interest. The collection of $\theta^*_{(b)}(l)$ values at all values of l denotes the probability of exceedance curve obtained using the bootstrapped and clustered set of ground-motion maps $\mathbf{x}^*_{(b)}$. The point-wise bootstrap confidence interval is then estimated as the quantiles of the replicates (i.e., the $\theta^*_{(b)}(l)$ values) for each value of l [Davison and Hinkley, 1997]. In essence, this procedure involves repeating (using bootstrap) the simulation procedure several times, and obtaining the confidence intervals using quantiles of the collection of exceedance curves obtained.
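The resampling and quantile steps can be sketched in Python as follows. This is a simplified illustration rather than the procedure used in this work: the K-means clustering of each bootstrap set is omitted, the weights of the resampled maps are simply renormalized to sum to one (an assumption made here for brevity), and the losses are synthetic.

```python
import numpy as np

def weighted_exceedance(losses, weights, thresholds):
    """Probability of exceedance at each threshold (weighted indicator sum)."""
    indicator = (losses[None, :] > thresholds[:, None]).astype(float)
    return indicator @ weights

def bootstrap_pointwise_ci(losses, weights, thresholds,
                           n_boot=1000, level=0.95, seed=0):
    """Point-wise percentile CI from n_boot bootstrapped exceedance curves."""
    rng = np.random.default_rng(seed)
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    thresholds = np.atleast_1d(np.asarray(thresholds, dtype=float))
    m = losses.size
    curves = np.empty((n_boot, thresholds.size))
    for b in range(n_boot):
        idx = rng.integers(0, m, size=m)        # sample m maps with replacement
        w = weights[idx] / weights[idx].sum()   # assumed renormalization of weights
        curves[b] = weighted_exceedance(losses[idx], w, thresholds)
    alpha = 1.0 - level
    lower = np.quantile(curves, alpha / 2.0, axis=0)
    upper = np.quantile(curves, 1.0 - alpha / 2.0, axis=0)
    return lower, upper, curves

# Synthetic stand-in for the losses of m = 300 sampled maps with equal weights.
rng = np.random.default_rng(1)
sample_losses = rng.lognormal(mean=1.0, sigma=1.0, size=300)
sample_weights = np.full(300, 1.0 / 300)
levels = np.array([1.0, 5.0, 20.0])
lower, upper, curves = bootstrap_pointwise_ci(
    sample_losses, sample_weights, levels, n_boot=200)
```

Each row of `curves` plays the role of one bootstrapped exceedance curve, and the quantiles are taken separately at each loss level, which is what makes the interval point-wise.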

The biggest hurdle in the above procedure is the computation of $P\left[L(\mathbf{x}^*_{(b)}) > l\right]$ B times, given that it is computationally intensive to estimate this even once (which is the reason why the importance sampling and the clustering are used in the first place). This is not the case for the aggregated network used in this study, but is certainly true for real-life networks. Hence, an approximate loss function, obtained using a non-parametric regression between the lifeline loss (L) and the ground-motion intensities ($\mathbf{x}^*_b$), is used in place of the exact loss function for evaluating the B values of $P\left[L(\mathbf{x}^*_{(b)}) > l\right]$. The procedure used for obtaining the approximate loss function is described in the next section.


7.4.5 Approximate loss estimation using non-parametric regression

Application of MART to loss estimation

Multiple additive regression trees (MART) is a methodology for predictive data mining (regression and classification). For a set of input ground-motion maps $\mathbf{x}$ drawn from the collection of sampled maps and the corresponding loss values L, the goal is to find a function $F(\mathbf{x})$ that maps $\mathbf{x}$ to L, such that over the joint distribution of all input-loss pairs, the expected value of the squared prediction error is minimized. MART is a gradient boosting algorithm [Friedman, 1999] that expresses this function F as an additive expansion of the form

$$\hat{L} = F(\mathbf{x}) = \sum_{p=0}^{P} \beta_p\, h(\mathbf{x}; \mathbf{a}_p) \qquad (7.3)$$

where $\hat{L}$ denotes the predicted loss value and the functions $h(\mathbf{x}; \mathbf{a}_p)$, called ‘base learners’, are functions of $\mathbf{x}$ with parameters $\mathbf{a}_p$. In the case of MART, the base learners are regression trees [Breiman et al., 1983]. It is advantageous to use MART over other regression techniques for approximating the loss function for the following reasons: (a) there are considerably more input variables (1,125) than data points (150), so classical (unregularized) regression is infeasible and some form of regularized regression would be required, (b) using a non-parametric model allows for quicker model fitting, (c) MART is capable of modeling highly nonlinear behavior, and (d) MART is robust to moderate to heavy contamination by bad measurements (outliers) of the predictors and/or the responses, to missing values, and to the inclusion of potentially large numbers of irrelevant predictor variables that have little or no effect on the response [Friedman, 2002].
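The additive expansion in Equation 7.3 can be illustrated with a deliberately small least-squares boosting sketch in Python. This is not the MART implementation used in this work: the base learners here are single-split trees (stumps), the coefficient $\beta_p$ is taken to be a fixed shrinkage factor, and the data are synthetic. All function names and parameter values are assumptions made for illustration.

```python
import numpy as np

def fit_stump(X, r):
    """Single-split regression tree minimizing squared error on residuals r."""
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2.0:  # midpoints of distinct values
            left = X[:, j] <= t
            c_left, c_right = r[left].mean(), r[~left].mean()
            sse = ((r[left] - c_left) ** 2).sum() + ((r[~left] - c_right) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, c_left, c_right)
    return best[1:]  # parameters a_p: feature, threshold, two leaf values

def predict_stump(stump, X):
    j, t, c_left, c_right = stump
    return np.where(X[:, j] <= t, c_left, c_right)

def fit_boosted_stumps(X, y, n_stages=30, shrinkage=0.3):
    f0 = y.mean()                     # p = 0 term: constant prediction
    pred = np.full(y.shape, f0)
    stumps = []
    for _ in range(n_stages):
        residual = y - pred           # negative gradient of the squared-error loss
        stump = fit_stump(X, residual)
        pred = pred + shrinkage * predict_stump(stump, X)
        stumps.append(stump)
    return f0, shrinkage, stumps

def predict_boosted(model, X):
    f0, shrinkage, stumps = model
    return f0 + shrinkage * sum(predict_stump(s, X) for s in stumps)

# Synthetic nonlinear response (not lifeline data).
rng = np.random.default_rng(0)
X = rng.uniform(size=(80, 5))
y = 10.0 * (X[:, 0] > 0.5) + 5.0 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=80)
model = fit_boosted_stumps(X, y)
mse_constant = np.mean((y - y.mean()) ** 2)
mse_boosted = np.mean((y - predict_boosted(model, X)) ** 2)
```

Each stage fits a base learner to the current residuals, so the fitted function is a sum of simple trees, which is the structure that allows the model to capture nonlinear behavior and to ignore irrelevant predictors.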

The MART prediction model is developed based on the 150 intensity maps obtained using the IS and the K-means approaches. Figure 7.6a compares the predicted losses with the exact losses for a cross-validation set of maps. The cross-validation set used to assess the model is chosen to be different from the training set in order to obtain an unbiased estimate of the accuracy of the model. The overall prediction accuracy is quite reasonable, but the predictions show small biases. The plot of residuals (computed as the predicted delay $\hat{L}$ minus the exact delay $L$) versus predicted loss values (Figure 7.6b) shows that small losses are slightly over-predicted, while large losses are substantially under-predicted. To avoid degrading the prediction accuracy, a bias correction needs to be applied to the predictions from the MART model. The next subsection describes the bias correction procedure used in the current work.

Figure 7.6: (a) Predicted vs. exact delay values (b) Prediction residuals.

Bias correction using LOESS

The bias correction procedure involves estimating the residual (bias) as a function of the

predicted delay, and subtracting it from the predicted value. The residual is fit as a function

of the predicted delay using locally weighted scatterplot smoothing (LOESS) [Efron and

Tibshirani, 1997], as shown in Figure 7.7a. As expected from previous comparisons of

exact and predicted losses, the residual is positive for small loss values and negative for

large loss values, and the LOESS fit captures this effect well. The corrected loss predictions

are obtained by subtracting out the residual (provided by LOESS) from the MART loss

prediction. A comparison of these corrected predictions against the exact values is shown

in Figure 7.7b. The figure shows a significantly better match between the exact and the

predicted values. For further validation, the loss exceedance curves are estimated using the

exact and the approximate loss functions (after bias correction), and are shown in Figure

7.8. The figure shows a very good match between the two curves illustrating the accuracy

of the loss prediction model developed.
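The bias correction can be sketched in Python using a basic local linear smoother with tricube weights, which is one common form of LOESS. The smoother below is a simplified stand-in written for illustration (no robustness iterations), the span `frac` is an assumed value, and the predicted and exact values are synthetic, constructed so that small losses are over-predicted and large losses are under-predicted.

```python
import numpy as np

def loess_fit(x_train, y_train, x_eval, frac=0.5):
    """Local linear regression with tricube weights, evaluated at x_eval."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    n = x_train.size
    k = max(2, int(np.ceil(frac * n)))       # points used in each local fit
    fitted = np.empty(x_eval.size)
    for idx, x0 in enumerate(x_eval):
        dist = np.abs(x_train - x0)
        h = max(np.sort(dist)[k - 1], 1e-12)  # bandwidth: k-th nearest distance
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube weights
        # weighted least-squares line centered at x0; its intercept is the fit
        A = np.column_stack([np.ones(n), x_train - x0])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y_train * sw, rcond=None)
        fitted[idx] = coef[0]
    return fitted

# Synthetic predictions with a loss-dependent bias (not results from this study).
rng = np.random.default_rng(0)
exact = rng.uniform(0.0, 100.0, size=120)
predicted = 5.0 + 0.8 * exact + rng.normal(scale=2.0, size=120)

residual = predicted - exact                  # bias: predicted minus exact
bias = loess_fit(predicted, residual, predicted)
corrected = predicted - bias                  # subtract the estimated bias

mae_before = np.mean(np.abs(predicted - exact))
mae_after = np.mean(np.abs(corrected - exact))
```

In this sketch the smoother is evaluated at the same points used to fit it; in practice the fitted bias function would be applied to predictions for new maps.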


Figure 7.7: (a) A LOESS fit to the prediction residuals (b) Predicted and exact delay values after bias correction.

Figure 7.8: Two sample exceedance curves obtained using the exact and the approximate loss functions (after bias correction).

Study of residuals

The predictions from MART and LOESS are not exact, as evidenced by the scatter around

the predictions. Figure 7.9a shows the plot of residuals (i.e., observed value - predicted

value) versus the predicted loss values. While using the predictive model, it is important to

appropriately account for this variability, particularly while estimating confidence intervals,

since the smoothed predictions (obtained when the residuals are ignored) will result in an

underestimation of the variance and the width of the CI of the risk estimates.

Figure 7.9a shows that the residuals are heteroscedastic (i.e., the standard deviation

of the residuals varies with the predicted value), and that the standard deviation of the

residuals increases linearly with the predicted loss. In order to model these residuals, they

are first normalized by the predicted losses (i.e., the residuals are divided by the predicted

losses) and these normalized residuals shown in Figure 7.9b are seen to be homoscedastic.

A normal Q-Q plot of these normalized residuals, shown in Figure 7.10, indicates that the

residuals can be reasonably assumed to follow a normal distribution (since the deviation

from the 45° straight line is negligible). Further, the standard deviation of the normalized residuals is estimated to be 0.27 and hence, the residuals are modeled as follows:

$$\varepsilon \sim N\left(0,\; 0.27\,F(\mathbf{x})\right) \qquad (7.4)$$

where ε denotes the residual, $N(0, 0.27F(\mathbf{x}))$ denotes the normal distribution with mean 0 and standard deviation $0.27F(\mathbf{x})$, and $F(\mathbf{x})$ denotes the predicted loss for ground-motion map $\mathbf{x}$.

Figure 7.9: (a) Residuals from the prediction model (b) Residuals normalized (divided) by the predicted delays.

Figure 7.10: Normal Q-Q plot of the residuals.

Summary of the loss prediction procedure

For a given ground-motion map $\mathbf{x}$, the approximate loss is evaluated as follows:

(a) Use MART to obtain $F_1(\mathbf{x})$, which is a biased estimate of the loss.

(b) Estimate the bias using the LOESS fit: $B(F_1(\mathbf{x}))$.

(c) Obtain the bias-corrected prediction: $F_2(\mathbf{x}) = F_1(\mathbf{x}) - B(F_1(\mathbf{x}))$.

(d) Simulate a residual (e) from the univariate normal distribution $N(0, 0.27F_2(\mathbf{x}))$.

(e) Obtain the final estimate of the loss: $F(\mathbf{x}) = F_2(\mathbf{x}) + e$.
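The five steps above can be sketched as a single Python function. The callables `mart_predict` and `loess_bias` are hypothetical stand-ins for the fitted MART model and the fitted LOESS bias function; the toy versions defined below exist only so that the example runs. Taking the absolute value of the prediction when setting the standard deviation is an assumption added here to keep the scale non-negative.

```python
import numpy as np

def approximate_loss(x, mart_predict, loess_bias, rng, cov=0.27):
    """Steps (a) to (e): MART prediction, bias correction, simulated residual."""
    f1 = mart_predict(x)                   # (a) biased MART estimate of the loss
    bias = loess_bias(f1)                  # (b) bias estimated from the LOESS fit
    f2 = f1 - bias                         # (c) bias-corrected prediction
    e = rng.normal(0.0, cov * abs(f2))     # (d) residual, std. dev. 0.27 * F2
    return f2 + e                          # (e) final loss estimate

# Hypothetical stand-ins for the fitted models (illustration only).
def toy_mart_predict(x):
    return 2.0 * float(np.mean(x))

def toy_loess_bias(f1):
    return 0.1 * f1 - 1.0

rng = np.random.default_rng(0)
x_map = np.array([0.2, 0.4, 0.9])          # hypothetical intensities at three sites
f2_expected = toy_mart_predict(x_map) - toy_loess_bias(toy_mart_predict(x_map))

no_noise = approximate_loss(x_map, toy_mart_predict, toy_loess_bias, rng, cov=0.0)
draws = np.array([approximate_loss(x_map, toy_mart_predict, toy_loess_bias, rng)
                  for _ in range(20000)])
```

Repeated calls for the same map return different values because of step (d), which is what carries the prediction uncertainty into the confidence intervals.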

Discussion: Importance of data selection for training the MART model

This section illustrates the reason behind obtaining a reasonably good MART fit despite us-

ing only 150 training samples. The good fit is primarily because the importance sampling

and the K-means clustering procedures that are used for selecting the training catalog of

150 maps select highly dissimilar maps that cover almost all the intensity values (even rare

intensities) of interest to the decision maker. In other words, the 150 maps are fairly rep-

resentative of the ground-motion hazard in the region. This does not happen, for instance,

when the maps are selected using random MCS. In order to illustrate this, 150 maps were

sampled using random MCS and are used to fit a MART model. The comparison between

the exact and the predicted losses from this new MART model for the cross-validation set

used earlier in Section 7.4.5 is shown in Figure 7.11. The random MCS method samples many ground-motion maps with small but frequently observed ground-motion intensities that correspond to very small travel-time delays. As a result, the model performs very poorly while predicting the losses due to the large-intensity maps that are present in the cross-validation set but largely absent from the training set.


Figure 7.11: MART model fitted using 150 MCS maps.

7.4.6 Bootstrap confidence intervals estimated using the exact and the approximate loss functions

Results and discussion

Bootstrap confidence intervals are estimated for the travel-time delay exceedance curve using the procedure described in Section 7.4.4. In summary, the maps obtained using importance sampling (12,500 importance sampled maps were used in Chapter 6) are first bootstrapped (sampled with replacement) to obtain 1000 sets of 12,500 maps each (denoted as $\mathbf{x}^*_b$ in Section 7.4.4). Each of these 1000 sets is then segmented into 150 clusters using the K-means clustering procedure. 150 maps are drawn (one from each cluster) from each set (denoted as $\mathbf{x}^*_{(b)}$ in Section 7.4.4), and are used for obtaining the exceedance curve for that set (denoted as $\theta^*_{(b)}(l) = P\left[L(\mathbf{x}^*_{(b)}) > l\right]$ in Section 7.4.4). Figure 7.13a and b show the 1000 exceedance curves obtained using the exact and approximate (MART+LOESS) loss functions respectively. The point-wise CIs for these curves are estimated as the quantiles of the 1000 loss curves at each loss level. This procedure is summarized in Figure 7.12.

Figure 7.14a shows the CIs obtained using the exact and the approximate loss functions. The two CIs do not match perfectly, but are reasonably close to one another. Given that it will be computationally almost impossible to estimate the CI using the exact loss function in practice, the CI obtained using MART+LOESS is a reasonable substitute.

Figure 7.12: Methodology for estimating bootstrap confidence intervals for the loss curves.

Figure 7.13: 1000 bootstrapped exceedance curves obtained using the (a) exact loss function (b) approximate loss function.

Figure 7.14: Bootstrap confidence intervals.

It was mentioned earlier that it is important to model the residuals in the MART +

LOESS predictions in order to obtain accurate CIs. This is illustrated by Figure 7.14b,

which shows the CIs obtained using the exact and the approximate loss functions, but

without accounting for the residuals. The CI estimated using the predicted losses is consid-

erably narrower than the CI estimated using the exact losses, indicating that the ‘smoothed’

prediction results in an underestimation of the variance of the risk estimates. (The addi-

tional jaggedness seen is due to the use of only 200 bootstrap samples while estimating the

approximate CI.)

Sensitivity to the number of bootstrap samples

Let B denote the number of bootstrap samples used for estimating the CI. Efron and Tibshirani [1997] recommend a B value of 1000 for obtaining a robust CI. Figure 7.15 shows the CIs obtained using 20, 200 and 1000 bootstrap samples. The CIs obtained using 20 and 200 samples are highly jagged. In addition (not illustrated by Figure 7.15), the CIs obtained using only 20 or 200 samples vary from one computation to another. Overall, the results support the use of a B value of 1000 for computing the CI.

Figure 7.15: Bootstrap confidence intervals.

Balanced bootstrap confidence interval

One of the techniques that can be adopted to reduce the number of bootstrap samples is

the balanced bootstrap method [Davison et al., 1986]. This method involves simulating

bootstrap samples such that each sample observation is used equally often. It has been

seen in past works that balanced bootstrap improves on ordinary uniform resampling when

employed to estimate distribution functions or quantiles [Hall, 2005], and hence is relevant

while estimating CIs.

In this study, 200 balanced bootstrap samples were generated (each with 12,500 maps),

and are used for estimating 200 exceedance curves. The point-wise confidence interval

obtained from these curves is shown in Figure 7.16. It can be seen that this CI is less

jagged than that obtained using 200 uniform bootstrap samples (though still not as good as

the CI obtained using 1000 uniform samples).
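One simple way to generate balanced bootstrap samples is to concatenate B copies of the observation indices, permute the concatenated list, and cut it into B sets. The Python sketch below illustrates this construction; the function name and the small values of m and B are chosen for illustration only.

```python
import numpy as np

def balanced_bootstrap_indices(m, n_boot, rng):
    """Return an (n_boot, m) array of indices in which every observation
    appears exactly n_boot times across all bootstrap samples."""
    pool = np.tile(np.arange(m), n_boot)  # n_boot copies of each index
    rng.shuffle(pool)                     # random permutation of the pool
    return pool.reshape(n_boot, m)        # one bootstrap sample per row

rng = np.random.default_rng(0)
indices = balanced_bootstrap_indices(m=50, n_boot=200, rng=rng)
counts = np.bincount(indices.ravel(), minlength=50)
```

Within any single row an observation may still appear several times or not at all; the balance holds only across the full set of B samples.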


Figure 7.16: Balanced bootstrap confidence intervals.

7.5 Conclusions

The current study focused on the computationally demanding task of estimating the confidence interval for lifeline risk estimates. Estimating the confidence intervals is computationally intensive because it requires repeated risk calculations (in order to estimate a variance for the risk estimates), each of which involves numerous lifeline performance evaluations. In order to reduce the computational demand, the stochastically-representative catalog of ground-motion maps generated in Chapter 6 using importance sampling and K-means clustering was used in conjunction with a statistical learning technique called multiple additive regression trees (MART) to develop an approximate relationship between

the lifeline performance and the ground-motion intensities during an earthquake. Predic-

tion biases from the model were modeled using Locally Weighted Scatterplot Smoothing

(LOESS), and were subtracted out from the predictions to obtain unbiased performance

estimates. The lifeline performances predicted by the combination of MART and LOESS

were used in place of the exact lifeline performances (the evaluation of which is intensive)

to expedite the computation of the confidence intervals. It was seen that the exceedance

curves and their confidence intervals obtained using the exact and the approximate perfor-

mance measures match well.

Chapter 8

Seismic risk assessment of spatially-distributed systems using ground-motion models fitted considering spatial correlation

N. Jayaram and J.W. Baker (2010). Considering spatial correlation in mixed-effects re-

gression, and impact on ground-motion models, Bulletin of the Seismological Society of

America (in review).

8.1 Abstract

Ground-motion models are commonly used in earthquake engineering to predict the prob-

ability distribution of the ground-motion intensity at a given site due to a particular earth-

quake event. These models are often built using regression on observed ground-motion

intensities, and are fitted using either the one-stage mixed-effects regression algorithm pro-

posed by Abrahamson and Youngs [1992] or the two-stage algorithm of Joyner and Boore

[1993]. In their current forms, these algorithms ignore the spatial correlation between intra-

event residuals. This chapter theoretically motivates the importance of considering spatial

correlation while fitting ground-motion models and proposes an extension to the Abraham-

son and Youngs [1992] algorithm that allows the consideration of spatial correlation.

By refitting the Campbell and Bozorgnia [2008] ground-motion model using the mixed-

effects regression algorithm considering spatial correlation, it is seen that the variance of

the total residuals and the ground-motion model coefficients used for predicting the median

ground-motion intensity are not significantly different from the published values even after

the incorporation of spatial correlation. It is, however, seen that there is an increase in the variance of the intra-event residual and a significant decrease in the variance of the inter-event residual. These changes have implications for risk assessments of spatially-distributed systems, because a smaller inter-event residual variance implies a lower likelihood of observing large ground-motion intensities at all sites in a region. An example risk

assessment is performed on a hypothetical portfolio of buildings to demonstrate that ne-

glecting the proposed refinement causes an overestimation of the recurrence rates of large

losses.

8.2 Introduction

Ground-motion models are commonly used in earthquake engineering to predict the prob-

ability distribution of the ground-motion intensity at a given site due to a particular earth-

quake event. Typically, a ground-motion model takes the following form:

$$\ln(Y_{ij}) = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon_{ij} + \eta_i \qquad (8.1)$$

where $Y_{ij}$ denotes the ground-motion intensity parameter of interest (e.g., the spectral acceleration at period T) at site j during earthquake i; $f(\mathbf{P}_{ij}, \boldsymbol{\theta})$ denotes the ground-motion prediction function with predictive parameters $\mathbf{P}_{ij}$ (e.g., magnitude, distance of source from site, site condition) and coefficient set $\boldsymbol{\theta}$; $\varepsilon_{ij}$ denotes the intra-event residual, which is a zero mean random variable with standard deviation $\sigma_{ij}$; $\eta_i$ denotes the inter-event residual, which is a random variable with zero mean and standard deviation $\tau_{ij}$. The rest of this chapter assumes for simplicity that the residuals have a constant σ (i.e., $\sigma_{ij} = \sigma$) and τ (i.e., $\tau_{ij} = \tau$) for any given ground-motion intensity parameter (i.e., the residuals are


homoscedastic). This assumption is not true in some modern models [e.g., Abrahamson

and Silva, 2008], in which case, the concepts remain the same, but some of the equations

are no longer directly applicable.

Ground-motion models are primarily fitted using two approaches: the two-stage regres-

sion algorithm of Joyner and Boore [1993] [e.g., Boore and Atkinson, 2008] and the one-

stage mixed-effects model regression algorithm of Abrahamson and Youngs [1992] [e.g.,

Abrahamson and Silva, 2008, Campbell and Bozorgnia, 2008, Chiou and Youngs, 2008].

Joyner and Boore [1993] provide a detailed comparison of these two algorithms. Both these

algorithms, in their current forms, assume that the intra-event residuals are independent of

each other. The intra-event residuals, however, are known to be spatially correlated [Boore

et al., 2003, Wang and Takada, 2005, Goda and Hong, 2008, Jayaram and Baker, 2009a].

Recently, Hong et al. [2009] investigated the influence of including spatial correlation in

the regression analysis on the ground-motion models fitted using the two-stage regression

algorithm and a one-stage algorithm of Joyner and Boore [1993]. They concluded that

the influence of considering spatial correlation on the estimated ground-motion models is

negligible based on insignificant changes to the coefficient set $\boldsymbol{\theta}$. Fitting ground-motion

models considering correlation does, however, change the variances of the inter-event and

the intra-event residuals (observed by Hong et al. [2009] themselves). This chapter pro-

vides a theoretical basis for such changes to the variance terms, and also discusses the

impact of these changes on the estimated seismic risk of spatially-distributed systems. Fur-

ther, a modified algorithm based on that of Abrahamson and Youngs [1992] is developed

that accounts for the spatial correlation in the mixed-effects regression. This modified al-

gorithm is used to refit the Campbell and Bozorgnia [2008] ground-motion model in order

to illustrate the impact of incorporating spatial correlation.

8.2.1 Current regression algorithm

Brillinger and Preisler [1984a,b] first proposed regressing a ground-motion model as a

fixed-effects model. In this approach, the ground-motion model takes the following form:

$$\ln(Y_{ij}) = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon^{(t)}_{ij} \qquad (8.2)$$

where $\varepsilon^{(t)}_{ij}$ denotes the total residual term at site j during earthquake i.

Abrahamson and Youngs [1992] (henceforth referred to as AY92) subsequently devel-

oped a more stable algorithm for the regression by treating the ground-motion model as

a mixed-effects model. The mixed-effects model differs from the fixed-effects model in

its consideration of the error term as being the sum of an intra-event error term and an

inter-event error term (Equation 8.1). The inter-event term helps partially account for the

correlation between the ground-motion intensities recorded during any particular earth-

quake. The AY92 algorithm uses a combination of a fixed-effects regression algorithm and

a likelihood maximization approach, and is described below in more detail.

In the first step of the algorithm, it is assumed that the random-effects terms $\eta_1, \eta_2, \ldots, \eta_M$ equal zero, in which case Equation 8.1 simplifies to $\ln(Y_{ij}) = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon_{ij}$. The coefficient set $\boldsymbol{\theta}$ is then estimated based on the observed $Y_{ij}$'s using a fixed-effects regression

algorithm. In the next step, the standard deviations σ (for the intra-event residuals) and

τ (for the inter-event residuals) are computed using the likelihood maximization approach

described below.

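The total residuals (i.e., the sum of the inter-event and the intra-event residuals), denoted $\varepsilon^{(t)}_{ij}$, can be computed using the $\boldsymbol{\theta}$ estimated in the previous step as follows:

$$\varepsilon^{(t)}_{ij} = \varepsilon_{ij} + \eta_i = \ln(Y_{ij}) - f(\mathbf{P}_{ij}, \boldsymbol{\theta}) \qquad (8.3)$$

It is known that the total residuals follow a multivariate normal distribution [Jayaram and Baker, 2008], and hence, the likelihood ($L_1$) of having observed the set of total residuals $\boldsymbol{\varepsilon}^{(t)} = \left(\varepsilon^{(t)}_{ij}\right)$ can be estimated as follows:

$$\ln(L_1) = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln|\mathbf{C}| - \frac{1}{2}\left(\boldsymbol{\varepsilon}^{(t)}\right)' \mathbf{C}^{-1} \left(\boldsymbol{\varepsilon}^{(t)}\right) \qquad (8.4)$$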

where N is the total number of data points, $\mathbf{C}$ is the covariance matrix of the total residuals and $\left(\boldsymbol{\varepsilon}^{(t)}\right)'$ denotes the transpose of $\boldsymbol{\varepsilon}^{(t)}$. While estimating the model coefficients, AY92 assume that the intra-event residuals are independent of each other and of the inter-event residuals. Hence, the covariance matrix $\mathbf{C}$ can be written as follows:

$$\mathbf{C} = \sigma^2 \mathbf{I}_N + \tau^2 \sum_{i=1}^{M}{}^{+}\, \mathbf{1}_{n_i,n_i} \qquad (8.5)$$

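where $\mathbf{I}_N$ is the identity matrix of size N by N, $\mathbf{1}_{n_i,n_i}$ is a matrix of ones of size $n_i$ by $n_i$, $\Sigma^{+}$ indicates a direct sum operation (using the notation of AY92), M is the number of earthquake events and $n_i$ is the number of recordings for the ith event. The matrix $\mathbf{C}$ can be expanded as follows:

$$\mathbf{C} = \begin{bmatrix}
\sigma^2 \mathbf{I}_{n_1} + \tau^2 \mathbf{1}_{n_1,n_1} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \sigma^2 \mathbf{I}_{n_2} + \tau^2 \mathbf{1}_{n_2,n_2} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \sigma^2 \mathbf{I}_{n_M} + \tau^2 \mathbf{1}_{n_M,n_M}
\end{bmatrix} \qquad (8.6)$$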
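The block-diagonal structure of Equation 8.6 and the log-likelihood of Equation 8.4 can be written out directly in Python. The sketch below is illustrative: the function names, the number of recordings per event and the values of σ and τ are hypothetical, and a dense matrix is used for clarity even though the block structure would allow a cheaper event-by-event calculation.

```python
import numpy as np

def total_residual_covariance(n_records, sigma, tau):
    """Covariance matrix of Equations 8.5 and 8.6 for independent
    intra-event residuals; n_records lists n_i for each earthquake."""
    N = int(sum(n_records))
    C = sigma ** 2 * np.eye(N)
    start = 0
    for n_i in n_records:
        # tau^2 times a block of ones for the recordings of one earthquake
        C[start:start + n_i, start:start + n_i] += tau ** 2
        start += n_i
    return C

def log_likelihood(total_residuals, C):
    """Multivariate normal log-likelihood ln(L1) of Equation 8.4."""
    eps = np.asarray(total_residuals, dtype=float)
    N = eps.size
    _, log_det = np.linalg.slogdet(C)
    quad = eps @ np.linalg.solve(C, eps)
    return -0.5 * N * np.log(2.0 * np.pi) - 0.5 * log_det - 0.5 * quad

# Two hypothetical earthquakes with 2 and 3 recordings.
C = total_residual_covariance([2, 3], sigma=0.5, tau=0.3)
ll = log_likelihood([0.1, -0.2, 0.3, 0.0, -0.1], C)
```

Entries within an event's block equal τ² off the diagonal and σ² + τ² on the diagonal, while entries linking different earthquakes are zero.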

The maximum likelihood estimates of σ and τ are those that maximize the likelihood function $L_1$, and are obtained using numerical optimization. Now, for given $\boldsymbol{\theta}$ and the maximum likelihood estimates of σ and τ, the random-effects term $\eta_i$ is estimated using the maximum likelihood approach as well. The maximum likelihood estimate of $\eta_i$ is obtained as follows [Abrahamson and Youngs, 1992]:

$$\eta_i = \frac{\tau^2 \sum_{j=1}^{n_i} \varepsilon^{(t)}_{ij}}{n_i \tau^2 + \sigma^2} \qquad (8.7)$$

Finally, using the estimated value of $\eta_i$, a new set of coefficients $\boldsymbol{\theta}$ is obtained using a fixed-effects algorithm for $\ln(Y_{ij}) - \eta_i$ (i.e., considering $\ln(Y_{ij}) - \eta_i = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon_{ij}$). The new set $\boldsymbol{\theta}$ is then used to reestimate σ, τ and $\boldsymbol{\eta}$, and this iterative algorithm is continued until the coefficient estimates converge.

In summary, the steps of the mixed-effects algorithm used by AY92 are as follows:

1. Estimate the model coefficients $\boldsymbol{\theta}$ using a fixed-effects regression algorithm assuming $\boldsymbol{\eta}$ equals 0.

2. Using $\boldsymbol{\theta}$, solve for the variances of the residuals, $\sigma^2$ and $\tau^2$, by maximizing the likelihood function described in Equation 8.4.

3. Given $\boldsymbol{\theta}$, $\sigma^2$ and $\tau^2$, estimate $\eta_i$ using Equation 8.7.

4. Given $\eta_i$, estimate new coefficients ($\boldsymbol{\theta}$) using a fixed-effects regression algorithm for $\ln(Y_{ij}) - \eta_i$.

5. Repeat steps 2, 3 and 4 until the likelihood in step 2 is maximized and the estimates for the coefficient set converge.
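The steps above can be sketched in Python for the special case of a prediction function that is linear in the coefficients. This is a simplified illustration and not the AY92 implementation: the fixed-effects fit is ordinary least squares, the likelihood maximization in step 2 is replaced by a coarse grid search (an assumed stand-in for numerical optimization), a fixed number of iterations replaces the convergence check, and the data are synthetic. The likelihood is evaluated event by event using the closed-form determinant and inverse of each block $\sigma^2 \mathbf{I} + \tau^2 \mathbf{1}$.

```python
import numpy as np

def block_log_likelihood(n_i, sum_e, sum_e2, sigma, tau):
    """ln(L1) of Equation 8.4 for the block-diagonal C of Equation 8.5."""
    s2, t2 = sigma ** 2, tau ** 2
    log_det = (n_i - 1) * np.log(s2) + np.log(s2 + n_i * t2)
    quad = (sum_e2 - t2 * sum_e ** 2 / (s2 + n_i * t2)) / s2
    return -0.5 * np.sum(n_i * np.log(2.0 * np.pi) + log_det + quad)

def fit_mixed_effects(X, y, event_index, n_iter=10):
    """event_index[k] is the integer (0..M-1) earthquake label of record k."""
    n_i = np.bincount(event_index)
    eta = np.zeros(n_i.size)                  # step 1: assume eta equals 0
    grid = np.linspace(0.05, 1.5, 30)         # candidate values of sigma and tau
    for _ in range(n_iter):
        # steps 1 and 4: fixed-effects (least-squares) fit to ln(Y) - eta
        theta, *_ = np.linalg.lstsq(X, y - eta[event_index], rcond=None)
        resid = y - X @ theta                 # total residuals, Equation 8.3
        sum_e = np.bincount(event_index, weights=resid)
        sum_e2 = np.bincount(event_index, weights=resid ** 2)
        # step 2: maximize the likelihood over the (sigma, tau) grid
        _, sigma, tau = max(
            (block_log_likelihood(n_i, sum_e, sum_e2, s, t), s, t)
            for s in grid for t in grid)
        # step 3: inter-event residuals from Equation 8.7
        eta = tau ** 2 * sum_e / (n_i * tau ** 2 + sigma ** 2)
    return theta, sigma, tau, eta

# Synthetic data: 30 earthquakes, 8 recordings each, ln(Y) = 1.0 + 0.5 x + eta + eps.
rng = np.random.default_rng(0)
M, n_rec = 30, 8
event_index = np.repeat(np.arange(M), n_rec)
x = rng.normal(size=M * n_rec)
X = np.column_stack([np.ones(M * n_rec), x])
eta_true = rng.normal(scale=0.4, size=M)
y = (X @ np.array([1.0, 0.5]) + eta_true[event_index]
     + rng.normal(scale=0.6, size=M * n_rec))

theta, sigma, tau, eta = fit_mixed_effects(X, y, event_index)
```

For a nonlinear prediction function, the least-squares call would be replaced by a nonlinear fixed-effects regression, and the grid search by a proper optimizer.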

One drawback of this algorithm is the assumption in Equation 8.5 that the intra-event

residuals are independent of each other. It is known that the intra-event residuals are spa-

tially correlated, with the correlation decreasing with increasing separation distance be-

tween the residuals [e.g., Jayaram and Baker, 2009a]. Before addressing that issue, the

need to account for the spatial correlation in the regression algorithm is illustrated in the

next section.

8.2.2 Should spatial correlation be considered in the regression algorithm?

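Consider the hypothetical case where the correlation between the intra-event residuals at any two different sites is a constant equal to ρ. In this case, the covariance matrix ($\mathbf{C}$) for the total residuals ($\varepsilon^{(t)}_{ij}$) is defined by the following equations:

$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij'}\right) = \rho\sigma^2 + \tau^2 \quad \forall\; i,\ j \neq j' \qquad (8.8a)$$

$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij}\right) = \sigma^2 + \tau^2 \quad \forall\; i,\ j \qquad (8.8b)$$

$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{i'j'}\right) = 0 \quad \forall\; j,\ j',\ i \neq i' \qquad (8.8c)$$

In summary, the covariance matrix for the total residuals can be expressed as follows:

$$\mathbf{C} = (1-\rho)\sigma^2 \mathbf{I}_N + (\tau^2 + \rho\sigma^2) \sum_{i=1}^{M}{}^{+}\, \mathbf{1}_{n_i,n_i} \qquad (8.9)$$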


Denoting $\sqrt{1-\rho}\,\sigma$ by σ′ and $\sqrt{\tau^2 + \rho\sigma^2}$ by τ′, Equation 8.9 can be rewritten as

$$\mathbf{C} = \sigma'^2 \mathbf{I}_N + \tau'^2 \sum_{i=1}^{M}{}^{+}\, \mathbf{1}_{n_i,n_i} \qquad (8.10)$$

Comparing the forms of Equations 8.5 and 8.10, it can be seen that the algorithm of AY92 actually provides the estimates of σ′ and τ′ rather than σ and τ. (If spatial correlations are absent, this is correct since σ′ = σ and τ′ = τ.)

Assume for simplicity that the set of coefficients $\boldsymbol{\theta}$ is not affected by the spatial correlation (this assumption is relaxed subsequently). Hence, the ‘correct’ estimates of σ and τ can be obtained from the σ′ and τ′ provided by AY92 as follows:

$$\sigma = \frac{\sigma'}{\sqrt{1-\rho}} \qquad (8.11a)$$

$$\tau = \sqrt{\tau'^2 - \rho\sigma^2} \qquad (8.11b)$$

It is to be noted from the above discussion and Equation 8.11 that assuming indepen-

dent intra-event residuals will underestimate σ and overestimate τ . This has implications

for lifeline risk assessments since a larger τ implies a higher likelihood of observing large

ground-motion intensities throughout the region of interest. Thus, it is important to deter-

mine whether fitting the ground-motion equations while considering correlated intra-event

residuals changes the estimates of σ and τ significantly.
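A short numerical illustration of Equation 8.11 is given below in Python. The values of σ′, τ′ and ρ are hypothetical and are chosen only to show the direction of the change; the function name is likewise an assumption.

```python
import math

def corrected_std_devs(sigma_prime, tau_prime, rho):
    """Recover sigma and tau from the AY92 estimates using Equation 8.11."""
    sigma = sigma_prime / math.sqrt(1.0 - rho)       # Equation 8.11a
    tau_sq = tau_prime ** 2 - rho * sigma ** 2       # Equation 8.11b (squared)
    if tau_sq < 0.0:
        raise ValueError("rho is too large for the given sigma' and tau'")
    return sigma, math.sqrt(tau_sq)

# Hypothetical AY92 estimates and a hypothetical constant correlation.
sigma_prime, tau_prime, rho = 0.5, 0.4, 0.2
sigma, tau = corrected_std_devs(sigma_prime, tau_prime, rho)

total_var_ay92 = sigma_prime ** 2 + tau_prime ** 2
total_var_corrected = sigma ** 2 + tau ** 2
```

For these inputs σ increases from 0.5 to about 0.56 and τ decreases from 0.4 to about 0.31. The total variance σ² + τ² is unchanged, which follows algebraically from the definitions of σ′ and τ′: only the split between the intra-event and inter-event terms changes.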

8.3 Regression algorithm for mixed-effects models considering spatial correlation

This section describes an algorithm for fitting the mixed-effects model while accounting

for spatial correlation between intra-event residuals. The algorithm described here differs

from that of AY92 in the estimation of the likelihood function $L_1$ (used in step 2) and in the computation of the inter-event residual $\eta_i$ (step 3). Both these changes are necessary to

account for the spatial correlation between intra-event residuals in the regression algorithm.


8.3.1 Covariance matrix for the total residuals

The covariance matrix for the total residuals shown in Equation 8.5 is based on the assump-

tion of independence between spatially-distributed intra-event residuals. The covariance

matrix in the presence of spatial correlation is described below.

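Let $\rho(d_{jj'})$ denote the spatial correlation between intra-event residuals at two sites j and j′ as a function of $d_{jj'}$, the separation distance between j and j′. Then,

$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij'}\right) = C\left(\varepsilon_{ij} + \eta_i,\ \varepsilon_{ij'} + \eta_i\right) = \rho(d_{jj'})\sigma^2 + \tau^2 \quad \forall\; i,\ j,\ j' \qquad (8.12a)$$

$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{i'j'}\right) = 0 \quad \forall\; j,\ j',\ i \neq i' \qquad (8.12b)$$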

8.3.2 Obtaining inter-event residuals from total residuals

The maximum likelihood approach is typically used to estimate a constant but unknown

parameter from observed data. The parameter ηi that is of interest here, however, is a

random variable in itself, and hence the authors use a Bayesian framework rather than the

method of maximum likelihood to estimate ηi.

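The prior distribution of $\eta_i$ is $N(0, \tau^2)$. Conditional on the knowledge of $\eta_i$, the $\varepsilon^{(t)}_{ij}$'s marginally follow a normal distribution with mean $\eta_i$ and variance $\sigma^2$ (since $\varepsilon^{(t)}_{ij} = \varepsilon_{ij} + \eta_i$). Also, the correlation coefficient between $\varepsilon^{(t)}_{ij}$ and $\varepsilon^{(t)}_{ij'}$ conditional on $\eta_i$ is given by $\rho(d_{jj'})$. In other words, the conditional covariance matrix ($\mathbf{C}_c$) for the total residuals can be expressed as follows:

$$C_c\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij'}\right) = \rho(d_{jj'})\sigma^2 \quad \forall\; i,\ j,\ j' \qquad (8.13a)$$

$$C_c\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{i'j'}\right) = 0 \quad \forall\; j,\ j',\ i \neq i' \qquad (8.13b)$$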

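Hence the joint density of $\boldsymbol{\varepsilon}^{(t)}_i = \left[\varepsilon^{(t)}_{i1}, \varepsilon^{(t)}_{i2}, \cdots, \varepsilon^{(t)}_{in_i}\right]$ and $\eta_i$ is expressed as follows:

\[
f\left(\boldsymbol{\varepsilon}^{(t)}_i, \eta_i\right) = f\left(\boldsymbol{\varepsilon}^{(t)}_i \mid \eta_i\right) f(\eta_i)
\propto \exp\left[-\frac{1}{2}\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)' C_c^{-1} \left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)\right] \exp\left[-\frac{1}{2\tau^2}\eta_i^2\right] \tag{8.14}
\]

where $\boldsymbol{\varepsilon}^{(t)}_i = \left[\varepsilon^{(t)}_{i1}, \varepsilon^{(t)}_{i2}, \cdots, \varepsilon^{(t)}_{in_i}\right]$ is the collection of total residuals at all the sites during earthquake $i$, $f(.)$ denotes the probability density function, $\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)'$ denotes the transpose of $\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)$, and $\mathbf{1}_{n_i,1}$ denotes a column matrix of ones of length $n_i$.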

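Noting that $f(\boldsymbol{\varepsilon}^{(t)}_i, \eta_i) = f(\boldsymbol{\varepsilon}^{(t)}_i)\, f(\eta_i \mid \boldsymbol{\varepsilon}^{(t)}_i)$, one possible approach to identify the posterior distribution of $\eta_i$ given $\boldsymbol{\varepsilon}^{(t)}_i$ is to divide the joint density into a function of just $\boldsymbol{\varepsilon}^{(t)}_i$ and a function that also contains $\eta_i$. Let $Q(\boldsymbol{\varepsilon}^{(t)}_i)$ denote any generic function of only $\boldsymbol{\varepsilon}^{(t)}_i$ not containing $\eta_i$. Hence,

\[
\begin{aligned}
f\left(\boldsymbol{\varepsilon}^{(t)}_i, \eta_i\right)
&\propto \exp\left[-\frac{1}{2}\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)' C_c^{-1} \left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i \mathbf{1}_{n_i,1}\right)\right] \exp\left[-\frac{1}{2\tau^2}\eta_i^2\right] \\
&= Q\left(\boldsymbol{\varepsilon}^{(t)}_i\right) \exp\left[\frac{1}{2}\boldsymbol{\varepsilon}^{(t)\prime}_i C_c^{-1} \eta_i \mathbf{1}_{n_i,1} + \frac{1}{2}\eta_i \mathbf{1}'_{n_i,1} C_c^{-1} \boldsymbol{\varepsilon}^{(t)}_i - \frac{1}{2}\eta_i^2\, \mathbf{1}'_{n_i,1} C_c^{-1} \mathbf{1}_{n_i,1}\right] \exp\left[-\frac{1}{2\tau^2}\eta_i^2\right] \\
&= Q\left(\boldsymbol{\varepsilon}^{(t)}_i\right) \exp\left[-\frac{1}{2}\left(\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1} C_c^{-1} \mathbf{1}_{n_i,1}\right)\left(\eta_i - \frac{\mathbf{1}'_{n_i,1} C_c^{-1} \boldsymbol{\varepsilon}^{(t)}_i}{\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1} C_c^{-1} \mathbf{1}_{n_i,1}}\right)^2\right]
\end{aligned}
\tag{8.15}
\]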

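From the above equation, it can be seen that $f(\eta_i \mid \boldsymbol{\varepsilon}^{(t)}_i)$ has a normal distribution with mean $\dfrac{\mathbf{1}'_{n_i,1} C_c^{-1} \boldsymbol{\varepsilon}^{(t)}_i}{\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1} C_c^{-1} \mathbf{1}_{n_i,1}}$ and variance $\dfrac{1}{\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1} C_c^{-1} \mathbf{1}_{n_i,1}}$. If the best estimator for $\eta_i$ is to be obtained under the squared-error loss criterion, then the Bayesian estimator of $\eta_i$ equals the posterior mean [Lehmann and Casella, 2003]

\[
\eta_i = \frac{\mathbf{1}'_{n_i,1} C_c^{-1} \boldsymbol{\varepsilon}^{(t)}_i}{\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1} C_c^{-1} \mathbf{1}_{n_i,1}} \tag{8.16}
\]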


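If the spatial correlation is absent, $C_c$ is simply $\sigma^2$ times an identity matrix of size $n_i$ by $n_i$, in which case $\mathbf{1}'_{n_i,1} C_c^{-1} \mathbf{1}_{n_i,1}$ equals $n_i/\sigma^2$ and $\mathbf{1}'_{n_i,1} C_c^{-1} \boldsymbol{\varepsilon}^{(t)}_i$ equals $\sum_{j=1}^{n_i} \varepsilon^{(t)}_{ij}/\sigma^2$, and Equation 8.16 becomes identical to Equation 8.7.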
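The posterior mean in Equation 8.16 can be sketched numerically as follows. This is a minimal illustration assuming NumPy; the function name and the residual values are hypothetical. The closed form used for comparison is simply what Equation 8.16 reduces to when $C_c = \sigma^2 I$, namely $\tau^2 \sum_j \varepsilon^{(t)}_{ij} / (n_i \tau^2 + \sigma^2)$.

```python
import numpy as np

def inter_event_residual(total_resid, Cc, tau):
    """Posterior mean of eta_i given the total residuals of one earthquake (Eq. 8.16)."""
    eps = np.asarray(total_resid, dtype=float)
    ones = np.ones_like(eps)
    # Solve linear systems instead of forming the explicit inverse of Cc
    num = ones @ np.linalg.solve(Cc, eps)
    den = 1.0 / tau**2 + ones @ np.linalg.solve(Cc, ones)
    return num / den

sigma, tau = 0.654, 0.157
eps_t = np.array([0.30, 0.10, -0.05, 0.25])   # hypothetical total residuals
n = len(eps_t)

# No spatial correlation: Cc = sigma^2 * I
eta_indep = inter_event_residual(eps_t, sigma**2 * np.eye(n), tau)
closed_form = tau**2 * eps_t.sum() / (n * tau**2 + sigma**2)

# Hypothetical equicorrelated case (rho = 0.5 between every pair of sites)
rho = 0.5
Cc_corr = sigma**2 * ((1.0 - rho) * np.eye(n) + rho * np.ones((n, n)))
eta_corr = inter_event_residual(eps_t, Cc_corr, tau)
```

In the equicorrelated case the sites carry less independent information about $\eta_i$, so the estimate is shrunk more strongly toward the prior mean of zero than in the independent case.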

8.3.3 Algorithm summary

In summary, the steps of the modified mixed-effects algorithm are as follows:

1. Estimate the model coefficients $\boldsymbol{\theta}$ using a fixed effects regression algorithm assuming $\boldsymbol{\eta}$ equals 0.

2. Using $\boldsymbol{\theta}$, solve for the variances of the residuals, $\sigma^2$ and $\tau^2$, by maximizing the likelihood function described in Equation 8.4. The covariance $C$ in Equation 8.4 is estimated using Equation 8.12.

3. Given $\boldsymbol{\theta}$, $\sigma^2$ and $\tau^2$, estimate $\eta_i$ using Equation 8.16.

4. Given $\eta_i$, estimate new coefficients ($\boldsymbol{\theta}$) using a fixed effects regression algorithm for $\ln(Y_{ij}) - \eta_i$.

5. Repeat steps 2, 3 and 4 until the likelihood in step 2 is maximized and the estimates for the coefficient set converge.
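The five steps above can be sketched as follows. This is only an illustration under several assumptions that are not part of the original work: the median model is taken to be linear in $\boldsymbol{\theta}$ so that ordinary least squares can serve as the fixed effects regression, a zero-mean Gaussian log-likelihood with the block covariance of Equation 8.12 stands in for Equation 8.4, a crude grid search replaces a proper optimizer in step 2, and the data are synthetic.

```python
import numpy as np

def exp_corr_matrix(coords, b=26.0):
    """Correlation matrix rho(d) = exp(-3d/b) for the sites of one earthquake."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-3.0 * d / b)

def neg_log_likelihood(resids, corrs, sigma, tau):
    """Gaussian negative log-likelihood (up to a constant), one block per earthquake."""
    nll = 0.0
    for r, R in zip(resids, corrs):
        C = R * sigma**2 + tau**2                      # Equation 8.12a
        _, logdet = np.linalg.slogdet(C)
        nll += 0.5 * (logdet + r @ np.linalg.solve(C, r))
    return nll

def fit_variances(resids, corrs, grid):
    """Step 2: grid search over (sigma, tau); a stand-in for a numerical optimizer."""
    best = min((neg_log_likelihood(resids, corrs, s, t), s, t)
               for s in grid for t in grid)
    return best[1], best[2]

def eta_posterior_mean(r, R, sigma, tau):
    """Step 3: Equation 8.16 for one earthquake."""
    ones = np.ones(len(r))
    Cc = R * sigma**2
    return (ones @ np.linalg.solve(Cc, r)) / (1.0 / tau**2 + ones @ np.linalg.solve(Cc, ones))

def fit_mixed_effects(X_list, y_list, coords_list, grid, tol=1e-4, max_iter=20):
    corrs = [exp_corr_matrix(c) for c in coords_list]
    X, y = np.vstack(X_list), np.concatenate(y_list)
    theta = np.linalg.lstsq(X, y, rcond=None)[0]       # step 1: eta assumed zero
    for _ in range(max_iter):                          # step 5: iterate steps 2 to 4
        resids = [yi - Xi @ theta for Xi, yi in zip(X_list, y_list)]
        sigma, tau = fit_variances(resids, corrs, grid)
        etas = [eta_posterior_mean(r, R, sigma, tau) for r, R in zip(resids, corrs)]
        y_adj = np.concatenate([yi - e for yi, e in zip(y_list, etas)])
        theta_new = np.linalg.lstsq(X, y_adj, rcond=None)[0]   # step 4
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta, sigma, tau

# Synthetic data: 8 hypothetical earthquakes with 12 recording sites each
rng = np.random.default_rng(0)
X_list, y_list, coords_list = [], [], []
for _ in range(8):
    coords = rng.uniform(0.0, 100.0, size=(12, 2))
    X = np.column_stack([np.ones(12), rng.uniform(0.0, 1.0, 12)])
    eps = 0.6 * (np.linalg.cholesky(exp_corr_matrix(coords)) @ rng.standard_normal(12))
    y = X @ np.array([1.0, -2.0]) + rng.normal(0.0, 0.3) + eps
    X_list.append(X); y_list.append(y); coords_list.append(coords)

theta, sigma, tau = fit_mixed_effects(X_list, y_list, coords_list, np.linspace(0.05, 1.0, 12))
```

A practical implementation would replace the grid search with a continuous optimizer and the least-squares fit with the nonlinear regression required by the actual ground-motion model.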

8.3.4 Large sample standard errors of σ and τ

If desired, the standard errors of the inter- and intra-event residual variances can be calcu-

lated based on the following results from Searle [1977]:

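\[
\mathrm{var}(\sigma^2) = 2\left[\mathrm{tr}\left(C^{-1}\frac{\partial C}{\partial(\sigma^2)}\right)^2\right]^{-1} \tag{8.17a}
\]

\[
\mathrm{var}(\tau^2) = 2\left[\mathrm{tr}\left(C^{-1}\frac{\partial C}{\partial(\tau^2)}\right)^2\right]^{-1} \tag{8.17b}
\]

where $C$ is the covariance matrix defined in Equation 8.12, $\frac{\partial C}{\partial(\sigma^2)}$ denotes the partial derivative of $C$ with respect to $\sigma^2$, $\frac{\partial C}{\partial(\tau^2)}$ denotes the partial derivative of $C$ with respect to $\tau^2$, tr denotes the trace of a matrix and var denotes variance. The partial derivatives, $\frac{\partial C}{\partial(\sigma^2)}$ and $\frac{\partial C}{\partial(\tau^2)}$, can be evaluated using numerical differentiation.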

Alternately, the standard errors can also be evaluated using statistical techniques such

as bootstrap [Efron, 1998].
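Equation 8.17 can be evaluated as in the sketch below. Because $C$ in Equation 8.12 is linear in $\sigma^2$ and $\tau^2$, the sketch uses the closed-form derivatives (the correlation matrix and a matrix of ones, respectively) in place of numerical differentiation; this substitution, the function name and the use of NumPy are assumptions made for illustration. Since $C$ is block diagonal over earthquakes, the trace is accumulated block by block.

```python
import numpy as np

def variance_of_variance_estimates(corrs, sigma, tau):
    """Large-sample variances of the sigma^2 and tau^2 estimates (Eq. 8.17).

    corrs: list of intra-event correlation matrices, one per earthquake.
    """
    tr_s = 0.0
    tr_t = 0.0
    for R in corrs:
        n = R.shape[0]
        C = R * sigma**2 + tau**2                   # Equation 8.12a for this earthquake
        A_s = np.linalg.solve(C, R)                 # C^{-1} dC/d(sigma^2), with dC/d(sigma^2) = R
        A_t = np.linalg.solve(C, np.ones((n, n)))   # C^{-1} dC/d(tau^2), with dC/d(tau^2) = 1 1'
        tr_s += np.trace(A_s @ A_s)
        tr_t += np.trace(A_t @ A_t)
    return 2.0 / tr_s, 2.0 / tr_t

# Sanity check: a single event with 10 uncorrelated sites and tau = 0
# should give the classical result var(sigma^2) = 2 sigma^4 / n
var_s, var_t = variance_of_variance_estimates([np.eye(10)], sigma=0.5, tau=0.0)
```

The standard errors of the variance estimates are the square roots of the returned values.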

8.3.5 Mixed-effects regression procedure in R

While mixed-effects regression procedures that consider spatial correlation (referred to as

‘within-group correlation’ in statistical literature) are available in statistical programming

languages such as R (e.g., the nlme package of Pinheiro and Bates [2000]), it is potentially

more convenient for current users of the Abrahamson and Youngs [1992] algorithm to

switch to the modified algorithm described in this chapter. Further, based on the authors’

experience, the nlme implementation in R suffers from numerical instabilities while fitting

the over-parameterized ground-motion models, while the implementation of the proposed

algorithm in MATLAB recovers from similar numerical instabilities potentially due to a

more robust fixed-effects regression implementation in MATLAB.

8.4 Results and discussion

In the current study, the algorithm described in the previous section is used to refit the

Campbell and Bozorgnia [2008] ground-motion prediction model (henceforth referred to

as the CB08 model) for illustration. First, in order to provide a baseline model for compar-

ison, the coefficients of the CB08 model are reestimated while ignoring spatial correlation.

For consistency, only records used by CB08 are used for estimating the coefficients. Ta-

ble 8.1 shows the coefficients estimated in this study for predicting spectral accelerations

at 1 second (denoted Sa(1s)) in the uncorrelated case. Also shown in the table for com-

parison are the corresponding published CB08 model coefficients. Documentation of how

these coefficients are used to make predictions is provided by CB08. The estimates of the

standard deviations of the intra-event residual and the inter-event residual (i.e., σ and τ respectively) are shown in Table 8.2. The value of the published intra-event residual standard deviation reported here corresponds to that at large Vs30 values (the Vs30 is set above a threshold value beyond which the ground-motion model no longer considers soil non-linearity effects, wherein the intra-event residuals have a constant variance at any given period). The refitted coefficients and variance estimates obtained in this work are similar, but not identical,

to those reported by CB08. These small discrepancies are likely due to the manual co-

efficient smoothing carried out by the authors of the CB08 model [Campbell, 2009]. For

consistency, the refitted model coefficients are treated as the benchmark values, for compar-

ison to model coefficients obtained considering spatial correlation. It is to be noted that the

functional form of the CB08 model requires knowledge of the A1100 value (median estimate of PGA on a reference rock outcrop with Vs30 = 1100 m/s) for the median prediction.

This is obtained directly using the coefficients of the CB08 model corresponding to PGA

(as against fitting a separate model for the PGA’s) for simplicity. This is reasonable because

the model coefficients used for predicting median values do not change significantly after

incorporating spatial correlation as shown subsequently in this chapter.

The model coefficients are then reestimated considering spatial correlation. The spatial

correlation model is obtained from Jayaram and Baker [2009a], and is shown below.

\[
\rho(h) = e^{-3h/b} \tag{8.18}
\]

where $h$ (km) denotes the separation distance between the sites of interest, and $b$ denotes the 'range' parameter which determines the rate of decay of correlation. This range is a function of the spectral period, and equals 26 km when Sa(1s) is considered. The coefficient estimates (i.e., $\boldsymbol{\theta}$) obtained in this case are shown in Table 8.1. It can be seen from the table

that the coefficients obtained by considering spatial correlation are similar to those obtained

by ignoring spatial correlation. This is reinforced by a plot of the predicted medians at all

the data sites using these two approaches (Figure 8.1). This matches with the observation

of Hong et al. [2009] that the ground-motion model coefficients do not change significantly

when considering spatial correlation.

While the coefficients for the median predictions are found to be relatively insensitive to the incorporation of spatial correlation, significant changes are seen in the estimates of the variance of the residuals (Table 8.2). In particular, the value of σ increases from 0.578 to 0.654 and the value of τ decreases from 0.223 to 0.157 after incorporating the spatial correlation. This trend is to be expected based on the illustrative example shown in Section 8.2.

Table 8.1: Regression coefficients for estimating median Sa(1s)

Case   c0       c1      c2       c3       c4       c5      c6      c7
1      -6.406   1.196   -0.772   -0.314   -2.000   0.170   4.00    0.255
2      -6.487   1.181   -0.878   -0.379   -2.064   0.195   3.884   0.264
3      -6.942   1.297   -1.073   -0.182   -2.112   0.198   4.440   0.324

Case   c8       c9      c10      c11      c12      k1      k2       k3
1      0.000    0.490   1.571    0.150    1.000    400.0   -1.955   1.929
2      -0.110   0.897   1.577    0.122    0.871    400.0   -1.955   1.929
3      -0.093   0.796   1.565    0.093    0.865    400.0   -1.955   1.929

Case 1: Published CB08 results [Campbell and Bozorgnia, 2008]
Case 2: Estimated in this study without considering spatial correlation
Case 3: Estimated in this study considering spatial correlation

Table 8.2: Standard deviations of residuals corresponding to Sa(1s)

Case   σ       τ       √(σ² + τ²)
1      0.568   0.255   0.623
2      0.578   0.223   0.620
3      0.654   0.157   0.673

Case 1: Published CB08 results [Campbell and Bozorgnia, 2008]
Case 2: Estimated in this study without considering spatial correlation
Case 3: Estimated in this study considering spatial correlation
σ denotes the standard deviation of the intra-event residual
τ denotes the standard deviation of the inter-event residual
√(σ² + τ²) denotes the standard deviation of the total residual

Figure 8.1: Comparison of predicted median Sa(1s) values obtained using the CB08 model fitted with and without the consideration of spatial correlation: (a) linear scale (b) log scale.
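The total standard deviations in Table 8.2 follow from σ and τ as $\sqrt{\sigma^2 + \tau^2}$. The short sketch below recomputes that column from the tabulated σ and τ values, along with the ratio of τ to the total standard deviation; the dictionary keys and variable names are illustrative only.

```python
import math

# (sigma, tau) pairs for Sa(1s), taken from Table 8.2
cases = {
    "published CB08": (0.568, 0.255),
    "refit, no spatial correlation": (0.578, 0.223),
    "refit, with spatial correlation": (0.654, 0.157),
}

summary = {}
for name, (sigma, tau) in cases.items():
    total = math.sqrt(sigma**2 + tau**2)        # standard deviation of the total residual
    summary[name] = (round(total, 3), tau / total)
```

The recomputed totals reproduce the last column of Table 8.2, while the ratio of τ to the total standard deviation drops noticeably once spatial correlation is considered.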

8.4.1 Standard deviation of residuals as a function of period

The results presented in the previous section support the use of the published coefficients

(i.e., $\boldsymbol{\theta}$) for predicting the median intensities. The values of σ and τ, however, must be obtained considering spatial correlation. This implies that the iterative mixed-effects algorithm described earlier in the chapter can be simplified to a computation of only the residual variances $\sigma^2$ and $\tau^2$ (step 2 of the algorithm summary in Section 8.3.3) using the published values of $\boldsymbol{\theta}$ (i.e., the mixed-effects regression is now simply a random-effects regression procedure).

Hence, in this work, the CB08 model coefficients are assumed to be the fixed-effects

model coefficients, and the total residuals are computed using the records in the PEER

NGA database (only those records used by the authors of the CB08 model are considered

for compatibility) [Chiou et al., 2008]. The maximum likelihood estimates of σ and τ

are then obtained at different spectral acceleration periods from the total residuals using

the procedures described earlier. Figure 8.2a compares the estimates of σ obtained in this study to those reported by CB08. It can be seen that the values of σ obtained considering spatial correlation are mostly larger than the published σ's (which have been estimated ignoring spatial correlations). Figure 8.2b shows that the values of τ, on the other hand, are considerably smaller when spatial correlations are considered. The values of σ and τ are then used to compute the standard deviations of the total residuals (computed as $\sqrt{\sigma^2 + \tau^2}$), and plotted in Figure 8.2c. It can be seen from this figure that considering spatial correlation does not significantly alter the total residual standard deviation. (Hong et al., 2010 noticed a small reduction in the total residual standard deviation when the spatial correlation was considered. The alteration in the total residual standard deviation could depend on the data set and the spatial correlation model used.)

Figure 8.2: Effect of spatial correlation on: (a) estimated intra-event residual standard deviation (σ), (b) estimated inter-event residual standard deviation (τ), (c) estimated total residual standard deviation. (d) Ratio of inter-event residual standard deviation to total residual standard deviation.

Though the current work only refits the CB08 model, the trends in the values of σ and

τ are the same for the other recent NGA ground-motion models [e.g., Boore and Atkinson,

2008, Chiou and Youngs, 2008]. This can be seen from Figure 8.2d, which shows typical

ratios of the inter-event residual standard deviation to the total residual standard deviation

reported by these ground-motion models. It is seen that the ratios reported by the ground-

motion modelers are generally much larger than those estimated in this work considering

spatial correlation.

8.4.2 Estimates of spatial correlation

The spatial correlation estimates (Equation 8.18) provided by Jayaram and Baker [2009a]

are based on residuals computed using the published ground-motion models that assume in-

dependence between intra-event residuals. As discussed earlier, the consideration of spatial

correlation while fitting the models does not change the median predictions, and therefore,

the total residuals (Equation 8.1). Jayaram and Baker [2009a] also showed that the spa-

tial correlation between intra-event residuals can be estimated directly from total residuals

(exactly when the intra-event residuals are homoscedastic and approximately otherwise).

Therefore, it can be inferred that the estimates of spatial correlation will be very similar

when estimated using ground-motion models fitted with/ without consideration of spatial

correlation. In other words, it is still appropriate to use the correlation models previously

developed using the published ground-motion models.


8.4.3 Risk assessment for a hypothetical portfolio of buildings

Since ignoring spatial correlation while fitting the ground-motion model does not significantly affect the estimates of the ground-motion medians ($f(\boldsymbol{\theta})$) or the standard deviation of the total residuals (Figure 8.2c), hazard and loss analyses for single structures will produce accurate results if the existing ground-motion models are used. Risk assessments for

spatially-distributed systems, however, are influenced by the standard deviation of the inter-

event and the intra-event residuals and not just by the medians and the standard deviation of

the total residuals (this is discussed in more detail in the following section). Therefore, risk

assessments of such systems carried out using ground-motion models fitted with and with-

out consideration of spatial correlation could result in different loss estimates. In the next

section, this is illustrated using a risk assessment carried out on a hypothetical portfolio of

buildings located in the San Francisco Bay Area.

Consider a hypothetical portfolio of 100 buildings in the San Francisco Bay Area lo-

cated on a 10 by 10 grid with a grid spacing of 20km. Each building in the portfolio is

assumed to have a replacement value of $1,000,000. The seismic risk of this portfolio is

estimated by modeling the seismic hazard due to 10 different faults and fault segments.

(The source model is obtained from USGS [2003]). The risk assessment is carried out us-

ing a simulation-based procedure described in Crowley and Bommer [2006] and Jayaram

and Baker [2010]. The steps involved in this procedure are summarized below.

Step 1: Simulate earthquakes of different magnitudes on the active faults in the region,

using appropriate magnitude-recurrence relationships.

Step 2: Using the ground-motion model, compute the median ground-motion intensities ($f(\boldsymbol{\theta})$) and the standard deviations of the inter-event and the intra-event residuals (τ and σ respectively) at the sites of interest.

Step 3: Simulate the inter-event residual (i.e., $\eta_i$) by sampling from the univariate normal distribution with mean zero and standard deviation τ.

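Step 4: Simulate the intra-event residuals (i.e., $\varepsilon_{ij}$'s) by sampling from a multivariate normal distribution with mean $\mathbf{0}_{p,1}$ (zero vector of size $p$) and covariance matrix with entries $\rho(d_{jj'})\sigma^2$ (Equation 8.13a, i.e., Equation 8.12a without the $\tau^2$ term, since the inter-event residual is simulated separately in Step 3). Here, the spatial correlation $\rho(d_{jj'})$ is defined by the exponential model in Equation 8.18 with a range of 26 km.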


Step 5: Combine the medians, inter-event residuals and intra-event residuals using

Equation 8.1 to obtain realizations of the ground-motion intensity at all sites of interest.

In the rest of the chapter, each set of ground-motion intensities is referred to as a ground-

motion intensity map. The collection of all simulated ground-motion intensity maps quan-

tifies the total ground-motion hazard in the region.

Step 6: Simulate the damage to the buildings due to each ground-motion intensity map.

Here, this is done using fragility functions which provide the probability of the building

damage being in or exceeding various damage states (no damage, minor damage, moder-

ate damage, extensive damage and collapse) as a function of the spectral acceleration at

1 second at the building location. The damage functions were assumed to be cumulative

lognormal distribution functions with median values 0.4, 0.5, 0.7 and 0.9 for the minor,

moderate, extensive and collapse damage states respectively. The lognormal standard de-

viation was assumed to be 0.6 in all these cases.

Step 7: Compute the total monetary loss associated with the damage to the portfolio

due to each ground-motion intensity map. This is computed by assuming the damage ratio

(ratio of repair cost to replacement cost) to be 0.03, 0.08, 0.25 and 1.00 for the minor,

moderate, extensive and collapse damage states respectively.

Step 8: Obtain the loss exceedance curve which provides the annual rate of exceedance

of various monetary loss values. The loss exceedance curve is obtained as the product of

the recurrence rates of all earthquakes in the region and the probability of exceedance of

various monetary loss values. The exceedance probabilities are calculated as follows:

\[
P(L \geq l) = \frac{1}{n}\sum_{i=1}^{n} I(L_i \geq l) \tag{8.19}
\]

where P(L≥ l) is the probability that the loss exceeds l, n denotes the number of simulated

ground-motion intensity maps, Li is the monetary loss associated with ground-motion in-

tensity map i, and I(Li ≥ l) is an indicator variable that equals one if Li exceeds l and zero

otherwise.
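Steps 3 to 8 of the procedure above can be sketched as follows for a single earthquake scenario. This is a minimal illustration, not the implementation used in this study: the median intensity is a hypothetical constant (rather than a CB08 prediction), the damage state of each building is sampled with a single uniform random number, and all function and variable names are assumptions.

```python
import math
import numpy as np

MEDIANS = [0.4, 0.5, 0.7, 0.9]     # fragility medians: minor, moderate, extensive, collapse
BETA = 0.6                         # lognormal standard deviation of the fragility functions
DAMAGE_RATIOS = np.array([0.0, 0.03, 0.08, 0.25, 1.00])  # none, minor, moderate, extensive, collapse
VALUE = 1_000_000.0                # replacement value per building

def norm_cdf(x):
    """Standard normal CDF evaluated elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def simulate_losses(ln_median, coords, sigma, tau, n_maps, rng, b=26.0):
    n_sites = len(ln_median)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Cholesky factor of the intra-event covariance rho(d) * sigma^2
    chol = np.linalg.cholesky(np.exp(-3.0 * d / b) * sigma**2)
    losses = np.empty(n_maps)
    for k in range(n_maps):
        eta = rng.normal(0.0, tau)                       # Step 3: inter-event residual
        eps = chol @ rng.standard_normal(n_sites)        # Step 4: correlated intra-event residuals
        ln_sa = ln_median + eta + eps                    # Step 5: one ground-motion intensity map
        # Step 6: probability of reaching or exceeding each damage state at each site
        p_exceed = np.column_stack(
            [norm_cdf((ln_sa - math.log(m)) / BETA) for m in MEDIANS])
        u = rng.uniform(size=(n_sites, 1))
        state = (u < p_exceed).sum(axis=1)               # 0 = no damage, ..., 4 = collapse
        losses[k] = VALUE * DAMAGE_RATIOS[state].sum()   # Step 7: portfolio loss
    return losses

def exceedance_probability(losses, thresholds):
    """Step 8: Equation 8.19 evaluated at each loss threshold."""
    return np.array([(losses >= l).mean() for l in thresholds])

# 10 by 10 grid of buildings with 20 km spacing
xs = np.arange(10) * 20.0
coords = np.array([(x, y) for x in xs for y in xs])
ln_median = np.full(100, math.log(0.2))   # hypothetical constant median Sa(1s)

rng = np.random.default_rng(1)
losses = simulate_losses(ln_median, coords, sigma=0.654, tau=0.157, n_maps=500, rng=rng)
thresholds = np.linspace(0.0, 2.0e7, 5)
p = exceedance_probability(losses, thresholds)
```

To obtain a loss exceedance curve, these probabilities would be combined with the recurrence rates of the simulated earthquakes from Step 1, which this single-scenario sketch does not model.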

The above-mentioned risk assessment process is carried out using the values of σ and

τ provided by CB08 as well as with the σ and the τ estimated in this work by considering

spatial correlations in the regression formulation (Figures 8.2a and 8.2b). In both cases, the


CB08 median model coefficients are used for estimating median intensities. The resulting

loss exceedance curves are shown in Figure 8.3. It can be seen in Figure 8.3 that the recur-

rence rates of extreme losses are overestimated when the CB08 estimates are used. This is

a result of the fact that the CB08 model overestimates τ and underestimates σ by ignoring

spatial correlation. A large value of τ increases the likelihood of observing large positive

inter-event residuals, which will simultaneously increase the ground-motion intensity at all

the sites in the region. If spatial correlations are large, a large value of σ will have a similar

effect and can result in large ground-motion intensities at multiple sites. In such a case, the

effect of underestimating σ is compensated by the effect of overestimating τ . If the spatial

correlations are small, however, underestimating σ and overestimating τ will have the net

effect of jointly producing more extreme ground-motion intensities at multiple sites than is

probable in reality. It can be inferred from Equation 8.18 that the spatial correlation will be

small if h is large or if b is small. Therefore, when the components of a spatially-distributed

system are well separated (large h) or if the correlation range is small, the ground-motion

models fitted without considering spatial correlation will overestimate the likelihood of

jointly observing extreme ground-motion intensities at multiple sites. It is to be noted that

the separation between the buildings in the hypothetical portfolio considered in this work

is substantial, which leads to significant differences between the loss curves obtained with

and without consideration of spatial correlation. It is difficult to make general conclusions

about the size of this effect, but it is clear that seismic risk analysis calculations using exist-

ing ground-motion model estimates of σ and τ will overestimate the chance of observing

large losses.
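One way to see the compensation argument above is through the variance of the site-averaged total residual, which from Equation 8.12a equals $\tau^2 + \sigma^2 \bar{\rho}$, where $\bar{\rho}$ is the average of $\rho(d_{jj'})$ over all pairs of sites (including $j = j'$). The sketch below evaluates this quantity for the 10 by 10 portfolio grid with the Sa(1s) values from Table 8.2; the 1 km spacing case and the function name are hypothetical additions for contrast.

```python
import numpy as np

def var_of_mean_residual(coords, sigma, tau, b=26.0):
    """Variance of the average total residual over all sites: tau^2 + sigma^2 * mean(rho)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return tau**2 + sigma**2 * np.exp(-3.0 * d / b).mean()

def grid(spacing_km, n=10):
    xs = np.arange(n) * spacing_km
    return np.array([(x, y) for x in xs for y in xs])

published = dict(sigma=0.568, tau=0.255)   # Case 1 of Table 8.2
refit = dict(sigma=0.654, tau=0.157)       # Case 3 of Table 8.2

# Well-separated portfolio (20 km spacing, as in the hypothetical portfolio)
ratio_sparse = var_of_mean_residual(grid(20.0), **published) / var_of_mean_residual(grid(20.0), **refit)

# Closely spaced portfolio (hypothetical 1 km spacing)
ratio_dense = var_of_mean_residual(grid(1.0), **published) / var_of_mean_residual(grid(1.0), **refit)
```

For the well-separated grid the published values give a substantially larger variance of the regional average, whereas for the closely spaced grid the two sets of values give similar variances, in line with the discussion above.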

8.5 Conclusions

This work illustrated the impact of considering spatial correlation between intra-event residuals while developing ground-motion models. The mixed-effects algorithm of Abrahamson and Youngs [1992], which assumes independence between intra-event residuals, was modified to account for the spatial correlation between the intra-event residuals. This was done by changing the likelihood function used for estimating the inter-event and the intra-event residual variances given other model coefficients and changing the estimate of the inter-event residual given the total residuals at multiple sites. The modified algorithm was used to refit the Campbell and Bozorgnia [2008] ground-motion model, to illustrate the effect of this refinement. The variance of the total residuals and the model coefficients used for predicting the median ground-motion intensity were not significantly affected by the proposed refinement. Significant changes, however, were seen in the variance of the intra-event and the inter-event residuals. Incorporating spatial correlation was seen to increase the intra-event residual variance and to decrease the inter-event residual variance. These changes have implications for risk assessments of spatially-distributed systems because a smaller inter-event residual variance implies a lesser likelihood of simultaneously observing larger-than-median ground-motion intensities at all sites in a region. To demonstrate this effect, a risk assessment was performed for a hypothetical portfolio of buildings using the ground-motion models obtained with and without accounting for spatial correlation. The results showed that using the published variance estimates causes an overestimation of the exceedance rates of large losses.

Figure 8.3: Risk assessment results for a hypothetical portfolio of buildings performed using ground-motion models developed with and without the proposed refinement.

Chapter 9

Hurricane risk assessment of spatially-distributed systems with consideration of wind-field uncertainties and spatial correlation

9.1 Abstract

With a view toward extending the seismic risk assessment techniques developed in this

work for risk assessment under other types of hazards, this exploratory study focuses on

quantifying the uncertainties and the spatial correlation in hurricane wind fields (using tech-

niques that were used for earthquake ground motion fields), and evaluating their impact on

the hurricane risk of spatially-distributed systems. Hurricane wind-speed predictions are

obtained for two sample hurricanes, Hurricane Jeanne and Hurricane Frances, using the

Batts et al. [1980] wind-speed model, and the uncertainties in these predictions are evalu-

ated using ‘actual’ wind-speed recordings. The spatial correlation of wind speeds is esti-

mated and modeled using geostatistical tools. Finally, the impact of the wind-speed uncer-

tainties and the spatial correlation on the hurricane risk of a spatially-distributed system is

illustrated by a sample risk assessment of a hypothetical portfolio of buildings. The results

of the risk assessment show that the uncertainties and the spatial correlations in the wind fields need to be modeled in order to avoid introducing errors into the risk calculations of spatially-distributed systems.

9.2 Introduction

Frameworks for the risk assessment of structures and infrastructure systems under natural

and man-made hazards share many similarities. Broadly, they involve the quantification

of a hazard intensity measure (e.g., ground-motion intensities during earthquakes, wind

speeds during hurricanes) and the associated probable losses. The techniques developed in this thesis for seismic risk assessment can thus be applied to risk assessment under other types of hazards. This exploratory study extends the seismic hazard and risk assessment concepts and techniques discussed in the earlier chapters of this thesis to hurricane hazard and risk modeling, with hurricanes chosen as a sample alternate hazard.

Vickery et al. [2000b] developed a hurricane wind-hazard model that forms the basis

for the wind-speed contours in ASCE Standard 7-02 [2003] in the Southeast U.S. The

following steps are involved in this hazard quantification approach:

• Step 1: Use historical hurricane records to develop probability density functions

(PDF) for key hurricane parameters such as the location of origin, translation di-

rection, translation speed, central pressure and radius of maximum wind.

• Step 2: Use the PDFs developed in Step 1 to Monte Carlo simulate probable future

hurricanes.

• Step 3: Predict the peak wind speeds due to each simulated hurricane at the sites of interest using empirical or physics-based wind-speed models [e.g., Batts et al., 1980, Vickery et al., 2000a, 2008].

• Step 4: Develop a PDF for the peak wind speeds experienced at any particular site

using the wind-speed information from Step 3.

In general, it can be seen that this process is similar to probabilistic seismic hazard

analysis (PSHA), which is used for quantifying seismic ground-motion hazard at a given

site [Cornell, 1968, Kramer, 1996].


The wind-hazard model described above can be used in combination with structural

fragility curves to obtain the exceedance rates of different levels of structural losses using

numerical integration. Alternately, the hazard information can be used in a structural relia-

bility framework to estimate the failure probability of a structure under hurricane loading.

For instance, Li and Ellingwood [2009] modeled the site wind speeds as a Weibull random

variable (whose distribution is parameterized using the wind hazard information obtained

from Vickery et al. [2000b]), and estimated the reliability of low-rise light-frame wood

residential construction in the U.S. subjected to hurricane loading.

It is, however, difficult to use the two analytical risk assessment approaches described

above for assessing the risk of spatially-distributed systems such as portfolios of buildings

and lifelines. This is because the risk assessment of spatially-distributed systems is based

on a large vector of correlated wind speeds (wind speeds at all component locations), which

makes it difficult to use numerical integration and other analytical techniques. Hence, many

past research works use Monte Carlo simulation (MCS) instead of analytical approaches

for the risk assessment of spatially-distributed systems [e.g., Legg et al., 2010]. The basic

MCS approach for the risk assessment involves the following steps:

• Step 1: Simulate probable future hurricanes using the PDFs of hurricane-related pa-

rameters developed in past research works such as that of Vickery et al. [2000b].

• Step 2: Predict the peak wind speeds due to each simulated hurricane using empirical or physics-based wind-speed models [e.g., Batts et al., 1980, Vickery et al., 2000a, 2008].

• Step 3: Monte Carlo simulate the total loss due to the wind speeds.

• Step 4: Estimate the probability of exceeding various loss levels using the loss esti-

mates from Step 3.

Most hurricane wind-speed prediction models developed in the past are deterministic,

and the uncertainties in wind fields have been analyzed in few research works. (In this

chapter, the wind field denotes the collection of peak wind speeds (over the duration of

the hurricane) at all the sites of interest. The peak wind speed at a site (similar to peak


ground acceleration for earthquakes) is often the hurricane intensity measure used to esti-

mate probable losses [e.g., Jarvinen et al., 1984, Li and Ellingwood, 2009].) One notable

exception is the work of Vickery et al. [2009b], who computed the uncertainties in the wind

fields (maximum peak gust wind speed) using observed and predicted (by the Vickery et al.

[2008] wind-field model) wind speeds during 24 different hurricanes. They found that the

ratio of the observed wind speeds to the predicted wind speeds has a mean of one and a coefficient of variation of 0.1. Chapter 5 of this thesis illustrated that ignoring the uncertainties in the earthquake ground motions can lead to inaccurate lifeline risk estimates.

This study demonstrates the potential for comparable inaccuracies caused by ignoring the

uncertainties in wind fields during hurricane risk assessments. The seismic risk assess-

ments described in Chapter 5 of this thesis also illustrated the importance of considering

spatial correlation in ground motion fields for obtaining accurate risk estimates. To the

author’s knowledge, the spatial correlation in hurricane wind fields has not been studied in

the literature.

Assume that a probabilistic hurricane wind-speed model takes the following form:

\[
\ln(V_i) = \ln(\bar{V}_i) + \varepsilon_i \tag{9.1}
\]

where $V_i$ denotes the observed peak wind speed at site $i$, $\bar{V}_i$ denotes the predicted (by the wind-field model) median peak wind speed at site $i$, and $\varepsilon_i$ denotes the residual (error term). For clarity, the spatial correlation mentioned in this chapter refers to the correlation

between the residuals (ε’s) at two different sites. There is a significant amount of corre-

lation between the wind speeds at two closely-spaced sites during a hurricane (which was

considered by Legg et al. [2010]), but a large portion of this correlation is accounted for by the

wind-speed model, which predicts similar wind speeds at sites close to one another. The

residuals (εi’s) are correlated as well, and this correlation is of interest in this study. (Chap-

ter 3 of this thesis discusses the comparable concept of spatial correlation for earthquake

ground-motion fields in detail.) Causes of this correlation include common source effects

and similarity in topography- and land friction-related effects.

It is of interest to quantify the uncertainties and the spatial correlation in wind fields. In

this exploratory study, two sample hurricanes are used, with the primary goals of obtaining


approximate estimates for these parameters and illustrating the tools and methods that can

be used for the estimation. Further, a sample hurricane risk assessment is carried out for a

hypothetical portfolio of buildings in order to illustrate the importance of considering the

uncertainties and the spatial correlation in the risk assessment process.

9.3 Spatial correlation estimation methodology

In this chapter, the uncertainties and the spatial correlation in hurricane wind fields are

empirically estimated using recorded hurricane wind speeds. The wind-field uncertainties

are quantified using the mean and the variance of the residuals (ε’s). The residuals are

computed from recorded hurricane wind speeds using Equation 9.1, where the wind-speed

predictions are obtained from the Batts et al. [1980] model. This model is chosen in this

work for its simplicity, and the analyses performed using this simple model can be repeated

with a more rigorous model [e.g., Vickery et al., 2008] if desired. The spatial correlations

between the residuals are estimated using well-established geostatistical tools [Deutsch and

Journel, 1998, Goovaerts, 1997] that were previously used in Chapter 3 for quantifying the

spatial correlation in ground motion fields. These tools are described briefly in this section to supplement the more detailed discussion in Chapter 3.

Let $\tilde{\varepsilon}$ denote the normalized residual, estimated as follows:

\[
\tilde{\varepsilon}_i = \frac{\varepsilon_i}{\sigma} \tag{9.2}
\]

where σ denotes the standard deviation of the residual.
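Equations 9.1 and 9.2 amount to the short computation sketched below, which assumes NumPy and uses hypothetical observed and predicted wind speeds (not data from Hurricane Jeanne or Hurricane Frances). The sample standard deviation is used as the estimate of σ, which is an assumption of this sketch.

```python
import numpy as np

def wind_residuals(v_observed, v_predicted):
    """Residuals from Equation 9.1 and normalized residuals from Equation 9.2."""
    eps = np.log(np.asarray(v_observed, dtype=float)) - np.log(np.asarray(v_predicted, dtype=float))
    sigma = eps.std(ddof=1)        # sample standard deviation of the residuals
    return eps, eps / sigma, sigma

# Hypothetical peak wind speeds at five sites (any consistent unit)
v_obs = [42.0, 38.5, 51.0, 47.2, 35.1]
v_pred = [40.0, 41.0, 48.0, 50.0, 33.0]
eps, eps_norm, sigma = wind_residuals(v_obs, v_pred)
```

The mean and variance of `eps` summarize the wind-field uncertainty, while `eps_norm` is the quantity whose spatial correlation is studied in the remainder of this section.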

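The correlation structure of the normalized residual $\tilde{\varepsilon}_i$ (equivalently, that of $\varepsilon_i$) can be represented using a semivariogram, which represents the dissimilarity between the $\tilde{\varepsilon}_i$'s. Let $u$ and $u'$ denote two sites separated by distance vector $\mathbf{h}$, and $\tilde{\varepsilon}_u$ denote the normalized residual at site $u$. The semivariogram ($\gamma(u, u')$) is defined as follows:

\[
\gamma(u, u') = \frac{1}{2} E\left[\{\tilde{\varepsilon}_u - \tilde{\varepsilon}_{u'}\}^2\right] \tag{9.3}
\]

where $E(.)$ denotes the expectation operator.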

The semivariogram defined in Equation 9.3 is location-dependent, and its inference


requires repetitive realizations of ε̃i at locations u and u′. Such repetitive measurements

are, however, never available in practice. Hence, it is typically assumed that the semivariogram

does not depend on site locations u and u′, but only on their separation 𝐡, to obtain

a stationary semivariogram. The stationary semivariogram (γ(𝐡)) can then be estimated as

follows:

\[ \gamma(\mathbf{h}) = \frac{1}{2} E\left[\{\tilde{\varepsilon}_u - \tilde{\varepsilon}_{u+\mathbf{h}}\}^2\right] \tag{9.4} \]

A stationary semivariogram is said to be isotropic if it is a function of the separation

distance (h = ‖𝐡‖) rather than the separation vector 𝐡. An isotropic, stationary semivariogram

can be empirically estimated from a data set as follows:

\[ \gamma(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \{\tilde{\varepsilon}_{u_\alpha} - \tilde{\varepsilon}_{u_\alpha+h}\}^2 \tag{9.5} \]

where γ(h) is the experimental stationary isotropic semivariogram (estimated from a data

set); N(h) denotes the number of pairs of sites separated by h; and {εuα, εuα+h} denotes the

α’th such pair.
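As a rough illustration of Equation 9.5, the sketch below pools site pairs into separation-distance bins and averages half the squared differences of the normalized residuals. The function name, the planar (x, y) coordinates in km, and the fixed bin width are assumptions made for this example; they are not taken from the analysis in this chapter.

```python
import math
from collections import defaultdict

def empirical_semivariogram(coords_km, residuals, bin_width_km):
    """Experimental isotropic semivariogram in the spirit of Equation 9.5.

    coords_km: list of (x, y) site coordinates in km (assumed planar).
    residuals: normalized residuals, one per site.
    bin_width_km: pairs whose separations fall in the same bin are pooled.
    Returns {bin-centre distance: semivariogram value}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(coords_km)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords_km[i], coords_km[j])
            k = int(h // bin_width_km)  # index of the distance bin
            sums[k] += (residuals[i] - residuals[j]) ** 2
            counts[k] += 1
    # Divide by 2 N(h), where N(h) is the number of pairs in the bin.
    return {(k + 0.5) * bin_width_km: sums[k] / (2.0 * counts[k])
            for k in sorted(counts)}
```

In practice the bin width trades resolution against the number of pairs per bin; bins with very few pairs give noisy estimates.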

When empirically estimated, γ(h) only provides semivariogram values at discrete val-

ues of h, and hence, a continuous function is usually fitted to the discrete values to obtain

the semivariogram for continuous values of h. There are only a few permissible continuous

functions that ensure that the covariance matrices estimated using these semivariograms

are positive definite [Goovaerts, 1997]. The current study uses the permissible Gaussian

semivariogram (shown below), which is seen to provide the best fit to the empirical semi-

variogram values.

\[ \gamma(h) = a\left[1 - \exp\left(-\frac{3h^2}{b^2}\right)\right] \tag{9.6} \]

where a denotes the ‘sill’ of the semivariogram (which in this case equals one, the variance

of the normalized residuals) and b denotes the ‘range’ of the semivariogram (which equals

the separation distance h at which γ(h) equals 0.95a).
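A minimal sketch of the Gaussian model in Equation 9.6 follows, together with a simple grid search for the range. The grid search and the optional weights are assumptions introduced for illustration only; the ranges reported in this chapter were chosen by judgment, with emphasis on the fit at short separation distances.

```python
import math

def gaussian_semivariogram(h_km, range_km, sill=1.0):
    """Gaussian semivariogram of Equation 9.6: a * [1 - exp(-3 h^2 / b^2)]."""
    return sill * (1.0 - math.exp(-3.0 * h_km ** 2 / range_km ** 2))

def fit_range(distances_km, gamma_values, candidate_ranges_km,
              sill=1.0, weights=None):
    """Return the candidate range with the smallest weighted squared misfit.

    weights (hypothetical helper argument) can be made larger at short
    distances to favour the fit there; equal weights are used by default.
    """
    if weights is None:
        weights = [1.0] * len(distances_km)

    def misfit(b):
        return sum(w * (gaussian_semivariogram(h, b, sill) - g) ** 2
                   for h, g, w in zip(distances_km, gamma_values, weights))

    return min(candidate_ranges_km, key=misfit)
```

At h = b the model gives a(1 − e⁻³) ≈ 0.95a, which is the definition of the range used above.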


It can be theoretically shown that the spatial correlation function (ρ(h)) for the normal-

ized residuals can be computed from the semivariogram function as follows:

\[ \gamma(h) = a\,\bigl(1 - \rho(h)\bigr) \tag{9.7} \]

Hence, it can be seen that the correlations are completely defined by the semivariogram,

which in turn, is a function only of the range. (The sill is known to equal 1, the variance

of the normalized residuals for which the semivariogram is constructed.) Moreover, note

from Equations 9.6 and 9.7 that a larger range implies a smaller rate of increase in γ(h) with

h, and subsequently, a smaller rate of decay of correlation with separation distance.
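The step behind Equation 9.7 can be sketched as follows, assuming the normalized residuals ε̃ are second-order stationary with a common mean and a common variance a (here a = 1). The last line combines the result with the Gaussian model of Equation 9.6.

```latex
\[
\begin{aligned}
\gamma(h) &= \tfrac{1}{2}\,E\bigl[\{\tilde{\varepsilon}_u - \tilde{\varepsilon}_{u+h}\}^2\bigr] \\
          &= \tfrac{1}{2}\bigl[\operatorname{Var}(\tilde{\varepsilon}_u) + \operatorname{Var}(\tilde{\varepsilon}_{u+h})\bigr]
             - \operatorname{Cov}(\tilde{\varepsilon}_u, \tilde{\varepsilon}_{u+h}) \\
          &= a - a\,\rho(h) = a\,\bigl(1 - \rho(h)\bigr), \\
\rho(h)   &= 1 - \frac{\gamma(h)}{a} = \exp\!\left(-\frac{3h^2}{b^2}\right).
\end{aligned}
\]
```

For example, with a range of b = 170 km, the implied correlation at h = 20 km is exp(−3·20²/170²) ≈ 0.96, and it falls to e⁻³ ≈ 0.05 at h = b.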

9.4 Results and Discussion

9.4.1 Data source

The analyses performed in this study use ‘recorded’ wind-speed information from two hur-

ricanes, namely, Hurricane Jeanne (2004) and Hurricane Frances (2004). In both cases,

information about hurricane-related parameters such as central pressure, storm position,

direction and translation speed are obtained from the six hour position data provided by the

HURDAT database [Jarvinen et al., 1984]. The ‘recorded’ wind-speed data are obtained

from the Hurricane Research Division (HRD) H*Wind program [Powell et al., 1996]. The

primary data come from the Air Force Reserves (AFRES) reconnaissance flight-level

observations reduced from near 3 km to the surface with a boundary layer model [Powell,

1980]. Other data sources include ships, buoys, Coastal-Marine Automated Network

(C-MAN) observations, airport observations including Automated Surface Observing Stations

(ASOS), and supplemental data collected after landfall from public and private sources

[Powell and Houston, 1997]. Additional data over sea are collected by deploying

‘dropwindsondes’ from aircraft; these drift down on parachutes, measuring vertical profiles of

pressure, temperature, humidity and wind as they fall [Aberson and Franklin, 1999]. The

wind-speed data are quality controlled and processed to conform to a common framework

for height of recording (10m), exposure (open terrain) and averaging period (maximum


sustained 1 minute wind speed) [Powell et al., 1996, Powell and Houston, 1996]. These

data were then objectively analyzed with a technique based upon the spectral application of

finite element representation (SAFER) method [Ooyama, 1987, Franklin et al., 1993] in or-

der to obtain an interpolated grid of peak wind speeds. In past research, tests on the SAFER

methodology have indicated that the technique correctly reproduced known surface wind

fields based on the available wind observations [Houston et al., 1999].

9.4.2 Hurricane Jeanne (2004)

This section describes the uncertainties and the spatial correlation estimated using the

recorded wind field from the 2004 Hurricane Jeanne. Hurricane Jeanne formed on

September 13, 2004, and made landfall and remained over Florida on September 26. In this study,

recorded hurricane and wind-field data collected between September 24 and 26 are used for the

analysis.

Figure 9.1a shows the observed maximum (over the duration of the hurricane) wind

speeds during Hurricane Jeanne, and Figure 9.1b shows the maximum wind speeds pre-

dicted by the Batts et al. [1980] model. The HURDAT database only provides six hour

hurricane-related data. In order to obtain a finer resolution of wind speeds over time, the

six hour hurricane data are interpolated linearly to obtain 30 minute data, which are then

used to predict (using the Batts et al. [1980] model) the wind speeds at every 30 minute

interval and subsequently the peak wind speeds over the duration of the hurricane (Figure

9.1b). Figure 9.1c shows the residuals computed using Equation 9.1. As mentioned ear-

lier, the Batts et al. [1980] model is chosen in this study primarily for its simplicity. It is,

however, seen to predict wind speeds that are biased as a function of the closest distance of

the site from the hurricane track (denoted di for site i). This is illustrated by Figure 9.2a,

which shows the residuals as a function of the d’s. This plot indicates that, in general, the

Batts et al. [1980] model under-predicts wind speeds at sites far away from the hurricane

track, and over-predicts wind speeds at sites close to the hurricane track. The newer wind-

speed models have smaller biases on account of the availability of larger data sets and better

model development techniques. Therefore, in order to prevent the Batts et al. [1980] model

bias from affecting the uncertainty and the correlation estimates, a simple bias correction

is performed before analyzing the residuals.

Figure 9.1: Hurricane Jeanne: (a) Observed wind speeds (b) Predicted wind speeds (c) Residuals (d) Bias-corrected residuals.

Figure 9.2: Residuals and bias-corrected residuals versus closest distances from the hurricane track.
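The refinement of the six-hourly track data to 30-minute steps described above can be sketched as follows. The function names and the sample values in the usage note are hypothetical, and the wind-speed model itself is not reproduced here. Plain linear interpolation is assumed for every parameter; an angular quantity such as storm direction would need wrap-around handling that this sketch omits.

```python
def interpolate_track(times_h, values, step_h=0.5):
    """Linearly interpolate a six-hourly track parameter to a finer time step.

    times_h: increasing observation times in hours (e.g. 0, 6, 12, ...).
    values: parameter values at those times (e.g. central pressure).
    Returns (fine_times, fine_values).
    """
    fine_times, fine_values = [], []
    n_steps = int(round((times_h[-1] - times_h[0]) / step_h))
    seg = 0  # index of the left end of the current six-hour segment
    for k in range(n_steps + 1):
        t = times_h[0] + k * step_h
        while seg < len(times_h) - 2 and t > times_h[seg + 1]:
            seg += 1
        t0, t1 = times_h[seg], times_h[seg + 1]
        w = (t - t0) / (t1 - t0)
        fine_times.append(t)
        fine_values.append((1.0 - w) * values[seg] + w * values[seg + 1])
    return fine_times, fine_values

def peak_over_duration(speed_series):
    """Peak wind speed at a site over all 30-minute time steps."""
    return max(speed_series)
```

For instance, `interpolate_track([0, 6, 12], [950.0, 944.0, 956.0])` yields 25 values at 30-minute spacing; wind speeds predicted from each interpolated state can then be passed to `peak_over_duration` site by site.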

Since the plot between the ε’s and the d’s (Figure 9.2a) shows a linear trend, a bias

correction factor is obtained using a linear regression between ε and d. The bias correction

is then added to the predicted wind speeds in order to eliminate the bias. Figure 9.2b shows

the residuals obtained after the bias correction (denoted ε in the rest of the chapter). Some

minor local trends can still be seen between the residuals and the closest distances, but these

are reasonably insignificant compared to the overall trend seen in Figure 9.2a. (It might be

possible to employ other bias correction techniques to completely eliminate the trends, but

this is not done in this exploratory study. Further, this may not be necessary when using the

more recent wind-speed models.) The scatter in Figure 9.2b is the bias-corrected ‘aleatory’

uncertainty in the wind-speed predictions.
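A minimal sketch of the linear bias correction is given below. It assumes that the residual enters Equation 9.1 additively, so that adding the fitted trend to the prediction is equivalent to subtracting it from the residual; the function names are illustrative only.

```python
def fit_linear_trend(d_km, eps):
    """Ordinary least-squares fit eps ~ intercept + slope * d."""
    n = len(d_km)
    mean_d = sum(d_km) / n
    mean_e = sum(eps) / n
    sxx = sum((d - mean_d) ** 2 for d in d_km)
    sxy = sum((d - mean_d) * (e - mean_e) for d, e in zip(d_km, eps))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_d
    return intercept, slope

def bias_corrected_residuals(d_km, eps):
    """Remove the fitted distance trend from the residuals."""
    intercept, slope = fit_linear_trend(d_km, eps)
    return [e - (intercept + slope * d) for d, e in zip(d_km, eps)]
```

By construction, the corrected residuals have zero mean and no remaining linear trend with distance, although local trends of the kind noted above can persist.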

The histogram and the normal quantile-quantile (QQ) plot of the ε (the normal QQ plot

is estimated after dividing the ε’s by their standard deviation) are shown in Figure 9.3. The

figure shows that the residuals have a heavier upper tail than the normal distribution. But

normality holds reasonably well until a normalized ε value of 2. In the rest of the chapter,

the residuals are assumed to follow a normal distribution for simplicity during simulation,

though this assumption should be verified using data from other recorded hurricanes and,

particularly, using newer wind-speed models. The ε’s have mean zero (on account of the

bias correction) and standard deviation 0.15, which roughly agrees with the coefficient of

variation of 0.1 reported by Vickery et al. [2008] (noting that the coefficient of variation of

the multiplicative error term defined by Vickery et al. [2008] is comparable to the standard

deviation of ε, if the ε’s reasonably follow a normal distribution). This standard deviation is

much smaller than that of the total residuals computed from ground-motion fields (∼ 0.6),

but is not negligible.

Figure 9.3: (a) Histogram of bias-corrected residuals estimated using the Hurricane Jeanne data (b) Normal QQ plot of normalized bias-corrected residuals from Hurricane Jeanne.
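The points of a normal QQ plot such as the one discussed above can be computed as sketched below. The plotting positions (i − 0.5)/n are one common convention and are an assumption here, as is the centring on the sample mean (which is already zero after the bias correction).

```python
from statistics import NormalDist, mean, pstdev

def normal_qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal QQ plot.

    Residuals are standardized by their mean and standard deviation and
    sorted; theoretical quantiles come from the standard normal distribution.
    """
    n = len(residuals)
    mu, sd = mean(residuals), pstdev(residuals)
    sample = sorted((r - mu) / sd for r in residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))
```

A heavier-than-normal upper tail shows up as sample quantiles lying above the 1:1 line at large theoretical quantiles.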

Figure 9.4 shows the semivariogram computed using the ε’s. The experimental semi-

variogram values are fitted using the Gaussian model shown in Equation 9.6, with a range

of 170km. The Gaussian fit is more appropriate here (as compared to the exponential fit

used for earthquake ground-motion intensity semivariograms) on account of the smoothly-

varying wind-speed residual field. The range of 170km is chosen so that the fit is better

at short distances (≤ 20km), even if this requires some misfit with empirical data at large

separation distances. As described in Chapter 3, this is because it is more important to

model the semivariogram structure well at short separation distances. Large separation dis-

tances are associated with low correlations, which thus have relatively little effect on joint

distributions of ground motion intensities. In addition to having low correlation, widely

separated sites also have little impact on each other due to an effective ‘screening’ of their

influence by more closely-located sites [Goovaerts, 1997]. It can be seen from Figure 9.4

that the extent of spatial correlation is much larger than what was seen from the earthquake

data [Jayaram and Baker, 2009a]. This is not surprising since hurricane wind speeds are

less influenced by factors such as local heterogeneities that reduce the spatial correlation in

ground-motion fields.

Figure 9.4: Semivariogram of bias-corrected residuals estimated using the Hurricane Jeanne data.

9.4.3 Hurricane Frances (2004)

Hurricane Frances formed on August 24, 2004 and made its landfall in Florida on Septem-

ber 4. In this study, recorded hurricane and wind-field data collected between September

4-6 are used for the analysis. The hurricane track data and the recorded wind speeds are

obtained from the HURDAT database and the HRD respectively. The peak wind speed

predictions are obtained using the Batts et al. [1980] model. Figure 9.5 shows the bias

corrected residuals obtained from the Hurricane Frances recordings.


Figure 9.5: Bias-corrected residuals estimated using the Hurricane Frances data.

Figure 9.6: (a) Histogram of bias-corrected residuals estimated using the Hurricane Frances data (b) Normal QQ plot of normalized bias-corrected residuals from Hurricane Frances.

Figure 9.7: Semivariogram of bias-corrected residuals estimated using the Hurricane Frances data.

Figure 9.6 shows the histogram and the normal QQ plot of the ε’s. The QQ plot in-

dicates that the residuals are more heavy tailed than the normal distribution beyond plus-

minus two standard deviations. The ε’s have mean zero (on account of the bias correc-

tion) and standard deviation 0.13. Figure 9.7 shows the semivariogram computed using the

ε’s. The experimental semivariogram values are fitted using the Gaussian model shown in

Equation 9.6, with a range of 130km.

9.4.4 Hurricane risk assessment of a hypothetical portfolio of buildings

This section describes the simulation-based hurricane risk assessment of a hypothetical

portfolio of buildings, and illustrates the importance of modeling the uncertainties and the

spatial correlation in the wind fields for obtaining accurate risk estimates. The portfolio

considered here consists of five two-story residential buildings (gable roof, 6d roof sheathing

nails, shingle roof cover, wood frames, two-nailed roof/wall connections, no garage)

located in Palm Bay, Florida at coordinates [-80.6,28], [-80.7,28], [-80.5,27.9], [-80.6,27.9]

and [-80.5,27.8] (Figure 9.8). The replacement value of each building is assumed to be

$1,000,000. It is of interest in this study to evaluate the exceedance rates of post-Hurricane

Jeanne losses to this portfolio. The steps involved in the risk assessment procedure are

described below.

Figure 9.8: Portfolio of five residential buildings considered in the risk assessment.

• Step 1: Using the wind-speed model of Batts et al. [1980], the parameters of Hur-

ricane Jeanne obtained from HURDAT, and the bias correction of Section 9.4.2, the

wind speeds are predicted at all the sites of interest.

• Step 2: The residuals (ε in Equation 9.1) at the sites of interest are assumed to follow

a multivariate normal distribution with mean zero and standard deviation 0.15, based

on the findings in this work. The spatial correlation is defined by the Gaussian model

in Equation 9.6 with a range of 150km. The residuals are simulated at the sites of

interest using this distribution. The simulation approach is described in detail in

Chapter 4.

• Step 4: The predicted wind speeds and the simulated residuals are combined using

Equation 9.1 to obtain realizations of the wind speeds at all the sites of interest (i.e.,

a simulated wind field).

• Step 5: The building losses due to each simulated wind field are evaluated using

damage functions provided by HAZUS [2006]. These damage functions provide an

estimate of the mean damage ratio (ratio of loss sustained to the replacement cost) as

a function of the peak gust wind speed experienced during the hurricane (in this case,


the simulated peak wind speed at the building site). It is to be noted that the damage

functions provided by HAZUS [2006] are deterministic [Vickery et al., 2009a]. For

the purposes of this exploratory study these deterministic damage functions are used,

but more realistic damage functions can be used with this framework if desired.

• Step 6: Obtain the probability of exceedance of various monetary loss values. The

exceedance probabilities are calculated as follows:

\[ P(L \ge l) = \frac{1}{n} \sum_{i=1}^{n} I(L_i \ge l) \tag{9.8} \]

where P(L ≥ l) is the probability that the loss exceeds l, n denotes the number of

simulated wind fields, Li is the monetary loss associated with wind field i, and I(Li ≥ l) is an indicator variable that equals one if Li exceeds l and zero otherwise.

It is to be noted that the steps described above do not include steps 1 and 2 listed

in Section 3.2 since the risk assessment carried out post-Hurricane Jeanne does not

require the simulation of hurricane paths and other hurricane-related parameters. (The

recorded Hurricane Jeanne parameters are directly used.)
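The simulation steps above can be sketched as follows, under several loudly stated assumptions: Equation 9.1 is taken to be additive in the logarithm of wind speed, the damage curve is a hypothetical placeholder rather than the HAZUS [2006] damage function, the predicted wind speeds passed in are illustrative inputs, and the small diagonal jitter in the Cholesky factorization is a numerical safeguard that is not part of the method described here.

```python
import math
import random

def project_km(sites_lonlat):
    """Project (lon, lat) degrees to planar km about the mean latitude."""
    ref = math.radians(sum(lat for _, lat in sites_lonlat) / len(sites_lonlat))
    k = 6371.0 * math.pi / 180.0  # km per degree on a spherical Earth
    return [(lon * k * math.cos(ref), lat * k) for lon, lat in sites_lonlat]

def correlation_matrix(sites_lonlat, range_km):
    """Correlations from Equations 9.6 and 9.7 with unit sill."""
    xy = project_km(sites_lonlat)
    return [[math.exp(-3.0 * math.dist(p, q) ** 2 / range_km ** 2) for q in xy]
            for p in xy]

def cholesky(matrix, jitter=1e-10):
    """Lower-triangular factor; the diagonal jitter is a numerical safeguard."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s + jitter) if i == j else s / low[j][j]
    return low

def placeholder_damage_ratio(speed):
    """Hypothetical smooth damage curve; NOT the HAZUS damage function."""
    return 1.0 / (1.0 + math.exp(-(speed - 60.0) / 8.0))

def simulate_losses(sites, predicted, value, n_sims, sigma=0.15,
                    range_km=150.0, correlated=True, seed=0):
    """Portfolio loss for each simulated wind field."""
    rng = random.Random(seed)
    n = len(sites)
    low = cholesky(correlation_matrix(sites, range_km)) if correlated else None
    losses = []
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if low is not None:
            # Impose the spatial correlation on the independent normals.
            z = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        # Assumed form of Equation 9.1: ln(speed) = ln(predicted) + residual.
        speeds = [v * math.exp(sigma * zi) for v, zi in zip(predicted, z)]
        losses.append(sum(value * placeholder_damage_ratio(s) for s in speeds))
    return losses

def exceedance_probability(losses, threshold):
    """Equation 9.8: fraction of simulated losses at or above the threshold."""
    return sum(1 for loss in losses if loss >= threshold) / len(losses)
```

Calling `simulate_losses` twice with the same inputs, once with `correlated=True` and once with `correlated=False`, and evaluating `exceedance_probability` over a grid of thresholds gives the kind of comparison discussed for this portfolio; the numerical values depend entirely on the placeholder inputs.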

The exceedance probabilities obtained for the portfolio are shown in Figure 9.9. Also

shown in the figure are the exceedance probabilities obtained by ignoring the spatial corre-

lation between the residuals when performing the simulation in Step 2. It can be seen that

ignoring the spatial correlation results in an overestimation of the probability of exceeding

small losses and an underestimation of the probability of exceeding large losses. The ex-

tent of the overestimation and the underestimation will be smaller if the uncertainties in the

damage function are considered, but the risk estimates will nevertheless be inaccurate. Sev-

eral past risk assessments have completely ignored the uncertainty in the wind fields (i.e.,

predicted wind speeds are used, and the residuals are ignored). The loss estimate obtained

in this deterministic case is shown by the vertical line in Figure 9.9. It is seen that the loss

estimate at moderately large probabilities of exceedance can be significantly smaller than

some of the probable loss estimates obtained when the residuals and the correlations are

considered.


Figure 9.9: Portfolio loss exceedance probabilities.

9.5 Limitations and research needs

This section lists some of the challenges and needs in hurricane risk assessment research,

and discusses the limitations of the approach proposed in this chapter.

One of the primary concerns in developing empirical models related to hurricane wind

speeds is the availability of reliable wind speed data from past hurricanes for use in model

development. The HRD H*Wind program partly alleviates the concerns by processing data

from a multitude of data sources including from low flying aircraft, ships, buoys, airport

observations and other public and private data sources. In addition, the use of dropwind-

sondes improves the overall data quality over sea. Nevertheless, boundary layer models are

used to convert the collected data to a common framework for height of recording (10m),

exposure (open terrain) and averaging period (maximum sustained 1 minute wind speed),

and interpolation algorithms (SAFER) are used to estimate wind speeds over a grid of

points. Therefore, the wind fields developed by HRD are not entirely empirical, but rather

involve the use of additional algorithms which can have an impact on the data quality.

The current study uses hurricane recordings from only two hurricanes for quantifying

the hurricane wind-speed uncertainties and spatial correlations. The Batts et al. [1980]


wind-speed model was chosen primarily for its simplicity, but it has known limitations.

For instance, the model does not consider the reduction in wind speeds attributable to land

friction, but rather assumes a constant 15% reduction in the wind speed when the hurri-

cane enters the land from the sea. This is likely to result in correlated prediction errors

at neighboring sites with similar levels of land friction, which will increase the estimated

value of spatial correlation. In the future, the uncertainties and the spatial correlations need

to be estimated using data from additional hurricanes, using a newer and a more rigorous

wind-speed model such as that of Vickery et al. [2008].

In this study, a deterministic damage function obtained from HAZUS [2006] was used

for the illustrative hurricane risk assessment. A probabilistic damage function that captures

the uncertainties in the losses during a hurricane should be used in future works. This

will give a better estimate of the importance of considering wind-field uncertainties and

spatial correlation in hurricane risk assessments. The illustrative risk assessment carried

out in this work estimated the risk of a portfolio of buildings. Further research is required

to estimate the hurricane-based risk of lifelines such as transportation networks. The risk

assessment did not involve simulation of hurricane tracks, but rather only estimated the risk

given that Hurricane Jeanne had occurred. The current research only considered the wind

hazard during the hurricanes, and did not consider the flood and storm surge hazards.

9.6 Conclusions

An exploratory study was carried out to investigate the extension of the seismic hazard

and risk assessment concepts and techniques discussed in the earlier chapters to hurricane

hazard and risk modeling. The study focused on quantifying the uncertainties and the spa-

tial correlation in hurricane wind fields (using techniques that were used to quantify these

parameters in earthquake ground motion fields), and evaluating their impact on the hurri-

cane risk of spatially-distributed systems. Hurricane wind-speed predictions were obtained

for two sample hurricanes, Hurricane Jeanne and Hurricane Frances, using the Batts et al.

[1980] wind-speed model, and the uncertainties in these predictions were evaluated us-

ing actual wind-speed recordings. The wind-speed residuals had a standard deviation of

approximately 0.15, indicating that the uncertainties are not negligible. The wind-speed


uncertainties at two sites were seen to be correlated, with the correlation decaying as a

Gaussian function of the separation between the sites.

Finally, the impact of the wind-speed uncertainties and the spatial correlation on the

hurricane risk of a spatially-distributed system was illustrated by a sample risk assessment

of a hypothetical portfolio of buildings. It was seen that ignoring the uncertainties or the

correlations results in an overestimation of the probability of exceedance of small losses

and an underestimation of the probability of exceedance of large losses.

Chapter 10

Conclusions

This study focused on developing a computationally-efficient framework for the seismic

risk assessment of lifelines (infrastructure systems). Two important challenges in the seis-

mic risk assessment of a lifeline as compared to that of a single structure are the quantifi-

cation of the ground-motion hazard over a region rather than at just a single site and the

minimization of the computational burden associated with lifeline performance evaluations

(Figure 10.1). Contributions have been made in both of these areas. The following sub-

sections briefly summarize the important findings of this work, the limitations of this work

and suggested future work related to this thesis.

10.1 Contributions and practical implications

10.1.1 Joint distribution of spectral acceleration values at different sites and/or different periods

Risk assessment of spatially-distributed building portfolios or infrastructure systems re-

quires assumptions regarding the joint distribution of the ground-motion intensity measures

at multiple sites during the same earthquake. Chapter 2 of this thesis discussed statistical

tests that were used to examine the commonly-used assumptions of univariate normality of

logarithmic spectral acceleration values and multivariate normality of vectors of logarith-

mic spectral acceleration values computed at different sites and/or different periods. Joint

normality of logarithmic spectral accelerations was verified by testing the multivariate

normality of inter-event and intra-event residuals. Univariate normality of inter-event and

intra-event residuals was studied using normal Q-Q plots. The normal Q-Q plots showed

strong linearity, indicating that the residuals are well represented by a normal distribution

marginally. No evidence was found to support truncation of the marginal distribution of

intra-event residuals as is sometimes done in PSHA.

Figure 10.1: Comparison of the risk assessment frameworks for (a) single structures and (b) lifelines.

Using the Henze-Zirkler test, Mardia’s test of skewness and Mardia’s test of

kurtosis, it was shown that the inter-event and the intra-event residuals at a site, computed at

different periods, follow multivariate normal distributions. The normality test of Goovaerts

was used to illustrate that pairs of spatially-distributed intra-event residuals can be rep-

resented by the bivariate normal distribution. For a set of observed spatially-distributed

data, it is practically impossible to ascertain the trivariate normality and the normality at

higher dimensions and hence, the presence of univariate and bivariate normalities was con-

sidered to indicate multivariate normality of the spatially-distributed intra-event residuals

[Goovaerts, 1997].

10.1.2 Spatial correlation model for spectral accelerations

The ground-motion models that are used for site-specific hazard analysis do not provide in-

formation on the spatial correlation between ground-motion intensities, which is required

for the joint prediction of intensities at multiple sites. Chapter 3 described a spatial cor-

relation model that has been developed from recorded ground-motion time histories using

geostatistical tools. The correlation decreases with increasing separation between the sites,

and this correlation structure can be modeled using semivariograms. A semivariogram is

a measure of the average dissimilarity between the data, whose functional form, sill and

range uniquely identify the ground-motion correlation as a function of separation distance.

Ground motions observed during the Northridge, Chi-Chi, Big Bear City, Parkfield, Alum

Rock, Anza and Chino Hills earthquakes were used to compute the correlations between

spatially-distributed spectral accelerations, at various spectral periods. It was seen that the

rate of decay of the correlation with separation typically decreases with increasing spec-

tral period. It was reasoned that this could be because long period ground motions at two


different sites tend to be more coherent than short period ground motions, on account of

lesser wave scattering during propagation. It was also observed that, at periods longer than

2 seconds, the estimated correlations were similar for all the earthquake ground motions

considered. At shorter periods, however, the correlations were found to be related to the

site Vs30 values. It was shown that the clustering of site Vs30’s suggests larger correla-

tions between residuals. The work also investigated the commonly-used assumption of

isotropy, and it was seen using the empirical data that the correlations of the Chi-Chi

and Northridge earthquake intensities are isotropic. Based on these findings, a predictive

model was developed that can be used to select appropriate correlation estimates for use in

risk assessment of spatially-distributed building portfolios or infrastructure systems.

Chapter 4 described additional tests that were carried out using simulated ground-

motion time histories [Aagaard et al., 2008] to verify the validity of commonly-used as-

sumptions in spatial correlation models such as stationarity (invariance of correlation with

spatial location) and isotropy (directional independence). The correlations were estimated

using different orientations of the time histories, namely, fault normal, fault parallel, north-

south and east-west, and were found to be similar in all four cases. The assumption of

isotropy of spatial correlations was studied using directional semivariograms, and was

found to be reasonable. The correlations were seen to be smaller than average between

sites located extremely close to the fault rupture. Intuitively, it is reasonable to expect path

effects and small-scale variations to reduce spatial correlation between ground motions at

near-fault sites. The pulse-identification algorithm of Baker [2007a] was used for identify-

ing pulse-like ground motions, and the correlations between pulse-like and non-pulse-like

ground motions were compared. For the data set used, no significant differences were

found between the correlations in these two cases.

10.1.3 Lifeline seismic risk assessment using efficient sampling and data reduction techniques

Chapter 6 discussed an efficient Monte Carlo simulation (MCS)-based framework based

on importance sampling and K-means clustering that has been proposed for the seismic

risk assessment of lifelines. The framework can be used for developing a small, but


stochastically-representative, catalog of ground-motion intensity maps that can be used

for performing lifeline risk assessments. The importance sampling technique was used to

preferentially sample important ground-motion intensity maps, and the K-means clustering

technique was used to identify and combine redundant maps. It was shown theoretically

and empirically that the risk estimates obtained using these techniques are unbiased.

The proposed framework was used to evaluate the exceedance rates of various travel-

time delays on an aggregated form of the San Francisco Bay Area transportation network.

Simplified transportation network analysis models were used to illustrate the feasibility of

the proposed framework. The exceedance rates were obtained using a catalog of 150 maps

generated using the combination of importance sampling and K-means clustering, and were

shown to be in good agreement with those obtained using the conventional MCS method.

Therefore, the proposed techniques can potentially reduce the computational expense of an

MCS-based risk assessment by several orders of magnitude, making it practically feasible.

The study also showed that the proposed framework automatically produces intensity maps

that are hazard consistent. Finally, the study showed that the uncertainties in ground-motion

intensities and the spatial correlations between ground-motion intensities at multiple sites

need to be modeled in order to avoid introducing errors into the lifeline risk calculations.

Appendix C described lifeline loss deaggregation calculations that were used to identify

the ground-motion scenarios most likely to produce exceedance of a given loss threshold

for a spatially-distributed lifeline system. Deaggregation calculations were performed to

identify the likelihoods of earthquake events that cause various levels of travel time delays

(the lifeline loss measure) in an aggregated form of the San Francisco Bay Area transportation

network. The deaggregation calculations indicated that the ‘most-likely’ scenario depends

on the loss level of interest, and is influenced by factors such as the seismicity of

the region, the location of the lifeline with respect to the faults and the performance state

of the various components of the lifeline under normal operating conditions. The calcu-

lations also showed that large losses are typically caused by moderately large magnitude

events with large values of inter-event and intra-event residuals, indicating the importance

of accounting for the residuals in the loss assessment framework. Loss assessments carried

out without accounting for either the inter-event or the intra-event residuals produce biased


loss estimates.

10.1.4 Lifeline performance assessment using statistical learning techniques

MCS and its variants are well suited for characterizing ground motions and computing re-

sulting losses to lifelines, but are highly computationally intensive because they involve

repeated evaluations of lifeline performance under a large number of simulated ground-

motion intensity maps. Chapter 7 explored the use of a statistical learning technique termed

Multivariate Adaptive Regression Trees (MART) to obtain an approximate relationship be-

tween the ground-motion intensities at lifeline component locations and the lifeline per-

formance. The lifeline performance predicted by this relationship can be used in place of

the actual lifeline performance (the evaluation of which is intensive) to expedite the com-

putation of several lifeline risk-related parameters. The study illustrated this approach by

developing a MART-based relationship between the ground-motion intensities at bridge lo-

cations and the network travel times in the San Francisco Bay Area transportation network,

and using it for estimating confidence intervals for the risk estimates presented in Chapter

6. It was seen that the confidence intervals obtained using the actual and the approxi-

mate performance measures match well. More generally, these approximate performance

relationships can be used in other problems such as prioritizing lifeline retrofits, whose

computational demand stems from the need for repeated performance evaluations.

10.1.5 Seismic risk assessment of spatially-distributed systems using ground-motion models fitted considering spatial correlation

Even though the risk estimates were obtained in Chapter 6 considering spatial correla-

tion, the ground-motion models that were used to predict the distribution of ground-motion

intensities were fitted assuming independence between the intra-event residuals. Chapter

8 illustrated the impact of considering spatial correlation between intra-event residuals

while developing ground-motion models. The mixed-effects algorithm of Abrahamson


and Youngs [1992], which assumes independence between intra-event residuals, was mod-

ified to account for the spatial correlation between the intra-event residuals. This was done

by changing the likelihood function used for estimating the inter-event and the intra-event

residual variances given other model coefficients and changing the estimate of the inter-

event residual given the total residuals at multiple sites. The modified algorithm was used

to refit the Campbell and Bozorgnia [2008] ground-motion model, to illustrate the effect of

this refinement. The variance of the total residuals and the model coefficients used for pre-

dicting the median ground-motion intensity were not significantly affected by the proposed

refinement. Significant changes, however, were seen in the variance of the intra-event and

the inter-event residuals. Incorporating spatial correlation was seen to increase the intra-

event residual variance and to decrease the inter-event residual variance. These changes

have implications for risk assessments of spatially-distributed systems because a smaller

inter-event residual variance implies a lesser likelihood of simultaneously observing larger-

than-median ground-motion intensities at all sites in a region. To demonstrate this effect, a

risk assessment was performed for a hypothetical portfolio of buildings using the ground-

motion models obtained with and without accounting for spatial correlation. The results

showed that using the published inter- and intra-event variance estimates causes an over-

estimation of the exceedance rates of large losses. Ground-motion hazard and seismic risk

calculations at individual locations are unaffected by this issue.
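The effect of the variance split on joint exceedances can be illustrated with a small Monte Carlo sketch. The variance values, the number of sites and the use of spatially independent intra-event residuals are illustrative assumptions made here to isolate the inter-event effect; they are not values from Chapter 8. Only the total variance is held fixed between the two cases.

```python
import math
import random


def prob_all_sites_exceed(tau, sigma, n_sites=10, threshold=0.0,
                          n_sims=20000, seed=1):
    """Estimate P(total residual > threshold at every site in one event).

    tau   : inter-event standard deviation (one draw shared by all sites)
    sigma : intra-event standard deviation (independent across sites here)
    A threshold of 0 corresponds to larger-than-median intensities.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        eta = rng.gauss(0.0, tau)  # inter-event residual for this event
        if all(eta + rng.gauss(0.0, sigma) > threshold for _ in range(n_sites)):
            hits += 1
    return hits / n_sims


total_var = 0.6 ** 2  # hypothetical total variance of ln(Sa)

# Same total variance, different split between inter- and intra-event parts.
p_large_tau = prob_all_sites_exceed(tau=math.sqrt(0.4 * total_var),
                                    sigma=math.sqrt(0.6 * total_var))
p_small_tau = prob_all_sites_exceed(tau=math.sqrt(0.2 * total_var),
                                    sigma=math.sqrt(0.8 * total_var))
```

Under these assumptions the case with the smaller inter-event share gives the lower probability of simultaneous exceedance, while the marginal distribution at any single site is the same in both cases.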

10.1.6 Extension of proposed ground-motion modeling approaches to hurricane risk assessment

Frameworks for the risk assessment of structures and infrastructure systems under a variety

of natural and man-made hazards share many similarities. It is conceivable, therefore, that

the techniques developed for the risk assessment under one type of natural or man-made

hazard will be applicable for the risk assessment under another hazard or multi-hazard sce-

nario. Chapter 9 described an exploratory study carried out to investigate the extension

of the seismic hazard and risk assessment concepts and techniques discussed in the earlier

chapters to hurricane hazard and risk modeling. The study focused on quantifying the un-

certainties and the spatial correlation in hurricane wind fields (using techniques that were


used to quantify these parameters in earthquake ground motion fields), and evaluating their

impact on the hurricane risk of spatially-distributed systems. Hurricane wind-speed predic-

tions were obtained for two sample hurricanes, Hurricane Jeanne and Hurricane Frances,

using the Batts et al. [1980] wind-speed model, and the uncertainties in these predictions

were evaluated using actual wind-speed recordings. The wind-speed residuals had a stan-

dard deviation of approximately 0.15, indicating that the uncertainties are not negligible.

The wind-speed uncertainties at two sites were seen to be correlated, with the correlation

decaying as a Gaussian function of the separation between the sites.
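A Gaussian correlation decay can be sketched as follows. The functional form exp(-3h^2/a^2) is one common geostatistical parameterization and is assumed here; the range value is hypothetical and is not the value fitted in Chapter 9.

```python
import math


def gaussian_correlation(h_km, range_km):
    """Correlation between wind-speed residuals at two sites h_km apart.

    Assumed form: rho(h) = exp(-3 h^2 / a^2), so rho is about 0.05 at h = a.
    """
    return math.exp(-3.0 * (h_km / range_km) ** 2)


# Hypothetical range of 100 km, for illustration only.
rho_near = gaussian_correlation(10.0, 100.0)
rho_far = gaussian_correlation(100.0, 100.0)
```

With this form the correlation stays close to one at short separations and drops quickly as the separation approaches the range.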

Finally, the impact of the wind-speed uncertainties and the spatial correlation on the

hurricane risk of a spatially-distributed system was illustrated by a sample risk assessment

of a hypothetical portfolio of buildings. It was seen that ignoring the uncertainties or the

correlations results in an overestimation of the probability of exceedance of small losses

and an underestimation of the probability of exceedance of large losses.

10.2 Limitations and future work

10.2.1 Spatial correlation model for spectral accelerations

Chapter 3 presented a spatial correlation model for spectral accelerations developed using

recorded ground motions. The model was developed assuming stationarity (location inde-

pendence) of correlations. Tests for stationarity carried out in Chapter 4 using simulated

time histories showed that the correlation between spectral accelerations at two sites that

are close to the rupture (within 10 km) is smaller than the correlation between spectral accelerations at sites that are farther away from the rupture. This is probably because path

effects and small-scale variations near the rupture reduce the spatial correlation between

ground-motion intensities at near-fault sites. This implies that the assumption of stationar-

ity does not completely hold. In the future, it is important to verify this observation using

recorded ground-motion time histories, and develop a correlation model for near-fault sites,

if required. The tests for isotropy indicated that the assumption of isotropy is reasonable

on average. It might be possible that the correlations are stronger along certain directions

in certain locations even if not on average. For instance, it might be reasonable to expect


strong correlation between residuals in the direction of propagation of waves, particularly

at near-fault sites [e.g., Walling, 2009]. This needs to be investigated in more detail in the

future.

While developing the correlation models, priority is placed on building models that fit

the empirical data well at short distances, even if this requires some misfit with empirical

data at large separation distances, because it is more important to model the semivariogram

structure well at short separation distances. This is because widely separated sites also have

little impact on each other due to an effective ’screening’ of their influence by more closely-

located sites (Goovaerts, 1997). In special cases where there are very few closely spaced

points (less than 10, according to Goovaerts, 1997), the influence of farther away points

will not be completely screened. In such cases, the correlation model developed in this

study might provide slightly inaccurate correlation estimates. It is, however, to be noted

that this is mitigated by the fact that the large separation distances are associated with low

correlations, which thus have relatively little effect on joint distributions of ground motion

intensities.

The correlation studies carried out in Chapters 3 and 4 treated the residuals as random

variables. In reality, though, the residuals are related to physical effects that are unmeasured or unaccounted for (completely or partially), such as directivity, basin effects and local site

effects. If these physical effects are directly modeled in the risk assessments (as part of

the mean ground-motion intensity prediction), the models for spatial variability and spatial

correlation should be modified accordingly.

The study in Chapter 3 identified an empirical link between the extent of spatial correlation and the local-site conditions. The links between spatial correlation and other site- and earthquake-related parameters such as magnitude and faulting mechanism were not investigated on account of the limited availability of well-processed recorded ground-motion

data sets. In the future, these links can possibly be investigated using simulated ground

motions.

Chapters 3 and 4 focused on simultaneously estimating the spatial correlations between

spectral accelerations at the same period. There are scenarios where spectral accelerations

at multiple periods (or in general, multiple intensity measures) need to be used for assessing

the lifeline risk, in which case the consideration of spatial cross-correlations (correlations


between two different intensity measures) becomes important. This scenario arises, for

instance, when the risk assessment is carried out for a portfolio comprising structures with different fundamental periods, in which case the risk assessment is based on spectral

accelerations at multiple periods. For instance, damage to tall buildings is better predicted

using spectral accelerations at a long period (say, T1) as compared to short buildings whose

damage due to ground shaking is more correlated with spectral accelerations at a short period (say, T2). In such cases, the spatial cross-correlation between εi(T1) and εj(T2) should

be considered in order to account for the likelihood of observing jointly large spectral ac-

celerations at sites i and j (the same way spatial correlations should be considered when the

spectral accelerations at a single period are used at both sites). Appendix A provided a brief

summary of the technical framework behind the estimation of spatial cross-correlation. In

the future, it is important to develop a cross-correlation model between spectral accelera-

tions at different periods using recorded and simulated time histories.

Chapter 8 described a ground-motion model fitting algorithm that can be used for devel-

oping ground-motion models considering the spatial correlation. This algorithm estimates

the inter-event and the intra-event residual variances using a maximum likelihood frame-

work. Future work can focus on estimating spatial correlation and cross-correlation in

addition to these variances while fitting ground-motion models, by extending the current

maximum likelihood framework to consider the correlation as an unknown parameter as

well.

10.2.2 Lifeline risk assessment

The MCS-based lifeline seismic risk assessment framework proposed in this study was il-

lustrated by assessing the seismic risk of an aggregated model of the San Francisco Bay

Area transportation network. An aggregated network was used because analyzing the per-

formance of a network as large and complex as the San Francisco Bay Area transportation

network under a large number of scenarios (which would be required while implementing the


benchmark conventional MCS framework) is extremely computationally intensive. Net-

work aggregation has been used in several fields of research while assessing the perfor-

mance of large complex networks such as social networks, internet and transportation net-

works. The performance of an aggregated higher-scale network is then used for decision

making on the actual lower-scale network. Future work by the author will further explore

the opportunity to develop new methods to systematically aggregate networks, particularly

for risk assessment purposes, for obtaining significant computational savings.

The importance sampling (IS) technique proposed in Chapter 6 involves sampling large

values of inter-event and intra-event residuals in order to capture the upper tail of the life-

line loss curve accurately. The technique requires determination of the means (referred

to as mean-shifts in Section 6.3.3) of the inter- and intra-event residual sampling distribu-

tions. Large values of the means will result in realizations of large values of inter- and

intra-event residuals (i.e., realizations from the upper tail). It is, however, important to

choose a reasonable value for the means to ensure adequate preferential sampling of large

residuals, while avoiding sets of extremely large residuals that will make the simulated

residuals so improbable as to be irrelevant. Chapter 6 also provided guidance on choosing

the sampling means based on the network in consideration. Using larger or smaller than

optimal means could increase the variance of the risk estimates, and additional research is

required to investigate this effect in detail. It is to be noted that the K-means data reduction

algorithm (applied after the importance sampling step as described in Section 6.5) elimi-

nates redundant importance sampling maps and therefore, has the potential to retain a good

proportion of maps that are relevant to the risk assessment even if IS samples inefficiently.

This, however, needs to be verified in detail through additional research.
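The role of the mean-shift can be illustrated with a scalar analogue: estimating a small tail probability of a standard normal variable by sampling from a normal distribution with a shifted mean and reweighting each sample by the ratio of the target density to the sampling density. The threshold, the shift and the sample size below are illustrative assumptions, not the values used in Chapter 6.

```python
import math
import random


def normal_pdf(x, mean=0.0, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def is_tail_probability(threshold, mean_shift, n_samples=20000, seed=7):
    """Estimate P(X > threshold), X ~ N(0, 1), by sampling from N(mean_shift, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mean_shift, 1.0)
        if x > threshold:
            # Importance weight: target density / sampling density.
            total += normal_pdf(x) / normal_pdf(x, mean=mean_shift)
    return total / n_samples


# Reference value of P(X > 3) for a standard normal variable.
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
estimate = is_tail_probability(threshold=3.0, mean_shift=3.0)
```

Choosing `mean_shift` far below the threshold leaves few samples in the tail, while choosing it far above produces samples with very small weights; both choices tend to increase the variance of the estimate, which is the trade-off described above.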

The current study did not consider the dependence between component performances,

which may arise between two components constructed by the same contractor (similar

workmanship and material quality) and/or subjected to similar material degradation

due to natural environmental fluctuations over time [Lee and Kiremidjian, 2007]. The

current study also did not consider the deterioration of the structural performance of lifeline

components with time.

The risk assessment framework is reasonably general, and can potentially be used to

estimate the risk of a variety of lifelines. The current work, however, only considered a


single isolated infrastructure system (a transportation network). There has been significant

research interest of late in the risk assessment of multiple infrastructure systems (such as

power distribution networks and water distribution networks), with consideration of the in-

terdependencies between the different systems. MCS-based risk assessment frameworks

are conducive to modeling interdependencies, but additional research is required to inves-

tigate this further.

This study used a rate-independent earthquake hazard model provided by USGS [2003]. Background seismicity was not considered. Additional research is required to incorporate rate-dependent models and background seismicity in the ground-motion sampling procedure described in Chapter 6. Further, future studies by the author will use seismicity models

provided in more recent USGS reports.

The primary objective of the transportation network risk assessment described in Chap-

ter 6 is to illustrate the effectiveness of the proposed efficient risk assessment framework. A

simplified transportation network model was used for evaluating pre-event and post-event

network performances. These simplifications are identified below.

The current work assumed for simplicity that the post-earthquake demands equal the

pre-earthquake demands, even though this is known not to be true [Kiremidjian et al.,

2003]. The changes in network performance after an earthquake were assumed to be due

only to the delay and the rerouting of traffic caused by structural damage to bridges. The

damage states of the bridges were computed considering only the ground shaking, and other

possible damage mechanisms such as liquefaction and landslides were not considered. The

development of the cross-correlation model discussed earlier will allow the consideration of

multiple types of intensity measures that are required for estimating the damage considering

secondary hazards such as liquefaction and landslides.

The bridge fragility curves provided by HAZUS [1999] were used to estimate the prob-

ability of a bridge being in a particular damage state (no damage, slight damage, moderate

damage, extensive damage and collapse) based on the simulated ground-motion intensity

(spectral acceleration at 1 second) at the bridge site. It was assumed that the bridge fragility

functions can be used to analyze even long-span bridges such as the Golden Gate Bridge,

which may not provide realistic results. Ongoing work by the author focuses on devel-

oping a procedure to incorporate the use of ground-motion time histories (instead of only


ground-motion intensities) in the risk assessment framework.
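The step from a simulated intensity to damage-state probabilities can be sketched as follows, assuming lognormal fragility curves that share a single dispersion. The median values and the dispersion below are hypothetical placeholders; actual values depend on the HAZUS bridge class.

```python
import math


def lognormal_cdf(x, median, beta):
    """P(X <= x) for a lognormal variable with the given median and log-std beta."""
    return 0.5 * (1.0 + math.erf(math.log(x / median) / (beta * math.sqrt(2.0))))


def damage_state_probabilities(sa1, medians, beta=0.6):
    """Probability of each damage state given Sa(1 s) in g.

    medians : median Sa(1 s) for reaching or exceeding the slight, moderate,
              extensive and collapse states, in increasing order.
    """
    exceed = [lognormal_cdf(sa1, m, beta) for m in medians]  # P(DS >= state)
    names = ["none", "slight", "moderate", "extensive", "collapse"]
    bounds = [1.0] + exceed + [0.0]
    # P(DS = k) = P(DS >= k) - P(DS >= k + 1)
    return {name: bounds[k] - bounds[k + 1] for k, name in enumerate(names)}


# Hypothetical medians (g), for illustration only.
probs = damage_state_probabilities(0.5, medians=[0.35, 0.45, 0.55, 0.80])
```

A damage state for each bridge can then be sampled from these probabilities for every simulated intensity map.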

Damaged bridges cause reduced capacity in the link containing the bridge. The study

assumed the reduced capacities corresponding to the five different HAZUS damage states to be 100% (no damage), 75% (slight damage/moderate damage) and 50% (extensive damage/collapse). (It is to be noted that the current study did not model the possible

increase to the free-flow travel times on damaged links.) The non-zero capacity corre-

sponding to the bridge collapse damage state may seem surprising at first glance. This was

based on the argument that there are alternate routes (apart from the freeways and high-

ways considered in the aggregated model used in this study) that provide reduced access to

transportation services in the event of a freeway or a highway closure [Shiraki et al., 2007].

Such redundancies are prevalent in most transportation networks, but the precise impact of

the redundancy on the capacity of the links in the aggregated model should be studied in

more detail in the future.

A network can have several bridges in a single link, and in such cases, the link capacity

is a function of the damage to all the bridges in the link. The current work assumed that the

link capacity reduction equals the average of the capacity reductions attributable to each

bridge in the link. This is a simplification, and further research is needed to handle the

presence of multiple bridges in a link. The post-earthquake network performance was then

computed by solving the user-equilibrium problem using the new set of link capacities,

and a new estimate of the total travel time in the network was obtained. It is to be noted

that the current work estimated the performance of the network only immediately after an

earthquake. The changes in the performance with network component restorations were

not considered here for simplicity.
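The capacity assumptions described above (residual capacities of 100%, 75% and 50% by damage state, and averaging of the reductions over the bridges in a link) can be sketched as follows. The function name and the undamaged capacity of 4000 vehicles per hour are hypothetical.

```python
# Residual link capacity (fraction of undamaged capacity) per HAZUS damage
# state, as assumed in this study.
RESIDUAL_CAPACITY = {
    "none": 1.00,
    "slight": 0.75,
    "moderate": 0.75,
    "extensive": 0.50,
    "collapse": 0.50,
}


def link_capacity(undamaged_capacity, bridge_damage_states):
    """Post-earthquake capacity of a link given the damage states of its bridges."""
    if not bridge_damage_states:
        return undamaged_capacity  # no bridges on the link: capacity unchanged
    reductions = [1.0 - RESIDUAL_CAPACITY[s] for s in bridge_damage_states]
    mean_reduction = sum(reductions) / len(reductions)
    return undamaged_capacity * (1.0 - mean_reduction)


# Three bridges on one link: reductions of 0%, 25% and 50% average to 25%.
capacity = link_capacity(4000.0, ["none", "moderate", "collapse"])
```

The resulting link capacities are the inputs to the user-equilibrium calculation that yields the post-earthquake travel time.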

It is to be noted that the framework can be used with more accurate and rigorous trans-

portation network models, if desired, but more work is needed to study and overcome

challenges that may arise.

10.2.3 Risk management

One of the important goals of lifeline risk assessment is risk management, by, for example,

retrofitting lifeline components in order to reduce the adverse impact of the earthquakes on


the lifeline. Prioritizing lifeline retrofit is extremely computationally intensive due to the

numerous components present in a lifeline, and on account of the need to evaluate the per-

formance of each possible retrofit scheme under several possible future earthquake scenar-

ios. The computational demand can be reduced using the efficient MCS-based framework

proposed in this work (Chapter 6), in combination with the use of statistical learning tech-

niques to efficiently model network performance (Chapter 7). This needs to be explored in

the future.

10.2.4 Multi-hazard risk assessment

In order to illustrate the application of the proposed ground-motion modeling techniques

to modeling other hazards, the seismic hazard and risk assessment concepts and techniques

discussed in this thesis were applied to hurricane hazard and risk modeling in Chapter 9.

(It is to be noted that the hurricane was the only alternate hazard considered in this work.

It is possible that the challenges that arise in extending the seismic hazard concepts to

modeling other hazards may vary from one hazard to another, and further research is needed

to investigate this.) Many simplifying modeling assumptions were made in this exploratory

study on hurricane risk assessment. The primary simplifying assumptions include the use

of data from only two hurricanes, the use of the simplified Batts et al. [1980] model (which

does not consider the reduction in wind speeds attributable to land friction) and the use of

the deterministic HAZUS [2006] fragility function. There are also some concerns about the quality of the wind-speed recordings provided by the NOAA Hurricane Research Division H*Wind program. A more detailed discussion of the limitations and potential future work connected to the current study can be found in Chapter 9.

10.3 Concluding remarks

The study quantified the distribution of earthquake ground-motion intensities over a re-

gion, which is required for the risk assessment of lifelines. A computationally-efficient

Monte Carlo sampling technique was proposed to evaluate the lifeline seismic risk with

full consideration of the uncertainties and the spatial correlation in ground-motion fields.


Given the effectiveness of the framework when applied to the simplified lifeline model used

here, future research appears warranted to study its use with more realistic lifeline models,

and extend it to quantify the risk of multiple interdependent infrastructure systems under

other hazard and multi-hazard scenarios. Further research is also necessary to utilize the

framework for prioritizing risk-mitigation solutions.

Appendix A

Characterizing spatial cross-correlation between ground-motion spectral accelerations at multiple periods

N. Jayaram and J.W. Baker (2010). Spatial cross-correlation between ground-motion in-

tensities, 9th U.S. National and 10th Canadian Conference on Earthquake Engineering,

Toronto, Canada.

A.1 Abstract

Quantifying ground-motion shaking over a spatially-distributed region rather than at just

a single site is of interest for a variety of applications relating to risk of infrastructure or

portfolios of properties. The risk assessment for a single structure can be easily performed

using the available ground-motion models that predict the distribution of the ground-motion

intensity at a single site due to a given earthquake. These models, however, do not provide

information about the joint distribution of ground-motion intensities over a region, which

is required to quantify the seismic hazard at multiple sites. In particular, the ground-motion

models do not provide information on the correlation between the ground-motion intensi-

ties at different sites during a single event.

Researchers have previously estimated the correlations between residuals of spectral


accelerations at the same spectral period at two different sites. But there is still not much

knowledge about cross-correlations between residuals of spectral accelerations at different

periods (or more generally between residuals of two different intensity measures) at two

different sites, which becomes important, for instance, when assessing the risk of a portfolio

of buildings with different fundamental periods. Spatial cross-correlations are also impor-

tant when assessing the risk due to multiple ground-motion effects such as ground shaking

and liquefaction, because this involves the use of multiple types of intensity measures. This

manuscript summarizes recent research in ground-motion spatial cross-correlation estima-

tion using geostatistical tools. Recorded ground-motion intensities are used to compute

residuals at multiple periods, which are then used to estimate the spatial cross-correlation.

These cross-correlation estimates can then be used in risk assessments of portfolios of struc-

tures with different fundamental periods, and in assessing the seismic risk under multiple

ground-motion effects.

A.2 Introduction

Quantifying ground-motion shaking over a spatially-distributed region rather than at just

a single site is of interest for a variety of applications relating to risk of infrastructure or

portfolios of properties. For instance, the knowledge about ground-motion shaking over a

region is important to predict (or estimate after an earthquake) the monetary losses associ-

ated with structures insured by an insurance company, the number of casualties in a certain

area and the probability that lifeline networks for power, water, and transportation may be

interrupted. The risk assessment for a single structure requires only the quantification of

seismic hazard at a single site, which can be easily done using probabilistic seismic hazard

analysis (PSHA). The hazard is typically measured in terms of an intensity measure such as

the spectral acceleration corresponding to the building’s fundamental period (peak response

of simple single-degree-of-freedom (SDOF) oscillators with the same fundamental period

as the real structure) when the damage to a building is to be estimated. Other ground-motion

parameters such as the peak ground acceleration (PGA) or peak ground velocity (PGV) are

used for other applications such as the prediction of liquefaction of saturated sandy soil

or the response of buried pipelines. The hazard assessment procedure uses ground-motion


models that have been developed to predict the distribution of the ground-motion intensity

at a single site after a given earthquake. These models, however, do not provide informa-

tion on the joint distribution of ground-motion intensities over a region, which is required

to quantify the seismic hazard at multiple sites such as for lifeline risk assessment. In par-

ticular, the ground-motion models do not provide information on the correlation between

the ground-motion intensities at different sites during a single event.

In general, the ground-motion intensities at two sites are expected to be correlated for

a variety of reasons, such as a common source earthquake (whose unique properties may

cause correlations in ground motions at many sites), similar locations relative to fault asperities,

similar wave propagation paths, and similar local-site conditions. Modern ground-motion

models partially account for the correlation via a specific inter-event term as follows:

$$\ln(Sa_i(T)) = \ln(\overline{Sa}_i(T)) + \sigma_i(T)\,\varepsilon_i(T) + \tau_i(T)\,\eta_i(T) \tag{A.1}$$

where $Sa_i(T)$ denotes the spectral acceleration at period $T$ at site $i$; $\overline{Sa}_i(T)$ denotes the predicted (by the ground-motion model) median spectral acceleration (which depends on parameters such as magnitude, distance, period and local-site conditions); $\varepsilon_i(T)$ denotes the normalized intra-event residual at site $i$ associated with $Sa_i(T)$; and $\eta_i(T)$ denotes the normalized inter-event residual associated with $Sa_i(T)$. Both $\varepsilon_i(T)$ and $\eta_i(T)$ are random variables with zero mean and unit standard deviation. The standard deviations, $\sigma_i(T)$ and $\tau_i(T)$, are estimated as part of the ground-motion model and are functions of the spectral period ($T$) of interest, and in some models also functions of the earthquake magnitude and the distance of the site from the rupture. The term $\sigma_i(T)\varepsilon_i(T)$ is called the intra-event residual, and the term $\tau_i(T)\eta_i(T)$ is called the inter-event residual.
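Equation A.1 can be read as a recipe for simulating intensities at several sites for one earthquake. The sketch below assumes a single inter-event residual shared by all sites in the event, equal standard deviations at all sites and independent intra-event residuals (that is, without the spatial correlation discussed in this appendix). The median values and standard deviations are hypothetical.

```python
import math
import random


def simulate_ln_sa(ln_median, sigma, tau, rng):
    """Draw ln(Sa) at several sites for one earthquake following Eq. A.1.

    ln_median : list of ln(median Sa) per site from a ground-motion model
    sigma, tau: intra- and inter-event standard deviations (same at all sites)
    """
    eta = rng.gauss(0.0, 1.0)  # normalized inter-event residual, shared by all sites
    # One independent normalized intra-event residual per site.
    return [m + sigma * rng.gauss(0.0, 1.0) + tau * eta for m in ln_median]


rng = random.Random(0)
ln_median = [math.log(0.2), math.log(0.3), math.log(0.1)]  # hypothetical medians (g)
ln_sa = simulate_ln_sa(ln_median, sigma=0.5, tau=0.3, rng=rng)
sa = [math.exp(v) for v in ln_sa]
```

Replacing the independent intra-event draws with spatially correlated ones is the refinement that the remainder of this appendix supports.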

Though the ground-motion models partly account for the correlation via ηi, the εi’s still

show a significant amount of residual correlation. Researchers have previously estimated

the correlations between residuals of spectral accelerations at the same spectral period (e.g.,

between εi(T) and εj(T)) using recorded ground motions [e.g., Boore et al., 2003, Wang

and Takada, 2005, Goda and Hong, 2008, Jayaram and Baker, 2009a]. These models have

shown that the spatial correlation decays with site separation distance between sites i and

j, and that the rate of decay is a function of the spectral period. These works, however, do


not investigate the nature of the spatial cross-correlation between residuals of two different

intensity measures at two different sites (e.g., between εi(T1) and εj(T2)).

Considering spatial correlation in risk analysis is important because correlation between

residuals can lead to large ground-motion intensities over a spatially-extended area. Recent

research has shown that ignoring spatial correlations can significantly underestimate the

seismic risk of portfolios of buildings and of other lifelines such as transportation networks

[e.g., Park et al., 2007, Jayaram and Baker, 2010]. For instance, Figure A.1 shows the

exceedance rates of earthquake-induced travel-time delays in the San Francisco Bay Area

transportation network estimated by Jayaram and Baker [2010] while considering/ignoring

spatial correlation. This figure shows that the likelihood of observing large delays gets sig-

nificantly underestimated when spatial correlations are ignored. Spatial cross-correlations

are equally important when multiple intensity measures are used for assessing the system

risk. This arises, for instance, when predicting damage to a portfolio of structures whose

individual damage states are predicted using spectral accelerations at multiple periods. Spatial cross-correlations are also important when secondary effects such as landslides and liquefaction are considered apart from ground shaking. For instance, according to HAZUS [1997], the susceptibility of soil to liquefy is a function of the peak ground acceleration (i.e.,

Sa(0)) at the site, which might be different from the primary intensity measure (Sa(T )) of

interest.

This manuscript summarizes recent research in ground-motion spatial cross-correlation

estimation using geostatistical tools. Recorded ground-motion intensities are used to compute residuals at multiple periods, which are then used to estimate the spatial cross-correlation. These cross-correlation estimates can then be used in risk assessments of portfolios

of structures with different fundamental periods, and in assessing the seismic risk under

multiple earthquake effects.

A.3 Statistical Estimation of Spatial Cross-Correlation

In this study, geostatistical tools are used to estimate the spatial cross-correlations using

recorded ground-motion data from the Pacific Earthquake Engineering Research (PEER)

Center’s Next Generation Attenuation (NGA) ground-motion library.


Figure A.1: (a) The San Francisco Bay Area transportation network and (b) Annual exceedance rates of various travel time delays on that network (results from Jayaram and Baker [2010]).

The first step involved in developing an empirical cross-correlation model using recorded ground-motion time histories is to use the time histories to compute the corresponding ground-motion intensities ($\{\mathbf{Sa}(T_1), \mathbf{Sa}(T_2), \cdots, \mathbf{Sa}(T_m)\}$) and the associated normalized residuals ($\{\boldsymbol{\varepsilon}(T_1), \boldsymbol{\varepsilon}(T_2), \cdots, \boldsymbol{\varepsilon}(T_m)\}$) using a ground-motion model. The cross-correlation structure of the residuals can then be represented by a ‘cross-semivariogram’, which is a measure of the average dissimilarity between the data [Goovaerts, 1997]. Let $u$ and $u'$ denote two sites separated by $\mathbf{h}$. The cross-semivariogram ($\gamma(u,u')$) is defined as follows:

$$\gamma(u,u') = \frac{1}{2} E\left[\{\varepsilon_u(T_1) - \varepsilon_{u'}(T_1)\}\{\varepsilon_u(T_2) - \varepsilon_{u'}(T_2)\}\right] \tag{A.2}$$

The cross-semivariogram defined in equation A.2 is location-dependent and its inference requires repetitive realizations of $\boldsymbol{\varepsilon}(T_1)$ and $\boldsymbol{\varepsilon}(T_2)$ at locations $u$ and $u'$. Such repetitive measurements are, however, never available in practice (e.g., in the current application, one would need repeated observations of ground-motion intensities at every pair of sites of interest). Hence, it is typically assumed that the cross-semivariogram does not depend on site locations $u$ and $u'$, but only on their separation $\mathbf{h}$ to obtain a stationary cross-semivariogram. The stationary cross-semivariogram ($\gamma(\mathbf{h})$) can then be estimated as follows:

$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\{\varepsilon_u(T_1) - \varepsilon_{u+\mathbf{h}}(T_1)\}\{\varepsilon_u(T_2) - \varepsilon_{u+\mathbf{h}}(T_2)\}\right] \tag{A.3}$$

A stationary cross-semivariogram is said to be isotropic if it is a function of the separation distance ($h = \|\mathbf{h}\|$) rather than the separation vector $\mathbf{h}$. An isotropic, stationary cross-semivariogram can be empirically estimated from a data set as follows:

$$\gamma(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \{\varepsilon_{u_\alpha}(T_1) - \varepsilon_{u_\alpha+h}(T_1)\}\{\varepsilon_{u_\alpha}(T_2) - \varepsilon_{u_\alpha+h}(T_2)\} \tag{A.4}$$

where $\gamma(h)$ is the experimental stationary cross-semivariogram (estimated from a data set); $N(h)$ denotes the number of pairs of sites separated by $h$; and $\{\varepsilon_{u_\alpha}(T), \varepsilon_{u_\alpha+h}(T)\}$ denotes the α'th such pair.
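The estimator in Equation A.4 can be sketched in a few lines. Because recording stations are rarely separated by exactly the same distance, the sketch groups site pairs into distance bins; the binning scheme, the function name and the sample inputs are assumptions made for illustration.

```python
import math


def cross_semivariogram(coords, eps1, eps2, bin_width, n_bins):
    """Experimental isotropic cross-semivariogram (Eq. A.4) in distance bins.

    coords : list of (x, y) site coordinates in km
    eps1   : normalized residuals at period T1, one per site
    eps2   : normalized residuals at period T2, one per site
    Returns a list of (bin-centre distance, gamma, number of pairs).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(coords)
    for a in range(n):
        for b in range(a + 1, n):  # each pair of sites is counted once
            h = math.dist(coords[a], coords[b])
            k = int(h // bin_width)
            if k >= n_bins:
                continue  # separation beyond the largest bin
            sums[k] += (eps1[a] - eps1[b]) * (eps2[a] - eps2[b])
            counts[k] += 1
    result = []
    for k in range(n_bins):
        if counts[k] > 0:
            gamma = sums[k] / (2.0 * counts[k])
            result.append(((k + 0.5) * bin_width, gamma, counts[k]))
    return result


# Three hypothetical sites on a line, 1 km apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
gamma_values = cross_semivariogram(coords, [1.0, 0.0, -1.0], [2.0, 0.0, -2.0],
                                   bin_width=1.0, n_bins=3)
```

Bins with few pairs give noisy estimates, so in practice the bin width is a compromise between resolution in h and the number of pairs per bin.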

The covariance structure of $\boldsymbol{\varepsilon}(\mathbf{T})$ is completely specified by the semivariogram function

and the sill and the range of the cross-semivariogram. It can be theoretically shown that

the following relationship can be used to estimate the cross-correlations from the cross-

semivariograms:

$$\gamma(h) = \rho_{12}(0) - \rho_{12}(h) \tag{A.5}$$

where ρ12(0) denotes the cross-correlation between εu(T1) and εu(T2) at the same site u

and ρ12(h) denotes the cross-correlation between εu(T1) and εu+h(T2). Therefore, it would

suffice to estimate the cross-semivariogram of the residuals in order to determine their

cross-correlations. The correlation term ρ12(0) has been estimated in the past [e.g., Baker

and Jayaram, 2008, Baker and Cornell, 2006], and this work extends these results to include

the effects of differing locations as well.

Once the cross-semivariogram values are obtained at discrete values of h, they are then

fitted using a continuous function of h for prediction purposes. In this work, the discrete

cross-semivariogram values are fitted with an exponential semivariogram which has the

following form:

\[ \gamma(h) = S\left(1 - e^{-3h/R}\right) \tag{A.6} \]

where S and R denote the sill and the range of the cross-semivariogram respectively. The


value of the sill equals ρ12(0) (from Equations A.5 and A.6), and the range denotes the

separation distance at which the cross-correlation decays to less than 5% of the sill. Since

the values of ρ12(0) have been previously computed by Baker and Jayaram [2008], it will

suffice to estimate the range R to quantify the extent of spatial cross-correlation.
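The relationship between Equations A.5 and A.6 can be sketched in a few lines of Python. The function names below are illustrative assumptions; the only modeling content is the exponential form of Equation A.6 and the identity ρ12(h) = ρ12(0) − γ(h) with the sill equal to ρ12(0).

```python
import math

def exponential_semivariogram(h, sill, range_km):
    """Exponential cross-semivariogram of Equation A.6."""
    return sill * (1.0 - math.exp(-3.0 * h / range_km))

def cross_correlation(h, sill, range_km):
    """Cross-correlation implied by Equation A.5, taking rho12(0) equal to the sill."""
    return sill - exponential_semivariogram(h, sill, range_km)

# At h equal to the range, the cross-correlation has decayed to exp(-3),
# i.e. just under 5% of its zero-separation value.
ratio_at_range = cross_correlation(50.0, 0.7, 50.0) / 0.7
```

The sill and range values in the last line are arbitrary placeholders chosen only to show the 5% property of the range.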

A.4 Sample Results and Discussion

This section discusses some sample cross-correlation estimates obtained using recorded

time histories from the 1999 Chi-Chi earthquake. In particular, spatial cross-correlation

estimates are computed for the 1 second and the 2 second spectral acceleration residuals

from the Chi-Chi earthquake ground motions using the geostatistical procedure described

in the earlier section. These residuals are first computed from the recorded ground motions

using the Boore and Atkinson [2008] ground-motion model, and are shown in Figure A.2a-

b. Visually, the presence of spatial cross-correlation is indicated by the similarity between

the nearby residuals across A.2a-b.

Figure A.2c shows the cross-semivariogram estimated using the above-mentioned

residuals. An exponential function is then fitted to the discrete cross-semivariogram values, the

sill of which equals 0.7490 (which is the ρ12(0) value obtained from Baker and Jayaram

[2008]). The range of the cross-semivariogram equals 47 km, and has been chosen to

provide a good fit at short separation distances, although compromising on the quality of the

fit at larger separation distances. This is because it is more important to model the cross-

semivariogram structure well at short separation distances since large separation distances

are associated with low correlations, which thus have relatively little effect on joint distribu-

tions of ground motion intensities. In addition to having low correlation, widely separated

sites also have little impact on each other due to an effective ’screening’ of their influence

by more closely-located sites [Goovaerts, 1997]. A more detailed discussion on the im-

portance of fitting well at short separation distances can be found in Jayaram and Baker

[2009a].

The sample cross-semivariogram in Figure A.2 shows that the extent of spatial

cross-correlation is reasonably significant. For instance, the value of the cross-correlation equals

0.4 for sites separated by 10 km and increases up to 0.75 for sites that are very close to one

another. As a result, it will likely be important to consider spatial cross-correlations while

studying multiple types of intensity measures distributed over a region.
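As a rough check on these values, and assuming the fitted exponential model with the sill of 0.749 and the range of 47 km reported above, Equations A.5 and A.6 together give the cross-correlation at a separation of 10 km:

```latex
\rho_{12}(h) = \rho_{12}(0) - \gamma(h) = S\,e^{-3h/R}, \qquad
\rho_{12}(10\ \text{km}) \approx 0.749\,e^{-3(10)/47} \approx 0.749 \times 0.528 \approx 0.40
```

At zero separation the same expression returns the sill, 0.749, consistent with the value of 0.75 quoted for very closely spaced sites.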

Currently, the author is in the process of developing a spatial cross-correlation model

considering the residuals from multiple intensity measures using recordings from multiple

earthquakes.

A.5 Conclusions

This manuscript summarized recent research in ground-motion spatial cross-correlation

estimation using geostatistical tools. Spatial cross-correlations become important while

quantifying the distribution of different types of ground-motion intensity measures over

a region. This work used cross-semivariograms to model the cross-correlation structure.

A cross-semivariogram is a measure of dissimilarity between the data, whose functional

form (e.g., exponential function), sill and range uniquely identify the ground-motion cross-

correlation as a function of separation distance.

In this work, recorded ground-motion spectral accelerations were used to compute

residuals at multiple periods, which were then used to estimate the spatial cross-correlation.

The manuscript showed sample cross-correlation estimates obtained using the 1s and 2s

Chi-Chi earthquake residuals. The extent of the cross-correlation was found to be fairly sig-

nificant, and hence, it will likely be important to consider spatial cross-correlations while

studying the distribution of multiple types of intensity measures over a region. Currently,

the authors are in the process of developing a spatial cross-correlation model considering

the residuals from multiple intensity measures using recordings from multiple earthquakes.

Once developed, these cross-correlation estimates can be used in risk assessments of port-

folios of structures with different fundamental periods, and in assessing the seismic risk

under multiple ground-motion effects.


Figure A.2: (a) Chi-Chi earthquake normalized residuals computed using spectral accelerations at 1 second (b) Chi-Chi earthquake normalized residuals computed using spectral accelerations at 2 seconds (c) Cross-semivariogram estimated using the 1s and 2s Chi-Chi earthquake residuals.

Appendix B

Supporting details for the spatial correlation model developed in Chapter 3

Excerpted from:

J.W. Baker and N. Jayaram (2009). Effects of spatial correlation of ground-motion param-

eters for multi-site risk assessment: Collaborative research with Stanford University and

AIR. Technical report, Report for U.S. Geological Survey National Earthquake Hazards

Reduction Program (NEHRP) External Research Program Award 07HQGR0031.

(Professor Baker was the first author of the above report as the Principal Investigator of this

project, but all the results and the writing in this appendix were produced by the author of

this thesis)

In Chapter 3, several statements were made about properties of spectral acceleration spatial

correlations that were not explained in detail in the text. In this Appendix, details to support


those statements are presented for interested readers.

B.1 Semivariograms of residuals estimated using the Northridge earthquake ground motions

Chapter 3 discussed the semivariogram ranges at seven periods ranging from 0 to 10s es-

timated using the Northridge earthquake recordings. Figures B.1-B.7 show the semivari-

ograms and the exponential fits obtained in these cases.

Figure B.1: Semivariogram of ε based on the peak ground accelerations observed during the Northridge earthquake data

Figure B.2: Semivariogram of ε computed at 0.5 seconds based on the Northridge earthquake data

Figure B.3: Semivariogram of ε computed at 1 second based on the Northridge earthquake data

Figure B.4: Semivariogram of ε computed at 2 seconds based on the Northridge earthquake data

Figure B.5: Semivariogram of ε computed at 5 seconds based on the Northridge earthquake data

Figure B.6: Semivariogram of ε computed at 7.5 seconds based on the Northridge earthquake data

Figure B.7: Semivariogram of ε computed at 10 seconds based on the Northridge earthquake data


B.2 Semivariograms of residuals estimated using Chi-Chi earthquake ground motions

Chapter 3 discussed the semivariogram ranges at seven periods ranging from 0 to 10s es-

timated using the Chi-Chi earthquake recordings. This section shows the semivariograms

and the exponential fits obtained in these cases.

B.2.1 Exact versus approximate semivariogram fit

Figure B.8 shows the experimental semivariogram values at discrete separation distances,

obtained using the ε values computed at 2 seconds. The most accurate model for the

semivariogram function is a combination of a nugget effect with a contribution of 0.3 and

an exponential semivariogram with a contribution of 0.7 and a range of 85 km, which is

also shown in Figure B.8. This model can be expressed as follows:

γ(h) = 0.3I(h > 0)+0.7(1− exp(−3h/85)) (B.1)

where I(h > 0) is an indicator variable that equals 1 when h > 0 and equals 0 otherwise.

The use of a single model for all semivariograms is highly desirable in order to facilitate

development of a standard correlation model for use in future predictions. The exponential

model is seen to be accurate in most cases and hence, an approximate exponential model

is fitted even in cases where alternate accurate models are available. Hence, the semivari-

ogram function for the ε values computed at 2 seconds is approximated by an exponential

model with a range of 36 km and a sill of 1, as shown in Figure B.8. This semivariogram

function fits the data reasonably well at small separations.
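To make the comparison concrete, the short Python sketch below evaluates the nugget-plus-exponential model of Equation B.1 alongside the approximate exponential model (sill 1, range 36 km) at a few separation distances. The function names and the chosen distances are illustrative assumptions; only the model parameters come from the text.

```python
import math

def nugget_plus_exponential(h):
    """Equation B.1: nugget of 0.3 plus an exponential term (contribution 0.7, range 85 km)."""
    indicator = 1.0 if h > 0 else 0.0
    return 0.3 * indicator + 0.7 * (1.0 - math.exp(-3.0 * h / 85.0))

def approximate_exponential(h, sill=1.0, range_km=36.0):
    """Approximate exponential model fitted to the same 2-second data."""
    return sill * (1.0 - math.exp(-3.0 * h / range_km))

# Tabulate both models at a few illustrative separation distances (km).
for h in (5.0, 10.0, 20.0, 40.0, 80.0):
    print(h, round(nugget_plus_exponential(h), 3), round(approximate_exponential(h), 3))
```

Both models vanish at h = 0, and the table lets the reader judge how closely the single exponential tracks the more detailed model over the separation distances of practical interest.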

B.2.2 Semivariograms of the residuals at seven periods ranging between 0 and 10s

Figures B.9-B.15 show the semivariograms and the exponential fits obtained using the

Chi-Chi earthquake records.


Figure B.8: Experimental semivariogram of ε computed at 2 seconds based on the Chi-Chi earthquake data. Also shown in the figure are two fitted semivariogram models: (i) An accurate exponential + nugget model and (ii) An approximate exponential model

Figure B.9: Semivariogram of ε based on the peak ground accelerations observed during the Chi-Chi earthquake data

Figure B.10: Semivariogram of ε computed at 0.5 seconds based on the Chi-Chi earthquake data

Figure B.11: Semivariogram of ε computed at 1 second based on the Chi-Chi earthquake data

Figure B.12: (Approximate) Semivariogram of ε computed at 2 seconds based on the Chi-Chi earthquake data

Figure B.13: Semivariogram of ε computed at 5 seconds based on the Chi-Chi earthquake data

Figure B.14: Semivariogram of ε computed at 7.5 seconds based on the Chi-Chi earthquake data

Figure B.15: Semivariogram of ε computed at 10 seconds based on the Chi-Chi earthquake data


B.3 Semivariograms of residuals estimated using broadband simulations for scenario earthquakes on the Puente Hills thrust fault system

Chapter 3 only discussed the correlations estimated using recorded ground motions. This

section describes the correlations between residuals computed based on broadband ground-

motion simulations for scenario earthquakes on the Puente Hills thrust fault system [Graves,

2006], which are not discussed in any of the earlier chapters. The simulated time histories

are available for five different rupture scenarios that differ in the rupture velocity and the

rise time. In this work, ground motions due to the rupture scenario defined by a rupture

velocity equaling 80% of the shear wave velocity and a rise time of 1.4 seconds are used

for the analysis. The ground-motion time histories have been simulated at 648 sites cov-

ering the Los Angeles, San Fernando and San Gabriel basin regions. The time histories at

locations with very low Vs30 values, however, were reported to be possibly inaccurate be-

cause the simulation algorithm does not yet fully account for non-linear site effects [Graves,

2007]. Hence, in the current work, only the time histories at sites with Vs30 values

exceeding 300 m/s are considered for analysis.

Experimental semivariograms are obtained for ε’s computed at several different periods

ranging from 0 - 10 seconds. The exponential model is found to provide a good fit at periods

below 2 seconds. At longer periods, however, a spherical model provides a better fit than an

exponential model. For example, Figure B.16 shows the experimental semivariogram and

a fitted spherical model (unit sill and range equaling 32 km) based on residuals computed

at 5 seconds.

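\[ \gamma(h) = \begin{cases} \dfrac{3}{2}\,\dfrac{h}{32} - \dfrac{1}{2}\left(\dfrac{h}{32}\right)^{3} & \text{if } h \le 32 \\ 1 & \text{otherwise} \end{cases} \tag{B.2} \]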

As explained earlier, for consistency with other results, exponential models that provide

a reasonably good approximation at short separation distances (that are useful in practice)

are used to model the semivariograms. For example, the experimental semivariogram can

also be fitted with an exponential model which has a unit sill and a range of 60 km as

shown in Figure B.16. It can be seen from the figure that this exponential function models

the correlations at small separations reasonably accurately.

Figure B.16: Experimental semivariogram of ε computed at 5 seconds based on the simulated ground-motion data. Also shown in the figure are two fitted semivariogram models: (i) An accurate spherical model and (ii) An approximate exponential model
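A short Python sketch of the two fitted models for the 5-second simulated residuals: the spherical model with unit sill and a 32 km range, and the approximate exponential model with unit sill and a 60 km range. The function names are illustrative assumptions; the parameters are those quoted in the text.

```python
import math

def spherical(h, range_km=32.0, sill=1.0):
    """Spherical semivariogram: reaches the sill exactly at the range."""
    if h >= range_km:
        return sill
    ratio = h / range_km
    return sill * (1.5 * ratio - 0.5 * ratio ** 3)

def exponential(h, range_km=60.0, sill=1.0):
    """Exponential semivariogram: reaches about 95% of the sill at the range."""
    return sill * (1.0 - math.exp(-3.0 * h / range_km))

# Difference between the two models at a short separation distance (km).
difference_at_5km = abs(spherical(5.0) - exponential(5.0))
```

The exponential range (60 km) is larger than the spherical range (32 km) because the two models define "range" differently: the spherical model attains its sill at the range, whereas the exponential model only approaches it asymptotically.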

A plot of the range of semivariograms as a function of period is shown in Figure B.17.

The trend of increasing range with period is seen in this figure as well. The computed

ranges are reasonably similar to those seen from the Northridge earthquake data. It is to be

noted, however, that the ground-motion simulations at short periods (periods ≤ 2 seconds)

may not be entirely accurate, and hence, the ranges obtained using the Northridge and

Chi-Chi earthquake data are more reliable estimates.


Figure B.17: Range of semivariograms of ε, as a function of the period at which ε values are computed. The residuals are obtained using the simulated ground-motion data

B.4 Clustering of Vs30’s

The semivariogram model described in Chapter 3 involves determining the presence of

Vs30 clustering (Section 3.4.5). This is best done by computing the range of the Vs30

semivariogram and comparing it to the ranges shown in Figure 3.5. In order to provide

additional guidance to users of the correlation model, Figure B.18 shows simulated mul-

tivariate normal random fields with three different levels of correlation, with mean and

variance equaling those of the Vs30’s in the San Francisco bay area region. The correlation

structure in Figure B.18a is defined by an exponential semivariogram with a range of 0km,

and is an example of heterogeneous Vs30 conditions (Case 1 in Section 3.4.5). This is indi-

cated by the lack of clustering of the Vs30’s in the figure. The correlation structure in Figure

B.18b is defined by an exponential semivariogram with a range of 20km, and is an example

of low-moderately heterogeneous conditions (Case 1 in Section 3.4.5). Figure B.18c is an

example of homogeneous Vs30 conditions, and has a correlation structure defined by an

exponential semivariogram with a range of 40km (Case 2 in Section 3.4.5). The clustering

of the Vs30 field in the region of interest (where the region is defined as the collection of

sites of interest) can be compared to that of the three maps to approximately determine the


appropriate case to use.
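Fields of the kind shown in Figure B.18 can be generated with a few lines of Python. The sketch below assumes NumPy, an exponential correlation function exp(−3h/R) consistent with the exponential semivariogram used throughout, and treats a range of 0 km as complete spatial independence. The function name, the grid, and the mean and standard deviation in the usage note are hypothetical placeholders, not the values used for the figure.

```python
import numpy as np

def simulate_field(coords, mean, std, range_km, seed=0):
    """Draw one multivariate normal random field with exponential spatial correlation."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise separation distances between all sites.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    if range_km > 0:
        corr = np.exp(-3.0 * dist / range_km)
    else:
        corr = np.eye(len(coords))  # zero range: no spatial correlation
    cov = (std ** 2) * corr
    generator = np.random.default_rng(seed)
    return generator.multivariate_normal(np.full(len(coords), mean), cov)
```

For example, `simulate_field(grid, 400.0, 150.0, 20.0)` on a regular grid of site coordinates in km would produce a moderately clustered field; repeating the call with ranges of 0 and 40 km gives the less and more clustered counterparts.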

Figure B.18: Simulated multivariate normal random fields. The correlation structure is defined using an exponential semivariogram with range equaling (a) 0 km (b) 20 km and (c) 40 km.


B.5 Correlation between near-fault ground-motion intensities

Most currently available ground-motion models do not directly predict ground motions

containing strong velocity pulses, such as those caused by near-fault directivity. As a result,

the ground-motion intensities predicted by the models at sites that experience pulse-like

ground motions will be different from the observed values. Such systematic prediction

errors can increase the apparent correlation between the residuals computed at these sites.

Hence, in this section, empirical data are used to verify whether the correlation between

residuals at sites experiencing pulse-like ground motion is significantly different from the

correlation between residuals at other sites.

Baker [2007a] used wavelet analysis to extract velocity pulses from ground motions

and developed a quantitative criterion for classifying a ground motion as pulse-like.

Ninety-one large-velocity pulses were found in the fault-normal components of the approximately

3500 strong ground-motion recordings in the PEER NGA Database [2005]. It should be

noted that not all of these pulses may be due to directivity effects, but this provides a

reasonable data set for studying the potential impact of directivity. Of these, 30 pulses

were found in the fault-normal components of the Chi-Chi earthquake recordings, while

the rest of the earthquakes have far fewer recordings with pulses. In the current work, the

pulse-like ground motions from the Chi-Chi earthquake are used to compute ε values at

different periods. The semivariograms of the residuals are obtained and compared to those

estimated using all usable records (section 3.4.3).

Figures B.19-B.25 compare experimental semivariograms of residuals (at seven differ-

ent periods) computed using pulse-like ground motions to experimental semivariograms

of residuals computed using all usable ground motions. The figures show the experimental

semivariogram values at short separation distances, which are of interest in practice. On ac-

count of the fewer available records, it is to be noted that the experimental semivariograms

obtained using pulse-like ground motions are less clearly defined than those obtained using

all usable ground motions. Hence, it is difficult to fit robust models for the experimental

semivariograms obtained using the pulse-like ground motions. As a result, the experimental

semivariograms are compared as such, rather than by their models and ranges.


It can be seen from Figures B.19-B.25 that the experimental semivariogram values ob-

tained using the pulse-like ground motions are slightly less than those obtained using all

usable ground motions, particularly at separation distances below 10 km and at long pe-

riods (7.5 and 10 seconds). This is consistent with expectations as the pulses from this

earthquake typically have periods of approximately 7 seconds and so, it is expected that

this is the period range that would be most strongly influenced by directivity. In other

words, the ε’s obtained using pulse-like ground motions show slightly larger correlations

than those obtained using all usable ground motions. The difference in the correlations is

typically around 0.1, with a maximum value of approximately 0.2.

While the increased correlation between the residuals at sites experiencing pulse-like

ground motions is expected, the difference in the correlation seems reasonably small.

Moreover, it is to be noted that the source of this additional correlation is the systematic

prediction errors caused by the ground-motion models at sites experiencing pulse-like ground

motions. Hence, if ground-motion models that account for directivity effects accurately are

developed, the correlations between near-fault ground-motion intensities can be expected

to be similar to the correlation between ground-motion intensities at other sites. That is,

the directivity effects are best addressed through refinements to ground-motion models,

rather than refinements to correlation models.


Figure B.19: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are computed from peak ground accelerations

Figure B.20: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 0.5 seconds

Figure B.21: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 1 second

Figure B.22: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 2 seconds

Figure B.24: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 7.5 seconds


B.6 Directional semivariograms estimated using the Northridge and the Chi-Chi earthquake records at various periods

Chapter 3 showed that the directional semivariograms obtained using the Northridge earth-

quake 2s residuals match reasonably well, thereby indicating that the use of an isotropic

correlation model is reasonable. This section provides more empirical evidence (directional

semivariograms obtained using Chi-Chi and Northridge earthquake recordings, considering

residuals at three different periods) to support the assumption of isotropy.


Figure B.26: Experimental directional semivariograms at discrete separations obtained using the Northridge earthquake ε values computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Figure B.27: Experimental directional semivariograms at discrete separations obtained using the Chi-Chi earthquake ε values computed at 1 second. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Figure B.28: Experimental directional semivariograms at discrete separations obtained using the Chi-Chi earthquake ε values computed at 7.5 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Figure B.29: Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε values are computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Figure B.30: Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε values are computed at 7.5 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Figure B.31: Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε values are computed at 7.5 seconds. Also shown in the figures is an anisotropic model that fits the four experimental semivariograms well (It is to be noted that an anisotropic semivariogram has different shapes in different directions.): (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90

Appendix C

Deaggregation of lifeline risk: Insights for choosing deterministic scenario earthquakes

N. Jayaram and Baker, J.W. (2009). Deaggregation of lifeline risk: Insights for choos-

ing deterministic scenario earthquakes, Lifeline Earthquake Engineering in a Multihazard

Environment TCLEE, Oakland, California.

C.1 Abstract

Probabilistic seismic risk assessment for lifelines is less straightforward than for individual

structures. Analytical risk assessment techniques such as the ‘PEER framework’ are insuf-

ficient for a probabilistic study of lifeline performance, due in large part to difficulties in

describing ground-motion hazard over a region. As a result, Monte Carlo simulation and its

variants appear to be the best approach for characterizing ground motions for lifelines. A

challenge with Monte Carlo simulation is its large computational expense, and in situations

where computing lifeline losses is extremely computationally demanding, assessments may

consider only a single ‘interesting’ ground-motion scenario and a single associated map of

resulting ground motion intensities.

In this paper, a probabilistic simulation-based risk assessment procedure is coupled


with a deaggregation calculation to identify the ground-motion scenarios most likely to

produce exceedance of a given loss threshold. The deaggregation calculations show that

this ‘most-likely scenario’ depends on the loss level of interest, and is influenced by factors

such as the seismicity of the region, the location of the lifeline with respect to the faults

and the current performance state of the various components of the lifeline. It is seen that

large losses are typically caused by moderately large magnitude events with large average

values of inter-event and intra-event residuals, implying that the scenario ground motions

should be obtained in a manner that accounts for ground-motion uncertainties. Explicit

loss analysis calculations that exclude residuals will demonstrate that the resulting loss

estimates are highly biased.

C.2 Introduction

Probabilistic seismic risk assessment for lifelines is less straightforward than for individual

structures. While procedures such as the ‘PEER framework’ have been developed for risk

assessment of individual structures, these are not easily applicable to distributed lifeline

systems, due in large part to difficulties in describing ground-motion hazard over a region

(in contrast to ground-motion hazard at a single site, which is easily quantified using Proba-

bilistic Seismic Hazard Analysis). In the past, researchers have used simplified approaches

to tackle the problem of specifying ground motions over a region. In the simplest case,

the uncertainties in the ground-motion intensities are ignored, and lifeline risks are stud-

ied using the median ground motions predicted by ground-motion models [e.g., Shiraki

et al., 2007, Campbell and Seligson, 2003]. While this approach reduces the

computational burden significantly, ignoring the uncertainties in the ground-motion intensities

results in highly biased risk estimates, as shown later in this paper. Sometimes, as

a simplification, lifeline risks are assessed using only those earthquake scenarios that may

dominate the ground-motion hazard in the region of interest [e.g., Adachi and Ellingwood,

2008]. This approach is helpful practically in reducing computational expense, but suffers

from several problems. First, it is difficult to identify the probability of actually incurring

the computed losses resulting from a single ground-motion scenario. Second, the scenario

earthquake is generally chosen in a somewhat ad hoc manner, and so there is no guarantee


that the chosen scenario is the one that is most ‘interesting’ in terms of risk to the lifeline

system.

Crowley and Bommer [2006] and more recently, Jayaram and Baker [2010] proposed

Monte Carlo simulation (MCS)-based frameworks to forward simulate ground-motion in-

tensities in future earthquakes, which can then be used for the risk assessment of lifelines.

The sampling frameworks are based on the form of existing ground-motion models, which

is described below. The ground motion at a site is modeled as [e.g., Boore and Atkinson,

2008]

\[ \ln(Y_i) = \ln(\bar{Y}_i) + \varepsilon_i + \eta \tag{C.1} \]

where Yi denotes the ground-motion parameter of interest (e.g., Sa(T), the spectral

acceleration at period T); Ȳi denotes the predicted (by the ground-motion model) median

ground-motion intensity (which depends on parameters such as magnitude, distance, period and

local-site conditions); εi denotes the intra-event residual, which is a random variable with

zero mean and standard deviation σi; and η denotes the inter-event residual, which is a

random variable with zero mean and standard deviation τ . The standard deviations, σi

and τ , are estimated as part of the ground-motion model and are a function of the spectral

period of interest, and in some models also a function of the earthquake magnitude and

the distance of the site from the rupture. The intra-event residuals at two sites i and j are

correlated, and the correlation is a function of the separation distance between the sites.

The extent of the correlation can be obtained from spatial correlation models such as that

of Jayaram and Baker [2009a] and Wang and Takada [2005].

Crowley and Bommer [2006] describe the MCS approach used to probabilistically sam-

ple ground-motion maps. This approach involves simulating earthquakes of different mag-

nitudes on various active faults in the region, followed by simulating the inter-event and

the intra-event residuals at the sites of interest for each earthquake. The residuals are then

combined with the median ground motions in accordance with Equation C.1 in order to obtain

the ground motions at all the sites. In the current work, the simulation approach described

above is coupled with a deaggregation calculation that can identify the ground-motion sce-

nario most likely to produce exceedance of a given loss threshold. The results show that


the most-likely scenario depends on the loss level of interest, and is influenced by factors

such as the seismicity of the region, the location of the lifeline with respect to the faults

and the current performance state of the various components of the lifeline. It is also seen

that large losses are most likely to be caused by moderately large magnitude earthquakes

combined with large positive inter-event and intra-event residuals. The findings illustrate

the importance of accounting for ground-motion uncertainty, as well as provide a basis for

a decision maker to choose interesting scenario ground motions for lifeline risk assessment.
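The residual-sampling step of the simulation approach can be sketched in Python as follows. The sketch assumes NumPy, normally distributed residuals, and an exponential spatial correlation exp(−3h/R) for the intra-event residuals; the function name, its arguments, and the single correlation range are illustrative assumptions rather than the exact procedure of the cited studies. Sampling of earthquake magnitudes and rupture locations is omitted.

```python
import numpy as np

def simulate_ground_motion_maps(ln_median, sigma, tau, coords, corr_range_km,
                                n_maps, seed=0):
    """Sample ln(Y_i) at all sites for one earthquake, following Equation C.1.

    ln_median     : (n_sites,) log of the median intensities from a ground-motion model
    sigma         : (n_sites,) intra-event standard deviations
    tau           : inter-event standard deviation
    coords        : (n_sites, 2) site coordinates in km
    corr_range_km : assumed range R of the intra-event correlation model
    """
    ln_median = np.asarray(ln_median, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    coords = np.asarray(coords, dtype=float)

    # Spatial correlation of intra-event residuals as a function of separation.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    corr = np.exp(-3.0 * dist / corr_range_km)
    cov = corr * np.outer(sigma, sigma)

    generator = np.random.default_rng(seed)
    # One correlated vector of intra-event residuals per simulated map.
    eps = generator.multivariate_normal(np.zeros(len(coords)), cov, size=n_maps)
    # One inter-event residual per map, shared by all sites.
    eta = generator.normal(0.0, tau, size=(n_maps, 1))
    return ln_median + eps + eta
```

Each row of the returned array is one ground-motion map in log units; exponentiating gives the intensities that would feed the lifeline loss calculation.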

C.3 Deaggregation of seismic loss

This section describes the fundamentals of the seismic loss deaggregation procedure which

is used in the current study. Deaggregation is the process used to quantify the likelihood

that various events could have produced the exceedance of a given loss threshold. For

instance, if it is known that the seismic loss exceeds x units, the likelihood that an event of

magnitude m could have caused the exceedance is given as follows:

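\[ P(\text{Magnitude} = m \mid \text{Loss} > x) = \frac{P(\text{Loss} > x, \text{Magnitude} = m)}{P(\text{Loss} > x)} = \frac{\lambda(\text{Loss} > x, \text{Magnitude} = m)}{\lambda(\text{Loss} > x)} \tag{C.2} \]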

where λ(Loss > x, Magnitude = m) denotes the recurrence rate of events of magnitude m

causing a loss exceeding x and λ(Loss > x) is the recurrence rate of events causing a loss

exceedance of x. These parameters can be estimated using the simulation-based framework

described in Section C.2.

The likelihoods can also be computed considering multiple parameters such as magni-

tudes and faults as follows:

\[ P(\text{Magnitude} = m, \text{fault} = f \mid \text{Loss} > x) = \frac{\lambda(\text{Loss} > x, \text{Magnitude} = m, \text{fault} = f)}{\lambda(\text{Loss} > x)} \tag{C.3} \]
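A minimal Python sketch of the joint magnitude-fault deaggregation, assuming the simulation output is available as a list of events, each carrying an occurrence rate (or weight), a magnitude, a fault label, and a computed loss. The tuple layout, function name, and any example values are hypothetical.

```python
from collections import defaultdict

def deaggregate(events, loss_threshold):
    """Return P(magnitude, fault | loss > threshold) from simulated events.

    events: iterable of (rate, magnitude, fault, loss) tuples (assumed layout).
    """
    total_rate = 0.0
    joint_rate = defaultdict(float)
    for rate, magnitude, fault, loss in events:
        if loss > loss_threshold:
            total_rate += rate                       # rate of exceeding the threshold
            joint_rate[(magnitude, fault)] += rate   # rate of exceeding it with this (m, f)
    if total_rate == 0.0:
        return {}  # no simulated event exceeds the threshold
    return {key: value / total_rate for key, value in joint_rate.items()}
```

Summing the returned values over all faults for a fixed magnitude recovers the magnitude-only deaggregation; in practice magnitudes would usually be grouped into bins before being used as dictionary keys.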

Such calculations are common practice when loss assessments are carried out for a


single structure (though most deaggregation calculations estimate the contribution (likeli-

hood) of various earthquake scenarios to ground-motion intensity exceedance rather than

loss exceedance). Typical results from the single-site deaggregation computations include

the joint likelihoods of magnitudes, rupture distances (distance of the structure from the

rupture) and residuals (Equation C.1).

In the current work, it is of interest to identify the contributions of magnitudes, rupture

locations and residuals (inter-event and intra-event) to lifeline losses. Deaggregation calcu-

lations for lifeline losses need to account for the fact that ground motions at multiple sites

are of interest. This would mean that a specific distance to the rupture cannot be obtained

as is commonly done when a single structure is involved. In the current work, this problem

is overcome by specifying the fault on which the rupture lies rather than the distance to any

particular site. Further, since each site of interest is associated with a different intra-event

residual, deaggregation is used to compute the contribution of the mean intra-event residual

(i.e., the average of the intra-event residuals at all sites) rather than the contribution of the

intra-event residual at any particular site.
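The mean intra-event residual used as a deaggregation variable can be sketched as follows. This is only an illustration: the function name and the nested-list layout (one list of site residuals per simulated event) are assumptions made here.

```python
from statistics import fmean


def mean_intra_event_residuals(intra_residuals_by_event):
    """Average the intra-event residuals over all sites, one value per event."""
    return [fmean(site_residuals) for site_residuals in intra_residuals_by_event]


# Hypothetical residuals for two simulated events at three sites.
residuals = [[0.5, -0.5, 0.3], [1.0, 2.0, 0.0]]
means = mean_intra_event_residuals(residuals)
```

Each event is thereby reduced to a single scalar that can be binned and deaggregated in the same way as magnitude.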

C.4 Loss assessment for the San Francisco Bay Area

transportation network

The deaggregation computations in the current work are based on the loss estimates for an

aggregated form of the San Francisco Bay Area transportation network provided by Jayaram

and Baker [2009a]. This section describes the aggregated network and the performance

measures considered in the loss assessment process. Figure

C.1 shows the aggregated network along with the important faults in the San

Francisco Bay Area. The network consists predominantly of freeways and expressways,

and has a total of 586 links, 310 nodes and 1,125 bridges. In this network, traffic

originates and terminates at 46 nodes, denoted centroidal nodes. Transportation network

performance is usually measured in terms of the total travel time of the network [Shiraki

et al., 2007, Stergiou and Kiremidjian, 2006]. The total travel time is obtained using the

user-equilibrium principle which states that, under equilibrium, each user would choose the

path that would minimize his or her travel time [Beckman et al., 1956]. The user-equilibrium

formulation is solved using the commonly-used solution technique of Frank and

Wolfe [1956].

Figure C.1: The aggregated San Francisco Bay Area transportation network.
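The user-equilibrium assignment described above can be illustrated with a small Python sketch of the Frank-Wolfe iteration on parallel links joining a single origin-destination pair. The link travel-time function is assumed here to take the Bureau of Public Roads (BPR) form with its customary coefficients; the text does not state which function or parameters were used in the study, and the two-route example values are hypothetical.

```python
def bpr_time(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
    """Link travel time from the BPR volume-delay function (assumed form)."""
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


def frank_wolfe_parallel(demand, free_flow_times, capacities, tol=1e-8, max_iter=10000):
    """User-equilibrium flows on parallel links for one O-D pair (demand > 0)."""
    n = len(free_flow_times)

    def times(flows):
        return [bpr_time(f, t0, c) for f, t0, c in zip(flows, free_flow_times, capacities)]

    def all_or_nothing(link_times):
        # Load the entire demand on the currently fastest link.
        best = min(range(n), key=lambda a: link_times[a])
        return [demand if a == best else 0.0 for a in range(n)]

    flows = all_or_nothing(free_flow_times)
    for _ in range(max_iter):
        t = times(flows)
        target = all_or_nothing(t)
        direction = [y - x for x, y in zip(flows, target)]
        # Relative gap: zero when every used link has the minimum travel time.
        gap = -sum(ti * di for ti, di in zip(t, direction)) / sum(
            ti * xi for ti, xi in zip(t, flows)
        )
        if gap < tol:
            break

        def slope(step):
            # Directional derivative of the Beckmann objective along the step.
            moved = [x + step * d for x, d in zip(flows, direction)]
            return sum(ti * di for ti, di in zip(times(moved), direction))

        if slope(1.0) <= 0.0:
            step = 1.0
        else:
            lo, hi = 0.0, 1.0
            for _ in range(60):       # bisection on the monotone slope
                mid = 0.5 * (lo + hi)
                if slope(mid) < 0.0:
                    lo = mid
                else:
                    hi = mid
            step = 0.5 * (lo + hi)
        flows = [x + step * d for x, d in zip(flows, direction)]
    return flows


# Hypothetical example: 4,000 vehicles/hour split over two routes.
flows = frank_wolfe_parallel(4000.0, [10.0, 15.0], [2000.0, 2000.0])
```

At convergence both routes carry flow and have equal travel times, which is the user-equilibrium condition. A real network would replace the all-or-nothing step with a shortest-path assignment over all origin-destination pairs.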

The changes in the network travel time after an earthquake are due to structural damage

to bridges, which results in link closures and reductions in link capacities. (The

current work considers only the change in the total network travel time, and omits monetary

costs due to structural damage.) Thus, the loss assessment is carried out by accounting for

the structural damage to bridges caused by each simulated ground-motion map (obtained

using the simulation-based procedure described in Section C.2) and computing the network

travel time in the damaged state. (In the current work, only peak-hour demands and travel

times are considered.) Figure C.2 shows the loss estimates in the form of a recurrence

curve, which gives the rate of exceeding various travel time delays. The current work

uses these loss estimates (i.e., travel time delays) in the deaggregation calculations.
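A recurrence curve of this kind can be assembled from simulated events along the following lines. This is a sketch under the assumption that each simulated event carries the annual occurrence rate it represents; the function name and the numerical values are hypothetical.

```python
def exceedance_rates(losses, rates, thresholds):
    """Annual rate of exceeding each travel time delay threshold.

    losses[i] is the travel time delay (hours) of simulated event i and
    rates[i] is the annual occurrence rate assumed for that event.
    """
    return [
        sum(r for loss, r in zip(losses, rates) if loss > x)
        for x in thresholds
    ]


# Hypothetical simulated delays (hours) and event rates (per year).
losses = [0.0, 500.0, 6000.0, 12000.0, 25000.0]
rates = [0.05, 0.02, 0.005, 0.002, 0.0005]
curve = exceedance_rates(losses, rates, thresholds=[0.0, 5000.0, 10000.0, 20000.0])
```

The resulting rates are non-increasing in the threshold, and each value is the denominator λ(Loss > x) needed by the deaggregation equations.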


Figure C.2: Recurrence curve for the travel time delay obtained using the simulation-based framework.

C.5 Results and Discussion

This section presents the results from the deaggregation calculations, which include the

contribution of magnitudes, faults, inter-event residuals and mean intra-event residuals to

lifeline losses. The estimates are obtained using equations similar to C.2 and C.3, where the

required recurrence rates are obtained using the simulation-based loss assessment frame-

work described in the previous sections. For instance, if 100 out of 15,000 simulated events

involve an earthquake of magnitude 7 and a loss (i.e., travel time delay) exceeding 10,000

hours, P(Loss > 10,000,Magnitude = 7) = 100/15,000.

C.5.1 Contribution of magnitudes and faults to the lifeline losses

Figure C.3 shows the contribution (i.e., the likelihood term obtained from Equation C.3) of

various magnitudes and faults to the probability of exceeding four different travel delay

thresholds, namely, 0 hours, 5,000 hours, 10,000 hours and 20,000 hours. (The total travel

time in the network during normal operating conditions equals 73,000 hours.) In order

to obtain the contributions of discrete magnitudes to the loss exceedance, earthquakes of

different magnitudes need to be pooled into bins centered on selected discrete magnitudes. In the

current work, the bin width is chosen to be 0.5 magnitude units. For instance, all magnitudes between 7.75

and 8.25 are classified as magnitude 8.

Figure C.3: Joint likelihoods of magnitudes and faults given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.
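The magnitude pooling described above can be sketched in Python as follows. The treatment of bin edges is an assumption, since the text does not say which bin a magnitude exactly on a boundary belongs to; here each bin is taken as closed on the left and open on the right.

```python
import math


def magnitude_bin(magnitude, bin_width=0.5):
    """Centre of the bin [centre - width/2, centre + width/2) containing magnitude."""
    return math.floor(magnitude / bin_width + 0.5) * bin_width
```

With a bin width of 0.5, `magnitude_bin(7.9)` and `magnitude_bin(8.2)` both return 8.0, so these events are pooled with the magnitude 8 bin.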

Figure C.3 shows that, at small loss thresholds, small magnitude events contribute significantly

to the loss, which is understandable since small magnitude events are significantly

more probable than large magnitude events. Also, as seen from Figure C.3, the

loss is typically dominated by events on the northern segment of the San Andreas Fault.

This is because the rate of earthquake occurrence on the San Andreas Fault is much larger

than that on other faults.

At moderate loss levels (5,000-10,000 hours), a significant portion of the contribution

is shared by earthquake events on the Hayward and the San Andreas Faults. Events of

magnitude close to 7 on the Hayward Fault and of magnitude around 8 on the San Andreas

Fault are ‘characteristic events’ on the respective faults [USGS, 2003]. In other words,

these earthquakes are known to occur on a fairly regular basis and hence are more likely

than even some of the smaller magnitude events on these faults. It can be seen from Figure

C.3 that the characteristic events contribute most to the moderate losses by virtue of their

higher likelihoods of occurrence. Further, it is interesting to note that an event of magnitude

7 on the Hayward Fault has a slightly larger contribution than a much larger event (magnitude 8)

on the San Andreas Fault. This is because the Hayward Fault runs right down the

middle of the network while the San Andreas Fault is on the western end. As a result, an event

on the Hayward Fault causes moderate damage to all the links in the network, while the

San Andreas event causes extensive damage to the west end of the network and very little

damage to the east end. The overall effect is a nearly equal contribution to the losses by

both the above-mentioned events.

Figure C.4: Level of congestion in the network as indicated by the volume/capacity ratio.

At large loss levels (20,000 hours), however, events on the San Andreas Fault again

dominate the hazard. Of all the links present in the transportation network, the most con-

gested ones under normal operating conditions are in the western portion of the network.

This can be seen from Figure C.4, which shows the ratio of traffic volume to link capacity

for each link. Large travel time delays are incurred if links that are

congested (volume/capacity greater than 0.75) under normal conditions suffer damage, increasing

the congestion even further. This happens when a moderate to large event occurs

on the San Andreas Fault (which is adjacent to several congested links) and has large residuals,

and hence such a scenario is the primary cause of large delays in the network. It can

be seen from the above discussion that the most-likely scenario depends on the loss level

of interest, and is influenced by factors such as the seismicity of the region, the location of

the lifeline with respect to the faults and the performance state of the various components

of the lifeline under normal operating conditions. In fact, for certain loss levels, it may

not even be possible to choose a single dominating event, as shown in Figures C.3b and

C.3c, which show nearly equal contributions by events on the Hayward and the San Andreas

Faults.

Figure C.5: Joint likelihoods of mean intra-event residual given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.

Figure C.6: Joint likelihoods of inter-event residual given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.

C.5.2 Contribution of inter- and intra-event residuals to the lifeline loss

Figures C.5 and C.6 show the contribution of mean intra-event and inter-event residuals

to the probability of exceeding four different travel time delay thresholds. As expected,

events with residuals close to zero (the mean value) dominate small seismic losses. As the

loss level increases, the contribution of large inter-event and large mean intra-event resid-

uals increases rapidly. It can be seen from Figures C.5d and C.6d that, at a loss threshold

of 20,000 hours, significant contributions are obtained from mean intra-event residuals be-

tween 0.3 and 0.5 and inter-event residuals between 1.5 and 3. These results are perhaps

not surprising given the large effect that inter-event and intra-event residuals have on the re-

sulting ground motions. Since the inter-event residual is constant across the entire region, a

large positive value will increase the ground-motion intensity at every site in the region. As

a consequence, appropriate consideration of the inter-event residual is even more important

when assessing lifeline losses than when assessing the losses for a single structure.
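The region-wide effect of the inter-event residual can be made concrete with a short Python sketch. It assumes the usual ground-motion model form ln(Y_i) = ln(median_i) + sigma * eps_i + tau * eta, with a site-specific normalized intra-event residual eps_i and a single normalized inter-event residual eta per earthquake; the standard deviations and site values below are hypothetical and are not taken from the study.

```python
import math


def site_intensities(medians, intra_residuals, inter_residual, sigma, tau):
    """Ground-motion intensity at each site for one simulated event.

    Assumed model: ln(Y_i) = ln(median_i) + sigma * eps_i + tau * eta,
    where eps_i differs from site to site and eta is shared by all sites.
    """
    return [
        m * math.exp(sigma * eps + tau * inter_residual)
        for m, eps in zip(medians, intra_residuals)
    ]


# Hypothetical medians (g) and intra-event residuals at three sites.
medians = [0.20, 0.35, 0.10]
intra = [0.4, -0.2, 0.0]
base = site_intensities(medians, intra, inter_residual=0.0, sigma=0.5, tau=0.3)
shifted = site_intensities(medians, intra, inter_residual=2.0, sigma=0.5, tau=0.3)
ratios = [s / b for s, b in zip(shifted, base)]
```

Every entry of `ratios` equals exp(tau * eta), so a positive inter-event residual amplifies the intensity at all sites by the same factor, whereas intra-event residuals of opposite sign at different sites partly offset one another.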


Figure C.7: Mean magnitude of earthquakes producing a travel time delay exceeding a specified threshold.

Figure C.8: (a) Average of mean intra-event residual of earthquakes producing a travel time delay exceeding a specified threshold. (b) Average of inter-event residual of earthquakes producing a travel time delay exceeding a specified threshold.


Figure C.9: Recurrence curves obtained without completely accounting for inter-event and intra-event residuals.

Finally, Figures C.7 and C.8 summarize the findings from the deaggregation calcula-

tions, and illustrate the variation in the mean magnitude and the mean residuals of the

ground-motion scenarios that contribute to the probability of exceeding various lifeline

loss thresholds. For instance, the mean magnitude causing a travel time delay exceeding

x hours is obtained by averaging the magnitudes of all earthquakes that produce a travel

time delay greater than x hours. The figures show that the magnitude, inter-event residual

and mean intra-event residual increase rapidly as the travel time delay threshold increases.

(Some of the wiggles seen at large thresholds are due to small sample sizes at these thresholds.)

It is interesting to note that most of the extremely large losses occur at magnitudes

well below the maximum (the maximum is 8.05 in this source model), which indicates that

large losses are typically caused by moderately large events combined with large values

of residuals (Figure C.8), as explained previously. This result can be understood intuitively

as follows: while ‘maximum magnitude’ events certainly cause large losses, they occur so

infrequently that in many cases, more common moderate magnitude events may be more

important.
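The conditional averages summarized in Figures C.7 and C.8 can be sketched as below. The optional rate weighting is an assumption made for generality; with equal rates the function reduces to the plain average described in the text. The function name and example values are hypothetical.

```python
def mean_given_exceedance(values, losses, rates, threshold):
    """Rate-weighted mean of `values` over events with loss > threshold."""
    selected = [(v, r) for v, loss, r in zip(values, losses, rates) if loss > threshold]
    total = sum(r for _, r in selected)
    if total == 0.0:
        return None  # no simulated event exceeds the threshold
    return sum(v * r for v, r in selected) / total


# Hypothetical events: magnitudes, travel time delays (hours), equal weights.
magnitudes = [6.5, 7.0, 8.0]
delays = [1000.0, 12000.0, 30000.0]
weights = [1.0, 1.0, 1.0]
mean_mag = mean_given_exceedance(magnitudes, delays, weights, threshold=10000.0)
```

Passing inter-event or mean intra-event residuals as `values` gives the corresponding residual averages. The `None` return for an empty selection reflects the small-sample problem at very large thresholds.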


In order to further emphasize the importance of residuals, in the current work, the loss

assessment for the aggregated network was repeated without considering one or both types

of residuals (i.e., the inter-event and the intra-event residuals). The recurrence curves ob-

tained are shown in Figure C.9. The figure shows that the loss is significantly underesti-

mated if even one of the two types of residuals is not considered. This is to be expected

based on the previous observation that the contribution to large loss levels typically comes

from events of moderately large magnitude and large positive residuals rather than events

of extremely large magnitudes and zero residuals.

C.6 Transportation network performance under sample

scenario ground-motion maps

This section provides a graphical illustration of why residuals play an important part in

determining the losses to the transportation network. The performance of the network is

analyzed under three different ground-motion scenarios, namely, A, B and C. All three

scenarios result from an earthquake of magnitude 8.1 on the northern segment of the San

Andreas Fault, and have a mean intra-event residual of approximately zero. The value of

the inter-event residual equals 3.79 in scenario A, -1.64 in scenario B and 0 in scenario C.

Figure C.10 graphically shows the performance of the transportation network under the

three ground-motion scenarios. Thicker lines indicate links experiencing larger increases

in the travel times. It can be seen that the delays are much greater under scenario A than

under scenarios B and C. In fact, the travel time delay in the network equals 32,600 hours

under scenario A, 1,550 hours under scenario B and 4,580 hours under scenario C. The

significant differences are a result of the differences in the inter-event residual, since the

predicted median ground-motion intensities in all three cases are identical.

Figure C.10: Performance of the network under three different ground-motion scenarios corresponding to three different inter-event residuals. (a) η = 3.79, (b) η = -1.64 and (c) η = 0.

C.7 Conclusions

In this paper, a probabilistic simulation-based loss assessment procedure is coupled with

a deaggregation calculation that can identify the ground-motion scenarios most likely to

produce exceedance of a given loss threshold for a spatially-distributed lifeline system.

The deaggregation calculation quantifies the likelihood that various events (magnitudes,

faults, inter-event and intra-event residuals) could have produced the exceedance of a given

loss threshold. In the current work, deaggregation calculations are performed to identify

the likelihoods of earthquake events that cause various levels of travel time delays (the

lifeline loss measure) in an aggregated form of the San Francisco Bay Area transportation

network. The deaggregation calculations indicate that the ‘most-likely’ scenario depends

on the loss level of interest, and is influenced by factors such as the seismicity of the

region, the location of the lifeline with respect to the faults and the performance state

of the various components of the lifeline under normal operating conditions. In fact, for

certain loss levels, it is seen that two different events (different magnitudes and faults)

could have similar contributions to the loss exceedance making it impossible to identify

a single most-likely scenario earthquake. The deaggregation calculations also show that

large losses are typically caused by moderately large magnitude events with large values

of inter-event and intra-event residuals, indicating that it is very important to appropriately

account for the residuals in the loss assessment framework. Loss assessments carried out

without accounting for either the inter-event or the intra-event residuals produce highly

biased and incorrect loss estimates.

Bibliography

B.T. Aagaard, T.M. Brocher, D. Dolenc, D. Dreger, W. Graves, S. Harmsen, S. Hartzell,

S. Larsen, and M.L. Zoback. Ground-motion modeling of the 1906 San Francisco earth-

quake, part I: Validation using the 1989 Loma Prieta earthquake. Bulletin of the Seismo-

logical Society of America, 98(2):989–1011, 2008.

S.D. Aberson and J.L. Franklin. Impact on hurricane track and intensity forecasts of GPS

dropwindsonde observations from the first-season flights of the NOAA Gulfstream-IV

jet aircraft. Bulletin of the American Meteorological Society, 80(3):421–428, 1999.

N.A. Abrahamson. Statistical properties of peak ground accelerations recorded by the

SMART 1 array. Bulletin of the Seismological Society of America, 78(1):26–41, 1988.

N.A. Abrahamson. Seismic hazard assessment: Problems with current practice and fu-

ture developments. Keynote address in the First European Conference on Earthquake

Engineering and Seismology, Geneva, Switzerland, 2006.

N.A. Abrahamson and W.J. Silva. Summary of the Abrahamson & Silva NGA ground-motion

relations. Earthquake Spectra, 24(1):67–97, 2008.

N.A. Abrahamson and R.R. Youngs. A stable algorithm for regression analyses using the

random effects model. Bulletin of the Seismological Society of America, 82(1):505–510,

1992.

T. Adachi and B.R. Ellingwood. Serviceability of earthquake-damaged water systems: Ef-

fects of electrical power availability and power backup systems on system vulnerability.

Reliability Engineering and System Safety, 93:78–88, 2008.


T. Annaka, F. Yamazaki, and F. Katahira. Proposal of peak ground velocity and response

spectra based on JMA 87 type accelerometer records. Proceedings, 27th JSCE Earth-

quake Engineering Symposium, 1:161–164, 1997.

ASCE Standard 7-02. Minimum design loads for buildings and other structures. Technical

report, Reston (VA): American Society of Civil Engineering, 2003.

J.W. Baker. Quantitative classification of near–fault ground motions using wavelet analysis.

Bulletin of the Seismological Society of America, 97(5):1486–1501, 2007a.

J.W. Baker. Quantitative classification of near–fault ground motions using wavelet analysis.

Bulletin of the Seismological Society of America, 97(5):1486–1501, 2007b.

J.W. Baker and C.A. Cornell. Correlation of response spectral values for multicomponent

ground motions. Bulletin of the Seismological Society of America, 96(1):215–227, 2006.

J.W. Baker and N. Jayaram. Correlation of spectral acceleration values from NGA ground

motion models. Earthquake Spectra, 24(1):299–317, 2008.

J.W. Baker and N. Jayaram. Effects of spatial correlation of ground-motion parameters

for multi-site risk assessment: Collaborative research with Stanford University and

AIR. Technical report, Report for U.S. Geological Survey National

Earthquake Hazards Reduction Program (NEHRP) External Research Program Awards

07HQGR0031, 2009.

N. Basoz and A.S. Kiremidjian. Risk assessment for highway transportation systems. Tech-

nical report, Report No. 118, Blume Earthquake Engineering Center, Stanford Univer-

sity, 1996.

M.E. Batts, M.R. Cordes, L.R. Russel, J.R. Shaver, and E. Simiu. Hurricane wind speeds in

the United States. Technical report, Report No. BSS-124, National Bureau of Standards,

U.S. Department of Commerce, Washington, D.C., 1980.

P. Bazzurro. Personal communication, 2010.


P. Bazzurro and C.A. Cornell. Vector-valued probabilistic seismic hazard analysis (VP-

SHA). In Proceedings of the 7th U.S. National Conference on Earthquake Engineering,

Boston, MA, 2002.

P. Bazzurro and N. Luco. Effects of different sources of uncertainty and

correlation on earthquake-generated losses. Technical report, Presented at

IFED: International Forum on Engineering Decision Making, Stoos, Switzerland.

http://www.ifed.ethz.ch/events/Forum04/Bazzurro paper.pdf, 2004.

P. Bazzurro, J. Park, P. Tothong, and N. Jayaram. Effects of spatial correlation of ground-

motion parameters for multi-site risk assessment: Collaborative research with Stanford

University and AIR. Technical report, Report for U.S. Geological Survey National

Earthquake Hazards Reduction Program (NEHRP) External Research Program Awards

07HQGR0032, 2008.

M.J. Beckman, C.B. McGuire, and C.B. Winsten. Studies in the economics of transportation.

Technical report, Cowles Commission Monograph, New Haven, Conn.: Yale University

Press, 1956.

M. Bensi, A. Der Kiureghian, and D. Straub. A Bayesian network framework for post-

earthquake infrastructure performance assessment. In Proceedings, TCLEE2009 Con-

ference: Lifeline Earthquake Engineering in a Multihazard Environment, Oakland, Cal-

ifornia, 2009a.

M. Bensi, D. Straub, P. Friis-Hansen, and A. Der Kiureghian. Modeling infrastructure

system performance using BN. In 10th International Conference on Structural Safety

and Reliability (ICOSSAR09), Osaka, Japan, 2009b.

J.J. Bommer and N.A. Abrahamson. Why do modern probabilistic seismic-hazard analy-

ses often lead to increased hazard estimates? Bulletin of the Seismological Society of

America, 96(6):1967–1977, 2006.

J.J. Bommer, N.A. Abrahamson, F.O. Strasser, A. Pecker, P.Y. Bard, H. Bungum, F. Cotton,

D. Fah, F. Sabetta, F. Scherbaum, and J. Studer. The challenge of defining upper bounds

on earthquake ground motions. Seismological Research Letters, 75(1):82–95, 2004.


D.M. Boore and G.M. Atkinson. Ground-motion prediction equations for the average hor-

izontal component of PGA, PGV and 5% damped SA at spectral periods between 0.01s

and 10.0s. Earthquake Spectra, 24(1):99–138, 2008.

D.M. Boore, J.F. Gibbs, W.B. Joyner, J.C. Tinsley, and D.J. Ponti. Estimated ground motion

from the 1994 Northridge, California, earthquake at the site of the Interstate 10 and

La Cienega Boulevard bridge collapse, West Los Angeles, California. Bulletin of the

Seismological Society of America, 93(6), 2003.

D.M. Boore, J. Watson-Lamprey, and N.A. Abrahamson. Orientation-independent mea-

sures of ground motion. Bulletin of the Seismological Society of America, 96(4):1502–

1511, 2006.

R.D. Borcherdt. Estimates of site-dependent response spectra for design (methodology and

justification). Earthquake Spectra, 10:617–653, 1994.

L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. CART: Classification and Regression

Trees. Belmont, CA: Wadsworth, 1983.

D.R. Brillinger and H.K. Preisler. An exploratory analysis of the Joyner-Boore attenuation

data. Bulletin of the Seismological Society of America, 74:1441–1450, 1984a.

D.R. Brillinger and H.K. Preisler. Further analysis of the Joyner-Boore attenuation data.

Bulletin of the Seismological Society of America, 75:611–614, 1984b.

Bureau of Public Roads. Traffic assignment manual. U.S. Dept. of Commerce, Urban

Planning Division, Washington D.C., 1964.

K. Campbell. Personal Communication, 2009.

K.W. Campbell and Y. Bozorgnia. NGA ground motion model for the geometric mean hor-

izontal component of PGA, PGV, PGD and 5% damped linear elastic response spectra

for periods ranging from 0.01 to 10s. Earthquake Spectra, 24(1):139–171, 2008.

K.W. Campbell and H.A. Seligson. Quantitative method for developing hazard-consistent

earthquake scenarios. In proceedings of the 6th U.S. Conference and Workshop on Life-

line Earthquake Engineering, Long Beach, CA, 2003.


S. Castellaro, F. Mulargia, and P. L. Rossi. Vs30: Proxy for seismic amplification. Seismo-

logical Research Letters, 79(4):540–543, 2008.

CESMD database. Center for Engineering Strong Motion Data,

http://www.strongmotioncenter.org (last accessed 16 March 2010), 2008.

S. Chang. Evaluating disaster mitigations: Methodology for urban infrastructure systems.

Natural Hazards Review, 4(4):186–196, 2003.

S. Chang, M. Shinozuka, and K.E. Moore II. Probabilistic earthquake scenarios: extending

risk analysis methodologies to spatially distributed systems. Earthquake Spectra, 16:

557–572, 2000.

B. Chiou, R. Darragh, N. Gregor, and W. Silva. NGA project strong-motion database.

Earthquake Spectra, 24(1):23–44, 2008.

B.S-J. Chiou and R.R. Youngs. An NGA model for the average horizontal component of

peak ground motion and response spectra. Earthquake Spectra, 24(1):173–215, 2008.

C.A. Cornell. Engineering seismic risk analysis. Bulletin of the Seismological Society of

America, 58(5):1583–1606, 1968.

C.A. Cornell and H. Krawinkler. Progress and challenges in seismic performance assess-

ment. PEER Center News 2000; 3(2), 2000.

H. Crowley and J.J. Bommer. Modelling seismic hazard in earthquake loss models with

spatially distributed exposure. Bulletin of Earthquake Engineering, 4(3):249–273, 2006.

A.C. Davison and D.V. Hinkley. Bootstrap Methods and Their Application. Cambridge

University Press, 1997.

A.C. Davison, D.V. Hinkley, and E. Schechtman. Efficient bootstrap simulation. Biomet-

rica, 73(3):555–566, 1986.

G.G. Deierlein. Overview of a comprehensive framework for earthquake performance as-

sessment. Technical report, International Workshop on Performance-Based Seismic De-

sign Concepts and Implementation, Bled, Slovenia, 2004.


A. Der Kiureghian. A coherency model for spatially varying ground motions. Earthquake

Engineering and Structural Dynamics, 25:99–111, 1996.

A. Der Kiureghian. Seismic risk assessment and management of infrastructure systems:

Review and new perspectives. In 10th International Conference on Structural Safety

and Reliability (ICOSSAR09), Osaka, Japan, 2009.

C.V. Deutsch and A.G. Journel. Geostatistical Software Library and User’s Guide. Oxford

University Press, Oxford, New York, 1998.

L. Duenas-Osorio, J.I. Craig, B.J. Goodno, and A. Bostrom. Interdependent response of

networked systems. Journal of Infrastructure Systems, 13(3):185–194, 2005.

B. Efron. An introduction to the bootstrap. CRC Press LLC, 1998.

B. Efron and R.J. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall/ CRC,

1997.

G.S. Fishman. A First Course in Monte Carlo. Duxbury, Belmont, CA, 2006.

M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics

Quarterly, 3:95–110, 1956.

J.L. Franklin, S.J. Lord, S.E. Feuer, and F.D. Marks. The kinematic structure of Hurricane

Gloria (1985) determined from nested analyses of dropwindsonde and Doppler data.

Monthly Weather Review, 121:2433–2451, 1993.

J.H. Friedman. Greedy Function Approximation: A Gradient Boosting Machine. Technical

report, Stanford University, 1999.

J.H. Friedman. Tutorial: Getting Started with MART in R. http://www-stat.stanford.edu/~jhf/r-mart/tutorial/tutorial.pdf, 2002.

T.L. Friesz, D. Bernstein, T.E. Smith, R.L. Tobin, and B.W. Wie. A variational inequality

formulation of the dynamic network user equilibrium problem. Operations Research,

41:179–191, 1993.


A. Gersho and R.M. Gray. Vector Quantization and Signal Compression. Springer, 1991.

Global Vs30 map server. http://earthquake.usgs.gov/hazards/apps/vs30/ (last accessed 16

March 2010), 2008.

K. Goda and H.P. Hong. Spatial correlation of peak ground motions and response spectra.

Bulletin of the Seismological Society of America, 98(1):354–365, 2008.

P. Goovaerts. Geostatistics for Natural Resources Evaluation. Oxford University Press,

Oxford, New York, 1997.

R. Graves. Broadband ground motion simulations for the Puente hills thrust system. Report

for U.S. Geological Survey National Earthquake Hazards Reduction Program (NEHRP)

External Research Program Awards 05HQGR0076, 2006.

R. Graves. Broadband ground motion simulations for the Puente hills thrust system. Per-

sonal communication, 2007.

P. Grossi and H. Kunreuther. Catastrophe modeling: A new approach to managing risk.

New York: Springer, 2005.

S.D. Guikema. Natural disaster risk analysis for critical infrastructure systems: An ap-

proach based on statistical learning theory. Reliability Engineering and Systems Safety,

94:855–860, 2009.

P. Hall. Performance of balanced bootstrap resampling in distribution function and quantile

problems. Probability Theory and Related Fields, 85(2):239–260, 2005.

C. Hanson, M. McCann, C. Stevens, J. Rosenfield, P. Rawlings, T. Cooke, A. Fraser, and

K. Karkanen. Delta risk management strategy. Technical report, Department of Water

Resources, 2008.

T. Hayashi, S. Fukushima, and H. Yashiro. Effects of the spatial correlation in ground mo-

tion on the seismic risk of portfolio of buildings. First European conference on Earth-

quake engineering and Seismology, Geneva, Switzerland, 2006.


HAZUS. Earthquake loss estimation methodology. Technical manual. Prepared by the

National Institute of Building Sciences for Federal Emergency Management Agency,

1997.

HAZUS. Earthquake loss estimation technical manual. Technical report, National Institute

of Building Sciences, Washington D.C., 1999.

HAZUS. Multihazard loss estimation methodology: Hurricane model. Technical manual.

Prepared by the National Institute of Building Sciences for Federal Emergency Manage-

ment Agency, 2006.

N. Henze and B. Zirkler. A class of invariant consistent tests for multivariate normality.

Communications in Statistics-Theory and Methods, 19:3595–3618, 1990.

H.P. Hong, Y. Zhang, and K. Goda. Effect of spatial correlation on estimated ground-

motion prediction equations. Bulletin of the Seismological Society of America, 99(2A):

928–934, 2009.

S.H. Houston, W.A. Shaffer, M.D. Powell, and J. Chen. Comparisons of HRD and SLOSH

Surface Wind Fields in Hurricanes: Implications for Storm Surge Modeling. Weather

and Forecasting, 14:671–686, 1999.

B.R. Jarvinen, C.J. Neumann, and M.A.S. Davis. A tropical cyclone data tape for the North

Atlantic Basin 1886-1983: Contents, limitations and use. Technical report, NOAA Tech-

nical Memorandum No. NWS-NHC-22, U.S. Department of Commerce, Washington,

D.C., 1984.

N. Jayaram and J.W. Baker. Statistical tests of the joint distribution of spectral acceleration

values. Bulletin of the Seismological Society of America, 98(5):2231–2243, 2008.

N. Jayaram and J.W. Baker. Correlation model for spatially-distributed ground-motion in-

tensities. Earthquake Engineering and Structural Dynamics, 38(15):1687–1708, 2009a.

N. Jayaram and J.W. Baker. Deaggregation of lifeline risk: Insights for choosing deter-

ministic scenario earthquakes. In Proceedings, TCLEE2009 Conference: Lifeline Earth-

quake Engineering in a Multihazard Environment, Oakland, California, 2009b.


N. Jayaram and J.W. Baker. Efficient sampling and data reduction techniques for probabilis-

tic seismic lifeline risk assessment. Earthquake Engineering and Structural Dynamics

(published online), 2010.

R.A. Johnson and D.W. Wichern. Applied Multivariate Statistical Analysis. Prentice Hall,

Upper Saddle River, NJ, 2007.

W.B. Joyner and D.M. Boore. Methods for regression analysis of strong-motion data.

Bulletin of the Seismological Society of America, 83(2):469–487, 1993.

W.H. Kang, J. Song, and P. Gardoni. Matrix-based system reliability method and applica-

tions to bridge networks. Reliability Engineering & System Safety, 93(11):1584 – 1593,

2008.

KiK Net. http://www.kik.bosai.go.jp/ (last accessed 16 March 2010), 2007.

A.S. Kiremidjian, J. Moore, Y.Y. Fan, N. Basiz, O. Yazali, and M. Williams. PEER highway

demonstration project. In 6th US Conference and Workshop on Lifeline Earthquake

Engineering, TCLEE/ASCE, Monograph No.25, Long Beach, CA, 2003.

A.S. Kiremidjian, E. Stergiou, and R. Lee. Issues in seismic risk assessment of trans-

portation networks. Chapter 19, Earthquake Geotechnical Engineering, pages 939–964.

Springer, 2007.

S.L. Kramer. Geotechnical Earthquake Engineering. Prentice Hall, Upper Saddle River,

New Jersey, 1996.

M.H. Kutner, C.J. Nachtsheim, J. Neter, and W. Li. Applied Linear Statistical Models. The

McGraw-Hill Companies Inc., New York, 2005.

C.W. Landsea, C. Anderson, N. Charles, G. Clark, J. Dunion, J. Fernandez-Partagas,

P. Hungerford, C. Neumann, and M. Zimmer. The Atlantic hurricane database re-analysis

project: Documentation for 1851-1910 alterations and additions to the HURDAT

database. In Hurricanes and Typhoons: Past, Present and Future, edited by R.J.

Murnane and K.B. Liu, Columbia Univ. Press, NY, pages 177–221, 2004.


A.M. Law and W.D. Kelton. Simulation Modeling and Analysis. McGraw-Hill, 2007.

K.H. Lee and D.V. Rosowsky. Synthetic hurricane wind speed records: Development of

a database for hazard analyses and risk studies. Natural Hazards Review, 8(2):23–34,

2007.

R. Lee and A.S. Kiremidjian. Uncertainty and correlation for loss assessment of spatially

distributed systems. Earthquake Spectra, 23(4):743–770, 2007.

M.R. Legg, L.K. Nozick, and R.A. Davidson. Optimizing the selection of hazard-consistent

probabilistic scenarios for long-term regional hurricane loss estimation. Structural

Safety, 32:90–100, 2010.

E.L. Lehmann and G. Casella. Theory of Point Estimation. Springer, 2nd edition, 2003.

Y. Li and B.R. Ellingwood. Hurricane damage to residential construction in the US: Im-

portance of uncertainty modeling in risk assessment. Engineering Structures, 28:1009–

1018, 2009.

K. Mardia. Measures of multivariate skewness and kurtosis with applications. Biometrika,

57:519–530, 1970.

K.V. Mardia, J.T. Kent, and J.M. Bibby. Multivariate Analysis. Academic Press, 1979.

R.K. McGuire. Seismic Hazard and Risk Analysis. Earthquake Engineering Research

Institute, 2007.

J.B. MacQueen. Some methods for classification and analysis of multivariate observations.

In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probabil-

ity, Berkeley, CA, 1967.

C.J. Mecklin and D.J. Mundfrom. On using asymptotic critical values in testing for multivariate normality. InterStat, http://interstat.statjournals.net/YEAR/2003/abstracts/0301001.php (last accessed 16 March 2010), 2003.

K.V. Ooyama. Scale controlled objective analysis. Monthly Weather Review, 115:2479–2506, 1987.

J. Park, P. Bazzurro, and J.W. Baker. Modeling spatial correlation of ground motion intensity measures for regional seismic hazard and portfolio loss estimation. 10th International Conference on Application of Statistics and Probability in Civil Engineering (ICASP10), Tokyo, Japan, 2007.

PEER NGA Database. http://peer.berkeley.edu/nga (last accessed 16 March 2010), 2005.

J.C. Pinheiro and D.M. Bates. Mixed-effects models in S and S-PLUS. Springer, 2000.

M.D. Powell. Evaluations of diagnostic marine boundary layer models applied to hurricanes. Monthly Weather Review, 108:757–766, 1980.

M.D. Powell and S.H. Houston. Hurricane Andrew’s landfall in South Florida Part II: Surface wind fields and potential real-time applications. Weather and Forecasting, 11:329–349, 1996.

M.D. Powell and S.H. Houston. Surface wind fields of 1995 Hurricanes Erin, Opal, Luis, Marilyn, and Roxanne at landfall. Monthly Weather Review, 126(5):1259–1273, 1997.

M.D. Powell, S.H. Houston, and T.A. Reinhold. Hurricane Andrew’s landfall in South Florida Part I: Standardizing measurements for documentation of surface wind fields. Weather and Forecasting, 11:304–328, 1996.

M.D. Powell, S.H. Houston, L.R. Amat, and N. Morisseau-Leroy. The HRD real-time hurricane wind analysis system. Journal of Wind Engineering & Industrial Aerodynamics, 77 & 78:53–64, 1998.

C. Purvis. Peak spreading models: promises and limitations. In 7th TRB Conference on the Application of Transportation Planning Models, Boston, Massachusetts, 1999.

G.J. Rix, D. Werner, and L.M. Ivey. Seismic risk analyses for container ports. In Proceedings, TCLEE2009 Conference: Lifeline Earthquake Engineering in a Multihazard Environment, Oakland, California, 2009.

S.R. Searle. Linear Models. John Wiley and Sons, Inc., 1977.

N. Shiraki, M. Shinozuka, J.E. Moore II, S.E. Chang, H. Kameda, and S. Tanaka. System risk curves: Probabilistic performance scenarios for highway networks subject to earthquake damage. Journal of Infrastructure Systems, 13(1):43–54, 2007.

N. Shome and C.A. Cornell. Probabilistic seismic demand analysis of nonlinear structures. Report No. 35, RMS Program, Stanford, CA. www.stanford.edu/group/rms (last accessed 16 March 2010), 1999.

P.G. Somerville, N.F. Smith, R.W. Graves, and N.A. Abrahamson. Modification of empirical strong ground motion attenuation relations to include the amplitude and duration effects of rupture directivity. Seismological Research Letters, 68(1):199–222, 1997.

E. Stergiou and A.S. Kiremidjian. Treatment of uncertainties in seismic risk analysis of transportation systems. Technical report, No. 154, Blume Earthquake Engineering Center, Stanford University, 2006.

F.O. Strasser, J.J. Bommer, and N.A. Abrahamson. Truncation of the distribution of ground-motion residuals. Journal of Seismology, 12(1):79–105, 2008.

D. Straub and A. Der Kiureghian. Improved seismic fragility modeling from empirical data. Structural Safety, 30:320–336, 2008.

S. Tanaka, M. Shinozuka, A. Schiff, and Y. Kawata. Lifeline seismic performance of electric power systems during the Northridge earthquake. In Proceedings of the Northridge Earthquake Research Conference, Los Angeles, California, 1997.

USGS. Earthquake probabilities in the San Francisco bay region: 2002-2031. Technical report, Open File Report 03-214, USGS, 2003.

D. Vamvatsikos and C.A. Cornell. Developing efficient scalar and vector intensity measures for IDA capacity estimation by incorporating elastic spectral shape information. Earthquake Engineering and Structural Dynamics, 34(13):1573–1600, 2005.

P.J. Vickery, P.F. Skerlj, A.C. Steckley, and L.A. Twisdale. Hurricane wind field model for use in hurricane simulation. Journal of Structural Engineering, 126(10):1203–1221, 2000a.

P.J. Vickery, P.F. Skerlj, and L.A. Twisdale. Simulation of hurricane risk in the U.S. using empirical track model. Journal of Structural Engineering, 126(10):1222–1237, 2000b.

P.J. Vickery, D. Wadhera, M.D. Powell, and Y. Chen. A hurricane boundary layer and wind field model for use in engineering applications. Journal of Applied Meteorology, 48:381–405, 2008.

P.J. Vickery, P.F. Skerlj, J. Lin, L.A. Twisdale Jr., M.A. Young, and F.M. Lavelle. HAZUS-MH hurricane model methodology. II: Damage and loss estimation. Natural Hazards Review, 7(2):94–103, 2009a.

P.J. Vickery, D. Wadhera, L.A. Twisdale Jr., and F.M. Lavelle. U.S. hurricane wind speed risk and uncertainty. Journal of Structural Engineering, 135(3):301–320, 2009b.

M.A. Walling. Non-Ergodic Probabilistic Seismic Hazard Analysis and Spatial Simulation of Variation in Ground Motion. PhD thesis, University of California at Berkeley, 2009.

M. Wang and T. Takada. Macrospatial correlation model of seismic ground motions. Earthquake Spectra, 21(4):1137–1156, 2005.

S.D. Werner, C.E. Taylor, J.E. Moore, J.S. Walton, and S. Cho. A risk-based methodology for assessing the seismic performance of highway systems. Technical report, Multidisciplinary Center for Earthquake Engineering Research, University at Buffalo, Buffalo, 2000.

S.D. Werner, J.P. Lavoie, C. Eitzel, S. Cho, C.K. Huyck, S. Ghosh, R.T. Eguchi, C.E. Taylor, and J.E. Moore. REDARS 1: Demonstration software for seismic risk analysis of highway systems. Technical report, Multidisciplinary Center for Earthquake Engineering Research (MCEER), University at Buffalo, Buffalo, 2004.

R.L. Wesson, D.M. Perkins, N. Luco, and E. Karaca. Direct calculation for the probability distribution for earthquake losses to a portfolio. Earthquake Spectra, 25(3):687–706, 2009.

R.R. Youngs and K.J. Coppersmith. Implications of fault slip rates and earthquake recurrence models to probabilistic seismic hazard estimates. Bulletin of the Seismological Society of America, 75(4):939–964, 1985.

A. Zerva and V. Zervas. Spatial variation of seismic ground motions. Applied Mechanics Reviews, 55(3):271–297, 2002.
