Department of Civil and Environmental Engineering
Stanford University
Report No.
The John A. Blume Earthquake Engineering Center was established to promote research and education in earthquake engineering. Through its activities, our understanding of earthquakes and their effects on mankind’s facilities and structures is improving. The Center conducts research, provides instruction, publishes reports and articles, conducts seminars and conferences, and provides financial support for students. The Center is named for Dr. John A. Blume, a well-known consulting engineer and Stanford alumnus.
Address:
The John A. Blume Earthquake Engineering Center
Department of Civil and Environmental Engineering
Stanford University
Stanford CA 94305-4020
(650) 723-4150
(650) 725-9755 (fax)
[email protected]
http://blume.stanford.edu
©2010 The John A. Blume Earthquake Engineering Center
© Copyright by Nirmal Jayaram 2010
All Rights Reserved
Abstract
Lifelines are large, geographically-distributed systems that are essential support systems
for any society. Probabilistic seismic risk assessment for lifelines is less straightforward
than for individual structures, due to challenges in quantifying the ground-motion hazard
over a region rather than at just a single site and in developing a risk assessment frame-
work that deals with the heavy computational burden associated with lifeline performance
evaluations.
Quantification of the regional ground-motion hazard requires information on the joint
distribution of ground-motion intensities at multiple sites. Statistical tests are used here
to examine the commonly-used assumptions of univariate normality of logarithmic inten-
sities and multivariate normality of spatially-distributed logarithmic intensities. Further,
observed and simulated ground-motion time histories are used to estimate the spatial cor-
relation between intra-event residuals, which can be used to parameterize the joint distribu-
tion of the ground-motion intensities. Factors that affect the decay of the correlation with
increasing separation distance are identified.
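Spatial correlation of this kind is commonly estimated with an experimental semivariogram: half the mean squared difference between residuals at pairs of sites, grouped into separation-distance bins. The Python sketch below uses the classical (Matheron) estimator and, as an assumption here, an exponential correlation model ρ(h) = exp(−3h/r) with practical range r (the caption of Figure 3.3 refers to a fitted exponential semivariogram). The function names, the (x, y)-in-km input format, and the fixed-width binning are illustrative choices, not the thesis's implementation.

```python
import math
from itertools import combinations


def experimental_semivariogram(coords, residuals, bin_width, n_bins):
    """Classical (Matheron) estimator, hypothetical helper:
    gamma(h_k) = sum over site pairs in bin k of (e_i - e_j)^2 / (2 * N_k).

    coords: list of (x, y) site coordinates in km (assumed input format)
    residuals: normalized intra-event residuals at those sites
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(coords)), 2):
        h = math.dist(coords[i], coords[j])  # separation distance
        k = int(h // bin_width)              # index of the distance bin
        if k < n_bins:                       # ignore pairs beyond the last bin
            sums[k] += (residuals[i] - residuals[j]) ** 2
            counts[k] += 1
    centers = [(k + 0.5) * bin_width for k in range(n_bins)]
    gammas = [s / (2 * c) if c else float("nan") for s, c in zip(sums, counts)]
    return centers, gammas, counts


def exponential_correlation(h, range_km):
    """Assumed exponential model rho(h) = exp(-3h / range_km); rho drops to
    about 5% at the practical range."""
    return math.exp(-3.0 * h / range_km)


if __name__ == "__main__":
    sites = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
    eps = [0.0, 1.0, 3.0]
    print(experimental_semivariogram(sites, eps, bin_width=2.0, n_bins=2))
    # -> ([1.0, 3.0], [0.5, 3.25], [1, 2])
```

For residuals normalized to unit variance and assumed second-order stationary, γ(h) = 1 − ρ(h), so a semivariogram fitted to the binned values translates directly into a correlation model.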
The study then develops a computationally-efficient lifeline risk assessment framework
based on efficient sampling and data reduction techniques. The framework can be used
for developing a small, but stochastically representative, catalog of spatially-correlated
ground-motion intensity maps that can be used for performing lifeline risk assessments.
The catalog is used to evaluate the exceedance rates of various travel-time delays on an
aggregated (higher-scale) model of the San Francisco Bay Area transportation network.
The risk estimates obtained are consistent with those obtained using conventional Monte
Carlo simulation (MCS), which requires three orders of magnitude more ground-motion
intensity maps. Therefore, the proposed technique can be used to drastically reduce the
computational expense of an MCS-based risk assessment without compromising the
accuracy of the risk estimates. Further, the catalog of ground-motion intensity maps is used in
conjunction with a statistical learning technique termed Multiple Additive Regression
Trees (MART) to obtain an approximate relationship between the ground-motion
intensities at lifeline component locations and the lifeline performance. The lifeline
performance predicted by this relationship can be used in place of the actual lifeline
performance in problems whose computational demand stems from the need for repeated
lifeline performance evaluations.
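The efficiency gain of importance sampling (IS) over conventional MCS comes from drawing samples from a density shifted toward the rare, damaging outcomes and reweighting each sample by the likelihood ratio. The minimal Python sketch below shows only this weighting idea for a single standard normal residual, with a hypothetical mean shift of 3; the rule the thesis uses to choose the shift (Figure 6.1(c)), the magnitude sampling, and the K-means reduction step are not reproduced. The function names are illustrative.

```python
import math
import random


def normal_tail_is(threshold, shift, n, seed=0):
    """IS estimate of P(Z > threshold) for Z ~ N(0, 1) (hypothetical helper).

    Samples come from the shifted density N(shift, 1) and are weighted by the
    likelihood ratio phi(x) / phi(x - shift) = exp(-shift * x + shift**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            total += math.exp(-shift * x + 0.5 * shift ** 2)
    return total / n


def normal_tail_mc(threshold, n, seed=0):
    """Plain Monte Carlo estimate of the same probability, for comparison."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


if __name__ == "__main__":
    exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # about 1.35e-3
    print("exact:", exact)
    print("IS   :", normal_tail_is(3.0, shift=3.0, n=20_000))
    print("MCS  :", normal_tail_mc(3.0, n=20_000))
```

With 20,000 samples, plain MCS expects only about 27 exceedances of the threshold, whereas roughly half of the shifted samples exceed it and contribute weighted information, so the IS estimate typically has a much smaller coefficient of variation. In a lifeline setting, analogous weights would multiply each simulated map's contribution to the exceedance-rate sum.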
Even though the above-mentioned risk assessment framework facilitates the considera-
tion of spatial correlation between ground-motion intensities, current ground-motion mod-
els (e.g., NGA ground-motion models) that are used to predict the distribution of ground-
motion intensities at individual sites are fitted assuming independence between the intra-
event residuals. This study proposes a method to consider the spatial correlation in the
mixed-effects regression procedure used for fitting ground-motion models, and empirically
shows that the risk estimates of spatially-distributed systems can be inaccurate when using
ground-motion models fitted without consideration of spatial correlation.
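In the mixed-effects form, the total residual at site i for one earthquake is the sum of an inter-event residual shared by all sites (standard deviation τ) and an intra-event residual (standard deviation σ). The conventional independence assumption gives a total-residual covariance of τ²J + σ²I, where J is a matrix of ones; allowing spatial correlation replaces I with a correlation matrix R, R_ij = ρ(h_ij). The Python sketch below builds both matrices; the exponential form of ρ, the parameter values, and the helper names are assumptions for illustration, and the code is not the regression algorithm itself.

```python
import math


def total_residual_covariance(coords, sigma, tau, range_km=None):
    """Covariance of total residuals for one earthquake (hypothetical helper).

    Cov_ij = tau^2 + sigma^2 * rho(h_ij). With range_km=None the intra-event
    residuals are independent (rho = 1 if i == j else 0); otherwise an assumed
    exponential model rho(h) = exp(-3h / range_km) is used.
    """
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if range_km is None:
                rho = 1.0 if i == j else 0.0
            else:
                rho = math.exp(-3.0 * math.dist(coords[i], coords[j]) / range_km)
            cov[i][j] = tau ** 2 + sigma ** 2 * rho
    return cov


def variance_of_mean(cov):
    """Variance of the site-averaged total residual: 1' C 1 / n^2."""
    n = len(cov)
    return sum(sum(row) for row in cov) / n ** 2


if __name__ == "__main__":
    sites = [(0.0, 0.0), (10.0, 0.0)]  # two hypothetical stations 10 km apart
    indep = total_residual_covariance(sites, sigma=0.5, tau=0.3)
    corr = total_residual_covariance(sites, sigma=0.5, tau=0.3, range_km=30.0)
    print(variance_of_mean(indep), variance_of_mean(corr))
```

Because correlated intra-event residuals at nearby stations do not average out, the event-averaged residual varies more than the independence assumption implies. This is one way to see why the estimates of σ and τ can change when spatial correlation is included in the fit.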
Finally, the study investigates the extension of the seismic hazard and risk assessment
concepts discussed earlier to hurricane hazard and risk modeling. The focus is
on quantifying the uncertainties and the spatial correlation in hurricane wind fields (us-
ing the same techniques that are used to quantify these parameters in earthquake ground-
motion fields), and evaluating their impact on the hurricane risk of spatially-distributed
systems. The results show that the uncertainties and the spatial correlation in the wind
fields must be modeled in order to avoid introducing errors into the risk calculations of
spatially-distributed systems. The results further show that the tools developed in this thesis
for seismic risk assessment can be applied to risk assessments that consider other
hazards.
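The effect of spatial correlation on the risk of a spatially-distributed system can be seen in a small example: the spread of the sum of unit-variance residuals over a few closely spaced sites. The Python sketch below simulates correlated residuals through a Cholesky factor of an assumed exponential correlation matrix and compares the variance of the sum with the independent case. The five-site layout at 1 km spacing (loosely echoing the five-building portfolio of Figure 9.8), the 20 km range, the sample size, and all function names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_sums(coords, range_km, n_sims, seed=0):
    """Sample the sum of unit-variance residuals over all sites (hypothetical helper).

    range_km=None -> independent residuals; otherwise an assumed exponential
    correlation rho(h) = exp(-3h / range_km) is imposed via z = L u.
    """
    n = len(coords)
    if range_km is None:
        corr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    else:
        corr = [[math.exp(-3.0 * math.dist(p, q) / range_km) for q in coords]
                for p in coords]
    low = cholesky(corr)
    rng = random.Random(seed)
    sums = []
    for _ in range(n_sims):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent N(0, 1) draws
        z = [sum(low[i][k] * u[k] for k in range(i + 1)) for i in range(n)]
        sums.append(sum(z))
    return sums


def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


if __name__ == "__main__":
    sites = [(float(k), 0.0) for k in range(5)]  # hypothetical 1 km spacing
    v_ind = sample_variance(simulate_sums(sites, None, 5000))
    v_cor = sample_variance(simulate_sums(sites, 20.0, 5000))
    print(v_ind, v_cor)
```

The theoretical variance of the sum is the sum of all ρ_ij: 5 for independent sites and roughly 20 for this layout. Treating the residuals as independent would therefore understate the chance that all sites experience large residuals at the same time, which is the kind of error in portfolio risk that the abstract attributes to ignoring spatial correlation.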
Acknowledgments
This work was supported by the Stanford Graduate Fellowship and the U.S. Geological
Survey (USGS) via External Research Program awards 07HQGR0031 and 07HQGR0032.
Any opinions, findings, and conclusions or recommendations expressed in this material are
those of the authors and do not necessarily reflect those of the USGS.
The report was originally published as the Ph.D. dissertation of the first author. The au-
thors would like to thank Professors Anne Kiremidjian, Sarah Billington, Kincho Law, Eric
Dunham, Jerome Friedman and Dr. Paolo Bazzurro for providing constructive feedback on
this work.
Contents
Abstract
Acknowledgments
1 Introduction
1.1 Motivation
1.2 Areas of contribution
1.2.1 Multi-site hazard modeling
1.2.2 Lifeline risk assessment
1.3 Organization
2 Statistical Tests of the Joint Distribution of Spectral Acceleration Values
2.1 Abstract
2.2 Introduction
2.3 Testing the univariate normality of residuals
2.3.1 Results and discussion
2.4 Testing the assumption of multivariate normality for random vectors using independent samples
2.4.1 Henze-Zirkler test
2.4.2 Mardia’s measures of kurtosis and skewness
2.4.3 Results and discussion
2.5 Testing the assumption of multivariate normality for spatially distributed data
2.5.1 Check for bivariate normality
2.5.2 Results and discussion
2.6 Conclusions
2.7 Data source
2.8 Appendix: Normal score transform
3 Correlation model for spatially distributed ground-motion intensities
3.1 Abstract
3.2 Introduction
3.3 Modeling correlations using semivariograms
3.4 Computation of semivariogram ranges for intra-event residuals using empirical data
3.4.1 Construction of experimental semivariograms using empirical data
3.4.2 1994 Northridge earthquake recordings
3.4.3 1999 Chi-Chi earthquake
3.4.4 Other earthquakes
3.4.5 A predictive model for spatial correlations
3.5 Isotropy of semivariograms
3.5.1 Isotropy of intra-event residuals
3.5.2 Construction of a directional semivariogram
3.5.3 Test for anisotropy using Northridge ground motion data
3.6 Comparison with previous research
3.7 Conclusions
4 Spatial correlation between spectral accelerations using simulated ground-motion time histories
4.1 Abstract
4.2 Introduction
4.3 Statistical estimation of spatial correlation
4.4 Results and discussion
4.4.1 Effect of ground-motion component orientation on the semivariogram range
4.4.2 Testing the assumption of isotropy using directional semivariograms
4.4.3 Testing the assumption of second-order stationarity
4.4.4 Effect of directivity on spatial correlation
4.5 Conclusions
5 Simulation of spatially-correlated ground-motion intensities with and without consideration of recorded intensity values
5.1 Abstract
5.2 Introduction
5.3 Simulation of correlated residuals without consideration of recorded ground motion intensities
5.3.1 Single-step simulation technique
5.3.2 Sequential simulation technique
5.4 Importance sampling of normalized intra-event residuals
5.5 Sequential simulation of correlated residuals with consideration of recorded ground motion intensities
5.6 Conclusions
5.7 Appendix: The conditional sequential simulation of heteroscedastic normalized residuals
6 Efficient sampling and data reduction techniques for probabilistic seismic lifeline risk assessment
6.1 Abstract
6.2 Introduction
6.3 Simulation of ground-motion intensity maps using importance sampling
6.3.1 Importance sampling procedure
6.3.2 Simulation of earthquake catalogs
6.3.3 Simulation of normalized intra-event residuals
6.3.4 Simulation of normalized inter-event residuals
6.4 Lifeline risk assessment
6.4.1 Risk assessment based on realizations from Monte Carlo simulation
6.4.2 Risk assessment based on realizations from importance sampling
6.5 Data reduction using K-means clustering
6.6 Application: Seismic risk assessment of the San Francisco Bay Area transportation network
6.6.1 Network data
6.6.2 Transportation network loss measure
6.6.3 Ground-motion hazard
6.6.4 Results and discussion
6.6.5 Importance of modeling ground-motion uncertainties and spatial correlations
6.7 Conclusions
6.8 Appendix: Proof that the exceedance rates obtained using IS and K-means clustering are unbiased
6.9 Appendix: Improving the computational efficiency of the K-means clustering method
7 Lifeline performance assessment using statistical learning techniques
7.1 Abstract
7.2 Introduction
7.3 Brief introduction to ground-motion map sampling
7.3.1 Conventional MCS of ground-motion maps
7.3.2 Importance sampling of ground-motion maps
7.3.3 K-means clustering
7.4 Confidence intervals for lifeline risk estimates
7.4.1 Network data
7.4.2 Ground-motion hazard data
7.4.3 Statistical description of the problem
7.4.4 Confidence intervals using bootstrap
7.4.5 Approximate loss estimation using non-parametric regression
7.4.6 Bootstrap confidence intervals estimated using the exact and the approximate loss functions
7.5 Conclusions
8 Seismic risk assessment of spatially distributed systems using ground motion models fitted considering spatial correlation
8.1 Abstract
8.2 Introduction
8.2.1 Current regression algorithm
8.2.2 Should spatial correlation be considered in the regression algorithm?
8.3 Regression algorithm for mixed-effects models considering spatial correlation
8.3.1 Covariance matrix for the total residuals
8.3.2 Obtaining inter-event residuals from total residuals
8.3.3 Algorithm summary
8.3.4 Large sample standard errors of σ and τ
8.3.5 Mixed-effects regression procedure in R
8.4 Results and discussion
8.4.1 Standard deviation of residuals as a function of period
8.4.2 Estimates of spatial correlation
8.4.3 Risk assessment for a hypothetical portfolio of buildings
8.5 Conclusions
9 Hurricane risk assessment of spatially-distributed systems with consideration of wind-field uncertainties and spatial correlation
9.1 Abstract
9.2 Introduction
9.3 Spatial correlation estimation methodology
9.4 Results and Discussion
9.4.1 Data source
9.4.2 Hurricane Jeanne (2004)
9.4.3 Hurricane Frances (2004)
9.4.4 Hurricane risk assessment of a hypothetical portfolio of buildings
9.5 Limitations and research needs
9.6 Conclusions
10 Conclusions
10.1 Contributions and practical implications
10.1.1 Joint distribution of spectral acceleration values at different sites and/or different periods
10.1.2 Spatial correlation model for spectral accelerations
10.1.3 Lifeline seismic risk assessment using efficient sampling and data reduction techniques
10.1.4 Lifeline performance assessment using statistical learning techniques
10.1.5 Seismic risk assessment of spatially-distributed systems using ground-motion models fitted considering spatial correlation
10.1.6 Extension of proposed ground-motion modeling approaches to hurricane risk assessment
10.2 Limitations and future work
10.2.1 Spatial correlation model for spectral accelerations
10.2.2 Lifeline risk assessment
10.2.3 Risk management
10.2.4 Multi-hazard risk assessment
10.3 Concluding remarks
A Characterizing spatial cross-correlation between ground-motion spectral accelerations at multiple periods
A.1 Abstract
A.2 Introduction
A.3 Statistical Estimation of Spatial Cross-Correlation
A.4 Sample Results and Discussion
A.5 Conclusions
B Supporting details for the spatial correlation model developed in Chapter 3
B.1 Semivariograms of residuals estimated using the Northridge earthquake ground motions
B.2 Semivariograms of residuals estimated using Chi-Chi earthquake ground motions
B.2.1 Exact versus approximate semivariogram fit
B.2.2 Semivariograms of the residuals at seven periods ranging between 0 and 10s
B.3 Semivariograms of residuals estimated using broadband simulations for scenario earthquakes on the Puente Hills thrust fault system
B.4 Clustering of Vs30’s
B.5 Correlation between near-fault ground-motion intensities
B.6 Directional semivariograms estimated using the Northridge and the Chi-Chi earthquake records at various periods
C Deaggregation of lifeline risk: Insights for choosing deterministic scenario earthquakes
C.1 Abstract
C.2 Introduction
C.3 Deaggregation of seismic loss
C.4 Loss assessment for the San Francisco Bay Area transportation network
C.5 Results and Discussion
C.5.1 Contribution of magnitudes and faults to the lifeline losses
C.5.2 Contribution of inter- and intra-event residuals to the lifeline loss
C.6 Transportation network performance under sample scenario ground-motion maps
C.7 Conclusions
Bibliography
List of Tables
2.1 Tests on normalized intra-event residuals computed at different periods
2.2 Tests on inter-event residuals computed at different periods
2.3 Tests on residuals corresponding to two orthogonal directions (fault-normal and fault-parallel directions)
8.1 Regression coefficients for estimating median Sa(1s)
8.2 Standard deviations of residuals corresponding to Sa(1s)
List of Figures
1.1 Comparison of the risk assessment frameworks for (a) single structures and (b) lifelines.
1.2 1999 Chi-Chi earthquake: (a) recorded PGAs (b) median PGAs predicted by the Boore and Atkinson [2008] ground-motion model (c) normalized total residuals computed using Equation 1.2.
1.3 2004 Hurricane Jeanne (the line indicates the hurricane track): (a) recorded wind speeds (b) wind speeds predicted by the Batts et al. [1980] wind-speed model (c) wind-speed residuals.
1.4 Ground-motion intensity simulation for a magnitude 8 earthquake on the San Andreas fault: (a) median intensities obtained using the Boore and Atkinson [2008] ground-motion model (b) simulated values of the normalized total residuals (c) total intensities.
2.1 The normal Q-Q plots of the normalized intra-event residuals at four different periods. (a) T = 0.5 seconds (1560 samples) (b) T = 1.0 seconds (1548 samples) (c) T = 2.0 seconds (1498 samples) (d) T = 10.0 seconds (507 samples).
2.2 The histogram of the 12,194 pooled normalized intra-event residuals computed at 10 periods, with the theoretical standard normal distribution superimposed.
2.3 The normal Q-Q plot of the pooled set of normalized intra-event residuals.
2.4 The normal Q-Q plots of inter-event residuals at four different periods. (a) T = 0.5 seconds (64 samples) (b) T = 1.0 seconds (64 samples) (c) T = 2.0 seconds (62 samples) (d) T = 10.0 seconds (21 samples).
2.5 Theoretical and empirical semivariograms for residuals computed at 2 seconds: (a) results for the 0.1 quantile of the residuals from the Chi-Chi data (b) results for the 0.25 quantile of the residuals from the Chi-Chi data (c) results for the 0.5 quantile of the residuals from the Chi-Chi data (d) results for the 0.25 quantile of the residuals from the Northridge data.
2.6 Theoretical and empirical semivariograms for the 0.25 quantile of the residuals: (a) results for the residuals computed at 0.5s from the Northridge data (b) results for the residuals computed at 0.5s from the Chi-Chi data (c) results for the residuals computed at 1s from the Chi-Chi earthquake data (d) results for the residuals computed at 5s from the Chi-Chi data.
3.1 (a) Parameters of a semivariogram (b) Semivariograms fitted to the same data set using the manual approach and the method of least squares.
3.2 Range of semivariograms of ε, as a function of the period at which ε values are computed: (a) the residuals are obtained using the Northridge earthquake data (b) the residuals are obtained using the Chi-Chi earthquake data.
3.3 (a) Experimental semivariogram obtained using normalized Vs30’s at the recording stations of the Northridge earthquake. No semivariogram is fitted on account of the extreme scatter (b) Experimental semivariogram obtained using normalized Vs30’s at the recording stations of the Chi-Chi earthquake. The range of the fitted exponential semivariogram equals 25 km.
3.4 Range of semivariograms of ε, as a function of the period at which ε values are computed. The residuals are obtained using the: (a) Big Bear City earthquake data (b) Parkfield earthquake data (c) Alum Rock earthquake data (d) Anza earthquake data (e) Chino Hills earthquake data.
3.5 Ranges of residuals computed using PGAs versus ranges of normalized Vs30 values.
3.6 (a) Range of semivariograms of ε, as a function of the period at which ε values are computed. The residuals are obtained from six different sets of time histories as shown in the figure; (b) Range of semivariograms of ε predicted by the proposed model as a function of the period.
3.7 (a) Parameters of a directional semivariogram. Subfigures (b), (c) and (d) show experimental directional semivariograms at discrete separations obtained using the Northridge earthquake ε values computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (b) azimuth = 0° (c) azimuth = 45° (d) azimuth = 90°.
3.8 Semivariogram obtained using residuals computed based on Chi-Chi earthquake peak ground velocities: (a) residuals from Annaka et al. [1997] and semivariogram model from Wang and Takada [2005] (b) residuals from Annaka et al. [1997] and semivariogram fitted to model the discrete values well at short separation distances (c) residuals from Annaka et al. [1997], considering random amplification factors.
4.1 Semivariogram computed using the Sa(T=2s) residuals.
4.2 Ranges of semivariograms obtained using residuals computed from the (a) 1989 Loma Prieta simulations (b) recorded ground motions [Jayaram and Baker, 2009a].
4.3 (a) Ranges are computed using residuals at different orientations (b) Omni-directional (i.e., obtained using all pairs of points, irrespective of the azimuth) and directional semivariograms computed using residuals for Sa(2s).
4.4 (a) Ranges are computed using residuals from different spatial domains (b) Ranges are computed using pulse-like and non-pulse-like near fault ground motions.
5.1 Ground-motion intensity map simulation: (a) median intensities (b) spatially correlated normalized total residuals and (c) total intensities.
5.2 Illustration of the sequential step procedure.
5.3 The alternate sampling distribution (marginal distribution) used for the importance sampling of residuals [Jayaram and Baker, 2010].
6.1 Importance sampling density functions for: (a) magnitude and (b) normalized intra-event residual; (c) recommended mean-shift as a function of the average number of sites and the average site-to-site distance normalized by the range of the spatial correlation model.
6.2 (a) San Francisco Bay Area transportation network (b) Aggregated network.
6.3 (a) Travel-time delay exceedance curves (b) Coefficient of variation of the annual exceedance rate (c) Comparison of the efficiency of MCS, IS and the combination of K-means and IS (d) Travel-time delay exceedance curve obtained using the K-means method.
6.4 (a) Mean of travel-time delays within a cluster (b) Standard deviation of travel-time delays within a cluster. With both clustering methods, cluster numbers are assigned in order of increasing mean travel-time delay within the cluster for plotting purposes.
6.5 Comparison of site hazard curves obtained at two sample sites using the sampling framework with those obtained using numerical integration. (a) Sample site 1 and (b) Sample site 2.
6.6 Exceedance curves obtained using simplifying assumptions.
6.7 Travel-time delay exceedance curve obtained using the two-step clustering technique.
7.1 Sample ground-motion map corresponding to an earthquake on the San Andreas fault. A map is a collection of ground movement levels (ground-motion intensities) at all the sites of interest. The sites of interest, in this case, are located in the San Francisco Bay Area.
7.2 (a) Stratified sampling of earthquake magnitudes (b) Importance sampling of residuals.
7.3 Four simulated ground-motion maps, two of which are reasonably similar and grouped together into one cluster.
7.4 (a) The San Francisco Bay Area transportation network (b) Aggregated model.
7.5 Exceedance rates of travel-time delays.
7.6 (a) Predicted vs. exact delay values (b) Prediction residuals.
7.7 (a) A LOESS fit to the prediction residuals (b) Predicted and exact delay values after bias correction.
7.8 Two sample exceedance curves obtained using the exact and the approximate loss functions (after bias correction).
7.9 (a) Residuals from the prediction model (b) Residuals normalized (divided) by the predicted delays.
7.10 Normal Q-Q plot of the residuals.
7.11 MART model fitted using 150 MCS maps.
7.12 Methodology for estimating bootstrap confidence intervals for the loss curves.
7.13 1000 bootstrapped exceedance curves obtained using the (a) exact loss function (b) approximate loss function.
7.14 Bootstrap confidence intervals.
7.15 Bootstrap confidence intervals.
7.16 Balanced bootstrap confidence intervals.
8.1 Comparison of predicted median Sa(1s) values obtained using the CB08 model fitted with and without the consideration of spatial correlation: (a) linear scale (b) log scale.
8.2 Effect of spatial correlation on: (a) estimated intra-event residual standard deviation (σ), (b) estimated inter-event residual standard deviation (τ), (c) estimated total residual standard deviation, (d) ratio of inter-event residual standard deviation to total residual standard deviation.
8.3 Risk assessment results for a hypothetical portfolio of buildings performed using ground-motion models developed with and without the proposed refinement.
9.1 Hurricane Jeanne: (a) Observed wind speeds (b) Predicted wind speeds (c) Residuals (d) Bias-corrected residuals.
9.2 Residuals and bias-corrected residuals versus closest distances from the hurricane track.
9.3 (a) Histogram of bias-corrected residuals estimated using the Hurricane
Jeanne data (b) Normal QQ plot of normalized bias-corrected residuals
from Hurricane Jeanne. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
9.4 Semivariogram of bias-corrected residuals estimated using the Hurricane
Jeanne data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
9.5 Bias-corrected residuals estimated using the Hurricane Frances data. . . . . 199
9.6 (a) Histogram of bias-corrected residuals estimated using the Hurricane
Frances data (b) Normal QQ plot of normalized bias-corrected residuals
from Hurricane Frances. . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
9.7 Semivariogram of bias-corrected residuals estimated using the Hurricane
Frances data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.8 Portfolio of five residential buildings considered in the risk assessment. . . 201
9.9 Portfolio loss exceedance probabilities. . . . . . . . . . . . . . . . . . . . 203
10.1 Comparison of the risk assessment frameworks for (a) single structures and
(b) lifelines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
A.1 (a) The San Francisco Bay Area transportation network and (b) Annual
exceedance rates of various travel time delays on that network (results from
Jayaram and Baker [2010]). . . . . . . . . . . . . . . . . . . . . . . . . . . 225
A.2 (a) Chi-Chi earthquake normalized residuals computed using spectral ac-
celerations at 1 second (b) Chi-Chi earthquake normalized residuals com-
puted using spectral accelerations at 2 seconds (c) Cross-semivariogram
estimated using the 1s and 2s Chi-Chi earthquake residuals. . . . . . . . . . 229
B.1 Semivariogram of ε based on the peak ground accelerations observed during
the Northridge earthquake . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
B.2 Semivariogram of ε computed at 0.5 seconds based on the Northridge
earthquake data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
B.3 Semivariogram of ε computed at 1 second based on the Northridge earth-
quake data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
B.4 Semivariogram of ε computed at 2 seconds based on the Northridge earth-
quake data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
B.5 Semivariogram of ε computed at 5 seconds based on the Northridge earth-
quake data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
B.6 Semivariogram of ε computed at 7.5 seconds based on the Northridge
earthquake data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
B.7 Semivariogram of ε computed at 10 seconds based on the Northridge earth-
quake data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
B.8 Experimental semivariogram of ε computed at 2 seconds based on the Chi-
Chi earthquake data. Also shown in the figure are two fitted semivariogram
models: (i) An accurate exponential + nugget model and (ii) An approxi-
mate exponential model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
B.9 Semivariogram of ε based on the peak ground accelerations observed during
the Chi-Chi earthquake . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
B.10 Semivariogram of ε computed at 0.5 seconds based on the Chi-Chi earth-
quake data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
B.11 Semivariogram of ε computed at 1 second based on the Chi-Chi earthquake
data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
B.12 (Approximate) Semivariogram of ε computed at 2 seconds based on the
Chi-Chi earthquake data . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
B.13 Semivariogram of ε computed at 5 seconds based on the Chi-Chi earth-
quake data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
B.14 Semivariogram of ε computed at 7.5 seconds based on the Chi-Chi earth-
quake data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
B.15 Semivariogram of ε computed at 10 seconds based on the Chi-Chi earth-
quake data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
B.16 Experimental Semivariogram of ε computed at 5 seconds based on the sim-
ulated ground-motion data. Also shown in the figure are two fitted semivar-
iogram models: (i) An accurate spherical model and (ii) An approximate
exponential model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
B.17 Range of semivariograms of ε , as a function of the period at which ε val-
ues are computed. The residuals are obtained using the simulated ground-
motion data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
B.18 Simulated multivariate normal random fields. The correlation structure is
defined using an exponential semivariogram with range equaling (a) 0km
(b) 20km and (c) 40km. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
B.19 Comparison between the experimental semivariogram of ε’s computed us-
ing pulse-like ground motions and the experimental semivariogram of ε’s
computed using all usable ground motions. The ε’s are computed from
peak ground accelerations . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
B.20 Comparison between the experimental semivariogram of ε’s computed us-
ing pulse-like ground motions and the experimental semivariogram of ε’s
computed using all usable ground motions. The ε’s are obtained from spec-
tral accelerations computed at 0.5 seconds . . . . . . . . . . . . . . . . . . 246
B.21 Comparison between the experimental semivariogram of ε’s computed us-
ing pulse-like ground motions and the experimental semivariogram of ε’s
computed using all usable ground motions. The ε’s are obtained from spec-
tral accelerations computed at 1 second . . . . . . . . . . . . . . . . . . . 247
B.22 Comparison between the experimental semivariogram of ε’s computed us-
ing pulse-like ground motions and the experimental semivariogram of ε’s
computed using all usable ground motions. The ε’s are obtained from spec-
tral accelerations computed at 2 seconds . . . . . . . . . . . . . . . . . . . 247
B.23 Comparison between the experimental semivariogram of ε’s computed us-
ing pulse-like ground motions and the experimental semivariogram of ε’s
computed using all usable ground motions. The ε’s are obtained from spec-
tral accelerations computed at 5 seconds . . . . . . . . . . . . . . . . . . . 248
B.24 Comparison between the experimental semivariogram of ε’s computed us-
ing pulse-like ground motions and the experimental semivariogram of ε’s
computed using all usable ground motions. The ε’s are obtained from spec-
tral accelerations computed at 7.5 seconds . . . . . . . . . . . . . . . . . . 248
B.25 Comparison between the experimental semivariogram of ε’s computed us-
ing pulse-like ground motions and the experimental semivariogram of ε’s
computed using all usable ground motions. The ε’s are obtained from spec-
tral accelerations computed at 10 seconds . . . . . . . . . . . . . . . . . . 249
B.26 Experimental directional semivariograms at discrete separations obtained
using the Northridge earthquake ε values computed at 2 seconds. Also
shown in the figures is the best fit to the omni-directional semivariogram:
(a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth
= 90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
B.27 Experimental directional semivariograms at discrete separations obtained
using the Chi-Chi earthquake ε values computed at 1 second. Also shown
in the figures is the best fit to the omni-directional semivariogram: (a)
Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth
= 90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
B.28 Experimental directional semivariograms at discrete separations obtained
using the Chi-Chi earthquake ε values computed at 7.5 seconds. Also
shown in the figures is the best fit to the omni-directional semivariogram:
(a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth
= 90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
B.29 Experimental directional semivariograms at discrete separations obtained
using the simulated time histories. The ε values are computed at 2 seconds.
Also shown in the figures is the best fit to the omni-directional semivari-
ogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d)
Azimuth = 90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
B.30 Experimental directional semivariograms at discrete separations obtained
using the simulated time histories. The ε values are computed at 7.5 sec-
onds. Also shown in the figures is the best fit to the omni-directional semi-
variogram: (a) Omni-directional; (b) Azimuth = 0; (c) Azimuth = 45 and
(d) Azimuth = 90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
B.31 Experimental directional semivariograms at discrete separations obtained
using the simulated time histories. The ε values are computed at 7.5 sec-
onds. Also shown in the figures is an anisotropic model that fits the four
experimental semivariograms well (It is to be noted that an anisotropic
semivariogram has different shapes in different directions.): (a) Omni-
directional; (b) Azimuth = 0; (c) Azimuth = 45 and (d) Azimuth = 90 . . . . 256
C.1 The aggregated San Francisco bay area transportation network. . . . . . . . 262
C.2 Recurrence curve for the travel time delay obtained using the simulation-
based framework. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
C.3 Joint likelihoods of magnitudes and faults given that travel time delay ex-
ceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours. . 264
C.4 Level of congestion in the network as indicated by the volume/capacity ratio. . . . 265
C.5 Joint likelihoods of inter-event residual given that travel time delay exceeds
(a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours. . . . . . 266
C.6 Joint likelihoods of inter-event residual given that travel time delay exceeds
(a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours. . . . . . 267
C.7 Mean magnitude of earthquakes producing a travel time delay exceeding a
specified threshold. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
C.8 (a) Average of mean intra-event residual of earthquakes producing a travel
time delay exceeding a specified threshold (b) Average of inter-event residual
of earthquakes producing a travel time delay exceeding a specified threshold. . 268
C.9 Recurrence curves obtained without completely accounting for inter-event
and intra-event residuals. . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
C.10 Performance of the network under three different ground-motion scenarios
corresponding to three different inter-event residuals. (a) η = 3.79, (b)
η = -1.64 and (c) η = 0. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
Chapter 1
Introduction
1.1 Motivation
Lifelines are large, geographically-distributed systems that are essential support systems
for any society. Due to their known vulnerabilities, it is important to proactively assess
and mitigate the seismic risk of lifelines. For instance, the Northridge earthquake caused
over $1.5 billion in business interruption losses ascribed to transportation network dam-
age [Chang, 2003]. The city of Los Angeles suffered a power blackout and $75 million
of power-outage related losses as a result of the earthquake [e.g., Tanaka et al., 1997].
Lifeline seismic risk assessment is a systematic approach for quantifying the likelihood of
observing such losses during future earthquakes (pre-event risk assessment) or in the immediate aftermath of an earthquake (post-event risk assessment). It is often the first step in managing lifeline seismic risk, and it is useful for several applications, including the prediction (or estimation after an earthquake) of quantities such as
the monetary losses associated with structures and infrastructure owned by a corporation
or insured by an insurance company, the number of injuries and casualties in a certain area
and the probability that lifeline networks for power, water and transportation may be inter-
rupted. This knowledge is useful for decision makers interested in seismic risk mitigation
(e.g., lifeline retrofit), post-disaster management planning, post-earthquake decision mak-
ing (e.g., opening and closing of facilities such as gas pipelines) and insurance modeling.
CHAPTER 1. INTRODUCTION 2
Lifeline seismic risk assessment is a multi-disciplinary problem that involves seismol-
ogy to quantify the earthquake hazard, structural engineering to quantify the damage to
infrastructure components, statistics to handle the numerous uncertainties that are present
in the seismic environment and in the infrastructure performance, as well as tools and tech-
niques from fields such as optimization, network flow modeling and economics.
The analytical Pacific Earthquake Engineering Research Center (PEER) loss analysis
framework has been used to perform the risk assessment for a single structure at a given
site, by estimating the site ground-motion hazard and assessing probable losses using the
hazard information [Cornell and Krawinkler, 2000, Deierlein, 2004]. The risk is measured
as the exceedance rates of various loss levels, and is obtained as follows:
$$\lambda(DV) = \iiint G(DV \mid DM)\, dG(DM \mid EDP)\, dG(EDP \mid IM)\, \lvert d\lambda(IM) \rvert \tag{1.1}$$
where $\lambda(DV)$ is the exceedance rate of the decision variable (loss measure) denoted $DV$; $d\lambda(IM)$ is the derivative of the exceedance rate of a ground-motion intensity measure denoted $IM$ (e.g., spectral acceleration, peak ground acceleration); $dG(EDP \mid IM)$ is the derivative of the probability of exceedance of an engineering demand parameter ($EDP$, e.g., inter-story drift ratio) given an $IM$; $dG(DM \mid EDP)$ is the derivative of the probability of exceedance of a damage measure ($DM$, e.g., minor damage, severe damage) given an $EDP$; and $G(DV \mid DM)$ is the probability of exceedance of the decision variable (e.g., monetary loss) given a $DM$. Note that $IM$, $EDP$ and $DM$ can also be vectors.
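For a single structure, Equation 1.1 is usually evaluated by discretizing the integrals into sums. The Python sketch below is a minimal illustration that assumes the EDP and DM stages have already been collapsed into a single conditional loss distribution G(DV|IM), so that λ(DV) ≈ Σ_k G(DV|im_k) |Δλ(im_k)|. The power-law hazard curve, the lognormal loss model, the integration limits and every parameter value (`k0`, `k`, `slope`, `beta`) are hypothetical placeholders, not values from this thesis.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hazard_rate(im: float, k0: float = 1e-4, k: float = 3.0) -> float:
    """Hypothetical power-law hazard curve: annual rate of IM > im."""
    return k0 * im ** (-k)


def prob_loss_exceeds(dv: float, im: float, slope: float = 2.0, beta: float = 0.6) -> float:
    """Hypothetical lognormal G(DV|IM): median loss = slope * im, log standard deviation = beta."""
    return 1.0 - normal_cdf((math.log(dv) - math.log(slope * im)) / beta)


def loss_exceedance_rate(dv: float, im_min: float = 0.05, im_max: float = 5.0, n: int = 2000) -> float:
    """Riemann-sum approximation of lambda(DV > dv) = integral of G(dv | im) |d lambda(im)|,
    truncated to the IM interval [im_min, im_max]."""
    edges = [im_min * (im_max / im_min) ** (i / n) for i in range(n + 1)]  # log-spaced IM bins
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        d_lambda = hazard_rate(lo) - hazard_rate(hi)  # |d lambda| over the bin (hazard decreases with IM)
        im_mid = math.sqrt(lo * hi)                   # geometric midpoint of the bin
        total += prob_loss_exceeds(dv, im_mid) * d_lambda
    return total


for dv in (0.5, 1.0, 5.0):
    print(f"lambda(DV > {dv}) ~ {loss_exceedance_rate(dv):.3e} per year")
```

Collapsing the chain into G(DV|IM) keeps the sketch to one sum; the full triple integral would simply nest two more discretized conditional distributions in the same way.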
Often, numerical integration is sufficient to estimate λ (DV ) for a single structure. Life-
line risk assessment, however, is based on a large vector of ground-motion intensities (e.g.,
intensities at all bridge locations in a transportation network). In other words, the scalar IM in Equation 1.1 is replaced by a large vector of IMs, which adds considerable complexity to the integral. The intensities also show significant spatial correlation (i.e., dependence
between the intensities at different sites), which needs to be carefully modeled in order to
accurately assess the seismic risk [e.g., Park et al., 2007, Bazzurro and Luco, 2004]. Fur-
ther, the link between the lifeline component damage measures and the performance of the
lifeline (i.e., G(DV |DM)) is usually not available in closed form. For instance, the travel
time of vehicles in a transportation network, a commonly-used performance measure, is
only obtained using an optimization procedure rather than being a closed-form function
of the ground-motion intensities and the bridge damage states. These additional complexities make it difficult to use the PEER framework for lifeline risk assessment. Some analytical approaches have been used for lifeline risk assessment [e.g., Kang et al., 2008], but they are generally applicable only to specific classes of lifeline reliability problems. Hence, many past studies use Monte Carlo simulation (MCS)-based
approaches instead of analytical approaches for lifeline risk assessment [e.g., Chang et al.,
2000, Campbell and Seligson, 2003, Werner et al., 2004, Crowley and Bommer, 2006,
Kiremidjian et al., 2007, Shiraki et al., 2007]. Figure 1.1 illustrates the above-mentioned
similarities and dissimilarities between the risk assessment frameworks for single struc-
tures and lifelines. (The bold font in the figure denotes a vector. It is also to be noted that
the value of G(DM|IM) for a lifeline component can be computed using G(DM|EDP) and
G(EDP|IM) for the component, if desired.)
In an MCS-based approach, several possible future earthquakes are simulated and the
losses sustained by the lifeline due to the ground-motion intensities during these earth-
quakes are evaluated. These losses are then probabilistically combined in order to obtain
the exceedance rates of various loss levels. Basic MCS-based approaches necessitate per-
formance evaluations of the lifeline under a large number of possible future earthquake sce-
narios and are therefore highly computationally demanding. The current study addresses
these challenges and proposes a computationally-efficient MCS-based framework for as-
sessing the seismic risk of lifelines, with full consideration of the uncertainties and corre-
lations present in spatial ground-motion fields.
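The probabilistic combination step can be sketched as follows, assuming each simulated scenario (a ground-motion map plus its lifeline performance evaluation) carries an annual occurrence rate ν_j, so that λ(L > x) = Σ_j ν_j 1[L_j > x]. In this hypothetical example the simulated losses are random placeholders standing in for the expensive network analyses, and a total rate of 0.2 per year is divided equally among the maps, as in conventional MCS.

```python
import random


def exceedance_rates(scenario_rates, scenario_losses, thresholds):
    """Annual rate of exceeding each loss threshold x:
    lambda(L > x) = sum_j rate_j * 1[loss_j > x]."""
    return [sum(rate for rate, loss in zip(scenario_rates, scenario_losses) if loss > x)
            for x in thresholds]


# Hypothetical setup: conventional MCS, where every simulated map carries an equal share
# of the total annual rate of the earthquakes considered.
rng = random.Random(1)
total_rate = 0.2          # hypothetical annual rate of all earthquakes considered
n_maps = 5000
rates = [total_rate / n_maps] * n_maps
# Placeholder losses; in practice each value comes from a lifeline performance evaluation.
losses = [rng.lognormvariate(0.0, 1.0) for _ in range(n_maps)]
thresholds = [0.5, 1.0, 2.0, 5.0]
curve = exceedance_rates(rates, losses, thresholds)
print(list(zip(thresholds, curve)))
```

Because every map needs its own performance evaluation, the cost of this step grows linearly with `n_maps`, which is the computational burden the paragraph above refers to.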
1.2 Areas of contribution
This thesis aims to address the challenges mentioned above. The major contributions of
this work are summarized below.
Figure 1.1: Comparison of the risk assessment frameworks for (a) single structures and (b) lifelines.
1.2.1 Multi-site hazard modeling
Challenges
Lifeline risk assessment requires knowledge about the joint distribution of a vector of
spatially-distributed ground-motion intensities during probable future earthquakes. The
distribution of the ground-motion intensity at a single site is typically predicted using a
ground-motion model, which takes the following form [e.g., Boore and Atkinson, 2008,
Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]:
$$\ln(Y_{ij}) = \ln(\bar{Y}_{ij}) + \sigma_{ij}\varepsilon_{ij} + \tau_{ij}\eta_{ij} \tag{1.2}$$
where $Y_{ij}$ denotes the ground-motion intensity parameter of interest (e.g., $Sa(T)$, the spectral acceleration at period $T$) at site $i$ during earthquake $j$; $\bar{Y}_{ij}$ denotes the median ground-motion intensity predicted by the ground-motion model, which depends on parameters such as magnitude, distance, period and local-site conditions; $\varepsilon_{ij}$ denotes the normalized intra-event residual; and $\eta_{ij}$ denotes the normalized inter-event residual. Both $\varepsilon_{ij}$ and $\eta_{ij}$ are univariate normal random variables with zero mean and unit standard deviation. $\sigma_{ij}$ and $\tau_{ij}$ are standard deviation terms that are estimated as part of the ground-motion model; they are functions of the spectral period of interest and, in some models, also of the earthquake magnitude and the distance of the site from the rupture. The term $\sigma_{ij}\varepsilon_{ij}$ is called the intra-event residual and the term $\tau_{ij}\eta_{ij}$ is called the inter-event residual. The inter-event residual is constant across all the sites for a given earthquake. (Some chapters of this thesis describe the ground-motion model directly in terms of the inter-event and the intra-event residuals, rather than using the normalized forms.) The sum of the inter-event residual and the intra-event residual is called the total residual. Figures 1.2a-b show, for example, the observed (i.e., $Y_{ij}$) and predicted ($\bar{Y}_{ij}$) peak ground accelerations (PGA) during the 1999 Chi-Chi earthquake. Figure 1.2c shows the normalized total residuals (i.e., total residuals normalized by their standard deviation) computed using the Boore and Atkinson [2008] ground-motion model.
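Equation 1.2 can also be used generatively to simulate intensities from a single earthquake. The sketch below, with hypothetical medians and standard deviations, draws one normalized inter-event residual η shared by every site and an independent normalized intra-event residual ε at each site. The independence of the ε values is a simplifying assumption made only for this illustration, since intra-event residuals are in fact spatially correlated. Even so, the shared η alone makes the log intensities at two sites covary by τ².

```python
import math
import random


def simulate_ln_intensities(ln_median, sigma, tau, rng):
    """One realization of Eq. 1.2 at every site for a single earthquake j.

    ln_median[i] = ln(Ybar_ij); sigma[i], tau[i] = intra-/inter-event standard deviations.
    eta is drawn once and shared by all sites; eps is drawn independently per
    site (spatial correlation deliberately ignored in this sketch).
    """
    eta = rng.gauss(0.0, 1.0)  # normalized inter-event residual, common to all sites
    # ln Y_ij = ln Ybar_ij + sigma_ij * eps_ij + tau_ij * eta_j
    return [m + s * rng.gauss(0.0, 1.0) + t * eta
            for m, s, t in zip(ln_median, sigma, tau)]


# Hypothetical two-site example: median PGAs of 0.3 g and 0.2 g
rng = random.Random(0)
ln_med = [math.log(0.3), math.log(0.2)]
sims = [simulate_ln_intensities(ln_med, [0.6, 0.6], [0.3, 0.3], rng) for _ in range(20000)]
pga_site0 = [math.exp(s[0]) for s in sims]  # back-transform to intensities in g
```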
Figure 1.2: 1999 Chi-Chi earthquake: (a) recorded PGAs (b) median PGAs predicted by the Boore and Atkinson [2008] ground-motion model (c) normalized total residuals computed using Equation 1.2.

While quantifying the hazard over two or more sites, the ground-motion model is used to predict the ground-motion intensity at each site of interest. For instance, the following equations are used to predict the distribution of ground-motion intensity at sites $i$ and $i'$:
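$$\ln(Y_{ij}) = \ln(\bar{Y}_{ij}) + \sigma_{ij}\varepsilon_{ij} + \tau_{ij}\eta_{ij} \tag{1.3}$$
$$\ln(Y_{i'j}) = \ln(\bar{Y}_{i'j}) + \sigma_{i'j}\varepsilon_{i'j} + \tau_{i'j}\eta_{i'j} \tag{1.4}$$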
Note, however, that the above equations only provide information about the marginal distributions of the ground-motion intensities at sites $i$ and $i'$. Regional risk assessments require knowledge about the joint distribution of the ground-motion intensities at sites $i$ and $i'$ in order to capture possible dependencies between the intensities at the two sites. Since the median predictions at sites $i$ and $i'$ are deterministic and the values of the inter-event residual at sites $i$ and $i'$ are equal, the only additional information required to quantify the joint distribution of intensities at sites $i$ and $i'$ is the joint distribution of $\varepsilon_{ij}$ and $\varepsilon_{i'j}$. While the inter-event residual and each of the intra-event residuals during an earthquake have been statistically seen to follow the univariate normal distribution marginally [Abrahamson, 1988], not much is known about the joint distribution of multiple spatially-distributed intra-event residuals. Some past studies assume that the intra-event residuals follow a multivariate normal distribution [e.g., Bazzurro and Cornell, 2002, Baker and Cornell, 2006, Kiremidjian et al., 2007], though this assumption has not been verified using recorded time history data.
Once the nature of the distribution of the residuals (and equivalently, that of the inten-
sities) is determined, the distribution needs to be parameterized so that it can be used for
forward-predicting residuals and ultimately, ground-motion intensities from future earth-
quakes. One of the challenges in parameterizing the joint distribution of the intra-event
residuals is that the intra-event residuals exhibit ‘spatial correlation’ [e.g., Boore et al.,
2003]. Spatial correlation denotes the interdependence between the intra-event residuals at sites located over a region during an earthquake. It arises from several sources, including common-source effects (a part of this effect is captured by the inter-event residual) and similarity in local-site effects and propagation-path effects. The correlation is known to be large when the sites are close to one another, and it decays with increasing separation between the sites [Boore et al., 2003]. Evidence of spatial correlation can be seen in Figure 1.2c, which shows clusters of large- and small-valued residuals (indicating dependence between closely-spaced residuals).
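One common way to represent a correlation that decays with separation is an exponential model, ρ(h) = exp(−3h/r), where h is the site separation and r is the distance (the "range") at which the correlation drops to about 5%. The sketch below assumes this model with a hypothetical range of 20 km and draws spatially correlated, multivariate normal ε values through a Cholesky factorization of the site-to-site correlation matrix. Both the model form and the range value are illustrative choices, not estimates reported here.

```python
import math
import random


def exp_correlation(h_km: float, range_km: float) -> float:
    """Assumed exponential model: rho(h) = exp(-3 h / range), with range_km > 0."""
    return math.exp(-3.0 * h_km / range_km)


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_correlated_eps(coords_km, range_km, rng):
    """Draw eps ~ MVN(0, R) with R[i][k] = rho(distance between sites i and k)."""
    n = len(coords_km)
    corr = [[exp_correlation(math.dist(coords_km[i], coords_km[k]), range_km)
             for k in range(n)] for i in range(n)]
    low = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]                           # independent N(0, 1)
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]  # eps = L z


# Hypothetical sites (km): two close together, one far away
rng = random.Random(42)
sites = [(0.0, 0.0), (2.0, 0.0), (50.0, 0.0)]
print(simulate_correlated_eps(sites, 20.0, rng))
```

In a full simulation these ε values would replace the independent draws in Equation 1.2; recomputing the factorization for each draw is wasteful but keeps the sketch short.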
The impact of this correlation on lifeline risk has only recently been studied, and it has been found to be significant [e.g., Park et al., 2007, Lee and Kiremidjian, 2007, Straub and Der Kiureghian, 2008, Rix et al., 2009]. The Sacramento delta levee system risk assessment project is a practical example where the spatial correlation was considered in the risk assessment process [Hanson et al., 2008, Bazzurro, 2010]. Straub and Der Kiureghian [2008] note that the presence of spatial correlation tends to increase the reliability of series systems and decrease the reliability of parallel systems. Irrespective of the nature of the lifeline (which is generally neither a series nor a parallel system), it is important to consider the spatial correlation in the risk assessment in order to obtain unbiased estimates of the probabilities of sustaining large losses (and small, frequent losses).
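The series/parallel observation of Straub and Der Kiureghian [2008] can be reproduced with a toy two-component system. In the hypothetical MCS below, each component fails when its standard-normal demand exceeds 1.0, and the two demands have correlation ρ. Raising ρ from 0 to 0.8 lowers the probability that at least one component fails (series-system failure) and raises the probability that both fail (parallel-system failure). The demand model and threshold are assumptions chosen only to make the effect visible.

```python
import math
import random


def system_failure_probs(rho: float, threshold: float = 1.0, n: int = 100_000, seed: int = 0):
    """MCS estimates of (P[series system fails], P[parallel system fails]).

    Two components; component i fails when its standard-normal demand eps_i
    exceeds `threshold`. eps_1 and eps_2 are bivariate normal with correlation rho.
    """
    rng = random.Random(seed)
    any_fail = 0
    all_fail = 0
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        f1, f2 = e1 > threshold, e2 > threshold
        any_fail += f1 or f2   # series system fails if any component fails
        all_fail += f1 and f2  # parallel system fails only if every component fails
    return any_fail / n, all_fail / n


for rho in (0.0, 0.8):
    p_series, p_parallel = system_failure_probs(rho)
    print(f"rho={rho}: P(series fails)={p_series:.3f}, P(parallel fails)={p_parallel:.3f}")
```

For ρ = 0 the exact values are 1 − Φ(1)² ≈ 0.292 and (1 − Φ(1))² ≈ 0.025, which the estimates should approach.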
The ground-motion models (Equation 1.2) that quantify the distribution of intensities at
a single site do not provide information about the spatial correlation between the intensities.
Past studies have computed these correlations using ground-motion time histories recorded during earthquakes [Goda and Hong, 2008, Wang and Takada, 2005, Boore
et al., 2003]. Boore et al. [2003] used observations of PGA from the 1994 Northridge
earthquake to compute the spatial correlations. Wang and Takada [2005] computed the
correlations using observations of peak ground velocities (PGV) from several earthquakes
in Japan and the 1999 Chi-Chi earthquake. Goda and Hong [2008] used the Northridge and
Chi-Chi earthquake PGAs and spectral accelerations at three periods ranging between 0.3s
and 3s. The results reported by these studies, however, differ in terms of the rate
of decay of correlation with separation distance. For instance, while Boore et al. [2003]
report that the correlation drops to essentially zero at a site separation distance of approxi-
mately 10 km, the non-zero correlations observed by Wang and Takada [2005] extend past
100 km. Further, Goda and Hong [2008] observe differences between the correlation decay
rate estimated using the Northridge earthquake records and the correlation decay rate based
on the Chi-Chi earthquake records. To date, no explanation for these differences has been
identified.
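Correlation estimates such as these are commonly obtained from an experimental semivariogram of the residuals. The sketch below uses the classical estimator γ(h) = Σ(ε_i − ε_j)² / (2N(h)), where the sum runs over the N(h) site pairs whose separation falls in the distance bin around h, and then fits the range r of an exponential model γ(h) = 1 − exp(−3h/r) by grid search. For unit-variance residuals, γ(h) = 1 − ρ(h). The unit sill, bin width and candidate ranges are illustrative assumptions.

```python
import math


def empirical_semivariogram(coords_km, eps, bin_width_km, max_km):
    """Classical estimator per distance bin: gamma(h) = sum (eps_i - eps_j)^2 / (2 N(h)).

    Returns (bin-centre distance, gamma) for every non-empty bin below max_km.
    """
    n_bins = int(max_km // bin_width_km)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(eps)):
        for j in range(i + 1, len(eps)):
            h = math.dist(coords_km[i], coords_km[j])
            b = int(h // bin_width_km)
            if b < n_bins:
                sums[b] += (eps[i] - eps[j]) ** 2
                counts[b] += 1
    return [((b + 0.5) * bin_width_km, s / (2 * c))
            for b, (s, c) in enumerate(zip(sums, counts)) if c > 0]


def fit_exponential_range(gamma_points, candidate_ranges_km):
    """Grid search for r in gamma(h) = 1 - exp(-3 h / r), assuming a unit sill."""
    def sse(r):
        return sum((g - (1.0 - math.exp(-3.0 * h / r))) ** 2 for h, g in gamma_points)
    return min(candidate_ranges_km, key=sse)
```

A least-squares fit over a grid is only one option; weighting bins by their pair counts, or fitting a nugget term, are common refinements.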
Additionally, the ground-motion models used in the development of the correlation
models and for performing risk assessments are currently calibrated using regression analy-
sis that assumes independence between the intra-event residuals [Abrahamson and Youngs,
1992]. Few studies have examined the impact of considering the spatial correlation in the
development of ground-motion models. One recent work is that of Hong et al. [2009],
who investigated the influence of including spatial correlation in the regression analysis on
the ground-motion models fitted using a two-stage regression algorithm and a one-stage
algorithm of Joyner and Boore [1993]. They observed that the differences in the estimated
ground-motion model coefficients (used for predicting the median intensity) obtained with
and without the incorporation of spatial correlation were insignificant. They did not, how-
ever, investigate the impact on the variances predicted by the ground-motion models in
detail.
Contributions
In the current study, statistical tests are used to verify the commonly-used assumptions
of univariate normality of logarithmic intensities and multivariate normality of spatially-
distributed logarithmic intensities. Further, observed and simulated ground-motion time
histories are used to estimate the spatial correlation between intra-event residuals, which
can be used to parameterize the joint distribution of the ground-motion intensities. Factors
that affect the rate of decay of the correlation with separation distance are studied. Probable
explanations for the differing correlation estimates reported in the literature are provided.
Finally, the importance of considering spatial correlation in lifeline risk assessments is
illustrated.
The study also investigates the impact of incorporating spatial correlation on the ground-motion model coefficients and on the variance of the predicted intensities. The commonly-used mixed-effects regression algorithm of Abrahamson and Youngs [1992] is modified to account for the spatial correlation. This modified algorithm is then used to refit a sample ground-motion model (the Campbell and Bozorgnia [2008] model) in order to study the impact of incorporating spatial correlation on ground-motion models and, subsequently, on lifeline risk estimates.
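To illustrate how spatial correlation can enter such a regression, consider one piece of a mixed-effects fit: estimating an earthquake's inter-event residual from its total residuals r_i with the median model held fixed. Treating r as multivariate normal with covariance Σ = τ²11ᵀ + σ²C, where C is the intra-event correlation matrix, the conditional mean of the inter-event residual is τ²1ᵀΣ⁻¹r. With C = I this reduces to the shrinkage form τ²Σr_i/(σ² + nτ²). The sketch below is a hypothetical illustration of that idea only, not the modified algorithm developed in this thesis; the residuals, σ, τ and correlation values are made up.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                          # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def inter_event_estimate(total_resid, corr, sigma, tau):
    """Conditional mean of the inter-event residual given one earthquake's total residuals.

    Sigma = tau^2 * 1 1^T + sigma^2 * corr; estimate = tau^2 * 1^T Sigma^{-1} r
    (median model treated as known; corr is the intra-event correlation matrix).
    """
    n = len(total_resid)
    cov = [[tau ** 2 + sigma ** 2 * corr[i][k] for k in range(n)] for i in range(n)]
    w = solve(cov, total_resid)   # w = Sigma^{-1} r
    return tau ** 2 * sum(w)      # tau^2 * 1^T w


# Hypothetical residuals at four sites, uncorrelated vs strongly correlated intra-event terms
resid = [0.5, 0.2, -0.1, 0.4]
ident = [[1.0 if i == k else 0.0 for k in range(4)] for i in range(4)]
equi = [[1.0 if i == k else 0.9 for k in range(4)] for i in range(4)]
print(inter_event_estimate(resid, ident, sigma=0.6, tau=0.3),
      inter_event_estimate(resid, equi, sigma=0.6, tau=0.3))
```

With strongly correlated intra-event residuals the four sites carry less independent information about the event, so the estimate is pulled further toward zero.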
Additionally, the techniques described above for quantifying the seismic hazard over
a region can be extended to other types of hazard and multi-hazard scenarios. This study
investigates extension of the regional seismic risk framework to regional hurricane hazard
modeling. Multi-site hurricane wind hazard assessment involves the simulation of possible
hurricane tracks (i.e., the point of origin, path and other properties such as the central pres-
sure and velocity) and the prediction of the wind fields (peak wind speeds at all the sites of
interest) associated with each track [Lee and Rosowsky, 2007, Vickery et al., 2009b, Legg
et al., 2010]. This is analogous to the simulation of earthquake events and the prediction
of associated ground-motion fields in the seismic hazard assessment framework. Most prediction models developed in the past for predicting hurricane wind fields are deterministic, however, and the uncertainties in wind fields have rarely been analyzed. To the author’s
knowledge, the spatial correlation in hurricane wind fields has not been studied in the liter-
ature. The current study focuses on quantifying the uncertainties and the spatial correlation
in hurricane wind fields (using the same techniques that were used to quantify these pa-
rameters in earthquake ground-motion fields), and evaluating their impact on the hurricane
risk of spatially-distributed systems. Hurricane wind-speed predictions are obtained for
two sample hurricanes, Hurricane Jeanne and Hurricane Frances, using the Batts et al.
[1980] wind-speed model, and the uncertainties in these predictions are evaluated using
actual wind-speed recordings. For instance, Figure 1.3a shows the track of the 2004 Hur-
ricane Jeanne [Landsea et al., 2004] and the observed maximum wind speeds (maximum
sustained one minute wind speed at 10 meter height) during the hurricane [Powell et al.,
1998]. Figure 1.3b shows the corresponding wind speeds predicted by the Batts et al. [1980]
wind-speed model. Figure 1.3c shows the total residuals computed from the observed and
the predicted wind speeds. The smoothness of the residuals in Figure 1.3c indicates the
presence of spatial correlation between the residuals. This spatial correlation structure is
estimated and modeled using geostatistical tools.
1.2.2 Lifeline risk assessment
Challenges
Lifelines are complex infrastructure systems with a large number of components. (For instance, there are over 1,000 bridges in the San Francisco Bay Area transportation network model used later in this thesis.) Estimating the performance of a lifeline during an earthquake scenario is often extremely computationally intensive. One important challenge in lifeline risk assessment is to devise methods to handle this large computational demand [Der Kiureghian, 2009].

Figure 1.3: 2004 Hurricane Jeanne (the line indicates the hurricane track): (a) recorded wind speeds (b) wind speeds predicted by the Batts et al. [1980] wind-speed model (c) wind-speed residuals.
In the past, researchers have developed and used several techniques for lifeline risk as-
sessment. Numerical integration-based techniques are used in some special cases where
the components of the lifeline do not interact with each other. This is done, for instance,
while evaluating the exceedance rates of monetary losses associated with structural damage
to bridges in a transportation network [Stergiou and Kiremidjian, 2006]. This also arises
in other situations involving spatially-distributed systems such as the evaluation of the ex-
ceedance rates of monetary losses to a portfolio of buildings [Wesson et al., 2009]. These
works, however, ignore the spatial correlation between the ground-motion intensities in
order to facilitate the use of numerical integration.
Some research works use simplified lifeline performance measures in order to reduce
the computational demand. Basoz and Kiremidjian [1996], Duenas-Osorio et al. [2005],
Kang et al. [2008] and Bensi et al. [2009b] use connectivity between the nodes of a lifeline
(e.g., transportation network connectivity between a city and a hospital) as a measure of
network performance. Not only does the use of a simplified connectivity-based measure
(instead of a flow-based measure such as the travel-time delay in a transportation network)
reduce the time required to evaluate the network performance under various earthquake
scenarios, it enables the use of computationally-efficient analytical techniques such as the
matrix-based system reliability (MSR) method of Kang et al. [2008] to evaluate the lifeline
risk. Note that the MSR method can be extended to problems using flow-based performance measures, but it is computationally expensive in such cases [Kang et al., 2008].
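Part of what makes a connectivity measure cheap is that each evaluation is a single graph search. The Python sketch below estimates the probability that a hypothetical city and hospital become disconnected when bridges fail. The four-bridge network, the bridge names and the failure probabilities are illustrative assumptions, and failures are sampled independently for brevity; in an earthquake they would be correlated through the shared ground-motion field, which this sketch ignores.

```python
import random
from collections import deque

# Hypothetical network: each road link is carried by one bridge.
# Two parallel routes connect "city" to "hospital".
LINKS = {
    ("city", "a"): "b1",
    ("a", "hospital"): "b2",
    ("city", "c"): "b3",
    ("c", "hospital"): "b4",
}


def is_connected(source, target, links, failed_bridges):
    """Breadth-first search over links whose bridges are still standing."""
    adjacency = {}
    for (u, v), bridge in links.items():
        if bridge in failed_bridges:
            continue
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def disconnection_probability(failure_prob, n_samples, seed=0):
    """Monte Carlo estimate of P(city and hospital are disconnected).

    Bridge failures are sampled independently (a simplifying assumption).
    """
    rng = random.Random(seed)
    disconnected = 0
    for _ in range(n_samples):
        failed = {b for b, p in failure_prob.items() if rng.random() < p}
        if not is_connected("city", "hospital", LINKS, failed):
            disconnected += 1
    return disconnected / n_samples
```

With every bridge failing with probability 0.1, each route fails with probability 1 − 0.9² = 0.19, so the exact disconnection probability is 0.19² = 0.0361; the Monte Carlo estimate approaches this value as `n_samples` grows. A flow-based measure such as travel-time delay would replace the graph search with a far more expensive traffic assignment.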
On account of the above-mentioned complications involved in modeling the hazard
and the lifeline performance (particularly while using flow-based measures), many past re-
search works use MCS-based approaches instead of analytical approaches for lifeline risk
assessment [e.g., Werner et al., 2000, Chang et al., 2000, Campbell and Seligson, 2003,
Crowley and Bommer, 2006, Kiremidjian et al., 2007, Shiraki et al., 2007]. One simple
MCS-based approach used in the past involves studying the performance of lifelines under
those earthquake scenarios that dominate the hazard in the region of interest [e.g., Adachi
and Ellingwood, 2008, Kiremidjian et al., 2007, Duenas-Osorio et al., 2005]. While this ap-
proach is more tractable, it does not adequately capture all the seismic hazard uncertainties.
A more comprehensive approach uses MCS to probabilistically generate ground-motion in-
tensity maps, considering all possible earthquake scenarios that could occur in the region,
and then uses these for the risk assessment [Crowley and Bommer, 2006]. Sample scenarios
are probabilistically generated by first estimating the median intensities due to a particular
earthquake using a ground-motion model, and by subsequently combining the median in-
tensities with simulated values of residuals. Figure 1.4, for instance, illustrates the MCS
of ground-motion intensities for a magnitude 8 earthquake on the San Andreas fault. The
most basic form of MCS, the conventional MCS, is computationally inefficient because
large magnitude earthquakes and above-average ground-motion intensities are considerably
more important than small magnitude earthquakes and small ground-motion intensities to
lifeline risks, but these are infrequently sampled in conventional MCS. Kiremidjian et al.
[2007] improved the MCS process by preferentially simulating large magnitude events us-
ing importance sampling (IS). Werner et al. [2004] also implemented variance-reduction
techniques in the software package REDARS (Risks from Earthquake Damage to Road-
way Systems) in order to simulate fewer earthquakes.
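The idea behind preferentially simulating large-magnitude events can be illustrated for the magnitude variable alone. The Python sketch below is a generic importance-sampling example, not the implementation of Kiremidjian et al. [2007] or REDARS: it assumes a hypothetical truncated Gutenberg-Richter distribution (b = 1, magnitudes 5 to 8), draws magnitudes from a uniform proposal that over-represents large events, and reweights each draw by the ratio of the true density to the proposal density.

```python
import math
import random

# Assumed (hypothetical) magnitude model: truncated Gutenberg-Richter, b = 1,
# magnitudes between 5.0 and 8.0.
B_VALUE = 1.0
M_MIN, M_MAX = 5.0, 8.0
BETA = B_VALUE * math.log(10.0)
NORM = 1.0 - math.exp(-BETA * (M_MAX - M_MIN))


def gr_pdf(m):
    """Truncated exponential (Gutenberg-Richter) density of magnitude."""
    if not M_MIN <= m <= M_MAX:
        return 0.0
    return BETA * math.exp(-BETA * (m - M_MIN)) / NORM


def exact_exceedance(m_star):
    """Closed-form P(M > m_star) under the truncated distribution."""
    return (math.exp(-BETA * (m_star - M_MIN))
            - math.exp(-BETA * (M_MAX - M_MIN))) / NORM


def is_exceedance(m_star, n_samples, seed=1):
    """Importance-sampling estimate of P(M > m_star).

    Magnitudes are drawn from a uniform proposal g on [M_MIN, M_MAX], which
    over-samples large events; each draw is reweighted by f(m) / g(m).
    Returns the estimate and the number of draws that exceeded m_star.
    """
    rng = random.Random(seed)
    g = 1.0 / (M_MAX - M_MIN)
    weighted_sum, hits = 0.0, 0
    for _ in range(n_samples):
        m = rng.uniform(M_MIN, M_MAX)
        if m > m_star:
            weighted_sum += gr_pdf(m) / g   # importance weight f(m) / g(m)
            hits += 1
    return weighted_sum / n_samples, hits
```

For M > 7.5 the exact probability is about 0.0022. With 20,000 draws, roughly one sixth of the uniform proposals (about 3,300) land above M7.5 and contribute to the estimate, whereas direct sampling from the Gutenberg-Richter distribution would be expected to produce only about 43 such events.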
Chang et al. [2000] used an MCS-based approach to estimate earthquake-induced delays
in a transportation network. They generated a catalog of 47 earthquakes and corresponding
intensity maps for the Los Angeles area and assigned probabilities to these earthquakes
such that the site hazard curves obtained using this catalog match the known local site
hazard curves obtained from PSHA. In other words, the probabilities of the scenario earth-
quakes were chosen to make the catalog hazard consistent. Only median PGAs were used
to produce the ground-motion intensity maps corresponding to the scenario earthquakes,
however, and variability about these medians was ignored, which can bias the resulting risk
estimates [e.g., Grossi and Kunreuther, 2005]. While this approach is highly computation-
ally efficient on account of the use of a small catalog of earthquakes, the selection of earth-
quakes is a somewhat subjective process, and the assignment of probabilities is based on
hazard consistency rather than on actual event likelihoods. Campbell and Seligson [2003]
proposed a more quantitative procedure to develop hazard-consistent scenarios, but the other drawbacks remained unresolved.
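Why ignoring variability about the medians can bias results is easiest to see at a single site. For a scenario catalog with annual rates ν_i, the exceedance rate is λ(Y > y) = Σ_i ν_i P(Y > y | scenario i). Using medians only collapses each conditional probability to 1 or 0. The Python sketch below compares the two calculations; the three-scenario catalog and the standard deviation of ln(PGA) are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical scenario catalog: (annual rate of occurrence, median PGA in g).
CATALOG = [(0.010, 0.15), (0.004, 0.30), (0.001, 0.45)]
SIGMA_LN_PGA = 0.6  # assumed standard deviation of ln(PGA) about the median


def rate_medians_only(y):
    """Exceedance rate if each scenario produces exactly its median PGA."""
    return sum(rate for rate, median in CATALOG if median > y)


def rate_with_variability(y):
    """Exceedance rate when ln(PGA) is normal about ln(median)."""
    std_normal = NormalDist()
    total = 0.0
    for rate, median in CATALOG:
        z = (math.log(y) - math.log(median)) / SIGMA_LN_PGA
        total += rate * (1.0 - std_normal.cdf(z))
    return total


for y in (0.1, 0.3, 0.6):
    print(f"PGA > {y:.1f} g: medians only {rate_medians_only(y):.5f}/yr, "
          f"with variability {rate_with_variability(y):.5f}/yr")
```

At 0.6 g, which exceeds every median in the catalog, the median-only calculation returns an exceedance rate of exactly zero, while the calculation that includes residual variability returns a small but non-zero rate; the largest intensities, which matter most for losses, are the ones a median-only catalog misses.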
Recently, Guikema [2009] proposed that lifeline performance evaluations can be expedited by using an approximate regression relationship, obtained using a statistical learning technique, between the lifeline performance and the predictive hazard variables (e.g., ground-motion intensities at component locations). He did not, however, provide any risk assessment examples. Another recent work is that of Bensi et al. [2009a], who explored the use of Bayesian network models, particularly for post-earthquake lifeline risk assessment. The computational feasibility of this approach, particularly for estimating the risk of large lifelines, needs further investigation.
Figure 1.4: Ground-motion intensity simulation for a magnitude 8 earthquake on the San Andreas fault: (a) median intensities obtained using the Boore and Atkinson [2008] ground-motion model; (b) simulated values of the normalized total residuals; (c) total intensities.
Contributions
The current study develops a computationally-efficient lifeline risk assessment framework
based on efficient sampling and data reduction techniques. The framework can be used for
developing a small, but stochastically representative catalog of spatially-correlated ground-
motion intensity maps that can be used for performing lifeline risk assessments. This technique is shown to reduce the computational demand of complex risk assessments by more than three orders of magnitude without compromising the accuracy of the risk estimates.
The proposed framework is used to evaluate the exceedance rates of various travel-time
delays on an aggregated (higher-scale) model of the San Francisco Bay Area transportation
network. Lifeline risk deaggregation calculations are used to illustrate the need to consider
uncertainties in the lifeline risk assessment process. Finally, the study also explores the use
of a statistical learning technique called multivariate adaptive regression trees in order to
expedite lifeline performance evaluation.
1.3 Organization
This thesis addresses several important issues related to the risk assessment of lifelines.
Chapters 2, 3 and 4 deal with the joint distribution of spatially-distributed ground-motion
intensities. Chapter 5 discusses systematic approaches to probabilistically sampling ground
motion intensity fields. Chapters 6 and 7 present new computationally-efficient lifeline risk
assessment techniques. Chapter 8 discusses the impact of considering spatial correlation
on ground-motion models and subsequently, on lifeline seismic risk. Chapter 9 explores
extending the probabilistic framework used for the seismic risk assessment of lifelines to
hurricane lifeline risk assessment.
Chapter 2 deals with the important issue of quantifying the joint distribution of spectral
accelerations, which is required for the risk assessment of lifelines. The chapter discusses
statistical tests that are used to examine the commonly-used assumptions of univariate nor-
mality of logarithmic spectral acceleration values and multivariate normality of vectors of
logarithmic spectral acceleration values computed at different sites and/or different peri-
ods. The statistical hypothesis tests carried out in this work indicate that these assumptions
are reasonable.
Chapter 3 presents a new spatial correlation model for spectral accelerations at a single
period (and the related Appendix A describes the estimation of cross-correlations between
spectral accelerations at two different periods), developed using recorded earthquake time
histories. The correlation is expressed as a function of the site separation distance, the
spectral acceleration period and the local soil conditions. The correlations predicted by the
model, along with the means and the variances provided by the ground-motion models, can
be used to completely parameterize the joint distribution of spatial spectral acceleration
fields, which is necessary for lifeline risk calculations.
Chapter 4 investigates the validity of commonly-used assumptions in spatial correlation
models such as stationarity (invariance of correlation with spatial location) and isotropy
(directional independence). Testing these assumptions, however, requires a large number
of ground-motion time histories. Since real data are sparse, this chapter uses simulated
ground-motion time histories instead. The chapter also takes advantage of the large simu-
lated ground-motion database to carry out tests to identify whether the correlations between
pulse-like ground motions that arise due to directivity effects are different from the corre-
lations between non-pulse-like ground motions. Overall, this chapter tests and provides a
basis for some of the subtle assumptions commonly used in spatial correlation models.
Chapter 5 discusses techniques for simulating ground-motion intensity maps with and
without the consideration of recorded ground-motion intensities. A ground-motion inten-
sity map is generated by combining median intensity predictions from ground-motion mod-
els with realizations of inter-event and intra-event residuals that account for the uncertainty
in the intensities. Intra-event residuals can be simulated as a correlated vector (using the
correlation model presented in Chapter 3) of multivariate normal random variables, and the
inter-event residual can be simulated as a univariate Gaussian random variable (based on the
discussion in Chapter 2). The chapter discusses two MCS techniques, termed single-step
simulation and sequential simulation, for generating residuals in the absence of recorded
ground-motion intensities. While both procedures are theoretically equivalent, it is possible
to reduce computational expense by using the sequential simulation technique. The chap-
ter also describes a sequential simulation technique for simulating residuals incorporating
knowledge about recorded ground-motion intensities. This is useful for post-earthquake
damage assessment and for determining optimal emergency response strategies.
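A minimal Python/NumPy sketch of single-step and sequential simulation, and of conditioning on known values, is given below. It assumes a hypothetical exponential correlation model ρ(h) = exp(−3h/R) for the normalized intra-event residuals (Chapter 3 develops the actual model), and it assumes that recorded intensities have already been converted to normalized intra-event residuals. Single-step simulation draws all sites at once from a Cholesky factor of the correlation matrix; sequential simulation draws each site from its conditional normal distribution given the sites already fixed, which makes conditioning on recordings straightforward. This sketch conditions on every previously fixed site, so it is exact but not faster; computational savings typically come from conditioning only on a limited neighborhood of sites, an approximation not shown here. All function names are illustrative.

```python
import numpy as np


def exponential_correlation(coords, corr_range):
    """Assumed correlation model for normalized intra-event residuals:
    rho(h) = exp(-3 h / corr_range), h = site separation distance."""
    diff = coords[:, None, :] - coords[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-3.0 * h / corr_range)


def single_step(corr, rng):
    """Draw all normalized intra-event residuals at once via a Cholesky factor."""
    lower = np.linalg.cholesky(corr)
    return lower @ rng.standard_normal(corr.shape[0])


def sequential(corr, rng, known=None):
    """Draw residuals one site at a time from their conditional normal
    distribution given all values fixed so far.

    `known` maps site index -> normalized residual inferred from a recording;
    those sites are fixed first and every other draw is conditioned on them.
    """
    known = known or {}
    n = corr.shape[0]
    values = np.zeros(n)
    fixed = []
    for i, v in known.items():
        values[i] = v
        fixed.append(i)
    for i in range(n):
        if i in known:
            continue
        if fixed:
            c_ff = corr[np.ix_(fixed, fixed)]
            c_if = corr[i, fixed]
            w = np.linalg.solve(c_ff, c_if)      # simple-kriging weights
            mean = w @ values[fixed]
            var = max(1.0 - c_if @ w, 0.0)
        else:
            mean, var = 0.0, 1.0
        values[i] = mean + np.sqrt(var) * rng.standard_normal()
        fixed.append(i)
    return values


def simulate_ln_sa(ln_median, sigma, tau, corr, rng):
    """ln Sa = ln(median) + tau * eta_tilde + sigma * eps_tilde."""
    eta_tilde = rng.standard_normal()            # one inter-event residual per earthquake
    eps_tilde = single_step(corr, rng)           # spatially correlated intra-event residuals
    return ln_median + tau * eta_tilde + sigma * eps_tilde
```

Both unconditional schemes produce draws with the same correlation matrix, which is the sense in which they are theoretically equivalent; passing, for example, `known={1: 2.0}` to `sequential` fixes site 1 and shifts the simulated residuals at nearby sites toward the recorded value.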
Chapter 6 presents a novel computationally-efficient MCS procedure based on importance sampling and K-means clustering that can be used for the seismic risk assessment of lifelines. The framework can be used for developing a small, but stochastically-representative catalog of ground-motion intensity maps that can be used for performing lifeline risk assessments. The importance sampling technique is used to preferentially sample important ground-motion intensity maps (using the MCS techniques discussed in Chapter 5), and the K-means clustering technique is used to identify and combine redundant maps.
It is shown theoretically and empirically that the risk estimates obtained using these tech-
niques are unbiased. The proposed framework is used to compute the exceedance rates of
travel-time delays (the chosen performance measure) on an aggregated form (coarse-scale
model) of the San Francisco Bay Area transportation network. The exceedance rates of
travel-time delays are obtained using a catalog of only 150 maps, and are shown to be in
good agreement with those obtained using the conventional MCS method. The proposed method is three orders of magnitude faster than conventional MCS, and can therefore facilitate computationally intensive risk analyses of lifelines,
with full consideration of the uncertainties and the spatial correlation in ground-motion
intensity fields. The related Appendix C uses lifeline risk deaggregation calculations to
illustrate the need to consider these uncertainties in the lifeline risk assessment process.
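The catalog-reduction step can be sketched as follows, assuming each simulated map is a vector of (log) intensities at component locations and each carries an importance-sampling weight. The selection rule here is an assumption and may differ from the one used in Chapter 6: maps are grouped with plain K-means (Lloyd's algorithm), one map per cluster is kept with probability proportional to its IS weight, and it inherits the cluster's total weight. Under this rule, the expected value of Σ_k W_k g(x_rep,k), for any performance function g, equals the full weighted sum Σ_i w_i g(x_i) given the clustering, so the reduced catalog is unbiased relative to the full IS catalog. Function names are hypothetical.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's algorithm; returns a cluster label for each row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def reduce_catalog(maps, is_weights, k, seed=0):
    """Replace N importance-sampled maps by at most k weighted representatives.

    One map per cluster is kept, chosen with probability proportional to its
    IS weight, and it inherits the cluster's total weight.
    """
    rng = np.random.default_rng(seed)
    labels = kmeans_labels(maps, k, rng)
    reps, weights = [], []
    for j in np.unique(labels):
        members = np.flatnonzero(labels == j)
        w = is_weights[members]
        reps.append(rng.choice(members, p=w / w.sum()))
        weights.append(w.sum())
    return np.array(reps), np.array(weights)
```

If the IS weights already include the 1/N factor, the probability that a performance measure (e.g., a hypothetical `travel_delay(map)`) exceeds a threshold would be estimated as the sum of `weights[k]` over representatives whose delay exceeds it, so the expensive network analysis runs only once per representative; multiplying by the annual rate of earthquakes converts this to an annual exceedance rate.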
Chapter 7 explores the use of statistical learning techniques to reduce the computa-
tional expense of the lifeline risk assessment problem. MCS and its variants are generally
well suited for characterizing ground motions and computing resulting losses to lifelines.
MCS-based methods are, however, highly computationally intensive, primarily because
they involve repeated evaluations of lifeline performance under a large number of sim-
ulated ground-motion intensity maps. In this study, a non-parametric statistical learning
technique termed Multivariate Adaptive Regression Trees (MART) is used to obtain an
approximate relationship between the ground-motion intensities at lifeline component lo-
cations and the lifeline performance. Non-parametric regression is used in place of clas-
sical regression since the number of predictor variables (ground-motion intensities at the
component locations) far exceeds the number of available training data points. The life-
line performance predicted by this relationship can potentially be used in place of the
actual lifeline performance (the evaluation of which is intensive) to expedite the compu-
tation of several lifeline risk-related parameters. The study illustrates this by developing
a MART-based relationship between the ground-motion intensities at bridge locations and
the network travel times in the San Francisco Bay Area transportation network, and using
it for estimating confidence intervals for the risk estimates presented in Chapter 6. More
generally, these approximate performance relationships can be used in several problems
such as prioritizing lifeline retrofits, whose computational demand stems from the need for
repeated performance evaluations.
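To make the idea of a boosted-tree surrogate concrete, the sketch below fits least-squares gradient boosting with depth-1 trees (stumps), written from scratch in NumPy. It is a deliberately simplified stand-in, not the MART implementation used in the study (which may use deeper trees, different loss functions or subsampling). The synthetic data mimic the situation described above: 100 predictors (standing in for intensities at component locations) but only 60 training points, a setting in which ordinary least squares has no unique solution.

```python
import numpy as np


def fit_stump(X, r):
    """Least-squares regression stump (one split on one feature) for residuals r."""
    n, p = X.shape
    best_sse, best = np.inf, (0, np.inf, r.mean(), r.mean())
    total = r.sum()
    n_left = np.arange(1, n)                  # left-node sizes for each split position
    for j in range(p):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        left_sum = np.cumsum(rs)[:-1]
        left_mean = left_sum / n_left
        right_mean = (total - left_sum) / (n - n_left)
        # Minimizing SSE is equivalent to maximizing n_l*m_l^2 + n_r*m_r^2.
        gain = n_left * left_mean ** 2 + (n - n_left) * right_mean ** 2
        gain = np.where(xs[1:] > xs[:-1], gain, -np.inf)  # split only between distinct values
        i = int(np.argmax(gain))
        sse = float((rs ** 2).sum() - gain[i])
        if sse < best_sse:
            best_sse = sse
            best = (j, 0.5 * (xs[i] + xs[i + 1]), left_mean[i], right_mean[i])
    return best


def fit_boosted_stumps(X, y, n_rounds=200, learning_rate=0.1):
    """Gradient boosting with squared-error loss and depth-1 trees."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        j, thr, left, right = fit_stump(X, y - pred)   # fit current residuals
        pred += learning_rate * np.where(X[:, j] <= thr, left, right)
        stumps.append((j, thr, left, right))
    return base, learning_rate, stumps


def predict(model, X):
    """Evaluate the boosted surrogate on new intensity vectors."""
    base, learning_rate, stumps = model
    pred = np.full(X.shape[0], base)
    for j, thr, left, right in stumps:
        pred += learning_rate * np.where(X[:, j] <= thr, left, right)
    return pred
```

Once fitted on a modest number of full network analyses, `predict` is cheap enough to be called on every simulated map, which is what makes such a surrogate useful for tasks like bootstrapping confidence intervals.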
Even though the risk assessment framework described in Chapter 6 facilitates the con-
sideration of spatial correlation between ground-motion intensities, current ground-motion
models (e.g., NGA ground-motion models) that are used to predict the distribution of
ground-motion intensities at individual sites are fitted assuming independence between
the intra-event residuals. Chapter 8 proposes a method to consider the spatial correlation
(discussed in Chapter 3) in the mixed-effects regression procedure used for fitting ground-
motion models, and illustrates the impact of considering spatial correlation on the means
and the variances predicted by the ground-motion models. An illustrative example shows that the risk estimates of spatially-distributed systems can be inaccurate when using ground-motion models fitted without the consideration of spatial correlation.
Frameworks for the risk assessment of structures and infrastructure systems under a
variety of natural and man-made hazards share many similarities. It is conceivable, therefore, that the techniques developed for the risk assessment under one type of natural or
man-made hazard will be applicable for the risk assessment under another hazard or multi-
hazard scenario. Chapter 9 describes an exploratory study carried out to investigate the
extension of the seismic hazard and risk assessment concepts and techniques discussed in
the earlier chapters to hurricane hazard and risk modeling. The study focuses on quantify-
ing the uncertainties and the spatial correlation in hurricane wind fields (using techniques
that are used to quantify these parameters in earthquake ground-motion fields), and evalu-
ating their impact on the hurricane risk of spatially-distributed systems.
Finally, Chapter 10 summarizes the important contributions and findings of this thesis,
and discusses future extensions of this research.
The chapters of this thesis are designed to be largely self-contained because they have
been or will be published as individual journal articles. Because of this, there is some
repetition of background material. In addition, notational conventions were chosen to be
simple and clear for the topic of each chapter rather than for the thesis as a whole; because
of this, the notational conventions may not be identical for each chapter. Apologies are
made for any distraction this causes when reading the thesis as a continuous document.
Chapter 2
Statistical Tests of the Joint Distribution of Spectral Acceleration Values
N. Jayaram and J.W. Baker (2008). Statistical tests of the joint distribution of spectral
acceleration values, Bulletin of the Seismological Society of America, 98(5), 2231-2243.
2.1 Abstract
Assessment of seismic hazard using conventional probabilistic seismic hazard analysis
(PSHA) typically involves the assumption that the logarithmic spectral acceleration values
follow a normal distribution marginally. There are, however, a variety of cases in which
a vector of ground-motion intensity measures is considered for seismic hazard analysis.
In such cases, assumptions regarding the joint distribution of the ground-motion intensity
measures are required for analysis. In this article, statistical tests are used to examine
the assumption of univariate normality of logarithmic spectral acceleration values and to
verify that vectors of logarithmic spectral acceleration values computed at different sites
and/or different periods follow a multivariate normal distribution. Multivariate normality of logarithmic spectral accelerations is verified by testing the multivariate normality of inter-event and intra-event residuals obtained from ground-motion models.
The univariate normality tests indicate that both inter-event and intra-event residuals
can be well represented by normal distributions marginally. No evidence is found to sup-
port truncation of the normal distribution, as is sometimes done in PSHA. The tests for
multivariate normality show that inter-event and intra-event residuals at a site, computed
at different periods, follow multivariate normal distributions. It is also seen that spatially-
distributed intra-event residuals can be well represented by the multivariate normal distri-
bution. This study provides a sound statistical basis for assumptions regarding the marginal
and the joint distribution of ground-motion parameters that must be made for a variety of
seismic hazard calculations.
2.2 Introduction
Spectral acceleration values of earthquake ground motions are widely used in seismic haz-
ard analysis. Conventional probabilistic seismic hazard analysis (PSHA) [e.g., Kramer,
1996] provides a framework for the probabilistic assessment of a single ground-motion
parameter (such as the spectral acceleration computed at a single period). When imple-
menting PSHA, it is typically assumed that the spectral acceleration follows a lognormal
distribution marginally. There are, however, cases in which knowledge about the joint
occurrence of several spectral acceleration values, corresponding to different periods, is
required for hazard assessment [Bazzurro and Cornell, 2002]. Additionally, a single earth-
quake can cause severe damage over a large area. Hence, when assessing the impact of
earthquakes on a portfolio of structures or a spatially-distributed infrastructure system, it
is necessary to study the joint occurrence of spectral acceleration values at various sites in
the region [Crowley and Bommer, 2006]. Moreover, the knowledge of a vector of ground-
motion intensity measures is useful in other practical applications that involve computation
of the seismic response of a structure dominated by more than one mode [Shome and Cor-
nell, 1999, Vamvatsikos and Cornell, 2005], or that involve joint prediction of structural
and non-structural seismic responses for loss estimation purposes, and prediction of multiple demand parameters such as displacement and hysteretic energy. In such cases, a vector
of intensity measures needs to be considered and hence, it is necessary to study the joint
distribution of these intensity measures in observed ground motions.
Various empirical ground-motion models have been developed for estimating the re-
sponse spectrum of a given ground motion [e.g., Campbell and Bozorgnia, 2008, Boore
and Atkinson, 2008, Abrahamson and Silva, 2008, Chiou and Youngs, 2008]. A typical
ground-motion model has the form:
$$\ln(Y) = \ln(\bar{Y}) + \varepsilon + \eta \qquad (2.1)$$
where Y denotes the ground-motion parameter of interest (e.g., Sa(T1), the spectral acceleration at period T1); Ȳ denotes the median value of the ground-motion parameter predicted by the ground-motion model (which depends on parameters such as magnitude, distance, period and local soil conditions); ε denotes the intra-event residual, which is a random variable with zero mean and a standard deviation of σ; and η denotes the inter-event residual, which is a random variable with zero mean and a standard deviation of τ. The standard deviations, σ and τ, are estimated during the derivation of the ground-motion model and are a function of the response period, and in some models a function of earthquake magnitude and distance from the rupture. Normalized intra-event residuals, ε̃ = ε/σ, are obtained by dividing ε by σ. Similarly, η can be normalized by τ to obtain η̃ = η/τ.
The logarithmic spectral acceleration at a site due to an earthquake is usually assumed to
be well represented by the normal distribution marginally [e.g., Kramer, 1996]. Abraham-
son [1988] performed rigorous statistical studies to verify the assumption that logarithmic
peak ground acceleration (PGA) values follow the normal distribution marginally. Such
rigorous studies have, however, not been performed on spectral accelerations. Moreover,
the assumption of normality must be extended to the joint distribution of the logarithmic
spectral accelerations, when performing vector-valued seismic hazard analysis [Bazzurro
and Cornell, 2002, Baker and Cornell, 2006]. When multiple ground-motion parameters
are considered (for instance, Y1 and Y2), the ground-motion model equations take the following form:
$$\begin{aligned} \ln(Y_1) &= \ln(\bar{Y}_1) + \varepsilon_1 + \eta_1 \\ \ln(Y_2) &= \ln(\bar{Y}_2) + \varepsilon_2 + \eta_2 \end{aligned} \qquad (2.2)$$
where Ȳ1 and Ȳ2 denote the predicted median values of the ground-motion parameters; ε1 and ε2 denote the intra-event residuals corresponding to the two parameters; and η1 and η2 denote the inter-event residuals (η1 equals η2 if Y1 and Y2 denote Sa(T) at two sites during the same earthquake). If Y1 and Y2 are spectral accelerations at two closely spaced sites or spectral accelerations at two different periods at the same site, the residuals will not be independent [Baker and Jayaram, 2008, Baker and Cornell, 2006]. Thus, an assumption of univariate normality does not necessarily imply joint normality between the residuals.
There is a paucity of research work that examines the validity of assuming multivariate
normality. This chapter explores the validity of these assumptions using statistical tests
for univariate and multivariate normality, and a large library of spectral acceleration values
from recorded ground motions.
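A short calculation shows why the dependence between residuals matters even when each marginal distribution is normal. Assume, as is standard for models of this form, that intra-event and inter-event residuals are mutually independent, and write ρ_ε for the correlation between ε1 and ε2 (notation introduced here for illustration). For two sites during the same earthquake (η1 = η2 = η), Equation 2.2 gives:

$$\begin{aligned} \operatorname{Cov}[\ln Y_1, \ln Y_2] &= \operatorname{Cov}[\varepsilon_1, \varepsilon_2] + \operatorname{Var}[\eta] = \rho_\varepsilon \sigma_1 \sigma_2 + \tau^2, \\ \rho_{\ln Y_1, \ln Y_2} &= \frac{\rho_\varepsilon \sigma_1 \sigma_2 + \tau^2}{\sqrt{(\sigma_1^2 + \tau^2)(\sigma_2^2 + \tau^2)}}. \end{aligned}$$

Even with ρ_ε = 0, the logarithmic intensities are positively correlated through the shared inter-event residual. These moments, together with the marginal normality of ln Y1 and ln Y2, still do not determine the joint distribution, which is why joint normality must be tested separately.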
The ground-motion model of Campbell and Bozorgnia [2008] is used in this study to
compute the parameters shown in equations 2.1 and 2.2. The conclusions drawn from the
work, however, did not change when the Boore and Atkinson [2008] ground-motion model
was used as well. The spectral acceleration definition typically used in the NGA ground-
motion models is ‘GMRotI50’ (also known as ‘GMRotI’). This is the 50th percentile of
the set of geometric means of spectral accelerations at a given period, obtained by rotating
the as-recorded orthogonal horizontal motions through all possible non-redundant rotation
angles [Boore et al., 2006]. The residuals used in this work are obtained based on this
definition of the spectral acceleration.
The data for the analysis are obtained from the PEER NGA Database [2005]. In order to
exclude records whose characteristics differ from those used by the ground-motion model-
ers for data analysis, only records used by the ground-motion model authors are considered
in the tests for normality.
2.3 Testing the univariate normality of residuals
This section discusses tests performed on the assumption that logarithmic spectral acceler-
ations at a site due to a given earthquake are well represented by the normal distribution,
marginally. A practical way to test the univariate normality of a data set is to inspect the
normal Q-Q plot obtained from the data set by plotting the quantiles of the data sample
against the corresponding quantiles of the theoretical normal distribution [e.g., Johnson
and Wichern, 2007].
The following steps are involved in the construction of a normal Q-Q plot. Let x be a collection of n data values that need to be tested for normality. The data set is sorted in ascending order to obtain $[x_{(1)}, x_{(2)}, \ldots, x_{(n)}]$, such that $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. When these sample quantiles $x_{(k)}$ are distinct (which is a reasonable assumption for continuously varying data), exactly k observations are less than or equal to $x_{(k)}$. The cumulative probability $p_{(k)}$ of each $x_{(k)}$ can therefore be computed as $k/n$. It has been shown, however, that a continuity correction gives an improved estimate, $p_{(k)} = (k - 3/8)/(n + 1/4)$ [Johnson and Wichern, 2007], and hence this definition of $p_{(k)}$ is used in this work. The normal Q-Q plot is obtained by plotting the ordered data samples against the theoretical normal quantiles corresponding to each of the probabilities $p_{(k)}$. The theoretical normal quantile corresponding to probability $p_{(k)}$ is obtained as $\Phi^{-1}(p_{(k)})$, where $\Phi^{-1}$ denotes the inverse of the cumulative normal distribution whose mean and variance equal the sample mean and the sample variance, respectively. If the data sample follows a normal distribution, the normal Q-Q plot will form a straight line with a slope of 45°, passing through the origin.
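The construction above can be written in a few lines of Python using only the standard library. The function name `normal_qq_points` is illustrative, and plotting is left out: any plotting library can draw the returned pairs against a 45° line.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(data):
    """Return (theoretical_quantile, sample_quantile) pairs for a normal Q-Q plot.

    Uses the continuity-corrected plotting positions p_k = (k - 3/8) / (n + 1/4)
    and a normal distribution fitted with the sample mean and sample standard
    deviation.
    """
    x = sorted(data)                      # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    points = []
    for k, xk in enumerate(x, start=1):   # k runs from 1 to n
        p_k = (k - 3 / 8) / (n + 1 / 4)
        points.append((fitted.inv_cdf(p_k), xk))
    return points


if __name__ == "__main__":
    residuals = [0.3, -1.2, 0.8, -0.1, 1.9, -0.6, 0.05]
    for theo, obs in normal_qq_points(residuals):
        print(f"{theo:+.3f}  {obs:+.3f}")
```

For standardized residuals, one would instead use `NormalDist(0, 1)` so that departures from the standard normal (not just from normality) show up as deviations from the 45° line, as in the tests on normalized residuals that follow.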
2.3.1 Results and discussion
Normality tests are performed on intra-event and inter-event residuals in order to verify the
univariate normality of logarithmic spectral accelerations at a site due to an earthquake.
The intra-event and the inter-event residuals used in the normality tests were provided to the authors by the developers of the ground-motion model.
Intra-event residuals
This section discusses results of the univariate standard normality tests performed on the normalized intra-event residuals (ε̃). As mentioned previously, ε̃ values are obtained by dividing the intra-event residuals (ε) by the standard deviations (σ) provided by the Campbell and Bozorgnia [2008] model.
Figure 2.1 shows the normal Q-Q plots of ε̃ computed at four different periods ranging between 0.5 seconds and 10 seconds, with the theoretical quantiles derived from the standard normal distribution (normal distribution with zero mean and unit variance). Long periods such as 10 seconds may not be used in practice as often as short periods. These long periods are considered in this work, however, in order to cover the entire range of periods in which the ground-motion model used is applicable. Also shown in the figures are 45° lines passing through the origin. Deviation of the normal Q-Q plot from the 45° line indicates deviation from standard normality. It can be seen from Figure 2.1 that the normal Q-Q plots match reasonably well with the 45° lines in all four cases. This indicates that ε̃ can be considered to be univariate standard normal based on this data set. Note that while normality of ε̃ is assumed in PSHA, it is often assumed that the distribution is truncated. A typical decision would be to truncate the distribution at ε̃ = 2 or 3, and not allow any larger ε̃ values [Bommer and Abrahamson, 2006]. The tail of the marginal distribution needs to be studied in order to determine if this truncation of the normal distribution is reasonable. Figure 2.1 shows that ε̃ values larger than 2 are observed as often as would be expected from a non-truncated distribution. With the small data sets used, however, it is not possible to study the tail distribution beyond ε̃ = 3.
A technique to obtain a larger number of samples at the tail of the distribution is to pool the ε̃ values computed at different periods. The normal Q-Q plots in Figure 2.1 show that the normalized residuals computed at various periods follow a standard normal distribution. Hence, it can be inferred that the quantiles of the pooled data set will match the corresponding quantiles of a theoretical standard normal distribution. Because the pooled set has a larger number of data points in the tail, it is preferable to study the tail properties using the pooled data set rather than the individual data sets. Accordingly, 12,194 ε̃ values computed at 10 periods ranging from 0.5 to 10 seconds are pooled together.
The histogram of the pooled data set is shown in Figure 2.2 along with a scaled plot of the theoretical standard normal distribution. The figure shows that the data are in excellent agreement with the standard normal distribution, as expected based on the normal Q-Q plots shown in Figure 2.1. The normal Q-Q plot for the pooled data set is shown in Figure 2.3. It can be seen that the quantiles from the observed data match reasonably well with the theoretical quantiles up to ε̃ values of 3.5 or 4. Beyond ε̃ = ±4, there are no longer enough data to study possible truncation. This large data set thus contradicts claims that an ε̃ truncation at less than 4 is reasonable, and provides no evidence to support truncation at a larger value. This is consistent with the findings of other researchers examining large data sets [Strasser et al., 2008, Abrahamson, 2006, Bommer et al., 2004].
Figure 2.1: The normal Q-Q plots of the normalized intra-event residuals at four different periods: (a) T = 0.5 seconds (1560 samples); (b) T = 1.0 seconds (1548 samples); (c) T = 2.0 seconds (1498 samples); (d) T = 10.0 seconds (507 samples).
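The tail argument can be made quantitative by comparing observed exceedance counts with those expected under an untruncated standard normal distribution. The helper below is an illustrative sketch (its name is hypothetical). With n = 12,194, the expected counts beyond |ε̃| = 2, 3 and 4 are approximately 555, 33 and 0.8, which shows why the pooled data set can probe the tail out to about 3.5 or 4 but not beyond.

```python
from statistics import NormalDist


def tail_counts(normalized_residuals, threshold):
    """Observed vs. expected number of |residual| > threshold.

    The expected count assumes an untruncated standard normal distribution:
    n * 2 * (1 - Phi(threshold)).
    """
    n = len(normalized_residuals)
    observed = sum(1 for e in normalized_residuals if abs(e) > threshold)
    expected = n * 2.0 * (1.0 - NormalDist().cdf(threshold))
    return observed, expected


# Expected counts alone, for a pooled sample of 12,194 values:
for k in (2.0, 3.0, 4.0):
    _, expected = tail_counts([0.0] * 12194, k)
    print(f"|residual| > {k}: about {expected:.1f} expected")
```

Applied to actual residuals, an observed count well below the expected count at a given threshold would be evidence for truncation; observed counts close to the expected ones, as reported above, are not.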
Inter-event residuals
According to the ground-motion model of Campbell and Bozorgnia [2008], the standard deviation of the inter-event residuals (η) depends on the rock PGA at the sites. As a result, while the η values computed at any particular period are identical across all the sites during a given earthquake, the normalized inter-event residuals (η̃ = η/τ) vary across sites even during a single earthquake (because the standard deviation, τ, with which they are normalized varies from site to site). This makes it impossible to use η̃ for the normality study. It is seen, however, using the records in the PEER NGA Database [2005], that over 90% of the τ values (the standard deviations of η, obtained using the ground-motion model of Campbell and Bozorgnia [2008]) lie within a reasonably narrow interval (with an approximate range of 0.04). Hence, homoscedasticity (i.e., constant variance) of η is considered to be reasonable, and so the η values are used as such, without normalization.
Figure 2.4 shows the normal Q-Q plots obtained using the η values corresponding to four different periods. The theoretical quantiles are obtained using a normal distribution with zero mean and a standard deviation equal to the sample standard deviation (which does not equal one since the η values are not normalized). It is seen from Figures 2.4a-d that the normal Q-Q plots match reasonably well with the 45° straight lines, thereby indicating the univariate normality of inter-event residuals.
2.4 Testing the assumption of multivariate normality for random vectors using independent samples
In this section, several statistical tests are presented that can be used with observed ground-
motion data to test the validity of the assumed multivariate normal distribution for logarith-
mic spectral accelerations.
A given ground motion will have spectral acceleration values that vary stochastically as a function of period. Hence, for any d periods, T = [T1, T2, ..., Td], let the corresponding spectral acceleration values be denoted by $S_a^j(T_i)$, where j is an index that denotes a given recording and Ti indicates a particular period. The mathematical procedures explained in this section can be used to test whether the random vectors of logarithmic spectral accelerations, $[\ln S_a(T_1), \ln S_a(T_2), \ldots, \ln S_a(T_d)]$, are jointly normal.
Figure 2.2: The histogram of the 12,194 pooled normalized intra-event residuals computed at 10 periods, with the theoretical standard normal distribution (scaled) superimposed.
Testing for multivariate normality is much more complex than testing for univariate normality, because a multivariate distribution has many more properties that must be considered during the test. Among the many possible tests for multivariate normality of a given data set, eight are reviewed in detail by Mecklin and Mundfrom [2003]. They examined the power of the eight tests in a Monte Carlo study of several data sets with pre-determined multivariate distributions. They recommend the Henze-Zirkler test [Henze and Zirkler, 1990] as a formal test of multivariate normality, complemented by other procedures such as Mardia's skewness and kurtosis tests [Mardia, 1970].

Multivariate normality can also be assessed using the Chi-square plot (also known as the gamma plot) [Johnson and Wichern, 2007], which is the multivariate equivalent of the normal Q-Q plot. The Chi-square plot is obtained in the same way as a normal Q-Q plot, with two differences: the ordered squared Mahalanobis distances [Mardia et al., 1979] of the data samples replace the data quantiles, and a theoretical Chi-square distribution with d degrees of freedom replaces the theoretical normal distribution. A departure from linearity indicates a departure from multivariate normality. In this work, however, only the three more quantitative tests are used: the Henze-Zirkler test and Mardia's tests of skewness and of kurtosis. These three tests are described in the following paragraphs.
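The following is a minimal sketch of the Chi-square plot coordinates. It assumes the samples are given as lists of d values and computes squared Mahalanobis distances using the 1/n sample covariance. To stay within the Python standard library, it approximates the Chi-square quantiles with the Wilson-Hilferty transformation instead of computing them exactly. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small d x d matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("sample covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_mahalanobis(samples):
    """(X_i - mean)' S^{-1} (X_i - mean) for each sample, S with the 1/n divisor."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(d)]
    centered = [[x[k] - mean[k] for k in range(d)] for x in samples]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    inv = invert(cov)
    return [sum(c[a] * inv[a][b] * c[b] for a in range(d) for b in range(d))
            for c in centered]


def chi_square_plot_points(samples):
    """Pairs (approximate Chi-square quantile, ordered squared distance)."""
    d = len(samples[0])
    distances = sorted(squared_mahalanobis(samples))
    n = len(distances)
    points = []
    for i, value in enumerate(distances, start=1):
        z = NormalDist().inv_cdf((i - 0.5) / n)
        # Wilson-Hilferty approximation to the Chi-square quantile with d dof.
        q = d * (1 - 2 / (9 * d) + z * math.sqrt(2 / (9 * d))) ** 3
        points.append((max(q, 0.0), value))
    return points


# Hypothetical 2-D residual vectors (not study data).
sample = [[0.3, -0.1], [-1.2, -0.8], [0.5, 0.9], [1.1, 0.4], [-0.4, 0.2], [0.0, -0.6]]
for q, dist in chi_square_plot_points(sample):
    print(f"{q:.3f} {dist:.3f}")
```

Squared distances computed this way always sum to n·d, so that identity gives a convenient check of an implementation.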
2.4.1 Henze-Zirkler test
Henze and Zirkler [1990] proposed a class of invariant, consistent tests of multivariate normality. The test is based on a statistic that is a function of the given data and whose asymptotic distribution is known if the data follow a multivariate normal distribution. Comparing the statistic to this asymptotic distribution shows whether the data set can reasonably be assumed to be normal. The Henze-Zirkler test statistic is defined as follows. Let $X_1, X_2, \ldots, X_n$ be a set of n independent data samples (i.e., $X_1, X_2, \ldots, X_n$ are obtained from n independent records), each of dimension d (i.e., $X_i = [X_{i1}, X_{i2}, \ldots, X_{id}]$). Note that two components $X_{ij_1}$ and $X_{ij_2}$ of the same sample may be correlated.
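$$T_{n,\beta} = \frac{1}{n}\sum_{k=1}^{n}\sum_{j=1}^{n}\exp\left(-\frac{\beta^2}{2}\lVert Y_j - Y_k\rVert^2\right) - 2\left(1+\beta^2\right)^{-d/2}\sum_{j=1}^{n}\exp\left(-\frac{\beta^2}{2(1+\beta^2)}\lVert Y_j\rVert^2\right) + n\left(1+2\beta^2\right)^{-d/2} \tag{2.3}$$

where

$$\beta = \frac{1}{\sqrt{2}}\left(\frac{2d+1}{4}\right)^{1/(d+4)} n^{1/(d+4)}$$

$$\lVert Y_j - Y_k\rVert^2 = (X_j - X_k)'S^{-1}(X_j - X_k), \qquad \lVert Y_j\rVert^2 = (X_j - \bar{X}_n)'S^{-1}(X_j - \bar{X}_n)$$

Here $T_{n,\beta}$ is the test statistic, $\bar{X}_n$ is the sample mean vector of the n realizations $X_1, \ldots, X_n$, and S is the sample covariance matrix, defined as $S = \frac{1}{n}\sum_{j=1}^{n}(X_j - \bar{X}_n)(X_j - \bar{X}_n)'$. Henze and Zirkler [1990] also approximated the limiting distribution of $T_{n,\beta}$ (given the multivariate normality of X) with a lognormal distribution whose mean and variance are defined as follows: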
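$$E[T_\beta] = 1 - \left(1+2\beta^2\right)^{-d/2}\left[1 + \frac{d\beta^2}{1+2\beta^2} + \frac{d(d+2)\beta^4}{2(1+2\beta^2)^2}\right] \tag{2.4}$$

$$\mathrm{Var}[T_\beta] = 2\left(1+4\beta^2\right)^{-d/2} + 2\left(1+2\beta^2\right)^{-d}\left[1 + \frac{2d\beta^4}{(1+2\beta^2)^2} + \frac{3d(d+2)\beta^8}{4(1+2\beta^2)^4}\right] - 4w(\beta)^{-d/2}\left[1 + \frac{3d\beta^4}{2w(\beta)} + \frac{d(d+2)\beta^8}{2w(\beta)^2}\right] \tag{2.5}$$

where $w(\beta) = (1+\beta^2)(1+3\beta^2)$.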
The p-value of the test of multivariate normality can be calculated from the value of the statistic computed from the data and the asymptotic distribution of $T_{n,\beta}$. The p-value is the probability of obtaining a statistic value at least as extreme as the one computed from the data, if the null hypothesis of multivariate normality were true. Because large values of $T_{n,\beta}$ indicate departure from normality, this is the upper-tail probability of the lognormal approximation. The smaller the p-value, the stronger the evidence against the null hypothesis. Henze and Zirkler [1990] suggest that this test be used only if the sample size n is at least 20.
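The following is a minimal Python sketch of Equations 2.3 to 2.5, written with the standard library only for illustration. It assumes the data are supplied as a list of n samples, each a list of d values. It forms $\lVert Y_j - Y_k\rVert^2$ from the centered Mahalanobis cross-products $G_{jk}$ through the identity $G_{jj} + G_{kk} - 2G_{jk}$. The function names (`invert`, `centered_gram`, `henze_zirkler`) are hypothetical and are not taken from the study. The p-value is the upper tail of a lognormal distribution whose mean and variance match Equations 2.4 and 2.5.

```python
import math
import random
from statistics import NormalDist


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small d x d matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("sample covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def centered_gram(samples):
    """G[i][j] = (X_i - mean)' S^{-1} (X_j - mean), with S using the 1/n divisor."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(d)]
    centered = [[x[k] - mean[k] for k in range(d)] for x in samples]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    inv = invert(cov)
    projected = [[sum(inv[a][b] * c[b] for b in range(d)) for a in range(d)]
                 for c in centered]
    return [[sum(ci[a] * pj[a] for a in range(d)) for pj in projected]
            for ci in centered]


def henze_zirkler(samples):
    """Return (T_{n,beta}, p-value) following Eqs. 2.3 to 2.5."""
    n, d = len(samples), len(samples[0])
    g = centered_gram(samples)
    beta = ((2 * d + 1) / 4) ** (1 / (d + 4)) * n ** (1 / (d + 4)) / math.sqrt(2)
    b2 = beta ** 2
    # ||Y_j - Y_k||^2 = G_jj + G_kk - 2 G_jk and ||Y_j||^2 = G_jj.
    pair_term = sum(math.exp(-b2 / 2 * (g[j][j] + g[k][k] - 2 * g[j][k]))
                    for j in range(n) for k in range(n)) / n
    single_term = 2 * (1 + b2) ** (-d / 2) * sum(
        math.exp(-b2 / (2 * (1 + b2)) * g[j][j]) for j in range(n))
    t = pair_term - single_term + n * (1 + 2 * b2) ** (-d / 2)

    # Mean and variance of the lognormal approximation (Eqs. 2.4 and 2.5).
    a = 1 + 2 * b2
    w = (1 + b2) * (1 + 3 * b2)
    mean = 1 - a ** (-d / 2) * (1 + d * b2 / a + d * (d + 2) * b2 ** 2 / (2 * a ** 2))
    var = (2 * (1 + 4 * b2) ** (-d / 2)
           + 2 * a ** (-d) * (1 + 2 * d * b2 ** 2 / a ** 2
                              + 3 * d * (d + 2) * b2 ** 4 / (4 * a ** 4))
           - 4 * w ** (-d / 2) * (1 + 3 * d * b2 ** 2 / (2 * w)
                                  + d * (d + 2) * b2 ** 4 / (2 * w ** 2)))
    # Lognormal with matching moments; large T indicates non-normality.
    sigma_ln = math.sqrt(math.log(1 + var / mean ** 2))
    mu_ln = math.log(mean) - sigma_ln ** 2 / 2
    p_value = 1 - NormalDist(mu_ln, sigma_ln).cdf(math.log(max(t, 1e-300)))
    return t, p_value


# Usage on a simulated (hypothetical) bivariate normal sample of size 35.
rng = random.Random(2010)
simulated = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(35)]
statistic, p = henze_zirkler(simulated)
print(f"T = {statistic:.3f}, p-value = {p:.3f}")
```

The simulated sample only stands in for residual vectors from independent records. Because the statistic depends on the data only through Mahalanobis quantities, it is unchanged by an invertible affine transformation of the samples, which is a simple way to check an implementation.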
2.4.2 Mardia’s measures of kurtosis and skewness
Mardia [1970] extended the concepts of kurtosis and skewness from the univariate case
to the multivariate case. Mardia [1970] also obtained the asymptotic distribution of the
multivariate kurtosis and skewness parameters (which is needed to test the null hypothesis
of multivariate normality).
Multivariate kurtosis
Mardia [1970] defined the multivariate kurtosis coefficient as follows:
$$K = E\left[\left((X-\mu)'\Sigma^{-1}(X-\mu)\right)^2\right] \tag{2.6}$$
where X is the d-dimensional random vector whose distribution is tested, $\mu$ is the mean vector of X, $(X-\mu)'$ denotes the transpose of $(X-\mu)$, and $\Sigma$ is the covariance matrix of X. In practice, the multivariate kurtosis is computed from the sample data as follows:
$$k = \frac{1}{n}\sum_{i=1}^{n}\left[(X_i - \bar{X}_n)'S^{-1}(X_i - \bar{X}_n)\right]^2 \tag{2.7}$$
Mardia [1970] also showed that, if X follows a multivariate normal distribution, the asymptotic distribution of the multivariate kurtosis parameter k defined above is given by:

$$\frac{k - d(d+2)(n-1)/(n+1)}{\sqrt{8d(d+2)/n}} \Rightarrow N(0,1) \tag{2.8}$$

where N(0,1) denotes the univariate standard normal distribution and $\Rightarrow$ denotes convergence in distribution. This asymptotic distribution allows a p-value to be computed, and so can be used to test whether the sample data come from a multivariate normally distributed population.
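The sample kurtosis and the standardized statistic of Equation 2.8 can be sketched as follows, using the Python standard library only and hypothetical function names. The sketch reports a two-sided p-value from the standard normal distribution. This choice is an assumption, because the text does not say whether a one-sided or two-sided test is used.

```python
import math
from statistics import NormalDist


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small d x d matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("sample covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_mahalanobis(samples):
    """(X_i - mean)' S^{-1} (X_i - mean) for each sample, S with the 1/n divisor."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(d)]
    centered = [[x[k] - mean[k] for k in range(d)] for x in samples]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    inv = invert(cov)
    return [sum(c[a] * inv[a][b] * c[b] for a in range(d) for b in range(d))
            for c in centered]


def mardia_kurtosis(samples):
    """Return (k, z, p) per Eqs. 2.7 and 2.8; p is two-sided (an assumption)."""
    n, d = len(samples), len(samples[0])
    k = sum(v ** 2 for v in squared_mahalanobis(samples)) / n
    z = (k - d * (d + 2) * (n - 1) / (n + 1)) / math.sqrt(8 * d * (d + 2) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return k, z, p_value


# Hypothetical 2-D residual vectors (not study data).
sample = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [0.7, 0.7], [-0.7, -0.7]]
print(mardia_kurtosis(sample))
```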
Multivariate skewness
Mardia [1970] and Mardia et al. [1979] defined the measure of multivariate skewness as follows:

$$S = E\left[\left((X_1-\mu)'\Sigma^{-1}(X_2-\mu)\right)^3\right] \tag{2.9}$$

where $X_1$ and $X_2$ are independent random vectors, each having the distribution of the random vector X being tested. This parameter can be computed from the sample data as follows:

$$s = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left[(X_i - \bar{X}_n)'S^{-1}(X_j - \bar{X}_n)\right]^3 \tag{2.10}$$

The asymptotic distribution of the multivariate skewness parameter s is given by:

$$\frac{ns}{6} \Rightarrow \chi^2_{d(d+1)(d+2)/6} \tag{2.11}$$

where $\chi^2_{d(d+1)(d+2)/6}$ is the Chi-square distribution with d(d+1)(d+2)/6 degrees of freedom. This asymptotic distribution can be used to test the null hypothesis of multivariate normality.
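Equations 2.10 and 2.11 can be sketched in the same way. The Chi-square tail probability below uses the standard recursion for integer degrees of freedom. That is sufficient here because d(d+1)(d+2)/6 is always an integer. The function names are hypothetical, and a statistics library could replace the hand-written Chi-square tail.

```python
import math
from statistics import NormalDist


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small d x d matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("sample covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def centered_gram(samples):
    """G[i][j] = (X_i - mean)' S^{-1} (X_j - mean), with S using the 1/n divisor."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(d)]
    centered = [[x[k] - mean[k] for k in range(d)] for x in samples]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    inv = invert(cov)
    projected = [[sum(inv[a][b] * c[b] for b in range(d)) for a in range(d)]
                 for c in centered]
    return [[sum(ci[a] * pj[a] for a in range(d)) for pj in projected]
            for ci in centered]


def chi2_sf(x, dof):
    """Upper-tail Chi-square probability for an integer number of dof.

    Uses Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) exp(-x/2) / Gamma(nu/2 + 1),
    starting from the closed forms for nu = 1 and nu = 2.
    """
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        q, nu = math.exp(-x / 2), 2
    else:
        q, nu = 2 * NormalDist().cdf(-math.sqrt(x)), 1
    while nu < dof:
        q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return q


def mardia_skewness(samples):
    """Return (s, n*s/6, p-value) per Eqs. 2.10 and 2.11."""
    n, d = len(samples), len(samples[0])
    g = centered_gram(samples)
    s = sum(value ** 3 for row in g for value in row) / n ** 2
    statistic = n * s / 6
    dof = d * (d + 1) * (d + 2) // 6
    return s, statistic, chi2_sf(statistic, dof)


# Hypothetical 2-D residual vectors (not study data).
sample = [[0.3, -0.1], [-1.2, -0.8], [0.5, 0.9], [1.1, 0.4], [-0.4, 0.2], [0.0, -0.6]]
print(mardia_skewness(sample))
```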
The above procedures can be used to test the multivariate normality of any random vector using a set of independent data samples. For instance, these tests can be used to verify the multivariate normality of intra-event residuals computed at multiple periods. In this case, to obtain a set of independent data samples, each random vector (comprising the intra-event residuals computed at multiple periods) must be obtained from a record that is independent of the other records used. A technique to obtain independent data samples is discussed in Section 2.4.3.
2.4.3 Results and discussion
As mentioned earlier, multivariate normality tests need to be performed on both the intra-event and the inter-event residuals in order to verify the multivariate normality of the logarithmic spectral accelerations. The intra-event residuals are normalized by the appropriate standard deviations before use, while the inter-event residuals are used without normalization, for the reasons mentioned previously.
Normalized intra-event residuals at different periods
Let $\varepsilon(\mathbf{T}) = [\varepsilon(T_1), \varepsilon(T_2), \ldots, \varepsilon(T_d)]$ denote the random vector of normalized intra-event residuals computed at d different periods. During an earthquake, different sites experience different levels of ground motion depending on their distance from the earthquake source, the local soil conditions and other factors. The ground motion recorded at site j can be used to compute a sample $e_j(\mathbf{T})$ of the random vector $\varepsilon(\mathbf{T})$. This section uses the samples $e_j(\mathbf{T})$ obtained at various sites to test whether $\varepsilon(\mathbf{T})$ follows a multivariate normal distribution.
The results presented in this work are based on data from the 1994 Northridge earthquake and the 1999 Chi-Chi earthquake. The data are obtained from the PEER NGA Database [2005], which contains 160 records from the Northridge earthquake and 421 records from the Chi-Chi earthquake (aftershock data are not used). Of these records, only those used by the authors of the Campbell and Bozorgnia [2008] ground-motion model are included in the analysis. Even this reduced data set cannot be used directly, because the spatial correlation of ground motions during a given earthquake means the samples are not independent of one another. It is known, however, that the correlation between $e_i(T_p)$ and $e_j(T_p)$ decreases with increasing separation distance between sites i and j, where $T_p$ denotes any particular period. The literature indicates that the correlation coefficient drops close to zero (i.e., the $\varepsilon(T_p)$'s are approximately uncorrelated) when the separation distance exceeds 10 km [Boore et al., 2003]. Moreover, it is shown later in this chapter that the $\varepsilon(T_p)$'s obtained at different sites from a single earthquake follow a multivariate normal distribution. Uncorrelated components of a multivariate normal random vector are independent, so approximately uncorrelated $\varepsilon(T_p)$ values are also approximately independent. Samples of random vectors obtained from recordings at mutually well-separated sites are therefore approximately independent and can be used in the tests described in the previous section. Accordingly, in the current work, well-separated locations (with separation distances exceeding 20 km) are identified for the Northridge and the Chi-Chi earthquakes. The tests of normality are then performed on the data set obtained by combining the Chi-Chi and the Northridge data.

Several combinations of recordings satisfy the constraints on the minimum separation distance and the minimum sample size (as defined in Section 2.4), and the tests are carried out on the various allowable configurations. Although the test results vary slightly with the configuration used, p-values from only a single data set are reported in this chapter. The combined data set has around 35 records at periods less than or equal to 2 seconds and close to 30 records at periods below 7.5 seconds. These are reasonable sample sizes for testing the hypothesis. At 10 seconds, however, only 22 independent samples are available, which barely exceeds the threshold of 20 mentioned in Section 2.4. Hence, ε values computed at 10 seconds are used only sparingly in the tests.
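A greedy selection is one way to build such a configuration. It is sketched below under two assumptions: site coordinates are planar (x, y) values in km, and stations are visited in a shuffled order, so different seeds yield different admissible configurations. The study does not describe its selection algorithm, so this is only an illustration, and `select_separated_sites` is a hypothetical name.

```python
import math
import random


def select_separated_sites(coords_km, min_separation_km=20.0, seed=None):
    """Greedily keep sites whose distance to every kept site exceeds the minimum.

    coords_km are planar (x, y) site coordinates in km (an assumption; geographic
    coordinates would first need projecting). With a seed, the visiting order is
    shuffled, giving one of the several admissible configurations.
    """
    order = list(range(len(coords_km)))
    if seed is not None:
        random.Random(seed).shuffle(order)
    kept = []
    for index in order:
        if all(math.dist(coords_km[index], coords_km[k]) > min_separation_km
               for k in kept):
            kept.append(index)
    return sorted(kept)


# Hypothetical station coordinates in km (not the Chi-Chi or Northridge stations).
stations = [(0.0, 0.0), (12.0, 5.0), (30.0, 2.0), (41.0, 28.0), (8.0, 35.0)]
for seed in (1, 2, 3):
    print(seed, select_separated_sites(stations, seed=seed))
```

Running several seeds and keeping only configurations with at least 20 stations mirrors the idea of testing the various allowable configurations.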
A rigorous proof of the multivariate normality of ε would require evaluating the multivariate normality of normalized residuals for all possible period combinations (i.e., all pairs, triplets, etc.). For all practical purposes, however, it is sufficient to consider the joint distribution of ε's computed at five periods. If multivariate normality can be established in that case, the lower-order combinations (i.e., subsets of the five periods used) also follow a multivariate normal distribution and do not have to be tested explicitly. This is because every subset of a multivariate normal random vector X is itself multivariate normal [Johnson and Wichern, 2007].
A set of hypothesis test results is shown in Table 2.1 and discussed in the following paragraphs. The table lists the set of periods at which the ε values are computed and the p-values obtained from the Henze-Zirkler test, Mardia's test of skewness and Mardia's test of kurtosis. Case 1 in the table corresponds to the bivariate normality tests on the ε's obtained at 1 second and 2 seconds. None of the three tests reports a p-value that is significant at the 5% significance level typically used for testing. In Case 2, five different periods between 0.5 seconds and 2 seconds are chosen. The Henze-Zirkler test and the test of skewness report p-values far from significance. The test of kurtosis reports a borderline p-value of 0.05, which does not provide strong evidence against normality either. The normality tests are also performed at long periods. In Case 3, the periods are chosen over the 0.5-7.5 second range, as shown in Table 2.1, and the p-values reported by all three tests are far from significant. Finally, a test is carried out considering long periods exclusively (Case 4); the p-values obtained from all the tests are again not significant. Overall, there is little evidence to reject the null hypothesis that ε computed at different periods follows a multivariate normal distribution.
Figure 2.3: The normal Q-Q plot of the pooled set of normalized intra-event residuals.

Table 2.1: Tests on normalized intra-event residuals computed at different periods

Case  Periods (s)                       P_HZ (a)  P_SK (b)  P_KT (c)
1     T = {1.0, 2.0}                    0.10      0.23      0.93
2     T = {0.5, 0.75, 1.0, 1.5, 2.0}    0.49      0.92      0.05
3     T = {0.5, 1.0, 2.0, 5.0, 7.5}     0.69      0.90      0.42
4     T = {5.0, 7.5, 10.0}              0.19      0.14      0.62

(a) P_HZ: p-value obtained from the Henze-Zirkler test
(b) P_SK: p-value obtained from Mardia's test of skewness
(c) P_KT: p-value obtained from Mardia's test of kurtosis

Figure 2.4: The normal Q-Q plots of inter-event residuals at four different periods: (a) T = 0.5 seconds (64 samples); (b) T = 1.0 seconds (64 samples); (c) T = 2.0 seconds (62 samples); (d) T = 10.0 seconds (21 samples).
Inter-event residuals at different periods
This section discusses tests carried out on inter-event residuals (η) at multiple periods. The
number of inter-event residuals available for the tests ranges from 64 at 0.5 seconds to 40
at 7.5 seconds. Only 21 records are available, however, at 10 seconds.
Table 2.2 shows the hypothesis test results based on η values. In Case 1, η values at two periods, 1 second and 2 seconds, are tested for bivariate normality, and the p-values reported by all three tests are far from significant. In Case 2, five different periods between 0.5 and 2 seconds are chosen, and the table shows that the p-values reported by all three tests are statistically significant. The authors believe, however, that this is caused by deviations from marginal normality, due to the small sample size, carrying over to the higher-order distributions. Even if the true marginal distribution is normal, a sample from it will not be exactly normal. To verify this, the η values at the same set of periods as in Case 2 are transformed so that their marginal distributions are normal, using the normal score transform procedure described by Deutsch and Journel [1998]. This removes the deviations of the sample's univariate distributions from the normal distribution. Applying the normal score transform (or any other strictly increasing transform) to each marginal distribution does not change the dependence structure of the bivariate and higher-order multivariate distributions. Further, the marginal distribution of η has been shown to be normal in Section 2.3. Hence, transforming the marginal distribution of the sampled data does not interfere with the tests for multivariate normality. This transformation procedure is described in Appendix 2.8. The tests are performed on the transformed data (Case 3), and the p-values of all three tests increase substantially. This indicates that the statistically significant p-values in Case 2 are probably a result of the deviation of the sample's marginal distributions from the normal distribution, rather than an indicator of non-normality in the joint distribution.
Case 4 involves testing η values at five periods ranging from 0.5 to 7.5 seconds; the reported p-values are, again, not significant. In Case 5, η's at three long periods are tested for multivariate normality, and the p-values reported by the three tests are far from significant. It can hence be concluded from the results that it is reasonable to assume that the η's computed at different periods follow a multivariate normal distribution.
Both the inter-event and the intra-event residuals computed at multiple periods follow multivariate normal distributions, and the two sets of residuals are independent of each other in the ground-motion model. It is therefore concluded that the logarithmic spectral accelerations computed at different periods, at a given site during a given earthquake, follow a multivariate normal distribution.
Spectral acceleration values at different orientations
This section describes tests carried out to verify whether the spectral acceleration values corresponding to two different orientations at a site follow a bivariate normal distribution. The test procedures are identical to those described in Section 2.4, except that the random vector is now written as $[S_a^{H_1}(T_1), S_a^{H_2}(T_2)]$. Here $H_1$ and $H_2$ refer to two orthogonal horizontal orientations (e.g., the fault-normal and the fault-parallel directions), and $T_1$ and $T_2$ denote the periods considered in the two orthogonal directions.
To verify the bivariate normality of the spectral accelerations corresponding to two different orientations, normality tests should be carried out on the inter-event and the intra-event residuals separately. The inter-event residuals in the fault-normal and the fault-parallel directions, however, are not known. As a result, an approximate test for bivariate normality of spectral accelerations in different orientations is carried out by performing tests on normalized total residuals. Total residuals are computed using the following alternative formulation of the ground-motion equations:

$$\ln(Y) = \ln(\bar{Y}) + \delta \tag{2.12}$$

where Y denotes the ground-motion parameter of interest, $\bar{Y}$ denotes the predicted median value of the ground-motion parameter, and δ is the total residual, a random variable that represents both the inter-event and the intra-event residuals. From Equations 2.1 and 2.12, it can be inferred that δ has zero mean and standard deviation $\sqrt{\sigma^2 + \tau^2}$. Hence, normalized total residuals can be obtained as $\delta / \sqrt{\sigma^2 + \tau^2}$.
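The normalization can be illustrated with a short sketch. It computes δ from Equation 2.12 and divides by $\sqrt{\sigma^2 + \tau^2}$. The function name and the input values in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def normalized_total_residual(observed, predicted_median, sigma, tau):
    """delta / sqrt(sigma^2 + tau^2), with ln(Y) = ln(Y_median) + delta (Eq. 2.12)."""
    delta = math.log(observed) - math.log(predicted_median)
    return delta / math.sqrt(sigma ** 2 + tau ** 2)


# Hypothetical values: observed Sa = 0.25 g, predicted median 0.18 g,
# intra-event sigma = 0.55 and inter-event tau = 0.30 (illustrative only).
print(round(normalized_total_residual(0.25, 0.18, 0.55, 0.30), 3))
```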
In this work, δ values are computed using the fault-normal and the fault-parallel time histories recorded during the Chi-Chi and the Northridge earthquakes [Chiou et al., 2008]. As mentioned earlier, the tests described in Section 2.4 require independent data samples. Pairs of fault-normal and fault-parallel residuals are therefore computed at well-separated sites (separation distances exceeding 20 km).
Table 2.3 shows a sample of the multivariate normality test results obtained when δ values are computed at different orientations (fault-normal and fault-parallel) and/or different periods. In Case 1, the δ values for the fault-normal and the fault-parallel directions are computed at the same period (2 seconds). None of the three tests of multivariate normality reports a significant p-value in this case. In Case 2, the δ's for the two directions are computed at two different periods, and again none of the three tests reports a significant p-value. Finally, to check whether a larger separation between the periods affects the bivariate distributional properties, Case 3 computes the fault-normal δ values at 0.5 seconds and the fault-parallel δ values at 10 seconds. The table shows that the p-values are far from significant in this case as well.
2.5 Testing the assumption of multivariate normality for spatially distributed data
The tests described so far are valid only for testing random vectors using independent samples. When testing spatially-distributed data from a given earthquake, ground-motion recordings at closely-spaced sites must also be considered, so independent samples cannot be obtained using the techniques described in Section 2.4. Other tests are therefore needed to check the multivariate normality assumption for ground-motion intensities distributed over space. Multivariate normality can be assessed by verifying univariate normality, bivariate normality, trivariate normality, and so on. Goovaerts [1997] and Deutsch and Journel [1998] described a procedure to test the assumption of bivariate normality of spatially-distributed data whose marginal distribution is standard normal. This procedure can be used to verify whether pairs of residuals computed at two different sites during a single earthquake follow a bivariate normal distribution. The test is described in the following subsection, followed by test results from recorded ground motions.
2.5.1 Check for bivariate normality
Let X(u) denote the random variable of interest (for example, the residual) at location u, and let X(u + h) denote the random variable of interest at location u + h, where h denotes the spatial separation between the two locations. The procedure to test bivariate normality [Goovaerts, 1997, Deutsch and Journel, 1998] compares the indicator semivariogram of the data (the experimental indicator semivariogram) to the theoretical indicator semivariogram obtained by assuming that (X(u), X(u + h)) follows a bivariate normal distribution.
An indicator semivariogram is a measure of spatial variability and is defined as follows:
$$\gamma_I(h; x_p) = \frac{1}{2}E\left(\left[I(X(u+h); x_p) - I(X(u); x_p)\right]^2\right) \tag{2.13}$$

where $x_p$ denotes the p-quantile of X, and $I(X(u); x_p) = 1$ if $X(u) \le x_p$ and 0 otherwise.
The experimental indicator semivariogram is a regression-based relationship between $\gamma_I(h; x_p)$ and h. In this study, an exponential model is assumed as the form of the regression, so the experimental indicator semivariogram is defined as follows:

$$\gamma_I(h; x_p) = a_{x_p}\left[1 - \exp\left(-3h / b_{x_p}\right)\right] \tag{2.14}$$

where $a_{x_p}$ and $b_{x_p}$ are the sill and the range of the experimental indicator semivariogram, respectively. The sill of a semivariogram equals the variance of the variable being studied (here, the indicator $I(X(u); x_p)$). For the exponential model, the range is defined as the separation distance h at which $\gamma_I(h; x_p)$ equals 0.95 times the sill, since $1 - e^{-3} \approx 0.95$. The range and the sill can be computed by nonlinear least-squares regression on observed values of $\gamma_I(h; x_p)$ and h. The observed values of $\gamma_I(h; x_p)$ for a given data set can be obtained as follows (based on Equation 2.13):
$$\gamma_I(h; x_p) = \frac{1}{2N(h)}\sum_{\alpha=1}^{N(h)}\left[I(X(u_\alpha + h); x_p) - I(X(u_\alpha); x_p)\right]^2 \tag{2.15}$$

where N(h) is the number of pairs of data points separated by h (within some tolerance), and $(X(u_\alpha + h), X(u_\alpha))$ denotes the αth such pair.
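The binned estimator of Equation 2.15 and a least-squares fit of the exponential model in Equation 2.14 can be sketched as follows. This sketch makes three assumptions: site coordinates are planar (x, y) values in km, fixed-width lag bins serve as the distance tolerance, and the fit is unweighted. For each candidate range the best sill has a closed form, so the nonlinear regression reduces to a one-dimensional search over the range. For normal-scored data the threshold $x_p$ is $\Phi^{-1}(p)$. All names, and the data in the usage lines, are hypothetical.

```python
import math
from statistics import NormalDist


def indicator(value, threshold):
    """I(X(u); x_p) = 1 if X(u) <= x_p, else 0."""
    return 1.0 if value <= threshold else 0.0


def experimental_indicator_semivariogram(coords_km, values, threshold,
                                         lag_width_km, max_lag_km):
    """Binned estimate of Eq. 2.15: list of (mean lag, gamma, pair count)."""
    n_bins = int(max_lag_km // lag_width_km)
    sq_sum = [0.0] * n_bins
    lag_sum = [0.0] * n_bins
    count = [0] * n_bins
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = math.dist(coords_km[i], coords_km[j])
            k = int(h // lag_width_km)
            if k >= n_bins:
                continue
            diff = indicator(values[i], threshold) - indicator(values[j], threshold)
            sq_sum[k] += diff ** 2
            lag_sum[k] += h
            count[k] += 1
    return [(lag_sum[k] / count[k], sq_sum[k] / (2 * count[k]), count[k])
            for k in range(n_bins) if count[k] > 0]


def fit_exponential(lags, gammas, candidate_ranges):
    """Least-squares fit of Eq. 2.14; returns (sill a, range b, SSE)."""
    best = None
    for b in candidate_ranges:
        shape = [1 - math.exp(-3 * h / b) for h in lags]
        denom = sum(f * f for f in shape)
        if denom == 0:
            continue
        a = sum(g * f for g, f in zip(gammas, shape)) / denom  # optimal sill
        sse = sum((g - a * f) ** 2 for g, f in zip(gammas, shape))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    return best


# Hypothetical normal-scored residuals at hypothetical site coordinates (km).
sites = [(0.0, 0.0), (15.0, 4.0), (32.0, -6.0), (47.0, 10.0), (60.0, 1.0), (8.0, 22.0)]
scores = [0.4, 0.9, -1.1, -0.3, 0.2, 1.5]
x_p = NormalDist().inv_cdf(0.25)  # threshold for p = 0.25 on normal scores
points = experimental_indicator_semivariogram(sites, scores, x_p, 20.0, 80.0)
print(points)
print(fit_exponential([h for h, _, _ in points], [g for _, g, _ in points],
                      [float(r) for r in range(5, 201, 5)]))
```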
Theoretically, if X(u) and X(u + h) follow a bivariate normal distribution, the indicator semivariogram is [Goovaerts, 1997]:

$$\gamma_I(h; x_p) = p - \left[p^2 + \frac{1}{2\pi}\int_0^{\sin^{-1} C_X(h)}\exp\left(-\frac{x_p^2}{1 + \sin\theta}\right)d\theta\right] \tag{2.16}$$

where $C_X(h)$ denotes the covariance model of X, given as follows:

$$C_X(h) = \mathrm{Cov}\left(X(u), X(u+h)\right) \tag{2.17}$$

Because X has a standard normal marginal distribution, $C_X(h)$ is also the correlation coefficient between X(u) and X(u + h). The null hypothesis that X(u) and X(u + h) follow a bivariate normal distribution is not rejected if the experimental indicator semivariogram compares well with the theoretical indicator semivariogram.
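Equation 2.16 can be evaluated numerically as in the sketch below, which applies Simpson's rule to the integral and takes the correlation $C_X(h)$ as an input. The exponential correlation function shown is a hypothetical covariance model chosen only for illustration. The sketch also restricts $C_X(h)$ to [0, 1], which covers the usual decaying spatial correlation models.

```python
import math
from statistics import NormalDist


def theoretical_indicator_semivariogram(p, correlation, steps=200):
    """Eq. 2.16 for a standard bivariate normal pair with correlation C_X(h)."""
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("this sketch assumes 0 <= C_X(h) <= 1")
    x_p = NormalDist().inv_cdf(p)
    upper = math.asin(correlation)
    steps += steps % 2  # Simpson's rule needs an even number of intervals
    width = upper / steps

    def integrand(theta):
        return math.exp(-x_p ** 2 / (1 + math.sin(theta)))

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * width)
    integral = total * width / 3
    return p - (p ** 2 + integral / (2 * math.pi))


def exponential_correlation(h_km, range_km):
    """Hypothetical covariance model for unit-variance X: exp(-3h / range)."""
    return math.exp(-3 * h_km / range_km)


# Hypothetical 40 km range, illustrative only.
for h in (0.0, 10.0, 40.0, 100.0):
    rho = exponential_correlation(h, 40.0)
    print(h, round(theoretical_indicator_semivariogram(0.25, rho), 4))
```

At zero correlation the function returns p(1 − p), the sill later observed in Figures 2.5 and 2.6. At unit correlation it returns approximately zero.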
As mentioned earlier, univariate and bivariate normality are not sufficient conditions
for multivariate normality. For realistic data sets, however, the tests for trivariate normality
and normality at other higher dimensions are impractical. This is because, for example, the
trivariate normality test requires many triplets of data points that have the same geometric
configuration (in terms of the spatial orientation of the three points), which are usually
not available. Hence, in practice, if the sample statistics do not show a violation of the
univariate and bivariate normalities, a multivariate normal model can be assumed for X
[Goovaerts, 1997].
2.5.2 Results and discussion
Suppose the spatially-distributed normalized intra-event residuals (ε) follow a multivariate normal distribution. Equation 2.2 then shows that the logarithmic spectral accelerations, conditioned on the predicted median spectral accelerations, will be multivariate normal as well, because the inter-event residual at any particular period is constant across all sites during a single earthquake. Hence, in this section, normality tests are carried out on the normalized intra-event residuals (ε) only.

It has been shown previously that the ε values can be represented marginally by a normal distribution, so only the bivariate normality test results are considered in this section. A sample's univariate distribution can deviate from the normal distribution even if the population actually follows a univariate normal distribution. To keep such deviations from affecting the results of the bivariate normality test, the univariate distributions of ε are transformed to the standard normal space using the normal score transform. As mentioned earlier, the normal score transform of the univariate distribution does not change the dependence structure of the bivariate distributions, and hence does not interfere with the test of bivariate normality.

The procedure described by Goovaerts [1997] compares the theoretical and the experimental indicator semivariograms obtained from the ε values computed at various periods, for all quantiles $x_p$ (Equations 2.14 and 2.16). Such an exhaustive test is practically impossible, however, so a few sample periods and quantiles are tested here. Owing to the symmetry of the bivariate normal distribution, only values of p in the interval [0, 0.5] are needed. The authors present results for p = 0.1, 0.25 and 0.5, so as to cover this range. The periods chosen for the illustrations span the range of periods over which the ground-motion models are usually valid.
Figures 2.5a-c compare the theoretical and the experimental indicator semivariograms obtained using the Chi-Chi data set, with the ε values computed at a period of 2 seconds. All records that are usable at the chosen period can be part of the sample data used to obtain the experimental indicator semivariograms, unlike in Section 2.4, where the sample data had to be independent of each other. The theoretical and the experimental indicator semivariograms match reasonably well in all cases. Figure 2.5d compares the theoretical and the experimental indicator semivariograms (p = 0.25) for the ε values computed at T = 2 seconds from the Northridge earthquake data set, and a reasonable match can be seen there as well. Similar plots obtained using the Northridge and the Chi-Chi earthquake data sets are shown in Figure 2.6. In this figure, the value of p is kept constant at 0.25, while the value of T varies from 0.5 seconds to 5 seconds. These figures also show a reasonably good match between the theoretical and the experimental semivariograms. All these results suggest that bivariate normality can be safely assumed for spatially-distributed ε's. Incidentally, Figures 2.5 and 2.6 show that the sill of the indicator semivariograms equals p(1 − p), which is a consequence of the independence between well-separated intra-event residuals [Goovaerts, 1997].

[Figure 2.5 plot: four panels, (a) to (d), each showing γ_I(h; x_p) against distance (0 to 200 km); legend: experimental indicator semivariogram, theoretical indicator semivariogram.]

Figure 2.5: Theoretical and empirical semivariograms for residuals computed at 2 seconds: (a) results for the 0.1 quantile of the residuals from the Chi-Chi data; (b) results for the 0.25 quantile of the residuals from the Chi-Chi data; (c) results for the 0.5 quantile of the residuals from the Chi-Chi data; (d) results for the 0.25 quantile of the residuals from the Northridge data.
2.6 Conclusions
Statistical tests have been used to examine the assumption of joint normality of logarithmic spectral accelerations. Joint normality of logarithmic spectral accelerations was verified by testing the multivariate normality of inter-event and intra-event residuals. Univariate normality of inter-event and intra-event residuals was studied using normal Q-Q plots. The normal Q-Q plots showed strong linearity, indicating that the residuals are well represented marginally by a normal distribution. No evidence was found to support truncation of the marginal distribution of intra-event residuals, as is sometimes done in PSHA. The Henze-Zirkler test and Mardia's tests of skewness and kurtosis showed that the inter-event and the intra-event residuals at a site, computed at different periods, follow multivariate normal distributions. The bivariate normality test described by Goovaerts [1997] was used to illustrate that pairs of spatially-distributed intra-event residuals can be represented by the bivariate normal distribution. For a set of correlated, spatially-distributed data, it is practically impossible to ascertain trivariate normality and normality at higher dimensions. Hence, the presence of univariate and bivariate normality is considered to indicate multivariate normality of the spatially-distributed intra-event residuals [Goovaerts, 1997]. The results reported in this study are based on residuals computed using the ground-motion model of Campbell and Bozorgnia [2008], but similar results were obtained with the Boore and Atkinson [2008] ground-motion model. This study provides a sound statistical basis for the assumptions about the marginal and joint distributions of ground-motion parameters that must be made in a variety of seismic hazard calculations.
[Figure 2.6 plot: four panels, (a) to (d), each showing γ_I(h; x_p) against distance (0 to 200 km); legend: experimental indicator semivariogram, theoretical indicator semivariogram.]

Figure 2.6: Theoretical and empirical semivariograms for the 0.25 quantile of the residuals: (a) results for the residuals computed at 0.5 seconds from the Northridge data; (b) results for the residuals computed at 0.5 seconds from the Chi-Chi data; (c) results for the residuals computed at 1 second from the Chi-Chi earthquake data; (d) results for the residuals computed at 5 seconds from the Chi-Chi data.
2.7 Data source
The data for all the ground motions studied here came from the PEER NGA Database
[2005].
http://peer.berkeley.edu/nga (last accessed 18 May 2007).
2.8 Appendix: Normal score transform
The data sample can be transformed to have a standard normal distribution by a normal
score transform. The transformation involves equating the various quantiles of the data to
the corresponding quantiles of a standard normal distribution.
Let z represent the given data set, and let the empirical cumulative distribution function of the data be denoted by F(z). The F(z)-quantile of the standard normal distribution is given by $\Phi^{-1}(F(z))$, where Φ represents the standard normal cumulative distribution function. Hence, for a given $z_k$, the corresponding normal score value $y_k$ is computed as follows:

$$y_k = \Phi^{-1}\left(F(z_k)\right) \tag{2.18}$$
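The following is a minimal sketch of Equation 2.18, using the standard library only; the function name is hypothetical. The raw empirical CDF equals 1 at the largest value, where $\Phi^{-1}$ is infinite. The sketch therefore uses the adjustment F(z_k) = (rank − 0.5)/n, an assumption that is not necessarily the convention of the original procedure.

```python
from statistics import NormalDist


def normal_score_transform(values):
    """Map each value to Phi^{-1}(F(z_k)), with F(z_k) taken as (rank - 0.5) / n.

    The (rank - 0.5) / n plotting position is an assumption of this sketch: the
    raw empirical CDF reaches 1 at the largest value, where Phi^{-1} is infinite.
    Ties are not treated specially; they receive consecutive ranks in input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, index in enumerate(order, start=1):
        scores[index] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores


# Hypothetical inter-event residuals (not study data).
print([round(y, 3) for y in normal_score_transform([0.21, -0.35, 0.02, 0.48, -0.10])])
```

The transform preserves the ordering of the data, so it changes only the marginal distribution and leaves the ranks, and hence the dependence structure, unchanged.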
Table 2.2: Tests on inter-event residuals computed at different periods

Case  Periods (secs)                            PHZ    PSK    PKT
1     T = {1.0, 2.0}                            0.85   0.20   0.35
2     T = {0.5, 0.75, 1.0, 1.5, 2.0}            0.00   0.01   0.01
3     T = {0.5, 0.75, 1.0, 1.5, 2.0}; Norm.(a)  0.24   0.11   0.11
4     T = {0.5, 1.0, 2.0, 5.0, 7.5}             0.79   0.28   0.41
5     T = {5.0, 7.5, 10.0}                      0.68   0.18   0.31

Explanation of abbreviations used in the table: (a) Norm.: data transformed to the standard normal space.
Table 2.3: Tests on residuals corresponding to two orthogonal directions (fault-normal and fault-parallel directions)

Case  Periods (secs)        PHZ    PSK    PKT
1     T1 = 2; T2 = 2        0.14   0.13   0.41
2     T1 = 1; T2 = 2        0.17   0.34   0.96
3     T1 = 0.5; T2 = 10     0.94   0.80   0.22
Chapter 3
Correlation model for spatially-distributed ground-motion intensities
N. Jayaram and J.W. Baker (2009). Correlation model for spatially-distributed ground-
motion intensities, Earthquake Engineering and Structural Dynamics, 38(15), 1687-1708.
3.1 Abstract
Risk assessment of spatially-distributed building portfolios or infrastructure systems requires quantification of the joint occurrence of ground-motion intensities at several sites, during the same earthquake. The ground-motion models that are used for site-specific hazard analysis do not provide information on the spatial correlation between ground-motion intensities, which is required for the joint prediction of intensities at multiple sites. Moreover, researchers who have previously computed these correlations using observed ground-motion recordings differ in their estimates of spatial correlation. In this chapter, ground motions observed during seven past earthquakes are used to estimate correlations between spatially-distributed spectral accelerations at various spectral periods. Geostatistical tools are used to quantify and express the observed correlations in a standard format. The estimated correlation model is also compared to previously published results, and apparent discrepancies among the previous results are explained.
CHAPTER 3. SPATIAL CORRELATION MODEL 49
The analysis shows that the spatial correlation decreases with increasing separation between the sites of interest. The rate of decay of correlation typically decreases with increasing spectral acceleration period. At periods longer than 2 seconds, the correlations were similar for all the earthquake ground motions considered. At shorter periods, however, the correlations were found to be related to the local-site conditions (as indicated by site Vs30 values) at the ground-motion recording stations. This work also investigates the assumption of isotropy used in developing the spatial correlation models. Northridge and Chi-Chi earthquake time histories show that the isotropy assumption is reasonable at both long and short periods. Based on the factors identified as influencing the spatial correlation, a model is developed that can be used to select appropriate correlation estimates for use in practical risk assessment problems.
3.2 Introduction
The probabilistic assessment of ground-motion intensity measures (such as spectral acceleration) at an individual site is a well-researched topic. Several ground-motion models have been developed to predict median ground-motion intensities as well as the dispersion about the median values [e.g., Boore and Atkinson, 2008, Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]. Site-specific hazard analysis does not suffice, however, in many applications that require knowledge about the joint occurrence of ground-motion intensities at several sites during the same earthquake. For instance, the risk assessment of portfolios of buildings or of spatially-distributed infrastructure systems (such as transportation networks, oil and water pipeline networks and power systems) requires prediction of ground-motion intensities at multiple sites. Such joint predictions are possible, however, only if the correlations between ground-motion intensities at different sites are known [e.g., Lee and Kiremidjian, 2007, Bazzurro and Luco, 2004]. The correlation is known to be large when the sites are close to one another, and to decay with increasing separation between the sites. Park et al. [2007] report that ignoring or underestimating these correlations overestimates frequent losses and underestimates rare ones; hence, it is important that accurate ground-motion correlation models be developed for loss assessment purposes. The current work analyzes correlations between the ground-motion intensities observed in recorded ground motions, in order to identify factors that affect these correlations, and to select a correlation model that can be used for the joint prediction of spatially-distributed ground-motion intensities in future earthquakes.
Ground-motion models that predict intensities at an individual site $i$ due to an earthquake $j$ take the following form:

$$\ln(Y_{ij}) = \ln(\overline{Y}_{ij}) + \varepsilon_{ij} + \eta_j \qquad (3.1)$$

where $Y_{ij}$ denotes the ground-motion parameter of interest (e.g., $Sa(T)$, the spectral acceleration at period $T$); $\overline{Y}_{ij}$ denotes the median ground-motion intensity predicted by the ground-motion model (which depends on parameters such as magnitude, distance, period and local-site conditions); $\varepsilon_{ij}$ denotes the intra-event residual, which is a random variable with zero mean and standard deviation $\sigma_{ij}$; and $\eta_j$ denotes the inter-event residual, which is a random variable with zero mean and standard deviation $\tau_j$. The standard deviations, $\sigma_{ij}$ and $\tau_j$, are estimated as part of the ground-motion model and are functions of the spectral period of interest, and in some models also of the earthquake magnitude and the distance of the site from the rupture. During an earthquake, the inter-event residual ($\eta_j$) computed at any particular period is constant across all the sites.
Chapter 2 [Jayaram and Baker, 2008] showed that a vector of spatially-distributed intra-event residuals $\boldsymbol{\varepsilon}_j = (\varepsilon_{1j}, \varepsilon_{2j}, \ldots, \varepsilon_{dj})$ follows a multivariate normal distribution. Hence, the distribution of $\boldsymbol{\varepsilon}_j$ can be completely defined using its first two moments, namely, the mean and variance of $\boldsymbol{\varepsilon}_j$, and the correlation between all $\varepsilon_{i_1 j}$ and $\varepsilon_{i_2 j}$ pairs. (Alternatively, the distribution can be defined using the mean and the covariance of $\boldsymbol{\varepsilon}_j$, since the covariance completely specifies the variances and correlations.) Since the intra-event residuals are zero-mean random variables, the mean of $\boldsymbol{\varepsilon}_j$ is the zero vector of dimension $d$. The covariance, however, is not entirely known from the ground-motion models, since the models only provide the variances of the residuals, and not the correlation between residuals at two different sites.
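In matrix form, and assuming a correlation coefficient $\rho_{i_1 i_2}$ were available for every pair of sites, the covariance matrix of $\boldsymbol{\varepsilon}_j$ would combine the standard deviations supplied by the ground-motion model with those correlations. The symbols $\Sigma_j$, $D_j$ and $R_j$ are introduced here only for illustration; $R_j$ is exactly the piece the ground-motion models do not provide:

$$\Sigma_j = D_j\, R_j\, D_j, \qquad D_j = \operatorname{diag}(\sigma_{1j}, \ldots, \sigma_{dj}), \qquad (R_j)_{i_1 i_2} = \rho_{i_1 i_2}, \qquad (\Sigma_j)_{i_1 i_2} = \rho_{i_1 i_2}\,\sigma_{i_1 j}\,\sigma_{i_2 j}$$

Since $\rho_{ii} = 1$, the diagonal of $\Sigma_j$ recovers the variances $\sigma_{ij}^2$ given by the model.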
Researchers have previously computed these correlations using ground-motion time histories recorded during earthquakes [Goda and Hong, 2008, Wang and Takada, 2005, Boore et al., 2003]. Boore et al. [2003] used observations of peak ground acceleration (PGA, which equals Sa(0)) from the 1994 Northridge earthquake to compute the spatial correlations. Wang and Takada [2005] computed the correlations using observations of peak ground velocities (PGV) from several earthquakes in Japan and the 1999 Chi-Chi earthquake. Goda and Hong [2008] used the Northridge and Chi-Chi earthquake ground-motion records to compute the correlation between PGA residuals, as well as the correlation between residuals computed from spectral accelerations at three periods between 0.3 seconds and 3 seconds. The results reported by these studies, however, differ in terms of the rate of decay of correlation with separation distance. For instance, while Boore et al. [2003] report that the correlation drops to zero at a site separation distance of approximately 10 km, the non-zero correlations observed by Wang and Takada [2005] extend past 100 km. Further, Goda and Hong [2008] observe differences between the correlation decay rate estimated using the Northridge earthquake records and that based on the Chi-Chi earthquake records. To date, no explanation for these differences has been identified.
The current work uses observed ground motions to estimate correlations between spectral accelerations at the same period. (Appendix A describes the estimation of cross-correlations between spectral accelerations at two different periods.) Factors that affect the rate of decay in the correlation with separation distance are identified. The work also provides probable explanations for the differing results reported in the literature. In this study, an emphasis is placed on developing a standard correlation model that can be used for predicting spatially-distributed ground-motion intensities for risk assessment purposes.
3.3 Modeling correlations using semivariograms
Geostatistical tools are widely used in several fields for modeling spatially-distributed random vectors (also called random functions) [Deutsch and Journel, 1998, Goovaerts, 1997]. The current research work takes advantage of this well-developed approach to model the correlation between spatially-distributed ground-motion intensities. The needed tools are briefly described in this section.
Let $Z = (Z_{\mathbf{u}_1}, Z_{\mathbf{u}_2}, \ldots, Z_{\mathbf{u}_d})$ denote a spatially-distributed random function, where $\mathbf{u}_i$ denotes the location of site $i$; $Z_{\mathbf{u}_i}$ is the random variable of interest (in this case, $\varepsilon_{\mathbf{u}_i j}$ from equation 3.1) at site location $\mathbf{u}_i$; and $d$ denotes the total number of sites. The correlation structure of the random function $Z$ can be represented by a semivariogram, which is a measure of the average dissimilarity between the data [Goovaerts, 1997]. Let $\mathbf{u}$ and $\mathbf{u}'$ denote two sites separated by $\mathbf{h}$. The semivariogram, $\gamma(\mathbf{u}, \mathbf{u}')$, is computed as half the expected squared difference between $Z_{\mathbf{u}}$ and $Z_{\mathbf{u}'}$:

$$\gamma(\mathbf{u}, \mathbf{u}') = \frac{1}{2} E\left[\{Z_{\mathbf{u}} - Z_{\mathbf{u}'}\}^2\right] \qquad (3.2)$$
The semivariogram defined in equation 3.2 is location-dependent, and its inference requires repeated realizations of $Z$ at locations $\mathbf{u}$ and $\mathbf{u}'$. Such repeated measurements of $\{Z_{\mathbf{u}}, Z_{\mathbf{u}'}\}$ are, however, never available in practice (e.g., in the current application, one would need repeated observations of ground motions at every pair of sites of interest). Hence, it is typically assumed that the semivariogram does not depend on the site locations $\mathbf{u}$ and $\mathbf{u}'$, but only on their separation $\mathbf{h}$. The stationary semivariogram, $\gamma(\mathbf{h})$, can then be obtained as follows:

$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\{Z_{\mathbf{u}} - Z_{\mathbf{u}+\mathbf{h}}\}^2\right] \qquad (3.3)$$
Equation 3.2 can be replaced with equation 3.3 if the random function $Z$ is second-order stationary. Second-order stationarity implies that (i) the expected value of the random variable $Z_{\mathbf{u}}$ is constant across space and (ii) the two-point statistics (measures that depend on $Z_{\mathbf{u}}$ and $Z_{\mathbf{u}'}$) depend only on the separation between $\mathbf{u}$ and $\mathbf{u}'$, and not on the actual locations (i.e., the statistics depend on the separation vector $\mathbf{h}$ between $\mathbf{u}$ and $\mathbf{u}'$ and not on $\mathbf{u}$ and $\mathbf{u}'$ as such). A stationary semivariogram can be estimated from a data set as follows:

$$\hat{\gamma}(\mathbf{h}) = \frac{1}{2N(\mathbf{h})} \sum_{\alpha=1}^{N(\mathbf{h})} \{z_{\mathbf{u}_\alpha} - z_{\mathbf{u}_\alpha+\mathbf{h}}\}^2 \qquad (3.4)$$

where $\hat{\gamma}(\mathbf{h})$ is the experimental stationary semivariogram (estimated from a data set); $z_{\mathbf{u}}$ denotes the data value at location $\mathbf{u}$; $N(\mathbf{h})$ denotes the number of pairs of sites separated by $\mathbf{h}$; and $\{z_{\mathbf{u}_\alpha}, z_{\mathbf{u}_\alpha+\mathbf{h}}\}$ denotes the $\alpha$th such pair. A stationary semivariogram is said to be isotropic if it is a function of the separation distance ($h = \|\mathbf{h}\|$) rather than of the separation vector $\mathbf{h}$.
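Equation 3.4 can be sketched directly for the simplest hypothetical layout: sites equally spaced along a line, where the pairs separated by exactly $\mathbf{h}$ are the entries a fixed number of positions apart. The function name and the data are illustrative only; irregularly spaced sites need the distance binning described in section 3.4.1.

```python
def experimental_semivariogram_1d(z, lag):
    """Estimate gamma_hat(h) for h = lag * spacing on an evenly spaced transect.

    Implements equation 3.4: half the average squared difference over the
    N(h) pairs (z[a], z[a + lag]). Hypothetical setting: equally spaced sites.
    """
    pairs = [(z[a], z[a + lag]) for a in range(len(z) - lag)]
    if not pairs:
        raise ValueError("no site pairs at this lag")
    return sum((za - zb) ** 2 for za, zb in pairs) / (2 * len(pairs))


z = [0.3, -0.1, 0.4, 1.2, 0.8]  # hypothetical residuals at equally spaced sites
print(experimental_semivariogram_1d(z, 1), experimental_semivariogram_1d(z, 2))
```

Here the lag-1 estimate uses four pairs and the lag-2 estimate three, which illustrates how $N(\mathbf{h})$ shrinks, and the estimate becomes noisier, at larger separations.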
The function $\hat{\gamma}(\mathbf{h})$ provides a set of experimental values for a finite number of separation vectors $\mathbf{h}$. A continuous function must be fitted to these experimental values in order to deduce semivariogram values for any possible separation $\mathbf{h}$. A valid (permissible) semivariogram function needs to be conditionally negative definite so that the variances and conditional variances corresponding to this semivariogram are non-negative. In order to satisfy this condition, the semivariogram functions are usually chosen to be linear combinations, with non-negative coefficients, of basic models that are known to be permissible. These include the exponential model, the Gaussian model, the spherical model and the nugget effect model.

Figure 3.1: (a) Parameters of a semivariogram; (b) semivariograms fitted to the same data set using the manual approach and the method of least squares.
The exponential model, in an isotropic case (i.e., the vector distance h is replaced by a
scalar separation length ‖h‖, also denoted as h), is expressed as follows:
$$\gamma(h) = a\left[1 - \exp(-3h/b)\right] \qquad (3.5)$$

where $a$ and $b$ are the sill and the range of the semivariogram function, respectively (Figure 3.1a). The sill of a semivariogram equals the variance of $Z_{\mathbf{u}}$, while the range of the exponential semivariogram is defined as the separation distance $h$ at which $\gamma(h)$ equals 0.95 times the sill (at $h = b$, equation 3.5 gives $\gamma(b) = a(1 - e^{-3}) \approx 0.95a$).
The Gaussian model is as follows:

$$\gamma(h) = a\left[1 - \exp(-3h^2/b^2)\right] \qquad (3.6)$$

The sill and the range of a Gaussian semivariogram are as defined for an exponential semivariogram.
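The spherical model is as follows:

$$\gamma(h) = \begin{cases} a\left[\dfrac{3}{2}\left(\dfrac{h}{b}\right) - \dfrac{1}{2}\left(\dfrac{h}{b}\right)^3\right] & \text{if } h \le b \\ a & \text{otherwise} \end{cases} \qquad (3.7)$$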
where a and b are again the sill and range of the semivariogram, respectively. The range of
a spherical semivariogram is the separation distance at which γ(h) equals a.
The nugget effect model can be described as:

$$\gamma(h) = a\, I(h > 0) \qquad (3.8)$$

where $I(h > 0)$ is an indicator variable that equals 1 when $h > 0$ and equals 0 otherwise.
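The four basic models in equations 3.5-3.8 translate directly into code. The sketch below is illustrative (the function names are hypothetical) and evaluates each model for an isotropic separation distance h given in the same units as the range b.

```python
import math

# Basic permissible semivariogram models (equations 3.5-3.8) for an isotropic
# separation distance h >= 0; a is the sill and b the range.


def exponential(h, a, b):
    return a * (1.0 - math.exp(-3.0 * h / b))          # equation 3.5


def gaussian(h, a, b):
    return a * (1.0 - math.exp(-3.0 * h ** 2 / b ** 2))  # equation 3.6


def spherical(h, a, b):
    if h <= b:                                           # equation 3.7
        r = h / b
        return a * (1.5 * r - 0.5 * r ** 3)
    return a


def nugget(h, a):
    return a if h > 0 else 0.0                           # equation 3.8


for h in (0.0, 10.0, 20.0, 40.0, 60.0):  # hypothetical distances, sill 1, range 40
    print(h, exponential(h, 1.0, 40.0), gaussian(h, 1.0, 40.0), spherical(h, 1.0, 40.0))
```

At $h = b$ the exponential and Gaussian models both reach $1 - e^{-3} \approx 0.95$ of the sill, while the spherical model reaches the full sill, matching the range definitions given above.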
The covariance structure of $Z$ is completely specified by the functional form of the semivariogram together with its sill and range. It can be theoretically shown that the following relationship holds [Goovaerts, 1997]:

$$\gamma(h) = a\,(1 - \rho(h)) \qquad (3.9)$$

where $\rho(h)$ denotes the correlation coefficient between $Z_{\mathbf{u}}$ and $Z_{\mathbf{u}+\mathbf{h}}$, and the sill $a$ equals the variance of $Z_{\mathbf{u}}$. Therefore, it suffices to estimate the semivariogram of a random function in order to determine its covariance structure. Moreover, equations 3.5 and 3.9 (for instance) show that a large range implies a small rate of increase in $\gamma(h)$ and therefore large correlations between $Z_{\mathbf{u}}$ and $Z_{\mathbf{u}+\mathbf{h}}$. Further, equation 3.8 shows that the nugget effect model specifies zero correlation for all non-zero separation distances.
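As a quick check of equation 3.9 and of the link between range and correlation, the sketch below converts an exponential semivariogram value into a correlation coefficient. The function names are hypothetical, and the sill and ranges are chosen only for illustration. At a 10 km separation the correlation is about 0.05 for a 10 km range and about 0.47 for a 40 km range, and the sill cancels out.

```python
import math


def correlation_from_semivariogram(gamma_h, sill):
    """Equation 3.9 rearranged: rho(h) = 1 - gamma(h) / a."""
    return 1.0 - gamma_h / sill


def exponential_semivariogram(h, a, b):
    """Equation 3.5: sill a, range b."""
    return a * (1.0 - math.exp(-3.0 * h / b))


# Same 10 km separation, two hypothetical ranges: the larger range gives a
# slower rise in gamma(h) and therefore a larger correlation.
for b in (10.0, 40.0):
    gamma = exponential_semivariogram(10.0, 2.0, b)
    rho = correlation_from_semivariogram(gamma, 2.0)
    print(f"range {b:.0f} km: rho(10 km) = {rho:.3f}")
```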
In the current work, correlations between ground-motion intensities at different sites are represented using semivariograms. Ground-motion recordings from past earthquakes are used to estimate ranges of semivariograms and to identify the factors that could affect the estimates. Throughout this work, the semivariograms are assumed to be second-order stationary, so that the data available over the entire region of interest can be pooled and used for estimating semivariogram sills and ranges. As in many other works involving spatial-correlation estimation, the semivariograms are also assumed to be isotropic. The assumptions of stationarity and isotropy are investigated in more detail later in this chapter.
3.4 Computation of semivariogram ranges for intra-event residuals using empirical data
As mentioned earlier, the covariance of intra-event residuals can be represented using a
semivariogram, whose functional form (e.g., exponential model), sill and range need to
be determined. This section discusses the semivariograms estimated based on observed
ground-motion time histories.
For a given earthquake, it can be seen from equation 3.1 that

$$\varepsilon_i + \eta = \ln(Y_i) - \ln(\overline{Y}_i) \qquad (3.10)$$
Let $\tilde{\varepsilon}_i$ denote the normalized intra-event residual at site $i$. (The subscript $j$ in equation 3.1 is no longer used, since the residuals used in these calculations are observed during a single earthquake.) $\tilde{\varepsilon}_i$ is computed as follows:

$$\tilde{\varepsilon}_i = \frac{\varepsilon_i}{\sigma_i} \qquad (3.11)$$

where $\sigma_i$ denotes the standard deviation of the intra-event residuals at site $i$. Further, let $\varepsilon^{*}_i$ denote the sum of the intra-event residual ($\varepsilon_i$) and the inter-event residual ($\eta$), normalized by the standard deviation of the intra-event residual ($\sigma_i$). $\varepsilon^{*}_i$ can be computed as follows:

$$\varepsilon^{*}_i = \frac{\varepsilon_i + \eta}{\sigma_i} = \frac{\ln(Y_i) - \ln(\overline{Y}_i)}{\sigma_i} \qquad (3.12)$$
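The normalized total residual on the left-hand side of equation 3.12 needs only quantities that are available after an earthquake: the recorded intensities, the model medians and the model's intra-event standard deviations. Equation 3.11, by contrast, requires the unobserved split of the total residual into the intra-event and inter-event parts. The sketch below computes the equation 3.12 values; the function name and inputs are hypothetical, and no particular ground-motion model is implied.

```python
import math


def normalized_total_residuals(observed, predicted_median, sigma):
    """Equation 3.12: (ln Y_i - ln Ybar_i) / sigma_i for each site i.

    observed: recorded intensities Y_i (e.g., Sa in g).
    predicted_median: ground-motion-model medians Ybar_i at the same sites.
    sigma: the model's intra-event standard deviations sigma_i.
    """
    if not (len(observed) == len(predicted_median) == len(sigma)):
        raise ValueError("inputs must have one entry per site")
    return [
        (math.log(y) - math.log(ybar)) / s
        for y, ybar, s in zip(observed, predicted_median, sigma)
    ]


# Hypothetical two-site example: one record above and one below the median.
print(normalized_total_residuals([0.30, 0.12], [0.20, 0.15], [0.6, 0.6]))
```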
While assessing covariances, it is convenient to work with $\tilde{\varepsilon}$'s rather than $\varepsilon$'s, since the $\tilde{\varepsilon}$'s are homoscedastic (i.e., they have constant variance), with unit variance, unlike the $\varepsilon$'s.
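Since the inter-event residual ($\eta$), computed at any particular period, is constant across all the sites during a given earthquake, the experimental semivariogram of $\tilde{\varepsilon}$ can be obtained as follows (based on equation 3.4):

$$\begin{aligned}
\hat{\gamma}(h) &= \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \left[\tilde{\varepsilon}_{u_\alpha} - \tilde{\varepsilon}_{u_\alpha+h}\right]^2 \\
&= \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \left[\frac{\ln(Y_{u_\alpha}) - \ln(\overline{Y}_{u_\alpha}) - \eta}{\sigma_{u_\alpha}} - \frac{\ln(Y_{u_\alpha+h}) - \ln(\overline{Y}_{u_\alpha+h}) - \eta}{\sigma_{u_\alpha+h}}\right]^2 \\
&\approx \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \left[\frac{\ln(Y_{u_\alpha}) - \ln(\overline{Y}_{u_\alpha})}{\sigma_{u_\alpha}} - \frac{\ln(Y_{u_\alpha+h}) - \ln(\overline{Y}_{u_\alpha+h})}{\sigma_{u_\alpha+h}}\right]^2 \\
&= \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \left[\varepsilon^{*}_{u_\alpha} - \varepsilon^{*}_{u_\alpha+h}\right]^2
\end{aligned} \qquad (3.13)$$

where $\varepsilon^{*}$ is defined by equation 3.12; $(u_\alpha, u_\alpha + h)$ denotes the locations of a pair of sites separated by $h$; $N(h)$ denotes the number of such pairs; $Y_{u_\alpha}$ denotes the ground-motion intensity at location $u_\alpha$; and $\sigma_{u_\alpha}$ is the standard deviation of the intra-event residual at location $u_\alpha$. The sill of the semivariogram of $\tilde{\varepsilon}$ (i.e., the sill of $\hat{\gamma}(h)$) should equal 1, since the $\tilde{\varepsilon}$'s have unit variance. Hence, based on equation 3.9, it can be concluded that:

$$\hat{\gamma}(h) = 1 - \hat{\rho}(h) \qquad (3.14)$$

where $\hat{\rho}(h)$ is the estimate of $\rho(h)$.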
Equation 3.13 thus shows that the covariances of intra-event residuals can be estimated without having to account for the inter-event residual $\eta$. As indicated, equation 3.13 involves an approximation, due to the mild assumption that $\eta/\sigma_{u_\alpha} = \eta/\sigma_{u_\alpha+h}$. The Boore and Atkinson [2008] model, which is used in the current work, suggests that the standard deviation of the intra-event residuals depends only on the period at which the residuals are computed. Under that model the two $\sigma$'s in each pair are equal and the $\eta$ terms cancel, so it can be inferred that this approximation is reasonable. Although the current work only uses the Boore and Atkinson [2008] ground-motion model, the results obtained were found to be similar when an alternate model, namely the Chiou and Youngs [2008] model, was used.
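The cancellation behind equation 3.13 can be checked numerically. Assuming, as in the Boore and Atkinson [2008] model, that the intra-event standard deviation is the same at every site, adding a common inter-event residual before normalizing leaves every pairwise difference, and hence the experimental semivariogram, unchanged. The values of sigma and eta, the random residuals and the site pairs below are all hypothetical.

```python
import random


def semivariogram_of_pairs(values, pairs):
    """Half the mean squared difference over the listed site pairs (eq. 3.4)."""
    return sum((values[a] - values[b]) ** 2 for a, b in pairs) / (2 * len(pairs))


random.seed(1)
sigma = 0.6   # hypothetical intra-event sigma, identical at every site
eta = 0.35    # hypothetical inter-event residual shared by all sites
eps = [random.gauss(0.0, sigma) for _ in range(6)]  # intra-event residuals
pairs = [(0, 1), (1, 2), (2, 4), (3, 5)]            # hypothetical pairs in one distance bin

eps_tilde = [e / sigma for e in eps]          # equation 3.11: not observable alone
eps_star = [(e + eta) / sigma for e in eps]   # equation 3.12: computable from records

gamma_tilde = semivariogram_of_pairs(eps_tilde, pairs)
gamma_star = semivariogram_of_pairs(eps_star, pairs)
print(gamma_tilde, gamma_star)  # equal up to floating-point rounding
```

If sigma differed between the two sites of a pair, the eta terms would no longer cancel exactly, which is the source of the approximation in equation 3.13.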
Ground-motion databases typically report recordings in two orthogonal horizontal directions. For instance, the PEER NGA database [Chiou et al., 2008] provides the fault-normal and the fault-parallel components of the ground motions for each earthquake. In the current work, the correlations computed using the fault-normal and the fault-parallel time histories were found to be similar. Hence, only results corresponding to the fault-normal orientation are reported here. Indeed, Baker and Jayaram [2009] and Bazzurro et al. [2008] used several sets of recorded and simulated ground motions to show that the estimated correlations are independent of the ground-motion component used.
3.4.1 Construction of experimental semivariograms using empirical data
Figure 3.1a shows a sample semivariogram constructed from empirical data. The first step in obtaining such a semivariogram is to compute site-to-site distances for all pairs of sites and place them in different bins based on the separation distances. For example, the bins could be centered at multiples of $h$ km with bin widths of $\delta h$ km ($\delta h \le h$). All pairs of sites that fall in the bin centered at $h$ km (i.e., the sites that are separated by a distance in the interval $(h - \delta h/2,\, h + \delta h/2)$) are used to compute $\hat{\gamma}(h)$ based on equation 3.4. If $\delta h$ is chosen to be very small, few pairs of sites may fall in each bin, which affects the robustness of the results obtained. On the other hand, a large value of $\delta h$ mixes site pairs with differing distances, reducing the resolution of the experimental semivariograms. In the current work, experimental semivariograms are obtained using $\delta h = 2$ km (unless stated otherwise), since this was seen to be the smallest value that results in a reasonable number of site pairs in the bins.
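The binning procedure above can be sketched as follows. This is a minimal illustration, not the procedure used to produce the results in this chapter: it assumes site coordinates already projected to a plane in km, bins centered at multiples of the bin width (so the bin spacing equals $\delta h$), and it discards pairs closer than half a bin width. The function name is hypothetical.

```python
import math
from collections import defaultdict


def binned_semivariogram(coords_km, values, bin_width=2.0, max_dist=None):
    """Experimental isotropic semivariogram with distance bins (eq. 3.4).

    coords_km: list of (x, y) site coordinates in km on a planar projection
    (an assumption; geographic coordinates would first need projecting).
    Bins are centered at k * bin_width (k >= 1) and are bin_width wide.
    Returns {bin center in km: (gamma_hat, number of pairs)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for k in range(i + 1, n):
            d = math.dist(coords_km[i], coords_km[k])
            if max_dist is not None and d > max_dist:
                continue
            center = round(d / bin_width)  # index of the nearest bin center
            if center == 0:
                continue  # pairs closer than bin_width / 2 are not used here
            sums[center] += (values[i] - values[k]) ** 2
            counts[center] += 1
    return {
        c * bin_width: (sums[c] / (2 * counts[c]), counts[c])
        for c in sorted(counts)
    }


# Hypothetical sites on a line, 2 km apart, with hypothetical residuals.
print(binned_semivariogram([(0, 0), (2, 0), (4, 0)], [0.0, 1.0, 3.0]))
```

Reporting the pair count alongside each estimate makes it easy to spot bins that are too sparse to be trusted, which is the trade-off the text describes when choosing $\delta h$.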
The semivariogram shown in Figure 3.1a has an exponential form with a sill of 1 and a range of 40 km. This model can be expressed as follows (based on equation 3.5):

$$\gamma(h) = 1 - \exp(-3h/40) \qquad (3.15)$$

The correlation function corresponding to this model is $\rho(h) = 1 - \gamma(h) = \exp(-3h/40)$ (based on equation 3.14).
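To show how such a correlation function could be used to generate spatially correlated intensities, the sketch below builds the correlation matrix from $\rho(h) = \exp(-3h/40)$ for a set of hypothetical site coordinates (in km) and draws unit-variance normalized residual fields through a Cholesky factorization, a standard way of sampling a multivariate normal vector. It assumes NumPy is available, and the function name is hypothetical. Multiplying each site's draw by its intra-event standard deviation would give intra-event residuals.

```python
import numpy as np


def simulate_normalized_residuals(coords_km, range_km=40.0, n_sims=1, seed=0):
    """Draw unit-variance residual fields with rho(h) = exp(-3 h / range_km).

    coords_km: (n_sites, 2) array-like of planar coordinates in km.
    Returns (samples of shape (n_sites, n_sims), correlation matrix).
    """
    coords = np.asarray(coords_km, dtype=float)
    # Pairwise separation distances h between all sites.
    diff = coords[:, None, :] - coords[None, :, :]
    h = np.linalg.norm(diff, axis=-1)
    corr = np.exp(-3.0 * h / range_km)      # correlation model of equation 3.15
    chol = np.linalg.cholesky(corr)         # lower-triangular L with corr = L @ L.T
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((len(coords), n_sims))  # independent N(0, 1) draws
    return chol @ z, corr


sites = [(0.0, 0.0), (10.0, 0.0), (80.0, 0.0)]   # hypothetical sites, km
samples, corr = simulate_normalized_residuals(sites, n_sims=5)
print(np.round(corr, 3))
```

With these sites the 10 km pair has a correlation of about 0.47, while the 80 km pairs are nearly uncorrelated, consistent with the range of 40 km.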
An easy and transparent method to determine the model and the model parameters is to fit the experimental semivariogram values obtained at discrete separation distances manually. Suppose that $\gamma(h)$ can be expressed as follows:

$$\gamma(h) = c_0\,\gamma_0(h) + \sum_{n=1}^{N} c_n\,\gamma_n(h) \qquad (3.16)$$

where $\gamma_0(h)$ is a pure nugget effect and $\gamma_n(h)$ is a spherical, exponential or Gaussian model (as defined in equations 3.5-3.8); $c_n$ is the contribution of model $n$ to the semivariogram; and $N$ is the total number of models used (excluding the nugget effect). The ranges and the contributions of the models can be systematically varied to obtain the best fit to the experimental semivariogram values.
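The systematic variation described above can be sketched in code. The example below is an automated stand-in for manual fitting, not the author's procedure. It evaluates the nested model of equation 3.16 (basic models with unit sill, scaled by their contributions $c_n$) and runs a grid search over the range of a single exponential component with unit sill and no nugget. Only distances up to a chosen cutoff are scored, so that short separations dominate. The function names, candidate ranges, cutoff and data are hypothetical.

```python
import math


def nested_semivariogram(h, c0, components):
    """Equation 3.16: c0 * nugget + sum of c_n * unit-sill basic models.

    components: list of (c_n, model_name, range_n) with model_name one of
    'exponential', 'gaussian' or 'spherical'.
    """
    total = c0 if h > 0 else 0.0
    for c_n, name, b in components:
        if name == "exponential":
            g = 1.0 - math.exp(-3.0 * h / b)
        elif name == "gaussian":
            g = 1.0 - math.exp(-3.0 * h ** 2 / b ** 2)
        elif name == "spherical":
            g = (1.5 * (h / b) - 0.5 * (h / b) ** 3) if h <= b else 1.0
        else:
            raise ValueError(f"unknown model: {name}")
        total += c_n * g
    return total


def grid_search_exponential_range(experimental, ranges_km, max_fit_dist_km):
    """Return the candidate range whose unit-sill exponential model best
    matches experimental {h: gamma_hat} values at h <= max_fit_dist_km."""
    def misfit(b):
        return sum(
            (g - nested_semivariogram(h, 0.0, [(1.0, "exponential", b)])) ** 2
            for h, g in experimental.items()
            if h <= max_fit_dist_km
        )
    return min(ranges_km, key=misfit)


# Hypothetical data: exact 40 km exponential values out to 30 km, plus two
# poorly fitting long-distance points that the cutoff deliberately ignores.
data = {float(h): 1.0 - math.exp(-3.0 * h / 40.0) for h in range(2, 32, 2)}
data.update({60.0: 0.70, 80.0: 0.75})
print(grid_search_exponential_range(data, [float(b) for b in range(10, 82, 2)], 30.0))
```

Restricting the misfit to short distances mirrors the priority stated in the next paragraph; a full least-squares fit over all bins would instead let the long-distance points pull the range.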
In the following sections, priority is placed on building models that fit the empirical data well at short distances, even if this requires some misfit with empirical data at large separation distances. It is more important to model the semivariogram structure well at short separation distances, because large separation distances are associated with low correlations, which have relatively little effect on joint distributions of ground-motion intensities. In addition to having low correlation, widely separated sites also have little impact on each other due to an effective 'screening' of their influence by more closely-located sites [Goovaerts, 1997]. (Note that, according to Goovaerts [1997], the influence of farther-away points will not be completely screened when there are fewer than 10 closely spaced points. In such cases, the correlation model developed in this study might provide slightly inaccurate correlation estimates, although the low correlations at large separations should limit the effect on the joint distributions.) Figure 3.1b shows sample semivariograms fitted to a data set using the manual approach and the method of least squares. At small separations, the manually-fitted semivariogram is a better model than the one fitted using the method of least squares. A more detailed discussion of the advantages of manual fitting over least-squares fitting follows in section 3.6, where the proposed approach is also compared to approaches used in previous research on this topic.
Figure 3.2: Range of semivariograms of $\tilde{\varepsilon}$, as a function of the period at which the $\tilde{\varepsilon}$ values are computed: (a) the residuals are obtained using the Northridge earthquake data; (b) the residuals are obtained using the Chi-Chi earthquake data.
3.4.2 1994 Northridge earthquake recordings
This section discusses the ranges of semivariograms estimated using observed Northridge earthquake ground motions. The manual fitting approach described previously is used to compute ranges of the semivariograms of $\tilde{\varepsilon}$'s (obtained from the Northridge earthquake time histories) computed at seven periods ranging between 0 seconds and 10 seconds. Of the three functional forms considered (equations 3.5-3.7), the exponential model is found to provide the 'best fit' (particularly at small separations) for experimental semivariograms obtained using $\tilde{\varepsilon}$'s computed at several different periods, based on recordings from different earthquakes. The use of the same functional form across periods makes it simpler to specify a standard correlation model for the $\tilde{\varepsilon}$'s. Moreover, the use of a single model enables a direct comparison of the correlations between residuals computed at different periods, using only the ranges of the semivariograms. The ranges of these estimated semivariograms are plotted against period in Figure 3.2a. The semivariogram fits corresponding to all the periods considered can be found in Appendix B.
It can be observed from Figure 3.2a that the estimated range of the semivariogram tends to increase with period. As described earlier, it can be inferred that the $\tilde{\varepsilon}$ values at long periods show larger correlations than those at short periods. This is consistent with past studies of ground-motion coherency, a widely researched topic. Coherency can be thought of as a measure of similarity between two spatially separated ground-motion time histories. Der Kiureghian [1996] reports that coherency is reduced by the scattering of waves during propagation, and that this reduction is greater for high-frequency waves. High-frequency waves, which have short wavelengths, tend to be more affected by small-scale heterogeneities in the propagation path, and as a result tend to be less coherent than long-period waves [Zerva and Zervas, 2002]. It is reasonable to expect highly coherent ground motions to exhibit correlated peak amplitudes (i.e., spectral accelerations) as well. Since the $\tilde{\varepsilon}$'s studied here, which quantify these peak amplitudes, show the same correlation trend with period as previous coherency studies, a similar wave-scattering mechanism may be partially responsible for the correlation trends observed here.
The Northridge earthquake data used for the above analysis are obtained from the NGA database. In order to exclude records whose characteristics differ from those used by the ground-motion modelers for data analysis, in most cases only records used by the authors of the Boore and Atkinson [2008] ground-motion model are considered. For the purposes of this chapter, these records are denoted 'usable records'. The semivariograms of residuals computed at periods of 5, 7.5 and 10 seconds, however, are obtained using all available Northridge records in the NGA database, on account of the limited number of Northridge earthquake recordings at these long periods. At 5 seconds, the residuals can be computed using 158 available records, of which 66 are used by the ground-motion model authors. Since a reasonable number of records is available in both cases, a semivariogram constructed using all 158 records (denoted SV1) can be compared to one estimated from the 66 usable records (denoted SV2; in this case, the bin size was increased to 4 km to compensate for the smaller number of records). The ranges of SV1 and SV2 are 40 km and 30 km, respectively. This shows a slight difference in the estimated ranges, which could be due to additional correlated systematic errors introduced by the extra records.
As mentioned at the beginning of section 3.4, the correlations discussed in this chapter are estimated using the fault-normal components, because the correlations obtained using the fault-normal and the fault-parallel ground motions were found to be similar. For example, the semivariogram of $\tilde{\varepsilon}$'s computed at 2 seconds, based on the fault-parallel ground motions recorded during the Northridge earthquake, was found to be reasonably modeled using an exponential function with a unit sill and a range of 36 km. The corresponding range for the semivariogram based on the fault-normal ground motions equals 42 km. Similar results were observed when the residuals were computed at other periods and using other earthquake recordings.
3.4.3 1999 Chi-Chi earthquake
In this section, the semivariogram ranges of $\tilde{\varepsilon}$'s from the Chi-Chi earthquake recordings are presented. The Chi-Chi earthquake ground motions came from the NGA database. Only records used by the authors of the Boore and Atkinson [2008] ground-motion model are considered. The summary plot of the estimated ranges is shown in Figure 3.2b. (The semivariograms are shown in Appendix B.) The following can be observed from the figures:
(a) As seen with the Northridge earthquake data, the range of the semivariogram typically increases with period. (An exception is observed for peak ground accelerations (PGA), and this is explored later in this chapter.)
(b) The ranges are, in general, higher than those observed using the Northridge earthquake data (Figure 3.2a). This is consistent with observations made by other researchers considering Northridge and Chi-Chi earthquake data [e.g., Goda and Hong, 2008].
The larger ranges obtained here, relative to the comparable results from Northridge, can be explained using the Vs30 values (average shear-wave velocities in the top 30 m of the soil) at the recording stations. (The author found an empirical link between the range and Vs30, but not between the range and other earthquake- and site-related parameters such as magnitude and distance. Further research using bigger data sets is necessary to quantify such links.) Vs30 values are commonly used in ground-motion models as indicators of the effects of local-site conditions on the ground motion. The $\tilde{\varepsilon}$'s are affected if the predicted ground-motion intensities are affected by inaccurate Vs30 values, or if Vs30 alone is inadequate to capture the local-site effects entirely (i.e., the ground-motion models do not entirely capture the local-site effects using Vs30 values).
Close to 70% of the Taiwan site Vs30 values are inferred from Geomatrix site classes, while the rest are measured (NGA database). Since closely-spaced sites are likely to belong to the same site class and possess similar (and unknown) Vs30 values, errors in the inferred Vs30 values are likely to be correlated among sites that are close to each other. Such correlated Vs30 measurement errors will result in correlated prediction errors at all these closely-spaced sites, which will increase the range of the semivariograms.
The larger ranges of semivariograms estimated using the Chi-Chi earthquake ground motions may also be due to possible correlation between the true Vs30 values (and not just the correlation between the Vs30 errors). Larger correlation between the Vs30's indicates a more homogeneous soil (homogeneous in terms of properties that affect site effects but are not accounted for by the ground-motion models). In such cases, if a ground-motion model does not accurately capture the local-site effect at one site, it is likely to produce similar prediction errors in a cluster of closely-spaced sites (on account of the homogeneity). Castellaro et al. [2008] compared the site-dependent seismic amplification factors observed during the 1989 Loma Prieta earthquake to the corresponding site Vs30 values. (The site amplification factor, Fa, is defined as the amplification of the ground-motion spectral level at a site with respect to that at a reference ground condition [Borcherdt, 1994].) They found substantial scatter in the plot of Fa versus Vs30, and also found that this scatter was more pronounced at short periods (below 0.5 seconds) than at longer periods. This suggests that ground-motion intensity predictions based on Vs30 will have errors, particularly at periods below 0.5 seconds.
Figures 3.3a and 3.3b show semivariograms of the normalized Vs30 values (not to be confused with the $\tilde{\varepsilon}$ semivariograms) at the Northridge earthquake recording stations and the Chi-Chi earthquake recording stations, respectively. (Normalization involves scaling the Vs30 values so that the normalized values have unit variance, to enable a direct comparison of the semivariograms.) Figure 3.3a shows significant scatter at all separation distances, indicating zero correlation at all separations. In contrast, Figure 3.3b indicates that the Taiwan Vs30 values have significant spatial correlation. This suggests that the $\tilde{\varepsilon}$'s may have additional spatial correlation in Taiwan, due to homogeneous site effects that cause correlated prediction errors.
As mentioned previously, one notable aberration in the plot of range versus period
(Figure 3.2b) is the large range observed when the residuals are computed at 0 seconds
CHAPTER 3. SPATIAL CORRELATION MODEL 63
Figure 3.3: (a) Experimental semivariogram obtained using normalized Vs30’s at therecording stations of the Northridge earthquake. No semivariogram is fitted on accountof the extreme scatter (b) Experimental semivariogram obtained using normalized Vs30’sat the recording stations of the Chi-Chi earthquake. The range of the fitted exponentialsemivariogram equals 25 km.
as compared to some of the longer periods. This is not consistent with the coherency argument of the previous section. It can, however, be explained using the relationship between the range and the Vs30’s described in the above paragraphs. Inaccuracies in ground-motion predictions based on Vs30’s will be reflected in increased correlation between the residuals computed at nearby sites. These inaccuracies are larger at short periods (below 0.5 seconds) [Castellaro et al., 2008], which explains the larger correlation between the PGA-based residuals and, ultimately, the larger range observed.
One final test that was considered here was whether spatial correlations differed for
near-fault ground motions experiencing directivity. Baker [2007b] identified pulse-like
ground motions from the NGA database based on wavelet analysis. Thirty such pulses were
identified in the fault-normal components of the Chi-Chi earthquake recordings. Experi-
mental semivariograms of residuals were computed using these pulse-like ground motions,
and their ranges were estimated. It was seen that the ranges were reasonably similar to
those obtained using all usable ground motions (i.e., pulse-like and non-pulse-like). Since
the available pulse-like ground-motion data set is very small, however, the results were not considered sufficiently reliable and are not discussed further in this chapter. A more detailed analysis can be found in Appendix B and Bazzurro et al. [2008].
Based on the discussion in this section, the correlated Vs30 values and the correlated Vs30 measurement errors are possible reasons why the ranges estimated in section 3.4.3 are larger than those estimated in section 3.4.2. Other factors, such as the size of the rupture areas,
may also affect the correlations. These factors could not, however, be investigated with the
limited data set available.
3.4.4 Other earthquakes
The correlations computed using data from the 2003 M5.4 Big Bear City earthquake, the
2004 M6.0 Parkfield earthquake, the 2005 M5.1 Anza earthquake, the 2007 M5.6 Alum
Rock earthquake and the 2008 M5.4 Chino Hills earthquake are presented in this section.
The time histories for these earthquakes were obtained from the CESMD database [2008].
The Vs30 data used for these computations came from the CESMD database [2008] (for the
Parkfield earthquake) and the U.S. Geological Survey Vs30 maps (for the other earthquakes)
[Global Vs30 map server, 2008].
Exponential models are fitted to experimental semivariograms of ε’s computed using the time histories from the above-mentioned earthquakes, at periods ranging from 0 to 10
seconds. Figure 3.4 shows plots of range versus period for the Big Bear City, Parkfield,
Alum Rock, Anza and Chino Hills earthquake residuals respectively. The ranges of the
semivariograms are generally seen to increase with period, which is consistent with find-
ings from the Chi-Chi and the Northridge earthquake data. It can also be seen from the
figure that, at short periods, the ranges obtained from the Anza earthquake data are larger
than those from the other earthquakes considered. On the other hand, the ranges com-
puted using the Parkfield earthquake data are fairly small at short periods. Semivariograms
of the Vs30’s at the recording stations for all five earthquakes of interest were computed.
The semivariogram range computed using the Anza earthquake Vs30’s was found to be the
largest at 40 km, while the ranges computed from the Chino Hills, Big Bear City, Alum
Rock and Parkfield earthquake data were smaller at 35, 30, 18 and approximately 0 km
Figure 3.4: Range of semivariograms of ε, as a function of the period at which ε values are computed. The residuals are obtained using: (a) Big Bear City earthquake data; (b) Parkfield earthquake data; (c) Alum Rock earthquake data; (d) Anza earthquake data; (e) Chino Hills earthquake data.
Figure 3.5: Ranges of residuals computed using PGAs versus ranges of normalized Vs30 values.
respectively. These range estimates reinforce the argument made previously that clustering in the Vs30 values (as indicated by a large range of the Vs30 semivariogram) results in increased correlation among the residuals, although the low PGA-based range estimated using the Chino Hills earthquake data appears to be an exception. This trend is seen in Figure 3.5, which plots the range of PGA-based residuals against the range of the Vs30’s for the earthquakes considered in this work. The dependence on the Vs30 range appears weaker at longer periods, which is in line with the observation of Castellaro et al. [2008] that the scatter in the plot of Fa versus Vs30 is greater at short periods than at long periods. The authors hypothesize that the reduced dependence of range on Vs30’s at long periods could also arise because the long-period ranges are considerably influenced by factors other than Vs30 values, such as coherency (as explained in section 3.4.2) and prediction errors unrelated to Vs30’s (which are likely, since the ground-motion models are fitted using far fewer data points at long periods). Finally, an additional advantage of considering these five additional events is that
earthquakes covering a range of magnitudes have been studied. No trends of range with
magnitude were detected.
A few research works studying spatial correlations use ground-motion recordings from
Figure 3.6: (a) Range of semivariograms of ε, as a function of the period at which ε values are computed. The residuals are obtained from six different sets of time histories as shown in the figure; (b) Range of semivariograms of ε predicted by the proposed model as a function of the period.
earthquakes in Japan, based on the data provided in the KiK Net [2007]. In this work, data
from the 2004 Mid Niigata Prefecture earthquake and the 2005 Miyagi-Oki earthquake
were explored. Although ground-motion recordings are available at a fairly large number of sites, most recording stations are far from one another: the KiK Net [2007] consists of 681 recording stations, among which only 19 pairs of stations are within 10 km of one another. As explained in section 3.4.2, it is important to accurately model
the semivariogram at short separation distances, particularly at separation distances below
10 km. Hence, the recordings from the KiK Net [2007] were not considered further for
studying the ranges of semivariograms.
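A station-pair count like the one quoted above can be sketched as follows. The thesis does not state how inter-station distances were computed; the haversine formula on a spherical Earth of radius 6371 km is an assumption here, and the function names are hypothetical.

```python
import math
from itertools import combinations

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))

def count_close_pairs(stations, max_km=10.0):
    """Number of unordered station pairs separated by at most max_km.

    stations: list of (lat, lon) tuples in degrees.
    """
    return sum(
        1
        for (la1, lo1), (la2, lo2) in combinations(stations, 2)
        if haversine_km(la1, lo1, la2, lo2) <= max_km
    )
```

With real station coordinates, the resulting count indicates how well the short-separation part of the semivariogram can be constrained.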
3.4.5 A predictive model for spatial correlations
The above sections presented spatial correlations computed using recorded ground mo-
tions from several past earthquakes. In this section, these correlation estimates are used
to develop a model that can be used to select appropriate correlation estimates for risk
assessment purposes.
Figure 3.6a shows the ranges computed using various earthquake data as a function of
period. From a practical perspective, despite the wide differences in the characteristics of the earthquakes considered, the computed ranges are quite similar, particularly at periods longer than 2 seconds. At short periods (below 2 seconds), however, the estimated ranges differ considerably depending on the ground-motion time histories used. The previous sections suggested empirically that these differences in the correlation of ε’s are in large part explained by the Vs30 values at the recording stations. Hence, the following cases can be considered for decision making:
Case 1: If the Vs30 values do not show (or are not expected to show) clustering, i.e., the geologic conditions of the soil vary widely over the region, the smaller ranges reported in Figure 3.6a are appropriate. The degree of clustering can be quantified by constructing the semivariogram of the Vs30’s as explained previously, or by using the simplified visual approach described in Section B.4.
Case 2: If the Vs30 values show (or are expected to show) clustering, i.e., there are clusters of sites with similar geologic soil conditions, the larger ranges reported in Figure 3.6a should be chosen.
Based on these conclusions, the following model was developed to predict a suitable
range based on the period of interest:
At short periods (T < 1 second), for case 1:
b = 8.5+17.2T (3.17)
At short periods (T < 1 second), for case 2:
b = 40.7−15.0T (3.18)
At long periods (T ≥ 1 second), for both cases 1 and 2:
b = 22.0+3.7T (3.19)
where b denotes the range of the exponential semivariogram (equation 3.5), and T denotes
the period. Based on this model, the correlation between normalized intra-event residuals
separated by h km is obtained as follows (follows from equations 3.5 and 3.14):
ρ(h) = exp(−3h/b) (3.20)
Note that the correlations between intra-event residuals are exactly equal to the correlations between the normalized intra-event residuals defined above.
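Equations 3.17 to 3.20 can be coded directly. The sketch below, with hypothetical function names, returns the range b in km and the corresponding correlation. The Boolean flag that selects Case 1 or Case 2 is an illustrative way of encoding the decision described above, and the model is only supported by data at periods from 0 to 10 seconds.

```python
import math

def semivariogram_range_km(period_s, vs30_clustered):
    """Range b (km) of the exponential semivariogram, equations 3.17 to 3.19.

    period_s       : spectral period T in seconds (0 for PGA).
    vs30_clustered : False for Case 1 (no Vs30 clustering), True for Case 2.
    """
    if period_s < 0:
        raise ValueError("period must be non-negative")
    if period_s < 1.0:
        # Short periods: the two cases differ (equations 3.17 and 3.18).
        return 40.7 - 15.0 * period_s if vs30_clustered else 8.5 + 17.2 * period_s
    return 22.0 + 3.7 * period_s  # long periods, both cases (equation 3.19)

def intra_event_correlation(h_km, period_s, vs30_clustered=False):
    """Correlation between normalized intra-event residuals h km apart (eq. 3.20)."""
    b = semivariogram_range_km(period_s, vs30_clustered)
    return math.exp(-3.0 * h_km / b)
```

All three branches give b = 25.7 km at T = 1 s, so the predicted range is continuous in period.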
The plot of the predicted range versus period is shown in Figure 3.6b. The model has
been developed based on only seven earthquakes, but since the trends exhibited were found
to be similar for these seven, it can be expected that the model will predict reasonable
ranges for future earthquakes.
The predictive model can be used for simulating correlated ground-motion fields for a
particular earthquake as follows:
Step 1: Obtain median ground-motion values (denoted $\overline{Y}_{ij}$ in equation 3.1) at the sites of interest using a ground-motion model.
Step 2: Probabilistically generate (simulate) the inter-event residual term ($\eta_j$ in equation 3.1), which follows a univariate normal distribution. The mean of the inter-event residual is zero, and its standard deviation can be obtained from the ground-motion model.
Step 3: Simulate the intra-event residuals ($\varepsilon_{ij}$ in equation 3.1) using the standard deviations from the ground-motion model and the correlations from equations 3.17 to 3.20.
Step 4: Combine the three terms generated in Steps 1 to 3 using equation 3.1 to obtain simulated ground motions at the sites of interest.
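Steps 1 to 4 can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: equation 3.1 is taken to have the additive form ln Y = ln(median) + σε + τη (the same structure as equation 4.1 in the next chapter), site coordinates are planar in km, and the intra-event residuals are drawn as a multivariate normal vector through a Cholesky factor of the exponential correlation matrix. The function name and arguments are hypothetical.

```python
import numpy as np

def simulate_ln_ground_motion(coords_km, ln_median, sigma, tau, range_km,
                              n_sims=1, seed=None):
    """Simulate ln(Y) at n sites for one earthquake scenario.

    coords_km : (n, 2) planar site coordinates in km (assumed projection).
    ln_median : (n,) log median predictions from a ground-motion model (Step 1).
    sigma     : intra-event standard deviation (scalar or (n,) array).
    tau       : inter-event standard deviation (scalar).
    range_km  : exponential semivariogram range b, e.g. from equations 3.17 to 3.19.
    Returns an (n_sims, n) array of simulated ln(Y) values.
    """
    rng = np.random.default_rng(seed)
    coords_km = np.asarray(coords_km, dtype=float)
    ln_median = np.asarray(ln_median, dtype=float)
    n = len(ln_median)
    # Correlation of normalized intra-event residuals, rho(h) = exp(-3h/b) (eq. 3.20).
    h = np.linalg.norm(coords_km[:, None, :] - coords_km[None, :, :], axis=-1)
    rho = np.exp(-3.0 * h / range_km)
    # Tiny diagonal jitter keeps the factorization stable for near-coincident sites.
    chol = np.linalg.cholesky(rho + 1e-10 * np.eye(n))
    # Step 2: one inter-event residual per simulation, shared by all sites.
    eta = rng.standard_normal((n_sims, 1))
    # Step 3: spatially correlated standard normal intra-event residuals.
    eps = rng.standard_normal((n_sims, n)) @ chol.T
    # Step 4: combine the terms (assumed additive form of equation 3.1).
    return ln_median + np.asarray(sigma) * eps + tau * eta
```

The median predictions and the standard deviations σ and τ would come from a ground-motion model. Because the Cholesky factorization scales cubically with the number of sites, very large site sets may call for other simulation strategies.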
3.5 Isotropy of semivariograms
This section examines the assumption of isotropy of semivariograms using the ground mo-
tions discussed previously.
3.5.1 Isotropy of intra-event residuals
A stationary semivariogram, $\gamma(\mathbf{h})$, is said to be isotropic if it depends only on the separation distance $h = \|\mathbf{h}\|$ rather than on the separation vector $\mathbf{h}$. Anisotropy is present when the semivariogram is also influenced by the orientation of the data locations. The presence of anisotropy can be studied using directional semivariograms [Goovaerts, 1997]. Directional semivariograms are obtained as shown in equation 3.4, except that the estimate uses only those pairs $(z(\mathbf{u}_\alpha), z(\mathbf{u}_\alpha + \mathbf{h}))$ whose separation vectors $\mathbf{h}$ all have the same, specified azimuth. Since an isotropic semivariogram is independent of data orientation, the directional semivariograms obtained for any specific azimuth will be identical to the isotropic semivariogram if the data are in fact isotropic. Differences between the directional semivariograms indicate one of two forms of anisotropy, namely, geometric anisotropy and zonal anisotropy. Geometric anisotropy is present if directional semivariograms with differing azimuths have differing ranges. Zonal anisotropy is indicated by a variation in the sill with azimuth.
3.5.2 Construction of a directional semivariogram
A directional semivariogram is specified by several parameters, as illustrated in Figure 3.7a. The parameters include the azimuth of the direction vector (the azimuth angle, $\phi$, is measured from north), the azimuth tolerance ($\delta\phi$), the bin separation ($h$) and the bin width ($\delta h$). A semivariogram obtained using all pairs of points irrespective of azimuth is known as the omni-directional semivariogram, and is an accurate measure of spatial correlation in the presence of isotropy. (The semivariograms described in the previous sections are omni-directional semivariograms.) In determining the experimental semivariogram in any bin, only pairs of sites separated by distances in $[h - \delta h/2,\ h + \delta h/2]$ and with azimuths in $[\phi - \delta\phi,\ \phi + \delta\phi]$ are considered. For example, let $\alpha$ be a site located in a two-dimensional region, as shown in Figure 3.7a, and suppose that a directional semivariogram with an azimuth of $\phi$ (as marked in the figure) is to be constructed. The computation of the experimental semivariogram value, $\gamma(h)$, involves pairing the data value at site $\alpha$ (i.e., at location $\mathbf{u}_\alpha$) with the data values at all sites falling within the hatched region (the region that satisfies the conditions on separation distance and azimuth mentioned above). The area of the hatched region is defined by the azimuth tolerance and increases with the separation distance $h$ (Figure 3.7a). For large values of $h$, the hatched region would become undesirably large; hence, in addition to the azimuth tolerance, a constraint is explicitly placed on the bandwidth of the region of interest, as marked in the figure.
Experimental directional semivariograms are usually difficult to compute because relatively few pairs of sites are oriented along any pre-specified direction. Hence, the bin width, the azimuth tolerance and the bandwidth must be specified liberally when constructing directional semivariograms. The results reported in this chapter are obtained using a bin separation of 4 km, a bin width of 4 km, an azimuth tolerance of 10° and a bandwidth of 10 km. Directional semivariograms are plotted for azimuths of 0°, 45° and 90° in order to capture the effects of anisotropy, if any.
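The pair-selection rules for a directional semivariogram can be sketched as follows. The sketch assumes (east, north) coordinates in km, azimuths measured clockwise from north, and that the bandwidth limits the perpendicular distance of a pair's separation vector from the direction line; the thesis does not spell out these conventions, so they are assumptions. Pairs pointing opposite to the azimuth are also accepted because γ(h) = γ(−h). The defaults reproduce the 4 km bin width, 10° azimuth tolerance and 10 km bandwidth quoted above; the function name is hypothetical.

```python
import numpy as np

def directional_semivariogram(coords_km, values, azimuth_deg, lags_km,
                              bin_width_km=4.0, azimuth_tol_deg=10.0,
                              bandwidth_km=10.0):
    """Experimental directional semivariogram at the bin separations lags_km.

    A pair is used at lag h if its separation lies in [h - dh/2, h + dh/2], its
    direction is within azimuth_tol_deg of the azimuth (or of azimuth + 180),
    and its perpendicular offset from the direction line is <= bandwidth_km.
    Returns gamma estimates (NaN for empty bins) and pair counts.
    """
    coords_km = np.asarray(coords_km, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)
    d = coords_km[j] - coords_km[i]                      # separation vectors (east, north)
    dist = np.hypot(d[:, 0], d[:, 1])
    phi = np.radians(azimuth_deg)
    unit = np.array([np.sin(phi), np.cos(phi)])          # unit vector along the azimuth
    along = d @ unit                                     # signed projection on the direction
    perp = np.abs(d[:, 0] * unit[1] - d[:, 1] * unit[0]) # offset from the direction line
    # Angle between each pair vector and the direction line, folded to [0, 90] degrees.
    with np.errstate(invalid="ignore", divide="ignore"):
        ang = np.degrees(np.arccos(np.clip(np.abs(along) / dist, 0.0, 1.0)))
    ok_dir = (ang <= azimuth_tol_deg) & (perp <= bandwidth_km) & (dist > 0)
    sq_diff = (values[i] - values[j]) ** 2
    gamma, counts = [], []
    for h in lags_km:
        sel = ok_dir & (np.abs(dist - h) <= bin_width_km / 2.0)
        counts.append(int(sel.sum()))
        gamma.append(0.5 * sq_diff[sel].mean() if sel.any() else np.nan)
    return np.array(gamma), np.array(counts)
```

Comparing the outputs for several azimuths against the omni-directional semivariogram is then a direct check for geometric anisotropy (differing ranges) or zonal anisotropy (differing sills).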
3.5.3 Test for anisotropy using Northridge ground motion data
Figure 3.7b-d shows the omni-directional and the three directional experimental semivar-
iograms of the 2 second ε’s from the Northridge earthquake data. The semivariogram
function shown in the figures is the exponential model with a unit sill and a range of 42 km.
This exponential model (obtained assuming isotropy in section 3.4.2) fits all the experimen-
tal directional semivariograms reasonably well (at short separations, which are of interest).
This is a good indication that the semivariogram is isotropic. Similar results were obtained
at other periods and for other earthquakes (Appendix B and Bazzurro et al. [2008]).
3.6 Comparison with previous research
Researchers have previously computed the correlation between ground-motion intensities
using observed peak ground accelerations, peak ground velocities and spectral accelera-
tions. These works, however, differ widely in the estimated rate of decay of correlation
with separation distance. This section compares the results observed in the current work to
those in the literature and also discusses possible reasons for the apparent inconsistencies
in the previous estimates.
Wang and Takada [2005] used the ground-motion relationship of Annaka et al. [1997]
to compute the normalized auto-covariance function of residuals computed using the Chi-
Chi earthquake peak ground velocities (PGV). They used an exponential model to fit the
discrete experimental covariance values and reported a result which is equivalent to the
Figure 3.7: (a) Parameters of a directional semivariogram. Subfigures (b), (c) and (d) show experimental directional semivariograms at discrete separations obtained using the Northridge earthquake ε values computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (b) azimuth = 0°; (c) azimuth = 45°; (d) azimuth = 90°.
following semivariogram:
γ(h) = 1− exp(−3h/83.4). (3.21)
This semivariogram has a unit sill and a range of 83.4 km (from equation 3.5). The current
work does not consider the spatial correlation between PGV-based residuals. The PGVs,
however, are comparable to spectral accelerations computed at moderate periods (0.5 to 1
s), and hence, the semivariogram ranges of residuals computed from PGVs can be quali-
tatively compared to the corresponding ranges estimated in this work (Figure 3.6a). It can
be seen that the range reported by Wang and Takada [2005] is substantially higher than the
ranges observed in the current work.
In order to explain this inconsistency, the correlations computed by Wang and Takada
[2005] are recomputed in the current work using the Chi-Chi earthquake time histories
available in the NGA database and the ground-motion model of Annaka et al. [1997]. The
Annaka et al. [1997] ground-motion model does not explicitly capture the effect of local-
site conditions. To account for the local-site effects, Wang and Takada [2005] amplified the
predicted PGV at all sites by a factor of 2.0 and the same amplification is carried out here
for consistency. The observed and the predicted PGVs are used to compute residuals, and
the experimental semivariograms (at discrete separations) of these residuals are estimated
(considering a bin size of 4 km) using the procedures discussed previously in this chapter.
Figure 3.8a shows the experimental semivariogram obtained, along with an exponential
semivariogram function having a unit sill and a range of 83.4 km (there are slight differ-
ences between this experimental semivariogram and the one shown in Wang and Takada
[2005] possibly due to the differences in processing carried out on the raw data or the spe-
cific recordings used). It is clear from Figure 3.8a, as well as the results presented in Wang
and Takada [2005], that the exponential model with a range of 83.4 km does not provide an
accurate fit to the experimental semivariogram values at small separation distances. This
is because Wang and Takada [2005] minimized the fitting error over all distances to obtain
their model.
Several research works in the literature fit a model to an experimental semivariogram using the method of least squares, or using visual methods that attempt to minimize the fitting error over all distances and therefore in effect produce fits similar to the least-squares fit [Goda
Figure 3.8: Semivariogram obtained using residuals computed based on Chi-Chi earthquake peak ground velocities: (a) residuals from Annaka et al. [1997] and semivariogram model from Wang and Takada [2005]; (b) residuals from Annaka et al. [1997] and semivariogram fitted to model the discrete values well at short separation distances; (c) residuals from Annaka et al. [1997], considering random amplification factors.
and Hong, 2008, Hayashi et al., 2006, Wang and Takada, 2005]. There are three major
drawbacks in using the method of least squares to fit an experimental semivariogram:
(a) As explained in section 3.4.2, it is more important to model the semivariogram struc-
ture well at short separation distances than at long separation distances. This is because of
the low correlation between intensities at well-separated sites and the screening of a far-away site by more closely located sites [Goovaerts, 1997]. It is therefore inefficient to obtain a fit by assigning equal weights to the data points at all separation distances, as is done in the method of least squares.
(b) The results provided by the method of least squares are highly sensitive to the
presence of outliers (because differences between the observed and predicted γ(h)’s are
squared, any observed γ(h) lying away from the general trend will have a disproportionate
influence on the fit).
(c) The least-squares fit results can be sensitive to the maximum separation distance
considered. This is of particular significance if the method of least squares is used to
determine the sill of the semivariogram in addition to its range.
Some of these drawbacks can be corrected within the framework of the least-squares
method. Drawback (a) can be partly overcome by assigning large weights to the data points
at short separation distances. The presence of outliers can be checked rigorously using
standard statistical techniques [Kutner et al., 2005] and the least-squares fit can be obtained
after eliminating the outliers in order to overcome the second drawback mentioned above.
These procedures, however, add to the complexity of the approach. For this reason, experi-
mental semivariograms are fitted manually rather than using the method of least squares in
the current work [as recommended by Deutsch and Journel, 1998]. This approach allows one to disregard outliers and to focus on the semivariogram model at distances that
are of practical interest. Though this method is more subjective than the method of least
squares, experience shows that the results obtained are reasonably robust.
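For contrast with the manual fitting adopted in this work, the weighted least-squares remedy for drawback (a) can be sketched as below. This is not the procedure used here: it fixes the sill at one, searches a grid of candidate ranges (an illustrative choice that avoids an optimizer dependency), and leaves the choice of weights to the analyst. Setting the weights to zero beyond some separation is one crude way of focusing the fit on short distances. The function name, the search grid and the data in the usage lines are hypothetical.

```python
import numpy as np

def fit_exponential_range(lags_km, gamma_hat, weights=None, sill=1.0,
                          candidate_ranges_km=None):
    """Fit the range b of gamma(h) = sill * (1 - exp(-3h/b)) by (weighted) least squares.

    weights: per-lag weights; None gives the ordinary, equal-weight fit.
    """
    lags_km = np.asarray(lags_km, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    if candidate_ranges_km is None:
        candidate_ranges_km = np.arange(1.0, 150.01, 0.1)   # hypothetical search grid, km
    w = np.ones_like(gamma_hat) if weights is None else np.asarray(weights, dtype=float)
    b = np.asarray(candidate_ranges_km, dtype=float)[:, None]          # (m, 1)
    model = sill * (1.0 - np.exp(-3.0 * lags_km[None, :] / b))         # (m, k)
    sse = (w * (gamma_hat - model) ** 2).sum(axis=1)                   # weighted error
    return float(b[np.argmin(sse), 0])

# Hypothetical experimental semivariogram: exponential with b = 20 km at short
# separations, but values well below the sill at separations of 20 km and more.
lags = np.arange(2.0, 60.0, 4.0)
gam = np.where(lags < 20.0, 1.0 - np.exp(-3.0 * lags / 20.0), 0.5)
b_equal = fit_exponential_range(lags, gam)                               # equal weights
b_short = fit_exponential_range(lags, gam, weights=(lags < 20.0).astype(float))
print(b_equal, b_short)
```

In this constructed example, the equal-weight fit is pulled toward a much larger range by the long-separation points, while the fit restricted to short separations recovers b = 20 km.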
Figure 3.8b shows the experimental semivariogram (identical to the one shown in Fig-
ure 3.8a) along with an exponential function, which is manually fitted to model the experi-
mental semivariogram values well at short separation distances. The range of this exponen-
tial model equals 55 km, which is much less than the range of 83.4 km mentioned earlier,
and is closer to the results reported earlier for the Chi-Chi spectral accelerations.
The large range reported in Wang and Takada [2005] may also be due to inaccuracies in
modeling the local-site effects. As explained in section 3.4.2, errors in capturing the local-
site effects will cause systematic errors in the predicted ground motions that will result in
an increase in the range of the semivariogram. Using a constant amplification factor of
2.0 (without considering the actual local-site effects) will produce even larger systematic
errors in the predicted ground motions than considered previously. Consider a complemen-
tary hypothetical example in which the ground-motion amplification factor for each site is
considered to be an independent random variable, uniformly distributed between 1.0 and
2.0. Randomizing the ground-motion amplification will break up the correlation between
the prediction errors in a cluster of closely-spaced sites. The semivariogram of residuals
obtained considering such random amplification factors is shown in Figure 3.8c. The range
of this semivariogram equals 43 km, which is less than the 55 km from Figure 3.8b. The
true amplifications are neither constant at 2.0 nor completely random between 1.0 and 2.0. Hence, the range of the semivariogram is expected to lie between 43 km and 55 km, which is close to the range observed using short-period spectral accelerations in the current work.
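To put the ranges discussed in this section in perspective, the sketch below evaluates the exponential semivariogram of equation 3.21 for a general range b, together with the implied correlation ρ(h) = 1 − γ(h), which holds for a unit sill. It also shows that γ reaches 1 − e^(−3), about 0.95 of the sill, at h = b, which is why b is referred to as the (practical) range. The function name is hypothetical.

```python
import math

def exp_semivariogram(h_km, range_km, sill=1.0):
    """Exponential semivariogram gamma(h) = sill * (1 - exp(-3h/b)), as in eq. 3.21."""
    return sill * (1.0 - math.exp(-3.0 * h_km / range_km))

# With a unit sill, the correlation is rho(h) = 1 - gamma(h) = exp(-3h/b).
for b in (83.4, 55.0, 43.0):
    rho_10 = 1.0 - exp_semivariogram(10.0, b)
    print(f"b = {b:5.1f} km: gamma(b) = {exp_semivariogram(b, b):.3f}, "
          f"rho(10 km) = {rho_10:.3f}")
```

The correlation at a 10 km separation is roughly 0.70 for b = 83.4 km, 0.58 for b = 55 km and 0.50 for b = 43 km, which illustrates how strongly the fitted range affects the correlations used in a risk analysis.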
Boore et al. [2003] estimated correlations between the PGA residuals computed from
the Northridge earthquake. They observed that the correlations dropped to zero when the
inter-site separation distance was approximately 10 km. This matches the range of 10 km estimated in the current work using the Northridge earthquake PGAs (Figure 3.2a). These results thus appear consistent with those shown here, which is notable because the two efforts used different estimation procedures and data sets.
The observations in the current work are also consistent with those reported in Goda
and Hong [2008] who reported a more rapid decrease in correlations with distance for the
Northridge earthquake ground motions than for the Chi-Chi earthquake ground motions.
They also reported that the decay of spatial correlation of the residuals computed from
spectral accelerations is more gradual at longer periods, a feature observed and analyzed in
the current research work. The current work adds plausible physical explanations for these
empirically-observed trends.
3.7 Conclusions
Geostatistical tools have been used to quantify the correlation between spatially-distributed
ground-motion intensities. The correlation is known to decrease with increasing separation between the sites, and this correlation structure can be modeled using semivariograms. A semivariogram is a measure of the average dissimilarity between data values as a function of their separation; its functional form, sill and range uniquely identify the ground-motion correlation as a function of separation distance.
Ground motions observed during the Northridge, Chi-Chi, Big Bear City, Parkfield,
Alum Rock, Anza and Chino Hills earthquakes were used to compute the correlations be-
tween spatially-distributed spectral accelerations, at various spectral periods. The correla-
tions were computed for normalized intra-event residuals, since the normalized intra-event
residuals will be homoscedastic. The ground-motion model of Boore and Atkinson [2008]
was used for the computations, but the results did not change when the Chiou and Youngs
[2008] model was used instead.
It was seen that the rate of decay of the correlation with separation typically decreases with increasing spectral period. It was reasoned that this could be because long-period ground motions at two different sites tend to be more coherent than short-period ground motions, on account of less wave scattering during propagation. It was also observed that, at periods longer than 2 seconds, the estimated correlations were similar for all the earthquake ground motions considered. At shorter periods, however, the correlations were found to be related to the site Vs30 values, and it was shown that clustering of site Vs30’s is likely to result in larger correlations between residuals. Based on these findings, a predictive model was developed that can be used to select appropriate correlation estimates for use in risk assessment of spatially-distributed building portfolios or infrastructure systems.
The research work also investigated the effect of directivity on the correlations using pulse-like ground motions. The correlations obtained were similar to those estimated using all ground motions. These results, however, are not discussed in detail because the small data set of pulse-like ground motions raises concerns about their reliability. The work also investigated the commonly-used assumption of isotropy in the correlation between residuals using directional semivariograms. If directional semivariograms computed for different azimuths are identical to the omni-directional semivariogram (which is obtained assuming isotropy), it can be concluded that the semivariograms (and therefore, the correlations) are isotropic. Empirical data showed that the correlations between Chi-Chi and Northridge earthquake intensities are isotropic at both short and long periods.
The results obtained were also compared to those reported in the literature [Goda and
Hong, 2008, Wang and Takada, 2005, Boore et al., 2003]. Wang and Takada [2005] report larger correlations for PGVs from the Chi-Chi earthquake recordings than those reported in this work for spectral accelerations. It was shown that these larger correlations result from attempting to fit the experimental semivariogram reasonably well over the entire range of separation distances of interest (a typical result of using least-squares fits, or eye-ball fits that produce results similar to least-squares fits), and from using a ground-motion model that does not account for the effect of local-site conditions. A semivariogram model should represent correlations accurately at small separations, since ground motions at a site are more influenced by ground motions at nearby sites. The method of least squares assigns equal importance to all separation distances and is therefore inefficient. In the current research work, semivariogram models are fitted manually with emphasis on accurately modeling correlations at small separations.
This study illustrates various factors that affect the spatial correlation between ground-
motion intensities, and provides a basis to choose an appropriate model using empirical
data. The proposed predictive model can be used for obtaining the joint distribution of
spatially-distributed ground-motion intensities, which is necessary for a variety of seismic
hazard calculations.
Chapter 4
Spatial correlation between spectral accelerations using simulated ground-motion time histories
Jayaram, N., Park, J., Bazzurro, P. and Tothong, P. (2010). Estimation of spatial correlation between spectral accelerations using simulated ground-motion time histories, 9th U.S. National and 10th Canadian Conference on Earthquake Engineering, Toronto, Canada.
4.1 Abstract
The impact of earthquakes on a region rather than on just a single property at a specific site
is of interest to several public and private stakeholders, including government and relief
organizations that are in charge of disaster mitigation and post-disaster response planning
and management, and private organizations that insure and manage spatially-distributed as-
sets. Regional earthquake impact assessment requires knowledge about the distribution of
ground-motion intensities over the entire region. Ground-motion models that are used for
quantifying the hazard at a single site do not provide information on the spatial correlation
between ground-motion intensities, which is required for the joint prediction of intensities
at multiple sites. Statistical models that describe the spatial correlation between intensity
measures are available in the literature, and the mathematics behind models that estimate
CHAPTER 4. SPATIAL CORRELATION ESTIMATES FROM SIMULATIONS 80
the spatial correlation as a function of site separation distance has already been developed.
This study investigates whether a more sophisticated model of spatial correlation that in-
corporates features such as non-stationarity (variation of correlation with spatial location),
anisotropy (directional dependence) and directivity effects (different correlation models for
pulse-like and non-pulse-like ground motions) is warranted. Testing the need for these ad-
ditional features, however, requires a large number of ground-motion time histories. Since
real data are sparse, the current study uses simulated ground-motion time histories instead.
Overall, this study tests and provides a basis for some of the subtle assumptions commonly
used in spatial correlation models.
4.2 Introduction
The impact of earthquakes on a region rather than on just a single property at a specific site
is of interest to several public and private stakeholders. In the aftermath of a large event,
public entities such as government agencies and relief organizations, and private entities
such as corporations and utilities need to assess the potential damage on a regional scale
in order to plan their emergency response in a timely manner. These organizations also
need to assess regional risks from future earthquakes in order to determine risk mitigation
strategies such as retrofitting and acquiring insurance coverage.
Regional earthquake impact assessment requires knowledge about the joint ground-
motion hazard at multiple sites of interest spread over the entire region. Predictive equa-
tions have been developed for estimating the distribution of the ground-motion intensity
that an earthquake can cause at a single site [e.g., Boore and Atkinson, 2008]. Much less
attention has been devoted, however, to estimating the statistical dependence (spatial cor-
relation) between ground-motion intensities generated by an earthquake at multiple sites.
The spatial correlation between ground-motion intensity measures arises from many factors, including common source effects (e.g., a high stress-drop earthquake may generate ground-motion intensities that are, on average, higher than the median values from events of the same magnitude), common path effects (the seismic waves travel over a similar path from the source to two nearby sites) and common site effects (similar non-linear amplification at two nearby sites due to their proximity). Modern ground-motion models implicitly account for part of this dependence through an inter-event error term, $\eta_i$, as follows [e.g., Boore and Atkinson, 2008, Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]:
$$\ln(Y_{ij}) = \ln(\overline{Y}_{ij}) + \sigma\varepsilon_{ij} + \tau\eta_i \qquad (4.1)$$
where $Y_{ij}$ denotes the ground-motion intensity parameter of interest (e.g., Sa(T), the spectral acceleration at period T) at site j during earthquake i; $\overline{Y}_{ij}$ is the median value of $Y_{ij}$ predicted by the ground-motion model at site j for earthquake i (which depends on parameters such as magnitude, distance of the source from the site and local site conditions); $\eta_i$ is the normalized inter-event standard normal residual; $\varepsilon_{ij}$ is the site-to-site normalized intra-event standard normal residual; and $\tau$ and $\sigma$ are the corresponding standard deviations of the two residuals.
While the ground-motion model in Equation 4.1 partially accounts for the correlation of $Y_{ij}$ at different sites via the common $\eta_i$, a significant amount of correlation remains in the $\varepsilon_{ij}$'s, and the ground-motion models do not quantify it. This study further explores the properties of this correlation.
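An alternative formulation for Equation 4.1, which was common in older prediction equations, is given by
$$\ln(Y_{ij}) = \ln(\overline{Y}_{ij}) + \sigma_T\,\varepsilon^T_{ij} \qquad (4.2)$$
where $\varepsilon^T_{ij}$ is a random variable called the normalized total residual, and $\sigma_T$ is its standard deviation. The total residual represents both the inter-event and the intra-event variability at site j from earthquake i. Comparing Equations 4.1 and 4.2, it is seen that
$$\sigma_T = \sqrt{\sigma^2 + \tau^2} \qquad (4.3)$$
$$\varepsilon^T_{ij} = \frac{\tau\eta_i + \sigma\varepsilon_{ij}}{\sigma_T} \qquad (4.4)$$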
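Equations 4.2 to 4.4 can be checked numerically. The Python sketch below is a minimal illustration, not code used in this study. The values of σ and τ are placeholders rather than coefficients of any published ground-motion model. The sketch computes the normalized total residual in two ways: from its inter- and intra-event components (Equation 4.4), and directly from an observation and the median prediction (Equation 4.2). The second route shows why η does not need to be known.

```python
import math

# Illustrative helpers for Equations 4.2-4.4. sigma and tau would come from a
# ground-motion model; the numbers used below are placeholders, not model output.

def total_sigma(sigma, tau):
    """Eq. 4.3: standard deviation of the total residual."""
    return math.sqrt(sigma ** 2 + tau ** 2)

def total_from_components(eps, eta, sigma, tau):
    """Eq. 4.4: normalized total residual from the intra- and inter-event parts."""
    return (tau * eta + sigma * eps) / total_sigma(sigma, tau)

def normalized_total_residual(ln_y, ln_median, sigma, tau):
    """Eq. 4.2 rearranged: needs only the observation and the median prediction,
    so the inter-event residual eta does not have to be known."""
    return (ln_y - ln_median) / total_sigma(sigma, tau)

# Placeholder numbers: sigma = 0.6 and tau = 0.8 give sigma_T = 1.0.
ln_median = math.log(0.3)
ln_obs = ln_median + 0.6 * 1.2 + 0.8 * (-0.5)   # observation built from Eq. 4.1
print(normalized_total_residual(ln_obs, ln_median, 0.6, 0.8),
      total_from_components(1.2, -0.5, 0.6, 0.8))  # the two routes agree
```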
This study intends to empirically estimate the correlation between the intra-event residuals ($\varepsilon_{ij}$) using ground-motion time histories. Since the inter-event residual is a constant across all sites during a given earthquake, the correlation between the $\varepsilon_{ij}$'s equals the correlation between the $\varepsilon^T_{ij}$'s [Jayaram and Baker, 2009a] (Chapter 3 of this thesis). While estimating spatial correlations, it is convenient to work directly with the total residuals (Equation 4.2). The values of $\varepsilon^T_{ij}$ can be computed directly from the ground-motion observations without knowledge of $\eta_i$.
In the past, researchers have estimated the spatial correlations between the total residu-
als using recorded ground-motion data [e.g., Wang and Takada, 2005, Jayaram and Baker,
2009a]. Using geostatistical tools, Jayaram and Baker [2009a] identified various factors in-
fluencing the extent of the spatial correlation, and developed a predictive model that can be
used to select appropriate correlation estimates. While recorded ground motions represent
the natural source for estimating the extent of correlation between ground-motion intensi-
ties at two sites, they do not suffice for investigating the validity of assumptions such as
second-order stationarity (i.e., dependence of correlation on just the separation between
sites, and not on the actual location of the sites) and isotropy (i.e., invariance of correlation
with the orientation of the sites) that are commonly used in the spatial correlation models
developed so far. This is on account of the scarcity of ground-motion recordings for any
particular earthquake. This limitation can be partially overcome by using simulated ground
motions. Although the simulations may not be complete substitutes for recorded data, they
are still extremely useful for testing and refining existing correlation models (which re-
quires large amounts of data). This chapter describes the tests carried out to verify the
commonly-used assumptions of stationarity and isotropy using ground motions simulated
by Dr. Brad Aagaard of the United States Geological Survey based on the 1989 Loma
Prieta earthquake source model [Aagaard et al., 2008]. Further, tests carried out to verify
whether pulse-like ground motions that arise due to directivity effects and non-pulse-like
ground motions have similar correlation structures are also described. Information about
tests carried out using other sets of simulated ground motions can be found in [Bazzurro
et al., 2008].
4.3 Statistical estimation of spatial correlation
The current work uses geostatistical tools previously used by Jayaram and Baker [2009a]
to empirically estimate the spatial correlations of residuals from simulated ground-motion
time histories. These tools are described briefly in this section; a detailed discussion can
be found in, for example, Deutsch and Journel [1998] and Jayaram and Baker [2009a]
(Chapter 3 of this thesis).
Let $\varepsilon^T_u$ denote the normalized total residual at location u, so that $\boldsymbol{\varepsilon}^T$ denotes the normalized total residuals distributed over space. The correlation structure of $\boldsymbol{\varepsilon}^T$ (equivalently, that of the normalized intra-event residuals $\boldsymbol{\varepsilon}$) can be represented using a semivariogram, which is a measure of the dissimilarity between the residuals. Let u and u′ denote two sites separated by the distance vector $\mathbf{h}$. The semivariogram $\gamma(u,u')$ is defined as follows:
$$\gamma(u,u') = \frac{1}{2}\,E\left[\left(\varepsilon^T_u - \varepsilon^T_{u'}\right)^2\right] \qquad (4.5)$$
The semivariogram defined in Equation 4.5 is location-dependent, and its inference requires repeated realizations of $\varepsilon^T$ at locations u and u′. Such repeated measurements are, however, never available in practice. Hence, to obtain a stationary semivariogram, it is typically assumed that the semivariogram depends only on the separation $\mathbf{h}$ between the sites, and not on the site locations u and u′. The stationary semivariogram $\gamma(\mathbf{h})$ is then defined as follows:
$$\gamma(\mathbf{h}) = \frac{1}{2}\,E\left[\left(\varepsilon^T_u - \varepsilon^T_{u+\mathbf{h}}\right)^2\right] \qquad (4.6)$$
A stationary semivariogram is said to be isotropic if it is a function of the separation distance $h = \|\mathbf{h}\|$ rather than of the separation vector $\mathbf{h}$. An isotropic, stationary semivariogram can be empirically estimated from a data set as follows:
$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \left(\varepsilon^T_{u_\alpha} - \varepsilon^T_{u_\alpha+h}\right)^2 \qquad (4.7)$$
In this expression:
- $\hat{\gamma}(h)$ is the experimental stationary isotropic semivariogram (estimated from a data set);
- N(h) denotes the number of pairs of sites separated by h (in practice, by a distance falling within a small bin around h);
- $\{\varepsilon^T_{u_\alpha}, \varepsilon^T_{u_\alpha+h}\}$ denotes the α'th such pair.
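The estimator in Equation 4.7 can be sketched as follows. This is a minimal Python illustration, not the code used in this study. It assumes planar site coordinates in km and fixed-width distance bins; the bin width is a free choice here. Each pair of sites adds its squared residual difference to the bin that contains its separation distance. Each non-empty bin then yields one discrete value $\hat{\gamma}(h)$ at the bin centre.

```python
import math
from itertools import combinations

def empirical_semivariogram(coords, residuals, bin_width, max_dist):
    """Experimental isotropic semivariogram (Eq. 4.7) on fixed-width distance bins.

    coords:    list of (x, y) site coordinates in km (planar; an assumption)
    residuals: normalized total residuals, one per site
    Returns (bin centre, gamma_hat, N(h)) for every non-empty bin.
    """
    n_bins = math.ceil(max_dist / bin_width)
    sq_sums = [0.0] * n_bins   # running sum of squared differences per bin
    counts = [0] * n_bins      # N(h): number of site pairs per bin
    for i, j in combinations(range(len(coords)), 2):
        h = math.dist(coords[i], coords[j])
        if h >= max_dist:
            continue           # pairs beyond max_dist are not used
        k = min(int(h // bin_width), n_bins - 1)  # guard against round-off
        sq_sums[k] += (residuals[i] - residuals[j]) ** 2
        counts[k] += 1
    return [((k + 0.5) * bin_width, sq_sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_bins) if counts[k] > 0]
```

The loop visits every pair of sites, so the cost grows with the square of the number of sites. For a data set the size of the one used in Section 4.4 (tens of thousands of sites), a vectorized or subsampled pair count would be needed in practice.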
When estimated empirically, the experimental semivariogram provides values only at discrete values of h. Hence, a continuous function is usually fitted to the discrete values to obtain the semivariogram for continuous values of h. The exponential function shown below is commonly used for this purpose.
$$\gamma(h) = a\left[1 - \exp\left(-3h/b\right)\right] \qquad (4.8)$$
where a denotes the 'sill' of the semivariogram (which equals the variance of the data) and b denotes the 'range' of the semivariogram (which equals the separation distance h at which γ(h) equals 0.95a).
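The fit of Equation 4.8 can be sketched as below. This is a minimal Python illustration of one possible procedure: an unweighted least-squares fit with a grid search over the range b. It is not necessarily the fitting method used in this study; weighting each bin by its number of pairs is a common alternative. For a fixed b the model is linear in the sill a, so the best a has a closed form. The optional `sill` argument lets a be fixed at 1, the variance of the normalized residuals.

```python
import math

def fit_exponential_semivariogram(h_vals, gamma_vals, b_grid, sill=None):
    """Least-squares fit of gamma(h) = a * (1 - exp(-3h/b)), Eq. 4.8.

    h_vals, gamma_vals: discrete experimental semivariogram values (h > 0)
    b_grid:             candidate ranges to search (simple grid search)
    sill:               fix a to this value (e.g. 1.0); otherwise a is fitted
    Returns the (a, b) pair with the smallest sum of squared errors.
    """
    best = None
    for b in b_grid:
        f = [1.0 - math.exp(-3.0 * h / b) for h in h_vals]
        if sill is None:
            # For a fixed b the model is linear in a, so a has a closed form.
            a = sum(fi * gi for fi, gi in zip(f, gamma_vals)) / sum(fi * fi for fi in f)
        else:
            a = sill
        sse = sum((gi - a * fi) ** 2 for fi, gi in zip(f, gamma_vals))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    return best[0], best[1]
```

On synthetic values generated from a = 1 and b = 30 km, a 1 km grid over b recovers b = 30 km. The achievable resolution of b is limited by the grid spacing.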
It can be shown theoretically that the spatial correlation function ρ(h) for normalized total residuals (and therefore for normalized intra-event residuals) can be computed from the semivariogram function as follows:
$$\gamma(h) = a\left(1 - \rho(h)\right) \qquad (4.9)$$
Therefore, the correlations are completely defined by the semivariogram, which in turn is a function only of the range. (The sill is known to equal 1, which is the variance of the normalized residuals for which the semivariogram is constructed.) Moreover, Equations 4.8 and 4.9 show that a larger range implies a smaller rate of increase in γ(h) with h. Consequently, correlation decays more slowly with separation distance.
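Substituting the exponential model (Equation 4.8) into Equation 4.9 with the sill fixed at a = 1 gives the correlation function explicitly. The numerical line below uses a range of b = 30 km, the value reported for Sa(T = 2 s) in Section 4.4, purely as an example.

```latex
\begin{aligned}
1 - \rho(h) &= 1 - \exp(-3h/b) \quad\Longrightarrow\quad \rho(h) = \exp(-3h/b),\\
\rho(b) &= e^{-3} \approx 0.05 \quad \text{(correlation at a separation equal to the range)},\\
b = 30~\text{km}:\quad \rho(10~\text{km}) &= e^{-1} \approx 0.37, \qquad \rho(30~\text{km}) \approx 0.05.
\end{aligned}
```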
4.4 Results and discussion
This section describes the tests carried out to verify the commonly-used assumptions of sta-
tionarity and isotropy using ground motions simulated by Dr. Brad Aagaard of the United
States Geological Survey for the 1989 Loma Prieta earthquake source model [Aagaard
et al., 2008]. Further, tests carried out to verify whether pulse-like ground motions that
arise due to directivity effects and non-pulse-like ground motions have similar correlation
structures are also described. The simulated 1989 Loma Prieta data set contains ground-
motion time histories at 35,547 sites. Soft soil sites with Vs30 ≤ 500m/s are excluded
from the tests, due to concerns about the ability of the simulation methodology to capture
nonlinear soil behavior. Also, current limitations in the simulation procedure allow us to investigate the spatial correlation of spectral accelerations only at periods of 2 s and longer. The normalized total residuals are computed from the fault normal Sa(T) values at T = 2 s, 5 s, 7.5 s and 10 s using the Boore and Atkinson [2008] ground-motion model. Using the geostatistical procedure described in the previous section, discrete semivariogram values are
estimated for these residuals, and an exponential function (Equation 4.8) is subsequently
fitted to the discrete values. Figure 4.1 shows a sample semivariogram obtained using the
residuals corresponding to Sa(T = 2s). This semivariogram has a sill of 1 and a range
of 30km. The ranges of the semivariograms obtained using the fault normal residuals at
the four different periods are plotted in Figure 4.2a. As mentioned earlier, the range is an
indicator of the extent of spatial correlation, and a larger range implies a larger amount
of spatial correlation. Figure 4.2a shows that the range, and therefore the amount of spatial correlation, increases with oscillator period. This trend is expected because the coherency between ground motions at two sites increases with period (i.e., decreases with frequency)
[Der Kiureghian, 1996]. Note that the ranges obtained from this simulated 1989 Loma
Prieta data set are slightly larger than those from recorded ground motions computed by
Jayaram and Baker [2009a] shown in Figure 4.2b. This means that this simulated ground
motion data set is more spatially correlated than real, recorded data sets analyzed so far.
While uncovering the reasons for this apparent discrepancy is beyond the scope of this study, this finding could perhaps be used to improve the simulation technique. Despite this limitation, it is assumed that the large number of simulated ground motions contains useful information for studying the isotropy and the second-order stationarity assumptions of
spatial-correlation models. These tests can be performed irrespective of the actual extent
of correlations measured.
4.4.1 Effect of ground-motion component orientation on the semivariogram range
In order to test whether the extent of spatial correlation is a function of the orientation of the ground-motion component, semivariograms of residuals are estimated using the fault normal, fault parallel, north-south and east-west components of the simulated ground motions. The ranges of these semivariograms are shown in Figure 4.3a. The range estimates are essentially identical for Sa at T = 2 s, and do not vary significantly with orientation at longer periods. Hence, most of the following analyses in this chapter are based on the fault normal components of the simulated ground motions.
Figure 4.1: Semivariogram computed using the Sa(T = 2 s) residuals.
Figure 4.2: Ranges of semivariograms obtained using residuals computed from (a) the 1989 Loma Prieta simulations and (b) recorded ground motions [Jayaram and Baker, 2009a].
4.4.2 Testing the assumption of isotropy using directional semivariograms
Directional semivariograms of residuals [Deutsch and Journel, 1998, Jayaram and Baker, 2009a] (illustrated in Chapter 3, Appendix B) are obtained as in Equation 4.7, with one difference. The estimates use only pairs $\{\varepsilon_{u_\alpha}, \varepsilon_{u_\alpha+h}\}$ for which the azimuth of the separation vector h is the same (or, strictly speaking, falls within a narrow band of azimuths) for all the pairs used. This study considers azimuth angles of 0°, 45° and 90°. If anisotropy is present in the data, the semivariograms along the pre-specified azimuths will differ from each other. They will also differ from the omni-directional semivariogram, i.e., the semivariogram obtained using all pairs of points irrespective of the azimuth. Figure 4.3b compares the omni-directional semivariogram with the semivariograms obtained by considering azimuths of 0°, 45° and 90° for residuals for Sa(T = 2 s). All the semivariograms are almost identical for separation
distances below 10km and are reasonably close for separation distances between 10km
and 20km. Recall that during the characterization of the distribution of ground-motion
intensities over a region, it is more important to capture the effects of the spatial correlation
at short separation distances since the extent of spatial correlation decreases rapidly with
separation distance. Moreover, in addition to having low correlation, widely separated sites have little impact on each other due to an effective 'screening' of their influence by more closely located sites [Deutsch and Journel, 1998]. As a result, since the semivariograms in
Figure 4.3b are nearly identical at short separation distances, it can be reasonably concluded
that, at least for this data set, the spatial correlations can be adequately represented using
an isotropic model. Tests carried out using this Loma Prieta simulated data set for residuals
computed for Sa at longer periods showed similar results as well [Bazzurro et al., 2008].
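The pair-selection step that distinguishes a directional semivariogram from the omni-directional one can be sketched in Python. The sketch rests on two assumptions. First, azimuth is measured clockwise from north on planar (east, north) coordinates. Second, the tolerance band is a user choice; the thesis does not state the band width. Because a pair (u, u + h) is the same as (u + h, u), azimuths are folded onto [0°, 180°). The selected pairs would then be binned by distance exactly as in Equation 4.7.

```python
import math

def separation_azimuth(p, q):
    """Azimuth of the separation vector from site p to site q, in degrees
    clockwise from north, folded onto [0, 180) because h and -h are equivalent.
    Sites are (x_east, y_north) tuples in a planar coordinate system."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.degrees(math.atan2(dx, dy)) % 180.0

def in_azimuth_band(p, q, azimuth_deg, tol_deg):
    """True if the pair (p, q) lies within +/- tol_deg of azimuth_deg."""
    diff = abs(separation_azimuth(p, q) - azimuth_deg % 180.0)
    return min(diff, 180.0 - diff) <= tol_deg   # wrap-around on [0, 180)

def directional_pairs(coords, azimuth_deg, tol_deg):
    """Index pairs (i, j), i < j, usable for a directional semivariogram."""
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if coords[i] != coords[j]
            and in_azimuth_band(coords[i], coords[j], azimuth_deg, tol_deg)]
```

For four sites at the corners of a unit square, the 0° band keeps only the north-south pairs and the 90° band keeps only the east-west pairs. Each diagonal is kept only by its own 45° or 135° band.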
4.4.3 Testing the assumption of second-order stationarity
A spatial random function Z is said to be second-order stationary if the random variables $Z_u$ and $Z_v$ (i.e., the random variables that represent the values of Z at locations u and v,
respectively) have constant means and second-order statistics (i.e., the covariances) that
depend only on the distance vector between u and v and not on the actual locations. In
other words, the covariance is the same between any two sites that are separated by the
same distance and direction (direction is not a concern for isotropic semivariograms), no
matter where the sites are located with respect to the causative fault. The assumption of
second-order stationarity is convenient while developing correlation models since it allows
the data available over the entire region of interest to be pooled together and because it
considerably simplifies the application of the spatial correlation models.
We know that the means of the residuals equal zero irrespective of the location of the
residuals. Therefore, second-order stationarity can be tested by comparing the spatial cor-
relation estimates obtained using residuals located in different spatial domains (i.e., using
data from two groups of sites, one close to the fault and one far from it). Similar semi-
variograms imply that the actual spatial location of the sites where the ground-motion
intensities are measured does not matter. In the current work, seven spatial domains are defined based on the distance of the sites from the rupture. Domain 1 includes sites within 0-20 km of the rupture, while Domains 2-7 consist of sites within 20-40 km, 40-60 km, 60-80 km, 80-120 km, 120-160 km and 160-200 km of the rupture, respectively. Note that, as with histograms, the selection of the distance bins is somewhat arbitrary. Very narrow bins may
provide results that are both unstable because of scarcity of data and potentially influenced
by local effects (e.g., a cluster of sites with large residuals). Conversely, very broad bins
may not detect any trend in the data, even if there is one. Here, the width of the domains is
selected judiciously to avoid both the above pitfalls.
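The domain assignment described above can be sketched in Python as a lookup of each site's distance from the rupture. Treating each bin as closed at its lower edge and open at its upper edge ([lower, upper)) is an assumption, as is the choice to drop sites beyond 200 km. A semivariogram would then be estimated separately for each group of sites, as in Equation 4.7.

```python
import bisect

# Domain edges in km from the rupture, as listed in the text:
# Domain 1 = 0-20 km, ..., Domain 7 = 160-200 km.
# Treating each domain as [lower, upper) is an assumption of this sketch.
DOMAIN_EDGES_KM = [0, 20, 40, 60, 80, 120, 160, 200]

def domain_of(distance_km):
    """Return the domain number (1-7) for a site, or None outside 0-200 km."""
    if distance_km < DOMAIN_EDGES_KM[0] or distance_km >= DOMAIN_EDGES_KM[-1]:
        return None
    return bisect.bisect_right(DOMAIN_EDGES_KM, distance_km)

def group_by_domain(distances_km):
    """Map each domain number to the indices of the sites it contains."""
    groups = {}
    for idx, d in enumerate(distances_km):
        dom = domain_of(d)
        if dom is not None:
            groups.setdefault(dom, []).append(idx)
    return groups
```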
The 1989 Loma Prieta fault normal ground motions are used to compute ε values at
four different periods, namely, 2s, 5s, 7.5s and 10s. Semivariograms are constructed for
each spatial domain using only the residuals at sites that belong to that domain, and the
estimated ranges are reported in Figure 4.4a. It can be seen that the ranges estimated using
residuals at sites within 20-160 km of the rupture are reasonably close to the range estimated using all fault normal residuals ('all-site ranges'). However, the ranges computed using residuals at sites very close to or very far from the rupture differ more significantly from the all-site ranges. Semivariograms computed using sites farther than 160 km from the rupture show significantly smaller ranges, as do the semivariograms computed using sites within 20 km of the rupture. The ground-motion intensities at sites farther than 160 km from the rupture are generally very small. Therefore, accounting for the reduced correlations at these extremely distant sites is not critical. It is, however, important to further analyze the smaller correlations observed at near-fault locations. Intuitively, it is reasonable to expect path effects and small-scale variations to reduce the spatial correlation between ground motions at near-fault sites. At sites farther than 20 km, the path effects and small-scale variations have less differential influence, resulting in larger ranges and, therefore, larger correlations.
Figure 4.3: (a) Ranges computed using residuals at different orientations; (b) omni-directional (i.e., obtained using all pairs of points, irrespective of the azimuth) and directional semivariograms computed using residuals for Sa(T = 2 s).
Figure 4.4: (a) Ranges computed using residuals from different spatial domains; (b) ranges computed using pulse-like and non-pulse-like near-fault ground motions.
4.4.4 Effect of directivity on spatial correlation
Ground motions at near-fault sites are often influenced by directivity effects, resulting in
large amplitude pulse-like ground motions in the forward-directivity region [Somerville
et al., 1997]. Most ground-motion models, however, do not explicitly capture this effect.
Therefore, the residuals in such cases may be more correlated because of the additional
prediction errors at sites influenced by directivity that are not captured by the ground-
motion model. This study intends to verify whether the spatial correlation between pulse-
like ground motions is different from that between non-pulse-like ground motions.
Baker [2007a] developed a technique that uses wavelet analysis to identify ground mo-
tions with pulses. Although not all the pulses identified by this technique are due to direc-
tivity effects, this approach provides a reasonable data set for studying the potential impact
of directivity. The wavelet analysis procedure of Baker [2007a] is used to identify 434
pulses in the fault normal components of 1989 Loma Prieta simulations (incidentally, the
wavelet analysis procedure also identified 121 pulses in the fault parallel direction, which
are not utilized here). Residuals at four different periods are computed based on these
ground motions and semivariograms of the residuals are developed. The estimated ranges
(shown in Figure 4.4b) of these semivariograms are smaller than those estimated based
on all the fault normal residuals, but similar to those estimated based on ground motions
at all the sites within 20 km of the rupture (Figure 4.4a). For comparison, Figure 4.4b also shows the ranges obtained using ground motions at all the sites that do not
have pulse-like ground motions, but are within 20 km from the rupture (called near-fault
non-pulse records in the legend). It is seen that the ranges obtained in this case are similar
to the ranges obtained using pulses. This indicates that the effect of directivity does not
substantially alter the ranges of the semivariograms. It needs to be verified whether similar
observations can be made using recorded ground motions as well.
4.5 Conclusions
This study investigates the validity of two commonly-used assumptions in spatial correlation models: stationarity (no variation of correlation with location) and isotropy (no directional dependence). In other words, it asks whether features such as non-stationarity and anisotropy need to be modeled. Testing the need for these additional features, however, requires a large number of ground-motion time histories. Since real data are sparse, the tests are performed here using simulated ground motions instead. This chapter describes the tests performed using ground-motion time histories simulated by Dr. Brad Aagaard for the 1989 Loma Prieta earthquake source model. Other data sets were considered in Bazzurro et al. [2008].
Geostatistical tools were used to measure the extent of spatial correlation between spec-
tral accelerations using the simulated ground-motion data set. The correlations were esti-
mated using different orientations of the time histories, namely, fault normal, fault parallel,
north-south and east-west, and were found to be similar in all four cases. The assumption
of isotropy of spatial correlations was studied using directional semivariograms, and was
found to be reasonable. The correlations were seen to be smaller than average between
sites located extremely close to the fault rupture. Intuitively, it is reasonable to expect path
effects and small-scale variations to reduce spatial correlation between ground motions at
near-fault sites. Incidentally, the ground-motion intensities at sites very far away from the
rupture were also found to be less spatially correlated than average, but this finding is of little practical importance. It is important, however, to further investigate the smaller
correlations seen at near-fault sites. The pulse-identification algorithm of Baker [2007a]
was used for identifying pulse-like ground motions, and the correlations between pulse-
like and non-pulse-like ground motions were compared. The study, however, did not find
significant differences between the correlations in these two cases. Although some addi-
tional investigation using recorded time histories is needed, this study tests and provides a
basis for some of the subtle assumptions commonly used in spatial correlation models.
Chapter 5
Simulation of spatially-correlated ground-motion intensities with and without consideration of recorded intensity values
5.1 Abstract
Quantifying the distribution of ground-motion intensities that might exist over a spatially
distributed region during a future earthquake is important for several practical applications
such as risk assessment and risk mitigation of spatially-distributed systems. Analytically,
this is more complicated than a comparable quantification for only a single site, due to the
interdependence between the intensities at multiple sites. As a result, simulation-based
techniques are often used to quantify this distribution using probabilistically generated
representative ground-motion intensity maps for the region. This chapter discusses two
techniques, namely, single-step simulation and sequential simulation, for generating such
intensity maps.
It may also be of interest to estimate likely ground-motion intensities over a region in
the wake of an earthquake, when ground-motion intensities have been recorded at one or
CHAPTER 5. SIMULATION OF GROUND-MOTION INTENSITY FIELDS 94
more locations in the region. These intensity estimates are useful, for instance, in deter-
mining optimal post-earthquake response strategies. In such cases, it is possible to use
ground-motion intensities recorded during the earthquake to improve the ground-motion
intensity prediction at sites where recordings are not available. This chapter discusses a se-
quential simulation technique for generating ground-motion intensity maps incorporating
the information about the recorded intensities.
5.2 Introduction
Quantifying the distribution of ground-motion intensities that might exist over a spatially-
distributed region during a future earthquake is of great interest for several practical ap-
plications. This is important, for instance, to predict (or estimate after an earthquake) the
damage to portfolios of buildings and lifelines and the number of injuries and casualties
in a certain region. This is, however, more complicated than a comparable quantification
at a single site on account of the spatial correlation between the ground-motion intensities
at two different sites. Hence, the distribution of spatial ground-motion intensities is of-
ten quantified using simulation-based approaches that involve probabilistically generating
representative ground-motion intensity maps (a collection of intensities at all the sites of
interest) for future earthquakes. For a given earthquake, the intensities are predicted using
ground-motion models which take the following form [e.g., Boore and Atkinson, 2008,
Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]:
$$\ln(Sa_i) = \ln(\overline{Sa}_i) + \sigma_i\varepsilon_i + \tau_i\eta_i \qquad (5.1)$$
where $Sa_i$ denotes the spectral acceleration (at the period of interest) at site i; $\overline{Sa}_i$ denotes the median spectral acceleration predicted by the ground-motion model, which depends
on parameters such as magnitude, distance, period and local-site conditions; εi denotes the
normalized intra-event residual and ηi denotes the normalized inter-event residual. Both εi
and ηi are univariate normal random variables with zero mean and unit standard deviation.
σi and τi are standard deviation terms that are estimated as part of the ground-motion model
and are functions of the spectral period of interest, and in some models also functions of the
earthquake magnitude and the distance of the site from the rupture. The term σiεi is called
the intra-event residual and the term τiηi is called the inter-event residual. The inter-event
residual is a constant across all the sites for any particular earthquake event. The sum of
the inter-event residual and the intra-event residual is called the total residual.
For a given earthquake, ground-motion intensities can be predicted by combining the
median intensity estimate with simulated values (realizations) of the normalized inter-event
and intra-event residuals, in accordance with Equation 5.1. Past research has indicated that
the normalized intra-event residuals at two different sites are correlated, and the extent of
this correlation depends on the separation distance between the sites [e.g., Jayaram and
Baker, 2009a, Wang and Takada, 2005, Boore et al., 2003] (Chapter 3 of this thesis). Any
simulation of the normalized intra-event residuals must account for this spatial correla-
tion in order to accurately quantify the regional ground-motion hazard [e.g., Jayaram and
Baker, 2010, Park et al., 2007] (Chapter 6 of this thesis). For illustration, Figure 5.1 shows
a sample simulated ground-motion intensity map (the intensity measure used here is Sa(1s),
the spectral acceleration at a period of 1 second) for a magnitude 8 earthquake on the San
Andreas fault. Figure 5.1a shows the median Sa(1s) values estimated using the Boore and
Atkinson [2008] ground-motion model. Figure 5.1b shows a sample realization of the sum
of the inter-event and the intra-event residuals, obtained considering spatial correlation.
Figure 5.1c shows the ground-motion intensities over the region obtained by combining
the median intensities and the simulated residuals.
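The final combination step of Equation 5.1 can be illustrated with the minimal Python sketch below. It converts median intensities and simulated normalized residuals into intensities at each site. The inputs (medians, σ_i, τ_i, ε_i and η_i) are assumed to be supplied by the user, for example from a ground-motion model and from the simulation procedures described later in this chapter. The function name is hypothetical.

```python
import math

def intensity_map(median_sa, sigma, tau, eps, eta):
    """Combine medians and residuals following Eq. 5.1:
    ln(Sa_i) = ln(median_i) + sigma_i * eps_i + tau_i * eta_i.
    All arguments are equal-length lists, one entry per site."""
    return [m * math.exp(s * e + t * n)
            for m, s, t, e, n in zip(median_sa, sigma, tau, eps, eta)]
```

With all residuals set to zero the map reduces to the median intensities. Positive residuals scale a site's intensity up multiplicatively.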
While the above simulation technique is used for generating ground-motion maps in
the absence of any recorded intensities (say, for a future earthquake), it is often of interest
to quantify the ground-motion intensities over a region following an earthquake (e.g., for
determining the optimal post-earthquake emergency response strategy). Ground-motion
intensity predictions in such cases can be significantly improved (in other words, the un-
certainty in the predictions can be reduced) by utilizing the knowledge about the recorded
intensities.
This chapter primarily focuses on the simulation of correlated residuals with and without consideration of recorded ground-motion intensities. A single-step simulation technique and a sequential simulation technique are described for simulating residuals in the absence of recorded intensities. The sequential simulation technique is subsequently extended to incorporate information from recorded intensities.
Figure 5.1: Ground-motion intensity map simulation: (a) median intensities, (b) spatially-correlated normalized total residuals and (c) total intensities.
This chapter is organized as follows. Sections 5.3.1 and 5.3.2 describe procedures for
simulating a vector of spatially-correlated intra-event residuals and the inter-event resid-
ual for future earthquakes. Section 5.4 describes an importance sampling procedure for
spatially-correlated residuals (used by Jayaram and Baker [2010] for improving the com-
putational efficiency of the lifeline risk assessment process). Section 5.5 describes a simu-
lation procedure that uses information about recorded ground-motion intensities for simu-
lating post-earthquake residuals.
5.3 Simulation of correlated normalized residuals without consideration of recorded ground-motion intensities
This section describes a single-step and a sequential simulation technique for simulating
correlated normalized residuals.
5.3.1 Single-step simulation technique
Simulation of normalized intra-event residuals
Chapter 2 [Jayaram and Baker, 2008] showed that a vector of spatially-distributed normalized intra-event residuals $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_p)$ (where p denotes the total number of sites of
interest) follows a multivariate normal distribution. This distribution is solely defined by
the mean and the variance of the marginal distributions (i.e., the mean and the variance of
εi, which are zero and one respectively), and the correlation between all εi and ε j pairs. The
correlation between the residuals is typically a function of the separation distance between
the residuals, and can be obtained from empirical spatial correlation models [e.g., Jayaram
and Baker, 2009a] (Chapter 3).
The single-step simulation technique makes use of this fact to simulate normalized
residuals as a vector of correlated standard normal random variables. In practice, this
is done using a computer function if available. For instance, the command ‘mvnrnd’ in
MATLAB accepts input mean and covariance matrices and outputs a vector of correlated
normally-distributed random variables. The mean vector in this case is a vector of p zeros, expressed as follows:
$$\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (5.2)$$
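The covariance matrix of $\boldsymbol{\varepsilon}$, denoted $\boldsymbol{\Sigma}$, can be expressed as follows:
$$\boldsymbol{\Sigma} = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix} \qquad (5.3)$$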
where $\rho_{ij}$ is the correlation between $\varepsilon_i$ and $\varepsilon_j$, which also equals their covariance because each residual has unit variance. Chapter 3 [Jayaram and Baker, 2009a] expressed $\rho_{ij}$ as $\exp(-3h/b)$, where h is the separation distance between sites i and j and b is the range parameter, which controls the rate of decay of correlation with distance.
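The matrix of Equation 5.3 can be assembled from site coordinates with the exponential correlation model, as in the minimal Python sketch below. It assumes planar site coordinates in km and a single range b for the whole region; b is period-dependent in Chapter 3 and would be chosen accordingly. Because the residuals have unit variance, the result serves directly as the covariance matrix.

```python
import math

def correlation_matrix(coords, range_b):
    """Correlation (= covariance, since each variance is 1) matrix of the
    normalized intra-event residuals, using rho_ij = exp(-3 h_ij / b).
    coords: list of (x, y) site coordinates in km (planar; an assumption)."""
    p = len(coords)
    return [[math.exp(-3.0 * math.dist(coords[i], coords[j]) / range_b)
             for j in range(p)] for i in range(p)]
```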
In principle, the random variables can also be simulated in two stages. First, independent standard normal random variables $\mathbf{n} = [n_1, n_2, \cdots, n_p]$ are simulated (for instance, using the Box-Muller transform as described in Law and Kelton [2007]). Second, the desired correlation is induced between these independent variables using the lower-triangular Cholesky factor of $\boldsymbol{\Sigma}$. The procedure used to induce this correlation is described below.
ΣΣΣ can be decomposed using the Choleskey decomposition [Law and Kelton, 2007] as
follows:
ΣΣΣ = LLLLLLt (5.4)
where LLL is a lower triangular matrix of size p by p and (.)t denotes the transpose operation.
The vector of independent standard normal variable realizations (nnn) can be converted to a
vector of correlated standard normal variables ($\mathbf{e} = [e_1, e_2, \cdots, e_p]$) as follows:

$$\mathbf{e}^t = \mathbf{L}\mathbf{n}^t \tag{5.5}$$

This vector $\mathbf{e}$ serves as a realization of $\boldsymbol{\varepsilon}$.
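A dependency-free Python sketch of Equations 5.4 and 5.5 follows. It factors $\boldsymbol{\Sigma}$ with a textbook Cholesky routine and maps independent standard normal draws $\mathbf{n}$ to correlated draws $\mathbf{e}^t = \mathbf{L}\mathbf{n}^t$. In practice one would call a library routine (for example `numpy.linalg.cholesky`) instead; the 3 × 3 correlation matrix and the seed are hypothetical.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with a = L L^t (Eq. 5.4); a must be positive definite."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def correlated_normals(L, rng):
    """Eq. 5.5: draw independent N(0, 1) values n and return e = L n."""
    p = len(L)
    n = [rng.gauss(0.0, 1.0) for _ in range(p)]
    return [sum(L[i][k] * n[k] for k in range(i + 1)) for i in range(p)]


# Hypothetical 3-site correlation matrix (unit diagonal, as in Eq. 5.3).
Sigma = [[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.4],
         [0.2, 0.4, 1.0]]
L = cholesky(Sigma)
e = correlated_normals(L, random.Random(1))
```

The factorization is the $O(p^3)$ step discussed in Section 5.3.2; drawing each realization afterwards costs only $O(p^2)$.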
Simulation of the normalized inter-event residuals
Following standard conventions, since the inter-event residual is a constant across all the sites during a single earthquake [e.g., Abrahamson and Youngs, 1992], the simulated normalized inter-event residuals should satisfy the following relation (which does not assume that the $\tau_i$'s are equal, in order to be compatible with ground-motion models such as that of Abrahamson and Silva [2008]):

$$\eta_i = \frac{\tau_1}{\tau_i}\eta_1 \tag{5.6}$$
Thus the normalized inter-event residuals can be simulated by first simulating η1 from a
univariate normal distribution with zero mean and unit standard deviation (using randn or
mvnrnd in MATLAB for instance), and by subsequently evaluating other normalized inter-
event residuals using Equation 5.6.
Incidentally, if all the τi’s are equal, the ηi’s will be equal as well (=η). In this case,
the value of η can be simulated as a univariate normal variable with zero mean and unit
standard deviation.
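Once $\eta_1$ is drawn, Equation 5.6 is a one-line computation. The Python sketch below implements it; the $\tau_i$ values and the seed are hypothetical.

```python
import random


def inter_event_residuals(taus, rng):
    """Eq. 5.6: draw eta_1 ~ N(0, 1) and set eta_i = (tau_1 / tau_i) * eta_1.

    This keeps the inter-event residual tau_i * eta_i identical at every site.
    """
    eta1 = rng.gauss(0.0, 1.0)
    return [taus[0] / tau_i * eta1 for tau_i in taus]


taus = [0.25, 0.30, 0.28]  # hypothetical site-specific tau_i values
etas = inter_event_residuals(taus, random.Random(7))
```

With equal $\tau_i$'s the ratio is exactly one, so every $\eta_i$ equals $\eta_1$, as noted above.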
Summary of the steps involved
In summary, the steps involved in the single-step simulation procedure are as follows:
• Step 1: Estimate the mean (Equation 5.2) and the covariance (Equation 5.3) matrices
of the residuals. The covariances can be computed from a spatial-correlation model
such as that of Jayaram and Baker [2009a].
• Step 2: Use a computer function such as mvnrnd to generate p jointly normally-distributed random variables using the mean and the covariance matrices. If such a function is not available, the variables can be simulated by first simulating p independent variables, and by subsequently inducing the correlation using the Cholesky triangle, as described earlier in the section.
• Step 3: Simulate a normalized inter-event residual η1 from a univariate normal dis-
tribution with zero mean and unit standard deviation (using the same approach used
in Step 2). Estimate the other ηi’s using Equation 5.6. If all the τi’s are equal, all the
ηi’s will equal η1.
• Step 4: Obtain the spectral acceleration at all the sites by combining the medians and
the normalized inter- and intra-event residuals according to Equation 5.1.
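Step 4 is an element-wise combination. The Python sketch below assumes that Equation 5.1 (not reproduced in this section) has the same form as Equation 6.1 later in this report, $\ln Sa_i = \ln \overline{Sa}_i + \sigma_i\varepsilon_i + \tau_i\eta_i$; the medians, standard deviations and residuals in the example are hypothetical.

```python
import math


def spectral_accelerations(medians, sigmas, taus, eps, etas):
    """Step 4: ln Sa_i = ln(median_i) + sigma_i * eps_i + tau_i * eta_i (form assumed from Eq. 6.1)."""
    return [math.exp(math.log(m) + s * e + t * n)
            for m, s, t, e, n in zip(medians, sigmas, taus, eps, etas)]


# Hypothetical inputs for three sites (medians in g).
medians = [0.20, 0.15, 0.10]
sigmas = [0.50, 0.50, 0.50]
taus = [0.25, 0.25, 0.25]
sa = spectral_accelerations(medians, sigmas, taus,
                            eps=[1.0, 0.0, -1.0], etas=[0.0, 0.0, 0.0])
```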
5.3.2 Sequential simulation technique
Sequential simulation of intra-event residuals
The single-step simulation technique described previously is computationally inefficient when p is large, because the Cholesky decomposition (Equation 5.4) is an $O(p^3)$ operation. One alternative to the single-step simulation technique is the
sequential simulation technique [Goovaerts, 1997, Deutsch and Journel, 1998] that lends
itself to performing computationally efficient simulations. In this technique, the residu-
als are simulated one at a time, conditioned on the residuals previously simulated. This
conditioning ensures that correlation between the residuals is appropriately accounted for.
The residuals can be simulated in any order as long as each residual is conditioned on ev-
ery other previously simulated residual. The following paragraphs describe the sequential
simulation technique for obtaining p intra-event residuals.
First, obtain e1 (a realization of ε1) by sampling from a univariate normal distribution
with zero mean and unit standard deviation. The other ei’s can be obtained using the proce-
dure described below for simulating εi assuming that ε1,ε2, · · · ,ε(i−1) have been previously
simulated.
Let $e_1, e_2, \cdots, e_{i-1}$ denote the simulated values of the normalized intra-event residuals $\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{i-1}$. Since the $\varepsilon$'s follow a multivariate normal distribution, $\varepsilon_i$ conditioned on $[\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{i-1}]$ follows a univariate normal distribution with the following conditional mean [Johnson and Wichern, 2007]:

$$E[\varepsilon_i \mid \varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{i-1}] = \boldsymbol{\Sigma}_{iO}\boldsymbol{\Sigma}_{OO}^{-1}\mathbf{e}_O \tag{5.7}$$
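where $\mathbf{e}_O = [e_1, e_2, \cdots, e_{i-1}]^t$, $\boldsymbol{\Sigma}_{OO}$ is the covariance matrix of $[\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{i-1}]$, and $\boldsymbol{\Sigma}_{iO}$ is a row vector of covariances between $\varepsilon_i$ and $[\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{i-1}]$. The symbol $O$ denotes the set of sites at which the residuals have been previously simulated. $\boldsymbol{\Sigma}_{iO}$ is thus defined as follows:

$$\boldsymbol{\Sigma}_{iO} = \begin{bmatrix} \rho_{i1} & \rho_{i2} & \cdots & \rho_{i(i-1)} \end{bmatrix} \tag{5.8}$$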
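and $\boldsymbol{\Sigma}_{OO}$ is defined as follows:

$$\boldsymbol{\Sigma}_{OO} = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1(i-1)} \\ \rho_{21} & 1 & \cdots & \rho_{2(i-1)} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{(i-1)1} & \rho_{(i-1)2} & \cdots & 1 \end{bmatrix} \tag{5.9}$$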
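The variance of $\varepsilon_i$ conditioned on $[\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{i-1}]$ is expressed as follows:

$$\mathrm{var}[\varepsilon_i \mid \varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{i-1}] = 1 - \boldsymbol{\Sigma}_{iO}\boldsymbol{\Sigma}_{OO}^{-1}\boldsymbol{\Sigma}_{Oi} \tag{5.10}$$

where $\boldsymbol{\Sigma}_{Oi}$ is the transpose of $\boldsymbol{\Sigma}_{iO}$.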
ei is now obtained as a realization from a univariate normal distribution with the mean in
Equation 5.7 and the variance in Equation 5.10. This simulation can be performed using the
Box-Muller method [Law and Kelton, 2007] or using a computer function such as ‘randn’
in MATLAB.
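The conditional moments in Equations 5.7 and 5.10 can be computed without forming $\boldsymbol{\Sigma}_{OO}^{-1}$ explicitly, by solving the linear system $\boldsymbol{\Sigma}_{OO}\mathbf{w} = \boldsymbol{\Sigma}_{Oi}$ once and reusing $\mathbf{w}$. The Python sketch below does this with a small hand-written Gaussian elimination (a library solver such as `numpy.linalg.solve` would normally be used); the correlations and residual values are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def conditional_moments(sigma_io, sigma_oo, e_o):
    """Eqs. 5.7 and 5.10 for unit-variance residuals; returns (mean, variance)."""
    w = solve(sigma_oo, sigma_io)  # w = Sigma_OO^{-1} Sigma_Oi (Sigma_OO is symmetric)
    mean = sum(wk * ek for wk, ek in zip(w, e_o))
    var = 1.0 - sum(wk * sk for wk, sk in zip(w, sigma_io))
    return mean, var


# Hypothetical example: two previously simulated residuals.
mean, var = conditional_moments([0.5, 0.2], [[1.0, 0.5], [0.5, 1.0]], [1.0, -1.0])
e_i = mean + math.sqrt(var) * random.Random(3).gauss(0.0, 1.0)
```

With a single conditioning residual of correlation $\rho$, this reduces to the familiar mean $\rho e_1$ and variance $1 - \rho^2$.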
As mentioned earlier, the primary reason for using the sequential simulation technique is to achieve higher computational efficiency. The basic sequential simulation technique as described above, however, does not provide much benefit, since it requires computing the inverses of several $\boldsymbol{\Sigma}_{OO}$ matrices, some of which will be large if the number of conditioning sites is large. Hence, in practice, the number of conditioning sites is always kept small even if a large number of residuals have previously been simulated. This is typically done by conditioning $\varepsilon_i$ on the q closest $\varepsilon$'s (closest in terms of the Euclidean distance between the associated sites). This is reasonable because it has been observed in practice that $\varepsilon_i$ is screened by nearby $\varepsilon$'s from the effect of far-away $\varepsilon$'s [Goovaerts, 1997]. Due to this screening effect, the far-away residuals can be ignored without significantly affecting the statistical properties of the simulated residuals. The value of q is typically chosen to be between 10 and 30 to balance accuracy against computational efficiency. Alternatively, $\varepsilon_i$ can be conditioned on only the residuals at sites that are within a distance r of site i, as illustrated in Figure 5.2. A typical value for r is 30 km.

Figure 5.2: Illustration of the sequential step procedure.
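Both ways of limiting the conditioning set (the q closest sites, or the sites within r) amount to a nearest-neighbour filter over the previously simulated sites. The Python sketch below is a brute-force version for illustration (a spatial index such as a k-d tree would be the usual choice for large p); the function name and coordinates are hypothetical.

```python
import math


def conditioning_set(i, coords_km, q=None, r_km=None):
    """Indices of previously simulated sites 0..i-1 used to condition site i.

    q keeps at most the q closest sites; r_km keeps only sites within r_km of site i.
    Either criterion, or both, may be applied. Indices are returned sorted by distance.
    """
    dist = {j: math.dist(coords_km[i], coords_km[j]) for j in range(i)}
    nearby = sorted(dist, key=dist.get)
    if r_km is not None:
        nearby = [j for j in nearby if dist[j] <= r_km]
    if q is not None:
        nearby = nearby[:q]
    return nearby


# Hypothetical sites along a line (km); site 3 is the one being simulated.
coords = [(0.0, 0.0), (10.0, 0.0), (50.0, 0.0), (1.0, 0.0)]
```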
When the residuals are not conditioned on all other previously simulated residuals,
Goovaerts [1997] reports that the order in which the residuals are simulated should be
randomized during the simulation of each ground-motion intensity map to avoid any bias.
This is, however, computationally inefficient, since it necessitates computing and inverting many more $\boldsymbol{\Sigma}_{iO}$ and $\boldsymbol{\Sigma}_{OO}$ matrices (the matrices now vary from simulated map to simulated map). The authors' experience shows that using a single fixed order causes negligible bias in the results for all practical purposes, and hence a fixed order can be used to save significant computational effort (noting that $\boldsymbol{\Sigma}_{iO}$ and $\boldsymbol{\Sigma}_{OO}^{-1}$ are identical across simulations if a fixed order is assumed).
The inter-event residual simulation is identical to that described in Section 5.3.1.
Summary of the steps involved
In summary, the steps involved in the sequential simulation technique are as follows:
• Step 1: Simulate e1 (a realization of ε1 from a univariate normal distribution with
zero mean and unit standard deviation). Set variable i = 2.
• Step 2: Simulate $e_i$ conditioned on the previously simulated residuals $\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{i-1}$ (or just the closest q residuals, or the residuals that are within a distance r of site i) from a univariate normal distribution with the mean in Equation 5.7 and the variance in Equation 5.10.
• Step 3: Increment i by 1. If i is less than or equal to p, go to Step 2; else go to Step 4.
• Step 4: Simulate the normalized inter-event residuals as described in Section 5.3.1.
• Step 5: Obtain the spectral acceleration at all the sites by combining the medians and
normalized inter- and intra-event residuals according to Equation 5.1.
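The intra-event part of these steps (Steps 1 to 3) can be sketched end to end in Python. Because a single fixed simulation order causes negligible bias, this sketch precomputes, once per site, the conditioning set, the weights $\boldsymbol{\Sigma}_{iO}\boldsymbol{\Sigma}_{OO}^{-1}$ and the conditional standard deviation, and then reuses them for every simulated map. It conditions on the q nearest previously simulated sites and uses the exponential correlation model quoted earlier from Chapter 3; the coordinates, b and q values and all function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def build_plan(coords_km, b_km, q):
    """Per site, in a fixed order: neighbours, weights Sigma_OO^{-1} Sigma_Oi, conditional sd."""
    def rho(i, j):
        return math.exp(-3.0 * math.dist(coords_km[i], coords_km[j]) / b_km)

    plan = []
    for i in range(len(coords_km)):
        nb = sorted(range(i), key=lambda j: math.dist(coords_km[i], coords_km[j]))[:q]
        if not nb:  # Step 1: the first site is unconditioned N(0, 1)
            plan.append(([], [], 1.0))
            continue
        s_io = [rho(i, j) for j in nb]
        w = solve([[rho(a, c) for c in nb] for a in nb], s_io)
        var = 1.0 - sum(wk * sk for wk, sk in zip(w, s_io))  # Eq. 5.10
        plan.append((nb, w, math.sqrt(max(var, 0.0))))
    return plan


def simulate_intra(plan, rng):
    """Steps 1-3: simulate e_1, ..., e_p one at a time using the cached plan."""
    e = []
    for nb, w, sd in plan:
        mean = sum(wk * e[j] for wk, j in zip(w, nb))  # Eq. 5.7
        e.append(mean + sd * rng.gauss(0.0, 1.0))
    return e


coords = [(0.0, 0.0), (5.0, 0.0), (0.0, 20.0), (30.0, 30.0)]  # hypothetical sites (km)
plan = build_plan(coords, b_km=20.0, q=2)
maps = [simulate_intra(plan, random.Random(seed)) for seed in range(3)]
```

All matrix work happens in `build_plan`; each additional map costs only about q multiply-adds per site.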
5.4 Importance sampling of normalized intra-event residuals
Sometimes, it is of interest to preferentially sample ground-motion intensity maps with pos-
itive residuals in order to evaluate the performance of structures and lifelines under extreme
events [e.g., Jayaram and Baker, 2010, 2009b]. In such cases, the normalized residuals can
be sampled from an alternate distribution that produces a larger number of positive residu-
als. This procedure of using an alternate distribution for preferential sampling is known as
importance sampling [Law and Kelton, 2007].
Figure 5.3: The alternate sampling distribution (marginal distribution) used for the importance sampling of residuals [Jayaram and Baker, 2010].
Jayaram and Baker [2010] sampled from a multivariate normal distribution with a pos-
itive mean for the marginal distributions of the normalized intra-event residuals as the al-
ternate sampling distribution (Figure 5.3), in order to preferentially generate positive resid-
uals. This choice was based on the simplicity of the corresponding importance sampling
weights, a parameter (discussed subsequently) that needs to be computed as part of the
importance sampling procedure.
There are minor differences between the sampling procedures using the original (zero
mean distribution) and the alternate (positive mean distribution) sampling distributions, and
these are listed below.
In the single-step simulation technique, Equation 5.5 is replaced by the following equation:

$$\mathbf{e}^t = m\mathbf{1}_p + \mathbf{L}\mathbf{n}^t \tag{5.11}$$

where $m$ is the mean of the alternate sampling distribution, and $\mathbf{1}_p$ denotes a column vector of ones of size $p$. Since the vector $\mathbf{n}$ is sampled from a zero-mean distribution, the mean of the sampled residuals $\mathbf{e}$ equals the mean of the alternate sampling distribution.
Equation 5.2 is replaced by

$$\boldsymbol{\mu} = \begin{bmatrix} m \\ m \\ \vdots \\ m \end{bmatrix} \tag{5.12}$$
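If the sequential simulation technique is used, Equation 5.7 is replaced by the following equation:

$$E[\varepsilon_i \mid \varepsilon_1, \varepsilon_2, \cdots, \varepsilon_{i-1}] = m + \boldsymbol{\Sigma}_{iO}\boldsymbol{\Sigma}_{OO}^{-1}(\mathbf{e}_O - m\mathbf{1}_p) \tag{5.13}$$

It is to be noted that Equation 5.10 remains unaltered.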
The rest of this section discusses the computation of the importance sampling weight for this choice of the alternate sampling distribution. The importance sampling weight can be viewed as a correction factor that accounts for the differences between the sampling distribution and the true distribution. Suppose that we are interested in using a simulation-based approach to compute the expected value of an arbitrary function of $\boldsymbol{\varepsilon}$, denoted $q(\boldsymbol{\varepsilon})$. Let $f(\boldsymbol{\varepsilon})$ denote the probability density function (PDF) of the normalized intra-event residuals, and $g(\boldsymbol{\varepsilon})$ denote the alternate PDF. The expected value of $q(\boldsymbol{\varepsilon})$ (denoted $H$) can be evaluated as follows:

$$H = \int_D q(\mathbf{e}) f(\mathbf{e})\, d\mathbf{e} \tag{5.14}$$

where $D$ is the set of all values taken by $\mathbf{e}$.
The integral can be rewritten as follows:

$$H = \int_D q(\mathbf{e}) \frac{f(\mathbf{e})}{g(\mathbf{e})} g(\mathbf{e})\, d\mathbf{e} \tag{5.15}$$

Equation 5.15 shows that $H$ can be computed using samples from the alternate PDF in place of samples from the true PDF if the function $q(\mathbf{e})$ is multiplied by the correction factor $f(\mathbf{e})/g(\mathbf{e})$. This correction factor is called the importance sampling weight.
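A one-dimensional illustration of Equation 5.15, in Python: estimate $P(\varepsilon > 2)$ for a standard normal $\varepsilon$ by sampling from a shifted normal $g$ with mean $m$ and unit variance, weighting each sample by $f(x)/g(x) = \exp(-mx + m^2/2)$. The exceedance indicator as $q$, the shift $m = 2$, the sample size and the seed are hypothetical choices; the exact answer $1 - \Phi(2) \approx 0.0228$ is available for comparison.

```python
import math
import random


def is_exceedance_probability(threshold, m, n, rng):
    """Estimate P(eps > threshold), eps ~ N(0, 1), using samples from g = N(m, 1) (Eq. 5.15).

    For unit-variance normals the weight is f(x) / g(x) = exp(-m x + m^2 / 2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(m, 1.0)
        if x > threshold:  # q(x) is the exceedance indicator
            total += math.exp(-m * x + 0.5 * m * m)
    return total / n


estimate = is_exceedance_probability(2.0, m=2.0, n=20000, rng=random.Random(5))
exact = 0.5 * math.erfc(2.0 / math.sqrt(2.0))
```

Under conventional sampling ($m = 0$) only about 2% of samples fall in the region of interest; with $m = 2$ about half do, which is why the weighted estimate converges faster here.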
In the specific application discussed in this chapter, the distributions f (eee) and g(eee) are
known to be multivariate normal, and are expressed as follows:
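$$f(\mathbf{e}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}\mathbf{e}^t\boldsymbol{\Sigma}^{-1}\mathbf{e}\right] \tag{5.16}$$

where $\boldsymbol{\Sigma}$ denotes the covariance matrix of $\boldsymbol{\varepsilon}$ (Equation 5.3).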
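$$g(\mathbf{e}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{e} - m\mathbf{1}_p)^t\boldsymbol{\Sigma}^{-1}(\mathbf{e} - m\mathbf{1}_p)\right] \tag{5.17}$$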
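When the single-step simulation technique is used, the importance sampling weight is computed as follows:

$$\frac{f(\mathbf{e})}{g(\mathbf{e})} = \exp\left[\frac{1}{2}(\mathbf{e} - m\mathbf{1}_p)^t\boldsymbol{\Sigma}^{-1}(\mathbf{e} - m\mathbf{1}_p) - \frac{1}{2}\mathbf{e}^t\boldsymbol{\Sigma}^{-1}\mathbf{e}\right] \tag{5.18}$$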
When the sequential simulation technique is used, the importance sampling weight is computed using the following relationship:

$$\frac{f(\mathbf{e})}{g(\mathbf{e})} = \frac{f(e_1)}{g(e_1)} \cdot \frac{f(e_2 \mid e_1)}{g(e_2 \mid e_1)} \cdots \frac{f(e_p \mid e_1, e_2, \cdots, e_{p-1})}{g(e_p \mid e_1, e_2, \cdots, e_{p-1})} \tag{5.19}$$
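From Equations 5.7, 5.10 and 5.13,

$$f(e_i \mid e_1, e_2, \cdots, e_{i-1}) \sim N\left(\boldsymbol{\Sigma}_{iO}\boldsymbol{\Sigma}_{OO}^{-1}\mathbf{e}_O,\ \Sigma_{ii} - \boldsymbol{\Sigma}_{iO}\boldsymbol{\Sigma}_{OO}^{-1}\boldsymbol{\Sigma}_{Oi}\right) \tag{5.20}$$

$$g(e_i \mid e_1, e_2, \cdots, e_{i-1}) \sim N\left(m + \boldsymbol{\Sigma}_{iO}\boldsymbol{\Sigma}_{OO}^{-1}(\mathbf{e}_O - m\mathbf{1}_p),\ \Sigma_{ii} - \boldsymbol{\Sigma}_{iO}\boldsymbol{\Sigma}_{OO}^{-1}\boldsymbol{\Sigma}_{Oi}\right) \tag{5.21}$$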
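When every residual is conditioned on all previously simulated residuals, Equations 5.18 and 5.19 must give the same weight, which makes a convenient consistency check. The Python sketch below computes both for a hypothetical 3 × 3 $\boldsymbol{\Sigma}$, shift $m$ and residual vector (treated as a column vector); the helper names are illustrative, and the small Gaussian-elimination solver stands in for a library routine.

```python
import math


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def quad_form(sigma, x):
    """Return x^t Sigma^{-1} x using a linear solve."""
    return sum(xi * yi for xi, yi in zip(x, solve(sigma, x)))


def weight_joint(e, sigma, m):
    """Eq. 5.18: f(e) / g(e) for f = N(0, Sigma) and g = N(m 1_p, Sigma)."""
    d = [ei - m for ei in e]
    return math.exp(0.5 * quad_form(sigma, d) - 0.5 * quad_form(sigma, e))


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def weight_sequential(e, sigma, m):
    """Eq. 5.19 with the conditional densities of Eqs. 5.20 and 5.21 (full conditioning)."""
    w = 1.0
    for i in range(len(e)):
        obs = list(range(i))
        if obs:
            s_io = [sigma[i][j] for j in obs]
            k = solve([[sigma[a][c] for c in obs] for a in obs], s_io)
            mu_f = sum(kj * e[j] for kj, j in zip(k, obs))
            mu_g = m + sum(kj * (e[j] - m) for kj, j in zip(k, obs))
            var = sigma[i][i] - sum(kj * sj for kj, sj in zip(k, s_io))
        else:
            mu_f, mu_g, var = 0.0, m, sigma[i][i]
        w *= normal_pdf(e[i], mu_f, var) / normal_pdf(e[i], mu_g, var)
    return w


Sigma = [[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.4],
         [0.2, 0.4, 1.0]]   # hypothetical covariance matrix
e = [0.8, 1.5, -0.3]        # hypothetical residual realization
w_joint = weight_joint(e, Sigma, m=1.0)
w_seq = weight_sequential(e, Sigma, m=1.0)
```

If the sequential simulation conditions on only the q nearest residuals, the product in Equation 5.19 uses those reduced conditioning sets and will no longer match Equation 5.18 exactly.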
While the above discussion focuses on intra-event residuals, the same importance sam-
pling technique can also be used for preferentially sampling positive inter-event residuals.
5.5 Sequential simulation of correlated normalized residuals with consideration of recorded ground-motion intensities
In this section, a procedure is described for simulating normalized residuals conditioned
on recorded ground-motion intensities. Here, it is assumed for simplicity that the standard
deviations of the inter-event residual and that of the intra-event residuals are constants (i.e.,
σi = σ and τi = τ). Appendix 5.7 discusses the more general case that arises when this
assumption is not true.
In the simulation techniques described in the previous section, the inter-event residual
and the intra-event residuals are simulated separately. This is because the screening ef-
fect is more effective when the intra-event residuals are simulated separately as discussed
in more detail subsequently. When we wish to utilize the recorded intensity information,
however, it is preferable to simulate total residuals (sum of inter-event and intra-event resid-
uals) directly conditioned on total residuals computed from the recordings. This is because
the ground-motion intensity recordings only provide us with information about the total
residuals (computed as the difference between the observed logarithmic intensity and the
predicted logarithmic intensity). If the residual terms are to be simulated separately, the
recorded total residuals will first have to be split into the corresponding inter-event and
intra-event terms, which leads to statistical errors.
The recorded normalized total residual $\varepsilon^{(t)}$ can be computed from the recorded ground-motion intensities as follows (using Equation 5.1):

$$\varepsilon^{(t)}_i = \frac{\sigma\varepsilon_i + \tau\eta}{\sqrt{\sigma^2 + \tau^2}} = \frac{\ln(Sa_i) - \ln(\overline{Sa}_i)}{\sqrt{\sigma^2 + \tau^2}} \tag{5.22}$$

where $Sa_i$, the observed spectral acceleration at site $i$, is the intensity measure considered, and $\overline{Sa}_i$, $\sigma$ and $\tau$ are parameters computed from the ground-motion model as described earlier. The normalizing factor in the above equation is $\sqrt{\sigma^2 + \tau^2}$ since the variance of the total residual equals the sum of the variances of the inter-event and the intra-event residuals.
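Equation 5.22 converts a recording into a normalized total residual. A minimal Python sketch, with hypothetical values for the recording, the median prediction and the standard deviations:

```python
import math


def normalized_total_residual(sa_obs, sa_median, sigma, tau):
    """Eq. 5.22: (ln Sa_obs - ln Sa_median) / sqrt(sigma^2 + tau^2)."""
    return (math.log(sa_obs) - math.log(sa_median)) / math.sqrt(sigma ** 2 + tau ** 2)


# Hypothetical recording: 0.30 g observed where the model median is 0.20 g.
eps_t = normalized_total_residual(0.30, 0.20, sigma=0.5, tau=0.25)
```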
The sequential simulation technique for the normalized intra-event residuals described
earlier in Section 5.3.2 can be used to simulate normalized total residuals as well. The
following changes are necessary, however, since total residuals are simulated directly and
since the recorded total residuals are now considered during the simulation procedure.
(a) Each $\varepsilon^{(t)}_i$ is conditioned on the $\varepsilon^{(t)}$'s previously simulated as well as the $\varepsilon^{(t)}$'s at the recording stations. In other words, from a simulation perspective, the recorded $\varepsilon^{(t)}$'s are treated as additional previously simulated $\varepsilon^{(t)}$'s.
(b) As mentioned earlier, it is reasonable to condition $\varepsilon^{(t)}_i$ on only the q closest normalized total residuals (including recorded and previously simulated total residuals). Note, however, that the screening effect used as the basis for this simplification is slightly less effective when total residuals are directly simulated than when intra-event residuals are simulated separately. This is because even though the spatial correlation decreases with distance, the minimum value of the spatial correlation between $\varepsilon^{(t)}_i$ and $\varepsilon^{(t)}_j$ equals $\tau^2/(\sigma^2 + \tau^2)$ rather than zero (as before). Therefore, ignoring far-away residuals can cause slightly more bias in this case than when only intra-event residuals are simulated.
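(c) The conditional mean and the conditional variance of $\varepsilon^{(t)}_i$ (analogous to the quantities in Equations 5.7 and 5.10) are now obtained as follows:

$$E\left[\varepsilon^{(t)}_i \mid \varepsilon^{(t)}_1, \varepsilon^{(t)}_2, \cdots, \varepsilon^{(t)}_q\right] = \boldsymbol{\Sigma}^{(t)}_{iO}\left(\boldsymbol{\Sigma}^{(t)}_{OO}\right)^{-1}\mathbf{e}^{(t)}_O \tag{5.23}$$

$$\mathrm{Var}\left[\varepsilon^{(t)}_i \mid \varepsilon^{(t)}_1, \varepsilon^{(t)}_2, \cdots, \varepsilon^{(t)}_q\right] = 1 - \boldsymbol{\Sigma}^{(t)}_{iO}\left(\boldsymbol{\Sigma}^{(t)}_{OO}\right)^{-1}\boldsymbol{\Sigma}^{(t)}_{Oi} \tag{5.24}$$

where the set $[\varepsilon^{(t)}_1, \varepsilon^{(t)}_2, \cdots, \varepsilon^{(t)}_q]$ comprises the q closest recorded and previously simulated normalized total residuals, and $\mathbf{e}^{(t)}_O$ denotes the realization (or recorded value, as appropriate) of $[\varepsilon^{(t)}_1, \varepsilon^{(t)}_2, \cdots, \varepsilon^{(t)}_q]$.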
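The covariance matrices $\boldsymbol{\Sigma}_{iO}$, $\boldsymbol{\Sigma}_{OO}$ and $\Sigma_{ii}$ in Equations 5.8 and 5.9 were defined for intra-event residuals only. The corresponding covariance matrices for the normalized total residuals are obtained as follows:

$$\boldsymbol{\Sigma}^{(t)}_{iO} = \begin{bmatrix} \dfrac{\sigma^2\rho_{i1} + \tau^2}{\sigma^2 + \tau^2} & \dfrac{\sigma^2\rho_{i2} + \tau^2}{\sigma^2 + \tau^2} & \cdots & \dfrac{\sigma^2\rho_{iq} + \tau^2}{\sigma^2 + \tau^2} \end{bmatrix} \tag{5.25}$$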
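$$\boldsymbol{\Sigma}^{(t)}_{OO} = \begin{bmatrix} 1 & \dfrac{\sigma^2\rho_{12} + \tau^2}{\sigma^2 + \tau^2} & \cdots & \dfrac{\sigma^2\rho_{1q} + \tau^2}{\sigma^2 + \tau^2} \\ \dfrac{\sigma^2\rho_{21} + \tau^2}{\sigma^2 + \tau^2} & 1 & \cdots & \dfrac{\sigma^2\rho_{2q} + \tau^2}{\sigma^2 + \tau^2} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\sigma^2\rho_{q1} + \tau^2}{\sigma^2 + \tau^2} & \dfrac{\sigma^2\rho_{q2} + \tau^2}{\sigma^2 + \tau^2} & \cdots & 1 \end{bmatrix} \tag{5.26}$$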
The rest of the sequential simulation technique is identical to the one described earlier. In particular, residual $\varepsilon^{(t)}_1$ is first simulated as a univariate normally-distributed random variable with zero mean and unit standard deviation. The residual $\varepsilon^{(t)}_i$ is then simulated conditioned on the previously simulated residuals from a normal distribution with the mean in Equation 5.23 and the variance in Equation 5.24.
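In the homoscedastic case, each entry of Equations 5.25 and 5.26 is a simple transformation of the intra-event correlation. The Python sketch below evaluates that transformation and then applies Equations 5.23 and 5.24 for the special case in which a site's conditioning set contains a single recorded residual, so that $\boldsymbol{\Sigma}^{(t)}_{OO}$ is the scalar 1. The separation distance, $b$, $\sigma$, $\tau$, the recorded residual and the function names are hypothetical; with several conditioning residuals the same formulas require a linear solve with the full $\boldsymbol{\Sigma}^{(t)}_{OO}$.

```python
import math


def total_residual_corr(rho, sigma, tau):
    """Entry of Eqs. 5.25-5.26: correlation of normalized total residuals (homoscedastic case)."""
    return (sigma ** 2 * rho + tau ** 2) / (sigma ** 2 + tau ** 2)


def condition_on_one(c, e_recorded):
    """Eqs. 5.23-5.24 when the conditioning set is one residual (Sigma_OO^(t) = 1)."""
    return c * e_recorded, 1.0 - c * c


sigma, tau, b_km = 0.5, 0.25, 20.0   # hypothetical model values
rho = math.exp(-3.0 * 10.0 / b_km)   # intra-event correlation at 10 km separation
c = total_residual_corr(rho, sigma, tau)
mean, var = condition_on_one(c, e_recorded=0.73)
```

Setting `rho = 0` in `total_residual_corr` returns $\tau^2/(\sigma^2 + \tau^2)$, which is 0.2 for these values: the non-zero floor on the correlation discussed in point (b).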
5.6 Conclusions
Quantifying the distribution of ground-motion intensities over a spatially-distributed re-
gion is an important task for several practical applications such as the risk assessment
and post-earthquake damage assessment of spatially-distributed systems. Often, this is
done using a simulation-based framework that involves generating probabilistic samples
of representative ground-motion intensity maps. This chapter discussed techniques for
simulating ground-motion intensity maps with and without the consideration of recorded
ground-motion intensities. A ground-motion intensity map is generated by combining me-
dian intensity predictions from ground-motion models with realizations of inter-event and
intra-event residuals that account for the uncertainty in the intensities. Intra-event residuals
can be simulated as a correlated vector of normal random variables, and the inter-event
residual can be simulated as a univariate normal random variable.
The chapter discussed two simulation techniques, namely, single-step simulation and
sequential simulation for generating residuals in the absence of recorded ground-motion
intensities. While both procedures are theoretically equivalent, it is possible to achieve
higher computational efficiency using the sequential simulation technique. The chapter also
described a sequential simulation technique for simulating residuals incorporating knowl-
edge about recorded ground-motion intensities. This is useful for post-earthquake damage
assessment and for determining optimal emergency response strategies.
5.7 Appendix: The conditional sequential simulation of normalized heteroscedastic residuals
This section generalizes the results shown in Section 5.5 for the case where the residuals are heteroscedastic (i.e., $\sigma_i$ and $\tau_i$ are not constant across all sites). The normalized total residual is now defined as follows:

$$\varepsilon^{(t)}_i = \frac{\sigma_i\varepsilon_i + \tau_i\eta_i}{\sqrt{\sigma_i^2 + \tau_i^2}} \tag{5.27}$$
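The simulation procedure is similar to that described in Section 5.5, with changes to the covariance matrices shown in Equations 5.25 and 5.26. The new matrices can be estimated as follows:

$$\boldsymbol{\Sigma}^{(t)}_{iO} = \begin{bmatrix} \dfrac{\sigma_i\sigma_1\rho_{i1} + \tau_i\tau_1}{\sqrt{\sigma_i^2 + \tau_i^2}\sqrt{\sigma_1^2 + \tau_1^2}} & \dfrac{\sigma_i\sigma_2\rho_{i2} + \tau_i\tau_2}{\sqrt{\sigma_i^2 + \tau_i^2}\sqrt{\sigma_2^2 + \tau_2^2}} & \cdots & \dfrac{\sigma_i\sigma_q\rho_{iq} + \tau_i\tau_q}{\sqrt{\sigma_i^2 + \tau_i^2}\sqrt{\sigma_q^2 + \tau_q^2}} \end{bmatrix} \tag{5.28}$$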
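$$\boldsymbol{\Sigma}^{(t)}_{OO} = \begin{bmatrix} 1 & \dfrac{\sigma_1\sigma_2\rho_{12} + \tau_1\tau_2}{\sqrt{\sigma_1^2+\tau_1^2}\sqrt{\sigma_2^2+\tau_2^2}} & \cdots & \dfrac{\sigma_1\sigma_q\rho_{1q} + \tau_1\tau_q}{\sqrt{\sigma_1^2+\tau_1^2}\sqrt{\sigma_q^2+\tau_q^2}} \\ \dfrac{\sigma_2\sigma_1\rho_{21} + \tau_2\tau_1}{\sqrt{\sigma_2^2+\tau_2^2}\sqrt{\sigma_1^2+\tau_1^2}} & 1 & \cdots & \dfrac{\sigma_2\sigma_q\rho_{2q} + \tau_2\tau_q}{\sqrt{\sigma_2^2+\tau_2^2}\sqrt{\sigma_q^2+\tau_q^2}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\sigma_q\sigma_1\rho_{q1} + \tau_q\tau_1}{\sqrt{\sigma_q^2+\tau_q^2}\sqrt{\sigma_1^2+\tau_1^2}} & \dfrac{\sigma_q\sigma_2\rho_{q2} + \tau_q\tau_2}{\sqrt{\sigma_q^2+\tau_q^2}\sqrt{\sigma_2^2+\tau_2^2}} & \cdots & 1 \end{bmatrix} \tag{5.29}$$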
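Each entry of Equations 5.28 and 5.29 can be evaluated with one function of the two sites' $\sigma$ and $\tau$ values. The Python sketch below (function names and inputs hypothetical) assembles $\boldsymbol{\Sigma}^{(t)}_{OO}$ from an intra-event correlation matrix and per-site standard deviations.

```python
import math


def hetero_total_corr(rho_ij, s_i, t_i, s_j, t_j):
    """Eqs. 5.28-5.29 entry: correlation between normalized total residuals at sites i and j."""
    num = s_i * s_j * rho_ij + t_i * t_j
    return num / (math.sqrt(s_i ** 2 + t_i ** 2) * math.sqrt(s_j ** 2 + t_j ** 2))


def total_corr_matrix(rho, sigmas, taus):
    """Assemble Sigma_OO^(t) from an intra-event correlation matrix and per-site sigma, tau."""
    q = len(sigmas)
    return [[hetero_total_corr(rho[a][b], sigmas[a], taus[a], sigmas[b], taus[b])
             for b in range(q)] for a in range(q)]


rho = [[1.0, 0.4], [0.4, 1.0]]                       # hypothetical intra-event correlations
S = total_corr_matrix(rho, [0.5, 0.6], [0.25, 0.3])  # hypothetical sigma_i and tau_i
```

When $\sigma_i = \sigma_j = \sigma$ and $\tau_i = \tau_j = \tau$, the expression reduces to $(\sigma^2\rho_{ij} + \tau^2)/(\sigma^2 + \tau^2)$, the homoscedastic entry of Equations 5.25 and 5.26.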
Chapter 6
Efficient sampling and data reduction techniques for probabilistic seismic lifeline risk assessment
N. Jayaram and J.W. Baker (2010). Efficient sampling and data reduction techniques for
probabilistic seismic lifeline risk assessment, Earthquake Engineering and Structural Dy-
namics (published online).
6.1 Abstract
Probabilistic seismic risk assessment for spatially-distributed lifelines is less straightfor-
ward than for individual structures. While procedures such as the ‘PEER framework’ have
been developed for risk assessment of individual structures, these are not easily applica-
ble to distributed lifeline systems, due to difficulties in describing ground-motion inten-
sity (e.g., spectral acceleration) over a region (in contrast to ground-motion intensity at a
single site, which is easily quantified using Probabilistic Seismic Hazard Analysis), and
since the link between the ground-motion intensities and lifeline performance is usually
not available in closed form. As a result, Monte Carlo simulation and its variants are well
suited for characterizing ground motions and computing resulting losses to lifelines. This
CHAPTER 6. PROBABILISTIC SEISMIC LIFELINE RISK ASSESSMENT 112
chapter proposes a simulation-based framework for developing a small but stochastically-
representative catalog of earthquake ground-motion intensity maps that can be used for
lifeline risk assessment. In this framework, Importance Sampling is used to preferentially
sample ‘important’ ground-motion intensity maps, and K-Means Clustering is used to iden-
tify and combine redundant maps in order to obtain a small catalog. The effects of sam-
pling and clustering are accounted for through a weighting on each remaining map, so the
resulting catalog is still a probabilistically correct representation. The feasibility of the
proposed simulation framework is illustrated by using it to assess the seismic risk of a
simplified model of the San Francisco Bay Area transportation network. A catalog of just
150 intensity maps is generated to represent hazard at 1,038 sites from ten regional fault
segments causing earthquakes with magnitudes between five and eight. The risk estimates
obtained using these maps are consistent with those obtained using conventional Monte Carlo simulation utilizing many orders of magnitude more ground-motion intensity maps.
Therefore, the proposed technique can be used to drastically reduce the computational ex-
pense of a simulation-based risk assessment, without compromising the accuracy of the
risk estimates. This will facilitate computationally intensive risk analysis of systems such
as transportation networks. Finally, the study shows that the uncertainties in the ground-
motion intensities and the spatial correlations between ground-motion intensities at various
sites must be modeled in order to obtain unbiased estimates of lifeline risk.
6.2 Introduction
Lifelines are large, geographically-distributed systems that are essential support systems
for any society. Due to their known vulnerabilities, it is important to proactively assess
and mitigate the seismic risk of lifelines. For instance, the Northridge earthquake caused
over $1.5 billion in business interruption losses ascribed to transportation network damage
[Chang, 2003]. The city of Los Angeles suffered a power blackout and $75 million of
power-outage related losses as a result of the earthquake [e.g., Tanaka et al., 1997]. Re-
cently, the analytical Pacific Earthquake Engineering Research Center (PEER) loss analysis
framework has been used to perform risk assessment for a single structure at a given site,
by estimating the site ground-motion hazard and assessing probable losses using the haz-
ard information [e.g., McGuire, 2007]. Lifeline risk assessment, however, is based on a
large vector of ground-motion intensities (e.g., spectral accelerations at all lifeline compo-
nent locations). The intensities also show significant spatial correlation, which needs to be
carefully modeled in order to accurately assess the seismic risk. Further, the link between
the ground-motion intensities at the sites and the performance of the lifeline is usually
not available in closed form. For instance, the travel time of vehicles in a transportation
network, a commonly-used performance measure, is only obtained using an optimization
procedure rather than being a closed-form function of the ground-motion intensities. These
additional complexities make it difficult to use the PEER framework for lifeline risk as-
sessment. There are some analytical approaches that are sometimes used for lifeline risk
assessment [e.g., Kang et al., 2008, Duenas-Osorio et al., 2005], but those are generally ap-
plicable to only specific classes of lifeline reliability problems. Hence, many past research
works use simulation-based approaches instead of analytical approaches for lifeline risk
assessment [e.g., Campbell and Seligson, 2003, Bazzurro and Luco, 2004, Crowley and
Bommer, 2006, Kiremidjian et al., 2007, Shiraki et al., 2007]. One simple simulation-based
approach involves studying the performance of lifelines under those earthquake scenarios
that may dominate the hazard in the region of interest [e.g., Adachi and Ellingwood, 2008].
While this approach is more tractable, it does not capture seismic hazard uncertainties in
the way a Probabilistic Seismic Hazard Analysis (PSHA)-based framework would. Fur-
ther, it is not easy to identify the earthquake scenario that dominates the hazard at the loss
levels of interest [Jayaram and Baker, 2009b]. (Appendix C uses lifeline loss deaggrega-
tion calculations to illustrate the difficulties involved in selecting a dominating earthquake
scenario.) A more comprehensive approach uses Monte Carlo simulation (MCS) to prob-
abilistically generate ground-motion intensity maps (also referred to as intensity maps in
this chapter), considering all possible earthquake scenarios that could occur in the region,
and then uses these for the risk assessment. Ground-motion intensities are generated using
an existing ground-motion model, which is described below.
The ground-motion intensity at a site is modeled as

$$\ln(Sa_{ij}) = \ln(\overline{Sa}_{ij}) + \sigma_{ij}\varepsilon_{ij} + \tau_{ij}\eta_{ij} \tag{6.1}$$

where $Sa_{ij}$ denotes the spectral acceleration (at the period of interest) at site $i$ during earthquake $j$; $\overline{Sa}_{ij}$ denotes the predicted (by the ground-motion model) median spectral acceleration, which depends on parameters such as magnitude, distance, period and local-site conditions; $\varepsilon_{ij}$ denotes the normalized intra-event residual and $\eta_{ij}$ denotes the normalized inter-event residual. Both $\varepsilon_{ij}$ and $\eta_{ij}$ are univariate normal random variables with zero mean and unit standard deviation. $\sigma_{ij}$ and $\tau_{ij}$ are standard deviation terms that are estimated as part of the ground-motion model and are functions of the spectral period of interest, and in some models also functions of the earthquake magnitude and the distance of the site from the rupture. The term $\sigma_{ij}\varepsilon_{ij}$ is called the intra-event residual and the term $\tau_{ij}\eta_{ij}$ is called the inter-event residual. The inter-event residual is a constant across all the sites for a given earthquake.
Crowley and Bommer [2006] describe the following MCS approach to simulate inten-
sity maps using Equation 6.1:
Step 1: Use Monte Carlo simulation to generate earthquakes of varying magnitudes on
the active faults in the region, considering appropriate magnitude-recurrence relationships
(e.g., the Gutenberg-Richter relationship).
Step 2: Using a ground-motion model (Equation 6.1), obtain the median ground-motion intensities ($\overline{Sa}_{ij}$) and the standard deviations of the inter-event and the intra-event residuals ($\tau_{ij}$ and $\sigma_{ij}$, respectively) at all the sites.
Step 3: Generate the normalized inter-event residual term ($\eta_{ij}$) by sampling from the univariate normal distribution.
Step 4: Simulate the normalized intra-event residuals ($\varepsilon_{ij}$'s) using the parameters predicted by the ground-motion model. Chapter 2 [Jayaram and Baker, 2008] showed that a vector of spatially-distributed normalized intra-event residuals $\boldsymbol{\varepsilon}_j = (\varepsilon_{1j}, \varepsilon_{2j}, \cdots, \varepsilon_{pj})$ follows a multivariate normal distribution. Hence, the distribution of $\boldsymbol{\varepsilon}_j$ can be completely defined using the mean (zero) and standard deviation (one) of $\varepsilon_{ij}$, and the correlation between all $\varepsilon_{i_1 j}$ and $\varepsilon_{i_2 j}$ pairs. The correlations between the residuals can be obtained from a predictive model calibrated using past ground-motion intensity observations [Jayaram and Baker, 2009a, Wang and Takada, 2005].
Step 5: Combine the median intensities, the normalized intra-event residuals and the normalized inter-event residual for each earthquake in accordance with Equation 6.1 to obtain ground-motion intensity maps (i.e., obtain $\mathbf{Sa}_j = (Sa_{1j}, Sa_{2j}, \cdots, Sa_{pj})$).
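Step 1 requires sampling magnitudes from a magnitude-recurrence relationship. One common choice, assumed here rather than taken from Crowley and Bommer [2006], is a Gutenberg-Richter relationship truncated between a minimum and a maximum magnitude, which can be sampled by inverting its CDF. The Python sketch below uses hypothetical values b = 1, M_min = 5 and M_max = 8; sampling rupture locations and the remaining steps are omitted.

```python
import math
import random


def sample_truncated_gr(b_value, m_min, m_max, rng):
    """Inverse-CDF draw from a Gutenberg-Richter distribution truncated to [m_min, m_max].

    CDF: F(m) = (1 - 10**(-b (m - m_min))) / (1 - 10**(-b (m_max - m_min))).
    """
    u = rng.random()  # uniform on [0, 1)
    span = 1.0 - 10.0 ** (-b_value * (m_max - m_min))
    return m_min - math.log10(1.0 - u * span) / b_value


rng = random.Random(4)
mags = [sample_truncated_gr(1.0, 5.0, 8.0, rng) for _ in range(20000)]
```

With these values only about 10% of sampled events exceed magnitude 6 and under 1% exceed magnitude 7, which illustrates why conventional MCS rarely samples the large events that matter most for lifeline risk.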
Crowley and Bommer [2006] used the above-mentioned approach to generate multiple
earthquake scenarios that were then used for the loss assessment of a portfolio of build-
ings. They found that the results differed significantly from those obtained using other
approximate approaches (e.g., using PSHA to obtain individual site hazard and loss ex-
ceedance curves, which are then heuristically combined to obtain the overall portfolio loss
exceedance curve). Crowley and Bommer [2006], however, ignored the spatial correlations
of εi j’s when simulating intensity maps. Further, they used conventional MCS (i.e., brute-
force MCS or random MCS), which is computationally inefficient because large magni-
tude events and above-average ground-motion intensities are considerably more important
than small magnitude events and small ground-motion intensities while modeling lifeline
risks, but these are infrequently sampled in conventional MCS. Kiremidjian et al. [2007]
improved the simulation process by preferentially simulating large magnitudes using im-
portance sampling (IS). The normalized residuals (εi j and ηi j), however, were simulated
using conventional MCS.
Shiraki et al. [2007] also used an MCS-based approach to estimate earthquake-induced delays in a transportation network. They generated a catalog of 47 earthquakes and corresponding intensity maps for the Los Angeles area and assigned probabilities to these earthquakes such that the site hazard curves obtained using this catalog match the
known local site hazard curves obtained from PSHA. In other words, the probabilities of
the scenario earthquakes were made to be hazard consistent. Only median peak ground
accelerations were used to produce the ground-motion intensity maps corresponding to the
scenario earthquakes, however, and the known variability about these medians was ignored.
While this approach is highly computationally efficient on account of the use of a small
catalog of earthquakes, the selection of earthquakes is a somewhat subjective process, and
the assignment of probabilities is based on hazard consistency rather than on actual event
likelihoods. Moreover, the procedure does not capture the effect of the uncertainties in
ground-motion intensities.
The current research work develops an importance sampling-based framework to ef-
ficiently sample important magnitudes and ground-motion residuals. It is seen that the
number of IS simulations is about two orders of magnitude smaller than the number of
Monte Carlo simulations required to obtain equally accurate lifeline loss estimates. Despite this improvement over the conventional MCS approach, the number of IS intensity maps required for risk assessment is still likely to be inconveniently large. The K-means clustering technique is therefore used to reduce the number of intensity maps required for risk assessment by more than a further order of magnitude. The feasibility of the proposed framework is illustrated by assessing the seismic
risk of an aggregated form of the San Francisco Bay Area transportation network using
a sampled catalog of 150 intensity maps. The resulting risk estimates are shown to be in
good agreement with those obtained using the conventional MCS approach (the benchmark
method).
6.3 Simulation of ground-motion intensity maps using importance sampling
This section provides a description of the importance sampling technique used in the cur-
rent work to efficiently simulate ground-motion intensity maps. Importance sampling (IS)
is a technique used to evaluate functions of random variables with a certain probability
density function (PDF) using samples from an alternate density function [Fishman, 2006].
This technique is explained in more detail in section 6.3.1. Sections 6.3.2, 6.3.3 and 6.3.4
describe the application of IS to the simulation of ground-motion intensity maps, which in-
volves probabilistically sampling a catalog of earthquake magnitudes and rupture locations
(which are required for computing the median ground-motion intensities), the normalized
inter-event residuals and the normalized intra-event residuals (Equation 6.1).
6.3.1 Importance sampling procedure
Let $f(x)$ be a PDF defined over domain $D$ for random variable $X$. Define an integral $H$ as follows:

$$ H = \int_D q(x)\, f(x)\, dx \tag{6.2} $$
where $q(x)$ is an arbitrary function of $x$. The integral can be rewritten as follows:

$$ H = \int_D q(x)\, \frac{f(x)}{g(x)}\, g(x)\, dx \tag{6.3} $$
where $g(x)$ is any probability density that takes non-zero values over the same domain $D$. The ratio $f(x)/g(x)$ is called the importance sampling weight.
Based on Equation 6.2, the integral $H$ can be estimated using conventional MCS as follows:

$$ \hat{H} = \frac{1}{n} \sum_{i=1}^{n} q(x_i) \tag{6.4} $$
where $\hat{H}$ is an estimate of $H$ and $x_1, \ldots, x_n$ are $n$ realizations of the random variable $X$ obtained using $f(x)$. The IS procedure involves estimating the integral $H$ using the alternate density $g(x)$ as follows (based on Equation 6.3):

$$ \hat{H} = \frac{1}{r} \sum_{i=1}^{r} q(y_i)\, \frac{f(y_i)}{g(y_i)} \tag{6.5} $$
where $y_1, \ldots, y_r$ are $r$ realizations from $g(y)$, and $f(y_i)/g(y_i)$ is a weighting function (the importance sampling weight) that accounts for the fact that the realizations are based on the alternate density $g(y)$ rather than the original density $f(y)$.
While Equations 6.4 and 6.5 provide two methods of estimating the same integral $H$, it can be shown that the variance of the estimate $\hat{H}$ obtained using Equation 6.5 becomes very small if an appropriate alternate density function $g(x)$ is chosen [Fishman, 2006]. As a result of this variance reduction, the required number of IS realizations ($r$) is much smaller than the required number of conventional MCS realizations ($n$) for an equally reliable (i.e., same variance) estimate $\hat{H}$.
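As a minimal illustration of Equations 6.4 and 6.5 (not part of the original study), the Python sketch below estimates the rare-event probability $P(X > 3)$ for a standard normal $X$, i.e., $q(x) = I(x > 3)$ and $f = N(0, 1)$. The alternate density $g = N(3, 1)$ and the helper names are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean=0.0):
    """Unit-variance normal density centred at `mean`."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def mcs_estimate(q, n, rng):
    """Equation 6.4: average of q(x_i) over n draws from f = N(0, 1)."""
    return sum(q(rng.gauss(0.0, 1.0)) for _ in range(n)) / n


def is_estimate(q, r, shift, rng):
    """Equation 6.5: average of q(y_i) f(y_i) / g(y_i) over r draws from g = N(shift, 1)."""
    total = 0.0
    for _ in range(r):
        y = rng.gauss(shift, 1.0)
        total += q(y) * normal_pdf(y) / normal_pdf(y, shift)
    return total / r


def rare(x):
    """q(x) = I(x > 3), the indicator of a rare event."""
    return 1.0 if x > 3.0 else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))     # about 1.35e-3
    print("exact      :", exact)
    print("MCS, n=1000:", mcs_estimate(rare, 1000, rng))
    print("IS,  r=1000:", is_estimate(rare, 1000, 3.0, rng))
```

With a thousand samples, conventional MCS expects only one or two exceedances of 3, whereas about half of the IS samples fall in the tail and each contributes a small weighted term; this is the variance reduction described above.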
Intuitively, the density g(x) should be such that the samples from g(x) are concentrated
in regions where the function q(x) is ‘rough’. This will ensure fine sampling in regions
that ultimately determine the accuracy of the estimate and coarse sampling elsewhere. The
challenge in implementing IS lies in choosing this alternate density g(x). Useful alternate
densities for this application are provided in the following subsections.
6.3.2 Simulation of earthquake catalogs
Let $n_f$ denote the number of active faults in the region of interest and $\nu_j$ denote the annual recurrence rate of earthquakes on fault $j$ with magnitudes exceeding a minimum magnitude $m_{min}$. Let $f_j(m)$ denote the density function for magnitudes of earthquakes on fault $j$. Let $f(m)$ denote the density function for the magnitude of an earthquake on any of the $n_f$ faults (i.e., this density function models the distribution of earthquakes resulting from all the faults). Using the theorem of total probability, $f(m)$ can be computed as follows:

$$ f(m) = \frac{\sum_{j=1}^{n_f} \nu_j f_j(m)}{\sum_{j=1}^{n_f} \nu_j} \tag{6.6} $$
In the event of an earthquake of magnitude $m$ on a random fault, let $P_j(m)$ denote the probability that the earthquake rupture lies on fault $j$. The $P_j(m)$ values can be calculated using Bayes' theorem as follows:

$$ P_j(m) = \frac{\nu_j f_j(m)}{\sum_{k=1}^{n_f} \nu_k f_k(m)} \tag{6.7} $$
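The following sketch evaluates Equations 6.6 and 6.7 for three hypothetical faults. The fault rates, maximum magnitudes and the truncated exponential (Gutenberg-Richter) form assumed for $f_j(m)$ are illustrative choices only; the application in section 6.6.3 uses a characteristic magnitude model instead.

```python
import math

# Hypothetical faults: (annual rate nu_j of events with M >= M_MIN, maximum magnitude)
FAULTS = [(0.03, 7.9), (0.02, 7.3), (0.01, 6.8)]
M_MIN = 5.0
B_VALUE = 1.0   # assumed Gutenberg-Richter b-value


def fault_pdf(m, m_max, m_min=M_MIN, b=B_VALUE):
    """Assumed f_j(m): truncated exponential density, zero outside [m_min, m_max]."""
    if m < m_min or m > m_max:
        return 0.0
    beta = b * math.log(10.0)
    return beta * math.exp(-beta * (m - m_min)) / (1.0 - math.exp(-beta * (m_max - m_min)))


def combined_pdf(m):
    """Equation 6.6: rate-weighted mixture of the fault magnitude densities."""
    num = sum(nu * fault_pdf(m, m_max) for nu, m_max in FAULTS)
    return num / sum(nu for nu, _ in FAULTS)


def fault_probabilities(m):
    """Equation 6.7: P_j(m), the probability that a magnitude-m event lies on fault j."""
    terms = [nu * fault_pdf(m, m_max) for nu, m_max in FAULTS]
    total = sum(terms)
    if total == 0.0:
        raise ValueError("no fault can produce this magnitude")
    return [t / total for t in terms]


if __name__ == "__main__":
    for m in (5.5, 7.0, 7.5):
        probs = [round(p, 3) for p in fault_probabilities(m)]
        print(f"m = {m}: f(m) = {combined_pdf(m):.4f}, P_j(m) = {probs}")
```

Faults whose maximum magnitude is below $m$ automatically receive $P_j(m) = 0$, which is the property used in the fault sampling step below.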
A conventional MCS approach would use the density function $f(m)$ to simulate earthquake magnitudes, but this approach results in a large number of small-magnitude events, since such events are considerably more probable than large-magnitude events. This is inefficient because lifeline losses due to frequent small events are less important than those due to rare large events (although they are not negligible and cannot be ignored). The computational efficiency of the risk assessment process can be improved without compromising the accuracy of the estimates by using the importance sampling technique described in section 6.3.1 to preferentially sample large events while still ensuring that the simulated events are 'stochastically representative'. In other words, the magnitudes are simulated from a sampling distribution $g(m)$ (rather than $f(m)$), which is chosen to have a high probability of producing large-magnitude events.
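Let $m_{min}$ and $m_{max}$ denote the bounds of the range of magnitudes of interest. This range $[m_{min}, m_{max}]$ can be stratified into $n_m$ partitions as follows (so that $m_1 = m_{min}$ and $m_{n_m+1} = m_{max}$):

$$ [m_{min}, m_{max}] = [m_{min}, m_2) \cup [m_2, m_3) \cup \cdots \cup [m_{n_m}, m_{max}] \tag{6.8} $$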
In the current work, the partitions are chosen such that the width of the interval (i.e., $m_{k+1} - m_k$) is large at small magnitudes and small at large magnitudes (Figure 6.1a). A single magnitude is randomly sampled from each partition using the magnitude density function $f(m)$, thereby obtaining $n_m$ realizations of the magnitudes. Since the partitions are chosen to have small widths at large magnitudes, there are naturally more realizations of large-magnitude events. In this case, the sampling distribution $g(m)$ is not explicit, but rather is implicitly defined by the magnitude partitioning. This procedure, sometimes called stratified sampling, has the advantage of forcing the inclusion of specified subsets of the random variable while maintaining the probabilistic character of random sampling [Fishman, 2006].
The importance sampling weight $f(m)/g(m)$ can be obtained by noting that the sampling distribution assigns equal probability ($1/n_m$) to each of the partitions, while the actual probability of a magnitude lying in a partition $[m_k, m_{k+1})$ is obtained by integrating the density function $f(m)$. Hence, the importance sampling weight for a magnitude $m$ chosen from the $k$th partition is computed as follows:

$$ \frac{f(m)}{g(m)} = \frac{\int_{m_k}^{m_{k+1}} f(m)\, dm}{1/n_m} \tag{6.9} $$
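A minimal sketch of the stratified magnitude sampling and the weights of Equation 6.9 follows. For simplicity it assumes a single truncated exponential $f(m)$ with $b = 1$ on $[5.0, 8.0]$ (in practice $f(m)$ would come from Equation 6.6), and the partition edges in the example are hypothetical; only their narrowing at large magnitudes mirrors the text.

```python
import math
import random

# Assumed f(m): truncated exponential (b = 1) on [M_MIN, M_MAX]
M_MIN, M_MAX = 5.0, 8.0
BETA = 1.0 * math.log(10.0)
NORM = 1.0 - math.exp(-BETA * (M_MAX - M_MIN))


def cdf(m):
    """CDF of the assumed magnitude density f(m)."""
    return (1.0 - math.exp(-BETA * (m - M_MIN))) / NORM


def inverse_cdf(u):
    """Magnitude m such that cdf(m) = u."""
    return M_MIN - math.log(1.0 - u * NORM) / BETA


def stratified_magnitudes(edges, rng):
    """Draw one magnitude from f(m) inside each partition [edges[k], edges[k+1])
    and pair it with its importance sampling weight (Equation 6.9)."""
    n_m = len(edges) - 1
    samples = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p_lo, p_hi = cdf(lo), cdf(hi)
        m = inverse_cdf(rng.uniform(p_lo, p_hi))   # f(m) restricted to the partition
        weight = (p_hi - p_lo) / (1.0 / n_m)       # partition probability / (1 / n_m)
        samples.append((m, weight))
    return samples


if __name__ == "__main__":
    # Hypothetical partitions: wide at small magnitudes, narrow at large ones
    edges = [5.0, 5.5, 6.0, 6.5, 6.8, 7.1, 7.3, 7.5, 7.7, 7.9, 8.0]
    samples = stratified_magnitudes(edges, random.Random(0))
    n_m = len(edges) - 1
    estimate = sum(w for m, w in samples if m >= 7.1) / n_m   # Equation 6.5, q = I(M >= 7.1)
    print("P(M >= 7.1): IS =", estimate, " exact =", 1.0 - cdf(7.1))
```

Because the example threshold coincides with a partition edge, the weighted estimate of $P(M \geq 7.1)$ is exact regardless of which magnitudes are drawn; for other thresholds it is unbiased but random.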
Figure 6.1: Importance sampling density functions for: (a) magnitude and (b) normalized intra-event residual; (c) recommended mean-shift as a function of the average number of sites and the average site-to-site distance normalized by the range of the spatial correlation model.

Once the magnitudes are sampled using IS, the rupture locations can be obtained by sampling faults using the fault probabilities $P_j(m)$ (Equation 6.7). Note that $P_j(m)$ is non-zero only if the maximum allowable magnitude on fault $j$ exceeds $m$. Let $n_f(m)$ denote the number of such faults with non-zero values of $P_j(m)$. If $n_f(m)$ is small (around 10), a more efficient sampling approach is to consider each of those $n_f(m)$ faults in turn to be the source of the earthquake, i.e., to consider $n_f(m)$ different earthquakes of the same simulated magnitude. This fault sampling procedure is similar to the importance sampling of magnitudes. The importance sampling weight for fault $j$ chosen by this procedure is computed as follows:

$$ \frac{f(j \mid m)}{g(j \mid m)} = \frac{P_j(m)}{1/n_f(m)} \tag{6.10} $$

where $f(j \mid m)$ and $g(j \mid m)$ denote the original and the alternate (implicit) probability mass functions for fault $j$ given an earthquake of magnitude $m$. Once a fault is sampled, the rupture is located randomly on the fault.
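The fault enumeration step and the weight of Equation 6.10 can be sketched as follows. The function name and the example probabilities are hypothetical; in practice the $P_j(m)$ values would come from Equation 6.7.

```python
def enumerate_faults(fault_probs):
    """Place one earthquake of the sampled magnitude on every fault with P_j(m) > 0
    and return (fault index, weight) pairs, the weight being Equation 6.10."""
    active = [j for j, p in enumerate(fault_probs) if p > 0.0]
    n_fm = len(active)                                   # n_f(m)
    return [(j, fault_probs[j] / (1.0 / n_fm)) for j in active]


if __name__ == "__main__":
    # Hypothetical P_j(m) for four faults; fault 2 cannot host this magnitude
    for j, w in enumerate_faults([0.6, 0.3, 0.0, 0.1]):
        print(f"fault {j}: weight {w:.2f}")
```

Since the implicit sampling probability of each enumerated fault is $1/n_f(m)$, the weights average to one over the enumerated faults, which keeps the weighted catalog consistent with $P_j(m)$.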
6.3.3 Simulation of normalized intra-event residuals
The set of normalized intra-event residuals at $p$ sites of interest, $\boldsymbol{\varepsilon}_j = (\varepsilon_{1j}, \varepsilon_{2j}, \ldots, \varepsilon_{pj})$, follows a multivariate normal distribution $f(\boldsymbol{\varepsilon}_j)$ [Jayaram and Baker, 2008]. The mean of $\boldsymbol{\varepsilon}_j$ is the zero vector of size $p$, while the variance of each $\varepsilon_{ij}$ equals one. The correlation between the residuals at two sites is a function of the separation between the sites, and can be obtained from a spatial correlation model. In this work, the correlation coefficient between the residuals at two sites $i_1$ and $i_2$ separated by $h$ km is computed using the following equation, which was calibrated using empirical observations (Chapter 3) [Jayaram and Baker, 2009a]:

$$ \rho_{\varepsilon_{i_1 j}, \varepsilon_{i_2 j}}(h) = \exp(-3h/R) \tag{6.11} $$
where R controls the rate of decay of spatial correlation and is called the ‘range’ of the
correlation model. The range depends on the intensity measure being used. In this work,
the intensity measure of interest is the spectral acceleration corresponding to a period of 1
second, and the corresponding value of R equals 26 km.
While a conventional MCS approach can be used to obtain realizations of $\boldsymbol{\varepsilon}_j$ using $f(\mathbf{e})$ [Fishman, 2006], this will result in a large number of near-zero (i.e., near-mean) residuals and few realizations from the upper and the lower tails. This is inefficient since, for the purposes of lifeline risk assessment, it is often of interest to study the upper tail (i.e., the $\boldsymbol{\varepsilon}_j$ values that produce large intensities), which is not sampled adequately in the conventional MCS approach. An efficient alternate sampling density $g(\mathbf{e})$ is a multivariate normal density with the same variance and correlation structure as $f(\mathbf{e})$, but with positive means for all the $\varepsilon_{ij}$'s (i.e., a positive mean for the marginal distribution of each intra-event residual). In other words, the mean vector of $g(\mathbf{e})$ is the $p$-dimensional vector $\mathbf{ms}_{intra} = (ms_{intra}, ms_{intra}, \ldots, ms_{intra})$. Sampling normalized intra-event residuals from this distribution $g(\mathbf{e})$, which has a positive mean, produces more realizations of large normalized intra-event residuals. Figure 6.1b shows the original and sampling marginal distributions for one particular $\varepsilon_{ij}$. This particular choice of the sampling distribution results in importance sampling weights that are simple to compute:

$$ \frac{f(\mathbf{e})}{g(\mathbf{e})} = \exp\left( \frac{1}{2} (\mathbf{e} - \mathbf{ms}_{intra})' \Sigma^{-1} (\mathbf{e} - \mathbf{ms}_{intra}) - \frac{1}{2} \mathbf{e}' \Sigma^{-1} \mathbf{e} \right) \tag{6.12} $$

where $\Sigma$ denotes the covariance matrix of $\boldsymbol{\varepsilon}_j$ and $\mathbf{e}$ denotes a realization of $\boldsymbol{\varepsilon}_j$.
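The sketch below (Python with NumPy) draws normalized intra-event residuals from the mean-shifted density $g(\mathbf{e})$ and evaluates the weights of Equation 6.12, using the correlation model of Equation 6.11 with $R = 26$ km. The site coordinates and function names are illustrative assumptions.

```python
import numpy as np


def correlation_matrix(coords_km, range_km=26.0):
    """Equation 6.11: rho(h) = exp(-3 h / R) for every pair of sites."""
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-3.0 * h / range_km)


def sample_intra_residuals(coords_km, ms_intra, n_maps, rng, range_km=26.0):
    """Draw normalized intra-event residuals from g(e) = N(ms_intra * 1, Sigma) and
    return them with the importance sampling weights f(e)/g(e) of Equation 6.12."""
    sigma = correlation_matrix(coords_km, range_km)   # unit variances: covariance = correlation
    p = sigma.shape[0]
    mean = np.full(p, ms_intra)
    chol = np.linalg.cholesky(sigma)
    e = mean + rng.standard_normal((n_maps, p)) @ chol.T
    sigma_inv = np.linalg.inv(sigma)
    d = e - mean
    quad_g = np.einsum("ij,jk,ik->i", d, sigma_inv, d)   # (e - ms)' Sigma^-1 (e - ms), per row
    quad_f = np.einsum("ij,jk,ik->i", e, sigma_inv, e)   # e' Sigma^-1 e, per row
    return e, np.exp(0.5 * quad_g - 0.5 * quad_f)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sites = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0], [30.0, 0.0], [40.0, 0.0]])  # km, hypothetical
    e, w = sample_intra_residuals(sites, ms_intra=0.3, n_maps=20000, rng=rng)
    print("mean residual:", e.mean(), " mean weight (close to 1):", w.mean())
```

The weights average to approximately one, as any importance sampling weights should, while the residuals are centred on $ms_{intra}$ rather than zero.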
The positive mean of $g(\mathbf{e})$ ensures that the realizations from $g(\mathbf{e})$ tend to be larger than the realizations from $f(\mathbf{e})$. It is, however, important to choose a reasonable value of the mean-shift $ms_{intra}$ to ensure adequate preferential sampling of large $\boldsymbol{\varepsilon}_j$'s, while avoiding sets of extremely large normalized intra-event residuals that would make the simulated intensity map so improbable as to be irrelevant. The process of selecting a reasonable value of $ms_{intra}$ is described below.
The first step in fixing the value of $ms_{intra}$ is to note that the preferred value depends predominantly on three factors: the extent of spatial correlation (measured by the range parameter $R$ in Equation 6.11), the average site-to-site separation distance in the lifeline network being studied, and the number of sites in the network. If sites are close to one another and the spatial correlations are significant, a larger mean-shift is permissible, because simultaneously large values of positively correlated random variables are reasonably likely. Similarly, fewer sites permit larger mean-shifts, since jointly large residuals are more likely to be observed over a few sites than over a large number of sites. The preferred mean-shifts are therefore determined as a function of the number of sites and the average site-to-site separation distance normalized by the range parameter. This is done by simulating the normalized intra-event residuals in hypothetical analysis cases with varying numbers of sites and varying average site separation distances, considering several feasible mean-shifts in each case. The feasibility of the resulting residuals (i.e., whether the simulated set of residuals is reasonably probable) is then studied using the resulting importance sampling weights. Based on extensive sensitivity analysis, the authors found that the best results are obtained when 30% of the importance sampling weights fall below 0.1, if exceedance rates larger than $10^{-6}$ are of interest. The preferred mean-shifts are determined for each case based on this criterion, and are plotted in Figure 6.1c. This figure enables users to avoid an extremely computationally expensive search for an appropriate sampling distribution in a given analysis case. The figure shows that the mean-shift decreases with both the average site separation distance and the number of sites, which is consistent with the statement above that smaller site separation distances and fewer sites permit larger mean-shifts.
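The mean-shift search can be sketched as follows. This only illustrates the criterion described above and is not the authors' code: it assumes a hypothetical layout of equally spaced sites on a straight line, a grid of candidate shifts, and 5,000 common random draws per case, and it picks the shift whose fraction of Equation 6.12 weights below 0.1 is closest to 30%.

```python
import numpy as np


def fraction_small_weights(chol, sigma_inv, ms, z, threshold=0.1):
    """Fraction of Equation 6.12 weights below `threshold` for mean-shift ms, reusing
    the same standard-normal draws z for every candidate (common random numbers)."""
    mean = np.full(chol.shape[0], ms)
    e = mean + z @ chol.T
    d = e - mean
    log_w = (0.5 * np.einsum("ij,jk,ik->i", d, sigma_inv, d)
             - 0.5 * np.einsum("ij,jk,ik->i", e, sigma_inv, e))
    return float(np.mean(log_w < np.log(threshold)))


def preferred_mean_shift(n_sites, spacing_over_range, candidates,
                         target=0.30, n_draws=5000, seed=0):
    """Hypothetical analysis case: n_sites equally spaced on a line, spacing given in
    units of the range R. Returns the candidate mean-shift whose fraction of weights
    below 0.1 is closest to the 30% target."""
    x = np.arange(n_sites) * spacing_over_range
    sigma = np.exp(-3.0 * np.abs(x[:, None] - x[None, :]))   # Equation 6.11 with h / R
    chol, sigma_inv = np.linalg.cholesky(sigma), np.linalg.inv(sigma)
    z = np.random.default_rng(seed).standard_normal((n_draws, n_sites))
    fractions = np.array([fraction_small_weights(chol, sigma_inv, ms, z) for ms in candidates])
    return float(candidates[int(np.argmin(np.abs(fractions - target)))])


if __name__ == "__main__":
    candidates = np.round(np.arange(0.05, 2.01, 0.05), 2)
    for n_sites, spacing in [(10, 0.1), (10, 1.0), (5, 0.5), (20, 0.5)]:
        ms = preferred_mean_shift(n_sites, spacing, candidates)
        print(f"{n_sites} sites, spacing/R = {spacing}: ms_intra = {ms}")
```

Running it for closely and widely spaced sites, or for few and many sites, shows the selected shift shrinking as the spacing or the number of sites grows, in line with the argument above that closely spaced sites and fewer sites permit larger mean-shifts.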
6.3.4 Simulation of normalized inter-event residuals
Following standard conventions, since the inter-event residual $\tau_{ij}\eta_{ij}$ is constant across all the sites during a single earthquake [e.g., Abrahamson and Youngs, 1992], the simulated normalized inter-event residuals should satisfy the following relation (which does not assume that the $\tau_{ij}$'s are equal, in order to be compatible with ground-motion models such as that of Abrahamson and Silva [2008]):

$$ \eta_{ij} = \frac{\tau_{1j}}{\tau_{ij}}\, \eta_{1j} \quad \forall\, i, j \tag{6.13} $$
Thus the normalized inter-event residuals can be simulated by first simulating $\eta_{1j}$ from a univariate normal distribution with zero mean and unit standard deviation, and by subsequently evaluating the other normalized inter-event residuals using Equation 6.13. The IS procedure for $\eta_{1j}$ is similar to that for $\boldsymbol{\varepsilon}_j$, except that the alternate sampling distribution is univariate normal rather than multivariate normal, and has unit standard deviation and a positive mean $ms_{inter}$. The likelihood ratio in this case is

$$ \frac{f(t)}{g(t)} = \exp\left( \frac{1}{2} (t - ms_{inter})^2 - \frac{1}{2} t^2 \right) \tag{6.14} $$

where $t$ denotes a realization of the normalized inter-event residual.
The authors have found that values of $ms_{inter}$ between 0.5 and 1.0 produce an appropriate number of normalized inter-event residuals from the tail of the distribution.
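A minimal sketch of the inter-event residual sampling: draw $\eta_{1j}$ from $N(ms_{inter}, 1)$, compute the weight of Equation 6.14, and scale to the other sites with Equation 6.13. The $\tau$ values in the example are hypothetical.

```python
import math
import random


def sample_inter_residuals(taus, ms_inter, rng):
    """Draw eta_1j from g(t) = N(ms_inter, 1) and return the normalized inter-event
    residuals at all sites (Equation 6.13) with the weight f(t)/g(t) (Equation 6.14)."""
    t = rng.gauss(ms_inter, 1.0)
    weight = math.exp(0.5 * (t - ms_inter) ** 2 - 0.5 * t ** 2)
    etas = [taus[0] / tau_i * t for tau_i in taus]   # keeps tau_ij * eta_ij constant
    return etas, weight


if __name__ == "__main__":
    rng = random.Random(0)
    taus = [0.26, 0.30, 0.35]   # hypothetical inter-event standard deviations per site
    etas, weight = sample_inter_residuals(taus, ms_inter=1.0, rng=rng)
    print("etas:", etas, " weight:", weight)
```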
6.4 Lifeline risk assessment
The goal of this chapter is to obtain the exceedance curve for a lifeline loss measure denoted $L$ (e.g., travel-time delay in a transportation network) considering seismic hazard. The exceedance curve, which provides the annual exceedance rates of various values of $L$, is the product of the exceedance probability curve and the total recurrence rate of earthquakes exceeding the minimum considered magnitude on all faults:

$$ \nu_{L \geq u} = \left( \sum_{j=1}^{n_f} \nu_j \right) P(L \geq u) \tag{6.15} $$
A simple way to compute the annual exceedance rates, while treating each fault separately, would be to compute $\sum_{j=1}^{n_f} \nu_j P(L_j \geq u)$, where $P(L_j \geq u)$ denotes the exceedance probability for fault $j$, and the $\nu_j$ values account for unequal recurrence rates across faults. That approach is not feasible here because the importance sampling of magnitudes in Equation 6.9, which is based on the combined magnitude density of all faults, makes separation by faults difficult. In Equation 6.15, $P(L \geq u)$ is the probability that the loss due to any earthquake event of interest (irrespective of the fault of occurrence) exceeds $u$. It can be computed using the simulated maps, and in that form it already accounts for the individual $P(L_j \geq u)$ values and the $\nu_j$ values.
6.4.1 Risk assessment based on realizations from Monte Carlo simulation
If a catalog of $n$ intensity maps obtained using the conventional MCS approach is used for the risk assessment, the empirical estimate of the exceedance probabilities, $\hat{P}(L \geq u)$, can be obtained as follows (from Equation 6.4):

$$ \hat{P}(L \geq u) = \frac{1}{n} \sum_{i=1}^{n} I(l_i \geq u) \tag{6.16} $$

where $l_i$ is the loss level corresponding to intensity map $i$, and $I(l_i \geq u)$ is an indicator function that equals 1 if $l_i \geq u$ and 0 otherwise.
6.4.2 Risk assessment based on realizations from importance sampling
The exceedance probability in Equation 6.16 can also be estimated using intensity maps simulated with the importance sampling approach described in section 6.3. Assuming that a catalog of $r$ importance sampling-based intensity maps is used for evaluating the risk, the estimate of the exceedance probability curve can be obtained as follows (from Equation 6.5):

$$ \hat{P}(L \geq u) = \frac{1}{r} \sum_{i=1}^{r} I(l_i \geq u)\, \frac{f_S(i)}{g_S(i)} \tag{6.17} $$
where $f_S(i)/g_S(i)$ is the importance sampling weight corresponding to scenario intensity map $i$, which can be evaluated as follows:

$$ \frac{f_S(i)}{g_S(i)} = \frac{f(m)}{g(m)} \, \frac{f(j \mid m)}{g(j \mid m)} \, \frac{f(\mathbf{e})}{g(\mathbf{e})} \, \frac{f(t)}{g(t)} = \Lambda_i \tag{6.18} $$

where $m$, $j$, $\mathbf{e}$ and $t$ denote the magnitude, fault, normalized intra-event residuals and normalized inter-event residual corresponding to map $i$, respectively. The terms in Equation 6.18 can be obtained from Equations 6.9, 6.10, 6.12 and 6.14.
Equation 6.17 shows that the exceedance probability curve is obtained by weighting the indicator functions by the importance sampling weights for the maps. In the rest of the chapter, this weight is denoted $\Lambda_i$, as shown in Equation 6.18. Using this notation, Equation 6.17 can be rewritten as follows:

$$ \hat{P}(L \geq u) = \frac{1}{r} \sum_{i=1}^{r} I(l_i \geq u)\, \Lambda_i = \frac{\sum_{i=1}^{r} I(l_i \geq u)\, \Lambda_i}{\sum_{i=1}^{r} \Lambda_i} \tag{6.19} $$

The second equality in the above equation comes from the fact that $\sum_{i=1}^{r} \Lambda_i \approx r$ (the importance sampling weights average to one), as seen by substituting $u = 0$ in the equation and noting that $P(L \geq 0) = 1$.
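The variance (var) of this estimate can be shown to be

$$ \mathrm{var}\left[ \hat{P}(L \geq u) \right] = \frac{\sum_{i=1}^{r} \left[ I(l_i \geq u)\, \Lambda_i - \hat{P}(L \geq u) \right]^2}{\left( \sum_{i=1}^{r} \Lambda_i \right) \left( \sum_{i=1}^{r} \Lambda_i - 1 \right)} \tag{6.20} $$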
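Given the losses $l_i$ and weights $\Lambda_i$ of a catalog, Equations 6.15, 6.19 and 6.20 reduce to a few weighted sums, sketched below with hypothetical function names and data. With all weights equal to one, the estimate reduces to the MCS estimate of Equation 6.16 and the variance to $\hat{P}(1 - \hat{P})/(r - 1)$.

```python
def exceedance_probability(losses, weights, u):
    """Equation 6.19: weighted (self-normalized) estimate of P(L >= u)."""
    exceeding = sum(w for loss, w in zip(losses, weights) if loss >= u)
    return exceeding / sum(weights)


def exceedance_variance(losses, weights, u):
    """Equation 6.20: variance of the estimate from exceedance_probability."""
    p_hat = exceedance_probability(losses, weights, u)
    total = sum(weights)
    ss = sum(((w if loss >= u else 0.0) - p_hat) ** 2 for loss, w in zip(losses, weights))
    return ss / (total * (total - 1.0))


def annual_exceedance_rate(fault_rates, losses, weights, u):
    """Equation 6.15: total rate of events above m_min times P(L >= u)."""
    return sum(fault_rates) * exceedance_probability(losses, weights, u)


if __name__ == "__main__":
    losses = [0.0, 2.0, 5.0, 9.0, 14.0]     # hypothetical losses l_i
    weights = [1.6, 1.2, 0.9, 0.8, 0.5]      # hypothetical weights Lambda_i
    for u in (1.0, 5.0, 10.0):
        p = exceedance_probability(losses, weights, u)
        sd = exceedance_variance(losses, weights, u) ** 0.5
        rate = annual_exceedance_rate([0.03, 0.02], losses, weights, u)
        print(f"u = {u}: P = {p:.3f} (sd {sd:.3f}), rate = {rate:.4f} per year")
```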
6.5 Data reduction using K-means clustering
The use of importance sampling causes a significant improvement in the computational
efficiency of the simulation procedure, but the number of required IS intensity maps is
still large and may pose a heavy computational burden. K-means clustering [McQueen,
1967] is thus used as a data reduction technique in order to develop a smaller catalog of
maps by ‘clustering’ simulated ground-motion intensity maps with similar properties (i.e.,
similar spectral acceleration values at the sites of interest). This data reduction procedure is
also used in machine learning and signal processing, where it is called vector quantization
[Gersho and Gray, 1991].
K-means clustering groups a set of observations into $K$ clusters such that the dissimilarity between the observations (typically measured by the Euclidean distance) within a cluster is minimized [McQueen, 1967]. Let $\mathbf{Sa}_1, \mathbf{Sa}_2, \ldots, \mathbf{Sa}_r$ denote the $r$ maps generated using importance sampling that are to be clustered, where each map $\mathbf{Sa}_j$ is a $p$-dimensional vector defined by $\mathbf{Sa}_j = [Sa_{1j}, Sa_{2j}, \ldots, Sa_{pj}]$. The K-means method groups these maps into clusters by minimizing $V$, which is defined as follows:

$$ V = \sum_{i=1}^{K} \sum_{\mathbf{Sa}_j \in S_i} \left\| \mathbf{Sa}_j - \mathbf{C}_i \right\|^2 \tag{6.21} $$

where $K$ denotes the number of clusters, $S_i$ denotes the set of maps in cluster $i$, $\mathbf{C}_i = [C_{1i}, C_{2i}, \ldots, C_{pi}]$ is the cluster centroid obtained as the mean of all the maps in cluster $i$, and $\| \mathbf{Sa}_j - \mathbf{C}_i \|^2$ denotes the squared distance between the map $\mathbf{Sa}_j$ and the cluster centroid $\mathbf{C}_i$. If the Euclidean distance is adopted to measure dissimilarity, then the squared distance between $\mathbf{Sa}_j$ and $\mathbf{C}_i$ is computed as follows:

$$ \left\| \mathbf{Sa}_j - \mathbf{C}_i \right\|^2 = \sum_{q=1}^{p} \left( Sa_{qj} - C_{qi} \right)^2 \tag{6.22} $$
In its simplest version, the K-means algorithm is composed of the following four steps:
Step 1: Pick K maps to denote the initial cluster centroids. This selection can be done
randomly.
Step 2: Assign each map to the cluster with the closest centroid.
Step 3: Recalculate the centroid of each cluster after the assignments.
Step 4: Repeat steps 2 and 3 until no more reassignments take place.
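The four steps can be written directly in NumPy, as in the sketch below, where each row of `maps` is one intensity map $\mathbf{Sa}_j$. This plain version can converge to a local minimum of $V$; production code would typically use a library implementation with several random restarts (for example scikit-learn's `KMeans`).

```python
import numpy as np


def kmeans(maps, k, rng, max_iter=100):
    """Plain K-means on an (r, p) array of intensity maps. Returns (labels, centroids)."""
    # Step 1: pick k distinct maps at random as the initial centroids
    centroids = maps[rng.choice(len(maps), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each map to the closest centroid (squared distance, Equation 6.22)
        sq_dist = ((maps[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        new_labels = sq_dist.argmin(axis=1)
        # Step 4: stop when no map changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its maps
        for c in range(k):
            members = maps[labels == c]
            if len(members) > 0:          # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels, centroids


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical maps of Sa at 4 sites: a weak-shaking group and a strong-shaking group
    maps = np.vstack([0.1 + 0.01 * rng.standard_normal((50, 4)),
                      1.0 + 0.01 * rng.standard_normal((50, 4))])
    labels, centroids = kmeans(maps, 2, rng)
    print(np.bincount(labels), centroids.round(2))
```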
Once all the maps are clustered, the final catalog can be developed by selecting a single map from each cluster, which is used to represent all maps in that cluster on account of the similarity of the maps within a cluster. In other words, if the map selected from a cluster produces loss $l$, it is assumed that all other maps in the cluster produce the same loss $l$ by virtue of similarity. The maps in this smaller catalog can be used in place of the maps generated using importance sampling for the risk assessment (i.e., for evaluating $P(L \geq u)$), which results in a dramatic improvement in computational efficiency. This is particularly useful in applications where it is practically impossible to compute the loss measure $L$ using more than $K$ maps (where $K$ equals a few hundred). In such cases, the maps obtained using IS can be grouped using the K-means method into $K$ clusters, and one map can be randomly selected from each cluster in order to obtain the catalog of intensity maps to be used for the risk assessment. This procedure selects $K$ strongly dissimilar intensity maps as part of the catalog (since each map eliminated is similar to one of the $K$ maps in the catalog), while ensuring that the catalog is 'stochastically representative'. Because only one map from each cluster is now used, the total weight associated with that map should equal the sum of the weights of all the maps in its cluster ($\sum_{i \in c} \Lambda_i$ for cluster $c$). Even though the maps within a cluster are expected to be similar, for probabilistic consistency a map must be chosen from a cluster with a probability proportional to its weight. Equation 6.19 can then be used with these sampled maps and the total weights to compute an exceedance probability curve using the catalog as follows:

$$ \hat{P}(L \geq u) = \frac{\sum_{c=1}^{K} I\left( l^{(c)} \geq u \right) \left( \sum_{i \in c} \Lambda_i \right)}{\sum_{c=1}^{K} \left( \sum_{i \in c} \Lambda_i \right)} \tag{6.23} $$

where $l^{(c)}$ denotes the loss measure associated with the map selected from cluster $c$.
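The catalog construction and Equation 6.23 are sketched below: one map is drawn from each cluster with probability proportional to its weight, and it carries the total weight of its cluster. The cluster labels could come from any K-means implementation; the function names and example data are hypothetical.

```python
import random
from collections import defaultdict


def select_catalog(labels, weights, rng):
    """Pick one map per cluster with probability proportional to its weight and
    assign it the cluster's total weight. Returns {cluster: (map index, total weight)}."""
    members = defaultdict(list)
    for i, c in enumerate(labels):
        members[c].append(i)
    catalog = {}
    for c, idx in members.items():
        w = [weights[i] for i in idx]
        chosen = rng.choices(idx, weights=w, k=1)[0]
        catalog[c] = (chosen, sum(w))
    return catalog


def catalog_exceedance(catalog, loss_of_map, u):
    """Equation 6.23, using the loss of the selected map from each cluster."""
    num = sum(w for i, w in catalog.values() if loss_of_map(i) >= u)
    den = sum(w for _, w in catalog.values())
    return num / den


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 1, 2]                  # hypothetical cluster labels
    weights = [0.5, 1.5, 1.0, 1.0, 1.0, 0.2]     # hypothetical weights Lambda_i
    losses = [1.0, 1.2, 5.0, 4.8, 5.1, 9.0]      # losses, computed only for selected maps
    catalog = select_catalog(labels, weights, random.Random(0))
    print(catalog, catalog_exceedance(catalog, lambda i: losses[i], 5.0))
```

Only the $K$ selected maps need a loss evaluation (here, a network analysis), which is where the computational saving comes from.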
Appendix 6.8 shows that the exceedance probabilities obtained using Equation 6.23 are unbiased. This, together with the fact that all the random variables are accounted for appropriately, is why the selected catalog is described as stochastically representative. The computational efficiency of this procedure can be improved further with minor modifications to the clustering approach, as described in Appendix 6.9.
6.6 Application: Seismic risk assessment of the San Francisco Bay Area transportation network
In this section, the San Francisco Bay Area transportation network is used to illustrate the feasibility of the proposed risk assessment framework. The aim is to show that the seismic risk estimated using the catalog of 150 intensity maps matches well with the seismic risk estimated using the conventional MCS framework and a much greater number of maps (which is the benchmark approach). The catalog size of 150 is chosen because a catalog of this size is likely to be tractable in a real-life lifeline risk assessment problem. If reduced accuracy and reduced emphasis on very large losses are acceptable, the number of maps could be reduced even further. Alternatively, a larger number of maps can be chosen if the computational demand remains tractable.
6.6.1 Network data
The San Francisco Bay Area transportation network data are obtained from Stergiou and Kiremidjian [2006]. Figure 6.2a shows the Metropolitan Transportation Commission (MTC) San Francisco Bay Area highway network, which includes 29,804 links (roads) and 10,647 nodes. The network also contains 1,125 bridges from the five counties of the Bay Area. Stergiou and Kiremidjian [2006] classified these bridges based on their structural properties in accordance with the HAZUS [1999] manual. (The HAZUS [1999] fragility functions are used here only for illustrative purposes, and more realistic fragility functions can be used if applicable.) This classification is useful for estimating the structural damage to bridges due to various simulated intensity maps. The Bay Area network consists of a total of 1,120 transportation analysis zones (TAZ), which are used to predict the trip demand in specific geographic areas. The origin-destination (OD) data provided by Stergiou and Kiremidjian [2006] were obtained from the 1990 MTC household survey [Purvis, 1999].
Analyzing the performance of a network as large and complex as the San Francisco
Bay Area transportation network under a large number of scenarios is extremely computa-
tionally intensive. Therefore, an aggregated representation of the Bay Area network is used
for this example application. The aggregated network consists predominantly of freeways
and expressways, along with the ramps linking the freeways and expressways. The nodes
are placed at locations where links intersect or change in characteristics (e.g., change in
the number of lanes). The aggregated network comprises 586 links and 310 nodes, and
is shown in Figure 6.2b. Of the 310 nodes, 46 are denoted centroidal nodes that act as
origins and destinations for the traffic. These centroidal nodes are chosen from the cen-
troidal nodes of the original network in such a way that they are spread out over the entire
transportation network. The data from the 1990 MTC household survey are aggregated to
obtain the traffic demands at each centroidal node. The aggregation involves assigning the
traffic originating or culminating in any TAZ to its nearest centroidal node. Of the 1,125
bridges in the original network, 1,038 bridges lie on the links of the aggregated network
and are considered in the risk assessment procedure.
While the performance of the aggregated network may differ from that of the full network, the aggregated network serves as a reasonably realistic and complex test
case for the proposed framework, to demonstrate its feasibility. The goal is to demonstrate
that the data reduction techniques proposed here produce the same exceedance curve as
the more exhaustive MCS. The simplified network is simple enough that MCS is feasible,
but still retains the spatial distribution and network effects that are characteristic of more
complex models. If the proposed techniques can be shown to be effective for this simplified
model, then they can be used with more complex models where validation using MCS is
not feasible.
6.6.2 Transportation network loss measure
A popular measure of network performance is the travel-time delay experienced by pas-
sengers in a network after an earthquake [Stergiou and Kiremidjian, 2006, Shiraki et al.,
2007]. The delay is computed as the difference between the total travel time in the network
before and after an earthquake.
Estimating travel time in the network
The total travel time ($T$) in a network is estimated as follows:

$$ T = \sum_{i \in links} x_i\, t_i(x_i) \tag{6.24} $$
where $x_i$ denotes the traffic flow on link $i$ and $t_i(x_i)$ denotes the travel time of an individual passenger on link $i$. The travel time on link $i$ is obtained as follows [Bureau of Public Roads, 1964]:

$$ t_i(x_i) = t_i^f \left[ 1 + \alpha \left( \frac{x_i}{c_i} \right)^{\beta} \right] \tag{6.25} $$

where $t_i^f$ denotes the free-flow link travel time (i.e., the travel time of a passenger if link $i$ were empty), $c_i$ is the capacity of link $i$, and $\alpha$ and $\beta$ are calibration parameters, taken as 0.15 and 4 respectively [Shiraki et al., 2007].
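Equations 6.24 and 6.25 translate directly into code. The sketch below uses the calibration values $\alpha = 0.15$ and $\beta = 4$ given above, while the link data in the example are hypothetical. The travel-time delay is the difference between the total travel times computed with post-earthquake and pre-earthquake flows and capacities.

```python
def bpr_travel_time(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
    """Equation 6.25: per-passenger travel time on a link (Bureau of Public Roads form)."""
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


def total_travel_time(links):
    """Equation 6.24: sum over links of flow x_i times per-passenger time t_i(x_i).
    `links` is a list of (flow, free_flow_time, capacity) tuples."""
    return sum(x * bpr_travel_time(x, tf, c) for x, tf, c in links)


if __name__ == "__main__":
    # Hypothetical links: (flow [veh/h], free-flow time [min], capacity [veh/h])
    before = [(1000.0, 10.0, 1000.0), (500.0, 5.0, 1000.0)]
    after = [(1000.0, 10.0, 750.0), (500.0, 5.0, 1000.0)]   # first link at 75% capacity
    print("delay (same flows, no rerouting):", total_travel_time(after) - total_travel_time(before))
```

The example keeps the flows fixed for brevity; in the procedure described below, the flows are re-solved with the user-equilibrium model after the capacities change.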
Travel times on transportation networks are usually computed using the user equilib-
rium principle [Beckman et al., 1956], which states that each individual user would follow
the route that will minimize his or her travel time. Based on the user-equilibrium principle,
the link flows in the network are obtained by solving the following optimization problem:

$$ \min \sum_{i \in links} \int_0^{x_i} t_i(u)\, du \tag{6.26} $$

subject to the following constraints:

$$ \sum_{j \in paths} f_j^{od} = Q_{od} \quad \forall\, o \in org,\ d \in dest \tag{6.27} $$

$$ x_i = \sum_{o \in org} \sum_{d \in dest} \sum_{j \in paths} f_j^{od}\, \delta_{ji}^{od} \quad \forall\, i \in links \tag{6.28} $$

$$ f_j^{od} \geq 0 \quad \forall\, o \in org,\ d \in dest,\ j \in paths \tag{6.29} $$

where $f_j^{od}$ denotes the flow between origin $o$ and destination $d$ that passes through path $j$ (here, a path denotes a set of links through which the flow between a specified origin and a specified destination occurs), $Q_{od}$ denotes the desired flow between $o$ and $d$, $\delta_{ji}^{od}$ is an indicator variable that equals 1 if link $i$ lies on path $j$ and 0 otherwise, $org$ denotes the set of all origins, and $dest$ denotes the set of all destinations. The current research work uses
a popular solution technique for this optimization problem provided by Frank and Wolfe
[1956]. It is to be noted that there are also other travel time and traffic flow estimation
techniques such as the dynamic user equilibrium formulation [e.g., Friesz et al., 1993] that
could incorporate the non-equilibrium conditions which might exist after an earthquake.
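To make the optimization concrete, the sketch below applies the Frank-Wolfe method to Equations 6.26-6.29 for a single origin-destination pair whose candidate paths are listed explicitly. This is a simplifying assumption for illustration: a network-scale implementation would find shortest paths with a graph search (e.g., Dijkstra's algorithm) in each iteration and load all OD pairs. The line search bisects on the derivative of the objective, which increases along the search direction because the BPR travel times increase with flow.

```python
def bpr(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
    """Equation 6.25 (BPR link travel time)."""
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


def frank_wolfe(paths, demand, free_flow, capacity, n_iter=100):
    """User-equilibrium link flows (Equations 6.26-6.29) for ONE origin-destination
    pair whose candidate paths are given as lists of link indices."""
    n_links = len(free_flow)

    def times(x):
        return [bpr(x[i], free_flow[i], capacity[i]) for i in range(n_links)]

    def all_or_nothing(t):
        # Load the whole demand onto the currently cheapest path
        best = min(paths, key=lambda path: sum(t[i] for i in path))
        y = [0.0] * n_links
        for i in best:
            y[i] += demand
        return y

    def slope(x, d, lam):
        # Derivative of the objective (6.26) along x + lam * d
        t = times([xi + lam * di for xi, di in zip(x, d)])
        return sum(di * ti for di, ti in zip(d, t))

    x = all_or_nothing(times([0.0] * n_links))   # start from free-flow conditions
    for _ in range(n_iter):
        y = all_or_nothing(times(x))
        d = [yi - xi for yi, xi in zip(y, x)]
        if slope(x, d, 1.0) <= 0.0:
            lam = 1.0
        else:
            lo, hi = 0.0, 1.0                    # bisection for the zero of the slope
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if slope(x, d, mid) > 0.0:
                    hi = mid
                else:
                    lo = mid
            lam = 0.5 * (lo + hi)
        x = [xi + lam * di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    # Hypothetical network: two parallel routes (links 0 and 1) between one OD pair
    flows = frank_wolfe([[0], [1]], 2000.0, free_flow=[10.0, 12.0], capacity=[1000.0, 1000.0])
    print("link flows:", flows)
    print("link times:", [bpr(f, tf, 1000.0) for f, tf in zip(flows, [10.0, 12.0])])
```

At the returned flows, the travel times on all used paths are (nearly) equal, which is the user-equilibrium condition stated above.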
Post-earthquake network performance
The current work assumes for simplicity that the post-earthquake demands equal the pre-
earthquake demands even though this is known not to be true [Kiremidjian et al., 2003].
The changes in network performance after an earthquake are assumed to be due only to the
delay and rerouting of traffic caused by structural damage to bridges. The damage states of
the bridges are computed considering only the ground shaking, and other possible damage
mechanisms such as liquefaction are not considered. The bridge fragility curves provided
by HAZUS [1999] are used to estimate the probability of a bridge being in or exceeding a particular damage state (no damage, slight damage, etc.) based on the simulated ground-motion intensity (spectral acceleration at 1 second) at the bridge site. These damage state probabilities are then used to simulate the damage state of the bridge following the earthquake. Damaged bridges cause reduced capacity in the link containing the bridge. The reduced capacities corresponding to the five HAZUS damage states are 100% (no damage), 75% (slight or moderate damage) and 50% (extensive damage or collapse).
The non-zero capacity corresponding to the bridge collapse damage state may seem sur-
prising at first glance. This is based on the argument that there are alternate routes (apart
from the freeways and highways considered in the model) that provide reduced access to
transportation services in the event of a freeway or a highway closure [Shiraki et al., 2007].
Such redundancies are prevalent in most transportation networks.
A network can have several bridges in a single link, and in such cases, the link capacity
is a function of the damage to all the bridges in the link. The current work assumes that the
link capacity reduction equals the average of the capacity reductions attributable to each
bridge in the link. This is a simplification, and further research is needed to handle the
presence of multiple bridges in a link. The post-earthquake network performance is then
computed by solving the user-equilibrium problem using the new set of link capacities,
and a new estimate of the total travel time in the network is obtained. It is to be noted
that the current work estimates the performance of the network only immediately after an
earthquake. The changes in the performance with network component restorations are not
considered here for simplicity.
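The damage simulation and the link-capacity rule can be sketched as follows. The lognormal form of the fragility curves, the median spectral accelerations and the dispersion of 0.6 are placeholders standing in for the class-specific HAZUS [1999] values, and the function names are hypothetical.

```python
import math
import random

# Placeholder fragility parameters: medians of Sa(1.0 s) [g] for slight, moderate,
# extensive and complete damage, with an assumed lognormal dispersion
MEDIANS = [0.25, 0.35, 0.45, 0.60]
DISPERSION = 0.6
# Remaining capacity fraction for none, slight, moderate, extensive, complete
CAPACITY_FRACTION = [1.00, 0.75, 0.75, 0.50, 0.50]


def exceedance_probs(sa):
    """P(DS >= ds | Sa) for each damage state from lognormal fragility curves."""
    return [0.5 * (1.0 + math.erf(math.log(sa / m) / (DISPERSION * math.sqrt(2.0))))
            for m in MEDIANS]


def sample_damage_state(sa, rng):
    """Return 0 (none) to 4 (complete): the highest state whose exceedance
    probability is larger than a single uniform draw."""
    u = rng.random()
    state = 0
    for k, p in enumerate(exceedance_probs(sa), start=1):
        if u < p:
            state = k
    return state


def link_capacity(base_capacity, bridge_sas, rng):
    """Reduce the link capacity by the average of the capacity fractions of its bridges."""
    if not bridge_sas:
        return base_capacity
    fractions = [CAPACITY_FRACTION[sample_damage_state(sa, rng)] for sa in bridge_sas]
    return base_capacity * sum(fractions) / len(fractions)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical link with three bridges and simulated Sa(1.0 s) values in g
    print(link_capacity(2000.0, [0.2, 0.4, 0.7], rng))
```

A single uniform draw per bridge suffices because the exceedance probabilities decrease from slight to complete damage, so the events $\{u < P(DS \geq ds)\}$ are nested.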
6.6.3 Ground-motion hazard
The San Francisco Bay Area seismicity information is obtained from USGS [2003]. Ten
active faults and fault segments are considered. The characteristic magnitude-recurrence
relationship of Youngs and Coppersmith [1985] is used to model $f(m)$ with the distribution parameters specified by the USGS, and 5.0 is taken as the lower-bound magnitude of interest. The flattening of this magnitude distribution towards the maximum magnitude value (Figure 6.1a) accounts for the higher probability of occurrence of the characteristic earthquake on the fault [Youngs and Coppersmith, 1985]. The ground-motion model of
Boore and Atkinson [2008] is used to obtain the median ground-motion intensities and the
standard deviations of the residuals needed in Equation 6.1.
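For reference, the characteristic magnitude density can be evaluated directly. The sketch below assumes the form of the Youngs and Coppersmith [1985] model as it is commonly presented in PSHA tutorials: an exponential (Gutenberg-Richter shaped) segment up to $m_{max} - 0.5$ and a flat characteristic segment between $m_{max} - 0.5$ and $m_{max}$. The function name and the parameter values used below (b-value 1.0, $m_{max}$ = 8.0) are hypothetical and are not the USGS parameters used in this study.

```python
import math


def youngs_coppersmith_pdf(m, m_min, m_max, b_value):
    """Characteristic magnitude density f(m), assuming the usual tutorial form.

    Exponential between m_min and m_max - 0.5, constant between
    m_max - 0.5 and m_max, and zero outside [m_min, m_max].
    """
    beta = b_value * math.log(10.0)
    denom = 1.0 - math.exp(-beta * (m_max - m_min - 0.5))
    flat = beta * math.exp(-beta * (m_max - m_min - 1.5)) / denom
    # Ratio of the flat segment's probability mass to the exponential segment's.
    c = 0.5 * flat
    if m < m_min or m > m_max:
        return 0.0
    if m <= m_max - 0.5:
        return beta * math.exp(-beta * (m - m_min)) / denom / (1.0 + c)
    return flat / (1.0 + c)


if __name__ == "__main__":
    # Midpoint-rule check that the density integrates to one.
    m_min, m_max, b = 5.0, 8.0, 1.0
    n = 30000
    h = (m_max - m_min) / n
    total = sum(youngs_coppersmith_pdf(m_min + (k + 0.5) * h, m_min, m_max, b)
                for k in range(n)) * h
    print(total)
```

The jump in density at $m_{max} - 0.5$ is what produces the flattening of the magnitude distribution near the maximum magnitude.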
6.6.4 Results and discussion
Risk assessment using importance sampling
The IS framework requires that the parameters of the sampling distribution for the magni-
tude and the residuals be chosen reasonably in order to obtain reliable results efficiently.
The set of parameters includes the appropriate stratification for magnitudes, the mean-shift
for normalized inter-event residuals ($ms_{inter}$) and the mean-shift for normalized intra-event residuals ($ms_{intra}$).
The stratification of the range of magnitudes is carried out so as to obtain a desired
histogram of magnitudes. The partition width is chosen to be 0.3 between 5.0 and 6.5, 0.15
between 6.5 and 7.3 and 0.05 beyond 7.3. The results obtained using the simulations are
not significantly affected by moderate variations in the partitions, suggesting that the strat-
ification will be effective as long as it is chosen to preferentially sample large magnitudes.
Normalized inter-event residuals are sampled using an $ms_{inter}$ of 1.0. Using the procedure described earlier, the value of $ms_{intra}$ is fixed at 0.3.
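The stratification can be sketched as follows. To keep the example self-contained, a truncated exponential (Gutenberg-Richter) density stands in for the Youngs and Coppersmith f(m), one magnitude is drawn from f(m) restricted to each partition by inverse-CDF sampling, and each draw carries the probability mass of its partition as its weight. These choices, the function names, and the values $m_{max}$ = 8.0 and b = 1.0 are assumptions for illustration. Because the quoted widths do not tile the range exactly (0.8 is not a multiple of 0.15), the sketch simply lets the last 0.15-wide partition extend past 7.3.

```python
import math
import random


def partition_edges(m_min, m_max):
    """Partition edges: width 0.3 below M6.5, 0.15 up to M7.3, 0.05 above."""
    edges = [m_min]
    m = m_min
    while m < m_max - 1e-9:
        if m < 6.5 - 1e-9:
            width = 0.3
        elif m < 7.3 - 1e-9:
            width = 0.15
        else:
            width = 0.05
        m = min(round(m + width, 6), m_max)  # round to suppress float drift
        edges.append(m)
    return edges


def gr_cdf(m, m_min, m_max, beta):
    """CDF of the truncated exponential magnitude distribution."""
    return (1.0 - math.exp(-beta * (m - m_min))) / (1.0 - math.exp(-beta * (m_max - m_min)))


def gr_ppf(p, m_min, m_max, beta):
    """Inverse of gr_cdf."""
    return m_min - math.log(1.0 - p * (1.0 - math.exp(-beta * (m_max - m_min)))) / beta


def stratified_magnitudes(m_min, m_max, b_value, rng):
    """One magnitude per partition; weight = probability mass of the partition."""
    beta = b_value * math.log(10.0)
    edges = partition_edges(m_min, m_max)
    mags, weights = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p_lo = gr_cdf(lo, m_min, m_max, beta)
        p_hi = gr_cdf(hi, m_min, m_max, beta)
        mags.append(gr_ppf(rng.uniform(p_lo, p_hi), m_min, m_max, beta))
        weights.append(p_hi - p_lo)
    return mags, weights


if __name__ == "__main__":
    mags, weights = stratified_magnitudes(5.0, 8.0, 1.0, random.Random(0))
    print(len(mags), sum(weights))
```

Narrow partitions at large magnitudes produce many large-magnitude samples, each with a small weight, so the weighted sample still represents f(m).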
The loss measure of interest here is the travel-time delay (i.e., the variable L denoting
loss measure in the previous section is the travel-time delay). Figure 6.3a shows the ex-
ceedance curve for travel-time delays obtained using the IS framework. This exceedance
curve is obtained by sampling 25 magnitudes, each of which is then positioned on the active faults as described in Section 6.3.2, and 50 sets of inter- and intra-event residuals for each magnitude-location pair (a total of 12,500 maps). To validate the IS framework, an exceedance curve is also estimated using the benchmark method (MCS). Strictly, the benchmark approach should use MCS to sample both the magnitudes and the ground-motion residuals. This is computationally prohibitive, however, even for the aggregated network, and hence the benchmark approach used in the current study uses IS for generating the magnitudes but MCS for the residuals. IS of a single random variable has been shown to be effective in a wide variety of applications, including lifeline risk assessment [Kiremidjian et al., 2003], so it is not validated further here. The simulation procedure for intra-event residuals, on the other hand, involves the novel application of IS to a correlated vector of random variables, and is hence the focus of the validation study described in this section.
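A minimal sketch of the mean-shifted sampling of normalized residuals and the corresponding likelihood-ratio weights is given below, assuming numpy. For a residual vector sampled from $N(\mathbf{s}, \mathbf{R})$ instead of $N(\mathbf{0}, \mathbf{R})$, the weight is $\exp(-\mathbf{s}^{T}\mathbf{R}^{-1}\boldsymbol{\varepsilon} + \frac{1}{2}\mathbf{s}^{T}\mathbf{R}^{-1}\mathbf{s})$, and the scalar inter-event case is analogous. The correlation matrix comes from a hypothetical exponential model with a 20 km range and hypothetical site coordinates; the shifts use the values quoted above ($ms_{inter}$ = 1.0, $ms_{intra}$ = 0.3). Only the residual part of the map weights is shown, and the function names are illustrative.

```python
import numpy as np


def exponential_correlation(coords_km, range_km):
    """Hypothetical isotropic correlation model rho(h) = exp(-3 h / range)."""
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    return np.exp(-3.0 * np.linalg.norm(diff, axis=-1) / range_km)


def sample_shifted_residuals(corr, ms_intra, ms_inter, n_maps, rng):
    """Sample normalized residuals from mean-shifted normal distributions.

    Returns intra-event residual vectors (n_maps x p), inter-event residuals
    (n_maps,), and the likelihood-ratio weights f(x) / g(x) that undo the shift.
    """
    p = corr.shape[0]
    shift = np.full(p, ms_intra)
    chol = np.linalg.cholesky(corr)
    eps = shift + rng.standard_normal((n_maps, p)) @ chol.T  # eps ~ N(shift, corr)
    eta = ms_inter + rng.standard_normal(n_maps)              # eta ~ N(ms_inter, 1)
    r_inv_s = np.linalg.solve(corr, shift)
    log_w_intra = -eps @ r_inv_s + 0.5 * shift @ r_inv_s
    log_w_inter = -ms_inter * eta + 0.5 * ms_inter ** 2
    return eps, eta, np.exp(log_w_intra + log_w_inter)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(0.0, 50.0, size=(10, 2))
    corr = exponential_correlation(coords, range_km=20.0)
    eps, eta, w = sample_shifted_residuals(corr, 0.3, 1.0, 200_000, rng)
    # Weighted estimates of 1 and of P(eps_1 > 1) = 1 - Phi(1) under the
    # original (unshifted) distribution.
    print(w.mean(), np.mean(w * (eps[:, 0] > 1.0)))
```

The shifted samples over-represent large positive residuals, and the weights restore the original probabilities in any weighted estimate.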
Figure 6.3a also shows the exceedance curve obtained using IS for generating 25 magnitudes and MCS for generating 500 sets of inter- and intra-event residuals per magnitude-location pair, resulting in a total of 125,000 maps. As seen from the figure, the exceedance
curve obtained using the IS framework closely matches that obtained using the benchmark
method, indicating the accuracy of the results obtained using IS. This is further substan-
tiated by Figure 6.3b, which plots the estimated coefficient of variation (CoV) (computed
using Equations 6.19 and 6.20) of the exceedance rates obtained using the IS approach and
Figure 6.2: (a) San Francisco Bay Area transportation network (b) Aggregated network.
Figure 6.3: (a) Travel-time delay exceedance curves (b) Coefficient of variation of the annual exceedance rate (c) Comparison of the efficiency of MCS, IS and the combination of K-means and IS (d) Travel-time delay exceedance curve obtained using the K-means method.
the benchmark approach. It can be seen from the figure that the CoV values corresponding
to travel-time delays obtained using IS are comparable to those obtained using MCS even
though IS uses one-tenth the number of simulations required by MCS. Further, using IS in place of MCS for simulating magnitudes typically reduces the computational expense of the risk assessment by a factor of 10, and hence the overall IS framework reduces the number of computations required for the risk assessment by a factor of nearly 100. Note that IS produces unbiased risk estimates, so any minor deviation between the IS and MCS curves in Figure 6.3a is due to the small variance of the risk estimates rather than to bias.
Risk assessment using IS and K-means clustering
The 12,500 maps obtained using IS are next grouped into 150 clusters using the K-means
method. A catalog is then developed by randomly sampling one map from each cluster
in accordance with the map weights, as described in Section 6.5. This catalog is used to
estimate the travel-time delay exceedance curve based on Equation 6.23, and the curve is
seen to match reasonably well with the exceedance curve obtained using the IS technique
(Figure 6.3a). Based on the authors’ experience, the deviation of this curve from the IS
curve at the large delay levels is a result of the variance of the exceedance rates rather than
any systematic deviation. The variance in the exceedance curves is a consequence of the
fact that the map sampled from each cluster is not identical to the other maps in the cluster
(although they are similar).
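The catalog construction and the weighted exceedance estimate can be sketched as below, assuming numpy, cluster labels from any clustering step, and IS weights $\Lambda_i$ for the maps. Following the expression used in Appendix 6.8, the representative of each cluster is drawn with probability proportional to the member weights and then carries the total weight of its cluster, so that the exceedance probability is $\sum_c I(l^{(c)} \ge u)\sum_{i \in c}\Lambda_i / \sum_i \Lambda_i$. The names `build_catalog` and `exceedance_probability`, the stand-in maps and weights, and the toy loss function are hypothetical.

```python
import numpy as np


def build_catalog(labels, weights, rng):
    """Pick one map per cluster with probability proportional to its weight.

    Returns the indices of the representative maps and the total weight of
    each cluster, which the representative carries in the catalog.
    """
    reps, cluster_weights = [], []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        w = weights[members]
        reps.append(rng.choice(members, p=w / w.sum()))
        cluster_weights.append(w.sum())
    return np.array(reps), np.array(cluster_weights)


def exceedance_probability(losses, weights, thresholds):
    """Weighted exceedance probability sum I(l >= u) w / sum w for each u."""
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    u = np.asarray(thresholds, dtype=float)
    exceed = (losses[None, :] >= u[:, None]).astype(float)
    return exceed @ weights / weights.sum()


def toy_loss(maps):
    """Hypothetical stand-in for a network analysis: sum of intensities."""
    return maps.sum(axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = rng.lognormal(size=(1000, 5))        # stand-in intensity maps
    weights = rng.uniform(0.5, 1.5, size=1000)  # stand-in IS weights
    labels = rng.integers(0, 20, size=1000)     # stand-in cluster labels
    reps, cw = build_catalog(labels, weights, rng)
    u = np.array([5.0, 10.0, 20.0])
    # Only the 20 representative maps require a loss evaluation.
    print(exceedance_probability(toy_loss(maps[reps]), cw, u))
    print(exceedance_probability(toy_loss(maps), weights, u))
```

When every map in a cluster produces the same loss, the catalog estimate coincides with the full IS estimate; the spread between the two curves reflects the within-cluster dissimilarity discussed above.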
To ascertain the variance of the exceedance rates, the clustering and the map selection
processes are repeated several times in order to obtain multiple catalogs of 150 represen-
tative ground-motion intensities, which are then used for obtaining multiple exceedance
curves. The coefficients of variation of the exceedance rates are then computed from these
multiple exceedance curves and are plotted in Figure 6.3b. It can be seen that the CoV
values obtained using the 150 maps generated by the IS and K-means combination are
about three times larger than those obtained using the 12,500 IS maps and the 125,000
MCS maps. This is to be expected, given the large reduction in the number of maps. The factor-of-three increase in the CoVs, however, is significantly smaller than the increase that would result if IS or MCS were used to obtain the 150 maps directly. This can be
seen from Figure 6.3b, which shows the large CoV values of the exceedance rates obtained using 150 ground-motion maps selected directly using the IS and the MCS procedures. Alternatively, the relative performance of the IS and K-means combination, the IS method and the MCS method can be assessed by comparing the number of maps each method must simulate to achieve the same CoVs. It is seen that 3,500 IS maps and 11,750 MCS maps are necessary to produce CoVs similar to those achieved using the 150 maps from the IS and K-means combination (Figure 6.3c).
Finally, Figure 6.3d shows the mean exceedance rates, along with the empirical 95% (point-wise) confidence interval obtained using the K-means method. Also shown in this figure is the exceedance curve obtained using the IS technique. The mean K-means curve and the IS curve match very closely, indicating that the sampling and data reduction procedure suggested in this work results in unbiased exceedance rates (this is also established theoretically in Appendix 6.8). The width of the confidence interval is reasonably small, especially considering that the exceedance rates have been obtained using only 150 intensity maps.
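Given exceedance curves from repeated clustering and map selection, the point-wise CoV and the empirical 95% interval reduce to simple statistics across repetitions. A minimal sketch, assuming numpy and curves stacked with one row per repetition and one column per loss threshold; the function name is hypothetical, and the use of the sample standard deviation (ddof=1) is a choice made here rather than a detail from the text.

```python
import numpy as np


def pointwise_summary(curves, level=0.95):
    """Summarize repeated exceedance curves (rows = repetitions, cols = thresholds).

    Returns the mean curve, the coefficient of variation at each threshold
    (NaN where the mean is zero), and the empirical point-wise interval.
    """
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)
    std = curves.std(axis=0, ddof=1)
    cov = np.full_like(mean, np.nan)
    np.divide(std, mean, out=cov, where=mean > 0)
    alpha = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(curves, [alpha, 100.0 - alpha], axis=0)
    return mean, cov, lower, upper


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for 100 repetitions of a catalog evaluated at 4 delay levels.
    curves = np.abs(rng.normal([1e-2, 1e-3, 1e-4, 1e-5],
                               [1e-3, 2e-4, 4e-5, 8e-6], size=(100, 4)))
    mean, cov, lo, hi = pointwise_summary(curves)
    print(cov.round(2))
```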
If the K-means clustering procedure is effective, intensity maps in a cluster will be sim-
ilar to each other. Therefore, the travel-time delays associated with all the maps in a cluster
should be similar to one another, and different from the travel-time delays associated with
the maps in other clusters. In other words, the mean travel-time delays computed using all
the maps in one cluster should be different from the mean from other clusters, while the
standard deviation of the travel-time delays in a cluster should be small as a result of the
similarity within a cluster. Conversely, ‘random clustering’ in which the maps obtained
from the IS are randomly placed in clusters irrespective of their properties would be very
inefficient. Figure 6.4 compares the mean and the standard deviation of cluster travel-time
delays obtained using K-means clustering and random clustering. The smoothly varying cluster means obtained using K-means, compared with the nearly uniform means obtained using random clustering, show that K-means has successfully separated dissimilar intensity maps. Similarly, the cluster standard deviations obtained using K-means are
considerably smaller than the standard deviations obtained using random clustering for the
most part (and are large for larger cluster numbers because all delays in these clusters are
large). The occasional spikes in the standard deviations are a result of small sample sizes
in some clusters.
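The diagnostic behind Figure 6.4 amounts to computing per-cluster statistics of the travel-time delays and renumbering clusters in order of increasing mean delay. A minimal sketch, assuming numpy and arrays of delays and cluster labels. The function name is hypothetical; the population standard deviation (ddof=0) is used so that single-map clusters return zero; and the "sorted" labels in the usage example are only a stand-in for a good clustering, since they are built from the delays themselves, while the random baseline permutes the same labels so cluster sizes are preserved.

```python
import numpy as np


def cluster_delay_stats(delays, labels):
    """Mean and standard deviation of delays within each cluster, with
    clusters renumbered in order of increasing mean delay."""
    delays = np.asarray(delays, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    means = np.array([delays[labels == c].mean() for c in ids])
    stds = np.array([delays[labels == c].std() for c in ids])
    order = np.argsort(means)
    return means[order], stds[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delays = rng.exponential(scale=1.0, size=1500)
    sorted_labels = np.empty(1500, dtype=int)
    sorted_labels[np.argsort(delays)] = np.arange(1500) // 10
    random_labels = rng.permutation(sorted_labels)
    for name, lab in [("sorted", sorted_labels), ("random", random_labels)]:
        m, s = cluster_delay_stats(delays, lab)
        print(name, m.min().round(3), m.max().round(3), s.mean().round(3))
```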
In summary, the exceedance curves obtained and the results from the tests for the ef-
ficiency of K-means clustering indicate that the clustering method has been successful in
identifying and grouping similar maps together. As a consequence, substantial computational savings can be achieved by eliminating redundant (similar) maps, without considerably affecting the accuracy of the exceedance rates. Note that this approach is primarily meant for modeling the upper tail of the risk curve accurately. A conventional Monte Carlo approach might be more appropriate when more frequently exceeded losses, such as the median loss, are of interest.
Hazard consistency
The proposed framework not only produces reasonably accurate loss estimates, but also
intensity maps that are hazard consistent. In other words, the site hazard curves obtained
based on the final catalog of intensity maps match the site ground-motion hazard curves
obtained from the fault and the ground-motion model using numerical integration (i.e.,
traditional PSHA). Figures 6.5a and b show the site hazard curves at two different sites
obtained using numerical integration, importance sampling (for magnitudes and residuals)
and the combination of importance sampling and K-means clustering. It can be seen that the
sampling and clustering framework reasonably reproduces the site ground-motion hazard
obtained through numerical integration.
6.6.5 Importance of modeling ground-motion uncertainties and spatial correlations
The transportation network risk assessment is repeated assuming uncorrelated intra-event residuals, and the resulting exceedance curve is plotted in Figure 6.6. It can be seen that the risk is considerably underestimated when the spatial correlations are ignored. Further, some past risk assessments have completely ignored the uncertainty in the ground-motion intensities (i.e., median intensity maps are used, and inter- and intra-event residuals are ignored). A risk assessment carried out this way, also plotted in Figure 6.6, shows that the risk is underestimated even more substantially in this case. This happens because the
Figure 6.4: (a) Mean of travel-time delays within a cluster (b) Standard deviation of travel-time delays within a cluster. With both clustering methods, cluster numbers are assigned in order of increasing mean travel-time delay within the cluster for plotting purposes.
Figure 6.5: Comparison of site hazard curves obtained at two sample sites using the sampling framework with those obtained using numerical integration. (a) Sample site 1 and (b) Sample site 2.
possibility of observing above-median ground-motion intensities during a given earthquake
is not considered. Such simplifications clearly introduce significant errors into the risk
calculations, and should thus be avoided.
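A short calculation illustrates why ignoring correlation understates the tail. Network losses depend on the intensities at many sites jointly; as a rough proxy (an assumption made only for this illustration), consider the average normalized intra-event residual over p sites. Its variance is $\mathbf{1}^{T}\mathbf{R}\mathbf{1}/p^{2}$, which equals $1/p$ for independent residuals but is larger when they are positively correlated, so large region-wide exceedances become far less likely under the independence assumption. The sketch assumes numpy, a hypothetical exponential correlation model with a 20 km range, and a hypothetical line of 20 sites spaced 5 km apart.

```python
import math

import numpy as np


def prob_average_exceeds(corr, threshold):
    """P(mean of p standard normal residuals with correlation corr > threshold)."""
    p = corr.shape[0]
    sd = math.sqrt(corr.sum()) / p  # sqrt(1' R 1) / p
    return 0.5 * math.erfc(threshold / (sd * math.sqrt(2.0)))


if __name__ == "__main__":
    x = np.arange(20) * 5.0  # 20 sites, 5 km apart
    corr = np.exp(-3.0 * np.abs(x[:, None] - x[None, :]) / 20.0)
    for name, r in [("correlated", corr), ("independent", np.eye(20))]:
        print(name, prob_average_exceeds(r, threshold=1.0))
```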
6.7 Conclusions
An efficient simulation-based framework, based on importance sampling and K-means clustering, has been proposed for the seismic risk assessment of lifelines.
The framework can be used for developing a small, but stochastically-representative cat-
alog of ground-motion intensity maps that can be used for performing lifeline risk as-
sessments. The importance sampling technique is used to preferentially sample important
ground-motion intensity maps, and the K-means clustering technique is used to identify
and combine redundant maps. It is shown theoretically and empirically that the risk esti-
mates obtained using these techniques are unbiased. The study proposes importance sam-
pling schemes that can be used for sampling earthquake magnitudes, rupture locations,
inter-event residuals and spatially correlated maps of intra-event residuals. Magnitudes are
sampled by first stratifying the magnitude range of interest into smaller partitions and by
selecting one magnitude from each partition. The partitions are made narrower at larger
magnitudes to ensure that larger magnitudes are preferentially sampled. The normalized
residuals are sampled from a normal distribution with a positive mean, rather than a zero
mean, to sample more large positive residuals. Techniques are also suggested to estimate
the optimal parameters of these alternate sampling density functions. The proposed frame-
work was used to evaluate the exceedance rates of various travel-time delays on an ag-
gregated form of the San Francisco Bay Area transportation network. Simplified trans-
portation network analysis models were used to illustrate the feasibility of the proposed
framework. The exceedance rates were obtained using a catalog of 150 maps generated
using the combination of importance sampling and K-means clustering, and were shown
to be in good agreement with those obtained using the conventional Monte Carlo simula-
tion method. Therefore, the proposed techniques can reduce the computational expense of
a simulation-based risk assessment by several orders of magnitude, making it practically
feasible. The efficiency of the proposed technique was compared to that of conventional
techniques using the coefficient of variation (CoV) of the exceedance rates. It was shown
that the CoVs achieved using the 150 maps obtained from the combination of importance-
sampling and K-means clustering can only be reproduced by 3,500 importance-sampling
maps and 11,750 MCS maps (conventional MCS for residuals and importance sampling for
magnitudes), thereby indicating the efficiency of the proposed technique. The study also
showed that the proposed framework automatically produces intensity maps that are hazard
consistent. Finally, the study showed that the uncertainties in ground-motion intensities and
the spatial correlations between ground-motion intensities at multiple sites must be mod-
eled in order to avoid introducing significant errors into the lifeline risk calculations. For
the network considered in this work, ignoring spatial correlations results in about a 30% reduction in the estimated travel-time delays at small annual exceedance rates ($10^{-6}$/year), while ignoring uncertainties results in about a 70% reduction in the estimated travel-time delays at small exceedance rates.
6.8 Appendix: Proof that the exceedance rates obtained
using IS and K-means clustering are unbiased
This section illustrates that the loss (e.g., travel-time delay) exceedance rates obtained using
a catalog of ground-motion intensities generated by the IS and K-means framework are un-
biased. Since the importance sampling procedure produces unbiased estimates [Fishman,
2006], it will suffice to establish that the exceedance rates obtained using the K-means
clustered catalog of maps are unbiased estimators of the exceedance rates obtained using
the IS maps. This proof will further support the empirical observation that the example
exceedance rates from the different procedures are equivalent.
Let $l_1, l_2, \ldots, l_r$ denote the loss measures (e.g., travel-time delay in a transportation network) corresponding to the $r$ intensity maps obtained using importance sampling. Let $\Lambda_1, \Lambda_2, \ldots, \Lambda_r$ denote the weights corresponding to the maps, as defined in Equation 6.18. Let $P_{IS}$ denote the exceedance probability curve obtained using the IS maps (Equation 6.19). Assume that the $r$ maps are grouped into $K$ clusters. (This proof does not require knowledge about the clustering technique used.) Let $l^{(c)}$ be the travel-time delay in the network corresponding to the map selected from cluster $c$. The exceedance probability curve $P_{KM}(L \ge u)$ can be obtained from the catalog $\left[l^{(1)}, l^{(2)}, \ldots, l^{(K)}\right]$ based on Equation 6.23.
Unbiasedness can be established by showing that the expected value of $P_{KM}(L \ge u)$ equals $P_{IS}(L \ge u)$. The expected value of $P_{KM}(L \ge u)$ is computed using the law of iterated expectations, by first conditioning on a possible grouping $G$ (i.e., a possible grouping of maps into clusters obtained using the clustering method), and then computing the expectation over all possible groupings. The following equations describe this procedure:
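$$
\begin{aligned}
E\left[P_{KM}(L \ge u)\right] &= E\left[\frac{\sum_{c=1}^{K} I\left(l^{(c)} \ge u\right)\sum_{i \in c}\Lambda_i}{\sum_{c=1}^{K}\sum_{i \in c}\Lambda_i}\right] \qquad (6.30)\\
&= E\left[\frac{\sum_{c=1}^{K} I\left(l^{(c)} \ge u\right)\sum_{i \in c}\Lambda_i}{\sum_{i=1}^{r}\Lambda_i}\right]\\
&= E_G\left\{E\left[\frac{\sum_{c=1}^{K} I\left(l^{(c)} \ge u\right)\sum_{i \in c}\Lambda_i}{\sum_{i=1}^{r}\Lambda_i} \,\middle|\, G\right]\right\}\\
&= E_G\left[\frac{1}{\sum_{i=1}^{r}\Lambda_i}\sum_{c=1}^{K} P\left(l^{(c)} \ge u \mid G\right)\sum_{i \in c}\Lambda_i\right]\\
&= E_G\left[\frac{1}{\sum_{i=1}^{r}\Lambda_i}\sum_{c=1}^{K}\frac{\sum_{j \in c} I\left(l_j \ge u\right)\Lambda_j}{\sum_{j \in c}\Lambda_j}\sum_{i \in c}\Lambda_i\right]\\
&= E_G\left[\frac{1}{\sum_{i=1}^{r}\Lambda_i}\sum_{c=1}^{K}\sum_{j \in c} I\left(l_j \ge u\right)\Lambda_j\right]\\
&= \frac{1}{\sum_{i=1}^{r}\Lambda_i}\sum_{c=1}^{K}\sum_{j \in c} I\left(l_j \ge u\right)\Lambda_j\\
&= \frac{\sum_{i=1}^{r} I\left(l_i \ge u\right)\Lambda_i}{\sum_{i=1}^{r}\Lambda_i} = P_{IS}(L \ge u)
\end{aligned}
$$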
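The expression substituted for $P(l^{(c)} \ge u \mid G)$ follows because, conditional on $G$, the map representing cluster $c$ is selected with probability proportional to its weight (Section 6.5); the outer expectation then drops out because $\sum_{c=1}^{K}\sum_{j \in c} I(l_j \ge u)\Lambda_j$ is a sum over all $r$ maps and does not depend on $G$. This shows that the exceedance rates obtained using the small catalog of ground-motion intensities are unbiased.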
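The argument can also be checked numerically. The sketch below, assuming numpy, fixes an arbitrary grouping of a small set of weighted maps, computes the exact expectation of $P_{KM}(L \ge u)$ over the weighted choice of representatives, and compares it with $P_{IS}(L \ge u)$ and with a Monte Carlo average over repeated catalogs. The losses, weights and grouping are random stand-ins, and the function names are hypothetical.

```python
import numpy as np


def p_is(losses, weights, u):
    """P_IS(L >= u): weighted fraction of all IS maps exceeding u."""
    return np.sum(weights * (losses >= u)) / np.sum(weights)


def expected_p_km(losses, weights, labels, u):
    """Exact E[P_KM(L >= u) | G] when each cluster's representative is drawn
    with probability proportional to its weight."""
    total = 0.0
    for c in np.unique(labels):
        m = labels == c
        p_exceed = np.sum(weights[m] * (losses[m] >= u)) / np.sum(weights[m])
        total += p_exceed * np.sum(weights[m])
    return total / np.sum(weights)


def sampled_p_km(losses, weights, labels, u, rng):
    """P_KM(L >= u) for one randomly selected catalog."""
    num = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rep = rng.choice(idx, p=weights[idx] / weights[idx].sum())
        num += (losses[rep] >= u) * weights[idx].sum()
    return num / np.sum(weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    losses = rng.exponential(size=40)
    weights = rng.uniform(0.1, 2.0, size=40)
    labels = rng.integers(0, 6, size=40)
    u = 1.0
    mc = np.mean([sampled_p_km(losses, weights, labels, u, rng) for _ in range(5000)])
    print(p_is(losses, weights, u), expected_p_km(losses, weights, labels, u), mc)
```

The first two printed values agree to rounding error for any grouping, which is the content of the proof; the third fluctuates around them.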
6.9 Appendix: Improving the computational efficiency of
the K-means clustering method
Clustering a large number of intensity maps (e.g., 12,500) in a single step may be compu-
tationally prohibitive on computers with limited memory and processing ability, because
clustering involves repetitive computations of the distance between each map and the clus-
ter centroids. In such cases, the authors propose the following two-step clustering technique
in which the maps are preliminarily grouped into clusters using a simplified distance mea-
sure, followed by a rigorous final clustering step using the distance measure defined in
Equation 6.22. This two-step process is described below.
In the preliminary clustering step, the intensity maps are grouped into a small number of preliminary clusters, with the distance between map $\mathbf{Sa}_j$ and centroid $\mathbf{C}_i$ computed as $\left(\sum_{q=1}^{p} Sa_{qj} - \sum_{q=1}^{p} C_{qi}\right)^2$. In other words, the distance measure is based on the sum of the
intensities corresponding to the intensity map. The sum of the intensities is chosen as the
basis for clustering since it has been seen in past research [Campbell and Seligson, 2003]
and in the current research work to be a reasonable indicator of the risk associated with an
intensity map. Further, the K-means method is extremely fast when the distance is based
on a single parameter.
The final clustering step is used to refine the preliminary clusters, and involves further
clustering within each preliminary cluster using the distance measure defined in Equation
6.22. If 50 preliminary clusters are used, each of these could be subdivided into 3 clusters
using the K-means method. Even though the more rigorous distance measure is used in this step, it is much faster because each final clustering operates only on the comparatively few maps within one preliminary cluster. Further, the memory demand in this case is much smaller than when clustering is carried out in a single step.
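The two-step procedure can be sketched as follows, assuming numpy and Euclidean distances between maps in the final step; the distance of Equation 6.22 is not reproduced here, so the Euclidean choice is an assumption. The first step runs K-means on the one-dimensional feature $\sum_q Sa_{qj}$, which is equivalent to the preliminary distance above because the centroid of the sums equals the sum of the centroid's components. The helper names, iteration count and random initialization are illustrative, and the stand-in maps are synthetic.

```python
import numpy as np


def kmeans(x, k, rng, n_iter=100):
    """Plain Lloyd iterations on rows of x (shape n x d). Returns labels."""
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):  # keep the old centroid if a cluster empties
                centroids[c] = x[labels == c].mean(axis=0)
    return labels


def two_step_kmeans(maps, n_prelim, n_sub, rng):
    """Preliminary K-means on the sum of intensities, then K-means on the
    full maps within each preliminary cluster. Returns global labels."""
    prelim = kmeans(maps.sum(axis=1, keepdims=True), n_prelim, rng)
    labels = np.empty(len(maps), dtype=int)
    next_id = 0
    for c in np.unique(prelim):
        idx = np.flatnonzero(prelim == c)
        sub = kmeans(maps[idx], min(n_sub, len(idx)), rng)
        _, sub = np.unique(sub, return_inverse=True)  # renumber as 0..m-1
        labels[idx] = next_id + sub
        next_id += sub.max() + 1
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = rng.lognormal(mean=-1.5, sigma=0.6, size=(1200, 30))  # stand-in Sa maps
    labels = two_step_kmeans(maps, n_prelim=50, n_sub=3, rng=rng)
    print(len(np.unique(labels)))  # at most 50 x 3 = 150 clusters
```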
Figure 6.7 shows the (point-wise) confidence intervals of the travel-time delay ex-
ceedance curves obtained using the two-step clustering procedure, where 50 preliminary
clusters are each subdivided into three final clusters. It can be seen from Figures 6.3d
and 6.7 that the results obtained using both the single-step and the two-step clustering ap-
proaches are essentially identical. For this application, the two-step clustering procedure is
five times faster than the single-step clustering procedure.
Figure 6.6: Exceedance curves obtained using simplifying assumptions.
Figure 6.7: Travel-time delay exceedance curve obtained using the two-step clustering technique.
Chapter 7
Lifeline performance assessment using statistical learning techniques
7.1 Abstract
Chapter 6 proposed a simulation-based method involving importance sampling and K-
means clustering to efficiently generate a small catalog of stochastically-representative
ground-motion maps that can be used for lifeline risk assessment. The current study fo-
cuses on the highly computationally demanding task of estimating the confidence interval
for the risk estimates obtained using this simulation-based method. Estimating the confi-
dence intervals is computationally intensive because it requires repetitive risk calculations
(in order to estimate a variance for the risk estimates) that in turn involve numerous life-
line performance evaluations. In order to reduce the computational demand, the catalog
of ground-motion maps generated in Chapter 6 is used in conjunction with a statistical
learning technique called Multiple Additive Regression Trees (MART) to develop an
approximate relationship between the lifeline performance and the ground-motion inten-
sities during an earthquake. The lifeline performance predicted by this relationship can
be used in place of the exact lifeline performance (the evaluation of which is intensive) to
expedite the computation of several lifeline risk-related parameters, including confidence
intervals.
CHAPTER 7. LEARNING TECHNIQUES FOR PERFORMANCE ASSESSMENT 145
Figure 7.1: Sample ground-motion map corresponding to an earthquake on the San Andreas fault. A map is a collection of ground movement levels (ground-motion intensities) at all the sites of interest. The sites of interest, in this case, are located in the San Francisco Bay Area.
7.2 Introduction
Probabilistic seismic risk assessment for lifelines is less straightforward than for individual structures. Lifelines, by virtue of their large size and geographic spread, are affected by earthquakes that originate on several faults, which necessitates the consideration of numerous probable future earthquake scenarios. Further, lifeline risk assessment is based
on a large vector of spatially-correlated ground-motion intensities. The link between the
ground-motion intensities at the sites and the performance of the lifeline is usually not
available in closed form. These complexities make it difficult to use analytical frameworks
for lifeline risk assessment. As a result, Monte Carlo simulation (MCS)-based methods are
commonly used for characterizing spatial ground motions and for estimating lifeline risk
[e.g., Campbell and Seligson, 2003, Crowley and Bommer, 2006, Kiremidjian et al., 2007,
Shiraki et al., 2007].
In the MCS approach, several possible future ground-motion maps (which are collec-
tions of ground-motion intensities at all the sites of interest) are probabilistically gener-
ated, and the performance of the lifeline is evaluated under each intensity map. (A sample
ground-motion map due to a magnitude 8 earthquake on the San Andreas fault is shown
in Figure 7.1. This particular map has been simulated without consideration of local-site
effects purely for illustration purposes, but the studies carried out in this thesis include
local-site effects.) This approach is, however, highly computationally intensive, primarily
because it involves repeated evaluations of lifeline performance under a large number of
simulated ground-motion intensity maps. In the past, researchers have used several sim-
plifying assumptions (e.g., a single dominating scenario earthquake, deterministic ground-
motion intensities, absence of spatial correlation) in order to reduce the required number of
simulations. These simplifications can, however, lead to inaccuracies in the risk assessment
results, as discussed elsewhere in the thesis.
Chapter 6 [Jayaram and Baker, 2010] proposed a simulation-based method involving
importance sampling and K-means clustering to efficiently generate a small catalog of
stochastically-representative ground-motion maps that can be used for lifeline risk assess-
ment. Importance sampling is used to preferentially sample events with extreme ground-
motion intensities that contribute to the lifeline risk. K-means clustering is used to eliminate
redundant intensity maps (i.e., maps that are similar to other maps). That chapter showed that the risk estimates obtained using this small catalog are in good agreement with those obtained using conventional MCS with a much larger number of simulations.
The current study focuses on the highly computationally demanding task of estimating the confidence intervals for the risk estimates obtained using the simulation-based method described above. Estimating the confidence intervals is computationally intensive because it requires repetitive risk calculations (in order to estimate a variance for the risk estimates) that in turn involve numerous lifeline performance evaluations. In order to reduce the computational demand, the catalog of ground-motion maps generated in Chapter 6 is used in conjunction with a statistical learning technique called Multiple Additive Regression Trees (MART) [Friedman, 1999] to develop an approximate relationship between
the lifeline performance and the ground-motion intensities during an earthquake. The life-
line performance predicted by this relationship can be used in place of the exact lifeline
performance (the evaluation of which is intensive) to expedite the computation of several
lifeline risk-related parameters. One notable work in this regard is that of Guikema [2009]
who proposed to use approximate regression relationships for evaluating the lifeline per-
formance. That work, however, is purely conceptual and does not give concrete examples.
Chapter 6 estimated the travel-time delay exceedance curves for the San Francisco Bay
Area transportation network. In this study, the performance relationship developed using
MART is used for estimating confidence intervals for these curves. It is seen that the
confidence intervals obtained using MART match well with those obtained using the exact
loss function.
7.3 Brief introduction to ground-motion map sampling
This section describes the conventional Monte Carlo ground-motion sampling procedure
as well as the importance sampling and K-means clustering procedures used in Chapter 6.
7.3.1 Conventional MCS of ground-motion maps
The distribution of the ground-motion intensity at any particular site is predicted using a
ground-motion model, which takes the following form [e.g., Boore and Atkinson, 2008,
Abrahamson and Silva, 2008, Chiou and Youngs, 2008, Campbell and Bozorgnia, 2008]:
$$
\ln(Sa_{ij}) = \ln\left(\overline{Sa}_{ij}\right) + \sigma_{ij}\varepsilon_{ij} + \tau_{ij}\eta_{ij} \qquad (7.1)
$$
where $Sa_{ij}$ denotes the spectral acceleration (at the period of interest) at site $i$ during earthquake $j$; $\overline{Sa}_{ij}$ denotes the median spectral acceleration predicted by the ground-motion model, which depends on parameters such as magnitude, distance, period and local-site conditions; $\varepsilon_{ij}$ denotes the normalized intra-event residual and $\eta_{ij}$ denotes the normalized inter-event residual. Both $\varepsilon_{ij}$ and $\eta_{ij}$ are univariate normal random variables with zero mean and unit standard deviation. $\sigma_{ij}$ and $\tau_{ij}$ are standard deviation terms that are estimated as part of the ground-motion model and are functions of the spectral period of interest, and in some models also functions of the earthquake magnitude and the distance of the site from the rupture.
Probabilistic sampling of ground-motion intensity at multiple sites involves the follow-
ing steps [Crowley and Bommer, 2006, Jayaram and Baker, 2010]:
Step 1: Use MCS to generate earthquakes of varying magnitudes on the active faults in the region, considering appropriate magnitude-recurrence relationships (e.g., the Gutenberg-Richter relationship).
Step 2: Using a ground-motion model (Equation 7.1), obtain the median ground-motion intensities ($\overline{Sa}_{ij}$) and the standard deviations of the intra-event and the inter-event residuals ($\sigma_{ij}$ and $\tau_{ij}$, respectively) at all the sites.
Step 3: Generate the normalized inter-event residual term ($\eta_{ij}$) by sampling from the univariate normal distribution.
Step 4: Simulate the normalized intra-event residuals ($\varepsilon_{ij}$'s) using the parameters predicted by the ground-motion model. Chapter 2 [Jayaram and Baker, 2008] showed that a vector of spatially-distributed normalized intra-event residuals $\boldsymbol{\varepsilon}_j = (\varepsilon_{1j}, \varepsilon_{2j}, \ldots, \varepsilon_{pj})$ follows a multivariate normal distribution. Hence, the distribution of $\boldsymbol{\varepsilon}_j$ can be completely defined using the mean (zero) and standard deviation (one) of $\varepsilon_{ij}$, and the correlation between all $\varepsilon_{i_1 j}$ and $\varepsilon_{i_2 j}$ pairs. The correlations between the residuals can be obtained from a predictive model calibrated using past ground-motion intensity observations [Jayaram and Baker, 2009a, Wang and Takada, 2005].
Step 5: Combine the median intensities, the normalized intra-event residuals and the normalized inter-event residual for each earthquake in accordance with Equation 7.1 to obtain ground-motion intensity maps (i.e., obtain $\mathbf{Sa}_j = (Sa_{1j}, Sa_{2j}, \ldots, Sa_{pj})$).
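Steps 1 to 5 can be sketched as follows, assuming numpy. Everything model-specific is a hypothetical stand-in: a single point source at the origin instead of faults, a truncated Gutenberg-Richter magnitude distribution, a toy median model $\ln \overline{Sa} = c_0 + c_1 M - c_2 \ln(R + h)$ with made-up coefficients, constant $\sigma$ and $\tau$, and an exponential correlation model with a 20 km range. An actual application would use a published ground-motion model (e.g., Boore and Atkinson [2008]) and a calibrated correlation model.

```python
import numpy as np


def simulate_maps(site_xy_km, n_maps, rng, m_min=5.0, m_max=8.0, b_value=1.0,
                  sigma=0.5, tau=0.3, corr_range_km=20.0):
    """Conventional MCS of ground-motion maps following Steps 1-5 (toy models).

    Returns an (n_maps x p) array of spectral accelerations and the magnitudes.
    """
    beta = b_value * np.log(10.0)
    # Step 1: magnitudes from a truncated Gutenberg-Richter law (inverse CDF).
    u = rng.uniform(size=n_maps)
    mags = m_min - np.log(1.0 - u * (1.0 - np.exp(-beta * (m_max - m_min)))) / beta
    # Step 2: hypothetical median model; sigma and tau are constants here.
    r = np.linalg.norm(site_xy_km, axis=1)  # point source at the origin
    ln_median = -1.0 + 0.9 * mags[:, None] - 1.2 * np.log(r[None, :] + 10.0)
    # Step 3: one normalized inter-event residual per earthquake.
    eta = rng.standard_normal(n_maps)
    # Step 4: spatially correlated normalized intra-event residuals.
    d = np.linalg.norm(site_xy_km[:, None, :] - site_xy_km[None, :, :], axis=-1)
    chol = np.linalg.cholesky(np.exp(-3.0 * d / corr_range_km))
    eps = rng.standard_normal((n_maps, len(site_xy_km))) @ chol.T
    # Step 5: combine according to Equation 7.1.
    return np.exp(ln_median + sigma * eps + tau * eta[:, None]), mags


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sites = rng.uniform(-40.0, 40.0, size=(25, 2))
    sa, mags = simulate_maps(sites, n_maps=1000, rng=rng)
    print(sa.shape, mags.min().round(2), mags.max().round(2))
```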
7.3.2 Importance sampling of ground-motion maps
Most past studies use random MCS (based on the original distributions of magnitudes and residuals) for simulating ground-motion maps (with the notable exception of Kiremidjian et al. [2007], who used importance sampling for magnitudes). While small
magnitude earthquakes and average values of residuals are highly probable, they are less
interesting for risk assessment purposes, where we are interested in large values of these
random variables. Hence, Chapter 6 proposed to sample these random variables prefer-
entially from the tails of their distributions by sampling from alternate distributions. The
Figure 7.2: (a) Stratified sampling of earthquake magnitudes (b) Importance sampling of residuals.
magnitudes were simulated using stratified sampling, where the entire range of magnitudes
was stratified into bins, with the bin width being large at small magnitudes and small at
large magnitudes (Figure 7.2a), and one magnitude was selected from each bin. This en-
sures an adequate sampling of large magnitude events. The residuals were sampled from
a multivariate normal distribution with positive means for the residuals rather than zero
means (Figure 7.2b) (in order to sample large values of residuals). Overall, the large mag-
nitude events combined with large positive residuals lead to large values of ground-motion
intensities in the sampled maps. It was seen that the importance sampling procedure reduces the number of samples needed for the risk assessment by two orders of magnitude.
7.3.3 K-means clustering
Importance sampling significantly improves the computational efficiency of the simulation procedure, but the number of required IS intensity maps is still large and may pose a heavy computational burden. K-means clustering [McQueen, 1967] was used in Chapter 6 as a data reduction technique in order to develop a smaller
catalog of maps by ‘clustering’ simulated ground-motion intensity maps with similar prop-
erties (i.e., similar spectral acceleration values at the sites of interest), and subsequently
using only one map from each cluster. The clustering was performed using the K-means
algorithm, which partitions a set of observations into K clusters so as to minimize the dissimilarity (typically measured by the Euclidean distance) between the observations within each cluster [McQueen, 1967].
In its simplest version, the K-means algorithm comprises the following four steps:
Step 1: Pick K maps to serve as the initial cluster centroids (this selection can be done randomly).
Step 2: Assign each map to the cluster with the closest centroid.
Step 3: Recalculate the centroid of each cluster after the assignments.
Step 4: Repeat steps 2 and 3 until no more reassignments take place.
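The four steps above can be sketched in numpy as plain K-means (Lloyd's algorithm), treating each map as a vector of intensities at the sites of interest. This is an illustrative implementation, not the code used in the thesis; library implementations such as scikit-learn's `KMeans` add smarter initialization and multiple restarts.

```python
import numpy as np

def kmeans(maps, k, rng, max_iter=100):
    """Plain K-means on ground-motion maps.

    maps: array of shape (n_maps, n_sites), e.g. ln Sa at each site.
    Returns (labels, centroids)."""
    # Step 1: pick k distinct maps at random as the initial centroids.
    centroids = maps[rng.choice(len(maps), size=k, replace=False)].copy()
    labels = np.full(len(maps), -1)
    for _ in range(max_iter):
        # Step 2: assign each map to the closest centroid (Euclidean distance).
        d2 = ((maps[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # Step 4: stop once no map changes cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its member maps.
        for c in range(k):
            members = maps[labels == c]
            if len(members) > 0:          # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels, centroids
```

Each pass of steps 2 and 3 cannot increase the within-cluster sum of squared distances, which is why the loop terminates; the result can depend on the random initialization in step 1.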
For instance, Figure 7.3 shows four simulated ground-motion maps, two of which are similar enough to be grouped together. Once all the maps are clustered, the final catalog is developed by selecting one map from each cluster, which represents all the maps in that cluster by virtue of their similarity. In other words, if the map selected from a cluster produces loss l, all other maps in that cluster are assumed to produce the same loss l. The maps in this smaller catalog can then be used in place of the full set of importance-sampled maps for the loss assessment, which dramatically improves the computational efficiency.
Both importance sampling and K-means clustering make the maps in the final set unequally probable (i.e., the maps are not equally likely). Hence, suitable weights (e.g., importance sampling weights) are assigned to these maps so that the risk estimates obtained from them are unbiased. The details of these weight calculations and a proof of unbiasedness can be found in Chapter 6.
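One simple way to carry weights through the clustering step is sketched below. It is an assumed scheme, not necessarily the one derived in Chapter 6: each cluster's representative is drawn with probability proportional to the members' importance sampling weights and inherits the cluster's total weight. Under that selection rule, the expected weighted sum over representatives equals the full IS weighted sum, so the reduced catalog stays unbiased. The weights here are assumed to be normalized so that a weighted sum over all m maps estimates an expectation.

```python
import numpy as np

def reduce_catalog(labels, is_weights, rng):
    """Pick one representative map per cluster (hypothetical helper).

    The representative is drawn with probability proportional to its IS weight
    and receives the cluster's total weight, so for any loss g,
    E[sum_c W_c * g(rep_c)] = sum_i w_i * g(x_i)."""
    reps, rep_w = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        w = is_weights[idx]
        reps.append(rng.choice(idx, p=w / w.sum()))
        rep_w.append(w.sum())
    return np.array(reps), np.array(rep_w)
```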
7.4 Confidence intervals for lifeline risk estimates
Chapter 6 used the catalog of maps generated using IS and K-means clustering (described above) to obtain the travel-time delay exceedance curve (i.e., the rates of exceedance of various travel-time delays) for the San Francisco Bay Area transportation network. The goal here is to obtain confidence intervals for these exceedance rates in a computationally efficient manner.
Figure 7.3: Four simulated ground-motion maps, two of which are reasonably similar and grouped together into one cluster.
Figure 7.4: (a) The San Francisco Bay Area transportation network (b) Aggregated model.
7.4.1 Network data
This section describes the properties of the San Francisco Bay Area transportation network used as the sample lifeline in this work. The relevant network data were obtained from Stergiou and Kiremidjian [2006]. Figure 7.4a shows the Metropolitan Transportation Commission (MTC) San Francisco Bay Area highway network, which consists of 29,804 links (roads) and 10,647 nodes. The network also contains 1,125 bridges in the five counties of the Bay Area. The traffic demand-supply data were obtained from the 1990 MTC household survey [Purvis, 1999].
Analyzing the performance of a network as large and complete as the San Francisco Bay Area transportation network is extremely computationally intensive, particularly under maps generated by conventional MCS. Therefore, an aggregated representation of the Bay Area network is used for this example application. The aggregated network consists predominantly of freeways and expressways, along with the ramps linking them. Nodes are placed at locations where links intersect or change characteristics (e.g., a change in the number of lanes). The aggregated network comprises 586 links and 310 nodes (Figure 7.4b). While the performance of the aggregated network
may differ from that of the full network, the aggregated network should serve as a reasonably realistic and complex test case for the proposed framework. If desired, the methods developed here can be applied to the complete network as well.
7.4.2 Ground-motion hazard data
The San Francisco Bay Area seismicity information is obtained from USGS [2003]. Ten active faults and fault segments are considered in the current work. The characteristic magnitude-recurrence relationship of Youngs and Coppersmith [1985] is used to model the probability density function of magnitudes, with the distribution parameters specified by the USGS. The ground-motion model of Boore and Atkinson [2008] is used to obtain the median ground-motion intensities and the standard deviations of the residuals needed in Equation 7.1.
7.4.3 Statistical description of the problem
Let $\mathbf{X}$ denote the ground-motion intensities at all the sites of interest in one ground-motion map. The number of sites equals 1,125 (the number of bridges in the network), so $\mathbf{X}$ is a fairly high-dimensional vector. Let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m$ denote importance sampling realizations of $\mathbf{X}$. Assume that these realizations are partitioned into K clusters using K-means clustering. (The clustering attempts to minimize the sum of the Euclidean distances between the vectors in the clusters and the cluster medians, as described in Section 6.5 of this thesis.) The lifeline losses are then computed using just one map sampled from each cluster, $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(K)}$, in place of the m original samples. The loss estimates are appropriately weighted (the weights are denoted $w^{(i)}$) to ensure statistical consistency. Complete details of the weights can be found in Chapter 6.
It is of interest to empirically estimate the exceedance curve of a loss function L (e.g., travel-time delay) and the corresponding confidence interval (CI). The rate of exceedance of a loss value equals the rate of occurrence of earthquakes multiplied by the probability of exceedance of the loss value, $P(L > l)$. The probability of exceedance can be estimated
Figure 7.5: Exceedance rates of travel-time delays.
as follows:

$$P(L > l) = \sum_{i=1}^{K} I\left[L(\mathbf{x}^{(i)}) > l\right] w^{(i)} \qquad (7.2)$$

where $I[\cdot]$ is an indicator function and the $w^{(i)}$'s are the weights referred to earlier.
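Equation 7.2 and the conversion from exceedance probability to exceedance rate can be written in a few lines. The sketch below assumes the K catalog losses and their weights are already available; `eq_rate` is a hypothetical name for the total rate of occurrence of earthquakes.

```python
import numpy as np

def exceedance_probability(losses, weights, thresholds):
    """Equation 7.2: P(L > l) = sum_i I[L(x^(i)) > l] * w^(i), for each threshold l."""
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    exceeds = (losses[None, :] > thresholds[:, None]).astype(float)  # (n_thresholds, K)
    return exceeds @ weights

def exceedance_rate(losses, weights, thresholds, eq_rate):
    """Rate of exceedance = rate of earthquakes x P(L > l)."""
    return eq_rate * exceedance_probability(losses, weights, thresholds)

# Three catalog maps with losses 1, 5 and 10 and weights 0.7, 0.2 and 0.1:
p = exceedance_probability([1.0, 5.0, 10.0], [0.7, 0.2, 0.1], [0.0, 2.0, 7.0, 20.0])
# p is approximately [1.0, 0.3, 0.1, 0.0]
```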
A sample exceedance curve (which provides the rates of observing various levels of travel-time delay on the aggregated transportation network) is shown in Figure 7.5. The loss function $L(\mathbf{x})$ used in this case is the travel-time delay induced by ground-motion map $\mathbf{x}$. (Structural damage to the bridges increases the free-flow travel times on the roads, and hence the overall travel time in the network.) The network delays are computed using the static user-equilibrium framework [Frank and Wolfe, 1956]. This study aims to obtain a pointwise CI for the exceedance rates of losses.
7.4.4 Confidence intervals using bootstrap
The confidence intervals (CIs) for the risk estimates can be obtained by repeating the entire risk assessment process several times to obtain multiple exceedance curves, and estimating the CIs as the quantiles of these curves. In other words, this procedure involves repeating the IS and K-means clustering procedures multiple times to obtain multiple catalogs of 150 ground-motion maps each. Each catalog is used to obtain one exceedance curve, and the CIs are estimated as the quantiles of this set of exceedance
curves.
Applying IS multiple times can be computationally inefficient, so the current work uses bootstrap resampling to simplify this procedure. For simplicity, denote the original set of importance-sampled maps, $(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$, as $\mathbf{x}$. The first step of the procedure is to obtain B bootstrap realizations of $\mathbf{x}$ (denoted $\mathbf{x}^{*b}$ for $b = 1, \ldots, B$). A bootstrap realization of $\mathbf{x}$ is a set of maps sampled with replacement from $(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)$ [Efron and Tibshirani, 1997]. (In other words, the sets $\mathbf{x}^{*b}$ are obtained by bootstrapping the original set rather than by resampling using IS.) The second step is to cluster each $\mathbf{x}^{*b}$ into 150 clusters, pick one map from each cluster, and compute $\theta^{*(b)}(l) = P[L(\mathbf{x}^{*(b)}) > l]$ for all b and all values of l of interest, where $\mathbf{x}^{*(b)}$ denotes the 150 maps obtained after clustering $\mathbf{x}^{*b}$ and selecting one map from each cluster. The collection of $\theta^{*(b)}(l)$'s over all values of l constitutes the probability of exceedance curve obtained using the bootstrapped and clustered set of ground-motion maps $\mathbf{x}^{*(b)}$. The pointwise bootstrap confidence interval is then estimated from the quantiles of the replicates $\theta^{*(b)}(l)$ at each value of l [Davison and Hinkley, 1997]. In essence, this procedure repeats the simulation procedure several times (using the bootstrap) and obtains the confidence intervals from the quantiles of the resulting collection of exceedance curves.
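The resampling-and-quantile logic can be sketched generically. In this sketch the cluster-and-evaluate step is abstracted into a caller-supplied function `curve_fn` (a hypothetical name); in the thesis that step would cluster $\mathbf{x}^{*b}$ into 150 clusters, keep one map per cluster, and apply Equation 7.2 with the appropriate weights. The usage example replaces maps with scalar synthetic losses and an unweighted empirical exceedance curve.

```python
import numpy as np

def bootstrap_pointwise_ci(maps, curve_fn, n_boot, level, rng):
    """Pointwise percentile bootstrap CI for an exceedance curve.

    maps:     array whose first axis indexes the m importance-sampled maps
    curve_fn: callable mapping a resampled set x*b to theta*(b)(l) on a fixed
              grid of l values (hypothetical stand-in for clustering + Eq. 7.2)
    Returns (lower, upper), one value per l."""
    m = len(maps)
    curves = np.array([curve_fn(maps[rng.integers(0, m, size=m)])  # x*b, drawn with replacement
                       for _ in range(n_boot)])                    # shape (B, number of l values)
    alpha = 1.0 - level
    lower = np.quantile(curves, alpha / 2, axis=0)
    upper = np.quantile(curves, 1.0 - alpha / 2, axis=0)
    return lower, upper

# Usage with synthetic scalar "losses" standing in for maps.
rng = np.random.default_rng(1)
losses = rng.exponential(1.0, size=2000)
grid = np.array([0.5, 1.0, 2.0])

def curve(x):
    return (x[:, None] > grid[None, :]).mean(axis=0)

lo, hi = bootstrap_pointwise_ci(losses, curve, n_boot=500, level=0.95, rng=rng)
```

Each call to `curve_fn` is one full exceedance-curve evaluation, which is exactly the cost that motivates the approximate loss function discussed next.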
The biggest hurdle in the above procedure is computing $P[L(\mathbf{x}^{*(b)}) > l]$ B times, given that it is computationally intensive to estimate this quantity even once (which is why importance sampling and clustering are used in the first place). This is not the case for the aggregated network used in this study, but it is certainly true for real-life networks. Hence, an approximate loss function is obtained using a non-parametric regression between the lifeline loss (L) and the ground-motion intensities ($\mathbf{x}^{*b}$), and this approximate function is used in place of the exact loss function when evaluating the B values of $P[L(\mathbf{x}^{*(b)}) > l]$. The procedure used for obtaining the approximate loss function is described in the next section.
7.4.5 Approximate loss estimation using non-parametric regression
Application of MART to loss estimation
Multiple additive regression trees (MART) is a methodology for predictive data mining (regression and classification). Given a set of input ground-motion maps and the corresponding loss values, the goal is to find a function $F(\mathbf{x})$ that maps each map $\mathbf{x}$ to its loss L such that, over the joint distribution of all input-loss pairs, the expected value of the squared prediction error is minimized. MART is a gradient boosting algorithm [Friedman, 1999] that expresses this function F as an additive expansion of the form

$$\hat{L} = F(\mathbf{x}) = \sum_{p=0}^{P} \beta_p h(\mathbf{x}; \mathbf{a}_p) \qquad (7.3)$$

where $\hat{L}$ denotes the predicted loss value, and the functions $h(\mathbf{x}; \mathbf{a}_p)$, called 'base learners', are functions of $\mathbf{x}$ with parameters $\mathbf{a}_p$. In the case of MART, the base learners are regression trees [Brieman et al., 1983]. MART is preferred over other regression techniques for approximating the loss function for the following reasons: (a) there are considerably more input variables (1,125) than data points (150), so classical regression is infeasible for this purpose (regularized regression would be required); (b) a non-parametric model allows quicker model fitting; (c) MART is capable of modeling highly nonlinear behavior; and (d) MART is robust against moderate to heavy contamination by bad measurements (outliers) of the predictors and/or the responses, against missing values, and against the inclusion of potentially large numbers of irrelevant predictor variables that have little or no effect on the response [Friedman, 2002].
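To make the additive expansion of Equation 7.3 concrete, the sketch below implements least-squares gradient boosting with depth-one regression trees (stumps) in plain numpy: $\beta_0 h_0$ is the mean loss, and each later term is a stump fitted to the current residuals (the negative gradient of the squared error) and scaled by a learning rate. This is an illustrative assumption, not the MART configuration used in the thesis; production implementations (for example scikit-learn's `GradientBoostingRegressor`) offer deeper trees, subsampling and other options.

```python
import numpy as np

def fit_stump(X, r):
    """Best single-split regression tree for targets r under squared error.
    Returns (feature, threshold, left_value, right_value)."""
    n, d = X.shape
    total, total_sq = r.sum(), (r ** 2).sum()
    best = (np.inf, 0, 0.0, r.mean(), r.mean())
    n_left = np.arange(1, n)
    for j in range(d):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        csum = np.cumsum(rs)[:-1]                    # left-child sums for each split
        sse = total_sq - csum**2 / n_left - (total - csum)**2 / (n - n_left)
        valid = xs[1:] > xs[:-1]                     # split only between distinct values
        if not valid.any():
            continue
        i = np.argmin(np.where(valid, sse, np.inf))
        if sse[i] < best[0]:
            best = (sse[i], j, 0.5 * (xs[i] + xs[i + 1]),
                    csum[i] / n_left[i], (total - csum[i]) / (n - n_left[i]))
    return best[1:]

def predict_stump(stump, X):
    j, thr, left, right = stump
    return np.where(X[:, j] <= thr, left, right)

def fit_boosting(X, y, n_stages=200, learning_rate=0.1):
    """F(x) = mean(y) + sum_p learning_rate * h_p(x), each h_p fitted to residuals."""
    f0 = y.mean()
    F = np.full(len(y), f0)
    stumps = []
    for _ in range(n_stages):
        stump = fit_stump(X, y - F)                  # residual = negative gradient
        F = F + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return f0, learning_rate, stumps

def predict_boosting(model, X):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, X) for s in stumps)
```

Because each stump picks a single input, irrelevant inputs are mostly ignored, which illustrates point (d) above on a small scale.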
The MART prediction model is developed from the 150 intensity maps obtained using the IS and K-means approaches. Figure 7.6a compares the predicted and exact losses for a cross-validation set of maps. Note that the cross-validation set is chosen to be different from the training set in order to obtain an unbiased estimate of the model's accuracy. The overall prediction accuracy is quite reasonable, but the predictions show small biases. The plot of residuals (computed as the predicted delay $\hat{L}$ minus the exact delay L) versus predicted loss values (Figure 7.6b) shows that small losses are slightly over-predicted, while large losses
Figure 7.6: (a) Predicted vs. exact delay values (b) Prediction residuals.
are substantially under-predicted. To prevent these biases from degrading the prediction accuracy, a bias correction is applied to the predictions from the MART model. The next subsection describes the bias correction procedure used in the current work.
Bias correction using LOESS
The bias correction procedure involves estimating the residual (bias) as a function of the predicted delay and subtracting it from the predicted value. The residual is fit as a function of the predicted delay using locally weighted scatterplot smoothing (LOESS) [Efron and Tibshirani, 1997], as shown in Figure 7.7a. As expected from the earlier comparison of exact and predicted losses, the residual is positive for small loss values and negative for large loss values, and the LOESS fit captures this effect well. The corrected loss predictions are obtained by subtracting the residual provided by LOESS from the MART loss prediction. Figure 7.7b compares these corrected predictions against the exact values and shows a significantly better match. For further validation, the loss exceedance curves are estimated using the exact and the approximate loss functions (after bias correction), and are shown in Figure 7.8. The figure shows a very good match between the two curves, illustrating the accuracy of the loss prediction model.
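The bias correction can be sketched with a minimal local-linear LOESS: tricube weights over the nearest fraction of the points, and no robustness iterations. The span `frac` and the helper names are assumptions for illustration; the thesis does not report its LOESS settings, and libraries such as statsmodels provide a `lowess` function.

```python
import numpy as np

def loess(x, y, x_eval, frac=0.5):
    """Local linear LOESS with tricube weights (no robustness iterations): at each
    evaluation point, fit a weighted straight line to the nearest frac*n points."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_eval = np.atleast_1d(np.asarray(x_eval, float))
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))
    out = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        dist = np.abs(x - x0)
        h = max(np.sort(dist)[k - 1], 1e-12)                 # bandwidth: k-th nearest distance
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3   # tricube weights
        sw = np.sqrt(w)
        A = np.column_stack([np.ones(n), x - x0]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
        out[i] = coef[0]                                     # intercept = fitted value at x0
    return out

def bias_correct(pred_train, exact_train, pred_new, frac=0.5):
    """Fit residual = predicted - exact as a smooth function of the prediction,
    then subtract the fitted bias from new predictions: F2 = F1 - B(F1)."""
    resid = np.asarray(pred_train, float) - np.asarray(exact_train, float)
    return np.asarray(pred_new, float) - loess(pred_train, resid, pred_new, frac)
```

For example, if a model over-predicts small losses and under-predicts large ones as `pred = 0.8 * exact + 1.0`, the residual is linear in the prediction, so the local-linear fit recovers it exactly and `bias_correct` returns the exact losses.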
Figure 7.7: (a) A LOESS fit to the prediction residuals (b) Predicted and exact delay values after bias correction.
Figure 7.8: Two sample exceedance curves obtained using the exact and the approximate loss functions (after bias correction).
Study of residuals
The predictions from MART and LOESS are not exact, as evidenced by the scatter around the predictions. Figure 7.9a shows the residuals (i.e., observed value minus predicted value) versus the predicted loss values. When using the predictive model, it is important to account appropriately for this variability, particularly when estimating confidence intervals, since the smoothed predictions (obtained when the residuals are ignored) will underestimate the variance of the risk estimates and the width of their CI.
Figure 7.9a shows that the residuals are heteroscedastic (i.e., the standard deviation of the residuals varies with the predicted value), and that their standard deviation increases linearly with the predicted loss. To model these residuals, they are first normalized by the predicted losses (i.e., divided by the predicted losses); the normalized residuals, shown in Figure 7.9b, are seen to be homoscedastic. A normal Q-Q plot of the normalized residuals, shown in Figure 7.10, indicates that the residuals can reasonably be assumed to follow a normal distribution, since the deviation from the 45° straight line is negligible. Further, the standard deviation of the normalized residuals is estimated to be 0.27, and hence the residuals are modeled as follows:

$$\varepsilon \sim N\left(0,\ 0.27\,F(\mathbf{x})\right) \qquad (7.4)$$
Figure 7.9: (a) Residuals from the prediction model (b) Residuals normalized (divided) by the predicted delays.
Figure 7.10: Normal Q-Q plot of the residuals.
where $\varepsilon$ denotes the residual, $N(0, 0.27F(\mathbf{x}))$ denotes the normal distribution with mean 0 and standard deviation $0.27F(\mathbf{x})$, and $F(\mathbf{x})$ denotes the predicted loss for ground-motion map $\mathbf{x}$.
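Estimating the 0.27 coefficient and drawing residuals from Equation 7.4 can be sketched as follows. The function names are hypothetical, and the sketch assumes predicted losses are non-negative (negative predictions are clipped to a zero standard deviation).

```python
import numpy as np

def normalized_residual_std(exact, predicted):
    """Std. dev. of (observed - predicted) / predicted; reported as 0.27 in the text."""
    exact, predicted = np.asarray(exact, float), np.asarray(predicted, float)
    return np.std((exact - predicted) / predicted, ddof=1)

def simulate_residuals(predicted, rng, c=0.27):
    """Draw epsilon ~ N(0, c * F(x)) independently for each prediction (Eq. 7.4)."""
    scale = c * np.maximum(np.asarray(predicted, float), 0.0)
    return rng.normal(0.0, scale)
```

Because the standard deviation is proportional to the prediction, large predicted delays receive proportionally wider scatter, matching the heteroscedastic pattern in Figure 7.9a.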
Summary of the loss prediction procedure
For a given ground-motion map $\mathbf{x}$, the approximate loss is evaluated as follows:
(a) Use MART to obtain $F_1(\mathbf{x})$, which is a biased estimate of the loss.
(b) Estimate the bias using the LOESS fit: $B(F_1(\mathbf{x}))$.
(c) Obtain the bias-corrected prediction: $F_2(\mathbf{x}) = F_1(\mathbf{x}) - B(F_1(\mathbf{x}))$.
(d) Simulate a residual e from the univariate normal distribution $N(0, 0.27F_2(\mathbf{x}))$.
(e) Obtain the final estimate of the loss: $F(\mathbf{x}) = F_2(\mathbf{x}) + e$.
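Steps (a) to (e) can be chained into one function. In this sketch, `mart_predict` and `bias_fn` are hypothetical callables standing in for a trained MART model and a fitted LOESS bias curve; the usage example substitutes toy functions for both.

```python
import numpy as np

def approximate_loss(x, mart_predict, bias_fn, rng, c=0.27):
    """Approximate loss for each map (row) of x following steps (a)-(e).

    mart_predict: callable x -> F1(x)   (hypothetical trained MART model)
    bias_fn:      callable F1 -> B(F1)  (hypothetical fitted LOESS bias curve)"""
    f1 = mart_predict(x)                           # (a) biased MART estimate
    f2 = f1 - bias_fn(f1)                          # (b)-(c) bias-corrected prediction
    e = rng.normal(0.0, c * np.maximum(f2, 0.0))   # (d) residual ~ N(0, 0.27 F2)
    return f2 + e                                  # (e) final loss estimate

# Toy stand-ins: the "model" sums intensities; the "bias" is 10% of the prediction.
rng = np.random.default_rng(8)
maps = np.full((20000, 5), 2.0)                    # each map's intensities sum to 10
sims = approximate_loss(maps, lambda x: x.sum(axis=1), lambda f: 0.1 * f, rng)
```

Repeated calls on the same map return different losses, which is what preserves the scatter of the exact losses when the bootstrap curves are computed.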
Discussion: Importance of data selection for training the MART model
This section explains why a reasonably good MART fit is obtained despite using only 150 training samples. The good fit arises primarily because the importance sampling and K-means clustering procedures used to select the training catalog of 150 maps yield highly dissimilar maps that cover almost all the intensity values (even rare intensities) of interest to the decision maker. In other words, the 150 maps are fairly representative of the ground-motion hazard in the region. This is not the case, for instance, when the maps are selected using random MCS. To illustrate this, 150 maps were sampled using random MCS and used to fit a MART model. Figure 7.11 compares the exact losses with the losses predicted by this new MART model for the cross-validation set used earlier in Section 7.4.5. Random MCS samples many ground-motion maps with small but frequently observed intensities, which correspond to very small travel-time delays. As a result, the model performs very poorly when predicting the losses due to the large-intensity maps present in the cross-validation set (maps that are largely absent from the training set).
Figure 7.11: MART model fitted using 150 MCS maps.
7.4.6 Bootstrap confidence intervals estimated using the exact and the approximate loss functions
Results and discussion
Bootstrap confidence intervals are estimated for the travel-time delay exceedance curve using the procedure described in Section 7.4.4. In summary, the importance-sampled maps (12,500 importance-sampled maps were used in Chapter 6) are first bootstrapped (sampled with replacement) to obtain 1000 sets of 12,500 maps each (denoted $\mathbf{x}^{*b}$ in Section 7.4.4). Each of these 1000 sets is then partitioned into 150 clusters using the K-means clustering procedure. From each set, 150 maps are drawn, one from each cluster (denoted $\mathbf{x}^{*(b)}$ in Section 7.4.4), and are used to obtain the exceedance curve for that set (denoted $\theta^{*(b)}(l)$ in Section 7.4.4). Figures 7.13a and 7.13b show the 1000 exceedance curves obtained using the exact and approximate (MART+LOESS) loss functions, respectively. The pointwise CIs for these curves are estimated as the quantiles of the 1000 loss curves at each loss level. This procedure is summarized in Figure 7.12.
Figure 7.14a shows the CIs obtained using the exact and the approximate loss functions.
Figure 7.12: Methodology for estimating bootstrap confidence intervals for the loss curves.
Figure 7.13: 1000 bootstrapped exceedance curves obtained using the (a) exact loss function (b) approximate loss function.
Figure 7.14: Bootstrap confidence intervals.
The two curves do not match perfectly but are reasonably close to one another. Given that it would be computationally almost impossible to estimate the CI using the exact loss function in practice, the CI obtained using MART+LOESS is a reasonable substitute.
As mentioned earlier, it is important to model the residuals of the MART+LOESS predictions in order to obtain accurate CIs. This is illustrated by Figure 7.14b, which shows the CIs obtained using the exact and the approximate loss functions, but without accounting for the residuals. The CI estimated using the predicted losses is considerably narrower than the CI estimated using the exact losses, indicating that the 'smoothed' predictions lead to an underestimation of the variance of the risk estimates. (The additional jaggedness is due to the use of only 200 bootstrap samples when estimating the approximate CI.)
Sensitivity to the number of bootstrap samples
Let B denote the number of bootstrap samples used for estimating the CI. Efron and Tibshirani [1997] recommend a B value of 1000 for obtaining a robust CI. Figure 7.15 shows
Figure 7.15: Bootstrap confidence intervals.
the CIs obtained using 20, 200 and 1000 bootstrap samples. The CIs obtained using 20 and 200 samples are highly jagged. In addition (though this is not illustrated in Figure 7.15), the CIs obtained using only 20 or 200 samples vary from one computation to another. Overall, the evidence indicates that a B value of 1000 is appropriate for computing the CI.
Balanced bootstrap confidence interval
One technique that can be adopted to reduce the number of bootstrap samples is the balanced bootstrap method [Davison et al., 1986]. This method simulates bootstrap samples such that each sample observation is used equally often across all the bootstrap samples. Past work has shown that the balanced bootstrap improves on ordinary uniform resampling when employed to estimate distribution functions or quantiles [Hall, 2005], and hence it is relevant for estimating CIs.
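The usual first-order construction of balanced bootstrap samples is to concatenate B copies of the observation indices, randomly permute the result, and cut it into B samples of size m. The sketch below shows that construction; it is not taken from the thesis's code. The returned index sets would replace the uniform draws in the ordinary bootstrap.

```python
import numpy as np

def balanced_bootstrap_indices(m, n_boot, rng):
    """Balanced bootstrap: every one of the m observations appears exactly
    n_boot times across the n_boot samples (each sample has size m)."""
    pool = np.tile(np.arange(m), n_boot)   # n_boot copies of 0..m-1
    rng.shuffle(pool)                      # random permutation of the pooled copies
    return pool.reshape(n_boot, m)         # row b holds the indices of sample b

rng = np.random.default_rng(10)
idx = balanced_bootstrap_indices(m=50, n_boot=200, rng=rng)
```

Within any single sample an observation can still appear several times or not at all; the balancing holds only over the full set of B samples.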
In this study, 200 balanced bootstrap samples were generated (each with 12,500 maps) and used to estimate 200 exceedance curves. The pointwise confidence interval obtained from these curves is shown in Figure 7.16. This CI is less jagged than the one obtained using 200 uniform bootstrap samples, though still not as good as the CI obtained using 1000 uniform samples.
Figure 7.16: Balanced bootstrap confidence intervals.
7.5 Conclusions
The current study focused on the computationally demanding task of estimating confidence intervals for lifeline risk estimates. Estimating the confidence intervals is computationally intensive because it requires repeated risk calculations (in order to estimate the variance of the risk estimates), each involving numerous lifeline performance evaluations. To reduce the computational demand, the stochastically representative catalog of ground-motion maps generated in Chapter 6 using importance sampling and K-means clustering was used in conjunction with a statistical learning technique called Multiple Additive Regression Trees (MART) to develop an approximate relationship between lifeline performance and the ground-motion intensities during an earthquake. Prediction biases from the model were modeled using Locally Weighted Scatterplot Smoothing (LOESS) and subtracted from the predictions to obtain unbiased performance estimates. The lifeline performances predicted by the combination of MART and LOESS were used in place of the exact lifeline performances (which are expensive to evaluate) to expedite the computation of the confidence intervals. The exceedance curves and their confidence intervals obtained using the exact and the approximate performance measures were found to match well.
Chapter 8
Seismic risk assessment of spatially-distributed systems using ground-motion models fitted considering spatial correlation
N. Jayaram and J.W. Baker (2010). Considering spatial correlation in mixed-effects regression, and impact on ground-motion models, Bulletin of the Seismological Society of America (in review).
8.1 Abstract
Ground-motion models are commonly used in earthquake engineering to predict the probability distribution of the ground-motion intensity at a given site due to a particular earthquake event. These models are often built using regression on observed ground-motion intensities, and are fitted using either the one-stage mixed-effects regression algorithm proposed by Abrahamson and Youngs [1992] or the two-stage algorithm of Joyner and Boore [1993]. In their current forms, these algorithms ignore the spatial correlation between intra-event residuals. This chapter theoretically motivates the importance of considering spatial
CHAPTER 8. IMPACT OF SPATIAL CORRELATION ON GMMS 168
correlation while fitting ground-motion models and proposes an extension to the Abrahamson and Youngs [1992] algorithm that allows spatial correlation to be considered.
By refitting the Campbell and Bozorgnia [2008] ground-motion model using the mixed-effects regression algorithm considering spatial correlation, it is seen that the variance of the total residuals and the ground-motion model coefficients used for predicting the median ground-motion intensity are not significantly different from the published values, even after the incorporation of spatial correlation. It is, however, seen that there is an increase in the variance of the intra-event residual and a significant decrease in the variance of the inter-event residual. These changes have implications for risk assessments of spatially-distributed systems, because a smaller inter-event residual variance implies a lower likelihood of observing large ground-motion intensities at all sites in a region. An example risk assessment is performed on a hypothetical portfolio of buildings to demonstrate that neglecting the proposed refinement causes an overestimation of the recurrence rates of large losses.
8.2 Introduction
Ground-motion models are commonly used in earthquake engineering to predict the probability distribution of the ground-motion intensity at a given site due to a particular earthquake event. Typically, a ground-motion model takes the following form:

$$\ln(Y_{ij}) = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon_{ij} + \eta_i \qquad (8.1)$$

where $Y_{ij}$ denotes the ground-motion intensity parameter of interest (e.g., $Sa(T)$, the spectral acceleration at period T) at site j during earthquake i; $f(\mathbf{P}_{ij}, \boldsymbol{\theta})$ denotes the ground-motion prediction function with predictive parameters $\mathbf{P}_{ij}$ (e.g., magnitude, source-to-site distance, site condition) and coefficient set $\boldsymbol{\theta}$; $\varepsilon_{ij}$ denotes the intra-event residual, which is a zero-mean random variable with standard deviation $\sigma_{ij}$; and $\eta_i$ denotes the inter-event residual, which is a zero-mean random variable with standard deviation $\tau_{ij}$. The rest of this chapter assumes for simplicity that the residuals have a constant $\sigma$ (i.e., $\sigma_{ij} = \sigma$) and $\tau$ (i.e., $\tau_{ij} = \tau$) for any given ground-motion intensity parameter (i.e., the residuals are
homoscedastic). This assumption does not hold for some modern models [e.g., Abrahamson and Silva, 2008], in which case the concepts remain the same, but some of the equations are no longer directly applicable.
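The structure implied by Equation 8.1 is easy to check by simulation: because $\eta_i$ is shared by all records of event i, two records from the same event have covariance $\tau^2$ even when the intra-event residuals are independent, while each total residual has variance $\sigma^2 + \tau^2$. The sketch below uses illustrative values $\sigma = 0.5$ and $\tau = 0.3$ (assumptions, not values from any published model).

```python
import numpy as np

def simulate_total_residuals(n_events, n_sites, sigma, tau, rng):
    """Total residuals eps_ij + eta_i under Equation 8.1, with homoscedastic
    and mutually independent intra-event residuals (no spatial correlation)."""
    eta = rng.normal(0.0, tau, size=(n_events, 1))          # one value per event
    eps = rng.normal(0.0, sigma, size=(n_events, n_sites))  # one value per record
    return eta + eps

rng = np.random.default_rng(2)
r = simulate_total_residuals(20000, 2, sigma=0.5, tau=0.3, rng=rng)
total_var = r.var()                             # close to sigma^2 + tau^2 = 0.34
within_cov = np.cov(r[:, 0], r[:, 1])[0, 1]     # close to tau^2 = 0.09
```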
Ground-motion models are primarily fitted using two approaches: the two-stage regression algorithm of Joyner and Boore [1993] [e.g., Boore and Atkinson, 2008] and the one-stage mixed-effects regression algorithm of Abrahamson and Youngs [1992] [e.g., Abrahamson and Silva, 2008, Campbell and Bozorgnia, 2008, Chiou and Youngs, 2008]. Joyner and Boore [1993] provide a detailed comparison of these two algorithms. Both algorithms, in their current forms, assume that the intra-event residuals are independent of each other. The intra-event residuals, however, are known to be spatially correlated [Boore et al., 2003, Wang and Takada, 2005, Goda and Hong, 2008, Jayaram and Baker, 2009a]. Recently, Hong et al. [2009] investigated how including spatial correlation in the regression analysis influences ground-motion models fitted using the two-stage regression algorithm and a one-stage algorithm of Joyner and Boore [1993]. They concluded that the influence of considering spatial correlation on the estimated ground-motion models is negligible, based on insignificant changes to the coefficient set $\boldsymbol{\theta}$. Fitting ground-motion models considering correlation does, however, change the variances of the inter-event and intra-event residuals (as observed by Hong et al. [2009] themselves). This chapter provides a theoretical basis for such changes to the variance terms, and also discusses the impact of these changes on the estimated seismic risk of spatially-distributed systems. Further, a modified algorithm based on that of Abrahamson and Youngs [1992] is developed that accounts for spatial correlation in the mixed-effects regression. This modified algorithm is used to refit the Campbell and Bozorgnia [2008] ground-motion model in order to illustrate the impact of incorporating spatial correlation.
8.2.1 Current regression algorithm
Brillinger and Preisler [1984a,b] first proposed regressing a ground-motion model as a fixed-effects model. In this approach, the ground-motion model takes the following form:

$$\ln(Y_{ij}) = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon^{(t)}_{ij} \qquad (8.2)$$
where $\varepsilon^{(t)}_{ij}$ denotes the total residual term at site j during earthquake i.
Abrahamson and Youngs [1992] (henceforth referred to as AY92) subsequently developed a more stable regression algorithm by treating the ground-motion model as a mixed-effects model. The mixed-effects model differs from the fixed-effects model in that it treats the error term as the sum of an intra-event error term and an inter-event error term (Equation 8.1). The inter-event term partially accounts for the correlation between the ground-motion intensities recorded during any particular earthquake. The AY92 algorithm combines a fixed-effects regression algorithm with a likelihood maximization approach, and is described below in more detail.
In the first step of the algorithm, the random-effects terms $\eta_1, \eta_2, \ldots, \eta_M$ are assumed to equal zero, in which case Equation 8.1 simplifies to $\ln(Y_{ij}) = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon_{ij}$. The coefficient set $\boldsymbol{\theta}$ is then estimated from the observed $Y_{ij}$'s using a fixed-effects regression algorithm. In the next step, the standard deviations $\sigma$ (for the intra-event residuals) and $\tau$ (for the inter-event residuals) are computed using the likelihood maximization approach described below.
The total residuals (i.e., the sum of the inter-event and the intra-event residuals), denoted $\varepsilon^{(t)}_{ij}$, can be computed using the $\boldsymbol{\theta}$ estimated in the previous step as follows:

$$\varepsilon^{(t)}_{ij} = \varepsilon_{ij} + \eta_i = \ln(Y_{ij}) - f(\mathbf{P}_{ij}, \boldsymbol{\theta}) \qquad (8.3)$$

It is known that the total residuals follow a multivariate normal distribution [Jayaram and Baker, 2008], and hence the likelihood ($L_1$) of having observed the set of total residuals $\boldsymbol{\varepsilon}^{(t)} = \left(\varepsilon^{(t)}_{ij}\right)$ can be estimated as follows:

$$\ln(L_1) = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln|\mathbf{C}| - \frac{1}{2}\left(\boldsymbol{\varepsilon}^{(t)}\right)'\mathbf{C}^{-1}\boldsymbol{\varepsilon}^{(t)} \qquad (8.4)$$

where N is the total number of data points, $\mathbf{C}$ is the covariance matrix of the total residuals and $\left(\boldsymbol{\varepsilon}^{(t)}\right)'$ denotes the transpose of $\boldsymbol{\varepsilon}^{(t)}$. While estimating the model coefficients, AY92 assume that the intra-event residuals are independent of each other and of the inter-event
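residuals. Hence, the covariance matrix $\mathbf{C}$ can be written as follows:

$$\mathbf{C} = \sigma^2 \mathbf{I}_N + \tau^2 \bigoplus_{i=1}^{M} \mathbf{1}_{n_i,n_i} \qquad (8.5)$$

where $\mathbf{I}_N$ is the identity matrix of size N by N, $\mathbf{1}_{n_i,n_i}$ is a matrix of ones of size $n_i$ by $n_i$, $\bigoplus$ (written $\Sigma^{+}$ in the notation of AY92) indicates a direct sum operation, M is the number of earthquake events and $n_i$ is the number of recordings for the ith event. The matrix $\mathbf{C}$ can be expanded as follows:

$$\mathbf{C} = \begin{bmatrix} \sigma^2\mathbf{I}_{n_1} + \tau^2\mathbf{1}_{n_1,n_1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \sigma^2\mathbf{I}_{n_2} + \tau^2\mathbf{1}_{n_2,n_2} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \sigma^2\mathbf{I}_{n_M} + \tau^2\mathbf{1}_{n_M,n_M} \end{bmatrix} \qquad (8.6)$$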
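Equations 8.4 to 8.6 translate directly into numpy. The sketch below builds the block-diagonal covariance matrix for a given list of records per event and evaluates the log-likelihood of a vector of total residuals; it uses `slogdet` and `solve` rather than an explicit inverse for numerical stability. The function names are hypothetical.

```python
import numpy as np

def ay92_covariance(n_per_event, sigma, tau):
    """Equations 8.5/8.6: block-diagonal C with blocks sigma^2 I + tau^2 * ones."""
    N = sum(n_per_event)
    C = np.zeros((N, N))
    start = 0
    for n_i in n_per_event:
        block = slice(start, start + n_i)
        C[block, block] = sigma**2 * np.eye(n_i) + tau**2 * np.ones((n_i, n_i))
        start += n_i
    return C

def log_likelihood(total_resid, C):
    """Equation 8.4: multivariate normal log-likelihood of the total residuals."""
    r = np.asarray(total_resid, float)
    _, logdet = np.linalg.slogdet(C)
    quad = r @ np.linalg.solve(C, r)
    return -0.5 * len(r) * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad

# Two events with 2 and 1 records: entries within an event share tau^2.
C = ay92_covariance([2, 1], sigma=0.5, tau=0.3)
```

For large data sets, the block structure allows each event's block to be handled separately, since both the determinant and the quadratic form decompose over blocks.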
The maximum likelihood estimates of $\sigma$ and $\tau$ are those that maximize the likelihood function $L_1$, and are obtained using numerical optimization. Then, for the given $\boldsymbol{\theta}$ and the maximum likelihood estimates of $\sigma$ and $\tau$, the random-effects term $\eta_i$ is also estimated by maximum likelihood. The maximum likelihood estimate of $\eta_i$ is obtained as follows [Abrahamson and Youngs, 1992]:

$$\eta_i = \frac{\tau^2 \sum_{j=1}^{n_i} \varepsilon^{(t)}_{ij}}{n_i\tau^2 + \sigma^2} \qquad (8.7)$$
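Equation 8.7 can be evaluated per event as in the short sketch below (the function name is hypothetical). Rearranged, the estimate is the event's mean total residual multiplied by the factor $n_i\tau^2/(n_i\tau^2+\sigma^2)$, so events with few recordings receive inter-event residuals that are shrunk more strongly toward zero.

```python
import numpy as np


def ay92_inter_event(eps_t_event, sigma, tau):
    """Equation 8.7: estimate of eta_i from the total residuals of one event."""
    n = eps_t_event.size
    return tau**2 * eps_t_event.sum() / (n * tau**2 + sigma**2)


# Usage: three recordings whose mean total residual is 0.2 (hypothetical values)
eps_event = np.array([0.3, 0.1, 0.2])
eta_hat = ay92_inter_event(eps_event, sigma=0.6, tau=0.3)
shrink = 3 * 0.3**2 / (3 * 0.3**2 + 0.6**2)
print(eta_hat, shrink * eps_event.mean())   # the two values agree
```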
Finally, using the estimated value of $\eta_i$, a new set of coefficients $\boldsymbol{\theta}$ is obtained using a fixed-effects algorithm for $\ln(Y_{ij}) - \eta_i$ (i.e., considering $\ln(Y_{ij}) - \eta_i = f(\mathbf{P}_{ij}, \boldsymbol{\theta}) + \varepsilon_{ij}$). The new set $\boldsymbol{\theta}$ is then used to re-estimate $\sigma$, $\tau$ and $\boldsymbol{\eta}$, and this iteration continues until the coefficient estimates converge.
In summary, the steps of the mixed-effects algorithm used by AY92 are as follows:
1. Estimate the model coefficients $\boldsymbol{\theta}$ using a fixed-effects regression algorithm, assuming $\boldsymbol{\eta}$ equals $\mathbf{0}$.
2. Using $\boldsymbol{\theta}$, solve for the variances of the residuals, $\sigma^2$ and $\tau^2$, by maximizing the likelihood function described in Equation 8.4.
3. Given $\boldsymbol{\theta}$, $\sigma^2$ and $\tau^2$, estimate $\eta_i$ using Equation 8.7.
4. Given $\eta_i$, estimate new coefficients $\boldsymbol{\theta}$ using a fixed-effects regression algorithm for $\ln(Y_{ij}) - \eta_i$.
5. Repeat steps 2, 3 and 4 until the likelihood in step 2 is maximized and the estimates of the coefficient set converge.
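The five steps can be sketched as follows. This is not the AY92 or CB08 implementation: for illustration it assumes a hypothetical median model that is linear in $\boldsymbol{\theta}$ ($f(\mathbf{P}_{ij},\boldsymbol{\theta}) = \mathbf{x}_{ij}'\boldsymbol{\theta}$), so ordinary least squares (`numpy.linalg.lstsq`) stands in for the fixed-effects regression, and Equation 8.4 is maximized over $\ln\sigma$ and $\ln\tau$ with SciPy's Nelder-Mead optimizer. Real ground-motion models are nonlinear in $\boldsymbol{\theta}$ and would need a nonlinear regression step instead. The function name, starting values and synthetic data are all assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def ay92_fit(X, lnY, event_ids, max_iter=50, tol=1e-4):
    """Hypothetical sketch of the AY92 iteration for a median model linear in theta.

    X: (N, k) design matrix, lnY: (N,) log intensities, event_ids: (N,) event labels.
    Returns theta, sigma, tau and the per-event inter-event residuals eta.
    """
    events = np.unique(event_ids)
    rows_by_event = [np.flatnonzero(event_ids == e) for e in events]
    event_index = np.searchsorted(events, event_ids)   # position of each record's event

    def neg_log_lik(log_params, eps_t):
        # -ln(L1) of Eq. 8.4; C is block diagonal (Eq. 8.6), so sum over events
        sigma, tau = np.exp(log_params)
        total = 0.0
        for rows in rows_by_event:
            n, r = rows.size, eps_t[rows]
            C = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
            _, logdet = np.linalg.slogdet(C)
            total += 0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(C, r))
        return total

    # Step 1: fixed-effects fit with eta = 0
    theta = np.linalg.lstsq(X, lnY, rcond=None)[0]
    log_params = np.log([0.5, 0.3])                    # arbitrary starting values
    eta = np.zeros(events.size)
    for _ in range(max_iter):
        eps_t = lnY - X @ theta                        # total residuals, Eq. 8.3
        # Step 2: maximize the likelihood over sigma and tau
        log_params = minimize(neg_log_lik, log_params, args=(eps_t,),
                              method="Nelder-Mead").x
        sigma, tau = np.exp(log_params)
        # Step 3: inter-event residuals from Eq. 8.7
        for k, rows in enumerate(rows_by_event):
            eta[k] = tau**2 * eps_t[rows].sum() / (rows.size * tau**2 + sigma**2)
        # Step 4: refit theta to ln(Y) - eta
        theta_new = np.linalg.lstsq(X, lnY - eta[event_index], rcond=None)[0]
        # Step 5: stop once the coefficients stop changing
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta, sigma, tau, eta


# Usage with synthetic data (hypothetical): 40 events x 15 records, tau = 0.3, sigma = 0.5
rng = np.random.default_rng(0)
event_ids = np.repeat(np.arange(40), 15)
x = rng.uniform(0.0, 2.0, event_ids.size)
X = np.column_stack([np.ones_like(x), x])
lnY = 1.0 - 0.5 * x + rng.normal(0.0, 0.3, 40)[event_ids] + rng.normal(0.0, 0.5, x.size)
theta, sigma, tau, eta = ay92_fit(X, lnY, event_ids)
print(theta, sigma, tau)
```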
One drawback of this algorithm is the assumption in Equation 8.5 that the intra-event
residuals are independent of each other. It is known that the intra-event residuals are spa-
tially correlated, with the correlation decreasing with increasing separation distance be-
tween the residuals [e.g., Jayaram and Baker, 2009a]. Before addressing that issue, the
need to account for the spatial correlation in the regression algorithm is illustrated in the
next section.
8.2.2 Should spatial correlation be considered in the regression algorithm?
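Consider the hypothetical case where the correlation between the intra-event residuals at any two different sites is a constant equal to $\rho$. In this case, the covariance matrix ($\mathbf{C}$) of the total residuals ($\varepsilon^{(t)}_{ij}$) is defined by the following equations:

$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij'}\right) = \rho\sigma^2 + \tau^2 \quad \forall\, i,\ j \neq j' \qquad (8.8a)$$
$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij}\right) = \sigma^2 + \tau^2 \quad \forall\, i, j \qquad (8.8b)$$
$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{i'j'}\right) = 0 \quad \forall\, j, j',\ i \neq i' \qquad (8.8c)$$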
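In summary, the covariance matrix of the total residuals can be expressed as follows:

$$\mathbf{C} = (1-\rho)\sigma^2\mathbf{I}_N + \left(\tau^2 + \rho\sigma^2\right) \sideset{}{^{+}}\sum_{i=1}^{M} \mathbf{1}_{n_i,n_i} \qquad (8.9)$$

Denoting $\sqrt{1-\rho}\,\sigma$ by $\sigma'$ and $\sqrt{\tau^2 + \rho\sigma^2}$ by $\tau'$, Equation 8.9 can be rewritten as

$$\mathbf{C} = \sigma'^2\mathbf{I}_N + \tau'^2 \sideset{}{^{+}}\sum_{i=1}^{M} \mathbf{1}_{n_i,n_i} \qquad (8.10)$$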
Comparing the forms of Equations 8.5 and 8.10, it can be seen that the algorithm of AY92 actually provides estimates of $\sigma'$ and $\tau'$ rather than $\sigma$ and $\tau$. (If spatial correlations are absent, the two sets coincide, since $\sigma' = \sigma$ and $\tau' = \tau$ when $\rho = 0$.)
Assume for simplicity that the set of coefficients $\boldsymbol{\theta}$ is not affected by the spatial correlation (this assumption is relaxed subsequently). The 'correct' values of $\sigma$ and $\tau$ can then be recovered from the $\sigma'$ and $\tau'$ provided by AY92 as follows:

$$\sigma = \frac{\sigma'}{\sqrt{1-\rho}} \qquad (8.11a)$$
$$\tau = \sqrt{\tau'^2 - \rho\sigma^2} \qquad (8.11b)$$
It follows from the above discussion and Equation 8.11 that, for positive $\rho$, assuming independent intra-event residuals will underestimate $\sigma$ and overestimate $\tau$. This has implications for lifeline risk assessments, since a larger $\tau$ implies a higher likelihood of observing large ground-motion intensities throughout the region of interest. Thus, it is important to determine whether fitting the ground-motion equations while considering correlated intra-event residuals changes the estimates of $\sigma$ and $\tau$ significantly.
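The algebra of Equations 8.8 to 8.11 can be checked numerically. The sketch below uses hypothetical helper names and illustrative values ($\sigma = 0.6$, $\tau = 0.2$, $\rho = 0.5$, one event with four recordings). It builds one event's covariance block entry by entry from Equation 8.8, confirms that it equals the AY92 form of Equation 8.10 with $\sigma'$ and $\tau'$, and then recovers $\sigma$ and $\tau$ with Equation 8.11.

```python
import numpy as np


def event_block_eq88(n, sigma, tau, rho):
    """One event's block of C from Eqs. 8.8a-b (constant intra-event correlation rho)."""
    C = np.full((n, n), rho * sigma**2 + tau**2)   # off-diagonal entries, Eq. 8.8a
    np.fill_diagonal(C, sigma**2 + tau**2)         # diagonal entries, Eq. 8.8b
    return C


def apparent_sigma_tau(sigma, tau, rho):
    """sigma' and tau' defined below Eq. 8.9, i.e. what AY92 would estimate."""
    return np.sqrt(1.0 - rho) * sigma, np.sqrt(tau**2 + rho * sigma**2)


def corrected_sigma_tau(sigma_p, tau_p, rho):
    """Eq. 8.11: recover sigma and tau from sigma' and tau'."""
    sigma = sigma_p / np.sqrt(1.0 - rho)
    return sigma, np.sqrt(tau_p**2 - rho * sigma**2)


n, sigma, tau, rho = 4, 0.6, 0.2, 0.5
sigma_p, tau_p = apparent_sigma_tau(sigma, tau, rho)
C_eq88 = event_block_eq88(n, sigma, tau, rho)
C_eq810 = sigma_p**2 * np.eye(n) + tau_p**2 * np.ones((n, n))   # AY92 form, Eq. 8.10
print(np.allclose(C_eq88, C_eq810))               # True: same matrix
print(sigma_p, tau_p)                             # sigma' < sigma, tau' > tau
print(corrected_sigma_tau(sigma_p, tau_p, rho))   # recovers (0.6, 0.2)
```

With these illustrative values, an algorithm that ignores the correlation would report $\sigma' \approx 0.42$ and $\tau' \approx 0.47$, which is the underestimation of $\sigma$ and overestimation of $\tau$ described above.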
8.3 Regression algorithm for mixed-effects models considering spatial correlation
This section describes an algorithm for fitting the mixed-effects model while accounting for spatial correlation between intra-event residuals. The algorithm described here differs from that of AY92 in the computation of the likelihood function $L_1$ (used in step 2) and in the computation of the inter-event residual $\eta_i$ (step 3). Both changes are necessary to account for the spatial correlation between intra-event residuals in the regression algorithm.
8.3.1 Covariance matrix for the total residuals
The covariance matrix for the total residuals shown in Equation 8.5 is based on the assump-
tion of independence between spatially-distributed intra-event residuals. The covariance
matrix in the presence of spatial correlation is described below.
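Let $\rho(d_{jj'})$ denote the spatial correlation between intra-event residuals at two sites $j$ and $j'$ as a function of $d_{jj'}$, the separation distance between $j$ and $j'$. Then,

$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij'}\right) = C\left(\varepsilon_{ij} + \eta_i,\ \varepsilon_{ij'} + \eta_i\right) = \rho(d_{jj'})\sigma^2 + \tau^2 \quad \forall\, i, j, j' \qquad (8.12a)$$
$$C\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{i'j'}\right) = 0 \quad \forall\, j, j',\ i \neq i' \qquad (8.12b)$$

Since $\rho(0) = 1$, Equation 8.12a gives $\sigma^2 + \tau^2$ on the diagonal ($j = j'$), consistent with Equation 8.8b.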
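A sketch of how the covariance matrix of Equation 8.12 might be assembled is given below. It assumes site coordinates in a planar system in kilometres and, purely for illustration, uses the exponential correlation model that appears later in this chapter as Equation 8.18 with $b$ = 26 km; any other function $\rho(d)$ can be passed in. The function names are hypothetical.

```python
import numpy as np


def exp_corr(d_km, b_km=26.0):
    """Exponential correlation model rho(d) = exp(-3 d / b) (Eq. 8.18)."""
    return np.exp(-3.0 * d_km / b_km)


def spatial_total_covariance(coords_per_event, sigma, tau, corr=exp_corr):
    """Covariance of stacked total residuals per Eq. 8.12.

    coords_per_event: list of (n_i, 2) arrays of site coordinates in km.
    Within an event: rho(d_jj') * sigma^2 + tau^2 (Eq. 8.12a; rho(0) = 1 on the diagonal).
    Across events: 0 (Eq. 8.12b).
    """
    N = sum(len(xy) for xy in coords_per_event)
    C = np.zeros((N, N))
    start = 0
    for xy in coords_per_event:
        n = len(xy)
        d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)   # pairwise distances
        C[start:start + n, start:start + n] = corr(d) * sigma**2 + tau**2
        start += n
    return C


# Usage: event 1 recorded at two sites 26 km apart, event 2 at a single site
coords = [np.array([[0.0, 0.0], [26.0, 0.0]]), np.array([[5.0, 5.0]])]
C = spatial_total_covariance(coords, sigma=0.6, tau=0.2)
print(C)
```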
8.3.2 Obtaining inter-event residuals from total residuals
The maximum likelihood approach is typically used to estimate a constant but unknown
parameter from observed data. The parameter ηi that is of interest here, however, is a
random variable in itself, and hence the authors use a Bayesian framework rather than the
method of maximum likelihood to estimate ηi.
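The prior distribution of $\eta_i$ is $N(0, \tau^2)$. Conditional on the knowledge of $\eta_i$, each $\varepsilon^{(t)}_{ij}$ marginally follows a normal distribution with mean $\eta_i$ and variance $\sigma^2$ (since $\varepsilon^{(t)}_{ij} = \varepsilon_{ij} + \eta_i$). Also, the correlation coefficient between $\varepsilon^{(t)}_{ij}$ and $\varepsilon^{(t)}_{ij'}$ conditional on $\eta_i$ is given by $\rho(d_{jj'})$. In other words, the conditional covariance matrix ($\mathbf{C}_c$) of the total residuals can be expressed as follows:

$$C_c\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{ij'}\right) = \rho(d_{jj'})\sigma^2 \quad \forall\, i, j, j' \qquad (8.13a)$$
$$C_c\left(\varepsilon^{(t)}_{ij}, \varepsilon^{(t)}_{i'j'}\right) = 0 \quad \forall\, j, j',\ i \neq i' \qquad (8.13b)$$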
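Hence, the joint density of $\boldsymbol{\varepsilon}^{(t)}_i = \left[\varepsilon^{(t)}_{i1}, \varepsilon^{(t)}_{i2}, \ldots, \varepsilon^{(t)}_{in_i}\right]'$ and $\eta_i$ is expressed as follows:

$$f\left(\boldsymbol{\varepsilon}^{(t)}_i, \eta_i\right) = f\left(\boldsymbol{\varepsilon}^{(t)}_i \mid \eta_i\right) f(\eta_i) \propto \exp\left[-\frac{1}{2}\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i\mathbf{1}_{n_i,1}\right)'\mathbf{C}_c^{-1}\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i\mathbf{1}_{n_i,1}\right)\right]\exp\left[-\frac{1}{2\tau^2}\eta_i^2\right] \qquad (8.14)$$

where $\boldsymbol{\varepsilon}^{(t)}_i$ is the collection of total residuals at all the sites during earthquake $i$, $f(\cdot)$ denotes the probability density function, $\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i\mathbf{1}_{n_i,1}\right)'$ denotes the transpose of $\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i\mathbf{1}_{n_i,1}\right)$, $\mathbf{1}_{n_i,1}$ denotes a column vector of ones of length $n_i$, and $\mathbf{C}_c$ here denotes the $n_i \times n_i$ block of the conditional covariance matrix corresponding to earthquake $i$.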
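Noting that $f\left(\boldsymbol{\varepsilon}^{(t)}_i, \eta_i\right) = f\left(\boldsymbol{\varepsilon}^{(t)}_i\right) f\left(\eta_i \mid \boldsymbol{\varepsilon}^{(t)}_i\right)$, one possible approach to identify the posterior distribution of $\eta_i$ given $\boldsymbol{\varepsilon}^{(t)}_i$ is to factor the joint density into a function of $\boldsymbol{\varepsilon}^{(t)}_i$ alone and a function that also contains $\eta_i$. Let $Q\left(\boldsymbol{\varepsilon}^{(t)}_i\right)$ denote any generic function of $\boldsymbol{\varepsilon}^{(t)}_i$ alone (not containing $\eta_i$). Hence,

$$\begin{aligned}
f\left(\boldsymbol{\varepsilon}^{(t)}_i, \eta_i\right) &\propto \exp\left[-\frac{1}{2}\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i\mathbf{1}_{n_i,1}\right)'\mathbf{C}_c^{-1}\left(\boldsymbol{\varepsilon}^{(t)}_i - \eta_i\mathbf{1}_{n_i,1}\right)\right]\exp\left[-\frac{1}{2\tau^2}\eta_i^2\right] \\
&= Q\left(\boldsymbol{\varepsilon}^{(t)}_i\right)\exp\left[\frac{1}{2}\left(\boldsymbol{\varepsilon}^{(t)}_i\right)'\mathbf{C}_c^{-1}\eta_i\mathbf{1}_{n_i,1} + \frac{1}{2}\eta_i\mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\boldsymbol{\varepsilon}^{(t)}_i - \frac{1}{2}\eta_i^2\,\mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\mathbf{1}_{n_i,1}\right]\exp\left[-\frac{1}{2\tau^2}\eta_i^2\right] \\
&= Q\left(\boldsymbol{\varepsilon}^{(t)}_i\right)\exp\left[-\frac{1}{2}\left(\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\mathbf{1}_{n_i,1}\right)\left(\eta_i - \frac{\mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\boldsymbol{\varepsilon}^{(t)}_i}{\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\mathbf{1}_{n_i,1}}\right)^2\right]
\end{aligned} \qquad (8.15)$$

In the second line, the term $-\frac{1}{2}\left(\boldsymbol{\varepsilon}^{(t)}_i\right)'\mathbf{C}_c^{-1}\boldsymbol{\varepsilon}^{(t)}_i$ is absorbed into $Q$; the two linear terms are equal scalars because $\mathbf{C}_c$ is symmetric. The third line follows by completing the square in $\eta_i$, with the leftover term that does not depend on $\eta_i$ again absorbed into $Q$.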
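From the above equation, it can be seen that $f\left(\eta_i \mid \boldsymbol{\varepsilon}^{(t)}_i\right)$ is a normal distribution with mean $\dfrac{\mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\boldsymbol{\varepsilon}^{(t)}_i}{1/\tau^2 + \mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\mathbf{1}_{n_i,1}}$ and variance $\dfrac{1}{1/\tau^2 + \mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\mathbf{1}_{n_i,1}}$. If the best estimator of $\eta_i$ is to be obtained under the squared-error loss criterion, then the Bayesian estimator of $\eta_i$ equals the posterior mean [Lehmann and Casella, 2003]:

$$\eta_i = \frac{\mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\boldsymbol{\varepsilon}^{(t)}_i}{\frac{1}{\tau^2} + \mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\mathbf{1}_{n_i,1}} \qquad (8.16)$$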
If the spatial correlation is absent, $\mathbf{C}_c$ is simply $\sigma^2$ times an identity matrix of size $n_i \times n_i$, in which case $\mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\mathbf{1}_{n_i,1}$ equals $n_i/\sigma^2$ and $\mathbf{1}'_{n_i,1}\mathbf{C}_c^{-1}\boldsymbol{\varepsilon}^{(t)}_i$ equals $\sum_{j=1}^{n_i}\varepsilon^{(t)}_{ij}/\sigma^2$, and Equation 8.16 becomes identical to Equation 8.7.
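Equation 8.16 and the posterior variance derived above can be computed as in the sketch below (hypothetical function name, illustrative values). The first check reproduces Equation 8.7 when $\mathbf{C}_c = \sigma^2\mathbf{I}$. The second uses a strong constant correlation between the sites: correlated residuals carry less independent information about $\eta_i$, so the estimate is shrunk further toward zero.

```python
import numpy as np


def bayes_inter_event(eps_t_event, Cc_event, tau):
    """Posterior mean (Eq. 8.16) and posterior variance of eta_i for one event.

    Cc_event is the n_i x n_i conditional covariance of Eq. 8.13a.
    """
    ones = np.ones(eps_t_event.size)
    a = 1.0 / tau**2 + ones @ np.linalg.solve(Cc_event, ones)   # 1/tau^2 + 1' Cc^-1 1
    b = ones @ np.linalg.solve(Cc_event, eps_t_event)          # 1' Cc^-1 eps
    return b / a, 1.0 / a


sigma, tau = 0.6, 0.3
eps_event = np.array([0.3, 0.1, 0.2])
n = eps_event.size

# No spatial correlation: Cc = sigma^2 I, so Eq. 8.16 reduces to Eq. 8.7
eta_indep, _ = bayes_inter_event(eps_event, sigma**2 * np.eye(n), tau)
eta_ay92 = tau**2 * eps_event.sum() / (n * tau**2 + sigma**2)
print(np.isclose(eta_indep, eta_ay92))   # True

# Strong constant correlation (rho = 0.9) between the three sites
rho = 0.9
Cc_corr = sigma**2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))
eta_corr, var_corr = bayes_inter_event(eps_event, Cc_corr, tau)
print(eta_indep, eta_corr)   # eta_corr is closer to zero
```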
8.3.3 Algorithm summary
In summary, the steps of the modified mixed-effects algorithm are as follows:
1. Estimate the model coefficients $\boldsymbol{\theta}$ using a fixed-effects regression algorithm, assuming $\boldsymbol{\eta}$ equals $\mathbf{0}$.
2. Using $\boldsymbol{\theta}$, solve for the variances of the residuals, $\sigma^2$ and $\tau^2$, by maximizing the likelihood function described in Equation 8.4. The covariance matrix $\mathbf{C}$ in Equation 8.4 is computed using Equation 8.12.
3. Given $\boldsymbol{\theta}$, $\sigma^2$ and $\tau^2$, estimate $\eta_i$ using Equation 8.16.
4. Given $\eta_i$, estimate new coefficients $\boldsymbol{\theta}$ using a fixed-effects regression algorithm for $\ln(Y_{ij}) - \eta_i$.
5. Repeat steps 2, 3 and 4 until the likelihood in step 2 is maximized and the estimates of the coefficient set converge.
8.3.4 Large sample standard errors of σ and τ
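If desired, the standard errors of the inter- and intra-event residual variances can be calculated based on the following results from Searle [1977]:

$$\operatorname{var}(\sigma^2) = 2\left[\operatorname{tr}\left(\left(\mathbf{C}^{-1}\frac{\partial\mathbf{C}}{\partial(\sigma^2)}\right)^{2}\right)\right]^{-1} \qquad (8.17a)$$
$$\operatorname{var}(\tau^2) = 2\left[\operatorname{tr}\left(\left(\mathbf{C}^{-1}\frac{\partial\mathbf{C}}{\partial(\tau^2)}\right)^{2}\right)\right]^{-1} \qquad (8.17b)$$

where $\mathbf{C}$ is the covariance matrix defined in Equation 8.12, $\partial\mathbf{C}/\partial(\sigma^2)$ denotes the partial derivative of $\mathbf{C}$ with respect to $\sigma^2$, $\partial\mathbf{C}/\partial(\tau^2)$ denotes the partial derivative of $\mathbf{C}$ with respect to $\tau^2$, tr denotes the trace of a matrix and var denotes variance. The partial derivatives $\partial\mathbf{C}/\partial(\sigma^2)$ and $\partial\mathbf{C}/\partial(\tau^2)$ can be evaluated using numerical differentiation.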
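A sketch of Equation 8.17 follows. Because $\mathbf{C}$ in Equation 8.12 is linear in $\sigma^2$ and $\tau^2$, it can be written as $\sigma^2\mathbf{R} + \tau^2\mathbf{B}$, where $\mathbf{R}$ is the block-diagonal matrix of intra-event correlations $\rho(d_{jj'})$ and $\mathbf{B}$ is the block-diagonal matrix of ones, so the derivatives are exactly $\mathbf{R}$ and $\mathbf{B}$. The central differences below follow the numerical-differentiation route mentioned in the text and recover those matrices up to rounding. Function names and the example configuration (30 events, 5 recordings each, constant correlation 0.3) are hypothetical.

```python
import numpy as np


def total_cov(sigma2, tau2, R, B):
    """C = sigma^2 R + tau^2 B (Eq. 8.12 written in matrix form)."""
    return sigma2 * R + tau2 * B


def searle_variances(sigma2, tau2, R, B, h=1e-6):
    """Large-sample variances of sigma^2 and tau^2 from Eqs. 8.17a-b."""
    C_inv = np.linalg.inv(total_cov(sigma2, tau2, R, B))
    # Central-difference derivatives of C (exact up to rounding since C is linear)
    dC_ds2 = (total_cov(sigma2 + h, tau2, R, B) - total_cov(sigma2 - h, tau2, R, B)) / (2 * h)
    dC_dt2 = (total_cov(sigma2, tau2 + h, R, B) - total_cov(sigma2, tau2 - h, R, B)) / (2 * h)
    A_s, A_t = C_inv @ dC_ds2, C_inv @ dC_dt2
    var_s2 = 2.0 / np.trace(A_s @ A_s)
    var_t2 = 2.0 / np.trace(A_t @ A_t)
    return var_s2, var_t2


# Usage: 30 events with 5 recordings each; constant intra-event correlation 0.3
n_ev, n_rec, rho = 30, 5, 0.3
block_R = (1 - rho) * np.eye(n_rec) + rho * np.ones((n_rec, n_rec))
R = np.kron(np.eye(n_ev), block_R)                      # block-diagonal correlations
B = np.kron(np.eye(n_ev), np.ones((n_rec, n_rec)))      # block-diagonal ones
var_s2, var_t2 = searle_variances(0.36, 0.04, R, B)
print("SE(sigma^2) =", np.sqrt(var_s2), " SE(tau^2) =", np.sqrt(var_t2))
```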
Alternatively, the standard errors can be evaluated using statistical techniques such as the bootstrap [Efron, 1998].
8.3.5 Mixed-effects regression procedure in R
While mixed-effects regression procedures that consider spatial correlation (referred to as 'within-group correlation' in the statistical literature) are available in statistical programming languages such as R (e.g., the nlme package of Pinheiro and Bates [2000]), it is potentially more convenient for current users of the Abrahamson and Youngs [1992] algorithm to switch to the modified algorithm described in this chapter. Further, in the authors' experience, the nlme implementation in R suffers from numerical instabilities when fitting the over-parameterized ground-motion models, whereas the authors' MATLAB implementation of the proposed algorithm recovers from similar numerical instabilities, potentially due to a more robust fixed-effects regression implementation in MATLAB.
8.4 Results and discussion
In the current study, the algorithm described in the previous section is used to refit the
Campbell and Bozorgnia [2008] ground-motion prediction model (henceforth referred to
as the CB08 model) for illustration. First, in order to provide a baseline model for compar-
ison, the coefficients of the CB08 model are reestimated while ignoring spatial correlation.
For consistency, only records used by CB08 are used for estimating the coefficients. Ta-
ble 8.1 shows the coefficients estimated in this study for predicting spectral accelerations
at 1 second (denoted Sa(1s)) in the uncorrelated case. Also shown in the table for com-
parison are the corresponding published CB08 model coefficients. Documentation of how
these coefficients are used to make predictions is provided by CB08. The estimates of the
standard deviations of the intra-event residual and the inter-event residual (i.e., σ and τ, respectively) are shown in Table 8.2. The published intra-event residual standard deviation reported here corresponds to large Vs30 values (i.e., Vs30 above the threshold beyond which the ground-motion model no longer considers soil nonlinearity effects, so that the intra-event residuals have a constant variance at any given period). The refitted coefficients and variance estimates obtained in this work are similar, but not identical, to those reported by CB08. These small discrepancies are likely due to the manual coefficient smoothing carried out by the authors of the CB08 model [Campbell, 2009]. For consistency, the refitted model coefficients are treated as the benchmark values for comparison with the model coefficients obtained considering spatial correlation. Note that the functional form of the CB08 model requires the A1100 value (median estimate of PGA on a reference rock outcrop with Vs30 = 1100 m/s) for the median prediction. For simplicity, this value is obtained directly using the CB08 coefficients corresponding to PGA (rather than by fitting a separate model for PGA). This is reasonable because the model coefficients used for predicting median values do not change significantly after incorporating spatial correlation, as shown subsequently in this chapter.
The model coefficients are then reestimated considering spatial correlation. The spatial correlation model is obtained from Jayaram and Baker [2009a], and is shown below.

$$\rho(h) = e^{-3h/b} \qquad (8.18)$$

where h (km) denotes the separation distance between the sites of interest, and b denotes the 'range' parameter, which determines the rate of decay of correlation. The range is a function of the spectral period, and equals 26 km for Sa(1s). The coefficient estimates (i.e., $\boldsymbol{\theta}$) obtained in this case are shown in Table 8.1. It can be seen from the table that the coefficients obtained by considering spatial correlation are similar to those obtained by ignoring spatial correlation. This is reinforced by a plot of the predicted medians at all the data sites using these two approaches (Figure 8.1). This agrees with the observation of Hong et al. [2009] that the ground-motion model coefficients do not change significantly when considering spatial correlation.
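To give a feel for Equation 8.18, the short sketch below tabulates $\rho(h)$ for the Sa(1s) range $b$ = 26 km (the function name is hypothetical). At $h = b$ the correlation is $e^{-3} \approx 0.05$, which is why $b$ can be read as the distance at which the correlation has practically vanished; at 20 km, the grid spacing of the building portfolio considered later in this chapter, it is already only about 0.1.

```python
import math


def exp_correlation(h_km, b_km=26.0):
    """Eq. 8.18: rho(h) = exp(-3 h / b), with b = 26 km for Sa(1s)."""
    return math.exp(-3.0 * h_km / b_km)


for h in (0.0, 5.0, 10.0, 20.0, 26.0, 50.0):
    print(f"h = {h:4.0f} km  rho = {exp_correlation(h):.3f}")
```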
While the coefficients for the median predictions are found to be relatively insensitive to the incorporation of spatial correlation, significant changes are seen in the estimates of the variance of the residuals (Table 8.2). In particular, the value of σ increases from 0.578 to 0.654 and the value of τ decreases from 0.223 to 0.157 after incorporating the spatial correlation. This trend is to be expected based on the illustrative example shown in Section 8.2.

Table 8.1: Regression coefficients for estimating median Sa(1s)

Case  c0      c1     c2      c3      c4      c5     c6     c7
1     -6.406  1.196  -0.772  -0.314  -2.000  0.170  4.00   0.255
2     -6.487  1.181  -0.878  -0.379  -2.064  0.195  3.884  0.264
3     -6.942  1.297  -1.073  -0.182  -2.112  0.198  4.440  0.324

Case  c8      c9     c10     c11     c12     k1     k2      k3
1     0.000   0.490  1.571   0.150   1.000   400.0  -1.955  1.929
2     -0.110  0.897  1.577   0.122   0.871   400.0  -1.955  1.929
3     -0.093  0.796  1.565   0.093   0.865   400.0  -1.955  1.929

Case 1: Published CB08 results [Campbell and Bozorgnia, 2008]
Case 2: Estimated in this study without considering spatial correlation
Case 3: Estimated in this study considering spatial correlation

Table 8.2: Standard deviations of residuals corresponding to Sa(1s)

Case  σ      τ      √(σ² + τ²)
1     0.568  0.255  0.623
2     0.578  0.223  0.620
3     0.654  0.157  0.673

Case 1: Published CB08 results [Campbell and Bozorgnia, 2008]
Case 2: Estimated in this study without considering spatial correlation
Case 3: Estimated in this study considering spatial correlation
σ denotes the standard deviation of the intra-event residual
τ denotes the standard deviation of the inter-event residual
√(σ² + τ²) denotes the standard deviation of the total residual

Figure 8.1: Comparison of predicted median Sa(1s) values obtained using the CB08 model fitted with and without the consideration of spatial correlation: (a) linear scale, (b) log scale.
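The last column of Table 8.2, and the share of the total standard deviation carried by the inter-event term, can be reproduced with the arithmetic below (values copied from Table 8.2; the dictionary labels are only descriptive). The ratio τ/√(σ² + τ²) drops from about 0.36 to about 0.23 once spatial correlation is considered, even though the total standard deviation changes only slightly.

```python
import math

# (sigma, tau) for each case in Table 8.2
table_8_2 = {
    "Case 1 (published CB08)": (0.568, 0.255),
    "Case 2 (refit, no spatial correlation)": (0.578, 0.223),
    "Case 3 (refit, spatial correlation)": (0.654, 0.157),
}

results = {}
for case, (sigma, tau) in table_8_2.items():
    total = math.hypot(sigma, tau)           # sqrt(sigma^2 + tau^2)
    results[case] = (total, tau / total)
    print(f"{case}: total = {total:.3f}, tau/total = {tau / total:.3f}")
```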
8.4.1 Standard deviation of residuals as a function of period
The results presented in the previous section support the use of the published coefficients (i.e., $\boldsymbol{\theta}$) for predicting the median intensities. The values of σ and τ, however, must be obtained considering spatial correlation. This implies that the iterative mixed-effects algorithm described earlier in the chapter can be simplified to a computation of only the residual variances σ² and τ² (Step 2) using the published values of $\boldsymbol{\theta}$ (i.e., the mixed-effects regression reduces to a random-effects regression procedure).
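The simplified random-effects step can be sketched as follows, with $\boldsymbol{\theta}$ held fixed so that only σ and τ are estimated by maximizing Equation 8.4 with the covariance of Equation 8.12. The assumptions are all illustrative: planar site coordinates in kilometres, the exponential model of Equation 8.18 with b = 26 km, SciPy's Nelder-Mead optimizer on ln σ and ln τ, and synthetic residuals in place of the NGA records. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def fit_sigma_tau(eps_t_events, coords_events, b_km=26.0, start=(0.5, 0.3)):
    """ML estimates of sigma and tau from total residuals with theta held fixed.

    eps_t_events: list of (n_i,) total-residual arrays, one per event.
    coords_events: list of (n_i, 2) site coordinates in km, one per event.
    """
    # Intra-event correlation matrices from Eq. 8.18 (independent of sigma and tau)
    R_events = []
    for xy in coords_events:
        d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
        R_events.append(np.exp(-3.0 * d / b_km))

    def neg_log_lik(log_params):
        # -ln(L1) of Eq. 8.4 with C from Eq. 8.12, summed over independent events
        sigma, tau = np.exp(log_params)
        total = 0.0
        for r, R in zip(eps_t_events, R_events):
            C = sigma**2 * R + tau**2
            _, logdet = np.linalg.slogdet(C)
            total += 0.5 * (r.size * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(C, r))
        return total

    res = minimize(neg_log_lik, np.log(start), method="Nelder-Mead")
    sigma_hat, tau_hat = np.exp(res.x)
    return sigma_hat, tau_hat


# Usage with synthetic data: 60 events, 20 random sites each in a 100 km square
rng = np.random.default_rng(1)
true_sigma, true_tau = 0.6, 0.2
eps_events, coord_events = [], []
for _ in range(60):
    xy = rng.uniform(0.0, 100.0, size=(20, 2))
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    C = true_sigma**2 * np.exp(-3.0 * d / 26.0) + true_tau**2
    eps_events.append(np.linalg.cholesky(C) @ rng.standard_normal(20))
    coord_events.append(xy)
sigma_hat, tau_hat = fit_sigma_tau(eps_events, coord_events)
print(sigma_hat, tau_hat)
```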
Hence, in this work, the CB08 model coefficients are assumed to be the fixed-effects
model coefficients, and the total residuals are computed using the records in the PEER
NGA database (only those records used by the authors of the CB08 model are considered
for compatibility) [Chiou et al., 2008]. The maximum likelihood estimates of σ and τ
are then obtained at different spectral acceleration periods from the total residuals using
the procedures described earlier. Figure 8.2a compares the estimates of σ obtained in this study to those reported by CB08. It can be seen that the values of σ obtained considering spatial correlation are mostly larger than the published σ's (which were estimated ignoring spatial correlation). Figure 8.2b shows that the values of τ, on the other hand, are considerably smaller when spatial correlation is considered. The values of σ and τ are then used to compute the standard deviations of the total residuals (computed as √(σ² + τ²)), which are plotted in Figure 8.2c. It can be seen from this figure that considering spatial correlation does not significantly alter the total residual standard deviation. (Hong et al. [2010] noticed a small reduction in the total residual standard deviation when spatial correlation was considered; the change in the total residual standard deviation could depend on the data set and the spatial correlation model used.)

Figure 8.2: Effect of spatial correlation on: (a) estimated intra-event residual standard deviation (σ), (b) estimated inter-event residual standard deviation (τ), (c) estimated total residual standard deviation. (d) Ratio of inter-event residual standard deviation to total residual standard deviation.
Though the current work only refits the CB08 model, the trends in the values of σ and
τ are the same for the other recent NGA ground-motion models [e.g., Boore and Atkinson,
2008, Chiou and Youngs, 2008]. This can be seen from Figure 8.2d, which shows typical
ratios of the inter-event residual standard deviation to the total residual standard deviation
reported by these ground-motion models. It is seen that the ratios reported by the ground-
motion modelers are generally much larger than those estimated in this work considering
spatial correlation.
8.4.2 Estimates of spatial correlation
The spatial correlation estimates (Equation 8.18) provided by Jayaram and Baker [2009a]
are based on residuals computed using the published ground-motion models that assume independence between intra-event residuals. As discussed earlier, considering spatial correlation while fitting the models does not significantly change the median predictions, and therefore does not significantly change the total residuals (Equation 8.3). Jayaram and Baker [2009a] also showed that the spatial correlation between intra-event residuals can be estimated directly from total residuals (exactly when the intra-event residuals are homoscedastic and approximately otherwise). Therefore, the estimates of spatial correlation can be expected to be very similar whether they are obtained using ground-motion models fitted with or without consideration of spatial correlation. In other words, it is still appropriate to use the correlation models previously developed using the published ground-motion models.
8.4.3 Risk assessment for a hypothetical portfolio of buildings
Since ignoring spatial correlation while fitting the ground-motion model does not significantly affect the estimates of the ground-motion medians ($f(\boldsymbol{\theta})$) or the standard deviation of the total residuals (Figure 8.2c), hazard and loss analyses for single structures remain accurate when the existing ground-motion models are used. Risk assessments for spatially-distributed systems, however, are influenced by the standard deviations of the inter-event and the intra-event residuals, and not just by the medians and the standard deviation of the total residuals (this is discussed in more detail below). Therefore, risk assessments of such systems carried out using ground-motion models fitted with and without consideration of spatial correlation could result in different loss estimates. This is illustrated below using a risk assessment carried out on a hypothetical portfolio of buildings located in the San Francisco Bay Area.
Consider a hypothetical portfolio of 100 buildings in the San Francisco Bay Area located on a 10 by 10 grid with a grid spacing of 20 km. Each building in the portfolio is
assumed to have a replacement value of $1,000,000. The seismic risk of this portfolio is
estimated by modeling the seismic hazard due to 10 different faults and fault segments.
(The source model is obtained from USGS [2003]). The risk assessment is carried out us-
ing a simulation-based procedure described in Crowley and Bommer [2006] and Jayaram
and Baker [2010]. The steps involved in this procedure are summarized below.
Step 1: Simulate earthquakes of different magnitudes on the active faults in the region,
using appropriate magnitude-recurrence relationships.
Step 2: Using the ground-motion model, compute the median ground-motion intensities ($f(\boldsymbol{\theta})$) and the standard deviations of the intra-event and the inter-event residuals (σ and τ, respectively) at the sites of interest.
Step 3: Simulate the inter-event residual (i.e., $\eta_i$) by sampling from the univariate normal distribution with mean zero and standard deviation τ.
Step 4: Simulate the intra-event residuals (i.e., the $\varepsilon_{ij}$'s) by sampling from a multivariate normal distribution with mean $\mathbf{0}_{p,1}$ (a zero vector of size p, where p is the number of sites) and a covariance matrix with entries $\rho(d_{jj'})\sigma^2$ (Equation 8.13a). Here, the spatial correlation $\rho(d_{jj'})$ is defined by the exponential model in Equation 8.18 with a range of 26 km.
Step 5: Combine the medians, inter-event residuals and intra-event residuals using
Equation 8.1 to obtain realizations of the ground-motion intensity at all sites of interest.
In the rest of the chapter, each set of ground-motion intensities is referred to as a ground-
motion intensity map. The collection of all simulated ground-motion intensity maps quan-
tifies the total ground-motion hazard in the region.
Step 6: Simulate the damage to the buildings due to each ground-motion intensity map.
Here, this is done using fragility functions which provide the probability of the building
damage being in or exceeding various damage states (no damage, minor damage, moder-
ate damage, extensive damage and collapse) as a function of the spectral acceleration at
1 second at the building location. The fragility functions were assumed to be lognormal cumulative distribution functions with median values of 0.4, 0.5, 0.7 and 0.9 for the minor, moderate, extensive and collapse damage states, respectively. The lognormal standard deviation was assumed to be 0.6 in all these cases.
Step 7: Compute the total monetary loss associated with the damage to the portfolio due to each ground-motion intensity map. This is computed by assuming the damage ratio (ratio of repair cost to replacement cost) to be 0.03, 0.08, 0.25 and 1.00 for the minor, moderate, extensive and collapse damage states, respectively.
Step 8: Obtain the loss exceedance curve which provides the annual rate of exceedance
of various monetary loss values. The loss exceedance curve is obtained as the product of
the recurrence rates of all earthquakes in the region and the probability of exceedance of
various monetary loss values. The exceedance probabilities are calculated as follows:

$$P(L \geq l) = \frac{1}{n}\sum_{i=1}^{n} I(L_i \geq l) \qquad (8.19)$$

where $P(L \geq l)$ is the probability that the loss equals or exceeds $l$, $n$ denotes the number of simulated ground-motion intensity maps, $L_i$ is the monetary loss associated with ground-motion intensity map $i$, and $I(L_i \geq l)$ is an indicator variable that equals one if $L_i \geq l$ and zero otherwise.
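Steps 3 to 8 can be sketched for a single earthquake scenario as below. This is an illustration under stated assumptions, not the implementation used for Figure 8.3: the median ln Sa(1s) at each building is supplied directly (Steps 1 and 2 are not modeled), site coordinates are planar in kilometres, intra-event residuals are drawn through a Cholesky factor of $\rho(d_{jj'})\sigma^2$, one uniform draw per building selects its damage state from the fragility curves of Step 6, and the Step 8 scaling by the scenario's recurrence rate is left out. Function names and the uniform median of 0.3 (in the same units as the fragility medians) are hypothetical.

```python
import numpy as np
from scipy.stats import norm

FRAGILITY_MEDIANS = np.array([0.4, 0.5, 0.7, 0.9])        # minor, moderate, extensive, collapse
FRAGILITY_BETA = 0.6                                      # lognormal standard deviation (Step 6)
DAMAGE_RATIOS = np.array([0.0, 0.03, 0.08, 0.25, 1.00])   # none, minor, ..., collapse (Step 7)


def simulate_portfolio_losses(ln_median, xy, sigma, tau, n_maps,
                              value=1e6, b_km=26.0, seed=None):
    """Portfolio loss for each simulated ground-motion map of one scenario."""
    rng = np.random.default_rng(seed)
    p = ln_median.size
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    L = np.linalg.cholesky(sigma**2 * np.exp(-3.0 * d / b_km))  # intra-event covariance
    eta = rng.normal(0.0, tau, size=(n_maps, 1))                 # Step 3: one eta per map
    eps = rng.standard_normal((n_maps, p)) @ L.T                 # Step 4: correlated eps_ij
    ln_sa = ln_median + eta + eps                                # Step 5: Eq. 8.1
    # Step 6: P(DS >= k | Sa) for k = minor..collapse, then pick a state per building
    p_exceed = norm.cdf((ln_sa[..., None] - np.log(FRAGILITY_MEDIANS)) / FRAGILITY_BETA)
    u = rng.random((n_maps, p, 1))
    state = (u < p_exceed).sum(axis=-1)                          # 0 = no damage, 4 = collapse
    return value * DAMAGE_RATIOS[state].sum(axis=1)              # Step 7: loss per map


def exceedance_probability(losses, l):
    """Eq. 8.19: fraction of simulated maps with loss >= l."""
    return np.mean(losses >= l)


# Usage: 10 x 10 grid with 20 km spacing, hypothetical median Sa(1s) of 0.3 everywhere
gx, gy = np.meshgrid(np.arange(10) * 20.0, np.arange(10) * 20.0)
xy = np.column_stack([gx.ravel(), gy.ravel()])
ln_median = np.full(100, np.log(0.3))
for label, (s, t) in {"CB08": (0.568, 0.255), "refit": (0.654, 0.157)}.items():
    losses = simulate_portfolio_losses(ln_median, xy, s, t, n_maps=5000, seed=0)
    print(label, exceedance_probability(losses, 5e6), exceedance_probability(losses, 10e6))
```

Running the loop with the published and refitted (σ, τ) pairs mirrors the comparison behind Figure 8.3 in spirit, but this single-scenario toy is not expected to reproduce that figure.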
The above-mentioned risk assessment process is carried out using the values of σ and
τ provided by CB08 as well as with the σ and the τ estimated in this work by considering
spatial correlations in the regression formulation (Figures 8.2a and 8.2b). In both cases, the
CB08 median model coefficients are used for estimating median intensities. The resulting
loss exceedance curves are shown in Figure 8.3. It can be seen in Figure 8.3 that the recur-
rence rates of extreme losses are overestimated when the CB08 estimates are used. This is
a result of the fact that the CB08 model overestimates τ and underestimates σ by ignoring
spatial correlation. A large value of τ increases the likelihood of observing large positive
inter-event residuals, which will simultaneously increase the ground-motion intensity at all
the sites in the region. If spatial correlations are large, a large value of σ will have a similar
effect and can result in large ground-motion intensities at multiple sites. In such a case, the
effect of underestimating σ is compensated by the effect of overestimating τ . If the spatial
correlations are small, however, underestimating σ and overestimating τ will have the net
effect of jointly producing more extreme ground-motion intensities at multiple sites than is
probable in reality. It can be inferred from Equation 8.18 that the spatial correlation will be
small if h is large or if b is small. Therefore, when the components of a spatially-distributed
system are well separated (large h) or if the correlation range is small, the ground-motion
models fitted without considering spatial correlation will overestimate the likelihood of
jointly observing extreme ground-motion intensities at multiple sites. It is to be noted that
the separation between the buildings in the hypothetical portfolio considered in this work
is substantial, which leads to significant differences between the loss curves obtained with
and without consideration of spatial correlation. It is difficult to make general conclusions
about the size of this effect, but it is clear that seismic risk analysis calculations using exist-
ing ground-motion model estimates of σ and τ will overestimate the chance of observing
large losses.
8.5 Conclusions
This work illustrated the impact of considering spatial correlation between intra-event residuals while developing ground-motion models. The mixed-effects algorithm of Abrahamson and Youngs [1992], which assumes independence between intra-event residuals, was modified to account for the spatial correlation between the intra-event residuals. This was done by changing the likelihood function used for estimating the inter-event and the intra-event residual variances given other model coefficients, and by changing the estimate of the inter-event residual given the total residuals at multiple sites. The modified algorithm was used to refit the Campbell and Bozorgnia [2008] ground-motion model, to illustrate the effect of this refinement. The variance of the total residuals and the model coefficients used for predicting the median ground-motion intensity were not significantly affected by the proposed refinement. Significant changes, however, were seen in the variances of the intra-event and the inter-event residuals. Incorporating spatial correlation was seen to increase the intra-event residual variance and to decrease the inter-event residual variance. These changes have implications for risk assessments of spatially-distributed systems, because a smaller inter-event residual variance implies a lower likelihood of simultaneously observing larger-than-median ground-motion intensities at all sites in a region. To demonstrate this effect, a risk assessment was performed for a hypothetical portfolio of buildings using the ground-motion models obtained with and without accounting for spatial correlation. The results showed that using the published variance estimates causes an overestimation of the exceedance rates of large losses.

Figure 8.3: Risk assessment results for a hypothetical portfolio of buildings performed using ground-motion models developed with and without the proposed refinement.
Chapter 9
Hurricane risk assessment of spatially-distributed systems with consideration of wind-field uncertainties and spatial correlation
9.1 Abstract
With a view toward extending the seismic risk assessment techniques developed in this
work for risk assessment under other types of hazards, this exploratory study focuses on
quantifying the uncertainties and the spatial correlation in hurricane wind fields (using tech-
niques that were used for earthquake ground motion fields), and evaluating their impact on
the hurricane risk of spatially-distributed systems. Hurricane wind-speed predictions are
obtained for two sample hurricanes, Hurricane Jeanne and Hurricane Frances, using the
Batts et al. [1980] wind-speed model, and the uncertainties in these predictions are evalu-
ated using ‘actual’ wind-speed recordings. The spatial correlation of wind speeds is esti-
mated and modeled using geostatistical tools. Finally, the impact of the wind-speed uncer-
tainties and the spatial correlation on the hurricane risk of a spatially-distributed system is
illustrated by a sample risk assessment of a hypothetical portfolio of buildings. The results
of the risk assessment show that the uncertainties and the spatial correlations in the wind
CHAPTER 9. PROBABILISTIC HURRICANE RISK ASSESSMENT 188
fields need to be modeled in order to avoid introducing errors into the risk calculations of
spatially-distributed systems.
9.2 Introduction
Frameworks for the risk assessment of structures and infrastructure systems under natural
and man-made hazards share many similarities. Broadly, they involve the quantification
of a hazard intensity measure (e.g., ground-motion intensities during earthquakes, wind
speeds during hurricanes) and the associated probable losses. The techniques developed in
this thesis for seismic risk assessment may thus be applicable to risk assessment under other types of hazards. This exploratory study extends the seismic hazard and risk assessment concepts and techniques discussed in the earlier chapters of this thesis to hurricane hazard and risk modeling (hurricanes are chosen as a sample alternative hazard).
Vickery et al. [2000b] developed a hurricane wind-hazard model that forms the basis
for the wind-speed contours in ASCE Standard 7-02 [2003] in the Southeast U.S. The
following steps are involved in this hazard quantification approach:
• Step 1: Use historical hurricane records to develop probability density functions
(PDF) for key hurricane parameters such as the location of origin, translation di-
rection, translation speed, central pressure and radius of maximum wind.
• Step 2: Use the PDFs developed in Step 1 to Monte Carlo simulate probable future
hurricanes.
• Step 3: Predict the peak wind speeds due to each simulated hurricane at the sites of interest using empirical or physics-based wind-speed models [e.g., Batts et al., 1980, Vickery et al., 2000a, 2008].
• Step 4: Develop a PDF for the peak wind speeds experienced at any particular site
using the wind-speed information from Step 3.
In general, it can be seen that this process is similar to probabilistic seismic hazard
analysis (PSHA), which is used for quantifying seismic ground-motion hazard at a given
site [Cornell, 1968, Kramer, 1996].
The wind-hazard model described above can be used in combination with structural
fragility curves to obtain the exceedance rates of different levels of structural losses using
numerical integration. Alternatively, the hazard information can be used in a structural reliability framework to estimate the failure probability of a structure under hurricane loading.
For instance, Li and Ellingwood [2009] modeled the site wind speeds as a Weibull random
variable (whose distribution is parameterized using the wind hazard information obtained
from Vickery et al. [2000b]), and estimated the reliability of low-rise light-frame wood
residential construction in the U.S. subjected to hurricane loading.
It is, however, difficult to use the two analytical risk assessment approaches described
above for assessing the risk of spatially-distributed systems such as portfolios of buildings
and lifelines. This is because the risk assessment of spatially-distributed systems is based
on a large vector of correlated wind speeds (wind speeds at all component locations), which
makes it difficult to use numerical integration and other analytical techniques. Hence, many
past research works use Monte Carlo simulation (MCS) instead of analytical approaches
for the risk assessment of spatially-distributed systems [e.g., Legg et al., 2010]. The basic
MCS approach for the risk assessment involves the following steps:
• Step 1: Simulate probable future hurricanes using the PDFs of hurricane-related pa-
rameters developed in past research works such as that of Vickery et al. [2000b].
• Step 2: Predict the peak wind speeds due to each simulated hurricane using empirical or physics-based wind-speed models [e.g., Batts et al., 1980, Vickery et al., 2000a, 2008].
• Step 3: Monte Carlo simulate the total loss due to the wind speeds.
• Step 4: Estimate the probability of exceeding various loss levels using the loss esti-
mates from Step 3.
Most hurricane wind-speed prediction models developed in the past are deterministic,
and the uncertainties in wind fields have been analyzed in few research works. (In this
chapter, the wind field denotes the collection of peak wind speeds (over the duration of
the hurricane) at all the sites of interest. The peak wind speed at a site (similar to peak
ground acceleration for earthquakes) is often the hurricane intensity measure used to esti-
mate probable losses [e.g., Jarvinen et al., 1984, Li and Ellingwood, 2009].) One notable
exception is the work of Vickery et al. [2009b], who computed the uncertainties in the wind
fields (maximum peak gust wind speed) using observed and predicted (by the Vickery et al.
[2008] wind-field model) wind speeds during 24 different hurricanes. They found that the
ratio of the observed wind speeds to the predicted wind speeds has a mean of one and
a coefficient of variation of 0.1. Chapter 5 of this thesis illustrated that ignoring the un-
certainties in the earthquake ground motions can lead to inaccurate lifeline risk estimates.
This study demonstrates the potential for comparable inaccuracies caused by ignoring the
uncertainties in wind fields during hurricane risk assessments. The seismic risk assess-
ments described in Chapter 5 of this thesis also illustrated the importance of considering
spatial correlation in ground motion fields for obtaining accurate risk estimates. To the
author’s knowledge, the spatial correlation in hurricane wind fields has not been studied in
the literature.
Assume that a probabilistic hurricane wind-speed model takes the following form:
$$\ln(V_i) = \ln(\bar{V}_i) + \varepsilon_i \tag{9.1}$$
where $V_i$ denotes the observed peak wind speed at site $i$, $\bar{V}_i$ denotes the predicted (by the wind-field model) median peak wind speed at site $i$, and $\varepsilon_i$ denotes the residual (error
term). For clarity, the spatial correlation mentioned in this chapter refers to the correlation
between the residuals (ε’s) at two different sites. There is a significant amount of corre-
lation between the wind speeds at two closely-spaced sites during a hurricane (which was
considered by Legg et al. [2010]), but a large portion of this correlation is accounted for by the
wind-speed model, which predicts similar wind speeds at sites close to one another. The
residuals (εi’s) are correlated as well, and this correlation is of interest in this study. (Chap-
ter 3 of this thesis discusses the comparable concept of spatial correlation for earthquake
ground-motion fields in detail.) Causes of this correlation include common source effects
and similarity in topography- and land friction-related effects.
It is of interest to quantify the uncertainties and the spatial correlation in wind fields. In
this exploratory study, two sample hurricanes are used, with the primary goals of obtaining
approximate estimates for these parameters and illustrating the tools and methods that can
be used for the estimation. Further, a sample hurricane risk assessment is carried out for a
hypothetical portfolio of buildings in order to illustrate the importance of considering the
uncertainties and the spatial correlation in the risk assessment process.
9.3 Spatial correlation estimation methodology
In this chapter, the uncertainties and the spatial correlation in hurricane wind fields are
empirically estimated using recorded hurricane wind speeds. The wind-field uncertainties
are quantified using the mean and the variance of the residuals (ε’s). The residuals are
computed from recorded hurricane wind speeds using Equation 9.1, where the wind-speed
predictions are obtained from the Batts et al. [1980] model. This model is chosen in this
work for its simplicity, and the analyses performed using this simple model can be repeated
with a more rigorous model [e.g., Vickery et al., 2008] if desired. The spatial correlations
between the residuals are estimated using well-established geostatistical tools [Deutsch and
Journel, 1998, Goovaerts, 1997] that were previously used in Chapter 3 for quantifying the
spatial correlation in ground motion fields. These tools are described briefly in this section to supplement the more detailed discussion in Chapter 3.
Let $\tilde{\varepsilon}$ denote the normalized residual, estimated as follows:
$$\tilde{\varepsilon}_i = \frac{\varepsilon_i}{\sigma} \tag{9.2}$$
where $\sigma$ denotes the standard deviation of the residual.
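Equations 9.1 and 9.2 amount to a log-ratio followed by a rescaling. The minimal sketch below assumes that σ is estimated as the sample (n − 1) standard deviation of the residuals. The function names are hypothetical.

```python
import math
import statistics


def log_residuals(observed, predicted_median):
    """Equation 9.1 rearranged: eps_i = ln(V_i) - ln(Vbar_i)."""
    if len(observed) != len(predicted_median):
        raise ValueError("observed and predicted lists must have equal length")
    return [math.log(v) - math.log(vp) for v, vp in zip(observed, predicted_median)]


def normalized_residuals(eps):
    """Equation 9.2: divide each residual by the standard deviation sigma.

    Assumption: sigma is the sample (n - 1) standard deviation of eps.
    """
    sigma = statistics.stdev(eps)
    return [e / sigma for e in eps], sigma


if __name__ == "__main__":
    # Hypothetical observed and predicted peak wind speeds (same units)
    eps = log_residuals([30.0, 40.0, 20.0], [30.0, 20.0, 40.0])
    eps_tilde, sigma = normalized_residuals(eps)
    print(eps, eps_tilde, sigma)
```

A positive residual means the wind-field model under-predicted the observed wind speed at that site.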
The correlation structure of $\tilde{\varepsilon}_i$ (equivalently, that of $\varepsilon_i$) can be represented using a semivariogram, which measures the dissimilarity between the $\tilde{\varepsilon}_i$’s. Let $u$ and $u'$ denote two sites separated by the distance vector $\mathbf{h}$, and let $\tilde{\varepsilon}_u$ denote the normalized residual at site $u$. The semivariogram $\gamma(u,u')$ is defined as follows:
$$\gamma(u,u') = \frac{1}{2} E\left[\{\tilde{\varepsilon}_u - \tilde{\varepsilon}_{u'}\}^2\right] \tag{9.3}$$
where $E[\cdot]$ denotes the expectation operator.
The semivariogram defined in Equation 9.3 is location-dependent, and its inference
requires repeated realizations of $\tilde{\varepsilon}$ at locations $u$ and $u'$. Such repeated measurements are, however, never available in practice. Hence, to obtain a stationary semivariogram, it is typically assumed that the semivariogram does not depend on the site locations $u$ and $u'$, but only on their separation $\mathbf{h}$. The stationary semivariogram $\gamma(\mathbf{h})$ is then defined as follows:
$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\{\tilde{\varepsilon}_u - \tilde{\varepsilon}_{u+\mathbf{h}}\}^2\right] \tag{9.4}$$
A stationary semivariogram is said to be isotropic if it is a function of the separation distance $h = \|\mathbf{h}\|$ rather than of the separation vector $\mathbf{h}$. An isotropic, stationary semivariogram can be empirically estimated from a data set as follows:
$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \{\tilde{\varepsilon}_{u_\alpha} - \tilde{\varepsilon}_{u_\alpha+h}\}^2 \tag{9.5}$$
where $\hat{\gamma}(h)$ is the experimental stationary isotropic semivariogram (estimated from a data set), $N(h)$ denotes the number of pairs of sites separated by $h$, and $\{\tilde{\varepsilon}_{u_\alpha}, \tilde{\varepsilon}_{u_\alpha+h}\}$ denotes the $\alpha$th such pair.
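Equation 9.5 can be computed directly. In scattered data, separations rarely equal a given h exactly, so the sketch below groups pairs into distance bins of half-width `tol` around each lag. This binning is a common convention and an assumption here. Coordinates are assumed to be planar (x, y) positions in km. Each unordered pair of sites is counted once.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, eps_tilde, lags, tol):
    """Experimental isotropic semivariogram, Equation 9.5.

    coords: list of (x, y) site coordinates in km (assumed projected/planar).
    eps_tilde: normalized residuals at those sites.
    lags: separation distances h at which to evaluate gamma_hat(h).
    tol: half-width of each distance bin (assumption of this sketch);
         bins may overlap if tol exceeds half the lag spacing.
    Returns {h: gamma_hat(h)} for every lag that has at least one pair.
    """
    sums = {h: 0.0 for h in lags}
    counts = {h: 0 for h in lags}
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        diff_sq = (eps_tilde[i] - eps_tilde[j]) ** 2
        for h in lags:
            if abs(d - h) <= tol:
                sums[h] += diff_sq
                counts[h] += 1
    return {h: sums[h] / (2 * counts[h]) for h in lags if counts[h] > 0}


if __name__ == "__main__":
    coords = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (0.0, 10.0)]
    eps_tilde = [0.3, 0.5, -0.2, 0.1]  # hypothetical normalized residuals
    print(empirical_semivariogram(coords, eps_tilde, lags=[10.0, 20.0], tol=2.5))
```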
When empirically estimated, γ(h) only provides semivariogram values at discrete val-
ues of h, and hence, a continuous function is usually fitted to the discrete values to obtain
the semivariogram for continuous values of h. There are only a few permissible continuous
functions that ensure that the covariance matrices estimated using these semivariograms
are positive definite [Goovaerts, 1997]. The current study uses the permissible Gaussian
semivariogram (shown below), which is seen to provide the best fit to the empirical semi-
variogram values.
$$\gamma(h) = a\left[1 - \exp\left(-3h^2/b^2\right)\right] \tag{9.6}$$
where a denotes the ‘sill’ of the semivariogram (which in this case equals one, the variance
of the normalized residuals) and b denotes the ‘range’ of the semivariogram (which equals
the separation distance h at which γ(h) equals 0.95a).
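The definition of the range can be checked numerically. At h = b, Equation 9.6 gives a(1 − e^{−3}) ≈ 0.9502a. The minimal sketch below evaluates the model with unit sill. It uses the 170 km range that Section 9.4.2 reports for Hurricane Jeanne as an example value.

```python
import math


def gaussian_semivariogram(h, sill, range_b):
    """Equation 9.6: gamma(h) = a * [1 - exp(-3 h^2 / b^2)]."""
    return sill * (1.0 - math.exp(-3.0 * h**2 / range_b**2))


if __name__ == "__main__":
    b = 170.0  # km, example range (Hurricane Jeanne fit in Section 9.4.2)
    for h in (0.0, 20.0, 85.0, 170.0, 340.0):
        print(f"h = {h:5.0f} km  gamma = {gaussian_semivariogram(h, 1.0, b):.3f}")
```

The model rises parabolically near the origin. This is why it suits smoothly varying fields better than the exponential model, which rises linearly.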
It can be theoretically shown that the spatial correlation function $\rho(h)$ for the normalized residuals can be computed from the semivariogram function as follows:
$$\gamma(h) = a\left(1 - \rho(h)\right) \tag{9.7}$$
Hence, the correlations are completely defined by the semivariogram, which, in turn, is a function only of the range. (The sill is known to equal 1, the variance of the normalized residuals for which the semivariogram is constructed.) Moreover, note from Equations 9.6 and 9.7 that a larger range implies a smaller rate of increase in $\gamma(h)$ with $h$, and consequently, a smaller rate of decay of correlation with separation distance.
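Rearranging Equation 9.7 with a unit sill gives ρ(h) = 1 − γ(h). Substituting the Gaussian model then gives ρ(h) = exp(−3h²/b²). The short sketch below compares the correlations implied by two ranges. The ranges, 130 km and 170 km, are the values reported later in this chapter for Hurricanes Frances and Jeanne. The comparison illustrates that a larger range means slower decay.

```python
import math


def rho_from_semivariogram(gamma_h, sill=1.0):
    """Equation 9.7 rearranged: rho(h) = 1 - gamma(h) / a."""
    return 1.0 - gamma_h / sill


def gaussian_rho(h, range_b):
    """Correlation implied by the Gaussian semivariogram (Eq. 9.6) with unit sill."""
    gamma_h = 1.0 - math.exp(-3.0 * h**2 / range_b**2)
    return rho_from_semivariogram(gamma_h)


if __name__ == "__main__":
    for b in (130.0, 170.0):
        row = ", ".join(f"rho({h} km) = {gaussian_rho(h, b):.2f}" for h in (10, 50, 100))
        print(f"range {b:.0f} km: {row}")
```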
9.4 Results and Discussion
9.4.1 Data source
The analyses performed in this study use ‘recorded’ wind-speed information from two hur-
ricanes, namely, Hurricane Jeanne (2004) and Hurricane Frances (2004). In both cases,
information about hurricane-related parameters such as central pressure, storm position,
direction and translation speed are obtained from the six-hourly position data provided by the
HURDAT database [Jarvinen et al., 1984]. The ‘recorded’ wind-speed data are obtained
from the Hurricane Research Division (HRD) H*Wind program [Powell et al., 1996]. The
primary data come from the Air Force Reserves (AFRES) reconnaissance flight-level observations reduced from near 3 km to the surface with a boundary layer model [Powell, 1980]. Other data sources include ships, buoys, Coastal-Marine Automated Network (C-MAN) observations, airport observations including Automated Surface Observing Systems
(ASOS), and supplemental data collected after landfall from public and private sources
[Powell and Houston, 1997]. Additional data over the sea are collected by deploying ‘dropwindsondes’ from aircraft; these instruments drift down on a parachute, measuring vertical profiles of pressure, temperature, humidity and wind as they fall [Aberson and Franklin, 1999]. The
wind-speed data are quality-controlled and processed to conform to a common framework for height of recording (10 m), exposure (open terrain) and averaging period (maximum
sustained 1-minute wind speed) [Powell et al., 1996, Powell and Houston, 1996]. These
data were then objectively analyzed with a technique based upon the spectral application of
finite element representation (SAFER) method [Ooyama, 1987, Franklin et al., 1993] in or-
der to obtain an interpolated grid of peak wind speeds. In past research, tests on the SAFER
methodology have indicated that the technique correctly reproduced known surface wind
fields based on the available wind observations [Houston et al., 1999].
9.4.2 Hurricane Jeanne (2004)
This section describes the uncertainties and the spatial correlation estimated using the
recorded wind field from the 2004 Hurricane Jeanne. Hurricane Jeanne formed on September 13, 2004 and made landfall in Florida on September 26, remaining over the state. In this study, recorded hurricane and wind-field data collected between September 24 and 26 are used for the analysis.
Figure 9.1a shows the observed maximum (over the duration of the hurricane) wind
speeds during Hurricane Jeanne, and Figure 9.1b shows the maximum wind speeds predicted by the Batts et al. [1980] model. The HURDAT database only provides six-hourly hurricane-related data. In order to obtain a finer temporal resolution of wind speeds, the six-hourly hurricane data are linearly interpolated to obtain 30-minute data. These are then used to predict (using the Batts et al. [1980] model) the wind speeds at every 30-minute interval and, subsequently, the peak wind speeds over the duration of the hurricane (Figure 9.1b). Figure 9.1c shows the residuals computed using Equation 9.1. As mentioned earlier, the Batts et al. [1980] model is chosen in this study primarily for its simplicity. It is,
however, seen to predict wind speeds that are biased as a function of the closest distance of
the site from the hurricane track (denoted $d_i$ for site $i$). This is illustrated by Figure 9.2a,
which shows the residuals as a function of the d’s. This plot indicates that, in general, the
Batts et al. [1980] model under-predicts wind speeds at sites far away from the hurricane
track, and over-predicts wind speeds at sites close to the hurricane track. Newer wind-speed models have smaller biases owing to the availability of larger data sets and better model development techniques. Therefore, in order to prevent the Batts et al. [1980] model bias from affecting the uncertainty and the correlation estimates, a simple bias correction is performed before analyzing the residuals.
Figure 9.1: Hurricane Jeanne: (a) Observed wind speeds (b) Predicted wind speeds (c) Residuals (d) Bias-corrected residuals.
Figure 9.2: Residuals and bias-corrected residuals versus closest distances from the hurricane track.
Since the plot of the ε’s against the d’s (Figure 9.2a) shows a linear trend, a bias correction term is obtained by linearly regressing ε on d. This bias correction is then added to the (logarithmic) predicted wind speeds in order to eliminate the bias. Figure 9.2b shows the residuals obtained after the bias correction (denoted ε in the rest of the chapter). Some minor local trends can still be seen between the residuals and the closest distances, but these are small compared to the overall trend seen in Figure 9.2a. (It might be possible to employ other bias correction techniques to completely eliminate the trends, but this is not done in this exploratory study. Further, this may not be necessary when using more recent wind-speed models.) The scatter in Figure 9.2b represents the bias-corrected ‘aleatory’ uncertainty in the wind-speed predictions.
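The regression-based correction can be sketched as an ordinary least-squares fit of ε on d, with the fitted trend added to the logarithmic predictions. This is one plausible reading of the procedure described above, not the thesis's own code. The function names and the example data are hypothetical. One property is worth noting: because the regression includes an intercept, the corrected residuals have a sample mean of exactly zero. This matches the zero-mean residuals reported below.

```python
import math


def fit_linear_bias(d, eps):
    """Ordinary least-squares fit of eps = c0 + c1 * d (closest distance to track)."""
    n = len(d)
    d_mean = sum(d) / n
    e_mean = sum(eps) / n
    sxx = sum((x - d_mean) ** 2 for x in d)
    sxy = sum((x - d_mean) * (e - e_mean) for x, e in zip(d, eps))
    c1 = sxy / sxx
    c0 = e_mean - c1 * d_mean
    return c0, c1


def bias_correct(d, ln_pred, ln_obs):
    """Add the fitted trend to ln(predicted) and recompute residuals via Eq. 9.1."""
    eps = [o - p for o, p in zip(ln_obs, ln_pred)]
    c0, c1 = fit_linear_bias(d, eps)
    ln_pred_corr = [p + c0 + c1 * x for p, x in zip(ln_pred, d)]
    eps_corr = [o - p for o, p in zip(ln_obs, ln_pred_corr)]
    return ln_pred_corr, eps_corr, (c0, c1)


if __name__ == "__main__":
    d = [5.0, 40.0, 80.0, 120.0, 160.0]  # km, hypothetical closest distances
    ln_pred = [math.log(v) for v in (50.0, 42.0, 35.0, 28.0, 22.0)]  # hypothetical
    ln_obs = [math.log(v) for v in (44.0, 40.0, 36.0, 31.0, 26.0)]   # hypothetical
    _, eps_corr, (c0, c1) = bias_correct(d, ln_pred, ln_obs)
    print(f"c0 = {c0:.4f}, c1 = {c1:.5f} per km, corrected residuals = {eps_corr}")
```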
The histogram and the normal quantile-quantile (QQ) plot of the ε’s (the normal QQ plot is constructed after dividing the ε’s by their standard deviation) are shown in Figure 9.3. The figure shows that the residuals have a heavier upper tail than the normal distribution. However, normality holds reasonably well up to a normalized ε value of 2. In the rest of the chapter,
the residuals are assumed to follow a normal distribution for simplicity during simulation,
though this assumption should be verified using data from other recorded hurricanes and, particularly, using newer wind-speed models.
Figure 9.3: (a) Histogram of bias-corrected residuals estimated using the Hurricane Jeanne data (b) Normal QQ plot of normalized bias-corrected residuals from Hurricane Jeanne.
The ε’s have mean zero (on account of the
bias correction) and standard deviation 0.15, which roughly agrees with the coefficient of
variation of 0.1 reported by Vickery et al. [2008] (noting that the coefficient of variation of
the multiplicative error term defined by Vickery et al. [2008] is comparable to the standard
deviation of ε , if the ε’s reasonably follow a normal distribution). This standard deviation is
much smaller than that of the total residuals computed from ground-motion fields (∼ 0.6),
but is not negligible.
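The comparison between the standard deviation of ε and the coefficient of variation of a multiplicative error follows from a standard lognormal identity. If ε is normal with standard deviation σ, then exp(ε) has coefficient of variation √(exp(σ²) − 1), which is close to σ when σ is small. The short check below assumes normal ε, as the text does. It uses the standard deviations estimated in this chapter (0.13 and 0.15) alongside 0.10.

```python
import math


def lognormal_cov(sigma_ln):
    """CoV of exp(eps) when eps ~ Normal(mean, sigma_ln**2); independent of the mean."""
    return math.sqrt(math.exp(sigma_ln**2) - 1.0)


if __name__ == "__main__":
    for s in (0.10, 0.13, 0.15):
        print(f"sigma of ln residual = {s:.2f} -> CoV of exp(eps) = {lognormal_cov(s):.4f}")
```

For σ = 0.15, the implied coefficient of variation is therefore only marginally above 0.15. This is of the same order as the value of 0.1 cited above.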
Figure 9.4 shows the semivariogram computed using the ε’s. The experimental semivariogram values are fitted using the Gaussian model shown in Equation 9.6, with a range of 170 km. The Gaussian fit is more appropriate here (as compared to the exponential fit used for earthquake ground-motion intensity semivariograms) on account of the smoothly varying wind-speed residual field. The range of 170 km is chosen so that the fit is better at short distances (≤ 20 km), even if this requires some misfit with empirical data at large
separation distances. As described in Chapter 3, this is because it is more important to
model the semivariogram structure well at short separation distances. Large separation dis-
tances are associated with low correlations, which thus have relatively little effect on joint
distributions of the wind-speed residuals. In addition to having low correlation, widely separated sites also have little impact on each other due to an effective ‘screening’ of their influence by more closely located sites [Goovaerts, 1997].
Figure 9.4: Semivariogram of bias-corrected residuals estimated using the Hurricane Jeanne data.
It can be seen from Figure 9.4
that the extent of spatial correlation is much larger than what was seen from the earthquake
data [Jayaram and Baker, 2009a]. This is not surprising since hurricane wind speeds are
less influenced by factors such as local heterogeneities that reduce the spatial correlation in
ground-motion fields.
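The text describes the 170 km range as chosen to favor a good fit at short separations. One simple way to automate that preference is a grid search over candidate ranges that minimizes squared misfit only at lags up to a cutoff. This sketch swaps in that technique as an illustration; it is not the procedure used in the thesis. The cutoff, the candidate grid and the example data are hypothetical.

```python
import math


def gaussian_gamma(h, range_b, sill=1.0):
    """Equation 9.6 with unit sill by default."""
    return sill * (1.0 - math.exp(-3.0 * h**2 / range_b**2))


def fit_range(lags, gamma_hat, candidates, max_lag=None):
    """Grid search for the range b minimizing squared misfit to gamma_hat.

    Lags beyond max_lag (if given) are ignored, mimicking the preference for a
    good fit at short separations. This is an assumed automation of what the
    text describes as a judgment-based choice.
    """
    pairs = [(h, g) for h, g in zip(lags, gamma_hat) if max_lag is None or h <= max_lag]
    if not pairs:
        raise ValueError("no lags within max_lag")

    def sse(b):
        return sum((g - gaussian_gamma(h, b)) ** 2 for h, g in pairs)

    return min(candidates, key=sse)


if __name__ == "__main__":
    lags = [float(h) for h in range(5, 205, 5)]
    # Hypothetical experimental values: follow a 170 km range at short lags
    # but level off below the sill at long lags.
    gamma_hat = [min(gaussian_gamma(h, 170.0), 0.8) for h in lags]
    print("short-lag fit:", fit_range(lags, gamma_hat, range(50, 301, 10), max_lag=50.0))
    print("all-lag fit:  ", fit_range(lags, gamma_hat, range(50, 301, 10)))
```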
9.4.3 Hurricane Frances (2004)
Hurricane Frances formed on August 24, 2004 and made landfall in Florida on September 4. In this study, recorded hurricane and wind-field data collected between September 4 and 6 are used for the analysis. The hurricane track data and the recorded wind speeds are obtained from the HURDAT database and the HRD, respectively. The peak wind speed predictions are obtained using the Batts et al. [1980] model. Figure 9.5 shows the bias-corrected residuals obtained from the Hurricane Frances recordings.
Figure 9.5: Bias-corrected residuals estimated using the Hurricane Frances data.
Figure 9.6: (a) Histogram of bias-corrected residuals estimated using the Hurricane Frances data (b) Normal QQ plot of normalized bias-corrected residuals from Hurricane Frances.
Figure 9.7: Semivariogram of bias-corrected residuals estimated using the Hurricane Frances data.
Figure 9.6 shows the histogram and the normal QQ plot of the ε’s. The QQ plot indicates that the residuals are more heavy-tailed than the normal distribution beyond ±2 standard deviations. The ε’s have mean zero (on account of the bias correction) and standard deviation 0.13. Figure 9.7 shows the semivariogram computed using the ε’s. The experimental semivariogram values are fitted using the Gaussian model shown in Equation 9.6, with a range of 130 km.
9.4.4 Hurricane risk assessment of a hypothetical portfolio of buildings
This section describes the simulation-based hurricane risk assessment of a hypothetical
portfolio of buildings, and illustrates the importance of modeling the uncertainties and the
spatial correlation in the wind fields for obtaining accurate risk estimates. The portfolio considered here consists of five two-story residential buildings (gable roof, 6d roof sheathing nails, shingle roof cover, wood frames, two-nailed roof/wall connections, no garage) located in Palm Bay, Florida, at (longitude, latitude) coordinates [-80.6, 28.0], [-80.7, 28.0], [-80.5, 27.9], [-80.6, 27.9] and [-80.5, 27.8] (Figure 9.8).
Figure 9.8: Portfolio of five residential buildings considered in the risk assessment.
The replacement value of each building is assumed to be
$1,000,000. The goal of this study is to evaluate the probabilities of exceeding various levels of loss to this portfolio due to Hurricane Jeanne. The steps involved in the risk assessment procedure are described below.
• Step 1: Using the wind-speed model of Batts et al. [1980], the parameters of Hur-
ricane Jeanne obtained from HURDAT, and the bias correction of Section 9.4.2, the
wind speeds are predicted at all the sites of interest.
• Step 2: The residuals (ε in Equation 9.1) at the sites of interest are assumed to follow
a multivariate normal distribution with mean zero and standard deviation 0.15, based
on the findings in this work. The spatial correlation is defined by the Gaussian model
in Equation 9.6 with a range of 150 km. The residuals are simulated at the sites of
interest using this distribution. The simulation approach is described in detail in
Chapter 4.
• Step 3: The predicted wind speeds and the simulated residuals are combined using Equation 9.1 to obtain realizations of the wind speeds at all the sites of interest (i.e., a simulated wind field).
• Step 4: The building losses due to each simulated wind field are evaluated using damage functions provided by HAZUS [2006]. These damage functions provide an estimate of the mean damage ratio (ratio of loss sustained to the replacement cost) as a function of the peak gust wind speed experienced during the hurricane (in this case, the simulated peak wind speed at the building site). Note that the damage functions provided by HAZUS [2006] are deterministic [Vickery et al., 2009a]. For the purposes of this exploratory study, these deterministic damage functions are used, but more realistic damage functions can be used with this framework if desired.
• Step 5: Obtain the probability of exceedance of various monetary loss values. The exceedance probabilities are calculated as follows:
$$P(L \ge l) = \frac{1}{n} \sum_{i=1}^{n} I(L_i \ge l) \tag{9.8}$$
where $P(L \ge l)$ is the probability that the loss exceeds $l$, $n$ denotes the number of simulated wind fields, $L_i$ is the monetary loss associated with wind field $i$, and $I(L_i \ge l)$ is an indicator variable that equals one if $L_i \ge l$ and zero otherwise.
Note that the steps described above do not include Steps 1 and 2 of the hazard quantification approach described earlier in this chapter. This is because the risk assessment carried out for Hurricane Jeanne does not require the simulation of hurricane paths and other hurricane-related parameters (the recorded Hurricane Jeanne parameters are used directly).
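The five steps can be sketched end to end. This is a minimal, standard-library-only illustration under several stated assumptions:

- The predicted median wind speeds and the damage function are hypothetical placeholders; the HAZUS [2006] curves are not reproduced.
- Distances are great-circle distances computed from the listed coordinates.
- The correlated residuals are generated by Cholesky factorization of the Gaussian-model correlation matrix. This is a standard way to sample a multivariate normal vector, but not necessarily the procedure of Chapter 4.

Setting `correlated=False` gives the variant, discussed next, in which spatial correlation is ignored.

```python
import math
import random

# Building coordinates (longitude, latitude) from the text; everything else
# that is not stated in the text is a hypothetical placeholder.
SITES = [(-80.6, 28.0), (-80.7, 28.0), (-80.5, 27.9), (-80.6, 27.9), (-80.5, 27.8)]
REPLACEMENT_VALUE = 1_000_000.0  # dollars per building
SIGMA = 0.15                     # std. dev. of ln residuals (Step 2)
RANGE_KM = 150.0                 # Gaussian semivariogram range (Step 2)


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2.0 * radius_km * math.asin(math.sqrt(a))


def cholesky(m):
    """Lower-triangular L with L L^T = m, for a symmetric positive definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def damage_ratio(v):
    """HYPOTHETICAL placeholder damage function (v in m/s), NOT the HAZUS [2006] curves."""
    return 1.0 / (1.0 + math.exp(-(v - 55.0) / 5.0))


def simulate_losses(pred_wind, n_sim, correlated=True, seed=1):
    """Steps 2-4: correlated normal residuals -> wind fields -> portfolio losses."""
    n = len(SITES)
    corr = [[math.exp(-3.0 * haversine_km(SITES[i], SITES[j]) ** 2 / RANGE_KM**2)
             if (correlated or i == j) else 0.0
             for j in range(n)] for i in range(n)]
    low = cholesky(corr)
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eps = [SIGMA * sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        winds = [v * math.exp(e) for v, e in zip(pred_wind, eps)]  # Eq. 9.1
        losses.append(sum(REPLACEMENT_VALUE * damage_ratio(w) for w in winds))
    return losses


def exceedance_probability(losses, l):
    """Equation 9.8: fraction of simulated wind fields with loss >= l."""
    return sum(1 for x in losses if x >= l) / len(losses)


if __name__ == "__main__":
    pred = [40.0, 39.0, 41.0, 40.5, 41.5]  # hypothetical bias-corrected medians, m/s
    with_corr = simulate_losses(pred, 10_000, correlated=True)
    no_corr = simulate_losses(pred, 10_000, correlated=False)
    for l in (100_000.0, 250_000.0, 500_000.0, 1_000_000.0):
        print(f"P(L >= {l:>9,.0f}): correlated {exceedance_probability(with_corr, l):.4f}, "
              f"independent {exceedance_probability(no_corr, l):.4f}")
```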
The exceedance probabilities obtained for the portfolio are shown in Figure 9.9. Also
shown in the figure are the exceedance probabilities obtained by ignoring the spatial corre-
lation between the residuals when performing the simulation in Step 2. It can be seen that
ignoring the spatial correlation results in an overestimation of the probability of exceeding
small losses and an underestimation of the probability of exceeding large losses. The ex-
tent of the overestimation and the underestimation will be smaller if the uncertainties in the
damage function are considered, but the risk estimates will nevertheless be inaccurate. Sev-
eral past risk assessments have completely ignored the uncertainty in the wind fields (i.e.,
predicted wind speeds are used, and the residuals are ignored). The loss estimate obtained
in this deterministic case is shown by the vertical line in Figure 9.9. It is seen that this deterministic estimate can be significantly smaller than the losses that have moderately large probabilities of exceedance when the residuals and the correlations are considered.
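There is a simple heuristic for why ignoring correlation narrows the loss distribution. Consider a sum of n residuals with common standard deviation σ and correlation matrix R. The standard deviation of the sum is σ√(Σᵢⱼ Rᵢⱼ). Under independence, site-level errors tend to average out; under strong correlation, they move together. Losses are a nonlinear function of wind speed, so this is only a heuristic for the behavior seen in Figure 9.9. The equicorrelation of 0.95 used below is a hypothetical value. It is roughly what the Gaussian model with a 150 km range gives for sites about 20 km apart.

```python
import math


def std_of_sum(sigma, corr):
    """Standard deviation of sum_i eps_i when eps ~ N(0, sigma^2 * corr)."""
    n = len(corr)
    return sigma * math.sqrt(sum(corr[i][j] for i in range(n) for j in range(n)))


def equicorrelation(n, rho):
    """n x n correlation matrix with rho on every off-diagonal entry."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    sigma, n = 0.15, 5
    print("independent:", std_of_sum(sigma, equicorrelation(n, 0.0)))
    print("rho = 0.95 :", std_of_sum(sigma, equicorrelation(n, 0.95)))
```

For the five buildings, the standard deviation of the summed residual grows from σ√5 ≈ 0.34 to σ√24 ≈ 0.73. Ignoring correlation therefore concentrates the loss distribution: the probability of exceeding small losses is overstated and that of exceeding large losses is understated.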
Figure 9.9: Portfolio loss exceedance probabilities.
9.5 Limitations and research needs
This section lists some of the challenges and needs in hurricane risk assessment research,
and discusses the limitations of the approach proposed in this chapter.
One of the primary concerns in developing empirical models related to hurricane wind
speeds is the availability of reliable wind speed data from past hurricanes for use in model
development. The HRD H*Wind program partly alleviates these concerns by processing data from a multitude of sources, including low-flying aircraft, ships, buoys, airport observations and other public and private data sources. In addition, the use of dropwindsondes improves the overall data quality over the sea. Nevertheless, boundary layer models are used to convert the collected data to a common framework for height of recording (10 m), exposure (open terrain) and averaging period (maximum sustained 1-minute wind speed),
and interpolation algorithms (SAFER) are used to estimate wind speeds over a grid of
points. Therefore, the wind fields developed by HRD are not entirely empirical, but rather
involve the use of additional algorithms which can have an impact on the data quality.
The current study uses hurricane recordings from only two hurricanes for quantifying
the hurricane wind-speed uncertainties and spatial correlations. The Batts et al. [1980]
wind-speed model was chosen primarily for its simplicity, but it has known limitations.
For instance, the model does not consider the reduction in wind speeds attributable to land
friction, but rather assumes a constant 15% reduction in the wind speed when the hurri-
cane enters the land from the sea. This is likely to result in correlated prediction errors
at neighboring sites with similar levels of land friction, which will increase the estimated
value of spatial correlation. In the future, the uncertainties and the spatial correlations need to be estimated using data from additional hurricanes and a newer, more rigorous wind-speed model such as that of Vickery et al. [2008].
In this study, a deterministic damage function obtained from HAZUS [2006] was used
for the illustrative hurricane risk assessment. A probabilistic damage function that captures
the uncertainties in the losses during a hurricane should be used in future works. This
will give a better estimate of the importance of considering wind-field uncertainties and
spatial correlation in hurricane risk assessments. The illustrative risk assessment carried
out in this work estimated the risk of a portfolio of buildings. Further research is required
to estimate the hurricane-based risk of lifelines such as transportation networks. The risk
assessment did not involve simulation of hurricane tracks, but rather only estimated the risk given that Hurricane Jeanne had occurred. The current research only considered the wind
hazard during the hurricanes, and did not consider the flood and storm surge hazards.
9.6 Conclusions
An exploratory study was carried out to investigate the extension of the seismic hazard
and risk assessment concepts and techniques discussed in the earlier chapters to hurricane
hazard and risk modeling. The study focused on quantifying the uncertainties and the spa-
tial correlation in hurricane wind fields (using techniques that were used to quantify these
parameters in earthquake ground motion fields), and evaluating their impact on the hurri-
cane risk of spatially-distributed systems. Hurricane wind-speed predictions were obtained
for two sample hurricanes, Hurricane Jeanne and Hurricane Frances, using the Batts et al.
[1980] wind-speed model, and the uncertainties in these predictions were evaluated us-
ing actual wind-speed recordings. The wind-speed residuals had a standard deviation of
approximately 0.15, indicating that the uncertainties are not negligible. The wind-speed
uncertainties at two sites were seen to be correlated, with the correlation decaying as a
Gaussian function of the separation between the sites.
Finally, the impact of the wind-speed uncertainties and the spatial correlation on the
hurricane risk of a spatially-distributed system was illustrated by a sample risk assessment
of a hypothetical portfolio of buildings. It was seen that ignoring the uncertainties or the
correlations results in an overestimation of the probability of exceedance of small losses
and an underestimation of the probability of exceedance of large losses.
Chapter 10
Conclusions
This study focused on developing a computationally-efficient framework for the seismic
risk assessment of lifelines (infrastructure systems). Two important challenges in the seis-
mic risk assessment of a lifeline as compared to that of a single structure are the quantifi-
cation of the ground-motion hazard over a region rather than at just a single site and the
minimization of the computational burden associated with lifeline performance evaluations
(Figure 10.1). Contributions have been made in both of these areas. The following sub-
sections briefly summarize the important findings of this work, the limitations of this work
and suggested future work related to this thesis.
10.1 Contributions and practical implications
10.1.1 Joint distribution of spectral acceleration values at different sites and/or different periods
Risk assessment of spatially-distributed building portfolios or infrastructure systems re-
quires assumptions regarding the joint distribution of the ground-motion intensity measures
at multiple sites during the same earthquake. Chapter 2 of this thesis discussed statistical
tests that were used to examine the commonly-used assumptions of univariate normality of
logarithmic spectral acceleration values and multivariate normality of vectors of logarith-
mic spectral acceleration values computed at different sites and/or different periods. Joint
CHAPTER 10. CONCLUSIONS 207
Figure 10.1: Comparison of the risk assessment frameworks for (a) single structures and (b) lifelines.
normality of logarithmic spectral accelerations was verified by testing the multivariate nor-
mality of inter-event and intra-event residuals. Univariate normality of inter-event and
intra-event residuals was studied using normal Q-Q plots. The normal Q-Q plots showed
strong linearity, indicating that the residuals are well represented by a normal distribution
marginally. No evidence was found to support truncation of the marginal distribution of
intra-event residuals as is sometimes done in PSHA.
Using the Henze-Zirkler test, Mardia’s test of skewness and Mardia’s test of kurtosis, it was shown that the inter-event and intra-event residuals at a site, computed at
different periods, follow multivariate normal distributions. The normality test of Goovaerts
was used to illustrate that pairs of spatially-distributed intra-event residuals can be rep-
resented by the bivariate normal distribution. For a set of observed spatially-distributed
data, it is practically impossible to ascertain the trivariate normality and the normality at
higher dimensions and hence, the presence of univariate and bivariate normalities was con-
sidered to indicate multivariate normality of the spatially-distributed intra-event residuals
[Goovaerts, 1997].
10.1.2 Spatial correlation model for spectral accelerations
The ground-motion models that are used for site-specific hazard analysis do not provide in-
formation on the spatial correlation between ground-motion intensities, which is required
for the joint prediction of intensities at multiple sites. Chapter 3 described a spatial cor-
relation model that has been developed from recorded ground-motion time histories using
geostatistical tools. The correlation decreases with increasing separation between the sites,
and this correlation structure can be modeled using semivariograms. A semivariogram is
a measure of the average dissimilarity between the data, whose functional form, sill and
range uniquely identify the ground-motion correlation as a function of separation distance.
Ground motions observed during the Northridge, Chi-Chi, Big Bear City, Parkfield, Alum
Rock, Anza and Chino Hills earthquakes were used to compute the correlations between
spatially-distributed spectral accelerations, at various spectral periods. It was seen that the
rate of decay of the correlation with separation typically decreases with increasing spec-
tral period. It was reasoned that this could be because long period ground motions at two
different sites tend to be more coherent than short period ground motions, on account of
less wave scattering during propagation. It was also observed that, at periods longer than
2 seconds, the estimated correlations were similar for all the earthquake ground motions
considered. At shorter periods, however, the correlations were found to be related to the
site Vs30 values. It was shown that the clustering of site Vs30’s suggests larger correla-
tions between residuals. The work also investigated the commonly-used assumption of
isotropy, and it was seen using the empirical data that the correlations estimated from the Chi-Chi and Northridge earthquake intensities are isotropic. Based on these findings, a predictive
model was developed that can be used to select appropriate correlation estimates for use in
risk assessment of spatially-distributed building portfolios or infrastructure systems.
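For a second-order stationary field, whose sill equals its variance, the semivariogram determines the correlation at separation h through ρ(h) = 1 − γ(h)/sill. The sketch below assumes the exponential functional form with the "practical range" convention γ(h) = sill·[1 − exp(−3h/b)] [Goovaerts, 1997], so that the correlation has fallen to about 0.05 at h = b. The range values are hypothetical placeholders, not the fitted values from Chapter 3; a longer range simply mimics the slower decay reported at longer periods.

```python
import numpy as np


def exponential_semivariogram(h, sill=1.0, practical_range=20.0):
    """gamma(h) = sill * (1 - exp(-3 h / practical_range)); h and range in km."""
    h = np.asarray(h, dtype=float)
    return sill * (1.0 - np.exp(-3.0 * h / practical_range))


def correlation_from_semivariogram(h, sill=1.0, practical_range=20.0):
    """rho(h) = 1 - gamma(h) / sill, valid for a second-order stationary field."""
    return 1.0 - exponential_semivariogram(h, sill, practical_range) / sill


separations_km = np.array([0.0, 5.0, 10.0, 20.0])
# Hypothetical ranges: a short one standing in for a short period,
# a longer one for a long period (slower decay of correlation with distance).
rho_short = correlation_from_semivariogram(separations_km, practical_range=10.0)
rho_long = correlation_from_semivariogram(separations_km, practical_range=40.0)
```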
Chapter 4 described additional tests that were carried out using simulated ground-
motion time histories [Aagaard et al., 2008] to verify the validity of commonly-used as-
sumptions in spatial correlation models such as stationarity (invariance of correlation with
spatial location) and isotropy (directional independence). The correlations were estimated
using different orientations of the time histories, namely, fault normal, fault parallel, north-
south and east-west, and were found to be similar in all four cases. The assumption of
isotropy of spatial correlations was studied using directional semivariograms, and was
found to be reasonable. The correlations were seen to be smaller than average between
sites located extremely close to the fault rupture. Intuitively, it is reasonable to expect path
effects and small-scale variations to reduce spatial correlation between ground motions at
near-fault sites. The pulse-identification algorithm of Baker [2007a] was used for identify-
ing pulse-like ground motions, and the correlations between pulse-like and non-pulse-like
ground motions were compared. For the data set used, no significant differences were
found between the correlations in these two cases.
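The directional-semivariogram check for isotropy can be sketched with the classical (Matheron) estimator γ̂(h) = Σ(z_i − z_j)² / (2N(h)), taken over the N(h) site pairs whose separation falls in a distance bin. This is a minimal illustration, assuming planar site coordinates in km (x east, y north); the function name, the binning and the 22.5° angular tolerance are illustrative choices, not the settings used in Chapter 4. Comparing curves computed for different azimuths is one way to judge whether the correlation is direction-independent.

```python
import numpy as np


def empirical_semivariogram(coords, z, bin_edges, azimuth=None, tol_deg=22.5):
    """Classical estimator gamma(h) = sum (z_i - z_j)^2 / (2 N(h)) per distance bin.

    coords: (n, 2) site coordinates in km (x = east, y = north); z: (n,) residuals.
    If azimuth (degrees clockwise from north) is given, only pairs whose separation
    vector lies within +/- tol_deg of that direction (either sense) are used.
    Returns (gamma, pair_counts); empty bins give NaN.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)            # every unordered pair once
    d = coords[j] - coords[i]
    h = np.hypot(d[:, 0], d[:, 1])
    half_sq = 0.5 * (z[i] - z[j]) ** 2
    keep = np.ones_like(h, dtype=bool)
    if azimuth is not None:
        pair_az = np.degrees(np.arctan2(d[:, 0], d[:, 1])) % 180.0
        diff = np.abs(pair_az - azimuth % 180.0)
        diff = np.minimum(diff, 180.0 - diff)      # undirected angular difference
        keep = diff <= tol_deg
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    idx = np.digitize(h, bin_edges) - 1
    for k in range(n_bins):
        in_bin = keep & (idx == k)
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = half_sq[in_bin].mean()
    return gamma, counts
```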
10.1.3 Lifeline seismic risk assessment using efficient sampling and data reduction techniques
Chapter 6 discussed an efficient Monte Carlo simulation (MCS) framework, based
on importance sampling and K-means clustering, that has been proposed for the seismic
risk assessment of lifelines. The framework can be used for developing a small, but
stochastically-representative, catalog of ground-motion intensity maps that can be used
for performing lifeline risk assessments. The importance sampling technique was used to
preferentially sample important ground-motion intensity maps, and the K-means clustering
technique was used to identify and combine redundant maps. It was shown theoretically
and empirically that the risk estimates obtained using these techniques are unbiased.
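The data-reduction step can be illustrated with a plain K-means (Lloyd's algorithm) pass over a weighted catalog of maps. This is a sketch under stated assumptions, not the Chapter 6 implementation. Each cluster is represented by a single member map drawn with probability proportional to its weight, and that map is assigned the cluster's total weight. With this selection rule, the expected weighted count of maps exceeding any loss level is unchanged for a given clustering, which is the property that matters for unbiased exceedance rates. The function names are hypothetical.

```python
import numpy as np


def kmeans_labels(X, k, n_iter=100, rng=None):
    """Lloyd's algorithm; returns a cluster label for each row of X."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every map to every cluster center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def reduce_catalog(maps, weights, k, rng=None):
    """Return indices of retained maps and their new weights.

    maps: (n_maps, n_sites) array of intensity maps (e.g. ln Sa at each site).
    weights: (n_maps,) map weights (e.g. annual rate times importance weight).
    One map is kept per cluster, drawn with probability proportional to its
    weight, and it carries the total weight of its cluster.
    """
    rng = np.random.default_rng(rng)
    weights = np.asarray(weights, dtype=float)
    labels = kmeans_labels(maps, k, rng=rng)
    kept, new_weights = [], []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        w = weights[members]
        kept.append(rng.choice(members, p=w / w.sum()))
        new_weights.append(w.sum())
    return np.array(kept), np.array(new_weights)
```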
The proposed framework was used to evaluate the exceedance rates of various travel-
time delays on an aggregated form of the San Francisco Bay Area transportation network.
Simplified transportation network analysis models were used to illustrate the feasibility of
the proposed framework. The exceedance rates were obtained using a catalog of 150 maps
generated using the combination of importance sampling and K-means clustering, and were
shown to be in good agreement with those obtained using the conventional MCS method.
Therefore, the proposed techniques can potentially reduce the computational expense of an
MCS-based risk assessment by several orders of magnitude, making it practically feasible.
The study also showed that the proposed framework automatically produces intensity maps
that are hazard consistent. Finally, the study showed that the uncertainties in ground-motion
intensities and the spatial correlations between ground-motion intensities at multiple sites
need to be modeled in order to avoid introducing errors into the lifeline risk calculations.
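The effect can be seen in a deliberately crude example. Treat the average normalized intra-event residual over a group of sites as a proxy for region-wide shaking, and compare the probability that it exceeds 1.0 when the residuals are correlated with the probability when they are assumed independent. The grid, range and threshold are hypothetical, and the exponential correlation model is an assumed form. The point is only that the average of correlated residuals has a much larger variance than the average of independent ones, so ignoring correlation understates the chance of simultaneously large intensities.

```python
import numpy as np
from math import erfc, sqrt


def exponential_corr_matrix(coords, practical_range):
    """Correlation matrix rho_ij = exp(-3 d_ij / practical_range)."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return np.exp(-3.0 * d / practical_range)


def sample_residuals(corr, n_maps, rng=None):
    """Draw normalized intra-event residual maps eps ~ N(0, corr) via Cholesky."""
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(corr)
    return rng.standard_normal((n_maps, corr.shape[0])) @ L.T


def prob_mean_exceeds(corr, threshold):
    """Exact P(site-averaged residual > threshold) for eps ~ N(0, corr)."""
    n = corr.shape[0]
    sd = sqrt(corr.sum()) / n          # Var(mean) = 1' corr 1 / n^2
    return 0.5 * erfc(threshold / (sd * sqrt(2.0)))


# Hypothetical 5 x 5 grid of sites at 2 km spacing and a 20 km range.
xs, ys = np.meshgrid(np.arange(5) * 2.0, np.arange(5) * 2.0)
sites = np.column_stack([xs.ravel(), ys.ravel()])
corr = exponential_corr_matrix(sites, practical_range=20.0)
p_correlated = prob_mean_exceeds(corr, threshold=1.0)
p_independent = prob_mean_exceeds(np.eye(len(sites)), threshold=1.0)
```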
Appendix C described lifeline loss deaggregation calculations that were used to identify
the ground-motion scenarios most likely to produce exceedance of a given loss threshold
for a spatially-distributed lifeline system. Deaggregation calculations were performed to
identify the likelihoods of earthquake events that cause various levels of travel time delays
(the lifeline loss measure) in an aggregated form of the San Francisco Bay Area transportation
network. The deaggregation calculations indicated that the ‘most-likely’ scenario depends
on the loss level of interest, and is influenced by factors such as the seismicity of
the region, the location of the lifeline with respect to the faults and the performance state
of the various components of the lifeline under normal operating conditions. The calcu-
lations also showed that large losses are typically caused by moderately large magnitude
events with large values of inter-event and intra-event residuals, indicating the importance
of accounting for the residuals in the loss assessment framework. Loss assessments carried
out without accounting for either the inter-event or the intra-event residuals produce biased
loss estimates.
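Given a weighted catalog of simulated maps in which each map is tagged with its source scenario, the deaggregation reduces to Bayes' rule: P(scenario e | L > x) = Σ w_j over maps j from scenario e with L_j > x, divided by Σ w_j over all maps with L_j > x, where w_j is the annual rate (or importance-sampling-adjusted rate) carried by map j. The sketch below assumes exactly that representation; the scenario labels and numbers in the usage line are made up.

```python
import numpy as np


def deaggregate(scenario_ids, weights, losses, threshold):
    """Return {scenario: P(scenario | loss > threshold)} from a weighted map catalog."""
    scenario_ids = np.asarray(scenario_ids)
    weights = np.asarray(weights, dtype=float)
    exceed = np.asarray(losses, dtype=float) > threshold
    total_rate = weights[exceed].sum()        # rate of exceeding the threshold
    if total_rate == 0.0:
        return {}
    return {
        s: weights[exceed & (scenario_ids == s)].sum() / total_rate
        for s in np.unique(scenario_ids[exceed])
    }


# Hypothetical catalog: two scenarios, losses expressed as travel-time delays.
shares = deaggregate(["A", "A", "B"], [0.1, 0.2, 0.3], [5.0, 50.0, 60.0], threshold=10.0)
```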
10.1.4 Lifeline performance assessment using statistical learning techniques
MCS and its variants are well suited for characterizing ground motions and computing re-
sulting losses to lifelines, but are highly computationally intensive because they involve
repeated evaluations of lifeline performance under a large number of simulated ground-
motion intensity maps. Chapter 7 explored the use of a statistical learning technique termed
Multivariate Adaptive Regression Trees (MART) to obtain an approximate relationship be-
tween the ground-motion intensities at lifeline component locations and the lifeline per-
formance. The lifeline performance predicted by this relationship can be used in place of
the actual lifeline performance (the evaluation of which is computationally intensive) to expedite the
computation of several lifeline risk-related parameters. The study illustrated this approach by
developing a MART-based relationship between the ground-motion intensities at bridge lo-
cations and the network travel times in the San Francisco Bay Area transportation network,
and using it for estimating confidence intervals for the risk estimates presented in Chapter
6. It was seen that the confidence intervals obtained using the actual and the approxi-
mate performance measures match well. More generally, these approximate performance
relationships can be used in other problems such as prioritizing lifeline retrofits, whose
computational demand stems from the need for repeated performance evaluations.
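To illustrate the idea of a fitted surrogate for lifeline performance, the sketch below implements least-squares gradient boosting with depth-1 regression trees (stumps) in plain NumPy. This is an illustrative choice of tree-based regression and may differ in detail from the MART procedure of Chapter 7: the tree depth, number of rounds and learning rate are hypothetical, and the bridge intensities and travel-time response are synthetic.

```python
import numpy as np


def fit_stump(X, r):
    """Least-squares regression stump (one split on one feature) for residuals r."""
    n = len(r)
    best = (np.inf, 0, np.inf, r.mean(), r.mean())   # fallback: constant prediction
    total_sq = float((r ** 2).sum())
    for f in range(X.shape[1]):
        order = np.argsort(X[:, f])
        xs, rs = X[order, f], r[order]
        csum = np.cumsum(rs)
        for i in range(1, n):
            if xs[i] == xs[i - 1]:
                continue                               # cannot split tied values
            left, right = csum[i - 1] / i, (csum[-1] - csum[i - 1]) / (n - i)
            sse = total_sq - i * left ** 2 - (n - i) * right ** 2
            if sse < best[0]:
                best = (sse, f, 0.5 * (xs[i - 1] + xs[i]), left, right)
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=200, learning_rate=0.1):
    """Least-squares gradient boosting: each stump is fit to the current residuals."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    f0 = y.mean()
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        feat, thr, left, right = fit_stump(X, y - pred)
        pred += learning_rate * np.where(X[:, feat] <= thr, left, right)
        stumps.append((feat, thr, left, right))
    return f0, learning_rate, stumps


def predict(model, X):
    """Evaluate the boosted model at new intensity vectors X."""
    f0, learning_rate, stumps = model
    X = np.asarray(X, float)
    out = np.full(X.shape[0], f0)
    for feat, thr, left, right in stumps:
        out += learning_rate * np.where(X[:, feat] <= thr, left, right)
    return out


# Synthetic stand-in: Sa at three hypothetical bridges -> a made-up travel-time response.
rng = np.random.default_rng(0)
sa = rng.uniform(0.0, 1.0, size=(200, 3))
travel_time = 2.0 * (sa[:, 0] > 0.5) + sa[:, 1]
model = fit_boosted_stumps(sa, travel_time)
```

Once fitted, `predict` is cheap enough to be called for every simulated map in place of a full network analysis, which is where the computational savings would come from.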
10.1.5 Seismic risk assessment of spatially-distributed systems using ground-motion models fitted considering spatial correlation
Even though the risk estimates were obtained in Chapter 6 considering spatial correla-
tion, the ground-motion models that were used to predict the distribution of ground-motion
intensities were fitted assuming independence between the intra-event residuals. Chapter
8 illustrated the impact of considering spatial correlation between intra-event residuals
while developing ground-motion models. The mixed-effects algorithm of Abrahamson
and Youngs [1992], which assumes independence between intra-event residuals, was mod-
ified to account for the spatial correlation between the intra-event residuals. This was done
by changing the likelihood function used for estimating the inter-event and the intra-event
residual variances given other model coefficients and changing the estimate of the inter-
event residual given the total residuals at multiple sites. The modified algorithm was used
to refit the Campbell and Bozorgnia [2008] ground-motion model, to illustrate the effect of
this refinement. The variance of the total residuals and the model coefficients used for pre-
dicting the median ground-motion intensity were not significantly affected by the proposed
refinement. Significant changes, however, were seen in the variance of the intra-event and
the inter-event residuals. Incorporating spatial correlation was seen to increase the intra-
event residual variance and to decrease the inter-event residual variance. These changes
have implications for risk assessments of spatially-distributed systems because a smaller
inter-event residual variance implies a lower likelihood of simultaneously observing larger-than-median
ground-motion intensities at all sites in a region. To demonstrate this effect, a
risk assessment was performed for a hypothetical portfolio of buildings using the ground-
motion models obtained with and without accounting for spatial correlation. The results
showed that using the published inter- and intra-event variance estimates causes an over-
estimation of the exceedance rates of large losses. Ground-motion hazard and seismic risk
calculations at individual locations are unaffected by this issue.
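The modification can be made concrete for a single earthquake. If r is the vector of total residuals at n sites, with an inter-event residual τη common to all sites and intra-event residuals σε with ε ~ N(0, R), then r ~ N(0, C) with C = τ²·11ᵀ + σ²R. The sketch below evaluates that Gaussian log-likelihood and the conditional-mean estimate E[τη | r] = τ²·1ᵀC⁻¹r. With R = I, the latter reduces to the familiar τ²Σr_i/(σ² + nτ²). The sketch assumes that σ and τ are constant within an event and that R is known; it shows the ingredients and is not the Chapter 8 algorithm itself.

```python
import numpy as np


def event_covariance(tau, sigma, corr):
    """C = tau^2 * 1 1' + sigma^2 * R for the total residuals of one event."""
    n = corr.shape[0]
    return tau ** 2 * np.ones((n, n)) + sigma ** 2 * corr


def event_loglik(r, tau, sigma, corr):
    """Gaussian log-likelihood of one event's total residuals r."""
    C = event_covariance(tau, sigma, corr)
    _, logdet = np.linalg.slogdet(C)
    quad = r @ np.linalg.solve(C, r)
    return -0.5 * (len(r) * np.log(2.0 * np.pi) + logdet + quad)


def inter_event_estimate(r, tau, sigma, corr):
    """Conditional mean E[tau * eta | r] = tau^2 * 1' C^{-1} r."""
    C = event_covariance(tau, sigma, corr)
    return tau ** 2 * np.ones(len(r)) @ np.linalg.solve(C, r)


# Four hypothetical sites along a line (km) with exponentially correlated intra-event residuals.
r = np.array([0.3, -0.1, 0.5, 0.2])
x = np.array([0.0, 5.0, 10.0, 30.0])
R = np.exp(-3.0 * np.abs(x[:, None] - x[None, :]) / 20.0)
print(event_loglik(r, tau=0.3, sigma=0.5, corr=R))
print(inter_event_estimate(r, tau=0.3, sigma=0.5, corr=R))
```

Summing `event_loglik` over events and maximizing over τ and σ with a generic optimizer would give variance estimates under a fixed correlation model.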
10.1.6 Extension of proposed ground-motion modeling approaches to hurricane risk assessment
Frameworks for the risk assessment of structures and infrastructure systems under a variety
of natural and man-made hazards share many similarities. It is conceivable, therefore, that
the techniques developed for the risk assessment under one type of natural or man-made
hazard will be applicable for the risk assessment under another hazard or multi-hazard sce-
nario. Chapter 9 described an exploratory study carried out to investigate the extension
of the seismic hazard and risk assessment concepts and techniques discussed in the earlier
chapters to hurricane hazard and risk modeling. The study focused on quantifying the un-
certainties and the spatial correlation in hurricane wind fields (using techniques that were
used to quantify these parameters in earthquake ground motion fields), and evaluating their
impact on the hurricane risk of spatially-distributed systems. Hurricane wind-speed predic-
tions were obtained for two sample hurricanes, Hurricane Jeanne and Hurricane Frances,
using the Batts et al. [1980] wind-speed model, and the uncertainties in these predictions
were evaluated using actual wind-speed recordings. The wind-speed residuals had a stan-
dard deviation of approximately 0.15, indicating that the uncertainties are not negligible.
The wind-speed uncertainties at two sites were seen to be correlated, with the correlation
decaying as a Gaussian function of the separation between the sites.
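A Gaussian-decay correlation model, and its use for simulating correlated wind-speed residuals, can be sketched as follows. The practical-range form ρ(h) = exp[−3(h/b)²] follows the Gaussian semivariogram convention of Goovaerts [1997]. The 0.15 standard deviation echoes the value quoted above, while the range and site layout are hypothetical and are not the Chapter 9 estimates. The residuals are simply treated as a zero-mean Gaussian field.

```python
import numpy as np


def gaussian_corr(h, practical_range):
    """Gaussian correlation model rho(h) = exp(-3 (h / range)^2)."""
    return np.exp(-3.0 * (np.asarray(h, dtype=float) / practical_range) ** 2)


def simulate_residual_fields(coords, sd, practical_range, n_sims, jitter=1e-6, rng=None):
    """Zero-mean Gaussian residual fields with Gaussian spatial correlation.

    A small diagonal jitter is added because Gaussian correlation matrices are
    often numerically close to singular; it very slightly reduces the correlations.
    """
    rng = np.random.default_rng(rng)
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    cov = sd ** 2 * (gaussian_corr(d, practical_range) + jitter * np.eye(len(coords)))
    L = np.linalg.cholesky(cov)
    return rng.standard_normal((n_sims, len(coords))) @ L.T
```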
Finally, the impact of the wind-speed uncertainties and the spatial correlation on the
hurricane risk of a spatially-distributed system was illustrated by a sample risk assessment
of a hypothetical portfolio of buildings. It was seen that ignoring the uncertainties or the
correlations results in an overestimation of the probability of exceedance of small losses
and an underestimation of the probability of exceedance of large losses.
10.2 Limitations and future work
10.2.1 Spatial correlation model for spectral accelerations
Chapter 3 presented a spatial correlation model for spectral accelerations developed using
recorded ground motions. The model was developed assuming stationarity (location inde-
pendence) of correlations. Tests for stationarity carried out in Chapter 4 using simulated
time histories showed that the correlation between spectral accelerations at two sites that
are close to the rupture (within 10 km) is smaller than the correlation between spectral accelerations
at sites that are farther away from the rupture. This is probably because path
effects and small-scale variations near the rupture reduce the spatial correlation between
ground-motion intensities at near-fault sites. This implies that the assumption of stationar-
ity does not completely hold. In the future, it is important to verify this observation using
recorded ground-motion time histories, and develop a correlation model for near-fault sites,
if required. The tests for isotropy indicated that the assumption of isotropy is reasonable
on average. It is possible, however, that the correlations are stronger along certain directions
in certain locations even if not on average. For instance, it might be reasonable to expect
strong correlation between residuals in the direction of propagation of waves, particularly
at near-fault sites [e.g., Walling, 2009]. This needs to be investigated in more detail in the
future.
While developing the correlation models, priority was placed on building models that fit
the empirical data well at short separation distances, even if this required some misfit with
the empirical data at large separation distances. It is more important to model the semivariogram
structure well at short separation distances because widely separated sites have
little impact on each other, owing to an effective ‘screening’ of their influence by more closely
located sites [Goovaerts, 1997]. In special cases where there are very few closely spaced
points (fewer than 10, according to Goovaerts [1997]), the influence of farther-away points
will not be completely screened, and the correlation model developed in this
study might then provide slightly inaccurate correlation estimates. This is mitigated, however,
by the fact that large separation distances are associated with low
correlations, which have relatively little effect on the joint distribution of ground-motion
intensities.
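The screening effect can be seen in simple kriging, whose weights solve Cλ = c₀, where C holds the covariances among the data sites and c₀ the covariances between the data sites and the target. In the sketch below (collinear sites, unit-sill exponential covariance with a hypothetical 20 km range), a site 3 km from the target receives essentially zero weight once a site 1 km away lies between them, but a substantial weight when it is the only datum. In one dimension the exponential model makes this screening exact; in two dimensions it is only approximate, which is consistent with the caveat above about sparse nearby data.

```python
import numpy as np


def simple_kriging_weights(data_x, target_x, practical_range):
    """Simple-kriging weights lambda = C^{-1} c0 for collinear sites (km)."""
    data_x = np.asarray(data_x, dtype=float)

    def cov(h):
        # Unit-sill exponential covariance.
        return np.exp(-3.0 * np.abs(h) / practical_range)

    C = cov(data_x[:, None] - data_x[None, :])     # data-to-data covariances
    c0 = cov(data_x - target_x)                    # data-to-target covariances
    return np.linalg.solve(C, c0)


w_screened = simple_kriging_weights([1.0, 3.0], target_x=0.0, practical_range=20.0)
w_alone = simple_kriging_weights([3.0], target_x=0.0, practical_range=20.0)
```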
The correlation studies carried out in Chapters 3 and 4 treated the residuals as random
variables. In reality, though, the residuals are related to physical effects that are not measured, or that are not (completely or partially) accounted for,
such as directivity, basin effects and local site
effects. If these physical effects are directly modeled in the risk assessments (as part of
the mean ground-motion intensity prediction), the models for spatial variability and spatial
correlation should be modified accordingly.
The study in Chapter 3 identified an empirical link between the extent of spatial cor-
relation and the local-site conditions. The link between spatial correlation and other site-
and earthquake-related parameters, such as magnitude and faulting mechanism, was not investigated
because of the limited availability of well-processed recorded ground-motion
data sets. In the future, these links can possibly be investigated using simulated ground
motions.
Chapters 3 and 4 focused on simultaneously estimating the spatial correlations between
spectral accelerations at the same period. There are scenarios where spectral accelerations
at multiple periods (or in general, multiple intensity measures) need to be used for assessing
the lifeline risk, in which case the consideration of spatial cross-correlations (correlations
between two different intensity measures) becomes important. This scenario arises, for
instance, when the risk assessment is carried out for a portfolio comprising structures
with different fundamental periods, because the risk assessment is then based on spectral
accelerations at multiple periods. Damage to tall buildings, for example, is better predicted
using spectral accelerations at a long period (say, $T_1$), whereas damage to short buildings
due to ground shaking is more closely correlated with spectral accelerations at a short
period (say, $T_2$). If a tall building is located at site $i$ and a short building at site $j$, the spatial
cross-correlation between $\varepsilon_i(T_1)$ and $\varepsilon_j(T_2)$ should be considered in order to account
for the likelihood of observing jointly large spectral accelerations at sites $i$ and $j$ (in the same way that spatial correlations should be considered when the
spectral accelerations at a single period are used at both sites). Appendix A provided a brief
summary of the technical framework behind the estimation of spatial cross-correlation. In
the future, it is important to develop a cross-correlation model between spectral accelera-
tions at different periods using recorded and simulated time histories.
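For two residual fields z₁ (e.g., ε(T₁)) and z₂ (e.g., ε(T₂)) observed at the same sites, the standard geostatistical estimator of the cross-semivariogram is γ̂₁₂(h) = Σ(z₁,ᵢ − z₁,ⱼ)(z₂,ᵢ − z₂,ⱼ) / (2N(h)) [Goovaerts, 1997]. Assuming unit-variance stationary residuals with a symmetric cross-covariance, the cross-correlation then follows as ρ₁₂(h) = ρ₁₂(0) − γ₁₂(h). The sketch below is illustrative only; it is not the estimation procedure of Appendix A, and the binning is a hypothetical choice.

```python
import numpy as np


def empirical_cross_semivariogram(coords, z1, z2, bin_edges):
    """gamma_12(h) = sum over pairs of (z1_i - z1_j)(z2_i - z2_j) / (2 N(h)).

    coords: (n, 2) site coordinates in km; z1, z2: residuals of two intensity
    measures at the same sites. Empty distance bins give NaN.
    """
    coords = np.asarray(coords, dtype=float)
    z1, z2 = np.asarray(z1, dtype=float), np.asarray(z2, dtype=float)
    i, j = np.triu_indices(len(z1), k=1)           # every unordered site pair once
    h = np.linalg.norm(coords[j] - coords[i], axis=1)
    half_prod = 0.5 * (z1[i] - z1[j]) * (z2[i] - z2[j])
    idx = np.digitize(h, bin_edges) - 1
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        in_bin = idx == k
        if in_bin.any():
            gamma[k] = half_prod[in_bin].mean()
    return gamma
```

With z₁ = z₂ the estimator reduces to the ordinary semivariogram, which is a convenient consistency check.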
Chapter 8 described a ground-motion model fitting algorithm that can be used for devel-
oping ground-motion models considering the spatial correlation. This algorithm estimates
the inter-event and the intra-event residual variances using a maximum likelihood frame-
work. Future work can focus on estimating spatial correlation and cross-correlation in
addition to these variances while fitting ground-motion models, by extending the current
maximum likelihood framework to consider the correlation as an unknown parameter as
well.
10.2.2 Lifeline risk assessment
The MCS-based lifeline seismic risk assessment framework proposed in this study was il-
lustrated by assessing the seismic risk of an aggregated model of the San Francisco Bay
Area transportation network. An aggregated network was used because analyzing the per-
formance of a network as large and complex as the San Francisco Bay Area transportation
network under a large number of scenarios (which would be required to implement the
benchmark conventional MCS framework) is extremely computationally intensive. Net-
work aggregation has been used in several fields of research while assessing the perfor-
mance of large complex networks such as social networks, internet and transportation net-
works. The performance of an aggregated higher-scale network is then used for decision
making on the actual lower-scale network. Future work by the author will further explore
the opportunity to develop new methods to systematically aggregate networks, particularly
for risk assessment purposes, for obtaining significant computational savings.
The importance sampling (IS) technique proposed in Chapter 6 involves sampling large
values of inter-event and intra-event residuals in order to capture the upper tail of the life-
line loss curve accurately. The technique requires determination of the means (referred
to as mean-shifts in Section 6.3.3) of the inter- and intra-event residual sampling distribu-
tions. Large values of the means will result in realizations of large values of inter- and
intra-event residuals (i.e., realizations from the upper tail). It is, however, important to
choose a reasonable value for the means to ensure adequate preferential sampling of large
residuals, while avoiding sets of extremely large residuals that will make the simulated
residuals so improbable as to be irrelevant. Chapter 6 also provided guidance on choosing
the sampling means based on the network in consideration. Using larger or smaller than
optimal means could increase the variance of the risk estimates, and additional research is
required to investigate this effect in detail. Note, however, that the K-means data reduction
algorithm (applied after the importance sampling step as described in Section 6.5) elimi-
nates redundant importance sampling maps and therefore, has the potential to retain a good
proportion of maps that are relevant to the risk assessment even if IS samples inefficiently.
This, however, needs to be verified in detail through additional research.
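The trade-off in choosing the mean shift can be illustrated in one dimension. To estimate P(η > t) for a standard normal η, samples are drawn from N(m, 1) and weighted by the likelihood ratio φ(x)/φ(x − m) = exp(−mx + m²/2), which keeps the estimate unbiased for any m. A shift near the threshold reduces the standard error sharply, while a shift far beyond it inflates the variance again, mirroring the concern raised above. The threshold and shift values are hypothetical, and the actual sampling in Chapter 6 is multivariate.

```python
import numpy as np
from math import erfc, sqrt


def is_tail_probability(threshold, shift, n, rng=None):
    """Importance-sampling estimate of P(eta > threshold), eta ~ N(0, 1).

    Samples come from N(shift, 1); each is weighted by phi(x) / phi(x - shift).
    Returns (estimate, standard error).
    """
    rng = np.random.default_rng(rng)
    x = rng.normal(shift, 1.0, n)
    weights = np.exp(-shift * x + 0.5 * shift ** 2)
    values = weights * (x > threshold)
    return values.mean(), values.std(ddof=1) / sqrt(n)


exact = 0.5 * erfc(3.0 / sqrt(2.0))          # P(eta > 3) for a standard normal
for m in (0.0, 3.0, 6.0):                     # no shift, moderate shift, excessive shift
    est, se = is_tail_probability(3.0, m, 20000, rng=0)
    print(f"shift={m}: estimate={est:.2e}, standard error={se:.1e}, exact={exact:.2e}")
```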
The current study did not consider the dependence between component performances,
which may arise between two components constructed by the same contractor (similar
workmanship and material quality) and/or subjected to similar material degradation
due to natural environmental fluctuations over time [Lee and Kiremidjian, 2007]. The
current study also did not consider the deterioration of the structural performance of lifeline
components with time.
The risk assessment framework is reasonably general, and can potentially be used to
estimate the risk of a variety of lifelines. The current work, however, only considered a
single isolated infrastructure system (a transportation network). There has been significant
research interest of late in the risk assessment of multiple infrastructure systems (such as
power distribution networks and water distribution networks), with consideration of the in-
terdependencies between the different systems. MCS-based risk assessment frameworks
are conducive to modeling interdependencies, but additional research is required to inves-
tigate this further.
This study used a rate-independent earthquake hazard model provided by USGS [2003].
Background seismicity was not considered. Additional research is required to incorporate
rate-dependent models and background seismicity in the ground-motion sampling procedure
described in Chapter 6. Further, future studies by the author will use seismicity models
provided in more recent USGS reports.
The primary objective of the transportation network risk assessment described in Chap-
ter 6 is to illustrate the effectiveness of the proposed efficient risk assessment framework. A
simplified transportation network model was used for evaluating pre-event and post-event
network performances. These simplifications are identified below.
The current work assumed for simplicity that the post-earthquake demands equal the
pre-earthquake demands, even though this is known not to be true [Kiremidjian et al.,
2003]. The changes in network performance after an earthquake were assumed to be due
only to the delay and the rerouting of traffic caused by structural damage to bridges. The
damage states of the bridges were computed considering only the ground shaking, and other
possible damage mechanisms such as liquefaction and landslides were not considered. The
development of the cross-correlation model discussed earlier will allow the consideration of
multiple types of intensity measures that are required for estimating the damage considering
secondary hazards such as liquefaction and landslides.
The bridge fragility curves provided by HAZUS [1999] were used to estimate the prob-
ability of a bridge being in a particular damage state (no damage, slight damage, moderate
damage, extensive damage and collapse) based on the simulated ground-motion intensity
(spectral acceleration at 1 second) at the bridge site. It was assumed that the bridge fragility
functions can be used to analyze even long-span bridges such as the Golden Gate Bridge,
which may not provide realistic results. Ongoing work by the author focuses on devel-
oping a procedure to incorporate the use of ground-motion time histories (instead of only
ground-motion intensities) in the risk assessment framework.
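HAZUS-type fragility functions give the probability of reaching or exceeding each damage state as a lognormal CDF of the intensity, P(DS ≥ ds_k | Sa) = Φ[ln(Sa/θ_k)/β]. The probability of being in a particular state is the difference between consecutive curves. The medians θ_k and the dispersion β below are hypothetical placeholders, not values from the HAZUS tables.

```python
from math import erf, log, sqrt

STATES = ("none", "slight", "moderate", "extensive", "collapse")


def damage_state_probabilities(sa, medians, beta):
    """Probability of each damage state given Sa(1 s) at the bridge site (in g).

    medians: increasing median capacities for slight, moderate, extensive, collapse.
    """
    p_exceed = [0.5 * (1.0 + erf(log(sa / m) / (beta * sqrt(2.0)))) for m in medians]
    p_at_least = [1.0] + p_exceed + [0.0]      # P(DS >= none) = 1; nothing beyond collapse
    return {s: p_at_least[k] - p_at_least[k + 1] for k, s in enumerate(STATES)}


# Hypothetical bridge class: medians in g and a common dispersion beta.
probs = damage_state_probabilities(0.5, medians=(0.5, 0.7, 0.9, 1.2), beta=0.6)
```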
Damaged bridges cause reduced capacity in the link containing the bridge. The study
assumed the reduced capacities corresponding to the five HAZUS damage
states to be 100% (no damage), 75% (slight or moderate damage) and 50% (extensive
damage or collapse) of the undamaged capacity. (Note that the current study did not model the possible
increase in the free-flow travel times on damaged links.) The non-zero capacity
corresponding to the bridge collapse damage state may seem surprising at first glance. This was
based on the argument that there are alternate routes (apart from the freeways and high-
ways considered in the aggregated model used in this study) that provide reduced access to
transportation services in the event of a freeway or a highway closure [Shiraki et al., 2007].
Such redundancies are prevalent in most transportation networks, but the precise impact of
the redundancy on the capacity of the links in the aggregated model should be studied in
more detail in the future.
A network can have several bridges in a single link, and in such cases, the link capacity
is a function of the damage to all the bridges in the link. The current work assumed that the
link capacity reduction equals the average of the capacity reductions attributable to each
bridge in the link. This is a simplification, and further research is needed to handle the
presence of multiple bridges in a link. The post-earthquake network performance was then
computed by solving the user-equilibrium problem using the new set of link capacities,
and a new estimate of the total travel time in the network was obtained. Note
that the current work estimated the performance of the network only immediately after an
earthquake. The changes in the performance with network component restorations were
not considered here for simplicity.
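The two simplifications above (fixed capacity fractions per damage state, and averaging of bridge-level capacity reductions within a link) can be written down directly. The function name and the capacity units are hypothetical; only the 100%/75%/50% fractions and the averaging rule come from the text.

```python
CAPACITY_FRACTION = {
    "none": 1.00,
    "slight": 0.75, "moderate": 0.75,
    "extensive": 0.50, "collapse": 0.50,
}


def damaged_link_capacity(base_capacity, bridge_states):
    """Link capacity after an earthquake, averaging the bridge capacity reductions."""
    if not bridge_states:
        return base_capacity          # no bridges on the link: capacity unchanged
    reductions = [1.0 - CAPACITY_FRACTION[s] for s in bridge_states]
    return base_capacity * (1.0 - sum(reductions) / len(reductions))


# A link with one undamaged and one collapsed bridge keeps 75% of its capacity.
print(damaged_link_capacity(4000.0, ["none", "collapse"]))   # 3000.0
```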
The framework can also be used with more accurate and rigorous transportation
network models, if desired, but more work is needed to study and overcome the
challenges that may arise.
10.2.3 Risk management
One of the important goals of lifeline risk assessment is risk management, by, for example,
retrofitting lifeline components in order to reduce the adverse impact of the earthquakes on
the lifeline. Prioritizing lifeline retrofit is extremely computationally intensive due to the
numerous components present in a lifeline, and on account of the need to evaluate the per-
formance of each possible retrofit scheme under several possible future earthquake scenar-
ios. The computational demand can be reduced using the efficient MCS-based framework
proposed in this work (Chapter 6), in combination with the use of statistical learning tech-
niques to efficiently model network performance (Chapter 7). This needs to be explored in
the future.
10.2.4 Multi-hazard risk assessment
In order to illustrate the application of the proposed ground-motion modeling techniques
to modeling other hazards, the seismic hazard and risk assessment concepts and techniques
discussed in this thesis were applied to hurricane hazard and risk modeling in Chapter 9.
(Note that hurricanes were the only alternative hazard considered in this work.
It is possible that the challenges that arise in extending the seismic hazard concepts to
modeling other hazards may vary from one hazard to another, and further research is needed
to investigate this.) Many simplifying modeling assumptions were made in this exploratory
study on hurricane risk assessment. The primary simplifying assumptions include the use
of data from only two hurricanes, the use of the simplified Batts et al. [1980] model (which
does not consider the reduction in wind speeds attributable to land friction) and the use of
the deterministic HAZUS [2006] fragility function. Some concerns are also present on the
quality of the wind speed recordings provided by the NOAA Hurricane Research Division
H*Wind program. A more detailed discussion of the limitations and potential future work
connected to the current study can be found in Chapter 9.
10.3 Concluding remarks
The study quantified the distribution of earthquake ground-motion intensities over a re-
gion, which is required for the risk assessment of lifelines. A computationally-efficient
Monte Carlo sampling technique was proposed to evaluate the lifeline seismic risk with
full consideration of the uncertainties and the spatial correlation in ground-motion fields.
Given the effectiveness of the framework when applied to the simplified lifeline model used
here, future research appears warranted to study its use with more realistic lifeline models,
and extend it to quantify the risk of multiple interdependent infrastructure systems under
other hazard and multi-hazard scenarios. Further research is also necessary to utilize the
framework for prioritizing risk-mitigation solutions.
Appendix A
Characterizing spatial cross-correlation between ground-motion spectral accelerations at multiple periods
N. Jayaram and J.W. Baker (2010). Spatial cross-correlation between ground-motion in-
tensities, 9th U.S. National and 10th Canadian Conference on Earthquake Engineering,
Toronto, Canada.
A.1 Abstract
Quantifying ground-motion shaking over a spatially-distributed region rather than at just
a single site is of interest for a variety of applications relating to risk of infrastructure or
portfolios of properties. The risk assessment for a single structure can be easily performed
using the available ground-motion models that predict the distribution of the ground-motion
intensity at a single site due to a given earthquake. These models, however, do not provide
information about the joint distribution of ground-motion intensities over a region, which
is required to quantify the seismic hazard at multiple sites. In particular, the ground-motion
models do not provide information on the correlation between the ground-motion intensi-
ties at different sites during a single event.
Researchers have previously estimated the correlations between residuals of spectral
APPENDIX A. SPATIAL CROSS-CORRELATIONS 222
accelerations at the same spectral period at two different sites. But there is still not much
knowledge about cross-correlations between residuals of spectral accelerations at different
periods (or more generally between residuals of two different intensity measures) at two
different sites, which becomes important, for instance, when assessing the risk of a portfolio
of buildings with different fundamental periods. Spatial cross-correlations are also impor-
tant when assessing the risk due to multiple ground-motion effects such as ground shaking
and liquefaction, because this involves the use of multiple types of intensity measures. This
manuscript summarizes recent research in ground-motion spatial cross-correlation estima-
tion using geostatistical tools. Recorded ground-motion intensities are used to compute
residuals at multiple periods, which are then used to estimate the spatial cross-correlation.
These cross-correlation estimates can then be used in risk assessments of portfolios of struc-
tures with different fundamental periods, and in assessing the seismic risk under multiple
ground-motion effects.
A.2 Introduction
Quantifying ground-motion shaking over a spatially-distributed region rather than at just
a single site is of interest for a variety of applications relating to risk of infrastructure or
portfolios of properties. For instance, knowledge of ground-motion shaking over a
region is important to predict (or estimate after an earthquake) the monetary losses associ-
ated with structures insured by an insurance company, the number of casualties in a certain
area and the probability that lifeline networks for power, water, and transportation may be
interrupted. The risk assessment for a single structure requires only the quantification of
seismic hazard at a single site, which can be easily done using probabilistic seismic hazard
analysis (PSHA). The hazard is typically measured in terms of an intensity measure such as
the spectral acceleration corresponding to the building’s fundamental period (peak response
of a simple single-degree-of-freedom (SDOF) oscillator with the same fundamental period
as the real structure) when the damage to a building is to be estimated. Other ground-motion
parameters such as the peak ground acceleration (PGA) or peak ground velocity (PGV) are
used for other applications such as the prediction of liquefaction of saturated sandy soil
or the response of buried pipelines. The hazard assessment procedure uses ground-motion
models that have been developed to predict the distribution of the ground-motion intensity
at a single site after a given earthquake. These models, however, do not provide informa-
tion on the joint distribution of ground-motion intensities over a region, which is required
to quantify the seismic hazard at multiple sites such as for lifeline risk assessment. In par-
ticular, the ground-motion models do not provide information on the correlation between
the ground-motion intensities at different sites during a single event.
In general, the ground-motion intensities at two sites are expected to be correlated for
a variety of reasons, such as a common source earthquake (whose unique properties may
cause correlations in ground motions at many sites), similar locations relative to fault asperities,
similar wave propagation paths, and similar local-site conditions. Modern ground-motion
models partially account for the correlation via a specific inter-event term as follows:
$$\ln\big(Sa_i(T)\big) = \ln\big(\overline{Sa_i(T)}\big) + \sigma_i(T)\,\varepsilon_i(T) + \tau_i(T)\,\eta_i(T) \qquad \text{(A.1)}$$
where Sai(T ) denotes the spectral acceleration at period T at site i; Sai(T ) denotes the
predicted (by the ground-motion model) median spectral acceleration (which depends on
parameters such as magnitude, distance, period and local-site conditions); εi(T ) denotes the
normalized intra-event residual at site i associated with Sai(T ), η(T ) denotes the normal-
ized inter-event residual associated with Sai(T ). Both εi(T ) and ηi(T ) are random variables
with zero mean and unit standard deviation. The standard deviations, σi(T ) and τi(T ), are
estimated as part of the ground-motion model and are functions of the spectral period (T )
of interest, and in some models also functions of the earthquake magnitude and the distance
of the site from the rupture. The term σi(T )εi(T ) is called the intra-event residual, and the
term τi(T )ηi(T ) is called the inter-event residual.
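To make the decomposition in Equation A.1 concrete, the following Python sketch rearranges it to recover the normalized intra-event residual εi(T) at one site. It assumes the inter-event residual ηi(T) is already known; in practice it is estimated jointly from all recordings of an event (for example, by mixed-effects regression), which is not shown here. The function name and the numbers in the usage line are hypothetical.

```python
import math


def normalized_intra_event_residual(sa_obs, sa_median, sigma, tau, eta):
    """Rearrange Eq. A.1 for epsilon_i(T):

        epsilon = (ln Sa_obs - ln Sa_median - tau * eta) / sigma

    sa_obs and sa_median must be in the same units (e.g., g); sigma and tau are
    the model's standard deviations of ln Sa; eta is the normalized inter-event
    residual (assumed known here).
    """
    total_log_residual = math.log(sa_obs) - math.log(sa_median)
    intra_event_residual = total_log_residual - tau * eta  # equals sigma * epsilon
    return intra_event_residual / sigma


# Hypothetical values: observed 0.30 g versus a predicted median of 0.20 g.
eps = normalized_intra_event_residual(0.30, 0.20, sigma=0.6, tau=0.3, eta=0.5)
```

The same rearrangement applies at every site and period; only ηi(T) is shared across the sites of one event.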
Though the ground-motion models partly account for the correlation via ηi, the εi’s still
show a significant amount of residual correlation. Researchers have previously estimated
the correlations between residuals of spectral accelerations at the same spectral period (e.g.,
between εi(T) and εj(T)) using recorded ground motions [e.g., Boore et al., 2003, Wang
and Takada, 2005, Goda and Hong, 2008, Jayaram and Baker, 2009a]. These models have
shown that the spatial correlation decays with site separation distance between sites i and
j, and that the rate of decay is a function of the spectral period. These works, however, do
not investigate the nature of the spatial cross-correlation between residuals of two different
intensity measures at two different sites (e.g., between εi(T1) and εj(T2)).
Considering spatial correlation in risk analysis is important because correlation between
residuals can lead to large ground-motion intensities over a spatially-extended area. Recent
research has shown that ignoring spatial correlations can significantly underestimate the
seismic risk of portfolios of buildings and of other lifelines such as transportation networks
[e.g., Park et al., 2007, Jayaram and Baker, 2010]. For instance, Figure A.1 shows the
exceedance rates of earthquake-induced travel-time delays in the San Francisco Bay Area
transportation network estimated by [Jayaram and Baker, 2010] while considering/ignoring
spatial correlation. This figure shows that the likelihood of observing large delays is
significantly underestimated when spatial correlations are ignored. Spatial cross-correlations
are equally important when multiple intensity measures are used for assessing the system
risk. This arises, for instance, when predicting damage to a portfolio of structures whose
individual damage states are predicted using spectral accelerations at multiple periods.
Spatial cross-correlations are also important when secondary effects such as landslides and
liquefaction are considered in addition to ground shaking. For instance, according to [HAZUS,
1997], the susceptibility of soil to liquefy is a function of the peak ground acceleration (i.e.,
Sa(0)) at the site, which might be different from the primary intensity measure (Sa(T)) of
interest.
This manuscript summarizes recent research in ground-motion spatial cross-correlation
estimation using geostatistical tools. Recorded ground-motion intensities are used to
compute residuals at multiple periods, which are then used to estimate the spatial
cross-correlation. These cross-correlation estimates can then be used in risk assessments of portfolios
of structures with different fundamental periods, and in assessing the seismic risk under
multiple earthquake effects.
A.3 Statistical Estimation of Spatial Cross-Correlation
In this study, geostatistical tools are used to estimate the spatial cross-correlations using
recorded ground-motion data from the Pacific Earthquake Engineering Research (PEER)
Center’s Next Generation Attenuation (NGA) ground-motion library.
Figure A.1: (a) The San Francisco Bay Area transportation network and (b) Annual exceedance rates of various travel time delays on that network (results from Jayaram and Baker [2010]).
The first step in developing an empirical cross-correlation model from recorded
ground-motion time histories is to use the time histories to compute the corresponding
ground-motion intensities ({Sa(T1), Sa(T2), ..., Sa(Tm)}) and the associated normalized
residuals ({ε(T1), ε(T2), ..., ε(Tm)}) using a ground-motion model. The cross-correlation
structure of the residuals can then be represented by a ‘cross-semivariogram’,
which is a measure of the average dissimilarity between the data [Goovaerts, 1997]. Let
u and u′ denote two sites separated by $\mathbf{h}$. The cross-semivariogram (γ(u,u′)) is defined as
follows:
$$\gamma(u,u') = \frac{1}{2} E\left[\{\varepsilon_u(T_1) - \varepsilon_{u'}(T_1)\}\{\varepsilon_u(T_2) - \varepsilon_{u'}(T_2)\}\right] \qquad (A.2)$$
The cross-semivariogram defined in Equation A.2 is location-dependent, and its inference
requires repeated realizations of ε(T1) and ε(T2) at locations u and u′. Such repeated
measurements are, however, never available in practice (e.g., in the current application, one
would need repeated observations of ground-motion intensities at every pair of sites of
interest). Hence, it is typically assumed that the cross-semivariogram does not depend on the site
locations u and u′, but only on their separation $\mathbf{h}$, which yields a stationary cross-semivariogram.
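The stationary cross-semivariogram ($\gamma(\mathbf{h})$) is then defined as follows:
$$\gamma(\mathbf{h}) = \frac{1}{2} E\left[\{\varepsilon_u(T_1) - \varepsilon_{u+\mathbf{h}}(T_1)\}\{\varepsilon_u(T_2) - \varepsilon_{u+\mathbf{h}}(T_2)\}\right] \qquad (A.3)$$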
A stationary cross-semivariogram is said to be isotropic if it is a function of the
separation distance ($h = \|\mathbf{h}\|$) rather than the separation vector $\mathbf{h}$. An isotropic, stationary
cross-semivariogram can be empirically estimated from a data set as follows:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \{\varepsilon_{u_\alpha}(T_1) - \varepsilon_{u_\alpha+h}(T_1)\}\{\varepsilon_{u_\alpha}(T_2) - \varepsilon_{u_\alpha+h}(T_2)\} \qquad (A.4)$$
where γ(h) is the experimental stationary cross-semivariogram (estimated from a data set); N(h)
denotes the number of pairs of sites separated by h; and $\{u_\alpha, u_\alpha + h\}$ denotes the
α-th such pair of sites.
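Because few site pairs are separated by exactly h, Equation A.4 is usually evaluated by grouping pairs into separation-distance bins. The Python sketch below shows one way to do this under stated assumptions: site coordinates are planar (x, y) values in km, each unordered pair is counted once, and the bin width and maximum distance are hypothetical user choices. Passing the same residuals as eps1 and eps2 gives the ordinary (single-period) experimental semivariogram.

```python
import math
from itertools import combinations


def empirical_cross_semivariogram(coords, eps1, eps2, bin_width, max_dist):
    """Binned version of Eq. A.4.

    coords: list of (x, y) site coordinates in km (planar; an assumption)
    eps1, eps2: normalized residuals at periods T1 and T2, one value per site
    Returns (mean separation, gamma, number of pairs) for each non-empty bin.
    """
    n_bins = int(math.ceil(max_dist / bin_width))
    prod_sums = [0.0] * n_bins
    dist_sums = [0.0] * n_bins
    counts = [0] * n_bins
    for a, b in combinations(range(len(coords)), 2):
        h = math.dist(coords[a], coords[b])
        if h >= max_dist:
            continue
        k = min(int(h // bin_width), n_bins - 1)  # guard against float edge cases
        # Product of the T1 and T2 increments for this pair of sites.
        prod_sums[k] += (eps1[a] - eps1[b]) * (eps2[a] - eps2[b])
        dist_sums[k] += h
        counts[k] += 1
    return [(dist_sums[k] / counts[k], prod_sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_bins) if counts[k] > 0]
```

Reporting the mean separation of each bin, rather than its center, keeps the plotted points where the pairs actually lie.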
The cross-covariance structure of ε(T1) and ε(T2) is completely specified by the functional
form, the sill and the range of the cross-semivariogram. It can be shown that
the following relationship can be used to obtain the cross-correlations from the
cross-semivariogram:
$$\gamma(h) = \rho_{12}(0) - \rho_{12}(h) \qquad (A.5)$$
where ρ12(0) denotes the cross-correlation between εu(T1) and εu(T2) at the same site u,
and ρ12(h) denotes the cross-correlation between εu(T1) and εu+h(T2). Therefore, it
suffices to estimate the cross-semivariogram of the residuals in order to determine their
cross-correlations. The correlation term ρ12(0) has been estimated in the past [e.g., Baker
and Jayaram, 2008, Baker and Cornell, 2006], and this work extends these results to account for
differing site locations as well.
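Equation A.5 follows from expanding the product in Equation A.3. A short derivation, assuming zero-mean, unit-variance normalized residuals (so that covariances equal correlations), writing ρ12(h) = E[εu(T1) εu+h(T2)], and assuming a symmetric cross-correlation, ρ12(−h) = ρ12(h) (an assumption not stated explicitly in the text):

$$
\begin{aligned}
\gamma(h) &= \tfrac{1}{2}\Big(E[\varepsilon_u(T_1)\varepsilon_u(T_2)] - E[\varepsilon_u(T_1)\varepsilon_{u+h}(T_2)] - E[\varepsilon_{u+h}(T_1)\varepsilon_u(T_2)] + E[\varepsilon_{u+h}(T_1)\varepsilon_{u+h}(T_2)]\Big) \\
&= \tfrac{1}{2}\big(\rho_{12}(0) - \rho_{12}(h) - \rho_{12}(-h) + \rho_{12}(0)\big) \\
&= \rho_{12}(0) - \rho_{12}(h) \qquad \text{if } \rho_{12}(-h) = \rho_{12}(h)
\end{aligned}
$$

With T1 = T2 the same argument gives the familiar single-period result γ(h) = 1 − ρ(h).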
Once the cross-semivariogram values are obtained at discrete values of h, they are then
fitted using a continuous function of h for prediction purposes. In this work, the discrete
cross-semivariogram values are fitted with an exponential semivariogram which has the
following form:
$$\gamma(h) = S\left(1 - e^{-3h/R}\right) \qquad (A.6)$$
where S and R denote the sill and the range of the cross-semivariogram respectively. The
value of the sill equals ρ12(0) (from Equations A.5 and A.6), and the range denotes the
separation distance at which the cross-correlation decays to less than 5% of the sill. Since
the values of ρ12(0) have been previously computed by Baker and Jayaram [2008], it will
suffice to estimate the range R to quantify the extent of spatial cross-correlation.
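Combining Equations A.5 and A.6 with the sill set to ρ12(0) gives ρ12(h) = ρ12(0) e^(−3h/R). The Python sketch below (function names are illustrative) evaluates both expressions and shows the basis of the 5% interpretation of R: at h = R the cross-correlation has fallen to e^(−3) ≈ 0.0498 of ρ12(0).

```python
import math


def exp_cross_semivariogram(h, sill, range_km):
    """Eq. A.6: gamma(h) = S * (1 - exp(-3h/R)), with h and R in km."""
    return sill * (1.0 - math.exp(-3.0 * h / range_km))


def cross_correlation(h, rho0, range_km):
    """Eq. A.5 with the sill equal to rho12(0): rho12(h) = rho12(0) - gamma(h)."""
    return rho0 - exp_cross_semivariogram(h, rho0, range_km)


# Fraction of rho12(0) remaining at a separation equal to the range.
ratio_at_range = cross_correlation(47.0, 0.749, 47.0) / 0.749
```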
A.4 Sample Results and Discussion
This section discusses sample cross-correlation estimates obtained using recorded
time histories from the 1999 Chi-Chi earthquake. In particular, spatial cross-correlation
estimates are computed for the 1-second and 2-second spectral acceleration residuals
from the Chi-Chi earthquake ground motions using the geostatistical procedure described
in the previous section. These residuals are first computed from the recorded ground motions
using the Boore and Atkinson [2008] ground-motion model, and are shown in Figures A.2a
and A.2b. Visually, the presence of spatial cross-correlation is indicated by the similarity between
nearby residuals in Figures A.2a and A.2b.
Figure A.2c shows the cross-semivariogram estimated using the above-mentioned
residuals. An exponential function is then fitted to the discrete cross-semivariogram values, with
its sill equal to 0.7490 (the ρ12(0) value obtained from Baker and Jayaram
[2008]). The range of the cross-semivariogram equals 47 km, and has been chosen to
provide a good fit at short separation distances at the expense of the quality of the
fit at larger separation distances. This is because it is more important to model the
cross-semivariogram structure well at short separation distances, since large separation distances
are associated with low correlations, which thus have relatively little effect on joint
distributions of ground-motion intensities. In addition to having low correlation, widely separated
sites also have little impact on each other due to an effective ‘screening’ of their influence
by more closely located sites [Goovaerts, 1997]. A more detailed discussion on the
importance of fitting well at short separation distances can be found in Jayaram and Baker
[2009a].
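The range selection described above can be sketched as a one-parameter fit with the sill held at ρ12(0). The Python sketch below is an assumption-laden illustration, not the procedure used to obtain the 47 km value: it uses only bins up to a hypothetical cutoff distance, weights each bin by its number of pairs, and grid-searches a list of candidate ranges. The bins argument holds (h, γ, N(h)) triples from a binned estimate of Equation A.4.

```python
import math


def fit_range_fixed_sill(bins, sill, max_fit_dist, r_grid):
    """Choose the range R (km) of Eq. A.6 with the sill held fixed.

    bins: iterable of (h, gamma_hat, n_pairs) from an empirical cross-semivariogram
    sill: fixed sill, e.g. a previously estimated rho12(0)
    max_fit_dist: only bins with h <= max_fit_dist are used (a hypothetical cutoff
        that concentrates the fit on short separations)
    r_grid: candidate ranges to try
    """
    used = [(h, g, n) for h, g, n in bins if h <= max_fit_dist]
    if not used:
        raise ValueError("no bins within max_fit_dist")

    def weighted_sse(r):
        # Pair-weighted squared misfit between the data and S * (1 - exp(-3h/R)).
        return sum(n * (g - sill * (1.0 - math.exp(-3.0 * h / r))) ** 2
                   for h, g, n in used)

    return min(r_grid, key=weighted_sse)
```

Dropping long-distance bins entirely is the bluntest way to favor short separations; a distance-decaying weight would be a softer alternative with the same intent.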
The sample cross-semivariogram in Figure A.2 shows that the spatial cross-correlation is
substantial. For instance, the cross-correlation is approximately
0.4 for sites separated by 10 km and increases to about 0.75 for sites that are very close to each
another. As a result, it will likely be important to consider spatial cross-correlations while
studying multiple types of intensity measures distributed over a region.
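These numbers follow directly from the fitted model. Equations A.5 and A.6 give ρ12(h) = ρ12(0) − γ(h) = S e^(−3h/R), with S = ρ12(0) = 0.7490 and R = 47 km, so at a 10 km separation:

$$\rho_{12}(10) = 0.7490\, e^{-3(10)/47} \approx 0.7490 \times 0.528 \approx 0.40$$

while ρ12(h) approaches 0.7490 as h approaches 0.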
Currently, the author is in the process of developing a spatial cross-correlation model
considering the residuals from multiple intensity measures using recordings from multiple
earthquakes.
A.5 Conclusions
This manuscript summarized recent research in ground-motion spatial cross-correlation
estimation using geostatistical tools. Spatial cross-correlations become important while
quantifying the distribution of different types of ground-motion intensity measures over
a region. This work used cross-semivariograms to model the cross-correlation structure.
A cross-semivariogram is a measure of dissimilarity between the data, whose functional
form (e.g., exponential function), sill and range uniquely identify the ground-motion cross-
correlation as a function of separation distance.
In this work, recorded ground-motion spectral accelerations were used to compute
residuals at multiple periods, which were then used to estimate the spatial cross-correlation.
The manuscript showed sample cross-correlation estimates obtained using the 1 s and 2 s
Chi-Chi earthquake residuals. The cross-correlation was found to be substantial,
and hence it will likely be important to consider spatial cross-correlations while
studying the distribution of multiple types of intensity measures over a region. Currently,
the author is in the process of developing a spatial cross-correlation model considering
the residuals from multiple intensity measures using recordings from multiple earthquakes.
Once developed, these cross-correlation estimates can be used in risk assessments of port-
folios of structures with different fundamental periods, and in assessing the seismic risk
under multiple ground-motion effects.
Figure A.2: (a) Chi-Chi earthquake normalized residuals computed using spectral accelerations at 1 second; (b) Chi-Chi earthquake normalized residuals computed using spectral accelerations at 2 seconds; (c) Cross-semivariogram estimated using the 1 s and 2 s Chi-Chi earthquake residuals.
Appendix B
Supporting details for the spatial correlation model developed in Chapter 3
Excerpted from:
J.W. Baker and N. Jayaram (2009). Effects of spatial correlation of ground-motion param-
eters for multi-site risk assessment: Collaborative research with Stanford University and
AIR. Technical report, Report for U.S. Geological Survey National Earthquake Hazards
Reduction Program (NEHRP) External Research Program Award 07HQGR0031.
(Professor Baker was the first author of the above report as the Principal Investigator of this
project, but all the results and the writing in this appendix were produced by the author of
this thesis)
In Chapter 3, several statements were made about properties of spectral acceleration spatial
correlations that were not explained in detail in the text. In this Appendix, details to support
APPENDIX B. SPATIAL CORRELATION MODEL 231
those statements are presented for interested readers.
B.1 Semivariograms of residuals estimated using the
Northridge earthquake ground motions
Chapter 3 discussed the semivariogram ranges at seven periods ranging from 0 to 10s es-
timated using the Northridge earthquake recordings. Figures B.1-B.7 show the semivari-
ograms and the exponential fits obtained in these cases.
Figure B.1: Semivariogram of ε based on the peak ground accelerations observed during the Northridge earthquake
Figure B.2: Semivariogram of ε computed at 0.5 seconds based on the Northridge earthquake data
Figure B.3: Semivariogram of ε computed at 1 second based on the Northridge earthquake data
Figure B.4: Semivariogram of ε computed at 2 seconds based on the Northridge earthquake data
Figure B.5: Semivariogram of ε computed at 5 seconds based on the Northridge earthquake data
Figure B.6: Semivariogram of ε computed at 7.5 seconds based on the Northridge earthquake data
Figure B.7: Semivariogram of ε computed at 10 seconds based on the Northridge earthquake data
B.2 Semivariograms of residuals estimated using Chi-Chi
earthquake ground motions
Chapter 3 discussed the semivariogram ranges at seven periods ranging from 0 to 10s es-
timated using the Chi-Chi earthquake recordings. This section shows the semivariograms
and the exponential fits obtained in these cases.
B.2.1 Exact versus approximate semivariogram fit
Figure B.8 shows the experimental semivariogram values at discrete separation distances,
obtained using the ε values computed at 2 seconds. The most accurate model for the
semivariogram function is a combination of a nugget effect with a contribution of 0.3 and
an exponential semivariogram with a contribution of 0.7 and a range of 85 km, which is
also shown in Figure B.8. This model can be expressed as follows:
$$\gamma(h) = 0.3\, I(h > 0) + 0.7\left(1 - e^{-3h/85}\right) \qquad (B.1)$$
where h is in km and I(h > 0) is an indicator variable that equals 1 when h > 0 and 0 otherwise.
The use of a single model form for all semivariograms is highly desirable in order to facilitate
the development of a standard correlation model for use in future predictions. The exponential
model is seen to be accurate in most cases, and so an approximate exponential model
is fitted even in cases where more accurate alternative models are available. Accordingly, the
semivariogram function for the ε values computed at 2 seconds is approximated by an exponential
model with a range of 36 km and a sill of 1, as shown in Figure B.8. This semivariogram
function fits the data reasonably well at small separations.
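To see how the two fits in Figure B.8 differ, the Python sketch below evaluates Equation B.1 and the approximate unit-sill exponential model (range 36 km) at a few separations. Both models have a total sill of 1 (0.3 + 0.7); the function names and the chosen distances are illustrative only.

```python
import math


def nugget_exponential(h):
    """Eq. B.1: 0.3 nugget plus a 0.7 exponential structure with an 85 km range."""
    nugget = 0.3 if h > 0 else 0.0  # indicator I(h > 0)
    return nugget + 0.7 * (1.0 - math.exp(-3.0 * h / 85.0))


def approx_exponential(h, range_km=36.0):
    """Approximate unit-sill exponential model adopted for consistency."""
    return 1.0 - math.exp(-3.0 * h / range_km)


for h in (0, 5, 10, 20, 40, 80):
    print(f"h={h:>3} km  B.1={nugget_exponential(h):.3f}  "
          f"approx={approx_exponential(h):.3f}")
```

The printed table makes the trade-off visible: the single-structure exponential model cannot reproduce the jump at h > 0, but it tracks Equation B.1 over the short separations that matter most.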
B.2.2 Semivariograms of the residuals at seven periods ranging between 0 and 10s
Figures B.9-B.15 show the semivariograms and the exponential fits obtained using the
Chi-Chi earthquake records.
Figure B.8: Experimental semivariogram of ε computed at 2 seconds based on the Chi-Chi earthquake data. Also shown in the figure are two fitted semivariogram models: (i) an accurate exponential + nugget model and (ii) an approximate exponential model
Figure B.9: Semivariogram of ε based on the peak ground accelerations observed during the Chi-Chi earthquake
Figure B.10: Semivariogram of ε computed at 0.5 seconds based on the Chi-Chi earthquake data
Figure B.11: Semivariogram of ε computed at 1 second based on the Chi-Chi earthquake data
Figure B.12: (Approximate) Semivariogram of ε computed at 2 seconds based on the Chi-Chi earthquake data
Figure B.13: Semivariogram of ε computed at 5 seconds based on the Chi-Chi earthquake data
Figure B.14: Semivariogram of ε computed at 7.5 seconds based on the Chi-Chi earthquake data
Figure B.15: Semivariogram of ε computed at 10 seconds based on the Chi-Chi earthquake data
B.3 Semivariograms of residuals estimated using broad-
band simulations for scenario earthquakes on the
Puente Hills thrust fault system
Chapter 3 only discussed the correlations estimated using recorded ground motions. This
section describes the correlations between residuals computed based on broadband ground-
motion simulations for scenario earthquakes on the Puente Hills thrust fault system [Graves,
2006], which are not discussed in any of the earlier chapters. The simulated time histories
are available for five different rupture scenarios that differ in the rupture velocity and the
rise time. In this work, ground motions due to the rupture scenario defined by a rupture
velocity equaling 80% of the shear wave velocity and a rise time of 1.4 seconds are used
for the analysis. The ground-motion time histories have been simulated at 648 sites cov-
ering the Los Angeles, San Fernando and San Gabriel basin regions. The time histories at
locations with very low Vs30 values, however, were reported to be possibly inaccurate be-
cause the simulation algorithm does not yet fully account for non-linear site effects [Graves,
2007]. Hence, in the current work, only the time histories at sites with Vs30 values
exceeding 300 m/s are considered for analysis.
Experimental semivariograms are obtained for ε’s computed at several different periods
ranging from 0 to 10 seconds. The exponential model is found to provide a good fit at periods
below 2 seconds. At longer periods, however, a spherical model provides a better fit than an
exponential model. For example, Figure B.16 shows the experimental semivariogram and
a fitted spherical model (unit sill and range equaling 32 km) based on residuals computed
at 5 seconds.
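$$\gamma(h) = \begin{cases} \dfrac{3}{2}\,\dfrac{h}{32} - \dfrac{1}{2}\left(\dfrac{h}{32}\right)^3 & \text{if } h \le 32 \\ 1 & \text{otherwise} \end{cases} \qquad (B.2)$$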
As explained earlier, for consistency with other results, exponential models that provide
a reasonably good approximation at short separation distances (the distances of most practical
interest) are used to model the semivariograms. For example, the experimental semivariogram can
Figure B.16: Experimental semivariogram of ε computed at 5 seconds based on the simulated ground-motion data. Also shown in the figure are two fitted semivariogram models: (i) an accurate spherical model and (ii) an approximate exponential model
also be fitted with an exponential model which has a unit sill and a range of 60 km as
shown in Figure B.16. It can be seen from the figure that this exponential function models
the correlations at small separations reasonably accurately.
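The spherical model of Equation B.2 and its exponential approximation (unit sill, 60 km range) can be compared numerically. A minimal Python sketch, with illustrative function names; since both models have a unit sill, the implied correlation at separation h is 1 − γ(h).

```python
import math


def spherical(h, range_km=32.0):
    """Unit-sill spherical model of Eq. B.2 (h in km)."""
    if h <= range_km:
        x = h / range_km
        return 1.5 * x - 0.5 * x ** 3
    return 1.0


def exponential(h, range_km=60.0):
    """Unit-sill exponential approximation with a 60 km range."""
    return 1.0 - math.exp(-3.0 * h / range_km)


for h in (0, 5, 10, 20, 32, 60):
    # With a unit sill, the implied correlation is 1 - gamma(h).
    print(f"h={h:>2} km  spherical={spherical(h):.3f}  "
          f"exponential={exponential(h):.3f}")
```

The spherical model reaches its sill exactly at 32 km, whereas the exponential model only approaches it asymptotically.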
A plot of the range of semivariograms as a function of period is shown in Figure B.17.
The trend of increasing range with period is seen in this figure as well. The computed
ranges are reasonably similar to those obtained from the Northridge earthquake data. Note,
however, that the ground-motion simulations at short periods (periods ≤ 2 seconds)
may not be entirely accurate, and hence, the ranges obtained using the Northridge and
Chi-Chi earthquake data are more reliable estimates.
Figure B.17: Range of semivariograms of ε, as a function of the period at which ε values are computed. The residuals are obtained using the simulated ground-motion data
B.4 Clustering of Vs30’s
The semivariogram model described in Chapter 3 involves determining the presence of
Vs30 clustering (Section 3.4.5). This is best done by computing the range of the Vs30
semivariogram and comparing it to the ranges shown in Figure 3.5. In order to provide
additional guidance to users of the correlation model, Figure B.18 shows simulated
multivariate normal random fields with three different levels of correlation, with mean and
variance equal to those of the Vs30’s in the San Francisco Bay Area. The correlation
structure in Figure B.18a is defined by an exponential semivariogram with a range of 0 km,
and is an example of heterogeneous Vs30 conditions (Case 1 in Section 3.4.5), as indicated
by the lack of clustering of the Vs30’s in the figure. The correlation structure in Figure
B.18b is defined by an exponential semivariogram with a range of 20 km, and is an example
of low-to-moderately heterogeneous conditions (Case 1 in Section 3.4.5). Figure B.18c is an
example of homogeneous Vs30 conditions, and has a correlation structure defined by an
exponential semivariogram with a range of 40 km (Case 2 in Section 3.4.5). The clustering
of the Vs30 field in the region of interest (where the region is defined as the collection of
sites of interest) can be compared to that of the three maps to approximately determine the
appropriate case to use.
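Maps like those in Figure B.18 can be generated by drawing a multivariate normal vector whose correlation between sites i and j is exp(−3 hij/R), the correlation implied by a unit-sill exponential semivariogram. The Python sketch below uses a Cholesky factorization of the correlation matrix. The site grid, mean and standard deviation in the usage lines are hypothetical placeholders rather than the Bay Area Vs30 statistics, a range of 0 km is treated as independent sites, and the pure-Python factorization is only practical for a few hundred sites.

```python
import math
import random


def exponential_correlation(h, range_km):
    """Correlation implied by a unit-sill exponential semivariogram: exp(-3h/R).

    A range of 0 km is treated as spatially uncorrelated sites.
    """
    if range_km == 0:
        return 1.0 if h == 0 else 0.0
    return math.exp(-3.0 * h / range_km)


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_field(coords, mean, std, range_km, seed=None):
    """One realization of a multivariate normal field at (x, y) sites given in km."""
    rng = random.Random(seed)
    n = len(coords)
    corr = [[exponential_correlation(math.dist(coords[i], coords[j]), range_km)
             for j in range(n)] for i in range(n)]
    L = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent standard normals
    # Correlate the normals (L z), then rescale to the target mean and std.
    return [mean + std * sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


# Hypothetical 10 x 10 grid with 2 km spacing; mean and std are placeholders.
grid = [(2.0 * ix, 2.0 * iy) for ix in range(10) for iy in range(10)]
fields = {r: simulate_field(grid, mean=400.0, std=150.0, range_km=r, seed=1)
          for r in (0.0, 20.0, 40.0)}
```

Plotting the three fields on the grid should show clustering that grows with the range, qualitatively like panels (a) to (c) of Figure B.18.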
Figure B.18: Simulated multivariate normal random fields. The correlation structure is defined using an exponential semivariogram with range equaling (a) 0 km, (b) 20 km and (c) 40 km.
B.5 Correlation between near-fault ground-motion inten-
sities
Most currently available ground-motion models do not directly predict ground motions
containing strong velocity pulses, such as those caused by near-fault directivity. As a result,
the ground-motion intensities predicted by the models at sites that experience pulse-like
ground motions will be different from the observed values. Such systematic prediction
errors can increase the apparent correlation between the residuals computed at these sites.
Hence, in this section, empirical data are used to verify whether the correlation between
residuals at sites experiencing pulse-like ground motion is significantly different from the
correlation between residuals at other sites.
Baker [2007a] used wavelet analysis to extract velocity pulses from ground motions
and developed a quantitative criterion for classifying a ground motion as pulse-like.
Ninety-one large-velocity pulses were found in the fault-normal components of the approximately
3500 strong ground-motion recordings in the PEER NGA Database [2005]. Not all of these
pulses may be due to directivity effects, but they provide a reasonable data set for studying
the potential impact of directivity. Of these, 30 pulses were found in the fault-normal
components of the Chi-Chi earthquake recordings, while the other earthquakes have far fewer
recordings with pulses. In the current work, the pulse-like ground motions from the Chi-Chi
earthquake are used to compute ε values at different periods. The semivariograms of the
residuals are obtained and compared to those estimated using all usable records (Section 3.4.3).
Figures B.19-B.25 compare experimental semivariograms of residuals (at seven differ-
ent periods) computed using pulse-like ground motions to experimental semivariograms
of residuals computed using all usable ground motions. The figures show the experimental
semivariogram values at short separation distances, which are of interest in practice. Because
fewer records are available, the experimental semivariograms obtained using pulse-like ground
motions are less clearly defined than those obtained using all usable ground motions. Hence,
it is difficult to fit robust models to the experimental semivariograms obtained using the
pulse-like ground motions. As a result, the experimental semivariograms are compared directly,
rather than through their fitted models and ranges.
It can be seen from Figures B.19-B.25 that the experimental semivariogram values ob-
tained using the pulse-like ground motions are slightly less than those obtained using all
usable ground motions, particularly at separation distances below 10 km and at long pe-
riods (7.5 and 10 seconds). This is consistent with expectations as the pulses from this
earthquake typically have periods of approximately 7 seconds and so, it is expected that
this is the period range that would be most strongly influenced by directivity. In other
words, the ε’s obtained using pulse-like ground motions show slightly larger correlations
than those obtained using all usable ground motions. The difference in the correlations is
typically around 0.1, with a maximum value of approximately 0.2.
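The translation from semivariogram values to correlations uses Equation A.5 with T1 = T2: for normalized residuals at a single period with a unit sill, ρ(h) = 1 − γ(h). Assuming both sets of experimental semivariograms share that unit sill, a lower semivariogram value corresponds to an equally higher correlation:

$$\rho_{\text{pulse}}(h) - \rho_{\text{all}}(h) = \gamma_{\text{all}}(h) - \gamma_{\text{pulse}}(h)$$

so semivariogram values that are lower by about 0.1 for the pulse-like records imply correlations that are higher by about 0.1.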
While increased correlations between the residuals at sites experiencing pulse-like
ground motions are expected, the difference in the correlation is reasonably small.
Moreover, the source of this additional correlation is the systematic prediction errors
made by the ground-motion models at sites experiencing pulse-like ground motions. Hence,
if ground-motion models that accurately account for directivity effects are developed, the
correlations between near-fault ground-motion intensities can be expected to be similar to
the correlations between ground-motion intensities at other sites. That is, directivity effects
are best addressed through refinements to ground-motion models, rather than refinements to
correlation models.
Figure B.19: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are computed from peak ground accelerations
Figure B.20: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 0.5 seconds
Figure B.21: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 1 second
Figure B.22: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 2 seconds
Figure B.23: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 5 seconds
Figure B.24: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 7.5 seconds
Figure B.25: Comparison between the experimental semivariogram of ε’s computed using pulse-like ground motions and the experimental semivariogram of ε’s computed using all usable ground motions. The ε’s are obtained from spectral accelerations computed at 10 seconds
B.6 Directional semivariograms estimated using the
Northridge and the Chi-Chi earthquake records at
various periods
Chapter 3 showed that the directional semivariograms obtained using the Northridge earth-
quake 2s residuals match reasonably well, thereby indicating that the use of an isotropic
correlation model is reasonable. This section provides more empirical evidence (directional
semivariograms obtained using Chi-Chi and Northridge earthquake recordings, considering
residuals at three different periods) to support the assumption of isotropy.
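Directional semivariograms such as those in Figures B.26 to B.31 restrict the single-period form of Equation A.4 (T1 = T2) to site pairs whose separation vector points roughly along a given azimuth. The Python sketch below illustrates the idea under stated assumptions: planar coordinates in km, azimuth measured clockwise from north (the +y axis), a user-chosen angular tolerance, and bins reported by their centers. The tolerance and bin settings used for the figures are not given in the text, so any values passed here are hypothetical.

```python
import math
from itertools import combinations


def directional_semivariogram(coords, eps, azimuth_deg, tol_deg, bin_width, max_dist):
    """Experimental semivariogram from pairs oriented within +/- tol_deg of azimuth_deg.

    Directions are undirected: azimuths theta and theta + 180 are treated as equal.
    Returns (bin center, gamma, number of pairs) for each non-empty bin.
    """
    n_bins = int(math.ceil(max_dist / bin_width))
    sums, counts = [0.0] * n_bins, [0] * n_bins
    target = azimuth_deg % 180.0
    for a, b in combinations(range(len(coords)), 2):
        dx = coords[b][0] - coords[a][0]
        dy = coords[b][1] - coords[a][1]
        h = math.hypot(dx, dy)
        if h == 0 or h >= max_dist:
            continue
        theta = math.degrees(math.atan2(dx, dy)) % 180.0  # clockwise from north
        gap = abs(theta - target)
        gap = min(gap, 180.0 - gap)  # angular difference with wrap-around
        if gap > tol_deg:
            continue
        k = min(int(h // bin_width), n_bins - 1)
        sums[k] += (eps[a] - eps[b]) ** 2
        counts[k] += 1
    return [((k + 0.5) * bin_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_bins) if counts[k] > 0]
```

Computing this at azimuths 0, 45 and 90 and overlaying the omni-directional fit is the kind of comparison the figures below present; similar curves in all directions support the isotropy assumption.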
Figure B.26: Experimental directional semivariograms at discrete separations obtained using the Northridge earthquake ε values computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0°; (c) Azimuth = 45°; and (d) Azimuth = 90°
Figure B.27: Experimental directional semivariograms at discrete separations obtained using the Chi-Chi earthquake ε values computed at 1 second. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0°; (c) Azimuth = 45°; and (d) Azimuth = 90°
Figure B.28: Experimental directional semivariograms at discrete separations obtained using the Chi-Chi earthquake ε values computed at 7.5 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0°; (c) Azimuth = 45°; and (d) Azimuth = 90°
Figure B.29: Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε values are computed at 2 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0°; (c) Azimuth = 45°; and (d) Azimuth = 90°
Figure B.30: Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε values are computed at 7.5 seconds. Also shown in the figures is the best fit to the omni-directional semivariogram: (a) Omni-directional; (b) Azimuth = 0°; (c) Azimuth = 45°; and (d) Azimuth = 90°
Figure B.31: Experimental directional semivariograms at discrete separations obtained using the simulated time histories. The ε values are computed at 7.5 seconds. Also shown in the figures is an anisotropic model that fits the four experimental semivariograms well (note that an anisotropic semivariogram has different shapes in different directions): (a) Omni-directional; (b) Azimuth = 0°; (c) Azimuth = 45°; and (d) Azimuth = 90°
Appendix C
Deaggregation of lifeline risk: Insights for choosing deterministic scenario earthquakes
N. Jayaram and J.W. Baker (2009). Deaggregation of lifeline risk: Insights for choosing
deterministic scenario earthquakes, Lifeline Earthquake Engineering in a Multihazard
Environment TCLEE, Oakland, California.
C.1 Abstract
Probabilistic seismic risk assessment for lifelines is less straightforward than for individual
structures. Analytical risk assessment techniques such as the ‘PEER framework’ are insuf-
ficient for a probabilistic study of lifeline performance, due in large part to difficulties in
describing ground-motion hazard over a region. As a result, Monte Carlo simulation and its
variants appear to be the best approach for characterizing ground motions for lifelines. A
challenge with Monte Carlo simulation is its large computational expense, and in situations
where computing lifeline losses is extremely computationally demanding, assessments may
consider only a single ‘interesting’ ground-motion scenario and a single associated map of
resulting ground motion intensities.
In this paper, a probabilistic simulation-based risk assessment procedure is coupled
APPENDIX C. LIFELINE RISK DEAGGREGATION 258
with a deaggregation calculation to identify the ground-motion scenarios most likely to
produce exceedance of a given loss threshold. The deaggregation calculations show that
this ‘most-likely scenario’ depends on the loss level of interest, and is influenced by factors
such as the seismicity of the region, the location of the lifeline with respect to the faults
and the performance state of the various components of the lifeline under normal operating
conditions. Large losses are typically caused by moderately large magnitude events with large
positive values of the inter-event residual and of the average intra-event residual, implying
that scenario ground motions should be selected in a manner that accounts for ground-motion
uncertainty. Loss analyses that exclude the residuals are shown to produce highly biased
(underestimated) loss estimates.
C.2 Introduction
Probabilistic seismic risk assessment for lifelines is less straightforward than for individual
structures. While procedures such as the ‘PEER framework’ have been developed for risk
assessment of individual structures, these are not easily applicable to distributed lifeline
systems, due in large part to difficulties in describing ground-motion hazard over a region
(in contrast to ground-motion hazard at a single site, which is easily quantified using
Probabilistic Seismic Hazard Analysis). In the past, researchers have used simplified
approaches to tackle the problem of specifying ground motions over a region. In the simplest
case, the uncertainties in the ground-motion intensities are ignored, and lifeline risks are
studied using the median ground motions predicted by ground-motion models [e.g., Shiraki
et al., 2007, Campbell and Seligson, 2003]. While this approach reduces the computational
burden significantly, ignoring the uncertainties in the ground-motion intensities results in
highly biased risk estimates, as shown later in this paper. Alternatively, as a simplification,
lifeline risks are sometimes assessed using only those earthquake scenarios that may dominate
the ground-motion hazard in the region of interest [e.g., Adachi and Ellingwood, 2008]. This
approach is practically helpful in reducing computational expense, but suffers from two
problems. First, it is difficult to identify the probability of actually incurring the
computed losses resulting from a single ground-motion scenario. Second, the scenario
earthquake is generally chosen in a somewhat ad hoc manner, and so there is no guarantee
that the chosen scenario is the one that is most ‘interesting’ in terms of risk to the lifeline
system.
Crowley and Bommer [2006] and, more recently, Jayaram and Baker [2010] proposed
Monte Carlo simulation (MCS)-based frameworks to forward-simulate ground-motion
intensities in future earthquakes, which can then be used for the risk assessment of lifelines.
These sampling frameworks build on the functional form of existing ground-motion models,
described below. The ground motion at a site is modeled as [e.g., Boore and Atkinson,
2008]
$$\ln(Y_i) = \ln(\bar{Y}_i) + \varepsilon_i + \eta \qquad \text{(C.1)}$$
where $Y_i$ denotes the ground-motion parameter of interest at site $i$ (e.g., $Sa(T)$, the
spectral acceleration at period $T$); $\bar{Y}_i$ denotes the median ground-motion intensity
predicted by the ground-motion model (which depends on parameters such as magnitude,
distance, period and local-site conditions); $\varepsilon_i$ denotes the intra-event residual,
a random variable with zero mean and standard deviation $\sigma_i$; and $\eta$ denotes the
inter-event residual, a random variable with zero mean and standard deviation $\tau$. The
standard deviations $\sigma_i$ and $\tau$ are estimated as part of the ground-motion model
and are functions of the spectral period of interest, and in some models also of the
earthquake magnitude and the distance of the site from the rupture. The intra-event residuals
at two sites $i$ and $j$ are correlated, and the correlation is a function of the separation
distance between the sites. The extent of the correlation can be obtained from spatial
correlation models such as those of Jayaram and Baker [2009a] and Wang and Takada [2005].
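A minimal sketch of how one ground-motion map could be sampled from Equation C.1 is given below. It assumes an exponential correlation function $\rho(h) = \exp(-3h/b)$ for the intra-event residuals (an assumed form with an assumed range $b$), an equal intra-event standard deviation $\sigma$ at every site, and hypothetical site coordinates, medians and standard deviations; none of these values come from this study. The correlated intra-event residuals are drawn through a Cholesky factor of their covariance matrix, and a single inter-event residual is shared by all sites.

```python
import numpy as np


def simulate_ground_motion_map(ln_median, sigma, tau, coords_km, range_km, rng):
    """Sample ln(Y_i) = ln(Ybar_i) + eps_i + eta at n sites (Equation C.1).

    ln_median : (n,) ln of the predicted median intensities (hypothetical inputs)
    sigma     : intra-event standard deviation, assumed equal at all sites
    tau       : inter-event standard deviation
    coords_km : (n, 2) site coordinates in km
    range_km  : range b of the assumed correlation model rho(h) = exp(-3h/b)
    """
    ln_median = np.asarray(ln_median, dtype=float)
    coords_km = np.asarray(coords_km, dtype=float)
    n = ln_median.size
    # Pairwise separation distances between all sites
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))
    # Covariance of the spatially correlated intra-event residuals
    cov = sigma ** 2 * np.exp(-3.0 * h / range_km)
    # Cholesky factor (a tiny diagonal jitter guards against round-off)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n))
    eps = chol @ rng.standard_normal(n)
    # One inter-event residual, common to every site for this earthquake
    eta = tau * rng.standard_normal()
    return ln_median + eps + eta, eps, eta


# Hypothetical example: three sites, medians in g
rng = np.random.default_rng(0)
coords = np.array([[0.0, 0.0], [5.0, 0.0], [40.0, 10.0]])
ln_med = np.log([0.30, 0.25, 0.10])
ln_y, eps, eta = simulate_ground_motion_map(ln_med, 0.6, 0.3, coords, 20.0, rng)
```

Repeating this draw for each sampled earthquake (fault and magnitude) yields the collection of maps on which a loss assessment can be run; nearby sites receive similar intra-event residuals, while $\eta$ shifts the whole map up or down.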
Crowley and Bommer [2006] describe the MCS approach used to probabilistically sample
ground-motion maps. This approach involves simulating earthquakes of different magnitudes
on the various active faults in the region, followed by simulating the inter-event and the
intra-event residuals at the sites of interest for each earthquake. The residuals are then
combined with the median ground motions in accordance with Equation C.1 to obtain the
ground motions at all the sites. In the current work, this simulation approach is coupled
with a deaggregation calculation that can identify the ground-motion scenario most likely
to produce exceedance of a given loss threshold. The results show that
the most-likely scenario depends on the loss level of interest, and is influenced by factors
such as the seismicity of the region, the location of the lifeline with respect to the faults
and the performance state of the various components of the lifeline under normal operating
conditions. It is also seen that large losses are most likely to be caused by moderately large
magnitude earthquakes combined with large positive inter-event and intra-event residuals. The
findings illustrate the importance of accounting for ground-motion uncertainty and provide a
basis for a decision maker to choose interesting scenario ground motions for lifeline risk
assessment.
C.3 Deaggregation of seismic loss
This section describes the fundamentals of the seismic loss deaggregation procedure which
is used in the current study. Deaggregation is the process used to quantify the likelihood
that various events could have produced the exceedance of a given loss threshold. For
instance, if it is known that the seismic loss exceeds x units, the likelihood that an event of
magnitude m could have caused the exceedance is given as follows:
$$P(\text{Magnitude}=m \mid \text{Loss}>x) = \frac{P(\text{Loss}>x,\ \text{Magnitude}=m)}{P(\text{Loss}>x)} = \frac{\lambda(\text{Loss}>x,\ \text{Magnitude}=m)}{\lambda(\text{Loss}>x)} \qquad \text{(C.2)}$$
where $\lambda(\text{Loss}>x,\ \text{Magnitude}=m)$ denotes the recurrence rate of events of
magnitude $m$ causing a loss greater than $x$, and $\lambda(\text{Loss}>x)$ is the recurrence
rate of all events causing a loss exceeding $x$. These rates can be estimated using the
simulation-based framework described in Section C.2.
The likelihoods can also be computed jointly for multiple parameters, such as magnitude and
fault, as follows:
$$P(\text{Magnitude}=m,\ \text{Fault}=f \mid \text{Loss}>x) = \frac{\lambda(\text{Loss}>x,\ \text{Magnitude}=m,\ \text{Fault}=f)}{\lambda(\text{Loss}>x)} \qquad \text{(C.3)}$$
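Equations C.2 and C.3 can be estimated directly from the simulated events, as the sketch below illustrates. It assumes each simulated event carries an annual rate (weight); with equal weights the ratio reduces to simple counting of simulated events. The function name `deaggregate` and the event data are hypothetical placeholders, not part of the original framework or its results.

```python
from collections import Counter


def deaggregate(keys, losses, x, weights=None):
    """Estimate P(key | Loss > x) from simulated events (Equations C.2, C.3).

    keys    : one label per event, e.g. a magnitude (C.2) or a
              (magnitude, fault) tuple (C.3)
    losses  : simulated loss (travel time delay) of each event
    weights : annual rate assigned to each event; None means equal rates
    """
    if weights is None:
        weights = [1.0] * len(losses)
    joint = Counter()  # lambda(Loss > x, key), up to a common factor
    total = 0.0        # lambda(Loss > x), with the same factor
    for key, loss, w in zip(keys, losses, weights):
        if loss > x:
            joint[key] += w
            total += w
    if total == 0.0:
        return {}  # no simulated event exceeds x
    return {key: rate / total for key, rate in joint.items()}


# Hypothetical simulated events (not results from this study)
mags = [6.5, 7.0, 7.0, 8.0, 6.5]
faults = ["Hayward", "Hayward", "San Andreas", "San Andreas", "Calaveras"]
delays = [800.0, 12000.0, 6000.0, 25000.0, 300.0]  # hours

by_magnitude = deaggregate(mags, delays, 5000.0)  # 7.0 -> 2/3, 8.0 -> 1/3
by_mag_fault = deaggregate(list(zip(mags, faults)), delays, 5000.0)
```

Because both the numerator and the denominator are accumulated from the same events, any common scaling of the weights (for example, a total rate divided by the number of simulations) cancels in the ratio.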
Such calculations are common practice when loss assessments are carried out for a
single structure (though most deaggregation calculations estimate the contribution, or
likelihood, of various earthquake scenarios to ground-motion intensity exceedance rather than
to loss exceedance). Typical results from single-site deaggregation computations include
the joint likelihoods of magnitudes, rupture distances (distances of the structure from the
rupture) and residuals (Equation C.1).
In the current work, it is of interest to identify the contributions of magnitudes, rupture
locations and residuals (inter-event and intra-event) to lifeline losses. Deaggregation
calculations for lifeline losses need to account for the fact that ground motions at multiple
sites are of interest. As a result, a single distance to the rupture cannot be defined, as is
commonly done when a single structure is involved. In the current work, this problem
is overcome by specifying the fault on which the rupture lies rather than the distance to any
particular site. Further, since each site of interest is associated with a different intra-event
residual, deaggregation is used to compute the contribution of the mean intra-event residual
(i.e., the average of the intra-event residuals at all sites) rather than the contribution of the
intra-event residual at any particular site.
C.4 Loss assessment for the San Francisco Bay Area transportation network
The deaggregation computations in the current work are based on the loss estimates for an
aggregated form of the San Francisco Bay Area transportation network provided by Jayaram
and Baker [2009a]. This section describes the aggregated network and the performance
measures considered in the loss assessment process. Figure C.1 shows the aggregated
network along with the important faults in the San Francisco Bay Area. The network
consists predominantly of freeways and expressways, and has a total of 586 links, 310 nodes
and 1,125 bridges. In this network, traffic originates and terminates at 46 nodes, termed
centroidal nodes. Transportation network performance is usually measured in terms of the
total travel time of the network [Shiraki et al., 2007, Stergiou and Kiremidjian, 2006]. The
total travel time is obtained using the user-equilibrium principle, which states that, under
equilibrium, each user would choose the
Figure C.1: The aggregated San Francisco bay area transportation network.
path that would minimize his or her travel time [Beckman et al., 1956]. The user-equilibrium
formulation is solved using the commonly used algorithm of Frank and Wolfe [1956].
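The sketch below shows the Frank-Wolfe iteration for user equilibrium on a deliberately simplified network: one origin-destination pair joined by parallel routes that share no links. It assumes the BPR link performance function $t = t_0[1 + 0.15(v/c)^4]$ [Bureau of Public Roads, 1964] for link travel times; the demand, free-flow times and capacities are hypothetical, and representing bridge damage as a reduced capacity is likewise an assumption for illustration. Each iteration assigns all demand to the currently fastest route (all-or-nothing), then a bisection line search on the Beckmann objective chooses how far to move toward that assignment.

```python
import numpy as np


def bpr_time(flow, t0, cap, alpha=0.15, beta=4.0):
    """BPR link performance function (assumed): time rises with volume/capacity."""
    return t0 * (1.0 + alpha * (flow / cap) ** beta)


def frank_wolfe_parallel(demand, t0, cap, iters=200, tol=1e-9):
    """User-equilibrium flows for one origin-destination pair served by
    parallel routes with no shared links (a simplified test network)."""
    t0 = np.asarray(t0, dtype=float)
    cap = np.asarray(cap, dtype=float)

    def all_or_nothing(times):
        # Assign the entire demand to the currently fastest route
        y = np.zeros_like(t0)
        y[np.argmin(times)] = demand
        return y

    x = all_or_nothing(t0)  # start from the free-flow fastest route
    for _ in range(iters):
        t = bpr_time(x, t0, cap)
        y = all_or_nothing(t)
        # Relative gap is zero when no unused route is faster than the used ones
        if (t @ x - t @ y) / (t @ x) < tol:
            break
        d = y - x
        # Bisection for the step that zeroes the directional derivative of the
        # Beckmann objective, sum_a d_a * t_a(x_a + step * d_a), increasing in step
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if d @ bpr_time(x + mid * d, t0, cap) > 0.0:
                hi = mid
            else:
                lo = mid
        x = x + 0.5 * (lo + hi) * d
    return x, bpr_time(x, t0, cap)


# Hypothetical network: demand in vehicles/hour, times in minutes
flows, times = frank_wolfe_parallel(4000.0, t0=[10.0, 15.0], cap=[2000.0, 2500.0])
total_time = flows @ times
# Hypothetical damage: a bridge on route 1 halves that route's capacity
dmg_flows, dmg_times = frank_wolfe_parallel(4000.0, t0=[10.0, 15.0], cap=[1000.0, 2500.0])
delay = dmg_flows @ dmg_times - total_time  # increase in total travel time
```

At equilibrium every used route has the same travel time, which is the user-equilibrium condition stated above; on a real network the all-or-nothing step requires a shortest-path search between every origin-destination pair instead of a single `argmin`.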
The changes in the network travel time after an earthquake are due to structural damage
to bridges, which results in link closures and reduced link capacities. (The current work
considers only the change in the total network travel time, and omits monetary costs due
to structural damage.) Thus, the loss assessment is carried out by accounting for the
structural damage to bridges caused by each simulated ground-motion map (obtained using
the simulation-based procedure described in Section C.2) and computing the network travel
time in the damaged state. (In the current work, only peak-hour demands and travel times
are considered.) Figure C.2 shows the loss estimates in the form of a recurrence curve,
which gives the rate of exceeding various travel time delays. These loss estimates (i.e.,
travel time delays) are used in the deaggregation calculations.
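A recurrence curve such as Figure C.2 can be assembled from the simulated losses by summing the annual rates of the maps whose loss exceeds each threshold. The sketch below assumes each simulated map carries an annual rate; for plain Monte Carlo sampling in which $N$ maps are drawn in proportion to the earthquake rates of a source model with total rate $\nu$, each map's rate is $\nu/N$. The value of $\nu$ and the loss values are random placeholders, not the results of this study.

```python
import numpy as np


def loss_exceedance_rates(losses, map_rates, thresholds):
    """lambda(Loss > x) for each threshold x: the summed annual rates of the
    simulated maps whose loss exceeds x."""
    losses = np.asarray(losses, dtype=float)
    map_rates = np.asarray(map_rates, dtype=float)
    return np.array([map_rates[losses > x].sum() for x in thresholds])


# Hypothetical inputs: 15,000 maps sampled in proportion to event rates from a
# source model whose total annual rate of modeled earthquakes is nu
rng = np.random.default_rng(1)
n_maps, nu = 15000, 0.5
delays = rng.lognormal(mean=7.0, sigma=1.5, size=n_maps)  # placeholder losses (hours)
map_rates = np.full(n_maps, nu / n_maps)
curve = loss_exceedance_rates(delays, map_rates, [0.0, 5000.0, 10000.0, 20000.0])
```

Keeping an explicit rate per map also accommodates sampling schemes in which maps are not drawn in proportion to their rates, since each map's weight then carries the correction.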
Figure C.2: Recurrence curve for the travel time delay obtained using the simulation-based framework.
C.5 Results and Discussion
This section presents the results from the deaggregation calculations, which include the
contribution of magnitudes, faults, inter-event residuals and mean intra-event residuals to
lifeline losses. The estimates are obtained using equations analogous to Equations C.2 and
C.3, where the required recurrence rates are estimated using the simulation-based loss
assessment framework described in the previous sections. For instance, if 100 out of 15,000
simulated events involve an earthquake of magnitude 7 and a loss (i.e., travel time delay)
exceeding 10,000 hours, then $P(\text{Loss} > 10{,}000,\ \text{Magnitude} = 7) = 100/15{,}000 \approx 0.0067$.
C.5.1 Contribution of magnitudes and faults to the lifeline losses
Figure C.3 shows the contribution (i.e., the likelihood term obtained from Equation C.3) of
various magnitudes and faults to the probability of exceeding four different travel time delay
thresholds, namely, 0 hours, 5,000 hours, 10,000 hours and 20,000 hours. (The total travel
time in the network during normal operating conditions equals 73,000 hours.) In order
to obtain the contributions of discrete magnitudes to the loss exceedance, earthquakes of
Figure C.3: Joint likelihoods of magnitudes and faults given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.
different magnitudes need to be pooled into bins centered on discrete magnitude values. In
the current work, the bin width is chosen to be 0.5; for instance, all magnitudes between
7.75 and 8.25 are classified as magnitude 8.
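The magnitude binning described above amounts to rounding each magnitude to the nearest multiple of the bin width. The sketch below is illustrative; the treatment of magnitudes falling exactly on a bin edge (here 7.75 maps to 8.0 and 8.25 to 8.5) is an assumption, since the text does not specify it.

```python
import numpy as np


def bin_magnitude(m, width=0.5):
    """Map each magnitude to its bin center, a multiple of `width`.

    With width 0.5, magnitudes in [7.75, 8.25) map to 8.0; including the
    lower edge and excluding the upper edge is an assumed convention.
    """
    m = np.asarray(m, dtype=float)
    return np.floor(m / width + 0.5) * width


binned = bin_magnitude([6.6, 7.75, 8.05, 8.25])  # 6.5, 8.0, 8.0, 8.5
```

The binned values can then serve as the discrete magnitudes $m$ in a counting estimate of Equations C.2 and C.3.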
Figure C.3 shows that, at small loss thresholds, small magnitude events contribute
significantly to the loss, which is understandable since small magnitude events are much
more probable than large magnitude events. Also, as seen from Figure C.3, the loss is
typically dominated by events on the northern segment of the San Andreas Fault. This is
because the rate of earthquake occurrence on the San Andreas Fault is much larger than that
on the other faults.
At moderate loss levels (5,000-10,000 hours), a significant portion of the contribution
is shared by earthquake events on the Hayward and the San Andreas Faults. Events of
magnitude close to 7 on the Hayward Fault and of magnitude around 8 on the San Andreas
Fault are ‘characteristic events’ on the respective faults [USGS, 2003]. In other words,
these earthquakes are known to occur on a fairly regular basis and hence are more likely
Figure C.4: Level of congestion in the network as indicated by the volume/capacity ratio.
than even some of the smaller magnitude events on these faults. Figure C.3 shows that the
characteristic events contribute most to the moderate losses by virtue of their higher
likelihoods of occurrence. Further, an event of magnitude 7 on the Hayward Fault has a
slightly larger contribution than a much larger event (magnitude 8) on the San Andreas
Fault. This is because the Hayward Fault runs through the middle of the network, while the
San Andreas Fault lies along its western edge. As a result, an event on the Hayward Fault
causes moderate damage to links throughout the network, while the San Andreas event causes
extensive damage to the western part of the network and very little damage to the eastern
part. The overall effect is that both events make nearly equal contributions to the losses.
At large loss levels (20,000 hours), however, events on the San Andreas Fault again
dominate the loss exceedance. Of all the links in the transportation network, the most
congested ones under normal operating conditions are in the western portion of the network.
This can be seen from Figure C.4, which shows the volume of traffic on each link normalized
by the link capacity. Large travel time delays are incurred if links that are congested under
normal conditions (volume/capacity greater than 0.75) suffer damage, increasing the
congestion even further. This happens when a moderate to large event occurs
Figure C.5: Joint likelihoods of mean intra-event residual given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.
on the San Andreas Fault (which is adjacent to several congested links) and has large
positive residuals; such a scenario is therefore the primary cause of large delays in the
network. It can be seen from the above discussion that the most-likely scenario depends on
the loss level of interest, and is influenced by factors such as the seismicity of the region,
the location of the lifeline with respect to the faults and the performance state of the
various components of the lifeline under normal operating conditions. In fact, for certain
loss levels, it may not even be possible to choose a single dominating event, as shown in
Figures C.3b and C.3c, which show nearly equal contributions from events on the Hayward
and the San Andreas Faults.
Figure C.6: Joint likelihoods of inter-event residual given that travel time delay exceeds (a) 0 hours, (b) 5,000 hours, (c) 10,000 hours and (d) 20,000 hours.
C.5.2 Contribution of inter- and intra-event residuals to the lifeline loss
Figures C.5 and C.6 show the contributions of mean intra-event and inter-event residuals,
respectively, to the probability of exceeding four different travel time delay thresholds. As
expected, events with residuals close to zero (the mean value) dominate small seismic losses.
As the loss level increases, the contribution of large inter-event and large mean intra-event
residuals increases rapidly. It can be seen from Figures C.5d and C.6d that, at a loss threshold
of 20,000 hours, significant contributions come from mean intra-event residuals between
0.3 and 0.5 and inter-event residuals between 1.5 and 3. These results are perhaps not
surprising given the large effect that inter-event and intra-event residuals have on the
resulting ground motions. Since the inter-event residual is constant across the entire region,
a large positive value increases the ground-motion intensity at every site in the region. As a
consequence, appropriate consideration of the inter-event residual is even more important
when assessing lifeline losses than when assessing the losses for a single structure.
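One way to see why the inter-event residual matters so much at the network scale is to compare the variability of the two residual terms after averaging over the $n$ sites of interest. Assuming, for illustration only, an equal intra-event standard deviation $\sigma$ at all sites and intra-event correlation coefficients $\rho_{ij}$ between sites $i$ and $j$, the mean intra-event residual and its variance are

$$\bar{\varepsilon} = \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i, \qquad \operatorname{Var}(\bar{\varepsilon}) = \frac{\sigma^2}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\rho_{ij} \le \sigma^2, \qquad \operatorname{Var}(\eta) = \tau^2 .$$

Because $|\rho_{ij}| \le 1$, averaging over sites can only shrink the intra-event contribution (equality holds only when all sites are perfectly correlated), whereas the inter-event residual enters every site unchanged. This may help explain why mean intra-event residuals of only 0.3 to 0.5 already make significant contributions to the largest losses.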
Figure C.7: Mean magnitude of earthquakes producing a travel time delay exceeding a specified threshold.
Figure C.8: (a) Average of mean intra-event residual of earthquakes producing a travel time delay exceeding a specified threshold; (b) Average of inter-event residual of earthquakes producing a travel time delay exceeding a specified threshold.
Figure C.9: Recurrence curves obtained without completely accounting for inter-event and intra-event residuals.
Finally, Figures C.7 and C.8 summarize the findings from the deaggregation calcula-
tions, and illustrate the variation in the mean magnitude and the mean residuals of the
ground-motion scenarios that contribute to the probability of exceeding various lifeline
loss thresholds. For instance, the mean magnitude causing a travel time delay exceeding
x hours is obtained by averaging the magnitudes of all earthquakes that produce a travel
time delay greater than x hours. The figures show that the mean magnitude, mean inter-event
residual and average mean intra-event residual all increase rapidly as the travel time delay
threshold increases. (Some of the wiggles seen at large thresholds are due to small sample
sizes at these thresholds.) Most of the extremely large losses occur at magnitudes well
below the maximum (8.05 in this source model), which indicates that large losses are
typically caused by moderately large events combined with large positive residuals
(Figure C.8), as explained previously. This result can be understood intuitively as follows:
while ‘maximum magnitude’ events certainly cause large losses, they occur so infrequently
that, in many cases, more common moderate magnitude events may be more important.
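The conditional means plotted in Figures C.7 and C.8 can be computed as sketched below: for a threshold x, average the magnitude, the inter-event residual and the mean intra-event residual (the site average of each event's intra-event residuals) over the simulated events whose loss exceeds x. Returning the number of such events makes the small-sample wiggles at large thresholds easy to diagnose. The function name and the input arrays are hypothetical.

```python
import numpy as np


def conditional_means(magnitudes, eta, eps_sites, losses, x):
    """Mean magnitude, mean inter-event residual and average mean intra-event
    residual of the simulated events whose loss exceeds x, plus their count."""
    magnitudes = np.asarray(magnitudes, dtype=float)
    eta = np.asarray(eta, dtype=float)
    # Mean intra-event residual of each event: average over its sites
    eps_bar = np.asarray(eps_sites, dtype=float).mean(axis=1)
    exceed = np.asarray(losses, dtype=float) > x
    n_exceed = int(exceed.sum())
    if n_exceed == 0:
        return None  # no simulated event exceeds x
    return magnitudes[exceed].mean(), eta[exceed].mean(), eps_bar[exceed].mean(), n_exceed


# Hypothetical events: 3 earthquakes, intra-event residuals at 2 sites each
mags = [6.5, 7.0, 7.5]
eta = [0.0, 1.0, 2.0]
eps_sites = [[0.1, -0.1], [0.2, 0.4], [0.5, 0.3]]
delays = [100.0, 6000.0, 15000.0]  # hours
result = conditional_means(mags, eta, eps_sites, delays, 5000.0)  # about (7.25, 1.5, 0.35, 2)
```

Evaluating this over a grid of thresholds traces out curves of the kind shown in Figures C.7 and C.8; a small `n_exceed` flags thresholds where the averages are statistically noisy.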
To further emphasize the importance of the residuals, the loss assessment for the aggregated
network was repeated without considering one or both types of residuals (i.e., the inter-event
and the intra-event residuals). The resulting recurrence curves are shown in Figure C.9. The
figure shows that the loss is significantly underestimated if even one of the two types of
residuals is not considered. This is to be expected given the previous observation that large
loss levels typically result from events of moderately large magnitude with large positive
residuals rather than from events of extremely large magnitude with zero residuals.
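A single-site sketch illustrates the direction of this bias. It assumes $\ln Y$ is normally distributed about the predicted median with a total log-standard deviation of $\sqrt{\sigma^2 + \tau^2}$ (both residuals included), $\sigma$ (inter-event residual omitted) or zero (median-only analysis); the median, threshold and standard deviations are hypothetical, and the network calculation behind Figure C.9 is considerably more involved than this.

```python
import math


def prob_exceed(median, threshold, sigma_ln):
    """P(Y > threshold) for lognormal Y with the given median and log-standard
    deviation; sigma_ln = 0 reproduces a median-only (no-residual) analysis."""
    if sigma_ln == 0.0:
        return 1.0 if median > threshold else 0.0
    z = (math.log(threshold) - math.log(median)) / sigma_ln
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)


sigma, tau = 0.6, 0.3         # hypothetical standard deviations of ln(Y)
median, threshold = 0.3, 0.6  # hypothetical median and damage threshold (g)
p_both = prob_exceed(median, threshold, math.sqrt(sigma ** 2 + tau ** 2))
p_intra_only = prob_exceed(median, threshold, sigma)
p_median_only = prob_exceed(median, threshold, 0.0)  # threshold never exceeded
```

With these values the exceedance probability is roughly 0.15 with both residuals, about 0.12 with only the intra-event residual, and zero for the median-only analysis; in a network the shared inter-event residual additionally makes many sites exceed their thresholds together.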
C.6 Transportation network performance under sample scenario ground-motion maps
This section provides a graphical illustration of why residuals play an important part in
determining the losses to the transportation network. The performance of the network is
analyzed under three different ground-motion scenarios, namely, A, B and C. All three
scenarios result from an earthquake of magnitude 8.1 on the northern segment of the San
Andreas Fault, and have a mean intra-event residual of approximately zero. The value of
the inter-event residual equals 3.79 in scenario A, -1.64 in scenario B and 0 in scenario C.
Figure C.10 graphically shows the performance of the transportation network under the
three ground-motion scenarios. Thicker lines indicate links experiencing larger increases
in the travel times. It can be seen that the delays are much greater under scenario A than
under scenarios B and C. In fact, the travel time delay in the network equals 32,600 hours
under scenario A, 1,550 hours under scenario B and 4,580 hours under scenario C. These
large differences are a result of the differences in the inter-event residual, since the
predicted median ground-motion intensities in all three cases are identical and the mean
intra-event residuals are all approximately zero.
C.7 Conclusions
In this paper, a probabilistic simulation-based loss assessment procedure is coupled with
a deaggregation calculation that can identify the ground-motion scenarios most likely to
Figure C.10: Performance of the network under three different ground-motion scenarios corresponding to three different inter-event residuals: (a) η = 3.79, (b) η = -1.64 and (c) η = 0.
produce exceedance of a given loss threshold for a spatially-distributed lifeline system.
The deaggregation calculation quantifies the likelihood that various events (magnitudes,
faults, inter-event and intra-event residuals) could have produced the exceedance of a given
loss threshold. In the current work, deaggregation calculations are performed to identify
the likelihoods of earthquake events that cause various levels of travel time delays (the
lifeline loss measure) in an aggregated form of the San Francisco Bay Area transportation
network. The deaggregation calculations indicate that the ‘most-likely’ scenario depends
on the loss level of interest, and is influenced by factors such as the seismicity of the
region, the location of the lifeline with respect to the faults and the performance state
of the various components of the lifeline under normal operating conditions. In fact, for
certain loss levels, it is seen that two different events (different magnitudes and faults)
could have similar contributions to the loss exceedance, making it impossible to identify
a single most-likely scenario earthquake. The deaggregation calculations also show that
large losses are typically caused by moderately large magnitude events with large positive
values of inter-event and intra-event residuals, indicating that it is very important to
account appropriately for the residuals in the loss assessment framework. Loss assessments
carried out without accounting for either the inter-event or the intra-event residuals produce
highly biased (underestimated) loss estimates.
Bibliography
B.T. Aagaard, T.M. Brocher, D. Dolenc, D. Dreger, W. Graves, S. Harmsen, S. Hartzell,
S. Larsen, and M.L. Zoback. Ground-motion modeling of the 1906 San Francisco earth-
quake, part I: Validation using the 1989 Loma Prieta earthquake. Bulletin of the Seismo-
logical Society of America, 98(2):989–1011, 2008.
S.D. Aberson and J.L. Franklin. Impact on hurricane track and intensity forecasts of GPS
dropwindsonde observations from the first-season flights of the NOAA Gulfstream-IV
jet aircraft. Bulletin of the American Meteorological Society, 80(3):421–428, 1999.
N.A. Abrahamson. Statistical properties of peak ground accelerations recorded by the
SMART 1 array. Bulletin of the Seismological Society of America, 78(1):26–41, 1988.
N.A. Abrahamson. Seismic hazard assessment: Problems with current practice and fu-
ture developments. Keynote address in the First European Conference on Earthquake
Engineering and Seismology, Geneva, Switzerland, 2006.
N.A. Abrahamson and W.J. Silva. Summary of the Abrahamson & Silva NGA ground-motion
relations. Earthquake Spectra, 24(1):67–97, 2008.
N.A. Abrahamson and R.R. Youngs. A stable algorithm for regression analyses using the
random effects model. Bulletin of the Seismological Society of America, 82(1):505–510,
1992.
T. Adachi and B.R. Ellingwood. Serviceability of earthquake-damaged water systems: Ef-
fects of electrical power availability and power backup systems on system vulnerability.
Reliability Engineering and System Safety, 93:78–88, 2008.
BIBLIOGRAPHY 274
T. Annaka, F. Yamazaki, and F. Katahira. Proposal of peak ground velocity and response
spectra based on JMA 87 type accelerometer records. Proceedings, 27th JSCE Earth-
quake Engineering Symposium, 1:161–164, 1997.
ASCE Standard 7-02. Minimum design loads for buildings and other structures. Technical
report, Reston (VA): American Society of Civil Engineers, 2003.
J.W. Baker. Quantitative classification of near–fault ground motions using wavelet analysis.
Bulletin of the Seismological Society of America, 97(5):1486–1501, 2007a.
J.W. Baker. Quantitative classification of near–fault ground motions using wavelet analysis.
Bulletin of the Seismological Society of America, 97(5):1486–1501, 2007b.
J.W. Baker and C.A. Cornell. Correlation of response spectral values for multicomponent
ground motions. Bulletin of the Seismological Society of America, 96(1):215–227, 2006.
J.W. Baker and N. Jayaram. Correlation of spectral acceleration values from NGA ground
motion models. Earthquake Spectra, 24(1):299–317, 2008.
J.W. Baker and N. Jayaram. Effects of spatial correlation of ground-motion parameters
for multi-site risk assessment: Collaborative research with Stanford University and
AIR. Technical report, Report for U.S. Geological Survey National Earthquake Hazards
Reduction Program (NEHRP) External Research Program Awards 07HQGR0031, 2009.
N. Basoz and A.S. Kiremidjian. Risk assessment for highway transportation systems. Tech-
nical report, Report No. 118, Blume Earthquake Engineering Center, Stanford Univer-
sity, 1996.
M.E. Batts, M.R. Cordes, L.R. Russel, J.R. Shaver, and E. Simiu. Hurricane wind speeds in
the United States. Technical report, Report No. BSS-124, National Bureau of Standards,
U.S. Department of Commerce, Washington, D.C., 1980.
P. Bazzurro. Personal communication, 2010.
P. Bazzurro and C.A. Cornell. Vector-valued probabilistic seismic hazard analysis (VP-
SHA). In Proceedings of the 7th U.S. National Conference on Earthquake Engineering,
Boston, MA, 2002.
P. Bazzurro and N. Luco. Effects of different sources of uncertainty and
correlation on earthquake-generated losses. Technical report, Presented at
IFED: International Forum on Engineering Decision Making, Stoos, Switzerland.
http://www.ifed.ethz.ch/events/Forum04/Bazzurro paper.pdf, 2004.
P. Bazzurro, J. Park, P. Tothong, and N. Jayaram. Effects of spatial correlation of ground-
motion parameters for multi-site risk assessment: Collaborative research with Stanford
University and AIR. Technical report, Report for U.S. Geological Survey National
Earthquake Hazards Reduction Program (NEHRP) External Research Program Awards
07HQGR0032, 2008.
M.J. Beckman, C.B. McGuire, and C.B. Winsten. Studies in the economics of transportation.
Technical report, Cowles Commission Monograph, New Haven, Conn.: Yale University
Press, 1956.
M. Bensi, A. Der Kiureghian, and D. Straub. A Bayesian network framework for post-
earthquake infrastructure performance assessment. In Proceedings, TCLEE2009 Con-
ference: Lifeline Earthquake Engineering in a Multihazard Environment, Oakland, Cal-
ifornia, 2009a.
M. Bensi, D. Straub, P. Friis-Hansen, and A. Der Kiureghian. Modeling infrastructure
system performance using BN. In 10th International Conference on Structural Safety
and Reliability (ICOSSAR09), Osaka, Japan, 2009b.
J.J. Bommer and N.A. Abrahamson. Why do modern probabilistic seismic-hazard analy-
ses often lead to increased hazard estimates? Bulletin of the Seismological Society of
America, 96(6):1967–1977, 2006.
J.J. Bommer, N.A. Abrahamson, F.O. Strasser, A. Pecker, P.Y. Bard, H. Bungum, F. Cotton,
D. Fah, F. Sabetta, F. Scherbaum, and J. Studer. The challenge of defining upper bounds
on earthquake ground motions. Seismological Research Letters, 75(1):82–95, 2004.
D.M. Boore and G.M. Atkinson. Ground-motion prediction equations for the average hor-
izontal component of PGA, PGV and 5% damped SA at spectral periods between 0.01s
and 10.0s. Earthquake Spectra, 24(1):99–138, 2008.
D.M. Boore, J.F. Gibbs, W.B. Joyner, J.C. Tinsley, and D.J. Ponti. Estimated ground motion
from the 1994 Northridge, California, earthquake at the site of the Interstate 10 and
La Cienega Boulevard bridge collapse, West Los Angeles, California. Bulletin of the
Seismological Society of America, 93(6), 2003.
D.M. Boore, J. Watson-Lamprey, and N.A. Abrahamson. Orientation-independent mea-
sures of ground motion. Bulletin of the Seismological Society of America, 96(4):1502–
1511, 2006.
R.D. Borcherdt. Estimates of site-dependent response spectra for design (methodology and
justification). Earthquake Spectra, 10:617–653, 1994.
L. Brieman, J.H. Friedman, J.H. Olshen, and C.J. Stone. CART: Classification and Regres-
sion Trees. Belmont, CA: Wadsworth, 1983.
D.R. Brillinger and H.K. Preisler. An exploratory analysis of the Joyner-Boore attenuation
data. Bulletin of the Seismological Society of America, 74:1441–1450, 1984a.
D.R. Brillinger and H.K. Preisler. Further analysis of the Joyner-Boore attenuation data.
Bulletin of the Seismological Society of America, 75:611–614, 1984b.
Bureau of Public Roads. Traffic assignment manual. U.S. Dept. of Commerce, Urban
Planning Division, Washington D.C., 1964.
K. Campbell. Personal Communication, 2009.
K.W. Campbell and Y. Bozorgnia. NGA ground motion model for the geometric mean hor-
izontal component of PGA, PGV, PGD and 5% damped linear elastic response spectra
for periods ranging from 0.01 to 10s. Earthquake Spectra, 24(1):139–171, 2008.
K.W. Campbell and H.A. Seligson. Quantitative method for developing hazard-consistent
earthquake scenarios. In Proceedings of the 6th U.S. Conference and Workshop on Lifeline
Earthquake Engineering, Long Beach, CA, 2003.
S. Castellaro, F. Mulargia, and P. L. Rossi. Vs30: Proxy for seismic amplification. Seismo-
logical Research Letters, 79(4):540–543, 2008.
CESMD database. Center for Engineering Strong Motion Data,
http://www.strongmotioncenter.org (last accessed 16 March 2010), 2008.
S. Chang. Evaluating disaster mitigations: Methodology for urban infrastructure systems.
Natural Hazards, 4(4):186–196, 2003.
S. Chang, M. Shinozuka, and K.E. Moore II. Probabilistic earthquake scenarios: extending
risk analysis methodologies to spatially distributed systems. Earthquake Spectra, 16:
557–572, 2000.
B. Chiou, R. Darragh, N. Gregor, and W. Silva. NGA project strong-motion database.
Earthquake Spectra, 24(1):23–44, 2008.
B.S-J. Chiou and R.R. Youngs. An NGA model for the average horizontal component of
peak ground motion and response spectra. Earthquake Spectra, 24(1):173–215, 2008.
C.A. Cornell. Engineering seismic risk analysis. Bulletin of the Seismological Society of
America, 58(5):1583–1606, 1968.
C.A. Cornell and H. Krawinkler. Progress and challenges in seismic performance assess-
ment. PEER Center News 2000; 3(2), 2000.
H. Crowley and J.J. Bommer. Modelling seismic hazard in earthquake loss models with
spatially distributed exposure. Bulletin of Earthquake Engineering, 4(3):249–273, 2006.
A.C. Davison and D.V. Hinkley. Bootstrap Methods and Their Application. Cambridge
University Press, 1997.
A.C. Davison, D.V. Hinkley, and E. Schechtman. Efficient bootstrap simulation. Biomet-
rica, 73(3):555–566, 1986.
G.G. Deierlein. Overview of a comprehensive framework for earthquake performance as-
sessment. Technical report, International Workshop on Performance-Based Seismic De-
sign Concepts and Implementation, Bled, Slovenia, 2004.
A. Der Kiureghian. A coherency model for spatially varying ground motions. Earthquake
Engineering and Structural Dynamics, 25:99–111, 1996.
A. Der Kiureghian. Seismic risk assessment and management of infrastructure systems:
Review and new perspectives. In 10th International Conference on Structural Safety
and Reliability (ICOSSAR09), Osaka, Japan, 2009.
C.V. Deutsch and A.G. Journel. Geostatistical Software Library and User’s Guide. Oxford
University Press, Oxford, New York, 1998.
L. Duenas-Osorio, J.I. Craig, B.J. Goodno, and A. Bostrom. Interdependent response of
networked systems. Journal of Infrastructure Systems, 13(3):185–194, 2005.
B. Efron. An introduction to the bootstrap. CRC Press LLC, 1998.
B. Efron and R.J. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall/CRC,
1997.
G.S. Fishman. A First Course in Monte Carlo. Duxbury, Belmont, CA, 2006.
M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics
Quarterly, 3:95–110, 1956.
J.L. Franklin, S.J. Lord, S.E. Feuer, and F.D. Marks. The kinematic structure of Hurricane
Gloria (1985) determined from nested analyses of dropwindsonde and Doppler data.
Monthly Weather Review, 121:2433–2451, 1993.
J.H. Friedman. Greedy Function Approximation: A Gradient Boosting Machine. Technical
report, Stanford University, 1999.
J.H. Friedman. Tutorial: Getting Started with MART in R. http://www-stat.stanford.edu/~jhf/r-mart/tutorial/tutorial.pdf, 2002.
T.L. Friesz, D. Bernstein, T.E. Smith, R.L. Tobin, and B.W. Wie. A variational inequality
formulation of the dynamic network user equilibrium problem. Operations Research,
41:179–191, 1993.
A. Gersho and R.M. Gray. Vector Quantization and Signal Compression. Springer, 1991.
Global Vs30 map server. http://earthquake.usgs.gov/hazards/apps/vs30/ (last accessed 16
March 2010), 2008.
K. Goda and H.P. Hong. Spatial correlation of peak ground motions and response spectra.
Bulletin of the Seismological Society of America, 98(1):354–365, 2008.
P. Goovaerts. Geostatistics for Natural Resources Evaluation. Oxford University Press,
Oxford, New York, 1997.
R. Graves. Broadband ground motion simulations for the Puente Hills thrust system. Report
for U.S. Geological Survey National Earthquake Hazards Reduction Program (NEHRP)
External Research Program Awards 05HQGR0076, 2006.
R. Graves. Broadband ground motion simulations for the Puente Hills thrust system.
Personal communication, 2007.
P. Grossi and H. Kunreuther. Catastrophe Modeling: A New Approach to Managing Risk.
Springer, New York, 2005.
S.D. Guikema. Natural disaster risk analysis for critical infrastructure systems: An approach
based on statistical learning theory. Reliability Engineering & System Safety,
94:855–860, 2009.
P. Hall. Performance of balanced bootstrap resampling in distribution function and quantile
problems. Probability Theory and Related Fields, 85(2):239–260, 2005.
C. Hanson, M. McCann, C. Stevens, J. Rosenfield, P. Rawlings, T. Cooke, A. Fraser, and
K. Karkanen. Delta risk management strategy. Technical report, Department of Water
Resources, 2008.
T. Hayashi, S. Fukushima, and H. Yashiro. Effects of the spatial correlation in ground motion
on the seismic risk of portfolio of buildings. First European Conference on Earthquake
Engineering and Seismology, Geneva, Switzerland, 2006.
HAZUS. Earthquake loss estimation methodology. Technical manual. Prepared by the
National Institute of Building Sciences for Federal Emergency Management Agency,
1997.
HAZUS. Earthquake loss estimation technical manual. Technical report, National Institute
of Building Sciences, Washington D.C., 1999.
HAZUS. Multihazard loss estimation methodology: Hurricane model. Technical manual.
Prepared by the National Institute of Building Sciences for Federal Emergency Manage-
ment Agency, 2006.
N. Henze and B. Zirkler. A class of invariant consistent tests for multivariate normality.
Communications in Statistics-Theory and Methods, 19:3595–3618, 1990.
H.P. Hong, Y. Zhang, and K. Goda. Effect of spatial correlation on estimated ground-
motion prediction equations. Bulletin of the Seismological Society of America, 99(2A):
928–934, 2009.
S.H. Houston, W.A. Shaffer, M.D. Powell, and J. Chen. Comparisons of HRD and SLOSH
Surface Wind Fields in Hurricanes: Implications for Storm Surge Modeling. Weather
and Forecasting, 14:671–686, 1999.
B.R. Jarvinen, C.J. Neumann, and M.A.S. Davis. A tropical cyclone data tape for the North
Atlantic Basin 1886-1983: Contents, limitations and use. Technical report, NOAA Tech-
nical Memorandum No. NWS-NHC-22, U.S. Department of Commerce, Washington,
D.C., 1984.
N. Jayaram and J.W. Baker. Statistical tests of the joint distribution of spectral acceleration
values. Bulletin of the Seismological Society of America, 98(5):2231–2243, 2008.
N. Jayaram and J.W. Baker. Correlation model for spatially-distributed ground-motion in-
tensities. Earthquake Engineering and Structural Dynamics, 38(15):1687–1708, 2009a.
N. Jayaram and J.W. Baker. Deaggregation of lifeline risk: Insights for choosing deter-
ministic scenario earthquakes. In Proceedings, TCLEE2009 Conference: Lifeline Earth-
quake Engineering in a Multihazard Environment, Oakland, California, 2009b.
N. Jayaram and J.W. Baker. Efficient sampling and data reduction techniques for probabilis-
tic seismic lifeline risk assessment. Earthquake Engineering and Structural Dynamics
(published online), 2010.
R.A. Johnson and D.W. Wichern. Applied Multivariate Statistical Analysis. Prentice Hall,
Upper Saddle River, NJ, 2007.
W.B. Joyner and D.M. Boore. Methods for regression analysis of strong-motion data.
Bulletin of the Seismological Society of America, 83(2):469–487, 1993.
W.H. Kang, J. Song, and P. Gardoni. Matrix-based system reliability method and applications
to bridge networks. Reliability Engineering & System Safety, 93(11):1584–1593,
2008.
KiK Net. http://www.kik.bosai.go.jp/ (last accessed 16 March 2010), 2007.
A.S. Kiremidjian, J. Moore, Y.Y. Fan, N. Basoz, O. Yazlali, and M. Williams. PEER highway
demonstration project. In 6th US Conference and Workshop on Lifeline Earthquake
Engineering, TCLEE/ASCE, Monograph No.25, Long Beach, CA, 2003.
A.S. Kiremidjian, E. Stergiou, and R. Lee. Issues in seismic risk assessment of trans-
portation networks. Chapter 19, Earthquake Geotechnical Engineering, pages 939–964.
Springer, 2007.
S.L. Kramer. Geotechnical Earthquake Engineering. Prentice Hall, Upper Saddle River,
New Jersey, 1996.
M.H. Kutner, C.J. Nachtsheim, J. Neter, and W. Li. Applied Linear Statistical Models. The
McGraw-Hill Companies Inc., New York, 2005.
C.W. Landsea, C. Anderson, N. Charles, G. Clark, J. Dunion, J. Fernandez-Partagas,
P. Hungerford, C. Neumann, and M. Zimmer. The Atlantic hurricane database re-analysis
project: Documentation for 1851–1910 alterations and additions to the HURDAT database.
In Hurricanes and Typhoons: Past, Present and Future, edited by R.J. Murnane and
K.B. Liu, Columbia Univ. Press, NY, pages 177–221, 2004.
A.M. Law and W.D. Kelton. Simulation Modeling and Analysis. McGraw-Hill, 2007.
K.H. Lee and D.V. Rosowsky. Synthetic hurricane wind speed records: Development of
a database for hazard analyses and risk studies. Natural Hazards Review, 8(2):23–34,
2007.
R. Lee and A.S. Kiremidjian. Uncertainty and correlation for loss assessment of spatially
distributed systems. Earthquake Spectra, 23(4):743–770, 2007.
M.R. Legg, L.K. Nozick, and R.A. Davidson. Optimizing the selection of hazard-consistent
probabilistic scenarios for long-term regional hurricane loss estimation. Structural
Safety, 32:90–100, 2010.
E.L. Lehmann and G. Casella. Theory of Point Estimation. Springer, 2nd edition, 2003.
Y. Li and B.R. Ellingwood. Hurricane damage to residential construction in the US: Im-
portance of uncertainty modeling in risk assessment. Engineering Structures, 28:1009–
1018, 2009.
K. Mardia. Measures of multivariate skewness and kurtosis with applications. Biometrika,
57:519–530, 1970.
K.V. Mardia, J.T. Kent, and J.M. Bibby. Multivariate Analysis. Academic Press, 1979.
R.K. McGuire. Seismic Hazard and Risk Analysis. Earthquake Engineering Research
Institute, 2007.
J.B. MacQueen. Some methods for classification and analysis of multivariate observations.
In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probabil-
ity, Berkeley, CA, 1967.
C.J. Mecklin and D.J. Mundfrom. On using asymptotic critical values in testing for
multivariate normality. InterStat,
http://interstat.statjournals.net/YEAR/2003/abstracts/0301001.php (last accessed
16 March 2010), 2003.
K.V. Ooyama. Scale controlled objective analysis. Monthly Weather Review, 115:2479–
2506, 1987.
J. Park, P. Bazzurro, and J.W. Baker. Modeling spatial correlation of ground motion in-
tensity measures for regional seismic hazard and portfolio loss estimation. 10th Inter-
national Conference on Application of Statistics and Probability in Civil Engineering
(ICASP10), Tokyo, Japan, 2007.
PEER NGA Database. http://peer.berkeley.edu/nga (last accessed 16 March 2010), 2005.
J.C. Pinheiro and D.M. Bates. Mixed-effects models in S and S-PLUS. Springer, 2000.
M.D. Powell. Evaluations of diagnostic marine boundary layer models applied to hurri-
canes. Monthly Weather Review, 108:757–766, 1980.
M.D. Powell and S.H. Houston. Hurricane Andrew’s landfall in South Florida Part II:
Surface wind fields and potential real-time applications. Weather and Forecasting, 11:329–349,
1996.
M.D. Powell and S.H. Houston. Surface Wind Fields of 1995 Hurricanes Erin, Opal, Luis,
Marilyn, and Roxanne at Landfall. Monthly Weather Review, 126(5):1259–1273, 1997.
M.D. Powell, S.H. Houston, and T.A. Reinhold. Hurricane Andrew’s landfall in South
Florida Part I: Standardizing measurements for documentation of surface wind fields.
Weather and Forecasting, 11:304–328, 1996.
M.D. Powell, S.H. Houston, L.R. Amat, and N. Morisseau-Leroy. The HRD real-time hur-
ricane wind analysis system. Journal of Wind Engineering & Industrial Aerodynamics,
77 & 78:53–64, 1998.
C. Purvis. Peak spreading models: promises and limitations. In 7th TRB Conference on
the Application of Transportation Planning Models, Boston, Massachusetts, 1999.
G.J. Rix, D. Werner, and L.M. Ivey. Seismic risk analyses for container ports. In Pro-
ceedings, TCLEE2009 Conference: Lifeline Earthquake Engineering in a Multihazard
Environment, Oakland, California, 2009.
S.R. Searle. Linear Models. John Wiley and Sons, Inc., 1977.
N. Shiraki, M. Shinozuka, J.E. Moore II, S.E. Chang, H. Kameda, and S. Tanaka. System
risk curves: Probabilistic performance scenarios for highway networks subject to
earthquake damage. Journal of Infrastructure Systems, 13(1):43–54, 2007.
N. Shome and C.A. Cornell. Probabilistic seismic demand analysis of nonlinear structures.
Report No. 35, RMS Program, Stanford, CA. www.stanford.edu/group/rms (last accessed
16 March 2010), 1999.
P.G. Somerville, N.F. Smith, R.W. Graves, and N.A. Abrahamson. Modification of em-
pirical strong ground motion attenuation relations to include the amplitude and duration
effects of rupture directivity. Seismological Research Letters, 68(1):199–222, 1997.
E. Stergiou and A.S. Kiremidjian. Treatment of uncertainties in seismic risk analysis of
transportation systems. Technical report, No. 154, Blume Earthquake Engineering Cen-
ter, Stanford University, 2006.
F.O. Strasser, J.J. Bommer, and N.A. Abrahamson. Truncation of the distribution of
ground-motion residuals. Journal of Seismology, 12(1):79–105, 2008.
D. Straub and A. Der Kiureghian. Improved seismic fragility modeling from empirical
data. Structural Safety, 30:320–336, 2008.
S. Tanaka, M. Shinozuka, A. Schiff, and Y. Kawata. Lifeline seismic performance of elec-
tric power systems during the Northridge earthquake. In Proceedings of the Northridge
Earthquake Research Conference, Los Angeles, California, 1997.
USGS. Earthquake probabilities in the San Francisco Bay Region: 2002-2031. Technical
report, Open File Report 03-214, USGS, 2003.
D. Vamvatsikos and C.A. Cornell. Developing efficient scalar and vector intensity mea-
sures for IDA capacity estimation by incorporating elastic spectral shape information.
Earthquake Engineering and Structural Dynamics, 34(13):1573–1600, 2005.
P.J. Vickery, P.F. Skerlj, A.C. Steckley, and L.A. Twisdale. Hurricane wind field model
for use in hurricane simulation. Journal of Structural Engineering, 126(10):1203–1221,
2000a.
P.J. Vickery, P.F. Skerlj, and L.A. Twisdale. Simulation of hurricane risk in the U.S. using
empirical track model. Journal of Structural Engineering, 126(10):1222–1237, 2000b.
P.J. Vickery, D. Wadhera, M.D. Powell, and Y. Chen. A hurricane boundary layer and wind
field model for use in engineering applications. Journal of Applied Meteorology, 48:
381–405, 2008.
P.J. Vickery, P.F. Skerlj, J. Lin, L.A. Twisdale Jr., M.A. Young, and F.M. Lavelle.
HAZUS-MH hurricane model methodology. II: Damage and loss estimation. Natural Hazards
Review, 7(2):94–103, 2009a.
P.J. Vickery, D. Wadhera, L.A. Twisdale Jr., and F.M. Lavelle. U.S. hurricane wind speed
risk and uncertainty. Journal of Structural Engineering, 135(3):301–320, 2009b.
M.A. Walling. Non-Ergodic Probabilistic Seismic Hazard Analysis and Spatial Simulation
of Variation in Ground Motion. PhD thesis, University of California at Berkeley, 2009.
M. Wang and T. Takada. Macrospatial correlation model of seismic ground motions. Earth-
quake Spectra, 21(4):1137–1156, 2005.
S.D. Werner, C.E. Taylor, J.E. Moore, J.S. Walton, and S. Cho. A risk-based methodology
for assessing the seismic performance of highway systems. Technical report, Multidisciplinary
Center for Earthquake Engineering Research, University at Buffalo, Buffalo,
2000.
S.D. Werner, J.P. Lavoie, C. Eitzel, S. Cho, C.K. Huyck, S. Ghosh, R.T. Eguchi, C.E. Taylor,
and J.E. Moore. REDARS 1: Demonstration software for seismic risk analysis of
highway systems. Technical report, Multidisciplinary Center for Earthquake Engineering
Research (MCEER), University at Buffalo, Buffalo, 2004.
R.L. Wesson, D.M. Perkins, N. Luco, and E. Karaca. Direct calculation of the probability
distribution for earthquake losses to a portfolio. Earthquake Spectra, 25(3):687–706,
2009.
R.R. Youngs and K.J. Coppersmith. Implications of fault slip rates and earthquake recur-
rence models to probabilistic seismic hazard estimates. Bulletin of the Seismological
Society of America, 75(4):939–964, 1985.
A. Zerva and V. Zervas. Spatial variation of seismic ground motions. Applied Mechanics
Reviews, 55(3):271–297, 2002.