the use of l-moments in the peak over threshold approach for estimating extreme ... faculteit... ·...

1 INTRODUCTION

A fundamental theorem of extreme value theorystates that sufficiently large values of independentidentically distributed random variables generallyfollow one of the three extreme value distributions,namely, Frechet, Gumbel and reverse Weibull. Theupper tail of Frechet and Gumbel distributions is in-finite, whereas the reverse Weibull distribution hasfinite upper tail. In the estimation of extreme quan-tiles of wind loading, Frechet and Gumbel distribu-tions are normally used. However, it has been shownthat the Gumbel and Frechet models results in unre-alistically high failure probability of wind sensitivestructures due perhaps to the unbounded upper tail ofthe distribution (Ellingwood 1980).

The POT (peaks over threshold) approach formodelling the tail of a distribution has received con-siderable attention since it has been shown that thePareto distribution (GPD) arises as the limiting dis-tribution of peaks (or excesses) X−u of a randomvariable X over a high threshold u (Pickands 1975).The central objective of this paper is to evaluate theeffectiveness of the method of L-moments for esti-mating the GPD parameters, and compare its per-formance in quantile prediction against that of the deHaan method (1994). Monte Carlo simulations andactual wind velocity data collected at various sta-tions in the United States have been utilized forcomparison purposes.

The paper is organized as follows. The next sec-tion reviews basic concepts of the POT approach andthe de Haan estimation method. The L-momentmethod is briefly outlined in Section 3. Section 4presents numerical results using Monte Carlo simu-lations and actual wind speed data. The main con-

clusions and a list of references are presented inSection 5 and 6, respectively.

2 PEAKS OVER THRESHOLD AND DE HAAN

2.1 Simiu and Heckert's studies

In a recent study Simiu and Heckert (1996) appliedthe Peak Over Threshold (POT) approach to analyzethe wind speed data and fit an appropriate extremevalue distribution. The POT approach enables theanalyst to use all the data exceeding a sufficientlyhigh threshold, in contrast with the classical extremevalue analysis which typically uses annual extremevalues. Simiu and Heckert (1996) concluded that thereverse Weibull distribution is a more appropriatedistribution than the Gumbel or Frechet. Their con-clusion is also supported by a physical fact that non-tornadic extreme winds are expected to be bounded.

2.2 POT

The application of POT approach involves two mainsteps: selection of threshold and estimation of thetail index using order statistics above the threshold.Since the tail index is found to be very sensitive thethreshold value, subsequent quantile estimates alsobecome sensitive to the threshold. This is a majorweakness of the POT approach, which is also con-firmed from the analysis of Simiu and Heckert(1996). In fact, the fluctuation of quantiles of windloading with threshold is so severe that no reliabledesign estimates could be recommended.

Let X1, X2, . . ., Xn be a series of independent ran-dom observations of a random variable X with thedistribution function (DF) F(x). To model the uppertail of F(x), consider k exceedances of X over a

The Use of L-Moments in the Peak Over Threshold Approach forEstimating Extreme Quantiles of Wind Velocity

M.D. PandeyUniversity of Waterloo, Ontario, Canada

P.H.A.J.M. van Gelder & J.K. VrijlingDelft University of Technology, Delft, Netherlands

ABSTRACT: An alternative approach to the estimation of the tail index of extreme value distributions in thePOT approach is described. The proposed estimation method is based on sample L-moments. Quantile esti-mation of wind loading is presented as a case study.

threshold u and let Y1, Y2, . . ., Yk denote the excesses(or peaks), i.e., Yi = Xi- u. Pickands (1975) showedthat, in some asymptotic sense, the conditional dis-tribution of excesses follow the generalized Paretodistribution. Thus the DF of Yi = [(Xi - u)| Xi > u], i =1,2,. . ., k, is given as

1/( )

( ) 1 1c

c y hG y

a

−− = − +

(1)

where h, a and c denote the location, scale and shapeparameters, respectively. Generally, the location pa-rameter is taken as zero. The distribution is un-bounded, i.e., 0 < y < ∞ if c ≥ 0 and bounded as 0 < y< a/c if c < 0. The exponential DF is a special caseof eqn.(1) when c = 0. The GPD exhibits a uniquethreshold-stability property, i.e., if X follows GPDthen the conditional distribution of excesses, i.e.,G(y), also follows GPD with the same shape pa-rameter as that of X. It can also be shown that thedistribution of maximum excesses, i.e., Z = max (Y1,Y2, . . ., Yk) follows the generalized extreme valuedistribution provided that exceedances over thethreshold are generated from a Poisson process(Davison and Smith 1990). These elegant mathe-matical properties have motivated the use of theGPD model in extreme quantile estimation.

The calculation of a quantile value, xR, corre-sponding to an R-year return period is based on thequantile of peaks, Y, corresponding to a return periodof λR, where λ is the mean exceedance (or crossing)rate per year of X over u. Thus,

1 11Rx G uRλ− = − +

(2)

where G-1(p) denotes the Pareto quantile function. Ifn denotes the number of samples collected over myears and k is the number of exceedances, then themean crossing rate is estimated as λ = k/m. As thethreshold is lowered to include more data in the in-ference, the crossing rate λ increases due to increas-ing values of k. An interesting trade-off is thus re-vealed from eqn.(2), i.e., the more the dataconsidered, the farther in the tail region one has togo for quantile estimation due to increasing values ofthe effective return period λR. Conversely, by inclu-sion of additional data the accuracy of tail modellingmust increase at a higher rate than the movementfarther in the tail region. Otherwise, the accuracy ofPOT estimates would deteriorate. In this sense, it isexpected that an optimal threshold might exists thatwould results in the minimum error (Caer and Maes1998, Dougherty et al. 1999).

2.3 Parameter estimation by De Haan

De Haan (1994) proposed the estimation of scale andshape parameters using the order statistics of ex-ceedances, {Xn-k,n, . . ., Xn,n}, where Xn-k,n is thesmallest data point to exceed a given threshold.Based on an extensive mathematical analysis, theshape parameter was derived as

c = c1 + c2, where (1)

1 nc M= and

1(1) 2

2 (2)

( )11 1

2n

n

Mc

M

−

= − −

(3)

in terms of moments of excesses obtained from thelog-transformed data:

( )1, ,

1

1[log( ) log( )]

kr r

n n i n n k ni

M X Xk − + −=

= −∑ (4)

The scale parameter can be obtained as(1)nMa u

ρ= (5)

where ρ = 1 if c ≥ 0, and ρ = 1/(1-c) if c < 0.

Generally speaking, c1 estimates the shape parameterof unbounded GPD, whereas c2 corresponds to thebounded case (Reiss and Thomas 1997). The confi-dence bounds for an estimate of c were also derivedbased on the argument of asymptotic normality (deHaan 1994).

3 METHOD OF L-MOMENTS

3.1 Order Statistics and L-Moments

Using the density function of an rth order statistic,Xr:n ( Kendall ad Stuart 1977), along with a trans-formation u = F(x), its expectation can be expressedin terms of the quantile function, x(u), as

E[Xr:n] = ∫ −− −

1

0

1 )1()( duuuuxr

nr rnr (6)

Expectations of the maximum and minimum of asample of size n can be easily obtained from eqn.(6)by setting r = n and r = 1, respectively.

E[Xn:n] = ∫ −1

0

1)( duuuxn n , and

E[X1:n] = ∫ −−1

0

1)1)(( duuuxn n (7)

The probability-weighted moment (PWM) of a ran-dom variable was formally defined by Greenwood etal. (1979) as

∫ −=−=1

0

,, )1()(])1([ duuuuxuuXEMkjikji

kji (8)

Certain linear combinations of PWMs, referred to asL-moments, are shown to be analogous to ordinarymoments in a sense that they also provide measuresof location, dispersion, skewness, kurtosis, and otheraspects of the shape of probability distributions ordata samples (Hosking 1990). An rth order L-moment is mathematically defined as

∑=

−−=r

kkkrr p

1

*1,1 βλ (9)

where *,krp represents the coefficients of shiftedLegendre polynomials (Hosking 1990). From an or-dered random sample of size n, unbiased estimatesbk and ak of βk and αk, respectively, can be obtainedas (Hosking and Wallis 1997)

∑=

−

−=

n

iik k

nX

k

i

nb

1

111 and

∑=

−

−=

n

iik k

nX

k

in

na

1

11 (10)

Normalized forms of higher order L-moments, τr =λr/λ2, (r = 3, 4,. . . ) are convenient to work with dueto their bounded variation, i.e., |τr| < 1. Extensivesimulation-based studies and comparison with in-formation-theoretic measures, e.g., cross-entropy anddivergence, Pandey et al. (1999) have shown that L-moments are very effective in summarizing distribu-tion properties and extreme quantile estimation.

3.2 Parameter estimation

Hosking and Wallis (1997) illustrated that L-moments are efficient in estimating parameters of awide range of distributions from small samples. Themain advantage of L-moments is that, being a linearcombination of data, they are less influenced by out-liers. In general, the bias of small sample estimatesof higher-order L-moments is fairly small. Further-more, the required computation is quite limited ascompared with other traditional techniques, such asmaximum likelihood and least-squares.

Figures 1 and 2 illustrate variations of thebias and root-mean-square error (RMSE), respec-tively, of sample estimates (size 50) of the first threeGPD L-moments (λ1, λ2, τ3) with respect to its shapeparameter. The bias and RMSE estimates are nor-malized by exact values of respective L-moments.As seen from Figure 1, bias associated with λ1 andλ2 is almost zero and that of τ3 is fairly small (≤3%).

Figure 2 shows that the RMSE of λ1 and λ2 increasesfrom 10% to 20% with increasing values of c. TheRMSE of τ3 remains somewhat insensitive to c, andis the order of 20%.

Using 3 sample L-moments, the location (h),scale (a) and shape (c) parameters of the GPD can beestimated as

3

3

(3 1)

( 1)c

ττ

−=

+ , 2(1 )(2 )a c c λ= − − and

1 2(2 )h cλ λ= − − (11)

In case that the location parameter is known, the first

two L-moments can be used to estimate c and a as

1

2

( )2

hc

λλ−

= − and 1(1 )( )a c hλ= − − (12)

Figure 1: Normalized bias of sample estimates of L-moments(parent: GPD, sample size 50).

Figure 2: Normalized RMSE of sample estimates of L-moments(parent: GPD, sample size 50).

-4%

-2%

0%

2%

-0.2 -0.1 0 0.1 0.2

Shape Parameter (c)

Bia

s

λ1

λ2

τ3

(Bounded GPD) (Unbounded GPD)

0%

10%

20%

30%

-0.2 -0.1 0 0.1 0.2

Shape Parameter (c)

RM

SE

λ1

λ2

τ3(Bounded GPD) (Unbounded GPD)

4 COMPARISON OF SHAPE ESTIMATES

4.1 Three methods

The paper presents an alternative approach to the es-timation of tail index of extreme value distributionin the POT approach. The proposed estimationmethod is based on sample L-moments instead oforder statistics. The L-moments can be consideredas linear combinations of certain expectations of or-der statistics (Hosking 1990). In contrast with ordi-nary statistical moments, the L-moments of higherorder can be reliably estimated from small samples.It is expected that L-moments could provide morereliable estimates of the tail index with a limitedsensitivity to the threshold parameter.

In this study, three methods are considered for theestimation of c, namely,

(1) de Haan (DHN) - eqn.(3)(2) 2 L-moments (2-LM) - eqn.(12)(3) 3 L-moments (3-LM) - eqn.(11)

To compare the bias and RMSE of various estimatesof c, a simulation experiment was designed in whichsamples, each of size of 1000, were generated from aGPD of prescribed parameters. Considering Np toporder statistics and corresponding sample of ex-cesses, the three methods were applied to estimatethe shape parameter. The simulation consisted of10,000 cycles. Numerical results for Np = 30 and 100are summarized in Table 1 for various c values ofthe parent GPD.

As seen from Table 1, bias associated with all thethree estimates is small (0-6%). In the first case ofNp = 30, the bias of DHN estimate is higher thanthose of L-moment estimates. In all the cases re-ported in Table 1, RMSE values of all the three es-timates are almost identical. A general observation isthat both bias and RMSE are almost insensitive tothe shape parameter of the parent GPD used in thesimulation. Considering these results, it is antici-pated that L-moment estimates of the tail index (c)would at least be competitive to those obtained fromthe de Haan method. In contrast with the de Haanmethod, the L-moment method is more insightfulsince L-moments have intuitive meaning, such asGini's mean difference (λ2) and skewness. On theother hand, de Haan's formulae based on log-transformed data are of "black-box" nature. Since thedistribution of L-moments ratios, λ2/λ1 and τ3, is as-ymptotically normal (Hosking 1990), the confidencebounds on c can also be constructed in principle.This topic is an area of future research.

Table 1: Bias and RMSE the Pareto shape parameter estimatedfrom POT samples (GPD parent, sample size 1000)

4.2 Simulations

To illustrate a trade-off associated with the selectionof threshold in the POT method, two scenarios wereanalyzed via Monte Carlo simulations. In the firstcase, samples of size N=1000 were generated fromthe Gumbel distribution (µ=50 mph, σ=6.25), andPOT samples were constructed from upper Np orderstatistics. Considering the original sample (N=1000)as a sample of annual maximum wind speeds, theexceedance rate was estimated as λ = Np/N. Windspeed quantiles for return periods (R) of 100 to 5000years were estimated from the POT sample by fittingthe Pareto distribution using de Haan and L-momentmethods. The simulation consisted of 10,000 cycles.The normalized bias and RMSE of quantile esti-mates are reported in Table 2 for various values ofNp and R. In all the cases, the bias is fairly small (<3%) and RMSE varies in a narrow range from 2 -10%, similar to results of Gross et al.(1994).

Table 2: Bias and RMSE of Gumbel quantile estimates (1000years of simulated data).

Table 2 clearly suggests that the GPD representationof POT data is fairly accurate.

In the second case, a more realistic scenario wasconsidered by assuming that N=1000 independentobservations of wind speed were collected overm=25 years period such that the exceedance rate be-comes λ = Np/m (e.g., Dougherty and Corotis 1998).In the simulation, samples were generated from theGumbel distribution (µ=30 mph, σ=6.5 mph) andquantile bias and RMSE were estimated for the samevalues of Np and R as considered before. As seenfrom Table 3, the bias (20-40%) and RMSE (30-60%) have increased significantly in comparison tothe previous case.

Table 3: Bias and RMSE of Gumbel quantile estimates from 25years of simulated data consisting of 1000 observations.

The increased error can be attributed to large valuesof the exceedance rate (λ≈1-4), because the previouscase corresponds to fairly small values of λ (λ

Figure 4: Quantile estimates using wind speed data from He-lena, MT (R=1000 years)

In Figure 4, quantiles estimates for 1000 year returnperiod are plotted against the threshold velocity.Note that 2-LM and DHN estimates are in very closeagreement. 3-LM quantile estimates follow asmoother trend, which is helpful in identifying therepresentative threshold and quantile value. FromHelena, a 3-LM quantile estimate corresponding tohorizontal-stable part of the curves is roughly 80mph.

5 CONCLUSIONS

An accurate estimation of parameters of the Paretodistribution (GPD) in the peaks-over-threshold ap-proach is of considerable importance. The paper ex-amines the method of L-Moments and compares itsperformance against a widely acceptable method ofde Haan (1994). In the de Haan method, the first twomoments of peaks (or excesses) of log-transformeddata are used for parameter estimation, whereas theL-moment method utilizes linear combinations ofexpectations of order statistics of peaks in the origi-nal data. Two variations of the L-moment methodare discussed, namely, 2-LM and 3-LM, which in-corporate 2 and 3 L-moments, respectively, in theparameter estimation. Despite the procedural differ-ences, it is interesting to note that the de Haan and 2-LM estimates of the tail shape parameter (c) are infairly close agreement in simulated data (Tables 2and 3) as well as in actual wind speed data collectedin the United States (Figures 3-5). In this respect, anequivalence between the 2-LM and de Haan methodsis a notable point of the paper. Furthermore, the L-moment method is more insightful because of intui-tive meaning of L-moments, such as Gini's disper-sion coefficient (λ2) and skewness.

In the 3-LM method, higher order estimates of theshape parameter are obtained using the L-skewnessof POT data. The analysis of actual wind speed dataillustrates that 3-LM estimates of c tend to follow amore stable trend in the proximity of the upperbound estimates obtained from the de Haan method.Therefore, 3-LM method appears to be useful inidentifying meaningful upper bounds of designquantiles.

6 ACKNOWLEDGEMENT

The case study described in this paper is performedwith the wind data provided by Dr. Simiu for whichhe is greatfully acknowledged.

REFERENCES

Carers, J. and Maes, M.A. (1998). Identifying tailbounds and end points of random variables. J.Structural Safety, 20, 1-23.

Davison, A.C. and Smith, R.L. (1990). Models ofexceedances over high thresholds. J. RoyalStatistical Society, Series B, 52, pp.393-442.

de Haan, L. (1994). Extreme value Statistics. Ex-treme Value Theory and Applications, J. Ga-lambos, J. Lechner and E. Simiu (eds.), v. 1,pp.93-122.

Dougherty, A.M. and Corotis, R.B. (1998). Extremewind estimation: Theoretical considerations.Structural Safety and Reliability , N. Shirashi,M. Shinozuka and Y.K. Wen (eds.), v.2,pp.1351-1358.

Dougherty, A.M. Corotis, R.B and Schwartz, L.M.(1999). Extreme value theory and mixed dis-tributions - Applications. Applications of Sta-tistics and Probability, R.E. Melchers andM.G. Stewart (eds.), v.1, pp.27-33.

Ellingwood, B. (1980). Development of a probabil-ity-based load criterion for American NationalStandard A58. NBS Spec. Publ. 577, NationalBureau of Standards, Washington D.C.

Greenwood, J.A., Landwehr, J.M. and Matalas, N.C.(1979). Probability weighted moments: Defi-nition and relation to parameters of severaldistributions expressable in inverse form.Water Resources Research, 15(5), 1049-1054.

Gross, J., Heckert, A., Lechner, J. and Simiu, E.(1994). Novel extreme value estimation proce-dures: Application to extreme wind data. Ex-treme Value Theory and Applications, J. Ga-lambos, J. Lechner and E. Simiu (eds.), v. 1,pp.139-158.

Heckert, N.A. and Simiu, E. (1996). Extreme winddistribution tails: A peaks over threshold ap-proach. J. Structural Engineering, ASCE,122(5), 539-547.

20

40

60

80

100

30 35 40 45 50

Threshold (mph)

Qu

anti

le (

mp

h)

3-LM

2-LM

DHN

Hosking, J.R.M. (1990) L-moments analysis and es-timation of distributions using linear combina-tion of order statistics. J. of the Royal Statisti-cal Society, Series B, 52, 105-124.

Hosking, J.R.M. and Wallis, J.R. (1997). RegionalFrequency Analysis: An Approach based on L-Moments. Cambridge University Press, Cam-bridge, UK.

Kendall, M.G. and Stuart, A. (1977). Advanced The-ory of Statistics. Griffin, London, U.K.

Pandey, M.D., Van Gelder, P.H.A.J.M. and Vrijling,J.K. (1999) Application of information theoryto the assessment of effectiveness of L-kurtosiscriterion for quantile estimation. Accepted forpublication in J. Hydrologic Engineering,ASCE.

Pickand, J. (1975). Statistical inference using orderstatistics. Annals of Statistics, 3, 119-131.

Reiss, R.D. and Thomas, M. (1997). StatisticalAnalysis of Extreme Values with Applicationsto Insurance, Finance, Hydrology and OtherFields. Birkhauser Verlag, Basel, Switzerland.

Simiu, E. and Heckert, A. (1996). Extreme WindDistribution Tails: A Peak Over ThresholdApproach. J. Structural Engineering, ASCE,122(5), 539-547.

the use of l-moments in the peak over threshold approach for estimating extreme ... faculteit... ·...

Documents