center for stochastic processes for stochastic processes r%. department of statistics 10 university...
TRANSCRIPT
CENTER FOR STOCHASTIC PROCESSES
r%. Department of Statistics10 University of North Carolina
Cy Chapel Hill, North Carolina
ITr
0V
ON A BASIS FOR "PEAKS OVER THRESHOLD" MODELING
by
M.R. Leadbetter
ODTICEECTE
Technical Report No. 285
March 1990
DI--WUTONSTA1EMWTA7___ ___ pu(4rlme() ( ~04 027_r~ptiui aR.Ihcu
4* 6- . a A :7..
do 0 0 -, 0*
P. v .* * I ~- 4 a a
rl 'to 8 ,- - ua a
2: - C6-. -a ~ 02 -3 a
t t
t; 6 it j 6 .1. i
'a~~ C!~I-~~ I~ gi oj~ ~ d~&Onow ** - .e
39 8
'a a j 2Au ~ a
* 0- . .-
Li .a -a I. .A - a -;~L a A -. a
a. daSs 4
- L
UNCLASSIFIEDSECURITY CLASSIFICATION OF THIS PAGE
IForm ApprovedREPORT DOCUMENTATION PAGE OMB Nov0704018
I OM o 70- 8
la. REPORT SECURITY CLASSIFICATION lb, RESTRICTIVE MARKINGS
UnclassifiedZa. SECURITY CLASSIFICATION AUTHORITY 3. DISTRIBUTION/AVAILABILITY OF REPORT
N/A Approved for Public Release;2b. DECLASSIFICATON/DOWNGRADING SCHEDULEN/A Distribut ion Unlimited
4. PERFORMING ORGANIZATION REPORT NUMBER(S) S. MONITORING ORGANIZATION REPORT NUMBER(S)
Technical Report No. 285 AFOSR-TR- ( 0 Of6 5 16a. NAME OF PERFORMING ORGANIZATION 6b. OFFICE SYMBOL 7a. NAME OF MONITORING ORGANIZATIONUniversity of North Carolina (If applicable)
Center for Stochastic Processel AFOSR/NM
6c. ADDRESS (City, State, and ZIP Code) 7b. ADDRESS(City, State, and ZIP Code)Statistics Department Bldg. 410CB #3260, Phillips Hall Bolling Air Force Base, DC 20332-6448Chapel Hill, NC 27599-3260
8a. NAME OF FUNDING /SPONSORING 8b. OFFICE SYMBOL 9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBERORGANIZATION (If applicable)
AFOSR NM F49620 85C 0144
Bc ADDRESS (City, State, and ZIP Code) 10 SOURCE OF FUNDING NUMBERSBldg. 410 PROGRAM PROJECT TASK WORK UNIT
ELEMENT NO. NO, NO ACCESS ON NOBolling AFB, DC 20332-6448 6.1102F 2304
12. TITLE (include Security Classification)
On a basis for "peaks over threshold" modeling
12. PERSONAL AUTHOR(S)Leadbetter, M.R.
13a. TYPE OF REPORT 13b TIME COVERED 14. DATE OF REPORT (Year, Month, Day) 15. PAGE COUNT
preprint FROM TO 1990, March 1016. SUPPLEMENTARY NOTATION
N/A
17. COSATI CODES 18. SUBJECT TERMS (Continue on reverse if necessary and identit, by block number)
FIELD GROUP SUB-GROUP
19. ABSTRACT (Continue on reverse if necessary and identify by block number)
(Please see over)
20 DISTRIBUTION iAVAILABIL TY OF ABSTRACT 21 ABSTRACT SECURITY CLASSIFICATIONQ UNCLASSIFIEDlUNLIMITED 0 SAME AS RPT C3 DTIC USERS UNCLASSIFIED
22a NAME OF RESPONSIBLE INDIVIDUAL 22b TELEPHONE (Include Area Code) 22c OFFICE SYMBOL
Professor Ey'tan Barouch (202)767-5026 AFOSR/NM
DD Form 1473, JUN 86 Previous editions are obsolete SECURITY CLASSiFICATiON Or Tv-,S PAGE
UNCLASSI FED
19. Abstract-. . . &
Abs.t. cj. "Peaks over Threshold" ("POT") models commonly used e.g. in
hydrology, assume that peak values of an lid or st- lonary sequence Xi above a
high value u, occur at Poisson points, and the ex.: , values of the peak above
u are independent with an arbitrary common d.f. G. Motivation for these models
has been provided by R.L. Smith (cf. [7],[83), by using Pareto-type
approximations of Pickands ([6]) for distributions of such excess values.
These works strongly suggest that the Pareto family provides the appropriate
class of distributions C for the POT model.
In the present paper we consider the point process of excess values of
peaks above a high level u and demonstrate that this converges in distribution
to a Compound Poisson Process as u -# a under appropriate assumptions. It is
shown that the multiplicity distribution of this limit (i.e. the limiting
distribution of excess values of peaks) must belong to the Pareto family and
detailed forms are given for the normalizing constants involved. This exhibits
the POT model specifically as a limit for the point process of excesses of
peaks and delineates the distributions involved.
ON A BASIS FOR "PEAKS OVER THRESHOLD" MODELING
by
M.R. Leadbetter
University of North Carolina
Abstract. "'Peaks over Threshold". F' models commonly used e.g. inhydrology, assume that peak values of an iid or stationary sequence Xi above a
high value u, occur at Poisson points, and the excess values of the peak above
u are independent with an arbitrary common d.f. G. Motivation for these models
has been provided by *t6O Smith-4cf-:E by using Pareto-type
approximations of Pickands4E6-7 for distributions of such excess values.
These works strongly suggest that the Pareto family provides the appropriate
class of distributions G for the POT model.
In Q- prs m-t paper - considereithe point process of excess values of
peaks above a high level u and demonstrate that this converges in distributionraa -L ,, , ; l r , '
to a Compound Poisson Process as u under- opriate assumptions. It is
shown that the multiplicity distribution of this limit (i.e. the limiting
distribution of excess values of peaks) must belong to the Pareto family and
detailed forms are given for the normalizing constants involved. This exhibits
the POT model specifically as a limit for the point process of excesses of
peaks and delineates the distributions involved. )
Research sponsored by the Air Force Office of Scientific Research Contract No.F49260 &9C 0144.
1. Introduction.
In what are sometimes called "Peaks over Threshold" (POT) models (cf.
[7]), the excess values over a high level u by an observed time series are
assumed to occur at Poisson points and to have arbitrary common distributions.
That is if {Xi: i=1.2,...} is e.g. a stationary (or iid) sequence, the
exceedance points {i: X. > u} are assumed to be Poisson and the corresponding1
excess values (Xi-u)+ to be independent with an arbitrary distribution.
The Poisson nature of the occurrence of exceedances is intuitively clear
since e.g. if Xi are iid with d.f. F, the number of exceedances of a high
level un by X 1.... n , is binomial with parameters (n, 1-F(u n)) and hence
approximately Poisson if n(l-F(un)) converges to some value T > 0. This will
be made more precise below by a time normalization. Further motivation for the
model is provided by R.L. Smith ([7],[8]) based on theory of Pickands [6],
restricting the distribution for excess values to a "generalized Pareto" (GP)
form
(1.1) Ga1 3 (x) = 1 - (1 + ax/p)-1/a P > 0, a A 0
= 1 - e-x /P P > 0, a = 0
where the range of x is (O,w) if a>O and (0,-a- 1P) if a<O.
This class is a flexible 2-parameter family, but more importantly for a
wide class of F's the excess distribution
(1.2) F (x) = P{X - u xIX > u} Accession For
NTIS GRA&Iis approximately GP in a sense shown by Pickands [6]. viz. DTIC TAB i
Unanmnowiced 0Justificitin
(1.3) inf sup IFu(X) - G (x) - 0 as u -- _
x ByDi~tribution/
Availability Codes
fAvail and/oMe pca
2
for some fixed a. That is for high levels u. 13 may be chosen (depending on u)
so that G a(x) is uniformly close to Fu(x).
Write xF for the right endpoint (sup{x: F(x) < 1) of the d.f. F and
F(x) = 1-F(x), the tail of F. Then the class of d.f.'s F for which the above
GP approximation holds includes all those satisfying
(1.4) F(u+xg(u))F(u) -- U(x) as u -- xF
for some function g(u) > 0. some d.f. C and all 0 < x < xF ( w). It is known
([6]) that any such G in (1.4) must be G.P. as in (1.1) for some a,13.
Note that (1.4) holds for all d.f.'s F of interest in extreme value theory
i.e. such that if Mn =............Xn), P a(M-b n x} (= n(x/an +b )) has a
non degenerate limit A(x). For example if G(x) = e-x, (1.1) is a classical
domain of attraction criterion for a "Type I" extreme value distribution
A(x) = exp(-e-x). If instead F has a regulary varying tail (l-F(ux))/(l-F(u))
-- a , a>O, as u -- , each x > 0, then (1.1) holds with G(x) = (l+x)-a .
g(u) = u, and A(x) is then "Type II" i.e. A(x) = exp(-x-a), x > 0.
Hence in a wide variety of cases of interest the distribution Fu(X) of
excesses (given by (1.2)) of the level u is approximately GP, C (x) in the
sense stated, where a is fixed but P can change with u. As discussed in [7]
this provides significant intuitive support for the POT model. Our purpose
here is to further justify the model by exhibiting it as the limit of point
processes of excesses of high levels, the limiting distribution of excess
values being shown to be GP, C (x) where now P as well as a, is independenta ,1
of u.
For clarity this will be shown for iid sequences in Section 2 and extended
to dependent (mixing) situations in Section 3. In the latter case high serial
correlation can cause clustering of exceedances of high levels and the peak
3
values are then defined to be the largest in each cluster. Two points worth
noting are (i) the dependence modifies the theory via the introduction of a
single parameter, the "extremal index" (essentially the inverse of mean cluster
size) and (ii) the CP form applies to the peak values but not necessarily to
other cluster properties such as their lengths. A corresponding theory will be
indicated in Section 3 for other such cases.
2. The iid case.
Let funI be a sequence of levels such that
(2.1) n(l-F(un) ) --+ T > 0.
Define point processes Nn to consist of the points i/n for which Xi > u i.e.
the exceedance points normalized by 1/n. N is thus defined on the positiven
real line but it will be convenient to restrict attention to the unit interval,
corresponding to exceedances among the n sample values X ....,X n . Hence Nn(B)
is defined for (Borel) subsets B C (0,1] by Nn(B) = #{i/n C B: Xi > u }.
It is trivial to show that under (2.1) that N d-+ N where N is a Poissonn
Process on (0,1] with intensity T. For if I = (a.b] C (0.1]. Nn(I) is binomial
with parameters [nb]-[na], p n=l-F(Un) ( ] denoting integer part) and nPn--+ T
so that
P(Nn(I) = r) --+ e-T(b-a[T(b-a)]r/r! = P(N(I) = r).
Hence Nn(I) d-*N(I) and by independence if 11-..... k are disjoint
(Nn(I) .... Nn(I) d- (N(I1)...N(Y)
drwhich is sufficient ([2]) to show full weak convergence N nd N.
Now associate with each point of Nn (i/n such that Xi ) un) the
corresponding excess value X -u to give the "point process N of excesses".i n n
4
Technically perhaps N should be regarded as a "marked point process" (or as ann
atomic random measure) since the (Xi-un) are not necessarily integer valued but
it will be convenient (and legitimate) to call it a point process whose events
(at (i/n: Xi ) un)) have (not necessarily integer) multiplicities (X-un).
Since, as above, the positions of the excesses converge to a Poisson
Process, it seems evident that one should expect any limit N* for N* to be an
compound Poisson process having events at Poisson points with intensity T and
(independent) multiplicities with some common d.f. G. Such a result may be
shown (cf. [1]) but here we give sufficient conditions for such convergence.
Here and below we write N = CP(T,C) to denote a Compound Poisson point process
whose Poisson events have intensity T with (not necessarily integer-valued)
multiplicities having d.f. C.
Theorem 2.1. Let Xi i=1.2... be iid with d.f. F satisfying (1.4) for some g,
G, and let (un) be levels satisfying (2.1). Then a N* d N , CP(T,G) where Tn n nis as in (2.1), G as in (1.4) and an = I/g(u ).
Proof: If Gn(x) (= Fu (x/an)) denotes the conditional d.f. of an(Xl-un) givenn
X1 > un . then clearly
Sexp{-san(X1- U)+ = F(u n) + (u n) fSme-sxdGn(x).
But from (1.4), Gn(x) -- G(x) and hence f 0 e -sxdG (x) --+ (s) = ~e-sxdG(x) so
that if I = (a,b] C (0,1] contains mn points i/n (mn ~ n(b-a))
nn
n nnexp{-sa(Xl-U)+} = [I.-(T/n)(I-*(S))(l+o(1))]n
which is the Laplace Transform ge-sN (a'b] where NN is CP(rC). Since N (I) is
5
the sum I (Xi-un)+ of m iid terms, it follows that a N*(I) d-N *(I). Hencei/nEI
n
by independence
(anN*(i1) ... an(i)d. (Nn1)..Nni)
for any disjoint I I so that aN * d+ N* as required. 03l'''k .n n
Thus in the iid case the (normalized) point process a N of excesses overnTn
u has a Compound Poisson limit and may be regarded as approximately CP(T.G)
for large n. Note again that G is a GP distribution G for some fixed a..
This provides a strong basis for the POT model in the iid case.
3. Stationary sequences
A stationary sequence X1 ,X2 .... can exhibit both "long range" and "local"
dependence between the X Here the former, but not the latter, will be
restricted by a "mixing condition" which enables the collection of X.'s into1
groups which are approximately independent but with possible high dependence
within groups. Strong mixing will suffice to restrict long range dependence.
However one may "tailor" this to the problem at hand with a slightly weaker
restriction A(Vn) defined for any sequence of constants vn as follows. For
lj~k~n write 9jk(vn) = a((Xs-vn)+: j~s~k} and say that A(Vn) holds if for
le<n IP(AnB) - P(A)P(B)I a whenever A C 1.j(vn), B E J+e,n ) lj<n-t
and a --+0 for some e = o(n).n~en n
High local dependence is reflected in the presence of clustering of
exceedances of high levels. "Clusters" will be defined precisely below, but we
first note that for levels (u n) satisfying (2.1) the mean size of a cluster
typically converges to a parameter whose inverse 0 is sometimes called the
"extremal index" of the sequence (Xi). Specifically 0 is defined by the
property that if Mn = max(X, X2. .. Xn) and un satisfies (2.1) then
6
(3.1) P(Mn Un) --+ e ,
6 being independent of T. (For i.i.d. sequences 6 = 1). Such 6 exists under
general conditions (cf [4], Sec. 3.7)
Clusters of exceedances of high levels may be defined in various ways, the
most obvious being as runs of consecutive exceedances. However if A(vn) holds
for a given sequence {vn} of levels the following "block definition" is more
convenient (and often asymptotically equivalent to that using runs). Choose
integers k -- -, kn = o(n), satisfying
(3.2) kn(a ne + Rn/n) --+0.n
Write r n=[n/k ] and divide the integers (1.. .n) into consecutive blocks
J, = {(i-l)rn+l. (i-1)rn+2,....irn ldi~kn
J =kr~k +2 .... n}JR +1 = k knr n+ 1 ' k nr n "2. .n
n
and regard the exceedances (if any) in a block as forming a cluster. The
choice of kn (or equivalently rn) is flexible, subject to the growth
restriction (3.2).
Obviously this block definition can count a run of consecutive exceedances as
two or more clusters, if the run straddles more than one block so that the
block definition is less natural in some cases. However it can be more
appropriate than the runs definition for sequences with high local variability.
In any case we use the block definition for its convenience. Further if the
block Ji contains exceedances the first point. (i-l)rn+l, of Ji will be
regarded as the location of the cluster in Ji"
Define now a point process P of normalized cluster positions, i.e.nconsisting of the points {((i-1)rn+1)/n, 1 i kn: M(Ji) > Un} where M(E) is
7
written for max{Xj: jeE}. Associate with each point ((i-l)rn+l)/n of Pn the
corresponding maximum excess M(Ji)-un = max{(XJ-Un)+: j e Ji}. giving the
(again technically "marked") point process P* of peak excess values above then
level in the clusters. We show that a result like Theorem 2.1 holds in the
dependent case with P' replacing N*. In fact it may be shown (cf [5]) that forn n
i.i.d. sequences N and P are asymptotically equivalent in a strong sense asn n
also are N and P* so that P and P* generalize N and N* .n n n n n n
It was shown in Section 1 that in the iid case the exceedance point
process N has a Poisson limit with intensity T. This result is simplyn
generalized (cf. [1]) to show that under dependence Pn has a Poisson limit
(with intensity OT) whereas N itself has a Compound Poisson limit whose eventsn
occur a* the positions of clusters (i.e. points of P ), with multiplicities
given by (limiting) cluster sizes. However the (limiting) distribution for
multiplicities need not have a GP form and need not be totally determined by
the (tail of the) marginal d.f. F of the X1 . On the other hand it will be
shown below that P* has a CP(OT,G) distributional lintit where G is obtainedn
from the tail of F via (1.4). Thus G has GP form and the dependence influences
the limit only through the factor 0 in the intensity of the underlying Poisson
Process.
The Compound Poisson limit for P* will be obtained by considering then
asymptotic behavior of the maximum in a cluster and showing that the clusters
are essentially independent. The specific basic results needed are contained
in the following lemma,
Lemma 3.1. Let {Xn) be stationary with extremal index 0 > 0 and marginal d.f.
F satisfying (1.4). and let A(vn) hold with vn=un+xg(un), each x > 0. where un
satisfies (2.1). Let {kn} satisfy (3.2) and write an = Wu . Then (with
r = [n/k ]). as n--n n
8
(i) P(a (Mr -Un) > xI (x ) x 0n n
(ii) P(an(Mr -Un) xIM r >Un -d G(x)n n
k(iii) [Sexp{-sa n(Mr -un)+] n -- exp(-&r(1-#(s)))
n
where #(s) = O e-SXdG(x)
Proof: Since n[1-F(un+a nx)] ~ TF(un+xg(un) )/F (un) -- 70(x) and (Xn} has
extremal index 6. it follows that
P{M n Un+a-lx) --+ exp(-OiG(x)).
But it follows in a standard way from the mixing condition A(Vn) (cf. [1. Lemmak k
2.3]) that P{Mn VnI - p nM vnI --+ 0 so that p n(M Un +anlx --* exp(-eOT(x))n n
from which (i) readily follows. The left hand side of (ii) is
1 - >u+ax/PM >U -=1 21 ?T +n n n n
by (i). so that (ii) follows. Finally (iii) follows by the first calculation
of Theorem 2.1 with Mr replacing Xi. using (ii). 0
n
The main result now follows in a similar way to Theorem 2.1 on using
approximate independence between the clusters.
Theorem 3.2. Let Xn) be stationary with extremal index 6 > 0, and marginal
d.f F satisfying (1.4). and let A(v n) hold with vn = u + xg(u ) each x 0.--n naif (32 n = nnad
where un satisfies (2.1). Let kn satisfy (3.2), rn = [nn and
an = (g(u))-l Then the point process Pn of peak values above un satisfies
a p* d-+ P* where P* is CP(OT,G) and G is as in (1.4).n n
9
Proof: Let I be a subinterval of (0,1] and J, = {(i-l)rn+l .... irn) as defined
above, 1 I k . Then it follows from the mixing conditions along the samen
lines as Lemma 2.2 of [3] (or Lemma 2.2 of [1]) that
9exp{-sa P*(I)} - 17 exp(-sanPn(Ji)} --+ 0(i:JiCIn
as n -- so that
knm(I)(l+o(1))
9exp(-saP*(I)) = [8exp-sa (M -U nn
--- exp(-OTm(I)(1-#(s))}
by Lemma 3.1, where #(s) = e-SXdG(x). Hence anPn() !- P(I) where P is
CP(OT,G) on (0,1]. Now if Il .... Ik are disjoint subintervals of (0,1] it
follows also as in Lemma 2.2 of [3] (or Lemma 2.2 of [1]) that
k k9exp{-a I s.P(I) - lexp-a n P (I )} --)0n J=l in j=l
from which it follows that
(anP *(i1)... anP *k) d. ((i).p*(ik
and hence that a p* i+ po.nn
Finally we reiterate that this result is one of many which can be obtained
involving different aspects of cluster structure. For example the Compound
Poisson limit for N was cited above, the multiplicities corresponding ton
cluster sizes. More complicated functions - such as the sum of powers of
excess values in a cluster - may also be considered and will lead to Compound
Poisson limits. Such cases may be useful in applications where damage from
high levels (e.g. high pollution episodes) may be modeled as a specific
10
function of the excess values. However the case of (excess) peak values in a
cluster is especially important (e.g. in describing severe floods, where damage
can well be a function of flood level). Theorem 3.2 establishing the POT
approximation is particularly useful since the multiplicity distribution then
depends only on the marginal d.f. F and moreover is known to have GP form.
References
[1] Hsing, T., HIUsler, J., Leadbetter, M.R., "On the exceedance point process for a
stationary sequence", Prob. Theor. Rel. Fields, 78, 97-112, 1988.
[2] Kallenberg, 0., Random Measures, Academic Press, N.Y. 1976
[3] Leadbetter, M.R. and Hsing, T., "Limit theorems for strongly mixing stationaryrandom measures" To appear in J. Stoch. Proc. and Applns., 1990.
[4] Leadbetter, M.R.. Lindgren, G., Rootz~n, H., "Extremes and related propertiesof random sequences and processes", Springer Statistics Series,1983.
[5] Leadbetter, M.R. and Nandagopalan. S., "On exceedance point processes undermild oscillation restrictions" Extreme Value Theory, J. HUsler andR.-D. Reiss (eds.) Proc. Oberwolfach 1987, Springer Lecture Notes inStatistics, No. 51, 1988.
[6] Pickands, J., "Statistical inference using extreme order statistics" Ann.Statist. 3, 1975, p. 119-131.
[7] Smith, R.L., "Threshold methods for sample extremes" in Statistical Extremesand Applications, J. Tiago de Oliveira Ed., NATO ASI Series, Reidel,Dordrecht, 1985, p. 623-638.
[8] Smith, R.L. "Estimating tails of probability distributions" Ann. Statist. 15,1987, p. 1174-1207.
S
ao Ua - 0a. A
U -a a aa a a. o~ 'a aI d 0 *@ ~ .0 Ia a. .. d .' U- a. *.- -~ -
- a . S Qa.4a ~ a. SDa. - SI W ia.g a- au
a. a -
.0 *~ : ~a. a.. a. a - a 4 -a a a. a. - a - a..2 ~ ~*a~ ~-. 'a .a-. - a
- a ~ ~ 4 5 S - -* - 11
~ ? ~.a a aa'a-a. a- SI a. ~ -I. ~ SI
00 U U 4 - 0
aa . S SI SIi08 - ga..-a.
- aaa. * - ~a.a-a a a .-U -. - ~- U 0. Li Hn
-'
- ~11 ~ ~'
~ :~ u~ V~ A ~ ~.: ~~. ~O(~
~xxa.xa.'0 2 ~ .
*0 S a .~ 55 2- a 0 - - 2* .J . -
a Is.
a a
.4
.~ ~ .0 a a a.
a g . a ~ . U aSI a. 55 II
a0I~i~m~w
2 a.*~1
SI
a. -.. ~U 2~ 8~ 4 a. a.
-SI. a. SI. U
.~ ~I.SI. 0~a .4 ~ K g 'a a~a.~ ~ - ~
fl.SIM
-
a. o..o..o .0 'a .~ -
-
a ~,
-.~
.., 8 ~ a- (4 * ~ - a£5~ ~ £
a a S a..Ii -~ a. * a. a a.. - a
0 I.. a ' ~rn
a -* a.
- a
0SI -a a.
j~ 1~. ~ a.~ -
'a a* -a. ~
.5'-~.4
0a-a.
SI
- - '4U- a- a.a.. 4
a. * 'a~ -
*~ 51.0 ~ SII SIa. *' -u -~ .U
A~
~ ~~
~ ..a ~