Slides of my presentation at TIES 2015

Extreme Value Modelling with Application to Environmental Control

Laique Merlin Djeutchouang(1) and Abdel Hameed El-Shaarawi(2)

(2) National Water Research Institute, Canada, and The American University in Cairo, Egypt

November 23, 2015

The 25th Annual Conference of The International Environmetrics Society

Al Ain, 2015


Overview of this talk

1 Motivations and Background

2 Statistical Modelling for Extremes
   - Block Maximum Approach
   - Peak over Threshold Approach
   - Notion of Return Value and its estimation

3 Applications to the Sea Data
   - The data
   - Results

4 Current developments: Additional Remarks

5 Summary

2 / 39


Motivations and Background

Why extreme value theory?

Extreme frequency and magnitude of weather

- A combined effect of irrigation and global warming.

Figure : Lake Chad has lost about 80% of its surface area since the 1960s.

3 / 39


Extreme amounts of precipitation

Figure : Flooding in Dar es Salaam, Tanzania, May 2015.

4 / 39

Extremal Natural Events

- Why do some earthquakes cause tsunamis while others don't?

- The magnitude of the quake, a measure of the amplitude of the largest seismic wave recorded for the earthquake, must exceed a certain threshold.

Examples:
- The unforgettable and tragic Indian Ocean earthquake and tsunami of December 26, 2004.

- The Fukushima tsunami of March 2011.

- Hurricane Katrina.

- etc.

5 / 39


Indian Ocean Earthquake and Tsunami

Figure : December 26, 2004 Indian Ocean Tsunami.

- With a magnitude of 9.0 on the Richter scale, it left over 227 898 people confirmed dead.

- This makes it the fourth-largest death toll from an earthquake in recorded history.

6 / 39

Extremal events: Disasters

- Cancer rates at Fukushima worse than Chernobyl (Fukushima Watch, November 11, 2015); cancer rates spike in Fukushima prefecture.

Figure : Fukushima-Fire-Explosion-Radiation.

7 / 39


Theoretical Motivations

Objectives

- We are interested in a threshold value $x_T$ beyond which an extreme event can be predicted, based on the observed data.

Estimation procedure:
- Choose an appropriate parametric distribution function (model).

- Calibrate the model so that it reasonably describes the available data.

Estimation of the return value:
- Find reliable estimates of $x_T$ for large T, i.e. for rare events,

- even for T larger than the observation period.

- This implies extrapolating from observed to unobserved values.

- Extreme value theory provides a class of models to handle such extrapolation.

8 / 39


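Extremal types distributions

1 Gumbel distribution:

$$G(x) = \exp\left\{-\exp\left[-\left(\frac{x-\mu}{\sigma}\right)\right]\right\}, \quad x \in \mathbb{R},\ \sigma > 0,\ \mu \in \mathbb{R}.$$

2 Frechet distribution:

$$F(x) = \begin{cases} \exp\left\{-\left(\frac{x-\mu}{\sigma}\right)^{-\alpha}\right\}, & \text{if } x > \mu,\\ 0, & \text{if } x \le \mu, \end{cases} \quad \alpha > 0,\ \sigma > 0,\ \mu \in \mathbb{R}.$$

3 Weibull distribution:

$$W(x) = \begin{cases} \exp\left\{-\left(\frac{\mu-x}{\sigma}\right)^{\alpha}\right\}, & \text{if } x < \mu,\\ 1, & \text{if } x \ge \mu, \end{cases} \quad \alpha > 0,\ \sigma > 0,\ \mu \in \mathbb{R}.$$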

9 / 39
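As a quick numerical check of the three formulas above, here is a minimal Python sketch. The function names are illustrative (not from any library), and the parameterizations are assumed to be exactly those written on the slide, with the reversed Weibull form used for maxima.

```python
import math

def gumbel_cdf(x, mu=0.0, sigma=1.0):
    """Gumbel CDF: exp(-exp(-(x - mu) / sigma))."""
    return math.exp(-math.exp(-(x - mu) / sigma))

def frechet_cdf(x, alpha, mu=0.0, sigma=1.0):
    """Frechet CDF: exp(-((x - mu) / sigma) ** -alpha) for x > mu, else 0."""
    if x <= mu:
        return 0.0
    return math.exp(-((x - mu) / sigma) ** (-alpha))

def weibull_max_cdf(x, alpha, mu=0.0, sigma=1.0):
    """Reversed Weibull CDF: exp(-((mu - x) / sigma) ** alpha) for x < mu, else 1."""
    if x >= mu:
        return 1.0
    return math.exp(-((mu - x) / sigma) ** alpha)

# One scale unit away from mu (on the appropriate side), each CDF equals exp(-1) ~ 0.3679.
print(gumbel_cdf(0.0), frechet_cdf(1.0, alpha=2.0), weibull_max_cdf(-1.0, alpha=2.0))
```

The Frechet distribution is bounded below by µ and the Weibull distribution is bounded above by µ, which is why each needs a piecewise definition while the Gumbel distribution does not.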


Generalized Extreme Value (GEV) distribution

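- The cumulative distribution function is given by

$$\mathrm{GEV}(x, \xi, \mu, \sigma) = \begin{cases} \exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}, & \text{if } \xi \neq 0,\\ \exp\left\{-\exp\left[-\left(\frac{x-\mu}{\sigma}\right)\right]\right\}, & \text{if } \xi = 0, \end{cases}$$

defined, for ξ ≠ 0, on the set where $1 + \xi(x-\mu)/\sigma > 0$.

- The three families are recovered as follows:
  ξ = 0 ⇒ Gumbel(ξ, µ, σ), called GEV type I,
  ξ > 0 ⇒ Frechet(ξ, µ, σ), called GEV type II,
  ξ < 0 ⇒ Weibull(ξ, µ, σ), called GEV type III.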

10 / 39
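The GEV CDF can be sketched the same way, assuming the (ξ, µ, σ) parameterization above; gev_cdf and gev_family are hypothetical helper names. The loop illustrates that the ξ ≠ 0 branch tends to the Gumbel branch as ξ → 0.

```python
import math

def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    """GEV CDF; the xi == 0 case is the Gumbel limit."""
    s = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s))
    b = 1.0 + xi * s
    if b <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0) -> 0,
        # above the upper endpoint (xi < 0) -> 1.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-b ** (-1.0 / xi))

def gev_family(xi):
    """Name of the extremal type selected by the sign of xi."""
    if xi == 0.0:
        return "Gumbel (type I)"
    return "Frechet (type II)" if xi > 0 else "Weibull (type III)"

# As xi -> 0 the GEV CDF approaches the Gumbel CDF at every x.
for xi in (0.1, 0.01, 0.001):
    print(xi, gev_family(xi), abs(gev_cdf(1.5, xi) - gev_cdf(1.5, 0.0)))
```

When moving to a statistics library, check its sign convention for the shape: for example, scipy.stats.genextreme uses a shape parameter c equal to −ξ.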


Figure : PDF (panel "GEV: pdf") and corresponding CDF (panel "GEV: cdf") of the GEV distribution with µ = −1 and σ = 1, for ξ = 0, −0.5 and 0.5. 11 / 39

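Upper endpoint of a distribution: degenerate limit

- Let X1, ..., Xk be iid with the same distribution as X, with underlying CDF F.

Upper endpoint of F

$$x^* = \sup\{x \in \mathbb{R} : F(x) < 1\} \le +\infty. \qquad (2)$$

- And

$$Z_k = \max_{1 \le i \le k} X_i \xrightarrow{\mathcal{L}} x^*, \quad \text{as } k \to +\infty. \qquad (3)$$

- Indeed, $F_{Z_k}(z) = P\{X_1 \le z, \dots, X_k \le z\} = F(z)^k$, so that

$$F_{Z_k}(z) \xrightarrow[k \to \infty]{} G(z) = \begin{cases} 0, & \text{if } z < x^*,\\ 1, & \text{if } z \ge x^*. \end{cases} \qquad (4)$$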

12 / 39
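The degenerate limit (4) follows from $F_{Z_k}(z) = F(z)^k$. A small sketch, assuming a Uniform(0, 1) parent so that x* = 1 (max_cdf and uniform_cdf are illustrative names):

```python
def max_cdf(F, z, k):
    """CDF of Z_k = max(X_1, ..., X_k) for iid X_i with CDF F: F(z) ** k."""
    return F(z) ** k

def uniform_cdf(z):
    """Uniform(0, 1) CDF, whose upper endpoint is x* = 1."""
    return min(max(z, 0.0), 1.0)

# For any z < x* the CDF of the maximum collapses to 0 as k grows,
# while it stays equal to 1 for z >= x*: the limit is degenerate.
for k in (10, 100, 1000):
    print(k, max_cdf(uniform_cdf, 0.99, k), max_cdf(uniform_cdf, 1.0, k))
```

This is why the maxima must be centred and rescaled before a non-degenerate limit can appear.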


General theorem in EVT: the Fisher-Tippett-Gnedenko theorem

1 Let X1, ..., Xk be a sequence of independent and identically distributed (iid) real random variables and

$$Z_k = \max_{1 \le i \le k} X_i.$$

2 If there exist sequences $(a_k, b_k)_{k \in \mathbb{N}}$, with $a_k > 0$ for each $k \in \mathbb{N}$, such that

$$\lim_{k \to +\infty} P\left\{\frac{Z_k - b_k}{a_k} \le x\right\} = F(x), \qquad (5)$$

where F is a non-degenerate cumulative distribution function,

3 then the limit distribution F belongs to either the Gumbel, Frechet or Weibull family, that is

$$F(x) \equiv \mathrm{GEV}(x, \xi, \mu, \sigma).$$

13 / 39
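To illustrate why the normalizing sequences in (5) are needed, consider iid Exp(1) variables (an assumed example, not from the slides). With $a_k = 1$ and $b_k = \log k$, $P\{Z_k - b_k \le x\} = (1 - e^{-x}/k)^k$, which tends to the Gumbel CDF $\exp(-e^{-x})$. The sketch evaluates this exactly, without simulation; the function names are illustrative.

```python
import math

def normalized_max_cdf_exp(x, k):
    """P((Z_k - log k) <= x) for Z_k the max of k iid Exp(1) variables.

    Exact value: F(x + log k) ** k with F(t) = 1 - exp(-t) for t > 0.
    """
    t = x + math.log(k)
    if t <= 0:
        return 0.0
    return (1.0 - math.exp(-t)) ** k

def gumbel_cdf(x):
    return math.exp(-math.exp(-x))

# The largest discrepancy over a few x values shrinks roughly like 1/k.
for k in (10, 1000, 100000):
    err = max(abs(normalized_max_cdf_exp(x, k) - gumbel_cdf(x)) for x in (-1.0, 0.0, 1.0, 2.0))
    print(k, err)
```

Here the limit is of Gumbel type (ξ = 0), as the theorem predicts for a light, exponentially decaying tail.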


Statistical Modelling for Extremes

Extreme value modelling

- Consider iid observations X1, X2, ..., Xk, with k = m × n.

Block maxima method (BM), for example:
- the daily maximum temperature or precipitation,

- the monthly or annual maximum temperature or precipitation.

Peaks over threshold method (POT)

Other approaches to modelling extremes:
- the Poisson point process method, etc.

14 / 39
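A minimal sketch of forming block maxima from a series of k = m × n observations, i.e. m blocks of n values each. block_maxima is a hypothetical helper, and fixed-length blocks are an idealization: calendar years have 365 or 366 days, so for annual maxima one would group by year instead.

```python
def block_maxima(x, n):
    """Return the maximum of each consecutive block of n values of x.

    Assumes len(x) == m * n; a trailing partial block is rejected
    rather than silently dropped.
    """
    if n <= 0 or len(x) % n != 0:
        raise ValueError("len(x) must be a positive multiple of the block size n")
    return [max(x[i:i + n]) for i in range(0, len(x), n)]

print(block_maxima([1, 5, 2, 7, 3, 4], 3))
```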

Block Maximum Approach

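Extreme value modelling: Block Maxima Approach (BM)

- Let us set

$$Z_{t,n} = \max_{(t-1)n+1 \le j \le tn} X_j, \quad t = 1, 2, \dots, m, \qquad (6)$$

- and assume the new observations $Z_{1,n}, \dots, Z_{m,n}$ are iid with distribution GEV(ξ, µ, σ).

- The log-likelihood function for the BM approach, given observed maxima $z_1, \dots, z_m$, is then

$$L_1(\theta) = -m\log\sigma - \sum_{t=1}^{m} B_t^{-1/\xi} - \left(1 + \frac{1}{\xi}\right)\sum_{t=1}^{m}\log B_t, \qquad (7)$$

- where $B_t = 1 + \xi\left(\frac{z_t - \mu}{\sigma}\right) > 0$ (for ξ ≠ 0).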

15 / 39
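Eq. (7) translates directly into code. The sketch below (hypothetical name gev_loglik) assumes θ = (ξ, µ, σ) and returns −∞ outside the parameter space; the ξ = 0 branch uses the Gumbel log-likelihood, which (7) itself does not cover.

```python
import math

def gev_loglik(theta, z):
    """Block-maxima log-likelihood L_1(theta) of Eq. (7), theta = (xi, mu, sigma).

    Returns -inf when sigma <= 0 or some B_t = 1 + xi (z_t - mu) / sigma <= 0.
    """
    xi, mu, sigma = theta
    if sigma <= 0:
        return -math.inf
    m = len(z)
    if abs(xi) < 1e-10:
        # Gumbel limit (xi = 0): -m log sigma - sum(s_t) - sum(exp(-s_t)).
        s = [(zt - mu) / sigma for zt in z]
        return -m * math.log(sigma) - sum(s) - sum(math.exp(-st) for st in s)
    b = [1.0 + xi * (zt - mu) / sigma for zt in z]
    if min(b) <= 0:
        return -math.inf
    return (-m * math.log(sigma)
            - sum(bt ** (-1.0 / xi) for bt in b)
            - (1.0 + 1.0 / xi) * sum(math.log(bt) for bt in b))
```

Each term is the log of the GEV density $\sigma^{-1} B_t^{-1/\xi - 1}\exp(-B_t^{-1/\xi})$, so the function is simply the sum of log-densities of the block maxima.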


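Block Maxima Approach (BM): estimation of model parameters by ML

- The ML estimates $\hat\theta = (\hat\xi, \hat\mu, \hat\sigma)$ are found by solving the optimization problem

$$\hat\theta = \arg\max_{\theta = (\xi, \mu, \sigma)} L_1(\theta). \qquad (8)$$

- For inference and confidence interval estimation, the asymptotic normality of ML estimators (a multivariate central limit argument) gives

$$\sqrt{m}\left(\hat\Theta - \theta\right) \approx \mathcal{N}(0, \Omega), \qquad (9)$$

- where Ω is the asymptotic variance-covariance matrix of $\sqrt{m}\,\hat\Theta$, the inverse of the expected information of a single block maximum:

$$\Omega = \left[\mathbb{E}\left(I_1(\theta)\right)\right]^{-1}.$$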

16 / 39
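The fits reported later appear to have been produced in R (a diagnostic plot is labelled fevd(..., method = "MLE"), from the extRemes package). As a self-contained alternative, the sketch below maximizes (7) with a hand-written Nelder-Mead simplex search. The optimizer, the starting values (Gumbel moment estimates) and all function names are assumptions made for illustration, not the authors' procedure.

```python
import math
import random

def gev_nll(theta, z):
    """Negative of L_1(theta) in Eq. (7); +inf outside the parameter space."""
    xi, mu, sigma = theta
    if sigma <= 0:
        return math.inf
    total = len(z) * math.log(sigma)
    for zt in z:
        s = (zt - mu) / sigma
        if abs(xi) < 1e-10:  # Gumbel limit
            total += s + math.exp(-s)
            continue
        b = 1.0 + xi * s
        if b <= 0:
            return math.inf
        total += b ** (-1.0 / xi) + (1.0 + 1.0 / xi) * math.log(b)
    return total

def nelder_mead(f, x0, step=0.1, tol=1e-8, max_iter=10000):
    """Minimal Nelder-Mead minimiser (reflect, expand, inside contract, shrink)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        x = list(x0)
        x[i] += step * abs(x[i]) if x[i] != 0 else step
        simplex.append(x)
    values = [f(x) for x in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        refl = [c + (c - w) for c, w in zip(centroid, worst)]
        f_refl = f(refl)
        if f_refl < values[0]:
            expd = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_expd = f(expd)
            simplex[-1], values[-1] = (expd, f_expd) if f_expd < f_refl else (refl, f_refl)
        elif f_refl < values[-2]:
            simplex[-1], values[-1] = refl, f_refl
        else:
            contr = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_contr = f(contr)
            if f_contr < values[-1]:
                simplex[-1], values[-1] = contr, f_contr
            else:  # shrink all vertices towards the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (p - b) for b, p in zip(best, q)]
                                    for q in simplex[1:]]
                values = [values[0]] + [f(q) for q in simplex[1:]]
    i_best = min(range(n + 1), key=values.__getitem__)
    return simplex[i_best], values[i_best]

def fit_gev(z):
    """ML estimates (xi, mu, sigma), started from Gumbel moment estimates."""
    mean = sum(z) / len(z)
    sd = math.sqrt(sum((v - mean) ** 2 for v in z) / (len(z) - 1))
    sigma0 = sd * math.sqrt(6.0) / math.pi
    mu0 = mean - 0.5772 * sigma0
    theta, _ = nelder_mead(lambda t: gev_nll(t, z), [0.1, mu0, sigma0])
    return tuple(theta)

# Usage on synthetic GEV(xi=0.1, mu=10, sigma=2) maxima drawn by inverse CDF.
rng = random.Random(1)
z = [10 + 2 * ((-math.log(rng.random())) ** -0.1 - 1) / 0.1 for _ in range(500)]
print(fit_gev(z))
```

A derivative-free search is used only to keep the sketch dependency-free; a quasi-Newton optimizer from a numerical library would usually converge faster.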


ML estimates of θ = (ξ, µ, σ): how accurate are the parameter estimates?

- From the approximate normality of ML estimators, a 100(1 − α)% confidence interval for the GEV parameters θ1 = ξ, θ2 = µ, θ3 = σ is given by

$$\hat\theta_i \pm \Phi^{-1}\left(1 - \frac{\alpha}{2}\right)\sqrt{\frac{\omega_{ii}}{m}}, \quad i = 1, 2, 3, \qquad (10)$$

- where Φ is the cumulative distribution function of the standard normal distribution, and $\omega_{ii}$ denotes the i-th diagonal element of Ω.

17 / 39
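Intervals of the form (10) can be approximated by replacing the expected information with the observed information, i.e. the Hessian of the negative log-likelihood at θ̂. A sketch under that assumption, using central finite differences and a small Gauss-Jordan inverse (all names are illustrative):

```python
import math
from statistics import NormalDist

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of the scalar function f at the point x."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def f_shift(di, dj):
                y = list(x)
                y[i] += di
                y[j] += dj
                return f(y)
            hess[i][j] = (f_shift(h, h) - f_shift(h, -h)
                          - f_shift(-h, h) + f_shift(-h, -h)) / (4.0 * h * h)
    return hess

def invert(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a_rc - factor * b for a_rc, b in zip(m[r], m[col])]
    return [row[n:] for row in m]

def wald_intervals(nll, theta_hat, alpha=0.05, h=1e-4):
    """Intervals theta_i +/- Phi^{-1}(1 - alpha/2) * se_i, in the spirit of Eq. (10)."""
    cov = invert(numerical_hessian(nll, list(theta_hat), h))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return [(t - z * math.sqrt(cov[i][i]), t + z * math.sqrt(cov[i][i]))
            for i, t in enumerate(theta_hat)]
```

In practice nll would be the negative of L1(θ) in (7), evaluated near θ̂. The inverse Hessian of the total negative log-likelihood already estimates Ω/m, so no extra division by m is needed. With parameters on very different scales (µ in the thousands, ξ near 0.1) the step h should be scaled per parameter. The ML intervals in the yearly-maxima parameter table are symmetric about the estimates, which is consistent with this Wald form.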


Peak over Threshold Approach

POT framework

- Consider a high threshold u. We want to use all the excesses $X_i - u$ for those $X_i > u$, as well as the indicator of $X_i \le u$ for the observations below u, to estimate the parameters.

- It can be shown (Pickands, 1975) that, for a sufficiently high threshold u,

$$F_u(y) = P\{X_i - u \le y \mid X_i > u\} \approx 1 - \left(1 + \frac{\xi y}{\tilde\sigma}\right)^{-1/\xi}, \quad y > 0, \qquad (11)$$

- which is the CDF of a generalized Pareto distribution (GPD) with location 0, scale $\tilde\sigma = \sigma + \xi(u - \mu)$ and shape ξ.

- $F_u(y)$ can be interpreted as the probability that a value (e.g. a temperature or a precipitation amount) exceeds the threshold u by no more than y, given that u is exceeded.

18 / 39
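The GPD CDF (11) and its link to the GEV can be checked numerically. Assuming X ~ GEV(ξ, µ, σ) with ξ ≠ 0 (illustrative values below), the exact excess distribution P{X − u ≤ y | X > u} is compared with the GPD whose scale is σ + ξ(u − µ); the function names are hypothetical.

```python
import math

def gpd_cdf(y, xi, sigma_u):
    """GPD CDF of Eq. (11) for y > 0; xi == 0 gives the exponential limit."""
    if y <= 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / sigma_u)
    b = 1.0 + xi * y / sigma_u
    if b <= 0:  # beyond the upper endpoint when xi < 0
        return 1.0
    return 1.0 - b ** (-1.0 / xi)

def gev_excess_cdf(y, u, xi, mu, sigma):
    """Exact P(X - u <= y | X > u) for X ~ GEV(xi, mu, sigma), xi != 0, inside the support."""
    def survival(x):
        t = (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi)
        return -math.expm1(-t)  # 1 - exp(-t), accurate for small t
    return 1.0 - survival(u + y) / survival(u)

# For a high threshold u, the GEV excess distribution is close to a GPD
# with scale sigma_u = sigma + xi * (u - mu).
xi, mu, sigma, u = 0.2, 0.0, 1.0, 50.0
sigma_u = sigma + xi * (u - mu)
for y in (1.0, 10.0, 50.0):
    print(y, gev_excess_cdf(y, u, xi, mu, sigma), gpd_cdf(y, xi, sigma_u))
```

The agreement improves as u grows, because $1 - \exp(-t) \approx t$ for the small tail values t involved.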


Model adequacy or goodness of fit

- As with all statistical models, several goodness-of-fit diagnostics should be considered to check the overall adequacy of the fitted model $\hat F$, where $x_{(1)} \le \dots \le x_{(n)}$ denote the ordered observations:

1 the probability plot, i.e. the plot of the points

$$\left\{\left(\hat F(x_{(i)}), \frac{i}{n+1}\right) : i = 1, 2, \dots, n\right\}, \qquad (12)$$

2 the quantile-quantile (Q-Q) plot, i.e. the plot of the points

$$\left\{\left(\hat F^{-1}\left(\frac{i}{n+1}\right), x_{(i)}\right) : i = 1, 2, \dots, n\right\}, \qquad (13)$$

3 and simply a histogram of the data plotted against the fitted density.

19 / 39
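The points (12) and (13) are straightforward to compute for any fitted model that provides a CDF and a quantile function. A minimal sketch, with a Gumbel(0, 1) model assumed purely for illustration (pp_qq_points is a hypothetical name):

```python
import math

def pp_qq_points(data, cdf, quantile):
    """Points for the probability plot (12) and the Q-Q plot (13).

    Uses plotting positions i / (n + 1) for the ordered sample x_(1) <= ... <= x_(n).
    """
    x = sorted(data)
    n = len(x)
    positions = [i / (n + 1) for i in range(1, n + 1)]
    pp = [(cdf(xv), p) for xv, p in zip(x, positions)]
    qq = [(quantile(p), xv) for xv, p in zip(x, positions)]
    return pp, qq

# Example with an assumed fitted Gumbel(mu=0, sigma=1) model.
gumbel_cdf = lambda x: math.exp(-math.exp(-x))
gumbel_quantile = lambda p: -math.log(-math.log(p))
pp, qq = pp_qq_points([0.3, -0.5, 1.8, 0.9], gumbel_cdf, gumbel_quantile)
print(pp)
print(qq)
```

If the model fits well, both sets of points lie close to the diagonal line.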


Notion of Return Value and its estimation

Return value with the corresponding return period

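- Let p be the probability that $Z_{t,n}$ exceeds a level $x_p$, that is

$$p = P\{Z_{t,n} > x_p\}.$$

- p is called the upper tail probability; solving $\mathrm{GEV}(x_p, \xi, \mu, \sigma) = 1 - p$ gives

$$x_p = \begin{cases} \mu + \frac{\sigma}{\xi}\left(\left[-\log(1-p)\right]^{-\xi} - 1\right), & \text{for } \xi \neq 0,\\ \mu - \sigma\log\left(-\log(1-p)\right), & \text{for } \xi = 0. \end{cases} \qquad (14)$$

- $x_T \equiv x_p$ is called the return value (or return level) with return period T = 1/p: it is exceeded on average once every T periods (blocks).

- Equivalently, the average waiting time between two exceedances of $x_T$ is T periods.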

20 / 39
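Eq. (14) in code, as a sketch (return_level is an illustrative name). Plugging in the ML estimates from the yearly-maxima parameter table (ξ = −0.076, µ = 7963.50, σ = 1412.97) reproduces the reported return levels:

```python
import math

def return_level(p, xi, mu, sigma):
    """Return level x_p of Eq. (14), exceeded by a block maximum with probability p."""
    y_p = -math.log(1.0 - p)
    if xi == 0.0:
        return mu - sigma * math.log(y_p)
    return mu + sigma / xi * (y_p ** (-xi) - 1.0)

# Fraser River yearly maxima (ML estimates); return period T = 1/p years.
for T in (2, 10, 100):
    print(T, round(return_level(1.0 / T, -0.076, 7963.50, 1412.97), 1))
```

The computed values agree with the reported ones to within a few m³/s, presumably because the published parameter estimates are rounded.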


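Estimation of the return level: how accurate are the return value estimates?

- The return level $x_T$ is the level whose expected waiting time until the next exceedance is T periods.

- So, for ξ ≠ 0, using the estimates $\hat\theta = (\hat\xi, \hat\mu, \hat\sigma)$ above, $x_T$ can be estimated by

$$\hat x_T \equiv \hat x_p = \hat\mu + \frac{\hat\sigma}{\hat\xi}\left(\left[-\log(1-p)\right]^{-\hat\xi} - 1\right).$$

- Writing $x_p = g(\theta) = \mu + \frac{\sigma}{\xi}\left(\left[-\log(1-p)\right]^{-\xi} - 1\right)$ (for ξ ≠ 0),

- the delta method gives

$$\mathrm{Var}(\hat x_p) \approx \frac{1}{m}\left(\nabla g(\theta)\right)^{\top}\Omega\,\nabla g(\theta).$$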

21 / 39
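The delta-method variance needs ∇g(θ) for θ = (ξ, µ, σ). Differentiating g gives ∂g/∂µ = 1, ∂g/∂σ = (y_p^{−ξ} − 1)/ξ and ∂g/∂ξ = −σξ^{−2}(y_p^{−ξ} − 1) − σξ^{−1} y_p^{−ξ} log y_p, with y_p = −log(1 − p). A sketch with hypothetical names, where cov is an estimate of the covariance matrix of θ̂ (i.e. of Ω/m):

```python
import math

def return_level_gradient(p, xi, mu, sigma):
    """Gradient of g(theta) = mu + sigma/xi * (y_p^(-xi) - 1), theta = (xi, mu, sigma), xi != 0."""
    y_p = -math.log(1.0 - p)
    a = y_p ** (-xi)
    d_xi = -sigma / xi ** 2 * (a - 1.0) - sigma / xi * a * math.log(y_p)
    d_mu = 1.0
    d_sigma = (a - 1.0) / xi
    return [d_xi, d_mu, d_sigma]

def delta_method_se(grad, cov):
    """Standard error sqrt(grad^T cov grad) of the return level estimate."""
    n = len(grad)
    var = sum(grad[i] * cov[i][j] * grad[j] for i in range(n) for j in range(n))
    return math.sqrt(var)

# Hypothetical covariance matrix, only to show the call pattern.
grad = return_level_gradient(0.01, -0.076, 7963.50, 1412.97)
cov = [[0.0035, 0.0, 0.0], [0.0, 26000.0, 0.0], [0.0, 0.0, 12000.0]]
print(delta_method_se(grad, cov))
```

A 95% interval then follows as $\hat x_p \pm 1.96\,\mathrm{se}$; for long return periods, profile-likelihood intervals are often preferred because the sampling distribution of $\hat x_p$ can be skewed.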


Applications to the Sea Data

The data

Origin of the data and Description

- Data (1912-2003): daily flow rate (m³/s) of the Fraser River at the Hope station, which drains an area of 220 000 km².

- Variable: daily maximum flow rate.

Figure : The map of Fraser River.

22 / 39


Description of the data and visualization

Figure : Scatter plot of the data (x-axis: time in days).

Figure : Monthly time series plots of the data (January to December).

23 / 39


BM Approach: Yearly maxima

Figure : Yearly maxima of the daily flow rate of the Fraser River in British Columbia, 1912-2003.

24 / 39

Results

BM Approach: estimated parameters

Method      Row        Shape (ξ)          Location (µ)          Scale (σ)
ML          Estimate   -0.076             7963.50               1412.97
            95% CI     (−0.192, 0.041)    (7644.52, 8282.48)    (1195.35, 1630.58)
L-moments   Estimate   -0.068             7963.50               1412.96
            95% CI     (−0.220, 0.070)    (7647.26, 8261.46)    (1190.40, 1661.41)

Table : Yearly maxima: estimated parameters θ = (ξ, µ, σ) of the GEV distribution and their 95% confidence intervals, using both the ML and L-moments methods.

25 / 39
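The table also reports L-moment estimates. As an illustration only, the sketch below computes sample L-moments from probability-weighted moments and converts them to GEV parameters with the rational approximation of Hosking, Wallis and Wood (1985). It is not necessarily the routine behind the reported numbers, and the function names are hypothetical.

```python
import math

def sample_l_moments(data):
    """First three sample L-moments (lambda1, lambda2, lambda3) via probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum((i / (n - 1)) * x[i] for i in range(n)) / n
    b2 = sum((i * (i - 1) / ((n - 1) * (n - 2))) * x[i] for i in range(n)) / n
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0

def gev_l_moment_fit(data):
    """GEV estimates (xi, mu, sigma) from L-moments (Hosking's approximation).

    Hosking's shape k has the opposite sign to the xi used in these slides (xi = -k).
    """
    l1, l2, l3 = sample_l_moments(data)
    t3 = l3 / l2  # sample L-skewness
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-8:  # Gumbel case
        sigma = l2 / math.log(2.0)
        return 0.0, l1 - 0.5772156649 * sigma, sigma
    g = math.gamma(1.0 + k)
    sigma = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    mu = l1 - sigma * (1.0 - g) / k
    return -k, mu, sigma
```

L-moment estimators need no numerical optimization and behave well in small samples, which is one reason they are often reported alongside ML estimates, as in the table.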

Adequacy of fit for GEV model with the yearly maxima

Figure : Diagnostic plots of the fit of the GEV distribution: model vs empirical quantiles; quantiles from model-simulated data vs empirical quantiles (1-1 line, regression line, 95% confidence bands); empirical vs modeled density (N = 92, bandwidth = 588.6); return level vs return period (years). Fitted with fevd(x = DailyFlowRate, data = BM.FR.Hope1, method = "MLE"). 26 / 39

Return level estimation: Fraser River

- We can now provide estimates of the daily flow rate of the Fraser River, to protect the cities along the river against the floods we would expect to see

- once in T = 10 years, once in T = 100 years, etc.

T (years)   T-year return level (m³/s)   95% CI
2           8 474.247                    (8 130.125, 8 818.369)
5           9 966.811                    (9 508.747, 10 424.876)
10          10 886.754                   (10 305.199, 11 468.308)
20          11 721.342                   (10 964.387, 12 478.297)
50          12 736.158                   (11 664.145, 13 808.172)
100         13 450.993                   (12 083.758, 14 818.227)

Table : Some estimated return levels with their 95% confidence intervals.

27 / 39


Data: monthly maxima; adequacy of fit of the GEV model to the monthly maxima

        Date      JulianDay  True.Date  Year  Month.num  Month.cha  Day  DailyFlowRate
21938   72-03-23  83         812        1972  3          Mar        23   3170
8096    34-04-30  120        -13030     1934  4          Apr        30   8240
13241   48-05-31  152        -7885      1948  5          May        31   15200
13242   48-06-01  153        -7884      1948  6          Jun        1    14800
15828   55-07-01  182        -5298      1955  7          Jul        1    11100
3077    20-08-02  215        -18049     1920  8          Aug        2    7650
25764   82-09-13  256        4638       1982  9          Sep        13   6260
3148    20-10-12  286        -17978     1920  10         Oct        12   5320
17416   59-11-05  309        -3710      1959  11         Nov        5    4130
25139   80-12-27  362        4013       1980  12         Dec        27   4210
25144   81-01-01  1          4018       1981  1          Jan        1    2760
18242   62-02-08  39         -2884      1962  2          Feb        8    2940

Table : Monthly maxima of the daily flow rate of the Fraser River, British Columbia; data recorded from March 1912 to December 2003.

Figure : Diagnostic plots of the fit of the GEV model to the monthly maxima of the daily flow rate of the Fraser River in British Columbia, 1912-2003 (same panels as for the yearly maxima; density panel: N = 12, bandwidth = 2070).

28 / 39


POT Approach: Threshold selection

- Only 14 observations exceed 12 500 m³/s, and 99 values exceed 10 500 m³/s.

- Therefore, to ensure we have enough data and to make the mean residual life plot easier to interpret, we restrict the threshold range to 350 m³/s (the minimum value) to 10 500 m³/s.

Figure : Threshold selection diagnostic plots (fitted parameters against the threshold, from 350 to 10 500 m³/s), produced with threshrange.plot(x = DailyFlowRate.FR, r = c(350, 10500), type = "GP", set.panels = F).

29 / 39
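The mean residual life plot shows the sample mean excess e(u) = mean(x_i − u | x_i > u) against u. For a GPD with ξ < 1 the mean excess is linear in u, so approximate linearity above a candidate threshold supports the POT model. A minimal sketch (illustrative names), which also returns the number of exceedances, i.e. the kind of counts quoted above:

```python
def mean_excess(data, u):
    """Sample mean residual life e(u) = mean(x - u | x > u) and the number of exceedances."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        return float("nan"), 0
    return sum(excesses) / len(excesses), len(excesses)

def mean_residual_life(data, thresholds):
    """(u, e(u), N_u) triples to plot e(u) against u."""
    return [(u,) + mean_excess(data, u) for u in thresholds]

print(mean_residual_life([1.0, 2.0, 3.0, 4.0, 10.0], [0.0, 2.5, 3.0]))
```

As u increases, the number of exceedances N_u drops and e(u) becomes noisy, which is why the upper end of the threshold range is restricted.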


POT Approach: Estimated parameters

- The threshold selected for the POT model is u = 7 550 m³/s.

Parameter   Shape (ξ)           Scale (σu)
Estimate    -0.0734             1282.08
95% CI      (−0.120, −0.027)    (1192.55, 1371.61)

Table : Estimated parameters ξ and σu of the GPD fitted to the daily flow rates of the Fraser River exceeding the threshold u = 7 550 m³/s, with 95% confidence intervals.

30 / 39


POT Approach: Adequacy of the model

- Model.1, which incorporates seasonality, models the scale parameter σu as

$$\sigma_u(t) = \exp\left(\phi_0 + \phi_1\cos\left(\frac{2\pi t}{365.25}\right) + \phi_2\sin\left(\frac{2\pi t}{365.25}\right)\right), \qquad (16)$$

where t = 1, 2, ..., 365.

Figure : Diagnostic plots for the fit of the GPD model.

31 / 39
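Eq. (16) as a function, in a sketch with hypothetical names. The exponential link keeps σu(t) positive for any φ's; peak_day (an added convenience, not from the slides) locates the day of the year at which the scale is largest.

```python
import math

def seasonal_scale(t, phi0, phi1, phi2, period=365.25):
    """sigma_u(t) of Eq. (16); t is the day of the year."""
    angle = 2.0 * math.pi * t / period
    return math.exp(phi0 + phi1 * math.cos(angle) + phi2 * math.sin(angle))

def peak_day(phi1, phi2, period=365.25):
    """Day at which phi1*cos + phi2*sin (hence sigma_u) is largest, in [0, period)."""
    return (period * math.atan2(phi2, phi1) / (2.0 * math.pi)) % period

# Hypothetical coefficients, only to show the shape of the seasonal cycle.
for day in (1, 91, 182, 274):
    print(day, round(seasonal_scale(day, 7.0, -0.3, 0.4), 1))
```

Writing the harmonic as $\phi_1\cos + \phi_2\sin = R\cos(\text{angle} - \text{atan2}(\phi_2, \phi_1))$ is what makes the peak-day formula work.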


Current developments: Additional Remarks

Review of MLE for BM method: Additional Remarks

- The idea is to use likelihood-based methods to estimate non-stationarity effects (time dependence).

- For the location parameter µ, we propose on the one hand (trend effect):

$$\mu(t) = \phi_0 + \phi_1 t, \quad \text{for a linear trend},$$

$$\mu(t) = \phi_0 + \phi_1 t + \phi_2 t^2, \quad \text{for a quadratic trend},$$

- and on the other hand (seasonality effect):

$$\mu(t) = \phi_0 + \phi_1\cos(2\pi t) + \phi_2\sin(2\pi t). \qquad (17)$$

- So θ = (φ0, φ1, φ2, σ0, ξ0), with φ0, φ1, φ2 the parameters linking the location µ(t) to time through the above regression model.

32 / 39
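A sketch of the non-stationary location models and of a GEV log-likelihood in which µ is replaced by µ(t) for each observation, assuming ξ ≠ 0 and a constant σ. The names are illustrative, and t is measured in the units implied by (17), i.e. one seasonal period per unit of t.

```python
import math

def mu_linear(t, phi0, phi1):
    return phi0 + phi1 * t

def mu_quadratic(t, phi0, phi1, phi2):
    return phi0 + phi1 * t + phi2 * t * t

def mu_seasonal(t, phi0, phi1, phi2):
    """Eq. (17); t in units of the seasonal period."""
    return phi0 + phi1 * math.cos(2 * math.pi * t) + phi2 * math.sin(2 * math.pi * t)

def nonstationary_gev_loglik(z, times, mu_fn, phi, sigma, xi):
    """GEV log-likelihood with location mu(t) = mu_fn(t, *phi), constant sigma, xi != 0."""
    if sigma <= 0:
        return -math.inf
    total = 0.0
    for zt, t in zip(z, times):
        b = 1.0 + xi * (zt - mu_fn(t, *phi)) / sigma
        if b <= 0:
            return -math.inf
        total += -math.log(sigma) - b ** (-1.0 / xi) - (1.0 + 1.0 / xi) * math.log(b)
    return total
```

Maximizing this over (φ, σ, ξ) gives the non-stationary fit; comparing the maximized log-likelihood with that of the stationary model (φ1 = φ2 = 0) by a likelihood-ratio test is a natural way to assess the trend or seasonal effect.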

Review of POT approach: Formulation of the idea

Recall: James Pickands, 1975

The conditional distribution of $X_i - u$ given $X_i > u$, i = 1, 2, ..., n, is approximately a generalized Pareto distribution with location 0, scale $\sigma_u = \sigma + \xi(u - \mu)$ and shape ξ.

- To estimate the model parameters θ = (ξ, µ, σ), we want to use all the excesses $Y_i = X_i - u$ for those $X_i > u$, as well as the observations $X_i \le u$, to compute the likelihood function $L_2(\theta)$.

- Let us consider the indicator variable

$$\Delta_i = \mathbb{1}_{\{X_i > u\}} = \begin{cases} 1, & \text{if } X_i \text{ exceeds } u,\\ 0, & \text{otherwise}. \end{cases} \qquad (18)$$

33 / 39


Review of POT approach: Formulation of the idea...

- So $P\{\Delta_i = 1\} = P\{X_i > u\} = p_n$, from Eqn. (??), and then $P\{\Delta_i = 0\} = 1 - p_n$.

- The observed data we actually work with are the iid observations $(x_1 - u, \delta_1), \dots, (x_n - u, \delta_n)$ of $(Y_1, \Delta_1), \dots, (Y_n, \Delta_n)$, respectively.

- Since we need to work with these new data, we define $f(y, \delta)$, the joint probability density function of $(Y, \Delta) = (X - u, \Delta)$.

- To derive this density, we assume that, for a given threshold u,

$$P\{y \le Y < y + h \mid \Delta = 0\} = P\{y \le Y < y + h \mid X \le u\} \approx h \qquad (19)$$

for any h > 0 close to 0.

34 / 39


Recall

Let us recall that the density function of (Y, ∆) is defined as

$$f(y, \delta) = \lim_{h \to 0} \frac{P\{y \le Y < y + h, \Delta = \delta\}}{h}, \quad \text{with } \delta \in \{0, 1\}. \qquad (20)$$

- Case 1: δ = 1, that is y > 0, we have $f(y, 1) = f_u(y)\,p_n$.

- Case 2: δ = 0, that is y ≤ 0, we have $f(y, 0) = 1 - p_n$.

- Therefore, the probability density function of (Y, ∆) is given by

$$f(y, \delta) = \left(f_u(y)\,p_n\right)^{\delta}\left(1 - p_n\right)^{1-\delta}. \qquad (21)$$

- Recall that pn and fu(y) are respectively ...

35 / 39
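Eq. (21) can be written as a small function. Because the definition of p_n is elided on the slide, the sketch takes p_n as an input; f_u is assumed to be the GPD density corresponding to (11), with scale σu and ξ ≠ 0, and all names are hypothetical.

```python
def gpd_pdf(y, xi, sigma_u):
    """GPD density f_u(y) for y > 0 and xi != 0 (zero outside the support)."""
    b = 1.0 + xi * y / sigma_u
    if y <= 0 or b <= 0:
        return 0.0
    return b ** (-1.0 / xi - 1.0) / sigma_u

def joint_density(y, delta, p_n, xi, sigma_u):
    """f(y, delta) = (f_u(y) p_n)^delta * (1 - p_n)^(1 - delta), Eq. (21)."""
    if delta == 1:
        return gpd_pdf(y, xi, sigma_u) * p_n
    return 1.0 - p_n

# Hypothetical values: an exceedance by 1 unit, and a non-exceedance.
print(joint_density(1.0, 1, 0.05, 0.1, 2.0), joint_density(-3.0, 0, 0.05, 0.1, 2.0))
```

For δ = 0 the value does not depend on y, reflecting assumption (19) that below-threshold observations contribute only through the probability 1 − p_n.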


Review of POT approach: ML Estimation

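- So, given the new iid observations $(x_1 - u, \delta_1), \dots, (x_n - u, \delta_n)$, with $y_i = x_i - u$, the likelihood $L_2(\theta)$ is defined by

$$L_2(\theta) = \prod_{i=1}^{n} f(y_i, \delta_i) = \prod_{i=1}^{n}\left(f_u(x_i - u)\,p_n\right)^{\delta_i}\left(1 - p_n\right)^{1-\delta_i}.$$

- Therefore the log-likelihood is

$$\log L_2(\theta) = (n - N_u)\log(1 - p_n) - N_u\log(n\sigma) - \left(1 + \frac{1}{\xi}\right)\sum_{\substack{i=1\\ \delta_i = 1}}^{n}\log D_i, \qquad (22)$$

- where $D_i = 1 + \xi\left(\frac{x_i - \mu}{\sigma}\right)$, and $N_u = \sum_{i=1}^{n}\delta_i$ is the number of observations that exceed the threshold u.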

36 / 39
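A direct transcription of (22), again taking p_n as an input since its definition is not reproduced on the slides; the function name is hypothetical and ξ ≠ 0 is assumed. In a full fit, p_n would depend on θ and the function would be maximized over θ.

```python
import math

def pot_censored_loglik(x, u, xi, mu, sigma, p_n):
    """Log-likelihood of Eq. (22) for observations x and threshold u (xi != 0)."""
    n = len(x)
    exceedances = [v for v in x if v > u]  # observations with delta_i = 1
    n_u = len(exceedances)                 # N_u = sum of delta_i
    d = [1.0 + xi * (v - mu) / sigma for v in exceedances]
    if sigma <= 0 or not 0.0 < p_n < 1.0 or any(di <= 0 for di in d):
        return -math.inf
    return ((n - n_u) * math.log(1.0 - p_n)
            - n_u * math.log(n * sigma)
            - (1.0 + 1.0 / xi) * sum(math.log(di) for di in d))

print(pot_censored_loglik([1.0, 2.0, 5.0, 6.0], 4.0, 0.5, 0.0, 2.0, 0.5))
```

Observations below u enter only through the count n − N_u, so the exact below-threshold values are not needed, which is the practical appeal of this censored formulation.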


Summary

Modelling Challenges

Covariate effects and cluster dependence:
- location, direction, seasonality, ...

- multiple covariates in practice,

- e.g. independent storms observed many times at many locations.

Other challenges:
- threshold estimation,

- parameter estimation.

37 / 39


Key References

J. Pickands. Statistical inference using extreme order statistics. The Annals of Statistics, 3(1): 119-131, 1975.

D. Dupuis. Exceedances over high thresholds: A guide to threshold selection. DalTech, Dalhousie University, 1: 111-121, 1998.

A. J. McNeil and T. Saladin. The peaks over thresholds method for estimating high quantiles of loss distributions. In Proceedings of the 27th International ASTIN Colloquium, pages 23-43, 1997.

38 / 39


THANK YOU FOR YOUR KIND ATTENTION

39 / 39
