
Averaging day-ahead electricity price forecasts for autoregressive models across calibration windows of various lengths∗

Grzegorz Marcjasz (1,2), Tomasz Serafin (1,2), Rafał Weron (1)

(1) Department of Operations Research, Faculty of Computer Science & Management

(2) Faculty of Pure and Applied Mathematics

Wrocław University of Science and Technology, Poland

∗ Supported by the National Science Center (PL) through grants no. 2015/17/B/HS4/00334 and 2016/23/G/HS4/01005

Tomasz Serafin (Wrocław, PL) Averaging across calibration windows 3.10.2018, S3 Seminar 1 / 39

Introduction

Marcjasz, Serafin & Weron (2018)



Part 1: Notation and previous results


Literature


Results of Hubicka, Marcjasz & Weron (2018)

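ARX1 (Misiorek et al., 2006, SNDE)

$$X_{d,h} = \beta_{h,0} + \underbrace{\beta_{h,1} X_{d-1,h} + \beta_{h,2} X_{d-2,h} + \beta_{h,3} X_{d-7,h}}_{\text{autoregressive effects}} + \underbrace{\beta_{h,4} X_{d-1,\min}}_{\text{non-linear effects}} + \underbrace{\beta_{h,5} C_{d,h}}_{\text{load forecast}} + \underbrace{\beta_{h,6} D_{Sat} + \beta_{h,7} D_{Sun} + \beta_{h,8} D_{Mon}}_{\text{weekday dummies}} + \varepsilon_{d,h}$$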
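A minimal sketch of how the ARX1 regression could be estimated for a single hour h. It assumes ordinary least squares as the estimator (the slide does not name one), prices `X` and load forecasts `C` stored as arrays of shape (days, 24), and weekday codes `dow` with 0 = Monday, ..., 6 = Sunday. The function names `arx1_design` and `fit_arx1` are hypothetical.

```python
import numpy as np

def arx1_design(X, C, dow, h):
    """Build the ARX1 regressors and targets for hour h (0-based index).

    X, C : arrays of shape (n_days, 24) with prices and load forecasts.
    dow  : array of length n_days with weekday codes, 0 = Monday ... 6 = Sunday.
    Rows start at day 7 because the model needs the price from day d-7.
    """
    d = np.arange(7, X.shape[0])
    cols = [
        np.ones(d.size),               # beta_{h,0}: intercept
        X[d - 1, h],                   # beta_{h,1}: X_{d-1,h}
        X[d - 2, h],                   # beta_{h,2}: X_{d-2,h}
        X[d - 7, h],                   # beta_{h,3}: X_{d-7,h}
        X[d - 1].min(axis=1),          # beta_{h,4}: X_{d-1,min}
        C[d, h],                       # beta_{h,5}: C_{d,h}
        (dow[d] == 5).astype(float),   # beta_{h,6}: D_Sat
        (dow[d] == 6).astype(float),   # beta_{h,7}: D_Sun
        (dow[d] == 0).astype(float),   # beta_{h,8}: D_Mon
    ]
    return np.column_stack(cols), X[d, h]

def fit_arx1(X, C, dow, h):
    """Least-squares estimate (an assumption) of the nine ARX1 coefficients."""
    A, y = arx1_design(X, C, dow, h)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta
```

Restricting `X`, `C` and `dow` to the last T days before calling `fit_arx1` would give the coefficients for a T-day calibration window.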



GEFCom2014 (01.01.2011-17.12.2013)

[Figure: two panels showing LMP [USD/MWh] and system load [GWh], 1.1.2011 to 17.12.2013, with the test period marked.]



Rolling window scheme

Which window length is optimal? Ten days? Four weeks? A year?

No consistency in the EPF literature




[Figure: LMP [USD/MWh], 1.1.2011 to 17.12.2013, with the 728-day calibration window and the test period marked.]





Averaging forecasts: notation

Win(T) - forecast for a T-day window

AW(τ) - simple arithmetic average of Win(T)’s

τ = (28,728) refers to 28- and 728-day windows

τ = (28:1092) refers to all 1065 windows ranging from 28 to 1092 days

τ = (28:28:728) is the selection of 26 windows: 28-, 56-, 84-, ..., 728-day

Mean Absolute Error (MAE)

$$\mathrm{MAE} = \frac{1}{24D} \sum_{d=1}^{D} \sum_{h=1}^{24} |\varepsilon_{d,h}|,$$

where $\varepsilon_{d,h} = P_{d,h} - \hat{P}_{d,h}$ is the forecast error for day d and hour h
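The window-set notation, the AW average and the MAE can be sketched in a few lines. This is an illustration only: the helper names (`expand_windows`, `average_windows`, `mae`) and the layout of `forecasts` as a dictionary of (D, 24) arrays keyed by window length are assumptions, not part of the original study.

```python
import numpy as np

def expand_windows(spec):
    """Expand a window-set string such as "28:28:84,714:7:728" into a list.

    "a:b" means every length from a to b, "a:s:b" means a, a+s, ..., up to b,
    and a bare number is a single window length.
    """
    windows = []
    for part in spec.split(","):
        nums = [int(x) for x in part.split(":")]
        if len(nums) == 1:
            windows.append(nums[0])
        elif len(nums) == 2:
            windows.extend(range(nums[0], nums[1] + 1))
        else:
            start, step, stop = nums
            windows.extend(range(start, stop + 1, step))
    return windows

def average_windows(forecasts, spec):
    """AW: arithmetic mean of the Win(T) forecasts over all T in the set.

    forecasts : dict mapping window length T to an array of shape (D, 24).
    """
    return np.mean([forecasts[T] for T in expand_windows(spec)], axis=0)

def mae(actual, forecast):
    """Mean absolute error over all D days and 24 hours."""
    return float(np.mean(np.abs(actual - forecast)))
```

For instance, `expand_windows("28:1092")` yields the 1065 window lengths and `expand_windows("28:28:728")` the 26 window lengths mentioned above.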



Which windows to choose?

[Figure: LMP [USD/MWh], 1.1.2011 to 17.12.2013, with the test period marked.]

Longer calibration windows: better reflect trends and allow more precise and stable estimation of model parameters

Shorter calibration windows: tend to quickly adapt to changes in price behavior



Averaging forecasts: results

[Figure: MAE as a function of the calibration window length in days (T) for ARX Win(T), compared with AW(364,728), AW(28:728), AW(28:28:728), AW(28,728), AW(28:28:84,714:7:728), AW(28,56,364,728) and AW(28,56,721,728).]



[Figure: MAE as a function of the number of windows in T (2 to 64), in separate panels for ARX and NARX, for the window sets T = {28:28:∗, ∗:7:728}, T = {28:28:∗, ∗:28:728}, T = {28:7:∗, ∗:7:728} and T = {28:728}; AW(28:28:84,714:7:728) is marked.]


Diebold-Mariano test

Pair of models: (X,Y)

Compute the 24-dimensional vector of errors for each day

$$\Delta_{X,Y,d} = \|E_{X,d}\| - \|E_{Y,d}\|, \qquad \|E_{X,d}\| = \sum_{h=1}^{24} |e_{X,d,h}|$$

Hypothesis $H_0$: $E(\Delta_{X,Y,d}) = 0$

Hypothesis $H_1$: $E(\Delta_{X,Y,d}) < 0$

Hypothesis $H_1^R$: $E(\Delta_{X,Y,d}) > 0$
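A rough sketch of the one-sided test of H0 against H1 for a pair of models (X, Y). It takes the daily norm to be the sum of absolute hourly errors and treats the daily loss differentials as serially uncorrelated, so the plain sample variance is used; both choices are assumptions of this sketch, as is the function name `dm_test`.

```python
import math
import numpy as np

def dm_test(errors_x, errors_y):
    """One-sided DM-type test of H0: E(Delta) = 0 against H1: E(Delta) < 0.

    errors_x, errors_y : arrays of shape (D, 24) with the forecast errors of
    models X and Y. Returns the test statistic and the p-value; a small
    p-value indicates that X has significantly smaller daily errors than Y.
    """
    norm_x = np.abs(errors_x).sum(axis=1)   # ||E_{X,d}|| for each day
    norm_y = np.abs(errors_y).sum(axis=1)   # ||E_{Y,d}|| for each day
    delta = norm_x - norm_y                 # Delta_{X,Y,d}
    n = delta.size
    # Assumption: no autocorrelation correction of the variance.
    stat = delta.mean() / math.sqrt(delta.var(ddof=1) / n)
    # Standard normal CDF evaluated at the statistic.
    p_value = 0.5 * math.erfc(-stat / math.sqrt(2.0))
    return stat, p_value
```

Swapping the two arguments gives the test against the reverse alternative: the statistic changes sign and the p-value becomes one minus the original value.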



Diebold-Mariano test: p-values, ARX

[Figure: heat map of DM test p-values (color scale from 0 to 0.1) for all pairs of Win(28), Win(364), AW(364,728), Win(728), AW(28:728), AW(28:7:728), AW(28:14:728), AW(28:28:728), AW(56,728), AW(28,728), AW(28:28:84,714:7:728), AW(28,56,728), AW(28,56,364,728) and AW(28,56,721,728).]

Dark green → p-value close to zero → significant difference

Black square → results of X-axis model do not statistically outperform results of Y-axis model


Part 2: Extension of the study




Three recent and much longer datasets (windows: 28-1092 days)

Models with more explanatory variables

Data transformation

Better statistical test

New averaging scheme


Data preprocessing: Datasets

Dataset: Nord Pool (01.01.2013 - 31.07.2018)

[Figure: Nord Pool spot prices [EUR/MWh] and consumption [GWh], 01.01.2013 to 31.07.2018, with the initial calibration window and the out-of-sample period marked.]

Hydro-dominated market

Exhibits strong seasonal variations



Dataset: PJM (10.04.2012-02.04.2018)

[Figure: PJM spot prices (axis labelled EUR/MWh) and consumption [GWh], 10.04.2012 to 02.04.2018, with the initial calibration window and the out-of-sample period marked.]

One of the world’s largest wholesale electricity markets

Volatile behavior, particularly in early 2014



Dataset: EPEX (06.08.2010-28.07.2016)

[Figure: EPEX spot prices [EUR/MWh], 06.08.2010 to 28.07.2016; the price axis ranges from -200 to 200.]

Rapidly growing share of renewables

Pronounced negative prices


Data preprocessing: VST

Variance stabilizing transformation (VST)

Data ‘normalization’ prior to applying a VST:

$$p_{d,h} = \frac{1}{b}(P_{d,h} - a)$$

a is the median of $P_{d,h}$

b is the sample median absolute deviation (MAD)

Area hyperbolic sine (asinh)

$$Y_{d,h} = \operatorname{asinh}(p_{d,h}) \equiv \log\left(p_{d,h} + \sqrt{p_{d,h}^2 + 1}\right)$$

Identity (ID)

$$Y_{d,h} = p_{d,h}$$
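The normalization and the asinh transformation can be illustrated as follows. This sketch assumes the MAD is computed as median(|P - median(P)|) without any additional scaling factor, and it adds the inverse transformation (not shown on the slide) that maps values back to the price scale. The function names are hypothetical.

```python
import numpy as np

def normalize(P):
    """Return p = (P - a) / b together with a (median) and b (MAD)."""
    P = np.asarray(P, dtype=float)
    a = np.median(P)
    b = np.median(np.abs(P - a))   # assumption: unscaled MAD
    return (P - a) / b, a, b

def asinh_vst(P):
    """Normalize the prices and apply the area hyperbolic sine."""
    p, a, b = normalize(P)
    return np.arcsinh(p), a, b

def inverse_asinh_vst(Y, a, b):
    """Map transformed values back to the price scale: P = b * sinh(Y) + a."""
    return b * np.sinh(np.asarray(Y, dtype=float)) + a
```

Because asinh is defined for negative arguments, the transformation can be applied to series with negative prices, such as the EPEX data, which a plain logarithm cannot handle.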



VSTs: Sample application (PJM dataset)

[Figure: PJM series, 10.04.2012 to 02.04.2018, and the corresponding densities after the ID and the asinh transformation.]


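Models: Expert models

ARX2 (Weron & Ziel, 2018, Energy Economics)

$$X_{d,h} = \underbrace{\beta_{h,1} X_{d-1,h} + \beta_{h,2} X_{d-2,h} + \beta_{h,3} X_{d-7,h}}_{\text{autoregressive effects}} + \underbrace{\beta_{h,4} X_{d-1,\min} + \beta_{h,5} X_{d-1,\max}}_{\text{non-linear effects}} + \beta_{h,6} X_{d-1,24} + \underbrace{\beta_{h,7} C_{d,h}}_{\text{load forecast}} + \underbrace{\sum_{i=1}^{7} \beta_{h,7+i} D_i}_{\text{weekday dummies}} + \varepsilon_{d,h}.$$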


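Results

Weighted Averaged Windows (WAW)

For each window in the combination, calculate MAE over the last t days (e.g. t = 1)

The weight for each window is based on its past performance over t days:

$$w_T = \frac{1/\mathrm{MAE}_{t,T}}{\sum_{T' \in \mathcal{T}} 1/\mathrm{MAE}_{t,T'}}$$

where $\mathcal{T}$ is the set of window lengths in the combination

The aggregate forecast is a weighted sum of the forecasts for different calibration windows

$$\hat{P}_{d,h} = \sum_{T \in \mathcal{T}} w_T \hat{P}_{d,h,T}$$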



Weighted Averaged Windows (WAW): example

Window combination: (28,364,728)

MAE and weights for windows over past 24 hours:

Window    MAE    weight
28        4.2    0.319
364       3.8    0.354
728       4.1    0.327

Predictions for the next 24 hours are weighted using weights from the table
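The WAW weighting can be sketched as below, assuming inverse-MAE weights normalized to sum to one and forecasts stored as one row of 24 hourly values per calibration window. The function names `waw_weights` and `waw_forecast` are hypothetical.

```python
import numpy as np

def waw_weights(past_mae):
    """Inverse-MAE weights, normalized so that they sum to one."""
    inv = 1.0 / np.asarray(past_mae, dtype=float)
    return inv / inv.sum()

def waw_forecast(forecasts, past_mae):
    """Weighted sum of the forecasts from different calibration windows.

    forecasts : array of shape (n_windows, 24), one row per window.
    past_mae  : MAE of each window over the last t days, in the same order.
    """
    w = waw_weights(past_mae)
    return w @ np.asarray(forecasts, dtype=float)

# MAE values of the (28, 364, 728) example
weights = waw_weights([4.2, 3.8, 4.1])
print(np.round(weights, 3))
```

With the rounded MAE values 4.2, 3.8 and 4.1 the weights come out at roughly 0.320, 0.353 and 0.327, which agrees with the example table to within 0.001; the small difference would be consistent with the table having been computed from unrounded MAE values.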


Why should we weigh the forecasts?

[Figure: MAE as a function of the calibration window length in days (T), from 28 to 1092, in three panels: ARX2(PJM,ID), AR1(EPEX,asinh) and ARX1(NP,ID).]



Results (— AW , −− WAW)


Results: PJM

MAE for single calibration windows (Win):

Window                    ARX1(PJM, ID)     ARX1(PJM, asinh)  ARX2(PJM, ID)     ARX2(PJM, asinh)
28                        3.995             68.409            5.141             ∞
56                        3.563             3.288             3.691             3.365
364                       3.433             3.196             3.383             3.093
728                       4.078             3.253             3.976             3.121
1092                      4.391             3.294             4.321             3.157

MAE for window sets (AW and WAW):

Window set                ARX1(PJM, ID)     ARX1(PJM, asinh)  ARX2(PJM, ID)     ARX2(PJM, asinh)
                          AW       WAW      AW       WAW      AW       WAW      AW       WAW
(28:1092)                 3.653    3.515    3.221    3.178    3.526    3.414    ∞        ∞
(28:728)                  3.465    3.379    3.231    3.178    3.367    3.300    ∞        ∞
(56:1092)                 3.684    3.549    3.170    3.156    3.555    3.443    3.053    3.046
(56:728)                  3.499    3.415    3.148    3.134    3.399    3.330    3.042    3.035
(28:28:84,1078:7:1092)    3.563    3.379    13.811   9.853    3.557    3.422    ∞        ∞
(56:28:112,1078:7:1092)   3.620    3.421    3.090    3.069    3.536    3.380    2.996    2.985
(28:28:84,714:7:728)      3.463    3.335    13.801   10.360   3.458    3.356    ∞        ∞
(56:28:112,714:7:728)     3.517    3.377    3.080    3.061    3.435    3.321    2.989    2.980

WAW(56:28:112,714:7:728) - 3 out of 4 best forecasts overall



Win(28) and combinations that include it perform very poorly



Best combination?

Hubicka et al. (2018) recommended AW(28:28:84, 714:7:728)

We recommend the WAW(56:28:112, 714:7:728) scheme

Averages long and short windows

Good performance across all models and datasets

Computationally efficient

But is the evaluation in terms of MAE sufficient?


Results: Statistical significance tests

Giacomini-White (2006) test

For each model pair and each dataset we compute the p-value of the GW test

Null hypothesis $H_0$: $\phi_i = 0$ for all i in the regression:

$$\Delta_{X,Y,d} = \phi_0 + \phi_1 \Delta_{X,Y,d-1} + \phi_2 \Delta_{X,Y,d-2} + \ldots + \varepsilon_d$$

Tests conditional predictive ability

Better suited when estimation uncertainty is present
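A simplified sketch of a GW-type test, restricted to a constant and a single lag of the loss differential, so that the null hypothesis is φ0 = φ1 = 0. The restriction to one lag, the function name `gw_test` and the use of the asymptotic chi-square distribution with 2 degrees of freedom are assumptions of this sketch; the slide does not state how many lags were used.

```python
import math
import numpy as np

def gw_test(delta):
    """GW-type test with the test function h_d = (1, Delta_{d-1}).

    delta : 1-D array of daily loss differentials Delta_{X,Y,d}.
    Returns the test statistic and its asymptotic chi-square(2) p-value.
    """
    delta = np.asarray(delta, dtype=float)
    # Test function: a constant and the previous day's loss differential.
    h = np.column_stack([np.ones(delta.size - 1), delta[:-1]])
    z = h * delta[1:, None]            # z_d = h_d * Delta_d
    n = z.shape[0]
    z_bar = z.mean(axis=0)
    omega = z.T @ z / n                # sample second-moment matrix of z_d
    stat = float(n * z_bar @ np.linalg.solve(omega, z_bar))
    # Survival function of the chi-square distribution with 2 d.o.f.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

A small p-value rejects equal conditional predictive ability; which of the two models is better then has to be read from the sign of the loss differentials.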



GW test: p-values, PJM


Conclusions


Extremely simple yet efficient technique

Can be applied to different, more complex forecasting techniques

Brings significant gains in prediction accuracy

Including longer windows (1092-day) does not increase forecasting accuracy

No combination is significantly better than WAW(56:28:112,714:7:728)

