Averaging day-ahead electricity price forecasts for autoregressive models across calibration
windows of various lengths∗
Grzegorz Marcjasz1,2, Tomasz Serafin1,2, Rafał Weron1
1Department of Operations Research, Faculty of Computer Science & Management
and 2Faculty of Pure and Applied Mathematics
Wrocław University of Science and Technology, Poland
∗ Supported by the National Science Center (PL) through grants no. 2015/17/B/HS4/00334 and 2016/23/G/HS4/01005
Tomasz Serafin (Wrocław, PL) Averaging across calibration windows 3.10.2018, S3 Seminar 1 / 39
Introduction
Marcjasz, Serafin & Weron (2018)
Introduction
Part 1: Notation and previous results
Literature
Results of Hubicka, Marcjasz & Weron (2018)
ARX1 (Misiorek et al., 2006, SNDE)
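$$
X_{d,h} = \beta_{h,0} + \underbrace{\beta_{h,1} X_{d-1,h} + \beta_{h,2} X_{d-2,h} + \beta_{h,3} X_{d-7,h}}_{\text{autoregressive effects}} + \underbrace{\beta_{h,4} X_{d-1,\min}}_{\text{non-linear effects}} + \underbrace{\beta_{h,5} C_{d,h}}_{\text{load forecast}} + \underbrace{\beta_{h,6} D_{\mathrm{Sat}} + \beta_{h,7} D_{\mathrm{Sun}} + \beta_{h,8} D_{\mathrm{Mon}}}_{\text{weekday dummies}} + \varepsilon_{d,h}
$$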
Results of Hubicka, Marcjasz & Weron (2018)
GEFCom2014 (01.01.2011-17.12.2013)
[Figure: LMP [USD/MWh] (top panel) and system load [GWh] (bottom panel), 1.1.2011 to 17.12.2013, with the test period marked.]
Results of Hubicka, Marcjasz & Weron (2018)
Rolling window scheme
Which window length is optimal? Ten days? Four weeks? A year?
No consistency in the EPF literature
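The rolling window scheme can be sketched as follows. This is a minimal illustration, assuming 0-based day indices and a model that is recalibrated on the T most recent days before each forecast; the function name `rolling_windows` and its arguments are hypothetical and not taken from the study.

```python
def rolling_windows(first_test_day, last_test_day, T):
    """Yield (calib_start, calib_end, target_day) for a T-day rolling window.

    Day indices are 0-based and calib_end is exclusive, so the calibration
    sample for target day d consists of the T days d-T, ..., d-1.
    """
    if first_test_day < T:
        raise ValueError("not enough history for a T-day window")
    for d in range(first_test_day, last_test_day + 1):
        yield d - T, d, d


# Illustrative use: 728-day and 28-day windows for the same three target days.
long_w = list(rolling_windows(728, 730, 728))
short_w = list(rolling_windows(728, 730, 28))
```

Both windows end on the day before the target day; only the starting point of the calibration sample differs.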
Results of Hubicka, Marcjasz & Weron (2018)
Rolling window scheme
[Figure: LMP [USD/MWh], 1.1.2011 to 17.12.2013, with the test period and a 728-day calibration window marked.]
Results of Hubicka, Marcjasz & Weron (2018)
Rolling window scheme
[Figure: LMP [USD/MWh], 1.1.2011 to 17.12.2013, with the test period and the 728-day and 182-day calibration windows marked.]
Results of Hubicka, Marcjasz & Weron (2018)
Rolling window scheme
[Figure: LMP [USD/MWh], 1.1.2011 to 17.12.2013, with the test period and the 728-day, 182-day and 28-day calibration windows marked.]
Results of Hubicka, Marcjasz & Weron (2018)
Averaging forecasts: notation
Win(T): forecast for a T-day window
AW(τ): simple arithmetic average of the Win(T) forecasts for the windows in τ
τ = (28,728) refers to the 28- and 728-day windows
τ = (28:1092) refers to all 1065 windows ranging from 28 to 1092 days
τ = (28:28:728) is the selection of 26 windows: 28-, 56-, 84-, ..., 728-day
Mean Absolute Error (MAE):
$$
\mathrm{MAE} = \frac{1}{24D} \sum_{d=1}^{D} \sum_{h=1}^{24} |\varepsilon_{d,h}|
$$
where $\varepsilon_{d,h} = P_{d,h} - \hat{P}_{d,h}$ is the forecast error for day d and hour h
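The notation above can be illustrated with a short sketch. It assumes that a window set is written as a list of integers and `(start, stop)` or `(start, step, stop)` tuples, and that the forecasts for each window are stored as a list of hourly values; the helper names `expand`, `aw` and `mae` and the toy numbers are illustrative only.

```python
def expand(spec):
    """Expand a window-set specification into a sorted list of window lengths.

    Each item is an int (a single window) or a tuple (start, stop) or
    (start, step, stop), mirroring the 28:728 and 28:28:728 notation.
    """
    windows = set()
    for item in spec:
        if isinstance(item, int):
            windows.add(item)
        elif len(item) == 2:
            start, stop = item
            windows.update(range(start, stop + 1))
        else:
            start, step, stop = item
            windows.update(range(start, stop + 1, step))
    return sorted(windows)


def aw(forecasts, windows):
    """AW: arithmetic mean of the Win(T) forecasts, hour by hour.

    forecasts maps a window length T to a list of hourly forecasts.
    """
    n = len(windows)
    hours = len(forecasts[windows[0]])
    return [sum(forecasts[T][h] for T in windows) / n for h in range(hours)]


def mae(actual, predicted):
    """Mean absolute error over all hours supplied."""
    return sum(abs(p - f) for p, f in zip(actual, predicted)) / len(actual)


n_all = len(expand([(28, 1092)]))        # all windows from 28 to 1092 days
n_sel = len(expand([(28, 28, 728)]))     # every 28th window up to 728 days
toy = {28: [10.0, 20.0], 728: [14.0, 22.0]}   # made-up forecasts, two hours
avg = aw(toy, expand([28, 728]))
err = mae([13.0, 20.0], avg)
```

The two counts reproduce the 1065 and 26 windows quoted above.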
Results of Hubicka, Marcjasz & Weron (2018)
Which windows to choose?
[Figure: LMP [USD/MWh], 1.1.2011 to 17.12.2013, with the test period marked (two panels).]
Results of Hubicka, Marcjasz & Weron (2018)
Which windows to choose?
[Figure: LMP [USD/MWh], 1.1.2011 to 17.12.2013, with the test period marked.]
Longer calibration windows
Better reflect trends and allow more precise and stable estimation of model parameters
Results of Hubicka, Marcjasz & Weron (2018)
Which windows to choose?
[Figure: LMP [USD/MWh], 1.1.2011 to 17.12.2013, with the test period marked.]
Shorter calibration windows
Tend to quickly adapt to changes in price behavior
Results of Hubicka, Marcjasz & Weron (2018)
Averaging forecasts: results
[Figure: MAE as a function of the calibration window length in days (T) for ARX Win(T), compared with AW(364,728), AW(28:728), AW(28:28:728), AW(28,728), AW(28:28:84,714:7:728), AW(28,56,364,728) and AW(28,56,721,728).]
Results of Hubicka, Marcjasz & Weron (2018)
Averaging forecasts: results
[Figure: MAE as a function of the number of windows in T (2 to 64), in two panels (ARX and NARX), for the window sets T = {28:28:∗, ∗:7:728}, T = {28:28:∗, ∗:28:728}, T = {28:7:∗, ∗:7:728} and T = {28:728}, with AW(28:28:84,714:7:728) marked.]
Results of Hubicka, Marcjasz & Weron (2018)
Diebold-Mariano test
Pair of models: (X, Y)
Compute the 24-dimensional vector of errors for each day
$$
\Delta_{X,Y,d} = \|E_{X,d}\| - \|E_{Y,d}\|, \qquad \|E_{X,d}\| = \sum_{h=1}^{24} e_{X,d,h}
$$
Hypothesis $H_0$: $E(\Delta_{X,Y,d}) = 0$
Hypothesis $H_1$: $E(\Delta_{X,Y,d}) < 0$
Hypothesis $H_1^R$: $E(\Delta_{X,Y,d}) > 0$
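A minimal sketch of the one-sided test described above, written with the Python standard library only. It rests on two assumptions that the slide does not state: the daily norm is taken as the sum of absolute hourly errors, and the daily loss differentials are treated as serially uncorrelated, so the statistic is compared with the standard normal distribution. The function name `dm_test` and the toy errors are illustrative.

```python
import math
from statistics import mean, stdev


def dm_test(errors_x, errors_y):
    """One-sided DM test of H0: E(delta) = 0 against H1: E(delta) < 0.

    errors_x, errors_y: lists of days, each a list of 24 hourly errors.
    Assumption: the daily norm is the sum of absolute hourly errors and
    the loss differentials are serially uncorrelated.
    Returns (statistic, p_value); a small p-value favors model X.
    """
    delta = [
        sum(abs(e) for e in ex) - sum(abs(e) for e in ey)
        for ex, ey in zip(errors_x, errors_y)
    ]
    n = len(delta)
    stat = mean(delta) / (stdev(delta) / math.sqrt(n))
    # Standard normal CDF evaluated at the statistic.
    p_value = 0.5 * (1.0 + math.erf(stat / math.sqrt(2.0)))
    return stat, p_value


# Toy data (made up): model X has errors no larger than model Y on every day.
errors_x = [[0.5] * 24 for _ in range(10)]
errors_y = [[0.5 + 0.1 * (d % 3)] * 24 for d in range(10)]
stat, p_value = dm_test(errors_x, errors_y)
stat_rev, p_value_rev = dm_test(errors_y, errors_x)
```

Swapping the two arguments gives the reverse test, with the alternative that Y outperforms X.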
Results of Hubicka, Marcjasz & Weron (2018)
Diebold-Mariano test: p-values (ARX)
[Figure: heat map of DM test p-values (color scale from 0 to 0.1) for all pairs of the following forecasts: Win(28), Win(364), AW(364,728), Win(728), AW(28:728), AW(28:7:728), AW(28:14:728), AW(28:28:728), AW(56,728), AW(28,728), AW(28:28:84,714:7:728), AW(28,56,728), AW(28,56,364,728), AW(28,56,721,728).]
Dark green: p-value close to zero, i.e. a significant difference
Black square: the results of the X-axis model do not statistically outperform the results of the Y-axis model
Part 2: Extension of the study
Marcjasz, Serafin & Weron (2018)
Three recent and much longer datasets (windows: 28-1092 days)
Models with more explanatory variables
Data transformation
Better statistical test
New averaging scheme
Data preprocessing Datasets
Dataset: Nord Pool (01.01.2013 - 31.07.2018)
[Figure: Nord Pool spot prices [EUR/MWh], 01.01.2013 to 31.07.2018, split into the initial calibration window and the out-of-sample period.]
Hydro-dominated market
Exhibits strong seasonal variations
Data preprocessing Datasets
Dataset: Nord Pool (01.01.2013 - 31.07.2018)
[Figure: Nord Pool spot prices [EUR/MWh] (top panel) and consumption [GWh] (bottom panel), 01.01.2013 to 31.07.2018, split into the initial calibration window and the out-of-sample period.]
Data preprocessing Datasets
Dataset: PJM (10.04.2012-02.04.2018)
[Figure: PJM spot prices [EUR/MWh], 10.04.2012 to 02.04.2018, split into the initial calibration window and the out-of-sample period.]
One of the world’s largest wholesale electricity markets
Volatile behavior, particularly in early 2014
Data preprocessing Datasets
Dataset: PJM (10.04.2012-02.04.2018)
[Figure: PJM spot prices [EUR/MWh] (top panel) and consumption [GWh] (bottom panel), 10.04.2012 to 02.04.2018, split into the initial calibration window and the out-of-sample period.]
Data preprocessing Datasets
Dataset: EPEX (06.08.2010-28.07.2016)
[Figure: EPEX spot prices [EUR/MWh], 06.08.2010 to 28.07.2016, split into the initial calibration window and the out-of-sample period.]
Rapidly growing share of renewables
Pronounced negative prices
Data preprocessing VST
Variance stabilizing transformation (VST)
Data ‘normalization’ prior to applying a VST: $p_{d,h} = \frac{1}{b}(P_{d,h} - a)$
a is the median of $P_{d,h}$
b is the sample median absolute deviation (MAD)
Area hyperbolic sine (asinh):
$$
Y_{d,h} = \operatorname{asinh}(p_{d,h}) \equiv \log\left(p_{d,h} + \sqrt{p_{d,h}^2 + 1}\right)
$$
Identity (ID):
$$
Y_{d,h} = p_{d,h}
$$
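The normalization and the asinh transformation above can be sketched in a few lines of standard-library Python. The function names and the toy prices are illustrative; the back-transformation `inverse_asinh_vst` is an assumption added here for completeness (it simply inverts the two steps) and does not appear on the slide.

```python
import math
from statistics import median


def normalize(prices):
    """Return (p, a, b) with p = (P - a) / b, a the median and b the sample MAD."""
    a = median(prices)
    b = median([abs(x - a) for x in prices])
    return [(x - a) / b for x in prices], a, b


def asinh_vst(prices):
    """Normalize the prices, then apply the area hyperbolic sine."""
    p, a, b = normalize(prices)
    return [math.asinh(x) for x in p], a, b


def inverse_asinh_vst(y, a, b):
    """Assumed back-transformation: P = b * sinh(Y) + a."""
    return [b * math.sinh(v) + a for v in y]


# Toy prices (made up) with one spike.
prices = [30.0, 35.0, 40.0, 45.0, 500.0]
y, a, b = asinh_vst(prices)
recovered = inverse_asinh_vst(y, a, b)
```

In this toy sample the spike sits 92 MADs above the median before the transformation and at roughly 5.2 after it, which illustrates the dampening effect of asinh on extreme prices.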
Data preprocessing VST
VSTs: Sample application (PJM dataset)
[Figure: PJM prices after the ID transformation (top) and the asinh transformation (bottom), 10.04.2012 to 02.04.2018, each shown together with its density.]
Models Expert models
ARX2 (Weron & Ziel, 2018, Energy Economics)
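$$
X_{d,h} = \underbrace{\beta_{h,1} X_{d-1,h} + \beta_{h,2} X_{d-2,h} + \beta_{h,3} X_{d-7,h}}_{\text{autoregressive effects}} + \underbrace{\beta_{h,4} X_{d-1,\min} + \beta_{h,5} X_{d-1,\max}}_{\text{non-linear effects}} + \beta_{h,6} X_{d-1,24} + \underbrace{\beta_{h,7} C_{d,h}}_{\text{load forecast}} + \underbrace{\sum_{i=1}^{7} \beta_{h,7+i} D_i}_{\text{weekday dummies}} + \varepsilon_{d,h}.
$$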
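To make the structure of ARX2 concrete, the sketch below assembles the 14 right-hand-side variables for one day and hour. It assumes prices and load forecasts are stored as lists of days with 24 hourly values each (0-based, so index 23 is hour 24) and that the weekday is coded 0 to 6; the function name `arx2_regressors`, the data layout and the toy values are illustrative assumptions.

```python
def arx2_regressors(X, C, d, h, weekday):
    """Right-hand-side variables of ARX2 for day d and hour h.

    X[d][h]: (transformed) price, C[d][h]: load forecast, both indexed by
    0-based day and hour (h = 23 is the last hour of the day).
    weekday: 0..6, day of the week of day d (this coding is an assumption).
    Requires d >= 7 so that the weekly lag exists.
    """
    prev = X[d - 1]
    dummies = [1.0 if weekday == i else 0.0 for i in range(7)]
    return [
        X[d - 1][h], X[d - 2][h], X[d - 7][h],   # autoregressive effects
        min(prev), max(prev),                    # non-linear effects
        prev[23],                                # last price of day d-1
        C[d][h],                                 # load forecast
    ] + dummies                                  # weekday dummies D_1..D_7


# Toy data (made up): price on day k, hour j equals 100*k + j.
X = [[float(100 * day + hour) for hour in range(24)] for day in range(8)]
C = [[50.0] * 24 for _ in range(8)]
row = arx2_regressors(X, C, d=7, h=5, weekday=2)
```

Stacking such rows over the days of a calibration window gives the design matrix for one hour; the coefficients are then estimated separately for each hour h.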
Results
Weighted Averaged Windows (WAW)
For each window in the combination, calculate the MAE over the last t days (e.g. t = 1)
The weight for each window is based on its past performance over t days:
$$
w_T = \frac{1/\mathrm{MAE}_{t,T}}{\sum_{T' \in \mathcal{T}} 1/\mathrm{MAE}_{t,T'}}
$$
The aggregate forecast is a weighted sum of the forecasts for different calibration windows:
$$
\hat{P}_{d,h} = \sum_{T \in \mathcal{T}} w_T \hat{P}_{d,h,T}
$$
Results
Weighted Averaged Windows (WAW): example
Window combination: (28,364,728)
MAE and weights for windows over the past 24 hours:
Window   MAE   Weight
28       4.2   0.319
364      3.8   0.354
728      4.1   0.327
Predictions for the next 24 hours are weighted using the weights from the table
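The WAW scheme and the example above can be sketched as follows, assuming inverse-MAE weights normalized to sum to one. The function names and the toy hourly forecasts are illustrative. Because the MAEs in the example are given to one decimal place, the computed weights agree with the table only to within about 0.001.

```python
def waw_weights(mae_by_window):
    """Inverse-MAE weights, normalized to sum to one."""
    inv = {T: 1.0 / m for T, m in mae_by_window.items()}
    total = sum(inv.values())
    return {T: v / total for T, v in inv.items()}


def waw(forecasts, weights):
    """Weighted sum of the hourly forecasts of each calibration window."""
    hours = len(next(iter(forecasts.values())))
    return [
        sum(weights[T] * forecasts[T][h] for T in weights)
        for h in range(hours)
    ]


# MAEs over the past 24 hours for the combination (28,364,728).
weights = waw_weights({28: 4.2, 364: 3.8, 728: 4.1})

# Toy forecasts (made up, two hours only) for the next day.
combined = waw(
    {28: [30.0, 32.0], 364: [31.0, 33.0], 728: [29.0, 34.0]},
    weights,
)
```

The window with the lowest recent MAE (364 days here) receives the largest weight, but with MAEs this close together the weights stay near the equal-weight value of 1/3.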
Results
Why should we weigh the forecasts?
[Figure: MAE as a function of the calibration window length in days (T), from 28 to 1092, in three panels: ARX2(PJM, ID), AR1(EPEX, asinh) and ARX1(NP, ID).]
Results
Results (solid line: AW, dashed line: WAW)
Results
Results: PJM
                          ARX1(PJM, ID)     ARX1(PJM, asinh)  ARX2(PJM, ID)     ARX2(PJM, asinh)
Window                    Win               Win               Win               Win
28                        3.995             68.409            5.141             ∞
56                        3.563             3.288             3.691             3.365
364                       3.433             3.196             3.383             3.093
728                       4.078             3.253             3.976             3.121
1092                      4.391             3.294             4.321             3.157

Window set                AW       WAW      AW       WAW      AW       WAW      AW       WAW
(28:1092)                 3.653    3.515    3.221    3.178    3.526    3.414    ∞        ∞
(28:728)                  3.465    3.379    3.231    3.178    3.367    3.300    ∞        ∞
(56:1092)                 3.684    3.549    3.170    3.156    3.555    3.443    3.053    3.046
(56:728)                  3.499    3.415    3.148    3.134    3.399    3.330    3.042    3.035
(28:28:84,1078:7:1092)    3.563    3.379    13.811   9.853    3.557    3.422    ∞        ∞
(56:28:112,1078:7:1092)   3.620    3.421    3.090    3.069    3.536    3.380    2.996    2.985
(28:28:84,714:7:728)      3.463    3.335    13.801   10.360   3.458    3.356    ∞        ∞
(56:28:112,714:7:728)     3.517    3.377    3.080    3.061    3.435    3.321    2.989    2.980
WAW(56:28:112,714:7:728) - 3 out of 4 best forecasts overall
Win(28) and combinations that include it perform very poorly
Results
Best combination?
Hubicka et al. (2018) recommended AW(28:28:84, 714:7:728)
We recommend the WAW(56:28:112, 714:7:728) scheme
Averages long and short windows
Good performance across all models and datasets
Computationally efficient
But is the evaluation in terms of MAE sufficient?
Results Statistical significance tests
Giacomini-White (2006) test
For each model pair and each dataset we compute the p-value of the GW test
Null hypothesis $H_0$: $\phi_i = 0$ for all $i$ in the regression:
$$
\Delta_{X,Y,d} = \phi_0 + \phi_1 \Delta_{X,Y,d-1} + \phi_2 \Delta_{X,Y,d-2} + \ldots + \varepsilon_d
$$
Conditional predictive ability
Preferable when estimation uncertainty is present
Results Statistical significance tests
GW test: p-values, PJM
Conclusions
Extremely simple yet efficient technique
Can be applied to different, more complex forecasting techniques
Brings significant gains in prediction accuracy
Including longer windows (1092-day) does not increase forecasting accuracy
No combination is significantly better than WAW(56:28:112, 714:7:728)