summary of polson and sokolov 2018 · summary of polson and sokolov 2018 deep learning for energy...

Summary of Polson and Sokolov 2018Deep Learning for Energy Markets

David Prentiss

OR750-004

November 12, 2018

The PJM Interconnection

I The Pennsylvania–New Jersey–Maryland Interconnection(PTO) is a regional transmission organization (RTO).

I It implements a wholesale electricity market for a network ofproducers and consumers in the Mid-Atlantic.

I It’s primary purpose is to prevent outages or otherwise un-metdemand.

I Obligations are exchanged in bilateral contracts, theday-ahead market, and the real-time market.

Local marginal price data

I Local Marginal Prices (LMP) are price data aggregated forprices in various locations and interconnection services is thenetwork.

I They reflect the cost of producing and transmitting electricityin the network.

I Prices are non-linear because electricity.

I This paper proposes a NN to model price extremes.

Load vs. price

Load vs. previous load

RNN vs. long short-term memory

Vanilla RNN

ht = tanh

(W

(ht−1

xt

))LSTM

ifok

=

σσσ

tanh

◦W (ht−1

xt

)

ct = f � ct−1 + i � k

ht = o � tanh (ct)

LTSM model

ifok

=

σσσ

tanh

◦W (ht−1

xt

)

ct = f � ct−1 + i � k

ht = o � tanh (ct)

Extreme value theory

I Extreme value analysis begins by filtering the data to select“extreme” values.

I Extreme values are selected by one of two methods.I Block maxima: Select the peak values after dividing the series

into periods.I Peak over threshold: Select values larger than some threshold.

I Peak over threshold used in this paper.

Peak over threshold

I Pickands–Balkema–de Hann (1974 and 1975) theoremcharacterizes the asymptotic tail distribution of an unknowndistribution.

I Distribution of events that exceed a threshold areapproximated with the generalized Pareto distribution.

I Low threshold increases bias.

I High threshold increases variance.

Generalized Pareto distribution

I CDF

H(y | σ, ξ) = 1−(

1 + ξy − u

σ

)−1ξ

+

I PDF

h(y | σ, ξ) = 1− 1

σ

(1 + ξ

y − u

σ

)−1ξ−1

Parameters

h(y | σ, ξ) = 1− 1

σ

(1 + ξ

y − u

σ

)−1ξ−1

I Location, u, is the threshold

I Scale, σ, is our learned parameter

I Shape, ξ = f (u, σ)? EX [y ] = σ + u =⇒ ξ = 0?

Fourier (ARIMA) model

Fourier (ARIMA) model vs DL

Demand forcasting DL–EVT

I DL–EVT Architecture

X → tanh(W (1)X + b(1)

)→ Z (1) → exp

(tanh

(Z (1)

)→ σ(X )

I W (1) ∈ Rp×3, x ∈ Rp, p = 24 (one day)

I Threshold, u = 31, 000

Vanilla DL vs. DL-EVT

summary of polson and sokolov 2018 · summary of polson and sokolov 2018 deep learning for energy...

Documents