master's thesis slides
Approximate Dynamic Programming Methods for Residential Water Heating
A thesis submitted in partial fulfillment for the degree of Master’s of Science in the
Department of Electrical Engineering
by Matthew Motoki
December 3, 2015
Outline
1 Motivation
2 Problem Formulation
    System Variables
    State Dynamics
    Objective Function
    Problem Statement
3 Methodology
    Finite-Horizon DP
    Average Cost DP for Periodic MDP’s
    Approximate Dynamic Programming
    Prescient Lower Bound (PLB)
4 Results
    Numerical Simulations Setup
    Set-Point Methods
    Temperature Aggregation
    Usage History Aggregation
5 Problem Extensions
    Solar Water Heating
    Automated Demand Response
6 Conclusion
Motivation
Why do we need a smarter water heater?
• Energy efficiency is important.
    • Electricity is expensive.
    • Burning fossil fuels is bad for the environment.
• Can we do better than water heaters with an adjustable set-point?
    • If so, then are there any provable guarantees that can be made?
    • Theoretically, what is the best that we can do?
• The legacy grid is becoming obsolete.
    • Renewable energy sources are variable and distributed.
    • Energy storage capabilities of water heaters have been fully exploited.
Problem Formulation
State Variable
Define $t \in \{0, \Delta t, \ldots, (N-1)\Delta t\}$. Define $t_k = \operatorname{mod}(k, N)\,\Delta t$, where $k = 0, 1, \ldots$ is the simulation time stage.
The state $x := (T, h)$ summarizes the information needed to make a decision.
We require $T \in [T_{\text{amb}}, T_{\max}]$. The temperature at $t_k$ is written $T_k$.
The hot water usage history is $h_k := \{(t_i, w_i) \mid 0 \le i < \operatorname{mod}(k, N)\}$, where $w_k$ is the intensity of the hot water draw at time $t_k$.
Problem Formulation
Decision Variable
The decision variable is
\[
u_k := \begin{cases} 1, & \text{if the water heater is on} \\ 0, & \text{if the water heater is off.} \end{cases}
\]
We assume that the decision $u_k$ is constant during the interval $[t_k, t_{k+1})$. A feasible decision $u_k \in \Omega_u$ is one that does not violate $T \in [T_{\text{amb}}, T_{\max}]$. A policy $\mu$ is a mapping from a state into a feasible decision.
Problem Formulation
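Hot Water Demand (Disturbance Variable) 1
We model hot water demand as a cyclostationary random process $W(t)$ given by
\[
W(t) := \text{specific heat} \cdot \sum_{\tau \in \Omega_\tau} \sum_{i=1}^{N_{\text{people}}} \sum_{j=1}^{N_{\tau i}} F^{(j)}_{\tau,i} \cdot \left(T^{(j)}_{\tau,i} - T_{\text{amb}}\right) \cdot \mathbb{I}\left\{S^{(j)}_{\tau,i} \le t < S^{(j)}_{\tau,i} + D^{(j)}_{\tau,i}\right\},
\]
where $\Omega_\tau := \{\text{shower}, \text{bath}, \ldots, \text{dishwasher}\}$ is the set of possible usage events, $N_{\text{people}}$ is the number of people in a household, $N_{\tau i}$ is the number of events of type $\tau$ corresponding to the $i$-th person in the household, and the following are random variables:
\begin{align*}
S^{(j)}_{\tau,i} &:= \text{the start time of } E^{(j)}_{\tau,i}, \\
D^{(j)}_{\tau,i} &:= \text{the duration of } E^{(j)}_{\tau,i}, \\
F^{(j)}_{\tau,i} &:= \text{the flow rate of } E^{(j)}_{\tau,i}, \\
T^{(j)}_{\tau,i} &:= \text{the desired temperature of } E^{(j)}_{\tau,i}.
\end{align*}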
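To make the demand model concrete, the following minimal sketch evaluates $W(t)$ for a fixed list of already-sampled usage events. The `UsageEvent` class, the field names, the units, and the numeric values are illustrative assumptions and do not come from the thesis; the random sampling of start times, durations, flow rates, and temperatures is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class UsageEvent:
    """One realized usage event E (hypothetical container, not from the slides)."""
    start: float     # S: start time
    duration: float  # D: duration
    flow: float      # F: flow rate
    temp: float      # T: desired temperature


def demand(t, events, t_amb=20.0, specific_heat=1.0):
    """W(t): specific heat times the sum of F * (T - T_amb) over events active at t."""
    total = 0.0
    for e in events:
        # Indicator I{S <= t < S + D}
        if e.start <= t < e.start + e.duration:
            total += e.flow * (e.temp - t_amb)
    return specific_heat * total


# Two overlapping events with made-up numbers.
events = [UsageEvent(7.0, 0.25, 8.0, 40.0), UsageEvent(7.1, 0.1, 4.0, 45.0)]
print(demand(7.05, events), demand(7.15, events), demand(8.0, events))
```

The three sums over event type, person, and event index collapse into one loop here because the events are stored in a flat list.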
Problem Formulation
Hot Water Demand (Disturbance Variable) 2
We can only observe $W(t)$ at pre-specified times $t \in \Omega_t$; therefore, we approximate $W(t)$ using the piecewise linear interpolation
\[
\widetilde{W}(t) := W(t_k) + \frac{t - t_k}{\Delta t}\left[W(t_k + \Delta t) - W(t_k)\right]
\]
for all $k = 0, 1, \ldots$ and $t \in [t_k, t_k + \Delta t)$. We define the discrete-time analog of $W(t)$ to be the average of $\widetilde{W}(t)$ over $t \in [t_k, t_k + \Delta t)$,
\[
W_k := \frac{1}{\Delta t} \int_{t_k}^{t_k + \Delta t} \widetilde{W}(t)\, dt = \tfrac{1}{2}\left[W(t_k) + W(t_k + \Delta t)\right].
\]
We denote particular realizations of $W(t)$ and $W_k$ using $w(t)$ and $w_k$, respectively. We discretize $w_k \in \{0, \Delta w, \ldots, w_{\max}\}$. We write the conditional probability mass function of $W_k$ given $h_k$ as $p_{W_k}(w_k \mid h_k)$.
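As a small illustration of the discretization step, the sketch below averages consecutive samples of the observed demand and snaps the result to the grid $\{0, \Delta w, \ldots, w_{\max}\}$. The function names and the choice of nearest-point rounding are assumptions made for this example; the slides do not say how values are mapped onto the grid.

```python
def stage_average(w_samples):
    """W_k = 0.5 * (W(t_k) + W(t_k + dt)) for each pair of consecutive samples."""
    return [0.5 * (a + b) for a, b in zip(w_samples, w_samples[1:])]


def quantize(w, dw, w_max):
    """Snap w to the nearest point of {0, dw, ..., w_max} (assumed rounding rule)."""
    level = round(w / dw) * dw
    return min(max(level, 0.0), w_max)


samples = [0.0, 4.0, 2.0]            # W observed at t_0, t_1, t_2
averages = stage_average(samples)    # one value per interval
print([quantize(w, 1.0, 5.0) for w in averages])
```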
Problem Formulation
State Equation
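The state equation maps the current state $x_k$, current decision $u_k$, and current disturbance $w_k$ into the next state $x_{k+1}$ according to
\[
x_{k+1} = f(x_k, u_k, w_k) := \left(f_T(T_k, u_k, w_k),\; f_h(t_k, h_k, w_k)\right),
\]
where
\begin{align*}
T_{k+1} &= f_T(T_k, u_k, w_k) := \max\left\{T_k - r_{\text{cool}}\Delta t\,(T_k - T_{\text{amb}}) + r_{\text{heat}}\Delta t\, u_k - r_{\text{loss}}\Delta t\, w_k,\; T_{\text{amb}}\right\}, \\
h_{k+1} &= f_h(t_k, h_k, w_k) := \begin{cases} \{(t_k, w_k)\} \cup h_k, & t_k \neq (N-1)\Delta t \\ \emptyset, & \text{otherwise,} \end{cases}
\end{align*}
for all $k = 0, 1, \ldots$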
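The state equation can be sketched directly in code. This is a minimal illustration under assumed parameter values: the rate constants, the time step, and the representation of the history $h_k$ as a tuple of $(t_i, w_i)$ pairs are choices made for the example, not values from the thesis.

```python
def f_T(T, u, w, dt, r_cool, r_heat, r_loss, T_amb):
    """Temperature update: cooling toward ambient, heating if u = 1, loss from the draw w."""
    T_next = T - r_cool * dt * (T - T_amb) + r_heat * dt * u - r_loss * dt * w
    return max(T_next, T_amb)  # the tank never drops below ambient


def f_h(k, h, w, N, dt):
    """History update: append (t_k, w_k), or reset at the last stage of a cycle."""
    t_k = (k % N) * dt
    if k % N != N - 1:
        return h + ((t_k, w),)
    return ()


# Hypothetical parameters, for illustration only.
params = dict(dt=1.0, r_cool=0.1, r_heat=5.0, r_loss=2.0, T_amb=20.0)
print(f_T(50.0, 1, 0.0, **params))   # heater on, no draw
print(f_T(21.0, 0, 5.0, **params))   # large draw, clipped at T_amb
print(f_h(0, (), 1.5, N=3, dt=1.0), f_h(2, ((0.0, 1.5), (1.0, 0.0)), 2.0, N=3, dt=1.0))
```

Storing the history as an immutable tuple makes the state hashable, which is convenient when states are used as dictionary keys in a tabular method.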
Problem Formulation
Objective Function
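The objective is to minimize, over all policies $\mu$, the following function
\begin{align*}
J_\mu(x_0) &= \lim_{K \to \infty} \mathbb{E}_W\left[\frac{1}{K} \sum_{k=0}^{K-1} g\left(X_k, \mu(X_k), W_k; \theta\right) \,\middle|\, x_0\right] \\
&= \lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} \mathbb{E}_{W_0, W_1, \ldots, W_k}\left[g\left(X_k, \mu(X_k), W_k; \theta\right) \,\middle|\, x_0\right],
\end{align*}
where $X_0 = x_0$ is given and $X_k = f\left(X_{k-1}, \mu(X_{k-1}), W_{k-1}\right)$ for all $k = 1, 2, \ldots$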
Problem Formulation
Stage Cost
The stage cost is
\[
g(x_k, u_k, w_k; \theta) := \alpha\, g_{\text{discomfort}}(x_k, u_k, w_k; T_{\min}) + (1 - \alpha)\, g_{\text{operating}}(x_k, u_k),
\]
where $\theta := \{\alpha, T_{\min}\}$ is a customer-defined parameter set, $\alpha \in [0, 1]$ is the relative weighting of the objectives, and $T_{\min}$ is the minimum desirable temperature during a hot water use.
Operating Cost
The operating cost is
\[
g_{\text{operating}}(x_k, u_k) := \frac{1}{\Delta t} \int_{t_k}^{t_k + \Delta t} C(t)\, \text{rating}\; u_k \, dt,
\]
where $C(t)$ is the cost of power and $\text{rating}$ is the power rating of the water heater.
Problem Formulation
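Discomfort Cost
The discomfort cost is
\[
g_{\text{discomfort}}(x_k, u_k, w_k; T_{\min}) := \frac{1}{\Delta t} \int_{t_k}^{t_k + \Delta t} \max\left\{T_{\min} - T(t),\, 0\right\} \cdot \mathbb{I}\{w(t) > 0\}\, dt,
\]
where
\[
T(t) := T_k + \frac{t - t_k}{\Delta t}\left[f_T(T_k, u_k, w_k) - T_k\right],
\]
for all $k = 0, 1, \ldots$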
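The stage cost can be evaluated numerically as in the following sketch. It rests on two simplifying assumptions that are not stated on the slides: the price $C(t)$ is treated as constant within a stage, and the draw is treated as active for the whole stage whenever $w_k > 0$. The function names, the midpoint rule, and the numbers are illustrative.

```python
def discomfort_cost(T_k, T_next, w_k, T_min, n=1000):
    """Average of max(T_min - T(t), 0) over one stage, with T(t) linear from T_k to T_next."""
    if w_k <= 0:
        return 0.0  # indicator I{w(t) > 0} is zero (assumed constant over the stage)
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n                 # midpoint of sub-interval, as a fraction of dt
        T = T_k + s * (T_next - T_k)      # linear temperature interpolation
        total += max(T_min - T, 0.0)
    return total / n


def operating_cost(u_k, price, rating):
    """(1/dt) * integral of C(t) * rating * u_k, with C(t) = price held constant."""
    return price * rating * u_k


def stage_cost(T_k, T_next, u_k, w_k, alpha, T_min, price, rating):
    """Weighted sum alpha * discomfort + (1 - alpha) * operating."""
    return (alpha * discomfort_cost(T_k, T_next, w_k, T_min)
            + (1 - alpha) * operating_cost(u_k, price, rating))


# Temperature falls from 40 to 30 during a draw; T_min = 35 (made-up numbers).
print(stage_cost(40.0, 30.0, 1, 1.0, alpha=0.5, T_min=35.0, price=0.25, rating=4.5))
```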
Problem Formulation
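Problem Statement
Find a feasible on/off policy that minimizes an expected objective cost.
\[
\begin{aligned}
\underset{\mu}{\text{minimize}} \quad & \lim_{K \to \infty} \mathbb{E}_W\left[\frac{1}{K} \sum_{k=0}^{K-1} g\left(X_k, \mu(X_k), W_k; \theta\right) \,\middle|\, x_0\right] \\
\text{subject to} \quad & X_{k+1} = f\left(X_k, \mu(X_k), W_k\right), \quad \mu(x_k) \in \{0, 1\}, \\
& T_k \in [T_{\text{amb}}, T_{\max}], \quad \text{for all } k = 0, 1, \ldots
\end{aligned}
\]
This is a discrete-time, average cost periodic Markov decision problem (MDP).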
Methodology
Finite-Horizon Dynamic Programming
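The goal is to minimize, over all policies $\mu$, the following function
\[
J_\mu(x_0) = \mathbb{E}_W\left[g_{\text{terminal}}(X_M) + \sum_{k=0}^{M-1} g\left(X_k, \mu_k(X_k), W_k; \theta\right) \,\middle|\, x_0\right],
\]
where $M$ is the horizon and $g_{\text{terminal}}$ is a terminal cost function.
The optimal policy $\mu^*$ is the minimizer of Bellman’s equations
\begin{align*}
J^*(x_M) &= g_{\text{terminal}}(x_M), \\
J^*(x_k) &= \min_{u_k \in \{0, 1\}} \mathbb{E}_{W_k}\left[g(x_k, u_k, W_k; \theta) + J^*\left(f(x_k, u_k, W_k)\right) \,\middle|\, x_k\right],
\end{align*}
where $J^*$ is known as the optimal cost-to-go function.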
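Bellman's equations above can be solved by backward induction when the state and disturbance sets are finite. The sketch below does this for a toy problem; the four temperature levels, the dynamics `f`, the cost `g`, and the disturbance pmf are hypothetical stand-ins chosen to keep the example small, and the disturbance is assumed independent of the state.

```python
def backward_induction(states, M, f, g, g_terminal, pmf):
    """Solve J*(x_k) = min_u E[g(x_k, u, W) + J*(f(x_k, u, W))] backward from stage M.

    pmf maps each disturbance value w to its probability.
    Returns (J, mu): J[k][x] is the cost-to-go, mu[k][x] the minimizing decision.
    """
    J = [dict() for _ in range(M + 1)]
    mu = [dict() for _ in range(M)]
    for x in states:
        J[M][x] = g_terminal(x)
    for k in range(M - 1, -1, -1):
        for x in states:
            best_u, best_q = None, float("inf")
            for u in (0, 1):
                q = sum(p * (g(x, u, w) + J[k + 1][f(x, u, w)]) for w, p in pmf.items())
                if q < best_q:
                    best_u, best_q = u, q
            J[k][x], mu[k][x] = best_q, best_u
    return J, mu


# Toy model (made-up numbers): temperature levels 0..3, draw w in {0, 1}.
states = range(4)
f = lambda x, u, w: min(max(x + u - w, 0), 3)                  # heat adds a level, a draw removes one
g = lambda x, u, w: (max(2 - x, 0) if w > 0 else 0.0) + 0.2 * u  # discomfort below level 2, plus heating cost
J, mu = backward_induction(states, M=2, f=f, g=g, g_terminal=lambda x: 0.0, pmf={0: 0.5, 1: 0.5})
print(J[0], mu[0])
```

In this toy model heating only pays off when at least one stage remains, so the policy heats at stage 0 from the cold state and never heats at the final stage.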
Methodology
Average Cost Dynamic Programming for Periodic MDP’s
Relative value iteration (VI) can be used to solve average cost periodic MDP’s.
1. Initialize $J$ and $\mu$ arbitrarily and fix a reference state $x_{\text{ref}}$.
2. Calculate the new cost-to-go function $J'$ by solving an $N$-horizon MDP using $J(x_0)$ as the terminal cost function.
3. Update the current cost-to-go function using $J(x_k) \leftarrow J'(x_k) - J'(x_{\text{ref}})$.
4. Repeat steps 2 and 3 until convergence is achieved.
The relative value iteration algorithm terminates with $J$ being a differential cost function, interpreted as the minimum expected $N$-stage costs relative to the reference state $x_{\text{ref}}$; furthermore, the offset $J'(x_{\text{ref}})$ is interpreted as the average cost of completing a cycle.
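The four steps can be sketched as follows. This is an illustrative implementation under simplifying assumptions: the state set is small and finite, the disturbance pmf does not depend on the state or the stage, and convergence is declared when the differential costs stop changing. The toy dynamics and costs are made up for the example.

```python
def relative_value_iteration(states, N, f, g, pmf, x_ref, tol=1e-9, max_iter=1000):
    """Relative VI over cycles of N stages.

    f(x, u, w) -> next state; g(k, x, u, w) -> cost at stage k of the cycle;
    pmf maps each disturbance value w to its probability.
    Returns the differential cost J and the offset J'(x_ref) from the last sweep.
    """
    J = {x: 0.0 for x in states}               # step 1: arbitrary initialization
    offset = 0.0
    for _ in range(max_iter):
        J_prime = dict(J)                      # step 2: J is the terminal cost
        for k in range(N - 1, -1, -1):
            J_prime = {
                x: min(
                    sum(p * (g(k, x, u, w) + J_prime[f(x, u, w)]) for w, p in pmf.items())
                    for u in (0, 1)
                )
                for x in states
            }
        offset = J_prime[x_ref]
        J_new = {x: J_prime[x] - offset for x in states}   # step 3
        change = max(abs(J_new[x] - J[x]) for x in states)
        J = J_new
        if change < tol:                       # step 4: stop once J has settled
            break
    return J, offset


# Toy example (hypothetical numbers): temperature levels 0..3, draw w in {0, 1}.
states = range(4)
f = lambda x, u, w: min(max(x + u - w, 0), 3)
g = lambda k, x, u, w: (max(2 - x, 0) if w > 0 else 0.0) + 0.2 * u
J, cycle_cost = relative_value_iteration(states, N=4, f=f, g=g, pmf={0: 0.5, 1: 0.5}, x_ref=0)
print(J, cycle_cost)
```

Subtracting the value at the reference state keeps the iterates bounded; without it the cost-to-go would grow by roughly one cycle's cost on every sweep.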
Methodology
Approximate Dynamic Programming (ADP)
Exact dynamic programming is hard because of the large state-space; in particular, $T_k$ is continuous and the dimension of $h_k$ increases at every stage (except the last stage of a cycle). We therefore simplify the model to get a more tractable problem, using the following techniques.
1. Temperature Aggregation
2. Usage History Aggregation
3. Approximate Transition Probabilities Using Density Estimation
4. Q-Learning
Methodology
Temperature Aggregation
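Discretize temperature $\bar{T} \in \{T_{\text{amb}}, T_{\text{amb}} + \Delta T, \ldots, T_{\text{amb}} + (n-1)\Delta T\}$.
Let $A(T)$ be the following random function of $T$
\[
A(T) := \begin{cases} \operatorname{sgn}(T - \bar{T})\,\Delta T, & \text{w.p. } |T - \bar{T}|/\Delta T \\ 0, & \text{w.p. } 1 - |T - \bar{T}|/\Delta T, \end{cases}
\]
where $\bar{T} = \operatorname{round}(T/\Delta T)\,\Delta T$. The aggregate problem has the following modified thermodynamics
\[
\bar{T}_{k+1} = \bar{f}_T(\bar{T}_k, u_k, w_k) := \operatorname{round}(T_{k+1}/\Delta T)\,\Delta T + A(T_{k+1}),
\]
where $T_{k+1} = f_T(\bar{T}_k, u_k, w_k)$.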
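The random function $A(T)$ amounts to a stochastic rounding of the temperature onto the grid: the nearest grid point is used most of the time, and the neighbouring grid point is used with probability proportional to the rounding error, so the aggregated temperature equals $T$ in expectation. A minimal sketch follows; the function name and the sample values are assumptions for illustration.

```python
import random


def aggregate(T, dT, rng=random):
    """Return round(T/dT)*dT + A(T), i.e. a randomized projection of T onto the grid."""
    T_bar = round(T / dT) * dT              # nearest grid point
    p = abs(T - T_bar) / dT                 # probability of moving one grid step toward T
    if rng.random() < p:
        step = dT if T > T_bar else -dT     # sgn(T - T_bar) * dT
        return T_bar + step
    return T_bar


rng = random.Random(0)
draws = [aggregate(40.3, 1.0, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to 40.3, since the rounding is unbiased
```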
Methodology
Usage History Aggregation
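Here the goal is to find a low-dimensional feature vector $\phi_k$ such that $p_{W_k}(w_k \mid h_k) \approx p_{W_k}(w_k \mid \phi_k)$. We are interested in $\phi_k$ with simple update rules $\phi_{k+1} = f_\phi(\phi_k, w_k)$. For example,
\begin{align*}
\phi^{(1a)}_k &= \mathbb{I}\{w_{k-1} > 0\}, & \phi^{(1a)}_{k+1} &= \mathbb{I}\{w_k > 0\}, \\
\phi^{(2a)}_k &= \sum_{i = i_{\text{StartUse}}}^{k-1} \mathbb{I}\{w_i > 0\}, & \phi^{(2a)}_{k+1} &= \mathbb{I}\{w_k \neq 0\} \cdot \left(\phi^{(2a)}_k + \mathbb{I}\{w_k > 0\}\right), \\
\phi^{(3a)}_k &= \sum_{i = i_{\text{StartCycle}}}^{k-1} \mathbb{I}\{w_i > 0\}, & \phi^{(3a)}_{k+1} &= \mathbb{I}\{\operatorname{mod}(k+1, N) \neq 0\} \cdot \left(\phi^{(3a)}_k + \mathbb{I}\{w_k > 0\}\right).
\end{align*}
The aggregate problem uses $\bar{x}_k = (\bar{T}_k, t_k, \phi_k)$ in place of $x_k$.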
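The three example features can be maintained with one-line recursive updates, which is what makes them attractive as state variables. In the sketch below the function names are hypothetical, and the third feature is assumed to reset to zero whenever stage $k+1$ starts a new cycle, which is what its definition as a sum from the start of the cycle implies.

```python
def phi_1a_next(w_k):
    """Was hot water used in the most recent stage?"""
    return int(w_k > 0)


def phi_2a_next(phi, w_k):
    """Length of the current run of consecutive stages with usage (0 once usage stops)."""
    return int(w_k != 0) * (phi + int(w_k > 0))


def phi_3a_next(phi, w_k, k, N):
    """Number of stages with usage since the start of the cycle (assumed reset at cycle start)."""
    if (k + 1) % N == 0:
        return 0
    return phi + int(w_k > 0)


ws = [0, 1, 2, 0, 3, 0]   # made-up usage trajectory, cycle length N = 3
p1 = p2 = p3 = 0
trace = []
for k, w in enumerate(ws):
    p1, p2, p3 = phi_1a_next(w), phi_2a_next(p2, w), phi_3a_next(p3, w, k, N=3)
    trace.append((p1, p2, p3))
print(trace)
```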
Methodology
Approximate Transition Probabilities Using Density Estimation
• A closed-form expression for $p_W$ is hard to find.
• Use kernel density estimation to get an estimate of $p_{W_k}(w_k \mid h_k)$.
• Estimation of high dimensional pdf’s is difficult, so use usage history aggregation to estimate $p_{W_k}(w_k \mid \phi_k)$ instead.
• Use the estimate $\hat{p}_{W_k}(w_k \mid \phi_k)$ to calculate the transition probabilities
\[
\Pr\left[\bar{f}_T(\bar{T}_k, u_k, W_k) = \bar{T}_{k+1} \mid \bar{T}_k, \phi_k, u_k\right] \quad \text{and} \quad \Pr\left[f_\phi(\phi_k, W_k) = \phi_{k+1} \mid \phi_k\right].
\]
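One simple way to obtain such an estimate is sketched below: group the observed draws by feature value, smooth each group with a Gaussian kernel evaluated on the discretized usage grid, and normalize so that each conditional estimate sums to one. The kernel, the bandwidth, the grid, and the grouping-by-feature approach are assumptions for this example; the slides do not specify which kernel density estimator was used.

```python
import math


def kde_pmf(samples, grid, bandwidth):
    """Gaussian-kernel estimate of a pmf on `grid` from observed draws."""
    weights = [
        sum(math.exp(-0.5 * ((w - s) / bandwidth) ** 2) for s in samples)
        for w in grid
    ]
    total = sum(weights)
    return [v / total for v in weights]   # normalize to a probability mass function


def conditional_pmf(history, grid, bandwidth):
    """history: list of (phi, w) pairs. Returns {phi: estimated pmf of W given phi}."""
    by_phi = {}
    for phi, w in history:
        by_phi.setdefault(phi, []).append(w)
    return {phi: kde_pmf(ws, grid, bandwidth) for phi, ws in by_phi.items()}


# Made-up observations: usage tends to continue once it has started (phi = 1).
history = [(0, 0.0), (0, 0.0), (0, 0.0), (0, 1.0), (1, 2.0), (1, 2.0), (1, 3.0), (1, 0.0)]
grid = [0.0, 1.0, 2.0, 3.0]
p_hat = conditional_pmf(history, grid, bandwidth=0.5)
print(p_hat[0], p_hat[1])
```

Given the estimated pmf, each transition probability is the total mass of the usage values that map the current aggregate state to a given next state.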
Methodology
Model-Free Q-Learning
• Model-Free Q-Learning involves learning from trajectories of the form $(x_0, u_0), (x_1, u_1), \ldots, (x_p, u_p)$ where $u_k = \mu(x_k)$.
• Q-factors are updated using the following formula
\[
Q(x_k, u_k) \leftarrow (1 - \gamma)\, Q(x_k, u_k) + \gamma \left[g(x_k, u_k, w_k; \theta) + \min_{v_{k+1}} Q(x_{k+1}, v_{k+1})\right],
\]
where $x_{k+1} = f(x_k, u_k, w_k)$ and $0 \le \gamma \le 1$ is the learning rate.
• The policy is updated using $\mu(x_k) \leftarrow \mathbb{I}\{Q(x_k, 1) < Q(x_k, 0)\}$.
• Model-Free Q-Learning does not require knowledge of the transition probabilities, but it suffers from the problem of “exploration vs. exploitation”.
• An ε-greedy algorithm can be used to trade off between exploration and exploitation.
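The Q-factor update, the greedy policy update, and an ε-greedy decision rule can be sketched as follows. The table layout (a dictionary keyed by `(state, decision)`), the function names, and the numbers are assumptions made for illustration; the update itself follows the formula on the slide, with $\gamma$ as the learning rate.

```python
import random


def q_update(Q, x, u, cost, x_next, gamma):
    """Q(x,u) <- (1 - gamma) Q(x,u) + gamma [cost + min_v Q(x_next, v)]."""
    target = cost + min(Q[(x_next, 0)], Q[(x_next, 1)])
    Q[(x, u)] = (1 - gamma) * Q[(x, u)] + gamma * target


def greedy(Q, x):
    """mu(x) = I{Q(x, 1) < Q(x, 0)}: turn the heater on only if that is strictly cheaper."""
    return int(Q[(x, 1)] < Q[(x, 0)])


def epsilon_greedy(Q, x, eps, rng):
    """Explore with probability eps, otherwise follow the greedy policy."""
    if rng.random() < eps:
        return rng.choice((0, 1))
    return greedy(Q, x)


# Two states with made-up Q-factors.
Q = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 3.0}
q_update(Q, x=0, u=1, cost=0.5, x_next=1, gamma=0.5)
print(Q[(0, 1)], greedy(Q, 0), greedy(Q, 1), epsilon_greedy(Q, 1, 0.1, random.Random(0)))
```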
Methodology
Model-Based Q-Learning
• Model-Based Q-Learning involves learning from usage trajectories $w_0, w_1, \ldots, w_p$.
• The model of the system is used to obtain a family of state-decision pair trajectories corresponding to each usage trajectory.
• The Q-factors are updated using the same formula.
• Model-Based Q-Learning does not require knowledge of the transition probabilities and it does not have the problem of “exploration vs. exploitation”.
Methodology
Prescient Lower Bound (PLB)
1. Generate/observe a series of usage trajectories.
2. Solve the finite-horizon problem corresponding to these trajectories exactly.
3. The average of the optimal costs is a lower bound for the objective function.
This lower bound represents the minimum possible objective cost, given that hot water usage is known in advance.
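The three steps can be sketched for a small tabular model. Because each usage trajectory is known in advance, the inner problem is a deterministic dynamic program with no expectation. The toy dynamics, the costs, the per-stage normalization of each trajectory's cost, and the fixed initial state are assumptions made for this illustration.

```python
def prescient_cost(ws, x0, states, f, g):
    """Minimum total cost over on/off sequences when the whole trajectory ws is known."""
    J = {x: 0.0 for x in states}                  # zero terminal cost
    for k in range(len(ws) - 1, -1, -1):
        w = ws[k]                                 # known disturbance at stage k
        J = {x: min(g(k, x, u, w) + J[f(x, u, w)] for u in (0, 1)) for x in states}
    return J[x0]


def prescient_lower_bound(trajectories, x0, states, f, g):
    """Average, over trajectories, of the optimal per-stage cost with perfect foresight."""
    costs = [prescient_cost(ws, x0, states, f, g) / len(ws) for ws in trajectories]
    return sum(costs) / len(costs)


# Toy model (made-up numbers): temperature levels 0..3, draw w in {0, 1}.
states = range(4)
f = lambda x, u, w: min(max(x + u - w, 0), 3)
g = lambda k, x, u, w: (max(2 - x, 0) if w > 0 else 0.0) + 0.2 * u
plb = prescient_lower_bound([[0, 0], [0, 1]], x0=0, states=states, f=f, g=g)
print(plb)
```

No causal policy can beat this value on the same trajectories, since a causal policy has to choose its decisions without seeing future usage.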
Results
Numerical Simulations Setup
Figure: Simulate Hot Water Usage Data
Results
Numerical Simulations Setup
Figure: Hot Water Usage Probability Mass Function
Results
Numerical Simulations Setup
Figure: Time-Varying Price of Power (vertical axis: Price of Power ($/kW), 0.18 to 0.3; horizontal axis: 0 to 24)
Results
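Set-Point Methods
The policy of a set-point water heater maps $(T_k, u_{k-1})$ to $u_k$:
\[
\mu_{\text{set-point}}(T_k, u_{k-1}; \vartheta) := \begin{cases} 0, & \text{if } T_k > T_{\text{set}}(t_k) + \delta(t_k) \\ 1, & \text{if } T_k < T_{\text{set}}(t_k) - \delta(t_k) \\ u_{k-1}, & \text{otherwise} \end{cases}
\]
for all $k = 0, 1, \ldots$, where $\vartheta := \{T_{\text{set}}, \delta\}$.
A simple case occurs when $\delta(t_k) \equiv 0$:
\[
\mu_{\text{simple}}(T_k; T_{\text{set}}) := \mathbb{I}\{T_k < T_{\text{set}}(t_k)\},
\]
for all $k = 0, 1, \ldots$
Relative VI with state $x_k = T_k$ does no worse than simple set-points.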
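The two set-point policies translate directly into code. In this sketch the set-point and dead-band are passed as plain numbers for a single stage; in the slides they may vary with the time of day $t_k$. The function names and the temperatures are illustrative.

```python
def set_point_policy(T, u_prev, T_set, delta):
    """Set-point rule with a dead-band of half-width delta around T_set."""
    if T > T_set + delta:
        return 0          # too hot: switch off
    if T < T_set - delta:
        return 1          # too cold: switch on
    return u_prev         # inside the dead-band: keep the previous decision


def simple_set_point_policy(T, T_set):
    """Special case delta = 0: heat exactly when the tank is below the set-point."""
    return int(T < T_set)


# The heater stays on while the tank warms through the dead-band, then switches off.
u = 1
for T in (43.0, 45.0, 47.0, 48.5):
    u = set_point_policy(T, u, T_set=45.0, delta=3.0)
    print(T, u)
```

The dead-band prevents rapid on/off switching near the set-point, at the price of carrying the previous decision $u_{k-1}$ as part of the policy's input.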
Results
Simple Set-Point with HECO Pricing
Figure: Discomfort Cost (°C/use), 1 to 15, versus Operating Cost ($/day), 0 to 1.8, for the Simple Set-Point Solution, the Dynamic Programming Solution, and the Prescient Lower Bound. Set-Point (°C) scale: 25 to 55.
Results
Simple Set-Point with HECO Pricing
Figure: Discomfort Cost (°C/use), 0 to 1, versus Operating Cost ($/day), 1.2 to 1.75. Set-Point (°C) scale: 25 to 55.
Results
Simple Set-Point with Constant Pricing
Figure: Discomfort Cost (°C/use), 1 to 15, versus Operating Cost ($/day), 0.1 to 1.6, for the Simple Set-Point Solution, the Dynamic Programming Solution, and the Prescient Lower Bound. Set-Point (°C) scale: 25 to 55.
Results
Simple Set-Point with Constant Pricing
Figure: Discomfort Cost (°C/use), 0 to 2.4, versus Operating Cost ($/day), 1.2 to 1.6. Set-Point (°C) scale: 25 to 55.
Results
Temperature Aggregation
Figure: Discomfort Cost (°C/use), 1 to 15, versus Operating Cost ($/day), 0.1 to 1.6. Legend: Hard, 1; Hard, 1/3; Hard, 1/10; Coarse, 1; Coarse, 1/3; Coarse, 1/10; PLB.
Results
Temperature Aggregation
Figure: Discomfort Cost (°C/use), 0 to 0.75, versus Operating Cost ($/day), 1.2 to 1.6. Legend: Hard, 1; Hard, 1/3; Hard, 1/10; Coarse, 1; Coarse, 1/3; Coarse, 1/10; PLB.
Results
Usage History Aggregation
Figure: Discomfort Cost (°C/use), 0 to 0.75, versus Operating Cost ($/day), 1.2 to 1.6. Legend: ∅; φ(1a); φ(2a); φ(3a); PLB.
Problem Extensions
Solar Water Heating
Let $V_k$ be a random variable representing the solar irradiance at time $t_k$. In practice, we will have to estimate $v_k$ using forecasting methods. Let $\text{efficiency}(v_k)$ convert irradiance into usable power. The modified temperature equation is
\[
f_T(T_k, u_k, w_k, v_k) = \max\left\{T_k - r_{\text{cool}}\Delta t\,(T_k - T_{\text{amb}}) + r_{\text{heat}}\Delta t\, u_k - r_{\text{loss}}\Delta t\, w_k + r_{\text{solar}}\Delta t \cdot \text{efficiency}(v_k),\; T_{\text{amb}}\right\},
\]
where $r_{\text{solar}}$ is a conversion factor from power to temperature.
Problem Extensions
Demand Response
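Compensate customers for reducing/shifting electricity use.
Diagram: Water Heater (KDE → $p_W$ → DP → $\mu$ → Expected Load) → $L$ → Utility → $C$ → Water Heater.
The Utility solves
\[
\begin{aligned}
\underset{C}{\text{minimize}} \quad & \sum_{k=1}^{n} \left(a L_k^2 + b L_k + c\right) \\
\text{subject to} \quad & L = f'(C), \\
& \frac{1}{n} \sum_{k=1}^{n} C(k) = C_{\text{avg}}, \\
& C_{\min} \le C(k) \le C_{\max}.
\end{aligned}
\]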
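Problem Extensions - Automated Demand Response
Heuristic for Setting Price
Find $\beta_1, \beta_2 \ge 0$ such that
\[
C = \beta_1 L + \beta_2, \qquad \frac{1}{N} \sum_{k=1}^{N} C(k) = C_{\text{avg}}, \qquad C_{\min} \le C(k) \le C_{\max},
\]
and $\beta_1$ is maximal. Both price bounds must hold at once, so the closed-form solution takes the smaller of the two limiting slopes:
\[
\beta_1^* = \min\left\{\frac{C_{\max} - C_{\text{avg}}}{L_{\max} - L_{\text{avg}}},\; \frac{C_{\min} - C_{\text{avg}}}{L_{\min} - L_{\text{avg}}}\right\} \quad \text{and} \quad \beta_2^* = C_{\text{avg}} - \beta_1^* L_{\text{avg}}.
\]
The update is
\[
C \leftarrow (1 - \eta)\, C + \eta\left(\beta_1^* L + \beta_2^*\right).
\]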
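A minimal sketch of the pricing heuristic follows. The slope is taken as the smaller of the two ratios so that the highest-load stage is priced at no more than $C_{\max}$ and the lowest-load stage at no less than $C_{\min}$. The function name and the load and price numbers are made up for illustration, the load profile is assumed not to be flat (otherwise the ratios are undefined), and the constraint $\beta_2 \ge 0$ is not enforced by this sketch.

```python
def price_update(L, C, C_avg, C_min, C_max, eta):
    """One step of C <- (1 - eta) C + eta (beta1 L + beta2) with the largest feasible slope."""
    L_avg = sum(L) / len(L)
    L_max, L_min = max(L), min(L)
    # Largest slope that keeps every price within [C_min, C_max]
    beta1 = min((C_max - C_avg) / (L_max - L_avg),
                (C_min - C_avg) / (L_min - L_avg))
    beta2 = C_avg - beta1 * L_avg          # keeps the average price equal to C_avg
    C_new = [(1 - eta) * c + eta * (beta1 * l + beta2) for c, l in zip(C, L)]
    return C_new, beta1, beta2


L = [1.0, 2.0, 3.0, 6.0]                   # expected load per stage (made-up)
C = [0.25, 0.25, 0.25, 0.25]               # current flat price
C_new, beta1, beta2 = price_update(L, C, C_avg=0.25, C_min=0.18, C_max=0.30, eta=1.0)
print(C_new, beta1, beta2)
```

With $\eta = 1$ the price jumps straight to the affine target; smaller values of $\eta$ damp the update, which is useful because the load responds to the new price in the next iteration.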
Problem Extensions
Automated Demand Response Simulation
Conclusion
Summary
• Formulated the problem of minimizing a weighted sum of operating and discomfort costs as an average cost MDP.
• Considered approximate DP methods such as aggregation, density estimation, and Q-Learning.
• Approximate DP is at least as good as simple set-points.
• Applications of water heaters optimized with approximate DP include solar water heating and automated demand response.
• A longer cycle (e.g., a week or a month) should be considered.
• Non-stationary usage patterns should be considered.
Thank You