Dynamic Programming and Optimal Control
Fall 2009
Problem Set: Infinite Horizon Problems, Value Iteration, Policy Iteration
Notes:
• Problems marked with BERTSEKAS are taken from the book Dynamic Programming and Optimal Control by Dimitri P. Bertsekas, Vol. I, 3rd edition, 2005, 558 pages, hardcover.
• The solutions were derived by the teaching assistants. Please report any errors that you may find to [email protected] or [email protected].
• The solutions in this document are handwritten. A typed version will be made available in the future.
Problem Set
Problem 1 (BERTSEKAS, p. 445, exercise 7.1)
Problem 2 (BERTSEKAS, p. 446, exercise 7.3)
Problem 3 (BERTSEKAS, p. 448, exercise 7.12)
Problem 4 (BERTSEKAS, p. 60, exercise 1.23)
[Handwritten solution to Problem 1 (Bertsekas 7.1) - illegible in this text extraction. The MATLAB implementation for part (b) follows.]
13.11.09 15:59 C:\Users\Angela\ETH\Teaching\DynProgrOptCt...\Bertsekas7_1b.m 1 of 6
% Solution to Exercise 7.1 b)
% in Bertsekas - "Dynamic Programming and Optimal Control" Vol. 1 p. 445
%
% --
% ETH Zurich
% Institute for Dynamic Systems and Control
% Angela Schöllig
% [email protected]
%
% --
% Revision history
% [12.11.09, AS] first version
%
%
% clear workspace and command window
clear;
clc;

%% PARAMETERS
% Landing probability
p(2) = 0.95;  % Slow serve

% Winning probability
q(1) = 0.6;   % Fast serve
q(2) = 0.4;   % Slow serve

% Define value iteration error bound
err = 1e-100;

% Specify if you want to test one value for p(1) - landing probability for
% fast serve: use test = 1, or a row of probabilities: use test = 2
test = 1;

if test == 1
    % Define p(1)
    p(1) = 0.2;
    % Number of runs
    mend = 1;
else
    % Define increment for p(1)
    incr = 0.01;
    % Number of runs
    mend = ((0.95-incr)/incr+2);
end

%% VALUE ITERATION
for m = 1:mend
    %% PARAMETER
    % Landing probability
    if test == 2
        p(1) = (m-1)*incr;  % Fast serve, check for 0...0.9
    end

    %% INITIALIZE PROBLEM
    % Our state space is S={0,1,2,3}x{0,1,2,3}x{1,2},
    % i.e. x_k = [score player 1, score player 2, serve]
    % ATTENTION: indices are shifted in code
    % Set the initial costs to 0, although any value will do (except for the
    % termination cost, which must be equal to 0).
    J = (p(1)+0.1).*ones(4, 4, 2);

    % Initialize the optimal control policy: 1 represents Fast serve, 2
    % represents Slow serve
    FVal = ones(4, 4, 2);

    %% VALUE ITERATION LOOP
    % Initialize cost-to-go
    costToGo = zeros(4, 4, 2);
    iter = 0;
    while (1)
        iter = iter+1;
        if (mod(iter,1e3)==0)
            disp(['Value Iteration Number ',num2str(iter)]);
        end

        % Update the value
        for i = 1:3
            [costToGo(4,i,1),FVal(4,i,1)] = max(q.*p + (1-q).*p.*J(4,i+1,1) + (1-p).*J(4,i,2));
            [costToGo(4,i,2),FVal(4,i,2)] = max(q.*p + (1-q.*p).*J(4,i+1,1));
            [costToGo(i,4,1),FVal(i,4,1)] = max(q.*p.*J(i+1,4,1) + (1-p).*J(i,4,2));
            [costToGo(i,4,2),FVal(i,4,2)] = max(q.*p.*J(i+1,4,1));
            for j = 1:3
                % Maximize over input 1 and 2
                [costToGo(i,j,1),FVal(i,j,1)] = max(q.*p.*J(i+1,j,1) + (1-q).*p.*J(i,j+1,1) + (1-p).*J(i,j,2));
                [costToGo(i,j,2),FVal(i,j,2)] = max(q.*p.*J(i+1,j,1) + (1-q.*p).*J(i,j+1,1));
            end
        end
        [costToGo(4,4,1),FVal(4,4,1)] = max(q.*p.*J(4,3,1) + (1-q).*p.*J(3,4,1) + (1-p).*J(4,4,2));
        [costToGo(4,4,2),FVal(4,4,2)] = max(q.*p.*J(4,3,1) + (1-q.*p).*J(3,4,1));

        if (max(max(max(J-costToGo)))/max(max(max(costToGo))) < err)
            % update cost
            J = costToGo;
            break;
        else
            J = costToGo;
        end
    end

    % Probability of player 1 winning the game
    JValsave(m) = J(1, 1, 1);
end;

%% POLICY ITERATION
for m = 1:mend
    %% PARAMETER
    % Landing probability
    if test == 2
        p(1) = (m-1)*incr;  % Fast serve, check for 0...0.9
    end

    %% INITIALIZE PROBLEM
    % Our state space is S={0,1,2,3}x{0,1,2,3}x{1,2},
    % i.e. x_k = [score player 1, score player 2, serve]
    % ATTENTION: indices are shifted in code
    % Initialize the optimal control policy: 1 represents Fast serve, 2
    % represents Slow serve - in vector form
    FPol = 2.*ones(32,1);

    %% POLICY ITERATION LOOP
    iter = 0;
    while (1)
        iter = iter+1;
        if (mod(iter,1e3)==0)
            disp(['Policy Iteration Number ',num2str(iter)]);
        end

        % Update the value
        % Initialize matrices
        APol = zeros(32,32);
        bPol = zeros(32,1);
        for i = 1:3
            % [costToGo(4,i,1),FVal(4,i,1)] = max(q.*p + (1-q).*p.*J(4,i+1,1)+(1-p).*J(4,i,2));
            f = FPol(12+i);
            APol(12+i,12+i) = -1;
            bPol(12+i) = -q(f)*p(f);
            APol(12+i,13+i) = (1-q(f))*p(f);
            APol(12+i,28+i) = (1-p(f));
            % [costToGo(4,i,2),FPol(4,i,2)] = max(q.*p + (1-q.*p).*J(4,i+1,1));
            f = FPol(28+i);
            APol(28+i,28+i) = -1;
            bPol(28+i) = -q(f)*p(f);
            APol(28+i,13+i) = 1-p(f)*q(f);
            % [costToGo(i,4,1),FPol(i,4,1)] = max(q.*p.*J(i+1,4,1) + (1-p).*J(i,4,2));
            f = FPol(4*i);
            APol(4*i,4*i) = -1;
            APol(4*i,4*i+4) = q(f)*p(f);
            APol(4*i,20+4*(i-1)) = 1-p(f);
            % [costToGo(i,4,2),FPol(i,4,2)] = max(q.*p.*J(i+1,4,1));
            f = FPol(4*i+16);
            APol(4*i+16,4*i+16) = -1;
            APol(4*i+16,4*i+4) = q(f)*p(f);
            for j = 1:3
                % Maximize over input 1 and 2
                % [costToGo(i,j,1),FPol(i,j,1)] = max(q.*p.*J(i+1,j,1) + (1-q).*p.*J(i,j+1,1)+(1-p).*J(i,j,2));
                f = FPol(4*(i-1)+j);
                APol(4*(i-1)+j,4*(i-1)+j) = -1;
                APol(4*(i-1)+j,4*i+j) = q(f)*p(f);
                APol(4*(i-1)+j,4*(i-1)+j+1) = (1-q(f))*p(f);
                APol(4*(i-1)+j,4*(i-1)+j+16) = (1-p(f));
                % [costToGo(i,j,2),FPol(i,j,2)] = max(q.*p.*J(i+1,j,1) + (1-q.*p).*J(i,j+1,1));
                f = FPol(4*(i-1)+j+16);
                APol(4*(i-1)+j+16,4*(i-1)+j+16) = -1;
                APol(4*(i-1)+j+16,4*i+j) = q(f)*p(f);
                APol(4*(i-1)+j+16,4*(i-1)+j+1) = 1-q(f)*p(f);
            end
        end
        % [costToGo(4,4,1),FPol(4,4,1)] = max(q.*p.*J(4,3,1) + (1-q).*p.*J(3,4,1)+(1-p).*J(4,4,2));
        f = FPol(16);
        APol(16,16) = -1;
        APol(16,15) = q(f)*p(f);
        APol(16,12) = (1-q(f))*p(f);
        APol(16,32) = (1-p(f));
        % [costToGo(4,4,2),FPol(4,4,2)] = max(q.*p.*J(4,3,1) + (1-q.*p).*J(3,4,1));
        f = FPol(32);
        APol(32,32) = -1;
        APol(32,15) = q(f)*p(f);
        APol(32,12) = 1-q(f)*p(f);

        % Calculate cost
        JPol = APol\bPol;

        % Can now update the policy.
        FPolNew = FPol;
        JPolNew = JPol;
        JPol_re = reshape(JPol,4,4,2);
        JPol_re(:,:,1) = JPol_re(:,:,1)';
        JPol_re(:,:,2) = JPol_re(:,:,2)';

        % Initialize cost-to-go
        costToGo = zeros(4, 4, 2);
        FPolNew_re = zeros(4, 4, 2);

        % Update the optimal policy
        for i = 1:3
            [costToGo(4,i,1),FPolNew_re(4,i,1)] = max(q.*p + (1-q).*p.*JPol_re(4,i+1,1) + (1-p).*JPol_re(4,i,2));
            [costToGo(4,i,2),FPolNew_re(4,i,2)] = max(q.*p + (1-q.*p).*JPol_re(4,i+1,1));
            [costToGo(i,4,1),FPolNew_re(i,4,1)] = max(q.*p.*JPol_re(i+1,4,1) + (1-p).*JPol_re(i,4,2));
            [costToGo(i,4,2),FPolNew_re(i,4,2)] = max(q.*p.*JPol_re(i+1,4,1));
            for j = 1:3
                % Maximize over input 1 and 2
                [costToGo(i,j,1),FPolNew_re(i,j,1)] = max(q.*p.*JPol_re(i+1,j,1) + (1-q).*p.*JPol_re(i,j+1,1) + (1-p).*JPol_re(i,j,2));
                [costToGo(i,j,2),FPolNew_re(i,j,2)] = max(q.*p.*JPol_re(i+1,j,1) + (1-q.*p).*JPol_re(i,j+1,1));
            end
        end
        [costToGo(4,4,1),FPolNew_re(4,4,1)] = max(q.*p.*JPol_re(4,3,1) + (1-q).*p.*JPol_re(3,4,1) + (1-p).*JPol_re(4,4,2));
        [costToGo(4,4,2),FPolNew_re(4,4,2)] = max(q.*p.*JPol_re(4,3,1) + (1-q.*p).*JPol_re(3,4,1));

        JPolNew(1:16)  = reshape(costToGo(:,:,1)', 16,1);
        JPolNew(17:32) = reshape(costToGo(:,:,2)', 16,1);
        FPolNew(1:16)  = reshape(FPolNew_re(:,:,1)', 16,1);
        FPolNew(17:32) = reshape(FPolNew_re(:,:,2)', 16,1);

        % Quit if the policy is the same, or if the costs are essentially the
        % same.
        if ((norm(JPol - JPolNew)/norm(JPol) < 1e-10))
            break;
        else
            FPol = FPolNew;
        end
    end

    % Probability of player 1 winning the game
    JPolsave(m) = JPolNew(1,1);
end;
if test == 2
    plot(0:incr:(0.95-incr), JValsave,'.');
    hold on
    plot(0:incr:(0.95-incr), JPolsave,'r');
    grid on
    title('Probability of the server winning a game')
    xlabel('p_F')
    ylabel('Probability of winning')
    legend('Value Iteration', 'Policy Iteration')
else
    disp(['Probability of player 1 winning ',num2str(JValsave(1))]);
    subplot(1,2,1)
    bar3(FVal(:,:,1),0.25,'detached')
    title(['First Serve for p_F = ', num2str(p(1))])
    ylabel('Score Player 1')
    xlabel('Score Player 2')
    zlabel('Input: Fast Serve (1) or Slow Serve (2)')
    subplot(1,2,2)
    bar3(FVal(:,:,2),0.25,'detached')
    title(['Second Serve for p_F = ', num2str(p(1))])
    ylabel('Score Player 1')
    xlabel('Score Player 2')
    zlabel('Input: Fast Serve (1) or Slow Serve (2)')
end
[Figure: "Probability of the server winning a game" - probability of winning vs. p_F; the value iteration (dots) and policy iteration (line) results coincide.]
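The MATLAB script above solves the full game DP over the score states. As an independent sanity check (not part of the original solution), the sketch below computes only the probability of winning a single service point under each fixed (first serve, second serve) strategy, using the script's default parameters p = (0.2, 0.95) for the landing probabilities and q = (0.6, 0.4) for the winning probabilities; a double fault loses the point. Note that in the game DP the optimal serve choice may vary with the score, so this per-point comparison is only a rough check.

```python
# Sketch with assumed default parameters from the script (p_F = 0.2 case).
p = {'F': 0.2, 'S': 0.95}   # probability the serve lands in
q = {'F': 0.6, 'S': 0.4}    # probability of winning the point, given it lands

def point_win_prob(first, second):
    """P(server wins the point) serving `first`, then `second` on a fault.

    A double fault (both serves out) loses the point."""
    return p[first] * q[first] + (1 - p[first]) * p[second] * q[second]

strategies = [(a, b) for a in 'FS' for b in 'FS']
for s in strategies:
    print(s, point_win_prob(*s))

best = max(strategies, key=lambda s: point_win_prob(*s))
print('best single-point strategy:', best)  # (fast first, slow second)
```

For these parameters the fast-then-slow strategy wins the point with probability 0.2·0.6 + 0.8·(0.95·0.4) = 0.424, ahead of always-slow at 0.399.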
[Handwritten solution to Problem 2 (Bertsekas 7.3, parts a/b): policy iteration for the two-state advertising/research problem - policy evaluation by solving the linear system for J_mu, the policy improvement step, and the limiting cases alpha -> 0 and alpha -> 1. The detailed computations are illegible in this text extraction.]
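The policy-iteration computation for Problem 7.3 can also be checked numerically. The following Python/NumPy sketch (not part of the original solution) uses the problem data g, P and the discount alpha = 0.9; each round evaluates the current policy by solving (I - alpha*P_mu) J = g_mu and then improves it greedily. The starting policy (never advertise / never do research) is chosen here purely for illustration.

```python
import numpy as np

# Data of Bertsekas 7.3: rewards g(i,u) and transitions P[u][i,j]; alpha = 0.9.
g = np.array([[4.0, 6.0], [-5.0, -3.0]])
P = np.array([[[0.8, 0.2], [0.7, 0.3]],    # u = 0: advertise / do research
              [[0.5, 0.5], [0.4, 0.6]]])   # u = 1: don't advertise / don't research
alpha = 0.9

mu = np.array([1, 1])  # illustrative initial policy: don't advertise / don't research
while True:
    # Policy evaluation: solve (I - alpha * P_mu) J = g_mu exactly.
    P_mu = np.array([P[mu[i], i, :] for i in range(2)])
    g_mu = np.array([g[i, mu[i]] for i in range(2)])
    J = np.linalg.solve(np.eye(2) - alpha * P_mu, g_mu)
    # Policy improvement: greedy with respect to J (maximization problem).
    Q = g + alpha * np.array([[P[u, i, :] @ J for u in range(2)] for i in range(2)])
    mu_new = np.argmax(Q, axis=1)
    if np.array_equal(mu_new, mu):
        break
    mu = mu_new

print('optimal policy (0 = advertise / do research):', mu)
print('J* =', J)
```

With alpha = 0.9 this converges in two improvement steps to the "always advertise / always do research" policy, with J*(1) = 2.02/0.091 ≈ 22.20 and J*(2) = 1.12/0.091 ≈ 12.31, matching the value-iteration result of the script below.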
12.11.09 11:44 D:\research\IMRT\teaching\DPOC_Fall09\ProblemS...\script_P73c.m 1 of 4
% script_P73c.m
%
% Matlab Script with implementation of a solution to Problem 7.3c from the
% class textbook.
%
% Dynamic Programming and Optimal Control
% Fall 2009
% Problem Set, Chapter 7
% Problem 7.3 c
%
% --
% ETH Zurich
% Institute for Dynamic Systems and Control
% Sebastian Trimpe
%
% --
% Revision history
% [11.11.09, ST] first version
%
%
% clear workspace and command window
clear;
%clc;
%% Program control
% Flag defining whether we run the value iteration with or without error
% bounds.
flag_errbnds = true;
% False: No error bounds; fixed number of iterations MAX_ITER.
% True: Use error bounds. Terminate when result within tolerance ERR_TOL.
MAX_ITER = 1000;
ERR_TOL = 1e-6;
%% Setup problem data
% Stage costs g. g is a 2x2 matrix where the first index corresponds to the
% state i=1,2 and the second index corresponds to the admissible control
% input u=1 (=advertising/do research), or 2 (=don't advertise/don't do
% research).
g = [4 6; -5 -3];
% Transition probabilities. P is a 2x2x2 matrix, where the first index
% corresponds to the origin state, the second index corresponds to the
% destination state and the third index corresponds to the applied control
% input, where (1,2) maps to (advertise/do research, don't advertise/don't
% do research). For example, the probability of transition from node 1 to
% node 2 given that we do not advertise is P(1,2,2).
P = zeros(2,2,2);
% Advertise/do research (u=1):
P(:,:,1) = [0.8 0.2; 0.7 0.3];
% Don't advertise/don't do research (u=2):
P(:,:,2) = [0.5 0.5; 0.4 0.6];
% Discount factor.
%alpha = 0.9;
alpha = 0.99;
%% Value Iteration
% Initialize variables that we update during value iteration.
% Cost (here it really is the reward):
costJ = [0,0];
costJnew = [0,0];
% Policy
policy = [0,0];
% Control variable for breaking the loop
doBreak = false;
% Loop over value iterations k.
for k=1:MAX_ITER
for i=1:2 % loop over two states
% One value iteration step for each state.
[costJnew(i),policy(i)] = max( g(i,:) + alpha*costJ*squeeze(P(i,:,:)) );
end;
% Save results for plotting later.
res_J(k,:) = costJnew;
% Construct string to be displayed later:
dispstr = ['k=',num2str(k,'%5d'), ...
' J(1)=',num2str(costJnew(1),'%6.4f'), ...
' J(2)=',num2str(costJnew(2),'%6.4f'), ...
' mu(1)=',num2str(policy(1),'%3d'), ...
' mu(2)=',num2str(policy(2),'%3d')];
% Use error bounds if flag has been set.
if flag_errbnds
% Compute error bounds.
lower = costJnew + alpha/(1-alpha)*min(costJnew-costJ); % lower bound on J*
upper = costJnew + alpha/(1-alpha)*max(costJnew-costJ); % upper bound on J*
% display string:
dispstr = [dispstr,' lower=[',num2str(lower,'%9.4f'), ...
'] upper=[',num2str(upper,'%9.4f'),']'];
% If desired tolerance reached: can terminate the algorithm.
if all((upper-lower)<=ERR_TOL)
% Difference between upper and lower bound less than specified
% tolerance for all states: set cost to average of upper and
% lower bound and terminate.
% Note that using the bounds, we can get a better result than
% just using J_{k+1}.
costJnew = mean([lower; upper],1);
dispstr = [dispstr, ...
sprintf('\nResult within desired tolerance. Terminate.\n')];
doBreak = true;
end;
% Save results for plotting later.
res_lower(k,:) = lower;
res_upper(k,:) = upper;
end;
% Update the cost. One could also implement an immediate update of the
% cost as Gauss-Seidel type update, which would yield faster
% convergence.
costJ = costJnew;
% Display:
disp(dispstr);
if doBreak
break;
end;
end;
% display the obtained costs
disp('Result:');
disp([' J*(1) = ',num2str(costJ(1),'%9.6f')]);
disp([' J*(2) = ',num2str(costJ(2),'%9.6f')]);
disp([' mu*(1) = ',num2str(policy(1))]);
disp([' mu*(2) = ',num2str(policy(2))]);
%% Plots
% Plot costs and bounds over iterations.
kk = 1:size(res_J,1);
figure;
subplot(2,1,1);
if(flag_errbnds)
plot(kk,[res_J(:,1),res_lower(:,1), res_upper(:,1)], ...
[kk(1),kk(end)],[costJ(1), costJ(1)],'k');
else
plot(kk,[res_J(:,1)],[kk(1),kk(end)],[costJ(1), costJ(1)],'k');
end;
grid;
%legend('lower bound','upper bound','J_k','J*');
%xlabel('iteration');
%
subplot(2,1,2);
if(flag_errbnds)
plot(kk,[res_J(:,2),res_lower(:,2), res_upper(:,2)], ...
[kk(1),kk(end)],[costJ(2),costJ(2)],'k');
else
plot(kk,[res_J(:,2)],[kk(1),kk(end)],[costJ(2), costJ(2)],'k');
end;
grid;
if(flag_errbnds)
legend('J_k','lower bound','upper bound','J*');
else
legend('J_k','J*');
end;
xlabel('iteration');
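As a cross-check of script_P73c.m, here is a compact Python/NumPy sketch (not part of the original solution) of the same value iteration with the error bounds the script computes, run with alpha = 0.9 (the value commented out in the script; the script itself runs alpha = 0.99). When the bound gap falls below the tolerance, the estimate is set to the average of the bounds, as in the script.

```python
import numpy as np

# Same data as script_P73c.m; alpha = 0.9 (the script's commented-out value).
g = np.array([[4.0, 6.0], [-5.0, -3.0]])
P = np.array([[[0.8, 0.2], [0.7, 0.3]],    # u = 0: advertise / do research
              [[0.5, 0.5], [0.4, 0.6]]])   # u = 1: don't
alpha, tol = 0.9, 1e-6

J = np.zeros(2)
for k in range(10000):
    # One sweep: J_{k+1}(i) = max_u [ g(i,u) + alpha * sum_j P(i,j,u) J_k(j) ]
    Q = g + alpha * np.array([[P[u, i, :] @ J for u in range(2)] for i in range(2)])
    J_new = Q.max(axis=1)
    # Error bounds on J*:
    #   J_new + alpha/(1-alpha)*min(dJ) <= J* <= J_new + alpha/(1-alpha)*max(dJ)
    dJ = J_new - J
    lower = J_new + alpha / (1 - alpha) * dJ.min()
    upper = J_new + alpha / (1 - alpha) * dJ.max()
    J = J_new
    if np.all(upper - lower <= tol):
        J = (lower + upper) / 2   # as in the script: average the bounds
        break

policy = Q.argmax(axis=1)
print('J* ~', J, 'policy (0 = advertise / do research):', policy)
```

The bounds let the loop terminate long before plain value iteration would reach the same accuracy; for alpha = 0.9 the result agrees with the exact policy-evaluation values J*(1) ≈ 22.20, J*(2) ≈ 12.31.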
[Handwritten solutions to Problems 3 and 4 - illegible in this text extraction apart from fragments, including an induction argument that the value iterates satisfy J_{k+1}(x) <= J_k(x) for all x in S.]