2019 06 rl 3- applications - university of texas at arlington 01 ntu hri and rl...may 20, 2013...
![Page 1](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/1.jpg)
Moncrief-O’Donnell Chair, UTA Research Institute (UTARI), The University of Texas at Arlington, USA
F.L. Lewis, National Academy of Inventors
Talk available online at http://www.UTA.edu/UTARI/acs
Applications of Integral Reinforcement Learning: Microgrids, UAV, Human-Robot Interaction
Supported by: ONR, US NSF
![Page 2](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/2.jpg)
Applications of Reinforcement Learning
• Microgrid Control
• Human-Robot Interactive Learning
• Industrial process control: Mineral grinding in Gansu, China
• H-infinity control for UAV
• Resilient Control to Cyber-Attacks in Networked Multi-agent Systems
• Decision & Control for Heterogeneous MAS (different dynamics)
![Page 3](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/3.jpg)
Work of Vahidreza Nasirian with Ali Davoudi
Game-theoretic Control for DC Microgrids
![Page 4](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/4.jpg)
![Page 5](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/5.jpg)
AC Microgrid:
1) Complex synchronization procedure for grid-tied operation (frequency, magnitude, and phase match is required)
2) Complex control circuitry (voltage, frequency, and active/reactive power control)
3) Unwanted transmission loss due to reactive power exchange
4) Redundant dc-ac-dc conversions for integration of renewable sources, loads, and storage units
5) Harmonic current management and phase unbalances
DC Microgrid:
1) Only voltage and power control is needed
2) No reactive power flow and, thus, improved overall efficiency
3) Converted renewable energies are basically dc and, thus, a dc distribution is more effective for integrating these sources
4) No harmonic current or phase unbalance issues
Advantages of DC Microgrids
![Page 6](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/6.jpg)
Cooperative Game-theoretic Control of Active Loads in DC Microgrids
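[Figure: Power buffer operation during a step change in power demand, showing the stored energy $e$, input power $p_{\text{in}}$, and output power $p_{\text{out}}$ around the instants $t_1$, $t_2$, $t_3$.]
The buffer supplies the excess power needed during load changes until the sources can respond.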
[Figure: Power buffers in the microgrid network.]
Ling-ling Fan, V. Nasirian, H. Modares, F.L. Lewis, Y.D. Song, and A. Davoudi, “Game-theoretic Control of Active Loads in DC Microgrids,” IEEE Trans. Energy Conversion, vol. 31, no. 3, pp. 882-895, 2016.
![Page 7](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/7.jpg)
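Active Load Power Buffer
$$
\begin{cases}
\dot e_i = \dfrac{v_i^2}{r_i} - p_i \\[4pt]
\dot r_i = u_i
\end{cases}
$$
where $e_i$ is the stored energy, $r_i$ the input impedance, $v_i$ the bus voltage, $u_i$ the control input, and $p_i$ the output power (treated as a disturbance).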
Vahid Nasirian
Nonlinear dynamics. It is not obvious how to handle $p_i$.
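A short numerical sketch makes the nonlinear buffer model concrete. It integrates $\dot e_i = v_i^2/r_i - p_i$, $\dot r_i = u_i$ with forward Euler for a single buffer. The bus voltage, impedance, stored energy, and load-step values are hypothetical, the bus voltage is held constant, and $u_i = 0$ is an illustrative assumption rather than the game-theoretic controller. With the impedance frozen, the stored energy drains at exactly the rate by which the output power exceeds the input power $v_i^2/r_i$, which is how the buffer covers a load step until the sources respond.

```python
def simulate_buffer(e0, r0, v, p_of_t, u_of_t, dt=1e-3, t_end=2.0):
    """Forward-Euler integration of one power buffer:
    de/dt = v^2 / r - p(t)   (input power minus output power)
    dr/dt = u(t)             (control input drives the input impedance)
    The bus voltage v is held constant (illustrative assumption)."""
    e, r = e0, r0
    for k in range(int(round(t_end / dt))):
        t = k * dt
        de = v * v / r - p_of_t(t)
        dr = u_of_t(t)
        e += dt * de
        r += dt * dr
    return e, r


# Hypothetical numbers: 48 V bus, impedance chosen so the buffer draws 100 W,
# output power steps from 100 W to 150 W at t = 1 s, impedance control held at zero.
v_bus = 48.0
r_init = v_bus ** 2 / 100.0
e_final, r_final = simulate_buffer(
    e0=500.0,
    r0=r_init,
    v=v_bus,
    p_of_t=lambda t: 100.0 if t < 1.0 else 150.0,
    u_of_t=lambda t: 0.0,
)
print(e_final, r_final)
```

Over the last second the buffer delivers 50 W more than it draws, so the stored energy ends about 50 J below its initial 500 J.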
![Page 8](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/8.jpg)
Define coupled performance indices:
$$J_i = \int_0^\infty \Big( \sum_{j \in N_i} x_j^T Q_{ij}\, x_j + \rho_i u_i^2 \Big)\, dt, \qquad i = M+1, \dots, M+N$$
$$\dot x_i = A_{ii}\, x_i + B_i u_i + D_i w_i + \sum_{j = M+1,\; j \neq i}^{M+N} A_{ij}\, x_j, \qquad i = M+1, \dots, M+N$$
with the state of each active load $x_i = [e_i,\ r_i,\ p_i]^T$.
Solve for the bus voltage to get the coupled agent dynamics (including the coupling terms).
Define a communication graph: a sparse, efficient topology. Optimal design provides resilience and disturbance rejection.
Vahid Nasirian, Reza Modares, Dr. Ali Davoudi
Linearize. Add $p_i$ as a state. Formulate as an H-infinity problem.
![Page 9](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/9.jpg)
Optimal Cooperative Control as a Dynamic Game
Minimize the performance function for active loads:
$$J_i = \int_0^\infty \Big( \sum_{j \in N_i} x_j^T Q_{ij}\, x_j + \rho_i u_i^2 \Big)\, dt$$
Define the neighborhood state vector as $\bar x_i = \big[x_i^T,\ \{x_j^T\}_{j \in N_i}\big]^T$.
The optimal solution has the general form $u_i = -k_i \bar x_i$.
With such solutions, the performance function $J_i$ is quadratic in $\bar x_i$:
$$J_i(\bar x_i) = \bar x_i^T P_i\, \bar x_i$$
which helps to find the optimal solution by solving an algebraic Riccati equation:
$$u_i^* = -\rho_i^{-1}\, \bar B_{ii}^T P_i\, \bar x_i$$
Graphical Game
![Page 10](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/10.jpg)
Optimal Cooperative Control: Policy Iteration finds Optimal Solutions
• Substituting the optimal solution into the Bellman equations leads to the following coupled algebraic Riccati equations (AREs).
• Policy iteration (a class of reinforcement learning methods) is used to solve the AREs and find $P_i$ and the optimal control inputs.
• Policy evaluation: the performance of a given control policy $u_i$ is evaluated using the Bellman equation, and $P_i$ is found.
• Policy improvement: an improved control policy $u_i$ is found for each agent, using the $P_i$ found in the first step.
• Policy evaluation and improvement are repeated until no improvement is observed in the control policy $u_i$ of any agent.
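$$H_i = \bar x_i^T \bar Q_i\, \bar x_i + \rho_i (u_i^*)^2 + \bar x_i^T P_i \big(\bar A_i \bar x_i + \bar B_i \bar u_i^* + \bar D_i w_i\big) + \big(\bar A_i \bar x_i + \bar B_i \bar u_i^* + \bar D_i w_i\big)^T P_i\, \bar x_i = 0$$
where $\bar u_i^* = \big[u_i^*,\ \{u_j^*\}_{j \in N_i}\big]^T$ and
$$u_i^* = -\rho_i^{-1}\, \bar B_{ii}^T P_i\, \bar x_i$$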
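The policy-iteration loop described on this slide can be illustrated on the simplest case: one agent, linear dynamics $\dot x = Ax + Bu$, and cost $\int (x^TQx + u^TRu)\,dt$. This is Kleinman's model-based iteration, not the coupled graphical-game version on the slide. Policy evaluation solves a Lyapunov (Bellman) equation for $P$, and policy improvement sets $u = -R^{-1}B^TPx$. The matrices are hypothetical; $A$ is chosen stable so that $K_0 = 0$ is an admissible starting policy, and the Lyapunov solve uses the Kronecker vectorization identity so that only NumPy is needed.

```python
import numpy as np


def solve_lyapunov(Ac, M):
    """Solve Ac^T P + P Ac + M = 0 for symmetric P (NumPy only, small n)."""
    n = Ac.shape[0]
    I = np.eye(n)
    # Column-major vec: vec(Ac^T P) = (I kron Ac^T) vec(P), vec(P Ac) = (Ac^T kron I) vec(P)
    L = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    p = np.linalg.solve(L, -M.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # remove round-off asymmetry


def policy_iteration(A, B, Q, R, K0, iters=20):
    """Kleinman policy iteration for the ARE A^T P + P A - P B R^-1 B^T P + Q = 0."""
    K = K0
    for _ in range(iters):
        Ac = A - B @ K                           # closed loop under u = -K x
        P = solve_lyapunov(Ac, Q + K.T @ R @ K)  # policy evaluation (Bellman equation)
        K = np.linalg.solve(R, B.T @ P)          # policy improvement u = -R^-1 B^T P x
    return P, K


A = np.array([[0.0, 1.0], [-1.0, -2.0]])  # hypothetical, open-loop stable so K0 = 0 is admissible
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P, K = policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print(P)
print(np.abs(residual).max())
```

Each iteration only needs the previous gain, and the Riccati residual shrinks quadratically; the coupled multi-agent version repeats the same two steps for every agent with its neighbors' policies held fixed.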
![Page 11](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/11.jpg)
(a) DC microgrid system, (b) active load, (c) communication network
Controller Implementation
Microgrid Setup and Cooperative Controller
![Page 12](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/12.jpg)
Controller Performance with Load Change
(a) Microgrid bus voltages at the load terminals, (b) output voltage of the power buffers, (c) output voltage across the resistive loads, (d) source currents, (e) stored energies in the power buffers, (f) input impedance of the power buffers, (g) output of the active loads, (h) energy-impedance trajectory of the power buffers during the load transient.
Load change in bus 5, with buffers 4 and 5 assisting; load change in bus 4, with multiple assistive buffers.
![Page 13](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/13.jpg)
Intelligent Operational Control for Complex Industrial Processes
Professor Chai Tianyou
State Key Laboratory of Synthetical Automation for Process Industries
Northeastern University, May 20, 2013
Jinliang Ding
1. Jinliang Ding, H. Modares, Tianyou Chai, and F.L. Lewis, “Data-based Multi-objective Plant-wide Performance Optimization of Industrial Processes under Dynamic Environments,” IEEE Trans. Industrial Informatics, vol. 12, no. 2, pp. 454-465, April 2016.
2. Xinglong Lu, B. Kiumarsi, Tianyou Chai, and F.L. Lewis, “Data-driven Optimal Control of Operational Indices for a Class of Industrial Processes,” IET Control Theory & Applications, vol. 10, no. 12, pp. 1348-1356, 2016.
![Page 14](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/14.jpg)
Manufacturing as the Interactions of Multiple Agents
• Each machine has its own dynamics and cost function
• Neighboring machines influence each other most strongly
• There are local optimization requirements as well as global necessities
![Page 15](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/15.jpg)
Production line for mineral processing plant
Mineral Processing Plant in Gansu China
![Page 16](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/16.jpg)
Existing Manual Control for Plant production indices, unit operational indices, and unit process control for a production line
![Page 17](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/17.jpg)
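[Figure: Two-loop operational-index control structure, showing the production indices $Q_k(mT)$ with estimates $\hat Q_k(t)$, desired values $Q_k^*$ and bounds $Q_k^{\min}$, $Q_k^{\max}$, and the operational indices $r_{i,j}(mT)$, $i = 1, \dots, n$, $j = 1, 2, 3$, with setpoints $r_{i,j}^*$ and estimates $\hat r(t)$.]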
Automated online reinforcement learning for determining operational indices
Implemented by Jinliang Ding and Chai Tianyou’s group in the biggest mineral processing factory for hematite iron ore in China, in Gansu Province.
Savings of 30.75 million RMB per year were realized by implementing this automated optimization procedure instead of the standard industry practice of human operators selecting the process operational indices.
Two RL loops and value function approximation
![Page 18](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/18.jpg)
Xinglong Lu, B. Kiumarsi, Tianyou Chai, and F.L. Lewis, “Data-driven Optimal Control of Operational Indices for a Class of Industrial Processes,” IET Control Theory & Applications, vol. 10, no. 12, pp. 1348-1356, 2016.
Yi Jiang, Jialu Fan, Tianyou Chai, Jinna Li, and F.L. Lewis, “Data-Driven Flotation Industrial Process Operational Optimal Control Based on Reinforcement Learning,” IEEE Trans. Industrial Informatics, to appear, 2018.
Jinna Li, Tianyou Chai, F.L. Lewis, Jialu Fan, Zhangtao Ding, and Jinliang Ding, “Off-policy Q-learning: set-point design for optimizing dual-rate rougher flotation operational processes,” IEEE Trans. Industrial Electronics, vol. 65, no. 5, pp. 4092-4102, May 2018.
Jinna Li, Bahare Kiumarsi, Tianyou Chai, F.L. Lewis, and Jialu Fan, “Off-Policy Reinforcement Learning: Optimal Operational Control for Two-Time-Scale Industrial Processes,” IEEE Trans. Cybernetics, vol. 47, no. 12, pp. 4547-4558, Dec. 2017.
Jinliang Ding, H. Modares, Tianyou Chai, and F.L. Lewis, “Data-based Multi-objective Plant-wide Performance Optimization of Industrial Processes under Dynamic Environments,” IEEE Trans. Industrial Informatics, vol. 12, no. 2, pp. 454-465, April 2016.
![Page 19](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/19.jpg)
Control of Non-affine Aerial Systems Using Off-policy Reinforcement Learning
![Page 20](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/20.jpg)
![Page 21](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/21.jpg)
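Non-affine nonlinear aerial vehicle model
UAV dynamics:
$$\dot X(t) = f(X(t)) + g(X(t))\,L(u) + D\,w(t)$$
with
$$
\begin{aligned}
\dot x_1 &= V\cos\gamma\cos\psi + d_1 w_1 \\
\dot x_2 &= V\cos\gamma\sin\psi + d_2 w_2 \\
\dot x_3 &= -V\sin\gamma + d_3 w_3 \\
\dot\gamma &= \frac{g}{V}\,(n\cos\phi - \cos\gamma) \\
\dot\psi &= \frac{g\,n\sin\phi}{V\cos\gamma}
\end{aligned}
$$
where $x_1, x_2, x_3$ are the position coordinates, $V$ is the speed, $\gamma$ the flight-path angle, $\psi$ the heading angle, $n$ the load factor, and $\phi$ the bank angle. The speed equation $\dot V$ contains a drag term in $V^2$, the gravity term $-g\sin\gamma$, and control-dependent terms involving the thrust (bounded by $T_{\max}$) and $n$.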
![Page 22](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/22.jpg)
Dynamics:
$$\dot X(t) = f(X(t)) + g(X(t))\,L(u) + D\,w(t)$$
where the state is
$$X = [x_1,\ x_2,\ x_3,\ V,\ \gamma,\ \psi]^T$$
(so $x_4 = V$, $x_5 = \gamma$, $x_6 = \psi$) and
$$
f(X) = \begin{bmatrix} x_4\cos x_5\cos x_6 \\ x_4\cos x_5\sin x_6 \\ -x_4\sin x_5 \\ -a\,x_4^2 - g\sin x_5 \\ -\dfrac{g\cos x_5}{x_4} \\ 0 \end{bmatrix},
\qquad
L(u) = \begin{bmatrix} u_1 \\ u_2^2 \\ u_2 \\ u_2\cos u_3 \\ u_2\sin u_3 \end{bmatrix}.
$$
The input matrix $g(X)$ is $6 \times 5$ with its first three rows equal to zero, so the control enters only the $V$, $\gamma$, and $\psi$ equations. Because $L(u)$ is nonlinear in $u$, the model is non-affine in the control.
![Page 23](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/23.jpg)
Optimal Control for Constrained Input Systems
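This is a quasi-norm:
$$\|u\|_q^2 = 2\int_0^u \phi^{-T}(v)\,R\,dv$$
It is weaker than a norm: the homogeneity property is replaced by the weaker symmetry property $\|-x\|_q = \|x\|_q$.
(Used by Lyshevsky for H2 control.)
The control is constrained by the saturation function $\phi(p) = \tanh(p)$, which takes values between $-1$ and $1$.
Encode the constraint into the value function:
$$J(x,u) = \int_0^\infty \Big( Q(x) + 2\int_0^u \phi^{-T}(v)\,R\,dv \Big)\, dt$$
Then
$$u = -\phi\Big(\tfrac{1}{2} R^{-1} g^T(x)\, \frac{\partial V}{\partial x}\Big)$$
is BOUNDED.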
Murad Abu-Khalaf
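A scalar sketch shows why this construction bounds the control. It assumes a single input, $\phi = \tanh$, and a scalar weight $R$, so $W(u) = 2\int_0^u R\,\tanh^{-1}(v)\,dv$. The code evaluates $W$ both by the trapezoid rule and in closed form, and checks that minimizing $W(u) + b\,u$ (where $b$ stands in for $g^T(x)\,\partial V/\partial x$) gives $u = -\tanh(b/2R)$, which never leaves $[-1, 1]$ however large $b$ is. The function names are hypothetical.

```python
import math


def W_numeric(u, R=1.0, steps=20000):
    """Trapezoid-rule value of W(u) = 2 * integral_0^u R * artanh(v) dv, for |u| < 1."""
    h = u / steps
    total = 0.0
    for k in range(steps):
        total += 0.5 * h * (math.atanh(k * h) + math.atanh((k + 1) * h))
    return 2.0 * R * total


def W_closed(u, R=1.0):
    """Closed form, using d/dv [v*artanh(v) + 0.5*ln(1 - v^2)] = artanh(v)."""
    return R * (2.0 * u * math.atanh(u) + math.log(1.0 - u * u))


def bounded_control(b, R=1.0):
    """Minimizer of W(u) + b*u: setting 2R*artanh(u) + b = 0 gives u = -tanh(b / (2R))."""
    return -math.tanh(b / (2.0 * R))


print(W_numeric(0.8), W_closed(0.8))
print(bounded_control(3.0), bounded_control(100.0))
```

Because $W$ is even in $u$ and grows without bound as $|u| \to 1$, the penalty itself keeps the optimal input inside the saturation limits; no separate clipping step is needed.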
![Page 24](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/24.jpg)
H-infinity Control Tracking Problem
UAV dynamics:
$$\dot X(t) = f(X(t)) + g(X(t))\,L(u) + D\,w(t)$$
Desired trajectory generator:
$$\dot X_d(t) = h_d(X_d)$$
Bounded L2 norm:
$$\int_t^\infty e^{-\alpha(\tau - t)} \|z(\tau)\|^2\, d\tau \le \gamma^2 \int_t^\infty e^{-\alpha(\tau - t)} \|w(\tau)\|^2\, d\tau$$
where
$$\|z(t)\|^2 = (X - X_d)^T Q\, (X - X_d) + W(L(u))$$
Constrained controls:
$$|u_1| \le \bar u_1, \qquad |u_2| \le \bar u_2$$
Formulate as an optimal control problem:
$$J(X) = \int_t^\infty e^{-\alpha(\tau - t)} \Big[ (X - X_d)^T Q\, (X - X_d) + W(L(u)) - \gamma^2 w^T w \Big]\, d\tau$$
![Page 25](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/25.jpg)
Write Augmented System and Leader Dynamics
Tracking error:
$$e(t) = X(t) - X_d(t)$$
Augmented state:
$$Z(t) = \begin{bmatrix} e(t) \\ X_d(t) \end{bmatrix}$$
Augmented tracking dynamics:
$$\dot Z(t) = \begin{bmatrix} f(e + X_d) - h_d(X_d) \\ h_d(X_d) \end{bmatrix} + \begin{bmatrix} g(e + X_d) \\ 0 \end{bmatrix} L(u) + \begin{bmatrix} D \\ 0 \end{bmatrix} w(t) \equiv F(Z(t)) + G(Z(t))\,L(u) + K\,w(t)$$
Performance index:
$$J(L(u), w) = \int_t^\infty e^{-\alpha(\tau - t)} \Big[ Z^T Q_1 Z + W(L(u)) - \gamma^2 w^T w \Big]\, d\tau, \qquad \text{with } Q_1 = \begin{bmatrix} Q & 0 \\ 0 & 0 \end{bmatrix}$$
![Page 26](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/26.jpg)
Optimal H-inf Tracker
Bellman equation:
$$Z^T Q_1 Z + W(L(u)) - \gamma^2 w^T w - \alpha V(Z) + \dot V(Z) = 0$$
so that
$$H(Z, L(u), w, V_Z) = Z^T Q_1 Z + W(L(u)) - \gamma^2 w^T w - \alpha V(Z) + V_Z^T \big(F(Z) + G(Z)\,L(u) + K w\big) = 0$$
Stationarity conditions give the optimal control and the worst-case disturbance:
$$L(u^*) = \arg\min_{L(u)} H(Z, L(u), w^*, V_Z^*), \qquad w^* = \arg\max_{w} H(Z, L(u^*), w, V_Z^*)$$
$$L(u^*) = -\tanh\big(\tfrac{1}{2} R^{-1} G^T(Z)\, V_Z^*\big), \qquad w^* = \frac{1}{2\gamma^2} K^T V_Z^*$$
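The min-max structure of the stationarity conditions can be seen in a linear-quadratic analogue. This sketch assumes linear dynamics $\dot x = Ax + Bu + Dw$, no discounting, no tanh saturation, and stage cost $x^TQx + u^TRu - \gamma^2 w^Tw$; here $D$ plays the role of the disturbance input matrix $K$ on the slide, and $K$ in the code is the control gain. With $V = x^TPx$, the worst-case disturbance $w^* = \frac{1}{2\gamma^2}D^T V_x$ becomes $w = \gamma^{-2}D^TPx$. Both policies are evaluated with a Lyapunov equation and improved simultaneously. The matrices and $\gamma$ are hypothetical, and the simultaneous update converges for this mild example (large $\gamma$ relative to $D$) but is not guaranteed in general.

```python
import numpy as np


def lyap(Ac, M):
    """Solve Ac^T P + P Ac + M = 0 via the Kronecker (column-major vec) identity."""
    n = Ac.shape[0]
    I = np.eye(n)
    p = np.linalg.solve(np.kron(I, Ac.T) + np.kron(Ac.T, I), -M.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)


def zero_sum_pi(A, B, D, Q, R, gamma, iters=30):
    """Policy iteration for the LQ zero-sum game: u = -K x minimizes, w = L x maximizes."""
    n = A.shape[0]
    K = np.zeros((B.shape[1], n))
    L = np.zeros((D.shape[1], n))
    for _ in range(iters):
        Ac = A - B @ K + D @ L
        # Evaluate both current policies with stage cost x^T Q x + u^T R u - gamma^2 w^T w
        P = lyap(Ac, Q + K.T @ R @ K - gamma**2 * L.T @ L)
        K = np.linalg.solve(R, B.T @ P)  # control: minimizer of the Hamiltonian
        L = (D.T @ P) / gamma**2         # disturbance: maximizer (worst case)
    return P, K, L


A = np.array([[0.0, 1.0], [-1.0, -2.0]])  # hypothetical plant (open-loop stable)
B = np.array([[0.0], [1.0]])
D = np.array([[0.0], [0.5]])              # hypothetical disturbance input matrix
Q, R, gamma = np.eye(2), np.array([[1.0]]), 2.0
P, K, L = zero_sum_pi(A, B, D, Q, R, gamma)
gare = (A.T @ P + P @ A + Q
        - P @ B @ np.linalg.solve(R, B.T @ P)
        + (P @ D @ D.T @ P) / gamma**2)
print(P)
print(np.abs(gare).max())
```

The converged $P$ satisfies the game Riccati equation, whose extra $+\gamma^{-2}PDD^TP$ term is the linear counterpart of the $-\gamma^2 w^Tw$ penalty in the tracker's Bellman equation.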
![Page 27](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/27.jpg)
Assume $L(u)$ is invertible. Then
$$u^* = L^{-1}\Big(-\tanh\big(\tfrac{1}{2} R^{-1} G^T(Z)\, V_Z^*\big)\Big)$$
![Page 28](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/28.jpg)
Reinforcement Learning Policy Iteration Solution
Need to know input matrices G and K
$$\dot Z(t) = \begin{bmatrix} f(e + X_d) - h_d(X_d) \\ h_d(X_d) \end{bmatrix} + \begin{bmatrix} g(e + X_d) \\ 0 \end{bmatrix} L(u) + \begin{bmatrix} D \\ 0 \end{bmatrix} w(t) \equiv F(Z(t)) + G(Z(t))\,L(u) + K\,w(t)$$
![Page 29](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/29.jpg)
Off-Policy IRL Solution
$$\dot Z(t) = F(Z) + G(Z)\,L(u_j) + K w_j + G(Z)\big(L(u) - L(u_j)\big) + K\,(w - w_j)$$
Does not need any knowledge of the UAV or leader dynamics.
Off-policy Bellman equation
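A sketch of how an off-policy Bellman equation of this kind can be obtained, under stated assumptions: the discounted cost of the tracking problem, the saturation term $W(L(u)) = 2\int_0^{L(u)} \tanh^{-T}(v)\,R\,dv$ with symmetric $R$, policy evaluation $Z^TQ_1Z + W(L(u_j)) - \gamma^2 w_j^Tw_j - \alpha V_j + \nabla V_j^T\big(F + G\,L(u_j) + K w_j\big) = 0$ for the target policies $(u_j, w_j)$, and improvements $L(u_{j+1}) = -\tanh\big(\tfrac12 R^{-1}G^T\nabla V_j\big)$, $w_{j+1} = \tfrac{1}{2\gamma^2}K^T\nabla V_j$. Differentiating $e^{-\alpha(\tau - t)}V_j(Z(\tau))$ along the off-policy dynamics above and integrating over an interval $[t, t+\Delta t]$ gives the following; it illustrates the idea and is not necessarily the exact form used in the talk.

$$
\begin{aligned}
e^{-\alpha \Delta t}\, V_j\big(Z(t+\Delta t)\big) - V_j\big(Z(t)\big)
= \int_t^{t+\Delta t} e^{-\alpha(\tau - t)} \Big[ &-Z^T Q_1 Z - W\big(L(u_j)\big) + \gamma^2 w_j^T w_j \\
&- 2\,\tanh^{-1}\!\big(L(u_{j+1})\big)^T R\,\big(L(u) - L(u_j)\big) \\
&+ 2\gamma^2\, w_{j+1}^T (w - w_j) \Big]\, d\tau
\end{aligned}
$$

The drift $F$ and the input matrices $G$ and $K$ have dropped out: only the measured signals $Z$, $u$, and (assuming it is measured) $w$ appear, together with the unknown $V_j$, $u_{j+1}$, and $w_{j+1}$.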
![Page 30](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/30.jpg)
Data‐Driven Real‐Time Solution Using VFA
Approximate critic, control, disturbance
Plug into Off‐Policy Bellman Equation to get algebraic equations for the weights
![Page 31](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/31.jpg)
RL for Human-Robot Interaction (HRI)
1. H. Modares, I. Ranatunga, F.L. Lewis, and D.O. Popa, “Optimized Assistive Human-robot Interaction using Reinforcement Learning,” IEEE Transactions on Cybernetics, vol. 46, no. 3, pp. 655-667, 2016.
2. I. Ranatunga, F.L. Lewis, D.O. Popa, and S.M. Tousif, “Adaptive Admittance Control for Human-Robot Interaction Using Model Reference Design and Adaptive Inverse Filtering,” IEEE Transactions on Control Systems Technology, vol. 25, no. 1, pp. 278-285, Jan. 2017.
3. B. AlQaudi, H. Modares, I. Ranatunga, S.M. Tousif, F.L. Lewis, and D.O. Popa, “Model reference adaptive impedance control for physical human robot interaction,” Control Theory and Technology, vol. 14, no. 1, pp. 1-15, Feb. 2016.
![Page 32](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/32.jpg)
PR2 meets Isura
![Page 33](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/33.jpg)
Robot dynamics
Prescribed Error system
Control torque depends on impedance model parameters
Impedance Control
![Page 34](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/34.jpg)
Standard Robot Trajectory Tracking Controller
Where is the human?
![Page 35](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/35.jpg)
Human task learning has 2 components:
1. Human learns a robot dynamics model to compensate for robot nonlinearities
2. Human learns a task model to properly perform a task
Inner Robot-Specific Control Loop: INDEPENDENT OF TASK
Outer Task-Specific Control Loop: INDEPENDENT OF ROBOT DETAILS
Human Performance Factors Studies
![Page 36](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/36.jpg)
Robot control inner loop
Task control outer loop
RL for Human‐Robot Interactions
![Page 37](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/37.jpg)
• No task trajectory information is used in this inner-loop robot controller
• The inner-loop robot controller makes the model-following error small
• The admittance model parameters are not needed
• Only the admittance model trajectories $x_m, \dot x_m, \ddot x_m$ are needed
New Inner Robot Control Loop
![Page 38](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/38.jpg)
Three Outer-Loop Designs (to appear, 2016)
![Page 39](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/39.jpg)
2C. Outer-Loop Task-Specific Design #3
Reinforcement Learning for minimum human effort
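[Figure: Outer-loop block diagram with the human model, human force amplifier $K_h$, feedforward assistive control term $l(\cdot)$, and prescribed impedance model $(Ms^2 + Bs + K)^{-1}$; signals include the desired trajectory $x_d$, model trajectory $x_m$, tracking error $e_d$, and human force $f_h$.]
Find the robot impedance model parameters $M, B, K$ to minimize the human force effort $f_h$ and the task trajectory following error $e_d$.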
Work of Reza Modares
The force exerted by the human indicates his discontent: a measure of human intent.
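The impedance (admittance) model is what turns the measured human force into the model trajectories $x_m, \dot x_m, \ddot x_m$ that the inner loop tracks. A one-degree-of-freedom sketch of that step, assuming the second-order model $M\ddot x_m + B\dot x_m + Kx_m = f_h$ with hypothetical values $M = 1$, $B = 4$, $K = 4$ and semi-implicit Euler integration, is below; it omits the human force amplifier and the feedforward term, and the function name is hypothetical.

```python
def admittance_trajectory(f_h, M=1.0, B=4.0, K=4.0, dt=1e-3, t_end=10.0):
    """Integrate M*x'' + B*x' + K*x = f_h(t) from rest (1-DOF, semi-implicit Euler).
    Returns lists of x_m, x_m', x_m'' sampled every dt."""
    x, v = 0.0, 0.0
    xs, vs, accs = [], [], []
    for k in range(int(round(t_end / dt))):
        a = (f_h(k * dt) - B * v - K * x) / M  # model acceleration from the impedance law
        v += dt * a                            # update velocity first (semi-implicit Euler)
        x += dt * v                            # then position, using the new velocity
        xs.append(x)
        vs.append(v)
        accs.append(a)
    return xs, vs, accs


# Hypothetical constant human force of 2 N; with K = 4 the model settles at x_m = f_h / K = 0.5.
xs, vs, accs = admittance_trajectory(lambda t: 2.0)
print(xs[-1], vs[-1])
```

Changing $M$, $B$, $K$ reshapes how compliant the robot feels to the human, which is exactly the knob the outer-loop design tunes to reduce human effort.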
![Page 40](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/40.jpg)
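[Figure: Feedback linearization loop, robot impedance model, and unknown human model.]
Human model:
$$(K_d s + K_p)\, f_h = k_e e_d \quad\Longleftrightarrow\quad K_d \dot f_h + K_p f_h = k_e e_d$$
so that
$$\dot f_h = -K_d^{-1} K_p f_h + K_d^{-1} k_e e_d \equiv A_h f_h + E_h e_d$$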
![Page 41](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/41.jpg)
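Minimize human effort and tracking error.
Performance index:
$$J = \int_t^\infty \big( e_d^T Q_d\, e_d + f_h^T Q_h\, f_h + u_e^T R\, u_e \big)\, d\tau$$
Then the control is
$$u_e = K_1 e_d + K_2 f_h$$
and the performance index can be written as
$$J = \int_t^\infty \big( X^T Q\, X + u_e^T R\, u_e \big)\, d\tau$$
Overall augmented dynamics, with
$$e_d = x_d - x_m \in \mathbb{R}^n, \qquad \bar e_d = \big[e_d^T\ \ \dot e_d^T\big]^T = \bar x_d - \bar x_m \in \mathbb{R}^{2n}$$
Augmented Tracker Dynamics with Human and Tracking Error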
![Page 42](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/42.jpg)
We want an online method to learn the optimal control without knowing the system matrix A.
Optimal Design Always Admits Reinforcement Learning for Real-time Optimal Adaptive Control
Optimal control is an offline method, based on solving the ARE and knowing all the plant dynamics.
![Page 43](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/43.jpg)
Take enough data along the system trajectory to solve this equation using least squares.
OFF-POLICY reinforcement learning needs NO knowledge of the system dynamics.
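To illustrate "take enough data and solve by least squares", here is a sketch of off-policy integral RL for a linear plant $\dot x = Ax + Bu$ with cost $\int (x^TQx + u^TRu)\,dt$. For each target gain $K_k$ the identity $x^TP_kx\big|_t^{t+T} = -\int_t^{t+T} x^T(Q + K_k^TRK_k)x\,d\tau + 2\int_t^{t+T} (u + K_kx)^T R K_{k+1} x\,d\tau$ is linear in the unknowns $P_k$ and $K_{k+1}$, so one stored batch of data serves every iteration. Assumptions: two states and one input; hypothetical $A$ and $B$ used only inside the simulator that generates the data; a fixed sum-of-sinusoids probing input as the behavior policy; forward-Euler integration. This shows the general technique, not necessarily the exact HRI implementation in the talk.

```python
import numpy as np


def phi(x):
    # Quadratic basis: x^T P x = [p11, 2*p12, p22] . phi(x)
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])


def collect_data(A, B, T=0.05, intervals=400, dt=1e-3):
    """Apply ONE exploratory behavior input and store per-interval integrals.
    A and B are used only to simulate the plant; the learning step never reads them."""
    freqs = np.array([0.3, 0.7, 1.1, 1.9, 2.9, 4.1, 5.3, 7.7, 11.0, 13.0])  # arbitrary probing frequencies
    steps = int(round(T / dt))
    x, t, data = np.array([1.0, -1.0]), 0.0, []
    for _ in range(intervals):
        phi0, Ixx, Iux = phi(x), np.zeros((2, 2)), np.zeros(2)
        for _ in range(steps):
            u = np.array([np.sin(freqs * t).sum()])  # behavior policy: pure probing signal
            Ixx += dt * np.outer(x, x)               # accumulates the integral of x x^T
            Iux += dt * u[0] * x                     # accumulates the integral of u x
            x = x + dt * (A @ x + B @ u)             # Euler step of the plant
            t += dt
        data.append((phi(x) - phi0, Ixx, Iux))
    return data


def off_policy_pi(data, Q, R, K0, iters=10):
    """Off-policy policy iteration: the same data set evaluates every target policy u = -K x."""
    K, r = K0.copy(), float(R[0, 0])
    for _ in range(iters):
        Qk = Q + K.T @ R @ K
        rows, y = [], []
        for dphi, Ixx, Iux in data:
            Iv = Iux + Ixx @ K[0]                      # integral of (u + K x) x
            rows.append(np.concatenate([dphi, -2.0 * r * Iv]))
            y.append(-np.sum(Qk * Ixx))                # minus the integral of x^T Qk x
        sol, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
        P = np.array([[sol[0], sol[1] / 2.0], [sol[1] / 2.0, sol[2]]])
        K = sol[3:].reshape(1, 2)                      # next gain K_{k+1} = R^-1 B^T P_k
    return P, K


A = np.array([[0.0, 1.0], [-1.0, -2.0]])  # hypothetical plant, stable so K0 = 0 is admissible
B = np.array([[0.0], [1.0]])
data = collect_data(A, B)
P, K = off_policy_pi(data, Q=np.eye(2), R=np.array([[1.0]]), K0=np.zeros((1, 2)))
print(P)
print(K)
```

With the Euler discretization used here, the learned $P$ and $K$ match the model-based Riccati solution only to within the $O(dt)$ integration error; the learner itself never touches $A$ or $B$.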
![Page 44](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/44.jpg)
![Page 45](https://reader035.vdocuments.us/reader035/viewer/2022071423/611dcaa6f808262e5d29bbb6/html5/thumbnails/45.jpg)