TRANSCRIPT
[Slide 1]
Deep Reinforcement Learning Based Maintenance Free Control
Nathan Lawrence, UBC Mathematics
Collaborative R&D between UBC (Bhushan Gopaluni, Philip Loewen, Greg Stewart) and Honeywell (Michael Forbes, Johan Backstrom)
[Diagram: Reinforcement Learning Agent connected to the Control block (e.g. PID or MPC)]
[Slide 2]
Overview
• Deep Reinforcement Learning as a (model-free) framework for control in industrial settings
➡ Utilize new and historical data
➡ Control with minimal disruptions to the plant and minimal human intervention
Prior Art Tuning Techniques
(1) Prior art: semi-manual tuning
(2) Control tuning with DRL
(3) Replace controller with DRL
[Diagram: three control-loop configurations (actuators/sensors), each split into offline and online components: a controller (e.g. PID or MPC), a deep neural network, and a reinforcement learning agent]
[Slide 3]
Initial work
• Deep RL for set-point tracking
➡ Actor = controller = deep neural network
• Tracking performance and adaptability are promising, but there are issues with sample efficiency, stability, interpretability, compatibility, and tuning of hyper-parameters
[Diagram: DDPG-style training loop: the Actor applies a_t to the Process and observes y_{t+1} (y_t fed back through a delay z⁻¹); transitions ⟨s_t, a_t, r_t, s_{t+1}⟩ are stored in Replay Memory, which drives the actor, critic, and target-network updates]
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
Spielberg S, Tulsyan A, Lawrence NP, Loewen PD, Bhushan Gopaluni R. Toward self-driving processes: A deep reinforcement learning approach to control. AIChE J. 2019;e16689.
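The training loop above can be sketched through its two generic off-policy ingredients: a replay memory holding ⟨s_t, a_t, r_t, s_{t+1}⟩ transitions, and a soft (Polyak-averaged) target-network update. This is an illustrative sketch, not the authors' implementation; class names, capacities, and the update rate τ are assumed:

```python
import random
from collections import deque

import numpy as np

class ReplayMemory:
    """Stores <s_t, a_t, r_t, s_{t+1}> transitions for off-policy updates."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform random minibatch, stacked into arrays for batched updates.
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s_next = map(np.array, zip(*batch))
        return s, a, r, s_next

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging for the target network: theta' <- tau*theta + (1-tau)*theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]
```

The slowly moving target network is what stabilizes the critic's bootstrapped value estimates in DDPG-style algorithms.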
[Slide 4]
Back to basics
• PID fits naturally in the actor-critic framework
• Simple, industrially accepted control structure with straightforward initialization
Lawrence NP, Stewart GE, Loewen PD, Forbes MG, Backstrom JU, Bhushan Gopaluni R. Optimal PID and antiwindup control design as a reinforcement learning problem. IFAC World Congress, submitted.
u(t) = k_p e(t) + k_i \int_0^t e(\tau)\, d\tau + k_d \frac{d}{dt} e(t)
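A minimal discrete-time version of this PID law, assuming a fixed sampling period dt, rectangle-rule integration, and a backward-difference derivative (class and parameter names are illustrative):

```python
class PID:
    """Discrete-time PID: u_k = kp*e_k + ki*sum(e_j)*dt + kd*(e_k - e_{k-1})/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        e = setpoint - measurement
        self.integral += e * self.dt               # rectangle-rule integral of e
        derivative = (e - self.prev_error) / self.dt  # backward difference
        self.prev_error = e
        return self.kp * e + self.ki * self.integral + self.kd * derivative
```

Viewed this way, the three gains (k_p, k_i, k_d) are a small, interpretable parameter vector, which is what lets the RL agent treat controller tuning as policy learning.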
[Slide 5]
Anti-windup
• Integral action with actuator constraints can lead to integral windup
• PID + AW is the (nonlinear) actor
[Diagram: two networks taking s_t as input: the actor (controller) outputs the PID gains k_p, k_i, k_d, ρ, which produce a_t; the critic (performance) takes (s_t, a_t) and outputs Q_t]
a_t = \mathrm{sat}(k_p e + k_i I_y + k_d D + \rho I_u)
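One way to realize this PID + anti-windup actor is with back-calculation anti-windup, where I_u accumulates the saturation excess (the gap between the saturated and raw control signals) weighted by the learnable gain ρ. The sketch below assumes symmetric actuator limits of ±1 and a fixed sampling period dt; all names are illustrative, not the authors' code:

```python
import numpy as np

def saturate(u, u_min=-1.0, u_max=1.0):
    """Actuator limits; assumed symmetric here for illustration."""
    return float(np.clip(u, u_min, u_max))

class PIDAntiWindupActor:
    """PID with back-calculation anti-windup, viewed as an RL actor.
    The learnable policy parameters are (kp, ki, kd, rho)."""
    def __init__(self, kp, ki, kd, rho, dt):
        self.kp, self.ki, self.kd, self.rho, self.dt = kp, ki, kd, rho, dt
        self.Iy = 0.0      # I_y: integral of the tracking error
        self.Iu = 0.0      # I_u: integral of the saturation excess (anti-windup)
        self.prev_e = 0.0

    def act(self, setpoint, measurement):
        e = setpoint - measurement
        self.Iy += e * self.dt
        D = (e - self.prev_e) / self.dt
        self.prev_e = e
        u = self.kp * e + self.ki * self.Iy + self.kd * D + self.rho * self.Iu
        a = saturate(u)
        # Back-calculation: feed the amount clipped off by the actuator
        # back into I_u so the integral term stops winding up.
        self.Iu += (a - u) * self.dt
        return a
```

When the actuator is unsaturated, a = u and I_u stays at zero, so the actor reduces to plain PID; only when the limits bite does the ρ·I_u term bleed off the accumulated integral.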