![Page 1: Reinforcement Learning for CPS Safety Engineeringkoclab.cs.ucsb.edu/cpsed/files/Green1.pdf · Reinforcement Learning for CPS Safety Engineering Sam Green, ÇetinKaya Koç, JieliangLuo](https://reader034.vdocuments.us/reader034/viewer/2022051911/6000f640903bbf6dc81473f9/html5/thumbnails/1.jpg)
Reinforcement Learning for CPS Safety Engineering
Sam Green, Çetin Kaya Koç, Jieliang Luo
University of California, Santa Barbara
Motivations
Safety-critical duties desired by CPS?
• Autonomous vehicle control: UAV, passenger vehicles, delivery trucks
• Automatically responding to, or preventing, damage
• Industrial robot control for use around humans
• Large process automation
  • E.g., optimization of factory
Reinforcement Learning
Georgia Tech, https://www.youtube.com/watch?v=f2at-cqaJMM
DeepMind, https://arxiv.org/abs/1707.02286
Machine Learning
Supervised, Unsupervised, Reinforcement
Introduction to RL
• A computational approach to learning from interaction
• Established in the 1980s
• Objective is to take actions to maximize a reward (or minimize a cost)
• Seen as a path toward Artificial General Intelligence
• RL is at the intersection between
  • Psychology
  • Control Theory
  • Computer Science/AI
• Resurgence with advent of deep learning methods
Advances in RL since 2015
[Figure: eight example results, five labeled 2015 and three labeled 2016]
[Mnih et al. Asynchronous Methods for Deep Reinforcement Learning, 2016]
Terminology
• Agent – The thing we are learning to control
• Environment – All the factors affecting the agent
• Action – Performed by the agent in an attempt to effect change on the environment
• Reward – Returned by the environment to the agent after the agent takes an action. Used to help the agent learn.
  • AKA the negative cost
[R. Sutton and A. Barto. Reinforcement Learning: An Introduction. 2016]
Markov Decision Process
• What RL solves
  • Environments where the agent's decisions depend only on the present state
    • An object in flight
    • Self-driving car
    • Manufacturing process
    • Robot control
• It's not that the past doesn't matter, but the laws of physics guarantee certain things, e.g. momentum
• Methods also exist to solve approximate MDPs
Example: Student Markov Chain
[http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf]
Start here at the beginning of each episode
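A minimal sketch of sampling episodes from a Markov chain like the one in the figure. The state names and transition probabilities are not on the slide; they are recalled from the cited lecture notes and should be treated as illustrative assumptions, as should the choice of "Class 1" as the start state. `TRANSITIONS` and `sample_episode` are hypothetical names.

```python
import random

# Assumed transition table (state -> list of (next_state, probability)).
# Values are illustrative; check them against the cited lecture notes.
TRANSITIONS = {
    "Class 1": [("Class 2", 0.5), ("Facebook", 0.5)],
    "Class 2": [("Class 3", 0.8), ("Sleep", 0.2)],
    "Class 3": [("Pass", 0.6), ("Pub", 0.4)],
    "Pass": [("Sleep", 1.0)],
    "Pub": [("Class 1", 0.2), ("Class 2", 0.4), ("Class 3", 0.4)],
    "Facebook": [("Facebook", 0.9), ("Class 1", 0.1)],
    "Sleep": [],  # terminal state: the episode ends here
}


def sample_episode(start="Class 1", rng=None, max_steps=10_000):
    """Follow random transitions from `start` until a terminal state."""
    rng = rng or random.Random()
    state, episode = start, [start]
    for _ in range(max_steps):
        options = TRANSITIONS[state]
        if not options:  # no outgoing transitions means terminal
            break
        next_states, probs = zip(*options)
        # The next state depends only on the current state (Markov property).
        state = rng.choices(next_states, weights=probs, k=1)[0]
        episode.append(state)
    return episode


if __name__ == "__main__":
    print(" -> ".join(sample_episode(rng=random.Random(0))))
```

Each call produces one episode; running it many times gives the distribution of paths through the chain.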
RL for CPS Safety Engineering
• Interdisciplinary nature makes RL interesting for CPS engineering
  • AI, ML (Math, Statistics)
  • Mechanics design and simulation (ME, Physics, CS)
  • Programming and implementation (CS, EE)
Mountain Car Example
Canonical example: Mountain Car
• Agent is an underpowered car with 3 actions:
  • Backward, Neutral, Forward
• Reward := -1 per time step
  • Implicit goal := Reach the flag as fast as possible
• State := x-position and velocity
[R. Sutton and A. Barto. Reinforcement Learning: An Introduction. 2016]
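The state, actions, and reward above can be sketched as a tiny simulator. The dynamics constants (engine force 0.001, gravity term 0.0025·cos(3x), position range [-1.2, 0.5], velocity range [-0.07, 0.07]) are not on the slide; they are assumed here to follow the mountain car task in the cited textbook. `momentum_policy` is a hypothetical hand-written baseline used only to check that the simulator behaves sensibly.

```python
import math
import random

ACTIONS = {"backward": -1, "neutral": 0, "forward": 1}


def reset(rng):
    """Start near the valley floor with zero velocity (assumed start range)."""
    return (rng.uniform(-0.6, -0.4), 0.0)


def step(state, action):
    """Advance one time step; returns (next_state, reward, done)."""
    position, velocity = state
    # Engine push minus the pull of gravity along the slope.
    velocity += 0.001 * ACTIONS[action] - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))
    position += velocity
    if position <= -1.2:  # left wall: the car stops
        position, velocity = -1.2, 0.0
    done = position >= 0.5  # flag at the top of the right hill
    position = min(position, 0.5)
    return (position, velocity), -1.0, done  # reward is -1 per time step


def momentum_policy(state):
    """Hypothetical baseline: always push in the direction of travel."""
    _, velocity = state
    return "forward" if velocity >= 0 else "backward"


def run_episode(policy, rng, max_steps=1000):
    """Run one episode and return the total reward (minus the step count)."""
    state, total = reset(rng), 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    print(run_episode(momentum_policy, random.Random(0)))
```

Because every step costs -1, a larger (less negative) total reward means the flag was reached sooner, which is the implicit goal stated above.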
Model-Free Control via Policy-Based RL
• A simple physics model determines the behavior of car
  • Captures position of the car on the hill
  • Captures effect of limited engine power
• Using a physics model simplifies approach
  • Use an efficient traditional controller
• But in many scenarios the model is not available or too complex
  • Amazon package delivery drone
• Solve mountain car using sophisticated method as toy example
  • Directly train a neural network-based policy
RL Terminology and Notation
• $S_t$ – State of the environment at time $t$
  • x-axis position and velocity
• $A_t$ – Action taken by agent at time $t$
  • Backward, Neutral, Forward
• $\pi$ – The policy function; returns the next action to take. Stochastic in this example
• $\theta$ – A parameter vector for the policy; i.e. the weights learned in a neural network
Putting everything together: $A_t \sim \pi_\theta(A_t, S_t) = P(A_t \mid S_t, \theta)$
The policy $\pi_\theta$
• $\pi_\theta$ is often approximated
• Deep neural networks are powerful function approximators
• We will use gradient ascent to optimize the DNN
The policy function $\pi_\theta$, approximated by NN
• State information at time $t$:
  • Position and Velocity
• Action options at time $t$:
  • Forward acceleration
  • Neutral
  • Backward acceleration
[Figure: inputs Position and Velocity feed $\pi_\theta$, which outputs Prob(F), Prob(N), Prob(B)]
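A minimal sketch of such a policy network in plain Python: two inputs (position, velocity), one hidden layer, and a softmax over the three actions. The hidden-layer size, tanh activation, and weight initialization are assumptions made for illustration; the slides do not specify the architecture.

```python
import math
import random

ACTIONS = ("F", "N", "B")  # Forward, Neutral, Backward


def init_params(rng, hidden=8):
    """Random initial weights; theta is the collection of these values."""
    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W1": matrix(hidden, 2), "b1": [0.0] * hidden,
            "W2": matrix(3, hidden), "b2": [0.0] * 3}


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(params, state):
    """Return [Prob(F), Prob(N), Prob(B)] for state = (position, velocity)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
              for row, b in zip(params["W1"], params["b1"])]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(params["W2"], params["b2"])]
    return softmax(logits)


def sample_action(params, state, rng):
    """Stochastic policy: draw an action according to its probability."""
    return rng.choices(ACTIONS, weights=policy(params, state), k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_params(rng)
    print(policy(params, (-0.5, 0.0)), sample_action(params, (-0.5, 0.0), rng))
```

Sampling from the output probabilities, rather than always taking the most likely action, is what makes the policy stochastic.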
Reward function
• At every time step take an action
  • Forward, neutral, backward
• Each action has a reward of -1
• Train agent to reach the flag in minimum time steps
Example: Markov Reward Process
[http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf]
Start here at the beginning of each episode
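A Markov reward process attaches a reward to each state, so every episode has a total (optionally discounted) return. The sketch below computes that return for one episode. The per-state rewards and the discount factor are not on the slide; they are recalled from the cited lecture notes and are illustrative assumptions. `REWARDS` and `discounted_return` are hypothetical names.

```python
# Assumed reward received on leaving each state; illustrative values only.
REWARDS = {
    "Class 1": -2.0, "Class 2": -2.0, "Class 3": -2.0,
    "Pass": 10.0, "Pub": 1.0, "Facebook": -1.0, "Sleep": 0.0,
}


def discounted_return(episode, gamma):
    """Sum the rewards along an episode, discounting step k by gamma**k."""
    total, discount = 0.0, 1.0
    for state in episode:
        total += discount * REWARDS[state]
        discount *= gamma
    return total


if __name__ == "__main__":
    episode = ["Class 1", "Class 2", "Class 3", "Pass", "Sleep"]
    print(discounted_return(episode, gamma=0.5))  # -2.25
```

With `gamma=1.0` the same function gives the undiscounted sum, which is the quantity used in the mountain car example where every step costs -1.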
How to train the NN?
• Small networks can be effectively trained with genetic algorithms
• Genetic algorithms work poorly with large networks (parameter space is too large)
• Gradient-ascent optimization works with large parameter space
[Figure: inputs Position and Velocity feed $\pi_\theta$, which outputs Prob(F), Prob(N), Prob(B)]
Monte-Carlo Policy Gradient (REINFORCE)
• Find DNN parameter vector $\theta$ such that $\pi_\theta$ maximizes the reward
• For every episode, until flag is reached
  • Get state information (position & velocity) from environment
  • Feed NN with state information
  • NN will output a probability for (F)orward, (N)eutral, and (B)ackward
  • Randomly select action F, N, or B (using the above probabilities)
  • Store the state information and action taken
• Once flag is reached
  • Assign the most reward to the last action … least reward to the first action
  • Update $\theta$ s.t. actions made at the end are more probable
[http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html]
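The update step above can be sketched as follows. To stay short, this sketch swaps the deep network for a linear softmax policy (one weight row per action), so it illustrates the update rule rather than the authors' implementation. Subtracting the mean return as a baseline is an added assumption (a common variance-reduction choice); with a reward of -1 per step it makes the last actions more probable and the first ones less so. All function names are hypothetical.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(theta, features):
    """Linear softmax policy: one row of weights per action."""
    return softmax([sum(w * x for w, x in zip(row, features)) for row in theta])


def returns_to_go(rewards):
    """G_t = sum of rewards from step t to the end of the episode."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    return out[::-1]


def reinforce_update(theta, episode, learning_rate=0.01):
    """One gradient-ascent step; episode is a list of (features, action, reward)."""
    returns = returns_to_go([r for _, _, r in episode])
    baseline = sum(returns) / len(returns)  # assumption: mean-return baseline
    new_theta = [row[:] for row in theta]
    for (features, action, _), g in zip(episode, returns):
        probs = action_probs(theta, features)
        for k, row in enumerate(new_theta):
            # Gradient of log pi(action | state) for a linear softmax policy.
            indicator = 1.0 if k == action else 0.0
            for j, x in enumerate(features):
                row[j] += learning_rate * (g - baseline) * (indicator - probs[k]) * x
    return new_theta


if __name__ == "__main__":
    print(returns_to_go([-1.0, -1.0, -1.0]))  # [-3.0, -2.0, -1.0]
```

With -1 per step, the return-to-go is least negative for the last action, which is the "most reward to the last action" rule stated in the list.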
Monte-Carlo Policy Gradient
• The approach leverages methods created for supervised learning
  • Inputs := the state information (position, velocity)
  • Predictions := forward, neutral, or backward action taken
  • Labels ("ground truth") := After the episode is over, assign most value to the last actions. Assign least value to the first actions
• Run many episodes; after each episode finishes (flag is reached), strengthen the network such that the last moves become more probable
[http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html]
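One way to write the supervised-learning view down, as a sketch: the loss is a cross-entropy on the actions actually taken, with each term weighted by the value assigned to that step. The symbols $G_t$ (value assigned to step $t$), $R_k$ (reward received after the action at step $k$), and $T$ (episode length) are introduced here for illustration and do not appear on the slides.

$$
L(\theta) = -\sum_{t=0}^{T-1} G_t \,\log P(A_t \mid S_t, \theta),
\qquad
G_t = \sum_{k=t}^{T-1} R_k
$$

Minimizing $L(\theta)$ is the same as gradient ascent on the weighted log-probabilities. With $R_k = -1$ at every step, $G_t = -(T - t)$, so the last actions carry the most value and the first actions the least.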
Gradient-ascent
• Gradient algorithms find a local extremum
• At end of each episode, adjust each parameter in $\theta$ s.t. actions made near the end are strengthened
• How much and in which direction to move each parameter is determined by the backpropagation method
[Figure: episode rewards plotted against parameters $\theta_1$ and $\theta_2$]
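A toy illustration of gradient ascent over two parameters, in the spirit of the figure. The objective below is a made-up concave function standing in for episode rewards, and its gradient is written by hand; in the deep RL setting the gradient would instead come from backpropagation. All names and constants are assumptions for illustration.

```python
def objective(theta):
    """Hypothetical reward surface with a single peak at (1.0, -0.5)."""
    t1, t2 = theta
    return -(t1 - 1.0) ** 2 - 2.0 * (t2 + 0.5) ** 2


def objective_grad(theta):
    """Hand-derived gradient of `objective` with respect to theta."""
    t1, t2 = theta
    return [-2.0 * (t1 - 1.0), -4.0 * (t2 + 0.5)]


def gradient_ascent(grad, theta, learning_rate=0.1, steps=200):
    """Repeatedly move each parameter a small step along its gradient."""
    for _ in range(steps):
        theta = [t + learning_rate * g for t, g in zip(theta, grad(theta))]
    return theta


if __name__ == "__main__":
    best = gradient_ascent(objective_grad, [0.0, 0.0])
    print(best, objective(best))
```

On a surface with several peaks, the same loop would stop at whichever local extremum is uphill from the starting point.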
Caveats
• Deep RL is usually slow to learn
• Transferring knowledge from one problem to another is difficult
• Reward function can be complex
Safety and Security Considerations
• DNNs are black-box models
  • Possible to give an input which causes DNN to provide wild output
• Efforts to mitigate this limitation
  • E.g. Constrained Policy Optimization
Constrained Policy Optimization
• School-book RL specifies only the reward function
  • Problem: when an agent is learning, it may try anything
  • Potentially unsafe when training is in physical environment
• Constraints can be added to the objective function
[Achiam et al. "Constrained Policy Optimization", 2017]
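A sketch of what adding a constraint to the objective looks like, in the general form used for constrained Markov decision processes. The symbols are introduced here for illustration: $J(\pi_\theta)$ is the expected total reward of the policy, $J_C(\pi_\theta)$ is the expected total of a separately defined safety cost, and $d$ is the largest cost the designer is willing to accept.

$$
\max_{\theta} \; J(\pi_\theta)
\quad \text{subject to} \quad
J_C(\pi_\theta) \le d
$$

For a quadcopter, for example, the cost could count time steps spent too close to an obstacle, with $d$ set near zero.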
Current Efforts
Developing RL for Quadcopter Control
• Good case study for complex autonomous CPS
  • Collision avoidance
  • Target tracking
  • Package delivery
• Using open source firmware and hardware
Using Microsoft AirSim for 1st-order learning
[S. Shah et al. AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. 2017.]
Conclusions
• RL is a generalizable method to tackle many CPS decision making problems
  • High-capacity models can make sophisticated decisions
• Good approach for CPS education, because of interdisciplinary nature
• Open problems when using black-box functions for safety applications
Questions?