![Page 1: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/1.jpg)
Multi-Objective Reinforcement Learning for Cognitive Radio-based Satellite Communications
NASA GRC Grant: “Intelligent Media Access Protocol for SDR-based Satellite Communications.”
Grant number: NNC14AA01A
P. V. R. Ferreira © 2016
Paulo Ferreira, Randy Paffenroth, Alexander Wyglinski - (WPI)
Timothy Hackett, Sven Bilen - (Penn State)
Richard Reinhart, Dale Mortensen - (NASA GRC)
![Page 2: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/2.jpg)
Worcester Polytechnic Institute
Agenda
• What is a Cognitive Radio?
• CR applications
• The problem: Multi-objective performance
• Reinforcement Learning: The solution
• Satcom RL performance
![Page 3: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/3.jpg)
What is a Cognitive Radio?
![Page 4: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/4.jpg)
What is a Cognitive Radio?
What you want
What you see
What you can do
What you can tune
T. Collins, A.M. Wyglinski. “Enabling Security in Cognitive Radios and Wireless Spectrum.” MILCOM Tutorial, 2014.
![Page 5: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/5.jpg)
What is a Cognitive Radio?
• Learning algorithm
─ Explore vs. Exploit
[Figure: error (%), from 0 to 100, versus iteration number, with local minima and global minima labeled.]
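The explore vs. exploit trade-off named on this slide can be illustrated with an epsilon-greedy rule, a common choice that is assumed here rather than taken from the slide. The function name `epsilon_greedy` and the list of value estimates `q_values` are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: explore with probability epsilon, else exploit.

    q_values is a list of current value estimates, one per action
    (a hypothetical representation chosen for this sketch).
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random. This is what lets
        # the learner leave a local minimum of the error curve.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimate so far.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Usage: with epsilon = 0 the rule is purely greedy.
best = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

With `epsilon=0.0` the call always returns the index of the largest estimate; with `epsilon=1.0` every action is equally likely.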
![Page 6: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/6.jpg)
CR applications
![Page 7: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/7.jpg)
CR applications
• Satcom
Reinhart, R. C. Using International Space Station For Cognitive System Research And Technology With Space-based Reconfigurable Software Defined Radios. 66th International Astronautical Congress, IAC 2015.
![Page 8: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/8.jpg)
CR Satcom – What you tune: PHY layer
• Modulation scheme
• Encoding scheme
• Symbol rate
• Bandwidth
• Carrier frequency
• ADC/DAC resolution
• Antenna
• Transmission power level
![Page 9: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/9.jpg)
Multi-Objective Comms Performance
The problem
![Page 10: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/10.jpg)
Multi-objective comms performance
M – modulation and encoding schemes
R – data rate
BER – bit error rate
P – transmission power
W – bandwidth
Eb – energy per bit
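The transcript keeps the legend but not the way these quantities are combined into one performance figure. The sketch below assumes a weighted sum of normalized objective scores, which is one common way to scalarize a multi-objective problem and is not confirmed by the slide. The names `weighted_fitness`, `scores`, and `weights`, and all numeric values, are hypothetical. The relation Eb = P / R is the standard definition of energy per bit.

```python
def energy_per_bit(power_w, data_rate_bps):
    """Eb = P / R: joules spent per transmitted bit."""
    return power_w / data_rate_bps


def weighted_fitness(scores, weights):
    """Weighted sum of objective scores.

    Assumptions of this sketch: every score is already normalized to
    [0, 1] with 1 meaning "best", and the weights sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Made-up scores for one radio configuration.
scores = {"data_rate": 0.8, "ber": 0.4, "power": 0.2, "bandwidth": 0.6}

# Made-up weights; a different mission would use a different weighting.
weights = {"data_rate": 0.25, "ber": 0.25, "power": 0.25, "bandwidth": 0.25}

fitness = weighted_fitness(scores, weights)
eb = energy_per_bit(power_w=2.0, data_rate_bps=1e6)
```

Changing the weights changes which configuration scores highest, which is the sense in which the objectives compete.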
![Page 11: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/11.jpg)
Reinforcement Learning
The solution
![Page 12: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/12.jpg)
Reinforcement Learning
[Figure: agent-environment loop. Labels: Agent, Environment, Tx, Rx, communications channel, space segment, ground segment. Signals: $A_t$, $S_t$, $R_t$, $S_{t+1}$, $R_{t+1}$.]
$A_t$ = action, $R_t$ = reward, $S_t$ = state
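The agent-environment loop in this figure can be sketched as a plain Python loop. `ToyChannel` is a hypothetical stand-in for the environment with a made-up reward, not a model of the satellite link, and `run_episode` is likewise an illustrative name.

```python
import random


class ToyChannel:
    """Hypothetical environment: returns (next state, reward) per action."""

    def __init__(self, n_actions=4, seed=0):
        self.rng = random.Random(seed)
        self.n_actions = n_actions
        self.state = 0  # S_t: here simply the last configuration applied

    def step(self, action):
        # Made-up reward in (0, 1]: higher action index scores better,
        # scaled by a random factor standing in for channel variation.
        reward = (action + 1) / self.n_actions * self.rng.uniform(0.8, 1.0)
        self.state = action
        return self.state, reward


def run_episode(env, choose_action, steps):
    """Run the loop S_t -> A_t -> (S_{t+1}, R_{t+1}) for a number of steps."""
    state = env.state
    history = []
    for _ in range(steps):
        action = choose_action(state)            # agent picks A_t from S_t
        next_state, reward = env.step(action)    # environment responds
        history.append((state, action, reward, next_state))
        state = next_state
    return history


# Usage: a trivial agent that always applies configuration 2.
history = run_episode(ToyChannel(), lambda s: 2, steps=5)
```

Each tuple in `history` records one pass around the loop, which is the data a learning agent would use to update its estimates.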
![Page 13: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/13.jpg)
Reinforcement Learning
![Page 14: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/14.jpg)
Reinforcement Learning
• Satcom $Q(S, A)$
$Q_{k+1}(s_k, u_k) = Q_k(s_k, u_k) + \alpha_k r_{k+1}$
$u_k = h(s_k)$ (state-action policy)
$s_{k+1} = g(s_k, u_k)$ (state-transition function)
$r_{k+1} = \rho(s_k, u_k)$ (reward function)
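A minimal sketch of the update on this slide, implemented exactly as transcribed: the Q-value of the visited state-action pair grows by the learning rate times the reward. The dictionary-based table, the function names, and the toy `g` and `rho` below are assumptions for illustration; the slides do not define them.

```python
from collections import defaultdict


def q_update(q, state, action, reward, alpha):
    """Q_{k+1}(s_k, u_k) = Q_k(s_k, u_k) + alpha_k * r_{k+1}, in place."""
    q[(state, action)] += alpha * reward
    return q[(state, action)]


def greedy_policy(q, state, actions):
    """h(s_k): the action with the largest Q-value in the given state."""
    return max(actions, key=lambda u: q[(state, u)])


def g(state, action):
    """Toy state-transition function (assumption): state = last action."""
    return action


def rho(state, action):
    """Toy reward function (assumption): fixed made-up score per action."""
    return {0: 0.2, 1: 0.9, 2: 0.5}[action]


actions = [0, 1, 2]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# Try every action once from state 0, then ask the greedy policy.
for u in actions:
    q_update(q, 0, u, rho(0, u), alpha=0.1)
chosen = greedy_policy(q, 0, actions)
next_state = g(0, chosen)
```

After one trial per action, the greedy policy picks the action with the largest accumulated reward.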
![Page 15: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/15.jpg)
Satcom RL performance
![Page 16: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/16.jpg)
Fixed exploration probability (𝜀 = 0.5)
![Page 17: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/17.jpg)
Variable exploration probability 𝜀
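The transcript does not say how the exploration probability is varied. One common option, assumed here purely for illustration, is an exponential decay toward a floor, so the agent explores heavily at first and exploits more as it learns. The function name and all parameter values are hypothetical.

```python
def decayed_epsilon(step, eps_start=0.5, eps_min=0.01, decay=0.99):
    """Exploration probability after a given number of steps.

    Starts at eps_start, shrinks by the factor `decay` each step,
    and never drops below eps_min (all values are assumptions).
    """
    return max(eps_min, eps_start * decay ** step)


# Usage: sample the schedule at a few points.
schedule = [decayed_epsilon(k) for k in (0, 100, 1000)]
```

Keeping a nonzero floor means the agent still tries an occasional random action, which helps if channel conditions change.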
![Page 18: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/18.jpg)
Time spent at performance levels
Mission 1: Launch/re-entry
Mission 2: Multimedia
![Page 19: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/19.jpg)
Time spent at performance levels
Mission 3: Power saving
Mission 4: Normal
![Page 20: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/20.jpg)
Time spent at performance levels
Mission 5: Cooperation
![Page 21: Multi-Objective Reinforcement Learning for Cognitive Radio](https://reader030.vdocuments.us/reader030/viewer/2022012506/6181f36d4e8cc0106c251fce/html5/thumbnails/21.jpg)
THANK YOU!
Wireless Innovation Laboratory
Applied Research Solutions for a Connected Society