UAV Route Planning in Delay Tolerant Networks
Daniel Henkel, Timothy X Brown
University of Colorado, Boulder
Infotech @ Aerospace ’07
May 8, 2007
Familiar: Dial-A-Ride
• Receive calls
• Pick up and drop off passengers
• Minimize overall transit time
[Figure: The Bus]
Dial-A-Ride: curb-to-curb, shared-ride transportation service
Optimal route not trivial!
In context: Dial-A-UAV
• Sparsely distributed sensors, limited radios
• TSP solution not optimal
• Our approach: queueing and MDP theory
[Figure: Sensor-1 through Sensor-6 around a Monitoring Station]
Delay tolerant traffic!
Complication: infinite data at sensors; potentially two-way traffic
Talk tomorrow, 8 am: Sensor Data Collection
TSP’s Problem
• One cycle visits every node
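• Problem: far-away nodes with little data to send; these should be visited less often
[Figure: Traveling Salesman Solution: a UAV serves nodes A and B (flow rates $f_A$, $f_B$; distances $d_A$, $d_B$) from the hub]
New: cycle defined by visit frequencies $p_i$
[Figure: the same nodes, now visited with frequencies $p_A$, $p_B$]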
Queueing Approach
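Idea: express the delay in terms of the visit probabilities $p_i$, then minimize over the set $\{p_i\}$.
• $p_i$ as a probability distribution: the UAV picks node $i$ as its next destination with probability $p_i$, where $\sum_i p_i = 1$
• Expected service time of any packet: $\bar{T} = \sum_i p_i T_i$, where $T_i$ is the time needed to service node $i$
• Inter-service time of node $i$: exponential distribution with mean $\bar{T}/p_i$. By memorylessness, a packet arriving at node $i$ also waits $\bar{T}/p_i$ on average
• Weighted delay, with node $i$ weighted by its share $f_i/F$ of the total flow $F = \sum_i f_i$: $D = \sum_i \frac{f_i}{F} \frac{\bar{T}}{p_i}$
[Figure: a UAV serves nodes A, B, C, D from the hub; flow rates $f_A, \dots, f_D$, distances $d_A, \dots, d_D$, visit probabilities $p_A, \dots, p_D$]
Goal: minimize average delay
$$\min_{\{p_i\}} \sum_i \sum_j \frac{f_i \, p_j T_j}{F \, p_i} \quad \text{subject to} \quad \sum_i p_i = 1, \; p_i \ge 0$$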
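The claim about the mean inter-service time can be checked numerically. The sketch below is an illustration, not the authors' code. It assumes that the UAV picks each next node independently with probability $p_i$ and that a trip to node $i$ takes exactly $T_i$ time units. The values of `p` and `T` are made up, and the helper names are hypothetical. The sketch measures the mean gap between successive visits to each node and compares it with $\bar{T}/p_i$. The simulated gaps are not exactly exponential. Only the mean is checked here. The exponential assumption is what lets that mean gap also serve as a packet's mean wait.

```python
import random
from typing import List, Sequence


def mean_inter_visit_times(p: Sequence[float], T: Sequence[float],
                           trips: int = 200_000, seed: int = 1) -> List[float]:
    """Hypothetical helper: simulate random node choices and return the mean
    time between successive visits to each node."""
    rng = random.Random(seed)
    n = len(p)
    picks = rng.choices(range(n), weights=p, k=trips)  # node served on each trip
    clock = 0.0
    last_visit = [None] * n
    gap_sum = [0.0] * n
    gap_count = [0] * n
    for i in picks:
        clock += T[i]                          # trip to node i ends at this time
        if last_visit[i] is not None:
            gap_sum[i] += clock - last_visit[i]
            gap_count[i] += 1
        last_visit[i] = clock
    return [s / c for s, c in zip(gap_sum, gap_count)]


def predicted_inter_visit_times(p: Sequence[float], T: Sequence[float]) -> List[float]:
    """Hypothetical helper: model prediction T_bar / p_i with T_bar = sum_i p_i T_i."""
    T_bar = sum(pi * Ti for pi, Ti in zip(p, T))
    return [T_bar / pi for pi in p]


if __name__ == "__main__":
    p, T = [0.5, 0.3, 0.2], [2.0, 4.0, 6.0]    # illustrative values only
    print(mean_inter_visit_times(p, T))         # simulated gaps land close to ...
    print(predicted_inter_visit_times(p, T))    # ... T_bar / p_i
```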
Solution and Algorithm
Probability of choosing node $i$ for the next visit:
$$p_i = \frac{\sqrt{f_i / T_i}}{\sum_j \sqrt{f_j / T_j}}$$
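The square-root rule $p_i \propto \sqrt{f_i/T_i}$ can be motivated as follows. Write the weighted delay as $D = \frac{1}{F}\left(\sum_j p_j T_j\right)\left(\sum_i f_i/p_i\right)$. By the Cauchy–Schwarz inequality, $D \ge \frac{1}{F}\left(\sum_i \sqrt{f_i T_i}\right)^2$. Equality holds exactly when $p_i T_i \propto f_i/p_i$, that is, when $p_i \propto \sqrt{f_i/T_i}$. The sketch below evaluates this under the slide's queueing model only. The function names are hypothetical and the flows and trip times in the example are made up. Equal probabilities serve purely as a comparison baseline.

```python
import math
from typing import List, Sequence


def weighted_delay(p: Sequence[float], f: Sequence[float], T: Sequence[float]) -> float:
    """D = sum_i (f_i / F) * T_bar / p_i, with T_bar = sum_j p_j T_j and F = sum_i f_i."""
    F = sum(f)
    T_bar = sum(pj * Tj for pj, Tj in zip(p, T))
    return sum(fi / F * T_bar / pi for fi, pi in zip(f, p))


def optimal_visit_probabilities(f: Sequence[float], T: Sequence[float]) -> List[float]:
    """Square-root rule: p_i proportional to sqrt(f_i / T_i), normalised to sum to 1."""
    w = [math.sqrt(fi / Ti) for fi, Ti in zip(f, T)]
    total = sum(w)
    return [wi / total for wi in w]


def minimum_delay(f: Sequence[float], T: Sequence[float]) -> float:
    """Cauchy-Schwarz bound (sum_i sqrt(f_i T_i))^2 / F, attained by the rule above."""
    return sum(math.sqrt(fi * Ti) for fi, Ti in zip(f, T)) ** 2 / sum(f)


if __name__ == "__main__":
    f, T = [4.0, 1.0], [1.0, 4.0]             # illustrative flows and trip times
    p = optimal_visit_probabilities(f, T)      # approximately [0.8, 0.2]
    print(weighted_delay(p, f, T), minimum_delay(f, T))   # both approximately 3.2
    print(weighted_delay([0.5, 0.5], f, T))               # about 5.0 with equal probabilities
```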
Implementation: deterministic algorithm
1. Set $c_i = 0$ for all $i$
2. $c_i = c_i + p_i$ for all $i$, while $\max_i c_i < 1$
3. $k = \arg\max_i c_i$
4. Visit node $k$; $c_k = c_k - 1$
5. Go to 2.
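The five steps can be sketched in Python as a generator that yields the index of the next node to visit. `credit_schedule` is a hypothetical name. Breaking ties in step 3 by the lowest index is an assumption, because the slide does not specify a tie rule. Each node's credit stays bounded, so node $i$ ends up visited a fraction $p_i$ of the time. Unlike random draws, the visits are spread out evenly.

```python
from itertools import islice
from typing import Iterator, Sequence


def credit_schedule(p: Sequence[float]) -> Iterator[int]:
    """Hypothetical helper: yield node indices so that, in the long run,
    node i is visited a fraction p[i] of the time (steps 1-5 above)."""
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability distribution")
    c = [0.0] * len(p)                              # step 1: all credits start at 0
    while True:
        while max(c) < 1.0:                         # step 2: accrue credit at rate p_i
            for i, pi in enumerate(p):
                c[i] += pi
        k = max(range(len(c)), key=c.__getitem__)   # step 3: most credit (lowest index on ties)
        c[k] -= 1.0                                 # step 4: visit node k, spend one credit
        yield k                                     # step 5: back to step 2


if __name__ == "__main__":
    print(list(islice(credit_schedule([0.5, 0.25, 0.25]), 8)))  # [0, 0, 1, 2, 0, 0, 1, 2]
```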
Performance improvement over TSP!
Unknown Environment
• What is RL?
  • Learning what to do without prior training
  • Given: a high-level goal; NOT: how to reach it
  • Improving actions on the go
• Distinguishing features:
  • Interaction with the environment
  • Trial-and-error search
  • Concept of rewards and punishments
• Example: training a dog
Learns a model of the environment.
The Framework
• Agent: performs Actions
• Environment: gives rise to Rewards; puts the Agent in situations called States
Elements of RL
[Figure: the elements of RL: Policy, Reward, Value, Model of Environment]
• Policy: what to do (depending on the state)
• Reward: what is good
• Value: what is good because it predicts reward
• Model: what follows what
Source: R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998
UA Path Planning - Simple
• Service traffic from A and B to hub H
• Goal: minimize average packet delay
• State: traffic waiting at the nodes, $(t_A, t_B)$
• Actions: fly to A; fly to B
• Reward: number of packets delivered
• Optimal policy: number of visits to A and B; depends on flow rates and distances
[Figure: a UAV serves nodes A and B from the hub; flow rates $f_A$, $f_B$, distances $d_A$, $d_B$, visit probabilities $p_A$, $p_B$]
Goal: minimize average delay → find $p_A$ and $p_B$
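Below is a minimal sketch of this two-node problem as an environment with a `step(state, action)` function. Every modelling detail is an assumption made for illustration:

- Time advances in whole units.
- Node X produces a fixed $f_X$ packets per unit.
- A trip is hub → node → hub and takes $d_X$ units each way.
- The UAV picks up everything waiting at the node when it arrives.
- Queues are capped at `cap`, so the state set $(t_A, t_B)$ stays finite.

`TwoNodeFerry` and its parameter values are hypothetical. The reward is the number of packets collected on the trip, as listed on the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[int, int]   # (t_A, t_B): packets waiting at A and at B


@dataclass(frozen=True)
class TwoNodeFerry:
    """Hypothetical deterministic model of the two-node problem (illustrative numbers)."""
    f: Tuple[int, int] = (2, 1)   # packets generated per time unit at A, B (assumed)
    d: Tuple[int, int] = (1, 3)   # one-way flight time hub -> A, hub -> B (assumed)
    cap: int = 20                 # queue cap keeps the state set finite (assumed)

    def _grow(self, q: List[int], duration: int) -> List[int]:
        # Both queues fill for `duration` time units, saturating at `cap`.
        return [min(self.cap, qi + fi * duration) for qi, fi in zip(q, self.f)]

    def step(self, state: State, action: int) -> Tuple[State, int]:
        """action 0 = fly to A, 1 = fly to B; returns (next_state, reward)."""
        d = self.d[action]
        q = self._grow(list(state), d)   # outbound leg
        reward = q[action]               # everything waiting at the node is picked up ...
        q[action] = 0                    # ... and carried back to the hub
        q = self._grow(q, d)             # return leg
        return (q[0], q[1]), reward


if __name__ == "__main__":
    env = TwoNodeFerry()
    state, total = (0, 0), 0
    for action in (0, 1, 0, 1):          # fly A, B, A, B
        state, reward = env.step(state, action)
        total += reward
    print(state, total)                  # (14, 3) 31
```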
MDP
• If a reinforcement learning task has the Markov property, it is a Markov Decision Process (MDP).
• If the state and action sets are finite, it is a finite MDP.
• To define a finite MDP, you need to give:
  • the state and action sets
  • the one-step "dynamics", defined by the transition probabilities
    $P^a_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\}$ for all $s, s' \in S$, $a \in A(s)$
  • the reward expectation
    $R^a_{ss'} = E\{r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\}$ for all $s, s' \in S$, $a \in A(s)$
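RL approach to solving MDPs
• Policy: mapping from the set of states to the set of actions, $\pi : S \to A$
• Sum of rewards (:= return) from time $t$ onwards: $R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$, with discount factor $0 \le \gamma < 1$
• Value function (of a state): expected return when starting in $s$ and following policy $\pi$. For an MDP, $V^\pi(s) = E_\pi\{R_t \mid s_t = s\}$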
Bellman Equation for Policy π
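• Evaluating $E\{\cdot\}$ for a deterministic policy $\pi$: $V^\pi$ is the solution of
  $V^\pi(s) = \sum_{s'} P^{\pi(s)}_{ss'} \left[ R^{\pi(s)}_{ss'} + \gamma V^\pi(s') \right]$ for all $s \in S$
• Action-value function: value of taking action $a$ in state $s$ and following $\pi$ afterwards. For an MDP,
  $Q^\pi(s,a) = E_\pi\{R_t \mid s_t = s, a_t = a\} = \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma V^\pi(s') \right]$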
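Below is a minimal sketch of iterative policy evaluation. It applies the Bellman equation for $V^\pi$ repeatedly as an update until the values stop changing, then reads off $Q^\pi$. Several pieces are assumptions for illustration and do not come from the slides:

- the encoding: a dict in which `P[s][a]` lists `(probability, next_state, reward)` triples;
- the two-state toy MDP `TOY`;
- the function names.

Take $\gamma = 0.9$ and the policy "move in state 0, stay in state 1". Solving by hand gives $V^\pi(1) = 2 + 0.9\,V^\pi(1) = 20$ and $V^\pi(0) = 1 + 0.9 \cdot 20 = 19$.

```python
# Hypothetical two-state toy MDP: P[s][a] = list of (probability, next_state, reward).
TOY = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 0.0)]},
}


def evaluate_policy(P, policy, gamma=0.9, tol=1e-10):
    """Iterative policy evaluation: apply the Bellman equation for V^pi as an
    update, sweeping over all states until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            a = policy[s]                                  # deterministic pi(s)
            v_new = sum(prob * (r + gamma * V[s2]) for prob, s2, r in P[s][a])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new                                   # in-place update
        if delta < tol:
            return V


def action_values(P, V, gamma=0.9):
    """Q^pi(s, a) = sum_{s'} P^a_{ss'} [R^a_{ss'} + gamma V^pi(s')]."""
    return {
        (s, a): sum(prob * (r + gamma * V[s2]) for prob, s2, r in outcomes)
        for s in P
        for a, outcomes in P[s].items()
    }


if __name__ == "__main__":
    V = evaluate_policy(TOY, {0: "move", 1: "stay"})
    print(V)                       # close to {0: 19.0, 1: 20.0}
    print(action_values(TOY, V))
```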
Optimality
• $V$ and $Q$ are real valued, so policies can be partially ordered by comparing their value functions state by state:
  $\pi \ge \pi'$ if and only if $V^\pi(s) \ge V^{\pi'}(s)$ for all $s \in S$
• Concept of $V^*$ and $Q^*$:
  $V^*(s) = \max_\pi V^\pi(s)$ for all $s \in S$
  $Q^*(s,a) = \max_\pi Q^\pi(s,a)$ for all $s \in S$ and $a \in A(s)$
• Concept of $\pi^*$: the policy that maximizes $Q^*(s,a)$ in every state $s$:
  $\pi^*(s) = \arg\max_{a \in A(s)} Q^*(s,a)$
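One standard way to compute $V^*$, $Q^*$ and $\pi^*$ for a small finite MDP is value iteration. It replaces the fixed action $\pi(s)$ in the Bellman update with a max over actions, iterates until convergence, and then takes $\pi^*(s) = \arg\max_a Q^*(s,a)$. The sketch below is an illustration under assumptions: `P[s][a]` lists `(probability, next_state, reward)` triples, `TOY` is a made-up two-state MDP, and the function name is hypothetical. With $\gamma = 0.9$, the optimal values are $V^*(0) = 19$ and $V^*(1) = 20$.

```python
# Hypothetical two-state toy MDP: P[s][a] = list of (probability, next_state, reward).
TOY = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 0.0)]},
}


def value_iteration(P, gamma=0.9, tol=1e-10):
    """Return (V*, Q*, pi*) for a finite MDP given as P[s][a] -> [(prob, s', r)]."""
    V = {s: 0.0 for s in P}

    def backup(s, a):
        # One-step lookahead: sum_{s'} P^a_{ss'} [R^a_{ss'} + gamma V(s')].
        return sum(prob * (r + gamma * V[s2]) for prob, s2, r in P[s][a])

    while True:
        delta = 0.0
        for s in P:
            v_new = max(backup(s, a) for a in P[s])   # Bellman optimality update
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    Q = {(s, a): backup(s, a) for s in P for a in P[s]}
    policy = {s: max(P[s], key=lambda a: Q[(s, a)]) for s in P}  # greedy in Q*
    return V, Q, policy


if __name__ == "__main__":
    V, Q, policy = value_iteration(TOY)
    print(policy)   # {0: 'move', 1: 'stay'}
```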
Reinforcement Learning - Methods
• To find $\pi^*$, all methods try to evaluate the value functions $V$ or $Q$
• Different approaches:
  • Dynamic Programming
    • Policy evaluation, improvement, iteration
  • Monte-Carlo methods
    • Decisions are taken based on averaging sample returns
  • Temporal Difference methods (!!)
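Q-learning is one temporal-difference method. The slides do not say which TD method the authors use. The sketch below needs only a `step(state, action)` function that returns `(next_state, reward)`. It never uses $P^a_{ss'}$ or $R^a_{ss'}$. Instead it learns $Q$ from sampled transitions while following an ε-greedy behaviour policy. The deterministic toy MDP, the step count and the parameters α, γ and ε are illustrative assumptions, and the function names are hypothetical.

```python
import random

# Hypothetical deterministic toy MDP: TOY[s][a] = (next_state, reward).
TOY = {
    0: {"stay": (0, 0.0), "move": (1, 1.0)},
    1: {"stay": (1, 2.0), "move": (0, 0.0)},
}


def toy_step(state, action):
    return TOY[state][action]


def q_learning(step, start, actions, steps=20_000,
               alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on one continuing run of sampled transitions."""
    rng = random.Random(seed)
    Q = {}                                    # (state, action) -> estimate, default 0
    s = start
    for _ in range(steps):
        if rng.random() < epsilon:            # explore: random action
            a = rng.choice(actions)
        else:                                 # exploit: greedy in current Q
            a = max(actions, key=lambda b: Q.get((s, b), 0.0))
        s2, r = step(s, a)
        # TD target bootstraps from the current estimate at the next state.
        target = r + gamma * max(Q.get((s2, b), 0.0) for b in actions)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
        s = s2
    return Q


def greedy_policy(Q, states, actions):
    return {s: max(actions, key=lambda a: Q.get((s, a), 0.0)) for s in states}


if __name__ == "__main__":
    Q = q_learning(toy_step, start=0, actions=["stay", "move"])
    print(greedy_policy(Q, [0, 1], ["stay", "move"]))
```

For this toy MDP the learned greedy policy should be "move" in state 0 and "stay" in state 1, which is its optimal policy for γ = 0.9. The learner only calls `step`, so in principle the same function could be pointed at a simulator of the two-node UAV problem.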