Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots
TRANSCRIPT
Ivomar Brito Soares, Yann-Michael De Hauwere, Kris Januarius, Tim Brys, Thierry Salvant, Ann Nowe
[email protected] - [email protected]
ITSC 2015, September 2015
Introduction Reinforcement Learning Departure MANagement DMAN RL Model Experiments Conclusions
Summary
Introduction
Reinforcement Learning
Departure MANagement
DMAN RL Model
Experiments
Conclusions
Brito Soares et al. Departure MANagement with a Reinforcement Learning Approach 2
Large Scale Multi-Agent Systems (MAS)
[Figures: Smart Energy Grids, Intelligent Traffic Systems, Warehouse Planning, Air Traffic System]
Large Scale MAS
Characteristics

- Resources and control are inherently distributed.
- Different actors have mutual and conflicting interests.
- Highly stochastic.
- The full system dynamics are unknown (e.g. not all constraints are known).

A full mathematical description or a global centralized solution is therefore difficult to compute.
Why use RL?
- Artificial Intelligence (AI)
  - Machine Learning (ML)
    - Reinforcement Learning (RL)

Reinforcement Learning

- Agent-based modelling effort is reduced.
- Can exceed human controller performance.
- Adaptive: it can change its decisions dynamically.
Single-Agent RL
Single-Agent Reinforcement Learning (RL) Model [Kaelbling et al. (1996)]

- An approach to solving a Markov Decision Process (MDP).
- The agent must learn by itself (trial and error).
- It maximizes a long-term numerical reward signal.
Multi-Agent RL
Multi-Agent Reinforcement Learning (MARL) [Nowe (2011)]

- Markov Game (MG).
- Multiple agents / multiple sequential decisions.
- More complex than single-agent RL.
Air Traffic System
- Air Traffic System (ATS)
  - Airport Ground Operations (AGO)
    - Departure MANagement (DMAN)

Organizes the movement of departing aircraft from the gate to take-off clearance.

DMAN Tasks

- Respect assigned Target Take-Off Time Windows (TTOTW).
- Increase runway throughput and airport capacity.
- Reduce fuel consumption, noise and CO2 emissions.
Target Take-Off Time Window (TTOTW)
Respecting the TTOTW allows for better usage of the ATS infrastructure (CFMU slots ≡ take-off time slots).

| Flight Plan Callsign | TTOTWmin | TTOT     | TTOTWmax |
|----------------------|----------|----------|----------|
| AAL0005D             | 07:19:12 | 07:20:12 | 07:21:12 |
| DAL0067D             | 07:22:12 | 07:23:12 | 07:24:12 |
| JBU0065D             | 07:25:12 | 07:26:12 | 07:27:12 |
| AAL0007D             | 07:28:12 | 07:29:12 | 07:30:12 |
| DAL0009D             | 07:31:12 | 07:32:12 | 07:33:12 |

Some TTOT windows generated for learning scenario 6.

At Charles de Gaulle Airport (LFPG) in Paris, France, roughly 80% of the flights succeed in taking off inside their slots [Gotteland, 2003].
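As a small illustration of what "respecting a slot" means computationally, the check below models a CFMU slot as a window of plus or minus one minute around the Target Take-Off Time, matching the scenario-6 table above. The function names and the window half-width are illustrative assumptions, not the paper's code:

```python
from datetime import datetime, timedelta

# Illustrative sketch: a CFMU slot modelled as a +/- 1 minute window
# around the Target Take-Off Time (TTOT), as in the scenario-6 table.
def ttotw(ttot: datetime, half_width: timedelta = timedelta(minutes=1)):
    return ttot - half_width, ttot + half_width

def inside_window(atot: datetime, ttot: datetime) -> bool:
    lo, hi = ttotw(ttot)
    return lo <= atot <= hi

ttot = datetime(2015, 9, 1, 7, 20, 12)                       # AAL0005D's TTOT
print(inside_window(datetime(2015, 9, 1, 7, 19, 12), ttot))  # True (window min)
print(inside_window(datetime(2015, 9, 1, 7, 22, 0), ttot))   # False (too late)
```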
Human Controllers Approaches to Respecting TTOTW
1. Gate Controllers: clear the aircraft off-block at TTOT minus an estimate of the duration between off-block and take-off.
   - Average: the average over all departure aircraft at KJFK.
   - Exact: the exact duration when the aircraft taxis alone.
2. Runway Controllers: make the aircraft wait before lining up if they estimate that it will miss its window by taking off too early.
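The gate-controller heuristic above is essentially a subtraction of an estimated taxi duration from the TTOT. A minimal sketch, where the 18-minute "average" estimate is an illustrative value, not KJFK data:

```python
from datetime import datetime, timedelta

# Sketch of the gate-controller heuristic: clear the aircraft off-block
# at TTOT minus an estimated off-block-to-take-off duration.
def offblock_clearance(ttot: datetime, taxi_estimate: timedelta) -> datetime:
    return ttot - taxi_estimate

ttot = datetime(2015, 9, 1, 7, 20, 12)
avg_taxi = timedelta(minutes=18)             # assumed airport-wide average
print(offblock_clearance(ttot, avg_taxi))    # 2015-09-01 07:02:12
```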
Fast Time Simulation (FTS)
- Modeling: en-route flight phase, Terminal Maneuvering Area (TMA), aircraft and airport handlers, vehicle ground movements.
- The simulation clock runs faster than a regular clock (fast-time).
- Tools: AirTOp, SIMMOD, TAAM, Arc Port ALTO, etc.
John F. Kennedy International Airport (KJFK)
[Figures: grid-world environment of KJFK; New York City, USA (NY-Metro)]
Environment
- Fast Time Simulation: AirTOp.
- Departure flights: Off-Block → Pushing Back → Taxiing → Runway Acceleration → Take-Off → Standard Instrument Departure (SID) → En-Route.
- Arrival flights: En-Route → Standard Terminal Arrival Route (STAR) → Landing → Runway Deceleration → Taxiing → In-Block.
- Safety requirements: wake-vortex separations, runway usage.
Markov Decision Process (MDP)
- Agent: a flight plan (FP) controller agent.
- States:
  1. Parked (initial state): one per departure aircraft with a TTOT window.
     - Departure gate
     - Entry time at the departure gate
  2. Taxiing (intermediate states, finite):
     - Entry node
     - Exit node
     - Entry time at the entry node
  3. Taken-Off (goal/absorbing states):
     - Taken-Off Inside Window
     - Taken-Off Outside Window
- Actions:
  1. Delay Off-Block.
  2. Delay During Taxiing.
  3. Take-Off.
- Reward function:
  - Inside window: a positive reward that penalizes delay on the ground.
  - Outside window: 0.
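A hedged reconstruction of this reward signal: the slide gives only its shape (positive inside the window, penalizing ground delay; 0 outside), so the linear form and the parameter names below are assumptions chosen to match the worked example later, where 60 s of ground delay inside the window yields a reward of 70:

```python
# Assumed reward shape: r_max minus a per-second ground-delay penalty
# inside the window (r_max = 100, f_delay = 0.5), 0 outside the window.
def reward(inside_window: bool, ground_delay_s: float,
           r_max: float = 100.0, f_delay: float = 0.5) -> float:
    if not inside_window:
        return 0.0                      # Taken-Off Outside Window
    return r_max - f_delay * ground_delay_s

print(reward(True, 60))   # 70.0
print(reward(False, 60))  # 0.0
```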
RLDMAN: Single-Agent Single-State to Multi-Agent Multi-State
1. Single-Agent Single-State (N-armed bandit):
   - Parked state / Delay Off-Block actions.
   - Delay is absorbed at the gate → reduced fuel consumption.
2. Multi-Agent Single-State: multiple single-state agents learning in a shared environment.
3. Single-Agent Multi-State: it is not always possible to absorb all the delay at the gate, e.g. when:
   - an arriving flight is requesting the gate;
   - the aircraft must avoid traffic in the vicinity of the gate.
4. Multi-Agent Multi-State: multiple multi-state agents learning in a shared environment.
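The single-agent single-state case above can be sketched as a standard ε-greedy N-armed bandit over candidate off-block delays. Everything below (the delay grid, the toy environment standing in for the simulator, and the hyperparameters) is illustrative, not the paper's setup:

```python
import random

# Minimal sketch (not the paper's code): each arm is a candidate off-block
# delay in seconds; pulling an arm runs one simulated departure and
# returns the TTOTW-based reward.
def epsilon_greedy_bandit(delays, pull, episodes=2000, eps=0.1, alpha=0.2):
    q = {d: 0.0 for d in delays}
    for _ in range(episodes):
        if random.random() < eps:
            d = random.choice(delays)          # explore
        else:
            d = max(q, key=q.get)              # exploit the current best arm
        q[d] += alpha * (pull(d) - q[d])       # incremental value estimate
    return q

# Toy environment: an off-block delay near 900 s hits the window best.
def pull(delay_s):
    return max(0.0, 100.0 - abs(delay_s - 900) / 10) + random.gauss(0, 1)

q = epsilon_greedy_bandit([0, 300, 600, 900, 1200], pull)
print(max(q, key=q.get))  # typically 900
```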
Single-Agent Multi-State Example
- TTOT Window (TTOTW): [08:16:00, 08:18:00]; taxi segment durations a-b = b-c = c-d = 5 min.
- Reward function parameters: r_max = 100, r_TTOTW,out = 0, f_taxiing = 0.5. Q-learning: α = 0.2, γ = 0.8.

[Figure: taxiway graph from the terminal gate gd(a) through nodes a, b, c, d to runways 27 and 09, showing the Parked state, the Taxiing states (node entry/exit pairs ne/nx with timestamps from 08:00:00 to 08:13:00), and the Taken-Off Inside Window / Taken-Off Outside Window absorbing states.]

1. The aircraft parks at the gate at 08:00:00.
2. It goes off-blocks (AOBT) at 08:00:30. R = 0, Q = 0.
3. No stop at node b. R = 0, Q = 0.
4. The aircraft stops for 30 s at node c. R = 0, Q = 0.
5. The aircraft takes off (ATOT) at 08:16:00. r_TTOTW,in = 100 − 0.5 × 60 = 70. Q = 0.2 × 70 = 14.
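The numbers in steps 2 to 5 follow from the standard tabular Q-learning update. A minimal sketch that reproduces them, assuming (as in the example) that only the final transition into the absorbing Taken-Off state earns a non-zero reward:

```python
# Standard Q-learning update: Q <- Q + alpha * (r + gamma * max_a' Q(s', a') - Q).
alpha, gamma = 0.2, 0.8
r_max, f_taxiing = 100.0, 0.5

ground_delay_s = 30 + 30       # 30 s off-block delay + 30 s stop at node c
r = r_max - f_taxiing * ground_delay_s
print(r)                       # 70.0

q = 0.0                        # Q-value of the last (state, action) pair
next_max_q = 0.0               # absorbing state: no future value
q += alpha * (r + gamma * next_max_q - q)
print(round(q, 6))             # 14.0
```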
Learning Problem Size
Large Multi-Agent System (MAS)
- Number of departure flights (agents): 698
- Number of arrival flights: 711
- Two days of operations at KJFK
Independent Learning Scenarios
- Total: 42

| Scenario index | 6–38 | 5  | 0 | 39 | 1, 3, 4 | 2, 41 | 40 |
|----------------|------|----|---|----|---------|-------|----|
| # of Dep       | 20   | 13 | 8 | 6  | 3       | 1     | 0  |
| Agents Type    | MA   | MA | MA| MA | MA      | SA    | -  |

Number of departure flights per learning scenario index (Dep: departure flights, MA: multi-agent, SA: single-agent).
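A quick cross-check of the table above: the per-scenario departure counts should sum to the 698 departure flights over 42 scenarios stated earlier:

```python
# Cross-check of the learning-scenario table against the totals above.
per_scenario = {i: 20 for i in range(6, 39)}   # scenarios 6-38: 20 departures each
per_scenario.update({5: 13, 0: 8, 39: 6, 1: 3, 3: 3, 4: 3, 2: 1, 41: 1, 40: 0})
assert len(per_scenario) == 42                 # 42 independent scenarios
print(sum(per_scenario.values()))              # 698
```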
Learning Scenario 6: Single-State vs. Multi-State Comparison

- Deterministic environment
- 20 independent learning agents

[Figures: average # of take-offs inside the TTOTW; average fuel consumption of departure flights (kg)]
[Figures: average reward (all agents); average reward (per agent)]
[Figures: average gate delay (all agents); average taxiing delay (all agents)]
All Learning Scenarios: Percentage of Windows Respected
| Case                                | Deterministic | Stochastic |
|-------------------------------------|---------------|------------|
| Machine Learning                    | 99            | 96         |
| Gate Controllers (Average)          | 85            | 71         |
| Gate Controllers (Exact)            | 97            | 44         |
| Gate + Runway Controllers (Average) | 87            | 70         |
| Gate + Runway Controllers (Exact)   | 96            | 44         |

Percentage of windows respected for all scenarios.
All Learning Scenarios: Fuel Consumption
| Case                                | Deterministic | Stochastic |
|-------------------------------------|---------------|------------|
| Machine Learning                    | 34,806        | 37,989     |
| Gate Controllers (Average)          | 35,057        | 38,106     |
| Gate Controllers (Exact)            | 34,839        | 37,865     |
| Gate + Runway Controllers (Average) | 35,412        | 36,613     |
| Gate + Runway Controllers (Exact)   | 34,847        | 37,872     |

Fuel consumption (kg) for departure aircraft for all scenarios.
Conclusions
- Reinforcement Learning (RL) has shown good potential for modeling and finding solutions for respecting assigned take-off windows for departure aircraft.
- A realistic real-world application of RL.
- Single-State
  - Advantages: reduced fuel consumption and a reduced learning problem, since no taxiing states are visited.
  - Disadvantages: increased gate delay, and not able to find a solution for all cases, e.g. when the aircraft needs to avoid traffic taxiing in the vicinity of the gate.
- Multi-State
  - Advantages: reduced gate delay; finds solutions that avoid disturbance traffic on the path to the runway.
  - Disadvantages: increased fuel consumption; a bigger learning problem.
Questions?
Ivomar Brito Soares
[email protected] - [email protected]
AI Lab (VUB): https://ai.vub.ac.be
Airtopsoft SA: http://www.airtopsoft.com
Innoviris: http://www.innoviris.be
Youtube Channel (RLDMAN): http://www.youtube.com/channel/UC8uJBsMej5A1as8trbVxbbQ
References
1. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
2. K. Tumer and A. Agogino, "Improving Air Traffic Management with a Learning Multiagent System," IEEE Intelligent Systems, vol. 24, no. 1, pp. 18-21, 2009.
3. R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Machine Learning: An Artificial Intelligence Approach. Springer Science & Business Media, 2013.
4. A. Nowé, P. Vrancx, and Y.-M. De Hauwere, "Game Theory and Multi-agent Reinforcement Learning," in Reinforcement Learning: State of the Art. Springer, 2012, pp. 441-470.
5. Y.-M. De Hauwere, Sparse Interactions in Multi-Agent Reinforcement Learning, 2011.
6. R. De Neufville and A. Odoni, Airport Systems: Planning, Design and Management, 2013.
![Page 47: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/47.jpg)
References
7. S. Stroiney, B. Levy, and C. Knickerbocker, "Departure Management: Savings in Taxi Time, Fuel Burn, and Emissions," in Integrated Communications Navigation and Surveillance Conference (ICNS). IEEE, 2010.
8. J.-B. Gotteland, N. Durand, and J.-M. Alliot, "Handling CFMU Slots in Busy Airports," in 5th USA/Europe Air Traffic Management Research and Development Seminar, 2003.
9. Airtopsoft, "AirTOp Fast Time Simulator," http://www.airtopsoft.com/, 2005. [Online; accessed 23-February-2015].
10. C. J. Watkins and P. Dayan, "Q-Learning," Machine Learning, vol. 8, no. 3-4, pp. 279-292, 1992.
![Page 48: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/48.jpg)
Abstract
This paper considers how existing Reinforcement Learning (RL) techniques can be used to model and learn solutions for large-scale Multi-Agent Systems (MAS). The large-scale MAS of interest is the movement of departure flights at big airports, commonly known as the Departure MANagement (DMAN) problem. A particular DMAN subproblem is how to respect Central Flow Management Unit (CFMU) take-off time windows, which are time windows planned by flow management authorities to be respected for the take-off time of departure flights. An RL model to handle this problem is proposed, including the Markov Decision Process (MDP) definition, the behavior of the learning agents, and how the problem can be modeled using RL, ranging from the simplest to the full RL problem. Several experiments are also shown that illustrate the performance of the machine learning algorithm, with a comparison to how these problems are commonly handled by airport controllers nowadays. The environment in which the agents learn is provided by the Fast Time Simulator (FTS) AirTOp, and the airport case study is the John F. Kennedy International Airport (KJFK) in New York City, USA, one of the busiest airports in the world.
![Page 49: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/49.jpg)

Reinforcement Learning (RL) Overview

I Markov Decision Process (MDP): (S, A, T_sa, γ, R).
I S = {s_1, ..., s_N} is a finite set of states.
I A = {a_1, ..., a_k} are the actions available to the agent.
I T_sa: each combination of starting state s_i, action choice a_l ∈ A and next state s_j has an associated transition probability T(s_i, a_l, s_j).
I R: immediate reward R(s_i, a_l).
I γ ∈ [0, 1) is the discount factor (e.g., 0.9).
I Learn policy π: J^π ≡ E[∑_{t=0}^∞ γ^t R(s(t), π(s(t)))]
I E: expected value.
I Q values: Q(s, a) = R(s, a) + γ ∑_{s'} T(s, a, s') max_{a'} Q(s', a')
I s': next state.
I a': action taken in the next state.
I Q-Learning update rule [Watkins, 1992]: Q(s, a) ← Q(s, a) + α[R(s, a) + γ max_{a'} Q(s', a') − Q(s, a)]
I α: learning rate.
I ε-greedy action selection mechanism: ε(episode) = ε_0 · τ^episode
I ε_0: the initial value (e.g., 1.0).
I τ ∈ (0, 1): the decay (e.g., 0.995, reaching ε = 0.001 after 1378 episodes).
![Page 59: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/59.jpg)
Algorithm 1 RL Update With Q-Learning and ε-Greedy with Decay

1: Initialize Q(s, a) (e.g., to 0)
2: for each episode do
3:   Agent returns to the initial state
4:   Decay ε: ε ← ε_0 · τ^episode
5:   while final/absorbing state not reached do
6:     Generate random number n ∈ [0, 1)
7:     if n ≤ ε then
8:       Explore: choose a at random
9:     else
10:      Exploit: choose among a with the highest Q
11:     Execute action a
12:     Observe reward r, next state s'
13:     Update Q(s, a): Q(s, a) ← Q(s, a) + α[R(s, a) + γ max_{a'} Q(s', a') − Q(s, a)] [Watkins, 1992]
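Algorithm 1 can be sketched in Python. The 3-state corridor environment below is an illustrative assumption (it stands in for the simulated airport), while α, γ, ε_0 and τ follow the values used later in the RL set-up:

```python
import random

def q_learning(n_states, n_actions, step, episodes=1378,
               alpha=0.2, gamma=0.8, eps0=1.0, tau=0.995):
    """Tabular Q-learning with epsilon-greedy exploration and decay."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for episode in range(episodes):
        s = 0                                   # agent returns to initial state
        eps = eps0 * tau ** episode             # decay epsilon
        while s is not None:                    # until absorbing state reached
            if random.random() <= eps:
                a = random.randrange(n_actions)            # explore
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])  # exploit
            r, s_next = step(s, a)              # execute a, observe r and s'
            target = r if s_next is None else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])          # Watkins update
            s = s_next
    return Q

# Toy 3-state corridor (assumed): action 0 advances toward take-off
# (reward 10 at the end, None = absorbing), action 1 waits with zero reward.
def step(s, a):
    if a == 1:
        return 0.0, s
    return (10.0, None) if s == 2 else (0.0, s + 1)

random.seed(42)
Q = q_learning(n_states=3, n_actions=2, step=step)
```

With γ = 0.8, the learned values approach 10, 8 and 6.4 along the corridor, and the greedy policy advances in every state.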
![Page 60: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/60.jpg)

Target Take-Off Time Window (TTOTW)

Respecting the TTOTW allows for better usage of the ATS infrastructure.

I Departure aircraft i: ac^dep_i ∈ AC^dep, with Target Take-Off Time TTOT_i.
I TTOT Window (TTOTW):
I Width: TTOTW^w_i > 0.
I Range: [TTOTW^min_i, TTOTW^max_i].
I Constraints: TTOTW^min_i < TTOT_i < TTOTW^max_i and TTOTW^max_i = TTOTW^min_i + TTOTW^w_i.
I TTOTW_i respected: Actual Take-Off Time (ATOT): ATOT_i ∈ [TTOTW^min_i, TTOTW^max_i].

At Charles de Gaulle Airport (LFPG) in Paris, France, roughly 80% of the flights succeed in taking off inside their slots [Gotteland, 2003].
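The window arithmetic and the compliance test ATOT_i ∈ [TTOTW^min_i, TTOTW^max_i] can be written directly; centering the window on the TTOT is an illustrative assumption here, since the slide only fixes the width and the containment constraints:

```python
from datetime import datetime, timedelta

def make_ttotw(ttot, width):
    """Build [TTOTW_min, TTOTW_max] with TTOTW_max = TTOTW_min + width.
    Centering on the TTOT is an assumed convention for this sketch."""
    lo = ttot - width / 2
    return lo, lo + width

def window_respected(atot, window):
    """TTOTW respected iff the ATOT falls inside [TTOTW_min, TTOTW_max]."""
    lo, hi = window
    return lo <= atot <= hi

# Hypothetical flight: TTOT at 14:30 with a 3-minute window.
ttot = datetime(2015, 9, 15, 14, 30)
win = make_ttotw(ttot, width=timedelta(minutes=3))
```

Taking off 80 seconds after the TTOT still falls inside this 3-minute window; taking off 2 minutes late does not.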
![Page 62: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/62.jpg)

Human Controllers' Approaches to Respecting the TTOTW

1. Gate controllers: clear the aircraft off-block at TTOT − T^ot_i.
I Average: T^ot,a_i =
  average push-back duration (00:02:35)
  + average taxi time duration (total taxi length / 14 kt)
  + average runway line-up duration (00:00:28)
  + runway acceleration duration.
I Exact: T^ot,e_i =
  Actual Take-Off Time (ATOT)
  − Actual Off-Block Time (AOBT).
2. Runway controllers: make the aircraft wait before lining up if they estimate that it would miss its window by taking off too early.
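The average lead time T^ot,a_i is a straightforward sum of the four durations above. In this sketch the taxi length and the runway acceleration duration are assumed values, since the slide does not quantify them:

```python
KT_TO_MPS = 0.514444  # knots to metres per second

def avg_offblock_lead_time_s(taxi_length_m, accel_duration_s):
    """T_ot_a: average time from off-block clearance to take-off."""
    push_back_s = 2 * 60 + 35                   # average push-back, 00:02:35
    taxi_s = taxi_length_m / (14 * KT_TO_MPS)   # taxi at an average 14 kt
    line_up_s = 28                              # average line-up, 00:00:28
    return push_back_s + taxi_s + line_up_s + accel_duration_s

# Assumed figures: a 3 km taxi route and a 40 s take-off roll.
lead = avg_offblock_lead_time_s(taxi_length_m=3000, accel_duration_s=40)
```

A gate controller would then clear the aircraft off-block `lead` seconds before its TTOT.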
![Page 64: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/64.jpg)

RLDMAN: Single-Agent Single-State to Multi-Agent Multi-State

1. Single-Agent Single-State (N-Armed Bandit):
I Only one learning agent; the only state considered is the Parked state, with the Delay Off-Block actions.
I All delay is absorbed at the gate while the aircraft engines are turned off, thus reducing fuel consumption.
2. Multi-Agent Single-State: multiple Single-State agents learning in a shared environment.
3. Single-Agent Multi-State: it is not always possible to absorb all the delay at the gate:
I An arriving flight is requesting the gate.
I The aircraft must avoid traffic in the vicinity of the gate.
4. Multi-Agent Multi-State: multiple Multi-State agents learning in a shared environment.
![Page 68: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/68.jpg)

Markov Decision Process (MDP)

I Agent: a flight plan (FP) controller agent a_i.
I States (S):
1. Parked (initial state) (s^p_i):
I Departure gate (g_d)
I Entry time (e_p)
2. Taxiing (intermediate states) (S^t_i = {s^t_i,1, ..., s^t_i,N}):
I Entry node (n_e)
I Exit node (n_x)
I Entry time (e_t)
3. Taken-Off (goal/absorbing states):
I Taken-Off Inside Window (s^TTOTW,in_i)
I Taken-Off Outside Window (s^TTOTW,out_i)
![Page 70: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/70.jpg)

Markov Decision Process (MDP)

I Actions (A):
1. Delay Off-Block (A^o = {a^o_1, ..., a^o_L})
2. Delay During Taxiing (A^t = {a^t_1, ..., a^t_M})
3. Take-Off (a^e)
I Reward function (R):
I Inside window: r^TTOTW,in_i = r_max − p^taxiing_i = r_max − f^taxiing · d^taxiing_i
I Outside window: r^TTOTW,out_i = 0.
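The reward function can be sketched as follows. r_max = 10,000 comes from the RL set-up; the taxi-delay penalty is written here with a positive coefficient folded into the subtraction (the set-up lists f^taxiing = −1.0, i.e., each second of taxi delay costs one reward unit):

```python
R_MAX = 10_000.0   # maximum reward, from the RL set-up
F_TAXIING = 1.0    # taxi-delay penalty per second (sign folded into the '-')

def reward(inside_window, taxi_delay_s):
    """r = r_max - f * d_taxiing when the window is respected, else 0."""
    if not inside_window:
        return 0.0  # taking off outside the window earns nothing
    return R_MAX - F_TAXIING * taxi_delay_s
```

An agent that respects its window with 60 s of taxi delay earns 9,940; missing the window earns 0 regardless of delay.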
![Page 72: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/72.jpg)
RL Set Up
I Independent learners.
I Q-learning: α = 0.2, γ = 0.8.
I Action selection mechanism is ε-greedy with parameter decay:ε0 = 1.0, τ = 0.995.
I Competitive setting: r_max = 10,000, f^taxiing = −1.0.
I Environment: either deterministic or stochastic (non-deterministic).
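With ε_0 = 1.0 and τ = 0.995, the schedule ε(episode) = ε_0 · τ^episode reaches the stopping threshold ε = 0.001 after roughly 1378 episodes, which is consistent with the trial length reported for the experiments:

```python
import math

eps0, tau = 1.0, 0.995
threshold = 0.001

# Solve eps0 * tau**n = threshold for n.
n = math.log(threshold / eps0) / math.log(tau)
episodes = round(n)  # approximately 1378
```

The slower the decay τ, the longer the agents keep exploring before the trial ends.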
![Page 73: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/73.jpg)
RL Set Up

I Delay off-block actions: A^o defined with a range of [−10 min, +10 min] centered around TTOT − T^ot,e and a step of 10 s for every agent (121 actions per s^p_i).
I Delay during taxiing actions: A^t defined with a range of [0, 1 min] and a step of 10 s (7 actions per S^t_i), close to each apron exit.
I A learning trial ends when ε = 0.001 for all agents (1378 episodes).
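Both action sets can be enumerated directly; counting them confirms the 121 off-block and 7 taxiing actions quoted above:

```python
def delay_actions_s(lo_s, hi_s, step_s=10):
    """Delay actions from lo_s to hi_s inclusive, spaced step_s apart."""
    return list(range(lo_s, hi_s + 1, step_s))

# Off-block delays: [-10 min, +10 min] around TTOT - T_ot_e, 10 s steps.
off_block_actions = delay_actions_s(-600, 600)
# Taxiing delays near each apron exit: [0, 1 min], 10 s steps.
taxiing_actions = delay_actions_s(0, 60)
```

Each Parked state thus offers 121 choices, and each taxiing decision point 7.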
![Page 74: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/74.jpg)

All Learning Scenarios: Percentage of Windows Respected

[Figure: percentage of KJFK departure flights that take off inside their window, per learning scenario]
![Page 76: Departure MANagement with a Reinforcement Learning Approach: Respecting CFMU Slots](https://reader030.vdocuments.us/reader030/viewer/2022021500/5870d75f1a28ab64768b6e05/html5/thumbnails/76.jpg)

All Learning Scenarios: Fuel Consumption

[Figure: fuel consumption of KJFK departure flights across learning scenarios]