multi-core scheduling optimizations for soft real-time
![Page 1: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/1.jpg)
Multi-core scheduling optimizations for soft real-time applications
a cooperation aware approach
co-authors:
Liria Matsumoto Sato
Patrick Bellasi
William Fornaciari
Wolfgang Betz
Lucas De Marchi
sponsors:
![Page 2: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/2.jpg)
Agenda
Introduction
  Motivation
  Objectives
Analysis
Optimization description
Experimental results
Conclusions & future works
![Page 3: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/3.jpg)
Introduction – motivation
![Page 4: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/4.jpg)
Introduction – motivation
SMP + RT
  Multiple processing units inside a processor
  Determinism
Parallel Programming Paradigms
  Data Level Parallelism (DLP)
    Competitive tasks
  Task Level Parallelism (TLP)
    Cooperative tasks
![Page 5: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/5.jpg)
Introduction – motivation
[Figure: DLP and TLP task graphs (tasks T0 to T3)]
Characterization:
  Synchronization
  Communication
![Page 6: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/6.jpg)
Introduction – motivation
Linux RT scheduler (mainline)
  Run as soon as possible (based on prio) ⇔ use as many CPUs as possible
  Ok for DLP!
But what about TLP?
Anyway, why do we care about TLP?
![Page 7: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/7.jpg)
Objectives
Study the behavior of RT Linux scheduler for cooperative tasks
Optimize the RT scheduler
  Smooth integration into mainline kernel
  Don't throw away everything
![Page 8: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/8.jpg)
Analysis – benchmark
Simulation of a scenario where SW replaces HW
  Multimedia-like
Mixed workload: DLP + TLP
Challenge: map N tasks to M cores optimally
![Page 9: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/9.jpg)
Analysis – metrics
Throughput
  Sample mean time
Determinism
  Sample variance
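Both metrics can be computed directly from a series of per-sample production times. The following is a minimal sketch, assuming the times are in microseconds and stored in a hypothetical list `times_us`; the values are invented for illustration and are not measurements from the talk.

```python
from statistics import mean, variance


def throughput_and_determinism(times_us):
    """Return (sample mean, sample variance) of per-sample production times.

    A lower mean stands for better throughput, a lower variance for
    better determinism.
    """
    return mean(times_us), variance(times_us)


# Hypothetical production times in microseconds (illustrative only).
times_us = [38.0, 37.5, 38.5, 38.0]
avg, var = throughput_and_determinism(times_us)
print(f"mean = {avg:.2f} us, variance = {var:.3f} us^2")
```

`statistics.variance` uses the unbiased estimator (division by n - 1), which is the usual choice when the samples are a subset of all possible runs.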
![Page 10: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/10.jpg)
Analysis – metrics
Influencing factors
  Preemption-disabled
  IRQ-disabled
  Locality of wake-up
Intel i7 920:
  4 cores
  2.8 GHz
![Page 11: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/11.jpg)
Analysis – locality of wake-up
Migrations
wave0: 112010101010101010101030111301010101010131010321010033
wave1: 033333333333333333333313202133333333333303333202332121
wave2: 022022222222222222222222233302222222222222222213322330
wave3: 111101010101010101010101011110101010101010101010101002
mixer0: 023033333333333333333333333202333333333333333333322233
mixer1: 103302201010101010101010101010330101010101010101010101
mixer2: 230201010101010101010101012020101010101010101010101021
monitor: 111122222222222222222222231112222222222222222233322303
a) migration patterns (wave0, mixer1 and mixer2)
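Traces like these can be summarized by counting how often a task changes CPU. The sketch below assumes that each digit is the CPU on which the task ran at successive activations; that reading is an interpretation of the slide, and the two short traces are made-up excerpts that only mimic the ping-pong and steady patterns visible above.

```python
def count_migrations(trace):
    """Count how often two consecutive entries of a CPU trace differ."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)


def cpus_used(trace):
    """Return the sorted list of distinct CPU ids appearing in the trace."""
    return sorted(set(trace))


# Hypothetical excerpts (not copied from the measurements).
ping_pong = "0101010101"  # task bounces between CPU 0 and CPU 1
steady = "3333333333"     # task stays on CPU 3

for name, trace in (("ping_pong", ping_pong), ("steady", steady)):
    print(name, count_migrations(trace), cpus_used(trace))
```

A task that alternates between two CPUs migrates on nearly every activation, which is the behaviour the locality analysis is concerned with.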
![Page 12: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/12.jpg)
Analysis – locality of wake-up
Migrations
wave0: 302121331102303323303311032320011021111111120101010101
wave1: 223303013333212010121032121012333210320230333333333333
wave2: 033332232302003111201211330021322233323330220222222222
wave3: 110211120211101030031121003030120101101111111010101010
mixer0: 322210301110320231310303021213120013220320230333333333
mixer1: 001003321202220131103203212130320331201031033022010101
mixer2: 230101020201202020121020021010130101201202302010101010
monitor: 113022133333131312032133303222223233123111111222222222
b) occasional migrations
![Page 13: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/13.jpg)
Analysis – locality of wake-up
Cache-miss rate measurements
| Cache-miss rate | 1 CPU | 2 CPUs | 4 CPUs | Increase (1 to 4 CPUs) |
| --- | --- | --- | --- | --- |
| Load | 7.58% | 8.99% | 9.44% | +24.5% |
| Store | 9.29% | 9.78% | 11.62% | +25.1% |
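The increase column is the relative growth of the miss rate from 1 CPU to 4 CPUs, not a difference in percentage points. A short check of the arithmetic, using only the rates reported in the table (the function name is illustrative):

```python
def relative_increase(before_pct, after_pct):
    """Relative growth, in percent, from before_pct to after_pct."""
    return (after_pct - before_pct) / before_pct * 100.0


load = relative_increase(7.58, 9.44)     # load miss rate, 1 CPU vs 4 CPUs
store = relative_increase(9.29, 11.62)   # store miss rate, 1 CPU vs 4 CPUs
print(f"load: +{load:.1f}%, store: +{store:.1f}%")
```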
![Page 14: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/14.jpg)
Analysis – conclusion
Why do we care about TLP?
  Common parallelization technique
What about TLP?
  Current state of Linux scheduler is not as good as we want
![Page 15: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/15.jpg)
Solution – benchmark
Abstraction:
  One application level sends data to another
[Figure: task graph of waves w0 to w3 and mixers m0 to m2, with the two directions labelled "Serialization (task cooperation)" and "Parallelization (task competition)"]
![Page 16: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/16.jpg)
Solution – benchmark
Abstraction:
  One application level sends data to another
Reality:
  shared buffers + synchronization
[Figure: task graph of waves w0 to w3 and mixers m0 to m2 with a shared buffer, with the two directions labelled "Serialization (task cooperation)" and "Parallelization (task competition)"]
![Page 17: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/17.jpg)
Solution – benchmark
Abstraction:
  One application level sends data to another
Reality:
  shared buffers + synchronization
Dependencies:
  Define dependencies among tasks in the opposite way of data flow
[Figure: task graph of waves w0 to w3 and mixers m0 to m2, with the two directions labelled "Serialization (task cooperation)" and "Parallelization (task competition)"]
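Defining dependencies "in the opposite way of data flow" amounts to reversing the edges of the data-flow graph. The sketch below illustrates that step. The wiring in `DATA_FLOW` is an assumption made for the example: only the fact that m0 can depend on w0 and w1 is stated in the slides, the remaining edges are a guess at the figure.

```python
from collections import defaultdict

# Assumed data flow as (producer, consumer) pairs. Hypothetical wiring.
DATA_FLOW = [
    ("w0", "m0"), ("w1", "m0"),
    ("w2", "m1"), ("w3", "m1"),
    ("m0", "m2"), ("m1", "m2"),
]


def dependencies_from_data_flow(edges):
    """Reverse producer -> consumer edges into consumer -> [producers]."""
    deps = defaultdict(list)
    for producer, consumer in edges:
        deps[consumer].append(producer)
    return dict(deps)


deps = dependencies_from_data_flow(DATA_FLOW)
for task in sorted(deps):
    print(task, "depends on", deps[task])
```

Tasks that only produce data (the waves in this wiring) have no entry in the result, since they depend on nobody.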
![Page 18: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/18.jpg)
Solution – idea
Dependency followers
If data do not go to tasks, then the tasks go to where data were produced
Make tasks run on the same CPU as their dependencies
![Page 19: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/19.jpg)
Measurement tools
Ftrace
sched_switch tool (Carsten Emde)*
gtkwave
perf
ad hoc scripts
* http://www.osadl.org/Single-View.111+M5d51b7830c8.0.html
![Page 20: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/20.jpg)
Solution – task-affinity
Dependency followers
Task-affinity:
  the selection of the CPU on which a task (e.g. m0) executes takes into consideration the CPUs on which its dependencies (e.g. w0 and w1) last ran.
[Figure: task graph of waves w0 to w3 and mixers m0 to m2]
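One way to picture the task-affinity rule is as a wake-up decision that looks at where a task's dependencies last ran. The sketch below is a user-space illustration only: the slides say the selection "takes into consideration" those CPUs but do not give the policy, so the majority vote, the tie-break, and the names `select_cpu`, `allowed_cpus` and `fallback_cpu` are all assumptions, not the algorithm of the kernel patch.

```python
from collections import Counter


def select_cpu(dependency_last_cpus, allowed_cpus, fallback_cpu):
    """Pick a CPU for a waking task from the CPUs its dependencies last used.

    Assumed policy: choose the allowed CPU used by most dependencies,
    breaking ties by the lowest CPU id; with no usable candidate, return
    fallback_cpu (standing in for the scheduler's default choice).
    """
    candidates = [cpu for cpu in dependency_last_cpus if cpu in allowed_cpus]
    if not candidates:
        return fallback_cpu
    counts = Counter(candidates)
    best = max(counts.values())
    return min(cpu for cpu, n in counts.items() if n == best)


all_cpus = {0, 1, 2, 3}
print(select_cpu([1, 1], all_cpus, fallback_cpu=0))  # both dependencies ran on CPU 1
print(select_cpu([2, 3], all_cpus, fallback_cpu=0))  # tie, lowest id wins
print(select_cpu([], all_cpus, fallback_cpu=0))      # no dependencies, use fallback
```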
![Page 21: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/21.jpg)
Solution – task-affinity
Implementation
2 lists inside each task_struct:
  taskaffinity_list
  followme_list
2 system calls to add/delete affinities:
  sched_add_taskaffinity
  sched_del_taskaffinity
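The bookkeeping implied by two lists and two calls can be modelled in user space. This is a sketch under assumptions: the slides do not give the system call signatures or the exact meaning of each list, so the model supposes that `taskaffinity_list` holds the tasks a task follows and `followme_list` holds the tasks following it. The `Task` class and the helper names are hypothetical stand-ins, not the kernel code.

```python
class Task:
    """Toy stand-in for the two lists added to task_struct."""

    def __init__(self, name):
        self.name = name
        self.taskaffinity_list = []  # assumed: tasks this task follows
        self.followme_list = []      # assumed: tasks that follow this task


def add_taskaffinity(follower, dependency):
    """Model of adding an affinity: keep both lists in sync."""
    if dependency not in follower.taskaffinity_list:
        follower.taskaffinity_list.append(dependency)
        dependency.followme_list.append(follower)


def del_taskaffinity(follower, dependency):
    """Model of deleting an affinity: undo the corresponding add."""
    if dependency in follower.taskaffinity_list:
        follower.taskaffinity_list.remove(dependency)
        dependency.followme_list.remove(follower)


w0, w1, m0 = Task("w0"), Task("w1"), Task("m0")
add_taskaffinity(m0, w0)
add_taskaffinity(m0, w1)
del_taskaffinity(m0, w1)
print([t.name for t in m0.taskaffinity_list], [t.name for t in w0.followme_list])
```

Keeping a back-reference list makes it cheap to find every follower when a dependency exits, which is one plausible reason for storing both directions.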
![Page 22: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/22.jpg)
Experimental results
Cache-miss rates
Measurements without and with task-affinity

| % of store cache-miss | 1 core | 2 cores | 4 cores |
| --- | --- | --- | --- |
| mainline | 9.29% | 9.78% | 11.62% |
| task-affinity | 10.17% | 8.68% | 10.35% |

| % of load cache-miss | 1 core | 2 cores | 4 cores |
| --- | --- | --- | --- |
| mainline | 7.58% | 8.99% | 9.44% |
| task-affinity | 7.92% | 7.88% | 8.58% |

More data/instructions hit the L1 cache
Fewer cache lines are invalidated
![Page 23: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/23.jpg)
Experimental results
What exactly to evaluate?
[Figure: task graph of waves w0 to w3 and mixers m0 to m2]
Cache-miss rate is not exactly what we want to optimize
Optimization objectives:
  Lower the time to produce a single sample
  Increase determinism on production of several samples
![Page 24: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/24.jpg)
Experimental results
Average execution time of each task

| average execution time (µs) | wave0 | wave1 | wave2 | wave3 | mixer0 | mixer1 | mixer2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| task-affinity | 8.81 | 9.47 | 9.21 | 9.19 | 16.34 | 15.41 | 17.55 |
| mainline | 9.15 | 9.64 | 9.47 | 9.84 | 17.39 | 16.69 | 17.82 |
![Page 25: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/25.jpg)
Experimental results
Variance of execution time of each task

| variance of the execution time (µs)² | wave0 | wave1 | wave2 | wave3 | mixer0 | mixer1 | mixer2 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| task-affinity | 0.22 | 0.31 | 0.29 | 0.36 | 0.61 | 0.38 | 0.35 |
| mainline | 0.57 | 0.51 | 0.48 | 0.83 | 1.45 | 2.19 | 0.47 |
![Page 26: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/26.jpg)
Experimental results
Production time of each single sample
[Figure: production time of each sample, one panel for mainline and one for task-affinity]
Results obtained for 150,000 samples
![Page 27: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/27.jpg)
Experimental results
Production time of each single sample
Empirical distribution function
Real-time metric (normal distribution):
  average + 2 * standard deviation
![Page 28: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/28.jpg)
[Figure: distribution of the sample production time for mainline and for task-affinity; x axis: time (µs), y axis: % of samples. The mainline plot spans 0 to 114 µs with a marker at 41.2 µs; the task-affinity plot spans 0 to 51 µs with a marker at 38.96 µs.]

| | Max | Min | Avg | Variance |
| --- | --- | --- | --- | --- |
| mainline | 114 | 29 | 37.8 | 3.22 |
| task-affinity | 51 | 35 | 38.0 | 0.21 |
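The real-time metric, average + 2 * standard deviation, can be recomputed from the summary statistics on this slide. The sketch below assumes the averages are in µs and the variances in µs². Because it starts from the rounded values shown here, its results differ slightly from the 41.2 µs and 38.96 µs marked on the plots.

```python
import math


def rt_metric(avg_us, variance_us2):
    """average + 2 * standard deviation, with the deviation taken as sqrt(variance)."""
    return avg_us + 2.0 * math.sqrt(variance_us2)


# Rounded summary statistics as reported on the slide.
mainline = {"avg": 37.8, "variance": 3.22}
task_affinity = {"avg": 38.0, "variance": 0.21}

mainline_metric = rt_metric(mainline["avg"], mainline["variance"])
task_affinity_metric = rt_metric(task_affinity["avg"], task_affinity["variance"])
variance_ratio = mainline["variance"] / task_affinity["variance"]

print(f"mainline:      {mainline_metric:.2f} us")
print(f"task-affinity: {task_affinity_metric:.2f} us")
print(f"variance ratio (mainline / task-affinity): {variance_ratio:.1f}")
```

Under a normal distribution, roughly 97.7% of samples fall below average + 2 * standard deviation, which is why the metric works as a soft bound on production time.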
![Page 29: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/29.jpg)
Experimental results
Summary
~15x
![Page 30: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/30.jpg)
Conclusion & future works
Average execution time is almost the same
Determinism for real-time applications is improved
Future works:
  Better focus on temporal locality
  Improve task-affinity configuration
  Test on other architectures
  Clean up the repository
![Page 31: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/31.jpg)
Conclusion & future works
Still a Work In Progress
Git repository:
  git://git.politreco.com/linux-lcs.git
Contact:
  [email protected]@gmail.com
![Page 32: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/32.jpg)
Q & A
![Page 33: Multi-core scheduling optimizations for soft real-time](https://reader034.vdocuments.us/reader034/viewer/2022042702/6265ec8c75e822031f73c7f7/html5/thumbnails/33.jpg)
Solution – Linux scheduler
Dependency followers
Linux scheduler: change the decision process of the CPU on which a task executes when it is woken up
[Figure: Linux scheduler structure. The core scheduler (main scheduler and periodic scheduler) selects a task and performs the context switch on CPUi. Scheduler classes: RT, Fair, Idle. Scheduler policies: RR, FIFO, Normal, Batch, Idle.]