Research on Embedded Hypervisor Scheduler Techniques
2014/10/02
1
Background
Asymmetric multi-core is becoming increasingly popular over homogeneous multi-core systems.
◦An asymmetric multi-core platform consists of cores with different capabilities.
 ARM: big.LITTLE architecture.
 Qualcomm: asynchronous Symmetrical Multi-Processing (aSMP).
 Nvidia: variable Symmetric Multiprocessing (vSMP).
 …etc.
2
Motivation
Scheduling goals differ between homogeneous and asymmetric multi-core platforms.
◦Homogeneous multi-core: load balancing.
 Distribute workloads evenly in order to obtain maximum performance.
◦Asymmetric multi-core: maximize power efficiency with modest performance sacrifices.
3
Motivation (Cont.)
New scheduling strategies are needed for asymmetric multi-core platforms.
◦The power and computing characteristics vary across different types of cores.
◦Take these differences into consideration while scheduling.
4
Project Goal
Research the current scheduling algorithms for homogeneous and asymmetric multi-core architectures.
Design and implement a hypervisor scheduler for asymmetric multi-core platforms.
◦Assign virtual cores to physical cores for execution.
◦Minimize power consumption with a performance guarantee.
5
Hypervisor Architecture with VMI
[Figure: two guest VMs (each with an OS kernel such as the Linaro Linux kernel, an Android framework, a guest scheduler, and vCPUs) run on the hypervisor, which schedules onto an ARM Cortex-A15 cluster (performance) and an ARM Cortex-A7 cluster (power-saving).]
◦The VM Introspector (VMI) gathers task information from the guest OS.
◦The Task-to-vCPU Mapper modifies the CPU mask of each task (e.g. [1|0] vs. [0|1]) according to the task information from the VMI, so tasks with low computing resource requirements are gathered on particular vCPUs.
◦A vCPU where only tasks with low computing requirements are scheduled is treated as a LITTLE core.
◦The hypervisor b-L vCPU scheduler schedules big vCPUs to the A15 and LITTLE vCPUs to the A7.
6
Hypervisor Scheduler
Assigns the virtual cores to physical cores for execution.
◦Determines the execution order and the amount of time assigned to each virtual core according to a scheduling policy.
◦Xen - credit-based scheduler
◦KVM - completely fair scheduler
7
Virtual Core Scheduling Problem
For every time period, the hypervisor scheduler is given a set of virtual cores.
Given the operating frequency of each virtual core, the scheduler generates a scheduling plan such that power consumption is minimized and performance is guaranteed.
8
Scheduling Plan
A scheduling plan must satisfy three constraints.
◦Each virtual core should run on each physical core for a certain amount of time to satisfy the workload requirement.
◦A virtual core can run on only a single physical core at any time.
◦A virtual core should not switch among physical cores frequently, so as to reduce overheads.
9
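The three constraints above can be checked mechanically. The sketch below assumes an illustrative plan representation (a list of execution slices, each mapping physical core to vCPU, with None for idle); the names and layout are mine, not the authors'. The first two constraints are hard checks; the third is soft, so the function just counts core switches.

```python
# Sketch: validate a scheduling plan against the three constraints above.
# A plan is a list of execution slices; each slice maps pcore -> vcpu
# (None = idle). This representation is illustrative, not from the deck.

def validate_plan(plan, required):
    """plan: list of dicts {pcore: vcpu or None};
    required: dict {(vcpu, pcore): number of slices needed}.
    Returns (ok, number_of_core_switches)."""
    served = {}
    switches = 0
    prev_core = {}          # vcpu -> core it ran on in the previous slice
    for slice_ in plan:
        running = [v for v in slice_.values() if v is not None]
        # Constraint 2: a vCPU may occupy only one physical core at a time.
        if len(running) != len(set(running)):
            return False, switches
        cur_core = {v: p for p, v in slice_.items() if v is not None}
        for vcpu, pcore in cur_core.items():
            served[vcpu, pcore] = served.get((vcpu, pcore), 0) + 1
            # Constraint 3 (soft): count migrations between adjacent slices.
            if vcpu in prev_core and prev_core[vcpu] != pcore:
                switches += 1
        prev_core = cur_core
    # Constraint 1: each vCPU got its required time on each physical core.
    ok = all(served.get(k, 0) >= need for k, need in required.items())
    return ok, switches
```

A Phase 3 reordering would then try to reduce the returned switch count without changing the `served` totals.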
Example of A Scheduling Plan
◦x: physical core idle

        t1  t2  t3  t4  …  t100
Core0   V4  x   x   V4  …  x
Core1   V3  x   V3  x   …  x
Core2   V2  V4  V1  V2  …  V4
Core3   V1  V1  V2  V3  …  V1
(each column is one execution slice)
Three-phase Solution
[Phase 1] Generate the amount of time each virtual core should run on each physical core.
[Phase 2] Determine the execution order of the virtual cores on each physical core.
[Phase 3] Exchange the order of execution slices in order to reduce the number of core switchings.
11
Phase 1
Given the objective function and the constraints, we can use integer programming to find ai,j.
◦ai,j: the number of time slices virtual core i should run on physical core j (a time interval is divided into time slices).
◦Integer programming can find a feasible solution in a short time when the numbers of vCPUs and pCPUs are small constants.
12
Phase 1 (Cont.)
If the relationship between power and load is linear:
◦Use a greedy method instead.
◦Assign each virtual core to the physical core with the least power/instruction ratio and a load under 100%.
13
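A minimal sketch of this greedy rule, under assumptions of mine: each vCPU's load is a fraction of one physical core, each core's capacity is 100%, and heavier vCPUs are placed first to reduce fragmentation. The data values and names are illustrative.

```python
# Sketch of the Phase 1 greedy (usable when power scales linearly with
# load): place each vCPU on the physical core with the lowest
# power-per-instruction ratio that still has capacity (load under 100%).

def greedy_assign(vcpus, pcores):
    """vcpus: list of (name, load) with load as a fraction of one core;
    pcores: list of (name, power_per_instruction).
    Returns {vcpu_name: pcore_name}."""
    # Prefer cores with the smallest power/instruction ratio (LITTLE cores).
    order = sorted(pcores, key=lambda c: c[1])
    load = {name: 0.0 for name, _ in pcores}
    placement = {}
    # Place heavy vCPUs first so they are not blocked by fragmentation.
    for vname, vload in sorted(vcpus, key=lambda v: -v[1]):
        for cname, _ in order:
            if load[cname] + vload <= 1.0:     # keep core load under 100%
                load[cname] += vload
                placement[vname] = cname
                break
        else:
            raise RuntimeError("not enough capacity for " + vname)
    return placement
```

With two light vCPUs and one small one, the power-efficient core fills up first and the spill-over lands on the big core.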
Phase 2
With the information from Phase 1, the scheduler has to determine the execution order of the virtual cores on each physical core.
◦A virtual core cannot appear on two or more physical cores at the same time.
14
Example
[Figure: the time slices to schedule within one interval from t=0 to t=100. Each vCPU's required slices on the four physical cores:]
vCPU0: (50, 40,  0,  0)
vCPU1: (20, 20, 20, 20)
vCPU2: (10, 10, 20, 20)
vCPU3: (10, 10, 20, 20)
vCPU4: (10, 10, 10, 10)
vCPU5: ( 0,  0, 10, 10)
Phase 2 (Cont.)
We can formulate the problem as an open-shop scheduling problem (OSSP).
◦OSSP with preemption can be solved in polynomial time. [1]
16
[1] T. Gonzalez and S. Sahni. Open shop scheduling to minimize finish time. J. ACM, 23(4):665–679, Oct. 1976.
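To make the constraint concrete, here is a simple slice-by-slice decomposition of the Phase 1 output. This is my own greedy sketch, not the polynomial-time Gonzalez–Sahni open-shop algorithm cited above: it guarantees that no vCPU appears on two cores in the same slice, but it does not guarantee a minimum number of slices.

```python
# Sketch of Phase 2 as slice decomposition: given a[(vcpu, pcore)] = number
# of unit slices that vCPU needs on that pCPU (the Phase 1 output), emit
# execution slices in which each vCPU runs on at most one core.
# Illustrative greedy only; not the OSSP algorithm from the citation.

def decompose(a):
    """a: dict {(vcpu, pcore): unit slices needed}.
    Yields slices as dicts {pcore: vcpu}."""
    remaining = {k: v for k, v in a.items() if v > 0}
    pcores = sorted({p for _, p in remaining})
    while remaining:
        used, slice_ = set(), {}
        for p in pcores:
            # On each core, run the vCPU with the most remaining work there
            # that is not already running elsewhere in this slice.
            cands = [(n, v) for (v, q), n in remaining.items()
                     if q == p and v not in used]
            if cands:
                n, v = max(cands)
                slice_[p] = v
                used.add(v)
                remaining[v, p] -= 1
                if remaining[v, p] == 0:
                    del remaining[v, p]
        yield slice_
```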
After Phase 1 & 2
After the first two phases, the scheduler generates a scheduling plan.
◦x: physical core idle
17

        t1  t2  t3  t4  …  t100
Core0   V4  x   x   V4  …  x
Core1   V3  x   V3  x   …  x
Core2   V2  V4  V1  V2  …  V4
Core3   V1  V1  V2  V3  …  V1
(each column is one execution slice)
Phase 3
Migrating tasks between cores incurs overhead.
Reduce the overhead by exchanging the order of execution slices to minimize the number of core switchings.
18
Number of Switching Minimization Problem
Given a scheduling plan, we want to find an order of the execution slices such that the switching cost is minimized.
◦An NP-complete problem.
 Reduction from the Hamiltonian Path Problem.
◦We propose a greedy heuristic.
19
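A sketch of this greedy heuristic: repeatedly append the unused execution slice that adds the fewest switches relative to the last slice placed. This is my reconstruction, not the authors' code; in particular, it counts a switch only when a vCPU runs on different cores in two consecutive slices, which may differ from the exact cost used in the example slides.

```python
# Greedy heuristic sketch for the switching minimization problem.
# A slice is a tuple indexed by physical core; None marks an idle core.

def switches_between(prev, cur):
    """Count vCPUs that run in both slices but on different cores."""
    pos_prev = {v: i for i, v in enumerate(prev) if v is not None}
    return sum(1 for i, v in enumerate(cur)
               if v is not None and v in pos_prev and pos_prev[v] != i)

def order_slices(slices):
    """Order execution slices to (heuristically) minimize core switches."""
    remaining = list(slices)
    plan = [remaining.pop(0)]           # seed with the first slice
    total = 0
    while remaining:
        # Pick the remaining slice cheapest to append after plan[-1].
        cost, idx = min((switches_between(plan[-1], s), i)
                        for i, s in enumerate(remaining))
        total += cost
        plan.append(remaining.pop(idx))
    return plan, total
```

Each step scans all remaining slices, so the heuristic runs in O(n²) slice comparisons, versus the exponential enumeration needed for an optimal order.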
Example
[Figure, spanning several slides: a step-by-step run of the greedy heuristic. The unordered execution slices, one vCPU per physical core (p1, p2, p3) with x for idle, are:
(1,2,3), (3,1,2), (x,2,3), (x,x,1), (4,3,2), (2,1,3).
At each step the heuristic scores every remaining slice by the number of switchings it would add after the last slice placed, and picks the cheapest:
◦(x,x,1) is placed first at t1. #switching = 0.
◦Scores against (x,x,1): (1,2,3)→1, (3,1,2)→1, (x,2,3)→0, (4,3,2)→0, (2,1,3)→1. (x,2,3) is placed at t2. #switching = 0.
◦Scores against (x,2,3): (1,2,3)→1, (3,1,2)→3, (4,3,2)→2, (2,1,3)→2. (1,2,3) is placed at t3. #switching = 1.
◦The remaining slices are placed at t4–t6: (4,3,2), (3,1,2), (2,1,3).
Final order t1…t6: (x,x,1), (x,2,3), (1,2,3), (4,3,2), (3,1,2), (2,1,3), with #switching = 7.]
Evaluation
Conduct simulations to compare the power consumption of our asymmetry-aware scheduler with that of a credit-based scheduler.
Compare the number of core switchings from our greedy heuristic with that from an optimal solution.
28
Environment
Two types of physical cores:
◦power-hungry "big" cores, frequency: 1600MHz
◦power-efficient "little" cores, frequency: 600MHz
◦The DVFS mechanism is disabled.
30
Power Model
Relation between power consumption, core frequency, and load.
◦Benchmark: bzip2
[Chart: power consumption (Watt) vs. load (%), with linear fits, measured at 250MHz, 600MHz, 800MHz, and 1600MHz.]
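The chart suggests that at a fixed frequency, power grows roughly linearly with load, which is the condition under which the Phase 1 greedy applies. A minimal sketch of such a model follows; the fit coefficients are placeholders of mine, not the measured bzip2 numbers.

```python
# Sketch of a per-frequency linear power model: P(load) ≈ idle + slope*load.
# The coefficients are illustrative placeholders, not measured data.

LINEAR_FIT = {          # frequency (MHz) -> (idle power W, W per % load)
    250: (0.10, 0.004),
    600: (0.15, 0.008),
    1600: (0.40, 0.018),
}

def estimated_power(freq_mhz, load_pct):
    """Estimate core power (Watt) at a given frequency and load."""
    idle, slope = LINEAR_FIT[freq_mhz]
    return idle + slope * load_pct
```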
Scenario I – 2 Big and 2 Little
Dual-core VMs. Two sets of input:
◦Case 1: Both VMs with light workloads. 250MHz for each virtual core.
◦Case 2: One VM with heavy workloads, the other with modest workloads. Heavy: 1200MHz for each virtual core. Modest: 600MHz for each virtual core.
32
Scenario I - Results
◦Case 1: the asymmetry-aware method consumes about 43.2% of the power of the credit-based method.
◦Case 2: the asymmetry-aware method uses 95.6% of the energy used by the credit-based method.
33

                                         Power (Watt)
Case 1 (Light-load VMs)
  Asymmetry-aware                        0.295
  Credit-based                           0.683
Case 2 (Heavy-load VM + Modest-load VM)
  Asymmetry-aware                        2.382
  Credit-based                           2.491
Scenario 2 – 4 Big and 4 Little
Quad-core VMs. Three cases:
34

                      VM1          VM2          VM3
Case 1 (Light-load)   All 250MHz   All 250MHz   All 250MHz
Case 2 (Modest-load)  All 600MHz   All 600MHz   All 250MHz
Case 3 (Heavy-load)   All 1600MHz  All 1600MHz  All 1600MHz
Scenario 2 - Results
In Case 3, the load of the physical cores is 100% under both methods.
◦Power cannot be saved if the computing resources are not sufficient.
35

                       Power (Watt)  Savings
Case 1 (Light-load)
  Asymmetry-aware      1.205         41.2%
  Credit-based         2.049
Case 2 (Modest-load)
  Asymmetry-aware      3.524         11.1%
  Credit-based         3.960
Case 3* (Heavy-load)
  Asymmetry-aware      6.009         0%
  Credit-based         6.009
Evaluation
Conduct simulations to compare the power consumption of our asymmetry-aware scheduler with that of a credit-based scheduler.
Compare the number of core switchings from our greedy heuristic with that from an optimal solution.
36
Setting
25 sets of input.
◦4 physical cores, 12 virtual cores, 24 distinct execution slices.
Optimal solution:
◦Enumerates all possible permutations of the execution slices.
◦Uses A* search to reduce the search space.
Evaluation Result
38

                              Greedy Heuristic  A* Search
Average number of switchings  31.2              27.7
Average execution time        0.006 seconds     10+ minutes
XEN HYPERVISOR SCHEDULER: CODE STUDY
39
Xen Hypervisor
Scheduler:
◦xen/common/
 schedule.c
 sched_credit.c
 sched_credit2.c
 sched_sedf.c
 sched_arinc653.c
40
xen/common/schedule.c
Generic CPU scheduling code.
◦Implements support functionality for the Xen scheduler API.
◦Default scheduler: the credit-based scheduler.
static void schedule(void)
◦De-schedules the current domain.
◦Picks a new domain.
41
xen/common/sched_credit.c
Credit-based SMP CPU scheduler.
static struct task_slice csched_schedule;
◦Implementation of credit-based scheduling.
◦SMP load balancing:
 If the next highest-priority local runnable VCPU has already eaten through its credits, look on other PCPUs to see if we have more urgent work.
42
xen/common/sched_credit2.c
Credit-based SMP CPU scheduler.
◦Based on an earlier version.
static struct task_slice csched2_schedule;
◦Selects the next runnable local VCPU (i.e. the top of the local run queue).
static void balance_load(const struct scheduler *ops, int cpu, s_time_t now);
43
Scheduling Steps
Xen calls do_schedule() of the current scheduler on each physical CPU (PCPU).
The scheduler selects a virtual CPU (VCPU) from the run queue and returns it to the Xen hypervisor.
The Xen hypervisor deploys the VCPU to the current PCPU.
44
Adding Our Scheduler
Our scheduler periodically generates a scheduling plan.
Organize the run queue of each physical core according to the scheduling plan.
The Xen hypervisor assigns VCPUs to PCPUs according to the run queues.
45
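The run-queue organization described above can be sketched as follows. This is in Python for brevity and is purely illustrative; an actual implementation would live in C inside xen/common/ and use Xen's own scheduler hooks and data structures.

```python
# Sketch: turn a scheduling plan into per-pCPU run queues that a
# do_schedule()-style hook can pop from on each physical CPU.
# Illustrative only; the real scheduler would be C code inside Xen.

from collections import defaultdict, deque

def build_runqueues(plan):
    """plan: list of slices, each {pcore: vcpu or None}.
    Returns {pcore: deque of vcpus in execution order} (idle slots skipped)."""
    runq = defaultdict(deque)
    for slice_ in plan:
        for pcore, vcpu in slice_.items():
            if vcpu is not None:
                runq[pcore].append(vcpu)
    return runq

def do_schedule(runq, pcore):
    """Pick the next vCPU for this pCPU, or None to idle the core."""
    return runq[pcore].popleft() if runq[pcore] else None
```

Each physical CPU then consumes its own queue in plan order, which is how the periodic plan drives the per-PCPU scheduling decisions.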
Current Status
We propose a three-phase solution for generating a scheduling plan on asymmetric multi-core platforms.
Our simulation results show that the asymmetry-aware strategy yields potential energy savings of up to 56.8% over the credit-based method.
Ongoing: implementing the solution in the Xen hypervisor.
46
Questions or Comments?
47