Research on Embedded Hypervisor Scheduler Techniques
2014/10/02
1
Background
Asymmetric multi-core is becoming increasingly popular over homogeneous multi-core systems.
◦An asymmetric multi-core platform consists of cores with different capabilities.
 ARM: big.LITTLE architecture.
 Qualcomm: asynchronous Symmetrical Multi-Processing (aSMP).
 Nvidia: variable Symmetric Multiprocessing (vSMP).
 …etc.
2
Motivation
Scheduling goals differ between homogeneous and asymmetric multi-core platforms.
◦Homogeneous multi-core: load balancing.
 Distribute workloads evenly in order to obtain maximum performance.
◦Asymmetric multi-core: maximize power efficiency with modest performance sacrifices.
3
Motivation (Cont.)
New scheduling strategies are needed for asymmetric multi-core platforms.
◦The power and computing characteristics vary across different types of cores.
◦Take these differences into consideration while scheduling.
4
Project Goal
Research the current scheduling algorithms for homogeneous and asymmetric multi-core architectures.
Design and implement a hypervisor scheduler for asymmetric multi-core platforms.
◦Assign virtual cores to physical cores for execution.
◦Minimize power consumption with a performance guarantee.
5
Hypervisor Architecture with VMI
[Figure: two guest VMs (each with an OS kernel such as the Linaro Linux kernel, an Android framework, a guest scheduler, and vCPUs) run on the hypervisor, which schedules onto an ARM Cortex-A15 cluster (performance) and an ARM Cortex-A7 cluster (power-saving).]
◦The VM Introspector (VMI) gathers task information from the guest OS.
◦The Task-to-vCPU Mapper modifies the CPU mask of each task (e.g. [1|0] vs. [0|1]) according to the task information from the VMI, so tasks with low computing resource requirements are gathered on particular vCPUs.
◦A vCPU where only tasks with low computing requirements are scheduled is treated as a LITTLE core.
◦The hypervisor b-L vCPU scheduler schedules big vCPUs to the A15 and LITTLE vCPUs to the A7.
6
Hypervisor Scheduler
Assigns the virtual cores to physical cores for execution.
◦Determines the execution order and the amount of time assigned to each virtual core according to a scheduling policy.
◦Xen - credit-based scheduler
◦KVM - completely fair scheduler
7
Virtual Core Scheduling Problem
For every time period, the hypervisor scheduler is given a set of virtual cores.
Given the operating frequency of each virtual core, the scheduler generates a scheduling plan such that power consumption is minimized and performance is guaranteed.
8
Scheduling Plan
A scheduling plan must satisfy three constraints.
◦Each virtual core should run on each physical core for a certain amount of time to satisfy the workload requirement.
◦A virtual core can run on only a single physical core at any time.
◦A virtual core should not switch among physical cores frequently, so as to reduce overheads.
9
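The three constraints above can be checked mechanically. The sketch below assumes an illustrative plan representation (a list of execution slices, each mapping physical core to vCPU, with None for idle); the names and layout are mine, not the authors'. The first two constraints are hard checks; the third is soft, so the function just counts core switches.

```python
# Sketch: validate a scheduling plan against the three constraints above.
# A plan is a list of execution slices; each slice maps pcore -> vcpu
# (None = idle). This representation is illustrative, not from the deck.

def validate_plan(plan, required):
    """plan: list of dicts {pcore: vcpu or None};
    required: dict {(vcpu, pcore): number of slices needed}.
    Returns (ok, number_of_core_switches)."""
    served = {}
    switches = 0
    prev_core = {}          # vcpu -> core it ran on in the previous slice
    for slice_ in plan:
        running = [v for v in slice_.values() if v is not None]
        # Constraint 2: a vCPU may occupy only one physical core at a time.
        if len(running) != len(set(running)):
            return False, switches
        cur_core = {v: p for p, v in slice_.items() if v is not None}
        for vcpu, pcore in cur_core.items():
            served[vcpu, pcore] = served.get((vcpu, pcore), 0) + 1
            # Constraint 3 (soft): count migrations between adjacent slices.
            if vcpu in prev_core and prev_core[vcpu] != pcore:
                switches += 1
        prev_core = cur_core
    # Constraint 1: each vCPU got its required time on each physical core.
    ok = all(served.get(k, 0) >= need for k, need in required.items())
    return ok, switches
```

A Phase 3 reordering would then try to reduce the returned switch count without changing the `served` totals.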
Example of A Scheduling Plan
◦x: physical core idle

        t1  t2  t3  t4  …  t100
Core0   V4  x   x   V4  …  x
Core1   V3  x   V3  x   …  x
Core2   V2  V4  V1  V2  …  V4
Core3   V1  V1  V2  V3  …  V1
(each column is one execution slice)
Three-phase Solution
[Phase 1] Generate the amount of time each virtual core should run on each physical core.
[Phase 2] Determine the execution order of the virtual cores on each physical core.
[Phase 3] Exchange the order of execution slices in order to reduce the number of core switchings.
11
Phase 1
Given the objective function and the constraints, we can use integer programming to find ai,j.
◦ai,j: the number of time slices virtual core i should run on physical core j (a time interval is divided into time slices).
◦Integer programming can find a feasible solution in a short time when the numbers of vCPUs and pCPUs are small constants.
12
Phase 1 (Cont.)
If the relationship between power and load is linear:
◦Use a greedy method instead.
◦Assign each virtual core to the physical core with the least power/instruction ratio and a load under 100%.
13
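A minimal sketch of this greedy rule, under assumptions of mine: each vCPU's load is a fraction of one physical core, each core's capacity is 100%, and heavier vCPUs are placed first to reduce fragmentation. The data values and names are illustrative.

```python
# Sketch of the Phase 1 greedy (usable when power scales linearly with
# load): place each vCPU on the physical core with the lowest
# power-per-instruction ratio that still has capacity (load under 100%).

def greedy_assign(vcpus, pcores):
    """vcpus: list of (name, load) with load as a fraction of one core;
    pcores: list of (name, power_per_instruction).
    Returns {vcpu_name: pcore_name}."""
    # Prefer cores with the smallest power/instruction ratio (LITTLE cores).
    order = sorted(pcores, key=lambda c: c[1])
    load = {name: 0.0 for name, _ in pcores}
    placement = {}
    # Place heavy vCPUs first so they are not blocked by fragmentation.
    for vname, vload in sorted(vcpus, key=lambda v: -v[1]):
        for cname, _ in order:
            if load[cname] + vload <= 1.0:     # keep core load under 100%
                load[cname] += vload
                placement[vname] = cname
                break
        else:
            raise RuntimeError("not enough capacity for " + vname)
    return placement
```

With two light vCPUs and one small one, the power-efficient core fills up first and the spill-over lands on the big core.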
Phase 2
With the information from Phase 1, the scheduler has to determine the execution order of the virtual cores on each physical core.
◦A virtual core cannot appear on two or more physical cores at the same time.
14
Example
[Figure: the time slices to schedule within one interval from t=0 to t=100. Each vCPU's required slices on the four physical cores:]
vCPU0: (50, 40,  0,  0)
vCPU1: (20, 20, 20, 20)
vCPU2: (10, 10, 20, 20)
vCPU3: (10, 10, 20, 20)
vCPU4: (10, 10, 10, 10)
vCPU5: ( 0,  0, 10, 10)
Phase 2 (Cont.)
We can formulate the problem as an open-shop scheduling problem (OSSP).
◦OSSP with preemption can be solved in polynomial time. [1]
16
[1] T. Gonzalez and S. Sahni. Open shop scheduling to minimize finish time. J. ACM, 23(4):665–679, Oct. 1976.
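To make the constraint concrete, here is a simple slice-by-slice decomposition of the Phase 1 output. This is my own greedy sketch, not the polynomial-time Gonzalez–Sahni open-shop algorithm cited above: it guarantees that no vCPU appears on two cores in the same slice, but it does not guarantee a minimum number of slices.

```python
# Sketch of Phase 2 as slice decomposition: given a[(vcpu, pcore)] = number
# of unit slices that vCPU needs on that pCPU (the Phase 1 output), emit
# execution slices in which each vCPU runs on at most one core.
# Illustrative greedy only; not the OSSP algorithm from the citation.

def decompose(a):
    """a: dict {(vcpu, pcore): unit slices needed}.
    Yields slices as dicts {pcore: vcpu}."""
    remaining = {k: v for k, v in a.items() if v > 0}
    pcores = sorted({p for _, p in remaining})
    while remaining:
        used, slice_ = set(), {}
        for p in pcores:
            # On each core, run the vCPU with the most remaining work there
            # that is not already running elsewhere in this slice.
            cands = [(n, v) for (v, q), n in remaining.items()
                     if q == p and v not in used]
            if cands:
                n, v = max(cands)
                slice_[p] = v
                used.add(v)
                remaining[v, p] -= 1
                if remaining[v, p] == 0:
                    del remaining[v, p]
        yield slice_
```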
After Phase 1 & 2
After the first two phases, the scheduler generates a scheduling plan.
◦x: physical core idle
17

        t1  t2  t3  t4  …  t100
Core0   V4  x   x   V4  …  x
Core1   V3  x   V3  x   …  x
Core2   V2  V4  V1  V2  …  V4
Core3   V1  V1  V2  V3  …  V1
(each column is one execution slice)
Phase 3
Migrating tasks between cores incurs overhead.
Reduce the overhead by exchanging the order of execution slices to minimize the number of core switchings.
18
Number of Switching Minimization Problem
Given a scheduling plan, we want to find an order of the execution slices such that the switching cost is minimized.
◦An NP-complete problem.
 Reduction from the Hamiltonian Path Problem.
◦We propose a greedy heuristic.
19
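A sketch of this greedy heuristic: repeatedly append the unused execution slice that adds the fewest switches relative to the last slice placed. This is my reconstruction, not the authors' code; in particular, it counts a switch only when a vCPU runs on different cores in two consecutive slices, which may differ from the exact cost used in the example slides.

```python
# Greedy heuristic sketch for the switching minimization problem.
# A slice is a tuple indexed by physical core; None marks an idle core.

def switches_between(prev, cur):
    """Count vCPUs that run in both slices but on different cores."""
    pos_prev = {v: i for i, v in enumerate(prev) if v is not None}
    return sum(1 for i, v in enumerate(cur)
               if v is not None and v in pos_prev and pos_prev[v] != i)

def order_slices(slices):
    """Order execution slices to (heuristically) minimize core switches."""
    remaining = list(slices)
    plan = [remaining.pop(0)]           # seed with the first slice
    total = 0
    while remaining:
        # Pick the remaining slice cheapest to append after plan[-1].
        cost, idx = min((switches_between(plan[-1], s), i)
                        for i, s in enumerate(remaining))
        total += cost
        plan.append(remaining.pop(idx))
    return plan, total
```

Each step scans all remaining slices, so the heuristic runs in O(n²) slice comparisons, versus the exponential enumeration needed for an optimal order.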
Example
[Figure, spanning several slides: a step-by-step run of the greedy heuristic. The unordered execution slices, one vCPU per physical core (p1, p2, p3) with x for idle, are:
(1,2,3), (3,1,2), (x,2,3), (x,x,1), (4,3,2), (2,1,3).
At each step the heuristic scores every remaining slice by the number of switchings it would add after the last slice placed, and picks the cheapest:
◦(x,x,1) is placed first at t1. #switching = 0.
◦Scores against (x,x,1): (1,2,3)→1, (3,1,2)→1, (x,2,3)→0, (4,3,2)→0, (2,1,3)→1. (x,2,3) is placed at t2. #switching = 0.
◦Scores against (x,2,3): (1,2,3)→1, (3,1,2)→3, (4,3,2)→2, (2,1,3)→2. (1,2,3) is placed at t3. #switching = 1.
◦The remaining slices are placed at t4–t6: (4,3,2), (3,1,2), (2,1,3).
Final order t1…t6: (x,x,1), (x,2,3), (1,2,3), (4,3,2), (3,1,2), (2,1,3), with #switching = 7.]
Evaluation
Conduct simulations to compare the power consumption of our asymmetry-aware scheduler with that of a credit-based scheduler.
Compare the number of core switchings from our greedy heuristic with that from an optimal solution.
28
Environment
Two types of physical cores:
◦power-hungry "big" cores, frequency: 1600MHz
◦power-efficient "little" cores, frequency: 600MHz
◦The DVFS mechanism is disabled.
30
Power Model
Relation between power consumption, core frequency, and load.
◦Benchmark: bzip2
[Chart: power consumption (Watt) vs. load (%), with linear fits, measured at 250MHz, 600MHz, 800MHz, and 1600MHz.]
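The chart suggests that at a fixed frequency, power grows roughly linearly with load, which is the condition under which the Phase 1 greedy applies. A minimal sketch of such a model follows; the fit coefficients are placeholders of mine, not the measured bzip2 numbers.

```python
# Sketch of a per-frequency linear power model: P(load) ≈ idle + slope*load.
# The coefficients are illustrative placeholders, not measured data.

LINEAR_FIT = {          # frequency (MHz) -> (idle power W, W per % load)
    250: (0.10, 0.004),
    600: (0.15, 0.008),
    1600: (0.40, 0.018),
}

def estimated_power(freq_mhz, load_pct):
    """Estimate core power (Watt) at a given frequency and load."""
    idle, slope = LINEAR_FIT[freq_mhz]
    return idle + slope * load_pct
```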
Scenario I – 2 Big and 2 Little
Dual-core VMs. Two sets of input:
◦Case 1: Both VMs with light workloads. 250MHz for each virtual core.
◦Case 2: One VM with heavy workloads, the other with modest workloads. Heavy: 1200MHz for each virtual core. Modest: 600MHz for each virtual core.
32
Scenario I - Results
◦Case 1: the asymmetry-aware method consumes about 43.2% of the power of the credit-based method.
◦Case 2: the asymmetry-aware method uses 95.6% of the energy used by the credit-based method.
33

                                         Power (Watt)
Case 1 (Light-load VMs)
  Asymmetry-aware                        0.295
  Credit-based                           0.683
Case 2 (Heavy-load VM + Modest-load VM)
  Asymmetry-aware                        2.382
  Credit-based                           2.491
Scenario 2 – 4 Big and 4 Little
Quad-core VMs. Three cases:
34

                      VM1          VM2          VM3
Case 1 (Light-load)   All 250MHz   All 250MHz   All 250MHz
Case 2 (Modest-load)  All 600MHz   All 600MHz   All 250MHz
Case 3 (Heavy-load)   All 1600MHz  All 1600MHz  All 1600MHz
Scenario 2 - Results
In Case 3, the load of the physical cores is 100% under both methods.
◦Power cannot be saved if the computing resources are not sufficient.
35

                       Power (Watt)  Savings
Case 1 (Light-load)
  Asymmetry-aware      1.205         41.2%
  Credit-based         2.049
Case 2 (Modest-load)
  Asymmetry-aware      3.524         11.1%
  Credit-based         3.960
Case 3* (Heavy-load)
  Asymmetry-aware      6.009         0%
  Credit-based         6.009
Evaluation
Conduct simulations to compare the power consumption of our asymmetry-aware scheduler with that of a credit-based scheduler.
Compare the number of core switchings from our greedy heuristic with that from an optimal solution.
36
Setting
25 sets of input.
◦4 physical cores, 12 virtual cores, 24 distinct execution slices.
Optimal solution:
◦Enumerates all possible permutations of the execution slices.
◦Uses A* search to reduce the search space.
Evaluation Result
38

                              Greedy Heuristic  A* Search
Average number of switchings  31.2              27.7
Average execution time        0.006 seconds     10+ minutes
XEN HYPERVISOR SCHEDULER: CODE STUDY
39
Xen Hypervisor
Scheduler:
◦xen/common/
 schedule.c
 sched_credit.c
 sched_credit2.c
 sched_sedf.c
 sched_arinc653.c
40
xen/common/schedule.c
Generic CPU scheduling code.
◦Implements support functionality for the Xen scheduler API.
◦Default scheduler: the credit-based scheduler.
static void schedule(void)
◦De-schedules the current domain.
◦Picks a new domain.
41
xen/common/sched_credit.c
Credit-based SMP CPU scheduler.
static struct task_slice csched_schedule;
◦Implementation of credit-based scheduling.
◦SMP load balancing:
 If the next highest-priority local runnable VCPU has already eaten through its credits, look on other PCPUs to see if we have more urgent work.
42
xen/common/sched_credit2.c
Credit-based SMP CPU scheduler.
◦Based on an earlier version.
static struct task_slice csched2_schedule;
◦Selects the next runnable local VCPU (i.e. the top of the local run queue).
static void balance_load(const struct scheduler *ops, int cpu, s_time_t now);
43
Scheduling Steps
Xen calls do_schedule() of the current scheduler on each physical CPU (PCPU).
The scheduler selects a virtual CPU (VCPU) from the run queue and returns it to the Xen hypervisor.
The Xen hypervisor deploys the VCPU to the current PCPU.
44
Adding Our Scheduler
Our scheduler periodically generates a scheduling plan.
Organize the run queue of each physical core according to the scheduling plan.
The Xen hypervisor assigns VCPUs to PCPUs according to the run queues.
45
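The run-queue organization described above can be sketched as follows. This is in Python for brevity and is purely illustrative; an actual implementation would live in C inside xen/common/ and use Xen's own scheduler hooks and data structures.

```python
# Sketch: turn a scheduling plan into per-pCPU run queues that a
# do_schedule()-style hook can pop from on each physical CPU.
# Illustrative only; the real scheduler would be C code inside Xen.

from collections import defaultdict, deque

def build_runqueues(plan):
    """plan: list of slices, each {pcore: vcpu or None}.
    Returns {pcore: deque of vcpus in execution order} (idle slots skipped)."""
    runq = defaultdict(deque)
    for slice_ in plan:
        for pcore, vcpu in slice_.items():
            if vcpu is not None:
                runq[pcore].append(vcpu)
    return runq

def do_schedule(runq, pcore):
    """Pick the next vCPU for this pCPU, or None to idle the core."""
    return runq[pcore].popleft() if runq[pcore] else None
```

Each physical CPU then consumes its own queue in plan order, which is how the periodic plan drives the per-PCPU scheduling decisions.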
Current Status
We propose a three-phase solution for generating a scheduling plan on asymmetric multi-core platforms.
Our simulation results show that the asymmetry-aware strategy yields potential energy savings of up to 56.8% over the credit-based method.
Ongoing: implementing the solution in the Xen hypervisor.
46
Questions or Comments?
47