CSC 660: Advanced Operating Systems Slide #1 CSC 660: Advanced OS Scheduling

Upload: alice-holland

Post on 04-Jan-2016



CSC 660: Advanced OS

Scheduling


Topics

1. Basic Concepts
2. Scheduling Policy
3. The O(1) Scheduler
4. Runqueues
5. Priority Arrays
6. Calculating Priorities and Timeslices
7. Scheduler Interrupts
8. Sleeping and Waking
9. The schedule() function
10. Multiprocessor Scheduling
11. Soft Realtime Scheduling


Basic Concepts

Scheduler
  Selects a process to run and allocates the CPU to it.
  Provides the semblance of multitasking on a single CPU.

Scheduler is invoked when:
  A process blocks on an I/O operation.
  A hardware interrupt occurs.
  A process’s time slice expires.
  A kernel thread yields to the scheduler.


Types of Processes

CPU Bound
  Spend most time on computations.
  Example: computer algebra systems.

I/O Bound
  Spend most time on I/O.
  Example: word processor.

Mixed
  Alternate CPU and I/O activity.
  Example: web browser.


Alternating CPU and I/O Bursts


Scheduling Policy

Scheduler executes policy, determining:
1. When threads can execute.
2. How long threads can execute.
3. Where threads can execute.


Scheduling Policy Goals

• Efficiency
– Maximize amount of work accomplished.
• Interactivity
– Respond as quickly as possible to user.
• Fairness
– Don’t allow any process to starve.


Which goal is most important?

Depends on the target audience:

Desktop: interactivity
  But the kernel shouldn’t spend all its time on context switches.

Server: efficiency
  But it should still offer interactivity in order to serve multiple users.


Pre-2.6 Scheduler

O(n) algorithm at every process switch:
1. Scanned list of runnable processes.
2. Computed priority of each task.
3. Selected best task to run.
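For contrast with the O(1) design that follows, the linear scan can be sketched in user-space C. This is only an illustration under simplifying assumptions: `toy_task` and `toy_pick_next` are hypothetical names, and the per-task priority is a stored integer rather than something recomputed on each pass as the slide describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task record; a lower value means a higher priority. */
struct toy_task {
    int prio;
};

/* Scan every runnable task and return the index of the best one.
 * The cost grows linearly with n, the number of runnable tasks. */
size_t toy_pick_next(const struct toy_task *tasks, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (tasks[i].prio < tasks[best].prio)
            best = i;
    return best;
}
```

Every call touches all n tasks, so the time per process switch rises with system load.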


The O(1) Scheduler

Replacement for O(n) 2.4 scheduler.

All algorithms run in constant time.
  New data structures: runqueues and priority arrays.
  Performs work in small pieces.

Additional new features
  Improved SMP scalability, including NUMA.
  Better processor affinity.
  SMT scheduling.


Runqueues

List of runnable processes on a processor.

Each runnable process is a member of precisely one runqueue.

Runqueue data:
  Lock to prevent concurrency problems.
  Pointers to current and idle tasks.
  Priority arrays which contain actual tasks.
  Statistics.


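Runqueues

struct runqueue {
    spinlock_t lock;                           /* protects this runqueue */
    unsigned long nr_running;                  /* number of runnable tasks */
    unsigned long long nr_switches;            /* number of context switches */
    unsigned long expired_timestamp, nr_uninterruptible;
    unsigned long long timestamp_last_tick;
    task_t *curr, *idle;                       /* current and idle tasks */
    struct mm_struct *prev_mm;
    prio_array_t *active, *expired, arrays[2]; /* pointers into arrays[] */
    int best_expired_prio;
    atomic_t nr_iowait;                        /* tasks waiting on I/O */
};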


Priority Arrays

Each runqueue contains 2 priority arrays
  Active array
  Expired array

Basis for O(1) performance:
  Scheduler always runs highest priority task.
  Round robin for multiple equal priority tasks.
  Priority array finds the highest priority task in an O(1) operation.
  Using two arrays allows transitions between epochs by switching active and expired pointers.


Priority Arrays

struct prio_array {
    /* # of runnable tasks in array */
    unsigned int nr_active;
    /* bitmap: pri lvls contain tasks */
    unsigned long bitmap[BITMAP_SIZE];
    /* 1 list_head per priority (140) */
    struct list_head queue[MAX_PRIO];
};


Finding Highest Priority Task

1. Find first bit set in bitmap.
   sched_find_first_bit()
2. Read corresponding queue[n].
   If one process, give CPU to that one.
   If multiple processes, round-robin schedule all processes in queue for that priority.

idx = sched_find_first_bit(array->bitmap);
queue = array->queue + idx;
next = list_entry(queue->next, task_t, run_list);
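The bitmap lookup can be tried out in user space. The following is a minimal sketch, not the kernel code: `toy_prio_array`, `toy_init`, `toy_enqueue`, and `toy_find_first_bit` are hypothetical names, and each per-priority queue is reduced to a task count instead of a linked list.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_PRIO 140                       /* one level per priority */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((MAX_PRIO + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for prio_array: a count per level, not a list. */
struct toy_prio_array {
    unsigned int nr_active;
    unsigned long bitmap[BITMAP_WORDS];
    unsigned int queue_len[MAX_PRIO];
};

void toy_init(struct toy_prio_array *a)
{
    memset(a, 0, sizeof(*a));
}

/* Add one task at priority prio and mark that level as non-empty. */
void toy_enqueue(struct toy_prio_array *a, int prio)
{
    assert(prio >= 0 && prio < MAX_PRIO);
    size_t p = (size_t)prio;
    a->queue_len[p]++;
    a->nr_active++;
    a->bitmap[p / BITS_PER_WORD] |= 1UL << (p % BITS_PER_WORD);
}

/* Return the lowest set bit (the highest priority), or MAX_PRIO if the
 * array is empty. The work is bounded by MAX_PRIO, not by task count. */
int toy_find_first_bit(const struct toy_prio_array *a)
{
    for (size_t w = 0; w < BITMAP_WORDS; w++) {
        unsigned long word = a->bitmap[w];
        if (word == 0)
            continue;
        int bit = 0;
        while (!(word & 1UL)) {
            word >>= 1;
            bit++;
        }
        return (int)(w * BITS_PER_WORD) + bit;
    }
    return MAX_PRIO;
}
```

Because the bitmap has a fixed size, the search cost does not depend on how many tasks are queued, which is the property the slide calls O(1).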


What if no runnable task exists?

System runs the swapper task (PID 0).

Each CPU has its own swapper process.


Running out of Timeslice

1. Remove task from active priority array.
2. Calculate new priority and timeslice.
3. Add task to expired priority array.
4. Swap arrays when active array is empty.

array = rq->active;
if (unlikely(!array->nr_active)) {
    rq->active = rq->expired;
    rq->expired = array;
    ...
}
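The epoch transition in step 4 can be modelled in a few lines of user-space C. This is a sketch under simplifying assumptions: `toy_rq`, `toy_array`, and the three functions are hypothetical names, and an array is reduced to its task count.

```c
#include <assert.h>

/* Hypothetical, reduced priority array: only the task count matters. */
struct toy_array {
    unsigned int nr_active;
};

struct toy_rq {
    struct toy_array *active, *expired;
    struct toy_array arrays[2];
};

void toy_rq_init(struct toy_rq *rq)
{
    rq->arrays[0].nr_active = 0;
    rq->arrays[1].nr_active = 0;
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

/* A task used up its timeslice: move it from active to expired. */
void toy_expire_one(struct toy_rq *rq)
{
    assert(rq->active->nr_active > 0);
    rq->active->nr_active--;
    rq->expired->nr_active++;
}

/* Start a new epoch when the active array is empty by exchanging two
 * pointers. No task is touched, so the cost is constant.
 * Returns 1 if a swap happened, 0 otherwise. */
int toy_swap_if_empty(struct toy_rq *rq)
{
    if (rq->active->nr_active != 0)
        return 0;
    struct toy_array *tmp = rq->active;
    rq->active = rq->expired;
    rq->expired = tmp;
    return 1;
}
```

The swap never copies or re-sorts tasks, which is why two arrays are kept instead of recalculating every timeslice in one pass.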


Static and Dynamic Priorities

Initial priority value called the nice value.
  Set via the nice() system call.
  Static priority is nice value + 120.
  Stored in current->static_prio.
  Ranges from 100 (highest) to 139 (lowest).

Scheduling based on dynamic priority.
  Bonuses and penalties according to interactivity.
  Stored in current->prio.
  Calculated by effective_prio() function.


Dynamic Priority Policy

Increase priority of interactive processes.
  Favor I/O-bound over CPU-bound.

Need heuristic for determining interactivity.
  Use time spent sleeping vs. runnable time.

Sleep average
  Stored in current->sleep_avg.
  Incremented when task becomes runnable.
  Decremented for each timer tick task runs.
  Scaled to produce priority bonus ranging 0..10.


Calculating Priority

/* Scale sleep_avg to range 0..MAX_BONUS */
#define CURRENT_BONUS(p) \
    (NS_TO_JIFFIES((p)->sleep_avg) * MAX_BONUS / MAX_SLEEP_AVG)

static int effective_prio(task_t *p)
{
    int bonus, prio;

    bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
    prio = p->static_prio - bonus;
    return prio;
}
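To see concrete numbers, the calculation can be reproduced in user-space C. This is a sketch under stated assumptions: `toy_static_prio` and `toy_effective_prio` are hypothetical names, the sleep average is taken in milliseconds with an assumed cap of 1000 (`MAX_SLEEP_MS`), and the final clamp to the range 100..139 is a detail the slide's excerpt leaves out.

```c
#include <assert.h>

#define MAX_RT_PRIO   100    /* first non-realtime priority */
#define MAX_PRIO      140
#define MAX_BONUS     10     /* bonus range 0..10, as on the slides */
#define MAX_SLEEP_MS  1000   /* assumed cap on the sleep average, in ms */

/* Static priority from a nice value in -20..19. */
int toy_static_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return nice + 120;
}

/* Dynamic priority: static priority adjusted by -5..+5 depending on how
 * much of its recent time the task spent sleeping. */
int toy_effective_prio(int static_prio, int sleep_avg_ms)
{
    assert(sleep_avg_ms >= 0 && sleep_avg_ms <= MAX_SLEEP_MS);
    int bonus = sleep_avg_ms * MAX_BONUS / MAX_SLEEP_MS - MAX_BONUS / 2;
    int prio = static_prio - bonus;

    /* Clamp to the non-realtime range (assumption: not on the slide). */
    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;
}
```

With these assumptions, a nice-0 task that sleeps almost all the time ends up at priority 115, while one that never sleeps drops to 125.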


Time Slices

Time slice duration critical to performance.
  Too short: high overhead from context switches.
  Too long: loss of apparent multitasking.

Interactive processes and time slices
  Interactive processes have high priority.
  Pre-empt CPU bound tasks on kbd/ptr interrupts.
  Long time slices slow start of new tasks.


Calculating Timeslice

Initial Timeslice
  On fork(), parent + child divide remaining time evenly.
  Stored in current->time_slice.

Recalculating Timeslices
  Time Slice (ms) = (140 - static priority) x 20   if static priority < 120
                  = (140 - static priority) x 5    if static priority >= 120

Description   Nice   Static Pri   Time Slice
Highest       -20    100          800ms
Default         0    120          100ms
Lowest        +19    139          5ms
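As a check on the table, the timeslice rule can be written as a small function. This is a sketch with a hypothetical name, `toy_timeslice_ms`; it switches multipliers at static priority 120, which is the threshold that reproduces all three table rows.

```c
#include <assert.h>

/* Base timeslice in milliseconds for a static priority in 100..139. */
int toy_timeslice_ms(int static_prio)
{
    assert(static_prio >= 100 && static_prio <= 139);
    if (static_prio < 120)
        return (140 - static_prio) * 20;
    return (140 - static_prio) * 5;
}
```

Under this rule the slice drops sharply between static priority 119 (420 ms) and 120 (100 ms), so tasks with a negative nice value receive much longer slices than default ones.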


Scheduler Interrupts

• Scheduler interrupt: scheduler_tick()
– Invoked every 1ms by a timer interrupt.
• Decrements task’s time slice.
• If a higher priority task exists,
– Higher priority task is given CPU.
– Current task remains in TASK_RUNNING state.
• If time slice expired,
– Moved to expired priority array.
– If highly interactive, may be re-inserted into active priority array.
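The per-tick bookkeeping can be sketched as follows. This is a simplified user-space model, not the kernel's scheduler_tick(): `toy_ticking_task` and `toy_scheduler_tick` are hypothetical names, interactivity is a precomputed flag, and preemption by a higher priority task is left out.

```c
#include <assert.h>

enum toy_array_id { TOY_ACTIVE, TOY_EXPIRED };

/* Hypothetical task record holding only what the tick handler needs. */
struct toy_ticking_task {
    int time_slice;           /* remaining ticks */
    int base_slice;           /* ticks granted on refill */
    int interactive;          /* nonzero if judged highly interactive */
    enum toy_array_id array;  /* which priority array holds the task */
};

/* One timer tick for the running task. Returns 1 if the task should be
 * rescheduled because its slice ran out, 0 otherwise. */
int toy_scheduler_tick(struct toy_ticking_task *t)
{
    assert(t->time_slice > 0);
    if (--t->time_slice > 0)
        return 0;

    /* Slice exhausted: refill it and choose the destination array. */
    t->time_slice = t->base_slice;
    t->array = t->interactive ? TOY_ACTIVE : TOY_EXPIRED;
    return 1;
}
```

Putting an interactive task back into the active array lets it run again in the current epoch instead of waiting for the array swap.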


Sleeping and Waking

Sleeping tasks are not in runqueues.
  Require no CPU time until awakened.

Why sleep?
  Waiting for I/O.
  Waiting for other hardware events.
  Waiting for a kernel semaphore.


Sleeping

DECLARE_WAITQUEUE(wait, current);

/* q is a wait queue, wait is a q entry */
add_wait_queue(q, &wait);
while (!condition) {
    set_current_state(TASK_INTERRUPTIBLE);
    if (signal_pending(current)) {
        /* Handle signal */
    }
    schedule();
}
set_current_state(TASK_RUNNING);
remove_wait_queue(q, &wait);


Waking

wake_up() wakes up tasks on event
  Exclusive: only wakes up one task on waitqueue
  Non-exclusive: wakes all tasks on waitqueue

[Figure: task state diagram with states TASK_INTERRUPTIBLE and TASK_RUNNING; transitions labeled add_wait_queue, wake_up, and Signal]


Multiprocessor Architectures

Classic
  Memory shared by all CPUs.

Hyperthreading
  Single CPU executing multiple on-chip threads.

NUMA
  CPUs + RAM grouped in local nodes.
  Reduces contention for accessing RAM.
  Fast to access local RAM.
  Slower to access remote RAM.


Multiprocessor Scheduling

Each CPU has own runqueue.

Scheduler selects tasks from local runqueue.
  CPU cache more likely to still be hot.

Periodic checks to balance load across CPUs.
  Called by rebalance_tick().
  Loops over all scheduling domains.
  Calls load_balance() if balance interval expired.


load_balance()

1. Acquires this_rq->lock spin lock.

2. Finds busiest CPU with > 1 process.

3. If no busiest or current CPU is busiest, terminates.

4. Obtains spin lock on busiest CPU.

5. Pulls tasks from busiest CPU to local runqueue.

6. Releases locks.
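The selection and pull steps can be sketched in user-space C. This is a simplified model with hypothetical names (`toy_find_busiest`, `toy_load_balance`): runqueues are reduced to task counts, locking is omitted, and the stopping rule (pull until the two loads differ by at most one) is an assumption made for the example.

```c
#include <assert.h>

/* Return the index of the CPU with the most runnable tasks, or -1 when
 * no CPU has more than one task or when this_cpu is itself the busiest. */
int toy_find_busiest(const unsigned int nr_running[], int ncpus, int this_cpu)
{
    assert(ncpus > 0 && this_cpu >= 0 && this_cpu < ncpus);
    int busiest = -1;
    unsigned int max_load = 1;   /* a candidate needs more than 1 task */

    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (nr_running[cpu] > max_load) {
            max_load = nr_running[cpu];
            busiest = cpu;
        }
    }
    return busiest == this_cpu ? -1 : busiest;
}

/* Pull tasks from the busiest CPU until the two loads differ by at most
 * one (assumed stopping rule). Returns the number of tasks moved. */
int toy_load_balance(unsigned int nr_running[], int ncpus, int this_cpu)
{
    int busiest = toy_find_busiest(nr_running, ncpus, this_cpu);
    int moved = 0;

    if (busiest < 0)
        return 0;
    while (nr_running[busiest] > nr_running[this_cpu] + 1) {
        nr_running[busiest]--;
        nr_running[this_cpu]++;
        moved++;
    }
    return moved;
}
```

Tasks are pulled toward the calling CPU and never pushed away from it, so each CPU only ever adds to its own runqueue during balancing.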


move_tasks()

Searches for runnable tasks in the expired priority array of the busiest runqueue.
Then scans its active priority array.

Calls pull_task() to move a task if all of the following are true:
  Task is not currently being executed.
  Local CPU is in the task’s cpus_allowed bitmask.
  At least one of the following is true:
    Local CPU is idle.
    Multiple attempts to move processes have failed.
    Process is not cache hot.
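The migration conditions combine into a single predicate. The sketch below uses hypothetical names (`toy_candidate`, `toy_can_migrate`) and takes cache hotness and the failure history as precomputed flags instead of deriving them from timestamps and counters.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical view of a task as seen by the balancer. */
struct toy_candidate {
    int running;                 /* currently executing on its CPU */
    unsigned long cpus_allowed;  /* bit n set: task may run on CPU n */
    int cache_hot;               /* ran recently, cache likely warm */
};

/* Returns 1 if the task may be pulled to this_cpu, 0 otherwise. */
int toy_can_migrate(const struct toy_candidate *t, int this_cpu,
                    int this_cpu_idle, int repeated_failures)
{
    assert(this_cpu >= 0 &&
           this_cpu < (int)(sizeof(unsigned long) * CHAR_BIT));
    if (t->running)
        return 0;
    if (!(t->cpus_allowed & (1UL << this_cpu)))
        return 0;
    return this_cpu_idle || repeated_failures || !t->cache_hot;
}
```

The first two checks are hard constraints, while the last group trades cache locality against keeping CPUs busy.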


Realtime Scheduling

Hard Real-time
  Guaranteed response within defined period.
  Used for embedded systems: car engines.
  Ex: RealTime Application Interface (RTAI)

Soft Real-time
  Best effort to meet scheduling constraints.
  Used for multimedia applications.
  Currently provided by Linux.
  Improved by Realtime Preemption Patch.


Soft Realtime Scheduling

Scheduling Priorities
  RT tasks have higher priorities than any non-RT tasks.
  RT priorities are static, ranging 1-99, not dynamic.
  If RT tasks are runnable, no other tasks can run.

Scheduling Policies
  SCHED_NORMAL (non-realtime)
  SCHED_FIFO
  SCHED_RR
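From user space, the priority range of each policy can be queried with the POSIX calls sched_get_priority_min() and sched_get_priority_max(). The helper name `print_prio_range` is hypothetical; note that user-space headers call the non-realtime policy SCHED_OTHER rather than SCHED_NORMAL.

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Print the static priority range for one policy.
 * Returns 0 on success, -1 if the policy is not supported. */
int print_prio_range(const char *name, int policy)
{
    assert(name != NULL);
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    printf("%-12s %d..%d\n", name, lo, hi);
    return 0;
}
```

Calling it for SCHED_FIFO, SCHED_RR, and SCHED_OTHER on a Linux system should show the 1-99 range for the two realtime policies; other POSIX systems may report different bounds.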


Realtime Policies

SCHED_FIFO
  First-in First-out real-time scheduling.
  Process uses CPU until:
    It blocks or yields the CPU voluntarily.
    A higher priority real-time process pre-empts it.

SCHED_RR
  Round Robin real-time scheduling.
  Process runs for time slice, then waits for other equal priority real-time processes in runqueue.


Realtime Process Replacement

Realtime processes are replaced only when:
  Pre-empted by a higher-priority RT process.
  Process performs a blocking operation.
  Process is stopped or killed by a signal.
  Process invokes sched_yield() system call.
  SCHED_RR process has exhausted its time slice.


Realtime System Calls

Scheduler Policy
  sched_setscheduler()
  sched_getscheduler()

Priority
  sched_getparam()
  sched_setparam()
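A short user-space sketch shows how these calls fit together. The helper names `policy_name`, `make_fifo`, and `query_sched` are hypothetical; the underlying calls are the POSIX ones listed above. Switching to a realtime policy normally requires privilege, so make_fifo() is expected to fail with EPERM for ordinary users.

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Map a policy constant to its name. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "unknown";
    }
}

/* Try to switch a process to SCHED_FIFO at the given priority.
 * pid 0 means the calling process. Returns 0 on success, -1 on error. */
int make_fifo(pid_t pid, int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    assert(prio >= sched_get_priority_min(SCHED_FIFO) &&
           prio <= sched_get_priority_max(SCHED_FIFO));
    return sched_setscheduler(pid, SCHED_FIFO, &sp);
}

/* Read back the current policy and priority of a process.
 * Returns the policy, or -1 on error. */
int query_sched(pid_t pid, int *prio_out)
{
    struct sched_param sp;
    int policy = sched_getscheduler(pid);

    if (policy == -1 || sched_getparam(pid, &sp) == -1)
        return -1;
    *prio_out = sp.sched_priority;
    return policy;
}
```

A typical use is to call query_sched(0, &prio) before and after make_fifo(0, 10) and print policy_name() for each result to confirm whether the change took effect.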


Yielding the Processor

sched_yield() system call
  Moves regular task to expired priority array.
  RT tasks moved to end of priority list.

Kernel tasks can yield the CPU too.
  Call yield() function.
