1
ECE7995 Computer Storage and Operating System Design
Lecture 3: Processes and threads (I)
2
Why Multiprogramming and Timesharing? (revisit)
3
Protection (revisit)
• OS must protect/isolate applications from each other, and the OS from applications
• Three techniques:
  Preemption: granted resources can be revoked
  Interposition: must go through the OS to access resources
  Privilege: user/kernel modes
4
Mode Switching (revisit)
• User → Kernel mode switch: for reasons external or internal to the CPU
• External (aka hardware) interrupts: timer/clock chip, I/O device, network card, keyboard, mouse
  asynchronous (with respect to the executing program)
• Internal interrupts (aka software interrupts, traps, or exceptions) are synchronous
  System call (process wants to enter kernel to obtain services) – intended
  Fault/exception (division by zero, privileged instruction in user mode) – usually unintended
• Kernel → User mode switch on iret instruction
5
Mode Switch (revisit)
[Diagram: Processes 1 and 2 alternate between user mode and kernel mode]
• Timer interrupt: P1 is preempted, context switch to P2
• System call (trap): P2 starts I/O operation, blocks; context switch to process 1
• I/O device interrupt: P2’s I/O completes; switch back to P2
• Timer interrupt: P2 still has time left, no context switch
6
Overview
• Process concept
• Process Scheduling
• Thread concept
7
Process
These are all possible definitions:
• A program in execution; process execution must progress in sequential fashion
• An instance of a program running on a computer
• Schedulable entity (*)
• Unit of resource ownership
• Unit of protection
• Execution sequence (*) + current state (*) + set of resources
(*) can be said of threads as well
8
Process in Memory
A process includes:
• text section
• program counter
• stack
• data section
• heap
9
Data placement, seen from C/C++
Q.: where is each of these variables stored?
int a;
static int b;
int c = 5;
struct S {
int t;
};
struct S s;
void func(int d)
{
static int e;
int f;
struct S w;
int *g = new int[10];
}
A.: On stack: d, f, w (including w.t), g
On (global) data section: a, b, c, s (including s.t), e
On the heap: g[0]…g[9]
10
Process State
As a process executes, it changes state
• new: The process is being created
• running: Instructions are being executed
• waiting: The process is waiting for some event to occur
• ready: The process is waiting to be assigned to a processor
• terminated: The process has finished execution
11
Process Control Block (PCB)
PCB records information associated with each process:
• Process identifier (pid)
• Value of registers, including stack pointer
  Program counter, CPU registers
• Information needed by scheduler
  Process state, other CPU scheduling information
• Resources held by process
  Memory-management information, I/O status information
• Accounting information
12
Context Switching
• Multiprogramming: switch to another process if the current process is (momentarily) blocked
• Time-sharing: switch to another process periodically to make sure all processes make progress
• This switch is called a context switch
• Understand how it works
  how it interacts with user/kernel mode switching
  how it maintains the illusion of each process having the CPU to itself (a process must not notice being switched in and out!)
13
CPU Switch From Process to Process
1) Save the current process’s execution state to its PCB
2) Update current’s PCB as needed
3) Choose next process N
4) Update N’s PCB as needed
5) Restore N’s execution state from its PCB
May involve reprogramming MMU
14
Process Scheduling Queues
• Processes are linked in multiple queues:
  Job queue – set of all processes in the system
  Ready queue – set of all processes residing in main memory, ready and waiting to execute
  Device queues (wait queues) – set of processes waiting for an I/O device or other events
• Processes migrate among the various queues
15
Ready Queue And Various I/O Device Queues
16
Representation of Process Scheduling
17
Schedulers
• Long-term scheduler (or job scheduler) – selects which processes should be brought into the ready queue
  The long-term scheduler controls the degree of multiprogramming
  A good mix contains both kinds of processes:
  – I/O-bound process – spends more time doing I/O than computation; many short CPU bursts
  – CPU-bound process – spends more time doing computation; few very long CPU bursts
• Short-term scheduler (or CPU scheduler) – selects which process should be executed next and allocates CPU
18
CPU Scheduling
• Selects from among the processes in memory that are ready to execute, and allocates the CPU to one of them
• CPU scheduling decisions may take place when a process:
1. Switches from running to waiting state
2. Switches from running to ready state
3. Switches from waiting to ready
4. Terminates
• Scheduling under 1 and 4 is nonpreemptive
• All other scheduling is preemptive
19
Preemptive vs Nonpreemptive Scheduling
Q.: when is the scheduler asked to pick a thread from the ready queue?
Nonpreemptive:
• Only on RUNNING → BLOCKED transition
• Or RUNNING → EXIT
• Or voluntary yield: RUNNING → READY
Preemptive:
• Also on BLOCKED → READY transition
• Also on timer
[State diagram: RUNNING → BLOCKED (process must wait for event); BLOCKED → READY (event arrived); READY → RUNNING (scheduler picks process); RUNNING → READY (process preempted)]
20
Dispatcher
• Dispatcher module gives control of the CPU to the process selected by the short-term scheduler; this involves:
  switching context
  switching to user mode
  jumping to the proper location in the user program to restart that program
• Dispatch latency – time it takes for the dispatcher to stop one process and start another running
21
Static vs Dynamic Scheduling
• Static scheduling
  The arrival and execution times of all jobs are known in advance. The scheduler creates a schedule, then executes it
  – Used in statically configured systems, such as embedded real-time systems
• Dynamic/online scheduling
  Jobs are not known in advance; the scheduler must make an online decision whenever a job arrives or leaves
  – Execution time may or may not be known
  – Behavior can be modeled by making assumptions about the nature of the arrival process
22
Alternating Sequence of CPU And I/O Bursts
23
CPU Scheduling Model
Process alternates between CPU bursts and I/O bursts
[Diagram: on the same CPU, an I/O-bound process (many short CPU bursts, frequent I/O) interleaves with a CPU-bound process (few long CPU bursts); each process cycles through CPU, I/O, and waiting]
24
CPU Scheduling Terminology
• A job (sometimes called a task, or a job instance)
Activity that’s schedulable: a process/thread, or a collection of processes that are scheduled together
• Arrival time: time when job arrives
• Start time: time when job actually starts
• Finish time: time when job is done
• Completion time (aka Turn-around time)
Finish time – Arrival time
• Response time
Time when user sees response – Arrival time
• Execution time (aka cost): time a job needs to execute
25
CPU Scheduling Terminology (cont’d)
• Waiting time = time when job was ready-to-run but didn’t run because the CPU scheduler picked another job
• Blocked time = time when job was blocked while an I/O device is in use
• Completion time = Execution time + Waiting time + Blocked time
26
CPU Scheduling Goals
• Minimize latency
  Can mean completion time
  Can mean response time
• Maximize throughput
  Throughput: number of finished jobs per time unit
  Implies minimizing overhead (of context switching and of the scheduling algorithm itself)
  Requires efficient use of non-CPU resources
• Fairness
  Minimize variance in waiting time/completion time
27
Scheduling Constraints
Reaching those goals is difficult, because
• Goals are conflicting:
  – Latency vs. throughput
  – Fairness vs. low overhead
• Scheduler must operate with incomplete knowledge– Execution time may not be known– I/O device use may not be known
• Scheduler must make decision fast– Approximate best solution from huge solution space
28
Round Robin (RR)
• Each process gets a small unit of CPU time (time quantum), usually 10-100 milliseconds. After this time has elapsed, the process is preempted and added to the end of the ready queue.
• If there are n processes in the ready queue and the time quantum is q, then each process gets 1/n of the CPU time in chunks of at most q time units at once. No process waits more than (n-1)q time units.
• No more unfairness to short jobs or starvation for long jobs!
• Performance
  q large ⇒ approaches FIFO
  q small ⇒ q must still be large with respect to context-switch time, otherwise overhead is too high
29
Example of RR with Time Quantum = 20
Process   CPU Burst Time
P1        53
P2        17
P3        68
P4        24
The schedule is:
| P1 | P2 | P3 | P4 | P1 | P3 | P4 | P1 | P3 | P3 |
0    20   37   57   77   97   117  121  134  154  162
30
Round Robin – Cost of Time Slicing
• Context switching incurs a cost
  Direct cost (executing the scheduler & context switch) + indirect cost (cache & TLB misses)
• Long time slices ⇒ lower overhead, but approaches FCFS if processes finish before the timeslice expires
• Short time slices ⇒ lots of context switches, high overhead
• Typical cost: context switch < 10 µs
• Time slices typically around 100 ms
  Linux: 100 ms default, adjusted to between 10 ms & 300 ms
  Note: time slice length != interval between timer interrupts
  Timer frequency usually 1000 Hz
31
Multi-Level Feedback Queue Scheduling
• Objectives:
  preference for short jobs (tends to lead to good I/O utilization)
  longer timeslices for CPU-bound jobs (reduces context-switching overhead)
• Challenge: don’t know the type of each process – the algorithm needs to figure it out
• Solutions: use multiple queues
  queue determines priority
  usually combined with static priorities (nice values)
  many variations of this
32
MLFQS
[Diagram: queues numbered 4 (MAX, highest priority) down to 1 (MIN, lowest priority); lower-priority queues have longer timeslices]
• Processes start in the highest queue
• Higher-priority queues are served before lower-priority ones; within the highest-priority queue, round-robin
• Processes that use up their time slice move down
• Processes that starve move up
• Only ready processes are in these queues; blocked processes leave the queue and reenter the same queue on unblock
33
Case Study: Linux Scheduler
• Variant of MLFQS
• 140 priorities
  0-99 “realtime”
  100-139 nonrealtime
• Dynamic priority computed from static priority (nice) plus “interactivity bonus”
[Diagram: priorities 0-99: “realtime” processes scheduled based on static priority (SCHED_FIFO, SCHED_RR); priorities 100-139: processes scheduled based on dynamic priority (SCHED_NORMAL), with nice=-20 mapping to 100, the default nice=0 to 120, and nice=19 to 139]
34
Linux Scheduler (cont’d)
Instead of a recomputation loop, recompute priority at the end of each timeslice
• dyn_prio = nice + interactivity bonus (-5…5)
  Interactivity bonus depends on sleep_avg
• sleep_avg measures time a process was blocked
2 priority arrays (“active” & “expired”) in each runqueue (Linux calls ready queues “runqueues”)
• Finds highest-priority ready thread quickly
• Switching active & expired arrays at end of epoch is a simple pointer swap
35
Linux Timeslice Computation
• Principle: choose a timeslice as long as possible while keeping good system response time
• Various tweaks:
  “interactive processes” are reinserted into the active array even after their timeslice expires
  – unless processes in the expired array are starving
  processes with long timeslices are round-robin’d with others of equal priority at sub-timeslice granularity
Q: Does a very long quantum duration degrade the response time of interactive applications?
36
Linux SMP Load Balancing
• One runqueue per CPU
• Periodically, the lengths of the runqueues on different CPUs are compared
  Processes are migrated to balance load
• Migrating requires locks on both runqueues
37
Scheduling Summary
• OS must schedule all resources in a system
  CPU, disk, network, etc.
• CPU scheduling indirectly affects the scheduling of other devices
• Goals: (1) Minimize latency (2) Maximize throughput (3) Provide fairness
• In Practice: some theory, lots of tweaking
38
Single and Multithreaded Processes
39
Benefits of Multithreading
• Responsiveness
• Resource Sharing
• Economy
• Utilization of MP Architectures
40
Thread Implementations
• User threads
  Thread management done by a user-level threads library
  Three primary thread libraries:
  ― POSIX Pthreads
  ― Win32 threads
  ― Java threads
• Kernel threads
  Supported by the kernel
  Examples:
  – Windows XP/2000
  – Solaris
  – Linux
  – Tru64 UNIX
  – Mac OS X
41
An example: Pthreads

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *print_message_function( void *ptr );

int main()
{
    pthread_t thread1, thread2;
    char *message1 = "Thread 1";
    char *message2 = "Thread 2";
    int iret1, iret2;

    /* Create independent threads each of which will execute the function */
    iret1 = pthread_create( &thread1, NULL, print_message_function, (void*) message1);
    iret2 = pthread_create( &thread2, NULL, print_message_function, (void*) message2);

    /* Wait till threads are complete before main continues. Unless we */
    /* wait we run the risk of executing an exit which will terminate  */
    /* the process and all threads before the threads have completed.  */
    pthread_join( thread1, NULL);
    pthread_join( thread2, NULL);

    printf("Thread 1 returns: %d\n", iret1);
    printf("Thread 2 returns: %d\n", iret2);
    exit(0);
}

void *print_message_function( void *ptr )
{
    char *message = (char *) ptr;
    printf("%s \n", message);
    return NULL;
}

Compile with: cc pthread1.c -lpthread
42
Multithreading Models
• Many-to-One: many user-level threads mapped to a single kernel thread
  Examples:
  ―Solaris Green Threads
  ―GNU Portable Threads
  Pros: efficient; no need for OS support for threading
  Cons: concurrency lost when one thread blocks in the kernel; cannot use MP
• One-to-One: each user-level thread maps to a kernel thread
  Examples:
  ― Windows NT/XP/2000
  ― Linux
  ― Solaris 9 and later
  Con: the number of kernel threads may have to be limited to reduce system cost
43
Multithreading Models (cont’d)
• Many-to-Many: allows many user-level threads to be mapped to many kernel threads
  Allows the operating system to create a sufficient number of kernel threads
  Examples:
  ― Solaris prior to version 9
  ― Windows NT/2000 with the ThreadFiber package
44
Thread Pools
• Create a number of threads in a pool where they await work
• Advantages:
  Usually slightly faster to service a request with an existing thread than to create a new thread
  Allows the number of threads in the application(s) to be bound by the size of the pool
45
Linux Threads
• Linux treats threads as lightweight processes
• Each process (or thread) has its own task_struct structure (the PCB in Linux), and is identified by its own process ID (PID)
• However, Unix programmers expect threads in the same group (or process) to share a common PID. Linux uses tgid in the PCB to record the PID of the thread group leader, so all the threads in a group share the same identifier.
• Thread creation is done through the clone() system call
  clone() is a variant of fork() that allows a child task to share certain resources with its parent via flag parameters (CLONE_VM, CLONE_FILES …)