(C) 2008,2009 Codefidence Ltd.

CELF ELC Europe 2009

Gilad Ben-Yossef, Codefidence Ltd.

On Threads, Processes and Co-Processes

About Me

Chief Coffee Drinker of Codefidence Ltd.

Co-author of Building Embedded Linux Systems, 2nd edition

Israeli FOSS NPO Hamakor co-founder

August Penguin co-founder

git blame -- FOSS.*

Linux, Asterisk, cfgsh (RIP), ...

Linux Process and Threads

[Diagram: Process 123 with one thread (Thread 1) and Process 124 with four threads (Thread 1 to Thread 4).]

Per process: File Descriptors, Memory, Signal Handlers

Per thread: Stack, State, Signal Mask, Priority

The Question

Should we use threads?

Given an application that calls for several tasks, should we implement each task as a thread in the same process or as a separate process?

What Do The Old Wise Men Say?

"If you think you need threads then your processes are too fat."
  Rob Pike, co-author of The Practice of Programming and The Unix Programming Environment.

"A computer is a state machine. Threads are for people who can't program state machines."
  Alan Cox, Linux kernel programmer

Spot the Difference (1)

The Linux scheduler always operates at thread granularity.

You can start a new process without loading a new program, just as with a thread:
  fork() creates the process; exec() is a separate, optional step.

Processes can communicate with each other just like threads:
  shared memory, mutexes, semaphores, message queues etc. work between processes too.
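As a rough illustration of the two points above, the following minimal sketch forks a child that keeps running the same program (no exec()) and reports a result to its parent over a pipe. The names run_forked_task() and child_task() are hypothetical and exist only for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical task body: runs in the child, inside the same program image. */
static int child_task(int write_fd)
{
    int answer = 6 * 7;
    ssize_t n = write(write_fd, &answer, sizeof(answer));
    return n == (ssize_t)sizeof(answer) ? 0 : 1;
}

/* Fork a child that runs child_task() and return the value it sends back,
 * or -1 on failure. No exec() is involved at any point. */
int run_forked_task(void)
{
    int fds[2];
    int result = -1;

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: same code as the parent, private copy of the data. */
        close(fds[0]);
        _exit(child_task(fds[1]));
    }

    /* Parent: read the result from the pipe, then reap the child. */
    close(fds[1]);
    if (read(fds[0], &result, sizeof(result)) != (ssize_t)sizeof(result))
        result = -1;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

Calling run_forked_task() from main() should yield the value computed in the child; a thread-based version would use pthread_create() and a shared variable instead of the pipe.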

Spot the Difference (2)

Process creation time is roughly double that of a thread...
  ... but Linux process creation time is still very small.
  ... but most embedded systems pre-create all tasks in advance anyway.

Each process has its own virtual memory address space.

Threads (of the same process) share the virtual address space.
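The address space difference can be observed directly. In this small sketch (function names are made up for the example), a thread and a forked child each write to the same global variable; only the thread's write is visible to the caller afterwards.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;

static void *thread_body(void *arg)
{
    (void)arg;
    counter = 1;   /* writes the single copy shared by all threads */
    return NULL;
}

/* Returns the caller's view of counter after a thread wrote to it (-1 on error). */
int counter_after_thread(void)
{
    pthread_t tid;

    counter = 0;
    if (pthread_create(&tid, NULL, thread_body, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);
    return counter;
}

/* Returns the caller's view of counter after a forked child wrote to it (-1 on error). */
int counter_after_fork(void)
{
    counter = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter = 1;   /* modifies the child's private copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;
}
```

On Linux the first function is expected to return 1 and the second 0. Depending on the toolchain, building may require the -pthread flag.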

Physical and Virtual Memory

[Diagram: the CPU issues virtual addresses, which the MMU (Memory Management Unit) maps onto the physical address space (0x00000000 to 0xFFFFFFFF), containing RAM 0, RAM 1, Flash and I/O memory 1 to 3. Process 0, Process 1 and Process 2 each have their own virtual address space, spanning 0x00000000 to 0xFFFFFFFF.]

All the processes have their own virtual address space, and run as if they had access to the whole address space.

This slide is © 2004 – 2009 Free Electrons.

Page Tables

ASN *   Virtual   Physical   Permission
12      0x8000    0x5340     RWXC (Read, Write, Execute, Cached)
15      0x8000    0x3390     RX (Read Only, Execute, Non Cached)

ASN: Address Space Number. Virtual: virtual page address. Physical: page frame address.

[Diagram: CPU -> MMU -> Memory Bus]

* Does not exist on all architectures.
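To make the role of the ASN concrete, here is a toy software model of the table above. It is only a sketch: real MMUs walk multi-level tables or hash structures in hardware, and the struct layout, flag names and translate() function are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Permission bits, mirroring the R/W/X/C letters on the slide. */
enum { PERM_R = 1, PERM_W = 2, PERM_X = 4, PERM_C = 8 };

struct pte {
    unsigned asn;      /* address space number */
    uint32_t vpage;    /* virtual page address */
    uint32_t pframe;   /* page frame (physical) address */
    unsigned perms;    /* PERM_* bits */
};

/* The two example entries from the slide. */
static const struct pte table[] = {
    { 12, 0x8000, 0x5340, PERM_R | PERM_W | PERM_X | PERM_C },
    { 15, 0x8000, 0x3390, PERM_R | PERM_X },
};

/* Look up a translation by (ASN, virtual page); NULL means no mapping. */
const struct pte *translate(unsigned asn, uint32_t vpage)
{
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (table[i].asn == asn && table[i].vpage == vpage)
            return &table[i];
    return NULL;
}
```

The same virtual page 0x8000 resolves to different frames depending on the ASN, which is what lets entries from two address spaces coexist without being confused.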

Translation Look-aside Buffers

The MMU caches the contents of the page tables in a CPU-local cache called the TLB.

TLBs can be managed automatically in hardware (x86, SPARC) or by software (MIPS, PowerPC).

Making changes to the page tables might require a TLB flush, unless the architecture supports TLB ASNs (Alpha, Intel Nehalem, AMD SVM).

Cache Indexes

Data and instruction caches may use either the virtual or the physical address as the key into the cache.

Physically indexed caches don't care about different address spaces.

Virtually indexed caches cannot keep more than one alias to the same physical address,
  or need to carefully manage aliasing using tagging.

LMBench

LMBench is a suite of simple, portable benchmarks by Larry McVoy and Carl Staelin.

lat_ctx measures context switching time.
  The original lat_ctx supported only measurement of inter-process context switches.
  I have extended it to also measure inter-thread context switches.
  All bugs are mine, not Larry's and Carl's :-)

How lat_ctx works (1)

[Diagram: Task 1, Task 2 and Task 3, each with its own data array, connected in a ring by Unix pipes.]

Each task in turn:
  1. Performs a calculation on a variable size array.
  2. Passes a token on a Unix pipe to the next task.

Can set task size and number of tasks.

How lat_ctx works (2)

Both the data and the instruction cache get polluted by some amount before the token is passed on.
  The data cache gets polluted by approximately the process "size".
  The instruction cache gets polluted by a constant amount, approximately 2.7 thousand instructions.

The benchmark measures only the context switch time, not including the overhead of doing the work.
  A warm-up run with hot caches is used as a reference.

lat_ctx Accuracy

The numbers produced by this benchmark are somewhat inaccurate; they vary by about 10 to 15% from run to run.

The possible reasons for the inaccuracies are detailed in the LMBench documentation.
  They aren't really sure either.

Context Switch Costs

Using the modified lat_ctx we can measure the difference between the context switch times of threads and processes.

Two systems used:
  Intel x86 Core2 Duo 2GHz: dual core, hardware TLB
  Freescale PowerPC MPC8568 MDS: single core, software TLB

X86 Core2 Duo 2GHz, 0k data

[Bar chart. X axis: number of threads/processes (5, 10, 20). Y axis: context switch time in usec. Series: Threads, Processes. Bar labels: 1.76, 1.81, 1.83, 1.69, 1.74, 1.81.]

X86 Core2 Duo 2GHz, 16k data

[Bar chart. X axis: number of threads/processes (5, 10, 20). Y axis: context switch time in usec. Series: Threads, Processes. Bar labels: 2.22, 2.22, 2.23, 2.2, 2.21, 2.22.]

X86 Core2 Duo 2GHz, 128k data

[Bar chart. X axis: number of threads/processes (5, 10, 20). Y axis: context switch time in usec. Series: Threads, Processes. Bar labels: 1.99, 3.73, 15.85, 1.92, 3.28, 16.61.]

PPC MPC8568 MDS, 0k data

[Bar chart. X axis: number of threads/processes (5, 10, 20). Y axis: context switch time in usec. Series: Threads, Processes. Bar labels: 2.36, 3.05, 3.26, 2.61, 3.24, 3.44.]

PPC MPC8568 MDS, 16k data

[Bar chart. X axis: number of threads/processes (5, 10, 20). Y axis: context switch time in usec. Series: Threads, Processes. Bar labels: 13.18, 13.35, 17.77, 13.7, 14.25, 17.87.]

PPC MPC8568 MDS, 128k data

[Bar chart. X axis: number of threads/processes (5, 10, 20). Y axis: context switch time in usec. Series: Threads, Processes. Bar labels: 167.88, 222.41, 227.68, 178.06, 227.63, 228.23.]

Conclusions

Context switch times differ between threads and processes.

It is not a priori obvious that threads are better.
  The difference is quite small.
  The results vary between architectures and platforms.

Why do we really use threads?

Why Do People Use Threads?

It's the API, Silly.
  The POSIX thread API offers a simple mental model.

API Complexity: Task Creation

fork()
  Zero parameters.
  Set everything yourself after process creation.
  New process begins with a virtual copy of the parent, at the same location.
  Copy-on-write semantics require shared memory setup.

pthread_create()
  Can specify most attributes for the new thread at creation.
  Can specify the function for the new thread to start with.
  Easy-to-grasp address space sharing.
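For comparison with fork(), this sketch shows the pthread_create() side: attributes and the start function are supplied at creation time, and the argument is simply a pointer into the shared address space. The stack size of 256 KiB is an arbitrary illustrative value, and run_task_in_thread() is a made-up name.

```c
#include <pthread.h>
#include <stddef.h>

/* Start routine: doubles the integer it is handed, in place. */
static void *task(void *arg)
{
    int *value = arg;
    *value *= 2;
    return NULL;
}

/* Run task() in a new thread with explicit attributes.
 * Returns the doubled value, or -1 if the thread could not be created. */
int run_task_in_thread(int input)
{
    pthread_attr_t attr;
    pthread_t tid;
    int value = input;

    if (pthread_attr_init(&attr) != 0)
        return -1;

    /* Attributes are fixed before the thread exists. */
    pthread_attr_setstacksize(&attr, 256 * 1024);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    int err = pthread_create(&tid, &attr, task, &value);
    pthread_attr_destroy(&attr);
    if (err != 0)
        return -1;

    /* The thread wrote directly into our local variable. */
    pthread_join(tid, NULL);
    return value;
}
```

With fork(), the equivalent settings (resource limits, scheduling, and so on) would be applied by the child itself after creation, and the result would have to travel back through some IPC channel.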

API Complexity: Shared Memory

Threads share all the memory.

You can easily set up a segment of shared memory for processes using shm_open() and mmap().
  Share only what you need!

BUT... each process may map the shared segment at a different virtual address.
  Pointers into shared memory cannot be shared!
  A simple linked list becomes complex.
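One common workaround for the pointer problem is to store offsets from the start of the segment instead of raw pointers. The sketch below builds a three-node list that way. It uses an anonymous shared mapping plus fork() for brevity, so here the segment happens to sit at the same address in both processes; the offsets are what would keep the list valid if each process attached the segment on its own. All type and function names are invented for the example.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NIL ((size_t)-1)

struct node {
    int value;
    size_t next;          /* offset of the next node from the segment base, or NIL */
};

struct segment {
    size_t head;          /* offset of the first node, or NIL */
    struct node nodes[3];
};

/* Convert an offset into a pointer valid in the calling process. */
static struct node *node_at(struct segment *seg, size_t off)
{
    return (struct node *)((char *)seg + off);
}

/* A child builds the list 1 -> 2 -> 3 in shared memory; the parent walks it.
 * Returns the sum of the values, or -1 on error. */
int shared_list_sum(void)
{
    struct segment *seg = mmap(NULL, sizeof(*seg), PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (seg == MAP_FAILED)
        return -1;
    seg->head = NIL;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(seg, sizeof(*seg));
        return -1;
    }
    if (pid == 0) {
        /* Push nodes at the head, last value first. */
        for (int i = 2; i >= 0; i--) {
            seg->nodes[i].value = i + 1;
            seg->nodes[i].next = seg->head;
            seg->head = offsetof(struct segment, nodes) +
                        (size_t)i * sizeof(struct node);
        }
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    int sum = 0;
    for (size_t off = seg->head; off != NIL; off = node_at(seg, off)->next)
        sum += node_at(seg, off)->value;

    munmap(seg, sizeof(*seg));
    return sum;
}
```

Every traversal step goes through node_at(), which is the extra complexity the slide refers to.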

API Complexity: File Handles

In Unix, everything is a file.

Threads share file descriptors. Processes do not,
  although a Unix domain socket can be used to pass file descriptors between processes.

System V semaphore undo values are not shared between processes either, whereas threads share them.

This can make life more complicated.
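Passing a descriptor over a Unix domain socket uses an SCM_RIGHTS control message. The following sketch, with hypothetical helper names send_fd(), recv_fd() and fd_passing_demo(), has a child create a pipe after the fork (so the parent cannot have inherited it), write one byte into it, and hand the read end to the parent.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

static int send_fd(int sock, int fd)
{
    char byte = 'F';                       /* at least one data byte must be sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Returns 0 if the parent could read the child's byte through the passed fd. */
int fd_passing_demo(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int p[2];
        close(sv[0]);
        if (pipe(p) < 0 || write(p[1], "x", 1) != 1 || send_fd(sv[1], p[0]) < 0)
            _exit(1);
        _exit(0);
    }
    close(sv[1]);
    int fd = recv_fd(sv[0]);               /* a new descriptor number in this process */
    char c = 0;
    int ok = fd >= 0 && read(fd, &c, 1) == 1 && c == 'x';
    if (fd >= 0)
        close(fd);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return ok ? 0 : -1;
}
```

Compared with threads, where an open() in one thread is immediately usable by all others, this is a fair amount of ceremony for each descriptor that needs sharing.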

Processes API Advantages

Processes
  PID visible in the system.
  Can set the process name via program_invocation_name.
  Easy to identify a process in the system.

Threads
  No way to give a task a unique name.
  Kernel thread ID is not related to the internal thread handle.
  Difficult to identify a thread in the system.
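The gap between the pthread handle and the kernel's view can be seen with the Linux-specific gettid system call, invoked here through syscall() since not every C library exposes a wrapper. This is a Linux-only sketch, and thread_has_own_kernel_id() is a name invented for the example.

```c
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Runs in the new thread: record the kernel's ID for this thread. */
static void *report_tid(void *arg)
{
    pid_t *out = arg;
    *out = (pid_t)syscall(SYS_gettid);
    return NULL;
}

/* Returns 1 if a newly created thread has a kernel thread ID that differs
 * from the process ID, 0 if not, and -1 if the thread could not be created. */
int thread_has_own_kernel_id(void)
{
    pthread_t handle;          /* opaque library handle, not a kernel ID */
    pid_t worker_tid = 0;

    if (pthread_create(&handle, NULL, report_tid, &worker_tid) != 0)
        return -1;
    pthread_join(handle, NULL);

    return worker_tid > 0 && worker_tid != getpid();
}
```

The kernel ID is the number that shows up under /proc/<pid>/task/, while pthread_t is the only value the POSIX API hands back, and nothing in that API maps one to the other.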

The CoProc Library

CoProc is a proof-of-concept library that provides an API implementing share-as-you-need semantics for tasks.
  A wrapper around Linux clone() and friends.

Co-processes offer a golden path between traditional threads and processes.

Check out the code at: http://github.com/gby/coproc
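What "share as you need" can mean at the clone() level is sketched below. This is not CoProc code; it only illustrates the underlying mechanism, assuming Linux and glibc. The child shares the file descriptor table with its parent (CLONE_FILES) but not memory (no CLONE_VM), so a close() in the child is visible to the parent while a write to a global is not. The function and variable names are made up.

```c
#define _GNU_SOURCE        /* for clone() and CLONE_* flags */
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

static int memory_flag = 0;    /* lives in memory, which is NOT shared here */

static int child_fn(void *arg)
{
    int fd = *(int *)arg;

    memory_flag = 1;           /* private copy: no CLONE_VM */
    close(fd);                 /* shared table: CLONE_FILES */
    return 0;
}

/* Returns 1 if the child's close() reached the parent but its memory
 * write did not, 0 otherwise, and -1 on setup failure. */
int clone_share_files_only(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (!stack) {
        close(fd);
        return -1;
    }

    /* The stack grows down on most architectures, so pass its top. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      CLONE_FILES | SIGCHLD, &fd);
    if (pid < 0) {
        free(stack);
        close(fd);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);

    /* F_GETFD fails with EBADF if the descriptor was closed by the child. */
    int fd_closed = fcntl(fd, F_GETFD) == -1;
    return fd_closed && memory_flag == 0;
}
```

fork() corresponds roughly to sharing nothing and pthread_create() to sharing nearly everything; picking individual CLONE_* flags covers the space in between.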

CoProc Highlights

A managed shared memory segment, guaranteed to be mapped at the same virtual address.

Coproc PID and name visible to the system.

Decide whether to share file descriptors at coproc creation time.

Set attributes at coproc creation time:
  detached/joinable, address space size, core size, max CPU time, stack size, scheduling policy, priority, and CPU affinity supported.

CoProc API Overview

int coproc_init(size_t shm_max_size);

pid_t coproc_create(char * coproc_name, struct coproc_attributes * attrib, int flags, int (* start_routine)(void *), void * arg);

int coproc_exit(void);

void * coproc_alloc(size_t size);

void coproc_free(void * ptr);

int coproc_join(pid_t pid, int * status);

CoProc Attributes

struct coproc_attributes {
  rlim_t address_space_size;    /* The maximum size of the process address space in bytes */
  rlim_t core_file_size;        /* Maximum size of core file */
  rlim_t cpu_time;              /* CPU time limit in seconds */
  rlim_t stack_size;            /* The maximum size of the process stack, in bytes */
  int    scheduling_policy;     /* Scheduling policy. */
  int    scheduling_param;      /* Scheduling priority or nice level */
  cpu_set_t cpu_affinity_mask;  /* The CPU mask of the co-proc */
};

CoProc Simple Usage Example

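  pid_t pid;
  int ret;
  char * test_mem;
  struct coproc_attributes test_coproc_attr = { ... };

  /* Set up a 1 MiB managed shared memory segment. */
  coproc_init(1024 * 1024);

  /* Allocate from the shared segment so the co-process can see it. */
  test_mem = coproc_alloc(1024);
  if(!test_mem) abort();

  pid = coproc_create("test_coproc", &test_coproc_attr,
    COPROC_SHARE_FS, test_coproc_func, test_mem);
  if(pid < 0) abort();

  /* Wait for the co-process and collect its exit status. */
  coproc_join(pid, &ret);

  coproc_free(test_mem);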

Credits

The author wishes to acknowledge the contribution of the following people:
  Larry McVoy and Carl Staelin, for LMBench.
  Joel Isaacson, for an eye-opening paper about a different approach to the same issue.
  Rusty Russell, for libantithread, yet another approach to the same issue.
  Free Electrons, for the Virt/Phy slide.
  Sergio Leone, Clint Eastwood, Lee Van Cleef, and Eli Wallach, for the movie :-)

Thank You For Listening!

Codefidence Ltd.: http://codefidence.com
Community site: http://tuxology.net
Email: [email protected]
Phone: +972-52-8260388
Twitter: @giladby
SIP: [email protected]
Skype: gilad_codefidence

Questions?