
  • page 1

    CA4/CAE4

    Concurrent Programming

    Dr. Martin Crane

    Recommended Text:

“Foundations of Multithreaded, Parallel and Distributed Programming”, G.R. Andrews, Addison-Wesley, 2000. ISBN 0-201-35752-6

    Additional Texts:

“Principles of Concurrent and Distributed Programming”, M. Ben-Ari, Prentice Hall, 1990.

“The SR Programming Language, Concurrency in Practice”, G.R. Andrews and R.A. Olsson, Benjamin/Cummings, 1993.

“Using MPI: Portable Parallel Programming with the Message Passing Interface”, W. Gropp, E. Lusk, A. Skjellum, 2nd Edition, MIT Press 1999.

  • page 2

    Course Outline

    • Introduction to Concurrent Processing

    • Critical Sections and Mutual Exclusion

    • Semaphores

    • Monitors

    • Message Passing (Java and MPI)

    • Remote Procedure Calls (RPC)

    • Rendezvous

• Languages for Concurrent Processing (SR, Java, Linda)

    • Load Balancing and Resource Allocation

    • Fault Tolerance

  • page 3

Parallel vs. Distributed Computing

The individual processors of a distributed system are physically distributed (loosely coupled), whereas parallel systems are “in the same box” (tightly coupled). Both are examples of multiprocessing, but distributed computing introduces extra issues.

    Why bother?

    The advantages of concurrent processing are:

    1) Faster processing

    2) Better resource usage

    3) Fault tolerance

[Diagram: concurrent computing encompasses both parallel computing and distributed computing.]

  • page 4

    Types of Concurrent Processing

Overlapping I/O and Processing

    Multi-Programming

One processor time-slices between several programs.

[Timing diagrams: overlapping I/O and processing, where the main computation stays active while I/O proceeds; and multi-programming, where the processor runs P1 while P2 waits on its I/O and vice versa, alternating active and waiting phases.]

  • page 5

    Multi-tasking

This is a generalisation of the multi-programming concept. Each program is decomposed into a set of concurrent tasks. The processor time-slices between all the tasks.

    Multi-Processing

The set of concurrent tasks is processed by a set of interconnected processors. Each processor may have more than one task allocated to it, and may time-slice between its allocated tasks.

    Applications of Concurrent Processing

    Predictive Modelling and Simulation

Weather forecasting, Oceanography and astrophysics, Socioeconomics and governmental use, etc.

Engineering Design and Automation

Finite-element analysis, Computational Aerodynamics, Artificial Intelligence (image processing, pattern recognition, speech understanding, expert systems, etc.), Remote Sensing, etc.

  • page 6

    Energy Resource Exploration

Seismic Exploration, Reservoir Modelling, Plasma Fusion Power, Nuclear Reactor Safety, etc.

    Medical

    Computer-assisted Tomography.

    Military

Weapons Research, Intelligence gathering, surveillance, etc.

    Research

Computational Chemistry and Physics, Genome Research, VLSI analysis and design, etc.

    Parallel Speed-up

    Minsky’s Conjecture

Minsky argued that algorithms such as binary search, branch and bound, etc. would give the best speed-up. All processors would be used in the 1st iteration, half in the 2nd, and so on, giving a speed-up of log2 P.

  • page 7

    Amdahl’s Law

Gene Amdahl divided a program into 2 sections, one that is inherently serial and the other which can be parallel. Let α be the fraction of the program which is inherently serial. Then,

Speed-up S = T_S / T_P = P / (1 + (P − 1)α) ≤ 1/α,  ∀P, α > 0

    If α = 5%, then S ≤ 20.
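As a quick numerical check of the formula (a minimal Python sketch, not part of the original notes):

```python
def amdahl_speedup(alpha, p):
    """Amdahl's law: speed-up on p processors with serial fraction alpha."""
    return p / (1 + (p - 1) * alpha)

print(amdahl_speedup(0.05, 16))    # ~9.14
print(amdahl_speedup(0.05, 1024))  # ~19.64
print(1 / 0.05)                    # the asymptotic bound: 20.0
```

Note how little the extra thousand processors buy once the serial fraction dominates.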

The Sandia Experiments

The Karp prize was established for the first program to achieve a speed-up of 200 or better. In 1988, a team from Sandia laboratories reported a speed-up of over 1,000 on a 1,024 processor system on three different problems.

    How?

Moler’s Law

An implicit assumption of Amdahl’s Law is that the fraction of the program which is inherently serial, α, is independent of the size of the program. The Sandia experiments showed this to be false. As the problem size

  • page 8

increased, the inherently serial parts of a program remained the same or increased at a slower rate than the problem size. So Amdahl’s law should be

S ≤ 1 / α(n)

so as the problem size, n, increases, α(n) decreases and S increases.

Moler defined an efficient parallel algorithm as one where α(n) → 0 as n → ∞ (Moler’s law).
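Moler’s point can be illustrated numerically. In this sketch, α(n) = 1/n is a hypothetical serial fraction chosen only to satisfy α(n) → 0; it is not taken from the notes:

```python
def moler_bound(alpha_n):
    """Upper bound 1/alpha(n) on speed-up when the serial fraction depends on n."""
    return 1 / alpha_n

# Hypothetical workload: O(n) serial setup inside an O(n**2) computation
# gives alpha(n) = n / n**2 = 1/n, which tends to 0 as n grows.
for n in (10, 100, 1000):
    print(n, moler_bound(1 / n))   # the achievable speed-up grows with n
```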

    Sullivan’s First Theorem

The speed-up of a program is min(P, C), where P is the number of processors and C is the concurrency of the program.

[Figure: an execution graph (directed acyclic graph) of the program’s operations.]

If N is the number of operations in the execution graph, and D is the longest path through the graph, then the concurrency C = N/D.

The maximum speed-up is a property of the structure of the parallel program.
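The concurrency C = N/D can be computed directly from an execution graph. A small Python sketch (the example graph is ours, chosen for illustration):

```python
from functools import lru_cache

def concurrency(edges, n_ops):
    """C = N/D: N operations in the execution graph (a DAG given as an
    adjacency dict), D = number of operations on the longest path."""
    @lru_cache(maxsize=None)
    def depth(node):
        succs = edges.get(node, [])
        return 1 + max((depth(s) for s in succs), default=0)

    d = max(depth(v) for v in range(n_ops))
    return n_ops / d

# Hypothetical graph: op 0 feeds ops 1..3, which all feed op 4.
# N = 5 operations, longest path 0 -> 1 -> 4 has D = 3, so C = 5/3.
g = {0: [1, 2, 3], 1: [4], 2: [4], 3: [4]}
print(concurrency(g, 5))
```

By Sullivan’s theorem the speed-up of this program is min(P, 5/3), however many processors we add.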

  • page 9

    Architectural Classification Schemes

    Flynn’s Classification

Based on the multiplicity of instruction streams and data streams.

    SISD Single Instruction Single Data

    SIMD Single Instruction Multiple Data

    Array processors

[Diagrams: SISD, where a single control unit (CU) issues one instruction stream (IS) to one processing unit (PU) with one memory module (MM); SIMD, where one CU broadcasts a single IS to several PUs, each processing its own data stream (DS) from its own MM.]

  • page 10

    MISD Multiple Instruction Single Data

Some say this is impractical - no MISD machine exists.

    MIMD Multiple Instruction Multiple Data

    Most parallel systems - tightly vs. loosely coupled

[Diagrams: MIMD tightly coupled, where several CU/PU pairs, each with its own IS, share a common pool of memory modules; MIMD loosely coupled, where each CU/PU pair has its own private MM and DS.]

  • page 11

    Processor Topologies

    Farm (Star)

    Ring

Used in image processing applications, or anything which uses a nearest-neighbour operator.

    Mesh

    Image processing, finite-element analysis, nearest-neighbour operations.

  • page 12

    Torus

    A variant of the mesh.

    Hypercube

    A structure with the minimum diameter property.

    3-d hypercube

    4-d hypercube

  • page 13

    A Model of Concurrent Programming

A concurrent program is the interleaving of sets of sequential atomic instructions.

A concurrent program can be considered as a set of interacting sequential processes. These sequential processes execute at the same time, on the same or different processors. The processes are said to be interleaved, that is, at any given time each processor is executing one of the instructions of the sequential processes. The relative rate at which the instructions of each process are executed is not important.

Each sequential process consists of a series of atomic instructions. An atomic instruction is an instruction that, once it starts, proceeds to completion without interruption. Different processors have different atomic instructions, and this can have a big effect.

N : Integer := 0;

task body P1 is
begin
    N := N + 1;
end P1;

task body P2 is
begin
    N := N + 1;
end P2;

  • page 14

If the processor includes instructions like INC then this program will be correct no matter which instruction is executed first. But if all arithmetic must be performed in registers then the following interleaving does not produce the desired results.

P1: load reg, N
P2: load reg, N
P1: add reg, #1
P2: add reg, #1
P1: store reg, N
P2: store reg, N

A concurrent program must be correct under all possible interleavings.
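The failing interleaving can be replayed deterministically; a small Python sketch, with plain assignments standing in for the register machine above:

```python
# Deterministic replay of the interleaving above: each process has its own
# register, but both load N before either stores, so one increment is lost.
N = 0
reg = {"P1": 0, "P2": 0}

reg["P1"] = N          # P1: load reg, N
reg["P2"] = N          # P2: load reg, N
reg["P1"] += 1         # P1: add reg, #1
reg["P2"] += 1         # P2: add reg, #1
N = reg["P1"]          # P1: store reg, N
N = reg["P2"]          # P2: store reg, N

print(N)  # 1, not the intended 2
```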

    Correctness

If P(ā) is a property of the input (pre-condition), and Q(ā, b̄) is a property of the input and output (post-condition), then correctness is defined as:

Partial correctness

P(ā) ∧ terminates(Prog(ā, b̄)) ⇒ Q(ā, b̄)

Total correctness

P(ā) ⇒ terminates(Prog(ā, b̄)) ∧ Q(ā, b̄)

  • page 15

Totally correct programs terminate. A totally correct specification of the incrementing tasks is:

(a ∈ ℕ) ⇒ terminates(INC(a, a′)) ∧ (a′ = a + 1)

There are 2 types of correctness properties:

Safety properties: these must always be true.

Mutual exclusion: two processes must not interleave certain sequences of instructions.

Absence of deadlock: deadlock is when a non-terminating system cannot respond to any signal.

Liveness properties: these must eventually be true.

Absence of starvation: information sent is delivered.

Fairness: any contention must be resolved.

There are 4 different ways to specify fairness.

Weak fairness: if a process continuously makes a request, eventually it will be granted.

Strong fairness: if a process makes a request infinitely often, eventually it will be granted.

  • page 16

Linear waiting: if a process makes a request, it will be granted before any other process is granted the request more than once.

FIFO: if a process makes a request, it will be granted before any other process makes a later request.

    Mutual Exclusion

A concurrent program must be correct in all allowable interleavings. Therefore there must be some sections of the different processes which cannot be allowed to be interleaved. These are called critical sections.

do true ->
    Non_Critical_Section
    Pre_protocol
    Critical_Section
    Post_protocol
od

  • page 17

    First proposed solution

var Turn: int := 1;

process P1
do true ->
    Non_Critical_Section
    do Turn != 1 -> od
    Critical_Section
    Turn := 2
od
end

process P2
do true ->
    Non_Critical_Section
    do Turn != 2 -> od
    Critical_Section
    Turn := 1
od
end

    This solution satisfies mutual exclusion.

This solution cannot deadlock, since both processes would have to loop on the test on Turn infinitely and fail. This would imply Turn = 1 and Turn = 2 at the same time.

  • page 18

There is no starvation. This would require one task to execute its critical section infinitely often and the other task to be stuck in its pre-protocol.

However this solution can fail in the absence of contention. If one process halts in its critical section the other process will always fail in its pre-protocol. Even if the processes are guaranteed not to halt, both processes are forced to execute at the same rate. This, in general, is not acceptable.

    Second proposed solution

The first attempt failed because both processes shared the same variable.

var C1: int := 1
var C2: int := 1

process P1
do true ->
    Non_Critical_Section
    do C2 != 1 -> od
    C1 := 0
    Critical_Section
    C1 := 1
od
end

  • page 19

process P2
do true ->
    Non_Critical_Section
    do C1 != 1 -> od
    C2 := 0
    Critical_Section
    C2 := 1
od
end

This unfortunately violates the mutual exclusion requirement. To prove this we need to find only one interleaving which allows P1 and P2 into their critical sections simultaneously. Starting from the initial state, we have:

P1 checks C2 and finds C2 = 1.
P2 checks C1 and finds C1 = 1.
P1 sets C1 = 0.
P2 sets C2 = 0.
P1 enters its critical section.
P2 enters its critical section.  QED
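The same six steps, replayed in Python to show both processes ending up inside their critical sections at once (an illustrative sketch, not part of the notes):

```python
# Deterministic replay of the interleaving above for the second proposed
# solution: both processes pass their pre-protocol test before either sets
# its flag, so mutual exclusion is violated.
C1, C2 = 1, 1
in_cs = []

assert C2 == 1          # P1 checks C2 and finds C2 = 1
assert C1 == 1          # P2 checks C1 and finds C1 = 1
C1 = 0                  # P1 sets C1 = 0
C2 = 0                  # P2 sets C2 = 0
in_cs.append("P1")      # P1 enters its critical section
in_cs.append("P2")      # P2 enters its critical section

print(in_cs)  # ['P1', 'P2'] -- both inside at once
```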

  • page 20

    Third proposed solution

The problem with the last attempt is that once the pre-protocol loop is completed you cannot stop a process from entering its critical section. So the pre-protocol loop should be considered as part of the critical section.

var C1: int := 1
var C2: int := 1

process P1
do true ->
    Non_Critical_Section    # a1
    C1 := 0                 # b1
    do C2 != 1 -> od        # c1
    Critical_Section        # d1
    C1 := 1                 # e1
od
end

process P2
do true ->
    Non_Critical_Section    # a2
    C2 := 0                 # b2
    do C1 != 1 -> od        # c2
    Critical_Section        # d2
    C2 := 1                 # e2
od
end

  • page 21

We can prove that the mutual exclusion property is valid. To do this we need to prove that the following equations are invariants.

C1 = 0 ≡ at(c1) ∨ at(d1) ∨ at(e1)    (eqn 1)
C2 = 0 ≡ at(c2) ∨ at(d2) ∨ at(e2)    (eqn 2)
¬(at(d1) ∧ at(d2))                   (eqn 3)

at(x) means that x is the next instruction to be executed in that process.

Eqn 1 is initially true. Only the b1→c1 and e1→a1 transitions can affect its truth. But each of these transitions also changes the value of C1.

    A similar proof is true for eqn 2.

Eqn 3 is initially true, and can only be made false by a c2→d2 transition while at(d1) is true. But by eqn 1, at(d1) ⇒ C1 = 0, so c2→d2 cannot occur since this requires C1 = 1. A similar proof holds for process P2.

    Fourth proposed solution

The problem with the last proposed solution was that once a process indicated its intention to enter its critical section, it also insisted on entering its critical section. What we need is some way for a process to relinquish its attempt if it fails to gain immediate access to its critical section, and try again.

  • page 22

var C1: int := 1
var C2: int := 1

process P1
do true ->
    Non_Critical_Section
    C1 := 0
    do true ->
        if C2 = 1 -> exit fi
        C1 := 1
        C1 := 0
    od
    Critical_Section
    C1 := 1
od
end

process P2
do true ->
    Non_Critical_Section
    C2 := 0
    do true ->
        if C1 = 1 -> exit fi
        C2 := 1
        C2 := 0
    od
    Critical_Section
    C2 := 1
od
end

  • page 23

    This proposal has two drawbacks.

1) A process can be starved. You can find an interleaving in which a process never gets to enter its critical section.

2) The program can livelock. This is a form of deadlock. In deadlock there is no possible interleaving which allows the processes to enter their critical sections. In livelock, some interleavings succeed, but there are sequences which do not.

Dekker’s Algorithm

This is a combination of the first and fourth proposals. The first proposal explicitly passed the right to enter the critical sections between the processes, whereas the fourth proposal had its own variable to prevent problems in the absence of contention. In Dekker’s algorithm the right to insist on entering a critical section is explicitly passed between processes.

var C1: int := 1
var C2: int := 1
var Turn: int := 1

  • page 24

process P1
do true ->
    Non_Critical_Section
    C1 := 0
    do true ->
        if C2 = 1 -> exit fi
        if Turn = 2 ->
            C1 := 1
            do Turn != 1 -> od
            C1 := 0
        fi
    od
    Critical_Section
    C1 := 1
    Turn := 2
od
end

process P2
do true ->
    Non_Critical_Section
    C2 := 0
    do true ->
        if C1 = 1 -> exit fi
        if Turn = 1 ->
            C2 := 1
            do Turn != 2 -> od
            C2 := 0
        fi
    od
    Critical_Section

  • page 25

    C2 := 1
    Turn := 1
od
end

This is a solution for mutual exclusion for 2 processes.
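Dekker’s algorithm can be transcribed into Python threads as a sanity check. This is an illustrative sketch, not the course’s SR notation: wants[i] = True plays the role of Ci = 0, turn plays Turn, the iteration count is ours, and CPython’s interpreter lock supplies the sequentially consistent memory the algorithm assumes (on real hardware you would need memory barriers):

```python
import sys
import threading

sys.setswitchinterval(1e-4)  # preempt often so the busy-waits stay short

wants = [False, False]   # wants[i]: process i wants to enter (plays Ci)
turn = 0                 # whose turn it is to insist (plays Turn)
counter = 0              # shared variable updated in the critical section

def worker(me):
    global turn, counter
    other = 1 - me
    for _ in range(1000):
        wants[me] = True
        while wants[other]:          # contention: someone else wants in
            if turn == other:        # not our turn, so back off...
                wants[me] = False
                while turn == other:
                    pass             # ...and busy-wait for our turn
                wants[me] = True
        counter += 1                 # critical section
        turn = other                 # post-protocol: pass the turn
        wants[me] = False

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 2000 -- no lost updates
```

Without the protocol, the two unprotected counter += 1 updates could interleave exactly as in the register-machine example earlier.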

    Mutual Exclusion for N Processes

There are many N-process mutual exclusion algorithms, all complicated and slow relative to other methods. One such algorithm is the Bakery Algorithm. The idea is that each process takes a numbered ticket (whose value constantly increases) when it wants to enter its critical section. The process with the lowest current ticket gets to enter its critical section.

var Choosing: [N] int
var Number: [N] int

# Choosing and Number arrays initialised to zero

process P(i: int)
do true ->
    Non_Critical_Section
    Choosing[i] := 1
    Number[i] := 1 + max(Number)

  • page 26

    Choosing[i] := 0
    fa j := 1 to N ->
        if j != i ->
            do Choosing[j] != 0 -> od
            do true ->
                if (Number[j] = 0) or
                   (Number[i] < Number[j]) or
                   ((Number[i] = Number[j]) and (i < j)) ->
                    exit
                fi
            od
        fi
    af
    Critical_Section
    Number[i] := 0
od
end
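The entry test in the loop above totally orders the waiting processes by the pair (ticket, process index). A small Python sketch of just that comparison; the ticket values are hypothetical:

```python
def may_pass(i, number, j):
    """Entry test from the bakery algorithm: process i may pass process j
    if j holds no ticket, or i's ticket is lower, or the tickets tie and
    i has the smaller process index (the tie-break makes the order total)."""
    return (number[j] == 0 or
            number[i] < number[j] or
            (number[i] == number[j] and i < j))

number = [3, 3, 0, 5]              # hypothetical tickets for processes 0..3
print(may_pass(0, number, 1))      # True: tie on 3, index 0 beats index 1
print(may_pass(1, number, 0))      # False: process 1 must wait for 0
print(may_pass(1, number, 2))      # True: process 2 holds no ticket
print(may_pass(0, number, 3))      # True: lower ticket wins
```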

    The bakery algorithm is not practical because:

a) the ticket numbers will be unbounded if some process is always in its critical section, and

b) even in the absence of contention it is very inefficient, as each process must query the other processes for their ticket number.

  • page 27

    Hardware-Assisted Mutual Exclusion

If the atomic instructions available allow a load and store in a single atomic instruction, all our problems disappear. For example, if there is an atomic test-and-set instruction equivalent to li := C; C := 1 in one instruction, then we could have mutual exclusion as follows.

var C: int := 0

process P
var li: int
do true ->
    Non_Critical_Section
    li := 1                 # force at least one test_and_set per entry
    do li != 0 -> test_and_set(li) od
    Critical_Section
    C := 0
od
end

A similar solution exists with atomic exchange instructions.
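The test-and-set loop can be sketched with Python threads. Python has no atomic test-and-set instruction, so a private threading.Lock below stands in for the hardware’s atomicity guarantee; the class and all names are ours:

```python
import sys
import threading

sys.setswitchinterval(1e-4)   # preempt often so the spin loops stay short

class TestAndSet:
    """Models an atomic test-and-set: li := C; C := 1 in one step."""
    def __init__(self):
        self._c = 0
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def test_and_set(self):
        with self._atomic:
            li, self._c = self._c, 1      # li := C; C := 1
            return li

    def clear(self):
        self._c = 0                       # C := 0 on leaving the critical section

flag = TestAndSet()
counter = 0

def worker():
    global counter
    for _ in range(500):
        while flag.test_and_set() != 0:
            pass                          # spin until we observed C = 0
        counter += 1                      # critical section
        flag.clear()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 2000
```

Unlike the bakery algorithm, acquisition here costs one atomic operation in the uncontended case, which is why hardware support makes the software protocols above largely unnecessary.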