CSEP505: Programming Languages Lecture 8: Wrap-Up Types; Start Concurrency Dan Grossman Winter 2009


26 February 2009 CSE P505 Winter 2009 Dan Grossman 2

Where were we

• Covered the use of type variables to increase expressiveness
  – Generics = “for all”
  – Abstract types in interfaces = “there exists”
  – For both, a type variable can be “any type”, but the same type variable in the same scope must be the “same type”

• Now:
  – 1 more place existentials come up (implementing closures)
  – ML-style type inference
  – Some odds and ends so I’m not lying
  – Combining parametric and subtype polymorphism

• Then: PL support for concurrency and parallelism


Closures & Existentials

• There’s a deep connection between existential types and how closures are (1) used and (2) compiled

• Callbacks are the canonical example:

(* interface *)
val onKeyEvent : (int -> unit) -> unit

(* implementation *)
let callbacks : (int -> unit) list ref = ref []

let onKeyEvent f = callbacks := f :: !callbacks

let keyPress i = List.iter (fun f -> f i) !callbacks
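To make the connection concrete, here is a hedged sketch of two clients registering callbacks; the names `count` and `log` are illustrative, not from the lecture. Each closure captures “private fields” of a different type, yet both have type int -> unit.

```ocaml
(* restatement of the slide's library so the example is self-contained *)
let callbacks : (int -> unit) list ref = ref []
let onKeyEvent f = callbacks := f :: !callbacks
let keyPress i = List.iter (fun f -> f i) !callbacks

(* Client 1: hidden environment is an int ref *)
let count = ref 0
let () = onKeyEvent (fun _ -> count := !count + 1)

(* Client 2: hidden environment is a string list ref *)
let log = ref []
let () = onKeyEvent (fun i -> log := string_of_int i :: !log)

(* both closures have type int -> unit; their environments differ *)
let () = keyPress 7; keyPress 9
```

After the two key presses, `count` holds 2 and `log` holds the two events as strings: the library never learns what each callback’s environment looks like.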


The connection

• Key to flexibility:
  – Each callback can have “private fields” of different types
  – But each callback has type int -> unit
  – There exists an environment of some type

• In C, we don’t have closures or existentials, so we use void* (next slide)
  – Clients must downcast their environment
  – Clients must assume the library passes back the correct environment


Now in C

/* interface */
typedef struct {
  void* env;
  void (*f)(void*, int);
} * cb_t;
void onKeyEvent(cb_t);

/* implementation (assuming a list library) */
list_t callbacks = NULL;
void onKeyEvent(cb_t cb) {
  callbacks = cons(cb, callbacks);
}
void keyPress(int i) {
  for(list_t lst = callbacks; lst; lst = lst->tl)
    lst->hd->f(lst->hd->env, i);
}

/* clients: full of casts to/from void* */


The type we want

• The cb_t type should be an existential (not a forall):

• Client does a “pack” to make the argument for onKeyEvent
  – Must “show” the types match up
• Library does an “unpack” in the loop
  – Has no choice but to pass each cb_t function pointer its own environment
• See Cyclone if curious (syntax isn’t pretty though)

/* interface using existentials (not C) */
typedef struct { ∃α. α env; void (*f)(α, int); } * cb_t;
void onKeyEvent(cb_t);


Where are we

• Done: understand subtyping
• Done: understand “universal” types and “existential” types
• Now: making universal types easier to use but less powerful
  – Type inference
  – Reconsider first-class polymorphism / polymorphic recursion
  – Polymorphic-reference problem
  – Combining parametric and subtype polymorphism


The ML type system

• Called “Algorithm W” or “Hindley–Milner inference”
• In theory, inference “fills out explicit types”
  – Complete: it finds an explicit typing whenever one exists
• In practice, often merge inference and checking

An algorithm best understood by example…
  – Then describe the type system for which it infers types
  – Yes, this is backwards: how does it do it, before defining it


Example #1

let f x = let (y,z) = x in (abs y) + z


Example #2

let rec sum lst = match lst with [] -> 0 |hd::tl -> hd + (sum tl)


Example #3

let rec length lst = match lst with [] -> 0 |hd::tl -> 1 + (length tl)
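For reference, a hedged sketch of the types inference produces for these two examples, written explicitly: sum constrains the elements to int (because of +), while length never inspects the elements, so its element type generalizes.

```ocaml
(* explicit versions of the types OCaml infers for Examples #2 and #3 *)
let rec sum (lst : int list) : int =
  match lst with [] -> 0 | hd :: tl -> hd + sum tl

let rec length (lst : 'a list) : int =
  match lst with [] -> 0 | _ :: tl -> 1 + length tl

(* length can be used at any element type; sum only at int *)
let s = sum [1; 2; 3]
let n1 = length [true; false]
let n2 = length ["a"]
```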


Example #4

let compose f g x = f (g x)
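A hedged sketch of the result for this example: the most general type inference finds for compose, written out as an explicit annotation.

```ocaml
(* the most general type inference finds for compose *)
let compose : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b =
  fun f g x -> f (g x)

(* use it at concrete types: increment, then render as a string *)
let s = compose string_of_int (fun x -> x + 1) 41
```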


Example #5

let rec funnyCount f g lst1 lst2 =
  match lst1 with
    [] -> 0
  | hd::tl -> (if (f hd) then 1 else 0)
              + funnyCount g f lst2 tl

(* does not type-check:
let useFunny =
  funnyCount
    (fun x -> x=4)
    not
    [2;4;4]
    [true;false] *)
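Because ML recursion is monomorphic, the recursive call forces f and g (and lst1 and lst2) to have the same types, so inference yields ('a -> bool) -> ('a -> bool) -> 'a list -> 'a list -> int; that is why the int/bool call above is rejected. A hedged sketch, restating the slide’s function, with a call where both lists share an element type:

```ocaml
let rec funnyCount f g lst1 lst2 =
  match lst1 with
    [] -> 0
  | hd :: tl -> (if f hd then 1 else 0) + funnyCount g f lst2 tl

(* both lists have type int list, so this type-checks;
   the two predicates alternate as the lists alternate *)
let n = funnyCount (fun x -> x = 4) (fun x -> x > 2) [2; 4; 4] [3; 1]
```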


More generally

• Infer each let-binding or toplevel binding in order
  – Except for mutual recursion (do all at once)
• Give each variable a fresh “constraint variable”
• Add constraints for each subexpression
  – Very similar to the typing rules
• Circular constraints fail (so x x never typechecks)
• After inferring the let-body, generalize (unconstrained constraint variables become type variables)

Note: Actual implementations are much more efficient than “generate a big pile of constraints, then solve”
  – (can unify eagerly)


What this infers

“Natural” limitations of this algorithm: universal types, but

1. Only let-bound variables get polymorphic types
   – This is why let is not sugar for fun in Caml
2. No first-class polymorphism (all foralls all the way to the left)
3. No polymorphic recursion

Unnatural limitation imposed for soundness reasons we will see:

4. “Value restriction”: let x = e1 in e2 gives x a polymorphic type only if e1 is a value or a variable
   – Includes e1 being a function, but not a partial application
   – Caml has recently relaxed this slightly in some cases
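A hedged OCaml sketch of limitation 4: ref [] is an application, not a value, so it is not generalized, and a partial application is likewise not generalized; eta-expanding it restores polymorphism because a function is a value. The names `pair`, `p1`, `p2` are illustrative.

```ocaml
(* not generalized: x gets a weak type ('_weak1 list ref),
   usable at only one element type *)
let x = ref []
let () = x := [1]        (* x is now pinned to int list ref *)

let pair a b = (a, b)

(* partial application: not a value, so p1's type is weak *)
let p1 = pair 1

(* eta-expanded: a function is a value, so p2 is fully polymorphic *)
let p2 b = pair 1 b

(* p2 can be used at several types; p1 gets pinned at its first use *)
let u1 = p1 true
let u2 = p2 "s"
let u3 = p2 3
```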


Why?

• These restrictions are usually tolerable
• Polymorphic recursion makes inference undecidable
  – Proven in 1992
• First-class polymorphism makes inference undecidable
  – Proven in 1995
• Note: Type inference for ML is efficient in practice, but not in theory: a program of size n and run-time n can have a type of size O(2^(2^n))
• The value restriction is one way to prevent an unsoundness with references


Given this…

Subject to these 4 limitations, inference is perfect:

• It gives every expression the most general type it possibly can
  – Not all type systems even have most-general types
• So every program that can type-check can be inferred
  – That is, explicit type annotations are never necessary
  – Exceptions are related to the “value restriction”
    • Make the programmer specify a non-polymorphic type


Going beyond

“Good” extensions to ML still being considered

A case study for “what matters” for an extension:

• Soundness: Does the system still have its “nice properties”?
• Conservatism: Does the system still typecheck every program it used to?
• Power: Does the system typecheck “a lot” of new programs?
• Convenience: Does the system not require “too many” explicit annotations?


Where are we

• Done: understand subtyping
• Done: understand “universal” types and “existential” types
• Now: making universal types easier to use but less powerful
  – Type inference
  – Reconsider first-class polymorphism / polymorphic recursion
  – Polymorphic-reference problem
• Then: Bounded parametric polymorphism
  – Synergistic combination of universal types and subtyping
• Then onto concurrency (more than enough types!)


Polymorphic references

A sound type system cannot accept this program:

let x = ref [] in
x := 1 :: [];
match !x with
  [] -> ()
| hd :: _ -> hd ^ “gotcha”

But it would assuming this interface:

type 'a ref
val ref : 'a -> 'a ref
val ! : 'a ref -> 'a
val := : 'a ref -> 'a -> unit


Solutions

Must restrict the type system. Many ways exist:

1. “Value restriction”: ref [] cannot have a polymorphic type
   – A syntactic look for ref types is not enough
2. Let ref [] have type (∀α. α list) ref
   – Not useful, and not an ML type
3. Tell the type system “mutation is special”
   – Not “just another library interface”


Where are we

• Done: understand subtyping
• Done: understand “universal” types and “existential” types
• Now: making universal types easier to use but less powerful
  – Type inference
  – Reconsider first-class polymorphism / polymorphic recursion
  – Polymorphic-reference problem
  – Combining parametric and subtype polymorphism


Why bounded polymorphism

Could one language have τ1 ≤ τ2 and ∀α. τ ?
  – Sure! They’re both useful and complementary
  – But how do they interact?

1. When is ∀α. τ1 ≤ ∀β. τ2 ?
2. What about bounds?

let dblL1 x = x.l1 <- x.l1*2; x
  – Subtyping: dblL1 : {l1=int} → {l1=int}
    • Can pass a subtype, but the result type loses a lot
  – Polymorphism: dblL1 : ∀α. α → α
    • Lose nothing, but the body doesn’t type-check


What bounded polymorphism

The type we want: dblL1 : ∀α≤{l1=int}. α → α

Java and C# generics have this (different syntax)

Key ideas:
• A bounded polymorphic function can use subsumption as specified by the constraint
• Instantiating a bounded polymorphic function must satisfy the constraint


Subtyping revisited

When is ∀α≤τ1. τ2 ≤ ∀α≤τ3. τ4 ?
• Note: already “alpha-converted” to use the same type variable

Sound answer:
• Contravariant bounds (τ3 ≤ τ1)
• Covariant bodies (τ2 ≤ τ4)

Problem: Makes subtyping undecidable (1992; surprised many)

Common workarounds:
• Require invariant bounds (τ3 ≤ τ1 and τ1 ≤ τ3)
• Some ad hoc approximation


Onward

• That’s the end of the “types part” of the course
  – Which wasn’t all about types
  – And other parts don’t totally ignore types


Concurrency

• PL support for concurrency is a huge topic
  – And increasingly important (used to skip it entirely)
• We’ll just do explicit threads plus
  – Shared memory (barriers, locks, and transactions)
  – Synchronous message-passing (CML)
  – Transactions last (wrong logic, but CML is hw5)
• Skipped topics
  – Futures
  – Asynchronous methods (joins, tuple-spaces, …)
  – Data-parallel (vector) languages
  – …


Threads

(* thread.mli; compile with -vmthread threads.cma *)
type t (* a thread handle *)
val create : ('a -> 'b) -> 'a -> t (* run new thread *)
val self : unit -> t (* which thread am I? *)
…

Code for a thread is in a closure (with hidden fields), and Thread.create actually spawns the thread.

Most languages make the same distinction, e.g., Java:
• Create a Thread object (just the code and data)
• Call its start method to actually spawn the thread

High-level: “Communicating sequential processes”
Low-level: “Multiple stacks plus communication”


Why use threads?

Why? Any one of:
  – Performance (multiprocessor or mask I/O latency)
  – Isolation (separate errors or responsiveness)
  – Natural code structure (1 stack not enough)

It’s not just performance.

Useful terminology not widely enough known:
• Concurrency: Respond to external events in a timely fashion
• Parallelism: Increase throughput via extra computational resources

The current Caml implementation doesn’t support parallelism
  – F# does (via the CLR)
  – The hard part is concurrent garbage collection


Preemption

• We’ll assume pre-emptive scheduling
  – The running thread can be stopped whenever
  – yield : unit -> unit is a semantic no-op (a “hint”)
• Because threads may interleave arbitrarily and communicate, execution is non-deterministic
  – With shared memory, via reads/writes
  – With message passing, via shared channels


A “library”?

“Threads Cannot Be Implemented as a Library”, Hans-J. Boehm, PLDI 2005

• Does not mean you need new language constructs
  – thread.mli, mutex.mli, condition.mli is fine
• Does mean the compiler must know threads exist
• (See the paper for more compelling examples, e.g., C bit-fields)

int x=0, y=0;
void f1() { if(x) ++y; }
void f2() { if(y) ++x; }
/* main: run f1, f2 concurrently */
/* can the compiler implement f2 as: ++x; if(!y) --x; ? */


Communication

If threads do nothing other threads “see”, we are done
  – Best to do as little communication as possible
  – E.g., don’t mutate shared data unnecessarily, or hide mutation behind easier-to-use interfaces

One way to communicate: shared memory
• One thread writes to a ref, another reads it
• Sounds nasty with pre-emptive scheduling
• Hence synchronization mechanisms
  – Taught in O/S for historical reasons!
  – Fundamentally about restricting interleavings


Join

“Fork-join” parallelism
• A simple approach, good for “farm out independent subcomputations, then merge results”

(* suspend caller until/unless arg terminates *)
val join : Thread.t -> unit

Common pattern (in C syntax; Caml also simple):

data_t data[N]; result_t results[N]; thread_t tids[N];
for(i=0; i < N; ++i)
  tids[i] = create(f, &data[i], &results[i]);
for(i=0; i < N; ++i)
  join(tids[i]);
// now use/merge results
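A hedged OCaml version of the same fork-join pattern, assuming the threads library is linked; `parallel_map` is an illustrative name, not a standard function. Each worker writes its own array cell, and every join happens before any result is read.

```ocaml
(* fork one thread per element, join them all, then merge results *)
let parallel_map f data =
  let n = Array.length data in
  let results = Array.make n None in
  let tids =
    Array.init n (fun i ->
      Thread.create (fun i -> results.(i) <- Some (f data.(i))) i) in
  Array.iter Thread.join tids;        (* wait for every worker *)
  Array.map (function Some r -> r | None -> assert false) results

let squares = parallel_map (fun x -> x * x) [| 1; 2; 3; 4 |]
```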


Locks (a.k.a. mutexes)

• Caml locks do not have two common features:
  – Reentrancy (changes semantics of lock)
  – Banning non-holder release (changes semantics of unlock)
• Also want condition variables (see condition.mli)
  – Also known as wait/notify or wait/pulse

(* mutex.mli *)
type t (* a mutex *)
val create : unit -> t
val lock : t -> unit (* may block *)
val unlock : t -> unit


Using locks

Among the infinite number of correct idioms using locks (and even more incorrect ones), the most common:

• Determine what data must be “kept in sync”
• Always acquire a lock before accessing that data, and release it afterwards
• Have a partial order on all locks; a thread holding m1 can acquire m2 only if m1 < m2

Coarser locking (more data with the same lock) trades off parallelism against synchronization
  – Related performance bug: false sharing
  – In general, think about “the object-to-lock mapping”
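A hedged sketch of the acquire/access/release idiom above, assuming the threads library: several threads bump a shared counter under one mutex, so the final count is exact.

```ocaml
let counter = ref 0
let m = Mutex.create ()

let bump () =
  Mutex.lock m;               (* acquire before access *)
  counter := !counter + 1;    (* critical section *)
  Mutex.unlock m              (* release afterwards *)

let () =
  let worker () = for _ = 1 to 1000 do bump () done in
  let ts = Array.init 4 (fun _ -> Thread.create worker ()) in
  Array.iter Thread.join ts
```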


Example

type acct = { lk : Mutex.t; bal : float ref; avail : float ref }

let mkAcct () = { lk = Mutex.create (); bal = ref 0.0; avail = ref 0.0 }

let get a f = (* return type unit *)
  Mutex.lock a.lk;
  (if !(a.avail) > f then begin
     a.bal := !(a.bal) -. f;
     a.avail := !(a.avail) -. f
   end);
  Mutex.unlock a.lk

let put a f = (* return type unit *)
  Mutex.lock a.lk;
  a.bal := !(a.bal) +. f;
  a.avail := !(a.avail) +. (if f < 500. then f else 500.);
  Mutex.unlock a.lk


Getting it wrong

Races result from too little synchronization
• Data races: simultaneous read-write or write-write of the same data
  – Lots of PL work in the last 10 years on types and tools to prevent/detect them
  – Provided the language has some guarantees (not C++), a data race may not be a bug
• Canonical example: parallel search and “done” bits
• Higher-level races are much tougher for the PL to help with
  – The amount of non-determinism is problem-specific

Deadlock results from too much synchronization
• A cycle of threads waiting for each other
• Easy to detect dynamically, but then what?


The evolution problem

Even if you get locking right today, tomorrow’s code change can have drastic effects

• “Every bank account has its own lock” works great until you want an “atomic transfer” function
  – One lock at a time: race
  – Both locks first: deadlock with a parallel untransfer
• Same idea in JDK 1.4 (documented in 1.5):

synchronized append(StringBuffer sb) {
  int len = sb.length();
  if(this.count + len > this.value.length)
    this.expand(…);
  sb.getChars(0, len, this.value, this.count);
  …
}
// length and getChars also synchronized


Where are we

• Thread creation

• Communication via shared memory
  – Synchronization with join, locks
• Message passing à la Concurrent ML
  – Very elegant
  – First done for Standard ML, but available in several functional languages
  – Can wrap synchronization abstractions to make new ones
  – In my opinion, quite under-appreciated
• Back to shared memory for software transactions


The basics

• Send and receive return “events” immediately
• Sync blocks until “the event happens”
• Separating these is key (explained in a few slides)

(* event.mli; Caml's version of CML *)
type 'a channel (* messages passed on channels *)
val new_channel : unit -> 'a channel

type 'a event (* when sync'ed on, get an 'a *)
val send : 'a channel -> 'a -> unit event
val receive : 'a channel -> 'a event
val sync : 'a event -> 'a


Simple version

let sendNow ch a = sync (send ch a) (* block *)
let recvNow ch = sync (receive ch) (* block *)

Helper functions to define blocking sends/receives
• A message is sent when one thread sends and another receives
• One will block waiting for the other

Note: In SML, the CML book, etc.:
  send = sendEvt
  receive = recvEvt
  sendNow = send
  recvNow = recv


Example

Make a thread to handle changes to a bank account
• mkAcct returns 2 channels for talking to the thread
• A more elegant/functional approach: loop-carried state

type action = Put of float | Get of float
type acct = action channel * float channel

let mkAcct () =
  let inCh = new_channel () in
  let outCh = new_channel () in
  let bal = ref 0.0 in (* state *)
  let rec loop () =
    (match recvNow inCh with (* blocks *)
       Put f -> bal := !bal +. f
     | Get f -> bal := !bal -. f); (* allows overdraw *)
    sendNow outCh !bal;
    loop () in
  ignore (Thread.create loop ());
  (inCh, outCh)


Example, continued

get and put functions use the channels:

let get acct f =
  let inCh, outCh = acct in
  sendNow inCh (Get f);
  recvNow outCh

let put acct f =
  let inCh, outCh = acct in
  sendNow inCh (Put f);
  recvNow outCh

(* the interface *)
type acct
val mkAcct : unit -> acct
val get : acct -> float -> float
val put : acct -> float -> float

Outside the module, clients don’t see threads or channels!
  – Cannot break the communication protocol
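Putting the last two slides together, a hedged, self-contained version using OCaml's Event module (Caml's CML), assuming the threads library is linked; the behavior (e.g., allowing overdraw) follows the slides.

```ocaml
(* blocking helpers over CML events *)
let sendNow ch a = Event.sync (Event.send ch a)
let recvNow ch = Event.sync (Event.receive ch)

type action = Put of float | Get of float
type acct = action Event.channel * float Event.channel

let mkAcct () =
  let inCh = Event.new_channel () in
  let outCh = Event.new_channel () in
  let bal = ref 0.0 in                     (* server-private state *)
  let rec loop () =
    (match recvNow inCh with               (* blocks for a request *)
       Put f -> bal := !bal +. f
     | Get f -> bal := !bal -. f);         (* allows overdraw *)
    sendNow outCh !bal;                    (* reply with new balance *)
    loop () in
  ignore (Thread.create loop ());
  (inCh, outCh)

let get (inCh, outCh) f = sendNow inCh (Get f); recvNow outCh
let put (inCh, outCh) f = sendNow inCh (Put f); recvNow outCh

(* a client: deposit 100, withdraw 40 *)
let a = mkAcct ()
let b1 = put a 100.0
let b2 = get a 40.0
```

Because each request rendezvous with the server loop, requests are serialized without any explicit lock.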


Key points

• We put the entire communication protocol behind an abstraction

• The infinite-loop-as-server idiom works well
  – And naturally prevents races
  – Multiple requests are implicitly queued by the CML implementation

• Don’t think of threads like you’re used to: “very lightweight”
  – The system should easily support 100,000 threads
  – Cost about as much space as an object plus the “current stack”
  – Cost no time when blocked on a channel

• Asynchronous = spawn a thread to do synchronous

• Quite similar to “actors” in OOP
  – Real example: A GUI where each widget is a thread


Simpler example

• A stream is an infinite sequence of values
  – Don’t compute them until asked
  – Again we could hide the channels and thread

let squares = new_channel ()
let rec loop i = sendNow squares (i*i); loop (i+1)
let _ = Thread.create loop 1

let one = recvNow squares
let four = recvNow squares
let nine = recvNow squares
…


So far

• sendNow and recvNow allow synchronous message passing

• Abstraction lets us hide concurrency behind interfaces

• But these block until the rendezvous, which is insufficient for many important communication patterns

• Example: add : int channel -> int channel -> int
  – Must choose which to receive first, hurting performance or causing deadlock if the other is ready earlier
• Example: or : bool channel -> bool channel -> bool
  – Cannot short-circuit
• This is why we split out sync and have other primitives


The cool stuff

• choose: when synchronized on, blocks until one of the events occurs
• wrap: an event with a function as post-processing
  – Can wrap as many times as you want
• Note: Skipping a couple of other key primitives (e.g., for timeouts)

type 'a event (* when sync'ed on, get an 'a *)
val send : 'a channel -> 'a -> unit event
val receive : 'a channel -> 'a event
val sync : 'a event -> 'a

val choose : 'a event list -> 'a event
val wrap : 'a event -> ('a -> 'b) -> 'b event


“And from or”

• choose seems great for “until one happens”
• But a little coding trick gets you “until all happen”
• The code below returns the answer on a third channel

let add in1 in2 out =
  let ans = sync (choose [
    wrap (receive in1) (fun i -> sync (receive in2) + i);
    wrap (receive in2) (fun i -> sync (receive in1) + i)])
  in
  sync (send out ans)


Another example

(* "or" is an OCaml keyword, so the function needs another name; like
   add, it also needs the output channel as an argument *)
let orCh in1 in2 out =
  let ans = sync (choose [
    wrap (receive in1) (fun b -> b || sync (receive in2));
    wrap (receive in2) (fun b -> b || sync (receive in1))])
  in
  sync (send out ans)

• Not blocking in the inclusive-or case takes some more work
  – Spawn a thread to receive the second input (and ignore it)


Circuits

If you’re an electrical engineer:
• send and receive are the ends of a gate
• wrap is combinational logic connected to a gate
• choose is a multiplexer (no control over which input)

So after you wire something up, you sync to say “wait for communication from the outside”

And the abstract interfaces are related to composing circuits

If you’re a UNIX hacker:
• UNIX select is “sync of choose”
• It’s a pain that they can’t be separated


Remaining comments

• The ability to build bigger events from smaller ones is very powerful

• Synchronous message passing, well, synchronizes

• The key by-design limitation is that CML supports only point-to-point communication

• By the way, Caml’s implementation of CML is itself in terms of queues and locks
  – Works okay on a uniprocessor