Compiling functional languages Lectures 1-2 (compressed!) Johan Nordlander <http://www.cse.chalmers.se/edu/year/2011/course/CompFun/ >

Post on 17-Apr-2020

The compiler pipeline

[Figure: the compiler pipeline. Stages shown: pre-processing, lexing, parsing, static checking, type-checking, desugaring, source optimization, intermediate code generation, intermediate code optimization, instruction selection, register allocation, peephole optimization, assembling & linking. Data passed between stages: text, tokens, syntax tree, intermediate code, assembly code, binaries.]

This focus

[Figure: the same compiler pipeline, highlighting the part that these lectures focus on.]

A common theme

• Manipulation of syntax trees, schematically:

  Input:
    parse :: String -> SyntaxTree

  Verification / addition of missing information:
    staticCheck :: SyntaxTree -> Bool
    typeInference :: SyntaxTree -> SyntaxTree

  Misc. transformations, possibly changing representation:
    desugar :: SyntaxTree -> SyntaxTree
    translate :: SyntaxTree -> CoreSyntaxTree
    optimize :: CoreSyntaxTree -> CoreSyntaxTree

  Output:
    codegen :: CoreSyntaxTree -> String

Source-to-source transformations

• Rewriting syntax trees with the purpose of

- Removing redundant constructs

- Making implicit information explicit

- Choosing more efficient representations

- Normalizing form before code generation

• Can be distributed over different passes, run in many different orders

Our input language

prog ::= module K where ds
d    ::= p rhs | ms | ...
m    ::= x ps rhs | x ps rhs where ds
rhs  ::= = e | grhss
grhs ::= | e = e
e    ::= x | K | lit | e op e | e e | - e | \ps -> e | let ds in e
       | if e then e else e | case e of alts | (es) | [es] | [e..e] | [e,e..e]
       | [ e | stmts ] | (e op) | (op e) | K { fs } | e { fs }
p    ::= x | K ps | lit | - p | p op p | (ps) | [ps] | K { fps } | _ | x@p
alt  ::= p rhs | p rhs where ds
stmt ::= p <- e | e | let ds
f    ::= x = e
fp   ::= x = p
op   ::= x | K

Our core language

prog ::= module K where ds
d    ::= x = e
e    ::= x | K es | lit | e e | \x -> e | let ds in e | case e of alts
alt  ::= p -> e
p    ::= lit | K xs

A transformation

• Removing operator sections:
    ...
    translate (RSect op e) = do
      x  <- newVar
      e' <- translate e
      return (Fun x (Op (Var x) op e'))

• Using "concrete" abstract syntax:
    ...
    translate (op e) = \x -> x op e
      where x is a new variable
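The section rule can be tried out as a small Haskell sketch. The AST constructors (Var, Lit, Op, LSect, RSect, Fun) and the counter-based newVar are hypothetical stand-ins for the slide's abstract syntax and name supply, and the left-section case (e op) is added by analogy with the right-section rule.

```haskell
-- Hypothetical miniature AST; constructor names are illustrative only.
data Expr
  = Var String
  | Lit Int
  | Op Expr String Expr      -- e1 op e2
  | LSect Expr String        -- (e op)
  | RSect String Expr        -- (op e)
  | Fun String Expr          -- \x -> e
  deriving (Show, Eq)

-- Fresh names come from a counter threaded by hand.
-- Assumption: names starting with '%' cannot occur in source programs.
newVar :: Int -> (String, Int)
newVar n = ("%x" ++ show n, n + 1)

translate :: Expr -> Int -> (Expr, Int)
translate (RSect op e) n0 =
  let (x, n1)  = newVar n0
      (e', n2) = translate e n1
  in (Fun x (Op (Var x) op e'), n2)          -- (op e)  ==>  \x -> x op e
translate (LSect e op) n0 =
  let (x, n1)  = newVar n0
      (e', n2) = translate e n1
  in (Fun x (Op e' op (Var x)), n2)          -- (e op)  ==>  \x -> e op x
translate (Op a op b) n0 =
  let (a', n1) = translate a n0
      (b', n2) = translate b n1
  in (Op a' op b', n2)
translate (Fun x e) n0 =
  let (e', n1) = translate e n0
  in (Fun x e', n1)
translate e n0 = (e, n0)                     -- variables and literals are unchanged

-- Run the pass starting from a fresh counter.
desugarSections :: Expr -> Expr
desugarSections e = fst (translate e 0)
```

Threading the counter by hand keeps the sketch free of library dependencies; a real pass would more likely use a State monad or a dedicated name supply.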

Simple transformations

• Translating lists:
    translate [ e1, ..., en ] = e1 : ... : en : []

• Translating enumerations:
    translate [ e1 .. e2 ] = enumFromTo e1 e2

• Translating infix applications:
    translate ( e1 op e2 ) = op e1 e2

• Translating if-expressions:
    translate ( if e1 then e2 else e3 ) = case e1 of True -> e2; False -> e3
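A minimal sketch of these four rules as one recursive pass over a hypothetical AST. Binary App nodes stand in for multi-argument application, every operator is treated as a variable (a constructor operator such as : would need Con instead), and case alternatives are restricted to nullary constructors to keep it short.

```haskell
-- Hypothetical miniature source AST (constructor names are illustrative).
data Expr
  = Var String
  | Con String
  | Lit Int
  | App Expr Expr
  | Infix Expr String Expr        -- e1 op e2
  | List [Expr]                   -- [e1, ..., en]
  | Enum Expr Expr                -- [e1 .. e2]
  | If Expr Expr Expr             -- if e1 then e2 else e3
  | Case Expr [(String, Expr)]    -- case e of K -> e'  (nullary K only)
  deriving (Show, Eq)

-- One bottom-up pass applying the four rules; every subtree is visited.
desugar :: Expr -> Expr
desugar (List es) =
  foldr (\e rest -> App (App (Con ":") (desugar e)) rest) (Con "[]") es
desugar (Enum e1 e2) = App (App (Var "enumFromTo") (desugar e1)) (desugar e2)
desugar (Infix e1 op e2) = App (App (Var op) (desugar e1)) (desugar e2)
desugar (If e1 e2 e3) =
  Case (desugar e1) [("True", desugar e2), ("False", desugar e3)]
desugar (App f a) = App (desugar f) (desugar a)
desugar (Case e alts) = Case (desugar e) [ (k, desugar b) | (k, b) <- alts ]
desugar e = e   -- Var, Con and Lit are already in core form
```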

List comprehensions

translate [ e | e', stmts ] = if e' then translate [ e | stmts ] else []

translate [ e | let ds, stmts ] = let ds in translate [ e | stmts ]

translate [ e | p <- e', stmts ] = let x p = translate [ e | stmts ]
                                       x _ = []
                                   in concat (map x e')
  where x is a new variable

translate [ e | ] = [ e ]
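One way to check these rules is to translate a concrete comprehension by hand and compare it with what GHC computes. In this sketch the helper ok plays the role of the fresh variable x, and pairsSrc and pairsCore are made-up names.

```haskell
-- Source form: a generator with a refutable pattern, a guard and a let.
pairsSrc :: [Maybe Int] -> [(Int, Int)]
pairsSrc ms = [ (x, y) | Just x <- ms, even x, let y = x * x ]

-- The same comprehension translated by hand with the four rules.
pairsCore :: [Maybe Int] -> [(Int, Int)]
pairsCore ms =
  let ok (Just x) =
        if even x                 -- guard rule
          then let y = x * x      -- let rule
               in [ (x, y) ]      -- base case: a singleton list
          else []
      ok _ = []                   -- elements not matching Just x contribute nothing
  in concat (map ok ms)           -- generator rule
```

For the input [Just 2, Nothing, Just 3, Just 4], both versions yield [(2,4),(4,16)].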

Pattern-matching

translate ( case e of p1 -> e1 , ... , pn -> en ) =

let x = e in match [x] [ \p1 -> e1 , ... , \pn -> en ] (error "pmc")

where x is a new variable

translateDecl ( f ps1 = e1 , ... , f psn = en ) =

f = \xs -> match xs [ \ps1 -> e1 , ... , \psn -> en ] (error "pmc")

where xs are new variables (as many as there are patterns in each psi)

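Function "match"

The mix rule (each funsi is a maximal group of clauses whose first patterns are either all variables or all constructor patterns):
  match xs (funs1 ++ ... ++ funsn) e0
    = match xs funs1 ( ... (match xs funsn e0) ...)

The var rule (every first pattern is a variable):
  match (x:xs) [ \y1 ps1 -> e1 , ... , \yn psn -> en ] e0
    = match xs [ \ps1 -> [x/y1]e1 , ... , \psn -> [x/yn]en ] e0

The null rule (no patterns left):
  match [] [ \ -> e1 , ... , \ -> en ] e0
    = e1 ║ ... ║ en ║ e0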

Function "match"

The con rule (every first pattern is a constructor pattern; funsi holds the clauses whose first pattern is built with Ki):
  match (x:xs) (funs1 ++ ... ++ funsn) e0
    = (case x of
         K1 ys1 -> match (ys1 ++ xs) (decon K1 funs1) fail
         ...
         Kn ysn -> match (ysn ++ xs) (decon Kn funsn) fail) ║ e0
  where ys1 ... ysn are new variable lists of the correct length

  decon K [ \(K qs1) : ps1 -> e1 , ... , \(K qsm) : psm -> em ]
    = [ \qs1++ps1 -> e1 , ... , \qsm++psm -> em ]
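The four rules fit in a compact Haskell sketch. It assumes a hypothetical constructor table (siblings, covering only lists), handles only variable and constructor patterns (no literals), threads a counter for fresh names such as _u0, and, like the con rule, emits one case alternative for every constructor of the type, so constructors without clauses end up as fail. Applied to the zip clauses, it should reproduce the unsimplified form reached in the zip example, before the ║ laws are used.

```haskell
-- Patterns: variables and (possibly nested) constructor patterns.
data Pat = PVar String | PCon String [Pat]
  deriving (Show, Eq)

-- Target expressions, including the new forms fail and e1 ║ e2 (Fatbar).
data Expr
  = Var String
  | Con String [Expr]
  | App Expr [Expr]
  | Case String [(String, [String], Expr)]   -- case x of K ys -> e
  | Fatbar Expr Expr
  | Fail
  | Err String
  deriving (Show, Eq)

type Clause = ([Pat], Expr)                  -- \ps -> e

-- Hypothetical constructor table: all constructors of the type, with arities.
siblings :: String -> [(String, Int)]
siblings k
  | k `elem` ["[]", ":"] = [("[]", 0), (":", 2)]
  | otherwise            = error ("unknown constructor " ++ k)

-- [x/y]e: rename y to x. Fresh names ("_u0", ...) are assumed not to clash.
subst :: String -> String -> Expr -> Expr
subst x y expr = case expr of
  Var z       -> Var (ren z)
  Con k es    -> Con k (map (subst x y) es)
  App f es    -> App (subst x y f) (map (subst x y) es)
  Case z alts -> Case (ren z) [ (k, ys, subst x y b) | (k, ys, b) <- alts ]
  Fatbar a b  -> Fatbar (subst x y a) (subst x y b)
  _           -> expr
  where
    ren z = if z == y then x else z

startsWithVar :: Clause -> Bool
startsWithVar (PVar _ : _, _) = True
startsWithVar _               = False

-- Maximal runs of clauses that all start with a variable or all with a constructor.
runs :: [Clause] -> [[Clause]]
runs [] = []
runs (c : cs) = (c : same) : runs rest
  where
    (same, rest) = span (\c' -> startsWithVar c' == startsWithVar c) cs

-- match n xs clauses e0 returns the compiled expression and the next fresh index.
match :: Int -> [String] -> [Clause] -> Expr -> (Expr, Int)
match n [] cs e0 = (foldr Fatbar e0 (map snd cs), n)       -- null rule
match n xs cs e0 = foldr step (e0, n) (runs cs)             -- mix rule
  where
    step grp (rest, m) = matchRun m xs grp rest

matchRun :: Int -> [String] -> [Clause] -> Expr -> (Expr, Int)
matchRun _ [] _ _ = error "matchRun: no variables left"
matchRun n (x : xs) cs e0
  | all startsWithVar cs =                                   -- var rule
      match n xs [ (ps, subst x y e) | (PVar y : ps, e) <- cs ] e0
  | otherwise = (Fatbar (Case x alts) e0, n')                -- con rule
  where
    (alts, n') = foldr alt ([], n) (siblings (firstCon (head cs)))
    alt (k, arity) (acc, m) =
      let ys = [ "_u" ++ show i | i <- [m .. m + arity - 1] ]
          (body, m') = match (m + arity) (ys ++ xs) (decon k cs) Fail
      in ((k, ys, body) : acc, m')
    firstCon (PCon k _ : _, _) = k
    firstCon _                 = error "expected a constructor pattern"

-- decon K: keep the clauses for K and splice its sub-patterns in front.
decon :: String -> [Clause] -> [Clause]
decon k cs = [ (qs ++ ps, e) | (PCon k' qs : ps, e) <- cs, k' == k ]

-- The zip clauses; the call zip as bs is written App (Var "zip") [...].
zipClauses :: [Clause]
zipClauses =
  [ ([PCon "[]" [], PVar "bs"], Con "[]" [])
  , ([PCon ":" [PVar "a", PVar "as"], PCon "[]" []], Con "[]" [])
  , ( [PCon ":" [PVar "a", PVar "as"], PCon ":" [PVar "b", PVar "bs"]]
    , Con ":" [Con "(,)" [Var "a", Var "b"], App (Var "zip") [Var "as", Var "bs"]] )
  ]
```

Usage: fst (match 0 ["x1", "x2"] zipClauses (Err "pmc")) compiles zip.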

fail and fatbar (║)

New abstract syntax forms introduced during the translation of pattern-matching.

Semantics:
  fail ║ e = e
  e ║ fail = e
  e1 ║ e2 = e1            if e1 cannot evaluate to fail
  e1 ║ e2 = [e2/fail]e1   (if functions can't return fail)
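A small executable reading of these laws models an expression that may fail as a Maybe value, with Nothing for fail. This is an assumption made for illustration only; a compiler would instead use the substitution law or, in C, break and sequential composition. The names failure and firstOrZero are hypothetical.

```haskell
-- Model: Nothing stands for fail, Just v for a successful result v.
fatbar :: Maybe a -> Maybe a -> Maybe a
fatbar Nothing e2 = e2        -- fail ║ e = e
fatbar e1      _  = e1        -- e1 ║ e2 = e1 when e1 did not fail

failure :: Maybe a
failure = Nothing

-- Two clauses compiled in the style of the match rules:
--   firstOrZero (x : _) = x
--   firstOrZero []      = 0
-- Each clause yields fail instead of crashing when its pattern does not match.
firstOrZero :: [Int] -> Int
firstOrZero xs =
  case clause1 `fatbar` (clause2 `fatbar` Just (error "pmc")) of
    Just v  -> v
    Nothing -> error "unreachable: the last alternative never fails"
  where
    clause1 = case xs of { (x : _) -> Just x; _ -> failure }
    clause2 = case xs of { [] -> Just 0; _ -> failure }
```

Laziness matters here: Just (error "pmc") is only inspected if both clauses fail, mirroring ║ (error "pmc").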

Pattern-match example

zip []     bs     = []
zip (a:as) []     = []
zip (a:as) (b:bs) = (a,b) : zip as bs

Pattern-match example

zip = \x1 x2 -> match [x1,x2]
                  [ \[]     bs     -> []
                  , \(a:as) []     -> []
                  , \(a:as) (b:bs) -> (a,b) : zip as bs ]
                  (error "pmc")

con rule applies

Pattern-match example

zip = \x1 x2 ->
  (case x1 of
     []    -> match [x2] [ \bs -> [] ] fail
     x3:x4 -> match [x3,x4,x2] [ \a as [] -> [], \a as (b:bs) -> (a,b) : zip as bs ] fail
  ) ║ (error "pmc")

var rule applies

Pattern-match example

zip = \x1 x2 ->
  (case x1 of
     []    -> match [] [ \ -> [] ] fail
     x3:x4 -> match [x2] [ \[] -> [], \(b:bs) -> (x3,b) : zip x4 bs ] fail
  ) ║ (error "pmc")

null rule applies

Pattern-match example

zip = \x1 x2 ->
  (case x1 of
     []    -> [] ║ fail
     x3:x4 -> match [x2] [ \[] -> [], \(b:bs) -> (x3,b) : zip x4 bs ] fail
  ) ║ (error "pmc")

con rule applies

Pattern-match example

zip = \x1 x2 ->
  (case x1 of
     []    -> [] ║ fail
     x3:x4 -> (case x2 of
                 []    -> match [] [ \ -> [] ] fail
                 x5:x6 -> match [x5,x6] [ \b bs -> (x3,b) : zip x4 bs ] fail
              ) ║ fail
  ) ║ (error "pmc")

null rule applies

Pattern-match example

zip = \x1 x2 ->
  (case x1 of
     []    -> [] ║ fail
     x3:x4 -> (case x2 of
                 []    -> [] ║ fail
                 x5:x6 -> match [x5,x6] [ \b bs -> (x3,b) : zip x4 bs ] fail
              ) ║ fail
  ) ║ (error "pmc")

var rule applies

Pattern-match example

zip = \x1 x2 ->
  (case x1 of
     []    -> [] ║ fail
     x3:x4 -> (case x2 of
                 []    -> [] ║ fail
                 x5:x6 -> match [] [ \ -> (x3,x5) : zip x4 x6 ] fail
              ) ║ fail
  ) ║ (error "pmc")

null rule applies

Pattern-match example

zip = \x1 x2 ->
  (case x1 of
     []    -> [] ║ fail
     x3:x4 -> (case x2 of
                 []    -> [] ║ fail
                 x5:x6 -> (x3,x5) : zip x4 x6 ║ fail
              ) ║ fail
  ) ║ (error "pmc")

semantics of ║

Pattern-match example

zip = \x1 x2 ->
  (case x1 of
     []    -> []
     x3:x4 -> (case x2 of
                 []    -> []
                 x5:x6 -> (x3,x5) : zip x4 x6
              ) ║ fail
  ) ║ (error "pmc")

semantics of ║

Pattern-match example

zip = \x1 x2 ->
  (case x1 of
     []    -> []
     x3:x4 -> case x2 of
                []    -> []
                x5:x6 -> (x3,x5) : zip x4 x6
  ) ║ (error "pmc")

semantics of ║

Pattern-match example

zip = \x1 x2 -> case x1 of
  []    -> []
  x3:x4 -> case x2 of
    []    -> []
    x5:x6 -> (x3,x5) : zip x4 x6
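As a sanity check, both the final result and the unsimplified form (just before the ║ laws are applied) can be run as ordinary Haskell and compared with Prelude's zip. In this sketch zipFat models fail as Nothing and ║ as a left-biased choice on Maybe (an assumption for illustration), and zipCore is the final form apart from its name.

```haskell
-- Final result of the derivation, as ordinary Haskell.
zipCore :: [a] -> [b] -> [(a, b)]
zipCore x1 x2 = case x1 of
  []      -> []
  x3 : x4 -> case x2 of
    []      -> []
    x5 : x6 -> (x3, x5) : zipCore x4 x6

-- fail modelled as Nothing, ║ as fatbar.
fatbar :: Maybe a -> Maybe a -> Maybe a
fatbar Nothing e2 = e2
fatbar e1      _  = e1

-- The form reached just before the ║ laws were applied.
zipFat :: [a] -> [b] -> [(a, b)]
zipFat x1 x2 = unJust (fatbar
  (case x1 of
     []      -> fatbar (Just []) Nothing
     x3 : x4 -> fatbar
       (case x2 of
          []      -> fatbar (Just []) Nothing
          x5 : x6 -> fatbar (Just ((x3, x5) : zipFat x4 x6)) Nothing)
       Nothing)
  (Just (error "pmc")))
  where
    unJust (Just v) = v
    unJust Nothing  = error "unreachable: the last alternative never fails"
```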

Summary

• Goal: transform rich abstract syntax trees into a simpler but equivalent syntactic subset

• Means: local rewrite rules of varying difficulty

• Challenge 1: define rules for the full input syntax (see the Haskell Report ch. 3 for inspiration!)

• Challenge 2: apply rules to every subtree

• Challenge 3: organize into one or more passes

Recall our core language

prog ::= module K where ds
d    ::= x = e
e    ::= x | K es | lit | e e | \x -> e | let ds in e | case e of alts
alt  ::= lit -> e | K xs -> e

Good for analysis and optimization, but still not directly mappable to C

Restrict form even further

prog ::= module K where ds
d    ::= f = \xs -> b | x = K es | x = e
b    ::= let ds in b | case x of alts | b ║ b | fail | e
e    ::= x | f | x es | f es | lit | K
alt  ::= K -> b | K xs -> b | lit -> b

Main difference: expression syntax now depends on position

• The right-hand side of a declaration (d)
• The body of a function (b)
• Arguments to functions and constructors (e)

Minor differences: marking known functions (f), plus multi-argument abstraction and application
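The position dependence can be made explicit by giving declarations, bodies and expressions separate Haskell types, so that, for example, a case scrutinee can only be a variable and let can only occur in a body. This is a sketch: the constructor names are hypothetical, and constructors are represented by integer tags (an assumption that anticipates the C data layout).

```haskell
type Var    = String   -- ordinary variables x
type Fun    = String   -- known function names f
type ConTag = Int      -- constructor K, represented by its tag (assumption)

-- d ::= f = \xs -> b | x = K es | x = e
data Decl = FunDecl Fun [Var] Body | ConDecl Var ConTag [Expr] | ValDecl Var Expr
  deriving (Show, Eq)

-- b ::= let ds in b | case x of alts | b ║ b | fail | e
data Body = Let [Decl] Body | Case Var [Alt] | Fatbar Body Body | Fail | Return Expr
  deriving (Show, Eq)

-- e ::= x | f | x es | f es | lit | K
data Expr = V Var | F Fun | CallUnknown Var [Expr] | CallKnown Fun [Expr] | Lit Int | Con ConTag
  deriving (Show, Eq)

-- alt ::= K -> b | K xs -> b | lit -> b   (K -> b is AltCon with no variables)
data Alt = AltCon ConTag [Var] Body | AltLit Int Body
  deriving (Show, Eq)

-- Example, with tags [] = 0 and (:) = 1 (assumed), and a known function add:
--   len = \xs acc -> case xs of
--           []     -> acc
--           y : ys -> let n = add acc 1 in len ys n
lenDecl :: Decl
lenDecl =
  FunDecl "len" ["xs", "acc"]
    (Case "xs"
      [ AltCon 0 [] (Return (V "acc"))
      , AltCon 1 ["y", "ys"]
          (Let [ValDecl "n" (CallKnown "add" [V "acc", Lit 1])]
               (Return (CallKnown "len" [V "ys", V "n"])))
      ])

-- Direct calls of known functions, found by a plain traversal.
callsInDecl :: Decl -> [Fun]
callsInDecl (FunDecl _ _ b)  = callsInBody b
callsInDecl (ConDecl _ _ es) = concatMap callsInExpr es
callsInDecl (ValDecl _ e)    = callsInExpr e

callsInBody :: Body -> [Fun]
callsInBody (Let ds b)     = concatMap callsInDecl ds ++ callsInBody b
callsInBody (Case _ alts)  = concatMap callsInAlt alts
callsInBody (Fatbar b1 b2) = callsInBody b1 ++ callsInBody b2
callsInBody Fail           = []
callsInBody (Return e)     = callsInExpr e

callsInAlt :: Alt -> [Fun]
callsInAlt (AltCon _ _ b) = callsInBody b
callsInAlt (AltLit _ b)   = callsInBody b

callsInExpr :: Expr -> [Fun]
callsInExpr (CallKnown f es)   = f : concatMap callsInExpr es
callsInExpr (CallUnknown _ es) = concatMap callsInExpr es
callsInExpr _                  = []
```

Because known functions are syntactically distinct from variables, a call to an unknown function (CallUnknown) can never be mistaken for a direct call; for lenDecl, callsInDecl returns ["add", "len"].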

C correspondence

Declarations:
  f = \xs -> b      A C function declaration (if on the top level)
  x = K es          A malloc() call followed by assignments
  x = e             A single assignment

Function bodies:
  let ds in b       A sequence of assignments (if ds is not recursive)
  case x of alts    A switch statement
  e                 A return statement
  fail and e ║ e    break and sequential composition

Expressions:
  x                 Variable x
  f es              A function call to f (if arity matches)
  lit               Literal lit (if not a string literal)
  Ki                Integer literal i
  x es and f        Deferred...

Data layout

typedef int *Ptr;

Basic assumptions: (Ptr)(int)x = x and (int)(Ptr)y = y (which requires int to be at least as wide as a pointer)

Construction:
  x = Ki e1 ... en
becomes
  Ptr x = malloc((n+1)*sizeof(int));
  x[0] = i;
  x[1] = (int)e1;
  ...
  x[n] = (int)en;

Deconstruction:
  case x of
    ...
    Ki x1 ... xn -> bodyi
becomes
  switch (x[0]) {
    ...
    case i: { Ptr x1 = (Ptr)x[1]; ... Ptr xn = (Ptr)x[n]; bodyi }
  }

Nullary constructors

Could just use the generic form:
  x = Ki
becomes
  Ptr x = malloc(sizeof(int)); x[0] = i;
and
  case x of
    ...
    Ki -> body
becomes
  switch (x[0]) {
    ...
    case i: { body }
  }

For better memory efficiency, encode as a small pointer (this relies on small integers such as i never being valid heap addresses):
  x = Ki
becomes
  x = (Ptr) i;
and
  case x of
    Ki -> bodyi
    ...
    Kj x1 ... xn -> bodyj
becomes
  switch ((int)x) {
    case i: { bodyi }
    ...
    default:
      switch (x[0]) {
        case j: { Ptr x1 = (Ptr)x[1]; ... Ptr xn = (Ptr)x[n]; bodyj }
      }
  }

Single constructors

Could just use the generic form:
  x = K0 e1 ... en
becomes
  Ptr x = malloc((n+1)*sizeof(int)); x[0] = 0; x[1] = (int)e1; ... x[n] = (int)en;
and
  case x of
    K0 x1 ... xn -> body0
becomes
  switch (x[0]) {
    case 0: { Ptr x1 = (Ptr)x[1]; ... Ptr xn = (Ptr)x[n]; body0 }
  }

For better efficiency, encode without a tag:
  x = K0 e1 ... en
becomes
  Ptr x = malloc(n*sizeof(int)); x[0] = (int)e1; ... x[n-1] = (int)en;
and
  case x of
    K0 x1 ... xn -> body0
becomes
  Ptr x1 = (Ptr)x[0]; ... Ptr xn = (Ptr)x[n-1]; body0

After normalization

• A program form corresponding to C syntax, but with some serious caveats:

- Only function declarations may be recursive

- Function declarations must be on top level only

- Function calls must match arity of callee

- Function names must not be used as values

- Unknown functions cannot be called

Only top-level functions

• Naive idea: just move the declarations!

• Problem: loss of local scope. Moving g out of f turns
    f = \a -> let g = \b -> a + b in g 7
  into
    g = \b -> a + b
    f = \a -> g 7
  where a is no longer in scope in g

• A way forward: first turn free variables into parameters (so-called lambda-lifting):
    f = \a -> let g = \a b -> a + b in g a 7
  can then be moved to
    g = \a b -> a + b
    f = \a -> g a 7
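A minimal lambda-lifting sketch for a hypothetical toy language: each local function gains its free variables as extra leading parameters, every call to it (including recursive and nested ones) passes them, and the definition is returned as a top-level declaration. It assumes all bound names are distinct, so no renaming is needed.

```haskell
import Data.List (nub, (\\))

-- Hypothetical miniature language for the sketch.
data Expr
  = Var String
  | Lit Int
  | Add Expr Expr
  | Call String [Expr]                 -- call of a known function
  | LetFun String [String] Expr Expr   -- let g = \xs -> body in e
  deriving (Show, Eq)

-- A lifted top-level declaration  g = \xs -> body
data TopDecl = TopDecl String [String] Expr
  deriving (Show, Eq)

-- Free ordinary variables (known function names are not variables).
freeVars :: Expr -> [String]
freeVars (Var x)              = [x]
freeVars (Lit _)              = []
freeVars (Add a b)            = nub (freeVars a ++ freeVars b)
freeVars (Call _ es)          = nub (concatMap freeVars es)
freeVars (LetFun _ xs body e) = nub ((freeVars body \\ xs) ++ freeVars e)

-- Pass the extra arguments at every call of g.
addArgs :: String -> [String] -> Expr -> Expr
addArgs g extra = go
  where
    go (Call h es)
      | h == g    = Call h (map Var extra ++ map go es)
      | otherwise = Call h (map go es)
    go (Add a b)            = Add (go a) (go b)
    go (LetFun h xs body e) = LetFun h xs (go body) (go e)
    go e                    = e

-- Returns the rewritten expression and the lifted declarations.
lambdaLift :: Expr -> (Expr, [TopDecl])
lambdaLift (LetFun g xs body e) =
  let extra        = freeVars body \\ xs                  -- free variables become parameters
      (body1, ds1) = lambdaLift (addArgs g extra body)    -- recursive calls pass them too
      (e1, ds2)    = lambdaLift (addArgs g extra e)       -- and so do calls in the scope
  in (e1, ds1 ++ [TopDecl g (extra ++ xs) body1] ++ ds2)
lambdaLift (Add a b) =
  let (a1, ds1) = lambdaLift a
      (b1, ds2) = lambdaLift b
  in (Add a1 b1, ds1 ++ ds2)
lambdaLift (Call h es) =
  let rs = map lambdaLift es
  in (Call h (map fst rs), concatMap snd rs)
lambdaLift e = (e, [])
```

Lifting the body of f from the slide, let g = \b -> a + b in g 7, yields g a 7 together with the declaration g = \a b -> a + b, matching the lambda-lifted version above.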

Anonymous functions

• Our latest expression grammar:
    e ::= x | x es | f | f es | lit | K
  Here f on its own creates a function value, x es calls one, and f es creates one if the arguments are too few

• Must be supported, otherwise it is not a functional language!

• Requires the concept of closures!

Closures

• The generic representation of functions: a function pointer with a list of free variables

• The limits of lambda-lifting:
    g = \a -> let f = \b -> a + b in h f
    h = \x -> x 7
  becomes
    f = \a b -> a + b
    g = \a -> h (f a)      (too few arguments regardless!)
    h = \x -> x 7

• Closures can represent partial applications, even in the presence of free variables

• Nevertheless, lambda-lifting before closure-conversion simplifies the presentation

Closure-conversion

• Assume a lambda-lifted f = \x1 ... xn -> e

closureConvert f = CL f0 n
closureConvert (f e1 ... em)
  | m < n = CL fm (n-m) e1 ... em
  ...
  where fm is a new top-level function
    fm = \xthis x_{m+1} ... xn -> case xthis of CL _ _ y1 ... ym -> f y1 ... ym x_{m+1} ... xn

closureConvert (x e1 ... em) = case x of CL f n | m == n -> f x e1 ... em

...

Closures

• CL is an ordinary constructor name (a K), and a closure term is just a constructor application that references an f

• After closure conversion, these terms will be our only references to function names outside function calls

• Note: static typing will actually require a CLk for each closure arity k (as well as existentials and subtyping), but we're past type-checking here!

Closure-conversion

• Example before and after lambda-lifting:
    g = \a -> let f = \b -> a + b in h f
    h = \x -> x 7
  becomes
    f = \a b -> a + b
    g = \a -> h (f a)
    h = \x -> x 7

• And after closure-conversion:
    f  = \a b -> a + b
    g  = \a -> h (CL f1 1 a)
    h  = \x -> case x of CL funknown 1 -> funknown x 7
    f1 = \xthis x2 -> case xthis of CL _ _ y1 -> f y1 x2

• But we're still ignoring arity mismatches...
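The closure-converted program can be transliterated into Haskell with closures as explicit data. Representing code pointers as Haskell functions that receive the closure itself (xthis) plus a list of arguments is an assumption of this sketch; in generated C they would be function pointers.

```haskell
-- A closure: code pointer, remaining arity, captured free variables.
data Value
  = IntV Int
  | CL (Value -> [Value] -> Value) Int [Value]

asInt :: Value -> Int
asInt (IntV n) = n
asInt _        = error "asInt: not an integer"

-- f = \a b -> a + b
f :: Value -> Value -> Value
f a b = IntV (asInt a + asInt b)

-- f1 = \xthis x2 -> case xthis of CL _ _ y1 -> f y1 x2
f1 :: Value -> [Value] -> Value
f1 (CL _ _ [y1]) [x2] = f y1 x2
f1 _ _                = error "f1: unexpected closure layout"

-- g = \a -> h (CL f1 1 a)
g :: Value -> Value
g a = h (CL f1 1 [a])

-- h = \x -> case x of CL funknown 1 -> funknown x 7
h :: Value -> Value
h x = case x of
  CL funknown 1 _ -> funknown x [IntV 7]
  _               -> error "h: expected a closure of arity 1"
```

Evaluating asInt (g (IntV 3)) gives 10, the same as the source program's 3 + 7.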

Matching arities

• Strategies for matching function arity with the number of arguments:

- "Push/enter": arguments pushed and code entered unconditionally, matching done by called function

- "Eval/apply": function evaluated and asked for arity by caller, then only applied if enough arguments are present

Checking arities

• Assuming f = \x1 ... xn -> e

closureConvert f = CL f0 n
closureConvert (f e1 ... em)
  | m == n = f e1 ... em
  | m < n  = CL fm (n-m) e1 ... em
  | m > n  = apply_{m-n} (f e1 ... en) e_{n+1} ... em
  where each apply_k is a run-time system function TBD

• Note: checks are done at compile-time

(eval/apply)
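Because the arity n of a known function is available at compile time, the three cases amount to a simple function on syntax. In this sketch the Expr constructors are hypothetical, and the wrapper fm is named by appending _m to f; note that m = 0 yields CL f0 n, the case for a bare function name.

```haskell
-- Hypothetical target terms produced by closure conversion.
data Expr
  = Var String
  | CallKnown String [Expr]      -- f e1 ... en, exactly n arguments
  | Closure String Int [Expr]    -- CL fm (n-m) e1 ... em
  | Apply Int Expr [Expr]        -- apply_k e e1 ... ek (run-time system)
  deriving (Show, Eq)

-- Convert known function f of arity n applied to the arguments es.
convertKnown :: String -> Int -> [Expr] -> Expr
convertKnown f n es
  | m == n    = CallKnown f es                                   -- exact call
  | m < n     = Closure (f ++ "_" ++ show m) (n - m) es          -- too few: build a closure
  | otherwise = Apply (m - n) (CallKnown f (take n es)) (drop n es)  -- too many
  where
    m = length es
```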

Checking arities

• The full dynamic case (checks at run-time!):
    closureConvert (x e1 ... em) = apply_m x e1 ... em

apply_m = \xthis x1 ... xm -> case xthis of
  CL f n | m == n -> f xthis x1 ... xm
         | m < n  -> CL pap_{n-m,m} (n-m) xthis x1 ... xm
         | m > n  -> apply_{m-n} (f xthis x1 ... xn) x_{n+1} ... xm

pap_{k,m} = \xthis x1 ... xk -> case xthis of
  CL _ _ ythat y1 ... ym -> apply_{m+k} ythat y1 ... ym x1 ... xk

(eval/apply)
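A run-time model of eval/apply in Haskell, assuming arguments are passed as lists and code pointers are Haskell functions taking the closure itself and exactly n arguments. With lists, a single apply and a single pap cover every m and k; the families apply_m and pap_{k,m} are what a C or assembly implementation would need instead. add3 and adder are hypothetical example closures.

```haskell
-- A closure: code pointer, remaining arity n, captured values.
data Value
  = IntV Int
  | CL Code Int [Value]

-- Code receives the closure itself (xthis) and exactly n arguments.
type Code = Value -> [Value] -> Value

asInt :: Value -> Int
asInt (IntV n) = n
asInt _        = error "asInt: not an integer"

-- apply_m, with the argument count m taken from the list length.
apply :: Value -> [Value] -> Value
apply v [] = v
apply this@(CL code n _) xs
  | m == n    = code this xs                               -- exact: call
  | m < n     = CL pap (n - m) (this : xs)                 -- too few: partial application
  | otherwise = apply (code this (take n xs)) (drop n xs)  -- too many: call, then apply the rest
  where
    m = length xs
apply (IntV _) _ = error "apply: not a function"

-- Generic pap: the captured values are the original closure and the saved arguments.
pap :: Code
pap (CL _ _ (that : saved)) xs = apply that (saved ++ xs)
pap _ _                        = error "pap: malformed partial application"

-- A three-argument function.
add3 :: Value
add3 = CL code 3 []
  where code _ [IntV a, IntV b, IntV c] = IntV (a + b + c)
        code _ _                         = error "add3: bad arguments"

-- A one-argument function that returns another closure.
adder :: Value
adder = CL outer 1 []
  where
    outer _ [IntV a]   = CL (inner a) 1 []
    outer _ _          = error "adder: bad argument"
    inner a _ [IntV b] = IntV (a + b)
    inner _ _ _        = error "adder: bad argument"
```

Applying add3 one argument at a time builds nested pap closures, and the final application unwinds them into a single exact call.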

Summary

• C code generation involves
  1) Normalization (pretty straightforward)
  2) Lambda-lifting (known functions)
  3) Closure conversion (anonymous functions, partial applications)

• 3) supersedes 2) but is generally less efficient

• Challenge: avoid the need for special pap_{k,m} functions for every combination of k and m

• Idea: make m a closure parameter as well, and write a generic pap_{k,m} directly in assembly code