strictness-unboxed explained
Lazy Evaluation in Haskell
In the late ’70s and early ’80s, .... A series of seminal publications ignited an explosion of interest in the idea of lazy (or non-strict, or call-by-need) functional languages as a vehicle for writing serious programs.
A History of Haskell: Being Lazy With Class
● Lazy evaluation was once a hot research topic in the academic world, and it shaped the design of Haskell.
● Why are there both Data.ByteString and Data.ByteString.Lazy?
But what exactly does this mean?
Let’s start with a metaphor
● You are helping your starving colleagues at the office buy their lunch. You go to a McDonald’s, head to the counter, and order a bunch of things to go.
● The clerk gives you a big paper bag. You don’t bother to check; you just take it and run.
● Then you are doing non-strict evaluation!
You are in the office now
● Evaluation begins.
● Wait! What’s your definition of evaluation? In this metaphor, one step of evaluation is opening a bag or opening a box.
● Let’s see what would happen!
Non-strict semantics
You read on the HaskellWiki that strict semantics means
f ⊥ = ⊥
And non-strict semantics means
f ⊥ can be something other than ⊥
The upside-down T symbol, ⊥, is called “bottom”. It stands for an undefined value or a non-terminating program. In our metaphor, it is simply someone giving you the finger. (See? It looks like a finger, right?)
Now you can tell the difference between non-strict and strict.
Strict: evaluate the bags at the counter, and hit the error. Finger sent!
Non-strict: no evaluation at the counter. Happy face.
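The same distinction shows up directly in Haskell. Below is a minimal sketch, assuming GHC. The helpers constOne and strictPlusOne are hypothetical and made up for this example, and undefined stands in for ⊥ (the finger).

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical helper: it ignores its argument, so constOne ⊥ = 1.
-- That makes it non-strict.
constOne :: a -> Int
constOne _ = 1

-- Hypothetical helper: computing x + 1 requires knowing x,
-- so strictPlusOne ⊥ = ⊥. That makes it strict.
strictPlusOne :: Int -> Int
strictPlusOne x = x + 1

main :: IO ()
main = do
  -- No evaluation at the counter: undefined is never touched.
  print (constOne (undefined :: Int))
  -- length only walks the list structure, never the elements.
  print (length [undefined, undefined :: Int])
  -- Evaluating the argument hits undefined: finger sent!
  r <- try (evaluate (strictPlusOne undefined)) :: IO (Either ErrorCall Int)
  putStrLn (either (const "strictPlusOne undefined raised an error") show r)
```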
Back to Haskell
● In the example, evaluation means either opening a bag or opening a box.
● What is “evaluation” in Haskell?
● What is the difference between non-strict and lazy?
Evaluation
1+1+1+1+1 = 2+1+1+1 = 3+1+1 = 4+1 = 5
Each step is an evaluation step, also known by a fancier name: “reduction”. (Haskell’s + is left-associative, so 1+1+1+1+1 means (((1+1)+1)+1)+1, and the leftmost 1+1 is reduced first.)
Weak Head Normal Form
What does this alien term mean?
To explain it better, let’s rewrite the last example a bit.
In Haskell, ‘+’ is a function, so 1+1+1+1+1 is actually
+(+(+(+(1, 1), 1), 1), 1)
Head Normal Form
The form that cannot be evaluated any further if we only evaluate at the “head” position
+(+(+(+(1, 1), 1), 1), 1)
Head: the outermost + (the outermost bag)
Here the head normal form is the same as the normal form: 5.
Trouble
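(\x -> 1) (fix (+1))
If we don’t evaluate at the head position first, then we are in trouble: evaluating the argument fix (+1) first never terminates, while applying the lambda first gives 1 right away.
If we don’t open the outermost bag first, we might get stuck unpacking an infinite pile of hamburgers!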
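A small program makes this trouble concrete. This is a sketch assuming GHC and Data.Function.fix. The name infinite is only an illustration.

```haskell
import Data.Function (fix)

-- fix (+1) unfolds to fix (+1) + 1, which unfolds again, forever:
-- evaluating it never produces a value (GHC may report <<loop>>).
infinite :: Int
infinite = fix (+1)

main :: IO ()
main = print ((\_ -> 1 :: Int) infinite)
-- The outermost application is reduced first, so the result is 1
-- and 'infinite' is never evaluated.
```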
Trouble
fix (+1)
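But even if we evaluate at the head position, in Haskell we are still not guaranteed to reach head normal form: evaluating fix (+1) at the head just unfolds it to fix (+1) + 1, forever.
So head normal form doesn’t work for Haskell in general (not for arbitrary terms).
Weak Head Normal Form
Weak = “We are not guaranteed what is inside”
Weak Head Normal Form = “We only evaluate until the outermost part is a data constructor or a lambda (or a partially applied function). What is inside, the constructor’s arguments or the lambda’s body, may still be unevaluated, and we are not guaranteed what it will turn into until we evaluate further.”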
Schrödinger’s Filet of Fish: a Filet of Fish box could contain a Big Mac! We can’t be sure until we open it.
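One quick way to get a feel for weak head normal form is to force values whose contents are ⊥. This is a sketch assuming GHC. Both seq and a case pattern match force a value only up to its outermost constructor.

```haskell
main :: IO ()
main = do
  let pair = (undefined, undefined) :: (Int, Int)
  -- (,) is the outermost constructor, so pair is already in WHNF;
  -- seq stops there and never opens the undefined components.
  pair `seq` putStrLn "pair reached WHNF; its components were not evaluated"

  let box = Just undefined :: Maybe Int
  -- Matching on Just only needs the constructor, not the value inside.
  case box of
    Just _  -> putStrLn "matched Just without opening the box"
    Nothing -> putStrLn "Nothing"
```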
Thunk
A thunk is an expression that has not been evaluated yet and could still be reduced. (There are still bags!)
1+1+1+1+1
We are used to thinking that the above would be computed to the value 5, but not in Haskell. Until it is needed, it stays what it is: a thunk for (1+1+1+1+1).
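You can watch a thunk stay unevaluated with Debug.Trace.trace, which prints its message only when its result is evaluated. This is a sketch assuming GHC. The binding name x is just for illustration, and the trace message goes to stderr.

```haskell
import Debug.Trace (trace)

-- trace prints its message (to stderr) only when its result is evaluated.
x :: Int
x = trace "evaluating 1+1+1+1+1" (1 + 1 + 1 + 1 + 1)

main :: IO ()
main = do
  putStrLn "x is defined, but it is still a thunk"
  print x  -- forcing x runs the trace, then prints the value
```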
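Non-Strict vs Lazy
● Non-strict is a semantic: by definition, it is anything that is not strict.
● Many evaluation strategies can implement it, and lazy evaluation is just one of them.
Call-by-Need: an expression is not evaluated until it is needed, and once evaluated, its value is shared. This is the so-called “lazy evaluation”.
Call-by-Name: the argument (a thunk) is copied to every place it is used inside the function body, and each copy is evaluated separately.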
f x = x + x

call-by-name:
f (1+1+1) => (1+1+1) + (1+1+1) => 3 + 3 => 6   (1+1+1 is computed twice)

call-by-need:
f (1+1+1) => let x = 1+1+1 in x + x => x = 3, therefore 3 + 3 => 6   (1+1+1 is computed once and shared)

call-by-value:
f (1+1+1) => f 3 => 3 + 3 => 6
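GHC uses call-by-need, so the shared argument should be evaluated only once. This sketch reuses the f from above and relies on Debug.Trace to make the evaluation visible.

```haskell
import Debug.Trace (trace)

f :: Int -> Int
f x = x + x

main :: IO ()
main = print (f (trace "evaluating 1+1+1" (1 + 1 + 1)))
-- Under call-by-need the message should appear once before 6 is printed.
-- Under call-by-name it would appear twice, once per copy of the argument.
```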
Back to Haskell: sum
sum [] = 0
sum (x:xs) = x + sum xs
This is not tail recursive! Each recursive call needs its own stack frame, because x + (sum xs) can only be computed after sum xs returns.
sum'
sum' acc [] = acc
sum' acc (x:xs) = sum' (acc+x) xs
The accumulator (acc+x) is not reduced by default: it is passed along as a thunk.
It is tail recursive now, but it still has a problem.
sum'
sum' 0 [1,2,3,4]
= sum' (0+1) [2,3,4]
= sum' ((0+1)+2) [3,4]
= sum' (((0+1)+2)+3) [4]
= sum' ((((0+1)+2)+3)+4) []
= ((((0+1)+2)+3)+4)
= (((1+2)+3)+4)
= ((3+3)+4)
= (6+4)
= 10
When the list is large enough, evaluating that deeply nested thunk at the end can still cause a stack overflow.
seq
seq :: a -> b -> b
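seq a b evaluates a to weak head normal form and then returns b, which lets us control when a thunk gets reduced. (Strictly speaking, seq only guarantees that a has been evaluated by the time the result is returned; if you need a guaranteed order, pseq from Control.Parallel provides it.)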
let x = 1+2 in seq x (f x)
This reduces the thunk x before f is applied.
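Here is a small sketch contrasting a function that ignores its argument with seq, which has to force its argument. It assumes GHC and uses Control.Exception to catch the error. The slide leaves f unspecified, so this example substitutes a hypothetical (* 10) for it.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical stand-in for the unspecified f from the slide.
f :: Int -> Int
f = (* 10)

main :: IO ()
main = do
  -- const never looks at its second argument, so undefined is never evaluated.
  print (const (1 :: Int) (undefined :: Int))
  -- seq has to evaluate its first argument to WHNF, so this hits undefined.
  r <- try (evaluate (seq (undefined :: Int) (1 :: Int))) :: IO (Either ErrorCall Int)
  putStrLn (either (const "seq forced undefined and raised an error") show r)
  -- The slide's example: the thunk 1+2 is reduced to 3 before f x is returned.
  let x = 1 + 2 :: Int
  print (seq x (f x))
```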
sum'
sum' acc [] = acc
sum' acc (x:xs) = let z = acc + x in seq z (sum' z xs)
sum'
sum' 0 [1,2,3,4]
= sum' 1 [2,3,4]
= sum' 3 [3,4]
= sum' 6 [4]
= sum' 10 []
= 10
No more stack overflow
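As a complete, runnable program, the idea might look like the sketch below. It assumes a 64-bit Int, so the sum does not overflow. For comparison, it also calls foldl' from Data.List, the library's strict left fold, which is essentially the same loop.

```haskell
import Data.List (foldl')

-- sum' from the slide: seq forces the new accumulator z before the
-- recursive call, so no chain of (+) thunks builds up.
sum' :: Int -> [Int] -> Int
sum' acc [] = acc
sum' acc (x:xs) = let z = acc + x in z `seq` sum' z xs

main :: IO ()
main = do
  print (sum' 0 [1 .. 10000000])              -- 50000005000000 with a 64-bit Int
  -- foldl' from Data.List: the library version of the same strict loop.
  print (foldl' (+) 0 [1 .. 10000000 :: Int])
```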
Bang Patterns
{-# LANGUAGE BangPatterns #-}
sum' !acc [] = acc
sum' !acc (x:xs) = sum' (acc+x) xs
For convenience: you don’t have to write so many seqs. A bang (!) on a pattern forces that argument to weak head normal form when the function is applied.
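Here is the same loop as a complete module. It is a sketch that assumes GHC with the BangPatterns extension enabled.

```haskell
{-# LANGUAGE BangPatterns #-}

-- The bang on acc forces it to WHNF on every call,
-- doing the job of the explicit seq in the previous version.
sum' :: Int -> [Int] -> Int
sum' !acc [] = acc
sum' !acc (x:xs) = sum' (acc + x) xs

main :: IO ()
main = print (sum' 0 [1 .. 1000000])  -- 500000500000
```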
deepseq
import Control.DeepSeq
deepseq :: NFData a => a -> b -> b
deepseq a b = rnf a `seq` b
-- A class of types that can be fully evaluated.
class NFData a where
  rnf :: a -> ()
  -- Default as in older deepseq versions; newer ones use a Generic-based default.
  rnf a = a `seq` ()
NFData = Normal Form Data
rnf = reduce to normal form
deepseq
instance NFData a => NFData [a] where
  rnf [] = ()
  rnf (x:xs) = rnf x `seq` rnf xs
instance (NFData a, NFData b, NFData c) => NFData (a,b,c) where
  rnf (x,y,z) = rnf x `seq` rnf y `seq` rnf z
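The difference between seq and deepseq can be shown with a short sketch. It assumes the deepseq package that ships with GHC. Point is a hypothetical type, and its instance follows the same pattern as the tuple instance above.

```haskell
import Control.DeepSeq (NFData (..), deepseq)
import Control.Exception (ErrorCall, evaluate, try)

-- Hypothetical user-defined type: rnf each field, chained with seq.
data Point = Point Int Int

instance NFData Point where
  rnf (Point x y) = rnf x `seq` rnf y

main :: IO ()
main = do
  let xs = [1, 2, undefined] :: [Int]
  -- seq stops at WHNF: only the first (:) cell is evaluated, so this is fine.
  xs `seq` putStrLn "seq: list reached WHNF"
  -- deepseq walks the whole list with rnf and reaches the undefined element.
  r <- try (evaluate (xs `deepseq` ())) :: IO (Either ErrorCall ())
  putStrLn (either (const "deepseq: hit undefined inside the list")
                   (const "deepseq: no error") r)
  -- A fully defined Point can be deep-forced without problems.
  Point 1 2 `deepseq` putStrLn "deepseq: Point fully evaluated"
```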
Boxed vs Unboxed
The finite-precision integer type Int covers at least the range [-2^29, 2^29 - 1]. As Int is an instance of the Bounded class, maxBound and minBound can be used to determine the exact Int range defined by an implementation.
From the Haskell 98 Report
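To check the exact range on your own implementation, the Bounded methods the Report mentions are all you need. This is a tiny sketch. On a typical 64-bit GHC it prints -9223372036854775808 and 9223372036854775807, i.e. -2^63 and 2^63 - 1.

```haskell
-- Bounded gives the implementation's actual Int range.
-- The Report only promises at least [-2^29, 2^29 - 1].
main :: IO ()
main = do
  print (minBound :: Int)
  print (maxBound :: Int)
```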
One might imagine numbers naively represented in Haskell "as pointer to a heap-allocated object" which is either an unevaluated closure or is a "box" containing the number's actual value, which has now overwritten the closure
From HaskellWiki
No Definition in the Standard
Boxed vs Unboxed
Boxing is a GHC implementation detail: it is not defined in the Standard, and it could be different in other implementations.
Memory Layout of an Int
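[ I# | Int# ]: each box is one machine word
An evaluated Int is two words in GHC: a header word (a pointer to the info table of the I# constructor) followed by one word holding the raw Int# value. Anything that refers to the Int also holds a word-size pointer to this heap object.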
Boxed vs Unboxed
In GHC, types whose names end in a hash (#) are unboxed types: Int#, Float#, Double#, and so on.
Memory Layout of an Int#
Int#: only one machine word
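In GHC you can open the I# box yourself. This sketch assumes the MagicHash extension and the GHC.Exts module, which exports the I# constructor, Int#, and the primitive addition +#. The names plusInt and double# are hypothetical.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

-- Open the I# boxes to get the raw Int# words, add them with the
-- primitive +#, then put the result into a new I# box.
plusInt :: Int -> Int -> Int
plusInt (I# a) (I# b) = I# (a +# b)

-- An Int# can never be a thunk: this function always receives
-- an already-computed machine word.
double# :: Int# -> Int#
double# n = n +# n

main :: IO ()
main = do
  print (plusInt 2 3)        -- 5
  print (I# (double# 21#))   -- 42
```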
Unboxed Type
{-# LANGUAGE MagicHash #-}
import GHC.Prim
data IntPair = IP Int# Int#
Memory Layout of an IntPair
[ IP | Int# | Int# ]: 3 machine words in total (a header word plus two raw Int# fields).
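Here is a sketch of working with the unboxed IntPair. It assumes GHC with MagicHash, and sumPair and mkPair are hypothetical helpers.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

-- Both fields are raw machine words stored directly in the IP object.
data IntPair = IP Int# Int#

-- Add the two fields and box the result so it can be printed.
sumPair :: IntPair -> Int
sumPair (IP a b) = I# (a +# b)

-- Build an IntPair from ordinary Ints by unboxing them first.
mkPair :: Int -> Int -> IntPair
mkPair (I# a) (I# b) = IP a b

main :: IO ()
main = print (sumPair (mkPair 3 4))  -- 7
```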
UNPACK
data IntPair = IP {-# UNPACK #-} !Int
                  {-# UNPACK #-} !Int
Memory Layout of an IntPair
[ IP | Int# | Int# ]: 3 machine words in total, the same layout as with Int# fields.
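With UNPACK, user code keeps working with plain Int and needs no MagicHash. This is a minimal sketch. swapPair is a hypothetical helper, and whether the fields are actually unpacked is up to GHC.

```haskell
-- Strict, unpacked fields: in source code they look like ordinary Ints,
-- but GHC may store the raw machine words directly inside the IP object.
data IntPair = IP {-# UNPACK #-} !Int {-# UNPACK #-} !Int
  deriving (Show, Eq)

swapPair :: IntPair -> IntPair
swapPair (IP a b) = IP b a

main :: IO ()
main = print (swapPair (IP 1 2))  -- IP 2 1
```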
Real World Examples
#ifdef __GLASGOW_HASKELL__
data UArray i e = UArray !i !i !Int ByteArray#
#endif
-- | Boxed vectors, supporting efficient slicing.
data Vector a = Vector {-# UNPACK #-} !Int
                       {-# UNPACK #-} !Int
                       {-# UNPACK #-} !(Array a)
  deriving ( Typeable )
Epilogue
To write high-performance Haskell (or, more specifically, high-performance GHC code), you have to understand strictness and unboxed types thoroughly.