Difference between revisions of "Performance/Strictness"

From HaskellWiki
Jump to navigation Jump to search
(foldl & strictness, formatting)
m (<pre-haskell>. incorrect removed.)
Line 8: Line 8:
   
 
Optimising compilers like GHC try to reduce the cost of laziness using ''strictness analysis'', which attempts to determine which function arguments are always evaluated by the function, and hence can be evaluated by the caller instead. Sometimes this leads to bigger gains; a strict <code-haskell>Int</code-haskell> can be passed as an unboxed value, for example. Strictness analysis sometimes does wonderful things; for example it is very good at optimising <code-haskell>fac</code-haskell>:
 
Optimising compilers like GHC try to reduce the cost of laziness using ''strictness analysis'', which attempts to determine which function arguments are always evaluated by the function, and hence can be evaluated by the caller instead. Sometimes this leads to bigger gains; a strict <code-haskell>Int</code-haskell> can be passed as an unboxed value, for example. Strictness analysis sometimes does wonderful things; for example it is very good at optimising <code-haskell>fac</code-haskell>:
  +
<pre-haskell>
 
fac :: Int -> Int
+
fac :: Int -> Int
fac n = if n <= 1 then 1 else n * fac (n-1)
+
fac n = if n <= 1 then 1 else n * fac (n-1)
  +
</pre-haskell>
   
 
Strictness analysis can spot the fact that the argument <code-haskell>n</code-haskell> is strict, and can be represented unboxed. The resulting function won't use any heap while it is running, as you'd expect.
 
Strictness analysis can spot the fact that the argument <code-haskell>n</code-haskell> is strict, and can be represented unboxed. The resulting function won't use any heap while it is running, as you'd expect.
   
 
The common case of misunderstanding of strictness analysis is when [[Fold|folding]] (reducing) lists. If this program
 
The common case of misunderstanding of strictness analysis is when [[Fold|folding]] (reducing) lists. If this program
  +
<pre-haskell>
 
main = print (foldl (+) 0 [1..1000000])
+
main = print (foldl (+) 0 [1..1000000])
  +
</pre-haskell>
 
 
is compiled in GHC without "-O" flag, it uses a lot of heap and stack. A programmer knows that the long list (<code-haskell>[1..1000000]</code-haskell>) is stored as a thunk, not fully, because the programmer read articles [[Non-strict semantics]] and [[Lazy vs. non-strict]]. The programmer explicitly wrote <code-haskell>sum</code-haskell> as [[Tail recursion|tail recursive]], so the program should use a small amount of stack, because the programmer read [[Stack overflow]]. So behavior of the program looks mysterious to the programmer.
 
is compiled in GHC without "-O" flag, it uses a lot of heap and stack. A programmer knows that the long list (<code-haskell>[1..1000000]</code-haskell>) is stored as a thunk, not fully, because the programmer read articles [[Non-strict semantics]] and [[Lazy vs. non-strict]]. The programmer explicitly wrote <code-haskell>sum</code-haskell> as [[Tail recursion|tail recursive]], so the program should use a small amount of stack, because the programmer read [[Stack overflow]]. So behavior of the program looks mysterious to the programmer.
   
Line 23: Line 24:
   
 
Look at the definition from the standard library.
 
Look at the definition from the standard library.
  +
<pre-haskell>
 
foldl :: (a -> b -> a) -> a -> [b] -> a
+
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f z0 xs0 = lgo z0 xs0
+
foldl f z0 xs0 = lgo z0 xs0
where
+
where
lgo z [] = z
+
lgo z [] = z
lgo z (x:xs) = lgo (f z x) xs
+
lgo z (x:xs) = lgo (f z x) xs
  +
</pre-haskell>
   
 
<code-haskell>lgo</code-haskell>, instead of adding elements of the long list, creates '''a thunk''' for <code-haskell>(f z x)</code-haskell>. <code-haskell>z</code-haskell> is stored within that thunk, and <code-haskell>z</code-haskell> is a thunk also, created during the previous call to <code-haskell>lgo</code-haskell>. The program creates the long chain of thunks. Stack is bloated when evaluating that chain.
 
<code-haskell>lgo</code-haskell>, instead of adding elements of the long list, creates '''a thunk''' for <code-haskell>(f z x)</code-haskell>. <code-haskell>z</code-haskell> is stored within that thunk, and <code-haskell>z</code-haskell> is a thunk also, created during the previous call to <code-haskell>lgo</code-haskell>. The program creates the long chain of thunks. Stack is bloated when evaluating that chain.
Line 37: Line 39:
   
 
It's easy to accidentally write functions that aren't strict, though. Often a lazy function can be sitting around eating up your performance, when making it strict wouldn't change the meaning of the program. For example:
 
It's easy to accidentally write functions that aren't strict, though. Often a lazy function can be sitting around eating up your performance, when making it strict wouldn't change the meaning of the program. For example:
  +
<pre-haskell>
 
suminit :: [Int] -> Int -> Int -> (Int,[Int])
+
suminit :: [Int] -> Int -> Int -> (Int,[Int])
suminit xs len acc = case len == 0 of
+
suminit xs len acc = case len == 0 of
True -> (acc,xs)
+
True -> (acc,xs)
False -> case xs of
+
False -> case xs of
[] -> (acc,[])
+
[] -> (acc,[])
x:xs -> suminit xs (len-1) (acc+x)
+
x:xs -> suminit xs (len-1) (acc+x)
main = print (fst (suminit [1..] 1000000 0))
+
main = print (fst (suminit [1..] 1000000 0))
  +
</pre-haskell>
 
 
this function sums the first len elements of a list, returning the sum and the remaining list. We've already tried to improve performance by using an [[Performance/Accumulating parameter|accumulating parameter]]. However, the parameter <code-haskell>acc</code-haskell> isn't strict, because there's no guarantee that the caller will evaluate it. The compiler will use a fully boxed <code-haskell>Int</code-haskell> to represent <code-haskell>acc</code-haskell>, although it will probably use an unboxed <code-haskell>Int</code-haskell> to represent <code-haskell>len</code-haskell>. The expression <code-haskell>(acc+x)</code-haskell> will be saved as a suspension, rather than evaluated on the spot. (Incidentally, this is a common pattern we see crop up time and again in small recursive functions with a few parameters).
 
this function sums the first len elements of a list, returning the sum and the remaining list. We've already tried to improve performance by using an [[Performance/Accumulating parameter|accumulating parameter]]. However, the parameter <code-haskell>acc</code-haskell> isn't strict, because there's no guarantee that the caller will evaluate it. The compiler will use a fully boxed <code-haskell>Int</code-haskell> to represent <code-haskell>acc</code-haskell>, although it will probably use an unboxed <code-haskell>Int</code-haskell> to represent <code-haskell>len</code-haskell>. The expression <code-haskell>(acc+x)</code-haskell> will be saved as a suspension, rather than evaluated on the spot. (Incidentally, this is a common pattern we see crop up time and again in small recursive functions with a few parameters).
   
Line 55: Line 57:
   
 
For <code-haskell>suminit</code-haskell>, we need to make <code-haskell>acc</code-haskell> strict. The way to do this is using <code-haskell>seq</code-haskell>:
 
For <code-haskell>suminit</code-haskell>, we need to make <code-haskell>acc</code-haskell> strict. The way to do this is using <code-haskell>seq</code-haskell>:
  +
<pre-haskell>
 
suminit :: [Int] -> Int -> Int -> (Int,[Int])
+
suminit :: [Int] -> Int -> Int -> (Int,[Int])
suminit xs len acc = acc `seq` case len == 0 of
+
suminit xs len acc = acc `seq` case len == 0 of
True -> (acc,xs)
+
True -> (acc,xs)
False -> case xs of
+
False -> case xs of
[] -> (acc,[])
+
[] -> (acc,[])
x:xs -> suminit xs (len-1) (acc+x)
+
x:xs -> suminit xs (len-1) (acc+x)
  +
</pre-haskell>
   
 
Some other languages (eg. Clean) have strictness annotations on types, which is a less ugly way to express this, but for now there are no Haskell compilers that support this.
 
Some other languages (eg. Clean) have strictness annotations on types, which is a less ugly way to express this, but for now there are no Haskell compilers that support this.
   
 
With the BangPatterns GHC extension enabled (either explicitly or with "-fglasgow-exts"), the above can be written as
 
With the BangPatterns GHC extension enabled (either explicitly or with "-fglasgow-exts"), the above can be written as
  +
<pre-haskell>
 
suminit xs !len !acc = ...
+
suminit xs !len !acc =
  +
</pre-haskell>
   
 
Incidentally, GHC will also eliminate the tuple returned by this function if the caller immediately deconstructs it.
 
Incidentally, GHC will also eliminate the tuple returned by this function if the caller immediately deconstructs it.
Line 74: Line 78:
   
 
There's a useful variant of the infix application operator <code-haskell>($)</code-haskell> that evaluates its argument strictly: <code-haskell>($!)</code-haskell>. This can often be used to great effect in eliminating unnecessary suspensions that the compiler hasn't spotted. eg. in a function application
 
There's a useful variant of the infix application operator <code-haskell>($)</code-haskell> that evaluates its argument strictly: <code-haskell>($!)</code-haskell>. This can often be used to great effect in eliminating unnecessary suspensions that the compiler hasn't spotted. eg. in a function application
  +
<pre-haskell>
 
f (g x)
+
f (g x)
  +
</pre-haskell>
 
 
writing instead
 
writing instead
  +
<pre-haskell>
 
f $! (g x)
+
f $! (g x)
  +
</pre-haskell>
 
 
will be more efficient if (a) you were going to evaluate <code-haskell>(g x)</code-haskell> anyway, and (b) <code-haskell>f</code-haskell> isn't visibly strict, or inlined. If <code-haskell>f</code-haskell> is strict or inlined, then the chances are that <code-haskell>($!)</code-haskell> is unnecessary cruft here.
 
will be more efficient if (a) you were going to evaluate <code-haskell>(g x)</code-haskell> anyway, and (b) <code-haskell>f</code-haskell> isn't visibly strict, or inlined. If <code-haskell>f</code-haskell> is strict or inlined, then the chances are that <code-haskell>($!)</code-haskell> is unnecessary cruft here.
   
 
A good example is the monadic return. If you find yourself writing
 
A good example is the monadic return. If you find yourself writing
  +
<pre-haskell>
 
do ...
+
do
...
+
return (fn x)
+
return (fn x)
  +
</pre-haskell>
   
 
then consider instead writing
 
then consider instead writing
  +
<pre-haskell>
 
do ...
+
do
...
+
return $! fn x
+
return $! fn x
  +
</pre-haskell>
 
 
it is very rare to actually need laziness in the argument of return here.
 
it is very rare to actually need laziness in the argument of return here.
 
NB. do not do this if the expression on the right of <code-haskell>$!</code-haskell> is a variable - that just wastes effort, because it does not eliminate a suspension. The only reason to do this would be if you were eliminating a [[space leak]].
 
   
 
Warning: Using any kind of strictness annotations as above can have unexpected impact on program semantics, in particular when certain optimizations are performed by the compiler. See [[correctness of short cut fusion]].
 
Warning: Using any kind of strictness annotations as above can have unexpected impact on program semantics, in particular when certain optimizations are performed by the compiler. See [[correctness of short cut fusion]].
Line 111: Line 114:
   
 
Example:
 
Example:
  +
<pre-haskell>
 
-- Force Strict: Make g's argument smaller.
 
f x = g $! (h x)
   
 
-- Don't force: f isn't building on x, so just let g deal with it.
-- Force Strict: Make g's argument smaller.
 
f x = g $! (h x)
+
f x = g x
 
-- Don't force: f isn't building on x, so just let g deal with it.
 
f x = g x
 
   
-- Don't force: f is already strict in x
+
-- Don't force: f is already strict in x
f x = case x of
+
f x = case x of
 
0 -> (h (g x))
 
0 -> (h (g x))
  +
</pre-haskell>

Revision as of 16:59, 16 June 2010

Haskell Performance Resource

Constructs:
Data Types - Functions
Overloading - FFI - Arrays
Strings - Integers - I/O
Floating point - Concurrency
Modules - Monads

Techniques:
Strictness - Laziness
Avoiding space leaks
Accumulating parameter

Implementation-Specific:
GHC - nhc98 - Hugs
Yhc - JHC

Haskell is a non-strict language, and most implementations use a strategy called laziness to run your program. Basically laziness == non-strictness + sharing.

Laziness can be a useful tool for improving performance, but more often than not it reduces performance by adding a constant overhead to everything. Because of laziness, the compiler can't evaluate a function argument and pass the value to the function, it has to record the expression in the heap in a suspension (or thunk) in case it is evaluated later. Storing and evaluating suspensions is costly, and unnecessary if the expression was going to be evaluated anyway.

Strictness analysis

Optimising compilers like GHC try to reduce the cost of laziness using strictness analysis, which attempts to determine which function arguments are always evaluated by the function, and hence can be evaluated by the caller instead. Sometimes this leads to bigger gains; a strict <code-haskell>Int</code-haskell> can be passed as an unboxed value, for example. Strictness analysis sometimes does wonderful things; for example it is very good at optimising <code-haskell>fac</code-haskell>: <pre-haskell> fac :: Int -> Int fac n = if n <= 1 then 1 else n * fac (n-1) </pre-haskell>

Strictness analysis can spot the fact that the argument <code-haskell>n</code-haskell> is strict, and can be represented unboxed. The resulting function won't use any heap while it is running, as you'd expect.

The common case of misunderstanding of strictness analysis is when folding (reducing) lists. If this program <pre-haskell> main = print (foldl (+) 0 [1..1000000]) </pre-haskell> is compiled in GHC without "-O" flag, it uses a lot of heap and stack. A programmer knows that the long list (<code-haskell>[1..1000000]</code-haskell>) is stored as a thunk, not fully, because the programmer read articles Non-strict semantics and Lazy vs. non-strict. The programmer explicitly wrote <code-haskell>sum</code-haskell> as tail recursive, so the program should use a small amount of stack, because the programmer read Stack overflow. So behavior of the program looks mysterious to the programmer.

The programmer concludes that the program somehow decides to store the long list fully in the heap, or garbage collector is not able to remove dead prefix of the long list. Wrong. The long list is fine.

Look at the definition from the standard library. <pre-haskell> foldl :: (a -> b -> a) -> a -> [b] -> a foldl f z0 xs0 = lgo z0 xs0

 where
   lgo z []     =  z
   lgo z (x:xs) = lgo (f z x) xs

</pre-haskell>

<code-haskell>lgo</code-haskell>, instead of adding elements of the long list, creates a thunk for <code-haskell>(f z x)</code-haskell>. <code-haskell>z</code-haskell> is stored within that thunk, and <code-haskell>z</code-haskell> is a thunk also, created during the previous call to <code-haskell>lgo</code-haskell>. The program creates the long chain of thunks. Stack is bloated when evaluating that chain.

With "-O" flag GHC performs strictness analysis, than it knows that <code-haskell>lgo</code-haskell> is strict in <code-haskell>z</code-haskell> argument, therefore thunks are not needed and are not created.

Limitations of strictness analysis

It's easy to accidentally write functions that aren't strict, though. Often a lazy function can be sitting around eating up your performance, when making it strict wouldn't change the meaning of the program. For example: <pre-haskell> suminit :: [Int] -> Int -> Int -> (Int,[Int]) suminit xs len acc = case len == 0 of

 True  -> (acc,xs)
 False -> case xs of
   []   -> (acc,[])
   x:xs -> suminit xs (len-1) (acc+x)

main = print (fst (suminit [1..] 1000000 0)) </pre-haskell> this function sums the first len elements of a list, returning the sum and the remaining list. We've already tried to improve performance by using an accumulating parameter. However, the parameter <code-haskell>acc</code-haskell> isn't strict, because there's no guarantee that the caller will evaluate it. The compiler will use a fully boxed <code-haskell>Int</code-haskell> to represent <code-haskell>acc</code-haskell>, although it will probably use an unboxed <code-haskell>Int</code-haskell> to represent <code-haskell>len</code-haskell>. The expression <code-haskell>(acc+x)</code-haskell> will be saved as a suspension, rather than evaluated on the spot. (Incidentally, this is a common pattern we see crop up time and again in small recursive functions with a few parameters).

Explicit strictness

We can make an argument strict explicitely.

In the <code-haskell>foldl</code-haskell> example, replace <code-haskell>foldl</code-haskell> with <code-haskell>foldl'</code-haskell>.

For <code-haskell>suminit</code-haskell>, we need to make <code-haskell>acc</code-haskell> strict. The way to do this is using <code-haskell>seq</code-haskell>: <pre-haskell> suminit :: [Int] -> Int -> Int -> (Int,[Int]) suminit xs len acc = acc `seq` case len == 0 of

 True  -> (acc,xs)
 False -> case xs of
   []   -> (acc,[])
   x:xs -> suminit xs (len-1) (acc+x)

</pre-haskell>

Some other languages (eg. Clean) have strictness annotations on types, which is a less ugly way to express this, but for now there are no Haskell compilers that support this.

With the BangPatterns GHC extension enabled (either explicitly or with "-fglasgow-exts"), the above can be written as <pre-haskell> suminit xs !len !acc = … </pre-haskell>

Incidentally, GHC will also eliminate the tuple returned by this function if the caller immediately deconstructs it.

Evaluating expressions strictly

There's a useful variant of the infix application operator <code-haskell>($)</code-haskell> that evaluates its argument strictly: <code-haskell>($!)</code-haskell>. This can often be used to great effect in eliminating unnecessary suspensions that the compiler hasn't spotted. eg. in a function application <pre-haskell> f (g x) </pre-haskell> writing instead <pre-haskell> f $! (g x) </pre-haskell> will be more efficient if (a) you were going to evaluate <code-haskell>(g x)</code-haskell> anyway, and (b) <code-haskell>f</code-haskell> isn't visibly strict, or inlined. If <code-haskell>f</code-haskell> is strict or inlined, then the chances are that <code-haskell>($!)</code-haskell> is unnecessary cruft here.

A good example is the monadic return. If you find yourself writing <pre-haskell> do …

  …
  return (fn x)

</pre-haskell>

then consider instead writing <pre-haskell> do …

  …
  return $! fn x

</pre-haskell> it is very rare to actually need laziness in the argument of return here.

Warning: Using any kind of strictness annotations as above can have unexpected impact on program semantics, in particular when certain optimizations are performed by the compiler. See correctness of short cut fusion.

Rule of Thumb for Strictness Annotation

A rule of thumb for when strictness annotation might be needed:

When a function <code-haskell>f</code-haskell> with argument <code-haskell>x</code-haskell> satisfies both conditions:

  • <code-haskell>f</code-haskell> calls a function on a function of <code-haskell>x</code-haskell>: <code-haskell>(h (g x))</code-haskell>
  • is not already strict in <code-haskell>x</code-haskell> (does not inspect <code-haskell>x</code-haskell>'s value),

then it can be helpful to force evaluation:

Example: <pre-haskell> -- Force Strict: Make g's argument smaller. f x = g $! (h x)

-- Don't force: f isn't building on x, so just let g deal with it. f x = g x

-- Don't force: f is already strict in x f x = case x of

  0 -> (h (g x))

</pre-haskell>