[Haskell-cafe] What does laziness buy you?

Thu Oct 16 05:20:10 EDT 2008

One thing that is hard to communicate to people in industry is what
value laziness has.  You can point people at the WhyFP paper, and talk
about infinite data structures and decoupling generation from search
until you're blue in the face, but it's so far beyond the day-to-day
work of regular programmers that they don't really comprehend.

But I was talking to Simon Marlow yesterday and he pointed out
something that strikes me as a great example to give of "what laziness
buys you".

Lets talk about atomic changes.  In particular, I want to examine the function

    atomicModifyIORef :: IORef a -> (a -> (a,b)) -> IO b

This takes an IORef, reads the value out of it, calls some code that
gets a new value and may return some derived data, writes the new
value into the IORef, and returns the derived data.  It has the
invariant that no other change to that IORef can happen in between.

Now, there are all sorts of issues like "what if 'f' has a side
effect" that get sidestepped entirely by purity, but I don't want to
talk about those; I think the benefits of purity are easier to argue
in many contexts.

Lets just look at a simple Haskell implementation:

> atomicModifyIORef p f = do
>    a <- readIORef p
>    let r = f a
>    writeIORef p (fst r)
>    return (snd r)

This is a non-atomic implementation, but we can fix it assuming we
have "compare-and-swap" available; any language with boxed types can
do this:

> -- low-level primitive that uses pointer equality behind the scenes
> compareAndSwap# :: IORef a -> a -> a -> IO Bool

> atomicModifyIORef p f = do
>     a <- readIORef p
>     let r = f a
>     let aNew = fst r
>     ok <- compareAndSwap# p a aNew
>     if ok
>         then return (snd r)
>         else atomicModifyIORef p f

This loops until compareAndSwap# succeeds and is a simple lock-free
implementation.

But what if f takes a long time to run?  In a strict language, if
there was contention on p, it's likely that atomicModifyIORef never
finishes executing due to starvation!  But since Haskell is lazy, each
attempt to modify p is a constant-time operation!  We just allocate
thunks for "r" and "aNew" and do the compare-and-swap.

Now, the responsibility for executing f is shifted to anyone who reads
from the IORef; they get a thunk.  It's likely that this thread will
do so, as the value returned by atomicModifyIORef depends on f
executing, but if this thread gets suspended or there is contention,
the next reader can pick up the slack!

In fact, at the low level we can improve this further; we can build
the thunks for "r" and "aNew" with a hole to put the value we read out
of the IORef; the inner loop then looks like this:

> a <- readIORef p
> update thunk hole in "r" with "a"
> ok <- compareAndSwap# p a aNew

which compiles down to just a pointer read, assignment, and CAS instruction.

This blew my mind; from the "imperative, strict" point of view, this
sort of operation is impossible to do reasonably, and here in Haskell
we can get a great "probability of success"; the "critical section"
where we have to loop if someone else modified the value behind our
back is just a few instructions long!

  -- ryan