Difference between revisions of "Thunk"

From HaskellWiki
Jump to navigation Jump to search
(start)
(No difference)

Revision as of 15:17, 15 December 2006

A thunk is a value that is yet to be evaluated. Haskell is lazy, it does not evaluate a thunk unless it has to. Expressions in Haskell are translated into a graph (not a tree, as you might have expected, this enables sharing, and infinite lists!) and a Spineless Tagless G-machine (STG, G for graph, I suppose?) reduces it, chucking out any unneeded thunk, unevaluated.

Why are thunks useful?

Well, if you don't need it, why evaluate it? Take for example the "lazy" && (and) operation. Two boolean expressions joined by && together is true if and only if both of them are. If you find out that one of them is false, you immediately know the joined expression cannot be true.

-- the first is false, so is the answer, don't even need to know what the other is
False && _ = False
-- so the first turns out to be true, hmm...
-- if the second is true, then the result is true
-- if it's false, so is the result
-- in other words, the result is the second!
True  && x = x

This function only evaluates the first parameter, because that's all that is needed. Even if the first parameter is true, you don't need to evaluate the second, so don't (so this version effectively is smarter than the explicit truth table). Who knows, the second parameter may get thrown out later as well!

Perhaps a more convincing example is a (naive but intuitive) algorithm to find out if a given number is prime.

-- the non-trivial factors are those who divide the number so no remainder
factors n = filter (\m -> n `mod` m == 0) [2 .. (n - 1)]
-- a number is a prime if it has no non-trivial factors
isPrime n = n > 1 && null (factors n)

Fascinatingly, isPrime evaluates to False as soon as it finds a factor (due to the lazy definition of null), and discards the rest of the list.

When are thunks not so useful?

If you keep building up a very complication graph to reduce later, it consumes memory (naturally), and can hinder performance, like, (from a blog comment made by Cale Gibbard)

foldl (+) 0 [1,2,3]
==> foldl (+) (0 + 1) [2,3]
==> foldl (+) ((0 + 1) + 2) [3]
==> foldl (+) (((0 + 1) + 2) + 3) []
==> ((0 + 1) + 2) + 3
==> (1 + 2) + 3
==> 3 + 3
==> 6

and by ==>, "is reduced to" is meant. Lucky, the example is not applied to [1 .. 2^40] because it would have taken a very long time to load this page then, and so would your program when run. For more involved (and subtle!) examples, see Performance/Strictness.

See also

Performance/Laziness