Maintaining laziness
From HaskellWiki
One of Haskell's main features is non-strict semantics, which in is implemented by lazy evaluation in all popular Haskell compilers. However many Haskell libraries found on Hackage are implemented just as if Haskell would be a strict language. This leads to unnecessary inefficiencies, memory leaks and, we suspect, unintended semantics. In this article we want to go through some techniques on how to check lazy behaviour on functions, examples of typical constructs which break laziness without need, and finally we want to link to techniques that may yield the same effect without laziness.
Contents |
1 Checking laziness
If you want to check whether a function is lazy enough, you may feed it with undefined values.
An undefined value can beThe latter one has the advantage that it cannot be hidden by some hacks like "catching" the error in the IO monad.
Examples:
Check whetherfilter even [0..] filter even ([0..5] ++ undefined)
then it keeps generating elements in the first case and it outputs a prefix of the output list, before breaking because of the undefined, in the second case.
An automated unit test can check whether infinite or corrupted input data produces correct prefixes.
Those tests usually do not fail by returningtestFilter0 = filter even [0..100] `isPrefixOf` filter even [0..] testFilter1 = filter even [0..100] `isPrefixOf` filter even ([0..102]++undefined) testFilter2 = let x = filter even [0..] !! 100 in x==x testFilter3 = let x = filter even ([0..102]++undefined) !! 50 in x==x
2 Laziness breakers
2.1 Maybe, Either, Exceptions
Some laziness breakers are visible in type signatures:
decodeUTF8 :: [Word8] -> Either Message String
This function cannot be lazy, because when you access the first character of the result,
it must already be computed, whether the result isFor this decision, the complete input must be decoded. A better type signature is
decodeUTF8 :: [Word8] -> (Maybe Message, String)
If you touch the first element of the pair, the complete decodings is triggered, thus laziness is broken.
This means you should first process theInstead of the unspecific pair type you should use the special type for asynchronous exceptions as found in the explicit exception package.
Especially in parsers you may find a function, called Wadler's force function.
It works as follows:
force y = let Just x = y in Just x
Then the runtime system need not to wait until it can determine the right constructor but it can proceed immediately.
This way a function can be made lazy, also if it returnsUsing force like functions is sometimes necessary, but should be avoided for data types with more than one constructor. It is better to use an interim data type with one constructor and lift to the multi-constructor datatype when needed.
Consider parsers of typemany :: StateT [Word8] Maybe a -> StateT [Word8] Maybe [a]
It shall just return an empty list, if parsing of one element fails.
A quick hack would be to definemany :: StateT [Word8] Maybe a -> StateT [Word8] Identity [a]
2.2 Early decision
2.2.1 List construction
Be aware that the following two expressions are not equivalent.
-- less lazy if b then f x else f y -- more lazy f (if b then x else y)
which is a difference in non-strict semantics.
Consider e.g.It is common source of too much strictness to make decisions too early and thus duplicate code in the decision branches. Intuitively spoken, the bad thing about code duplication (stylistic questions put aside) is, that the run-time system cannot see that in the branches some things are equal and do it in common before the critical decision. Actually, the compiler and run-time system could be "improved" to do so, but in order to keep things predictable, they do not do so. Even more, this behaviour is required by theory, since by pushing decisions to the inner of an expression you change the semantics of the expression. So we return to the question, what the programmer actually wants.
Now, do you think this expression
if b then [x] else y:ys
is maximally lazy? It seems so, but actually it is not. In both branches we create non-empty lists, but the run-time system cannot see this.
It islet z:zs = if b then [x] else y:ys in z:zs
This error can only caught at run-time which is bad. We can avoid it using the single constructor pair type.
let (z,zs) = if b then (x,[]) else (y,ys) in z:zs
which can be abbreviated to
uncurry (:) (if b then (x,[]) else (y,ys))
In the Haskell 98 report the implementation
inits :: [a] -> [[a]] inits [] = [[]] inits (x:xs) = [[]] ++ map (x:) (inits xs)
is suggested.
However you find thatThe following implementation does exactly this:
inits :: [a] -> [[a]] inits xt = [] : case xt of [] -> [] x:xs -> map (x:) (inits xs)
See also the article on base cases and identities.
2.2.2 Reader-Writer-State monad
I do not know whether the following example can be simplified. In this form it occured in a real application, namely the HTTP package.
Consider the following action of theThe state of the monad is the input list we fetch the elements from. The reader part provides an element which means that the input is consumed. It is returned as singleton when the caller tries to read from a completely read input. The writer allows to log some information, however the considered action does not output something to the log.
getN :: Int -> RWS a [Int] [a] [a] getN n = do input <- get if null input then asks (:[]) else let (fetched,rest) = splitAt n input in put rest >> return fetched
that is, only if there is something to fetch. This works even more many corner cases, but not in the following one.
Althoughit requires to read the input in order to find out, that nothing was logged:
*Test> (\(_a,_s,w) -> w) $ runRWS (getN 5) '\n' undefined *** Exception: Prelude.undefined
which is undefined, since the input is undefined. Thus we must ensure, that the invoked monadic actions are run independent from the input. Only this way, the run-time system can see that the logging stream is never touched.
We start refactoring by callinggetN :: Int -> RWS a [Int] [a] [a] getN n = do input <- get let (fetched,rest) = splitAt n input put rest if null input then asks (:[]) else return fetched
getN :: Int -> RWS a [Int] [a] [a] getN n = do input <- get let (fetched,rest) = splitAt n input put rest endOfInput <- ask return $ if null input then [endOfInput] else fetched
Now things work as expected:
*Test> (\(_a,_s,w) -> w) $ runRWS (getN 5) '\n' undefined []
We learn from this example, that sometimes in Haskell it is more efficient to call functions that are not needed under some circumstances. Always remind, that the do notation looks only imperative, but it is not imperative.
E.g.,
2.3 Strict pattern matching in a recursion
Consider theThis function should also work on infinite lists, but the implementation shipped with GHC up to 6.2 failed on infinite lists. What happened?
The reason was too strict pattern matching.
Let's first consider the following correct implementation:
partition :: (a -> Bool) -> [a] -> ([a], [a]) partition p = foldr (\x ~(y,z) -> if p x then (x : y, z) else (y, x : z)) ([],[])
However, how can this work if there is a list without an end?
That can be seen when applying the definition offoldr :: (a -> b -> b) -> b -> [a] -> b foldr _ b [] = b foldr f b (a:as) = f a (foldr f b as)
Now we expand this once for an infinite input list, we get
partition p (a:as) = (\ ~(y,z) -> if p a then (a:y, z) else (y, a:z)) (foldr ... ([],[]) as)
Omitting it, would require the evaluation of the whole input list before the first output element can be determined.
This fails for infinite lists and is inefficient for finite lists, and that was the bug in former implementations ofBtw. by the expansion you also see, that it would not help to omit the tilde and apply the above 'force' trick to the 'if-then-else' expression.
2.4 List reversal
Any use of the list functionsince when you access the first element of a reversed list, then all nodes of the input list must be evaluated and stored in memory. Think twice whether it is really needed. The article Infinity and efficiency shows how to avoid list reversal.
3 Alternatives
From the above issues you see that laziness is a fragile thing. Make one mistake and a function, carefully developed with laziness in mind, is no longer lazy. The type system will rarely help you hunting laziness breakers, and there is little support by debuggers.
Thus detecting laziness breakers will often requires understanding of a large portion of code, which is against the idea of modularity.
Maybe for your case you will prefer a different idiom, that achieves the same goals in a safer way. See e.g. the Enumerator and iteratee pattern.
