[Haskell-cafe] Two-iteration optimisation (was: GHC Predictability)

Henning Thielemann lemming at henning-thielemann.de
Wed May 14 13:25:13 EDT 2008


On Wed, 14 May 2008, Paul Johnson wrote:

> We've had a big debate over
>
>> mean xs = foldl' (+) 0 xs / fromIntegral (length xs)
>
> For anyone who didn't follow it, the problem is that "mean" needs to traverse 
> its argument twice, so the entire list has to be held in memory.  So if "xs = 
> [1..1000000000]" then "mean xs" uses all your memory, but "foldl' (+) 0 xs" 
> and "length xs" by themselves do not.

I also wondered how to solve that problem. It also occurs in 
implementations of "wc", that is counting lines, words and characters of a 
text in parallel.

> The solution is for the programmer to rewrite "mean" to accumulate a pair 
> containing the running total and count together, then do the division.  This 
> makes me wonder: could there be a compiler optimisation rule for this, 
> collapsing two iterations over a list into one.  I've never tried writing GHC 
> rules, but something like:
>
>>  f (foldl' g x) (foldl' h x) = (uncurry f) (foldl' (\i (gt,ht) -> (g i gt,  h i ht)))

The left hand sides of GHC rules must have a definite function as 
top-level construct, whereas 'f' is a variable. You may define a function 
which points the optimizer to the places where something shall be 
replaced. Say

fuseFold :: (a -> b -> c) -> a -> b -> c
fuseFold f x y = f x y

{-# RULES
    forall f g h x y.
       fuseFold f (foldl' g x) (foldl' h x) = ...
   #-}


However, instead of writing library functions which use foldl', it might 
be better to implement and export the accumulating functions separately. 
They can be used with 'foldl's of arbitrary data structures and they can 
be fused explicitly without GHC rules. However, I suspect you cannot 
rewrite say (length (words str)) or (length (lines str)) in that manner in 
an elegant way.


> The first problem that occurs to me is the number of permutations of fold and 
> map functions.

foldl' could be merged with 'map' to a foldl', however the existing fusion 
frameworks use different primitives, like foldr/build.


More information about the Haskell-Cafe mailing list