[Haskell-cafe] Two-iteration optimisation (was: GHC Predictability)

Mon May 19 03:22:58 EDT 2008

On Wed, May 14, 2008 at 06:08:28PM +0100, Paul Johnson wrote:
> We've had a big debate over
>
> > mean xs = foldl' (+) 0 xs / fromIntegral (length xs)
>
> For anyone who didn't follow it, the problem is that "mean" needs to  
> traverse its argument twice, so the entire list has to be held in  
> memory.  So if "xs = [1..1000000000]" then "mean xs" uses all your  
> memory, but "foldl' (+) 0 xs" and "length xs" by themselves do not.
>
> The solution is for the programmer to rewrite "mean" to accumulate a  
> pair containing the running total and count together, then do the  
> division.  This makes me wonder: could there be a compiler optimisation  
> rule for this, collapsing two iterations over a list into one.  I've  
> never tried writing GHC rules, but something like:
>
> >  f (foldl' g x) (foldl' h x) = (uncurry f) (foldl' (\i (gt,ht) -> (g i 
> gt, h i ht)))
>
> The first problem that occurs to me is the number of permutations of  
> fold and map functions.
>
> Paul.

There are two problems with this rule.  The first is that the function is not
strict in 'gt' and 'ht' -- this is easily fixed with a little bit of seq.  The
other problem is that 'f' must be strict in both parameters for this rule to
hold.  As far as I know, there is no access to strictness information in rule
pragmas.

Cheers,
Spencer Janssen