[Haskell-beginners] Profiling introduces a space leak where there was none before?

Daniel Fischer daniel.is.fischer at web.de
Fri Aug 13 09:41:52 EDT 2010


On Friday 13 August 2010 03:20:49, Travis Erdman wrote:
> In Ch 25 of Real World Haskell, the authors introduce some naive code
> for finding the average of a big list; it has a space leak, and they
> present several solutions.
>
>
> Below are two of the solutions that successfully eliminate the space
> leak (though, the first one -- the one that uses foldl'rnf -- is quite a
> bit faster).   However, if compiled with profiling, the first one (using
> foldl'rnf) NOW has a leak.  The second solution (foldl') does not have a
> leak even when profiling is enabled.
>
> I have used this foldl'rnf function in my own code, as it is the only
> solution I have found for a space leak in my own code.  But, since it
> leaks when profiled, it is making analysis difficult.
>
> Is this a feature, bug, or user error?  If a known issue, is there a
> workaround?  The code and some documenting output follows.

I must admit I don't really understand what's going on.
However, compiling for profiling makes some optimisations impossible, so 
different behaviour between profiling and non-profiling code isn't too 
surprising.
Since the profiling version overflows the default stack, it seems to be a 
problem of missing strictness.
I believe, what happens is that profiling prevents too much inlining, so 
that the strictness analyser gets confused.

>
> thanks,
>
> Travis
> ------------------------------------
>
> {-# LANGUAGE BangPatterns #-}
>
> import System.Environment
> import Text.Printf
> import Control.Parallel.Strategies
> import Control.DeepSeq
> import Data.List (foldl')
>
> main = do
>     [d] <- map read `fmap` getArgs
>     printf "%f\n" (mean [1..d])
>
> foldl'rnf :: NFData a => (a -> b -> a) -> a -> [b] -> a
> foldl'rnf f z xs = lgo z xs
>     where
>         lgo z []     = z
>         lgo z (x:xs) = lgo z' xs
>             where
>                 z' = f z x `using` rdeepseq

You get better Core and a faster mean with

     lgo z (x:xs) = let z' = f z x in deepseq z' (lgo z' xs)

as the second equation. Alas, that doesn't fix the profiling space-leak.
For fixing the space leak, it is important whether the fold is defined in a 
library module and compiled separately or, as is the case here, it's 
defined in the Main module and not exported.
There are several variants that fix the leak in the latter setting but not 
in the former.

Since the former is the interesting case (in the latter case you can write 
faster specialised code), the version that fixes the profiling space leak 
as a separately compiled library function (at least, there's no leak here):


noleak :: NFData a => (a -> b -> a) -> a -> [b] -> a
noleak f = nol
  where
    nol !z [] = z
    nol z (x:xs) = case rdeepseq (f z x) of
                    Done z' -> nol z' xs

The important points are
- manually inlining `using` in some way
- the bang on z in the first equation (would probably also work with a bang 
in the second equation instead of the first)

The above gives however a spurious deprecation warning (the warning code 
confuses the data constructor Done of data Eval with the deprecated type 
alias type Done = (), so it warns). To avoid that, you can also write the 
second equation as

    nol z (x:xs) = nol (runEval (rdeepseq (f z x))) xs

Non-profiling performance is, as far as I can tell, identical to that of 
your foldl'rnf.



More information about the Beginners mailing list