[Haskell-beginners] leaky folding

Daniel Fischer daniel.is.fischer at web.de
Sun Aug 15 10:20:54 EDT 2010


On Sunday 15 August 2010 09:58:49, Travis Erdman wrote:
> in the code below, finalmap seems fast enough ... but it has a space
> leak. otoh, finalmap'rnf runs in constant space, but its performance is
> terrible, at least 4x slower than finalmap.

The performance isn't so bad, actually.
Consider that in each step it has to rnf the entire map. Even for small 
maps like this one, that's quite a bit of work (and mostly unnecessary, 
because almost everything is already in normal form).

>
> this is a common problem i'm having ... foldl' isn't strict enough, but
> foldl'rnf kills performance.

Firstly, foldl'rnf is a good idea *only* for small structures or if in each 
step large parts of the structure are changed. If you change only small 
parts of a large structure, you're wasting a lot of work.
Secondly, in this case your problem is a) the choice of a suboptimal data 
structure b) a bad choice of functions to manipulate the structure c) 
perhaps the lacking strictness of Data.IntMap.

a) STUArray would be better. But that makes itself really felt only for 
larger n.
b) that's the biggo
c) Data.IntMap doesn't offer any strict versions of insertWith, 
insertWithKey, adjust et al, which would often make it far easier to avoid 
space leaks. Data.Map at least offers strict(er) versions of 
insertWith[Key].

> And not only with IntMap as the cumulating
> data structure, but others as well.
>
> any ideas on this one?

Sure. See below.

> how can i get a fast fold in constant space?
>
> thanks again,
>
> travis
>
>
> {-# LANGUAGE BangPatterns #-}
>
> import System.Environment
> import Foreign (unsafePerformIO)
> import System.Random.Mersenne
> import Data.List
> import Control.DeepSeq
> import Control.Parallel.Strategies
> import qualified Data.IntMap as IntMap
>
> mersennegen = unsafePerformIO $ newMTGen Nothing
> infrandoms =  unfoldr ( Just . splitAt 3) $ map (\x -> abs (x `mod` n))
> (unsafePerformIO $ (randoms mersennegen)::[Int])

Arrrgh! Use of unsafePerformIO in that way makes my head hurt. Pass things 
as arguments, please.

>
> n = 200
>
> foldl'rnf :: NFData a => (a -> b -> a) -> a -> [b] -> a
> foldl'rnf f z xs = lgo z xs
>     where
>         lgo z []     = z
>         lgo !z (x:xs) = lgo (runEval (rdeepseq (f z x))) xs
>
> startmap = IntMap.fromDistinctAscList $ zip [0..] [1..n]

That is almost certainly a bad idea. If the map contains a contiguous range 
of keys, an array is practically guaranteed to be more appropriate (uses 
less memory and is faster to boot).

>
> finalmap x = foldl' g startmap (take x infrandoms)

Okay, that looks reasonable.

> finalmap'rnf x = foldl'rnf g startmap (take x infrandoms)

As stated above, rnf'ing the entire map at each step is a baad idea.
You can get reasonable performance while squashing the leak by splitting 
your list in chunks of size k and rnf'ing the map only after each chunk has 
been processed.

finalmap'rnf x = foldl'rnf h startmap
       (takeWhile (not . null) 
          (unfoldr (Just . splitAt 200) (take x infrandoms)))
    where
        h mp lst = foldl' g mp lst

The chunk size should be approximately the size of the map, so that after 
each chunk
- a large part of the map has been changed
- no value has been changed too often.
Then rnf'ing the entire map doesn't waste too much work (since only a small 
part of the map is already in NF) and no key maps to a too large thunk 
(which can cause a space leak with vanilla foldl', though here the problem 
is something else).

But, here at least, changing the folded function does better.

>
> g:: IntMap.IntMap Int -> [Int] -> IntMap.IntMap Int
> g !a [x,y,z] = IntMap.adjust (const $ y + (a IntMap.! z) `mod` n) x a

Oy gevalt!

The bang on the map parameter is superfluous since we foldl' anyway, but 
that's no big deal.

IntMap.adjust (const val) is not the best if the key at which the map is to 
be adjusted is guaranteed to be in the map. Then IntMap.insert is better.
But the difference is small, and if the key is not algorithmically 
guaranteed to be present, adjust is cleaner. So the use of adjust is no big 
deal either, even if insert is faster here.

What is bad is that you IntMap.adjust (const thunk).
Well, that alone isn't so bad.
What kills you is that the thunk refers to the map.
Thus each modified map contains a reference to the previous version, the 
old contents can't be garbage collected, hello space leak (and finally 
getting the values involves hopping through old cells).

If you modify a data structure, don't let the modified version contain any 
thunks referencing the old. That spells space leak.

With

g a [x,y,z] = let !w = y + (a IntMap.! z) `mod` n
              in IntMap.adjust (const w) x a

you're fine with foldl'.

Question by the way, did you really want

y + ((a ! z) `mod` n)

or rather

(y + (a ! z)) `mod` n
?

>
> main = do
>         args <- getArgs
>         print $ finalmap (read $ head args)



More information about the Beginners mailing list