[Haskell-cafe] How do I get this done in constant mem?

Luke Palmer lrpalmer at gmail.com
Fri Oct 9 19:48:15 EDT 2009


On Fri, Oct 9, 2009 at 2:05 PM,  <mf-hcafe-15c311f0c at etc-network.de> wrote:
> Hi all,
>
> I think there is something about my use of the IO monad that bites me,
> but I am bored of staring at the code, so here you g.  The code goes
> through a list of records and collects the maximum in each record
> position.
>
>
> -- test.hs
> import Random
> import System.Environment (getArgs)
> import System.IO (putStr)
>
> samples :: Int -> Int -> IO [[Double]]
> samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 1000 ** 3)

Yes, you should not do this in IO.  That requires the entire
computation to finish before the result can be used.  This computation
should be pure and lazy.

It is possible, using split (and I believe not without it, unless you
use mkStdGen), to make a 2D list of randoms where the random
generation matches exactly the structure of the list.

splits :: (RandomGen g) => Int -> g -> [g]
splits 0 _ = []
splits n g = let (g1,g2) = split g in g1 : splits (n-1) g2

samples :: (RandomGen g) => Int -> Int -> g -> [[Double]]
samples i j gen = map row (splits i gen)
    where
    row g = take j (randomRs (0, 10^9) g)

In fact, we could omit all these counts and make an infinite 2D list,
which you can cull in the client code.

splits :: (RandomGen g) => g -> [g]
splits g = let (g1,g2) = split g in g1 : splits g2

samples :: (RandomGen g) => g -> [[Double]]
samples = map row . splits
    where
    row = randomRs (0, 10^9)

I find the latter to be more straightforward and obvious.  Maintaining
the laziness here is a fairly subtle thing, so study, perturb, try to
write it yourself in different ways, etc.

> maxima :: [[Double]] -> [Double]
> maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head samples) (tail samples)

FWIW, This function has a beautiful alternate definition:

maxima :: [[Double]] -> [Double]
maxima = map maximum . transpose

> main = do
>  args <- getArgs
>  x <- samples (read (head args)) 5
>  putStr . (++ "\n") . show $ maxima x
>
>
> I would expect this to take constant memory (foldr as well as foldl),
> but this is what happens:
>
>
> $ ghc -prof --make -O9 -o test test.hs
> [1 of 1] Compiling Main             ( test.hs, test.o )
> Linking test ...
> $ ./test 100 +RTS -p
> [9.881155955344708e8,9.910336352165401e8,9.71000686630374e8,9.968532576451201e8,9.996200333115692e8]
> $ grep 'total alloc' test.prof
>        total alloc =     744,180 bytes  (excludes profiling overheads)
> $ ./test 10000 +RTS -p
> [9.996199711457872e8,9.998928358545277e8,9.99960283632381e8,9.999707142123885e8,9.998952151508758e8]
> $ grep 'total alloc' test.prof
>        total alloc =  64,777,692 bytes  (excludes profiling overheads)
> $ ./test 1000000 +RTS -p
> Stack space overflow: current size 8388608 bytes.
> Use `+RTS -Ksize' to increase it.
> $
>
>
> so...
>
> does sequence somehow force the entire list of monads into evaluation
> before the head of the result list can be used?

Yep.  IO is completely strict; in some sense the same as "call by
value" (don't take the analogy too far).  Rule of thumb: keep your
distance from it ;-)


More information about the Haskell-Cafe mailing list