sequence causing stack overflow on pretty small lists

Mon Aug 26 10:46:45 CEST 2013

On #haskell we recently had a discussion about the following:

   import System.Random

   list <- replicateM 1000000 randomIO :: IO [Int]

I would think that this gives us a list of a million random Ints. In
fact, this is what happens in ghci. But with ghc we get:

   Stack space overflow: current size 8388608 bytes.
   Use `+RTS -Ksize -RTS' to increase it.

This is because sequence is implemented as

     sequence (m:ms) = do x <- m
                          xs <- sequence ms
                          return (x:xs)

and uses stack space when used on some [IO a].

>From a theoretical side, this is an implementation detail. From the
software engineering side this disastrous because the code is

  * obviously correct by itself
  * the first thing people would come up with
  * not exaggerating: a million elements is not much
  * used a lot of places: mapM, replicateM are *everywhere*

and yet it will kill our programs, crash our airplanes, and give no
helpful information where the problem occurred.

Effectively, sequence is a partial function.

(Note: We are not trying to obtain a lazy list of random numbers, use
any kind of streaming or the likes. We want the list in memory and use it.)

We noticed that this problem did not happen if sequence were implemented
with a difference list.

What do you think about this? Should we "fix" functions like this,
probably trading off a small performance hit, or accept that idiomatic
Haskell code can crash at any time?