[Haskell-cafe] Space leak with unsafePerformIO

Bertram Felgenhauer bertram.felgenhauer at googlemail.com
Sun Jun 27 06:36:43 EDT 2010


Henning Thielemann wrote:
> Attached is a program with a space leak that I do not understand. I
> have coded a simple 'map' function, once using unsafePerformIO and
> once without. UnsafePerformIO has a space leak in some circumstances.
> In the main program I demonstrate cases with and without space leak.
> Without space leak the program writes a file to the disk until it
> is full. Any idea?

The program relies on the GC doing short-cut evaluation of record
selectors to avoid a space leak. If the user of the function
splitAtLazy

| splitAtLazy :: [b] -> [a] -> ([a],[a])
| splitAtLazy nt xt =
|    (\ ~(ys,zs) -> (ys,zs)) $
|    case (nt,xt) of
|       (_:ns, x:xs) ->
|          let (ys,zs) = splitAtLazy ns xs
|          in  (x:ys,zs)
|       (_, xs) -> ([],xs)

somehow holds on to a reference of the returned pair while processing
the first part of the list, there will be a space leak, because that
means that the whole prefix remains reachable.

splitAtLazy itself is not leaky, because the value returned by the
recursive call is scrutinized as follows,

  Main.$wsplitAtLazy =
  ...
      (# case ds_sLb of wild_B1 { (ys_agC, zs_agE) -> ys_agC },
         case ds_sLb of wild_B1 { (ys_agC, zs_agE) -> zs_agE } #)

and ghc turns that into record selector thunks in the code generator.

The precise rule can be found in compiler/codeGen/StgCmmBind.hs:

| Note [Selectors]
| ~~~~~~~~~~~~~~~
| We look at the body of the closure to see if it's a selector---turgid,
| but nothing deep.  We are looking for a closure of {\em exactly} the
| form:
| 
| ...  = [the_fv] \ u [] ->
|          case the_fv of
|            con a_1 ... a_n -> a_i

Now let's look at how the result of splitAtLazy is used.

non-leaky version (case 0):

  Main.lvl1 =
    case Main.$wsplitAtLazy @ () @ GHC.Types.Char Main.xs Main.xs1
    of ww_sLv { (# ww1_sLx, ww2_sLy #) ->
    Main.go (GHC.Base.++ @ GHC.Types.Char ww1_sLx ww2_sLy)
    }

The return values are passed on to (++) directly. The result pair is
actually never built at all, so no reference to it can be kept.

leaky version (case 3):

  Main.ds =
    case Main.$wsplitAtLazy @ () @ GHC.Types.Char Main.xs Main.xs1
    of ww_sKy { (# ww1_sKA, ww2_sKB #) ->
    (ww1_sKA, ww2_sKB)
    }

This builds the pair returned by splitAtLazy.

  Main.lvl1 =
    case Main.ds of wild_Xw { (prefix_aCf, suffix_aCh) -> prefix_aCf }

Use of prefix: it's a record selector. This is fine.

  Main.lvl2 = Main.go Main.lvl1

The prefix is then passed to some worker function.

  Main.lvl3 =
    case Main.ds of wild_Xw { (prefix_aCf, suffix_aCh) ->
    Main.go1 suffix_aCh
    }

Use of suffix: Due to the call of  Main.go1  this is *not* a record
selector. It is compiled to an actual case expression, which to the
garbage collector looks just like an ordinary thunk. A reference to
Main.ds is kept around until the suffix is about to be processed
and a memory leak ensues.

If the compiler had produced

  Main.lvl3 =
    case Main.ds of wild_Xw { (prefix_aCf, suffix_aCh) ->
    suffix_aCh
    }

  Main.lvl4 = Main.go1 Main.lvl3

instead, then there would not be a leak. This whole record selector
thunk business is very fragile.

The good news is that the problem is completely unrelated to
unsafePerformIO (the presence of unsafePerformIO makes optimisations
more difficult, but any pure function of sufficient complexity would
have the same effect).

There's a simple fix for the problem, too: Change

>       let (prefix, suffix) = makeTwoLists 'a'

to
      let !(prefix, suffix) = makeTwoLists 'a'

in which case the compiler produces code similar to the non-leaky case
for all alternatives.

HTH,

Bertram


More information about the Haskell-Cafe mailing list