[Haskell-cafe] Strange memory consumption problems in something that should be tail-recursive

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Tue Feb 13 17:17:11 EST 2007


On Tue, 2007-02-13 at 15:12 -0600, Creighton Hogg wrote:
> 
> 
> On 2/13/07, Duncan Coutts <duncan.coutts at worc.ox.ac.uk> wrote:
>         On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
>         > Hi, I am running the following code against a 210 MB file in
>         an attempt to
>         > determine whether I should use alex or whether, since my
>         needs are very
>         > performance oriented, I should write a lexer of my own.  I
>         thought that
>         > everything I'd written here was tail-recursive
>         
>         Isn't that exactly the problem - that it's tail recursive? You
>         do not
>         want it to be tail recursive since then it must consume the
>         whole input
>         before producing any output. You want it to be as lazy as
>         possible so
>         that it can start producing tokens as soon as possible without
>         having to
>         consume everything.
> 
> This may be silly of me, but I feel like this is an important point:
> so you're saying that tail recursion, without strictness, doesn't run
> in constant space?

There are two kinds of space use that you have to consider here. One is
the stack space and the other is the space required by whatever it is
that your recursive function is doing (in particular if your recursive
function constructs a list then you need space for that list).

> So for example in the case of, 
> facTail 1 n' = n'
> facTail n n' = facTail (n-1) (n*n')
> You'll just be building a bunch of unevaluated thunks until you hit
> the termination condition?

Actually yes, though with a slight modification we can fix that and make
it run in constant space:

facTail !1 !n' = n'
facTail !n !n' = facTail (n-1) (n*n')

however the original example, even if we did something like the above it
still has major problems. Yes it is tail recursive and so it's not
taking any stack space, it is a true loop, but it's a loop that's
allocating a massive list! Let's look at the code again:

pass1 :: String -> String -> String
pass1 left [] = left
pass1 left ('<':right) = pass1 left (stripTagOrComment right)
pass1 left (' ':right) = pass1 left right
pass1 left (c:right) 
    | Set.member c punct = pass1 (' ':c:' ':left) right
    | otherwise          = pass1 (c:left) right

This may well be a perfect tail recursive loop but each iteration it's
allocating a cons cell. It doesn't return until it has consumed the
entire input and built the entire output. So if you run it on a 2TB file
then it's going to pull the whole lot into memory before returning
anything.

So as I said originally, this is a case where it pays to be lazy.

Duncan



More information about the Haskell-Cafe mailing list