Some good-hearted criticism

Juan Carlos Arevalo Baeza
Wed, 27 Jun 2001 13:32:54 -0700

At 12:30 PM 6/27/2001 +0100, Malcolm Wallace wrote:

> >     In any case, it's the stack that's the problem, not the heap.
>But the two are linked.  A heap profile shows everything that is
>reachable from the stack, hence a fuller stack will lead to a larger
>amount of heap being retained.  The kind of thing that is in the
>heap then points back in an indirect way to what might be holding
>onto it in the stack.

    Hmmm... I think I see your point.

>So, I generated a heap profile from Hugs, using -d20000 as the
>profiling option.  (There appears to be a small bug in Hugs, so the
>output is invalid.  However, it was pretty easy to patch it up in a
>text editor.)  I have attached the final PostScript graph below for you
>have have a look.

    Hmmm... I guess I'll have to look more closely into the Hugs 
documentation. I wasn't aware that it had profiling in it too... O:-) And, 
looking closer, this option is not mentioned anywhere, and it rejects the 
option. Hmmm... I have the February 2001 version, for Windows. Maybe I need 
to check for a new one?

>The first thing to notice is that the graph does not grow uniformly
>towards the right.  This is good, as it means the program is not
>retaining stuff unnecessarily long after it has been used.

    Actually, I was pretty certain about that already. It's good to confirm 
it, though.

>The second thing to notice is that the graph is extremely spiky.
>When the program fails with a stack overflow, at the far right,
>it is inside the deepest spike.  This suggests that part of your
>program has a very deeply nested recursive algorithm.

    Ok... I can see the problem, I think.

>Now, taking a look at the input file, I see that line 818 (where
>Hugs dies with a stack overflow) comes just before a C++ routine
>that is relatively long (60 lines).  I'd be prepared to bet that
>your parsing algorithm needs to get to the end of the individual C++
>routine before it can start returning any results at all.

    Actually, it doesn't parse the C++ code at all. Just strings and 
comments. But you figured that out for yourself...

>If ghc has a larger stack, then I would expect it to be able to
>cope with larger spikes, so my guess is that when it dies at line
>3000+, it is once again inside a C++ routine that is longer than
>any previously seen in the input file.

    Oh... Ouch! I see where I was totally wrong. I assumed (and "assuming" 
is bad as a matter of principle when you're debugging a program) that, 
because GHC has a larger stack and it the stack was being filled later in 
the program, the problem was somehow degenerative in principle (as you said 
above, it would've shown, at least, larger and larger spikes continuously 
growing over time). It didn't make much sense, though. I had already added 
"seq" all over the place, which should've brought the spikes towards the 
beginning of the program or eliminated them altogether.

>To test this hypothesis, I moved the routine from just after line 818
>back to earlier in the file, say line 95.  Lo and behold, Hugs now
>dies after line 95.
>And in moving the routine, I suddenly noticed that it is not a
>routine at all - it is all entirely within /* */ comments.  So,
>I tried uncommenting it.  Now, Hugs happily accepts this routine,
>and continues until we get another control stack overflow after line
>1588.  Guess what - the beginning of another long comment.

    It does, because code is parsed and divided in individual lines. But 
multi-line comments are not. Thus, it's like one huge line.

>| ... the problem is degenerative, so no matter how large I make the
>| stack, there can always be files large enough to fill it up.
>Well I think we've found the culprit.  Your parser is actually
>very well behaved for space usage in most respects, but the parse
>function for stripping comments is unnecessarily deeply recursive.
>I'll leave it up to you to find a fix.

    :) The ultimate offender is a new parser combinator I added:

manynot :: Parser p inp => p inp v -> p inp a -> p inp ([v],Maybe a)
manynot s p = mn p where mn p = force $ do a <- p
                                            return ([],Just a)
                                         do x <- s
                                            (xs,a) <- mn p
                                            return ((x:xs),a)
                                         return ([],Nothing)

    The solution is tremendously obvious: use tail recursion!

manynot :: Parser p inp => p inp v -> p inp a -> p inp ([v],Maybe a)
manynot s p = mn [] p where mn r p = force $ do a <- p
                                                 return (r,Just a)
                                              do x <- s
                                                 mn (r ++ [x]) p
                                              return (r,Nothing)

    Works beautifully: Hugs can now parse the whole file without a glitch.


Juan Carlos "JCAB" Arevalo Baeza    |
Senior Technology Engineer          |
Ronin Entertainment                 | ICQ: 101728263
                        (my opinions are only mine)
JCAB's Rumblings is so off-line O:-(