[Gtk2hs-devel] A new lexer/parser for c2hs [was: [C2hs] Re: support for 6.4]

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Tue May 31 04:21:02 EDT 2005


On Tue, 2005-05-31 at 08:15 +0100, Axel Simon wrote:
> On Mon, 2005-05-30 at 19:18 +0100, Duncan Coutts wrote:
> 
> [..]
> > Going back to the lexer, it now produces exactly the same output as the
> > original lexer (including positions and unique names). Sadly it seems to
> > have got quite a bit slower for reasons I don't quite understand. In
> > particular making it monadic (which we need to do because of) seems to
> > make it rather slower. It is now taking 6 seconds rather than 2 and so
> > is now only a little faster than the original lexer. Though on the
> > positive side it means that if the lexer is taking 6 out of the 8 second
> > total then the parser is only taking 2 seconds which is quite good.
> 
> Ok, I'm impressed, too. But was the parser the culprit? It did use a lot
> of space, but then most of the time in our current setup is spent in
> serialisation. So if I understand your intention you mainly try to
> improve the memory footprint, not the compilation time?

Basically yes. The real problem was the memory use. The existing parser
was taking 270Mb for the Gtk+ headers while this new one now takes 29Mb.
I've tried integrating this parser into c2hs and overall, producing the
precomp file now runs in 80Mb of heap space. In fact a significant
minority of that space is only required during the serialisation; the
name analysis phase only pushes the memory requirements up to 50Mb or
so. (I may be wrong about that; it may be that the serialisation is
simply forcing the result of the name analysis, which thereby increases
the heap use.)
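
One way to tell those two cases apart would be to force the name
analysis result explicitly before serialisation starts, so that a heap
profile charges the cost to the right phase. The following is only a
minimal sketch of that idea, not c2hs code: nameAnalysis, serialise and
forceAll are hypothetical stand-ins invented for illustration, and the
data is a toy list rather than a real attribute table.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')

-- Hypothetical stand-in for the name analysis phase (not the c2hs one):
-- it lazily builds a list of (unique, identifier) pairs.
nameAnalysis :: [Int] -> [(Int, String)]
nameAnalysis = map (\n -> (n, "ident" ++ show n))

-- Hypothetical stand-in for serialisation: it consumes the whole result.
serialise :: [(Int, String)] -> Int
serialise = foldl' (\acc (_, s) -> acc + length s) 0

-- Walk the list, evaluating each key and the spine of each name.
forceAll :: [(Int, String)] -> ()
forceAll = foldr (\(n, s) rest -> n `seq` length s `seq` rest) ()

main :: IO ()
main = do
  let attrs = nameAnalysis [1 .. 100000]
  -- Force the analysis result here, before serialisation begins, so its
  -- heap use shows up in this phase rather than in the next one.
  () <- evaluate (forceAll attrs)
  print (serialise attrs)
```

If the heap peak moves from the serialisation phase to the forced phase
when profiling a change like this, then the serialisation was only
paying for evaluation that the analysis had deferred.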

The slowness of the serialisation is a separate problem. But reducing
the memory requirements of the other phases makes even that part faster.
On my fast Athlon it used to take about a minute to generate the Gtk+
precomp file (and 380Mb). It now takes 13 seconds (and 80Mb). I guess
the improvement to the time taken to do the serialisation is mostly from
having to do less GC.

There's still some small difference in the precomp file that I have not
yet tracked down (but in my earlier parser tests, the AST seemed to be
exactly the same, right down to the source locations and unique names).

So I think it's worth trying to get this done for the 0.9.8 gtk2hs
release. That should provide reasonable testing and then we can create
patches for the mainline c2hs.

Duncan


