[C2hs] Profiling c2hs

Duncan Coutts duncan.coutts at worcester.oxford.ac.uk
Wed May 12 10:54:04 EDT 2004


Hi Manuel,

On Wed, 2004-05-12 at 05:35, Manuel M T Chakravarty wrote:
> >  At the moment
> > this is done by after parsing each declaration (which might contain
> > typedefs) modifying the remaining stream of tokens changing all
> > identifiers that have been discovered to be typedefs from ordinary
> > identifier tokens to typedef-identifier tokens. This is expensive.
> 
> To be honest, I am not entirely convinced that the basic idea of
> modifying the token stream to rewrite identifier tokens is that
> expensive.  I suspect it is the naive way in which morphTypeNames does
> the job that's the problem.

The set of new type names passed to morphTypeNames after each
parseCExtDEclList is quite small. So small that using lists and elem is
faster than using Sets and elemSet (I tried this).

> > Perhaps it would be possible to add this set of typedef identifiers to
> > the parser state monad and thread it through to the parsers for cid and
> > typedefName which could consult this parser state and match
> > conditionally on the identifier token being in the typedef-identifier
> > set or not. (cidOrTN would not need to to a lookup) This would avoid
> > traversing the token stream after parsing each declaration.
> 
> I guess, it would be possible to interleave this with the parser, but
> what I like about the current setup is that it is more modular.  I don't
> think that it is a problem to traverse the token stream.  

True, it looks like it would be difficult to interleave in the manner I
suggested anyway because as far as I can tell, the parser combinators
don't make it easy to do a conditional parse - conditional on some
internal monad state. Swierstra & Duponcheel's combinator isn't a monad
right? And doesn't allow you to thread state from earlier in the token
stream to later.

> The problem with the current setup is that morphTypeNames is interleaved
> with parseCExtDeclList; ie, with each call to parseCExtDEclList that
> reads so a typedef one more morphTypeNames-filter is added to the token
> stream.  In other words, after N typedefs, each token passes through N
> instances of morphTypeNames.  Instead, we want a single instance of
> morphTypeNames that collects all the typedef'd names in a set and
> rewrites token accordingly.  If the set of represented as a balanced
> tree (ie, using the "Set" module in "base/general/Sets.hs") we be able
> to remove a factor of log N from the asymptotic complexity.  This would
> be a much more localised change than changing the parser state.
> 
> What do you think?

If it can be done with a single pass of morphTypeNames that simply
accumulates a set of type names that need to be changed, I imagine that
would have similar speed improvements to the thing I suggested.

It's not clear how that might be accomplished in the current scheme
where we don't know the typedef'ed names until after parsing each
declaration.

How difficult is it to pick out typedef'ed names from the token stream
(ie without full parsing)? Then we could do it in a single lazy pass
over the token stream between the lexer and the parser.

Perhaps it could be done in the lexer itself, if the lexer is monadic.

Duncan



More information about the C2hs mailing list