[Haskell-cafe] Attoparsec concatenating combinator

Yitzchak Gale gale at sefer.org
Fri Jun 3 11:52:32 CEST 2011


Bryan O'Sullivan wrote:
> I'd like a no-copy combinator for the same reasons, but I think it's
> impossible to do without some low-level support.

I wrote:
>> ...does the internal representation easily admit such a combinator?

> Not very easily. Internally, attoparsec maintains just three pieces of data
> for its state... If
> there was a "bytes consumed" counter, it would be possible to write a
> "try"-like combinator
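A toy sketch of the "bytes consumed" idea. This is not attoparsec's internal representation. The parser type P and the combinators skipWhileP, charP and consumed are hypothetical names made up for illustration. The state is the whole input plus an offset, so a combinator can compare the offset before and after running a sub-parser and return that span as a slice of the original buffer, with no copying. It ignores incremental (partial) input, which a real parser would have to handle.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isDigit)

-- Hypothetical toy parser, NOT attoparsec's real representation:
-- it sees the whole input and threads a "bytes consumed" offset.
newtype P a = P { runP :: B.ByteString -> Int -> Maybe (a, Int) }

instance Functor P where
  fmap f (P p) = P $ \s i -> fmap (\(a, j) -> (f a, j)) (p s i)

instance Applicative P where
  pure a = P $ \_ i -> Just (a, i)
  P pf <*> P pa = P $ \s i -> case pf s i of
    Nothing     -> Nothing
    Just (f, j) -> case pa s j of
      Nothing     -> Nothing
      Just (a, k) -> Just (f a, k)

-- Advance past the longest prefix whose characters satisfy the predicate.
skipWhileP :: (Char -> Bool) -> P ()
skipWhileP f = P $ \s i -> Just ((), i + B.length (B.takeWhile f (B.drop i s)))

-- Match one specific character.
charP :: Char -> P ()
charP c = P $ \s i ->
  if i < B.length s && B.index s i == c then Just ((), i + 1) else Nothing

-- Run p, then return exactly the input it consumed. B.take and B.drop
-- only adjust offset and length on the shared buffer, so the result
-- is a slice and no bytes are copied.
consumed :: P a -> P B.ByteString
consumed (P p) = P $ \s i -> case p s i of
  Nothing     -> Nothing
  Just (_, j) -> Just (B.take (j - i) (B.drop i s), j)

-- Example grammar: digits '.' digits
number :: P ()
number = skipWhileP isDigit *> charP '.' *> skipWhileP isDigit

main :: IO ()
main = print (fmap fst (runP (consumed number) (B.pack "12.50 rest") 0))
-- prints: Just "12.50"
```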

I was thinking of something even lower-level: allocating a moderate
chunk of memory and writing the results directly into it,
consecutively, as a special case.

I think Data.ByteString.Internal.create might do the trick.
In fact, some of the existing basic attoparsec combinators,
like takeWhile, could use that kind of treatment. The question
is whether you want to dip that low into the ByteString
implementation.

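Part of the problem is that there doesn't seem to be any way
to allocate contiguous memory in GHC and then release only
part of it. So even ByteString itself does extra copying:
when a result turns out shorter than the buffer allocated for
it, the bytes get copied into a new, smaller buffer.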
That is another reason why I think there may be some serious
performance gains to be had by exposing those internals in
attoparsec.
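The extra copying can be seen with createAndTrim from Data.ByteString.Internal (used here through its pure variant unsafeCreateAndTrim). The sketch below, with the hypothetical name filterBytes, allocates a worst-case buffer the size of the input and writes the surviving bytes consecutively. In the bytestring implementation I have looked at, when fewer bytes are written than were allocated, createAndTrim allocates a second buffer of the exact size and copies into it rather than shrinking the first; other bytestring versions may behave differently.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Internal as BI
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)
import Foreign.Ptr (Ptr)
import Foreign.Storable (pokeByteOff)

-- Hypothetical helper: keep only the bytes satisfying p. The output
-- length is unknown in advance, so the worst case (the input length)
-- is allocated and survivors are written consecutively. fill returns
-- the number of bytes actually written; if that is less than the
-- allocation, createAndTrim (in the versions assumed here) copies
-- the result into a second, right-sized buffer.
filterBytes :: (Word8 -> Bool) -> B.ByteString -> B.ByteString
filterBytes p bs = BI.unsafeCreateAndTrim n (fill 0 0)
  where
    n = B.length bs

    fill :: Int -> Int -> Ptr Word8 -> IO Int
    fill i out dst
      | i >= n = return out
      | otherwise = do
          let w = BU.unsafeIndex bs i
          if p w
            then pokeByteOff dst out w >> fill (i + 1) (out + 1) dst
            else fill (i + 1) out dst

main :: IO ()
main = B8.putStrLn (filterBytes (/= 32) (B8.pack "a b c"))
-- prints: abc
```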

[Duncan: Did you notice that the Haddocks for
Data.ByteString.Internal and a few others haven't
been building lately?]

Thanks,
Yitz


