lazy ByteStrings: toChunks

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Wed Jan 24 07:13:06 EST 2007


On Wed, 2007-01-24 at 11:58 +0000, Ross Paterson wrote:
> On Tue, Jan 23, 2007 at 09:02:31PM +0000, Duncan Coutts wrote:
> > On Wed, 2007-01-24 at 07:32 +1100, Donald Bruce Stewart wrote:
> > > ross:
> > > > toChunks exposes the implementation, and so shouldn't be in the public
> > > > interface, should it?  There could be a function from lazy to ordinary
> > > > ByteStrings (B.concat . toChunks), though.
> > 
> > No, I don't think it exposes the implementation that much. In particular
> > we could change the internal representation from a list of chunks to a
> > tree of chunks or a element-strict list of chunks without breaking the
> > toChunks function.
> 
> OK, but it does break the abstraction (list of bytes), while B.concat .
> toChunks doesn't.  I think that qualifies it as an internal interface.

How about the other way around? fromChunks would be terrible if it had
to go via a single strict chunk. It'd loose all the laziness and force
everything into memory.

Given a toStrict function, toChunks could be implemented anyway, just
not as efficiently. You can do it via unfolding uses of take/drop and
applying the toStrict function to each bit. So you can implement one in
terms of the other. So I'm claiming that toChunks is the better
primitive.

> > > That seems reasonable. All uses I've ever had for toChunks involve also
> > > concat'ing them.
> > 
> > This is indeed the most common use however libraries like zlib/bzlib
> > compression, charset conversion, encryption, (de)serialisation etc that
> > need to work on contiguous chunks of memory need to be able to get at
> > the chunks.
> 
> Can you point at some examples?  I had a quick look, but couldn't find
> any uses of toChunks not preceded by concat.

Well at the moment most of these just import the internal Base module
and the LPS constructor, but that doesn't mean they should!

> I would expect most of those examples to operate on substrings that
> might span chunk boundaries.

No, because they need access to contiguous chunks of memory. Some
because they're calling out to C libs that expect contiguous chunks (eg
zlib, bzlib, iconv) and others like the binary deserialisation want
contiguous chunks for efficiency so that it becomes possible to
common-up bounds checks rather than doing a bounds check for each byte
as head/tail must do.

> > The only other thing they can do is to import the internal
> > module and get at the LPS constructor which is more evil and will break
> > if we change the underlying representation (and I do intend to
> > experiment with making the lazy byte string rep use element-strict lists
> > to remove one indirection).
> 
> The internal module could offer different levels of interface, though.

Yeah it could.

Duncan



More information about the Libraries mailing list