[Haskell-cafe] RFC: demanding lazy instances of Data.Binary

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Tue Nov 20 05:28:09 EST 2007


On Mon, 2007-11-19 at 20:22 -0600, Nicolas Frisby wrote:
> On Nov 19, 2007 4:16 PM, Duncan Coutts <duncan.coutts at worc.ox.ac.uk> wrote:
> >
> > On Mon, 2007-11-19 at 13:39 -0800, Don Stewart wrote:
> > > nicolas.frisby:
> 
> *snip*
> 
> > > >
> > > >    1) The fact that serialisation is fully strict for 32760 bytes but not for
> > > >    32761 makes the direct application of strictCheck intractable. Do you have
> > > >    any ideas how to circumvent that?
> >
> > Test using a much smaller chunk size. I'd test sizes from 1 to something
> > like one more than the machine word size.
> >
> 
> Let me clarify "circumvent that." strictCheck uses a bounded search
> starting at 1 and proceeds to some limit. The Binary instance used in
> the example was the fully lazy one for lists: a get/putWord8 for each
> constructor. Even so, it was effectively spine-strict up to 32k bytes
> (which was 32k elements b/c of the use of unit) because (I think that)
> the first chunk of the lazy bytestring wasn't being generated by
> encode until it was full. If you asked strictCheck to go from 1 to
> 32k, I don't think it would finish. So by circumvent, I mean: How can
> we apply the essential ideas of strictCheck when our chunks are so
> big?

We don't. We test it with a variety of small chunk sizes. That is the
sensible thing to do.

> Obviously, the iterative search cannot just proceed by one
> element at a time; but then we lose the obvious meaning of "add one
> more _|_. I don't see an obvious candidate for how to alter the
> _|_-ridden test vector generation. Moreover, it's "proposed output" is
> wrong when considered from the Big Chunks perspective--we don't
> necessarily want Chitil's least strictness.

Indeed. As I've said, Data.Binary is lazy but in a chunky way where
within each chunk it is strict.

> > Sequences are like spine strict lists. Sets are strict in as much as the
> > element type's (==) function is strict.
> >
> Let me refine how I posed that question. A predicate: if you enter
> "Package.fromList [1..]" at the ghci prompt and you get no
> intermediate results, then that was a "strict type." 

Right, because (==) for Int is strict, and Set.fromList uses (==) on
each element. Sorry, I was just being pedantic by saying that it depends
on the strictness of (==).

> I'm assuming that if the Show instance doesn't produce intermediate
> results, then the serialisation technique can't handle intermediate
> results (i.e. chunking) either--at least not in a general enough way
> to include it in a library.

So if you did this test with my proposed list instance (and you somehow
slowed your computer right down so you could see what was going on)
you'd see it wait a sec, then print out 32k of serialised list elements,
then wait again and emit another chunk. So lazy, but in strict chunks.

Duncan


More information about the Haskell-Cafe mailing list