[Haskell-cafe] Garbage collecting pointers

Sebastian Sylvan sebastian.sylvan at gmail.com
Fri Mar 26 20:18:04 EDT 2010


On Fri, Mar 26, 2010 at 10:52 PM, Mads Lindstrøm
<mads_lindstroem at yahoo.dk>wrote:

> Hi
>
> On Fri, 2010-03-26 at 21:33 +0000, Sebastian Sylvan wrote:
>
> >
> >
> > Reorganizing data on the fly sounds like it may be a pretty sensible
> > idea now that cache misses are so bad (in comparison). The fact that
> > Haskell data is generally immutable helps too.
> > However, I think your scheme sounds a bit complicated for my tastes,
> > and I don't really understand the idea of "lengths", what would "n
> > consecutive elements" mean for a tree? Surely this wouldn't work just
> > for list-like data structures?
>
> First of all, I had assumed that a data type's memory footprint was
> independent of its value. Similarly to how a C-union always occupies the
> same amount of memory. After reading your post, I reevaluated my
> assumption and my assumption might just be wrong.
>

Take the Maybe data type. The Nothing value occupies no space other than the
Nothing-tag, the Just part occupies more (potentially a lot more, if it too
has been collapsed!). It's conceivable that you would force the size to be
the same a la C' unions, and have some thresholding where if one of the
possible alternatives is too big then it would always use a pointer for that
one alternative, to minimize padding.


> What do "heavily quantized lookup table" mean? It is the "heavily
> quantized" part I do not get.
>

E.g. store a small number of bits of offset information per field in a
record at the top, let's say 8 bits per field. Now, that means you can have
256 unique values, and therefore fields, but it nothing says that the offset
value needs to refer to "bytes", it could just as easily refer to "multiples
of 16 bytes" for larger structures. This means you would need to align all
your fields to 16-byte boundaries, which may need padding, but lets you
refer to fields with a larger offset without using a full pointer per
field.

Maybe your gain is wasted by doing too much logic here. Perhaps a more
conservative suggestion would be to keep the pointers, so the runtime code
is the same, but store a little tag indicating that a value has an optional
memory extent for the purposes of GC. This way you could compact the leaves
of a data graph (still keeping the pointers), which lets the GC stop
following pointers once it hits the tag and reads the memory extent saying
"this is a some data totalling 612 bytes, and I promise that any pointers in
this memory block are all internal to the block". You'd reduce GC work, and
more importantly cache misses, but not overall memory usage.

-- 
Sebastian Sylvan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100326/ed227d80/attachment.html


More information about the Haskell-Cafe mailing list