<br><br><div class="gmail_quote">On Fri, Mar 26, 2010 at 10:52 PM, Mads Lindstrøm <span dir="ltr">&lt;<a href="mailto:mads_lindstroem@yahoo.dk">mads_lindstroem@yahoo.dk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Hi<br>

<div class="im"><br>

On Fri, 2010-03-26 at 21:33 +0000, Sebastian Sylvan wrote:<br>

<br>

&gt;<br>

&gt;<br>

&gt; Reorganizing data on the fly sounds like it may be a pretty sensible<br>

&gt; idea now that cache misses are so bad (in comparison). The fact that<br>

&gt; Haskell data is generally immutable helps too.<br>

&gt; However, I think your scheme sounds a bit complicated for my tastes,<br>

&gt; and I don&#39;t really understand the idea of &quot;lengths&quot;, what would &quot;n<br>

&gt; consecutive elements&quot; mean for a tree? Surely this wouldn&#39;t work just<br>

&gt; for list-like data structures?<br>

<br>

</div>First of all, I had assumed that a data type&#39;s memory footprint was<br>

independent of its value. Similarly to how a C-union always occupies the<br>

same amount of memory. After reading your post, I reevaluated my<br>

assumption and my assumption might just be wrong.<br></blockquote><div><br></div><div>Take the Maybe data type. The Nothing value occupies no space other than the Nothing-tag, the Just part occupies more (potentially a lot more, if it too has been collapsed!). It&#39;s conceivable that you would force the size to be the same a la C&#39; unions, and have some thresholding where if one of the possible alternatives is too big then it would always use a pointer for that one alternative, to minimize padding.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

What do &quot;heavily quantized lookup table&quot; mean? It is the &quot;heavily<br>

quantized&quot; part I do not get.<br></blockquote><div><br></div><div>E.g. store a small number of bits of offset information per field in a record at the top, let&#39;s say 8 bits per field. Now, that means you can have 256 unique values, and therefore fields, but it nothing says that the offset value needs to refer to &quot;bytes&quot;, it could just as easily refer to &quot;multiples of 16 bytes&quot; for larger structures. This means you would need to align all your fields to 16-byte boundaries, which may need padding, but lets you refer to fields with a larger offset without using a full pointer per field. </div>

<div><br></div><div>Maybe your gain is wasted by doing too much logic here. Perhaps a more conservative suggestion would be to keep the pointers, so the runtime code is the same, but store a little tag indicating that a value has an optional memory extent for the purposes of GC. This way you could compact the leaves of a data graph (still keeping the pointers), which lets the GC stop following pointers once it hits the tag and reads the memory extent saying &quot;this is a some data totalling 612 bytes, and I promise that any pointers in this memory block are all internal to the block&quot;. You&#39;d reduce GC work, and more importantly cache misses, but not overall memory usage.</div>

</div><br>-- <br>Sebastian Sylvan<br>