On Sun, Oct 9, 2011 at 12:11 PM, Roman Beslik <span dir="ltr">&lt;<a href="mailto:beroal@ukr.net">beroal@ukr.net</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


Yes, if you do not use high-level concepts and optimize everything by hand, it requires a lot of testing. :)<br>

</blockquote></div><br>There are probably more constructive, jibe-free ways to frame this suggestion...<br><br>Regarding testing:  my preference for using a preexisting solution is a product of 18 years of programming in Scheme without a large base of shared infrastructure -- I&#39;ve seen way too much &quot;roll your own X&quot; leading to trouble.<br>


<br>Regarding high-performance data-structures in Haskell: I wish high-level concepts were sufficient for their optimization.  But if you look at all the tricks played by, for example, Johan Tibell and Greg Collins in their excellent hashmaps and hashtables libraries, that, alas, seems not to be the case yet.  GHC is in a good position to do inlining and specialization (making the world safe for type classes), but it can&#39;t add unpack and strictness annotations, nor can it change data representations themselves.<br>


<br>For example, to answer Yves question:<br><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">I fail to understand. Why not just:<br>


&gt; data BitList b = Nil | BitList Int b (BitList b)<br>??<br></blockquote><div><br>That was a &quot;data structure unrolling&quot; to optimize the memory representation in the common case (&lt;64 bits).  Starting with:<br>


<br><span style="font-family: courier new,monospace;"> type I = Int64 -- or whatever<br> data BitList = Nil | BL Int I BitList</span><br><br>The recursive datatype can be inlined (once):<br><br><span style="font-family: courier new,monospace;"> data BitList = Nil | BL  Int I (Nil | BL Int I BitList) </span><i style="font-family: courier new,monospace;">-- not real syntax</i><br style="font-family: courier new,monospace;">


<span style="font-family: courier new,monospace;"> data BitList = Nil | BL2 Int I Nil | BL3 Int I Int I BitList</span><i style="font-family: courier new,monospace;"> -- distribute</i><br style="font-family: courier new,monospace;">


<span style="font-family: courier new,monospace;"> data BitList = Nil | BL2 Int I     | BL3 Int I Int I BitList </span><i style="font-family: courier new,monospace;">-- prune Nil</i><br><br>This unrolled data structure has two advantages.  It can directly represent the common case &lt;64 bits with one object, and it can use half the tail pointers for longer lists.  GHC could conceivably transform code automatically to enable this unrolling (but it can&#39;t now).<br>


<br>However, there are some further observations that really require a human.  Because we are using that extra Int to track the bit position inside the &quot;I&quot; the Nil case is redundant -- &quot;BL2 0 0&quot; can represent empty.  Further one of the Ints in the BL3 case is always 64 (sizeof I) and needn&#39;t be represented.  That gives us:<br>


<br><span style="font-family: courier new,monospace;"> data BitList = BL2 Int I | BL3 Int I I BitList </span><i style="font-family: courier new,monospace;">-- prune Nil</i><br><br>Which is pretty much what I used.  Actually, I skipped the &quot;double wide&quot; second case because I was really only worried about simplifying the representation for shorter lists and that would indeed complicate the code.<br>


<br>&gt; data <b>(Bits b) =&gt;</b> BitList b <br><br>FYI, in the bit of code I sent I didn&#39;t generalize over the Bits class because it&#39;s really an implementation detail what size chunk is used (just like in Lazy ByteStrings).  I didn&#39;t want to pollute the interface.  That said, the code should certainly be &quot;CSE&quot;d to make the &quot;64/Int64&quot; choice swappable.<br>


<br>Best regards,<br>  -Ryan<br><br></div></div>