<div class="gmail_quote">On Thu, May 27, 2010 at 10:53 AM, Michael Snoyman <span dir="ltr">&lt;<a href="mailto:michael@snoyman.com">michael@snoyman.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<div dir="ltr"><div class="gmail_quote"><div>In other words, here&#39;s what I think the three different benchmarks are really doing:</div><div><br></div><div>* String: generates a list of Strings, passes each String to a relatively inefficient IO routine.</div>


<div>* ByteString: encodes Strings one by one into ByteStrings, generates a list of these ByteStrings, and passes each ByteString to a very efficient IO routine.</div><div>* Text: encodes Strings one by one into Texts, generates a list of these Texts, calls a UTF-8 encoding function to encode each Text into a ByteString, and passes each resulting ByteString to a very efficient IO routine.</div>


</div></div></blockquote><div><br>If Text used UTF-8 internally rather than UTF-16, we could create Texts from string literals much more efficiently, the same way Char8.pack does for ByteStrings:<br><br>    {-# RULES<br>


       &quot;FPS pack/packAddress&quot; forall s .<br>          pack (unpackCString# s) = inlinePerformIO (B.unsafePackAddress s)<br>     #-}<br><br>This rule skips the creation of an intermediate String when packing a string literal: the resulting ByteString points directly at the memory GHC allocates (outside the heap) for the literal. A similar rule could be added directly to a builder monoid for lazy Texts so that no copying is done at all. In addition, if Text were represented internally as UTF-8, encodeUtf8 would be free.<br>
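<br>To make the zero-copy idea concrete, here is a minimal sketch that calls unsafePackAddress by hand, using only the bytestring package (it does not touch Text). It assumes GHC&#39;s MagicHash extension for the Addr# literal, and the name greeting is made up for the example.<br><br>

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Unsafe as BU

-- Hypothetical example value: wraps the literal's static bytes
-- (NUL-terminated, stored outside the heap) without copying them.
greeting :: IO B.ByteString
greeting = BU.unsafePackAddress "hello"#

main :: IO ()
main =
  -- Compare against the ordinary pack of the same ASCII literal.
  greeting >>= \direct -> print (direct == BC.pack "hello")
```

<br>This is only what the rewrite rule arranges automatically when pack is applied to a literal; the literal here is plain ASCII, so encoding questions do not come up.<br>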


<br>Johan<br><br></div></div>