<div dir="ltr"><br><br><div class="gmail_quote">On Thu, May 27, 2010 at 11:40 AM, Ivan Miljenovic <span dir="ltr"><<a href="mailto:ivan.miljenovic@gmail.com">ivan.miljenovic@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On 27 May 2010 18:33, Michael Snoyman <<a href="mailto:michael@snoyman.com">michael@snoyman.com</a>> wrote:<br>
> I don't do any string concatenation (look closely), I was very careful to<br>
> avoid it. I tried with lazy text as well: it was slower. This isn't<br>
> surprising, since lazy text, under the surface, is just a list of strict<br>
> text. And the benchmark itself already has a lazy list of strict text. Using<br>
> lazy text would just be adding a layer of wrapping.<br>
> I don't know what you mean by "explicitly using Text values"; do you mean<br>
> calling pack manually? That's really all that OverloadedStrings does.<br>
> You can try out lots of different variants on that benchmark. I did that<br>
> already, and found this to be the fastest version.<br>
<br>
</div>Fair enough. Now that I think about it, I recall once trying to have<br>
pretty generate Text values rather than String for graphviz (by using<br>
fullRender, so it was still using String under the hood until it came<br>
time to render) and it too was much slower than String (unfortunately,<br>
I didn't record a patch with these changes so I can't just go back and<br>
play with it anymore as I reverted them all :s).<br>
<br>
Maybe Bryan can chime in with some best-practices for using Text?<br>
<div><div></div><div class="h5"><br></div></div></blockquote><div>Here's my guess at an explanation for what's happening in my benchmark:</div><div><br></div><div>text will clearly beat String in memory usage; that's what it's designed for. However, the compiler still generates String values, which are converted to Text at runtime.</div>
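<div><br></div><div>To make the point about literals concrete, here is a minimal sketch assuming the text package: under OverloadedStrings a literal used at type Text is sugar for fromString applied to a String literal, and the IsString instance for Text defines fromString as pack. GHC and text may rewrite some literals with rewrite rules, so this shows the semantics rather than necessarily the code that gets generated. The greeting* names are purely illustrative.</div>

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: a Text literal under OverloadedStrings means fromString "...",
-- and for Text, fromString is T.pack, so the String-to-Text conversion
-- is a runtime operation (modulo any rewrite rules GHC/text apply).
import Data.String (fromString)
import qualified Data.Text as T

-- Written with the extension: really fromString "hello".
greetingLiteral :: T.Text
greetingLiteral = "hello"

-- The same value spelled out through the IsString class.
greetingExplicit :: T.Text
greetingExplicit = fromString "hello"

-- The same value built by calling pack by hand.
greetingPacked :: T.Text
greetingPacked = T.pack "hello"

main :: IO ()
main = do
  print (greetingLiteral == greetingExplicit)
  print (greetingExplicit == greetingPacked)
```

<div>Both lines print True: the three spellings produce the same Text value.</div>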
<div><br></div><div>The same process happens for bytestrings. However, bytestrings never need a second conversion: the IO routines simply write out the byte buffer. By contrast, in the case of text the data must be converted again, this time UTF-8 encoded into a bytestring.</div>
<div><br></div><div>In other words, here's what I think the three different benchmarks are really doing:</div><div><br></div><div>* String: generates a list of Strings, passes each String to a relatively inefficient IO routine.</div>
<div>* ByteString: encodes Strings one by one into ByteStrings, generates a list of these ByteStrings, and passes each ByteString to a very efficient IO routine.</div><div>* Text: encodes Strings one by one into Texts, generates a list of these Texts, calls a UTF-8 encoding function to turn each Text into a ByteString, and passes each resulting ByteString to a very efficient IO routine.</div>
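<div><br></div><div>The three paths can be sketched roughly as follows, assuming the bytestring and text packages. The fragment list and the render/via functions are hypothetical stand-ins, not the actual benchmark code. In text versions of this period, Text was stored internally as UTF-16, so encodeUtf8 had to transcode every character before the bytes could be written.</div>

```haskell
-- Sketch of the three output paths; names and input are hypothetical.
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical input: many small String fragments, all ASCII.
chunks :: [String]
chunks = replicate 3 "Hello, world!\n"

-- ByteString path: one conversion per fragment (String to bytes).
renderByteString :: [String] -> [B.ByteString]
renderByteString = map B8.pack

-- Text path: two conversions per fragment (String to Text,
-- then Text to UTF-8 bytes).
renderText :: [String] -> [B.ByteString]
renderText = map (TE.encodeUtf8 . T.pack)

-- String path: hand each fragment to the String IO routine.
viaString :: [String] -> IO ()
viaString = mapM_ putStr

viaByteString :: [String] -> IO ()
viaByteString = mapM_ B.putStr . renderByteString

viaText :: [String] -> IO ()
viaText = mapM_ B.putStr . renderText

main :: IO ()
main = do
  viaString chunks
  viaByteString chunks
  viaText chunks
```

<div>For ASCII input the ByteString and Text paths end up writing identical bytes; the Text path simply does an extra conversion per fragment to get there.</div>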
<div><br></div><div>In the case of ASCII data to be output as UTF-8, using the Data.ByteString.Char8.pack function will most likely always be the most efficient choice, and thus it seems like something BlazeHtml should support. I'm considering releasing a Hamlet 0.3 based entirely on UTF-8 encoded ByteStrings, but I'd also like to hear from Bryan about this.</div>
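<div><br></div><div>A sketch of why the ASCII restriction matters, assuming the bytestring and text packages: Data.ByteString.Char8.pack keeps only the low 8 bits of each Char, so its output is valid UTF-8 only when the input is ASCII. packFast is a hypothetical helper, not part of bytestring, text, Hamlet or BlazeHtml.</div>

```haskell
-- Sketch: Char8.pack vs. a real UTF-8 encoding; packFast is hypothetical.
import Data.Char (isAscii)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Cheap path: truncates each Char to 8 bits (correct UTF-8 only for ASCII).
packAscii :: String -> B.ByteString
packAscii = B8.pack

-- General path: go through Text and encode properly as UTF-8.
packUtf8 :: String -> B.ByteString
packUtf8 = TE.encodeUtf8 . T.pack

-- Hypothetical helper: take the cheap path only when it is safe.
-- Note that the isAscii check is itself an extra pass over the String.
packFast :: String -> B.ByteString
packFast s
  | all isAscii s = packAscii s
  | otherwise     = packUtf8 s

main :: IO ()
main = do
  print (B.unpack (packAscii "abc"))      -- [97,98,99]
  print (B.unpack (packUtf8 "abc"))       -- [97,98,99]
  print (B.unpack (packAscii "caf\233"))  -- [99,97,102,233] (not valid UTF-8)
  print (B.unpack (packUtf8 "caf\233"))   -- [99,97,102,195,169]
```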
<div><br></div><div>Michael</div></div></div>