[Haskell-cafe] Copying Arrays

Ketil Malde ketil at malde.org
Fri May 30 04:38:22 EDT 2008


"Johan Tibell" <johan.tibell at gmail.com> writes:

>> I guess this is where I don't follow: why would you need more short
>> strings for Unicode text than for ASCII or 8-bit latin text?

> But ByteStrings are neither ASCII nor 8-bit Latin text! 
  [...] 
> The intent of the not-yet-existing Unicode string is to represent
> text not bytes. 

Right, so this will replace the .Char8 modules as well?  What confused
me was that I misread Duncan as saying that Unicode text would
somehow imply shorter strings than non-Unicode (i.e. 8-bit) text.

> To give just one example, short (Unicode) strings are common as keys
> in associative data structures like maps

I guess typically, you'd break things down to words, so strings of
length 4-10 or so.  BS uses three machine words of bookkeeping per
string and LBS four (IIRC), so the cost of sharing typically outweighs
the benefit.
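
To put rough numbers on that, here is a back-of-the-envelope sketch.
It takes the three- and four-word figures above at face value (they
are from memory) and assumes 8-byte machine words; the names
wordBytes and overheadRatio are made up for illustration and are not
part of any library.

```haskell
-- Assumed size of one machine word in bytes (64-bit platform).
wordBytes :: Int
wordBytes = 8

-- Ratio of bookkeeping bytes to payload bytes for a string of the
-- given length, given a header of 'headerWords' machine words.
overheadRatio :: Int -> Int -> Double
overheadRatio headerWords len =
  fromIntegral (headerWords * wordBytes) / fromIntegral len

main :: IO ()
main = do
  -- Three-word header (the strict figure quoted above), 4..10 byte keys.
  mapM_ (print . overheadRatio 3) [4 .. 10]
  -- Four-word header (the lazy figure quoted above), 4..10 byte keys.
  mapM_ (print . overheadRatio 4) [4 .. 10]
```

Under these assumptions the header is larger than the payload for
every length in that range, which is the point about short keys.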

> Can I also here insert a plea for keeping lazy I/O out of the new
> Unicode module?

I use ByteString.Lazy almost exclusively.  I realize there's a
penalty in time and space, but the ability to write applications that
stream over multi-GB files is essential.

Of course, these applications couldn't care less about Unicode, so
perhaps the usage is different.
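
As a minimal sketch of the kind of streaming meant here (not any
particular application), the following counts lines on standard input
using a lazy ByteString.  The function name countLines is only for
illustration.  Because the input is read lazily in chunks and each
chunk is dropped once counted, memory use should stay roughly constant
regardless of file size.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)

-- Count newline characters.  The lazy ByteString is consumed chunk by
-- chunk, so the whole input never has to be resident at once.
countLines :: L.ByteString -> Int64
countLines = L.count '\n'

main :: IO ()
main = do
  contents <- L.getContents   -- lazy read of stdin
  print (countLines contents)
```

Usage would be something like `./countlines < bigfile`, assuming the
program is compiled under that name.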

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants