[Haskell-cafe] PROPOSAL: New efficient Unicode string library.

Twan van Laarhoven twanvl at gmail.com
Mon Sep 24 19:08:38 EDT 2007


Johan Tibell wrote:

> Dear haskell-cafe,
> 
> I would like to propose a new, ByteString-like, Unicode string library
> which can be used where both efficiency (currently offered by
> ByteString) and i18n support (currently offered by vanilla Strings)
> are needed. I wrote a skeleton draft today but I'm a bit tired so I
> didn't get all the details. Nevertheless I think it's fleshed out enough
> for some initial feedback. If I can get the important parts nailed
> down before Hackathon I could hack on it there.
> 
> Apologies for not getting everything we discussed on #haskell down in
> the first draft. It'll get in there eventually.
> 
> Bring out your Unicode kung-fu!
> 
> http://haskell.org/haskellwiki/UnicodeByteString

Have you looked at my CompactString library[1]? It essentially does 
exactly this, with one extension: the type is parameterized over the 
encoding. From the discussion on #haskell it would seem that some people 
consider this unforgivable, while others consider it essential.

In my opinion flexibility should be more important; you can always
restrict things later. For the common case where the encoding doesn't
matter there is Data.CompactString.UTF8, which provides an
un-parameterized type. I called this type 'CompactString' as well, which
might be a bit unfortunate. I don't like the name UnicodeString, since it
suggests that the normal string somehow doesn't support Unicode. This
module could be made more prominent: maybe Data.CompactString could be
the specialized type, while Data.CompactString.Parameterized supports
different encodings.
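
To make the "parameterized over the encoding" idea concrete, here is a
minimal sketch of how such a type could look. This is not the
CompactString API: the names Packed, Encoding, UTF8, UTF32BE,
Utf8String, recode and byteLength are all made up for illustration, and
the UTF-8 decoder is deliberately simplified. It only assumes the
bytestring package that ships with GHC.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
-- Illustrative sketch only; every name below is hypothetical and is
-- NOT taken from the CompactString library.
import qualified Data.ByteString as B
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Proxy (Proxy (..))
import Data.Word (Word8)

-- | Packed bytes tagged with a phantom encoding type.
newtype Packed enc = Packed B.ByteString

-- Empty types used only as encoding tags.
data UTF8
data UTF32BE

-- | The un-parameterized type for the common case.
type Utf8String = Packed UTF8

class Encoding enc where
  encodeChar :: proxy enc -> Char -> [Word8]
  -- | Decode one character and return it with the remaining bytes.
  decodeChar :: proxy enc -> [Word8] -> Maybe (Char, [Word8])

-- | Reject values outside the Unicode code point range.
safeChr :: Int -> Maybe Char
safeChr n
  | n >= 0 && n <= 0x10FFFF = Just (chr n)
  | otherwise               = Nothing

instance Encoding UTF32BE where
  encodeChar _ c = [fromIntegral (ord c `shiftR` s) | s <- [24, 16, 8, 0]]
  decodeChar _ (a : b : c : d : rest) = do
    ch <- safeChr (foldl (\acc w -> acc `shiftL` 8 .|. fromIntegral w) 0 [a, b, c, d])
    Just (ch, rest)
  decodeChar _ _ = Nothing

instance Encoding UTF8 where
  encodeChar _ c
    | n < 0x80    = [fromIntegral n]
    | n < 0x800   = [0xC0 .|. hi 6, cont 0]
    | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
    | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
    where
      n      = ord c
      hi s   = fromIntegral (n `shiftR` s)
      cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)
  -- Simplified: overlong forms and surrogates are not rejected.
  decodeChar _ (w : ws)
    | w < 0x80  = Just (chr (fromIntegral w), ws)
    | w < 0xC0  = Nothing -- stray continuation byte
    | w < 0xE0  = multi 1 (w .&. 0x1F)
    | w < 0xF0  = multi 2 (w .&. 0x0F)
    | otherwise = multi 3 (w .&. 0x07)
    where
      multi k lead = case splitAt k ws of
        (cs, rest) | length cs == k && all isCont cs -> do
          ch <- safeChr (foldl step (fromIntegral lead) cs)
          Just (ch, rest)
        _ -> Nothing
      isCont b   = b .&. 0xC0 == 0x80
      step acc b = acc `shiftL` 6 .|. fromIntegral (b .&. 0x3F)
  decodeChar _ [] = Nothing

pack :: forall enc. Encoding enc => String -> Packed enc
pack = Packed . B.pack . concatMap (encodeChar (Proxy :: Proxy enc))

unpack :: forall enc. Encoding enc => Packed enc -> String
unpack (Packed bs) = go (B.unpack bs)
  where
    go [] = []
    go ws = case decodeChar (Proxy :: Proxy enc) ws of
      Just (c, rest) -> c : go rest
      Nothing        -> [] -- stops at malformed input; a real library would report it

-- | Convert between encodings; the types say which is which.
recode :: (Encoding a, Encoding b) => Packed a -> Packed b
recode = pack . unpack

byteLength :: Packed enc -> Int
byteLength (Packed bs) = B.length bs

main :: IO ()
main = do
  let s     = "naïve λ"
      utf8  = pack s :: Utf8String
      utf32 = recode utf8 :: Packed UTF32BE
  print (byteLength utf8, byteLength utf32)
  print (unpack utf8 == s && unpack utf32 == s)
```

Code that doesn't care about the encoding can use Utf8String and never
mention the parameter, while code that does care gets a type error if
it mixes two encodings without an explicit recode. That is roughly the
trade-off being argued about.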

A word of warning: The library is still in the alpha stage of 
development. I don't fully trust it myself yet :)

[1] http://twan.home.fmf.nl/compact-string/

Twan


