[Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

John Meacham john at repetae.net
Wed Apr 19 17:52:53 EDT 2006


On Wed, Apr 19, 2006 at 06:04:58PM +0400, Bulat Ziganshin wrote:
> 1. It doesn't support Unicode. There are at least two libs (Simon's and the
> one from JHC) that use UTF-8 to do this. Of course, they will not be as
> efficient on some operations. I think this is essential for a
> general-purpose library and Data.PackedString replacement. Simon's lib
> already implements UTF-8, Latin-1, UCS-2 and UCS-4 encodings. Maybe
> it's possible to join them all together in one lib that uses
> preprocessing or some other technique to implement the differences between
> UTF-8 and fixed-width encodings

Indeed. I was excited about the prospect of using FastPackedString until
I saw it didn't support the full character range. That is too bad.

I'd recommend always using UTF-8 under the hood (since it shouldn't
matter what encoding is used internally) and storing two integers
with the pointer: the number of bytes and the number of characters. When
these are the same you know you have straight ASCII, and you get
O(1) length for free. I have very optimized UTF-8 fold operators in the
jhc version of PackedString that you can steal. They get their speed by
assuming the data is always valid UTF-8 and skipping error checking; the
constructors always enforce that validity. (yay for ADTs)
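
A minimal sketch of the layout described above, under stated assumptions:
this is not the jhc PackedString or the FPS code. The names Packed, pack,
lengthP, isAsciiP, foldlUtf8 and unpack are hypothetical, and a plain
[Word8] list stands in for the pointer to keep the example self-contained.
The fold decodes without any error checking, which is only safe because
pack is the sole way to build a value (a real module would not export the
Packed constructor).

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.List (foldl')
import Data.Word (Word8)

-- Hypothetical representation: UTF-8 bytes plus two cached counts.
-- A real library would hold a pointer to a byte buffer instead of a list.
data Packed = Packed
  { bytes     :: [Word8]  -- stand-in for the pointer to the UTF-8 data
  , byteCount :: !Int     -- number of bytes
  , charCount :: !Int     -- number of characters
  }

-- Smart constructor: the only way to build a Packed, so the bytes are
-- always the UTF-8 encoding of some String.
pack :: String -> Packed
pack s = Packed bs (length bs) (length s)
  where bs = concatMap encodeChar s

-- Encode one code point as 1 to 4 UTF-8 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    hi, cont :: Int -> Word8
    hi k   = fromIntegral (n `shiftR` k)
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

-- O(1) length: just read the cached character count.
lengthP :: Packed -> Int
lengthP = charCount

-- Equal counts mean every character took one byte, i.e. pure ASCII.
isAsciiP :: Packed -> Bool
isAsciiP p = byteCount p == charCount p

-- Strict left fold over the characters. It trusts the lead byte to say
-- how many continuation bytes follow and performs no validation.
foldlUtf8 :: (a -> Char -> a) -> a -> Packed -> a
foldlUtf8 f z = go z . bytes
  where
    go acc [] = acc
    go acc (b:bs)
      | b < 0x80  = step 0 (fromIntegral b)
      | b < 0xE0  = step 1 (fromIntegral b .&. 0x1F)
      | b < 0xF0  = step 2 (fromIntegral b .&. 0x0F)
      | otherwise = step 3 (fromIntegral b .&. 0x07)
      where
        step k lead =
          let (cs, rest) = splitAt k bs
              cp   = foldl' (\n x -> (n `shiftL` 6) .|. (fromIntegral x .&. 0x3F)) lead cs
              acc' = f acc (chr cp)
          in acc' `seq` go acc' rest

-- Example consumer of the fold: recover the original String.
unpack :: Packed -> String
unpack = reverse . foldlUtf8 (flip (:)) []
```

For instance, pack "h\233llo" (that is, "héllo") has lengthP 5 but
byteCount 6, so isAsciiP is False, while pack "hello" has both counts
equal to 5.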


        John

-- 
John Meacham - ⑆repetae.net⑆john⑈

