UTF-8 encode/decode libraries.

Wolfgang Jeltsch wolfgang at jeltsch.net
Tue May 4 19:28:50 EDT 2004


On Tuesday, 4 May 2004 11:16, George Russell wrote:
> Sven Panne wrote:
>  > Hmmm, "String -> [Word8]" would be nicer...
>
> My UTF8 encoder is
>     toUTF8 :: String -> String
> but an obvious alternative would be
>     toUTF8 :: Enum codedChar => String -> [codedChar]
> and I could implement this quite easily, by globally-exchanging
> chr with toEnum.  It would then be appropriate to SPECIALIZE
> to types String -> String and String -> [Word8], satisfying
> both the purists and those who actually want to write the
> output to a file.

Writing UTF-8 to a file should be done using binary output anyway, since UTF-8 
is a sequence of octets.  So Word8 would also be the way to go for the "file 
writers".
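
To illustrate, here is a minimal sketch of an encoder with the type Sven 
suggested, plus a helper that writes the octets through a binary-mode handle.  
This is not George's code: the names encodeChar and writeUTF8File are made up 
for the example, and the encoder assumes its input contains no surrogate code 
points (it does not check for them).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)
import System.IO (IOMode (WriteMode), hPutStr, withBinaryFile)

-- Encode a single character as 1 to 4 octets.
-- Assumption: no check is made for surrogate code points.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c

    -- Payload bits of the lead octet.
    hi :: Int -> Word8
    hi s = fromIntegral (n `shiftR` s)

    -- Continuation octet: 10xxxxxx carrying six payload bits.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- The "String -> [Word8]" variant of the encoder.
toUTF8 :: String -> [Word8]
toUTF8 = concatMap encodeChar

-- Hypothetical helper: write the octets via a binary-mode handle,
-- so that no newline translation or text encoding is applied.
writeUTF8File :: FilePath -> String -> IO ()
writeUTF8File path s =
  withBinaryFile path WriteMode $ \h ->
    hPutStr h (map (chr . fromIntegral) (toUTF8 s))
```

With the handle in binary mode, each Char below 256 is written as exactly one 
octet, which is why the Word8 values are mapped back to Char only at the very 
last step.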

>  > ... and here: "[Word8] -> String" or "[Word8] -> Maybe String"
>
> and my UTF8 decoder has type
>
>     fromUTF8WE :: Monad m => String -> m String
>
> Errors are reported by "fail".  If for example you import
> Control.Monad.Error that means you have a function returning
> either an error message or the converted string
>
>     fromUTF8WE :: String -> Either String String

I like this "error handling via monads" and use it myself a lot.
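
As a rough sketch of this style for Word8 input, here is a decoder specialised 
to Either String, so that a failure carries a message instead of calling 
error.  It is not George's fromUTF8WE: the name decodeUTF8 and the error 
texts are invented for the example.  (In current GHC versions, fail belongs 
to MonadFail rather than Monad, and Either String has no MonadFail instance, 
so the sketch uses Left directly.)

```haskell
import Control.Monad (unless, when)
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Decode a list of octets, reporting malformed input via Left.
decodeUTF8 :: [Word8] -> Either String String
decodeUTF8 [] = Right []
decodeUTF8 (b : bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUTF8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F) 0x80
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F) 0x800
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07) 0x10000
  | otherwise          = Left ("invalid lead octet " ++ show b)
  where
    isCont :: Word8 -> Bool
    isCont c = c .&. 0xC0 == 0x80

    -- k continuation octets, lead payload bits, smallest legal code point.
    multi :: Int -> Word8 -> Int -> Either String String
    multi k lead minCode = do
      let (cs, rest) = splitAt k bs
      unless (length cs == k && all isCont cs) $
        Left "truncated or malformed sequence"
      let step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)
          code = foldl step (fromIntegral lead) cs :: Int
      when (code < minCode) $ Left "overlong encoding"
      when (code > 0x10FFFF) $ Left "code point out of range"
      when (code >= 0xD800 && code <= 0xDFFF) $ Left "surrogate code point"
      (chr code :) <$> decodeUTF8 rest
```

A caller can then pattern match on the result, or use either to turn the 
message into whatever error reporting the surrounding program prefers.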

> Of course for Word8, you would change the type of the decoder to
>
>     fromUTF8WE :: (Monad m,Enum codedChar) => [codedChar] -> m String
>
> Incidentally I am *hoping* I shall be able to say that my UTF8 code
> is LGPL but you know what University administrators are like ...

Wolfgang


