ANN: unicode-properties 3.2.0.0, unicode-names 3.2.0.0

Ashley Yakeley ashley at semantic.org
Tue Sep 2 00:54:38 EDT 2008


unicode-properties 3.2.0.0, unicode-names 3.2.0.0

These two packages are representations in Haskell of various data in the 
Unicode 3.2.0 Character Database. Unicode 3.2.0 was the latest version 
of the Unicode standard at the time I wrote most of the code; later I 
may move the packages to the latest version (currently 5.1.0).

The unicode-properties package contains functions to determine general 
category, case, and a wide range of other properties, as well as to do 
decomposition and case-folding.
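
To give a feel for the kind of queries involved, here is a minimal sketch that uses the Data.Char module from GHC's base library as a stand-in. It is not the unicode-properties API, whose module and function names are not listed in this announcement; the helper name describe is made up for illustration. Its results follow whatever Unicode version the compiler's base library ships, not necessarily 3.2.0, and toLower is a simple case mapping rather than full case-folding.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isUpper, toLower)

-- Stand-in built on base's Data.Char, NOT on unicode-properties.
-- Returns the general category, whether the character is upper case,
-- and its simple lower-case mapping.
describe :: Char -> (GeneralCategory, Bool, Char)
describe c = (generalCategory c, isUpper c, toLower c)

main :: IO ()
main = mapM_ (print . describe) "Aa1 "
```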

The unicode-names package contains just one function, getCharacterName, 
for getting the name of a character. It's separated out because the name 
data makes up a large proportion of the total data.

Both packages use the type "Char" to represent Unicode characters (more 
pedantically, codepoints). In GHC Char has the range 
['\x0'..'\x10FFFF'], matching the Unicode standard. The packages won't 
work with compilers that restrict Char to a smaller range.
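
One way to check that requirement on a given compiler, as a minimal sketch using only the Prelude (the name charCoversUnicode is hypothetical and not part of either package):

```haskell
-- Hypothetical helper, not part of unicode-properties or unicode-names:
-- True when this compiler's Char spans the full Unicode codepoint range.
charCoversUnicode :: Bool
charCoversUnicode =
  fromEnum (minBound :: Char) == 0 && fromEnum (maxBound :: Char) == 0x10FFFF

main :: IO ()
main
  | charCoversUnicode = putStrLn "Char covers U+0000..U+10FFFF"
  | otherwise = putStrLn "Char is restricted to a smaller range"
```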

Hackage:
<http://hackage.haskell.org/cgi-bin/hackage-scripts/package/unicode-properties>
<http://hackage.haskell.org/cgi-bin/hackage-scripts/package/unicode-names>

Source for both packages: <http://code.haskell.org/unicode-properties/>
Most of the data is auto-generated at build time from files downloadable 
from the Unicode website.

I expect Don will have them both in Arch Linux within the hour.

-- 
Ashley Yakeley


