String != [Char]

Mon Mar 26 13:26:47 CEST 2012

On Mon, Mar 26, 2012 at 5:08 AM, Christian Siefkes
<christian at siefkes.net> wrote:
> On 03/26/2012 02:39 AM, Gabriel Dos Reis wrote:
>> True, but should the language definition default to a string type
>> that is one of the most unsuited for text processing in the 21st
>> century, where global multilingualism abounds?  Even C has qualms
>> about that.
> ...
>> I have no doubt that if all the texts my students have to
>> process are US ASCII, [Char] is more than sufficient.  So, I have
>> sympathy for your position.  However, I doubt [Char] would be
>> adequate if I asked them to share texts from their diverse cultures.
>
> Uh, while a C char is (usually) just a byte (8 bits of information, like
> Word8 in Haskell), a Haskell Char is a Unicode character (21 bits of
> information).

It is not the precision of Char or char that is the issue here.
It has been clarified at several points that Char is not a Unicode character,
but a Unicode code point.  Not every Unicode code point represents a
Unicode character, and not every sequence of Unicode code points
represents a character or a sequence of Unicode characters.
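
A small Haskell sketch may make the distinction concrete.  It uses only
Data.Char from base; the names `decomposed` and `precomposed` are
illustrative and not part of any library.  It shows one user-visible
character ("é") spelled as two code points, the same character spelled as
one code point, and a code point (a surrogate) that is a valid Char but is
not a character at all.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- "é" written as a base letter followed by U+0301 COMBINING ACUTE ACCENT:
-- one character to a reader, two code points to [Char].
decomposed :: String
decomposed = ['e', '\x0301']

-- The same character as the single code point U+00E9.
precomposed :: String
precomposed = ['\x00E9']

main :: IO ()
main = do
  -- length counts code points, not characters.
  print (length decomposed)          -- 2
  print (length precomposed)         -- 1
  -- List equality compares code points, so the two spellings differ.
  print (decomposed == precomposed)  -- False
  -- reverse moves the combining mark in front of the letter,
  -- so it no longer attaches to 'e'.
  print (reverse decomposed == ['\x0301', 'e'])  -- True
  -- U+D800 is a legal Char value, yet it is a surrogate code point,
  -- not a character.
  print (generalCategory '\xD800')   -- Surrogate
  print (generalCategory '\x0301')   -- NonSpacingMark
```

None of this is a defect in the list functions themselves; they simply
operate on code points, which is the point being made above.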

> A single C char cannot contain an arbitrary Unicode character,
> while a Haskell Char can, and does. Hence [Char] is (efficiency issues
> aside) perfectly adequate for dealing with texts written in arbitrary languages.

See above.

-- Gaby