String != [Char]

Thomas Schilling nominolo at googlemail.com
Sat Mar 24 23:58:43 CET 2012


On 24 March 2012 22:33, Freddie Manners <f.manners at gmail.com> wrote:
> To add my tuppence-worth on this, addressed to no-one in particular:
>
> (1) I think getting hung up on UTF-8 correctness is a distraction here.  I
> can't imagine anyone suggesting that the C/C++ standards removed support for
> (char*) because it wasn't UTF-8 correct: sure, you'd recommend people use a
> different type when it matters, but the language standard itself shouldn't
> be driven by technical issues that don't affect most people most of the
> time.  I'm sure it's good engineering practice to worry about these things,
> but the standard isn't there to encourage good engineering practice.

It doesn't really have anything to do with UTF-8.  UTF-8 is just one
particular serialisation of a Unicode string.

Here's a simple illustration of the problems one faces:  Let's say you
want to search for the string "fix".  The problem is that the
sequence 'f','i' could be represented either as ['f', 'i'] or as [chr
0xfb01] (the "fi" ligature), so a plain element-by-element list search
for "fix" misses the second form.  The text-icu package provides a
function to normalise a string so that only one of these forms can
occur in it.  Because the world's languages are rather complex, there
are many more such cases that need to be handled properly (if you
don't want to run into weird corner cases).
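
To make the "fix" example concrete, here is a minimal sketch using only
base.  The names plainFix, ligatureFix, expandFiLigature and containsFix
are made up for illustration, and expandFiLigature is a hand-rolled
stand-in that handles exactly one code point.  It is not how text-icu
works; real code would use a proper normaliser with a compatibility form
(NFKC or NFKD), since the canonical forms leave the ligature alone.

```haskell
import Data.Char (chr)
import Data.List (isInfixOf)

-- The same visible word "fix", encoded in two different ways.
plainFix :: String
plainFix = ['f', 'i', 'x']

ligatureFix :: String
ligatureFix = [chr 0xfb01, 'x']  -- U+FB01 (the "fi" ligature), then 'x'

-- Hypothetical stand-in for a real normaliser: it expands U+FB01 into
-- the two characters 'f','i' and leaves every other character alone.
expandFiLigature :: String -> String
expandFiLigature = concatMap expand
  where
    expand c
      | c == chr 0xfb01 = "fi"
      | otherwise       = [c]

-- Search for "fix" only after both forms have been brought to one form.
containsFix :: String -> Bool
containsFix s = "fix" `isInfixOf` expandFiLigature s
```

With these definitions, "fix" `isInfixOf` plainFix is True while
"fix" `isInfixOf` ligatureFix is False, even though both strings display
the same word; containsFix returns True for both.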

> (2) I'd suggest that a proposal that advocated overloaded string literals --
> of which [Char] was an option -- couldn't be much more confusing from a
> pedagogical perspective than the fact that numeric literals are overloaded.
>  Since that seems to be one of the main biases in favour of [Char] in the
> current standard, that might be a possible incremental fix.

I agree that this proposal should probably include the standardisation
of the OverloadedStrings extension.
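
For readers who haven't used the extension, a small sketch of what
overloaded literals look like in GHC today.  The Name type is made up
for this example; IsString and fromString come from Data.String in
base.  Nothing here is meant as proposed wording for the standard.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- Hypothetical string-like type, used only for illustration.
newtype Name = Name String
  deriving (Eq, Show)

-- A string literal used at type Name goes through fromString.
instance IsString Name where
  fromString = Name

-- The same literal at two different types; [Char] is still an option.
asList :: String
asList = "fix"

asName :: Name
asName = "fix"
```

The literal's type is picked by the annotation, much as the numeric
literal 3 can be an Int or a Double depending on context.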

>
> Best,
> Freddie
>
>
> On 24 March 2012 22:15, Ian Lynagh <igloo at earth.li> wrote:
>>
>> On Sat, Mar 24, 2012 at 08:38:23PM +0000, Thomas Schilling wrote:
>> > On 24 March 2012 20:16, Ian Lynagh <igloo at earth.li> wrote:
>> > >
>> > >> Correctness
>> > >> ==========
>> > >>
>> > >> Using list-based operations on Strings are almost always wrong
>> > >
>> > > Data.Text seems to think that many of them are worth reimplementing
>> > > for
>> > > Text. It looks like someone's systematically gone through Data.List.
>> >
>> > That's exactly what happened as part of the platform inclusion
>> > process.  In fact, there was quite a bit of bike shedding whether the
>> > Text API should be compatible with the list API or not.  In the end
>> > the decision was made to add all the list functions even if that
>> > encouraged running into unicode issues.  I'm pretty sure you
>> > participated in that discussion.
>>
>> As far as I remember, a few functions were added to text and bytestring
>> during that, but mostly the discussion was about naming.
>>
>> Even in the first 0.1 release of bytestring:
>>
>>  http://hackage.haskell.org/packages/archive/text/0.1/doc/html/Data-Text.html
>> there is a large amount of Data.List covered, e.g. map, transpose,
>> foldl1', minimum, mapAccumR, groupBy.
>>
>>
>> Thanks
>> Ian
>>
>>
>> _______________________________________________
>> Haskell-prime mailing list
>> Haskell-prime at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-prime



-- 
Push the envelope. Watch it bend.
