String != [Char]

Thomas Schilling nominolo at googlemail.com
Sat Mar 24 23:39:08 CET 2012


On 24 March 2012 22:27, Ian Lynagh <igloo at earth.li> wrote:
> On Sat, Mar 24, 2012 at 05:31:48PM -0400, Brandon Allbery wrote:
>> On Sat, Mar 24, 2012 at 16:16, Ian Lynagh <igloo at earth.li> wrote:
>>
>> > On Sat, Mar 24, 2012 at 11:50:10AM -0700, Johan Tibell wrote:
>> > > Using list-based operations on Strings are almost always wrong
>> >
>> > Data.Text seems to think that many of them are worth reimplementing for
>> > Text. It looks like someone's systematically gone through Data.List.
>> > And in fact, very few functions there /don't/ look like they are
>> > directly equivalent to list functions.
>> >
>>
>> I was under the impression they have been very carefully designed to do the
>> right thing with characters represented by multiple codepoints, which is
>> something the String version *cannot* do.  It would help if Bryan were
>> involved with this discussion, though.  (I'm cc:ing him on this.)  Since
>> the whole point of Data.Text is to handle stuff like this properly I would
>> be surprised if your assertion that
>>
>> >     upcase :: String -> String
>> > >     upcase = map toUpper
>> >
>> > This is no more incorrect than
>> >    upcase = Data.Text.map toUpper
>>
>> is correct.
>
> I don't see how it could do any better, given both use
>    toUpper :: Char -> Char
> to do the hard work. That's why there is also a
>    Data.Text.toUpper :: Text -> Text
>
> Based on a very quick skim I think that there are only 3 such functions
> in Data.Text (toCaseFold, toLower, toUpper), although the 3
> justification functions may handle double-width characters properly.
>
>
> Anyway, my main point is that I don't think that either text or String
> should make it any easier for people to get things right. It's true that
> currently only text makes correct case-conversions easy, but only
> because no-one's written Data.String.to* yet.

The reason Text uses UTF-16 internally is so that it can be used with
the ICU library (written in C, I think), which implements all the
difficult things (http://hackage.haskell.org/package/text-icu).
Reimplementing all that in Haskell would be a significant undertaking.
You could do the same for String, but then every call would have to
encode the String to UTF-16 and decode the result again.
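
To make "the difficult things" a bit more concrete, here is a minimal
sketch using only base. upcase is the definition quoted above;
upcaseFull is a hypothetical helper I made up for illustration, with a
single hand-written special case. It is not how text or ICU implement
case conversion.

```haskell
import Data.Char (toUpper)

-- Per-character upcasing, as in the quoted definition.
upcase :: String -> String
upcase = map toUpper

-- Hypothetical helper, for illustration only: it handles one special
-- case by hand. A real implementation needs the full Unicode
-- special-casing data, which is what ICU provides.
upcaseFull :: String -> String
upcaseFull = concatMap expand
  where
    expand 'ß' = "SS"           -- one character becomes two
    expand c   = [toUpper c]
```

upcase "straße" leaves the ß alone, because a Char -> Char function has
no way to return two characters. upcaseFull "straße" gives "STRASSE",
which is one character longer than its input.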

BTW, I checked the version history of the text package and most of the
list functions existed already in Tom Harper's version that text was
based on in 2009.  If you look at the documentation you can see that
many of the list-like functions treat some invalid characters
specially, so they are not exact equivalents of their list
counterparts.
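
A small sketch of that difference, assuming the behaviour described in
the Data.Text documentation (invalid scalar values are replaced with
U+FFFD). The names loneSurrogate, viaList and viaText are mine, chosen
only for this example.

```haskell
import qualified Data.Text as T

-- U+D800 is a surrogate code point: a legal Char in Haskell, but not a
-- Unicode scalar value, so it cannot be stored in a Text.
loneSurrogate :: Char
loneSurrogate = '\xD800'

-- On a list, map passes the code point through untouched.
viaList :: String
viaList = map (const loneSurrogate) "a"

-- On a Text, the same map stores the replacement character instead.
viaText :: String
viaText = T.unpack (T.map (const loneSurrogate) (T.pack "a"))
```

Here viaList still contains U+D800, while viaText contains U+FFFD, so
Data.Text.map and Data.List.map disagree on this input.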


