Haskell Platform Proposal: add the 'text' library

Ian Lynagh igloo at earth.li
Wed Sep 8 10:18:41 EDT 2010


On Tue, Sep 07, 2010 at 11:21:19PM +0100, Duncan Coutts wrote:
> On 7 September 2010 22:50, Ian Lynagh <igloo at earth.li> wrote:
> 
> > I compared the API of Data.Text and Data.ByteString.Char8 and found a
> > number of differences:
> 
> Many of these are deliberate and sensible.

Some at least seem just gratuitously different, e.g.:

BS:   break :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
      breakSubstring :: ByteString -> ByteString -> (ByteString, ByteString)
Text: break :: Text -> Text -> (Text, Text)
      breakBy :: (Char -> Bool) -> Text -> (Text, Text)

> The thing with text as
> opposed to lists/arrays is that almost all operations you want to do
> are substring based and not element based. A Unicode code point (a
> Char) is sadly only roughly related to the human concept of a
> character. In particular there are combining characters. So even if
> you want to search or split on a particular "character" that may mean
> searching for a short sequence of Chars / code points.

Hmm, wouldn't you want to be able to break on
    either
        <a-with-umlaut>
    or
        <a> <umlaut combining character>
in that case?
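
To make the question concrete, here is a minimal sketch using plain Strings and only base. The helper name breakOnAny is hypothetical and is not part of text or bytestring; it is only an illustration of breaking on either of two spellings of the same "character".

```haskell
import Data.List (isPrefixOf)

-- Hypothetical helper (not from text or bytestring): split the input at
-- the first position where any of the given needles starts.
breakOnAny :: [String] -> String -> (String, String)
breakOnAny needles = go []
  where
    go acc [] = (reverse acc, [])
    go acc rest@(c:cs)
      | any (`isPrefixOf` rest) needles = (reverse acc, rest)
      | otherwise                       = go (c : acc) cs

-- U+00E4, a-with-umlaut as a single code point.
precomposed :: String
precomposed = "\228"

-- 'a' followed by U+0308, the combining diaeresis.
decomposed :: String
decomposed = "a\776"

main :: IO ()
main = do
  -- Both spellings are found by the same call.
  print (breakOnAny [precomposed, decomposed] ("M" ++ precomposed ++ "dchen"))
  print (breakOnAny [precomposed, decomposed] ("M" ++ decomposed ++ "dchen"))
```

A predicate of type Char -> Bool cannot express the second spelling, since it spans two code points; a single substring argument cannot express both spellings at once.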

Also, even if the intention is that you
    break [<a>, <umlaut combining character>]
people will still use it for other things, e.g.
    break "END FOO"
and wonder why they are not able to do likewise with bytestring.
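
For comparison, a small sketch of the bytestring side, using only the two functions whose signatures are quoted above. The sample input string is made up for illustration.

```haskell
import qualified Data.ByteString.Char8 as B

-- Made-up sample input for illustration.
sample :: B.ByteString
sample = B.pack "header END FOO trailer"

main :: IO ()
main = do
  -- In bytestring, break takes a predicate on a single Char ...
  print (B.break (== ' ') sample)
  -- ... and the substring variant has a different name.
  print (B.breakSubstring (B.pack "END FOO") sample)
```

Someone who learned the substring-taking break from text would have to reach for breakSubstring here, which is the kind of mismatch I mean.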

Even if there is a case where you would want different behaviour in the
two packages, I think it would be better if the function names weren't
the same.

> > I think the two APIs ought to be brought into agreement.
> 
> Perhaps. If so, then it is the ByteString.Char8 that ought to be
> brought into agreement with Text, not the other way around.

I don't have an opinion on what the APIs should look like; I'd just like
them to be consistent.

> > There are a number of other differences which probably want to be tidied
> > up (mostly functions which are in one package but not the other,
> 
> What are you thinking of specifically?

There are a number of them:

In Text only:
    center, chunksOf, dropAround, dropWhileEnd, justifyLeft,
    justifyRight, partitionBy, prefixed, replace, strip, stripEnd,
    stripStart, suffixed, compareLength, toCaseFold, toLower, toUpper

In BS only:
    copy, elem, elemIndex, elemIndexEnd, elemIndices, findIndices,
    findSubstring, findSubstrings, foldr', foldr1', notElem, readInt,
    readInteger, sort, unzip

> > ByteString has IO functions mixed in with the non-IO functions,
> 
> Which I don't think was a good idea. I would prefer to split them up.

Agreed, but I would like us to move towards consistency.


Thanks
Ian


