Haskell Platform Proposal: add the 'text' library

Wed Sep 8 06:33:03 EDT 2010

I'd first like to say that I'm very impressed with the thoroughness of
Ian's review.

On the API differences between Data.Text and Data.ByteString.Char8, I agree
with Duncan that the Data.Text API is more natural for text-oriented work,
although I'm slightly uncomfortable with the similarities between Data.Text
and Data.List.  Everything works the same, until it doesn't because of a
minor API change you didn't notice.

Would it be useful to list the API incompatibilities in the docs, either as
a list or at each relevant function?  Or would that just be extra noise?

John

> > I compared the API of Data.Text and Data.ByteString.Char8 and found a
> > number of differences:
>
> Many of these are deliberate and sensible. The thing with text as
> opposed to lists/arrays is that almost all operations you want to do
> are substring based and not element based. A Unicode code point (a
> Char) is sadly only roughly related to the human concept of a
> character. In particular there are combining characters. So even if
> you want to search or split on a particular "character" that may mean
> searching for a short sequence of Chars / code points.
>
> So where the ByteString API followed the List API by being byte
> oriented, the Text API is substring oriented.
>
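
A minimal sketch of the combining-character point above, assuming a recent release of the text package, where the substring-based break is exported as breakOn (the signatures quoted in this thread are those of the version under review). The sample string is hypothetical.

```haskell
import qualified Data.Text as T

-- "é" in decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
eAcute :: T.Text
eAcute = T.pack "e\x0301"

-- Hypothetical sample input containing the decomposed "é".
sample :: T.Text
sample = T.pack "cafe\x0301 au lait"

main :: IO ()
main = do
  -- One user-perceived character, but two Chars (code points).
  print (T.length eAcute)          -- 2
  -- Searching for the two-Char sequence keeps the accent attached to its base letter.
  print (T.breakOn eAcute sample)
  -- Substring-based count: occurrences of the whole sequence.
  print (T.count eAcute sample)    -- 1
```

Searching for the Char 'e' alone would also match here, but it would leave the combining accent stranded at the start of the remainder, which is why a substring needle fits text better than a single element.
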
> > BS:   break :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
> >       breakEnd :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
> >       breakSubstring :: ByteString -> ByteString -> (ByteString, ByteString)
> > Text: break :: Text -> Text -> (Text, Text)
> >       breakEnd :: Text -> Text -> (Text, Text)
> >       breakBy :: (Char -> Bool) -> Text -> (Text, Text)
> >
> > BS:   count :: Char -> ByteString -> Int
> > Text: count :: Text -> Text -> Int
> >
> > BS:   find :: (Char -> Bool) -> ByteString -> Maybe Char
> > Text: find :: Text -> Text -> [(Text, Text)]
> >       findBy :: (Char -> Bool) -> Text -> Maybe Char
> >
> > BS:   replicate :: Int -> Char -> ByteString
> > Text: replicate :: Int -> Text -> Text
> >
> > BS:   split :: Char -> ByteString -> [ByteString]
> > Text: split :: Text -> Text -> [Text]
> >
> > BS:   span :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
> >       spanEnd :: (Char -> Bool) -> ByteString -> (ByteString, ByteString)
> > Text: spanBy :: (Char -> Bool) -> Text -> (Text, Text)
> >
> > BS:   splitWith :: (Char -> Bool) -> ByteString -> [ByteString]
> > Text: splitBy :: (Char -> Bool) -> Text -> [Text]
> >
> > BS:   unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> (ByteString, Maybe a)
> > Text: unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> Text
> >
> > BS:   zipWith :: (Char -> Char -> a) -> ByteString -> ByteString -> [a]
> > Text: zipWith :: (Char -> Char -> Char) -> Text -> Text -> Text
> >
> > I think the two APIs ought to be brought into agreement.
>
> Perhaps. If so, then it is the ByteString.Char8 that ought to be
> brought into agreement with Text, not the other way around. I think
> Text is right in this area. On the other hand, perhaps it makes sense
> for ByteString.Char8 to remain like the ByteString byte interface
> which is byte oriented (and probably rightly so). I hope the
> significance and use of ByteString.Char8 will decrease as Text becomes
> more popular. ByteString.Char8 is really just for the cases where
> you're handling ASCII-like protocols.
>
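
For reference, the element-versus-substring split in the quoted signatures for count, replicate and zipWith can be sketched as below. This assumes recent releases of the bytestring and text packages, in which these three functions have the types quoted above; the inputs are made up for illustration.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Text as T

main :: IO ()
main = do
  -- ByteString.Char8 counts a single element; Text counts
  -- non-overlapping occurrences of a substring.
  print (B.count 'a' (B.pack "banana"))                 -- 3
  print (T.count (T.pack "an") (T.pack "banana"))       -- 2

  -- replicate repeats one Char for ByteString, a whole Text for Text.
  print (B.replicate 3 'x')                             -- "xxx"
  print (T.replicate 3 (T.pack "ab"))                   -- "ababab"

  -- ByteString's zipWith yields a list of any result type;
  -- Text's zipWith must produce Chars and returns a Text.
  print (B.zipWith (==) (B.pack "abc") (B.pack "abd"))  -- [True,True,False]
  print (T.zipWith max (T.pack "abc") (T.pack "bad"))   -- "bbd"
```

Code ported between the two modules under the same function name will fail to type-check in these cases rather than silently change behaviour, which limits the damage from the differences.
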