List function suggestions

From HaskellWiki
Revision as of 09:41, 6 September 2007 by RossPaterson (talk | contribs) (add generalized groupBy)
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

This page lists proposed extensions to the Haskell list functions, whether in the Prelude or Data.List. Please discuss the proposals on the Talk Page or the libraries list, and use this page to record the results of discussions.

Splitting on a separator, etc

We need these useful functions in Data.List; I'll call them 'split' (and variants) and 'replace'. These are easily implemented but everyone always reinvents them. Various versions have been proposed, but there was no consensus on which was best, e.g.

Note: a lot of good points (diverging opinions!) are covered in the mailing lists, but if we include all these various cases, split* will have 9 variants! The goal is to reach some kind of reasonable consensus, specifically on naming and semantics. Even if we need pairs of functions to satisfy various usage and algebraic needs. Failing to accommodate every possible use of these functions should not be a sufficient reason to abandon the whole project.

The goal is clarity/uniformity (everyone uses them widely and recognizes them) and portability (I don't have to keep reimplementing these or copying that one file UsefulMissingFunctions.hs).

Note: I (Jared Updike) am working with the belief that efficiency should not be a valid argument to bar these otherwise universally useful functions from the libraries; regexes are overkill for 'split' and 'replace' for common simple situations. Let's assume people will know (or learn) when they need heavier machinery (regexes, FPS/ByteString) and will use it when efficiency is important. We can try to facilitate this by reusing any names from FastPackedString and/or ByteString, etc.

split (working name)

We need a few of these:

split :: Eq a => a -> [a] -> [[a]]
splitWith :: (a -> Bool) -> [a] -> [[a]]
tokens :: (a -> Bool) -> [a] -> [[a]]

That preserve:

join sep . split sep = id

See below for 'join'

And some that use above split but filter to remove empty elements (but do not preserve above property). Easy enough:

split' :: Eq a => a -> [a] -> [[a]]
splitWith' :: (a -> Bool) -> [a] -> [[a]]
tokens' :: (a -> Bool) -> [a] -> [[a]]

i.e.

split' sep = filter (not . null) . split sep

Usage would be:

tokensws = tokens' (`elem` " \f\v\t\n\r\b")

tokensws "Hello  there\n \n   Haskellers! " ==
   ["Hello", "there", "Haskellers!"]

TODO: add version like python with multi-element separator

TODO: give code, copy-paste from threads mentioned above

TODO: list names and reasons for/against

replace (working name)

replace :: [a] -> [a] -> [a] -> [a]

like Python replace:

replace "the" "a" "the quick brown fox jumped over the lazy black dog"
===>
"a quick brown fox jumped over a lazy black dog"

TODO: give code, copy-paste from threads mentioned above

TODO: list names and reasons for/against

join (working name)

join :: [a] -> [[a]] -> [a]
join sep = concat . intersperse sep

TODO: copy-paste things from threads mentioned above

TODO: list names and reasons for/against

Sorted lists

The following are versions of standard prelude functions, but intended for sorted lists. The advantage is that they frequently reduce execution time by an O(n). The disadvantage is that the elements have to be members of Ord, and the lists have to be already sorted.

 -- Eliminates duplicate entries from the list, where duplication is defined
 -- by the 'eq' function.  The last value is kept.
 sortedNubBy :: (a -> a -> Bool) -> [a] -> [a]
 sortedNubBy eq (x1 : xs@(x2 : _)) =
    if eq x1 x2 then sortedNubBy eq xs else x1 : sortedNubBy eq xs
 sortedNubBy _ xs = xs

 sortedNub :: (Eq a) => [a] -> [a]
 sortedNub = sortedNubBy (==)

 -- Merge two sorted lists into a new sorted list.  Where elements are equal
 -- the element from the first list is taken first.
 mergeBy :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
 mergeBy cmp xs@(x1:xs1) ys@(y1:ys1) =
    if cmp x1 y1 == GT
       then y1 : mergeBy cmp xs ys1
       else x1 : mergeBy cmp xs1 ys
 mergeBy _ [] ys = ys
 mergeBy _ xs [] = xs

 merge :: (Ord a) => [a] -> [a] -> [a]
 merge = mergeBy compare

Generalize groupBy and friends

In the Haskell 98 List library, groupBy assumes that its argument function defines an equivalence, and the reference definition returns sublists where each element is equivalent to the first. The following definition, comparing adjacent elements, does the same thing on equivalence relations:

 groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
 groupBy rel []          =  []
 groupBy rel (x:xs)      =  (x:ys) : groupBy rel zs
   where (ys,zs) = groupByAux x xs
         groupByAux x0 (x:xs) | rel x0 x = (x:ys, zs)
           where (ys,zs) = groupByAux x xs
         groupByAux y xs = ([], xs)

However it is also useful on other relations, e.g.

  • groupBy (<=) would pick out maximal ascending sublists (runs).
  • groupBy (\a b -> b == a+1) would pick out contiguous sublists from an ascending sequence.
  • groupBy (\ c1 c2 -> isLetter c1 || not (isLetter c2)) would split at the start of each word.

Since this more useful definition agrees with the Haskell 98 one on its specified domain, it should be a backwards-compatible replacement.

The same applies to nubBy, and possibly deleteBy, deleteFirstsBy and intersectBy (which could have more general types to make this clear).

See also