Generalize groupBy in a useful way?

Bart Massey bart at cs.pdx.edu
Fri Sep 5 03:06:54 EDT 2008


I experienced some interesting behavior with groupBy
tonight, and thought I'd run it by folks as a proposal for
the Haskell libraries.


Consider the predicate

    incp e1 e2 = e1 + 1 == e2

Naively, one might expect

    groupBy incp [2,3,4,5,6] == [[2,3,4,5,6]]

but one actually gets [[2,3],[4,5],[6]] from the standard library
implementation.

The Haskell98 Report makes it clear that

   "When the 'By' function replaces an Eq context by a
    binary predicate, the predicate is assumed to define an
    equivalence."

Thus, the observed behavior is technically correct, or at
least technically not incorrect.  The incp predicate does
not follow the equivalence laws (it is neither symmetric nor
reflexive), so groupBy is probably permitted to return most
any result of the correct type.  What is actually happening
is that the groupBy implementation is using span with a
predicate (incp a) where a is the first element in the list.


I think it would be better to expand the definition of of
groupBy such that the equality test is applied only to
adjacent elements (with the arguments in proper order).
This would not change the behavior for true equality
predicates, but would permit cases like incp to have
sensible behavior, which can occasionally be useful. (It was
to me in this case: I was identifying straights in poker
hands.)

Here's a suggested (untested) implementation.  (See also
http://hpaste.org/10134?#a0 )  It relies on a new "chain"
function that is the binary equivalent of span.  This
function seems to me to be useful in its own right.

    chain :: (a -> a -> Bool) -> [a] -> ([a],[a])
    chain _ [] = ([], [])
    chain _ [e] = ([e], [])
    chain p (e1 : es@(e2 : _))
	| p e1 e2 = let (s1, s2) = chain p es in (e1 : s1, s2)
	| otherwise = ([e1], es)

    groupBy' _ [] = []
    groupBy' p l = s1 : groupBy' p s2 where
	(s1, s2) = chain p l
	    
The implementation of groupBy' is actually a bit cleaned up
compared to that of groupBy.

Regardless of what is done here, it would be nice to expand
the Haddock for groupBy in the standard library to make it
clear how the predicate is used.


Comments and suggestions are welcome.  Am I way off base here?

    Bart Massey
    bart at cs.pdx.edu




More information about the Libraries mailing list