Proposal for Data.List.splitBy

Sun Jan 11 14:14:57 EST 2009

Duncan Coutts wrote:
> On Sun, 2009-01-04 at 23:25 +0100, Henning Thielemann wrote:
>   
>> On Sun, 4 Jan 2009, Marcus D. Gabriel wrote:
>>
>>     
>>> We seem to have not added any splitters such as splitBy or split to
>>> Data.List despite the discussions.  Here is yet another perspective
>>> with the hope that it will make it.
>>>       
>> I assumed that the discussion was resolved by
>>     http://hackage.haskell.org/cgi-bin/hackage-scripts/package/split
>>     
>
> We should say that the split package is a way of continuing the
> discussion. The hope of some around here is that we can find a consensus
> on a small set of common split functions and add them to the Data.List
> module. I'm not sure if that goal is achievable but I think it's worth
> trying.
>
> Duncan
>   

Let me see if I can explain what I was trying to bring to the
discussion.  Forget the code examples that I gave for now.

You should realize that I am someone who was reading the discussion
threads and felt that the group had the answer or answers, that there
was no fighting going on as someone wrote somewhere, and that the key
properties that the splitter functions should posses is contained in the
e-mail threads but not grouped together.  In other words, it was a nice
discussion to read.

So, let me formulate this from my point of view of how this comes
together as I have read it.

1/ It's worth having some splitter functions in the Data.List module, so
this is the goal.

Otherwise, one leaves Data.List alone and the discussion should be
whether or not one should simply use a Data.List.Split package to handle
a set of useful idioms somewhere between Data.List and
Text.ParserCombinators.Parsec in complexity.  Possibly one would flush
out Data.List.Split and even write it with a pedantic theme for new
comers to Haskell for example.

2/ Some properties, in a loose sense of the term, are needed to
constrain the choices and from the e-mail threads, I would say that this
sums up as the following.

P1.  The number of splitter functions should be few in number,
otherwise, make a package.

P2. There should be no information loss, that is, keep the delimiters,
keep the separators, keep the parts of the original list xs that satisfy
a predicate p, do not lose information about the beginning and the end
of the list relative to the first and last elements of the list
respectively.  The user of the function decides what to discard.

P3. A split list should be unsplittable so as to recover the original
list xs.  (I made up the word unsplittable.)  (P2 implies P3, but let us
state this anyway.)

>From the e-mail threads, I would say the above are the absolute minimum
properties that the solution should have.  If there is no agreement
here, this would be interesting and should be discussed.

I would go on to write that the e-mail threads and the properties above
strongly imply in my mind the follow property:

P4. A split list xs' should have the same relative order as the original
list xs.  (I am sure everyone finds this obvious, and in the e-mail
threads I do not remember an exception, so let us state this clearly.)

The next properties are not exactly what I read from the e-mail threads
and are really more my personal view, but they are not in contradiction
with what I read either.

P5.  The splitter functions should fit within the spirit of the
Data.List module and even the original Haskell 98 List module in terms
of type signature and complexity of implementation.

P6. There should be at least one splitter function that takes a
predicate p, a "by" word such as groupBy or nubBy.

If there is agreement about P1 to P6, then this is a way to frame the
discussion.  If not, then I would say that this needs to be determined
first, otherwise the goal stated above will be difficult to reach.  The
function intercalate was probably easy to add because it grasps a nice
idiom, but with splitter functions there is the problem of adding more
and more words and moving closer and closer to the power and complexity
of Parsec for example.

This next property is very much my point of view:

P7. The splitter functions should take off where break and scan left off.

I add this one because when I was first learning Haskell, I was using
the List module quite a bit.  I needed a splitBy word, and I thought
that break and scan were for what I was looking, but they were not.  So
I wrote

> splitBy :: (a -> Bool) -> [a] -> [([a], [a])]
> splitBy p xs =
>     let (hd1, tl1) = break p xs
>         (hd2, tl2) = span p tl1
>    in (hd1, hd2):(if null tl2 then [] else splitBy p tl2)

which I eventually came to not like.

To sum up, if we can agree that a set of splitter properties should fit
within the context of Properties P1 to P7, then maybe forward movement
can be made.  If we cannot agree on these properties or another set,
then splitter function idioms are too idiomatic, and a larger set of
words are needed, so the investment should be in Data.List.Split.

Cheers,
- Marcus

-- 
  Marcus D. Gabriel, Ph.D.                         Saint Louis, FRANCE
  http://www.marcus.gabriel.name            mailto:marcus at gabriel.name
  Tel: +33.3.89.69.05.06                   Portable: +33.6.34.56.07.75