Proposal #2717: Add nubWith, nubOrd

Bart Massey bart at cs.pdx.edu
Tue Oct 21 13:38:22 EDT 2008


Mitchell, Neil <neil.mitchell.2 <at> credit-suisse.com> writes:
> nubOrd: Seems good. Useful function to have. Shame its not in Data.List,
> but I understand the reasons for that, and think this is a perfectly
> sensible choice.

Thanks.

> nubWith: Seems bad. A not particularly useful function (other than to
> write nubOrd), with a fairly confusing name. I have previously used
> nubWith to mean nubOn (nubOn f = nubBy ((==) `on` f)) with cacheing of
> the function f. As a name, "with" only means "with additional stuff" -
> it gives no hint what the additional stuff is.

Were we to keep it, do you have a better naming suggestion?  I agree that
nubWith is not optimal, but I don't have a great alternative.  As you sort
of suggest below, maybe nubFilter could work?  [The problem starts with
the belief of everyone I've ever talked to about it that "nub" itself
should have been called "uniq". :-)]

> The concept of stop-lists is a little confusing, and then you tell
> the user they almost certainly want to pass [] every single time - in that
> case why do they get an option?

One can imagine situations in which you want to accumulate a "stop list"
over several runs of the function, but of course nubWith as designed won't
let you do that, so that's fail.

If they already have a list of things they want pre-stopped, they could
pass that in.  Imagine a variant of nondescendingSubsequence below in
which the accumulator started with 0 instead of the first element, for
example.

Of course the technical reason the user needs to pass an empty stop list is that
we killed the type class that goes with nubWith, so there's no way to know what
the empty accumulator of their stop list type actually is.

> Also stop-lists have type "b", so stop-unconstrainted-type-variable
> seems a more appropriate name.  

The name "stop list" is a term of art in the algorithm and AI fields, which
is why I chose it.  I could stick some bib reference in the docs if
you thought that would help.  In general, the stop list is an accumulator
of values that you want to stop after the first one.

> The type signature is fairly complex.

I agree.  I'd written this off initially for this reason, but other
folks on the list seem to be fine with it.

> I'm not even convinced that nubWith really is a nub function, and not
> just some generalised filter - filterState perhaps. In which case you'd
> want to generalise Maybe b to (Bool,b).

Or to Either b b?  Which is preferred in this case?

> Can you ever imagine anyone other than nubOrd using nubWith?

Yes. As discussed previously you can get a significant constant factor
speedup with nubInt on IntSets if you need it.  Also, as discussed previously,
you can get a speedup for nubChar by using a mutable array as the stop list.

One can also imagine using this in less traditional ways; for example

  nondescendingSubsequence [] = []
  nondescendingSubsequence (e : es) =
    e : nubWith (\a b-> if (a >= b) then Just a else Nothing) e es

  > nondescendingSubsequence [1,2,3,2,3,3,4,1,1]
  [1,2,3,3,3,4]

> Is it a genuine utility function people
> have been crying out for?

IMHO it is a "genuine utility function", whatever that means.
It certainly isn't something "people have been crying out for",
so if that's the criterion we should omit it.

It just seems churlish to hide it, given that it's sitting in
there being a perfectly useful building block for things people
do cry out for.  I'd guess it would be used about as often
as "break", which is in there for much the same reason.

> It seems perfectly good to include as a local
> function in Data.Set to be used to implement nubOrd, but I don't see its
> value as a standalone export from a very commonly used library full of
> very useful stuff.

We could move it to live in Data.Set with nubOrd if folks think that it
doesn't belong in Data.List.  Or we could move them both into their own module,
although that seems silly to me given that I can't really think what else
goes in there.

> So, in summary I think nubOrd is great, and nubWith isn't nub, and isn't
> great.

Thanks hugely for the detailed commentary!

Please understand that I'm in no way wedded to the idea of getting any of this
in; I want to do what works best for everyone, and am grateful for your and
everyone's help in figuring out what that is.

    Bart Massey
    bart at cs.pdx.edu



More information about the Libraries mailing list