[Haskell-cafe] Duplicates in arbitrary instances for collections

Sun Jan 20 21:12:17 CET 2013

Every QuickCheck tutorial makes the point that, given a property on
some type, the Arbitrary instance for that type is written in such a
way that nearly all of the interesting edge-cases of that type are
tried. So e.g. for numeric types, positive, negative, and zero numbers
are tried. For collections, I think the case of duplicate elements is
insufficiently covered.

Say we have a list difference function:
badDiff :: Eq a => [a] -> [a] -> [a]
badDiff l r = filter (\e -> e `notElem` r) l

And a test for it:
prop_wrong :: [Int] -> [Int] -> Bool
prop_wrong a b = (a `badDiff` b) == (a \\ b)

`quickCheck prop_wrong` passes about 80% of the time!

Obviously it shouldn't:
> badDiff [1,2,3,3,3] [2,3]
[1]
> (\\) [1,2,3,3,3] [2,3]
[1,3,3]

But even with maxSuccess at 1000, this property passes at least 2/3 of
the times it's run (on my machine, where (maxBound :: Int) is large).

The reason is that `arbitrary` for [a] is just
`sized $ \n -> choose (0,n) >>= \k -> sequence [arbitrary | _ <- [1..k]]`
It would be trivial to duplicate values in the collection a certain
percentage of the time (possibly sometimes adjacent). Is there a good
reason not to? It shouldn't require an Eq constraint.

Tom