[Haskell-cafe] Re: (flawed?) benchmark : sort

Adrian Hey ahey at iee.org
Mon Mar 10 17:13:14 EDT 2008


Krzysztof Skrze;tnicki wrote:
> Ok, my turn now. Let's think about algorithm that takes equivalence 
> relation EQ, ordering relation ORD on abstraction classes generated by 
> this equivalence ( -> equivalence classes ) and divides given input 
> elements XS into appropriate classes and then prints them out according 
> to given ordering ORD. If we pose the restriction (let's call it (*)), 
> that input XS should have at most one element from every abstraction 
> class, we get sorting in a way that you desire. Hovewer, if we don't 
> pose that restriction the algorithm still makes perfect sense and is 
> given by standard library sortBy.
> 
> I see no reason for restriction (*).

I don't understand the above paragraph. Let's consider a simple example:

(sort [a,b]) in the case we have: (compare a b = EQ)

Which of the following 4 possible results are correct/incorrect?
1- [a,a]
2- [a,b]
3- [b,a]
4- [b,b]

I would say they are all correct and equivalent for any "sane" Ord
instance, though from the point of view of space efficiency 1 or 4
are preferable to 2 or 3.

> For efficiency reasons there could be additional class StrictEq. If the 
> type is in that class then we can assume (*) and use more 
> space-efficient algorithm.
> 
> Now, the problem with
> 
> treeSort = concatMap (reverse . snd) . Map.toAscList
>              . Map.fromListWith (++) . map (\x -> (x,[x]))
> 
> is that i can't tell any compact way of implementing treeSortBy in nice 
> manner. This is because Data.Map does not allow us to provide our own 
> comparison test instead of compare.

As a practical matter, for benchmarking you should also count the actual
number of comparisons needed, not just execution times for simple types
(Ints presumably).

Also, I think you'll find that the AVL lib gives better performance than
Data.Map, particularly for sorted inputs. Unfortunately you can't use
this implementation in the standard libs without making the AVL lib a
standard lib (the same happens to be true of Data.Map too, thought this
is widely perceived as being standard because of ghc library bundling
:-)

But actually I would say that if either (both) of these is faster than
the the standard sort then this is some kind of performance bug with
the current ghc release. They weren't faster last time I tested (with
Ints).

I also happen to think that sort should be made an Ord class method,
so that trie based sorts are possible (which should be faster for
complex data types). We should only use sort = sortBy compare as
the default.

Regards
--
Adrian Hey



More information about the Haskell-Cafe mailing list