Weird profiling behaviour

Colin Runciman colin@cs.york.ac.uk
Wed, 26 Jun 2002 12:19:15 +0100


Ketil Z. Malde wrote:

>I have what I think is a really strange problem.  I have a fair sized
>problem, which involves sorting a data set, first on labels (which are
>Strings) and then on scores (which are Ints).
>
>The strange thing is that string sorting is *vastly* faster than int
>scoring!  Now, I've tried modifying the sorting routinges, moving them
>into the same module, and so on, to no avail.  Is there any likely
>explanation?  The pipeline is basically
>
>         sortBy int_cmp . compound_equal . sortBy string_cmp
>
>I hesitate to post the code, since it's part of a rather large
>program, and in the hope that somebody will pop up and say that oh,
>yes, it's a well know feature of 'sortBy'.  Or that it's an artifact
>of profiling.  Or something like that.
>
>Any, and I mean any, suggestions really, really welcome.
>
>-kzm
>
Could it be that the string-comparison sort simply has less sorting to do
than the int-comparison sort?  The default definition of sortBy uses
insertion sort, so if the string-sort input happens to be already sorted
it takes linear time and if the int-sort input happens to be in
reverse order  it takes quadratic time.

Colin R