need help optimizing a function

David Roundy droundy@jdj5.mit.edu
Tue, 8 Oct 2002 09:14:26 -0400


Hello.  If you followed my previous post (which was about a bug in my lcs
function), this is still the same code (only now bug-free, it seems).

It turns out that my lcs diff function is MUCH slower than GNU diff on some
test files (much worse than 1000 time slower!).The test case where its
speed was close was IO bound on the output to an xterm (silly me).

I've profiled my lcs code, and it is spending about 95% of its time in the
"hunt_one_char" function.  It also claims to spend 0.0% of its time in
my_bs, which does a binary search on the array, and is the only other
function that gets called by hunt_one_char (not counting itself).

I've told the compiler (ghc) to inline hunt_one_char and my_bs to no avail.
I guess maybe it can't or won't inline a recursive function, or maybe that
just doesn't gain me anything.

My only guess is that for some reason it is making an entirely new copy of
the array each time it recurses, but that doesn't make any sense, since the
array is getting threaded through, so it should be able to just modify the
array in place.  But then, I'm not entirely clear as to what the compiler
can and cannot do as it optimizes.

>>data Threshold a = Thresh Int! [a]
>>                 deriving (Show)
>>
>>hunt_one_char :: String -> [Int] -> Array Int (Threshold String) ->
>>                 Array Int (Threshold String)
>>hunt_one_char c [] th = th
>>hunt_one_char c (j:js) th =
>>    case my_bs (Thresh j [c]) th of
>>    Nothing -> hunt_one_char c js th
>>    Just k ->
>>        case th!(k-1) of
>>        Thresh _ rest ->
>>            hunt_one_char c js th//[(k,Thresh j (c:rest))]

For what it's worth, the calling function is:

>>hunt_internal :: [String] -> [[Int]] -> 
>>                 Array Int (Threshold String) ->
>>                 Array Int (Threshold String)
>>hunt_internal [] _ th = th
>>hunt_internal _ [] th = th
>>hunt_internal (c:cs) (m:ms) th =
>>    hunt_internal cs ms $ hunt_one_char c m th

Each string in this list of strings is a line from one of the files (and
the [[Int]] is a list of line numbers in the other file that that line
matches).  The array 'th' has size (1,n) where n is the number of lines in
the file.

My simplest test case consists of a 20k line file, each line of which is
unique and a second file which is simply a permutation of the first (moving
the first line to the last), so hunt_one_char should be called exactly once
for each line, and it is always called with its second argument having
exactly one element.  In this test case, according to the ghc profiler,
89.7% of the time is spent in hunt_one_char:
                                              individual     inherited
COST CENTRE              MODULE     entries  %time %alloc   %time %alloc
      hunt_internal      Main         20001    0.2   0.0     91.4  95.3
       hunt_one_char     Main         40000   89.7  95.0     91.1  95.3
        my_bs            Main         20000    0.2   0.0      1.4   0.3
         my_helper_bs    Main        307231    1.2   0.3      1.2   0.3

This has me at a loss.  It seems like such a simple function... of course,
it is called 20k times, which could certainly add up, but while each call
of my_bs should take O(log 20k) time, each call of hunt_one_char (apart
from its one call to my_bs) should only take O(1), so most of the time
should be spent in my_bs unless something is terribly wrong (which must be
the case here).

If it would help, I'd be happy to send a listing of the complete code, but
it's 6.5k so I figured it'd be better not to send it, since it seems
unlikely that anyone would want to run it anyways.
-- 
David Roundy
http://civet.berkeley.edu/droundy/