[Haskell-cafe] Re: tail-recursing through an associative list

apfelmus at quantentunnel.de apfelmus at quantentunnel.de
Thu Oct 12 12:05:25 EDT 2006


Malcolm Wallace wrote:
> Seth Gordon <sethg at ropine.com> wrote:
> 
>> almost-entirely-functional code ... that in its first draft, took
>> about three seconds to process 2,000 rows, eight minutes to process
>> 20,000 rows, and overflowed a 1-MB stack when processing 200,000 rows.
>>  Oops.
> 
> Which just goes to show that your algorithm is non-linear in the size of
> the input.  I doubt that your posted snippet is the cause of the
> problem, since it is certainly linear in the AList it is given.

The linear time in the length of AList may well be the problem :)

You seem to use AList as a key-value mapping (I know the word
"associative" only as mathematical property, please correct me). The Key
acts as key for the grouping you mentioned and the [MetaThingie] is the
actual group of MetaThingie, is that so? That means that with each call
to myFunction, the AList roughly grows by one element.

For logarithmic access times, you should use a binary search tree like
Data.Map or similar. The problem in your case could be that matchKeys is
only approximate and your keys cannot be ordered in suitable fasion.
Then you need a clever algorithm which somehow exploits extra structure
of the keys (perhaps they are intervals, then you can use interval trees
etc.). The C++ code is likely to do some magic along these lines. In
such case, stunning functional power may come from
  Finger Trees: A Simple General-purpose Data Structure
  Ralf Hinze and Ross Paterson.
  in Journal of Functional Programming16:2 (2006), pages 197-217
  http://www.soi.city.ac.uk/~ross/papers/FingerTree.pdf

>> I profiled the code and
>> observed that the most-called-upon function in the program (200
>> million entries for those 20,000 rows)
>
> By optimisation, you can only make this function a constant factor
> faster.  You need to work out how to call it less often in the first
> place.

Almost true. I think that recursive calls are counted as proper call, so
that each top level call to myFunction will result a count of calls to
myFunction linear in the length of AList. Thus "in first place" alias
top level is not enough.


>> type AList = [(Key, [MetaThingies])]
>>
>> myFunction :: AList -> Thingie -> AList
>> myFunction [] x = [(key x, [meta x])]
>> myFunction ((k, v):tail) x | matchKeys k (key x) =
>>                                case tryJoining v x of
>>                                Nothing -> (k, v) : (myFunction tail x)
>>                                Just v' -> (k', v') : tail
>>                                    where v' = bestKey k (key x)
                    should be (?)     where k' = bestKey k (key x)
>>                            | otherwise = (k, v) : (myFunction tail x)


Regards,
apfelmus



More information about the Haskell-Cafe mailing list