[Haskell-cafe] Re: Parallelism and expensive calculations that may not be needed

Fri Jun 8 05:21:41 EDT 2007

Felipe Almeida Lessa wrote:

> (Sorry if this is a newbie question, couldn't find the answer anywhere)
> 
> Suppose I have an expensive function (such that even to be reduced to WHNF
> it takes a long processing time)
> 
>> expensive :: Foo -> Maybe Bar
> 
> and I want to calculate it on multiple processors, like in
> 
>> calculate = parMap rnf expensive
> 
> Well, fine. But suppose that then I give the result to a function like
> 
>> final xs = reverse $ final' xs [] where
>>     final' []     r = r
>>     final' (x:xs) r = case x of Nothing -> []
>>                                 Just x  -> final' xs (x:r)
> 
> Given list :: [Foo], a sequential calculation as in
> 
>> resultS = final $ map expensive list
> 
> would not call expensive after finding a Nothing. But
> 
>> resultP = final $ calculate list
> 
> will keep calculating all the other expensive's because they were sparked?

Right.  This is an example of speculation: you don't know in advance how much 
work you really have to do, so any work you do in parallel may or may not be 
necessary.

If we fire off the whole list in parallel, we might end up doing a lot of work 
on the elements at the end of the list, which are less likely to be required 
than the elements at the front of the list.  That's a bad strategy: we need to 
prioritise the elements near the front.  Fortunately GHC's built-in strategy for 
picking sparks to evaluate picks the oldest ones first, so this shouldn't be a 
problem.

The problem occurs when you've found the Nothing, but the rest of the list has 
already been sparked.  You really want to throw away all those sparks, but 
there's no way to do that currently.  One way you could improve the situation 
though is to only spark the next N elements in the list, limiting the amount of 
speculation.

Hope this helps...

Cheers,
	Simon