[Haskell-cafe] optimising for vector units

MR K P SCHUPKE k.schupke at imperial.ac.uk
Tue Jul 27 13:03:14 EDT 2004


Just wanted to explore a few points you made:

Jon Cast wrote:

>No.  A function may not use all of its arguments, so there's no point in
>evaluating all arguments---you have to find out which ones are actually
>needed.  Terminology note: GHC uses the STG language internally.  In
>STG, evaluation is always triggered by a case expression, which
>evaluates precisely one expression (not necessarily an argument of
>anything, either!)
>
If a function never uses an argument you don't need to evaluate it...
Remember, in dataflow we start from the end of the program and work
backwards, so we know all future demands on function arguments...

>Now, (:) cannot `spark' a thread to evaluate xn1 in downFrom, because
>that would end up sparking infinitely many threads---which may be
>possible, but nothing you've said suggests that it is.
>
>  
>
Again think of lazy lists in their 'stream' interpretation. We model them as
a FIFO queue and can set high and low water marks in the FIFO. We can use
hysteresis to ensure work gets done in reasonably large chunks.
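
To make that concrete, here is a minimal sketch (my own illustration, using
Chan and QSem from the standard concurrency libraries; highWater, fifo and
slots are made-up names). It only implements the high-water mark; proper
hysteresis would also track a low-water mark before waking the producer again:

    import Control.Concurrent       (forkIO)
    import Control.Concurrent.Chan  (newChan, readChan, writeChan)
    import Control.Concurrent.QSem  (newQSem, signalQSem, waitQSem)
    import Control.Monad            (forM_, replicateM_)

    -- High-water mark: the producer may run at most this far ahead.
    highWater :: Int
    highWater = 64

    main :: IO ()
    main = do
      slots <- newQSem highWater          -- free slots in the FIFO
      fifo  <- newChan                    -- the FIFO itself
      -- Producer: the "stream" of squares, throttled by the water mark.
      _ <- forkIO $ forM_ [0 .. 1023 :: Int] $ \x -> do
             waitQSem slots               -- block while the FIFO is full
             writeChan fifo (x * x)
      -- Consumer: drains the FIFO, freeing a slot for each element taken.
      replicateM_ 1024 $ do
        y <- readChan fifo
        signalQSem slots
        print y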

I guess my description didn't include all the details of the model I had
in mind... you obviously don't want to unroll all recursive functions...
I was thinking along the lines of expressions like

    let a = c*d + e*f

where you can obviously execute the two multiplications in parallel (the
multiplications being the arguments to the (+) function).

Also, it is obvious you don't need a new thread for one of the arguments
(a function of one argument would just use the same thread)...
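
Something very close to this can already be written by hand with par and
pseq from Control.Parallel (assuming GHC plus the parallel library): the
sketch below sparks one of the products and lets the current thread
evaluate the other, so only one extra spark is used.

    import Control.Parallel (par, pseq)

    -- Evaluate the two products in parallel: spark one of them, let the
    -- current thread compute the other, then add.  Only one extra spark
    -- is needed, since the calling thread handles the second argument.
    parSum :: Double -> Double -> Double -> Double -> Double
    parSum c d e f = cd `par` (ef `pseq` (cd + ef))
      where
        cd = c * d
        ef = e * f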

In the general recursive case:

fna x y z = fna (x+1) (y-1) (z*3)

again we can do the three simple arithmetic operations in parallel, but we
have to wait for all of them to finish before the call to (fna) can be made.
So there isn't an infinite expansion of threads: there is one thread that
does the recursion and one of the arithmetic operations, and two that are
forked and reaped each iteration.
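
Written by hand with par/pseq it might look like the sketch below (the
stopping condition is made up, since the fna above never terminates): two
of the updates are sparked, the third is done by the calling thread, and
all three are forced before the recursive call.

    import Control.Parallel (par, pseq)

    -- The recursive example from above, with a made-up stopping condition
    -- so it terminates.  Two of the argument updates are sparked, the
    -- third is evaluated by the calling thread, and all three are forced
    -- before the recursive call is made.
    fna :: Int -> Int -> Int -> Int
    fna x y z
      | x > 1000000 = x + y + z                -- hypothetical base case
      | otherwise   =
          let x' = x + 1
              y' = y - 1
              z' = z * 3
          in x' `par` y' `par`                 -- spark two of the updates
             (z' `pseq` x' `pseq` y' `pseq`    -- evaluate the third here and
              fna x' y' z')                    -- force all, then recurse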

>This is true for tree reduction, but not for graph reduction.  Consider,
>e.g., the fact that the call to product in choose and the recursive call
>to downFrom walk down (and hence evaluate) the same list.  Since the
>cons cells making up this list are evaluated by two different functions,
>some sort of locking is indeed necessary to ensure that you don't end up
>with two threads for each of the (infinitely many) cons cells in the
>list.
>  
>
The list is read-only (this is a declarative language); no locking is
necessary on read-only data structures.
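
As a small illustration (my own sketch, again with par/pseq): two
computations walk the same shared lazy list, and the program itself needs
no locks; the runtime only has to arbitrate the rare case where both
consumers reach the same unevaluated cell at the same moment.

    import Control.Parallel (par, pseq)

    -- Two computations walk (and therefore force) the same lazy list.
    -- The cons cells are built at most once and never mutated afterwards,
    -- so the program itself takes no locks on them.
    sharedList :: [Integer]
    sharedList = map (* 2) [1 .. 100000]

    main :: IO ()
    main =
      let s = sum sharedList        -- consumer one
          n = length sharedList     -- consumer two
      in s `par` (n `pseq` print (s, n))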

>>I suspect I haven't fully understood the difficulty in doing this,
>>care to enlighten me?
>>    
>>
A final point: someone mentioned the cost of starting threads... Surely the
obvious approach is to start one OS thread per execution unit, and do all
the thread creation with the very lightweight Haskell threads...
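
That is roughly what GHC's threaded runtime gives you (assuming a reasonably
recent GHC): compile with -threaded, run with +RTS -N, and the runtime keeps
one OS thread (a "capability") per execution unit while forkIO threads stay
cheap. A minimal sketch, with the task count and the bit of arithmetic made
up for the example:

    import Control.Concurrent (forkIO, getNumCapabilities,
                               newEmptyMVar, putMVar, takeMVar)
    import Control.Monad      (forM_)

    -- Compiled with -threaded and run with +RTS -N, the runtime keeps one
    -- OS thread ("capability") per execution unit; forkIO threads are
    -- cheap and are multiplexed across those OS threads.
    main :: IO ()
    main = do
      caps <- getNumCapabilities            -- OS-level execution units
      done <- newEmptyMVar
      forM_ [1 .. 100 :: Int] $ \i ->
        forkIO $ do                         -- lightweight Haskell thread
          let r = sum [1 .. i * 1000]       -- some made-up work
          r `seq` putMVar done (i, r)
      putStrLn ("capabilities: " ++ show caps)
      forM_ [1 .. 100 :: Int] $ \_ -> do
        (i, r) <- takeMVar done
        putStrLn ("task " ++ show i ++ " -> " ++ show r)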

I guess in this case I can see where locking would be necessary...

Keean



