Optimizations for mutable structures?

Jan-Willem Maessen jmaessen at alum.mit.edu
Wed Dec 7 23:09:17 EST 2005


Oh, dear, a brief mail with a high-level view of optimization seems  
to have touched off some confusion about concurrency.  I wasn't  
thinking about concurrency at all when I wrote my original message,  
but there seem to be some major misconceptions in the ensuing  
discussion, which I'd like to clear up.

On Dec 7, 2005, at 9:15 AM, Simon Marlow wrote:

> On 07 December 2005 13:38, Malcolm Wallace wrote:
>
>> Jan-Willem Maessen <jmaessen at alum.mit.edu> writes:
>>
>>>    - Fetch elimination for imperative reads:
>>>      writeIORef r e >> acts >> readIORef r
>>>  === writeIORef r e >> acts >> return e
>>
>> This transformation is valid only on single-threaded systems.
>> If there is any possibility of an IORef being shared across threads,
>> you are out of luck.
>
> (assuming 'acts' doesn't modify 'r').

This was my intended meaning---I dashed off the above lines, and  
trusted they'd be understood.  Apparently I should have been clearer.
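
Spelling out that side condition with a small sketch (the function
names are mine, purely for illustration):

    import Data.IORef

    -- Before: a write, some intervening actions, then a read-back.
    before :: IORef Int -> IO Int
    before r = do
      writeIORef r 42
      putStrLn "intervening actions"  -- must not write r, nor pass
                                      -- r to unknown code
      readIORef r

    -- After fetch elimination: the read is replaced by the value
    -- just written.  Valid only because the intervening action
    -- provably leaves r alone.
    after :: IORef Int -> IO Int
    after r = do
      writeIORef r 42
      putStrLn "intervening actions"
      return 42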

> No, Jan's transformation is correct even in a multithreaded  
> setting.  It
> might eliminate some possible outcomes from a non-deterministic  
> program,
> but that's ok.

I agree with Simon here.  "Eliminate some possible outcomes" is  
indeed possible to define in a meaningful and technically rigorous  
way.  Others (I'll single out Malcolm and Claus here) have railed  
against this view, arguing that every program behavior should be  
preserved.  Claus even raises the (difficult technical) issue of  
fairness.

Sadly, I'm afraid any realistic implementation *will* eliminate
possible outcomes.  With its cooperative multithreading, GHC
constantly makes decisions about where a thread switch may occur, and
I suspect that allowing a switch at every possible point would cause
a dramatic drop in performance.
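
For a feel of the granularity involved, here is a sketch (my own
example, reflecting my understanding of GHC's current scheduler):

    spin :: Int -> Int
    spin 0 = 0
    spin n = spin (n - 1)

With optimization on, this compiles to a loop that never reaches a
heap-allocation point, and those are the only points at which the
scheduler considers switching, so other threads may not run at all
while it spins.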

But worse than that, even if we insert a check between every single
IORef operation, the processor may well execute successive IORef
operations in parallel, or reorder two IORef operations that refer to
different locations.  It may even satisfy the read in the above
example directly from its store buffer, before the write has become
visible to any other thread!  In effect we get behavior
indistinguishable from the suggested optimizations.
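
To make "eliminating an outcome" concrete, here is a sketch (a
deliberate data race, invented purely for illustration):

    import Data.IORef
    import Control.Concurrent

    race :: IO ()
    race = do
      r <- newIORef (0 :: Int)
      _ <- forkIO (writeIORef r 2)  -- a competing writer
      writeIORef r 1
      threadDelay 1000              -- the 'acts'; they don't touch r
      readIORef r >>= print         -- may print 1 or 2

Fetch elimination rewrites the final read to `return 1', so the
program always prints 1; the outcome 2 has been eliminated, which is
precisely the narrowing Simon and I consider acceptable.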

Now, such behavior isn't always acceptable, so there are ways to get  
back to sanity.  However, those ways (memory barriers and/or atomic  
memory operations) are extremely expensive.  I'm betting one or both  
of you regularly use an x86 machine---for which there is not even a  
rigorous specification of where these operations must be inserted!
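
In Haskell terms the heavy hammer looks something like this (a
sketch; I'm taking atomicModifyIORef as the representative atomic
read-modify-write):

    import Data.IORef

    -- An atomic increment: the read and the write happen as one
    -- indivisible step.  Every call pays for the underlying atomic
    -- instruction and whatever memory ordering it implies.
    incr :: IORef Int -> IO Int
    incr r = atomicModifyIORef r (\x -> (x + 1, x + 1))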

Nonetheless, we get by.  We do so by defining idioms based on
synchronization: MVars and TMVars are entirely appropriate places to
be enforcing memory ordering.  Much of the early pain of the Java
Memory Model revision (Claus referred to the mess made of the
original spec, now fixed) went into avoiding the need to insert
barriers in most code.  A consensus was reached on an acceptable
programming style: use monitor synchronization and avoid data races,
or failing that, use volatile variables in particular well-defined
ways.  If you break those rules, all bets are off; there is a lengthy
spec defining exactly what that means (mostly to rule out behaviors
like "the program creates a valid password out of thin air"), but the
style above is everything you, the programmer, need to understand.

Similar consensus opinions have formed in other communities, usually  
without rigorous specifications to back them up.  Voices in this  
thread have suggested that the right idiom for Haskell is to  
synchronize using TMVars and MVars.  I agree (actually I hope that  
TMVars are enough, though I'd love to have a TMArray), and I think we  
can even guarantee reasonable things about IORefs that get passed  
from thread to thread in this way.  Want fairness?  Put the stuff you  
want to observe in an MVar or a TMVar.
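
A minimal sketch of the hand-off idiom I have in mind (names invented
for illustration):

    import Data.IORef
    import Control.Concurrent

    handOff :: IO ()
    handOff = do
      box <- newEmptyMVar
      _ <- forkIO $ do
             r <- newIORef (0 :: Int)
             writeIORef r 42
             putMVar box r    -- synchronization point: publish r
      r <- takeMVar box       -- the matching synchronization point
      readIORef r >>= print   -- must see 42

The putMVar/takeMVar pair is where memory ordering gets enforced; the
IORef operations themselves stay cheap.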

Where does this leave us?  Roughly where Simon noted---it's easy to  
do shallow things, like fetch elimination in the absence of  
intervening calls to unknown functions, and hard to do deep  
optimizations.  I suspect that shallow things are all that is called  
for.
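
The "intervening calls to unknown functions" caveat is easy to
picture (a hypothetical sketch):

    import Data.IORef

    cannotEliminate :: IORef Int -> (IORef Int -> IO ()) -> IO Int
    cannotEliminate r unknown = do
      writeIORef r 1
      unknown r      -- 'unknown' has r in hand and may write to it,
                     -- so the compiler must keep the read below
      readIORef r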

We love to criticize C for all the things it has done badly.  But  
let's not assume that everything C compilers do is stupid or a bad  
idea.  [Oddly, I find myself telling C programmers the same thing  
about Fortran all the time.]

-Jan-Willem Maessen

> There's no requirement that all interleavings according
> to the semantics have to be implemented.  This is a hard property to
> state precisely, indeed we gave up trying to in the concurrency/FFI
> paper: http://www.haskell.org/~simonmar/papers/conc-ffi.pdf, see
> Section 6.1.
>
> Cheers,
> 	Simon

