Difference between revisions of "GHC/Concurrency/Flaws"

From HaskellWiki
Jump to navigation Jump to search
(Fixed -- removed problem description)
(2 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  +
[[Category:GHC|Concurrency]]
= "throwTo & block statements considered harmful" =
 
   
  +
= throwTo & block statements fixed in GHC 6.6.1 =
The following description is by ChrisKuklewicz.
 
   
  +
The problem which used to be described here has been fixed in GHC HEAD and this will be included in GHC version 6.6.1
This is a short essay to prove that the current GHC concurrency
 
implementation has a critical flaw.
 
   
  +
I have built (Feb 9, 2007) the HEAD version of GHC and this problem is no longer present.
The key problem is, at least in the presence of block/unblock, that
 
Exceptions are never reliably delivered. And this is not a
 
theoretical case, but can cause a hang in something as innocuous as
 
"program A" given below.
 
   
  +
See the ticket at [http://hackage.haskell.org/trac/ghc/ticket/1047] and the comments near the top of [http://darcs.haskell.org/ghc/rts/Exception.cmm].
Simon Marlow wrote:
 
<pre>
 
>> I think people are misunderstanding the nature of a safepoint. The
 
>> safepoint is a point at which you are prepared to have exceptions
 
>> delivered. This does not mean that they *will* be delivered, just that they
 
>> can. If you need to *wait* for an asynchronous exception, then you
 
>> shouldn't be using them at all.
 
>
 
> Right. If a thread mostly runs inside 'block' with the occasional safe
 
> point, then your exceptions are not really asynchronous, they're synchronous.
 
>
 
>
 
> In this case, I'd say a better solution is to have an explicit event queue,
 
> and instead of the safe point take an event from the queue. The action on
 
> receiving an event can be to raise an exception, if necessary.
 
>
 
> Cheers, Simon
 
</pre>
 
   
  +
The description which was here has been deleted, but is still available in the historical views of this page.
The implementation of asynchronous signals, as described by the paper
 
"Asynchronous exceptions in Haskell by Simon Marlow, Simon Peyton Jones, Andy Moran and John Reppy, PLDI'01."
 
is fatally inconsistent with the implementation in GHC 6.4 and GHC 6.6 today.
 
 
The implemented semantics have strictly weaker guarantees and render programs
 
using asynchronous expressions impossible to write correctly. The semantics in
 
the paper were carefully designed to solve the problem laid out in the first
 
sentence of the abstract:
 
 
"Asynchronous exceptions, such as timeouts, are important for robust, modular
 
programs, but are extremely difficult to program with -- so much so that most
 
programming languages either heavily restrict them or ban them altogether."
 
 
And I believe the paper succeeded. The paper shows how to replace other
 
languages pervasive and intrusive error catching and handling code with much
 
cleaner, clearer, and often more correct code.
 
 
The implementation in GHC has changed the behavior of throwTo from asynchronous
 
(not-interruptible) to synchronous (interruptible?) as discussed in section 8 of
 
the paper. This change, in and of itself, is not the fatal problem; as
 
described in the paper a (forkIO (throwTo ...)) recovers the asynchronous behavior.
 
 
The fatal change between the paper and GHC comes from not following section 7.2
 
as published. Section "7.2 Implementation of throwTo" has two bullet point, and
 
the second bullet point is (retyped, so typos are my own fault):
 
 
"As soon as a thread exits the scope of a 'block', and at regular intervals
 
during execution inside 'unblock', it should check its queue of pending
 
exceptions. If the queue is non-empty, the first exception from the queue
 
should be raised."
 
 
A test of GHC 6.6 shows that this is not the case. Test program A:
 
<haskell>
 
> loop = block (print "alive") >> loop
 
>
 
> main = do tid <- forkIO loop
 
> threadDelay 1
 
> killThread tid
 
</haskell>
 
Program A, compiled with (-threaded) on a single CPU machine never halts. It
 
will print "alive" forever while the the main thread is blocked on "killThread".
 
 
As an aside, removing the threadDelay causes killThread to destroy the child
 
before it can enter the block, thus showing the need to add "forkBlockedIO" or
 
"forkInheritIO" to the library. This can be worked around using an MVar.
 
 
Changing the definition of loop produces Test program B:
 
<haskell>
 
> loop = block (print "alive") >> loop >> yield
 
>
 
> main = do tid <- forkIO loop
 
> threadDelay 1
 
> killThread tid
 
</haskell>
 
This prints "alive" twice before the killThread succeeds.
 
 
The paper demands that when the loop in Program A exits the scope of "block
 
(print a)" that it check a queue of pending exceptions, see that it is
 
non-empty, and raise the exception thrown by killThread. This can also be seen
 
in "Figure 5. Transition Rules for Asynchronous Exceptions", where killThread
 
should use throwTo to create an in-flight exception and exiting the scope of
 
block in the presence of this in-flight exception should raise the exception.
 
 
The implementation in GHC sleeps the main thread at the killThread command, and
 
it is awoken when the block is exited and to succeed in delivering the
 
exception it must execute while the child is still in the unblocked state. But
 
the child re-enters a blocked state too quickly in Program A, so killThread
 
never succeeds. The change in Program B has the child "yield" when unblocked
 
and this gives the main thread a change to succeed.
 
 
This trick using yield to make a safepoint was suggested by Simon Marlow:
 
<pre>
 
> The window in 'unblock (return ())' is tiny, I'm not really surprised if
 
> nothing ever gets through it. You might have more luck with 'unblock yield'.
 
</pre>
 
It has been said on this mailing list thread that needing "yield" to
 
program concurrently is a bug in the user's code and should be
 
replaced by other mechanisms. In a complex program with many threads
 
even the yield trick may not be enough to program reliably. Using
 
 
<haskell>
 
blockY io = do x <- block io
 
yield
 
return x
 
 
unblockY io = unblock (yield >> io)
 
</haskell>
 
 
instead of 'block' and 'unblock' is at least a simple but unreliable workaround.
 
 
This would not be a fatal flaw if there were a simple reliable
 
workaround. The best workaround is the following, which is anything
 
but simple, and amount to re-implementing the mechanisms in the paper:
 
 
* When forking a thread:
 
** Create an (Chan Exception) (or (TChan Exception)) and pass it to
 
the child thread. Call this the queue.
 
** Create a ((T)MVar queue) to indicate whether the thread is still
 
alive or not and wrap the whole child action in a 'finally' to
 
ensure this is set to the dead state when the child exits.
 
(Alive iff the queue is in the MVar)
 
** To be sure the child thread is in the handler, use another MVar
 
or re-use the "alive/dead" MVar to ensure the child is running
 
before returning the ThreadId from the fork operation.
 
 
* Replace all uses throwTo/killThread by
 
** Test the alive/dead state of the thread, proceeding only on
 
living threads
 
** Putting the Exception in the queue
 
** Call throwTo/killThread with the Exception
 
 
* The child thread must:
 
** implement safepoints by checking the queue
 
** At each 'block' to 'unblock' transition the child ought to add a
 
safepoint
 
 
This machinery closely recovers the queue of exceptions and the
 
semantics described in the paper. When a thread has died the queue is
 
removed from the alive/dead MVar and can be garbage collected, along
 
with any pending exceptions for the dead thread.
 
 
To ensure the above semantics are not violated by the caller, the now
 
fork operation should not return an exposed ThreadId, but should
 
return some proxy type or operation to ensure only the proper
 
throwto/killThread are peformed.
 
 
Next I will demonstrate how the paper's achievement of simple and
 
clear error handling code is lost.
 
 
The child thread is where the workaround becomes a nightmare. It is
 
impossible to test the current 'block' versus 'unblock' state. So the
 
only time the child is certain that a 'block' to 'unblock' transition
 
has occurred is when it is explicit:
 
<haskell>
 
> unblock ( .... block ( ... ) <*> .... )
 
</haskell>
 
The <*> is obviously such a transition and ought to have a safepoint added.
 
 
But is only because the block had the explicit unblock around it. If
 
there were a function such as
 
<haskell>
 
> atomicUpdateFoo = block (...) <?>
 
</haskell>
 
then there is no way to know if the <?> should be a safepoint.
 
Therefore atomicUpdateFoo needs documentation so that the caller knows
 
to add a safepoint in the case it is called from the unblocked case.
 
 
In a more complex example:
 
<haskell>
 
> atomicUpdateFooBar = block (...) <1?> block (...) <2?>
 
</haskell>
 
the caller cannot add <1?> itself, and must therefore pass in code
 
'safe :: IO ()' to all such functions:
 
<haskell>
 
> atomicUpdateFooBar safe = block (...) >> trans >> block (...) >>
 
</haskell>
 
trans where 'trans' is "return ()" or "checkChan queue" depending on
 
whether the the caller knows it is in a 'block' or 'unblock' state.
 
 
If the parent never needed to use 'throwTo'/'killThread' then the
 
child never needs 'block'/'unblock' and the nightmare is avoided. But
 
if the child uses any operation which can block such as 'accept' or
 
'getChar' or 'takeMVar' then the parent cannot deliver the signal
 
without waking up the child with a 'throwTo' which quickly leads to
 
either an overwhelming use of 'catch' in the child (like other
 
languages) or use of 'block' and the fragile workaround I just
 
described.
 
 
As an aside, if there was a "isBlockedThreadState" operation, then it
 
would be possible to create a better client workaround:
 
 
<haskell>
 
checkChan queue x = do
 
isBlocked <- isBlockedThreadState -- perhaps needing =<< myThreadId
 
case isBlocked of
 
True -> return x
 
False -> raiseQueue queue x
 
 
raiseQueue queue x = do
 
hasException <- isEmptyChan queue
 
case hasException of
 
False -> return x
 
True -> readChan >>= raise
 
 
checkedBlock queue io = do x <- block io
 
checkChan queue x
 
 
checkedUnblock queue io = checkChan queue >> io
 
</haskell>
 
 
If that existed then the client should use checkedBlock and
 
checkedUnblock instead of block and unblock. The net effect of this
 
workaround is to implement the queue described in the original paper
 
and wrap the forkIO/throwTo/block/unblock operations with new ones
 
that implement the semantics described in the section 7.2 of the
 
paper.
 
 
An ugly side effect of the above is that the modified throwTo also
 
sent the Exception via throwTo as well as the (T)Chan. So it is
 
possible that Exception could be raised twice depending on how the
 
child is written. And the order of exceptions in the channel may not
 
be the order that the throwTo's deliver the asynchronous exceptions.
 
 
To alleviate this somewhat you could have the modified throwTo put the
 
real exception on the (T)Chan and send a GoLookInTheChan asynchronous
 
exception. Reversing this might be possible, but it is not clear how.
 
If we could deterministicly wait for a pending asynchronous signal
 
then we could write a proper safe point to fix "block to unblock"
 
transitions, which we know we cannot do.
 
 
We can try to implement isBlockedThread state by maintaining a stack
 
of block/unblock operations:
 
 
<haskell>
 
checkedBlock stack queue io = do
 
x <- bracket_ (modifyIORef stack (True:))
 
(modifyIORef stack tail)
 
(io)
 
checkChan queue x
 
 
checkedUnblock stack queue io = do
 
checkChan queue ()
 
x <- bracket_ (modifyIORef stack (False:))
 
(modifyIORef stack tail)
 
(io)
 
 
isBlockedThread stack = liftM head (readIORef stack)
 
</haskell>
 
 
Where "stack :: IORef [Bool]" is created and passed to the child by
 
the modified fork operation. The 'stack' above grows without bound
 
during recursion and is not tail recursive, so the stack management
 
should be changed to the algorithm in the paper... in which case one
 
really has recreated the paper's implementation inside Haskell.
 
 
The only alternatives are to
 
* Write code that may hang forever at a throwTo/killThread
 
* Never use throwTo or killThread making block/unblock irrelevant
 
* Never use 'block' (and 'unblock')
 
* Only write children with explicit "(unblock ... block (... block
 
() ...) >>= raiseQueue queue >>= ...." where the different between
 
checked and unchecked block is obvious and explicit.
 
* Track and propagate the blocked state as in atomicUpdateFooBar.
 
 
This problem with throwTo/block was not obvious until I stated writing
 
small programs to test the implementation in GHC and thus discovered
 
that the paper's description had mislead me.
 
 
The key problem is, at least in the presence of block/unblock, that
 
Exceptions are never reliably delivered. And this is not a
 
theoretical case, but can cause a hang in something as innocuous as
 
"program A" given above.
 

Revision as of 17:07, 9 February 2007


throwTo & block statements fixed in GHC 6.6.1

The problem which used to be described here has been fixed in GHC HEAD and this will be included in GHC version 6.6.1

I have built (Feb 9, 2007) the HEAD version of GHC and this problem is no longer present.

See the ticket at [1] and the comments near the top of [2].

The description which was here has been deleted, but is still available in the historical views of this page.