Personal tools

GHC/Concurrency/Flaws

From HaskellWiki

< GHC | Concurrency(Difference between revisions)
Jump to: navigation, search
(+cat)
Current revision (17:07, 9 February 2007) (edit) (undo)
(Fixed -- removed problem description)
 
(One intermediate revision not shown.)
Line 1: Line 1:
[[Category:GHC|Concurrency]]
[[Category:GHC|Concurrency]]
-
= "throwTo & block statements considered harmful" =
 
-
The following description is by ChrisKuklewicz.
+
= throwTo & block statements fixed in GHC 6.6.1 =
-
This is a short essay to prove that the current GHC concurrency
+
The problem which used to be described here has been fixed in GHC HEAD and this will be included in GHC version 6.6.1
-
implementation has a critical flaw.
+
-
The key problem is, at least in the presence of block/unblock, that
+
I have built (Feb 9, 2007) the HEAD version of GHC and this problem is no longer present.
-
Exceptions are never reliably delivered. And this is not a
+
-
theoretical case, but can cause a hang in something as innocuous as
+
-
"program A" given below.
+
-
Simon Marlow wrote:
+
See the ticket at [http://hackage.haskell.org/trac/ghc/ticket/1047] and the comments near the top of [http://darcs.haskell.org/ghc/rts/Exception.cmm].
-
<pre>
+
-
>> I think people are misunderstanding the nature of a safepoint. The
+
-
>> safepoint is a point at which you are prepared to have exceptions
+
-
>> delivered. This does not mean that they *will* be delivered, just that they
+
-
>> can. If you need to *wait* for an asynchronous exception, then you
+
-
>> shouldn't be using them at all.
+
-
>
+
-
> Right. If a thread mostly runs inside 'block' with the occasional safe
+
-
> point, then your exceptions are not really asynchronous, they're synchronous.
+
-
>
+
-
>
+
-
> In this case, I'd say a better solution is to have an explicit event queue,
+
-
> and instead of the safe point take an event from the queue. The action on
+
-
> receiving an event can be to raise an exception, if necessary.
+
-
>
+
-
> Cheers, Simon
+
-
</pre>
+
-
The implementation of asynchronous signals, as described by the paper
+
The description which was here has been deleted, but is still available in the historical views of this page.
-
"Asynchronous exceptions in Haskell by Simon Marlow, Simon Peyton Jones, Andy Moran and John Reppy, PLDI'01."
+
-
is fatally inconsistent with the implementation in GHC 6.4 and GHC 6.6 today.
+
-
 
+
-
The implemented semantics have strictly weaker guarantees and render programs
+
-
using asynchronous expressions impossible to write correctly. The semantics in
+
-
the paper were carefully designed to solve the problem laid out in the first
+
-
sentence of the abstract:
+
-
 
+
-
"Asynchronous exceptions, such as timeouts, are important for robust, modular
+
-
programs, but are extremely difficult to program with -- so much so that most
+
-
programming languages either heavily restrict them or ban them altogether."
+
-
 
+
-
And I believe the paper succeeded. The paper shows how to replace other
+
-
languages pervasive and intrusive error catching and handling code with much
+
-
cleaner, clearer, and often more correct code.
+
-
 
+
-
The implementation in GHC has changed the behavior of throwTo from asynchronous
+
-
(not-interruptible) to synchronous (interruptible?) as discussed in section 8 of
+
-
the paper. This change, in and of itself, is not the fatal problem; as
+
-
described in the paper a (forkIO (throwTo ...)) recovers the asynchronous behavior.
+
-
 
+
-
The fatal change between the paper and GHC comes from not following section 7.2
+
-
as published. Section "7.2 Implementation of throwTo" has two bullet point, and
+
-
the second bullet point is (retyped, so typos are my own fault):
+
-
 
+
-
"As soon as a thread exits the scope of a 'block', and at regular intervals
+
-
during execution inside 'unblock', it should check its queue of pending
+
-
exceptions. If the queue is non-empty, the first exception from the queue
+
-
should be raised."
+
-
 
+
-
A test of GHC 6.6 shows that this is not the case. Test program A:
+
-
<haskell>
+
-
> loop = block (print "alive") >> loop
+
-
>
+
-
> main = do tid <- forkIO loop
+
-
> threadDelay 1
+
-
> killThread tid
+
-
</haskell>
+
-
Program A, compiled with (-threaded) on a single CPU machine never halts. It
+
-
will print "alive" forever while the the main thread is blocked on "killThread".
+
-
 
+
-
As an aside, removing the threadDelay causes killThread to destroy the child
+
-
before it can enter the block, thus showing the need to add "forkBlockedIO" or
+
-
"forkInheritIO" to the library. This can be worked around using an MVar.
+
-
 
+
-
Changing the definition of loop produces Test program B:
+
-
<haskell>
+
-
> loop = block (print "alive") >> loop >> yield
+
-
>
+
-
> main = do tid <- forkIO loop
+
-
> threadDelay 1
+
-
> killThread tid
+
-
</haskell>
+
-
This prints "alive" twice before the killThread succeeds.
+
-
 
+
-
The paper demands that when the loop in Program A exits the scope of "block
+
-
(print a)" that it check a queue of pending exceptions, see that it is
+
-
non-empty, and raise the exception thrown by killThread. This can also be seen
+
-
in "Figure 5. Transition Rules for Asynchronous Exceptions", where killThread
+
-
should use throwTo to create an in-flight exception and exiting the scope of
+
-
block in the presence of this in-flight exception should raise the exception.
+
-
 
+
-
The implementation in GHC sleeps the main thread at the killThread command, and
+
-
it is awoken when the block is exited and to succeed in delivering the
+
-
exception it must execute while the child is still in the unblocked state. But
+
-
the child re-enters a blocked state too quickly in Program A, so killThread
+
-
never succeeds. The change in Program B has the child "yield" when unblocked
+
-
and this gives the main thread a change to succeed.
+
-
 
+
-
This trick using yield to make a safepoint was suggested by Simon Marlow:
+
-
<pre>
+
-
> The window in 'unblock (return ())' is tiny, I'm not really surprised if
+
-
> nothing ever gets through it. You might have more luck with 'unblock yield'.
+
-
</pre>
+
-
It has been said on this mailing list thread that needing "yield" to
+
-
program concurrently is a bug in the user's code and should be
+
-
replaced by other mechanisms. In a complex program with many threads
+
-
even the yield trick may not be enough to program reliably. Using
+
-
 
+
-
<haskell>
+
-
blockY io = do x <- block io
+
-
yield
+
-
return x
+
-
 
+
-
unblockY io = unblock (yield >> io)
+
-
</haskell>
+
-
 
+
-
instead of 'block' and 'unblock' is at least a simple but unreliable workaround.
+
-
 
+
-
This would not be a fatal flaw if there were a simple reliable
+
-
workaround. The best workaround is the following, which is anything
+
-
but simple, and amount to re-implementing the mechanisms in the paper:
+
-
 
+
-
* When forking a thread:
+
-
** Create an (Chan Exception) (or (TChan Exception)) and pass it to
+
-
the child thread. Call this the queue.
+
-
** Create a ((T)MVar queue) to indicate whether the thread is still
+
-
alive or not and wrap the whole child action in a 'finally' to
+
-
ensure this is set to the dead state when the child exits.
+
-
(Alive iff the queue is in the MVar)
+
-
** To be sure the child thread is in the handler, use another MVar
+
-
or re-use the "alive/dead" MVar to ensure the child is running
+
-
before returning the ThreadId from the fork operation.
+
-
 
+
-
* Replace all uses throwTo/killThread by
+
-
** Test the alive/dead state of the thread, proceeding only on
+
-
living threads
+
-
** Putting the Exception in the queue
+
-
** Call throwTo/killThread with the Exception
+
-
 
+
-
* The child thread must:
+
-
** implement safepoints by checking the queue
+
-
** At each 'block' to 'unblock' transition the child ought to add a
+
-
safepoint
+
-
 
+
-
This machinery closely recovers the queue of exceptions and the
+
-
semantics described in the paper. When a thread has died the queue is
+
-
removed from the alive/dead MVar and can be garbage collected, along
+
-
with any pending exceptions for the dead thread.
+
-
 
+
-
To ensure the above semantics are not violated by the caller, the now
+
-
fork operation should not return an exposed ThreadId, but should
+
-
return some proxy type or operation to ensure only the proper
+
-
throwto/killThread are peformed.
+
-
 
+
-
Next I will demonstrate how the paper's achievement of simple and
+
-
clear error handling code is lost.
+
-
 
+
-
The child thread is where the workaround becomes a nightmare. It is
+
-
impossible to test the current 'block' versus 'unblock' state. So the
+
-
only time the child is certain that a 'block' to 'unblock' transition
+
-
has occurred is when it is explicit:
+
-
<haskell>
+
-
> unblock ( .... block ( ... ) <*> .... )
+
-
</haskell>
+
-
The <*> is obviously such a transition and ought to have a safepoint added.
+
-
 
+
-
But is only because the block had the explicit unblock around it. If
+
-
there were a function such as
+
-
<haskell>
+
-
> atomicUpdateFoo = block (...) <?>
+
-
</haskell>
+
-
then there is no way to know if the <?> should be a safepoint.
+
-
Therefore atomicUpdateFoo needs documentation so that the caller knows
+
-
to add a safepoint in the case it is called from the unblocked case.
+
-
 
+
-
In a more complex example:
+
-
<haskell>
+
-
> atomicUpdateFooBar = block (...) <1?> block (...) <2?>
+
-
</haskell>
+
-
the caller cannot add <1?> itself, and must therefore pass in code
+
-
'safe :: IO ()' to all such functions:
+
-
<haskell>
+
-
> atomicUpdateFooBar safe = block (...) >> trans >> block (...) >>
+
-
</haskell>
+
-
trans where 'trans' is "return ()" or "checkChan queue" depending on
+
-
whether the the caller knows it is in a 'block' or 'unblock' state.
+
-
 
+
-
If the parent never needed to use 'throwTo'/'killThread' then the
+
-
child never needs 'block'/'unblock' and the nightmare is avoided. But
+
-
if the child uses any operation which can block such as 'accept' or
+
-
'getChar' or 'takeMVar' then the parent cannot deliver the signal
+
-
without waking up the child with a 'throwTo' which quickly leads to
+
-
either an overwhelming use of 'catch' in the child (like other
+
-
languages) or use of 'block' and the fragile workaround I just
+
-
described.
+
-
 
+
-
As an aside, if there was a "isBlockedThreadState" operation, then it
+
-
would be possible to create a better client workaround:
+
-
 
+
-
<haskell>
+
-
checkChan queue x = do
+
-
isBlocked <- isBlockedThreadState -- perhaps needing =<< myThreadId
+
-
case isBlocked of
+
-
True -> return x
+
-
False -> raiseQueue queue x
+
-
 
+
-
raiseQueue queue x = do
+
-
hasException <- isEmptyChan queue
+
-
case hasException of
+
-
False -> return x
+
-
True -> readChan >>= raise
+
-
 
+
-
checkedBlock queue io = do x <- block io
+
-
checkChan queue x
+
-
 
+
-
checkedUnblock queue io = checkChan queue >> io
+
-
</haskell>
+
-
 
+
-
If that existed then the client should use checkedBlock and
+
-
checkedUnblock instead of block and unblock. The net effect of this
+
-
workaround is to implement the queue described in the original paper
+
-
and wrap the forkIO/throwTo/block/unblock operations with new ones
+
-
that implement the semantics described in the section 7.2 of the
+
-
paper.
+
-
 
+
-
An ugly side effect of the above is that the modified throwTo also
+
-
sent the Exception via throwTo as well as the (T)Chan. So it is
+
-
possible that Exception could be raised twice depending on how the
+
-
child is written. And the order of exceptions in the channel may not
+
-
be the order that the throwTo's deliver the asynchronous exceptions.
+
-
 
+
-
To alleviate this somewhat you could have the modified throwTo put the
+
-
real exception on the (T)Chan and send a GoLookInTheChan asynchronous
+
-
exception. Reversing this might be possible, but it is not clear how.
+
-
If we could deterministicly wait for a pending asynchronous signal
+
-
then we could write a proper safe point to fix "block to unblock"
+
-
transitions, which we know we cannot do.
+
-
 
+
-
We can try to implement isBlockedThread state by maintaining a stack
+
-
of block/unblock operations:
+
-
 
+
-
<haskell>
+
-
checkedBlock stack queue io = do
+
-
x <- bracket_ (modifyIORef stack (True:))
+
-
(modifyIORef stack tail)
+
-
(io)
+
-
checkChan queue x
+
-
 
+
-
checkedUnblock stack queue io = do
+
-
checkChan queue ()
+
-
x <- bracket_ (modifyIORef stack (False:))
+
-
(modifyIORef stack tail)
+
-
(io)
+
-
 
+
-
isBlockedThread stack = liftM head (readIORef stack)
+
-
</haskell>
+
-
 
+
-
Where "stack :: IORef [Bool]" is created and passed to the child by
+
-
the modified fork operation. The 'stack' above grows without bound
+
-
during recursion and is not tail recursive, so the stack management
+
-
should be changed to the algorithm in the paper... in which case one
+
-
really has recreated the paper's implementation inside Haskell.
+
-
 
+
-
The only alternatives are to
+
-
* Write code that may hang forever at a throwTo/killThread
+
-
* Never use throwTo or killThread making block/unblock irrelevant
+
-
* Never use 'block' (and 'unblock')
+
-
* Only write children with explicit "(unblock ... block (... block
+
-
() ...) >>= raiseQueue queue >>= ...." where the different between
+
-
checked and unchecked block is obvious and explicit.
+
-
* Track and propagate the blocked state as in atomicUpdateFooBar.
+
-
 
+
-
This problem with throwTo/block was not obvious until I stated writing
+
-
small programs to test the implementation in GHC and thus discovered
+
-
that the paper's description had mislead me.
+
-
 
+
-
The key problem is, at least in the presence of block/unblock, that
+
-
Exceptions are never reliably delivered. And this is not a
+
-
theoretical case, but can cause a hang in something as innocuous as
+
-
"program A" given above.
+

Current revision


throwTo & block statements fixed in GHC 6.6.1

The problem which used to be described here has been fixed in GHC HEAD and this will be included in GHC version 6.6.1

I have built (Feb 9, 2007) the HEAD version of GHC and this problem is no longer present.

See the ticket at [1] and the comments near the top of [2].

The description which was here has been deleted, but is still available in the historical views of this page.