[Haskell-cafe] forkProcess, forkIO, and multithreaded runtime

Alexander Kjeldaas alexander.kjeldaas at gmail.com
Mon Jan 21 10:42:37 CET 2013


Or this.  It seems that you must compile with DEBUG for the mutex check,
since that is what enables error-checking mutexes on POSIX.

Alexander

diff --git a/rts/posix/OSThreads.c b/rts/posix/OSThreads.c
index ae31966..e07221d 100644
--- a/rts/posix/OSThreads.c
+++ b/rts/posix/OSThreads.c
@@ -91,7 +91,8 @@ initCondition( Condition* pCond )
 void
 closeCondition( Condition* pCond )
 {
-  pthread_cond_destroy(pCond);
+  int ret = pthread_cond_destroy(pCond);
+  CHECKM(ret == 0, "RTS Bug! Someone is waiting on condvar ret=%d.", ret);
   return;
 }

@@ -165,7 +166,8 @@ initMutex(Mutex* pMut)
 void
 closeMutex(Mutex* pMut)
 {
-    pthread_mutex_destroy(pMut);
+    int ret = pthread_mutex_destroy(pMut);
+    CHECKM(ret == 0, "RTS Bug! Destroying held mutex ret=%d", ret);
 }

 void

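For reference, here is roughly what that DEBUG-only error-checking setup
amounts to in plain pthreads terms (a sketch only, not the RTS's exact
initMutex):

/* Sketch: initialise an error-checking mutex on POSIX.  With
 * PTHREAD_MUTEX_ERRORCHECK, relocking by the owner returns EDEADLK and
 * unlocking from a non-owner returns EPERM instead of being undefined,
 * and many implementations will then also report EBUSY from
 * pthread_mutex_destroy on a mutex that is still locked. */
#include <pthread.h>

static void init_mutex_checked(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}

The CHECKM calls in the patch above only catch anything if these calls
actually return an error code rather than silently invoking undefined
behaviour.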

On Mon, Jan 21, 2013 at 10:14 AM, Alexander Kjeldaas <
alexander.kjeldaas at gmail.com> wrote:

> I think you can test this theory with this patch.  If a thread is waiting
> on the task->cond condition variable which is matched up with task->lock,
> then pthread_cond_destroy will return EBUSY, which must always be a bug in
> the RTS.
>
> Alexander
>
> diff --git a/rts/posix/OSThreads.c b/rts/posix/OSThreads.c
> index ae31966..0f12830 100644
> --- a/rts/posix/OSThreads.c
> +++ b/rts/posix/OSThreads.c
> @@ -91,7 +91,8 @@ initCondition( Condition* pCond )
>  void
>  closeCondition( Condition* pCond )
>  {
> -  pthread_cond_destroy(pCond);
> +  int ret = pthread_cond_destroy(pCond);
> +  CHECKM(ret == 0, "RTS BUG! Someone is waiting on condvar %d.", ret);
>    return;
>  }
>
>
>
> On Mon, Jan 21, 2013 at 8:18 AM, Alexander Kjeldaas <
> alexander.kjeldaas at gmail.com> wrote:
>
>>
>> I just looked at this code, and since I don't know it well I can't give
>> you good solutions, but for others watching this thread the links below
>> might prove interesting.
>>
>> My main theory is that you do have some other thread in FFI-land while
>> you are fork()ing.  The task->cond and task->lock seem to be related to
>> this (see the quoted comments below).
>>
>> Also, pthread_mutex_destroy is undefined if the mutex is still locked, so
>> I am guessing that task->lock is somehow locked when it shouldn't be.
>>
>> It isn't clear from your description whether this happens consistently on
>> Linux or only sometimes.
>>
>> The forkProcess() code seems to hold all capabilities during fork, but
>> that does not include FFI-land threads AFAIU.
>>
>> Assuming that this happens only rarely, I am trying to understand what
>> happens if the thread that is in FFI-land returns to the RTS (in the
>> parent) after fork(), but before the freeTask() in the child.  Based on the
>> descriptions I read, it seems likely that this thread will try to inspect
>> task->cap, which requires holding task->lock.
>>
>> That would in turn make the pthread_mutex_destroy in the child invalid.
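>>
>> A standalone sketch of that suspected failure mode, in plain pthreads
>> (illustrative code, not RTS code; all names here are made up): a helper
>> thread grabs a mutex and blocks, standing in for a thread that is out in
>> FFI-land, and the main thread then fork()s.  The child inherits a copy of
>> the mutex in its locked state, and destroying it there is undefined per
>> POSIX; with an error-checking mutex it typically shows up as an EBUSY
>> return instead.
>>
>> #include <pthread.h>
>> #include <stdio.h>
>> #include <string.h>
>> #include <sys/wait.h>
>> #include <unistd.h>
>>
>> static pthread_mutex_t lock;
>>
>> static void *holder(void *arg)
>> {
>>     (void)arg;
>>     pthread_mutex_lock(&lock);   /* hold the lock, as if in foreign code */
>>     pause();                     /* never unlocks */
>>     return NULL;
>> }
>>
>> int main(void)
>> {
>>     pthread_mutexattr_t attr;
>>     pthread_mutexattr_init(&attr);
>>     pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
>>     pthread_mutex_init(&lock, &attr);
>>
>>     pthread_t t;
>>     pthread_create(&t, NULL, holder, NULL);
>>     sleep(1);                    /* crude, but lets the helper take the lock */
>>
>>     pid_t pid = fork();
>>     if (pid == 0) {
>>         /* Child: the holder thread does not exist here, but the mutex was
>>            copied in its locked state, like task->lock in the theory above.
>>            (printf after fork is not async-signal-safe; fine for a demo.) */
>>         int ret = pthread_mutex_destroy(&lock);
>>         printf("child: pthread_mutex_destroy = %d (%s)\n",
>>                ret, ret == 0 ? "ok" : strerror(ret));
>>         _exit(0);
>>     }
>>     waitpid(pid, NULL, 0);
>>     return 0;
>> }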
>>
>> https://github.com/ghc/ghc/blob/master/rts/Task.h#L57
>>
>> """
>>  ...
>>  When a task is migrated from sleeping on one Capability to another,
>>    its task->cap field must be modified.  When the task wakes up, it
>>    will read the new value of task->cap to find out which Capability
>>    it belongs to.  Hence some synchronisation is required on
>>    task->cap, and this is why we have task->lock.
>>
>>    If the Task is not currently owned by task->id, then the thread is
>>    either
>>
>>      (a) waiting on the condition task->cond.  The Task is either
>>          (1) a bound Task, the TSO will be on a queue somewhere
>>          (2) a worker task, on the spare_workers queue of task->cap.
>>    ...
>> """
>>
>> freeTask:
>> https://github.com/ghc/ghc/blob/master/rts/Task.c#L142
>>
>> the comment in freeTask refers to this test:
>>
>> https://github.com/ghc/testsuite/blob/master/tests/concurrent/should_run/conc059.hs
>>
>> That test calls the RTS from C, which then forkIOs off actions that are
>> still outstanding when the RTS exits.
>>
>> The child-side code in forkProcess:
>> https://github.com/ghc/ghc/blob/master/rts/Schedule.c#L1837
>>
>> It looks like all this code supports the notion that some other thread
>> can be in foreign code during the fork() call.
>>
>> discardTasksExcept
>> https://github.com/ghc/ghc/blob/master/rts/Task.c#L305
>>
>>
>> Alexander
>>
>>
>> On Mon, Jan 21, 2013 at 12:15 AM, Mark Lentczner <
>> mark.lentczner at gmail.com> wrote:
>>
>>> Sorry to be reviving this thread so long after.... but I seem to be
>>> running into similar issues as Michael S. did at the start.
>>>
>>> In short, I'm using forkProcess with the threaded RTS, and see
>>> occasional hangs:
>>>
>>>    - I see these only on Linux. On Mac OS X, I never do.
>>>    - I'm using GHC 7.4.2
>>>    - I noticed the warning in the doc for forkProcess, but assumed I
>>>    was safe, as I wasn't holding any shared resources at the time of the fork,
>>>    and no shared resources in the program are used in the child.
>>>    - With gdb, I've traced the hang to here in the run-time: forkProcess
>>>    -> discardTasksExcept -> freeTask -> closeMutex(&task->lock)
>>>    -> pthread_mutex_destroy
>>>
>>> The discussion in this thread leaves me with these questions:
>>>
>>>    - Is there reason to think the situation has gotten better in 7.6
>>>    and later?
>>>    - Isn't the only reason *System.Process* is safer because it does an
>>>    immediate exec in the child? Alas, I really want to just fork() sometimes.
>>>    - Is it really true that even if my program shares no resources
>>>    with the child, the IO subsystem and FFI system do anyway? Surely the
>>>    RTS would take care of doing the right thing with those, no?
>>>    - There should be no concern with exec w.r.t. library invariants
>>>    since exec is wholesale replacement - all the libraries will
>>>    reinitialize. Is there a problem here I'm missing?
>>>
>>> Alas, I've stopped using the threaded RTS until I understand this better.
>>>
>>> - Mark
>>>
>>
>>
>