Increasing number of worker tasks in RTS (GHC 7.4.1) - how to debug?

Sanket Agrawal sanket.agrawal at gmail.com
Sat Feb 25 21:41:11 CET 2012


On further investigation, the problem seems to be specific to Mac OS X Lion
(I am running 10.7.3). All tests were run with the -N3 option:

- I can reliably crash the code with a segfault or bus error if I create
more than 8 threads in the C FFI code (each thread creates its own mutex,
for 1-1 coordination with the Haskell timer thread). My iMac has 4
processors. In gdb, I can see that the crash happens in
__psynch_cvsignal (), which seems to be related to pthread mutexes.

- If I increase the number of C FFI threads (and hence, pthread mutexes) to
7 or more, the number of tasks starts increasing. 8 is the maximum number
of FFI threads in my testing at which the code runs without crashing. But
there seems to be some kind of pthread-mutex-related leak. The timer
thread forks 8 parallel Haskell threads to acquire the mutexes from each of
the C FFI threads. Although the function returns after acquiring the mutex,
collecting data, and releasing the mutex, some of the threads seem to be
marked as active by the GC because of the mutex memory leak. Exactly how,
I don't know.

- If I keep the number of C FFI threads at 6 or fewer, there is no memory
leak. The number of tasks stays steady.

So, it seems to be a pthread library issue (and not a GHC issue). Something
to keep in mind when developing code on the Mac that involves mutex
coordination with the C FFI.
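
A minimal sketch of the per-thread mutex coordination described above, with
every pthread return code checked, may help narrow down a crash like this.
It is not the original program: the names slot_t, slot_init, slot_put,
slot_take and slot_destroy are hypothetical, and it assumes each C thread
pairs its mutex with a condition variable (the crash in __psynch_cvsignal
suggests a condition variable is being signalled, but that is a guess).

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-thread slot: one mutex and one condition variable per
   producer thread, mirroring the 1-1 coordination described above. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int             has_data;  /* predicate guarded by lock */
    int             value;
} slot_t;

/* Report a failing pthread call. pthread functions return the error
   number directly instead of setting errno. */
static int check(int rc, const char *what) {
    if (rc != 0)
        fprintf(stderr, "%s failed: %s\n", what, strerror(rc));
    return rc;
}

/* Initialise the slot; returns 0 on success, -1 on failure. */
int slot_init(slot_t *s) {
    assert(s != NULL);
    s->has_data = 0;
    s->value = 0;
    if (check(pthread_mutex_init(&s->lock, NULL), "pthread_mutex_init"))
        return -1;
    if (check(pthread_cond_init(&s->ready, NULL), "pthread_cond_init")) {
        pthread_mutex_destroy(&s->lock);
        return -1;
    }
    return 0;
}

/* Producer side: publish a value and wake the waiting consumer. */
int slot_put(slot_t *s, int v) {
    if (check(pthread_mutex_lock(&s->lock), "pthread_mutex_lock"))
        return -1;
    s->value = v;
    s->has_data = 1;
    check(pthread_cond_signal(&s->ready), "pthread_cond_signal");
    return check(pthread_mutex_unlock(&s->lock), "pthread_mutex_unlock") ? -1 : 0;
}

/* Consumer side (the call a Haskell thread would make through the FFI):
   block until a value is available, then take it. */
int slot_take(slot_t *s, int *out) {
    if (check(pthread_mutex_lock(&s->lock), "pthread_mutex_lock"))
        return -1;
    while (!s->has_data) {  /* loop guards against spurious wakeups */
        if (check(pthread_cond_wait(&s->ready, &s->lock), "pthread_cond_wait")) {
            pthread_mutex_unlock(&s->lock);
            return -1;
        }
    }
    *out = s->value;
    s->has_data = 0;
    return check(pthread_mutex_unlock(&s->lock), "pthread_mutex_unlock") ? -1 : 0;
}

/* Destroy the slot only after no thread can still lock or signal it. */
int slot_destroy(slot_t *s) {
    int rc1 = check(pthread_cond_destroy(&s->ready), "pthread_cond_destroy");
    int rc2 = check(pthread_mutex_destroy(&s->lock), "pthread_mutex_destroy");
    return (rc1 || rc2) ? -1 : 0;
}
```

If any of these calls reports an error before the crash, that points at the
C side (for example a slot used before slot_init or after slot_destroy)
rather than at the RTS. On the Haskell side a blocking call such as
slot_take would normally be a safe foreign import, and in the threaded RTS
each in-progress safe call occupies an OS thread, so a call that never
returns may show up as an extra task.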


On Sat, Feb 25, 2012 at 2:59 PM, Sanket Agrawal <sanket.agrawal at gmail.com> wrote:

> I wrote a program that uses a timed thread to collect data from a C
> producer (using FFI). The number of threads in the C producer is fixed
> (they are created at init). One Haskell timer thread uses threadDelay to
> run itself at a timed interval. When I look at the RTS output after
> killing the program after a couple of timer iterations, I see the number
> of worker tasks increasing with time.
>
> For example, below is the output after 20 iterations of the timer event:
>
>                       MUT time (elapsed)       GC time  (elapsed)
>   Task  0 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
>   Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
>   .......output until task 37 snipped as it is same as task 1.......
>   Task 38 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
>   Task 39 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
>   Task 40 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
>   Task 41 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
>   Task 42 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
>   Task 43 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
>   Task 44 (worker) :    0.52s    ( 10.74s)       0.00s    (  0.00s)
>   Task 45 (worker) :    0.52s    ( 10.75s)       0.00s    (  0.00s)
>   Task 46 (worker) :    0.52s    ( 10.75s)       0.00s    (  0.00s)
>   Task 47 (bound)  :    0.00s    (  0.00s)       0.00s    (  0.00s)
>
>
> After two iterations of timer event:
>
>                        MUT time (elapsed)       GC time  (elapsed)
>   Task  0 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
>   Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
>   Task  2 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
>   Task  3 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
>   Task  4 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
>   Task  5 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
>   Task  6 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
>   Task  7 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
>   Task  8 (worker) :    0.48s    (  1.80s)       0.00s    (  0.00s)
>   Task  9 (worker) :    0.48s    (  1.81s)       0.00s    (  0.00s)
>   Task 10 (worker) :    0.48s    (  1.81s)       0.00s    (  0.00s)
>   Task 11 (bound)  :    0.00s    (  0.00s)       0.00s    (  0.00s)
>
>
> The Haskell code has one forkIO call to kick off the C FFI code, which
> creates 8 threads. Runtime options are "-N3 +RTS -s". The timer event is
> kicked off after the forkIO. It is of the form (pseudo-code):
>
> timerevent <other arguments> time = run
>   where
>     run = do threadDelay time >> do some work >> run
>     <other variables defined for run function>
>
> I also wrote a simpler program using just the timer event (fork one timer
> event, and run another timer event after that), but didn't see any tasks
> in the RTS output.
>
> I tried searching the GHC pages for documentation on the RTS output, but
> didn't find anything that could help me debug the above issue. I suspect
> that the timer event is the root cause of the increasing number of tasks
> (with all but the last 9 tasks idle - I guess 8 tasks belong to the C FFI,
> and one task to the timerevent thread), and hence of a memory leak.
>
> I would appreciate pointers on how to debug it. The timerevent does forkIO
> a call to send the collected data from the C FFI to a db server, but
> disabling that fork still results in the increasing number of tasks. So,
> it seems strongly correlated with the timer event, though I am unable to
> reproduce it with a simpler version of the timer event (which removes the
> MVar sync/callback from the C FFI).
>