On further investigation, this seems to be very specific to Mac OS X Lion (I am running 10.7.3). All tests were run with the -N3 option:<div><br></div><div>- I can reliably crash the code with a segfault or bus error if I create more than 8 threads in the C FFI code (each thread creates its own mutex, for 1-1 coordination with the Haskell timer thread). My iMac has 4 processors. In gdb, I can see that the crash happens in __psynch_cvsignal (), which seems to be related to pthread mutexes.</div>

<div><br></div><div>- If I increase the number of C FFI threads (and hence pthread mutexes) to &gt;=7, the number of tasks starts increasing. 8 is the maximum number of FFI threads in my testing at which the code runs without crashing, but there seems to be some kind of pthread-mutex-related leak. The timer thread forks 8 parallel Haskell threads to acquire the mutex from each of the C FFI threads. Although each forked function returns after acquiring the mutex, collecting data, and releasing the mutex, some of the threads seem to be marked as active by the GC because of the mutex memory leak. Exactly how, I don&#39;t know.</div>

<div><br></div><div>- If I keep the number of C FFI threads at &lt;=6, there is no memory leak. The number of tasks stays steady.</div><div><br></div><div>So it seems to be a pthread library issue (and not a GHC issue). Something to keep in mind when developing code on the Mac that involves mutex coordination with C FFI.</div>
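<div><br></div><div>A minimal sketch of the collection step described above: fork one Haskell thread per producer, take that producer's lock, read its data, release the lock, and wait for all results. This is only an illustration under stated assumptions. An MVar stands in for each pthread mutex plus the data it guards, and the names collectAll and slots are hypothetical; the real code goes through the C FFI, which is not shown in this message.</div>

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Monad (forM)

-- | Fork one Haskell thread per producer slot. Each thread locks its slot,
-- reads the value, unlocks, and hands the value back through a result MVar.
-- ASSUMPTION: each 'MVar Int' stands in for a C producer's mutex and data.
collectAll :: [MVar Int] -> IO [Int]
collectAll slots = do
  results <- forM slots $ \slot -> do
    out <- newEmptyMVar
    -- withMVar takes the lock, runs the action, and always releases the lock.
    _ <- forkIO $ withMVar slot (putMVar out)
    return out
  -- Block until every forked thread has delivered its value.
  mapM takeMVar results

main :: IO ()
main = do
  slots <- mapM newMVar [1 .. 6]   -- six stand-in producers
  xs <- collectAll slots
  print (sum xs)
```

<div>Because every forked thread finishes after one lock/unlock cycle, the number of live Haskell threads returns to its baseline after each call in this sketch. That is the behaviour I expected from the real program as well.</div>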

<div><br><br><div class="gmail_quote">On Sat, Feb 25, 2012 at 2:59 PM, Sanket Agrawal <span dir="ltr">&lt;<a href="mailto:sanket.agrawal@gmail.com">sanket.agrawal@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I wrote a program that uses a timed thread to collect data from a C producer (using FFI). The number of threads in the C producer is fixed (they are created at init). One Haskell timer thread uses threadDelay to run itself at a timed interval. When I look at the RTS output after killing the program after a couple of timer iterations, I see the number of worker tasks increasing with time.<div>


<br></div><div> For example, below is the output after 20 iterations of the timer event:<div><br></div><div><div><div>                      MUT time (elapsed)       GC time  (elapsed)</div><div>  Task  0 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)</div>


<div>  Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)</div><div>  .......output until task 37 snipped as it is same as task 1.......</div><div>  Task 38 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)</div>


<div>  Task 39 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)</div><div>  Task 40 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)</div><div>  Task 41 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)</div>


<div>  Task 42 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)</div><div>  Task 43 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)</div><div>  Task 44 (worker) :    0.52s    ( 10.74s)       0.00s    (  0.00s)</div>


<div>  Task 45 (worker) :    0.52s    ( 10.75s)       0.00s    (  0.00s)</div><div>  Task 46 (worker) :    0.52s    ( 10.75s)       0.00s    (  0.00s)</div><div>  Task 47 (bound)  :    0.00s    (  0.00s)       0.00s    (  0.00s)</div>


</div></div><div><br></div><div><br></div><div>After two iterations of timer event:</div><div><br></div><div><div>                       MUT time (elapsed)       GC time  (elapsed)</div><div>  Task  0 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)</div>


<div>  Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)</div><div>  Task  2 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)</div><div>  Task  3 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)</div>


<div>  Task  4 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)</div><div>  Task  5 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)</div><div>  Task  6 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)</div>


<div>  Task  7 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)</div><div>  Task  8 (worker) :    0.48s    (  1.80s)       0.00s    (  0.00s)</div><div>  Task  9 (worker) :    0.48s    (  1.81s)       0.00s    (  0.00s)</div>


<div>  Task 10 (worker) :    0.48s    (  1.81s)       0.00s    (  0.00s)</div><div>  Task 11 (bound)  :    0.00s    (  0.00s)       0.00s    (  0.00s)</div></div><div><br></div><div><br></div><div>The Haskell code has one forkIO call to kick off the C FFI code, which creates 8 threads. Runtime options are &quot;-N3 +RTS -s&quot;. The timer event is kicked off after the forkIO. It is of the form (pseudo-code):</div>


<div><br></div><div>timerevent &lt;other arguments&gt; time = run where run = do threadDelay time &gt;&gt; do some work &gt;&gt; run where &lt;other variables defined for run function&gt;</div><div><br></div></div><div>I also wrote a simpler program using just the timer event (fork one timer event, and run another timer event after that), but didn&#39;t see any tasks in the RTS output. </div>
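<div><br></div><div>To make the timerevent pseudo-code above concrete, here is a minimal runnable sketch of the same loop shape. It is an illustration only: the iteration bound n, the name timerEvent, and the putStrLn placeholder for the work are assumptions added so the example terminates. The real loop runs forever and calls into the C FFI.</div>

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import Control.Monad (when)

-- | Sleep for 'delayUs' microseconds, do the work, and repeat.
-- ASSUMPTION: the bound 'n' exists only so that this sketch terminates.
timerEvent :: Int -> Int -> IO () -> IO ()
timerEvent n delayUs work = run n
  where
    run k = when (k > 0) $ do
      threadDelay delayUs   -- wait for the next tick
      work                  -- placeholder for collecting data from the C side
      run (k - 1)

main :: IO ()
main = do
  done <- newEmptyMVar
  -- Run the timer in its own Haskell thread, as the real program does.
  _ <- forkIO $ do
    timerEvent 3 100000 (putStrLn "tick")
    putMVar done ()
  -- Keep the main thread alive until the timer thread has finished.
  takeMVar done
```

<div>A sketch like this can be built with ghc -threaded -rtsopts and run with +RTS -N3 -s to compare its RTS summary against that of the full program.</div>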


<div><br></div><div>I tried searching the GHC pages for documentation on the RTS output, but didn&#39;t find anything that could help me debug the above issue. I suspect that the timer event is the root cause of the increasing number of tasks (with all but the last 9 tasks idle; I guess 8 tasks belong to the C FFI and one task to the timerevent thread), and hence of the memory leak. </div>


<div><br></div><div>I would appreciate pointers on how to debug it. The timerevent does forkIO a call to send the data collected from the C FFI to a db server, but disabling that fork still results in an increasing number of tasks. So it seems strongly correlated with the timer event, though I am unable to reproduce it with a simpler version of the timer event (which removes the MVar sync/callback from the C FFI).</div>


</blockquote></div><br></div>