Hi Edward,<div><br></div><div>I was just about to get back to you about this. The issue was indeed a single GHC thread servicing callbacks from 5 C threads (a 1:5 mapping), so the C threads blocked in the callback waiting for the only GHC thread to become available. I updated the code to use a 1:1 mapping, with 5 GHC threads for 5 C threads, and that scaled almost linearly.</div>

<div><br></div><div>John Latos suggested the above approach two days ago, but I didn&#39;t get to test the idea until now.</div><div><br></div><div>Increasing the number of GHC threads doesn&#39;t seem to matter if the mapping between GHC threads and C threads is not 1:1. I got the 1:1 mapping by calling forkIO once for each C thread. Is it actually possible to do a 7:5 mapping (that is, 7 GHC threads to choose from for 5 C threads during callbacks)? I can&#39;t think of a way to do it. Not that I need it; I am just curious whether it is possible.</div>
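<div><br></div><div>For reference, here is a minimal sketch of the 1:1 arrangement described above. It is not the code from the StackOverflow post: the C side is replaced by a &quot;dynamic&quot; foreign import so the example is self-contained, and the names mkCallback, callFromC, workers and callsPerWorker are made up for illustration. Each forkIO&#39;d GHC thread drives its own stream of callbacks instead of all of them waiting on a single thread.</div>

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, replicateM_)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import Foreign.C.Types (CInt)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

type Callback = CInt -> IO CInt

-- Wrap a Haskell function as a C function pointer (hypothetical name).
foreign import ccall "wrapper"
  mkCallback :: Callback -> IO (FunPtr Callback)

-- Stand-in for the C side (hypothetical name): invoke the function
-- pointer through the C calling convention, which re-enters Haskell.
foreign import ccall safe "dynamic"
  callFromC :: FunPtr Callback -> Callback

main :: IO ()
main = do
  let workers = 5 :: Int           -- assumed: one GHC thread per C thread
      callsPerWorker = 1000 :: Int -- kept small for the sketch
  counter <- newIORef (0 :: Int)
  cb <- mkCallback $ \x -> do
    atomicModifyIORef' counter (\n -> (n + 1, ()))
    return (x + 1)
  -- Fork one GHC thread per worker; each gets an MVar to signal completion.
  dones <- forM [1 .. workers] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ callsPerWorker (callFromC cb 0)
      putMVar done ()
    return done
  mapM_ takeMVar dones
  freeHaskellFunPtr cb
  total <- readIORef counter
  print total  -- 5 workers * 1000 calls each
```

<div>To let the workers run in parallel, this would need to be built with -threaded and run with a suitable +RTS -N setting.</div>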

<div><br></div><div>Thanks,</div><div>Sanket<br><br><div class="gmail_quote">On Fri, Jan 20, 2012 at 11:16 PM, Edward Z. Yang <span dir="ltr">&lt;<a href="mailto:ezyang@mit.edu">ezyang@mit.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hello Sanket,<br>

<br>

What happens if you run this experiment with 5 threads in the C function,<br>

and have GHC run RTS with -N7? (e.g. five C threads + seven GHC threads = 12<br>

threads on your 12-core box.)<br>

<div><div></div><div class="h5"><br>

Edward<br>

<br>

Excerpts from Sanket Agrawal&#39;s message of Tue Jan 17 23:31:38 -0500 2012:<br>

&gt; I posted this issue on StackOverflow today. A brief recap:<br>

&gt;<br>

&gt;  In the case when C FFI calls back a Haskell function, I have observed a<br>

&gt; sharp increase in total time when multi-threading is enabled in C code<br>

&gt; (even when the total number of function calls to Haskell remains the same). In my<br>

&gt; test, I called a Haskell function 5M times using two scenarios (GHC 7.0.4,<br>

&gt; RHEL5, 12-core box):<br>

&gt;<br>

&gt;<br>

&gt;    - Single-threaded C function: call back Haskell function 5M times -<br>

&gt;    Total time 1.32s<br>

&gt;    - 5 threads in C function: each thread calls back the Haskell function 1M<br>

&gt;    times - so, total is still 5M - Total time 7.79s - Verified that pthread<br>

&gt;    didn&#39;t contribute much to the overhead by having the same code call a C<br>

&gt;    function instead and comparing with the single-threaded version. So, almost all<br>

&gt;    of the increase in overhead seems to come from the GHC runtime.<br>

&gt;<br>

&gt; What I want to ask is whether this is a known issue for the GHC runtime. If not, I<br>

&gt; will file a bug report for the GHC team with code to reproduce it. I don&#39;t want<br>

&gt; to file a duplicate bug report if this is an already known issue. I searched<br>

&gt; through GHC trac using some keywords but didn&#39;t see any bugs related to it.<br>

&gt;<br>

&gt; StackOverflow post link (has code and details on how to reproduce the<br>

&gt; issue):<br>

&gt; <a href="http://stackoverflow.com/questions/8902568/runtime-performance-degradation-for-c-ffi-callback-when-pthreads-are-enabled" target="_blank">http://stackoverflow.com/questions/8902568/runtime-performance-degradation-for-c-ffi-callback-when-pthreads-are-enabled</a><br>


</div></div></blockquote></div><br></div>