The speedup is around 6 times on a 12-core machine, which I think is pretty decent given that the parallelised section is only a part of my code. The nested parMaps were left over from a previous implementation. I have moved to using just the inner one, since the outer map doesn&#39;t divide the work equally and causes one thread to do most of the work.<br>
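<br>For illustration only, here is a minimal sketch of the difference between nesting parMap and parallelising only the inner map. It assumes the parallel package (Control.Parallel.Strategies); the function work, the chunk size of 100 and the input rows are hypothetical stand-ins, not the code discussed in this thread.<br>

```haskell
import Control.Parallel.Strategies (parListChunk, parMap, rdeepseq, using)

-- Hypothetical stand-in for the real per-element computation.
work :: Int -> Int
work x = sum [1 .. x `mod` 1000]

-- Nested version: one spark per row and, inside it, one spark per element.
nested :: [[Int]] -> [[Int]]
nested = parMap rdeepseq (parMap rdeepseq work)

-- Inner-only version: rows are traversed sequentially,
-- and the elements of each row are evaluated in parallel.
innerOnly :: [[Int]] -> [[Int]]
innerOnly = map (parMap rdeepseq work)

-- Chunked variant: fewer, larger sparks.
-- The chunk size of 100 is an arbitrary assumption and would need tuning.
chunked :: [[Int]] -> [[Int]]
chunked = map (\row -> map work row `using` parListChunk 100 rdeepseq)

main :: IO ()
main = do
  -- Rows of uneven length, so a per-row split would be unbalanced.
  let rows = map (\n -> [1 .. n]) [1000, 2000 .. 10000]
  print (sum (map sum (innerOnly rows)))
```

This would need to be built with -threaded and run with something like +RTS -N12 -s to see the spark counts. All three variants compute the same result and differ only in how many sparks they create.<br>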

<br><div class="gmail_quote">On Mon, Oct 10, 2011 at 4:04 PM, Christopher Brown <span dir="ltr">&lt;<a href="mailto:cmb21@st-andrews.ac.uk">cmb21@st-andrews.ac.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div style="word-wrap:break-word">What kind of speedup are you getting?<div><br></div><div>422000 is a lot of sparks. This could be because you are nesting your parMaps (why do you do this?)</div><div><br></div><div>

Chris.</div><div><br></div><div><br></div><div><br><div><div><div></div><div class="h5"><div>On 10 Oct 2011, at 15:44, Tom Thorne wrote:</div><br></div></div><blockquote type="cite"><div><div></div><div class="h5">Thanks! I just tried setting -A32M and this seems to fix the parallel GC problems. I now get a speedup with parallel GC on, and performance is the same as when passing -qg. I had tried -H before and it only made things worse, but -A seems to do the trick.<div>


<br></div><div>I&#39;m still having problems with segmentation faults, though. Depending on how I apply parMap, and whether I use monad-par or Control.Parallel, they seem to come and go arbitrarily. In a successful run that lasted about 30s in total with Control.Parallel, +RTS -s reports:</div>


<div>SPARKS: 422712 (394377 converted, 0 pruned)</div><div><br></div><div>Am I creating too many sparks?</div><div><br><div class="gmail_quote">On Mon, Oct 10, 2011 at 3:07 PM, Gregory Collins <span dir="ltr">&lt;<a href="mailto:greg@gregorycollins.net" target="_blank">greg@gregorycollins.net</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On Mon, Oct 10, 2011 at 3:55 PM, Tom Thorne &lt;<a href="mailto:thomas.thorne21@gmail.com" target="_blank">thomas.thorne21@gmail.com</a>&gt; wrote:<br>


&gt;<br>

&gt; Yes, I will try to run ThreadScope on it. I tried it before, and it produced about 1.8GB of event log output and then crashed.<br>

&gt; Is there any way to tell the RTS to perform GC less often? My code doesn&#39;t use too much memory and I&#39;m using fairly hefty machines (e.g. one with 48 cores and 128GB of RAM), so perhaps the default/heuristic settings aren&#39;t optimal.<br>


<br>

</div>Increasing &quot;-A&quot; and &quot;-H&quot; in the RTS options should help with this.<br>

<br>

G<br>

<font color="#888888">--<br>

Gregory Collins &lt;<a href="mailto:greg@gregorycollins.net" target="_blank">greg@gregorycollins.net</a>&gt;<br>

</font></blockquote></div><br></div></div></div><div class="im">

_______________________________________________<br>Haskell-Cafe mailing list<br><a href="mailto:Haskell-Cafe@haskell.org" target="_blank">Haskell-Cafe@haskell.org</a><br><a href="http://www.haskell.org/mailman/listinfo/haskell-cafe" target="_blank">http://www.haskell.org/mailman/listinfo/haskell-cafe</a><br>

</div></blockquote></div><br></div></div></blockquote></div><br>