<div dir="ltr"><div>OK, could you add those comments (about additional operations to consider) to the ticket?</div><div><br></div><div>Relatedly: if we want these atomic ops to use their sequential analogues when we&#39;re not using the threaded runtime system, does that mean </div>


<div>we need a symbol (a constant) exposed by the RTS we link against, so that the inline code can branch on a link-time constant (something like &quot;isThreadedRTS :: Bool&quot;), or some analogue thereof?  </div>


<div><br></div><div>One nice thing about doing this is that if link-time optimization is ever added, the branch would go away! On the other hand, one could argue that a call to the CAS primops in their current form isn&#39;t much more expensive than such a branch. </div>
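<div>A minimal C sketch of the link-time-constant idea, under stated assumptions: <font face="courier new, monospace">ghc_is_threaded_rts_sketch</font>, <font face="courier new, monospace">cas_dispatch_sketch</font> and the two helpers are hypothetical names, not RTS symbols, and the atomic path uses the GCC/Clang <font face="courier new, monospace">__atomic</font> builtins rather than whatever the codegen would actually emit.</div>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical link-time flag. In the scheme discussed above the RTS would
 * export it (true only in the threaded RTS) and callers would only declare
 * it `extern`; it is defined here so the sketch compiles on its own. */
const bool ghc_is_threaded_rts_sketch = false;

/* Sequential analogue: only valid when no other OS thread can touch *p. */
static uintptr_t cas_sequential(volatile uintptr_t *p, uintptr_t old_val,
                                uintptr_t new_val)
{
    uintptr_t cur = *p;
    if (cur == old_val) {
        *p = new_val;
    }
    return cur;
}

/* Atomic version via the GCC/Clang __atomic builtins (seq_cst). */
static uintptr_t cas_atomic(volatile uintptr_t *p, uintptr_t old_val,
                            uintptr_t new_val)
{
    uintptr_t expected = old_val;
    /* On failure `expected` is overwritten with the current value; on
     * success it already equals the previous value. */
    __atomic_compare_exchange_n(p, &expected, new_val, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}

/* The inline code branches on the flag; once its value is known at link
 * time, link-time optimization could fold the branch away. */
uintptr_t cas_dispatch_sketch(volatile uintptr_t *p, uintptr_t old_val,
                              uintptr_t new_val)
{
    return ghc_is_threaded_rts_sketch ? cas_atomic(p, old_val, new_val)
                                      : cas_sequential(p, old_val, new_val);
}
```

<div>With the flag defined in the same file, as here, an optimizing compiler can typically drop the dead branch already; the open question is whether that still happens when the value is only known at link time.</div>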


<div><br></div><div style>I should add that question to the ticket, but it&#39;s worth hashing out first.</div><div><br></div><div><br></div><div style>Thoughts? I&#39;m probably overlooking some parts of this too.</div><div style>


-Carter</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Jul 20, 2013 at 1:49 PM, Ryan Newton <span dir="ltr">&lt;<a href="mailto:rrnewton@gmail.com" target="_blank">rrnewton@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Ah, I see.  There are several ways this could be done.  With the &quot;substitute the cas funcall&quot; line I thought you were going for an intermediate solution that would help the LLVM backend but not the native codegen.  I was thinking you would leave the out-of-line primop definition for, e.g., casMutVar#, but fix the ccall to &quot;cas&quot; within that primop, so that you don&#39;t need a C function call sequence.  But it sounds like you are going whole hog and going right for inline primops!  Great.<div>


<br></div><div>Actually, there are some places where I am ignorant of what optimizations the backend(s) can do (and I haven&#39;t been able to learn the answer <a href="http://ghc.haskell.org/trac/ghc/wiki/Commentary/PrimOps" target="_blank">from the commentary yet</a>).  For example, I assume calls to C are never inlinable, but <b>are &quot;out of line&quot; primops inlinable</b>?  </div>


<div>You alluded to the double call overhead: first for the out-of-line casMutVar# and then for the C function &quot;cas&quot;.  Does that mean &quot;no&quot;, they are not inlinable?  (There is one sentence in the commentary that makes it sound like &quot;no&quot;: <span style="font-size:12.800000190734863px;font-family:Verdana,Arial,&#39;Bitstream Vera Sans&#39;,Helvetica,sans-serif"><i>This also changes to code generator to push the continuation of any follow on code onto the stack.</i></span><span style="font-size:12.800000190734863px;font-family:Verdana,Arial,&#39;Bitstream Vera Sans&#39;,Helvetica,sans-serif">)</span></div>


<div><br></div><div>One thing I now understand from looking at Tibbe&#39;s patches is that going to inline primops does NOT necessarily mean forgoing FFI calls.  That patch still uses <span style="background-color:rgb(248,238,199);color:rgb(51,51,51);font-family:Consolas,&#39;Liberation Mono&#39;,Courier,monospace;font-size:12.800000190734863px;line-height:14.399999618530273px;white-space:nowrap">emitForeignCall</span> within <span style="background-color:rgb(248,238,199);color:rgb(51,51,51);font-family:Consolas,&#39;Liberation Mono&#39;,Courier,monospace;font-size:12.800000190734863px;line-height:14.399999618530273px;white-space:nowrap">emitPopCntCall</span>.  Is that what you were planning to do for the atomic primops?  </div>


<div><br></div><div>The alternative, which seemed laborious, is to take code like this:</div><div><br></div><div><div><font face="courier new, monospace" size="1"><b>  cas(StgVolatilePtr p, StgWord o, StgWord n)</b></font></div>


<div><font face="courier new, monospace" size="1"><b>  {</b></font></div><div><font face="courier new, monospace" size="1"><b>  #if i386_HOST_ARCH || x86_64_HOST_ARCH</b></font></div><div><font face="courier new, monospace" size="1"><b>      __asm__ __volatile__ (</b></font></div>


<div><font face="courier new, monospace" size="1"><b>   <span style="white-space:pre-wrap">        </span>  &quot;lock\ncmpxchg %3,%1&quot;</b></font></div><div><font face="courier new, monospace" size="1"><b>            :&quot;=a&quot;(o), &quot;=m&quot; (*(volatile unsigned int *)p) </b></font></div>


<div><font face="courier new, monospace" size="1"><b>            :&quot;0&quot; (o), &quot;r&quot; (n));</b></font></div><div><font face="courier new, monospace" size="1"><b>      return o;</b></font></div><div><font face="courier new, monospace" size="1"><b>  #elif powerpc_HOST_ARCH </b></font></div>


<div><font face="courier new, monospace" size="1"><b>  ....</b></font></div></div><div><br></div><div>and embed its logic within the codegen for the inline primops.  </div><div><br></div><div>-----------------------------------------------------------------------------</div>
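<div>For comparison with the hand-written x86 <font face="courier new, monospace">cas</font> body quoted above, here is a minimal sketch of the same contract using the GCC/Clang <font face="courier new, monospace">__sync_val_compare_and_swap</font> builtin. <font face="courier new, monospace">StgWordSketch</font> and <font face="courier new, monospace">cas_builtin</font> are hypothetical names; the point is only that a compiler backend already knows the per-architecture instruction sequence, which is what inline primops would have to reproduce in GHC&#39;s own codegen.</div>

```c
#include <stdint.h>

typedef uintptr_t StgWordSketch; /* stand-in for GHC's StgWord (assumption) */

/* Same contract as the quoted cas(): if *p == o, store n; in every case
 * return the value *p held before the operation. The builtin is a full
 * barrier. GCC/Clang lower it to `lock cmpxchg` on x86 and to the target's
 * own sequence elsewhere, which is the per-architecture work the #if/#elif
 * branches in SMP.h do by hand. */
static inline StgWordSketch cas_builtin(volatile StgWordSketch *p,
                                        StgWordSketch o, StgWordSketch n)
{
    return __sync_val_compare_and_swap(p, o, n);
}
```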


<div>Anyway, to answer your question about which primops I&#39;d like to see:</div><div><ul><li>CAS on MutVars, MutableArray#, and MutableByteArray#</li><li>fetch and add on MutableByteArray#</li><li>barriers / memory fences</li>


<li>Drafts of .cmm for these <a href="https://github.com/rrnewton/haskell-lockfree-queue/blob/master/AtomicPrimops/cbits/primops.cmm" target="_blank">can be found here</a>.  Note that *only* casMutVar# is currently shipped with GHC.</li>


</ul></div><div>These are the ones I&#39;m using currently.  But there&#39;s no reason that we shouldn&#39;t aim for a fairly &quot;complete set&quot;.  For example, why not have fetch-and-sub and the other &quot;atomicrmw&quot; variants?  Relating these to the LLVM atomics and memory orderings, they become:</div>


<div><ul><li>CAS variants = LLVM <span style="line-height:20px;font-size:13.63636302947998px;font-family:&#39;Lucida Grande&#39;,&#39;Lucida Sans Unicode&#39;,Geneva,Verdana,sans-serif">cmpxchg with SequentiallyConsistent ordering</span></li>


<li><span style="line-height:20px;font-size:13.63636302947998px;font-family:&#39;Lucida Grande&#39;,&#39;Lucida Sans Unicode&#39;,Geneva,Verdana,sans-serif">fetch-and-X variants = LLVM atomicrmw with SequentiallyConsistent ordering</span></li>


<li><span style="line-height:20px;font-size:13.63636302947998px;font-family:&#39;Lucida Grande&#39;,&#39;Lucida Sans Unicode&#39;,Geneva,Verdana,sans-serif">store_load_barrier = LLVM FenceInst with SequentiallyConsistent ordering</span></li>


<li><span style="line-height:20px;font-size:13.63636302947998px;font-family:&#39;Lucida Grande&#39;,&#39;Lucida Sans Unicode&#39;,Geneva,Verdana,sans-serif">write_barrier and load_load_barrier = I *think* these are both covered by a FenceInst with AcquireRelease ordering... [1]</span></li>


</ul></div><div>It would be good for someone else to double-check these, since I&#39;m not yet familiar with LLVM and am just going off the documentation you linked.</div><div><br></div><div>Btw, I&#39;m not sure why SMP.h uses <font face="courier new, monospace">&quot;lock; addl $0,0(%%esp)&quot;</font> instead of the <font face="courier new, monospace">mfence</font> instruction for store_load_barrier on x86, but I believe they should be equivalent.</div>
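<div>To make the proposed mapping concrete, a minimal sketch in C11 <font face="courier new, monospace">&lt;stdatomic.h&gt;</font> terms, whose orderings correspond to LLVM&#39;s (<font face="courier new, monospace">memory_order_seq_cst</font> to SequentiallyConsistent, <font face="courier new, monospace">memory_order_acq_rel</font> to AcquireRelease). All <font face="courier new, monospace">sketch_*</font> names are hypothetical, and the acq_rel entry only mirrors the tentative guess in the list, not a verified design.</div>

```c
#include <stdatomic.h>
#include <stdint.h>

/* CAS variants ~ LLVM cmpxchg, seq_cst. Returns the previous value. */
static uintptr_t sketch_cas(_Atomic uintptr_t *p, uintptr_t old_val,
                            uintptr_t new_val)
{
    uintptr_t expected = old_val;
    atomic_compare_exchange_strong_explicit(p, &expected, new_val,
                                            memory_order_seq_cst,
                                            memory_order_seq_cst);
    return expected; /* previous value whether or not the swap happened */
}

/* fetch-and-X variants ~ LLVM atomicrmw (add, sub, ...), seq_cst.
 * Each returns the value held before the update. */
static intptr_t sketch_fetch_add(_Atomic intptr_t *p, intptr_t delta)
{
    return atomic_fetch_add_explicit(p, delta, memory_order_seq_cst);
}

static intptr_t sketch_fetch_sub(_Atomic intptr_t *p, intptr_t delta)
{
    return atomic_fetch_sub_explicit(p, delta, memory_order_seq_cst);
}

/* store_load_barrier ~ LLVM `fence seq_cst` (a full fence on x86). */
static void sketch_store_load_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* write_barrier / load_load_barrier ~ `fence acq_rel` (tentative).
 * Under the usual reading of C11 fences, acq_rel orders load-load,
 * load-store and store-store, but not store-load. */
static void sketch_acq_rel_barrier(void)
{
    atomic_thread_fence(memory_order_acq_rel);
}
```

<div>If that usual reading holds, it supports the guess that one acq_rel fence covers both write_barrier (store-store) and load_load_barrier, while store_load_barrier genuinely needs the stronger seq_cst fence.</div>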


<div><br></div><div>  -Ryan</div><div><br></div><div>[1] I note that the LLVM documentation says &quot;<span style="line-height:20px;font-size:13.63636302947998px;font-family:&#39;Lucida Grande&#39;,&#39;Lucida Sans Unicode&#39;,Geneva,Verdana,sans-serif">store-store fences are generally not exposed to IR because they are extremely difficult to use correctly.&quot;</span></div>


<div><div>


<div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Jul 20, 2013 at 3:19 AM, Carter Schonwald <span dir="ltr">&lt;<a href="mailto:carter.schonwald@gmail.com" target="_blank">carter.schonwald@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div><br></div><div><br></div>Ryan, you misunderstand (or maybe I&#39;m not quite understanding). It is 3:30 am after all! (I might be better at explaining tomorrow afternoon.)<div>


<br></div><div>The idea is to provide CMM/Haskell-level primops, not to &quot;pattern match on the ccall&quot;. I&#39;m leaving the updating of any CMM code to use such intrinsics as a distinct task to be done subsequently :) </div>


<div><br></div><div>  If you look at the example patches for pop count that David Terei referred me to, <a href="https://github.com/ghc/ghc/commit/2d0438f329ac153f9e59155f405d27fac0c43d65" target="_blank">https://github.com/ghc/ghc/commit/2d0438f329ac153f9e59155f405d27fac0c43d65</a> (for the native code gen) and <a href="https://github.com/ghc/ghc/commit/2906db6c3a3f1000bd7347c7d8e45e65eb2806cb" target="_blank">https://github.com/ghc/ghc/commit/2906db6c3a3f1000bd7347c7d8e45e65eb2806cb</a> (for the LLVM code gen), the pattern for adding new &quot;first class&quot; primops is pretty clear.</div>


<div><br></div><div>Point being, don&#39;t worry about that right now (it&#39;s 3 am after all).</div><div><br></div><div>What I want from you is a clear description of the CMM/Haskell-level primops that would make it easier for you to support great parallelism in GHC, expressed in terms of the LLVM operations and semantics I&#39;ve referred you to. </div>


<div><br></div><div>The final names can be bikeshedded some other time; that doesn&#39;t matter currently. For now, please read my ticket and the LLVM links when you have the bandwidth, and lay out what you&#39;d want primop-wise! </div>


<div><br></div><div>thanks</div><span><font color="#888888"><div>-Carter</div><div><br></div><div><br></div><div><br></div></font></span></div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">


On Sat, Jul 20, 2013 at 2:47 AM, Ryan Newton <span dir="ltr">&lt;<a href="mailto:rrnewton@gmail.com" target="_blank">rrnewton@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>Sorry, &quot;rewrite&quot; was too overloaded a term to use here.  I was just referring to the proposal to &quot;substitute the cas funcall with the right llvm operation&quot;.</div>


<div><br></div><div>


That is, the approach would pattern match on the CMM code &quot;ccall cas&quot; or &quot;foreign &quot;C&quot; cas&quot; (I&#39;m afraid I don&#39;t know the difference between those) and replace it with the equivalent LLVM op, right?</div>


<div><br></div><div>I think the assumption there is that the native codegen would still have to suffer the funcall overhead and use the C versions.  I don&#39;t know exactly what the changes would look like to make barriers/CAS all proper inline primops, because it would have to reproduce in the code generator all the platform-specific #ifdef&#39;d C code that is currently in SMP.h.  I guess that is doable, but probably only for someone who knows the native GHC codegen properly...</div>


<div><br></div></div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Jul 20, 2013 at 2:30 AM, Carter Schonwald <span dir="ltr">&lt;<a href="mailto:carter.schonwald@gmail.com" target="_blank">carter.schonwald@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Ryan, could you explain what you want more precisely? Specifically, what do you want in terms of exposed primops, using the terminology/vocabulary in <a href="http://llvm.org/docs/LangRef.html#ordering" target="_blank">http://llvm.org/docs/LangRef.html#ordering</a> and <a href="http://llvm.org/docs/Atomics.html" target="_blank">http://llvm.org/docs/Atomics.html</a>?<div>


<br></div><div> I&#39;ll first do the work for just the LLVM backend, and I&#39;ll likely need some active guidance/monitoring for the native codegen analogues.</div><div><br></div><div>(I also asked this on the ticket for documentation purposes.)</div>


</div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sat, Jul 20, 2013 at 2:18 AM, Ryan Newton <span dir="ltr">&lt;<a href="mailto:rrnewton@gmail.com" target="_blank">rrnewton@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Hi Carter,<div><br></div><div>Yes, SMP.h is where I&#39;ve copy-pasted the duplicate functionality from (since I can&#39;t presently rely on linking the symbols).</div>


<div><br></div><div>Your proposal for the LLVM backend sounds *<b>great</b>*.  But it is also going to impose additional constraints on getting &quot;atomic-primops&quot; right.  </div>


<div>   The goal of atomic-primops is to be a stable Haskell-level interface into the relevant CAS and fetch-and-add stuff.  The reason this is important is that one has to be very careful to defeat the GHC optimizer in all the relevant places and make pointer equality a reliable property.  I would like to get atomic-primops to work reliably in 7.4, 7.6 [and 7.8] and have more &quot;native&quot; support in future GHC releases, where maybe the foreign primops would become unnecessary.  (They are a pain and have already exposed one blocking Cabal bug, fixed in the upcoming 1.17.)</div>


<div><br></div><div>A couple of additional suggestions for the proposal in ticket #7883:</div><div><ul><li>we should use more distinctive symbol names than &quot;cas&quot;, especially for this rewriting trick.  How about &quot;ghc_cas&quot; or something?</li>


<li>it would be great to get at least fetch-and-add in addition to CAS and barriers</li><li>if we reliably provide this set of special symbols, libraries like atomic-primops may use them in the .cmm and benefit from the CMM-&gt;LLVM substitutions</li>


<li>if we include all the primops I need in GHC proper, the previous bullet will stop applying ;-)</li></ul></div><div>Cheers,</div><div>  -Ryan</div><div><br></div><div>P.S. Just as a bit of motivation, here are some recent performance numbers.  We often wonder how close our &quot;pure values in a box&quot; approach comes to efficient lock-free structures.  <span style="font-size:12.727272033691406px;font-family:arial,sans-serif">Well, here are some numbers comparing a proper unboxed counter in the Haskell heap against an IORef Int with atomicModifyIORef&#39;:  </span><span style="font-size:12.727272033691406px;font-family:arial,sans-serif">up to a 100X performance difference on some platforms for microbenchmarks that hammer a counter:</span></div>


<div><div><br>    <a href="https://github.com/rrnewton/haskell-lockfree-queue/blob/fb12d1121690553e4f737af258848f279147ea24/AtomicPrimops/DEVLOG.md#20130718-timing-atomic-counter-ops" target="_blank">https://github.com/rrnewton/haskell-lockfree-queue/blob/fb12d1121690553e4f737af258848f279147ea24/AtomicPrimops/DEVLOG.md#20130718-timing-atomic-counter-ops</a><br>


</div><div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px"><div><br></div><div>And here are the performance and scaling advantages of using ChaseLev (based on atomic-primops) over a traditional pure-in-a-box structure (IORef Data.Seq). The following are timings of ChaseLev and the traditional structure, respectively, on a 32-core Westmere:</div>


<div><br></div><div>    fib(42) 1 threads:  21s</div><div>    fib(42) 2 threads:  10.1s</div><div>    fib(42) 4 threads:  5.2s (100%prod)</div><div>    fib(42) 8 threads:  2.7s - 3.2s (100%prod) </div><div>    fib(42) 16 threads: 1.28s</div>


<div>    fib(42) 24 threads: 1.85s</div><div>    fib(42) 32 threads: 4.8s (high variance)</div></div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br></div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px">


<div>    (hive) fib(42) 1 threads:  41.8s  (95% prod)</div><div>    (hive) fib(42) 2 threads:  25.2s  (66% prod)</div><div>    (hive) fib(42) 4 threads:  14.6s  (27% prod, 135GB alloc)</div><div>    (hive) fib(42) 8 threads:  17.1s  (26% prod)</div>


<div>    (hive) fib(42) 16 threads: 16.3s  (13% prod)</div><div>    (hive) fib(42) 24 threads: 21.2s  (30% prod)</div><div>    (hive) fib(42) 32 threads: 29.3s  (33% prod)</div></div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px">


<br></div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px">And that is WITH the inefficiency of doing a &quot;ccall&quot; on every single atomic operation.</div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px">


<br></div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px">Notes on parfib performance are here:</div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br></div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px">


<a href="https://github.com/rrnewton/haskell-lockfree-queue/blob/d6d3e9eda2a487a5f055b1f51423954bb6b6bdfa/ChaseLev/Test.hs#L158" target="_blank">https://github.com/rrnewton/haskell-lockfree-queue/blob/d6d3e9eda2a487a5f055b1f51423954bb6b6bdfa/ChaseLev/Test.hs#L158</a><br>
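<div>As a loose C analogy for the counter comparison above (an assumption for illustration; neither function is how GHC implements fetch-and-add on MutableByteArray# or atomicModifyIORef&#39;): a fetch-and-add finishes in one atomic step, while a read-modify-CAS loop, which is roughly what updating a value in a box amounts to, must retry whenever another thread wins the race. <font face="courier new, monospace">counter_incr_faa</font>, <font face="courier new, monospace">counter_incr_cas_loop</font> and the <font face="courier new, monospace">retries</font> counter are hypothetical.</div>

```c
#include <stdatomic.h>
#include <stdint.h>

/* "Unboxed counter": one fetch-and-add, which never has to retry.
 * Returns the new value. */
static intptr_t counter_incr_faa(_Atomic intptr_t *ctr)
{
    return atomic_fetch_add_explicit(ctr, 1, memory_order_seq_cst) + 1;
}

/* "Value in a box": read, compute, CAS back, and retry on conflict.
 * Returns the new value; *retries counts failed CAS attempts
 * (a sketch-only metric; the weak CAS may also fail spuriously). */
static intptr_t counter_incr_cas_loop(_Atomic intptr_t *ctr, unsigned *retries)
{
    intptr_t cur = atomic_load_explicit(ctr, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(ctr, &cur, cur + 1,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed)) {
        /* cur now holds the value another thread installed; try again. */
        ++*retries;
    }
    return cur + 1;
}
```

<div>Under heavy contention the retry count grows with the number of competing threads, which is one plausible contributor to gaps like the one measured above.</div>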


</div></div></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Jul 19, 2013 at 5:05 PM, Carter Schonwald <span dir="ltr">&lt;<a href="mailto:carter.schonwald@gmail.com" target="_blank">carter.schonwald@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Ryan, the relevant machinery on the C side is in ./includes/stg/SMP.h: <a href="https://github.com/ghc/ghc/blob/7cc8a3cc5c2970009b83844ff9cc4e27913b8559/includes/stg/SMP.h" target="_blank">https://github.com/ghc/ghc/blob/7cc8a3cc5c2970009b83844ff9cc4e27913b8559/includes/stg/SMP.h</a><div>


<br></div><div>(unless I&#39;m missing something)</div></div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Jul 19, 2013 at 4:53 PM, Carter Schonwald <span dir="ltr">&lt;<a href="mailto:carter.schonwald@gmail.com" target="_blank">carter.schonwald@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Ryan, <div>if you look at line 270, you&#39;ll see the CAS is a C call <a href="https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L270" target="_blank">https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L270</a> </div>


<div><br></div><div>What Simon is alluding to is some work I started (but still need to finish).</div><div><a href="http://ghc.haskell.org/trac/ghc/ticket/7883" target="_blank">http://ghc.haskell.org/trac/ghc/ticket/7883</a> is the relevant ticket, and I&#39;ll need to sort out doing the same for the native code gen too.<br>


</div><div><br></div><div>There ARE no write barrier primops; they&#39;re baked into the CAS machinery in GHC&#39;s RTS.</div></div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">


On Fri, Jul 19, 2013 at 1:02 PM, Ryan Newton <span dir="ltr">&lt;<a href="mailto:rrnewton@gmail.com" target="_blank">rrnewton@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>Yes, I&#39;d absolutely rather not suffer C call overhead for these functions (or the CAS functions).  But isn&#39;t that how it&#39;s done currently for the casMutVar# primop?</div>


<div><br></div>

<div><a href="https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L265" target="_blank">https://github.com/ghc/ghc/blob/95e6865ecf06b2bd80fa737e4fa4a24beaae25c5/rts/PrimOps.cmm#L265</a><br>


</div>

<div><br></div><div>To avoid the overhead, is it necessary to make each primop in-line rather than out-of-line, or just to get rid of the &quot;ccall&quot;?</div><div><br></div>Another reason it would be good to package these with GHC is that I&#39;m having trouble building robust libraries of foreign primops that work under all &quot;ways&quot; (e.g. GHCi).  For example, this bug:<div>


<br></div><div>    <a href="https://github.com/rrnewton/haskell-lockfree-queue/issues/10" target="_blank">https://github.com/rrnewton/haskell-lockfree-queue/issues/10</a><br></div><div><br></div><div>If I write .cmm code that depends on RTS functionality like stg_MUT_VAR_CLEAN_info, then it seems to work fine in compiled mode (with/without threading, profiling), but I get link errors from GHCi, where these symbols aren&#39;t defined.</div>


<div><br></div><div>I&#39;ve got a draft of the relevant primops here:</div><div><br></div><div><a href="https://github.com/rrnewton/haskell-lockfree-queue/blob/master/AtomicPrimops/cbits/primops.cmm" target="_blank">https://github.com/rrnewton/haskell-lockfree-queue/blob/master/AtomicPrimops/cbits/primops.cmm</a><br>


</div><div><br></div><div>Which includes:</div><div><ul><li>variants of CAS for MutableArray# and MutableByteArray#</li><li>fetch-and-add for MutableByteArray#</li></ul></div><div>

Also, there are some tweaks to support the new &quot;ticketed&quot; interface for safer CAS:</div><div><br></div><div>   <a href="http://hackage.haskell.org/packages/archive/atomic-primops/0.3/doc/html/Data-Atomics.html#g:3" target="_blank">http://hackage.haskell.org/packages/archive/atomic-primops/0.3/doc/html/Data-Atomics.html#g:3</a><br>


</div><div><br></div><div>I started adding some of these primops to GHC proper (still as out-of-line), but not all of them.  I had gone with the foreign primop route instead...</div><div><br></div><div>

   <a href="https://github.com/rrnewton/ghc/commits/master" target="_blank">https://github.com/rrnewton/ghc/commits/master</a><span><font color="#888888"><br></font></span></div><span><font color="#888888"><div>

<br></div><div>  -Ryan</div></font></span><div><br></div><div>P.S. Where is the write barrier primop?  I don&#39;t see it listed in prelude/primops.txt...</div><div><div>


<div><br></div><div><br></div><div><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Jul 19, 2013 at 11:41 AM, Carter Schonwald <span dir="ltr">&lt;<a href="mailto:carter.schonwald@gmail.com" target="_blank">carter.schonwald@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I guess I should find the time to finish the CAS primop work I volunteered to do, then. I&#39;ll look into it in a few days. <div>


<div><span></span><br><br>On Friday, July 19, 2013, Simon Marlow  wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div><div>


On 18/07/13 14:17, Ryan Newton wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

The &quot;atomic-primops&quot; library depends on symbols such as<br>

store_load_barrier and &quot;cas&quot;, which are defined in SMP.h.  As a result,<br>

if the program is linked WITHOUT &quot;-threaded&quot;, the user<br>

gets a linker error about undefined symbols.<br>

<br>

The specific place it&#39;s used is in the &#39;foreign &quot;C&quot;&#39; bits of this .cmm code:<br>

<br>

<a href="https://github.com/rrnewton/haskell-lockfree-queue/blob/87e63b21b2a6c375e93c30b98c28c1d04f88781c/AtomicPrimops/cbits/primops.cmm" target="_blank">https://github.com/rrnewton/haskell-lockfree-queue/blob/87e63b21b2a6c375e93c30b98c28c1d04f88781c/AtomicPrimops/cbits/primops.cmm</a><br>


<br>

I&#39;m trying to explore hacks that will enable me to pull in those<br>

functions at compile time, without duplicating a whole bunch of code<br>

from the RTS.  But it&#39;s a fragile business.<br>

<br>

It seems to me that some of these routines have general utility.  In<br>

future versions of GHC, could we consider linking in those routines<br>

irrespective of &quot;-threaded&quot;?<br>

</blockquote>

<br>

We should make the non-THREADED versions EXTERN_INLINE too, so that there will be (empty) functions to call in rts/Inlines.c.  Want to submit a patch?<br>
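<div>A minimal sketch of that idea, assuming a hypothetical <font face="courier new, monospace">SKETCH_THREADED_RTS</font> macro in place of GHC&#39;s THREADED_RTS and plain external definitions in place of the EXTERN_INLINE / rts/Inlines.c machinery: in the non-threaded build the barrier becomes an empty function and CAS degrades to a plain read-compare-write, yet both still exist as linkable symbols, which is what avoids the undefined-symbol errors. <font face="courier new, monospace">rts_cas_sketch</font> and <font face="courier new, monospace">rts_store_load_barrier_sketch</font> are hypothetical names.</div>

```c
#include <stdint.h>

/* SKETCH_THREADED_RTS is a hypothetical stand-in for GHC's THREADED_RTS.
 * Both branches give the functions external linkage, so a caller in a
 * separately compiled .cmm/.c file always finds a symbol at link time.
 * GHC itself would do this via EXTERN_INLINE and rts/Inlines.c, which
 * this sketch does not reproduce. */
#if defined(SKETCH_THREADED_RTS)

void rts_store_load_barrier_sketch(void)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}

uintptr_t rts_cas_sketch(volatile uintptr_t *p, uintptr_t o, uintptr_t n)
{
    return __sync_val_compare_and_swap(p, o, n);
}

#else /* non-threaded RTS: assumes a single OS thread runs Haskell code */

/* No other thread can observe a reordering, so the barrier is empty. */
void rts_store_load_barrier_sketch(void)
{
}

/* Plain read-compare-write suffices with no concurrent mutator. */
uintptr_t rts_cas_sketch(volatile uintptr_t *p, uintptr_t o, uintptr_t n)
{
    uintptr_t result = *p;
    if (result == o) {
        *p = n;
    }
    return result;
}

#endif
```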

<br>

A better solution would be to make them into primops.  You don&#39;t really want to be calling out to a C function to implement a memory barrier. We have this for write_barrier(), but none of the others so far.  Of course that&#39;s a larger change.<br>


<br>

Cheers,<br>

        Simon<br>

<br>

<br>

<br></div></div>

_______________________________________________<br>

ghc-devs mailing list<br>

<a>ghc-devs@haskell.org</a><br>

<a href="http://www.haskell.org/mailman/listinfo/ghc-devs" target="_blank">http://www.haskell.org/mailman/listinfo/ghc-devs</a><br>

</blockquote>

</blockquote></div><br></div></div></div></div>

</blockquote></div><br></div>

</div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div></div></div></div>

</blockquote></div><br></div></div>