<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">

<HTML>

<HEAD>

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">

  <META NAME="GENERATOR" CONTENT="GtkHTML/3.28.1">

</HEAD>

<BODY>

Hi Simon,<BR>

<BR>

<BLOCKQUOTE TYPE=CITE>

<PRE>

Thanks - I already did this for alloca/malloc, I'll add the others from 

your patch.

</PRE>

</BLOCKQUOTE>

<BR>

Thank you.<BR>

<BR>

<BLOCKQUOTE TYPE=CITE>

<PRE>

We go to quite a lot of trouble to avoid locking in the common cases and 

fast paths - most of our data structures are CPU-local.  Where in 

particular have you encountered locking that could be reduced?

</PRE>

</BLOCKQUOTE>

<BR>

<BLOCKQUOTE TYPE=CITE>

<PRE>

The pinned_object_block is CPU-local, usually no locking is required. 

Only when the block is full do we have to get a new block from the block 

allocator, and that requires a lock, but it's a rare case.

</PRE>

</BLOCKQUOTE>

<BR>

OK, the code I have checked out from the repository contains this in &quot;rts/sm/Storage.h&quot;:<BR>

<BR>

<BLOCKQUOTE>

    extern bdescr * pinned_object_block;<BR>

</BLOCKQUOTE>

<BR>

And in &quot;rts/sm/Storage.c&quot;:<BR>

<BR>

<BLOCKQUOTE>

    bdescr *pinned_object_block;<BR>

</BLOCKQUOTE>

<BR>

My C might be rusty, but I see no way for pinned_object_block to be CPU-local: it is declared as a single global variable. If it really is CPU-local, what makes it so?<BR>

<BR>

As for locking, here is one example:<BR>

<BR>

<BLOCKQUOTE>

    StgPtr<BR>

    allocatePinned( lnat n )<BR>

    {<BR>

    &nbsp;&nbsp;&nbsp; StgPtr p;<BR>

    &nbsp;&nbsp;&nbsp; bdescr *bd = pinned_object_block;<BR>

    <BR>

    &nbsp;&nbsp;&nbsp; // If the request is for a large object, then allocate()<BR>

    &nbsp;&nbsp;&nbsp; // will give us a pinned object anyway.<BR>

    &nbsp;&nbsp;&nbsp; if (n &gt;= LARGE_OBJECT_THRESHOLD/sizeof(W_)) {<BR>

    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; p = allocate(n);<BR>

    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Bdescr(p)-&gt;flags |= BF_PINNED;<BR>

    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return p;<BR>

    &nbsp;&nbsp;&nbsp; }<BR>

    <BR>

    &nbsp;&nbsp;&nbsp; <B>ACQUIRE_SM_LOCK; // [RTVD: here we acquire the lock]</B><BR>

    <BR>

    &nbsp;&nbsp;&nbsp; TICK_ALLOC_HEAP_NOCTR(n);<BR>

    &nbsp;&nbsp;&nbsp; CCS_ALLOC(CCCS,n);<BR>

    <BR>

    &nbsp;&nbsp;&nbsp; // If we don't have a block of pinned objects yet, or the current<BR>

    &nbsp;&nbsp;&nbsp; // one isn't large enough to hold the new object, allocate a new one.<BR>

    &nbsp;&nbsp;&nbsp; if (bd == NULL || (bd-&gt;free + n) &gt; (bd-&gt;start + BLOCK_SIZE_W)) {<BR>

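    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; pinned_object_block = bd = allocBlock();<BR>

    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; dbl_link_onto(bd, &amp;g0s0-&gt;large_objects);<BR>

    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; g0s0-&gt;n_large_blocks++;<BR>

    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; bd-&gt;gen_no = 0;<BR>

    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; bd-&gt;step&nbsp;&nbsp; = g0s0;<BR>

    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; bd-&gt;flags&nbsp; = BF_PINNED | BF_LARGE;<BR>

    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; bd-&gt;free&nbsp;&nbsp; = bd-&gt;start;<BR>

    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; alloc_blocks++;<BR>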

    &nbsp;&nbsp;&nbsp; }<BR>

    <BR>

    &nbsp;&nbsp;&nbsp; p = bd-&gt;free;<BR>

    &nbsp;&nbsp;&nbsp; bd-&gt;free += n;<BR>

    &nbsp;&nbsp;&nbsp; <B>RELEASE_SM_LOCK; // [RTVD: here we release the lock]</B><BR>

    &nbsp;&nbsp;&nbsp; return p;<BR>

    }<BR>

    <BR>

    Of course, TICK_ALLOC_HEAP_NOCTR and CCS_ALLOC may require synchronization if they use shared state (which is, again, probably unnecessary). However, when no profiling is going on and &quot;pinned_object_block&quot; is TSO-local, isn't it possible to remove locking from this code completely? The only case where locking is necessary is when a fresh block has to be allocated, and that can be handled inside the &quot;allocBlock&quot; function (or, more precisely, by using &quot;allocBlock_lock&quot;).<BR>
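    <BR>

    To make the idea concrete, here is a minimal standalone sketch of it. It is not the actual RTS code: the names pinned_block, MAX_CPUS, alloc_block_locked and lock_acquisitions are hypothetical, the block descriptor is reduced to two fields, BLOCK_SIZE_W has an arbitrary value, and malloc stands in for the real block allocator. It assumes that each per-CPU slot is only ever touched by the thread running on that CPU, so the fast path needs no lock and only the refill path takes one.<BR>

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE_W 512   /* words per block; illustrative value only */
#define MAX_CPUS     4     /* hypothetical fixed number of CPUs */

typedef size_t word;

/* Simplified stand-in for the RTS block descriptor (bdescr). */
typedef struct block {
    word *start;  /* first word of the block */
    word *free;   /* next unallocated word   */
} block;

/* One current pinned block per CPU instead of a single global pointer. */
block *pinned_block[MAX_CPUS];

/* Stand-in for the storage manager lock. */
pthread_mutex_t sm_lock = PTHREAD_MUTEX_INITIALIZER;
unsigned long lock_acquisitions;  /* counts how often the lock is taken */

/* Slow path: the shared block allocator is the only part under the lock.
 * A real RTS would also link the block onto a shared list here. */
block *alloc_block_locked(void)
{
    pthread_mutex_lock(&sm_lock);
    lock_acquisitions++;
    block *bd = malloc(sizeof *bd);
    word *mem = bd ? malloc(BLOCK_SIZE_W * sizeof(word)) : NULL;
    pthread_mutex_unlock(&sm_lock);

    if (bd == NULL || mem == NULL)
        abort();
    bd->start = mem;
    bd->free  = mem;
    return bd;
}

/* Allocate n words from the pinned block owned by the given CPU.
 * No lock is taken unless the current block is missing or too full.
 * (Retired blocks are simply dropped in this sketch.) */
word *allocate_pinned(unsigned cpu, size_t n)
{
    assert(cpu < MAX_CPUS && n <= BLOCK_SIZE_W);

    block *bd = pinned_block[cpu];
    if (bd == NULL || (size_t)(bd->free - bd->start) + n > BLOCK_SIZE_W) {
        bd = alloc_block_locked();
        pinned_block[cpu] = bd;
    }

    word *p = bd->free;
    bd->free += n;
    return p;
}
```

    With this layout, two consecutive calls such as allocate_pinned(0, 10) and allocate_pinned(0, 20) take the lock once, for the initial refill, and the second call is a plain pointer bump. Whether the same split is safe in the real RTS depends on what else touches the shared lists and counters, which I have not verified.<BR>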

    <BR>

    The ACQUIRE_SM_LOCK/RELEASE_SM_LOCK pair appears in other places too, but I have not yet analysed whether it is really necessary there. For example, newCAF and newDynCAF are wrapped in it.<BR>

    <BR>

    With kind regards,<BR>

    Denys Rtveliashvili

</BLOCKQUOTE>

</BODY>

</HTML>