<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/3.28.1">
</HEAD>
<BODY>
Hi Simon,<BR>
<BR>
<BLOCKQUOTE TYPE=CITE>
<PRE>
Thanks - I already did this for alloca/malloc, I'll add the others from
your patch.
</PRE>
</BLOCKQUOTE>
<BR>
Thank you.<BR>
<BR>
<BLOCKQUOTE TYPE=CITE>
<PRE>
We go to quite a lot of trouble to avoid locking in the common cases and
fast paths - most of our data structures are CPU-local. Where in
particular have you encountered locking that could be reduced?
</PRE>
</BLOCKQUOTE>
<BR>
<BLOCKQUOTE TYPE=CITE>
<PRE>
The pinned_object_block is CPU-local, usually no locking is required.
Only when the block is full do we have to get a new block from the block
allocator, and that requires a lock, but it's a rare case.
</PRE>
</BLOCKQUOTE>
<BR>
OK, the code I have checked out from the repository contains this in "rts/sm/Storage.h":<BR>
<BR>
<BLOCKQUOTE>
extern bdescr * pinned_object_block;<BR>
</BLOCKQUOTE>
<BR>
And in "rts/sm/Storage.c":<BR>
<BR>
<BLOCKQUOTE>
bdescr *pinned_object_block;<BR>
</BLOCKQUOTE>
<BR>
My C might be rusty, but I see no way for pinned_object_block to be CPU-local: it is declared as an ordinary global variable. If it is truly CPU-local, what makes it so?<BR>
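<BR>
To make the question concrete, here is a minimal sketch of the difference I have in mind. It is not RTS code: cap_sketch, bdescr_sketch, caps, MAX_CAPS and set_pinned_block are hypothetical names invented for illustration. A plain global is one variable seen by every CPU, whereas a CPU-local pointer would need one slot per CPU, for instance a field in a per-CPU structure:<BR>

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CAPS 4   /* hypothetical number of CPUs/capabilities */

/* Stand-in for the RTS block descriptor; illustration only. */
typedef struct bdescr_sketch { int id; } bdescr_sketch;

/* What the declaration in Storage.c looks like to me: a single
 * process-wide variable, visible to every CPU. */
bdescr_sketch *shared_pinned_block = NULL;

/* What I would expect "CPU-local" to look like: one slot per CPU,
 * reached through that CPU's own structure. */
typedef struct cap_sketch {
    unsigned no;                         /* index of this CPU */
    bdescr_sketch *pinned_object_block;  /* private to this CPU */
} cap_sketch;

static cap_sketch caps[MAX_CAPS];

/* Only touches the slot of the CPU passed in; other slots are untouched. */
void set_pinned_block(cap_sketch *cap, bdescr_sketch *bd)
{
    cap->pinned_object_block = bd;
}
```

With this layout, setting the block for caps[0] leaves caps[1].pinned_object_block untouched, which is not the case for a single global.<BR>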
<BR>
As for locking, here is one example:<BR>
<BR>
<BLOCKQUOTE>
StgPtr<BR>
allocatePinned( lnat n )<BR>
{<BR>
StgPtr p;<BR>
bdescr *bd = pinned_object_block;<BR>
<BR>
// If the request is for a large object, then allocate()<BR>
// will give us a pinned object anyway.<BR>
if (n >= LARGE_OBJECT_THRESHOLD/sizeof(W_)) {<BR>
p = allocate(n);<BR>
Bdescr(p)->flags |= BF_PINNED;<BR>
return p;<BR>
}<BR>
<BR>
<B>ACQUIRE_SM_LOCK; // [RTVD: here we acquire the lock]</B><BR>
<BR>
TICK_ALLOC_HEAP_NOCTR(n);<BR>
CCS_ALLOC(CCCS,n);<BR>
<BR>
// If we don't have a block of pinned objects yet, or the current<BR>
// one isn't large enough to hold the new object, allocate a new one.<BR>
if (bd == NULL || (bd->free + n) > (bd->start + BLOCK_SIZE_W)) {<BR>
pinned_object_block = bd = allocBlock();<BR>
dbl_link_onto(bd, &g0s0->large_objects);<BR>
g0s0->n_large_blocks++;<BR>
bd->gen_no = 0;<BR>
bd->step = g0s0;<BR>
bd->flags = BF_PINNED | BF_LARGE;<BR>
bd->free = bd->start;<BR>
alloc_blocks++;<BR>
}<BR>
<BR>
p = bd->free;<BR>
bd->free += n;<BR>
<B>RELEASE_SM_LOCK; // [RTVD: here we release the lock]</B><BR>
return p;<BR>
}<BR>
<BR>
Of course, TICK_ALLOC_HEAP_NOCTR and CCS_ALLOC may require synchronization if they use shared state (which is, again, probably unnecessary). However, if no profiling is going on and "pinned_object_block" is TSO-local, isn't it possible to remove locking from this code completely? The only case where locking is necessary is when a fresh block has to be allocated, and that can be handled within the "allocBlock" function (or, more precisely, by using "allocBlock_lock").<BR>
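<BR>
A rough sketch of the shape I am suggesting follows. It is standalone C11, not a patch against the RTS: block, BLOCK_WORDS, my_pinned_block, alloc_block_locked and allocate_pinned_sketch are hypothetical names, the spinlock merely stands in for the storage manager lock, and the sketch assumes the current block really is private to one thread. The fast path bumps a pointer without any lock; the lock is taken only when a fresh block is needed:<BR>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_WORDS 512   /* hypothetical block size, in words */

typedef struct block {
    size_t *start;   /* first word of the block */
    size_t *free;    /* next unallocated word */
} block;

/* Shared state, protected by sm_lock (stand-in for the SM lock). */
static atomic_flag sm_lock = ATOMIC_FLAG_INIT;
static unsigned long blocks_allocated = 0;

/* Assumption: the current pinned block is private to each thread. */
static _Thread_local block *my_pinned_block = NULL;

/* Slow path: the only place where the lock is taken. */
static block *alloc_block_locked(void)
{
    block *bd = malloc(sizeof *bd);
    size_t *mem = malloc(BLOCK_WORDS * sizeof *mem);
    if (bd == NULL || mem == NULL) abort();
    bd->start = bd->free = mem;

    while (atomic_flag_test_and_set_explicit(&sm_lock, memory_order_acquire))
        ;   /* spin until the lock is free */
    blocks_allocated++;   /* shared bookkeeping happens under the lock */
    atomic_flag_clear_explicit(&sm_lock, memory_order_release);
    return bd;
}

size_t *allocate_pinned_sketch(size_t n)
{
    assert(n <= BLOCK_WORDS);   /* large requests are out of scope here */
    block *bd = my_pinned_block;

    /* No block yet, or not enough room left: take the slow path.
     * (A full block is simply dropped here; the real code links it
     * onto a list so that the GC can find it.) */
    if (bd == NULL || bd->free + n > bd->start + BLOCK_WORDS) {
        my_pinned_block = bd = alloc_block_locked();
    }

    /* Fast path: thread-private pointer bump, no lock needed. */
    size_t *p = bd->free;
    bd->free += n;
    return p;
}
```

In this sketch two small requests in a row are served from the same block without touching the lock, and only the request that does not fit goes through alloc_block_locked.<BR>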
<BR>
The ACQUIRE_SM_LOCK/RELEASE_SM_LOCK pair is present in other places too, but I have not yet analysed whether it is really necessary there. For example, newCAF and newDynCAF are wrapped in it.<BR>
<BR>
With kind regards,<BR>
Denys Rtveliashvili
</BLOCKQUOTE>
</BODY>
</HTML>