Programmable locality specifications for multicore
Don Stewart
dons at galois.com
Fri Mar 20 14:45:21 EDT 2009
Re. locality, affinity, and the complexity of it: have you seen the
work they've done on programmable locality 'strategies' (in the Haskell
sense) for Chapel? I think there are some interesting overlaps with
what GHC needs to do, and with how it's envisaged for Chapel on Cray
machines.
Brad - what's the best reference for the locality/capability machine
models you plug into the Chapel compiler?
-- Don
(CCing Brad Chamberlain, who's done most of the work on this at Cray)
marlowsd:
> Manuel M T Chakravarty wrote:
>> I am fixing that for Mac OS right now.
>>
>> The comment above setThreadAffinity() says,
>>
>>> // Schedules the thread to run on CPU n of m. m may be less than the
>>> // number of physical CPUs, in which case, the thread will be allowed
>>> // to run on CPU n, n+m, n+2m etc.
>>
>> I am not convinced that this is a good plan. If m is less than the
>> number of CPUs, some threads can hop between CPUs and so invalidate
>> their L2 cache repeatedly. The man page for sched_setaffinity() says,
>>
>>> Restricting a process to run on a single CPU also prevents the
>>> performance cost caused by the cache invalidation that occurs when a
>>> process ceases to execute on one CPU and then recommences execution
>>> on a different CPU.
>>
>> Besides, you call this function with
>>
>>> setThreadAffinity(cap->no, n_capabilities);
>>
>> So, m is the number of capabilities, not the number of CPUs. The way I
>> understand the man page for sched_setaffinity(), if I run a Haskell
>> program with +RTS -N4 on an 8-core machine, it can only ever use the
>> first four cores of the system when thread affinity is used.
>
> The idea is that when you use +RTS -N4 on an 8-core machine, the first
> capability is allowed to run on CPUs 1 and 5, the second on 2 and 6, and
> so on. It just ensures that threads on different Capabilities don't
> pre-empt each other.
>
> Well, maybe we also want to be able to say "just use CPUs 1-4" for
> locality reasons, as you say. But why 1-4, and not 5-8? It doesn't make
> much sense to pick particular CPUs, hence the current method.
>
> This stuff is just for benchmarking and playing around with - I don't
> think fixing affinity like this is a good idea in general.
More information about the Cvs-ghc mailing list