cvs commit: fptools/ghc/rts Schedule.c
simonmar at microsoft.com
Thu Feb 26 17:26:11 EST 2004
> > > On a little test I have here which does lots of call-ins on Windows,
> > > this reduces the slowdown for using the threaded RTS from a factor of
> > > 7-8 down to a factor of 4-5. I'm aiming for a factor of 2 or
> > > better...
> > I assume that test doesn't work without Visual Studio and therefore
> > can't be run on other platforms?
> > By the way, on Mac OS X, initCondition a.k.a. pthread_cond_init does no
> > syscall and takes about fifteen instructions in total :-) .
> The test works fine on other platforms (it's just a tight loop in C
> calling a foreign exported 'return ()'), but I expect the performance
> behaviour differs between platforms (as you mentioned), and Windows is
> the platform I care about at the moment. I haven't measured the
> overhead on Linux yet, but will do so later...
> The test is running without any worker threads in the system. With a
> worker thread I expect there to be more overhead, because the worker will
> be sitting in awaitEvent() with a capability, which it needs to hand
> over to the thread doing the call-in. I'm planning to experiment with
> releasing the capability before awaitEvent() if EMPTY_RUN_QUEUE.
> Also, I'm not sure that the storage manager's mutex is required.
> Instead, we could just assume that the storage manager is only available
> to a thread with a Capability (or the garbage collector, which
> notionally runs with a Capability).
So it turns out that synchronisation is really really expensive on
Windows. I've fixed a few places where we were doing some unnecessary
synchronisation and got the threaded RTS penalty for this test down to a
factor of 3-4 on Windows (it's around 2 on Linux).
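
For a rough feel of what uncontended synchronisation costs on a given
platform, something like the following can be used. This is only a minimal
sketch, not the RTS code and not the call-in test discussed above: it uses
POSIX threads rather than the Win32 primitives, and the names counter,
now_seconds, run_loop and report are hypothetical. It times a tight loop
with and without a mutex lock/unlock pair around each iteration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the trivial work done on each call-in. */
static volatile long counter = 0;

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Increment counter `iterations` times.  If use_lock is non-zero, take
 * and release an uncontended mutex around each increment.  Returns the
 * elapsed time in seconds. */
static double run_loop(long iterations, int use_lock)
{
    pthread_mutex_t m;
    int rc = pthread_mutex_init(&m, NULL);
    assert(rc == 0);
    (void)rc;

    double start = now_seconds();
    for (long i = 0; i < iterations; i++) {
        if (use_lock)
            pthread_mutex_lock(&m);
        counter++;
        if (use_lock)
            pthread_mutex_unlock(&m);
    }
    double elapsed = now_seconds() - start;

    pthread_mutex_destroy(&m);
    return elapsed;
}

/* Print both timings and, when measurable, their ratio. */
static void report(long iterations)
{
    double plain = run_loop(iterations, 0);
    double locked = run_loop(iterations, 1);
    printf("plain:  %.6f s\n", plain);
    printf("locked: %.6f s\n", locked);
    if (plain > 0.0)
        printf("ratio:  %.1f\n", locked / plain);
}
```

Calling report(10000000) from main (link with -pthread) prints the two
timings. The numbers only indicate the per-operation cost of an uncontended
lock on that machine; they say nothing directly about the factors quoted
here, which come from the call-in test itself.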
Now... if I add a worker thread into the system, the test gets worse by
another factor of 2. This is something we really need to fix, but it
needs some thought (my first attempt failed).