SMP crash

Simon Marlow simonmarhaskell at gmail.com
Mon Feb 27 06:31:41 EST 2006


Krasimir Angelov wrote:

> While trying to build VSHaskell with the recent GHC I found the
> following problem. In Stable.c the stable_mutex is used for
> synchronization but it is initialized only from initStablePtrTable.
> The initStablePtrTable function is called only from hs_init but
> according to the following comment:
> 
> // Nothing to do:
> // the table will be allocated the first time makeStablePtr is
> // called, and we want the table to persist through multiple inits.
> //
> // Also, getStablePtr is now called from __attribute__((constructor))
> // functions, so initialising things here wouldn't work anyway.
> 
> hs_init might run too late. In that case the CRITICAL_SECTION is not
> initialized at the right time, and the whole program crashes. To fix
> that, I have added a call to initStablePtrTable to each function that
> requires locking. The actions in initStablePtrTable are executed only
> the first time it is called. Since I am using SPT_size as a flag, it
> isn't safe to make that first call from two concurrent threads. As long
> as the first call comes from hs_init or from an __attribute__((constructor))
> function, I think it is safe.

Thanks Krasimir, I've committed your patch.

> This fixes the problem, but after that the RTS blocks in waitCondition
> at line 401 of Capability.c. The trace messages with +RTS -Ds are:
> 
> ACQUIRE_LOCK(0x64FDC130) Stable.c 248
> RELEASE_LOCK(0x64FDC130) Stable.c 251
> ACQUIRE_LOCK(0x64FDCE50) Schedule.c 2689
> sched (task 00000EF8): allocated 1 capabilities
> RELEASE_LOCK(0x64FDCE50) Schedule.c 2722
> ACQUIRE_LOCK(0x64FDCDF0) Storage.c 132
> RELEASE_LOCK(0x64FDCDF0) Storage.c 256
> ACQUIRE_LOCK(0x64FDCE50) RtsAPI.c 560
> sched (task 00000EF8): new task (taskCount: 1)
> RELEASE_LOCK(0x64FDCE50) RtsAPI.c 562
> ACQUIRE_LOCK(0x64FDD128) Capability.c 387
> sched (task 00000EF8): returning; I want capability 0
> RELEASE_LOCK(0x64FDD128) Capability.c 395
> sched (task 00000EF8): returning; got capability 0
> ACQUIRE_LOCK(0x64FDCE50) Schedule.c 2425
> RELEASE_LOCK(0x64FDCE50) Schedule.c 2429
> sched (task 00000EF8): created thread 1, stack size = f2 words
> sched (task 00000EF8): new bound thread (1)
> sched (task 00000EF8): ### NEW SCHEDULER LOOP (task: 01D8FD50, cap: 64FDD040)
> sched (task 00000EF8): ### Running thread 1 in bound thread
> sched (task 00000EF8): -->> running thread 1 ThreadRunGHC ...
> sched (task 00000EF8): thread 1 did a safe foreign call
> ACQUIRE_LOCK(0x64FDD128) Schedule.c 2212
> sched (task 00000EF8): starting new worker on capability 0
> ACQUIRE_LOCK(0x01D8FE10) Task.c 245
> sched (task 00000EF8): new worker task (taskCount: 2)
> RELEASE_LOCK(0x01D8FE10) Task.c 265
> RELEASE_LOCK(0x64FDD128) Schedule.c 2218
> sched (task 00000EF8): thread 1: leaving RTS
> ACQUIRE_LOCK(0x64FDD128) Capability.c 387
> sched (task 00000EF8): returning; I want capability 0
> RELEASE_LOCK(0x64FDD128) Capability.c 398
> ACQUIRE_LOCK(0x01D8FD70) Capability.c 401
> RELEASE_LOCK(0x01D8FD70) win32/OSThreads.c 75
> 
> I have also added optional debug messages to ACQUIRE_LOCK/RELEASE_LOCK
> for Windows, as was already done for Linux. The patch is attached.

That's odd - there should be two threads attempting to grab capability 
0.  Can you see what each thread is doing?  In gdb, something like this:

  > thread 1
  > where
  > thread 2
  > where
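
The hang can be pictured with a simplified model, sketched here in C with POSIX threads. One task owns a capability while another task, returning from a foreign call, waits on a condition variable until the owner releases it. The names `struct capability`, `cap_acquire`, `cap_release`, `returning_task` and `handoff_demo` are hypothetical. The model is much simpler than the RTS: in the trace, the lock taken at Capability.c 401 (0x01D8FD70) differs from the one taken at line 387 (0x64FDD128). What the model does show is what a task stuck in a condition wait points to. Either the owner never released the capability, or the second thread expected to contend for it never ran.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, simplified model of a capability; not the GHC RTS structure. */
struct capability {
    pthread_mutex_t lock;
    pthread_cond_t  released;   /* signalled whenever the owner gives it up */
    int             owner;      /* id of the task holding it, or -1 if free */
};

static void cap_init(struct capability *cap)
{
    pthread_mutex_init(&cap->lock, NULL);
    pthread_cond_init(&cap->released, NULL);
    cap->owner = -1;
}

/* Block until the capability is free, then take it. If the current owner
 * never calls cap_release, this waits forever inside pthread_cond_wait. */
static void cap_acquire(struct capability *cap, int task)
{
    pthread_mutex_lock(&cap->lock);
    while (cap->owner != -1)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&cap->released, &cap->lock);
    cap->owner = task;
    pthread_mutex_unlock(&cap->lock);
}

static void cap_release(struct capability *cap)
{
    pthread_mutex_lock(&cap->lock);
    assert(cap->owner != -1);                /* releasing a free capability is a bug */
    cap->owner = -1;
    pthread_cond_broadcast(&cap->released);  /* wake every waiting task */
    pthread_mutex_unlock(&cap->lock);
}

/* Task 2 plays the bound thread returning from a safe foreign call. */
static void *returning_task(void *arg)
{
    cap_acquire(arg, 2);
    return NULL;
}

/* Task 1 (the worker) holds the capability while task 2 tries to return.
 * Returns the final owner: 2 if the handoff worked, -1 if the thread
 * could not be created. */
static int handoff_demo(void)
{
    struct capability cap;
    pthread_t t;

    cap_init(&cap);
    cap_acquire(&cap, 1);                    /* worker owns the capability */
    if (pthread_create(&t, NULL, returning_task, &cap) != 0)
        return -1;
    cap_release(&cap);                       /* worker gives it up; task 2 proceeds */
    pthread_join(t, NULL);

    int owner = cap.owner;                   /* safe: task 2 has finished */
    pthread_cond_destroy(&cap.released);
    pthread_mutex_destroy(&cap.lock);
    return owner;
}
```

`handoff_demo` returns 2 once the worker (task 1) releases the capability. Removing the `cap_release` call reproduces the symptom: task 2 sits in `pthread_cond_wait` and `pthread_join` never returns.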

I'll try to get my Windows build up today and see if any of the threaded 
tests are failing.

Cheers,
	Simon


More information about the Cvs-ghc mailing list