Mac/PPC threaded RTS problems -- potential clue

David Kirkman dkirkman at gmail.com
Sun Aug 13 16:58:41 EDT 2006


I managed to build ghc-6.5.20060804 on a powerpc mac, and I spent a
little time Saturday night trying to work out what might be going on
with the threaded RTS.

Running the testsuite with make WAY=threaded1 shows many (73)
failures.  Many of them (the conc??? set) have to do with TVars, in
particular writing TVars.  On my dual-proc G5 macs, with
ghc-6.5.20060804, the following program will hang when running with
the threaded RTS, but will work fine when compiled without -threaded.

>module Main where
>
>import GHC.Conc
>import Control.Concurrent
>
>main = do
>       t1 <- atomically (do t1 <- newTVar 80
>	                    return t1)
>       atomically ( do writeTVar t1 1 )
>       putStr ("done\n")

This problem seems to be powerpc specific -- it works fine with the
threaded RTS on a multi-processor intel mac (built by me, from the
same ghc-6.5.20060804 source tree I used to build the ppc compiler).

Looking around in STM.c (via decidedly low-tech printfs) I quickly
zoomed in on cond_lock_tvar, which lead me to cas (atomic compare and
swap) in SMP.h, where I found (I think) a fairly clear error in the
powerpc code -- I've appended a patch to the end of this message.

The problem is that the inline assembler code was placing the result
of an operation in a register that is used as input later in the code.
At the bottom of this message I've extracted a short short code
fragment that you can run through gcc (on a powerpc machine) to see
the generated assembly output.

The changes to fix the problem are fairly simple.  The first adds an
ampersand to the output list of the assembly fragment ("=r" (result)
--> "=&r" (result)) The ampersand just tells gcc that result can not
be placed in a register used for any of the input parameters (o, n, or
p).  Otherwise, it feels free to place output parameters in the same
registers used by the inputs -- but because of the flow of control
here we need everything in a distinct register.  This change fixes the
TVar program above.

The second change adds a clobber list (the :"cc", "memory").  This
tells gcc that the condition code (due to the compare) and memory (due
to the store) might be changed during the asm execution.  The lack of
a clobber list did not seem to be causing any trouble, but without it
gcc is free to assume that no state is changed during the execution.

Applying the following patch to SMP.h, and rebuilding everything, I
not only fixed the simple writeTVar program, but it also fixed 8
programs in the testsuite (conc043 -> conc049, conc052 and conc053).
The only conc test program that still fails is conc039.  But there are
still many mac problems, I still have 132 unexpected failures with
make fast.  At least the patch does not cause any new failures (in
either make fast or make WAY=threaded1).

Anyway, seeing as the change to SMP.h fixes a fair number of test
cases in the testsuite, I figure there is some chance that it might
fix some of the problems that people are having with the threaded RTS.
On the other hand, I'm not real happy the the large number of
testsuite failures my build gets, so I can't really call this
'tested'.  But I'm posting because it might be a useful clue for
somebody with a little more mac/ghc experience.

Cheers,

-david k.

//
// Short code to run through gcc -S.  On my powermac, without the change
// the generated assembly produces
//            1: lwarx   r0, 0, r0
//      load to here  ---^      ^------- from address here
//                                       But we need this value for the stwcx.
//
// with the fix, the first line of the generated assembly becomes
//            1: lwarx   r11, 0, r0
// and r0 remains unmodified if we need to use it later in the stwcx.

/*
  * CMPXCHG - the single-word atomic compare-and-exchange instruction.  Used
  * in the STM implementation.
 */
long cas(long* p, long o, long n)
{
    long result;

    //
    // Change
    //       :"=r" (result) --> :"=&r" (result)
    // to get result and p in different registers
    __asm__ __volatile__ (
        "1:     lwarx     %0, 0, %3\n"
        "       cmpw      %0, %1\n"
        "       bne       2f\n"
        "       stwcx.    %2, 0, %3\n"
        "       bne-      1b\n"
        "2:"
        :"=r" (result)
        :"r" (o), "r" (n), "r" (p)
    );
    return result;
}


Here's a "diff -cp" for SMP.h

*** SMP.h       Sun Aug 13 01:08:53 2006
--- SMP-new.h   Sun Aug 13 01:08:47 2006
*************** cas(StgVolatilePtr p, StgWord o, StgWord
*** 76,83 ****
          "       stwcx.    %2, 0, %3\n"
          "       bne-      1b\n"
          "2:"
!         :"=r" (result)
          :"r" (o), "r" (n), "r" (p)
      );
      return result;
  #else
--- 76,84 ----
          "       stwcx.    %2, 0, %3\n"
          "       bne-      1b\n"
          "2:"
!         :"=&r" (result)
          :"r" (o), "r" (n), "r" (p)
+         :"cc", "memory"
      );
      return result;
  #else


More information about the Glasgow-haskell-users mailing list