GCC, Mac OS X & the future
Simon Marlow
marlowsd at gmail.com
Sat Jul 2 21:42:43 CEST 2011
On 02/07/11 19:34, David Peixotto wrote:
> I'm glad you caught my benchmarking error, because the new results
> look quite different! Running the benchmarks with the -threaded
> runtime shows that the actual slowdown is close to 30% for GC-intensive
> programs.
>
> In the fibon results, the average execution time was 12% slower for
> llvm-gcc, and the average GC slowdown was 42%. In the nofib gc
> benchmark results, the average execution time for llvm-gcc was 30%
> longer.
Ok, that's bad. I'm not a Mac user, but I wouldn't put up with more
than 5% (and I'd be very unhappy about that).
> While the results are disappointing, they seem reasonable after
> taking a look at the code generated for the access of the `gct`
> variable in the GC. I had hoped using pthread_getspecific would just
> require a few inline assembly instructions, but it looks like the
> overhead is much higher. When accessing the `gct` variable in the GC,
> it calls `getThreadLocalVar`, which is the GHC wrapper for
> pthread_getspecific. Then the actual call to pthread_getspecific goes
> through the dynamic linker, so we take an extra hit there. The actual
> code for pthread_getspecific is just a mov followed by a return.
>
> The best we could hope for would be for an access of `gct` to turn
> into something like this in the GC:
>
> movq (%rdi),%rdi #deref the key which is an index into the tls memory
> movq %gs:0x00000060(,%rdi,8),%rax # read the value indexed by the key
>
> but it looks like we are getting something like this:
>
> call getThreadLocalVar
> movq (%rdi),%rdi #deref the key which is an index into the tls memory
> jmp <dynamic_linker_stub>
> movq %gs:0x00000060(,%rdi,8),%rax #pthread_getspecific body
> ret
You don't need to go through getThreadLocalVar, right? Just call
pthread_getspecific directly. I don't know why it's going through the
dynamic linker stub; I thought it was supposed to be #defined to the
inline assembly.
Anyway, the last resort will be to pass gct as a parameter to the
critical functions in the GC: scavenge_block() and everything it calls
transitively, including evacuate(). This is likely to give quite good
performance, but not as good as a register variable, so unfortunately
we'll need some #ifdefery or macros, which will be quite ugly (hence why
I say this is a last resort).
Cheers,
Simon
> The call to getThreadLocalVar may be getting inlined in some places, but not at the site I examined. I've included the detailed benchmark results below.
>
> For the fibon results, a negative number indicates that llvm-gcc is slower. Efficiency is the percentage of total execution time spent outside the GC (i.e. in the mutator).
>
> Fibon Results
> -----------------------------------------------------------------
>                MutCPUTime   GCCPUTime   TotalCPUTime   Efficiency
> -----------------------------------------------------------------
> Agum               +0.17%     -53.48%        -11.49%       80.32%
> BinaryTrees        +0.13%     -60.70%        -22.98%       68.96%
> Blur               -0.26%     -15.27%         -0.43%       98.58%
> Bzlib              -2.31%      -8.57%         -2.32%       99.89%
> Chameneos         -15.65%     -37.53%        -15.74%       99.53%
> Cpsa               -0.02%     -58.13%         -5.25%       91.32%
> Crypto             -0.59%     -47.97%        -27.08%       52.18%
> FFT2d              +2.09%     -33.66%         +0.30%       94.64%
> FFT3d              -1.58%     -12.19%         -1.88%       96.58%
> Fannkuch           -0.84%     -26.99%         -2.59%       92.64%
> Fgl                -0.40%     -50.78%        -21.16%       63.74%
> Fst                +0.32%     -66.43%        -13.21%       81.93%
> Funsat             -1.36%     -44.08%        -18.94%       65.30%
> Gf                 -5.38%     -44.43%        -17.56%       77.11%
> HaLeX              +3.77%     -66.52%         +1.13%       96.30%
> Happy              -0.98%     -59.06%        -25.67%       64.51%
> Hgalib             -2.45%     -44.33%         -5.96%       91.67%
> Laplace            +0.43%     -23.42%         -0.63%       95.09%
> MMult              +1.04%     -13.62%         +0.48%       95.34%
> Mandelbrot         +0.06%     -17.29%         +0.03%       99.78%
> Nbody              -0.69%     -18.18%         -0.82%       98.99%
> Palindromes        -2.83%     -82.54%        -52.78%       57.72%
> Pappy              +0.17%     -44.32%        -38.64%       34.84%
> Pidigits           +0.17%     -57.56%        -11.34%       81.62%
> QuickCheck         +0.36%     -50.14%         -6.52%       87.62%
> Regex              -1.14%     -35.26%         -2.78%       94.79%
> Simgi              +1.39%     -41.70%        -10.15%       74.64%
> SpectralNorm       +0.06%        ----         +0.06%      100.00%
> TernaryTrees       +1.59%     -48.39%        -23.62%       58.03%
> Xsact              -0.72%     -61.65%        -28.25%       63.44%
> -----------------------------------------------------------------
> Min               -15.65%     -82.54%        -52.78%       34.84%
> Mean               -0.85%     -42.21%        -12.19%       81.90%
> Max                +3.77%      -8.57%         +1.13%      100.00%
>
>
> For the nofib results, a positive number means the llvm-gcc version was slower.
>
> NoFib Results
> ------------------------------------------------------------------------------
> Program               Size    Allocs   Runtime   Elapsed  TotalMem
> ------------------------------------------------------------------------------
> circsim              +0.0%     +0.0%    +22.5%    +21.2%     -0.2%
> constraints          +0.0%     +0.0%    +39.4%    +38.3%     +0.0%
> fulsom               +0.0%     +0.0%    +23.7%    +22.2%     +7.1%
> gc_bench             +0.1%     +0.0%    +68.7%    +67.8%     +0.3%
> happy                +0.1%     +0.0%    +14.8%    +14.4%     +0.0%
> lcss                 +0.1%     +0.0%    +34.3%    +31.6%     +0.0%
> mutstore1            +0.0%     +0.0%    +41.3%    +35.6%     +0.0%
> mutstore2            +0.0%     +0.0%    +24.3%    +23.4%     +0.0%
> power                +0.0%     +0.0%    +34.6%    +35.1%     +0.0%
> spellcheck           +0.1%     +0.0%    +11.8%    +11.9%     +0.0%
> ------------------------------------------------------------------------------
> Min                  +0.0%     +0.0%    +11.8%    +11.9%     -0.2%
> Max                  +0.1%     +0.0%    +68.7%    +67.8%     +7.1%
> Geometric Mean       +0.0%     +0.0%    +30.7%    +29.3%     +0.7%
>
> On Jul 1, 2011, at 2:45 PM, David Peixotto wrote:
>
>>
>> On Jul 1, 2011, at 2:05 PM, Simon Marlow wrote:
>>
>>> On 30/06/11 17:43, David Peixotto wrote:
>>>> I have made the changes necessary to compile GHC with llvm-gcc. The
>>>> major change was to use the pthread API for thread-local storage to
>>>> access the gct variable during garbage collection. My measurements
>>>> indicate this causes an average slowdown of about 5% for gc heavy
>>>> programs. The changes are available from the `clang` branch on my
>>>> github fork.
>>>
>>> Sounds good. One question: did you measure the GC performance with -threaded? Because the thread-specific variable in the GC is only used with -threaded.
>>>
>>
>> Oops, I totally forgot about that :\ Those numbers were actually for the non-threaded runtime, so they don't measure the changes to the GC, just the difference in compiling with llvm-gcc. I'll rerun the benchmarks with -threaded. Sorry about that!
>>
>>
>> _______________________________________________
>> Cvs-ghc mailing list
>> Cvs-ghc at haskell.org
>> http://www.haskell.org/mailman/listinfo/cvs-ghc
>>
>