Merge Request: LLVM Code Generator for GHC

Simon Marlow marlowsd at gmail.com
Wed Feb 24 11:40:44 EST 2010


On 22/02/2010 16:49, Simon Marlow wrote:
> On 22/02/2010 12:34, Simon Marlow wrote:
>
>> I'm currently running some benchmarks to see how much impact turning off
>> TNTC has on the -fasm backend.
>
> Here are the results on x86-64/Linux:
[ snip ]
> --------------------------------------------------------------------------------
>
> Min            +4.7% -0.0%  -0.6%  -1.7%
> Max            +8.9% +0.0% +16.9% +13.8%
> Geometric Mean +6.1% -0.0%  +4.9%  +4.2%

and here are the results on x86/Linux:

--------------------------------------------------------------------------------
         Program           Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
            anna          +6.9%     +0.0%     +7.1%     +7.4%
            ansi          +4.3%     +0.0%      0.00      0.00
            atom          +4.5%     +0.0%    +23.6%    +21.7%
          awards          +4.2%     +0.0%      0.00      0.00
          banner          +3.5%     +0.0%      0.00      0.00
      bernouilli          +4.2%     +0.0%     +2.7%     +1.8%
           boyer          +4.3%     +0.0%      0.10      0.11
          boyer2          +4.1%     +0.0%      0.01      0.02
            bspt          +5.5%     +0.0%      0.02      0.02
       cacheprof          +5.3%     +0.0%     +3.1%     +3.0%
        calendar          +4.2%     +0.0%      0.00      0.00
        cichelli          +4.2%     +0.0%      0.19      0.22
         circsim          +4.6%     +0.0%     +3.3%     +2.5%
        clausify          +4.3%     +0.0%      0.07      0.09
   comp_lab_zift          +4.5%     +0.0%    +15.3%    +14.4%
        compress          +4.4%     +0.0%     +4.1%     +4.3%
       compress2          +4.3%     +0.0%     +0.5%     +0.4%
     constraints          +4.5%     +0.0%     +6.4%     +5.9%
    cryptarithm1          +3.8%     +0.0%     +5.3%     +3.3%
    cryptarithm2          +4.0%     +0.0%      0.03      0.03
             cse          +3.9%     +0.0%      0.00      0.00
           eliza          +3.6%     +0.0%      0.00      0.00
           event          +4.3%     +0.0%     +7.9%     +7.5%
          exp3_8          +4.2%     +0.0%    +17.8%    +13.3%
          expert          +4.1%     +0.0%      0.00      0.00
             fem          +5.5%     +0.0%      0.06      0.06
             fft          +4.6%     +0.0%      0.09      0.10
            fft2          +4.9%     +0.0%      0.22    +12.3%
        fibheaps          +4.3%     +0.0%      0.08      0.08
            fish          +4.0%     +0.0%      0.05      0.06
           fluid          +6.3%     +0.0%      0.02      0.02
          fulsom          +6.1%     +0.0%     +3.4%     +3.2%
          gamteb          +5.0%     +0.0%      0.19      0.21
             gcd          +4.2%     +0.0%      0.06      0.07
     gen_regexps          +4.0%     +0.0%      0.00      0.00
          genfft          +4.2%     +0.0%      0.09      0.10
              gg          +5.1%     +0.0%      0.03      0.03
            grep          +4.5%     +0.0%      0.00      0.00
          hidden          +5.7%  (stdout)  (stdout)  (stdout)
             hpg          +5.2%     +0.0%     +6.1%     +2.0%
             ida          +4.4%     +0.0%    +10.2%     +6.6%
           infer          +4.9%     +0.0%      0.13      0.14
         integer          +4.2%     +0.0%     +1.2%     -0.2%
       integrate          +4.6%     +0.0%     +4.9%     +5.0%
         knights          +4.6%     +0.0%      0.01      0.01
            lcss          +4.2%     +0.0%     +8.5%     +7.7%
            life          +3.8%     +0.0%    +23.8%    +19.5%
            lift          +4.5%     +0.0%      0.00      0.00
       listcompr          +3.8%     +0.0%     +5.3%     +4.7%
        listcopy          +3.8%     +0.0%     +5.7%     +6.3%
        maillist          +4.0%     +0.0%      0.15     +6.1%
          mandel          +4.5%     +0.0%     -0.6%     -2.4%
         mandel2          +3.9%     +0.0%      0.02      0.02
         minimax          +4.2%     +0.0%      0.01      0.01
         mkhprog          +4.2%     +0.0%      0.00      0.01
      multiplier          +4.4%     +0.0%    +10.0%    +10.6%
        nucleic2          +4.6%     +0.0%    +16.8%    +15.0%
            para          +4.4%     +0.0%    +11.7%     +9.7%
       paraffins          +4.3%     +0.0%     -1.9%     +0.8%
          parser          +5.0%     +0.0%      0.08      0.08
         parstof          +4.8%     +0.0%      0.02      0.02
             pic          +5.0%     +0.0%      0.03      0.03
           power          +4.4%     +0.0%     +2.7%     +2.7%
          pretty          +4.4%     +0.0%      0.00      0.00
          primes          +4.2%     +0.0%      0.12      0.13
       primetest          +4.3%     +0.0%     -0.9%     +0.5%
          prolog          +4.2%     +0.0%      0.00      0.00
          puzzle          +4.1%     +0.0%     +8.7%     +7.8%
          queens          +4.2%     +0.0%      0.03      0.03
         reptile          +5.1%     +0.0%      0.03      0.04
         rewrite          +4.6%     +0.0%      0.02      0.03
            rfib          +4.5%     +0.0%      0.12      0.12
             rsa          +4.3%     +0.0%      0.17      0.18
             scc          +3.7%     +0.0%      0.00      0.00
           sched          +4.3%     +0.0%      0.05      0.05
             scs          +5.7%     +0.0%     +2.3%     +1.3%
          simple          +6.8%     +0.0%     +5.6%     +5.8%
           solid          +4.5%     +0.0%    +11.1%     +6.6%
         sorting          +4.0%     +0.0%      0.00      0.00
          sphere          +5.3%     +0.0%    +17.2%    +12.9%
          symalg          +5.3%     +0.0%      0.10      0.10
             tak          +4.2%     +0.0%      0.02      0.02
       transform          +4.9%     +0.0%     +2.2%     +2.1%
        treejoin          +3.7%     +0.0%     -0.4%     +2.7%
       typecheck          +4.3%     +0.0%    -23.8%    -24.1%
         veritas          +6.5%     +0.0%      0.00      0.00
            wang          +4.6%     +0.0%     +8.0%     +7.7%
       wave4main          +4.4%     +0.0%     +5.2%     +5.3%
    wheel-sieve1          +4.2%     +0.0%    +10.0%     +8.8%
    wheel-sieve2          +4.2%     +0.0%     +2.1%     +2.2%
            x2n1          +4.6%     +0.0%      0.06      0.06
--------------------------------------------------------------------------------
             Min          +3.5%     +0.0%    -23.8%    -24.1%
             Max          +6.9%     +0.0%    +23.8%    +21.7%
  Geometric Mean          +4.5%     -0.0%     +6.0%     +5.3%

Slightly worse than the x86_64 results, though this is an older processor.

The result for typecheck is very odd.  It's repeatable, but only on this 
machine - I suspect a bad cache interaction or similar.  I should 
probably re-run the tests on a machine with a more recent processor.

While I was at it, I measured the -fvia-C backend against the NCG:

--------------------------------------------------------------------------------
         Program           Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
             Min          -7.7%    -47.3%    -33.0%    -30.2%
             Max          -3.8%     +0.0%    +29.5%    +28.8%
  Geometric Mean          -5.0%     -0.7%     -6.7%     -5.7%

While we weren't looking, the via-C backend has regressed a lot, at 
least on these "average" Haskell programs.  The +29.5% outlier is 
typecheck again, since I'm using the same set of results for -fasm as 
above.

I think the main reason for the regression is code like this:

	movl	$stg_ap_n_fast, %eax
.L3:
	jmp	*%eax
.L2:
	movl	$8, 112(%edx)
	movl	-8(%ebx), %eax
	jmp .L3

gcc is being too clever in commoning up the indirect jump.

Conclusion: don't use -fvia-C, even in 6.12, unless you are sure it 
speeds things up.  I'm turning it off for our builds.

===============

So here's a crazy idea.  Why don't we post-process the assembly code 
coming out of LLVM?  Before you throw up your hands in horror, consider that

  - it's a simple transformation, just re-ordering blocks of code

  - we can do it in Haskell using ByteStrings, it would probably
    amount to a couple of hundred lines of code at the most.  Perhaps
    an Alex lexer would be the quickest way to split into blocks, then
    a bit of Haskell to glue them back into the correct order.  We may
    have to fiddle with the .aligns a bit.

  - we don't care too much about compile-time performance: LLVM is
    a -O2 thing, and we have the NCG for generating code fast

  - at the same time we can talk with the LLVM folks about adding
    support for TNTC, but we'd have a way to generate code in the
    meantime.
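
To make the block re-ordering idea concrete, here is a minimal sketch in
Haskell using ByteStrings.  It is only an illustration under stated
assumptions, not a worked-out mangler: the "_tbl" label suffix used to pair
a table with its code, and all the function names, are hypothetical, and the
sketch ignores .section and .align directives entirely.  It splits the
assembly at top-level labels (local labels such as .L3 stay inside their
block) and moves each "foo_tbl" block to sit immediately before "foo".

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.Char (isSpace)
import Data.Maybe (isNothing)

-- | A top-level label (without the trailing ':') and the lines up to the next one.
data Block = Block { blockLabel :: B.ByteString, blockBody :: [B.ByteString] }

-- | HYPOTHETICAL naming convention: the table for code label "foo" is "foo_tbl".
tableSuffix :: B.ByteString
tableSuffix = B.pack "_tbl"

-- | A top-level label starts in column 0, ends in ':' and is not a local ".L" label.
isLabel :: B.ByteString -> Bool
isLabel l = not (B.null l) && B.last l == ':'
         && B.head l /= '.' && not (isSpace (B.head l))

-- | Split into the lines before the first label and the labelled blocks.
splitBlocks :: [B.ByteString] -> ([B.ByteString], [Block])
splitBlocks ls = (preamble, go rest)
  where
    (preamble, rest) = break isLabel ls
    go []         = []
    go (l : more) = let (body, rest') = break isLabel more
                    in Block (B.init l) body : go rest'

-- | Move every table block so that it directly precedes its code block.
--   Tables with no matching code block are left where they were.
reorder :: [Block] -> [Block]
reorder blocks = concatMap place blocks
  where
    owner b
      | tableSuffix `B.isSuffixOf` lbl =
          Just (B.take (B.length lbl - B.length tableSuffix) lbl)
      | otherwise = Nothing
      where lbl = blockLabel b
    codeLabels = S.fromList [ blockLabel b | b <- blocks, isNothing (owner b) ]
    tables     = M.fromList [ (o, b) | b <- blocks, Just o <- [owner b]
                                     , o `S.member` codeLabels ]
    place b = case owner b of
      Just o | o `M.member` tables -> []    -- emitted together with its code
      Just _                       -> [b]   -- orphan table: keep in place
      Nothing -> maybe [b] (\t -> [t, b]) (M.lookup (blockLabel b) tables)

-- | Glue the preamble and the re-ordered blocks back together.
mangle :: B.ByteString -> B.ByteString
mangle src = B.unlines (preamble ++ concatMap blockLines (reorder blocks))
  where
    (preamble, blocks)      = splitBlocks (B.lines src)
    blockLines (Block l bs) = (l `B.append` B.pack ":") : bs
```

Running mangle over the text of a .s file would be the whole pass in this
sketch; a real version would also have to keep section and alignment
directives attached to the right block.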

Just a thought...

Cheers,
	Simon


