Stack manipulation in LLVM backend
Sam Martin
sam.martin at geomerics.com
Mon Apr 5 14:46:02 EDT 2010
Hi guys,
Looks like there's a possible fix on that bug already.
This is pretty interesting discovery, and from the llvm-dev thread it sounds like those guys are aware of this but probably snowed under. I wonder how much mileage there might be in attempting to collar one of the devs for a chat (eg. irc, skype or whatever)? I wouldn't be surprised if they had a load of useful info to divulge, and could probably tell you whether tailcallopt is essential or not.
Anyway, just a thought.
Cheers,
Sam
-----Original Message-----
From: cvs-ghc-bounces at haskell.org on behalf of David Terei
Sent: Mon 05/04/2010 15:03
To: Max Bolingbroke
Cc: GHC Development Mailing List
Subject: Re: Stack manipulation in LLVM backend
Yeah I had noticed this as well recently when I tried out the sibling
call optimisation improvements in LLVM.
The interesting thing is, can we get away with removing the
'-tailcallopt' flag and just hoping for the best? It makes me a little
worried that the tail calls aren't guaranteed but for the code GHC
generates LLVM really shouldn't have any troubles picking them up with
the plain sibling call optimisation.
I ran the testsuite (fast) against the LLVM back-end on x86-32 today
without the '-tailcallopt' argument to llc and had everything pass
fine. I also ran nofib a couple of times and got a nice performance
improvement of 2%. I'll try some more tests and on a different
platform or two as well but its probably worth while getting rid of
the '-tailcallopt' argument for now.
Thanks for filing the bug report, will be interesting to see if there
is much of a response.
NoFib Results
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed
--------------------------------------------------------------------------------
anna +2.0% +0.0% 0.12 0.13
ansi +0.0% +0.0% 0.00 0.00
atom +0.0% +0.0% +8.6% +10.0%
awards +0.0% +0.0% 0.00 0.00
banner +0.0% +0.0% 0.00 0.00
bernouilli +0.0% +0.0% +0.5% +0.5%
boyer +0.0% +0.0% 0.04 0.05
boyer2 +0.1% +0.0% 0.01 0.01
bspt +0.5% +0.0% 0.01 0.01
cacheprof +0.6% +0.0% +2.6% +2.9%
calendar +0.0% +0.0% 0.00 0.00
cichelli +0.1% +0.0% 0.10 0.10
circsim +0.2% +0.0% +0.7% +0.2%
clausify +0.0% +0.0% 0.05 0.05
comp_lab_zift +0.1% +0.0% +0.0% +0.9%
compress +0.1% +0.0% 0.20 0.21
compress2 +0.1% +0.0% 0.16 0.18
constraints +0.1% +0.0% -1.4% +1.5%
cryptarithm1 +0.0% +0.0% +5.1% +1.3%
cryptarithm2 +0.1% +0.0% 0.02 0.01
cse +0.1% +0.0% 0.00 0.00
eliza +0.0% +0.0% 0.00 0.00
event +0.0% +0.0% 0.16 0.18
exp3_8 +0.0% +0.0% 0.11 0.12
expert +0.1% +0.0% 0.00 0.00
fem +0.5% -0.0% 0.02 0.03
fft +0.1% +0.0% 0.03 0.03
fft2 +0.1% +0.0% 0.08 0.09
fibheaps +0.1% +0.0% 0.03 0.04
fish +0.1% +0.0% 0.02 0.02
fluid +0.8% +0.0% 0.01 0.01
fulsom +0.6% +0.0% +2.3% +0.7%
gamteb +0.2% +0.0% 0.10 0.10
gcd +0.0% +0.0% 0.03 0.03
gen_regexps +0.0% +0.0% 0.00 0.00
genfft +0.0% +0.0% 0.04 0.04
gg +0.6% +0.0% 0.01 0.01
grep +0.2% +0.0% 0.00 0.00
hidden +0.4% -0.0% +1.3% -2.5%
hpg +0.4% +0.0% 0.17 +2.7%
ida +0.1% +0.0% 0.11 0.12
infer +0.4% +0.0% 0.06 0.06
integer +0.0% +0.0% +3.0% +4.2%
integrate +0.0% +0.0% -1.4% -3.0%
knights +0.2% +0.0% 0.00 0.01
lcss +0.0% +0.0% +3.8% +1.1%
life +0.0% +0.0% +4.9% +4.0%
lift +0.3% +0.0% 0.00 0.00
listcompr +0.0% +0.0% 0.11 0.11
listcopy +0.1% +0.0% 0.11 0.12
maillist +0.0% -0.0% 0.06 0.10
mandel +0.1% -0.0% 0.13 0.13
mandel2 +0.0% +0.0% 0.01 0.01
minimax +0.1% +0.0% 0.00 0.00
mkhprog +0.1% +0.0% 0.00 0.00
multiplier +0.2% +0.0% 0.13 0.14
nucleic2 +0.2% +0.0% 0.09 0.09
para +0.2% +0.0% +0.0% -0.9%
paraffins +0.1% +0.0% 0.10 0.11
parser +0.5% +0.0% 0.04 0.04
parstof +0.3% +0.0% 0.00 0.01
pic +0.2% +0.0% 0.01 0.01
power +0.2% +0.0% +0.8% +3.5%
pretty +0.1% +0.0% 0.00 0.00
primes +0.0% +0.0% 0.05 0.06
primetest +0.1% +0.0% -2.4% -3.4%
prolog +0.2% +0.0% 0.00 0.00
puzzle +0.1% +0.0% 0.20 +5.9%
queens +0.0% +0.0% 0.02 0.02
reptile +0.3% +0.0% 0.02 0.02
rewrite +0.2% +0.0% 0.02 0.02
rfib +0.0% +0.0% 0.04 0.04
rsa +0.0% +0.0% 0.09 0.09
scc +0.0% +0.0% 0.00 0.00
sched +0.0% +0.0% 0.03 0.03
scs +0.5% -0.0% -1.3% -2.0%
simple +0.8% +0.0% +3.8% +0.0%
solid +0.1% +0.0% 0.18 -10.6%
sorting +0.0% +0.0% 0.00 0.00
sphere +0.3% +0.0% 0.12 0.13
symalg +0.4% +0.0% 0.04 0.04
tak +0.0% +0.0% 0.02 0.02
transform +0.4% +0.0% +4.3% +0.5%
treejoin +0.0% -0.0% +1.7% -12.3%
typecheck +0.1% +0.0% +1.6% +0.7%
veritas +1.6% +0.0% 0.00 0.00
wang +0.1% +0.0% 0.10 0.11
wave4main +0.1% +0.0% +2.9% +2.8%
wheel-sieve1 +0.0% +0.0% +10.1% +9.9%
wheel-sieve2 +0.0% +0.0% 0.17 -2.9%
x2n1 +0.0% +0.0% 0.02 0.03
--------------------------------------------------------------------------------
Min +0.0% -0.0% -2.4% -12.3%
Max +2.0% +0.0% +10.1% +10.0%
Geometric Mean +0.2% -0.0% +2.2% +0.5%
On 4 April 2010 21:52, Max Bolingbroke <batterseapower at hotmail.com> wrote:
> I've filed this as bug 6774 on the LLVM trac
> (http://llvm.org/bugs/show_bug.cgi?id=6774), so the issue at least
> stands a chance of being fixed.
>
> On 2 April 2010 16:02, Max Bolingbroke <batterseapower at hotmail.com> wrote:
>> Hi,
>>
>> I've narrowed down one of the problems with the LLVM backend that
>> David descried in his thesis. LLVM sometimes generates gratuitous
>> extra stack manipulations.
>>
>> For example, consider this input program:
>>
>> {{{
>> module Toy where
>>
>> toy :: (Int -> Maybe Int) -> Int
>> toy f = case f 10 of Just x -> x; Nothing -> 20
>> }}}
>>
>> Compiled using ghc -O (which runs llvl's "opt -O2"), we get the
>> following LLVM IR:
>>
>> {{{
>> define cc10 void @Toy_toy_entry(i32 %stg_terei_baseArg, i32
>> %stg_terei_spArg, i32 %stg_terei_hpArg, i32 %stg_terei_r1Arg) nounwind
>> {
>> cdx:
>> %ndz = add i32 %stg_terei_spArg, -4 ; <i32> [#uses=3]
>> %ndB = add i32 %stg_terei_baseArg, 84 ; <i32> [#uses=1]
>> %ndC = inttoptr i32 %ndB to i32* ; <i32*> [#uses=1]
>> %ndD = load i32* %ndC ; <i32> [#uses=1]
>> %ndE = icmp ult i32 %ndz, %ndD ; <i1> [#uses=1]
>> br i1 %ndE, label %cdG, label %ndH
>>
>> ndH: ; preds = %cdx
>> %ndJ = inttoptr i32 %stg_terei_spArg to i32* ; <i32*> [#uses=2]
>> %ndK = load i32* %ndJ ; <i32> [#uses=1]
>> %ndP = inttoptr i32 %ndz to i32* ; <i32*> [#uses=1]
>> store i32 add (i32 ptrtoint (%Toy_toy1_closure_struct*
>> @Toy_toy2_closure to i32), i32 1), i32* %ndP
>> store i32 ptrtoint (%sbo_info_struct* @sbo_info to i32), i32* %ndJ
>> tail call cc10 void @stg_ap_p_fast(i32 %stg_terei_baseArg, i32 %ndz,
>> i32 %stg_terei_hpArg, i32 %ndK) nounwind
>> ret void
>>
>> cdG: ; preds = %cdx
>> %ne1 = add i32 %stg_terei_baseArg, -4 ; <i32> [#uses=1]
>> %ne2 = inttoptr i32 %ne1 to i32* ; <i32*> [#uses=1]
>> %ne3 = load i32* %ne2 ; <i32> [#uses=1]
>> %ne4 = inttoptr i32 %ne3 to void (i32, i32, i32, i32)* ; <void (i32,
>> i32, i32, i32)*> [#uses=1]
>> tail call cc10 void %ne4(i32 %stg_terei_baseArg, i32
>> %stg_terei_spArg, i32 %stg_terei_hpArg, i32 ptrtoint
>> (%Toy_toy_closure_struct* @Toy_toy_closure to i32)) nounwind
>> ret void
>> }
>> }}}
>>
>> So far so good. However, when we compile this with "llc -tailcallopt
>> -O2 -march=x86", we get:
>>
>> {{{
>> _Toy_toy_entry: ## @Toy_toy_entry
>> ## BB#0: ## %cdx
>> subl $12, %esp
>> leal -4(%ebp), %eax
>> cmpl 84(%ebx), %eax
>> jb LBB2_2
>> ## BB#1: ## %ndH
>> movl (%ebp), %esi
>> movl $_Toy_toy2_closure+1, -4(%ebp)
>> movl $_sbo_info, (%ebp)
>> movl %eax, %ebp
>> addl $12, %esp
>> jmp _stg_ap_p_fast # TAILCALL
>> LBB2_2: ## %cdG
>> movl -4(%ebx), %eax
>> movl $_Toy_toy_closure, %esi
>> addl $12, %esp
>> jmpl *%eax # TAILCALL
>>
>> .globl ___stginit_Toy_
>> .align 4, 0x90
>> }}}
>>
>> Note the "subl $12, %esp" and "addl $12, %esp" pairs. %esp is never
>> actually used, so this fiddling is entirely pointless! This seems to
>> happen in every single function LLVM compiles, so I imagine fixing the
>> problem would be quite a big win.
>>
>> Without -tailcallopt, we get nice code:
>>
>> {{{
>> _Toy_toy_entry: ## @Toy_toy_entry
>> ## BB#0: ## %cdx
>> leal -4(%ebp), %eax
>> cmpl 84(%ebx), %eax
>> jb LBB2_2
>> ## BB#1: ## %ndH
>> movl (%ebp), %esi
>> movl $_Toy_toy2_closure+1, -4(%ebp)
>> movl $_sbo_info, (%ebp)
>> movl %eax, %ebp
>> jmp _stg_ap_p_fast # TAILCALL
>> LBB2_2: ## %cdG
>> movl -4(%ebx), %eax
>> movl $_Toy_toy_closure, %esi
>> jmpl *%eax # TAILCALL
>> }}}
>>
>> Unfortunately -tailcallopt is apparently the only way to get
>> *guaranteed* tail calls. The LLVM developers appear to know that
>> tailcallopt causes this sort of rubbish in the output
>> (http://old.nabble.com/Removing--tailcallopt--td27475523.html) but
>> obviously haven't fixed it (my results are with the very recent LLVM
>> r100183).
>>
>> I couldn't get the stack manipulations to go away with any amount of
>> "noreturn" and "unreachable" annotations around "tail call"s and
>> function definitions. It was worth a try though :-)
>>
>> In summary, it looks like anyone wanting to fix the excess stack
>> manipulations issue is going to have to get their hands messy and
>> delve into LLVM's LLC!
>>
>> Cheers,
>> Max
>>
>
> _______________________________________________
> Cvs-ghc mailing list
> Cvs-ghc at haskell.org
> http://www.haskell.org/mailman/listinfo/cvs-ghc
>
_______________________________________________
Cvs-ghc mailing list
Cvs-ghc at haskell.org
http://www.haskell.org/mailman/listinfo/cvs-ghc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/cvs-ghc/attachments/20100405/4f7c27dc/attachment-0001.html
More information about the Cvs-ghc
mailing list