Stack manipulation in LLVM backend
Max Bolingbroke
batterseapower at hotmail.com
Fri Apr 2 11:02:49 EDT 2010
Hi,
I've narrowed down one of the problems with the LLVM backend that
David descried in his thesis. LLVM sometimes generates gratuitous
extra stack manipulations.
For example, consider this input program:
{{{
module Toy where
toy :: (Int -> Maybe Int) -> Int
toy f = case f 10 of Just x -> x; Nothing -> 20
}}}
Compiled using ghc -O (which runs llvl's "opt -O2"), we get the
following LLVM IR:
{{{
define cc10 void @Toy_toy_entry(i32 %stg_terei_baseArg, i32
%stg_terei_spArg, i32 %stg_terei_hpArg, i32 %stg_terei_r1Arg) nounwind
{
cdx:
%ndz = add i32 %stg_terei_spArg, -4 ; <i32> [#uses=3]
%ndB = add i32 %stg_terei_baseArg, 84 ; <i32> [#uses=1]
%ndC = inttoptr i32 %ndB to i32* ; <i32*> [#uses=1]
%ndD = load i32* %ndC ; <i32> [#uses=1]
%ndE = icmp ult i32 %ndz, %ndD ; <i1> [#uses=1]
br i1 %ndE, label %cdG, label %ndH
ndH: ; preds = %cdx
%ndJ = inttoptr i32 %stg_terei_spArg to i32* ; <i32*> [#uses=2]
%ndK = load i32* %ndJ ; <i32> [#uses=1]
%ndP = inttoptr i32 %ndz to i32* ; <i32*> [#uses=1]
store i32 add (i32 ptrtoint (%Toy_toy1_closure_struct*
@Toy_toy2_closure to i32), i32 1), i32* %ndP
store i32 ptrtoint (%sbo_info_struct* @sbo_info to i32), i32* %ndJ
tail call cc10 void @stg_ap_p_fast(i32 %stg_terei_baseArg, i32 %ndz,
i32 %stg_terei_hpArg, i32 %ndK) nounwind
ret void
cdG: ; preds = %cdx
%ne1 = add i32 %stg_terei_baseArg, -4 ; <i32> [#uses=1]
%ne2 = inttoptr i32 %ne1 to i32* ; <i32*> [#uses=1]
%ne3 = load i32* %ne2 ; <i32> [#uses=1]
%ne4 = inttoptr i32 %ne3 to void (i32, i32, i32, i32)* ; <void (i32,
i32, i32, i32)*> [#uses=1]
tail call cc10 void %ne4(i32 %stg_terei_baseArg, i32
%stg_terei_spArg, i32 %stg_terei_hpArg, i32 ptrtoint
(%Toy_toy_closure_struct* @Toy_toy_closure to i32)) nounwind
ret void
}
}}}
So far so good. However, when we compile this with "llc -tailcallopt
-O2 -march=x86", we get:
{{{
_Toy_toy_entry: ## @Toy_toy_entry
## BB#0: ## %cdx
subl $12, %esp
leal -4(%ebp), %eax
cmpl 84(%ebx), %eax
jb LBB2_2
## BB#1: ## %ndH
movl (%ebp), %esi
movl $_Toy_toy2_closure+1, -4(%ebp)
movl $_sbo_info, (%ebp)
movl %eax, %ebp
addl $12, %esp
jmp _stg_ap_p_fast # TAILCALL
LBB2_2: ## %cdG
movl -4(%ebx), %eax
movl $_Toy_toy_closure, %esi
addl $12, %esp
jmpl *%eax # TAILCALL
.globl ___stginit_Toy_
.align 4, 0x90
}}}
Note the "subl $12, %esp" and "addl $12, %esp" pairs. %esp is never
actually used, so this fiddling is entirely pointless! This seems to
happen in every single function LLVM compiles, so I imagine fixing the
problem would be quite a big win.
Without -tailcallopt, we get nice code:
{{{
_Toy_toy_entry: ## @Toy_toy_entry
## BB#0: ## %cdx
leal -4(%ebp), %eax
cmpl 84(%ebx), %eax
jb LBB2_2
## BB#1: ## %ndH
movl (%ebp), %esi
movl $_Toy_toy2_closure+1, -4(%ebp)
movl $_sbo_info, (%ebp)
movl %eax, %ebp
jmp _stg_ap_p_fast # TAILCALL
LBB2_2: ## %cdG
movl -4(%ebx), %eax
movl $_Toy_toy_closure, %esi
jmpl *%eax # TAILCALL
}}}
Unfortunately -tailcallopt is apparently the only way to get
*guaranteed* tail calls. The LLVM developers appear to know that
tailcallopt causes this sort of rubbish in the output
(http://old.nabble.com/Removing--tailcallopt--td27475523.html) but
obviously haven't fixed it (my results are with the very recent LLVM
r100183).
I couldn't get the stack manipulations to go away with any amount of
"noreturn" and "unreachable" annotations around "tail call"s and
function definitions. It was worth a try though :-)
In summary, it looks like anyone wanting to fix the excess stack
manipulations issue is going to have to get their hands messy and
delve into LLVM's LLC!
Cheers,
Max
More information about the Cvs-ghc
mailing list