LLVM backend and spilling

Simon Marlow marlowsd at gmail.com
Mon Sep 13 04:29:29 EDT 2010


On 13/09/2010 09:03, David Terei wrote:
> Hi Simon,
>
> Hmm, I've been trying to find a nasty performance regression when ghc
> is bootstrapped with llvm, this problem must be a large part of this
> so thanks.

Right, I expect the mfence on every update is costing quite a lot (one 
of those can be hundreds of cycles).

Thanks for looking at this!

Cheers,
`	Simon


>>   - All the spilling.  In the slow path there's a foreign call (to allocBlocks_lock)
>>     but that is annotated with the live registers, in this case [R1], so I don't
>>     understand why LLVM should be spilling everything.  Any ideas?
>
> I fixed a problem like this ox x86 a month or two  back. The issue is this:
>
> fun f (Base, Hp, Sp, R1, R2, R3, R4, R5, R6) {
>    // do some stuff
>    call foreign g() [R1];
>    // do some stuff
>    tail call f_next (Base, Hp', Sp', R1', R2, R3, R4, R5, R6);
> }
>
> Because of the use of the calling convention to pass the stg registers
> around llvm thinks they are all live across the call in the code
> above, since well they are. The fix is to explicitly kill the
> registers that aren't live across the call, which is easy as llvm
> provides a nice symbolic 'undef' value that can do just this. So above
> code needs to become:
>
> fun f (Base, Hp, Sp, R1, R2, R3, R4, R5, R6) {
>    // do some stuff
>    call foreign g() [R1];
>    // do some stuff
>    tail call f_next (Base, Hp', Sp', R1', undef, undef, undef, undef, undef);
> }
>
> What I actually do is slightly different then this but the concept is
> the same. Hope that makes sense. As I said I thought I had made this
> fix a while back but hand written Cmm generally tests very different
> code paths than compiler generated Cmm, so I must have missed a case.
> This should be an easy fix as well.
>
> I played around a little with compiling Updates.cmm on x86-32, will
> try soon on x64. The ncg gives this assembly (annotated with the
> corresponding cmm for my benefit):
>
> _stg_upd_frame_info:
> .Lcz:
>    movl 4(%ebp),%eax                 # bits32 updatee = b32[Sp +
> SIZEOF_StgHeader]
>    movl %eax,64(%esp)                # spill updatee
>    addl $8,%ebp                      # Sp = Sp + (SIZEOF_StgHeader + 4)
>    movl %esi,4(%eax)                 # gcptr[updatee + SIZEOF_StgHeader] = R1
>                                      # prim %write_barrier() [];
>    movl $_stg_BLACKHOLE_info,0(%eax) # b32[updatee] = stg_BLACKHOLE_info
>
>    movl %eax,%ecx                    # copy updatee
>    andl $-1048576,%ecx               # x = updatee&   ((1<<  20) - 1)&
> ~((1<<  12) - 1)
>    andl $1044480,%eax                # y = updatee&  ~((1<<  20) - 1)
>    shrl $7,%eax                      # x>>  (12 - 5)
>    orl %ecx,%eax                     # bd = x&  y
>
>    cmpw $0,28(%eax)                  #
>    jne .LcA                          # if ( b16[bd + 28] != 0): goto .LcA
>
>    jmp *0(%ebp)                      # else: jump %ENTRY_CODE( bits32[Sp])
>
>
> If I compile it with llvm (manually fixing the write barrier issue) I
> get the following llvm code:
>
> define cc10 void @stg_upd_frame_info(i32* noalias nocapture %Base_Arg,
> i32* noalias nocapture %Sp_Arg, i32* noalias nocapture %Hp_Arg, i32
> %R1_Arg) nounwind section ".text; .text 2#" align 4 {
> c2K:
>    %ln2M = getelementptr inbounds i32* %Sp_Arg, i32 1
>    %ln2Q = load i32* %ln2M
>    %ln2V = getelementptr inbounds i32* %Sp_Arg, i32 2
>    %ln2Y = add i32 %ln2Q, 4
>    %ln30 = inttoptr i32 %ln2Y to i32*
>    store i32 %R1_Arg, i32* %ln30
>    %ln34 = inttoptr i32 %ln2Q to i32*
>    store i32 ptrtoint ([0 x i32]* @stg_BLACKHOLE_info to i32), i32* %ln34
>    %ln3c = lshr i32 %ln2Q, 7
>    %ln3e = and i32 %ln3c, 8160
>    %ln3j = and i32 %ln2Q, -1048576
>    %ln3k = or i32 %ln3e, %ln3j
>    %ln3m1 = or i32 %ln3k, 28
>    %ln3n = inttoptr i32 %ln3m1 to i16*
>    %ln3o = load i16* %ln3n, align 4
>    %ln3p = icmp eq i16 %ln3o, 0
>    br i1 %ln3p, label %n3r, label %c3q
>
> n3r:                                              ; preds = %c2K
>    %ln3x = load i32* %ln2V
>    %ln3y = inttoptr i32 %ln3x to void (i32*, i32*, i32*, i32)*
>    tail call cc10 void %ln3y(i32* %Base_Arg, i32* %ln2V, i32* %Hp_Arg,
> i32 %R1_Arg) nounwind
>    ret void
>
> Which seems quite good to me. That compiles to the assembly:
>
> _stg_upd_frame_info:
>    subl	$20, %esp
>    movl	%edi, 16(%esp)          # 4-byte Spill
>    movl	4(%ebp), %edi
>    addl	$8, %ebp
>    movl	%esi, 8(%esp)           # 4-byte Spill
>    movl	%edi, %ecx
>    movl	%edi, %eax
>    shrl	$7, %ecx
>    andl	$-1048576, %eax         # imm = 0xFFFFFFFFFFF00000
>    andl	$8160, %ecx             # imm = 0x1FE0
>    addl	%eax, %ecx
>    movl	%esi, 4(%edi)
>    movl	$_stg_BLACKHOLE_info, (%edi)
>    movswl	28(%ecx), %eax
>    movl	%eax, 12(%esp)          # 4-byte Spill
>    testl	%eax, %eax
>    je	LBB1_4
> [...]
> LBB1_4:                                 # %n3r
>    movl	(%ebp), %eax
>    movl	16(%esp), %edi          # 4-byte Reload
>    addl	$20, %esp
>    jmpl	*%eax  # TAILCALL
>
> Which is OK but still worse than the ncg. Improving this will probably
> require talking to the llvm guys, I think the llvm register allocator
> may have some asssumptions/design decisions that interact badly with
> our calling convention.
>
> Roman and I were a while ago investigating an issue where llvm wasn't
> doing a very good job for some dph code, quite a few unnecessary
> spills. We thought it was an aliasing issue but in the end seemed to
> be an llvm problem with the instruction selector/scheduler creating a
> lot of register pressure.
>
> Cheers,
> David
>
>>
>> I don't expect us to do better than the NCG here, because the NCG code is just about optimal, but I would like to use -fllvm on other parts of the RTS code so it would be good if we could generate code that is at least as good as the NCG here.
>>
>> Cheers,
>>         Simon
>>
>>
>
> _______________________________________________
> Cvs-ghc mailing list
> Cvs-ghc at haskell.org
> http://www.haskell.org/mailman/listinfo/cvs-ghc



More information about the Cvs-ghc mailing list