Optimizing the placement of heap checks (trac #1498)

David Peixotto dmp at rice.edu
Mon Mar 15 14:35:44 EDT 2010


I have some questions after starting to look at implementing an optimization in the GHC back end related to Trac ticket #1498[1,2]. The goal of the optimization is to eliminate unnecessary heap checks in recursive functions. 

For example, if we have a recursive function such as

f :: Int -> Int -> Maybe Int
f = \x y -> case x of
             0 ->   (Just y)
             _ -> f (x-1) y

we see that it only does an allocation in the base case of the recursion. GHC HEAD generates great looking core for this function (-ddump-stg):

Main.$wf =
    \r [ww_sSX w_sT1]
        case ww_sSX of ds_sSZ {
          __DEFAULT ->
              case -# [ds_sSZ 1] of sat_sT8 {
                __DEFAULT -> Main.$wf sat_sT8 w_sT1;
              };
          0 -> Data.Maybe.Just [w_sT1];
        };

With the corresponding Cmm code (-ddump-cmm):

Main_zdwf_entry()
<...info table elided...>
cT6:
    Hp = Hp + 16;
    if (Hp > HpLim) goto cT9;
    _sSS::I64 = R2;
    if (_sSS::I64 != 0) goto cTc;
    I64[Hp - 8] = base_DataziMaybe_Just_con_info;
    I64[Hp + 0] = R3;
    R1 = Hp - 6;
    jump (I64[Sp + 0]) ();
cT9:
    HpAlloc = 16;
    R1 = Main_zdwf_closure;
    jump stg_gc_fun ();
cTc:
    _sT1::I64 = _sSS::I64 - 1;
    R2 = _sT1::I64;
    Hp = Hp - 16;
    jump Main_zdwf_info ();
}

Now we have a nice looking loop, but the straightforward code generation done in the native back end causes a heap check on every iteration giving these extra overheads on each iteration:

  1) incrementing the heap pointer
  2) loading the heap limit
  3) testing the heap pointer against the limit
  4) decrementing the heap pointer

It was suggested in an earlier discussion[2] that we could get rid of this overhead by pushing the heap check down into the branch that actually does the allocation. It is noted that this is not a safe transformation in general because the heap check can only be done at a "safe point" for the GC.

Interestingly, the LLVM backend gets good code by moving the heap check in the opposite direction. It pulls the heap check up out of the loop that it generates for the tail-recursive call. It ends up with the three instruction loop that you would get if you were to write the loop by hand (decrement, test, branch) and does the allocation in the fall through case of the loop. 

The LLVM generated code shows the advantage of having powerful low-level transformations in GHC. If the LLVM is invoked with -O instead of -O2, it still transforms the recursive call to a loop, but does not hoist the heap check out of the loop. The assembly codes from the two back ends are pasted at the end of the message for the interested.

Looking into this issue brings me to ask a few questions. Any feedback is welcomed.

1) Does it still make sense to pursue these optimizations in the native code generator?

Given the strong optimizations already in LLVM is it worth investing time in the GHC native backend? One real advantage I see with the native back end (and an area I would like to explore) is that it is easier to pass information from the high-level optimizer to the low-level optimizer when it all resides in the same compiler. This may allow certain optimizations to fire that would be very difficult to prove safe in the LLVM back end.

2) What is the definition of a safe point for the GC? 

My understanding is that currently all heap checks are done on entry to a closure, but this could be relaxed as long as we can provide a safe continuation point where we can resume execution after GC. One idea that could help in the example above would be to outline the allocation points into separate functions at the cmm level so that f no longer allocates any data and the check is only performed in the new outlined function in the base case. It seems to me that there could be some interesting optimization opportunities for code motion of heap checks.

3) What is the status of the new code generator? 

I've seen a number of wiki pages describing the new code generation pipeline[3,4], but it is a little difficult to tell the exact status of the implementation. Are people still actively working on the new code generator or has most of the work been completed?


Thanks!

-Dave

[1] http://hackage.haskell.org/trac/ghc/ticket/1498
[2] http://www.haskell.org/pipermail/cvs-ghc/2007-July/036496.html
[3] http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGen
[4] http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline

#
# LLVM ASSEMBLY (x86_64)
#
Main_zdwf_entry:                        # @Main_zdwf_entry
# BB#0:                                 # %cVo
	subq	$8, %rsp
	leaq	16(%r12), %rax
	cmpq	144(%r13), %rax
	ja	.LBB1_3
	jmp	.LBB1_1
	.align	16
.LBB1_4:                                # %cVG
                                        #   in Loop: Header=BB1_1 Depth=1
	decq	%r14
.LBB1_1:                                # %nVA
                                        # =>This Inner Loop Header: Depth=1
	testq	%r14, %r14
	jne	.LBB1_4
# BB#2:                                 # %nVH
	movq	$base_DataziMaybe_Just_con_info, 8(%r12)
	movq	%rsi, 16(%r12)
	movq	%r12, %rbx
	movq	%rax, %r12
	movq	(%rbp), %rcx
	addq	$10, %rbx
	movq	(%rcx), %r11
	addq	$8, %rsp
	jmpq	*%r11  # TAILCALL
.LBB1_3:                                # %cVz
	movq	$16, 184(%r13)
	movl	$Main_zdwf_closure, %ebx
	movq	%rax, %r12
	movq	-8(%r13), %r11
	addq	$8, %rsp
	jmpq	*%r11  # TAILCALL
	
#
# NATIVE BACK-END ASSEMBLY (x86_64)
#	
Main_zdwf_info:
.LcTh:
	addq $16,%r12
	cmpq 144(%r13),%r12
	ja .LcTk
	movq %r14,%rax
	testq %rax,%rax
	jne .LcTn
	movq $base_DataziMaybe_Just_con_info,-8(%r12)
	movq %rsi,(%r12)
	leaq -6(%r12),%rbx
	jmp *(%rbp)
.LcTk:
	movq $16,184(%r13)
	movl $Main_zdwf_closure,%ebx
	jmp *-8(%r13)
.LcTn:
	leaq -1(%rax),%r14
	addq $-16,%r12
	jmp Main_zdwf_info

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/cvs-ghc/attachments/20100315/bb9808af/attachment.html


More information about the Cvs-ghc mailing list