factorial: let's get ahead of jhc! :)

Fri Feb 24 11:18:30 EST 2006

Bulat Ziganshin wrote:

> i propose to do for start rather small change that will allow to
> make things go several times faster and in particular outperform jhc
> for the small "leaf" loops (i.e. loops that use only GHC primitives
> and don't call other functions). that include factorial, "a[]+=b[i]"
> loop that was discussed here several weeks ago, some shootout
> examples, my own arrays serialization code and so on, so on
> 
> the idea is that the following STG code
> 
> f :: Int# -> Double# -> ... -> Int#
> 
> (i.e. function use only unboxed values/results)
> 
> f x y z = code1 in
>   case expr1 of
>     A -> codeA
>     B -> codeB
>     C -> codeC in f x_c y_c z_c
>     D -> codeD in f x_d y_d z_d
>     _ -> code0 in f x0 y0 z0
> 
> can be compiled to the following C/C-- code:
> 
> f() {
>   x = sp[0];
>   y = sp[4];
>   z = sp[8];
>   sp+=12;
> 
>   code1;
>   while (expr1!=A) {
>     if (expr1==B) then return codeB;
>     else if (expr1==C) then {x=x_c; y=y_c; z=z_c;}
>     else if (expr1==D) then {x=x_d; y=y_d; z=z_d;}
>     else {x=x0; y=y0; z=z0;}
>     code1;
>   }
>   return codeA;
> }
> 
> this compilation don't require any changes in GHC memory model. all we
> need:
> 
> 1) add for/if/while to the C-- statement types list (data CmmStmt)

Please don't extend the C-- data type - it's very small and simple 
because that makes it easy to manipulate and reason about.

I don't see why you need to, either: you can already express for, if and 
while in C-- using conditional branches.  I don't think gcc cares 
whether you write your code using labels and goto or while/for/if, it 
generates the same code either way.

> 2) implement recognizer for such simple STG functions (leaf unboxed
> procedures recognizer)
> 
> 3) implement the translation itself

By all means try this.  What you want is to compile recursive functions 
like this:

   f() {
    x = arg1;
    y = arg2;
   L:
    ... body of f, with args mapped to x & y,
        and recursive calls jumping to L passing args in x & y.
   }

It's quite hard to do this as a C-- to C-- optimisation, at least 
without implementing a lot of other optimisations that we don't already 
have.  I was hoping that gcc would do it for us, if we compile code like 
this:

   f() {
   L:
    ... body of f ...
    goto L;
   }

but sadly it doesn't do the whole job.  (see cmmLoopifyForC in 
cmm/CmmOpt.hs, which I added today).

So you might try the hacky way of doing this transformation at a higher 
level, when generating the C-- in the first place.  Putting the args in 
temporaries is the easy bit; generating different code for the recursive 
call will be more tricky.

Cheers,
	Simon