[Haskell-cafe] Re: #haskell works

Thu Dec 20 08:01:59 EST 2007

Tim Chevalier wrote:
> On 12/14/07, Dan Piponi <dpiponi at gmail.com> wrote:
>> There have been some great improvements in array handling recently. I
>> decided to have a look at the assembly language generated by some
>> simple array manipulation code and understand why C is at least twice
>> as fast as ghc 6.8.1. One the one hand it was disappointing to see
>> that the Haskell register allocator seems a bit inept and was loading
>> data into registers that should never have been spilled out of
>> registers in the first place.
> 
> Someone who knows the backend better than I do can correct me if I'm
> wrong, but it's my understanding that GHC 6.8.1 doesn't even attempt
> to do any register allocation on x86. So -- register allocator? What
> register allocator?

That's not entirely true - there is a fairly decent linear-scan register 
allocator in GHC

http://darcs.haskell.org/ghc/compiler/nativeGen/RegAllocLinear.hs

the main bottleneck is not the quality of the register allocation (at 
least, not yet).

The first problem is that in order to get good performance when compiling 
via C we've had to lock various global variables into registers (the heap 
pointer, stack pointer etc.), which leaves too few registers free for 
argument passing on x86, so the stack is used too much.  This is probably 
why people often say that the register allocator sucks - in fact it is 
really the calling convention that sucks.  There is some other stupidness 
such as reloading values from the stack, though.

Another problem is that the backend doesn't turn recursion into loops (i.e. 
backward jumps), so those crappy calling conventions are used around every 
loop.  If we fixed that - which is pretty easy, we've tried it - then the 
bottleneck becomes the lack of loop optimisations in the native code 
generator, and we also run into the limitations of the current register 
allocator.  Fortunately the latter has been fixed: Ben Lippmeier has 
written a graph-colouring allocator, and it's available for trying out in 
GHC HEAD.

Fixing it all properly means some fairly significant architectural changes, 
and dropping the via-C backend (except for bootstrapping on new platforms), 
which is what we'll be doing in 6.10.  I'd expect to see some dramatic 
improvements for those tight loops, in ByteString for example, but for 
"typical" Haskell code and GHC itself I'll be pleased if we get 10%.  We'll 
see.

Cheers,
	Simon