[Haskell-cafe] Switch optimization

Stefan O'Rear stefanor at cox.net
Sun Jun 10 20:00:00 EDT 2007


On Mon, Jun 11, 2007 at 09:43:17AM +1000, Thomas Conway wrote:
> Hi All,
> 
> I'm writing some variable byte codec routines, which are used in
> inner^N loops, and which I want to make really efficient. There are
> several spots where the code uses lookup tables.
> 
> The best example is the table, which given the first byte, returns the
> number of additional bytes. It is a precomputed version of the
> following function:
> 
> >codeLength :: Word8 -> Int
> >codeLength w
> >     | w .&. 0x80 == 0   = 0
> >     | otherwise         = 1 + (codeLength $ w `shiftL` 1)
> 
> from which we construct a 256 entry table:
> 
> codeLen 0 = 0
> codeLen 1 = 0
> codeLen 2 = 0
> ...
> codeLen 127 = 0
> codeLen 128 = 1
> ...
> codeLen 255 = 8
> 
> Now, compiling with ghc 6.6.1 and -O3, I see that it generates a long
> chain of conditional branches. Now, even taking laziness into account,
> this seems like terribly inefficient code.  I wrote this thinking it
> would be more efficient than constructing a CAF array:
> 
> codeLens = listArray (0,255) (map codeLength [0..255])
> 
> but I'm guessing the CAF version would probably work out much more
> efficient in the long run.
> 
> However, I think ghc (and any other compiler), should detect the
> following properties:
> 
> 1. all the clauses are mutually exclusive, so the sequencing is
> irrelevant to the semantics
> 
> 2. Given an explicit type declaration Word8 -> ...., the 256 cases
> cover all the possible constructors of the type, so there are no
> missing clauses.
> 
> I would have expected the generated code to have the form:
> 
> codeLen:
>    x <- pick up arg
>   return closure (codeLen' x)
> 
> codeLen':
>    x' <- force x
>    update precomputed-table[x']
> 
> Even if you leave out property 2 and include bounds checks, this seems
> like an important kind of function to optimize well. So what have I
> missed, or is it time to learn how to hack on ghc?

I'd suggest learning how to hack on GHC, I get a chain of branches even
at the maximum optimization setting.  Hackers are always appreciated!

For an early project, how about
http://hackage.haskell.org/trac/ghc/ticket/1261 :)

(-O3 is misparsed as an extremely low setting, worse than -O and
sometimes worse than -Onot)

BTW, you should definitely look at the FFI for small performance
critical functions.  It's already a miracle that Haskell is as fast as
it is; I would not hold high hopes for idiomatic general Haskell
reaching C speed any time soon.  (For some special cases like operating
on multi-MB strings and large parallel arrays, there is definite
promise.)  Of course the interface is fairly costly, be sure to
benchmark.

Stefan


More information about the Haskell-Cafe mailing list