Speed of simple operations with Ptr Word32s

Wolfgang Thaller wolfgang.thaller at gmx.net
Sat Dec 4 19:39:49 EST 2004


Ian Lynagh wrote:
> Hi all,
>
> I was under the impression that simple code like the below, which swaps
> the endianness of a block of data, ought to be near C speed:
>
> [...]
>       poke p (shiftL x 24 .|. shiftL (x .&. 0xff00) 8
>                           .|. (shiftR x 8 .&. 0xff00)
>                           .|. shiftR x 24)
> [...]

The problem here is that the shiftL and shiftR operations don't get 
inlined properly: they are replaced by a call to shift, which itself 
doesn't get inlined.
The shift function also wastes time checking the sign of the shift 
amount, so every shift pays for a function call plus a branch.
A few well-placed INLINE pragmas in the libraries might help.
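For reference, here is the expression from the quoted loop pulled out into a pure function on a single Word32. This is a minimal sketch: swap32 is a hypothetical name, and its INLINE pragma only affects this helper, not the library's shiftL/shiftR, which is where the problem lies. Newer versions of base also provide Data.Word.byteSwap32, used here only as a cross-check.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word32, byteSwap32)

-- Hypothetical helper: the byte-swapping expression from the quoted
-- loop, applied to one Word32 (bytes AABBCCDD become DDCCBBAA).
swap32 :: Word32 -> Word32
swap32 x = shiftL x 24
       .|. shiftL (x .&. 0xff00) 8
       .|. (shiftR x 8 .&. 0xff00)
       .|. shiftR x 24
{-# INLINE swap32 #-}

main :: IO ()
main = do
  print (swap32 0x11223344 == 0x44332211)          -- prints True
  -- byteSwap32 is in Data.Word in newer base versions (4.7 and later).
  print (swap32 0x11223344 == byteSwap32 0x11223344) -- prints True
```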

> Is there anything I can do to get better performance in this sort of
> code without resorting to calling out to C?

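You could import some private GHC modules and use the primops directly:

-- Needs GHC's unboxed-name syntax: -fglasgow-exts (MagicHash in later GHCs).
import GHC.Prim
import GHC.Exts (Int(I#), shiftL#, shiftRL#)
import GHC.Word
-- Foreign re-exports Data.Bits; hide its shifts in favour of the ones below.
import Foreign hiding (shiftL, shiftR)

main :: IO ()
main = do p <- mallocArray 104857600
          foo p 104857600  -- foo is the byte-swapping loop from the quoted post

-- Unchecked Word32 shifts built directly on the primops.
shiftL, shiftR :: Word32 -> Int -> Word32
shiftL (W32# a) (I# b) = W32# (narrow32Word# (shiftL# a b))  -- drop bits above 32 on 64-bit targets
shiftR (W32# a) (I# b) = W32# (shiftRL# a b)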

Using these instead of the standard ones speeds up the program a lot. 
Be aware, however, that they do no checking: a negative shift amount 
gives an undefined result.
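Later versions of base added unsafeShiftL and unsafeShiftR to the Bits class in Data.Bits. Like the primop wrappers above, they skip the range check (the shift amount must be non-negative and smaller than the bit width), but they need no private GHC modules. A minimal sketch, assuming base 4.5 or later: swap32 and swapBlock are hypothetical names, and swapBlock only stands in for the elided foo, not Ian's actual loop. How fast this runs still depends on the GHC version and optimisation flags.

```haskell
import Data.Bits (unsafeShiftL, unsafeShiftR, (.&.), (.|.))
import Data.Word (Word32)
import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)

-- Byte-swap one word using shifts that skip the range check.
-- As with the primops, every shift amount must lie in [0, 31].
swap32 :: Word32 -> Word32
swap32 x = unsafeShiftL x 24
       .|. unsafeShiftL (x .&. 0xff00) 8
       .|. (unsafeShiftR x 8 .&. 0xff00)
       .|. unsafeShiftR x 24

-- Hypothetical stand-in for the elided 'foo': swap n words in place.
swapBlock :: Ptr Word32 -> Int -> IO ()
swapBlock p n = go 0
  where
    go i
      | i >= n    = return ()
      | otherwise = do
          x <- peekElemOff p i
          pokeElemOff p i (swap32 x)
          go (i + 1)

main :: IO ()
main = allocaArray 2 $ \p -> do
  pokeArray p [0x11223344, 0xAABBCCDD]
  swapBlock p 2
  ws <- peekArray 2 p
  print (ws == [0x44332211, 0xDDCCBBAA])  -- prints True
```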

Cheers,

Wolfgang



More information about the Glasgow-haskell-users mailing list