[Haskell-cafe] MD5 performance optimizations, and GHC -via-C producing segfaulting binary

Don Stewart dons at galois.com
Tue May 20 13:48:37 EDT 2008


Thanks for taking a look at this.

kirby81:
> 2008/5/17 Andrew Coppin <andrewcoppin at btinternet.com>:
> > Hi folks.
> >
> > OK, try this:
> >
> >  darcs get http://darcs.orphi.me.uk/MyMD5
> >  cd MyMD5
> >  ghc -O2 --make md5sum
> >  md5sum <some large filename>
> 
> I've got some time to take a look at the code. It's very nice,
> readable and declarative, but obviously not optimized for "raw speed".
> There're a few things to do to have a speed-up of 2x, without going "low-level".
> 
> 1) There's a lazyness leak in Pad.hs. The sum in:
>           else make_block_bs bs0 : work (n + block_size_bytes) bs1
> is not strict. With a very large file (e.g. 100 mb) it means stack overflow.
> 
> [roxas:~/Desktop/test2/MyMD5] kirby% ghc -O2 --make md5sum.hs
> 
> [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
> Stack space overflow: current size 8388608 bytes.
> Use `+RTS -Ksize' to increase it.
> 
> To solve it just add a bang (!) before the n parameter:
> 
>     work !n bs =
> 
> You've got to add {-# LANGUAGE BangPatterns #-} at the top of the file
> too. There're solutions that don't imply BangPatterns, and use only
> seq, but I like bang patterns!
> 
> Now it works:
> [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
> E6542028D538BAEC180A96A5D1E6EC3A
> 11.958u 0.210s 0:12.19 99.7%	0+0k 0+2io 0pf+0w

Good idea. I saw some lazy left folds in there too, which look
suspicious.

> 
> 2) make_block_bs is sub-optimal, and very critical to performance. I
> decided to use Data.Binary for it (it's also more readable, and you
> get rid of the unsafeWrite):
> import Data.Binary
> import Data.Binary.Get
> import Control.Monad
> 
> // ...
> 
> instance Binary Block where
>     put _ = undefined
>     get = do
>         xs <- replicateM 16 getWord32le
>         return $ Block $ listArray (0, 15) xs
> 
> make_block_bs :: B.ByteString -> Block
> make_block_bs = decode
> 
> Now we are getting faster:
> [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
> E6542028D538BAEC180A96A5D1E6EC3A
> 8.063u 0.174s 0:08.23 100.0%	0+0k 0+0io 0pf+0w

Definitely a good idea.
  
> 3) You are doing a lot of access to fields of a strict data type
> (State). You can at least ask the compiler to help you a bit with
> -funbox-strict-fields.
> 
> Now we have:
> [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
> E6542028D538BAEC180A96A5D1E6EC3A
> 6.780u 0.133s 0:06.91 100.0%	0+0k 0+1io 0pf+0w
> 
> We have got a good, nearly 2x, speed-up with very few optimizations,
> and we run in a very small constant amount of memory.

Great.

> 
> Probably compiling with -fvia-C could help even more, but strangely:
> 
> [roxas:~/Desktop/test2/MyMD5] kirby% ghc -fvia-C -funbox-strict-fields
> -O2 --make md5sum.hs
> [roxas:~/Desktop/test2/MyMD5] kirby% time ./md5sum ../../jboss-4.2.2.GA.zip
> Segmentation fault
> 
> Is this supposed to happen? There're no "unsafe" functions or imports
> used in the program. Maybe a bug in GHC?

That could be a gcc bug, in fact. Can you provide the final source?
Does changing the gcc optimisation level help? (i.e. -fvia-C -optc-O2 )

Andrew, I hope you look carefully at the suggestions here, and use them
to improve the code you were working on.

-- Don


More information about the Haskell-Cafe mailing list