[Haskell-beginners] Performance of parallel mergesort

Stephen Blackheath [to Haskell-Beginners] mutilating.cauliflowers.stephen at blacksapphire.com
Sun Dec 27 15:56:51 EST 2009


Jon,

Jon Harrop wrote:
> This is something that concerns me. Lots of discussions of parallelism,
> including the Haskell literature I have read, neglect this critical
problem
> of making sure that more time is spent doing useful work than spawning
tasks
> (sparks). How is this solved in Haskell? Do you just put magic numbers in
> that work on the machine you're currently using?

It is simply not true that Haskell literature neglects the question of
spark granularity - this is very basic and is always covered.  Read
"Real World Haskell" (available free online).  There's no 'magic
number'.  You must explicitly write your code to give the right granule
size.  I don't know the exact cost of sparking, but in my experience it
is quite small - so - as long as your granule is doing *some* real work,
it should speed up.

Jon Harrop wrote:
> The parallelism is obviously not obtaining any real speedup so something is 
> wrong. But can anyone fix it?

I've spent a bit of time measuring parallel speedup on real commercial
projects, and this is what I've found:

1. ghc-6.12 performs significantly better than ghc-6.10, and has now
been released, therefore don't use ghc-6.10.

2. The defaults generally work better than giving huge heap sizes.  Your
-K100000000 - maximum heap size per thread - will either do nothing or
cause an artificial slowdown (I have observed this with the minimum heap
size option).  Don't use it, unless experimentation proves it makes
things better.

3. +RTS -s is well worth using. It breaks the time down into MUT
(mutator) and GC (garbage collector).

4. MUT parallelization is excellent, but the parallel GC is not so good.
 If your code is GC-heavy it can spend around half of its time garbage
collecting, which doesn't parallelize so well, and this eats into the
speedup.

5. There seems to be a scalability problem with the parallel gc for
larger numbers of cores (namely 8).  I am guessing somewhat, but my
experiments tend to confirm the issue raised in Simon Marlow's (the
implementor of GHC parallelization) recent paper that it's to do with
"stopping the world" for gc.

If GHC's lack of perfection at this point in time makes Haskell "look
bad" I don't mind.  I am not selling anything, so the reader at least
knows they're getting the truth.  I see this as one of the great
advantages of open source.

Progress on GHC has been very rapid in the last couple of years, and so
I know we'll continue to see the speed of GHC's parallelism improving in
leaps and bounds.  It's actually still quite a new area, considering the
difficulty of some of the technical issues and how recent it is that
multicores are widely available on consumer hardware.  I know you
OCaml/F# guys are making great progress too.


Steve


More information about the Beginners mailing list