[Haskell-cafe] Parallel term reduction

Luke Palmer lrpalmer at gmail.com
Mon Feb 2 04:15:08 EST 2009


I spent four hours investigating this problem!  Thank you very much for the
excellent brainfood, and for challenging Haskell's claim to be rawkin' at
parallelism.  I think, though it took much experimentation, that I have
confirmed that it is :-)

On Sun, Feb 1, 2009 at 9:26 PM, John D. Ramsdell <ramsdell0 at gmail.com>
wrote:
>
> I have a reduction system in which a rule takes a term and returns a
> set of terms.  The reduction system creates a tree that originates at
> a starting value called the root.  For most problems, the reduction
> system terminates, but a step count limit protects from
> non-termination.

That's typically a bad idea.  Instead, use laziness to protect from
nontermination.  For example, in this case, we can output a collection of
items lazily, and then take a finite amount of the output (or check whether
the output is longer than some length), without having to evaluate all of
it.
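
To make that concrete, here's a minimal sketch (separate from the
program below; "step" is a hypothetical stand-in for your rule):
enumerate the reduction tree as a lazy list, and let the caller bound
the work instead of the producer.

    -- Each term is emitted before its subtree is explored, so 'take'
    -- terminates even on an infinite tree (duplicates included;
    -- deduplication is handled below).
    reachable :: (a -> [a]) -> a -> [a]
    reachable step root = root : concatMap (reachable step) (step root)

    -- e.g.  take 1000 (reachable step root)
    --       length (take 1001 (reachable step root)) > 1000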

Here's my writeup of my solution, in literate Haskell.  It doesn't output
the exact same structure as yours, but hopefully you can see how to tweak it
to do so.

> {-# LANGUAGE RankNTypes #-}
>
> -- Data.MemoCombinators is from the data-memocombinators package,
> -- Data.DList is from dlist, and Control.Parallel is from parallel.
> import qualified Data.MemoCombinators as Memo
> import qualified Data.Set as Set
> import Control.Parallel (par)
> import qualified Control.Parallel.Strategies as Par
> import Data.Monoid (Monoid(..))
> import Control.Monad.State
> import qualified Data.DList as DList
> import Debug.Trace
> import Control.Concurrent

First, I want to capture the idea of a generative set like you're doing.
GenSet is like a set, with the constructor "genset x xs" which says "if x is
in the set, then so are xs". I'll represent it as a stateful computation
over the set of things we've seen so far, returning the list of elements
newly added. The list is redundant information, but sets can't be consumed
lazily, thus the list (the set will follow along lazily :-). Remember that
State s a is just the function (s -> (a,s)). So we're taking the set of
things we've seen so far, and returning the new elements added and the set
unioned with those elements.

> newtype GenSet a
>       = GenSet (State (Set.Set a) (DList.DList a))
>
> genset :: (Ord a) => a -> GenSet a -> GenSet a
> genset x (GenSet f) = GenSet $ do
>     seen <- gets (x `Set.member`)
>     if seen
>         then return mempty
>         else fmap (DList.cons x) $
>                  modify (Set.insert x) >> f
>
> toList :: GenSet a -> [a]
> toList (GenSet f) = DList.toList $ evalState f Set.empty

GenSet is a monoid, where mappend is just union.

> instance (Ord a) => Monoid (GenSet a) where
>     mempty = GenSet (return mempty)
>     mappend (GenSet a) (GenSet b) =
>                  GenSet (liftM2 mappend a b)
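
To see the deduplication in action, here's a tiny usage sketch (the
values are made up, not from your problem):

    -- "1 generates 2, twice"; the second 2 is dropped because the
    -- state already records it as seen.
    demo :: [Int]
    demo = toList (genset 1 (genset 2 mempty `mappend` genset 2 mempty))
    -- demo == [1,2]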

Okay, so that's how we avoid exponential behavior when traversing the tree.
We can now just toss around GenSets like they're sets and everything will be
peachy. Here's the heart of the algorithm: the reduce function. To avoid
recomputation of rules, we could just memoize the rule function. But we'll
do something a little more clever. The function we'll memoize ("parf") first
sparks a thread computing its *last* child. Because the search is
depth-first, it will typically be a while until we get to the last one, so
we benefit from the spark (you don't want to spark a thread computing
something you're about to compute anyway).

> reduce :: (Ord a) => Memo.Memo a -> (a -> [a]) -> a -> [a]
> reduce memo f x = toList (makeSet x)
>     where
>     makeSet x = genset x . mconcat . map makeSet . f' $ x
>     f' = memo parf
>     -- apply the rule, then spark evaluation of the last child's
>     -- rule; the depth-first traversal reaches it last, so the
>     -- spark has time to do useful work in parallel
>     parf a = let ch = f a in
>              ch `seq` (f' (last ch) `par` ch)

The ch `seq` is there so that the evaluation of ch and last ch aren't
competing with each other. Your example had a few problems. You said the
rule was supposed to be expensive, but yours was cheap. Also, [x-1,x-2,x-3]
are all very near each other, so it's hard to go do unrelated stuff. I made
a fake expensive function before computing the neighbors, and tossed around
some prime numbers to scatter the space more.

> rule :: Int -> [Int]
> rule n = expensive `seq`
>            [next 311 4, next 109 577, next 919 353]
>     where
>     next x y = (x * n + y) `mod` 5000
>     expensive = sum [1..50*n]
>
> main :: IO ()
> main = do
>     let r = reduce Memo.integral rule 1
>     print (length r)

The results are quite promising:

    % ghc --make -O2 rules2 -threaded
    % time ./rules2
    5000
    ./rules2  13.25s user 0.08s system 99% cpu 13.396 total
    % time ./rules2 +RTS -N2
    5000
    ./rules2 +RTS -N2  12.52s user 0.30s system 159% cpu 8.015 total

That's a 40% decrease in running time!  Woot!  I'd love to see what it
does on a machine with more than 2 cores.
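
If anyone has such a machine handy, no recompile should be needed;
just raise the RTS thread count (untested here beyond -N2):

    % time ./rules2 +RTS -N4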

Enjoy!
Luke