Benchmarks Game/Parallel/BinaryTrees

From HaskellWiki

Revision as of 23:19, 7 September 2008

1 Binary Trees

1.1 Parallel Strategies: parMap

  • Status: submitted.

Flags:

   $ ghc -O2 --make -fasm -threaded  Parallel.hs
   $ ./Parallel 20 +RTS -N5 -A350M

This is a version of the Haskell GHC binary-trees benchmark, annotated for parallelism using parallel strategy combinators. When compiled with the -threaded flag and run with +RTS -N5 -RTS, it uses all the cores of a quad-core machine, dramatically reducing the running time.

On my quad-core machine, the running time changes as follows:

* single core, 26.997s
* quad core, 5.692s
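
As a quick check on those two timings, the implied speedup can be worked out directly. This is plain arithmetic on the figures quoted above, not an additional measurement:

```latex
\[
  \text{speedup} \;=\; \frac{T_{\text{single}}}{T_{\text{quad}}}
  \;=\; \frac{26.997\,\mathrm{s}}{5.692\,\mathrm{s}} \;\approx\; 4.74
\]
```

The ratio is slightly above 4. The page does not say which runtime flags were used for the single-core run, so the two runs may differ in more than the number of cores.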

The following flags should be used:

Compile time:

  ghc -O2 -fasm --make Parallel2.hs -threaded

Runtime:

  ./Parallel2 20 +RTS -N5 -A350M -RTS

The -N5 flag asks the Haskell runtime to use 5 capabilities, which map onto the underlying cores.

Here is the result on my quad-core machine:
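
To see what the runtime actually received from -N, a program can ask it. The following is a minimal sketch, assuming a reasonably recent base library in which GHC.Conc exports getNumCapabilities and getNumProcessors; the helper name describe is made up for illustration and is not part of the benchmark.

```haskell
import GHC.Conc (getNumCapabilities, getNumProcessors)

-- Hypothetical helper: format the two numbers as one printable line.
describe :: Int -> Int -> String
describe caps procs =
  "capabilities: " ++ show caps ++ ", processors: " ++ show procs

main :: IO ()
main = do
  caps  <- getNumCapabilities  -- set by +RTS -N<n>; 1 if -N is not given
  procs <- getNumProcessors    -- number of processors the OS reports
  putStrLn (describe caps procs)
```

Compiled with -threaded and run with +RTS -N5 -RTS, the first number should be 5, whatever the processor count is.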

   $ time ./Parallel2 20 +RTS -N5 -A350M -RTS
  stretch tree of depth 21	 check: -1
  2097152	 trees of depth 4	 check: -2097152
  524288	 trees of depth 6	 check: -524288
  131072	 trees of depth 8	 check: -131072
  32768	 trees of depth 10	 check: -32768
  8192	 trees of depth 12	 check: -8192
  2048	 trees of depth 14	 check: -2048
  512	 trees of depth 16	 check: -512
  128	 trees of depth 18	 check: -128
  32	 trees of depth 20	 check: -32
  long lived tree of depth 20	 check: -1
  ./Parallel2 20 +RTS -N5 -A350M -RTS  15.80s user 1.52s system 304% cpu 5.692 total

This is a satisfying result, given how simple the parallelisation strategy is.
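
To give a feel for what a parallel map does, here is a simplified sketch built only from the par and pseq primitives that GHC.Conc exports. It is not the library's definition of parMap, and the name parMapSketch is hypothetical. It sparks each element only to weak head normal form, whereas the code on this page passes rnf so that each result tuple is evaluated completely.

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical name: a simplified stand-in for parMap, not the library code.
parMapSketch :: (a -> b) -> [a] -> [b]
parMapSketch _ []     = []
parMapSketch f (x:xs) = y `par` (ys `pseq` (y : ys))
  where
    y  = f x                -- sparked: may be evaluated on another capability
    ys = parMapSketch f xs  -- the current thread moves on to the rest of the list

main :: IO ()
main = print (parMapSketch (\n -> sum [1 .. n]) [10, 100, 1000 :: Int])
-- prints [55,5050,500500]
```

The result is the same as with an ordinary map; only the evaluation order may differ, and the sparks can only run in parallel when the program is compiled with -threaded and run with more than one capability.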


Code:

{-# OPTIONS -fbang-patterns -funbox-strict-fields #-}
--
-- The Computer Language Shootout
-- http://shootout.alioth.debian.org/
--
-- Contributed by Don Stewart and Thomas Davie
--
-- This implementation uses a parallel strategy to exploit the quad core machine.
-- For more information about Haskell parallel strategies, see,
--
--  http://www.macs.hw.ac.uk/~dsg/gph/papers/html/Strategies/strategies.html
--
 
import System
import Data.Bits
import Text.Printf
import Control.Parallel.Strategies
import Control.Parallel
 
--
-- an artificially strict tree.
--
-- normally you would ensure the branches are lazy, but this benchmark
-- requires strict allocation.
--
data Tree = Nil | Node !Int !Tree !Tree
 
minN = 4
 
io s n t = printf "%s of depth %d\t check: %d\n" s n t
 
main = do
    n <- getArgs >>= readIO . head
    let maxN     = max (minN + 2) n
        stretchN = maxN + 1
 
    -- stretch memory tree
    let c = check (make 0 stretchN)
    io "stretch tree" stretchN c
 
    -- allocate a long lived tree
    let !long    = make 0 maxN
 
    -- allocate, walk, and deallocate many bottom-up binary trees
    let vs = (parMap rnf) (depth' maxN) [minN,minN+2..maxN]
    mapM_ (\((m,d,i)) -> io (show m ++ "\t trees") d i) vs
 
    -- confirm that the long-lived binary tree still exists
    io "long lived tree" maxN (check long)
 
-- generate many trees
depth' :: Int -> Int -> (Int,Int,Int)
depth' m d =
  (2*n,d,sumT d n 0)
  where
    n = 1 `shiftL` (m - d + minN)
 
-- allocate and check lots of trees
sumT :: Int -> Int -> Int -> Int
sumT d 0 t = t
sumT d i t = sumT d (i-1) (t + a + b)
  where a = check (make i    d)
        b = check (make (-i) d)
 
-- traverse the tree, counting up the nodes
check :: Tree -> Int
check Nil          = 0
check (Node i l r) = i + check l - check r
 
-- build a tree
make :: Int -> Int -> Tree
make i 0 = Node i Nil Nil
make i d = Node i (make (i2-1) d2) (make i2 d2)
  where i2 = 2*i; d2 = d-1
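
The check values in the output shown earlier follow from the definitions of make and check. For a depth d of at least 1, check (make i d) works out to i - 1, so each pair of trees built in sumT contributes (i - 1) + (-i - 1) = -2. The sketch below reuses the two definitions unchanged and adds a hypothetical helper, checkMatches, which is not part of the submission, to test that claim on small inputs.

```haskell
data Tree = Nil | Node !Int !Tree !Tree

-- traverse the tree, combining the node values
check :: Tree -> Int
check Nil          = 0
check (Node i l r) = i + check l - check r

-- build a tree
make :: Int -> Int -> Tree
make i 0 = Node i Nil Nil
make i d = Node i (make (i2-1) d2) (make i2 d2)
  where i2 = 2*i; d2 = d-1

-- Hypothetical helper: does check (make i d) equal i - 1?
checkMatches :: Int -> Int -> Bool
checkMatches i d = check (make i d) == i - 1

main :: IO ()
main = do
  print (check (make 0 10))                                       -- prints -1
  print (and [checkMatches i d | i <- [-5 .. 5], d <- [1 .. 8]])  -- prints True
```

This matches the reported output: the stretch and long-lived trees start from i = 0 and give -1, and a line reporting 2n trees has a check value of -2n.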