Personal tools

Benchmarks Game/Parallel/BinaryTrees

From HaskellWiki

< Benchmarks Game | Parallel(Difference between revisions)
Jump to: navigation, search
 
m (Shootout/Parallel/BinaryTrees moved to Benchmarks Game/Parallel/BinaryTrees: The name of the benchmarks site has changed)
 
(4 intermediate revisions by one user not shown)
Line 3: Line 3:
 
* http://shootout.alioth.debian.org/u64q/benchmark.php?test=binarytrees&lang=ghc&id=1
 
* http://shootout.alioth.debian.org/u64q/benchmark.php?test=binarytrees&lang=ghc&id=1
   
=== Parallel Strategies: parMap ==
+
=== 2009-03-01: Current Entry ===
  +
  +
Submitted: http://alioth.debian.org/tracker/index.php?func=detail&aid=311523&group_id=30402&atid=411646
  +
  +
Also filed a bug ticket with GHC to find out if the GC growth strategy can be improved (so that -H240M isn't required): http://hackage.haskell.org/trac/ghc/ticket/3061
  +
  +
<haskell>
  +
{-# OPTIONS -funbox-strict-fields #-}
  +
{-# LANGUAGE BangPatterns #-}
  +
--
  +
-- The Computer Language Benchmarks Game
  +
-- http://shootout.alioth.debian.org/
  +
--
  +
-- Contributed by Don Stewart
  +
-- Modified by Stephen Blackheath to parallelize (a very tiny tweak)
  +
--
  +
-- Compile with:
  +
--
  +
-- > ghc -O2 -fasm -threaded --make
  +
--
  +
-- Run with:
  +
--
  +
-- > ./A +RTS -N4 -H300M -RTS 20
  +
--
  +
-- Where '4' is the number of cores. and "set your -H value high (3 or
  +
-- more times the maximum residency)", as per GHC User's Guide:
  +
--
  +
-- <http://haskell.org/ghc/docs/6.10.1/html/users_guide/runtime-control.html#rts-options-gc>
  +
--
  +
-- -H "provides a “suggested heap size” for the garbage collector. The
  +
-- garbage collector will use about this much memory until the program
  +
-- residency grows and the heap size needs to be expanded to retain
  +
-- reasonable performance."
  +
--
  +
  +
import System
  +
import Data.Bits
  +
import Text.Printf
  +
import Control.Parallel.Strategies
  +
  +
--
  +
-- an artificially strict tree.
  +
--
  +
-- normally you would ensure the branches are lazy, but this benchmark
  +
-- requires strict allocation.
  +
--
  +
data Tree = Nil | Node !Int !Tree !Tree
  +
  +
minN = 4
  +
  +
io s n t = printf "%s of depth %d\t check: %d\n" s n t
  +
  +
main = do
  +
n <- getArgs >>= readIO . head
  +
let maxN = max (minN + 2) n
  +
stretchN = maxN + 1
  +
  +
-- stretch memory tree
  +
let c = check (make 0 stretchN)
  +
io "stretch tree" stretchN c
  +
  +
-- allocate a long lived tree
  +
let !long = make 0 maxN
  +
  +
-- allocate, walk, and deallocate many bottom-up binary trees
  +
let vs = parMap rnf id $ depth minN maxN
  +
mapM_ (\((m,d,i)) -> io (show m ++ "\t trees") d i) vs
  +
  +
-- confirm the the long-lived binary tree still exists
  +
io "long lived tree" maxN (check long)
  +
  +
-- generate many trees
  +
depth :: Int -> Int -> [(Int,Int,Int)]
  +
depth d m
  +
| d <= m = (2*n,d,sumT d n 0) : depth (d+2) m
  +
| otherwise = []
  +
where n = 1 `shiftL` (m - d + minN)
  +
  +
-- allocate and check lots of trees
  +
sumT :: Int -> Int -> Int -> Int
  +
sumT d 0 t = t
  +
sumT d i t = sumT d (i-1) (t + a + b)
  +
where a = check (make i d)
  +
b = check (make (-i) d)
  +
  +
-- traverse the tree, counting up the nodes
  +
check :: Tree -> Int
  +
check Nil = 0
  +
check (Node i l r) = i + check l - check r
  +
  +
-- build a tree
  +
make :: Int -> Int -> Tree
  +
make i 0 = Node i Nil Nil
  +
make i d = Node i (make (i2-1) d2) (make i2 d2)
  +
where i2 = 2*i; d2 = d-1
  +
  +
</haskell>
  +
  +
=== Parallel Strategies: parMap ===
   
 
* Status: submitted.
 
* Status: submitted.
Line 9: Line 9:
 
Flags:
 
Flags:
   
{{{
+
$ ghc -O2 --make -fasm -threaded Parallel.hs
$ ghc -O2 --make -fasm -threaded Parallel.hs
+
$ ./Parallel 20 +RTS -N5 -A350M
$ ./Parallel 20 +RTS -N5 -A350M
+
}}}
+
This is a version of the Haskell GHC binary-trees benchmark, annotated for parallelism, using parallel strategy combinators.
  +
When compiled with the -threaded flag, and run with +RTS -N5 -RTS, it will exploit all cores on the quad-core machine,
  +
dramatically reducing running times.
  +
  +
On my quad core, running time goes from,
  +
  +
* single core, 26.997s
  +
* quad core, 5.692s
  +
  +
The following flags should be used:
  +
  +
Compile time:
  +
  +
ghc -O2 -fasm --make Parallel2.hs -threaded
  +
  +
Runtime:
  +
  +
./Parallel2 20 +RTS -N5 -A350M -RTS
  +
  +
The -N5 flag asks the Haskell runtime to use 5 capabilites, which map onto the underlying cores.
  +
  +
Here is the result on my quad core,
  +
  +
$ time ./Parallel2 20 +RTS -N5 -A350M -RTS
  +
stretch tree of depth 21 check: -1
  +
2097152 trees of depth 4 check: -2097152
  +
524288 trees of depth 6 check: -524288
  +
131072 trees of depth 8 check: -131072
  +
32768 trees of depth 10 check: -32768
  +
8192 trees of depth 12 check: -8192
  +
2048 trees of depth 14 check: -2048
  +
512 trees of depth 16 check: -512
  +
128 trees of depth 18 check: -128
  +
32 trees of depth 20 check: -32
  +
long lived tree of depth 20 check: -1
  +
./Parallel2 20 +RTS -N5 -A350M -RTS 15.80s user 1.52s system 304% cpu 5.692 total
  +
  +
Which is a satisfying result, as the parallelisation strategy is super simple.
  +
  +
  +
Code:
   
 
<haskell>
 
<haskell>

Latest revision as of 22:26, 22 January 2012

[edit] 1 Binary Trees

[edit] 1.1 2009-03-01: Current Entry

Submitted: http://alioth.debian.org/tracker/index.php?func=detail&aid=311523&group_id=30402&atid=411646

Also filed a bug ticket with GHC to find out if the GC growth strategy can be improved (so that -H240M isn't required): http://hackage.haskell.org/trac/ghc/ticket/3061

{-# OPTIONS -funbox-strict-fields #-}
{-# LANGUAGE BangPatterns #-}
--
-- The Computer Language Benchmarks Game
-- http://shootout.alioth.debian.org/
--
-- Contributed by Don Stewart
-- Modified by Stephen Blackheath to parallelize (a very tiny tweak)
--
-- Compile with:
--
-- >    ghc -O2 -fasm -threaded --make 
--
-- Run with:
--
-- >    ./A +RTS -N4 -H300M -RTS 20
-- 
-- Where '4' is the number of cores. and "set your -H value high (3 or
-- more times the maximum residency)", as per GHC User's Guide:
--
--  <http://haskell.org/ghc/docs/6.10.1/html/users_guide/runtime-control.html#rts-options-gc>
--
-- -H "provides a “suggested heap size” for the garbage collector. The
-- garbage collector will use about this much memory until the program
-- residency grows and the heap size needs to be expanded to retain
-- reasonable performance."
--
 
import System
import Data.Bits
import Text.Printf
import Control.Parallel.Strategies
 
--
-- an artificially strict tree.
--
-- normally you would ensure the branches are lazy, but this benchmark
-- requires strict allocation.
--
data Tree = Nil | Node !Int !Tree !Tree
 
minN = 4
 
io s n t = printf "%s of depth %d\t check: %d\n" s n t
 
main = do
    n <- getArgs >>= readIO . head
    let maxN     = max (minN + 2) n
        stretchN = maxN + 1
 
    -- stretch memory tree
    let c = check (make 0 stretchN)
    io "stretch tree" stretchN c
 
    -- allocate a long lived tree
    let !long    = make 0 maxN
 
    -- allocate, walk, and deallocate many bottom-up binary trees
    let vs = parMap rnf id $ depth minN maxN
    mapM_ (\((m,d,i)) -> io (show m ++ "\t trees") d i) vs
 
    -- confirm the the long-lived binary tree still exists
    io "long lived tree" maxN (check long)
 
-- generate many trees
depth :: Int -> Int -> [(Int,Int,Int)]
depth d m
    | d <= m    = (2*n,d,sumT d n 0) : depth (d+2) m
    | otherwise = []
  where n = 1 `shiftL` (m - d + minN)
 
-- allocate and check lots of trees
sumT :: Int -> Int -> Int -> Int
sumT d 0 t = t
sumT  d i t = sumT d (i-1) (t + a + b)
  where a = check (make i    d)
        b = check (make (-i) d)
 
-- traverse the tree, counting up the nodes
check :: Tree -> Int
check Nil          = 0
check (Node i l r) = i + check l - check r
 
-- build a tree
make :: Int -> Int -> Tree
make i 0 = Node i Nil Nil
make i d = Node i (make (i2-1) d2) (make i2 d2)
  where i2 = 2*i; d2 = d-1

[edit] 1.2 Parallel Strategies: parMap

  • Status: submitted.

Flags:

   $ ghc -O2 --make -fasm -threaded  Parallel.hs
   $ ./Parallel 20 +RTS -N5 -A350M

This is a version of the Haskell GHC binary-trees benchmark, annotated for parallelism, using parallel strategy combinators. When compiled with the -threaded flag, and run with +RTS -N5 -RTS, it will exploit all cores on the quad-core machine, dramatically reducing running times.

On my quad core, running time goes from,

* single core, 26.997s
* quad core, 5.692s

The following flags should be used:

Compile time:

  ghc -O2 -fasm --make Parallel2.hs -threaded

Runtime:

  ./Parallel2 20 +RTS -N5 -A350M -RTS

The -N5 flag asks the Haskell runtime to use 5 capabilites, which map onto the underlying cores.

Here is the result on my quad core,

   $ time ./Parallel2 20 +RTS -N5 -A350M -RTS
  stretch tree of depth 21	 check: -1
  2097152	 trees of depth 4	 check: -2097152
  524288	 trees of depth 6	 check: -524288
  131072	 trees of depth 8	 check: -131072
  32768	 trees of depth 10	 check: -32768
  8192	 trees of depth 12	 check: -8192
  2048	 trees of depth 14	 check: -2048
  512	 trees of depth 16	 check: -512
  128	 trees of depth 18	 check: -128
  32	 trees of depth 20	 check: -32
  long lived tree of depth 20	 check: -1
  ./Parallel2 20 +RTS -N5 -A350M -RTS  15.80s user 1.52s system 304% cpu 5.692 total

Which is a satisfying result, as the parallelisation strategy is super simple.


Code:

{-# OPTIONS -fbang-patterns -funbox-strict-fields #-}
--
-- The Computer Language Shootout
-- http://shootout.alioth.debian.org/
--
-- Contributed by Don Stewart and Thomas Davie
--
-- This implementation uses a parallel strategy to exploit the quad core machine.
-- For more information about Haskell parallel strategies, see,
--
--  http://www.macs.hw.ac.uk/~dsg/gph/papers/html/Strategies/strategies.html
--
 
import System
import Data.Bits
import Text.Printf
import Control.Parallel.Strategies
import Control.Parallel
 
--
-- an artificially strict tree.
--
-- normally you would ensure the branches are lazy, but this benchmark
-- requires strict allocation.
--
data Tree = Nil | Node !Int !Tree !Tree
 
minN = 4
 
io s n t = printf "%s of depth %d\t check: %d\n" s n t
 
main = do
    n <- getArgs >>= readIO . head
    let maxN     = max (minN + 2) n
        stretchN = maxN + 1
 
    -- stretch memory tree
    let c = check (make 0 stretchN)
    io "stretch tree" stretchN c
 
    -- allocate a long lived tree
    let !long    = make 0 maxN
 
    -- allocate, walk, and deallocate many bottom-up binary trees
    let vs = (parMap rnf) (depth' maxN) [minN,minN+2..maxN]
    mapM_ (\((m,d,i)) -> io (show m ++ "\t trees") d i) vs
 
    -- confirm the the long-lived binary tree still exists
    io "long lived tree" maxN (check long)
 
-- generate many trees
depth' :: Int -> Int -> (Int,Int,Int)
depth' m d =
  (2*n,d,sumT d n 0)
  where
    n = 1 `shiftL` (m - d + minN)
 
-- allocate and check lots of trees
sumT :: Int -> Int -> Int -> Int
sumT d 0 t = t
sumT  d i t = sumT d (i-1) (t + a + b)
  where a = check (make i    d)
        b = check (make (-i) d)
 
-- traverse the tree, counting up the nodes
check :: Tree -> Int
check Nil          = 0
check (Node i l r) = i + check l - check r
 
-- build a tree
make :: Int -> Int -> Tree
make i 0 = Node i Nil Nil
make i d = Node i (make (i2-1) d2) (make i2 d2)
  where i2 = 2*i; d2 = d-1