Haskell for multicores

From HaskellWiki
Latest revision as of 19:44, 2 October 2012


GHC Haskell comes with a large set of libraries and tools for building programs that exploit multicore architectures.

This site attempts to document all our available information on exploiting such hardware with Haskell.

Throughout, we focus on exploiting shared-memory SMP systems, with the aim of lowering absolute wall-clock times. The machines we target are typical 2x to 32x desktop multicore machines, on which vanilla GHC will run.



== Introduction ==

To get an idea of what we aim to do -- reduce running times by exploiting more cores -- here's a naive "hello, world" of parallel programs: parallel, naive fib. It simply tells us whether or not the SMP runtime is working:

import Control.Parallel
import Control.Monad
import Text.Printf
 
cutoff = 35
 
fib' :: Int -> Integer
fib' 0 = 0
fib' 1 = 1
fib' n = fib' (n-1) + fib' (n-2)
 
fib :: Int -> Integer
fib n | n < cutoff = fib' n
      | otherwise  = r `par` (l `pseq` l + r)
 where
    l = fib (n-1)
    r = fib (n-2)
 
main = forM_ [0..45] $ \i ->
            printf "n=%d => %d\n" i (fib i)

We compile it with the `-threaded` flag:

   $ ghc -O2 -threaded --make fib.hs
   [1 of 1] Compiling Main             ( fib.hs, fib.o )
   Linking fib ...

And run it with:

   +RTS -Nx

where 'x' is the number of cores you have (or a slightly higher value). Here, on a quad-core Linux system:

   ./fib +RTS -N4  76.81s user 0.75s system 351% cpu 22.059 total

So we were able to use about 3.5 of the 4 available cores' worth of cpu time. This is typical: most problems aren't perfectly scalable, and we must trade off spreading work across more cores against the extra communication overhead that brings.

=== Examples ===

=== Further reading ===

== Thread primitives ==

Control.Concurrent

For explicit concurrency and/or parallelism, Haskell implementations have a light-weight thread system that schedules logical threads on the available operating system threads. These light and cheap threads can be created with forkIO. (We won't discuss full OS threads, which are created via forkOS; they have significantly higher overhead and are only useful in a few situations, such as when making FFI calls.)

    forkIO :: IO () -> IO ThreadId

Let's take a simple Haskell application that hashes two files and prints the result:

import Data.Digest.Pure.MD5 (md5)
import qualified Data.ByteString.Lazy as L
import System.Environment (getArgs)
 
main = do
    [fileA, fileB] <- getArgs
    hashAndPrint fileA
    hashAndPrint fileB
 
hashAndPrint f = L.readFile f >>= return . md5 >>= \h -> putStrLn (f ++ ": " ++ show h)

This is a straightforward solution that hashes the files one at a time, printing each resulting hash to the screen. What if we wanted to use more than one processor to hash the files in parallel?

One solution is to start a new thread, hash in parallel, and print the answers as they are computed:

import Control.Concurrent (forkIO)
import Data.Digest.Pure.MD5 (md5)
import qualified Data.ByteString.Lazy as L
import System.Environment (getArgs)
 
main = do
    [fileA,fileB] <- getArgs
    forkIO $ hashAndPrint fileA
    hashAndPrint fileB
 
hashAndPrint f = L.readFile f >>= return . md5 >>= \h -> putStrLn (f ++ ": " ++ show h)

Now we have a rough program with a good performance boost, which is expected given the trivially parallel computation.

But wait! You say there is a bug? Two, actually. One is that if the main thread finishes hashing fileB first, the program will exit before the child thread is done with fileA. The second is the potential for garbled output when two threads write to stdout at the same time. Both problems can be solved using some inter-thread communication; we'll pick this example up in the MVar section.

=== Further reading ===

* [http://blog.moertel.com/articles/2004/03/13/concurrent-port-scanner-in-haskell A concurrent port scanner]
* [http://haskell.org/haskellwiki/Research_papers/Parallelism_and_concurrency#Concurrent_Haskell Research papers on concurrency in Haskell]
* [http://haskell.org/haskellwiki/Research_papers/Parallelism_and_concurrency#Parallel_Haskell Research papers on parallel Haskell]

== Synchronisation with locks ==

Control.Concurrent.MVar

Mutable variables with locking (MVars) are useful not only for communicating values between threads (such as a result string for a single printing function), but also, through their locking behaviour, as a signalling mechanism.

An MVar is a polymorphic mutable variable that may or may not contain a value at any given time. Common functions include:

    newMVar :: a -> IO (MVar a)
    newEmptyMVar :: IO (MVar a)
    takeMVar :: MVar a -> IO a
    putMVar :: MVar a -> a -> IO ()
    isEmptyMVar :: MVar a -> IO Bool

While they are fairly self-explanatory, note that takeMVar will block until the MVar is non-empty, and putMVar will block until the MVar is empty. Taking an MVar leaves it empty when returning the value.

In the forkIO example we developed a program to hash two files in parallel, and ended with a couple of small bugs: the program terminated prematurely (the main thread would exit when done), and the two threads could conflict in their use of stdout.

Let's now generalize the example to operate on any number of files, block until all the hashing is complete, and print all the results from just one thread so no stdout garbling occurs.

    {-# LANGUAGE BangPatterns #-}
    import Data.Digest.Pure.MD5
    import qualified Data.ByteString.Lazy as L
    import System.Environment
    import Control.Concurrent
    import Control.Monad (replicateM_)
 
    main = do
        files <- getArgs
        str <- newEmptyMVar
        mapM_ (forkIO . hashAndPrint str) files
        printNrResults (length files) str
 
    printNrResults i var = replicateM_ i (takeMVar var >>= putStrLn)
 
    hashAndPrint str f = do
        bs <- L.readFile f
        let !h = show $ md5 bs
        putMVar str (f ++ ": " ++ h)

We define a new variable, str, as an empty MVar. After hashing, each result is reported with putMVar; remember this function blocks when the MVar is already full, so no hashes are dropped on account of the mutable memory. Similarly, printNrResults uses takeMVar, which blocks until the MVar is full, in this case once the next file is done being hashed.

Note how the value is evaluated (via the bang pattern) before the putMVar call. If the argument were an unevaluated thunk, printNrResults would have to evaluate it before printing the result, and our parallel efforts would have been worthless.

Knowing the str MVar will be filled 'length files' times, we can let the main thread exit after printing that many results, thus terminating the program.

$ ghc exMVar.hs -o exMVar-threaded --make -O2 -threaded
$ time ./exMVar-threaded +RTS -N2 -RTS 2GB 2GB 2GB 2GB 
  2GB: b8f1f1faa6dda5426abffb3a7811c1fb
  2GB: b8f1f1faa6dda5426abffb3a7811c1fb
  2GB: b8f1f1faa6dda5426abffb3a7811c1fb
  2GB: b8f1f1faa6dda5426abffb3a7811c1fb

  real    0m40.524s


$ time ./exMVar-threaded +RTS -N1 -RTS 2GB 2GB 2GB 2GB
  2GB: b8f1f1faa6dda5426abffb3a7811c1fb
  2GB: b8f1f1faa6dda5426abffb3a7811c1fb
  2GB: b8f1f1faa6dda5426abffb3a7811c1fb
  2GB: b8f1f1faa6dda5426abffb3a7811c1fb

  real    1m8.170s

=== Further reading ===

== Message passing channels ==

Control.Concurrent.Chan

For streaming data it is hard to beat the performance of channels. After declaring a channel (newChan), you can pipe data between threads (writeChan, readChan) and tee data to separate readers (dupChan). The flexibility of channels makes them useful for a wide range of communications.

Continuing with our hashing example, let's say the names of the files to hash are becoming available over time, or need to be streamed for other reasons. We can fork a set of worker threads and feed them the filenames through a channel. For consistency, the program has also been modified to communicate the results from the workers to the printer via a channel.

{-# LANGUAGE BangPatterns #-}
import Data.Digest.Pure.MD5
import qualified Data.ByteString.Lazy as L
import System.Environment
import Control.Concurrent
import Control.Concurrent.Chan
import Control.Monad (forever, forM_, replicateM_)
 
nrWorkers = 2
 
main = do
    files <- getArgs
    str <- newChan
    fileChan <- newChan
    forM_ [1..nrWorkers] (\_ -> forkIO $ worker str fileChan)
    forM_ files (writeChan fileChan)
    printNrResults (length files) str
 
printNrResults i var = replicateM_ i (readChan var >>= putStrLn)
 
worker :: Chan String -> Chan String -> IO ()
worker str fileChan = forever (readChan fileChan >>= hashAndPrint str)
 
hashAndPrint str f = do
        bs <- L.readFile f
        let !h = show $ md5 bs
        writeChan str (f ++ ": " ++ h)

Notice the advantages: results are available incrementally, and performance improves because the number of parallel hash operations (nrWorkers) matches the number of cores.

=== Examples ===

=== Further reading ===

== Lock-free synchronisation ==

Software Transactional Memory

* STM

Todo
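Until this section is written, here is a minimal sketch of STM (not from the original article) using the standard stm package that ships with GHC. The account-transfer example and variable names are illustrative. A transaction runs atomically; if it conflicts with another transaction it is retried, with no locks taken by the programmer:

```haskell
import Control.Concurrent.STM

-- Atomically transfer an amount between two shared variables.
-- 'check' retries the whole transaction until the balance suffices.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
    balance <- readTVar from
    check (balance >= amount)
    writeTVar from (balance - amount)
    modifyTVar' to (+ amount)

main :: IO ()
main = do
    a <- newTVarIO 100
    b <- newTVarIO 0
    atomically (transfer a b 30)
    readTVarIO a >>= print   -- 70
    readTVarIO b >>= print   -- 30
```

Because the transfer is a single atomic transaction, no other thread can ever observe a state where the money has left one account but not arrived in the other.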

=== Further reading ===

== Asynchronous messages ==

Control.Exception (asynchronous exceptions)

* Async exceptions


Todo
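As a placeholder sketch (not from the original article): asynchronous exceptions let one thread interrupt another via throwTo, or killThread (which throws ThreadKilled). The target thread can catch the exception to clean up. The ready MVar below is an illustrative detail, ensuring the handler is installed before we kill the worker:

```haskell
import Control.Concurrent
import Control.Exception

main :: IO ()
main = do
    ready <- newEmptyMVar
    done  <- newEmptyMVar
    -- Worker signals it is running, then blocks; the handler reports
    -- whichever asynchronous exception interrupts it.
    tid <- forkIO $
        (putMVar ready () >> threadDelay maxBound)
          `catch` \e -> putMVar done (show (e :: AsyncException))
    takeMVar ready          -- wait until the worker has started
    killThread tid          -- delivers ThreadKilled asynchronously
    takeMVar done >>= putStrLn
```

Without the ready handshake, killThread could deliver the exception before the catch is installed, and the program would deadlock waiting on done.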

=== Examples ===

=== Further reading ===

== Parallelism strategies ==

Control.Parallel

* Parallel, pure strategies

Todo
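Until this section is written, here is a minimal sketch (not from the original article) using Control.Parallel.Strategies from the parallel package. Strategies separate what to compute from how to evaluate it in parallel; parMap applies a function to every list element as a parallel spark. The nfib helper is illustrative, just to generate work:

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Naive Fibonacci, purely to create CPU-bound work:
nfib :: Int -> Integer
nfib n | n < 2     = 1
       | otherwise = nfib (n-1) + nfib (n-2)

-- parMap sparks one evaluation per element; rdeepseq forces each
-- result to normal form inside its spark.
main :: IO ()
main = print (sum (parMap rdeepseq nfib [25, 26, 27, 28]))
```

Compiled with -threaded and run with +RTS -N, the four nfib calls can proceed on separate cores, while the code itself stays pure.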

=== Further reading ===

== Data parallel arrays ==

Data Parallel Arrays

Todo

=== Further reading ===

== Foreign language calls and concurrency ==

Non-blocking foreign calls in concurrent threads.
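A minimal POSIX-only sketch (not from the original article, assuming unistd.h's sleep is available): a foreign call imported as 'safe' blocks only the OS thread it runs on, so with the threaded runtime the other Haskell threads keep running while it executes.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent
import Foreign.C.Types (CUInt)

-- A 'safe' import: the RTS arranges for this call not to block
-- the other Haskell threads (on the threaded runtime).
foreign import ccall safe "unistd.h sleep"
    c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
    done <- newEmptyMVar
    _ <- forkIO (c_sleep 1 >> putMVar done ())
    putStrLn "main thread keeps running"
    takeMVar done   -- wait for the foreign call to finish
```

An 'unsafe' import would be faster to invoke but would block the whole runtime on a non-threaded build, so 'safe' is the right default for calls that may take a while.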

== Profiling and measurement ==

   +RTS -sstderr

=== Further reading ===

Todo