Haskell for multicores
From HaskellWiki
(Initial stab at forkIO - is it bad?) |
(MVar example) |
||
| Line 129: | Line 129: | ||
* MVar | * MVar | ||
| - | + | Previously in the forkIO example we developed a program to hash two files in parallel and ended with a couple small bugs because the program terminated prematurely (the main thread would exit when done). A second issue was that threads can conflict with eachothers use of stdout. | |
| + | |||
| + | Locking mutable variables (MVars) can be used to great effect not only for communicating values (such as the resulting string for a single function to print) but it is also common for programmers to use their locking features as a signaling mechanism. | ||
| + | |||
| + | MVars are a polymorphic mutable variables that might or might not contain a value at any given time. This example will only use the following four functions: | ||
| + | |||
| + | <haskell> | ||
| + | newEmptyMVar :: IO (MVar a) | ||
| + | takeMVar :: MVar a -> IO a | ||
| + | putMVar :: MVar a -> a -> IO () | ||
| + | </haskell> | ||
| + | |||
| + | While they are fairly self-explanitory it should be noted that takeMVar will block until the MVar is non-empty and putMVar will block until the current MVar is empty. Taking an MVar will leave the MVar empty when returning the value. | ||
| + | |||
| + | Lets now generalize our forkIO program to operate on any number of files, block until the hashing is complete, printing all the results as they are computed but from one function so no stdout garbling occurs. | ||
| + | |||
| + | <haskell> | ||
| + | import Data.Digest.Pure.MD5 | ||
| + | import qualified Data.ByteString.Lazy as L | ||
| + | import System.Environment | ||
| + | import Control.Concurrent | ||
| + | |||
| + | main = do | ||
| + | files <- getArgs | ||
| + | str <- newEmptyMVar | ||
| + | mapM_ (forkIO . hashAndPrint str) files | ||
| + | printNrResults (length files) str | ||
| + | |||
| + | printNrResults 0 _ = return () | ||
| + | printNrResults i var = do | ||
| + | s <- takeMVar var | ||
| + | putStrLn s | ||
| + | printNrResults (i - 1) var | ||
| + | |||
| + | hashAndPrint str f = do | ||
| + | bs <- L.readFile f | ||
| + | putMVar str (f ++ ": " ++ show (md5 bs)) | ||
| + | </haskell> | ||
| + | |||
| + | We define a new variable, <hask>str</hask>, as an empty MVar. Throughout the hashing we use <hask>putMVar</hash> to report the results - this function blocks when the MVar is already full so no hashes should get dropped on account of the mutable memory. Similarly, <hask>printNrResults<hask> uses the <hask>takeMVar</hask> function which will block until the MVar is full - or once the next file is done being hashed in this case. | ||
| + | |||
| + | The main thread intelligently knows <hask>str</hask> will be filled <hask>length files</hask> times so after printing the given number of hash results it exists, thus terminating the program. | ||
| + | |||
=== Further reading === | === Further reading === | ||
Revision as of 01:29, 3 September 2008
GHC Haskell comes with a large set of libraries and tools for building programs that exploit multicore architectures.
This site attempts to document all our available information on exploiting such hardware with Haskell.
Throughout, we focus on exploiting shared-memory SMP systems, with aim of lowering absolute wall clock times. The machines we target are typical 2x to 32x desktop multicore machine, on which vanilla GHC will run.
Contents |
1 Introduction
To get an idea of what we aim to do -- reduce running times by exploiting more cores -- here's a naive "hello, world" of parallel programs: parallel, naive fib. It simply tells us whether or not the SMP runtime is working:
import Control.Parallel import Control.Monad import Text.Printf cutoff = 35 fib' :: Int -> Integer fib' 0 = 0 fib' 1 = 1 fib' n = fib' (n-1) + fib' (n-2) fib :: Int -> Integer fib n | n < cutoff = fib' n | otherwise = r `par` (l `pseq` l + r) where l = fib (n-1) r = fib (n-2) main = forM_ [0..45] $ \i -> printf "n=%d => %d\n" i (fib i)
We compile it with the `-threaded` flag:
$ ghc -O2 -threaded --make fib.hs [1 of 1] Compiling Main ( fib.hs, fib.o ) Linking fib ...
And run it with:
+RTS -Nx
where 'x' is the number of cores you have (or a slightly higher value). Here, on a quad core linux system:
./fib +RTS -N4 76.81s user 0.75s system 351% cpu 22.059 total
So we were able to use 3.5/4 of the available cpu time. And this is typical, most problems aren't easily scalable, and we must trade off work on more cores, for more overhead with communication.
1.1 Examples
1.2 Further reading
- GHC's multiprocessor guide
- runtime options to enable SMP parallelism
- API documentation for paralell strategies
- Real World Haskell: Concurrent and Parallel Programming
- Blog posts about parallelism
2 Thread primitives
Control.Concurrent Control.Concurrent
- forkIO
TODO - finish
For explicit concurrency and/or parallelism, Haskell implementations have a light-weight thread system that schedules logical threads on the available operating system threads. These light and cheap threads can be created with forkIO. Full OS threads will not be discussed here beyond saying they pose a significantly higher overhead, but you create them using forkOS if truly needed.
forkIO :: IO () -> IO ThreadId
Lets take a simple Haskell application that hashes two files and prints the result:
import Data.Digest.Pure.MD5 (md5) import qualified Data.ByteString.Lazy as L import System.Environment (getArgs) main = do [fileA, fileB] <- getArgs hashAndPrint fileA hashAndPrint fileB hashAndPrint f = L.readFile f >>= return . md5 >>= \h -> putStrLn (f ++ ": " ++ show h)
This is a straight forward solution that hashs the files one at a time printing the resulting hash to the screen. What if we wanted to use more than one processor to hash the files in parallel?
One solution is to start a new thread, hash in parallel, and print the answers as they are computed:
import Control.Concurrent (forkIO) import Data.Digest.Pure.MD5 (md5) import qualified Data.ByteString.Lazy as L import System.Environment (getArgs) main = do [fileA,fileB] <- getArgs forkIO $ hashAndPrint fileA hashAndPrint fileB hashAndPrint f = L.readFile f >>= return . md5 >>= \h -> putStrLn (f ++ ": " ++ show h)
Now we have a rough program with reasonable great performance boost, which is expected given the trivially parallel computation.
But wait! You say there is a bug? Two, actually. One is that if the main thread is finished hashing fileB first, the program will exit before the child thread is done with fileA. The second is a potential for garbled output due to two threads writing to stdout. Both these problems can be solved using some inter-thread communication - we'll pick this example up in the MVar section.
2.1 Further reading
- A concurrent port scanner
- Research papers on concurrency in Haskell
- [http://haskell.org/haskellwiki/Research_papers/Parallelism_and_concurrency#Parallel_Haskell Research papes on parallel Haskell
3 Synchronisation with locks
- MVar
Previously in the forkIO example we developed a program to hash two files in parallel and ended with a couple small bugs because the program terminated prematurely (the main thread would exit when done). A second issue was that threads can conflict with eachothers use of stdout.
Locking mutable variables (MVars) can be used to great effect not only for communicating values (such as the resulting string for a single function to print) but it is also common for programmers to use their locking features as a signaling mechanism.
MVars are a polymorphic mutable variables that might or might not contain a value at any given time. This example will only use the following four functions:
newEmptyMVar :: IO (MVar a) takeMVar :: MVar a -> IO a putMVar :: MVar a -> a -> IO ()
While they are fairly self-explanitory it should be noted that takeMVar will block until the MVar is non-empty and putMVar will block until the current MVar is empty. Taking an MVar will leave the MVar empty when returning the value.
Lets now generalize our forkIO program to operate on any number of files, block until the hashing is complete, printing all the results as they are computed but from one function so no stdout garbling occurs.
import Data.Digest.Pure.MD5 import qualified Data.ByteString.Lazy as L import System.Environment import Control.Concurrent main = do files <- getArgs str <- newEmptyMVar mapM_ (forkIO . hashAndPrint str) files printNrResults (length files) str printNrResults 0 _ = return () printNrResults i var = do s <- takeMVar var putStrLn s printNrResults (i - 1) var hashAndPrint str f = do bs <- L.readFile f putMVar str (f ++ ": " ++ show (md5 bs))
3.1 Further reading
4 Message passing channels
- Chan
Todo
4.1 Examples
4.2 Further reading
5 Lock-free synchronisation
- STM
Todo
5.1 Further reading
6 Asynchronous messages
Control.Exception:asynchronous
- Async exceptions
Todo
6.1 Examples
6.2 Further reading
7 Parallelism strategies
- Parallel, pure strategies
Todo
7.1 Further reading
8 Data parallel arrays
Todo
8.1 Further reading
9 Foreign languages calls and concurrency
Non-blocking foreign calls in concurrent threads.
10 Profiling and measurement
+RTS -sstderr
10.1 Further reading
Todo
