Parallelism

From HaskellWiki

Parallelism is about speeding up a program by using multiple processors.

In Haskell we provide two ways to achieve parallelism:

* Pure parallelism, which can be used to speed up pure (non-IO) parts of the program.
* Concurrency, which can be used for parallelising IO.

Pure parallelism (Control.Parallel): speeding up a pure computation using multiple processors. Pure parallelism has these advantages:

* Guaranteed deterministic (the same result every time)
* No [[race conditions]] or [[deadlocks]]

[[Concurrency]] (Control.Concurrent): multiple threads of control that execute "at the same time".

* Threads are in the IO monad
* IO operations from multiple threads are interleaved non-deterministically
* Communication between threads must be explicitly programmed
* Threads may execute on multiple processors simultaneously
* Dangers: [[race conditions]] and [[deadlocks]]

Rule of thumb: use pure parallelism if you can, concurrency otherwise.
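As a minimal sketch of what pure parallelism looks like, here is the classic parallel Fibonacci using <code>par</code> and <code>pseq</code> from the parallel package (the function names and the spark threshold below are illustrative, not canonical):

```haskell
import Control.Parallel (par, pseq)

-- Naive Fibonacci, parallelised by sparking one recursive call
-- while this thread evaluates the other. The result is
-- deterministic: parFib n == fib n with or without parallelism.
parFib :: Int -> Integer
parFib n
  | n < 15    = fib n                      -- too small to be worth a spark
  | otherwise = x `par` (y `pseq` (x + y)) -- spark x, evaluate y, combine
  where
    x = parFib (n - 1)
    y = parFib (n - 2)

-- Plain sequential version, used below the threshold.
fib :: Int -> Integer
fib n = if n < 2 then toInteger n else fib (n - 1) + fib (n - 2)

main :: IO ()
main = print (parFib 25)  -- prints 75025
```

Compile with <code>-threaded</code> and run with <code>+RTS -N</code> to actually use multiple cores; without them the program still gives the same answer, just sequentially.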
== Starting points ==

* '''Control.Parallel'''. The place to start with parallel programming in Haskell is <code>par</code> and <code>pseq</code> from the parallel library. Try the Real World Haskell [http://book.realworldhaskell.org/read/concurrent-and-multicore-programming.html chapter on parallelism and concurrency]; the parallelism-specific parts are in the second half of the chapter.
* If you need more control, try Strategies or perhaps the Par monad.
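For instance, evaluation strategies from Control.Parallel.Strategies let you separate the algorithm from how it is parallelised. A small sketch (the workload function and list are just placeholders):

```haskell
import Control.Parallel.Strategies (parMap, parList, rdeepseq, using)

-- Some expensive pure function applied over a list.
expensive :: Int -> Int
expensive n = length (filter odd [1 .. n * 1000])

-- parMap sparks one evaluation per list element.
results :: [Int]
results = parMap rdeepseq expensive [1 .. 8]

-- Equivalently: write an ordinary map, then attach a strategy to it.
results' :: [Int]
results' = map expensive [1 .. 8] `using` parList rdeepseq

main :: IO ()
main = print (results == results')  -- prints True
```

The second form is the point of Strategies: <code>map expensive [1 .. 8]</code> is the whole algorithm, and <code>`using` parList rdeepseq</code> bolts the parallelism on afterwards without changing the result.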
   
== Multicore GHC ==

Since 2004, GHC supports running programs in parallel on an SMP or multi-core machine. How to do it:

* Compile your program using the <code>-threaded</code> switch.
* Run the program with <code>+RTS -N2</code> to use 2 threads, for example (RTS stands for runtime system; see the GHC users' guide). You should use a <code>-N</code> value equal to the number of CPU cores on your machine (not including Hyper-threading cores). As of GHC 6.12, you can leave off the number of cores and all available cores will be used (you still need to pass <code>-N</code>, like so: <code>+RTS -N</code>).
* Concurrent threads (<code>forkIO</code>) will run in parallel, and you can also use the <code>par</code> combinator and Strategies from the Control.Parallel.Strategies module to create parallelism.
* Use <code>+RTS -sstderr</code> for timing stats.
* To debug parallel program performance, use [[ThreadScope]].
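To see the multicore runtime in action, here is a small base-only program that splits a sum across two threads with <code>forkIO</code> and an <code>MVar</code> (the split point is arbitrary; the compile and run lines in the comments assume GHC):

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)

-- Compile: ghc -O2 -threaded Sum.hs
-- Run:     ./Sum +RTS -N2 -s
-- With -N2 the two halves can run on two cores; -s prints RTS stats.
main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkIO $ do
    let lo = sum [1 .. 5000000 :: Integer]
    lo `seq` putMVar done lo          -- force the sum before handing it back
  let hi = sum [5000001 .. 10000000 :: Integer]
  lo <- hi `seq` takeMVar done        -- force our half, then wait for theirs
  print (lo + hi)                     -- prints 50000005000000
```

Run it once without <code>+RTS -N2</code> and once with, and compare the elapsed time in the <code>-s</code> output; the printed result is the same either way.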
== Alternative approaches ==

* Nested data parallelism: a parallel programming model based on bulk data parallelism, in the form of the [http://www.haskell.org/haskellwiki/GHC/Data_Parallel_Haskell DPH] and [http://hackage.haskell.org/package/repa Repa] libraries for transparently parallel arrays.
* Intel [http://software.intel.com/en-us/blogs/2010/05/27/announcing-intel-concurrent-collections-for-haskell-01/ Concurrent Collections for Haskell]: a graph-oriented parallel programming model.
   
== See also ==

* The [[Parallel|parallelism and concurrency portal]]
* Parallel [[Parallel/Reading|reading list]]
* [[Parallel/Research|Ongoing research in Parallel Haskell]]

Revision as of 14:25, 20 April 2011
