Difference between revisions of "Cloud Haskell"

From HaskellWiki
(Updating references to issue tracker and wiki)
(25 intermediate revisions by 7 users not shown)
Line 1: Line 1:
Cloud Haskell is a domain-specific language for developing programs for a distributed computing environment. Implemented as a shallow embedding in Haskell, it provides a message passing communication model, inspired by Erlang, without introducing incompatibility with Haskell’s established shared-memory concurrency.
+
Cloud Haskell is a library for distributed concurrency in Haskell. The purpose is to make it easier to write programs for clusters of machines. It provides a message passing communication model, inspired by and very similar to that of Erlang.
 
   
 
== Availability ==
 
== Availability ==
   
Cloud Haskell is available from Hackage as [http://hackage.haskell.org/package/distributed-process distributed-process]. You might also want to install [http://hackage.haskell.org/package/distributed-process-simplelocalnet distributed-process-simplelocalnet]. The cutting edge development version is on [https://github.com/haskell-distributed/distributed-process github].
+
Cloud Haskell is available from Hackage as [http://hackage.haskell.org/package/distributed-process distributed-process]. You will probably also want to install a backend:
  +
  +
* The [http://hackage.haskell.org/package/distributed-process-simplelocalnet distributed-process-simplelocalnet] backend is designed to get you started and experiment with Cloud Haskell on your local machine or local network.
  +
* The [http://hackage.haskell.org/package/distributed-process-azure distributed-process-azure] backend makes it possible to run Cloud Haskell applications on Microsoft Azure virtual machines.
  +
  +
The cutting edge development version of Cloud Haskell is on [https://github.com/haskell-distributed/distributed-process github] and various resources are available via the [http://haskell-distributed.github.com website].
   
 
There is also the older prototype implementation [http://hackage.haskell.org/package/remote remote] (also available from [https://github.com/jepst/CloudHaskell github]).
 
There is also the older prototype implementation [http://hackage.haskell.org/package/remote remote] (also available from [https://github.com/jepst/CloudHaskell github]).
   
== Documentation ==
+
== Videos and Blog Posts ==
  +
 
Cloud Haskell intros
  +
 
* '''blog:''' [http://www.well-typed.com/blog/68 A Cloud Haskell Appetiser (Parallel Haskell Digest 11)]
  +
* '''video:''' ''(1hr)'' [http://skillsmatter.com/podcast/home/cloud-haskell/ac-5258 Cloud Haskell]: a general introduction and tutorial, focusing on what it does and how to use it. It also covers some details about the current implementation.
  +
* '''video:''' ''(1hr)'' [http://skillsmatter.com/podcast/home/haskell-cloud/js-4179 Towards Haskell in the Cloud]: an older but more detailed introduction by Simon Peyton Jones about the problem area and the design decisions and internals of Cloud Haskell. In particular it covers the details of how sending functions over the wire really works.
  +
* '''video:''' (''25min'') [http://www.youtube.com/watch?v=1jJ2paFuErM Cloud Haskell 2.0] A more technical overview of the new implementation (the [http://www.haskell.org/wikiupload/4/46/Hiw2012-duncan-coutts.pdf slides] are available too).
  +
* '''video:''' (''1hr'') [http://vimeo.com/53906049#t=53m0s Putting Cloud Haskell to Work] An introductory talk from the NY Haskell Users Group. ([http://gbaz.github.com/slides/cloud-11-2012.html slides] and [https://github.com/gbaz/slides/blob/gh-pages/cloud-11-2012.lhs source]).
  +
  +
 
Well-Typed have a series of blog posts "Communication Patterns in Cloud Haskell"
  +
 
* [http://www.well-typed.com/blog/71 Part 1: Master-Slave, Work-Stealing and Work-Pushing]
 
* [http://www.well-typed.com/blog/72 Part 2: Performance]
  +
* [http://www.well-typed.com/blog/73 Part 3: Map-Reduce]
  +
* [http://www.well-typed.com/blog/74 Part 4: K-Means]
  +
 
Alen Ribic has a series of blog posts about (Cloud) Haskell on the Raspberry Pi
  +
 
* [http://alenribic.com/posts/2012-08-06-running-haskell-on-raspberry-pi.html Running Haskell on Raspberry Pi]
 
* [http://alenribic.com/posts/2012-08-17-raspberry-pi-in-a-haskell-cloud.html Raspberry Pi in a Haskell Cloud]
  +
  +
Other blog posts
  +
  +
* [http://malcodigo.blogspot.com.es/2012/10/using-cloud-haskell-in-hpc-cluster.html Using Cloud Haskell in HPC Cluster] by Mal Código
  +
 
== Papers ==
  +
 
* [http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote.pdf Towards Haskell in the Cloud], Jeff Epstein, Andrew Black, and Simon Peyton Jones. Haskell Symposium, Tokyo, Sept 2011.
 
* [http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/epstein-thesis.pdf Functional programming for the data centre], Jeff Epstein. Masters Thesis, University of Cambridge, 2011
  +
  +
== Documentation and Support ==
   
For an overview of Cloud Haskell it's probably a good idea to read ''Towards Haskell in the Cloud'' (details below). The relevant documentation (in order of importance is)
+
For an overview of Cloud Haskell it's probably a good idea to read ''Towards Haskell in the Cloud'' (see just above). The relevant API documentation of the <tt>distributed-process</tt> package (in order of importance) is
   
 
* [http://hackage.haskell.org/packages/archive/distributed-process/latest/doc/html/Control-Distributed-Process.html Control.Distributed.Process]
 
* [http://hackage.haskell.org/packages/archive/distributed-process/latest/doc/html/Control-Distributed-Process.html Control.Distributed.Process]
Line 24: Line 60:
 
* [http://hackage.haskell.org/packages/archive/distributed-static/latest/doc/html/Control-Distributed-Static.html Control.Distributed.Static]
 
* [http://hackage.haskell.org/packages/archive/distributed-static/latest/doc/html/Control-Distributed-Static.html Control.Distributed.Static]
   
  +
Probably the best place to ask questions is the [http://groups.google.com/group/parallel-haskell parallel-haskell google group].
== Blog Posts ==
 
   
Cloud Haskell intros
 
   
  +
== Current Status ==
* [http://www.well-typed.com/blog/68 A Cloud Haskell Appetiser (Parallel Haskell Digest 11)]
 
   
  +
The summary about the new implementation is that it exists, it works, it's on hackage, and we think it is now ready for serious experiments.
Alen Ribic has a series of blog posts about (Cloud) Haskell on the Raspberry Pi
 
   
  +
Compared to the previous prototype:
* [http://alenribic.com/writings/post/running-haskell-on-raspberry-pi Running Haskell on Raspberry Pi]
 
* [http://alenribic.com/writings/post/raspberry-pi-in-a-haskell-cloud Raspberry Pi in a Haskell Cloud]
 
   
  +
* it is much faster;
Well-Typed have a series of blog posts "Communication Patterns in Cloud Haskell"
 
  +
* it can run on [http://hackage.haskell.org/package/network-transport multiple kinds of network];
  +
* has backends to support different environments (like cluster or cloud);
  +
* has a new system for dealing with node disconnect and reconnect, and a more precisely defined semantics (see section Semantics, below)
  +
* supports [http://hackage.haskell.org/package/distributed-static composable], [http://hackage.haskell.org/package/rank1dynamic polymorphic] serialisable closures;
  +
* and internally the code is better structured and easier to work with.
   
  +
== Contributing ==
* [http://www.well-typed.com/blog/71 Part 1: Master-Slave, Work-Stealing and Work-Pushing]
 
* [http://www.well-typed.com/blog/72 Part 2: Performance]
 
* Part 3: Map-Reduce (to appear)
 
* Part 4: K-Means (to appear)
 
   
  +
We need your help! The [https://cloud-haskell.atlassian.net/issues/?jql=status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%29 issue tracker] on Jira lists all currently known issues; the [http://haskell-distributed.github.com/wiki.html wiki] generally contains more developer-oriented documentation, though possibly not enough. Patches are most welcome! (Before you spend serious time on an issue, it might be a good idea to add a comment to that issue describing what you intend to do.)
== Papers ==
 
   
  +
In addition, if you are experimenting with Cloud Haskell and find problems, or even just areas where the documentation is unclear, please open new issues documenting those problems.
* [http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote.pdf Towards Haskell in the Cloud], Jeff Epstein, Andrew Black, and Simon Peyton Jones. Haskell Symposium, Tokyo, Sept 2011.
 
  +
* [http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/epstein-thesis.pdf Functional programming for the data centre], Jeff Epstein. Masters Thesis, University of Cambridge, 2011
 
  +
There is also [https://github.com/haskell-distributed/distributed-process-platform an effort underway] to develop an OTP-like platform for Cloud Haskell. In addition to providing many of the features that OTP offers (e.g., generic managed processes, supervision trees, etc.) this layer provides an implementation of the ''task layer'' described in the original paper and implemented in the remote package. There is much to do here too, and any help would be much appreciated!
  +
  +
== Semantics ==
  +
  +
[https://github.com/haskell-distributed/distributed-process/blob/master/doc/semantics/CloudHaskellSemantics.pdf Cloud Haskell Semantics] (PDF) is a draft document that gives a more precise semantics to messaging in Cloud Haskell. The semantics is based on the [http://dl.acm.org/citation.cfm?id=1863514 Unified Semantics for Future Erlang] paper, but extends it with a notion of "reconnecting" (this is described in detail in the introduction of the document).
  +
  +
The document also describes some open issues, in particular in relation to ordering of link and monitor notifications relative to regular messages (and messages sent on typed channels). Note however that Cloud Haskell backends that use the TCP network (that is, all backends currently available) do ''not'' suffer from the problems described in that section (essentially because the TCP transport maintains a single TCP connection between Cloud Haskell nodes and orders ''all'' messages sent on that one connection). However, be aware that code that takes advantage of this ordering may not work with Cloud Haskell backends that use more esoteric network transports.
   
 
== Other Useful Packages ==
 
== Other Useful Packages ==
Line 51: Line 94:
 
=== Serializable ===
 
=== Serializable ===
   
A core concept in Cloud Haskell is that of ''serializable'' values. The <tt>Serializable</tt> type class combines <tt>Typeable</tt> and <tt>Binary</tt>. <tt>ghc</tt> can automatically derive <tt>Typeable</tt> instances for custom data types, but you need a package to derive <tt>Binary</tt>. There are various packages available that assist with this:
+
A core concept in Cloud Haskell is that of ''serializable'' values. The <tt>Serializable</tt> type class combines <tt>Typeable</tt> and <tt>Binary</tt>. <tt>ghc</tt> can automatically derive <tt>Typeable</tt> instances for custom data types. For <tt>binary-0.6.3.0</tt> and up, <tt>ghc</tt> can also provide a <tt>Binary</tt> instance if you derive <tt>Generic</tt> and add an empty <tt>Binary</tt> instance ([http://hackage.haskell.org/packages/archive/binary/0.6.4.0/doc/html/Data-Binary.html#g:3 see here for an example]).
  +
  +
For <tt>binary</tt> versions below <tt>0.6.3.0</tt>, you need a package to derive <tt>Binary</tt>. There are various packages available that assist with this:
   
 
* [http://hackage.haskell.org/package/binary-generic binary-generic]
 
* [http://hackage.haskell.org/package/binary-generic binary-generic]
Line 59: Line 104:
   
 
<tt>binary-generic</tt> and <tt>derive</tt> have been confirmed to work with Cloud Haskell; the status of the other packages is unknown -- YMMV (please feel free to update this wiki page if you have more information).
 
<tt>binary-generic</tt> and <tt>derive</tt> have been confirmed to work with Cloud Haskell; the status of the other packages is unknown -- YMMV (please feel free to update this wiki page if you have more information).
  +
  +
== Migration from <tt>remote</tt> ==
  +
  +
Here are some suggestions that might ease the migration from the Cloud Haskell prototype <tt>remote</tt> to <tt>distributed-process</tt>.
  +
  +
* The "implicit" type of mkClosure has changed (implicit because mkClosure is a Template Haskell function). In <tt>distributed-process</tt> mkClosure takes a function of type <tt>T1 -> T2</tt> and returns a function of type <tt>T1 -> Closure T2</tt>. In other words, the first argument to your function becomes the closure environment; if you want two items in your closure environment, create a function of type <tt>(T1, T2) -> T3</tt>; if you want none, create a function of type <tt>() -> T1</tt>.
  +
  +
* <tt>distributed-process</tt> follows the naming conventions in ''Towards Haskell in the Cloud'' rather than in <tt>remote</tt> so the functions that deal with typed channels are called <tt>sendChan</tt>, <tt>receiveChan</tt> and <tt>newChan</tt> instead of <tt>sendChannel</tt>, <tt>receiveChannel</tt> and <tt>newChannel</tt>.
  +
  +
* <tt>sendChan</tt>, <tt>receiveChan</tt> (and <tt>send</tt>) never fail in <tt>distributed-process</tt> (in <tt>remote</tt> they might throw a <tt>TransmitException</tt>). Instead, if you want to be notified of communication failure, you need to use <tt>monitor</tt> or <tt>link</tt>.
  +
  +
* The function <tt>forkProcess</tt> in <tt>remote</tt> is called <tt>spawnLocal</tt> in <tt>distributed-process</tt>
  +
  +
* The <tt>Process</tt> monad is called <tt>Process</tt> in <tt>distributed-process</tt> (rather than <tt>ProcessM</tt>). Similarly, the type <tt>Match</tt> replaces <tt>MatchM</tt> (and is no longer a monad).
  +
  +
* Initialization is different. See the documentation of [http://hackage.haskell.org/packages/archive/distributed-process-simplelocalnet/latest/doc/html/Control-Distributed-Process-Backend-SimpleLocalnet.html Control.Distributed.Process.Backend.SimpleLocalnet] to get started (note that the master/slave distinction in SimpleLocalnet is optional and does not need to be used).
  +
  +
* Peer discovery is different. The functions <tt>getPeers</tt> and <tt>nameQuery</tt> are no longer available. The function <tt>findPeers</tt> from <tt>SimpleLocalnet</tt> replaces some, but not all, of the functionality of <tt>getPeers</tt>. You can use <tt>whereisRemoteAsync</tt> to find processes that have been registered by name.

Revision as of 19:29, 30 March 2013

Cloud Haskell is a library for distributed concurrency in Haskell. The purpose is to make it easier to write programs for clusters of machines. It provides a message passing communication model, inspired by and very similar to that of Erlang.

Availability

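Cloud Haskell is available from Hackage as distributed-process. You will probably also want to install a backend:

  • The distributed-process-simplelocalnet backend is designed to get you started and let you experiment with Cloud Haskell on your local machine or local network.
  • The distributed-process-azure backend makes it possible to run Cloud Haskell applications on Microsoft Azure virtual machines.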

The cutting edge development version of Cloud Haskell is on github and various resources are available via the website.

There is also the older prototype implementation remote (also available from github).

Videos and Blog Posts

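Cloud Haskell intros

  • blog: A Cloud Haskell Appetiser (Parallel Haskell Digest 11)
  • video: (1hr) Cloud Haskell: a general introduction and tutorial, focusing on what it does and how to use it. It also covers some details about the current implementation.
  • video: (1hr) Towards Haskell in the Cloud: an older but more detailed introduction by Simon Peyton Jones about the problem area and the design decisions and internals of Cloud Haskell. In particular it covers the details of how sending functions over the wire really works.
  • video: (25min) Cloud Haskell 2.0: a more technical overview of the new implementation (the slides are available too).
  • video: (1hr) Putting Cloud Haskell to Work: an introductory talk from the NY Haskell Users Group (slides and source).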


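Well-Typed have a series of blog posts "Communication Patterns in Cloud Haskell"

  • Part 1: Master-Slave, Work-Stealing and Work-Pushing
  • Part 2: Performance
  • Part 3: Map-Reduce
  • Part 4: K-Means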

Alen Ribic has a series of blog posts about (Cloud) Haskell on the Raspberry Pi

  • Running Haskell on Raspberry Pi
  • Raspberry Pi in a Haskell Cloud

Other blog posts

  • Using Cloud Haskell in HPC Cluster by Mal Código

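Papers

  • Towards Haskell in the Cloud, Jeff Epstein, Andrew Black, and Simon Peyton Jones. Haskell Symposium, Tokyo, Sept 2011.
  • Functional programming for the data centre, Jeff Epstein. Masters Thesis, University of Cambridge, 2011.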

Documentation and Support

For an overview of Cloud Haskell it's probably a good idea to read Towards Haskell in the Cloud (see just above). The relevant API documentation of the distributed-process package (in order of importance) is

  • Control.Distributed.Process

and

If you want to know more details about Closure or Static (without the Template Haskell magic on top) you might want to read

  • Control.Distributed.Static

Probably the best place to ask questions is the parallel-haskell google group.


Current Status

In summary, the new implementation exists, works, is on Hackage, and we think it is now ready for serious experiments.

Compared to the previous prototype:

  • it is much faster;
  • it can run on multiple kinds of network;
  • it has backends to support different environments (like cluster or cloud);
  • it has a new system for dealing with node disconnect and reconnect, and a more precisely defined semantics (see section Semantics, below);
  • it supports composable, polymorphic serialisable closures;
  • and internally the code is better structured and easier to work with.

Contributing

We need your help! The issue tracker on Jira lists all currently known issues; the wiki generally contains more developer-oriented documentation, though possibly not enough. Patches are most welcome! (Before you spend serious time on an issue, it might be a good idea to add a comment to that issue describing what you intend to do.)

In addition, if you are experimenting with Cloud Haskell and find problems, or even just areas where the documentation is unclear, please open new issues documenting those problems.

There is also an effort underway to develop an OTP-like platform for Cloud Haskell. In addition to providing many of the features that OTP offers (e.g., generic managed processes, supervision trees, etc.) this layer provides an implementation of the task layer described in the original paper and implemented in the remote package. There is much to do here too, and any help would be much appreciated!

Semantics

Cloud Haskell Semantics (PDF) is a draft document that gives a more precise semantics to messaging in Cloud Haskell. The semantics is based on the Unified Semantics for Future Erlang paper, but extends it with a notion of "reconnecting" (this is described in detail in the introduction of the document).

The document also describes some open issues, in particular in relation to ordering of link and monitor notifications relative to regular messages (and messages sent on typed channels). Note however that Cloud Haskell backends that use the TCP network (that is, all backends currently available) do not suffer from the problems described in that section (essentially because the TCP transport maintains a single TCP connection between Cloud Haskell nodes and orders all messages sent on that one connection). However, be aware that code that takes advantage of this ordering may not work with Cloud Haskell backends that use more esoteric network transports.
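
One way to avoid depending on that ordering is to signal completion with an ordinary message and to treat the monitor notification only as a failure signal. The following is a minimal sketch under assumptions, not a recommended pattern from the semantics document: the WorkerMsg protocol, the collect helper, and the address and port are hypothetical, and the node is started through the distributed-process-simplelocalnet backend purely for convenience.

```haskell
{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
import Control.Distributed.Process
import Control.Distributed.Process.Backend.SimpleLocalnet
  (initializeBackend, newLocalNode)
import Control.Distributed.Process.Node (initRemoteTable, runProcess)
import Control.Monad.IO.Class (liftIO)
import Data.Binary (Binary)
import Data.Typeable (Typeable)
import GHC.Generics (Generic)

-- Hypothetical protocol: a stream of results closed by an explicit marker.
data WorkerMsg = Result Int | Done
  deriving (Show, Typeable, Generic)

instance Binary WorkerMsg

-- Gather results until Done arrives. Completion is never inferred from the
-- monitor notification, because its position relative to ordinary messages
-- is exactly what a non-TCP transport might not guarantee.
collect :: [Int] -> Process (Either DiedReason [Int])
collect acc = receiveWait
  [ match $ \msg -> case msg of
      Result n -> collect (n : acc)
      Done     -> return (Right (reverse acc))
  , match $ \(ProcessMonitorNotification _ _ reason) -> case reason of
      DiedNormal -> collect acc          -- normal exit: keep waiting for Done
      _          -> return (Left reason) -- abnormal exit: give up
  ]

main :: IO ()
main = do
  -- Address and port are placeholders for a single-machine experiment.
  backend <- initializeBackend "127.0.0.1" "8081" initRemoteTable
  node    <- newLocalNode backend
  runProcess node $ do
    parent <- getSelfPid
    worker <- spawnLocal $ do
      () <- expect                       -- wait until the parent is monitoring
      mapM_ (send parent . Result) [1, 2, 3]
      send parent Done
    _ <- monitor worker
    send worker ()
    outcome <- collect []
    liftIO (print outcome)
```

The trade-off is that a worker which exits normally without ever sending Done leaves the collector waiting, so real code would probably add a timeout.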

Other Useful Packages

Serializable

A core concept in Cloud Haskell is that of serializable values. The Serializable type class combines Typeable and Binary. ghc can automatically derive Typeable instances for custom data types. For binary-0.6.3.0 and up, ghc can also provide a Binary instance if you derive Generic and add an empty Binary instance (see here for an example).
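
As an illustration of the Generic route, here is a minimal sketch that needs only the binary package (0.6.3.0 or later) and base. The Task type and the roundTrip helper are hypothetical names chosen for this example. Because the type ends up with both Typeable and Binary instances, it should be usable wherever Cloud Haskell asks for a Serializable value.

```haskell
{-# LANGUAGE DeriveDataTypeable, DeriveGeneric #-}
import Data.Binary (Binary, decode, encode)
import Data.Typeable (Typeable)
import GHC.Generics (Generic)

-- Hypothetical message type, for illustration only.
data Task = Task { taskId :: Int, taskName :: String }
  deriving (Eq, Show, Typeable, Generic)

-- Empty instance: put and get come from the Generic-based defaults.
instance Binary Task

-- Serialise to a lazy ByteString and decode it again.
roundTrip :: Task -> Task
roundTrip = decode . encode

sample :: Task
sample = Task 1 "resize images"

main :: IO ()
main = print (roundTrip sample == sample)
```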

For binary versions below 0.6.3.0, you need a package to derive Binary. There are various packages available that assist with this:

binary-generic and derive have been confirmed to work with Cloud Haskell; the status of the other packages is unknown -- YMMV (please feel free to update this wiki page if you have more information).

Migration from remote

Here are some suggestions that might ease the migration from the Cloud Haskell prototype remote to distributed-process.

  • The "implicit" type of mkClosure has changed (implicit because mkClosure is a Template Haskell function). In distributed-process mkClosure takes a function of type T1 -> T2 and returns a function of type T1 -> Closure T2. In other words, the first argument to your function becomes the closure environment; if you want two items in your closure environment, create a function of type (T1, T2) -> T3; if you want none, create a function of type () -> T1.
  • distributed-process follows the naming conventions in Towards Haskell in the Cloud rather than in remote so the functions that deal with typed channels are called sendChan, receiveChan and newChan instead of sendChannel, receiveChannel and newChannel.
  • sendChan, receiveChan (and send) never fail in distributed-process (in remote they might throw a TransmitException). Instead, if you want to be notified of communication failure, you need to use monitor or link.
  • The function forkProcess in remote is called spawnLocal in distributed-process.
  • The process monad is called Process in distributed-process (rather than ProcessM). Similarly, the type Match replaces MatchM (and is no longer a monad).
  • Initialization is different. See the documentation of Control.Distributed.Process.Backend.SimpleLocalnet to get started (note that the master/slave distinction in SimpleLocalnet is optional and does not need to be used).
  • Peer discovery is different. The functions getPeers and nameQuery are no longer available. The function findPeers from SimpleLocalnet replaces some, but not all, of the functionality of getPeers. You can use whereisRemoteAsync to find processes that have been registered by name.
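
Several of the points above can be seen together in one small program. This is a sketch under assumptions rather than a reference migration: the greeter function, the address and the port are hypothetical, the closure is spawned on the local node only to keep the example on one machine, and details may differ between versions of distributed-process and distributed-process-simplelocalnet.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Control.Distributed.Process
import Control.Distributed.Process.Backend.SimpleLocalnet
  (initializeBackend, newLocalNode)
import Control.Distributed.Process.Closure (mkClosure, remotable)
import Control.Distributed.Process.Node (initRemoteTable, runProcess)
import Control.Monad.IO.Class (liftIO)

-- Hypothetical worker. Its single argument is the closure environment;
-- two items are packed into a tuple, as described above.
greeter :: (SendPort String, String) -> Process ()
greeter (replyTo, name) = sendChan replyTo ("hello, " ++ name)

-- Generates __remoteTable, which registers greeter for use in closures.
remotable ['greeter]

main :: IO ()
main = do
  -- Initialization goes through a backend; address and port are placeholders.
  backend <- initializeBackend "127.0.0.1" "8080" (__remoteTable initRemoteTable)
  node    <- newLocalNode backend
  runProcess node $ do
    -- Typed channels use newChan / sendChan / receiveChan.
    (sendPort, recvPort) <- newChan
    here <- getSelfNode
    -- $(mkClosure 'greeter) has type (SendPort String, String) -> Closure (Process ()).
    pid <- spawn here ($(mkClosure 'greeter) (sendPort, "world"))
    -- Sends do not throw; failure is reported through monitor (or link).
    _ <- monitor pid
    reply <- receiveChan recvPort
    liftIO (putStrLn reply)
    ProcessMonitorNotification _ _ reason <- expect
    liftIO (putStrLn ("worker exited: " ++ show reason))
```

To spawn on other machines instead, the NodeId passed to spawn would come from peer discovery, for example findPeers from SimpleLocalnet.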