Personal tools

Yhc/Erlang/Proof of concept

From HaskellWiki

< Yhc(Difference between revisions)
Jump to: navigation, search
(template)
(a better title for the last section ;))
 
(17 intermediate revisions by 2 users not shown)
Line 3: Line 3:
   
 
==Introduction==
 
==Introduction==
This Wiki article describes an experiment targeting execution of Haskell programs on top of the Erlang Virtual Machine (BEAM). Haskell source code is compiled to Yhc Core with York Haskell Compiler (Yhc), then the program further discussed converts Yhc Core to Core Erlang; finally Erlang Compiler (erlc) compiles Core Erlang to the BEAM file format which can be loaded and executed by the Erlang VM.
+
This Wiki article describes an experiment targeting execution of Haskell programs on top of the Erlang Virtual Machine (BEAM). Haskell source code is compiled to Yhc Core with York Haskell Compiler (Yhc), next the program further discussed converts Yhc Core to Core Erlang; finally Erlang Compiler (erlc) compiles Core Erlang to the BEAM file format which can be loaded and executed by the Erlang VM.
   
 
There have been numerous discussions about Haskell (mainly GHC) runtime lacking some properties that are available in Erlang environment, as well as about possible improvements in Erlang language syntax and type system to bring some elements available in Haskell.
 
There have been numerous discussions about Haskell (mainly GHC) runtime lacking some properties that are available in Erlang environment, as well as about possible improvements in Erlang language syntax and type system to bring some elements available in Haskell.
   
 
This experiment is an attempt to answer the criticism from both sides. Once it becomes possible to execute Haskell programs in Erlang environment, Haskell users get access to the robust concurrency-oriented runtime, still being able to use Haskell native syntax. Erlang users get possibility to develop some algorithms with regard to the Haskell strong type system, while still being able to code directly in Erlang, where it seems more appropriate implementation-wise.
 
This experiment is an attempt to answer the criticism from both sides. Once it becomes possible to execute Haskell programs in Erlang environment, Haskell users get access to the robust concurrency-oriented runtime, still being able to use Haskell native syntax. Erlang users get possibility to develop some algorithms with regard to the Haskell strong type system, while still being able to code directly in Erlang, where it seems more appropriate implementation-wise.
  +
  +
==Similar projects==
  +
[http://blog.tornkvist.org/blog.yaws?id=1190846785574003 '''Haskerl''']: converts a subset of Haskell syntax to Core Erlang. Pieces of Haskerl's code were used in the course of this experiment.
  +
  +
[http://forum.trapexit.org/viewtopic.php?p=40268 '''Lisp Flavoured Erlang'''] (LFE): "a lisp syntax front-end to the Erlang compiler"
  +
  +
[http://code.google.com/p/erlyjs/ '''Javascript Flavoured Erlang'''] (ErlyJs): "a Javascript compiler running on and compiling for the Erlang virtual machine"
  +
  +
[http://code.google.com/p/erlocaml/ '''A tight bridge between Erlang and OCaml'''] (erlocaml): "this binding could be a way to use Ocaml in distributed systems, where Erlang is used as a "systems glue" for supervision, load balancing, replication, etc."
   
 
==Implementation details==
 
==Implementation details==
Line 19: Line 28:
 
Core Erlang plays the same role in the Erlang compilation process as [[Yhc/API/Core|Yhc Core]] in the Haskell (Yhc) compilation process. So, it turns out to be the most convenient to do the conversion between these formats rather than between e. g. Haskell source and Erlang source.
 
Core Erlang plays the same role in the Erlang compilation process as [[Yhc/API/Core|Yhc Core]] in the Haskell (Yhc) compilation process. So, it turns out to be the most convenient to do the conversion between these formats rather than between e. g. Haskell source and Erlang source.
   
There was an [http://blog.tornkvist.org/blog.yaws?id=1190846785574003 attempt] made earlier to do similar things, only converting from Haskell source (indeed, a subset of Haskell syntax) to Core Erlang. This project is called '''Haskerl''', developed by Torbjörn Törnkvist, (not to be confused with [http://www.dcs.gla.ac.uk/~partain/haskerl.html Will Partain's Haskerl]. Some [http://www.tornkvist.org/darcsweb source code] from Haskerl was used in this experiment, in particular, the algebraic data type to represent Core Erlang internally, and the pretty printer module for Core Erlang, both with some necessary extensions.
+
There was an [http://blog.tornkvist.org/blog.yaws?id=1190846785574003 attempt] made earlier to do similar things, only converting from Haskell source (indeed, a subset of Haskell syntax) to Core Erlang. This project is called '''Haskerl''', developed by Torbjörn Törnkvist, (not to be confused with [http://www.dcs.gla.ac.uk/~partain/haskerl.html Will Partain's Haskerl]). Some [http://www.tornkvist.org/darcsweb source code] from Haskerl was used in this experiment, in particular, the definition of an algebraic data type to represent Core Erlang internally, and the pretty printer module for Core Erlang, both with some necessary extensions.
   
 
The whole compilation chain looks like this:
 
The whole compilation chain looks like this:
   
 
# Haskell source modules are compiled into Yhc Core and linked;
 
# Haskell source modules are compiled into Yhc Core and linked;
# Some overall program optimizations (functionality provided with the Yhc Core library) are performed on the linked Yhc Core;
+
# Some overall program optimizations (functionality provided with the [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/yhccore Yhc Core library]) are performed on the linked Yhc Core;
 
# Yhc Core is converted to Core Erlang;
 
# Yhc Core is converted to Core Erlang;
# The [http://www.tornkvist.org/darcsweb Erlang compiler] erlc produces a BEAM file.
+
# The [http://linux.die.net/man/1/erlc Erlang compiler] erlc produces a BEAM file.
   
 
===Haskell on BEAMs ;)===
 
===Haskell on BEAMs ;)===
 
From the Erlang VM standpoint, a Haskell program is just a large(ish) Erlang module. To run the program, certain function exported from that module (usually, <hask>main</hask>) needs to be called with arguments as necessary. Often the special <code>force</code> function has to be applied to values returned from Haskell-originated module: otherwise some unfinished computation may be returned instead of the expected result.
 
From the Erlang VM standpoint, a Haskell program is just a large(ish) Erlang module. To run the program, certain function exported from that module (usually, <hask>main</hask>) needs to be called with arguments as necessary. Often the special <code>force</code> function has to be applied to values returned from Haskell-originated module: otherwise some unfinished computation may be returned instead of the expected result.
   
From Haskell program standpoint, Erlang VM is just an execution environment providing system calls that are strict on all their arguments, and may have variable number of arguments. Haskell program may spawn concurrent processes able to receive messages of certain types (the idea of typed processes was borrowed from [http://mibori.livejournal.com/64115.html this Livejournal article] (in Russian). The message distribution/transport mechanism is entirely provided by the Erlang VM runtime.
+
From Haskell program standpoint, Erlang VM is just an execution environment providing system calls that are strict on all their arguments, and may have variable number of arguments. Haskell program may spawn concurrent processes able to receive messages of certain types (the idea of typed processes was borrowed from [http://mibori.livejournal.com/64115.html this Livejournal article], in Russian). The message distribution/transport mechanism is entirely provided by the Erlang VM runtime.
   
 
===Lazy computations===
 
===Lazy computations===
Erlang is a [http://en.wikipedia.org/wiki/Strict_evaluation strict] language. This means that functions always evaluate their agruments, and values get passed around already computed. In Haskell, due to its [http://en.wikipedia.org/wiki/Lazy_evaluation lazy/non-strict] nature, some values are passed around un-evaluated, and may be evaluated later as needed (or never at all). "Traditional" implementation of Haskell runtime often combine non-strict evaluation with [http://en.wikipedia.org/wiki/Memoization memoization] which serves to avoid redundant evaluation of the same value several times.
+
Erlang is a [http://en.wikipedia.org/wiki/Strict_evaluation strict] language. This means that functions always evaluate their agruments, and values get passed around already computed. In Haskell, due to its [http://en.wikipedia.org/wiki/Lazy_evaluation lazy/non-strict] nature, some values are passed around un-evaluated, and may be evaluated later as needed (or never at all). "Traditional" implementation of Haskell runtime often combine non-strict evaluation with [http://en.wikipedia.org/wiki/Memoization memoization] which serves to avoid redundant evaluation of the same expression several times.
   
However memoization is not possible when compiling Haskell into (Core) Erlang because of Erlang's [http://en.wikipedia.org/wiki/Single_assignment single assignment] nature. Once created, objects in Erlang are immutable (with few exceptions that are of no value in this experiment). As alternative to memoization, the following approach, based on strictness analysis is used.
+
However memoization is not possible when compiling Haskell into (Core) Erlang because of Erlang's [http://en.wikipedia.org/wiki/Single_assignment single assignment] nature. Once created, objects in Erlang are immutable (with few exceptions that are of no value to this experiment). As alternative to memoization, the following approach, based on strictness analysis is used.
   
[http://hackage.haskell.org/packages/archive/yhccore/0.9/doc/html/Yhc-Core-Strictness.html The Yhc Core Strictness Analyzer] is able to determine whether a function is strict on some (or none) of its arguments. This is done via analysis from the bottom up, with OS/platform primitives (usually assumed strict on all arguments, and the same applies to Erlang primitives). If a function passes its argument to another function which will evaluate it, the former function is also strict on that argument. If a function unconditionally evaluates its argument as a case statement scrutinee, it is also strict on that argument, and this strictness propagates to other functions which call it.
+
[http://hackage.haskell.org/packages/archive/yhccore/0.9/doc/html/Yhc-Core-Strictness.html The Yhc Core Strictness Analyzer] is able to determine whether a function is strict on some (or none) of its arguments. This is done via analysis from the bottom up, with OS/platform primitives (strict on all arguments, which is true for Erlang primitives) being on the bottom. If a function passes its argument to another function which evaluates it, the former function is also strict on that argument. If a function unconditionally evaluates its argument as a case statement scrutinee, it is also strict on that argument, and this strictness propagates to other functions which call it.
   
 
After Yhc Core is linked and optimized, strictness analysis is run on it. All Erlang primitives (see [[#Haskell calling Erlang|below]] about possible ways to call Erlang functions from Haskell) are considered strict on all arguments (and they naturally are). If a function is determined to be strict on some of its arguments, for each such argument a code is inserted into the function's body to make sure these arguments will be evaluated as early as possible, and will be passed around evaluated. While this does not replace memoization, it is expected that such approach will at least eliminate some redundant computations.
 
After Yhc Core is linked and optimized, strictness analysis is run on it. All Erlang primitives (see [[#Haskell calling Erlang|below]] about possible ways to call Erlang functions from Haskell) are considered strict on all arguments (and they naturally are). If a function is determined to be strict on some of its arguments, for each such argument a code is inserted into the function's body to make sure these arguments will be evaluated as early as possible, and will be passed around evaluated. While this does not replace memoization, it is expected that such approach will at least eliminate some redundant computations.
   
Another consequence, for functions with side effects involved in sequential computations, the runtime implementation must carefully observe that already computed value is passed to the continuation, rather than an unevaluated thunk, since repeated evaluation of the thunk results in repeated side effect.
+
Another consequence, for functions with side effects involved in sequential computations, the runtime implementation must carefully observe that value already computed is passed to the continuation, rather than an unevaluated thunk, since repeated evaluation of the thunk results in repeated side effect.
  +
  +
===Haskell modules===
  +
Individual modules that the resulting Core Erlang module is linked from, are not preserved. All modules are combined into one. This Core Erlang module will be named based on the name of the first module compiled by Yhc (that is, the module which was specified on the yhc's command line with ''--linkcore'' option) for this Core Erlang module. Module name is lowercased, and prefixed with ''"hs_"''.
  +
  +
Thus, for a Haskell module named <code>Test1</code>, Core Erlang module will be named <code>hs_test1</code>.
  +
  +
This behavior was only tested on Haskell modules with non-hierarchical names. It is yet to be agreed upon how hierarchical Haskell module names map into Erlang module names (that is, what the dots in Haskell module name will be replaced with).
   
 
===Haskell objects===
 
===Haskell objects===
Line 48: Line 57:
   
 
====Functions====
 
====Functions====
Each Haskell (or, more precisely, Yhc Core) function is translated to corresponding Core Erlang function. Names of functions are not generally preserved. Functions that are exported (expected to be called by Erlang code) keep their names with module name stripped off(remember that Yhc Core file is linked from multiple individual Core files), and some characters replaced (such as primes as they are used in Core Erlang to quote atom names) with underscores. Functions not to be exported are given unique numeric identifiers prepended with dot (.) to form valid Erlang atom names. Thus, for example the <hask>Prelude.map</hask> function is translated from this Yhc Core representation:
+
Each Haskell (or, more precisely, Yhc Core) function is translated to corresponding Core Erlang function. Names of functions are not generally preserved. Functions that are exported (expected to be called by Erlang code) keep their names with module name stripped off (remember that Yhc Core file is linked from multiple individual Core files), and some characters replaced (such as primes as they are used in Core Erlang to quote atom names) with underscores. Functions not to be exported are given unique numeric identifiers prepended with dot (.) to form valid Erlang atom names. Thus, for example the <hask>Prelude.map</hask> function is translated from this Yhc Core representation:
   
 
<haskell>
 
<haskell>
Line 95: Line 104:
   
 
====Forcing evaluation of Haskell expressions====
 
====Forcing evaluation of Haskell expressions====
Non-function Haskell objects are represented using Erlang tuples, tagged with the first member, an atom. The [http://darcs.haskell.org/yhc/src/translator/erlang/00proof/hserl.erl runtime support module] (written in Erlang) provides the <code>force/1</code> function that is called every time a Haskell expression needs to be evaluated (see example og the <hask>map</hask> function above when <code>hserl:force</code> is called upon a case scrutinee variable. For expressions already evaluated, <code>force/1</code> returns its argument as supplied, but in some cases it will actually evaluate its argument, and return a computed value.
+
Non-function Haskell objects are represented using Erlang tuples, tagged with the first member, an atom. The [http://darcs.haskell.org/yhc/src/translator/erlang/00proof/hserl.erl runtime support module] (written in Erlang) provides the <code>force/1</code> function that is called every time a Haskell expression needs to be evaluated (see example of the <hask>map</hask> function above when <code>hserl:force</code> is called upon a case scrutinee variable. For expressions already evaluated, and further unreducible, <code>force/1</code> returns its argument as supplied, but otherwise it will actually evaluate its argument, and return a computed value.
   
 
====Thunks====
 
====Thunks====
Line 110: Line 119:
 
* Saturated application (''Arity'' == length (''Args'')): <code>erlang:apply</code> is called upon ''Func'' and ''Args''; result is forced again.
 
* Saturated application (''Arity'' == length (''Args'')): <code>erlang:apply</code> is called upon ''Func'' and ''Args''; result is forced again.
   
* Oversaturated application (''Arity'' == 1, length (''Args'') > ''Arity''): ''Func'' is applied to the head of ''Args'', the result is applied to the remainder of ''Args'', etc. (also known as [www.haskell.org/~simonmar/papers/eval-apply.pdf Eval-Apply] evaluation strategy). Only curried versions of functions may be involved (of arity 1); thus partial application is simply impossible.
+
* Oversaturated application (''Arity'' == 1, length (''Args'') > ''Arity''): ''Func'' is applied to the head of ''Args'', the result is applied to the remainder of ''Args'', etc. (known as [http://www.haskell.org/~simonmar/papers/eval-apply.pdf Eval-Apply] evaluation strategy). Only curried versions of functions may be involved (of arity 1); thus partial application is simply impossible.
   
 
* Partial/oversaturated application of n-ary function: should not occur.
 
* Partial/oversaturated application of n-ary function: should not occur.
Line 123: Line 132:
 
''Module'' and ''Function'' are atoms.
 
''Module'' and ''Function'' are atoms.
   
If a CAF is part of function application, it is called first, and whatever is returned, is evaluated again (this may be another CAF, or a function). Then the application is processed as described above.
+
If a CAF is part of function application, it is called first, and whatever is returned, is evaluated again (this may be another CAF, or a function). Once an actual function is obtained from CAF's evaluation, the application is processed as described above.
   
 
====Data constructors====
 
====Data constructors====
Line 130: Line 139:
 
* <hask>Prelude.:</hask> is renamed to ''.CONS'';
 
* <hask>Prelude.:</hask> is renamed to ''.CONS'';
 
* <hask>Prelude.[]</hask> is renamed to ''.EOL'';
 
* <hask>Prelude.[]</hask> is renamed to ''.EOL'';
* Tuple constructors are renamed to ''.TUP''<code>n</code>, where <code>n</code> is number of commas (so 2-tuple is ''.TUP1'').
+
* Tuple constructors are renamed to ''.TUP''<code>n</code>, where <code>n</code> is number of '''commas''' (so 2-tuple is ''.TUP1'').
   
Applications of data constructors are Erlang tuples tagged with atom ''@dt''. These tuples have variable length, as arguments do not form a list, but rather are all included in the tuple.
+
Applications of data constructors are Erlang tuples tagged with atom ''@dt''. Arguments of a data constructor do not form a list, but rather are all included in the tuple.
   
Applications of data constructors are non-strict on all their arguments, and do not change when <code>hserl:force</code> is applied to them.
+
Applications of data constructors are non-strict on all their arguments, and <code>hserl:force</code> applied to an application of a data constructor returns the same application.
   
 
====Special cases====
 
====Special cases====
Erlang list objects are wrapped in an Erlang tuple tagged with ''@lst'' when passed to Haskell functions as values. The <code>hserl:force</code> function lazily converts such lists into Haskell lists such as:
+
Erlang list objects are wrapped in Erlang tuples tagged with ''@lst'' when passed to Haskell functions as values. The <code>hserl:force</code> function lazily converts such lists into Haskell lists such as:
   
 
<code>
 
<code>
Line 144: Line 153:
 
</code>
 
</code>
   
Note that <code>hserl:force</code> does not attempt to evaluate its results like it does when evaluating thunks.
+
Another function, <code>hserl:hslist</code>, does the opposite: converts a Haskell list (must be finite) to an Erlang list.
   
 
===Haskell calling Erlang===
 
===Haskell calling Erlang===
Line 164: Line 173:
 
* The <hask>Erlang</hask> class provides a method <hask>toErlang</hask> which should be applied to a value of proper type to convert it into a value that would be correctly processed by Erlang. This method is strict on its argument, so, e. g. infinite lists cannot be passed to Erlang functions.
 
* The <hask>Erlang</hask> class provides a method <hask>toErlang</hask> which should be applied to a value of proper type to convert it into a value that would be correctly processed by Erlang. This method is strict on its argument, so, e. g. infinite lists cannot be passed to Erlang functions.
   
* Due to variadic nature of Erlang functions, special infix operators are introduced which play the same role as opening parentheses, commas, and closing parentheses.
+
* Due to variadic nature of Erlang functions, special infix operators are introduced which play the same role as opening parentheses, commas, and closing parentheses in Erlang function calls.
   
 
So, in the Haskell code used in this experiment, the following definitions are given:
 
So, in the Haskell code used in this experiment, the following definitions are given:
Line 222: Line 231:
 
In some cases, it may be more convenient to use the Haskell FFI to import certain Erlang functions (they will be treated as Yhc Core primitives).
 
In some cases, it may be more convenient to use the Haskell FFI to import certain Erlang functions (they will be treated as Yhc Core primitives).
   
Yhc has loosened requirements to the syntax of FFI declarations: anything may be used as the name of calling conventions, or imported entity identifier. It is therefore OK to write:
+
Yhc has loose requirements to the syntax of FFI declarations: anything may be used as the name of a calling convention, or imported entity identifier. It is therefore OK to write:
   
 
<haskell>
 
<haskell>
Line 230: Line 239:
 
thus importing the <code>hserl:force</code> function as a primitive. FFI-based imports may be used when imported function's type signature does not fit in the General calling convention.
 
thus importing the <code>hserl:force</code> function as a primitive. FFI-based imports may be used when imported function's type signature does not fit in the General calling convention.
   
In addition to <hask>ERlObj</hask>, the following opaque types are defined that are used by imported Erlang primitives:
+
In addition to <hask>ErlObj</hask>, the following opaque types are defined that are used by imported Erlang primitives:
   
 
* <hask>Atom</hask>: a type to represent Erlang [http://erlang.org/doc/reference_manual/data_types.html#2.3 atoms].
 
* <hask>Atom</hask>: a type to represent Erlang [http://erlang.org/doc/reference_manual/data_types.html#2.3 atoms].
Line 244: Line 253:
 
From the point of view adopted in this experiment, calls of Haskell functions from Erlang are rare. This mostly happens when a Haskell application compiled for Erlang VM is started; so a predefined function like <hask>main</hask> is invoked.
 
From the point of view adopted in this experiment, calls of Haskell functions from Erlang are rare. This mostly happens when a Haskell application compiled for Erlang VM is started; so a predefined function like <hask>main</hask> is invoked.
   
For Erlang standpoint, Haskell functions are just regular functions. It is however good to observe few rules:
+
From the Erlang standpoint, Haskell functions are just regular functions. It is however good to observe few rules:
   
 
* Haskell objects are not always identical to Erlang objects. For example, while numbers may be passed as they are, lists need to be wrapped in tagged tuples. Some Erlang objects do not have exact mapping to Haskell objects.
 
* Haskell objects are not always identical to Erlang objects. For example, while numbers may be passed as they are, lists need to be wrapped in tagged tuples. Some Erlang objects do not have exact mapping to Haskell objects.
   
* Haskell programs are compiled with respect to function type signatures defined in the source code (or inferred during compilation). Haskell functions do not contain code to determine values of which types are passed to as arguments. Haskell function compiled to receive integer values will behave unpredictably when given a string or a list of integers as its argument.
+
* Haskell programs are compiled with respect to function type signatures defined in the source code (or inferred during compilation). Haskell functions do not contain code to check types of values passed as arguments. Haskell function compiled to receive integer values will behave unpredictably when given a string or a list of integers as its argument.
   
 
* It is strongly recommended to export from Haskell modules only functions that take values of monomorphic types. That is, type signature <hask>Int -> String</hask> is OK while <hask>(Num a) => a -> String</hask> is not. For the latter function to work correctly, additional argument should be passed which is a proper class dictionary. Dictionaries are nearly impossible to export as Haskell functions; they will be assigned numeric identifiers unrecognizable at Erlang side.
 
* It is strongly recommended to export from Haskell modules only functions that take values of monomorphic types. That is, type signature <hask>Int -> String</hask> is OK while <hask>(Num a) => a -> String</hask> is not. For the latter function to work correctly, additional argument should be passed which is a proper class dictionary. Dictionaries are nearly impossible to export as Haskell functions; they will be assigned numeric identifiers unrecognizable at Erlang side.
   
* Due to non-strict nature of Haskell, it is recommended to wrap call to such function in <code>hserl:force</code> if a function returns numeric value or an atom, or <code>hserl:hslist</code>, if a function returns a list (e. g. a String):
+
* Due to non-strict nature of Haskell, it is recommended to wrap call to a function exported from a Haskell-originated module in <code>hserl:force</code> if a function returns numeric value or an atom, or <code>hserl:hslist</code>, if a function returns a list (e. g. a String):
   
* It is sometimes advisable not to try to call a Haskell function with its arguments directly, but instead form a thunk (an Erlang tuple tagged with ''@ap'') including function's name and arguments, and apply <code>hserl:force</code> or <code>hserl:hslist</code> to the thunk.
+
* It is sometimes advisable not to try to call a Haskell function with its arguments directly, but to form a thunk (an Erlang tuple tagged with ''@ap'') instead, including function's name and arguments, and apply <code>hserl:force</code> or <code>hserl:hslist</code> to the thunk.
   
 
Necessity of using <code>hserl:force</code> or <code>hserl:hslist</code> when calling Haskell functions from Erlang is illustrated below:
 
Necessity of using <code>hserl:force</code> or <code>hserl:hslist</code> when calling Haskell functions from Erlang is illustrated below:
Line 271: Line 280:
   
 
===Typed processes===
 
===Typed processes===
Ability to handle large number of concurrent processes is an essential feature of Erlang runtime that Haskell programs may take advantage of. Like everyhting in Haskell, processes also bear some type signatures - type signatures of messages that may be sent to them. This idea was implemented in [http://www.iist.unu.edu/~vs/haskell/dhs/ Erlang-style Distributed Haskell], and later its implementation was shown on [http://mibori.livejournal.com/64115.html this] LiveJournal article (in Russian).
+
Ability to handle large number of concurrent processes is an essential feature of the Erlang runtime that Haskell programs take advantage of. Like everyhting in Haskell, processes also bear some type signatures - type signatures of messages that may be sent to them. This idea was implemented in [http://www.iist.unu.edu/~vs/haskell/dhs/ Erlang-style Distributed Haskell], and later its implementation was shown on [http://mibori.livejournal.com/64115.html this] LiveJournal article (in Russian).
   
 
A typed process consists of an optional prologue, and a message-processing function. To describe a type of such function, the following newtype was defined:
 
A typed process consists of an optional prologue, and a message-processing function. To describe a type of such function, the following newtype was defined:
Line 279: Line 288:
 
</haskell>
 
</haskell>
   
If there is a prologue, it should finally evaluate to a function with this type.
+
If there is a prologue, it should evaluate to a function of this type.
   
 
From this type signature, it follows that a message processing function takes an argument of some type ''a'', and returns a function that takes an argument of the same type. In other words, each message arriving causes the function to process it, and at the end the function has no choice but request another message (or to call <code>erlang:exit</code> to terminate the process).
 
From this type signature, it follows that a message processing function takes an argument of some type ''a'', and returns a function that takes an argument of the same type. In other words, each message arriving causes the function to process it, and at the end the function has no choice but request another message (or to call <code>erlang:exit</code> to terminate the process).
Line 289: Line 298:
   
 
pong :: Process PingPong
 
pong :: Process PingPong
  +
  +
-- Prologue: print a message
   
 
pong = ("io", "format") `lpar` "Pong started\n" `rpar` \_ ->
 
pong = ("io", "format") `lpar` "Pong started\n" `rpar` \_ ->
  +
  +
-- Message processing function
  +
 
Process pong' where
 
Process pong' where
 
pong' = \msg ->
 
pong' = \msg ->
  +
  +
-- At this point the message has been received
  +
 
case msg of
 
case msg of
  +
  +
-- Another process requests this process to terminate
  +
 
Finished -> ("io", "format") `lpar` "PONG: Finished\n" `rpar` \_ ->
 
Finished -> ("io", "format") `lpar` "PONG: Finished\n" `rpar` \_ ->
 
exit (mkAtom "ok")
 
exit (mkAtom "ok")
  +
  +
-- Another process pings this process: PID is contained in the message
  +
 
Ping p -> ("io", "format") `lpar` "PONG: Ping!!!\n" `rpar` \_ ->
 
Ping p -> ("io", "format") `lpar` "PONG: Ping!!!\n" `rpar` \_ ->
  +
  +
-- Reply with message
  +
 
p `send` Pong $ \_ ->
 
p `send` Pong $ \_ ->
  +
  +
-- Return the same message processing function: it will be called when another message arrives
  +
  +
 
Process pong'
 
Process pong'
  +
  +
-- Any other variants of the message are not acceptable
  +
 
_ -> error "Pong"
 
_ -> error "Pong"
 
</haskell>
 
</haskell>
Line 319: Line 352:
 
Note the use of <hask>force</hask>. It is used to ensure that the continuation ''k'' gets a finally evaluated value of a PID rather than a thunk which would evaluate to a PID.
 
Note the use of <hask>force</hask>. It is used to ensure that the continuation ''k'' gets a finally evaluated value of a PID rather than a thunk which would evaluate to a PID.
   
Note the <hask>Boxed</hask> data constructor. This is an ordinary data constructor with one argument which prevents the given expression ''f'' from evaluation in the context of the "parent" process. The <code>hserl:spawn0</code> function unboxes the expression like this:
+
Note the <hask>Boxed</hask> data constructor. This is an ordinary data constructor with one argument which prevents the given expression ''f'' from evaluation in the context of the "parent" process. The <code>hserl:spawn</code> function unboxes the expression like this:
   
 
<code>
 
<code>
Line 398: Line 431:
   
 
==Examples==
 
==Examples==
Sample code to demonstrate results of the experiment was checked into the Yhc Darcs repo. Haskell and Erlang sources as well as compiled BEAM files are located at http://darcs.haskell.org/yhc/src/translator/erlang/00proof/ .
+
Sample code to demonstrate results of the experiment was checked into the Yhc Darcs repo. Haskell and Erlang sources as well as compiled BEAM files are located at http://darcs.haskell.org/yhc/src/translator/erlang/00proof/.
  +
  +
All Haskell code necessary for this experiment is in this file:
  +
http://darcs.haskell.org/yhc/src/translator/erlang/00proof/Test1.hs
  +
  +
Minimal Erlang runtime is in this file:
  +
http://darcs.haskell.org/yhc/src/translator/erlang/00proof/hserl.erl
   
 
In order to use these examples, download these two files:
 
In order to use these examples, download these two files:
Line 415: Line 448:
 
fac :: Int -> Int
 
fac :: Int -> Int
   
fac 0 = 0
+
fac 0 = 1
fac 1 = 1
 
 
fac n = n * fac (n - 1)
 
fac n = n * fac (n - 1)
 
</haskell>
 
</haskell>
Line 445: Line 478:
   
 
Note that the ''Ping'' process takes number of repetitions and ''Pong'' 's PID as parameters.
 
Note that the ''Ping'' process takes number of repetitions and ''Pong'' 's PID as parameters.
  +
  +
==Yhc core conversion program==
  +
A program was developed in the course of this experiment which given a linked Yhc Core file outputs to the standard output Core Erlang source suitable for compiling to the BEAM format with erlc. This program must be given at least one command line parameter: a path to the Yhc Core file to be converted to Core Erlang. Additional arguments will specify which functions are to be visible at the Erlang side. In absence of such arguments, <hask>Prelude.main</hask> is always assumed visible to Erlang.
  +
  +
When specifying those Erlang-visible functions, names in Yhc Core should be provided rather than Haskell names. In Yhc Core, dot between module name and function name is replaced with semicolon; other dots are preserved. A function named <hask>A.B.C.d</hask> will be named <hask>A.B.C;d</hask> in Yhc core.
  +
  +
Refer to the [http://darcs.haskell.org/yhc/src/translator/erlang/00proof/Makefile Makefile] used in this experiment for an example of running this program with 3 functions made visible to Erlang.
  +
  +
The Yhc Core conversion program is called '''y2e'''.
  +
  +
==Experiment on your own==
  +
If you wish to try compiling your own Haskell code and test it in Erlang environment, consider the following:
  +
  +
#Obtain the Yhc source: <code>darcs get http://darcs.haskell.org/yhc</code>
  +
#Compile and build the Yhc itself: <code>scons core=1 build && scons install</code>
  +
#Change to the directory within the Yhc source tree: <code>cd src/translator/erlang/00proof</code>
  +
#Edit the [http://darcs.haskell.org/yhc/src/translator/erlang/00proof/Test1.hs Test1.hs] file adding your own functions. It is not recommended to start a new file from scratch as there is no library infrastructure created for the Erlang backend yet while Test1.hs contains all necessary definitions and foreign imports
  +
#Modify the Makefile adding names of your functions to the command line of ''y2e'' within the <code>hs_test1.beam</code> target
  +
#Run <code>make</code> to rebuild the Core Erlang file (hs_test1.core).
  +
#Start the Erlang shell: <code>erl</code>
  +
#In the shell, using recommendations above how to call Haskell functions from Erlang, test your functions.
  +
  +
==Future directions==
  +
The following ideas may be considered for the future development of the Yhc Erlang backend:
  +
  +
* Use <hask>foreign export</hask> declarations in Haskell source to mark functions visible to Erlang, and specify names these functions are desired to be visible under
  +
  +
* Extend the receive messages API by wrapping the reference to the message processing function using some data constructor. This might be for example:
  +
  +
<haskell>
  +
data MSGRQ a = Receive (Process a) | ReceiveTm Int (Process a)
  +
</haskell>
  +
  +
A message-driven process might look like this:
  +
  +
<haskell>
  +
--* This code was not tested *--
  +
  +
receive = Receive . Process
  +
  +
pong :: Process PingPong
  +
  +
pong = ("io", "format") `lpar` "Pong started\n" `rpar` \_ ->
  +
receive pong' where
  +
pong' = \msg ->
  +
...
  +
receive pong'
  +
</haskell>
  +
  +
==Developer's contact information==
  +
Please contact the developer at ''golubovsky at gmail dot com'' for any questions, suggestions, and concerns. Any feedback is appreciated.
  +
  +
==Just for fun()==
  +
If this thing takes off, the logo might look like this:
  +
  +
[[image:hs_beams.jpg]]

Latest revision as of 17:40, 19 May 2008

Part of Yhc

(Download)

Contents


[edit] 1 Introduction

This Wiki article describes an experiment targeting execution of Haskell programs on top of the Erlang Virtual Machine (BEAM). Haskell source code is compiled to Yhc Core with York Haskell Compiler (Yhc), next the program further discussed converts Yhc Core to Core Erlang; finally Erlang Compiler (erlc) compiles Core Erlang to the BEAM file format which can be loaded and executed by the Erlang VM.

There have been numerous discussions about Haskell (mainly GHC) runtime lacking some properties that are available in Erlang environment, as well as about possible improvements in Erlang language syntax and type system to bring some elements available in Haskell.

This experiment is an attempt to answer the criticism from both sides. Once it becomes possible to execute Haskell programs in Erlang environment, Haskell users get access to the robust concurrency-oriented runtime, still being able to use Haskell native syntax. Erlang users get possibility to develop some algorithms with regard to the Haskell strong type system, while still being able to code directly in Erlang, where it seems more appropriate implementation-wise.

[edit] 2 Similar projects

Haskerl: converts a subset of Haskell syntax to Core Erlang. Pieces of Haskerl's code were used in the course of this experiment.

Lisp Flavoured Erlang (LFE): "a lisp syntax front-end to the Erlang compiler"

Javascript Flavoured Erlang (ErlyJs): "a Javascript compiler running on and compiling for the Erlang virtual machine"

A tight bridge between Erlang and OCaml (erlocaml): "this binding could be a way to use Ocaml in distributed systems, where Erlang is used as a "systems glue" for supervision, load balancing, replication, etc."

[edit] 3 Implementation details

This section discusses in deep the approach taken in this experiment. It is good to remember that nothing yet is final; some of the techniques described may possibly make it into the mainstream code, while others may not. This is only the beginning.

[edit] 3.1 Core Erlang overview

The Core Erlang initiative project is a "collaboration between the High-Performance Erlang (HiPE) project at the Department of Information Technology of Uppsala University, and Ericsson's OTP/Erlang developers".

Core Erlang is an intermediate form of Erlang source compilation. It provides a desugared (compared to Erlang) syntax of a strict functional language. Erlang source may be compiled to Core Erlang, and Core Erlang may be compiled to BEAM bytecode.

Core Erlang plays the same role in the Erlang compilation process as Yhc Core in the Haskell (Yhc) compilation process. So, it turns out to be the most convenient to do the conversion between these formats rather than between e. g. Haskell source and Erlang source.

There was an attempt made earlier to do similar things, only converting from Haskell source (indeed, a subset of Haskell syntax) to Core Erlang. This project is called Haskerl, developed by Torbjörn Törnkvist, (not to be confused with Will Partain's Haskerl). Some source code from Haskerl was used in this experiment, in particular, the definition of an algebraic data type to represent Core Erlang internally, and the pretty printer module for Core Erlang, both with some necessary extensions.

The whole compilation chain looks like this:

  1. Haskell source modules are compiled into Yhc Core and linked;
  2. Some overall program optimizations (functionality provided with the Yhc Core library) are performed on the linked Yhc Core;
  3. Yhc Core is converted to Core Erlang;
  4. The Erlang compiler erlc produces a BEAM file.

[edit] 3.2 Haskell on BEAMs ;)

From the Erlang VM standpoint, a Haskell program is just a large(ish) Erlang module. To run the program, certain function exported from that module (usually,
main
) needs to be called with arguments as necessary. Often the special force function has to be applied to values returned from Haskell-originated module: otherwise some unfinished computation may be returned instead of the expected result.

From Haskell program standpoint, Erlang VM is just an execution environment providing system calls that are strict on all their arguments, and may have variable number of arguments. Haskell program may spawn concurrent processes able to receive messages of certain types (the idea of typed processes was borrowed from this Livejournal article, in Russian). The message distribution/transport mechanism is entirely provided by the Erlang VM runtime.

[edit] 3.3 Lazy computations

Erlang is a strict language. This means that functions always evaluate their agruments, and values get passed around already computed. In Haskell, due to its lazy/non-strict nature, some values are passed around un-evaluated, and may be evaluated later as needed (or never at all). "Traditional" implementation of Haskell runtime often combine non-strict evaluation with memoization which serves to avoid redundant evaluation of the same expression several times.

However memoization is not possible when compiling Haskell into (Core) Erlang because of Erlang's single assignment nature. Once created, objects in Erlang are immutable (with few exceptions that are of no value to this experiment). As alternative to memoization, the following approach, based on strictness analysis is used.

The Yhc Core Strictness Analyzer is able to determine whether a function is strict on some (or none) of its arguments. This is done via analysis from the bottom up, with OS/platform primitives (strict on all arguments, which is true for Erlang primitives) being on the bottom. If a function passes its argument to another function which evaluates it, the former function is also strict on that argument. If a function unconditionally evaluates its argument as a case statement scrutinee, it is also strict on that argument, and this strictness propagates to other functions which call it.

After Yhc Core is linked and optimized, strictness analysis is run on it. All Erlang primitives (see below about possible ways to call Erlang functions from Haskell) are considered strict on all arguments (and they naturally are). If a function is determined to be strict on some of its arguments, for each such argument a code is inserted into the function's body to make sure these arguments will be evaluated as early as possible, and will be passed around evaluated. While this does not replace memoization, it is expected that such approach will at least eliminate some redundant computations.

Another consequence, for functions with side effects involved in sequential computations, the runtime implementation must carefully observe that value already computed is passed to the continuation, rather than an unevaluated thunk, since repeated evaluation of the thunk results in repeated side effect.

[edit] 3.4 Haskell modules

Individual modules that the resulting Core Erlang module is linked from, are not preserved. All modules are combined into one. This Core Erlang module will be named based on the name of the first module compiled by Yhc (that is, the module which was specified on the yhc's command line with --linkcore option) for this Core Erlang module. Module name is lowercased, and prefixed with "hs_".

Thus, for a Haskell module named Test1, Core Erlang module will be named hs_test1.

This behavior was only tested on Haskell modules with non-hierarchical names. It is yet to be agreed upon how hierarchical Haskell module names map into Erlang module names (that is, what the dots in Haskell module name will be replaced with).

[edit] 3.5 Haskell objects

This section describes Erlang data structures used to represent objects visible to Haskell programs.

[edit] 3.5.1 Functions

Each Haskell (or, more precisely, Yhc Core) function is translated to corresponding Core Erlang function. Names of functions are not generally preserved. Functions that are exported (expected to be called by Erlang code) keep their names with module name stripped off (remember that Yhc Core file is linked from multiple individual Core files), and some characters replaced (such as primes as they are used in Core Erlang to quote atom names) with underscores. Functions not to be exported are given unique numeric identifiers prepended with dot (.) to form valid Erlang atom names. Thus, for example the
Prelude.map
function is translated from this Yhc Core representation:
Prelude;map v22178 v22179_f =
    let v22179 = _f_ v22179_f
    in let v22179_c = _f_ v22179
       in case v22179_c of
              Prelude;[] -> Prelude;[]
              (Prelude;:) v22180 v22181 ->
                  (Prelude;:) (v22178 v22180) (Prelude;map v22178 v22181)

to this Core Erlang representation:

'.56'/2 =

 fun (_v22178,_v22179_f) ->
   let <_v22179> =
     <call 'hserl':'force'(_v22179_f)>
     in let <_v22179_c> =
     <call 'hserl':'force'(_v22179)>
     in case <_v22179_c> of
     <{'@dt','.EOL'}> when 'true' ->
       {'@dt','.EOL'}
     <{'@dt','.CONS',_v22180,_v22181}> when 'true' ->
       {'@dt','.CONS',{'@ap',_v22178,1,[_v22180|[]]},
         {'@ap',{'hs_test1','.56'},2,[_v22178|[_v22181|[]]]}}
   end

The code above also shows how some other Haskell objects are represented in Core Erlang.

The function interface shown above (with arity in Core Erlang equal to arity in Yhc Core) is used for saturated calls. For partial and oversaturated function applications, however a slower but more flexible curried interface (of arity 1) is provided:

'.56_c'/1 =

 fun (_v22178) ->
   fun (_v22179_f) ->
     call 'hs_test1':'.56'(_v22178,_v22179_f)

This function, if applied to one argument, returns another function of one argument, which in turn calls the "target" function with both arguments.

Names of curried functions are formed by appending _c to the name of the target function.

[edit] 3.5.2 Forcing evaluation of Haskell expressions

Non-function Haskell objects are represented using Erlang tuples, tagged with the first member, an atom. The runtime support module (written in Erlang) provides the force/1 function that is called every time a Haskell expression needs to be evaluated (see example of the
map
function above when hserl:force is called upon a case scrutinee variable. For expressions already evaluated, and further unreducible, force/1 returns its argument as supplied, but otherwise it will actually evaluate its argument, and return a computed value.

[edit] 3.5.3 Thunks

Thunks, or delayed function calls, are tagged with atom @ap. Below, possible types of thunks, and their evaluation logic are discussed.

General structure of an application thunk is as follows:

{'@ap', Func, Arity, Args}

Func may be a 2-tuple directly identifying a function, or some expression that may evaluate to a function. Arity is usually 1 for partial and oversaturated applications, otherwise it is an arity of a function involved in a saturated call. Args is a list (in Erlang sense) of arguments.

  • Saturated application (Arity == length (Args)): erlang:apply is called upon Func and Args; result is forced again.
  • Oversaturated application (Arity == 1, length (Args) > Arity): Func is applied to the head of Args, the result is applied to the remainder of Args, etc. (known as Eval-Apply evaluation strategy). Only curried versions of functions may be involved (of arity 1); thus partial application is simply impossible.
  • Partial/oversaturated application of n-ary function: should not occur.

[edit] 3.5.4 CAFs

CAFs (nullary functions) are tagged with atom @caf and structured as follows:

{'@caf', Module, Function}

Module and Function are atoms.

If a CAF is part of function application, it is called first, and whatever is returned, is evaluated again (this may be another CAF, or a function). Once an actual function is obtained from CAF's evaluation, the application is processed as described above.

[edit] 3.5.5 Data constructors

Data constructors are renamed similarly to functions, but no curried forms are created because all applications of data constructors in Yhc Core are saturated; the compiler creates necessary wrappers for partial applications itself. Certain data constructors are given sensible identifiers, such as:

  • Prelude.:
    is renamed to .CONS;
  • Prelude.[]
    is renamed to .EOL;
  • Tuple constructors are renamed to .TUPn, where n is number of commas (so 2-tuple is .TUP1).

Applications of data constructors are Erlang tuples tagged with atom @dt. Arguments of a data constructor do not form a list, but rather are all included in the tuple.

Applications of data constructors are non-strict on all their arguments, and hserl:force applied to an application of a data constructor returns the same application.

[edit] 3.5.6 Special cases

Erlang list objects are wrapped in Erlang tuples tagged with @lst when passed to Haskell functions as values. The hserl:force function lazily converts such lists into Haskell lists such as:

force ({'@lst', []}) -> {'@dt', '.EOL'};
force ({'@lst', [H|T]}) -> {'@dt', '.CONS', H, {'@lst', T}};

Another function, hserl:hslist, does the opposite: converts a Haskell list (must be finite) to an Erlang list.

[edit] 3.6 Haskell calling Erlang

This section describes possible ways Haskell programs may call Erlang functions.

[edit] 3.6.1 General calling convention

The main reason Haskell programs may call arbitrary Erlang functions is to perform I/O and to communicate with processes, that is, to perform actions with side effects. Therefore it is important that proper sequence of calls is observed. The General calling convention described here makes use of plain CPS to sequence such actions. The calling convention also ensures that continuation receives evaluated expressions, not thunks, which means that using results of side-effectful actions does not cause side effects to repeat in uncontrolled manner.

Generally, Erlang functions are variadic, that is, functions with same name but different arities are possible. This is not easy to fit into traditional Haskell Foreign Function Interface.

Another aspect of calling Erlang is the dynamic typing nature of Erlang. Some function may accept values that map to various Haskell types hard or impossible to unify. Thus the io:write function accepts arbitrary Erlang terms to cause output of their string representation: both numeric value and a list may be accepted.

Thus, the General calling convention is built on the following principles:

  • An opaque phantom type
    ErlObj
    is used to encode all possible Erlang types. Haskell programs do not have access to internal structure of those objects via this type.
  • Haskell values that may be passed to Erlang functions must have types that are instances of the special
    Erlang
    class.
  • The
    Erlang
    class provides a method
    toErlang
    which should be applied to a value of proper type to convert it into a value that would be correctly processed by Erlang. This method is strict on its argument, so, e. g. infinite lists cannot be passed to Erlang functions.
  • Due to variadic nature of Erlang functions, special infix operators are introduced which play the same role as opening parentheses, commas, and closing parentheses in Erlang function calls.

So, in the Haskell code used in this experiment, the following definitions are given:

class Erlang a where
  toErlang :: a -> ErlObj
 
instance (Erlang a) => Erlang [a] where
  toErlang = hsList . (map toErlang)
 
instance Erlang Int where
  toErlang = identity
 
-- etc.
Here,
hsList
and
identity
are Erlang functions imported as primitives (that is, using another calling convention, see below) which perform appropriate transformations from Haskell representation of objects to Erlang representation (often this does not involve any special actions).

Here is an example of Erlang function calls and equivalent Haskell code using General calling convention:

io:format("Hello "),
io:format("World!").

  ("io", "format") `lpar` "Hello " `rpar` \_ ->
  ("io", "format") `lpar` "World!" `rpar` \_ ->
The
lpar
and
rpar
are equivalent to opening (left) and closing (right) parentheses. Similarly, the same io:format function in its 2-argument form might be called like this:

io:format("|~10.5c|~-10.5c|~5c|~n", [$a, $b, $c])

  ("io", "format") `lpar` "|~10.5c|~-10.5c|~5c|~n" `comma` ['a', 'b', 'c'] `rpar` \_ ->
Note absence of dollar sign between
`rpar`
and its continuation: it is not needed as
`rpar`
is an infix operation itself. It is useful to look at the implementation of
rpar
:
rpar :: ErlApp -> CPS z ErlObj
 
rpar a k = k (force (erlCall a))
The continuation k gets the result of
erlCall
(another primitive) after the
force
(this is the same hserl:force function discussed above, called from Haskell as a primitive) is applied to it. This is to ensure that the continuation gets already evaluated value rather than a thunk whose evaluation may cause a side effect to repeat undesirably.

Please note that spawn should not be called using General calling convention, see Spawning processes below.

[edit] 3.6.2 Primitive calls

In some cases, it may be more convenient to use the Haskell FFI to import certain Erlang functions (they will be treated as Yhc Core primitives).

Yhc has loose requirements to the syntax of FFI declarations: anything may be used as the name of a calling convention, or imported entity identifier. It is therefore OK to write:

foreign import erlang "hserl:force" force :: a -> a

thus importing the hserl:force function as a primitive. FFI-based imports may be used when imported function's type signature does not fit in the General calling convention.

In addition to
ErlObj
, the following opaque types are defined that are used by imported Erlang primitives:
  • Atom
    : a type to represent Erlang atoms.
  • PID a
    : a newtype to represent an Erlang process that receives messages of type a.
  • ErlApp
    : a type to represent an unevaluated application of an Erlang function to its arguments;
    `lpar`
    creates it, out of a 2-tuple with module and function names, with one argument in the arguments list;
    `comma`
    adds an argument to the arguments list;
    `rpar`
    applies the Erlang function referred to by the tuple to the list of arguments accumulated at the moment.

[edit] 3.6.3 Hardcoded BIFs

Certain primitives are hardcoded by the Yhc Core to Core Erlang conversion program. These are mainly Yhc own primitives, and few others, such as arithmetic operations. Calls to these functions are usually generated by Yhc internally, and they are not visible to applications. Such hardcoding in most cases maps Yhc primitives to Erlang BIFs.

[edit] 3.7 Erlang calling Haskell

From the point of view adopted in this experiment, calls of Haskell functions from Erlang are rare. This mostly happens when a Haskell application compiled for Erlang VM is started; so a predefined function like
main
is invoked.

From the Erlang standpoint, Haskell functions are just regular functions. It is however good to observe few rules:

  • Haskell objects are not always identical to Erlang objects. For example, while numbers may be passed as they are, lists need to be wrapped in tagged tuples. Some Erlang objects do not have exact mapping to Haskell objects.
  • Haskell programs are compiled with respect to function type signatures defined in the source code (or inferred during compilation). Haskell functions do not contain code to check types of values passed as arguments. Haskell function compiled to receive integer values will behave unpredictably when given a string or a list of integers as its argument.
  • It is strongly recommended to export from Haskell modules only functions that take values of monomorphic types. That is, type signature
    Int -> String
    is OK while
    (Num a) => a -> String
    is not. For the latter function to work correctly, additional argument should be passed which is a proper class dictionary. Dictionaries are nearly impossible to export as Haskell functions; they will be assigned numeric identifiers unrecognizable at Erlang side.
  • Due to non-strict nature of Haskell, it is recommended to wrap call to a function exported from a Haskell-originated module in hserl:force if a function returns numeric value or an atom, or hserl:hslist, if a function returns a list (e. g. a String):
  • It is sometimes advisable not to try to call a Haskell function with its arguments directly, but to form a thunk (an Erlang tuple tagged with @ap) instead, including function's name and arguments, and apply hserl:force or hserl:hslist to the thunk.

Necessity of using hserl:force or hserl:hslist when calling Haskell functions from Erlang is illustrated below:

3> hs_test1:fuse({'@lst', "ABCDE"}, {'@lst', "123"}).
{'@dt','.CONS',65,
       {'@dt','.CONS',49,
              {'@ap',{hs_test1,fuse},2,[{'@lst',"BCDE"},{'@lst',"23"}]}}}
4> hserl:hslist(hs_test1:fuse({'@lst', "ABCDE"}, {'@lst', "123"})).
"A1B2C3DE"

When hserl:hslist was not applied, the function returns a portion of a Haskell list, and a recursive application of the same function to process the rest of source lists. When hserl:hslist is applied, returned value is as expected: elements of the two lists are interleaved together.

[edit] 3.8 Typed processes

Ability to handle large number of concurrent processes is an essential feature of the Erlang runtime that Haskell programs take advantage of. Like everyhting in Haskell, processes also bear some type signatures - type signatures of messages that may be sent to them. This idea was implemented in Erlang-style Distributed Haskell, and later its implementation was shown on this LiveJournal article (in Russian).

A typed process consists of an optional prologue, and a message-processing function. To describe a type of such function, the following newtype was defined:

newtype Process a = Process (a -> Process a)

If there is a prologue, it should evaluate to a function of this type.

From this type signature, it follows that a message processing function takes an argument of some type a, and returns a function that takes an argument of the same type. In other words, each message arriving causes the function to process it, and at the end the function has no choice but request another message (or to call erlang:exit to terminate the process).

Below is an example of such a process (from the Ping-pong example):

data PingPong = Finished | Ping (PID PingPong) | Pong
 
pong :: Process PingPong
 
-- Prologue: print a message
 
pong = ("io", "format") `lpar` "Pong started\n" `rpar` \_ -> 
 
-- Message processing function
 
  Process pong' where
    pong' = \msg ->
 
-- At this point the message has been received
 
      case msg of
 
-- Another process requests this process to terminate
 
        Finished -> ("io", "format") `lpar` "PONG: Finished\n" `rpar` \_ ->
                    exit (mkAtom "ok")
 
-- Another process pings this process: PID is contained in the message
 
        Ping p -> ("io", "format") `lpar` "PONG: Ping!!!\n" `rpar` \_ ->
 
-- Reply with message
 
                  p `send` Pong $ \_ ->
 
-- Return the same message processing function: it will be called when another message arrives
 
 
                  Process pong'
 
-- Any other variants of the message are not acceptable
 
        _ -> error "Pong"
The process consists of a prologue which just prints a message "Pong started", and evaluates to the
Process
of the necessary type. The message processing function
pong'
is called every time message of type
PingPong
arrives, takes necessary actions based on the message contents (which may have side effects), and finally returns itself wrapped in the same
Process
newtype constructor, thus being ready to process another message.

[edit] 3.8.1 Spawning processes

To correctly spawn a typed process, a special Erlang primitive hserl:spawn0 is defined in the runtime which in turn is a wrapper around the "standard" erlang:spawn function. The primitive, as imported, has this type signature:

foreign import erlang "hserl:spawn" spawn0 :: Boxed (Process a) -> PID a
and the
spawn
function to be called by applications:
spawn f = \k -> k (force (spawn0 $ Boxed f))
Note the use of
force
. It is used to ensure that the continuation k gets a finally evaluated value of a PID rather than a thunk which would evaluate to a PID. Note the
Boxed
data constructor. This is an ordinary data constructor with one argument which prevents the given expression f from evaluation in the context of the "parent" process. The hserl:spawn function unboxes the expression like this:

spawn ({'@dt', _, F}) ->

 erlang:spawn (fun() -> loop (force(F)) end).

The loop function (not exported by the runtime) contains the receive expression which drives the process on the Haskell side. The function applies the given message processing function to the message received, and calls itself recursively on whatever comes from processing the message.

loop (F) ->

 receive (X) ->
   K = force (F (X)),
   loop (K)
 end.

So, in order to spawn a process defined as

pong :: Process PingPong

an aplication calls:

...
spawn pong $ \p ->
...

which causes p to be bound to the PID of the spawned process.

[edit] 3.8.2 Receiving messages

A message processing function is called every time a message arrives. Return of the same function, or another function satisfying the process type means request for another message.

A process may preserve state between processing messages by passing additional arguments around. Below is the Ping process (from the same Ping-Pong example) which has a counter of remaining laps to go:

ping :: PID PingPong -> Int -> Process PingPong
 
ping _ n | n <= 0 = exit (mkAtom "ok")
 
ping p n = ("io", "format") `lpar` "Ping started\n" `rpar` \_ -> 
  self $ \ps ->
  p `send` (Ping ps) $ \_ ->
  Process (ping' (n - 1)) where
    ping' n = \msg -> case msg of
      Pong  ->
        ("io", "format") `lpar` "PING: Pong!!!\n" `rpar` \_ -> 
        if (n <= 0)
          then p `send` Finished $ \_ ->
               ("io", "format") `lpar` "PING: Finished\n" `rpar` \_ ->
               exit (mkAtom "ok")
          else self $ \ps ->
               p `send` (Ping ps) $ \_ ->
               ("io", "format") `lpar` "PING: " ++ show n ++ " to go\n" `rpar` \_ ->
               Process $ ping' (n - 1)
      _ ->
        ("io", "format") `lpar` "PING: Wrong!!!\n" `rpar` \_ ->
        ("erlang", "exit") `lpar` p `comma` mkAtom "kill" `rpar` \_ ->
        exit (mkAtom "ok")
The
ping'
function has additional argument n which it passes with each request for the next message.

Note: The message receiving API described here does not provide any way to request a message with timeout. This is of course doable, but requires some extension of this API which may be taken care of in the future development.

[edit] 3.8.3 Sending messages

To send a message, an application calls:

pid `send` msg $ \_ ->
where
send
is a message sending primitive invoking Erlang's ! operator. Both pid and msg have to be properly typed: messages of type a may only be sent to a process identified with PID a.

[edit] 4 Examples

Sample code to demonstrate results of the experiment was checked into the Yhc Darcs repo. Haskell and Erlang sources as well as compiled BEAM files are located at http://darcs.haskell.org/yhc/src/translator/erlang/00proof/.

All Haskell code necessary for this experiment is in this file: http://darcs.haskell.org/yhc/src/translator/erlang/00proof/Test1.hs

Minimal Erlang runtime is in this file: http://darcs.haskell.org/yhc/src/translator/erlang/00proof/hserl.erl

In order to use these examples, download these two files:

  1. http://darcs.haskell.org/yhc/src/translator/erlang/00proof/hs_test1.beam
  2. http://darcs.haskell.org/yhc/src/translator/erlang/00proof/hserl.beam

in some directory, and start an Erlang shell (erl) there. Erlang version used in this experiment was R12B-1. There was a report received privately that some older versions of Erlang cannot execute these BEAM files.

See the README file for instructions how to run these examples.

[edit] 4.1 Factorial

This is a very simple factorial calculating function. Nothing special can be said about it.

fac :: Int -> Int
 
fac 0 = 1
fac n = n * fac (n - 1)

[edit] 4.2 Merging lists

The fuse function "fuses" two lists together, interleaving their elements until one of the two lists ends (if they are of different length), and then appending the remainder of the longer list to the result.

fuse :: [a] -> [a] -> [a]
 
fuse [] z = z
fuse z [] = z
fuse (x:xs) (y:ys) = x:y:(fuse xs ys)

[edit] 4.3 Ping-pong

This example shows how to work with processes. Its two pieces, Ping and Pong were shown earlier. Here is the main function which starts the processes:

main :: Int -> Atom
 
main nn = ("io", "format") `lpar` "Main: Starting ping-pong " ++ 
                                  show nn ++ " times\n" `rpar` \_ ->
          spawn pong $ \p ->
          spawn (ping p nn) $ \_ ->
          ("io", "format") `lpar` "Main: Finishing\n" `rpar` \_ -> 
          mkAtom "ok"

Note that the Ping process takes number of repetitions and Pong 's PID as parameters.

[edit] 5 Yhc core conversion program

A program was developed in the course of this experiment which given a linked Yhc Core file outputs to the standard output Core Erlang source suitable for compiling to the BEAM format with erlc. This program must be given at least one command line parameter: a path to the Yhc Core file to be converted to Core Erlang. Additional arguments will specify which functions are to be visible at the Erlang side. In absence of such arguments,
Prelude.main
is always assumed visible to Erlang. When specifying those Erlang-visible functions, names in Yhc Core should be provided rather than Haskell names. In Yhc Core, dot between module name and function name is replaced with semicolon; other dots are preserved. A function named
A.B.C.d
will be named
A.B.C;d
in Yhc core.

Refer to the Makefile used in this experiment for an example of running this program with 3 functions made visible to Erlang.

The Yhc Core conversion program is called y2e.

[edit] 6 Experiment on your own

If you wish to try compiling your own Haskell code and test it in Erlang environment, consider the following:

  1. Obtain the Yhc source: darcs get http://darcs.haskell.org/yhc
  2. Compile and build the Yhc itself: scons core=1 build && scons install
  3. Change to the directory within the Yhc source tree: cd src/translator/erlang/00proof
  4. Edit the Test1.hs file adding your own functions. It is not recommended to start a new file from scratch as there is no library infrastructure created for the Erlang backend yet while Test1.hs contains all necessary definitions and foreign imports
  5. Modify the Makefile adding names of your functions to the command line of y2e within the hs_test1.beam target
  6. Run make to rebuild the Core Erlang file (hs_test1.core).
  7. Start the Erlang shell: erl
  8. In the shell, using recommendations above how to call Haskell functions from Erlang, test your functions.

[edit] 7 Future directions

The following ideas may be considered for the future development of the Yhc Erlang backend:

  • Use
    foreign export
    declarations in Haskell source to mark functions visible to Erlang, and specify names these functions are desired to be visible under
  • Extend the receive messages API by wrapping the reference to the message processing function using some data constructor. This might be for example:
data MSGRQ a = Receive (Process a) | ReceiveTm Int (Process a)

A message-driven process might look like this:

--* This code was not tested *--
 
receive = Receive . Process
 
pong :: Process PingPong
 
pong = ("io", "format") `lpar` "Pong started\n" `rpar` \_ ->
  receive pong' where
    pong' = \msg ->
    ...
    receive pong'

[edit] 8 Developer's contact information

Please contact the developer at golubovsky at gmail dot com for any questions, suggestions, and concerns. Any feedback is appreciated.

[edit] 9 Just for fun()

If this thing takes off, the logo might look like this:

Hs beams.jpg