Difference between revisions of "STG in Javascript"

Latest revision as of 22:00, 15 December 2008

Note (Aug 27, 2007): This page was started about a year ago. Over time, the focus was changed to integration with Yhc Core, and the work in progress may be observed here: Yhc/Javascript.

Disclaimer: Here are my working notes related to an experiment to execute Haskell programs in a web browser. You may find them bizzarre, and even non-sensual. Don't hesitate to discuss them (please use the Talk:STG in Javascript page). Chances are, at some point a working implementation will be produced.

The Javascript Shell is of great help for this experiment.

Aug 22, 2006

Several people expressed interest in the matter, e. g.: [1], [2].

A Wiki page Hajax has been recently created, which summarizes the achievements in the related fields. By these experiments, I am trying to address the problem of Javascript generation out of a Haskell source.

To achieve this, an existing Haskell compiler, namely nhc98, is being patched to add a Javascript generation facility out of a STG tree: the original compiler generates bytecodes from the same source.

After (unsuccessful) trying several approaches (e. g. Javascript closures (see [3]), it has been decided to implement a STG machine (as described in [4]) in Javascript.

The abovereferenced paper describes how to implemement a STG machine in assembly language (or C). Javascript implementation uses the same ideas, but takes advantage of automatic memory management provided by the Javascript runtime, and also built-in handling of values more complex than just numbers and arrays of bytes.

To describe a thunk, a Javascript object of the following structure may be used:

thunk = {
  _c:function(){ ... },                 // code to evaluate a thunk
  _1:...,                               // argument 1
  _2:...,
  _N:...                                // argument n
};

So, similarly to what is described in the STG paper, the c method is used to evaluate a thunk. This method may also do self-update of the thunk, replacing itself (i. e. this.c) with something else, returning a result as it becomes known (i. e. in the very end of thunk evaluation).

Some interesting things may be done by manipulating prototypes of Javascript built-in classes.

Consider this (Javascript shell log pasted below):


Number.prototype.c=function(){return this};
function(){return this}
(1).c()
1
(2).c()
2
(-999).c()
-999
1
1
2
2
999
999

Thus, simple numeric values are given thunk behavior: by calling the c method on them, their value is returned as if a thunk were evaluated, and in the same time they may be used in a regular way, when passed to Javascript functions outside Haskell runtime (e. g. DOM manipulation functions).

Similar trick can be done on Strings and Arrays: for these, the c method will return a head value (i. e. String.charAt(0)) CONS'ed with the remainder of a String/Array.

Aug 23, 2006

First thing to do is to learn how to call primitives. In Javascript, primitives mostly cover built-in arithmetics and interface to the Math object. Primitives need all their arguments evaluated before they are called, and usually return strict values. So there is no need to build a thunk each time a primitive is called.

At the moment, the following Haskell code:

f :: Int -> Int -> Int

f a b = (a + b) * (a - b)

g = f 1 2

compiles into (part of the Javascript below was inserted manually):

var HMain = {m:"HMain"};

Number.prototype._c=function(){return this;};

// Compiled code starts

HMain.f_T=function(v164,v165){return {_c:HMain.f_C,
                                      _w:"9:1-9:24",
                                      _1:v164,
                                      _2:v165};};
HMain.f_C=function(){
return ((((this._1)._c())+((this._2)._c()))._c())*
        ((((this._1)._c())-((this._2)._c()))._c());
};

HMain.g_T=function(){return {_c:HMain.g_C,_w:"11:1-11:9"};};
HMain.g_C=function(){
return HMain.f_T(1,2); // NB should be HMain.f_T(1,2)._c()
};

// Compiler code ends

print(HMain.f_T(3,4)._c());

print(HMain.g_T()._c()._c());

When running, the script produces:

Running...
-7
-3

So, for each Haskell function, two Javascript functions are created: one creates a thunk when called with arguments (so it is good for saturated calls), another is the thunk's evaluation function. The latter will be passed around when dealing with partial applications (which will likely involve special sort of thunks, but we haven't got down to this as of yet).

Note that the _c() method is applied twice to the output from HMain.g_T: the function calls f_T which returns an unevaluated thunk, but this result is not used, so we need to force the evaluation to get the final result.

NB: indeed, the thunk evaluation function for HMain.g should evaluate the thunk created by HMain.f_T. Laziness will not be lost because HMain.g_C will not be executed until needed.

Sep 12, 2006

To simplify handling of partial function applications, format of thunk has been changed so that instead of _1, _2, etc. for function argument, an array named _a is used. This array always has at least one element which is undefined. Arguments start with array element indexed at 1, so to access an argument n, the following needs to be used: this._a[n].

For Haskell programs executing in a web browser environment, analogous to FFI is calling external Javascript functions. Imagine this Javascript function which prints its argument on the window status line:

// Output an integer value into the window status line

putStatus = function (i) {window.status = i; return i;};

To import such a function is a Haskell program, the following FFI declaration is to be used:

foreign import ccall "putStatus" putStatus :: Int -> Int

Note the type signature: of course it should be somewhat monadic, but for the moment, nothing has been done to support monads, so this signature is only good for testing purposes.

The current NHC98-based implementation compiles the above FFI declaration into this:

Test2.putStatus_T=function(_1){return {_c:Test2.putStatus_C, _w:"7:1-7:56", 
                                       _a:[undefined, _1]};};
Test2.putStatus_C=function(){
return (putStatus)((this._a[1])._c());
};

Note that like a primitive, a foreign function evaluates all its arguments before it starts executing.

A test page illustrating this can be found at:

http://www.golubovsky.org/repos/nhcjs/test2.html

When this page is loaded, the window status line should display "456" while the rest of the page remains blank. The Haskell source for this test page is:

http://www.golubovsky.org/repos/nhcjs/test2.hs

Sep 19, 2006

Initially, functions compiled from Haskell to Javascript were prepresented as members of objects (one object per Haskell module). Anticipating some complications with multilevel module hierarchy, and also with functions whose names contain special characters, it has been decided to pass every function identifier through the fixStr function: in nhc98 it replaces non-alphanumeric characters with their numeric code prefixed with an underscore. So a typical function definition looks like:

p3 :: Int -> Int -> Int -> Int
p3 a b c = (a + b) * c;

compiles into:

var Test3_46p3_T=function(v210, v211, v212){return {_c:Test3_46p3_C, 
                                                    _w:"15:1-15:22", 
                                                    _a:[undefined, 
                                                        v210, v211, v212]};};
var Test3_46p3_C=function(){
return (((((this._a[1])._c())+((this._a[2])._c()))._c())*
         ((this._a[3])._c()))._c();
};

Note the function name: Test3_46p3_T; in previous examples it would have been something like Test3.p3_T.

Partial function applications need a different thunk format. This kind of thunk holds the function to be applied to its arguments when the application will be saturated (number of arguments becomes equal to function arity), number of remaining arguments, and an array of arguments so far.

Thus, for a function:

w = p3 1

resulting Javascript is:

var Test3_46w_T=function(){return {_c:Test3_46w_C, _w:"17:1-17:8", 
                                   _a:[undefined]};};
var Test3_46w_C=function(){
return ({_c:function(){return this;}, _s:Test3_46p3_T, _x:2, _a:[1]})._c();
};

Such a thunk always evaluates to itself (_c()); it holds the function name in its _s member, number of remaining arguments in its _x member, and available arguments in its _a member, only in this case the array does not have undefined as its zeroth element.

An application of such a function (w) to additional arguments:

z = w 2 3

compiles into:

var Test3_46z_T=function(){return {_c:Test3_46z_C, _w:"23:1-23:9", 
                                   _a:[undefined]};};
var Test3_46z_C=function(){
return (HSRuntime_46doApply((Test3_46w_T())._c(), [2, 3]))._c();
};

So, when such an expression is being computed, a special Runtime support function is called, which obtains the partial application thunk via evaluation of its first argument (Test3_46w_T())._c()), and adds the arguments provided ([2, 3]) to the list of arguments available so far. If number of arguments becomes equal to the target function arity, normal function application thunk is returned, otherwise another partial application thunk is returned. The Runtime support function looks like this:

var HSRuntime_46doApply = function (thunk, targs){
  thunk._a = thunk._a.concat (targs);
  thunk._x = thunk._x - targs.length;
  if (thunk._x > 0) {
    return thunk;
  } else {
    return thunk._s.apply (null, thunk._a);
  }
};

Note the use of the apply method. It may be used also with functions that are not methods of some object. The first argument (this_arg) may be null or undefined as it will not be used by the function applied to the arguments.

NHC98 acts differently when a partial application is not defined as a separate function, but is part of another expression.

First, some Haskell definitions:

z :: Int -> Int

z = (3 +)

p :: Int -> Int -> Int

p = (+)

compile into:

var Test4_46z_T=function(){return {_c:Test4_46z_C, _w:"9:1-9:8", 
                                   _a:[undefined]};};
var Test4_46z_C=function(){
return ({_c:function(){return this;}, _s:LAMBDA181_T, _x:1, _a:[]})._c();
};

var LAMBDA181_T=function(v178){return {_c:LAMBDA181_C, _w:"9:8", 
                                       _a:[undefined, v178]};};
var LAMBDA181_C=function(){
return (((3)._c())+((this._a[1])._c()))._c();
};

var Test4_46p_T=function(){return {_c:Test4_46p_C, _w:"13:1-13:6", 
                                   _a:[undefined]};};
var Test4_46p_C=function(){
return ({_c:function(){return this;}, _s:LAMBDA182_T, _x:2, _a:[]})._c();
};

var LAMBDA182_T=function(v179, v180){return {_c:LAMBDA182_C, 
                                             _w:"13:6", 
                                             _a:[undefined, v179, v180]};};
var LAMBDA182_C=function(){
return (((this._a[1])._c())+((this._a[2])._c()))._c();
};

Now, when these functions (p, z) are used:

t4main = putStatus (z (p 6 8)) -- see above for putStatus

the generated Javascript is:

var Test4_46t4main_T=function(){return {_c:Test4_46t4main_C, 
                                        _w:"17:1-17:28", 
                                        _a:[undefined]};};
var Test4_46t4main_C=function(){
return (Test4_46putStatus_T(
          NHC_46Internal_46_95apply1_T(
            Test4_46z_T(), 
            NHC_46Internal_46_95apply2_T(
              Test4_46p_T(), 6, 8)
        )))._c();
};

For each application of p and z, an internal function NHC_46Internal_46_95applyN_T is called where N depends on the target function arity. In Javascript implementation, all these functions are indeed one function (because in Javascript it is possible to determine the number of arguments a function was called with, so no need in separate functions for each arity). The internal function extracts its first argument and evaluates it (by calling the _c() method), getting a partial application thunk. Then, the Runtime support function HSRuntime_46doApply is called with the thunk and arguments array:

var NHC_46Internal_46_95apply1_T = function() {return __apply__(arguments);};
var NHC_46Internal_46_95apply2_T = function() {return __apply__(arguments);};
...
var __apply__ = function (args) {
  var i, targs = new Array();
  var thunk = args[0]._c();
  for (i = 1; i < args.length; i++) {
    targs [i - 1] = args [i];
  }
  return HSRuntime_46doApply (thunk, targs);
};

Note by Dimitry: Just for clarity, Dimitry's part ends here, and Vir's part starts.

Aug 25, 2007

Here's my attempt. I'm going to implement Haskell to javascript compiller, based on STG machine. This appeared to be not so easy task, so I'd be happy to get some feedback.

This is an example translation of some Haskell functions to JavaScript, I'm trying to be descriptive, but if I'm not, please, ask me or write your suggestions. I'm not quite sure if this code is really correct.

// Example of Haskell to JavaScript translation
//
// PAP - Partial Application
// every object (heap object in STG) is called closure here
// closure and function are used interchangable here
//


////////////////////////////////////////////////////////////////
// Run-time system:

var closure; // current entered closure
var args; // arguments
var RCons; // Constructor tag, constructors set this tag to some value
var RVal; // Some returned value

Number.prototype.arity = 0;
Number.prototype.code = function () {
  RVal = closure;
  args = null;
  return null;
}

String.prototype.arity = 0;
String.prototype.code = function ()
{
  if (closure.length == 0) {
    args = null;
    closure = Nil;
    return apply;
  }

  args = new Array (2);
  args[0] = new Number (closure.charCodeAt (0));
  args[1] = closure.slice (1, closure.length);
  closure = Cons;
  return apply;
}

// mini enterpreter is used to implement tail calls
// to jump to some function, we don't call it, but
// return it's address instead
function save_continuation_and_run (function_to_run)
{
  while (function_to_run != null)
      function_to_run = function_to_run ();
}

// calling convention
// function is pointed by a [closure] global variable
// arguments are in [args] array
function apply ()
{
  var f = closure;
  var nargs = 0
  if (args != null)
      nargs = args.length;

  if (f.arity == nargs)
      return f.code;

  if (nargs == 0) {
      // we don't know what to do, so run a continuation
      return null;
  }
  // We CAN'T call a function, so we must build a PAP and call continuation!!!
  if (f.arity > nargs) {
     var supplied_args = args;
     args = null;
     var pap = {
       arity : f.arity - nargs,
       code : function () {
           var new_args = args;
           args = supplied_args
           supplied_args = null;

           // not working, type information is lost... :(
           //args.push (new_args);

           for (i = nargs; i < f.arity; i++)
               args[i] = new_args[i - nargs];
           new_args = null;
           closure = f;
           return apply;
       }
     }

     closure = pap;
     // we don't know what to do, so run a continuation
     return null;
  }

  // f.arity < nargs

  var remaining_args = args.slice (f.arity, nargs);
  args.length = f.arity;

  save_continuation_and_run (f.code)

  // closure now points to some new function, we'll try to call it
  args = remaining_args;
  return apply;
}

// Updates are called and used essentially as apply function
// updatable thunks pushes continuation and runs as usual
// when continuation activates it replaces the closure with the value
// after that it returns to the next continuation
function update()
{
    var f = closure;

    save_continuation_and_run (f.realcode);

    f.RCons = RCons;
    f.RVal = RVal;
    f.args = args;
    f.code = updated_code;
    f.realcode = null;
    return null;
}

function update_code ()
{
  RCons = closure.RCons;
  RVal = closure.RVal;
  args = closure.args;
  return null;
}

////////////////////////////////////////////////////////////////////
// Examples: STG -> JS
/* add = \a b -> case a of {a -> case b of {b -> primOp + a b}} */

add = {
  arity: 2,
  code: function () {
    var a = args[0];
    var b = args[1];
    closure = a;
    args = null;
    save_continuation_and_run (apply);
    var a = RVal;
    closure = b;
    args = null;
    save_continuation_and_run (apply);
    var b = RVal;
    RVal = a + b;
    args = null;
    return null;
  }
}


/*
  compose = \f g x ->
     let gx = g x
     in f gx
*/
compose = {
  arity: 2,
  code: function () {
     var f = args[0];
     var g = args[1];
     var x = args[2];
     var gx = {
       arity : 0,
       code : update,
       realcode : function () {
         closure = g;
         args = new Array (1);
         args[0] = x;
         return apply;
       }
     }
     args = new Array (1);
     closure = f;
     args[0] = gx;
     return apply;
  }
}

ConsTag = 3;
Cons = {
   arity : 2,
   code : function () {
     // This is tag to distinguish this constructor from Nil
     RCons = ConsTag;

     // We must return to continuation, arguments are returned in args array
     return null;
   }
}

NilTag = 2;
Nil = {
   arity : 0,
   code : function () {
     // This is tag to distinguish this constructor from Cons
     RCons = NilTag;

     // We must return to continuation
     return null;
   }
}

/*
    map = \f xs->
        case xs of {
            Cons x xs ->
              let fx = f x
              in let mapfxs = map f xs
                 in Cons fx mapfxs
            ; Nil -> Nil
        }
*/
map = {
    arity: 2,
    code : function () {
       var f = args[0];
       var xs = args[1];
       //push continuation and enter xs
       closure = xs;
       args = null;
       save_continuation_and_run (xs.code)
       switch (RCons) {
           case ConsTag:
             {
                var x = args[0];
                var xs = args[1];
                var fx = {
                  arity : 0,
                  code : update,
                  realcode : function () {
                     closure = f;
                     args = new Array(1);
                     args[0] = x;
                     return apply;
                  }
                }
                var mapfxs = {
                  arity : 0,
                  code : update,
                  realcode : function () {
                     closure = map;
                     args = new Array(2);
                     args[0] = f;
                     args[1] = xs;
                     return apply;
                  }
                }
                closure = cons;
                args = new Array(2);
                args[0] = fx;
                args[1] = mapfxs;
                return apply;
             }
             break;
           case NilTag:
             closure = Nil;
             args = null;
             return Nil.code;
             break;
       }
    }
}

inc3 = {
  arity: 0,
  code: function () {
    args = new Array (1);
    args[0] = 3;
    closure = add;
    return apply;
  }
}

Victor Nazarov

asviraspossible@gmail.com

Aug 29, 2007

Code from previous section was updated. Here are some tests I've used to debug this code:

args = null;
closure = 1013;
save_continuation_and_run (apply);

document.write (RVal + "<br />");

args = new Array(2);
args[0] = 7;
args[1] = 6;
closure = add;
save_continuation_and_run (apply);

document.write (RVal + "<br />");

args = new Array(1);
closure = inc3;
args[0] = new Number(21);
save_continuation_and_run (apply);

document.write (RVal + "<br />");

closure = "123";
args = null;
save_continuation_and_run (apply);

document.write (RCons + "<br />");

closure = args[1];
args = null;
save_continuation_and_run (apply);

document.write (RCons + "<br />");

closure = args[1];
args = null;
save_continuation_and_run (apply);

document.write (RCons + "<br />");

closure = args[1];
args = null;
save_continuation_and_run (apply);

document.write (RCons + "<br />");

The result of this test is the following:

1013 // Means that JS numbers work as closures using prototype trick
13 // Simple function calls are working
24 // Not so simple calls are working: PAP is properly build and used
3 // Cons - list constructor
3 // Cons - list constructor
3 // Cons - list constructor
2 // Nil - list constructor

Last 4 lines shows that javascript strings works properly using prototype trick. We can observe the structure of "123" object: Cons 1 (Cons 2 (Cons 3 Nil))

Sept 4, 2007

I've got some feedback from Edward Kmett. Edward claims that simple trampolining as used below is less efficient than Appels trampoline. Trampolining is a trick to simulate tail calls. Simon Peyton Jones used the same technic as I did in my code (and Dimitry did too). The technic is simple: return continuation to call it. Mini-interpreter is used for trampolining on the stack:

while (f=f())
    ;

It is efficient in Simon version because of GNU C compiler's tweaks (not portable so). But they are not available in JavaScript.

Using the same interpreter in JavaScript I have to return function to simulate tail call and call interpreter again to simulate normal call. This seems very inefficient and Edward claims it is.

The trick is the transformation of the program to Continuation Passing Style. We need no stack at all when using this transformation. So every call is tail call. We can get rid of interpreter and just call functions as usual in JavaScript. We can use counter to count stack frames (function enterings), and when we rich the limit, we don't call continuation directly, but register it as a callback on the timer event (and thus we flush the stack). So we use longer jumps on stack, and some peaple claims it's more efficient. Moreover we get some framework to introduce parallel threads. Stack jump become a quantum for the thread.

I'd like to thank Edward for his ideas, and explore them feather.

Apr 1, 2008

Not an april joke, just links: Dmitry Golubovsky's and YHC Team's Javascript backend and Haskell Web Toolkit: http://yhc06.blogspot.com/2008/03/yhcjavascript-backend.html

Victor Nazarov's GHC backend: http://vir.mskhug.ru/