Difference between revisions of "Libraries and tools/HJS"

From HaskellWiki
Jump to navigation Jump to search
m (Making some keywords into links)
(Complete rewrite)
Line 1: Line 1:
  +
HJS - Haskell Javascript Parser and Interpreter
''Javascript - The World's Most Misunderstood Programming Language?'' [http://javascript.crockford.com/javascript.html]
 
   
  +
== Current Status - March 2007 ==
A Javascript [http://developer.mozilla.org/en/docs/About_JavaScript] parser using [[Happy]] and [[Alex]]. It will parse the grammer detailed in ECMA-262 with some additions from [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference] to support the parsing of specific test cases.
 
  +
Most program fragments can be parsed. This includes a Haskell in Javascript example, [[Yhc/Javascript]], and JS implemented in JS files, [http://lxr.mozilla.org/mozilla/source/js/narcissus/ Narcissus]
 
Possible applications of a Javascript interpreter:
 
* Provide basis for experimenting with monadic interpreters and the addition of things like [[software transactional memory|Software Transactional Memory]] to a 'mainstream language'
 
* Basis for other 'large' language parsers
 
* Embed Javascript in applications and application servers (such as [[HAppS]])
 
   
  +
A smaller set of program fragments can be run. This includes a [http://cfis.savagexi.com/articles/2006/08/25/new-and-improved-javascript-inheritance Javascript inheritance framework]
== Installing ==
 
  +
Comes as a basic [[Cabal]] package [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/hjs-0.1]. If you do change the parser or lexer file you will need to manually
 
  +
Working toward running the YCR2JS and Narcissus files. Current version is available on request.
run Alex and Happy '''*and*''' run FixHappy on the JavaScriptParser.hs file. This will add an 'action' argument
 
  +
to lexer call; this is a 'hack' to get parsing of regular expressions to work.
 
  +
Later: Review the parser code, inefficiences and poor Haskell constructs.
  +
  +
== Introduction ==
  +
HJS is based on the grammer and behaviour specified in [http://www.ecma-international.org/publications/standards/Ecma-262.htm ECMA-262, 3rd Edition] with additions and modifications from [http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference JavaScript 1.5].
  +
  +
Javascript is typically associated with the web dynamic pages and has been called [http://www.crockford.com/javascript/javascript.html The World's Most Misunderstood Programming Language]
 
 
== Todo ==
+
== Code ==
  +
With [http://web.cecs.pdx.edu/~mpj/pubs/modinterp.html Monad Transformers and Modular Interpreters] as the starting point.
* Extend scope of interpreter to all statements
 
* Include function definition and calls
 
* Include objects
 
* Include built in objects and functions
 
   
  +
=== Value ===
  +
This is the type that carries primitive values, objects (in the form of an ID), references and other internal data.
  +
  +
<code>
  +
<pre>
  +
type Value =
  +
-- Primitive values
  +
Either Int
  +
( Either String
  +
( Either Bool
  +
( Either Undefined
  +
( Either Null
  +
-- Internal values
  +
( Either ObjId -- Object ID
  +
( Either Ref -- Reference
  +
( Either [ObjId] -- List of object IDs
  +
( Either CallValue -- Code, used for the 'Call' property of Function object.
  +
( Either [String] -- List of argument names
  +
( Either BreakContinue
  +
() ))))))))))
  +
</pre>
  +
</code>
  +
  +
CallValue allows us to easily incorporate builtin functions along side Javascript functions.
  +
  +
<code><pre>
  +
data CallValue = CallJS [SourceElement] | CallBuiltIn (InterpM Value)
  +
</pre></code>
  +
  +
The class
  +
  +
<code><pre>
  +
class SubType sub sup where
  +
inj :: sub -> sup
  +
prj :: sup -> Maybe sub
  +
</pre></code>
  +
  +
allows us to inject into a Value type, and project out of it.
  +
  +
=== Monads ===
  +
  +
<code><pre>
  +
-- Context is a scope chain, variable object, this and current function
  +
type Ctx = ([ObjId], ObjId, ObjId, ObjId)
  +
  +
-- State is a context stack, object store and line and column of current statement being used.
  +
type JSState = ([Ctx], Map ObjId Object,(Int,Int))
  +
  +
-- Compose state and error. The order is important, if it where the other way round state would be lost.
  +
type InterpM = ErrorT Throwable (StateT JSState Identity)
  +
</pre></code>
  +
  +
=== Parser/Lexer ===
  +
  +
The lexer and parser are two seperate instances of a [[Parsec]] parser. The tokens which are returned by the Lexer are:
  +
  +
<code><pre>
  +
data Token
  +
=
  +
TokenWhite
  +
| TokenInt Int
  +
| TokenIdent String
  +
| TokenStringLit String
  +
| TokenNL
  +
| TokenRegex (String,String)
  +
| TokenEof
  +
| TokenRID String
  +
| TokenROP String
  +
deriving (Show,Eq)
  +
</pre></code>
  +
  +
The lexer returns a list of tuples (SourcePos,Token).
  +
  +
The actual parser constructs follow the grammer rules but are altered to get around left recursion and a whole swath of them ignored by using [[Parsec]]'s builin expression handling.
  +
  +
Statements are decorated with line and column numbers to aid locating of run time errors. As a statement is interpreted this information is added to the state for use in error reporting.
  +
  +
Two challenges were literal regular expressions and automatic semi-colon insertion.
  +
  +
The parser was previously written using [[Happy]] and [[Alex]] but I found the resulting Haskell file large, slow to compile and slow to run. Using [[Parsec]] gives me more control.
  +
  +
=== Object Framework ===
  +
Object is a record type with an ID and a map of properties:
  +
<code><pre>
  +
data Object = Object {
  +
idd :: ObjId,
  +
prototype :: Maybe ObjId,
  +
klass :: String,
  +
value :: Maybe Value,
  +
properties :: M.Map String (Value,[Attribute])
  +
} deriving Show
  +
</pre></code>
  +
  +
Properties on an object have a list of attributes which indicate if the property is to be included in enumeration, is writable etc. There are functions to get and set property values. The 'getProperty' function will traverse up the __proto__ chain looking for the property.
  +
  +
<code><pre>
  +
getProperty :: ObjId -> String -> InterpM Value
  +
putProperty :: ObjId -> String -> Value -> InterpM ()
  +
</pre></code>
  +
  +
=== Interpreter Class ===
  +
  +
All syntax constructs are instances of this class.
  +
  +
<code><pre>
  +
class InterpC t where
  +
interp :: t -> InterpM Value
  +
</pre></code>
  +
  +
Some examples are:
  +
<code><pre>
  +
  +
instance (InterpC t1, InterpC t2) => InterpC (Either t1 t2) where
  +
interp (Left x) = interp x
  +
interp (Right x) = interp x
  +
  +
interp (Assign left AssignNormal right) = do
  +
v <- interp right
  +
r <- interp left
  +
putValue r v
  +
return v
  +
  +
interp (DoWhile s e ) = do
  +
vv <- (interp s) `catchError` handleBreakContinue
  +
case (prj vv) of
  +
(Just Break) -> return (inj (0::Int))
  +
_ -> do
  +
b <- interp e >>= toRealBool
  +
case b of
  +
False -> return vv
  +
_ -> interp (DoWhile s e )
  +
</pre></code>
  +
''Note: Can the DoWhile case be simplified?''
  +
  +
=== Exceptions, Breaks, Continues and Returns ===
  +
These are handled by throwing and catching (in the Error monad) values of the type:
  +
  +
<code><pre>
  +
data Throwable = ThrowReturn Value
  +
| ThrowBreak (Maybe String)
  +
| ThrowContinue (Maybe String)
  +
| ThrowException Value
  +
| ThrowTypeError String
  +
| ThrowInternalError String
  +
deriving Show
  +
</pre></code>
  +
  +
For each of the above there is a specific handling function. If the value does not match the correct constructor then the error is re-thrown. For example
  +
  +
<code><pre>
  +
throwReturn v = throwError (ThrowReturn v)
  +
handleReturn (ThrowReturn v) = return v
  +
handleReturn s = throwError s
  +
</pre></code>
  +
  +
=== Type Conversion ===
  +
Type conversion within the Value type is done using the Convert class:
  +
  +
<code><pre>
  +
class Convert a where
  +
typeOf :: a -> [Int]
  +
typeOf _ = []
  +
toBoolean :: a -> InterpM Value
  +
toBoolean _ = return $ inj False
  +
toNumber :: a -> InterpM Value
  +
toNumber _ = inj (0::Int)
  +
toString :: a -> InterpM Value
  +
toString _ = return $ inj ""
  +
toPrimitive :: PrimHint -> a -> InterpM Value
  +
toPrimitive _ i = return $ undefinedValue
  +
</pre></code>
  +
For instance, toNumber will convert a Value containing something to one containing a number. These operate in the Monad
  +
as some will make calls to functions or get properties.
  +
  +
In order to get these to work with Value values, native Haskell types and Either are made instances of Convert
  +
  +
Conversion to native Haskell types is done using the functions
  +
  +
<code><pre>
  +
toRealInt :: Convert a => a -> InterpM Int
  +
toRealString :: Convert a => a -> InterpM String
  +
toRealBool :: Convert a => a -> InterpM Bool
  +
</pre></code>
  +
  +
 
[[Category:Applications]]
 
[[Category:Applications]]

Revision as of 20:12, 19 March 2007

HJS - Haskell Javascript Parser and Interpreter

Current Status - March 2007

Most program fragments can be parsed. This includes a Haskell in Javascript example, Yhc/Javascript, and JS implemented in JS files, Narcissus

A smaller set of program fragments can be run. This includes a Javascript inheritance framework

Working toward running the YCR2JS and Narcissus files. Current version is available on request.

Later: Review the parser code, inefficiences and poor Haskell constructs.

Introduction

HJS is based on the grammer and behaviour specified in ECMA-262, 3rd Edition with additions and modifications from JavaScript 1.5.

Javascript is typically associated with the web dynamic pages and has been called The World's Most Misunderstood Programming Language

Code

With Monad Transformers and Modular Interpreters as the starting point.

Value

This is the type that carries primitive values, objects (in the form of an ID), references and other internal data.

type Value =  
       -- Primitive values
              Either Int                       
            ( Either String 
            ( Either Bool                              
            ( Either Undefined 
            ( Either Null 
       -- Internal values
            ( Either ObjId                     -- Object ID
            ( Either Ref                       -- Reference
            ( Either [ObjId]                   -- List of object IDs
            ( Either CallValue                 -- Code, used for the 'Call' property of Function object. 
            ( Either [String]                  -- List of argument names
	    ( Either BreakContinue
            ()   ))))))))))

CallValue allows us to easily incorporate builtin functions along side Javascript functions.

data CallValue = CallJS [SourceElement] | CallBuiltIn (InterpM Value)

The class

class SubType sub sup where
    inj :: sub -> sup
    prj :: sup -> Maybe sub

allows us to inject into a Value type, and project out of it.

Monads

-- Context is a scope chain, variable object, this and current function
type Ctx     = ([ObjId], ObjId, ObjId, ObjId)

-- State is a context stack, object store and line and column of current statement being used.
type JSState = ([Ctx], Map ObjId Object,(Int,Int))

-- Compose state and error. The order is important, if it where the other way round state would be lost.
type InterpM = ErrorT Throwable (StateT JSState Identity)

Parser/Lexer

The lexer and parser are two seperate instances of a Parsec parser. The tokens which are returned by the Lexer are:

data Token
      = 
        TokenWhite  
      | TokenInt Int
      | TokenIdent String
      | TokenStringLit String
      | TokenNL
      | TokenRegex (String,String)
      | TokenEof
      | TokenRID String
      | TokenROP String
 deriving (Show,Eq)

The lexer returns a list of tuples (SourcePos,Token).

The actual parser constructs follow the grammer rules but are altered to get around left recursion and a whole swath of them ignored by using Parsec's builin expression handling.

Statements are decorated with line and column numbers to aid locating of run time errors. As a statement is interpreted this information is added to the state for use in error reporting.

Two challenges were literal regular expressions and automatic semi-colon insertion.

The parser was previously written using Happy and Alex but I found the resulting Haskell file large, slow to compile and slow to run. Using Parsec gives me more control.

Object Framework

Object is a record type with an ID and a map of properties:

data Object = Object {
                      idd :: ObjId,
		      prototype :: Maybe ObjId,
		      klass :: String,
		      value :: Maybe Value,
		      properties :: M.Map String (Value,[Attribute]) 
		     } deriving Show

Properties on an object have a list of attributes which indicate if the property is to be included in enumeration, is writable etc. There are functions to get and set property values. The 'getProperty' function will traverse up the __proto__ chain looking for the property.

getProperty :: ObjId -> String -> InterpM Value
putProperty :: ObjId -> String -> Value -> InterpM ()

Interpreter Class

All syntax constructs are instances of this class.

class InterpC t where
    interp  :: t -> InterpM Value

Some examples are:


instance (InterpC t1, InterpC t2) => InterpC (Either t1 t2) where
   interp (Left x) = interp x
   interp (Right x) = interp x

interp (Assign left AssignNormal right) = do 
		                                  v <- interp right
                                                  r <- interp left
                                                  putValue r v
                                                  return v

interp (DoWhile s e ) = do
                            vv <- (interp s) `catchError` handleBreakContinue 
                            case (prj vv) of
                                  (Just Break)  -> return (inj (0::Int))
                                  _   -> do 
                                            b <- interp e >>= toRealBool
                                            case b of
                                                  False -> return vv
                                                  _ -> interp (DoWhile s e )

Note: Can the DoWhile case be simplified?

Exceptions, Breaks, Continues and Returns

These are handled by throwing and catching (in the Error monad) values of the type:

data Throwable = ThrowReturn Value 
	       | ThrowBreak (Maybe String)
	       | ThrowContinue (Maybe String)
	       | ThrowException Value
	       | ThrowTypeError String
	       | ThrowInternalError String 
		 deriving Show

For each of the above there is a specific handling function. If the value does not match the correct constructor then the error is re-thrown. For example

throwReturn v = throwError (ThrowReturn v)
handleReturn (ThrowReturn v) = return v
handleReturn s = throwError s

Type Conversion

Type conversion within the Value type is done using the Convert class:

class Convert a where
   typeOf :: a -> [Int]
   typeOf _ = []
   toBoolean :: a -> InterpM Value
   toBoolean _ = return $ inj False
   toNumber :: a -> InterpM Value
   toNumber _ = inj (0::Int)
   toString :: a -> InterpM Value
   toString _ = return $ inj ""
   toPrimitive :: PrimHint -> a  -> InterpM Value
   toPrimitive  _ i = return $ undefinedValue

For instance, toNumber will convert a Value containing something to one containing a number. These operate in the Monad as some will make calls to functions or get properties.

In order to get these to work with Value values, native Haskell types and Either are made instances of Convert

Conversion to native Haskell types is done using the functions

toRealInt :: Convert a => a -> InterpM Int
toRealString :: Convert a => a -> InterpM String
toRealBool :: Convert a => a -> InterpM Bool