[Haskell-cafe] Tiger compiler: variable escaping analysis phase

Tue Jun 22 10:53:15 EDT 2010

On Tue, Jun 22, 2010 at 10:01:37AM -0300, Felipe Lessa wrote:
> On Tue, Jun 22, 2010 at 09:33:22AM -0300, José Romildo Malaquias wrote:
> > In the variable escaping analysis phase of the compiler, a function
> > findEscape looks for escaping variables and record this information in
> > the escape fields of the abstract syntax. To do this the entire abstract
> > syntax tree is traversed looking for escaping uses of every variable,
> > and, when found, the field is set to true.
> >
> > findEscape uses a symbol table to accomplish its work, binding variable
> > names to a reference to boolean (the same reference used in the abstract
> > syntax tree). When processing a variable declaraction, for instance, it
> > inserts the variable name into the symbol table, binding it to its
> > escaping field. When processing an expression which is a simple
> > variable, if the variable occurs in a nested function, its binding in
> > the symbol table is set to true. This reflects directly in the abstract
> > syntax tree of the variable declaration, as the escape field in the
> > variable declaration and the binding in the symbol table are the same
> > reference to bool.
> 
> Idea of pure algorithm:
> 
>  1) Update the symbol table itself, that is: instead of using
> 
>        (a) Map Symbol (IORef Bool)
> 
>     use
> 
>        (b) Map Symbol Bool.
> 
>     This doesn't change the complexity of the algorithm as
>     searching and updating have the same complexity for many data
>     structures (maps and hashtables, for example).
> 
>     In reality, you don't need to use a Map at all.  Just use
> 
>        (c) Set Symbol
> 
>     and symbols not in the set do not escape.  Using (a) gives
>     you O(n log k) for this step, where n is the size of the AST
>     and k is the number of symbols.  On the other hand, (c) gives
>     you O(n log e), where e is the number of escaping symbols.

These solutions would not work because they do not deal with scopes of
variables. In a Tiger program (as in most programming languges) a
variable name can be reused many times to refer to different variables,
and each variable can escape.

Maybe a set of pairs containing the variable name and its position in
the source code (available from the abstract syntax tree) would be a
good idea.

Then the code generator would traverse the abstract syntax tree and,
when needed, use this set to find out whether a variable escapes.

>  2) After doing the whole analysis, use a function
>     'addEscapeInfo' to process the whole pure AST again adding
>     information about escaped symbols.  This is O(n log e) as
>     well.

If the variable escaping analysis is done and its result is made
available in another data structure, there would be no need to add the
escaping information back to the abstract syntax tree.

> > The second option is more elegant in my point of view, but would be much
> > less efficient than the first one.
> 
> While O(n log e) is better than O(n log k), probably the
> constants in the pure algorithm are higher than their impure
> counterparts.  I guess you could also try to write:
> 
>   1) Take an 'AST e' into 'AST (STRef s Bool, e)' in O(n).
> 
>   2) Use the impure algorithm inside ST monad in O(n log k).
> 
>   3) Take 'AST (STRef s Bool, e)' into 'AST (Bool, e)' in O(n).
> 
>   4) 'runST' on the above steps to get a pure function from
>      'AST e' into 'AST (Bool, e)'.
> 
> The ST monad has the same runtime overhead as IO.  Steps 1) and
> 3) are used to hide the ST monad from the rest of the compiler.

Romildo