# Writing efficient free variable traversals

#### bgamari - 2019-07-28

Imagine that I show you a fragment of a Haskell program:

```
let x = y + 1
in x
```

and pose the question: “what are the variables in this program?” You would
likely respond `x`

and `y`

. Now I ask you: what is the difference between those
two variables?

One answer is that `x`

is *bound* by the fragment whereas `y`

comes from the
fragment’s surrounding context. Variables like `y`

are known as *free
variables*.

When compiling a language like Haskell you often find yourself needing to enumerate the free variables of types and expressions. Consequently, GHC’s free variable traversals are some of the most heavily used pieces of code in the compiler and we try hard to ensure that they are efficient.

How does one write an efficient free variable traversal? This is a question we will try to answer in this post.

## A naïve solution

For the sake of concreteness, let’s consider a toy expression language:

```
data Var = Var Int
data Expr = EVar Var -- a variable expression
| ELam Var Expr -- lambda abstraction
| EApp Expr Expr -- function application
| ELit Integer -- an integer literal
```

Since we are talking about *sets* of variables, we need a way to represent such
things (in GHC this is implemented using `containers`

’ `IntMap`

, but anything
with this interface will do):

```
data VarSet
instance Monoid VarSet
instance Semigroup VarSet
insertVarSet :: Var -> VarSet -> VarSet
delVarSet :: Var -> VarSet -> VarSet
unitVarSet :: Var -> VarSet
```

With this groundwork in place, our first attempt at writing a free variable traversal for such a language might look like this:

```
-- A naïve free variable traversal.
freeVars :: Expr -> VarSet
EVar v) = unitVarSet v
freeVars (ELam v e) = delVarSet (freeVars e)
freeVars (EApp fun arg) = freeVars fun <> freeVars arg
freeVars (ELit _) = mempty freeVars (
```

Loading this into GHCi we see that this indeed gives us the right answer for a simple example program:

```
>>> f:x:y:z:_ = map (EVar . Var) [0..]
-- \x -> (\y -> f x) (f z)
>>> e = ELam x ((ELam y (EApp f x)) `EApp` (EApp f z))
>>> freeVars e
VarSet (fromList [EVar (Var 0), EVar (Var 3)])
```

However, this traversal is far from efficient:

We build

`VarSet`

s containing variables for which we have already encountered a binding site. For instance, when computing`freeVars ELam y (EApp x y)`

we will build the full free variable set of the subexpression`EApp x y`

, then perform yet*another*allocation to remove the variable that we bind (e.g.`y`

). It would be better to avoid adding`y`

to our result in the first place.the

`EApp`

case is not tail recursive, resulting in unnecessary stack and heap allocation. To see this we look at the (cleaned-up) STG produced for the RHS of`freeVar`

‘s’`EApp`

case (compiled with`-O0`

), recalling that in STG`let`

bindings correspond directly to thunk allocation:`= \expr -> freeVars case expr of EApp fun arg -> let x = freeVars fun -- allocate a thunk for the free vars of fun = freeVars arg -- allocate a thunk for the free vars of arg y in (<>) x y -- call (<>) with the two thunks ...`

Note that GHC, when invoked with

`-O1`

, will eliminate these thunk allocations as it knows that`(<>)`

is strict in its arguments. This will produce slight less allocation-heavy STG (noting that`case`

in STG corresponds to forcing an expression):`= \expr -> freeVars case expr of EApp fun arg -> case freeVars fun of -> fun_fvs case freeVars arg of -> arg_fvs <>) fun_fvs arg_fvs (...`

Now we have two nested

`case`

analyses, each of which will incur a stack frame push and pop. This cost likely won’t be terrible, but we can nevertheless do better.

## Improving upon our attempt

We can address the issues noted above with a two-pronged plan of attack:

Carry a set of bound variables downward with us as we traverse the expression tree. Consequently, when we encounter a

`EVar`

we can first ask whether we have seen a binding site for that variable*before*committing to allocate a new`VarSet`

.Move to an

*accumulator style*, refactoring`freeVars`

to take a partially-built set of free-variables we have seen thusfar as an argument. This allows us to define`freeVars`

tail recursively.

An implementation of these optimizations might look like:

```
freeVars' :: VarSet -- ^ bound variable set
-> Expr -- ^ the expresson we want the free variables of
-> VarSet -- ^ the accumulator
-> VarSet -- ^ the final free variable set
EVar v) acc
freeVars' boundVars (| memberVarSet v boundVars
= acc -- we have seen the binding site for the variable, not free
| otherwise
= insertVarSet v acc
ELam v e) acc
freeVars' boundVars (= let -- we are binding 'v' so add it to our bound variable set
= insertVarSet v boundVars
boundVars' in freeVars' boundVars' e acc
EApp fun arg) acc
freeVars' boundVars (= freeVars' boundVars fun $ freeVars' boundVars arg acc
ELit _) acc
freeVars' boundVars (= acc
freeVars :: Expr -> VarSet
= freeVars' mempty e mempty freeVars e
```

While a bit more verbose this gives much nicer performance characteristics:

we have eliminated the allocation in the common case that we encounter a variable bound in the fragment we are traversing

computing the

`EApp`

case now allocates only one return frame rather than two:`case freeVars' boundVars arg acc of -> arg_vars freeVars' boundVars fun acc`

### Aside: Avoiding redundant inserts

An additional optimization that can also be useful is to check whether we have
already added a variable to our accumulator *before* inserting:

```
EVar v) acc
freeVars' boundVars (| memberVarSet v boundVars
= acc -- we have seen the binding site for the variable, not free
| memberVarSet v acc
= acc -- the variable is free but has already been added to the accumulator
| otherwise
= insertVarSet v acc
```

In `VarSet`

representations like the `IntMap`

used by GHC this can alleviate
pressure on the garbage collector. Given that GC time is often considerable
in a heavily-allocating program like GHC this is likely a worthwhile
optimization (although I have not yet measured its effect).

However, as it is not critical to the current discussion, we will ignore it for the remainder of the post.

## Cleaning up the implementation

It is nice that this new implementation is fast, but wouldn’t it be great if we
could recover the *readability* of our naïve approach? To do so we might be
tempted to define a type and some combinators to capture the patterns seen in
`freeVars'`

above:

```
-- | A computation to build a free variable set.
newtype FV = FV { runFV :: VarSet -- bound variable set
-> VarSet -- the accumulator
-> VarSet -- the result
}
instance Monoid FV where
-- The empty FV just passes the accumulator along unperturbed
mempty = FV $ \_ acc -> acc
instance Semigroup FV where
-- Unioning two FVs passes the accumulator produced by one to the other
<> fv2 = FV $ \boundVars acc ->
fv1
runFV fv1 boundVars (runFV fv2 boundVars acc)
-- | Consider a variable to be bound in the given 'FV' computation.
bindVar :: Var -> FV -> FV
= FV $ \boundVars acc ->
bindVar v fv
runFV fv (insertVarSet v boundVars) acc
-- | Take note of a variable reference.
unitFV :: Var -> FV
= FV $ \boundVars acc ->
unitFV v if memberVarSet v boundVars
then acc -- variable is known to be bound, ignore
else insertVarSet v acc -- variable is free, insert into accumulator
fvToVarSet :: FV -> VarSet
= runFV fv mempty mempty fvToVarSet fv
```

With this groundwork in place `freeVars`

becomes a simple declarative
description of how to walk the `Expr`

AST:

```
exprFV :: Expr -> FV
EVar v) = unitFV v
exprFV (ELam v e) = bindVar v (exprFV e)
exprFV (EApp fun arg) = exprFV fun <> exprFV arg
exprFV (ELit _) = mempty
exprFV (
= fvToVarSet . exprFV freeVars
```

## Improving code re-use

GHC has used a close relative of the `FV`

type seen above for its free variable
traversals on its Core ASTs for a few years now. However, over the years the
varying needs of different parts of GHC have caused the codebase to sprout
several additional traverals:

- free variable collection that doesn’t guarantee determinism across compiler runs (as GHC uses non-deterministic unique numbers to identify variables)
- free variable collection which only returns local variables
- free variable collection which only returns coercion variables (a variety of variable beyond the scope of the present discussion; see the System FC paper for details)
- traversal that only returns a boolean reflecting whether an expression has any free variables

Moreover, in GHC each of these must be implemented on each of the three ASTs
which comprise Core programs: `Expr`

, `Type`

, and `Coercion`

.

Needless to say, this results in a significant amount of repetition. Is it
possible to abstract over the details of each travesal strategy and share the
bare skeleton of `exprFV`

*and* get preserve our hard-fought performance
improvements? We would be poor functional programmers if we answered “no”.
Let’s see how.

## Abstracting the traversal strategy

To improve code re-use we want to separate the AST *traversal* (that is, how we
walk over the various parts of the AST) from what we call the *strategy* used
to build the free variable set. To guide our abstraction we can start by
looking at the operations used by the `exprFV`

traversal above:

`unitFV`

was used to declare the occurrence of a variable (free or otherwise)`bindVar`

was used to declare a variable binding site`Monoid`

was used to provide an empty free variable set`Semigroup`

was used to union free variable sets

Following a final tagless style, we write down a typeclass capturing all of
these operations (N.B. recall that `Semigroup`

is implied by `Monoid`

):

```
class Monoid a => FreeVarStrategy a where
unitFV :: Var -> a
bindVar :: Var -> a -> a
```

We can now define our traversals in terms of this class (note that only the
type signature has changed from the `exprFV`

seen above; how nice!):

```
exprFV :: FreeVarStrategy a => Expr -> a
EVar v) = unitFV v
exprFV (ELam v e) = bindVar v (exprFV e)
exprFV (EApp fun arg) = exprFV fun <> exprFV arg
exprFV (ELit _) = mempty exprFV (
```

Finally we can write some strategies implementing this interface. To start, the naïve implementation from the beginning of this post can be captured as:

```
newtype Naive = Naive { runNaive :: VarSet }
instance Monoid Naive where
mempty = Naive mempty
instance Semigroup Naive where
Naive a <> Naive b = Naive (a <> b)
instance FreeVarStrategy Naive where
= Naive . unitVarSet
unitFV Naive xs) = Naive (delVarSet v xs) bindVar v (
```

Our optimized implementation is ported with a bit more work:

```
newtype FV = FV { runFV :: VarSet -- bound variable set
-> VarSet -- the accumulator
-> VarSet -- the result
}
instance Monoid FV where
mempty = FV $ \_ acc -> acc
instance Semigroup FV where
<> fv2 = FV $ \boundVars acc ->
fv1
runFV fv1 boundVars (runFV fv2 boundVars acc)
instance FreeVarStrategy FV where
= FV $ \boundVars acc ->
unitFV v if memberVarSet v boundVars
then acc
else insertVarSet v acc
= FV $ \boundVars acc ->
bindVar v fv
runFV fv (insertVarSet v boundVars) acc
fvToVarSet :: FV -> VarSet
= runFV fv mempty mempty fvToVarSet fv
```

Further, we can implement a strategy which determines whether an
expression has any free variables without writing any code looking at `Expr`

(using the same bound-variable set optimized as `FV`

):

```
newtype NoFreeVars = NoFreeVars { runNoFreeVars :: VarSet -- bound variable set
-> Bool -- True is free variable set is empty
}
instance Monoid NoFreeVars where
mempty = NoFreeVars $ const True
instance Semigroup NoFreeVars where
NoFreeVars f <> NoFreeVars g = NoFreeVars $ \boundVars ->
&& g boundVars
f boundVars
instance FreeVarStrategy NoFreeVars where
= NoFreeVars $ \boundVars ->
unitFV v
memberVarSet v boundVarsNoFreeVars f) = NoFreeVars $ \boundVars ->
bindVar tv ($ insertVarSet v boundVars
f
noFreeVars :: NoFreeVars -> Bool
NoFreeVars f) = f mempty noFreeVars (
```

Finally, to guarantee that this abstration carries no cost, we can explicitly
ask GHC to specialise our `exprFV`

traversal for these strategies:

```
{-# SPECIALISE exprFV :: Expr -> Naive #-}
{-# SPECIALISE exprFV :: Expr -> FV #-}
{-# SPECIALISE exprFV :: Expr -> NoFreeVars #-}
```

Looking at `-ddump-stg`

output we can see that GHC has done a wonderful job in
optimising our abstration away.

## The problem of mutual recursion

While this has all worked out swimmingly so far, things can go awry with more complex ASTs (such as Core). In particular, ASTs where the traversal is part of a mutually recursive group can break the nice performance characteristics that we have seen thusfar.

To see how, let’s augment our expression AST with some (admittedly rather peculiar) dependent type system.

```
data Expr = EVar Var Type -- a variable expression and its type
| ELam Var Expr -- lambda abstraction
| EApp Expr Expr -- function application
| ELit Integer -- an integer literal
data Type = ... -- these constructors are irrelevant
| TExpr Expr
```

Note here that the point here is not the type system itself but rather the fact
that our traversals now must be mutually recursive functions For instance, in
GHC Core this mutual recursion happens between the `Type`

and `Coercion`

types.

Writing traversals for this AST is straightforward:

```
exprFV :: FreeVarStrategy a => Expr -> a
EVar v ty) = unitFV v <> typeFV ty
exprFV (ELam v e) = bindVar v (exprFV e)
exprFV (EApp fun arg) = exprFV fun <> exprFV arg
exprFV (ELit _) = mempty
exprFV (
typeFV :: FreeVarStrategy a => Type -> a
TExpr e) = exprFV e
typeFV (-- omitted: traverse other constructors of Type
```

However, if we look at the STG generated for `NoFreeVars`

we see that things
have gone terribly wrong (again recalling that in STG `let`

corresponds to heap
allocation):

```
exprFV :: Expr -> NoFreeVars
= \expr ->
exprFV case expr of
EApp fun arg ->
let fun_fvs :: NoFreeVars
= exprFV fun
fun_fvs in
let arg_fvs :: NoFreeVars
= exprFV arg
arg_fvs in
let r :: NoFreeVars
= \boundVars ->
r case fun_fvs boundVars of
False -> False
True -> arg_fvs boundVars
in r
...
```

Notice that GHC has now allocated a thunk for each application of `exprFV`

(and
`typeFV`

, although this is not seen in the above excerpt). As we will see
below, this is a result of the fact that GHC’s arity analysis does not
attempt to handle mutual recursion (see `Note [Arity analysis]`

in
`CoreArity.hs`

):

In the case of our untyped `Expr`

AST the simplifier was able to see that
`exprFV`

was of arity 2 (as we look through casts) and consequently
eta-expanded its right-hand side to yield the following Core after inlining
(ignoring casts):

```
= \expr boundVars ->
exprFV case expr of
EApp fun arg ->
case fun_fvs boundVars of
False -> False
True -> arg_fvs boundVars
```

By contrast, in the typed AST arity analysis conservatively concludes that
`exprFV`

is arity 1 and consequently this eta expansion does not happen.
Consequently `(<>)`

will be inlined and have its arguments `let`

-bound (which
GHC does to ensure inlining does not compromise sharing).

There are two ways via which this can be remedied:

*Manually eta expand*. This is what GHC’s free variable traversal did for many years. Unfortunately this means giving up on the ability to`exprFV`

and the like`newtype`

`FV`

.*Convince GHC to push*in to the lambda in`fun_fvs`

and`arg_fvs`

`r`

. This is what the rest of this post will be about.

## The magical `oneShot`

primitive

GHC calls the transform of pushing a `let`

binding down an AST tree *floating
in*. In general GHC will not float a `let`

binding into a lambda for
good reason: doing so may reduce sharing. For instance, if we have:

```
= do
main let nthPrime :: Integer -> Integer
=
nthPrime let primes :: [Integer]
= {- insert favorite method for enumerating primes here -}
primes in \n -> primes !! n
print (nthPrime 10)
print (nthPrime 100)
```

We must ensure that `primes`

is only evaluated once, not once for each
application. Consequently GHC will not float a `let`

binding into a
lambda unless it can prove that the lambda is applied at most once (such a
lambda is said to be a *one-shot* lambda).

In the case of our free variable traversal, we indeed expect each `NoFreeVars`

value to be applied to precisely one bound-variable set (which, recall, is the
argument to the function inside a `NoFreeVars`

). Happily, GHC gives us the
means to communicate this expectation via the `GHC.Magic.oneShot`

combinator:

```
oneShot :: (a -> b) -> a -> b
= {- there be dragons here -} oneShot
```

Applying this primitive to a lambda expression will cause GHC to consider that lambda to be one-shot, even if it can’t prove this is the case.

Applying this to the `(<>)`

definition of `NoFreeVars`

:

```
instance Semigroup NoFreeVars where
NoFreeVars f <> NoFreeVars g =
NoFreeVars $ oneShot $ \boundVars -> -- <=== oneShot here
&& g boundVars f boundVars
```

This one-line change allows the simplifier to push the `let`

bindings seen in
the `EApp`

case of `exprFV`

into the lambda in `r`

as we expected, recovering
the nice efficient code that we saw in the untyped AST traversal.

We can use this same trick to fix the `FV`

strategy. However, note that
`oneShot`

only affects the lambda binder that it is immediately applied to.
Consequently, in the case of `FV`

the incantation is a bit more verbose as we
must break what was previously a single syntactic lambda into several lambdas,
punctuated with `oneShot`

s:

```
instance Semigroup FV where
<> fv2 = FV $ oneShot $ \boundVars -> oneShot $ \acc ->
fv1 runFV fv1 boundVars (runFV fv2 boundVars acc)
```

## Summary

In this post we have examined a few ways of writing free variable
travesals in Haskell. We started with a simple yet inefficient implementation
and then used some simple tricks to improve the efficiency of the produced code
at the expense of verbosity. We then abstracted away this verbosity
and in so doing allowed for easy implementation of more specialised traversal
strategies such as `NoFreeVars`

.

Finally, we saw how a short-coming of GHC’s simplifier can cause poor code
generation for mutually-recursive traversals and described GHC’s `oneShot`

primitive. This allowed us to inform GHC of an assumption made by our
abstraction, enabling GHC to produce the code we would write by hand
from our simple, orthogonal specification of an AST traversal and free
variable strategy.