New directions for Template Haskell

simonpj - 2010-10-18

Nota bene: TemplateHaskell/BlogPostChanges is a copy of this blog post, but with subsequent edits and improvements. Don’t pay too much attention to the text below; I’m keeping it only so that you can see the context for comments.

This post explores a set of design proposals for Template Haskell. They are inspired by discussion with Tim Sheard, Kathleen Fisher, and Jacques Carette. It was originally triggered by several Template Haskell tickets: including #4230, #4135, #4128, #4170, #4125, #4124, #4364, #6062, #6089. (See also #7016, which work better with the suggestions below.) Taken together, these proposals would make quite a big change to TH, I think for the better. Happily, I’m pretty sure they are relatively easy to implement.

There’s an interesting critique of Template Haskell on !StackOverflow, some (but not all) of which is addressed by proposals here.

But I’d like to get the design right; hence this post. I’m going to treat it as a working draft, and modify it in the light of comments. So please comment.

I’m going to assume that you are more or less familiar with Template Haskell. If not, there’s lots of background on the Template Haskell page. It’s a Wiki so you can help to improve it.

Here’s the implementation page the describes how we are going to implement this stuff.

Some brief background

After parsing, GHC runs two completely separate passes, one after the other: * The renamer resolves scopes, fixing precisely which binding site is connected which occurrence of every variable. For example, you write

let x = \\x -> x+1 in x

and the renamer changes it to

let x_77 = \\x_78 -> x_78 + 1 in x_77

The renamer also performs depdenency analysis, which sorts bindings (both value declarations and type declarations) into the smallest possible mutually recursive groups. This prior sorting is required to maximise polymorphism in mutually recursive value bindings.

The typechecker works on the renamed program, and typechecks it.

At the moment these two passes relate to Template Haskell as follows: * Quasi-quotes are run in the renamer. Why? Because quasi-quotes can expand to patterns. Consider this, which has a quasi-quoted pattern:

\\x -> \\ [pads| blah |] -> x+1

Is the “x” in “x+1” bound by the outer \\x or by the ‘x’ that might be bought into scope by the [pads| blah |] quasi-quote? The only way to know is to run the quasi-quote, so that’s what happens.

All other Template Haskell stuff is run in the typechecker. Why? Because we try to typecheck quotations before feeding them into TH functions. More on this below.

# The main issue
# A proposal
TH currently embodies an uneasy compromise between being too strongly typed and too weakly typed. So my proposal is to move TH in both directions at once: * Part A: Move the existing structures towards the expressive but weakly-typed end. * Part B: Add new MetaML-style constructs for strongly-typed metaprogramming. * Part C: Clean up and improve reification * Part D: Quasi-quotation improvements

Part A: Reduce typechecking for quotes

On the “make-it-weaker” front, here’s what I propose:

Cease typechecking TH quotes altogether. Instead, to use GHC’s terminology, we would rename a quote, but not typecheck it. The renaming pass ensures that the scope hygiene mechanisms would remain unchanged. By not attempting to typecheck we avoid all the tricky problems sketched above.
Add pattern splices and local declaration splices, as requested in #1476. For example

-- mkPat :: Q Pat -> Q Pat
f $(mkPat [p| x |]) = x+1

-- mkDecl :: Int -> Q [Dec]
g x = let $(mkDecl 3) in ...

These are not supported today because they bring new variables into scope, and hence are incompatible with running splices only after the renamer is finished; see Notes on Template Haskell, section 8.

Run TH splices in the renamer, uniformly with quasi-quotes. Of course, we must still typecheck the code we are about to run. But there’s an existing TH restriction that code run in top-level splices must be imported. So we can typecheck this code even during renaming, because it can only mention imported (and hence already fully-typechecked) code.

This solves #4364 because we run the splice in the renamer, so things are sorted out by the time we are checking for cycles (in the type checker).

Allow quoted names as patterns as requested by Conal Eliott. This is just a variation on allowing splices in patterns, since a quoted name 'x is really just a simple splice

These changes would essentially implement the desires of those who say “I just want to generate syntax trees”. All the mentioned bug reports would be fixed. The big loss is that quotes would not be typechecked at all.

Lexical scoping

Consider these definitions:

g :: Int -> Q Pat

y :: Int
y = 7

f :: Int -> Q Exp
f n = [| \\ $(g n) -> y+1 |]

Where is the ‘y’ bound in the RHS of f?
* Perhaps by the y = 7 that is in scope at the definition of f? * Perhaps by the pattern that $(g n) expands to?
* Perhaps by a ‘y’ that is in scope at the splice site of f? * Does it depend on whether $(g n) in fact binds ‘y’? A major point about TH is that we get lexical scoping (also called “hygienic”). So, to me it seems the the first of these choices is the only reasonable one. If you want the second you can instead use explicit dynamic binding by saying

f n = [| \\ $(g n) -> $(return (VarE (mkName "n"))) + 1 |]

So the rule would be: * In a quote, a variable ‘v’ is bound by the lexically enclosing binding of ‘v’, ignoring all pattern and declaration splices. To be consistent this should apply to top level splices too.

A variation (probably not)

A possible, rather ad hoc, variation would be to still typecheck quotes that are (a) top level, and (b) expression quotes. For example, we might still reject this:

f x = [| $x + (True 'c') |]

because the quote is obviously ill-typed. Only quotes nested inside top-level splices would avoid the type checker (because if the splice is run in the renamer, we can’t typecheck the nexted quote). For example:

$(f [| True 'c' |])

This splice would run in the renamer, and only the result of the splice would be typechecked. But what about this?

f t = [| let g :: $t-> Int; g = e in .... |]

This is still very awkward to typecheck. After all, if $t expands to a polymorphic type, the result of the splice might typecheck, but it’s really impossible to typecheck without knowing the signature. Maybe we should just give up if there’s a type splice? The only really simple thing is not to typecheck quotes at all.

## Part B: Add MetaML-style typed quotes
## Part C: Reification and typechecking
The third part of this proposal concerns reification. The Q monad gives you access to the typechecker’s environment. Notably, Template Haskell provides the function
\| ClassI – A class Dec – Declaration of the class [ClassInstance]- The instances of that class
…etc… ```
### What reify sees
A dark corner of `reify` is this: what types does `reify` see? Consider
It seems to me that the only decent, predictable story is to say that `reify` can only consult the top-level type environment. More specifically, Template Haskell processes a program top-down:
So the proposal is as follows. A declaration group is the chunk of declarations created by a top-level declaration splice, plus those following it, down to but not includig the next top-level declaration splice. Then the type environment seen by `reify` includes all the declaration up to the end of the immediately preceding declaration block, but no more.
So, in the preceding example: * A `reify` inside the splice `$(th1 ..)` would see the definition of `f`. * A `reify` inside the splice `$(blah)` woudl see the definition of `f`, but would not see any bindings created by `$(th1...)`. * A `reify` inside the splice `$(th2..)` would see the definition of `f`, all the bindings created by `$(th1..)`, and teh definition of `h`. * A `reify` inside the splice `$(blah2)` would see the same definitions as the splice `$(th2...)`.
This would mean that you could not say
### Reifying expressions
But what about expressions? It would be useful (stand up Kathleen) to have a more elaborate reify, like this:
You might ask whether we can typeckeck an expression; remember, these `Q ty` things are going to be run in the renamer. But if the type environment is that in force just before the last top-level splice, then all is well: that stuff has been fully typechecked.

Part D: quasiquotation

This part is unrelated to the preceding proposals, and is responding to #4372 and #2041.

For #2041, rather than the proposal made there, I think the nicest thing is for Language.Haskell.TH to expose a Haskell quasiquoter:

parseHaskell :: QuasiQuoter

Remember that a QuasiQuoter is a quadruple of parsers:

data QuasiQuoter = QuasiQuoter { quoteExp  :: String -> Q Exp,
                                 quotePat  :: String -> Q Pat,
                                 quoteType :: String -> Q Type,
                                 quoteDec  :: String -> Q [Dec] }

If TH provided such parsers, you could use them to parse antiquotes. That seems better to than having strings in the TH syntax.

See #4430 for an excellent point about fixities.

For #4372, I’m a bit agnostic. There is no technical issue here; it’s just about syntax. Read the notes on the ticket.
See #4429 for a suggestion about reifying Names.

Part E: Other minor issues

This section collects other TH changes that I think should be done.

The InfixE construtor of Syntax.Exp should only allow a Var in the operator position. See Trac #4877