6.4. Grammar

The grammar section comes after the directives, separated from them by a double-percent (%%) symbol. This section contains a number of productions, each of which defines a single non-terminal. Each production has the following syntax:

<non-terminal> [ :: { <type> } ]
        :  <id> ... {[%] <expression> }
      [ |  <id> ... {[%] <expression> }
        ... ]

The first line gives the non-terminal to be defined by the production and optionally its type (type signatures for productions are discussed in Section 2.4, “Type Signatures”).

Each production has at least one, and possibly many right-hand sides. Each right-hand side consists of zero or more symbols (terminals or non-terminals) and a Haskell expression enclosed in braces.

The expression represents the semantic value of the non-terminal, and may refer to the semantic values of the symbols in the right-hand side using the meta-variables $1 ... $n. It is an error to refer to $i when i is larger than the number of symbols on the right hand side of the current rule. The symbol $ may be inserted literally in the Haskell expression using the sequence \$ (this isn't necessary inside a string or character literal).

Additionally, the sequence $> can be used to represent the value of the rightmost symbol.

A semantic value of the form {% ... } is a monadic action, and is only valid when the grammar file contains a %monad directive (Section 6.3.5, “Monad Directive”). Monadic actions are discussed in Section 2.5, “Monadic Parsers”.

Remember that all the expressions for a production must have the same type.

6.4.1. Parameterized Productions

Starting from version 1.17.1, Happy supports parameterized productions which provide a convenient notation for capturing recurring patterns in context free grammars. This gives the benefits of something similar to parsing combinators in the context of Happy grammars.

This functionality is best illustrated with an example:

opt(p)          : p                   { Just $1 }
                |                     { Nothing }

rev_list1(p)    : p                   { [$1] }
                | rev_list1(p) p      { $2 : $1 }

The first production, opt, is used for optional components of a grammar. It is just like p? in regular expressions or EBNF. The second production, rev_list1, is for parsing a list of 1 or more occurrences of p. Parameterized productions are just like ordinary productions, except that they have parameter in parenthesis after the production name. Multiple parameters should be separated by commas:

fst(p,q)        : p q                 { $1 }
snd(p,q)        : p q                 { $2 }
both(p,q)       : p q                 { ($1,$2) }

To use a parameterized production, we have to pass values for the parameters, as if we are calling a function. The parameters can be either terminals, non-terminals, or other instantiations of parameterized productions. Here are some examples:

list1(p)        : rev_list1(p)        { reverse $1 }
list(p)         : list1(p)            { $1 }
                |                     { [] }

The first production uses rev_list to define a production that behaves like p+, returning a list of elements in the same order as they occurred in the input. The second one, list is like p*.

Parameterized productions are implemented as a prepossessing pass in Happy: each instantiation of a production turns into a separate non-terminal, but are careful to avoid generating the same rule multiple times, as this would lead to an ambiguous grammar. Consider, for example, the following parameterized rule:

sep1(p,q)       : p list(snd(q,p))    { $1 : $2 }

The rules that would be generated for sep1(EXPR,SEP)

  : EXPR list(snd(SEP,EXPR))                { $1 : $2 }

  : list1(snd(SEP,EXPR))                    { $1 }
  |                                         { [] }

  : rev_list1(snd(SEP,EXPR))                { reverse $1 }

  : snd(SEP,EXPR))                          { [$1] }
  | rev_list1(snd(SEP,EXPR)) snd(SEP,EXPR)  { $2 : $1 }

  : SEP EXPR                                { $2 }

Note that this is just a normal grammar, with slightly strange names for the non-terminals.

A drawback of the current implementation is that it does not support type signatures for the parameterized productions, that depend on the types of the parameters. We plan to implement that in the future---the current workaround is to omit the type signatures for such rules.