Ryan Ingram ryani.spam at gmail.com
Tue Aug 14 12:52:50 CEST 2012

```On Tue, Aug 14, 2012 at 1:04 AM, Евгений Пермяков <permeakra at gmail.com>wrote:

>  Your idea looks _much_ better from code clarity point of view, but it's
> unclear to me, how to deal with it internally and in error messages. I'm
> not a compiler guy, though.
>

How to deal with it internally: It's pretty easy, actually.  The hardest
part is implementing an extensible partial order; once you have that and
you can use it to drive comparisons, parsing is not hard.

Basically, at each step when you read an operator token, you need to decide
to "shift", that is, put it onto a stack of operations, "reduce", that is,
apply the operator at the top of the stack (leaving the current token to
check again at the next step), or give a parse error.  The rules for
deciding which of those to do are pretty simple:

Given X, the operator at the top of the stack, and Y, the operator you just

(1) Compare the precedence of X and Y.  If they are incomparable, it's a
parse error.
(2) If Y is higher precedence than X, shift.
(3) If Y is lower precedence than X, reduce.

(At this point, we know X and Y have equal precedence)

(4) If X or Y is non-associative, it's a parse error.
(5) If X and Y don't have the same associativity, it's a parse error.

(At this point we know X and Y have the same associativity)

(6) If X and Y are left associative, reduce.
(7) Otherwise they are both right associative, shift.

So, for example, reading the expression

x * y + x + g w \$ z

The empty operator has lower precedence than anything else (that is, it
will never be reduced).  When you finish reading an expression, reduce
until the empty operator is the only thing on the stack and return its
expression.

* is higher precedence than empty, shift.  [empty x, * y]
+ is lower precedence than *, reduce. [empty (x*y)]
+ is higher precedence than empty, shift. [empty (x*y), + x]
+ is the same precedence as +, and is left associative, reduce.  [empty
((x*y)+x)]
+ is higher precedence than empty, shift [empty ((x*y)+x), + g]
function application is higher precedence than +, shift. [empty ((x*y)+x),
+ g, APP w]
\$ is lower precedence than function application, reduce. [empty ((x*y)+x),
+ (g w)]
\$ is lower precedence than +, reduce. [empty (((x*y)+x) + (g w))]
\$ is higher precedence than empty, shift. [empty (((x*y)+x) + (g w)), \$ z]
Done, but the stack isn't empty.  Reduce.  [empty ((((x*y)+x) + (g w)) \$ z)]
Done, and the stack is empty.
Result: ((((x*y)+x) + (g w)) \$ z)

Each operator is shifted exactly once and reduced exactly once, so this
algorithm runs in a number of steps linear in the expression size.
Parentheses start a new sub-stack when parsing the 'thing to apply the
operator to' part of the expression.

Something like this:

simple_exp :: Parser Exp
simple_exp =
(ExpId <\$> identifier) <|> (ExpLit <\$> literal) <|> (lparen *>
expression <* rparen)

expression :: Parser Exp
expression = do
first <- simple_exp
binops [ (Empty, first) ]

reduceAll [ (Empty, e) ] = return e
reduceAll ((op1, e1) : (op2, e2) : rest) = reduceAll ((op2, (ExpOperator
op1 e1 e2)) : rest)

binops :: Stack -> Parser Exp
binops s = handle_binop <|> handle_application <|> reduceAll s where
handle_binop = do
op <- operator
rhs <- simple_exp
reduce_until_shift op rhs s
handle_application = do
rhs <- simple_exp
reduce_until_shift FunctionApplication rhs s

reduce_until_shift implements the algorithm above until it eventually
shifts the operator onto the stack.
identifier parses an identifier, operator parses an operator, literal
parses a literal (like 3 or "hello")
lparen and rparen parse left and right parentheses.

I haven't considered how difficult it would be to expand this algorithm to
support unary or more-than-binary operators; I suspect it's not
ridiculously difficult, but I don't know.  Haskell's support for both of
those is pretty weak, however; even the lip service paid to unary - is a
source of many problems in parsing Haskell.

Worse, it does not allow to set up fixity relative to operator that is not
> in scope and it will create unnecessary intermodule dependencies.  One
> should fall back to numeric fixities for such cases, if it is needed.
>

You can get numeric fixity by just declaring precedence equal to some
prelude operator with the desired fixity; this will likely be the common
case.

I would expect modules to declare locally relative fixities between
operators imported from different modules if and only if it was relevant to
that module's implementation.  In most cases I expect the non-ordering to
directives; for example, even though (a == b == c) would be a parse error
due to == being non-associative, both ((a == b) == c) and (a == (b == c))
are not.  The same method of 'just add parentheses where you mean it' fixes
any parse error due to incomparable precedences.

-- ryan
-------------- next part --------------
An HTML attachment was scrubbed...
```