Haskell Proposal: Syntactic Sugar for Arrows

Ross Paterson, 10th September 1999 (revised 22nd May 2000).

John Hughes has defined a new abstract view of computation, in his paper Generalising Monads to Arrows (old draft). That paper uses a point-free style. This proposal introduces a notation for arrows that resembles that used for monads. The semantics is defined by translation into the core language. Familiarity with the arrows paper is assumed. Comments welcome.

New: A prototype implementation exists.

This proposal supersedes the initial version, which was a bit lighter, but was unable to cope with operators. There is a paper discussing the notation presented here, including what the arrow laws look like. This and other stuff on arrows may be found on my arrows page.

The proposal is laid out in three stages. The first stage is the largest (and most fully described); it would be useful by itself. The others add further convenience.

In general, the proposal introduces new things called commands, with a similar syntax to expressions, but whose meaning is given by a translation to quite different expressions. Thus inside an arrow expression there are seemingly familiar things like \, if, case and do, with similar, but different, meaning to what you expect. It's a matter of opinion how much of this `abuse of notation' is helpful.

1. An analogue of raw monad syntax

The first level of sugar gives something similar to the raw monad syntax. For example, here is a reformulation of the interpreter from the arrows paper (think of `proc' as a variant of lambda, and `-<' (arrow tail) as a sort of application):

    eval (Var s) = proc env ->
                    returnA -< fromJust (lookup s env)
    eval (Add e1 e2) = proc env ->
                    (eval e1 -< env) `bind` \ ~(Num u) ->
                    (eval e2 -< env) `bind` \ ~(Num v) ->
                    returnA -< Num (u + v)
    eval (If e1 e2 e3) = proc env ->
                    (eval e1 -< env) `bind` \ ~(Bl b) ->
                    if b    then eval e2 -< env
                            else eval e3 -< env
    eval (Lam x e) = proc env ->
                    returnA -< Fun (proc v -> eval e -< (x,v):env)
    eval (App e1 e2) = proc env ->
                    (eval e1 -< env) `bind` \ ~(Fun f) ->
                    (eval e2 -< env) `bind` \ v ->
                    f -< v

The above code makes use of the following (standard Haskell) definitions:

    returnA :: Arrow a => a b b
    returnA = arr id

    bind :: Arrow a => a b c -> a (b,c) d -> a b d
    e `bind` f = returnA &&& e >>> f

Syntax

New reserved words: proc, -< and form. (Can anyone think of better names?)

Add the productions

    exp  ->  ...
         |   proc apat -> cmd

    cmd  ->  fexp -< aexp                        (Arrow application)
         |   \ apat₁ ... apat_n -> cmd            (kappa-abstraction)
         |   form aexp cmd₁ ... cmd_n             (control operator, n >= 0)
         |   cmd₁ qop cmd₂                       (short for `form (qop) cmd₁ cmd₂')
         |   let decls in cmd                    (local definitions)
         |   if exp then cmd else cmd
         |   case exp of { pat₁ -> cmd₁; ... ; pat_n -> cmd_n [;] }
         |   (cmd)                               (same as `cmd')

In contrast to monads, it is necessary to treat conditionals specially, as will become clear in the translation. (I have omitted guards from the case for simplicity. Adding them doesn't raise any extra issues.)

The above grammar is vague about precedence. I would expect arrow application to bind more tightly than let and conditionals, but perhaps more loosely than infix control operators and control operator application.

Translation

The following equations define a naive translation into current Haskell. There is plenty of scope for improvement, for example by assuming that arr preserves composition, dead variable elimination, simplifying patterns and so on.

The first step is to eliminate head patterns that could fail. These are replaced by failure-free patterns (defined as in Haskell 1.4):

    proc p -> c =
        arr (\x -> case x of { p -> Left (Flatten(p)); _ -> Right x }) >>>
        (proc Flatten(p) -> c) ||| zeroArrow

where Flatten(p) is a failure-free pattern with the same irrefutable subpatterns as p. One way to do this is by replacing all non-unique constructors with tuples, e.g. Flatten(y:ys) could be (y,ys).

(Note that this translation requires an arrow in both ArrowChoice and ArrowZero. A Haskell-98-style definition would only eliminate the ArrowZero, while changing the Arrow class.)

From here on, the head pattern is assumed to be failure-free. The translation rules are

    proc p -> f -< e = arr (\ p -> e) >>> f           if Vars(f) and Vars(p) disjoint
                     = arr (\ p -> (f, e)) >>> app    otherwise

    proc p -> \ p₁ ... p_n -> c = proc (...(p, p₁), ... p_n) -> c

    proc p -> form e c₁ ... c_n =
        e (proc p -> c₁) ... (proc p -> c_n)

    proc p -> let decls in c =
        arr (\ p -> let decls in (p,p')) >>>          Vars(p') = Defvars(decls)
	(proc (p,p') -> c)

    proc p -> if e then c₁ else c₂ =
        arr (\ p -> if e then Left p else Right p) >>>
        (proc p -> c₁) ||| (proc p -> c₂)

    proc p -> case e of { p₁ -> c₁ ; ... ; p_n -> c_n } =
        arr (\ p ->
             case e of
             p₁ -> Left (p, Flatten(p₁))
             ...
             p_n-1 -> Right^n-2 (Left (p, Flatten(p_n-1)))
             p_n -> Right^n-1 (p, Flatten(p_n))) >>>
        (proc (p, Flatten(p₁)) -> c₁) ||| ... ||| (proc (p, Flatten(p_n)) -> c_n)

Some points to note:

The first translation of arrow application should be used if possible, as the other one requires that the arrow belong to ArrowApply.
Control operators operate on commands (not values) to define new commands.
Unfortunately, variables defined by let are monomorphic. (This can be fixed with a messy translation using first class polymorphism.)
The conditionals if and case require that the arrow belong to ArrowChoice.
In the translation of case, I have assumed that ||| is right-associative.

Extra restrictions on control operators

A control operator expression (the aexp after form) must not mention variables defined within the proc expression, and must have type

    a₁ s₁ t₁ -> ...  a_n s_n t_n -> a s t            (n >= 0)

where

the a's are arrows (possibly different),
the s's are either of the form (s,t) or the same type variable b,
none of the t's mention b.

(They should also have the corresponding naturality property, but that's the programmer's responsibility. All this is required to enable certain transformations, including the optimizations mentioned above.)

For example, bind above has a type of this form. Some other examples are

    (&&&) :: Arrow a => a b c -> a b d -> a b (c,d)
    zero  :: ArrowZero a => a b c
    (<+>) :: ArrowPlus a => a b c -> a b c -> a b c
    loop  :: ArrowLoop a => a (b,d) (c,d) -> a b c

Many more can be defined.

2. Emulating `do`-notation

The above proposals create a syntax similar to the raw monadic style. Indeed, the rules defining do-notation can be readily adapted to this syntax, which making it a little more pleasant. Then the interpreter example becomes

    eval (Var s) = proc env ->
                    returnA -< fromJust (lookup s env)
    eval (Add e1 e2) = proc env -> do
                    ~(Num u) <- eval e1 -< env
                    ~(Num v) <- eval e2 -< env
                    returnA -< Num (u + v)
    eval (If e1 e2 e3) = proc env -> do
                    ~(Bl b) <- eval e1 -< env
                    if b    then eval e2 -< env
                            else eval e3 -< env
    eval (Lam x e) = proc env ->
                    returnA -< Fun (proc v -> eval e -< (x,v):env)
    eval (App e1 e2) = proc env -> do
                    ~(Fun f) <- eval e1 -< env
                    v <- eval e2 -< env
                    f -< v

Add another reserved word rec and the productions

    cmd  ->  ...
         |   do { stmt₁ ; ... stmt_n ; cmd [;] }
    stmt ->  cmd
         |   pat <- cmd
         |   let decls
         |   rec { stmt₁ ; ... stmt_n [;] }

The translations of the first three are much the same as ordinary do-notation:

    do { c; rest } = c `bind` \ _ -> do { rest }

    do { p <- c; rest } = c `bind` \ p -> do { rest }

    do { let decls; rest } = let decls in do { rest }

The above rules yield commands, which are then translated using the previous rules.

The final kind of statement has no counterpart in ordinary do-notation, though such an extension was proposed in O'Haskell. It is translated by

    do { rec { ss }; rest } =
        loop (\ p -> do { ss; returnA -< (p, p)}) `bind` \ p -> do { rest }

where p is a pattern collecting the variables defined by ss. (I'm not sure whether variables defined by let definitions should be included here.)

The duplication of p in the above rule looks redundant, but in practice a more efficient translation will send different tuples of variables to the feedback and output.

3. Definition syntax

John Hughes has suggested that one allow definitions like

    f p₁ -< q₁ = c₁
    ...
    f p_n -< q_n = c_n

(though he uses a different symbol than -<.

These could be translated as

    f x = proc y ->
          case (x,y) of
          (p₁, q₁) -> c₁
          ...
          (p_n, q_n) -> c_n

(The generalizations to more arguments, and to guards, are straightforward.) If there were only one clause, and all patterns were failure-free, the case could be avoided. Otherwise, the arrow must belong to ArrowChoice. (One might think that discriminating between the p's could be moved out of the arrow. But this transformation would be invalid for some arrows, in particular, for stream processors.)

John gave the following example:

    (f ||| g) << Left x = f << x
    (f ||| g) << Right y = g << y

which is certainly a big improvement over

    f ||| g = proc z -> case z of Left x -> f x
                                  Right y -> g y

On the other hand, if all the q_i were failure-free, the following translation would probably be preferable:

    f p₁ = proc q₁ -> c₁
    ...
    f p_n = proc q_n -> c_n

This discontinuity is a bit unpleasant.

John further suggests that proc be replaced by `\ -<', but maybe that's going too far.