Accessible layout proposal

From HaskellWiki
Revision as of 11:29, 4 February 2007 by Brianh (Adjusted desugaring to deal with existing record syntax)

Description

The character sequences #(, #[, #$, and # would become reserved tokens that start a new layout block with the following desugaring:

    #( {a;b;c}        ==>     (a,b,c)
    #[ {a;b;c}        ==>     [a,b,c]
    epct #$ {a;b;c}   ==>     epct (a) (b) (c)
    # {a;b;c}         ==>     {a,b,c}

where epct represents an expression, pattern, data constructor, type constructor, or class name. The above tokens would be added to the list of tokens that start layout blocks, and appropriate rules would be added to the grammar itself to perform the above desugarings, for example:

    fexp -> [fexp] aexp
    fexp -> fexp "#$" block<exp>

where block<q> expands to "{" [q (";" q)*] "}" in EBNF. It is important to realise that #$ above is not an operator: it is part of the grammar itself. It should be thought of as syntactic sugar that "happens" before the "real" grammar is used, even though it can be implemented by adding extra rules to the grammar itself, as above.
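The block<q> production is the familiar "separated list in brackets" pattern. As an illustrative sketch only (not part of the proposal), it maps directly onto the between and sepBy combinators from base's Text.ParserCombinators.ReadP. The element parser name, which accepts a run of letters, and the helper parseBlock are hypothetical stand-ins for a real q such as exp.

```haskell
import Data.Char (isAlpha)
import Text.ParserCombinators.ReadP
  (ReadP, between, char, eof, munch1, readP_to_S, sepBy, skipSpaces)

-- block<q> ::= "{" [q (";" q)*] "}"
-- sepBy q sep parses zero or more q separated by sep, which covers
-- the optional [q (";" q)*] part of the EBNF in one combinator.
block :: ReadP a -> ReadP [a]
block q = between (tok '{') (tok '}') (sepBy q (tok ';'))
  where
    -- a punctuation character, with surrounding whitespace skipped
    tok c = skipSpaces *> char c <* skipSpaces

-- Hypothetical element parser standing in for a real q (e.g. exp).
name :: ReadP String
name = munch1 isAlpha

-- Run the block parser over a whole string; Nothing on failure.
parseBlock :: ReadP a -> String -> Maybe [a]
parseBlock q s = case readP_to_S (block q <* eof) s of
  [(xs, "")] -> Just xs
  _          -> Nothing
```

Because sepBy also accepts zero elements, the optional [ ... ] part of the EBNF needs no separate case: parseBlock name "{}" yields Just [], while parseBlock name "{a;b;c}" yields Just ["a","b","c"].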

Note that #$ effectively puts parentheses around each element of the block wherever the usual grammar would need them, so that, for example, a block of exp can be turned into a sequence of aexp.
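To make the four desugarings concrete, here is a toy model in Haskell. It is a hypothetical sketch rather than an implementation: it assumes the layout algorithm has already split a block into its elements, and it keeps each element as source text. Following the table above, #$ parenthesises every element, which is harmless when the parentheses happen to be redundant.

```haskell
import Data.List (intercalate)

-- Toy model (hypothetical): each element of a layout block is kept as
-- source text, already split out by the layout algorithm.
data Sugar
  = HashParen [String]          -- #( {a;b;c}
  | HashBracket [String]        -- #[ {a;b;c}
  | HashDollar String [String]  -- epct #$ {a;b;c}
  | Hash [String]               -- # {a;b;c}
  deriving Show

-- Rewrite a sugared block into the Haskell 98 text it stands for.
desugar :: Sugar -> String
desugar (HashParen xs)       = "(" ++ intercalate "," xs ++ ")"
desugar (HashBracket xs)     = "[" ++ intercalate "," xs ++ "]"
desugar (HashDollar epct xs) = unwords (epct : map paren xs)
  where
    -- wrap every element, as in the table; redundant parens are harmless
    paren x = "(" ++ x ++ ")"
desugar (Hash xs)            = "{" ++ intercalate "," xs ++ "}"
```

For instance, desugar (HashDollar "bracket_" ["enter a", "exit a"]) gives "bracket_ (enter a) (exit a)", which is exactly the explicit form a programmer would write today.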

Motivation

You might be thinking that the above looks totally far out and mad, but the reason for wanting it is that it would let you use layout to avoid many commas and parentheses. For example, consider:

    main = do
              a <- getChar
              bracket_
                  (enter a)
                  (exit a)
                  (do
                      putChar a
                      putStrLn "hello")

The programmer has used indentation and newlines to make it clear that bracket_ has three arguments. However, the compiler cannot see this layout, because bracket_ does not introduce a layout block. By using #$, we can eliminate the parentheses:

    main = do
              a <- getChar
              bracket_ #$
                  enter a
                  exit a
                  do
                      putChar a
                      putStrLn "hello"

# lets us write records without the commas. Instead of:

    let p = Person { personName = "Zarathustra Aurelio"
                   , personAddress = "Ancient Persia"
                   , personAge = 4000
                   }

we could write:

    let p = Person # personName = "Zarathustra Aurelio"
                     personAddress = "Ancient Persia"
                     personAge = 4000
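The # form is only shorthand for ordinary record construction. The sketch below makes the example compile by supplying a Person declaration; that declaration and its field types are assumptions, since the example never shows them. The definition of p is what the # version is meant to desugar to.

```haskell
-- Hypothetical declaration: the example never declares Person,
-- so these field types are assumptions.
data Person = Person
  { personName    :: String
  , personAddress :: String
  , personAge     :: Int
  } deriving (Show, Eq)

-- The comma-block that the `#` version desugars to, i.e. plain
-- Haskell 98 record construction.
p :: Person
p = Person { personName    = "Zarathustra Aurelio"
           , personAddress = "Ancient Persia"
           , personAge     = 4000
           }
```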

Recall that # desugars a semicolon-separated sequence enclosed in braces into a comma-separated sequence enclosed in braces. We call the former a semicolon-block and the latter a comma-block.

#( would be useful in import and export lists (and everywhere else tuple syntax is used):

    module Foo
        ( Zap(Con1, Con2, Con3)
        , mkZap
        , count
        ) where

could instead be written as:

    module Foo #(
        Zap(Con1, Con2, Con3)
        mkZap
        count
    where

or even:

    module Foo #(
        Zap #(
            Con1
            Con2
            Con3
        mkZap
        count
    where

We could also add grammar rules allowing comma-blocks to be used anywhere that tuple notation is currently used for things that are not tuples, such as import and export lists. The example above could then be written as:

    module Foo #
        Zap #
            Con1
            Con2
            Con3
        mkZap
        count
    where

Not only does the above look less cluttered than the current syntax, which uses parentheses and commas, but it also lets you reorder items easily, without having to bother with that pesky comma that always ends up in the wrong place when you cut and paste...

Also, we could add rules to allow data types to be defined using comma-blocks instead of (or as well as) the | notation:

    data T = One | Two | Three

could also be written using block notation as:

    data T = {One, Two, Three}

which then allows:

    data T = # One
               Two
               Three

Implications

#(, #[, #$, and # would become reserved tokens, so #$ and # could no longer be used as operator symbols. Allowing comma-blocks to be used in place of parenthesised import/export/deriving/predicate lists should not have any negative effects, because such uses occupy a hitherto unused space in the grammar (as far as I can tell Brianh 16:12, 3 February 2007 (UTC)).
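The practical effect of reserving these tokens can be sketched with a toy lexer. This is a hypothetical illustration, not part of the proposal: it assumes the usual maximal-munch rule for operator symbols, handles only ASCII symbol characters, ignores comments and literals, and emits every other character as a separate token. Under those assumptions #$ and # on their own become reserved, while longer operators that merely contain # (such as <#> or #$$) are unaffected.

```haskell
import Data.Char (isSpace)

-- Hypothetical token type: the new reserved tokens, ordinary operator
-- symbols, and "anything else" one character at a time.
data Tok
  = HashParen          -- #(
  | HashBracket        -- #[
  | HashDollar         -- #$
  | Hash               -- #
  | VarSym String      -- any other operator, e.g. "$", "<#>", "#$$"
  | Other Char         -- everything else (identifiers, brackets, ...)
  deriving (Eq, Show)

-- ASCII symbol characters as in the Haskell report (no Unicode symbols).
symChars :: String
symChars = "!#$%&*+./<=>?@\\^|-~:"

-- #( and #[ are only recognised where a new lexeme starts.
lexToks :: String -> [Tok]
lexToks [] = []
lexToks ('#' : '(' : rest) = HashParen : lexToks rest
lexToks ('#' : '[' : rest) = HashBracket : lexToks rest
lexToks s@(c : rest)
  | isSpace c         = lexToks rest
  | c `elem` symChars = classify sym : lexToks rest'
  | otherwise         = Other c : lexToks rest
  where
    -- maximal munch: take the longest run of symbol characters
    (sym, rest') = span (`elem` symChars) s
    -- a whole lexeme equal to #$ or # is reserved; anything else is an operator
    classify "#$" = HashDollar
    classify "#"  = Hash
    classify op   = VarSym op
```

For example, lexToks "f #$ x" yields [Other 'f', HashDollar, Other 'x'], whereas lexToks "a #$$ b" still yields an ordinary operator, VarSym "#$$".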

Why are there two kinds of block?

Above, we have had to distinguish between blocks with comma-separated elements and blocks with semicolon-separated elements. This distinction is necessary because the Haskell 98 record syntax requires a comma-separated block. Perhaps if the original grammar had been designed with recursive descent parsing in mind, the concept of a "block of things" would have arisen and been given only one syntax, namely semicolon-separated elements in braces. As things stand, however, the # rule easily lets us use semicolon-blocks where comma-blocks are needed, as the record example shows.