Accessible layout proposal

From HaskellWiki
Revision as of 16:12, 3 February 2007 by Brianh (talk | contribs) (Initial draft)

Description

The character sequences #(, #[, #$, and # would become reserved tokens that start a new layout block with the following desugaring:

    #( {a;b;c}        ===     (a,b,c)
    #[ {a;b;c}        ===     [a,b,c]
    epct #$ {a;b;c}   ===     epct a b c
    # {a;b;c}         ===     {a;b;c}

where epct represents an expression, pattern, data constructor, type constructor, or class name. The above tokens would be added to the list of tokens that start layout blocks, and appropriate rules would be added to the grammar itself to perform the above desugarings, for example:

    fexp -> [fexp] aexp
    fexp -> fexp "#$" block<aexp>

where block<aexp> expands to "{" [aexp (";" aexp)*] "}" in EBNF. It is important to realise that #$ above is not an operator: it is part of the grammar itself, and should be thought of as syntactic sugar that is applied before the "real" grammar is used, even though it should be possible to implement it by adding appropriate extra rules to the grammar itself, as above.
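The desugaring rules can be tried out on explicit-brace input with a small parser. The following is only a sketch under stated assumptions: it uses ReadP from base, the Expr type and the names tok, var, block, sugar and parseSugar are hypothetical, a bare identifier stands in for every epct and block element, and layout is ignored (the braces and semicolons must already be present).

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isAlphaNum)

-- Hypothetical mini-AST for the four desugared forms.
data Expr
  = Var String        -- stand-in for any epct / block element
  | Tuple [Expr]      -- #( {a;b;c}       ===  (a,b,c)
  | List [Expr]       -- #[ {a;b;c}       ===  [a,b,c]
  | App Expr [Expr]   -- epct #$ {a;b;c}  ===  epct a b c
  | Block [Expr]      -- # {a;b;c}        ===  {a;b;c}
  deriving (Eq, Show)

-- A literal token, skipping any whitespace in front of it.
tok :: String -> ReadP String
tok s = skipSpaces *> string s

-- Assumption: elements are plain alphanumeric identifiers.
var :: ReadP Expr
var = Var <$> (skipSpaces *> munch1 isAlphaNum)

-- block<p> ::= "{" [p (";" p)*] "}"
block :: ReadP a -> ReadP [a]
block p = between (tok "{") (tok "}") (p `sepBy` tok ";")

-- One alternative per proposed token.
sugar :: ReadP Expr
sugar = choice
  [ Tuple <$> (tok "#(" *> block var)
  , List  <$> (tok "#[" *> block var)
  , App   <$> var <*> (tok "#$" *> block var)
  , Block <$> (tok "#" *> block var)
  ]

-- Succeeds only if the whole input is consumed.
parseSugar :: String -> Maybe Expr
parseSugar s =
  case [e | (e, "") <- readP_to_S (sugar <* skipSpaces <* eof) s] of
    (e : _) -> Just e
    []      -> Nothing
```

For instance, parseSugar "epct #$ {a;b;c}" yields Just (App (Var "epct") [Var "a",Var "b",Var "c"]). A real implementation would extend the grammar of the compiler instead of using a separate pass like this.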

Motivation

You might be thinking the above just looks totally far out and mad, but the reason for wanting it is that it would let you use layout to avoid many commas and parentheses. For example, consider:

    main = do
              a <- getChar
              bracket_
                  (enter a)
                  (exit a)
                  (do
                      putChar a
                      putStrLn "hello")

The programmer has used indentation and newlines to make it clear that bracket_ has 3 arguments. The compiler, however, cannot see the indentation and newlines, because bracket_ does not introduce layout. By using #$, we can eliminate the parentheses:

    main = do
              a <- getChar
              bracket_ #$
                  enter a
                  exit a
                  do
                      putChar a
                      putStrLn "hello"

# would allow records to be laid out, provided we also allowed record fields to be separated by semicolons instead of commas:

    let p = Person { personName = "Zarathustra Aurelio"
                   ; personAddress = "Ancient Persia"
                   ; personAge = 4000
                   }

could then be written as:

    let p = Person # personName = "Zarathustra Aurelio"
                     personAddress = "Ancient Persia"
                     personAge = 4000

The above illustrates the power of using a single construct, the block (a semicolon-separated list of elements enclosed in braces), in many different situations in the language. This is a very natural way of thinking if you try to write a recursive descent parser for Haskell, but it is not supported by the current method of describing the grammar using a CFG. There, piecemeal ad-hoc syntax can slip past undetected, because the mapping from syntax to the same concept ("a (possibly ordered) set of things") has to be duplicated in many different places in the grammar.
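As a rough illustration of why a single block construct is convenient, the following sketch turns the indented lines that follow a layout token into an explicit {a; b; c} block. It is a deliberate simplification and not the Haskell layout algorithm: the names layoutItems, toBlock, indentOf and strip are hypothetical, nested layout blocks (such as an inner do) are not handled, and every line is assumed to be indented at least as far as the first.

```haskell
import Data.Char (isSpace)
import Data.List (intercalate)

-- Number of leading whitespace characters on a line.
indentOf :: String -> Int
indentOf = length . takeWhile isSpace

-- Drop leading whitespace.
strip :: String -> String
strip = dropWhile isSpace

-- Split the lines after a layout token into block items.
-- A line at the indentation of the first line starts a new item;
-- a more deeply indented line continues the current item.
layoutItems :: [String] -> [String]
layoutItems [] = []
layoutItems ls@(first : _) = map (unwords . map strip) (go ls)
  where
    col = indentOf first
    go [] = []
    go (l : rest) =
      let (cont, others) = span ((> col) . indentOf) rest
      in (l : cont) : go others

-- Render the items as an explicit brace-and-semicolon block.
toBlock :: [String] -> String
toBlock ls = "{" ++ intercalate "; " (layoutItems ls) ++ "}"
```

Applied to the three field lines of the Person example, toBlock produces a single brace-delimited field list, which any of the proposed tokens could then desugar in the same way.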

#( would be useful in import and export lists (and everywhere else tuple syntax is used):

    module Foo
        ( Zap(Con1, Con2, Con3)
        , mkZap
        , count
        ) where

could instead be written as:

    module Foo #(
        Zap(Con1, Con2, Con3)
        mkZap
        count
        where

or even:

    module Foo #(
        Zap #(
            Con1
            Con2
            Con3
        mkZap
        count
        where

Similarly to the case for records, we could note that import and export lists are really sets of things, and so perhaps we could allow a simple block to be used instead of using the "tuple" notation. Then the example above could be written as:

    module Foo #
        Zap #
            Con1
            Con2
            Con3
        mkZap
        count
        where

Not only does the above look less cluttered than the current syntax, which uses parentheses and commas, it also lets you reorder things easily without having to bother with that pesky comma that always ends up in the wrong place when you cut and paste...

Also, we could add rules to allow data types to be defined using blocks instead of (or as well as) the | notation:

    data T = One | Two | Three

could also be written using block notation as:

    data T = {One; Two; Three}

which then allows:

    data T = # One
               Two
               Three

Implications

#(, #[, #$, and # would become reserved tokens, so #$ and # could no longer be used as symbols (operators). Allowing blocks to be used in place of parenthesised import/export/deriving/predicate lists should not have any negative effects, because such uses occupy a hitherto unused space in the grammar (as far as I can tell Brianh 16:12, 3 February 2007 (UTC)). A possible parsing challenge is allowing semicolons as the separator for record field lists (thus turning a record field list into a standard block of fields) while still allowing commas for backwards compatibility, though this should hopefully not be insurmountable.
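To make the cost of reserving these tokens concrete: both # and #$ are ordinary operator symbols in standard Haskell today, so user code may already define them. The definitions below are hypothetical and chosen only for illustration; they compile now and would stop compiling if the proposal were adopted.

```haskell
infixl 1 #
infixr 0 #$

-- Hypothetical reverse-application operator.
(#) :: a -> (a -> b) -> b
x # f = f x

-- Hypothetical alias for ordinary function application.
(#$) :: (a -> b) -> a -> b
f #$ x = f x

-- Both uses are legal in current Haskell.
example1 :: Int
example1 = 3 # (+ 1)

example2 :: Int
example2 = negate #$ 5
```

Code like this would need its operators renamed before the new tokens could be introduced.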