7.3. Syntactic extensions

7.3.1. Unicode syntax

The language extension -XUnicodeSyntax enables Unicode characters to be used to stand for certain ASCII character sequences. The following alternatives are provided:

ASCIIUnicode alternativeCode pointName
::::0x2237PROPORTION
=>0x21D2RIGHTWARDS DOUBLE ARROW
forall0x2200FOR ALL
->0x2192RIGHTWARDS ARROW
<-0x2190LEFTWARDS ARROW
-<0x2919LEFTWARDS ARROW-TAIL
>-0x291ARIGHTWARDS ARROW-TAIL
-<< 0x291BLEFTWARDS DOUBLE ARROW-TAIL
>>- 0x291CRIGHTWARDS DOUBLE ARROW-TAIL
*0x2605BLACK STAR

7.3.2. The magic hash

The language extension -XMagicHash allows "#" as a postfix modifier to identifiers. Thus, "x#" is a valid variable, and "T#" is a valid type constructor or data constructor.

The hash sign does not change semantics at all. We tend to use variable names ending in "#" for unboxed values or types (e.g. Int#), but there is no requirement to do so; they are just plain ordinary variables. Nor does the -XMagicHash extension bring anything into scope. For example, to bring Int# into scope you must import GHC.Prim (see Section 7.2, “Unboxed types and primitive operations”); the -XMagicHash extension then allows you to refer to the Int# that is now in scope. Note that with this option, the meaning of x#y = 0 is changed: it defines a function x# taking a single argument y; to define the operator #, put a space: x # y = 0.

The -XMagicHash also enables some new forms of literals (see Section 7.2.1, “Unboxed types”):

  • 'x'# has type Char#

  • "foo"# has type Addr#

  • 3# has type Int#. In general, any Haskell integer lexeme followed by a # is an Int# literal, e.g. -0x3A# as well as 32#.

  • 3## has type Word#. In general, any non-negative Haskell integer lexeme followed by ## is a Word#.

  • 3.2# has type Float#.

  • 3.2## has type Double#

7.3.3. Negative literals

The literal -123 is, according to Haskell98 and Haskell 2010, desugared as negate (fromInteger 123). The language extension -XNegativeLiterals means that it is instead desugared as fromInteger (-123).

This can make a difference when the positive and negative range of a numeric data type don't match up. For example, in 8-bit arithmetic -128 is representable, but +128 is not. So negate (fromInteger 128) will elicit an unexpected integer-literal-overflow message.

7.3.4. Fractional looking integer literals

Haskell 2010 and Haskell 98 define floating literals with the syntax 1.2e6. These literals have the type Fractional a => a.

The language extension -XNumDecimals allows you to also use the floating literal syntax for instances of Integral, and have values like (1.2e6 :: Num a => a)

7.3.5. Hierarchical Modules

GHC supports a small extension to the syntax of module names: a module name is allowed to contain a dot ‘.’. This is also known as the “hierarchical module namespace” extension, because it extends the normally flat Haskell module namespace into a more flexible hierarchy of modules.

This extension has very little impact on the language itself; modules names are always fully qualified, so you can just think of the fully qualified module name as the module name. In particular, this means that the full module name must be given after the module keyword at the beginning of the module; for example, the module A.B.C must begin

module A.B.C

It is a common strategy to use the as keyword to save some typing when using qualified names with hierarchical modules. For example:

import qualified Control.Monad.ST.Strict as ST

For details on how GHC searches for source and interface files in the presence of hierarchical modules, see Section 4.7.3, “The search path”.

GHC comes with a large collection of libraries arranged hierarchically; see the accompanying library documentation. More libraries to install are available from HackageDB.

7.3.6. Pattern guards

The discussion that follows is an abbreviated version of Simon Peyton Jones's original proposal. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)

Suppose we have an abstract data type of finite maps, with a lookup operation:

lookup :: FiniteMap -> Int -> Maybe Int

The lookup returns Nothing if the supplied key is not in the domain of the mapping, and (Just v) otherwise, where v is the value that the key maps to. Now consider the following definition:

clunky env var1 var2 | ok1 && ok2 = val1 + val2
| otherwise  = var1 + var2
where
  m1 = lookup env var1
  m2 = lookup env var2
  ok1 = maybeToBool m1
  ok2 = maybeToBool m2
  val1 = expectJust m1
  val2 = expectJust m2

The auxiliary functions are

maybeToBool :: Maybe a -> Bool
maybeToBool (Just x) = True
maybeToBool Nothing  = False

expectJust :: Maybe a -> a
expectJust (Just x) = x
expectJust Nothing  = error "Unexpected Nothing"

What is clunky doing? The guard ok1 && ok2 checks that both lookups succeed, using maybeToBool to convert the Maybe types to booleans. The (lazily evaluated) expectJust calls extract the values from the results of the lookups, and binds the returned values to val1 and val2 respectively. If either lookup fails, then clunky takes the otherwise case and returns the sum of its arguments.

This is certainly legal Haskell, but it is a tremendously verbose and un-obvious way to achieve the desired effect. Arguably, a more direct way to write clunky would be to use case expressions:

clunky env var1 var2 = case lookup env var1 of
  Nothing -> fail
  Just val1 -> case lookup env var2 of
    Nothing -> fail
    Just val2 -> val1 + val2
where
  fail = var1 + var2

This is a bit shorter, but hardly better. Of course, we can rewrite any set of pattern-matching, guarded equations as case expressions; that is precisely what the compiler does when compiling equations! The reason that Haskell provides guarded equations is because they allow us to write down the cases we want to consider, one at a time, independently of each other. This structure is hidden in the case version. Two of the right-hand sides are really the same (fail), and the whole expression tends to become more and more indented.

Here is how I would write clunky:

clunky env var1 var2
  | Just val1 <- lookup env var1
  , Just val2 <- lookup env var2
  = val1 + val2
...other equations for clunky...

The semantics should be clear enough. The qualifiers are matched in order. For a <- qualifier, which I call a pattern guard, the right hand side is evaluated and matched against the pattern on the left. If the match fails then the whole guard fails and the next equation is tried. If it succeeds, then the appropriate binding takes place, and the next qualifier is matched, in the augmented environment. Unlike list comprehensions, however, the type of the expression to the right of the <- is the same as the type of the pattern to its left. The bindings introduced by pattern guards scope over all the remaining guard qualifiers, and over the right hand side of the equation.

Just as with list comprehensions, boolean expressions can be freely mixed with among the pattern guards. For example:

f x | [y] <- x
    , y > 3
    , Just z <- h y
    = ...

Haskell's current guards therefore emerge as a special case, in which the qualifier list has just one element, a boolean expression.

7.3.7. View patterns

View patterns are enabled by the flag -XViewPatterns. More information and examples of view patterns can be found on the Wiki page.

View patterns are somewhat like pattern guards that can be nested inside of other patterns. They are a convenient way of pattern-matching against values of abstract types. For example, in a programming language implementation, we might represent the syntax of the types of the language as follows:

type Typ

data TypView = Unit
             | Arrow Typ Typ

view :: Typ -> TypView

-- additional operations for constructing Typ's ...

The representation of Typ is held abstract, permitting implementations to use a fancy representation (e.g., hash-consing to manage sharing). Without view patterns, using this signature a little inconvenient:

size :: Typ -> Integer
size t = case view t of
  Unit -> 1
  Arrow t1 t2 -> size t1 + size t2

It is necessary to iterate the case, rather than using an equational function definition. And the situation is even worse when the matching against t is buried deep inside another pattern.

View patterns permit calling the view function inside the pattern and matching against the result:

size (view -> Unit) = 1
size (view -> Arrow t1 t2) = size t1 + size t2

That is, we add a new form of pattern, written expression -> pattern that means "apply the expression to whatever we're trying to match against, and then match the result of that application against the pattern". The expression can be any Haskell expression of function type, and view patterns can be used wherever patterns are used.

The semantics of a pattern ( exp -> pat ) are as follows:

  • Scoping:

    The variables bound by the view pattern are the variables bound by pat.

    Any variables in exp are bound occurrences, but variables bound "to the left" in a pattern are in scope. This feature permits, for example, one argument to a function to be used in the view of another argument. For example, the function clunky from Section 7.3.6, “Pattern guards” can be written using view patterns as follows:

    clunky env (lookup env -> Just val1) (lookup env -> Just val2) = val1 + val2
    ...other equations for clunky...
    

    More precisely, the scoping rules are:

    • In a single pattern, variables bound by patterns to the left of a view pattern expression are in scope. For example:

      example :: Maybe ((String -> Integer,Integer), String) -> Bool
      example Just ((f,_), f -> 4) = True
      

      Additionally, in function definitions, variables bound by matching earlier curried arguments may be used in view pattern expressions in later arguments:

      example :: (String -> Integer) -> String -> Bool
      example f (f -> 4) = True
      

      That is, the scoping is the same as it would be if the curried arguments were collected into a tuple.

    • In mutually recursive bindings, such as let, where, or the top level, view patterns in one declaration may not mention variables bound by other declarations. That is, each declaration must be self-contained. For example, the following program is not allowed:

      let {(x -> y) = e1 ;
           (y -> x) = e2 } in x
      

      (For some amplification on this design choice see Trac #4061.)

  • Typing: If exp has type T1 -> T2 and pat matches a T2, then the whole view pattern matches a T1.

  • Matching: To the equations in Section 3.17.3 of the Haskell 98 Report, add the following:

    case v of { (e -> p) -> e1 ; _ -> e2 }
     =
    case (e v) of { p -> e1 ; _ -> e2 }
    

    That is, to match a variable v against a pattern ( exp -> pat ), evaluate ( exp v ) and match the result against pat.

  • Efficiency: When the same view function is applied in multiple branches of a function definition or a case expression (e.g., in size above), GHC makes an attempt to collect these applications into a single nested case expression, so that the view function is only applied once. Pattern compilation in GHC follows the matrix algorithm described in Chapter 4 of The Implementation of Functional Programming Languages. When the top rows of the first column of a matrix are all view patterns with the "same" expression, these patterns are transformed into a single nested case. This includes, for example, adjacent view patterns that line up in a tuple, as in

    f ((view -> A, p1), p2) = e1
    f ((view -> B, p3), p4) = e2
    

    The current notion of when two view pattern expressions are "the same" is very restricted: it is not even full syntactic equality. However, it does include variables, literals, applications, and tuples; e.g., two instances of view ("hi", "there") will be collected. However, the current implementation does not compare up to alpha-equivalence, so two instances of (x, view x -> y) will not be coalesced.

7.3.8. Pattern synonyms

Pattern synonyms are enabled by the flag -XPatternSynonyms, which is required for both defining them and using them. More information and examples of view patterns can be found on the Wiki page.

Pattern synonyms enable giving names to parametrized pattern schemes. They can also be thought of as abstract constructors that don't have a bearing on data representation. For example, in a programming language implementation, we might represent types of the language as follows:

data Type = App String [Type]

Here are some examples of using said representation. Consider a few types of the Type universe encoded like this:

  App "->" [t1, t2]          -- t1 -> t2
  App "Int" []               -- Int
  App "Maybe" [App "Int" []] -- Maybe Int

This representation is very generic in that no types are given special treatment. However, some functions might need to handle some known types specially, for example the following two functions collect all argument types of (nested) arrow types, and recognize the Int type, respectively:

  collectArgs :: Type -> [Type]
  collectArgs (App "->" [t1, t2]) = t1 : collectArgs t2
  collectArgs _                   = []

  isInt :: Type -> Bool
  isInt (App "Int" []) = True
  isInt _              = False

Matching on App directly is both hard to read and error prone to write. And the situation is even worse when the matching is nested:

  isIntEndo :: Type -> Bool
  isIntEndo (App "->" [App "Int" [], App "Int" []]) = True
  isIntEndo _                                       = False

Pattern synonyms permit abstracting from the representation to expose matchers that behave in a constructor-like manner with respect to pattern matching. We can create pattern synonyms for the known types we care about, without committing the representation to them (note that these don't have to be defined in the same module as the Type type):

  pattern Arrow t1 t2 = App "->"    [t1, t2]
  pattern Int         = App "Int"   []
  pattern Maybe t     = App "Maybe" [t]

Which enables us to rewrite our functions in a much cleaner style:

  collectArgs :: Type -> [Type]
  collectArgs (Arrow t1 t2) = t1 : collectArgs t2
  collectArgs _             = []

  isInt :: Type -> Bool
  isInt Int = True
  isInt _   = False

  isIntEndo :: Type -> Bool
  isIntEndo (Arrow Int Int) = True
  isIntEndo _               = False

Note that in this example, the pattern synonyms Int and Arrow can also be used as expressions (they are bidirectional). This is not necessarily the case: unidirectional pattern synonyms can also be declared with the following syntax:

  pattern Head x <- x:xs

In this case, Head x cannot be used in expressions, only patterns, since it wouldn't specify a value for the xs on the right-hand side.

The semantics of a unidirectional pattern synonym declaration and usage are as follows:

  • Syntax:

    A pattern synonym declaration can be either unidirectional or bidirectional. The syntax for unidirectional pattern synonyms is:

      pattern Name args <- pat
    

    and the syntax for bidirectional pattern synonyms is:

      pattern Name args = pat
    

    Pattern synonym declarations can only occur in the top level of a module. In particular, they are not allowed as local definitions. Currently, they also don't work in GHCi, but that is a technical restriction that will be lifted in later versions.

    The name of the pattern synonym itself is in the same namespace as proper data constructors. Either prefix or infix syntax can be used. In export/import specifications, you have to prefix pattern names with the pattern keyword, e.g.:

      module Example (pattern Single) where
      pattern Single x = [x]
    
  • Scoping:

    The variables in the left-hand side of the definition are bound by the pattern on the right-hand side. For bidirectional pattern synonyms, all the variables of the right-hand side must also occur on the left-hand side; also, wildcard patterns and view patterns are not allowed. For unidirectional pattern synonyms, there is no restriction on the right-hand side pattern.

    Pattern synonyms cannot be defined recursively.

  • Typing:

    Given a pattern synonym definition of the form

      pattern P var1 var2 ... varN <- pat
    

    it is assigned a pattern type of the form

      pattern CProv => P t1 t2 ... tN :: CReq => t
    

    where CProv and CReq are type contexts, and t1, t2, ..., tN and t are types.

    A pattern synonym of this type can be used in a pattern if the instatiated (monomorphic) type satisfies the constraints of CReq. In this case, it extends the context available in the right-hand side of the match with CProv, just like how an existentially-typed data constructor can extend the context.

    For example, in the following program:

    {-# LANGUAGE PatternSynonyms, GADTs #-}
    module ShouldCompile where
    
    data T a where
    	MkT :: (Show b) => a -> b -> T a
    
    pattern ExNumPat x = MkT 42 x
    

    the pattern type of ExNumPat is

    pattern (Show b) => ExNumPat b :: (Num a, Eq a) => T a
    

    and so can be used in a function definition like the following:

      f :: (Num t, Eq t) => T t -> String
      f (ExNumPat x) = show x
    

    For bidirectional pattern synonyms, uses as expressions have the type

      (CProv, CReq) => t1 -> t2 -> ... -> tN -> t
    

    So in the previous example, ExNumPat, when used in an expression, has type

      ExNumPat :: (Show b, Num a, Eq a) => b -> T t
    
  • Matching:

    A pattern synonym occurrence in a pattern is evaluated by first matching against the pattern synonym itself, and then on the argument patterns. For example, in the following program, f and f' are equivalent:

    pattern Pair x y <- [x, y]
    
    f (Pair True True) = True
    f _                = False
    
    f' [x, y] | True <- x, True <- y = True
    f' _                                   = False
    

    Note that the strictness of f differs from that of g defined below:

    g [True, True] = True
    g _            = False
    
    *Main> f (False:undefined)
    *** Exception: Prelude.undefined
    *Main> g (False:undefined)
    False
    

7.3.9. n+k patterns

n+k pattern support is disabled by default. To enable it, you can use the -XNPlusKPatterns flag.

7.3.10. Traditional record syntax

Traditional record syntax, such as C {f = x}, is enabled by default. To disable it, you can use the -XNoTraditionalRecordSyntax flag.

7.3.11. The recursive do-notation

The do-notation of Haskell 98 does not allow recursive bindings, that is, the variables bound in a do-expression are visible only in the textually following code block. Compare this to a let-expression, where bound variables are visible in the entire binding group.

It turns out that such recursive bindings do indeed make sense for a variety of monads, but not all. In particular, recursion in this sense requires a fixed-point operator for the underlying monad, captured by the mfix method of the MonadFix class, defined in Control.Monad.Fix as follows:

class Monad m => MonadFix m where
   mfix :: (a -> m a) -> m a

Haskell's Maybe, [] (list), ST (both strict and lazy versions), IO, and many other monads have MonadFix instances. On the negative side, the continuation monad, with the signature (a -> r) -> r, does not.

For monads that do belong to the MonadFix class, GHC provides an extended version of the do-notation that allows recursive bindings. The -XRecursiveDo (language pragma: RecursiveDo) provides the necessary syntactic support, introducing the keywords mdo and rec for higher and lower levels of the notation respectively. Unlike bindings in a do expression, those introduced by mdo and rec are recursively defined, much like in an ordinary let-expression. Due to the new keyword mdo, we also call this notation the mdo-notation.

Here is a simple (albeit contrived) example:

{-# LANGUAGE RecursiveDo #-}
justOnes = mdo { xs <- Just (1:xs)
               ; return (map negate xs) }

or equivalently

{-# LANGUAGE RecursiveDo #-}
justOnes = do { rec { xs <- Just (1:xs) }
              ; return (map negate xs) }

As you can guess justOnes will evaluate to Just [-1,-1,-1,....

GHC's implementation the mdo-notation closely follows the original translation as described in the paper A recursive do for Haskell, which in turn is based on the work Value Recursion in Monadic Computations. Furthermore, GHC extends the syntax described in the former paper with a lower level syntax flagged by the rec keyword, as we describe next.

7.3.11.1. Recursive binding groups

The flag -XRecursiveDo also introduces a new keyword rec, which wraps a mutually-recursive group of monadic statements inside a do expression, producing a single statement. Similar to a let statement inside a do, variables bound in the rec are visible throughout the rec group, and below it. For example, compare

    do { a <- getChar            do { a <- getChar
       ; let { r1 = f a r2          ; rec { r1 <- f a r2
       ;     ; r2 = g r1 }          ;     ; r2 <- g r1 }
       ; return (r1 ++ r2) }        ; return (r1 ++ r2) }

In both cases, r1 and r2 are available both throughout the let or rec block, and in the statements that follow it. The difference is that let is non-monadic, while rec is monadic. (In Haskell let is really letrec, of course.)

The semantics of rec is fairly straightforward. Whenever GHC finds a rec group, it will compute its set of bound variables, and will introduce an appropriate call to the underlying monadic value-recursion operator mfix, belonging to the MonadFix class. Here is an example:

rec { b <- f a c     ===>    (b,c) <- mfix (\ ~(b,c) -> do { b <- f a c
    ; c <- f b a }                                         ; c <- f b a
                                                           ; return (b,c) })

As usual, the meta-variables b, c etc., can be arbitrary patterns. In general, the statement rec ss is desugared to the statement

vs <- mfix (\ ~vs -> do { ss; return vs })

where vs is a tuple of the variables bound by ss.

Note in particular that the translation for a rec block only involves wrapping a call to mfix: it performs no other analysis on the bindings. The latter is the task for the mdo notation, which is described next.

7.3.11.2. The mdo notation

A rec-block tells the compiler where precisely the recursive knot should be tied. It turns out that the placement of the recursive knots can be rather delicate: in particular, we would like the knots to be wrapped around as minimal groups as possible. This process is known as segmentation, and is described in detail in Secton 3.2 of A recursive do for Haskell. Segmentation improves polymorphism and reduces the size of the recursive knot. Most importantly, it avoids unnecessary interference caused by a fundamental issue with the so-called right-shrinking axiom for monadic recursion. In brief, most monads of interest (IO, strict state, etc.) do not have recursion operators that satisfy this axiom, and thus not performing segmentation can cause unnecessary interference, changing the termination behavior of the resulting translation. (Details can be found in Sections 3.1 and 7.2.2 of Value Recursion in Monadic Computations.)

The mdo notation removes the burden of placing explicit rec blocks in the code. Unlike an ordinary do expression, in which variables bound by statements are only in scope for later statements, variables bound in an mdo expression are in scope for all statements of the expression. The compiler then automatically identifies minimal mutually recursively dependent segments of statements, treating them as if the user had wrapped a rec qualifier around them.

The definition is syntactic:

  • A generator g depends on a textually following generator g', if

    • g' defines a variable that is used by g, or

    • g' textually appears between g and g'', where g depends on g''.

  • A segment of a given mdo-expression is a minimal sequence of generators such that no generator of the sequence depends on an outside generator. As a special case, although it is not a generator, the final expression in an mdo-expression is considered to form a segment by itself.

Segments in this sense are related to strongly-connected components analysis, with the exception that bindings in a segment cannot be reordered and must be contiguous.

Here is an example mdo-expression, and its translation to rec blocks:

mdo { a <- getChar      ===> do { a <- getChar
    ; b <- f a c                ; rec { b <- f a c
    ; c <- f b a                ;     ; c <- f b a }
    ; z <- h a b                ; z <- h a b
    ; d <- g d e                ; rec { d <- g d e
    ; e <- g a z                ;     ; e <- g a z }
    ; putChar c }               ; putChar c }

Note that a given mdo expression can cause the creation of multiple rec blocks. If there are no recursive dependencies, mdo will introduce no rec blocks. In this latter case an mdo expression is precisely the same as a do expression, as one would expect.

In summary, given an mdo expression, GHC first performs segmentation, introducing rec blocks to wrap over minimal recursive groups. Then, each resulting rec is desugared, using a call to Control.Monad.Fix.mfix as described in the previous section. The original mdo-expression typechecks exactly when the desugared version would do so.

Here are some other important points in using the recursive-do notation:

  • It is enabled with the flag -XRecursiveDo, or the LANGUAGE RecursiveDo pragma. (The same flag enables both mdo-notation, and the use of rec blocks inside do expressions.)

  • rec blocks can also be used inside mdo-expressions, which will be treated as a single statement. However, it is good style to either use mdo or rec blocks in a single expression.

  • If recursive bindings are required for a monad, then that monad must be declared an instance of the MonadFix class.

  • The following instances of MonadFix are automatically provided: List, Maybe, IO. Furthermore, the Control.Monad.ST and Control.Monad.ST.Lazy modules provide the instances of the MonadFix class for Haskell's internal state monad (strict and lazy, respectively).

  • Like let and where bindings, name shadowing is not allowed within an mdo-expression or a rec-block; that is, all the names bound in a single rec must be distinct. (GHC will complain if this is not the case.)

7.3.12. Parallel List Comprehensions

Parallel list comprehensions are a natural extension to list comprehensions. List comprehensions can be thought of as a nice syntax for writing maps and filters. Parallel comprehensions extend this to include the zipWith family.

A parallel list comprehension has multiple independent branches of qualifier lists, each separated by a `|' symbol. For example, the following zips together two lists:

   [ (x, y) | x <- xs | y <- ys ]

The behaviour of parallel list comprehensions follows that of zip, in that the resulting list will have the same length as the shortest branch.

We can define parallel list comprehensions by translation to regular comprehensions. Here's the basic idea:

Given a parallel comprehension of the form:

   [ e | p1 <- e11, p2 <- e12, ...
       | q1 <- e21, q2 <- e22, ...
       ...
   ]

This will be translated to:

   [ e | ((p1,p2), (q1,q2), ...) <- zipN [(p1,p2) | p1 <- e11, p2 <- e12, ...]
                                         [(q1,q2) | q1 <- e21, q2 <- e22, ...]
                                         ...
   ]

where `zipN' is the appropriate zip for the given number of branches.

7.3.13. Generalised (SQL-Like) List Comprehensions

Generalised list comprehensions are a further enhancement to the list comprehension syntactic sugar to allow operations such as sorting and grouping which are familiar from SQL. They are fully described in the paper Comprehensive comprehensions: comprehensions with "order by" and "group by", except that the syntax we use differs slightly from the paper.

The extension is enabled with the flag -XTransformListComp.

Here is an example:

employees = [ ("Simon", "MS", 80)
, ("Erik", "MS", 100)
, ("Phil", "Ed", 40)
, ("Gordon", "Ed", 45)
, ("Paul", "Yale", 60)]

output = [ (the dept, sum salary)
| (name, dept, salary) <- employees
, then group by dept using groupWith
, then sortWith by (sum salary)
, then take 5 ]

In this example, the list output would take on the value:

[("Yale", 60), ("Ed", 85), ("MS", 180)]

There are three new keywords: group, by, and using. (The functions sortWith and groupWith are not keywords; they are ordinary functions that are exported by GHC.Exts.)

There are five new forms of comprehension qualifier, all introduced by the (existing) keyword then:

  • then f
    
    This statement requires that f have the type forall a. [a] -> [a]. You can see an example of its use in the motivating example, as this form is used to apply take 5.
  • then f by e
    

    This form is similar to the previous one, but allows you to create a function which will be passed as the first argument to f. As a consequence f must have the type forall a. (a -> t) -> [a] -> [a]. As you can see from the type, this function lets f "project out" some information from the elements of the list it is transforming.

    An example is shown in the opening example, where sortWith is supplied with a function that lets it find out the sum salary for any item in the list comprehension it transforms.

  • then group by e using f
    

    This is the most general of the grouping-type statements. In this form, f is required to have type forall a. (a -> t) -> [a] -> [[a]]. As with the then f by e case above, the first argument is a function supplied to f by the compiler which lets it compute e on every element of the list being transformed. However, unlike the non-grouping case, f additionally partitions the list into a number of sublists: this means that at every point after this statement, binders occurring before it in the comprehension refer to lists of possible values, not single values. To help understand this, let's look at an example:

    -- This works similarly to groupWith in GHC.Exts, but doesn't sort its input first
    groupRuns :: Eq b => (a -> b) -> [a] -> [[a]]
    groupRuns f = groupBy (\x y -> f x == f y)
    
    output = [ (the x, y)
    | x <- ([1..3] ++ [1..2])
    , y <- [4..6]
    , then group by x using groupRuns ]
    

    This results in the variable output taking on the value below:

    [(1, [4, 5, 6]), (2, [4, 5, 6]), (3, [4, 5, 6]), (1, [4, 5, 6]), (2, [4, 5, 6])]
    

    Note that we have used the the function to change the type of x from a list to its original numeric type. The variable y, in contrast, is left unchanged from the list form introduced by the grouping.

  • then group using f
    

    With this form of the group statement, f is required to simply have the type forall a. [a] -> [[a]], which will be used to group up the comprehension so far directly. An example of this form is as follows:

    output = [ x
    | y <- [1..5]
    , x <- "hello"
    , then group using inits]
    

    This will yield a list containing every prefix of the word "hello" written out 5 times:

    ["","h","he","hel","hell","hello","helloh","hellohe","hellohel","hellohell","hellohello","hellohelloh",...]
    

7.3.14. Monad comprehensions

Monad comprehensions generalise the list comprehension notation, including parallel comprehensions (Section 7.3.12, “Parallel List Comprehensions”) and transform comprehensions (Section 7.3.13, “Generalised (SQL-Like) List Comprehensions”) to work for any monad.

Monad comprehensions support:

  • Bindings:

    [ x + y | x <- Just 1, y <- Just 2 ]
    

    Bindings are translated with the (>>=) and return functions to the usual do-notation:

    do x <- Just 1
       y <- Just 2
       return (x+y)
    
  • Guards:

    [ x | x <- [1..10], x <= 5 ]
    

    Guards are translated with the guard function, which requires a MonadPlus instance:

    do x <- [1..10]
       guard (x <= 5)
       return x
    
  • Transform statements (as with -XTransformListComp):

    [ x+y | x <- [1..10], y <- [1..x], then take 2 ]
    

    This translates to:

    do (x,y) <- take 2 (do x <- [1..10]
                           y <- [1..x]
                           return (x,y))
       return (x+y)
    
  • Group statements (as with -XTransformListComp):

    [ x | x <- [1,1,2,2,3], then group by x using GHC.Exts.groupWith ]
    [ x | x <- [1,1,2,2,3], then group using myGroup ]
    
  • Parallel statements (as with -XParallelListComp):

    [ (x+y) | x <- [1..10]
            | y <- [11..20]
            ]
    

    Parallel statements are translated using the mzip function, which requires a MonadZip instance defined in Control.Monad.Zip:

    do (x,y) <- mzip (do x <- [1..10]
                         return x)
                     (do y <- [11..20]
                         return y)
       return (x+y)
    

All these features are enabled by default if the MonadComprehensions extension is enabled. The types and more detailed examples on how to use comprehensions are explained in the previous chapters Section 7.3.13, “Generalised (SQL-Like) List Comprehensions” and Section 7.3.12, “Parallel List Comprehensions”. In general you just have to replace the type [a] with the type Monad m => m a for monad comprehensions.

Note: Even though most of these examples are using the list monad, monad comprehensions work for any monad. The base package offers all necessary instances for lists, which make MonadComprehensions backward compatible to built-in, transform and parallel list comprehensions.

More formally, the desugaring is as follows. We write D[ e | Q] to mean the desugaring of the monad comprehension [ e | Q]:

Expressions: e
Declarations: d
Lists of qualifiers: Q,R,S

-- Basic forms
D[ e | ]               = return e
D[ e | p <- e, Q ]  = e >>= \p -> D[ e | Q ]
D[ e | e, Q ]          = guard e >> \p -> D[ e | Q ]
D[ e | let d, Q ]      = let d in D[ e | Q ]

-- Parallel comprehensions (iterate for multiple parallel branches)
D[ e | (Q | R), S ]    = mzip D[ Qv | Q ] D[ Rv | R ] >>= \(Qv,Rv) -> D[ e | S ]

-- Transform comprehensions
D[ e | Q then f, R ]                  = f D[ Qv | Q ] >>= \Qv -> D[ e | R ]

D[ e | Q then f by b, R ]             = f (\Qv -> b) D[ Qv | Q ] >>= \Qv -> D[ e | R ]

D[ e | Q then group using f, R ]      = f D[ Qv | Q ] >>= \ys ->
                                        case (fmap selQv1 ys, ..., fmap selQvn ys) of
                                 	     Qv -> D[ e | R ]

D[ e | Q then group by b using f, R ] = f (\Qv -> b) D[ Qv | Q ] >>= \ys ->
                                        case (fmap selQv1 ys, ..., fmap selQvn ys) of
                                           Qv -> D[ e | R ]

where  Qv is the tuple of variables bound by Q (and used subsequently)
       selQvi is a selector mapping Qv to the ith component of Qv

Operator     Standard binding       Expected type
--------------------------------------------------------------------
return       GHC.Base               t1 -> m t2
(>>=)        GHC.Base               m1 t1 -> (t2 -> m2 t3) -> m3 t3
(>>)         GHC.Base               m1 t1 -> m2 t2         -> m3 t3
guard        Control.Monad          t1 -> m t2
fmap         GHC.Base               forall a b. (a->b) -> n a -> n b
mzip         Control.Monad.Zip      forall a b. m a -> m b -> m (a,b)

The comprehension should typecheck when its desugaring would typecheck.

Monad comprehensions support rebindable syntax (Section 7.3.15, “Rebindable syntax and the implicit Prelude import”). Without rebindable syntax, the operators from the "standard binding" module are used; with rebindable syntax, the operators are looked up in the current lexical scope. For example, parallel comprehensions will be typechecked and desugared using whatever "mzip" is in scope.

The rebindable operators must have the "Expected type" given in the table above. These types are surprisingly general. For example, you can use a bind operator with the type

(>>=) :: T x y a -> (a -> T y z b) -> T x z b

In the case of transform comprehensions, notice that the groups are parameterised over some arbitrary type n (provided it has an fmap, as well as the comprehension being over an arbitrary monad.

7.3.15. Rebindable syntax and the implicit Prelude import

GHC normally imports Prelude.hi files for you. If you'd rather it didn't, then give it a -XNoImplicitPrelude option. The idea is that you can then import a Prelude of your own. (But don't call it Prelude; the Haskell module namespace is flat, and you must not conflict with any Prelude module.)

Suppose you are importing a Prelude of your own in order to define your own numeric class hierarchy. It completely defeats that purpose if the literal "1" means "Prelude.fromInteger 1", which is what the Haskell Report specifies. So the -XRebindableSyntax flag causes the following pieces of built-in syntax to refer to whatever is in scope, not the Prelude versions:

  • An integer literal 368 means "fromInteger (368::Integer)", rather than "Prelude.fromInteger (368::Integer)".

  • Fractional literals are handed in just the same way, except that the translation is fromRational (3.68::Rational).

  • The equality test in an overloaded numeric pattern uses whatever (==) is in scope.

  • The subtraction operation, and the greater-than-or-equal test, in n+k patterns use whatever (-) and (>=) are in scope.

  • Negation (e.g. "- (f x)") means "negate (f x)", both in numeric patterns, and expressions.

  • Conditionals (e.g. "if e1 then e2 else e3") means "ifThenElse e1 e2 e3". However case expressions are unaffected.

  • "Do" notation is translated using whatever functions (>>=), (>>), and fail, are in scope (not the Prelude versions). List comprehensions, mdo (Section 7.3.11, “The recursive do-notation ”), and parallel array comprehensions, are unaffected.

  • Arrow notation (see Section 7.17, “Arrow notation ”) uses whatever arr, (>>>), first, app, (|||) and loop functions are in scope. But unlike the other constructs, the types of these functions must match the Prelude types very closely. Details are in flux; if you want to use this, ask!

-XRebindableSyntax implies -XNoImplicitPrelude.

In all cases (apart from arrow notation), the static semantics should be that of the desugared form, even if that is a little unexpected. For example, the static semantics of the literal 368 is exactly that of fromInteger (368::Integer); it's fine for fromInteger to have any of the types:

fromInteger :: Integer -> Integer
fromInteger :: forall a. Foo a => Integer -> a
fromInteger :: Num a => a -> Integer
fromInteger :: Integer -> Bool -> Bool

Be warned: this is an experimental facility, with fewer checks than usual. Use -dcore-lint to typecheck the desugared program. If Core Lint is happy you should be all right.

7.3.16. Postfix operators

The -XPostfixOperators flag enables a small extension to the syntax of left operator sections, which allows you to define postfix operators. The extension is this: the left section

  (e !)

is equivalent (from the point of view of both type checking and execution) to the expression

  ((!) e)

(for any expression e and operator (!). The strict Haskell 98 interpretation is that the section is equivalent to

  (\y -> (!) e y)

That is, the operator must be a function of two arguments. GHC allows it to take only one argument, and that in turn allows you to write the function postfix.

The extension does not extend to the left-hand side of function definitions; you must define such a function in prefix form.

7.3.17. Tuple sections

The -XTupleSections flag enables Python-style partially applied tuple constructors. For example, the following program

  (, True)

is considered to be an alternative notation for the more unwieldy alternative

  \x -> (x, True)

You can omit any combination of arguments to the tuple, as in the following

  (, "I", , , "Love", , 1337)

which translates to

  \a b c d -> (a, "I", b, c, "Love", d, 1337)

If you have unboxed tuples enabled, tuple sections will also be available for them, like so

  (# , True #)

Because there is no unboxed unit tuple, the following expression

  (# #)

continues to stand for the unboxed singleton tuple data constructor.

7.3.18. Lambda-case

The -XLambdaCase flag enables expressions of the form

  \case { p1 -> e1; ...; pN -> eN }

which is equivalent to

  \freshName -> case freshName of { p1 -> e1; ...; pN -> eN }

Note that \case starts a layout, so you can write

  \case
    p1 -> e1
    ...
    pN -> eN

7.3.19. Empty case alternatives

The -XEmptyCase flag enables case expressions, or lambda-case expressions, that have no alternatives, thus:

    case e of { }   -- No alternatives
or
    \case { }       -- -XLambdaCase is also required

This can be useful when you know that the expression being scrutinised has no non-bottom values. For example:

  data Void
  f :: Void -> Int
  f x = case x of { }

With dependently-typed features it is more useful (see Trac). For example, consider these two candidate definitions of absurd:

data a :==: b where
  Refl :: a :==: a

absurd :: True :~: False -> a
absurd x = error "absurd"    -- (A)
absurd x = case x of {}      -- (B)

We much prefer (B). Why? Because GHC can figure out that (True :~: False) is an empty type. So (B) has no partiality and GHC should be able to compile with -fwarn-incomplete-patterns. (Though the pattern match checking is not yet clever enough to do that.) On the other hand (A) looks dangerous, and GHC doesn't check to make sure that, in fact, the function can never get called.

7.3.20. Multi-way if-expressions

With -XMultiWayIf flag GHC accepts conditional expressions with multiple branches:

  if | guard1 -> expr1
     | ...
     | guardN -> exprN

which is roughly equivalent to

  case () of
    _ | guard1 -> expr1
    ...
    _ | guardN -> exprN

Multi-way if expressions introduce a new layout context. So the example above is equivalent to:

  if { | guard1 -> expr1
     ; | ...
     ; | guardN -> exprN
     }

The following behaves as expected:

  if | guard1 -> if | guard2 -> expr2
                    | guard3 -> expr3
     | guard4 -> expr4

because layout translates it as

  if { | guard1 -> if { | guard2 -> expr2
                      ; | guard3 -> expr3
                      }
     ; | guard4 -> expr4
     }

Layout with multi-way if works in the same way as other layout contexts, except that the semi-colons between guards in a multi-way if are optional. So it is not necessary to line up all the guards at the same column; this is consistent with the way guards work in function definitions and case expressions.

7.3.21. Record field disambiguation

In record construction and record pattern matching it is entirely unambiguous which field is referred to, even if there are two different data types in scope with a common field name. For example:

module M where
  data S = MkS { x :: Int, y :: Bool }

module Foo where
  import M

  data T = MkT { x :: Int }

  ok1 (MkS { x = n }) = n+1   -- Unambiguous
  ok2 n = MkT { x = n+1 }     -- Unambiguous

  bad1 k = k { x = 3 }  -- Ambiguous
  bad2 k = x k          -- Ambiguous

Even though there are two x's in scope, it is clear that the x in the pattern in the definition of ok1 can only mean the field x from type S. Similarly for the function ok2. However, in the record update in bad1 and the record selection in bad2 it is not clear which of the two types is intended.

Haskell 98 regards all four as ambiguous, but with the -XDisambiguateRecordFields flag, GHC will accept the former two. The rules are precisely the same as those for instance declarations in Haskell 98, where the method names on the left-hand side of the method bindings in an instance declaration refer unambiguously to the method of that class (provided they are in scope at all), even if there are other variables in scope with the same name. This reduces the clutter of qualified names when you import two records from different modules that use the same field name.

Some details:

  • Field disambiguation can be combined with punning (see Section 7.3.22, “Record puns ”). For example:

    module Foo where
      import M
      x=True
      ok3 (MkS { x }) = x+1   -- Uses both disambiguation and punning
    

  • With -XDisambiguateRecordFields you can use unqualified field names even if the corresponding selector is only in scope qualified For example, assuming the same module M as in our earlier example, this is legal:

    module Foo where
      import qualified M    -- Note qualified
    
      ok4 (M.MkS { x = n }) = n+1   -- Unambiguous
    

    Since the constructor MkS is only in scope qualified, you must name it M.MkS, but the field x does not need to be qualified even though M.x is in scope but x is not. (In effect, it is qualified by the constructor.)

7.3.22. Record puns

Record puns are enabled by the flag -XNamedFieldPuns.

When using records, it is common to write a pattern that binds a variable with the same name as a record field, such as:

data C = C {a :: Int}
f (C {a = a}) = a

Record punning permits the variable name to be elided, so one can simply write

f (C {a}) = a

to mean the same pattern as above. That is, in a record pattern, the pattern a expands into the pattern a = a for the same name a.

Note that:

  • Record punning can also be used in an expression, writing, for example,

    let a = 1 in C {a}
    

    instead of

    let a = 1 in C {a = a}
    

    The expansion is purely syntactic, so the expanded right-hand side expression refers to the nearest enclosing variable that is spelled the same as the field name.

  • Puns and other patterns can be mixed in the same record:

    data C = C {a :: Int, b :: Int}
    f (C {a, b = 4}) = a
    

  • Puns can be used wherever record patterns occur (e.g. in let bindings or at the top-level).

  • A pun on a qualified field name is expanded by stripping off the module qualifier. For example:

    f (C {M.a}) = a
    

    means

    f (M.C {M.a = a}) = a
    

    (This is useful if the field selector a for constructor M.C is only in scope in qualified form.)

7.3.23. Record wildcards

Record wildcards are enabled by the flag -XRecordWildCards. This flag implies -XDisambiguateRecordFields.

For records with many fields, it can be tiresome to write out each field individually in a record pattern, as in

data C = C {a :: Int, b :: Int, c :: Int, d :: Int}
f (C {a = 1, b = b, c = c, d = d}) = b + c + d

Record wildcard syntax permits a ".." in a record pattern, where each elided field f is replaced by the pattern f = f. For example, the above pattern can be written as

f (C {a = 1, ..}) = b + c + d

More details:

  • Wildcards can be mixed with other patterns, including puns (Section 7.3.22, “Record puns ”); for example, in a pattern C {a = 1, b, ..}). Additionally, record wildcards can be used wherever record patterns occur, including in let bindings and at the top-level. For example, the top-level binding

    C {a = 1, ..} = e
    

    defines b, c, and d.

  • Record wildcards can also be used in expressions, writing, for example,

    let {a = 1; b = 2; c = 3; d = 4} in C {..}
    

    in place of

    let {a = 1; b = 2; c = 3; d = 4} in C {a=a, b=b, c=c, d=d}
    

    The expansion is purely syntactic, so the record wildcard expression refers to the nearest enclosing variables that are spelled the same as the omitted field names.

  • The ".." expands to the missing in-scope record fields. Specifically the expansion of "C {..}" includes f if and only if:

    • f is a record field of constructor C.

    • The record field f is in scope somehow (either qualified or unqualified).

    • In the case of expressions (but not patterns), the variable f is in scope unqualified, apart from the binding of the record selector itself.

    For example

    module M where
      data R = R { a,b,c :: Int }
    module X where
      import M( R(a,c) )
      f b = R { .. }
    

    The R{..} expands to R{M.a=a}, omitting b since the record field is not in scope, and omitting c since the variable c is not in scope (apart from the binding of the record selector c, of course).

7.3.24. Local Fixity Declarations

A careful reading of the Haskell 98 Report reveals that fixity declarations (infix, infixl, and infixr) are permitted to appear inside local bindings such those introduced by let and where. However, the Haskell Report does not specify the semantics of such bindings very precisely.

In GHC, a fixity declaration may accompany a local binding:

let f = ...
    infixr 3 `f`
in
    ...

and the fixity declaration applies wherever the binding is in scope. For example, in a let, it applies in the right-hand sides of other let-bindings and the body of the letC. Or, in recursive do expressions (Section 7.3.11, “The recursive do-notation ”), the local fixity declarations of a let statement scope over other statements in the group, just as the bound name does.

Moreover, a local fixity declaration *must* accompany a local binding of that name: it is not possible to revise the fixity of name bound elsewhere, as in

let infixr 9 $ in ...

Because local fixity declarations are technically Haskell 98, no flag is necessary to enable them.

7.3.25. Package-qualified imports

With the -XPackageImports flag, GHC allows import declarations to be qualified by the package name that the module is intended to be imported from. For example:

import "network" Network.Socket

would import the module Network.Socket from the package network (any version). This may be used to disambiguate an import when the same module is available from multiple packages, or is present in both the current package being built and an external package.

The special package name this can be used to refer to the current package being built.

Note: you probably don't need to use this feature, it was added mainly so that we can build backwards-compatible versions of packages when APIs change. It can lead to fragile dependencies in the common case: modules occasionally move from one package to another, rendering any package-qualified imports broken.

7.3.26. Safe imports

With the -XSafe, -XTrustworthy and -XUnsafe language flags, GHC extends the import declaration syntax to take an optional safe keyword after the import keyword. This feature is part of the Safe Haskell GHC extension. For example:

import safe qualified Network.Socket as NS

would import the module Network.Socket with compilation only succeeding if Network.Socket can be safely imported. For a description of when a import is considered safe see Section 7.27, “Safe Haskell”

7.3.27. Explicit namespaces in import/export

In an import or export list, such as

  module M( f, (++) ) where ...
    import N( f, (++) ) 
    ...

the entities f and (++) are values. However, with type operators (Section 7.4.4, “Type operators”) it becomes possible to declare (++) as a type constructor. In that case, how would you export or import it?

The -XExplicitNamespaces extension allows you to prefix the name of a type constructor in an import or export list with "type" to disambiguate this case, thus:

  module M( f, type (++) ) where ...
    import N( f, type (++) ) 
    ...
  module N( f, type (++) ) where
    data family a ++ b = L a | R b

The extension -XExplicitNamespaces is implied by -XTypeOperators and (for some reason) by -XTypeFamilies.

7.3.28. Summary of stolen syntax

Turning on an option that enables special syntax might cause working Haskell 98 code to fail to compile, perhaps because it uses a variable name which has become a reserved word. This section lists the syntax that is "stolen" by language extensions. We use notation and nonterminal names from the Haskell 98 lexical syntax (see the Haskell 98 Report). We only list syntax changes here that might affect existing working programs (i.e. "stolen" syntax). Many of these extensions will also enable new context-free syntax, but in all cases programs written to use the new syntax would not be compilable without the option enabled.

There are two classes of special syntax:

  • New reserved words and symbols: character sequences which are no longer available for use as identifiers in the program.

  • Other special syntax: sequences of characters that have a different meaning when this particular option is turned on.

The following syntax is stolen:

forall

Stolen (in types) by: -XExplicitForAll, and hence by -XScopedTypeVariables, -XLiberalTypeSynonyms, -XRankNTypes, -XExistentialQuantification

mdo

Stolen by: -XRecursiveDo

foreign

Stolen by: -XForeignFunctionInterface

rec, proc, -<, >-, -<<, >>-, and (|, |) brackets

Stolen by: -XArrows

?varid

Stolen by: -XImplicitParams

[|, [e|, [p|, [d|, [t|, $(, $$(, [||, [e||, $varid, $$varid

Stolen by: -XTemplateHaskell

[varid|

Stolen by: -XQuasiQuotes

varid{#}, char#, string#, integer#, float#, float##

Stolen by: -XMagicHash

(#, #)

Stolen by: -XUnboxedTuples

varid!varid

Stolen by: -XBangPatterns

pattern

Stolen by: -XPatternSynonyms