# Generalised algebraic datatype

### From HaskellWiki

EndreyMark (Talk | contribs) (Copying a link from Research papers/Type systems#Generalised Algebraic Data Types to here in section - →Papers: (a paper on translating GADT's to system F)) |
CaleGibbard (Talk | contribs) (added Regexp example) |
||

Line 85: | Line 85: | ||

In the definition of `silly': silly 1 = Cons 1 Nil |
In the definition of `silly': silly 1 = Cons 1 Nil |
||

+ | Note that GADTs provide a rather nice platform for embedded domain specific languages. In particular, they allow an EDSL to use Haskell's type system for its own purposes. As a simple example, we might have an EDSL for (a generalisation of) regular expressions that looks something like: |
||

+ | <haskell> |
||

+ | data Regexp tok a where |
||

+ | Zero :: Regexp tok () |
||

+ | One :: Regexp tok () |
||

+ | Check :: (tok -> Bool) -> Regexp tok tok |
||

+ | Satisfy :: ([tok] -> Bool) -> Regexp tok [tok] |
||

+ | Push :: tok -> Regexp tok a -> Regexp tok a |
||

+ | Plus :: Regexp tok a -> Regexp tok b -> Regexp tok (Either a b) |
||

+ | Times :: Regexp tok a -> Regexp tok b -> Regexp tok (a,b) |
||

+ | Star :: Regexp tok a -> Regexp tok [a] |
||

+ | </haskell> |
||

+ | |||

+ | An evaluator/parser is then straightforward. Below it's written monadically for |
||

+ | convenience, but this also means that we could generalise the return type to being in any MonadPlus. |
||

+ | |||

+ | <haskell> |
||

+ | parse :: Regexp tok a -> [tok] -> Maybe a |
||

+ | |||

+ | -- Zero always fails. |
||

+ | parse Zero ts = mzero |
||

+ | |||

+ | -- One matches only the empty string. |
||

+ | parse One [] = return () |
||

+ | parse One _ = mzero |
||

+ | |||

+ | -- Check p matches a string with exactly one token t such that p t holds. |
||

+ | parse (Check p) [t] = if p t then return t else mzero |
||

+ | parse (Check p) _ = mzero |
||

+ | |||

+ | -- Satisfy p any string such that p ts holds. |
||

+ | parse (Satisfy p) xs = if p xs then return xs else mzero |
||

+ | |||

+ | -- Push t x matches a string ts when x matches (t:ts). |
||

+ | parse (Push t x) ts = parse x (t:ts) |
||

+ | |||

+ | -- Plus x y matches when either x or y does. |
||

+ | parse (Plus x y) ts = liftM Left (parse x ts) `mplus` liftM Right (parse y ts) |
||

+ | |||

+ | -- Times x y matches the concatenation of x and y. |
||

+ | parse (Times x y) [] = liftM2 (,) (parse x []) (parse y []) |
||

+ | parse (Times x y) (t:ts) = |
||

+ | parse (Times (Push t x) y) ts `mplus` |
||

+ | liftM2 (,) (parse x []) (parse y (t:ts)) |
||

+ | |||

+ | -- Star x matches zero or more copies of x. |
||

+ | parse (Star x) [] = return [] |
||

+ | parse (Star x) (t:ts) = do |
||

+ | (v,vs) <- parse (Times x (Star x)) (t:ts) |
||

+ | return (v:vs) |
||

+ | </haskell> |
||

+ | |||

+ | Finally, we might define some examples: |
||

+ | |||

+ | <haskell> |
||

+ | token x = Check (== x) |
||

+ | string xs = Satisfy (== xs) |
||

+ | p = Times (token 'a') (token 'b') |
||

+ | p1 = Times (Star (token 'a')) (Star (token 'b')) |
||

+ | p2 = Star p1 |
||

+ | blocks :: (Eq tok) => Regexp tok [[tok]] |
||

+ | blocks = Star (Satisfy allEqual) |
||

+ | where allEqual xs = and (zipWith (==) xs (drop 1 xs)) |
||

+ | </haskell> |
||

+ | |||

+ | Testing this in ghci: |
||

+ | <pre> |
||

+ | *Main> parse p "ab" |
||

+ | Just ('a','b') |
||

+ | *Main> parse p "ac" |
||

+ | Nothing |
||

+ | *Main> parse p1 "aaabbbb" |
||

+ | Just ("aaa","bbbb") |
||

+ | *Main> parse p2 "aaabbbbaabbbbbbbaaabbabab" |
||

+ | Just [("aaa","bbbb"),("aa","bbbbbbb"),("aaa","bb"),("a","b"),("a","b")] |
||

+ | *Main> parse blocks "aaaabbbbbbbbcccccddd" |
||

+ | Just ["aaaa","bbbbbbbb","ccccc","ddd"] |
||

+ | </pre> |
||

[[Category:Language]] |
[[Category:Language]] |

## Revision as of 08:02, 5 May 2006

## 1 Papers

See also research papers on type systems.

- A short descriptions on generalised algebraic datatypes here as GHC language features.
- Another description with links on Haskell' wiki.
- First-Class Phantom Types by James Cheney and Ralf Hinze
- Stratified type inference for generalized algebraic data types by François Pottier and Yann Régis-Gianas. It contains also a lot of links to other papers on GADTs.
- Simple unification-based type inference for GADTs by Simon Peyton Jones, Dimitrios Vytiniotis, Stephanie Weirich, and Geoffrey Washburn. (Revised April 2006.)
- Translating Generalized Algebraic Data Types to System F written by Martin Sulzmann and Meng Wang. Many other papers.

## 2 Motivating example

Generalised Algebraic Datatypes (GADTs) are datatypes for which a constructor has a non standard type. Indeed, in type systems incorporating GADTs, there are very few restrictions on the type that the data constructors can take. To show you how this could be useful, we will implement an evaluator for the typed SK calculus. Note that the K combinator is operationally similar to and, similarly, S is similar to the combinator which, in simply typed lambda calculus, have types and Without GADTs we would have to write something like this:

data Term = K | S | Term :@ Term infixl 6 :@

With GADTs, however, we can have the terms carry around more type information and create more interesting terms, like so:

data Term x where K :: Term (a -> b -> a) S :: Term ((a -> b -> c) -> (a -> b) -> a -> c) Const :: a -> Term a (:@) :: Term (a -> b) -> (Term a) -> Term b infixl 6 :@

now we can write a small step evaluator:

eval::Term a -> Term a eval (K :@ x :@ y) = x eval (S :@ x :@ y :@ z) = x :@ z :@ (y :@ z) eval x = x

Since the types of the so-called object language, being the typed SK calculus, are mimicked by the type system in our meta language, being haskell, we have a pretty convincing argument that the evaluator won't mangle our types. We say that typing is preserved under evaluation (preservation.) Note that this is an argument and not a proof.

This, however, comes at a price: let's see what happens when you try to convert strings into our object language:

parse "K" = K parse "S" = S

you'll get a nasty error like so:

Occurs check: cannot construct the infinite type: c = b -> c Expected type: Term ((a -> b -> c) -> (a -> b) -> a -> b -> c) Inferred type: Term ((a -> b -> c) -> (a -> b) -> a -> c) In the definition of `foo': foo "S" = SOne could, however, reason that parse has type:

## 3 Example with lists

here's another, smaller example:

data Empty data NonEmpty data List x y where Nil :: List a Empty Cons:: a -> List a b -> List a NonEmpty safeHead:: List x NonEmpty -> x safeHead (Cons a b) = a

now safeHead can only be applied to non empty lists, and will never evaluate to bottom. This too comes at a cost; consider the function:

silly 0 = Nil silly 1 = Cons 1 Nil

yields an objection from ghc:

Couldn't match `Empty' against `NonEmpty' Expected type: List a Empty Inferred type: List a NonEmpty In the application `Cons 1 Nil' In the definition of `silly': silly 1 = Cons 1 Nil

Note that GADTs provide a rather nice platform for embedded domain specific languages. In particular, they allow an EDSL to use Haskell's type system for its own purposes. As a simple example, we might have an EDSL for (a generalisation of) regular expressions that looks something like:

data Regexp tok a where Zero :: Regexp tok () One :: Regexp tok () Check :: (tok -> Bool) -> Regexp tok tok Satisfy :: ([tok] -> Bool) -> Regexp tok [tok] Push :: tok -> Regexp tok a -> Regexp tok a Plus :: Regexp tok a -> Regexp tok b -> Regexp tok (Either a b) Times :: Regexp tok a -> Regexp tok b -> Regexp tok (a,b) Star :: Regexp tok a -> Regexp tok [a]

An evaluator/parser is then straightforward. Below it's written monadically for convenience, but this also means that we could generalise the return type to being in any MonadPlus.

parse :: Regexp tok a -> [tok] -> Maybe a -- Zero always fails. parse Zero ts = mzero -- One matches only the empty string. parse One [] = return () parse One _ = mzero -- Check p matches a string with exactly one token t such that p t holds. parse (Check p) [t] = if p t then return t else mzero parse (Check p) _ = mzero -- Satisfy p any string such that p ts holds. parse (Satisfy p) xs = if p xs then return xs else mzero -- Push t x matches a string ts when x matches (t:ts). parse (Push t x) ts = parse x (t:ts) -- Plus x y matches when either x or y does. parse (Plus x y) ts = liftM Left (parse x ts) `mplus` liftM Right (parse y ts) -- Times x y matches the concatenation of x and y. parse (Times x y) [] = liftM2 (,) (parse x []) (parse y []) parse (Times x y) (t:ts) = parse (Times (Push t x) y) ts `mplus` liftM2 (,) (parse x []) (parse y (t:ts)) -- Star x matches zero or more copies of x. parse (Star x) [] = return [] parse (Star x) (t:ts) = do (v,vs) <- parse (Times x (Star x)) (t:ts) return (v:vs)

Finally, we might define some examples:

token x = Check (== x) string xs = Satisfy (== xs) p = Times (token 'a') (token 'b') p1 = Times (Star (token 'a')) (Star (token 'b')) p2 = Star p1 blocks :: (Eq tok) => Regexp tok [[tok]] blocks = Star (Satisfy allEqual) where allEqual xs = and (zipWith (==) xs (drop 1 xs))

Testing this in ghci:

*Main> parse p "ab" Just ('a','b') *Main> parse p "ac" Nothing *Main> parse p1 "aaabbbb" Just ("aaa","bbbb") *Main> parse p2 "aaabbbbaabbbbbbbaaabbabab" Just [("aaa","bbbb"),("aa","bbbbbbb"),("aaa","bb"),("a","b"),("a","b")] *Main> parse blocks "aaaabbbbbbbbcccccddd" Just ["aaaa","bbbbbbbb","ccccc","ddd"]