Personal tools

GADTs for dummies

From HaskellWiki

(Difference between revisions)
Jump to: navigation, search
(Added GADT translation of example data definition)
(3 intermediate revisions by 3 users not shown)
Line 1: Line 1:
For a long time, I didn't understand what GADTs are and how they can be used. It was a sort of conspiracy of silence - people who understand GADTs think
+
[[Category:Tutorials]]
it is all obvious, and don't need any explanation, but I still
+
[[Category:Language extensions]]
couldn't understand.
 
   
Now I have an idea how it works, and think that it was really obvious :) and I want to share my understanding - may be my way to realize GADTs can help
+
For a long time, I didn't understand what GADTs were or how they could be used. It was sort of a conspiracy of silence - people who understood GADTs thought
someone else. See also [[Generalised_algebraic_datatype]]
+
that everything was obvious and didn't need further explanation, but I still
  +
couldn't understand them.
  +
  +
Now that I have an idea how it works, I think that it was really obvious. :) So, I want to share my understanding GADTs. Maybe the way I realized how GADTs work could help
  +
someone else. See also [[Generalised algebraic datatype]]
   
 
== Type functions ==
 
== Type functions ==
Line 32: Line 32:
 
but we can use it on type values. Type constructors declared with
 
but we can use it on type values. Type constructors declared with
 
"data" statements and type functions declared with "type" statements
 
"data" statements and type functions declared with "type" statements
used together to build arbitrarily complex types. In such
+
can be used together to build arbitrarily complex types. In such
 
"computations" type constructors serves as basic "values" and type
 
"computations" type constructors serves as basic "values" and type
 
functions as a way to process them.
 
functions as a way to process them.
Line 42: Line 42:
 
== Hypothetical Haskell extension - Full-featured type functions ==
 
== Hypothetical Haskell extension - Full-featured type functions ==
   
Let's build hypothetical Haskell extension, that mimics for type
+
Let's build a hypothetical Haskell extension that mimics, for type
functions the well-known ways to define ordinary functions, including
+
functions, the well-known ways to define ordinary functions, including
 
pattern matching:
 
pattern matching:
   
Line 50: Line 50:
 
</haskell>
 
</haskell>
   
multiple statements (this is meaningful only in presence of pattern matching):
+
multiple statements (this is meaningful only in the presence of pattern matching):
   
 
<haskell>
 
<haskell>
Line 66: Line 66:
 
</haskell>
 
</haskell>
   
As you may already have guessed, this last definition calculates a simple base type of arbitrarily-nested collections, e.g:
+
As you may already have guessed, this last definition calculates a simple base type of arbitrarily-nested collections, e.g.:
   
 
<haskell>
 
<haskell>
Line 77: Line 77:
 
</haskell>
 
</haskell>
   
Let's don't forget about statement guards:
+
Let's not forget about statement guards:
   
 
<haskell>
 
<haskell>
Line 83: Line 83:
 
</haskell>
 
</haskell>
   
Here we define type function F only for simple datatypes by using in
+
Here we define type function F only for simple datatypes by using a
 
guard type function "IsSimple":
 
guard type function "IsSimple":
   
Line 109: Line 109:
 
</haskell>
 
</haskell>
   
Here, we just defined list of simple types, the implied result of all
+
Here, we defined a list of simple types. The implied result of all
written statements for "IsSimple" is True value, and False value for
+
written statements for "IsSimple" is True, and False for
anything else. Essentially, "IsSimple" is no less than TYPE PREDICATE!
+
everything else. Essentially, "IsSimple" is a TYPE PREDICATE!
   
 
I really love it! :) How about constructing a predicate that traverses a
 
I really love it! :) How about constructing a predicate that traverses a
Line 125: Line 125:
   
 
or a type function that substitutes one type with another inside
 
or a type function that substitutes one type with another inside
arbitrary-deep types:
+
arbitrarily-deep types:
   
 
<haskell>
 
<haskell>
Line 151: Line 151:
   
 
Pay attention to the last statement of the "Collection" definition, where
 
Pay attention to the last statement of the "Collection" definition, where
we've used type variable "b" that was not mentioned on the left side
+
we used the type variable "b" that was not mentioned on the left side,
nor defined in any other way. It's perfectly possible - anyway
+
nor defined in any other way. Since it's perfectly possible for the
"Collection" function has multiple values, so using on the right side
+
"Collection" function to have multiple values, using some free variable on
some free variable that can be replaced with any type is not a problem
+
the right side that can be replaced with any type is not a problem
at all - the "Map Bool Int", "Map [Int] Int" and "Map Int Int" all are
+
at all. "Map Bool Int", "Map [Int] Int" and "Map Int Int" all are
 
possible values of "Collection Int" along with "[Int]" and "Set Int".
 
possible values of "Collection Int" along with "[Int]" and "Set Int".
   
On the first look, it seems that multiple-value functions are
+
At first glance, it seems that multiple-value functions are meaningless - they
meaningless - they can't be used to define datatypes, because we need
+
can't be used to define datatypes, because we need concrete types here. But
concrete types here. But on the second look :) we can find them
+
if we take another look, they can be useful to define type constraints and
useful to define type constraints and type families.
+
type families.
   
We can also represent multiple-value function as predicate:
+
We can also represent a multiple-value function as a predicate:
   
 
<haskell>
 
<haskell>
Line 171: Line 171:
 
</haskell>
 
</haskell>
   
If you remember Prolog, you should guess that predicate, in contrast to
+
If you're familiar with Prolog, you should know that a predicate, in contrast to
function, is multi-purpose thing - it can be used to deduce any
+
a function, is a multi-directional thing - it can be used to deduce any
parameter from other ones. For example, in this hypothetical definition:
+
parameter from the other ones. For example, in this hypothetical definition:
   
 
<haskell>
 
<haskell>
Line 179: Line 179:
 
</haskell>
 
</haskell>
   
we define 'head' function for any Collection containing Ints.
+
we define a 'head' function for any Collection containing Ints.
   
 
And in this, again, hypothetical definition:
 
And in this, again, hypothetical definition:
Line 194: Line 194:
 
== Back to real Haskell - type classes ==
 
== Back to real Haskell - type classes ==
   
Reading all those glorious examples you may be wondering - why Haskell
+
After reading about all of these glorious examples, you may be wondering
don't yet supports full-featured type functions? Hold your breath...
+
"Why doesn't Haskell support full-featured type functions?" Hold your breath...
Haskell already contains them and at least GHC implements all the
+
Haskell already contains them, and GHC has implemented all of the
mentioned abilities more than 10 years ago! They just was named...
+
capabilities mentioned above for more than 10 years! They were just named...
TYPE CLASSES! Let's translate all our examples to their language:
+
TYPE CLASSES! Let's translate all of our examples to their language:
   
 
<haskell>
 
<haskell>
Line 208: Line 208:
 
</haskell>
 
</haskell>
   
Haskell'98 standard supports type classes with only one parameter that
+
The Haskell'98 standard supports type classes with only one parameter.
limits us to defining only type predicates like this one. But GHC and
+
That limits us to only defining type predicates like this one. But GHC and
Hugs supports multi-parameter type classes that allows us to define
+
Hugs support multi-parameter type classes that allow us to define
 
arbitrarily-complex type functions
 
arbitrarily-complex type functions
   
Line 220: Line 220:
 
</haskell>
 
</haskell>
   
All the "hypothetical" Haskell extensions we investigated earlier -
+
All of the "hypothetical" Haskell extensions we investigated earlier are
 
actually implemented at the type class level!
 
actually implemented at the type class level!
   
Line 250: Line 250:
   
   
Let's define type class which contains any collection which uses Int as
+
Let's define a type class which contains any collection which uses Int in
 
its elements or indexes:
 
its elements or indexes:
   
Line 262: Line 262:
   
   
Anther example is a class that replaces all occurrences of 'a' with
+
Another example is a class that replaces all occurrences of 'a' with
'b' in type 't' and return result as 'res':
+
'b' in type 't' and return the result as 'res':
   
 
<haskell>
 
<haskell>
Line 277: Line 277:
   
 
You can compare it to the hypothetical definition we gave earlier.
 
You can compare it to the hypothetical definition we gave earlier.
It's important to note that type class instances, as opposite to
+
It's important to note that type class instances, as opposed to
function statements, are not checked in order. Instead, most
+
function statements, are not checked in order. Instead, the most
_specific_ instance automatically selected. So, in Replace case, the
+
_specific_ instance is automatically selected. So, in the Replace case, the
last instance that is most general will be selected only if all other
+
last instance, which is the most general instance, will be selected only if all the others
are failed to match and that is that we want.
+
fail to match, which is what we want.
   
 
In many other cases this automatic selection is not powerful enough
 
In many other cases this automatic selection is not powerful enough
Line 287: Line 287:
 
language developers. The two most well-known language extensions
 
language developers. The two most well-known language extensions
 
proposed to solve such problems are instance priorities, which allow
 
proposed to solve such problems are instance priorities, which allow
to explicitly specify instance selection order, and '/=' constraints,
+
us to explicitly specify instance selection order, and '/=' constraints,
 
which can be used to explicitly prohibit unwanted matches:
 
which can be used to explicitly prohibit unwanted matches:
   
Line 296: Line 296:
 
</haskell>
 
</haskell>
   
You can check that these instances are no more overlaps.
+
You can check that these instances no longer overlap.
   
   
At practice, type-level arithmetics by itself is not very useful. It
+
In practice, type-level arithmetic by itself is not very useful. It becomes a
becomes really strong weapon when combined with another feature that
+
fantastic tool when combined with another feature that type classes provide -
type classes provide - member functions. For example:
+
member functions. For example:
   
 
<haskell>
 
<haskell>
Line 315: Line 315:
   
   
I'll be also glad to see possibility to use type classes in data
+
I'll also be glad to see the possibility of using type classes in data
declarations like this:
+
declarations, like this:
   
 
<haskell>
 
<haskell>
Line 322: Line 322:
 
</haskell>
 
</haskell>
   
but afaik this is also not yet implemented
+
but as far as I know, this is not yet implemented.
   
   
Line 332: Line 332:
 
== Back to GADTs ==
 
== Back to GADTs ==
   
If you are wonder how relates all these interesting type manipulations
+
If you are wondering how all of these interesting type manipulations relate to
to GADTs, now is the time to give you answer. As you know, Haskell
+
GADTs, here is the answer. As you know, Haskell contains highly
contains highly developed ways to express data-to-data functions. Now
+
developed ways to express data-to-data functions. We also know that
we also know that Haskell contains rich facilities to write
+
Haskell contains rich facilities to write type-to-type functions in the form of
type-to-type functions in form of "type" statements and type classes.
+
"type" statements and type classes. But how do "data" statements fit into this
But how "data" statements fits in this infrastructure?
+
infrastructure?
   
My answer: they just defines type-to-data constructors translation.
+
My answer: they just define a type-to-data constructor translation. Moreover,
Moreover, this translation may give multiple results. Say, the
+
this translation may give multiple results. Say, the following definition:
following definition:
 
   
 
<haskell>
 
<haskell>
Line 346: Line 346:
 
</haskell>
 
</haskell>
   
defines type-to-data constructors function "Maybe" that has parameter
+
defines type-to-data constructors function "Maybe" that has a parameter
 
"a" and for each "a" has two possible results - "Just a" and
 
"a" and for each "a" has two possible results - "Just a" and
 
"Nothing". We can rewrite it in the same hypothetical syntax that was
 
"Nothing". We can rewrite it in the same hypothetical syntax that was
Line 370: Line 370:
 
</haskell>
 
</haskell>
   
But how are flexible "data" definitions? As you should remember,
+
But how flexible are "data" definitions? As you should remember, "type"
"type" definitions was very limited in their features, while type
+
definitions were very limited in their features, while type classes,
classes, vice versa, much more developed than ordinary Haskell
+
on the other hand, were more developed than ordinary Haskell functions
functions facilities. What about features of "data" definitions
+
facilities. What about features of "data" definitions examined as sort of functions?
examined as sort of functions?
 
   
On the one side, they supports multiple statements and multiple
+
On the one hand, they supports multiple statements and multiple results and
results and can be recursive, like the "List" definition above. On the
+
can be recursive, like the "List" definition above. On the other, that's all -
other side, that's all - no pattern matching or even type constants on
+
no pattern matching or even type constants on the left side and no guards.
the left side and no guards.
 
   
Lack of pattern matching means that left side can contain only free
+
Lack of pattern matching means that the left side can contain only free type
type variables, that in turn means that left sides of all "data"
+
variables. That in turn means that the left sides of all "data" statements for a
statements for one type will be essentially the same. Therefore,
+
type will be essentially the same. Therefore, repeated left sides in
repeated left sides in multi-statement "data" definitions are omitted
+
multi-statement "data" definitions are omitted and instead of
and instead of
 
   
 
<haskell>
 
<haskell>
Line 397: Line 397:
   
   
And here finally comes the GADTs! It's just a way to define data types
+
And here we finally come to GADTs! It's just a way to define data types using
using pattern matching and constants on the left side of "data"
+
pattern matching and constants on the left side of "data" statements!
statements! How about this:
+
Let's say we want to do this:
   
 
<haskell>
 
<haskell>
Line 407: Line 407:
 
</haskell>
 
</haskell>
   
Amazed? After all, GADTs seems really very simple and obvious
+
extension to data type definition facilities.
+
We cannot do this using a standard data definition. So, now we must use a GADT definition:
  +
  +
<haskell>
  +
data T a where
  +
D1 :: Int -> T String
  +
D2 :: T Bool
  +
D3 :: (a,a) -> T [a]
  +
</haskell>
  +
  +
Amazed? After all, GADTs seem to be a really simple and obvious extension to
  +
data type definition facilities.
   
   
Line 416: Line 416:
   
   
The idea is to allow a data constructor's return type to
+
The idea here is to allow a data constructor's return type to be specified
be specified directly:
+
directly:
   
 
<haskell>
 
<haskell>
Line 426: Line 426:
 
</haskell>
 
</haskell>
   
In a function that performs pattern matching on Term, the pattern
+
In a function that performs pattern matching on Term, the pattern match gives
match gives type as well as value information. For example,
+
type as well as value information. For example, consider this function:
consider this function:
 
   
 
<haskell>
 
<haskell>
Line 436: Line 436:
 
</haskell>
 
</haskell>
   
If the argument matches Lit, it must have been built with a
+
If the argument matches Lit, it must have been built with a Lit constructor,
Lit constructor, so type 'a' must be Int, and hence we can return 'i'
+
so type 'a' must be Int, and hence we can return 'i' (an Int) in the right
(an Int) in the right hand side. The same objections applies to the Pair
+
hand side. The same thing applies to the Pair constructor.
constructor.
 
   
   
Line 447: Line 447:
 
== Further reading ==
 
== Further reading ==
   
The best paper on type level arithmetic using type classes i've seen
+
The best paper on type level arithmetic using type classes I've seen
 
is "Faking it: simulating dependent types in Haskell"
 
is "Faking it: simulating dependent types in Haskell"
( http://www.cs.nott.ac.uk/~ctm/faking.ps.gz ). Most part of my
+
( http://www.cs.nott.ac.uk/~ctm/faking.ps.gz ). Most of
article is just duplicates his work.
+
this article comes from his work.
   
The great demonstration of type-level arithmetic is TypeNats package
+
A great demonstration of type-level arithmetic is in the TypeNats package,
 
which "defines type-level natural numbers and arithmetic operations on
 
which "defines type-level natural numbers and arithmetic operations on
 
them including addition, subtraction, multiplication, division and GCD"
 
them including addition, subtraction, multiplication, division and GCD"
 
( darcs get --partial --tag '0.1' http://www.eecs.tufts.edu/~rdocki01/typenats/ )
 
( darcs get --partial --tag '0.1' http://www.eecs.tufts.edu/~rdocki01/typenats/ )
   
I should also mention here Oleg Kiselyov page on type-level
+
I should also mention here Oleg Kiselyov's page on type-level
 
programming in Haskell: http://okmij.org/ftp/Haskell/types.html
 
programming in Haskell: http://okmij.org/ftp/Haskell/types.html
   
   
There are plenty of GADT-related papers, but best for beginners
+
There are plenty of GADT-related papers, but the best one for beginners
remains the "Fun with phantom types"
+
is "Fun with phantom types"
 
(http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf).
 
(http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf).
 
Phantom types is another name of GADT. You should also know that this
 
Phantom types is another name of GADT. You should also know that this
paper uses old GADT syntax. This paper is must-read because it
+
paper uses old GADT syntax. This paper is a must-read because it
contains numerous examples of practical GADT usage - theme completely
+
contains numerous examples of practical GADT usage - a theme completely
 
omitted from my article.
 
omitted from my article.
   
   
Other GADT-related papers i know:
+
Other GADT-related papers:
   
 
"Dynamic Optimization for Functional Reactive Programming using
 
"Dynamic Optimization for Functional Reactive Programming using

Revision as of 18:51, 25 October 2011


For a long time, I didn't understand what GADTs were or how they could be used. It was sort of a conspiracy of silence - people who understood GADTs thought that everything was obvious and didn't need further explanation, but I still couldn't understand them.

Now that I have an idea how it works, I think that it was really obvious. :) So, I want to share my understanding GADTs. Maybe the way I realized how GADTs work could help someone else. See also Generalised algebraic datatype

Contents

1 Type functions

A "data" declaration is a way to declare both type constructor and data constructors. For example,

data Either a b = Left a | Right b

declares type constructor "Either" and two data constructors "Left" and "Right". Ordinary Haskell functions works with data constructors:

isLeft (Left a) = True
isLeft (Right b) = False

but there is also an analogous way to work with type constructors!

type X a = Either a a

declares a TYPE FUNCTION named "X". Its parameter "a" must be some type and it returns some type as its result. We can't use "X" on data values, but we can use it on type values. Type constructors declared with "data" statements and type functions declared with "type" statements can be used together to build arbitrarily complex types. In such "computations" type constructors serves as basic "values" and type functions as a way to process them.

Indeed, type functions in Haskell are very limited compared to ordinary functions - they don't support pattern matching, nor multiple statements, nor recursion.

2 Hypothetical Haskell extension - Full-featured type functions

Let's build a hypothetical Haskell extension that mimics, for type functions, the well-known ways to define ordinary functions, including pattern matching:

type F [a] = Set a

multiple statements (this is meaningful only in the presence of pattern matching):

type F Bool   = Char
     F String = Int

and recursion (which again needs pattern matching and multiple statements):

type F [a]       = F a
     F (Map a b) = F b
     F (Set a)   = F a
     F a         = a

As you may already have guessed, this last definition calculates a simple base type of arbitrarily-nested collections, e.g.:

F [[[Set Int]]] = 
F [[Set Int]] =
F [Set Int] = 
F (Set Int) =
F Int =
Int

Let's not forget about statement guards:

type F a | IsSimple a == TrueType  = a

Here we define type function F only for simple datatypes by using a guard type function "IsSimple":

type IsSimple Bool = TrueType
     IsSimple Int = TrueType
     ....
     IsSimple Double = TrueType
     IsSimple a = FalseType
 
data TrueType = T
data FalseType = F

These definitions seem a bit odd, and while we are in imaginary land, let's consider a way to write this shorter:

type F a | IsSimple a  = a
 
type IsSimple Bool
     IsSimple Int
     ....
     IsSimple Double

Here, we defined a list of simple types. The implied result of all written statements for "IsSimple" is True, and False for everything else. Essentially, "IsSimple" is a TYPE PREDICATE!

I really love it! :) How about constructing a predicate that traverses a complex type trying to decide whether it contains "Int" anywhere?

type HasInt Int
     HasInt [a] = HasInt a
     HasInt (Set a) = HasInt a
     HasInt (Map a b) | HasInt a
     HasInt (Map a b) | HasInt b

or a type function that substitutes one type with another inside arbitrarily-deep types:

type Replace t a b | t==a    =  b
     Replace [t] a b         =  [Replace t a b]
     Replace (Set t) a b     =  Set (Replace t a b)
     Replace (Map t1 t2) a b =  Map (Replace t1 a b) (Replace t2 a b)
     Replace t a b           =  t

3 One more hypothetical extension - multi-value type functions

Let's add more fun! We will introduce one more hypothetical Haskell extension - type functions that may have MULTIPLE VALUES. Say,

type Collection a = [a]
     Collection a = Set a
     Collection a = Map b a

So, "Collection Int" has "[Int]", "Set Int" and "Map String Int" as its values, i.e. different collection types with elements of type "Int".

Pay attention to the last statement of the "Collection" definition, where we used the type variable "b" that was not mentioned on the left side, nor defined in any other way. Since it's perfectly possible for the "Collection" function to have multiple values, using some free variable on the right side that can be replaced with any type is not a problem at all. "Map Bool Int", "Map [Int] Int" and "Map Int Int" all are possible values of "Collection Int" along with "[Int]" and "Set Int".

At first glance, it seems that multiple-value functions are meaningless - they can't be used to define datatypes, because we need concrete types here. But if we take another look, they can be useful to define type constraints and type families.

We can also represent a multiple-value function as a predicate:

type Collection a [a]
     Collection a (Set a)
     Collection a (Map b a)

If you're familiar with Prolog, you should know that a predicate, in contrast to a function, is a multi-directional thing - it can be used to deduce any parameter from the other ones. For example, in this hypothetical definition:

head | Collection Int a  ::  a -> Int

we define a 'head' function for any Collection containing Ints.

And in this, again, hypothetical definition:

data Safe c  | Collection c a =  Safe c a

we deduced element type 'a' from collection type 'c' passed as the parameter to the type constructor.


4 Back to real Haskell - type classes

After reading about all of these glorious examples, you may be wondering "Why doesn't Haskell support full-featured type functions?" Hold your breath... Haskell already contains them, and GHC has implemented all of the capabilities mentioned above for more than 10 years! They were just named... TYPE CLASSES! Let's translate all of our examples to their language:

class IsSimple a
instance IsSimple Bool
instance IsSimple Int
     ....
instance IsSimple Double

The Haskell'98 standard supports type classes with only one parameter. That limits us to only defining type predicates like this one. But GHC and Hugs support multi-parameter type classes that allow us to define arbitrarily-complex type functions

class Collection a c
instance Collection a [a]
instance Collection a (Set a)
instance Collection a (Map b a)

All of the "hypothetical" Haskell extensions we investigated earlier are actually implemented at the type class level!

Pattern matching:

instance Collection a [a]

Multiple statements:

instance Collection a [a]
instance Collection a (Set a)

Recursion:

instance (Collection a c) => Collection a [c]

Pattern guards:

instance (IsSimple a) => Collection a (UArray a)


Let's define a type class which contains any collection which uses Int in its elements or indexes:

class HasInt a
instance HasInt Int
instance (HasInt a) => HasInt [a]
instance (HasInt a) => HasInt (Map a b)
instance (HasInt b) => HasInt (Map a b)


Another example is a class that replaces all occurrences of 'a' with 'b' in type 't' and return the result as 'res':

class Replace t a b res
instance Replace t a a t
instance Replace [t] a b [Replace t a b]
instance (Replace t a b res)
          => Replace (Set t) a b (Set res)
instance (Replace t1 a b res1, Replace t2 a b res2)
          => Replace (Map t1 t2) a b (Map res1 res2)
instance Replace t a b t

You can compare it to the hypothetical definition we gave earlier. It's important to note that type class instances, as opposed to function statements, are not checked in order. Instead, the most _specific_ instance is automatically selected. So, in the Replace case, the last instance, which is the most general instance, will be selected only if all the others fail to match, which is what we want.

In many other cases this automatic selection is not powerful enough and we are forced to use some artificial tricks or complain to the language developers. The two most well-known language extensions proposed to solve such problems are instance priorities, which allow us to explicitly specify instance selection order, and '/=' constraints, which can be used to explicitly prohibit unwanted matches:

instance Replace t a a t
instance (a/=b)   => Replace [t] a b [Replace t a b]
instance (a/=b, t/=[_]) => Replace t a b t

You can check that these instances no longer overlap.


In practice, type-level arithmetic by itself is not very useful. It becomes a fantastic tool when combined with another feature that type classes provide - member functions. For example:

class Collection a c where
    foldr1 :: (a -> a -> a) -> c -> a
 
class Num a where
    (+) :: a -> a -> a
 
sum :: (Num a, Collection a c) =>  c -> a
sum = foldr1 (+)


I'll also be glad to see the possibility of using type classes in data declarations, like this:

data Safe c = (Collection c a) => Safe c a

but as far as I know, this is not yet implemented.


UNIFICATION ...


5 Back to GADTs

If you are wondering how all of these interesting type manipulations relate to GADTs, here is the answer. As you know, Haskell contains highly developed ways to express data-to-data functions. We also know that Haskell contains rich facilities to write type-to-type functions in the form of "type" statements and type classes. But how do "data" statements fit into this infrastructure?

My answer: they just define a type-to-data constructor translation. Moreover, this translation may give multiple results. Say, the following definition:

data Maybe a = Just a | Nothing

defines type-to-data constructors function "Maybe" that has a parameter "a" and for each "a" has two possible results - "Just a" and "Nothing". We can rewrite it in the same hypothetical syntax that was used above for multi-value type functions:

data Maybe a = Just a
     Maybe a = Nothing

Or how about this:

data List a = Cons a (List a)
     List a = Nil

and this:

data Either a b = Left a
     Either a b = Right b

But how flexible are "data" definitions? As you should remember, "type" definitions were very limited in their features, while type classes, on the other hand, were more developed than ordinary Haskell functions facilities. What about features of "data" definitions examined as sort of functions?

On the one hand, they supports multiple statements and multiple results and can be recursive, like the "List" definition above. On the other, that's all - no pattern matching or even type constants on the left side and no guards.

Lack of pattern matching means that the left side can contain only free type variables. That in turn means that the left sides of all "data" statements for a type will be essentially the same. Therefore, repeated left sides in multi-statement "data" definitions are omitted and instead of

data Either a b = Left a
     Either a b = Right b

we write just

data Either a b = Left a
                | Right b


And here we finally come to GADTs! It's just a way to define data types using pattern matching and constants on the left side of "data" statements! Let's say we want to do this:

data T String = D1 Int
     T Bool   = D2
     T [a]    = D3 (a,a)


We cannot do this using a standard data definition. So, now we must use a GADT definition:

data T a where
    D1 :: Int -> T String
    D2 :: T Bool
    D3 :: (a,a) -> T [a]

Amazed? After all, GADTs seem to be a really simple and obvious extension to data type definition facilities.




The idea here is to allow a data constructor's return type to be specified directly:

data Term a where
    Lit :: Int ->  Term Int
    Pair :: Term a -> Term b -> Term (a,b)
    ...

In a function that performs pattern matching on Term, the pattern match gives type as well as value information. For example, consider this function:

eval :: Term a -> a
eval (Lit i)     = i
eval (Pair a b)  = (eval a, eval b)
...

If the argument matches Lit, it must have been built with a Lit constructor, so type 'a' must be Int, and hence we can return 'i' (an Int) in the right hand side. The same thing applies to the Pair constructor.




6 Further reading

The best paper on type level arithmetic using type classes I've seen is "Faking it: simulating dependent types in Haskell" ( http://www.cs.nott.ac.uk/~ctm/faking.ps.gz ). Most of this article comes from his work.

A great demonstration of type-level arithmetic is in the TypeNats package, which "defines type-level natural numbers and arithmetic operations on them including addition, subtraction, multiplication, division and GCD" ( darcs get --partial --tag '0.1' http://www.eecs.tufts.edu/~rdocki01/typenats/ )

I should also mention here Oleg Kiselyov's page on type-level programming in Haskell: http://okmij.org/ftp/Haskell/types.html


There are plenty of GADT-related papers, but the best one for beginners is "Fun with phantom types" (http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf). Phantom types is another name of GADT. You should also know that this paper uses old GADT syntax. This paper is a must-read because it contains numerous examples of practical GADT usage - a theme completely omitted from my article.


Other GADT-related papers:

"Dynamic Optimization for Functional Reactive Programming using Generalized Algebraic Data Types" http://www.cs.nott.ac.uk/~nhn/Publications/icfp2005.pdf

"Phantom types" (actually more scientific version of "Fun with phantom types") http://citeseer.ist.psu.edu/rd/0,606209,1,0.25,Download/http:qSqqSqwww.informatik.uni-bonn.deqSq~ralfqSqpublicationsqSqPhantom.ps.gz

"Phantom types and subtyping" http://arxiv.org/ps/cs.PL/0403034

"Existentially quantified type classes" by Stuckey, Sulzmann and Wazny (URL?)





7 Random rubbish from previous versions of article

data family Map k :: * -> *
 
data instance Map ()     v = MapUnit (Maybe v)
data instance Map (a, b) v = MapPair (Map a (Map b v))




let's consider well-known 'data' declarations:

data T a = D a a Int

it can be seen as function 'T' from type 'a' to some data constructor.

'T Bool', for example, gives result 'D Bool Bool Int', while

'T [Int]' gives result 'D [Int] [Int] Int'.

'data' declaration can also have several "results", say

data Either a b = Left a | Right b

and "result" of 'Either Int String' can be either "Left Int" or "Right String"




Well, to give compiler confidence that 'a' can be deduced in just one way from 'c', we can add some form of hint:

type Collection :: a c | c->a
     Collection a [a]
     Collection a (Set a)
     Collection a (Map b a)

The first line i added tell the compiler that Collection predicate has two parameters and the second parameter determines the first. Based on this restriction, compiler can detect and prohibit attempts to define different element types for the same collection:

type Collection :: a c | c->a
     Collection a (Map b a)
     Collection b (Map b a)  -- error! prohibits functional dependency

Of course, Collection is just a function from 'c' to 'a', but if we will define it directly as a function:

type Collection [a]        = a
     Collection (Set a)    = a
     Collection (Map b a)  = a

- it can't be used in 'head' definition above. Moreover, using functional dependencies we can define bi-directional functions:

type TwoTimesBigger :: a b | a->b, b->a
     TwoTimesBigger Int8 Int16
     TwoTimesBigger Int16 Int32
     TwoTimesBigger Int32 Int64

or predicates with 3, 4 or more parameters with any relations between them. It's a great power!