Difference between revisions of "GHC/Type families"

From HaskellWiki
< GHC
Jump to navigation Jump to search
m
Line 8: Line 8:
 
== What do I need to use type families? ==
 
== What do I need to use type families? ==
   
Indexed type families are implemented in GHC's HEAD (from version 6.7), which you can obtain in binary form from the [http://www.haskell.org/ghc/dist/current/dist/ nightly snapshots] or in source form from the [http://hackage.haskell.org/trac/ghc/wiki/Building/GettingTheSources source repository]. Please use the the [mailto:glasgow-haskell-users@haskell.org GHC mailing list] for questions or for the discussion of this language extension and its description on this wiki page.
+
Indexed type families are implemented in GHC's HEAD (from version 6.9), which you can obtain in binary form from the [http://www.haskell.org/ghc/dist/current/dist/ nightly snapshots] or in source form from the [http://hackage.haskell.org/trac/ghc/wiki/Building/GettingTheSources source repository]. Please use the the [mailto:glasgow-haskell-users@haskell.org GHC mailing list] for questions or for the discussion of this language extension and its description on this wiki page.
   
To enable type families in GHC 6.7, you need to supply the compiler option <tt>-XTypeFamilies</tt> (or <tt>-fglasgow-exts</tt>, which implies it). NB: The option used to be called <tt>-findexed-types</tt> and then <tt>-ftype-families</tt> before GHC's change in policy for options enabling language extensions.
+
To enable type families in GHC 6.9, you need to supply the compiler option <tt>-XTypeFamilies</tt> (or <tt>-fglasgow-exts</tt>, which implies it). NB: The option used to be called <tt>-findexed-types</tt> and then <tt>-ftype-families</tt> before GHC's change in policy for options enabling language extensions.
   
 
== What are type families? ==
 
== What are type families? ==

Revision as of 04:49, 9 October 2007

Type families in GHC

Indexed type families are a new addition to GHC's type extensions. They are currently an experimental feature whose design is bound to still change to some degree. Type families are a generalisation of associated data types and associated type synonyms, which are sketched in a recent draft paper. They essentially provide type-indexed data types and named functions on types, which are useful for generic programming and highly parameterised library interfaces as well as interfaces with enhanced static information, much like dependent types.

NB: Type families are currently only available in GHC's development version. Type synonym families are especially new and only recommended for early adopters. In any case, feedback of what works and what doesn't is most appreciated. Please report bugs via the GHC bug tracker, ideally accompanied by a small example program that demonstrates the problem.

What do I need to use type families?

Indexed type families are implemented in GHC's HEAD (from version 6.9), which you can obtain in binary form from the nightly snapshots or in source form from the source repository. Please use the the GHC mailing list for questions or for the discussion of this language extension and its description on this wiki page.

To enable type families in GHC 6.9, you need to supply the compiler option -XTypeFamilies (or -fglasgow-exts, which implies it). NB: The option used to be called -findexed-types and then -ftype-families before GHC's change in policy for options enabling language extensions.

What are type families?

Indexed type families, or type families for short, are type constructors that represent sets of types. Set members are denoted by supplying the type family constructor with type parameters, which are called type indicies. The difference between vanilla parametrised type constructors and family constructors is much like between parametrically polymorphic functions and (ad-hoc polymorphic) methods of type classes. Parametric polymorphic functions behave the same at all type instances, whereas class methods can change their behaviour in dependence on the class type parameters. Similarly, vanilla type constructors imply the same data representation for all type instances, but family constructors can have varying representation types for varying type indicies.

Indexed type families come in two flavours: data families and type synonym families. They are the indexed family variants of algebraic data types and type synonyms, respectively. The instances of data families can be data types and newtypes.

An associated data type example

As an example, consider Ralf Hinze's generalised tries, a form of generic finite maps.

The class declaration

We define a type class whose instances are the types that we can use as keys in our generic maps:

class GMapKey k where
  data GMap k :: * -> *
  empty       :: GMap k v
  lookup      :: k -> GMap k v -> Maybe v
  insert      :: k -> v -> GMap k v -> GMap k v

The interesting part is the associated data family declaration of the class. It gives a kind signature (here * -> *) for the associated data type GMap k - analog to how methods receive a type signature in a class declaration.

What it is important to notice is that the first parameter of the associated type GMap coincides with the class parameter of GMapKey. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multi-parameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts.

The second important point is that as GMap k has kind * -> * and k (implicitly) has kind *, the type constructor GMap (without an argument) has kind * -> * -> *. Consequently, we see that GMap is applied to two arguments in the signatures of the methods empty, lookup, and insert.

An Int instance

To use Ints as keys into generic maps, we declare an instance that simply uses Data.Map, thusly:

instance GMapKey Int where
  data GMap Int v        = GMapInt (Data.Map.Map Int v)
  empty                  = GMapInt Data.Map.empty
  lookup k (GMapInt m)   = Data.Map.lookup k m
  insert k v (GMapInt m) = GMapInt (Data.Map.insert k v m)

The Int instance of the associated data type GMap needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here v). As mentioned before any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type.

NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundemental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future.

As an exercise, implement an instance for Char that maps back to the Int instance using the conversion functions Char.ord and Char.chr.

A unit instance

Generic definitions, apart from elementary types, typically cover units, products, and sums. We start here with the unit instance for GMap:

instance GMapKey () where
  data GMap () v           = GMapUnit (Maybe v)
  empty                    = GMapUnit Nothing
  lookup () (GMapUnit v)   = v
  insert () v (GMapUnit _) = GMapUnit $ Just v

For unit, the map is just a Maybe value.

Product and sum instances

Nest, let us define the instances for pairs and sums (i.e., Either):

instance (GMapKey a, GMapKey b) => GMapKey (a, b) where
  data GMap (a, b) v            = GMapPair (GMap a (GMap b v))
  empty		                = GMapPair empty
  lookup (a, b) (GMapPair gm)   = lookup a gm >>= lookup b 
  insert (a, b) v (GMapPair gm) = GMapPair $ case lookup a gm of
				    Nothing  -> insert a (insert b v empty) gm
				    Just gm2 -> insert a (insert b v gm2  ) gm

instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where
  data GMap (Either a b) v                = GMapEither (GMap a v) (GMap b v)
  empty                                   = GMapEither empty empty
  lookup (Left  a) (GMapEither gm1  _gm2) = lookup a gm1
  lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2
  insert (Left  a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2
  insert (Right a) v (GMapEither gm1 gm2) = GMapEither gm1 (insert a v gm2)

If you find this code algorithmically surprising, I'd suggest to have a look at Ralf Hinze's paper. The only novelty concerning associated types, in these two instances, is that the instances have a context (GMapKey a, GMapKey b). Consequently, the right hand sides of the associated type declarations can use GMap recursively at the key types a and b - not unlike the method definitions use the class methods recursively at the types for which the class is given in the instance context.

Using a generic map

Finally, some code building and quering a generic map:

myGMap :: GMap (Int, Either Char ()) String
myGMap = insert (5, Left 'c') "(5, Left 'c')"    $
	 insert (4, Right ()) "(4, Right ())"    $
	 insert (5, Right ()) "This is the one!" $
	 insert (5, Right ()) "This is the two!" $
	 insert (6, Right ()) "(6, Right ())"    $
	 insert (5, Left 'a') "(5, Left 'a')"    $
	 empty
main = putStrLn $ maybe "Couldn't find key!" id $ lookup (5, Right ()) myGMap

Download the code

If you want to play with this example without copying it off the wiki, just download the source code for GMap from GHC's test suite.

Detailed definition of data families

Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type-indexes to coincide with the class parameters. However, the later can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.

Family declarations

Indexed data families are introduced by a signature, such as

data family GMap k :: * -> *

The special family distinguishes family from standard data declarations. The result kind annotation is optional and, as usual, defaults to * if omitted. An example is

data family Array e

Named arguments can also be given explicit kind signatures if needed. Just as with GADT declarations named arguments are entirely optional, so that we can declare Array alternatively with

data family Array :: * -> *

Associated family declarations

When a data family is declared as part of a type class, we drop the family special. The GMap declaration takes the following form

class GMapKey k where
  data GMap k :: * -> *
  ...

In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type-indexes. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. Hence, the following contrieved example is admissible:

class C a b c where
  data T c a :: *

Instance declarations

Instance declarations of data and newtype families are very similar to standard data and newtype declarations. The only two differences are that the keyword data or newtype is followed by instance and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed in type parameters, and type synonyms are allowed as long as they are fully applied and expand to a type that is itself admissible - exactly as this is required for occurrences of type synonyms in class instance parameters. For example, the Either instance for GMap is

data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)

In this example, the declaration has only one variant. In general, it can be any number.

Data and newtype instance declarations are only legit when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although, all data families are declared with the data keyword, instances can be either data or newtypes, or a mix of both.

Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:

data family T a
data instance T Int  = A
data instance T Char = B
nonsence :: T a -> Int
nonsence A = 1             -- WRONG: These two equations together...
nonsence B = 2             -- ...will produce a type error.

Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families are - in contrast to GADTs - are open; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct.

Associated type instances

When an associated family instance is declared within a type class instance, we drop the instance keyword in the family instance. So, the Either instance for GMap becomes:

instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where
  data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v
  ...

The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is first argument of GMap, namely Either a b, which coincides with the only class parameter. Any parameters to the family constructor that do not correspond to class parameters, need to be variables in every instance; here this is the variable v.

Instances for an associated family can only appear as part of instances declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as undefined, can assume the type.

Scoping of class parameters

In the case of multi-parameter type classes, the visibility of class parameters in the right-hand side of associated family instances depends solely on the parameters of the data family. As an example, consider the simple class declaration

class C a b where
  data T a

Only one of the two class parameters is a parameter to the data family. Hence, the following instance declaration is invalid:

instance C [c] d where
  data T [c] = MkT (c, d)    -- WRONG!!  'd' is not in scope

Here, the right-hand side of the data instance mentions the type variable d that does not occur in its left-hand side. We cannot admit such data instances as they would compromise type safety.

Type class instances of family instances

Type class instances of instances of data families can be defined as usual, and in particular data instance declarations can have deriving clauses. For example, we can write

data GMap () v = GMapUnit (Maybe v)
               deriving Show

which implcitly defines an instance of the form

instance Show v => Show (GMap () v) where ...

Note that class instances are always for particular instances of a data family and never for an entire family as a whole. This for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of different instances of a single type family. It would require a form of extensible case construct.

Overlap

The instance declarations of a data family used in a single program may not overlap at all, independent of whether they are associated or not. In contrast to type class instances, this is not only a matter of consistency, but one of type safety.

Import and export

The association of data constructors with type families is more dynamic than that is the case with standard data and newtype declarations. In the standard case, the notation T(..) in an import or export list denotes the type constructor and all the data constructors introduced in its declaration. However, a family declaration never introduces any data constructors; instead, data constructors are introduced by family instances. As a result, which data constructors are associated with a type family depends on the currently visible instance declarations for that family. Consequently, an import or export item of the form T(..) denotes the family constructor and all currently visible data constructors - in the case of an export item, these may be either imported or defined in the current module. The treatment of import and export items that explicitly list data constructors, such as GMap(GMapEither), is analogous.

Associated families

As expected, an import or export item of the form C(..) denotes all of the class' methods and associated types. However, when associated types are explicitly listed as subitems of a class, we need some new syntax, as uppercase identifiers as subitems are usually data constructors, not type constructors. To clarify that we denote types here, each associated type name needs to be prefixed by the keyword type. So for example, when explicitly listing the components of the GMapKey class, we write GMapKey(type GMap, empty, lookup, insert).

Examples

Assuming our running GMapKey class example, let us look at some export lists and their meaning:

  • module GMap (GMapKey) where...: Exports just the class name.
  • module GMap (GMapKey(..)) where...: Exports the class, the associated type GMap and the member functions empty, lookup, and insert. None of the data constructors is exported.
  • module GMap (GMapKey(..), GMap(..)) where...: As before, but also exports all the data constructors GMapInt, GMapChar, GMapUnit, GMapPair, and GMapUnit.
  • module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...: As before.
  • module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...: As before.

Finally, you can write GMapKey(type GMap) to denote both the class GMapKey as well as its associated type GMap. However, you cannot write GMapKey(type GMap(..)) — i.e., sub-component specifications cannot be nested. To specify GMap's data constructors, you have to list it separately.

Instances

Family instances are implicitly exported, just like class instances. However, this applies only to the heads of instances, not to the data constructors an instance defines.

An associated type synonym example

Type synonym families are an alternative to functional dependencies, which makes functional dependency examples well suited to introduce type synonym families. In fact, type families are a more functional way to express the same as functional dependencies (despite the name!), as they replace the relational notation of functional dependencies by an expression-oriented notation; i.e., functions on types are really represented by functions and not relations.

The class declaration

Here's an example from Mark Jones' seminal paper on functional dependencies:

class Collects e ce | ce -> e where
  empty  :: ce
  insert :: e -> ce -> ce
  member :: e -> ce -> Bool
  toList :: ce -> [e]

With associated type synonyms we can write this as

class Collects ce where
  type Elem ce
  empty  :: ce
  insert :: Elem ce -> ce -> ce
  member :: Elem ce -> ce -> Bool
  toList :: ce -> [Elem ce]

Instead of the multi-parameter type class, we use a single parameter class, and the parameter e turned into an associated type synonym Elem ce.

An instance

Instances change in correspondingly. An instance of the two-parameter class

instance Eq e => Collects e [e] where
  empty           = []
  insert e l      = (e:l)
  member e []     = False
  member e (x:xs) 
    | e == x      = True
    | otherwise   = member e xs
  toList l        = l

become an instance of a single-parameter class, where the dependant type parameter turns into an associated type instance declaration:

instance Eq e => Collects [e] where
  type Elem [e]   = e
  empty           = []
  insert e l      = (e:l)
  member e []     = False
  member e (x:xs) 
    | e == x      = True
    | otherwise   = member e xs
  toList l        = l

Using generic collections

With Functional Dependencies the code would be:

sumCollects :: (Collects e c1, Collects e c2) => c1 -> c2 -> c2
sumCollects c1 c2 = foldr insert c2 (toList c1)

In contrast, with associated type synonyms, we get:

sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2
sumCollects c1 c2 = foldr insert c2 (toList c1)

Detailed definition of type synonym families

Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the type-indexes to coincide with the class parameters. However, the later can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.

Family declarations

Indexed type families are introduced by a signature, such as

type family Elem c :: *

The special family distinguishes family from standard type declarations. The result kind annotation is optional and, as usual, defaults to * if omitted. An example is

type family Elem c

Parameters can also be given explicit kind signatures if needed. We call the number of parameters in a type family declaration, the family's arity, and all applications of a type family must be fully saturated w.r.t. to that arity. This requirement is unlike ordinary type synonyms and it implies that the kind of a type family is not sufficient to determine a family's arity, and hence in general, also insufficient to determine whether a type family application is well formed. As an example, consider the following declaration:

type family F a b :: * -> *   -- F's arity is 2, 
                              -- although it's overall kind is * -> * -> * -> *

Given this declaration the following are examples of well-formed and malformed types:

F Char [Int]       -- OK!  Kind: * -> *
F Char [Int] Bool  -- OK!  Kind: *
F IO Bool          -- WRONG: kind mismatch in the first argument
F Bool             -- WRONG: unsaturated application

Associated family declarations

When a type family is declared as part of a type class, we drop the family special. The Elem declaration takes the following form

class Collects ce where
  type Elem ce :: *
  ...

The argument names of the type family must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. Hence, the following contrieved example is admissible:

class C a b c where
  type T c a :: *

These rules are exactly as for associated data families.

Instance declarations

Instance declarations of type families are very similar to standard type synonym declarations. The only two differences are that the keyword type is followed by instance and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible - these are the exact same requirements as for data instances. For example, the [e] instance for Elem is

type instance Elem [e] = e

Type family instance declarations are only legitimate when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration, and the number of type parameters in an instance declaration must match the number of type parameters in the family declaration. Finally, the right-hand side of a type instance must be a monotype; i.e., it may not include explicit foralls. Here are some examples of admissible and illegal type instances:

type family F a :: *
type instance F [Int]              = Int         -- OK!
type instance F String             = Char        -- OK!
type instance F (F a)              = a           -- WRONG: type parameter mentions a type family
type instance F (forall a. (a, b)) = b           -- WRONG: a forall type appears in a type parameter
type instance F Float              = forall a.a  -- WRONG: right-hand side may not be a forall type

type family G a b :: * -> *
type instance G Int            = (,)     -- WRONG: must be two type parameters
type instance G Int Char Float = Double  -- WRONG: must be two type parameters

Associated type instances

When an associated family instance is declared within a type class instance, we drop the instance keyword in the family instance. So, the [e] instance for Elem becomes:

instance (Eq (Elem [e])) => Collects ([e]) where
  type Elem [e] = e
  ...

The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is [e], which coincides with the only class parameter.

Instances for an associated family can only appear as part of instances declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as undefined, can assume the type.

Overlap

The instance declarations of a type family used in a single program may only overlap if the right-hand sides of the overlapping instances coincide for the overlapping types. More formally, two instance declarations overlap if there is a substitution that makes the left-hand sides of the instances syntactically the same. Whenever that is the case, the right-hand sides of the instances must also be syntactically equal under the same substitution. This condition is independent of whether the type family is associated or not, and it not only a matter of consistency, but one of type safety.

Here are two example to illustrate the condition under which overlap is permitted.

type instance F (a, Int) = [a]
type instance F (Int, b) = [b]   -- overlap permitted

type instance G (a, Int)  = [a]
type instance G (Char, a) = [a]  -- ILLEGAL overlap, as [Char] /= [Int]

Decidability

In order to guarantee that type inference in the presence of type families decidable, we need to place a number of additional restrictions on the formation of type instance declarations. Instance declarations have the general form

type instance F t1 .. tn = t

where we require that for every type family application (G s1 .. sm) in t,

  1. the total number of symbols (type constructors and type variables) in s1 .. sm is strictly smaller than in t1 .. tn,
  2. for every type variable a, a occurs in s1 .. sm at most as often as in t1 .. tn, and
  3. s1 .. sm do not contain any type family constructors.

These restrictions are a conservative approximation of the exact conditions for ensuring decidability. They have been chosen such that a compiler can check them and that a programmer can easily decide whether a program meets the restrictions. Consequently, they do exclude perfectly fine programs, just as this the case with the restrictions on type class instances. A recent draft paper describes the reasons for these restrictions in more detail. If the option -fallow-undecidable-instances is passed to the compiler, the discussed restrictions do not apply, and it is the programmers responsibility to ensure that type inferences stay decidable.

Equality constraints

Type context can include equality constraints of the form t1 ~ t2, which express the constraint that t1 and t2 need to be the same type. In the presence of type families, decisions about type equality may need to be deferred to the caller of a function. This is accomplished by using an equality constraint in a function signature, such as

sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2

where we require that the element type of c1 and c2 are the same. In general, the types t1 and t2 of an equality constraint may be arbitrary types. However, they may not contain rank-n or impredicative types, even if these GHC extensions are otherwise enabled.

Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form

class C a b | a -> b

to

class (F a ~ b) => C a b where
  type F a

That is, we represent every functional dependency (FD) a1 .. an -> b by an FD type family F a1 .. an and a superclass context equality F a1 .. an ~ b. In class instances, we define the type instances of FD families in accordance with the class head. Method signatures are not affected by that process.

References

  • Associated Types with Class. Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In Proceedings of The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05), pages 1-13, ACM Press, 2005.
  • Associated Type Synonyms. Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming, ACM Press, pages 241-253, 2005.
  • System F with Type Equality Coercions. Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation, ACM Press, 2007.
  • Towards Open Type Functions for Haskell. Tom Schrijvers, Martin Sulzmann, Simon Peyton-Jones, and Manuel M. T. Chakravarty. Unpublished manuscript, 2007.