Difference between revisions of "Phantom type"

From HaskellWiki

Revision as of 16:49, 22 March 2013

A phantom type is a parametrised type whose parameters do not all appear on the right-hand side of its definition, e.g. from Control.Applicative:

newtype Const a b = Const { getConst :: a }

Here Const is a phantom type, because the b parameter doesn't appear after the = sign.
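As a small sketch of what this means in practice (the names tagged, retagged and payload are made up for illustration), the phantom parameter of Const can be changed freely while the stored value stays the same:

```haskell
import Data.Functor.Const (Const (..))

-- The phantom parameter b is fixed to Int here, but no Int is stored.
tagged :: Const String Int
tagged = Const "hello"

-- fmap only changes the phantom parameter; the function is never applied
-- because there is no b value to apply it to.
retagged :: Const String Bool
retagged = fmap even tagged

-- The payload is exactly what was put in.
payload :: String
payload = getConst retagged
```

Control.Applicative re-exports Const, so importing it from there should work as well.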

Phantom types are useful in a variety of contexts: in Data.Fixed they are used with type classes to encode the precision being used; with smart constructors or GADTs they can encode information about how and where a value can be used; they may help to avoid kind signatures; and with more exotic extensions they can be used for encoding bounds checks in the type system.
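The Data.Fixed case can be sketched roughly as follows. Fixed, E2, E6 and resolution come from Data.Fixed in base; price, centiResolution and microResolution are names invented for this example.

```haskell
import Data.Fixed (E2, E6, Fixed, resolution)

-- E2 is an empty type used only as a tag: it selects two decimal places.
price :: Fixed E2
price = 1.5

-- resolution looks only at the phantom tag, never at the stored number.
centiResolution :: Integer
centiResolution = resolution price

-- The same representation with a different tag gives six decimal places.
microResolution :: Integer
microResolution = resolution (0 :: Fixed E6)
```

Both values are stored as a plain Integer; only the tag, through the HasResolution class, decides how that integer is scaled.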

Since the types substituted for a phantom parameter never need to have any values, phantom types are often used in combination with empty types.

Phantom types are nearly always declared with either newtype or data. It is possible to create "phantom type synonyms", but they are usually useless: since synonyms are expanded at compile time, the phantom type variable is discarded.

Simple examples

A phantom type will have a declaration that looks something like this:

data FormData a = FormData String

This looks strange since at first it seems the type parameter is unused and could be anything, without affecting the value inside. Indeed, one can write:

changeType :: FormData a -> FormData b
changeType (FormData str) = FormData str

to change it from any type to any other. However, if the constructor is not exported then users of the library that defined FormData can't define functions like the above, so the type parameter can only be set or changed by library functions. So we might do:

data Validated
data Unvalidated

-- since we don't export the constructor itself,
-- users with a String can only create Unvalidated values
formData :: String -> FormData Unvalidated
formData str = FormData str

-- Nothing if the data doesn't validate
validate :: FormData Unvalidated -> Maybe (FormData Validated)
validate (FormData str) = ...

-- can only be fed the result of a call to validate!
useData :: FormData Validated -> IO ()
useData (FormData str) = ...
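For readers who want something they can load into GHCi, here is one possible self-contained variant of the above. The validation rule (non-empty, alphanumeric only) and the accessor rawText are assumptions made purely for illustration; the elided bodies above are not specified by this page.

```haskell
-- In a real library this would sit behind an export list such as
--   module FormData (FormData, Validated, Unvalidated, formData, validate, rawText) where
-- so that the FormData constructor itself stays hidden.
import Data.Char (isAlphaNum)

data Validated
data Unvalidated

newtype FormData a = FormData String

-- The only way for users to wrap a String: the result is Unvalidated.
formData :: String -> FormData Unvalidated
formData = FormData

-- Hypothetical rule: accept non-empty, purely alphanumeric input.
validate :: FormData Unvalidated -> Maybe (FormData Validated)
validate (FormData str)
  | not (null str) && all isAlphaNum str = Just (FormData str)
  | otherwise = Nothing

-- Hypothetical consumer: only validated data can be read back out.
rawText :: FormData Validated -> String
rawText (FormData str) = str
```

With the constructor hidden, an expression such as rawText (formData "abc") is rejected by the type checker, because FormData Unvalidated does not match FormData Validated.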

The beauty of this is that we can define functions that work on all kinds of FormData, but still can't turn unvalidated data into validated data:

-- the library exports this
liftStringFn :: (String -> String) -> FormData a -> FormData a
liftStringFn fn (FormData str) = FormData (fn str)

-- the validation state is the *same* in the return type and the argument
-- (toUpper comes from Data.Char)
dataToUpper :: FormData a -> FormData a
dataToUpper = liftStringFn (map toUpper)

With type classes, we can even choose different behaviours conditional on information that is nonexistent at runtime:

class Sanitise a where
  sanitise :: FormData a -> FormData Validated

-- do nothing to data that is already validated
instance Sanitise Validated where
  sanitise = id

-- sanitise untrusted data (isAlpha comes from Data.Char)
instance Sanitise Unvalidated where
  sanitise (FormData str) = FormData (filter isAlpha str)

This technique is well suited to, for example, escaping user input to a web application. Because the phantom parameter has no runtime representation, we can ensure at no extra runtime cost that the data is escaped once and only once everywhere that it needs to be, or else we get a compile-time error.
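A minimal sketch of the escaping idea follows. All names here (Raw, Escaped, UserText, fromUser, escape, render) are hypothetical, and the escaping covers only three characters, so it is an illustration rather than a complete HTML escaper.

```haskell
data Raw
data Escaped

newtype UserText a = UserText String

-- Anything coming from the user starts out as Raw.
fromUser :: String -> UserText Raw
fromUser = UserText

-- escape only accepts Raw text, so escaping twice is a type error.
escape :: UserText Raw -> UserText Escaped
escape (UserText s) = UserText (concatMap escapeChar s)
  where
    escapeChar '&' = "&amp;"
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar c   = [c]

-- render only accepts Escaped text, so forgetting to escape is a type error.
render :: UserText Escaped -> String
render (UserText s) = s
```

Both render (fromUser "x") and escape (escape (fromUser "x")) are rejected at compile time, which is the "once and only once" property described above.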

The use of a type system to guarantee well-formedness

We create a parameterised type in which the parameter does not appear on the right-hand side (this example is taken from Daan Leijen and Erik Meijer):

data Expr a = Expr PrimExpr

constant :: Show a => a -> Expr a
(.+.)  :: Expr Int -> Expr Int -> Expr Int
(.==.) :: Eq a => Expr a -> Expr a -> Expr Bool
(.&&.) :: Expr Bool -> Expr Bool -> Expr Bool

data PrimExpr
  = BinExpr   BinOp PrimExpr PrimExpr
  | UnExpr    UnOp PrimExpr
  | ConstExpr String

data BinOp
  = OpEq | OpAnd | OpPlus | ...

That is, the PrimExpr datatype on its own is untyped, so we could build garbage such as

BinExpr OpEq (ConstExpr "1") (ConstExpr "\"foo\"")

but since we only expose the typed functions, an attempt to create this expression via

constant 1 .==. constant "foo"

would fail to typecheck.
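The signatures above come without implementations. One way they might be filled in is sketched below; the bodies, the helper functions binary and unExpr, and the cut-down BinOp and PrimExpr types are assumptions for this example, not the original authors' code.

```haskell
data BinOp = OpEq | OpAnd | OpPlus
  deriving (Eq, Show)

-- Cut-down version of PrimExpr: the unary case is left out.
data PrimExpr
  = BinExpr BinOp PrimExpr PrimExpr
  | ConstExpr String
  deriving (Eq, Show)

newtype Expr a = Expr PrimExpr

constant :: Show a => a -> Expr a
constant = Expr . ConstExpr . show

-- Internal helper (not exported in a real library): it ignores the
-- phantom types, which is why the typed wrappers below are needed.
binary :: BinOp -> Expr a -> Expr b -> Expr c
binary op (Expr x) (Expr y) = Expr (BinExpr op x y)

(.+.) :: Expr Int -> Expr Int -> Expr Int
(.+.) = binary OpPlus

(.==.) :: Eq a => Expr a -> Expr a -> Expr Bool
(.==.) = binary OpEq

(.&&.) :: Expr Bool -> Expr Bool -> Expr Bool
(.&&.) = binary OpAnd

-- Helper for inspecting the untyped tree that was built.
unExpr :: Expr a -> PrimExpr
unExpr (Expr e) = e

-- A well-typed expression: both sides of .==. are Expr Int.
wellTyped :: Expr Bool
wellTyped = (constant 1 .+. constant 2) .==. constant 3
```

Replacing constant 3 with constant "foo" in wellTyped makes the definition fail to compile, since Expr Int and Expr String do not unify.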

Why not type synonyms

Remember that type synonyms are expanded behind the scenes before typechecking. Suppose that in the above example you replace the declaration of Expr with type Expr a = PrimExpr. Then Expr Int and Expr String are both expanded to PrimExpr before being compared, and those types would be compatible, defeating the point of using a phantom type.
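The difference can be seen in a small sketch. ExprSyn, intExpr and asString are hypothetical names, and PrimExpr is cut down to a single constructor to keep the example short.

```haskell
data PrimExpr = ConstExpr String
  deriving (Eq, Show)

-- A "phantom type synonym": the parameter disappears on expansion.
type ExprSyn a = PrimExpr

intExpr :: ExprSyn Int
intExpr = ConstExpr "1"

-- Accepted by the compiler: both ExprSyn Int and ExprSyn String
-- expand to PrimExpr, so the phantom parameter protects nothing.
asString :: ExprSyn String
asString = intExpr
```

Had ExprSyn been declared with newtype or data instead, the definition of asString would be rejected as a type mismatch.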

Comments

I believe this technique is used when interfacing with a language that would raise a runtime exception if the types were wrong, but would attempt to run the expression first. (They use it in the context of SQL, but I have also seen it in the context of FFI work.)

-- ChrisAngus

A foundation for embedded languages provides some formal background for embedding typed languages in Haskell, and also its references give a fairly comprehensive survey of uses of phantom types and related techniques.

Further Reading

First-Class Phantom Types