[Haskell] Fixed-length vectors in Haskell

Thu May 12 23:52:36 EDT 2005

David Menendez wrote:
] data Vec n a where
]     Nil  ::                 Vec Zero a
]     Cons :: a -> Vec n a -> Vec (Succ n) a

] data Vec_0   a = Nil
] data Vec_s v a = Cons a (v a)

Ashley Yakeley wrote:
] vcons a v Zero = a
] vcons a v (Succ n) = v n

S.M.Kahrs wrote:
] data Cons b a = a :. b a
] data NIL a = NIL

There is a common pattern in all these approaches to fixed-length
vectors: using *data* constructors to encode particular properties of
the data structure. This approach goes back to Chris Okasaki (``From
fast exponentiation to square matrices: An adventure in types.''
ICFP'99), who showed constructing not only of fixed-size vectors but
also matrices that are guaranteed to be square. He also showed
(quoting Ross Paterson, 1998) how to construct an AVL tree so that the
AVL invariant is enforced by construction:

http://www.haskell.org/pipermail/haskell/2003-April/011693.html

This approach has the immense advantage that the invariant cannot be
broken at all. One just can't have a list with fewer Cons cells than
there are elements in the list. Chris Okasaki (in the above message)
has also pointed out a drawback: all these data constructors take
space to be present in memory, and take time to traverse. So, in all
the above representations, to take the 1023d element of a 1024-element
vector, we have to traverse 1023 Cons cells.

The alternative is to encode the invariant -- the vector size, in our
problem -- in *type* constructors, in phantom types. For example,

> newtype PVector n a = PVector (Array Int a)

The run-time representation of PVector n a is the same as Array Int a,
which is quite efficient. There are no traces of the `Nat' can be
found: we use the type-level invariant for type-checking only. So, we
get additional assurances without distorting data structures and
without any run-time penalty whatsoever.

The drawback and the advantage of phantom types is that they are not
``bound'', so to speak, to a term -- they are just ``attached.'' If
the user gets hold of the PVector constructor, the user can easily
compromise our invariants. So, we have to hide PVector in a module and
expose only functions that build the values of PVector one way or the
other. We also have to make sure that these functions are correct --
because they constitute the trusted kernel of our invariant.  The
advantage of that approach is that we can implement very flexible
invariants. Come to think of it, this approach isn't that far from
LCF.

David Menendez wrote:

] We would like a function "vec :: a -> Vec n a", such that "vec x"
] returns a vector of length n with every element equal to x. The basic
] algorithm is simple:
]
]     vec{0}   x = Nil
]     vec{n+1} x = Cons x (vec{n} x)
]
] The usual way to declare a function like this involves a type class,
] which lets us define functions with different behaviors for different
] types:
]
]     class Nat n where vec :: a -> Vec n a
]
]     instance          Nat Zero     where vec x = Nil
]     instance Nat n => Nat (Succ n) where vec x = Cons x (vec x)
]
] This gives vec the type "Nat n => a -> Vec n a", which is good because
] it points out that the type argument to Vec must be a natural, formed
] from Zero or Succ applied to another natural.
]
] However, this solution is unsatisfactory, because it requires the Nat
] class to include every possible function which might depend on a
] type-level argument.

Actually it is not that difficult to create a Nat class with ``every
possible function'' as a member:

> class Nat' n e f where doit :: e a -> f n a

> newtype VecFN n a = VecFN (Vec n a)
> newtype ID a = ID a 
> unVecFN (VecFN x) = x
> unID (ID x) = x

> instance Nat' Zero ID VecFN where doit (ID a) = VecFN Nil
> instance Nat' n ID VecFN => Nat' (Succ n) ID VecFN where
>     doit (ID a) = VecFN (Cons a (unVecFN $ doit (ID a)))

> vec' :: Nat' n ID VecFN => a -> Vec n a
> vec' = unVecFN . doit . ID 

*Vector_GADT> vec' 2 :: (Vec Three Int)
[2,2,2]

If we wish now to implement fromList function, we do

> newtype FromList' n a = FromList'
>     { runFromList' :: Maybe (Vec n a) }
> 
> fromList' :: Nat' n [] FromList' => [a] -> Maybe (Vec n a)
> fromList' = runFromList' . doit
>
> instance Nat' Zero [] FromList' where doit = \l -> FromList' $ Just Nil
> instance Nat' n [] FromList' => Nat' (Succ n) [] FromList' where
>  doit = \l -> case l of
>                    x:xs -> FromList' (fmap (Cons x) (runFromList' $ doit xs))
>                    []   -> FromList' Nothing

*Vector_GADT> fromList' [1,2,3] :: (Maybe (Vec Three Int))
Just [1,2,3]
*Vector_GADT> fromList' [1,2] :: (Maybe (Vec Three Int))
Nothing

This tagging and untagging may be a bit annoying. OTH, FromList', ID,
etc. are all newtypes.

Ashley Yakeley wrote:
] > impossible :: Empty -> a
] > impossible _ = undefined
]
] It's unfortunate that we can't define impossible without using bottom. I
] dislike using bottom, and here we have a function that cannot return
] bottom unless it is passed bottom. Such functions should be definable
] without using bottom, but this one isn't.

Well, we can write:

> impossible = impossible

Or, better,

> impossible = error "I'm afraid I can't do that, Dave"