I've obtained permission to repost Gershom's slides on how to deserialize GADTs by typechecking them yourself, which are actually a literate haskell source file, that he was rendering to slides. Consequently, I've just pasted the content below as a literate email.<br>
<br>-Edward Kmett<br><br>-<br>Deserializing strongly typed values<br><br>(four easy pieces about typechecking)<br><br>Gershom Bazerman<br><br>-<br>prior art:<br>Oleg (of course)<br><a href="http://okmij.org/ftp/tagless-final/course/course.html">http://okmij.org/ftp/tagless-final/course/course.html</a><br>
<br>...but also<br>Stephanie Weirich<br><a href="http://www.comlab.ox.ac.uk/projects/gip/school/tc.hs">http://www.comlab.ox.ac.uk/projects/gip/school/tc.hs</a><br><br>=<br>Ahem...<br>\begin{code}<br>{-# LANGUAGE DeriveDataTypeable,<br>
ExistentialQuantification,<br> FlexibleContexts,<br> FlexibleInstances,<br> FunctionalDependencies,<br> GADTs,<br> RankNTypes,<br> ScopedTypeVariables<br>
#-}<br>\end{code}<br>=<br>ahem.<br>\begin{code}<br>import Data.Typeable<br>import Data.Maybe<br>import Control.Monad.Error<br>import Control.Applicative<br>import qualified Data.Map as M<br>import Unsafe.Coerce<br>\end{code}<br>
=<br>A simple ADT.<br><br>\begin{code}<br>data SimpleExpr = SOpBi String SimpleExpr SimpleExpr<br> | SOpUn String SimpleExpr<br> | SDbl Double<br> | SBool Bool deriving (Read, Show, Typeable)<br>
<br>\end{code}<br>Yawn.<br>=<br>An awesome GADT!<br><br>\begin{code}<br>data Expr a where<br> EDbl :: Double -> Expr Double<br> EBool :: Bool -> Expr Bool<br> EBoolOpBi :: BoolOpBi -> Expr Bool -> Expr Bool -> Expr Bool<br>
ENumOpBi :: NumOpBi -> Expr Double -> Expr Double -> Expr Double<br> ENumOpUn :: NumOpUn -> Expr Double -> Expr Double<br> deriving Typeable<br><br>data NumOpBi = Add | Sub deriving (Eq, Show)<br>
data NumOpUn = Log | Exp deriving (Eq, Show)<br>data BoolOpBi = And | Or deriving (Eq, Show)<br>\end{code}<br><br>The GADT is well typed. It cannot go wrong.<br>-<br>It also cannot derive show.<br>=<br>But we can write show...<br>
<br>\begin{code}<br>showIt :: Expr a -> String<br>showIt (EDbl d) = show d<br>showIt (EBool b) = show b<br>showIt (EBoolOpBi op x y) = "(" ++ show op<br> ++ " " ++ showIt x<br>
++ " " ++ showIt y ++ ")"<br>showIt (ENumOpBi op x y) = "(" ++ show op<br> ++ " " ++ showIt x<br> ++ " " ++ showIt y ++ ")"<br>
showIt (ENumOpUn op x) = show op ++ "(" ++ showIt x ++ ")"<br>\end{code}<br>=<br>And eval is *much nicer*.<br>It cannot go wrong --> no runtime typechecks.<br><br>\begin{code}<br>evalIt :: Expr a -> a<br>
evalIt (EDbl x) = x<br>evalIt (EBool x) = x<br>evalIt (EBoolOpBi op expr1 expr2)<br> | op == And = evalIt expr1 && evalIt expr2<br> | op == Or = evalIt expr2 || evalIt expr2<br><br>evalIt (ENumOpBi op expr1 expr2)<br>
| op == Add = evalIt expr1 + evalIt expr2<br> | op == Sub = evalIt expr1 - evalIt expr2<br>\end{code}<br>=<br>But how do we write read!?<br><br>read "EBool False" = Expr Bool<br>read "EDbl 12" = Expr Double<br>
<br>The type being read depends on the content of the string.<br><br>Even worse, we want to read not from a string that looks obvious<br>to Haskell (i.e. a standard showlike instance) but from<br>something that looks pretty to the user -- we want to *parse*.<br>
<br>So we parse into our simple ADT.<br><br>Then we turn our simple ADT into our nice GADT.<br>-<br>But how?<br><br>How do we go from untyped... to typed?<br><br>[And in general -- not just into an arbitrary GADT,<br>but an arbitrary inhabitant of a typeclass.]<br>
<br>[i.e. tagless final, etc]<br><br>=<br>Take 1:<br>Even if we do not know what type we are creating,<br>we eventually will do something with it.<br><br>So we paramaterize our typechecking function over<br>an arbitrary continuation.<br>
<br>\begin{code}<br>mkExpr :: (forall a. (Show a, Typeable a) => Expr a -> r) -> SimpleExpr -> r<br>mkExpr k expr = case expr of<br> SDbl d -> k $ EDbl d<br> SBool b -> k $ EBool b<br>
SOpUn op expr1 -> case op of<br> "log" -> k $ mkExpr' (ENumOpUn Log) expr1<br> "exp" -> k $ mkExpr' (ENumOpUn Exp) expr1<br>
_ -> error "bad unary op"<br> SOpBi op expr1 expr2 -> case op of<br> "add" -> k $ mkExprBi (ENumOpBi Add) expr1 expr2<br> "sub" -> k $ mkExprBi (ENumOpBi Sub) expr1 expr2<br>
\end{code}<br>=<br>Where's the typechecking?<br><br>\begin{code}<br>mkExpr' k expr = mkExpr (appCast $ k) expr<br><br><br>mkExprBi k expr1 expr2 = mkExpr' (mkExpr' k expr1) expr2<br><br><br>appCast :: forall a b c r. (Typeable a, Typeable b) => (c a -> r) -> c b -> r<br>
appCast f x = maybe err f $ gcast x<br> where err = error $ "Type error. Expected: " ++ show (typeOf (undefined::a))<br> ++ ", Inferred: " ++ show (typeOf (undefined::b))<br>\end{code}<br>
<br>... We let Haskell do all the work!<br>=<br>Hmmm... the continuation can be anything.<br>So let's just make it an existential constructor.<br><br>\begin{code}<br>data ExprBox = forall a. Typeable a => ExprBox (Expr a)<br>
<br>appExprBox :: (forall a. Expr a -> res) -> ExprBox -> res<br>appExprBox f (ExprBox x) = f x<br><br>tcCast :: forall a b c. (Typeable a, Typeable b) => Expr a -> Either String (Expr b)<br>tcCast x = maybe err Right $ gcast x<br>
where err = Left $ "Type error. Expected: " ++ show (typeOf (undefined::a))<br> ++ ", Inferred: " ++ show (typeOf (undefined::b))<br>\end{code}<br><br>Now we can delay deciding what to apply until later.<br>
Typecheck once, execute repeatedly!<br><br>=<br>One more trick -- monadic notation lets us<br>extend the context of unpacked existentials<br>to the end of the do block<br><br>\begin{code}<br>retBox x = return (ExprBox $ x, typeOf x)<br>
<br>typeCheck :: SimpleExpr -> Either String (ExprBox, TypeRep)<br>typeCheck (SDbl d) = retBox (EDbl d)<br>typeCheck (SBool b) = retBox (EBool b)<br>typeCheck (SOpBi op s1 s2) = case op of<br> "add" -> tcBiOp (ENumOpBi Add)<br>
"sub" -> tcBiOp (ENumOpBi Sub)<br> "and" -> tcBiOp (EBoolOpBi And)<br> "or" -> tcBiOp (EBoolOpBi Or)<br> where<br> tcBiOp constr = do<br> (ExprBox e1, _) <- typeCheck s1<br>
(ExprBox e2, _) <- typeCheck s2<br> e1' <- tcCast e1<br> e2' <- tcCast e2<br> retBox $ constr e1' e2'<br>\end{code}<br>=<br>So that's fine for *very* simple expressions.<br>
How does it work for interesting GADTs?<br>(like, for example, HOAS)?<br><br>(The prior art doesn't demonstrate HOAS --<br> it uses DeBruijn.)<br>=<br>Our simple world<br><br>\begin{code}<br>type Ident = String<br>type TypeStr = String<br>
<br>data STerm = SNum Double<br> | SApp STerm STerm<br> | SVar Ident<br> | SLam Ident TypeRep STerm<br>\end{code}<br><br>Note.. terms are Church style -- each var introduced<br>has a definite type.<br>
<br>Determining this type is left as an exercise.<br><br>=<br>Over the rainbow in well-typed land.<br><br>\begin{code}<br>data Term a where<br> TNum :: Double -> Term Double<br> TApp :: Term (a -> b) -> Term a -> Term b<br>
TLam :: Typeable a => (Term a -> Term b) -> Term (a -> b)<br> TVar :: Typeable a => Int -> Term a<br> deriving Typeable<br>\end{code}<br><br>Wait! DeBrujin (TVar) *and* HOAS (TLam)! The worst of both worlds.<br>
<br>Don't worry. In the final product all TVars are eliminated by construction.<br><br>Exercise to audience: rewrite the code so that TVar can be eliminated<br>from the Term type.<br>=<br>Show and eval...<br><br>\begin{code}<br>
showT :: Int -> Term a -> String<br>showT c (TNum d) = show d<br>showT c (TApp f x) = "App (" ++ showT c f<br> ++ ") (" ++ showT c x ++ ")"<br>showT c (TLam f) = "Lam " ++ ("a"++show c)<br>
++ " -> " ++ (showT (succ c) $ f (TVar c))<br>showT c (TVar i) = "a"++show i<br><br>runT :: Term a -> Term a<br>runT (TNum d) = (TNum d)<br>runT (TLam f) = (TLam f)<br>runT (TApp f x) = case runT f of TLam f' -> runT (f' x)<br>
runT (TVar i) = error (show i)<br>\end{code}<br>=<br>The existential<br><br>\begin{code}<br>data TermBox = forall a. Typeable a => TermBox (Term a)<br>appTermBox :: (forall a. Typeable a => Term a -> res) -> TermBox -> res<br>
appTermBox f (TermBox x) = f x<br>\end{code}<br>=<br>The typechecker returns a box *and* a typeRep.<br><br>Cast is the usual trick.<br><br>\begin{code}<br>retTBox :: Typeable a => Term a -> Either String (TermBox, TypeRep)<br>
retTBox x = return (TermBox $ x, typeOf x)<br><br>type Env = M.Map Ident (TermBox, TypeRep)<br><br>trmCast :: forall a b c. (Typeable a, Typeable b) =><br> Term a -> Either String (Term b)<br>trmCast x = maybe err Right $ gcast x<br>
where err = Left $ "Type error. Expected: " ++<br> show (typeOf (undefined::a))<br> ++ ", Inferred: " ++<br> show (typeOf (undefined::b))<br>\end{code}<br>
<br>=<br>\begin{code}<br>typeCheck' :: STerm -> Env -> Either String (TermBox, TypeRep)<br>typeCheck' t env = go t env 0<br> where<br> go (SNum d) _ idx = retTBox (TNum d)<br> go (SVar i) env idx = do<br>
(TermBox t, _) <- maybe (fail $ "not in scope: " ++ i)<br> return $ M.lookup i env<br> retTBox $ t<br>\end{code}<br><br>Nums and vars are easy.<br>=<br>App and Lam... less so.<br>
<br>\begin{code}<br> go (SApp s1 s2) env idx = do<br> (TermBox e1, tr1) <- go s1 env idx<br> (TermBox e2, _) <- go s2 env idx<br> TermBox rt <- return $ mkTerm $ head $ tail $<br> typeRepArgs $ head $ typeRepArgs $ tr1<br>
-- TypeReps have their... drawbacks.<br> e1' <- trmCast e1<br> retTBox $ TApp e1' e2 `asTypeOf` rt<br> go (SLam i tr s) env idx = do<br> TermBox et <- return $ mkTerm tr<br>
(TermBox e, _) <- go s<br> (M.insert i<br> (TermBox (TVar idx `asTypeOf` et),tr)<br> env<br> )<br>
(idx + 1)<br> retTBox $ TLam (\x -> subst (x `asTypeOf` et) idx e)<br>\end{code}<br>=<br>How does mkTerm work?<br><br>\begin{code}<br>mkTerm :: TypeRep -> TermBox<br>mkTerm tr = go tr TermBox<br>
where<br> go :: TypeRep -> (forall a. (Typeable a) => Term a -> res) -> res<br> go tr k<br> | tr == typeOf (0::Double) = k (TNum 0)<br> | typeRepTyCon tr == arrCon =<br> go (head $ typeRepArgs tr)<br>
$ \xt -> go (head $ tail $ typeRepArgs tr)<br> $ \y -> k (TLam $ \x -> const y (x `asTypeOf` xt))<br><br> arrCon = typeRepTyCon $ typeOf (undefined::Int -> String)<br>\end{code}<br>
<br>Same principle -- but can build arrows directly.<br><br>Doing so requires staying cps... I think.<br>=<br>And this is how we get rid of the DeBruijn terms.<br><br>\begin{code}<br>subst :: (Typeable a) => Term a -> Int -> Term b -> Term b<br>
subst t i trm = go trm<br> where<br> go :: Term c -> Term c<br> go (TNum d) = (TNum d)<br> go (TApp f x) = TApp (go f) (go x)<br> go (TLam f) = TLam (\a -> go (f a))<br> go (TVar i')<br>
| i == i' = either error id $ trmCast t<br> | otherwise = (TVar i')<br>\end{code}<br><br>Q: Now you see why DeBruijn is handy -- substitution<br>is otherwise a pain.<br><br>=<br>But note -- all functions must be monotyped.<br>
This is the simply typed lambda calculus.<br><br>How do we represent TLam (\a -> a)?<br><br>The masses demand HM polymorphism.<br>-<br><br>Take 4:<br><br>A damn dirty hack.<br><br>=<br>All hacks begin with Nats.<br><br>
\begin{code}<br>data Z = Z deriving (Show, Typeable)<br>data S a = S a deriving (Show, Typeable)<br>\end{code}<br>=<br>typeCheck is almost the same.<br><br>\begin{code}<br>typeCheck'' :: STerm -> Env -> Either String (TermBox, TypeRep)<br>
typeCheck'' t env = go t env 0<br> where<br> go :: STerm -> Env -> Int -> Either String (TermBox, TypeRep)<br> go (SNum d) _ idx = retTBox (TNum d)<br> go (SVar i) env idx = do<br> (TermBox t, _) <- maybe (fail $ "not in scope: " ++ i)<br>
return $ M.lookup i env<br> retTBox $ t<br>\end{code}<br>=<br>\begin{code}<br> go (SApp s1 s2) env idx = do<br> (TermBox e1, tr1) <- go s1 env idx<br> (TermBox e2, _) <- go s2 env idx<br>
TermBox rt <- unifyAppRet e1 e2<br> e1' <- unifyAppFun e1 e2<br> retTBox $ TApp e1' e2 `asTypeOf` rt<br> go (SLam i tr s) env idx = do<br> TermBox et <- return $ mkTerm' $ tr<br>
(TermBox e, _) <- go s (M.insert i<br> (TermBox (TVar idx `asTypeOf` et),tr)<br> env)<br> (idx + 1)<br> retTBox $ TLam (\x -> subst (x `asTypeOf` et) idx e)<br>
\end{code}<br><br>It looks like we just factored on the nasty arrow code.<br>=<br>mkTerm is almost the same... we just extended it to deal with Nats.<br>\begin{code}<br>mkTerm' :: TypeRep -> TermBox<br>mkTerm' tr = go tr TermBox<br>
where<br> go :: TypeRep -> (forall a. (Typeable a) => Term a -> res) -> res<br> go tr k<br> | tr == typeOf (0::Double) = k (TNum 0)<br> | tr == typeOf Z = k (TVar 0 :: Term Z)<br>
| typeRepTyCon tr == succCon = go (head $ typeRepArgs tr)<br> $ \t -> k $ succTerm t<br> | isArr tr =<br> go (head $ typeRepArgs tr)<br> $ \xt -> go (head $ tail $ typeRepArgs tr)<br>
$ \y -> k (TLam $ \x -> const y (x `asTypeOf` xt))<br> | otherwise = error $ show tr<br><br> succCon = typeRepTyCon $ typeOf (S Z)<br> succTerm :: Typeable b => Term b -> Term (S b)<br>
succTerm _ = TVar 0<br>\end{code}<br>=<br>Some utilities<br>\begin{code}<br>isArr :: TypeRep -> Bool<br>isArr x = typeRepTyCon x == (typeRepTyCon $ typeOf (undefined::Int -> String))<br><br>splitArrCon :: TypeRep -> Either String (TypeRep, TypeRep)<br>
splitArrCon x<br> | isArr x = case typeRepArgs x of<br> [a,b] -> Right (a,b)<br> _ -> Left $ "Expected function, inferred: " ++ show x<br> | otherwise = Left $ "Expected function, inferred: " ++ show x<br>
<br>\end{code}<br>=<br>Give an arrow term we unify it with its argument...<br>and return a witness.<br>\begin{code}<br>unifyAppRet :: forall a b. (Typeable a, Typeable b) =><br> Term a -> Term b -> Either String TermBox<br>
unifyAppRet x y = do<br> tr <- unifyAppTyps<br> (head $ typeRepArgs $ typeOf x)<br> (head $ typeRepArgs $ typeOf y)<br> return $ mkTerm' tr<br><br>\end{code}<br>=<br>Yes. Actual unification.<br>(although this is unification of a type template,<br>
so at least it is local)<br>\begin{code}<br>unifyAppTyps :: TypeRep -> TypeRep -> Either String TypeRep<br>unifyAppTyps trf trx = do<br> (fl,fr) <- splitArrCon trf<br> env <- go M.empty fl trx<br> subIt env fr<br>
where<br> -- go yields a substitution environment.<br> go :: M.Map String TypeRep -> TypeRep -><br> TypeRep -> Either String (M.Map String TypeRep)<br> go env x y<br> | isFree x = case M.lookup (show x) env of<br>
Just x' -> if x' == y then return env else Left (error "a")<br> Nothing -> return $ M.insert (show x) y env<br> | isArr x = do<br> (lh,rh) <- splitArrCon x<br>
(lh',rh') <- splitArrCon y<br> env' <- go env lh lh'<br> go env' rh rh'<br> | otherwise = if x == y then return env else Left (error "b")<br>
\end{code}<br>=<br>\begin{code}<br> -- subIt applies it<br> subIt :: M.Map String TypeRep -> TypeRep -> Either String TypeRep<br> subIt env x<br> | isFree x = case M.lookup (show x) env of<br>
Just x' -> return x'<br> Nothing -> Left (error "c")<br> | isArr x = do<br> (lh,rh) <- splitArrCon x<br> lh' <- subIt env lh<br>
rh' <- subIt env rh<br> return $ mkTyConApp arrCon [lh',rh']<br> | otherwise = return x<br> succCon = typeRepTyCon $ typeOf (S Z)<br> zCon = typeRepTyCon $ typeOf Z<br>
isFree x = typeRepTyCon x `elem` [zCon, succCon]<br> arrCon = (typeRepTyCon $ typeOf (undefined::Int -> String))<br>\end{code}<br><br>=<br>And now, we just have to convince GHC that they unify!<br><br>\begin{code}<br>
unifyAppFun :: forall a b c. (Typeable a, Typeable b, Typeable c)<br> => Term a -> Term b -> Either String (Term c)<br>unifyAppFun x y = do<br> unifyAppTyps (head $ typeRepArgs $ typeOf x) (head $ typeRepArgs $ typeOf y)<br>
return $ unsafeCoerce x<br>\end{code}<br>=<br>Problem solved.<br><br><br><div class="gmail_quote">On Fri, Jun 25, 2010 at 2:03 PM, Edward Kmett <span dir="ltr"><<a href="mailto:ekmett@gmail.com">ekmett@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">It turns out that defining Read is somewhat tricky to do for a GADT. <br><br>Gershom Bazerman has some very nice slides on how to survive the process by manual typechecking (much in the spirit of Oleg's meta-typechecking code referenced by Stephen's follow up below)<br>
<br>He presented them at hac-phi this time around.<br><br>I will check with him to see if I can get permission to host them somewhere and post a link to them here.<br><font color="#888888"><br>-Edward Kmett</font><div><div>
</div><div class="h5"><br><br><div class="gmail_quote">On Fri, Jun 25, 2010 at 5:04 AM, <span dir="ltr"><<a href="mailto:corentin.dupont@ext.mpsa.com" target="_blank">corentin.dupont@ext.mpsa.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><br>
Hello Haskellers,<br>
<br>
I'm having trouble writing a Read Instance for my GATD.<br>
Arg this GATD!! It causes me more problems that it solves ;)<br>
Especially with no automatic deriving, it adds a lot of burden to my code.<br>
<br>
>data Obs a where<br>
> ProposedBy :: Obs Int -- The player that proposed the tested<br>
rule<br>
> Turn :: Obs Turn -- The current turn<br>
> Official :: Obs Bool -- whereas the tested rule is official<br>
> Equ :: (Eq a, Show a, Typeable a) => Obs a -> Obs a -> Obs Bool<br>
> Plus :: (Num a) => Obs a -> Obs a -> Obs a<br>
> Time :: (Num a) => Obs a -> Obs a -> Obs a<br>
> Minus :: (Num a) => Obs a -> Obs a -> Obs a<br>
> And :: Obs Bool -> Obs Bool -> Obs Bool<br>
> Or :: Obs Bool -> Obs Bool -> Obs Bool<br>
> Not :: Obs Bool -> Obs Bool<br>
> Konst :: a -> Obs a<br>
<br>
<br>
> instance Read a => Read (Obs a) where<br>
> readPrec = (prec 10 $ do<br>
> Ident "ProposedBy" <- lexP<br>
> return (ProposedBy))<br>
> +++<br>
> (prec 10 $ do<br>
> Ident "Official" <- lexP<br>
> return (Official))<br>
> (etc...)<br>
<br>
Observable.lhs:120:8:<br>
Couldn't match expected type `Int' against inferred type `Bool'<br>
Expected type: ReadPrec (Obs Int)<br>
Inferred type: ReadPrec (Obs Bool)<br>
<br>
<br>
Indeed "ProposedBy" does not have the same type that "Official".<br>
Mmh how to make it all gently mix altogether?<br>
<br>
<br>
Best,<br>
Corentin<br>
<br>
<br>
_______________________________________________<br>
Haskell-Cafe mailing list<br>
<a href="mailto:Haskell-Cafe@haskell.org" target="_blank">Haskell-Cafe@haskell.org</a><br>
<a href="http://www.haskell.org/mailman/listinfo/haskell-cafe" target="_blank">http://www.haskell.org/mailman/listinfo/haskell-cafe</a><br>
</blockquote></div><br>
</div></div></blockquote></div><br>