[Haskell-cafe] Re: Typed Lambda-Expressions withOUT GADTs

Tue Jan 4 00:57:10 EST 2005

Conor McBride has pointed out a really interesting issue with
macro-expressing GADT in typeclasses, outlined in the previous
message. Suppose we have a GADT "OpenExpression env r" for
lambda-expressions, and we define

  newtype Expression r = MkExpression (forall env. OpenExpression env r)

to abstract over the environment, and we then write a function
runExpression to ``compile'' a lambda-expression to a native Haskell
expression.

The typeclass encoding translated GADT "OpenExpression env r" into a
class "OpenExpression t env r" and each GADT data constructor into a
regular data constructor (with the appropriate arity plus some phantom
stuff). The class OpenExpression was merely a constraint on the
well-formedness of lambda-terms. We need another class,
RunOpenExpression, to define the ``compiler'' function. So, 
"newtype Expression" had a type

  newtype Expression t r = MkExpression (forall env. 
                                          RunOpenExpression t env r => t)

which is not very satisfactory: the encapsulated constraint is
RunOpenExpression rather than OpenExpression. That means, if we wish
to do something else with the Expression (e.g., compute its size or
depth, or print it), we need to add the corresponding class constraint
to "newtype Expression" declaration. That is not modular. We don't
want to go back and change data declarations just because we thought
up a new way of handling our data.

It should be noted first that this problem arises only when we have to
explicitly state the constraints (as in instance declarations and
signatures of higher-ranked functions). If it is possible to let
the compiler figure out the types, then the approach in the previous
message seems preferable -- it seems neater, at least to me. 

I must also note that when it comes to typeclass constraints, GADT
too have some modularity issues (which are outside of the scope of
this message). 

The scope of this message is the recipe for translating GADT into
typeclasses that _can_ deal with local quantification and similar
issues. As one may see in the code below, we define

  newtype Expression t r = MkExpression (forall env. 
				         OpenExpression t env r => t)

Note the OpenExpression constraint. The compiler is defined as

  runExpression (MkExpression e) = openRun e ()

If we'd rather compute the depth or the size of the expression, we
define

 depth (MkExpression e) = compute_depth e ()
    where <see below>

 size (MkExpression e) = compute_size e ()
    where <see below>

That is, we can handle the expression in arbitrary ways, and we don't need
to modify the OpenExpression constraint. 

The key is to realize that the GADT declaration introduces both the
constructors and the deconstructor. So, to match GADT, our typeclass
translation must also introduce the deconstructor. The second insight is
that a catamorphism makes a good deconstructor. In our case, the fold
must be polymorphic -- like the gfold in ``Boilerplate scrapping''. I
bow to Ralf Laemmel for the type of gfold.

The drawback of the approach is that the class OpenExpression becomes
not-extensible. We no longer can easily add more 'instances' to the
class. OTH, if we place the handlers in a HList, and use the OOHaskell
encoding, we can probably arrive at open (extensible) GADT.

The complete new code is included.

> Where now? Well, counterexample fiends who want to provoke Oleg into
> inventing a new recipe had better write down a higher-order example.
> I just did, then deleted it. Discretion is the better part of valour.

It is possible to undelete that example?

{-# OPTIONS -fglasgow-exts #-}
{-# OPTIONS -fallow-undecidable-instances #-}

module Expression where

-- data Sym env r where
--     FirstSym :: Sym (r,xx) r
--     NextSym :: Sym env r -> Sym (a,env) r

class SymT t env r
data FirstSym = FirstSym deriving Show
data NextSym t = NextSym t deriving Show

instance SymT FirstSym (r,xx) r
instance SymT t env r => SymT (NextSym t) (a,env) r

class SymT t env r => RunSym t env r | t env -> r where
    runSym :: t -> env -> r

-- runSym FirstSym (r,xx) = r;
instance RunSym FirstSym (r,xx) r where
    runSym _ (r,xx) = r

instance RunSym var env r => RunSym (NextSym var) (a,env) r where
    runSym (NextSym var) (a,env) = runSym var env -- the code literally

-- data OpenExpression env r where
--   Lambda :: OpenExpression (a,env) r -> OpenExpression env (a -> r);
--   Symbol :: Sym env r -> OpenExpression env r;
--   Constant :: r -> OpenExpression env r;
--   Application :: OpenExpression env (a -> r) -> OpenExpression env a -> 
--                  OpenExpression env r

-- Note that the types of the efold alternatives are essentially 
-- the "inversion of arrows" in the GADT declaration above
class OpenExpression t env r | t env -> r where
    efold :: t -> env
	     -> (forall r. r -> c r) -- Constant
	     -> (forall r. r -> c r) -- Symbol
	     -> (forall a r. (a -> c r) -> c (a->r))  -- Lambda
	     -> (forall a r. c (a->r) -> c a -> c r) -- Application
	     -> c r

data Lambda a t = Lambda t deriving Show
data Symbol t = Symbol t deriving Show
data Constant r = Constant r deriving Show
data Application a t1 t2 = Application t1 t2 deriving Show

instance OpenExpression t (a,env) r => OpenExpression (Lambda a t) env (a->r)
    where
    efold (Lambda t) env on_const on_symbol on_lambda on_app
	  = on_lambda 
	    (\a -> efold t (a,env) 
	                 on_const on_symbol on_lambda on_app)

instance RunSym t env r => OpenExpression (Symbol t) env r
    where
    efold (Symbol t) env on_const on_symbol on_lambda on_app
	  = on_symbol (runSym t env) 

instance OpenExpression (Constant r) env r
    where
    efold (Constant t) env on_const on_symbol on_lambda on_app
	  = on_const t

instance (OpenExpression t1 env (a->r),
	  OpenExpression t2 env a)
    => OpenExpression (Application a t1 t2) env r
    where
    efold (Application t1 t2) env on_const on_symbol on_lambda on_app
	= on_app (efold t1 env on_const on_symbol on_lambda on_app)
	         (efold t2 env on_const on_symbol on_lambda on_app)

newtype ID a = ID a
unID (ID a) = a

openRun t env = unID $ efold t env ID ID on_lambda on_app
    where on_lambda f = ID $ \a -> unID (f a)
	  on_app (ID f) (ID x) = ID (f x)

newtype Expression t r = MkExpression (forall env. 
				       OpenExpression t env r => t)

runExpression (MkExpression e) = openRun e ()

sym0 = Symbol FirstSym

sym1 = Symbol (NextSym FirstSym)

sym2 = Symbol (NextSym (NextSym FirstSym))

    -- k = \x.\y -> x = \.\.1
k = MkExpression (Lambda (Lambda sym1))

-- s :: Expression ((a -> b -> c) -> (a -> b) -> a -> c);
-- -- s = \x.\y.\z -> x z (y z) = \.\.\.2 0 (1 0)
s = MkExpression (Lambda 
		  (Lambda 
		   (Lambda 
		    (Application
		     (Application
		      sym2 sym0) (Application sym1 sym0)))))

test1 = (runExpression k) sym0 sym2
test2 = (runExpression k) sym2 sym0

test3 = (runExpression s) (runExpression k) (runExpression k)
-- Note: That can only be the identity!
-- :t test3: 
-- test3 :: forall r. r -> r 

test3' = test3 "It's the identity!!! Look at the type!!!"

newtype WInt a = WInt Int
unWInt (WInt v) = v

depth (MkExpression e) = compute_depth e ()
    where 
    compute_depth t env = unWInt $ efold t env w0 w0 on_lambda on_app
    w0 _ = WInt 0
    on_lambda f = WInt $ unWInt (f undefined) + 1
    on_app (WInt x) (WInt y) = WInt $ 1 + max x y

size (MkExpression e) = compute_size e ()
    where 
    compute_size t env = unWInt $ efold t env w0 w0 on_lambda on_app
    w0 _ = WInt 1
    on_lambda f = WInt $ unWInt (f undefined) + 1
    on_app (WInt x) (WInt y) = WInt $ 1 + x + y

-- depth k ==> 2
-- depth s ==> 5
-- size k  ==> 3
-- size s  ==> 10