Applications and libraries/Generic programming/Smash

From HaskellWiki
Jump to navigation Jump to search

Approach: Smash your boilerplate

Required features/Portability

  • The type-level typecase/TypeEq requires overlapping and undecidable instances. The use of these extensions is limited to the library only. The end user code needs no such extensions.
  • No need for Typeable, nor higher-rank types


Expressibility

  • Can do both producer and consumer functions: e.g., gsize (consumer), replacing all Floats with Double (producer: the return type is determined by the original term's type/structure). The return type is computed by the typechecker rather than has to be specified by the user. Pure producers (e.g., gminimum) are supported too.
  • Local redefinitions are possible, see Discussion.

Subset of data types covered

  • Whatever LDat instances are declared
  • All primitive and algebraic types are supported. Function spaces. No Dynamic types. Polymorphic types can be handled, via typeclass proxies.

Usage

  • Library Writer: writes the core library (e.g., STApply class) and declares the LDat class. This is done once and for all.
  • Power User: (new datatype or a new traversal strategy): defines LDat instances
  • User: (new generic function): defines a new generic function by combining, in a HList, functions processing values of specific data types, with an appropriate generic function (cf. `everywhere') that does generic traversal. Polymorphism (when specific functions are polymorphic) requires special handling (see below).
  • End User: apply the generic function like an ordinary function

Error Messages

  • Can be hard to grasp??

Amount of work per data type (Boilerplate)

  • instances of LDat, which are similar to Functor and could be automatically derived from the definition of the data type.

Extensibility

  • Full

Reasoning

TODO ??? what does that mean?

Performance considerations

  • Although that is hidden from the end user, the library heavily relies on the typeclasses. Thus the efficiency of the typeclass implementation has direct impact.
  • Choosing the processor for each subterm is done at the compile time. There is no passing and interpreting types at run time (except when a compiler uses that to implement typeclasses).

Helpful extra features

  • TypeEq and TypeCast as built-ins would have helped.
  • Deriving for LDat (the same problem as with various SYB approaches)

Discussion

Our generic functions are composed of specific and generic parts. A specific part tells what to do if the input term happens to be of a particular type. The generic part tells us what to do with all other terms. The specific part is just an HList of ordinary functions, with different argument types. If the type of the input term matches the argument type of one of the functions in the list, we apply that function. If no specific function applies, we do the generic action, implicit in the traversal strategy (for example, apply the generic function to the subterms and reduce the results).

SYB1, when traversing terms and invoking user functions for subterms of a particular type, relies on a run-time typecase. The latter requires run-time type representation, which is provided by the typeclass Typeable. The typeclass implements the method `cast' for a safe cast from the value of `generic type' to the value of the specific type.

We observe that the typecase, at the type level, has always existed in Haskell (although that fact was not perhaps realized until HList). That typecase, which does not need any `cast' operation, is the type equality predicate TypeEq. The type-level typecase is the essence of the present approach. Since the handler for each subterm is chosen statically, there is no need for run-time type representation.

The set of traversal strategies is extensible. To the pre-defined gmap, reduction, and two-terms-in-lockstep strategies, the programmer may at any time add a new class of traversals. The message `Smash-along' shows one such extension, lazy gmap, and its application to mapping of terms that do not exist.

We should note the duality of `stapply' with respect to typeclasses. Let us consider

class C a where fn :: a -> Int
instance C Bool where fn x = if x then 10 else 20
instance C Char where fn x = fromEnum x

The typeclass C declares the function fn overloaded over the argument types. When typechecking an application "fn x", the compiler selects the instance of "fn" by matching the type of "x" against the types of C's instances. The same functionality can be implemented with `stapply':

data ResFailure -- an abstract data type with no non-bottom values
                -- and no defined operations. An attempt to actually
                -- use it will trigger a type error

fn' x = 
    stapply ((\ (x::Bool) -> if x then (10::Int) else 20) :+:
             (\ (x::Char) -> fromEnum x) :+:
	     HNil) 
            x
	    (undefined::ResFailure)

testtyc_fn1 = fn' True
testtyc_fn2 = fn' 'x'
-- testtyc_fn3 = show $ fn' () -- type error, can't show ResFailure

(The function `stapply' has the `default' clause, for all other data types. One can do the same with the typeclasses, with the help of overlapping instances). With `stapply', the set of concrete processors is given explicitly as a list rather than implicitly as a set of all typeclass instances in effect. The implementation of `stapply' reifies typechecker's instance selection algorithm. Since the concrete processors are given to stapply in a list in a certain order, stapply easily deals with `overlapping' instances: the first matching processing function is chosen.

The stapply approach is comparable to vlookup in LIGD/Dynamics.lhs. However, in our case the `overriding', the dispatch, is decided statically.

The overloaded function fn differs from fn' in that the set of fn instances is open (and can be extended at any time by defining a new instance of the class C). In contrast, all instances of fn' are explicitly enumerated and their set is closed. This is however an artifact of the particular definition fn'. We could just as well write

fn1_insts = (SCons (\ (x::Bool) -> if x then 10 else 20)
                (SCons (\ (x::Char) -> fromEnum x)
		  SNil))
fn1' x = stapply fn1_insts x (undefined::ResFailure)
-- in another module, importing fn1_insts
fn1' x = stapply (SCons (\ () -> 100) fn1_insts) x (undefined::ResFailure)

Since fn1_insts is a list, we can inspect the instances, rearrange and remove them. These operations are not available with the typeclass instances. Thus stapply approach is more flexible and powerful. Finally, the set of stapply instances can be made open, as described in [http://www.haskell.org/pipermail/haskell/2006-October/018684.html Infinite, open, statically constrained HLists]


The present approach shares with SYB the requirement that specific processing functions (the ones that handle terms of particular types) must be monomorphic. That requirement can be relaxed, as described in [http://pobox.com/~oleg/ftp/Haskell/poly2.txt Type-class overloaded functions: second-order typeclass programming with backtracking]