Applications and libraries/Generic programming/Smash

From HaskellWiki
< Applications and libraries‎ | Generic programming
Revision as of 01:21, 14 May 2007 by Oleg (talk | contribs) (Initial submission)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Approach: Smash your boilerplate

Required features/Portability

  • The type-level typecase/TypeEq requires overlapping and undecidable instances. The use of these extensions is limited to the library only. The end user code needs no such extensions.
  • No need for Typeable, nor higher-rank types


Expressibility

  • Can do both producer and consumer functions: e.g., gsize (consumer), replacing all Floats with Double (producer: the return type is determined by the original term's type/structure). The return type is computed by the typechecker rather than has to be specified by the user.
  • Dat/TDat duality
  • Local redefinitions are possible, see Discussion.

Subset of data types covered

  • Whatever TDat instances are declared
  • All primitive and algebraic types are supported. Function spaces. No Dynamic types. Polymorphic types can be handled, via typeclass proxies.

Usage

  • Library Writer: writes the core library (e.g., STApply class) and declares Dat and TDat classes. This is done once and for all.
  • Power User: (new datatype): defines Dat and TDat instances
  • User: (new generic function): defines a new generic function by combining, in a HList, functions processing values of specific data types, with an appropriate generic function (cf. `everywhere') that does generic traversal. Polymorphism (when specific functions are polymorphic) requires special handling (see below).
  • End User: apply the generic function like an ordinary function

Error Messages

  • Can be hard to grasp??

Amount of work per data type (Boilerplate)

  • instances of Dat and TDat. Dat is especially simple (essentially like deriving a Functor). So far, there is no satisfactory way to merge Dat and TDat (of course, one can always jam them together. Alas, the result is not less than the sum of the parts.

Extensibility

  • Full

Reasoning

TODO ??? what does that mean?

Performance considerations

  • Although that is hidden from the end user, the library heavily relies on the typeclasses. Thus the efficiency of the typeclass implementation has direct impact.
  • Choosing the processor for each subterm is done at the compile time. There is no passing and interpreting types at run time (except when a compiler uses that to implement typeclasses).

Helpful extra features

  • TypeEq and TypeCast as built-ins would have helped.
  • Deriving for Dat and TDat (the same problem as with various SYB approaches)

Discussion

Our generic functions are composed of specific and generic parts. A specific part tells what to do if the input term happens to be of a particular type. The generic part tells us what to do with all other terms. The specific part is just an HList of ordinary functions, with different argument types. If the type of the input term matches the argument type of one of the functions in the list, we apply that function. If no specific function applies, we do the generic action, implicit in the traversal strategy (for example, apply the generic function to the subterms and reduce the results).

SYB1, when traversing terms and invoking user functions for subterms of a particular type, relies on a run-time typecase. The latter requires run-time type representation, which is provided by the typeclass Typeable. The typeclass implements the method `cast' for a safe cast from the value of `generic type' to the value of the specific type.

We observe that the typecase, at the type level, has always existed in Haskell (although that fact was not perhaps realized until HList). That typecase, which does not need any `cast' operation, is the type equality predicate TypeEq. The type-level typecase is the essence of the present approach. Since the handler for each subterm is chosen statically, there is no need for run-time type representation.


We should note the duality of `sapply' with respect to typeclasses. Let us consider

class C a where fn :: a -> Int
instance C Bool where fn x = if x then 10 else 20
instance C Char where fn x = fromEnum x

The typeclass C declares the function fn overloaded over the argument types. When typechecking an application "fn x", the compiler selects the instance of "fn" by matching the type of "x" against the types of C's instances. The same functionality can be implemented with `sapply':

fn' x = sapply (SCons (\ (x::Bool) -> if x then 10 else 20)
                 (SCons (\ (x::Char) -> fromEnum x)
		  SNil)) ((error "no match")::Int) x

(The function `sapply' has the `default' clause, for all other data types. One can do the same with the typeclasses, with the help of overlapping instances). With `sapply', the set of concrete processors is given explicitly as a list rather than implicitly as a set of all typeclass instances in effect. The implementation of `sapply' reifies typechecker's instance selection algorithm. Since the concrete processors are given to sapply in a list in a certain order, sapply easily deals with `overlapping' instances: the first matching processing function is chosen.

The sapply approach is comparable to vlookup in LIGD/Dynamics.lhs. However, in our case the `overriding', the dispatch, is decided statically.

The overloaded function fn differs from fn' in that the set of fn instances is open (and can be extended at any time by defining a new instance of the class C). In contrast, all instances of fn' are explicitly enumerated and their set is closed. This is however an artifact of the particular definition fn'. We could just as well write

fn1_insts = (SCons (\ (x::Bool) -> if x then 10 else 20)
                (SCons (\ (x::Char) -> fromEnum x)
		  SNil))
fn1' x = sapply fn1_insts ((error "no match")::Int) x
-- in another module, importing fn1_insts
fn1' x = sapply (SCons (\ () -> 100) fn1_insts) ((error "no match")::Int) x

Since fn1_insts is a list, we can inspect the instances, rearrange and remove them. These operations are not available with the typeclass instances. Thus sapply approach is more flexible and powerful. Finally, the set of sapply instances can be made open, as described in [http://www.haskell.org/pipermail/haskell/2006-October/018684.html Infinite, open, statically constrained HLists]


The present approach shares with SYB the requirement that specific processing functions (the ones that handle terms of particular types) must be monomorphic. That requirement can be relaxed, as described in [http://pobox.com/~oleg/ftp/Haskell/poly2.txt Type-class overloaded functions: second-order typeclass programming with backtracking]