[Haskell-cafe] Re: ANN: First Monad Tutorial of the Season

Tue Aug 26 04:19:49 EDT 2008

Hans van Thiel wrote:
> As a general comment on the teaching of Haskell, all books and
> tutorials, which I've seen, appear to treat this aspect of Haskell as if
> it were self explanatory. This while the better known imperative
> languages don't have anything like it. Only Real World Haskell explains
> algebraic data types to some satisfaction (IMHO, of course).

(Hopefully this different take on it helps more than it hurts...)

In addition to keeping the type-level and the value-level separated, 
Haskell does a little bit to keep the type/interface-level and the 
implementation-level separate.

The "data" keyword introduces both a new type and also a new 
implementation. This is the only way of introducing new implementations. 
ADTs are beauty incarnate, but unfortunately not well known outside of 
functional languages and abstract algebra.

The "newtype" keyword introduces a new type, but it reuses an old 
implementation under the covers. Even though they have the same 
underlying implementation, the newtype and the type of the old 
implementation are considered entirely different semantically and so one 
cannot be used in lieu of the other.

The dubiously named "type" keyword introduces an alias shorthand for 
some type that already exists. These aliases are, in a sense, never 
checked; that is, the macro is just expanded. This means that we can't 
carry any additional semantic information by using aliases and so if we 
have:

     type Celsius    = Int
     type Fahrenheit = Int

...the type checker will do nothing to save us. If we wanted to add 
semantic tags to the Int type in order to say what units the number 
represents, then we could do that with a "newtype" and the type checker 
would ensure that we didn't mix units.

Re "data" vs "newtype", where a newtype is possible (single data 
constructor, which has exactly one argument) there are still a few 
differences at the semantic level. Since a newtype's data constructor 
doesn't exist at runtime, evaluating a newtype to WHNF will evaluate the 
argument to WHNF; hence a newtype can be thought of as the data version 
with an obligatory strictness annotation. In terms of bottom, this means 
that:

     data Foo = Foo Int

...has both _|_ and (Foo _|_). Whereas, both of the following:

     data    Foo = Foo !Int
     newtype Foo = Foo Int

...have only _|_.

It should also be noted that the overhead for newtypes is not *always* 
removed. In particular, if we have the following definitions:

     data    Z   = Z
     newtype S a = S a

We must keep the tags (i.e. boxes) for S around because (S Z) and (S (S 
Z)) need to be distinguishable. This only really comes up with 
polymorphic newtypes (since that enables recursion), and it highlights 
the difference between strict fields and unpacked strict fields. 
Typically newtypes are unpacked as well as strict (hence no runtime tag 
overhead), but it's not guaranteed.

Another operational difference between newtypes and an equivalent data 
declaration has to do with the type class dictionaries. IIRC, with 
GeneralizedNewtypeDeriving the newtype actually uses the exact same 
dictionaries as the underlying type, thus avoiding unwrapping/rewrapping 
overhead. I'm somewhat fuzzy on all the details here, bit its another 
reason to use newtypes when you can.

-- 
Live well,
~wren