newtype/datatype (was efficiency)

Iavor Diatchki diatchki@cse.ogi.edu
Wed, 16 Jan 2002 08:19:47 -0800


hello,

it looks like a lot of Haskell experts are at POPL (prog. language conference)
and are not checking their email... so here is my attempt at explaining the
differences between data and newtype.

data T              -- has 1 vale: _|_
data T = T          -- has 2 values: _|_, T
data T = T ()       -- has 3 values: _|_, T _|_, T ()  
newtype T = T ()    -- has 2 values: _|_, T ()

basically the "data" declaration in Haskell "lifts" a type, by adding a
new _|_ element.  the reason for this is that in general a data may have
different shapes (hence the constructors) and while trying to make a value 
of the type one might non-terminate before they computed what constructor
should be applied.  the annoying thing is that this happens even for
datatypes which dont have different shapes, i.e. data declarations with
only one different shape. Even if there is only one possible shape we always
remember the shape (constructor) in the representation and when pattern matching
check to see if the value is of this shape.  i dont know why.

to (kind of) avoid this, if you have a datatype with only one shape 
and one field you should use a newtype.  unfortunatelly if you have multiple
fields you are stuck with the additional _|_. i dont think there should be
any difference in efficiency between:

data P a b =  P a b
and
newtype P a b = P (a,b)

unless there are some hard-coded tuple specific optimizations done by the 
current implementations. i cannot see any reasons why these optimizations
shouldnt happen for the data declarartion as well (i am not experienced 
compiler writer however...)   which of the above is better is a matter of personal
preference - i prefer the 1st one as i think it leads to more 
readable patterns.

finaly there was the question about strictness annotations - they seem
to be quite orthogonal to the whole newtype/data issue and enable you
to specify that some constructors of your datatype are strict - i.e. when you
construct something with a strict constructor it will first be evalued
(until it becomes a function or an application of a lazy constructor to
some arguments) and only then will the disired value be created.  since
this evaluation might not terminate of course, some "different" values in
the lazy version of the datatype all collapse to teh same thing. so
in some sense newtype makes things more lazy, while ! makes them more eager.

sorry for the long email
hope it helped
iavor

-- 
---------------------------------------+----------------------------------------
Iavor S. Diatchki                      | email: diatchki@cse.ogi.edu           
Dept. of Computer Science              | web: http://www.cse.ogi.edu/~diatchki
OGI School of Science and Engineering  | work phone: 5037481631                
OHSU                                   | home phone: 5036434085                
---------------------------------------+----------------------------------------