Computing the final representation type of a TyCon (Was: Unpack primitive types by default in data)

Thu Nov 29 09:27:39 CET 2012

Hi all,

I've decided to try to implement the proposal included at the end of
this message. To do so I need to write a function

    hasPointerSizedRepr :: TyCon -> Bool

This function would check that the TyCon is either

 * a newtype whose representation type has a pointer-sized representation, or
 * an algebraic data type, with one field that has a pointer-sized
representation.

I'm kinda lost in all the data types that GHC defines to represent
types. I've gotten no further than

    hasPointerSizedRepr :: TyCon -> Bool
    hasPointerSizedRepr tc@(AlgTyCon {}) = case algTcRhs tc of
                                             DataTyCon{ data_cons = [data_con] }
                                                         -> ...
                                             NewTyCon { data_con = data_con }
                                                         -> ...
                                             _           -> False
    hasPointerSizedRepr _                = False

I could use some pointers (no pun intended!) at this point. The
function ought to return True for all the following types:

    data A = A Int#
    newtype B = B A
    data C = C !B
    data D = D !C
    data E = E !()
    data F = F !D

One part that confuses me is figuring out the representation type of a
data constructor after unpacking. For example, the function should not
return true if called on G in this example:

    data G = G !H
    data H = H {-# UNPACK #-} !I
    data I = I !Int !Int

because if we unpacked H into G's constructor it would take up two
words, due to I being unpacked.
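
To make the intended behaviour concrete, here is a toy model of the
check. It is only a sketch: Field, Decl, Env, wordsIfUnpacked and
fieldWords are made-up names, not GHC API, and the rule that () and
sum types stay behind a one-word pointer is an assumption chosen so
that E comes out True. Recursive declarations are not handled.

```haskell
-- Toy stand-ins for GHC's TyCon and DataCon; none of these names are GHC API.
data Field = Field
  { fUnpack :: Bool    -- strict field carrying an UNPACK pragma
  , fType   :: String  -- name of the field's type
  }

data Decl
  = Prim            -- word-sized unboxed primitive, e.g. Int#
  | Newtype String  -- newtype over the named type
  | Data [[Field]]  -- one list of fields per constructor

type Env = [(String, Decl)]

-- Words a field of this type would occupy if unpacked into its parent.
-- Assumption: anything other than a single-constructor type with at least
-- one field (sum types, (), unknown names) stays behind a one-word pointer.
wordsIfUnpacked :: Env -> String -> Int
wordsIfUnpacked env name = case lookup name env of
  Just Prim                -> 1
  Just (Newtype t)         -> wordsIfUnpacked env t
  Just (Data [fs@(_ : _)]) -> sum (map (fieldWords env) fs)
  _                        -> 1

-- Words a field occupies as stored: unpacked contents, or one word otherwise.
fieldWords :: Env -> Field -> Int
fieldWords env f
  | fUnpack f = wordsIfUnpacked env (fType f)
  | otherwise = 1

hasPointerSizedRepr :: Env -> String -> Bool
hasPointerSizedRepr env name = case lookup name env of
  Just (Newtype t)  -> hasPointerSizedRepr env t
  Just (Data [[f]]) -> wordsIfUnpacked env (fType f) == 1
  _                 -> False

plain, unpacked :: String -> Field
plain    = Field False
unpacked = Field True

-- The declarations from this message, transcribed by hand.
examples :: Env
examples =
  [ ("Int#", Prim)
  , ("Int",  Data [[plain "Int#"]])
  , ("()",   Data [[]])
  , ("A",    Data [[plain "Int#"]])
  , ("B",    Newtype "A")
  , ("C",    Data [[plain "B"]])
  , ("D",    Data [[plain "C"]])
  , ("E",    Data [[plain "()"]])
  , ("F",    Data [[plain "D"]])
  , ("G",    Data [[plain "H"]])
  , ("H",    Data [[unpacked "I"]])
  , ("I",    Data [[plain "Int", plain "Int"]])
  ]
```

In this model hasPointerSizedRepr examples "G" is False, because
wordsIfUnpacked examples "H" follows the UNPACK pragma into I and
counts two words, while A through F all come out True.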

Does DataCon contain the unpacked representation of the data
constructor or only the before-optimizations representation?

Cheers,
Johan

On Thu, Feb 16, 2012 at 4:25 PM, Johan Tibell <johan.tibell at gmail.com> wrote:
> Hi all,
>
> I've been thinking about this some more and I think we should
> definitely unpack primitive types (e.g. Int, Word, Float, Double,
> Char) by default.
>
> The worry is that reboxing will cost us, but I realized today that at
> least one other language, Java, does this already today and even
> though it hurts performance in some cases, it seems to be a win on
> average. In Java all primitive fields get auto-boxed/unboxed when
> stored in polymorphic fields (e.g. in a HashMap which stores keys and
> values as Object pointers.) This seems analogous to our case, except
> we might also unbox when calling lazy functions.
>
> Here's an idea of how to test this hypothesis:
>
>  1. Get a bunch of benchmarks.
>  2. Change GHC to make UNPACK a no-op for primitive types (as library
> authors have already worked around the lack of unpacking by using this
> pragma.)
>  3. Run the benchmarks.
>  4. Change GHC to always unpack primitive types (regardless of the
> presence of an UNPACK pragma.)
>  5. Run the benchmarks.
>  6. Compare the results.
>
> Number (1) might be what's holding us back right now, as we feel that
> we don't have a good benchmark set. I suggest we try with nofib first
> and see if there's a difference and then move on to e.g. the shootout
> benchmarks.
>
> I imagine that ignoring UNPACK pragmas selectively wouldn't be too
> hard. Where is the relevant code?
>
> Cheers,
> Johan