Hi all,<br><br>Inspired by the generic maps example at<br><br>    <a href="http://www.haskell.org/haskellwiki/GHC/Indexed_types" target="_blank">http://www.haskell.org/haskellwiki/GHC/Indexed_types</a><br><br>I tried to use associated data types to create a generic finite map that unpacks both the key and value into the leaf data constructor.<br>


<br>This makes functions such as lookup faster as the key can be accessed directly instead than via an indirection. It also makes the data structure more space efficient (4 words less per key/value pair for weight balanced trees), which makes it possible to fit more data in main memory (and cache). Memory overhead is important when working with &quot;Big Data&quot; processing, where fitting as much data in memory as possible is important. Working with big data sets is something done daily at companies like Google, Microsoft, Yahoo, Twitter, Facebook, etc.<br>


<br>We can achieve the above goals using an associated data type like so:<br><br>    {-# LANGUAGE MultiParamTypeClasses, TypeFamilies #-}<br>    module Ex where<br><br>    class Unbox k v where<br>        data Map k v :: *<br>


        empty       :: Map k v<br>        lookup      :: k -&gt; Map k v -&gt; Maybe v<br>        insert      :: k -&gt; v -&gt; Map k v -&gt; Map k v<br><br>and an instance<br><br>    instance Unbox Int Double where<br>

        data Map Int Double = TipIntDouble<br>

                            | BinIntDouble {-# UNPACK #-} !Size<br>                                           {-# UNPACK #-} !Int<br>                                           {-# UNPACK #-} !Double<br>                                           !(Map Int Double)<br>


                                           !(Map Int Double) <br>                                           <br>        -- implementation elided<br>        empty = undefined<br>        lookup k m = undefined<br>        insert k v m = undefined<br>


      <br>    type Size = Int<br><br>However, if we try to apply this method to large programs we run into problems: we need to defined instances for a large number of combinations of keys/values. We could generate a large number of instances, hoping that these will be enough for most users&#39; needs, using Template Haskell or CPP. However, the potential number of instances is very large, about a hundred if you consider only Prelude types and tens of thousands if you include tuples. We cannot add instances for types not defined in base without adding a dependency on all libraries which data types we want to add instances for.<br>


<br>Since we cannot define all instances up-front we&#39;ll have to rely on the user to create instances for the combinations she needs. This is tedious work for the user; most of the time the instance is an exact copy of the above instance for Int/Double, modulo renaming of the type arguments and the constructor names.<br>


<br>Unfortunately our problems don&#39;t end here. If we assume for a second that the user writes the necessary boilerplate (perhaps using a Template Haskell function that generates it) there are still more problems ahead. It&#39;s quite likely that two different libraries wants an instance for the same types, and each declare one locally. However, now the poor user can&#39;t use both libraries as there are conflicting instances (or can she using some extension?) and imports always bring in instances. This problem exists for type classes in general but we only use ten or so type classes in most Haskell programs (e.g Functor, Monad, Eq, Ord, and Show) so it doesn&#39;t seem to be a big problem so far.<br>


<br>What to do? It seems that associated data types might not be right tool for this problem. Could it be extended to work well for this use case? Can it be made to *scale* to large programs.<br><br>Here&#39;s an idea: allow default implementations of associated data types, just like for methods<br>


<br>    class Unbox k v where<br>        data Map k v :: *<br>        empty       :: Map k v<br>        lookup      :: k -&gt; Map k v -&gt; Maybe v<br>        insert      :: k -&gt; v -&gt; Map k v -&gt; Map k v<br><br>

        data Map Int Double = Tip<br>

                            | Bin {-# UNPACK #-} !Size<br>                                  {-# UNPACK #-} !k<br>                                  {-# UNPACK #-} !v<br>                                  !(Map k v)<br>                                  !(Map k v) <br>


<br>        -- implementation elided<br>        empty = undefined<br>        lookup k m = undefined<br>        insert k v m = undefined<br><br>and export the definition in the interface file. This would allow instantiation of the type class without boilerplate<br>


<br>    instance Unbox Int Double  -- no &quot;body&quot;<br><br>The compiler would perhaps have to generate unique names for the constructors for this to work.<br><br>This is not enough. We still have two problems left:<br>


<br>    * boilerplate instance declarations (but less so that before), and<br>    * instance collisions.<br><br>Could we automatic create instances whenever the user mentions a type class? For example, if a program mentions<br>


<br>    f :: Map Int Double -&gt; ...<br><br>we know we need an instance for Int/Double and if we can&#39;t find one we derive one using the default definition. It is all the user needs to do when using the classic containers package. This would completely remove the boilerplate instance declarations.<br>


<br>We could still use OverlappingInstances to allow the user to provide more specific instances (with a different implementation), if needed. This is akin to C++ template specialization.<br><br>The compiler will need to help us with the instance collision problem in that it only generates on instance even if the same type parameters are used with the Map type in several different modules.<br>


<br>Summary: While associated data types in theory allows us to create more efficient data structures, the feature doesn&#39;t seem to scale to large programs, for this use case.<br><br>Cheers,<br>Johan<br><br>