Proposal: add Int indexing functions to Data.Set

wren ng thornton wren at freegeek.org
Sat Apr 30 23:30:24 CEST 2011


On 4/29/11 1:33 AM, Luis Casillas wrote:
> I'm not at all committed to having the order of the indexes reflect the order of the set members or any specific internal property of the set implementation.  I'm happy with any one-to-one, reversible mapping between sets and Ints in the range [0..(size-1)].  I'm also happy if the mapping changes between library versions (or heck, even between different invocations of the same program).

It sounds like what you want is an intern table. That's a pretty common 
need, though not something I think ought to be pushed onto 
Data.{Set,Map}. In part because the needs for intern tables can vary by 
application, so there's no single implementation for everyone.

A simple intern table can be done by:

     module Intern (Intern(), empty, intern, extern) where
     import qualified Data.Map    as M
     import qualified Data.IntMap as IM

     data Intern a = Intern
         { int2elem :: IM.IntMap a
         , elem2int :: M.Map a Int
         , nextInt  :: !Int
         }

     empty = Intern IM.empty M.empty minBound

     intern a self@(Intern i2a a2i n) =
         case M.lookup a a2i of
         Just i  -> Just (i, self)
         Nothing
             | n == maxBound -> Nothing
             | otherwise ->
                 Just (n, Intern
                     (IM.insert n a i2a)
                     (M.insert a n a2i)
                     (n+1))

     extern i (Intern i2a a2i n) = IM.lookup i i2a


Though it should be easy to see the room for variation. E.g.,

* sometimes you want to store the highest used key, other times the 
lowest unused key is more useful.

* sometimes you want to allow for invalidating old keys.

* sometimes you want to allow for recycling old keys ala garbage collection.

* generally the key type should be abstract instead of an Int; but how 
abstract is "abstract"?

* sometimes it's helpful to know whether the key returned by 'intern' is 
freshly allocated or not.

* sometimes you'd be better off using Data.HashMap instead of Data.Map; 
sometimes you'd be better off with a hash-consing trie.

...

-- 
Live well,
~wren



More information about the Libraries mailing list