Generic tries (long)

Sat Jun 21 11:31:01 EDT 2008

Hi Jamie,

thanks for doing this---I think that the functions in the API look
good.  Of course, one can always think of more :-)  One that I often
find useful when working with Data.Map is a function that removes a
key from a map and also returns the associated value---a kind of
destructive lookup.

I think that the library would be better if you exposed a
non-overloaded API for the map(s?) that you plan to implement.  This
is useful because (i) most of the type I anticipiate using a single
type of map, and concrete types lead to more accurate types in type
inference, (ii) it makes the API Haskell'98 and hence likely to work
on most Haskell implementations out there.   If you really think that
the overloading is useful, then you could provide separate modules
that provide the class+instances in terms of the non-overloaded API
and add a cabal flag to turn these on/off.

Also, as others mentioned, using Int# in an API is not great because
this is a GHC internal type.  If you found situatuins where GHC is not
unboxing something please send a mail to the GHC team---they are
usually very good at fixing these sorts of things.  Adrian mentioned
that you use CPP to control the use of Int# but this does not help if
the Int# leaks through the API:  (i) I could still have programs that
work in Hugs and fail to compile in GHC (and vice versa), (ii) if I
want to use your map library, I would have to compile all my programs
with the CPP extensions, which is not nice.

Hope this helps,
-Iavor

On 6/16/08, Jamie Brandon <jcb73 at cam.ac.uk> wrote:
> Hi, I'm writing a library for generic tries for the Summer of Code.
>  The main point of this post is to get some feedback on the api but
>  I'll briefly explain the idea.
>
>  The point of a trie is to exploit the recursive nature of ADTs to save
>  on expensive key comparisons and reduce space consumption. Hinze'
>  original formulation is very elegant but results in very deep
>  structures and is fairly inefficient. The normal trie optimisations
>  (concatenating singleton maps, mainly) cant be applied to the generic
>  version.
>
>  I intend instead to encode ADTs as lists. This encoding can range from
>  a simple walk of the ADT to creating a compressed, bit packing
>  representation. The resulting tradeoff between encoding time and space
>  usage should make the design fairly flexible. This will look something
>  like:
>
>  class Serial k where
>
>  -- | Flattened form of key consists of a list of Nodes
>  -- Node will be Int or UArray Int for compressed implementations
>  type Node k
>
>  -- | Flatten to a list of Nodes
>  serialise :: k -> [Node k]
>
>  -- | Reconstruct from a list of Nodes
>  unserialise :: [Node k] -> k
>
>  The api below is mostly cribbed from Adrian Hey's initial design.
>  Guarantees about ordering will probably vary between maps. All
>  ascending have descending version too. Strict versions of functions
>  will be written where appropriate, I've omitted them here for brevity.
>  Key reconstruction is likely to be expensive so it may make more sense
>  to seperate foldrKeys and friends into a seperate class.
>
>  Adrian has written instance of GMap for lists, UInts and Ord types so
>  I can declare
>  instance (Serial k) => GMap (ListGMap (Node k)) k where ... etc
>
>  I should have a spot on code.haskell.org soon, at which point I'll put
>  up a Haddock page with the most up to date version of the api.
>
>  class GMap map where
>
>  type k
>
>  -- | The empty map.
>  empty :: map a
>
>  -- | Create a map with a single association.
>  singleton :: k -> a -> map a
>
>  -- | Create a map from a list of associations which /must/ be in
>  ascending order of keys
>  -- (with /no/ duplicate keys). If in doubt use one of the safer (but
>  slower) 'fromAssocs' functions.
>  fromAssocsAscending :: [(k,a)] -> map a
>
>  -- | Return 'True' if the map contains no associations.
>  isEmpty :: map a -> Bool
>
>  -- | Return 'True' if the map contains exactly one association.
>  isSingleton :: map a -> Bool
>
>  -- | Return the value associated with the supplied key (if any).
>  lookup :: k -> map a -> Maybe a
>
>  -- | Insert a new association in the map if there is currently no
>  value associated with the key.
>  -- If there is a value associated with the key then replace it with
>  the result of
>  -- applying the supplied function to that value.
>  insert :: (a -> a) -> k -> a -> map a -> map a
>
>  -- | Delete the association for the supplied key (if any).
>  delete :: k -> map a -> map a
>
>  -- | This is a combined insert\/modify\/delete operation. The argument
>  to the supplied function
>  -- is ('Just' a) if there is a value (a) associated with the supplied
>  key, otherwise 'Nothing'.
>  -- If the return value is ('Just' a'), a' becomes the new value
>  associated with the supplied key.
>  -- If the return value is 'Nothing', the association for the supplied
>  key (if any) is deleted.
>  alter :: (Maybe a -> Maybe a) -> k -> map a -> map a
>
>  -- | Evaluate the union of two maps. If the maps contain common keys
>  then combine the
>  -- values associated with those keys using the supplied function. The
>  value arguments
>  -- to this function are supplied in the same order as the map arguments.
>  union :: (a -> a -> a) -> map a -> map a -> map a
>
>  -- | Evaluate the intersection of two maps, combining common
>  associations using the supplied function.
>  intersection :: (a -> b -> c) -> map a -> map b -> map c
>
>  -- | Evaluate the difference between two maps. For any key occuring in
>  the second map,
>  -- the corresponding association (if any) is deleted from the first map.
>  -- The associated values in the second map are irrelevant.
>  difference :: map a -> map b -> map a
>
>  -- | Returns true if the keys in the first map are a subset of the
>  keys in the second map.
>  -- (This includes the case where the key sets are identical). Note
>  that this function does
>  -- not examine the associated values (which are irrelevant). See
>  'isSubmapOf' if you
>  -- do want associated values examined.
>  isSubsetOf :: map a -> map b -> Bool
>
>  -- | Returns true if the keys in the first map are a subset of the
>  keys in the second map
>  -- and the corresponding function always returns true when applied to
>  the values associated
>  -- with matching keys.
>  isSubmapOf :: (a -> b -> Bool) -> map a -> map b -> Bool
>
>  -- | Apply the supplied function to every associated value in the map.
>  map :: (a -> b) -> map a -> map b
>
>  -- | Apply the supplied function to every association in the map, and
>  use the result
>  -- as the new associated value for the corresponding key.
>  mapWithKey :: (k -> a -> b) -> map a -> map b
>
>  -- | Delete associations for which the supplied predicate returns
>  'False' when applied to
>  -- the associated value.
>  filter :: (a -> Bool) -> map a -> map a
>
>  -- | Fold right over the list of elements in ascending order of keys.
>  -- See 'foldrElemsAscending'' for a strict version of this function.
>  foldrElemsAscending :: (a -> b -> b) -> map a -> b -> b
>
>  -- | Fold right over the list of keys in ascending order.
>  -- See 'foldrKeysAscending'' for a strict version of this function.
>  foldrKeysAscending :: (k -> b -> b) -> map a -> b -> b
>
>  -- | Fold right over the list of associations in ascending order of keys.
>  -- See 'foldrAssocsAscending'' for a strict version of this function.
>  foldrAssocsAscending :: (k -> a -> b -> b) -> map a -> b -> b
>
>  -- | Fold over elements in un-specified order using /unboxed/ Int
>  accumulator (with GHC).
>  -- Defaults to boxed Int for other Haskells. Typically used for
>  counting functions.
>  -- Implementations are free to traverse the map in any order.
>  -- The folded function is always applied strictly.
>  foldElemsUINT :: (a -> UINT -> UINT) -> map a -> UINT -> UINT
>
>  In addition there a few functions which are useful for groups of maps
>  or nested maps.
>
>  -- | Add the number of associations in a map to the supplied /unboxed/
>  Int (with GHC).
>  -- Defaults to boxed Int for other Haskells.
>  addSize :: map a -> UINT -> UINT
>
>  -- | Find the value associated with the supplied key (if any) and
>  return the result
>  -- of applying the supplied continuation function to that value.
>  Useful for nested lookup.
>  lookupCont :: (a -> Maybe b) -> k -> map a -> Maybe b
>
>  -- | Reject empty maps (return Nothing).
>  nonEmpty :: map a -> Maybe (map a)
>
>  The following functions are useful internally as most of the maps are
>  defined in terms of simpler maps. Also see this thread on the need for
>  unionMaybe.
>
>  -- | Similar to 'insert', but the association is deleted if the
>  supplied function returns 'Nothing'.
>  -- (The supplied function is always applied strictly.)
>  insertMaybe :: (a -> Maybe a) -> k -> a -> map a -> map a
>
>  -- | Find the value associated with the supplied key (if any) and
>  apply the supplied function
>  -- to that value. Delete the association if the result is 'Nothing'.
>  Replace the old value with
>  -- the new value if the result is ('Just' something).
>  -- (The supplied function is always applied strictly.)
>  deleteMaybe :: (a -> Maybe a) -> k -> map a -> map a
>
>  -- | Evaluate the union of two maps, but delete combined associations
>  from the result map
>  -- if the combining function returns 'Nothing'.
>  -- (The combining function is always applied strictly.)
>  unionMaybe :: (a -> a -> Maybe a) -> map a -> map a -> map a
>
>  -- | Evaluate the intersection of two maps, but delete combined
>  associations from the result map
>  -- if the combining function returns 'Nothing'.
>  -- (The combining function is always applied strictly.)
>  intersectionMaybe :: (a -> b -> Maybe c) -> map a -> map b -> map c
>
>  -- | Difference with a combining function. If the combining function returns
>  -- @Just a@ then the corresponding association is not deleted from the
>  result map
>  -- (it is retained with @a@ as the associated value).
>  differenceMaybe :: (a -> b -> Maybe a) -> map a -> map b -> map a
>
>  -- | Apply the supplied function to every associated value in the map.
>  -- If the result is 'Nothing' then the delete the corresponding association.
>  -- (The supplied function is always applied strictly.)
>  mapMaybe :: (a -> Maybe b) -> map a -> map b
>
>  Thanks
>
>  Jamie
>  _______________________________________________
>  Libraries mailing list
>  Libraries at haskell.org
>  http://www.haskell.org/mailman/listinfo/libraries
>