union in Data.Map

Ross Paterson ross at soi.city.ac.uk
Tue Mar 1 18:49:08 CET 2011

[sidetracking from the discussion of unordered-containers]
On Thu, Feb 24, 2011 at 12:36:42PM +0000, Ross Paterson wrote:
> On Wed, Feb 23, 2011 at 08:45:47AM +0000, Max Bolingbroke wrote:
> >  3. Some map combining algorithms work more efficiently when one of
> > their two arguments are smaller. For example, Data.Map.union is most
> > efficient for (bigmap `union` smallmap). If you don't care about which
> > of the two input maps wins when they contain the same keys, you can
> > improve constant factors by testing the size of the map input to size
> > (in O(1)) and flipping the arguments if you got (smallmap `union`
> > bigmap) instead of the desirable way round.
> This is a most unfortunate feature of Data.Map, forcing users to bend the
> logic of their application to suit the library.

On closer inspection, I think the documentation is underselling the
code here.  From experiments adding 1000 trees of size 10 to a tree of
size 1000, it seems that the best order varies with the distribution of
keys, but on average they come out about even.  (Maybe small U big is
slightly faster.)

Also, I believe the hedge algorithm does achieve the asymptotic optimum,
namely O(m log(n/m)), where m and n are the sizes of the smaller and
larger trees respectively.  It does do more comparisons than necessary,
but these can be avoided, e.g. by altering hedgeUnionL to

hedgeUnionL :: Ord a
            => MaybeS a -> MaybeS a -> Map a b -> Map a b
            -> Map a b
hedgeUnionL _   _   t1  Tip
  = t1
hedgeUnionL blo bhi Tip (Bin _ kx x l r)
  = join kx x (filterGt blo l) (filterLt bhi r)
hedgeUnionL blo bhi (Bin _ k1 x1 l1 r1) t2@(Bin _ k2 _ l2 r2)
  = case compare k1 k2 of
    LT -> join k1 x1 (hedgeUnionL blo bmi l1 (trim blo bmi l2))
                     (hedgeUnionL bmi bhi r1 t2)
    EQ -> join k1 x1 (hedgeUnionL blo NothingS l1 (trim blo NothingS l2))
                     (hedgeUnionL NothingS bhi r1 (trim NothingS bhi r2))
    GT -> join k1 x1 (hedgeUnionL blo bmi l1 t2)
                     (hedgeUnionL bmi bhi r1 (trim bmi bhi r2))
    bmi = JustS k1

and similarly for difference.  The extra lookup in the With versions
seems harder to avoid, though.

More information about the Libraries mailing list