# union in Data.Map

Ross Paterson ross at soi.city.ac.uk
Tue Mar 1 18:49:08 CET 2011

```[sidetracking from the discussion of unordered-containers]
On Thu, Feb 24, 2011 at 12:36:42PM +0000, Ross Paterson wrote:
> On Wed, Feb 23, 2011 at 08:45:47AM +0000, Max Bolingbroke wrote:
> >  3. Some map combining algorithms work more efficiently when one of
> > their two arguments are smaller. For example, Data.Map.union is most
> > efficient for (bigmap `union` smallmap). If you don't care about which
> > of the two input maps wins when they contain the same keys, you can
> > improve constant factors by testing the size of the map input to size
> > (in O(1)) and flipping the arguments if you got (smallmap `union`
> > bigmap) instead of the desirable way round.
>
> This is a most unfortunate feature of Data.Map, forcing users to bend the
> logic of their application to suit the library.

On closer inspection, I think the documentation is underselling the
code here.  From experiments adding 1000 trees of size 10 to a tree of
size 1000, it seems that the best order varies with the distribution of
keys, but on average they come out about even.  (Maybe small U big is
slightly faster.)

Also, I believe the hedge algorithm does achieve the asymptotic optimum,
namely O(m log(n/m)), where m and n are the sizes of the smaller and
larger trees respectively.  It does do more comparisons than necessary,
but these can be avoided, e.g. by altering hedgeUnionL to

hedgeUnionL :: Ord a
=> MaybeS a -> MaybeS a -> Map a b -> Map a b
-> Map a b
hedgeUnionL _   _   t1  Tip
= t1
hedgeUnionL blo bhi Tip (Bin _ kx x l r)
= join kx x (filterGt blo l) (filterLt bhi r)
hedgeUnionL blo bhi (Bin _ k1 x1 l1 r1) t2@(Bin _ k2 _ l2 r2)
= case compare k1 k2 of
LT -> join k1 x1 (hedgeUnionL blo bmi l1 (trim blo bmi l2))
(hedgeUnionL bmi bhi r1 t2)
EQ -> join k1 x1 (hedgeUnionL blo NothingS l1 (trim blo NothingS l2))
(hedgeUnionL NothingS bhi r1 (trim NothingS bhi r2))
GT -> join k1 x1 (hedgeUnionL blo bmi l1 t2)
(hedgeUnionL bmi bhi r1 (trim bmi bhi r2))
where
bmi = JustS k1

and similarly for difference.  The extra lookup in the With versions
seems harder to avoid, though.

```