Proposal: add 'findLess' and variants to containers

Twan van Laarhoven twanvl at gmail.com
Fri Feb 17 23:20:10 CET 2012


On 2012-02-17 22:21, Twan van Laarhoven wrote:
> On 16/02/12 20:37, Johan Tibell wrote:
>> Hi,
>>
>> First, thanks for writing such a detailed proposal.
>>
>> I don't have strong feelings one way or the other. Like always I'm
>> worried about API growth. How about performance? Can these functions
>> be implemented efficiently outside the library?
>
> I also did some performance tests, comparing an implementation that uses only
> the exposed API (split,findMin) to one that uses the constructors. The benchmark
> code is attached, it is mostly copy-pasted from benchmarks/Map.hs. Here is a
> general idea of the results (ghc 7.0.4 on windows 7) :
>
> I haven't yet compared the performance for Set yet (which I expect will be
> similar), nor for IntMap (where I have no clue).

Here are the results for IntMap:

     benchmarking GE split present
     mean: 313.7273 us, lb 312.2985 us, ub 314.8706 us, ci 0.950
     std dev: 7.010637 us, lb 4.061641 us, ub 9.947760 us, ci 0.950

     benchmarking GE def present
     mean: 197.7439 us, lb 195.8572 us, ub 199.6307 us, ci 0.950
     std dev: 9.474155 us, lb 9.388414 us, ub 9.482203 us, ci 0.950

     benchmarking GE split absent
     mean: 625.5818 us, lb 621.1370 us, ub 630.5826 us, ci 0.950
     std dev: 24.49309 us, lb 21.45281 us, ub 26.63424 us, ci 0.950

     benchmarking GE Craig absent
     mean: 239.5751 us, lb 237.8357 us, ub 241.3148 us, ci 0.950
     std dev: 9.259975 us, lb 7.455007 us, ub 11.33443 us, ci 0.950

     benchmarking GE def absent
     mean: 186.1479 us, lb 184.3935 us, ub 187.7270 us, ci 0.950
     std dev: 8.600419 us, lb 8.080284 us, ub 8.809609 us, ci 0.950

As you can see, when the key is present in the map, a specialized version 
("def") takes 2/3 as long as one built on top of split. But when the key is not 
in the map, the difference becomes larger, around 1/3. For findGreater (as 
opposed to findGreaterEqual), we are essentially always in the 'not found in the 
map' case.

By the way, in all cases, both for Map and for IntMap, passing a default 
argument is better than returning nothing and then later unpacking that again. 
What I mean is:

     go _ Tip = Nothing
     go k (Bin _ kx x l r) =
         case compare k kx of
             LT -> case go k l of
                     Nothing -> Just (kx,x)
                     ret -> ret
             GT -> go k r
             EQ -> Just (kx,x)

vs.

     go def _ Tip = def
     go def k (Bin _ kx x l r) =
         case compare k kx of
             LT -> go (Just (kx,x)) k l
             GT -> go def k r
             EQ -> Just (kx,x)

If you want to include the attached code in the library, you should use 
Map_FindGE.findGreater{Equal}3 and IntMap_FindGE.findGreater{Equal}3, those are 
the most efficient versions.


Twan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: IntMap_FindGE.hs
URL: <http://www.haskell.org/pipermail/libraries/attachments/20120217/da6a08fe/attachment.asc>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: findGE-IntMap.hs
URL: <http://www.haskell.org/pipermail/libraries/attachments/20120217/da6a08fe/attachment.txt>


More information about the Libraries mailing list