Faster IntSet by using BitMaps in the lower branches

Joachim Breitner mail at joachim-breitner.de
Sun Sep 18 22:47:23 CEST 2011


Hi,

Am Sonntag, den 18.09.2011, 22:13 +0200 schrieb Henning Thielemann:
> On Sun, 18 Sep 2011, Joachim Breitner wrote:
> > I have attached some benchmarking result. While results are good for
> > member, great for insert, intersection and union, toList is slower for
> > sparse maps. toList is basically a foldr, so I think the culprit is this
> > function:
> >
> > foldrBits :: Int -> (Int -> a -> a) -> a -> Word -> a
> > foldrBits shift f x = go shift
> >  where STRICT_1_OF_2(go)
> >        go bi 0 = x
> >        go bi n | n `testBit` 0 = f bi (go (succ bi) (n `shiftRL` 1))
> >                | otherwise     =       go (succ bi) (n `shiftRL` 1)
> >
> > I’ll try to optimize this function individually now, any suggestions are
> > welcome.
> 
> You can certainly do some binary search by masking and comparing with bit 
> patterns like
>     1 `shiftL` 32 - 1 `shiftL` 16
>     1 `shiftL` 16 - 1 `shiftL` 0

I’d like to avoid the binary search, as it is more expensive for dense
sets. Milan’s suggestion of shifts by 6 might be a good compromise.
Another approach might be to first use lowestBitSet to start with the
lowest bit. In case of only one bit set, it will not iterate further
then.

I tried adding some strictness annotation to go and see if that helps.
According to the attachement, it does, instead of 4 times slower in the
worst case (Size 4 million, step 100) it is only ~2,2x slower.


What does "Str=DmdType U(L)U(L)m" in -fdump-stranal mean?


Acccording to the core, GHCziPrim.uncheckedShiftRLzh is used, and also
succ gets properly resolved to GHCziPrim.zpzh.


Intersection is slower because the intersection of a Tip with a Bin
cannot just be resolved by a single lookup. Might this be related to the
following and if yes, what I can do about it?

SpecConstr
    Function `main:Data.DenseIntSet.intersection{v r1dK} [lidx]'
      has four call patterns, but the limit is 3
    Use -fspec-constr-count=n to set the bound
    Use -dppr-debug to see specialisations

Greetings,
Joachim

-- 
Joachim "nomeata" Breitner
  mail at joachim-breitner.de  |  nomeata at debian.org  |  GPG: 0x4743206C
  xmpp: nomeata at joachim-breitner.de | http://www.joachim-breitner.de/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://www.haskell.org/pipermail/libraries/attachments/20110918/639661e2/attachment.pgp>


More information about the Libraries mailing list