Tillmann Rendel rendel at informatik.uni-marburg.de
Sat Jun 16 09:57:20 CEST 2012

```Hi George,

George Giorgidze wrote:
> I would like to announce the first release of the set-monad library.
>

Very cool. Seems to work fine. But I am wondering about the impact of
using your package on asymptotic complexity (and thereby, on performance).

> data Set a where
>   Prim   :: (Ord a) => S.Set a -> Set a
>   [...]
>   Zero   :: Set a
>   Plus   :: Set a -> Set a -> Set a

I notice that this part of your datatype looks like a binary tree of
sets. Lets see how your run function treats it:

> run :: (Ord a) => Set a -> S.Set a
> run (Prim s)              = s
> [...]
> run (Zero)                = S.empty
> run (Plus ma mb)          = S.union (run ma) (run mb)

As expected, the Prim-Zero-Plus structure is treated as a binary tree of
sets, and run collects all the sets together. That is probably fine,
because it should have the same complexity as combining these sets in
the first place.

> run (Bind (Prim s) f)     = S.foldl' S.union S.empty (S.map (run . f) s)
> [...]
> run (Bind Zero _)         = S.empty
> run (Bind (Plus ma mb) f) = run (Plus (Bind ma f) (Bind mb f))
> [...]

But when I use the binary tree of sets on the left-hand side of a bind,
your run function has to naively traverse the whole tree. So if the same
elements are included in many sets in the tree of sets, these elements
will be processed more than once, so the overall complexity is worse
than necessary.

Here's a ghci session that seems to confirm my suspicion. I first define
> \$ let s1 `times` s2 = do {e1 <- s1; e2 <- s2; return (e1, e2)}

Now I produce some test data:
> \$ let numbers = fromList [1 .. 1000]
> \$ let unioned = numbers `union` numbers
> \$ let mplused = numbers `mplus` numbers

Note that these three sets are all equivalent.
> \$ (numbers == unioned, numbers == mplused, unioned == mplused)
> (True, True, True)

However, they behave differently when passed to the times function
above. numbers and unioned are similarly "fast":
> \$ :set +s
> \$ size \$ numbers `times` numbers
> 1000000
> (2.56 secs, 1315452984 bytes)
>
> \$ size \$ unioned `times` unioned
> (2.39 secs, 1314950600 bytes)
> 1000000

(Why is unioned faster then numbers? Is union doing some rebalancing?
Can I trigger that effect directly?)

But mplused is much slower:
> \$ size \$ mplused `times` mplused
> 1000000
> (10.83 secs, 5324731444 bytes)

I suspect that I can achieve similar results by using the list monad and
converting to a set in the very end. In what situations can I benefit