Oh, god, so much to respond to...heh.<br><br><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
Submit this package for canonicalization as part of the Haskell Platform. I would for one would support its inclusion.</blockquote><br>This is an option I seriously hadn't considered. To be fair, that's because I've never used the Platform myself, preferring rather to have the most up-to-date version of GHC at all times, heh. That said, while I'd be okay with this option, I'd prefer putting it into containers, because I feel like a canonical, reliable priority queue implementation is the sort of thing a self-respecting language ought to have built in.<br>
<br><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
As does Python. In Python, though, the PQ implementation is not a built-in class in the default namespace (as are dict and set). Rather, it is one of the "batteries included" libraries that come with Python. I think that's the right place for it in Haskell, too.</blockquote>
<br>I don't know Python, but according to Wikipedia, dict and set are built into the language. I don't think it's a fair comparison: set and dict in Python seem to have a role almost as ubiquitous as [] in Haskell, much more ubiquitous than e.g. Data.Set or Data.Map. I'm also not entirely sure that "batteries included" doesn't describe containers, given all the other packages that come with GHC.<br>
<br><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
> * There is no distinction between keys and priority values. A utility type</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
> Prio p a with the instance Ord p => Ord (Prio p a) is exported to allow</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
> usage of distinct keys and priority values. </blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
I disagree with this one. It requires an Ord instance that isn't really an ordering, and makes a Functor instance impossible. I would prefer an interface separating keys and values like that of Data.Map (which would also increase consistency within the package).</blockquote>
<br>I'd be okay with separating out a priority/value version. However, I'm still clueless as to what you're talking about with respect to Functor -- what's wrong with the following?<br>data Prio p a = Prio p a<br>
instance Ord p => Ord (Prio p a) where ...<br>instance Functor (Prio p) where fmap f (Prio p a) = Prio p (f a)<div><br>I can understand if you're uncomfortable with (==) and (\ x y -> compare x y == EQ) being inequivalent, but neither the H98 Report nor the Prelude make any such claim, as far as I can tell.<br>
<br><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
The Foldable instance breaks the abstraction. I think it should process elements in order.</blockquote><br>I think that wanting to iterate over the priority queue in the middle of the computation, without caring about order, is a perfectly legitimate desire for a programmer! Moreover, the only way to make a Foldable instance process elements in order would be something like<br>
data Ord a => PQueue a = ....<br>which I think is an awfully dirty approach. I'm not at all a fan of adding restrictions like that, not least because it adds lots of awkward overhead.<br>Would you be okay with not exporting a Foldable instance at all, but still exporting a toList method which doesn't guarantee any ordering on the return list?<br>
<br><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
My advice would be not to stress over whether priority queues go into containers. It's not some pristine thing of beauty that deserves treatment with velvet gloves.</blockquote><br>I'm...not sure how to respond to this claim. At least part of me wants to say, I genuinely do think the containers package is a piece of art...and then another part pipes up, "except for the inconsistencies between the various minView/maxView versions, and the little differences between IntMap and Map, and..." That said, I wouldn't be a fan of scrapping the style which the containers package has at least tried to create. I could be convinced that rewriting the rest of containers would be a good thing to do, though...and I might just want to do that myself. Hah.<br>
<br><div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
How does this implementation compare with using Map/Set as a priority queue?</blockquote><br>Continuing the discussion of the benchmarks: first, Jim, it appears that I'm the one who made a n00b benchmarking error. TT_____TT That said, as you found, my implementation is still slightly faster when the benchmark is corrected. Some comments:</div>
<div><ul><li>QuickBinom needs to have O(1) findMin for a reasonable comparison. I added that in the benchmark below, and here.</li><li>I can't think of any more optimizations for the sparse binomial heap -- I genuinely think it's not going to get better.</li>
<li>There is a way to optimize my implementation still further, but it makes my code much less readable. (Specifically, I start the BinomForest at Succ Zero, and unpack the first child of every node still in the forest. Modifying the whole implementation that way, though, makes it unreadably ugly...and I think QuickBinom is possibly already at that point. I started implementing it, and realized just how ugly it was, and I stopped, but I can finish it if I have to.)</li>
</ul></div><div>Sorting 500,000 elements, compiled with -O2, run with +RTS -H128m -K64m, with another few implementations thrown in for good measure:<br><div>Times (ms)</div><div> min mean +/-sd median max </div>
<div>Pairing: 1440.090 1482.893 31.501 1482.093 1532.095</div><div>Binomial: 1356.084 1389.687 26.881 1388.087 1436.090</div><div>SBinomial: 1376.086 1422.489 48.453 1400.088 1520.095</div><div>Data.Set: 1712.107 1800.912 74.880 1782.111 1976.123</div>
<div>Skew: 1584.099 1644.503 85.702 1602.101 1848.116</div></div><div><br></div><div>Some other <a href="http://hackage.haskell.org/trac/ghc/attachment/ticket/3909/plot_2.png" target="_blank">benchmarks</a> were done by Milan Straka earlier, when we hadn't decided what heap implementation to use at all. His comments:<br>
<blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
I think the pairing heaps are out of the question now. I would choose between Binomial and Leftist. The Binomial have O(1) amortized inserts, but beware, that this does not work in persistent setting -- the insert is O(log N) when the heaps are used persistently (there are Skew Binomial Heaps with O(1) insert in the persistent setting too). </blockquote>
<blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
I vote for current (v5) Binomial implementation, even if the O(1) amortized inserts works only non-persistently (ie. under heavy persistent usage, leftist heaps might be a _little_ better).</blockquote><div><br></div><div>
Conclusions: There aren't any differences in asymptotics, and as it stands, the existing implementation is just as fast. It's also a) done, and b) full of Haskellish type goodness.</div><div><br></div><div>After about five hours' work (!!!) I *finally* managed to install Criterion yesterday, so I'll send out those tests ASAP.</div>
<br>Louis Wasserman<br><a href="mailto:wasserman.louis@gmail.com" target="_blank">wasserman.louis@gmail.com</a><br>
<a href="http://profiles.google.com/wasserman.louis" target="_blank">http://profiles.google.com/wasserman.louis</a><br><br></div>
</div>