Naive question on lists of duplicates

Liyang HU liyang@nerv.cx
Thu, 5 Jun 2003 19:33:45 +0100


On Thu, Jun 05, 2003 at 08:09:02AM -0500, Stecher, Jack wrote:
> The task is simply to look at a lengthy list of stock keeping units (SKUs
> -- what retailers call individual items), stores, dates that a promotion
> started, dates the promotion ended, and something like sales amount; we
> want to pull out the records where promotions overlap.  I will have dates
> in yyyymmdd format, so there's probably no harm in treating them as Ints.

(Disclaimer: provided I've understood your problem correctly...)
For `heap' below, read `Data.FiniteMap'.

Create an empty heap.
For each record,
   For each day the promotion runs,
       Insert the record[1] into the heap with the date as the key.
       If it clashes with an existing element, then we've an overlap,
           Store both records in another heap keyed on the SKU.
           We can ignore clashes.
       Don't forget to insert the rest of the dates.
(I'm sure there's plenty of optimisations that can be done here...)

[1] Or at least the SKU, or anything that can uniquely identify the
promotion.

As Sarah said, _provided_ there's a small bound on the number of days that a
promotion can run for, then we can assume it to be a constant, resulting in
an O(n log n) algorithm.

Your dates are already Ord'ered, so that's good. The only niggle I see is
that you're going to have to figure out some way of generating all the dates
for which a promotion runs, which is going to be messy. Thankfully (I think)
the System.Time module should take care of this for you, if you can massage
your existing dates into some workable form...

/Liyang
[0] or some other suitable data structure. I like heaps because you can
avoid costly O(n^2) duplicate checking by paying a price of O(log n) for
each insertion. :)
-- 
.--| Liyang HU |--| http://nerv.cx/ |--| Caius@Cam |--| ICQ: 39391385 |--.
| :::::::::::::::::::::: This is not a signature. :::::::::::::::::::::: |