# mkSet on sorted lists

Wolfgang Jeltsch wolfgang at jeltsch.net
Mon Jan 12 22:11:30 EST 2004

Am Sonntag, 11. Januar 2004 22:15 schrieb Tomasz Zielonka:
> On Wed, Jan 07, 2004 at 07:30:35PM +0100, Tomasz Zielonka wrote:
> > On Wed, Jan 07, 2004 at 05:41:04PM +0100, Wolfgang Jeltsch wrote:
> > > Hello,
> > >
> > > does Data.Set.mkSet run in linear time when applied to a sorted list?
> >
> > No, at least not in the version from GHC 6.0. This implementation is
> > based on FiniteMap. mkSet translates to listToFM, which incrementally
> > constructs the result map using foldl on the input list.
> >
> > Hmmm, I may try to fix it in a day or two... it shouldn't be too
> > difficult.
>
> Well, implementing the algorithm wasn't difficult, but keeping memory
> usage low was quite tricky. In the first version the 'build' function
> returned a pair with a tree and the rest of the input list. The cost
> of this was so high that the linear version was still slower than
> traditional listToFM for lists with 100000 elements.
>
> I checked that using unboxed tuples helps, but I wasn't satisfied with
> such solution. In the end I found that transforming the function to use
> continuation passing style gives similar performance. Now the speed gain
> is easily noticable (3x on a 100000 element list and a couple of
> lookups).
>
> The code given below won't immediately affect mkSet. This is only a
> prototype, so the name and signature are probably not the best.
>
> The function could also cope with non-decreasing lists by omitting
> duplicated keys, but I didn't wanted to make it too complex yet.
>
> listToFM could work as follows - take the sorted prefix of the list,
> build a tree for this prefix in linear time and then fall back to
> inserting elements one by one.
>
> It would be nicer to extract the longest sorted sublist from the list,
> but I am afraid that this has Theta(N log N) complexity already.
>
> Well, what do you think about it?
> Would you like such improvements to FiniteMap and Set?
>
> Best regards,
> Tom
>
> ---------------------------------------------------------------------------
> -- Building tree from a sorted list
>
> trySortedListToFM :: (Ord key OUTPUTABLE_key)
>                   => [(key, elt)] -> Maybe (FiniteMap key elt)
> trySortedListToFM list
>
>     | sorted    = Just (build (length list) list const)
>     | otherwise = Nothing
>
>   where
>     sorted  = and (zipWith (\(l,_) (r,_) -> l < r) list (tail list))
>
>     build 0 xs cont =
>         cont EmptyFM xs
>     build 1 ((key, elt) : xs) cont =
>         cont (unitFM key elt) xs
>     build n xs cont =
>         build leftSize xs (\left ((key, elt) : ys) ->
>             build rightSize ys (\right zs ->
>                 cont (mkBranch 14{-which-} key elt left right) zs))
>       where
>         leftSize = n div 2
>         rightSize = n - leftSize - 1

Hello Tom,

from which cardinality on one can notice a speed gain with the linear time
algorithm?  If this cardinality is larger than 1000, I probably don't need a
linear time algorithm in my current project.

However, it would be nice (at least from a theoretical point of view) to have
a listToFM algorithm which naturally has O(n) time for sorted lists by which
I mean that the algorithm doesn't include a test for sorted lists (or
sublists) but is just constructed in such a way that it automatically has
O(n) time for sorted lists.

Wolfgang