[Haskell] Re: Quicksearch vs. lazyness

Mon Mar 19 10:19:34 EDT 2007

Steffen Mazanek wrote:
>
> say, we want to find the k'th element in a sorted list.

I assume that you want to find the k'th smallest element of an unsorted
list by sorting it?

> [...]
>
> However, I was wondering whether it might be possible
> to implement quicksort in a way that quicksearch is
> done out of the box by lazy evaluation. Is there any way
> to do this?

Yes, that can be done. It's a small extension of the folklore (?) example

  minimum = head . mergesort

that extracts the smallest element of a list. Given a proper mergesort,
laziness will ensure that this function actually runs in O(n) time
instead of the expected O(n log n). Apparently, the first k smallest
elements now can be obtained by

  take k . mergesort

which may run in O(n + k*log n) time with a laziness-friendly mergesort.

> From my understanding for small k's lazy
> evaluation already does the trick for the naive quicksort
> algorithm (quicksort smaller ++ [x] ++ quicksort greater),
> doesn't it? Is there a search algorithm that makes better
> use of lazy evaluation out of the box?

No, as far as I thought about quicksort, it needs O(n log n) to produce
the first element. But even the mergesort implementation has to be
chosen carefully as I've seen many that still need O(n log n) just to
return the smallest element.

Alas, her is a laziness-friendly mergesort:

  mergesort []  = []
  mergesort xs  = foldtree1 merge $ map return xs

  foldtree1 f [x] = x
  foldtree1 f xs  = foldtree1 f $ pairs xs
     where
     pairs []        = []
     pairs [x]       = [x]
     pairs (x:x':xs) = f x x' : pairs xs

  merge []     ys     = ys
  merge xs     []     = xs
  merge (x:xs) (y:ys) =
      if x <= y then x:merge xs (y:ys) else y:merge (x:xs) ys

Why does this work? The function 'foldtree1' folds the elements of a
list as if they where in a binary tree:

  foldrtree1 f [1,2,3,4,5,6,7,8]
 ==>
  ((1 `f` 2) `f` (3 `f` 4)) `f` ((5 `f` 6) `f` (7 `f` 8))

The important point is that this expression is constructed in O(n + n/2
+ n/4 + n/8 + ..) = O(n) time. Now, given such a binary tree of `merge`
operations, getting the next list element of the result takes O(log n)
time. Why? Because the merge in the middle gets the next element either
from its left or its right subexpression which in turn gets its element
from its left or its right subexpression and so on. Clearly, we only
need logarithmically many steps to hit the bottom. Putting both
together, we see that we have to pay O(n + k*log n) steps to build the
expression and to fetch the first k elements.
Making this argument rigorous with a suitable debit invariant is left to
the attentive reader :)

Regards,
apfelmus