Treaps and Randomization in Haskell
by Jesper Louis Andersen <jlouis@mongers.org> for The Monad.Reader IssueFour
05/07 - 2005
Abstract. We give an example implementation of Treaps in Haskell. The emphasis is partly on treaps, partly on the System.Random module from the hierachial libraries. We show how to derive the code and explain it in an informal style.
Introduction
I have, a number of times, warned people that I ought to do a TMR article. The world had its way, and I had to wait until the Summer to be able to finish an article. So this is it. Originally, I considered playing around with the ALL-PAIRS-SHORTEST-PATH algorithms, but for some reason I was not really satisfied with it. Also, with the upcoming Matrix library in the hierachial libraries, this might prove to be a better solution.
Instead I will provide a treatise on the Treap data structure, devised by Aragon and Seidel. I have much to thank them for in the following. Usually citations are at the back of an article, but I really advise you to read Randomized Search Trees (Algorithmica, 16(4/5):464-497, 1996). This document owes about 90% to the mentioned article.
I also advise you to check out Oleg Kiselyovs work on treaps for Scheme. He does a number of optimizations on the data structure which I have skipped over here. Take a look at
Olegs Scheme Treap implementation
Search trees
The classic problem of computer science is how to express and represent a finite map in a programming language. Formally a finite map is a function f: K --> V, which is said to map a finite set K, of keys, to a (thus also finite) set V, of values. The basic functions are: lookup(f, k), which will return the value f(k) in V, associated with the value k in K; insert(f, (k,v)) which extends or updates the finite map with a new key/value pair; and finally, delete(f, k), which removes the association of k from f.
One such representation is the binary search tree (In much litterature, the acronym BST is used). I assume most readers of TMR are familiar with binary search trees, and especially the pathological case degenerating the worst case search time bounds to O(n).
There are a number of strategies for avoiding the degenerate case where the tree becomes a linked list in effect. One could be to add invariants to the tree, which ensures that it stays inside certain balance bounds. One example is the AVL tree, which maintains the following invariant: At each node, the child-subtrees differ in depth by at most 1. This ensures the AVL-tree is always balanced and the pathological case where a tree is actually a list is ruled out. As a side note, an AVL tree will never be worse in structure than a fibonacci tree1.
Another famous example is the Red-Black tree, which provides a less strict balance invariant than the AVL tree. The invariant is harder to describe in a single paragraph -- but involves colouring nodes either red or black and adding invariants such that the tree always stays reasonably balanced. See Introduction to algorithms by Cormen, Leiserson, Rivest and Stein if you want to read the hard, incomprehensible imperative version of this data structure, or Purely Functional Data Structures by Chris Okasaki if you want the functional approach to this (the functional code is a mere 12 lines without the delete operation).
The Data.Map module in the hierachial libraries of Haskell use another type of tree known as the 2-3 tree. The 2-3 tree is self-balancing but uses a different trick. A 2-3 tree node contains either one or two (k, v)-pairs and thus has either 2 or 3 children. We call a node with 2 children a 2-node and a node with 3 children a 3-node. Insertions are always done into leaf nodes which can grow from 2-nodes to 3-nodes in a natural way. Growth of a 3-node is then done by splitting the node into two 2-nodes; the least (k, v)-pair, the greatest pair and the middle pair is inserted into the parent node2.
Sadly, the above is not true. Data.Map has binary trees which are balanced according to the size of the left and right subtrees. If one subtree grows beyond a certain constant factor it is rebalanced -- JesperAndersen
Yet another variant of the finite map is the splay tree. In the splay tree the rebalancing is done according to a simple heuristic which amortized over a certain number of operations yields O(lg n) worst case running time. Splay trees are not that good for purely functional languages, since they change the tree for all operations, including lookup. Thus our type for a lookup function would be:
Splay_lookup :: key -> SplayTree key value -> (Maybe value, SplayTree)
