[Haskell-cafe] weak references & caches

Thu Feb 8 21:11:32 EST 2007

Hi All,

I'm hacking some (external) B-Tree code, and amongst the numerous
interesting problems I've come up against[*], is to do with managing
which pages/nodes are in memory and which are not.

We use TVars to point from one (in-memory version of a) page to
another, so we have a type like the following:

data Node
    = Leaf [(Key,Value)]
    | Node NodePtr [(Key,NodePtr)]

type NodePtr = TVar (Maybe Address, Maybe Node)

type Address = Integer

Addresses are just page numbers or file offsets, or similar.

NodePtr is the interesting bit - the following table summarizes what
the combination of values mean:

explain (Just addr, Nothing) = "An external node we have not read in yet"
explain (Nothing, Just node) = "A new node for which we have not yet
allocated a page"
explain (Just addr, Just node) = "A node we've read in"
explain (Nothing, Nothing) = "Emit nasal daemons"

Now, we want a cache to hang on to nodes that we've read in to hold
them in memory, to save repeatedly reading them off disk
unnecessarily. But more than that, we want to make sure that when we
create a NodePtr pointing to a particular Address, that all references
to this particular address go through the same NodePtr, otherwise in a
concurrent situation we could well end up with two copies of the page
in memory, both recieving updates leading to havoc. At the same time,
we'd like to be able to evict pages according to some policy (the
Oracle hot-list cold-list cache is attractive, if implemented
carefully).

So one scheme that comes to mind is to keep a table mapping addresses
to (Weak NodePtr). When we ask for a node, first we consult the cache
(if we have one) and get the NodePtr right back. If the addressed node
is not in the cache then we check the table to see if there is/was a
known NodePtr for that address. If there is none then we can allocate
a new one and stick it in the table, and if it is there then we return
it.  This is also nice because we can attach finalizers to the (Weak
NodePtr)s to write them out if dirty.

So the question is, is there a simpler scheme (though this one isn't
tooooo bad), and more importantly, is this likely to be horribly slow?
How efficient is dereferencing a weak reference, and what is the
likely impact on performance of having 10^3-10^5 weak references
around?

cheers,
Tom
[*]Film at 11. I'm planning to write it all up, maybe for a
conference, maybe just for the Monad Reader, or something.