Yhc/RTS/GC

From HaskellWiki
< Yhc‎ | RTS
Revision as of 15:15, 12 May 2006 by TomShackell (talk | contribs)
Jump to navigation Jump to search
Part of Yhc

(Download)

The Yhc Garbage collector is based on the Jonkers collector which is a Mark/Sweep/Compacting collector designed to run on systems with limited memory, for more details of the algorithm in general see:

 H. B. M. Jonkers. A fast garbage compaction algorithm. Information Processing Letters, 9(1):25--30, July 1979.

Sadly this paper is a little hard to get hold of now so this page will explain the algorithm in brief and in particular how the Yhc implementation of it works.

This page should serve as a help to anyone wanting to modify the garbage collector, however, my first piece of advice in such an adventure is don't. These pages should give an impression of just how tightly "tied together" the garbage collector is, and how much it depends on to work. It has easily been the most time consuming and bug prone part of writing Yhc, you have been warned :-)

The Jonkers algorithm is divided in 3 phases:

 - Mark     find all the heap nodes that are reachable from the program root.
 - Sweep    go through and calculate the new positions for all the nodes about 
            to be moved and update all the pointers to point to the new 
            positions.
 - Compact  actually move all the heap nodes to their new positions.

Mark

Yhc uses a fairly standard Yhc/RTS/PointerReversal mark phase to find all the heap nodes that are reachable from the program roots (and thus are still alive).

One thing of interest is that Yhc uses a seperate table of mark bits rather than having the mark bit as a field of each heap node. This is described in Yhc/RTS/Machine.

The program roots for Yhc are:

 - For each process stack (see Yhc/RTS/Machine):
      - The stack itself (stacks are heap nodes).
      - All the entries in the stack that are 'stack data' (i.e. not stack frames).       
      - For each stack frame the vapptr pointer.
 - For each heap global (see Yhc/RTS/Primitive):
      - The contents of the global pointer.
 - For each FInfo node which is reachable from the heap:
      - All the references to heap nodes in the constant table of the FInfo.

The most complicated part is the FInfos. FInfos have constant-tables which contain references to heap nodes and references to other FInfos that the function might call (see Yhc/RTS/Machine).

Thus the constant tables act as another form of program root, however, we only need to save the nodes referenced in a constant table of a FInfo if that FInfo is reachable from some node in the heap. A FInfo is reachable if it is either referenced by a node in the heap or it is referenced from a reachable FInfo.

Thus when a node in the heap is marked, if it is an application to a FInfo then the FInfo is added to end of the list of live FInfos. The first FInfo in this list is stored in the global variable G_firstCaf (the name 'Caf' here is a historical carry over from nhc, it is not only Cafs that are of interest in this process).

The 'link' field in the FInfo (see Yhc/RTS/Machine) is used to link the FInfos in the list together. It also serves as a 'mark bit' for the FInfo; if a FInfo's link field is already set then this FInfo has already been added to the list.

When all the heap nodes reachable from the other (non-FInfo) roots have been marked the FInfo list is traversed. All nodes that are referenced from the constant table are marked and all new FInfos references from the constant table are added to the end of the FInfo list.

Marking the heap is O(L) where L is the number of live objects in the heap.

Sweep

The Jonkers algorithm is a compacting Mark/Sweep algorithm. This means all the live heap nodes which are fragmented throughout the heap are moved to front of the heap. Compacting the heap in this way has the desirable property that allocation is then simply a case of incrementing the heap pointer.

The sweep phase in the Jonkers algorithm scans linearly through the heap looking for marked (and thus live) heap nodes and calculating what the heap nodes new positions will be.

Because the Jonkers algorithm moves heap nodes all the pointers to a heap node need to be updated with that node's new position. This process is also (partly) done by the sweep phase.

The crucial problem here is given a heap node how do we find all the pointers to that node. In a two-space collector we can replace the heap node with a forwarding pointer to it's new location, and then trace all live nodes from the roots. However we don't have that option in a Mark/Sweep/Compact collector, we can't overwrite the original heap node because we need to remember what value should be copied to the new location.

The problem is the central theme of the Jonkers algorithm and is finessed using a particularly clever trick called 'threading'. The basic principle is to transform a pointer to a node from this:

  root                                                           node A
+--------+                                                  +------------+
|      ---------------------------------------------------->|    XXX     |
+--------+                                                  +------------+

into this:

  root                                                           node A
+--------+                                                  +------------+
|  XXX   |<---------------------------------------------------           |
+--------+                                                  +------------+

This is 'threading the root'. Now if we find another pointer pointing to node A we can 'thread' it in, so this:

  root                                                           node A
+--------+                                                  +------------+
|  XXX   |<---------------------------------------------------           |
+--------+                                                  +------------+
                    node B                                         ^
                 +----------+-----------+                          |
                 |    YYY   |         -----------------------------+
                 +----------+-----------+

becomes this:

  root                                                           node A
+--------+                                                  +------------+
|  XXX   |<-----------------------+                    +------           |
+--------+                        |                    |    +------------+
                   node B         |                    |          
                +----------+------|----+               |          
                |    YYY   |           |<--------------+
                +----------+-----------+


Forming a linked list from node A to all the pointers that point to node A.

Now when we want to update all the pointers to "node A" we simply follow the chain, updating as we go. When we reach the end of the chain we put the node's original value (XXX) back into the head of the node. Thus all the pointers to node have been moved to the node's new position and the node itself has been restored to it's original state (ready to be moved).

We can tell whether we've reached the end of the chain or not by looking at the bottom two bits of the value. All heap pointers are at least 4 byte aligned, so the bottom two bits are always 00. Node headers (XXX and YYY in our example) on the other hand always have one of the bottom two bits set (see Yhc/RTS/Heap). The exception is indirection nodes, but indirection nodes are always ignored by the mark phase and thus will never appear as a marked node.

The sweep phase is thus defined as:

    - thread all the program roots (stacks, globals and finfo constants).
    - scan linearly through the heap looking for marked nodes, for each:
        - calculate the new position of the heap node
        - update all the pointers to this node (using the thread-chain).
        - thread all the pointer arguments of this node.

Sweeping is thus O(N) where N is the total number of words in the heap (live or otherwise). However it is actually has a very favourable constant in most cases owing to using a trick with the markt able. When searching for nodes that are marked it scans 32 (or 64, depending on the word size) heap words at a time. It can do this because the marktable is a continous table of bits and it is effective because the vast majority of the heap is generally not marked, thus it often sees that the whole marktable word is 0 and skips all 32 words. For details see mkt_firstMarked in mark.c

We use an additional trick in the sweep phase that speeds up the move phase. When the sweep finds a block of nodes that are unmarked it replaces the first word of the first node with a pointer to the next node that is marked and sets the lower two bits set to the GC node flag (See Yhc/RTS/Heap).

The move phase then uses this to instantly skip all unmarked nodes, making the move phase O(L).

Move

The move phase does two things: it moves all nodes to their new positions and it performs any remaining threaded-pointer updates.

Looking carefully at the description of the sweep phase it can be seen that it will only update program roots and forward pointers in the heap (i.e. pointers to addresses higher than the address of the pointer). However the sweep phase will not have updated backward pointers, although it will have already threaded them.

The only slight complication comes when moving stack nodes as these have an internal pointer structure that needs to be preserved (the chain of stack frames). The move phase thus calls proc_moveStack for these special case nodes, which fixes the frame pointer chains in the stack to be correct for the stack nodes's new location.