GC Theory – Mark&compact

Up until now, in the previous posts, we’ve taken a look at how memory is reclaimed by different GC methods. However, we did not look at the state in which the memory is left after such a process. Obviously this is a big deal – no matter how fast a GC algorithm is, if it leaves memory in a gigantic mess of fragmentation, it makes the memory allocator’s job way harder. In this post we will take a look at a GC algorithm that takes this into consideration – the mark&compact algorithm – although we will mostly concentrate on its second phase: compacting.

Basics of compaction

Why is memory fragmentation that bad? Well, the answer is simple – when the memory is not fragmented, the allocator can do its job by simply checking whether there’s enough free memory for the requested amount, and then just increasing the ‘free memory pointer’ by the size of the request. In other words – it can work like a simple stack. On the other hand, when we have multiple gaps in the heap (especially when they’re of different sizes), the allocator’s job becomes way harder.
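To make the ‘simple stack’ idea concrete, here’s a minimal bump-pointer allocator sketched in Python (the class and all names are mine, just for illustration):

```python
class BumpAllocator:
    """Allocator for a compacted heap: allocation is just a pointer bump."""

    def __init__(self, heap_size):
        self.free = 0            # the 'free memory pointer'
        self.limit = heap_size

    def alloc(self, size):
        if self.free + size > self.limit:
            return None          # not enough room left: time to collect
        addr = self.free
        self.free += size        # bump the pointer, stack-style
        return addr

a = BumpAllocator(64)
first, second, third = a.alloc(24), a.alloc(24), a.alloc(24)
# first == 0, second == 24, third is None (only 16 bytes left)
```

One allocation is one comparison and one addition – that’s the whole point of keeping the heap compacted.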

In the introduction to GC algorithms post, I’ve mentioned the copying algorithm. It is a kind of compacting algorithm, but I will describe it in detail in the next post. What must be said is that it performs compacting by moving data from one heap region to another. The algorithms presented here do the compacting within the same region, by just moving data incrementally towards one end of it.

Compacting can be done in different manners – either taking into consideration the order in which objects were placed originally, or not. Here are the types:

  • arbitrary – it’s the simplest of them all. We just take the live data, remove the gaps between them, and place them however we please. No logical or spatial order is kept.
  • linearising – objects are moved and then ordered based on logical relations, e.g. objects in the same collection are placed next to each other.
  • sliding – we take the live data, remove the gaps between them, and slide them to one end of the heap/region, preserving their original order. It is the most popular type in use nowadays.

In the following sections I will describe a couple of algorithms that cover as much ground as possible in terms of types/number of heap scans/etc. Let’s go!

The two-finger algorithm

We start with the two-finger algorithm, described by Saunders in 1974. It is presented mostly for completeness’ sake, as it is a simple arbitrary algorithm. It works as follows (I’m using the word heap, but this algorithm can also be applied to a region):

  • two separate pointers – we start from the opposite ends of the heap (hence ‘two fingers’) – one pointer is named free, and the second live (scan in the code below)
  • find the next gap – we advance the free pointer to the place where the first gap is
  • find a live object to move – starting from the opposite end, we move the second finger until it hits the first live object
  • move it – we take that live object and move it into the gap at the other side of the heap. In the vacated cell, we put a forwarding address
  • repeat until the two pointers meet
  • update references – when we’ve moved live objects from one part of the heap to the other, we go into the second phase of compacting. We iterate over all the live objects and check their references. If they point past the high-water threshold (or just threshold), we update them using the forwarding addresses that we left there.
  • Voilà! – we have compacted the memory

Code showing all of this can be found below (taken directly from ‘The GC handbook’).

 compact():
     relocate(HeapStart, HeapEnd)
     updateReferences(HeapStart, free)

 relocate(start, end):
     free ← start
     scan ← end
     while free < scan
         while isMarked(free)
             free ← free + size(free)               /∗ find next hole ∗/
         while not isMarked(scan) && scan > free
             scan ← scan − size(scan)               /∗ find previous live object ∗/
         if scan > free
             move(scan, free)
             *scan ← free                           /∗ leave forwarding address (destructively) ∗/
             free ← free + size(free)
             scan ← scan − size(scan)

 updateReferences(start, end):  
   for each fld in Roots             /∗ update roots that pointed to moved objects ∗/  
     ref ← *fld  
     if ref ≥ end
       *fld ← *ref                   /∗ use the forwarding address left in first pass ∗/  
   scan ← start

   while scan < end                  /∗ update fields in live region ∗/  
     for each fld in Pointers(scan)  
       ref ← *fld  
       if ref ≥ end  
         *fld ← *ref                 /∗ use the forwarding address left in first pass ∗/

     scan ← scan + size(scan)        /∗ next object ∗/
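If you prefer running code to pseudo-code, below is a toy Python model of the whole thing, assuming a heap of equal-sized, one-slot cells represented as dicts (the encoding is my own, not the book’s):

```python
def two_finger_compact(heap):
    """Two-finger compaction for a heap of equal-sized, one-slot cells.
    A live cell is a dict with 'marked': True and an optional 'ref' (slot index).
    Returns the threshold: the index one past the last live cell."""
    n = len(heap)
    free, scan = 0, n - 1
    while free < scan:
        while free < n and heap[free].get('marked'):
            free += 1                     # find the next hole
        while scan > free and not heap[scan].get('marked'):
            scan -= 1                     # find the previous live object
        if scan > free:
            heap[free] = heap[scan]       # move the live cell into the hole
            heap[scan] = {'fwd': free}    # leave a forwarding address behind
            free += 1
            scan -= 1
    threshold = free
    # second pass: any reference pointing past the threshold is redirected
    # through the forwarding address left in the old cell
    for i in range(threshold):
        ref = heap[i].get('ref')
        if ref is not None and ref >= threshold:
            heap[i]['ref'] = heap[ref]['fwd']
    return threshold

heap = [
    {'marked': True, 'ref': 4},    # points at the cell that will move
    {'marked': False},             # hole
    {'marked': True, 'ref': None},
    {'marked': False},             # hole
    {'marked': True, 'ref': 0},
]
threshold = two_finger_compact(heap)
# threshold == 3; cell 4 moved to slot 1, and cell 0's ref was fixed to 1
```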

I know that the code can be a little hard to visualise, therefore I’ve created this animation (please forgive the lack of graphical talent). It shows the step-by-step process, with comments and pointers.

Now I’m sure it’s easier to see what’s going on and how the algorithm works. I’ve mentioned above that I use the word heap, but the algorithm can be used on regions too. That’s mostly due to the fact that this algorithm works great for objects of the same size – it’s cheap and fast. The older book mentioned that there were usages of this algorithm for variable-sized cells (using different regions for different sizes), but the later book does not mention this at all. I assume it did not pass the test of time.

Lisp 2 algorithm

This algorithm is loosely based on the previous one, although it performs three heap/region traversals, not two. It is seen as the fastest compaction algorithm there is, but that comes with a price – every object in memory must have a pointer-sized field in its header. It is needed to store the forwarding address; however, thanks to that field, we’re not limited to cells of a specific size. As we’re operating on pointers, we can freely compact objects of varying size. Let’s see how it’s done.

After marking, the first phase of compaction plays a similar role to the first pass of the two-finger algorithm, although in this case we’re passing in the target region explicitly. Another difference is the update – we’re not writing the forwarding address into the vacated cell (after moving its contents); we just record it in the header of the object to be moved (all the code comes from ‘The GC handbook’).

computeLocations(start, end, toRegion):
    scan ← start
    free ← toRegion

    while scan < end
        if isMarked(scan)
            forwardingAddress(scan) ← free
            free ← free + size(scan)
        scan ← scan + size(scan)

In the second phase, we go through the objects to be moved and scan their fields. If pointers are present, we update their values with the appropriate forwarding addresses stored in the referenced objects’ headers.

updateReferences(start, end):
    for each fld in Roots /∗ update roots ∗/
        ref ← *fld
        if ref != null
            *fld ← forwardingAddress(ref)
    scan ← start /∗ update fields ∗/
    while scan < end    
        if isMarked(scan)
            for each fld in Pointers(scan)
                if *fld != null
                    *fld ← forwardingAddress(*fld)
        scan ← scan + size(scan)

The last phase is the actual move process – we just relocate the cells.

relocate(start, end):
    scan ← start
    while scan < end
        if isMarked(scan)
            dest ← forwardingAddress(scan)
            move(scan, dest)
        scan ← scan + size(scan)
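Here’s a toy Python version of all three passes, operating on a list of object descriptors instead of raw memory (the dict layout is my own invention; the ‘fwd’ slot plays the role of the extra header field Lisp 2 requires):

```python
def lisp2_compact(objects):
    """Three passes over 'objects' (sorted by address). Each object is a dict:
    addr, size, marked, ref (address of another object's header, or None)."""
    # pass 1: computeLocations – assign forwarding addresses into the headers
    free = 0
    for obj in objects:
        if obj['marked']:
            obj['fwd'] = free
            free += obj['size']
    # pass 2: updateReferences – rewrite fields via the headers (roots omitted here)
    by_addr = {o['addr']: o for o in objects}
    for obj in objects:
        if obj['marked'] and obj['ref'] is not None:
            obj['ref'] = by_addr[obj['ref']]['fwd']
    # pass 3: relocate – actually slide the objects down
    live = [o for o in objects if o['marked']]
    for obj in live:
        obj['addr'] = obj['fwd']
    return live

objs = [
    {'addr': 0, 'size': 2, 'marked': True,  'ref': 6},
    {'addr': 2, 'size': 3, 'marked': False, 'ref': None},   # garbage
    {'addr': 5, 'size': 1, 'marked': True,  'ref': 0},
    {'addr': 6, 'size': 2, 'marked': True,  'ref': None},
]
live = lisp2_compact(objs)
# live objects now sit at addresses 0, 2, 3; the ref to old address 6 became 3
```

Note how sizes can differ freely – only the per-object ‘fwd’ header makes that possible.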

Table-based methods

Before we start I must mention that this type of algorithm is present in the first book, but not in the second one. Still, I will present it here for completeness. Using a break table, we can actually preserve the order in which objects lived, without keeping an additional pointer field in every memory object. The idea here is to keep the information about live data inside the free blocks! The information kept is the original location of the live data, and the total amount of free space skipped over at the moment the move occurred (i.e. how far the data slides down). After storing that information, we can freely move the live data into the empty space.

The algorithm itself is simple in outline: we scan the heap, and whenever a live block is about to slide down over a gap, we append a pair (block address, cumulative free space so far) to the break table kept in the free area, and then move the block. The original Compact_Table listing in the book also shows how the table itself is rolled through the heap as objects slide; I won’t reproduce it here.

As usual, a graphical visualisation will help to show what we’re talking about (example taken from the book, with changed colours):

Updating pointers is as simple as:

To adjust a pointer p, the break table is searched for adjacent pairs (a, s) and (a’, s’) such that a <= p <= a’. The adjusted value of p will then be p – s. Although this, too, is apparently an n log n operation, matters can usually be improved. If there is sufficient space in the free area after the break table, a hash table can be constructed to improve searching. The k most significant bits of the pointer can be used as a hash key to look up the start and end of the region of the break table containing a and a’.
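A simplified Python sketch of the pointer-adjustment part may help (I build the break table separately instead of rolling it through the heap, so this only illustrates the lookup, not the clever in-place trick):

```python
import bisect

def build_break_table(extents):
    """extents: live areas as (start, size) pairs in address order.
    Returns break-table pairs (a, s): a = original start of a live area,
    s = total free space below it (= how far the area slides down)."""
    table, free = [], 0
    for start, size in extents:
        table.append((start, start - free))   # shift = gaps skipped so far
        free += size                          # compacted data grows stack-like
    return table

def adjust(table, p):
    """Relocate pointer p: find the pair (a, s) with the largest a <= p;
    the adjusted value is then p - s."""
    i = bisect.bisect_right(table, (p, float('inf'))) - 1
    a, s = table[i]
    return p - s

table = build_break_table([(3, 2), (8, 1), (12, 4)])
# table == [(3, 3), (8, 6), (12, 9)]
# e.g. a pointer into the last area, p = 13, adjusts to 13 - 9 == 4
```

The `bisect` binary search stands in for the n log n search mentioned in the quote; the hashing refinement is left out.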

Threaded methods

To start with – this section is complicated. I’ve read the chapters describing it in both books, and searched for material on the web. And as usual – when I think I’ve got the whole thing right in my head, it slips. I mean – I know what is happening, but the rationale behind the overall logic still escapes me. Still, I’ll try my best to present the method here.

In general, the problem with the previously mentioned algorithms is this – you either need additional memory to keep the forwarding addresses, or the number of heap passes grows. That was also the reason to introduce the table-based algorithms described in the previous section; unfortunately, those require three traversals of the heap. The first to try limiting that was Fisher in 1974, although the work of Jonkers is presented by the books’ authors as canonical and less restrictive. Here’s a diagram showing the situation after marking and before compacting. Objects A, B, C and D are pointing (they have a pointer field) to some other object P (that holds a simple scalar value).

In order to avoid searching the whole heap for pointers (i.e. searching for objects A, B and C), the idea here is to build a list that can be traversed from P back to object A (which lies ‘first’ in the heap). When we know where P will be moved (after compacting), we just walk through the list (using the pointers, of course) and update the values to point to the new location. But let’s not get ahead of ourselves too much – below you can find an illustration of how the list looks. This relationship from P to A is called threading.

Jonkers’ algorithm imposes some restrictions on this process:

  • pointers may only point to the header of a cell
  • headers must be large enough to contain an address
  • headers must contain values that are distinguishable from pointers into the heap (although pointers to other areas of memory are possible)

The whole algorithm is done in two passes – in the first, we update forward pointers (as in the example above, where P is located ‘after’ objects A, B and C), and in the second, we update backward pointers and move the values. I think the pictures below show it step by step.

The algorithm in pseudo-code looks like this:


thread(ref):                                 /∗ thread a reference ∗/
    if *ref != null
        *ref, **ref ← **ref, ref

update(ref, addr):                           /∗ unthread all references, replacing with addr ∗/
    tmp ← *ref
    while isReference(tmp)
        *tmp, tmp ← addr, *tmp
    *ref ← tmp


updateForwardReferences():
    for each fld in Roots
        thread(fld)                          /∗ thread the root set ∗/

    free ← HeapStart
    scan ← HeapStart

    while scan ≤ HeapEnd
        if isMarked(scan)
            update(scan, free)               /∗ forward refs to scan set to free ∗/
            for each fld in Pointers(scan)
                thread(fld)                  /∗ thread refs out of scan ∗/
            free ← free + size(scan)
        scan ← scan + size(scan)

updateBackwardReferences():
    free ← HeapStart
    scan ← HeapStart

    while scan ≤ HeapEnd
        if isMarked(scan)
            update(scan, free)               /∗ backward refs to scan set to free ∗/
            move(scan, free)                 /∗ slide scan back to free ∗/
            free ← free + size(scan)
        scan ← scan + size(scan)
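To make threading a bit less slippery, here’s a tiny Python model of thread and update. The heap is a plain list, and I encode pointers as ('ref', index) tuples so that isReference is just a type check – that encoding is mine, not the books’:

```python
def is_reference(v):
    return isinstance(v, tuple) and len(v) == 2 and v[0] == 'ref'

def thread(heap, field_addr):
    """Splice the field at field_addr into the chain rooted at its target's header."""
    ref = heap[field_addr]
    if is_reference(ref):
        target = ref[1]
        heap[field_addr] = heap[target]      # field temporarily holds the rest of the chain
        heap[target] = ('ref', field_addr)   # header now points back at the field

def update(heap, obj_addr, new_addr):
    """Unthread the chain at obj_addr, making every queued field point to new_addr."""
    tmp = heap[obj_addr]
    while is_reference(tmp):
        field_addr = tmp[1]
        tmp, heap[field_addr] = heap[field_addr], ('ref', new_addr)
    heap[obj_addr] = tmp                     # restore the original header value

# P's header (holding the scalar 42) at slot 0; fields of A, B, C at slots 1-3
heap = [42, ('ref', 0), ('ref', 0), ('ref', 0)]
for f in (1, 2, 3):
    thread(heap, f)
# heap[0] now heads a chain through slots 3 → 2 → 1, ending at the value 42
update(heap, 0, 9)    # pretend P will move to slot 9
# heap == [42, ('ref', 9), ('ref', 9), ('ref', 9)]
```

After threading, P’s header temporarily heads a chain through all the fields that referred to it; one call to update rewrites them all and puts the header value back.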

One-pass algorithms

So far we’ve had to settle for at least two separate heap traversals (not counting marking) for the compacting phase. However, a newer generation of algorithms was invented – mentioned in the newer book only – that performs this computation in a single pass.

Abuaiadh et al, and Kermany and Petrank both designed high performance mark&compact algorithms for multiprocessors that do precisely this. The former is a parallel, stop-the-world algorithm (it employs multiple compaction threads); the latter can also be configured to be concurrent (allowing mutator threads to run alongside collector threads), and incremental (periodically suspending a mutator thread briefly to perform a small quantum of compaction work).

Here I will follow only the stop-the-world solution, as the parallel ones are described later in the book in a dedicated chapter. The supplementary structures used in this algorithm are bit vectors – during marking, specific bits are set, indicating whether a given granule (as the authors put it – we can treat granules as words) is live or not. The second helper structure is the offset vector, which stores forwarding addresses. Addresses of what? That’s the neat thing – the collector divides the whole heap into small blocks (256/512 bytes), and keeps in the offset vector, for each block, the address to which the first live object of that block will be moved. In combination with the mark bitmap, this makes it possible to calculate the new address of any live object on the fly. A direct quote here will explain it best:

After marking is complete, the computeLocations routine passes over the mark bit vector to produce the offset vector. Essentially, it performs the same calculation as in Lisp 2 but does not need to touch any object in the heap. For example, consider the first marked object in block 2, shown with a bold border in the picture. Bits 4–7, and 12–15 are set in the first block, and bits 6–11 in the second (in this example, each block comprises 16 slots). This represents 14 granules (words) that are marked in the bitmap before this object. Thus the first live object in block 2 will be relocated to the fourteenth slot in the heap. This address is recorded in the offset vector for the block (see the dashed arrow marked offset[block] in the figure).

The picture that shows this structure is presented below (directly copied from the book):

And to finish here – the code:

compact():
    computeLocations(HeapStart, HeapEnd, HeapStart)
    updateReferencesRelocate(HeapStart, HeapEnd)

computeLocations(start, end, toRegion):
    loc ← toRegion
    block ← getBlockNum(start)
    for b ← 0 to numBits(start, end)−1
        if b % BITS_IN_BLOCK = 0                /∗ crossed block boundary? ∗/
            offset[block] ← loc                 /∗ first object will be moved to loc ∗/
            block ← block + 1
        if bitmap[b] = MARKED
            loc ← loc + BYTES_PER_BIT           /∗ advance by size of live objects ∗/

newAddress(old):
    block ← getBlockNum(old)
    return offset[block] + offsetInBlock(old)

updateReferencesRelocate(start, end):
    for each fld in Roots
        ref ← *fld
        if ref != null
            *fld ← newAddress(ref)
    scan ← start

    while scan < end
        scan ← nextMarkedObject(scan)           /∗ use the bitmap ∗/
        for each fld in Pointers(scan)          /∗ update references ∗/
            ref ← *fld
            if ref != null
                *fld ← newAddress(ref)
        dest ← newAddress(scan)
        move(scan, dest)
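A small Python sketch of computeLocations and newAddress may help, using a per-granule mark bitmap and a toy block size of 8 granules (real collectors use 256-512 byte blocks; all names here are mine):

```python
BITS_IN_BLOCK = 8   # granules per block, kept tiny for the example

def compute_locations(bitmap):
    """Pass over the mark bitmap, recording for each block the address
    (granule index) that its first live granule will be relocated to."""
    offset, loc = [], 0
    for b, bit in enumerate(bitmap):
        if b % BITS_IN_BLOCK == 0:      # crossed a block boundary?
            offset.append(loc)
        if bit:
            loc += 1                    # advance by the size of live granules
    return offset

def new_address(bitmap, offset, old):
    """Forwarding address of a live granule, computed on the fly."""
    block = old // BITS_IN_BLOCK
    block_start = block * BITS_IN_BLOCK
    live_before = sum(bitmap[block_start:old])   # live granules before 'old' in its block
    return offset[block] + live_before

# 16 granules = 2 blocks; live granules at indices 2, 3, 7, 9, 10
bitmap = [0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0]
offset = compute_locations(bitmap)
# offset == [0, 3]: block 0 had 3 live granules, so block 1's first live
# granule (index 9) lands at slot 3
```

No per-object header field and no break table – the bitmap plus one word per block is enough to forward any pointer.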


Sources

  • ‘Garbage Collection: Algorithms for Automatic Dynamic Memory Management’ by Jones & Lins, chapter 5
  • ‘The Garbage Collection Handbook’ by Jones, Hosking & Moss, chapter 3
  • Lecture slides about mark&compact (starting from slide no. 34)
