GC Theory – Reference counting explained

Reference counting is a one-of-a-kind GC algorithm. I’ve covered it briefly in the introductory post about GC theory; in this article, we’ll dive into it deeper. Let’s go!

How to deal with ref-counting deficiencies?

There are two main problems with ref-counting algorithms. First, they struggle with cyclic objects, which can result in memory leaks. The second problem lies in the storage itself – we need additional space to keep the counters. Before we get there, though, we’ll tackle a simpler thing as a warm-up – possibly long memory-freeing times.

Non-recursive freeing

The simplest ref-counting algorithm assumes that when the total count of references to an object drops to zero, everything that object points to is freed recursively too. Simple as it is, a problem arises when the object we’re freeing is a big one (holding a lot of pointers to other objects) – the time spent freeing memory can then be quite long. A solution to this problem is to avoid recursion here.

When the aforementioned situation happens (we want to free the object), we don’t go on to free all its references. We just mark the ‘main’ object as freed and put it on the free list. The actual cleanup doesn’t happen until the system wants to allocate new memory using our memory cell – that’s the place and time when the children are processed. Unfortunately, with this algorithm we have to live with the fact that a lot of memory will ‘officially’ be seen as not free.
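A minimal Python sketch of this lazy-freeing idea (class and method names are my own, not from any book’s pseudocode):

```python
# Toy model of lazy (non-recursive) freeing. All names are illustrative.

class Cell:
    def __init__(self, children=()):
        self.rc = 1                     # the caller's reference
        self.children = list(children)
        for c in self.children:
            c.rc += 1

class Heap:
    def __init__(self):
        self.free_list = []             # dead cells, children not yet released

    def release(self, cell):
        cell.rc -= 1
        if cell.rc == 0:
            # No recursion here: just park the cell on the free list.
            self.free_list.append(cell)

    def allocate(self, children=()):
        if self.free_list:
            cell = self.free_list.pop()
            # The deferred work happens now: release the dead cell's
            # children (possibly parking more cells), then reuse it.
            for child in cell.children:
                self.release(child)
            cell.__init__(children)
            return cell
        return Cell(children)
```

Releasing a big structure is now constant-time; the cost of scanning its children is paid piecemeal at allocation time, at the price of memory that looks occupied while it sits on the free list.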

Deferred reference counting

The cost of updating the counter is quite high, and it can be incurred even when we just want to traverse a specific object – from the GC perspective there’s no difference, so we may end up bumping an object’s reference counter by doing only reads. Therefore, it was suggested to limit the counting to the parts of code that were likely to actually change the references. Unfortunately, in the long run this did not prove effective.

A different variation of the above was given by Deutsch and Bobrow. They observed that a lot of memory operations are performed in the stack/register area, so there’s no need to update the counter for every operation happening there, because references from the stack (called roots in this example) are usually short-lived (a kind of generational hypothesis, known e.g. from the JVM). The idea is to defer (hence the name) the eventual memory freeing until we are sure that there are no references to the object from other objects on the heap. To achieve that, every object whose ref count drops to 0 is not reclaimed right away, but rather put in the ZCT (zero count table).

Once in a while, a kind of stop-the-world reconciliation process/thread is run and the memory is reclaimed (usually when an attempt to allocate new memory has failed). We stop all the threads, go through all the references on the stack (remember – the roots), and increment the counter of every object they point to. Then the whole ZCT is scanned. If an object there still has a ref count of 0, there’s no reference to it from either the heap or the roots, so it can be safely deleted. When we’re done with the ZCT traversal, we go back to the roots and decrement the ref counts.

The above algorithm has proven useful, reducing the overhead of GC by as much as 80%. Unfortunately, as shown above, there’s the cost of pausing execution to clean up the ZCT.

Coalesced reference counting

Building on deferred ref-counting, an innovative idea arose in 1999, authored by Levanoni and Petrank. I’ve mentioned it above – quite often counters are modified during simple reads (eg. traversing a collection): with every step, one cell’s ref-counter is increased and then decreased. The authors suggested that it would be beneficial to perform counter updates only upon entry to a specific code block and upon leaving it, skipping all the intermediate steps. Wikipedia summarises this:

Levanoni and Petrank showed in 2001 how to use such update coalescing in a reference counting collector. When using update coalescing with an appropriate treatment of new objects, more than 99% of the counter updates are eliminated for typical Java benchmarks. In addition, the need for atomic operations during pointer updates on parallel processors is eliminated. Finally, they presented an enhanced algorithm that may run concurrently with multithreaded applications employing only fine synchronization.
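A hypothetical sketch of the coalescing idea: log only the first pre-epoch value of each mutated pointer slot, and reconcile the counters once per epoch. All names here are mine – this is not the authors’ actual data structure:

```python
# Coalesced RC sketch: per-epoch log of (object, field) -> old referent.

class Node:
    def __init__(self):
        self.rc = 0
        self.next = None

write_log = {}   # (id(obj), field) -> referent at the start of the epoch

def write(obj, field, new_ref):
    slot = (id(obj), field)
    if slot not in write_log:
        write_log[slot] = getattr(obj, field)  # remember pre-epoch value once
    setattr(obj, field, new_ref)               # no counter work on the hot path

def end_of_epoch(objects_by_id):
    for (oid, field), old in write_log.items():
        cur = getattr(objects_by_id[oid], field)
        if old is not cur:                     # intermediate flips cancel out
            if old is not None:
                old.rc -= 1
            if cur is not None:
                cur.rc += 1
    write_log.clear()
```

Three consecutive writes to the same slot cost several counter updates in plain RC, but only one after coalescing.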

Limited-field reference counts

I’ve mentioned that one of the biggest problems with ref counting is the storage used to hold the counter. In the worst case, it should be able to hold a count as large as the number of pointers on the stack and heap. Obviously, such a situation is practically impossible in real life, but the question still arises – how large should the field be? Studies suggest that most objects are referenced uniquely, and therefore don’t need much counting capacity (especially in functional languages). This also enables runtime optimizations such as move semantics (I’ve written an article about it in C++).

The simplest way to limit the size of the field is to stop the ref-counting process once a specific threshold (called the sticky value) is reached. Simple pseudo-code taken from the Jones & Lins book looks like this:

incrementRC(N) = 
    if RC(N) < sticky
        RC(N) = RC(N) + 1

decrementRC(N) = 
    if RC(N) < sticky
        RC(N) = RC(N) - 1

Obviously a problem arises – how do we handle garbage collection itself once we finally reach the sticky value? There’s no other way – we have to run a separate GC process that will mark & sweep the objects in order to free the memory. The book’s authors claim that it is not as troublesome as it sounds, as we would have needed it anyway to deal with cyclic structures.
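The same idea in Python, with an illustrative 2-bit field (so the sticky value is 3):

```python
STICKY = 3   # e.g. the maximum of a 2-bit count field

class Obj:
    def __init__(self):
        self.rc = 0

def increment_rc(obj):
    if obj.rc < STICKY:
        obj.rc += 1      # once rc reaches STICKY it never moves again

def decrement_rc(obj):
    if obj.rc < STICKY:
        obj.rc -= 1      # sticky objects are left to the backup tracing GC
```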

The most radical solution of this type is the one-bit ref count, where we don’t track the total number of references at all – just a single piece of information: is there exactly one reference to the object, or more? We can keep this bit in the object, but it’s better to actually store it in the pointers. The idea is simple – every time a new object is allocated in memory, a pointer to it is created and marked as unique. As long as no other pointer referencing the same piece of memory is created, we know that when the unique pointer is deleted, we can safely reclaim the memory. If more than one pointer is used, they all get the sticky status, and in order to properly reclaim the memory, again, a separate GC is needed.
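A toy model of the unique/shared bit living in the pointer (an assumed simplification, not the published algorithm’s exact bookkeeping):

```python
# One-bit RC model: the single bit lives in the pointer, not the object.
from dataclasses import dataclass

@dataclass
class Ptr:
    target: object
    unique: bool = True        # the first pointer to a fresh object is unique

def copy_ptr(p):
    p.unique = False           # copying makes BOTH pointers shared ("sticky")
    return Ptr(p.target, unique=False)

def delete_ptr(p, free):
    if p.unique:
        free(p.target)         # provably the last reference: reclaim at once
    # shared targets are left for the backup tracing collector
```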

Hardware ref counting

This concept appears only in the first book – ‘GC. Algorithms for automatic dynamic memory management’ – and is not mentioned in the second one. I expect that it did not work out, due to the progress in the development of off-the-shelf architectures. In general, the idea was to push the counting down to the hardware level. The researchers – Wise, Chang and Gehringer – built their own memory chip that did the counting. Depending on the heap size, the speed-ups were up to 70%.

Cyclic ref counting

The second big problem with ref-counting algorithms is cyclic data structures. Due to their heavy use in programming, tackling this problem is crucial for GC algorithms. However, the authors of the first book claim that although a lot of algorithms have been proposed, not one of them has gained widespread adoption.

Functional programming languages

It was observed that in functional languages, cyclic data structures are easier to handle, thanks to immutability. As long as the conditions below are met, simple amendments to the basic ref-counting algorithm can tackle them:

  • circular structure is created all at once
  • any use of a proper subset of the cycle that does not include its root is copied as an independent structure rather than shared
  • cycle-closing pointers to the head of the cycle are tagged as such

Bobrow’s technique

Bobrow proposed distinguishing between cells by assigning them to groups (which must be done by the programmer). In that case, we don’t operate at the level of cells but of groups, and we don’t count references between the cells that form a group, but between the groups themselves. If a group as a whole becomes unreachable, all its members can be deleted. Unfortunately, while this works great for cyclic relationships between cells within a group, it is not so great when members of one group reference members of another.
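A minimal sketch of counting at group rather than cell level (an assumed simplified model; the group assignment is the programmer’s job):

```python
from collections import defaultdict

group_rc = defaultdict(int)   # reference counts per group, not per cell

def add_ref(src_group, dst_group):
    if src_group != dst_group:        # intra-group pointers (incl. cycles) are not counted
        group_rc[dst_group] += 1

def drop_ref(src_group, dst_group, free_group):
    if src_group != dst_group:
        group_rc[dst_group] -= 1
        if group_rc[dst_group] == 0:
            free_group(dst_group)     # the whole group is reclaimed at once
```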

Weak-pointer algorithms

The basic algorithm was proposed by Brownbridge, with further updates from others. In general, we start with simple pseudo-code:

New() = 
    if free_list == empty
        abort "Memory exhausted"
    newcell = allocate()
    SRC(newcell) = 1
    WRC(newcell) = 0
    return strong(newcell)

Every cell holds two counters – one for weak and one for strong references. Every time new memory is allocated, the strong reference counter is set to 1. So far, so good. Any copy operation may introduce a cycle, and in that case a weak reference count must be increased instead. That is to enforce the two invariants of the algorithm:

  • active nodes are reachable from root via a chain of strong pointers
  • strong pointers do not form cycles

The main issues occur during removal of objects. The whole pseudocode for the algorithm is presented below:

    delete(T) = 
        if is_weak(T)
            WRC(T) = WRC(T) - 1     // WRC means 'Weak Ref Count'
        else
            SRC(T) = SRC(T) - 1     // SRC means 'Strong Ref Count'
            if SRC(T) == 0 and WRC(T) == 0
                for U in Children(T)
                    delete(*U)
                free(T)
            else if SRC(T) == 0 and WRC(T) > 0   // that is the place where normal ref-count fails
                invert_strength(T)
                for U in Children(T)
                    suicide(T, *U)
                if SRC(T) == 0
                    for U in Children(T)
                        delete(*U)
                    free(T)

    suicide(Start_node, S) = 
        if is_strong(S)
            if S == Start_node
                weaken(S)
            else if SRC(S) > 1
                weaken(S)
            else for T in Children(S)
                suicide(Start_node, *T)

We will concentrate on the last commented piece. When only weak pointers to a node are left, we must somehow find out whether we can safely delete it (and all its descendants). Obviously, a search must be performed within T’s descendants, trying to figure out whether there are other strong references to them. As for how it’s done, I think a direct quote from the book is the best explanation:

First, all pointers to T are made strong. If the cell is not garbage, it is strongly reachable once more. However this action might have created strong cycles, so the data structure reachable from T is traversed (along strong pointers only) in order to identify and remove any strong cycles, as well as looking for external pointers. This description of the algorithm begs two questions:

  • how can we decide if a pointer is strong or weak?
  • how can we efficiently turn all the weak pointers to T into strong ones?

Brownbridge provided an elegant solution to this dilemma. Each pointer and each object has an associated strength-bit. If a pointer and the object to which it is pointing have the same strength-bit value, then the pointer is strong. If the bit-values differ, then the pointer is weak. The strength-bit is also used to determine which of two reference counters is the SRC and which is WRC. To strengthen all weak pointers to T in a single operation we simply invert the value of T’s strength-bit.
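The trick is easy to demonstrate (an illustrative encoding, not the book’s exact representation):

```python
# Strength-bit trick: a pointer is strong iff its bit equals the target's bit.

class Obj:
    def __init__(self):
        self.strength_bit = 0

class Ptr:
    def __init__(self, target, strong=True):
        self.target = target
        # strong pointer: same bit as the target; weak: the opposite bit
        self.bit = target.strength_bit ^ (0 if strong else 1)

    def is_strong(self):
        return self.bit == self.target.strength_bit

def invert_strength(obj):
    # O(1): every weak pointer to obj becomes strong, and vice versa
    obj.strength_bit ^= 1
```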

Now the invert_strength(T) piece makes more sense. Entering the suicide() function traverses all the descendants of T and makes sure that the two invariants given above still hold. The problem lies in specific edge cases. Updates to the algorithm by Salkild, and later Pepels, fixed them, although again an additional marking scheme had to be introduced to make it work. The overall performance of this algorithm wasn’t great. In fact, compared to the first book, this algorithm takes up maybe half a page in ‘The Garbage Collection Handbook’. I guess the performance really was that bad 😉

Partial Mark-Sweep Algorithms aka trial deletion

The first name in the header above was used in the first book. Trial deletion comes from the second one, as it is the more robust and tried term. The best-known algorithm of this kind, called the Recycler, was built by Bacon and Rajan upon the work of Martínez et al. It builds upon two observations:

  • In any garbage pointer structure, all reference counts must be due to internal pointers (that is, pointers between objects within the structure)
  • Garbage cycles can arise only from a pointer deletion that leaves a reference count greater than zero

Therefore, trial deletion operates only on the part of the object/cell graph whose starting point is a cell suspected to be garbage. It traverses the subgraph, temporarily decreasing the counters. If after this any counters are still higher than 0, we’re obviously not dealing with garbage. The process happens in three separate phases:

  • We start with the object/cell that is potential garbage – with every step, the counts contributed by internal pointers are decreased. Visited objects are marked grey.
  • The second phase traverses the graph again – if a counter is greater than 0, some external pointer to the data must exist, so we’re not dealing with garbage. In that case the live objects are marked black and the trial decrements are undone.
  • Conversely – if the counter is 0, it’s garbage we have under our fingers, and it can be safely removed. Such objects are marked white.
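The three phases can be sketched in Python roughly as follows (a simplified, synchronous version; the function names follow the colour scheme, not the Recycler’s actual code):

```python
class Node:
    def __init__(self):
        self.rc = 0
        self.children = []
        self.color = 'black'

def mark_grey(obj):                      # phase 1: trial-delete internal pointers
    if obj.color != 'grey':
        obj.color = 'grey'
        for child in obj.children:
            child.rc -= 1
            mark_grey(child)

def scan(obj):                           # phase 2: find externally reachable objects
    if obj.color == 'grey':
        if obj.rc > 0:
            scan_black(obj)              # an external pointer keeps it alive
        else:
            obj.color = 'white'          # provisionally garbage
            for child in obj.children:
                scan(child)

def scan_black(obj):
    obj.color = 'black'
    for child in obj.children:
        child.rc += 1                    # undo the trial decrement
        if child.color != 'black':
            scan_black(child)

def collect_white(obj, freed):           # phase 3: reclaim confirmed garbage
    if obj.color == 'white':
        obj.color = 'black'              # avoid revisiting
        for child in obj.children:
            collect_white(child, freed)
        freed.append(obj)
```

Running it on a two-node cycle with no external references frees both nodes, while a structure that still has an external pointer is re-blackened with its counts restored.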

The authors mention that usually this process is done in a concurrent manner, although in the chapter discussed they present only the synchronous version (with the parallel solution presented in later parts of the book). In addition to the colors mentioned above, the synchronous algorithm uses one more color – purple – to mark candidates for the roots of the first phase.

Summary

In general, ref counting is not as popular in general-purpose languages/systems as tracing algorithms are. In the literature, I’ve often stumbled upon the notion that there’s an actual distinction between garbage collection and ref counting – with the latter not being treated as part of GC at all. However, the authors still provide detailed explanations of why ref counting is still around – especially in specific environments where it can show its power. As I don’t want to repeat part of my previous post about GC, I will only list the issues/concerns as a simple list:

  • ease of implementation
  • better performance in specific envs/architectures (complex ownership, no cyclic data structures)
  • space overhead disadvantage
  • actual overhead when running the program (although, distributed equally along the time)
  • locality of reference
  • cyclic data structures and all the mess they create

 

