GC Theory – Basic algorithms
In the world of garbage collection, there are three algorithms that form the foundation for all the others. In this article I will briefly introduce all three of them – the reference counting algorithm, mark-sweep and the copying algorithm. More detailed articles about them will follow later.
The reference counting algorithm
This algorithm is widely known – especially for its simplicity. In short – every cell residing in memory, besides its value, holds information about the number of references pointing to it (called the reference count). Every time a new pointer to a given cell is created, its ref count is increased, and in the same manner it is decreased when a pointer to the cell is destroyed. When the ref count reaches zero, the cell can be returned to the cell pool. A visual representation of reference counting can be seen on this page.
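The mechanism above can be sketched in a few lines of Python. This is a minimal illustration only – `Cell`, `free_pool`, `add_reference` and `remove_reference` are invented names, not part of any real runtime, and a real collector would of course work below the language level:

```python
# A minimal sketch of manual reference counting.
free_pool = []                      # cells returned here can be reused

class Cell:
    def __init__(self, value):
        self.value = value
        self.ref_count = 0          # number of pointers to this cell

def add_reference(cell):
    cell.ref_count += 1             # a new pointer to the cell was created

def remove_reference(cell):
    cell.ref_count -= 1             # a pointer to the cell was destroyed
    if cell.ref_count == 0:
        free_pool.append(cell)      # reclaim the cell immediately

# usage: one reference created, then destroyed
c = Cell(42)
add_reference(c)        # ref_count == 1
remove_reference(c)     # ref_count == 0 -> cell goes back to the pool
```

Note how reclamation happens immediately inside `remove_reference` – this is exactly the 'no separate GC run' property discussed below.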
In general, this algorithm is as simple as possible, and that simplicity brings some great advantages:
- it works continuously – the cost of performing checks and updating the ref count is distributed evenly across all memory-allocating operations. There are no stop-the-world situations. However, it's important to note that freeing a large data structure can still be slow, as it triggers a whole cascade of counter decrements.
- effective memory usage – as updates to the ref count happen instantly, memory can be immediately reclaimed and used again. We don't need to wait for a separate GC process to run. At the low level of the memory, this means we can avoid page faults more often, and it also improves spatial locality.
- direct cell copy – when we have a cell whose reference count has dropped to zero, and a copy of it is needed (a situation occurring in functional languages), we can just 'borrow' the pointer to it and update the relevant fields in place. In a way it resembles a move constructor in C++.
On the other side (as usual in life), there are severe disadvantages too:
- overall cost of processing – with ref counting the total cost is divided evenly across the program's execution, and we avoid stop-the-world pauses. That does not change the fact that the overall performance of the system/app can be greatly reduced by the necessity of updating the counters on every pointer operation.
- tight coupling to the execution environment/compiler – we must ensure that every pointer-related operation is reflected by a change in the counter. Any kind of error in this area can result in a catastrophe – a premature free or a leak.
- additional memory spent – in order to actually keep the ref count, every cell needs some extra memory. In many systems this can be a problem.
- the cyclic data structures problem – probably the biggest disadvantage of them all. It is best pictured by the image below. In this situation, when the reference from cell A is removed, there is no way for a ref counting algorithm to reclaim the cells B–E: they still reference each other, so their counters never reach zero, resulting in a memory leak.
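The cycle problem can be demonstrated with a small extension of the earlier sketch. Again, `Cell`, `link`, `unlink` and `free_pool` are illustrative names assumed for this example only:

```python
# A sketch showing why naive reference counting leaks cycles.
free_pool = []

class Cell:
    def __init__(self, name):
        self.name = name
        self.ref_count = 0
        self.out = []               # outgoing references held by this cell

def link(src, dst):
    src.out.append(dst)
    dst.ref_count += 1

def unlink(src, dst):
    src.out.remove(dst)
    dst.ref_count -= 1
    if dst.ref_count == 0:
        for child in list(dst.out): # cascade: drop the dead cell's refs
            unlink(dst, child)
        free_pool.append(dst)

root = Cell("A")
b, c = Cell("B"), Cell("C")
link(root, b)
link(b, c)
link(c, b)                  # B and C now reference each other (a cycle)

unlink(root, b)             # remove the only external reference to the cycle
# B still has ref_count 1 (held by C) and C has ref_count 1 (held by B):
# neither counter ever reaches zero, so nothing is reclaimed – a leak.
```

This is why runtimes that rely on reference counting (CPython, for instance) pair it with a separate cycle detector.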
The mark-sweep algorithm
A complete opposite of ref counting are tracing algorithms, the best known of which is the mark-sweep algorithm. At its core the idea is simple. We don't keep track of every memory cell. When an object/variable/cell is abandoned and there are no references to it, we just continue with our work – until there is a need for fresh memory to be allocated and none is left in the free memory pool. In such a situation we stop the application (aka stop the world) and traverse the whole tree of cells/variables/objects, marking every cell that is still reachable. When this part of the process is done, a second one starts – the sweep. As its name suggests, it actually removes the unmarked data and returns the reclaimed memory to the pool.
Pseudocode for the above could look like this (example from the 'Garbage Collection' book):

mark_sweep() =
    for R in Roots
        mark(R)                  // mark reachable (live) objects
    sweep()                      // reclaim everything left unmarked
    if free_pool is empty        // error handling
        abort "Memory exhausted"
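The pseudocode translates to a runnable Python sketch fairly directly. The `Cell`, `heap` and `roots` names are invented for illustration; a real collector would walk actual machine-level object graphs:

```python
# A runnable sketch of mark-sweep over a toy object graph.
class Cell:
    def __init__(self):
        self.marked = False
        self.children = []          # references this cell holds

def mark(cell):
    if cell.marked:
        return                      # already visited (handles cycles)
    cell.marked = True              # reachable -> keep
    for child in cell.children:
        mark(child)

def sweep(heap):
    live = []
    for cell in heap:
        if cell.marked:
            cell.marked = False     # reset the mark for the next collection
            live.append(cell)
    return live                     # unmarked cells are simply dropped

def mark_sweep(roots, heap):
    for r in roots:                 # mark phase: trace from the roots
        mark(r)
    return sweep(heap)              # sweep phase: reclaim the rest

# usage: a and b are reachable from the root; c is garbage
a, b, c = Cell(), Cell(), Cell()
a.children.append(b)
heap = [a, b, c]
heap = mark_sweep([a], heap)        # c gets swept away
```

Note that, unlike ref counting, the cycle B↔C from the earlier example would be collected here: if nothing on the root path reaches the cycle, it is simply never marked.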
Obviously this algorithm (the details will be discussed in the following articles) has both advantages and disadvantages.
- start/stop – I've already mentioned this one. With this type of algorithm there is usually a pause in the application's operation. Although the algorithms have evolved and nowadays try to keep pause lengths to a minimum, some non-operation time is always present.
- overall cost of processing based on size – since we need to traverse the application's entire memory tree, the more memory is in use, the longer the GC process takes.
- single overload event – there is no overhead on every memory-altering operation as in ref counting; GC happens only once in a while.
The copying algorithm
This is the second type of tracing algorithm, although it takes a different approach than mark-sweep. Here, we divide the heap into two separate parts. When the part currently used by the application is getting full, the GC goes through it and, whenever it finds a live object, moves it to the second part of the heap.
That way there are no two phases of GC, but only one – marking and copying happen in the same pass, reducing the overhead. What is more – contrary to the algorithms mentioned above – we avoid actual heap fragmentation with this approach. Every time an object/cell is copied from one part of the heap to the other, we put the data right next to the data already copied there (just bumping a pointer by the size of the data being copied). When the whole currently used heap part has been processed, a flip is performed, and the 'fresh' heap part is used by the application. The other part can be wiped clean and wait for its turn.
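A classic way to implement this single pass is Cheney's algorithm, sketched below with Python lists standing in for the two heap halves. `Obj`, `copy_collect` and the `forward` field are illustrative names; real collectors store the forwarding pointer in the object header:

```python
# A sketch of a semispace copying collector (Cheney-style scan).
class Obj:
    def __init__(self, value):
        self.value = value
        self.children = []
        self.forward = None         # set once this object has been copied

def copy_collect(roots, to_space):
    def copy(obj):
        if obj.forward is None:     # not copied yet
            clone = Obj(obj.value)
            clone.children = obj.children
            obj.forward = clone
            to_space.append(clone)  # bump-allocate at the end: data stays
        return obj.forward          # compact, no fragmentation

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):     # scan copied objects, fixing up their
        obj = to_space[scan]        # children (copying them on first visit)
        obj.children = [copy(ch) for ch in obj.children]
        scan += 1
    return new_roots

# usage: live objects a, b get copied; 'junk' is simply left behind,
# and after the flip the old half can be wiped clean
a = Obj("a"); b = Obj("b"); a.children.append(b)
garbage = Obj("junk")
from_space = [a, b, garbage]
to_space = []
roots = copy_collect([a], to_space)
```

Notice that garbage is never touched at all – the cost of a collection is proportional to the amount of live data, not the heap size.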
What are the cons? The same as with mark-sweep – there is still a pause present, although this time it is shorter, as only half of the heap is processed. What is more, we may see a much larger number of page faults during collection, as we're 'touching' memory cells twice as often (due to the heap division). On the other hand, this algorithm reduces memory fragmentation (especially in systems with virtual memory we avoid page misses in the long run, and CPU caches can be utilised better). These two statements may seem contradictory, but they're not: we decrease the number of page faults during normal application operation, but we may increase their number during the GC phase.
Ok, so which one is the best?
The only reasonable answer to this question is – 'it depends'. The decision to choose a specific algorithm should always be based on the application's requirements or architectural drivers. Are we fine with longer pauses? Do we need as much memory as possible? Are we utilising caching and want to avoid page misses? All these questions are valid and must be taken into consideration. What is more – the above types are 'pure' ones, presented here with simple implementations for presentation purposes only. Real-world implementations are far more complicated, and they already incorporate many improvements that have arisen over the years. Before making a decision it is wise to answer the above questions, and when in doubt – measure, measure, measure.