GC Theory – Copying algorithm
The last ‘main’ of the GC algorithms that I’ve mentioned in the first post, was copying algorithm. The time has come then to dedicate a separate article to it.
Basics of copying
Algorithms already described suffer either from their execution time length, or resulting memory fragmentation. Copying GC algorithms are trying to address these issues, although (as usual) not without a cost. In general – we are limiting ourselves to (maximum!) half of the memory, we could have at our disposal. In the times when cost of RAM is cheap, and adding it is easy, it may not sound like that much big of a deal, however with the advent of cloud computing and FAAS solutions, that actually may be a problem. Let’s go back to the beginning though.
Simple copying GC that I’ve mentioned before has a couple of great pros:
- no fragmentation – after copying one part of the heap to the other, there are no gaps between data. Therefore, our free space, can be technically used as a stack, by just advancing the free pointer.
- the cost of copying – the overall cost of the operation depends only on the size of data surviving the GC process. As we will see in the future (discussing generational algorithms) in general objects die young. Using copying takes advantage of that.
- less page faults – with compacting performed every time, the overall program data lies in very limited address range in memory, which prevents page faults, and increases locality in general.
In general, copying algorithms are dividing free space into two separate pieces. With more modern approaches, we don’t have to tackle the whole heap at once. It’s possible to have regions extracted, and operate on them. However, no matter the division schema, the overall idea is the same. For the sake of simplicity we will stay with whole-heap approach.
A conceptual algorithms for copying looks like this:
createSemispaces(): tospace ← HeapStart extent ← (HeapEnd − HeapStart) / 2 /∗ size of a semispace ∗/ top ← fromspace ← HeapStart + extent free ← tospace atomic allocate(size): result ← free newfree ← result + size if newfree > top return null /∗ signal ‘Memory exhausted’ ∗/ free ← newfree return result
At first, we create semispaces (aforementioned halves of the heap) – tospace, and fromspace. As long as there’s room in the tospace, we just use it to store new objects. When it run out of space, we have to perform a GC. As with every algorithm, we need to have working list present, that will keep track of the objects marking and moving. Simple algorithm presented by Fenichel-Yochelson in the 60s, used stack-based approach for keeping the list. Unfortunately, it could suffer from stack overflow in the corner cases, so a more robust algorithm was devised by Cheney.
Cheney’s algorithm uses tricolour abstraction, to keep track of the copying process. In this approach, the colors mean:
- black – the object and its descendants was already visited, and the job was done.
- grey – the object was visited, but it’s not done. Not all the descendants were checked by the algorithm.
- white – the object was not visited. All objects of that color at the end of the GC are garbage, and can be removed.
The whole process looks like this – we start with flipping tospace with fromspace. The roots located in the new fromspace are scanned, and if the object is live, we move it to tospace, marking it grey. Then, we iterate over this object, scanning for every pointer it contains. When such pointer is found, we copy it from fromspace, and place just next to the object in tospace.
At the same time in the fromspace, we put forwarding address, overriding user data in the previous location (we can do that, as we have copied the data already). In the tospace, after copying a child data of the original object, we advance scan pointer, which separates all black nodes, from grey ones. At the same time free pointer, indicates where free memory begins (so obviously it’s located afer all the grey nodes). The whole process progresses, and when these both pointers meet, it means that we’re done. Everything in the fromspace can be just discarded.
The actual algorithm looks like this:
atomic collect(): flip() initialise(worklist) /∗ empty ∗/ for each fld in Roots /∗ copy the roots ∗/ process(fld) while not isEmpty(worklist) /∗ copy transitive closure ∗/ ref ← remove(worklist) scan(ref) flip(): /∗ switch semispaces ∗/ fromspace, tospace ← tospace, fromspace top ← tospace + extent free ← tospace scan(ref): for each fld in Pointers(ref) process(fld) process(fld): /∗update field with reference to tospace replica ∗/ fromRef ← *fld if fromRef != null *fld ← forward(fromRef) /∗ update with tospace reference ∗/ forward(fromRef): toRef ← forwardingAddress(fromRef) if toRef = null /∗ not copied (not marked) ∗/ toRef ← copy(fromRef) return toRef copy(fromRef): /∗ copy object and return forwarding address ∗/ toRef ← free free ← free + size(fromRef) move(fromRef, toRef) forwardingAddress(fromRef) ← toRef /∗ mark ∗/ add(worklist, toRef) return toRef
As usual – image says more than a thousand words. Below you can find graphical representation of the process, taken from the second book directly:
The topic mentioned here is quite vast, and second book adds even more complexity to it (with more modern studies and results). I’m not going to write that much about all of them, as it would mean rewriting the book word by word – the matters are really packed with examples/names/studies/techniques. If you’re interested in GC theory on the level that will allow you to work/experiment in that area, you will undoubtedly buy the books I’m using. Then, it’s not a problem to revisit appropriate chapters. Here, I will just use first book as a reference of what are the main issues, but without going full-metal-armor on all the possible solutions.
The main idea for the spatial locality lies in the purpose of the GC. It’s not only there to reclaim unused memory, but also to make a mutator job easier. It means some application of the scout’s rule,which means leaving the memory in the better state, that it was before. With mark&sweep&compact algorithms, that can be a problem. These algorithms are limited to the memory layout that is available to them. We may be clever about usage of memory gaps, but we will never have the flexibility, that comes with the clean slate of the free semispace. In that regard, copying algorithms have a possibility, to really make a difference. Copying live data can be done in a specific way, that will help mutator in the long run.
Unfortunately, as nice as it sounds, there are issues to consider. As Petrank&Rawitz prooved, finding right object layout while copying is NP-complete, meaning there’s no way to actually know/calculate it. All we have left are heuristics, such as:
- use past behaviour of the objects to predict the future
- preserve allocation order
- place children near the parents (more on that later)
When allocator assigns memory for some object, it usually places it in the neighbouring addresses. If we consider a list ob objects being stored, it will usually occupy adjacent memory cells. What happens, if eg. Cheney algorithm is used? As it performs breadth-first scan, it will place near rather distant cousins, than parents and children. That approach introduces additional probability of page misses, which are waaaay worse for the operating speed, than any kind of latency we may have while moving things in RAM.
In order to address that, we can use different strategies while copying the objects to tospace, the natural choice being breadth-first or depth-first. Every one of them comes with some baggage, and that’s the part of the books that will require 1:1 copy&pasting to describe. I will just leave a below list with names/algorithms’ names for you to follow:
- approximately depth-first copying – Moon
- hierarchical decomposition – Wilson&Lam&Moher
- it’s preferred to breadth-first – Stamos&Blau
- stackless recursive copying collection – Thomas&Jones
- mixed approach
- online object reordering – Huang et al
- generational collectors – Chen et al, Chilimbi&Larus
- static reordering – Wilson et al, Lam et al, Novark et al, Shuf et al
- ‘Garbage Collection. Algorithms for automatic dynamic memory management.’ by Jones&Lins, chapter 6
- ‘The Garbage Collection Handbook.’ by Jones&Hoskings&Moss, chapter 4