Theory of GC / 2022-01-31

GC Theory – Generational algorithms

As a Java programmer, the concept of GC is not new to me. However, as long as I remember, the concept of GC in Java was tightly coupled with Java Memory Model and its generational hypothesis. It reached the point where I actually could put an equal sign between JMM and GC process. Of course, they’re not the same (as all my previous articles showed), yet, the time has finally come to take a look at the generational GC.

Why generations?

We’ve seen already, that objects differ – some of them die young, while some of them reside in memory for the whole program’s duration. Performing regular mark/scan/copy operations on long-lived objects seems like a terrible waste of space. It’s especially painful in copying collectors – shuffling these objects back and forth is fruitless.In order to deal with this inefficiency, a weak generational hypothesis comes to the rescue. It assumes that most of the objects die young, and therefore putting most resources on collecting them, should bring the biggest profit. This hypothesis is true, and dividing the heap into smaller regions called generations is a widely adopted technique.

An actual number of regions, and the policies used to promote/tenure objects from younger generation to older one, are usually as simple as using copying collector on them. It makes sense, as we expect large percent of them to die, copying process will be short and fast. It also provides all the benefits of copying collector, like decreasing fragmentation. However, this separation of generations comes with an additional cost on the mutator. Looking from old generation perspective, the garbage there won’t be cleaned during collecting of young generation, and vice versa. Therefore, we need to keep an additional log/register of some kind (the authors are using term remset (remembered set)), that keeps track of the inter-generation connections. Without it, we could be deleting objects from young generation, while they’re still being referenced in the old generation.

Technical aspects of time and space

Generational GC is based on the notion of young/old objects. Automatically the question arises – how do we define which object is young, and which one is old? We may think that simple time measurement should be enough, unfortunately that’s not the case – storing time units takes more space, and does not reflect the fact, that in multi-threaded environments the time somehow stops, when the thread is idle. Authors mention bytes allocated since creation way, but right away discard it as hard to maintain in multi-threaded systems too. To resolve this problem, a mixed approach is used – usually a total number of survived GC cycles is measured. It requires less space (as it’s a simple unsigned integer value), and is somehow immune to multi-threading issues.

Up until know, we’ve operated under assumption, that weak generational hypothesis is correct, but provided no clear evidence of that. To provide some data to back the hypothesis up, I’m pasting here a direct quote from the second book, that gives some numbers – below quote is somehow edited, I’ve concentrated on more recent languages/studies (leaving year-links to help with eventual search for the original studies).

It also holds for many programs written in object-oriented languages. Ungar [1986] found that less than 7% of Smalltalk objects lived beyond 140 kilobytes. Dieckmann and Holzle [1999] reported that the volume of Java objects in the> SPECjvm98 benchmark suite surviving 100 kilobytes varied between 1% and 40%, and that less than 21% lived beyond one megabyte although the proportion varied significantly from one application to another. Blackburn et al [2006a] found that on average less than 9% of objects allocated by the SPECjvm98 and DaCapo benchmark suites escaped from a four megabyte nursery, although there was wide variation between benchmarks; note that this is an upper bound on the volume of objects living longer than four megabytes since some escapees may have been allocated only shortly before the nursery was collected. Jones and Ryder [2008] found bimodal lifetime distributions common in Java applications; between 65% and 96% of DaCapo objects survived no longer than 64 kilobytes, with 3% to 16% being immortal or living longer than four megabytes. Even in imperative languages without automatic memory management support, the lifetimes of many objects are short. Barrett and Zorn [1993] reported that more than 50% of allocated data died within ten kilobytes and less than 10% survived 32 kilobytes.

To finish this subchapter a layout for generations must be mentioned, before we dive into the details. In general – we use generational GC in order to improve the memory usage, and reduce latency. With dividing the heap into generations, we can improve the GC performance, however, a lot of questions arise here. What should be the structure? Should we just use simple dichotomy for young/old generations or should there be more granular regions used for the young gen? Remember that promoting objects to old gen, usually puts additional work on the mutator (with write barrier, as we have to keep the inter-generation links somewhere).

Second thing to consider is the size of young gen – if it’s small, the cleaning process will be fast, unfortunately it will reclaim only a limited amount of memory. As a result, the the young gen GC will be performed more frequently, stealing CPU time from the actual program’s work.

Last consideration here is promotion time – if we make young objects to be promoted to the old gen too fast, their usual high update frequency, would put additional pressure on the GC for the old gen. As you can see then, there’s a lot of things to consider while designing generational structure. Right now we will look at more technical details.

Multiple generations details

As I’ve described above, age recording is quite important thing in the generational GC, as it indicates whether the object should be promoted or not. In addition to deciding after what time the object should be promoted, there’s a question of how to actually do this. There are four distinctive ways:

en masse promotion – during GC process of young gen, we don’t measure the time at all. As long as the data is live, we promote it. Studies have shown that this can be a wasteful approach, although it’s very simple to implement.
aging semispaces – before promoting, live data in the young gen is copied a couple of times within the young gen space, in order to filter out dead data in the process. When some specific threshold of copies is reached, all the live data is copied en masse to the old gen. Unfortunately, there are still some freshly created object being prematurely promoted.
semispaces with age mark – used by Sun ExactVM (as the authors claim), where semispaces were used as in the above example, but additional age data was kept in every object header. That reduced the amount of premature promotions, unfortunately it had a performance impact due to checks performed.
Shaw’s bucket brigade system – it’s based on multiple buckets, that serve as semispaces, but they’re also playing a role of guards. There’s no way that fresh data is allocated to the bucket, that is directly promoted to the old gen. What is more, the last bucket (for the oldest objects in young gen) is a neighbour of old gen. If there’s a need for promotion, it can be just added to the old gen by just changing old gen address’ boundaries.

To end this topic, I have to mention the idea mentioned in the first book, but not in the second one. It is based on the difference in object sizes, especially if the object is possibly plain binary data (eg. image), without any pointers to other data. To avoid high cost of copying such data, it’s best to reserve a separate space for them, or divide them into header (containing pointers/metadata/etc), and an actual data, which can be scattered in multiple places.

Inter-generational pointers

It was mentioned above, that in generational GCs, there’s always a problem with finding all the roots. For any other algorithm, we just scan registers and stack to find them. When generations are involved, we can’t limit ourselves to the aforementioned places – we need to keep track of objects from different generations, that may point to the object within the generation we’re trying to clean up. In order to do that, a auxiliary data structure must be used.

First data structure of that kind was introduced by Liebermann&Hewitt, and it was called entry table. It was stored in every generation (except the oldest one), and was storing information about every reference to the specific generation, from any older one. Therefore, data in the entry table contained not the address of the older object, but only some identifier and a reference to the current’s generation object. At the same time, object in the old gen hold a pointer to the entry table, not to the object itself. That way, scanning of inter-generational relationships during collection was faster, and limited only to the entry table, not to the all older generations. Unfortunately, keeping the table up-to-date was expensive.

Other way to tackle inter-generational pointers, was aforementioned mechanism of remembered sets. Instead of relying on storing every connection between objects in older gen to the younger ones, remembered set only stores information about the source of the relationship (eg. keeping an address of the object in old gen, that has a reference to young gen). Mind the source of the relationship usage, instead of the direct pointer/memory address. That way, no matter the amount of pointers to younger generation that object has, it’s stored only once in the remembered set, reducing its size.

In both described approaches, it’s necessary to implement some kind of write barrier, which will update the auxiliary data structure (during mutator execution or moving the objects around within generation(s)). The necessity of such barrier increases the complexity of the whole process, and also plays a significant role in estimating performance of the GC algorithm, as the effectiveness of write barrier depends on the mutator behaviour. However, it’s also possible to reduce the amount of stored connections, especially if the GC ensures, that with every old gen clean, younger ones will be cleaned-up too. In such situation, we don’t need to store old -> young relationships at all.

Other generational concerns

Again, the rest of the book’s chapter, contains high-density information about handling the older generation. As I don’t want to rewrite the book here (via direct quotes), I am just leaving a list of topics that were mentioned – I assume that if you need that level of knowledge, you have bought the book already and can read the necessary pages.

space management of older generation – with mixture of different collecting techniques
older-first garbage collector – ways to increase the performance by concentrating on middle-aged objects (a lot of types and solutions)
beltway – a framework design by Blackburn et al, I recommend reading this subchapter!
pretenuring – this means putting objects right away in their target generation (by default we’re talking old gen here)

SOURCES:

‘Garbage Collection. Algorithms for automatic dynamic memory management.’ by Jones&Lins, chapter 7
‘The Garbage Collection Handbook.’ by Jones&Hoskings&Moss, chapter 9
Haskell docs about remembered sets

One Comment

January 2022 monthly summary – Bare.Metal.Dev
2022-02-02 at 15:12
Log in to Reply

[…] 31.01.2022 – published an article about generational GC. […]

GC Theory – Generational algorithms

Why generations?

Technical aspects of time and space

Multiple generations details

Inter-generational pointers

Other generational concerns

You Might Also Like

GC Theory – Allocation

GC Theory – Copying algorithm

GC Theory – Mark&compact

One Comment

January 2022 monthly summary – Bare.Metal.Dev

Leave a Reply Cancel