GC Theory – Allocation
Up until now we have concentrated on two aspects of memory management: identifying live data, and freeing unused memory. This post concentrates on possibly the most important part of memory management – allocating it in the first place.
Two types of allocation
There are two basic types of allocation – sequential and free-list (with its variants). As sequential allocation is the simpler one, we will start with it.
The name is rather self-explanatory: we operate on a contiguous chunk of memory and consume as much space as we need by bumping a pointer. The only limit here (related to the computer architecture in use) is memory alignment (padding), which requires data in memory to be aligned on word boundaries. This approach is especially suitable for allocators operating within moving collectors – it’s easier for them to actually obtain a contiguous chunk of memory.
The pseudocode for such allocation is as simple as this:
sequentialAllocate(n):
    result ← free
    newFree ← result + n
    if newFree > limit
        return null    /* signal 'Memory exhausted' */
    free ← newFree
    return result
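The same logic can be simulated in a few lines. Below is a minimal Python sketch of a bump-pointer allocator; addresses are plain integers rather than real memory, and the 8-byte alignment value is an assumption for illustration:

```python
class SequentialAllocator:
    """Bump-pointer (sequential) allocator over a fixed-size region.
    A sketch: 'addresses' are integer offsets, not real memory."""

    def __init__(self, limit, alignment=8):
        self.free = 0            # next free address
        self.limit = limit       # one past the end of the region
        self.alignment = alignment

    def allocate(self, n):
        # round the request up to the alignment (word) boundary
        n = (n + self.alignment - 1) // self.alignment * self.alignment
        result = self.free
        new_free = result + n
        if new_free > self.limit:
            return None          # signal 'memory exhausted'
        self.free = new_free
        return result
```

Note that allocation is a pointer bump plus one comparison, which is why sequential allocation is so cheap.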
The second type of allocator is the free-list allocator, although it’s not always guaranteed that an actual list is used under the hood. Conceptually, we keep a set/list of free memory cells (each with its size and starting address), no matter the actual implementation. Free-list allocation can be split into the following categories.
- first fit – the simplest one. It uses the first cell that can hold the requested size. If the request is smaller than the cell, the unused remainder of the cell can be returned to the free-list. Different strategies can be used here to avoid doing so when the remaining space is too small; however, this algorithm usually ends up with memory having small free cells at the beginning of the list, with the size of free cells growing towards the end of the list.
- next fit – a variation of first fit. Instead of starting the search from the beginning of the list, it continues the search for the next free cell from the place where it last succeeded. When the end of the list is reached, it wraps around to the beginning. This approach can result in heavy fragmentation, as objects from different phases of execution are mixed together.
- best fit – this one is rather self-explanatory. The whole free-list is searched for the cell that fits the requested allocation best.
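To make the first-fit variant concrete, here is a small Python sketch (an illustrative model, not any particular allocator's implementation); the free-list is a list of (address, size) pairs, and the unused remainder of a cell is returned to the list as described above:

```python
def first_fit_allocate(free_list, n):
    """First-fit allocation over a free-list of (address, size) cells.
    A sketch: mutates free_list in place, returns the allocated address."""
    for i, (addr, size) in enumerate(free_list):
        if size >= n:
            remainder = size - n
            if remainder > 0:
                # return the unused tail of the cell to the free-list
                free_list[i] = (addr + n, remainder)
            else:
                del free_list[i]   # the cell fit exactly
            return addr
    return None                    # no cell is large enough
```

A real allocator would also apply a threshold here, dropping remainders too small to be useful instead of re-linking them.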
The above algorithms may not scale well to larger amounts of memory. In order to speed up allocation, researchers came up with a couple of possible improvements:
- use balanced or Cartesian trees to keep the free-list
- use a bitmap, with one bit per smallest possible allocation unit
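The bitmap idea can be sketched as follows. This is an illustrative Python model (not any particular allocator's implementation), where each flag stands for one smallest allocation unit (granule) and allocation means finding a run of clear bits:

```python
def bitmap_allocate(bitmap, granules_needed):
    """Scan a free-map for a run of clear flags; one flag per granule.
    A sketch: bitmap is a list of 0/1 ints, mutated in place."""
    run_start, run_len = 0, 0
    for i, used in enumerate(bitmap):
        if used:
            run_start, run_len = i + 1, 0   # run broken, restart after i
        else:
            run_len += 1
            if run_len == granules_needed:
                # mark the whole run as allocated
                for j in range(run_start, run_start + granules_needed):
                    bitmap[j] = 1
                return run_start            # granule index of the allocation
    return None                             # no run long enough
```

A real implementation would pack the flags into machine words and use bit tricks to scan a word at a time.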
Fragmentation and segregated-fits allocation
No matter the solution used, sooner rather than later we will encounter free-list fragmentation. There’s no way to rule it out entirely, unless we’re using copying collectors. So what can we do when using marking collectors? As we can’t remove fragmentation, we can at least try to keep it small. One way of doing so is to use segregated-fits, which keeps separate free-lists for different object sizes. Usually this ends up with a very fine-grained distribution of size classes for smaller objects, and much coarser ones above a specific threshold.
With such an approach we can easily avoid external fragmentation (where the empty space between live data is scattered in cells too small to hold any useful new data), although it introduces internal fragmentation, which appears due to size rounding. Therefore, some space will always be wasted here.
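As an illustration of both the size-class distribution and the internal fragmentation it causes, here is a hypothetical Python mapping from request size to size class. The class boundaries (16-byte steps up to 256 bytes, powers of two above) are an assumption made up for this sketch, not something prescribed by the book:

```python
def size_class(n):
    """Map a request of n bytes to its segregated-fits size class.
    Assumed classes: multiples of 16 up to 256, powers of two above."""
    if n <= 256:
        return ((n + 15) // 16) * 16   # fine-grained for small objects
    c = 512
    while c < n:                        # coarse powers of two above the threshold
        c *= 2
    return c
```

For example, a 40-byte request lands in the 48-byte class, so 8 bytes are wasted internally – that rounding loss is exactly the internal fragmentation mentioned above.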
How does segregated-fits work in practice? There are two main approaches to populating the free-list for a specific size class. The first one is block-based allocation, and the second one is slicing/splitting. In the former, we operate at the level of memory blocks (typically a power of 2 in size) – when a couple of allocations are needed for a size smaller than the block size, we take a whole block, divide it equally by the requested size, and assign that block to the free-list of that specific size class. With this approach it has been shown that it’s better to keep the block metadata outside the block itself (fewer cache misses, etc).
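A block-based population step could be sketched like this; the 4096-byte block size and the dictionary-of-lists representation of the per-class free-lists are assumptions for illustration:

```python
BLOCK_SIZE = 4096  # assumed power-of-two block size

def populate_size_class(free_lists, size_class, block_start):
    """Carve one whole block into equal cells of size_class bytes and
    hand them all to that class's free-list. A sketch: block metadata
    (e.g. which class owns the block) would live outside the block
    itself, as the text suggests; here we only hand out cell addresses."""
    cells = [block_start + off
             for off in range(0, BLOCK_SIZE - size_class + 1, size_class)]
    free_lists.setdefault(size_class, []).extend(cells)
    return len(cells)
```

Freeing a cell is then just pushing its address back onto the owning class's list, which is why block-based segregated-fits allocation and deallocation are both cheap.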
The second way of populating a free-list in segregated-fits is slicing/splitting. When fresh memory is needed and there is none in the free-list for that size, a larger cell (we don’t operate at the block level here!) can be split: the requested size is used for the allocation, and the remaining cell can be assigned to the free-list matching its resulting size. Variations of this algorithm are the buddy system and the Fibonacci buddy system. Actual usages of segregated-fits can vary, as all combinations with the methods described above (eg. first-fit) are possible.
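The splitting idea, in its binary buddy-system variant, can be sketched as follows. This is again an illustrative Python model; a real buddy allocator would also coalesce freed buddies back into larger blocks, which is omitted here:

```python
def buddy_allocate(free_lists, size):
    """Binary buddy-system sketch: all cell sizes are powers of two.
    A larger free cell is split in half repeatedly; each unused 'buddy'
    half goes onto the free-list of its own, smaller size class."""
    cls = 1
    while cls < size:
        cls *= 2                       # round the request up to a power of two
    avail = sorted(s for s in free_lists if s >= cls and free_lists[s])
    if not avail:
        return None                    # nothing large enough to split
    s = avail[0]                       # smallest usable class
    addr = free_lists[s].pop()
    while s > cls:                     # split, freeing the upper buddy each time
        s //= 2
        free_lists.setdefault(s, []).append(addr + s)
    return addr
```

Starting from a single 64-byte cell, a 16-byte request leaves one 32-byte and one 16-byte buddy on their respective free-lists.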
Other issues with allocation
Here is a list of the remaining topics mentioned in this section of the book. They’re short, and I don’t want to rewrite the book or paste a 3-page-long quote, so I’m just going to leave them here as a reminder/indicator:
- size constraints
- boundary tags
- heap parsability
- wilderness preservation
- crossing maps
- ‘The Garbage Collection Handbook’ by Jones, Hosking & Moss, chapter 7
- Padding in C-structs