-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
Use a principled, and consistent, implementation of freelists. #100240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Size classesMost modern allocators use segregated freelists; one freelist per size-class. Other allocators use a similar scheme, although dlmalloc used by glibc uses power of 2 sizes (I believe). We allocate in mutliples of 1,2,3,4, for 16 size classes in total (adding another 4 for each additional power of 2 that we want to handle). ImplementationThe simplest implementation of the function mapping size to size-class is probably a lookup table: /* Declaring this const is vital, as it allows compilers to treat `size_to_size_class` as a pure function */
const uint8_t LOOKUP_TABLE[32] = { ... };
#define QUANTUM (2*SIZE_OF_VOID_P)
inline int size_to_size_class(intptr_t size) {
assert(size <= QUANTUM*32);
intptr_t size_in_quantum = (size + QUANTUM - 1)>>LOG_2_QUANTUM;
return LOOKUP_TABLE[size_in_quantum];
} |
This combines and updates our freelist handling to use a consistent implementation. Objects in the freelist are linked together using the first word of memory block. If configured with freelists disabled, these operations are essentially no-ops.
This combines and updates our freelist handling to use a consistent implementation. Objects in the freelist are linked together using the first word of memory block. If configured with freelists disabled, these operations are essentially no-ops.
We currently implement freelists for the following object and internal structs
All of these are implemented independently and rather inefficiently.
They take up 3672 bytes of space, instead of the ~200 bytes they should take.
This is not a lot in terms of memory, but it is a lot in terms of L1 cache.
A freelist should look like this:
Only one test is needed for allocation and deallocation (on the fast path).
Allocation needs to test
freelist.head != NULL
. Deallocation needs to testfreelist.space != 0
.The actual list is threaded through the objects on the list, terminated by
NULL
.Cache locality is good. The
head
andspace
are adjacent, and 4 freelists fit in a single cache line.When freeing, the object is hot (and thus in cache).
When allocating, the object is about to be used, so needs to be moved to cache anyway.
The
capacity
fields are there to allow the capacity of a freelist to be temporarily set to 0, ensuring that all allocations go through the main allocator, for use cases liketracemalloc
. Currentlytracemalloc
doesn't see a lot of allocations, due to freelists.Unifying the code for freelists reduces code duplication, and simplifies things for further improvements.
Original discussion
faster-cpython/ideas#132
Linked PRs
The text was updated successfully, but these errors were encountered: