Use a principled, and consistent, implementation of freelists. #100240

markshannon · 2022-12-14T12:29:39Z

We currently implement freelists for the following objects and internal structs:

str
float
tuple (about 20 freelists, which is really wasteful)
list
async generator
contexts
small dicts
small dict keys
slice (a freelist of one)

All of these are implemented independently and rather inefficiently.
They take up 3672 bytes of space, instead of the ~200 bytes they should take.
This is not a lot in terms of memory, but it is a lot in terms of L1 cache.

A freelist should look like this:

typedef struct {
    void *head;
    uint32_t space;
    uint16_t current_capacity;
    uint16_t limit_capacity;
} _PyFreelist;

Allocation and deallocation each need only one test on the fast path.
Allocation tests freelist.head != NULL; deallocation tests freelist.space != 0.

The actual list is threaded through the objects on the list, terminated by NULL.
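
A minimal sketch of the two fast paths, to make the "one test each" claim concrete. The helper names `freelist_alloc` and `freelist_free` are hypothetical, `malloc`/`free` stand in for the main allocator, and `space` is assumed to count how many more blocks the list may accept. Blocks are assumed to be at least `sizeof(void *)` bytes, since the link is stored in their first word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void *head;                 /* first free block, or NULL if the list is empty */
    uint32_t space;             /* assumed: number of blocks that may still be added */
    uint16_t current_capacity;
    uint16_t limit_capacity;
} _PyFreelist;

/* Hypothetical helper: pop a block from the list, else use the main allocator. */
static void *freelist_alloc(_PyFreelist *fl, size_t size)
{
    void *block = fl->head;
    if (block != NULL) {                /* the single fast-path test */
        fl->head = *(void **)block;     /* the link lives in the block's first word */
        fl->space++;
        return block;
    }
    return malloc(size);                /* slow path */
}

/* Hypothetical helper: push a block onto the list, else release it. */
static void freelist_free(_PyFreelist *fl, void *block)
{
    if (fl->space != 0) {               /* the single fast-path test */
        *(void **)block = fl->head;
        fl->head = block;
        fl->space--;
        return;
    }
    free(block);                        /* slow path: the list is full */
}
```

Because the list is LIFO, the block handed out next is the one freed most recently, which is the one most likely to still be in cache.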

Cache locality is good. The head and space are adjacent, and 4 freelists fit in a single cache line.
When freeing, the object is hot (and thus in cache).
When allocating, the object is about to be used, so needs to be moved to cache anyway.
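
The layout claim can be checked with a few lines of C. This is only a sketch: the 64-byte cache line is an assumption (common on current hardware, but not universal), and `freelists_per_cache_line` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    void *head;
    uint32_t space;
    uint16_t current_capacity;
    uint16_t limit_capacity;
} _PyFreelist;

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64

/* head and space are adjacent: space starts right after the pointer. */
_Static_assert(offsetof(_PyFreelist, space) == sizeof(void *),
               "space should immediately follow head");

/* Hypothetical helper: how many freelist headers fit in one cache line. */
static size_t freelists_per_cache_line(void)
{
    return CACHE_LINE_SIZE / sizeof(_PyFreelist);
}
```

With 8-byte pointers the struct is 8 + 4 + 2 + 2 = 16 bytes, so four headers share one 64-byte line.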

The capacity fields are there to allow the capacity of a freelist to be temporarily set to 0, ensuring that all allocations go through the main allocator, for use cases like tracemalloc. Currently tracemalloc doesn't see a lot of allocations, due to freelists.
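
One way the capacity fields could be used is sketched below. The helpers `freelist_disable` and `freelist_enable` are hypothetical, `free` stands in for the main allocator, and the sketch assumes that `space` is reset to `current_capacity` whenever the list is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void *head;
    uint32_t space;
    uint16_t current_capacity;
    uint16_t limit_capacity;
} _PyFreelist;

/* Hypothetical helper: release every cached block and stop caching new ones. */
static void freelist_disable(_PyFreelist *fl)
{
    while (fl->head != NULL) {
        void *block = fl->head;
        fl->head = *(void **)block;     /* follow the link in the first word */
        free(block);
    }
    fl->current_capacity = 0;
    fl->space = 0;      /* the deallocation test `space != 0` now always fails */
}

/* Hypothetical helper: restore the configured capacity after a disable. */
static void freelist_enable(_PyFreelist *fl)
{
    assert(fl->head == NULL);           /* the list was drained by freelist_disable */
    fl->current_capacity = fl->limit_capacity;
    fl->space = fl->current_capacity;
}
```

While disabled, head is NULL and space is 0, so both fast-path tests fail and every allocation and deallocation reaches the main allocator, where a tool such as tracemalloc can observe it.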

Unifying the code for freelists reduces code duplication, and simplifies things for further improvements.

Original discussion

faster-cpython/ideas#132

iritkatriel · 2023-01-31T16:34:36Z

Closing #89738 as duplicate of this.

See experimental PR at #29165.

markshannon · 2023-02-03T11:10:27Z

Size classes

Most modern allocators use segregated freelists; one freelist per size-class.
I believe jemalloc uses 4 size classes per power-of-2:
1,2,3,4,
5,6,7,8,
10,12,14,16,
20, 24...

Other allocators use a similar scheme, although dlmalloc used by glibc uses power of 2 sizes (I believe).

We allocate in multiples of sizeof(void*) *2 (for C alignment reasons) and the stats show that 99.5% of allocations are size <= 512 bytes (this might be an artifact of the benchmark suite, so we might want to have freelists up to 1k or 2k).
512 bytes is (on a 64-bit machine) 32 units of allocation, giving us the following size classes (in units of sizeof(void*) *2):

1,2,3,4,
5,6,7,8,
10,12,14,16,
20,24,28,32

for 16 size classes in total (adding another 4 for each additional power of 2 that we want to handle).
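
As a sketch of how a request maps onto these classes, the code below rounds a size in allocation units up to the smallest class that holds it. The class sizes are the ones listed above. The names `CLASS_SIZES`, `units_to_size_class` and `rounding_waste` are hypothetical, and rounding up is an assumption about the intended policy. A linear scan is used only for clarity.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SIZE_CLASSES 16

/* Class sizes in units of sizeof(void*)*2, four per power of two. */
static const uint8_t CLASS_SIZES[NUM_SIZE_CLASSES] = {
    1, 2, 3, 4,
    5, 6, 7, 8,
    10, 12, 14, 16,
    20, 24, 28, 32
};

/* Hypothetical helper: index of the smallest class that holds `units` units. */
static int units_to_size_class(int units)
{
    assert(units >= 1 && units <= 32);
    for (int i = 0; i < NUM_SIZE_CLASSES; i++) {
        if (CLASS_SIZES[i] >= units) {
            return i;
        }
    }
    return -1;  /* unreachable for valid input */
}

/* Hypothetical helper: units lost by rounding `units` up to its class. */
static int rounding_waste(int units)
{
    return CLASS_SIZES[units_to_size_class(units)] - units;
}
```

For example, a 9-unit request lands in the 10-unit class and wastes one unit, while a 17-unit request lands in the 20-unit class and wastes three.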

Implementation

The simplest implementation of the function mapping size to size-class is probably a lookup table:

/* Declaring this const is vital, as it allows compilers to treat `size_to_size_class` as a pure function */
static const uint8_t LOOKUP_TABLE[32] = { ... };

#define QUANTUM (2*SIZEOF_VOID_P)
static inline int size_to_size_class(intptr_t size) {
    assert(size > 0 && size <= QUANTUM*32);
    /* Round up to a whole number of quanta: 1..32 */
    intptr_t size_in_quantum = (size + QUANTUM - 1)>>LOG_2_QUANTUM;
    /* The table has 32 entries (indices 0..31), so n quanta map to index n-1 */
    return LOOKUP_TABLE[size_in_quantum - 1];
}

This combines and updates our freelist handling to use a consistent implementation. Objects in the freelist are linked together using the first word of the memory block. If configured with freelists disabled, these operations are essentially no-ops.

markshannon added the performance Performance or resource usage label Dec 14, 2022

markshannon self-assigned this Dec 14, 2022

kumaraditya303 mentioned this issue Jan 24, 2023

asyncio with two interpreter instances #91375

Closed

iritkatriel mentioned this issue Jan 31, 2023

Use a more principled approach to freelists #89738

Closed

bedevere-bot mentioned this issue Feb 3, 2023

GH-100240: Generic freelist, applied to ints #101453

Closed

iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 27, 2023

bedevere-app bot mentioned this issue Jul 17, 2024

gh-100240: Use a consistent implementation for freelists #121934

Merged

colesbury added a commit to colesbury/cpython that referenced this issue Jul 18, 2024

Merge branch 'main' into pythongh-100240-freelist

aa8e4d5

markshannon closed this as completed Jan 8, 2025
