Skip to content

gh-127266: avoid data races when updating type slots v2 #133177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions Include/internal/pycore_interp_structs.h
Original file line number Diff line number Diff line change
Expand Up @@ -677,8 +677,11 @@ struct _Py_interp_cached_objects {

/* object.__reduce__ */
PyObject *objreduce;
#ifndef Py_GIL_DISABLED
/* resolve_slotdups() */
PyObject *type_slots_pname;
pytype_slotdef *type_slots_ptrs[MAX_EQUIV];
#endif

/* TypeVar and related types */
PyTypeObject *generic_type;
Expand Down
2 changes: 1 addition & 1 deletion Include/internal/pycore_object.h
Original file line number Diff line number Diff line change
Expand Up @@ -313,7 +313,7 @@ extern int _PyDict_CheckConsistency(PyObject *mp, int check_content);
// Fast inlined version of PyType_HasFeature()
static inline int
_PyType_HasFeature(PyTypeObject *type, unsigned long feature) {
return ((FT_ATOMIC_LOAD_ULONG_RELAXED(type->tp_flags) & feature) != 0);
return ((type->tp_flags) & feature) != 0;
}

extern void _PyType_InitCache(PyInterpreterState *interp);
Expand Down
1 change: 0 additions & 1 deletion Include/internal/pycore_typeobject.h
Original file line number Diff line number Diff line change
Expand Up @@ -134,7 +134,6 @@ extern int _PyType_AddMethod(PyTypeObject *, PyMethodDef *);
extern void _PyType_SetFlagsRecursive(PyTypeObject *self, unsigned long mask,
unsigned long flags);

extern unsigned int _PyType_GetVersionForCurrentState(PyTypeObject *tp);
PyAPI_FUNC(void) _PyType_SetVersion(PyTypeObject *tp, unsigned int version);
PyTypeObject *_PyType_LookupByVersion(unsigned int version);

Expand Down
12 changes: 7 additions & 5 deletions Include/object.h
Original file line number Diff line number Diff line change
Expand Up @@ -620,6 +620,12 @@ given type object has a specified feature.
#define Py_TPFLAGS_HAVE_FINALIZE (1UL << 0)
#define Py_TPFLAGS_HAVE_VERSION_TAG (1UL << 18)

// Flag values for ob_flags (16 bits available, if SIZEOF_VOID_P > 4).
#define _Py_IMMORTAL_FLAGS (1 << 0)
#define _Py_STATICALLY_ALLOCATED_FLAG (1 << 2)
#if defined(Py_GIL_DISABLED) && defined(Py_DEBUG)
#define _Py_TYPE_REVEALED_FLAG (1 << 3)
#endif

#define Py_CONSTANT_NONE 0
#define Py_CONSTANT_FALSE 1
Expand Down Expand Up @@ -776,11 +782,7 @@ PyType_HasFeature(PyTypeObject *type, unsigned long feature)
// PyTypeObject is opaque in the limited C API
flags = PyType_GetFlags(type);
#else
# ifdef Py_GIL_DISABLED
flags = _Py_atomic_load_ulong_relaxed(&type->tp_flags);
# else
flags = type->tp_flags;
# endif
flags = type->tp_flags;
#endif
return ((flags & feature) != 0);
}
Expand Down
3 changes: 0 additions & 3 deletions Include/refcount.h
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,6 @@ immortal. The latter should be the only instances that require
cleanup during runtime finalization.
*/

#define _Py_STATICALLY_ALLOCATED_FLAG 4
#define _Py_IMMORTAL_FLAGS 1

#if SIZEOF_VOID_P > 4
/*
In 64+ bit systems, any object whose 32 bit reference count is >= 2**31
Expand Down
26 changes: 26 additions & 0 deletions Lib/test/test_descr.py
Original file line number Diff line number Diff line change
Expand Up @@ -4113,6 +4113,32 @@ class E(D):
else:
self.fail("shouldn't be able to create inheritance cycles")

def test_assign_bases_many_subclasses(self):
class A:
x = 'hello'
def __call__(self):
return 123
def __getitem__(self, index):
return None

class X:
x = 'bye'

class B(A):
pass

subclasses = []
for i in range(1000):
sc = type(f'Sub{i}', (B,), {})
subclasses.append(sc)

self.assertEqual(subclasses[0]()(), 123)
self.assertEqual(subclasses[0]().x, 'hello')
B.__bases__ = (X,)
with self.assertRaises(TypeError):
subclasses[0]()()
self.assertEqual(subclasses[0]().x, 'bye')

def test_builtin_bases(self):
# Make sure all the builtin types can have their base queried without
# segfaulting. See issue #5787.
Expand Down
5 changes: 3 additions & 2 deletions Lib/test/test_opcache.py
Original file line number Diff line number Diff line change
Expand Up @@ -583,6 +583,7 @@ class TestRacesDoNotCrash(TestBase):
# Careful with these. Bigger numbers have a higher chance of catching bugs,
# but you can also burn through a *ton* of type/dict/function versions:
ITEMS = 1000
SMALL_ITEMS = 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth investigating this further. I'm surprised there's a large slowdown in this PR, but not in the earlier version. What's the relevant difference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I timed the most recent version of this PR vs "main", using SMALL_ITEMS = 1000. It seems the slowdown is not too bad now. I've left it at 100 but we could revert to 1000 if we want to.

I wasn't able to exactly pin-point the reason for the huge slowdown before. Based on "samply" profiling, it looked like there was a lot of contention for the TYPE_LOCK mutex and the STM mutex. Quite a lot of time was spent in _Py_yield(). That's kind of expected since the two writers are both trying to acquire both of those.

LOOPS = 4
WRITERS = 2

Expand Down Expand Up @@ -626,7 +627,7 @@ class C:
__getitem__ = lambda self, item: None

items = []
for _ in range(self.ITEMS):
for _ in range(self.SMALL_ITEMS):
item = C()
items.append(item)
return items
Expand Down Expand Up @@ -797,7 +798,7 @@ class C:
__getattribute__ = lambda self, name: None

items = []
for _ in range(self.ITEMS):
for _ in range(self.SMALL_ITEMS):
item = C()
items.append(item)
return items
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
In the free-threaded build, avoid data races caused by updating type slots
or type flags after the type was initially created. For those (typically
rare) cases, use the stop-the-world mechanism. Remove the use of atomics
when reading or writing type flags. The use of atomics is not sufficient to
avoid races (since flags are sometimes read without a lock and without
atomics) and are no longer required.
Loading
Loading