Skip to content

ctypes: Switch field accessors to fixed-width integers #127295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
encukou opened this issue Nov 26, 2024 · 1 comment
Closed

ctypes: Switch field accessors to fixed-width integers #127295

encukou opened this issue Nov 26, 2024 · 1 comment
Assignees
Labels
extension-modules C modules in the Modules dir topic-ctypes type-feature A feature request or enhancement

Comments

@encukou
Copy link
Member

encukou commented Nov 26, 2024

Feature or enhancement

I believe the next step toward untangling ctypes should be switching cfield.c to be based on fixed-width integer types.
This should be a pure refactoring, without user-visible behaviour changes.

Currently, we use traditional native C types, usually identified by struct format characters when a short (and identifier-friendly) name is needed:

  • signed char (b) / unsigned char (B)
  • short (h) / unsigned short (h)
  • int (i) / unsigned int (i)
  • long (l) / unsigned long (l)
  • long long (q) / unsigned long long (q)

These map to C99 fixed-width types, which i propose switching to:

  • int8_t/uint8_t
  • int16_t/uint16_t
  • int32_t/uint32_t
  • int64_t/uint64_t

The C standard doesn't guatrantee that the “traditional” types must map to the fixints.
But, ctypes currently requires it, so the assuption won't break anything.

By “map” I mean that the size of the types matches. The alignment requirements might not.
This needs to be kept in mind but is not an issue in ctypes accessors, which explicitly handle unaligned memory for the integer types.

Note that there are 5 “traditional” C type sizes, but 4 fixed-width ones. Two of the former are functionally identical to one another; which ones they are is platform-specific (e.g. int==long==int32_t.)
This means that one of the current implementations is redundant on any given platform.

The fixint types are parametrized by the number of bytes/bits, and one bit for signedness. This makes it easier to autogenerate code for them or to write generic macros (though generic API like PyLong_AsNativeBytes is problematic for performance reasons -- especially compared to a memcpy with compile-time-constant size).

When one has a different integer type, determining the corresponding fixint means a sizeof and signedness check. This is easier and more robust than the current implementations (see wchar_t or _Bool).

The refactoring can pave the way for:

  • Separating bitfield accessors, so bitfield logic doesn't slow down normal access. (This can be done today, but would mean another set of nearly-identical hand-written functions, which is hard to maintain, let alone experiment with. We need more metaprogramming.)
  • Integer types with arbitrary size & alignment (useful for getting the __int128 type, or matching other platforms). This would be a new future feature.

Linked PRs

@encukou encukou added type-feature A feature request or enhancement topic-ctypes labels Nov 26, 2024
@encukou encukou self-assigned this Nov 26, 2024
@picnixz picnixz added the extension-modules C modules in the Modules dir label Nov 26, 2024
@encukou
Copy link
Member Author

encukou commented Nov 27, 2024

@abalkin, @amauryfa, @meadori, you are the listed ctypes experts.
I'd like to maintain and improve the module more actively, including architectural changes like this one.
Are you OK with that? If so, how much do you want to stay in the loop?

encukou added a commit that referenced this issue Dec 20, 2024
…-127297)

This should be a pure refactoring, without user-visible behaviour changes.

Before this change, ctypes uses traditional native C types, usually identified 
by [`struct` format characters][struct-chars] when a short (and 
identifier-friendly) name is needed:
- `signed char` (`b`) / `unsigned char` (`B`)
- `short` (`h`) / `unsigned short` (`h`)
- `int` (`i`) / `unsigned int` (`i`)
- `long` (`l`) / `unsigned long` (`l`)
- `long long` (`q`) / `unsigned long long` (`q`)

These map to C99 fixed-width types, which this PR switches to: - 
- `int8_t`/`uint8_t`
- `int16_t`/`uint16_t`
- `int32_t`/`uint32_t`
- `int64_t`/`uint64_t`

The C standard doesn't guarantee that the “traditional” types must map to the 
fixints. But, [`ctypes` currently requires it][swapdefs], so the assumption won't 
break anything.

By “map” I mean that the *size* of the types matches. The *alignment* 
requirements might not. This needs to be kept in mind but is not an issue in 
`ctypes` accessors, which [explicitly handle unaligned memory][memcpy] for the 
integer types.

Note that there are 5 “traditional” C type sizes, but 4 fixed-width ones. Two of 
the former are functionally identical to one another; which ones they are is 
platform-specific (e.g. `int`==`long`==`int32_t`.) This means that one of the 
[current][current-impls-1] [implementations][current-impls-2] is redundant on 
any given platform.


The fixint types are parametrized by the number of bytes/bits, and one bit for 
signedness. This makes it easier to autogenerate code for them or to write 
generic macros (though generic API like 
[`PyLong_AsNativeBytes`][PyLong_AsNativeBytes] is problematic for performance 
reasons -- especially compared to a `memcpy` with compile-time-constant size).

When one has a *different* integer type, determining the corresponding fixint  
means a `sizeof` and signedness check. This is easier and more robust than the 
current implementations (see [`wchar_t`][sizeof-wchar_t] or 
[`_Bool`][sizeof-bool]).


[swapdefs]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L420-L444
[struct-chars]: https://docs.python.org/3/library/struct.html#format-characters
[current-impls-1]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L470-L653
[current-impls-2]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L703-L944
[memcpy]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L613
[PyLong_AsNativeBytes]: https://docs.python.org/3/c-api/long.html#c.PyLong_AsNativeBytes
[sizeof-wchar_t]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L1547-L1555
[sizeof-bool]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L1562-L1572


Co-authored-by: Bénédikt Tran <[email protected]>
@encukou encukou closed this as completed Dec 20, 2024
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Dec 23, 2024
…rs (pythonGH-127297)

This should be a pure refactoring, without user-visible behaviour changes.

Before this change, ctypes uses traditional native C types, usually identified 
by [`struct` format characters][struct-chars] when a short (and 
identifier-friendly) name is needed:
- `signed char` (`b`) / `unsigned char` (`B`)
- `short` (`h`) / `unsigned short` (`h`)
- `int` (`i`) / `unsigned int` (`i`)
- `long` (`l`) / `unsigned long` (`l`)
- `long long` (`q`) / `unsigned long long` (`q`)

These map to C99 fixed-width types, which this PR switches to: - 
- `int8_t`/`uint8_t`
- `int16_t`/`uint16_t`
- `int32_t`/`uint32_t`
- `int64_t`/`uint64_t`

The C standard doesn't guarantee that the “traditional” types must map to the 
fixints. But, [`ctypes` currently requires it][swapdefs], so the assumption won't 
break anything.

By “map” I mean that the *size* of the types matches. The *alignment* 
requirements might not. This needs to be kept in mind but is not an issue in 
`ctypes` accessors, which [explicitly handle unaligned memory][memcpy] for the 
integer types.

Note that there are 5 “traditional” C type sizes, but 4 fixed-width ones. Two of 
the former are functionally identical to one another; which ones they are is 
platform-specific (e.g. `int`==`long`==`int32_t`.) This means that one of the 
[current][current-impls-1] [implementations][current-impls-2] is redundant on 
any given platform.


The fixint types are parametrized by the number of bytes/bits, and one bit for 
signedness. This makes it easier to autogenerate code for them or to write 
generic macros (though generic API like 
[`PyLong_AsNativeBytes`][PyLong_AsNativeBytes] is problematic for performance 
reasons -- especially compared to a `memcpy` with compile-time-constant size).

When one has a *different* integer type, determining the corresponding fixint  
means a `sizeof` and signedness check. This is easier and more robust than the 
current implementations (see [`wchar_t`][sizeof-wchar_t] or 
[`_Bool`][sizeof-bool]).


[swapdefs]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L420-L444
[struct-chars]: https://docs.python.org/3/library/struct.html#format-characters
[current-impls-1]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L470-L653
[current-impls-2]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L703-L944
[memcpy]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L613
[PyLong_AsNativeBytes]: https://docs.python.org/3/c-api/long.html#c.PyLong_AsNativeBytes
[sizeof-wchar_t]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L1547-L1555
[sizeof-bool]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L1562-L1572


Co-authored-by: Bénédikt Tran <[email protected]>
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Jan 8, 2025
…rs (pythonGH-127297)

This should be a pure refactoring, without user-visible behaviour changes.

Before this change, ctypes uses traditional native C types, usually identified 
by [`struct` format characters][struct-chars] when a short (and 
identifier-friendly) name is needed:
- `signed char` (`b`) / `unsigned char` (`B`)
- `short` (`h`) / `unsigned short` (`h`)
- `int` (`i`) / `unsigned int` (`i`)
- `long` (`l`) / `unsigned long` (`l`)
- `long long` (`q`) / `unsigned long long` (`q`)

These map to C99 fixed-width types, which this PR switches to: - 
- `int8_t`/`uint8_t`
- `int16_t`/`uint16_t`
- `int32_t`/`uint32_t`
- `int64_t`/`uint64_t`

The C standard doesn't guarantee that the “traditional” types must map to the 
fixints. But, [`ctypes` currently requires it][swapdefs], so the assumption won't 
break anything.

By “map” I mean that the *size* of the types matches. The *alignment* 
requirements might not. This needs to be kept in mind but is not an issue in 
`ctypes` accessors, which [explicitly handle unaligned memory][memcpy] for the 
integer types.

Note that there are 5 “traditional” C type sizes, but 4 fixed-width ones. Two of 
the former are functionally identical to one another; which ones they are is 
platform-specific (e.g. `int`==`long`==`int32_t`.) This means that one of the 
[current][current-impls-1] [implementations][current-impls-2] is redundant on 
any given platform.


The fixint types are parametrized by the number of bytes/bits, and one bit for 
signedness. This makes it easier to autogenerate code for them or to write 
generic macros (though generic API like 
[`PyLong_AsNativeBytes`][PyLong_AsNativeBytes] is problematic for performance 
reasons -- especially compared to a `memcpy` with compile-time-constant size).

When one has a *different* integer type, determining the corresponding fixint  
means a `sizeof` and signedness check. This is easier and more robust than the 
current implementations (see [`wchar_t`][sizeof-wchar_t] or 
[`_Bool`][sizeof-bool]).


[swapdefs]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L420-L444
[struct-chars]: https://docs.python.org/3/library/struct.html#format-characters
[current-impls-1]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L470-L653
[current-impls-2]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L703-L944
[memcpy]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L613
[PyLong_AsNativeBytes]: https://docs.python.org/3/c-api/long.html#c.PyLong_AsNativeBytes
[sizeof-wchar_t]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L1547-L1555
[sizeof-bool]: https://github.com/python/cpython/blob/v3.13.0/Modules/_ctypes/cfield.c#L1562-L1572


Co-authored-by: Bénédikt Tran <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir topic-ctypes type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

2 participants