Skip to content

gh-132983: Add compression.zstd and Python tests #133365

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 56 commits into from
May 6, 2025

Conversation

emmatyping
Copy link
Member

@emmatyping emmatyping commented May 4, 2025

The next step to merging the PEP 784 implementation, this PR introduces compression.zstd and tests for the code (which I have been using to test the C code so far).

Since test can be modified post-beta1, I'd like to focus on reviewing the API and Python implementation code. Obviously if there are important tests to add or change, that review would be valuable as well.

@emmatyping
Copy link
Member Author

Also as requested, pinging @Rogdham to review the Python code.

Copy link
Member

@AA-Turner AA-Turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of Lib/compression/zstd/__init__.py

@bedevere-app
Copy link

bedevere-app bot commented May 4, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Copy link
Contributor

@Rogdham Rogdham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This review is focused on changes from pyzstd.

My main issue is decompress(compress(b"xxx") + b"yyy") not raising an exception.

Copy link
Contributor

@Rogdham Rogdham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These extra comments are due to options being passed as positional arguments, but they are passed to level parameter instead of options.

I hope I did not miss any 😅

@emmatyping
Copy link
Member Author

I am working on addressing some of the issues which will include the test failures

Copy link
Member

@tomasr8 tomasr8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, nothing major

Copy link
Member

@AA-Turner AA-Turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments on Lib/compression/zstd/_zstdfile.py

@emmatyping
Copy link
Member Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented May 5, 2025

Thanks for making the requested changes!

@AA-Turner: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from AA-Turner May 5, 2025 23:50
@emmatyping
Copy link
Member Author

emmatyping commented May 5, 2025

FWIW I believe I responded to all reviews, so I will add the buildbots to test this.

@emmatyping emmatyping added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 5, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @emmatyping for commit a12a031 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F133365%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 5, 2025
AA-Turner added 7 commits May 6, 2025 00:53
Python, in general, works on protocols rather than nominal or specific types. I think this check is overly restrictive, with only minor benefit. Users wanting validation can use static type checkers.
@AA-Turner
Copy link
Member

I've pushed a series of commits. Most are explanatory and fairly minor (docstring wording, etc). The one of most note is removing the exact type checks on {C,Dec}ompressionParamater. I think it is overly restrictivie to check the exact type of the enum, and I can't think of a similar example where we do this. Users who want the validation of not conflating compression and decompression can use a staic type checker.

I've pushed this for speed, given time, but if you feel strongly I'm happy for you to revert.

A

@emmatyping
Copy link
Member Author

@AA-Turner The commits look fine other than the last one. It is very important to check the type of a CompressionParameter vs DecompressionParameter because the C enum variants have overlapping values (compression_level == window_log_max)

You can see in the tests this could lead to confusing errors:

Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_zstd.py", line 1424, in test_init_bad_mode
    ZstdFile(io.BytesIO(), 'rb',
    ~~~~~~~~^^^^^^^^^^^^^^^^^^^^
             options={CompressionParameter.compression_level:5})
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/compression/zstd/_zstdfile.py", line 91, in __init__
    raw = _streams.DecompressReader(
        self._fp,
    ...<3 lines>...
        options=options,
    )
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/compression/_common/_streams.py", line 53, in __init__
    self._decompressor = self._decomp_factory(**self._decomp_args)
                         ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
_zstd.ZstdError: Error when setting zstd decompression parameter "window_log_max", it should 10 <= value <= 31, provided value is 5. (zstd v1.5.5, 64-bit build)

I think I will revert the last commit but keep the others.

This reverts commit bf4b07d.

Checking the type of parameters is important to avoid confusing error
messages.
@AA-Turner
Copy link
Member

It is very important to check the type of a CompressionParameter vs DecompressionParameter because the C enum variants have overlapping values

Yes, but I think the risk of this actually hitting users is low? For example, the examplar test failure you provided is very clear: CompressionParameter.compression_level is provided as a key, and the error message says 'Error when setting zstd decompression parameter "window_log_max"' (emph. added). typeshed could annotate the options dict as Mapping[CompressionParameter, int] which would solve the problem for users of static type checkers. Users who pass nonsense or invalid values will always get undefined behaviour or an error. I think it's better to apply the consenting adults principle here that we see elsewhere in the language. For example, the current code doesn't let me pass an int as the key directly.

I think it's fine to keep for now as we can discuss further and change this during the betas -- this particular item isn't impacted by feature freeze as it's entirely private.

A

@emmatyping
Copy link
Member Author

For example, the examplar test failure you provided is very clear: CompressionParameter.compression_level is provided as a key, and the error message says 'Error when setting zstd decompression parameter "window_log_max"' (emph. added).

I think we view the clarify of the error differently! To me I would expect users to not understand why they are getting an error about a decompression parameter when passing a compression parameter. I think perhaps a compromise would be to check the type if it is a (De)CompressionParameter, but otherwise just allow ints without checking. If you're passing an int, you're on your own!

@emmatyping
Copy link
Member Author

emmatyping commented May 6, 2025

@AA-Turner

For example, the current code doesn't let me pass an int as the key directly.

After writing this:

I think perhaps a compromise would be to check the type if it is a (De)CompressionParameter, but otherwise just allow ints without checking. If you're passing an int, you're on your own!

I just checked and we only check if it is CompressionParameter for decompression, or DecompressionParameter for compression. So the compromise is actually the current behavior. So you can pass an int if you'd like:

>>> from compression.zstd import *
>>> compress(b'testtesttest', options={100:5})
b'(\xb5/\xfd \x0ca\x00\x00testtesttest'

Copy link
Member

@AA-Turner AA-Turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mergeable, we can tweak the few remaining items (eg. tarfile, parameter type checks, perhaps tests) after 3.14 beta 1.

Thanks everyone, and especially thank you @Rogdham for the valuable feedback & context from pyzstd.

A

@AA-Turner AA-Turner merged commit c273f59 into python:main May 6, 2025
41 of 42 checks passed
@emmatyping emmatyping deleted the 3.14-zstd-python-code branch May 6, 2025 00:38
@emmatyping
Copy link
Member Author

emmatyping commented May 6, 2025

Thanks everyone, and especially thank you @Rogdham for the valuable feedback & context from pyzstd.

Hear, hear! And thank you @AA-Turner for your suggestions and contributions! Thank you to all of the reviewers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants