Implement a means to preserve & modify general purpose flag bits when opening an existing archive #491

Open
aral-matrix opened this issue May 4, 2025 · 6 comments
Labels
enhancement Request a new feature. feedback Waiting for feedback from submitter.

Comments

@aral-matrix
aral-matrix commented May 4, 2025

I did some more testing and would like to propose / request a minor change in how libzip handles existing archives.

My original issue, reported in #490, occurred with libzip 4.0.0 (currently in Debian bookworm) when an archive opened with it has the central GPF bit 11 (UTF-8) set and the same bit is set for the files compressed in it. (In my case, the GPF bits have the value 0x03cc.)
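
For readers following along, here is a minimal sketch of how bit 11 can be tested in a 16-bit general purpose bit flag value. The macro and function names are hypothetical and chosen for illustration; they are not part of libzip.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 11 of the general purpose bit flag (the "language encoding" bit):
 * when set, the entry's file name and comment are UTF-8.
 * GPBF_UTF8_MASK is a hypothetical name used only in this sketch. */
#define GPBF_UTF8_MASK (1u << 11) /* 0x0800 */

/* Returns true if bit 11 is set in the given flag value. */
bool gpbf_has_utf8(uint16_t flags) {
    return (flags & GPBF_UTF8_MASK) != 0;
}
```

With this mask, `gpbf_has_utf8(0x0800)` is true, while the value 0x03cc quoted above tests as not set (0x03cc & 0x0800 is 0), so it may be worth double-checking which field that value was read from.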

In a file saved with that libzip version, all GPF bits appear to be cleared (0x0000), both in the central flags and for newly added files, but they remain unmodified for files that already existed in the archive, which causes unzip to complain.

Now, I just tested with libzip 5.5.0, downloaded from the repository, and the behavior appears to have changed: while the central GPF bits still get cleared, unzip no longer complains, which leads me to the conclusion that the GPF bits of preexisting files are now cleared as well.

I am not sure how important those bits are, but I could imagine that is not the intended behavior? I would expect something along the lines of:

  • by default, libzip copies to all newly added files the central archive GPF bit flags
  • libzip exposes to the user a zip_source_set_file_attributes function that can recursively modify all GPF bits in the archive, so as to keep the archive an intact zip file

Some feedback would be appreciated - also maybe with a pointer to the libzip commit in which this "clear all GPF bits from preexisting files" functionality was added.

@aral-matrix aral-matrix added the enhancement Request a new feature. label May 4, 2025
@dillof
Member

dillof commented May 5, 2025

There are certain fields of a file header that have to match other aspects of the file. We consider these fields as internal and maintain the required constraints. As such, we do not (directly) expose them via APIs.

The General Purpose Bit Flags are one such field. Bit 11 is set if the file name is encoded in UTF-8. libzip clears this flag for file names that only contain ASCII characters (character codes 0x00 to 0x7f). Could you check if your archives contain file names with non-ASCII characters?
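
One way to run that check on an entry name is sketched below. This is a hypothetical stand-alone helper that only tests for bytes outside 0x00 to 0x7f; it is not libzip's internal logic, which may take other factors into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if every byte of the NUL-terminated
 * name is in the ASCII range 0x00..0x7f. */
bool name_is_ascii(const char *name) {
    assert(name != NULL);
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; ++p) {
        if (*p > 0x7f) {
            return false; /* found a byte that needs an encoding to interpret */
        }
    }
    return true;
}
```

The entry names themselves could be obtained with `zip_get_name()` and passed to a helper like this one; a name for which it returns false is a candidate for keeping bit 11 set.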

@dillof dillof added the feedback Waiting for feedback from submitter. label May 5, 2025
@aral-matrix
Author

> Could you check if your archives contain file names with non-ASCII characters?

I am explicitly testing with such file names, but to be honest, I believe the non-ASCII file names are limited to the name of the archive itself.

What I did not know, and what I take from your response, is that the only use case for GPF bit 11 is when the name of a file added to the archive is encoded in UTF-8?

If so, I am fine with the new behavior of the library (5.5.0 version) where the archive ends up being consistent with the GPF bit stripped everywhere if not needed.

However, I was confused by the initial error message by unzip - on an archive edited with libzip 4.0.0 - that seemed to claim that the generic archive flag should match the file flag. Which would mean that if only a single file in the archive has a UTF-8 encoded name, ALL files in the archive must have the UTF-8 GPF bit set, so that the central archive GPF bit can be set without unzip complaining?

Are you reachable on matrix? I am happy to explain the issue in a chat :)

@0-wiz-0
Member

0-wiz-0 commented May 6, 2025

Bit 11 of the GPBF must be set if the file name or file comment is encoded in UTF-8. It is specific to a file; different files can have it set differently. libzip takes control of it and sets it automatically to ensure it's correct.
The name of the archive itself is irrelevant for this.
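
To see that the flag is stored per entry, the sketch below walks the central directory of an archive held in memory and collects each entry's flag word. It relies only on the ZIP header layout and makes several simplifying assumptions: no Zip64, a single-disk archive, and the first end-of-central-directory signature found when scanning backward is the real one. The helper names are hypothetical and this is not a libzip API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers; ZIP header fields are stored little-endian. */
uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: stores the flag word of up to `max` central
 * directory entries in `out`. Returns the number of entries walked,
 * or -1 if the buffer does not look like a plain (non-Zip64) archive. */
int central_flags(const uint8_t *buf, size_t len, uint16_t *out, int max) {
    assert(buf != NULL && out != NULL);
    if (len < 22) return -1;              /* 22 = minimum EOCD record size */
    size_t eocd = len - 22;
    while (rd32(buf + eocd) != 0x06054b50u) { /* scan backward for EOCD */
        if (eocd == 0) return -1;
        --eocd;
    }
    unsigned total = rd16(buf + eocd + 10);   /* total number of entries */
    size_t pos = rd32(buf + eocd + 16);       /* central directory offset */
    int n = 0;
    for (unsigned i = 0; i < total; ++i) {
        /* 46 = fixed size of a central directory file header */
        if (pos > len || len - pos < 46 || rd32(buf + pos) != 0x02014b50u)
            return -1;
        if (n < max) out[n] = rd16(buf + pos + 8); /* flags at offset 8 */
        ++n;
        /* skip fixed part plus file name, extra field and comment */
        pos += 46u + rd16(buf + pos + 28) + rd16(buf + pos + 30) +
               rd16(buf + pos + 32);
    }
    return n;
}
```

Testing each returned value against 0x0800 then shows which entries have bit 11 set; in a well-formed archive the values may legitimately differ from entry to entry.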

The version numbers 4.0.0 and 5.5.0 are probably shared library versions (I guess?) but I don't know what libzip releases they correspond to. The different behaviour is just a bugfix.

Judging from

> If so, I am fine with the new behavior of the library (5.5.0 version) where the archive ends up being consistent with the GPF bit stripped everywhere if not needed.

I think this ticket can be closed.

@aral-matrix
Author

> The version numbers 4.0.0 and 5.5.0 are probably shared library versions (I guess?) but I don't know what libzip releases they correspond to. The different behaviour is just a bugfix.

I think I took the version numbers from the shared object files - once from debian stable /usr/lib/x86_64-linux-gnu/libzip.so.4.0 and once from the file created by the repository I cloned from here last week: libzip.so.5.5 (not sure where I got the final 0 from). So yes, those are shared library versions.

> Judging from
>
> > If so, I am fine with the new behavior of the library (5.5.0 version) where the archive ends up being consistent with the GPF bit stripped everywhere if not needed.
>
> I think this ticket can be closed.

Kinda... Could you give me any pointers on how, in the older library version, I could fix the "corruption" of archives in which some files have GPF bit 11 set while the central archive GPF bit 11 is stripped (not set to begin with) when creating an archive in memory from the source loaded from disk?

It's somewhat unfortunate to expect users to live with a zip archive that - albeit being extractable - provokes an error message upon unzipping.

Sadly, all functions that libzip has for modifying the GPF bit 11 appear to be internal to the library, so I think I would need a workaround that duplicates some part of the library functions around GPF bits?

Would you have a good suggestion how to go about this? Is there any way to access a data structure with those bits (uncompressed) in a defined location so I could reinterpret a pointer and implement a "hack" for modifying the GPF bits?

@0-wiz-0
Member

0-wiz-0 commented May 6, 2025

No, there is no support for accessing the fields directly. I suggest upgrading to the newer library and just reading and writing the zip archive to a new file.

@aral-matrix
Author

> No, there is no support for accessing the fields directly. I suggest upgrading to the newer library and just reading and writing the zip archive to a new file.

Can't upgrade to the newer library unless we ship the libzip version with OpenXLSX - which debian doesn't like (for whom I am trying to package) :/ Then again I hope that debian unstable (where a new package would go) has a sufficiently new version of libzip.

I was hoping there might be an easy "hack" if any libzip function exposes the raw data of a file, including the GPF bits, to the caller (since the whole archive is available as raw data when loading / saving from memory).

But yeah, it appears we'll have to live without such a feature, and I do not understand enough about the zip file architecture to think of a workaround.
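
For what it is worth, the kind of raw-data workaround asked about here can be sketched without libzip at all, working directly on the archive bytes in memory. This is not something the maintainers proposed or support. The sketch assumes a plain archive (no Zip64, single disk, intact local headers), uses hypothetical helper names, and copies only bit 11 from each central directory header into the matching local header. Headers are not covered by the entry CRC, so flipping this bit does not invalidate the stored checksums, but it does change how a reader interprets the name, so it is only reasonable when the names are pure ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTF8_BIT 0x0800u /* bit 11 of the general purpose bit flag */

uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: makes bit 11 of every local header match the
 * corresponding central directory header. Returns the number of local
 * headers changed, or -1 on an unrecognized layout (entries processed
 * before the failure stay modified). */
int sync_utf8_bit(uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len < 22) return -1;
    size_t eocd = len - 22;
    while (le32(buf + eocd) != 0x06054b50u) { /* find end of central dir */
        if (eocd == 0) return -1;
        --eocd;
    }
    unsigned total = le16(buf + eocd + 10);
    size_t pos = le32(buf + eocd + 16);
    int changed = 0;
    for (unsigned i = 0; i < total; ++i) {
        if (pos > len || len - pos < 46 || le32(buf + pos) != 0x02014b50u)
            return -1;
        size_t loc = le32(buf + pos + 42);    /* offset of local header */
        if (loc > len || len - loc < 30 || le32(buf + loc) != 0x04034b50u)
            return -1;
        uint16_t cflags = le16(buf + pos + 8); /* central flags, offset 8 */
        uint16_t lflags = le16(buf + loc + 6); /* local flags, offset 6 */
        if ((cflags ^ lflags) & UTF8_BIT) {
            lflags = (uint16_t)((lflags & ~UTF8_BIT) | (cflags & UTF8_BIT));
            buf[loc + 6] = (uint8_t)(lflags & 0xff);
            buf[loc + 7] = (uint8_t)(lflags >> 8);
            ++changed;
        }
        pos += 46u + le16(buf + pos + 28) + le16(buf + pos + 30) +
               le16(buf + pos + 32);
    }
    return changed;
}
```

A caller would run this on the in-memory archive after libzip has produced it and before writing it to disk. Whether the central or the local copy should be treated as authoritative is a design choice; the central directory is used here because that is the copy the older library was reported to rewrite.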

3 participants