-
-
Notifications
You must be signed in to change notification settings - Fork 595
Ensure fsencode/fsdecode works on ASCII FS. Use tarball for unicode test files #758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure fsencode/fsdecode works on ASCII FS. Use tarball for unicode test files #758
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #758 +/- ##
==========================================
+ Coverage 78.54% 78.9% +0.36%
==========================================
Files 91 91
Lines 11263 11484 +221
==========================================
+ Hits 8846 9061 +215
- Misses 2417 2423 +6
Continue to review full report at Codecov.
|
The Unicode fixture |
src/commoncode/fileutils.py
Outdated
FS_ENCODING = sys.getfilesystemencoding() or sys.getdefaultencoding() | ||
FS_ENCODING = sys.getfilesystemencoding() | ||
# normalize the encoding name | ||
FS_ENCODING = codecs.lookup(FS_ENCODING).name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still the correct thing to do? I though when using a version of backports.os
that has PiDelport/backports.os#8 merged, we don't need to do normalization anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FS_ENCODING
is now used only in command.py and libarchive2.py to encode ctypes-accessed DLLs/SOs paths ... It is also used implicit in magic2.py and should not be duplicated ... and to your point this is unlikely needed in this form now with @pjdelport fixes and should be replaced by using a proper fsencode/fsdecode in these two places instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this change is still present with backports.os 0.1.1 in use. Shouldn't your either revert this change, or use the proper fsencode/fsdecode that you mentioned?
thirdparty/prod/backports.os.ABOUT
Outdated
@@ -15,3 +15,6 @@ copyright: Copyright (c) Piët Delport, and Python Software Foundation | |||
notes: the invalid_utf8_indexes() helper is copied from Bob Ippolito's | |||
pyutf8 package (pyutf8/ref.py) and is MIT-licensed | |||
https://github.com/etrepum/pyutf8/blob/master/pyutf8/ref.py | |||
This verion is an advanced patched version fixing the #688 issue | |||
and contain the code in the contributed pull request at | |||
https://github.com/pjdelport/backports.os/pull/7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sschuberth yes, I will update this with the latest PiDelport/backports.os#8 that @pjdelport kindly published to Pypi too.
@sschuberth you wrote:
Good catch. Will fix |
@sschuberth If I check https://github.com/nexB/scancode-toolkit/tree/688-fsencode-fsdecode-755-unicode-test-files/tests/scancode/data And the archive seems to contains properly the needed files?
|
This branch now has the latest @pjdelport backports.os 1.1 ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I got confused about how GitHub displays the removal of empty files. Looking at https://github.com/nexB/scancode-toolkit/pull/758/files I see No changes.
being displayed for both tests/scancode/data/unicode_fixtures/empty.txt
and tests/scancode/data/unicode_fixtures/snow ☃/.gitkeep
, but you can "guess" from the red boxes that these empty files have in fact been removed. Sorry about that.
src/commoncode/fileutils.py
Outdated
FS_ENCODING = sys.getfilesystemencoding() or sys.getdefaultencoding() | ||
FS_ENCODING = sys.getfilesystemencoding() | ||
# normalize the encoding name | ||
FS_ENCODING = codecs.lookup(FS_ENCODING).name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this change is still present with backports.os 0.1.1 in use. Shouldn't your either revert this change, or use the proper fsencode/fsdecode that you mentioned?
But is |
When downloading https://github.com/nexB/scancode-toolkit/raw/0ae20e45a735b660c60e5c1b2b8db449d83fc261/tests/scancode/data/unicode_fixtures.tar.gz and opening it with WinRAR, I still only see a On Linux, it shows correctly, however, and also if I run |
FYI, the RAR devs already confirmed the bug in WinRAR 5.5 realted to Unicode chars in TAR files with PAX extended headers. It will be fixed in the next version. |
* see PiDelport/backports.os#7 for details Signed-off-by: Philippe Ombredanne <[email protected]>
* this ensures that the repo can be cloned on Windows Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
* this fixes the issues for ASCII fs encodings * big thank you to @pjdelport for fixing this Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
0ae20e4
to
9f5595d
Compare
@sschuberth good catch on Winrar ;) |
The code looks good to me now, but Windows CI still fails with
|
Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
All should be passing now: I am skipping some tests on Windows. Merging soon! |
Signed-off-by: Philippe Ombredanne <[email protected]>
@sschuberth you wrote
I can download and extract things alright on Windows eventually using 7zip or extractcode... but they will mangle the names such that they are valid windows names which defeats the purpose of the test: Windows cannot git clone this nor could I find a way to recreate some of the express test files either. |
For #688 this should solved the paths encoding issue even on Doker images without locale and an ASCII FS encoding
For #755 this ensures the repo can be cloned on Windows by using a tarball for test paths with weird encodings
Both as reported by @sschuberth : Thanks!