Ensure fsencode/fsdecode works on ASCII FS. Use tarball for unicode test files #758

pombredanne · 2017-09-19T23:05:24Z

For #688 this should solved the paths encoding issue even on Doker images without locale and an ASCII FS encoding

For #755 this ensures the repo can be cloned on Windows by using a tarball for test paths with weird encodings

Both as reported by @sschuberth : Thanks!

codecov · 2017-09-19T23:11:30Z

Codecov Report

Merging #758 into develop will increase coverage by 0.36%.
The diff coverage is 94.11%.

@@            Coverage Diff             @@
##           develop    #758      +/-   ##
==========================================
+ Coverage    78.54%   78.9%   +0.36%     
==========================================
  Files           91      91              
  Lines        11263   11484     +221     
==========================================
+ Hits          8846    9061     +215     
- Misses        2417    2423       +6

Impacted Files	Coverage Δ
src/commoncode/fileutils.py	`84.75% <100%> (+0.14%)`	⬆️
src/extractcode/libarchive2.py	`94.72% <100%> (+0.06%)`	⬆️
src/typecode/magic2.py	`92.45% <100%> (+0.21%)`	⬆️
src/commoncode/command.py	`83.33% <80%> (+0.4%)`	⬆️
src/commoncode/testcase.py	`83.59% <90%> (-0.01%)`	⬇️
src/typecode/contenttype.py	`85.51% <0%> (+0.35%)`	⬆️
src/scancode/cli.py	`93.82% <0%> (+0.36%)`	⬆️
src/scancode/cache.py	`89.67% <0%> (+1.36%)`	⬆️
src/scancode/api.py	`74.66% <0%> (+2%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 969d84d...4ee8053. Read the comment docs.

sschuberth · 2017-09-20T06:28:29Z

The Unicode fixture tests/scancode/data/unicode_fixtures/empty.txt is present in the TAR but has not been removed from Git, and the fixture tests/scancode/data/unicode_fixtures/snow ☃/.gitkeep is not present in the TAR (only a snow directory is) and it has also not been removed from Git.

sschuberth · 2017-09-20T06:22:44Z

src/commoncode/fileutils.py

-FS_ENCODING = sys.getfilesystemencoding() or sys.getdefaultencoding()
+FS_ENCODING = sys.getfilesystemencoding()
+# normalize the encoding name
+FS_ENCODING = codecs.lookup(FS_ENCODING).name


Is this still the correct thing to do? I though when using a version of backports.os that has PiDelport/backports.os#8 merged, we don't need to do normalization anymore?

FS_ENCODING is now used only in command.py and libarchive2.py to encode ctypes-accessed DLLs/SOs paths ... It is also used implicit in magic2.py and should not be duplicated ... and to your point this is unlikely needed in this form now with @pjdelport fixes and should be replaced by using a proper fsencode/fsdecode in these two places instead.

I see this change is still present with backports.os 0.1.1 in use. Shouldn't your either revert this change, or use the proper fsencode/fsdecode that you mentioned?

sschuberth · 2017-09-20T06:25:07Z

thirdparty/prod/backports.os.ABOUT

@@ -15,3 +15,6 @@ copyright: Copyright (c) Piët Delport, and Python Software Foundation
 notes: the invalid_utf8_indexes() helper is copied from Bob Ippolito's 
 pyutf8 package (pyutf8/ref.py) and is MIT-licensed
 https://github.com/etrepum/pyutf8/blob/master/pyutf8/ref.py
+ This verion is an advanced patched version fixing the #688 issue
+ and contain the code in the contributed pull request at 
+ https://github.com/pjdelport/backports.os/pull/7


Ah, I see. This custom version of backports.os still has your #7 merged, whereas we should now use a version that has #8 merged. I guess we can wait until #8 is merged upstream and then use a version straight from upstream.

@sschuberth yes, I will update this with the latest PiDelport/backports.os#8 that @pjdelport kindly published to Pypi too.

pombredanne · 2017-09-20T19:18:31Z

@sschuberth you wrote:

The Unicode fixture tests/scancode/data/unicode_fixtures/empty.txt is present in the TAR but has not been removed from Git, and the fixture tests/scancode/data/unicode_fixtures/snow ☃/.gitkeep is not present in the TAR (only a snow directory is) and it has also not been removed from Git.

Good catch. Will fix

pombredanne · 2017-09-20T23:56:05Z

@sschuberth If I check https://github.com/nexB/scancode-toolkit/tree/688-fsencode-fsdecode-755-unicode-test-files/tests/scancode/data unicode_fixtures/ is gone-gone as a dir?

And the archive seems to contains properly the needed files?

$ tar -tvf unicode_fixtures.tar.gz 
-rw-rw-r-- pombreda/pombreda 9 2017-09-17 21:18 unicode_fixtures/pets/names.txt
-rw-rw-r-- pombreda/pombreda 11 2017-09-17 21:18 unicode_fixtures/.hidden
-rw-rw-r-- pombreda/pombreda  0 2017-09-17 21:18 unicode_fixtures/empty.txt
-rw-rw-r-- pombreda/pombreda  9 2017-09-17 21:18 unicode_fixtures/nums
-rw-rw-r-- pombreda/pombreda  3 2017-09-17 21:18 unicode_fixtures/foo bar
drwxrwxr-x pombreda/pombreda  0 2017-09-17 21:18 unicode_fixtures/pets/
-rw-rw-r-- pombreda/pombreda 23 2017-09-17 21:18 unicode_fixtures/users/index.html
drwxrwxr-x pombreda/pombreda  0 2017-09-17 21:18 unicode_fixtures/
-rw-rw-r-- pombreda/pombreda 18 2017-09-17 21:18 unicode_fixtures/todo.html
-rw-rw-r-- pombreda/pombreda  0 2017-09-17 21:18 unicode_fixtures/snow ☃/.gitkeep
-rw-rw-r-- pombreda/pombreda  6 2017-09-17 21:18 unicode_fixtures/users/tobi.txt
drwxrwxr-x pombreda/pombreda  0 2017-09-17 21:18 unicode_fixtures/users/
-rw-rw-r-- pombreda/pombreda 11 2017-09-17 21:18 unicode_fixtures/todo.txt
drwxrwxr-x pombreda/pombreda  0 2017-09-17 21:18 unicode_fixtures/snow ☃/

pombredanne · 2017-09-20T23:57:12Z

This branch now has the latest @pjdelport backports.os 1.1 !

sschuberth

Looks like I got confused about how GitHub displays the removal of empty files. Looking at https://github.com/nexB/scancode-toolkit/pull/758/files I see No changes. being displayed for both tests/scancode/data/unicode_fixtures/empty.txt and tests/scancode/data/unicode_fixtures/snow ☃/.gitkeep, but you can "guess" from the red boxes that these empty files have in fact been removed. Sorry about that.

sschuberth · 2017-09-21T07:16:42Z

src/commoncode/fileutils.py

-FS_ENCODING = sys.getfilesystemencoding() or sys.getdefaultencoding()
+FS_ENCODING = sys.getfilesystemencoding()
+# normalize the encoding name
+FS_ENCODING = codecs.lookup(FS_ENCODING).name


I see this change is still present with backports.os 0.1.1 in use. Shouldn't your either revert this change, or use the proper fsencode/fsdecode that you mentioned?

sschuberth · 2017-09-21T07:22:49Z

But is thirdparty/prod/backports.os-0.1.patch1-py2-none-any.whl still needed?

sschuberth · 2017-09-21T07:29:31Z

When downloading https://github.com/nexB/scancode-toolkit/raw/0ae20e45a735b660c60e5c1b2b8db449d83fc261/tests/scancode/data/unicode_fixtures.tar.gz and opening it with WinRAR, I still only see a snow directory:

On Linux, it shows correctly, however, and also if I run tar -tvf from Git Bash. Very odd, I'll report this as a bug in WinRAR.

sschuberth · 2017-09-21T07:55:58Z

I'll report this as a bug in WinRAR.

FYI, the RAR devs already confirmed the bug in WinRAR 5.5 realted to Unicode chars in TAR files with PAX extended headers. It will be fixed in the next version.

* see PiDelport/backports.os#7 for details Signed-off-by: Philippe Ombredanne <[email protected]>

* this ensures that the repo can be cloned on Windows Signed-off-by: Philippe Ombredanne <[email protected]>

Signed-off-by: Philippe Ombredanne <[email protected]>

@pjdelport

* this fixes the issues for ASCII fs encodings * big thank you to @pjdelport for fixing this Signed-off-by: Philippe Ombredanne <[email protected]>

Signed-off-by: Philippe Ombredanne <[email protected]>

pombredanne · 2017-09-22T05:39:16Z

@sschuberth good catch on Winrar ;)
I do not use it though.
This latest commit should be all set now.

sschuberth · 2017-09-22T06:57:24Z

The code looks good to me now, but Windows CI still fails with

WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'c:\\users\\appveyor\\appdata\\local\\temp\\1\\scancode_appveyor\\tst\\ rbabmi\\dqxinc\\unicode_fixtures.tar.gz\\unicode_fixtures\\snow ?'

Signed-off-by: Philippe Ombredanne <[email protected]>

pombredanne · 2017-09-22T19:18:53Z

All should be passing now: I am skipping some tests on Windows. Merging soon!

Signed-off-by: Philippe Ombredanne <[email protected]>

pombredanne · 2017-10-02T21:48:32Z

@sschuberth you wrote

But I doubt that's true. I can download and extract https://raw.githubusercontent.com/nexB/scancode-toolkit/d5f3116f0ef8b21c81e60a0a7c18eb46bb257630/tests/scancode/data/unicode_fixtures.tar.gz on Windows just fine. It's probably just a matter of doing it right programmatically.

I can download and extract things alright on Windows eventually using 7zip or extractcode... but they will mangle the names such that they are valid windows names which defeats the purpose of the test: Windows cannot git clone this nor could I find a way to recreate some of the express test files either.
So I have tried many ways but there is no way some tests run on Windows.... so I am going to merge this as it is now.

pombredanne requested a review from sschuberth September 19, 2017 23:05

sschuberth changed the title ~~Ensure fsencode/fsdecode works on ASCII FS. Use tarbell for unicode test files~~ Ensure fsencode/fsdecode works on ASCII FS. Use tarball for unicode test files Sep 20, 2017

sschuberth requested changes Sep 20, 2017

View reviewed changes

sschuberth reviewed Sep 21, 2017

View reviewed changes

pombredanne added 5 commits September 21, 2017 22:30

Update backports.os for ASCII FS encoding

cd9befe

* see PiDelport/backports.os#7 for details Signed-off-by: Philippe Ombredanne <[email protected]>

Use archive for weird test paths for #755

d5f3116

* this ensures that the repo can be cloned on Windows Signed-off-by: Philippe Ombredanne <[email protected]>

Ensure fs encoding is always normalized #688

ab5f0a3

Signed-off-by: Philippe Ombredanne <[email protected]>

Bump backports.os to latest version #688

05f3190

* this fixes the issues for ASCII fs encodings * big thank you to @pjdelport for fixing this Signed-off-by: Philippe Ombredanne <[email protected]>

Use backports.os for fsencode/decode everywhere #688

9f5595d

Signed-off-by: Philippe Ombredanne <[email protected]>

pombredanne force-pushed the 688-fsencode-fsdecode-755-unicode-test-files branch from 0ae20e4 to 9f5595d Compare September 22, 2017 05:30

sschuberth approved these changes Sep 22, 2017

View reviewed changes

pombredanne added 2 commits September 22, 2017 08:21

Ensure tarball with unicode path extracts on Windows #755

2e4e345

Signed-off-by: Philippe Ombredanne <[email protected]>

Skip some unicode-related tests on Windows #755

0ed7c87

Signed-off-by: Philippe Ombredanne <[email protected]>

Attempt to extract tar on Mac as bytes #688

4ee8053

Signed-off-by: Philippe Ombredanne <[email protected]>

pombredanne merged commit 6b93456 into develop Oct 2, 2017

pombredanne deleted the 688-fsencode-fsdecode-755-unicode-test-files branch October 2, 2017 21:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure fsencode/fsdecode works on ASCII FS. Use tarball for unicode test files #758

Ensure fsencode/fsdecode works on ASCII FS. Use tarball for unicode test files #758

pombredanne commented Sep 19, 2017

codecov bot commented Sep 19, 2017 •

edited

Loading

sschuberth commented Sep 20, 2017

sschuberth Sep 20, 2017

pombredanne Sep 20, 2017

sschuberth Sep 21, 2017

sschuberth Sep 20, 2017

pombredanne Sep 20, 2017

pombredanne commented Sep 20, 2017

pombredanne commented Sep 20, 2017

pombredanne commented Sep 20, 2017

sschuberth left a comment

sschuberth Sep 21, 2017

sschuberth commented Sep 21, 2017

sschuberth commented Sep 21, 2017

sschuberth commented Sep 21, 2017

pombredanne commented Sep 22, 2017

sschuberth commented Sep 22, 2017

pombredanne commented Sep 22, 2017 •

edited

Loading

pombredanne commented Oct 2, 2017

Ensure fsencode/fsdecode works on ASCII FS. Use tarball for unicode test files #758

Ensure fsencode/fsdecode works on ASCII FS. Use tarball for unicode test files #758

Conversation

pombredanne commented Sep 19, 2017

codecov bot commented Sep 19, 2017 • edited Loading

Codecov Report

sschuberth commented Sep 20, 2017

sschuberth Sep 20, 2017

Choose a reason for hiding this comment

pombredanne Sep 20, 2017

Choose a reason for hiding this comment

sschuberth Sep 21, 2017

Choose a reason for hiding this comment

sschuberth Sep 20, 2017

Choose a reason for hiding this comment

pombredanne Sep 20, 2017

Choose a reason for hiding this comment

pombredanne commented Sep 20, 2017

pombredanne commented Sep 20, 2017

pombredanne commented Sep 20, 2017

sschuberth left a comment

Choose a reason for hiding this comment

sschuberth Sep 21, 2017

Choose a reason for hiding this comment

sschuberth commented Sep 21, 2017

sschuberth commented Sep 21, 2017

sschuberth commented Sep 21, 2017

pombredanne commented Sep 22, 2017

sschuberth commented Sep 22, 2017

pombredanne commented Sep 22, 2017 • edited Loading

pombredanne commented Oct 2, 2017

codecov bot commented Sep 19, 2017 •

edited

Loading

pombredanne commented Sep 22, 2017 •

edited

Loading