fix: use tarfile extract filters to open tarfiles more safely #3769
Conversation
Signed-off-by: Terri Oda <[email protected]>
Also changed pre-commit config so interrogate is at the top and its output doesn't obscure more urgent error messages. Signed-off-by: Terri Oda <[email protected]>
Codecov Report

@@            Coverage Diff             @@
##             main    #3769      +/-   ##
==========================================
+ Coverage   78.04%   80.83%   +2.78%
==========================================
  Files         803      808       +5
  Lines       11810    11994     +184
  Branches     1365     1602     +237
==========================================
+ Hits         9217     9695     +478
+ Misses       2158     1878     -280
+ Partials      435      421      -14

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
In addition to detecting symbolic links, it would be good to block character and block devices by including a member.isdev() check in the set of files which are not extracted.
@anthonyharrison thanks, that's a good point. After looking it up, I realized I should just be using member.isfile() to avoid them all, so I switched to that. Still tweaking the startswith check so it works on all OSes we support.
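A rough sketch of the kind of member check being discussed (the function and file names here are illustrative, not the PR's actual code): keep only regular-file members, and only paths that resolve inside the extraction directory. os.path.commonpath avoids the "/a/b" vs "/a/bc" prefix pitfall that a raw startswith comparison has.

    import os
    import tarfile

    def safe_members(tar, extraction_path):
        """Yield only regular-file members that extract inside extraction_path."""
        base = os.path.realpath(extraction_path)
        for member in tar.getmembers():
            if not member.isfile():
                # skips symlinks, hard links, char/block devices and FIFOs;
                # tarfile still creates missing parent directories for files
                continue
            target = os.path.realpath(os.path.join(extraction_path, member.name))
            try:
                inside = os.path.commonpath([base, target]) == base
            except ValueError:
                # e.g. different drives on Windows
                inside = False
            if not inside:
                # skips absolute paths and ../ traversal
                continue
            yield member

    with tarfile.open("example.tar.gz") as tar:
        tar.extractall(path="extracted", members=list(safe_members(tar, "extracted")))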
cve_bin_tool/extractor.py
Outdated
# Python 3.12 has a data filter we can use in extract
# tarfile has this available in older versions as well
if hasattr(tarfile, "data_filter"):
    await aio_unpack_archive(filename, extraction_path, filter="data")
Why don't you just use something like tarfile.extractall instead of asynchronously unpacking and then waiting for the operation to finish?
Originally this was so multiple files could be unpacked in parallel, which helped to speed up certain types of scans when they're not disk I/O bound.
But now that you mention it, I'm worried that the filters getting passed through unpack_archive might be a problem on older versions of Python, and tarfile.extract might be better for backwards compatibility. I couldn't find evidence either way, so I'll probably change this.
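One possible compromise, sketched with assumed names (this is not the PR's code): call tarfile.extractall(filter="data") directly, so the backported data filter is used even on interpreters where shutil.unpack_archive may not accept a filter argument, and run the blocking extraction in a thread so several archives can still be unpacked concurrently.

    import asyncio
    import tarfile

    def _extract_one(filename, extraction_path):
        with tarfile.open(filename) as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(path=extraction_path, filter="data")
            else:
                # interpreters without the backport would still need a manual
                # member check here (see the isfile() discussion above)
                tar.extractall(path=extraction_path)  # nosec

    async def extract_tar(filename, extraction_path):
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, _extract_one, filename, extraction_path)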
(I say "probably" because if changing it makes all our tests super slow I might need to rethink. But I will make the change and see what happens.)
What has happened so far: bandit apparently doesn't know about filter='data', so we got a security error about unsafe use of tarfile. 😆 (I checked; they have a bug filed about it.)
My backported filter isn't working on Windows; this is a temporary measure so we can at least merge the Linux fix while working on Windows. Signed-off-by: Terri Oda <[email protected]>
cve_bin_tool/extractor.py
Outdated
# FIXME: the backported fix is not working on windows.
# this leaves the current (unsafe) behaviour so we can fix at least one OS for now
if sys.platform == "win32":
I should probably read the spec, but this code opens up the possibility of tarfile having the data_filter attribute on a Windows system, in which case you would extract the file twice, right? Maybe the right way to do this is by returning above, right after tar.extractall, with return e.exit_code?
I think this just needs to be an elif
(I remember thinking that but apparently didn't actually write it... le sigh.)
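A minimal sketch of the suggested control flow (variable names taken from the diff above, the rest assumed): each archive goes down exactly one branch, so nothing gets extracted twice.

    import sys
    import tarfile

    def extract_tarball(filename, extraction_path):
        with tarfile.open(filename) as tar:
            if hasattr(tarfile, "data_filter"):
                # Python 3.12+ or a backported tarfile: use the data filter
                tar.extractall(path=extraction_path, filter="data")
            elif sys.platform == "win32":
                # FIXME: backported manual filter not yet working on Windows,
                # so the old behaviour stays here for now
                tar.extractall(path=extraction_path)  # nosec
            else:
                # manual fallback: regular files only
                tar.extractall(
                    path=extraction_path,
                    members=[m for m in tar.getmembers() if m.isfile()],
                )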
cve_bin_tool/extractor.py
Outdated
# this leaves the current (unsafe) behaviour so we can fix at least one OS for now
if sys.platform == "win32":
    tar.extractall(path=extraction_path)  # nosec
    tar.close()
There's a call to tar.close after the next 'else' section that will be called. With this one here, tar.close gets called twice.
Thank you. Absolutely non-sarcastically, this is why I love code review. I'd originally had the close in multiple spots, thought "oh, I should move it below", and then apparently didn't move them all. 🤦 Totally the sort of thing I miss when reading my own code.
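A small sketch of the single-close pattern (the function name here is made up): opening the tarfile with a context manager means close() runs exactly once, on every branch and even on exceptions, so no per-branch tar.close() calls are needed.

    import tarfile

    def extract_all(filename, extraction_path):
        with tarfile.open(filename) as tar:
            tar.extractall(path=extraction_path)  # nosec - see filter discussion above
        # tar is closed here, whichever path execution took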
fix: use tarfile extract filters to open tarfiles more safely (#3769)
Also changed pre-commit config so interrogate is at the top and its output doesn't obscure more urgent error messages. Note that this is still missing a test and appropriate Windows behaviour; those will be coming in future PRs. Signed-off-by: Terri Oda <[email protected]>
Had an external report that we weren't sufficiently careful when opening tarfiles because we're using Python's (insecure) default behaviour here. That default is changed in Python 3.12, but since we support 3.8-3.11 as well, we need a fix.
This uses the file filters if available, as recommended by the tarfile docs (it's not quite a from __future__ import, but a similar idea, I think), and does a simple "skip symlinks and absolute paths" check if filters are not available.
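For reference, the version-compatibility pattern the tarfile documentation suggests looks roughly like the sketch below (paraphrased; the helper name and the warnings call are mine, and this PR's fallback additionally skips unsafe members rather than only warning).

    import tarfile
    import warnings

    def extract_with_best_available_filter(tar, path):
        if hasattr(tarfile, "data_filter"):
            # 3.12+, or an older release with the backported extraction filters
            tar.extractall(path=path, filter="data")
        else:
            warnings.warn("Extracting may be unsafe; consider updating Python")
            tar.extractall(path=path)  # nosec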
Edit: This is still missing a test and appropriate Windows behaviour.
I'm hoping we can merge what I have so we have mitigation on Linux in main, but those will definitely be coming in future PRs. (The test would have been in this one, but I'm feeling unwell, so I'm sending it for code review while I won't be touching it for a bit.)