Skip to content

fix: use tarfile extract filters to open tarfiles more safely #3769

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Feb 12, 2024

Conversation

terriko
Copy link
Contributor

@terriko terriko commented Jan 31, 2024

Had an external report that we weren't sufficiently careful when opening tarfiles because we're using Python's (insecure) default behaviour here. That default is changed in python 3.12 but since we support 3.8-3.11 as well then we need a fix.

This uses the file filters if available as recommended by the tarfile docs (it's not quite a from __future__ but similar idea I think) and does a simple "skip symlinks and absolute paths" check if filters are not available.

Edit:
This is still missing

  • test
  • a fix that works in windows

I'm hoping we can merge what I have so we have mitigation on Linux in main, but those will definitely be coming in future PRs. (The test would have been in this one but I'm feeling unwell so I'm sending it for code review while I won't be touching it for a bit.)

Also changed pre-commit config so interrogate is at the top and its output
doesn't obscure more urgent error messages.

Signed-off-by: Terri Oda <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Jan 31, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (70b9ef9) 78.04% compared to head (0e91188) 80.83%.
Report is 32 commits behind head on main.

Files Patch % Lines
cve_bin_tool/extractor.py 64.28% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3769      +/-   ##
==========================================
+ Coverage   78.04%   80.83%   +2.78%     
==========================================
  Files         803      808       +5     
  Lines       11810    11994     +184     
  Branches     1365     1602     +237     
==========================================
+ Hits         9217     9695     +478     
+ Misses       2158     1878     -280     
+ Partials      435      421      -14     
Flag Coverage Δ
longtests 80.22% <50.00%> (?)
win-longtests 78.49% <50.00%> (+0.44%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@terriko terriko added this to the 3.3 milestone Feb 1, 2024
Copy link
Contributor

@anthonyharrison anthonyharrison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to detecting symbolic links it would be good to block character and block devices . by including a member.isdev() in the set of files which are not extracted.

@terriko
Copy link
Contributor Author

terriko commented Feb 1, 2024

@anthonyharrison thanks, that's a good point. After looking it up, I realized I should just be using member.isfile() to avoid them all, so I switched to that.

Still tweaking the startsWith check so it works on all OSes we support.

@antoniogi antoniogi self-requested a review February 5, 2024 19:10
# Python 3.12 has a data filter we can use in extract
# tarfile has this available in older versions as well
if hasattr(tarfile, "data_filter"):
await aio_unpack_archive(filename, extraction_path, filter="data")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you just use something like tarfile.extract_all instead of asynchronously unpacking and then waiting for the operation to finish?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally this was so multiple files could be unpacked in parallel., which helped to speed up certain types of scan if they're not disk i/o bound.

But now that you mention it, I"m worried that the filters getting passed through unpack_archive might be a problem on older versions of python and tarfile.extract might be better for backwards compatibility. I couldn't find evidence either way, so I'll probably change this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I say "probably" because if changing it makes all our tests super slow I might need to rethink. But I will make the change and see what happens.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What has happened so far: bandit apparently doesn't know about filter='data' so we got a security error about unsafe use of tarfile. 😆 (I checked, they have a bug filed about it)

@terriko terriko marked this pull request as ready for review February 7, 2024 18:38

# FIXME: the backported fix is not working on windows.
# this leaves the current (unsafe) behaviour so we can fix at least one OS for now
if sys.platform == "win32":

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably read the spec, but this code opens the possibility to tarfile having the data_filter attribute and being a windows system, in which case you would extract the file twice, right? Maybe the right way to do this is by returning above, right after tar.extractall with return e.exit_code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this just needs to be an elif (I remember thinking that but apparently didn't actually write it... le sigh.)

# this leaves the current (unsafe) behaviour so we can fix at least one OS for now
if sys.platform == "win32":
tar.extractall(path=extraction_path) # nosec
tar.close()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a call to tar.close after the next 'else' section that will be called. With this one here, targ.close gets called twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you. Absolutely non-sarcastically, this is why I lovecode review. I'd originally had the close in multiple spots and though "oh I should move it below" and then I apparently didn't move them all. 🤦 Totally the sort of thing I miss when reading my own code.

@terriko terriko merged commit b4feb03 into intel:main Feb 12, 2024
inosmeet pushed a commit to inosmeet/cve-bin-tool that referenced this pull request Feb 16, 2024
…3769)

Also changed pre-commit config so interrogate is at the top and its output
doesn't obscure more urgent error messages.

Note that this is still missing a test and appropriate windows behaviour; those will be coming in future PRs.

Signed-off-by: Terri Oda <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants