fix: use tarfile extract filters to open tarfiles more safely #3769
Conversation
Signed-off-by: Terri Oda <[email protected]>
Also changed pre-commit config so interrogate is at the top and its output doesn't obscure more urgent error messages. Signed-off-by: Terri Oda <[email protected]>
Codecov Report

@@            Coverage Diff             @@
##             main    #3769      +/-   ##
==========================================
+ Coverage   78.04%   80.83%   +2.78%
==========================================
  Files         803      808       +5
  Lines       11810    11994     +184
  Branches     1365     1602     +237
==========================================
+ Hits         9217     9695     +478
+ Misses       2158     1878     -280
+ Partials      435      421      -14

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
In addition to detecting symbolic links, it would be good to block character and block devices by including a member.isdev() check in the set of files which are not extracted.
@anthonyharrison thanks, that's a good point. After looking it up, I realized I should just be using member.isfile() to avoid them all, so I switched to that. Still tweaking the startswith check so it works on all OSes we support.
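A rough sketch of the kind of member check being discussed (the function and file names here are illustrative, not the PR's actual code): keep only regular-file members, and only paths that resolve inside the extraction directory. os.path.commonpath avoids the "/a/b" vs "/a/bc" prefix pitfall that a raw startswith comparison has.

    import os
    import tarfile

    def safe_members(tar, extraction_path):
        """Yield only regular-file members that extract inside extraction_path."""
        base = os.path.realpath(extraction_path)
        for member in tar.getmembers():
            if not member.isfile():
                # skips symlinks, hard links, char/block devices and FIFOs;
                # tarfile still creates missing parent directories for files
                continue
            target = os.path.realpath(os.path.join(extraction_path, member.name))
            try:
                inside = os.path.commonpath([base, target]) == base
            except ValueError:
                # e.g. different drives on Windows
                inside = False
            if not inside:
                # skips absolute paths and ../ traversal
                continue
            yield member

    with tarfile.open("example.tar.gz") as tar:
        tar.extractall(path="extracted", members=list(safe_members(tar, "extracted")))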
cve_bin_tool/extractor.py
Outdated
# Python 3.12 has a data filter we can use in extract
# tarfile has this available in older versions as well
if hasattr(tarfile, "data_filter"):
    await aio_unpack_archive(filename, extraction_path, filter="data")
Why don't you just use something like tarfile.extractall instead of asynchronously unpacking and then waiting for the operation to finish?
Originally this was so multiple files could be unpacked in parallel, which helped to speed up certain types of scans when they're not disk I/O bound.
But now that you mention it, I'm worried that the filters getting passed through unpack_archive might be a problem on older versions of Python, and tarfile.extract might be better for backwards compatibility. I couldn't find evidence either way, so I'll probably change this.
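One possible compromise, sketched with assumed names (this is not the PR's code): call tarfile.extractall(filter="data") directly, so the backported data filter is used even on interpreters where shutil.unpack_archive may not accept a filter argument, and run the blocking extraction in a thread so several archives can still be unpacked concurrently.

    import asyncio
    import tarfile

    def _extract_one(filename, extraction_path):
        with tarfile.open(filename) as tar:
            if hasattr(tarfile, "data_filter"):
                tar.extractall(path=extraction_path, filter="data")
            else:
                # interpreters without the backport would still need a manual
                # member check here (see the isfile() discussion above)
                tar.extractall(path=extraction_path)  # nosec

    async def extract_tar(filename, extraction_path):
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, _extract_one, filename, extraction_path)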
(I say "probably" because if changing it makes all our tests super slow I might need to rethink. But I will make the change and see what happens.)
What has happened so far: bandit apparently doesn't know about filter='data', so we got a security error about unsafe use of tarfile. 😆 (I checked; they have a bug filed about it.)
My backported filter isn't working on Windows; this is a temporary measure so we can at least merge the Linux fix while working on Windows. Signed-off-by: Terri Oda <[email protected]>
cve_bin_tool/extractor.py
Outdated
# FIXME: the backported fix is not working on windows.
# this leaves the current (unsafe) behaviour so we can fix at least one OS for now
if sys.platform == "win32":
I should probably read the spec, but this code opens up the possibility of tarfile having the data_filter attribute on a Windows system, in which case you would extract the file twice, right? Maybe the right way to do this is by returning above, right after tar.extractall, with return e.exit_code?
I think this just needs to be an elif
(I remember thinking that but apparently didn't actually write it... le sigh.)
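A minimal sketch of the suggested control flow (variable names taken from the diff above, the rest assumed): each archive goes down exactly one branch, so nothing gets extracted twice.

    import sys
    import tarfile

    def extract_tarball(filename, extraction_path):
        with tarfile.open(filename) as tar:
            if hasattr(tarfile, "data_filter"):
                # Python 3.12+ or a backported tarfile: use the data filter
                tar.extractall(path=extraction_path, filter="data")
            elif sys.platform == "win32":
                # FIXME: backported manual filter not yet working on Windows,
                # so the old behaviour stays here for now
                tar.extractall(path=extraction_path)  # nosec
            else:
                # manual fallback: regular files only
                tar.extractall(
                    path=extraction_path,
                    members=[m for m in tar.getmembers() if m.isfile()],
                )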
cve_bin_tool/extractor.py
Outdated
# this leaves the current (unsafe) behaviour so we can fix at least one OS for now
if sys.platform == "win32":
    tar.extractall(path=extraction_path)  # nosec
    tar.close()
There's a call to tar.close after the next 'else' section that will be called. With this one here, tar.close gets called twice.
Thank you. Absolutely non-sarcastically, this is why I love code review. I'd originally had the close in multiple spots, thought "oh, I should move it below", and then apparently didn't move them all. 🤦 Totally the sort of thing I miss when reading my own code.
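A small sketch of the single-close pattern (the function name here is made up): opening the tarfile with a context manager means close() runs exactly once, on every branch and even on exceptions, so no per-branch tar.close() calls are needed.

    import tarfile

    def extract_all(filename, extraction_path):
        with tarfile.open(filename) as tar:
            tar.extractall(path=extraction_path)  # nosec - see filter discussion above
        # tar is closed here, whichever path execution took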
fix: use tarfile extract filters to open tarfiles more safely (#3769)
Also changed pre-commit config so interrogate is at the top and its output doesn't obscure more urgent error messages. Note that this is still missing a test and appropriate Windows behaviour; those will be coming in future PRs. Signed-off-by: Terri Oda <[email protected]>
Had an external report that we weren't sufficiently careful when opening tarfiles because we're using Python's (insecure) default behaviour here. That default is changed in Python 3.12, but since we support 3.8-3.11 as well, we need a fix.
This uses the file filters if available, as recommended by the tarfile docs (it's not quite a from __future__ import, but a similar idea, I think), and does a simple "skip symlinks and absolute paths" check if filters are not available.
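For reference, the version-compatibility pattern the tarfile documentation suggests looks roughly like the sketch below (paraphrased; the helper name and the warnings call are mine, and this PR's fallback additionally skips unsafe members rather than only warning).

    import tarfile
    import warnings

    def extract_with_best_available_filter(tar, path):
        if hasattr(tarfile, "data_filter"):
            # 3.12+, or an older release with the backported extraction filters
            tar.extractall(path=path, filter="data")
        else:
            warnings.warn("Extracting may be unsafe; consider updating Python")
            tar.extractall(path=path)  # nosec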
Edit: This is still missing a test and appropriate Windows behaviour.
I'm hoping we can merge what I have so we have mitigation on Linux in main, but those will definitely be coming in future PRs. (The test would have been in this one, but I'm feeling unwell, so I'm sending it for code review while I won't be touching it for a bit.)