-
Notifications
You must be signed in to change notification settings - Fork 544
fix: improve robustness of version compare #3694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #3694 +/- ##
==========================================
- Coverage 80.62% 80.15% -0.48%
==========================================
Files 805 805
Lines 11985 11978 -7
Branches 1605 1598 -7
==========================================
- Hits 9663 9601 -62
- Misses 1876 1950 +74
+ Partials 446 427 -19
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Note to self, I realized over the weekend that I missed a case, so this need revision, as well as figuring out what to do with interrogate. |
In case anyone's wondering about interrogate: We needed to switch to using pre-commit's path exclusion function rather than interrogate's so that both our use cases worked as I expected. |
This PR is fixing the issue with |
cve_bin_tool/version_compare.py
Outdated
# It's not a "pure" alpha or number string, it's not something like rc12 or 44g | ||
# if anything looks like a hash, we'll replace it with the word HASH and | ||
# treat it like a pre-release in comparisons | ||
versionString = re.sub("[a-fA-F0-9]{8,}", "HASH", versionString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so difficult... This now means that if somebody decides to create a version string with DDMMYYYY at the beginning, that's a hash. Is that a problem? Or, could that be a problem? Would it be worth it to go up to 12-14 characters instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that's a good point. So far I've seen dates only with delimiters, but that's in no way a guarantee that this is how they look across all the data. Time to go grump my way through the entire database for a bit, I think...
cve_bin_tool/version_compare.py
Outdated
"""Check to see if a version looks like it contains a hash | ||
to avoid nonsense comparisons. This likely needs improvement.""" | ||
|
||
if re.match("[a-fA-F0-9]{8,}", self) is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as the other one regarding the length of the hash
So I'm thinking I'm going to take the hash detection out entirely, and the data sort of supports this. I wrote a lazy script so I could regex through sqlite: import re
import sqlite3
dbcon = sqlite3.connect("/home/terri/.cache/cve-bin-tool/cve.db")
dbcon.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
cursor = dbcon.cursor()
cursor.execute("select product, versionStartIncluding from cve_range where versionStartIncluding REGEXP '[0-9a-fA-F]{8}'")
results = cursor.fetchall()
for i in results:
print(i) Note that this is just versionStartIncluding. But it turns out it returns so few results that I can just paste them here: (venv-3.10) [terri@cedar cve-bin-tool]$ python sqlite-experiments.py
('emc_unity_operating_environment', '4.3.0.1522077968')
('emc_unityvsa_operating_environment', '4.3.0.1522077968')
('grsecurity', '3.1-4.9.8-201702060653')
('ikiwiki', '3.20190207')
('guest-oslogin', '20190304.00')
('guest-oslogin', '20190304.00')
('guest-oslogin', '20190304.00')
('azure_ad', '164.v5b48baa961d2')
('talkyard', 'tyse-v0.2021.02-879ef3fe1-regular')
('talkyard', '0.2020.22-wip-b2e97fe0e')
('neomutt', '20191025')
('spam_engine', '8.12.0-2104190000')
('laserjet_pro_j8h60a_firmware', '20210810')
('laserjet_pro_j8h61a_firmware', '20210810')
('laserjet_pro_j8h61a_firmware', '20210810')
('laserjet_pro_j8h60a_firmware', '20210810')
('enterprise_protection', '8.12.0-2107140000')
('coslat_bx5s1d3_firmware', '5.24.0.r.20180630')
('coslat_bx5s1d4_firmware', '5.24.0.r.20180630')
('coslat_bx5s1d5_firmware', '5.24.0.r.20180630')
('coslat_rm1ds1000_firmware', '5.24.0.r.20180630')
('coslat_rm2ds2000_firmware', '5.24.0.r.20180630')
('coslat_rm2s200_firmware', '5.24.0.r.20180630')
('coslat_rm3s300_firmware', '5.24.0.r.20180630')
('coslat_rm4s500_firmware', '5.24.0.r.20180630')
('raspberrymatic', '2.31.25.20180428')
('credentials', '1055.v1346ba467ba1')
('pipeline\\', '544.vff04fa68714d')
('fedora_coreos', '36.20220820.3.0')
('nest_hub_max_firmware', '10.20221207.2.109')
('nest_hub_firmware', '10.20221207.2.100038')
('dronescout_ds230_firmware', '20211210-1627')
('gin', '1.3.1-0.20190301021747-ccb9e902956d')
('gotham-fe-bundle', '100.30230706.0')
('gotham-fe-bundle', '100.30230702.0')
('geforce_now', '6.00.32705137')
('dronescout_ds230_firmware', '20211210-1627')
('dronescout_ds230_firmware', '20211210-1627')
('azure_ad', '378.vd6e2874a_69eb')
('miniflare', '3.20230821.0') There's a couple in there that look like hashes but we're talking a handful across all of NVD. There are more cases which look like dates (and most of those seem to be firmware, interestingly. Culture difference in action.). I extended it out for the other So, I'm going to remove the hash detection entirely and leave a shortened version of this analysis in a comment for now. |
Signed-off-by: Terri Oda <[email protected]>
Signed-off-by: Terri Oda <[email protected]>
Signed-off-by: Terri Oda <[email protected]>
* fix: improve version compare robustness * fix: add missing docstrings * fix: improve prerelease handling * fix: remove hash detection, add tests & experiment script --------- Signed-off-by: Terri Oda <[email protected]>
* fix: improve version compare robustness * fix: add missing docstrings * fix: improve prerelease handling * fix: remove hash detection, add tests & experiment script --------- Signed-off-by: Terri Oda <[email protected]>
As described in #3657, this switches us to splitting version strings up a bit differently so all numbers and letters are handled as separate entities
So for example,
1apple23banana456carrot
would be parsed out as1.apple.23.banana.456.carrot
We've got some very rudimentary git hash detection in there right now to avoid total nonsense compares, but I suspect we'll need to handle other types of hash correctly as well.
Also, I hit an issue with the new interrogate setup and had to modify the .pre-commit.config file or it wouldn't let me commit anything, so... uh, that's definitely going to be an issue.