Skip to content

fix: improve robustness of version compare #3694

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 22, 2024
Merged

Conversation

terriko
Copy link
Contributor

@terriko terriko commented Jan 5, 2024

As described in #3657, this switches us to splitting version strings up a bit differently so all numbers and letters are handled as separate entities

So for example, 1apple23banana456carrot would be parsed out as 1.apple.23.banana.456.carrot

We've got some very rudimentary git hash detection in there right now to avoid total nonsense compares, but I suspect we'll need to handle other types of hash correctly as well.

Also, I hit an issue with the new interrogate setup and had to modify the .pre-commit.config file or it wouldn't let me commit anything, so... uh, that's definitely going to be an issue.

@terriko terriko linked an issue Jan 5, 2024 that may be closed by this pull request
@codecov-commenter
Copy link

codecov-commenter commented Jan 5, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (666a3f8) 80.62% compared to head (7d82d3b) 80.15%.

Files Patch % Lines
cve_bin_tool/version_compare.py 84.61% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3694      +/-   ##
==========================================
- Coverage   80.62%   80.15%   -0.48%     
==========================================
  Files         805      805              
  Lines       11985    11978       -7     
  Branches     1605     1598       -7     
==========================================
- Hits         9663     9601      -62     
- Misses       1876     1950      +74     
+ Partials      446      427      -19     
Flag Coverage Δ
longtests 75.09% <92.30%> (+<0.01%) ⬆️
win-longtests 78.17% <92.30%> (-0.48%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@terriko
Copy link
Contributor Author

terriko commented Jan 8, 2024

Note to self, I realized over the weekend that I missed a case, so this need revision, as well as figuring out what to do with interrogate.

@terriko
Copy link
Contributor Author

terriko commented Jan 9, 2024

In case anyone's wondering about interrogate: We needed to switch to using pre-commit's path exclusion function rather than interrogate's so that both our use cases worked as I expected.

@ffontaine
Copy link
Contributor

This PR is fixing the issue with test/condensed-downloads/apparmor_2.9.0-3_amd64.deb.tar.gz

@antoniogi antoniogi self-requested a review January 11, 2024 22:05
# It's not a "pure" alpha or number string, it's not something like rc12 or 44g
# if anything looks like a hash, we'll replace it with the word HASH and
# treat it like a pre-release in comparisons
versionString = re.sub("[a-fA-F0-9]{8,}", "HASH", versionString)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so difficult... This now means that if somebody decides to create a version string with DDMMYYYY at the beginning, that's a hash. Is that a problem? Or, could that be a problem? Would it be worth it to go up to 12-14 characters instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's a good point. So far I've seen dates only with delimiters, but that's in no way a guarantee that this is how they look across all the data. Time to go grump my way through the entire database for a bit, I think...

"""Check to see if a version looks like it contains a hash
to avoid nonsense comparisons. This likely needs improvement."""

if re.match("[a-fA-F0-9]{8,}", self) is None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as the other one regarding the length of the hash

@terriko
Copy link
Contributor Author

terriko commented Jan 18, 2024

So I'm thinking I'm going to take the hash detection out entirely, and the data sort of supports this.

I wrote a lazy script so I could regex through sqlite:

import re
import sqlite3

dbcon = sqlite3.connect("/home/terri/.cache/cve-bin-tool/cve.db")
dbcon.create_function('regexp', 2, lambda x, y: 1 if re.search(x,y) else 0)
cursor = dbcon.cursor()

cursor.execute("select product, versionStartIncluding from cve_range where versionStartIncluding REGEXP '[0-9a-fA-F]{8}'")

results = cursor.fetchall()

for i in results:
    print(i)

Note that this is just versionStartIncluding. But it turns out it returns so few results that I can just paste them here:

(venv-3.10) [terri@cedar cve-bin-tool]$ python sqlite-experiments.py 
('emc_unity_operating_environment', '4.3.0.1522077968')
('emc_unityvsa_operating_environment', '4.3.0.1522077968')
('grsecurity', '3.1-4.9.8-201702060653')
('ikiwiki', '3.20190207')
('guest-oslogin', '20190304.00')
('guest-oslogin', '20190304.00')
('guest-oslogin', '20190304.00')
('azure_ad', '164.v5b48baa961d2')
('talkyard', 'tyse-v0.2021.02-879ef3fe1-regular')
('talkyard', '0.2020.22-wip-b2e97fe0e')
('neomutt', '20191025')
('spam_engine', '8.12.0-2104190000')
('laserjet_pro_j8h60a_firmware', '20210810')
('laserjet_pro_j8h61a_firmware', '20210810')
('laserjet_pro_j8h61a_firmware', '20210810')
('laserjet_pro_j8h60a_firmware', '20210810')
('enterprise_protection', '8.12.0-2107140000')
('coslat_bx5s1d3_firmware', '5.24.0.r.20180630')
('coslat_bx5s1d4_firmware', '5.24.0.r.20180630')
('coslat_bx5s1d5_firmware', '5.24.0.r.20180630')
('coslat_rm1ds1000_firmware', '5.24.0.r.20180630')
('coslat_rm2ds2000_firmware', '5.24.0.r.20180630')
('coslat_rm2s200_firmware', '5.24.0.r.20180630')
('coslat_rm3s300_firmware', '5.24.0.r.20180630')
('coslat_rm4s500_firmware', '5.24.0.r.20180630')
('raspberrymatic', '2.31.25.20180428')
('credentials', '1055.v1346ba467ba1')
('pipeline\\', '544.vff04fa68714d')
('fedora_coreos', '36.20220820.3.0')
('nest_hub_max_firmware', '10.20221207.2.109')
('nest_hub_firmware', '10.20221207.2.100038')
('dronescout_ds230_firmware', '20211210-1627')
('gin', '1.3.1-0.20190301021747-ccb9e902956d')
('gotham-fe-bundle', '100.30230706.0')
('gotham-fe-bundle', '100.30230702.0')
('geforce_now', '6.00.32705137')
('dronescout_ds230_firmware', '20211210-1627')
('dronescout_ds230_firmware', '20211210-1627')
('azure_ad', '378.vd6e2874a_69eb')
('miniflare', '3.20230821.0')

There's a couple in there that look like hashes but we're talking a handful across all of NVD. There are more cases which look like dates (and most of those seem to be firmware, interestingly. Culture difference in action.).

I extended it out for the other *cluding columns and added the vendor, and it looks like mostly we'll have some weird results for jenkins products and no one else if we take out the hash detection, but if we leave it in as is we'll potentially screw up a lot of firmware and a few other products.

So, I'm going to remove the hash detection entirely and leave a shortened version of this analysis in a comment for now.

@terriko terriko merged commit 72c198c into intel:main Jan 22, 2024
inosmeet pushed a commit to inosmeet/cve-bin-tool that referenced this pull request Feb 6, 2024
* fix: improve version compare robustness
* fix: add missing docstrings 
* fix: improve prerelease handling
* fix: remove hash detection, add tests & experiment script

---------

Signed-off-by: Terri Oda <[email protected]>
inosmeet pushed a commit to inosmeet/cve-bin-tool that referenced this pull request Feb 16, 2024
* fix: improve version compare robustness
* fix: add missing docstrings 
* fix: improve prerelease handling
* fix: remove hash detection, add tests & experiment script

---------

Signed-off-by: Terri Oda <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: Parse error for 2.8.94.0ubuntu1.4
4 participants