Skip to content

Commit 1dd0238

Browse files
stvmlshanscendent
andauthored
fix: Support detecting multiple product versions (#4911)
This patch refactors the core version checker code to support detecting multiple versions of the same product in a single file. There was some code already present that indicated an earlier intention to make this work, but the version checker logic was written to exit early after matching one version in a given file. While doing this refactor, I also discovered that the version filename parser test suite was broken, allowing faulty checkers and faulty checker test cases to pass. I fixed the broken test, then Shan and I fixed the handful of checkers that were found to be defective. Roughly exhaustive list of changes: * Rename `Checker.get_version()` to `get_versions()` to reflect its new abilities. Replace the name across code and documentation. * Create a new `VersionMatchInfo` class to capture the return from `get_versions()`. For a given file, the checker will now report whether it matched on the filename, the contents, or both, as well as a list of all version numbers detected. * Add some type hints! `get_versions()` is now explicit about what it returns. Previously, the intent was to return either a dict or a list of dicts, and the caller had to apply some fragile logic to figure out which. * Fix the test bug caused by that fragile logic. `TestCheckerVersionParser::test_filename_is` was failing to assert anything when the checker returned an empty dict, and therefore was allowing bad test cases and bad checkers alike to pass the test. * After fixing that test, also fix the handful of filename version checkers that were broken all along but flying under the radar. * Update `regex_find` in `util.py` to exhaust `pattern.finditer()` instead of exiting after finding one result with `pattern.search()`. Return a list of strings instead of a string. * Tweak the ignore checker logic to do a true regex search for the ignore pattern instead of some sketchy `str(a) in str(b)` stuff. The ignore patterns are a `list[Pattern[str]]` so we should probably be doing actual regex lookups here. Fixes #4184. --------- Signed-off-by: Steve Miller <[email protected]> Co-authored-by: Shan Lee <[email protected]>
1 parent 003df2a commit 1dd0238

File tree

13 files changed

+125
-85
lines changed

13 files changed

+125
-85
lines changed

cve_bin_tool/checkers/README.md

Lines changed: 16 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -311,7 +311,7 @@ That regex might look like this: `3\?Xiph.Org libVorbis ([0-9]+\.[0-9]+\.[0-9]+)
311311

312312
> If you can't get a signature match using just regex you may end up needing to
313313
> overwrite the
314-
> [`get_version()`](https://github.com/intel/cve-bin-tool/blob/main/cve_bin_tool/checkers/__init__.py#L120-L132)
314+
> [`get_versions()`](https://github.com/intel/cve-bin-tool/blob/main/cve_bin_tool/checkers/__init__.py#L120-L132)
315315
> method for the checker, but that should be a last resort if you can't find a
316316
> regex that works for `VERSION_PATTERNS`.
317317
>
@@ -322,10 +322,10 @@ That regex might look like this: `3\?Xiph.Org libVorbis ([0-9]+\.[0-9]+\.[0-9]+)
322322
> 1.3.7. While this was a nice example for how one might find a signature, it
323323
> in the end is not all the work that is needed to create a checker for
324324
> libvorbis. We need to make sure that any checker we develop has a
325-
> `get_version()` function which works for versions of the software which have
325+
> `get_versions()` function which works for versions of the software which have
326326
> CVEs. If not overridden in a subclass the Checker base class implements a
327-
> `get_version()` method which will use regex to determine the version (as
328-
> described above). In the case of libvorbis a custom `get_version()` function
327+
> `get_versions()` method which will use regex to determine the version (as
328+
> described above). In the case of libvorbis a custom `get_versions()` function
329329
> is likely needed, this is because the signature we found is not in the 1.2.0
330330
> version, where the CVE is found.
331331
@@ -503,25 +503,26 @@ You can set an arbitrary number of workers. A good rule of thumb is to specify n
503503

504504
The CVE-bin-checker works by extracting strings from binaries and determining
505505
if a given library has been compiled into the binary. For this, Checker class
506-
contains two methods: 1) `guess_contains()` and 2) `get_version()`.
506+
contains two methods: 1) `guess_contains()` and 2) `get_versions()`.
507507

508508
1. `guess_contains()` method takes list of extracted string lines as an input and
509509
return True if it finds any of the `CONTAINS_PATTERNS` on any line from the
510510
lines.
511-
2. `get_version()` method takes list of extracted string lines and the filename as
512-
inputs and returns information about whether the binary contains the library
513-
in question, is a copy of the library in question, and if either of those are
514-
true it also returns a version string. If the binary does not contain the
515-
library, this function returns an empty dictionary.
511+
2. `get_versions()` method takes a list of extracted string lines and the filename as
512+
inputs and returns a VersionMatchInfo. This object contains whether a match was
513+
found in the filename, whether a match was found in the contents of the file, and a list
514+
of version strings found in the file. If no matches are found, both booleans will be false
515+
and the list of version strings will be empty. If a match is found but no valid version
516+
strings are matched, the list of version strings will contain only "UNKNOWN".
516517

517-
If `curl` product is being scanned, `get_version()` method of CurlChecker will
518-
return following dictionary.
518+
For example, if the `curl` product is being scanned, the `get_versions()` method of CurlChecker will
519+
return the following object:
519520

520521
```json
521522
{
522-
"is_or_contains": "is",
523-
"modulename": "curl",
524-
"version": "6.41.0"
523+
"matched_filename": true,
524+
"matched_contains": true,
525+
"versions": ["6.41.0"]
525526
}
526527
```
527528

cve_bin_tool/checkers/__init__.py

Lines changed: 10 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
import sys
1010

1111
from cve_bin_tool.error_handler import InvalidCheckerError
12-
from cve_bin_tool.util import regex_find
12+
from cve_bin_tool.util import VersionMatchInfo, regex_find
1313

1414
if sys.version_info >= (3, 10):
1515
from importlib import metadata as importlib_metadata
@@ -475,21 +475,22 @@ def guess_contains(self, lines):
475475
return True
476476
return False
477477

478-
def get_version(self, lines, filename):
479-
version_info = dict()
478+
def get_versions(self, lines: str, filename: str) -> VersionMatchInfo:
479+
480+
version_details = VersionMatchInfo()
480481

481482
if any(pattern.match(filename) for pattern in self.FILENAME_PATTERNS):
482-
version_info["is_or_contains"] = "is"
483+
version_details.matched_filename = True
483484

484-
if "is_or_contains" not in version_info and self.guess_contains(lines):
485-
version_info["is_or_contains"] = "contains"
485+
if self.guess_contains(lines):
486+
version_details.matched_contains = True
486487

487-
if "is_or_contains" in version_info:
488-
version_info["version"] = regex_find(
488+
if version_details.matched_filename or version_details.matched_contains:
489+
version_details.versions = regex_find(
489490
lines, self.VERSION_PATTERNS, self.IGNORE_PATTERNS
490491
)
491492

492-
return version_info
493+
return version_details
493494

494495

495496
BUILTIN_CHECKERS = {

cve_bin_tool/checkers/bind.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@ class BindChecker(Checker):
2828
r"libisccc([-_]?(\d+\.)+\d.*)?\.so",
2929
r"libisccfg([-_]?(\d+\.)+\d.*)?\.so",
3030
r"libns([-_]?(\d+\.)+\d.*)?\.so",
31+
r"libbind9([-_]?(\d+\.)+\d.*)?\.so",
3132
]
3233
VERSION_PATTERNS = [
3334
r"version: BIND ([0-9]+\.[0-9]+\.[0-9]+)", # for .rpm, .tgz, etc.

cve_bin_tool/checkers/curl.py

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
1414
Note: Some of the "first vulnerable in" data may not be entered correctly.
1515
"""
16+
1617
from cve_bin_tool.checkers import Checker
1718

1819

@@ -26,6 +27,8 @@ class CurlChecker(Checker):
2627
# r"error retrieving curl library information",
2728
# r"ignoring --proxy-capath, not supported by libcurl",
2829
]
29-
FILENAME_PATTERNS = [r"curl"]
30+
FILENAME_PATTERNS = [
31+
r"curl",
32+
]
3033
VERSION_PATTERNS = [r"\r?\ncurl[ -/]([678]+\.[0-9]+\.[0-9]+)"]
3134
VENDOR_PRODUCT = [("haxx", "curl")]

cve_bin_tool/checkers/ffmpeg.py

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@
1111
1212
Note: Some of the "first vulnerable in" data may not be entered correctly.
1313
"""
14+
1415
from cve_bin_tool.checkers import Checker
1516

1617

@@ -20,7 +21,12 @@ class FfmpegChecker(Checker):
2021
r"Codec '%s' is not recognized by FFmpeg.",
2122
r"Codec '%s' is known to FFmpeg, but no %s for it are available.",
2223
]
23-
FILENAME_PATTERNS = [r"ffmpeg"]
24+
FILENAME_PATTERNS = [
25+
# check if it's the library instead of the command line util
26+
r"libffmpeg\.so(\.\d+)?",
27+
# command line util
28+
r"ffmpeg",
29+
]
2430
VERSION_PATTERNS = [
2531
r"%s version ([0-9]+\.[0-9]+\.[0-9]+)[a-zA-Z0-9 \(\)%~\-\r\n]*(?:avutil|FFmpeg)",
2632
r"FFmpeg version n?([0-9]+\.[0-9]+\.[0-9]+)",

cve_bin_tool/checkers/gstreamer.py

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,14 +8,20 @@
88
https://www.cvedetails.com/vulnerability-list/vendor_id-9481/Gstreamer.html
99
https://www.cvedetails.com/product/35669/Gstreamer-Project-Gstreamer.html?vendor_id=16047
1010
"""
11+
1112
from cve_bin_tool.checkers import Checker
1213

1314

1415
class GstreamerChecker(Checker):
1516
CONTAINS_PATTERNS = [
1617
r"http://bugzilla.gnome.org/enter_bug.cgi?product=GStreamer",
1718
]
18-
FILENAME_PATTERNS = [r"gstreamer"]
19+
FILENAME_PATTERNS = [
20+
# check if it's the library instead of the command line util
21+
r"libgstreamer\.so(\.\d+)?",
22+
# command line util
23+
r"gstreamer",
24+
]
1925
VERSION_PATTERNS = [
2026
r"((\d+\.)+\d+)[a-zA-Z \r\n]*GStreamer ",
2127
r"gstreamer[a-zA-Z \r\n]+((\d+\.)+\d+)",

cve_bin_tool/checkers/icu.py

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@
99
https://www.cvedetails.com/vulnerability-list/vendor_id-17477/Icu-project.html
1010
https://www.cvedetails.com/product/101097/Unicode-International-Components-For-Unicode.html?vendor_id=21486
1111
"""
12+
1213
from cve_bin_tool.checkers import Checker
1314

1415

@@ -20,7 +21,13 @@ class IcuChecker(Checker):
2021
r"is about 300kB larger than the ucadata-implicithan\.icu version\.",
2122
r"the ucadata-unihan\.icu version of the collation root data",
2223
]
23-
FILENAME_PATTERNS = [r"genrb", r"uconv"]
24+
FILENAME_PATTERNS = [
25+
# check if it's the library instead of the command line utils
26+
r"international_components_for_unicode.o",
27+
# command line utils
28+
r"genrb",
29+
r"uconv",
30+
]
2431
VERSION_PATTERNS = [
2532
r"icu(?:-|/)([0-9]+\.[0-9]+(\.[0-9]+)?)",
2633
r"ICU ([0-9]+\.[0-9]+(\.[0-9]+)?)",

cve_bin_tool/checkers/libcurl.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,8 @@
1515

1616
class LibcurlChecker(Checker):
1717
CONTAINS_PATTERNS: list[str] = []
18-
FILENAME_PATTERNS: list[str] = []
18+
FILENAME_PATTERNS: list[str] = [
19+
r"libcurl\.so(\.\d+)?",
20+
]
1921
VERSION_PATTERNS = [r"libcurl[ -/]([678]+\.[0-9]+\.[0-9]+)"]
2022
VENDOR_PRODUCT = [("haxx", "libcurl")]

cve_bin_tool/checkers/sqlite.py

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
CVE checker for sqlite
77
88
References:
9-
SQLLite version/sha1 data from https://www.sqlite.org/changes.html
9+
SQLite version/sha1 data from https://www.sqlite.org/changes.html
1010
1111
CVE list: https://www.cvedetails.com/vulnerability-list/vendor_id-9237/product_id-16355/Sqlite-Sqlite.html
1212
@@ -81,14 +81,14 @@ def guess_contains(self, lines):
8181
# If that fails, find a signature that might indicate presence of sqlite
8282
return super().guess_contains(lines)
8383

84-
def get_version(self, lines, filename):
84+
def get_versions(self, lines, filename):
8585
"""returns version information for sqlite as found in a given file.
8686
8787
The most correct way to do this is to search for the sha1 sums per release.
8888
Fedora rpms have a simpler SQLite version string.
8989
"""
9090

91-
version_info = super().get_version(lines, filename)
91+
version_info = super().get_versions(lines, filename)
9292

9393
for mapping in self.VERSION_MAP:
9494
# Truncate last four characters as "If the source code has been edited
@@ -97,6 +97,6 @@ def get_version(self, lines, filename):
9797
# https://www.sqlite.org/c3ref/c_source_id.html
9898
if mapping[1][:-4] in lines:
9999
# overwrite version with the version found by sha mapping
100-
version_info["version"] = mapping[0]
100+
version_info.versions.append(mapping[0])
101101

102102
return version_info

cve_bin_tool/util.py

Lines changed: 48 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -194,6 +194,30 @@ class ScanInfo(NamedTuple):
194194
file_path: str
195195

196196

197+
class VersionMatchInfo:
198+
"""
199+
Class to hold version match information
200+
attributes:
201+
matched_filename: bool
202+
matched_contains: bool
203+
versions: list[str]
204+
"""
205+
206+
matched_filename: bool
207+
matched_contains: bool
208+
versions: list[str]
209+
210+
def __init__(
211+
self,
212+
matched_filename: bool = False,
213+
matched_contains: bool = False,
214+
versions: list[str] = [],
215+
):
216+
self.matched_filename = matched_filename
217+
self.matched_contains = matched_contains
218+
self.versions = versions or []
219+
220+
197221
class VersionInfo(NamedTuple):
198222
"""
199223
Class to hold version information of a product
@@ -241,23 +265,32 @@ def __missing__(self, key: str) -> list[CVE] | set[str]:
241265

242266
def regex_find(
243267
lines: str, version_patterns: list[Pattern[str]], ignore: list[Pattern[str]]
244-
) -> str:
245-
"""Search a set of lines to find a match for the given regex"""
246-
new_guess = ""
268+
) -> list[str]:
269+
"""Search a set of lines to find all matches for the given regex"""
270+
versions = list()
247271

248272
for pattern in version_patterns:
249-
match = pattern.search(lines)
250-
if match:
251-
new_guess = match.group(1).strip()
252-
for i in ignore:
253-
if str(i) in str(new_guess) or str(new_guess) in str(i):
254-
new_guess = ""
255-
break
256-
if new_guess != "":
257-
new_guess = new_guess.replace("_", ".")
258-
return new_guess.replace("-", ".")
259-
else:
260-
return "UNKNOWN"
273+
version_matches = pattern.finditer(lines)
274+
for match in version_matches:
275+
# before collecting a potential version number, ensure the version string isn't on the ignore list
276+
if not check_ignored(match.string, ignore):
277+
version = match.group(1).strip()
278+
version = version.replace("_", ".").replace("-", ".")
279+
versions.append(version)
280+
281+
# if we searched and found no matches, just return a one-element list containing "UNKNOWN"
282+
if not versions:
283+
versions.append("UNKNOWN")
284+
285+
return versions
286+
287+
288+
def check_ignored(possible_version_string: str, ignore: list[Pattern[str]]) -> bool:
289+
"""Check if a possible version should be ignored based on the ignore patterns."""
290+
for pattern in ignore:
291+
if pattern.search(possible_version_string):
292+
return True
293+
return False
261294

262295

263296
def inpath(binary: str) -> bool:

cve_bin_tool/version_scanner.py

Lines changed: 8 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -136,7 +136,7 @@ def print_language_checkers(self) -> None:
136136
self.logger.info(f'Language Checkers: {", ".join(self.language_checkers)}')
137137

138138
def number_of_language_checkers(self) -> int:
139-
"""return the number of langauge checkers avaialble"""
139+
"""return the number of language checkers available"""
140140
return len(self.language_checkers)
141141

142142
def is_executable(self, filename: str) -> tuple[bool, str | None]:
@@ -254,31 +254,19 @@ def run_checkers(self, filename: str, lines: str) -> Iterator[ScanInfo]:
254254
# tko
255255
for dummy_checker_name, checker in self.checkers.items():
256256
checker = checker()
257-
result = checker.get_version(lines, filename)
258-
# do some magic so we can iterate over all results, even the ones that just return 1 hit
259-
if "is_or_contains" in result:
260-
results = [dict()]
261-
results[0] = result
262-
else:
263-
results = result
264-
265-
for result in results:
266-
if "is_or_contains" in result:
267-
version = "UNKNOWN"
268-
if "version" in result and result["version"] != "UNKNOWN":
269-
version = result["version"]
270-
elif result["version"] == "UNKNOWN":
257+
version_results = checker.get_versions(lines, filename)
258+
259+
if version_results.matched_filename or version_results.matched_contains:
260+
for version in version_results.versions:
261+
if version == "UNKNOWN":
271262
file_path = "".join(self.file_stack)
272263
self.logger.debug(
273264
f"{dummy_checker_name} was detected with version UNKNOWN in file {file_path}"
274265
)
275266
else:
276-
self.logger.error(f"No version info for {dummy_checker_name}")
277-
278-
if version != "UNKNOWN":
279267
file_path = "".join(self.file_stack)
280268
self.logger.debug(
281-
f'{file_path} {result["is_or_contains"]} {dummy_checker_name} {version}'
269+
f"{file_path} matched {dummy_checker_name} {version} ({version_results.matched_filename=}, {version_results.matched_contains=})"
282270
)
283271
for vendor, product in checker.VENDOR_PRODUCT:
284272
yield ScanInfo(
@@ -292,7 +280,7 @@ def run_checkers(self, filename: str, lines: str) -> Iterator[ScanInfo]:
292280
def clean_file_path(filepath: str) -> str:
293281
"""Returns a cleaner filepath by removing temp path from filepath"""
294282

295-
# we'll recieve a filepath similar to
283+
# we'll receive a filepath similar to
296284
# /temp/anything/extractable_filename.extracted/folders/inside/file
297285
# We'll return /folders/inside/file to be scanned
298286

test/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -160,7 +160,7 @@ To test the filename mappings, rather than making a bunch of empty files, we're
160160
)
161161
```
162162

163-
The function test_filename_is will then load the checker you have specified (and fail spectacularly if you specify a checker that does not exist), try to run get_version() with an empty file content and the filename you specified, then check that it "is" something (as opposed to "contains") and that the modulename that get_version returns is in fact the expected_result you specified.
163+
The function test_filename_is will then load the checker you have specified (and fail spectacularly if you specify a checker that does not exist), try to run get_versions() with an empty file content and the filename you specified, then check that it "is" something (as opposed to "contains") and that the modulename that get_version returns is in fact the expected_result you specified.
164164

165165
For ease of maintenance, please keep the parametrize list in alphabetical order when you add a new tests.
166166

0 commit comments

Comments
 (0)