Description
Bagit-python is currently using a generic pattern to import the list of checksums available to the tool to be used:
Lines 125 to 131 in bff20d2
This creates a help-command with the following in Python 3.9.6
.
--sha3_224
--sha3_256
--blake2s
--sha3_384
--sha384
--md5
--sha224
--shake_256
--sha256
--shake_128
--sha3_512
--blake2b
--sha512
--sha1
@pwinckles brings up something similar here: #156 I am not clear what is invalid about those names - that being said:
The spec states:
To avoid future ambiguity, the checksum algorithm SHOULD be registered in IANA's "Named Information Hash Algorithm Registry"
The most obvious example of algorithm's not in the registry at time of writing are those from the shake family. Further, there is ambiguity in the Blake algorithms presented by Python's hashlib.
Errors caused by shake
Additionally, the shake algorithms require a length argument not handled by the code: https://docs.python.org/3/library/hashlib.html#shake-variable-length-digests
Causing the following when selected.
Traceback (most recent call last):
File "C:\temp\testing\bags\venv\lib\site-packages\bagit.py", line 241, in make_bag
total_bytes, total_files = make_manifests(
File "C:\temp\testing\bags\venv\lib\site-packages\bagit.py", line 1254, in make_manifests
checksums = [manifest_line_generator(i) for i in _walk(data_dir)]
File "C:\temp\testing\bags\venv\lib\site-packages\bagit.py", line 1254, in <listcomp>
checksums = [manifest_line_generator(i) for i in _walk(data_dir)]
File "C:\temp\testing\bags\venv\lib\site-packages\bagit.py", line 1412, in generate_manifest_lines
results = [
File "C:\temp\testing\bags\venv\lib\site-packages\bagit.py", line 1413, in <listcomp>
(alg, hasher.hexdigest(), decoded_filename, total_bytes)
TypeError: hexdigest() missing required argument 'length' (pos 1)
2022-05-13 10:38:36,341 - ERROR - Failed to create bag in govdoc-gifs: hexdigest() missing required argument 'length' (pos 1)
Traceback (most recent call last):
File "C:\temp\testing\bags\venv\lib\site-packages\bagit.py", line 1599, in main
make_bag(
File "C:\temp\testing\bags\venv\lib\site-packages\bagit.py", line 241, in make_bag
total_bytes, total_files = make_manifests(
File "C:\temp\testing\bags\venv\lib\site-packages\bagit.py", line 1254, in make_manifests
checksums = [manifest_line_generator(i) for i in _walk(data_dir)]
File "C:\temp\testing\bags\venv\lib\site-packages\bagit.py", line 1254, in <listcomp>
checksums = [manifest_line_generator(i) for i in _walk(data_dir)]
File "C:\temp\testing\bags\venv\lib\site-packages\bagit.py", line 1412, in generate_manifest_lines
results = [
File "C:\temp\testing\bags\venv\lib\site-packages\bagit.py", line 1413, in <listcomp>
(alg, hasher.hexdigest(), decoded_filename, total_bytes)
TypeError: hexdigest() missing required argument 'length' (pos 1)
While it's nice there's some extensibility for free, perhaps, as this issue can come up again in future, the algorithms can be provided from a known list compatible with IANA today, and Python, and perhaps cross-referenced against, checksums_guaranteed
instead?