🐞 Unable to hash password due to lack of memory #165


Closed
AlleyPally opened this issue Jan 6, 2025 · 29 comments

@AlleyPally

lrzip-next Version

0.13.2

lrzip-next command line

lrzip-next -Uvv --encrypt=goodpassword hello.txt

What happened?

SCRYPT was unable to hash the password, apparently due to a lack of memory. I'm not sure why (I have 4 GB of RAM), so perhaps the error is genuinely not a bug?

The file in question:
hello.txt

What was expected behavior?

To compress, encrypt, and hash a small text file.

Steps to reproduce

  1. Clone lrzip-next 0.13.2.
  2. Build.
  3. Attempt to encrypt a file using the options above.
  4. Watch it fail.

Relevant log output

The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 2
Detected 3,898,195,968 bytes ram
Nice Value: 19
Show Progress
Max Verbose
Temporary Directory set as: /tmp/
Compression mode is: LZMA. LZ4 Compressibility testing enabled
Compression level 7
RZIP Compression level 7
Initial LZMA Dictionary Size: 33,554,432
MD5 Hashing Used
AES128 Encryption Used
Using Unlimited Window size
Storage time in seconds 1,404,425,981
SCRYPTing password: Cost factor 8,388,608, Parallelization Factor: 1
Unable to hash password. Not enough memory? Error: 7
Fatal error - exiting

Please provide system details

OS Distro: Arch Linux
Kernel Version (uname -a): Linux Alley 6.12.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 02 Jan 2025 22:52:26 +0000 x86_64 GNU/Linux

System ram (free -h): total used free shared buff/cache available
Mem: 3.6Gi 1.9Gi 387Mi 407Mi 2.0Gi 1.8Gi
Swap: 4.0Gi 123Mi 3.9Gi

Additional Context

I have tried all the different hashing algorithms, different compressors, different memsize settings, etc., to no avail.

Normal lrzip works just fine, but its encryption scheme is not as robust as lrzip-next's. My specs are on the lower side, so this might be a me issue.

Thank you mister Peter for your continuous hard work and dedication!

@pete4abw
Owner

pete4abw commented Jan 6, 2025

This is not a bug. SCRYPT hashing is intended to defeat large scale cracking operations. From Wikipedia: https://en.wikipedia.org/wiki/Scrypt

In cryptography, scrypt is a password-based key derivation function created by Colin Percival in March 2009, originally for the Tarsnap online backup service.[2][3] The algorithm was specifically designed to make it costly to perform large-scale custom hardware attacks by requiring large amounts of memory.

Your system does not have enough RAM to process this. There's probably a way to anticipate memory requirements, but I don't know it. For your case, try NOT using -U, and for goodness' sake, don't use a small test file. The enwik8 file is a good test: https://www.mattmahoney.net/dc/enwik8.zip.

Thank you for the report, but I am closing because you have a system with insufficient RAM.

@pete4abw
Owner

pete4abw commented Jan 6, 2025

Closing.

@pete4abw closed this as not planned Jan 6, 2025
@pete4abw
Owner

pete4abw commented Jan 8, 2025

I did a little research. This site explains the memory requirements for SCRYPT key derivation, which lrzip-next uses. Applying the memory requirements formula, 128 * cost factor * block size (block size defaults to 8 in libgcrypt), you get:

128 * 8,388,608 * 8 = 8,589,934,592 bytes, which is a little more than 8GB.
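Evaluating that formula directly (a quick sketch, with r = 8 as libgcrypt's default block size):

```python
def scrypt_memory_bytes(n: int, r: int = 8) -> int:
    # scrypt's large V array dominates memory use: 128 * N * r bytes.
    return 128 * n * r

# Cost factor from the log above:
print(scrypt_memory_bytes(8_388_608))  # 8589934592 bytes, i.e. 8 GiB
```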

The function nloops in lrzip.c is used to determine the cost factor -- although in the early days (2007-2011), it was used to hash and rehash passwords uniquely. This function has remained unchanged even though lrzip-next uses a totally different key derivation and hashing model. Perhaps a check could be added to reduce memory requirements for encryption on smaller systems. It may no longer be required, but a lot of work and testing will need to be done before changing anything.

Thank you for reporting this.

@pete4abw
Owner

Reopening. lrzip-next should check whether there is enough RAM for SCRYPT hashing. See the whats-next branch for a test.

@pete4abw reopened this Jan 10, 2025
@AlleyPally
Author

It works on both compression and decompression! I compiled and tested the whats-next branch with the command:
lrzip-next -vv --encrypt=goodpassword ~/enwik8
And soon afterwards, lrzip-next prints out:
Costfactor reduced to 2097152 due to ram limitations!

Fantastic job! Thank you!

@pete4abw
Owner

> It works on both compression and decompression! I compiled and tested the whats-next branch with the command: lrzip-next -vv --encrypt=goodpassword ~/enwik8 And soon afterwards, lrzip-next prints out: Costfactor reduced to 2097152 due to ram limitations!
>
> Fantastic job! Thank you!

We're not done yet. One more check has to be done in case decompression is performed on a different system with more or less RAM. Available RAM affects the cost factor, so we have to be sure that the old cost factor already computed is used to decrypt; otherwise an error will be reported. This involves storing the hash loops in bytes 6 and 7 of the header and then NOT trying to recompute the hash on decompression.

What you have now will work fine on your system, but won't be portable.

Thank you again for testing. Stay tuned.

@AlleyPally
Author

Hello again! You were completely correct. I set up an Arch Linux virtual machine and gave it only one GB of RAM to test the portability of SCRYPTed files. Attempting to decompress a file that was compressed on the host (4 GB of RAM) did not work. The error output was Validating file for consistency...Invalid stream ctype (cf) for encrypted file. Bad Password?. Likewise, decompressing a file from the guest on the host also did not work and produced a similar error (only the stream ctype was different).

For redundancy's sake, I also tried increasing the guest's memory to 2 GB, and no cigar. It is exactly as you said!

The file in question is the enwik8 file and it was compressed with the following options on both the host and the guest: lrzip-next --encrypt=goodpassword ~/enwik8.

Log output:
hostdecompress.txt
remotedecompress.txt

@pete4abw
Owner

@AlleyPally , right on. Thank you for confirming. The bug is a little involved and has to do with the writing and then reading of the encrypted headers for each block. Interestingly, but not surprisingly, with no encryption, a system of any size can decompress a file created anywhere.

I'll keep on this. Thank you again.

Quick tip. Use the -m option to fake maximum system memory. You won't need to create virtual machines or alternate installations!
-m 80 would be 8G (80 x 100MB)
-m 20 would be 2G (20 x 100MB)

See the man page.
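Since -m counts units of 100MB, the mapping is simple arithmetic (a sketch; whether lrzip-next treats MB here as 10^6 or 2^20 bytes is an assumption on my part):

```python
def m_option_bytes(m: int) -> int:
    # lrzip-next's -m takes system memory in units of 100 MB.
    return m * 100 * 1024 * 1024

print(m_option_bytes(80))  # 8388608000 bytes, roughly "8G"
print(m_option_bytes(20))  # 2097152000 bytes, roughly "2G"
```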

@pete4abw
Owner

@AlleyPally , I haven't forgotten about this. The issue appears to be that during decompression, the costfactor for SCRYPTing is recomputed based on available ram and should not be. The costfactor should be stored in the magic header as it is derived on the host system. Stay tuned...

@pete4abw
Owner

@AlleyPally Try the whats-next branch, please and report. You will notice that I tested decompression with different memory sizes. See what you can find!

$ src/lrzip-next -i -epassword ../enwik8.lrz

Summary

File: ../enwik8.lrz
lrzip-next version: 0.13 AES128 Encrypted file
Compression Method: rzip + lzma -- lc = 3, lp = 0, pb = 2, Dictionary Size = 262,144
Rzip Compression Level: 1, Lrzip-next Compression Level: 1

Due to using Encryption, expected decompression size not available
Decompressed file size: Unavailable
Compressed file size: 32,968,062
Compression ratio: Unavailable

MD5 Checksum: a1fa5ffddb56f4953e226637dabbb36a

$ src/lrzip-next -i -epassword ../enwik8.lrz -m10

Summary

File: ../enwik8.lrz
lrzip-next version: 0.13 AES128 Encrypted file
Compression Method: rzip + lzma -- lc = 3, lp = 0, pb = 2, Dictionary Size = 262,144
Rzip Compression Level: 1, Lrzip-next Compression Level: 1

Due to using Encryption, expected decompression size not available
Decompressed file size: Unavailable
Compressed file size: 32,968,062
Compression ratio: Unavailable

MD5 Checksum: a1fa5ffddb56f4953e226637dabbb36a

@AlleyPally
Author

Greetings! In my testing, I noticed that a PC with less RAM is unable to decompress a file that was compressed by a PC with more RAM. However, the reverse works: a PC with more RAM can decompress what was compressed by a PC with less RAM!

I compressed a file and subsequently decompressed it with the -m option. I noticed that lrzip-next used more memory than it should otherwise have been able to, so I decided to boot up a VM for testing.

I gave the guest 1 GB of RAM and decompression was unsuccessful, likewise with 2 GB to make sure. It spat out the error below, error 32854:

remoteerror1gig.txt

However, compressing a file on the guest and then decompressing it on the host was successful.

For redundancy's sake, I gave the guest 4 gigs of (shared) RAM and it decompressed the file just fine!

@pete4abw
Owner

@AlleyPally , the error makes sense. lrzip-next will store the cost factor in the header. In the report you submitted:

Detected 987009024 bytes ram
SCRYPTing password: Cost factor 2097152, Parallelization Factor: 1
Unable to hash password. Not enough memory? Error: 32854 - Cannot allocate memory

The memory requirement for the stored cost factor is 2097152 * 1024 = 2,147,483,648 bytes, or 2GB. The lightweight system simply does not have enough RAM. I'm making some further tweaks that hopefully will simplify things more. The original algo for hashing passwords does not apply to computing the cost factor. I think I may just ditch that and tailor the cost factor to host RAM on compression. Thank you again. I appreciate your efforts to make lrzip-next better!
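This same out-of-memory failure can be reproduced outside lrzip-next with Python's hashlib.scrypt, which enforces an explicit memory cap via maxmem. This is only an illustrative sketch of scrypt's behavior under a memory limit, not lrzip-next's libgcrypt code path:

```python
import hashlib

salt = b"0123456789abcdef"
cap = 32 * 1024 * 1024  # cap scrypt at 32 MiB, like a small system

# N = 2^14 needs 128 * 16384 * 8 = 16 MiB, which fits under the cap.
key = hashlib.scrypt(b"goodpassword", salt=salt, n=2**14, r=8, p=1, maxmem=cap)
print(len(key))  # 64-byte derived key

# N = 2^21 needs 2 GiB, which exceeds the cap and fails --
# the same class of error as "Unable to hash password. Not enough memory?"
try:
    hashlib.scrypt(b"goodpassword", salt=salt, n=2**21, r=8, p=1, maxmem=cap)
except ValueError as exc:
    print("hash failed:", exc)
```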

@pete4abw
Owner

@AlleyPally , I made some minor changes, but the fact is, when trying to decompress on a system with less RAM than the system an encrypted compression was made on, it may fail. And that is expected behavior. I think I have taken this as far as I can.

@AlleyPally
Author

I have redone my previous tests to confirm, and it is like before; the guest was unable to decompress, but the host was able to decompress just fine.

I cannot thank you enough for your dedication and care, mister Peter! You poured a lot of effort into looking into this small issue of mine and have done a great job. I appreciate it!

I believe there is nothing more to do in this case. Thank you!

@pete4abw
Owner

Unfortunately, the only way to assure compatibility would be to have a much lower cost factor. The default used for logins is typically N=16384, which would make memory requirements 16MB (16384 * 128 * 8). Currently, memory requirements are 1GB plus. Is that overkill? A command line option to set the cost factor is also a consideration - e.g. N=14, where cost factor = 2^14 = 16384 and memory requirements would be 16MB. See https://datatracker.ietf.org/doc/html/rfc7914, and also the scrypt program, which has tunable parameters. See https://words.filippo.io/the-scrypt-parameters/ for a further explanation. I would also recommend downloading the scrypt package, or visiting its github page: https://github.com/Tarsnap/scrypt .

However, making it universally compatible across systems would require limiting N to the value that would apply to the smallest system! Thanks again!

$ scrypt enc enwik8 enwik8.scr
Please enter passphrase:
Please confirm passphrase:
$ scrypt info enwik8.scr
Parameters used: N = 1048576; r = 8; p = 1;
Decrypting this file requires at least 1.0 GB bytes of memory.

$ scrypt enc --logN 12 -r 8 -p 1 enwik8 enwik8.scr
Please enter passphrase:
Please confirm passphrase:
$ scrypt info enwik8.scr
Parameters used: N = 4096; r = 8; p = 1;
Decrypting this file requires at least 4.1 MB bytes of memory.

@Theelx
Contributor

Theelx commented Jan 26, 2025

Hello, I couldn't help reading this report, and I noticed that this issue seems to revolve around the amount of RAM available when decompressing on systems with little RAM. It was mentioned that files seem to decompress fine when going from smaller RAM to bigger RAM, but I wonder if that holds in all cases. I have a recently built desktop with 128GB of RAM; would it be helpful to test the current latest code for decompression from 1GB or so to ~64GB, or would that be unnecessary because the problem you've pinpointed is different? I'm having a bit of trouble following the comment chain because I'm exhausted at the moment, but if it would help at all, I'd be happy to do some testing over the next few days!

@pete4abw
Owner

@Theelx , you are correct that the current whats-next branch will encrypt on larger systems, but may not work on smaller systems. The nloops function in lrzip.c was designed to compute a unique loop count for hashing a salted key that would grow and always be greater than 8MB. I used that as a basis for computing a cost factor. However, this was not necessary for SCRYPT which does its own hashing.

Obviously, the higher the cost factor, the harder to brute force the key. But, is this really necessary? So what I am working on now, and this may take a little time is:

  1. Preserve current functionality so that current encrypted archives will still decompress.
  2. Devise a new algorithm for determining a viable cost factor. RAM rounded down to the nearest power of 2, for example. (Recall, the cost factor must be a power of 2 between 1KB and 1TB, or 2^10 to 2^40.)
  3. Allow a user to input his/her own cost factor exponent to override the system, but at risk of causing encryption to fail if there is not enough RAM or decryption to possibly fail on a smaller system.
  4. Store that exponent in the magic header. This will be in v0.14.0

On a 16GB system currently, the cost factor is 8MB, or 2^23. But for a smaller system, this won't work in the master branch.
So...
On a 4GB system, the cost factor would default to 2MB, or 2^21.
Or, on a 16GB system, a user could tune the cost factor to any amount, e.g. --costfactor=21. This would allow a 4GB system to decrypt and decompress the file.
Or simply use a low default cost factor. As noted, 2^14 (16K) is used for login terminals. This would make lrzip-next universally usable with no intervention.
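The 2^23-on-16GB and 2^21-on-4GB examples both follow from targeting half of system RAM; a hypothetical sketch of that heuristic (the function name, clamping bounds, and search loop are mine, not lrzip-next's actual code):

```python
def pick_costfactor_exponent(ram_bytes: int, r: int = 8,
                             lo: int = 10, hi: int = 40) -> int:
    # Largest exponent e in [lo, hi] such that scrypt's working set,
    # 128 * 2^e * r bytes, still fits in half of system RAM.
    e = lo
    while e < hi and 128 * (1 << (e + 1)) * r <= ram_bytes // 2:
        e += 1
    return e

print(pick_costfactor_exponent(16 * 2**30))  # 23 on a 16GB system
print(pick_costfactor_exponent(4 * 2**30))   # 21 on a 4GB system
```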

Thanks for the offer to help, but this is a really low-probability error, and the whats-next branch will work on a single system with no issue. Stay tuned!

@pete4abw
Owner

v0.14.0 is in the whats-next branch. New option --costfactor=N, where N is between 10 and 40, representing a cost factor of 2^10 (1KB) to 2^40 (1TB). See the WHATS-NEW file and man page for details. The cost factor will default to RAM/1024/2. This version should be compatible with older versions back to v0.8x, but no guarantees.

@pete4abw
Owner

pete4abw commented Feb 1, 2025

@AlleyPally @Theelx , just curious how the testing is going?

@Theelx
Contributor

Theelx commented Feb 1, 2025

Oh sorry, I misunderstood your reply and thought testing wasn't needed. I'll get on it in the next few hours!

@Theelx
Contributor

Theelx commented Feb 1, 2025

I have encountered what seems to be a bug :(
I have 128GB of RAM on this computer, so I tried this command with costfactor 25 and got the following output, which worked:

lrzip-next --zstd --zstd-level 17 --costfactor 25 -vP -p12 -U -o ./thing_25.lrz thing.tar

The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 12
Detected 134,757,314,560 bytes ram
Nice Value: 19
Show Progress
Verbose
Output Filename Specified: ./thing_25.lrz
Temporary Directory set as: /tmp/
Compression mode is: ZSTD. LZ4 Compressibility testing enabled
Compression level 7
RZIP Compression level 7
ZSTD Compression Level: 17, ZSTD Compression Strategy: btopt
MD5 Hashing Used
Using Unlimited Window size
File size: 41,772,001,280
Will take 1 pass
Per Thread Memory Overhead is 0
Beginning rzip pre-processing phase
Total: 99%  Chunk: 99%
thing.tar - Compression Ratio: 28.152. bpb: 0.284. Average Compression Speed: 145.387MB/s.
Total time: 00:04:33.68

Next, I tried it with costfactor 21; it worked and produced the same compressed output, but it was slower (~5m30s). Then I tried costfactor 27 and it hung at 99%. I noticed the amount of RAM used spiked from 12.4GB to ~55GB, but it didn't even get close to being full. Why this occurs, I don't know, and I'd like help getting tooling for debugging it:

lrzip-next --zstd --zstd-level 17 --costfactor 27 -vP -p12 -U -o ./thing_27.lrz thing.tar

The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 12
Detected 134,757,314,560 bytes ram
Nice Value: 19
Show Progress
Verbose
Output Filename Specified: ./thing_27.lrz
Temporary Directory set as: /tmp/
Compression mode is: ZSTD. LZ4 Compressibility testing enabled
Compression level 7
RZIP Compression level 7
ZSTD Compression Level: 17, ZSTD Compression Strategy: btopt
MD5 Hashing Used
Using Unlimited Window size
File size: 41,772,001,280
Will take 1 pass
Per Thread Memory Overhead is 0
Beginning rzip pre-processing phase
Total: 99%  Chunk: 99%

@pete4abw
Owner

pete4abw commented Feb 1, 2025

@Theelx , cost factor has no meaning with no encryption. I'm concerned that the per-thread memory shows as 0. With 128GB of RAM, why use -U? I'll study your output. It may take some days, but I'll get to it. Thank you.

@Theelx
Contributor

Theelx commented Feb 1, 2025

> @Theelx , cost factor has no meaning with no encryption. I'm concerned that the per thread memory shows as 0. With 128gb of ram, why use -U? I'll study your output. May take some days, but I'll get to it. Thank you.

I have muscle memory for -U, but I agree it's not necessary. I recognized that cost factor has no meaning with no encryption, and when I tested with encryption it worked fine (sorry for leaving that out); it's just that this combination of options seems to have broken something (what, I'm not sure).

@pete4abw
Owner

pete4abw commented Feb 2, 2025

> > @Theelx , cost factor has no meaning with no encryption. I'm concerned that the per thread memory shows as 0. With 128gb of ram, why use -U? I'll study your output. May take some days, but I'll get to it. Thank you.
>
> I have muscle memory for -U, but I agree it's not necessary. I recognized that cost factor has no meaning with no encryption, and when I tested it with encryption it worked fine (sorry for leaving that out), it's just that the combination of options seems to have broken something (what, I'm not sure).

@Theelx , please take a moment to try using the master branch, with the same options that caused the crash (except --costfactor, of course). I just want to make sure this is not a new bug I introduced or one hiding around in the master branch. Thank you again.

@AlleyPally
Author

Apologies for the lack of reply! Decryption no longer works on encrypted files, no matter what cost factor is used. However, while somewhat irrelevant, I noticed that decryption (using the latest lrzip-next) of a file compressed by lrzip-next version 0.13.3 was successful. Of course, vice versa is not the case. I am clueless, but perhaps something is off about the encryption in the newest version itself?

Compression options and output: lrzip-next -vvp 2 --encrypt=goodpassword enwik8

compresslrz.txt

Decompression options and output: lrzip-next -dvv enwik8.lrz

decompresslrz.txt

I tried decompressing files from a guest VM and vice versa, no cigar!

I also tried using @Theelx's options on both the whats-next and the master branch (encryption was used for the whats-next version but not master), and everything worked on my end.

@pete4abw
Owner

pete4abw commented Feb 3, 2025

@AlleyPally there is a problem. On decryption, testing, and info, the value stored in the lrz file is not read for costfactor. This forces it to be recomputed, incorrectly. Thank you. Please stay tuned.

@pete4abw
Owner

pete4abw commented Feb 3, 2025

@Theelx @AlleyPally Please try now. Don't go crazy with big files or lots of options. Even something simple like this will suffice. Costfactor can be anything.

lrzip-next -fL1 -epassword --costfactor 10 enwik8
lrzip-next -t -epassword --costfactor enwik8.lrz

You can test the storage of costfactor by using this command. If lrzip-next returns without error, costfactor and decryption works fine.

$ hexdump -Cn8 enwik8.lrz
00000000 4c 52 5a 49 00 0e 0f 74 |LRZI...t|

Where byte 6, 0f, is the costfactor exponent.
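The same check can be scripted instead of eyeballed; a small sketch that assumes only the layout described above (4-byte LRZI magic, cost-factor exponent at byte offset 6), not a full header parser:

```python
def read_costfactor_exponent(header: bytes) -> int:
    # First 8 bytes of an lrzip-next file: 'LRZI' magic, version bytes,
    # then the SCRYPT cost-factor exponent at offset 6.
    if header[:4] != b"LRZI":
        raise ValueError("not an lrzip-next file")
    return header[6]

header = bytes.fromhex("4c525a49000e0f74")  # the hexdump shown above
e = read_costfactor_exponent(header)
print(e, 1 << e)  # exponent and the corresponding cost factor 2^e
```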

@AlleyPally
Author

Everything is Oll Korrect! I have tested the newest version and done the aforementioned test and hexdump check. I have also done so in a guest VM for redundancy's sake, and all worked flawlessly! Thank you!

@Theelx
Contributor

Theelx commented Feb 4, 2025

I found out that my earlier error was due to an errant compiler option that I use for custom builds, so it can be ignored. Everything works fine on my end; thanks so much for your hard work!
