Split files into chunks before encrypting to mitigate potential attacks on file size #4
Thanks for bringing this issue up. After your comment, I have updated the Encryption document to explain how this is indeed a potential threat, but how at this moment it's accepted by design and its risk is considered manageable.

The solution you're proposing would indeed work. It is something I've considered already, although I do see two issues with it.

First, as you stated yourself, this would require wasting more storage space because of padding.

The second issue is more complex to solve, and it's about the way the index works. At the moment, we maintain an index file which contains the list of all files in the storage (for each file, in version 0.4 we store: the decrypted filename, the encrypted filename, the file type, the creation date). This is currently a single file, and I am aware that as the repo grows bigger it becomes larger and larger, which could lead to a variety of issues (not just about performance): to try to reduce the problem, in version 0.4 I migrated the index from JSON to a file encoded with protocol buffers. If we start adding chunks, we'll almost certainly need to rework the way the index is designed, as it could grow too large to be manageable. Of course, creating an index for each file would defeat the purpose, so that won't happen.

Because the index file needs to be able to be stored in object storage services, whatever format we choose needs to satisfy four requirements:
I have not found an effective solution to the problem above. I'm fairly confident that whatever the solution will be, it will need to leverage multiple files (for point 1), so things like a SQLite database won't work. I am open to suggestions however :)

PS: one small correction here:
In addition to that, because we use the DARE format there's a small overhead (they claim ~0.05%), as each 64KB chunk has a header. This can be calculated deterministically, however.
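For illustration, here is a minimal sketch of that deterministic calculation, assuming the DARE layout of up to 64 KiB of plaintext per package plus a 16-byte header and a 16-byte authentication tag per package (which works out to roughly the ~0.05% figure above); the names and constants below are illustrative, not taken from the project's code:

```go
package main

import "fmt"

// Assumed DARE package layout: up to 64 KiB of plaintext per package,
// with a 16-byte header and a 16-byte authentication tag added to each.
const (
	packagePayload  = 64 * 1024 // plaintext bytes per package
	packageOverhead = 16 + 16   // header + auth tag per package
)

// encryptedSize returns the expected ciphertext size for a plaintext of
// the given size, computed deterministically from the package layout.
func encryptedSize(plaintext int64) int64 {
	packages := (plaintext + packagePayload - 1) / packagePayload // ceil division
	return plaintext + packages*packageOverhead
}

func main() {
	for _, size := range []int64{1 << 20, 1 << 30} { // 1 MiB, 1 GiB
		fmt.Printf("plaintext %d B -> ciphertext %d B\n", size, encryptedSize(size))
	}
}
```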
Found another disadvantage of splitting files into chunks after exploring AWS S3 pricing: each PUT object operation costs $0.005, so uploading a 1GB file split into 1MB chunks would cost $5
Actually that’s the price for 1,000 requests, so uploading that 1GB file would only be $0.005 ;)
Oh, didn't notice the per 1,000 requests part 🤦
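(For reference, assuming the $0.005 per 1,000 PUT requests figure quoted above: a 1 GB file split into 1 MB chunks is roughly 1,024 PUT requests, i.e. about 1.024 × $0.005 ≈ $0.005, in line with the correction.)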
In any case, if you (or anyone else) have suggestions on how to best implement the index, please feel free to speak up! (And PRs are welcome too)
As indicated in this comment, the fact that each file is encrypted as a whole leaves the following details exposed after encryption:
The suggested solution is to split each file into chunks of a fixed size, making sure to pad the last chunk with random data (just as the last block is padded in block cipher implementations).
Additionally, it would be a good idea to upload the chunks to the cloud storage in parallel and in a somewhat randomized order, to hinder attacks that sort all chunks by creation date.
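As a rough sketch of that approach, assuming 1 MiB chunks held in memory and a hypothetical `upload` callback standing in for the actual encrypt-and-store (PUT) step; none of these names come from the project's code:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"io"
	mrand "math/rand"
	"os"
	"sync"
)

const chunkSize = 1 << 20 // 1 MiB per chunk (illustrative value)

// splitWithPadding reads the file into fixed-size chunks, padding the
// last, partial chunk with random bytes so every chunk has the same length.
func splitWithPadding(path string) ([][]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var chunks [][]byte
	for {
		buf := make([]byte, chunkSize)
		n, err := io.ReadFull(f, buf)
		if err == io.EOF {
			break // no more data
		}
		if err == io.ErrUnexpectedEOF {
			// Last, partial chunk: fill the remainder with random padding.
			if _, rerr := rand.Read(buf[n:]); rerr != nil {
				return nil, rerr
			}
			chunks = append(chunks, buf)
			break
		}
		if err != nil {
			return nil, err
		}
		chunks = append(chunks, buf)
	}
	return chunks, nil
}

// uploadShuffled pushes chunks in a randomized order with a small pool of
// parallel workers; upload is a placeholder for the real encrypt-and-PUT step.
func uploadShuffled(chunks [][]byte, upload func(index int, data []byte) error) error {
	order := mrand.Perm(len(chunks)) // randomized upload order
	sem := make(chan struct{}, 4)    // at most 4 concurrent uploads
	var wg sync.WaitGroup
	errs := make(chan error, len(chunks))

	for _, i := range order {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := upload(i, chunks[i]); err != nil {
				errs <- err
			}
		}(i)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil if no error was recorded
}

func main() {
	chunks, err := splitWithPadding("example.bin") // illustrative file name
	if err != nil {
		panic(err)
	}
	err = uploadShuffled(chunks, func(i int, data []byte) error {
		fmt.Printf("uploading chunk %d (%d bytes)\n", i, len(data))
		return nil // real code would encrypt the chunk and PUT it to object storage
	})
	if err != nil {
		panic(err)
	}
}
```

A real implementation would presumably also need to record the chunk-to-file mapping in the index so files can be reassembled, which is exactly where the index design concerns discussed above come in.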
As explained by @ItalyPaleAle in this reply, this could be considered an acceptable compromise given the complexities involved in addressing it.