Split files into chunks before encrypting to mitigate potential attacks on file size #4
Thanks for bringing this issue up. After your comment, I have updated the Encryption document to explain how this is indeed a potential threat, but how at this moment it's accepted by design and its risk is considered manageable.

The solution you're proposing would indeed work. It is something I've considered already, although I do see two issues with it.

First, as you stated yourself, this would require wasting more storage space because of padding.

The second issue is more complex to solve, and it's about the way the index works. At the moment, we maintain an index file which contains the list of all files in the storage (for each file, in version 0.4 we store: the decrypted filename, the encrypted filename, the file type, the creation date). This is currently a single file, and I am aware that as the repo grows bigger it becomes larger and larger, which could lead to a variety of issues (not just about performance): to try to reduce the problem, in version 0.4 I migrated the index from JSON to a file encoded with protocol buffers. If we start adding chunks, we'll almost certainly need to rework the way the index is designed, as it could grow too large to be manageable. Of course, creating an index for each file would defeat the purpose, so that won't happen.

Because the index file needs to be able to be stored in object storage services, whatever format we choose needs to satisfy four requirements:
I have not found an effective solution to the problem above. I'm fairly confident that whatever the solution will be, it will need to leverage multiple files (for point 1), so things like a SQLite database won't work. I am open to suggestions however :)

PS: one small correction here:
In addition to that, because we use the DARE format there's a small overhead (they claim ~0.05%), as each 64KB chunk has a header. This can be calculated deterministically, however.
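For illustration, here is a minimal sketch of that deterministic calculation, assuming the DARE layout of up to 64 KiB of plaintext per package plus a 16-byte header and a 16-byte authentication tag per package (which works out to roughly the ~0.05% figure above); the names and constants below are illustrative, not taken from the project's code:

```go
package main

import "fmt"

// Assumed DARE package layout: up to 64 KiB of plaintext per package,
// with a 16-byte header and a 16-byte authentication tag added to each.
const (
	packagePayload  = 64 * 1024 // plaintext bytes per package
	packageOverhead = 16 + 16   // header + auth tag per package
)

// encryptedSize returns the expected ciphertext size for a plaintext of
// the given size, computed deterministically from the package layout.
func encryptedSize(plaintext int64) int64 {
	packages := (plaintext + packagePayload - 1) / packagePayload // ceil division
	return plaintext + packages*packageOverhead
}

func main() {
	for _, size := range []int64{1 << 20, 1 << 30} { // 1 MiB, 1 GiB
		fmt.Printf("plaintext %d B -> ciphertext %d B\n", size, encryptedSize(size))
	}
}
```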
Found another disadvantage of splitting files into chunks after exploring AWS S3 pricing: each PUT object operation costs $0.005, so uploading a 1GB file split into 1MB chunks would cost $5
Actually that’s the price for 1,000 requests, so uploading that 1GB file would only be $0.005 ;)
Oh, didn't notice the per 1,000 requests part 🤦
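(For reference, assuming the $0.005 per 1,000 PUT requests figure quoted above: a 1 GB file split into 1 MB chunks is roughly 1,024 PUT requests, i.e. about 1.024 × $0.005 ≈ $0.005, in line with the correction.)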
In any case, if you (or anyone else) have suggestions on how to best implement the index, please feel free to speak up! (And PRs are welcome too)
As indicated in this comment, the fact that each file is encrypted as a whole leaves the following details exposed after encryption:
The suggested solution is to split each file into chunks of a fixed size, making sure to pad the last chunk with random data (just as the last block is padded in block cipher implementations).
Additionally, it would be a good idea to upload the chunks to the cloud storage in parallel and in a somewhat randomized order, to hinder attacks that sort all chunks by creation date.
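As a rough sketch of that approach, assuming 1 MiB chunks held in memory and a hypothetical `upload` callback standing in for the actual encrypt-and-store (PUT) step; none of these names come from the project's code:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"io"
	mrand "math/rand"
	"os"
	"sync"
)

const chunkSize = 1 << 20 // 1 MiB per chunk (illustrative value)

// splitWithPadding reads the file into fixed-size chunks, padding the
// last, partial chunk with random bytes so every chunk has the same length.
func splitWithPadding(path string) ([][]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var chunks [][]byte
	for {
		buf := make([]byte, chunkSize)
		n, err := io.ReadFull(f, buf)
		if err == io.EOF {
			break // no more data
		}
		if err == io.ErrUnexpectedEOF {
			// Last, partial chunk: fill the remainder with random padding.
			if _, rerr := rand.Read(buf[n:]); rerr != nil {
				return nil, rerr
			}
			chunks = append(chunks, buf)
			break
		}
		if err != nil {
			return nil, err
		}
		chunks = append(chunks, buf)
	}
	return chunks, nil
}

// uploadShuffled pushes chunks in a randomized order with a small pool of
// parallel workers; upload is a placeholder for the real encrypt-and-PUT step.
func uploadShuffled(chunks [][]byte, upload func(index int, data []byte) error) error {
	order := mrand.Perm(len(chunks)) // randomized upload order
	sem := make(chan struct{}, 4)    // at most 4 concurrent uploads
	var wg sync.WaitGroup
	errs := make(chan error, len(chunks))

	for _, i := range order {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := upload(i, chunks[i]); err != nil {
				errs <- err
			}
		}(i)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil if no error was recorded
}

func main() {
	chunks, err := splitWithPadding("example.bin") // illustrative file name
	if err != nil {
		panic(err)
	}
	err = uploadShuffled(chunks, func(i int, data []byte) error {
		fmt.Printf("uploading chunk %d (%d bytes)\n", i, len(data))
		return nil // real code would encrypt the chunk and PUT it to object storage
	})
	if err != nil {
		panic(err)
	}
}
```

A real implementation would presumably also need to record the chunk-to-file mapping in the index so files can be reassembled, which is exactly where the index design concerns discussed above come in.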
As explained by @ItalyPaleAle in this reply, this could be considered an acceptable compromise given the complexities involved in addressing it.