Replies: 4 comments
- Interesting rabbit hole: same here, when I discovered attic (borg's predecessor) back then. :-) But it works a bit differently than what you described above - maybe some docs could be improved if you really got that from the borg docs:
- About your use case 1:
- About use case 2:
- About borg2:
Let me start off by saying that the search for a better backup solution has led me down some of the most interesting rabbit holes in a long time, especially Borgbackup. I spent the last 5 days reading about its internal workings and followed several discussions of hash collisions, crypto optimizations and all kinds of great things. Even though all of this info feels very chaotic and spread out, it's all there if one is willing to look for it. Unlike some other backup software, cough Duplicacy cough...
As far as I understand it, Borgbackup works as follows (leaving out compression and encryption for simplicity's sake):
The source folders get scanned, and all files that need to be backed up are lined up and more or less treated as one giant block of data (the docs use the analogy of a tarball). This giant mother block is then chopped into chunks, either at a fixed size or by using a buzhash that tries to find "smart" places to cut by searching for blocks of zeros within a given window of permitted chunk sizes.
This means that one chunk can actually contain parts of more than one file (not entirely sure about that, though).
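To make sure I am describing the right idea, here is a tiny Python sketch of what I mean by content-defined chunking. This is purely my own illustration with made-up parameters, not borg's actual buzhash code or its default chunker settings:

```python
# Toy content-defined chunker: cut wherever a cheap hash of the bytes seen
# since the chunk start hits a boundary pattern, within min/max size limits.
# Purely illustrative -- borg's real chunker uses a windowed rolling hash
# (buzhash) and different parameters.

MASK = (1 << 12) - 1      # on average one boundary every ~4 KiB
MIN_SIZE = 1 << 10        # never cut before 1 KiB
MAX_SIZE = 1 << 16        # always cut at 64 KiB at the latest

def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h * 31) + byte) & 0xFFFFFFFF   # cheap stand-in hash
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            yield start, i + 1
            start, h = i + 1, 0
    if start < len(data):
        yield start, len(data)
```

The point, as I understand it, is that the cut positions depend on the data itself rather than on fixed offsets, which is what makes de-duplication across shifted or partially changed files possible.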
After being chunked, those chunks get SHA-256 hashed. This hash is used to identify the chunk. If the hash already exists inside the manifest (I think that's the term used in the docs) AND the chunk size is identical, the chunk is assumed to be a duplicate and thus NOT put into the repository (the "destination").
A list is created that tells Borg which chunks need to be put back together to recreate the original file. This allows a single chunk to be used in several files, which is how de-duplication is achieved. To keep things a bit more manageable, Borg puts several chunks into one file (think Haystack or a zip archive). So what you see on your repository file system is a bunch of 500 MB (default) files with strange names.
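Conceptually (and only conceptually -- this is a toy sketch of my mental model, not borg's actual index, item metadata or segment format), I picture it roughly like this:

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size split just to keep this short;
                   # imagine the content-defined chunker from above here

chunk_store = {}   # chunk id -> chunk bytes, standing in for the repository

def store_file(data: bytes) -> list[bytes]:
    """Chunk a file, store only unseen chunks, return its chunk id list."""
    ids = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        cid = hashlib.sha256(chunk).digest()
        if cid not in chunk_store:      # dedup: identical chunks stored once
            chunk_store[cid] = chunk
        ids.append(cid)
    return ids

def restore_file(ids: list[bytes]) -> bytes:
    """Put the chunks back together to recreate the original file."""
    return b"".join(chunk_store[cid] for cid in ids)
```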
To my understanding, no checksum of the original file is stored. Duplicacy, for example, does that, which lets you double-check on restore that your files are actually OK. However, it too does not do much in terms of hash collision mitigation.
Adding checksums yourself is easy to do, depending on the nature of your data. In my case I just keep a separate DB with those checksums as well as last-modified dates and file sizes, just to be sure :)
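For what it's worth, my side bookkeeping is nothing fancier than something like this (my own script, completely independent of borg):

```python
import hashlib, os, sqlite3

# Tiny side database of file checksums, sizes and mtimes, so restores can
# be verified independently of the backup tool.

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def record_tree(root: str, db_path: str = "checksums.db") -> None:
    db = sqlite3.connect(db_path)
    db.execute("""CREATE TABLE IF NOT EXISTS files
                  (path TEXT PRIMARY KEY, sha256 TEXT, size INTEGER, mtime REAL)""")
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                       (path, sha256_of(path), st.st_size, st.st_mtime))
    db.commit()
    db.close()
```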
Assuming my understanding of Borg's inner workings is correct, I was hoping someone could help me optimize things a little for my use cases.
Use Case 1:
Large media files that will not see much change, if any. I know a lot of people will scream rclone now, but running several backup schemes in parallel is a pain in my experience. Also, Borg has some advantages, mostly that accidental deletions don't mess up your backup even if you don't catch them in time.
Given that media files usually are already compressed, is there anything to gain by disabling compression in borg? Does variable chunking make any sense if 99% of the files are 1 GB+ in size? In this use case only a local copy to an external hard drive will be made.
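To put a rough number on the compression question, I did a quick test with incompressible data as a stand-in for already-compressed media. Note that zlib is only an example codec here, not necessarily what borg would be configured to use:

```python
import os, time, zlib

# Compressing data that is already compressed (random bytes standing in
# for media files) gains essentially nothing; the remaining question is
# only the CPU time spent trying.

data = os.urandom(32 * 1024 * 1024)          # 32 MiB of incompressible data

start = time.perf_counter()
compressed = zlib.compress(data, level=6)
elapsed = time.perf_counter() - start

print(f"ratio: {len(compressed) / len(data):.3f}  time: {elapsed:.2f}s")
```

So my guess is that disabling compression is about saving CPU time rather than space, but I would like to hear from people who actually measured it with borg.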
Use Case 2:
Here borg makes more sense: all types of files, from just a couple of bytes to several GB, both compressed media and databases, logs, pretty much everything. Changes are also more likely; a good third of those files are expected to change between pretty much every backup. For that I feel like borg's defaults are probably the best choice.
Another option would be to separate the media files out into their own backup job and again disable compression. I don't think a larger chunk size would make sense here, since most of it is JPG. Unfortunately, the people who come up with metadata standards in the photo world seem to live in a very strange parallel universe and thought it was a good idea to embed them into the files... This means that changing a single tag results in a new file getting written, with a new checksum and modified date. I was hoping that borg could catch at least most of that, especially for larger .mp4 files.
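To see whether that hope is even plausible, I played with a small simulation: rewrite a few KiB near the start of a "file" (the way a tag edit might) and check how many content-defined chunks survive unchanged. Again, this uses my own toy rolling-hash chunker with made-up parameters, not borg's buzhash, so take it only as an illustration of the principle:

```python
import hashlib, os

# Does content-defined chunking re-use most chunks when only a small block
# near the start of a file changes? Toy rolling-hash chunker, made-up
# parameters, pure Python (hence the modest 8 MiB test size).

WINDOW = 64                         # rolling-hash window in bytes
MASK = (1 << 16) - 1                # cut where the low 16 bits are zero (~64 KiB avg)
MIN_SIZE, MAX_SIZE = 2048, 1 << 20
B = 257                             # polynomial base of the rolling hash
B_W = pow(B, WINDOW, 1 << 32)       # factor for dropping the byte leaving the window

def chunk_ids(data: bytes) -> list[bytes]:
    """Return the SHA-256 ids of the content-defined chunks of data."""
    ids, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * B + byte) & 0xFFFFFFFF
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * B_W) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            ids.append(hashlib.sha256(data[start:i + 1]).digest())
            start = i + 1
    if start < len(data):
        ids.append(hashlib.sha256(data[start:]).digest())
    return ids

original = os.urandom(8 * 1024 * 1024)       # stand-in for a media file
edited = os.urandom(4096) + original[4096:]  # "tag edit": first 4 KiB rewritten

a, b = set(chunk_ids(original)), set(chunk_ids(edited))
print(f"chunks unchanged: {len(a & b)} of {len(a)}")
```

If borg's chunker behaves anything like this, most of a rewritten JPG or MP4 should still de-duplicate against the previous backup, with only the chunks around the changed metadata being stored again -- but I would love confirmation from someone who knows the real implementation.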
And one last thing (feel free to just ignore this, I know how annoying users can be):
Given that I am about to go all in on borg, would it be worth waiting for 2.0? Will backups created with the current version of borg be compatible with 2.0?
Big thank you to everyone who made it this far!