borg check errors after upgrade to 1.2.0 #6687


Closed
jdchristensen opened this issue May 9, 2022 · 156 comments

@jdchristensen
Contributor

jdchristensen commented May 9, 2022

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes.

Is this a BUG / ISSUE report or a QUESTION?

Possible bug.

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

1.2.0 on all clients and servers. Previous version was 1.1.17 (not 1.1.7, as I wrote on the mailing list).

Operating system (distribution) and version.

Debian buster and Ubuntu 20.04 on servers. Debian buster, Ubuntu 20.04 and Ubuntu 21.10 on clients.

Hardware / network configuration, and filesystems used.

Multiple local and remote clients accessing each repository. Repositories are on ext4, on RAID 1 mdadm devices, with spinning disks underlying them. The Debian server also uses lvm.

How much data is handled by borg?

The repos are all around 100GB in size, with up to 400 archives each. The repositories have been in use for many years.

Full borg commandline that led to the problem (leave away excludes and passwords)

borg check /path/to/repo [more details below]

Describe the problem you're observing.

borg check shows errors on three different repositories on two different machines. See below for details.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yes, borg check shows the same errors when run again.

Include any warning/errors/backtraces from the system logs

I upgraded from borg 1.1.17 to 1.2.0 on several different systems on about April 9. On May 9, my monthly "borg check" runs gave errors on three repositories on two systems. Note that I use the setup where several clients do their backups into the same repositories. I don't have any non-shared repositories for comparison.

At the time of the upgrade from 1.1.17 to 1.2.0, I ran borg compact --cleanup-commits ... followed by borg check ... on all repos. There were no errors then. After that, I run borg compact without --cleanup-commits followed by borg check once per month. The errors occurred at the one month mark.

System 1 runs Ubuntu 20.04. Two of the three repos on this machine now have errors:

# borg check /Backups/borg/home.borg
Index object count mismatch.
committed index: 1166413 objects
rebuilt index:   1166414 objects
ID: 8a158ba7fdfae9b1373063a5bb5ea8ea6698c93ed7feff89ca6ff0a3c8842ebd
rebuilt index: (18596, 199132336) committed index: <not found>
Finished full repository check, errors found.

# ls -l /Backups/borg/home.borg/data/37
total 1453100
-rw------- 2 bu bu 201259844 Dec  8  2020 18596
-rw------- 2 bu bu 185611530 Dec 12  2020 18651
-rw------- 2 bu bu 125106377 Dec 25  2020 18858
-rw------- 2 bu bu 524318301 Dec 26  2020 18874
-rw------- 2 bu bu 193813842 Dec 30  2020 18940
-rw------- 2 bu bu 116657254 Dec 30  2020 18945
-rw------- 2 bu bu 141181725 Dec 31  2020 18953

# borg check /Backups/borg/system.borg
Index object count mismatch.
committed index: 2324200 objects
rebuilt index:   2324202 objects
ID: 1e20354918f4fdeb9cc0d677c28dffe1a383dd1b0db11ebcbc5ffb809d3c2b8a
rebuilt index: (24666, 60168) committed index: <not found>
ID: d9c516b5bf53f661a1a9d2ada08c8db7c33a331713f23e058cd6969982728157
rebuilt index: (3516, 138963001) committed index: <not found>
Finished full repository check, errors found.

# ls -l /Backups/borg/system.borg/data/49
total 3316136
-rw------- 2 bu bu 500587725 Oct  5  2021 24555
-rw------- 2 bu bu 168824081 Oct  8  2021 24603
-rw------- 2 bu bu 116475028 Oct  9  2021 24619
-rw------- 2 bu bu 107446533 Oct 11  2021 24634
-rw------- 2 bu bu 252958665 Oct 12  2021 24666
-rw------- 2 bu bu 124871243 Oct 19  2021 24777
-rw------- 2 bu bu 277627834 Oct 19  2021 24793
-rw------- 2 bu bu 231932763 Oct 21  2021 24835
-rw------- 2 bu bu 114031902 Oct 22  2021 24847
-rw------- 2 bu bu 127020577 Oct 26  2021 24899
-rw------- 2 bu bu 220293895 Oct 26  2021 24907
-rw------- 2 bu bu 113238393 Oct 27  2021 24933
-rw------- 2 bu bu 525154704 Oct 27  2021 24941
-rw------- 2 bu bu 291472023 Oct 27  2021 24943
-rw------- 2 bu bu 223721033 Oct 30  2021 24987

# ls -l /Backups/borg/system.borg/data/7
total 1200244
-rw------- 2 bu bu 524615544 Feb  4  2018 3516
-rw------- 2 bu bu 145502511 Feb  5  2018 3529
-rw------- 2 bu bu 266037549 Feb 21  2018 3740
-rw------- 2 bu bu 292869056 Mar 14  2018 3951

System 2 runs Debian buster. One of the three repos on this machine now has errors:

# borg check /Backups/borg/system.borg
Index object count mismatch.
committed index: 2052187 objects
rebuilt index:   2052188 objects
ID: 6b734ed388e7e086af7107847c6b6d3d34a29c20e7e539ded71b32606cb857bd
rebuilt index: (946, 15871355) committed index: <not found>
Finished full repository check, errors found.

# ls -l /Backups/borg/system.borg/data/1
total 205308
-rw------- 1 bu bu 210234581 Jun 20  2017 946

I have used borg on these systems for years, and no hardware has changed recently. System 1 has the repos on a RAID 1 mdadm device with two SATA spinning disks. System 2 also has the repos on RAID 1 mdadm devices with two SATA disks, with lvm as a middle layer. In both cases, smartctl shows no issues for any of the drives, and memtester also shows no errors.

Since the errors have happened on different machines within a month of upgrading to 1.2.0, I am concerned that this is a borg issue rather than a hardware issue. It is also suspicious to me that the error is the same in all cases, with a committed index not found. Hardware errors tend to produce garbage.

I have not run repair yet. Is there anything I should do before running repair to try to figure out the issue?

Update: there is a bounty for finding/fixing this bug: https://app.bountysource.com/issues/108445140-borg-check-errors-after-upgrade-to-1-2-0

@jdchristensen
Contributor Author

I should also mention that I don't use encryption with any of these repos.

@ThomasWaldmann
Member

ThomasWaldmann commented May 9, 2022

Thanks for the detailed bug report.

@ThomasWaldmann
Member

ThomasWaldmann commented May 9, 2022

BTW, do you use borg break-lock or --bypass-lock option in your automated scripts?

@jdchristensen
Contributor Author

jdchristensen commented May 9, 2022

I have never used break-lock or bypass-lock. I updated the top post to show the correct directories.

@jdchristensen
Contributor Author

I have made a copy of one of the repos. Can you tell me exactly how to run the borg debug get-obj command? Where will it put the output?

@ThomasWaldmann ThomasWaldmann added this to the 1.2.1 milestone May 9, 2022
@ThomasWaldmann
Member

ThomasWaldmann commented May 9, 2022

% borg debug get-obj --help
usage: borg debug get-obj REPOSITORY ID PATH

thus, for the first repo:

borg debug get-obj /Backups/borg/home.borg \
    8a158ba7fdfae9b1373063a5bb5ea8ea6698c93ed7feff89ca6ff0a3c8842ebd \
    8a158ba7fdfae9b1373063a5bb5ea8ea6698c93ed7feff89ca6ff0a3c8842ebd.chunk

But likely it will only work after you have repaired the repo index.

@ThomasWaldmann
Member

The directory listings did not reveal anything special. But considering the described circumstances this looks like a bug.

@ThomasWaldmann
Member

In case we do not find the issue otherwise, maybe keep an unmodified copy of at least one of the repos until this is resolved.

@jdchristensen
Contributor Author

I made a full copy of the third repository above (on the Debian system) and made hard link copies of the first two repositories. I'm now doing repairs on two of the repositories. When they complete, I will include the output, and do the get-obj check.

@jdchristensen
Contributor Author

On the Debian system:

# borg check -v --repair /Backups/borg/system.borg 
This is a potentially dangerous function.
check --repair might lead to data loss (for kinds of corruption it is not
capable of dealing with). BE VERY CAREFUL!

Type 'YES' if you understand this and want to continue: YES
Starting repository check
finished segment check at segment 38108
Starting repository index check
Finished full repository check, no problems found.
Starting archive consistency check...
Analyzing archive estrela-system-20170729-04:32:29 (1/236)
...
Analyzing archive boots-system-20220509-06:32:25 (236/236)
1 orphaned objects found!
Deleting 1 orphaned and 236 superseded objects...
Finished deleting orphaned/superseded objects.
Writing Manifest.
Committing repo.
Archive consistency check complete, problems found.

On the Ubuntu system:

# borg check -v --repair /Backups/borg/home.borg
This is a potentially dangerous function.
check --repair might lead to data loss (for kinds of corruption it is not
capable of dealing with). BE VERY CAREFUL!

Type 'YES' if you understand this and want to continue: YES
Starting repository check
finished segment check at segment 32373
Starting repository index check
Finished full repository check, no problems found.
Starting archive consistency check...
Analyzing archive estrela-home-20170329-10:51:14 (1/405)
...
Analyzing archive yogi-home-20220509-02:01:01 (405/405)
1 orphaned objects found!
Deleting 1 orphaned and 420 superseded objects...
Finished deleting orphaned/superseded objects.
Writing Manifest.
Committing repo.
Archive consistency check complete, problems found.

@jdchristensen
Contributor Author

On the Debian system:

# borg debug get-obj /Backups/borg/system.borg 6b734ed388e7e086af7107847c6b6d3d34a29c20e7e539ded71b32606cb857bd 6b734ed388e7e086af7107847c6b6d3d34a29c20e7e539ded71b32606cb857bd.chunk
object 6b734ed388e7e086af7107847c6b6d3d34a29c20e7e539ded71b32606cb857bd not found.

# ls -l /Backups/borg/system.borg/data/1
total 205308
-rw------- 1 bu bu 210234581 Jun 20  2017 946

On the Ubuntu system:

# borg debug get-obj /Backups/borg/home.borg 8a158ba7fdfae9b1373063a5bb5ea8ea6698c93ed7feff89ca6ff0a3c8842ebd 8a158ba7fdfae9b1373063a5bb5ea8ea6698c93ed7feff89ca6ff0a3c8842ebd.chunk
object 8a158ba7fdfae9b1373063a5bb5ea8ea6698c93ed7feff89ca6ff0a3c8842ebd not found.

# ls -l /Backups/borg/home.borg/data/37
total 1453100
-rw------- 2 bu bu 201259844 Dec  8  2020 18596
-rw------- 2 bu bu 185611530 Dec 12  2020 18651
-rw------- 2 bu bu 125106377 Dec 25  2020 18858
-rw------- 2 bu bu 524318301 Dec 26  2020 18874
-rw------- 2 bu bu 193813842 Dec 30  2020 18940
-rw------- 2 bu bu 116657254 Dec 30  2020 18945
-rw------- 2 bu bu 141181725 Dec 31  2020 18953

The repair said that orphans were being deleted, so maybe that is why they don't show up? And maybe the segment files are still there because I haven't run compact?

@jdchristensen
Contributor Author

I get the same "object ... not found" message if I try get-obj on the (unrepaired) copy of the Debian repo.

@ThomasWaldmann
Member

ThomasWaldmann commented May 9, 2022

OK, so the 1 missing index entry was an orphan chunk. That takes quite a bit of severity off this ticket. :-)

It can't be found in the index before repair, because it is not in the index (maybe it never was added to the index).

It can't be found in the index after repair, because it was determined orphaned (no archive uses this object) and thus removed.

So what remains to be determined is what borg operation creates this orphan chunk.

@ThomasWaldmann
Member

ThomasWaldmann commented May 9, 2022

What could be done to determine the contents of the orphan chunk is to seek to the given offset into the given segment file and look what's there.
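If anyone wants to do that by hand, here is a minimal Python sketch (not borg's own code): it seeks to the given offset in a segment file and decodes the entry header, using the little-endian crc32 / size / tag / id layout worked out further down in this thread. The helper name and the payload handling are assumptions.

import struct

def peek_segment_entry(segment_path, offset):
    # Sketch only: decode one segment entry header at a known offset.
    # Layout assumed from this thread: crc32 (4), size (4), tag (1), id (32), data.
    with open(segment_path, "rb") as f:
        f.seek(offset)
        header = f.read(41)
        crc32, size, tag = struct.unpack("<IIB", header[:9])   # little-endian
        chunk_id = header[9:41].hex()
        payload = f.read(size - 41)   # for a PUT: compression marker + data (assumed)
        return {"crc32": hex(crc32), "size": size, "tag": tag,
                "id": chunk_id, "payload": payload}

# e.g. peek_segment_entry("/Backups/borg/system.borg/data/1/946", 15871355)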

@jdchristensen
Contributor Author

I tried looking at that offset in the file, but the chunks are compressed, so it wasn't obvious. If there's some way to extract the chunk and decompress it, let me know.

About the cause: would network errors cause orphaned chunks? I do occasionally see backups that don't successfully complete due to a network error. But I thought that further repo operations and/or borg compact would clean up such things.

What is particularly odd is that none of those segment files has a recent date. One of them is from 2017! Maybe a chunk in there was somehow left after a prune operation? Maybe a prune operation had a network issue?

@jdchristensen
Contributor Author

I'm also puzzled that the segment file is still there after the borg repair. Maybe the chunk will get removed after a borg compact? Or maybe these chunks are small enough that borg decides to leave them in place?

@ThomasWaldmann
Member

If a small not-used-anymore (== "deleted") chunk sits in a segment file with other still-used chunks, it might well stay there. borg computes a ratio deleted/total size and if the ratio is below the threshold, borg compact won't compact the segment.

In recent borg, you can also give borg compact a threshold of 0, but that might move a lot of data around for not much space saving.
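As a rough sketch of that decision (illustrative code, not borg's; the 10% default for borg compact --threshold is an assumption here):

def segment_gets_compacted(deleted_bytes, total_bytes, threshold_percent=10):
    # Only rewrite a segment if the share of logically deleted bytes
    # reaches the threshold (corresponds to `borg compact --threshold`).
    if total_bytes == 0:
        return False
    return 100.0 * deleted_bytes / total_bytes >= threshold_percent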

@ThomasWaldmann
Member

BTW, as long as the chunk is logically deleted, it won't show as orphan.

  • PUT x, DEL x and nothing uses x == logically deleted, not an orphan
  • PUT x and nothing uses x == orphan
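A small illustrative sketch of that distinction, replaying a simplified PUT/DEL log (names and data structures are made up for this example):

def classify_chunks(log, referenced_ids):
    # log: sequence of ("PUT" | "DEL", chunk_id) in repo order;
    # referenced_ids: chunk ids still used by some archive.
    last_op = {}
    for op, chunk_id in log:
        last_op[chunk_id] = op                      # the last entry for an id wins
    states = {}
    for chunk_id, op in last_op.items():
        if chunk_id in referenced_ids:
            states[chunk_id] = "in use"
        elif op == "DEL":
            states[chunk_id] = "logically deleted"  # PUT x, DEL x, nothing uses x
        else:
            states[chunk_id] = "orphan"             # PUT x, nothing uses x
    return states

# classify_chunks([("PUT", "x"), ("DEL", "x"), ("PUT", "y")], referenced_ids=set())
# -> {"x": "logically deleted", "y": "orphan"}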

@ThomasWaldmann
Member

You could try this (it should decrypt and decompress as needed). It dumps ALL objects, so needs quite some space / time.

mkdir myrepo-dump
cd myrepo-dump
borg debug dump-repo-objs --ghost ../myrepo

@ThomasWaldmann
Member

ThomasWaldmann commented May 9, 2022

From the borg check docs:

In repair mode, when all the archives were checked, orphaned chunks are deleted
from the repo. One cause of orphaned chunks are input file related errors (like
read errors) in the archive creation process.

E.g. if it starts backing up file_with_ioerror_in_the_middle:

  • it will start reading and chunking the file, writing some chunks to the repo
  • then it runs into an I/O error, because one block of the source file is not readable
  • this aborts backup of that one file due to the exception, but it does not abort the complete backup
  • some chunks are already written to the repo, but no borg item is written to the metadata stream (borg would do that after all content data from this file is processed)
  • so these chunks might be:
    • orphan chunks if not also used by some other archive (new data, not deduplicated)
    • chunks with currently too high refcount (deduplicated data)

Maybe borg could do some cleanup in an exception handler, but IIRC it does not do that yet.
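A very simplified sketch of that sequence (the helper callables are illustrative, not borg's actual API):

def backup_one_file(path, chunk_reader, repo_put, add_item):
    # Sketch only: content chunks are stored first; the item (file metadata with
    # its chunk id list) is added to the metadata stream only afterwards.
    chunk_ids = []
    for chunk in chunk_reader(path):       # may raise an I/O error mid-file
        chunk_ids.append(repo_put(chunk))  # these PUTs already landed in the repo
    add_item(path, chunk_ids)              # only reached if the whole file was readable

# If chunk_reader() raises, the chunks stored so far are referenced by no item:
# orphans (if the data was new) or chunks with a too-high refcount (if deduplicated).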

@jdchristensen
Contributor Author

None of this explains why there would be orphan chunks from years ago. I run borg check every month, and it never reported any errors until yesterday, so I don't think these orphan chunks could have arisen from I/O errors at the time of backup. Somehow they recently became orphans. Could there be a bug in prune? Or maybe several prune operations got interrupted, all in the same month, even though that never caused an error in the past five years?

@ThomasWaldmann
Member

Yeah, guess that's right. Would be interesting to look at the plaintext of that chunk.

@jdchristensen
Contributor Author

I don't think it's worth the time/space to extract all chunks. I tried using

dd skip=15871355 count=1000000 if=system-bad.borg/data/1/946 of=chunk bs=1

to extract 1MB starting where the chunk starts (this is on the Debian system), and then using od -x chunk to view the bytes, but I don't see any zlib magic bytes there. A debug option to extract a chunk based on segment and offset would be handy.

Here is the start of the od -x chunk output:

0000000 3e63 51fa 0033 0000 6b00 4e73 88d3 e0e7
0000020 af86 0771 7c84 6d6b 343d 9ca2 e720 39e5
0000040 d7de 321b 6c60 57b8 02bd 0000 302d 332e
0000060 3832 100a 517c ceea 0000 0000 c4d8 9186
0000100 a311 4cb9 0ebb d948 783d 1c02 1a37 fc6f
0000120 040e 54d8 7699 1d58 3cf3 758c 7802 559c
0000140 4b8e 830e 0c30 ef44 35e2 ce41 721f 2e0e
0000160 8110 a8f9 5688 da49 e20d 0dee 5510 caea
0000200 f1e3 db1b 0c3b 793e 3b70 cff4 5c6d 5ba6
0000220 3a05 a06b 5031 4f9c fab9 c098 4081 196e
0000240 c66a 876d 29ca 44e9 8b53 2b56 0da9 2c97
0000260 885b 1c70 7547 ecf1 2f97 c339 7604 4e88
0000300 c769 8d4d d68d 95b2 96d6 055c 9f2f 99fa
0000320 f4ad 15a1 a99c 1220 340d 4b80 4de1 6779
0000340 fb3f a26d df98 4dc9 f1f2 e451 eb75 b21e
0000360 e325 58bc f227 ac67 bb7e 3c09 78be 49cd

@ThomasWaldmann
Member

ThomasWaldmann commented May 9, 2022

It's a bit hard to decipher due to the 16-bit formatting with bytes swapped, but:

crc32 = 51fa3e63 
len32 = 00000033
tag8 = 00 (PUT)
id256 = 6b734ed388e7e086af7107847c6b6d3d34a29c20e7e539ded71b32606cb857bd
compression = 0200 (lzma or rather "xz")
data = ... (only 8 bytes)

Hmm, does not decompress, strange.
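For reference, a quick Python check of that decoding, undoing od -x's 16-bit word byte swapping (a sketch, using the first few words of the dump above):

words = "3e63 51fa 0033 0000 6b00 4e73 88d3 e0e7".split()
raw = b"".join(int(w, 16).to_bytes(2, "little") for w in words)
crc32 = int.from_bytes(raw[0:4], "little")   # 0x51fa3e63
size = int.from_bytes(raw[4:8], "little")    # 0x33 == 51 bytes, header included
tag = raw[8]                                 # 0x00 == PUT
id_prefix = raw[9:].hex()                    # "6b734ed388e7e0" matches the missing chunk id
print(hex(crc32), size, tag, id_prefix)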

@ThomasWaldmann
Member

It seems it is uncompressed data:

bytes.fromhex(d).decode()
'\x00-0.328\n'

@jdchristensen
Contributor Author

jdchristensen commented May 9, 2022

So a null byte, followed by the ASCII string "-0.328\n". Strange. I confirmed this by adding 43 to the offset and extracting 8 bytes from the segment file. I don't know what that file could have been.

I should have mentioned that I use --compression auto,zlib, so some data won't be compressed, which happened here since such short data doesn't compress well. So I'm surprised there are bytes indicating compression.

Edit: for the record, the length above, 0x33, is 51 in decimal, and includes the 43-byte header, which is why the data portion is 8 bytes long.

@jdchristensen
Contributor Author

I repaired the third repository, the other one on the Ubuntu system. Similar results.

# borg check -v --repair /Backups/borg/system.borg
This is a potentially dangerous function.
check --repair might lead to data loss (for kinds of corruption it is not
capable of dealing with). BE VERY CAREFUL!

Type 'YES' if you understand this and want to continue: YES
Starting repository check
finished segment check at segment 28246
Starting repository index check
Finished full repository check, no problems found.
Starting archive consistency check...
Analyzing archive estrela-system-20170426-08:55:29 (1/289)
...
Analyzing archive yogi-system-20220509-02:04:03 (289/289)
2 orphaned objects found!
Deleting 2 orphaned and 294 superseded objects...
Finished deleting orphaned/superseded objects.
Writing Manifest.
Committing repo.
Archive consistency check complete, problems found.

@ThomasWaldmann
Member

ThomasWaldmann commented May 15, 2022

A theory how this could happen (see #6687 (comment)):

  • borg check is ok
  • the repo loses the DEL somehow
  • the previously logically deleted chunk (which is not required because no archive references it) gets undeleted due to that
  • borg check --repair finds the orphan chunk and deletes it again

Did some reviews, but did not find anything suspicious:

git diff -r 1.1.16 -r 1.2.0 src/borg/repository.py
git diff -r 1.1.16 -r 1.2.0 src/borg/_hashindex.c
git diff -r 1.1.16 -r 1.2.0 src/borg/hashindex.pyx

But one thing came to my mind:

  • borg 1.1.x did not persist shadow_index into the hints file (as there was no need to do that, it always used that information immediately within compact_segments in the same borg invocation)
  • borg 1.2 persists/loads shadow_index to/from hints file.
  • thus, when switching from 1.1 to 1.2, there is no shadow_index to load from hints and thus it will default to empty initially.

Have to investigate more whether that could be an issue (guess that shouldn't be an issue, the hints file could also just get lost at some time).

@horazont
Contributor

get-obj fails to retrieve it and I don't have enough scratch space for dump-repo-objs --ghost. I did a manual search for the object in the segments and it looks like compressed data. The decompressed data looks like medium-entropy data; I cannot ascertain what it belongs to. It does not look like archive stream data or similar (in contrast to #6687 (comment) ).

@horazont
Contributor

Deleting the hints file before running compact seems to avoid the issue.

That indicates to me that the shadow_index is wrongly generated by the deletes/prunes and the compact logic itself is sound.

@ThomasWaldmann ThomasWaldmann modified the milestones: 1.2.8, 1.2.7 Oct 14, 2023
@ThomasWaldmann
Member

ThomasWaldmann commented Oct 26, 2023

@horazont if there is no entry for some chunkid in the shadow index, that triggers a special case in the compact code (like "we do not know about shadowed PUTs, thus we better keep the DEL"), so this is what you achieve by deleting the hints file.

The problem in your hints file seems to be that the list of segment numbers with shadowed PUTs is empty for that chunkid, so the compact code decides to drop the seemingly not needed DEL. But the list should not be empty as there actually still is a PUT for that chunkid in the segment files.

Maybe the shadow_index being wrong is related to these changes: #5636 #5670
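A rough sketch of that compact-time decision (illustrative names only; shadow_index is assumed to map a chunk id to the segment numbers that still contain shadowed PUTs for it):

def keep_delete_entry(chunk_id, shadow_index):
    if chunk_id not in shadow_index:
        # Special case: nothing known about shadowed PUTs -> better keep the DEL.
        # This is what deleting the hints file achieves.
        return True
    # An entry exists: the DEL is only needed while shadowed PUTs remain on disk.
    # A wrongly *empty* list is exactly the problem described above: the DEL gets
    # dropped although a PUT for this chunk id still exists in some segment file.
    return len(shadow_index[chunk_id]) > 0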

@ThomasWaldmann
Member

ThomasWaldmann commented Oct 28, 2023

#5670 was wrong, I reopened #5661, see there.

Could it be that the issues seen here appeared after borg did "double-puts" (writing a chunk x to a repo which already has a chunk x), like when using borg recreate to change the compression algorithm?

The manifest also always gets (over)written to the same id 0, but it might behave differently because it is always written to its own segment file (causing the segment with the superseded manifest to always be compacted); otherwise we would have seen this issue all the time, not just rarely.

Not 100% sure if this also fixes #6687 (this issue), but it could be.

If somebody is seeing this frequently, locally applying the fix and observing the behaviour in practice would be welcome! The fix is for the repo code, so it needs to be applied on the repo side when running borg in client/server mode.

@ThomasWaldmann
Member

See #5661 for some conclusions.

More conclusions about shadow_index (short: si):

Not persisting si in borg < 1.2 and then persisting it in borg 1.2 might cause an issue: if there was an old PUT x ... DEL x in the repo (made with borg < 1.2), we do not have an si entry about that. If borg >= 1.2 then does a PUT x ... DEL x, we will have an si entry about x, but it will only list the 2nd PUT (and borg will assume that an existing entry for x lists all the PUT segment numbers).

If borg then compacts the segment file with the first DEL, it will drop the DEL as it does not know about the 1st PUT. That is still correct, because there is that last DEL, keeping x deleted.

If borg then compacts the segment file with the last PUT and also the one with the last DEL, both will be removed. That will resurrect the first PUT when borg check --repair rebuilds the index (repository part) and delete it again after the archives part, because it is an orphan.

After that, it should be stable, as there is now an si entry for x listing the segment number of the first PUT.

TODO: think about whether borg check or borg compact --total could create a complete si.
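As a worked example of that sequence (segment numbers are made up for the illustration):

history = [
    (10, "PUT x"),   # written by borg < 1.2 -> no shadow_index entry recorded
    (20, "DEL x"),   # written by borg < 1.2
    (30, "PUT x"),   # written by borg >= 1.2 -> si entry for x lists only segment 30
    (40, "DEL x"),
]
# compact segment 20: the DEL is dropped (segment 10 is unknown to the si entry);
#                     still correct, the DEL in segment 40 keeps x deleted
# compact segments 30 and 40: last PUT and last DEL removed, si entry for x empties,
#                     so the PUT in segment 10 is resurrected
# borg check --repair: rebuilds the index (x reappears), then deletes x as an orphan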

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Oct 28, 2023
before this fix, borg check --repair just created an
empty shadow index, which can lead to incomplete
entries if entries are added later.

and such incomplete (but present) entries can lead to
compact_segments() resurrecting old PUTs by accidentally
dropping related DELs.
@ThomasWaldmann
Member

ThomasWaldmann commented Oct 30, 2023

Testing the PRs #7896 and #7897:

Practical testing of these is very welcome (be careful!).

For @horazont:

I suspect that users who either have old repos from borg < 1.2 or use borg recreate to change the compression are more likely to experience this issue than users whose repos are fresh since borg 1.2 and never see borg recreate changing the compression.

@ThomasWaldmann
Member

After another self-review of these, I just merged the 2 PRs into 1.2-maint.

Guess if some more people besides @horazont could practically test the code (see previous comment), we could have a 1.2.7 release rather soon.

@horazont
Contributor

horazont commented Nov 3, 2023

I deleted/changed some ancient data on that host. Ideally, that will trigger relevant ancient (borg 1.1.x-created) chunks to be expired from the repository next week, which would then trigger the issue.

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Nov 5, 2023
@horazont
Contributor

The trap triggered.

With the pre-prune copy and borg 1.2.6 from debian testing, I did:

  1. borg check --repository-only -> OK
  2. various borg prune calls -> OK
  3. borg check --repository-only -> OK
  4. borg compact -> OK
  5. borg check --repository-only -> failed as expected

Then I checked out 1.2-maint and installed it into a venv (borg 1.2.7.dev79+g70eed5e9). I reset the repository to its previous state and deleted any data related to that repository from ~/.config/borg and ~/.cache/borg.

Using the borg from the venv, I did:

  1. borg check --repair -> OK
  2. various borg prune calls -> OK
  3. borg check --repository-only -> OK
  4. borg compact -> OK
  5. borg check --repository-only -> Finished full repository check, no problems found.

🥳

@horazont
Contributor

For the record: the --repair is necessary; just using 1.2-maint before pruning is not sufficient. I think that is expected?

@ThomasWaldmann
Member

@horazont yes, I added rebuilding of the compaction infos when using --repair.

@ThomasWaldmann
Member

Did someone other than @horazont do practical testing with the current 1.2-maint code?

Guess it would be good to get more practical testing before the 1.2.7 release, especially considering this issue here.

@ThomasWaldmann
Member

ThomasWaldmann commented Dec 1, 2023

OK, guess I'll just close this as fixed (see #6687 (comment)) until proven otherwise.

If somebody is seeing something that looks like this issue, please first do a borg check --repair on the repo to rebuild the compaction info. Then try to reproduce.

The above-mentioned fixes will be in borg 1.2.7 soon (the master branch is also fixed).

@jdchristensen
Contributor Author

I think the Changelog should mention that users who have had problems with borg check involving orphaned chunks should run borg check --repair on the repo after upgrading to 1.2.7. I'm going to do that today, and am confident that my monthly checks will stop reporting orphaned chunks. Thanks for figuring this out!

Oh, I also just realized that the Changelog seems to mention an incorrect issue/PR:

check/compact: fix spurious reappearance of orphan chunks since borg 1.2, #6687 - this consists of 2 fixes:
for existing chunks: check --repair: recreate shadow index, #6687
for newly created chunks: update shadow index when doing a double-put, #5661

Should the "5661" be "7896"? And maybe the second "6687" should be "7897", to point to the PR instead of this issue?

@ThomasWaldmann
Member

ThomasWaldmann commented Dec 2, 2023

@jdchristensen thanks for the feedback! Some links for easier checking of these issues:

Updated the changelog: #7961

Usually I rather give the issue number than the PR number, but here I just gave both. There was quite some back and forth here...

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Dec 2, 2023
ThomasWaldmann added a commit that referenced this issue Dec 2, 2023
@palbr

palbr commented Dec 2, 2023

Thanks a lot. The effort you put into BorgBackup is amazing!

How I was affected by this issue

I have been using BorgBackup 1.1.15 for many years and love its functionality and reliability! I am using Debian Linux and my Borg repository is on an external HDD, which is attached via USB while backups are created.

On 3rd November 2023, I decided to migrate to a newer version of BorgBackup. I followed the Notes (excellent documentation, BTW!):

  1. Upgrade to BorgBackup 1.1.18
  2. borg create
  3. borg prune
  4. borg check --verify-data:
Completed repository check, no problems found.
Finished cryptographic data integrity verification, verified 528541 chunks with 0 integrity errors.
Archive consistency check complete, no problems found.
terminating with success status, rc 0
  5. Upgrade to BorgBackup 1.2.6
  6. address 1.2.5 TAM issue: All my 54 archives are already tam:verified 😀
  7. borg compact --cleanup-commits (freed up 3 GB of disc space - thank you 👍 )
  8. borg check --verify-data:
Finished full repository check, no problems found.
Finished cryptographic data integrity verification, verified 528541 chunks with 0 integrity errors.
Archive consistency check complete, no problems found.
terminating with success status, rc 0
  9. run borg info to build the local pre12-meta cache

Did some borg create runs and tested the results. Everything looked good.

I happily created 65 further archives during the next days. (borg create)

On 30th November 2023 it was time for my monthly borg prune and since I am using BorgBackup 1.2.6 now, I also started a borg compact run. Both commands worked without problems.

Afterwards I did a borg check --verify-data, which gave me:

Starting repository check
finished segment check at segment 11882
Starting repository index check
Index object count mismatch.
committed index: 528960 objects
rebuilt index:   528965 objects
ID: 4bca288173b1b81cf3f0844369ee5d83ad6763e0b769c0b04fedf6b79be799cd rebuilt index: (10693, 162087387) committed index: <not found>
ID: b54c28821a17ac98e280a1027499771a9e48fde5610d6bd1dad88d6a3283d5f0 rebuilt index: (10693, 278089494) committed index: <not found>
ID: e94079e3335847d23038b941351c37f0848768ac6aaf166e12d5179b7a61c50b rebuilt index: (8448, 616114)   committed index: <not found>
ID: 4e75c8f9d27cd882d0c29385b73d7a6d9ad744c60ec67c8ba86f74c28f50ade1 rebuilt index: (11074, 187051999) committed index: <not found>
ID: 457ecb4e89cb279468a56b5e8ffb6cb43a4909991b070c5ee0f0e054d84d7a9e rebuilt index: (6563, 1055550)  committed index: <not found>
Finished full repository check, errors found.
terminating with warning status, rc 1

This was the first time I ever encountered problems via borg check.

Research on the internet led me to this ticket, #6687. I am sorry that I couldn't contribute to solving this issue, but I am very grateful that you have fixed it now.

I am looking forward to test BorgBackup 1.2.7 in the next days.

Do you think this issue and its fix (install borg version >= 1.2.7 and run borg check --repair once) should be added to the
Upgrade Notes - borg 1.1.x to 1.2.x, to help other users migrating from 1.1.x to 1.2.x avoid running into this issue?

@ThomasWaldmann
Member

ThomasWaldmann commented Dec 3, 2023

@palbr Guess if someone upgrades from 1.1.x to 1.2.7+, they might never be affected, so there is nothing to repair in that case.

But I already added it to the 1.2.7 changelog entry (after the release though), see above.

Also, guess running borg check --repair is quite a natural thing to do when encountering any issues.
