Skip to content

autoCreation broken in 0.22.0. #650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stuckj opened this issue May 23, 2024 · 16 comments · Fixed by #657
Closed

autoCreation broken in 0.22.0. #650

stuckj opened this issue May 23, 2024 · 16 comments · Fixed by #657
Labels

Comments

@stuckj
Copy link
Contributor

stuckj commented May 23, 2024

Howdie,

I've been using znapzend for several years. Thanks BTW, it's been a big help!

TL;DR:

  • Autocreation is broken in 0.22.0.
  • New enable-dst-autocreation command for znapzendzetup is also broken UNLESS the autocreation property already exists on the dataset.
  • New autocreation property (set manually) doesn't fix auto-creation...still doesn't work between datasets.
  • Script demonstrating problem is below.
  • Outputs for the script from several versions below.

Details:

While setting up a new desktop I noticed that the latest version from git (and 0.22.0) appears to have broken autoCreation. Specifically, if you have a simple setup with a recursive backup definition at the root and propagated to all sub-datasets, the initial backup to a destination will not auto-create those sub-datasets. Also, if you make new datasets in the source they won't get sent to the destination.

Furthermore, the new enable-dst-autocreation feature on znapzendzetup only works if the autocreation property already exists on the dataset. It won't set te property if it's not there already.

My laptop has 0.21.1 installed and it properly auto-creates sub-dataset to my server. The new desktop is running proxmox and I'm using my znapzend docker wrapper for proxmox (https://hub.docker.com/r/stuckj/znapzend-proxmox) which currently uses the latest from git (going to change that to 0.21.2 since discovering this). That's how I noticed the discrepancy since it works fine on my laptop, but not on the new desktop.

I made a super-simple script to demonstrate the problem (autocreate-test.sh.gz). You must run it as root since it creates test pools (and destroys them first). Do look it over before running it just in case you have existing pools named test1 or test2. :-P

Running that script on 0.21.1 or 0.21.2 successfully replicates the pool structure between test1 -> test2. On 0.22.0+ (checked out directly and built locally) it fails to create the sub-datasets. Only the root dataset is replicated.

I noticed in the diff that there were changes adding in autoCreation zfs properties for the destination. I tried setting those with znapzendzetup enable-dst-autocreation test1 nas, but it did nothing. This appears to be a separate bug. So, I MANUALLY set the property (based on what the code looked like it wanted) with zfs set org.znapzend:dst_nas_autocreation=on test1. That still doesn't work though...same error about the sub-destination not existing.

Below are the log outputs to versions 0.21.1, 0.21.2, 0.22.0, and the latest in git. It only works properly in versions 0.21.1 and 0.21.2 and appears to have broken in version 0.22.0.

Log Output

0.21.1 output: autocreate-results-0.21.1.log
0.21.2 output: autocreate-results-0.21.2.log
0.22.0 output: autocreate-results-0.22.0.log
git latest output: autocreate-results-latest.log

Test Script

#!/bin/bash

TEST_FILE1=./test1.img
TEST_FILE1_FULL=$(realpath -s "${TEST_FILE1}")
TEST_FILE2=./test2.img
TEST_FILE2_FULL=$(realpath -s "${TEST_FILE2}")
TEST_SIZE=300M

POOL1=test1
SUB1=${POOL1}/sub1
SUB1_2=${SUB1}/sub2
SUB2=${POOL1}/sub2
POOL2=test2

set -x

# Pool creation...
zpool destroy ${POOL1}
rm -f "${TEST_FILE1_FULL}"
truncate -s ${TEST_SIZE} "${TEST_FILE1_FULL}"
zpool create ${POOL1} "${TEST_FILE1_FULL}"
zfs create ${SUB1}
zfs create ${SUB1_2}
zfs create ${SUB2}

zpool destroy ${POOL2}
rm -f "${TEST_FILE2_FULL}"
truncate -s ${TEST_SIZE} "${TEST_FILE2_FULL}"
zpool create ${POOL2} "${TEST_FILE2_FULL}"

# Znapzend setup...
znapzendzetup create --donotask --tsformat='%Y%m%dT%H%M%S' --recursive SRC '1h=>15min,2d=>1h,14d=>1d,3m=>1w' test1 DST:nas '1h=>15min,2d=>1h,14d=>1d,3m=>1w,1y=>1m' test2

# Try the new autoCreation property...
znapzendzetup enable-dst-autocreation ${POOL1} nas

# Output zfs properties...
zfs get all ${POOL1} | grep org.znapzend

# Manually set the property since it doesn't work from znapzendzetup.
zfs set org.znapzend:dst_nas_autocreation=on ${POOL1}

# Output zfs properties...
zfs get all ${POOL1} | grep org.znapzend

# Znapzend test...
znapzend --debug --logto=/dev/stdout --runonce=${POOL1} --autoCreation

# Output datasets...
zfs list -r ${POOL1} ${POOL2}
@oetiker
Copy link
Owner

oetiker commented May 24, 2024

@jimklimov does this ring a bell ?

@oetiker oetiker added the bug label May 24, 2024
@oetiker
Copy link
Owner

oetiker commented May 24, 2024

@stuckj thank you for the report. If you could add to our test suite we could make sure this remains fixed

@jimklimov
Copy link
Contributor

No, sorry - did not misbehave like that for me :\ Maybe I tried with somewhat different use-cases. Will try to investigate in the coming days

@stuckj
Copy link
Contributor Author

stuckj commented May 24, 2024

@stuckj thank you for the report. If you could add to our test suite we could make sure this remains fixed

Do you mean add it as a perl test? I can take a stab at that. If I'm understanding the current test design, it looks like all of the external tools are mocked out to store env vars based on how they're called. And then the tests just verify that the tools were called with the expected parameters, right? I can figure out how the working instances from 0.21.2 were working vs 0.22.0 and see if I can modify the tests to handle the case.

I'll try to find some evening time to work on it (always a challenge with work and kids, but I'll give it a go).

@oetiker
Copy link
Owner

oetiker commented May 24, 2024

@stuckj thank you for the report. If you could add to our test suite we could make sure this remains fixed

Do you mean add it as a perl test? I can take a stab at that. If I'm understanding the current test design, it looks like all of the external tools are mocked out to store env vars based on how they're called. And then the tests just verify that the tools were called with the expected parameters, right? I can figure out how the working instances from 0.21.2 were working vs 0.22.0 and see if I can modify the tests to handle the case.

yes exactly, the tests mock everyting so the mocks may require enhancement

@jimklimov
Copy link
Contributor

jimklimov commented Jun 2, 2024

So, dug a bit more into this. About the znapzendzetup enable-dst-autocreation ${POOL1} nas not doing much - gotta be a copy-paste error, fixed.

Regarding the main issue, in sendRecvCleanup() we check if (!$backupSet->{"dst_$key" . '_valid'}) {...} (e.g. target exists) and handle my $autoCreation there -- skipping the target or creating it.

  • This part checks the backupSet with the directly defined retention schedule, e.g. the test1 pool in this example.

Then we get into the "sending loop through all subdatasets", where we also juggle (another evaluation of) autoCreation as well as sendRaw. Maybe this mix of checks is what misifres, some and or not mis-placed?

At least, in this block:

            # Time to check if the target sub-dataset exists
            # at all (unless we would auto-create one anyway).
            if ((!$autoCreation || !$self->sendRaw) && !($self->zZfs->dataSetExists($dstDataSet))) {
                my $errmsg = "sub-destination '" . $dstDataSet
                    . "' does not exist or is offline; ignoring it for this round... Consider "
                    . ( $autoCreation || $self->sendRaw ? "" : "running znapzend --autoCreation or " )
                    . "disabling this dataset from znapzend handling.";

changing || to &&

            if ((!$autoCreation && !$self->sendRaw) && !($self->zZfs->dataSetExists($dstDataSet))) {

... does help the test script succeed:

...
[2024-06-02 19:52:24.06096] [130998] [debug] === getSnapshotProperties(): Stopping recursion after test1/sub1@20240602T195223, we have all the properties we needed
[2024-06-02 19:52:24.06110] [130998] [debug] not cleaning up source test1/sub1@20240602T195223 because it is needed by test2/sub1
[2024-06-02 19:52:24.06119] [130998] [debug] got nothing to clean in source test1/sub1
[2024-06-02 19:52:24.06127] [130998] [info] done with backupset test1 in 1 seconds
[2024-06-02 19:52:24.06217] [130958] [debug] send/receive worker for test1 done (130998)
znapzend (PID=130958) is done.
+ zfs list -r test1 test2
NAME              USED  AVAIL     REFER  MOUNTPOINT
test1             277K   160M       25K  /test1
test1/sub1         48K   160M       24K  /test1/sub1
test1/sub1/sub2    24K   160M       24K  /test1/sub1/sub2
test1/sub2         24K   160M       24K  /test1/sub2
test2             276K   160M       25K  /test2
test2/sub1         62K   160M       24K  /test2/sub1
test2/sub1/sub2    24K   160M       24K  /test2/sub1/sub2
test2/sub2         24K   160M       24K  /test2/sub2
  • that line probably originated as (or meant to be) a different boolean combo, but was mis-adapted:
            if ( !($autoCreation || $self->sendRaw) && !($self->zZfs->dataSetExists($dstDataSet))) {

Gotta sit closely and rectify when and how we want the impact from sendRaw wherever it is mentioned.

@oetiker : IIRC it is about encrypted datasets; do we want those always sent (effectively auto-created by virtue of being encrypted)?

@jimklimov
Copy link
Contributor

Looking at history in CHANGES.old about sendRaw mentions, the ideas were:

  • Add sendRaw as a feature to enable sending encrypted datasets without decrypting to untrusted backup hosts
  • Do not use the -F receive option when sendRaw feature is used, zfs receive -F cannot be used to destroy an encrypted filesystem or overwrite an unencrypted one with an encrypted one
  • Do not automatically create destination when using feature sendRaw, receiving a raw encrypted stream is not supported on unencrypted datasets
  • Do not send snapshot to destination when sendRaw feature is used and destination dataset doesn't exist (causing zfs to automatically create the destination dataset), when the autoCreation option isn't used

The latter entry should have been about my recent efforts in the area; earlier ones come IIRC from different authors.

@jimklimov
Copy link
Contributor

jimklimov commented Jun 2, 2024

I guess in this clause, we have no impact from sendRaw being set or not. If not told to auto-create and destination does not exist - skip. Right?

Or maybe not. If autocreating, but also a dataset is sendRaw, still avoid creating it (per third changelog line above).

@jimklimov
Copy link
Contributor

I wonder if 0f6639e in PR #491 was right in this clause:

Not in direct scope of this issue at hand, and short on time to think more of it, but I am not sure now if that would do what the change description intends.

jimklimov added a commit to jimklimov/znapzend that referenced this issue Jun 2, 2024
…working or slacking off, and do not rely on existence of the property to set it [oetiker#650]

Addresses the part of issue oetiker#650 where

    ./bin/znapzendzetup enable-dst-autocreation --debug test1 nas

in fact did nothing.

Signed-off-by: Jim Klimov <[email protected]>
jimklimov added a commit to jimklimov/znapzend that referenced this issue Jun 2, 2024
…ilteredAway in the log, to avoid confusion [oetiker#650]

Signed-off-by: Jim Klimov <[email protected]>
jimklimov added a commit to jimklimov/znapzend that referenced this issue Jun 2, 2024
…n condition clause when a destination sub-dataset is missing [oetiker#650]

Signed-off-by: Jim Klimov <[email protected]>
jimklimov added a commit to jimklimov/znapzend that referenced this issue Jun 2, 2024
…working or slacking off, and do not rely on existence of the property to set it [oetiker#650]

Addresses the part of issue oetiker#650 where

    ./bin/znapzendzetup enable-dst-autocreation --debug test1 nas

in fact did nothing.

Signed-off-by: Jim Klimov <[email protected]>
jimklimov added a commit to jimklimov/znapzend that referenced this issue Jun 2, 2024
…ilteredAway in the log, to avoid confusion [oetiker#650]

Signed-off-by: Jim Klimov <[email protected]>
jimklimov added a commit to jimklimov/znapzend that referenced this issue Jun 2, 2024
…n condition clause when a destination sub-dataset is missing [oetiker#650]

Signed-off-by: Jim Klimov <[email protected]>
@jimklimov
Copy link
Contributor

jimklimov commented Jun 2, 2024

PR #657 posted.

And I tender my apologies about probably botching this up in the first place :\

@jimklimov
Copy link
Contributor

By the way, @stuckj : would you mind posting a PR to add your repro script into https://github.com/oetiker/znapzend/tree/master/contrib - even "as is" it can be a helpful foundation for developer testing iterations. Did help me while fixing the issue :)

@stuckj
Copy link
Contributor Author

stuckj commented Jun 3, 2024

@jimklimov, absolutely. Here's the PR: #658.

Sorry, I have not had a chance to integrate this into the test suite. We have end of school year madness with my kids right now along with sickness with the same kids. Looks like winter-time bugs are running into summer this year. :-P

@Cnly
Copy link

Cnly commented Dec 2, 2024

Do not automatically create destination when using feature sendRaw, receiving a raw encrypted stream is not supported on unencrypted datasets

Hmmm, I'm a bit confused. It seems that after this change, it's no longer possible for znapzend to send the first (initial) snapshot backup to remote, as the remote dataset doesn't exist yet, and znapzend has now decided not to create it automatically. But it also won't work if I simply create a dataset on the remote, as that'd be an unencrypted dataset.

So.. what's the recommended way of setting up an encrypted backup destination? With some manual send/recv at first? Not a zfs expert, so I'd appreciate any help.

@oetiker
Copy link
Owner

oetiker commented Dec 3, 2024

have you tried starting with the --autoCreate option?

@Cnly
Copy link

Cnly commented Dec 3, 2024

@oetiker I suppose you mean --autoCreation? I didn't find --autoCreate in the man page of znapzend 0.23.2. I've always been using --autoCreation (through the NixOS module option). If I downgrade from 0.23.2 to 0.21.0, znapzend is able to automatically work with newly created encrypted backup destinations again. It seems that the following line changed the logic?

https://github.com/oetiker/znapzend/pull/657/files#diff-4c777e431e6c30149fe108e6414fcc5e7f750026583799b3861ff40555e2d07eL667-R678

@jimklimov
Copy link
Contributor

jimklimov commented Dec 3, 2024

I think the recent posts are growing to be a separate (though related) issue about auto-creation behavior of encrypted destination datasets. This one was originally about something else, resolved and closed (perhaps popping up a pain point elsewhere).

Anyhow, #650 (comment) listed some "prior art" in earlier discussions. My understanding at the moment is that different implementations of ZFS across its two decades and many OSes and teams working on it could at least disagree about receiving or not the initial encrypted snapshot as a raw one, maybe depending also on some circumstances of the receiving pool or parent dataset.

Apparently something worked before, at least in your src/dst combo, and does not now, so it is a regression, violation of least-surprise principle in development, etc. What I currently worry about is if there is a single fix that fits all cases, or yet another feature or CLI option or dst-based property is needed to tweak it for different src/dst combos?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants