Skip to content

Make ganging redundancy respect redundant_metadata property #17073

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 19, 2025

Conversation

pcd1193182
Copy link
Contributor

@pcd1193182 pcd1193182 commented Feb 20, 2025

Motivation and Context

The redundant_metadata setting in ZFS allows users to trade resilience for performance and space savings. This applies to all data and metadata blocks in zfs, with one exception: gang blocks. Gang blocks currently just take the copies property of the IO being ganged and, if it's 1, sets it to 2. This means that we always make at least two copies of a gang header, which is good for resilience. However, if the users care more about performance than resilience, their gang blocks will be even more of a penalty than usual.

Description

We add logic to calculate the number of gang headers copies directly, and store it as a separate IO property. This is stored in the IO properties and not calculated when we decide to gang because by that point we may not have easy access to the relevant information about what kind of block is being stored. We also check the redundant_metadata property when doing so, and use that to decide whether to store an extra copy of the gang headers, compared to the underlying blocks.

There is also a new test to verify that gang headers are stored redundantly in roughly the way that we want, plus some changes to the other gang block tests to make the new test (and other tests) easier to work with.

How Has This Been Tested?

Mostly the new zfs-test suite test, plus similar manual testing.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no objections to this part, while base one I haven't looked yet.

PS: GitHub reports some merge conflict.

@amotin amotin added the Status: Code Review Needed Ready for review and testing label Feb 21, 2025
@pcd1193182 pcd1193182 force-pushed the gang_redundant branch 3 times, most recently from e2e7766 to 661d78e Compare February 25, 2025 20:09
@pcd1193182
Copy link
Contributor Author

Here is the change separated out from the dynamic gang header stuff. I'll work on separating out the dynamic allocation changes next, though that one is a little more entwined.

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Just a small nit:

@amotin
Copy link
Member

amotin commented Mar 9, 2025

@pcd1193182 "error: commit message body contains line over 72 characters"

The redundant_metadata setting in ZFS allows users to trade resilience
for performance and space savings. This applies to all data and metadata
blocks in zfs, with one exception: gang blocks. Gang blocks currently
just take the copies property of the IO being ganged and, if it's 1,
sets it to 2. This means that we always make at least two copies of a
gang header, which is good for resilience. However, if the users care
more about performance than resilience, their gang blocks will be even
more of a penalty than usual.

We add logic to calculate the number of gang headers copies directly,
and store it as a separate IO property. This is stored in the IO
properties and not calculated when we decide to gang because by that
point we may not have easy access to the relevant information about what
kind of block is being stored. We also check the redundant_metadata
property when doing so, and use that to decide whether to store an extra
copy of the gang headers, compared to the underlying blocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Paul Dagnelie <[email protected]>
@tonyhutter
Copy link
Contributor

At a high level this PR makes sense as the gang block headers could be considered "metadata" and thus should follow redundant_metdata rules. There's some details to consider though,

redundant_metadata requires an extra copy of the metadata per the actual data:

redundant_metadata=all|most|some|none

Controls what types of metadata are stored redundantly. ZFS stores an extra copy of metadata, so that if a single block is corrupted, the amount of user data lost is limited. This extra copy is in addition to any redundancy provided at the pool level (e.g. by mirroring or RAID-Z), and is in addition to an extra copy specified by the copies property (up to a total of 3 copies). For example if the pool is mirrored, copies=2, and redundant_metadata=most, then ZFS stores 6 copies of most metadata, and 4 copies of data and some metadata.

https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops.7.html#redundant_metadata

So if we consider gang block headers as redundant_metadata, then with copies=3, we should actually be storing 4 copies of the gang block headers. That would be difficult to do.

Also, ZFS prefers storing redundant_metadata on a special device. Are we currently trying to allocate gang block headers on special class devices? It seems like we should if we're considering them as redundant_metadata.

@pcd1193182
Copy link
Contributor Author

redundant_metadata requires an extra copy of the metadata per the actual data:
...
So if we consider gang block headers as redundant_metadata, then with copies=3, we should actually be storing 4 copies of the gang block headers. That would be difficult to do.

Sort of, but...

(up to a total of 3 copies).

We don't store more than 3 copies, regardless of the redundant_metadata setting. The only sense in which we do is if mirroring is in play (which, as the man page says, is "redundancy provided at the pool level"). The logic here is the same as for normal metadata blocks, which also respect the copies property (see dmu_write_policy; copies always starts at os->os_copies, whether the block is metadata or not).

Also, ZFS prefers storing redundant_metadata on a special device. Are we currently trying to allocate gang block headers on special class devices? It seems like we should if we're considering them as redundant_metadata.

That's an interesting question. My gut says that yes, we should ideally be sticking gang headers on special devices, if they're available. But that could easily be a separate change. It's not particularly complicated (just an extra call to metaslab_alloc in zio_write_gang_block, but the intent is separate and I'd like to get a little more feedback from other people on whether they agree with the idea without holding this change up.

@tonyhutter
Copy link
Contributor

tonyhutter commented Mar 19, 2025

So if we consider gang block headers as redundant_metadata, then with copies=3, we should actually be storing 4 copies of the gang block headers. That would be difficult to do.

Sort of, but...

  (up to a total of 3 copies).

Ah you're right - I totally misread that. I thought they were talking about the limits on the copies property, not the limits of the copies of the metadata. So it's like:

copies=1: 2 metadata copies
copies=2: 3 metadata copies
copies=3: 3 metadata copies

My gut says that yes, we should ideally be sticking gang headers on special devices, if they're available. But that could easily be a separate change.

That's true, let's consider it for a separate PR.

@tonyhutter tonyhutter merged commit 9250403 into openzfs:master Mar 19, 2025
22 of 25 checks passed
robn added a commit to robn/zfs that referenced this pull request Mar 20, 2025
Missed in openzfs#17073, probably because that PR was branched before openzfs#17001
was landed and never rebased.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
robn added a commit to robn/zfs that referenced this pull request Mar 20, 2025
Missed in openzfs#17073, probably because that PR was branched before openzfs#17001
was landed and never rebased.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
tonyhutter pushed a commit that referenced this pull request Mar 20, 2025
Missed in #17073, probably because that PR was branched before #17001
was landed and never rebased.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
fuporovvStack pushed a commit to fuporovvStack/zfs that referenced this pull request Apr 11, 2025
…17073)

The redundant_metadata setting in ZFS allows users to trade resilience
for performance and space savings. This applies to all data and metadata
blocks in zfs, with one exception: gang blocks. Gang blocks currently
just take the copies property of the IO being ganged and, if it's 1,
sets it to 2. This means that we always make at least two copies of a
gang header, which is good for resilience. However, if the users care
more about performance than resilience, their gang blocks will be even
more of a penalty than usual.

We add logic to calculate the number of gang headers copies directly,
and store it as a separate IO property. This is stored in the IO
properties and not calculated when we decide to gang because by that
point we may not have easy access to the relevant information about what
kind of block is being stored. We also check the redundant_metadata
property when doing so, and use that to decide whether to store an extra
copy of the gang headers, compared to the underlying blocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Paul Dagnelie <[email protected]>
Co-authored-by: Paul Dagnelie <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
fuporovvStack pushed a commit to fuporovvStack/zfs that referenced this pull request Apr 11, 2025
Missed in openzfs#17073, probably because that PR was branched before openzfs#17001
was landed and never rebased.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
fuporovvStack pushed a commit to fuporovvStack/zfs that referenced this pull request Apr 11, 2025
…17073)

The redundant_metadata setting in ZFS allows users to trade resilience
for performance and space savings. This applies to all data and metadata
blocks in zfs, with one exception: gang blocks. Gang blocks currently
just take the copies property of the IO being ganged and, if it's 1,
sets it to 2. This means that we always make at least two copies of a
gang header, which is good for resilience. However, if the users care
more about performance than resilience, their gang blocks will be even
more of a penalty than usual.

We add logic to calculate the number of gang headers copies directly,
and store it as a separate IO property. This is stored in the IO
properties and not calculated when we decide to gang because by that
point we may not have easy access to the relevant information about what
kind of block is being stored. We also check the redundant_metadata
property when doing so, and use that to decide whether to store an extra
copy of the gang headers, compared to the underlying blocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Paul Dagnelie <[email protected]>
Co-authored-by: Paul Dagnelie <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
fuporovvStack pushed a commit to fuporovvStack/zfs that referenced this pull request Apr 11, 2025
Missed in openzfs#17073, probably because that PR was branched before openzfs#17001
was landed and never rebased.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
ixhamza pushed a commit to truenas/zfs that referenced this pull request May 14, 2025
…17073)

The redundant_metadata setting in ZFS allows users to trade resilience
for performance and space savings. This applies to all data and metadata
blocks in zfs, with one exception: gang blocks. Gang blocks currently
just take the copies property of the IO being ganged and, if it's 1,
sets it to 2. This means that we always make at least two copies of a
gang header, which is good for resilience. However, if the users care
more about performance than resilience, their gang blocks will be even
more of a penalty than usual.

We add logic to calculate the number of gang headers copies directly,
and store it as a separate IO property. This is stored in the IO
properties and not calculated when we decide to gang because by that
point we may not have easy access to the relevant information about what
kind of block is being stored. We also check the redundant_metadata
property when doing so, and use that to decide whether to store an extra
copy of the gang headers, compared to the underlying blocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Paul Dagnelie <[email protected]>
Co-authored-by: Paul Dagnelie <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
ixhamza pushed a commit to truenas/zfs that referenced this pull request May 14, 2025
Missed in openzfs#17073, probably because that PR was branched before openzfs#17001
was landed and never rebased.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
@robn robn mentioned this pull request May 24, 2025
14 tasks
@behlendorf behlendorf mentioned this pull request Jun 13, 2025
14 tasks
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jun 13, 2025
…17073)

The redundant_metadata setting in ZFS allows users to trade resilience
for performance and space savings. This applies to all data and metadata
blocks in zfs, with one exception: gang blocks. Gang blocks currently
just take the copies property of the IO being ganged and, if it's 1,
sets it to 2. This means that we always make at least two copies of a
gang header, which is good for resilience. However, if the users care
more about performance than resilience, their gang blocks will be even
more of a penalty than usual.

We add logic to calculate the number of gang headers copies directly,
and store it as a separate IO property. This is stored in the IO
properties and not calculated when we decide to gang because by that
point we may not have easy access to the relevant information about what
kind of block is being stored. We also check the redundant_metadata
property when doing so, and use that to decide whether to store an extra
copy of the gang headers, compared to the underlying blocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Paul Dagnelie <[email protected]>
Co-authored-by: Paul Dagnelie <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jun 13, 2025
Missed in openzfs#17073, probably because that PR was branched before openzfs#17001
was landed and never rebased.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jun 16, 2025
…17073)

The redundant_metadata setting in ZFS allows users to trade resilience
for performance and space savings. This applies to all data and metadata
blocks in zfs, with one exception: gang blocks. Gang blocks currently
just take the copies property of the IO being ganged and, if it's 1,
sets it to 2. This means that we always make at least two copies of a
gang header, which is good for resilience. However, if the users care
more about performance than resilience, their gang blocks will be even
more of a penalty than usual.

We add logic to calculate the number of gang headers copies directly,
and store it as a separate IO property. This is stored in the IO
properties and not calculated when we decide to gang because by that
point we may not have easy access to the relevant information about what
kind of block is being stored. We also check the redundant_metadata
property when doing so, and use that to decide whether to store an extra
copy of the gang headers, compared to the underlying blocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Paul Dagnelie <[email protected]>
Co-authored-by: Paul Dagnelie <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jun 16, 2025
Missed in openzfs#17073, probably because that PR was branched before openzfs#17001
was landed and never rebased.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
gamanakis pushed a commit to gamanakis/zfs that referenced this pull request Jul 20, 2025
…17073)

The redundant_metadata setting in ZFS allows users to trade resilience
for performance and space savings. This applies to all data and metadata
blocks in zfs, with one exception: gang blocks. Gang blocks currently
just take the copies property of the IO being ganged and, if it's 1,
sets it to 2. This means that we always make at least two copies of a
gang header, which is good for resilience. However, if the users care
more about performance than resilience, their gang blocks will be even
more of a penalty than usual.

We add logic to calculate the number of gang headers copies directly,
and store it as a separate IO property. This is stored in the IO
properties and not calculated when we decide to gang because by that
point we may not have easy access to the relevant information about what
kind of block is being stored. We also check the redundant_metadata
property when doing so, and use that to decide whether to store an extra
copy of the gang headers, compared to the underlying blocks.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Paul Dagnelie <[email protected]>
Co-authored-by: Paul Dagnelie <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants