
Allow opt-in of zvol blocks in special class #14876


Merged: 1 commit into openzfs:master on May 24, 2025

Conversation

don-brady
Contributor

Motivation and Context

The zfs special_small_blocks property allows a dataset to opt-in to using the special vdev class for small block allocations. This property is currently considered valid for filesystems but invalid for volumes. This distinction is somewhat arbitrary for an opt-in policy. In some situations, a small volume (relative to the size of the special class) might benefit from using the special class.

Description

Allow setting special_small_blocks property for volumes. This is effectively a policy change and not tied to a feature flag (other than feature@allocation_classes).

Note: there are no guardrails when setting this property; it is possible to oversubscribe the special class with more blocks than could possibly fit. This is not fatal (once the special class fills up, allocations spill over to the normal class) but might be less than ideal from a performance perspective.
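
For illustration, a minimal sketch of how a volume could opt in once this change is in place (pool and volume names match the test below; the sizes are just examples):

# Create a zvol whose data blocks are all eligible for the special class
# (with special_small_blocks >= volblocksize, every data block qualifies).
$ zfs create -V 10G -o volblocksize=16K -o special_small_blocks=16K hybrid/zvol

# Or opt in an existing volume; only newly written blocks are affected.
$ zfs set special_small_blocks=16K hybrid/zvol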

How Has This Been Tested?

Manually tested adding a zvol with special_small_blocks set and confirmed that allocations went to the special class.
Added a new ZTS test to confirm it works: functional/alloc_class/alloc_class_016_pos.
Also imported a pool with special_small_blocks set on a zvol using an older ZFS kernel module.

$ sudo dd of=/dev/zvol/hybrid/zvol if=/dev/urandom bs=1M count=50
50+0 records in
50+0 records out
52428800 bytes (52 MB, 50 MiB) copied, 2.86468 s, 18.3 MB/s
$ zpool list -v hybrid
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
hybrid      298G  51.0M   298G        -         -     0%     0%  1.00x    ONLINE  -
  sda2      149G   278K   149G        -         -     0%  0.00%      -    ONLINE
special        -      -      -        -         -      -      -      -         -
  sda3      149G  50.8M   149G        -         -     0%  0.03%      -    ONLINE
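
For context, a sketch of the kind of setup that would produce output like the above (device names match the listing; the exact commands and sizes are assumptions, not taken from the test run):

# Pool with one normal vdev and one special allocation class vdev.
$ sudo zpool create hybrid sda2 special sda3

# Volume that opts all of its data blocks into the special class.
$ sudo zfs create -V 10G -o volblocksize=16K -o special_small_blocks=16K hybrid/zvol

With that in place, the ~50 MiB written by dd lands under the special vdev (sda3) rather than the normal vdev (sda2), as shown in the zpool list output.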

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label May 17, 2023
Contributor

@behlendorf behlendorf left a comment


Nice! Thanks for extending this!

@don-brady don-brady force-pushed the special_small_blocks_zvol branch from d3dd86a to c04bd08 on May 18, 2023 17:34
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 18, 2023
@rincebrain
Contributor

rincebrain commented May 18, 2023

I might wonder if we would want a tunable defaulting to 0, like zfs_user_indirect_is_special, for controlling whether zvol data blocks can land there; otherwise we might have situations where special_small_blocks is set at, say, the pool root, and after this change we'd have a change in zvol behavior.

(Really, I'd like a pool property for it, not a global, but I don't know how backward compatible those are...)
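
For reference, zfs_user_indirect_is_special is an existing Linux module parameter; a global tunable of the kind suggested here would presumably be read and toggled the same way (a sketch only; the zvol-specific name below is hypothetical and does not exist):

# Existing knob: whether indirect blocks of user data may be placed in the special class.
$ cat /sys/module/zfs/parameters/zfs_user_indirect_is_special

# A comparable opt-out knob for zvol data blocks might be toggled like this (hypothetical name):
$ echo 0 | sudo tee /sys/module/zfs/parameters/zvol_data_is_special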

@don-brady
Contributor Author

I might wonder if we would want a tunable defaulting to 0, like zfs_user_indirect_is_special, for controlling whether zvol data blocks can land there; otherwise we might have situations where special_small_blocks is set at, say, the pool root, and after this change we'd have a change in zvol behavior.

@rincebrain Ah, interesting point! I somehow missed that special_small_blocks is an inheritable property. Perhaps we should have a volume-specific property, like, say, volspecialsize, so it is truly an opt-in property for zvols.

@rincebrain
Contributor

I think a new property would be a good option too, given the very different constraints involved, though the semantics might be weird: people will be annoyed if it can't be inherited and they have to set it manually, but setting a no-op property on filesystems just so it's inherited by volumes is also unfortunate...

@beren12
Contributor

beren12 commented May 19, 2023

Does this also respect the metadata reserve % tunable?

@don-brady
Contributor Author

Does this also respect the metadata reserve % tunable?

yes
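
(For anyone wondering which tunable is meant: the special-class metadata reserve percentage. A quick way to inspect it on Linux, assuming the stock module parameter name and default:)

# Percentage of special-class capacity reserved for metadata; once free space in the
# special class falls below this reserve, data blocks (zvol or otherwise) go to the
# normal class instead.
$ cat /sys/module/zfs/parameters/zfs_special_class_metadata_reserve_pct
25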

@don-brady
Contributor Author

Note that the other volume-specific properties are not inheritable. If you want all your zvols to have the same volblocksize there is no way to inherit that value.

@scineram

scineram commented May 19, 2023

If you want all your zvols to have the same volblocksize there is no way to inherit that value.

Zvols don't have children, so there is nothing to inherit.
Maybe zvols can be exempted from inheriting this property, since they're special?

@rincebrain
Contributor

Yes, zvols can't have children. But unlike, say, volblocksize or refreservation, whose values are zvol-specific and don't make sense to inherit, whether you store data on the special vdev might make sense to inherit; just reusing special_small_blocks, though, could yield amusing outcomes.

Idea: a new property on zvols, honor_special_for_data or some better name, which can be on/off and, when on, honors special_small_blocks for data records. Then you can inherit or override special_small_blocks for zvols and also control whether data blocks care about the setting. (You still have to do N sets to turn it on for all the zvols, but it's the best I've got at the moment.)
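
To make the idea concrete, a hypothetical sketch of how the pairing might look from the command line (honor_special_for_data is only the name proposed above; no such property exists today):

# Inheritable threshold, set once at the pool root (existing property).
$ zfs set special_small_blocks=64K tank

# Hypothetical per-zvol switch controlling whether data blocks honor that threshold.
$ zfs set honor_special_for_data=on tank/vm/disk0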

@behlendorf behlendorf added Status: Design Review Needed Architecture or design is under discussion and removed Status: Accepted Ready to integrate (reviewed, tested) labels May 19, 2023
@tcpluess

What is the current status of this? Will it be included in the next release?

@AlexeyMatskevich

Are there any plans for this PR?

@jumbi77
Contributor

jumbi77 commented Jun 26, 2024

@don-brady Can you maybe rebase this and ask the maintainers what's left to get it integrated? Many thanks in any case!

@jumbi77
Contributor

jumbi77 commented Sep 27, 2024

@don-brady Can you maybe rebase this and ask the maintainers what's left to get it integrated? Many thanks in any case!

Hello @don-brady, can I politely ask for a rebase? Looking forward to this feature. Many thanks in advance.

@jumbi77
Contributor

jumbi77 commented Oct 9, 2024

@behlendorf Is it possible to get this into 2.3?

@behlendorf
Contributor

behlendorf commented Oct 9, 2024

This is something we could consider for 2.3 if @don-brady has the cycles to wrap up the loose ends and get it rebased.

@tcpluess

This is something we could consider for 2.3 if @don-brady has the cycles to wrap up the loose ends and get it rebased.

This would indeed be fantastic. I would contribute if I could, but I am still trying to get started at all. 😂

@jumbi77
Contributor

jumbi77 commented Nov 15, 2024

This is something we could consider for 2.3 if @don-brady has the cycles to wrap up the loose ends and get it rebased.

May I politely ask @don-brady one last time to finish this up and get it ready for 2.3? Many thanks.

@amotin amotin linked an issue Apr 28, 2025 that may be closed by this pull request
@PrivatePuffin
Contributor

@don-brady Any idea if you will ever have the time to finish this?
@behlendorf So the only "blockers" are a rebase and adding a toggle for this feature?

@Pesc0

Pesc0 commented May 23, 2025

Hi, I would really benefit from this feature; here's my use case:
Hardware: 2 NVMe + 6 HDD. All disks will be in two-wide mirror vdevs.
Needs: storage for Proxmox, storage for a NAS.

Without this patch:

  • option 1:
    NVMe mirror for Proxmox
    Mobo SATA controller and all 6 disks passed through to a TrueNAS VM
    Cons: a full VM is kinda wasteful, NVMes mostly unutilized, HDDs could benefit from a special device

  • option 2:
    NVMe mirror for Proxmox
    Separate HDD pool managed by Proxmox
    Share exported by Proxmox or LXC or a very light VM
    Cons: still not making use of the NVMes
    Pros: removes VM overhead; security-wise it is OK for home use, since access will be very restricted through firewall and permissions

  • option 3:
    Unified pool under Proxmox, NVMe as special
    Share exported by Proxmox or LXC or a very light VM
    Cons: now Proxmox VM volumes will be striped across the HDDs as well, not allowing spindown
    Pros: makes use of the special device, as well as all the available space, both for VM volumes and NAS use

With this patch:

  • option 4:
    Unified pool under Proxmox, NVMe as special
    Share exported by Proxmox or LXC or a very light VM
    Will set special_small_blocks such that the NAS dataset is allocated on the HDDs and NVMes, while VM volumes and the Proxmox root filesystem go entirely to the NVMe mirror, keeping the system fast and allowing HDD spindown (see the sketch after this list)
    Pros: making the most of all the hardware available.
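
A sketch of the property layout behind option 4 (pool, dataset, and volume names are made up; the thresholds are examples only):

# NAS datasets: only metadata and small files land on the NVMe special mirror.
$ zfs set special_small_blocks=32K tank/nas

# VM zvols: with special_small_blocks at (or above) volblocksize, every data block
# goes to the NVMe special mirror, so the HDDs can spin down.
$ zfs create -V 64G -o volblocksize=16K -o special_small_blocks=16K tank/vm/disk0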

This patch is quite simple; I could rebase it on current master if needed. However, I have zero experience with the ZFS source code and will not be able to modify the implementation according to the discussion in this thread.

As a side note: please do give feedback on my proposed setup; it would help a lot to prevent mistakes, thanks :)

@PrivatePuffin
Contributor

As a side note: please do give feedback on my proposed setup; it would help a lot to prevent mistakes, thanks :)

Option 4 is basically how I run all my data.
And it's a pain that I cannot use zvols this way. This really cuts into my pool performance.

@Pesc0

Pesc0 commented May 23, 2025

Thank you! :)

@tcpluess

Hi, I would really benefit from this feature; here's my use case: [...]

Please do the rebase and try to merge it!

Also, considering your other questions: I would also go with option 4.
As a side note, I am running a couple of large-ish ZFS filers. I never do PCI passthrough, because it will not easily let you migrate containers or VMs. Also, the performance impact is no longer as large as it probably was in the past.
I get outstanding performance with an LXC container that actually runs the file server. The filer is connected with 25Gb Ethernet and can easily serve 20+ clients with 1GbE connections each. I also did multiple tests with 10GbE clients, with no issues so far. So I would recommend not tinkering too much with PCI passthrough; IMO it is just not worth the disadvantage of not being able to migrate easily.

@Pesc0

Pesc0 commented May 23, 2025

Got it; however, for my home setup I have only one node, so migration is not an issue. Thanks!
I will try the rebase when I have time. I won't be able to implement the other changes, though...

@Pesc0

Pesc0 commented May 23, 2025

there you go, rebased :)
@amotin mind taking a look when tests are done running?

@PrivatePuffin
Contributor

there you go, rebased :) @amotin mind taking a look when tests are done running?

Maybe tag him in the correct PR (yours) instead of here.

@PrivatePuffin
Contributor

That being said, @amotin, this might be of interest for iXsystems to finish off as well:

In previous iterations of TrueNAS, we could use raw-backed VM disks to "force" small blocks to go to special vdevs.
With the new versions based on Incus, we're forced to use zvols, which carries a considerable performance impact.

Considering the "barely any" hours needed to patch this up, this might be of interest to your employer as well.

@amotin amotin force-pushed the special_small_blocks_zvol branch from c04bd08 to 5c8df4d on May 23, 2025 13:51
Member

@amotin amotin left a comment


I've rebased the branch. Let's see what CI says.

@amotin amotin force-pushed the special_small_blocks_zvol branch from 5c8df4d to 60f04ac on May 23, 2025 14:02
@amotin amotin added the Status: Code Review Needed Ready for review and testing label May 23, 2025
@amotin amotin added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing Status: Design Review Needed Architecture or design is under discussion labels May 24, 2025
@amotin amotin merged commit b048bfa into openzfs:master May 24, 2025
23 of 24 checks passed
@Pesc0 Pesc0 mentioned this pull request May 26, 2025
gamanakis pushed a commit to gamanakis/zfs that referenced this pull request Jul 20, 2025
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Kjeld Schouten <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes openzfs#14876
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Successfully merging this pull request may close these issues.

ZVOL ignores special_small_blocks