-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Linux 6.15 compat #17229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Linux 6.15 compat #17229
Conversation
Died overnight during the ZTS run with a hard panic:
Last log says it was in cleanup for |
Second run ran to completion, all successful. I'll do a few, see if I can run into it again. |
Just tried this, using pf-6.15, latest zfs git with this applied as a patch. everything compiled fine, except nvidia-drivers, which is normal for an RC-stage kernel. Booted fine. Running 3 pools (+1 dormant, all running 2.3), so far so good (only been about 10 min live), will report back if anything untoward occurs. |
Haven't tripped the above panic again since. ZTS completed sucessfully on 6.15.0-rc3. This is shaping up towards "probably mostly fine", but I'm still eager for thorough review on the capability change. Is there a Linux kernel boffin in the room? |
that seems related to what we're after here #17118 |
Rebased to master + #17273, original description updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks ok 👍 I just kicked off the CI to see if Fedora 42 behaves.
The tests won’t reveal the possible downstream effects of returning NULL instead of the dentry on success, especially in setup with kNFSd, I mean sorry for being annoying, but the kernel guys seem to be after some speedups there and if every in tree vfs api consumer complies, then it’s possible that over time, this becomes an undertested edge case for them… at least that’s how I understand the new notes. I haven’t dived any deeper into this so far than looking at the diff to see what’s new and applying my recent experience :)) |
Are those patches compatible with 2.2.7 or require backporting?
|
@KrzysztofHajdamowicz that particular snippet comes from the long filename support added in 2.3. The last kernel supported by 2.2.7 in 6.12, so you'll probably need to backport at least some of the 6.13 and 6.14 patches first. |
Clean ZTS run on 6.15-rc5 overnight. |
I have 6.13 and 6.14 [1] working with 2.2.7 [2] so it's pretty compatible right now. 1: https://github.com/KrzysztofHajdamowicz/pve-kernel/releases |
I see your latest push still uses |
@tonyhutter sorry, wasn't clear. I think |
@KrzysztofHajdamowicz more the latter - just go through the patches, take the bits you need. You probably do need c8fa39b though. |
Ok, we should probably give a heads-up in the code why we didn't use diff --git a/module/os/linux/zfs/zpl_inode.c b/module/os/linux/zfs/zpl_inode.c
index f9f6406f8..08f7713b4 100644
--- a/module/os/linux/zfs/zpl_inode.c
+++ b/module/os/linux/zfs/zpl_inode.c
@@ -205,6 +205,11 @@ zpl_create(struct inode *dir, struct dentry *dentry, umode_t mode, bool flag)
remove_inode_hash(ZTOI(zp));
iput(ZTOI(zp));
} else {
+ /*
+ * We can use d_instantiate() here since we know our
+ * inode will always be I_NEW since we create it.
+ */
+ ASSERT(ZTOI(zp)->i_state & I_NEW);
d_instantiate(dentry, ZTOI(zp));
}
} |
I haven't studied it closely enough to know if that assert is safe. To be clear, Of course, if sorting it out now is required for 6.15, then I'll do it now (like the |
@robn sorry, my earlier diff was in zpl_create() instead of zpl_mkdir(). It should be: diff --git a/module/os/linux/zfs/zpl_inode.c b/module/os/linux/zfs/zpl_inode.c
index f9f6406f8..0b6fdc6ba 100644
--- a/module/os/linux/zfs/zpl_inode.c
+++ b/module/os/linux/zfs/zpl_inode.c
@@ -423,6 +423,12 @@ zpl_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
remove_inode_hash(ZTOI(zp));
iput(ZTOI(zp));
} else {
+ /*
+ * We can use d_instantiate() here since we know our
+ * inode will always be I_NEW.
+ */
+ ASSERT(ZTOI(zp)->i_state & I_NEW);
+
d_instantiate(dentry, ZTOI(zp));
}
}
I bring it up in the context of this PR since
(https://elixir.bootlin.com/linux/v6.15-rc5/source/Documentation/filesystems/vfs.rst) But if we're sure the inode is always I_NEW then the ASSERT() will always pass, and we can keep things simple with |
I agree you're probably right, but I have definitely not thought hard about the conditional "if there any chance that the inode could already be attached to a dentry". I am fairly sure that's not the case, but there's stuff I haven't looked at much (like snapdirs) and I don't know much about how things like overlays, bind mounts and such interact here. My take is that this call always took a dentry from the kernel, and if you wanted to replace it you had to do jump through some hoops, whereas now you just have to return it. We haven't been jumping through hoops, so we can continue not to, and that's it. But, like I said, I reckon it's probably right, I just haven't done the work to be sure and as we know, I am very averse to risk. But I sense I might be overblowing it here, so I'll push it for the test runners to take a look at, and get a fresh run going against rc5, and see that they are probably fine :) |
Heh, nope. On 6.8.12 and 6.15-rc5:
So yeah. lets not do that. |
The intent is that the filesystem may have a reference to an "old" version of the new directory, eg if it was keeping it alive because a remote NFS client still had it open. We don't need anything like that, so this really just changes things so we return error codes encoded in pointers. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]>
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or functional change apart from that, so a very minimal update for us. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]>
ZTS pass on -rc6. Pretty sure I have nothing more to add here. |
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or functional change apart from that, so a very minimal update for us. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #17229
Thanks guys, merged. We'll bump the META version when the final 6.15 is tagged and we're able to retest with it. |
Great, thanks. FYI, it passed on -rc7 overnight. |
The intent is that the filesystem may have a reference to an "old" version of the new directory, eg if it was keeping it alive because a remote NFS client still had it open. We don't need anything like that, so this really just changes things so we return error codes encoded in pointers. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17229 (cherry picked from commit bb740d6)
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or functional change apart from that, so a very minimal update for us. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17229 (cherry picked from commit 841be1d) (cherry picked from commit 1dc9bec)
The intent is that the filesystem may have a reference to an "old" version of the new directory, eg if it was keeping it alive because a remote NFS client still had it open. We don't need anything like that, so this really just changes things so we return error codes encoded in pointers. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17229 (cherry picked from commit bb740d6)
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or functional change apart from that, so a very minimal update for us. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17229 (cherry picked from commit 841be1d) (cherry picked from commit 1dc9bec)
The intent is that the filesystem may have a reference to an "old" version of the new directory, eg if it was keeping it alive because a remote NFS client still had it open. We don't need anything like that, so this really just changes things so we return error codes encoded in pointers. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17229 (cherry picked from commit bb740d6)
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or functional change apart from that, so a very minimal update for us. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17229 (cherry picked from commit 841be1d)
The intent is that the filesystem may have a reference to an "old" version of the new directory, eg if it was keeping it alive because a remote NFS client still had it open. We don't need anything like that, so this really just changes things so we return error codes encoded in pointers. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17229 (cherry picked from commit bb740d6)
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or functional change apart from that, so a very minimal update for us. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17229 (cherry picked from commit 841be1d)
Just FYI, final 6.15 was tagged yesterday. |
The intent is that the filesystem may have a reference to an "old" version of the new directory, eg if it was keeping it alive because a remote NFS client still had it open. We don't need anything like that, so this really just changes things so we return error codes encoded in pointers. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #17229 (cherry picked from commit bb740d6)
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or functional change apart from that, so a very minimal update for us. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #17229 (cherry picked from commit 841be1d) (cherry picked from commit 1dc9bec)
The intent is that the filesystem may have a reference to an "old" version of the new directory, eg if it was keeping it alive because a remote NFS client still had it open. We don't need anything like that, so this really just changes things so we return error codes encoded in pointers. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #17229 (cherry picked from commit bb740d6)
Renamed in 6.2, and the compat wrapper removed in 6.15. No signature or functional change apart from that, so a very minimal update for us. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes #17229 (cherry picked from commit 841be1d)
Motivation and Context
6.15-rc1 released, so off we go again.
Description
First two changes are benign: minor signature change for
iops->mkdir
, and a rename fordel_timer_sync
. Nothing to see here.Update 2025-04-29: the third change, to update the credential checks, has been removed in favour of #17273. This PR has been rebased on that one. Original PR description in the collapsed section for posterity.
Original description of changes to credential checks.
The last one is more complicated, and I would like a subject-matter expert or two to have a look, because I'm not that and this is educated guesswork.
For background, since I had to go figure it out myself:
secpolicy_zfs_proc()
is used to ensure that a given process (or credential from it) is allowed to perform some operation. It's used by various async admin tasks (eg taking a snapshot), where the calling process (egzfs snapshot
in userspace) is not the one that actually performs the work (some kernel thread), and of particular interest, I think, is the use in channel programs, where we need to consider the permissions of thezfs program
process as different Lua functions are called.On Linux, most of the capability-checking functions check permissions for the current task, not some arbitrary task. We have used
has_capability()
, which can take a credential from a target process; exactly what we want. However, this function has been removed in 6.15, and there's no easy equivalent we can just drop in:has_capabaility_noaudit()
would drop in, but as it says, doesn't post an audit event, which isn't something I just want to get rid of unless we have no choicehas_ns_capability()
,security_capable()
and other low-level functions aren't exported to modulesfile_ns_capable()
is exported, but it's a quite awful to cook astruct file
with only astruct cred
insideSo, I've taken a slightly different approach. There's a set of APIs for swapping a
struct cred
into the current task, and reverting it. I'm now using this to get the cred from the target task, swap it in, do the check, swap it all back out and return. This appears to at least not break anything in the test suite, but I don't know to what extent it's actually exercised, so I'd like someone who knows this subsystem or has the ability to test properly to do so and report back.I don't love this, in part because of the warning in the kernel headers for
get_task_cred()
that "the caller must make sure task doesn't go away". However, it seems that if this is a problem, then it's already a problem, in that the stashed task we pass tosecpolicy_zfs_proc()
could have disappeared since the original call from userspace. I'm assuming that's not already happening for now.Incidentally, if this does pan out, it might be the way to change the API to only need the
cred_t
, like on FreeBSD, by callingprepare_kernel_cred()
wherever we would callCRED()
now. This creates astruct cred
explicitly designed to allow a kernel thread to do something on behalf of a userspace program. It does change thing a little, as we need a release function (the returned cred is heap-allocated), but it might allow us to not have to pass the task around with the cred. I'm not going to do this (for now) as I really don't know enough about it all, but there's a small project if someone wants it.Tiny spanner:
get_task_cred()
exists but was not exported in upstream 4.19 (though I'd be surprised if it were not backported to an EL8 kernel). Regardless, it is trivial to implement a wrapper, or to just fall back tohas_capability()
there. The reason I haven't kepthas_capability()
for <6.15 is because I would rather not have two entirely different methods on "active" kernel releases unless there's a real need. Too many places for bugs and quirks to hide otherwise.How Has This Been Tested?
Compiled checked on all LTS kernel series >= 4.19, and all 6.x kernel series.
Sanity run (create pool, fio, export, import, scrub) run on >= 4.19, completed successfully. (except 5.5, something weird is stopping it booting on my test VMs, unrelated)
ZTS passed on 6.15-rc1, -rc2, -rc4, rc5, -rc6.
Update 2025-04-29: rechecked after rebase onto #17273.
Types of changes
Checklist:
Signed-off-by
.