-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preliminary soft reboot support #3390
base: main
Are you sure you want to change the base?
Conversation
Hi @Mstrodl. Thanks for your PR. I'm waiting for a ostreedev member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
79852cc
to
45f2550
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for working on this!! Just a preliminary review pass so far
|
||
/* Read in mounted FSes */ | ||
g_autofree gchar *mounts_contents = NULL; | ||
if (!g_file_get_contents ("/proc/self/mounts", &mounts_contents, NULL, error)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's libmount for parsing this stuff
/* Need to fork because setns will fail if we have threads (GIO loop) */ | ||
pid_t child_pid = fork (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but the problem here is that in the general case now we can't call any function which may have relied on global state - especially like the glib worker thread.
I used to help maintain glib upstream and I probably would have been sympathetic to this situation but just closed as wontfix any bugs arising in this situation.
In bootc I did something fairly similar to this in https://github.com/bootc-dev/bootc/blob/f818b37a57a9cdaf657e55c84de8d6e773f64e97/lib/src/mount.rs#L171 and I was pretty careful about what APIs I called there.
And even there I was pretty tempted to rework to do a "re-exec self" dance (ref bootc-dev/bootc#799 (comment) )
OTOH this won't be our first instance of similar code...the fork()
we also do for fsfreeze
is related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I'm not thrilled about this. I couldn't find a good way to stop the GIO loop for this. Can you think of any better ways to do it? One option is to have ostree-prepare-root nsenter
and set up the /run/nextroot
mount. This might be doable by looking at /proc/1/ns/mnt
, but I'm not sure. Or we could nsenter as part of the glib spawn (we're forking there anyways)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know what you think is best and we can do it :)
- Obviously keeping things the way they are is an option (forking isn't great though...)
- I have a suspicion the
/proc/1/ns/mnt
option could cause issues/weird behavior, so I'd prefer not to do that. - ostree-prepare-root mounting
/run/nextroot
feels like it's a little out of scope for the tool (But also then we can nsenter in the fork created for the exec...)- Actually now that I just sent this, we could also maybe set up
/run/nextroot
as part of the spawn setup? Maybe that's a dumb idea though.
- Actually now that I just sent this, we could also maybe set up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately the replacement for fork()
I think will be https://lwn.net/Articles/908268/
But Josh went back to working on Rust upstream I think and it doesn't have an active driver right now.
Anyways, the problem is there's a lot of APIs invoked here after fork()
in ostree_sysroot_deployment_prepare_next_root()
like e.g. ot_journal_print()
and such.
I mentioned this above and we can try to prove this out but the most reliable thing is to do a re-exec of the current process. Except the massive complication is that libostree is a library...so we'd need to require all callers to give us the chance to intercept re-invocations in main()
(doable, just ugly) or alternatively split out a little /usr/libexec/libostree-helper
binary.
But probably the best alternative in this specific case is as you say
One option is to have ostree-prepare-root nsenter and set up the /run/nextroot mount.
If we can do as much parsing as possible before forking, and then just hand off to it the operations to do that seems likely to be tractable, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that worries me is that the setns call might not work from ostree-prepare-root, but I guess I can't really be certain until I try it :) Definitely the FD I keep open for the mountns won't get passed on, so it'd need to look at pid 1's mountns, which might be okay?
Ideally it would be nice to have a setns thing in io_uring (in the io_uring_spawn scenario). I suspect a lot of the container folks would like that too.
I agree reexec is really not the solution.
Having ostree-prepare-root do the mounting shouldn't be too crazy.
end_loop: | ||
|
||
/* At this point, /sysroot is mounted at /run/nextroot, do the pivot from there: */ | ||
g_ptr_array_add (args, "/usr/lib/ostree/ostree-prepare-root"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though maybe one alternative is to move this handling directly into ostree-prepare-root
? Uncertain.
} | ||
else | ||
{ | ||
kernel_cmdline = argv[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! This looks like a really nice change and should crucially really help us for testing ostree-prepare-root outside of the initramfs in general (should be totally doable to run in a container)
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Most of the easy stuff from #3242
One big thing not handled here is detecting a difference in kernels and mounting in the old kernel's
/lib/modules/$(uname -r)
. Probably we shouldn't even be doing that much by default, and instead error out without the user explicitly asking for their kernel to have weird problems.One issue I ran into is that the upgrade process creates a mount namespace so /sysroot can be written to, but we need to escape from it when preparing
/run/nextroot
. To work around this, I keep an FD of/proc/self/ns/mnt
so we cansetns
to it. Unfortunately, we have to do this in a separate fork()'d process, becausesetns
will fail if we have any threads (I couldn't figure out how to stop GIO's main loop...).I'm not thrilled about the setns() stuff or needing to parse through
/proc/self/mounts
, so if anyone has any ideas on how to do it more elegantly, I'm all ears. One option is to set up/run/nextroot
and switch namespaces in the setup part ofg_spawn_sync
, which would save a fork, but I'm not sure if it's worth it.