-
Notifications
You must be signed in to change notification settings - Fork 253
internal/exec/stages/disks: prevent races with udev #1319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FYI, Flatcar uses this already for a while now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for upstreaming this and writing a detailed commit message! Always fun to debug udev races. :)
Had a minor comment, but this looks sane to me overall.
One concern is that we're calling this on every separate filesystem/disk/LUKS/RAID device. Would it be more efficient to instead do it only once per section for the last device in each associated list?
848f20a
to
716cec5
Compare
That would rely on inotify for the other devices and I think there is no guarantee that the inotify event goes into the queue and gets fully processed if we trigger a tagged event and wait for this event's completion - maybe if we check all implementation details we know that it works for the current udev implementation but I would rather be on the safe side since absence of the race condition is not verifiable through testing… |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Will let @bgilbert take a look as well.
d14a76a
to
fba4ff4
Compare
I won't be able to look at this for some weeks, unfortunately, but it's still on my list and I'll circle back as soon as I can. |
Your reasoning makes sense to me, but I haven't thought it through carefully. I'll defer to re-review from @jlebon, which actually I should have done months ago. 😳 Thanks for your patience. PR needs rebase to pick up CI changes. |
d12147c
to
3c6297c
Compare
I added a release note entry as required |
@jlebon Can you do the approval (actually, you did already, not sure you need to submit again) and hit the merge button? |
The "udevadm settle" command used to wait for udev to process the disk changes and recreate the entries under /dev was still prone to races where udev didn't get notified yet of the final event to wait for. This caused the boot with a btrfs root filesystem created by Ignition to fail almost every time on certain hardware. Issue tagged events and wait for them to be processed by udev. This is actually meanigful in all stages not only for the other parts of the initramfs which may be surprised by sudden device nodes disappearing shortly like the case was with systemd's fsck service but also for the inter-stage dependencies which currently are using the waiter for systemd device units but that doesn't really prevent from races with udev device node recreation. Thus, these changes are complementary to the existing waiter which mainly has the purpose to wait for unmodified devices. For newly created RAIDs we can wait for the new node to be available as udev will not recreate it.
3c6297c
to
3f78465
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. It (still) makes sense to me and in fact we do something similar in other places of the stack for similar reasons. I'm pretty sure we should be able to drop some of those with this.
I just tweaked the error messages and the release note item. Let me know if you disagree with some of them.
The "udevadm settle" command used to wait for udev to process the disk
changes and recreate the entries under /dev was still prone to races
where udev didn't get notified yet of the final event to wait for.
This caused the boot with a btrfs root filesystem created by Ignition
to fail almost every time on certain hardware.
Issue tagged events and wait for them to be processed by udev. This is
actually meanigful in all stages not only for the other parts of the
initramfs which may be surprised by sudden device nodes disappearing
shortly like the case was with systemd's fsck service but also for the
inter-stage dependencies which currently are using the waiter for
systemd device units but that doesn't really prevent from races with
udev device node recreation. Thus, these changes are complementary to
the existing waiter which mainly has the purpose to wait for unmodified
devices. For newly created RAIDs we can wait for the new node to be
available as udev will not recreate it.