Only interrupt active disk I/Os in failmode=continue #17372
Sponsored by: [Klara, Inc., Wasabi Technology, Inc.]
Motivation and Context
failmode=continue is in a sorry state. Originally designed to fix a very specific problem, it causes crashes and panics for most people who end up trying to use it. At this point, we should either remove it entirely, or try to make it more usable.
Description
With this patch, I choose the latter. While the feature is fundamentally unpredictable and prone to race conditions, it should be possible to get it to the point where it is at least sometimes useful for some users. This patch fixes one of the major issues with failmode=continue: it interrupts even ZIOs that are patiently waiting in line behind stuck IOs. Advancing a ZIO from the VDEV_IO_START stage to the VDEV_IO_DONE stage causes a problem if the IO has not actually been issued yet. The VDEV_IO_DONE stage unconditionally removes the zio from the active list, and that removal fails if the IO was never added to the active list.
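As a rough illustration of the invariant involved, here is a small Python model. It is not OpenZFS code: `LeafQueue`, `Io`, `IoState`, and `interrupt_hung` are hypothetical names, and a Python list stands in for the active list. `done()` removes the I/O from the active list unconditionally, mirroring the VDEV_IO_DONE behavior described above, so completing an I/O that was never issued raises an error. `interrupt_hung()` only wakes I/Os that are already active and leaves queued ones alone.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class IoState(Enum):
    PENDING = auto()  # queued behind other I/Os, not yet issued
    ACTIVE = auto()   # issued to the device, present on the active list
    DONE = auto()     # completed (or interrupted)


@dataclass(eq=False)  # compare I/Os by identity, not by field values
class Io:
    ident: int
    state: IoState = IoState.PENDING


@dataclass
class LeafQueue:
    """Hypothetical model of a leaf vdev queue (not OpenZFS code)."""
    pending: list = field(default_factory=list)
    active: list = field(default_factory=list)

    def enqueue(self, io):
        self.pending.append(io)

    def issue_next(self):
        # Move the oldest pending I/O onto the active list.
        io = self.pending.pop(0)
        io.state = IoState.ACTIVE
        self.active.append(io)
        return io

    def done(self, io):
        # Unconditional removal: list.remove raises ValueError if the
        # I/O was never added to the active list.
        self.active.remove(io)
        io.state = IoState.DONE


def interrupt_hung(queue, ios):
    """Wake only I/Os that are actually active; skip queued ones."""
    woken = []
    for io in ios:
        if io.state is not IoState.ACTIVE:
            continue  # still waiting in line: leave it alone
        queue.done(io)
        woken.append(io.ident)
    return woken


if __name__ == "__main__":
    q = LeafQueue()
    stuck, waiting = Io(1), Io(2)
    q.enqueue(stuck)
    q.enqueue(waiting)
    q.issue_next()  # only the first I/O reaches the (hung) device
    print(interrupt_hung(q, [stuck, waiting]))  # prints [1]
    try:
        q.done(waiting)  # what an unguarded interrupt would attempt
    except ValueError:
        print("waiting I/O was never on the active list")
```

Checking the state before completing is the simplest way to express "only active I/Os" in this model; in the model, the waiting I/O simply stays on the pending list until it is issued normally.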
To prevent this, we only wake IOs that are actually active leaf vdev IOs. Any other I/O is ignored; this does reduce the scope of what the continue failmode can address, but I think it does so in a helpful way. Feedback on other ideas is welcome, though.
How Has This Been Tested?
I used mdadm to create virtual devices and then built a pool on top of them. I then used mdadm suspend to cause the IOs to one of the devices to hang, and set the failmode to continue after the deadman messages started to appear in the log. Without the patch, the system reliably panics. With the patch, the IOs are re-executed.