
Only interrupt active disk I/Os in failmode=continue #17372


Open: wants to merge 1 commit into master

Conversation

pcd1193182 (Contributor) commented:

Sponsored by: [Klara, Inc., Wasabi Technology, Inc.]

Motivation and Context

failmode=continue is in a sorry state. Originally designed to fix a very specific problem, it causes crashes and panics for most people who end up trying to use it. At this point, we should either remove it entirely, or try to make it more usable.

Description

With this patch, I choose the latter. While the feature is fundamentally unpredictable and prone to race conditions, it should be possible to get it to the point where it can at least sometimes be useful for some users. This patch fixes one of the major issues with failmode=continue: it interrupts even ZIOs that are patiently waiting in line behind stuck I/Os. Advancing a ZIO from the VDEV_IO_START stage to the VDEV_IO_DONE stage causes a problem if the I/O hasn't actually been issued yet: the IO_DONE stage unconditionally removes the zio from the active list, which fails if the I/O was never added to that list.

To prevent this, we only wake I/Os that are actually active leaf vdev I/Os; any other I/O is ignored. This does reduce the scope of what the continue failmode can address, but I think it does so in a helpful way. Feedback on other approaches is welcome, though.
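
To make that guard concrete, here is a minimal, self-contained C sketch of the decision described above. It uses simplified stand-in structs (in the real code the leaf flag lives in vd->vdev_ops->vdev_op_leaf, and "is this I/O on the active list" is tracked by the vdev queue), and the function name zio_deadman_should_interrupt() is illustrative, not part of the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for zio_t/vdev_t; deliberately simplified. */
typedef struct vdev {
	bool vdev_op_leaf;	/* is this a leaf (disk) vdev? */
} vdev_t;

typedef struct zio {
	vdev_t *io_vd;		/* NULL for logical (non-vdev) zios */
	bool io_issued;		/* toy flag: queue placed it on the active list */
} zio_t;

/*
 * Decide whether the deadman may interrupt this zio when
 * zfs_deadman_failmode=continue.  Sketch only, not the actual diff.
 */
static bool
zio_deadman_should_interrupt(const zio_t *zio)
{
	/* Only leaf vdev I/Os can be stuck in a device. */
	if (zio->io_vd == NULL || !zio->io_vd->vdev_op_leaf)
		return (false);

	/*
	 * Only I/Os the queue actually issued: a zio still waiting behind
	 * a stuck I/O was never added to the active list, so advancing it
	 * to VDEV_IO_DONE would remove it from a list it is not on.
	 */
	return (zio->io_issued);
}
```

In the deadman's continue path, only zios that pass a check like this would be woken (via zio_interrupt() in the real code); everything else is left queued.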

How Has This Been Tested?

I used mdadm to create virtual devices and built a pool on top of them, then used mdadm's suspend functionality to hang the I/Os to one of the devices. Once the deadman messages started to appear in the log, I set the failmode to continue. Without the patch, the system reliably kernel panics; with the patch, the hung I/Os are re-executed.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Commit message:

failmode=continue is in a sorry state. Originally designed to fix a very
specific problem, it causes crashes and panics for most people who end
up trying to use it. At this point, we should either remove it entirely,
or try to make it more usable.

With this patch, I choose the latter. While the feature is fundamentally
unpredictable and prone to race conditions, it should be possible to get
it to the point where it can at least sometimes be useful for some
users. This patch fixes one of the major issues with failmode=continue:
it interrupts even ZIOs that are patiently waiting in line behind stuck
IOs.

Signed-off-by: Paul Dagnelie <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
@robn (Member) left a comment:


To be clear: this is about zfs_deadman_failmode=continue (tunable/modparam), not failmode=continue (pool property), that is, spa->spa_deadman_failmode vs spa->spa_failmode.

(Important to me, because I'm in the middle of rehabilitating failmode=continue proper, via #17355 and beyond. So if we are bumping into each other, we should have a chat about that. But I don't think we are!).

Property lawyering aside, this is good for what it is! My gut feel is that at least some of this should be "inside" the queue, but that's vague and this function obviously needs more work than this, so we good. Good stuff!
