
Only interrupt active disk I/Os in failmode=continue #17372


Open: wants to merge 1 commit into master

Conversation

pcd1193182 (Contributor) commented:

Sponsored by: [Klara, Inc., Wasabi Technology, Inc.]

Motivation and Context

failmode=continue is in a sorry state. Originally designed to fix a very specific problem, it causes crashes and panics for most people who end up trying to use it. At this point, we should either remove it entirely, or try to make it more usable.

Description

With this patch, I choose the latter. While the feature is fundamentally unpredictable and prone to race conditions, it should be possible to get it to the point where it can at least sometimes be useful for some users. This patch fixes one of the major issues with failmode=continue: it interrupts even ZIOs that are patiently waiting in line behind stuck I/Os. Advancing a ZIO from the VDEV_IO_START stage to the VDEV_IO_DONE stage causes a problem if the I/O hasn't actually been issued yet: the IO_DONE stage unconditionally removes the zio from the active list, which fails if the I/O was never added to that list.

To prevent this, we only wake I/Os that are actually active leaf vdev I/Os; any other I/O is ignored. This does reduce the scope of what the continue failmode can address, but I think it does so in a helpful way. Feedback on other approaches is welcome, though.
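
To make that guard concrete, here is a minimal, self-contained C sketch of the decision described above. It uses simplified stand-in structs (in the real code the leaf flag lives in vd->vdev_ops->vdev_op_leaf, and "is this I/O on the active list" is tracked by the vdev queue), and the function name zio_deadman_should_interrupt() is illustrative, not part of the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for zio_t/vdev_t; deliberately simplified. */
typedef struct vdev {
	bool vdev_op_leaf;	/* is this a leaf (disk) vdev? */
} vdev_t;

typedef struct zio {
	vdev_t *io_vd;		/* NULL for logical (non-vdev) zios */
	bool io_issued;		/* toy flag: queue placed it on the active list */
} zio_t;

/*
 * Decide whether the deadman may interrupt this zio when
 * zfs_deadman_failmode=continue.  Sketch only, not the actual diff.
 */
static bool
zio_deadman_should_interrupt(const zio_t *zio)
{
	/* Only leaf vdev I/Os can be stuck in a device. */
	if (zio->io_vd == NULL || !zio->io_vd->vdev_op_leaf)
		return (false);

	/*
	 * Only I/Os the queue actually issued: a zio still waiting behind
	 * a stuck I/O was never added to the active list, so advancing it
	 * to VDEV_IO_DONE would remove it from a list it is not on.
	 */
	return (zio->io_issued);
}
```

In the deadman's continue path, only zios that pass a check like this would be woken (via zio_interrupt() in the real code); everything else is left queued.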

How Has This Been Tested?

I used mdadm to create virtual devices and built a pool on top of them, then used mdadm's suspend functionality to hang the I/Os to one of the devices. Once the deadman messages started to appear in the log, I set the failmode to continue. Without the patch, the system reliably kernel panics; with the patch, the hung I/Os are re-executed.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Commit message:

failmode=continue is in a sorry state. Originally designed to fix a very
specific problem, it causes crashes and panics for most people who end
up trying to use it. At this point, we should either remove it entirely,
or try to make it more usable.

With this patch, I choose the latter. While the feature is fundamentally
unpredictable and prone to race conditions, it should be possible to get
it to the point where it can at least sometimes be useful for some
users. This patch fixes one of the major issues with failmode=continue:
it interrupts even ZIOs that are patiently waiting in line behind stuck
IOs.

Signed-off-by: Paul Dagnelie <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
@robn (Member) left a comment:


To be clear: this is about zfs_deadman_failmode=continue (tunable/modparam), not failmode=continue (pool property), that is, spa->spa_deadman_failmode vs spa->spa_failmode.

(Important to me, because I'm in the middle of rehabilitating failmode=continue proper, via #17355 and beyond. So if we are bumping into each other, we should have a chat about that. But I don't think we are!).

Property lawyering aside, this is good for what it is! My gut feel is that at least some of this should be "inside" the queue, but that's vague and this function obviously needs more work than this, so we good. Good stuff!
