Skip to content

zed: protect against wait4()/fork() races to the launched process tree #11928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 23, 2021

Conversation

nabijaczleweli
Copy link
Contributor

Motivation and Context

See #11924

Description

See commit message.
Slightly different than the original patch, but superficially so.

How Has This Been Tested?

Appears to eliminate the crash in @don-brady's tests

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the OpenZFS code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes. – none apply
  • I have run the ZFS Test Suite with this change applied. – as above
  • All commit messages are properly formatted and contain Signed-off-by.

As soon as wait4() returns, fork() can immediately return with the same
PID, and race to lock _launched_processes_lock, then try to add the new
(duplicate) PID to _launched_processes, which asserts

By locking before wait4(), we ensure, that, given that same
unfortunate scheduling, _launched_processes_lock cannot be locked by the
spawner before we pop the process in the reaper, and only afterward will
it be added

This moves where the reaper idles when there are children from the
wait4() to the pause(), locking for the duration of that single syscall
in both the no-children and running-children cases; the impact of this
is one to two syscalls (depending on _launched_processes_lock state)
per loop

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11924
@behlendorf behlendorf requested a review from don-brady April 22, 2021 18:25
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 23, 2021
@behlendorf behlendorf merged commit f9bece9 into openzfs:master Apr 23, 2021
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Apr 23, 2021
As soon as wait4() returns, fork() can immediately return with the same
PID, and race to lock _launched_processes_lock, then try to add the new
(duplicate) PID to _launched_processes, which asserts

By locking before wait4(), we ensure, that, given that same
unfortunate scheduling, _launched_processes_lock cannot be locked by the
spawner before we pop the process in the reaper, and only afterward will
it be added

This moves where the reaper idles when there are children from the
wait4() to the pause(), locking for the duration of that single syscall
in both the no-children and running-children cases; the impact of this
is one to two syscalls (depending on _launched_processes_lock state)
per loop

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11924
Closes openzfs#11928
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
As soon as wait4() returns, fork() can immediately return with the same
PID, and race to lock _launched_processes_lock, then try to add the new
(duplicate) PID to _launched_processes, which asserts

By locking before wait4(), we ensure, that, given that same
unfortunate scheduling, _launched_processes_lock cannot be locked by the
spawner before we pop the process in the reaper, and only afterward will
it be added

This moves where the reaper idles when there are children from the
wait4() to the pause(), locking for the duration of that single syscall
in both the no-children and running-children cases; the impact of this
is one to two syscalls (depending on _launched_processes_lock state)
per loop

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11924
Closes openzfs#11928
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants