ZED still crashing after PR-11928 fix integrated #11963

Closed
don-brady opened this issue Apr 28, 2021 · 6 comments
Labels
Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@don-brady
Contributor

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 18.04
Linux Kernel 5.4.0-1045
Architecture x86_64
ZFS Version 2021.04.27.17-3b038a4
SPL Version 2021.04.27.17-3b038a4

Describe the problem you're observing

ZED is encountering an assertion failure due to adding a duplicate key into an AVL tree.
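
For context, here is a rough sketch of the node and comparator involved, inferred from the backtraces and gdb output in this issue rather than copied from the source: _launched_processes appears to be keyed by the child PID, so inserting a second entry for a PID that still has a stale entry in the tree trips the avl_add() assertion.

struct launched_process_node {
	avl_node_t	node;
	pid_t		pid;
	uint64_t	eid;
	char		*name;
};

/* Assumed shape of _launched_process_node_compare: ordering is by pid only,
 * so two entries with the same pid (even with different eids) collide. */
static int
_launched_process_node_compare(const void *x1, const void *x2)
{
	pid_t p1 = ((const struct launched_process_node *)x1)->pid;
	pid_t p2 = ((const struct launched_process_node *)x2)->pid;

	return (p1 < p2 ? -1 : (p1 > p2 ? 1 : 0));
}

/* avl_add() asserts avl_find(tree, new_node, &where) == NULL before linking
 * the node in, which is the assertion seen in the backtrace below. */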

Describe how to reproduce the problem

We are just using stock ZED in our automated testing.

Include any warning/errors/backtraces from the system logs

#1  0x00007feb6a1e5921 in __GI_abort () at abort.c:79
#2  0x00007feb6b20d805 in libspl_assertf (file=0x7feb6b21c199 "../../module/avl/avl.c", func=0x7feb6b21c1b0 <__FUNCTION__.5089> "avl_add", line=640, format=<optimized out>)
    at assert.c:45
#3  0x00007feb6b215cfc in libspl_assert (line=640, func=<synthetic pointer>, file=0x7feb6b21c199 "../../module/avl/avl.c",
    buf=0x7feb6b21c170 "avl_find(tree, new_node, &where) == NULL") at ../../lib/libspl/include/assert.h:46
#4  0x00007feb6b21601b in avl_add (tree=tree@entry=0x5639784be320 <_launched_processes>, new_node=new_node@entry=0x563979d54de0) at ../../module/avl/avl.c:640
#5  0x00005639782b14f9 in _zed_exec_fork_child (eid=eid@entry=1644, dir=<optimized out>, prog=prog@entry=0x563979d4e3d0 "all-syslog.sh", env=env@entry=0x563979d57bc0,
    zfd=7, in_foreground=B_TRUE) at zed_exec.c:180
#6  0x00005639782b1b5f in zed_exec_process (eid=1644, class=<optimized out>, subclass=subclass@entry=0x563979d551b0 "history_event", zcp=zcp@entry=0x7ffe923a8940,
    envs=envs@entry=0x563979d577f0) at zed_exec.c:361
#7  0x00005639782b1253 in zed_event_service (zcp=0x7ffe923a8940) at zed_event.c:979
#8  0x00005639782ac7e0 in main (argc=<optimized out>, argv=<optimized out>) at zed.c:291
(gdb) frame 4
#4  0x00007feb6b21601b in avl_add (tree=tree@entry=0x5639784be320 <_launched_processes>, new_node=new_node@entry=0x563979d54de0) at ../../module/avl/avl.c:640
640	../../module/avl/avl.c: No such file or directory.
(gdb)  p *(struct launched_process_node *)0x563979d54de0
$1 = {node = {avl_child = {0x0, 0x0}, avl_pcb = 0}, pid = 24641, eid = 1644, name = 0x563979d576b0 "all-syslog.sh"}
(gdb) p *tree
$2 = {avl_root = 0x563979d574a0, avl_compar = 0x5639782b12a0 <_launched_process_node_compare>, avl_offset = 0, avl_numnodes = 1, avl_size = 48}
(gdb) p *(struct avl_node *) 0x563979d574a0
$3 = {avl_child = {0x0, 0x0}, avl_pcb = 1}
(gdb) frame 5
#5  0x00005639782b14f9 in _zed_exec_fork_child (eid=eid@entry=1644, dir=<optimized out>, prog=prog@entry=0x563979d4e3d0 "all-syslog.sh", env=env@entry=0x563979d57bc0,
    zfd=7, in_foreground=B_TRUE) at zed_exec.c:180
180	zed_exec.c: No such file or directory.
(gdb) p _launched_processes_lock
$4 = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with no waiters, Owner ID = 1851, Robust = No, Shared = No, Protocol = None}
(gdb) p pid
$5 = 24641
(gdb) p eid
$6 = 1644
(gdb) thread 3
[Switching to thread 3 (Thread 0x7feb637fe700 (LWP 1936))]
#0  0x00007feb6a5a7fc2 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:30
30	../sysdeps/unix/sysv/linux/pause.c: No such file or directory.
(gdb) bt
#0  0x00007feb6a5a7fc2 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:30
#1  0x00005639782b18b5 in _reap_children (arg=<optimized out>) at zed_exec.c:214
#2  0x00007feb6a59d6db in start_thread (arg=0x7feb637fe700) at pthread_create.c:463
#3  0x00007feb6a2c671f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) frame 1
#1  0x00005639782b18b5 in _reap_children (arg=<optimized out>) at zed_exec.c:214
214	zed_exec.c: No such file or directory.
(gdb) p pid
$7 = -1
@don-brady don-brady added Type: Defect Incorrect behavior (e.g. crash, hang) Status: Triage Needed New issue which needs to be triaged labels Apr 28, 2021
@behlendorf
Contributor

cc: @nabijaczleweli

@nabijaczleweli
Contributor

nabijaczleweli commented Apr 28, 2021

blergh. so, apparently, and this is fun:

A process ID shall not be reused by the system until the process lifetime ends.

good, great, this is defined as

The period of time that begins when a process is created and ends when its process ID is returned to the system.

and then

The remains of a live process (see Live Process) after it terminates (see Process Termination) and before its status information (see XSH Status Information) is consumed by its parent process.

and a Live Process is

An address space with one or more threads executing within that address space, and the required system resources for those threads.
Many of the system resources defined by POSIX.1-2017 are shared among all of the threads within a process. These include the process ID, [&c.].

However, according to Linux man-pages' wait(2):

The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child. As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes.

So, according to POSIX, re-using the PID for an already existing zombie might or might not be legal, depending on how you choose to read it (it appears to be defined as "it's reused when it can be", which is very helpful), but according to Linux man-pages, it should not happen, i.e. a Live Process cannot share a PID with a Zombie Process.

And, since the reaper thread is in the idle state with pid = -1, the last time it waited there were no processes to wait for, since wait() returned ECHILD.

a riddle for the ages
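
For readers following along, a minimal sketch of the reaper-poll shape this refers to (simplified, not the exact code; see _reap_children in zed_exec.c for the real loop): with WNOHANG, wait4() returns the PID of a reaped child, 0 if children exist but none have exited yet, or -1 with errno == ECHILD if there are no children at all, which is how the thread ends up idle with pid = -1.

#include <errno.h>
#include <pthread.h>
#include <sys/resource.h>
#include <sys/wait.h>

static void
reap_once(pthread_mutex_t *lock)
{
	struct rusage usage;
	int status;
	pid_t pid;

	(void) pthread_mutex_lock(lock);
	pid = wait4(0, &status, WNOHANG, &usage);
	if (pid > 0) {
		/* a child was reaped: look its pid up in the tree of
		 * launched processes and remove the matching node */
	} else if (pid == -1 && errno == ECHILD) {
		/* no children left at all -- the thread goes idle */
	}
	(void) pthread_mutex_unlock(lock);
}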

nabijaczleweli added commits to nabijaczleweli/zfs that referenced this issue Apr 28, 2021
@nabijaczleweli
Contributor

@don-brady can you try #11965 maybe? If it works, then great, but if not, then maybe it's time to come up with an alternative solution.

@h1z1

h1z1 commented May 1, 2021

"its reused when it can be"

Not quite. It's reused after it's reaped, which can be done by the parent or by init. I don't think even SunOS recycled a zombie PID. It is a common mistake to assume the PID you saved will in fact refer to the same process later.

The child and parent may not even exist in the same namespaces :)
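
A quick standalone demo of that point on Linux (just an illustration, not zed code): once the parent has reaped the child, the PID becomes available again, and on an otherwise idle system the kernel hands it back out after cycling through the PID space.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
	pid_t victim, p;
	long i;

	victim = fork();
	if (victim < 0)
		return (1);
	if (victim == 0)
		_exit(0);			/* child exits immediately */
	(void) waitpid(victim, NULL, 0);	/* reap it: the PID is now free */

	/* Fork-and-reap until the kernel hands the same PID back. */
	for (i = 1; i <= 200000; i++) {
		p = fork();
		if (p < 0)
			return (1);
		if (p == 0)
			_exit(0);
		(void) waitpid(p, NULL, 0);
		if (p == victim) {
			printf("PID %d reused after %ld forks\n",
			    (int)victim, i);
			return (0);
		}
	}
	printf("PID %d not reused within %ld forks\n", (int)victim, i - 1);
	return (1);
}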

@nabijaczleweli
Contributor

nabijaczleweli commented May 1, 2021

Well, the POSIX definition is circular, but I really don't expect any system to reuse zombie PIDs, which is why it was so surprising to hit this.
But considering we are the parent and track the PID from its conception until we rip apart its steaming corpse, that is guaranteed to be the same process. Which, again, is why it was so surprising to hit this.

Thinking about it again, we could race fork-die-reap-metadata_insertion, but that (and more) is handled in #11965.

Also, changing namespaces alters the view from within, not from without.

@nabijaczleweli
Contributor

nabijaczleweli commented May 1, 2021

Great news! I managed to reproduce this by starving the system of PIDs and doing:

diff --git a/cmd/zed/zed_exec.c b/cmd/zed/zed_exec.c
index 9b0775a74..9948517a7 100644
--- a/cmd/zed/zed_exec.c
+++ b/cmd/zed/zed_exec.c
@@ -206,6 +208,7 @@ _reap_children(void *arg)

        for (_reap_children_stop = B_FALSE; !_reap_children_stop; ) {
                (void) pthread_mutex_lock(&_launched_processes_lock);
+               sleep(1);
                pid = wait4(0, &status, WNOHANG, &usage);

It manifests the same way as yours does:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0xf7d26515 in __GI_abort () at abort.c:79
#2  0xf7f46e05 in ?? () from /lib/x86_64-linux-gnux32/libzfs_core.so.3
#3  0xf7f4ed58 in avl_add () from /lib/x86_64-linux-gnux32/libzfs_core.so.3
#4  0x00408a96 in _zed_exec_fork_child (eid=eid@entry=2485, dir=<optimized out>,
    prog=prog@entry=0x41f310 "history_event-zfs-list-cacher.sh", env=env@entry=0x5ec4d0, zfd=<optimized out>,
    in_foreground=B_TRUE) at zed_exec.c:182
#5  0x00408e40 in zed_exec_process (eid=2485, class=<optimized out>, subclass=subclass@entry=0x5e7d08 "history_event",
    zcp=zcp@entry=0xffffd530, envs=envs@entry=0x5eabe0) at zed_exec.c:364
#6  0x00408536 in zed_event_service (zcp=zcp@entry=0xffffd530) at zed_event.c:979
#7  0x00404090 in main (argc=<optimized out>, argv=<optimized out>) at zed.c:291

With this node being inserted:

(gdb) p *(struct launched_process_node *)0x5e9b40
$14 = {node = {avl_child = {0x0, 0x0}, avl_parent = 0x0, avl_child_index = 0, avl_balance = 0}, pid = 353, eid = 2485,
  name = 0x5e9a30 "history_event-zfs-list-cacher.sh"}

and this one already existing:

(gdb) p *(struct launched_process_node *)avl_find(&_launched_processes, (struct launched_process_node *)0x5e9b40, (void*)0)
$13 = {node = {avl_child = {0x5e8030, 0x5e84a0}, avl_parent = 0x5ea790, avl_child_index = 0, avl_balance = -1},
  pid = 353, eid = 2477, name = 0x5e9080 "all-debug.sh"}
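
(The comment above doesn't record how the PID space was starved; for anyone else trying to reproduce, one hypothetical way, purely an assumption and not necessarily what was used here, is to shrink kernel.pid_max so the kernel recycles PIDs almost immediately, e.g. with a root-only helper like this:)

#include <stdio.h>

int
main(void)
{
	/* 400 is an arbitrary small value; the kernel enforces a lower
	 * bound of about 300, so anything near that makes PIDs wrap fast. */
	FILE *f = fopen("/proc/sys/kernel/pid_max", "w");

	if (f == NULL) {
		perror("/proc/sys/kernel/pid_max");
		return (1);
	}
	fprintf(f, "400\n");
	return (fclose(f) == 0 ? 0 : 1);
}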

nabijaczleweli added a commit to nabijaczleweli/zfs that referenced this issue May 1, 2021
This can be very easily triggered by adding a sleep(1) before
the wait4() on a PID-starved system: the reaper thread would wait
for a child before its entry appeared, letting old entries accumulate:
  Invoking "all-debug.sh" eid=3021 pid=391
  Finished "(null)" eid=0 pid=391 time=0.002432s exit=0
  Invoking "all-syslog.sh" eid=3021 pid=336
  Finished "(null)" eid=0 pid=336 time=0.002432s exit=0
  Invoking "history_event-zfs-list-cacher.sh" eid=3021 pid=347
  Invoking "all-debug.sh" eid=3022 pid=349
  Finished "history_event-zfs-list-cacher.sh" eid=3021 pid=347 time=0.001669s exit=0
  Finished "(null)" eid=0 pid=349 time=0.002404s exit=0
  Invoking "all-syslog.sh" eid=3022 pid=370
  Finished "(null)" eid=0 pid=370 time=0.002427s exit=0
  Invoking "history_event-zfs-list-cacher.sh" eid=3022 pid=391
  avl_find(tree, new_node, &where) == NULL
  ASSERT at ../../module/avl/avl.c:641:avl_add()
  Thread 1 "zed" received signal SIGABRT, Aborted.

By employing this wider lock,
we atomise [wait, remove] and [fork, add]:
slowing down the reaper thread now just causes some zombies to accumulate until it can get to them

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11963
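
To make the "wider lock" concrete, here is the shape of the locking the commit message describes, sketched against the zed_exec.c names rather than quoted from the patch: both the forking path and the reaper hold _launched_processes_lock across the whole pair of operations, so a child can no longer be reaped before its tree entry exists, and a stale entry can no longer be left behind to collide with a reused PID. Slowing the reaper down then only delays zombie reaping instead of corrupting the tree.

/* forking path (_zed_exec_fork_child), sketched */
(void) pthread_mutex_lock(&_launched_processes_lock);
pid = fork();
if (pid == 0) {
	/* child: exec the zedlet; never returns */
} else if (pid > 0) {
	node->pid = pid;
	node->eid = eid;
	avl_add(&_launched_processes, node);	/* entry exists before anyone can reap */
}
(void) pthread_mutex_unlock(&_launched_processes_lock);

/* reaper thread (_reap_children), sketched */
(void) pthread_mutex_lock(&_launched_processes_lock);
pid = wait4(0, &status, WNOHANG, &usage);
if (pid > 0) {
	search.pid = pid;
	node = avl_find(&_launched_processes, &search, NULL);
	if (node != NULL)
		avl_remove(&_launched_processes, node);
}
(void) pthread_mutex_unlock(&_launched_processes_lock);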
behlendorf pushed a commit to behlendorf/zfs that referenced this issue May 10, 2021
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Don Brady <[email protected]>
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Closes openzfs#11963
Closes openzfs#11965
sempervictus pushed a commit to sempervictus/zfs that referenced this issue May 31, 2021