
process hangs on system startup after CRIU restore #1911

Open
@PavloMykhailyshyn

Description


My service runs a CRIU dump for each process simultaneously (several of the processes may be the same binary started with different arguments).
When all dumps are complete, the service exits and the system is rebooted.
After boot, the service starts again, walks through each dump folder, and runs a CRIU restore on every image simultaneously.

The service dumps with:
/usr/sbin/criu-ns dump --images-dir /var/lib/dumps/images/zeb2ft-s5kpj2-64gu7z --shell-job --ext-unix-sk -v4 --log-file ../../logs/dump/2022-06-07T15:31:40Z_zeb2ft-s5kpj2-64gu7z.log --action-script /usr/local/sbin/criu_action_script.sh --tree 385732 --ghost-limit 1G --tcp-established

and restores with:
/usr/sbin/criu-ns restore --images-dir /var/lib/dumps/images/zeb2ft-s5kpj2-64gu7z --shell-job --ext-unix-sk -v4 --log-file ../../logs/restore/2022-06-07T15:36:06Z_zeb2ft-s5kpj2-64gu7z.log --action-script /usr/local/sbin/criu_action_script.sh --restore-detached --tcp-close
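
For reference, the restore pass can be sketched roughly as below. This is a minimal illustration, not the actual service code: the function names `restore_one` and `restore_all` and the variables `IMAGES_ROOT` and `CRIU` are hypothetical, and the flags are copied from the restore command above. Each image directory is restored in a background job, and the loop waits for all of them at the end.

```shell
#!/bin/sh
# Hypothetical sketch of the restore pass; names and layout are assumptions.
IMAGES_ROOT="${IMAGES_ROOT:-/var/lib/dumps/images}"
CRIU="${CRIU:-/usr/sbin/criu-ns}"

# Restore a single image directory, using the same flags as in the report.
restore_one() {
    dir="$1"
    id=$(basename "$dir")
    stamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    "$CRIU" restore --images-dir "$dir" --shell-job --ext-unix-sk -v4 \
        --log-file "../../logs/restore/${stamp}_${id}.log" \
        --action-script /usr/local/sbin/criu_action_script.sh \
        --restore-detached --tcp-close
}

# Start one restore per dump folder in the background, then wait for all of them.
restore_all() {
    for dir in "$IMAGES_ROOT"/*/; do
        [ -d "$dir" ] || continue
        restore_one "${dir%/}" &
    done
    wait
}
```

Calling `restore_all` after boot corresponds to the "runs a CRIU restore on every image simultaneously" step.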

Dumps and restores always succeed, but the restored process hangs. This is only reproducible after the system has been rebooted; when the dump and restore happen without a reboot in between, everything works perfectly.

The process is stuck here (this is the restored process, not CRIU; CRIU itself finished successfully):

Thread 30 (Thread 0x7f01b27fc700 (LWP 2799)):
#0  __libc_read (nbytes=5, buf=0x11bd0f73, fd=6) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __libc_read (fd=6, buf=0x11bd0f73, nbytes=5) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007f01cad0e3d9 in ?? () from target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
#3  0x00007f01cad0967e in ?? () from target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
#4  0x00007f01cad084d4 in ?? () from target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
#5  0x00007f01cad08aa7 in BIO_read () from target:/lib/x86_64-linux-gnu/libcrypto.so.1.1
#6  0x00007f01cabf1b91 in ?? () from target:/lib/x86_64-linux-gnu/libssl.so.1.1
#7  0x00007f01cabf5e1e in ?? () from target:/lib/x86_64-linux-gnu/libssl.so.1.1
#8  0x00007f01cabf36d0 in ?? () from target:/lib/x86_64-linux-gnu/libssl.so.1.1
#9  0x00007f01cabfac45 in ?? () from target:/lib/x86_64-linux-gnu/libssl.so.1.1
#10 0x00007f01cac05a3f in ?? () from target:/lib/x86_64-linux-gnu/libssl.so.1.1
#11 0x00007f01cac05b47 in SSL_read () from target:/lib/x86_64-linux-gnu/libssl.so.1.1
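
Since the thread is blocked in `read()` on fd 6, it may help to check what that descriptor points to in the restored process. The following is a small sketch assuming a Linux `/proc` filesystem; `fd_target` and `socket_inode` are hypothetical helper names, and the PID has to be replaced with that of the hung process.

```shell
#!/bin/sh
# Print what a descriptor of a process points to,
# e.g. "socket:[123456]" for a socket or a path for a regular file.
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# Extract the inode number from a "socket:[N]" link target.
# Prints nothing if the target is not a socket.
socket_inode() {
    printf '%s\n' "$1" | sed -n 's/^socket:\[\([0-9][0-9]*\)\]$/\1/p'
}
```

For example, `socket_inode "$(fd_target <pid> 6)"` prints the socket inode, which can then be compared with the `ino:` field in the output of `ss -tane` to see which connection it is and what state it is in, if it is listed there.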

How can I solve this hang? Could the connections have been restored incorrectly?
Is CRIU doing something wrong, or is my system (the reboots, etc.) causing this behavior?

criu --version
Version: 3.16.1
