Commit 3bf115c

freeze_processes: implement kludges for cgroup v1
Cgroup v1 freezer has always been problematic, failing to freeze a cgroup. In runc, we have implemented a few kludges to increase the chance of succeeding, but those are used when runc freezes a cgroup for its own purposes (for "runc pause" and to modify device properties for cgroup v1).

When criu is used, it fails to freeze a cgroup from time to time (see [1], [2]). Let's try adding kludges similar to ones in runc.

Alas, I have absolutely no way to test this, so please review carefully.

[1]: opencontainers/runc#4273
[2]: opencontainers/runc#4457

Signed-off-by: Kir Kolyshkin <[email protected]>
1 parent a678a3b commit 3bf115c

1 file changed: +31 -1 lines changed

criu/seize.c

@@ -542,6 +542,7 @@ static int freeze_processes(void)
 	enum freezer_state state = THAWED;
 
 	static const unsigned long step_ms = 100;
+	/* Since opts.timeout is in seconds, multiply it by 1000 to convert to milliseconds. */
 	unsigned long nr_attempts = (opts.timeout * 1000) / step_ms;
 	unsigned long i = 0;
 
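To make the unit conversion in this hunk concrete, here is a minimal sketch of the same arithmetic. The helper names `attempts_for_timeout` and `max_poll_iterations` are illustrative assumptions and do not exist in criu; only the expression `(timeout * 1000) / step_ms` and the loop condition `i <= nr_attempts` come from the diff.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of criu): converts a timeout given in
 * seconds into a number of polling attempts, using the same expression
 * as the diff: (timeout * 1000) / step_ms.
 */
static unsigned long attempts_for_timeout(unsigned long timeout_s,
					  unsigned long step_ms)
{
	assert(step_ms > 0); /* avoid division by zero */
	return (timeout_s * 1000) / step_ms;
}

/*
 * Hypothetical helper: the loop in the diff runs while i <= nr_attempts,
 * so, assuming i starts at 0, the body can execute nr_attempts + 1 times.
 */
static unsigned long max_poll_iterations(unsigned long nr_attempts)
{
	return nr_attempts + 1;
}
```

With a 100 ms step, a 10 second timeout gives 100 attempts, and the loop body may therefore run up to 101 times before the timeout check after the loop triggers.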
@@ -586,6 +587,7 @@ static int freeze_processes(void)
 		 * transition stage.
 		 */
 		for (; i <= nr_attempts; i++) {
+			nanosleep(&req, NULL);
 			state = get_freezer_state(fd);
 			if (state == FREEZER_ERROR) {
 				close(fd);
@@ -598,7 +600,35 @@ static int freeze_processes(void)
 				pr_err("Unable to freeze cgroup %s (timed out)\n", opts.freeze_cgroup);
 				goto err;
 			}
-			nanosleep(&req, NULL);
+
+			if (cgroup_v2)
+				continue;
+
+			/* As per older kernel docs (freezer-subsystem.txt before
+			 * the kernel commit ef9fe980c6fcc1821), if FREEZING is seen,
+			 * userspace should either retry or thaw. While current
+			 * kernel cgroup v1 docs no longer mention a need to retry,
+			 * even recent kernels can't reliably freeze a cgroup v1.
+			 *
+			 * Let's keep asking the kernel to freeze from time to time.
+			 * In addition, do occasional thaw/sleep/freeze.
+			 *
+			 * This is still a game of chances (the real fix belongs to the kernel)
+			 * but these kludges might improve the probability of success.
+			 *
+			 * Cgroup v2 does not have this problem.
+			 */
+			switch (i % 32) {
+			case 9:
+			case 20:
+				freezer_write_state(fd, FROZEN);
+				break;
+			case 31:
+				freezer_write_state(fd, THAWED);
+				nanosleep(&req, NULL);
+				freezer_write_state(fd, FROZEN);
+				break;
+			}
 		}
 
 		if (i > nr_attempts) {
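
The retry schedule encoded by `switch (i % 32)` may be easier to review in isolation. The sketch below models only the decision logic, under the assumption that the cgroup never reaches FROZEN (so the loop never exits early). The enum `kludge_action` and the helpers `kludge_for_iteration` and `count_kludges` are hypothetical names for illustration; the real code calls `freezer_write_state()` directly, as shown in the diff.

```c
#include <assert.h>

/* Hypothetical labels for what the loop does on a given iteration. */
enum kludge_action {
	ACT_NONE,        /* only poll the freezer state */
	ACT_REFREEZE,    /* write FROZEN again */
	ACT_THAW_FREEZE, /* write THAWED, sleep one step, write FROZEN */
};

/*
 * Mirrors the switch (i % 32) from the diff. On cgroup v2 the loop
 * does "continue" before the switch, so no kludge is ever applied.
 */
static enum kludge_action kludge_for_iteration(unsigned long i, int cgroup_v2)
{
	if (cgroup_v2)
		return ACT_NONE;

	switch (i % 32) {
	case 9:
	case 20:
		return ACT_REFREEZE;
	case 31:
		return ACT_THAW_FREEZE;
	default:
		return ACT_NONE;
	}
}

/*
 * Counts how often a given action would fire over iterations
 * 0..nr_attempts inclusive, assuming the loop never breaks early.
 */
static unsigned long count_kludges(unsigned long nr_attempts, int cgroup_v2,
				   enum kludge_action action)
{
	unsigned long i, n = 0;

	for (i = 0; i <= nr_attempts; i++)
		if (kludge_for_iteration(i, cgroup_v2) == action)
			n++;
	return n;
}
```

For example, with `nr_attempts` of 100 (a 10 second timeout at a 100 ms step), this model yields at most 6 re-freeze writes (iterations 9, 20, 41, 52, 73, 84) and 3 thaw/sleep/freeze cycles (iterations 31, 63, 95) on cgroup v1, and none on cgroup v2. Each thaw cycle also adds one extra `step_ms` sleep that the attempt count does not account for.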
