syzbot: lots of SYZFAIL: ebtable checkpoint: socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) crashes #5956

Open
a-nogikh opened this issue Apr 17, 2025 · 13 comments

@a-nogikh
Collaborator

Context

In the upstream Linux namespace, the SYZFAIL: ebtable checkpoint: socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) crash has been responsible for up to 10% of daily syzbot crashes.

Its frequency recently dropped by 10x, but then it went up again (though it's still less frequent than it used to be).

The rise is likely because another frequent net fuzzing crasher, unregister_netdevice: waiting for DEV to become free, got fixed in the net tree. Since that moment, there has been a surge of SYZFAIL: ebtable checkpoint: socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) crashes on ci-upstream-net-this-kasan-gce again.

In any case, we should figure out whether it's a manifestation of a real kernel bug or whether our checkpoint code in syz-executor must be changed.

The reproducers look similar:

#{"repeat":true,"procs":1,"slowdown":1,"sandbox":"none","tun":true,"netdev":true,"resetnet":true,"cgroups":true,"binfmt_misc":true,"close_fds":true,"usb":true,"vhci":true,"wifi":true,"ieee802154":true,"sysctl":true,"tmpdir":true,"segv":true}
bpf$BPF_TASK_FD_QUERY(0x14, &(0x7f0000000080)={0x0, 0xffffffffffffffff, 0x0, 0x7, &(0x7f0000000000)='cgroup\x00'}, 0x30)
r0 = bpf$PROG_LOAD(0x5, &(0x7f0000000080)={0x9, 0x4, &(0x7f0000000700)=ANY=[@ANYBLOB="18000000000000000000000000000000611200000000000095000000000000001383096e16281fd43e588cf7a1e65f316e5e5600f1fb642cb352b9d4c50ae8366e5cadf97f4e52fdb37bdab01f9f6cc297b10500c98ea973fbaf38f9d47c5702c2bd9ebf0134b54dbee7458404277462d8ac80053e629d28aa5b25e324fd54d237d7921ff7b52f78ad9692619113594630a9eb6490c61332499f4861a57120ea351e61ca79b452a2bffd133c9ce1b4049b537a6310d0ee13db80ad6553ed19a04679d0d66bf61277501f370105113bd565ae2e766f9a79e314ecbc4000b4702ecfcaed9cb384edf20b1d3e7011bd384577a5a78efdd8687e0574465e490aa62e217fa49e4167d7edcd030c20937155d065ee7bb686bffcf28ec73d58a1d795c358c5aee99cae4c959ba2b9a78b4e231c46f8030523faf5b79ef84c5201a69d776df2041ae3d19a3d03fb1f2913fdd3fef24c94f1e224f872c1bebc0a7622231b2be88508a13a5b74e417cdad2076dd0ccdf44daf7404337f84783856b8582065669a46c1d570cdf4d6ce259d39fdc6bc4f066eb27ba18fc0110ebf3eb081d09b8587c911260c2ca2f49825e10b20733735ec2f4a80308c92dac2cac1608cbd739d385703e2933fda0dde43f3270d7170a7f5ce1dad0a2ae4691cc8487e113b89df89fd1d3c51723d79966e8c2eae12cf2dcfa7c09b15de3f494c5bfc35a8ac8124fb66066b2b3c7db6585b2fe802e86d2794d885c779de4ab1a0999fcedaea0b0497927b536e120212681673509f2aa7ad0875d5be6fa5f5812dd7966978f435924026737b78156906c3faf9e84f0cfb70a8d326262ce7ceadf4f95a7afb2bdf8250af753f32"], &(0x7f0000000100)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, 0x0, 0xffffffffffffffff, 0x8, 0x0, 0x0, 0x10, 0x0}, 0x70)
r1 = openat$cgroup_root(0xffffffffffffff9c, &(0x7f0000000000), 0x200002, 0x0)
bpf$BPF_PROG_DETACH(0x8, &(0x7f0000000040)={@cgroup=r1, r0, 0x2}, 0x10)

I was unable to crash a v6.15-rc2 kernel running on qemu using the reproducer above.

Progress

In another related discussion, @FlorentRevest concluded that the bpf program included in the reproducer simply returns 0.

a-nogikh added the bug label Apr 17, 2025
@a-nogikh
Collaborator Author

A local patched qemu-based instance is able to trigger the bug.

After some debugging, it seems that the socket() call is rejected after calling BPF_CGROUP_RUN_PROG_INET_SOCK(sk) here: https://elixir.bootlin.com/linux/v6.15-rc1/source/net/ipv4/af_inet.c#L391

So it does indeed look as if it's the fuzzer itself that prohibits socket creation.
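
For reference, here is a minimal standalone sketch of that mechanism (an illustration under assumptions, not the reproducer's or the executor's code; it assumes root and a cgroup-v2 hierarchy mounted at /sys/fs/cgroup): a BPF_PROG_TYPE_CGROUP_SOCK program that just returns 0, attached with BPF_CGROUP_INET_SOCK_CREATE, makes every subsequent socket(AF_INET, ...) call in that cgroup fail with EPERM, which is exactly what the checkpoint code then reports as SYZFAIL.

// Minimal sketch: deny socket creation via a cgroup BPF program.
// Assumes root and a cgroup-v2 hierarchy mounted at /sys/fs/cgroup.
#include <fcntl.h>
#include <linux/bpf.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

static long sys_bpf(int cmd, union bpf_attr* attr, unsigned size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

int main(void)
{
	// "r0 = 0; exit" -- for BPF_CGROUP_INET_SOCK_CREATE, returning 0 means "deny".
	struct bpf_insn insns[] = {
	    {.code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0},
	    {.code = BPF_JMP | BPF_EXIT},
	};
	union bpf_attr load = {};
	load.prog_type = BPF_PROG_TYPE_CGROUP_SOCK;
	load.insns = (unsigned long)insns;
	load.insn_cnt = 2;
	load.license = (unsigned long)"GPL";
	load.expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE;
	int prog = sys_bpf(BPF_PROG_LOAD, &load, sizeof(load));
	if (prog < 0)
		return perror("BPF_PROG_LOAD"), 1;

	int cg = open("/sys/fs/cgroup", O_RDONLY);
	union bpf_attr attach = {};
	attach.target_fd = cg;
	attach.attach_bpf_fd = prog;
	attach.attach_type = BPF_CGROUP_INET_SOCK_CREATE;
	attach.attach_flags = BPF_F_ALLOW_MULTI;
	if (sys_bpf(BPF_PROG_ATTACH, &attach, sizeof(attach)))
		return perror("BPF_PROG_ATTACH"), 1;

	// This now fails with EPERM, just like the executor's checkpoint socket().
	if (socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) < 0)
		perror("socket");
	return 0;
}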

@a-nogikh
Collaborator Author

Should we replace the socket() call by syz_init_net_socket() or something similar?

Creating a cgroup namespace per test seems like overkill. On the other hand, if a single program is able to prohibit such basic syscalls for the rest of the proc's lifetime, there's little value in letting the proc run further.

Cc @dvyukov

@dvyukov
Collaborator

dvyukov commented Apr 22, 2025

Should we replace the socket() call by syz_init_net_socket() or something similar?

The socket should be in the test net namespace to reset the right state.

Creating a cgroup namespace per test seems like an overkill.

Perhaps we could re-create it as part of the sandbox creation. Before we produce a SYZFAIL, we should try to re-create the sandbox several times.

Another possible option is to mark some set of syscalls that have dangerous global effects as "snapshot-mode only" and test them only in snapshot mode. We already have a bunch of issues with perf and create separate instances just for those to achieve a similar effect. There is also a bunch of syscalls that we simply disable entirely, or don't describe, for similar reasons (e.g. only in program sanitization). See #5308.
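
For illustration, the "retry before SYZFAIL" idea could take roughly this shape (hypothetical names and structure, not the actual executor code; recreate_sandbox()/checkpoint_net() are stand-ins for the real setup and checkpoint steps):

// Hypothetical sketch of retrying sandbox re-creation before reporting SYZFAIL.
#include <stdio.h>
#include <stdlib.h>

static int recreate_sandbox(void) { return 0; } // stand-in for the real sandbox setup
static int checkpoint_net(void) { return 0; }   // stand-in for the real net checkpoint

static void checkpoint_with_retries(void)
{
	for (int attempt = 0; attempt < 3; attempt++) {
		// If the checkpoint socket() fails, rebuild the sandbox and try again.
		if (recreate_sandbox() == 0 && checkpoint_net() == 0)
			return;
	}
	fprintf(stderr, "SYZFAIL: ebtable checkpoint: socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)\n");
	exit(1);
}

int main(void)
{
	checkpoint_with_retries();
	return 0;
}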

@a-nogikh
Collaborator Author

Perhaps we could re-create it as part of the sandbox creation.

You mean calling setup_cgroups() not in the runner executor process before creating the procs, but as part of the sandboxing setup of each proc? Or do you mean something else here?

There is also a bunch of syscalls that we simply disable entirely, or not describe, for similar reasons (e.g. the only in program sanitization).

Then it looks like we'd need to ban all of the BPF_PROG_TYPE_CGROUP_* bpf program types: https://docs.ebpf.io/linux/program-type/BPF_PROG_TYPE_CGROUP_SOCK/

or sanitize BPF_PROG_ATTACH to never attach to an EXP_CGROUP.
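
For illustration only (syzkaller's actual call sanitization is implemented elsewhere and not in this form), a check along these lines could flag the cgroup attach types before a BPF_PROG_ATTACH is issued; the list of attach types is a non-exhaustive assumption:

// Hypothetical helper: classify bpf_attach_type values that hook into cgroups.
// Non-exhaustive; shown only to illustrate the "sanitize BPF_PROG_ATTACH" option.
#include <linux/bpf.h>
#include <stdbool.h>

static bool is_cgroup_attach_type(unsigned int type)
{
	switch (type) {
	case BPF_CGROUP_INET_INGRESS:
	case BPF_CGROUP_INET_EGRESS:
	case BPF_CGROUP_INET_SOCK_CREATE:
	case BPF_CGROUP_INET_SOCK_RELEASE:
	case BPF_CGROUP_INET4_BIND:
	case BPF_CGROUP_INET6_BIND:
	case BPF_CGROUP_INET4_CONNECT:
	case BPF_CGROUP_INET6_CONNECT:
	case BPF_CGROUP_SOCK_OPS:
	case BPF_CGROUP_DEVICE:
	case BPF_CGROUP_SYSCTL:
		return true;
	default:
		return false;
	}
}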

@dvyukov
Collaborator

dvyukov commented Apr 22, 2025

You mean calling setup_cgroups() not in the runner executor process before creating the procs

Probably not as simple as moving the call, but, yes, something like that.

or sanitize BPF_PROG_ATTACH to never attach to a EXP_CGROUP.

Is it the worst type of BPF hook? It feels like any global hooks should be much worse than anything attached to a single cgroup.

@a-nogikh
Collaborator Author

Is it the worst type of BPF hooks?

I don't know. The documentation also mentions LSM hooks; those also sound like they may have global consequences.

@a-nogikh
Collaborator Author

There are also the CONFIG_CGROUP_BPF and CONFIG_BPF_LSM kernel options, but I am not sure whether we want to add this dimension to our kernel config generation. Still, that would be the most reliable way to prevent these crashes from happening.

@a-nogikh
Collaborator Author

As discussed offline: to resolve this specific case, we could do CLONE_NEWCGROUP and set up cgroups for each proc independently. Hopefully the total diff will be small.
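
A rough sketch of that direction (hypothetical; names and steps are assumptions, not the actual syz-executor change): on each proc restart, re-create the per-proc cgroup, which drops any BPF programs attached directly to it, join it, and then unshare the cgroup namespace so the proc only sees its own subtree. Error handling and cleanup of leftover processes/sub-cgroups are omitted.

// Hypothetical per-proc cgroup (re)setup; not the actual executor code.
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void recreate_proc_cgroup(unsigned long long procid)
{
	char cgroupdir[64];
	snprintf(cgroupdir, sizeof(cgroupdir), "/syzcgroup/unified/syz%llu", procid);
	// Removing the cgroup also drops BPF programs attached directly to it.
	if (rmdir(cgroupdir) && errno != ENOENT)
		perror("rmdir");
	if (mkdir(cgroupdir, 0777))
		perror("mkdir");
	// Join the fresh cgroup so that the new cgroup namespace is rooted at it.
	char procs[96];
	snprintf(procs, sizeof(procs), "%s/cgroup.procs", cgroupdir);
	FILE* f = fopen(procs, "w");
	if (f) {
		fprintf(f, "%d", (int)getpid());
		fclose(f);
	}
	if (unshare(CLONE_NEWCGROUP))
		perror("unshare(CLONE_NEWCGROUP)");
}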

@a-nogikh
Collaborator Author

FTR

How cgroups are currently configured:

runner(argv, argc);

// This does any one-time setup for the requested features on the machine.
// Note: this can be called multiple times and must be idempotent.
#if SYZ_HAVE_FEATURES
setup_sysctl();
setup_cgroups();
#endif

This is where we first set them up.
mount_cgroups2(unified_controllers, sizeof(unified_controllers) / sizeof(unified_controllers[0]));
mount_cgroups("/syzcgroup/net", net_controllers, sizeof(net_controllers) / sizeof(net_controllers[0]));
mount_cgroups("/syzcgroup/cpu", cpu_controllers, sizeof(cpu_controllers) / sizeof(cpu_controllers[0]));


During per-proc sandboxing

We make /syzcgroup available within the chroot'ed fs for each proc:

initialize_cgroups();

if (mount("/syzcgroup/unified", "./syz-tmp/newroot/syzcgroup/unified", NULL, bind_mount_flags, NULL)) {


Before the execution loop

After sandboxing, right before starting the execution loop, we also do some more cgroup configuration.

setup_loop();

setup_cgroups_loop();

static void setup_cgroups_loop()

That apparently configures some per-proc sub-cgroups:

snprintf(cgroupdir, sizeof(cgroupdir), "/syzcgroup/unified/syz%llu", procid);
if (mkdir(cgroupdir, 0777)) {


And we also do some work for each program execution.

setup_test();

setup_cgroups_test();

It configures symlinks:

snprintf(cgroupdir, sizeof(cgroupdir), "/syzcgroup/unified/syz%llu", procid);
if (symlink(cgroupdir, "./cgroup")) {
debug("symlink(%s, ./cgroup) failed: %d\n", cgroupdir, errno);
}
snprintf(cgroupdir, sizeof(cgroupdir), "/syzcgroup/cpu/syz%llu", procid);
if (symlink(cgroupdir, "./cgroup.cpu")) {
debug("symlink(%s, ./cgroup.cpu) failed: %d\n", cgroupdir, errno);
}
snprintf(cgroupdir, sizeof(cgroupdir), "/syzcgroup/net/syz%llu", procid);
if (symlink(cgroupdir, "./cgroup.net")) {
debug("symlink(%s, ./cgroup.net) failed: %d\n", cgroupdir, errno);
}

@a-nogikh
Collaborator Author

The original SYZFAIL: ebtable checkpoint: socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) seems to be quite difficult to reliably reproduce, even though it happens quite frequently during fuzzing.

But SYZFAIL: ebtable: socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) (*) is easy to reproduce:

bpf$BPF_TASK_FD_QUERY(0x14, &(0x7f0000000240)={0x0, 0xffffffffffffffff, 0x0, 0x7, &(0x7f0000000000)='cgroup\x00'}, 0x30)
r0 = openat$cgroup_root(0xffffffffffffff9c, &(0x7f0000000000), 0x200002, 0x0)
r1 = bpf$PROG_LOAD(0x5, &(0x7f0000000080)={0x9, 0x4, &(0x7f00000008c0)=ANY=[@ANYBLOB="180000000000000000000000000000006112000000000000950000000000000051fa7824c74186dc02ec0696c37b64e3b24da3180100000005165c0f63cdc2e82818254950ee03568b8809a1ff4c7c4750eabfafcb9531b31e6a86827d1010c5a909ab98e00e19644a88e95ba26d1c9eecddb2d11c541418ceeb29b9b6829c6e433822bdb3cc85244aab60c1aae1314d7381fcfeb970bea672cf1e926f6a51479343144648a07a975bd89dc398712376610f6254f12495b4658319684387f6f3543205d4bc4ce05b8b961103673dff7f158052e62b20f05fd24108d8363d44fcd0f8f3647899762a17282a1914452d11f557c28f396eebdc858558db0276d14f9035f2b5f703e5be7e4acf8b78c2834ae5805fffee38a9a0033d520bcf6b08ede50899d4b9bdf85c71c5de2503dab358f42a2624c7daa9ed44039aab46419496362e54cfad05a0004ac71a003d7b85d07191bed4e5a890826300214146f7ed569985439baa355c2766dd056f5d79e454f3d873095e7a237bc06d035a8d601f21746d886419f38b34a495040000000071c2f0cce8c93cc17e9afa314fcb2ba15d646c66b0f65021829f87d988b4e2d71753b1549fa734f0b2e56dbd21ed2e09d0cddad721971637f384eed3034597c93e1c52f42cad0ed09c395dc6e9703660fefa1c80f467367c006f25caf0cbcefd13d68839893e39c588eb032905f91cafa4996dbf0c9be9654db05fb918086cc8228d02a3092c0830b8f587a5624515298b2d4eb2bde6f9a2eb83d53f717f13fa7552d92c51dbd32ea50c490ecd085d2811a7555c538cffffff7f00000000dd872244bfa64779e0f43a9c277e2910b7ccdc3d6726d34ad2101033a623ca2a49ad344884289130bc71cee2b7de62bf48129ae1af052a2d46a61625735a9eea7f793946b3229e861d8ea49806b3f7d4295f6b000000000000f337b1ceb2d8a65dcdcd895d7ba37098d2593fdaaef445af5bee02019c00000099b13ecda2a5b37de0519e974cba92ebaf0f701611a9b027ce04340bda4594cc9049c3f101629ab028145e004209ebe71a6fe84af50804000000000000004a27213354964e250a98fe357676f94b6947383e320fbb1118f586d5b9b1b977e1e1a4490ff67703a9b5900f8a6f8a805879dd91ec5ff435b219c53680c0ae04dcc4ef69b98fcb0d6b6a03a8b71a66b4e2876dc4b610444bf10000000000b046b6ae5d68156bcbd6d8793ade9a22ac8fc7857e5bbc14adc4e12b08f350c6789283b9990c72e64372a1f79769a8bdc632fc1a0b3417855d8b7d25ca4d404c23631ad3d2f55dcd385371c86170a4bca58c2b2b4eabc365f45bd10bb45b0c5bc354456a52be18d9b44014d20a3c51c8f013dade83562e73278662829e4f5a9ac00fd91178468c737f0872d97d38d11a176be5a0d7294c51eb161eddcfefa8837c7430721851ec2a107af0df6d43e732bbc01e76c66895eb85d36798d61622773591ee21ad9f6a1b73fa9cf3ffeb8a00b63af800a81d0fb8aa29df8b8ad6fbafefb5802a23cbdeeabceda5bfc5ff2fa5c1d61d04a1324794c6ed000696d9f04010c35474e690545c3d9bd836d4cef2585ba616e01c3d000000000000000000470ebc6f3453ecbf3047e4547d7632d3ad21798e730cb5d1da059b5bdb8107815dff995c0788906790406dfb4f8ee9f24ff94233e2e6e581e6e5de33a5f254c9a8b612547473c3001df3928dac9203b744619082421a8da7c00000000000000000000000000000018a73ef40cca690fb7595c6962984f8276677be6f66cbdbccf1896433808c9c84d74ac4a7c186a04a2250972f7acb156b21f9826b6acb7db32c4e3b3ec8b59fd972975edb1da872d81a35e4fda2f5cbde6b40bea20418c6e9dad30b791eea58f53e80fee4dd7fe08373ea2784fcd3a65261de71eb866458d2c22a"], &(0x7f0000000100)='GPL\x00', 0x0, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, @cgroup_sock, 0xffffffffffffffff, 0x8, 0x0, 0x0, 0x10, 0x0, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x10, 0x0, @void, @value}, 0x70)
bpf$BPF_PROG_DETACH(0x8, &(0x7f0000000040)={@cgroup=r0, r1, 0x2, 0x2, 0x0, @void, @value}, 0x10)

After running it under strace we get:

bpf(BPF_TASK_FD_QUERY, {task_fd_query={pid=0, fd=-1, flags=0, buf_len=7, buf="cgroup", prog_id=0, fd_type=BPF_FD_TYPE_RAW_TRACEPOINT, probe_offset=0, probe_addr=0}}, 48)
openat(AT_FDCWD, "cgroup", O_RDWR|O_PATH)
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_CGROUP_SOCK, insn_cnt=4, insns=0x2000000008c0, license="", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=-1, func_info_rec_size=8, func_info=NULL, func_info_cnt=0, line_info_rec_size=16, line_info=NULL, line_info_cnt=0, attach_btf_id=0}, 112 ...)
bpf(BPF_PROG_ATTACH, {target_fd=3, attach_bpf_fd=4, attach_type=BPF_CGROUP_INET_SOCK_CREATE, attach_flags=BPF_F_ALLOW_MULTI}, 16)

(so bpf$BPF_PROG_DETACH was misleading; it's in fact a BPF_PROG_ATTACH)

And ./cgroup is a symlink that we create here:

snprintf(cgroupdir, sizeof(cgroupdir), "/syzcgroup/unified/syz%llu", procid);
if (symlink(cgroupdir, "./cgroup")) {
debug("symlink(%s, ./cgroup) failed: %d\n", cgroupdir, errno);
}

So, judging by how cgroups are currently configured, we already have a separate cgroup for each proc. As mentioned, we could use a cgroup namespace to make sure that the cgroups are really recreated each time we restart the proc, but that doesn't change the fact that it's perfectly legal to configure bpf/cgroups to deny the socket() syscall.

If we recreate the proc on a SYZFAIL (which, afaik, we already do), we are going to hit the same problem, as the failing program will just configure it all again, even if we create a separate cgroup namespace for each proc and recreate the cgroup each time.

Are there any viable options other than disabling CONFIG_CGROUP_BPF / only allowing the operation on the snapshot instance?

@dvyukov @FlorentRevest


(*) This one is triggered here:

static void reset_loop()

static void reset_net_namespace(void)

static void reset_ebtables()

Given that the underlying scenario is the same, could it be that we only get SYZFAIL: ebtable checkpoint: socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) if reset_loop() for some reason was not called and the proc was restarted?

@dvyukov
Collaborator

dvyukov commented Apr 29, 2025

If we recreate the proc on the SYZFAIL (which we afaik already do), we are going to hit the same problem as the failing program will just configure it all again. Even if we create a separate cgroup namespace for each proc and recreate the cgroup each time.

Will it SYZFAIL again? As far as I can see, we call reset_loop after finishing the previous program, so it should succeed after the cgroup is recreated. When you test the program with syz-execprog, does it fail on the first run or on the second?
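
For example, running the reproducer a couple of times in a row with syz-execprog should show whether it's the first or a subsequent run that fails (repro.syz is a placeholder file name; adjust the flags to the local setup):

./syz-execprog -procs=1 -repeat=2 -sandbox=none repro.syz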

@a-nogikh
Collaborator Author

Is it that well synchronized? If reply_execute does not block until the parent process has processed the results, we will just immediately execute reset_loop() and print the SYZFAIL message / exit with a non-zero code.

reply_execute(0);

@dvyukov
Collaborator

dvyukov commented Apr 29, 2025

That's the child process that will exit. I don't immediately see why it should lead to any failures in the parent process.
