syzbot: lots of SYZFAIL: ebtable checkpoint: socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) crashes
#5956
Comments
A local patched qemu-based instance is able to trigger the bug. After some debugging, it seems that the ... So it does indeed look as if it's the fuzzer itself that prohibits the socket creation. |
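For illustration, here is a minimal sketch (not taken from the actual reproducer) of how a fuzzed program can end up prohibiting socket creation for its whole cgroup: a BPF_PROG_TYPE_CGROUP_SOCK program that returns 0 is attached at the BPF_CGROUP_INET_SOCK_CREATE hook. Loading/attaching needs CAP_SYS_ADMIN-level privileges, and the cgroup path used below is a hypothetical example, not an executor path taken from this thread.

```c
/*
 * Minimal sketch (not the actual reproducer): deny socket creation for every
 * process in a cgroup by attaching a BPF_PROG_TYPE_CGROUP_SOCK program that
 * returns 0 at the BPF_CGROUP_INET_SOCK_CREATE hook. Needs CAP_SYS_ADMIN.
 * The cgroup path below is a hypothetical example.
 */
#include <errno.h>
#include <fcntl.h>
#include <linux/bpf.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	/* r0 = 0; exit  -- returning 0 from this hook makes socket() fail with EPERM. */
	struct bpf_insn prog[] = {
		{ .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
		{ .code = BPF_JMP | BPF_EXIT },
	};
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK;
	attr.expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE;
	attr.insns = (unsigned long)prog;
	attr.insn_cnt = sizeof(prog) / sizeof(prog[0]);
	attr.license = (unsigned long)"GPL";
	int prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));

	/* Hypothetical per-proc cgroup directory; any cgroup the process lives in works. */
	int cg_fd = open("/syzcgroup/unified/syz0", O_RDONLY);

	memset(&attr, 0, sizeof(attr));
	attr.target_fd = cg_fd;
	attr.attach_bpf_fd = prog_fd;
	attr.attach_type = BPF_CGROUP_INET_SOCK_CREATE;
	syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));

	/* Every process in that cgroup now gets EPERM here. */
	int s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	printf("socket() = %d, errno = %d\n", s, s < 0 ? errno : 0);
	return 0;
}
```

If the executor later runs its ebtables checkpoint from within that same cgroup, its socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) call fails in exactly the way reported as SYZFAIL.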
Should we replace the ...? Creating a cgroup namespace per test seems like overkill. On the other hand, if we are able to prohibit such basic syscalls for the rest of the proc's lifetime, there's little value in letting it run further. Cc @dvyukov |
The socket should be in the test net namespace to reset the right state.
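A minimal sketch of that idea, assuming the sandbox keeps file descriptors for the test and original net namespaces (netns_fd and orig_netns_fd are hypothetical names, not existing executor variables):

```c
/* Sketch only: open the checkpoint socket inside the test net namespace so the
 * ebtables state that gets checkpointed/reset belongs to the right namespace.
 * netns_fd/orig_netns_fd are hypothetical descriptors kept by the sandbox. */
#define _GNU_SOURCE
#include <netinet/in.h>
#include <sched.h>
#include <sys/socket.h>

int open_checkpoint_socket_in_netns(int netns_fd, int orig_netns_fd)
{
	if (setns(netns_fd, CLONE_NEWNET))      /* enter the test net namespace */
		return -1;
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	setns(orig_netns_fd, CLONE_NEWNET);     /* switch back either way */
	return fd;
}
```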
Perhaps we could re-create it as part of the sandbox creation. Before we produce SYZFAIL we should try to re-create the sandbox several times. Another possible option is to mark some set of syscalls that have dangerous global effects as "snapshot-mode only", and test them only in snapshot mode. We already have a bunch of issues with perf, and create separate instances just for these to achieve a similar effect. There is also a bunch of syscalls that we simply disable entirely, or don't describe, for similar reasons (e.g. the ... only in program sanitization). See #5308. |
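For the "try to re-create the sandbox several times before producing SYZFAIL" part, a rough sketch of the control flow; create_sandbox(), teardown_sandbox() and fail() are hypothetical stand-ins for executor internals, stubbed out here only so the sketch compiles:

```c
/* Rough sketch of "retry sandbox creation before SYZFAIL". The helpers below are
 * hypothetical stand-ins for executor internals, stubbed so the sketch compiles. */
#include <stdio.h>
#include <stdlib.h>

static int create_sandbox(void) { return -1; }  /* stub: pretend setup keeps failing */
static void teardown_sandbox(void) {}           /* stub */
static void fail(const char* msg) { fprintf(stderr, "SYZFAIL: %s\n", msg); exit(1); }

static void create_sandbox_or_fail(void)
{
	const int kMaxSandboxRetries = 3; /* hypothetical retry budget */
	for (int attempt = 0; attempt < kMaxSandboxRetries; attempt++) {
		if (create_sandbox() == 0)
			return; /* the sandbox (and its checkpoint sockets) came up fine */
		/* Leftover state from the previous proc (e.g. a cgroup BPF hook) may still
		 * interfere; tear everything down and start from scratch. */
		teardown_sandbox();
	}
	fail("sandbox creation kept failing");
}

int main(void) { create_sandbox_or_fail(); return 0; }
```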
You mean calling ... ?
Then it looks like we'd need to ban all of ... or sanitize ... |
Probably not as simple as moving the call, but, yes, something like that.
Is it the worst type of BPF hook? It feels like any global hooks should be much worse than anything attached to a single cgroup. |
I don't know. The documentation also mentions LSM hooks; these also sound like they may have global consequences. |
There are also ... |
As discussed offline: to resolve this specific case, we could do ... |
FTR, how cgroups are currently configured (a simplified sketch follows below): syzkaller/executor/executor.cc, line 575 in 9c80ffa
syzkaller/executor/executor_runner.h Lines 678 to 683 in 9c80ffa
This is where we first set them up. syzkaller/executor/common_linux.h Lines 3798 to 3800 in 9c80ffa
During per-proc sandboxing: we make ... syzkaller/executor/common_linux.h, line 3966 in 9c80ffa
syzkaller/executor/common_linux.h Line 3894 in 9c80ffa
Before the execution loop: after sandboxing, right before starting the execution loop, we also do some more cgroup configuration. Line 612 in 9c80ffa
syzkaller/executor/common_linux.h Line 4835 in 9c80ffa
syzkaller/executor/common_linux.h Line 3806 in 9c80ffa
That apparently configures some per-proc sub-cgroups: syzkaller/executor/common_linux.h Lines 3815 to 3816 in 9c80ffa
And we also do some work per program execution. Line 648 in 9c80ffa
syzkaller/executor/common_linux.h Line 4874 in 9c80ffa
It configures symlinks: syzkaller/executor/common_linux.h Lines 3863 to 3874 in 9c80ffa
|
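To make the walkthrough above more concrete, here is a heavily simplified sketch of the overall shape of that setup; the paths and calls are approximations for illustration, not a copy of the referenced common_linux.h code, and actually running it requires root:

```c
/* Simplified illustration of the cgroup layout described above; paths and details
 * are approximations, see the referenced common_linux.h lines for the real code. */
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static void setup_cgroups_sketch(unsigned long long procid)
{
	/* One-time setup: mount a cgroup2 hierarchy under /syzcgroup (assumed path). */
	mkdir("/syzcgroup", 0777);
	mkdir("/syzcgroup/unified", 0777);
	mount("none", "/syzcgroup/unified", "cgroup2", 0, NULL);

	/* Per-proc setup: each executor proc gets its own sub-cgroup. */
	char cgroupdir[64];
	snprintf(cgroupdir, sizeof(cgroupdir), "/syzcgroup/unified/syz%llu", procid);
	mkdir(cgroupdir, 0777);

	/* Per-program setup: a ./cgroup symlink in the per-program working directory
	 * points at the proc's cgroup, so test programs can open and manipulate it. */
	symlink(cgroupdir, "./cgroup");
}

int main(void)
{
	setup_cgroups_sketch(0);
	return 0;
}
```

The relevant property for this issue: the per-proc cgroup (and whatever a test program attached to it, such as a BPF hook) outlives individual program executions, so a single program that denies socket creation there keeps affecting everything the same proc runs afterwards.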
The original ... But ...
After running it under ...
(so ...) And syzkaller/executor/common_linux.h, lines 3863 to 3866 in c6b4fb3
So, judging by how cgroups are currently configured, we already have a separate cgroup for each proc. As was mentioned, we could use a cgroup namespace to make sure that the cgroups are really recreated each time we restart the proc, but that doesn't change the fact that it's perfectly legal to configure bpf/cgroups to deny the socket() syscall. If we recreate the proc on the SYZFAIL (which, afaik, we already do), we are going to hit the same problem, since the failing program will just configure it all again, even if we create a separate cgroup namespace for each proc and recreate the cgroup each time. Are there any other viable options than disabling ...? (*) This one is triggered here: syzkaller/executor/common_linux.h, line 4845 in c6b4fb3
syzkaller/executor/common_linux.h Line 3690 in c6b4fb3
syzkaller/executor/common_linux.h Line 3626 in c6b4fb3
Given that it's the same underlying scenario, could it be that we get ... |
Will it SYZFAIL again? As far as I can see, we call reset_loop after finishing the previous program, so it should succeed after the cgroup is recreated. When you test the program with syz-execprog, does it fail on the first run, or on the second? |
Is it that well synchronized? If ... Line 740 in 9e70464
|
That's the child process that will exit. I don't immediately see why it should lead to any failures in the parent processes. |
Context
In the upstream Linux namespace, the SYZFAIL: ebtable checkpoint: socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) crash has been responsible for up to 10% of daily syzbot crashes. Its frequency recently dropped by 10x, but then it went up again (though it's still less frequent than it used to be).
Likely that was because the other frequent net fuzzing crasher, unregister_netdevice: waiting for DEV to become free, got fixed in the net tree. Since that moment, there's been a surge of SYZFAIL: ebtable checkpoint: socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) crashes on ci-upstream-net-this-kasan-gce again. In any case, we should figure out whether it's a manifestation of a real kernel bug or whether our checkpoint code in syz-executor must be changed. The reproducers look similar:
I was unable to crash the v6.15-rc2 kernel running on qemu using the reproducer above.
Progress
In another related discussion, @FlorentRevest has concluded that the bpf program included in the reproducer is just returning 0.