bgpd: stuck in unresponsive state #18606
Comments
Most likely, it was stuck on I/O operations in your system. Do you have any traces or logs showing what your system looked like at that moment (memory, swap, CPU, I/O utilization)?
The nearest metrics I have are from 2025-04-08 07:21:15 UTC+0; nothing unusual there except lower network bandwidth. However, I do have logs from
This log is triggered when the system cannot send out the packets due to the receive window size being 0 on the other side.
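To illustrate the condition described above, here is a minimal sketch of a non-blocking sender whose peer accepts the connection but never reads. Once the peer's receive buffer fills and it stops advertising window space, the sender's own buffer fills too and the kernel refuses further writes. This is not FRR code; the function name fill_until_blocked and its parameters are hypothetical, and the loopback setup only approximates a real BGP session.

```python
import socket


def fill_until_blocked(chunk_size=65536, max_chunks=100_000):
    """Send to a peer that never reads until the kernel refuses more data.

    Returns (bytes_sent, blocked). Hypothetical helper for illustration only.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # ephemeral port on loopback
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()  # accepted, but deliberately never read

    sender.setblocking(False)
    payload = b"\0" * chunk_size
    sent = 0
    blocked = False
    try:
        for _ in range(max_chunks):
            try:
                sent += sender.send(payload)
            except BlockingIOError:
                # EAGAIN/EWOULDBLOCK: the local send buffer is full because
                # the peer is no longer draining its receive buffer.
                blocked = True
                break
    finally:
        sender.close()
        receiver.close()
        listener.close()
    return sent, blocked


if __name__ == "__main__":
    total, stalled = fill_until_blocked()
    print(f"sent {total} bytes before stalling: {stalled}")
```

A daemon that hits this state has to keep the unsent data queued and retry when the socket becomes writable again; the number of bytes accepted before stalling depends on the kernel's buffer settings.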
Could you (if you can replicate) show the output of
I am afraid I can't, because I haven't been able to reproduce this issue so far. Since this happened in a prod environment, I had to restart the process manually to restore operability, so a current strace session would not provide any meaningful data, if I understand your request correctly. If it helps, the faulty session was an nlnog peer that had only advertised prefixes (IPv4 and IPv6 full views) and none received. There were other sessions as well, but none of them were mentioned in the logs before I restored bgpd operability.
Description
Honestly, I don't understand what exactly happened; I can only attach the relevant logs: frr-bgpd.txt
Version
How to reproduce
N/A
Expected behavior
watchfrr issuing kill -9 once kill -15 times out for an unresponsive bgpd process
Actual behavior
bgpd became unresponsive and watchfrr did not kill it for two hours
Additional context
This happened just before bgpd went unresponsive:
I have resolved my problem by killing bgpd with SIGKILL.
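For reference, the escalation described under Expected behavior (SIGTERM first, then SIGKILL if the process has not exited within a timeout) can be sketched roughly as below. This is only an illustration of the general pattern on a POSIX system, not watchfrr's implementation; stop_with_escalation, spawn_stubborn_child, and the timeout value are hypothetical names and numbers chosen for the example.

```python
import signal
import subprocess
import sys


def stop_with_escalation(proc, term_timeout=2.0):
    """Send SIGTERM, wait up to term_timeout seconds, then fall back to SIGKILL.

    Returns the name of the signal that finally stopped the process.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=term_timeout)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()
        return "SIGKILL"


def spawn_stubborn_child():
    """Start a child that ignores SIGTERM, standing in for a hung daemon."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    proc.stdout.readline()  # wait until the child has installed its handler
    return proc


if __name__ == "__main__":
    child = spawn_stubborn_child()
    print("stopped by", stop_with_escalation(child, term_timeout=1.0))
```

The point of the two-step approach is that a process stuck in a loop or ignoring SIGTERM is still guaranteed to be removed once the timeout expires, which is what did not happen here for two hours.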
Backtraces that are relevant (happened on systemctl restart frr):
gdb symbols:
Checklist