Skip to content

zebra: V6 RA not sent anymore after interface up-down-up #18451

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

soumyar-roy
Copy link
Contributor

@soumyar-roy soumyar-roy commented Mar 21, 2025

zebra: V6 RA not sent anymore after interface up-down-up

Issue:
Once interface is shutdown, the interface is removed from
wheel timer. Now when the interface is up again, current code
won't add the interface to wheel timer again, so it won't send RA
anymore for that interface

Fix:
Moved wheel_add for interface inside rtadv_start_interface_events
This is more common function which gets triggered for both
RA enable and interface up event

Also on any kind of interface activation event, we try to send
RA as soon as possible. This is to satisfy requirement where
quick RA is needed, especially for some convergence, dependent on
RA.

Testing:
Did ineterface up to down to up
Added debug log for RA, checked it is getting advertised preodically
after when up at up state

show bgp summary for 512 bgp peers for bgp bgp unnumbered works fine.

Signed-off-by: Soumya Roy [email protected]

@soumyar-roy soumyar-roy marked this pull request as draft March 21, 2025 20:30
@mjstapp
Copy link
Contributor

mjstapp commented Mar 21, 2025

so ... please add a meaningful title and description?

@soumyar-roy soumyar-roy changed the title Soumya/fastra zebra: send v6 fast RA at faster interval Mar 21, 2025
@soumyar-roy soumyar-roy force-pushed the soumya/fastra branch 2 times, most recently from f4eedc5 to 7095a84 Compare March 21, 2025 21:51
@soumyar-roy
Copy link
Contributor Author

so ... please add a meaningful title and description?

Added now

lib/wheel.c Outdated
list_isempty(wheel->wheel_slot_lists[curr_slot])) {
/* Came to back to same slot and that is empty
* so the wheel is empty, puase it
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you fix the comment indentation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1)This comment is indented w.r.t if (!wheel->run_forever) {. before running git clang-format >>
((((curr_slot + slots_to_skip) % wheel->slots) == curr_slot) &&

  •           list_isempty(wheel->wheel_slot_lists[curr_slot])) {<<<This line is tab indented
    
  •           /* Came to back to same slot and that is empty
    
  •            * so the wheel is empty, stop it
    
  •           */
    
  •           if (!wheel->run_forever) {
    
  •                   wheel_stop(wheel);
    
  •                   if (debug_timer_wheel)
    
  •                           zlog_debug("Stopped an empty  wheel %p", wheel);
    
  •                   return;
    
  •           }
    
  •   }
    

2)After git clang-format >>
if ((((curr_slot + slots_to_skip) % wheel->slots) == curr_slot) &&

  •           list_isempty(wheel->wheel_slot_lists[curr_slot])) {
    
  •       list_isempty(wheel->wheel_slot_lists[curr_slot])) {<<<<<This line gets space indented
              /* Came to back to same slot and that is empty
               * so the wheel is empty, stop it
              */
    
  1. If I dont do step 2) I get style suggestion error curl https://gist.githubusercontent.com/polychaeta/13d7c1b3f9c07b87352be22b5f29ad01/raw/55bb8b7724008c107d333a05c8db9a785a2db0f7/style.diff | git apply -. So restoring back to 1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no: line 56 is not correct. it looks like it's missing a space to align the comment block

@ton31337
Copy link
Member

Can we have a bit of the context when fast/regular wheels are used?

@soumyar-roy soumyar-roy force-pushed the soumya/fastra branch 2 times, most recently from c13b1c3 to 3a9feb0 Compare March 23, 2025 18:08
@soumyar-roy
Copy link
Contributor Author

Can we have a bit of the context when fast/regular wheels are used?

Added more context

@soumyar-roy soumyar-roy marked this pull request as ready for review March 24, 2025 00:53
@ton31337
Copy link
Member

Thanks, makes sense now, but please put it inside the commit (not in PR).

@soumyar-roy soumyar-roy reopened this Apr 16, 2025
if (adv_if != NULL)
if (adv_if != NULL) {
rtadv_send_packet(zvrf->rtadv.sock, zif->ifp, RA_ENABLE);
wheel_add_item(zrouter.ra_wheel, zif->ifp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still concerned about this a bit. I only see the "wheel_remove" in the path where the frr-configured "shut" is processed, but not in other interface "down" processing. are you sure this is correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added wheel_remove in interface_down()

@soumyar-roy
Copy link
Contributor Author

Test case added.

Copy link
Contributor

@mjstapp mjstapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the interface module is using some rtadv apis. I think the clearest way to fix these problems is to make sure a) that the rtadv module is notified during the relevant interface events, and b) maybe ensure that it's unambiguous what needs to happen within the rtadv code. a flag, for instance, could indicate whether an interface is rtadv-active or not?

r12f pushed a commit to Azure/sonic-buildimage-msft that referenced this pull request Apr 23, 2025
**Add support for 512 BGP sessions**

Backport sonic-net/sonic-buildimage#22390 to
202412

| Patch | Upstream Commit |
|-------|----------------|
| 0044-zebra-Prevent-starvation-in-dplane_thread_loop.patch |
[6faad863](FRRouting/frr@6faad86)
|
| 0083-bgpd-fix-vty-output-of-evpn-route-target-AS4.patch |
[20b3ab48](FRRouting/frr@20b3ab4)
|
| 0084-zebra-Ensure-dplane-does-not-send-work-back-to-maste.patch |
[c4115522](FRRouting/frr@c411552)
|
| 0085-zebra-Limit-mutex-for-obuf-to-when-we-access-obuf.patch |
[c58da10d](FRRouting/frr@c58da10)
|
| 0086-bgpd-backpressure-Fix-to-pop-items-off-zebra_announc.patch |
[898852f](FRRouting/frr@898852f)
|
| 0087-zebra-fnc-obuf-could-be-accessed-without-a-lock.patch |
[e7a1fbbcf](FRRouting/frr@e7a1fbb)
|
| 0088-zebra-Add-show-fpm-status-json-command.patch |
[0a9e8ef49](FRRouting/frr@0a9e8ef)
|
| 0089-doc-Add-show-fpm-status-json-command-to-documentatio.patch |
[a0c4fe2ca](FRRouting/frr@a0c4fe2)
|
| 0090-zebra-avoid-a-race-during-FPM-dplane-plugin-shutdown.patch |
[277784f](FRRouting/frr@277784f)
|
| 0091-zebra-add-nexthop-counter-to-show-zebra-dplane-comma.patch |
[e36e570c](FRRouting/frr@e36e570)
|
| 0092-zebra-Installation-success-should-not-set-NHG-as-val.patch |
[910b2c5a](FRRouting/frr@910b2c5)
|
| 0093-zebra-When-reinstalling-a-NHG-set-REINSTALL-flag.patch |
[b2ade8e](FRRouting/frr@b2ade8e)
|
| 0094-zebra-Conslidate-zebra_nhg_set_valid-invalid-functio.patch |
[6ee9cc68](FRRouting/frr@8f76afd)
|
| 0095-zebra-Properly-note-that-a-nhg-s-nexthop-has-gone-do.patch |
[3b9428a7](FRRouting/frr@1bbbcf0)
|
| 0096-zebra-be-consistent-about-v6-nexthops-for-v4-routes.patch |
[c93bc371](FRRouting/frr@0221ed2)
|
| 0097-lib-zebra-Modify-nexthop_cmp-to-allow-you-to-use-wei.patch |
[75268f01](FRRouting/frr@b8e24a0)
|
| 0098-zebra-Create-Singleton-nhg-s-without-weights.patch |
[ae4a1315](FRRouting/frr@c20fa97)
|
| 0099-zebra-Allow-blackhole-singleton-nexthops-to-be-v6.patch |
[ae397ad9](FRRouting/frr@f90989d)
|
| 0100-zebra-Allow-for-initial-deny-of-installation-of-nhe-.patch |
[4bf2c11f](FRRouting/frr@0c72a78)
|
| 0101-zebra-Properly-note-that-a-nhg-s-nexthop-has-gone-do.patch |
[892e8179](FRRouting/frr@1bbbcf0)
|
| 0102-zebra-Reinstall-nexthop-when-interface-comes-back-up.patch |
[279f427c](FRRouting/frr@3be8b48)
|
| 0103-zebra-Attempt-to-reuse-NHG-after-interface-up-and-ro.patch |
[98d56711](FRRouting/frr@f02d76f)
|
| 0104-zebra-Expose-_route_entry_dump_nh-so-it-can-be-used.patch |
[4fb44993](FRRouting/frr@ce166ca)
|
| 0105-zebra-Fix-resetting-valid-flags-for-NHG-dependents.patch |
[6e95686b](FRRouting/frr@54ec9f3)
|
| 0106-zebra-Fix-leaked-nhe.patch |
[a84d2bc0](FRRouting/frr@97fa24e)
|
| 0107-zebra-Uninstall-NHG-in-some-situations.patch |
[d1cba73a](FRRouting/frr@4c16694)
|
| 0108-zebra-Optimize-invoking-nhg-compare-func.patch |
[0faa70a5](FRRouting/frr@e77954e)
|
| 0109-zebra-Nexthops-need-to-be-ACTIVE-in-some-cases.patch |
[df56b92b](FRRouting/frr@b61424a)
|
| 0110-zebra-On-Nexthop-install-failure-don-t-set-Installat.patch |
[4b2b1a9a](FRRouting/frr@ec6a000)
|
| 0111-zebra-Bring-up-514-BGP-neighbor-sessions.patch |
[ea399e15](FRRouting/frr@6a75d33)
|
| 0112-lib-Add-support-for-stream-buffer-to-expand.patch |
[65b3ee4e](FRRouting/frr@c0c46ba)
|
| 0113-zebra-zebra-crash-for-zapi-stream.patch |
[c122afdb](FRRouting/frr@6fe9092)
|
| 0114-bgpd-Replace-per-peer-connection-error-with-per-bgp.patch |
[10c127bc](FRRouting/frr@6a5962e)
|
| 0115-bgpd-remove-apis-from-bgp_route.h.patch |
[1d5a8a20](FRRouting/frr@020245b)
|
| 0116-bgpd-batch-peer-connection-error-clearing.patch |
[4baa9f2d](FRRouting/frr@58f924d)
|
| 0117-zebra-move-peer-conn-error-list-to-connection-struct.patch |
[411abd6b](FRRouting/frr@6206e7e)
|
| 0118-bgpd-Allow-batch-clear-to-do-partial-work-and-contin.patch |
[b68be906](FRRouting/frr@c527882)
|
| 0119-zebra-send-v6-fast-RA-at-faster-interval.patch |
[c8f12a4f](FRRouting/frr#18451) |
| 0120-lib-add-option-to-start-stop-wheel-timer.patch |
[ca0adcdd](FRRouting/frr#18451) |
| 0121-bgpd-Paths-received-from-shutdown-peer-not-deleted.patch |
[2cbfc7ec](FRRouting/frr@d2bec7a)
|

**Verification:**
Verified the changes on topology with scaled BGP tests

---------

Signed-off-by: Vivek Reddy <[email protected]>
Signed-off-by: Vivek <[email protected]>
@github-actions github-actions bot added size/L and removed size/M labels Apr 30, 2025
@soumyar-roy
Copy link
Contributor Author

the interface module is using some rtadv apis. I think the clearest way to fix these problems is to make sure a) that the rtadv module is notified during the relevant interface events, and b) maybe ensure that it's unambiguous what needs to happen within the rtadv code. a flag, for instance, could indicate whether an interface is rtadv-active or not?

Changed code.

@@ -18,6 +18,7 @@
#include "log.h"
#include "zclient.h"
#include "vrf.h"
#include "wheel.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need this change, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

zebra/rtadv.c Outdated
/*Try to delete from ra wheels */
wheel_remove_item(zrouter.ra_wheel, ifp);

if (if_down_event) {
/* Nothing to do more, return */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I understand that you don't want to try to send a packet, but shouldn't you stop the "join_timer" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I was trying to limit existing behavior part of this change. I have accommodated it now, join_timer is turned off.

Issue:
Once interface is shutdown, the interface is removed from
wheel timer. Now when the interface is up again, current code
won't add the interface to wheel timer again, so it won't send RA
anymore for that interface

Fix:
Moved wheel_add for interface inside rtadv_start_interface_events
This is more common function which gets triggered for both
RA enable and interface up event

Also on any kind of interface activation event, we try to send
RA as soon as possible. This is to satisfy requirement where
quick RA is needed, especially for some convergence, dependent on
RA.

Testing:
Did ineterface up to down to up
Added debug log for RA, checked it is getting advertised preodically
after when up at up state

show bgp summary for 512 bgp peers for bgp bgp unnumbered works fine.

Signed-off-by: Soumya Roy <[email protected]>
Added test cases with interface down/up/shutdown
to verify RA state of an interface

Signed-off-by: Soumya Roy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libfrr master rebase PR needs rebase size/L tests Topotests, make check, etc zebra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants