Commit 98f1d11

[202412] [FRR] Add support for 514 BGP sessions (sonic-net#1034)
**Add support for 514 BGP sessions**

Backport sonic-net#22390 to 202412

| Patch | Upstream Commit |
|-------|----------------|
| 0044-zebra-Prevent-starvation-in-dplane_thread_loop.patch | [6faad863](FRRouting/frr@6faad86) |
| 0083-bgpd-fix-vty-output-of-evpn-route-target-AS4.patch | [20b3ab48](FRRouting/frr@20b3ab4) |
| 0084-zebra-Ensure-dplane-does-not-send-work-back-to-maste.patch | [c4115522](FRRouting/frr@c411552) |
| 0085-zebra-Limit-mutex-for-obuf-to-when-we-access-obuf.patch | [c58da10d](FRRouting/frr@c58da10) |
| 0086-bgpd-backpressure-Fix-to-pop-items-off-zebra_announc.patch | [898852f](FRRouting/frr@898852f) |
| 0087-zebra-fnc-obuf-could-be-accessed-without-a-lock.patch | [e7a1fbbcf](FRRouting/frr@e7a1fbb) |
| 0088-zebra-Add-show-fpm-status-json-command.patch | [0a9e8ef49](FRRouting/frr@0a9e8ef) |
| 0089-doc-Add-show-fpm-status-json-command-to-documentatio.patch | [a0c4fe2ca](FRRouting/frr@a0c4fe2) |
| 0090-zebra-avoid-a-race-during-FPM-dplane-plugin-shutdown.patch | [277784f](FRRouting/frr@277784f) |
| 0091-zebra-add-nexthop-counter-to-show-zebra-dplane-comma.patch | [e36e570c](FRRouting/frr@e36e570) |
| 0092-zebra-Installation-success-should-not-set-NHG-as-val.patch | [910b2c5a](FRRouting/frr@910b2c5) |
| 0093-zebra-When-reinstalling-a-NHG-set-REINSTALL-flag.patch | [b2ade8e](FRRouting/frr@b2ade8e) |
| 0094-zebra-Conslidate-zebra_nhg_set_valid-invalid-functio.patch | [6ee9cc68](FRRouting/frr@8f76afd) |
| 0095-zebra-Properly-note-that-a-nhg-s-nexthop-has-gone-do.patch | [3b9428a7](FRRouting/frr@1bbbcf0) |
| 0096-zebra-be-consistent-about-v6-nexthops-for-v4-routes.patch | [c93bc371](FRRouting/frr@0221ed2) |
| 0097-lib-zebra-Modify-nexthop_cmp-to-allow-you-to-use-wei.patch | [75268f01](FRRouting/frr@b8e24a0) |
| 0098-zebra-Create-Singleton-nhg-s-without-weights.patch | [ae4a1315](FRRouting/frr@c20fa97) |
| 0099-zebra-Allow-blackhole-singleton-nexthops-to-be-v6.patch | [ae397ad9](FRRouting/frr@f90989d) |
| 0100-zebra-Allow-for-initial-deny-of-installation-of-nhe-.patch | [4bf2c11f](FRRouting/frr@0c72a78) |
| 0101-zebra-Properly-note-that-a-nhg-s-nexthop-has-gone-do.patch | [892e8179](FRRouting/frr@1bbbcf0) |
| 0102-zebra-Reinstall-nexthop-when-interface-comes-back-up.patch | [279f427c](FRRouting/frr@3be8b48) |
| 0103-zebra-Attempt-to-reuse-NHG-after-interface-up-and-ro.patch | [98d56711](FRRouting/frr@f02d76f) |
| 0104-zebra-Expose-_route_entry_dump_nh-so-it-can-be-used.patch | [4fb44993](FRRouting/frr@ce166ca) |
| 0105-zebra-Fix-resetting-valid-flags-for-NHG-dependents.patch | [6e95686b](FRRouting/frr@54ec9f3) |
| 0106-zebra-Fix-leaked-nhe.patch | [a84d2bc0](FRRouting/frr@97fa24e) |
| 0107-zebra-Uninstall-NHG-in-some-situations.patch | [d1cba73a](FRRouting/frr@4c16694) |
| 0108-zebra-Optimize-invoking-nhg-compare-func.patch | [0faa70a5](FRRouting/frr@e77954e) |
| 0109-zebra-Nexthops-need-to-be-ACTIVE-in-some-cases.patch | [df56b92b](FRRouting/frr@b61424a) |
| 0110-zebra-On-Nexthop-install-failure-don-t-set-Installat.patch | [4b2b1a9a](FRRouting/frr@ec6a000) |
| 0111-zebra-Bring-up-514-BGP-neighbor-sessions.patch | [ea399e15](FRRouting/frr@6a75d33) |
| 0112-lib-Add-support-for-stream-buffer-to-expand.patch | [65b3ee4e](FRRouting/frr@c0c46ba) |
| 0113-zebra-zebra-crash-for-zapi-stream.patch | [c122afdb](FRRouting/frr@6fe9092) |
| 0114-bgpd-Replace-per-peer-connection-error-with-per-bgp.patch | [10c127bc](FRRouting/frr@6a5962e) |
| 0115-bgpd-remove-apis-from-bgp_route.h.patch | [1d5a8a20](FRRouting/frr@020245b) |
| 0116-bgpd-batch-peer-connection-error-clearing.patch | [4baa9f2d](FRRouting/frr@58f924d) |
| 0117-zebra-move-peer-conn-error-list-to-connection-struct.patch | [411abd6b](FRRouting/frr@6206e7e) |
| 0118-bgpd-Allow-batch-clear-to-do-partial-work-and-contin.patch | [b68be906](FRRouting/frr@c527882) |
| 0119-zebra-send-v6-fast-RA-at-faster-interval.patch | [c8f12a4f](FRRouting/frr#18451) |
| 0120-lib-add-option-to-start-stop-wheel-timer.patch | [ca0adcdd](FRRouting/frr#18451) |
| 0121-bgpd-Paths-received-from-shutdown-peer-not-deleted.patch | [2cbfc7ec](FRRouting/frr@d2bec7a) |

**Verification:** Verified the changes on a topology with scaled BGP tests.

---------

Signed-off-by: Vivek Reddy <[email protected]>
Signed-off-by: Vivek <[email protected]>
1 parent fc7e20c commit 98f1d11

44 files changed

+6283
-66
lines changed

src/sonic-frr/patch/0044-zebra-Modify-dplane-loop-to-allow-backpressure-to-fi.patch

+10 -10
@@ -1,8 +1,7 @@
-From 4671ddf4920553b663fda129f7c4366839347645 Mon Sep 17 00:00:00 2001
+From e6e096f2507e76c375ba9d6b20c05af0b61ce2cd Mon Sep 17 00:00:00 2001
 From: Donald Sharp <[email protected]>
 Date: Wed, 12 Jun 2024 14:14:48 -0400
-Subject: [PATCH 3/5] zebra: Modify dplane loop to allow backpressure to filter
- up
+Subject: [PATCH] zebra: Modify dplane loop to allow backpressure to filter up

 Currently when the dplane_thread_loop is run, it moves contexts
 from the dg_update_list and puts the contexts on the input queue
@@ -30,11 +29,12 @@ context system and memory will not go out of control.

 Signed-off-by: Donald Sharp <[email protected]>

+
 diff --git a/zebra/zebra_dplane.c b/zebra/zebra_dplane.c
-index c52e032660..f0e1ff6f27 100644
+index 44ee41d8c..0460463bc 100644
 --- a/zebra/zebra_dplane.c
 +++ b/zebra/zebra_dplane.c
-@@ -7155,10 +7155,10 @@ static void dplane_thread_loop(struct thread *event)
+@@ -7279,10 +7279,10 @@ static void dplane_thread_loop(struct event *event)
  {
  struct dplane_ctx_list_head work_list;
  struct dplane_ctx_list_head error_list;
@@ -47,7 +47,7 @@ index c52e032660..f0e1ff6f27 100644
  bool reschedule = false;

  /* Capture work limit per cycle */
-@@ -7182,18 +7182,48 @@ static void dplane_thread_loop(struct thread *event)
+@@ -7306,18 +7306,48 @@ static void dplane_thread_loop(struct event *event)
  /* Locate initial registered provider */
  prov = dplane_prov_list_first(&zdplane_info.dg_providers);

@@ -104,7 +104,7 @@ index c52e032660..f0e1ff6f27 100644
  DPLANE_UNLOCK();

  atomic_fetch_sub_explicit(&zdplane_info.dg_routes_queued, counter,
-@@ -7212,8 +7242,9 @@ static void dplane_thread_loop(struct thread *event)
+@@ -7336,8 +7366,9 @@ static void dplane_thread_loop(struct event *event)
  * items.
  */
  if (IS_ZEBRA_DEBUG_DPLANE_DETAIL)
@@ -116,7 +116,7 @@ index c52e032660..f0e1ff6f27 100644

  /* Capture current provider id in each context; check for
  * error status.
-@@ -7271,18 +7302,61 @@ static void dplane_thread_loop(struct thread *event)
+@@ -7395,18 +7426,61 @@ static void dplane_thread_loop(struct event *event)
  if (!zdplane_info.dg_run)
  break;

@@ -185,8 +185,8 @@ index c52e032660..f0e1ff6f27 100644
  dplane_provider_unlock(prov);

  if (counter >= limit)
-@@ -7293,7 +7367,7 @@ static void dplane_thread_loop(struct thread *event)
- counter, dplane_provider_get_name(prov));
+@@ -7422,7 +7496,7 @@ static void dplane_thread_loop(struct event *event)
+ }

  /* Locate next provider */
 - prov = dplane_prov_list_next(&zdplane_info.dg_providers, prov);
@@ -0,0 +1,49 @@
+From 6faad863f30d29157e4c675ad956e3ccd38991a7 Mon Sep 17 00:00:00 2001
+From: Donald Sharp <[email protected]>
+Date: Fri, 14 Jun 2024 13:36:51 -0400
+Subject: [PATCH] zebra: Prevent starvation in dplane_thread_loop
+
+When removing a large number of routes, the linux kernel can take the
+cpu for an extended amount of time, leaving a situation where FRR
+detects a starvation event.
+
+r1# sharp install routes 10.0.0.0 nexthop 192.168.44.33 1000000 repeat 10
+2024-06-14 12:55:49.365 [NTFY] sharpd: [M7Q4P-46WDR] vty[5]@# sharp install routes 10.0.0.0 nexthop 192.168.44.33 1000000 repeat 10
+2024-06-14 12:55:49.365 [DEBG] sharpd: [YP4TQ-01TYK] Inserting 1000000 routes
+2024-06-14 12:55:57.256 [DEBG] sharpd: [TPHKD-3NYSB] Installed All Items 7.890085
+2024-06-14 12:55:57.256 [DEBG] sharpd: [YJ486-NX5R1] Removing 1000000 routes
+2024-06-14 12:56:07.802 [WARN] zebra: [QH9AB-Y4XMZ][EC 100663314] STARVATION: task dplane_thread_loop (634377bc8f9e) ran for 7078ms (cpu time 220ms)
+2024-06-14 12:56:25.039 [DEBG] sharpd: [WTN53-GK9Y5] Removed all Items 27.783668
+2024-06-14 12:56:25.039 [DEBG] sharpd: [YP4TQ-01TYK] Inserting 1000000 routes
+2024-06-14 12:56:32.783 [DEBG] sharpd: [TPHKD-3NYSB] Installed All Items 7.743524
+2024-06-14 12:56:32.783 [DEBG] sharpd: [YJ486-NX5R1] Removing 1000000 routes
+2024-06-14 12:56:41.447 [WARN] zebra: [QH9AB-Y4XMZ][EC 100663314] STARVATION: task dplane_thread_loop (634377bc8f9e) ran for 5175ms (cpu time 179ms)
+
+Let's modify the loop in dplane_thread_loop such that after a provider
+has been run, check to see if the event should yield, if so, stop
+and reschedule this for the future.
+
+Signed-off-by: Donald Sharp <[email protected]>
+---
+ zebra/zebra_dplane.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/zebra/zebra_dplane.c b/zebra/zebra_dplane.c
+index 06b34da209..3944876439 100644
+--- a/zebra/zebra_dplane.c
++++ b/zebra/zebra_dplane.c
+@@ -7441,6 +7441,11 @@ static void dplane_thread_loop(struct event *event)
+ zlog_debug("dplane dequeues %d completed work from provider %s",
+ counter, dplane_provider_get_name(prov));
+
++ if (event_should_yield(event)) {
++ reschedule = true;
++ break;
++ }
++
+ /* Locate next provider */
+ prov = dplane_prov_list_next(&zdplane_info.dg_providers, prov);
+ }
+--
+2.39.5
+

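The yield check added by this patch can be illustrated outside of FRR with a small, self-contained sketch. Everything below is hypothetical scaffolding: `struct provider`, `struct loop_event`, `should_yield()` and `run_providers()` are stand-ins invented for illustration, and the time budget is modelled as a plain counter rather than FRR's real event timing. Only the control flow (run a provider, check whether to yield, set a reschedule flag and break) mirrors the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dplane provider; `cost` is the simulated
 * amount of time one pass over this provider consumes. */
struct provider {
	const char *name;
	long cost;
};

/* Hypothetical stand-in for the running event: elapsed time vs. budget. */
struct loop_event {
	long elapsed;
	long budget;
};

struct loop_result {
	size_t ran;      /* providers processed in this invocation */
	bool reschedule; /* true when the loop gave up the CPU early */
};

/* Illustrative analogue of event_should_yield(): budget used up? */
static bool should_yield(const struct loop_event *ev)
{
	return ev->elapsed >= ev->budget;
}

/* Run providers in order, checking after each one whether to yield. */
static struct loop_result run_providers(const struct provider *provs,
					size_t n, struct loop_event *ev)
{
	struct loop_result res = { 0, false };

	assert(provs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		ev->elapsed += provs[i].cost; /* "process" this provider */
		res.ran++;

		/* Same shape as the patch: yield check before moving on. */
		if (should_yield(ev)) {
			res.reschedule = true;
			break;
		}
	}
	return res;
}
```

With three providers costing 3, 4 and 5 units and a budget of 6, the sketch stops after the second provider and reports that it needs rescheduling; with a budget of 100 it runs all three and does not reschedule.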
src/sonic-frr/patch/0045-zebra-Limit-queue-depth-in-dplane_fpm_nl.patch

+8 -5
@@ -1,7 +1,7 @@
-From 50f606c158f6c89abd0d3f531905005d3a48a5b6 Mon Sep 17 00:00:00 2001
+From 1712fbd14dddd542e7aa4b468356abdfe42817d4 Mon Sep 17 00:00:00 2001
 From: Donald Sharp <[email protected]>
 Date: Wed, 12 Jun 2024 15:16:08 -0400
-Subject: [PATCH 4/5] zebra: Limit queue depth in dplane_fpm_nl
+Subject: [PATCH] zebra: Limit queue depth in dplane_fpm_nl

 The dplane providers have a concept of input queues
 and output queues. These queues are chained together
@@ -16,12 +16,15 @@ queue when it is already full. This will allow the backpressure
 to work appropriately in zebra proper.

 Signed-off-by: Donald Sharp <[email protected]>
+---
+ zebra/dplane_fpm_nl.c | 19 +++++++++++++++++++
+ 1 file changed, 19 insertions(+)

 diff --git a/zebra/dplane_fpm_nl.c b/zebra/dplane_fpm_nl.c
-index bc9815bb10..4fd42f64a2 100644
+index a054d362f..81f1c9417 100644
 --- a/zebra/dplane_fpm_nl.c
 +++ b/zebra/dplane_fpm_nl.c
-@@ -1560,6 +1560,25 @@ static int fpm_nl_process(struct zebra_dplane_provider *prov)
+@@ -1603,6 +1603,25 @@ static int fpm_nl_process(struct zebra_dplane_provider *prov)

  fnc = dplane_provider_get_data(prov);
  limit = dplane_provider_get_work_limit(prov);
@@ -36,7 +39,7 @@ index bc9815bb10..4fd42f64a2 100644
 + ") of internal work, hold off",
 + __func__, cur_queue);
 + limit = 0;
-+ } else {
++ } else if (cur_queue != 0) {
 + if (IS_ZEBRA_DEBUG_FPM)
 + zlog_debug("%s: current queue is %" PRIu64
 + ", limiting to lesser amount of %" PRIu64,
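
The visible part of this hunk suggests a three-way decision on how much new work to accept: nothing when the backlog has reached the limit, a reduced amount when there is some backlog, and the full limit otherwise. The sketch below is only an illustration of that idea. The function name `effective_limit()` is hypothetical, and the exact reduction (`work_limit - cur_queue`) is an assumption, since the line that computes it is not shown in the diff above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many new items may be accepted this cycle,
 * given the per-cycle work limit and the number of items already queued.
 */
static uint64_t effective_limit(uint64_t work_limit, uint64_t cur_queue)
{
	uint64_t result;

	if (cur_queue >= work_limit) {
		/* Already at the limit of internal work: hold off. */
		result = 0;
	} else if (cur_queue != 0) {
		/* ASSUMPTION: shrink the limit by the current backlog. */
		result = work_limit - cur_queue;
	} else {
		/* Empty queue: the full limit applies unchanged. */
		result = work_limit;
	}

	assert(result <= work_limit);
	return result;
}
```

Under these assumptions, a limit of 100 with 30 items queued leaves room for 70 more, while a backlog of 100 or more yields 0. The `cur_queue != 0` branch corresponds to the `else if` introduced in the diff, which skips the "limiting" debug path when the queue is empty.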
