[202412] [FRR] Add support for 514 BGP sessions #1034

Merged
merged 3 commits into from
Apr 23, 2025
@@ -1,8 +1,7 @@
From 4671ddf4920553b663fda129f7c4366839347645 Mon Sep 17 00:00:00 2001
From e6e096f2507e76c375ba9d6b20c05af0b61ce2cd Mon Sep 17 00:00:00 2001
From: Donald Sharp <[email protected]>
Date: Wed, 12 Jun 2024 14:14:48 -0400
Subject: [PATCH 3/5] zebra: Modify dplane loop to allow backpressure to filter
up
Subject: [PATCH] zebra: Modify dplane loop to allow backpressure to filter up

Currently when the dplane_thread_loop is run, it moves contexts
from the dg_update_list and puts the contexts on the input queue
@@ -30,11 +29,12 @@ context system and memory will not go out of control.

Signed-off-by: Donald Sharp <[email protected]>


diff --git a/zebra/zebra_dplane.c b/zebra/zebra_dplane.c
index c52e032660..f0e1ff6f27 100644
index 44ee41d8c..0460463bc 100644
--- a/zebra/zebra_dplane.c
+++ b/zebra/zebra_dplane.c
@@ -7155,10 +7155,10 @@ static void dplane_thread_loop(struct thread *event)
@@ -7279,10 +7279,10 @@ static void dplane_thread_loop(struct event *event)
{
struct dplane_ctx_list_head work_list;
struct dplane_ctx_list_head error_list;
@@ -47,7 +47,7 @@ index c52e032660..f0e1ff6f27 100644
bool reschedule = false;

/* Capture work limit per cycle */
@@ -7182,18 +7182,48 @@ static void dplane_thread_loop(struct thread *event)
@@ -7306,18 +7306,48 @@ static void dplane_thread_loop(struct event *event)
/* Locate initial registered provider */
prov = dplane_prov_list_first(&zdplane_info.dg_providers);

@@ -104,7 +104,7 @@ index c52e032660..f0e1ff6f27 100644
DPLANE_UNLOCK();

atomic_fetch_sub_explicit(&zdplane_info.dg_routes_queued, counter,
@@ -7212,8 +7242,9 @@ static void dplane_thread_loop(struct thread *event)
@@ -7336,8 +7366,9 @@ static void dplane_thread_loop(struct event *event)
* items.
*/
if (IS_ZEBRA_DEBUG_DPLANE_DETAIL)
@@ -116,7 +116,7 @@ index c52e032660..f0e1ff6f27 100644

/* Capture current provider id in each context; check for
* error status.
@@ -7271,18 +7302,61 @@ static void dplane_thread_loop(struct thread *event)
@@ -7395,18 +7426,61 @@ static void dplane_thread_loop(struct event *event)
if (!zdplane_info.dg_run)
break;

@@ -185,8 +185,8 @@ index c52e032660..f0e1ff6f27 100644
dplane_provider_unlock(prov);

if (counter >= limit)
@@ -7293,7 +7367,7 @@ static void dplane_thread_loop(struct thread *event)
counter, dplane_provider_get_name(prov));
@@ -7422,7 +7496,7 @@ static void dplane_thread_loop(struct event *event)
}

/* Locate next provider */
- prov = dplane_prov_list_next(&zdplane_info.dg_providers, prov);
@@ -0,0 +1,49 @@
From 6faad863f30d29157e4c675ad956e3ccd38991a7 Mon Sep 17 00:00:00 2001
From: Donald Sharp <[email protected]>
Date: Fri, 14 Jun 2024 13:36:51 -0400
Subject: [PATCH] zebra: Prevent starvation in dplane_thread_loop

When removing a large number of routes, the linux kernel can take the
cpu for an extended amount of time, leaving a situation where FRR
detects a starvation event.

r1# sharp install routes 10.0.0.0 nexthop 192.168.44.33 1000000 repeat 10
2024-06-14 12:55:49.365 [NTFY] sharpd: [M7Q4P-46WDR] vty[5]@# sharp install routes 10.0.0.0 nexthop 192.168.44.33 1000000 repeat 10
2024-06-14 12:55:49.365 [DEBG] sharpd: [YP4TQ-01TYK] Inserting 1000000 routes
2024-06-14 12:55:57.256 [DEBG] sharpd: [TPHKD-3NYSB] Installed All Items 7.890085
2024-06-14 12:55:57.256 [DEBG] sharpd: [YJ486-NX5R1] Removing 1000000 routes
2024-06-14 12:56:07.802 [WARN] zebra: [QH9AB-Y4XMZ][EC 100663314] STARVATION: task dplane_thread_loop (634377bc8f9e) ran for 7078ms (cpu time 220ms)
2024-06-14 12:56:25.039 [DEBG] sharpd: [WTN53-GK9Y5] Removed all Items 27.783668
2024-06-14 12:56:25.039 [DEBG] sharpd: [YP4TQ-01TYK] Inserting 1000000 routes
2024-06-14 12:56:32.783 [DEBG] sharpd: [TPHKD-3NYSB] Installed All Items 7.743524
2024-06-14 12:56:32.783 [DEBG] sharpd: [YJ486-NX5R1] Removing 1000000 routes
2024-06-14 12:56:41.447 [WARN] zebra: [QH9AB-Y4XMZ][EC 100663314] STARVATION: task dplane_thread_loop (634377bc8f9e) ran for 5175ms (cpu time 179ms)

Let's modify the loop in dplane_thread_loop such that after a provider
has been run, check to see if the event should yield, if so, stop
and reschedule this for the future.

Signed-off-by: Donald Sharp <[email protected]>
---
zebra/zebra_dplane.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/zebra/zebra_dplane.c b/zebra/zebra_dplane.c
index 06b34da209..3944876439 100644
--- a/zebra/zebra_dplane.c
+++ b/zebra/zebra_dplane.c
@@ -7441,6 +7441,11 @@ static void dplane_thread_loop(struct event *event)
zlog_debug("dplane dequeues %d completed work from provider %s",
counter, dplane_provider_get_name(prov));

+ if (event_should_yield(event)) {
+ reschedule = true;
+ break;
+ }
+
/* Locate next provider */
prov = dplane_prov_list_next(&zdplane_info.dg_providers, prov);
}
--
2.39.5
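
The yield check added by this patch can be illustrated outside of zebra. The following is a minimal sketch, not FRR code: `struct sketch_event`, `sketch_should_yield()` and `sketch_run_providers()` are hypothetical stand-ins, and a simple work-unit budget is assumed in place of the elapsed-time test that `event_should_yield()` performs in FRR. It only shows the control flow: run one provider, check whether to yield, and if so stop and ask to be rescheduled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the event handed to dplane_thread_loop().
 * A work-unit budget replaces the wall-clock check used by FRR.
 */
struct sketch_event {
	unsigned int used;   /* work units consumed during this run */
	unsigned int budget; /* units allowed before the loop should yield */
};

/* Assumed analogue of event_should_yield(): true once the budget is spent. */
static bool sketch_should_yield(const struct sketch_event *ev)
{
	return ev->used >= ev->budget;
}

/* Run each provider in order. After a provider has run, check whether the
 * event should yield; if so, stop and report that a reschedule is needed.
 * *done receives the number of providers that ran in this pass.
 */
static bool sketch_run_providers(struct sketch_event *ev,
				 const unsigned int *cost, size_t nprov,
				 size_t *done)
{
	bool reschedule = false;
	size_t i;

	assert(ev != NULL && cost != NULL && done != NULL);

	*done = 0;
	for (i = 0; i < nprov; i++) {
		ev->used += cost[i]; /* pretend the provider did its work */
		(*done)++;

		if (sketch_should_yield(ev)) {
			reschedule = true;
			break;
		}
	}

	return reschedule;
}
```

As in the patch, the check sits after a provider has finished and before the next one is located, so a pass always makes progress on at least one provider before giving up the CPU.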

@@ -1,7 +1,7 @@
From 50f606c158f6c89abd0d3f531905005d3a48a5b6 Mon Sep 17 00:00:00 2001
From 1712fbd14dddd542e7aa4b468356abdfe42817d4 Mon Sep 17 00:00:00 2001
From: Donald Sharp <[email protected]>
Date: Wed, 12 Jun 2024 15:16:08 -0400
Subject: [PATCH 4/5] zebra: Limit queue depth in dplane_fpm_nl
Subject: [PATCH] zebra: Limit queue depth in dplane_fpm_nl

The dplane providers have a concept of input queues
and output queues. These queues are chained together
@@ -16,12 +16,15 @@ queue when it is already full. This will allow the backpressure
to work appropriately in zebra proper.

Signed-off-by: Donald Sharp <[email protected]>
---
zebra/dplane_fpm_nl.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/zebra/dplane_fpm_nl.c b/zebra/dplane_fpm_nl.c
index bc9815bb10..4fd42f64a2 100644
index a054d362f..81f1c9417 100644
--- a/zebra/dplane_fpm_nl.c
+++ b/zebra/dplane_fpm_nl.c
@@ -1560,6 +1560,25 @@ static int fpm_nl_process(struct zebra_dplane_provider *prov)
@@ -1603,6 +1603,25 @@ static int fpm_nl_process(struct zebra_dplane_provider *prov)

fnc = dplane_provider_get_data(prov);
limit = dplane_provider_get_work_limit(prov);
@@ -36,7 +39,7 @@ index bc9815bb10..4fd42f64a2 100644
+ ") of internal work, hold off",
+ __func__, cur_queue);
+ limit = 0;
+ } else {
+ } else if (cur_queue != 0) {
+ if (IS_ZEBRA_DEBUG_FPM)
+ zlog_debug("%s: current queue is %" PRIu64
+ ", limiting to lesser amount of %" PRIu64,
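
The hunk above is truncated on this page, so the exact arithmetic is not visible here. As a rough sketch of the idea only, and assuming the elided lines shrink the work limit by the current queue depth (which is what the "limiting to lesser amount" debug message suggests), the decision could look like the helper below. `sketch_effective_limit()` is a hypothetical name and does not exist in `dplane_fpm_nl.c`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many new contexts the provider may accept this
 * cycle, given its configured work limit and the number of contexts already
 * sitting in its internal queue.
 */
static uint64_t sketch_effective_limit(uint64_t limit, uint64_t cur_queue)
{
	uint64_t allowed;

	if (cur_queue >= limit) {
		/* Queue already at (or past) the limit: hold off entirely. */
		allowed = 0;
	} else {
		/* Assumption: accept only enough work to fill up to the limit.
		 * With an empty queue this leaves the limit unchanged.
		 */
		allowed = limit - cur_queue;
	}

	assert(allowed <= limit);
	return allowed;
}
```

Returning zero when the queue is full is what lets the backpressure described in the commit message reach zebra proper: the provider stops pulling from its input queue instead of piling more contexts onto an already full internal one.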