
Decrease Connect Retry Timer for internal bgp neighbors. #7087


Merged
merged 1 commit into from
Mar 18, 2021

Conversation

judyjoseph
Contributor

@judyjoseph judyjoseph commented Mar 17, 2021

Why I did it

During bootup of a multi-ASIC DUT, the internal BGP sessions between ASICs took longer to reach ESTABLISHED than the external BGP sessions. The internal sessions came up almost exactly 120 seconds later.

On a multi-ASIC platform the BGP dockers (one per ASIC) are brought up at around the same time when the switch starts, and each tries to establish BGP sessions with neighbors in peer ASICs that may not be completely up yet. The BGP connect attempt then fails, and the retry happens only after 120 seconds, which is the default Connect Retry Timer.
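The timing described above can be sketched with a small model. This is a simplified illustration, not FRR's actual state machine: it assumes the first connect attempt happens at t = 0, that further attempts happen at exact multiples of the Connect Retry Timer, and that the peer never connects inbound. The function name and parameters are hypothetical.

```python
import math


def establish_time(peer_ready_at: float, connect_retry: float) -> float:
    """Time of the first connect attempt made at or after the peer is ready.

    Simplifying assumptions (not taken from the PR): attempts happen at
    t = 0, connect_retry, 2 * connect_retry, ... with no jitter, and the
    peer does not open the session from its side.
    """
    if connect_retry <= 0:
        raise ValueError("connect_retry must be positive")
    if peer_ready_at <= 0:
        return 0.0
    attempts_needed = math.ceil(peer_ready_at / connect_retry)
    return float(attempts_needed * connect_retry)


if __name__ == "__main__":
    # A peer ASIC that becomes ready 5 seconds after the first failed attempt.
    for timer in (120, 10):
        print(f"retry timer {timer:>3}s -> session attempt succeeds at "
              f"t={establish_time(5, timer):.0f}s")
```

Under these assumptions, a peer that is only a few seconds late costs a full retry interval, which is consistent with the roughly 120-second delay reported above.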

How I did it

Add the command that sets the BGP Connect Retry Timer to 10 seconds for internal BGP neighbors.
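For reference, FRR exposes this setting per neighbor as `neighbor PEER timers connect (1-65535)`. The sketch below is not the change made in this PR (the diff is not shown here); it only illustrates what the resulting configuration lines could look like. The helper name, the local ASN, and the neighbor addresses are assumptions chosen for illustration.

```python
from typing import Iterable, List


def connect_retry_commands(local_asn: int,
                           internal_neighbors: Iterable[str],
                           retry_seconds: int = 10) -> List[str]:
    """Build FRR config lines that set the connect retry timer per neighbor.

    Hypothetical helper: the ASN and neighbor addresses are supplied by the
    caller and are not taken from the PR's actual template.
    """
    # FRR documents the accepted range for this timer as 1-65535 seconds.
    if not 1 <= retry_seconds <= 65535:
        raise ValueError("retry_seconds must be in the range 1-65535")
    lines = [f"router bgp {local_asn}"]
    for peer in internal_neighbors:
        lines.append(f" neighbor {peer} timers connect {retry_seconds}")
    return lines


if __name__ == "__main__":
    # Example addresses only; real internal neighbors depend on the platform.
    for line in connect_retry_commands(65100, ["10.1.0.1", "10.1.0.3"]):
        print(line)
```

Because the option is per neighbor, each internal peer gets its own line, and external peers keep the default timer unless they are configured separately.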

How to verify it

With this fix, the internal sessions come up at about the same time as the external BGP sessions.

Before the fix:

Neighbhor      V     AS    MsgRcvd    MsgSent    TblVer    InQ    OutQ  Up/Down      State/PfxRcd  NeighborName
-----------  ---  -----  ---------  ---------  --------  -----  ------  ---------  --------------  --------------
10.0.0.1       4  65200       3192         30         0      0       0  00:02:10             6370  ARISTA01T2
10.0.0.5       4  65200       3192         30         0      0       0  00:02:10             6370  ARISTA03T2
10.0.0.9       4  65200       3192         51         0      0       0  00:02:10             6370  ARISTA05T2
10.0.0.13      4  65200       3192         51         0      0       0  00:02:10             6370  ARISTA07T2
10.0.0.33      4  64001          8       4653         0      0       0  00:02:10                3  ARISTA01T0
10.0.0.35      4  64002          8       4653         0      0       0  00:02:10                3  ARISTA02T0
10.0.0.37      4  64003          9       4653         0      0       0  00:02:10                4  ARISTA03T0
10.0.0.39      4  64004         10       4653         0      0       0  00:02:10                3  ARISTA04T0
10.0.0.41      4  64005          9       4653         0      0       0  00:02:09                4  ARISTA05T0
10.0.0.43      4  64006          8       4653         0      0       0  00:02:10                3  ARISTA06T0
10.0.0.45      4  64007          8       4653         0      0       0  00:02:09                3  ARISTA07T0
10.0.0.47      4  64008          8       4653         0      0       0  00:02:10                3  ARISTA08T0
10.0.0.49      4  64009          8       4653         0      0       0  00:02:09                3  ARISTA09T0
10.0.0.51      4  64010          8       4653         0      0       0  00:02:09                3  ARISTA10T0
10.0.0.53      4  64011          8       4343         0      0       0  00:02:10                3  ARISTA11T0
10.0.0.55      4  64012          8       4343         0      0       0  00:02:10                3  ARISTA12T0
10.0.0.57      4  64013          8       4343         0      0       0  00:02:10                3  ARISTA13T0
10.0.0.59      4  64014          8       4343         0      0       0  00:02:09                3  ARISTA14T0
10.0.0.61      4  64015          8       4343         0      0       0  00:02:09                3  ARISTA15T0
10.0.0.63      4  64016          8       4343         0      0       0  00:02:10                3  ARISTA16T0
10.0.0.65      4  64017          7       4343         0      0       0  00:02:10                1  ARISTA17T0
10.0.0.67      4  64018          7       4343         0      0       0  00:02:10                1  ARISTA18T0
10.0.0.69      4  64019          7       4346         0      0       0  00:02:08                1  ARISTA19T0
10.0.0.71      4  64020          7       4343         0      0       0  00:02:10                1  ARISTA20T0
10.1.0.0       4  65100       9209       3203         0      0       0  00:00:22             6464  ASIC4
10.1.0.1       4  65100       3203       9209         0      0       0  00:00:24             6377  ASIC0
10.1.0.2       4  65100      11101       3203         0      0       0  00:00:22             6464  ASIC5
10.1.0.3       4  65100       3204      11102         0      0       0  00:00:24             6377  ASIC0
10.1.0.4       4  65100       9209       3203         0      0       0  00:00:23             6464  ASIC4
10.1.0.5       4  65100       3204       9210         0      0       0  00:00:25             6377  ASIC1
10.1.0.6       4  65100      11101       3203         0      0       0  00:00:23             6464  ASIC5
10.1.0.7       4  65100       3204      11102         0      0       0  00:00:25             6377  ASIC1
10.1.0.8       4  65100       9209         35         0      0       0  00:00:23             6464  ASIC4
10.1.0.9       4  65100         35       9209         0      0       0  00:00:24               45  ASIC2
10.1.0.10      4  65100      11101         35         0      0       0  00:00:23             6464  ASIC5
10.1.0.11      4  65100         36      11102         0      0       0  00:00:24               45  ASIC2
10.1.0.12      4  65100       9209         34         0      0       0  00:00:23             6464  ASIC4
10.1.0.13      4  65100         34       9209         0      0       0  00:00:24               36  ASIC3
10.1.0.14      4  65100      11101         34         0      0       0  00:00:23             6464  ASIC5
10.1.0.15      4  65100         35      11102         0      0       0  00:00:24               36  ASIC3

After the fix:

admin@str--acs-1:/var/log$ show ip bgp summary -d all

Neighbhor        V     AS    MsgRcvd    MsgSent    TblVer    InQ    OutQ  Up/Down    State/PfxRcd    NeighborName
-------------  ---  -----  ---------  ---------  --------  -----  ------  ---------  --------------  --------------
10.0.0.1         4  65200       3192         53         0      0       0  00:02:29   6370            ARISTA01T2
10.0.0.5         4  65200       3192         53         0      0       0  00:02:29   6370            ARISTA03T2
10.0.0.9         4  65200       3192         52         0      0       0  00:02:29   6370            ARISTA05T2
10.0.0.13        4  65200       3192         52         0      0       0  00:02:29   6370            ARISTA07T2
10.0.0.33        4  64001          8       3926         0      0       0  00:02:28   3               ARISTA01T0
10.0.0.35        4  64002          8       3926         0      0       0  00:02:28   3               ARISTA02T0
10.0.0.37        4  64003          9       3926         0      0       0  00:02:28   4               ARISTA03T0
10.0.0.39        4  64004          8       3926         0      0       0  00:02:28   3               ARISTA04T0
10.0.0.41        4  64005          9       3926         0      0       0  00:02:28   4               ARISTA05T0
10.0.0.43        4  64006          8       3926         0      0       0  00:02:28   3               ARISTA06T0
10.0.0.45        4  64007          8       3926         0      0       0  00:02:28   3               ARISTA07T0
10.0.0.47        4  64008          8       3926         0      0       0  00:02:28   3               ARISTA08T0
10.0.0.49        4  64009          8       3926         0      0       0  00:02:28   3               ARISTA09T0
10.0.0.51        4  64010          8       3926         0      0       0  00:02:28   3               ARISTA10T0
10.0.0.53        4  64011          8       6410         0      0       0  00:02:28   3               ARISTA11T0
10.0.0.55        4  64012          8       6410         0      0       0  00:02:28   3               ARISTA12T0
10.0.0.57        4  64013          8       6413         0      0       0  00:02:28   3               ARISTA13T0
10.0.0.59        4  64014          8       6413         0      0       0  00:02:28   3               ARISTA14T0
10.0.0.61        4  64015          8       6410         0      0       0  00:02:28   3               ARISTA15T0
10.0.0.63        4  64016          8       6410         0      0       0  00:02:29   3               ARISTA16T0
10.0.0.65        4  64017          7       6410         0      0       0  00:02:28   1               ARISTA17T0
10.0.0.67        4  64018          7       6413         0      0       0  00:02:28   1               ARISTA18T0
10.0.0.69        4  64019          7       6410         0      0       0  00:02:28   1               ARISTA19T0
10.0.0.71        4  64020          7       6410         0      0       0  00:02:28   1               ARISTA20T0
10.1.0.0         4  65100       6440       3245         0      0       0  00:02:28   6464            ASIC4
10.1.0.1         4  65100       3245       6440         0      0       0  00:02:30   6377            ASIC0
10.1.0.2         4  65100       4694       3245         0      0       0  00:02:29   6464            ASIC5
10.1.0.3         4  65100       3246       4695         0      0       0  00:02:31   6377            ASIC0
10.1.0.4         4  65100       6440       3245         0      0       0  00:02:28   6464            ASIC4
10.1.0.5         4  65100       3245       6440         0      0       0  00:02:30   6377            ASIC1
10.1.0.6         4  65100       4694       3349         0      0       0  00:02:29   6464            ASIC5
10.1.0.7         4  65100       3350       4695         0      0       0  00:02:31   6377            ASIC1
10.1.0.8         4  65100       6440         80         0      0       0  00:02:28   6464            ASIC4
10.1.0.9         4  65100         80       6440         0      0       0  00:02:29   45              ASIC2
10.1.0.10        4  65100       4694         81         0      0       0  00:02:30   6464            ASIC5
10.1.0.11        4  65100         81       4695         0      0       0  00:02:31   45              ASIC2
10.1.0.12        4  65100       3270         81         0      0       0  00:02:25   6464            ASIC4
10.1.0.13        4  65100         80       3272         0      0       0  00:02:26   36              ASIC3
10.1.0.14        4  65100       4695         81         0      0       0  00:02:30   6464            ASIC5
10.1.0.15        4  65100         81       4695         0      0       0  00:02:31   36              ASIC3

admin@str--acs-1:/var/log$ show ipv6 bgp summary -d all


Neighbhor              V     AS    MsgRcvd    MsgSent    TblVer    InQ    OutQ  Up/Down    State/PfxRcd    NeighborName
-------------------  ---  -----  ---------  ---------  --------  -----  ------  ---------  --------------  --------------
10.12.103.119          4  65100          0          0         0      0       0  never      Connect         BGPMonitor
10.12.103.119          4  65100          0          0         0      0       0  never      Connect         BGPMonitor
25.71.45.41            4  65100          0          0         0      0       0  never      Connect         BGPMonitor
25.71.45.41            4  65100          0          0         0      0       0  never      Connect         BGPMonitor
2603:10e2:400:1::1     4  65100       6527       4422         0      0       0  00:22:46   6445            ASIC4
2603:10e2:400:1::1a    4  65100        481       6530         0      0       0  00:22:47   29              ASIC3
2603:10e2:400:1::1d    4  65100       6452        481         0      0       0  00:22:47   6445            ASIC5
2603:10e2:400:1::1e    4  65100        481       6452         0      0       0  00:22:48   29              ASIC3
2603:10e2:400:1::2     4  65100       4423       6528         0      0       0  00:22:48   6380            ASIC0
2603:10e2:400:1::5     4  65100       6452       4422         0      0       0  00:22:46   6445            ASIC5
2603:10e2:400:1::6     4  65100       4424       6458         0      0       0  00:22:49   6380            ASIC0
2603:10e2:400:1::9     4  65100       6527       4304         0      0       0  00:22:47   6445            ASIC4
2603:10e2:400:1::11    4  65100       6527        485         0      0       0  00:22:46   6445            ASIC4
2603:10e2:400:1::12    4  65100        485       6527         0      0       0  00:22:47   37              ASIC2
2603:10e2:400:1::15    4  65100       6452        485         0      0       0  00:22:46   6445            ASIC5
2603:10e2:400:1::16    4  65100        485       6452         0      0       0  00:22:48   37              ASIC2
2603:10e2:400:1::19    4  65100       6527        481         0      0       0  00:22:47   6445            ASIC4
2603:10e2:400:1::a     4  65100       4304       6530         0      0       0  00:22:48   6380            ASIC1
2603:10e2:400:1::d     4  65100       6452       4304         0      0       0  00:22:47   6445            ASIC5
2603:10e2:400:1::e     4  65100       4305       6456         0      0       0  00:22:49   6380            ASIC1
fc00::2                4  65200       3212         58         0      0       0  00:22:45   6370            ARISTA01T2
fc00::4a               4  64003         28       3228         0      0       0  00:22:44   2               ARISTA03T0
fc00::4e               4  64004         28       3228         0      0       0  00:22:44   2               ARISTA04T0
fc00::5a               4  64007         28       3229         0      0       0  00:22:43   2               ARISTA07T0
fc00::5e               4  64008         28       3228         0      0       0  00:22:44   2               ARISTA08T0
fc00::6                4  65200       3212         58         0      0       0  00:22:45   6370            ARISTA03T2
fc00::6a               4  64011         28       3445         0      0       0  00:22:45   2               ARISTA11T0
fc00::6e               4  64012         28       3445         0      0       0  00:22:45   2               ARISTA12T0
fc00::7a               4  64015         28       3445         0      0       0  00:22:45   2               ARISTA15T0
fc00::7e               4  64016         28       3445         0      0       0  00:22:46   2               ARISTA16T0
fc00::8a               4  64019         27       3445         0      0       0  00:22:45   0               ARISTA19T0
fc00::8e               4  64020         27       3445         0      0       0  00:22:45   0               ARISTA20T0
fc00::42               4  64001         28       3229         0      0       0  00:22:43   2               ARISTA01T0
fc00::46               4  64002         28       3228         0      0       0  00:22:44   2               ARISTA02T0
fc00::52               4  64005         28       3228         0      0       0  00:22:44   2               ARISTA05T0
fc00::56               4  64006         28       3228         0      0       0  00:22:44   2               ARISTA06T0
fc00::62               4  64009         28       3228         0      0       0  00:22:44   2               ARISTA09T0
fc00::66               4  64010         28       3228         0      0       0  00:22:44   2               ARISTA10T0
fc00::72               4  64013         28       3448         0      0       0  00:22:45   2               ARISTA13T0
fc00::76               4  64014         28       3448         0      0       0  00:22:45   2               ARISTA14T0
fc00::82               4  64017         27       3445         0      0       0  00:22:45   0               ARISTA17T0
fc00::86               4  64018         27       3448         0      0       0  00:22:45   0               ARISTA18T0
fc00::a                4  65200       3212         59         0      0       0  00:22:46   6370            ARISTA05T2
fc00::e                4  65200       3212         59         0      0       0  00:22:46   6370            ARISTA07T2
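The comparison above can also be made mechanically. The following is a minimal sketch that parses `show ip bgp summary` style rows and compares the median Up/Down time of internal and external neighbors. It assumes that internal neighbors are the ones whose NeighborName starts with `ASIC` (as in the output above) and that Up/Down is in `HH:MM:SS` form; rows in any other form, such as `never`, are skipped. The function names are hypothetical, and the sample rows are copied from the tables above.

```python
import re
from statistics import median

UPTIME = re.compile(r"^(\d+):(\d{2}):(\d{2})$")


def uptime_seconds(value):
    """Convert HH:MM:SS to seconds; return None for anything else."""
    match = UPTIME.match(value)
    if match is None:
        return None
    hours, minutes, seconds = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds


def uptimes_by_group(summary_text):
    """Split session uptimes into internal and external lists."""
    groups = {"internal": [], "external": []}
    for line in summary_text.splitlines():
        fields = line.split()
        if len(fields) != 11:  # not a neighbor row
            continue
        seconds = uptime_seconds(fields[8])  # Up/Down column
        if seconds is None:  # header, separator, or "never"
            continue
        # Assumption: internal neighbors are named ASIC<n>.
        key = "internal" if fields[10].startswith("ASIC") else "external"
        groups[key].append(seconds)
    return groups


def uptime_gap(summary_text):
    """Median external uptime minus median internal uptime, in seconds."""
    groups = uptimes_by_group(summary_text)
    return median(groups["external"]) - median(groups["internal"])


BEFORE = """\
10.0.0.1       4  65200       3192         30         0      0       0  00:02:10             6370  ARISTA01T2
10.0.0.33      4  64001          8       4653         0      0       0  00:02:10                3  ARISTA01T0
10.1.0.0       4  65100       9209       3203         0      0       0  00:00:22             6464  ASIC4
10.1.0.1       4  65100       3203       9209         0      0       0  00:00:24             6377  ASIC0
"""

AFTER = """\
10.0.0.1         4  65200       3192         53         0      0       0  00:02:29   6370            ARISTA01T2
10.0.0.33        4  64001          8       3926         0      0       0  00:02:28   3               ARISTA01T0
10.1.0.0         4  65100       6440       3245         0      0       0  00:02:28   6464            ASIC4
10.1.0.1         4  65100       3245       6440         0      0       0  00:02:30   6377            ASIC0
"""

if __name__ == "__main__":
    print("gap before fix (s):", uptime_gap(BEFORE))
    print("gap after fix (s):", uptime_gap(AFTER))
```

A large positive gap means the internal sessions were established well after the external ones; a gap near zero matches the behavior expected with the shorter timer.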

Thanks to @abdosi for noticing this behavior on a production device and debugging it!

Which release branch to backport (provide reason below if selected)

  • [ ] 201811
  • [x] 201911
  • [x] 202006
  • [x] 202012

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

@judyjoseph judyjoseph requested a review from lguohan as a code owner March 17, 2021 23:09
@judyjoseph judyjoseph changed the title from "Decrease Connect Retry Timer from default value 120sec --> 10 sec" to "Decrease Connect Retry Timer to 10 sec for internal bgp neighbors." Mar 17, 2021
@judyjoseph judyjoseph changed the title from "Decrease Connect Retry Timer to 10 sec for internal bgp neighbors." to "Decrease Connect Retry Timer for internal bgp neighbors." Mar 17, 2021
@lguohan
Collaborator

lguohan commented Mar 17, 2021

@shi-su, we should need this for external connections as well.

@judyjoseph, is there a global option for this one?

It is per neighbor. This is also mentioned in FRRouting/frr#4745.

@judyjoseph judyjoseph requested review from lguohan, abdosi and rlhui March 17, 2021 23:27
@judyjoseph judyjoseph merged commit 9d9503e into sonic-net:master Mar 18, 2021
abdosi pushed a commit that referenced this pull request Mar 18, 2021
…c to 10 sec. (#7087)

yxieca pushed a commit that referenced this pull request Mar 26, 2021
…c to 10 sec. (#7087)

raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-buildimage that referenced this pull request May 23, 2021
…c to 10 sec. (sonic-net#7087)

carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
…c to 10 sec. (sonic-net#7087)
