Skip to content

zebra: Add kernel level graceful restart #4301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 25, 2019

Conversation

donaldsharp
Copy link
Member

Add the a --graceful_restart X flag to zebra start that
now creates a timer that pops in X seconds and will go
through and remove all routes that are older than startup.

If graceful_restart is not specified then we will just pop
a timer that cleans everything up immediately.

Signed-off-by: Praveen Chaudhary [email protected]
Signed-off-by: Donald Sharp [email protected]

@donaldsharp
Copy link
Member Author

@srimohans @praveen-li Please take a look at this. I was looking at #4301 and it was just overly complicated for what was needed. This simplifies a bunch of code in my opinion.

@LabN-CI
Copy link
Collaborator

LabN-CI commented May 10, 2019

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
_ _
Result SUCCESS git merge/4301 dc45545
Date 05/09/2019
Start 20:45:24
Finish 21:09:33
Run-Time 24:09
Total 1813
Pass 1813
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2019-05-09-20:45:24.txt
Log autoscript-2019-05-09-20:46:07.log.bz2
Memory 446 421 374

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7603/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.


CLANG Static Analyzer Summary

  • Github Pull Request 4301, comparing to Git base SHA 900193b

No Changes in Static Analysis warnings compared to base

12 Static Analyzer issues remaining.

See details at
https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7603/artifact/shared/static_analysis/index.html

Copy link
Contributor

@mjstapp mjstapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity very much - but did have a couple of questions...

Copy link
Contributor

@eqvinox eqvinox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a specification doc that bolts down how exactly things are supposed to work, just so responsibilities & states are clear on shutdown & startup

pretty please :)

@LabN-CI
Copy link
Collaborator

LabN-CI commented May 13, 2019

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
_ _
Result SUCCESS git merge/4301 186ec00
Date 05/13/2019
Start 14:09:51
Finish 14:33:52
Run-Time 24:01
Total 1816
Pass 1816
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2019-05-13-14:09:51.txt
Log autoscript-2019-05-13-14:10:37.log.bz2
Memory 421 424 359

For details, please contact louberger

/*
* Time for when we sweep the rib from old routes
*/
time_t startup;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startup_time ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't the time_t already imply that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It implies at the declaration. But this variable is referred as zrouter.startup elsewhere. It doesn't imply the meaning there.

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7650/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Warnings Generated during build:

Checkout code: Successful with additional warnings
Report for zebra_rib.c | 5 issues
===============================================
< ERROR: strcat() is error-prone; please use strlcat() if possible#130: FILE: /tmp/f1-8269/zebra_rib.c:130:
---
> ERROR: strcat() is error-prone; please use strlcat() if possible#130: FILE: /tmp/f2-8269/zebra_rib.c:130:
14c14
< #147: FILE: /tmp/f1-8269/zebra_rib.c:147:

CLANG Static Analyzer Summary

  • Github Pull Request 4301, comparing to Git base SHA 4ac9d2c

No Changes in Static Analysis warnings compared to base

12 Static Analyzer issues remaining.

See details at
https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7650/artifact/shared/static_analysis/index.html

@eqvinox
Copy link
Contributor

eqvinox commented May 14, 2019

=> we want this to be part of the config instead of a startup flag

=> but for that we need to know when we've finished loading config

=> working on adding a "config done" mechanism

@LabN-CI
Copy link
Collaborator

LabN-CI commented May 15, 2019

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
_ _
Result SUCCESS git merge/4301 7ba59d4
Date 05/14/2019
Start 22:45:23
Finish 23:09:33
Run-Time 24:10
Total 1816
Pass 1816
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2019-05-14-22:45:23.txt
Log autoscript-2019-05-14-22:46:07.log.bz2
Memory 430 434 360

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7690/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Warnings Generated during build:

Checkout code: Successful with additional warnings
Report for zebra_rib.c | 5 issues
===============================================
< ERROR: strcat() is error-prone; please use strlcat() if possible#130: FILE: /tmp/f1-22244/zebra_rib.c:130:
---
> ERROR: strcat() is error-prone; please use strlcat() if possible#130: FILE: /tmp/f2-22244/zebra_rib.c:130:
14c14
< #147: FILE: /tmp/f1-22244/zebra_rib.c:147:

CLANG Static Analyzer Summary

  • Github Pull Request 4301, comparing to Git base SHA 543c749

No Changes in Static Analysis warnings compared to base

16 Static Analyzer issues remaining.

See details at
https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7690/artifact/shared/static_analysis/index.html

@praveen-li
Copy link

I wrote a doc with my analysis, the one I used for first PR 4197. Let me know if this help. [link below]

Donald has good slides with timeline of zebra start, which can help to understand it better. Thanks.

https://github.com/praveen-li/docs/blob/master/frr-kernel-graceful-restart/frr_kernel_gr_design_utp.md

Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this all looks like the right way to go about it, and the code looks fine.. I'll approve, but I see there is one comment outstanding before pushing.

@LabN-CI
Copy link
Collaborator

LabN-CI commented May 23, 2019

🛑 Basic BGPD CI results: FAILURE

Results table
_ _
Result FAILURE git pull/4301 04cb763 frr.github Build (merge failed)
Date 05/23/2019
Start 23:05:15
Finish 23:05:54
Run-Time 00:39
Total
Pass
Fail
Valgrind-Errors
Valgrind-Loss
Details vncregress-2019-05-23-23:05:15.txt
Log make-2019-05-23-23:05:15.out.bz2
Memory

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: FAILED

See below for issues.
CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Get source / Pull Request: Successful

Building Stage: Failed

FreeBSD 12 amd64 build: Failed (click for details) FreeBSD 12 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/FBSD12AMD64/config.status/config.status

Make failed for FreeBSD 12 amd64 build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/FBSD12AMD64/ErrorLog/log_make.txt)

  CC       lib/json.lo
  CC       lib/keychain.lo
  CC       lib/lib_errors.lo
  CC       lib/libfrr.lo
  CC       lib/linklist.lo
  CC       lib/log.lo
  CC       lib/md5.lo
  CC       lib/memory.lo
  CC       lib/memory_vty.lo
CentOS 7 amd64 build: Failed (click for details) CentOS 7 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI005BUILD/config.status/config.status

Make failed for CentOS 7 amd64 build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI005BUILD/ErrorLog/log_make.txt)

  CC       bgpd/bgp_ecommunity.o
  CC       bgpd/bgp_encap_tlv.o
  CC       bgpd/bgp_errors.o
  CC       bgpd/bgp_evpn.o
  CLIPPY   bgpd/bgp_evpn_vty_clippy.c
  CC       bgpd/bgp_evpn_vty.o
  CC       bgpd/bgp_filter.o
  CC       bgpd/bgp_flowspec.o
  CC       bgpd/bgp_flowspec_util.o
Ubuntu 18.04 amd64 build: Failed (click for details) Ubuntu 18.04 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/U1804AMD64/config.status/config.status

Make failed for Ubuntu 18.04 amd64 build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/U1804AMD64/ErrorLog/log_make.txt)

  CC       bgpd/bgp_ecommunity.o
  CC       bgpd/bgp_encap_tlv.o
  CC       bgpd/bgp_errors.o
  CC       bgpd/bgp_evpn.o
  CLIPPY   bgpd/bgp_evpn_vty_clippy.c
  CC       bgpd/bgp_evpn_vty.o
  CC       bgpd/bgp_filter.o
  CC       bgpd/bgp_flowspec.o
  CC       bgpd/bgp_flowspec_util.o
Ubuntu 14.04 amd64 build: Failed (click for details) Ubuntu 14.04 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI001BUILD/config.status/config.status

Make failed for Ubuntu 14.04 amd64 build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI001BUILD/ErrorLog/log_make.txt)

  CC       bgpd/bgp_ecommunity.o
  CC       bgpd/bgp_encap_tlv.o
  CC       bgpd/bgp_errors.o
  CC       bgpd/bgp_evpn.o
  CLIPPY   bgpd/bgp_evpn_vty_clippy.c
  CC       bgpd/bgp_evpn_vty.o
  CC       bgpd/bgp_filter.o
  CC       bgpd/bgp_flowspec.o
  CC       bgpd/bgp_flowspec_util.o
Debian 8 amd64 build: Failed (click for details) Debian 8 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI008BLD/config.status/config.status

Make failed for Debian 8 amd64 build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI008BLD/ErrorLog/log_make.txt)

  CC       bgpd/bgp_ecommunity.o
  CC       bgpd/bgp_encap_tlv.o
  CC       bgpd/bgp_errors.o
  CC       bgpd/bgp_evpn.o
  CLIPPY   bgpd/bgp_evpn_vty_clippy.c
  CC       bgpd/bgp_evpn_vty.o
  CC       bgpd/bgp_filter.o
  CC       bgpd/bgp_flowspec.o
  CC       bgpd/bgp_flowspec_util.o
Ubuntu 18.04 ppc64le build: Failed (click for details) Ubuntu 18.04 ppc64le build: config.log output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/U1804PPC64LEBUILD/config.log/ Ubuntu 18.04 ppc64le build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/U1804PPC64LEBUILD/config.status/config.status

Make failed for Ubuntu 18.04 ppc64le build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/U1804PPC64LEBUILD/ErrorLog/log_make.txt)

  CC       bgpd/bgp_ecommunity.o
  CC       bgpd/bgp_encap_tlv.o
  CC       bgpd/bgp_errors.o
  CC       bgpd/bgp_evpn.o
  CC       bgpd/bgp_filter.o
  CC       bgpd/bgp_flowspec.o
  CC       bgpd/bgp_flowspec_util.o
  CC       bgpd/bgp_flowspec_vty.o
  CC       bgpd/bgp_fsm.o
OmniOS amd64 build: Failed (click for details) OmniOS amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI010BUILD/config.status/config.status

Make failed for OmniOS amd64 build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI010BUILD/ErrorLog/log_make.txt)

checking for minix/config.h... no
checking whether it is safe to define __EXTENSIONS__... yes
checking whether gcc supports -diag-error 10006... no
checking whether gcc supports -std=gnu11... yes
checking whether gcc -std=gnu11 supports -g... yes
checking whether gcc -std=gnu11 supports -Os... yes
checking whether gcc -std=gnu11 supports -fno-omit-frame-pointer... yes
checking whether gcc -std=gnu11 supports -funwind-tables... yes
checking whether gcc -std=gnu11 supports -Wall... yes
Debian 9 amd64 build: Failed (click for details) Debian 9 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI021BUILD/config.status/config.status

Make failed for Debian 9 amd64 build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI021BUILD/ErrorLog/log_make.txt)

  CC       bgpd/bgp_ecommunity.o
  CC       bgpd/bgp_encap_tlv.o
  CC       bgpd/bgp_errors.o
  CC       bgpd/bgp_evpn.o
  CLIPPY   bgpd/bgp_evpn_vty_clippy.c
  CC       bgpd/bgp_evpn_vty.o
  CC       bgpd/bgp_filter.o
  CC       bgpd/bgp_flowspec.o
  CC       bgpd/bgp_flowspec_util.o
NetBSD 7 amd64 build: Failed (click for details) NetBSD 7 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI012BUILD/config.status/config.status

Make failed for NetBSD 7 amd64 build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI012BUILD/ErrorLog/log_make.txt)

  CC       lib/json.lo
  CC       lib/keychain.lo
  CC       lib/lib_errors.lo
  CC       lib/libfrr.lo
  CC       lib/linklist.lo
  CC       lib/log.lo
  CC       lib/md5.lo
  CC       lib/memory.lo
  CC       lib/memory_vty.lo
FreeBSD 11 amd64 build: Failed (click for details) FreeBSD 11 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI009BUILD/config.status/config.status

Make failed for FreeBSD 11 amd64 build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI009BUILD/ErrorLog/log_make.txt)

  CC       lib/json.lo
  CC       lib/keychain.lo
  CC       lib/lib_errors.lo
  CC       lib/libfrr.lo
  CC       lib/linklist.lo
  CC       lib/log.lo
  CC       lib/md5.lo
  CC       lib/memory.lo
  CC       lib/memory_vty.lo
Ubuntu 16.04 amd64 build: Failed (click for details) Ubuntu 16.04 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI014BUILD/config.status/config.status

Make failed for Ubuntu 16.04 amd64 build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI014BUILD/ErrorLog/log_make.txt)

  CC       bgpd/bgp_ecommunity.o
  CC       bgpd/bgp_encap_tlv.o
  CC       bgpd/bgp_errors.o
  CC       bgpd/bgp_evpn.o
  CLIPPY   bgpd/bgp_evpn_vty_clippy.c
  CC       bgpd/bgp_evpn_vty.o
  CC       bgpd/bgp_filter.o
  CC       bgpd/bgp_flowspec.o
  CC       bgpd/bgp_flowspec_util.o
Ubuntu 16.04 i386 build: Failed (click for details) Ubuntu 16.04 i386 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/U1604I386/config.status/config.status

Make failed for Ubuntu 16.04 i386 build:
(see full Make log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/U1604I386/ErrorLog/log_make.txt)

  CC       bgpd/bgp_ecommunity.o
  CC       bgpd/bgp_encap_tlv.o
  CC       bgpd/bgp_errors.o
  CC       bgpd/bgp_evpn.o
  CLIPPY   bgpd/bgp_evpn_vty_clippy.c
  CC       bgpd/bgp_evpn_vty.o
  CC       bgpd/bgp_filter.o
  CC       bgpd/bgp_flowspec.o
  CC       bgpd/bgp_flowspec_util.o
Successful on other platforms
  • OpenBSD 6 amd64 build
  • CentOS 6 amd64 build
  • NetBSD 6 amd64 build
  • Ubuntu 12.04 amd64 build

<Initial Code from Praveen Chaudhary>

Add the a `--graceful_restart X` flag to zebra start that
now creates a timer that pops in X seconds and will go
through and remove all routes that are older than startup.

If graceful_restart is not specified then we will just pop
a timer that cleans everything up immediately.

Signed-off-by: Praveen Chaudhary <[email protected]>
Signed-off-by: Donald Sharp <[email protected]>
This code doees this:

a) Imagine ospf installs a route into zebra.  Zebra crashes and
we restart FRR.  If we are using the -k option on zebra than
all routes are re-read in, including this OSPF route.
b) Now imagine at the same time that zebra is starting backup
ospf on a different router looses a link to the this route.
c) Since zebra was run with -k this OSPF route is read back
in but never replaced and we now have a route pointing out
an interface to other routers that cannot handle it.

We should never allow users to implement bad options from zebra's
perspective that allow them to put themselves into a clear problem
state and additionally we have *absolutely* no mechanism to ever
fix that broken route without special human interaction.

Signed-off-by: Donald Sharp <[email protected]>
@LabN-CI
Copy link
Collaborator

LabN-CI commented May 24, 2019

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
_ _
Result SUCCESS git merge/4301 33656d2
Date 05/23/2019
Start 23:40:13
Finish 00:02:22
Run-Time 22:09
Total 1813
Pass 1813
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2019-05-23-23:40:13.txt
Log autoscript-2019-05-23-23:41:15.log.bz2
Memory 412 430 360

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7811/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Warnings Generated during build:

Checkout code: Successful with additional warnings
Report for zebra_rib.c | 5 issues
===============================================
< ERROR: strcat() is error-prone; please use strlcat() if possible#131: FILE: /tmp/f1-26038/zebra_rib.c:131:
---
> ERROR: strcat() is error-prone; please use strlcat() if possible#131: FILE: /tmp/f2-26038/zebra_rib.c:131:
14c14
< #148: FILE: /tmp/f1-26038/zebra_rib.c:148:

CLANG Static Analyzer Summary

  • Github Pull Request 4301, comparing to Git base SHA 528628c

No Changes in Static Analysis warnings compared to base

11 Static Analyzer issues remaining.

See details at
https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7811/artifact/shared/static_analysis/index.html

@LabN-CI
Copy link
Collaborator

LabN-CI commented May 25, 2019

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
_ _
Result SUCCESS git merge/4301 33656d2
Date 05/25/2019
Start 04:37:00
Finish 04:59:21
Run-Time 22:21
Total 1813
Pass 1813
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2019-05-25-04:37:00.txt
Log autoscript-2019-05-25-04:38:03.log.bz2
Memory 397 408 360

For details, please contact louberger

Copy link
Contributor

@eqvinox eqvinox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need the docs on this just yet (I'm mostly concerned about ZAPI interactions), so changing from my change request to approved.

@eqvinox eqvinox merged commit d78978e into FRRouting:master May 25, 2019
@praveen-li
Copy link

@donaldsharp It will be great if this can be back-ported in stable\7.0 and 6.0.

@donaldsharp
Copy link
Member Author

nope we do not put features into old branches. If someone needs it there they are free to create their own version of it.

praveen-li pushed a commit to praveen-li/sonic-buildimage that referenced this pull request Sep 30, 2019
praveen-li pushed a commit to praveen-li/sonic-buildimage that referenced this pull request Oct 16, 2019
pavel-shirshov pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Oct 17, 2019
* [FRR]: Patch for kernel level graceful restart.

Original Patch in FRR master:
FRRouting/frr#4301

* Rename 0007-zebra-kernel-level-graceful-restart.patch to 0008-zebra-kernel-level-graceful-restart.patch
@donaldsharp donaldsharp deleted the graceful_restart branch December 9, 2019 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants