-
Notifications
You must be signed in to change notification settings - Fork 1.3k
zebra: Add kernel level graceful restart #4301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@srimohans @praveen-li Please take a look at this. I was looking at #4301 and it was just overly complicated for what was needed. This simplifies a bunch of code in my opinion. |
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7603/ This is a comment from an automated CI system. CLANG Static Analyzer Summary
No Changes in Static Analysis warnings compared to base12 Static Analyzer issues remaining.See details at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the simplicity very much - but did have a couple of questions...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a specification doc that bolts down how exactly things are supposed to work, just so responsibilities & states are clear on shutdown & startup
pretty please :)
dc45545
to
186ec00
Compare
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
zebra/zebra_router.h
Outdated
/* | ||
* Time for when we sweep the rib from old routes | ||
*/ | ||
time_t startup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startup_time ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't the time_t already imply that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It implies at the declaration. But this variable is referred as zrouter.startup elsewhere. It doesn't imply the meaning there.
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7650/ This is a comment from an automated CI system. Warnings Generated during build:Checkout code: Successful with additional warnings
CLANG Static Analyzer Summary
No Changes in Static Analysis warnings compared to base12 Static Analyzer issues remaining.See details at |
=> we want this to be part of the config instead of a startup flag => but for that we need to know when we've finished loading config => working on adding a "config done" mechanism |
186ec00
to
7ba59d4
Compare
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7690/ This is a comment from an automated CI system. Warnings Generated during build:Checkout code: Successful with additional warnings
CLANG Static Analyzer Summary
No Changes in Static Analysis warnings compared to base16 Static Analyzer issues remaining.See details at |
I wrote a doc with my analysis, the one I used for first PR 4197. Let me know if this help. [link below] Donald has good slides with timeline of zebra start, which can help to understand it better. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this all looks like the right way to go about it, and the code looks fine.. I'll approve, but I see there is one comment outstanding before pushing.
7ba59d4
to
04cb763
Compare
🛑 Basic BGPD CI results: FAILUREResults table
For details, please contact louberger |
Continuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: FailedFreeBSD 12 amd64 build: Failed (click for details)FreeBSD 12 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/FBSD12AMD64/config.status/config.statusMake failed for FreeBSD 12 amd64 build:
CentOS 7 amd64 build: Failed (click for details)CentOS 7 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI005BUILD/config.status/config.statusMake failed for CentOS 7 amd64 build:
Ubuntu 18.04 amd64 build: Failed (click for details)Ubuntu 18.04 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/U1804AMD64/config.status/config.statusMake failed for Ubuntu 18.04 amd64 build:
Ubuntu 14.04 amd64 build: Failed (click for details)Ubuntu 14.04 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI001BUILD/config.status/config.statusMake failed for Ubuntu 14.04 amd64 build:
Debian 8 amd64 build: Failed (click for details)Debian 8 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI008BLD/config.status/config.statusMake failed for Debian 8 amd64 build:
Ubuntu 18.04 ppc64le build: Failed (click for details)Ubuntu 18.04 ppc64le build: config.log output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/U1804PPC64LEBUILD/config.log/ Ubuntu 18.04 ppc64le build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/U1804PPC64LEBUILD/config.status/config.statusMake failed for Ubuntu 18.04 ppc64le build:
OmniOS amd64 build: Failed (click for details)OmniOS amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI010BUILD/config.status/config.statusMake failed for OmniOS amd64 build:
Debian 9 amd64 build: Failed (click for details)Debian 9 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI021BUILD/config.status/config.statusMake failed for Debian 9 amd64 build:
NetBSD 7 amd64 build: Failed (click for details)NetBSD 7 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI012BUILD/config.status/config.statusMake failed for NetBSD 7 amd64 build:
FreeBSD 11 amd64 build: Failed (click for details)FreeBSD 11 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI009BUILD/config.status/config.statusMake failed for FreeBSD 11 amd64 build:
Ubuntu 16.04 amd64 build: Failed (click for details)Ubuntu 16.04 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/CI014BUILD/config.status/config.statusMake failed for Ubuntu 16.04 amd64 build:
Ubuntu 16.04 i386 build: Failed (click for details)Ubuntu 16.04 i386 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7810/artifact/U1604I386/config.status/config.statusMake failed for Ubuntu 16.04 i386 build:
Successful on other platforms
|
<Initial Code from Praveen Chaudhary> Add the a `--graceful_restart X` flag to zebra start that now creates a timer that pops in X seconds and will go through and remove all routes that are older than startup. If graceful_restart is not specified then we will just pop a timer that cleans everything up immediately. Signed-off-by: Praveen Chaudhary <[email protected]> Signed-off-by: Donald Sharp <[email protected]>
This code doees this: a) Imagine ospf installs a route into zebra. Zebra crashes and we restart FRR. If we are using the -k option on zebra than all routes are re-read in, including this OSPF route. b) Now imagine at the same time that zebra is starting backup ospf on a different router looses a link to the this route. c) Since zebra was run with -k this OSPF route is read back in but never replaced and we now have a route pointing out an interface to other routers that cannot handle it. We should never allow users to implement bad options from zebra's perspective that allow them to put themselves into a clear problem state and additionally we have *absolutely* no mechanism to ever fix that broken route without special human interaction. Signed-off-by: Donald Sharp <[email protected]>
04cb763
to
33656d2
Compare
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-7811/ This is a comment from an automated CI system. Warnings Generated during build:Checkout code: Successful with additional warnings
CLANG Static Analyzer Summary
No Changes in Static Analysis warnings compared to base11 Static Analyzer issues remaining.See details at |
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need the docs on this just yet (I'm mostly concerned about ZAPI interactions), so changing from my change request to approved.
@donaldsharp It will be great if this can be back-ported in stable\7.0 and 6.0. |
nope we do not put features into old branches. If someone needs it there they are free to create their own version of it. |
Original Patch in FRR master: FRRouting/frr#4301
Original Patch in FRR master: FRRouting/frr#4301
* [FRR]: Patch for kernel level graceful restart. Original Patch in FRR master: FRRouting/frr#4301 * Rename 0007-zebra-kernel-level-graceful-restart.patch to 0008-zebra-kernel-level-graceful-restart.patch
Add the a
--graceful_restart X
flag to zebra start thatnow creates a timer that pops in X seconds and will go
through and remove all routes that are older than startup.
If graceful_restart is not specified then we will just pop
a timer that cleans everything up immediately.
Signed-off-by: Praveen Chaudhary [email protected]
Signed-off-by: Donald Sharp [email protected]