-
Notifications
You must be signed in to change notification settings - Fork 1.6k
hostcfgd: Handle missed tacacs updates between load & listen #8223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…pdates between last config load and db-listen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this change in master as well ?
@@ -25,6 +26,20 @@ TACPLUS_SERVER_PASSKEY_DEFAULT = "" | |||
TACPLUS_SERVER_TIMEOUT_DEFAULT = "5" | |||
TACPLUS_SERVER_AUTH_TYPE_DEFAULT = "pap" | |||
|
|||
global_lock = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iso of global_lock
, can we move this lock to the AaaCfg()
? To prevent another classes in this file from using this lock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If anyone else (another subscriber) gets a need, they could use this.
@renukamanavalan and @yxieca : when the issue is seen the steps are:- a) load minigraph Basically delay bewteen restart hostcfgd and config command is 10-15 seconds. I think if we can randomize delay we might be able to repro this Restart of hostcfgd service as part of load minigraph |
Managed a repro with following script. I could repro reliably in two systems. In one case, the pause time of 5 seconds worked, but in another it took 11 seconds. Again this is pretty dependent on many other factors. With help of load_minigraph, we could put control plane on high load, which helped get a distinct pause between load & listen.
|
def tacacs_server_update(self, key, data, modify_conf=True): | ||
if data == {}: | ||
if key in self.tacplus_servers: | ||
del self.tacplus_servers[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we don't need this code anymore ? all these changes looks unrelated to this fix.
Can we have separate PR for reason of these changes or at least can we add to PR description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this code anymore. The whole purpose of this code is to keep a local cache of tacacs & aaa. Maintain the cache on updates and use the cache. We don't do caching anymore.
We get it when we need it. This handles the missed-updates in simple manner and btw, we don't gain any by keeping duplicate cache in mem.
Why I did it
The time gap between last config load & db-listen seem to have increased.
Any config updates that occurred in this gap gets missed by db-listen.
This could miss updating /etc/pam.d/common-auth-sonic
How I did it
Add a one shot timer, just before db-listen. The timer will fire after the subscribe is done
When the timer fires, reload tacacs & aaa
How to verify it
A tough repro. So you may explicitly add some sleep just before listen.
Make sure the sleep is at least 10 seconds less than timer.
Write a log just before sleep.
restart hostcfgd and watch syslog for the added log line
Upon this log, make some tacacs updates that would have some change in
/etc/pam.d/common-auth-sonic.
Watch for changes in that file
Which release branch to backport (provide reason below if selected)
Description for the changelog
A picture of a cute animal (not mandatory but encouraged)