-
Notifications
You must be signed in to change notification settings - Fork 175
Refactor ledd daemon & fix high CPU usage due to unexpected socket close #548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@Staphylo could you please help review? |
@assrinivasan @vvolam please review |
@prgeor nit: spelling issue in PR description. Also, could you fix the build failure? |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
sonic-ledd/scripts/ledd
Outdated
if fvp: | ||
# TODO: Once these flag entries have been removed from the DB, | ||
# we can remove this check | ||
if key in ["PortConfigDone", "PortInitDone"]: | ||
return 3 | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we ignoring bad fvp or key not present case now?
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Prince George <[email protected]>
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
@davidm-arista @bobbymcgonigle can someone review this PR? |
@@ -124,10 +276,11 @@ def main(): | |||
|
|||
ledd = DaemonLedd() | |||
|
|||
# Listen indefinitely for changes to the PORT table in the Application DB's | |||
# Listen indefinitely for port oper status changes | |||
while True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move this while loop within the run routine ?
The way xcvrd/ycabled run change event monitoring loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure we can ...can this be done in next iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works
|
||
def isFrontPanelPort(self): | ||
return multi_asic.is_front_panel_port(self._name, self._role) | ||
class FrontPanelPorts: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need an empty line before here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
sonic-ledd/scripts/ledd
Outdated
# Load platform-specific LedControl module | ||
try: | ||
self.led_control = self.load_platform_util(LED_MODULE_NAME, LED_CLASS_NAME) | ||
led_control = self.load_platform_util(LED_MODULE_NAME, LED_CLASS_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe can you explain, why are we not storing this as a class variable ?
ledd.run() | ||
|
||
if 0 != ledd.run(): | ||
print("ledd.run() failed... Exiting") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add/modify this a syslog here please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vdahiya12 I just kept the behavior as old one. if you see the main() function, it has many other prints....these exit log do come to syslog ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will catch it in supervisor logs, the print logs can remain. Ideally, we should have disallowed print statements as is, they won't go to syslog as I recall, but supervisor logs will if the daemon hangs/exits. For now we can keep as is.
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
…close
Description
[ledd] Refactor ledd daemon and fix high CPU usage due to unexpected redis DB socket close
As part of the refactor, the LED port color is changed to oper UP only if all subports of a front panel ports are UP.
If any subport is oper DOWN, the LED port color is changed to oper DOWN
Motivation and Context
For unkown reason if redis closes the socket, then ledd daemon is continuously busy waiting for socket event and failing all the time. This is resulting in high CPI
The change in Port LED behavior is necessary to prevent situation in the old behavior where even if some subports are oper down, the front panel port LED should not glow green.
How Has This Been Tested?
Close the socket abruptly and see ledd daemon is respawned so that new socket connection is established to redis DB without causing high CPU.
Test this on Arista 7060X6 with 8x100G mode and 2x400G mode
Additional Information (Optional)