Skip to content

[Accton] Support system health monitoring on pddf implemented models #8198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

seanwu-ec
Copy link
Contributor

@seanwu-ec seanwu-ec commented Jul 16, 2021

Why I did it

To support SystemHealthMonitoring feature on Accton's models (which have implemented PDDF APIs).

How I did it

  1. Provided a config file, system_health_monitoring_config.json.
  2. Implement the SystemLED part that'd be manipulated by HealthChecker.
  3. Enable thermalctld that makes more HW checks available

How to verify it

Use show system-health detail and check if all desired checks pass, and confirm if systemLED works reasonably.

////// SEAN: Booting, LED_DIAG = GREEN_BLINKING //////

root@sonic:/home/admin# show system-health detail 
System is currently booting...


////// SEAN: verifying with PSU2 unplugged //////
////// SEAN: Failed the health check, LED_DIAG = AMBER //////

root@sonic:/home/admin# show system-health detail
System status summary

  System status LED  amber
  Services:
    Status: OK
  Hardware:
    Status: Not OK
    Reasons: PSU 2 is missing or not available
	     PSU2_FAN1 is broken

System services and devices monitor list

Name                        Status    Type
--------------------------  --------  ----------
sonic                       OK        System
rsyslog                     OK        Process
root-overlay                OK        Filesystem
var-log                     OK        Filesystem
routeCheck                  OK        Program
diskCheck                   OK        Program
container_checker           OK        Program
container_memory_telemetry  OK        Program
PSU2_FAN1                   Not OK    Fan
PSU 2                       Not OK    PSU
PSU1_FAN1                   OK        Fan
PSU 1                       OK        PSU

System services and devices ignore list

Name             Status    Type
---------------  --------  ------
PSU1_FAN1.speed  Ignored   Device
PSU2_FAN1.speed  Ignored   Device
asic             Ignored   Device
psu.voltage      Ignored   Device
psu.temperature  Ignored   Device


////// SEAN: Ignore PSU2 in the config file. //////
////// SEAN: Passed the health check, LED_DIAG = GREEN //////

root@sonic:/home/admin# vi /usr/share/sonic/device/x86_64-accton_as7326_56x-r0/system_health_monitoring_config.json 
root@sonic:/home/admin# show system-health detail
System status summary

  System status LED  green
  Services:
    Status: OK
  Hardware:
    Status: OK

System services and devices monitor list

Name                        Status    Type
--------------------------  --------  ----------
sonic                       OK        System
rsyslog                     OK        Process
root-overlay                OK        Filesystem
var-log                     OK        Filesystem
routeCheck                  OK        Program
diskCheck                   OK        Program
container_checker           OK        Program
container_memory_telemetry  OK        Program
PSU1_FAN1                   OK        Fan
PSU 1                       OK        PSU

System services and devices ignore list

Name             Status    Type
---------------  --------  ------
psu.voltage      Ignored   Device
PSU1_FAN1.speed  Ignored   Device
PSU 2            Ignored   Device
asic             Ignored   Device
PSU2_FAN1        Ignored   Device
psu.temperature  Ignored   Device
PSU2_FAN1.speed  Ignored   Device

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog

[Accton] Support system health monitoring on pddf implemented models

Models: AS9716-32D, AS7326-56X, AS7726-32X, AS7816-64X

A picture of a cute animal (not mandatory but encouraged)

1. Implement SysLED on chassis.
2. Add config to system health monitor.

Signed-off-by: Sean Wu <[email protected]>
1. Workaround to PddfFan.get_target_speed(), where always returns 0
   for PSU's fan.
NOTE: No fans get enumerated by thermalctld because PDDF hasn't
   implemented FanDrawer yet.

Signed-off-by: Sean Wu <[email protected]>
Enable thermalctld for more HW checks.

Signed-off-by: Sean Wu <[email protected]>
Enable thermalctld for more HW checks

Signed-off-by: Sean Wu <[email protected]>
1. Enable thermalctld for more HW checks
2. Correct LED attr_name in pddf-device.json

Signed-off-by: Sean Wu <[email protected]>
Models: AS7326-56X, AS7726-32X, AS9716-32D

Signed-off-by: Sean Wu <[email protected]>
@seanwu-ec
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@seanwu-ec
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -46,3 +46,9 @@ def get_direction(self):

return direction

def get_target_speed(self):
if self.is_psu_fan:
raise NotImplementedError # Target speed not supported for PSU fans
Copy link
Contributor

@FuzailBrcm FuzailBrcm Aug 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanwu-ec
Do you think its better to implement this is the common pddf_fan.py ?

# git diff  platform/pddf/platform-api-pddf-base/sonic_platform_pddf_base/pddf_fan.py
diff --git a/platform/pddf/platform-api-pddf-base/sonic_platform_pddf_base/pddf_fan.py b/platform/pddf/platform-api-pddf-base/sonic_platform_pddf_base/pddf_fan.py
index 30be0dd74..d272d40e6 100755
--- a/platform/pddf/platform-api-pddf-base/sonic_platform_pddf_base/pddf_fan.py
+++ b/platform/pddf/platform-api-pddf-base/sonic_platform_pddf_base/pddf_fan.py
@@ -224,7 +224,7 @@ class PddfFan(FanBase):
         target_speed = 0
         if self.is_psu_fan:
             # Target speed not usually supported for PSU fans
-            target_speed = 0
+            raise NotImplementedError
         else:
             idx = (self.fantray_index-1)*self.platform['num_fans_pertray'] + self.fan_index
             attr = "fan" + str(idx) + "_pwm"
#

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because PSU doesn't reveal target fan speed. So I think it's better pddf_fan.py implement this.
I've raised an issue for that sometime before: #8129

In order to pass the fan status check at https://github.com/Azure/sonic-platform-daemons/blob/master/sonic-thermalctld/scripts/thermalctld#L341, according to the try_get() in thermalctld, there are two ways of doing this. One is raise NotImplementedError another one is return None. You can go whichever you like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @seanwu-ec
I added the fix in common PDDF APIs as part of PR (#7834)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove those changes from individual platform's fan.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. 😊 Thanks, @FuzailBrcm

* Have moved that implementation to PR#7834

Signed-off-by: Sean Wu <[email protected]>
@seanwu-ec seanwu-ec requested a review from a team as a code owner June 10, 2022 02:01
@seanwu-ec
Copy link
Contributor Author

some codes were merged by PR#10053, PR#9982 and PR#8164. so close this PR.

@seanwu-ec seanwu-ec closed this Jul 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants