[SSD Generic] Add support for parsing nvme ssd model, health and temperature #265

Merged
prgeor merged 2 commits into sonic-net:master from nvme-ssd-support
Mar 10, 2022

Conversation

keboliu
Collaborator

@keboliu keboliu commented Feb 25, 2022

Signed-off-by: Kebo Liu [email protected]

Description

Enhance ssd_generic to support parsing NVMe SSD info from smartctl output.
Add a test case for the newly added NVMe parsing logic.

Motivation and Context

In smartctl output, the keys for an NVMe SSD differ from those ssd_generic already parses:

  • The model information key is "Model Number"
  • "Percentage Used" indicates the wear-out level
  • "Temperature" provides the thermal information

With this change, ssd_generic looks for the above keys when the device is an NVMe SSD.
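
The key lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function names (`parse_nvme_info`, `_find_value`, `_leading_int`) are hypothetical, and reporting health as 100 minus "Percentage Used" is an assumption about how the wear-out level might be turned into a health figure.

```python
import re


def _find_value(output, key):
    """Return the text after 'key:' on the first line starting with key, else None."""
    # Anchoring at line start keeps "Temperature" from matching
    # "Temperature Sensor 1" or "Warning  Comp. Temperature Time".
    pattern = r"^" + re.escape(key) + r":[ \t]*(.*?)[ \t\r]*$"
    match = re.search(pattern, output, re.MULTILINE)
    return match.group(1) if match and match.group(1) else None


def _leading_int(text):
    """Return the integer at the start of text (e.g. '43 Celsius' -> 43), else None."""
    match = re.match(r"\d+", text) if text else None
    return int(match.group(0)) if match else None


def parse_nvme_info(output):
    """Extract model, health and temperature from `smartctl -a` text (hypothetical helper)."""
    model = _find_value(output, "Model Number")
    used = _leading_int(_find_value(output, "Percentage Used"))
    temperature = _leading_int(_find_value(output, "Temperature"))
    # Assumption: health is the remaining life, i.e. 100 minus the wear-out level.
    health = 100 - used if used is not None else None
    return {"model": model, "health": health, "temperature": temperature}
```

Passing the smartctl text to `parse_nvme_info` returns a dictionary with the three fields. Any key missing from the output yields `None` instead of raising.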

Below is an example of smartctl output on an NVMe SSD:

root@sonic:/home/admin# smartctl -a /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-8-2-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SFPC020GM1EC2TO-I-5E-11P-STD
Serial Number:                      A0221030722410000027
Firmware Version:                   COT6OQ
PCI Vendor/Subsystem ID:            0x1dd4
IEEE OUI Identifier:                0x8c6078

……

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        43 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    1,302,451 [666 GB]
Data Units Written:                 6,251,447 [3.20 TB]
Host Read Commands:                 23,534,308
Host Write Commands:                77,179,910
Controller Busy Time:               4,861
Power Cycles:                       454
Power On Hours:                     3,209
Unsafe Shutdowns:                   434
Media and Data Integrity Errors:    0
Error Information Log Entries:      4,636
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               47 Celsius
Temperature Sensor 2:               44 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0       4636     0  0x0008  0x0004      -            0     1     -
  1       4635     0  0x001a  0x0004      -            0     1     -
  2       4634     0  0x0009  0x0004      -            0     1     -

How Has This Been Tested?

Verified that ssdutil works as expected.
Verified that show platform ssdhealth works as expected.

Additional Information (Optional)

@keboliu keboliu marked this pull request as draft February 28, 2022 11:00
@keboliu keboliu marked this pull request as ready for review March 1, 2022 13:29
@keboliu keboliu force-pushed the nvme-ssd-support branch from 099f992 to 27e2f0f Compare March 2, 2022 05:18
@keboliu keboliu requested review from stephenxs and prgeor March 2, 2022 05:39
@liat-grozovik
Collaborator

@prgeor kind reminder to review and sign off

@prgeor prgeor added the SSD label Mar 8, 2022
@prgeor
Collaborator

prgeor commented Mar 8, 2022

++ @sujinmkang

@keboliu keboliu requested a review from sujinmkang March 8, 2022 14:21
@prgeor prgeor merged commit 83c4345 into sonic-net:master Mar 10, 2022
liat-grozovik pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Apr 7, 2022
Update sonic-platform-common submodule to pickup new commits:
01512ec [SSD]Enhance ssd_generic with more error handling to avoid python crash sonic-net/sonic-platform-common#271
ac3e7f1 [y_cable][Broadcom] update the BRCM y_cable driver to release 2.0 sonic-net/sonic-platform-common#263
573717a [Credo][Ycable] Fix Credo firmware download API download_firmware flag sonic-net/sonic-platform-common#269
a844f18 [xcvr] Add get_module_fw_info method to XcvrApi class. sonic-net/sonic-platform-common#267
35bad16 [sfputil]Refactoring read_porttab_mappings sonic-net/sonic-platform-common#264
83c4345 [SSD Generic] Add support for parsing nvme ssd model, health and temperature sonic-net/sonic-platform-common#265
5da31e1 [ycable][credo] Fix the is_link_active API for Credo Ycable sonic-net/sonic-platform-common#260
931c6ea [Y-Cable][Credo] add theading locker to support thread-safe calling, add SKU check for download_firmware API. sonic-net/sonic-platform-common#222
ff3aa75 Fix SFF8472 Enhanced Options sonic-net/sonic-platform-common#259
a8a83e9 [ssd] Allow individual vendor parsers to handle errors sonic-net/sonic-platform-common#252

Signed-off-by: Kebo Liu <[email protected]>
judyjoseph pushed a commit that referenced this pull request Apr 11, 2022
…erature (#265)

* add support for parsing nvme ssd model, health and temperature

Signed-off-by: Kebo Liu <[email protected]>

* Add test case for sfp_ssd

Signed-off-by: Kebo Liu <[email protected]>
@keboliu keboliu deleted the nvme-ssd-support branch October 28, 2023 03:31