Skip to content

PFC_WD_timestamp frequent changes in COUNTERS/{PORT} break telemetry SAMPLE mode #22201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
zbud-msft opened this issue Apr 1, 2025 · 2 comments · Fixed by sonic-net/sonic-sairedis#1567 or Azure/sonic-sairedis.msft#51
Assignees
Labels
Bug 🐛 regression Triaged this issue has been triaged

Comments

@zbud-msft
Copy link
Contributor

Is it platform specific

mellanox

Importance or Severity

High

Description of the bug

With the changes from the following PRs:

sonic-net/sonic-swss#3391 and sonic-net/sonic-sairedis#1534 we see that telemetry SAMPLE mode breaks when querying table COUNTERS from COUNTERS_DB and any port such as Ethernet0 for example.

Seems like the frequent updates and notifications that SAMPLE mode is constantly reading from redis, it is unable to successfully send data after the first sample.

Other tables in other DBs work fine for SAMPLE mode as well as other platforms and versions previous to these commits.

Steps to Reproduce

  1. Run a 202411 version of SONiC that contains the above changes on a Mellanox SKU.
  2. Ensure telemetry config allows you to query from within telemetry docker
  3. Enter telemetry docker and use the following command to SAMPLE from a port that is up.

gnmi_cli -client_types=gnmi -a 127.0.0.1:50051 -t COUNTERS_DB -logtostderr -insecure -v 2 -q COUNTERS/Ethernet0 -qt s -streaming_type SAMPLE
Should see that the first sample works fine but no other data is sent from server to client.

Actual Behavior and Expected Behavior

Expected: SAMPLE mode for this query should be able to constantly get accurate data at sample interval
Actual: SAMPLE mode doesn't work after the first data send.

Relevant log output

gnmi_cli -client_types=gnmi -a 127.0.0.1:50051 -t COUNTERS_DB -logtostderr -insecure -v 7 -q COUNTERS/Ethernet4 -qt s -streaming_type SAMPLE                  
I0401 00:33:33.700289     373 register.go:113] Attempting client types: [gnmi]
I0401 00:33:33.762503     373 register.go:126] client "gnmi" create with type *client.Client
I0401 00:33:33.766674     373 client.go:182] update:{timestamp:1743467613765764061 prefix:{target:"COUNTERS_DB"} update:{path:{element:"COUNTERS" element:"Ethernet4" elem:{name:"COUNTERS"} elem:{name:"Ethernet4"}} val:{json_ietf_val:"{\"PFC_WD_time_stamp\":\"9949737576589\",\"PFC_WD_time_stamp_last\":\"9949737576589\",\"PORT_BUFFER_DROP_STAT_time_stamp\":\"9902099938960\",\"PORT_STAT_COUNTER_time_stamp\":\"9949650941593\",\"SAI_PORT_STAT_ETHER_IN_PKTS_1024_TO_1518_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_IN_PKTS_128_TO_255_OCTETS\":\"670\",\"SAI_PORT_STAT_ETHER_IN_PKTS_1519_TO_2047_OCTETS\":\"1\",\"SAI_PORT_STAT_ETHER_IN_PKTS_2048_TO_4095_OCTETS\":\"2\",\"SAI_PORT_STAT_ETHER_IN_PKTS_256_TO_511_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_IN_PKTS_4096_TO_9216_OCTETS\":\"39\",\"SAI_PORT_STAT_ETHER_IN_PKTS_512_TO_1023_OCTETS\":\"1\",\"SAI_PORT_STAT_ETHER_IN_PKTS_64_OCTETS\":\"8\",\"SAI_PORT_STAT_ETHER_IN_PKTS_65_TO_127_OCTETS\":\"752\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_1024_TO_1518_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_128_TO_255_OCTETS\":\"671\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_1519_TO_2047_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_2048_TO_4095_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_256_TO_511_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_4096_TO_9216_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_512_TO_1023_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_64_OCTETS\":\"1\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_65_TO_127_OCTETS\":\"4\",\"SAI_PORT_STAT_ETHER_RX_OVERSIZE_PKTS\":\"0\",\"SAI_PORT_STAT_ETHER_STATS_FRAGMENTS\":\"0\",\"SAI_PORT_STAT_ETHER_STATS_JABBERS\":\"0\",\"SAI_PORT_STAT_ETHER_STATS_TX_NO_ERRORS\":\"676\",\"SAI_PORT_STAT_ETHER_STATS_UNDERSIZE_PKTS\":\"0\",\"SAI_PORT_STAT_ETHER_TX_OVERSIZE_PKTS\":\"0\",\"SAI_PORT_STAT_IF_IN_BROADCAST_PKTS\":\"1\",\"SAI_PORT_STAT_IF_IN_DISCARDS\":\"0\",\"SAI_PORT_STAT_IF_IN_ERRORS\":\"0\",\"SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES\":\"0\",\"SAI_PORT_STAT_IF_IN_FEC_CORRECTED_BITS\":\"0\",\"SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES\":\"0\",\"SAI_PORT_STAT_IF_IN_FEC_SYMBOL_ERRORS\":\"0\",\"SAI_PORT_STAT_IF_IN_MULTICAST_PKTS\":\"679\",\"SAI_PORT_STAT_IF_IN_NON_UCAST_PKTS\":\"680\",\"SAI_PORT_STAT_IF_IN_OCTETS\":\"705503\",\"SAI_PORT_STAT_IF_IN_UCAST_PKTS\":\"822\",\"SAI_PORT_STAT_IF_IN_UNKNOWN_PROTOS\":\"0\",\"SAI_PORT_STAT_IF_OUT_BROADCAST_PKTS\":\"0\",\"SAI_PORT_STAT_IF_OUT_DISCARDS\":\"0\",\"SAI_PORT_STAT_IF_OUT_ERRORS\":\"0\",\"SAI_PORT_STAT_IF_OUT_MULTICAST_PKTS\":\"676\",\"SAI_PORT_STAT_IF_OUT_NON_UCAST_PKTS\":\"676\",\"SAI_PORT_STAT_IF_OUT_OCTETS\":\"125365\",\"SAI_PORT_STAT_IF_OUT_QLEN\":\"0\",\"SAI_PORT_STAT_IF_OUT_UCAST_PKTS\":\"0\",\"SAI_PORT_STAT_IN_DROPPED_PKTS\":\"0\",\"SAI_PORT_STAT_OUT_DROPPED_PKTS\":\"0\",\"SAI_PORT_STAT_PAUSE_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PAUSE_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_0_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_0_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_1_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_1_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_2_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_2_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_3_RX_PAUSE_DURATION_US\":\"0\",\"SAI_PORT_STAT_PFC_3_RX_PAUSE_DURATION_US_last\":\"0\",\"SAI_PORT_STAT_PFC_3_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_3_RX_PKTS_last\":\"0\",\"SAI_PORT_STAT_PFC_3_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_4_RX_PAUSE_DURATION_US\":\"0\",\"SAI_PORT_STAT_PFC_4_RX_PAUSE_DURATION_US_last\":\"0\",\"SAI_PORT_STAT_PFC_4_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_4_RX_PKTS_last\":\"0\",\"SAI_PORT_STAT_PFC_4_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_5_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_5_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_6_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_6_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_7_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_7_TX_PKTS\":\"0\"}"}}}
I0401 00:33:33.766849     373 client.go:182] sync_response:true
[
{
  "COUNTERS_DB": {
    "COUNTERS": {
      "Ethernet4": "{\"PFC_WD_time_stamp\":\"9949737576589\",\"PFC_WD_time_stamp_last\":\"9949737576589\",\"PORT_BUFFER_DROP_STAT_time_stamp\":\"9902099938960\",\"PORT_STAT_COUNTER_time_stamp\":\"9949650941593\",\"SAI_PORT_STAT_ETHER_IN_PKTS_1024_TO_1518_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_IN_PKTS_128_TO_255_OCTETS\":\"670\",\"SAI_PORT_STAT_ETHER_IN_PKTS_1519_TO_2047_OCTETS\":\"1\",\"SAI_PORT_STAT_ETHER_IN_PKTS_2048_TO_4095_OCTETS\":\"2\",\"SAI_PORT_STAT_ETHER_IN_PKTS_256_TO_511_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_IN_PKTS_4096_TO_9216_OCTETS\":\"39\",\"SAI_PORT_STAT_ETHER_IN_PKTS_512_TO_1023_OCTETS\":\"1\",\"SAI_PORT_STAT_ETHER_IN_PKTS_64_OCTETS\":\"8\",\"SAI_PORT_STAT_ETHER_IN_PKTS_65_TO_127_OCTETS\":\"752\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_1024_TO_1518_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_128_TO_255_OCTETS\":\"671\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_1519_TO_2047_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_2048_TO_4095_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_256_TO_511_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_4096_TO_9216_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_512_TO_1023_OCTETS\":\"0\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_64_OCTETS\":\"1\",\"SAI_PORT_STAT_ETHER_OUT_PKTS_65_TO_127_OCTETS\":\"4\",\"SAI_PORT_STAT_ETHER_RX_OVERSIZE_PKTS\":\"0\",\"SAI_PORT_STAT_ETHER_STATS_FRAGMENTS\":\"0\",\"SAI_PORT_STAT_ETHER_STATS_JABBERS\":\"0\",\"SAI_PORT_STAT_ETHER_STATS_TX_NO_ERRORS\":\"676\",\"SAI_PORT_STAT_ETHER_STATS_UNDERSIZE_PKTS\":\"0\",\"SAI_PORT_STAT_ETHER_TX_OVERSIZE_PKTS\":\"0\",\"SAI_PORT_STAT_IF_IN_BROADCAST_PKTS\":\"1\",\"SAI_PORT_STAT_IF_IN_DISCARDS\":\"0\",\"SAI_PORT_STAT_IF_IN_ERRORS\":\"0\",\"SAI_PORT_STAT_IF_IN_FEC_CORRECTABLE_FRAMES\":\"0\",\"SAI_PORT_STAT_IF_IN_FEC_CORRECTED_BITS\":\"0\",\"SAI_PORT_STAT_IF_IN_FEC_NOT_CORRECTABLE_FRAMES\":\"0\",\"SAI_PORT_STAT_IF_IN_FEC_SYMBOL_ERRORS\":\"0\",\"SAI_PORT_STAT_IF_IN_MULTICAST_PKTS\":\"679\",\"SAI_PORT_STAT_IF_IN_NON_UCAST_PKTS\":\"680\",\"SAI_PORT_STAT_IF_IN_OCTETS\":\"705503\",\"SAI_PORT_STAT_IF_IN_UCAST_PKTS\":\"822\",\"SAI_PORT_STAT_IF_IN_UNKNOWN_PROTOS\":\"0\",\"SAI_PORT_STAT_IF_OUT_BROADCAST_PKTS\":\"0\",\"SAI_PORT_STAT_IF_OUT_DISCARDS\":\"0\",\"SAI_PORT_STAT_IF_OUT_ERRORS\":\"0\",\"SAI_PORT_STAT_IF_OUT_MULTICAST_PKTS\":\"676\",\"SAI_PORT_STAT_IF_OUT_NON_UCAST_PKTS\":\"676\",\"SAI_PORT_STAT_IF_OUT_OCTETS\":\"125365\",\"SAI_PORT_STAT_IF_OUT_QLEN\":\"0\",\"SAI_PORT_STAT_IF_OUT_UCAST_PKTS\":\"0\",\"SAI_PORT_STAT_IN_DROPPED_PKTS\":\"0\",\"SAI_PORT_STAT_OUT_DROPPED_PKTS\":\"0\",\"SAI_PORT_STAT_PAUSE_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PAUSE_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_0_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_0_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_1_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_1_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_2_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_2_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_3_RX_PAUSE_DURATION_US\":\"0\",\"SAI_PORT_STAT_PFC_3_RX_PAUSE_DURATION_US_last\":\"0\",\"SAI_PORT_STAT_PFC_3_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_3_RX_PKTS_last\":\"0\",\"SAI_PORT_STAT_PFC_3_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_4_RX_PAUSE_DURATION_US\":\"0\",\"SAI_PORT_STAT_PFC_4_RX_PAUSE_DURATION_US_last\":\"0\",\"SAI_PORT_STAT_PFC_4_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_4_RX_PKTS_last\":\"0\",\"SAI_PORT_STAT_PFC_4_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_5_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_5_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_6_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_6_TX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_7_RX_PKTS\":\"0\",\"SAI_PORT_STAT_PFC_7_TX_PKTS\":\"0\"}"
    }
  }
}^CI0401 00:33:36.218285     373 client.go:193] impl.Recv() received unknown error: rpc error: code = Canceled desc = context canceled
E0401 00:33:36.218908     373 gnmi_cli.go:249] sendQueryAndDisplay(ctx, {Addrs:[127.0.0.1:50051] AddressChains:[] Target:COUNTERS_DB Replica:0 UpdatesOnly:false Queries:[[COUNTERS Ethernet4]] Type:stream Timeout:30s NotificationHandler:<nil> ProtoHandler:<nil> Credentials:<nil> TLS:0xdff1c0 Extra:map[] SubReq:<nil> Streaming_type:SAMPLE Streaming_sample_int:0 Heartbeat_int:0 Suppress_redundant:false}, &{PollingInterval:30s StreamingDuration:0s Count:0 countExhausted:false Delimiter:/ Display:0x8f6240 DisplayPrefix: DisplayIndent:   DisplayType:group DisplayPeer:false Timestamp: DisplaySize:false Latency:false ClientTypes:[gnmi]}):
        rpc error: code = Canceled desc = context canceled

]

Output of show version, show techsupport

SONiC Software Version: SONiC.20241110.07
SONiC OS Version: 12
Distribution: Debian 12.9
Kernel: 6.1.0-22-2-amd64
Build commit: 8a18729450
Build date: Sat Mar 15 19:03:02 UTC 2025
Built by: azureuser@eafc873fc000000

Platform: x86_64-mlnx_msn4700-r0
HwSKU: Mellanox-SN4700-V64
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2417XZ01AM
Model Number: MSN4700-WS2ROS
Hardware Revision: A2
Uptime: 00:15:48 up 13 min,  1 user,  load average: 1.61, 1.54, 0.98
Date: Sat 29 Mar 2025 00:15:48

Attach files (if any)

No response

@bingwang-ms
Copy link
Contributor

MSADO: 31968783

@stephenxs
Copy link
Collaborator

Recently, we observed that the counter-polling interval could be inaccurate, impacting the functionalities that Lua implemented like PFC watchdog. So, we recorded the timestamp when the counters were polled so that the Lua plugin could determine the actual polling intervals by calculating the difference.
They are updated every polling interval and should not be updated more frequently than the usual counters, like SAI_PORT_STAT_PFC_4_RX_PKTS, or SAI_PORT_STAT_PFC_4_RX_PAUSE_DURATION_US which represents the received pause number or paused duration if it increases.

@prabhataravind prabhataravind added the Triaged this issue has been triaged label Apr 9, 2025
echuawu added a commit to echuawu/sonic-mgmt that referenced this issue Apr 14, 2025
…github issue sonic-net#17974 and #22201

Skip test_qos_dscp_mapping on dualtor due to github issue https://redmine.mellanox.com/issues/17974
Skip telemetry test_virtualdb_table_streaming case due to sonic-net/sonic-buildimage#22201
echuawu added a commit to echuawu/sonic-mgmt that referenced this issue Apr 14, 2025
…github issue sonic-net#17974 and #22201

Skip test_qos_dscp_mapping on dualtor due to github issue sonic-net#17974
Skip telemetry test_virtualdb_table_streaming case due to sonic-net/sonic-buildimage#22201

Change-Id: If461fcf40ce8d1e1a7a8f30e7193965646d6e6e7
echuawu added a commit to echuawu/sonic-mgmt that referenced this issue Apr 16, 2025
Skip telemetry test_virtualdb_table_streaming case due to sonic-net/sonic-buildimage#22201

Change-Id: I3b5092e148c83f6178df2da64d8ef1180fbbed22
echuawu added a commit to echuawu/sonic-mgmt that referenced this issue Apr 16, 2025
Skip telemetry test_virtualdb_table_streaming case due to sonic-net/sonic-buildimage#22201

Change-Id: I3b5092e148c83f6178df2da64d8ef1180fbbed22
echuawu added a commit to echuawu/sonic-mgmt that referenced this issue Apr 18, 2025
Skip telemetry test_virtualdb_table_streaming case due to sonic-net/sonic-buildimage#22201

Change-Id: I3b5092e148c83f6178df2da64d8ef1180fbbed22
kcudnik pushed a commit to sonic-net/sonic-sairedis that referenced this issue Apr 21, 2025
…uently update the counter table (#1567)

Move the timestamp out of the COUNTER table to avoid updating the table too frequently and overwhelming the telemetry system.
Fixes sonic-net/sonic-buildimage#22201

Signed-off-by: Stephen Sun <[email protected]>
mssonicbld added a commit to mssonicbld/sonic-sairedis.msft that referenced this issue Apr 28, 2025
…uently update the counter table

Move the timestamp out of the COUNTER table to avoid updating the table too frequently and overwhelming the telemetry system.
Fixes sonic-net/sonic-buildimage#22201
mssonicbld added a commit to Azure/sonic-sairedis.msft that referenced this issue Apr 28, 2025
…uently update the counter table (#51)

Move the timestamp out of the COUNTER table to avoid updating the table too frequently and overwhelming the telemetry system.
Fixes sonic-net/sonic-buildimage#22201
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment