| 0.2 | 07/01/2020 | Bandaru Viswanath | Major update to accommodate enhancements to use new TAM infrastructure, DB schemas and UI |
85
+
| 0.3 | 06/11/2021 | Bandaru Viswanath | Introduce the Local Mode |
86
+
85
87
86
88
## About This Manual
87
89
@@ -106,13 +108,25 @@ This document describes the high level design of Drop Monitor feature in SONiC.
106
108
107
109
# 1 Feature Overview
108
110
109
-
The Drop Monitor feature in SONiC allows the user to setup packet-drop monitoring sessions for specific flows. A Collector, identified by an IP address and associated transport parameters, can be configured on the switch to send packet drop-reports.
111
+
The Drop Monitor feature in SONiC allows the user to set up packet-drop monitoring sessions for specific flows. A Collector, identified by an IP address and associated transport parameters, can be configured on the switch as the destination for packet drop-reports. This mode, in which reports are sent to an external collector, is termed `external` mode.
112
+
113
+
Additionally, to enable quick and targeted packet-drop debugging, the Drop Monitor feature supports reporting information locally about dropped flows without requiring an external Collector. This mode is termed `local` mode.
114
+
115
+
The two modes, *local* and *external*, are mutually exclusive. That is, when an external collector is configured, information on dropped flows is unavailable locally on the Switch. Likewise, when used in *local* mode, drop reports are not sent to any external collector.
116
+
117
+
The *external* mode is the default mode.
118
+
119
+
The *local* mode is meant for debugging purposes only and is limited in terms of scale (number of flows that can be monitored). It is not intended as a replacement for true drop monitoring with an external Collector.
110
120
111
121
## 1.1 Requirements
112
122
113
123
### 1.1.1 Functional Requirements
114
124
115
-
1.0 Drop Monitor feature allows user to configure a Drop Monitor session on a given switch and send the drop-reports to a specified collector. Drop Monitor session is defined by flow classifiers that are used to identify a flow that needs to be monitored for packet drops.
125
+
1.0 The Drop Monitor feature allows the user to configure a Drop Monitor session on a given switch. A Drop Monitor session is defined by flow classifiers that identify a flow to be monitored for packet drops.
126
+
127
+
1.1 Drop Monitor supports *external* mode, where it can send the drop-reports to a specified collector.
128
+
129
+
1.2 Drop Monitor supports *local* mode, where it can provide information on dropped flows on the Switch.
116
130
117
131
2.0 Drop Monitor provisioning as listed below.
118
132
@@ -124,7 +138,9 @@ The Drop Monitor feature in SONiC allows the user to setup packet-drop monitorin
124
138
125
139
2.4 TAM collector configuration that can be attached to Drop Monitor session to send drop reports.
126
140
127
-
2.5 An aging-interval configuration. If the Drop Monitor feature doesn't notice packet drops for this duration, it considers packet drops to have stopped.
141
+
2.5 The *local* mode is facilitated with a built-in collector named "local". This collector provides flow information locally on the Switch.
142
+
143
+
2.6 An aging-interval configuration. If the Drop Monitor feature doesn't notice packet drops for this duration, it considers packet drops to have stopped.
128
144
129
145
3.0 When the first packet of the flow is dropped by the switch, a "Drop-start" report is sent to the collector. This report contains the event type (Drop-start), first 128 bytes of the packet dropped, flow details and the drop reasons for the packet drop.
130
146
@@ -149,11 +165,13 @@ The TAM Drop Monitor feature supports the new management framework and KLISH CLI
149
165
- To activate / de-activate the feature
150
166
- To create/clear appropriate Drop Monitor configuration on a per-flow-group basis and switch-wide.
151
167
- To display current status and statistics for the Drop Monitor on a per flow-group basis.
168
+
- To display packet drop information on a per-flow basis, when the Drop Monitor feature is used in *local* mode.
152
169
153
170
### 1.1.3 Scalability Requirements
154
171
155
172
- Number of Drop Monitor sessions that can be supported is proportional to the availability of resources in hardware such as ACLs. No specific constraints are imposed.
156
173
- Only a single collector is supported.
174
+
- When used in *local* mode, not more than 100 flows may be monitored for packet drops.
157
175
158
176
## 1.2 Design Overview
159
177
@@ -225,6 +243,12 @@ The DropMonitorMgr runs in the TAM docker and is used to pass drop monitor confi
225
243
226
244
The DropMonitorMgr configures the source IP address to be used in drop reports to the system IP address. 9073 is configured as the source port number to be used in drop reports.
227
245
246
+
## 3.1.2 Local Mode
247
+
248
+
When Drop Monitor is set to *local* mode, a DropMonitorCollector thread runs as part of the DropMonitorMgr daemon. SAI is set up to send the drop reports locally, to a socket listening on a UDP port. The DropMonitorCollector thread receives the drop reports, decodes them and loads the relevant information into the TAM_DROPMONITOR_FLOW_STATUS_TABLE table in the COUNTERS_DB.
249
+
250
+
A specific CPU queue is configured to receive the drop-reports from the hardware. This queue is rate-limited to 500pps to prevent flooding of the CPU. For local debugging purposes, no more than 100 flows are expected to be monitored. Given that drop reports are stateful (not all drops are reported by hardware), the 500pps limit is more than sufficient.
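
The receive path described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the source does not give the UDP port number or the report encoding, so `LISTEN_ADDR` and the JSON payload with a `flow-id` field are hypothetical stand-ins, and a plain dictionary stands in for TAM_DROPMONITOR_FLOW_STATUS_TABLE.

```python
import json
import socket
import threading

# Hypothetical: the design does not name the UDP port used for local reports.
LISTEN_ADDR = ("127.0.0.1", 50000)


class DropMonitorCollector(threading.Thread):
    """Receives drop reports on a local UDP socket and records them per flow."""

    def __init__(self, addr=LISTEN_ADDR):
        super().__init__(daemon=True)
        self.addr = addr
        self.flow_table = {}            # stand-in for the COUNTERS_DB table
        self.lock = threading.Lock()
        self.stop_event = threading.Event()

    def handle_datagram(self, data):
        """Decode one report; return True if it was stored, False if ignored."""
        try:
            report = json.loads(data)   # assumed encoding, real reports differ
            flow_id = report["flow-id"]
        except (ValueError, KeyError, TypeError):
            return False                # malformed datagrams are skipped
        with self.lock:
            self.flow_table[flow_id] = report
        return True

    def run(self):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.bind(self.addr)
            sock.settimeout(0.2)        # wake up regularly to check stop_event
            while not self.stop_event.is_set():
                try:
                    data, _ = sock.recvfrom(2048)
                except socket.timeout:
                    continue
                self.handle_datagram(data)
```

Keeping the decode step in `handle_datagram` separates it from the socket loop, so the report handling can be exercised without opening a socket.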
251
+
228
252
## 3.2 DB Changes
229
253
230
254
### 3.2.1 CONFIG DB
@@ -236,6 +260,7 @@ TAM\_DROPMONITOR\_TABLE
236
260
key = global ; Only one instance and
237
261
; has a fixed key "global".
238
262
aging-interval = 1 * 5DIGIT ; Aging interval in seconds
263
+
mode = 1 * 255VCHAR ; "external" or "local"
239
264
240
265
Example:
241
266
> keys *TAM_DROPMONITOR_TABLE*
@@ -245,6 +270,8 @@ TAM\_DROPMONITOR\_TABLE
245
270
246
271
1) "aging-interval"
247
272
2) 3600
273
+
3) "mode"
274
+
4) "external"
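
As an illustration of the schema above, the following Python sketch checks the fields of the `global` entry. The function name and the choice to treat `aging-interval` as optional are assumptions made for the example; only the field names, the 1 to 5 digit limit, the two mode strings and the `external` default come from this document.

```python
import re

VALID_MODES = {"external", "local"}
DEFAULT_MODE = "external"   # the document states external is the default mode


def validate_dropmonitor_global(fields):
    """Return a normalized copy of the 'global' entry or raise ValueError.

    `fields` is a plain dict of the hash fields read from CONFIG DB.
    """
    result = {"mode": fields.get("mode", DEFAULT_MODE)}
    if result["mode"] not in VALID_MODES:
        raise ValueError("mode must be 'external' or 'local'")

    if "aging-interval" in fields:
        aging = str(fields["aging-interval"])
        # Schema: aging-interval = 1 * 5DIGIT (seconds)
        if not re.fullmatch(r"[0-9]{1,5}", aging):
            raise ValueError("aging-interval must be 1 to 5 digits")
        result["aging-interval"] = int(aging)
    return result
```

For the example entry shown above, `validate_dropmonitor_global({"aging-interval": "3600", "mode": "external"})` yields the same two fields with the interval converted to an integer.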
248
275
249
276
TAM\_DROPMONITOR\_SESSIONS\_TABLE
250
277
@@ -354,7 +381,20 @@ N/A
354
381
355
382
### 3.2.5 COUNTER DB
356
383
357
-
N/A
384
+
TAM\_DROPMONITOR\_FLOW\_STATUS\_TABLE
385
+
386
+
;Defines TAM drop monitor flow status.
387
+
388
+
key = flow-id ; Flow Id, a unique integer
389
+
src-ip = ipv4_address ; SRC IP of the flow 5-tuple
390
+
src-port = 1 * 5DIGIT ; SRC L4 port number of the flow 5-tuple
391
+
dst-ip = ipv4_address ; DST IP of the flow 5-tuple
392
+
dst-port = 1 * 5DIGIT ; DST L4 port number of the flow 5-tuple
393
+
protocol = 1 * 4DIGIT ; Protocol number of the flow 5-tuple
394
+
state = 1*255VCHAR ; drop state for the flow
395
+
; can be one of "dropping" or "inactive"
396
+
timestamp = 1*255VCHAR ; time at which the drops were detected
397
+
drop-reason = 1*255VCHAR ; Reason for packet drop
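
To make the table layout concrete, here is a small Python sketch of one entry and how it could map onto a hash in COUNTERS_DB. It is not the actual implementation: the `FlowStatus` class and the `:` key separator are assumptions for illustration, while the field names and the two state strings are taken from the schema above.

```python
import ipaddress
from dataclasses import asdict, dataclass

TABLE = "TAM_DROPMONITOR_FLOW_STATUS_TABLE"
KEY_SEPARATOR = ":"                 # assumption: separator between table and key
VALID_STATES = {"dropping", "inactive"}


@dataclass
class FlowStatus:
    flow_id: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int
    state: str
    timestamp: str
    drop_reason: str

    def __post_init__(self):
        # The schema restricts addresses to IPv4 and state to two strings.
        ipaddress.IPv4Address(self.src_ip)
        ipaddress.IPv4Address(self.dst_ip)
        if self.state not in VALID_STATES:
            raise ValueError(f"unknown state: {self.state}")

    def key(self):
        """Database key for this flow, e.g. '<table>:<flow-id>'."""
        return f"{TABLE}{KEY_SEPARATOR}{self.flow_id}"

    def fields(self):
        """Hash fields named as in the schema (hyphenated, string values)."""
        values = asdict(self)
        values.pop("flow_id")       # the flow id is the key, not a field
        return {name.replace("_", "-"): str(value) for name, value in values.items()}
```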
358
398
359
399
360
400
## 3.3 Switch State Service Design
@@ -425,8 +465,8 @@ A Drop Monitoring session associated a previously defined flow-group as describe
425
465
- The Drop Monitor session must have a unique name for referencing.
426
466
- The flow-group must be previously created with the `flow-group` command (under `config-tam` hierarchy). For drop-monitoring, the flow-group must be associated with an interface.
427
467
- The sampling-rate can be set, by referencing a previously created sampler, created with the `sampler` command (under `config-tam` hierarchy).
428
-
- A collector must be associated with the session, where the drop-reports will be sent. The collector must be previously created with the `collector` command (under `config-tam` hierarchy)..
429
-
468
+
- A collector must be associated with the session, where the drop-reports will be sent. The collector must be previously created with the `collector` command (under `config-tam` hierarchy). When Drop Monitor is set up in `local` mode, the collector parameter is optional and is ignored.
469
+
430
470
When a previously created session is removed (with the `no` command), the associated flows are no longer monitored for drops by the switch.
431
471
432
472
The following attributes are supported for drop-monitor sessions.
@@ -435,7 +475,7 @@ The following attribtes are supported for drop-monitor sessions.
The `mode` command changes the Drop Monitoring mode. By default, the `external` mode is used. This command can be used to change the mode to `local` and back. No sessions may be active at the time of a mode switch.
493
+
494
+
The command syntax for setting the Drop Monitoring mode is as follows:
495
+
496
+
```
497
+
sonic (config-tam-dm)# [no] mode { external | local }
|`mode`| One of the two strings `external` and `local`, representing the monitoring mode. Default value is `external`.|
502
+
503
+
The `no` form of the command reverts the mode to the default, i.e., `external` mode.
504
+
505
+
#### 3.6.2.5 Clearing dropped flows (Local Mode)
506
+
507
+
This command clears all flows that are currently tracked as dropped flows by the Drop Monitor while in Local Mode. It removes the associated information from the TAM_DROPMONITOR_FLOW_STATUS_TABLE. If a flow experiences drops again, it will be reported again.
508
+
509
+
The command syntax for clearing the Drop Monitor tracked dropped-flows is as follows:
510
+
511
+
```
512
+
sonic# clear tam drop-monitor flows
513
+
514
+
```
515
+
450
516
### 3.6.3 Show Commands
451
517
452
518
#### 3.6.3.1 Listing the Drop Monitor attributes
@@ -464,12 +530,13 @@ sonic # show tam drop-monitor
464
530
Status : Active
465
531
Switch ID : 2020
466
532
Aging Interval : 60
533
+
Mode : external
467
534
468
535
```
469
536
470
-
#### 3.6.3.1 Listing the Drop Monitor sessions
537
+
#### 3.6.3.2 Listing the Drop Monitor sessions
471
538
472
-
The following command lists the details for all drop-monitor sessions or for a specific session. Note that only explicitly configured tuples in the associated flow-group are displayed.
539
+
The following command lists the details for all drop-monitor sessions or for a specific session. Note that only explicitly configured tuples in the associated flow-group are displayed. When configured in `local` mode, the *Collector* name is shown as *local*.
473
540
474
541
```
475
542
sonic # show tam drop-monitor sessions [<name>]
@@ -502,6 +569,30 @@ Packet Count : 7656
502
569
503
570
```
504
571
572
+
#### 3.6.3.3 Listing the dropped flows (Local mode)
573
+
574
+
The following command lists the details for all flows that are dropped by the Switch. The details include the 5-tuple of the flow, time-stamp of the first detected drop and the drop-reason.
575
+
576
+
The flows listed in this command output are tracked until they are no longer dropped (drop-stop event) or the user explicitly clears them via the `clear` command.
577
+
578
+
This command provides data only when Drop Monitor is configured in `local` mode. Otherwise, it returns an appropriate error.
This section provides a sample Drop Monitor workflow using CLI, for monitoring the packet drops as described below.
@@ -721,6 +812,8 @@ TBD
721
812
722
813
* Drop Monitor feature is an *advanced* feature that is not available in all the Broadcom SONiC packages.
723
814
815
+
* The Drop Monitor feature is a Broadcom SONiC-only feature. It will not be contributed to the community.
816
+
724
817
## Specific Limitations
725
818
726
819
Drop Monitor feature in SONiC inherits the limitations of the underlying firmware and the hardware. These are listed below.
@@ -729,6 +822,18 @@ Drop Monitor feature in SONiC inherits the limitations of the underlying firmwar
729
822
2. Drop Monitor flows must be IPv4 flows
730
823
3. Drop Monitor is supported on TD3-X7, TH2 and TH3 platforms only.
731
824
825
+
## Local Mode design notes
826
+
827
+
The 'Local' mode is meant for drop monitoring of a limited number of flows (<100 flows) on the Switch. With more flows, the number of reports may overwhelm the CPU. A specific CPU queue is assigned for this traffic and is rate-limited to 500pps to prevent CPU spikes.
828
+
829
+
A side effect of this rate-limiting is that some drop reports may get dropped.
830
+
831
+
1. If the drop-start reports are dropped, then the associated flows won't be reported (as dropped) in COUNTERS_DB.
832
+
2. If the drop-active reports are dropped, then the drop-reasons are not updated in COUNTERS_DB.
833
+
3. If the drop-stop reports are dropped, then the flows remain in the COUNTERS_DB until they are explicitly cleared via the clear command.
834
+
835
+
However, given that Local mode is used for limited debugging (fewer than 100 flows), the worst-case number of drop-reports hitting the CPU should always remain less than the rate-limit of 500pps.
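
The three side effects listed above follow from how the per-flow state is driven by the report types. The Python sketch below models that behaviour with a plain dictionary in place of COUNTERS_DB. The class and method names are hypothetical, and treating a drop-stop report as removal of the entry is an assumption based on the third point above.

```python
class LocalFlowTable:
    """Tracks dropped flows from drop-start / drop-active / drop-stop reports."""

    def __init__(self):
        self.flows = {}   # flow-id -> {"state": ..., "drop-reason": ...}

    def on_report(self, event, flow_id, reason=None):
        if event == "drop-start":
            # A flow only becomes visible once its drop-start report arrives.
            self.flows[flow_id] = {"state": "dropping", "drop-reason": reason}
        elif event == "drop-active":
            # Updates the reason for a known flow; never creates an entry.
            if flow_id in self.flows:
                self.flows[flow_id]["drop-reason"] = reason
        elif event == "drop-stop":
            # Removes the flow; if this report is lost the entry stays.
            self.flows.pop(flow_id, None)

    def clear(self):
        """Equivalent of the clear command: forget all tracked flows."""
        self.flows.clear()
```

Feeding the table a drop-active report for a flow whose drop-start was lost leaves the table unchanged, which corresponds to the first point: the flow is never reported as dropped.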