
Commit 07cd963

stephenxs authored and preetham-singh committed
[Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net#1910)
Signed-off-by: Stephen Sun [email protected]

What I did

Reclaim the reserved buffer of unused (admin-down) ports for both the dynamic and the traditional buffer models. This is done by:
- Removing lossless priority groups on unused ports.
- Applying zero buffer profiles on the buffer objects of unused ports.

In the dynamic buffer model, the zero profiles are loaded from a JSON file and applied to APPL_DB if there are admin-down ports. The default buffer configuration is applied to all ports, and the buffer manager then applies zero profiles on admin-down ports. In the static buffer model, the zero profiles are loaded by the buffer template.

Why I did it

How I verified it

Regression test and vs test.

Details if related

Static buffer model
- Remove the lossless buffer priority group if the port is admin-down and the buffer profile aligns with the speed and cable length of the port.

Dynamic buffer model

Handle zero buffer pools and profiles
- buffermgrd: add a CLI option to load the JSON file for zero profiles (done in PR [Reclaiming buffer] Common code update sonic-net#1996).
- Load them from the JSON file into the buffer manager's internal data structure (done in PR [Reclaiming buffer] Common code update sonic-net#1996).
- Apply them to APPL_DB once there is at least one admin-down port. Each zero profile's name is recorded in the pool object it references, so the zero profile lists can be constructed according to the normal profile lists, with one profile per pool on the ingress/egress side; the zero profiles are then applied to the buffer objects of the port (a minimal sketch follows after this list).
- Unload them from APPL_DB once all ports are admin-up, since the zero pools and profiles are no longer referenced.
- Remove the buffer pool counter ID when a zero pool is removed. Now that a pool can be removed from the system, the watermark counter of the pool is removed before the pool itself.
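The following is a minimal, illustrative Lua sketch of the mapping described above, not the actual buffermgrd code (which is C++). The lookup tables and all profile/pool names here are assumptions standing in for the data the buffer manager keeps after loading the zero-profile JSON file.

```lua
-- Illustrative sketch only: build a zero profile list from a normal profile list.
-- 'profile_to_pool' and 'pool_to_zero_profile' are hypothetical stand-ins for the
-- internal data loaded from the zero-profile JSON file.
local profile_to_pool = {
    ingress_lossless_profile = "ingress_lossless_pool",
    ingress_lossy_profile    = "ingress_lossy_pool",
}
local pool_to_zero_profile = {
    ingress_lossless_pool = "ingress_lossless_zero_profile",
    ingress_lossy_pool    = "ingress_lossy_zero_profile",
}

-- Replace each normal profile with the zero profile of the pool it references,
-- yielding one zero profile per pool on the ingress (or egress) side.
local function build_zero_profile_list(normal_profile_list)
    local zero_profiles = {}
    for profile in string.gmatch(normal_profile_list, "([^,]+)") do
        local pool = profile_to_pool[profile]
        if pool and pool_to_zero_profile[pool] then
            table.insert(zero_profiles, pool_to_zero_profile[pool])
        end
    end
    return table.concat(zero_profiles, ",")
end

print(build_zero_profile_list("ingress_lossless_profile,ingress_lossy_profile"))
-- prints: ingress_lossless_zero_profile,ingress_lossy_zero_profile
```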
Handle port admin status change
- There is existing logic that removes the buffer priority groups of admin-down ports. This logic is reused and extended to all buffer objects, including BUFFER_QUEUE, BUFFER_PORT_INGRESS_PROFILE_LIST, and BUFFER_PORT_EGRESS_PROFILE_LIST.
- When a port goes admin-down, the normal profiles are removed from the buffer objects of the port and the zero profiles, if provided, are applied to the port.
- When a port comes admin-up, the zero profiles, if applied, are removed from the port and the normal profiles are applied to the port.
- Ports orchagent exposes the number of queues and priority groups to STATE_DB. The buffer manager uses these values to apply zero profiles on all the priority groups and queues of admin-down ports. If it is not necessary to apply zero profiles on all priority groups or queues on a certain platform, ids_to_reclaim can be customized in the JSON file.

Handle all buffer tables, including BUFFER_PG, BUFFER_QUEUE, BUFFER_PORT_INGRESS_PROFILE_LIST, and BUFFER_PORT_EGRESS_PROFILE_LIST
- Originally, only the BUFFER_PG table was cached in the dynamic buffer manager. Now all tables are cached, in order to apply zero profiles when a port goes admin-down and restore the normal profiles when it comes back up.
- The key of such tables can reference a single port or a list of ports, like BUFFER_PG|Ethernet0|3-4 or BUFFER_PG|Ethernet0,Ethernet4,Ethernet8|3-4. Originally only the BUFFER_PG table handled such keys; this logic is reused and extended to all the tables (a parsing sketch follows at the end of this commit message).

[Mellanox] Plugin to calculate buffer pool size
- Originally, the buffers for queues, buffer profile lists, etc. were not reclaimed for admin-down ports, so they were reserved for all ports. Now they are reserved for admin-up ports only.

Accelerate applying buffer tables to APPL_DB (an optimization on top of reclaiming buffer)
- Don't apply buffer profiles or buffer objects to APPL_DB before the buffer pools are applied while the system is starting. This applies items in order from referenced items to referencing items and reduces buffer orchagent retries caused by not-yet-present referenced items. It is still possible that referencing items are handled before referenced items; in that case there should not be any error message.
- [Mellanox] Plugin to calculate buffer pool size: return the buffer pool sizes currently in APPL_DB if the pool sizes cannot be calculated due to missing information, which typically happens at system start. This accelerates pushing the tables to APPL_DB.
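As referenced under "Handle all buffer tables", a buffer table key may cover one port or several. The following is a minimal, illustrative Lua sketch (not the buffer manager's actual parsing code; the function name is hypothetical) that splits such a key into the table name, the ports it covers, and the priority/queue ID range:

```lua
-- Illustrative sketch only: parse a key such as "BUFFER_PG|Ethernet0,Ethernet4,Ethernet8|3-4"
-- into table name, port list, and ID range. Keys without an ID part (e.g. profile lists)
-- yield an empty range.
local function parse_buffer_key(key)
    local table_name, port_field, ids = string.match(key, "([^|]+)|([^|]+)|?(.*)")
    local ports = {}
    for port in string.gmatch(port_field, "([^,]+)") do
        table.insert(ports, port)
    end
    return table_name, ports, ids
end

local tbl, ports, ids = parse_buffer_key("BUFFER_PG|Ethernet0,Ethernet4,Ethernet8|3-4")
-- tbl == "BUFFER_PG", ports == { "Ethernet0", "Ethernet4", "Ethernet8" }, ids == "3-4"
```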
1 parent a28111c commit 07cd963

5 files changed: +1686 -238 lines

cfgmgr/buffer_pool_mellanox.lua (+122, -8)
@@ -28,11 +28,23 @@ local port_set_8lanes = {}
 local lossless_port_count = 0
 
 local function iterate_all_items(all_items, check_lossless)
+    -- Iterate all items in all_items, check the buffer profile each item references, and update the reference count accordingly
+    -- Arguments:
+    --     all_items is a list holding all keys in the BUFFER_PG or BUFFER_QUEUE table
+    --     format of keys: <port name>|<ID map>, like Ethernet0|3-4
+    -- Return:
+    --     0 successful
+    --     1 failure, typically because items just updated are still pending in orchagent's queue
     table.sort(all_items)
     local lossless_ports = {}
     local port
     local fvpairs
     for i = 1, #all_items, 1 do
+        -- An existing XXX_TABLE_KEY_SET or XXX_TABLE_DEL_SET means orchagent hasn't handled all updates
+        -- In this case, the pool sizes are not calculated for now and will be retried later
+        if string.sub(all_items[i], -4, -1) == "_SET" then
+            return 1
+        end
         -- Count the number of priorities or queues in each BUFFER_PG or BUFFER_QUEUE item
         -- For example, there are:
         -- 3 queues in 'BUFFER_QUEUE_TABLE:Ethernet0:0-2'
@@ -73,6 +85,83 @@ local function iterate_all_items(all_items, check_lossless)
     return 0
 end
 
+local function iterate_profile_list(all_items)
+    -- Iterate all items in all_items, check the buffer profiles each item references, and update the reference counts accordingly
+    -- Arguments:
+    --     all_items is a list holding all keys in the BUFFER_PORT_INGRESS_PROFILE_LIST or BUFFER_PORT_EGRESS_PROFILE_LIST table
+    --     format of keys: <port name>
+    -- Return:
+    --     0 successful
+    --     1 failure, typically because items just updated are still pending in orchagent's queue
+    local port
+    for i = 1, #all_items, 1 do
+        -- An existing XXX_TABLE_KEY_SET or XXX_TABLE_DEL_SET means orchagent hasn't handled all updates
+        -- In this case, the pool sizes are not calculated for now and will be retried later
+        if string.sub(all_items[i], -4, -1) == "_SET" then
+            return 1
+        end
+        port = string.match(all_items[i], "Ethernet%d+")
+        local profile_list = redis.call('HGET', all_items[i], 'profile_list')
+        if not profile_list then
+            return 0
+        end
+        for profile_name in string.gmatch(profile_list, "([^,]+)") do
+            -- The format of profile_list is profile_name,profile_name
+            -- We need to handle each profile in the list
+            -- The ingress_lossy_profile is shared by both BUFFER_PG|<port>|0 and BUFFER_PORT_INGRESS_PROFILE_LIST
+            -- It occupies buffers in BUFFER_PG but not in BUFFER_PORT_INGRESS_PROFILE_LIST
+            -- To distinguish both cases, a new name "ingress_lossy_profile_list" is introduced to indicate
+            -- the profile is used by the profile list, where its size should be zero
+            profile_name = 'BUFFER_PROFILE_TABLE:' .. profile_name
+            if profile_name == 'BUFFER_PROFILE_TABLE:ingress_lossy_profile' then
+                profile_name = profile_name .. '_list'
+                if profiles[profile_name] == nil then
+                    profiles[profile_name] = 0
+                end
+            end
+            local profile_ref_count = profiles[profile_name]
+            if profile_ref_count == nil then
+                return 1
+            end
+            profiles[profile_name] = profile_ref_count + 1
+        end
+    end
+
+    return 0
+end
+
+local function fetch_buffer_pool_size_from_appldb()
+    local buffer_pools = {}
+    redis.call('SELECT', config_db)
+    local buffer_pool_keys = redis.call('KEYS', 'BUFFER_POOL|*')
+    local pool_name
+    for i = 1, #buffer_pool_keys, 1 do
+        local size = redis.call('HGET', buffer_pool_keys[i], 'size')
+        if not size then
+            pool_name = string.match(buffer_pool_keys[i], "BUFFER_POOL|([^%s]+)$")
+            table.insert(buffer_pools, pool_name)
+        end
+    end
+
+    redis.call('SELECT', appl_db)
+    buffer_pool_keys = redis.call('KEYS', 'BUFFER_POOL_TABLE:*')
+    local size
+    local xoff
+    local output
+    for i = 1, #buffer_pools, 1 do
+        size = redis.call('HGET', 'BUFFER_POOL_TABLE:' .. buffer_pools[i], 'size')
+        if not size then
+            size = "0"
+        end
+        xoff = redis.call('HGET', 'BUFFER_POOL_TABLE:' .. buffer_pools[i], 'xoff')
+        if not xoff then
+            table.insert(result, buffer_pools[i] .. ':' .. size)
+        else
+            table.insert(result, buffer_pools[i] .. ':' .. size .. ':' .. xoff)
+        end
+    end
+end
+
 -- Connect to CONFIG_DB
 redis.call('SELECT', config_db)
 
@@ -82,7 +171,10 @@ total_port = #ports_table
 
 -- Initialize the port_set_8lanes set
 local lanes
-local number_of_lanes
+local number_of_lanes = 0
+local admin_status
+local admin_up_port = 0
+local admin_up_8lanes_port = 0
 local port
 for i = 1, total_port, 1 do
     -- Load lanes from PORT table
@@ -99,13 +191,26 @@ for i = 1, total_port, 1 do
             port_set_8lanes[port] = false
         end
     end
+    admin_status = redis.call('HGET', ports_table[i], 'admin_status')
+    if admin_status == 'up' then
+        admin_up_port = admin_up_port + 1
+        if (number_of_lanes == 8) then
+            admin_up_8lanes_port = admin_up_8lanes_port + 1
+        end
+    end
+    number_of_lanes = 0
 end
 
 local egress_lossless_pool_size = redis.call('HGET', 'BUFFER_POOL|egress_lossless_pool', 'size')
 
 -- Whether shared headroom pool is enabled?
 local default_lossless_param_keys = redis.call('KEYS', 'DEFAULT_LOSSLESS_BUFFER_PARAMETER*')
-local over_subscribe_ratio = tonumber(redis.call('HGET', default_lossless_param_keys[1], 'over_subscribe_ratio'))
+local over_subscribe_ratio
+if #default_lossless_param_keys > 0 then
+    over_subscribe_ratio = tonumber(redis.call('HGET', default_lossless_param_keys[1], 'over_subscribe_ratio'))
+else
+    over_subscribe_ratio = 0
+end
 
 -- Fetch the shared headroom pool size
 local shp_size = tonumber(redis.call('HGET', 'BUFFER_POOL|ingress_lossless_pool', 'xoff'))
@@ -161,7 +266,18 @@ local fail_count = 0
 fail_count = fail_count + iterate_all_items(all_pgs, true)
 fail_count = fail_count + iterate_all_items(all_tcs, false)
 if fail_count > 0 then
-    return {}
+    fetch_buffer_pool_size_from_appldb()
+    return result
+end
+
+local all_ingress_profile_lists = redis.call('KEYS', 'BUFFER_PORT_INGRESS_PROFILE_LIST*')
+local all_egress_profile_lists = redis.call('KEYS', 'BUFFER_PORT_EGRESS_PROFILE_LIST*')
+
+fail_count = fail_count + iterate_profile_list(all_ingress_profile_lists)
+fail_count = fail_count + iterate_profile_list(all_egress_profile_lists)
+if fail_count > 0 then
+    fetch_buffer_pool_size_from_appldb()
+    return result
 end
 
 local statistics = {}
@@ -177,9 +293,6 @@ for name in pairs(profiles) do
     if name == "BUFFER_PROFILE_TABLE:ingress_lossy_profile" then
         size = size + lossypg_reserved
     end
-    if name == "BUFFER_PROFILE_TABLE:egress_lossy_profile" then
-        profiles[name] = total_port
-    end
     if size ~= 0 then
         if shp_enabled and shp_size == 0 then
             local xon = tonumber(redis.call('HGET', name, 'xon'))
@@ -211,11 +324,11 @@ if shp_enabled then
 end
 
 -- Accumulate sizes for management PGs
-local accumulative_management_pg = (total_port - port_count_8lanes) * lossypg_reserved + port_count_8lanes * lossypg_reserved_8lanes
+local accumulative_management_pg = (admin_up_port - admin_up_8lanes_port) * lossypg_reserved + admin_up_8lanes_port * lossypg_reserved_8lanes
 accumulative_occupied_buffer = accumulative_occupied_buffer + accumulative_management_pg
 
 -- Accumulate sizes for egress mirror and management pool
-local accumulative_egress_mirror_overhead = total_port * egress_mirror_headroom
+local accumulative_egress_mirror_overhead = admin_up_port * egress_mirror_headroom
 accumulative_occupied_buffer = accumulative_occupied_buffer + accumulative_egress_mirror_overhead + mgmt_pool_size
 
 -- Switch to CONFIG_DB
@@ -295,5 +408,6 @@ table.insert(result, "debug:egress_mirror:" .. accumulative_egress_mirror_overhe
 table.insert(result, "debug:shp_enabled:" .. tostring(shp_enabled))
 table.insert(result, "debug:shp_size:" .. shp_size)
 table.insert(result, "debug:total port:" .. total_port .. " ports with 8 lanes:" .. port_count_8lanes)
+table.insert(result, "debug:admin up port:" .. admin_up_port .. " admin up ports with 8 lanes:" .. admin_up_8lanes_port)
 
 return result
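For reference, the plugin above returns each pool as a string of the form <pool name>:<size> or <pool name>:<size>:<xoff>, followed by "debug:..." entries. Below is a minimal, illustrative Lua snippet for splitting a non-debug entry back into its fields; the real consumer is buffermgrd (C++), and the function name and numbers here are examples only.

```lua
-- Illustrative sketch only: split "<pool>:<size>" or "<pool>:<size>:<xoff>" entries
-- produced by the buffer pool calculation plugin; "debug:..." entries should be skipped.
local function parse_pool_entry(entry)
    local pool, size, xoff = string.match(entry, "([^:]+):([^:]+):?([^:]*)")
    if xoff == "" then
        return pool, tonumber(size)
    end
    return pool, tonumber(size), tonumber(xoff)
end

-- Example values only
local pool, size, xoff = parse_pool_entry("ingress_lossless_pool:8000000:1024000")
-- pool == "ingress_lossless_pool", size == 8000000, xoff == 1024000
```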

cfgmgr/buffermgrd.cpp (+4)
@@ -12,6 +12,7 @@
 #include <iostream>
 #include "json.h"
 #include "json.hpp"
+#include "warm_restart.h"
 
 using namespace std;
 using namespace swss;
@@ -185,6 +186,9 @@ int main(int argc, char **argv)
 
     if (dynamicMode)
     {
+        WarmStart::initialize("buffermgrd", "swss");
+        WarmStart::checkWarmStart("buffermgrd", "swss");
+
         vector<TableConnector> buffer_table_connectors = {
             TableConnector(&cfgDb, CFG_PORT_TABLE_NAME),
             TableConnector(&cfgDb, CFG_PORT_CABLE_LEN_TABLE_NAME),
