Skip to content

Commit f380685

Browse files
rodnymolinalguohan
authored andcommitted
Routing-stack warm-reboot feature. (#602)
* Routing-stack warm-reboot feature. Please refer to the corresponding design document for more details: sonic-net/SONiC#239 The following manual UT plan has been successfully executed. UT automation pending. Physical topology formed by various BGP peers connecting to the DUT. Both non-ecmp and ecmp prefixes are learned/tested. Testcase Suite 1: Making use of “pkill -9 bgpd && pkill -9 zebra” as trigger. ============= IPv4 prefixes ========== * Restart zebra/bgpd: - Verify that all prefixes are properly stale-marked and that no change is pushed to AppDB during reconciliation. - Result: PASSED * Restart zebra/bgpd and add one new non-ecmp IPv4 prefix. - To produce a route-state-inconsistency, add prefix in adjacent node before bgp sessions are re-established. - Verify that new-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and withdraw one non-ecmp IPv4 prefix. - To produce a route-state-inconsistency, remove prefix in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and add one new path to an IPv4 ecmp-prefix. - To produce a route-state-inconsistency, add prefix-path in adjacent node before bgp sessions are re-established. - Verify that new prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and withdraw one ecmp-path from an existing ecmp IPv4 prefix. - To produce a route-state-inconsistency, remove prefix-path in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Eventually, during reconciliation, this path will be eliminated as it’s not being refreshed by remote-peer. - Result: PASSED IPv6 prefixes ========== * Restart zebra/bgpd: - Verify that all prefixes are properly stale-marked and that no change is pushed to AppDB during reconciliation. - Result: PASSED * Restart zebra/bgpd and add one new non-ecmp IPv6 prefix. - To produce a route-state-inconsistency, add prefix in adjacent node before bgp sessions are re-established. - Verify that new-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and withdraw one non-ecmp IPv6 prefix. - To produce a route-state-inconsistency, remove prefix in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and add one new path to an IPv6 ecmp-prefix. - To produce a route-state-inconsistency, add prefix-path in adjacent node before bgp sessions are re-established. - Verify that new prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart zebra/bgpd and withdraw one ecmp-path from an existing ecmp IPv6 prefix. - To produce a route-state-inconsistency, remove prefix-path in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Eventually, during reconciliation, this path will be eliminated as it’s not being refreshed by remote-peer. - Result: PASSED Testcase Suite 2: Making use of sudo service bgp restart” as trigger. ============= IPv4 prefixes ========== * Restart bgp service/docker: - Verify that all prefixes are properly stale-marked and that no change is pushed to AppDB during reconciliation. - Result: PASSED * Restart bgp service/docker and add one new non-ecmp IPv4 prefix. - To produce a route-state-inconsistency, add prefix in adjacent node before bgp sessions are re-established. - Verify that new-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and withdraw one non-ecmp IPv4 prefix. - To produce a route-state-inconsistency, remove prefix in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and add one new path to an IPv4 ecmp-prefix. - To produce a route-state-inconsistency, add prefix-path in adjacent node before bgp sessions are re-established. - Verify that new prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and withdraw one ecmp-path from an existing ecmp IPv4 prefix. - To produce a route-state-inconsistency, remove prefix-path in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Eventually, during reconciliation, this path will be eliminated as it’s not being refreshed by remote-peer. - Result: PASSED IPv6 prefixes ========== * Restart bgp service/docker: - Verify that all prefixes are properly stale-marked and that no change is pushed to AppDB during reconciliation. - Result: PASSED * Restart bgp service/docker and add one new non-ecmp IPv6 prefix. - To produce a route-state-inconsistency, add prefix in adjacent node before bgp sessions are re-established. - Verify that new-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and withdraw one non-ecmp IPv6 prefix. - To produce a route-state-inconsistency, remove prefix in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and add one new path to an IPv6 ecmp-prefix. - To produce a route-state-inconsistency, add prefix-path in adjacent node before bgp sessions are re-established. - Verify that new prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Result: PASSED * Restart bgp service/docker and withdraw one ecmp-path from an existing ecmp IPv6 prefix. - To produce a route-state-inconsistency, remove prefix-path in adjacent node before bgp sessions are re-established. - Verify that deleted-prefix-path msg is NOT pushed down to AppDB till reconciliation takes place. - Eventually, during reconciliation, this path will be eliminated as it’s not being refreshed by remote-peer. - Result: PASSED RB= G=lnos-reviewers R=pchaudhary,pmao,rmolina,samaity,sfardeen,zxu A= Signed-off-by: Rodny Molina <[email protected]> * Renaming 'restoration' code and making minor adjustments to fpmsyncd Signed-off-by: Rodny Molina <[email protected]> * Eliminating 'state' associated to field-value-tuples Code has been refactored to reduce the complexity associated to carrying 'state' for every received FV-tuple. Obviously, there's no free-lunch here, this is only possible at the expense of more memory utilization: we are now using two different buffers to store 'old' and 'new' state. Yet, this is a relatively-small price to pay for a much simpler implementation. Signed-off-by: Rodny Molina <[email protected]> * Adding UTs to cover routing-warm-reboot logic. Signed-off-by: Rodny Molina <[email protected]> * Fixing an issue with warm-reboot UTs that prevented an existing test-case from passing As part of these changes i'm also modifying the ip-addresses of my UT setup to avoid clashes with existing/previous testcases. Signed-off-by: Rodny Molina <[email protected]> * Making some small adjustments * Addressing review comments for my UT code * Addressing more review comments. * Refactoring UTs to rely on pubsub notifications instead of logs
1 parent 9fbcb60 commit f380685

File tree

8 files changed

+1290
-24
lines changed

8 files changed

+1290
-24
lines changed

fpmsyncd/Makefile.am

+2-3
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
INCLUDES = -I $(top_srcdir) -I $(FPM_PATH)
1+
INCLUDES = -I $(top_srcdir) -I $(top_srcdir)/warmrestart -I $(FPM_PATH)
22

33
bin_PROGRAMS = fpmsyncd
44

@@ -8,9 +8,8 @@ else
88
DBGFLAGS = -g
99
endif
1010

11-
fpmsyncd_SOURCES = fpmsyncd.cpp fpmlink.cpp routesync.cpp
11+
fpmsyncd_SOURCES = fpmsyncd.cpp fpmlink.cpp routesync.cpp $(top_srcdir)/warmrestart/warmRestartHelper.cpp $(top_srcdir)/warmrestart/warmRestartHelper.h
1212

1313
fpmsyncd_CFLAGS = $(DBGFLAGS) $(AM_CFLAGS) $(CFLAGS_COMMON)
1414
fpmsyncd_CPPFLAGS = $(DBGFLAGS) $(AM_CFLAGS) $(CFLAGS_COMMON)
1515
fpmsyncd_LDADD = -lnl-3 -lnl-route-3 -lswsscommon
16-

fpmsyncd/fpmsyncd.cpp

+66-5
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,24 @@
11
#include <iostream>
22
#include "logger.h"
33
#include "select.h"
4+
#include "selectabletimer.h"
45
#include "netdispatcher.h"
6+
#include "warmRestartHelper.h"
57
#include "fpmsyncd/fpmlink.h"
68
#include "fpmsyncd/routesync.h"
79

10+
811
using namespace std;
912
using namespace swss;
1013

14+
15+
/*
16+
* Default warm-restart timer interval for routing-stack app. To be used only if
17+
* no explicit value has been defined in configuration.
18+
*/
19+
const uint32_t DEFAULT_ROUTING_RESTART_INTERVAL = 120;
20+
21+
1122
int main(int argc, char **argv)
1223
{
1324
swss::Logger::linkToDbNative("fpmsyncd");
@@ -18,25 +29,75 @@ int main(int argc, char **argv)
1829
NetDispatcher::getInstance().registerMessageHandler(RTM_NEWROUTE, &sync);
1930
NetDispatcher::getInstance().registerMessageHandler(RTM_DELROUTE, &sync);
2031

21-
while (1)
32+
while (true)
2233
{
2334
try
2435
{
2536
FpmLink fpm;
2637
Select s;
38+
SelectableTimer warmStartTimer(timespec{0, 0});
2739

28-
cout << "Waiting for connection..." << endl;
40+
/*
41+
* Pipeline should be flushed right away to deal with state pending
42+
* from previous try/catch iterations.
43+
*/
44+
pipeline.flush();
45+
46+
cout << "Waiting for fpm-client connection..." << endl;
2947
fpm.accept();
3048
cout << "Connected!" << endl;
3149

3250
s.addSelectable(&fpm);
51+
52+
/* If warm-restart feature is enabled, execute 'restoration' logic */
53+
bool warmStartEnabled = sync.m_warmStartHelper.checkAndStart();
54+
if (warmStartEnabled)
55+
{
56+
/* Obtain warm-restart timer defined for routing application */
57+
uint32_t warmRestartIval = sync.m_warmStartHelper.getRestartTimer();
58+
if (!warmRestartIval)
59+
{
60+
warmStartTimer.setInterval(timespec{DEFAULT_ROUTING_RESTART_INTERVAL, 0});
61+
}
62+
else
63+
{
64+
warmStartTimer.setInterval(timespec{warmRestartIval, 0});
65+
}
66+
67+
/* Execute restoration instruction and kick off warm-restart timer */
68+
if (sync.m_warmStartHelper.runRestoration())
69+
{
70+
warmStartTimer.start();
71+
s.addSelectable(&warmStartTimer);
72+
}
73+
}
74+
3375
while (true)
3476
{
3577
Selectable *temps;
36-
/* Reading FPM messages forever (and calling "readData" to read them) */
78+
79+
/* Reading FPM messages forever (and calling "readMe" to read them) */
3780
s.select(&temps);
38-
pipeline.flush();
39-
SWSS_LOG_DEBUG("Pipeline flushed");
81+
82+
/*
83+
* Upon expiration of the warm-restart timer, proceed to run the
84+
* reconciliation process and remove warm-restart timer from
85+
* select() loop.
86+
*/
87+
if (warmStartEnabled && temps == &warmStartTimer)
88+
{
89+
SWSS_LOG_NOTICE("Warm-Restart timer expired.");
90+
sync.m_warmStartHelper.reconcile();
91+
s.removeSelectable(&warmStartTimer);
92+
93+
pipeline.flush();
94+
SWSS_LOG_DEBUG("Pipeline flushed");
95+
}
96+
else if (!warmStartEnabled || sync.m_warmStartHelper.isReconciled())
97+
{
98+
pipeline.flush();
99+
SWSS_LOG_DEBUG("Pipeline flushed");
100+
}
40101
}
41102
}
42103
catch (FpmLink::FpmConnectionClosedException &e)

fpmsyncd/routesync.cpp

+48-5
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,8 @@ using namespace std;
1414
using namespace swss;
1515

1616
RouteSync::RouteSync(RedisPipeline *pipeline) :
17-
m_routeTable(pipeline, APP_ROUTE_TABLE_NAME, true)
17+
m_routeTable(pipeline, APP_ROUTE_TABLE_NAME, true),
18+
m_warmStartHelper(pipeline, &m_routeTable, APP_ROUTE_TABLE_NAME, "bgp", "bgp")
1819
{
1920
m_nl_sock = nl_socket_alloc();
2021
nl_connect(m_nl_sock, NETLINK_ROUTE);
@@ -38,10 +39,31 @@ void RouteSync::onMsg(int nlmsg_type, struct nl_object *obj)
3839
return;
3940
}
4041

42+
/*
43+
* Upon arrival of a delete msg we could either push the change right away,
44+
* or we could opt to defer it if we are going through a warm-reboot cycle.
45+
*/
46+
bool warmRestartInProgress = m_warmStartHelper.inProgress();
47+
4148
if (nlmsg_type == RTM_DELROUTE)
4249
{
43-
m_routeTable.del(destipprefix);
44-
return;
50+
if (!warmRestartInProgress)
51+
{
52+
m_routeTable.del(destipprefix);
53+
return;
54+
}
55+
else
56+
{
57+
SWSS_LOG_INFO("Warm-Restart mode: Receiving delete msg: %s\n",
58+
destipprefix);
59+
60+
vector<FieldValueTuple> fvVector;
61+
const KeyOpFieldsValuesTuple kfv = std::make_tuple(destipprefix,
62+
DEL_COMMAND,
63+
fvVector);
64+
m_warmStartHelper.insertRefreshMap(kfv);
65+
return;
66+
}
4567
}
4668
else if (nlmsg_type != RTM_NEWROUTE)
4769
{
@@ -118,8 +140,29 @@ void RouteSync::onMsg(int nlmsg_type, struct nl_object *obj)
118140
vector<FieldValueTuple> fvVector;
119141
FieldValueTuple nh("nexthop", nexthops);
120142
FieldValueTuple idx("ifname", ifnames);
143+
121144
fvVector.push_back(nh);
122145
fvVector.push_back(idx);
123-
m_routeTable.set(destipprefix, fvVector);
124-
SWSS_LOG_DEBUG("RoutTable set: %s %s %s\n", destipprefix, nexthops.c_str(), ifnames.c_str());
146+
147+
if (!warmRestartInProgress)
148+
{
149+
m_routeTable.set(destipprefix, fvVector);
150+
SWSS_LOG_DEBUG("RouteTable set msg: %s %s %s\n",
151+
destipprefix, nexthops.c_str(), ifnames.c_str());
152+
}
153+
154+
/*
155+
* During routing-stack restarting scenarios route-updates will be temporarily
156+
* put on hold by warm-reboot logic.
157+
*/
158+
else
159+
{
160+
SWSS_LOG_INFO("Warm-Restart mode: RouteTable set msg: %s %s %s\n",
161+
destipprefix, nexthops.c_str(), ifnames.c_str());
162+
163+
const KeyOpFieldsValuesTuple kfv = std::make_tuple(destipprefix,
164+
SET_COMMAND,
165+
fvVector);
166+
m_warmStartHelper.insertRefreshMap(kfv);
167+
}
125168
}

fpmsyncd/routesync.h

+7-3
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,8 @@
44
#include "dbconnector.h"
55
#include "producerstatetable.h"
66
#include "netmsg.h"
7+
#include "warmRestartHelper.h"
8+
79

810
namespace swss {
911

@@ -16,10 +18,12 @@ class RouteSync : public NetMsg
1618

1719
virtual void onMsg(int nlmsg_type, struct nl_object *obj);
1820

21+
WarmStartHelper m_warmStartHelper;
22+
1923
private:
20-
ProducerStateTable m_routeTable;
21-
struct nl_cache *m_link_cache;
22-
struct nl_sock *m_nl_sock;
24+
ProducerStateTable m_routeTable;
25+
struct nl_cache *m_link_cache;
26+
struct nl_sock *m_nl_sock;
2327
};
2428

2529
}

tests/conftest.py

+47-7
Original file line numberDiff line numberDiff line change
@@ -290,6 +290,26 @@ def stop_swss(self):
290290
cmd += "supervisorctl stop {}; ".format(pname)
291291
self.runcmd(['sh', '-c', cmd])
292292

293+
def start_zebra(dvs):
294+
dvs.runcmd(['sh', '-c', 'supervisorctl start zebra'])
295+
296+
# Let's give zebra a chance to connect to FPM.
297+
time.sleep(5)
298+
299+
def stop_zebra(dvs):
300+
dvs.runcmd(['sh', '-c', 'pkill -x zebra'])
301+
time.sleep(1)
302+
303+
def start_fpmsyncd(dvs):
304+
dvs.runcmd(['sh', '-c', 'supervisorctl start fpmsyncd'])
305+
306+
# Let's give fpmsyncd a chance to connect to Zebra.
307+
time.sleep(5)
308+
309+
def stop_fpmsyncd(dvs):
310+
dvs.runcmd(['sh', '-c', 'pkill -x fpmsyncd'])
311+
time.sleep(1)
312+
293313
def init_asicdb_validator(self):
294314
self.asicdb = AsicDbValidator(self)
295315

@@ -334,13 +354,18 @@ def get_logs(self, modname=None):
334354
raise RuntimeError("Failed to unpack the archive.")
335355
os.system("chmod a+r -R log")
336356

337-
def add_log_marker(self):
357+
def add_log_marker(self, file=None):
338358
marker = "=== start marker {} ===".format(datetime.now().isoformat())
339-
self.ctn.exec_run("logger {}".format(marker))
359+
360+
if file:
361+
self.runcmd(['sh', '-c', "echo \"{}\" >> {}".format(marker, file)])
362+
else:
363+
self.ctn.exec_run("logger {}".format(marker))
364+
340365
return marker
341366

342367
def SubscribeAppDbObject(self, objpfx):
343-
r = redis.Redis(unix_socket_path=self.redis_sock, db=swsscommon.APP_DB)
368+
r = redis.Redis(unix_socket_path=self.redis_sock, db=swsscommon.APPL_DB)
344369
pubsub = r.pubsub()
345370
pubsub.psubscribe("__keyspace@0__:%s*" % objpfx)
346371
return pubsub
@@ -375,26 +400,40 @@ def CountSubscribedObjects(self, pubsub, ignore=None, timeout=10):
375400
return (nadd, ndel)
376401

377402
def GetSubscribedAppDbObjects(self, pubsub, ignore=None, timeout=10):
378-
r = redis.Redis(unix_socket_path=self.redis_sock, db=swsscommon.APP_DB)
403+
r = redis.Redis(unix_socket_path=self.redis_sock, db=swsscommon.APPL_DB)
379404

380405
addobjs = []
381406
delobjs = []
382407
idle = 0
408+
prev_key = None
383409

384410
while True and idle < timeout:
385411
message = pubsub.get_message()
386412
if message:
387413
print message
388414
key = message['channel'].split(':', 1)[1]
415+
# In producer/consumer_state_table scenarios, every entry will
416+
# show up twice for every push/pop operation, so skip the second
417+
# one to avoid double counting.
418+
if key != None and key == prev_key:
419+
continue
420+
# Skip instructions with meaningless keys. To be extended in the
421+
# future to other undesired keys.
422+
if key == "ROUTE_TABLE_KEY_SET" or key == "ROUTE_TABLE_DEL_SET":
423+
continue
389424
if ignore:
390425
fds = message['channel'].split(':')
391426
if fds[2] in ignore:
392427
continue
428+
393429
if message['data'] == 'hset':
430+
(_, k) = key.split(':', 1)
394431
value=r.hgetall(key)
395-
addobjs.append({'key':k, 'vals':value})
432+
addobjs.append({'key':json.dumps(k), 'vals':json.dumps(value)})
433+
prev_key = key
396434
elif message['data'] == 'del':
397-
delobjs.append(key)
435+
(_, k) = key.split(':', 1)
436+
delobjs.append({'key':json.dumps(k)})
398437
idle = 0
399438
else:
400439
time.sleep(1)
@@ -424,7 +463,8 @@ def GetSubscribedAsicDbObjects(self, pubsub, ignore=None, timeout=10):
424463
(_, t, k) = key.split(':', 2)
425464
addobjs.append({'type':t, 'key':k, 'vals':value})
426465
elif message['data'] == 'del':
427-
delobjs.append(key)
466+
(_, t, k) = key.split(':', 2)
467+
delobjs.append({'key':k})
428468
idle = 0
429469
else:
430470
time.sleep(1)

0 commit comments

Comments
 (0)