Commit 9766c8e
mprabhu-nokia authored and Carl Keene committed
[systemd] ASIC status based service bringup on VOQ chassis (sonic-net#7477)
Start per-ASIC services such as swss and syncd only after the platform vendor code has detected the ASIC and published a notification. The desired systemd service ordering is database->database@->pmon->swss@->syncd@->teamd@->lldp@. The management, telemetry, and snmp dockers must also be able to start even if not all ASIC services are up.

Why I did it
On a VOQ chassis, each fabric card has 1 to N ASICs, and there can be multiple removable fabric cards. On the supervisor, the swss and syncd containers need to be started only when the fabric card is in the Online state and its ASICs have been detected by the kernel. With systemd, the dependent services can stay in the inactive state.

How I did it
Introduce a mechanism in which every ASIC-dependent service waits for its ASIC state to be published via PMON to Redis. Once the subscription notification is received, the service proceeds to create its docker. For fixed platforms, systemd behavior is unchanged, i.e. service bringup and docker creation happen in the start()/ExecStartPre routine of the .sh scripts. For the VOQ chassis platform on the supervisor, service bringup skips docker creation in the start() routine and does it in the wait()/ExecStart routine of the .sh scripts instead. Management dockers are decoupled from ASIC docker creation.
1 parent ba0e360 commit 9766c8e
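
The start()/wait() split described in the commit message can be modeled as a small decision function. This is only an illustrative sketch: `bringup_steps` and the step labels are hypothetical names, not part of the commit, and the real scripts run shell commands rather than returning strings.

```python
def bringup_steps(is_supervisor, asic_online):
    """Return the ordered actions a per-ASIC service script would take.

    Hypothetical model of the start()/wait() split; the step labels are
    illustrative, not real commands.
    """
    steps = []

    # start() runs from ExecStartPre: only non-supervisor (e.g. fixed)
    # platforms create the docker here.
    if not is_supervisor:
        steps.append("start: create docker")

    # wait() runs from ExecStart: the supervisor first blocks on the ASIC
    # status and creates the docker only if the ASIC came online.
    if is_supervisor:
        steps.append("wait: block until ASIC status is published")
        if asic_online:
            steps.append("wait: create docker")

    # In both cases the script finally waits on the service docker.
    steps.append("wait: wait on docker")
    return steps
```

On a fixed platform the second argument is irrelevant, because docker creation never depends on the published ASIC state there.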

8 files changed: +233 -13 lines changed

files/build_templates/per_namespace/lldp.service.j2

Lines changed: 3 additions & 3 deletions
@@ -16,9 +16,9 @@ StartLimitBurst=3
 
 [Service]
 User={{ sonicadmin_user }}
-ExecStartPre=/usr/bin/{{docker_container_name}}.sh start{% if multi_instance == 'true' %} %i{% endif %}
-ExecStart=/usr/bin/{{docker_container_name}}.sh wait{% if multi_instance == 'true' %} %i{% endif %}
-ExecStop=/usr/bin/{{docker_container_name}}.sh stop{% if multi_instance == 'true' %} %i{% endif %}
+ExecStartPre=/usr/local/bin/{{docker_container_name}}.sh start{% if multi_instance == 'true' %} %i{% endif %}
+ExecStart=/usr/local/bin/{{docker_container_name}}.sh wait{% if multi_instance == 'true' %} %i{% endif %}
+ExecStop=/usr/local/bin/{{docker_container_name}}.sh stop{% if multi_instance == 'true' %} %i{% endif %}
 RestartSec=30
 
 [Install]

files/build_templates/sonic_debian_extension.j2

Lines changed: 4 additions & 1 deletion
@@ -784,14 +784,17 @@ sudo LANG=C chroot $FILESYSTEM_ROOT fuser -km /sys || true
 sudo LANG=C chroot $FILESYSTEM_ROOT umount -lf /sys
 {% endif %}
 
-# Copy service scripts (swss, syncd, bgp, teamd, radv)
+# Copy service scripts (swss, syncd, bgp, teamd, lldp, radv)
 sudo LANG=C cp $SCRIPTS_DIR/swss.sh $FILESYSTEM_ROOT/usr/local/bin/swss.sh
 sudo LANG=C cp $SCRIPTS_DIR/syncd.sh $FILESYSTEM_ROOT/usr/local/bin/syncd.sh
 sudo LANG=C cp $SCRIPTS_DIR/syncd_common.sh $FILESYSTEM_ROOT/usr/local/bin/syncd_common.sh
 sudo LANG=C cp $SCRIPTS_DIR/gbsyncd.sh $FILESYSTEM_ROOT/usr/local/bin/gbsyncd.sh
 sudo LANG=C cp $SCRIPTS_DIR/bgp.sh $FILESYSTEM_ROOT/usr/local/bin/bgp.sh
 sudo LANG=C cp $SCRIPTS_DIR/teamd.sh $FILESYSTEM_ROOT/usr/local/bin/teamd.sh
+sudo LANG=C cp $SCRIPTS_DIR/lldp.sh $FILESYSTEM_ROOT/usr/local/bin/lldp.sh
 sudo LANG=C cp $SCRIPTS_DIR/radv.sh $FILESYSTEM_ROOT/usr/local/bin/radv.sh
+sudo LANG=C cp $SCRIPTS_DIR/asic_status.sh $FILESYSTEM_ROOT/usr/local/bin/asic_status.sh
+sudo LANG=C cp $SCRIPTS_DIR/asic_status.py $FILESYSTEM_ROOT/usr/local/bin/asic_status.py
 
 # Copy sonic-netns-exec script
 sudo LANG=C cp $SCRIPTS_DIR/sonic-netns-exec $FILESYSTEM_ROOT/usr/bin/sonic-netns-exec

files/scripts/asic_status.py

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+#!/usr/bin/env python3
+
+"""
+bootstrap-asic
+"""
+try:
+    import re
+    import sys
+    from sonic_py_common import daemon_base
+    from swsscommon import swsscommon
+    from sonic_py_common import multi_asic
+    from sonic_py_common.logger import Logger
+except ImportError as e:
+    raise ImportError(str(e) + " - required module not found")
+
+#
+# Constants ====================================================================
+#
+SYSLOG_IDENTIFIER = 'asic_status.py'
+CHASSIS_ASIC_INFO_TABLE = 'CHASSIS_ASIC_TABLE'
+SELECT_TIMEOUT_MSECS = 5000
+
+def main():
+    logger = Logger(SYSLOG_IDENTIFIER)
+    logger.set_min_log_priority_info()
+
+    if len(sys.argv) != 3:
+        raise Exception('Pass service and valid asic-id as arguments')
+
+    service = sys.argv[1]
+    args_asic_id = sys.argv[2]
+
+    # Get num asics
+    num_asics = multi_asic.get_num_asics()
+    if num_asics == 0:
+        logger.log_error('Detected no asics on this platform for service {}'.format(service))
+        sys.exit(1)
+
+    # Connect to STATE_DB and subscribe to chassis-module table notifications
+    state_db = daemon_base.db_connect("CHASSIS_STATE_DB")
+
+    sel = swsscommon.Select()
+    sst = swsscommon.SubscriberStateTable(state_db, CHASSIS_ASIC_INFO_TABLE)
+    sel.addSelectable(sst)
+
+    while True:
+        (state, c) = sel.select(SELECT_TIMEOUT_MSECS)
+        if state == swsscommon.Select.TIMEOUT:
+            continue
+        if state != swsscommon.Select.OBJECT:
+            continue
+
+        (asic_key, asic_op, asic_fvp) = sst.pop()
+        asic_id=re.search(r'\d+$', asic_key)
+        global_asic_id = asic_id.group(0)
+
+        if asic_op == 'SET':
+            asic_fvs = dict(asic_fvp)
+            asic_name = asic_fvs.get('name')
+            if asic_name is None:
+                logger.log_info('Unable to get asic_name for asic{}'.format(global_asic_id))
+                continue
+
+            if asic_name.startswith('FABRIC-CARD') is False:
+                logger.log_info('Skipping module with asic_name {} for asic{}'.format(asic_name, global_asic_id))
+                continue
+
+            if (global_asic_id == args_asic_id):
+                logger.log_info('Detected asic{} is online'.format(global_asic_id))
+                sys.exit(0)
+        elif asic_op == 'DEL':
+            logger.log_info('Detected asic{} is offline'.format(global_asic_id))
+            sys.exit(1)
+        else:
+            continue
+
+if __name__ == "__main__":
+    main()
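
The per-event decision made inside the select loop of asic_status.py can be isolated as a pure function, which makes it easy to exercise without swsscommon or a database. This is a sketch under assumptions: `classify_event` is a hypothetical helper that does not exist in the commit, the example keys used with it (such as `asic3`) are illustrative, and ignoring keys without a trailing number is an added guard that the script itself does not have.

```python
import re


def classify_event(asic_key, asic_op, asic_fields, wanted_asic_id):
    """Map one table notification to an exit code, or None to keep waiting.

    Returns 0 when the wanted ASIC is reported online, 1 when a DEL is
    seen, and None for everything else.
    """
    match = re.search(r'\d+$', asic_key)
    if match is None:
        # Assumption: a key with no trailing digits is ignored here;
        # the script in the diff does not guard against this case.
        return None
    asic_id = match.group(0)

    if asic_op == 'SET':
        name = asic_fields.get('name')
        # Only entries whose name starts with FABRIC-CARD are considered.
        if name is None or not name.startswith('FABRIC-CARD'):
            return None
        return 0 if asic_id == wanted_asic_id else None
    if asic_op == 'DEL':
        # Mirrors the diff: a DEL is not compared against the wanted id.
        return 1
    return None
```

A caller would loop over notifications and exit with the first non-None result, which matches the `sys.exit(0)` and `sys.exit(1)` paths in the script.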

files/scripts/asic_status.sh

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+is_chassis_supervisor() {
+    if [ -f /etc/sonic/chassisdb.conf ]; then
+        true
+        return
+    fi
+    false
+    return
+}
+
+check_asic_status() {
+    # Ignore services that are not started in namespace.
+    if [[ -z $DEV ]]; then
+        return 0
+    fi
+
+    # For chassis supervisor, wait for asic to be online
+    /usr/local/bin/asic_status.py $SERVICE $DEV
+    if [[ $? = 0 ]]; then
+        debug "$SERVICE successfully detected asic $DEV..."
+        return 0
+    fi
+    debug "$SERVICE failed to detect asic $DEV..."
+    return 1
+}
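
For readers more comfortable with Python, the two shell helpers in asic_status.sh can be approximated as follows. This is a rough equivalent rather than code from the commit: the `conf_path` and `run` parameters are hypothetical additions that let the logic be exercised without a real chassis or the asic_status.py script, and the debug logging is omitted.

```python
import os
import subprocess


def is_chassis_supervisor(conf_path="/etc/sonic/chassisdb.conf"):
    """True when the chassis DB config exists as a regular file."""
    return os.path.isfile(conf_path)


def check_asic_status(service, dev, run=subprocess.call):
    """Return 0 if the ASIC is usable, 1 otherwise.

    `run` is an injectable stand-in for launching asic_status.py; it
    receives the argument list and returns the process exit code.
    """
    # Services that are not started in a namespace have no ASIC to wait for.
    if not dev:
        return 0

    # Block until asic_status.py reports the ASIC state via its exit code.
    rc = run(["/usr/local/bin/asic_status.py", service, dev])
    return 0 if rc == 0 else 1
```

For example, `check_asic_status("swss", "0", run=lambda argv: 0)` returns 0 without launching any process.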

files/scripts/lldp.sh

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+#!/bin/bash
+
+. /usr/local/bin/asic_status.sh
+
+function debug()
+{
+    /usr/bin/logger $1
+    /bin/echo `date` "- $1" >> ${DEBUGLOG}
+}
+
+start() {
+    debug "Starting ${SERVICE}$DEV service..."
+
+    # On supervisor card, skip starting asic related services here. In wait(),
+    # wait until the asic is detected by pmon and published via database.
+    if ! is_chassis_supervisor; then
+        # start service docker
+        /usr/bin/${SERVICE}.sh start $DEV
+        debug "Started ${SERVICE}$DEV service..."
+    fi
+}
+
+wait() {
+    # On supervisor card, wait for asic to be online before starting the docker.
+    if is_chassis_supervisor; then
+        check_asic_status
+        ASIC_STATUS=$?
+
+        # start service docker
+        if [[ $ASIC_STATUS == 0 ]]; then
+            /usr/bin/${SERVICE}.sh start $DEV
+            debug "Started ${SERVICE}$DEV service..."
+        fi
+    fi
+
+    /usr/bin/${SERVICE}.sh wait $DEV
+}
+
+stop() {
+    debug "Stopping ${SERVICE}$DEV service..."
+
+    /usr/bin/${SERVICE}.sh stop $DEV
+    debug "Stopped ${SERVICE}$DEV service..."
+}
+
+DEV=$2
+
+SERVICE="lldp"
+DEBUGLOG="/tmp/lldp-debug$DEV.log"
+
+case "$1" in
+    start|wait|stop)
+        $1
+        ;;
+    *)
+        echo "Usage: $0 {start|wait|stop}"
+        exit 1
+        ;;
+esac

files/scripts/swss.sh

Lines changed: 21 additions & 3 deletions
@@ -12,6 +12,8 @@ if [[ -f /etc/sonic/${SERVICE}_multi_inst_dependent ]]; then
     MULTI_INST_DEPENDENT="${MULTI_INST_DEPENDENT} cat /etc/sonic/${SERVICE}_multi_inst_dependent"
 fi
 
+. /usr/local/bin/asic_status.sh
+
 function debug()
 {
     /usr/bin/logger $1
@@ -158,15 +160,31 @@ start() {
         clean_up_tables STATE_DB "'PORT_TABLE*', 'MGMT_PORT_TABLE*', 'VLAN_TABLE*', 'VLAN_MEMBER_TABLE*', 'LAG_TABLE*', 'LAG_MEMBER_TABLE*', 'INTERFACE_TABLE*', 'MIRROR_SESSION*', 'VRF_TABLE*', 'FDB_TABLE*', 'FG_ROUTE_TABLE*', 'BUFFER_POOL*', 'BUFFER_PROFILE*', 'MUX_CABLE_TABLE*'"
     fi
 
-    # start service docker
-    /usr/bin/${SERVICE}.sh start $DEV
-    debug "Started ${SERVICE}$DEV service..."
+    # On supervisor card, skip starting asic related services here. In wait(),
+    # wait until the asic is detected by pmon and published via database.
+    if ! is_chassis_supervisor; then
+        # start service docker
+        /usr/bin/${SERVICE}.sh start $DEV
+        debug "Started ${SERVICE}$DEV service..."
+    fi
 
     # Unlock has to happen before reaching out to peer service
     unlock_service_state_change
 }
 
 wait() {
+    # On supervisor card, wait for asic to be online before starting the docker.
+    if is_chassis_supervisor; then
+        check_asic_status
+        ASIC_STATUS=$?
+
+        # start service docker
+        if [[ $ASIC_STATUS == 0 ]]; then
+            /usr/bin/${SERVICE}.sh start $DEV
+            debug "Started ${SERVICE}$DEV service..."
+        fi
+    fi
+
     start_peer_and_dependent_services
 
     # Allow some time for peer container to start

files/scripts/syncd_common.sh

Lines changed: 21 additions & 3 deletions
@@ -11,6 +11,8 @@
 # For examples of these, see gbsyncd.sh and syncd.sh.
 #
 
+. /usr/local/bin/asic_status.sh
+
 function debug()
 {
     /usr/bin/logger $1
@@ -104,14 +106,30 @@ start() {
 
     startplatform
 
-    # start service docker
-    /usr/bin/${SERVICE}.sh start $DEV
-    debug "Started ${SERVICE} service..."
+    # On supervisor card, skip starting asic related services here. In wait(),
+    # wait until the asic is detected by pmon and published via database.
+    if ! is_chassis_supervisor; then
+        # start service docker
+        /usr/bin/${SERVICE}.sh start $DEV
+        debug "Started ${SERVICE}$DEV service..."
+    fi
 
     unlock_service_state_change
 }
 
 wait() {
+    # On supervisor card, wait for asic to be online before starting the docker.
+    if is_chassis_supervisor; then
+        check_asic_status
+        ASIC_STATUS=$?
+
+        # start service docker
+        if [[ $ASIC_STATUS == 0 ]]; then
+            /usr/bin/${SERVICE}.sh start $DEV
+            debug "Started ${SERVICE}$DEV service..."
+        fi
+    fi
+
     waitplatform
 
     /usr/bin/${SERVICE}.sh wait $DEV

files/scripts/teamd.sh

Lines changed: 21 additions & 3 deletions
@@ -1,5 +1,7 @@
 #!/bin/bash
 
+. /usr/local/bin/asic_status.sh
+
 function debug()
 {
     /usr/bin/logger $1
@@ -48,12 +50,28 @@ start() {
     debug "Warm boot flag: ${SERVICE}$DEV ${WARM_BOOT}."
     debug "Fast boot flag: ${SERVICE}$DEV ${Fast_BOOT}."
 
-    # start service docker
-    /usr/bin/${SERVICE}.sh start $DEV
-    debug "Started ${SERVICE}$DEV service..."
+    # On supervisor card, skip starting asic related services here. In wait(),
+    # wait until the asic is detected by pmon and published via database.
+    if ! is_chassis_supervisor; then
+        # start service docker
+        /usr/bin/${SERVICE}.sh start $DEV
+        debug "Started ${SERVICE}$DEV service..."
+    fi
 }
 
 wait() {
+    # On supervisor card, wait for asic to be online before starting the docker.
+    if is_chassis_supervisor; then
+        check_asic_status
+        ASIC_STATUS=$?
+
+        # start service docker
+        if [[ $ASIC_STATUS == 0 ]]; then
+            /usr/bin/${SERVICE}.sh start $DEV
+            debug "Started ${SERVICE}$DEV service..."
+        fi
+    fi
+
     /usr/bin/${SERVICE}.sh wait $DEV
 }
