
Commit 0285bfe

[chassis] Fix issues regarding database service failure handling and mid-plane connectivity for namespace. (#10500)
What/Why I did:

Issue 1: Setting up the ipvlan interface in interfaces-config.sh is not tolerant to failures, because interfaces-config.service is a one-shot service with no restart capability. Scenario: if the database service goes into a failed state, interfaces-config also fails because of the dependency check. When the database service is later restarted, interfaces-config remains stuck and the ipvlan interface never gets created.
Solution: Moved all of this logic from the interfaces-config service into the database service. This is also the more logical place for it, since the namespace is created there and all of the namespace network settings (sysctl) are applied there. With this change, the interface is recreated whenever the database service starts.

Issue 2: Use of IPVLAN vs. MACVLAN. We currently use ipvlan mode, which does not handle the failure scenario above correctly. Once the ipvlan interface has been created and an IP address assigned to it, restarting interfaces-config (or, with this PR, the database service) makes the Linux kernel report "Error: Address already assigned to an ipvlan device." (see https://github.com/torvalds/linux/blob/master/drivers/net/ipvlan/ipvlan_main.c#L978). The reason is that unless the IP address assignment is explicitly cleaned up, the address (which must be unique for IPVLAN) stays in the kernel's database and never returns to the free pool, even after the namespace is deleted.
Solution: Given this hard dependency on unique IP addresses, macvlan mode is a better fit for us: everything is managed by the Linux kernel, and there is no dependency on the user-configured IP address.

Issue 3: The namespace database service does not check reachability to the Supervisor Redis Chassis Server. There is currently no explicit check, since we never send a Redis PING from the namespace to the Supervisor Redis Chassis Server. Without this check, we may start the database and all other dockers even though there is no connectivity, and only hit the error/failure late in the cycle.
Solution: Added an explicit PING from the namespace to check this reachability.

Issue 4: flushdb throws an exception when trying to access the Chassis Server DB over a Unix socket.
Solution: Handle this gracefully with try...except and log the message.
1 parent a477dbb commit 0285bfe

4 files changed: +40 -53 lines

device/nokia/x86_64-nokia_ixr7250e_sup-r0/chassisdb.conf (-1)

@@ -2,4 +2,3 @@ start_chassis_db=1
 chassis_db_address=10.6.0.100
 lag_id_start=1
 lag_id_end=512
-midplane_subnet=10.6.0.0/16

dockers/docker-database/flush_unused_database (+6 -2)

@@ -3,6 +3,7 @@ import swsssdk
 import redis
 import subprocess
 import time
+import syslog
 
 while(True):
     output = subprocess.Popen(['sonic-db-cli', 'PING'], stdout=subprocess.PIPE, text=True).communicate()[0]
@@ -24,5 +25,8 @@ for instname, v in instlists.items():
         if dbinst == instname:
             continue
 
-        r = redis.Redis(host=insthost, unix_socket_path=instsocket, db=dbid)
-        r.flushdb()
+        try:
+            r = redis.Redis(host=insthost, unix_socket_path=instsocket, db=dbid)
+            r.flushdb()
+        except (redis.exceptions.ConnectionError):
+            syslog.syslog(syslog.LOG_INFO,"flushdb:Redis Unix Socket connection error for path {} and dbaname {}".format(instsocket, dbname))
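The Issue 4 fix wraps each per-database flush in its own try/except, so an unreachable Unix socket is logged and skipped instead of raising out of the loop. The Python sketch below isolates that pattern. flush_each, FakeClient and the injected connect/errors/log parameters are illustrative names, not part of SONiC; FakeClient stands in for redis.Redis so the sketch runs without a Redis server.

```python
from typing import Callable, Iterable, List, Tuple, Type


def flush_each(
    targets: Iterable[Tuple[str, str, int]],
    connect: Callable[[str, int], object],
    errors: Tuple[Type[BaseException], ...],
    log: Callable[[str], None],
) -> List[Tuple[str, int]]:
    """Flush every (socket_path, db_name, db_id) target.

    Each target gets its own try/except, so one unreachable Unix socket
    is logged and skipped instead of aborting the remaining flushes.
    """
    flushed = []
    for socket_path, db_name, db_id in targets:
        try:
            # Client creation and FLUSHDB both sit inside the try, as in the diff;
            # with redis-py the socket is typically opened on the first command.
            client = connect(socket_path, db_id)
            client.flushdb()
            flushed.append((socket_path, db_id))
        except errors:
            log("flushdb: Redis Unix socket connection error for path {} and dbname {}"
                .format(socket_path, db_name))
    return flushed


class FakeClient:
    """Hypothetical stand-in for redis.Redis used only to exercise the sketch."""

    def __init__(self, socket_path: str, db_id: int) -> None:
        self.socket_path = socket_path
        self.db_id = db_id

    def flushdb(self) -> bool:
        # Simulate a socket that cannot be reached.
        if self.socket_path.endswith("missing.sock"):
            raise ConnectionError("cannot connect to " + self.socket_path)
        return True
```

Against a live server, connect could be `lambda path, db: redis.Redis(unix_socket_path=path, db=db)` with `errors=(redis.exceptions.ConnectionError,)`, and log a thin wrapper around `syslog.syslog(syslog.LOG_INFO, msg)`.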

files/build_templates/docker_image_ctl.j2 (+34 -8)

@@ -118,12 +118,8 @@ function preStartAction()
 
 function setPlatformLagIdBoundaries()
 {
-    CHASSIS_CONF=/usr/share/sonic/device/$PLATFORM/chassisdb.conf
-    if [ -f "$CHASSIS_CONF" ]; then
-        source $CHASSIS_CONF
-        docker exec -i ${DOCKERNAME} $SONIC_DB_CLI CHASSIS_APP_DB SET "SYSTEM_LAG_ID_START" "$lag_id_start"
-        docker exec -i ${DOCKERNAME} $SONIC_DB_CLI CHASSIS_APP_DB SET "SYSTEM_LAG_ID_END" "$lag_id_end"
-    fi
+    docker exec -i ${DOCKERNAME} $SONIC_DB_CLI CHASSIS_APP_DB SET "SYSTEM_LAG_ID_START" "$lag_id_start"
+    docker exec -i ${DOCKERNAME} $SONIC_DB_CLI CHASSIS_APP_DB SET "SYSTEM_LAG_ID_END" "$lag_id_end"
 }
 function waitForAllInstanceDatabaseConfigJsonFilesReady()
 {
@@ -158,13 +154,40 @@ sleep 1
 function postStartAction()
 {
 {%- if docker_container_name == "database" %}
+    CHASSISDB_CONF="/usr/share/sonic/device/$PLATFORM/chassisdb.conf"
+    [ -f $CHASSISDB_CONF ] && source $CHASSISDB_CONF
     if [ "$DEV" ]; then
         # Enable the forwarding on eth0 interface in namespace.
         SYSCTL_NET_CONFIG="/etc/sysctl.d/sysctl-net.conf"
         docker exec -i database$DEV sed -i -e "s/^net.ipv4.conf.eth0.forwarding=0/net.ipv4.conf.eth0.forwarding=1/;
                                               s/^net.ipv6.conf.eth0.forwarding=0/net.ipv6.conf.eth0.forwarding=1/" $SYSCTL_NET_CONFIG
         docker exec -i database$DEV sysctl --system -e
         link_namespace $DEV
+
+
+        if [[ -n "$midplane_subnet" ]]; then
+            # Use /16 for loopback interface
+            ip netns exec "$NET_NS" ip addr add 127.0.0.1/16 dev lo
+            ip netns exec "$NET_NS" ip addr del 127.0.0.1/8 dev lo
+
+            # Create eth1 in database instance
+            ip link add name ns-eth1"$NET_NS" link eth1-midplane type macvlan mode bridge
+            ip link set dev ns-eth1"$NET_NS" netns "$NET_NS"
+            ip netns exec "$NET_NS" ip link set ns-eth1"$NET_NS" name eth1
+
+            # Configure IP address and enable eth1
+            lc_slot_id=$(python3 -c 'import sonic_platform.platform; platform_chassis = sonic_platform.platform.Platform().get_chassis(); print(platform_chassis.get_my_slot())' 2>/dev/null)
+            lc_ip_address=`echo $midplane_subnet | awk -F. '{print $1 "." $2}'`.$lc_slot_id.$(($DEV + 10))
+            lc_subnet_mask=${midplane_subnet#*/}
+            ip netns exec "$NET_NS" ip addr add $lc_ip_address/$lc_subnet_mask dev eth1
+            ip netns exec "$NET_NS" ip link set dev eth1 up
+
+            # Allow localnet routing on the new interfaces if midplane is using a
+            # subnet in the 127/8 range.
+            if [[ "${midplane_subnet#127}" != "$midplane_subnet" ]]; then
+                ip netns exec "$NET_NS" bash -c "echo 1 > /proc/sys/net/ipv4/conf/eth1/route_localnet"
+            fi
+        fi
     fi
     # Setup ebtables configuration
     ebtables_config
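The new block in postStartAction derives each namespace's eth1 address from midplane_subnet, the line card slot and $DEV, and creates eth1 as a macvlan (mode bridge) on eth1-midplane instead of an ipvlan (mode l2). The Python sketch below reproduces that derivation and builds the same ip(8) commands as argv lists without running them. The function names are hypothetical, and the IPv4 range check is an extra that the shell code does not perform.

```python
import ipaddress
from typing import List


def midplane_eth1_address(midplane_subnet: str, slot_id: int, dev: int) -> str:
    """Mirror the shell scheme: <octet1>.<octet2>.<slot>.<DEV + 10>/<mask>."""
    network, mask = midplane_subnet.split("/", 1)   # like ${midplane_subnet#*/} for the mask
    first_two = ".".join(network.split(".")[:2])     # like awk -F. '{print $1 "." $2}'
    address = "{}.{}.{}".format(first_two, slot_id, dev + 10)
    ipaddress.IPv4Address(address)  # extra check (not in the diff): rejects octets above 255
    return "{}/{}".format(address, mask)


def eth1_setup_commands(net_ns: str, midplane_subnet: str, slot_id: int, dev: int) -> List[List[str]]:
    """Return the diff's ip(8) commands as argv lists; nothing is executed."""
    host_name = "ns-eth1" + net_ns  # temporary host-side name carries the namespace suffix
    in_ns = ["ip", "netns", "exec", net_ns]
    cmds = [
        in_ns + ["ip", "addr", "add", "127.0.0.1/16", "dev", "lo"],
        in_ns + ["ip", "addr", "del", "127.0.0.1/8", "dev", "lo"],
        ["ip", "link", "add", "name", host_name, "link", "eth1-midplane",
         "type", "macvlan", "mode", "bridge"],
        ["ip", "link", "set", "dev", host_name, "netns", net_ns],
        in_ns + ["ip", "link", "set", host_name, "name", "eth1"],
        in_ns + ["ip", "addr", "add", midplane_eth1_address(midplane_subnet, slot_id, dev), "dev", "eth1"],
        in_ns + ["ip", "link", "set", "dev", "eth1", "up"],
    ]
    # Equivalent of [[ "${midplane_subnet#127}" != "$midplane_subnet" ]]: subnet starts with "127".
    if midplane_subnet.startswith("127"):
        cmds.append(in_ns + ["bash", "-c", "echo 1 > /proc/sys/net/ipv4/conf/eth1/route_localnet"])
    return cmds
```

For example, with midplane_subnet=10.6.0.0/16 (the value removed from the supervisor's chassisdb.conf above), slot 3 and DEV 0, midplane_eth1_address returns 10.6.3.10/16. On a real line card each list could be handed to subprocess.run(cmd, check=True) with root privileges. Per the commit message, using macvlan here avoids the kernel's "Address already assigned to an ipvlan device" error when the interface is recreated after a database restart.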
@@ -180,7 +203,8 @@ function postStartAction()
     # then we catch python exception of file not valid
     # that comes to syslog which is unwanted so wait till database
     # config is ready and then ping
-    until [[ ($(docker exec -i database$DEV pgrep -x -c supervisord) -gt 0) && ($($SONIC_DB_CLI PING | grep -c PONG) -gt 0) ]]; do
+    until [[ ($(docker exec -i database$DEV pgrep -x -c supervisord) -gt 0) && ($($SONIC_DB_CLI PING | grep -c PONG) -gt 0) &&
+             ($(docker exec -i database$DEV sonic-db-cli PING | grep -c PONG) -gt 0) ]]; do
         sleep 1;
     done
     if [[ ("$BOOT_TYPE" == "warm" || "$BOOT_TYPE" == "fastfast") && -f $WARM_DIR/dump.rdb ]]; then
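For Issue 3 the until loop gains a third condition: besides supervisord running in database$DEV and a host-side $SONIC_DB_CLI PING, it now also waits for sonic-db-cli PING inside the container to return PONG. The Python sketch below shows the same polling pattern. wait_until_all and output_contains are hypothetical helpers, and the optional timeout is an addition (the shell loop waits indefinitely).

```python
import subprocess
import time
from typing import Callable, Iterable, List, Optional


def output_contains(cmd: List[str], token: str) -> Callable[[], bool]:
    """Build a check that runs cmd and looks for token in stdout (like `| grep -c PONG`)."""
    def check() -> bool:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except OSError:  # e.g. the executable does not exist
            return False
        return token in result.stdout
    return check


def wait_until_all(
    checks: Iterable[Callable[[], bool]],
    interval: float = 1.0,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every check passes (True) or the optional timeout expires (False).

    all() evaluates the checks in order and stops at the first failure,
    so a later check only runs once the earlier ones succeed.
    """
    checks = list(checks)
    deadline = None if timeout is None else clock() + timeout
    while True:
        if all(check() for check in checks):
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(interval)
```

For DEV=0, the new condition would correspond to output_contains(["docker", "exec", "-i", "database0", "sonic-db-cli", "PING"], "PONG").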
@@ -222,7 +246,9 @@ function postStartAction()
                 ($(docker exec -i ${DOCKERNAME} $SONIC_DB_CLI CHASSIS_APP_DB PING | grep -c True) -gt 0) ]]; do
             sleep 1
         done
-        setPlatformLagIdBoundaries
+        if [[ -n "$lag_id_start" && -n "$lag_id_end" ]]; then
+            setPlatformLagIdBoundaries
+        fi
         REDIS_SOCK="/var/run/redis-chassis/redis_chassis.sock"
     fi
     chgrp -f redis $REDIS_SOCK && chmod -f 0760 $REDIS_SOCK
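After this change setPlatformLagIdBoundaries no longer sources chassisdb.conf itself. postStartAction sources the file once near the top, and the CHASSIS_APP_DB SET calls run only when both lag_id_start and lag_id_end are non-empty. The Python sketch below mirrors that decision. parse_conf and lag_id_boundaries are hypothetical names, and the parser assumes plain KEY=VALUE lines like those in the supervisor chassisdb.conf shown above; it handles only a small subset of what bash `source` accepts.

```python
from typing import Dict, Optional, Tuple


def parse_conf(text: str) -> Dict[str, str]:
    """Read simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"')
    return values


def lag_id_boundaries(conf: Dict[str, str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) only when both keys are set, like the new guard."""
    start, end = conf.get("lag_id_start", ""), conf.get("lag_id_end", "")
    if not start or not end:  # [[ -n "$lag_id_start" && -n "$lag_id_end" ]] is false
        return None
    return int(start), int(end)
```

With the supervisor file as it stands after this commit, the parse yields lag ids (1, 512) and no midplane_subnet key, so the new `if [[ -n "$midplane_subnet" ]]` block would not run for that platform.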

files/image_config/interfaces/interfaces-config.sh (-42)

@@ -60,48 +60,6 @@ for intf_pid in $(ls -1 /var/run/dhclient*.Ethernet*.pid 2> /dev/null); do
     [[ -f ${intf_pid} ]] && kill `cat ${intf_pid}` && rm -f ${intf_pid}
 done
 
-
-# Setup eth1 if we connect to a remote chassis DB.
-PLATFORM=${PLATFORM:-`sonic-cfggen -H -v DEVICE_METADATA.localhost.platform`}
-CHASSISDB_CONF="/usr/share/sonic/device/$PLATFORM/chassisdb.conf"
-[[ -f $CHASSISDB_CONF ]] && source $CHASSISDB_CONF
-
-ASIC_CONF="/usr/share/sonic/device/$PLATFORM/asic.conf"
-[[ -f $ASIC_CONF ]] && source $ASIC_CONF
-
-if [[ -n "$midplane_subnet" && ($NUM_ASIC -gt 1) ]]; then
-    for asic_id in `seq 0 $((NUM_ASIC - 1))`; do
-        NET_NS="asic$asic_id"
-
-        PIDS=`ip netns pids "$NET_NS" 2>/dev/null`
-        if [[ "$?" -ne "0" ]]; then # namespace doesn't exist
-            continue
-        fi
-
-        # Use /16 for loopback interface
-        ip netns exec $NET_NS ip addr add 127.0.0.1/16 dev lo
-        ip netns exec $NET_NS ip addr del 127.0.0.1/8 dev lo
-
-        # Create eth1 in database instance
-        ip link add name ns-eth1 link eth1-midplane type ipvlan mode l2
-        ip link set dev ns-eth1 netns $NET_NS
-        ip netns exec $NET_NS ip link set ns-eth1 name eth1
-
-        # Configure IP address and enable eth1
-        lc_slot_id=$(python3 -c 'import sonic_platform.platform; platform_chassis = sonic_platform.platform.Platform().get_chassis(); print(platform_chassis.get_my_slot())' 2>/dev/null)
-        lc_ip_address=`echo $midplane_subnet | awk -F. '{print $1 "." $2}'`.$lc_slot_id.$((asic_id + 10))
-        lc_subnet_mask=${midplane_subnet#*/}
-        ip netns exec $NET_NS ip addr add $lc_ip_address/$lc_subnet_mask dev eth1
-        ip netns exec $NET_NS ip link set dev eth1 up
-
-        # Allow localnet routing on the new interfaces if midplane is using a
-        # subnet in the 127/8 range.
-        if [[ "${midplane_subnet#127}" != "$midplane_subnet" ]]; then
-            ip netns exec $NET_NS bash -c "echo 1 > /proc/sys/net/ipv4/conf/eth1/route_localnet"
-        fi
-    done
-fi
-
 # Read sysctl conf files again
 sysctl -p /etc/sysctl.d/90-dhcp6-systcl.conf
