interfaces-config.service may hang at sonic-cfggen -d #1873

Closed
@jeromesun14

Description

interfaces-config.service may hang at sonic-cfggen -d -t /usr/share/sonic/templates/interfaces.j2 > /etc/network/interfaces

Steps to reproduce the issue:

This issue is hard to reproduce by rebooting the system over and over.
However, the hang of sonic-cfggen -d -t /usr/share/sonic/templates/interfaces.j2 > /etc/network/interfaces can be reproduced directly:

  1. redis-cli -n 4 FLUSHDB
  2. sonic-cfggen -d -t /usr/share/sonic/templates/interfaces.j2
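
To make the reproduction report a result instead of blocking the terminal, the second step can be wrapped in a deadline. This is a minimal sketch, assuming GNU coreutils `timeout` is available on the switch; the helper name `check_hang` and the 30-second limit are illustrative and not part of SONiC.

```shell
#!/bin/bash
# Report whether a command finishes within a deadline.
# GNU `timeout` exits with status 124 when it had to kill the command.
check_hang() {
    local limit="$1"
    shift
    timeout "$limit" "$@" > /dev/null 2>&1
    if [ $? -eq 124 ]; then
        echo "hung"
    else
        echo "finished"
    fi
}

# On a test switch (destructive: FLUSHDB empties the config DB):
#   redis-cli -n 4 FLUSHDB
#   check_hang 30 sonic-cfggen -d -t /usr/share/sonic/templates/interfaces.j2
```

If the analysis in this report is right, the call would print `hung` while redis db 4 is empty and `finished` once the config DB has been loaded.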

The dependency chain is: interfaces-config.service -> database.service -> updategraph.service.
database.service loads the config DB inside the docker container by running configdb-load.sh, which is started by supervisord:

# cat /etc/supervisor/conf.d/supervisord.conf 
[supervisord]
logfile_maxbytes=1MB
logfile_backups=2
nodaemon=true

[program:rsyslogd]
command=/bin/bash -c "rm -f /var/run/rsyslogd.pid && /usr/sbin/rsyslogd -n"
priority=1
autostart=true
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[program:redis-server]
command=/usr/bin/redis-server /etc/redis/redis.conf
priority=2
autostart=true
autorestart=false
stdout_logfile=syslog
stderr_logfile=syslog

[program:configdb-load.sh]
command=/usr/bin/configdb-load.sh
priority=3
autostart=true
autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog

database.service does not wait for configdb-load.sh to finish loading all config DB data into redis db 4. Its post-start action returns as soon as redis-server answers PING:

function postStartAction()
{
    until [[ $(/usr/bin/docker exec database redis-cli ping | grep -c PONG) -gt 0 ]]; do
      sleep 1;
    done
}

So when interfaces-config.service runs, there may be no entries in redis db 4 yet. This causes interfaces-config.sh to hang at sonic-cfggen -d -t /usr/share/sonic/templates/interfaces.j2 > /etc/network/interfaces, which leaves interfaces-config.service stuck in the running state:

xxx@switch:~$ ps aux | grep inter
root       806  0.0  0.0  20044  2780 ?        Ss   04:59   0:00 /bin/bash /usr/bin/interfaces-config.sh
root       816  0.0  0.6  87020 25080 ?        S    04:59   0:00 /usr/bin/python /usr/local/bin/sonic-cfggen -d -t /usr/share/sonic/templates/interfaces.j2

xxx@switch:~$ sudo systemctl list-jobs 
JOB UNIT                                 TYPE  STATE  
  1 graphical.target                     start waiting
  2 multi-user.target                    start waiting
 53 systemd-update-utmp-runlevel.service start waiting
 68 dhcp_relay.service                   start waiting
 69 swss.service                         start waiting
 73 interfaces-config.service            start running
 78 radv.service                         start waiting
 85 snmp.service                         start waiting

8 jobs listed.
xxx@switch:~$ docker ps
CONTAINER ID        IMAGE                            COMMAND                  CREATED             STATUS              PORTS               NAMES
812df0938d44        docker-lldp-sv2:latest           "/usr/bin/supervisord"   3 days ago          Up 6 hours                              lldp
bd37c386fde5        docker-platform-monitor:latest   "/usr/bin/supervisord"   3 days ago          Up 6 hours                              pmon
0dd9a559a4ed        docker-teamd:latest              "/usr/bin/supervisord"   3 days ago          Up 6 hours                              teamd
fc28bf86eabf        docker-fpm-quagga:latest         "/usr/bin/supervisord"   3 days ago          Up 6 hours                              bgp
9d23ca7b50e9        docker-database:latest           "/usr/bin/supervisord"   3 days ago          Up 6 hours                              database
xxx@switch:~$ 
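
One possible way to narrow the race described above is to have the post-start action wait until redis db 4 actually contains keys, not just until redis-server answers PING. The sketch below is an assumption, not the project's fix: the function names `config_db_key_count` and `wait_for_config_db`, the use of `DBSIZE` as the readiness probe, and the default of 60 tries are all hypothetical.

```shell
#!/bin/bash
# Hypothetical probe: number of keys in redis db 4 (the config DB).
# Without a TTY, redis-cli prints DBSIZE as a bare integer.
config_db_key_count() {
    /usr/bin/docker exec database redis-cli -n 4 DBSIZE 2>/dev/null
}

# Poll until the config DB holds at least one key. Give up after
# max_tries polls so a failed load cannot block boot forever.
wait_for_config_db() {
    local max_tries="${1:-60}" interval="${2:-1}" count tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        count="$(config_db_key_count)"
        case "$count" in
            ''|*[!0-9]*) ;;                       # no numeric answer yet
            *) [ "$count" -gt 0 ] && return 0 ;;  # some data is present
        esac
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}

# Possible use inside postStartAction, after the existing PING loop:
#   wait_for_config_db 60 1 || echo "config DB still empty" >&2
```

A non-empty database only shows that loading has started, not that it has finished. A completion marker written by configdb-load.sh as its last step would be a stricter signal, at the cost of changing that script too.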

Describe the results you received:

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):
