[generic-config-updater] Handle failed service restarts #2020

renukamanavalan · 2022-01-18T00:19:35Z

What I did

During a config update, updates to certain tables require a service restart.
Because multiple related updates are not grouped together, this can result in too many service restarts, which can fail with "hitting start limit". When that happens, call reset-failed and try to restart. If it fails again, pause and try to restart once more.

How I did it

When a service restart fails, call reset-failed and retry the restart. If the retry also fails, pause and then restart the service one more time.
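
The flow described here can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper name restart_with_recovery and the injectable run and sleep parameters are hypothetical, added so the sequence can be exercised without systemd. The 10-second pause matches the time.sleep(10) mentioned in the follow-up PR.

```python
import subprocess
import time


def restart_with_recovery(svc_name, run=subprocess.call,
                          pause_secs=10, sleep=time.sleep):
    """Restart svc_name; on failure, reset-failed, retry, pause, retry.

    Hypothetical helper: `run` takes an argv list and returns an exit
    status, `sleep` takes a number of seconds.
    """
    # First attempt: a plain restart.
    if run(["systemctl", "restart", svc_name]) == 0:
        return True

    # Clear the unit's failed state, then retry immediately.
    run(["systemctl", "reset-failed", svc_name])
    if run(["systemctl", "restart", svc_name]) == 0:
        return True

    # Still failing: pause before the final attempt.
    sleep(pause_secs)
    return run(["systemctl", "restart", svc_name]) == 0
```

The pause is only taken when the restart fails even after reset-failed, so the common case pays no extra delay.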

How to verify it

lgtm-com · 2022-01-18T00:30:02Z

This pull request introduces 1 alert when merging 3bd91b5 into 5cc9dd5 - view on LGTM.com

new alerts:

1 for Syntax error

generic_config_updater/services_validator.py

lgtm-com · 2022-01-19T18:36:27Z

This pull request introduces 1 alert when merging f520752 into d9f3afe - view on LGTM.com

new alerts:

1 for Unused local variable

generic_config_updater/services_validator.py

qiluo-msft · 2022-01-19T23:10:16Z

generic_config_updater/services_validator.py

@@ -17,13 +17,42 @@ def set_verbose(verbose=False):

 def _service_restart(svc_name):
    rc = os.system(f"systemctl restart {svc_name}")
-    logger.log(logger.LOG_PRIORITY_NOTICE,
-            f"Restarted {svc_name}", print_to_console)
+    if rc != 0:


rc

Is rc a stable code when too many restarts happen? If yes, just check whether rc == <code>. #Closed

Comparing rc against 0 (== or !=) is a stable check for success or failure.

Sorry, I mean: is there a specific exit code for the "too many restarts" failure? If yes, we can check for that code specifically.

There is no specific error code. 1 is a generic error code and the one we see most commonly, but this could change in the future. Moreover, if there is an inherent service-related error, the monitor catches it and it leads to an ICM.

In our case, for any failure, we try our best before giving up.

ref: link

"In case of an error while processing any init-script action except for status, the init script shall print an error message and exit with a non-zero status code:"

"1 generic or unspecified error (current practice)"
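
To illustrate the point made in this thread, here is a small sketch that treats any non-zero exit status as failure instead of matching one particular code. The helper name command_succeeded is hypothetical, and sh -c "exit N" merely stands in for a failing systemctl restart.

```python
import subprocess


def command_succeeded(argv):
    """Return True only when the command exits with status 0."""
    # subprocess.run reports the exit code directly in `returncode`,
    # so every non-zero value (1, 3, ...) is handled the same way.
    return subprocess.run(argv).returncode == 0


if __name__ == "__main__":
    # Stand-ins for a successful and a failed restart.
    print(command_succeeded(["sh", "-c", "exit 0"]))
    print(command_succeeded(["sh", "-c", "exit 1"]))
```

Checking only for zero keeps the retry logic valid even if the exit code for a start-limit failure changes later.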

qiluo-msft · 2022-01-19T23:10:57Z

generic_config_updater/services_validator.py

+                print_to_console)
+
+        rc = os.system(f"systemctl restart {svc_name}")
+        if rc != 0:


rc

Is there a specific exit code for 2nd failure? #Closed

I don't follow your comment, but rc is the definitive way to check success or failure.

qiluo-msft · 2022-01-19T23:13:22Z

generic_config_updater/services_validator.py

    return rc == 0


 def rsyslog_validator(old_config, upd_config, keys):
-    return _service_restart("rsyslog-config")


_service_restart

With _service_restart fixed, do we still need this fix? #Closed

What we need is to run /usr/bin/rsyslog-config.sh.
It updates /etc/rsyslog.conf from CONFIG-DB and restarts rsyslog.
We do the same and additionally handle a possible rsyslog restart failure.

rsyslog-config is not a service but a one-shot wrapper that runs /usr/bin/rsyslog-config.sh after updategraph.service at startup.
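
A rough sketch of what this reply describes, under stated assumptions: the validator signature is taken from the diff above, but the fallback to a direct restart of rsyslog when the script fails, and the injectable run and restart_service parameters, are assumptions made for illustration and may differ from the merged code.

```python
import subprocess

RSYSLOG_CONFIG_SCRIPT = "/usr/bin/rsyslog-config.sh"


def rsyslog_validator(old_config, upd_config, keys,
                      run=subprocess.call, restart_service=None):
    """Regenerate the rsyslog config; recover if the restart fails."""
    # The script rewrites /etc/rsyslog.conf and restarts rsyslog itself.
    if run([RSYSLOG_CONFIG_SCRIPT]) == 0:
        return True

    # Assumption: a script failure is handled by restarting rsyslog
    # directly. `restart_service` is a hypothetical hook for a
    # retry-capable restart helper.
    if restart_service is None:
        def restart_service(svc):
            return run(["systemctl", "restart", svc]) == 0
    return restart_service("rsyslog")
```

Note that the fallback targets the rsyslog service, not rsyslog-config, since the latter is only a one-shot wrapper.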

renukamanavalan added 22 commits October 1, 2021 23:47

ChangeApplier and unit test code.

7863b59

minor update

84f493b

fix unused import

dd337eb

Added service validation

6e0bba0

Take off added code for logging

cabcf23

Merge remote-tracking branch 'upstream/master' into updater

a075606

fix merge

bd7781c

change applier & test updated for service validation

f839e53

removed unused import

beb1ea7

Added two service validators

fe25d2e

move global to class; No logical code changes

fec8f8f

A fix in file path

7d90266

Merge remote-tracking branch 'upstream/master' into updater

91f8f7e

1) Copy generic_updater_config.conf.json as part of install
2) Read conf file from install dir
3) Drop empty keys & tables upon jsonpatch.JsonPatch.apply to be in sync with redis update
4) Prefix service_validator module path with "generic_updater"

6cd9dd8

Merge remote-tracking branch 'upstream/master' into updater

5a9434b

Prune only empty tables

30feec2

file rename

8cd17fd

Merge remote-tracking branch 'upstream/master' into updater

7533ed1

1) Drop print to stdout from change_applier
2) Added vlan validator
3) Added test code for vlan validator

9abb300

No logical code changes; Name changes only, per review comments

a41052b

Merge remote-tracking branch 'upstream/master' into updater

d37b350

Handle failed restart

3bd91b5

renukamanavalan self-assigned this Jan 18, 2022

renukamanavalan requested review from ghooo, qiluo-msft, isabelmsft and wen587 January 18, 2022 00:21

renukamanavalan added the generic-config-updater label Jan 18, 2022

ghooo reviewed Jan 18, 2022

Add pause, only if restart fails even after reset

f520752

Added test code

e66ce4a

renukamanavalan changed the title ~~generic_updater: Handle failed service restarts~~ [generic-config-updater] Handle failed service restarts Jan 19, 2022

renukamanavalan added 2 commits January 19, 2022 21:56

Dropped redundant test code

d280814

minor

f30ff36

qiluo-msft reviewed Jan 19, 2022

qiluo-msft reviewed Jan 19, 2022

ghooo approved these changes Jan 20, 2022

renukamanavalan merged commit 4f2773c into sonic-net:master Jan 20, 2022

This was referenced Jan 20, 2022

[generic_config_updater] dhcp-relay and rsyslog-conf service is failed after apply-patch #1979

Closed

[generic_config_updater] Minor update - No logical code change #2028

Merged

renukamanavalan added a commit that referenced this pull request Jan 20, 2022

[generic_config_updater] Minor update - No logical code change (#2028)

ad1ed4e

What I did Missed update from review comments in PR #2020 s/os.system("sleep 10s")/time.sleep(10)/

ghooo added the Request for 202111 Branch label Jan 26, 2022

judyjoseph pushed a commit that referenced this pull request Jan 31, 2022

[generic_config_updater] Minor update - No logical code change (#2028)

f078246

judyjoseph added the Included in 202111 Branch label Jan 31, 2022

ghooo mentioned this pull request Jun 24, 2022

SONiC Generic Update and Rollback - HLD sonic-net/SONiC#736

Merged
