config reload fail due to monit socket connection fail #21268
Comments
@abdosi could you please take a look at this?
@bingwang-ms for viz
Not sure if this helps: https://github.com/sonic-net/sonic-buildimage/pull/19477/files @abdosi
mssonicbld pushed a commit to mssonicbld/sonic-utilities that referenced this issue (Dec 27, 2024)
…itor enable after config reload (sonic-net#3698)
Fix: sonic-net/sonic-buildimage#21268
How I did: Reordered the sequence so that container_checker and routeCheck monitoring is enabled before the monit reload, which avoids the transient monit socket error. Also added a 1-second sleep so that the monitor-enable configuration takes effect before the reload.
mssonicbld pushed a commit to sonic-net/sonic-utilities that referenced this issue (Dec 27, 2024)
…itor enable after config reload (#3698)
Fix: sonic-net/sonic-buildimage#21268
How I did: Reordered the sequence so that container_checker and routeCheck monitoring is enabled before the monit reload, which avoids the transient monit socket error. Also added a 1-second sleep so that the monitor-enable configuration takes effect before the reload.
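The ordering described in the commit message can be sketched roughly as follows. This is only an illustration under assumptions, not the actual sonic-utilities change: the function name, the injectable `run` and `sleep` parameters, and the use of `sudo` are hypothetical, and the service name `container_checker` is assumed to be the intended spelling of the service named in the commit message. The only external commands referenced are `monit monitor <name>` and `monit reload`.

```python
import time
from typing import Callable, List, Sequence


def enable_monitoring_then_reload(
    run: Callable[[List[str]], object],
    services: Sequence[str] = ("container_checker", "routeCheck"),
    settle_seconds: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> None:
    """Enable monitoring for each service first, wait briefly, then reload monit.

    `run` is a hypothetical hook that executes one command (argv list).
    Injecting it keeps this sketch runnable on machines without monit.
    """
    # Step 1: re-enable monitoring for each service before touching the daemon.
    for name in services:
        run(["sudo", "monit", "monitor", name])

    # Step 2: give the monitor-enable requests time to take effect
    # (the commit message mentions a 1-second sleep).
    sleep(settle_seconds)

    # Step 3: only now reload monit.
    run(["sudo", "monit", "reload"])


# Dry run: record the commands instead of executing them.
recorded: List[List[str]] = []
enable_monitoring_then_reload(run=recorded.append, sleep=lambda seconds: None)
for cmd in recorded:
    print(" ".join(cmd))
```

On a real device, `run` could be something like `lambda cmd: subprocess.run(cmd, check=True)`; the dry run above only shows the order in which the commands would be issued.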
Description
Config reload and load minigraph can intermittently fail because the monit socket connection fails.
This was first noticed around 12/20. It appears to be a flaky, timing-related issue.
I suspect it is related to sonic-net/sonic-utilities#3682. @abdosi could you please take a look?
Steps to reproduce the issue:
It's noticed on different config reload scenarios:
Describe the results you received:
On PR KVM test plans:
https://elastictest.org/scheduler/testplan/6768f5a49e7fdf9b25e4066e?testcase=testbed_q_sonic-elastictest-prod-vmss-E8s-v3_249428_vms-kvm-t1-lag_prepare.log&type=prepare
In the nightly test, we run with &> /dev/null, but the return code (RC) is still 1.
The same error logs appear in syslog:
In syslog, around the failure time:
Describe the results you expected:
config reload should go smoothly with RC 0
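As a rough illustration of why the failure is still visible when output is redirected to /dev/null, the sketch below discards stdout and stderr but keeps the exit status. The helper name `run_quietly` is hypothetical, and the demo uses a stand-in Python child process rather than the real config reload command, so it runs anywhere.

```python
import subprocess
import sys
from typing import List


def run_quietly(cmd: List[str]) -> int:
    """Run cmd with stdout and stderr discarded (like `&> /dev/null`)
    and return its exit status, which the redirection does not hide."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


# Stand-in for a failing command: prints noise, then exits with status 1.
failing = [sys.executable, "-c", "import sys; print('noise'); sys.exit(1)"]
rc = run_quietly(failing)
print("RC:", rc)
```

In a test harness, the same helper could wrap the real reload invocation (for example `["sudo", "config", "reload", "-y"]`, assuming that is how the test calls it) and fail the run whenever the returned value is nonzero.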
Output of show version: both the internal-202405 and GitHub 202405 images have this issue.
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):