Disable routeCheck monit as part of config reload/minigraph service stop and re-enable it as part of service start. #3682
Signed-off-by: Abhishek Dosi <[email protected]>
With a large route scale (70K+ routes), routeCheck can transiently log a monit error, which can also cause sonic-mgmt tests to fail. Signed-off-by: Abhishek Dosi <[email protected]>
@anamehra @yejianquan for visibility.
 try:
     subprocess.check_call(['sudo', 'monit', 'status'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
-    click.echo("Enabling container monitoring ...")
+    click.echo("Enabling container and routeCheck monitoring ...")
If routeCheck is re-enabled at this point, while the links and sessions are still coming up, couldn't a route check still hit the same error? Is that correct?
@anamehra Once monit is restarted, the probability of hitting the ERR is lower: monit takes 5 minutes to initialize, and after that routeCheck must keep failing for 15 minutes (3 cycles) before an error is reported. The system should stabilize within 15-20 minutes after monit restarts.
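A small worked version of the timing argument above, using only the numbers from the comment: 5 minutes of monit initialization, a 5-minute routeCheck cycle (implied by "15 min (3 cycle)"), and an alert only after 3 consecutive failed cycles. The function names are hypothetical; this models the reasoning in the comment, not monit's actual scheduler.

```python
# Hypothetical timing model based on the review comment (not monit internals):
# monit initializes for init_delay_min minutes, routeCheck runs every
# cycle_min minutes, and an ERR is only reported after failing_cycles
# consecutive failed checks.

def earliest_alert_minutes(init_delay_min: int = 5,
                           cycle_min: int = 5,
                           failing_cycles: int = 3) -> int:
    """Earliest time (minutes after monit restart) a routeCheck ERR can be logged."""
    return init_delay_min + failing_cycles * cycle_min


def can_log_transient_error(stabilize_min: int, **timing: int) -> bool:
    """True if routes that are still settling for stabilize_min minutes
    would still be failing when the earliest alert could fire."""
    return stabilize_min > earliest_alert_minutes(**timing)


if __name__ == "__main__":
    print(earliest_alert_minutes())      # 20
    print(can_log_transient_error(15))   # False
    print(can_log_transient_error(20))   # False
    print(can_log_transient_error(25))   # True
```

Under this model, a system that settles within the 15-20 minute window quoted in the comment stays below the 20-minute earliest-alert time.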
LGTM
@yejianquan: please help with the cherry-pick.
…ice and enable it back as part of service start. (sonic-net#3682) What I did: Disable routeCheck during config reload/minigraph stop and re-enable it at service start, because with a large route scale of 70K+ routes routeCheck can transiently log a monit error, which can cause sonic-mgmt test cases to fail because of loganalyzer. Why I did it: Because of this transient issue, a monit ERR log can be generated, which can cause a sonic-mgmt test case to fail. How I verified it: Manual verification via sudo monit status routeCheck, and updated unit tests.
Cherry-pick PR to 202405: #3686
What I did:
Disable routeCheck during config reload/minigraph service stop and re-enable it at service start, because
with a large route scale (70K+ routes) routeCheck can transiently log a monit error,
which can cause sonic-mgmt test cases to fail through loganalyzer.
MSFT ADO: 30457854
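A minimal sketch of the stop/start pattern described above, assuming monit's standard `monitor` and `unmonitor` actions and reusing the `monit status` guard shown in the diff. The helper names (`set_route_check_monitoring`, `_quiet_check_call`, `_monit_is_running`) are hypothetical and are not the actual sonic-utilities code; the runner is injectable so the logic can be exercised without monit installed.

```python
import subprocess
from typing import Callable, List

# A runner takes an argv list and raises CalledProcessError on failure.
Runner = Callable[[List[str]], None]


def _quiet_check_call(cmd: List[str]) -> None:
    """Default runner: same call style as the diff (output discarded)."""
    subprocess.check_call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def _monit_is_running(run: Runner) -> bool:
    """Mirror the diff's guard: only act if `monit status` succeeds."""
    try:
        run(['sudo', 'monit', 'status'])
        return True
    except (subprocess.CalledProcessError, FileNotFoundError):
        return False


def set_route_check_monitoring(enabled: bool, run: Runner = _quiet_check_call) -> bool:
    """Enable or disable monit monitoring of routeCheck (hypothetical helper).

    Returns False and does nothing when monit is not running; errors from
    the monitor/unmonitor call itself propagate to the caller.
    """
    if not _monit_is_running(run):
        return False
    action = 'monitor' if enabled else 'unmonitor'
    run(['sudo', 'monit', action, 'routeCheck'])
    return True
```

In this sketch, the service-stop path would call `set_route_check_monitoring(False)` and the service-start path `set_route_check_monitoring(True)`; when monit is not running, both calls are no-ops.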
Why I did it:
Because of this transient issue, a monit ERR log can be generated, which can cause a sonic-mgmt test case to fail.
How I verified it:
Manual verification via
sudo monit status routeCheck
and updated unit tests.
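The manual check above could be scripted. A rough sketch, assuming `monit status routeCheck` prints a `monitoring status` field (for example `Monitored` or `Not monitored`); the sample text below is abbreviated and illustrative only, and the helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional

# Matches a line such as "  monitoring status   Not monitored" (assumed format).
_MONITORING_STATUS = re.compile(r'^[ \t]*monitoring status[ \t]+(.+?)[ \t]*$', re.MULTILINE)


def route_check_monitoring_status(status_output: str) -> Optional[str]:
    """Return the 'monitoring status' value, or None if the field is absent."""
    match = _MONITORING_STATUS.search(status_output)
    return match.group(1) if match else None


def query_route_check_status() -> Optional[str]:
    """Hypothetical helper: run the verification command and parse its output."""
    result = subprocess.run(['sudo', 'monit', 'status', 'routeCheck'],
                            capture_output=True, text=True, check=True)
    return route_check_monitoring_status(result.stdout)


# Abbreviated, illustrative output after routeCheck monitoring is disabled.
SAMPLE_AFTER_STOP = """Program 'routeCheck'
  monitoring status            Not monitored
"""

if __name__ == "__main__":
    print(route_check_monitoring_status(SAMPLE_AFTER_STOP))  # Not monitored
```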