Disable routeCheck monit as part of config reload/minigraph stop service and enable it back as part of service start. #3682

Merged
merged 8 commits into from
Dec 17, 2024

@abdosi abdosi commented Dec 16, 2024

What I did:

Disable routeCheck monitoring in monit when services are stopped during config reload/minigraph, and re-enable it when services are started again.
At large route scale (70K+ routes), routeCheck can transiently log a monit error during the reload,
and loganalyzer then fails sonic-mgmt test cases.

MSFT ADO: 30457854

Why I did:

This transient issue can generate a monit ERR log, which can cause sonic-mgmt test cases to fail.

How I verify:

Manual verification via sudo monit status routeCheck; unit tests (UT) updated.
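A minimal sketch of the stop/start behavior described above, not the code merged in this PR. It assumes the standard monit CLI actions `monitor` and `unmonitor`. The service name `container_checker` is an assumption inferred from the "container monitoring" message in the diff, and `set_monitoring` and its `runner` parameter are hypothetical names used only for illustration.

```python
import subprocess
from typing import Callable, List, Sequence

# 'routeCheck' is the monit check named in this PR.
# 'container_checker' is an ASSUMED name for the container monitoring check.
MONITORED_SERVICES = ("container_checker", "routeCheck")

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise on a non-zero exit."""
    subprocess.check_call(list(cmd))


def set_monitoring(enable: bool, runner: Runner = _run) -> List[List[str]]:
    """Enable or disable monit monitoring for each service.

    Hypothetical helper: call with enable=False before stopping services
    and with enable=True after starting them again.
    Returns the list of commands that were issued.
    """
    action = "monitor" if enable else "unmonitor"
    issued = []
    for service in MONITORED_SERVICES:
        cmd = ["sudo", "monit", action, service]
        runner(cmd)
        issued.append(cmd)
    return issued
```

The injectable `runner` is a design choice for testability: a unit test can pass a recording function instead of invoking monit on the host. On a device, the result can be checked manually with sudo monit status routeCheck, as noted in the description.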

@mssonicbld

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

abdosi commented Dec 16, 2024

@anamehra @yejianquan for visibility.

@abdosi changed the title from "Disable routeCheck monit as part of config reload/minigraph stop service and enable it back of service start." to "Disable routeCheck monit as part of config reload/minigraph stop service and enable it back as part of service start." on Dec 16, 2024
Signed-off-by: Abhishek Dosi <[email protected]>
abdosi commented Dec 16, 2024

/azp run Azure.sonic-utilities

Azure Pipelines successfully started running 1 pipeline(s).

     try:
         subprocess.check_call(['sudo', 'monit', 'status'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
-        click.echo("Enabling container monitoring ...")
+        click.echo("Enabling container and routeCheck monitoring ...")
When route check is re-enabled at this point, the links and sessions are still coming up, so a route check that runs now could still hit the same error. Is that correct?

@abdosi abdosi Dec 17, 2024

@anamehra Once monit is started, the probability of hitting the ERR is lower: monit takes 5 min to initialize, and an error is then raised only after 15 min (3 cycles). The system should stabilize within 15-20 min after the monit restart.
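The arithmetic in this reply can be sketched as follows. This is an illustration under stated assumptions, not a reading of the actual monit configuration: the 5-minute cycle length is inferred from "15 min (3 cycles)", and the function name and parameters are hypothetical.

```python
def earliest_alert_seconds(start_delay_s: int = 300,
                           cycle_s: int = 300,
                           failed_cycles: int = 3) -> int:
    """Estimate how long after a monit restart a routeCheck ERR could first appear.

    Follows the reasoning in the reply above: monit first waits out its
    initialization delay, then the check must fail for several cycles.
    All default values are ASSUMPTIONS taken from that comment.
    """
    return start_delay_s + cycle_s * failed_cycles


# 5 min initialization + 3 cycles of 5 min each
print(earliest_alert_seconds() // 60, "minutes")  # prints "20 minutes"
```

With these assumed values the earliest ERR comes about 20 minutes after the restart, which is the top of the 15-20 minute stabilization window quoted in the reply, so the two numbers leave little margin.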

@yejianquan yejianquan left a comment

LGTM

@abdosi abdosi merged commit d85e1db into sonic-net:master Dec 17, 2024
7 checks passed
abdosi commented Dec 17, 2024

@yejianquan: Please help with the cherry-pick.

mssonicbld pushed a commit to mssonicbld/sonic-utilities that referenced this pull request Dec 18, 2024
…ice and enable it back as part of service start. (sonic-net#3682)

@mssonicbld

Cherry-pick PR to 202405: #3686

mssonicbld pushed a commit that referenced this pull request Dec 18, 2024
…ice and enable it back as part of service start. (#3682)
