Skip to content

Chassisd does not wait for the execution to complete for previous admin state change requests. Fixed issue #22138 #607

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

rameshraghupathy
Copy link
Contributor

@rameshraghupathy rameshraghupathy commented Apr 11, 2025

Chassisd does not wait for the execution to complete for previous admin state change requests. Fixed issue #22138

Description

Fixed issue #22138
Chassisd does not wait for the execution to complete for previous admin state change requests. Added support to solve this.

Fixed issue #22138

Motivation and Context

Added a state to indicate the progress of an admin_state transition period (window) of every module in config_db.
When in progress do not allow another admin_state change for that module.
Mark the beginning of state transition in the CLI itself. Added a timeout also for error cases.
Mark the end of state transition in chassisd

How Has This Been Tested?

ssue "config chassis module startup DPU0". This takes around 3 mins. Check the config_db and verify the new variable is set to True.
Issue another "config chassis module startup DPU0" and also ""config chassis module shutdown DPU0".
See both are blocked.
Once the DPU operational state is the admin_sate check if a new config is accepted.

Additional Information (Optional)

Goes with PR: sonic-net/sonic-utilities#3845

…te for previous admin state change requests.
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -740,6 +749,9 @@ class SmartSwitchModuleUpdater(ModuleUpdater):
# Persist dpu down time
self.persist_dpu_reboot_time(key)

# Clear transition flag in STATE_DB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but but do we need to wait until the operational state transition has occurred?

Copy link
Contributor Author

@rameshraghupathy rameshraghupathy Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gpunathilell @vvolam This is the instance where the operation state changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, we can just wait for the admin_state change function to return to clear the transition flag right? Assuming the set_admin_state function failed, we can immediately clear the transition flag and do other operations (instead of waiting for timeout)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gpunathilell We can, but we need to maintain a list for all DPUs and it makes the code unnecessarily complicated. The possibility of coming out of timeout is very rare and again it is safe to wait until the timeout happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant in line 240 in chassisd, we can just add it here right?

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants