Fix issue with Assign VM operation #10845

Closed
wants to merge 9 commits into from

Contributor

@Pearl1594 Pearl1594 commented May 12, 2025

Description

This PR fixes: #10825

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Test 1: Successful assignment of VM to another account in another domain

  1. Created a VM in the ROOT domain and created another domain Engg with account dom1
  2. Shut it off, and moved it to account dom1 under /Engg domain
  3. Successfully created dom1-network and moved the VM to the domain
  4. Started the VM

Test 2: Simulated exception after the network is created in the destination account/domain

  1. Created a VM in the ROOT domain in admin-network and created another account dom4 in the Engg domain
  2. Shut off the VM and moved it to the dom4 account under /Engg domain
  3. The operation creates the network dom4-network in the destination account
  4. Then an exception is thrown, which causes the network to be cleaned up
  5. The VM remains in the ROOT domain

Test 3: Simulated exception after the network creation logic; here a network ID is passed, so no new network is created

  1. Created a VM in the ROOT domain in admin-network and created another account dom4 in the Engg domain
  2. Shut off the VM and moved it to the dom4 account under /Engg domain and passed network during the operation: dom4-net
  3. Then an exception is thrown; the network is not deleted, as it was not created during this operation
  4. The VM remains in the ROOT domain

How did you try to break this feature and the system with this change?

codecov bot commented May 12, 2025

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 16.14%. Comparing base (011fced) to head (49b76c9).
Report is 22 commits behind head on 4.20.

Files with missing lines                                  Patch %   Lines
.../src/main/java/com/cloud/vm/UserVmManagerImpl.java     0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               4.20   #10845   +/-   ##
=========================================
  Coverage     16.13%   16.14%           
- Complexity    13225    13228    +3     
=========================================
  Files          5652     5652           
  Lines        497021   497018    -3     
  Branches      60222    60222           
=========================================
+ Hits          80202    80223   +21     
+ Misses       407885   407861   -24     
  Partials       8934     8934           
Flag        Coverage Δ
uitests     4.00% <ø> (ø)
unittests   16.99% <0.00%> (+<0.01%) ⬆️

@DaanHoogland DaanHoogland added this to the 4.20.1 milestone May 12, 2025
@winterhazel winterhazel self-requested a review May 12, 2025 11:55
@Pearl1594
Contributor Author

Thanks @winterhazel - would be great if you can help review this! :)

…left behind in the DB should there be an exception
@Pearl1594
Contributor Author

I can see other issues with this approach: because we delete the network when an exception occurs, the VM cannot then be expunged, as the network can't be found and the NICs have already been updated to point to the new network.

@Pearl1594
Contributor Author

After testing multiple approaches and not reaching a desired result which the original PR (#7061) set out to achieve, I've reverted the bit that encloses executeStepsToChangeOwnershipOfVm within a transaction.

@Pearl1594 Pearl1594 marked this pull request as ready for review May 13, 2025 09:05
@Pearl1594 Pearl1594 requested a review from DaanHoogland May 13, 2025 09:37
@winterhazel
Member

After testing multiple approaches and not reaching a desired result which the original PR (#7061) set out to achieve, I've reverted the bit that encloses executeStepsToChangeOwnershipOfVm within a transaction.

@Pearl1594 thanks for looking into the issue. I will have a look into it as well to see if I can come up with another approach, but I'm ok with reverting this bit if we do not find a better way to fix it.

@Pearl1594
Contributor Author

Sure @winterhazel - thanks!

Contributor

@DaanHoogland DaanHoogland left a comment

clgtm, need testing and maybe documentation of the limitations of the assign VM functionality/known issues

@Pearl1594 Pearl1594 linked an issue May 13, 2025 that may be closed by this pull request
@Pearl1594 Pearl1594 moved this to In Progress in ACS 4.20.1 May 13, 2025
@winterhazel
Member

@Pearl1594 I have submitted a potential (still work in progress) approach to https://github.com/winterhazel/cloudstack/tree/address-assignvm-regression. I have not completely tested it yet, but it seems to be working ok based on some simple tests. It fixes #10825, and does not revert the issue that #7061 fixed.

This approach moves the creation of the network outside the transaction, right before the updates start happening. This way, if an error happens while creating the network, nothing will have been updated yet. And if an error happens while updating the virtual machine, the changes will be rolled back (we will have a dangling network at the moment, but we can adapt the code to delete it).

What do you think?
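
For readers following the thread, the ordering described above can be illustrated with a minimal in-memory sketch. This is not the code from the linked branch or from UserVmManagerImpl: the class `AssignVmSketch`, its `Vm` type, the `networks` set and the `failDuringUpdate` flag are all hypothetical names invented for illustration, and the "transaction" is simulated by saving and restoring two fields. It also assumes the cleanup of a dangling network has already been added, which the comment above only proposes.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory model; none of these names are CloudStack APIs. */
class AssignVmSketch {

    /** Stand-in for the VM record that the transactional updates touch. */
    static final class Vm {
        String owner;
        String network;

        Vm(String owner, String network) {
            this.owner = owner;
            this.network = network;
        }
    }

    /** Stand-in for the networks that currently exist. */
    final Set<String> networks = new HashSet<>();

    /** Creates a network named after the account (the naming is illustrative). */
    String createNetwork(String account) {
        String name = account + "-network";
        networks.add(name);
        return name;
    }

    /**
     * Moves the VM to newOwner. If existingNetwork is null, a network is
     * created first, before any VM field is modified.
     */
    void assignVm(Vm vm, String newOwner, String existingNetwork, boolean failDuringUpdate) {
        // Step 1, outside the "transaction": a failure here leaves the VM untouched.
        boolean createdHere = (existingNetwork == null);
        String network = createdHere ? createNetwork(newOwner) : existingNetwork;

        // Step 2, the "transaction": remember the old values so they can be restored.
        String previousOwner = vm.owner;
        String previousNetwork = vm.network;
        try {
            vm.owner = newOwner;
            if (failDuringUpdate) {
                throw new IllegalStateException("simulated failure during update");
            }
            vm.network = network;
        } catch (RuntimeException e) {
            // Roll back the VM updates.
            vm.owner = previousOwner;
            vm.network = previousNetwork;
            // Delete only a network that this call created, so it does not dangle.
            if (createdHere) {
                networks.remove(network);
            }
            throw e;
        }
    }
}
```

In this sketch a failure during the update restores the VM's owner and network, and a network is removed only if the same call created it, which mirrors the behaviour checked in Test 2 and Test 3 of the PR description.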

@Pearl1594
Contributor Author

Pearl1594 commented May 14, 2025

@winterhazel on briefly checking the code changes - I believe, it partly reverts what #7061 was set to achieve, but I am completely fine with it; thanks for looking into it. Once you are done with your testing, and if it causes no major issues other than the network being left behind, we can proceed with your approach. However, if you foresee any new side effects beyond what was originally seen, we could just revert. So I'll wait for your PR and confirmation.

@Pearl1594
Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@Pearl1594
Contributor Author

@winterhazel We're aiming to cut the rc by Friday. If the changes are extensive and need more time for thorough testing, would you be open to targeting them for the next release?

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13378

@Pearl1594
Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@winterhazel
Member

@winterhazel We're aiming to cut the rc by Friday. If the changes are extensive and need more time for thorough testing, would you be open to targeting them for the next release?

@Pearl1594 I should be able to finish testing properly by Friday, but I will let you know if I find that we need more time to test this approach.

By the way, regarding:

I believe, it partly reverts what #7061 was set to achieve

It is not reverting what #7061 intended to fix. The original issue was the virtual machine being left in an inconsistent state when there was an error after the updates began (e.g. it would be left assigned to the new account, but still on the previous network). This does not happen anymore because all updates are still inside the transaction.

@DaanHoogland
Contributor

@winterhazel, will your PR be replacing this one or be an addition?

@winterhazel
Member

@winterhazel, will your PR be replacing this one or be an addition?

@DaanHoogland it will be replacing this one.

@blueorangutan

[SF] Trillian test result (tid-13306)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 56389 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10845-t13306-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@Pearl1594
Contributor Author

@winterhazel can you please at least raise a PR, so that the CI simulator tests and the smoke tests can run and we can have some results by tomorrow?

@weizhouapache
Member

@Pearl1594 @winterhazel
can you discuss which is better, #10875 or #10845?

@winterhazel
Member

@weizhouapache the approach in #10875 fixes #10825 without reintroducing the bug that #7061 intended to fix, so it should be the ideal solution.

@Pearl1594
Contributor Author

Closing this PR as #10875 addresses it. Thanks @winterhazel - sorry for the delay, I'm on leave.

@Pearl1594 Pearl1594 closed this May 16, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in ACS 4.20.1 May 16, 2025
@weizhouapache
Member

Closing this PR as #10875 addresses it. Thanks @winterhazel - sorry for the delay, I'm on leave.

good, thanks both @Pearl1594 @winterhazel

Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Failure to assign VM to another account
7 participants