Skip to content

Rollback of changes with errors during the VM assign #7061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 15, 2025

Conversation

stephankruggg
Copy link
Contributor

Description

When assigning a VM to another account (assignVirtualMachine API), if a network error happens during the process, no rollback is executed; thus, leaving the DB in an inconsistent state.

This PR fixes this by processing all steps to change the ownership of a VM inside the transaction. It also refactors code related to this procedure.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

How Has This Been Tested?

In a local lab, I created a VM and an account of user type. I then attempted to assign the admin's VM to the user by using a network not accessible to the user.

Before applying the changes, an error was raised, but the VM was listed as belonging to the user.

After applying the changes, an error was raised, and the VM was listed as belonging to the admin - no changes occurred.

Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of refactoring.
I am not sure if the code is good or not. Maybe others can check.

@stephankruggg
can you point out the key point of your change ?

@codecov
Copy link

codecov bot commented Jan 5, 2023

Codecov Report

Attention: Patch coverage is 86.41304% with 50 lines in your changes missing coverage. Please review.

Project coverage is 16.13%. Comparing base (ab76d3c) to head (7d882a9).
Report is 6 commits behind head on 4.20.

Files with missing lines Patch % Lines
.../src/main/java/com/cloud/vm/UserVmManagerImpl.java 87.60% 38 Missing and 7 partials ⚠️
...e/cloudstack/api/command/admin/vm/AssignVMCmd.java 0.00% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.20    #7061      +/-   ##
============================================
+ Coverage     16.07%   16.13%   +0.06%     
- Complexity    12883    12965      +82     
============================================
  Files          5639     5639              
  Lines        494193   494241      +48     
  Branches      59925    59896      -29     
============================================
+ Hits          79419    79753     +334     
+ Misses       405944   405667     -277     
+ Partials       8830     8821       -9     
Flag Coverage Δ
uitests 4.02% <ø> (ø)
unittests 16.98% <86.41%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@stephankruggg
Copy link
Contributor Author

lots of refactoring. I am not sure if the code is good or not. Maybe others can check.

@stephankruggg can you point out the key point of your change ?

@weizhouapache, when assigning a VM to another account (assignVirtualMachine API), if a network error happens during the process, no rollback is executed; thus, leaving the DB in an inconsistent state. This PR fixes this by processing all steps to change the ownership of a VM inside the transaction.

The refactoring is intended to make the code more testable, readable, and easier to maintain.

@weizhouapache
Copy link
Member

lots of refactoring. I am not sure if the code is good or not. Maybe others can check.
@stephankruggg can you point out the key point of your change ?

@weizhouapache, when assigning a VM to another account (assignVirtualMachine API), if a network error happens during the process, no rollback is executed; thus, leaving the DB in an inconsistent state. This PR fixes this by processing all steps to change the ownership of a VM inside the transaction.

The refactoring is intended to make the code more testable, readable, and easier to maintain.

@stephankruggg
thanks.

it seems the key block is

        Transaction.execute(new TransactionCallbackNoReturn() {
            @Override
            public void doInTransactionWithoutResult(TransactionStatus status) {
                executeStepsToChangeOwnershipOfVm(cmd, caller, oldAccount, newAccount, vm, offering, volumes, template, domainId);
            }
        });

@rohityadavcloud
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@rohityadavcloud a Jenkins job has been kicked to build packages. It will be bundled with SystemVM template(s). I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 5302

Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall code lgtm.
left 2 minor comments.

@stephankruggg
can you ask your team to test this and share the testing result as a comment ?

@github-actions
Copy link

github-actions bot commented Feb 8, 2023

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Feb 9, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 5 Code Smells

75.3% 75.3% Coverage
0.0% 0.0% Duplication

@github-actions
Copy link

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@DaanHoogland DaanHoogland added this to the 4.19.0.0 milestone Jun 22, 2023
@DaanHoogland
Copy link
Contributor

@stephankruggg can you look at the conflicts? (cc @BryanMLima )

@DaanHoogland
Copy link
Contributor

@stephankruggg is anybody looking at this (or should we postpone)?

@stephankruggg
Copy link
Contributor Author

@stephankruggg is anybody looking at this (or should we postpone)?

@GaOrtiga can you take at this one, please?

@stephankruggg stephankruggg removed their assignment Nov 15, 2023
@GaOrtiga
Copy link
Contributor

Hi guys, sorry for the delay.

@stephankruggg is anybody looking at this (or should we postpone)?

@GaOrtiga can you take at this one, please?

Yes.

@stephankruggg is anybody looking at this (or should we postpone)?

I have fixed the conflicts and will run some tests in the next few days before pushing the changes.

@kiranchavala
Copy link
Contributor

@blueorangutan package

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm, though a lot of changes there are unit tests to support those. I don't think there is an integration test supporting this scenario, but as it is a bit niche I think we can risk merging. Do you agree @weizhouapache ?

Copy link

github-actions bot commented Jan 6, 2025

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@winterhazel
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 12014

@winterhazel
Copy link
Member

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 12016

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan
Copy link

[SF] Trillian test result (tid-12067)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 53801 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7061-t12067-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_11_isolated_network_with_dynamic_routed_mode Error 2.27 test_ipv4_routing.py
test_12_vpc_and_tier_with_dynamic_routed_mode Error 3.42 test_ipv4_routing.py
test_12_vpc_and_tier_with_dynamic_routed_mode Error 3.42 test_ipv4_routing.py

Copy link
Member

@bernardodemarco bernardodemarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLGTM

@DaanHoogland
Copy link
Contributor

[SF] Trillian test result (tid-12067) Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8 Total time taken: 53801 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr7061-t12067-kvm-ol8.zip Smoke tests completed. 140 look OK, 1 have errors, 0 did not run Only failed and skipped tests results shown below:
Test Result Time (s) Test File
test_11_isolated_network_with_dynamic_routed_mode Error 2.27 test_ipv4_routing.py
test_12_vpc_and_tier_with_dynamic_routed_mode Error 3.42 test_ipv4_routing.py
test_12_vpc_and_tier_with_dynamic_routed_mode Error 3.42 test_ipv4_routing.py

these are a generic issue on 4.20 and not specific to this PR. Is there any more testing required for this @stephankruggg ?
cc @winterhazel @weizhouapache @bernardodemarco

@winterhazel
Copy link
Member

these are a generic issue on 4.20 and not specific to this PR. Is there any more testing required for this @stephankruggg ? cc @winterhazel @weizhouapache @bernardodemarco

I don't think we require more testing. I already validated that it fixed the original issue, and took a good look at the code comparing it with the current version to ensure that the behavior is the same.

@DaanHoogland DaanHoogland merged commit d9a7b6e into apache:4.20 Jan 15, 2025
26 checks passed
DaanHoogland added a commit that referenced this pull request Jan 15, 2025
* 4.20:
  Rollback of changes with errors during the VM assign (#7061)
  [VMware] Consider CD/DVD drive when calculating next free unit number for volume attachment over IDE controller (#9644)
  consider a valid ipv4 address as a validish ipv4 /32 cidr (#10174)
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jan 16, 2025
Co-authored-by: Stephan Krug <[email protected]>
Co-authored-by: Gabriel <[email protected]>
Co-authored-by: Fabricio Duarte <[email protected]>
@Pearl1594 Pearl1594 moved this to Done in ACS 4.20.1 Mar 17, 2025
@Pearl1594 Pearl1594 mentioned this pull request May 8, 2025
14 tasks
shwstppr added a commit that referenced this pull request May 13, 2025
@Pearl1594 Pearl1594 mentioned this pull request May 13, 2025
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.