Fix GC hidden attribute in case of SIGTERM signal #87

Open: Wescoeur (Member) wants to merge 1 commit into base: 3.2.12-8.3

@Wescoeur Wescoeur commented Jun 11, 2025

The GC can be interrupted by a SIGTERM signal. If the signal is caught while a volume's hidden flag is being modified, the value cached by the GC can end up out of sync with the volume's actual flag.

For example, in the situation below, the hidden flag of a volume has been changed, but the cached value (self.hidden) in the Python process still holds the old value because a 'util.CommandException' was raised before the assignment. As a result, a VDI that should not be hidden is still hidden after _undoInterruptedCoalesceLeaf runs, because the cached hidden value it relied on was stale.

Code:

    def _setHidden(self, hidden=True):
        vhdutil.setHidden(self.path, hidden)
        # Exception! Next line is never executed.
        self.hidden = hidden
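
The stale-cache problem can be reproduced in isolation. The sketch below is illustrative only: `FakeVhd`, `Vdi`, `set_hidden_buggy` and `set_hidden_resync` are hypothetical stand-ins, not the SM code, and the re-read on failure is one possible mitigation, not necessarily the change made in this PR.

```python
class CommandException(Exception):
    """Hypothetical stand-in for util.CommandException."""


class FakeVhd:
    """Hypothetical stand-in for the VHD file and the vhd-util calls."""

    def __init__(self):
        self.hidden = False
        self.interrupt_next_set = False

    def set_hidden(self, hidden):
        # The flag is written first ...
        self.hidden = hidden
        # ... and the command is then reported as failed, as in the trace.
        if self.interrupt_next_set:
            self.interrupt_next_set = False
            raise CommandException("Signalled 15")

    def get_hidden(self):
        return self.hidden


class Vdi:
    def __init__(self, vhd):
        self.vhd = vhd
        self.hidden = vhd.get_hidden()  # cached copy of the flag

    def set_hidden_buggy(self, hidden=True):
        self.vhd.set_hidden(hidden)
        self.hidden = hidden  # skipped when set_hidden raises

    def set_hidden_resync(self, hidden=True):
        try:
            self.vhd.set_hidden(hidden)
            self.hidden = hidden
        except CommandException:
            # Outcome unknown: refresh the cache from the source of truth,
            # then let the caller handle the failure.
            self.hidden = self.vhd.get_hidden()
            raise


def stale_after_interrupt(setter_name):
    """Return True if the cache disagrees with the VHD after an interrupted set."""
    vhd = FakeVhd()
    vdi = Vdi(vhd)
    vhd.interrupt_next_set = True
    try:
        getattr(vdi, setter_name)(True)
    except CommandException:
        pass
    return vdi.hidden != vhd.get_hidden()
```

With the buggy setter the cache and the flag disagree after the interruption; with the resyncing setter they agree, so any undo logic that consults the cached value sees the real state.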

Trace:

Jun  5 09:15:50 r620-q6 SMGC: [563219] Removed vhd-parent from dce4b0fc(2.000G/170.336M?)
Jun  5 09:15:50 r620-q6 SMGC: [563219] Removed vhd-blocks from dce4b0fc(2.000G/170.336M?)
Jun  5 09:15:50 r620-q6 SM: [563219] ['/usr/bin/vhd-util', 'set', '--debug', '-n', '/var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/OLD_dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd', '-f', 'hidden', '-v', '1']
Jun  5 09:15:50 r620-q6 SM: [563219] GC: recieved SIGTERM
Jun  5 09:15:50 r620-q6 SM: [563219] FAILED in util.pread: (rc -15) stdout: '', stderr: ''
Jun  5 09:15:50 r620-q6 SMGC: [563219] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jun  5 09:15:50 r620-q6 SMGC: [563219]          ***********************
Jun  5 09:15:50 r620-q6 SMGC: [563219]          *  E X C E P T I O N  *
Jun  5 09:15:50 r620-q6 SMGC: [563219]          ***********************
Jun  5 09:15:50 r620-q6 SMGC: [563219] _doCoalesceLeaf: EXCEPTION <class 'util.CommandException'>, Signalled 15
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/cleanup.py", line 2653, in _liveLeafCoalesce
Jun  5 09:15:50 r620-q6 SMGC: [563219]     self._doCoalesceLeaf(vdi)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/cleanup.py", line 2717, in _doCoalesceLeaf
Jun  5 09:15:50 r620-q6 SMGC: [563219]     vdi._setHidden(True)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/cleanup.py", line 1063, in _setHidden
Jun  5 09:15:50 r620-q6 SMGC: [563219]     vhdutil.setHidden(self.path, hidden)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/vhdutil.py", line 235, in setHidden
Jun  5 09:15:50 r620-q6 SMGC: [563219]     ret = ioretry(cmd)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/vhdutil.py", line 94, in ioretry
Jun  5 09:15:50 r620-q6 SMGC: [563219]     errlist=[errno.EIO, errno.EAGAIN])
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/util.py", line 347, in ioretry
Jun  5 09:15:50 r620-q6 SMGC: [563219]     return f()
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/vhdutil.py", line 93, in <lambda>
Jun  5 09:15:50 r620-q6 SMGC: [563219]     return util.ioretry(lambda: util.pread2(cmd, text=text),
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/util.py", line 255, in pread2
Jun  5 09:15:50 r620-q6 SMGC: [563219]     return pread(cmdlist, quiet=quiet, text=text)
Jun  5 09:15:50 r620-q6 SMGC: [563219]   File "/opt/xensource/sm/util.py", line 217, in pread
Jun  5 09:15:50 r620-q6 SMGC: [563219]     raise CommandException(rc, str(cmdlist), stderr.strip())
Jun  5 09:15:50 r620-q6 SMGC: [563219]
Jun  5 09:15:50 r620-q6 SMGC: [563219] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jun  5 09:15:50 r620-q6 SMGC: [563219] *** UNDO LEAF-COALESCE
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming parent back: dce4b0fc-6ad1-4750-857b-45d8d2758503 -> 056b6f93-66ff-460a-9354-157540b584a8
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd -> /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/056b6f93-66ff-460a-9354-157540b584a8.vhd
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming child back to dce4b0fc-6ad1-4750-857b-45d8d2758503
Jun  5 09:15:50 r620-q6 SMGC: [563219] Renaming /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/OLD_dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd -> /var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/dce4b0fc-6ad1-4750-857b-45d8d2758503.vhd
Jun  5 09:15:50 r620-q6 SMGC: [563219] Updating the VDI record
Jun  5 09:15:50 r620-q6 SMGC: [563219] Set vhd-parent = 056b6f93-66ff-460a-9354-157540b584a8 for dce4b0fc(2.000G/8.500K?)
Jun  5 09:15:50 r620-q6 SMGC: [563219] Set vdi_type = vhd for dce4b0fc(2.000G/8.500K?)
Jun  5 09:15:50 r620-q6 SM: [563219] ['/usr/bin/vhd-util', 'set', '--debug', '-n', '/var/run/sr-mount/f816795d-e7a9-43df-170c-23bc329607fc/056b6f93-66ff-460a-9354-157540b584a8.vhd', '-f', 'hidden', '-v', '1']
Jun  5 09:15:50 r620-q6 SM: [563219]   pread SUCCESS
Jun  5 09:15:50 r620-q6 SMGC: [563219] *** leaf-coalesce undo successful
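
The `rc -15` and `Signalled 15` entries in the trace are consistent with how Python's `subprocess` module reports a child terminated by a signal: on POSIX, the return code is the negated signal number. The sketch below demonstrates that convention on its own. It assumes a POSIX system and that `util.pread` is built on `subprocess`, which this PR does not show; `run_and_terminate` is a hypothetical helper.

```python
import signal
import subprocess
import sys


def run_and_terminate():
    """Start a child process, send it SIGTERM, and return its return code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    # The child does not install a SIGTERM handler, so the default
    # action (termination) applies.
    child.send_signal(signal.SIGTERM)
    return child.wait()


rc = run_and_terminate()
print("return code:", rc)
```

A negative return code therefore says only that the command was killed by a signal. It does not say whether the command had already applied its change, which is why the cached flag cannot be trusted after such a failure.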

Therefore, a VDI impacted by this problem remains hidden and can no longer be used correctly without manual intervention. For example, a later vdi_clone fails with a "hidden VDI" error:

Jun  5 09:16:29 r620-q6 SM: [566174] lock: released /var/lock/sm/f816795d-e7a9-43df-170c-23bc329607fc/sr
Jun  5 09:16:29 r620-q6 SM: [566174] ***** generic exception: vdi_clone: EXCEPTION <class 'xs_errors.SROSError'>, Failed to clone VDI [opterr=hidden VDI]
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/SRCommand.py", line 113, in run
Jun  5 09:16:29 r620-q6 SM: [566174]     return self._run_locked(sr)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/SRCommand.py", line 163, in _run_locked
Jun  5 09:16:29 r620-q6 SM: [566174]     rv = self._run(sr, target)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/SRCommand.py", line 270, in _run
Jun  5 09:16:29 r620-q6 SM: [566174]     return target.clone(self.params['sr_uuid'], self.vdi_uuid)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/FileSR.py", line 704, in clone
Jun  5 09:16:29 r620-q6 SM: [566174]     return self._do_snapshot(sr_uuid, vdi_uuid, VDI.SNAPSHOT_DOUBLE)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/FileSR.py", line 754, in _do_snapshot
Jun  5 09:16:29 r620-q6 SM: [566174]     return self._snapshot(snapType, cbtlog, consistency_state)
Jun  5 09:16:29 r620-q6 SM: [566174]   File "/opt/xensource/sm/FileSR.py", line 797, in _snapshot
Jun  5 09:16:29 r620-q6 SM: [566174]     raise xs_errors.XenError('VDIClone', opterr='hidden VDI')
Jun  5 09:16:29 r620-q6 SM: [566174]

Signed-off-by: Ronan Abhamon <[email protected]>
@Wescoeur Wescoeur requested a review from Nambrok June 11, 2025 13:32
@stormi
Member

stormi commented Jun 11, 2025

Is this a candidate for an upstream PR?

@Nambrok

Nambrok commented Jun 11, 2025

Upstream PR: xapi-project#760
