Commit 1cea69b

Fix deprecated runbooks to use same content
Fix all deprecated runbooks content to this template:

# <alert-name> [Deprecated]

This alert has been deprecated; it does not indicate a genuine issue. If
triggered, it may be safely ignored and silenced.

Signed-off-by: avlitman <[email protected]>
1 parent 13c662c commit 1cea69b
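
The template in the commit message can be expressed as a small helper. This is only an illustrative sketch, not part of the commit: the function name `render_deprecated_runbook`, the constant `DEPRECATION_NOTICE`, and the 80-column wrap width are assumptions chosen so that the output matches the added lines in the hunks of this diff.

```python
import textwrap

# Notice text taken from the commit message; the constant name is hypothetical.
DEPRECATION_NOTICE = (
    "This alert has been deprecated; it does not indicate a genuine issue. "
    "If triggered, it may be safely ignored and silenced."
)


def render_deprecated_runbook(alert_name: str, width: int = 80) -> str:
    """Return the Markdown body of a deprecated runbook for `alert_name`.

    The 80-column width is an assumption; it happens to reproduce the
    line break after "If" seen in the added lines of this diff.
    """
    heading = f"# {alert_name} [Deprecated]"
    body = textwrap.fill(DEPRECATION_NOTICE, width=width)
    return f"{heading}\n\n{body}\n"


if __name__ == "__main__":
    print(render_deprecated_runbook("KubeMacPoolDown"), end="")
```

Generating the files from one function would keep every deprecated runbook identical apart from the alert name, which is the stated goal of the commit.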

11 files changed: +33 -854 lines changed
+3 -51
@@ -1,52 +1,4 @@
-# KubeMacPoolDown
+# KubeMacPoolDown [Deprecated]
 
-**Note:** Starting from 4.14, this runbook was replaced by [KubemacpoolDown runbook](http://kubevirt.io/monitoring/runbooks/KubemacpoolDown.html).
-
-## Meaning
-
-`KubeMacPool` is down. `KubeMacPool` is responsible for allocating MAC addresses
-and preventing MAC address conflicts.
-
-## Impact
-
-If `KubeMacPool` is down, `VirtualMachine` objects cannot be created.
-
-## Diagnosis
-
-1. Set the `KMP_NAMESPACE` environment variable:
-
-```bash
-$ export KMP_NAMESPACE="$(kubectl get pod -A --no-headers -l \
-control-plane=mac-controller-manager | awk '{print $1}')"
-```
-
-2. Set the `KMP_NAME` environment variable:
-
-```bash
-$ export KMP_NAME="$(kubectl get pod -A --no-headers -l \
-control-plane=mac-controller-manager | awk '{print $2}')"
-```
-
-3. Obtain the `KubeMacPool-manager` pod details:
-
-```bash
-$ kubectl describe pod -n $KMP_NAMESPACE $KMP_NAME
-```
-
-4. Check the `KubeMacPool-manager` logs for error messages:
-
-```bash
-$ kubectl logs -n $KMP_NAMESPACE $KMP_NAME
-```
-
-## Mitigation
-
-<!--DS: If you cannot resolve the issue, log in to the
-link:https://access.redhat.com[Customer Portal] and open a support case,
-attaching the artifacts gathered during the diagnosis procedure.-->
-<!--USstart-->
-If you cannot resolve the issue, see the following resources:
-
-- [OKD Help](https://www.okd.io/help/)
-- [#virtualization Slack channel](https://kubernetes.slack.com/channels/virtualization)
-<!--USend-->
+This alert has been deprecated; it does not indicate a genuine issue. If
+triggered, it may be safely ignored and silenced.
@@ -1,155 +1,4 @@
-# KubeVirtVMStuckInErrorState
+# KubeVirtVMStuckInErrorState [Deprecated]
 
-## Meaning
-
-The `KubeVirtVMStuckInErrorState` alert means that a VirtualMachine has been in
-an error state for more than 5 minutes. VirtualMachines are in error state when
-they are in one of the following status:
-
-1. CrashLoopBackOff
-2. Unknown
-3. Unschedulable
-4. ErrImagePull
-5. ImagePullBackOff
-6. PvcNotFound
-7. DataVolumeError
-
-This alert can suggest an issue in the VirtualMachine configuration, e.g. a
-missing PVC, or a problem in the cluster's underlying infrastructure, e.g.
-network disruptions, node resource shortage, etc.
-
-## Impact
-
-There is no immediate impact. However, if there are multiple machines in an
-error state, it might indicate that something is not working as planned, for
-example, a script may be consistently creating incorrect VirtualMachines
-configurations, or there might be a problem in the cluster's underlying
-infrastructure.
-
-## Diagnosis
-
-Check the VirtualMachine's status and conditions, and VM logs and configuration
-to find out what is causing the error state.
-
-```bash
-$ kubectl describe vmi testvmi-hxghp -n kubevirt-test-default1
-
-Name: testvmi-hxghp
-Namespace: kubevirt-test-default1
-Labels: name=testvmi-hxghp
-Annotations: kubevirt.io/latest-observed-api-version: v1
-kubevirt.io/storage-observed-api-version: v1alpha3
-API Version: kubevirt.io/v1
-Kind: VirtualMachineInstance
-Metadata:
-...
-Spec:
-Domain:
-...
-Resources:
-Requests:
-Cpu: 5000000Gi
-Memory: 5130000240Mi
-...
-Status:
-Active Pods:
-acbc8143-c1da-45e8-b498-3f0dafcd1383:
-Conditions:
-Last Probe Time: 2022-10-03T11:11:07Z
-Last Transition Time: 2022-10-03T11:11:07Z
-Message: Guest VM is not reported as running
-Reason: GuestNotRunning
-Status: False
-Type: Ready
-Last Probe Time: <nil>
-Last Transition Time: 2022-10-03T11:11:07Z
-Message: 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory.
-Reason: Unschedulable
-Status: False
-Type: PodScheduled
-Guest OS Info:
-Phase: Scheduling
-Phase Transition Timestamps:
-Phase: Pending
-Phase Transition Timestamp: 2022-10-03T11:11:07Z
-Phase: Scheduling
-Phase Transition Timestamp: 2022-10-03T11:11:07Z
-Qos Class: Burstable
-Runtime User: 0
-Virtual Machine Revision Name: revision-start-vm-3503e2dc-27c0-46ef-9167-7ae2e7d93e6e-1
-Events:
-Type Reason Age From Message
----- ------ ---- ---- -------
-Normal SuccessfulCreate 27s virtualmachine-controller Created virtual machine pod virt-launcher-testvmi-hxghp-xh9qn
-
-
-```
-
-Also, check the nodes statuses and conditions.
-
-```bash
-$ kubectl get nodes -l node-role.kubernetes.io/worker= -o json | jq '.items | .[].status.allocatable'
-
-{
-"cpu": "5",
-"devices.kubevirt.io/kvm": "1k",
-"devices.kubevirt.io/sev": "0",
-"devices.kubevirt.io/tun": "1k",
-"devices.kubevirt.io/vhost-net": "1k",
-"ephemeral-storage": "33812468066",
-"hugepages-1Gi": "0",
-"hugepages-2Mi": "128Mi",
-"memory": "3783496Ki",
-"pods": "110"
-}
-```
-
-```bash
-$ kubectl get nodes -l node-role.kubernetes.io/worker= -o json | jq '.items | .[].status.conditions'
-
-[
-{
-"lastHeartbeatTime": "2022-10-03T11:13:34Z",
-"lastTransitionTime": "2022-10-03T10:14:20Z",
-"message": "kubelet has sufficient memory available",
-"reason": "KubeletHasSufficientMemory",
-"status": "False",
-"type": "MemoryPressure"
-},
-{
-"lastHeartbeatTime": "2022-10-03T11:13:34Z",
-"lastTransitionTime": "2022-10-03T10:14:20Z",
-"message": "kubelet has no disk pressure",
-"reason": "KubeletHasNoDiskPressure",
-"status": "False",
-"type": "DiskPressure"
-},
-{
-"lastHeartbeatTime": "2022-10-03T11:13:34Z",
-"lastTransitionTime": "2022-10-03T10:14:20Z",
-"message": "kubelet has sufficient PID available",
-"reason": "KubeletHasSufficientPID",
-"status": "False",
-"type": "PIDPressure"
-},
-{
-"lastHeartbeatTime": "2022-10-03T11:13:34Z",
-"lastTransitionTime": "2022-10-03T10:14:30Z",
-"message": "kubelet is posting ready status",
-"reason": "KubeletReady",
-"status": "True",
-"type": "Ready"
-}
-]
-```
-
-## Mitigation
-
-First, ensure that the VirtualMachine configuration is correct and all necessary
-resources exist. For example, if a PVC is missing, it should be created. Also,
-verify that the cluster's infrastructure is healthy and there are enough
-resources to run the VirtualMachine.
-
-This problem can be caused by several reasons. Therefore, we advise you to try
-to identify and fix the root cause. If you cannot resolve this issue, please
-open an issue and attach the artifacts gathered in the Diagnosis section.
+This alert has been deprecated; it does not indicate a genuine issue. If
+triggered, it may be safely ignored and silenced.
@@ -1,89 +1,4 @@
-# KubeVirtVMStuckInMigratingState
+# KubeVirtVMStuckInMigratingState [Deprecated]
 
-## Meaning
-
-The `KubeVirtVMStuckInMigratingState` alert means that a VirtualMachine has been
-in a migrating state for more than 5 minutes. This alert can suggest a problem
-in the cluster's underlying infrastructure, e.g. network disruptions, node
-resource shortage, etc.
-
-## Impact
-
-There is no immediate impact. However, if there are multiple machines in a
-migrating state, it might indicate there might be a problem in the cluster's
-underlying infrastructure.
-
-## Diagnosis
-
-Check the nodes statuses and conditions for potential issues.
-
-```bash
-$ kubectl get nodes -l node-role.kubernetes.io/worker= -o json | jq '.items | .[].status.allocatable'
-
-{
-"cpu": "5",
-"devices.kubevirt.io/kvm": "1k",
-"devices.kubevirt.io/sev": "0",
-"devices.kubevirt.io/tun": "1k",
-"devices.kubevirt.io/vhost-net": "1k",
-"ephemeral-storage": "33812468066",
-"hugepages-1Gi": "0",
-"hugepages-2Mi": "128Mi",
-"memory": "3783496Ki",
-"pods": "110"
-}
-```
-
-```bash
-$ kubectl get nodes -l node-role.kubernetes.io/worker= -o json | jq '.items | .[].status.conditions'
-
-[
-{
-"lastHeartbeatTime": "2022-10-03T11:13:34Z",
-"lastTransitionTime": "2022-10-03T10:14:20Z",
-"message": "kubelet has sufficient memory available",
-"reason": "KubeletHasSufficientMemory",
-"status": "False",
-"type": "MemoryPressure"
-},
-{
-"lastHeartbeatTime": "2022-10-03T11:13:34Z",
-"lastTransitionTime": "2022-10-03T10:14:20Z",
-"message": "kubelet has no disk pressure",
-"reason": "KubeletHasNoDiskPressure",
-"status": "False",
-"type": "DiskPressure"
-},
-{
-"lastHeartbeatTime": "2022-10-03T11:13:34Z",
-"lastTransitionTime": "2022-10-03T10:14:20Z",
-"message": "kubelet has sufficient PID available",
-"reason": "KubeletHasSufficientPID",
-"status": "False",
-"type": "PIDPressure"
-},
-{
-"lastHeartbeatTime": "2022-10-03T11:13:34Z",
-"lastTransitionTime": "2022-10-03T10:14:30Z",
-"message": "kubelet is posting ready status",
-"reason": "KubeletReady",
-"status": "True",
-"type": "Ready"
-}
-]
-```
-
-## Mitigation
-
-Ensure you applied the appropriate migration configuration to the VirtualMachine
-according to the nature of the workload.
-
-Migration configurations can either be set globally via Kubevirt CR's
-`MigrationConfiguration` struct or to a specific scope with
-[Migration Policies](https://kubevirt.io/user-guide/operations/migration_policies/#migration-policies).
-To check whether a VirtualMachine is bound to a migration policy, please refer
-to its `vm.Status.MigrationState.MigrationPolicyName`.
-
-This problem can be caused by several reasons. Therefore, we advise you to try
-to identify and fix the root cause. If you cannot resolve this issue, please
-open an issue and attach the artifacts gathered in the Diagnosis section.
+This alert has been deprecated; it does not indicate a genuine issue. If
+triggered, it may be safely ignored and silenced.
@@ -1,84 +1,4 @@
-# KubeVirtVMStuckInStartingState
+# KubeVirtVMStuckInStartingState [Deprecated]
 
-## Meaning
-
-The `KubeVirtVMStuckInStartingState` alert means that a VirtualMachine has been
-in a starting state for more than 5 minutes. This alert can suggest an issue in
-the VirtualMachine configuration, e.g., a misconfigured priority class or a
-missing network device.
-
-## Impact
-
-There is no immediate impact. However, if there are multiple machines in a
-starting state, it might indicate that something is not working as planned, for
-example, a script may be consistently creating incorrect Virtual Machines
-configurations.
-
-## Diagnosis
-
-Check the VirtualMachine's status and conditions, and VM logs and configuration
-to find out what is causing the starting state.
-
-```bash
-$ kubectl describe vmi testvmi-ldgrw -n kubevirt-test-default1
-
-Name: testvmi-ldgrw
-Namespace: kubevirt-test-default1
-Labels: name=testvmi-ldgrw
-Annotations: kubevirt.io/latest-observed-api-version: v1
-kubevirt.io/storage-observed-api-version: v1alpha3
-API Version: kubevirt.io/v1
-Kind: VirtualMachineInstance
-Metadata:
-...
-Spec:
-...
-Networks:
-Name: default
-Pod:
-Priority Class Name: non-preemtible
-Termination Grace Period Seconds: 0
-Status:
-Conditions:
-Last Probe Time: 2022-10-03T11:08:30Z
-Last Transition Time: 2022-10-03T11:08:30Z
-Message: virt-launcher pod has not yet been scheduled
-Reason: PodNotExists
-Status: False
-Type: Ready
-Last Probe Time: <nil>
-Last Transition Time: 2022-10-03T11:08:30Z
-Message: failed to create virtual machine pod: pods "virt-launcher-testvmi-ldgrw-" is forbidden: no PriorityClass with name non-preemtible was found
-Reason: FailedCreate
-Status: False
-Type: Synchronized
-Guest OS Info:
-Phase: Pending
-Phase Transition Timestamps:
-Phase: Pending
-Phase Transition Timestamp: 2022-10-03T11:08:30Z
-Runtime User: 0
-Virtual Machine Revision Name: revision-start-vm-6f01a94b-3260-4c5a-bbe5-dc98d13e6bea-1
-Events:
-Type Reason Age From Message
----- ------ ---- ---- -------
-Warning FailedCreate 8s (x13 over 28s) virtualmachine-controller Error creating pod: pods "virt-launcher-testvmi-ldgrw-" is forbidden: no PriorityClass with name non-preemtible was found
-```
-
-## Mitigation
-
-First, ensure that the VirtualMachine configuration is correct and all necessary
-resources exist. For example, if a network device is missing, it should be
-created.
-
-If the state of the VirtualMachine is "Pending", it means that it wasn't
-scheduled yet, which in turn rules out scheduling issues as the root cause. If
-this is the case, possible causes include:
-
-1. virt-launcher pod isn't scheduled
-2. Topology hints for VMI aren't updated
-3. DV is not provisioned/ready
-
-This problem can be caused by several reasons. Therefore, we advise you to try
-to identify and fix the root cause. If you cannot resolve this issue, please
-open an issue and attach the artifacts gathered in the Diagnosis section.
+This alert has been deprecated; it does not indicate a genuine issue. If
+triggered, it may be safely ignored and silenced.
