ci-operator/templates/openshift/installer/cluster-launch-installer-e2e: Gather node console logs on AWS
To help debug things like [1]:
Dec 2 16:31:41.298: INFO: cluster upgrade is Failing: Cluster operator kube-apiserver is reporting a failure: NodeControllerDegraded: The master node(s) "ip-10-0-136-232.ec2.internal" not ready
...
Kubelet stopped posting node status.
where a node goes down but does not come back up far enough to
rejoin the cluster as a node.
Eventually, we'll address this with machine-health checks, killing the
non-responsive machine and automatically replacing it with a new one.
That's currently waiting on an etcd operator that can handle
reconnecting control-plane machines automatically. But in the short
term, and possibly still in the long term, it's nice to collect what
we can from the broken machine to understand why it didn't come back
up. This code isn't specific to broken machines, but collecting
console logs from all nodes should cover us in the broken-machine case
as well.
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1778904
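As a rough illustration of the approach, the gathering step might look something like the sketch below. It assumes the AWS CLI is available with credentials, that cluster instances carry the installer's kubernetes.io/cluster/<infra-id>=owned tag, and that CLUSTER_ID and ARTIFACT_DIR are set by the surrounding template; those variable names are assumptions here, not the template's actual identifiers.

```shell
#!/bin/sh
# Sketch: fetch EC2 serial console output for every instance in the
# cluster, so logs survive even when a node never rejoins the cluster.
# CLUSTER_ID and ARTIFACT_DIR are hypothetical names for illustration.
mkdir -p "${ARTIFACT_DIR}/nodes"
for instance in $(aws ec2 describe-instances \
    --filters "Name=tag:kubernetes.io/cluster/${CLUSTER_ID},Values=owned" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text); do
  # get-console-output returns the most recent console snapshot,
  # which is available even for instances that have hung or crashed.
  aws ec2 get-console-output \
      --instance-id "${instance}" \
      --query Output \
      --output text > "${ARTIFACT_DIR}/nodes/${instance}.console.txt"
done
```

Because the console output is captured out-of-band by EC2 itself, this works even when the kubelet is down and the node is unreachable over SSH, which is exactly the broken-machine case described above.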