Description
Version
[root@localhost ~]# openshift-install version
openshift-install v0.9.1
[root@localhost ~]# oc version
oc v4.0.0-0.79.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
Platform (aws|libvirt|openstack):
aws
What happened?
Hey guys, I am trying to install OCP 4.0 on AWS with one master and three worker nodes.
I can see the bootstrap node and master nodes in the AWS console.
In openshift-install.log, I can see it failing here:
time="2019-01-14T23:10:06+05:30" level=debug msg="Still waiting for the Kubernetes API: Get https://test-api.aws.cee.redhat.com:6443/version?timeout=32s: dial tcp 18.224.189.175:6443: connect: connection refused"
time="2019-01-14T23:10:57+05:30" level=info msg="API v1.11.0+0583818 up"
time="2019-01-14T23:10:57+05:30" level=info msg="Waiting up to 30m0s for the bootstrap-complete event..."
time="2019-01-14T23:10:57+05:30" level=debug msg="added kube-controller-manager.1579c7d32898b0b7: ip-10-0-1-144_8699fbf7-1823-11e9-83c1-02978bf4bc4e became leader"
time="2019-01-14T23:10:57+05:30" level=debug msg="added kube-scheduler.1579c7d3575f93ce: ip-10-0-1-144_86bb137d-1823-11e9-a99e-02978bf4bc4e became leader"
time="2019-01-15T09:58:28+05:30" level=warning msg="RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 92"
time="2019-01-15T09:58:29+05:30" level=warning msg="Failed to connect events watcher: Get https://test-api.aws.cee.redhat.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=92&watch=true: dial tcp 18.224.189.175:6443: connect: connection refused"
On the bootstrap node, bootkube.service is in a failed state, reporting:
[core@ip-10-0-1-144 ~]$ journalctl -b -f -u bootkube.service
-- Logs begin at Mon 2019-01-14 17:35:11 UTC. --
Jan 15 09:53:32 ip-10-0-1-144 systemd[1]: bootkube.service: main process exited, code=exited, status=125/n/a
Jan 15 09:53:32 ip-10-0-1-144 systemd[1]: Unit bootkube.service entered failed state.
Jan 15 09:53:32 ip-10-0-1-144 systemd[1]: bootkube.service failed.
Jan 15 09:53:37 ip-10-0-1-144 systemd[1]: bootkube.service holdoff time over, scheduling restart.
Jan 15 09:53:37 ip-10-0-1-144 systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Jan 15 09:53:37 ip-10-0-1-144 systemd[1]: Started Bootstrap a Kubernetes cluster.
Jan 15 09:53:37 ip-10-0-1-144 bootkube.sh[3055]: unable to pull quay.io/openshift-release-dev/ocp-release@sha256@sha256:e237499d3b118e25890550daad8b17274af93baf855914a9c6f8f07ebc095dea: error getting default registries to try: invalid reference format
The ocp-release image is being pulled, but the @sha256 keyword is repeated in the image reference.
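For illustration, a digest-pinned image reference normally has the form `repository@sha256:<64 hex characters>`. The reference in the failing log carries `@sha256` twice, which is most likely why it is rejected with "invalid reference format". The minimal shell sketch below only demonstrates that string problem; the helpers `fix_ref` and `is_digest_ref` are hypothetical, and it assumes the duplicated prefix is the only defect in the reference.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: collapse a duplicated "@sha256@sha256:" into a single
# "@sha256:" separator. Assumes this duplication is the only defect.
fix_ref() {
  printf '%s\n' "$1" | sed 's/@sha256@sha256:/@sha256:/'
}

# Hypothetical check: succeeds only for "<repository>@sha256:<64 hex chars>".
is_digest_ref() {
  printf '%s\n' "$1" | grep -Eq '^[^@]+@sha256:[0-9a-f]{64}$'
}

# Reference copied from the bootkube.service log above.
bad='quay.io/openshift-release-dev/ocp-release@sha256@sha256:e237499d3b118e25890550daad8b17274af93baf855914a9c6f8f07ebc095dea'
good=$(fix_ref "$bad")

# Report which form passes the format check.
for ref in "$bad" "$good"; do
  if is_digest_ref "$ref"; then
    echo "valid:   $ref"
  else
    echo "invalid: $ref"
  fi
done
```

The logged reference is reported as invalid and the collapsed one as valid. This does not show where in the installer or bootkube.sh the extra `@sha256` is introduced.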
Checking /usr/local/bin/bootkube.sh, I cannot find any reference to the sha256 digest, only the image tag, and that image pulls fine when I pull it manually.
How can I resume the installation? Do I have to destroy the cluster and rebuild it?
Let me know if you need more logs.
This issue has already been reported in #2086, but it still happens with the latest installer.
What you expected to happen?
- The single-master cluster should be up and running on AWS.