[aws] pulling ocp-release images includes @sha256@sha256 on latest installer 0.9.1 causing installation break #1066

Closed
@jatanmalde

Version

[root@localhost ~]# openshift-install version
openshift-install v0.9.1
[root@localhost ~]# oc version
oc v4.0.0-0.79.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Platform (aws|libvirt|openstack):

aws

What happened?

Hi all, I am trying to install a cluster with one master and three worker nodes for OCP 4.0 on AWS.
I can see the bootstrap node and master nodes in the AWS console.
The openshift-install.log file shows the installation failing here:

time="2019-01-14T23:10:06+05:30" level=debug msg="Still waiting for the Kubernetes API: Get https://test-api.aws.cee.redhat.com:6443/version?timeout=32s: dial tcp 18.224.189.175:6443: connect: connection refused"
time="2019-01-14T23:10:57+05:30" level=info msg="API v1.11.0+0583818 up"
time="2019-01-14T23:10:57+05:30" level=info msg="Waiting up to 30m0s for the bootstrap-complete event..."
time="2019-01-14T23:10:57+05:30" level=debug msg="added kube-controller-manager.1579c7d32898b0b7: ip-10-0-1-144_8699fbf7-1823-11e9-83c1-02978bf4bc4e became leader"
time="2019-01-14T23:10:57+05:30" level=debug msg="added kube-scheduler.1579c7d3575f93ce: ip-10-0-1-144_86bb137d-1823-11e9-a99e-02978bf4bc4e became leader"
time="2019-01-15T09:58:28+05:30" level=warning msg="RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 92"
time="2019-01-15T09:58:29+05:30" level=warning msg="Failed to connect events watcher: Get https://test-api.aws.cee.redhat.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=92&watch=true: dial tcp 18.224.189.175:6443: connect: connection refused"
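
To pick the warnings out of a long openshift-install.log, the lines can be filtered by level. The following is a minimal sketch, assuming every line has the `time="..." level=... msg="..."` shape shown above; the helper names `parse_line` and `lines_at_level` are hypothetical and not part of the installer.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   time="<timestamp>" level=<level> msg="<message>"
LINE = re.compile(r'^time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)"$')


def parse_line(line):
    """Return a dict with time, level and msg, or None if the line does not match."""
    match = LINE.match(line.strip())
    return match.groupdict() if match else None


def lines_at_level(lines, levels=("warning", "error", "fatal")):
    """Keep only the parsed entries whose level is in `levels`."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] in levels]


if __name__ == "__main__":
    # The log file name matches the one mentioned in this report.
    with open("openshift-install.log") as handle:
        for entry in lines_at_level(handle):
            print(entry["time"], entry["level"], entry["msg"])
```

Applied to the excerpt above, this would keep only the two `level=warning` lines from 2019-01-15.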

On the bootstrap node, bootkube.service is in a failed state and reports:

[core@ip-10-0-1-144 ~]$ journalctl -b -f -u bootkube.service
-- Logs begin at Mon 2019-01-14 17:35:11 UTC. --
Jan 15 09:53:32 ip-10-0-1-144 systemd[1]: bootkube.service: main process exited, code=exited, status=125/n/a
Jan 15 09:53:32 ip-10-0-1-144 systemd[1]: Unit bootkube.service entered failed state.
Jan 15 09:53:32 ip-10-0-1-144 systemd[1]: bootkube.service failed.
Jan 15 09:53:37 ip-10-0-1-144 systemd[1]: bootkube.service holdoff time over, scheduling restart.
Jan 15 09:53:37 ip-10-0-1-144 systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Jan 15 09:53:37 ip-10-0-1-144 systemd[1]: Started Bootstrap a Kubernetes cluster.
Jan 15 09:53:37 ip-10-0-1-144 bootkube.sh[3055]: unable to pull quay.io/openshift-release-dev/ocp-release@sha256@sha256:e237499d3b118e25890550daad8b17274af93baf855914a9c6f8f07ebc095dea: error getting default registries to try: invalid reference format

The ocp-release image is being pulled, but the @sha256 keyword is repeated in the image reference.
In /usr/local/bin/bootkube.sh I cannot find any reference to the sha256 value, only the image tag, which works fine when pulled manually.
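
To illustrate why the pull fails with "invalid reference format", here is a minimal sketch that checks a pullspec for a single `@<algorithm>:<hex>` suffix and collapses a repeated `@sha256` marker. The regular expression is a simplified assumption, not the full image reference grammar, and the function names are hypothetical. It does not explain where the installer introduces the duplicate.

```python
import re

# Simplified, assumed shape of a digest-pinned pullspec:
#   <repository>@<algorithm>:<hex digest>
DIGEST_REF = re.compile(r"^(?P<repo>[^@\s]+)@(?P<algo>[a-z0-9]+):(?P<hex>[0-9a-f]{32,})$")


def is_valid_digest_ref(pullspec):
    """Return True if the pullspec has exactly one '@<algo>:<hex>' suffix."""
    return DIGEST_REF.match(pullspec) is not None


def collapse_repeated_algo(pullspec):
    """Collapse a repeated '@sha256@sha256:' marker into a single '@sha256:'."""
    return re.sub(r"(@sha256)+:", "@sha256:", pullspec)


# Pullspec copied from the bootkube.service log in this report.
broken = (
    "quay.io/openshift-release-dev/ocp-release@sha256@sha256:"
    "e237499d3b118e25890550daad8b17274af93baf855914a9c6f8f07ebc095dea"
)

print(is_valid_digest_ref(broken))                          # False
print(is_valid_digest_ref(collapse_repeated_algo(broken)))  # True
```

The second `@` is what breaks parsing: after `@sha256` a `:` and the hex digest are expected, not another `@sha256`.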

How can I resume the installation? Do I have to destroy the cluster and rebuild it?

Let me know if you need more logs.

I can see this issue was already reported in #2086, but it is happening with the latest installer as well.

What you expected to happen?

  • The single master cluster should be up and running on aws.

How to reproduce it (as minimally and precisely as possible)?

$ your-commands-here

References

  • enter text here.
