ci-operator/templates/openshift/installer/cluster-launch-installer-*: Drop eastus Azure region #5182

Merged: 1 commit merged into openshift:master from wking:drop-azure-eastus on Sep 25, 2019

Conversation

@wking (Member) commented on Sep 25, 2019

Azure seems to have a shortage of VMs there today, with failures like:

level=error msg="Error: Code=\"ZonalAllocationFailed\" Message=\"Allocation failed. We do not have sufficient capacity for the requested VM size in this zone. Read more about improving likelihood of allocation success at http://aka.ms/allocation-guidance\""

You can see that failure was for eastus with:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/244/artifacts/e2e-azure-serial/installer/terraform.tfstate | jq -r '.resources[] | select(.name == "ignition_bootstrap").instances[].attributes.content | fromjson | .storage.files[] | select(.path == "/opt/openshift/manifests/cluster-config.yaml").contents.source' | sed 's/.*,//' | base64 -d | grep region
        region: eastus
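
The same pipeline, broken out stage by stage for readability (the URL and the jq/sed expressions are verbatim from the one-liner above; only the comments are added):

  $ curl -s https://storage.googleapis.com/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/244/artifacts/e2e-azure-serial/installer/terraform.tfstate |
      # extract the bootstrap Ignition config from the Terraform state and pull
      # out the data URL for /opt/openshift/manifests/cluster-config.yaml
      jq -r '.resources[] | select(.name == "ignition_bootstrap").instances[].attributes.content | fromjson | .storage.files[] | select(.path == "/opt/openshift/manifests/cluster-config.yaml").contents.source' |
      # drop the data-URL prefix (everything through the last comma), leaving the base64 payload
      sed 's/.*,//' |
      # decode the manifest and show its platform region
      base64 -d | grep region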

While we could drop to an even weighting among the three remaining regions, in this commit I'm double-weighting centralus because we have extra capacity there. Before 4d08a9d (#5081) we were serving all 20 of our Azure leases out of centralus.
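
The templates choose the Azure region at random for each job, so repeating a region in the candidate pool is one simple way to weight it more heavily. A minimal sketch of that idea, where the variable name and the exact region list are illustrative assumptions rather than the template's actual contents:

  # Illustrative sketch only: region names and weights here are assumptions.
  regions=(centralus centralus eastus2 westus)   # centralus listed twice => double weight
  AZURE_REGION="${regions[$((RANDOM % ${#regions[@]}))]}"
  echo "Azure region: ${AZURE_REGION}"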

@openshift-ci-robot added the size/XS (denotes a PR that changes 0-9 lines, ignoring generated files) and approved (indicates a PR has been approved by an approver from all required OWNERS files) labels on Sep 25, 2019
ci-operator/templates/openshift/installer/cluster-launch-installer-*: Drop eastus Azure region

Azure seems to have a shortage of VMs there today, with failures like
[1]:

  level=error msg="Error: Code=\"ZonalAllocationFailed\" Message=\"Allocation failed. We do not have sufficient capacity for the requested VM size in this zone. Read more about improving likelihood of allocation success at http://aka.ms/allocation-guidance\""

You can see that failure was for eastus with:

  $ curl -s https://storage.googleapis.com/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/244/artifacts/e2e-azure-serial/installer/terraform.tfstate | jq -r '.resources[] | select(.name == "ignition_bootstrap").instances[].attributes.content | fromjson | .storage.files[] | select(.path == "/opt/openshift/manifests/cluster-config.yaml").contents.source' | sed 's/.*,//' | base64 -d | grep region
          region: eastus

Jerry also found an eastus2 failure like this.

While we could drop to an even weighting among the three remaining
regions, in this commit I'm triple-weighting centralus because we have
extra capacity there.  Before 4d08a9d
(ci-operator/templates/openshift/installer/cluster-launch-installer-*:
Random Azure regions, 2019-09-18, openshift#5081) we were serving all 20 of our
Azure leases out of centralus.

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/244
@yuqi-zhang (Contributor) left a comment

/lgtm

@openshift-ci-robot added the lgtm (indicates that a PR is ready to be merged) label on Sep 25, 2019
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot merged commit 895facc into openshift:master on Sep 25, 2019
@openshift-ci-robot (Contributor)

@wking: Updated the following 4 configmaps:

  • prow-job-cluster-launch-installer-e2e configmap in namespace ci using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
  • prow-job-cluster-launch-installer-e2e configmap in namespace ci-stg using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
  • prow-job-cluster-launch-installer-src configmap in namespace ci using the following files:
    • key cluster-launch-installer-src.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-src.yaml
  • prow-job-cluster-launch-installer-src configmap in namespace ci-stg using the following files:
    • key cluster-launch-installer-src.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-src.yaml

In response to this:

Azure seems to have a shortage of VMs there today, with failures like:

level=error msg="Error: Code=\"ZonalAllocationFailed\" Message=\"Allocation failed. We do not have sufficient capacity for the requested VM size in this zone. Read more about improving likelihood of allocation success at http://aka.ms/allocation-guidance\""

You can see that failure was for eastus with:

$ curl -s https://storage.googleapis.com/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-azure-serial-4.2/244/artifacts/e2e-azure-serial/installer/terraform.tfstate | jq -r '.resources[] | select(.name == "ignition_bootstrap").instances[].attributes.content | fromjson | .storage.files[] | select(.path == "/opt/openshift/manifests/cluster-config.yaml").contents.source' | sed 's/.*,//' | base64 -d | grep region
       region: eastus

While we could drop to an even weighting among the three remaining regions, in this commit I'm double-weighting centralus because we have extra capacity there. Before 4d08a9d (#5081) we were serving all 20 of our Azure leases out of centralus.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking deleted the drop-azure-eastus branch on September 25, 2019, 18:29
@openshift-ci-robot (Contributor)

@wking: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/rehearse/codeready-toolchain/host-operator/master/e2e 00fa1c5 link /test pj-rehearse
ci/rehearse/openshift/cloud-credential-operator/master/e2e-gcp 00fa1c5 link /test pj-rehearse
ci/prow/pj-rehearse 00fa1c5 link /test pj-rehearse

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

wking added a commit to wking/openshift-release that referenced this pull request Nov 21, 2019
Revert "ci-operator/templates/openshift/installer/cluster-launch-installer-*: Drop eastus Azure region"

This reverts commit 00fa1c5, openshift#5182.

Now we're having a problem with centralus:

  $ curl -s 'https://search.svc.ci.openshift.org/search?name=azure&maxAge=24h&context=5&search=We+do+not+have+sufficient+capacity+for+the+requested+VM+size+in+this+zone' | jq -r '. | to_entries[].value | to_entries[].value[].context[]' | sed -n 's/Azure region: //p' | sort | uniq -c | sort -n
       18 centralus

So de-emphasize it by reverting the earlier commit.
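
The same capacity search, broken out per stage (the query is verbatim from above; only the comments are added):

  $ curl -s 'https://search.svc.ci.openshift.org/search?name=azure&maxAge=24h&context=5&search=We+do+not+have+sufficient+capacity+for+the+requested+VM+size+in+this+zone' |
      # flatten the nested search results down to the captured context lines
      jq -r '. | to_entries[].value | to_entries[].value[].context[]' |
      # print only the lines mentioning "Azure region: ", stripping that prefix,
      # then tally failures per region, least to most frequent
      sed -n 's/Azure region: //p' | sort | uniq -c | sort -n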