You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
ci-operator/step-registry/ipi/conf/aws: Drop us-east-1 zone overrides
This was originally part of avoiding broken zones, see e8921c3
(ci-operator/templates/openshift: Get e2e-aws out of us-east-1b,
2019-03-22, openshift#3204) and b717933
(ci-operator/templates/openshift/installer/cluster-launch-installer-*:
Random AWS regions for IPI, 2020-01-23, openshift#6833). But the installer has
had broken-zone avoidence since way back in
openshift/installer@71aef620b6 (pkg/asset/machines/aws: Only return
available zones, 2019-02-07, openshift/installer#1210) I dunno how
reliably AWS sets 'state: impaired' and similar; it didn't seem to
protect us from e8921c3. But we're getting ready to pivot to using
multiple AWS accounts, which creates two issues with hard-coding
region names in the step:
1. References by name are not stable between accounts. From the AWS
docs [1]:
To ensure that resources are distributed across the Availability
Zones for a Region, we independently map Availability Zones to
names for each AWS account. For example, the Availability Zone
us-east-1a for your AWS account might not be the same location as
us-east-1a for another AWS account.
So "aah, us-east-1a is broken, let's use b and c instead" might
apply to one account but not the other. And the installer does not
currently accept zone IDs.
2. References by name may not exist in other accounts. From the AWS
docs [1]:
As Availability Zones grow over time, our ability to expand them
can become constrained. If this happens, we might restrict you
from launching an instance in a constrained Availability Zone
unless you already have an instance in that Availability
Zone. Eventually, we might also remove the constrained
Availability Zone from the list of Availability Zones for new
accounts. Therefore, your account might have a different number
of available Availability Zones in a Region than another account.
And it turns out that for some reason they sometimes don't name
sequentially, e.g. our new account lacks us-west-1a:
$ AWS_PROFILE=ci aws --region us-west-1 ec2 describe-availability-zones | jq -r '.AvailabilityZones[] | .ZoneName + " " + .ZoneId + " " + .State' | sort
us-west-1a usw1-az3 available
us-west-1b usw1-az1 available
$ AWS_PROFILE=ci-2 aws --region us-west-1 ec2 describe-availability-zones | jq -r '.AvailabilityZones[] | .ZoneName + " " + .ZoneId + " " + .State' | sort
us-west-1b usw1-az3 available
us-west-1c usw1-az1 available
I have no idea why they decided to do that, but we have to work
with the world as it is ;).
Removing the us-east-1 overrides helps reduce our exposure, although
we are still vulnerable to (2) with the a/b default line. We'll do
something about that in follow-up work.
Leaving the "which zones?" decision up to the installer would cause it
to try to set up each available zone, and that causes more API
contention and resource consumption than we want. Background on that
in 51c4a37 (ci-operator/templates/openshift: Explicitly set AWS
availability zones, 2019-03-28, openshift#3285) and d87fffb
(ci-operator/templates/openshift: Drop us-east-1c, 2019-04-26, openshift#3615),
as well as the rejected/rotted-out [2].
[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
[2]: openshift/installer#1487
0 commit comments