Commit 51c4a37

ci-operator/templates/openshift: Explicitly set AWS availability zones
This is very similar to the earlier e8921c3 (ci-operator/templates/openshift: Get e2e-aws out of us-east-1b, 2019-03-22, openshift#3204). This time, however, I'm not changing the zones where the machines will run. By default, the installer will provision zone infrastructure in all available zones, but since openshift/installer@644f705286 (data/aws/vpc: Only create subnet infrastucture for zones with Machine(Set)s, 2019-03-27, openshift/installer#1481) users who explicitly set zones in their install-config will no longer have unused zones provisioned with subnets, NAT gateways, EIPs, and other related infrastructure. This infrastructure reduction has two benefits in CI:

1. We don't have to pay for resources that we won't use, and we will have more room under our EIP limits (although we haven't bumped into that one in a while, because we're VPC-constrained).

2. We should see reduced rates of clusters failing install because of AWS rate limiting, with results like [1]:

     aws_route.to_nat_gw.3: Error creating route: timeout while waiting for state to become 'success' (timeout: 2m0s)

   The reduction is because:

   i. We'll be making fewer requests for these resources, because we won't need to create (and subsequently tear down) as many of them. This will reduce our overall AWS-API load somewhat, although the reduction will be incremental because we have so many other resources which are not associated with zones.

   ii. Throttling on these per-zone resources is what tends to break Terraform [2]. So even if the rate of timeouts per API request remains unchanged, a given cluster will only have half as many (three vs. the old six) per-zone chances of hitting one of the timeouts. This should give us something close to a 50% reduction in clusters hitting throttling timeouts.

The drawback is that we're diverging further from the stock "I just called 'openshift-install create cluster' without providing an install-config.yaml" experience.
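The "half as many chances" argument can be checked with a short calculation. The sketch below assumes that each per-zone provisioning attempt hits a throttling timeout independently and with the same probability; that independence, the function names, and the example probabilities are assumptions for illustration, not measurements from CI.

```python
def cluster_timeout_probability(per_zone_p: float, zones: int) -> float:
    """Probability that at least one of `zones` independent per-zone
    attempts hits a timeout, given per-attempt probability `per_zone_p`."""
    return 1.0 - (1.0 - per_zone_p) ** zones


def reduction(per_zone_p: float, old_zones: int = 6, new_zones: int = 3) -> float:
    """Fractional drop in the cluster-level timeout probability when
    moving from `old_zones` to `new_zones`."""
    old = cluster_timeout_probability(per_zone_p, old_zones)
    new = cluster_timeout_probability(per_zone_p, new_zones)
    return (old - new) / old


if __name__ == "__main__":
    # Hypothetical per-zone timeout probabilities, chosen only for illustration.
    for p in (0.001, 0.01, 0.05):
        print(f"p={p}: reduction {reduction(p):.1%}")
```

Under this model the reduction works out to (1-p)^3 / (1 + (1-p)^3), which is always slightly below 50% and approaches 50% as p shrinks. That matches the "something close to a 50%" estimate as long as per-zone timeouts are rare.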
[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_console-operator/187/pull-ci-openshift-console-operator-master-e2e-aws-operator/575/artifacts/e2e-aws-operator/installer/.openshift_install.log

[2]: With a cache of build-log.txt from the past ~48 hours:

       $ grep -hr 'timeout while waiting for state' ~/.cache/openshift-deck-build-logs >timeouts
       $ wc -l timeouts
       362 timeouts
       $ grep aws_route_table_association timeouts | wc -l
       214
       $ grep 'aws_route\.to_nat_gw' timeouts | wc -l
       102

     So (102+214)/362 is 87% of our timeouts, with the remainder being almost entirely related to the internet gateway (which is not per-zone).
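The tally in [2] can also be reproduced outside the shell. The sketch below classifies timeout lines by Terraform resource name and computes the per-zone share. The function names and the "other" bucket are illustrative, and the final check reuses only the counts reported in the footnote.

```python
from collections import Counter
from typing import Iterable

# Resource names counted in the footnote; the commit treats both as per-zone.
PER_ZONE_RESOURCES = ("aws_route_table_association", "aws_route.to_nat_gw")


def tally_timeouts(lines: Iterable[str]) -> Counter:
    """Count timeout lines per per-zone resource; anything else is 'other'."""
    counts: Counter = Counter()
    for line in lines:
        if "timeout while waiting for state" not in line:
            continue  # not a timeout line
        for resource in PER_ZONE_RESOURCES:
            if resource in line:
                counts[resource] += 1
                break
        else:
            counts["other"] += 1
    return counts


def per_zone_share(counts: Counter) -> float:
    """Fraction of all counted timeouts attributed to per-zone resources."""
    total = sum(counts.values())
    per_zone = sum(counts[r] for r in PER_ZONE_RESOURCES)
    return per_zone / total if total else 0.0


if __name__ == "__main__":
    # Counts reported in the footnote: 214 + 102 of 362 timeouts.
    reported = Counter({
        "aws_route_table_association": 214,
        "aws_route.to_nat_gw": 102,
        "other": 362 - 214 - 102,
    })
    print(f"{per_zone_share(reported):.0%}")  # prints 87%
```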
1 parent 3e1b090 commit 51c4a37

4 files changed: +69 -2 lines changed

ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml

Lines changed: 18 additions & 0 deletions
@@ -237,6 +237,24 @@ objects:
 clusterID: ${CLUSTER_ID}
 metadata:
   name: ${CLUSTER_NAME}
+controlPlane:
+  name: master
+  replicas: 3
+  platform:
+    aws:
+      zones:
+      - us-east-1a
+      - us-east-1b
+      - us-east-1c
+compute:
+- name: worker
+  replicas: 3
+  platform:
+    aws:
+      zones:
+      - us-east-1a
+      - us-east-1b
+      - us-east-1c
 networking:
   clusterNetwork:
   - cidr: 10.128.0.0/14

ci-operator/templates/openshift/installer/cluster-launch-installer-src.yaml

Lines changed: 18 additions & 0 deletions
@@ -231,6 +231,24 @@ objects:
 clusterID: ${CLUSTER_ID}
 metadata:
   name: ${CLUSTER_NAME}
+controlPlane:
+  name: master
+  replicas: 3
+  platform:
+    aws:
+      zones:
+      - us-east-1a
+      - us-east-1b
+      - us-east-1c
+compute:
+- name: worker
+  replicas: 3
+  platform:
+    aws:
+      zones:
+      - us-east-1a
+      - us-east-1b
+      - us-east-1c
 networking:
   clusterNetwork:
   - cidr: 10.128.0.0/14

ci-operator/templates/openshift/openshift-ansible/cluster-launch-e2e-40.yaml

Lines changed: 15 additions & 2 deletions
@@ -257,11 +257,24 @@ objects:
 apiVersion: v1beta4
 baseDomain: test.ose
 clusterID: ${CLUSTER_ID}
+controlPlane:
+  name: master
+  replicas: ${MASTERS}
+  platform:
+    aws:
+      zones:
+      - us-east-1a
+      - us-east-1b
+      - us-east-1c
 compute:
 - name: worker
   replicas: ${WORKERS}
-controlPlane:
-- replicas: ${MASTERS}
+  platform:
+    aws:
+      zones:
+      - us-east-1a
+      - us-east-1b
+      - us-east-1c
 metadata:
   name: ${CLUSTER_NAME}
 networking:

ci-operator/templates/openshift/openshift-ansible/cluster-scaleup-e2e-40.yaml

Lines changed: 18 additions & 0 deletions
@@ -322,6 +322,24 @@ objects:
 clusterID: ${CLUSTER_ID}
 metadata:
   name: ${CLUSTER_NAME}
+controlPlane:
+  name: master
+  replicas: 3
+  platform:
+    aws:
+      zones:
+      - us-east-1a
+      - us-east-1b
+      - us-east-1c
+compute:
+- name: worker
+  replicas: 3
+  platform:
+    aws:
+      zones:
+      - us-east-1a
+      - us-east-1b
+      - us-east-1c
 networking:
   clusterNetwork:
   - cidr: 10.128.0.0/14
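
One way to see what the explicit zones imply for provisioning is to collect the zones named across all machine pools. This is a minimal sketch, assuming the install-config has already been parsed into a Python dict (for example with a YAML library) and that per-zone infrastructure follows the set of zones with machines, as the commit message describes. The function name is hypothetical, and pools without explicit zones are not handled.

```python
from typing import Any, Dict, List, Set


def configured_zones(install_config: Dict[str, Any]) -> Set[str]:
    """Return the union of AWS zones named by controlPlane and compute pools."""
    pools: List[Dict[str, Any]] = []
    control_plane = install_config.get("controlPlane")
    if isinstance(control_plane, dict):
        pools.append(control_plane)
    pools.extend(install_config.get("compute") or [])

    zones: Set[str] = set()
    for pool in pools:
        aws = (pool.get("platform") or {}).get("aws") or {}
        zones.update(aws.get("zones") or [])
    return zones


if __name__ == "__main__":
    # Mirrors the pools added by this commit, with literal replica counts.
    three_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
    config = {
        "controlPlane": {
            "name": "master",
            "replicas": 3,
            "platform": {"aws": {"zones": list(three_zones)}},
        },
        "compute": [
            {
                "name": "worker",
                "replicas": 3,
                "platform": {"aws": {"zones": list(three_zones)}},
            }
        ],
    }
    print(sorted(configured_zones(config)))
```

With the pools from this commit, the result is the three zones us-east-1a, us-east-1b, and us-east-1c, rather than every zone available in the region.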
