Commit 246f4a1

data/aws: 20-minute create timeouts for routes and security groups
Using the resource timeouts described in [1,2,3,4,5]. Route and security group create timeouts were both added in v1.11, so we have them in our v2.2 AWS provider. This should mitigate some of the issues we've been having in our busy CI account, where out of ~1150 jobs in the last 24 hours, we've had the following failures [6]:

  $ curl -s 'http://localhost:8000/search?name=-e2e-aws&.&q=level%3Derror.*timeout+while+waiting+for+state' | jq -r '. | to_entries[].value[] | to_entries[].value[]' | sed 's/(i-[^)]*/(i-.../;s/(igw-[^)]*/(igw-.../;s/\(master\|nat_gw\|private_routing\|route_net\)\.[0-9]/\1.../' | sort | uniq -c | sort -n
        2 level=error msg="\t* aws_instance.master...: Error waiting for instance (i-...) to become ready: timeout while waiting for state
       10 level=error msg="\t* aws_security_group.bootstrap: timeout while waiting for state
       38 level=error msg="\t* aws_route.igw_route: Error creating route: timeout while waiting for state
       58 level=error msg="\t* aws_internet_gateway.igw: error attaching EC2 Internet Gateway (igw-...): timeout while waiting for state
       76 level=error msg="\t* aws_route_table_association.private_routing...: timeout while waiting for state
       90 level=error msg="\t* aws_route_table_association.route_net...: timeout while waiting for state
      164 level=error msg="\t* aws_route.to_nat_gw...: Error creating route: timeout while waiting for state

The 20-minute timeout is much higher than the two-minute route default [2], so that should help a lot with our leading error (aws_route.to_nat_gw). The security group default is 10 minutes [4], so this is less of a change there, and we only see that error rarely anyway. I went with 20 minutes (instead of a higher number) because a single resource (or several parallel resources) finishing just under that limit will still keep the full Terraform step under the 30 minutes we've chosen as the timeout for our other steps (waiting for the Kubernetes API, bootstrap completion, and install completion). But obviously we can tune this further later if necessary.
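The failure tally in the commit message depends on the `sed` step collapsing per-job details (instance IDs, Internet Gateway IDs, and `count` indices) so that `uniq -c` can group otherwise-identical failures. A minimal sketch of just that normalize-and-count stage follows; the `tally_errors` function name and the sample log lines are hypothetical, while the `sed` expression is copied verbatim from the command above. It assumes GNU sed, since `\|` alternation inside a basic regular expression is a GNU extension.

```shell
#!/bin/sh
# Hypothetical helper: read error lines on stdin, collapse IDs and count
# indices with the same sed script used in the commit message, then count
# duplicates and sort by frequency (least frequent first).
# Assumes GNU sed: `\|` in a basic regex is a GNU extension.
tally_errors() {
  sed 's/(i-[^)]*/(i-.../;s/(igw-[^)]*/(igw-.../;s/\(master\|nat_gw\|private_routing\|route_net\)\.[0-9]/\1.../' |
    sort | uniq -c | sort -n
}

# Invented sample input: two NAT-gateway route failures that differ only in
# their count index, plus one IGW attachment failure with a made-up ID.
# printf '%s' leaves the literal backslash-t from the log message untouched.
printf '%s\n' \
  'level=error msg="\t* aws_route.to_nat_gw.0: Error creating route: timeout while waiting for state' \
  'level=error msg="\t* aws_route.to_nat_gw.2: Error creating route: timeout while waiting for state' \
  'level=error msg="\t* aws_internet_gateway.igw: error attaching EC2 Internet Gateway (igw-0abc123): timeout while waiting for state' |
  tally_errors
```

Run as-is, this should print the IGW failure (rewritten to `(igw-...)`) with a count of 1, followed by the `aws_route.to_nat_gw...` failure with a count of 2. Like the original expression, the index substitution only matches a single digit, so an index such as `.10` would be rewritten to `...0` rather than fully collapsed.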
[1]: https://www.terraform.io/docs/configuration/resources.html#operation-timeouts
[2]: https://www.terraform.io/docs/providers/aws/r/route.html#timeouts
[3]: hashicorp/terraform-provider-aws#3639 (v1.11.0)
[4]: https://www.terraform.io/docs/providers/aws/r/security_group.html#timeouts
[5]: hashicorp/terraform-provider-aws#3599 (v1.11.0)
[6]: https://github.com/wking/openshift-release/tree/debug-scripts/d3
1 parent c87b389 commit 246f4a1

5 files changed (+20, -0 lines)

data/data/aws/bootstrap/main.tf

Lines changed: 4 additions & 0 deletions
@@ -140,6 +140,10 @@ resource "aws_lb_target_group_attachment" "bootstrap" {
 resource "aws_security_group" "bootstrap" {
   vpc_id = "${var.vpc_id}"
 
+  timeouts {
+    create = "20m"
+  }
+
   tags = "${merge(map(
     "Name", "${var.cluster_id}-bootstrap-sg",
   ), var.tags)}"

data/data/aws/vpc/sg-master.tf

Lines changed: 4 additions & 0 deletions
@@ -1,6 +1,10 @@
 resource "aws_security_group" "master" {
   vpc_id = "${data.aws_vpc.cluster_vpc.id}"
 
+  timeouts {
+    create = "20m"
+  }
+
   tags = "${merge(map(
     "Name", "${var.cluster_id}-master-sg",
   ), var.tags)}"

data/data/aws/vpc/sg-worker.tf

Lines changed: 4 additions & 0 deletions
@@ -1,6 +1,10 @@
 resource "aws_security_group" "worker" {
   vpc_id = "${data.aws_vpc.cluster_vpc.id}"
 
+  timeouts {
+    create = "20m"
+  }
+
   tags = "${merge(map(
     "Name", "${var.cluster_id}-worker-sg",
   ), var.tags)}"

data/data/aws/vpc/vpc-private.tf

Lines changed: 4 additions & 0 deletions
@@ -13,6 +13,10 @@ resource "aws_route" "to_nat_gw" {
   destination_cidr_block = "0.0.0.0/0"
   nat_gateway_id = "${element(aws_nat_gateway.nat_gw.*.id, count.index)}"
   depends_on = ["aws_route_table.private_routes"]
+
+  timeouts {
+    create = "20m"
+  }
 }
 
 resource "aws_subnet" "private_subnet" {

data/data/aws/vpc/vpc-public.tf

Lines changed: 4 additions & 0 deletions
@@ -23,6 +23,10 @@ resource "aws_route" "igw_route" {
   destination_cidr_block = "0.0.0.0/0"
   route_table_id = "${aws_route_table.default.id}"
   gateway_id = "${aws_internet_gateway.igw.id}"
+
+  timeouts {
+    create = "20m"
+  }
 }
 
 resource "aws_subnet" "public_subnet" {
