gke-cluster v19.0.0 replacing autopilot cluster #1126

heaton-dev · 2023-02-02T13:52:03Z

Somewhere between daily-2022.11.11 and v19.0.0, the gke-cluster module started replacing Autopilot clusters without any config change. It doesn't happen immediately, but re-running Terraform the next day plans a replacement. I've confirmed this twice now with no other changes: run Terraform, wait a day, run it again, and the plan forces a replace.

Below is the Terraform output from running a plan against an existing cluster.

daily-2022.11.11 behaviour:

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:

  # module.cluster.google_container_cluster.cluster has changed
  ~ resource "google_container_cluster" "cluster" {
        id                          = "projects/k8s-heaton/locations/europe-west1/clusters/k8s-heaton-e5fd"
        name                        = "k8s-heaton-e5fd"
        # (27 unchanged attributes hidden)

      - node_config {
          - disk_size_gb      = 100 -> null
          - disk_type         = "pd-standard" -> null
          - guest_accelerator = [] -> null
          - image_type        = "COS_CONTAINERD" -> null
          - labels            = {} -> null
          - local_ssd_count   = 0 -> null
          - logging_variant   = "DEFAULT" -> null
          - machine_type      = "e2-medium" -> null
          - metadata          = {
              - "disable-legacy-endpoints" = "true"
            } -> null
          - oauth_scopes      = [
              - "https://www.googleapis.com/auth/devstorage.read_only",
              - "https://www.googleapis.com/auth/logging.write",
              - "https://www.googleapis.com/auth/monitoring",
              - "https://www.googleapis.com/auth/service.management.readonly",
              - "https://www.googleapis.com/auth/servicecontrol",
              - "https://www.googleapis.com/auth/trace.append",
            ] -> null
          - preemptible       = false -> null
          - resource_labels   = {} -> null
          - service_account   = "default" -> null
          - spot              = false -> null
          - tags              = [] -> null
          - taint             = [] -> null

          - shielded_instance_config {
              - enable_integrity_monitoring = true -> null
              - enable_secure_boot          = true -> null
            }

          - workload_metadata_config {
              - mode = "GKE_METADATA" -> null
            }
        }

      - node_pool {
          - initial_node_count          = 1 -> null
          - instance_group_urls         = [] -> null
          - managed_instance_group_urls = [] -> null
          - max_pods_per_node           = 32 -> null
          - name                        = "default-pool" -> null
          - node_count                  = 0 -> null
          - node_locations              = [
              - "europe-west1-b",
              - "europe-west1-d",
            ] -> null
          - version                     = "1.24.8-gke.2000" -> null

          - autoscaling {
              - location_policy      = "BALANCED" -> null
              - max_node_count       = 1000 -> null
              - min_node_count       = 0 -> null
              - total_max_node_count = 0 -> null
              - total_min_node_count = 0 -> null
            }

          - management {
              - auto_repair  = true -> null
              - auto_upgrade = true -> null
            }

          - network_config {
              - create_pod_range     = false -> null
              - enable_private_nodes = false -> null
              - pod_ipv4_cidr_block  = "10.130.0.0/16" -> null
              - pod_range            = "pods" -> null
            }

          - node_config {
              - disk_size_gb      = 100 -> null
              - disk_type         = "pd-standard" -> null
              - guest_accelerator = [] -> null
              - image_type        = "COS_CONTAINERD" -> null
              - labels            = {} -> null
              - local_ssd_count   = 0 -> null
              - logging_variant   = "DEFAULT" -> null
              - machine_type      = "e2-medium" -> null
              - metadata          = {
                  - "disable-legacy-endpoints" = "true"
                } -> null
              - oauth_scopes      = [
                  - "https://www.googleapis.com/auth/devstorage.read_only",
                  - "https://www.googleapis.com/auth/logging.write",
                  - "https://www.googleapis.com/auth/monitoring",
                  - "https://www.googleapis.com/auth/service.management.readonly",
                  - "https://www.googleapis.com/auth/servicecontrol",
                  - "https://www.googleapis.com/auth/trace.append",
                ] -> null
              - preemptible       = false -> null
              - resource_labels   = {} -> null
              - service_account   = "default" -> null
              - spot              = false -> null
              - tags              = [] -> null
              - taint             = [] -> null

              - shielded_instance_config {
                  - enable_integrity_monitoring = true -> null
                  - enable_secure_boot          = true -> null
                }

              - workload_metadata_config {
                  - mode = "GKE_METADATA" -> null
                }
            }

          - upgrade_settings {
              - max_surge       = 1 -> null
              - max_unavailable = 0 -> null
              - strategy        = "SURGE" -> null
            }
        }

        # (22 unchanged blocks hidden)
    }


Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan
may include actions to undo or respond to these changes.

v19.0.0 behaviour:

Note: Objects have changed outside of Terraform
# Similar output as above...

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.cluster.google_container_cluster.cluster must be replaced
...
...
      + node_config { # forces replacement
          + disk_size_gb      = (known after apply)
          + disk_type         = (known after apply)
          + guest_accelerator = (known after apply) # forces replacement
          + image_type        = (known after apply)
          + labels            = (known after apply)
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = (known after apply)
          + metadata          = (known after apply) # forces replacement
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = (known after apply) # forces replacement
          + preemptible       = false # forces replacement
          + service_account   = (known after apply)
          + spot              = false # forces replacement
          + taint             = (known after apply)

          + shielded_instance_config { # forces replacement
              + enable_integrity_monitoring = (known after apply)
              + enable_secure_boot          = (known after apply)
            }

          + workload_metadata_config { # forces replacement
              + mode = (known after apply)
            }
        }

      + node_pool {
...
...
        }

With v19.0.0, Fabric is trying to add node_config and node_pool blocks to the existing cluster, and that is what forces the replacement.

ludoo · 2023-02-02T13:53:40Z

@apichick @danielmarzini do you have any idea what's happening here?

juliocc · 2023-02-02T14:45:24Z

Just FYI, here's the diff of gke-cluster between daily-2022.11.11 and v19.0.0

diff --git a/modules/gke-cluster/main.tf b/modules/gke-cluster/main.tf
index bc94dd37..f4b86bf6 100644
--- a/modules/gke-cluster/main.tf
+++ b/modules/gke-cluster/main.tf
@@ -48,7 +48,18 @@ resource "google_container_cluster" "cluster" {
   enable_autopilot = var.enable_features.autopilot ? true : null
 
   # the default nodepool is deleted here, use the gke-nodepool module instead
-  # node_config {}
+  # default nodepool configuration based on a shielded_nodes variable
+  node_config {
+    dynamic "shielded_instance_config" {
+      for_each = var.enable_features.shielded_nodes ? [""] : []
+      content {
+        enable_secure_boot          = true
+        enable_integrity_monitoring = true
+      }
+    }
+  }
+
+
 
   addons_config {
     dynamic "dns_cache_config" {
@@ -131,7 +142,7 @@ resource "google_container_cluster" "cluster" {
       dynamic "resource_limits" {
         for_each = var.cluster_autoscaling.mem_limits != null ? [""] : []
         content {
-          resource_type = "cpu"
+          resource_type = "memory"
           minimum       = var.cluster_autoscaling.mem_limits.min
           maximum       = var.cluster_autoscaling.mem_limits.max
         }

ludoo · 2023-02-02T14:47:07Z

I think we might need to skip node_config when the autopilot bool is set.
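For illustration, one way to skip the block is to turn node_config into a dynamic block that is only emitted when Autopilot is off. This is a minimal sketch, not necessarily what was merged for this issue: the variable shape is trimmed down from what the diff above implies, the name and location values are hypothetical placeholders, and `optional()` with defaults assumes Terraform 1.3 or later.

```hcl
# Hypothetical, trimmed-down sketch; not the module's actual code.
variable "enable_features" {
  description = "Feature toggles (only the two flags relevant here)."
  type = object({
    autopilot      = optional(bool, false)
    shielded_nodes = optional(bool, false)
  })
  default = {}
}

resource "google_container_cluster" "cluster" {
  # Placeholder values; networking and other arguments are omitted.
  name     = "example-cluster"
  location = "europe-west1"

  enable_autopilot = var.enable_features.autopilot ? true : null

  # Emit node_config only for non-Autopilot clusters, so that the block
  # never appears in the plan for an Autopilot cluster.
  dynamic "node_config" {
    for_each = var.enable_features.autopilot ? [] : [""]
    content {
      dynamic "shielded_instance_config" {
        for_each = var.enable_features.shielded_nodes ? [""] : []
        content {
          enable_secure_boot          = true
          enable_integrity_monitoring = true
        }
      }
    }
  }
}
```

With `autopilot = true` the `for_each` list is empty, so no node_config block is generated and the provider has nothing new to diff against the existing cluster.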

ludoo · 2023-02-02T15:14:20Z

@joeheaton can you try with the updated module?

heaton-dev · 2023-02-02T15:35:05Z

Looks like that solved it! Thanks @ludoo

ludoo · 2023-02-02T15:38:25Z

Awesome! Thanks for flagging this!

ludoo added the bug and on:modules labels Feb 2, 2023

ludoo mentioned this issue Feb 2, 2023

Skip node config for autopilot #1127

Merged

ludoo closed this as completed in #1127 Feb 2, 2023

ludoo reopened this Feb 2, 2023

ludoo closed this as completed Feb 2, 2023

ludoo mentioned this issue Apr 29, 2023

Module gke-cluster with "forces replacement" due to deletion of default node pool #1275

Closed