gke-cluster v19.0.0 replacing autopilot cluster #1126

Closed
heaton-dev opened this issue Feb 2, 2023 · 6 comments · Fixed by #1127
Labels
bug Something isn't working on:modules

Comments

@heaton-dev
Contributor

Between daily-2022.11.11 and v19.0.0, gke-cluster started replacing Autopilot clusters without any config change. This doesn't happen immediately, but re-running Terraform the next day forces a replacement. I've confirmed this behaviour twice now: with no other changes, I ran Terraform, waited a day, ran it again, and the plan forced a replace.

I've included Terraform output below running a plan against an existing cluster.

daily-2022.11.11 behaviour:

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:

  # module.cluster.google_container_cluster.cluster has changed
  ~ resource "google_container_cluster" "cluster" {
        id                          = "projects/k8s-heaton/locations/europe-west1/clusters/k8s-heaton-e5fd"
        name                        = "k8s-heaton-e5fd"
        # (27 unchanged attributes hidden)

      - node_config {
          - disk_size_gb      = 100 -> null
          - disk_type         = "pd-standard" -> null
          - guest_accelerator = [] -> null
          - image_type        = "COS_CONTAINERD" -> null
          - labels            = {} -> null
          - local_ssd_count   = 0 -> null
          - logging_variant   = "DEFAULT" -> null
          - machine_type      = "e2-medium" -> null
          - metadata          = {
              - "disable-legacy-endpoints" = "true"
            } -> null
          - oauth_scopes      = [
              - "https://www.googleapis.com/auth/devstorage.read_only",
              - "https://www.googleapis.com/auth/logging.write",
              - "https://www.googleapis.com/auth/monitoring",
              - "https://www.googleapis.com/auth/service.management.readonly",
              - "https://www.googleapis.com/auth/servicecontrol",
              - "https://www.googleapis.com/auth/trace.append",
            ] -> null
          - preemptible       = false -> null
          - resource_labels   = {} -> null
          - service_account   = "default" -> null
          - spot              = false -> null
          - tags              = [] -> null
          - taint             = [] -> null

          - shielded_instance_config {
              - enable_integrity_monitoring = true -> null
              - enable_secure_boot          = true -> null
            }

          - workload_metadata_config {
              - mode = "GKE_METADATA" -> null
            }
        }

      - node_pool {
          - initial_node_count          = 1 -> null
          - instance_group_urls         = [] -> null
          - managed_instance_group_urls = [] -> null
          - max_pods_per_node           = 32 -> null
          - name                        = "default-pool" -> null
          - node_count                  = 0 -> null
          - node_locations              = [
              - "europe-west1-b",
              - "europe-west1-d",
            ] -> null
          - version                     = "1.24.8-gke.2000" -> null

          - autoscaling {
              - location_policy      = "BALANCED" -> null
              - max_node_count       = 1000 -> null
              - min_node_count       = 0 -> null
              - total_max_node_count = 0 -> null
              - total_min_node_count = 0 -> null
            }

          - management {
              - auto_repair  = true -> null
              - auto_upgrade = true -> null
            }

          - network_config {
              - create_pod_range     = false -> null
              - enable_private_nodes = false -> null
              - pod_ipv4_cidr_block  = "10.130.0.0/16" -> null
              - pod_range            = "pods" -> null
            }

          - node_config {
              - disk_size_gb      = 100 -> null
              - disk_type         = "pd-standard" -> null
              - guest_accelerator = [] -> null
              - image_type        = "COS_CONTAINERD" -> null
              - labels            = {} -> null
              - local_ssd_count   = 0 -> null
              - logging_variant   = "DEFAULT" -> null
              - machine_type      = "e2-medium" -> null
              - metadata          = {
                  - "disable-legacy-endpoints" = "true"
                } -> null
              - oauth_scopes      = [
                  - "https://www.googleapis.com/auth/devstorage.read_only",
                  - "https://www.googleapis.com/auth/logging.write",
                  - "https://www.googleapis.com/auth/monitoring",
                  - "https://www.googleapis.com/auth/service.management.readonly",
                  - "https://www.googleapis.com/auth/servicecontrol",
                  - "https://www.googleapis.com/auth/trace.append",
                ] -> null
              - preemptible       = false -> null
              - resource_labels   = {} -> null
              - service_account   = "default" -> null
              - spot              = false -> null
              - tags              = [] -> null
              - taint             = [] -> null

              - shielded_instance_config {
                  - enable_integrity_monitoring = true -> null
                  - enable_secure_boot          = true -> null
                }

              - workload_metadata_config {
                  - mode = "GKE_METADATA" -> null
                }
            }

          - upgrade_settings {
              - max_surge       = 1 -> null
              - max_unavailable = 0 -> null
              - strategy        = "SURGE" -> null
            }
        }

        # (22 unchanged blocks hidden)
    }


Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan
may include actions to undo or respond to these changes.

v19.0.0 behaviour:

Note: Objects have changed outside of Terraform
# Similar output as above...

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.cluster.google_container_cluster.cluster must be replaced
...
...
      + node_config { # forces replacement
          + disk_size_gb      = (known after apply)
          + disk_type         = (known after apply)
          + guest_accelerator = (known after apply) # forces replacement
          + image_type        = (known after apply)
          + labels            = (known after apply)
          + local_ssd_count   = (known after apply)
          + logging_variant   = "DEFAULT"
          + machine_type      = (known after apply)
          + metadata          = (known after apply) # forces replacement
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = (known after apply) # forces replacement
          + preemptible       = false # forces replacement
          + service_account   = (known after apply)
          + spot              = false # forces replacement
          + taint             = (known after apply)

          + shielded_instance_config { # forces replacement
              + enable_integrity_monitoring = (known after apply)
              + enable_secure_boot          = (known after apply)
            }

          + workload_metadata_config { # forces replacement
              + mode = (known after apply)
            }
        }

      + node_pool {
...
...
        }

Fabric is trying to add node_config and node_pool attributes to the existing Autopilot cluster, which forces the replacement.

@ludoo
Collaborator

ludoo commented Feb 2, 2023

@apichick @danielmarzini do you have any idea what's happening here?

@ludoo ludoo added bug Something isn't working on:modules labels Feb 2, 2023
@juliocc
Collaborator

juliocc commented Feb 2, 2023

Just FYI, here's the diff of gke-cluster between daily-2022.11.11 and v19.0.0:

diff --git a/modules/gke-cluster/main.tf b/modules/gke-cluster/main.tf
index bc94dd37..f4b86bf6 100644
--- a/modules/gke-cluster/main.tf
+++ b/modules/gke-cluster/main.tf
@@ -48,7 +48,18 @@ resource "google_container_cluster" "cluster" {
   enable_autopilot = var.enable_features.autopilot ? true : null
 
   # the default nodepool is deleted here, use the gke-nodepool module instead
-  # node_config {}
+  # default nodepool configuration based on a shielded_nodes variable
+  node_config {
+    dynamic "shielded_instance_config" {
+      for_each = var.enable_features.shielded_nodes ? [""] : []
+      content {
+        enable_secure_boot          = true
+        enable_integrity_monitoring = true
+      }
+    }
+  }
+
+
 
   addons_config {
     dynamic "dns_cache_config" {
@@ -131,7 +142,7 @@ resource "google_container_cluster" "cluster" {
       dynamic "resource_limits" {
         for_each = var.cluster_autoscaling.mem_limits != null ? [""] : []
         content {
-          resource_type = "cpu"
+          resource_type = "memory"
           minimum       = var.cluster_autoscaling.mem_limits.min
           maximum       = var.cluster_autoscaling.mem_limits.max
         }

@ludoo
Collaborator

ludoo commented Feb 2, 2023

I think we might need to skip the node_config block when the autopilot flag is set.
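For illustration, here is a minimal sketch of what skipping the block could look like, assuming the same `var.enable_features.autopilot` and `var.enable_features.shielded_nodes` variables shown in the diff above. This is only one possible shape for the change and is not necessarily what the linked fix does; the elided arguments are left as a comment.

```hcl
resource "google_container_cluster" "cluster" {
  # ... other arguments unchanged ...
  enable_autopilot = var.enable_features.autopilot ? true : null

  # Assumption: emit node_config only for non-Autopilot clusters, so that
  # Terraform does not plan to add a block that Autopilot manages itself.
  dynamic "node_config" {
    for_each = var.enable_features.autopilot ? [] : [""]
    content {
      dynamic "shielded_instance_config" {
        for_each = var.enable_features.shielded_nodes ? [""] : []
        content {
          enable_secure_boot          = true
          enable_integrity_monitoring = true
        }
      }
    }
  }
}
```

With the outer `dynamic` block producing no `node_config` when Autopilot is enabled, a plan against an existing Autopilot cluster should no longer show the `+ node_config { # forces replacement` entry seen in the v19.0.0 output.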

@ludoo
Collaborator

ludoo commented Feb 2, 2023

@joeheaton can you try with the updated module?

@ludoo ludoo reopened this Feb 2, 2023
@heaton-dev
Contributor Author

Looks like that solved it! Thanks @ludoo

@ludoo
Collaborator

ludoo commented Feb 2, 2023

Awesome! Thanks for flagging this!
