
Lambda Associated EC2 Subnet and Security Group Deletion Issues and Improvements #10329

Closed
@bflad

Description

Beginning in September 2019, improved VPC networking for AWS Lambda began rolling out in certain AWS Commercial regions. During this Lambda service deployment, HashiCorp, AWS, and the community noticed that deleting Elastic Compute Cloud (EC2) Subnets and Security Groups previously associated with Lambda Functions now failed with DependencyViolation errors after those Terraform resources' default deletion timeouts (20 minutes and 10 minutes respectively). The underlying AWS infrastructure changes had two unexpected consequences: the description of the Lambda-created Elastic Network Interfaces (ENIs), which Terraform used to find and manually delete them from those EC2 Subnets and Security Groups, changed slightly, and those ENIs now take longer to delete. These errors during a Terraform apply operation may look like the following:

$ terraform destroy
...
Error: errors during apply: 2 problems:
        
        - Error deleting subnet: timeout while waiting for state to become 'destroyed' (last state: 'pending', timeout: 20m0s)
        - Error deleting security group: DependencyViolation: resource sg-xxxxxxxxxxxx has a dependent object
          status code: 400, request id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx

Please note: not all DependencyViolation errors like the above are associated with this Lambda service change. The DependencyViolation error occurs whenever any infrastructure is still associated with an EC2 Subnet or Security Group during deletion. This can happen when multiple, separate Terraform configurations manage infrastructure in the same subnet/security group, or when infrastructure was associated with the subnet/security group manually outside of Terraform.

Working on top of a community contribution (thanks, @ewbankkit and @obourdon!) and in close communication with the AWS Lambda service team to determine the highest percentile deletion times, Terraform AWS Provider version 2.31.0 and later includes automatic handling of the updated ENI description and handles the increased deletion times for the new Lambda infrastructure. See the Terraform documentation on provider versioning for information about upgrading Terraform Providers.
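
For example, a configuration can require a provider release that includes this handling. This is a minimal sketch, assuming Terraform 0.11 or 0.12, where a version argument inside the provider block is accepted (later Terraform versions prefer a required_providers block); the region value is a placeholder:

```hcl
# Sketch: require a Terraform AWS Provider release that includes the
# updated Lambda ENI deletion handling (2.31.0 or later).
provider "aws" {
  version = ">= 2.31.0"
  region  = "us-east-1" # placeholder region; use your own
}
```

After changing the constraint, running terraform init -upgrade lets Terraform download a matching provider release.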

For Terraform environments that cannot yet be updated to Terraform AWS Provider version 2.31.0 or later, this issue can be mitigated in two ways: set the customizable deletion timeouts available for these two Terraform resources to at least 45 minutes, and ensure that any Lambda execution IAM Role permissions granting ec2:DeleteNetworkInterface are explicitly ordered to be removed after the associated subnets/security groups are deleted. This ordering keeps the permissions in place long enough for the Lambda service to delete the ENIs it created in your VPC.

Example configuration for Terraform AWS Provider versions 2.30.0 and earlier:

resource "aws_iam_role_policy_attachment" "example" {
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
  role       = "${aws_iam_role.example.id}"
}

resource "aws_subnet" "example" {
  # ... other configuration ...

  timeouts {
    delete = "45m"
  }

  depends_on = ["aws_iam_role_policy_attachment.example"]
}

resource "aws_security_group" "example" {
  # ... other configuration ...

  timeouts {
    delete = "45m"
  }

  depends_on = ["aws_iam_role_policy_attachment.example"]
}
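
The same ordering applies when ec2:DeleteNetworkInterface comes from an inline role policy rather than the AWSLambdaVPCAccessExecutionRole managed policy. The following is a minimal sketch used in place of the policy attachment above, assuming a hypothetical policy name lambda-vpc-eni and the same aws_iam_role.example; the three EC2 actions shown are the ones commonly granted for Lambda VPC access, not an exhaustive list:

```hcl
# Hypothetical inline policy granting the ENI permissions Lambda uses in a VPC.
resource "aws_iam_role_policy" "example" {
  name = "lambda-vpc-eni"
  role = "${aws_iam_role.example.id}"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DeleteNetworkInterface"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_subnet" "example" {
  # ... other configuration ...

  timeouts {
    delete = "45m"
  }

  # Because the subnet depends on the policy, Terraform destroys the subnet
  # first, so Lambda keeps ec2:DeleteNetworkInterface until its ENIs are gone.
  depends_on = ["aws_iam_role_policy.example"]
}
```

An aws_security_group would take the same timeouts block and depends_on reference.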

In those earlier versions of the Terraform AWS Provider, if the IAM Role permissions are removed before Lambda is able to delete its Hyperplane ENIs, the subnet and security group deletions will keep failing with a DependencyViolation error until those ENIs are deleted manually. Those ENIs can be found by searching for the ENI description AWS Lambda VPC ENI*.

Example AWS CLI commands to find Lambda ENIs (see the AWS CLI documentation for additional filtering options):

# EC2 Subnet example
$ aws ec2 describe-network-interfaces --filters 'Name=description,Values="AWS Lambda VPC ENI*"' 'Name=subnet-id,Values=subnet-12345678'
# EC2 Security Group example
$ aws ec2 describe-network-interfaces --filters 'Name=description,Values="AWS Lambda VPC ENI*"' 'Name=group-id,Values=sg-12345678'

Example AWS CLI command to delete an ENI:

$ aws ec2 delete-network-interface --network-interface-id eni-12345678
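
As a read-only alternative to the CLI lookups above, the Terraform AWS Provider's aws_network_interfaces data source accepts the same EC2 filters. This is a minimal sketch, assuming the placeholder subnet ID subnet-12345678 and a hypothetical output name; depending on the provider version, the data source may report an error rather than an empty list when no ENIs match:

```hcl
# Look up Lambda-managed ENIs in one subnet by description (read-only).
data "aws_network_interfaces" "lambda" {
  filter {
    name   = "description"
    values = ["AWS Lambda VPC ENI*"]
  }

  filter {
    name   = "subnet-id"
    values = ["subnet-12345678"] # placeholder subnet ID
  }
}

# Hypothetical output listing the matching ENI IDs, which can then be
# passed to aws ec2 delete-network-interface.
output "lambda_eni_ids" {
  value = "${data.aws_network_interfaces.lambda.ids}"
}
```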

While the deletion issues are now handled (either automatically in version 2.31.0 or later, or manually with the configuration above), the increased deletion time for this infrastructure is less than ideal. HashiCorp and AWS continue to work closely together on reducing this time, which will likely be addressed by additional changes to the AWS Lambda service without requiring any changes to Terraform configurations. This issue will track updates relating to those service improvements.

Metadata

Labels: enhancement, service/ec2, service/lambda, upstream