Fargate/Atlantis - Trigger AWS Fargate task from AWS Lambda #19

antonbabenko · 2018-09-07T19:32:14Z

$$$ Who wants to pay for idle resources in the cloud century? I don't. $$$

To-do:

Read this
Allow configuring a Fargate task schedule via AWS CloudWatch to avoid cold starts on workdays (e.g., start one Fargate task 15 minutes before 9:00 on workdays)
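To make the scheduling item concrete: CloudWatch Events rules accept six-field cron expressions of the form cron(minutes hours day-of-month month day-of-week year), evaluated in UTC. Below is a minimal Python sketch that builds the expression for "N minutes before a given time on workdays". The helper name `workday_warmup_cron` is hypothetical and not part of this module.

```python
from datetime import datetime, timedelta


def workday_warmup_cron(start_hour: int, start_minute: int = 0, lead_minutes: int = 15) -> str:
    """Build a CloudWatch Events cron expression that fires `lead_minutes`
    before start_hour:start_minute (UTC), Monday to Friday."""
    # The date is arbitrary; it is only used for clock arithmetic.
    start = datetime(2000, 1, 3, start_hour, start_minute)
    warmup = start - timedelta(minutes=lead_minutes)
    if warmup.date() != start.date():
        raise ValueError("warm-up time crosses midnight; adjust the day-of-week field by hand")
    # Fields: minutes hours day-of-month month day-of-week year
    return f"cron({warmup.minute} {warmup.hour} ? * MON-FRI *)"


print(workday_warmup_cron(9))  # cron(45 8 ? * MON-FRI *)
```

Because the expression is evaluated in UTC, a 9:00 local start would need the hour shifted by the local offset before calling the helper.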

oba11 · 2018-09-07T21:40:12Z

I could be wrong, but I think this breaks the system design: the load balancer would be unhealthy on weekends. A task schedule is good for tasks without a service (and load balancer). Here you need the load balancer to always be healthy and available to receive requests from the GitHub webhook.
Also, I think the ideal approach is to just tear down the module at the end of the workweek and bring it back up at the start of the next one.
Like I mentioned, I could be wrong 😄

antonbabenko · 2018-09-08T08:48:50Z

@oba11 Well, you are absolutely right IF the architecture keeps the LB and stays the same as it is now.

I am thinking about this in a different way:

GitHub PR sends a request to AWS Lambda
AWS Lambda function triggers ECS Task creation and returns a reply to the GitHub PR that "Atlantis is starting now. You should see a response here in a couple of minutes."
ECS Task completes processing the request (e.g., an SQS message) and persists the response in S3 or DynamoDB
S3 or DynamoDB event triggers an AWS Lambda function which posts a proper reply to the original GitHub PR with all the details Atlantis has produced.
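The first two steps could be sketched roughly as the Lambda handler below. This is only an illustration under assumptions, not something Atlantis or this module provides: the cluster name, task definition family, subnet ID, container name and the `WEBHOOK_PAYLOAD` variable are all placeholders, and posting the reply back to GitHub is left out. The only AWS call used is the ECS `run_task` operation from boto3.

```python
import json


def build_run_task_request(webhook_body: str, cluster: str, task_definition: str,
                           subnets: list, container_name: str = "atlantis") -> dict:
    """Translate one webhook delivery into keyword arguments for ECS RunTask."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {"subnets": subnets, "assignPublicIp": "DISABLED"}
        },
        # Pass the payload through an environment variable (hypothetical name).
        # Container overrides are size-limited, so a real payload would more
        # likely travel via SQS or S3, as mentioned in the steps above.
        "overrides": {
            "containerOverrides": [{
                "name": container_name,
                "environment": [{"name": "WEBHOOK_PAYLOAD", "value": webhook_body}],
            }]
        },
    }


def handler(event, context, ecs_client=None):
    """Lambda entry point: start one Fargate task and acknowledge immediately."""
    if ecs_client is None:
        import boto3  # bundled with the Lambda Python runtime
        ecs_client = boto3.client("ecs")
    request = build_run_task_request(
        webhook_body=event.get("body") or "{}",
        cluster="atlantis",            # placeholder cluster name
        task_definition="atlantis",    # placeholder task definition family
        subnets=["subnet-00000000"],   # placeholder subnet ID
    )
    response = ecs_client.run_task(**request)
    task_arns = [task["taskArn"] for task in response.get("tasks", [])]
    return {
        "statusCode": 202,
        "body": json.dumps({"message": "Atlantis is starting now.", "tasks": task_arns}),
    }
```

The `ecs_client` parameter exists only so the handler can be exercised locally with a stub; Lambda itself calls `handler(event, context)`.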

While this solution CAN be implemented with several hacks using the current version of Atlantis, it would be better to make some architectural changes to Atlantis, which have to be discussed in more detail.

I think Atlantis should be divided into several services:

Core - Service which does the heavy lifting (runs terraform commands according to the specified configuration) and outputs the result to STDOUT (for simplicity)
Acceptor - Service which processes webhooks. Right now it supports VCS (GitHub, GitLab, BitBucket), but it could be more generic. The ultimate idea is to allow colleagues to trigger the same webhooks from Slack.
Publisher - Service which posts the result back to where the request came from. Currently, that is the VCS which triggered the invocation (e.g., a GitHub PR). This complements the Acceptor service above.

@lkysow, what do you think about this? Should I move this discussion to https://github.com/runatlantis/atlantis or do you have something like this already in your plans?

oba11 · 2018-09-09T17:04:19Z

No doubt this will be super nice, lets see what @lkysow thinks :)

lkysow · 2018-09-11T16:48:20Z

Hi Everyone. I'm all for cost-saving but I don't think this will work with how Atlantis currently runs. Also, the cost savings aren't that substantial. Given us-east-1 pricing, I think Atlantis costs 45 cents a day:

per vCPU per hour | $0.0506
per GB per hour | $0.0127
we're using CPU 256 and mem 512 -> 0.25 vCPU and 0.5 GB
(0.0506 * 0.25 + 0.0127 * 0.5)*24 hours = $0.456/day
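The same arithmetic as a small Python sketch, in case anyone wants to plug in other task sizes or running hours. The two prices are the us-east-1 figures quoted above and may be out of date; the function name is just for illustration.

```python
VCPU_PER_HOUR = 0.0506  # USD per vCPU per hour, figure quoted above
GB_PER_HOUR = 0.0127    # USD per GB per hour, figure quoted above


def fargate_daily_cost(cpu_units: int, memory_mib: int, hours: float = 24.0) -> float:
    """Fargate cost in USD for a task sized in ECS CPU units and MiB of memory."""
    vcpu = cpu_units / 1024        # 1024 CPU units = 1 vCPU
    memory_gb = memory_mib / 1024
    return (VCPU_PER_HOUR * vcpu + GB_PER_HOUR * memory_gb) * hours


print(round(fargate_daily_cost(256, 512), 3))  # 0.456
```

Passing a smaller `hours` value, e.g. `fargate_daily_cost(256, 512, hours=10)`, gives an idea of what a workday-only schedule would save.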

As to the other questions:

Atlantis keeps state on the filesystem between plan and apply. If the container is torn down, this state is lost, so I don't think it's possible to spin it up on demand right now.
I don't want to pull Atlantis apart into separate services right now. I think having a single binary makes it operationally much simpler to deploy, and it makes it easier to contribute to.

antonbabenko · 2018-09-11T16:55:17Z

Thanks for the feedback @lkysow!

I also won't be working on this feature myself in the near future, so I can't come up with the numerous hacks that could be applied to get this to work.

Let's keep this issue open and come back to it when time allows, or someone wants to contribute :)

Jaff · 2019-05-09T18:09:39Z

Where is the container_definition for the Fargate task?

antonbabenko · 2019-05-10T10:46:26Z

Container definition is specified as part of aws_ecs_task_definition resource:

terraform-aws-atlantis/main.tf

Lines 446 to 456 in 9150d5a

    
    resource "aws_ecs_task_definition" "atlantis" {
      family                   = "${var.name}"
      execution_role_arn       = "${aws_iam_role.ecs_task_execution.arn}"
      task_role_arn            = "${aws_iam_role.ecs_task_execution.arn}"
      network_mode             = "awsvpc"
      requires_compatibilities = ["FARGATE"]
      cpu                      = "${var.ecs_task_cpu}"
      memory                   = "${var.ecs_task_memory}"

      container_definitions = "${local.container_definitions}"
    }

Jaff · 2019-05-16T18:59:33Z

There does not appear to be anything related to running the atlantis server. You don't provide command and entrypoint parameters.

lkysow · 2019-05-16T19:06:23Z

> There does not appear to be anything related to running the atlantis server. You don't provide command and entrypoint parameters.

The Atlantis Docker image will automatically run the server command if not given any args: https://github.com/runatlantis/atlantis/blob/master/Dockerfile#L29

Jaff · 2019-05-16T19:40:49Z

Hi, Luke; thanks for the response.

What about arguments? I have to provide --repo-config-json for my use case. Likewise, I need to set credentials with a profile, since my user handles many accounts.

lkysow · 2019-05-16T20:02:40Z

Can you use the custom_environment_secrets and custom_environment_variables variables? Atlantis supports using environment variables for all its flags (https://www.runatlantis.io/docs/server-configuration.html#environment-variables).

ex. ATLANTIS_REPO_CONFIG_JSON.
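For reference, the naming rule described in the Atlantis docs is: drop the leading dashes, replace the remaining dashes with underscores, uppercase everything, and prefix with ATLANTIS_. A tiny Python sketch of that rule (the helper name is made up for illustration):

```python
def flag_to_env_var(flag: str) -> str:
    """Map an Atlantis server flag such as --repo-config-json to its env var name."""
    name = flag.lstrip("-")
    return "ATLANTIS_" + name.replace("-", "_").upper()


print(flag_to_env_var("--repo-config-json"))  # ATLANTIS_REPO_CONFIG_JSON
```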

Sorry, but I'm not too familiar with this module myself. Also, if you have more questions, could you open a separate issue? This issue is about running Atlantis on demand via Lambda, so we shouldn't pull it too far off topic.

Jaff · 2019-05-16T20:08:41Z

OK, thanks!

smiller171 · 2019-06-03T18:51:31Z

> Given us-east-1 pricing, I think Atlantis costs 45 cents a day:

@lkysow I don't think these are good defaults though. I had an apply die on me, and I had to manually recover some stuff, because it ran out of resources and was killed by ECS.

lkysow · 2019-06-03T20:31:16Z

:( that sucks. Curious, why did ECS kill it? Maybe we can bump up the default resources so others don't have that issue.

Yeah if you want to avoid that you must give it persistent disk. Either through kube or through an actual VM.

smiller171 · 2019-06-03T20:40:10Z

@lkysow I don't think persistent disk would have helped me here. My state is in S3, it just left a cfn stack in a bad state. Wasn't a huge pain, at least this time, and I bumped up the resources.

The problem was that it swamped the CPU enough that it took too long to respond to the health check. One possible solution is to just make the health check more forgiving. The tradeoff of course is taking longer to recover when there's a real problem.
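To put rough numbers on that tradeoff: a target is only marked unhealthy after a run of consecutive failed checks, so detection time is approximately the check interval multiplied by the unhealthy threshold. A quick Python sketch follows; the two settings compared are made-up examples, not this module's defaults, and the estimate ignores the per-check timeout.

```python
def seconds_until_unhealthy(interval_s: int, unhealthy_threshold: int) -> int:
    """Approximate time before a failing target is marked unhealthy."""
    return interval_s * unhealthy_threshold


strict = seconds_until_unhealthy(interval_s=10, unhealthy_threshold=2)
forgiving = seconds_until_unhealthy(interval_s=30, unhealthy_threshold=5)
print(strict, forgiving)  # 20 150
```

The forgiving setting lets a CPU-starved apply miss a few checks without being killed, at the price of a real outage going unnoticed for a couple of minutes longer.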

nitrocode · 2020-04-13T17:40:38Z

@smiller171 What did you bump the resources to?

Current module defaults

container_memory_reservation = 128

ecs_task_cpu    = 256
ecs_task_memory = 512

cloudposse/terraform-aws-ecs-atlantis uses the same defaults

container_cpu    = 256
container_memory = 512

smiller171 · 2020-05-28T17:32:01Z

@nitrocode I ended up using:

  ecs_task_cpu                 = 1024
  ecs_task_memory              = 2048
  container_memory_reservation = 1536

This has worked well for me so far

nitrocode · 2020-05-28T17:35:40Z

Oh wow, so you quadrupled the task CPU and memory. Thanks. If I see similar issues, I'll do the same.

smiller171 · 2020-05-28T17:48:03Z

@nitrocode Yeah, but this almost certainly depends on how big your stack is and how many projects are running in parallel

github-actions · 2022-01-11T00:45:44Z

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

bryantbiggs · 2022-01-12T13:11:11Z

this coupled with #206 could work - holding from going stale

vitaliCoasy · 2022-01-22T21:56:37Z

I don't know if this would make sense for the current Atlantis architecture, but since AWS Lambda can now run container images, I would rather consider rebuilding the Atlantis container with a Lambda handler API added to it. Then we could deploy the Atlantis container to Lambda and run it per invocation, without needing to run it in ECS.

antonbabenko · 2022-01-23T15:35:17Z

@vitaliCoasy Terraform runs can take more than 15 minutes (the current maximum duration of a Lambda function), so I don't think it makes much sense to migrate from ECS Fargate to a pure Lambda function.

smiller171 · 2022-01-28T18:37:00Z

Yeah, I think it would make more sense to trigger a Lambda which starts an ECS job

github-actions · 2022-02-28T00:09:28Z

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

smiller171 · 2022-02-28T15:54:15Z

bump :)

github-actions · 2022-04-01T00:10:15Z

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions · 2022-04-12T00:09:35Z

This issue was automatically closed because of stale in 10 days

github-actions · 2022-11-08T02:32:32Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot added the stale label Jan 11, 2022

bryantbiggs added enhancement and removed stale labels Jan 12, 2022

MarkIannucci mentioned this issue Jan 22, 2022

Leverage EFS to persist atlantis locks between deployments? #206

Closed

github-actions bot added the stale label Feb 28, 2022

github-actions bot removed the stale label Mar 1, 2022

MarkIannucci mentioned this issue Mar 19, 2022

fix: Only create mount point for EFS when using EFS #261

Merged

github-actions bot added the stale label Apr 1, 2022

github-actions bot closed this as completed Apr 12, 2022

github-actions bot locked as resolved and limited conversation to collaborators Nov 8, 2022
