Fargate/Atlantis - Trigger AWS Fargate task from AWS Lambda #19

Closed
2 tasks
antonbabenko opened this issue Sep 7, 2018 · 29 comments

Comments

@antonbabenko
Member

$$$ Who wants to pay for idle resources in the cloud century? I don't. $$$

To-do:

  • Read this
  • Allow configuring a Fargate task schedule via AWS CloudWatch to avoid cold starts on workdays (e.g., start one Fargate task 15 minutes before 9:00 on workdays)
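
A minimal sketch of the schedule from the second to-do item, assuming the schedule is expressed as a CloudWatch Events (EventBridge) cron expression. The helper name `workday_warmup_cron` is hypothetical and not part of this module. CloudWatch cron expressions are evaluated in UTC, so the start hour would need to be converted from local time first.

```python
def workday_warmup_cron(start_hour, start_minute=0, lead_minutes=15):
    """Build a CloudWatch Events cron expression that fires `lead_minutes`
    before the given start time, Monday through Friday.

    Hypothetical helper: times are assumed to be in UTC already.
    """
    total = start_hour * 60 + start_minute - lead_minutes
    if total < 0:
        # Crossing midnight would also shift the weekday range; not handled here.
        raise ValueError("lead time crosses midnight")
    hour, minute = divmod(total, 60)
    # Field order: minutes hours day-of-month month day-of-week year.
    # Day-of-month must be '?' when day-of-week is specified.
    return f"cron({minute} {hour} ? * MON-FRI *)"


print(workday_warmup_cron(9))
```

The resulting string is what would be passed as the schedule expression of a CloudWatch Events rule, for example `schedule_expression` on an `aws_cloudwatch_event_rule` resource. Wiring the rule to an ECS task target is left out of the sketch.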
@oba11

oba11 commented Sep 7, 2018

I could be wrong, but I think this breaks the system design, because the load balancer would be unhealthy on weekends. The task scheduling feature is good for tasks without a service (and load balancer). Here you need the load balancer to always be healthy and available to consume requests from the GitHub webhook.
Also, I think the ideal cold start is just tearing down the module at the end of the workweek and starting it up again at the beginning of the next one.
Like I mentioned, I could be wrong 😄

@antonbabenko
Member Author

@oba11 Well, you are absolutely right IF the architecture keeps the LB and stays the same as it is now.

I am thinking about this in a different way:

  1. GitHub PR sends a request to AWS Lambda
  2. AWS Lambda function triggers ECS Task creation and returns a reply to the GitHub PR: "Atlantis is starting now. You should see a response here in a couple of minutes."
  3. ECS Task completes processing the request (e.g., an SQS message) and persists the response in S3 or DynamoDB
  4. S3 or DynamoDB event triggers an AWS Lambda function which posts a proper reply to the original GitHub PR with all the details Atlantis has produced.
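
Steps 1 and 2 of the flow above could be sketched roughly as follows. This is an illustration under several assumptions, not how Atlantis or this module works today: the cluster, task definition, container, subnet and security group names are placeholders, the event is assumed to be an API Gateway proxy event carrying a GitHub `issue_comment` webhook body, and the environment variable names passed to the task are invented. The container would also need a custom entrypoint that reads them, which is one of the "hacks" such a design implies.

```python
import json

# Placeholder values: none of these come from the module or from Atlantis.
CLUSTER = "atlantis"
TASK_DEFINITION = "atlantis"
CONTAINER_NAME = "atlantis"
SUBNETS = ["subnet-00000000"]
SECURITY_GROUPS = ["sg-00000000"]

ACK_MESSAGE = (
    "Atlantis is starting now. "
    "You should see a response here in a couple of minutes."
)


def build_run_task_request(payload):
    """Translate a GitHub issue_comment webhook payload into ecs.run_task kwargs."""
    # Hypothetical variable names; the task's entrypoint would have to read them.
    environment = [
        {"name": "REPO_FULL_NAME", "value": payload["repository"]["full_name"]},
        {"name": "PULL_NUMBER", "value": str(payload["issue"]["number"])},
        {"name": "COMMENT_BODY", "value": payload["comment"]["body"]},
    ]
    return {
        "cluster": CLUSTER,
        "taskDefinition": TASK_DEFINITION,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": SUBNETS,
                "securityGroups": SECURITY_GROUPS,
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {"name": CONTAINER_NAME, "environment": environment}
            ]
        },
    }


def handler(event, context, ecs_client=None):
    """Lambda entry point: start one Fargate task per webhook request."""
    if ecs_client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS

        ecs_client = boto3.client("ecs")

    payload = json.loads(event["body"])
    response = ecs_client.run_task(**build_run_task_request(payload))
    failures = response.get("failures", [])
    if failures:
        return {"statusCode": 502, "body": json.dumps({"failures": failures})}
    # Posting ACK_MESSAGE to the pull request as a comment is left out here.
    return {"statusCode": 202, "body": json.dumps({"message": ACK_MESSAGE})}
```

Passing the ECS client in as an argument keeps the handler testable with a stub. Steps 3 and 4 (persisting the output and posting it back to the PR) would need a second function triggered by the S3 or DynamoDB event.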

While this solution CAN be implemented with several hacks using the current version of Atlantis, it would be better to make some architectural changes to Atlantis, which have to be discussed in more detail.

I think Atlantis should be divided into several services:

  1. Core - Service which does the heavy lifting (runs terraform commands according to the specified configuration) and outputs the result to STDOUT (for simplicity)
  2. Acceptor - Service which processes webhooks. Right now it supports VCS (GitHub, GitLab, BitBucket), but it can be more generic. The ultimate idea is to allow colleagues to trigger the same webhooks from Slack.
  3. Publisher - Service which posts the result back. Currently, the destination is the VCS which triggered the invocation (e.g., a GitHub PR). This complements the previous service (see above).

@lkysow, what do you think about this? Should I move this discussion to https://github.com/runatlantis/atlantis or do you have something like this already in your plans?

@oba11

oba11 commented Sep 9, 2018

No doubt this would be super nice. Let's see what @lkysow thinks :)

@lkysow

lkysow commented Sep 11, 2018

Hi Everyone. I'm all for cost-saving but I don't think this will work with how Atlantis currently runs. Also, the cost savings aren't that substantial. Given us-east-1 pricing, I think Atlantis costs 45 cents a day:

  • per vCPU per hour | $0.0506
  • per GB per hour | $0.0127
  • we're using CPU 256 and mem 512 -> 0.25 and 0.5
  • (0.0506 * 0.25 + 0.0127 * 0.5)*24 hours = $0.456/day
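
The arithmetic above can be reproduced with a few lines of Python. The prices are the us-east-1 figures quoted in this comment and are passed in as defaults rather than looked up, so they may be out of date. The function name is made up for illustration.

```python
def fargate_daily_cost(cpu_units, memory_mib,
                       vcpu_hour=0.0506, gb_hour=0.0127, hours=24):
    """Daily Fargate cost for one always-on task.

    cpu_units and memory_mib use the ECS task definition units
    (1024 CPU units = 1 vCPU, 1024 MiB = 1 GB). Prices are the ones
    quoted in the thread, not current AWS pricing.
    """
    vcpus = cpu_units / 1024
    gigabytes = memory_mib / 1024
    return (vcpus * vcpu_hour + gigabytes * gb_hour) * hours


# Module defaults discussed here: CPU 256, memory 512.
print(f"${fargate_daily_cost(256, 512):.3f}/day")
```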

As to the other questions:

  1. Atlantis keeps state on the filesystem between plan and apply. If the container is torn down, this state will be lost, so I don't think it's possible to spin it up on demand right now.
  2. I don't want to pull apart Atlantis into separate services right now. I think having a single binary makes it operationally much simpler to deploy, and it makes it easier to contribute to.

@antonbabenko
Member Author

Thanks for the feedback @lkysow!

I also won't be working on this feature myself in the near future, so I can't come up with the numerous hacks which would be needed to get this to work.

Let's keep this issue open and come back to it when time allows, or someone wants to contribute :)

@Jaff

Jaff commented May 9, 2019

Where is the container_definition for the Fargate task?

@antonbabenko
Member Author

The container definition is specified as part of the aws_ecs_task_definition resource:

  resource "aws_ecs_task_definition" "atlantis" {
    family                   = "${var.name}"
    execution_role_arn       = "${aws_iam_role.ecs_task_execution.arn}"
    task_role_arn            = "${aws_iam_role.ecs_task_execution.arn}"
    network_mode             = "awsvpc"
    requires_compatibilities = ["FARGATE"]
    cpu                      = "${var.ecs_task_cpu}"
    memory                   = "${var.ecs_task_memory}"
    container_definitions    = "${local.container_definitions}"
  }

@Jaff

Jaff commented May 16, 2019

There does not appear to be anything related to running the atlantis server. You don't provide command and entrypoint parameters.

@lkysow

lkysow commented May 16, 2019

> There does not appear to be anything related to running the atlantis server. You don't provide command and entrypoint parameters.

The Atlantis Docker image will automatically run the server command if not given any args: https://github.com/runatlantis/atlantis/blob/master/Dockerfile#L29

@Jaff

Jaff commented May 16, 2019

Hi Luke, thanks for the response.

What about arguments? I have to provide --repo-config-json for my use case. Likewise, I need to set credentials with a profile, since my user handles many accounts.

@lkysow

lkysow commented May 16, 2019

Can you use the custom_environment_secrets and custom_environment_variables variables? Atlantis supports using environment variables for all its flags (https://www.runatlantis.io/docs/server-configuration.html#environment-variables).

For example, ATLANTIS_REPO_CONFIG_JSON.
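
As a rough illustration of the flag-to-variable naming: per the Atlantis docs linked above, the environment variable is the flag name uppercased, with dashes turned into underscores and an ATLANTIS_ prefix. The helper names below are hypothetical, and the name/value list shape is the one ECS container definitions use. Whether custom_environment_variables expects exactly that shape is an assumption worth checking against the module's README.

```python
def flag_to_env_name(flag):
    """Map an Atlantis CLI flag to its environment variable name,
    e.g. --repo-config-json -> ATLANTIS_REPO_CONFIG_JSON."""
    return "ATLANTIS_" + flag.lstrip("-").replace("-", "_").upper()


def to_environment(flags):
    """Turn {flag: value} into the name/value list used by ECS
    container definitions (assumed shape for the module variable)."""
    return [
        {"name": flag_to_env_name(flag), "value": value}
        for flag, value in flags.items()
    ]


# The JSON value below is a made-up placeholder, not a working repo config.
print(to_environment({"--repo-config-json": '{"repos": []}'}))
```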

Sorry, but I'm not too familiar with this module myself. Also, if you have more questions, maybe you could open a separate issue, because I think this issue is about running Atlantis on demand via Lambda, so we shouldn't pollute that purpose too much.

@Jaff

Jaff commented May 16, 2019

OK, thanks!

@smiller171
Contributor

> Given us-east-1 pricing, I think Atlantis costs 45 cents a day:

@lkysow I don't think these are good defaults though. I had an apply die on me, and I had to manually recover some stuff, because it ran out of resources and was killed by ECS.

@lkysow

lkysow commented Jun 3, 2019

:( that sucks. Curious, why did ECS kill it? Maybe we can bump up the default resources so others don't have that issue.

Yeah, if you want to avoid that you must give it a persistent disk, either through Kubernetes or through an actual VM.

@smiller171
Contributor

@lkysow I don't think a persistent disk would have helped me here. My state is in S3; the failed apply just left a CloudFormation stack in a bad state. It wasn't a huge pain, at least this time, and I bumped up the resources.

The problem was that it swamped the CPU enough that it took too long to respond to the health check. One possible solution is to just make the health check more forgiving. The tradeoff of course is taking longer to recover when there's a real problem.
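
To make that tradeoff concrete, here is a small sketch with made-up numbers. It assumes load-balancer-style health check settings (interval, timeout, consecutive-failure threshold) and approximates detection time as interval times threshold. None of the values are this module's defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheck:
    interval_s: int           # seconds between checks
    timeout_s: int            # a check fails if no response within this time
    unhealthy_threshold: int  # consecutive failures before marking unhealthy

    def tolerates(self, response_time_s):
        """True if a response this slow still counts as a passing check."""
        return response_time_s <= self.timeout_s

    def seconds_to_detect_outage(self):
        """Approximate time until a truly dead task is marked unhealthy."""
        return self.interval_s * self.unhealthy_threshold


# Hypothetical settings for illustration only.
strict = HealthCheck(interval_s=30, timeout_s=5, unhealthy_threshold=2)
forgiving = HealthCheck(interval_s=30, timeout_s=20, unhealthy_threshold=5)

for name, check in (("strict", strict), ("forgiving", forgiving)):
    print(name,
          "survives a 12 s response:", check.tolerates(12),
          "| detects a real outage after ~%d s" % check.seconds_to_detect_outage())
```

With these numbers the forgiving check survives a CPU-starved 12-second response, but takes roughly two and a half times as long to notice a task that is really down.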

@nitrocode
Member

nitrocode commented Apr 13, 2020

@smiller171 What did you bump the resources to?

Current module defaults

container_memory_reservation = 128

ecs_task_cpu    = 256
ecs_task_memory = 512

cloudposse/terraform-aws-ecs-atlantis uses the same defaults

container_cpu    = 256
container_memory = 512

@smiller171
Contributor

smiller171 commented May 28, 2020

@nitrocode I ended up using:

  ecs_task_cpu                 = 1024
  ecs_task_memory              = 2048
  container_memory_reservation = 1536

This has worked well for me so far

@nitrocode
Member

Oh wow, so you quadrupled the task CPU and memory (and raised the memory reservation from 128 to 1536). Thanks. If I see similar issues, I'll do the same.

@smiller171
Contributor

@nitrocode Yeah, but this almost certainly depends on how big your stack is and how many projects are running in parallel

@github-actions

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@bryantbiggs
Member

this coupled with #206 could work - holding from going stale

@vitaliCoasy

I don't know if this would make sense for the current Atlantis architecture, but since AWS Lambda can now run container images, I would rather consider rebuilding the Atlantis container with a Lambda handler API in it. Then we could deploy the Atlantis container straight to Lambda and run it per Lambda invocation, without needing to run it in ECS.

@antonbabenko
Member Author

@vitaliCoasy Terraform runs can take more than 15 minutes (the current maximum duration for a Lambda function), so I don't think it makes much sense to migrate from ECS Fargate to a pure Lambda function.

@smiller171
Contributor

Yeah, I think it would make more sense to trigger a Lambda which starts an ECS job

@github-actions

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Feb 28, 2022
@smiller171
Contributor

bump :)

@github-actions

github-actions bot commented Apr 1, 2022

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Apr 1, 2022
@github-actions

This issue was automatically closed because of stale in 10 days

@github-actions

github-actions bot commented Nov 8, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 8, 2022
8 participants