EcsRunTaskOperator reattach does not work #29601
Labels
area:providers
good first issue
kind:bug
This is clearly a bug
provider:amazon
AWS/Amazon - related issues
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==7.1.0
Apache Airflow version
2.5.1
Operating System
Docker on ECS (apache/airflow:2.5.1-python3.9)
Deployment
Other Docker-based deployment
Deployment details
No response
What happened
The `EcsRunTaskOperator` has a `reattach` option. The idea is that when a task is launched on ECS, its `arn` will be saved in the `xcom` table so that if Airflow restarts or something, it'll be able to reattach to the currently running task in ECS rather than launching a new one.
However, it always fails to get the `arn` of the running task from the `xcom` table.
What you think should happen instead
When Airflow restarts and retries an `EcsRunTaskOperator` task that was killed by the restart, it should find the `arn` of the currently running ECS task and continue waiting for that ECS task to finish instead of starting a new one.
How to reproduce
Launch an `EcsRunTaskOperator` task which, for testing purposes, takes at least a few minutes to complete.
Check the task logs when the task restarts. In the logs you'll see "No active previously launched task found to reattach".
Anything else
There are two problems from what I can tell.
Problem 1
When it pushes the `xcom` data, it uses the `task_id` of the task.
When it tries to retrieve the data, it uses a made-up `task_id`, so it'll never find the one saved earlier.
It also uses the same made-up `task_id` when it later tries to delete the `xcom` data.
Problem 2
Switching from the made-up `task_id` to the normal `task_id` during retrieval doesn't help, since all `xcom` rows with the task/dag/run id are deleted when the task restarts, so the `arn` saved on the previous attempt is never available.
I tried changing the `xcom_push` to this:
This causes this error:
You used to be able to make up a `task_id` as a hack to save things in the `xcom` table, but that foreign key constraint must have been added at some point.
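To make the three observations above concrete, here is a toy model using `sqlite3` from the Python standard library. It is only a sketch and not Airflow's code: the two-table schema is a simplified assumption, and the DAG id, task ids, key name, and ARN value are all hypothetical placeholders. It shows a lookup under a made-up `task_id` missing the saved row, the saved row disappearing once the task's `xcom` rows are cleared for a retry, and a push under a made-up `task_id` being rejected by a foreign key constraint.

```python
import sqlite3

# Hypothetical identifiers, for illustration only.
DAG_ID, RUN_ID = "example_dag", "manual__run_1"
TASK_ID = "run_ecs_task"               # the real task_id
MADE_UP_TASK_ID = "run_ecs_task_arn"   # a task_id with no task instance behind it
KEY = "ecs_task_arn"
ARN = "arn:aws:ecs:region:account:task/example"  # placeholder, not a real ARN

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
# Simplified schema (an assumption): xcom rows must reference a task instance.
conn.executescript("""
    CREATE TABLE task_instance (
        dag_id TEXT, task_id TEXT, run_id TEXT,
        PRIMARY KEY (dag_id, task_id, run_id)
    );
    CREATE TABLE xcom (
        dag_id TEXT, task_id TEXT, run_id TEXT, key TEXT, value TEXT,
        PRIMARY KEY (dag_id, task_id, run_id, key),
        FOREIGN KEY (dag_id, task_id, run_id)
            REFERENCES task_instance (dag_id, task_id, run_id)
    );
""")
conn.execute("INSERT INTO task_instance VALUES (?, ?, ?)", (DAG_ID, TASK_ID, RUN_ID))


def xcom_push(task_id, key, value):
    """Store a value under (dag, task, run, key)."""
    conn.execute(
        "INSERT OR REPLACE INTO xcom VALUES (?, ?, ?, ?, ?)",
        (DAG_ID, task_id, RUN_ID, key, value),
    )


def xcom_pull(task_id, key):
    """Return the stored value, or None if no matching row exists."""
    row = conn.execute(
        "SELECT value FROM xcom WHERE dag_id=? AND task_id=? AND run_id=? AND key=?",
        (DAG_ID, task_id, RUN_ID, key),
    ).fetchone()
    return row[0] if row else None


def clear_xcom_for_retry(task_id):
    """Model the cleanup that removes a task's xcom rows when it restarts."""
    conn.execute(
        "DELETE FROM xcom WHERE dag_id=? AND task_id=? AND run_id=?",
        (DAG_ID, task_id, RUN_ID),
    )


# Problem 1: push under the real task_id, pull under a made-up one.
xcom_push(TASK_ID, KEY, ARN)
found_with_made_up_id = xcom_pull(MADE_UP_TASK_ID, KEY)
found_with_real_id = xcom_pull(TASK_ID, KEY)

# Problem 2: the rows are cleared on restart, so even the real task_id finds nothing.
clear_xcom_for_retry(TASK_ID)
found_after_retry = xcom_pull(TASK_ID, KEY)

# Workaround attempt: pushing under the made-up task_id violates the foreign key.
try:
    xcom_push(MADE_UP_TASK_ID, KEY, ARN)
    fk_error = None
except sqlite3.IntegrityError as exc:
    fk_error = str(exc)

print(found_with_made_up_id, found_with_real_id, found_after_retry, fk_error)
```

In this model, only the lookup with the real `task_id` before the retry cleanup returns the ARN. A fix would therefore need to store the `arn` somewhere that survives the cleanup and does not depend on an invented `task_id`.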