-
Notifications
You must be signed in to change notification settings - Fork 145
feat: TKC-3683: add support for canceled workflow executions #6431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
pkg/runner/agent.go
Outdated
} | ||
return | ||
} | ||
originalError := a.runner.Abort(req.ExecutionID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename Abort to Terminate?
@@ -242,6 +242,38 @@ func (a *agentLoop) loopRunnerRequests(ctx context.Context) error { | |||
a.logger.Errorf("failed to send error for start execution '%s/%s': %v", req.EnvironmentID(), req.ExecutionID(), err) | |||
} | |||
} | |||
case cloud.RunnerRequestType_CANCEL: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have ABORT, no? What is the difference between CANCEL and ABORT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea is for abort to be used if the system aborts the execution (stuck running, timeout, invalid spec...), while cancel to be used if a user manually cancels the execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe RunnerRequestType_ABORT
is exactly used for that, there is no need to add another "CANCEL" request type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When agent aborts execution due to system issue (OOM kill), I think it just updates status, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When agent aborts execution due to system issue (OOM kill), I think it just updates status, no?
Yes, the pod will be gone and the execution will get aborted.
I believe RunnerRequestType_ABORT is exactly used for that, there is no need to add another "CANCEL" request type.
I thoight about it, I'd rather remove the RunnerRequestType_ABORT
and use the RunnerRequestType_CANCEL
as it is more intuitive, from now on UI, API and CLI would send cancelations.
I don't know can the control plane abort (not by user) an execution in an agent for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know can the control plane abort (not by user) an execution in an agent for some reason?
grepping the code - I couldn't find anything.
Can we then remove the ABORT?
f472388
to
93e96e9
Compare
} | ||
} | ||
if err := w.patchTerminationAnnotations(ctx, id, options.Namespace, testkube.CANCELED_TestWorkflowStatus, "Job has been canceled by a user"); err != nil { | ||
fmt.Printf("Failed to patch job %s/%s with termination code & reason: %v\n", options.Namespace, id, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this log or debugging printing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, nice catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one note
93e96e9
to
dce0ff0
Compare
Signed-off-by: Dejan Zele Pejchev <[email protected]>
dce0ff0
to
7fb440e
Compare
Pull request description
Add support for handling canceled workflow executions.
Update status tracking for canceled runs.
Emit canceled event for old architecture.
Improve aborted vs canceled error messages.
Checklist (choose whats happened)
Breaking changes
Changes
Fixes