Skip to content

cancelled() returns false after workflow cancel #3041

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
GuyAv46 opened this issue Dec 14, 2023 · 13 comments
Closed

cancelled() returns false after workflow cancel #3041

GuyAv46 opened this issue Dec 14, 2023 · 13 comments
Labels
bug Something isn't working Stale

Comments

@GuyAv46
Copy link

GuyAv46 commented Dec 14, 2023

Describe the bug

I have a branch protection rule that requires a pr-validation job to be successful for merging a PR:

  pr-validation:
    needs: [test-linux-x86_64, test-linux-aarch64, test-macos, coverage, sanitize]
    runs-on: ubuntu-latest
    if: ${{ !cancelled() }}
    steps:
      - if: contains(needs.*.result, 'failure')
        run: exit 1

Some of the jobs in the needs list are complex and use matrixes to trigger reusable workflows, some of them also use matrixes and trigger more reusable workflows.

As you can see here, this job was running successfully when:

Expected behavior

  1. The workflow was canceled, so it wasn't expected to run.
  2. Some of the jobs in the needs list failed before the workflow cancelation, so if the job was to run, the exit 1 step was expected to be executed.

But for some reason, the job ran and ended successfully, and approved the PR.

Runner Version and Platform

Current runner version: '2.311.0'
Operating System
  Ubuntu
  22.04.3
  LTS
Runner Image
  Image: ubuntu-22.04
  Version: 20231211.1.0
  Included Software: https://github.com/actions/runner-images/blob/ubuntu22/20231211.1/images/ubuntu/Ubuntu2204-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20231211.1
Runner Image Provisioner
  2.0.321.1
GITHUB_TOKEN Permissions

OS of the machine running the runner? OSX/Windows/Linux/...

What's not working?

Please include error messages and screenshots.

Job Log Output

If applicable, include the relevant part of the job / step log output here. All sensitive information should already be masked out, but please double-check before pasting here.

Runner and Worker's Diagnostic Logs

If applicable, add relevant diagnostic log information. Logs are located in the runner's _diag folder. The runner logs are prefixed with Runner_ and the worker logs are prefixed with Worker_. Each job run correlates to a worker log. All sensitive information should already be masked out, but please double-check before pasting here.

@GuyAv46 GuyAv46 added the bug Something isn't working label Dec 14, 2023
@GuyAv46
Copy link
Author

GuyAv46 commented Dec 14, 2023

Debug run output of the exit 1 step:

 ##[debug]Evaluating condition for step: 'Run exit 1'
##[debug]Evaluating: (success() && contains(needs[*].result, 'failure'))
##[debug]Evaluating And:
##[debug]..Evaluating success:
##[debug]..=> true
##[debug]..Evaluating contains:
##[debug]....Evaluating Index:
##[debug]......Evaluating Index:
##[debug]........Evaluating needs:
##[debug]........=> Object
##[debug]........Evaluating Wildcard:
##[debug]........=> '*'
##[debug]......=> Array
##[debug]......Evaluating String:
##[debug]......=> 'result'
##[debug]....=> Array
##[debug]....Evaluating String:
##[debug]....=> 'failure'
##[debug]..=> false
##[debug]=> false
##[debug]Expanded: (true && contains(Array, 'failure'))
##[debug]Result: false

@GuyAv46
Copy link
Author

GuyAv46 commented Dec 14, 2023

Screenshot from 2023-12-14 15-41-54

@GuyAv46 GuyAv46 changed the title Job runs unexpectedly cancelled() returns false after workflow cancel Dec 14, 2023
@GuyAv46
Copy link
Author

GuyAv46 commented Dec 17, 2023

Just happened again, and I think I figured out why:

  • The external/first matrix uses workflow calls with the default fail-fast. When one of the entries fails, it cancels the other matrix jobs and marks the matrix result as cancelled.
  • When I tested a similar scenario, but the matrix has its own steps (and not calling a reusable workflow), the matrix result was failure.

I couldn't find any mention of this inconsistent behavior.

@ruvceskistefan
Copy link
Contributor

Hey @GuyAv46,
here is an example which cancels the final job if some of previous jobs had failed.

name: test_require_status_check

on:
  workflow_dispatch:
  pull_request:
  push:

jobs:
  pre-requisite:
    runs-on: ubuntu-latest
    steps:
      - name: Get Req
        id: get-req
        run: echo "get req"

  build-one:
    name: build one
    runs-on: ubuntu-latest
    needs: pre-requisite
    steps:
      - run: | 
          echo "build-one"
          exit 1

  build-two:
    name: build two
    runs-on: ubuntu-latest
    needs: pre-requisite
    steps:
      - run: echo "build-two"

  test:
    name: Test initial
    needs:
      - pre-requisite
      - build-two
    runs-on: ubuntu-latest
    steps:
      - run: |
          echo "Testing builds"
          exit 1

  test-more:
    name: Test more
    needs:
      - pre-requisite
      - build-two
    runs-on: ubuntu-latest
    steps:
      - run: echo "Test more"

  results:
    runs-on: ubuntu-latest
    if: ${{ always() && contains(needs.*.result, 'failure') == false}}
    needs:
      - build-one
      - build-two
      - test
      - test-more
    steps:
      - run: echo "Workflows have succeeded!"

And below you can see diagram image
image

@GuyAv46
Copy link
Author

GuyAv46 commented Dec 26, 2023

The problem is that if I only skip the task, it's enough for the required tasks for the branch protection rules, and the PR is approved. I had to force this job to run and fail if there was a fail/cancel.

My solution was to add "cancelled" to the expected output for failure:

  pr-validation:
    needs: [test-linux-x86_64, test-linux-aarch64, test-macos, coverage, sanitize]
    runs-on: ubuntu-latest
    if: ${{ !cancelled() }}
    steps:
      - if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
        run: exit 1

@bpp-liamhardman
Copy link

bpp-liamhardman commented Jan 4, 2024

I'm seeing this behaviour as well when using any of the various of if cancelled()I've seen. Here's one of the test steps I have in my workflow:

  echo-test:

    runs-on: ubuntu-latest
    if: ${{ cancelled() }}
    run: echo "Job was cancelled!"

I've tried using just if: cancelled() along with if: ${{ always() }} and - if: ${{ needs.*.result == 'cancelled' }} . I was going to raise a separate issue but it seems like we're seeing the same behaviour, so probably best not to duplicate work for the team.

@GuyAv46
Copy link
Author

GuyAv46 commented Jan 28, 2024

After I thought I found the workaround I needed, it happened again.
This time my pr-validation checks for both failure and cancelled, but somehow the result was neither.

  pr-validation:
    needs:
      - docs-only # if the setup jobs fail, the rest of the jobs will be skipped, and we will exit with a failure
      - get-latest-redis-tag
      - get-latest-7-0-redis-tag
      - get-latest-6-2-redis-tag
      - get-latest-6-0-redis-tag
      - test-linux-x86_64
      - test-linux-aarch64
      - test-macos
      - coverage
      - sanitize
    runs-on: ubuntu-latest
    if: ${{ !cancelled() }}
    steps:
      - if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
        run: exit 1

Screenshot from 2024-01-28 17-21-53

@GuyAv46
Copy link
Author

GuyAv46 commented Jan 28, 2024

Debug logging is pretty useless if you can't see the content of the Array. I tried to reproduce it but was only able to achieve either 'failure' or 'cancelled'

@GuyAv46
Copy link
Author

GuyAv46 commented Jan 30, 2024

I figured out what is the problem.
As I mentioned before, if a matrix is used with a reusable workflow, and one of the matrix runs fails, the result of the matrix is CANCELLED. I tried to work around this, but...
If you have a workflow containing a matrix that calls a reusable workflow, and in this workflow, there is another matrix that uses another reusable workflow, if one of the deepest workflows fails, its parent matrix will report cancelled, which will cause the grandparent matrix TO IGNORE THE RETURN STATUS AND REPORT SUCCESS. This means that I must flatten out my matrix so a cancelled status won't be ignored. I can't have a fail-fast in this case, but at least my CI won't pass on a failing flow.
I hope this will get some attention soon

@GuyAv46
Copy link
Author

GuyAv46 commented Jan 30, 2024

Using ARR variable to get the results of the list:

  pr-validation:
    needs:
      - docs-only # if the setup jobs fail, the rest of the jobs will be skipped, and we will exit with a failure
      - get-latest-redis-tag
      - get-latest-7-0-redis-tag
      - get-latest-6-2-redis-tag
      - get-latest-6-0-redis-tag
      - test-linux-x86_64
      - test-macos
      - coverage
      - sanitize
    runs-on: ubuntu-latest
    if: ${{ !cancelled() }}
    env:
      ARR: ${{ toJson(needs.*.result) }}
    steps:
      - if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
        run: exit 1
      - run: exit 1

Screenshot from 2024-01-30 14-04-59

@tgoric
Copy link

tgoric commented Mar 5, 2024

I came across this issue as well. From what I can tell, it's specific to having a matrix that calls a reusable workflow. When one job of the matrix fails, and others fail fast, it reports a cancelled status but the cancelled() status check function doesn't return true.

I have a simple reproduction in this repo with runs showing the behavior, and also started this discussion post yesterday.

In the simple case of a single matrix (I don't have any matrix calling another matrix), using if: ${{ !contains(needs.*.result, 'cancelled') }} instead of if: ${{ !cancelled() }} is good enough.

Copy link
Contributor

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 15 days.

@github-actions github-actions bot added the Stale label Mar 10, 2025
Copy link
Contributor

github-actions bot commented Apr 7, 2025

This issue was closed because it has been stalled for 15 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Stale
Projects
None yet
Development

No branches or pull requests

4 participants