Action should not be successful and cached if outputs were not created #14543

@keith

We recently had a case of cache poisoning in which we realized that actions are considered successful based on their exit code, even if the outputs the action actually produced do not match its listed outputs. The build later fails with something like:

ERROR: /path/to/BUILD:3:8: output 'foo.txt' was not created

But this does not stop the same action from being fetched from the cache by subsequent builds. Specifically what happened in our case was:

  1. The compiler of an action crashed during execution due to a hardware failure
  2. The action somehow still exited 0 (this is something that I also need to investigate and fix, but likely on the rules_swift side)
  3. Bazel cached the action's result, which consisted of a log of the compiler crash and the fact that no outputs were created
  4. Bazel failed after it identified the missing outputs
  5. All subsequent builds with the same inputs pulled this invalid cache entry and failed showing the same compiler crash log

I think if successfully creating the outputs were part of the requirement for an action to be marked as successful, this wouldn't have happened. (This clearly requires that your action fail non-deterministically, which should be rare, but can happen in cases like this.)
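To make the proposed rule concrete, here is a minimal sketch in Python. It is not Bazel's implementation; the function names `missing_outputs` and `is_cacheable` and the injectable `exists` check are hypothetical and only illustrate the idea that an exit code of 0 alone should not be enough to cache a result.

```python
import os
from typing import Callable, Iterable, List


def missing_outputs(listed_outputs: Iterable[str],
                    exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return the declared outputs that were not created."""
    return [path for path in listed_outputs if not exists(path)]


def is_cacheable(exit_code: int,
                 listed_outputs: Iterable[str],
                 exists: Callable[[str], bool] = os.path.exists) -> bool:
    """Hypothetical rule: cache only if the action exited 0 AND
    every listed output actually exists."""
    return exit_code == 0 and not missing_outputs(listed_outputs, exists)
```

Under this rule the crashed compile above (exit code 0, no outputs created) would not be written to the cache, so later builds would re-run the action instead of replaying the crash log.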

Here's the execution log json from the action where the compiler crashed:

{
  "commandArgs": ["bazel-out/darwin-opt-exec-2B5CBBC6-ST-d7817b5f5799/bin/external/build_bazel_rules_swift/tools/worker/worker", "swiftc", "@bazel-out/ios-arm64-min12.0-applebin_ios-ios_arm64-opt-ST-d7817b5f5799/bin/Modules/Foo/Foo.swiftmodule-0.params"],
  snip ...
  "inputs": snip...,
  "listedOutputs": ["bazel-out/ios-arm64-min12.0-applebin_ios-ios_arm64-opt-ST-d7817b5f5799/bin/Modules/Foo/Foo.swiftmodule", snip ...],
  "remotable": true,
  "cacheable": true,
  "timeoutMillis": "0",
  "progressMessage": "Compiling Swift module //Modules/Foo:Foo",
  "mnemonic": "SwiftCompile",
  "actualOutputs": [],
  "runner": "worker",
  "remoteCacheHit": false,
  "status": "",
  "exitCode": 0,
  "remoteCacheable": true,
  "walltime": "7.261184363s"
}

And then the log from all subsequent builds with the same inputs:

{
  "commandArgs": ["bazel-out/darwin-opt-exec-2B5CBBC6-ST-d7817b5f5799/bin/external/build_bazel_rules_swift/tools/worker/worker", "swiftc", "@bazel-out/ios-arm64-min12.0-applebin_ios-ios_arm64-opt-ST-d7817b5f5799/bin/Modules/Foo/Foo.swiftmodule-0.params"],
  snip ...
  "inputs": snip...,
  "listedOutputs": ["bazel-out/ios-arm64-min12.0-applebin_ios-ios_arm64-opt-ST-d7817b5f5799/bin/Modules/Foo/Foo.swiftmodule", snip...],
  "remotable": true,
  "cacheable": true,
  "timeoutMillis": "0",
  "progressMessage": "Compiling Swift module //Modules/Foo:Foo",
  "mnemonic": "SwiftCompile",
  "actualOutputs": [],
  "runner": "remote cache hit",
  "remoteCacheHit": true,
  "status": "",
  "exitCode": 0,
  "remoteCacheable": true,
  "walltime": "0s"
}

Note that the second execution log reports "remoteCacheHit": true with an empty "actualOutputs" list, so the invalid result was pulled from the cache.
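For anyone who wants to check their own logs for the same pattern, the following is a rough sketch that flags entries which exited 0 but are missing listed outputs. It operates on already-parsed entries (plain dicts) and uses only the field names visible in the logs above. Two assumptions: items in "actualOutputs" may be either plain path strings or objects with a "path" key, and a missing "exitCode" is treated as 0. The helper names are hypothetical, and listed outputs that are legitimately optional would show up as false positives.

```python
from typing import Any, Dict, Iterable, List


def output_path(item: Any) -> str:
    """Extract a path from an actualOutputs item (assumed str or {"path": ...})."""
    if isinstance(item, dict):
        return item.get("path", "")
    return str(item)


def suspicious_entries(entries: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return entries that exited 0 but did not produce every listed output."""
    flagged = []
    for entry in entries:
        listed = set(entry.get("listedOutputs", []))
        actual = {output_path(o) for o in entry.get("actualOutputs", [])}
        # Assumption: an absent exitCode means 0.
        if entry.get("exitCode", 0) == 0 and listed - actual:
            flagged.append(entry)
    return flagged
```

Both log entries shown above would be flagged by this check, since each has "exitCode": 0 and an empty "actualOutputs" list.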

What operating system are you running Bazel on?

macOS

What's the output of bazel info release?

5.0.0rc3

Labels

P1, team-Remote-Exec, type: bug