Description
We recently had a case of cache poisoning where we realized that actions are considered successful based on their exit code alone, even if the actual outputs of the action do not match the listed outputs. The build later fails with something like:
ERROR: /path/to/BUILD:3:8: output 'foo.txt' was not created
But this does not stop the same action from being fetched from the cache by subsequent builds. Specifically what happened in our case was:
- The compiler of an action crashed during execution due to a hardware failure
- The action somehow still exited 0 (this is something that I also need to investigate and fix, but likely on the rules_swift side)
- Bazel cached the result of the action, which consisted of a log of the compiler crash and no output files
- Bazel failed after it identified the missing outputs
- All subsequent builds with the same inputs pulled this invalid cache entry and failed showing the same compiler crash log
I think if successful creation of the outputs were part of the requirement for an action to be marked as successful, this wouldn't have happened. (This clearly requires that your action fail non-deterministically, which should be rare, but can happen in cases like this.)
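To make the proposal concrete, here is a minimal sketch of the stricter success check, written in Python for illustration only. It is not Bazel's implementation; the function names `missing_outputs` and `action_succeeded` and the `exec_root` parameter are hypothetical.

```python
import os


def missing_outputs(listed_outputs, exec_root):
    """Return the declared outputs that do not exist under exec_root."""
    return [
        path
        for path in listed_outputs
        if not os.path.exists(os.path.join(exec_root, path))
    ]


def action_succeeded(exit_code, listed_outputs, exec_root):
    """Hypothetical rule: exit code 0 AND every declared output was created.

    Under this rule, only results for which this returns True would be
    eligible for upload to the cache.
    """
    if exit_code != 0:
        return False
    return not missing_outputs(listed_outputs, exec_root)
```

With a check like this in front of the cache write, the crashed compile above (exit code 0, no outputs) would be treated as a failure and never stored.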
Here's the execution log json from the action where the compiler crashed:
{
"commandArgs": ["bazel-out/darwin-opt-exec-2B5CBBC6-ST-d7817b5f5799/bin/external/build_bazel_rules_swift/tools/worker/worker", "swiftc", "@bazel-out/ios-arm64-min12.0-applebin_ios-ios_arm64-opt-ST-d7817b5f5799/bin/Modules/Foo/Foo.swiftmodule-0.params"],
snip ...
"inputs": snip...,
"listedOutputs": ["bazel-out/ios-arm64-min12.0-applebin_ios-ios_arm64-opt-ST-d7817b5f5799/bin/Modules/Foo/Foo.swiftmodule", snip ...],
"remotable": true,
"cacheable": true,
"timeoutMillis": "0",
"progressMessage": "Compiling Swift module //Modules/Foo:Foo",
"mnemonic": "SwiftCompile",
"actualOutputs": [],
"runner": "worker",
"remoteCacheHit": false,
"status": "",
"exitCode": 0,
"remoteCacheable": true,
"walltime": "7.261184363s"
}
And then the log from all subsequent builds with the same inputs:
{
"commandArgs": ["bazel-out/darwin-opt-exec-2B5CBBC6-ST-d7817b5f5799/bin/external/build_bazel_rules_swift/tools/worker/worker", "swiftc", "@bazel-out/ios-arm64-min12.0-applebin_ios-ios_arm64-opt-ST-d7817b5f5799/bin/Modules/Foo/Foo.swiftmodule-0.params"],
snip ...
"inputs": snip...,
"listedOutputs": ["bazel-out/ios-arm64-min12.0-applebin_ios-ios_arm64-opt-ST-d7817b5f5799/bin/Modules/Foo/Foo.swiftmodule", snip...],
"remotable": true,
"cacheable": true,
"timeoutMillis": "0",
"progressMessage": "Compiling Swift module //Modules/Foo:Foo",
"mnemonic": "SwiftCompile",
"actualOutputs": [],
"runner": "remote cache hit",
"remoteCacheHit": true,
"status": "",
"exitCode": 0,
"remoteCacheable": true,
"walltime": "0s"
}
Note the second execution log shows the invalid results were pulled from cache.
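For anyone trying to spot the same pattern in their own logs, here is a rough Python sketch that flags entries with exit code 0 whose listed outputs are not all present in the actual outputs. It assumes the entries have already been parsed into dicts with the field names shown above; `find_suspect_entries` and `output_path` are hypothetical helpers, and the handling of `actualOutputs` items as either plain strings or objects with a `path` key is an assumption. Directory outputs may show up as false positives.

```python
def output_path(item):
    """Accept either a plain path string or a dict with a "path" key (assumed shapes)."""
    return item["path"] if isinstance(item, dict) else item


def find_suspect_entries(entries):
    """Return entries that exited 0 but are missing at least one listed output."""
    suspects = []
    for entry in entries:
        listed = set(entry.get("listedOutputs", []))
        actual = {output_path(item) for item in entry.get("actualOutputs", [])}
        if entry.get("exitCode") == 0 and listed - actual:
            suspects.append(entry)
    return suspects
```

Both entries above would be flagged by this check, since each has a non-empty `listedOutputs`, an empty `actualOutputs`, and `exitCode` 0.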
What operating system are you running Bazel on?
macOS
What's the output of bazel info release?
5.0.0rc3