Bazel symlink prefetching can crash - triggered by Builds without the Bytes #18772

@alexofortune

Description of the bug:

We're trying to adopt Builds without the Bytes in our codebase, but enabling it causes build failures that appear to be related to how symlinks are handled. Here's the error and stack trace:

Exec failed due to IOException: 136 errors during bulk transfer:
java.io.IOException: /mnt/cache/bazel/output/652855c435821699db0f4352be3bd5fd/execroot/__main__/bazel-out/k8-fastbuild/bin/external/com_git_scm_git/git-add/git (Not a directory)
		at com.google.devtools.build.lib.unix.NativePosixFiles.openWrite(Native Method)
		at com.google.devtools.build.lib.unix.UnixFileSystem.createFileOutputStream(UnixFileSystem.java:519)
		at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:174)
		at com.google.devtools.build.lib.vfs.AbstractFileSystem.getOutputStream(AbstractFileSystem.java:188)
		at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:408)
		at com.google.devtools.build.lib.vfs.Path.getOutputStream(Path.java:396)
		at com.google.devtools.build.lib.vfs.FileSystemUtils.moveFile(FileSystemUtils.java:457)
		at com.google.devtools.build.lib.remote.AbstractActionInputPrefetcher.finalizeDownload(AbstractActionInputPrefetcher.java:547)
		at com.google.devtools.build.lib.remote.AbstractActionInputPrefetcher.lambda$downloadFileNoCheckRx$14(AbstractActionInputPrefetcher.java:475)
		at io.reactivex.rxjava3.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:107)
		... 85 more

Upon further investigation, we traced the failure to this block of code in https://github.com/bazelbuild/bazel/blob/d097b5d6cd3bc9fdb725b379b6cf3ef247126008/src/main/java/com/google/devtools/build/lib/remote/AbstractActionInputPrefetcher.java , line 437:

    if (path.isSymbolicLink()) {
      try {
        path = path.getRelative(path.readSymbolicLink());
      } catch (IOException e) {
        return Completable.error(e);
      }
    }

Instrumenting this block with logging yielded the following results:

230626 10:35:11.852:I 556 [com.google.devtools.build.lib.remote.AbstractActionInputPrefetcher.downloadFileNoCheckRx] [2] prefetching a file - original path /mnt/cache/bazel/output/652855c435821699db0f4352be3bd5fd/execroot/__main__/bazel-out/k8-fastbuild/bin/external/com_git_scm_git/git-maintenance
230626 10:35:11.852:I 556 [com.google.devtools.build.lib.remote.AbstractActionInputPrefetcher.downloadFileNoCheckRx] [2] prefetching a file - transformed path /mnt/cache/bazel/output/652855c435821699db0f4352be3bd5fd/execroot/__main__/bazel-out/k8-fastbuild/bin/external/com_git_scm_git/git-maintenance/git

To be precise:

  • git is the standard git binary
  • git-maintenance is a symlink to git, created by a bash script in a genrule and declared via outs. Note that this genrule is marked local = True

As a result, the download path resolves to a location under a directory that doesn't exist, and Bazel throws an IOException when it tries to move the downloaded file there.
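To illustrate the path arithmetic visible in the logs, here is a minimal sketch that uses java.nio.file.Path as a stand-in for Bazel's own vfs Path class (an assumption: the two APIs are not identical, and the shortened path below is hypothetical). It contrasts appending the link target to the symlink's own path, which reproduces the shape of the bogus "transformed path", with resolving the target against the symlink's parent directory, which is how a relative symlink target is normally interpreted. It is not meant as the actual fix.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class SymlinkResolutionSketch {
    // Appends the link target to the symlink's own path, as if the symlink
    // were a directory. This matches the shape of the "transformed path" log line.
    static Path resolveAgainstLink(Path link, Path target) {
        return link.resolve(target);
    }

    // Resolves the link target against the directory that contains the symlink,
    // which is how the OS interprets a relative symlink target.
    static Path resolveAgainstParent(Path link, Path target) {
        return link.resolveSibling(target);
    }

    public static void main(String[] args) {
        // Hypothetical, shortened stand-in for the real output path.
        Path link = Paths.get("/execroot/bin/external/com_git_scm_git/git-maintenance");
        // What readSymbolicLink() would return for a link created with `ln -s git ...`.
        Path target = Paths.get("git");

        System.out.println("against link:   " + resolveAgainstLink(link, target));
        System.out.println("against parent: " + resolveAgainstParent(link, target));
    }
}
```

The first result ends in git-maintenance/git, a path whose parent is not a directory, which is consistent with the "(Not a directory)" error in the stack trace. The second ends in com_git_scm_git/git, next to the symlink.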

At least three questions come to mind:

  • Why does Bazel decide to prefetch something that is already on the local filesystem? (At the time of the crash, I can see both the git binary and the symlinks.)
  • Why does Bazel check whether the path is a symbolic link at all before it attempts to prefetch the file?
  • Why does Bazel interpret that path as a directory and append the relative link target to it, resulting in this bogus path?

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Something akin to this should work:

a) Create a genrule:

SYMLINKS = [
    "binary-symlink",
    "binary-symlink2",
]
genrule(
    name = "symlinks",
    srcs = [":binary"],
    outs = SYMLINKS,
    cmd = ";".join([
        "ln -s binary $(location %s)" % f
        for f in SYMLINKS
    ]),
    local = True,
)

b) Add an action that depends on the symlinks, then try to build it with Builds without the Bytes (BWOB) enabled.

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

6.1.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

Nothing

Any other information, logs, or outputs that you want to share?

No response
