Skip to content

workflows/premerge: Generate a ccache artifact for each pull request #124311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

tstellar
Copy link
Collaborator

This allows each update for a pull request to reuse the ccache data from the previous run for that pull request. This will help improve build times since the ccache data used will be specific to the pull request and not polluted with builds of other PRs.

This allows each update for a pull request to reuse the ccache data
from the previous run for that pull request.  This will help improve
build times since the ccache data used will be specific to the
pull request and not polluted with builds of other PRs.
@tstellar tstellar marked this pull request as ready for review January 24, 2025 23:24
@llvmbot
Copy link
Member

llvmbot commented Jan 24, 2025

@llvm/pr-subscribers-github-workflow

Author: Tom Stellard (tstellar)

Changes

This allows each update for a pull request to reuse the ccache data from the previous run for that pull request. This will help improve build times since the ccache data used will be specific to the pull request and not polluted with builds of other PRs.


Full diff: https://github.com/llvm/llvm-project/pull/124311.diff

3 Files Affected:

  • (added) .github/workflows/pr-ccache-restore/action.yml (+23)
  • (added) .github/workflows/pr-ccache-save/action.yml (+25)
  • (modified) .github/workflows/premerge.yaml (+13)
diff --git a/.github/workflows/pr-ccache-restore/action.yml b/.github/workflows/pr-ccache-restore/action.yml
new file mode 100644
index 00000000000000..9ae25b446f9909
--- /dev/null
+++ b/.github/workflows/pr-ccache-restore/action.yml
@@ -0,0 +1,23 @@
+name: PR ccache restore
+
+inputs:
+  artifact-name-suffix:
+    desciption: The suffix to append to the artifict name (ccache-pr#)
+    required: true
+
+runs:
+  using: "composite"
+  steps:
+    - uses: ./.github/workflows/unprivileged-download-artifact
+      id: download-artifact
+      with:
+        artifact-name: ccache-pr${{ github.event.pull_request.number }}-${{ inputs.artifact-name-suffix }}
+
+    - shell: bash
+      if: steps.download-artifact.outputs.filename != ''
+      run: |
+        # Is this the best way to clear the cache?
+        rm -Rf .ccache/
+        rm ${{ steps.download-artifact.outputs.filename }}
+        tar --zstd -xf ccache.tar.zst
+        rm ccache.tar.zst
diff --git a/.github/workflows/pr-ccache-save/action.yml b/.github/workflows/pr-ccache-save/action.yml
new file mode 100644
index 00000000000000..b88f511d751794
--- /dev/null
+++ b/.github/workflows/pr-ccache-save/action.yml
@@ -0,0 +1,25 @@
+name: PR ccache save
+inputs:
+  artifact-name-suffix:
+    desciption: The suffix to append to the artifict name (ccache-pr#)
+    required: true
+
+runs:
+  using: "composite"
+  steps:
+    - name: Package ccache Directory
+      shell: bash
+      run: |
+        # Dereference symlinks so that this works on Windows.
+        tar -h -c .ccache | zstd -T0 -c > ccache.tar.zst
+    - uses: actions/upload-artifact@26f96dfa697d77e81fd5907df203aa23a56210a8 #v4.3.0
+      with:
+        name: 'ccache-pr${{ github.event.number }}-${{ inputs.artifact-name-suffix }}'
+        path: ccache.tar.zst
+        retention-days: 7
+        overwrite: true
+
+    - shell: bash
+      run: |
+        rm ccache.tar.zst
+        ccache --show-stats
diff --git a/.github/workflows/premerge.yaml b/.github/workflows/premerge.yaml
index 30f4fc807f3a57..c10e8d17f75979 100644
--- a/.github/workflows/premerge.yaml
+++ b/.github/workflows/premerge.yaml
@@ -27,6 +27,13 @@ jobs:
         uses: hendrikmuhs/[email protected]
         with:
           max-size: "2000M"
+      - name: Install dependencies for pr-ccache-restore
+        run: |
+          sudo apt-get install zstd unzip
+      - name: Restore ccache from previous PR run
+        uses: ./.github/workflows/pr-ccache-restore
+        with:
+          artifact-name-suffix: Linux-x86
       - name: Build and Test
         # Mark the job as a success even if the step fails so that people do
         # not get notified while the new premerge pipeline is in an
@@ -70,3 +77,9 @@ jobs:
           export CXX=/opt/llvm/bin/clang++
 
           ./.ci/monolithic-linux.sh "$(echo ${linux_projects} | tr ' ' ';')" "$(echo ${linux_check_targets})" "$(echo ${linux_runtimes} | tr ' ' ';')" "$(echo ${linux_runtime_check_targets})"
+
+      - name: Save ccache for next PR run
+        if: always()
+        uses: ./.github/workflows/pr-ccache-save
+        with:
+          artifact-name-suffix: Linux-x86

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need separate actions for this? I thought it was possible to implement this by attaching the PR (if we're running against a pr and the number is available) to the cache key, and then setting the ccache action to also look for cache keys without a PR (like the most recent run against main)?

@tstellar
Copy link
Collaborator Author

Do we need separate actions for this? I thought it was possible to implement this by attaching the PR (if we're running against a pr and the number is available) to the cache key, and then setting the ccache action to also look for cache keys without a PR (like the most recent run against main)?

You can implement this using the cache, but it's limited to 10 GB (for the whole repo), so that's not really enough to cache all the PRs. By using an artifact instead of the cache, you can ensure that the cache for each PR will persist long enough to actually be reused when the PR is updated.

@boomanaiden154
Copy link
Contributor

You can implement this using the cache, but it's limited to 10 GB (for the whole repo), so that's not really enough to cache all the PRs. By using an artifact instead of the cache, you can ensure that the cache for each PR will persist long enough to actually be reused when the PR is updated.

Ah. Completely missed the fact that this purposefully used an artifact. I thought caches saved on the pull_request target were saved in the context of the fork (thus counting against a different 10GB limit)? Couldn't find that wording in the documentation at a quick glance though, so I might be forgetting something. That would make the simpler approach a bit more advantageous.

We're also running the CI on PRs as if they're merged into main, so I think a cache from a recent main would have a much higher hit rate than a PR-specific cache from even a day ago.

@tstellar
Copy link
Collaborator Author

It looks like caches from PRs stored in the target repo cache: https://github.com/llvm/llvm-project/actions/caches, which now that I think about it, seems like a security risk to me. It also says we are using ~150GB/10GB of the cache, so I guess it's a soft limit?

We're also running the CI on PRs as if they're merged into main, so I think a cache from a recent main would have a much higher hit rate than a PR-specific cache from even a day ago.

Yes, that's a good point. I wonder if we should limit the PRs to read-only access to the cache, that way it won't be polluted with lots of different PR builds.

@boomanaiden154
Copy link
Contributor

It looks like caches from PRs stored in the target repo cache: https://github.com/llvm/llvm-project/actions/caches, which now that I think about it, seems like a security risk to me. It also says we are using ~150GB/10GB of the cache, so I guess it's a soft limit?

Yeah, it's definitely a soft limit. Not sure what the actual logic is, but full eviction doesn't happen for some time.

Yes, that's a good point. I wonder if we should limit the PRs to read-only access to the cache, that way it won't be polluted with lots of different PR builds.

This seems like the best course of action to me. The only other thing we would need to figure out is special casing the release branch, because main and the release branch will diverge quickly. Just adding a key to the cache on the target branch wouldn't be a bad first step, but I'm guessing those cache keys will get evicted pretty quickly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants