
Commit cea9e82

Improves deletion of old artifacts. (#11079)
We introduced deletion of old artifacts because it was the suspected culprit of Kubernetes Job failures. It eventually turned out that those failures were caused by the #11017 change, but housekeeping of the artifacts is worth doing anyway. The delete workflow action, introduced in a hurry, had three problems:

* it runs for every fork that syncs master, which is too invasive
* it fails every time after 10 - 30 minutes because there are too many old artifacts to delete (GitHub has a 90-day retention policy, so there are likely tens of thousands of artifacts to delete)
* it runs every hour and occasionally exhausts the API rate limit (because there are too many artifacts to loop through)

This PR restricts the workflow to the apache/airflow repository and reduces the deletion frequency to 4 times a day. A back-of-the-envelope calculation caps each of the 4 daily runs at about 2500 artifacts to delete, so there is a low risk of reaching the 5000 API calls/hr rate limit. It also adds a script that we are running manually to delete the excess artifacts now. Once the number of artifacts goes down, the regular job should delete perhaps a few hundred artifacts that appear within the 6-hour window under normal circumstances, and it should then stop failing.
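The back-of-the-envelope estimate can be checked with a few lines of shell arithmetic. This is a sketch under assumptions that the commit does not state: each deletion costs one API call, and listing artifacts uses GitHub's default page size of 30 per request (the script added here does not set `per_page`). The helper name `api_calls_per_run` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: estimate the API calls a single cleanup run needs.
# Assumes one DELETE call per artifact plus one list call per page of results.
api_calls_per_run() {
  local artifacts=$1 page_size=$2
  local list_calls=$(( (artifacts + page_size - 1) / page_size ))  # ceiling division
  echo $(( artifacts + list_calls ))
}

RATE_LIMIT_PER_HOUR=5000   # the limit cited in the commit message
calls=$(api_calls_per_run 2500 30)
echo "calls per run: ${calls} (limit: ${RATE_LIMIT_PER_HOUR}/hr)"
# prints: calls per run: 2584 (limit: 5000/hr)
```

Pacing helps as well: the manual script sleeps one second after each deletion, so it issues at most roughly 3600 deletions per hour no matter how large the backlog is.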
1 parent 1ebd3a6 commit cea9e82

3 files changed (+96, -1 lines)

.github/workflows/delete_old_artifacts.yml

Lines changed: 2 additions & 1 deletion
@@ -1,11 +1,12 @@
 name: 'Delete old artifacts'
 on:
   schedule:
-    - cron: '0 * * * *' # every hour
+    - cron: '27 */6 * * *' # run every 6 hours
 
 jobs:
   delete-artifacts:
     runs-on: ubuntu-latest
+    if: github.repository == 'apache/airflow'
     steps:
       - uses: kolpav/purge-artifacts-action@v1
         with:
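The new schedule `27 */6 * * *` means minute 27 of every sixth hour. The sketch below expands the hour field to list the four daily run times (UTC, which is how GitHub Actions evaluates cron schedules), and mirrors the job-level `if:` condition as a shell check against `GITHUB_REPOSITORY`, the environment variable GitHub Actions sets to `owner/repo`. Both helper functions are hypothetical illustrations; the workflow itself relies only on the YAML in the diff.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list the daily run times for a cron entry of the form
# "MINUTE */STEP * * *". It handles only this shape, not full cron syntax.
cron_step_times() {
  local minute=$1 step=$2 hour
  for (( hour = 0; hour < 24; hour += step )); do
    printf '%02d:%02d\n' "${hour}" "${minute}"
  done
}

# Hypothetical shell equivalent of: if: github.repository == 'apache/airflow'
# Succeeds only when running in the upstream repository, not in a fork.
is_upstream_repo() {
  [[ "${GITHUB_REPOSITORY:-}" == "apache/airflow" ]]
}

cron_step_times 27 6   # prints 00:27, 06:27, 12:27 and 18:27, one per line
```

Offsetting the minute to 27 rather than 0 is a common way to avoid the top of the hour, when many scheduled workflows start at once.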

CI.rst

Lines changed: 10 additions & 0 deletions
@@ -615,6 +615,16 @@ This is manually triggered workflow (via GitHub UI manual run) that should only
 When triggered, it will force-push the "apache/airflow" master to the fork's master. It's the easiest
 way to sync your fork master to the Apache Airflow's one.
 
+Delete old artifacts
+--------------------
+
+This workflow was introduced to delete old artifacts from the GitHub Actions builds. We set it to
+delete artifacts that are more than 7 days old. It only runs for the 'apache/airflow' repository.
+
+We also have a script that can help to clean up the old artifacts:
+`remove_artifacts.sh <dev/remove_artifacts.sh>`_
+
+
 Naming conventions for stored images
 ====================================

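The documentation says the workflow deletes artifacts older than 7 days; the purge action's own settings are cut off in the workflow diff (after `with:`), so the option it uses for this is not shown here. As an illustration only, this sketch decides whether an artifact's `created_at` timestamp (the ISO 8601 field returned by the GitHub artifacts API) is older than a given number of days. It assumes GNU `date` with `-d` parsing, as on `ubuntu-latest` runners; `is_older_than_days` is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed if CREATED_AT is more than DAYS days before NOW.
# CREATED_AT is an ISO 8601 timestamp such as "2020-09-14T10:00:00Z".
# NOW is epoch seconds (defaults to the current time); requires GNU date.
is_older_than_days() {
  local created_at=$1 days=$2 now=${3:-$(date -u +%s)}
  local created_epoch
  created_epoch=$(date -u -d "${created_at}" +%s)
  (( now - created_epoch > days * 24 * 60 * 60 ))
}

# Example: with "now" fixed at 2020-09-21T00:00:00Z, an artifact created on
# 2020-09-10 is older than 7 days, while one created on 2020-09-18 is not.
now=$(date -u -d "2020-09-21T00:00:00Z" +%s)
is_older_than_days "2020-09-10T12:00:00Z" 7 "${now}" && echo "delete"
is_older_than_days "2020-09-18T12:00:00Z" 7 "${now}" || echo "keep"
```

Passing "now" as a parameter keeps the check deterministic, which makes it easy to test against fixed timestamps.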
dev/remove_artifacts.sh

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+set -euo pipefail
+
+# Parameters:
+#
+# GITHUB_REPO - API URL of the repository whose artifacts are deleted (hard-coded below)
+# GITHUB_USER - your personal user name
+# GITHUB_TOKEN - your personal token with `repo` scope
+#
+GITHUB_REPO=https://api.github.com/repos/apache/airflow
+readonly GITHUB_REPO
+
+if [[ -z ${GITHUB_USER:-} ]]; then
+  echo >&2
+  echo >&2 "Set GITHUB_USER variable to your user"
+  echo >&2
+  exit 1
+fi
+readonly GITHUB_USER
+
+if [[ -z ${GITHUB_TOKEN:-} ]]; then
+  echo >&2
+  echo >&2 "Set GITHUB_TOKEN variable to a token with 'repo' scope"
+  echo >&2
+  exit 2
+fi
+GITHUB_TOKEN=${GITHUB_TOKEN}
+readonly GITHUB_TOKEN
+
+function github_api_call() {
+  curl --silent --location --user "${GITHUB_USER}:${GITHUB_TOKEN}" "$@"
+}
+
+# A temporary file which receives HTTP response headers.
+TEMPFILE=$(mktemp)
+readonly TEMPFILE
+
+function loop_through_artifacts_and_delete() {
+
+  # Process all artifacts on this repository, loop on returned "pages".
+  artifact_url=${GITHUB_REPO}/actions/artifacts
+
+  while [[ -n "${artifact_url}" ]]; do
+    # Get current page, get response headers in a temporary file.
+    json=$(github_api_call --dump-header "${TEMPFILE}" "${artifact_url}")
+
+    # URL of the next page; empty on the last page (`|| true` avoids a pipefail exit).
+    artifact_url=$(grep -i '^link:' "${TEMPFILE}" | tr ',' '\n' | \
+      grep 'rel="next"' | head -1 | sed -e 's/.*<//' -e 's/>.*//' || true)
+    rm -f "${TEMPFILE}"
+
+    # Number of artifacts on this page:
+    count=$(($(jq <<<"${json}" -r '.artifacts | length')))
+
+    # Loop on all artifacts on this page.
+    for ((i = 0; i < count; i++)); do
+      # Get the name, id and size of the artifact
+      name=$(jq <<<"${json}" -r ".artifacts[$i].name?")
+      id=$(jq <<<"${json}" -r ".artifacts[$i].id?")
+      size=$(($(jq <<<"${json}" -r ".artifacts[$i].size_in_bytes?")))
+      printf "Deleting '%s': [%s] : %'d bytes\n" "${name}" "${id}" "${size}"
+      github_api_call -X DELETE "${GITHUB_REPO}/actions/artifacts/${id}"
+      sleep 1 # GitHub API limit is 5000 calls/hr; this keeps the call rate below that
+    done
+  done
+}
+
+loop_through_artifacts_and_delete
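The pagination step in `loop_through_artifacts_and_delete` follows the `rel="next"` URL from GitHub's `Link` response header. The sketch below isolates that step as a hypothetical `next_page_url` function and runs it on a made-up header file, so it can be tried without network access. Two details are worth noting: the header name is matched case-insensitively (curl writes lowercase header names over HTTP/2), and `|| true` keeps `set -euo pipefail` from aborting when no next page exists and `grep` finds no match.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the rel="next" URL from a file of dumped HTTP
# response headers (as written by `curl --dump-header FILE`), or nothing
# when the current page is the last one.
next_page_url() {
  local header_file=$1
  grep -i '^link:' "${header_file}" | tr ',' '\n' | \
    grep 'rel="next"' | head -1 | sed -e 's/.*<//' -e 's/>.*//' || true
}

# Example with a made-up header for page 1 of 3.
headers=$(mktemp)
trap 'rm -f "${headers}"' EXIT
cat > "${headers}" <<'EOF'
HTTP/2 200
link: <https://api.github.com/repositories/1/actions/artifacts?page=2>; rel="next", <https://api.github.com/repositories/1/actions/artifacts?page=3>; rel="last"
EOF
next_page_url "${headers}"
```

One behaviour worth checking when running the script: it deletes each page's artifacts before requesting the next page URL, so with page-number links some artifacts may shift into pages that were already visited and be skipped on that pass; running the script again would pick them up.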

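The parameter checks near the top of the script are meant to print an error and exit when `GITHUB_USER` or `GITHUB_TOKEN` is missing. Under `set -u`, a bare `${GITHUB_USER}` aborts with an "unbound variable" error before the friendly message is printed, and `echo 2>&1` sends the message to standard output rather than standard error. This sketch shows the intended behaviour with a hypothetical `require_var` helper that uses `${!name:-}` indirect expansion and `>&2`; `DEMO_TOKEN` is an illustrative variable name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: fail with a message on stderr if the named variable is
# unset or empty. ${!name:-} reads the variable indirectly without tripping set -u.
require_var() {
  local name=$1 hint=$2
  if [[ -z "${!name:-}" ]]; then
    echo >&2 "Set ${name} variable ${hint}"
    return 1
  fi
}

# Example: an unset variable is reported on stderr; stdout stays empty.
unset DEMO_TOKEN
require_var DEMO_TOKEN "to a token with 'repo' scope" || true
# prints on stderr: Set DEMO_TOKEN variable to a token with 'repo' scope
```

Keeping diagnostics on stderr means a caller can capture the script's normal output without mixing in error text.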