GC No longer Executes #21824
Hi @mdavid01, to resolve this issue, follow these steps:
Thanks, Wang:
Thanks, Wang. We will attempt this over the weekend, as these are critical production systems. At this time, none of the five jobservice pods shows any files under /var/log/jobs. I assume we need root access to view these files?
Hello Wang: we did not execute the steps recommended above, because I found what I believe to be confirmation that GC is running, but far too slowly to ever complete. Tracking the Artifact_Trash database table row count every 10 seconds, I found the record count decreasing at roughly 1 entry per minute during prime time. Our artifact trash table started at 57,000+ entries this morning. Based on the attached performance tracking file, GC would need an estimated 844 hours to complete at the current rate, assuming no new deletions are added; that works out to about 1,400 artifacts garbage-collected per day.
Attached is an Excel file capturing the speed at which GC is removing Artifact_Trash records from the database; I assume that tracking Artifact_Trash activity is a valid proxy for GC progress. Harbor Garbage Collection Timings.xlsx. Thanks.
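For reference, here is a minimal sketch of the kind of tracking described above: it polls the row count of Harbor's artifact trash table and projects time to completion from the observed deletion rate. The database name (registry), table name (artifact_trash), host, and credentials are assumptions about a standard Harbor PostgreSQL deployment and will differ per environment.

```python
# Sketch: estimate GC progress by polling Harbor's artifact trash table.
# Assumes Harbor's PostgreSQL database is reachable and that the trash
# table is named artifact_trash (an assumption about the Harbor schema).
import time
import psycopg2

POLL_SECONDS = 10          # sampling interval, as in the tracking above
SAMPLES = 60               # ~10 minutes of observation

conn = psycopg2.connect(
    host="harbor-database",    # hypothetical service name
    dbname="registry",         # assumed Harbor database name
    user="postgres",
    password="change-me",
)

def trash_count(connection):
    """Return the current number of rows awaiting garbage collection."""
    with connection.cursor() as cur:
        cur.execute("SELECT count(*) FROM artifact_trash")
        return cur.fetchone()[0]

first = trash_count(conn)
start = time.time()
for _ in range(SAMPLES):
    time.sleep(POLL_SECONDS)
    current = trash_count(conn)
    elapsed_min = (time.time() - start) / 60.0
    removed = first - current
    rate_per_min = removed / elapsed_min if elapsed_min else 0.0
    eta_hours = (current / rate_per_min / 60.0) if rate_per_min > 0 else float("inf")
    print(f"rows={current} removed={removed} "
          f"rate={rate_per_min:.2f}/min eta={eta_hours:.0f}h")

conn.close()
```

Against a starting count of 57,000+ rows and a deletion rate on the order of one row per minute, a projection of this kind lands in the hundreds of hours, consistent with the estimate above.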
Hi team: we raised this issue at the 4/2 community meeting.
On March 6, GC ran as expected. On March 7, it stopped executing, in the same fashion as in our other environment.
GC log only shows:
{"errors":[{"code":"NOT_FOUND","message":"{"code":10010,"message":"object is not found","details":"log entity: b768f80f8781bf9ef30708f0"}"}]}
We get the same result on scheduled, manual, and dry-run GC; only the log entity in the error message changes. Since we deploy with Helm, we have no easy way to trace the code.
We initially opened this issue as #21655 but received no actionable response. We've googled and tried multiple fixes. We don't know how to find the log entity; or is it the log entity itself that's not found?
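To help narrow down where the log lookup fails, below is a hedged sketch that lists recent GC runs and tries to fetch each run's log through Harbor's REST API. The /api/v2.0/system/gc and /api/v2.0/system/gc/{id}/log paths and the response field names reflect my understanding of the v2.0 API and may differ by Harbor version; the host and credentials are placeholders. If the runs are listed but every log fetch returns the same NOT_FOUND "log entity" error, that may point at the jobservice log backend (e.g. missing files under /var/log/jobs) rather than at GC never starting.

```python
# Sketch: inspect GC runs and their logs via Harbor's REST API.
# Endpoint paths and field names are assumptions about the v2.0 API
# and may vary across Harbor versions; host/credentials are placeholders.
import requests

HARBOR = "https://harbor.example.com"          # placeholder host
AUTH = ("admin", "change-me")                  # placeholder credentials

# List recent GC executions.
resp = requests.get(f"{HARBOR}/api/v2.0/system/gc", auth=AUTH,
                    params={"page_size": 10}, timeout=30)
resp.raise_for_status()
runs = resp.json()

for run in runs:
    gc_id = run.get("id")
    print(f"GC run {gc_id}: status={run.get('job_status')} "
          f"kind={run.get('job_kind')} created={run.get('creation_time')}")

    # Try to pull the log for this run. A NOT_FOUND / "log entity" error
    # here, while the run itself is listed, suggests the stored job log
    # cannot be located rather than that GC never executed.
    log_resp = requests.get(f"{HARBOR}/api/v2.0/system/gc/{gc_id}/log",
                            auth=AUTH, timeout=30)
    if log_resp.ok:
        print(log_resp.text[:500])             # first 500 chars of the log
    else:
        print(f"  log fetch failed: {log_resp.status_code} {log_resp.text}")
```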
LM (LMT) leadership is quite frustrated with the lack of attention to this issue. We're happy to answer any questions about our environment, pod logs, etc.
Sorry, guys - really need help with this.