-
-
Notifications
You must be signed in to change notification settings - Fork 3
Sync minio b2 #692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync minio b2 #692
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a new script to sync a MinIO bucket with a B2 bucket in order to support periodic S3 cleaning via clean_s3.py.
- Introduces a syncing script that compares object modification times between buckets.
- Implements object download, upload with retry logic and conditional deletion.
- Excludes model checkpoint files from the sync when specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making sure, you've tested that you're able to successfully sync up to backblaze with this script, right? I think I recall you mentioning it but I'm just confirming.
Also, if you can make this executable if it's not already by running chmod +x scripts/sync_minio_b2.py
, that will prepare it to be run on our server. And if you could do the same thing for clean_s3.py
, since I think I didn't commit the executable permission to the repo, that would be great.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @TaperChipmunk32)
scripts/sync_minio_b2.py
line 20 at r2 (raw file):
aws_secret_access_key=os.getenv("MINIO_SECRET_KEY"), # Verify is false if endpoint_url is an IP address. Aqua/Cheetah connecting to MinIO need this disabled for now. verify=False if re.match(r"https://\d+\.\d+\.\d+\.\d+", os.getenv("MINIO_ENDPOINT_URL")) else True,
Should this line assigning a value to verify
get added to clean_s3.py as well?
scripts/sync_minio_b2.py
line 58 at r2 (raw file):
for key in keys_to_remove: if key in b2_objects:
Rather than use if statements, you can just call b2_objects.pop(key, None)
and minio_objects.pop(key, None)
.
scripts/sync_minio_b2.py
line 109 at r2 (raw file):
def try_n_times(func: Callable, n=10):
Can you import this from /silnlp/common/environment.py
rather than paste it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 6 unresolved discussions (waiting on @copilot-pull-request-reviewer[bot] and @mshannon-sil)
scripts/sync_minio_b2.py
line 20 at r2 (raw file):
Previously, mshannon-sil wrote…
Should this line assigning a value to
verify
get added to clean_s3.py as well?
Yes, I just added it.
scripts/sync_minio_b2.py
line 58 at r2 (raw file):
Previously, mshannon-sil wrote…
Rather than use if statements, you can just call
b2_objects.pop(key, None)
andminio_objects.pop(key, None)
.
Done.
scripts/sync_minio_b2.py
line 109 at r2 (raw file):
Previously, mshannon-sil wrote…
Can you import this from
/silnlp/common/environment.py
rather than paste it here?
Yes, I just fixed that. I had assumed it would be running without the repo.
@mshannon-sil I do still need to test it without dry-run, thanks for the reminder. Also, after running chmod on each of the scripts, there are no changes to commit, so I assume the permission already existed, or git is ignoring it. |
I just tested it and it is working now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Everything looks good to me, but let's hold off on merging until it runs successfully on AQuA. I'll go ahead and set that up, using this branch.
Reviewed 1 of 2 files at r3.
Reviewable status: 1 of 2 files reviewed, 4 unresolved discussions (waiting on @TaperChipmunk32)
scripts/sync_minio_b2.py
line 109 at r2 (raw file):
Previously, TaperChipmunk32 (Matthew Beech) wrote…
Yes, I just fixed that. I had assumed it would be running without the repo.
Gotcha. Yeah I have the repo cloned on the server, so it should work.
Would you like me to run it today on AQuA? |
Sure, you may want to keep an eye on the API charges as it runs. I'll be away from my computer in a little while. |
Confirmed that the script is running successfully on AQuA every Sunday after the clean_s3 script. |
This adds a script that will sync MinIO with B2 to be run with clean_s3.py periodically.
This change is