SSL verification error #640
Comments
Is this error causing problems?
Yes, experiments often fail after 10 retries.
I just saw this on a production job as well:
After talking with TechOps, it looks like there may be an issue with the network's upload speeds. I submitted a ticket to IT services yesterday to see if they can resolve the issue on their end, and I'm currently waiting for a reply.
Research has switched over to using the MinIO bucket on our server rather than AWS, so we shouldn't run into this issue anymore.
When uploading checkpoints to the S3 bucket, the following error occurs:
botocore.exceptions.SSLError: SSL validation failed for [file to be uploaded] EOF occurred in violation of protocol (_ssl.c:2426)
This error appears about 70 seconds after the upload starts. The read timeout is currently set to 600 seconds, but the connect timeout is left at its default of 60 seconds, so that may be the cause.
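One way to test the connect-timeout hypothesis is to raise that value explicitly when the S3 client is created. The snippet below is a minimal sketch, not the project's actual upload code: the helper names `build_client_settings` and `make_s3_client` are hypothetical, the 300-second connect timeout is an arbitrary trial value, and the 600-second read timeout and 10 attempts simply mirror the numbers mentioned in this thread. The `endpoint_url` parameter is included on the assumption that the same client may later point at an S3-compatible server such as MinIO.

```python
from typing import Any, Dict, Optional


def build_client_settings(connect_timeout: int = 300,
                          read_timeout: int = 600,
                          max_attempts: int = 10) -> Dict[str, Any]:
    """Return keyword arguments for botocore.config.Config.

    The defaults are illustrative assumptions, not values confirmed
    to fix the SSL error.
    """
    return {
        "connect_timeout": connect_timeout,  # seconds to establish a connection
        "read_timeout": read_timeout,        # seconds to wait for a response
        "retries": {"max_attempts": max_attempts, "mode": "standard"},
    }


def make_s3_client(endpoint_url: Optional[str] = None, **overrides: Any):
    """Create an S3 client with explicit timeouts (hypothetical helper).

    boto3 is imported here so the settings helper can be used and
    checked without boto3 installed.
    """
    import boto3
    from botocore.config import Config

    config = Config(**build_client_settings(**overrides))
    # endpoint_url=None targets AWS; pass a URL for an S3-compatible server.
    return boto3.client("s3", endpoint_url=endpoint_url, config=config)
```

A client built with `make_s3_client(connect_timeout=120)` can then be used for the checkpoint upload as before. If the error still appears at roughly the same point after the connect timeout is raised, the timeout is probably not the cause, and the network issue discussed in the comments becomes the more likely explanation.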