Compute Instance Not Reconnecting After Resource Maxed Out and Throwing 404 Error Despite Being Active and Running on GCE #510
The latest plugin version 4.681.v9020cf2b_7453 adds support for GCP's "limit VM runtime" feature via the `maxRunDuration` option, which applies to both Standard and Spot VMs. Upgrade from …
Did you mean something like the Pipeline `retry` option? If so, you can put the specific part of the pipeline inside a `retry` step. Using the … (Missed this notification, sorry; now watching it.)
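A minimal sketch of that suggestion, assuming a Jenkins/Pipeline version whose `retry` step accepts a `conditions` list. The `agent()` condition retries the block when the agent running it is lost, and `nonresumable()` retries when a step cannot be resumed after a controller restart. The `./build.sh` step is a hypothetical placeholder. Note that this reruns the block on whichever `gce` agent is available; it does not reattach the lost agent.

```groovy
// Sketch only: assumes a retry step that supports `conditions` (recent Pipeline plugins)
retry(count: 2, conditions: [agent(), nonresumable()]) {
    // If the GCE agent disconnects mid-build, the whole node block is retried
    node('gce') {
        sh './build.sh'   // hypothetical build step
    }
}
```

Wrapping the whole `node` block, rather than a single `sh` step inside it, is what lets the retry obtain a fresh agent.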
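I tried to simulate the issue by applying memory stress. Pipeline used:

```groovy
node('gce') {
    // Run two memory-stress workers of 250 MB each; GNU timeout stops stress after
    // <num-seconds> seconds (and exits with status 124 when it has to do so)
    sh """
        timeout <num-seconds>s stress --vm 2 --vm-bytes 250M
    """
}
```

For now I ran two tests (the pipeline above with 240 s and 600 s). In both cases, when the … However, the stress command terminated after the timeout, and Jenkins reconnected and completed the pipeline.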
Pipeline logs 240s
Pipeline logs 600s
System logs
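To repeat both tests in a single job, a sketch like the following could be used. The `[240, 600]` list mirrors the two log attachments. `returnStatus: true` is an assumption about the desired behaviour: it keeps the 124 exit code that GNU `timeout` returns on expiry from failing the build, so both runs execute.

```groovy
// Hypothetical: run the same stress test once per duration used in the tests above
for (seconds in [240, 600]) {
    node('gce') {
        // returnStatus: true returns the exit code instead of failing the step
        def status = sh(script: "timeout ${seconds}s stress --vm 2 --vm-bytes 250M", returnStatus: true)
        echo "stress for ${seconds}s exited with status ${status}"
    }
}
```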
I wasn't able to reproduce the kind of logs shown in the description. The log indicates an operation request sent to GCP (such as provisioning or deleting a machine). From the current stack trace I am unable to find which code path in this plugin triggered it; the stack trace only shows internal library line numbers. @skrishna375, can you please share the full stack trace (please redact any sensitive information, such as VM names)?
Jenkins and plugins versions report
Environment
What Operating System are you using (both controller, and any agents involved in the problem)?
Controller: Rocky Linux 8.10
Agent: Rocky Linux 8.10
Reproduction steps
2.1 A running job loses contact with the agent and fails with the error `Cannot contact jenkins-agent-*: java.lang.InterruptedException`.
2.2 The controller log shows a 404 operation error, but nothing specific appears in the agent log.
Expected Results
The agent should reconnect. (I am not aware of any retry options available at the moment.)
Actual Results
The agent connection is lost and the agent is left as a zombie, although the instance is healthy on the GCP side. These zombie agents cause problems when new agents are created and have to be cleaned up manually.
Ideally, the agent should reconnect so that jobs can still run on it.
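For the manual cleanup, a sketch for the Jenkins Script Console (Manage Jenkins > Script Console) that lists offline agents. The `jenkins-agent-` prefix is an assumption based on the `jenkins-agent-*` pattern in the error message. Removal is left commented out so nothing is deleted by accident. It only affects the node entry in Jenkins, so the VM itself should still be checked in GCP.

```groovy
import jenkins.model.Jenkins

// Assumption: GCE agents follow the jenkins-agent-* naming seen in the error message
def prefix = 'jenkins-agent-'

Jenkins.get().computers.each { computer ->
    // The built-in node has an empty name, so it never matches the prefix
    if (computer.name?.startsWith(prefix) && computer.offline) {
        println "Offline agent: ${computer.name}, cause: ${computer.offlineCause}"
        // Uncomment to remove the stale node entry from Jenkins:
        // if (computer.node != null) { Jenkins.get().removeNode(computer.node) }
    }
}
```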
Anything else?
We have multiple clouds configured with different instance types, and the issue affects agents configured with One shot set to both true and false.
Are you interested in contributing a fix?
No response