
Compute instance getting terminated randomly with message Agent was removed #508


Open
mkjkec2005 opened this issue Jan 14, 2025 · 2 comments

Comments

@mkjkec2005

Jenkins and plugins versions report

Environment
Jenkins: 2.452.3
OS: Linux - 6.1.0-22-amd64
Java: 17.0.11 - Debian (OpenJDK 64-Bit Server VM)
---
Office-365-Connector:4.21.1
ace-editor:1.1
active-directory:2.35
ansible:403.v8d0ca_dcb_b_502
ant:497.v94e7d9fffa_b_9
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
artifactory:4.0.6
asm-api:9.7-33.v4d23ef79fcc8
authentication-tokens:1.113.v81215a_241826
aws-credentials:231.v08a_59f17d742
aws-java-sdk-ec2:1.12.730-457.v3403b_37d2170
aws-java-sdk-minimal:1.12.730-457.v3403b_37d2170
badge:1.13
bitbucket:241.v6d24a_57f9359
block-queued-job:0.2.0
blueocean:1.27.13
blueocean-autofavorite:1.2.5
blueocean-bitbucket-pipeline:1.27.13
blueocean-commons:1.27.13
blueocean-config:1.27.13
blueocean-core-js:1.27.13
blueocean-dashboard:1.27.13
blueocean-display-url:2.4.2
blueocean-events:1.27.13
blueocean-git-pipeline:1.27.13
blueocean-github-pipeline:1.27.13
blueocean-i18n:1.27.13
blueocean-jira:1.27.13
blueocean-jwt:1.27.13
blueocean-personalization:1.27.13
blueocean-pipeline-api-impl:1.27.13
blueocean-pipeline-editor:1.27.13
blueocean-pipeline-scm-api:1.27.13
blueocean-rest:1.27.13
blueocean-rest-impl:1.27.13
blueocean-web:1.27.13
bootstrap5-api:5.3.3-1
bouncycastle-api:2.30.1.78.1-233.vfdcdeb_0a_08a_a_
branch-api:2.1169.va_f810c56e895
build-monitor-plugin:1.14-883.vf620a_44eb_ec1
build-pipeline-plugin:2.0.2
build-timeout:1.33
build-user-vars-plugin:166.v52976843b_435
caffeine-api:3.1.8-133.v17b_1ff2e0599
checks-api:2.2.0
cloudbees-bitbucket-branch-source:887.va_d359b_3d2d8d
cloudbees-credentials:3.3
cloudbees-folder:6.928.v7c780211d66e
cobertura:1.17
code-coverage-api:4.99.0
command-launcher:107.v773860566e2e
commons-compress-api:1.26.1-2
commons-lang3-api:3.14.0-76.vda_5591261cfe
commons-text-api:1.12.0-119.v73ef73f2345d
conditional-buildstep:1.4.3
config-file-provider:973.vb_a_80ecb_9a_4d0
convert-to-pipeline:1.0
copy-to-slave:1.4.4
coverage:1.16.0
create-fingerprint:25.v0a_b_e60b_42fa_4
credentials:1344.v5a_3f65a_1e173
credentials-binding:677.vdc9d38cb_254d
dashboard-view:2.508.va_74654f026d1
data-tables-api:2.0.8-1
depgraph-view:1.0.5
description-setter:239.vd0a_6b_785f92d
display-url-api:2.204.vf6fddd8a_8b_e9
docker-commons:439.va_3cb_0a_6a_fb_29
docker-workflow:580.vc0c340686b_54
durable-task:555.v6802fe0f0b_82
dynamic_extended_choice_parameter:1.0.1
ec2-deployment-dashboard:1.0.10
echarts-api:5.5.0-1
eddsa-api:0.3.0-4.v84c6f0f4969e
email-ext:1814.v404722f34263
envinject:2.919.v009a_a_1067cd0
envinject-api:1.199.v3ce31253ed13
environment-dashboard:1.1.10
extended-choice-parameter:382.v5697b_32134e8
extensible-choice-parameter:1.8.1
external-monitor-job:215.v2e88e894db_f8
favorite:2.218.vd60382506538
font-awesome-api:6.5.2-1
forensics-api:2.4.0
generic-webhook-trigger:2.2.1
git:5.2.2
git-client:5.0.0
git-server:126.v0d945d8d2b_39
github:1.39.0
github-api:1.318-461.v7a_c09c9fa_d63
github-branch-source:1789.v5b_0c0cea_18c3
global-post-script:1.1.4
google-compute-engine:4.575.v6969b_7c435eb_
google-oauth-plugin:1.330.vf5e86021cb_ec
gradle:2.12
greenballs:1.15.1
groovy:457.v99900cb_85593
groovy-events-listener-plugin:2.210.v8a_4107f66127
groovy-postbuild:228.vcdb_cf7265066
gson-api:2.11.0-41.v019fcf6125dc
handlebars:3.0.8
handy-uri-templates-2-api:2.1.8-30.v7e777411b_148
htmlpublisher:1.35
instance-identity:185.v303dc7c645f9
instant-messaging:2.777.vfc1db_63216cc
ionicons-api:74.v93d5eb_813d5f
ivy:2.6
jackson2-api:2.17.0-379.v02de8ec9f64c
jakarta-activation-api:2.1.3-1
jakarta-mail-api:2.1.3-1
javadoc:243.vb_b_503b_b_45537
javax-activation-api:1.2.0-7
javax-mail-api:1.6.2-10
jaxb:2.3.9-1
jdk-tool:73.vddf737284550
jenkins-design-language:1.27.13
jenkins-jira-plugin:3.8.2
jenkinswalldisplay:0.6.34
jersey2-api:2.42-147.va_28a_44603b_d5
jira:3.13
jira-trigger:1.0.3
jjwt-api:0.11.5-112.ve82dfb_224b_a_d
job-dsl:1.87
job-import-plugin:3.6
jobConfigHistory:1229.v3039470161a_d
jobtemplates:1.0
joda-time-api:2.12.7-29.v5a_b_e3a_82269a_
jquery:1.12.4-1
jquery-detached:1.2.1
jquery-ui:1.0.2
jquery3-api:3.7.1-2
jsch:0.2.16-86.v42e010d9484b_
json-api:20240303-41.v94e11e6de726
json-path-api:2.9.0-58.v62e3e85b_a_655
junit:1265.v65b_14fa_f12f0
ldap:725.v3cb_b_711b_1a_ef
lockable-resources:1255.vf48745da_35d0
log-parser:2.3.4
mailer:472.vf7c289a_4b_420
mapdb-api:1.0.9-40.v58107308b_7a_7
matrix-auth:3.2.2
matrix-project:832.va_66e270d2946
maven-plugin:3.23
mercurial:1260.vdfb_723cdcc81
mina-sshd-api-common:2.12.1-113.v4d3ea_5eb_7f72
mina-sshd-api-core:2.12.1-113.v4d3ea_5eb_7f72
momentjs:1.1.1
oauth-credentials:0.653.v14cf2088e950
oidc-provider:79.v46f0066a_d813
okhttp-api:4.11.0-172.vda_da_1feeb_c6e
pam-auth:1.11
parameterized-trigger:806.vf6fff3e28c3e
periodicbackup:2.0
pipeline-build-step:540.vb_e8849e1a_b_d8
pipeline-github-lib:61.v629f2cc41d83
pipeline-githubnotify-step:49.vf37bf92d2bc8
pipeline-graph-analysis:216.vfd8b_ece330ca_
pipeline-groovy-lib:727.ve832a_9244dfa_
pipeline-input-step:495.ve9c153f6067b_
pipeline-milestone-step:119.vdfdc43fc3b_9a_
pipeline-model-api:2.2198.v41dd8ef6dd56
pipeline-model-declarative-agent:1.1.1
pipeline-model-definition:2.2198.v41dd8ef6dd56
pipeline-model-extensions:2.2198.v41dd8ef6dd56
pipeline-multibranch-defaults:2.1
pipeline-npm:204.v4dc4c2202625
pipeline-rest-api:2.34
pipeline-stage-step:312.v8cd10304c27a_
pipeline-stage-tags-metadata:2.2198.v41dd8ef6dd56
pipeline-stage-view:2.34
pipeline-utility-steps:2.17.0
plain-credentials:183.va_de8f1dd5a_2b_
plugin-util-api:4.1.0
prism-api:1.29.0-15
publish-to-bitbucket:0.4
pubsub-light:1.18
rebuild:332.va_1ee476d8f6d
resource-disposer:0.23
role-strategy:727.vd344b_eec783d
run-condition:1.7
scm-api:690.vfc8b_54395023
script-security:1341.va_2819b_414686
secure-requester-whitelist:67.vca_a_d9205723f
skype-notifier:1.1.0
snakeyaml-api:2.2-111.vc6598e30cc65
sse-gateway:1.27
ssh:2.6.1
ssh-credentials:337.v395d2403ccd4
ssh-slaves:2.973.v0fa_8c0dea_f9f
sshd:3.330.vc866a_8389b_58
stash-pullrequest-builder:1.17
stashNotifier:1.492.v1b_33f185ee18
structs:338.v848422169819
subversion:1269.v53185011cd9f
summary_report:1.15
synopsys-coverity:3.0.3
template-project:1.5.2
template-workflows:41.v32d86a_313b_4a
terminate-ssh-processes-plugin:1.0
thinBackup:2.1.1
throttle-concurrents:2.14
timestamper:1.27
token-macro:400.v35420b_922dcb_
trilead-api:2.147.vb_73cc728a_32e
uno-choice:2.8.3
valgrind:0.28
variant:60.v7290fc0eb_b_cd
violation-comments-to-stash:1.134
webhook-step:342.v620877effe14
windows-slaves:1.8.1
workflow-aggregator:596.v8c21c963d92d
workflow-api:1316.v33eb_726c50b_a_
workflow-basic-steps:1058.vcb_fc1e3a_21a_9
workflow-cps:3903.v48a_8836749e9
workflow-cps-global-lib:612.v55f2f80781ef
workflow-durable-task-step:1353.v1891a_b_01da_18
workflow-job:1400.v7fd111b_ec82f
workflow-multibranch:783.787.v50539468395f
workflow-scm-step:427.v4ca_6512e7df1
workflow-step-api:657.v03b_e8115821b_
workflow-support:907.v6713a_ed8a_573
ws-cleanup:0.46

What Operating System are you using (both controller, and any agents involved in the problem)?

Controller OS: Linux - 6.1.0-22-amd64
Agent is Debian 12 based.

Reproduction steps

Create a pipeline job that runs a build on a compute instance provisioned by the GCE plugin. The build pipeline contains multiple stages and takes about 2 hours to complete.

Expected Results

Pipeline should run to completion and build should be marked as successful.

Actual Results

The pipeline fails with the error below:
[2025-01-14T11:43:41.807Z] [ 85%] Building CXX object src/external-repos/mf/NBI/Netconf/CMakeFiles/netc.dir/src/rpc/CloseSessionRpc.cpp.o
[2025-01-14T11:43:43.380Z] Cannot contact gx-dcc-pool-govz8e: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
[2025-01-14T11:44:18.143Z] Agent gx-dcc-pool-govz8e was deleted; cancelling node body
[2025-01-14T11:44:18.143Z] Could not connect to gx-dcc-pool-govz8e to send interrupt signal to process
[Pipeline] End of Pipeline
Agent was removed
org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: 9cf2a749-60ea-47e0-be4f-9d69773b85fb

Anything else?

No response

Are you interested in contributing a fix?

No response

@gbhat618
Contributor

gbhat618 commented Jan 14, 2025

The plugin has a background job called CleanLostNodesWork that runs hourly. It checks GCP VMs tagged with the Jenkins GCE cloud's instanceId and deletes any that are not registered as nodes on the current Jenkins controller.
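To make the failure mode concrete, here is a minimal Python model of the check described above. It is a sketch, not the plugin's code: the `Vm` class, `find_lost_vms`, and the agent names are hypothetical, and the label name `jenkins_cloud_id` is taken from a later comment in this thread.

```python
from dataclasses import dataclass, field

# Label name as mentioned later in this thread; treated as an assumption here.
CLOUD_ID_LABEL = "jenkins_cloud_id"


@dataclass
class Vm:
    """Hypothetical stand-in for a GCP VM: just a name and its labels."""
    name: str
    labels: dict = field(default_factory=dict)


def find_lost_vms(vms, cloud_instance_id, known_node_names):
    """Return VMs labelled with this cloud's instanceId that the
    controller does not know as nodes (the ones a cleanup would delete)."""
    return [
        vm for vm in vms
        if vm.labels.get(CLOUD_ID_LABEL) == cloud_instance_id
        and vm.name not in known_node_names
    ]


# Two controllers (A and B) share one instanceId, e.g. after cloning.
shared_id = "b04fdd72-f3a6-4b0b-963d-1ac2ab339a4f"
vms = [
    Vm("agent-of-a", {CLOUD_ID_LABEL: shared_id}),
    Vm("agent-of-b", {CLOUD_ID_LABEL: shared_id}),
]

# Controller B only knows its own agent, so A's agent looks "lost" to it.
lost_for_b = find_lost_vms(vms, shared_id, known_node_names={"agent-of-b"})
print([vm.name for vm in lost_for_b])
```

In this model, controller B would select controller A's running agent for deletion, which matches the "Agent was removed" symptom in the middle of a long build.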

This issue can occur if you clone your Jenkins controller (e.g., for a testing or staging environment) and the instanceId in the new controller's GCE cloud configuration is not manually updated.

This problem has been reported before and was traced to multiple Jenkins controllers sharing the same GCE cloud instanceId.

If that is the case, change the instanceId on the other controller in $JENKINS_HOME/config.xml, in the entry for the specific GCE cloud. The relevant lines look like this:

<?xml version='1.1' encoding='UTF-8'?>
<hudson>
  ...
  ...
    <com.google.jenkins.plugins.computeengine.ComputeEngineCloud plugin="[email protected]_013">
      <name>gce-mycloud</name>
      <instanceCap>2147483647</instanceCap>
      <projectId>my-dummy-project</projectId>
      <credentialsId></credentialsId>
      <instanceId>b04fdd72-f3a6-4b0b-963d-1ac2ab339a4f</instanceId>
      <configurations>
        <com.google.jenkins.plugins.computeengine.InstanceConfiguration>
      ...
      ...

AFAIK this instanceId is not shown in the GCE cloud configuration UI. Looking at the code, it can be any string, such as controller-0, or a UUID if you prefer; by default the plugin generates a random UUID.
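Since the value is only visible in config.xml, one way to compare two controllers is to extract it from each file. The sketch below assumes the element names shown in the excerpt above; the function names are hypothetical, and the XML declaration is stripped defensively in case the parser only accepts XML 1.0.

```python
import re
import xml.etree.ElementTree as ET

# Element name copied from the config.xml excerpt in this thread.
CLOUD_TAG = "com.google.jenkins.plugins.computeengine.ComputeEngineCloud"


def gce_instance_ids(config_xml: str) -> dict:
    """Map each GCE cloud <name> to its <instanceId> in a config.xml string."""
    # Drop the leading XML declaration (Jenkins writes version='1.1').
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", config_xml, count=1)
    root = ET.fromstring(body)
    return {
        cloud.findtext("name"): cloud.findtext("instanceId")
        for cloud in root.iter(CLOUD_TAG)
    }


def shared_instance_ids(config_a: str, config_b: str) -> set:
    """Return instanceId values that appear in both controllers' configs."""
    ids_a = {v for v in gce_instance_ids(config_a).values() if v}
    ids_b = {v for v in gce_instance_ids(config_b).values() if v}
    return ids_a & ids_b
```

For example, read each controller's `$JENKINS_HOME/config.xml` into a string (with `pathlib.Path.read_text`, say) and pass both to `shared_instance_ids`; a non-empty result means the two controllers would treat each other's agents as their own.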

Reference: GitHub issue comment; see also the follow-up comments there.

Questions to investigate:

  • How long have you been using the GCE plugin, when did the issue start, and how often does it occur?
  • In GCP Logging, does the deletion request come from the Jenkins controller or elsewhere?

@gbhat618
Contributor

gbhat618 commented Jan 14, 2025

If the problem is indeed multiple controllers having the same instanceId ⏫, note that after you update the instanceId the controller must be restarted to load the new value. I just tested this: without a restart, the old instanceId keeps appearing.
Also, the instanceId need not be a UUID; it can be any string. It appears as the value of the jenkins_cloud_id label on the VM, which you can check to confirm the update.
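
One possible way to do that check across many VMs is to group them by the label value. This is a sketch under assumptions: the input is a JSON array of instance objects with `name` and `labels` fields (the shape that `gcloud compute instances list --format=json` is expected to produce), and `group_by_cloud_id` is a hypothetical helper.

```python
import json
from collections import defaultdict


def group_by_cloud_id(instances_json: str) -> dict:
    """Group VM names by the value of their jenkins_cloud_id label.

    VMs without the label are skipped.
    """
    groups = defaultdict(list)
    for inst in json.loads(instances_json):
        cloud_id = (inst.get("labels") or {}).get("jenkins_cloud_id")
        if cloud_id is not None:
            groups[cloud_id].append(inst["name"])
    return dict(groups)
```

After the restart, newly provisioned agents should show up under the new instanceId; VMs still listed under the old value were created before the change.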
