add diagnosis script #5683

hisImminence · 2025-05-01T11:35:17Z

Description

closes https://github.com/camunda/team-distribution/issues/495

This script automates the collection of logs and diagnostics from a Camunda Helm Chart deployment in a Kubernetes cluster. It gathers relevant information from the specified namespace and outputs it in a .zip file for easy sharing with the Camunda Support team.

Once this change is approved, I will backport it to the other versions.

Example output file: camunda-diagnostics-logs-20250501-082547.zip

Terminal output:

========================================
Camunda Diagnostics Collection Script
========================================
Namespace: immi-test
Output Directory: camunda-diagnostics-logs-20250501-082547
Current kubectl context: gke_camunda-distribution_europe-west1-b_distro-ci
========================================
Collecting resource information...
  - Collecting pod information (current state of all pods in the namespace).
  - Collecting cluster events (recent events in the namespace).
  - Collecting Persistent Volume Claims (PVCs) descriptions (storage claims in the namespace).
  - Collecting service information (list of services in the namespace).
  - Collecting detailed service descriptions (configuration of services).
  - Collecting endpoint information (list of endpoints in the namespace).
  - Collecting detailed endpoint descriptions (configuration of endpoints).
  - Collecting ingress descriptions (configuration of ingress resources).
  - Collecting config map information (configuration data stored in the namespace).
  - Collecting Persistent Volumes (PVs)
    - Collecting information for PV: pvc-0264b211-65ec-43f9-984d-99979ebd75d7
    - Collecting information for PV: pvc-5969ec94-d963-4fc0-ad08-a871df01af79
    - Collecting information for PV: pvc-4b1e1971-fd13-4159-9d43-a8efa2bb56fb
    - Collecting information for PV: pvc-4c4f92cf-1d05-4640-8820-7db2394b44f9
  - Collecting node information for nodes...
    - Collecting information for node: gke-distro-ci-workflow-preemptible02-ba8c70c7-bkbp
    - Collecting information for node: gke-distro-ci-workflow-preemptible02-ba8c70c7-cpvf
    - Collecting information for node: gke-distro-ci-workflow-preemptible02-ba8c70c7-gc6n
    - Collecting information for node: gke-distro-ci-workflow-preemptible02-ba8c70c7-nnbk
    - Collecting information for node: gke-distro-ci-workflow-preemptible02-ba8c70c7-pbkp
    - Collecting information for node: gke-distro-ci-workflow-spot02-001b92b6-2vqz
    - Collecting information for node: gke-distro-ci-workflow-spot02-001b92b6-5wjh
    - Collecting information for node: gke-distro-ci-workflow-spot02-001b92b6-8rbx
    - Collecting information for node: gke-distro-ci-workflow-spot02-001b92b6-hr4x
    - Collecting information for node: gke-distro-ci-workflow-spot02-001b92b6-mdxq
    - Collecting information for node: gke-distro-ci-workflow-spot02-001b92b6-szx5
    - Collecting information for node: gke-distro-ci-workflow-spot02-001b92b6-vs66
    - Collecting information for node: gke-distro-ci-workflow-spot02-001b92b6-wddz
  - Collecting logs and descriptions for each pod...
    - Collecting logs for pod: camunda-connectors-7fd4f6f6fd-lcls2
    - Collecting logs for pod: camunda-elasticsearch-master-0
    - Collecting logs for pod: camunda-identity-7bb4cc6dcc-cgbgl
    - Collecting logs for pod: camunda-keycloak-0
    - Collecting logs for pod: camunda-operate-d6b454f46-4nlmn
    - Collecting logs for pod: camunda-optimize-f99fdcd68-cns7q
    - Collecting logs for pod: camunda-postgresql-0
    - Collecting logs for pod: camunda-postgresql-web-modeler-0
    - Collecting logs for pod: camunda-tasklist-57765cd549-8cqdx
    - Collecting logs for pod: camunda-web-modeler-restapi-bf767b4b4-lmggt
    - Collecting logs for pod: camunda-web-modeler-webapp-97866d889-zm4xk
    - Collecting logs for pod: camunda-web-modeler-websockets-5cccddcf5f-lvlst
    - Collecting logs for pod: camunda-zeebe-0
    - Collecting logs for pod: camunda-zeebe-gateway-797bc6488f-qgkm7
All logs and descriptions collected.
Compressing collected diagnostics into camunda-diagnostics-logs-20250501-082547.zip...
Diagnostics collected and compressed into camunda-diagnostics-logs-20250501-082547.zip.
========================================
Diagnostics collection completed.
Please share the file 'camunda-diagnostics-logs-20250501-082547.zip' with the Camunda Support team.

To clean up the generated files and folder, run the following command:
  rm -rf camunda-diagnostics-logs-20250501-082547 camunda-diagnostics-logs-20250501-082547.zip
========================================
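
For reviewers who want the overall flow at a glance, here is a minimal sketch of the collection steps shown in the terminal output. It is not the script from this PR: the function names, file names, and the `date` format are assumptions inferred from the example output name, and only the pod step is shown.

```shell
# Build a timestamped output name in the style of the example above,
# e.g. camunda-diagnostics-logs-20250501-082547 (date format is an assumption).
diagnostics_dir_name() {
  printf 'camunda-diagnostics-logs-%s\n' "$(date +%Y%m%d-%H%M%S)"
}

# Save the description and logs of every pod in a namespace.
# The file naming scheme here is hypothetical.
collect_pod_logs() {
  namespace=$1
  out_dir=$2
  mkdir -p "$out_dir"
  # `-o name` prints one "pod/<name>" entry per line.
  for pod in $(kubectl get pods -n "$namespace" -o name); do
    name=${pod#pod/}
    kubectl describe "$pod" -n "$namespace" > "$out_dir/$name-describe.txt" 2>&1
    kubectl logs "$pod" -n "$namespace" --all-containers > "$out_dir/$name.log" 2>&1
  done
}

# Compress the collected directory into <dir>.zip (requires the zip utility).
package_diagnostics() {
  zip -r "$1.zip" "$1"
}
```

A run would then look roughly like `dir=$(diagnostics_dir_name); collect_pod_logs immi-test "$dir"; package_diagnostics "$dir"`.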

When should this change go live?

This is a bug fix, security concern, or something that needs urgent release support. (add bug or support label)
This is already available but undocumented and should be released within a week. (add available & undocumented label)
This is on a specific schedule and the assignee will coordinate a release with the Documentation team. (create draft PR and/or add hold label)
This is part of a scheduled alpha or minor. (add alpha or minor label)
There is no urgency with this change. (add low prio label)

PR Checklist

My changes are for an upcoming minor release and are in the /docs directory (version 8.8).
My changes are for an already released minor and are in a /versioned_docs directory.

I added a DRI, team, or delegate as a reviewer for technical accuracy and grammar/style:
- Engineering team review
- Technical writer review via @camunda/tech-writers unless working with an embedded writer.

github-actions · 2025-05-01T11:35:40Z

👋 🤖 ✅ Looks like the changes were ported across versions, nice job! 🎉

You can read more about the versioning within our docs in our documentation guidelines.

docs/self-managed/operational-guides/troubleshooting/diagnostics.md

akeller · 2025-05-02T19:03:23Z

@hisImminence I added the deploy label for easier reviews. Please tag tech-writers as a reviewer when you are ready for us (I noticed a few issues with grammar/syntax).

gustavo-camunda · 2025-05-05T08:12:30Z

Hi @hisImminence,

Looks good in general, thanks! I noticed that if a Pod has been restarted, the logic that iterates over nodes incorrectly picks up the restart age as a node name. For example, in the following scenario:

kubectl get pod -A -o wide

NAMESPACE            NAME                                                           READY   STATUS    RESTARTS        AGE     IP            NODE                                   NOMINATED NODE   READINESS GATES
camunda-platform     camunda-tasklist-79bb8c9d85-sx6cn                              1/1     Running   1 (2d21h ago)   2d21h   10.244.1.5    camunda-platform-local-worker          <none>           <none>

Then "2d21h" will be picked up as a node name. Script output:

  - Collecting node information:
    - Collecting information for node: 2d21h
Error from server (NotFound): nodes "2d21h" not found
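
The column shift can be reproduced without a cluster. The sketch below assumes the node name was taken as a fixed whitespace-separated field (the original parsing is not shown in this thread, so `awk '{print $8}'` is an assumption). The first row is the one from the comment above; the second is a hypothetical pod with no restarts.

```shell
# Sample rows in `kubectl get pod -A -o wide` layout (header omitted).
restarted='camunda-platform camunda-tasklist-79bb8c9d85-sx6cn 1/1 Running 1 (2d21h ago) 2d21h 10.244.1.5 camunda-platform-local-worker <none> <none>'
clean='camunda-platform camunda-zeebe-0 1/1 Running 0 2d21h 10.244.1.6 camunda-platform-local-worker <none> <none>'

# Assumed extraction: treat the 8th whitespace-separated field as the node name.
node_by_position() {
  printf '%s\n' "$1" | awk '{print $8}'
}

# Without restarts, field 8 is the NODE column.
node_by_position "$clean"
# With a restart, "1 (2d21h ago)" occupies three fields, so field 8 is now AGE.
node_by_position "$restarted"
```

Any positional parsing of the human-readable table has this weakness, because a single column can contain spaces.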

hisImminence · 2025-05-05T18:37:48Z

Great catch! I fixed it by reading the node name from its column directly:
for node in $(kubectl get pods -n "$namespace" -o custom-columns=":spec.nodeName" --no-headers | sort | uniq); do
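
For context, the one-line fix above could sit in a loop like the following. This is a sketch under assumptions, not the merged script: the function names and output file layout are hypothetical, and the `<none>` filter is an extra safeguard for pods that have not been scheduled yet, since kubectl prints `<none>` for a missing `spec.nodeName`.

```shell
# List the distinct nodes that host pods in a namespace.
list_pod_nodes() {
  kubectl get pods -n "$1" -o custom-columns=":spec.nodeName" --no-headers \
    | awk 'NF && $1 != "<none>"' \
    | sort -u
}

# Describe each node into its own file under the given directory
# (the file naming scheme is hypothetical).
collect_node_info() {
  namespace=$1
  out_dir=$2
  mkdir -p "$out_dir"
  for node in $(list_pod_nodes "$namespace"); do
    kubectl describe node "$node" > "$out_dir/node-$node.txt" 2>&1
  done
}
```

Because `custom-columns` returns only the requested field, the result no longer depends on how many words the RESTARTS column contains.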

jessesimpson36

I'm good with these changes / tried them locally.

mesellings

Approved! Just a few comments/suggestions 🚀

docs/self-managed/operational-guides/troubleshooting/diagnostics.md

hisImminence · 2025-05-06T17:54:48Z

Super! Thank you @mesellings - all your reviews made sense to me :)

hisImminence · 2025-05-06T18:53:26Z

P.S. I need one more review to unblock the merge.

github-actions · 2025-05-06T19:16:43Z

🧹 Preview environment for this PR has been torn down.

github-actions bot assigned hisImminence May 1, 2025

hisImminence marked this pull request as ready for review May 1, 2025 11:36

hisImminence requested a review from a team May 1, 2025 11:36

github-actions bot reviewed May 1, 2025

docs/self-managed/operational-guides/troubleshooting/diagnostics.md (outdated, resolved)

hisImminence requested a review from gustavo-camunda May 1, 2025 11:59

hisImminence force-pushed the add-diagnostic-script branch from 24ce35a to dd43517 Compare May 1, 2025 22:07

github-actions bot reviewed May 1, 2025

docs/self-managed/operational-guides/troubleshooting/diagnostics.md (resolved)

github-actions bot reviewed May 1, 2025

docs/self-managed/operational-guides/troubleshooting/diagnostics.md (outdated, resolved)

hisImminence force-pushed the add-diagnostic-script branch from 7aeaec3 to 53dc549 Compare May 2, 2025 10:25

akeller added the deploy and component:self-managed labels May 2, 2025

github-actions bot temporarily deployed to camunda-docs May 2, 2025 19:10 Destroyed

hisImminence force-pushed the add-diagnostic-script branch from 53dc549 to ada7984 Compare May 5, 2025 00:09

hisImminence added 8 commits May 5, 2025 15:36

add diagnosis script

2993c05

add linter fixes

73e64ad

add to sidebars

4a0374e

more review dog edits

daf7181

fix more ci issues

6ea8716

fix namespace wording

32d2940

fix: set namespace as cmd line param

3ca4a81

fix: use spec.nodeName

17d0a03

hisImminence force-pushed the add-diagnostic-script branch from 05bbba8 to 17d0a03 Compare May 5, 2025 18:36

hisImminence requested a review from akeller May 5, 2025 18:38

akeller added this to Documentation Team May 5, 2025

akeller moved this to 👀 In Review in Documentation Team May 5, 2025

akeller requested review from a team and removed request for akeller May 5, 2025 18:40

hisImminence and others added 2 commits May 5, 2025 15:47

fix: add diagnistics script to verions 8.4 to 8.7

160e0de

fix: npm doesnt compile without escaping <namespace>

a1b1e8e

jessesimpson36 previously approved these changes May 5, 2025

github-actions bot temporarily deployed to camunda-docs May 5, 2025 19:21 Destroyed

mesellings previously approved these changes May 6, 2025

Update docs/self-managed/operational-guides/troubleshooting/diagnosti…

0deb7dc

…cs.md Co-authored-by: Mark Sellings <[email protected]>

hisImminence dismissed stale reviews from mesellings and jessesimpson36 via 0deb7dc May 6, 2025 17:52

hisImminence and others added 3 commits May 6, 2025 14:53

Update docs/self-managed/operational-guides/troubleshooting/diagnosti…

ed4b30d

…cs.md Co-authored-by: Mark Sellings <[email protected]>

Update docs/self-managed/operational-guides/troubleshooting/diagnosti…

dfcd701

…cs.md Co-authored-by: Mark Sellings <[email protected]>

Update docs/self-managed/operational-guides/troubleshooting/diagnosti…

5ee7aa4

…cs.md Co-authored-by: Mark Sellings <[email protected]>

hisImminence enabled auto-merge (squash) May 6, 2025 17:54

github-actions bot temporarily deployed to camunda-docs May 6, 2025 18:01 Destroyed

jessesimpson36 approved these changes May 6, 2025

hisImminence merged commit a2d8782 into main May 6, 2025
9 checks passed

hisImminence deleted the add-diagnostic-script branch May 6, 2025 19:12

github-project-automation bot moved this from 👀 In Review to ✅ Done in Documentation Team May 6, 2025

mesellings mentioned this pull request May 9, 2025

fix: add diagnistics script to sidebars #5741

Merged

10 tasks

gustavo-camunda mentioned this pull request May 28, 2025

Enhance diagnostics script #5891

Merged

9 tasks
