antctl Support Bundle Should Collect Logs from Crashed Agent and Controller Containers #3624

Closed
edwardbadboy opened this issue Apr 12, 2022 · 2 comments
Assignees
Labels
  • area/component/antctl: Issues or PRs related to the command line interface component
  • area/ops: Issues or PRs related to features which support network operations and troubleshooting
  • good first issue: Good for newcomers
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@edwardbadboy
Contributor

Describe the bug

When antctl runs in remote mode, it has to connect to the APIServer of each remote agent or controller to fetch the support bundle. If an agent or controller is down, antctl only records the name of the failed Node in the support bundle. I'd like to suggest an improvement: when antctl cannot connect to the agent/controller APIServer, it could fall back to fetching the container logs through the K8s API. For example:

// Fall back to the K8s API and stream the container's logs.
logOption := &v1.PodLogOptions{Container: containerName, SinceSeconds: ...}
req := k8sClientset.CoreV1().Pods("kube-system").GetLogs(podName, logOption)
logStream, err := req.Stream(context.TODO())
...

Another improvement: the support bundle function controllerRemoteRunE should run getClusterInfo even if earlier steps fail. This covers a corner case (or arguably a user error) in which an incompatible antctl is used to collect agent and controller logs. antctl may then fail to create the clients for AntreaAgentInfo and AntreaControllerInfo and collect nothing. If it at least ran getClusterInfo in this case, we could read the Antrea version from the collected Pod manifests and get a clue.

These improvements are minor, but they would save some communication round-trips.

To Reproduce
Edit the antrea-controller Deployment and change the antrea-controller container command to "false", so that the container exits immediately. Then run:

$ antctl supportbundle --controller-only -d test-bundle
Controller Info Failed Reason: Post "https://10.10.10.10:10349/apis/system.antrea.io/v1beta1/supportbundles": dial tcp 10.176.26.253:10349: connect: connection refused
Finish [--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------] 100.00% 200ms
Error: no data was collected: Post "https://10.10.10.10:10349/apis/system.antrea.io/v1beta1/supportbundles": dial tcp 10.176.26.253:10349: connect: connection refused;

Expected
antctl should also request the log stream from the K8s API.

Actual behavior
antctl only reports an error.

Versions:
Please provide the following information:

  • Antrea version: v1.5.2, v1.6.0
@edwardbadboy added the kind/bug label Apr 12, 2022
@edwardbadboy changed the title from "antctl Support Bundle Collect Logs from Crashed Agent and Controller Containers" to "antctl Support Bundle Should Collect Logs from Crashed Agent and Controller Containers" Apr 12, 2022
@antoninbas added the good first issue, area/component/antctl, and area/ops labels Apr 12, 2022
@hangyan
Member

hangyan commented Apr 13, 2022

working on this

@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

@github-actions bot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) Jul 13, 2022
@github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Oct 11, 2022
@luolanzone added the lifecycle/frozen label and removed the lifecycle/stale label Nov 15, 2024
@hangyan reopened this Nov 20, 2024
@antoninbas added this to the Antrea v2.3 release milestone Nov 20, 2024
@hangyan closed this as completed Feb 7, 2025