Skip to content

rpk: add --wait to remote-bundle start #26486

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from
Open

rpk: add --wait to remote-bundle start #26486

wants to merge 2 commits into from

Conversation

JFlath
Copy link
Contributor

@JFlath JFlath commented Jun 17, 2025

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.1.x
  • v24.3.x
  • v24.2.x

Release Notes

Improvements

  • Allows waiting for collection to complete when running a remote-bundle

@JFlath
Copy link
Contributor Author

JFlath commented Jun 17, 2025

@r-vasquez As ever, I'll have made some mistakes with bit/build. This is branched from remote_bundle_upload, so includes that change too. Can either rebase, or we can just merge that in first.

Either way though, more broadly, does the approach here work for you?

ready, errorred := filterCompletedBrokers(status)
if len(ready)+len(errorred) == len(status) {
if len(ready) == 0 {
fmt.Printf(`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using out.Die() be the better option here to exit with 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of approach, I actually think we need to exit 1 in this case, having thought about it some more. If out.Die() is the best option for that, great. If not, let me know :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, either out.Die or os.Exit(1). But os.Die is effectively the same so let's use that 👍

@vbotbuildovich
Copy link
Collaborator

CI test results

test results on build#67463
test_class test_method test_arguments test_kind job_url test_status passed reason
TopicDeleteCloudStorageTest topic_delete_installed_snapshots_test ducktape https://buildkite.com/redpanda/redpanda/builds/67463#01977e21-bd90-47d5-a1b9-1796af54ff9b FLAKY 20/21 upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS

ready, errorred := filterCompletedBrokers(status)
if len(ready)+len(errorred) == len(status) {
if len(ready) == 0 {
fmt.Printf(`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, either out.Die or os.Exit(1). But os.Die is effectively the same so let's use that 👍

Waiting for collection to complete...
`, jobID)

for {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually dislike 'naked' infinite loops without any escape hatch, what do you think of:

  1. Listening to the cmd.Context() cancellation.
  2. Having a timeout (probably configurable through flags)
  3. having a parameter that breaks the for loop so you don't have to wait for the next poll to finish the loop

I'm thinking in something like:

// context with timeout
ctx, cancel := context.WithTimeout(cmd.Context(), timeout)
defer cancel()
var done bool

for !done {
    select {
    case <-ctx.Done():
        // Context canceled or timed out; return or handle error
        return ctx.Err()

    default:
     
        select {
        case <-ctx.Done():
            // Context canceled during wait
            return ctx.Err()
        case <-time.After(10 * time.Second):
        //  Poll:
        //   status, err := executeBundleStatus(ctx, fs, p)
        //   handle error, right now is not being handled, we can break early if we find something bad.
        //    ...
        // if len(ready)+len(errorred) == len(status) {
        //     if len(ready) == 0 {
        //         // Print "no bundles created" message
        //     }
        //     done = true
        // }
        }
    }
}

I see that we start polling initially after 10s, as (I assume) this is a long-running operation and won't be ready right away, but alternatively, you can poll right away with the suggestion above. Just have the select at the end of the polling logic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what do you think of printing a debug log (or straight to stdout) with the progress:

fmt.Printf("Waiting for collection... %v/%v ready\n", len(ready), len(status))

We can also be fancy and use carriage return to update the message

for !done {
    // ... polling 
    
    fmt.Printf("\rWaiting for collection... %v/%v ready", len(ready), len(status))

    // ... sleep ...
}

fmt.Println() // move to new line.
fmt.Println("Debug bundle collection successful, you can find the debug bundle in ....")

rpk debug remote-bundle status
`, jobID)
}
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we break here, we are not printing any message, we should confirm before ending that everything went right and where the bundle is located.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants