Skip to content

[apiserver][feat] Add read and write timeouts to HTTP server #3350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

owenowenisme
Copy link
Contributor

Why are these changes needed?

Related issue number

Closes #3306

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@owenowenisme
Copy link
Contributor Author

@dentiny PTAL
Also, I don't know is 5 seconds for read & write is suitable for us, please let me know if the timeout need to be changed.

@owenowenisme
Copy link
Contributor Author

owenowenisme commented Apr 13, 2025

srv := &http.Server{
Addr: *httpPortFlag,
Handler: topMux,
ReadTimeout: 0, // No timeout
WriteTimeout: 0, // No timeout
IdleTimeout: 0, // No timeout
}
// Start the server.
if err := srv.ListenAndServe(); err != nil {
klog.Fatal(err)
}

Haha just saw you added the timeout
I'll just close this pr then.

@dentiny
Copy link
Contributor

dentiny commented Apr 13, 2025

Hi @owenowenisme , my PR change is only made to make linter happy thus no-op change, while you're adding timeout and delivering true value. :)

server := &http.Server{
Addr: *httpPortFlag,
Handler: topMux,
ReadTimeout: 5 * time.Second,
Copy link
Contributor

@dentiny dentiny Apr 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an ideal recommendation here, usually there're two approaches to set timeout:

  1. Set timeout to a large value so it behaves the same as no timeout for most cases, but does help cut down tail latency and trigger retry (i.e. half a minute) (could decrease timeout later after collection stats and user feedbacks);
  2. (more scientific) have apiserver and kuberay operator deployed, to check what's the rough duration for operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @kevin85421 / @MortalHappiness / @rueian might have much more production experience with kuberay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can parse the value from an env variable. And the default 0 is fine, I believe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the default 0 is fine,

@rueian, would you mind explaining why defaulting to 0 is acceptable? To make sure we are on the same page,

https://pkg.go.dev/net/http#Server

// ReadTimeout is the maximum duration for reading the entire
// request, including the body. A zero or negative value means
// there will be no timeout.

I would prefer to default to a positive timeout to prevent the server from being blocked by a single request for an extended period. In addition, I prefer not to expose the environment variable until we determine that it is necessary.

@dentiny or @owenowenisme, would you mind helping me understand the meaning of ReadTimeout and WriteTimeout here? I have read https://pkg.go.dev/net/http#Server, but I'd like to know what 'request' and 'response' refer to in the context of the KubeRay API server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that it is the user's responsibility to decide the value they want, so I think it is fine for us to leave it as default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general idea is, we should always apply a limit on resource and operation; otherwise server could get overloaded by slow / malicious clients.

Copy link
Member

@kevin85421 kevin85421 Apr 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think 0 is ok for now, because it's current behavior (no timeout); I would like to reduce risk.

For me, getting stuck on a single request is much more expensive than having a request fail. KubeRay has had related issues complained by users regarding the requests between the KubeRay operator and the Ray dashboard. Hence, I still prefer to set a default timeout.

Set a long enough timeout, as I mentioned #3350 (comment)

SG.

Use flag instead of env to allow user configuration, because env (if used improperly) is shared among all sessions, and prune to security issue.

We can expose this configuration only if some users request it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the default value, could @owenowenisme investigate the size of requests/responses in different situations so that we can set a reasonable timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll work on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think we should provide a default timeout, but also give users an option to configure it.

@kevin85421
Copy link
Member

Btw, @owenowenisme would you mind fixing the conflict?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Configure operations timeout for apiserver HTTP server
5 participants