-
Notifications
You must be signed in to change notification settings - Fork 554
[Feature] Add timeout for apiserver grpc server #3427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Add timeout for apiserver grpc server #3427
Conversation
-e Signed-off-by: machichima <[email protected]>
There're three timeout related issues, as far as I am aware of:
|
Got it, I think this PR is working on grpc service timeout, while PR #3350 works on limiting http request? As I saw the comment in PR #3350 that they want to decide a default timeout, I am thinking if the default value here should be the same as what they set? |
yes
No, I don't think they need to be the same |
60 seconds should be a good number ? |
…server-timeout-grpc-server -e Signed-off-by: machichima <[email protected]>
-e Signed-off-by: machichima <[email protected]>
-e Signed-off-by: machichima <[email protected]>
-e Signed-off-by: machichima <[email protected]>
Yes I think its good for default value. I also add an env variable to let user set the timeout value. |
@dentiny PTAL |
case <-ctx.Done(): | ||
// Raise error if time out | ||
if ctx.Err() == context.DeadlineExceeded { | ||
return nil, fmt.Errorf("grpc server timed out") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we name the grpc server with KubeRay API server ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Just changed
apiserver/cmd/main.go
Outdated
@@ -50,11 +51,21 @@ func main() { | |||
_ = flagSet.Set("log_file", *logFile) | |||
} | |||
|
|||
grpcTimeout := 60 * time.Second // Default timeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we following mechanisms to define constants or we are adding to each files where we are using it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quickly glancing over the code, we have constants.go
for other components (i.e. operator)
https://github.com/ray-project/kuberay/blob/ebb5ba441b0a7f888c17aa5c2d33943084a9a2d9/ray-operator/controllers/ray/utils/constant.go
I usually do this in two ways:
- either place it to constant file, just as what we did for kuberay operator
- the benefit of which is we group all constants in one place, rather than scattered around the codebase
- or define a util function
getGrpcServerTimeoutOrDefault
and have default timeout besides- the benefit of which is it's easy to locate all timeout related functions and features
Our codebase seems to prefer (1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I found that in apiserver
, they put constants in config.go
, I'll add it here
kuberay/apiserver/pkg/util/config.go
Lines 10 to 14 in a83d3c1
const ( | |
// Label keys | |
RayClusterNameLabelKey = "ray.io/cluster-name" | |
RayClusterUserLabelKey = "ray.io/user" | |
RayClusterVersionLabelKey = "ray.io/version" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
apiserver/cmd/main.go
Outdated
@@ -50,11 +51,21 @@ func main() { | |||
_ = flagSet.Set("log_file", *logFile) | |||
} | |||
|
|||
grpcTimeout := 60 * time.Second // Default timeout | |||
if timeoutStr := os.Getenv("GRPC_SERVER_TIMEOUT"); timeoutStr != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw why do we use env var instead of flags? I think flags are strictly better in a few ways:
- program checks env variables; for example, bazel uses env to decide whether we could reuse cache
- impose security issue, because env is shared among all processes which could be accessed everywhere
I almost only use env variables when:
- across language boundary
- across process boundary, if no other easier way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the guidance!
The reason why I put it in environment variable instead of flag is because I search through the code base and find they put this (which I think is a bit similar to timeout?) in the environment variable, so I just simply follow what it does
requeueAfterSeconds, err := strconv.Atoi(os.Getenv(utils.RAYCLUSTER_DEFAULT_REQUEUE_SECONDS_ENV)) |
I agree to your points, if there's no other places that need this value, I think I'll just move it to flag instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to flag
apiserver/cmd/main.go
Outdated
grpcTimeout = timeout | ||
klog.Infof("gRPC servier timeout set to %v", grpcTimeout) | ||
} else { | ||
klog.Warningf("Invalid GRPC_SERVER_TIMEOUT value: %v, using default timeout (60 seconds)", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use %d
to print out default value, in case we change in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Just added
-e Signed-off-by: machichima <[email protected]>
-e Signed-off-by: machichima <[email protected]>
-e Signed-off-by: machichima <[email protected]>
-e Signed-off-by: machichima <[email protected]>
…server-timeout-grpc-server -e Signed-off-by: machichima <[email protected]>
-e Signed-off-by: machichima <[email protected]>
-e Signed-off-by: machichima <[email protected]>
-e Signed-off-by: machichima <[email protected]>
-e Signed-off-by: machichima <[email protected]>
let me know when it's ready for review, feel free to ping me any time :) |
-e Signed-off-by: machichima <[email protected]>
select { | ||
case <-time.After(delay): | ||
return "test_response", h.returnErr | ||
case <-ctx.Done(): | ||
var grpcCode codes.Code | ||
switch ctx.Err() { | ||
case context.Canceled: | ||
grpcCode = codes.Canceled | ||
case context.DeadlineExceeded: | ||
grpcCode = codes.DeadlineExceeded | ||
default: | ||
grpcCode = codes.Unknown | ||
} | ||
return nil, status.Error(grpcCode, ctx.Err().Error()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this to mimic the grpc IO handler for testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is automatically updated when running make test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is automatically updated when running make test
apiserver/cmd/main.go
Outdated
@@ -50,11 +51,21 @@ func main() { | |||
_ = flagSet.Set("log_file", *logFile) | |||
} | |||
|
|||
grpcTimeout := 60 * time.Second // Default timeout | |||
if timeoutStr := os.Getenv("GRPC_SERVER_TIMEOUT"); timeoutStr != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to flag
Sorry I just found that I didn't submit the review as comment |
@dentiny PTAL, Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for the help!
And sorry about the delay
func (mr *MockKubernetesClientInterfaceMockRecorder) ConfigMapClient(namespace interface{}) *gomock.Call { | ||
mr.mock.ctrl.T.Helper() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the mock call setup fails (e.g., wrong argument types), Go's testing output will show the line in your
actual test code where the error originated, rather than pointing you to this ConfigMapClient() method in
the mock recorder.
TIL
@@ -61,7 +83,7 @@ func TestAPIServerInterceptor(t *testing.T) { | |||
req, | |||
info, | |||
func(ctx context.Context, req interface{}) (interface{}, error) { | |||
return tt.handler.Handle(ctx, req) | |||
return tt.handler.Handle(ctx, req, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 0 /*delay*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed! Thanks!
@@ -153,7 +175,7 @@ func TestAPIServerInterceptorLogging(t *testing.T) { | |||
"test_request", | |||
info, | |||
func(receivedCtx context.Context, req interface{}) (interface{}, error) { | |||
return handler.Handle(receivedCtx, req) | |||
return handler.Handle(receivedCtx, req, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a comment besides constants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed! Thanks!
…server-timeout-grpc-server -e Signed-off-by: machichima <[email protected]>
-e Signed-off-by: machichima <[email protected]>
PR description needs to be updated :) |
Thanks! Just updated! |
-grpc_timeout
(e.g.-grpc_timeout 30s
), default to 60 secondsWhy are these changes needed?
Currently, there's no timeout setting in gRPC server side. This PR added the timeout to ensure resource access doesn't get overloaded in all cases.
Related issue number
Part of #3344
Checks