refactor: Update the response queue in the server to reuse response slots #7879
Conversation
Force-pushed from a9ab1e8 to 32490e1, then from 1948b34 to 596925a.
The PR was automatically closed by a forced rebase onto main, then reopened with a new commit.
@@ -127,6 +127,45 @@ for trial in $TRIALS; do
    kill $SERVER_PID
    wait $SERVER_PID

    SERVER_ARGS="--model-repository=$MODELDIR --grpc-max-response-pool-size=1"
Can you add a test plan to the description explaining what we are trying to test here? Why have we set --grpc-max-response-pool-size only to 1?
Also, can we add a test to confirm that the memory footprint decreases when using --grpc-max-response-pool-size versus not using it?
Also, can we add a test to confirm that the memory footprint decreases when using --grpc-max-response-pool-size versus not using it?
I have added a new test case in L0_memory to evaluate memory utilization when running the server with different values for --grpc-max-response-pool-size (1, 25, and 50), as well as without this flag.
Why have we set --grpc-max-response-pool-size only to 1?
Regarding setting --grpc-max-response-pool-size to 1: I included this specific test to evaluate the lowest possible value. Also, running the decoupled model tests takes a long time, and testing additional pool sizes leads to timeouts. The new test case in L0_memory covers different values.
Also, I have updated the description with higher-level details about the tests. Please let me know if we are missing something.
Can we also update the docs for
I have updated the documentation to include details about both the
Nice work @pskiran1!
What does the PR do?
The current response queue allocates memory for each response.
This PR aims to enhance the response queue by reusing response slots across multiple responses within the same request once they have been written (completed) to the network. This may help reduce active memory utilization.
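As a rough illustration of the slot-reuse idea, here is a minimal C++ sketch. The class and member names used here (ResponseQueue, Response, owned_, and so on) are simplified stand-ins, not Triton's actual gRPC frontend types; the real implementation additionally handles locking, gRPC write state, and protobuf messages.

```cpp
#include <cstddef>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for the gRPC response protobuf.
struct Response {
  std::string payload;
  void Clear() { payload.clear(); }
};

// Simplified response queue that recycles completed response slots
// instead of allocating a fresh response for every message.
class ResponseQueue {
 public:
  // max_pool_size bounds how many idle slots are kept for reuse,
  // mirroring the intent of --grpc-max-response-pool-size.
  explicit ResponseQueue(std::size_t max_pool_size)
      : max_pool_size_(max_pool_size) {}

  // Reuse a pooled response if one is available; otherwise allocate.
  Response* AllocateResponse() {
    if (!reusable_pool_.empty()) {
      Response* r = reusable_pool_.front();
      reusable_pool_.pop();
      return r;
    }
    owned_.push_back(std::make_unique<Response>());
    ++total_allocations_;
    return owned_.back().get();
  }

  // Called once a response has been written to the network: clear its
  // contents and return the slot to the pool (bounded by max_pool_size_).
  void PopResponse(Response* r) {
    r->Clear();
    if (reusable_pool_.size() < max_pool_size_) {
      reusable_pool_.push(r);
    }
  }

  std::size_t TotalAllocations() const { return total_allocations_; }

 private:
  std::size_t max_pool_size_;
  std::size_t total_allocations_ = 0;
  std::queue<Response*> reusable_pool_;
  std::vector<std::unique_ptr<Response>> owned_;
};
```

With a pool size of 1, a long sequence of responses that are written out one at a time can be served from a single allocation once the first slot is recycled, which is the memory saving the new flag targets.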
- In the PopResponse() function, we clear the response content and return it to the reusable pool.
- In the AllocateResponse() function, we check for any available responses in the reusable pool; if one is present, we use it; otherwise, we allocate a new response.
- Added a new option (--grpc-max-response-pool-size) to limit the number of active response protobuf allocations in the gRPC response queue.

Test cases:
- L0_decoupled: Runs the existing decoupled-mode test cases with and without the new --grpc-max-response-pool-size flag.
- L0_memory: Evaluates memory utilization when running the server with different values for --grpc-max-response-pool-size (1, 25, and 50), as well as without the flag.

Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)