
refactor: Update the response queue in the server to reuse response slots #7879

Merged: 21 commits into main from spolisetty_dlis_7657 on Feb 20, 2025

Conversation

@pskiran1 (Member) commented Dec 13, 2024:

What does the PR do?

The current response queue allocates memory for each response.
This PR aims to enhance the response queue by reusing response slots across multiple responses within the same request once they have been written (completed) to the network. This may help reduce active memory utilization.

  • In the PopResponse() function, we clear the response content and return it to the reusable pool.
  • In the AllocateResponse() function, we check the reusable pool for an available response; if one is present, we reuse it, otherwise we allocate a new response.
  • Introduces a configurable threshold (the --grpc-max-response-pool-size option) to limit the number of active response protobuf allocations in the gRPC response queue. A sketch of this scheme follows the list.
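
For illustration, here is a minimal Python sketch of the reuse scheme described above. The actual implementation is in the C++ gRPC frontend; the class internals below, including how the cap is enforced when the pool is exhausted, are assumptions made only to show the idea.

```python
from collections import deque


class ResponseQueue:
    """Toy model of a response queue that recycles response slots."""

    def __init__(self, max_pool_size=None):
        # Mirrors --grpc-max-response-pool-size; None models the flag
        # being unset (no cap on allocations), an assumption here.
        self.max_pool_size = max_pool_size
        self.reusable_pool = deque()
        self.num_allocated = 0

    def AllocateResponse(self):
        # Reuse a completed response slot when one is available.
        if self.reusable_pool:
            return self.reusable_pool.popleft()
        if self.max_pool_size is not None and self.num_allocated >= self.max_pool_size:
            # The real queue would defer the allocation until a slot is
            # recycled; the exact waiting mechanism is not modeled here.
            return None
        self.num_allocated += 1
        return {}  # stand-in for a freshly allocated response protobuf

    def PopResponse(self, response):
        # Called once the response has been written to the network:
        # clear its content and return the slot to the reusable pool.
        response.clear()
        self.reusable_pool.append(response)
```

With a small cap, a request streaming many responses keeps only a handful of response protobufs alive at a time instead of one per response, which is where the reduction in active memory comes from.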

Test cases:
L0_decoupled:

  • Added a test case to verify the behavior of --grpc-max-response-pool-size in decoupled mode; it runs the existing test cases both with and without this flag.

L0_memory:

  • Added a test case that measures memory utilization when running the server with --grpc-max-response-pool-size set to 1, 25, and 50, as well as with the flag unset, so memory usage can be compared across configurations.
  • The script loops over the pool sizes: for each one it starts the server (with or without the flag), begins monitoring the server's memory usage, runs the client, and then terminates the server. A sketch of this flow appears below.
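
The real test is a script in L0_memory; the rough Python sketch below only illustrates the shape of that loop. The server binary invocation, model repository path, and client script are placeholders, and psutil is assumed to be available for memory sampling.

```python
import subprocess
import time

import psutil  # third-party; assumed available for RSS sampling

POOL_SIZES = [1, 25, 50, None]  # None = run without the flag

for pool_size in POOL_SIZES:
    cmd = ["tritonserver", "--model-repository=/models"]  # placeholder path
    if pool_size is not None:
        cmd.append(f"--grpc-max-response-pool-size={pool_size}")
    server = subprocess.Popen(cmd)
    time.sleep(10)  # crude wait for the server to come up

    monitor = psutil.Process(server.pid)
    peak_rss = 0
    client = subprocess.Popen(["python3", "client.py"])  # placeholder client
    while client.poll() is None:
        # Sample the server's resident set size while the client runs.
        peak_rss = max(peak_rss, monitor.memory_info().rss)
        time.sleep(0.5)

    server.terminate()
    server.wait()
    label = pool_size if pool_size is not None else "unset"
    print(f"pool size {label}: peak RSS {peak_rss / (1024 ** 2):.1f} MiB")
```

Comparing peak RSS across the runs shows whether smaller pool sizes translate into lower active memory use.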

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated the GitHub labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

  • CI Pipeline ID: 24179262

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@pskiran1 pskiran1 requested review from tanmayv25 and kthui December 13, 2024 17:38
@pskiran1 pskiran1 requested a review from indrajit96 January 3, 2025 10:10
@pskiran1 pskiran1 closed this Jan 20, 2025
@pskiran1 pskiran1 force-pushed the spolisetty_dlis_7657 branch from 1948b34 to 596925a on January 20, 2025 06:11
@pskiran1 (Member, Author):

The PR was automatically closed by the forced rebase onto main; it has been reopened with a new commit.

@pskiran1 pskiran1 reopened this Jan 20, 2025
@pskiran1 pskiran1 requested a review from kthui January 24, 2025 16:14
@pskiran1 pskiran1 marked this pull request as ready for review February 3, 2025 16:46
@@ -127,6 +127,45 @@ for trial in $TRIALS; do

kill $SERVER_PID
wait $SERVER_PID

SERVER_ARGS="--model-repository=$MODELDIR --grpc-max-response-pool-size=1"
@indrajit96 (Contributor) commented on the diff above, Feb 4, 2025:

Can you add a test plan to the description explaining what we are trying to test here?
Why have we set --grpc-max-response-pool-size only to 1?
Also, can we add a test to confirm that the memory footprint decreases when using --grpc-max-response-pool-size vs. not using it?

@pskiran1 (Member, Author) replied Feb 18, 2025:

> Also, can we add a test to confirm that the memory footprint decreases when using --grpc-max-response-pool-size vs. not using it?

I have added a new test case in L0_memory to evaluate memory utilization when running the server with different values for --grpc-max-response-pool-size (1, 25, and 50), as well as without this flag.

> Why have we set --grpc-max-response-pool-size only to 1?

Regarding setting --grpc-max-response-pool-size to 1: I included this test to exercise the lowest possible value. Running the decoupled model tests already takes a long time, and additional pool sizes lead to timeouts. The new test case in L0_memory covers different values.

@pskiran1 (Member, Author) added:
Also, I updated the description with higher-level details about the tests. Please let me know if anything is missing.

@indrajit96 (Contributor):
Can we also update the docs for --grpc-max-response-pool-size?

@pskiran1 (Member, Author) commented Feb 18, 2025:

> Can we also update the docs for --grpc-max-response-pool-size?

I have updated the documentation to cover both the --grpc-infer-allocation-pool-size and --grpc-max-response-pool-size options. @tanmayv25 and @indrajit96, please review and confirm that the content is correct. Thank you.

@tanmayv25 (Contributor):
Nice work, @pskiran1!

@pskiran1 pskiran1 merged commit 5704238 into main Feb 20, 2025
3 checks passed
@pskiran1 pskiran1 deleted the spolisetty_dlis_7657 branch February 20, 2025 06:27
@pskiran1 pskiran1 added the PR: feat A new feature label Feb 20, 2025