refactor: Update the response queue in the server to reuse response slots #7879
Conversation
Force-pushed from a9ab1e8 to 32490e1, then from 1948b34 to 596925a.
The PR was automatically closed by a forced rebase onto main, then reopened with a new commit.
@@ -127,6 +127,45 @@ for trial in $TRIALS; do
    kill $SERVER_PID
    wait $SERVER_PID

    SERVER_ARGS="--model-repository=$MODELDIR --grpc-max-response-pool-size=1"
Can you add a test plan to the description explaining what we are trying to test here? Why have we set --grpc-max-response-pool-size only to 1?
Also, can we add a test to confirm that the memory footprint decreases when using --grpc-max-response-pool-size versus not using it?
Also, can we add a test to confirm that the memory footprint decreases when using --grpc-max-response-pool-size versus not using it?
I have added a new test case in L0_memory to evaluate memory utilization when running the server with different values for --grpc-max-response-pool-size (1, 25, and 50), as well as without this flag.
Why have we set --grpc-max-response-pool-size only to 1?
Regarding setting --grpc-max-response-pool-size to 1: I included this specific test to evaluate the lowest possible value. Also, running the decoupled model tests takes a long time, and testing additional pool sizes leads to timeouts. The new test case in L0_memory covers different values.
Also, I have updated the description with higher-level details about the tests. Please let me know if we are missing something.
Can we also update the docs for
I have updated the documentation to include details about both the
Nice work @pskiran1!
What does the PR do?
The current response queue allocates memory for each response.
This PR aims to enhance the response queue by reusing response slots across multiple responses within the same request once they have been written (completed) to the network. This may help reduce active memory utilization.
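As a rough illustration of the slot-reuse idea, here is a minimal C++ sketch. The class and member names used here (ResponseQueue, Response, owned_, and so on) are simplified stand-ins, not Triton's actual gRPC frontend types; the real implementation additionally handles locking, gRPC write state, and protobuf messages.

```cpp
#include <cstddef>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for the gRPC response protobuf.
struct Response {
  std::string payload;
  void Clear() { payload.clear(); }
};

// Simplified response queue that recycles completed response slots
// instead of allocating a fresh response for every message.
class ResponseQueue {
 public:
  // max_pool_size bounds how many idle slots are kept for reuse,
  // mirroring the intent of --grpc-max-response-pool-size.
  explicit ResponseQueue(std::size_t max_pool_size)
      : max_pool_size_(max_pool_size) {}

  // Reuse a pooled response if one is available; otherwise allocate.
  Response* AllocateResponse() {
    if (!reusable_pool_.empty()) {
      Response* r = reusable_pool_.front();
      reusable_pool_.pop();
      return r;
    }
    owned_.push_back(std::make_unique<Response>());
    ++total_allocations_;
    return owned_.back().get();
  }

  // Called once a response has been written to the network: clear its
  // contents and return the slot to the pool (bounded by max_pool_size_).
  void PopResponse(Response* r) {
    r->Clear();
    if (reusable_pool_.size() < max_pool_size_) {
      reusable_pool_.push(r);
    }
  }

  std::size_t TotalAllocations() const { return total_allocations_; }

 private:
  std::size_t max_pool_size_;
  std::size_t total_allocations_ = 0;
  std::queue<Response*> reusable_pool_;
  std::vector<std::unique_ptr<Response>> owned_;
};
```

With a pool size of 1, a long sequence of responses that are written out one at a time can be served from a single allocation once the first slot is recycled, which is the memory saving the new flag targets.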
- In the PopResponse() function, we clear the response content and return it to the reusable pool.
- In the AllocateResponse() function, we check for any available responses in the reusable pool; if one is present, we use it; otherwise, we allocate a new response.
- Added a new option (--grpc-max-response-pool-size) to limit the number of active response protobuf allocations in the gRPC response queue.

Test cases:
- L0_decoupled: Runs the existing decoupled-mode test cases with and without the new --grpc-max-response-pool-size flag.
- L0_memory: Evaluates memory utilization when running the server with different values for --grpc-max-response-pool-size (1, 25, and 50), as well as without the flag.

Checklist
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)