Changes to support Ensemble Top Level Response Caching #560

Conversation
// This function sets composing model server stats to 0 in case of a cache hit
// when top level response cache is enabled, since composing models are not
// executed and do not have any stats
void ResetServerStats(ServerSideStats* server_stats);
Is this used? If not let's remove
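For context, a minimal sketch of what such a reset could look like, assuming ServerSideStats holds cumulative counters plus per-composing-model stats (all member names below are illustrative, not the actual perf_analyzer definition):

#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-in for the real ServerSideStats; the actual members
// differ. A vector of pairs is used instead of a map so the
// self-referential member type is well-formed.
struct ServerSideStats {
  uint64_t request_count = 0;
  uint64_t queue_time_ns = 0;
  uint64_t compute_infer_time_ns = 0;
  std::vector<std::pair<std::string, ServerSideStats>> composing_models_stat;
};

// Zero out server-side stats after a top-level cache hit, since the
// composing models never executed and carry no meaningful numbers.
// Recurses so that nested ensembles are reset as well.
void ResetServerStats(ServerSideStats* server_stats)
{
  server_stats->request_count = 0;
  server_stats->queue_time_ns = 0;
  server_stats->compute_infer_time_ns = 0;
  for (auto& composing : server_stats->composing_models_stat) {
    ResetServerStats(&composing.second);
  }
}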
LGTM! Thanks for working on this, Harshini!
<< "cache hit/miss " | ||
<< ensemble_times.total_combined_cache_compute_time_avg_us | ||
<< " usec)" << std::endl; | ||
// FIXME - Refactor these calculations in case of ensemble top level |
It's usually good practice to include the ticket number, if you have one, in follow-up comments like:
// FIXME [DLIS-XXXX]: Fix the calculations for the latency breakdown with ensemble+caching
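For context on the calculation the FIXME refers to, here is a rough sketch of how a combined cache/compute average could be derived; the helper and its parameter names are assumptions for illustration, not the actual perf_analyzer code:

#include <cstdint>

// Rough sketch (assumed names): on a cache hit only the cache lookup runs,
// while on a miss the lookup is followed by compute, so hit time, miss time,
// and compute time all contribute to the combined average.
uint64_t CombinedCacheComputeAvgUs(
    uint64_t cache_hit_time_ns, uint64_t cache_miss_time_ns,
    uint64_t compute_time_ns, uint64_t request_count)
{
  if (request_count == 0) {
    return 0;  // avoid dividing by zero when no requests completed
  }
  const uint64_t total_ns =
      cache_hit_time_ns + cache_miss_time_ns + compute_time_ns;
  return (total_ns / request_count) / 1000;  // ns -> usec
}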
// This is because the scheduler sends the cache response and the composing
// models do not get executed. It's a valid scenario and shouldn't throw an error.
bool is_model_version_specified =
// FIXME - Investigate why composing model version is -1 in case of ensemble |
It's usually good practice to include the ticket number, if you have one, in follow-up comments like:
// FIXME [DLIS-XXXX]: ...
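To illustrate the check under discussion: a version string of "-1" is what composing models report when the top-level response came from the cache, so it can be treated like "no version specified" rather than an error. A hypothetical predicate (names are illustrative):

#include <string>

// Hypothetical helper for illustration only: treat "" (unspecified) and
// "-1" (composing model never ran due to a top-level cache hit) as
// "no specific version requested".
bool IsModelVersionSpecified(const std::string& model_version)
{
  return !model_version.empty() && model_version != "-1";
}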
LGTM with the assumption that we'll follow up with the improvements for 24.06; @matthewkotila for the final verdict.
Left a couple of optional nits about the comments.
Commits:
* Fix empty response bug
* Fix unused variable
* Fix test
* Initialize logger to capture logs
* Add unit test
* Change to _ instead of removing
* Check if args.model is not None
* fix artifact path
* Support Python 3.8 in GenAI-Perf (#643)
* Add automation to run unit tests and check code coverage for GenAI-Perf against Python 3.10 (#640)
* Changes to support Ensemble Top Level Response Caching (#560)
* Support for fixed number of requests (#633)
  * first pass. Hardcoded values
  * Working for concurrency (hardcoded whenever count windows is used for now)
  * working for req rate as well
  * Add CLI. Add/fix unit tests
  * Remove hack. Restore all normal functionality
  * Refactor thread config into one class. Add more testing
  * Rename arg to request-count
  * Fix request rate bug
  * Update info print
  * fix corner case
  * move fixme to a story tag
  * add assert to avoid corner case
  * rename variables
  * self review #1
  * copyright changes
  * add doxygen to functions
  * Don't allow sweeping over multiple concurrency or request rate with request-count
* fix test (#637)
* Support custom artifacts directory and improve default artifacts directory (#636)
  * Add artifacts dir option and more descriptive profile export filename
  * Clean up
  * fix input data path
  * Add tests
  * create one to one plot dir for each profile run
  * change the directory look
  * add helper method
* Extend genai perf plots to compare across multiple runs (#635)
  * Modify PlotManager and plots classes
  * Support plots for multiple runs - draft
  * Fix default plot visualization
  * Remove artifact
  * Set default compare directory
  * Support generating parquet files
  * Remove annotations and fix heatmap
  * Fix errors
  * Fix pre-commit
  * Fix CodeQL warning
  * Remove unused comments
  * remove x axis tick label for boxplot
  * Add logging and label for heatmap subplots
  * Allow users to adjust width and height
  * fix grammer
  * Co-authored-by: Hyunjae Woo <[email protected]>
* Generate plot configurations for plot manager (#632)
  * Introduce PlotConfig and PlotConfigParser class
  * Port preprocessing steps and introduce ProfileRunData
  * Create plot configs for default plots
  * fix minor bug
  * Fix comment
  * Implement parse method in PlotConfigParser
  * refactor
  * fix test
  * Add test
  * Address feedback
  * Handle custom endpoint
* Add more metadata to profile export JSON file (#627)
  * Add more metadata to profile export data
  * Fix minor bug
  * refactor
* Add compare subcommand (#623)
  * Move for better visibility
  * Add compare subparser
  * Add subcommand compare
  * Fix test
  * Add ticket
  * add --files option and minor fix
  * Fix tests
  * Add unit tests
  * Address feedback
  * Fix minor error and add section header
* Revert "Changes to support Ensemble Top Level Response Caching (#560) (#642)" (this reverts commit cc6a3b2)
* Changes to support Ensemble Top Level Response Caching (#560) (#642)
Related PR: triton-inference-server/core#338
Changes:
* Updated the DetermineStatsModelVersion() and MergeStatistics() functions to handle the cache-hit scenario where the top-level ensemble request is served from the cache, so the composing models are never executed.
* Added tests for DetermineStatsModelVersion()
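As a rough illustration of the scenario those tests cover (the real DetermineStatsModelVersion() signature differs; the resolver below is a hypothetical stand-in):

#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the version-resolution logic: "" means the user
// did not pin a version, and "-1" is what a composing model reports when the
// top-level ensemble response was served from the cache. Both resolve to the
// latest available version instead of raising an error.
int64_t ResolveStatsVersion(
    const std::string& reported_version, int64_t latest_version)
{
  if (reported_version.empty() || reported_version == "-1") {
    return latest_version;
  }
  return std::stoll(reported_version);
}

int main()
{
  assert(ResolveStatsVersion("-1", 3) == 3);  // cache-hit composing model
  assert(ResolveStatsVersion("2", 3) == 2);   // explicitly pinned version
  return 0;
}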