Commit 0f84995

Add test for sequence model instance update (#5831)
* Add test for sequence model instance update
* Add gap for file timestamp update
* Update test for non-blocking sequence update
* Update documentation
* Remove mentioning increase instance count case
* Add more documentation for scheduler update test
* Update test for non-blocking batcher removal
* Add polling due to async scheduler destruction
* Use _ as private
* Fix typo
* Add docs on instance count decrease
* Fix typo
* Separate direct and oldest to different test cases
* Separate nested tests in a loop into multiple test cases
* Refactor scheduler update test
* Improve doc on handling future test failures
* Address pre-commit
* Add best effort to reset model state after a single test case failure
* Remove reset model method to make harder for chaining multiple test cases as one
* Remove description on model state clean up
1 parent 9bc9ad6 commit 0f84995

6 files changed: +340 -154 lines changed

docs/user_guide/model_management.md

+13-8
@@ -212,9 +212,8 @@ repository, copy in the new shared libraries, and then reload the
 model.
 
 * If only the model instance configuration on the 'config.pbtxt' is modified
-(i.e. increasing/decreasing the instance count) for non-sequence models,
-then Triton will update the model rather then reloading it, when either a load
-request is received under
+(i.e. increasing/decreasing the instance count), then Triton will update the
+model rather then reloading it, when either a load request is received under
 [Model Control Mode EXPLICIT](#model-control-mode-explicit) or change to the
 'config.pbtxt' is detected under
 [Model Control Mode POLL](#model-control-mode-poll).
@@ -225,11 +224,17 @@ request is received under
 configuration, so its presence in the model directory may be detected as a new file
 and cause the model to fully reload when only an update is expected.
 
-* If a sequence model is updated with in-flight sequence(s), Triton does not
-guarantee any remaining request(s) from the in-flight sequence(s) will be routed
-to the same model instance for processing. It is currently the responsibility of
-the user to ensure any in-flight sequence(s) is complete before updating a
-sequence model.
+* If a sequence model is *updated* (i.e. decreasing the instance count), Triton
+will wait until the in-flight sequence is completed (or timed-out) before the
+instance behind the sequence is removed.
+* If the instance count is decreased, arbitrary instance(s) are selected among
+idle instances and instances with in-flight sequence(s) for removal.
+
+* If a sequence model is *reloaded* with in-flight sequence(s) (i.e. changes to
+the model file), Triton does not guarantee any remaining request(s) from the
+in-flight sequence(s) will be routed to the same model instance for processing.
+It is currently the responsibility of the user to ensure any in-flight
+sequence(s) are completed before reloading a sequence model.
 
 ## Concurrently Loading Models

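The updated documentation in this diff says that when a sequence model's instance count is decreased, an instance with an in-flight sequence is removed only after that sequence completes or times out. The behavior can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Triton's scheduler code: `Instance`, `SequenceModelSketch`, and all method names are hypothetical, and the sketch picks the last instances for removal only to stay deterministic, whereas the documentation says the selection is arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for one model instance."""
    name: str
    active_sequence: Optional[int] = None  # ID of the in-flight sequence, if any
    pending_removal: bool = False          # marked for removal, waiting on its sequence


class SequenceModelSketch:
    """Toy model of the documented update behavior (not Triton's implementation)."""

    def __init__(self, count: int) -> None:
        self.instances: List[Instance] = [
            Instance(f"instance_{i}") for i in range(count)
        ]

    def start_sequence(self, instance_name: str, sequence_id: int) -> None:
        # Bind a new sequence to a named instance that is not being removed.
        for inst in self.instances:
            if inst.name == instance_name and not inst.pending_removal:
                inst.active_sequence = sequence_id
                return
        raise ValueError(f"no usable instance named {instance_name}")

    def decrease_count(self, new_count: int) -> None:
        # Assumption: remove the last instances; the docs say the choice is arbitrary.
        live = [i for i in self.instances if not i.pending_removal]
        excess = len(live) - new_count
        if excess <= 0:
            return
        for inst in live[-excess:]:
            if inst.active_sequence is None:
                self.instances.remove(inst)   # idle: removed right away
            else:
                inst.pending_removal = True   # busy: removal is deferred

    def end_sequence(self, sequence_id: int) -> None:
        # Called when a sequence completes or times out.
        for inst in list(self.instances):
            if inst.active_sequence == sequence_id:
                inst.active_sequence = None
                if inst.pending_removal:
                    self.instances.remove(inst)


model = SequenceModelSketch(count=3)
model.start_sequence("instance_2", sequence_id=42)
model.decrease_count(2)   # instance_2 is busy, so it is only marked for removal
still_present = [i.name for i in model.instances]
model.end_sequence(42)    # the sequence finishes, so the deferred removal happens
after_completion = [i.name for i in model.instances]
```

In this sketch the update call itself returns immediately and the removal is deferred, which mirrors the "non-blocking sequence update" wording in the commit message only loosely; how Triton schedules the deferred removal is not shown in this diff.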