Give sufficient time to get inference statistics for sequence tests #5670
Conversation
I'd be surprised if the window between returning a response and reporting statistics is really small enough for this to happen often. On top of that, if it is just a timing issue, why is it so consistently off by 1? ex:

However, if it was in fact a logic/race condition issue (ex: the stats are just getting set incorrectly) and not a timing issue, then I suppose the test would still periodically fail even with this PR's change.
Even if it does not resolve this issue, it seems prudent to have some sort of waiting to ensure the statistics are ready. I added the logging here. It's true that in the runs I've seen, it's always an off-by-one error, and I wouldn't expect it to be this consistent if it were a timing issue. But this seemed like a good change to start with. It's possible it would reproduce locally; it just hasn't hit after a dozen runs yet. This test is a bit challenging because its runtime is just over an hour and it seems to hit this case in fewer than 5% of runs. An additional option is enabling verbose logs so that if it fails again, we'll have them to inspect.
I agree that having the timestamp can be useful for understanding the root cause of the issue. Without further information, however, I think the off-by-one is more an indicator of a timing issue than a logic error. The client (test) proceeds once all responses are received, and the batch stats are only updated after all responses are sent on the server side, which is why an off-by-one may be observed. That being said, it is less likely that the network time for getting the stats is shorter than the time for updating them, unless the test is running on a resource-constrained system and the server thread is preempted.
LGTM, minor comments for future bookkeeping
qa/common/sequence_util.py
Outdated
actual_exec_cnt = stats.model_stats[0].execution_count
if actual_exec_cnt == exec_cnt:
    break
print("Waiting: expect {} executions, got {}".format(exec_cnt,
Can you add an attempt number and some kind of "ERROR/WARNING" or similar so it's easier to spot in the logs?
Good idea. Updated.
@@ -112,7 +112,7 @@ else
  fi
fi

-SERVER_ARGS_EXTRA="--backend-directory=${BACKEND_DIR} --backend-config=tensorflow,version=${TF_VERSION}"
+SERVER_ARGS_EXTRA="--backend-directory=${BACKEND_DIR} --backend-config=tensorflow,version=${TF_VERSION} --log-verbose=1"
Enabling verbose logging so that we have more information if this fails again. SERVER_ARGS_EXTRA is used for every server launch, so putting the flag here seemed to make more sense than putting it in five places.
qa/common/sequence_util.py
Outdated
print("Waiting: expect {} executions, got {}".format(exec_cnt, | ||
actual_exec_cnt)) | ||
print("WARNING: expect {} executions, got {} (attempt {})".format( | ||
exec_cnt, actual_exec_cnt, loop_count)) |
increment loop count? Alternatively just use a fixed number of iterations:
num_tries = 10
for i in range(num_tries):
    ...
    print(f"WARNING: ... (attempt {i})")
    sleep(1)
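For reference, a self-contained sketch of what that fixed-iteration alternative could look like; the helper name, parameter names, and defaults below are illustrative assumptions, not the test's actual code:

import time

def wait_for_exec_count(client, model_name, expected_exec_cnt, num_tries=10):
    # Poll inference statistics a fixed number of times, warning on each miss.
    actual_exec_cnt = None
    for i in range(num_tries):
        stats = client.get_inference_statistics(model_name, "1")
        actual_exec_cnt = stats.model_stats[0].execution_count
        if actual_exec_cnt == expected_exec_cnt:
            break
        print("WARNING: expect {} executions, got {} (attempt {})".format(
            expected_exec_cnt, actual_exec_cnt, i))
        time.sleep(1)
    return actual_exec_cnt

The caller would then assert on the returned count once, after the polling finishes.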
Nice catch, switched to using the loop count incrementor.
qa/common/sequence_util.py
Outdated
@@ -984,8 +984,22 @@
                              _max_sequence_idle_ms * 1000)  # 5 secs

    def check_status(self, model_name, batch_exec, exec_cnt, infer_cnt):
        stats = self.triton_client_.get_inference_statistics(model_name, "1")
        self.assertEqual(len(stats.model_stats), 1, "expect 1 model stats")
        start_time = time.time()
Check notice
Code scanning / CodeQL
Unused local variable
qa/common/sequence_util.py
Outdated
@@ -986,18 +986,18 @@ def check_setup(self, model_name):
    def check_status(self, model_name, batch_exec, exec_cnt, infer_cnt):
        start_time = time.time()
        # There is a time window between when responses are returned and statistics are updated.
-       # To prevent intermittent test failure during that window, wait up to 10 seconds for the
+       # To prevent intermittent test failure during that window, wait up to 5 seconds for the
You can keep it as 10 seconds by increasing sleep time. Might as well have more time for it to pass if we're doing the workaround.
Sure, updated.
Since responses are sent before inference statistics are reported, there is a risk that the statistics are checked before they are ready. This may be the cause of L0_sequence_batcher failing infrequently with statistics missing one request, with messages like:
Only in the case that the check would otherwise fail because the statistics do not yet reflect all requests, this change waits up to 10 seconds, in 0.5 second increments, for the statistics to be ready.
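For illustration, a minimal sketch of that polling behavior as a standalone helper, assuming a Triton client object like the test's self.triton_client_; names and defaults other than those shown in the diffs above are placeholders:

import time

def wait_for_statistics(client, model_name, exec_cnt, timeout_s=10, interval_s=0.5):
    # Responses can reach the client before the server finishes updating its
    # inference statistics, so poll until the execution count matches or the
    # timeout expires instead of checking only once.
    start_time = time.time()
    loop_count = 0
    actual_exec_cnt = None
    while time.time() - start_time < timeout_s:
        stats = client.get_inference_statistics(model_name, "1")
        actual_exec_cnt = stats.model_stats[0].execution_count
        if actual_exec_cnt == exec_cnt:
            return
        loop_count += 1
        print("WARNING: expect {} executions, got {} (attempt {})".format(
            exec_cnt, actual_exec_cnt, loop_count))
        time.sleep(interval_s)
    raise AssertionError(
        "expected {} executions, got {}".format(exec_cnt, actual_exec_cnt))

If the statistics are already current, the first check succeeds immediately, so the wait only adds time in the failure window this PR is targeting.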