⚡️ Speed up function `gen_encoder_output_proposals` by 14% in PR #1250 (`feature/inference-v1-models`) #1265
⚡️ This pull request contains optimizations for PR #1250. If you approve this dependent PR, these changes will be merged into the original PR branch `feature/inference-v1-models`.

📄 14% (0.14x) speedup for `gen_encoder_output_proposals` in `inference/v1/models/rfdetr/transformer.py`

⏱️ Runtime: 9.48 milliseconds → 8.35 milliseconds (best of 92 runs)

📝 Explanation and details
Optimizations applied:

- Reduced the number of `.masked_fill` calls.
- Used `.expand` for `grid` and `wh` to apply batched normalization and expansion efficiently.
- Clamped values before `log` to avoid division by zero: guard `output_proposals` before applying log/unsigmoid where appropriate.
- Return values, function name, signatures, and intermediate logic remain unchanged for correctness.
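
For illustration, here is a minimal PyTorch sketch of the three patterns above. The function name `gen_proposals_sketch`, the tensor shapes, and the clamp/validity thresholds are assumptions for the example (modeled on the Deformable-DETR-style proposal generation this function resembles); it is not the actual diff in this PR.

```python
import torch

def gen_proposals_sketch(memory, memory_padding_mask, grid, wh, eps=1e-5):
    """Hypothetical sketch. Assumed shapes: memory [B, N, C] (float),
    memory_padding_mask [B, N] (bool), grid and wh [N, 2]."""
    B, N, _ = memory.shape

    # expand for grid and wh: broadcast across the batch without
    # materializing copies, then normalize in one batched op.
    proposals = (grid + 0.5).unsqueeze(0).expand(B, -1, -1) \
        / wh.unsqueeze(0).expand(B, -1, -1)

    # Guard before log/unsigmoid: clamp into (eps, 1 - eps) so log never
    # receives 0 (no -inf / division by zero).
    p = proposals.clamp(min=eps, max=1 - eps)
    output_proposals = torch.log(p) - torch.log1p(-p)  # unsigmoid(p)

    # One fused validity mask and a single masked_fill instead of several
    # sequential masked_fill calls.
    valid = ((proposals > 0.01) & (proposals < 0.99)).all(-1, keepdim=True)
    invalid = memory_padding_mask.unsqueeze(-1) | ~valid
    output_proposals = output_proposals.masked_fill(invalid, float("inf"))
    return output_proposals
```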
✅ Correctness verification report:
🌀 Generated Regression Tests Details
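
The generated tests are collapsed in this view. As a rough illustration of what such a regression test checks — the import path is taken from the summary above, but the signature, return values, and shapes are assumptions based on the Deformable-DETR convention:

```python
import torch
# Import path from the summary above; signature and returns are assumed.
from inference.v1.models.rfdetr.transformer import gen_encoder_output_proposals

def test_gen_encoder_output_proposals_shapes_and_masking():
    torch.manual_seed(0)
    B, C = 2, 256
    spatial_shapes = torch.tensor([[8, 8], [4, 4]])   # two feature levels
    N = int(spatial_shapes.prod(-1).sum())            # 64 + 16 = 80 tokens
    memory = torch.randn(B, N, C)
    memory_padding_mask = torch.zeros(B, N, dtype=torch.bool)
    memory_padding_mask[:, -5:] = True                # pad trailing tokens

    # Assumed (Deformable-DETR-style) return: (output_memory, output_proposals).
    output_memory, output_proposals = gen_encoder_output_proposals(
        memory, memory_padding_mask, spatial_shapes
    )

    assert output_memory.shape == memory.shape
    assert output_proposals.shape[:2] == (B, N)
    # Padded positions should have been masked (filled with inf).
    assert torch.isinf(output_proposals[:, -5:]).all()
```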
To edit these changes, run `git checkout codeflash/optimize-pr1250-2025-05-13T16.49.20` and push.