⚡️ Speed up method PositionEmbeddingLearned.forward by 30% in PR #1250 (feature/inference-v1-models) #1274
⚡️ This pull request contains optimizations for PR #1250
If you approve this dependent PR, these changes will be merged into the original PR branch feature/inference-v1-models.
📄 30% (0.30x) speedup for PositionEmbeddingLearned.forward in inference/v1/models/rfdetr/position_encoding.py
⏱️ Runtime: 5.43 milliseconds → 4.17 milliseconds (best of 38 runs)
📝 Explanation and details
Optimization summary:
- Use .expand() instead of .repeat() to minimize memory usage and runtime by avoiding actual data copying and repeated allocation.
- Store device in a local variable to avoid repeated attribute lookups.
- [...] .expand().
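The .repeat() to .expand() swap described above can be sketched roughly as follows. This is not the code from position_encoding.py: the function names pos_with_repeat and pos_with_expand and their arguments are hypothetical. They only illustrate the general pattern of broadcasting learned row and column embeddings over a feature map and a batch.

```python
import torch


def pos_with_repeat(x_emb, y_emb, batch):
    # Hypothetical baseline: .repeat() allocates and copies data at every step.
    # x_emb: (w, d) column embeddings, y_emb: (h, d) row embeddings.
    h, w = y_emb.shape[0], x_emb.shape[0]
    pos = torch.cat(
        [
            x_emb.unsqueeze(0).repeat(h, 1, 1),  # (h, w, d), copied
            y_emb.unsqueeze(1).repeat(1, w, 1),  # (h, w, d), copied
        ],
        dim=-1,
    )
    # (h, w, 2d) -> (2d, h, w) -> (batch, 2d, h, w), copied once per batch item
    return pos.permute(2, 0, 1).unsqueeze(0).repeat(batch, 1, 1, 1)


def pos_with_expand(x_emb, y_emb, batch):
    # Hypothetical optimized variant: .expand() returns broadcast views
    # (stride 0 along the expanded dimensions) instead of copying.
    h, w = y_emb.shape[0], x_emb.shape[0]
    # Read the device once into a local, as the summary suggests.
    device = x_emb.device
    y_emb = y_emb.to(device)
    pos = torch.cat(
        [
            x_emb.unsqueeze(0).expand(h, -1, -1),  # (h, w, d), view
            y_emb.unsqueeze(1).expand(-1, w, -1),  # (h, w, d), view
        ],
        dim=-1,
    )
    # torch.cat materializes (h, w, 2d); the batch dimension stays a view.
    return pos.permute(2, 0, 1).unsqueeze(0).expand(batch, -1, -1, -1)
```

Both functions return tensors of shape (batch, 2d, h, w) with the same values. One caveat of this approach: the expanded result shares memory along the batch dimension, so a caller that writes to it in place should call .contiguous() or .clone() first.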
✅ Correctness verification report:
🌀 Generated Regression Tests Details
To edit these changes, git checkout codeflash/optimize-pr1250-2025-05-14T12.23.01 and push.