⚡️ Speed up method Dinov2WithRegistersSelfAttention.transpose_for_scores by 22% in PR #1250 (feature/inference-v1-models) #1281
⚡️ This pull request contains optimizations for PR #1250
If you approve this dependent PR, these changes will be merged into the original PR branch feature/inference-v1-models.

📄 22% (0.22x) speedup for Dinov2WithRegistersSelfAttention.transpose_for_scores in inference/v1/models/rfdetr/dinov2_with_windowed_attn.py
⏱️ Runtime: 465 microseconds → 383 microseconds (best of 75 runs)

📝 Explanation and details
Here is the optimized version of your program. Key improvements for efficiency:

transpose_for_scores function:
- Uses .reshape() instead of .view() to handle possibly non-contiguous memory.
- reshape and permute are used together for fewer intermediate objects.
- .permute(...) is moved out of the return statement for readability; the actual allocation effect is unchanged (no extra memory).
- Reduced overhead in transpose_for_scores.

No changes to function names, args, signatures, or externally visible behavior.
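According to the benchmark above, this runs slightly faster. .reshape() also tolerates non-contiguous inputs coming from upstream ops: it returns a view when the memory layout allows and copies otherwise, which is the usual PyTorch choice when input contiguity is not guaranteed.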
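The diff itself is not reproduced on this page, so the following is only a minimal sketch of the kind of change described, not the code from the PR. The standalone function names, the explicit num_heads and head_size arguments, and the tensor sizes are all assumptions made for illustration; in the real class these values would come from the attention module's attributes.

```python
import torch


def transpose_for_scores_view(x: torch.Tensor, num_heads: int, head_size: int) -> torch.Tensor:
    # Hypothetical baseline: split the last dimension with .view(), then permute.
    new_shape = x.size()[:-1] + (num_heads, head_size)
    return x.view(new_shape).permute(0, 2, 1, 3)


def transpose_for_scores_reshape(x: torch.Tensor, num_heads: int, head_size: int) -> torch.Tensor:
    # Hypothetical optimized form: .reshape() returns a view when it can
    # and falls back to a copy when the memory layout requires one.
    batch, seq_len, _ = x.shape
    x = x.reshape(batch, seq_len, num_heads, head_size)
    # (batch, seq, heads, head_size) -> (batch, heads, seq, head_size)
    return x.permute(0, 2, 1, 3)


# Contiguous input: (batch=2, seq=5, hidden=12) with 3 heads of size 4.
x = torch.randn(2, 5, 12)
a = transpose_for_scores_view(x, 3, 4)
b = transpose_for_scores_reshape(x, 3, 4)

# Non-contiguous input with the same logical shape, produced by a transpose.
y = torch.randn(2, 12, 5).transpose(1, 2)
c = transpose_for_scores_view(y.contiguous(), 3, 4)
d = transpose_for_scores_reshape(y, 3, 4)
```

Both variants yield a tensor laid out as (batch, heads, sequence, head_size) with identical values, which is consistent with the statement that externally visible behavior is unchanged. Any timing difference depends on hardware and input layout and should be measured rather than assumed.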
✅ Correctness verification report:
🌀 Generated Regression Tests Details
To edit these changes, run git checkout codeflash/optimize-pr1250-2025-05-14T17.28.58 and push.