
flexible coopmat mnk and unified elempack for vulkan convolution 1x1s1d1 #6154


Merged: 49 commits, merged Jul 15, 2025


nihui (Member) commented Jul 3, 2025

No description provided.

tencent-adm (Member)

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

codecov-commenter commented Jul 3, 2025

Codecov Report

Attention: Patch coverage is 43.46290% with 160 lines in your changes missing coverage. Please review.

Project coverage is 95.77%. Comparing base (a1f5d5b) to head (b705bdc).

File with missing lines                    Patch %   Lines
src/layer/vulkan/convolution_vulkan.cpp    50.61%    120 Missing ⚠️
src/gpu.cpp                                0.00%     40 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6154      +/-   ##
==========================================
- Coverage   95.78%   95.77%   -0.01%     
==========================================
  Files         835      835              
  Lines      264943   265103     +160     
==========================================
+ Hits       253783   253910     +127     
- Misses      11160    11193      +33     


github-actions bot commented Jul 3, 2025

The binary size change of libncnn.so (bytes)

architecture  base size   pr size     difference
x86_64        15602512    15606928    +4416 ⚠️
armhf         6611348     6615712     +4364 ⚠️
aarch64       9921904     9987680     +65776 ⚠️

@nihui nihui closed this Jul 8, 2025
@nihui nihui reopened this Jul 8, 2025
nihui (Member, Author) commented Jul 13, 2025

7900xtx             baseline  pr6154  change
squeezenet          0.5       0.42    -16.00%
mobilenet           0.46      0.4     -13.04%
mobilenet_v2        0.76      0.63    -17.11%
mobilenet_v3        0.88      0.78    -11.36%
shufflenet          0.51      0.51    0.00%
shufflenet_v2       0.73      0.68    -6.85%
mnasnet             0.81      0.66    -18.52%
proxylessnasnet     0.83      0.67    -19.28%
efficientnet_b0     1.44      1.25    -13.19%
efficientnetv2_b0   2.88      2.57    -10.76%
regnety_400m        1.1       1       -9.09%
blazeface           0.4       0.38    -5.00%
googlenet           1.73      1.46    -15.61%
resnet18            0.73      0.73    0.00%
alexnet             0.53      0.53    0.00%
vgg16               1.33      1.33    0.00%
resnet50            1.73      1.49    -13.87%
squeezenet_ssd      1.46      1.36    -6.85%
mobilenet_ssd       1.17      1.03    -11.97%
mobilenet_yolo      0.84      0.7     -16.67%
mobilenetv2_yolov3  2.43      2.44    0.41%
yolov4-tiny         3.21      3.11    -3.12%
nanodet_m           1.45      1.37    -5.52%
yolo-fastest-1.1    0.77      0.7     -9.09%
yolo-fastestv2      0.8       0.75    -6.25%
vision_transformer  7.47      7.48    0.13%
FastestDet          0.77      0.7     -9.09%
average                               -8.80%
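The change column is the relative difference (pr - baseline) / baseline, and the summary row matches the plain arithmetic mean of the 27 per-model changes. A small C++ sketch of that arithmetic follows; percent_change and mean_percent_change are illustrative names, not part of ncnn or its benchmark tooling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Relative change of the PR time versus the baseline time, in percent.
// Negative means the PR run took less time than the baseline.
static double percent_change(double baseline, double pr)
{
    assert(baseline > 0.0);
    return (pr - baseline) / baseline * 100.0;
}

// Arithmetic mean of the per-model percentage changes.
static double mean_percent_change(const std::vector<double>& baseline, const std::vector<double>& pr)
{
    assert(baseline.size() == pr.size() && !baseline.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < baseline.size(); i++)
        sum += percent_change(baseline[i], pr[i]);
    return sum / baseline.size();
}
```

Fed the baseline and pr6154 columns above, mean_percent_change comes out at about -8.80, reproducing the summary row. An unweighted mean treats every model equally, so small models such as squeezenet count as much as vision_transformer.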

@nihui nihui requested a review from Copilot July 14, 2025 09:17
Copilot AI left a comment

Pull Request Overview

This PR integrates flexible cooperative matrix (coopmat) support into the Vulkan 1x1 convolution path by selecting optimal M/N/K tile sizes at runtime and unifying specialized shaders into a single generic shader that handles different element and output packs.

  • Added a get_optimal_cooperative_matrix_mnk utility in GpuInfo to pick the best coopmat dimensions.
  • Replaced two fixed-size pack4 shaders with one parameterized convolution_1x1s1d1_cm.comp.
  • Extended Convolution_vulkan to configure, pack weights for, and dispatch the unified shader using dynamic M/N/K and elempack settings.
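A minimal sketch of what runtime M/N/K selection could look like, assuming the device exposes a list of supported cooperative matrix shapes (with VK_KHR_cooperative_matrix these are reported by vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR). The scoring rule here, fewest padded multiply-accumulates with ties broken toward the larger tile, is a hypothetical heuristic and not the logic of ncnn's get_optimal_cooperative_matrix_mnk; CoopmatShape and pick_coopmat_mnk are made-up names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical: one cooperative matrix shape supported by the device.
struct CoopmatShape
{
    int M;
    int N;
    int K;
};

// Round x up to the next multiple of tile.
static int round_up(int x, int tile)
{
    return (x + tile - 1) / tile * tile;
}

// Hypothetical heuristic: choose the shape that wastes the fewest padded
// multiply-accumulates for an M x N x K GEMM; on a tie, prefer the larger tile.
static CoopmatShape pick_coopmat_mnk(const std::vector<CoopmatShape>& shapes, int M, int N, int K)
{
    assert(!shapes.empty());
    CoopmatShape best = shapes[0];
    long long best_waste = -1;
    long long best_tile = 0;
    for (const CoopmatShape& s : shapes)
    {
        long long padded = (long long)round_up(M, s.M) * round_up(N, s.N) * round_up(K, s.K);
        long long waste = padded - (long long)M * N * K;
        long long tile = (long long)s.M * s.N * s.K;
        if (best_waste < 0 || waste < best_waste || (waste == best_waste && tile > best_tile))
        {
            best = s;
            best_waste = waste;
            best_tile = tile;
        }
    }
    return best;
}
```

For a 64 x 200 x 32 problem with the shapes {16,16,16}, {16,8,16} and {16,8,8}, 16x16x16 would pad N from 200 to 208, so the sketch returns 16x8x16: it covers the problem exactly and has a larger tile than 16x8x8.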

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

File                                                  Description
src/gpu.cpp                                           Implemented runtime coopmat MNK selection and fallback to fp32
src/gpu.h                                             Declared new get_optimal_cooperative_matrix_mnk utility
src/layer/vulkan/shader/convolution_1x1s1d1_cm.comp   Added a single unified 1x1 coopmat shader with dynamic tile sizes
src/layer/vulkan/convolution_vulkan.h                 Added coopmat config members (use_cooperative_matrix, coopmat_*, UNROLL_*)
src/layer/vulkan/convolution_vulkan.cpp               Hooked up coopmat parameters, weight packing, pipeline creation and dispatch logic
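A 1x1 convolution with stride 1 and dilation 1 reduces to a matrix multiply, which is why a coopmat GEMM kernel applies to this path. The sketch below assumes one natural mapping (M = output channels, N = H*W spatial positions, K = input channels) and a plain row-major layout without elempack; ncnn's actual packed layout and dispatch may differ, and conv1x1_as_gemm and count_output_tiles are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Reference 1x1 / stride 1 / dilation 1 convolution written as a GEMM:
//   out[co][p] = sum over ci of weight[co][ci] * in[ci][p]
// where p indexes the H*W spatial positions. Plain row-major layout, no bias.
static std::vector<float> conv1x1_as_gemm(const std::vector<float>& in, int channels_in, int spatial,
                                          const std::vector<float>& weight, int channels_out)
{
    assert((int)in.size() == channels_in * spatial);
    assert((int)weight.size() == channels_out * channels_in);
    std::vector<float> out(channels_out * spatial, 0.f);
    for (int co = 0; co < channels_out; co++)
    {
        for (int ci = 0; ci < channels_in; ci++)
        {
            const float w = weight[co * channels_in + ci];
            for (int p = 0; p < spatial; p++)
                out[co * spatial + p] += w * in[ci * spatial + p];
        }
    }
    return out;
}

// Number of coopmat_M x coopmat_N output tiles needed to cover an M x N
// result, rounding up on both axes.
static int count_output_tiles(int M, int N, int coopmat_M, int coopmat_N)
{
    assert(coopmat_M > 0 && coopmat_N > 0);
    return ((M + coopmat_M - 1) / coopmat_M) * ((N + coopmat_N - 1) / coopmat_N);
}
```

With coopmat_M = 16 and coopmat_N = 8, a 64-channel output over 200 positions needs 4 x 25 = 100 output tiles; dimensions that are not multiples of the tile size are rounded up, so the last tile on that axis is partially padded.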
Comments suppressed due to low confidence (2)

src/layer/vulkan/convolution_vulkan.h:62

  • [nitpick] Member names like UNROLL_SG_M use uppercase and underscores, which is inconsistent with the project's camelCase field naming. Consider renaming to unrollSgM or similar for consistency.
    int UNROLL_SG_M;

src/gpu.h:385

  • [nitpick] Consider adding documentation comments explaining each parameter and the return behavior of get_optimal_cooperative_matrix_mnk, as its signature is complex and understanding M/N/K selection logic will aid future maintainers.
    void get_optimal_cooperative_matrix_mnk(int M, int N, int K, VkComponentTypeKHR type, VkComponentTypeKHR acctype, VkScopeKHR scope, int& coopmat_M, int& coopmat_N, int& coopmat_K) const;

@nihui nihui merged commit d395000 into Tencent:master Jul 15, 2025
99 of 104 checks passed

3 participants