flexible coopmat mnk and unified elempack for vulkan convolution 1x1s1d1 #6154

nihui · 2025-07-03T12:06:57Z

No description provided.

tencent-adm · 2025-07-03T12:07:15Z

Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

codecov-commenter · 2025-07-03T12:09:52Z

Codecov Report

Attention: Patch coverage is 43.46290% with 160 lines in your changes missing coverage. Please review.

Project coverage is 95.77%. Comparing base (a1f5d5b) to head (b705bdc).

Files with missing lines	Patch %	Lines
src/layer/vulkan/convolution_vulkan.cpp	50.61%	120 Missing ⚠️
src/gpu.cpp	0.00%	40 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6154      +/-   ##
==========================================
- Coverage   95.78%   95.77%   -0.01%     
==========================================
  Files         835      835              
  Lines      264943   265103     +160     
==========================================
+ Hits       253783   253910     +127     
- Misses      11160    11193      +33


github-actions · 2025-07-03T12:28:00Z

The binary size change of libncnn.so (bytes)

architecture	base size	pr size	difference
x86_64	15602512	15606928	+4416 ⚠️
armhf	6611348	6615712	+4364 ⚠️
aarch64	9921904	9987680	+65776 ⚠️

nihui · 2025-07-13T15:54:23Z

7900xtx	baseline	pr6154	change
squeezenet	0.5	0.42	-16.00%
mobilenet	0.46	0.4	-13.04%
mobilenet_v2	0.76	0.63	-17.11%
mobilenet_v3	0.88	0.78	-11.36%
shufflenet	0.51	0.51	0.00%
shufflenet_v2	0.73	0.68	-6.85%
mnasnet	0.81	0.66	-18.52%
proxylessnasnet	0.83	0.67	-19.28%
efficientnet_b0	1.44	1.25	-13.19%
efficientnetv2_b0	2.88	2.57	-10.76%
regnety_400m	1.1	1	-9.09%
blazeface	0.4	0.38	-5.00%
googlenet	1.73	1.46	-15.61%
resnet18	0.73	0.73	0.00%
alexnet	0.53	0.53	0.00%
vgg16	1.33	1.33	0.00%
resnet50	1.73	1.49	-13.87%
squeezenet_ssd	1.46	1.36	-6.85%
mobilenet_ssd	1.17	1.03	-11.97%
mobilenet_yolo	0.84	0.7	-16.67%
mobilenetv2_yolov3	2.43	2.44	0.41%
yolov4-tiny	3.21	3.11	-3.12%
nanodet_m	1.45	1.37	-5.52%
yolo-fastest-1.1	0.77	0.7	-9.09%
yolo-fastestv2	0.8	0.75	-6.25%
vision_transformer	7.47	7.48	0.13%
FastestDet	0.77	0.7	-9.09%
average			-8.80%
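
The last column appears to be the relative change of the PR figure against the baseline, and the final row matches the arithmetic mean of the 27 per-model changes. A minimal sketch of that arithmetic follows; the helper names `percent_change` and `mean` are hypothetical and are not part of ncnn or its benchmark tool.

```cpp
#include <cassert>
#include <vector>

// Relative change of the PR figure against the baseline, in percent.
// A negative value means the PR figure is smaller than the baseline.
static double percent_change(double baseline, double pr)
{
    assert(baseline > 0.0); // a zero baseline would make the ratio meaningless
    return (pr - baseline) / baseline * 100.0;
}

// Arithmetic mean of per-model percent changes; returns 0 for an empty list.
static double mean(const std::vector<double>& changes)
{
    double sum = 0.0;
    for (double c : changes)
        sum += c;
    return changes.empty() ? 0.0 : sum / static_cast<double>(changes.size());
}
```

For example, `percent_change(0.50, 0.42)` reproduces the squeezenet row (about -16%), and `percent_change(1.73, 1.49)` reproduces the resnet50 row (about -13.87%).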

Copilot

Pull Request Overview

This PR integrates flexible cooperative matrix (coopmat) support into the Vulkan 1x1 convolution path by selecting optimal M/N/K tile sizes at runtime and unifying specialized shaders into a single generic shader that handles different element and output packs.

Added get_optimal_cooperative_matrix_mnk utility in GpuInfo to pick best coopmat dimensions.
Replaced two fixed-size pack4 shaders with one parameterized convolution_1x1s1d1_cm.comp.
Extended Convolution_vulkan to configure the unified shader, pack its weights, and dispatch it using dynamic M/N/K and elempack settings.
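
To illustrate what runtime selection of tile sizes can look like, here is a minimal sketch. It is not the heuristic used by `get_optimal_cooperative_matrix_mnk`, whose logic this overview does not describe. The `CoopmatShape` struct, the `pick_shape` function, and the cost rule (least padded multiply-accumulate work, ties going to the larger tile) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one cooperative matrix shape reported by a device.
struct CoopmatShape
{
    int M;
    int N;
    int K;
};

// Round x up to the next multiple of tile.
static long long round_up(int x, int tile)
{
    return static_cast<long long>((x + tile - 1) / tile) * tile;
}

// Pick the candidate that wastes the least padded work for an M x N x K
// problem; on a tie, prefer the larger tile. Assumed heuristic, for illustration.
static CoopmatShape pick_shape(int M, int N, int K, const std::vector<CoopmatShape>& candidates)
{
    assert(M > 0 && N > 0 && K > 0);
    CoopmatShape best = {0, 0, 0}; // all zeros means "no usable shape found"
    long long best_cost = -1;
    for (const CoopmatShape& s : candidates)
    {
        if (s.M <= 0 || s.N <= 0 || s.K <= 0)
            continue; // skip malformed entries
        const long long cost = round_up(M, s.M) * round_up(N, s.N) * round_up(K, s.K);
        const long long volume = static_cast<long long>(s.M) * s.N * s.K;
        const long long best_volume = static_cast<long long>(best.M) * best.N * best.K;
        if (best_cost < 0 || cost < best_cost || (cost == best_cost && volume > best_volume))
        {
            best = s;
            best_cost = cost;
        }
    }
    return best;
}
```

With candidates 16x16x16 and 16x8x8, a 24x8x8 problem would select 16x8x8 under this rule because it pads less, while a 64x64x64 problem ties on cost and falls to the larger 16x16x16 tile. A caller would treat an all-zero result as "no cooperative matrix path available".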

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

File	Description
src/gpu.cpp	Implemented runtime coopmat MNK selection and fallback to fp32
src/gpu.h	Declared new `get_optimal_cooperative_matrix_mnk` utility
src/layer/vulkan/shader/convolution_1x1s1d1_cm.comp	Added a single unified 1x1 coopmat shader with dynamic tile sizes
src/layer/vulkan/convolution_vulkan.h	Added coopmat config members (`use_cooperative_matrix`, `coopmat_`, `UNROLL_`)
src/layer/vulkan/convolution_vulkan.cpp	Hooked up coopmat parameters, weight packing, pipeline creation and dispatch logic

Comments suppressed due to low confidence (2)

src/layer/vulkan/convolution_vulkan.h:62

[nitpick] Member names like UNROLL_SG_M use uppercase and underscores, which is inconsistent with the project's camelCase field naming. Consider renaming to unrollSgM or similar for consistency.

    int UNROLL_SG_M;

src/gpu.h:385

[nitpick] Consider adding documentation comments explaining each parameter and the return behavior of get_optimal_cooperative_matrix_mnk, as its signature is complex and understanding M/N/K selection logic will aid future maintainers.

    void get_optimal_cooperative_matrix_mnk(int M, int N, int K, VkComponentTypeKHR type, VkComponentTypeKHR acctype, VkScopeKHR scope, int& coopmat_M, int& coopmat_N, int& coopmat_K) const;

nihui and others added 24 commits June 29, 2025 23:24

hah

a51eb36

w

fe2e540

helper function for selecting the optimal coopmat mnk size

77145e3

apply code-format changes

4f10eab

fallback to acctype fp32, fix hardcode

03a0ca6

Merge branch 'coopmat-mnk' of github.com:nihui/ncnn into coopmat-mnk

3e243e9

unroll m n

19c7b97

transpose

2299760

s

bb431a1

s

2bee53a

apply code-format changes

79a3087

unroll

64cd9bc

Merge branch 'coopmat-mnk' of github.com:nihui/ncnn into coopmat-mnk

736c57b

s

eb3959f

optimize load store

0be55aa

opt++

b92159a

transpose mn, opt++

bd63fb0

q

fc23db9

f

2590e45

s

44f0067

f

81788d1

q

48ee2d1

unroll k

b27e981

d

dfe733a

github-actions bot added core vulkan labels Jul 3, 2025

q

5c34b10

nihui and others added 11 commits July 4, 2025 00:13

q

c84ea06

o

edb00fe

cost

fe90a0f

stash

dae82c3

apply code-format changes

0c3dc2f

s

c8112d4

apply code-format changes

9a33826

cc

e4e19fe

Merge remote-tracking branch 'refs/remotes/origin/coopmat-mnk' into coopmat-mnk

414e534

s

820940c

Merge branch 'master' into coopmat-mnk

b49a531

nihui closed this Jul 8, 2025

nihui reopened this Jul 8, 2025

nihui added 10 commits July 8, 2025 14:32

h

4794fd8

z

58d12c5

q

8e442da

Merge branch 'master' into coopmat-mnk

0c7c551

Merge branch 'master' into coopmat-mnk

71e9dfa

coopmat members

4701dbf

Merge branch 'master' into coopmat-mnk

c59c836

mod

7cf3933

Merge branch 'coopmat-mnk' of github.com:nihui/ncnn into coopmat-mnk

d19b0eb

Merge branch 'master' into coopmat-mnk

910f88e

nihui added 2 commits July 14, 2025 16:53

cc

30275a1

cc

6ae0aa4

nihui requested a review from Copilot July 14, 2025 09:17

Copilot AI reviewed Jul 14, 2025

s

b705bdc

nihui merged commit d395000 into Tencent:master Jul 15, 2025
99 of 104 checks passed
