Vulkan Implementation #2059
Merged
Commits
Showing changes from 131 of 156 commits.
061246f  Vulkan loader code (0cc4m) [sketch below]
4a96d0e  Fix matmul kernel, continue implementation (0cc4m)
88d4ec0  Continue implementation (0cc4m)
a4004d4  Vulkan memory management (0cc4m)
b0e6585  Vulkan development (0cc4m)
fc4f207  Matmul call (0cc4m)
2471728  Add aligned malloc and free for VMA (0cc4m) [sketch below]
8ce84c2  Continue implementation (0cc4m)
a42376e  First matmul success (0cc4m)
baf9ff5  GEMM Kernel optimization (0cc4m)
1b4863c  1D Blocktiling (0cc4m)
7c6860b  2D Blocktiling (0cc4m) [sketch below]
0c9cca0  Write coalescing (0cc4m)
2c70df9  Continue vulkan implementation and optimization (0cc4m)
3adc7b1  First FP16 attempt, disabled for now (0cc4m)
fc5bb53  Code abstraction, FP16 implementation, fix kernel, add FP16 to FP32 k… (0cc4m)
c31e14b  Enable device extensions properly, restore fp16 matmul op (0cc4m)
40c8f84  Fix mulmat_f16 (0cc4m)
df3cdbd  Output FP32 in fp16 matmul shader (0cc4m)
cb5cb4d  Fix f16_to_f32 kernel (0cc4m)
c8ff09b  dequant_q4_0 kernel (0cc4m) [sketch below]
4ea9b2f  Add VMA library (0cc4m)
36cd5d8  Avoid requesting dedicated memory, VMA can decide that by itself (0cc4m)
24eeb97  Add bounds checking to matmul kernels, improve implementation, fix co… (0cc4m)
3d7d8d0  add cmake commands (SlyEcho)
ade9555  Add 2d write operation, profiling code (0cc4m)
ae7325f  Fix 2d write (0cc4m)
e35d28f  Fix queue selection for AMD RADV (0cc4m)
80b17e2  Fix trailing whitespace in vk_mem_alloc.h (0cc4m)
2449390  Add WIP warp tile mat mul shaders (0cc4m)
869ae76  Disable glslc optimization (0cc4m)
ea06a2c  Disable glslc optimization for CMake (0cc4m)
6d5a0ad  Merge pull request #2 from SlyEcho/vulkan (0cc4m)
c3d9475  Optimize warptile matmul shader, replace blocktile with it (0cc4m)
c7c761a  Add split-k optimization for small matrix multiplication (0cc4m) [sketch below]
0ef62f5  Fix validation errors, improve compatibility with AMD GPUs (0cc4m)
3bc7a80  Rework command buffer handling (0cc4m)
8dd585e  Variable matmul kernel using specialization constants (0cc4m) [sketch below]
0c4d841  Fix synchronization on AMD, add barriers for buffer ownership transfe… (0cc4m)
ad3d28e  Reuse semaphores (0cc4m)
22a4cb7  Handle stage flags during command buffer submission properly (0cc4m)
f58fa51  Increase matmul test runs for consistent results (0cc4m)
931a892  Fix F32 matmul (0cc4m)
8d351b8  Merge upstream changes, fix conflict (0cc4m)
e490395  Add vectorized loading and zeropadding for matrix multiplication (0cc4m)
105fd19  Use pinned memory for f16 preprocessing (0cc4m)
9e97cb0  Don't force aligned matmul (0cc4m)
b5b1337  Don't free before queue done (0cc4m)
3432e37  Replace VMA library with native Vulkan buffer management (0cc4m)
754ea68  Basic offloading support with mul_f32 and dmmv for q4_0 (0cc4m)
2859562  Run glslc commands in parallel (0cc4m) [sketch below]
3452095  Unroll loops in dmmv shader (0cc4m)
f2d4ca3  Reduce usage of waitIdle (0cc4m)
67843a3  Reuse pinned allocation for f16 conversion (0cc4m)
1ac8ff3  Handle devices with only a single queue (0cc4m)
53809c9  Fix trailing whitespace in CMakeLists.txt (0cc4m)
4e58028  Allow parallel execution of kernels, parallelize third and fourth dim… (0cc4m)
69554ce  Add fallback for devices only supporting one DescriptorSet per Descri… (0cc4m)
1b2ec1a  Move to graph function similar to CUDA implementation (0cc4m)
d0bd120  Use F16 kernel for most things, replace q_f32 with mul_mat_q_f16 func… (0cc4m)
44065df  Add F32 dmmv shaders (0cc4m)
f6b241e  Batch submissions (0cc4m)
6bd9bd9  Add .spv to gitignore (0cc4m)
2231618  Split off matrix vector multiplication for separate optimization (0cc4m)
582c825  Use single command buffer for matrix vector multiplication ops (0cc4m)
dc6e677  Reduce overhead of mul_f32 calls by using a single command buffer (0cc4m)
75788fe  Add submission batching to mul_f32 (0cc4m)
c638955  Fix tests (0cc4m)
44bbc85  Add missing barrier (0cc4m)
ccd2592  Add further missing barrier (0cc4m)
e660943  Add further ops (0cc4m)
a07f603  Replace vk::QueueFamilyIgnored with VK_QUEUE_FAMILY_IGNORED to suppor… (0cc4m)
7ac00de  Remove unnecessary cblas link (0cc4m)
1132941  Fix descriptor set pre-allocation assert (0cc4m)
a47ca7a  Add runtime shader compilation, start transferring shaders to this ap… (0cc4m)
592ebb0  Transfer remaining shaders to header and compile on runtime (0cc4m)
01d22a4  Merge upstream changes, fix conflict (0cc4m)
e9be24f  Fix fp32 fallback if device doesn't support fp16, add force disable e… (0cc4m)
7e88677  Add support for q4_1, q5_0, q5_1 and q8_0 (0cc4m)
5ae5d2b  Remove unnecessary scalar layout extension (0cc4m)
7f89e40  Parse graph early to pre-record command buffers (0cc4m)
b6591b5  Merge upstream changes, fix conflicts (0cc4m)
42bfa88  Add q6_k support (0cc4m)
da09a02  Add multi-submit for command buffers (0cc4m)
39bd512  Fix q6_k dequant shader for AMD (0cc4m)
85c1a63  Fix q6_k for GPUs without fp16 support (0cc4m)
dad1cdb  Simplify q6_k fp16 fix (0cc4m)
e2962e1  Minor fixes (0cc4m)
b447229  Fix wg_denom of m-mulmat shaders (0cc4m)
73d01d1  Add Python-based Vulkan shader generator (0cc4m)
de4b813  Replace shaderc dependency with precompiled shaders (0cc4m)
1e6e13f  Clean up code (0cc4m)
7efac61  Fix shader generator script Windows compatibility (0cc4m)
bd05447  Close file before deletion (0cc4m)
35b10d1  Merge upstream changes, fix conflict (0cc4m)
e90a651  Fix vulkan shader fp32 name (0cc4m)
a861879  Add q2_k and q3_k support (0cc4m)
4a97d2d  Add q4_k support (0cc4m)
0ec595f  Add q5_k support (0cc4m)
1b66b8b  Bake SPIR-V bytecode into the library instead of loading shaders from… (0cc4m) [sketch below]
a0db45f  Switch to signal semaphores for flexibility (0cc4m)
3de5ba4  Finish broadcasting mul mat support for GQA (0cc4m)
0230981  Clean up unused functions (0cc4m)
d130fe6  Merge remote-tracking branch 'origin/master' into vulkan (0cc4m)
1cb90e5  Add further ops, not yet enabled. Improve semaphore code (0cc4m)
2c7fa8d  Reduce number of used semaphores by utilizing timelines more properly (0cc4m)
80bfc59  Remove queue information (0cc4m)
2e01682  Reuse timeline semaphores, allow parallel operation with binary semap… (0cc4m) [sketch below]
4b7eccc  Add Vulkan to llama-bench (0cc4m)
20787d8  Merge upstream changes, fix conflicts (0cc4m)
00bea85  Remove cblas dependency (0cc4m)
bd7fa3f  Fix matmul k-split bug (0cc4m)
7f05c7f  Fix q4_k dmmv K_QUANTS_PER_ITERATION 1 shader (0cc4m)
e969445  Add RMS Norm shader, rework op_f32 shader setup, fix matmul bug (0cc4m)
39cd277  Fix issues with float16 overflows in shaders (0cc4m)
7551889  Merge upstream changes, fix conflicts (0cc4m)
471a1b0  Fix issues with older Vulkan headers on Ubuntu 22.04 (0cc4m)
d9ca456  Allow multi-op partial offloading by parsing the graph to preallocate… (0cc4m)
fc63f88  Implement further ops, rework op_f32 calls, fix bugs (0cc4m)
ff93769  Finish full offloading support, add last remaining ops, fix bugs, rem… (0cc4m)
0c708c1  Upload generated file ggml-vulkan-shaders.hpp, remove redundant shaders (0cc4m)
2f5529e  Merge upstream changes, fix conflicts, adapt per-layer kv (0cc4m)
2c8a156  Merge upstream changes, fix conflicts, adapt soft_max op (0cc4m)
cd34b87  Fix Python and shader header format (0cc4m)
c05883f  Free model gpu buffers on exit (0cc4m)
5fef0d6  Merge remote-tracking branch 'origin/master' into vulkan (0cc4m)
e9e2be3  Use single queue per device to simplify code (0cc4m)
7b36cea  Add matmul shader support for running multiple calculations in parallel (0cc4m)
918c333  Merge upstream changes, fix staging buffer usage (0cc4m)
542ae3b  Merge upstream changes, fix conflicts (0cc4m)
c3290d2  Switch from semaphore-synchronized multiple command buffers per op to… (0cc4m)
02d2e38  Fix missing event cast (0cc4m)
2d14b22  Merge upstream changes, implement basic vulkan backend (0cc4m)
1811c4e  Replace uint64_t(-1) with UINT64_MAX, rename function for clarity (0cc4m)
f84c54f  Fix warning about empty C function parameters (0cc4m)
c0f3474  Fix compiler warnings (0cc4m)
6e61742  Properly implement Vulkan backend buffer handling (0cc4m)
00f214c  Fix oversized host staging buffers (0cc4m)
1f55cd2  Simplify barrier synchronization calls (0cc4m)
7fa5ca9  Fix gcc warnings (0cc4m)
f652ebf  Implement max_size for backend buffer types to limit the size of a si… (0cc4m)
bcf2a44  Use min of maxMemoryAllocationSize and maxBufferSize for device max a… (0cc4m) [sketch below]
6b97c71  refactor multi buf (slaren)
f2c364a  Disable unsupported ops to fix tests (0cc4m)
1c953c1  Check for maintenance4 support before using it (0cc4m)
566a178  Handle devices with only a single queue (0cc4m)
3742b6c  Fix single queue logic (0cc4m)
bc5e64b  propagate buffer usage in multi buffers (slaren)
3a15a01  Implement rope_neox op (0cc4m)
82ce1c4  Cleanup header and other files (0cc4m)
5a8a07e  Simplify gpu_extras by removing events and putting staging memcpys in… (0cc4m)
a5cca6c  Move queue into context (0cc4m)
48ad459  Simplify context use, optimize matmul shader for warp size 64 (AMD GC… (0cc4m)
9c4c15a  Merge branch 'master' into vulkan (ggerganov)
e3acca3  Add get_max_size to SYCL backend. (0cc4m)
10fbb1f  llama : fix trailing whitespace (ggerganov)
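
The sketches below illustrate a few of the techniques named in the commit log. They are hedged illustrations written for this summary, not code taken from the PR; any helper names, parameters and file names they introduce are hypothetical.

Commit 061246f adds the Vulkan loader code. A minimal sketch of what instance creation and compute-queue-family selection look like with the C++ bindings (vulkan.hpp):

```cpp
// Minimal sketch: create a Vulkan instance and pick a compute-capable queue
// family with the C++ bindings. Illustrative only, not the PR's ggml-vulkan code.
#include <vulkan/vulkan.hpp>
#include <cstdio>
#include <vector>

int main() {
    vk::ApplicationInfo app_info("vk-sketch", 1, "none", 1, VK_API_VERSION_1_2);
    vk::InstanceCreateInfo instance_info({}, &app_info);
    vk::Instance instance = vk::createInstance(instance_info);

    for (const vk::PhysicalDevice& dev : instance.enumeratePhysicalDevices()) {
        vk::PhysicalDeviceProperties props = dev.getProperties();
        std::printf("device: %s\n", props.deviceName.data());

        // Pick the first queue family with compute support; a real backend also
        // weighs transfer-only queues and the number of queues per family.
        std::vector<vk::QueueFamilyProperties> families = dev.getQueueFamilyProperties();
        for (uint32_t i = 0; i < (uint32_t) families.size(); i++) {
            if (families[i].queueFlags & vk::QueueFlagBits::eCompute) {
                std::printf("  compute queue family %u (count %u)\n", i, families[i].queueCount);
                break;
            }
        }
    }

    instance.destroy();
    return 0;
}
```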
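
Commit 2471728 adds aligned host allocation helpers for the VMA library. A minimal sketch of portable aligned malloc/free wrappers; the function names are illustrative:

```cpp
// Portable aligned allocation helpers of the kind such an allocator can be
// pointed at. The names are illustrative.
#include <cstdio>
#include <cstdlib>
#if defined(_WIN32)
#include <malloc.h>
#endif

static void * aligned_malloc_sketch(size_t size, size_t alignment) {
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);
#else
    // posix_memalign needs a power-of-two alignment that is a multiple of sizeof(void*)
    void * ptr = nullptr;
    if (posix_memalign(&ptr, alignment, size) != 0) {
        return nullptr;
    }
    return ptr;
#endif
}

static void aligned_free_sketch(void * ptr) {
#if defined(_WIN32)
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}

int main() {
    void * buf = aligned_malloc_sketch(1024, 4096); // e.g. page-aligned staging memory
    std::printf("allocated at %p\n", buf);
    aligned_free_sketch(buf);
    return 0;
}
```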
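
Commits 1b4863c and 7c6860b (1D/2D blocktiling) optimize the GEMM shader by having each workgroup produce a block of the output while streaming slices of the K dimension, with each invocation additionally accumulating a small 2D tile in registers. The shader is GLSL; the following CPU-side sketch only illustrates the block-tiled loop structure, not the per-invocation register tiling:

```cpp
// CPU illustration of block-tiled matrix multiplication, C = A * B (row-major):
// the output is computed in BM x BN blocks while BK-wide slices of K are streamed.
#include <algorithm>
#include <cstdio>
#include <vector>

constexpr int BM = 64, BN = 64, BK = 16;

void gemm_tiled(const std::vector<float> & A, const std::vector<float> & B,
                std::vector<float> & C, int M, int N, int K) {
    for (int i0 = 0; i0 < M; i0 += BM) {
        for (int j0 = 0; j0 < N; j0 += BN) {
            for (int k0 = 0; k0 < K; k0 += BK) {
                // One block: on the GPU this is one workgroup's work, with the
                // A and B slices staged in shared memory.
                for (int i = i0; i < std::min(i0 + BM, M); i++) {
                    for (int j = j0; j < std::min(j0 + BN, N); j++) {
                        float sum = 0.0f;
                        for (int k = k0; k < std::min(k0 + BK, K); k++) {
                            sum += A[i * K + k] * B[k * N + j];
                        }
                        C[i * N + j] += sum;
                    }
                }
            }
        }
    }
}

int main() {
    const int M = 128, N = 128, K = 128;
    std::vector<float> A(M * K, 1.0f), B(K * N, 1.0f), C(M * N, 0.0f);
    gemm_tiled(A, B, C, M, N, K);
    std::printf("C[0] = %.1f (expected %d)\n", C[0], K);
    return 0;
}
```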
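
Commit c8ff09b adds a dequant_q4_0 kernel. The kernel itself is a GLSL shader; as a reference for the q4_0 layout it operates on (blocks of 32 weights: one fp16 scale plus 16 bytes of 4-bit quants, value = (quant - 8) * scale), here is a CPU sketch with the scale taken as float for simplicity:

```cpp
// CPU sketch of q4_0 dequantization, mirroring ggml's block layout:
// 32 weights per block, one scale d plus 16 bytes holding two 4-bit quants each.
#include <cstdint>
#include <cstdio>

struct block_q4_0_sketch {
    float   d;       // scale (stored as fp16 in the real format)
    uint8_t qs[16];  // 32 x 4-bit quants, two per byte
};

void dequantize_q4_0(const block_q4_0_sketch * blocks, float * out, int nblocks) {
    for (int i = 0; i < nblocks; i++) {
        const float d = blocks[i].d;
        for (int j = 0; j < 16; j++) {
            const int x0 = (blocks[i].qs[j] & 0x0F) - 8;  // low nibble
            const int x1 = (blocks[i].qs[j] >> 4)  - 8;   // high nibble
            out[i * 32 + j]      = x0 * d;
            out[i * 32 + j + 16] = x1 * d;
        }
    }
}

int main() {
    block_q4_0_sketch b{0.5f, {}};
    b.qs[0] = 0x19;  // low nibble 9 -> 0.5, high nibble 1 -> -3.5
    float out[32];
    dequantize_q4_0(&b, out, 1);
    std::printf("%f %f\n", out[0], out[16]);
    return 0;
}
```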
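
Commit c7c761a adds a split-k path for small matrix multiplications: when M and N are small there are too few output tiles to occupy the GPU, so the K dimension is split across several dispatches that each write a partial result, followed by a reduction. A CPU sketch of the idea (commit bd7fa3f later fixes a bug in the k-split path):

```cpp
// CPU sketch of split-k matrix multiplication: partition K into `splits` chunks,
// compute partial products independently (on the GPU, by separate workgroups),
// then reduce the partial results into C.
#include <cstdio>
#include <vector>

void matmul_split_k(const std::vector<float> & A, const std::vector<float> & B,
                    std::vector<float> & C, int M, int N, int K, int splits) {
    // One partial result per split; on the GPU this is a temporary buffer
    // of size splits * M * N.
    std::vector<float> partial((size_t) splits * M * N, 0.0f);

    for (int s = 0; s < splits; s++) {  // each split is an independent dispatch
        const int k_begin = s * K / splits;
        const int k_end   = (s + 1) * K / splits;
        for (int i = 0; i < M; i++) {
            for (int j = 0; j < N; j++) {
                float sum = 0.0f;
                for (int k = k_begin; k < k_end; k++) {
                    sum += A[i * K + k] * B[k * N + j];
                }
                partial[((size_t) s * M + i) * N + j] = sum;
            }
        }
    }

    // Reduction pass: sum the partial results.
    for (int i = 0; i < M * N; i++) {
        float acc = 0.0f;
        for (int s = 0; s < splits; s++) {
            acc += partial[(size_t) s * M * N + i];
        }
        C[i] = acc;
    }
}

int main() {
    const int M = 4, N = 4, K = 64;
    std::vector<float> A(M * K, 1.0f), B(K * N, 2.0f), C(M * N);
    matmul_split_k(A, B, C, M, N, K, 4);
    std::printf("C[0] = %.1f (expected %.1f)\n", C[0], 2.0f * K);
    return 0;
}
```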
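
Commit 8dd585e makes the matmul kernel variable through specialization constants, so a single SPIR-V module can be specialized to different tile and workgroup sizes at pipeline-creation time instead of compiling one shader per variant. A sketch of the host-side plumbing; the parameter struct and constant IDs are illustrative:

```cpp
// Sketch of feeding tile/workgroup parameters to a compute shader through
// specialization constants, so one SPIR-V module can be specialized per pipeline.
#include <vulkan/vulkan.hpp>
#include <array>
#include <cstddef>

struct MatmulParams {
    uint32_t BM, BN, BK;  // block tile sizes
    uint32_t warp_size;   // subgroup size to target
};

vk::SpecializationInfo make_spec_info(const MatmulParams & params,
                                      std::array<vk::SpecializationMapEntry, 4> & entries) {
    // layout(constant_id = N) in the shader <-> entry with constantID = N here
    entries = {
        vk::SpecializationMapEntry(0, (uint32_t) offsetof(MatmulParams, BM),        sizeof(uint32_t)),
        vk::SpecializationMapEntry(1, (uint32_t) offsetof(MatmulParams, BN),        sizeof(uint32_t)),
        vk::SpecializationMapEntry(2, (uint32_t) offsetof(MatmulParams, BK),        sizeof(uint32_t)),
        vk::SpecializationMapEntry(3, (uint32_t) offsetof(MatmulParams, warp_size), sizeof(uint32_t)),
    };
    // The returned struct is set as pSpecializationInfo on the
    // vk::PipelineShaderStageCreateInfo when the compute pipeline is created,
    // e.g. once with large tiles and once with small ones, without recompiling GLSL.
    return vk::SpecializationInfo((uint32_t) entries.size(), entries.data(),
                                  sizeof(MatmulParams), &params);
}

int main() {
    MatmulParams params{64, 64, 16, 32};
    std::array<vk::SpecializationMapEntry, 4> entries;
    vk::SpecializationInfo info = make_spec_info(params, entries);
    return info.mapEntryCount == 4 ? 0 : 1;
}
```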
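
Commit 2859562 runs the glslc shader-compilation commands in parallel to cut build times. The PR does this in its build scripts; purely as an illustration, the same idea in C++ with one worker per shader (the shader file names are placeholders):

```cpp
// Sketch of compiling GLSL compute shaders to SPIR-V in parallel by running
// glslc from worker threads.
#include <cstdio>
#include <cstdlib>
#include <future>
#include <string>
#include <vector>

int main() {
    const std::vector<std::string> shaders = {
        "matmul_f32.comp", "matmul_f16.comp", "dequant_q4_0.comp",  // placeholders
    };

    std::vector<std::future<int>> jobs;
    for (const std::string & src : shaders) {
        jobs.push_back(std::async(std::launch::async, [src]() {
            // optimization level, include paths, etc. omitted for brevity
            const std::string cmd = "glslc -fshader-stage=compute " + src + " -o " + src + ".spv";
            return std::system(cmd.c_str());
        }));
    }

    int failures = 0;
    for (auto & job : jobs) {
        if (job.get() != 0) {
            failures++;
        }
    }
    std::printf("%d shader(s) failed to compile\n", failures);
    return failures == 0 ? 0 : 1;
}
```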
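
Commit 1b66b8b bakes the SPIR-V bytecode into the library instead of loading shader files at runtime, and commit 0c708c1 checks in the generated ggml-vulkan-shaders.hpp. The PR generates that header with its Python shader generator; a stand-alone C++ sketch of the same embedding idea:

```cpp
// Sketch of "baking" a compiled .spv file into a C++ header as a byte array,
// so the library needs no shader files at runtime. Illustrative only.
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

int main(int argc, char ** argv) {
    if (argc != 3) {
        std::fprintf(stderr, "usage: %s shader.spv array_name\n", argv[0]);
        return 1;
    }

    std::ifstream in(argv[1], std::ios::binary);
    std::vector<unsigned char> data((std::istreambuf_iterator<char>(in)),
                                    std::istreambuf_iterator<char>());

    // Emit a header with the bytecode and its length.
    std::printf("#include <cstdint>\n\n");
    std::printf("const uint8_t %s[] = {", argv[2]);
    for (size_t i = 0; i < data.size(); i++) {
        if (i % 12 == 0) std::printf("\n    ");
        std::printf("0x%02x,", data[i]);
    }
    std::printf("\n};\nconst uint64_t %s_len = %zu;\n", argv[2], data.size());
    return 0;
}
```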
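
Commits 2c7fa8d and 2e01682 move synchronization toward reusable timeline semaphores: one semaphore with monotonically increasing 64-bit values can stand in for many one-shot binary semaphores. A hedged sketch of creating and host-waiting on a timeline semaphore with vulkan.hpp, assuming a device created with the timelineSemaphore feature enabled:

```cpp
// Sketch of creating and waiting on a Vulkan timeline semaphore. Assumes
// `device` was created with Vulkan 1.2 (or VK_KHR_timeline_semaphore) and the
// timelineSemaphore feature enabled; not the PR's actual helpers.
#include <vulkan/vulkan.hpp>
#include <cstdint>

vk::Semaphore create_timeline_semaphore(vk::Device device, uint64_t initial_value = 0) {
    vk::SemaphoreTypeCreateInfo type_info(vk::SemaphoreType::eTimeline, initial_value);
    vk::SemaphoreCreateInfo create_info;
    create_info.pNext = &type_info;  // chain the timeline type onto the create info
    return device.createSemaphore(create_info);
}

// Block on the host until the semaphore reaches `value` (signalled by a queue
// submission elsewhere). Returns false on timeout.
bool wait_timeline_value(vk::Device device, vk::Semaphore sem, uint64_t value,
                         uint64_t timeout_ns = UINT64_MAX) {
    vk::SemaphoreWaitInfo wait_info({}, 1, &sem, &value);
    return device.waitSemaphores(wait_info, timeout_ns) == vk::Result::eSuccess;
}
```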
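
Commits f652ebf and bcf2a44 cap the backend's single-buffer size at the smaller of maxMemoryAllocationSize and maxBufferSize, and commit 1c953c1 only uses the latter when maintenance4 is available. A sketch of that query via the properties2 chain; the function name is illustrative and a Vulkan 1.1+ instance is assumed:

```cpp
// Sketch of querying a device's practical upper bound for a single buffer:
// min(maxMemoryAllocationSize from maintenance3, maxBufferSize from maintenance4).
// The maintenance4 struct is only chained when the caller says it is supported.
#include <vulkan/vulkan.hpp>
#include <algorithm>

vk::DeviceSize query_max_buffer_alloc(vk::PhysicalDevice dev, bool has_maintenance4) {
    vk::PhysicalDeviceProperties2 props2;
    vk::PhysicalDeviceMaintenance3Properties maint3;
    vk::PhysicalDeviceMaintenance4Properties maint4;

    props2.pNext = &maint3;
    if (has_maintenance4) {
        maint3.pNext = &maint4;  // extend the query chain only when supported
    }
    dev.getProperties2(&props2);

    vk::DeviceSize max_size = maint3.maxMemoryAllocationSize;
    if (has_maintenance4) {
        max_size = std::min(max_size, maint4.maxBufferSize);
    }
    return max_size;
}
```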