Skip to content

Commit cb5462d

Browse files
authored
fix: add zero init for KV tiled copy (#1029)
This PR fixes the out-of-bound K/V value loading in the FA3 `sparse_mainloop.cuh`, which may cause a nan value for `BatchPrefillWithPagedKVCacheSM90Run` and `BlockSparseAttentionWrapper `. The accumulation of out-of-bound V and S causes the nan value. This may solve the issue #1018.
1 parent 40c89fd commit cb5462d

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

include/flashinfer/attention/hopper/sparse_mainloop.cuh

+1-1
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,7 @@ struct SparseCollectiveMainloop {
5353
static constexpr auto AlignmentKV = 128 / cutlass::sizeof_bits<DTypeKV>::value;
5454
using AlignmentTypeKV = cute::uint_byte_t<static_cast<int>(sizeof(DTypeKV)) * AlignmentKV>;
5555
// NOTE(Zihao): use SM80_CP_ASYNC for sparse loading of KV-cache
56-
using GmemCopyAtomKV = cute::Copy_Atom<SM80_CP_ASYNC_CACHEGLOBAL<AlignmentTypeKV>, DTypeKV>;
56+
using GmemCopyAtomKV = cute::Copy_Atom<SM80_CP_ASYNC_CACHEGLOBAL_ZFILL<AlignmentTypeKV>, DTypeKV>;
5757
using GmemTiledCopyK =
5858
decltype(cutlass::gemm::collective::detail::make_simt_gmem_tiled_copy<
5959
GmemCopyAtomKV, NUM_COPY_THREADS, AlignmentKV,

0 commit comments

Comments
 (0)