Commit 9956c29

perf: add -DNDEBUG compilation flag (#998)
We observed some performance degradation after upgrading cutlass to v3.9. This is because some runtime checks are added to cutlass in debug mode:

* https://github.com/NVIDIA/cutlass/blob/df8a550d3917b0e97f416b2ed8c2d786f7f686a3/include/cutlass/pipeline/sm90_pipeline.hpp#L66-L70
* https://github.com/NVIDIA/cutlass/blob/df8a550d3917b0e97f416b2ed8c2d786f7f686a3/include/cutlass/pipeline/sm90_pipeline.hpp#L82-L86

This PR addresses the issue by adding `-DNDEBUG` to compilation flags in both JIT and AOT mode. After this PR, we no longer observe the performance degradation.
1 parent 25d67b5 commit 9956c29

2 files changed, +4 -0 lines changed
flashinfer/jit/core.py

+3
@@ -114,6 +114,9 @@ def load_cuda_ops(
             "--ptxas-options=-v",
             "--ptxas-options=--verbose,--register-usage-level=10,--warn-on-local-memory-usage",
         ]
+    else:
+        # non debug mode
+        cuda_cflags += ["-DNDEBUG"]
 
     cflags += extra_cflags
     cuda_cflags += extra_cuda_cflags
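
The hunk above shows only the tail of the debug branch, so the flag selection can be hard to follow in isolation. The following is a minimal sketch of the same logic, assuming a boolean `debug` switch. `build_cuda_cflags` is a hypothetical helper written for illustration, not a function in flashinfer, and the debug flag list is limited to the two entries visible in the diff.

```python
from typing import List, Optional


def build_cuda_cflags(
    debug: bool, extra_cuda_cflags: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical helper mirroring the debug / non-debug branch in the hunk."""
    cuda_cflags: List[str] = []
    if debug:
        # Debug builds leave NDEBUG undefined, so assert-style runtime
        # checks stay compiled in. Only the flags visible in the diff
        # are listed here; the real list may contain more.
        cuda_cflags += [
            "--ptxas-options=-v",
            "--ptxas-options=--verbose,--register-usage-level=10,--warn-on-local-memory-usage",
        ]
    else:
        # Non-debug builds define NDEBUG, which compiles out assert()
        # and any checks guarded by the macro.
        cuda_cflags += ["-DNDEBUG"]
    # Caller-supplied flags are appended last, as in the original code.
    cuda_cflags += list(extra_cuda_cflags or [])
    return cuda_cflags


if __name__ == "__main__":
    print(build_cuda_cflags(debug=False, extra_cuda_cflags=["-O3"]))
    print(build_cuda_cflags(debug=True))
```

Because the define is added only in the `else` branch, debug JIT builds keep the checks, while the AOT build in `setup.py` always passes `-DNDEBUG`.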

setup.py

+1
@@ -225,6 +225,7 @@ def __init__(self, *args, **kwargs) -> None:
                 "-Xfatbin",
                 "-compress-all",
                 "-use_fast_math",
+                "-DNDEBUG",
                 "-DPy_LIMITED_API=0x03080000",
             ]
             libraries = [
