
Commit 558a764

Force FP32 compute in GLM4 FFN Down (ggml-org#13101)

* Force FP32 compute in cuBLAS GEMM

* Revert "Force FP32 compute in cuBLAS GEMM"

  This reverts commit 6efd872.

* Force F32 compute in GLM4 ffn down

* Edit comment to clarify issue

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
1 parent edb18b6 commit 558a764

File tree

1 file changed: +4 −0 lines changed

src/llama-graph.cpp

@@ -803,6 +803,10 @@ ggml_tensor * llm_graph_context::build_ffn(
 
     if (down) {
         cur = build_lora_mm(down, cur);
+        if (arch == LLM_ARCH_GLM4) {
+            // GLM4 seems to have numerical issues with half-precision accumulators
+            ggml_mul_mat_set_prec(cur, GGML_PREC_F32);
+        }
     }
 
     if (down_b) {
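
For context, ggml_mul_mat_set_prec flags a single matmul node so that backends which honor it (e.g. the CUDA backend) accumulate that node's dot products in full FP32 rather than F16. A minimal sketch of the pattern this diff applies follows; build_down_proj and needs_f32 are illustrative names invented for this example, not identifiers from llama.cpp.

#include "ggml.h"

// Hypothetical helper mirroring the diff: build the FFN down projection
// and, for architectures whose activations misbehave with half-precision
// accumulators, request F32 accumulation on that one matmul node.
static struct ggml_tensor * build_down_proj(
        struct ggml_context * ctx,
        struct ggml_tensor  * down,      // FFN down-projection weight
        struct ggml_tensor  * cur,       // activated hidden state
        bool                  needs_f32) {
    // create the matmul node; by default backends may accumulate in F16
    cur = ggml_mul_mat(ctx, down, cur);
    if (needs_f32) {
        // ask the backend to accumulate this node in full FP32
        ggml_mul_mat_set_prec(cur, GGML_PREC_F32);
    }
    return cur;
}

Setting the precision per node keeps the faster half-precision path for every other matmul in the graph and pays the FP32 cost only on the one projection that exhibits the numerical issue.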
