Fix for tensor parallelism with newly split sin/cos tensors #758


Merged
merged 1 commit into turboderp-org:dev on Apr 11, 2025

Conversation

MikeRoz47

Since 7b05acd, sin and cos in the device context are split by layer. However, the external functions tp_attn_forward_ and tp_attn_forward_paged_ expect a single tensor per device. I've changed the get_sin_cos method of the TPContext class to concatenate these tensors before providing them to the external functions.
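
A minimal sketch of what such a change might look like, assuming each device holds its sin/cos values as a list of per-layer tensors; the class skeleton, attribute names, and concatenation dimension below are illustrative assumptions, not the actual TPContext implementation:

```python
import torch

class TPContext:  # hypothetical skeleton, for illustration only
    def __init__(self, sin_by_device, cos_by_device):
        # Assumed layout: one list of per-layer tensors per device
        self.sin_by_device = sin_by_device
        self.cos_by_device = cos_by_device

    def get_sin_cos(self):
        # The extension functions (tp_attn_forward_, tp_attn_forward_paged_)
        # expect a single sin/cos tensor per device, so rejoin the per-layer
        # pieces into one tensor for each device before returning them.
        sin = [torch.cat(layers, dim = 0) for layers in self.sin_by_device]
        cos = [torch.cat(layers, dim = 0) for layers in self.cos_by_device]
        return sin, cos
```

The exact storage layout and concat dimension depend on how the split was introduced in 7b05acd; the point is only that the per-layer tensors are concatenated before the call into the extension.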

The error message I was receiving before making this change was:

TypeError: tp_attn_forward_(): incompatible function arguments. The following argument types are supported:
    1. (arg0: int, arg1: torch.Tensor, arg2: list[torch.Tensor], arg3: list[torch.Tensor], arg4: list[torch.Tensor], arg5: list[torch.Tensor], arg6: list[torch.Tensor], arg7: list[torch.Tensor], arg8: list[torch.Tensor], arg9: list[torch.Tensor], arg10: list[torch.Tensor], arg11: list[torch.Tensor], arg12: float, arg13: list[int], arg14: list[int], arg15: list[int], arg16: list[int], arg17: int, arg18: int, arg19: int, arg20: int, arg21: list[torch.Tensor], arg22: list[torch.Tensor], arg23: list[torch.Tensor], arg24: float) -> None

Invoked with: 1399607536, tensor([[     0.00057,      0.00150,     -0.00022,  ...,     -0.00005,     -0.00012,     -0.00014],

followed by many more lines of truncated string representations of tensors.

I've tested this change locally, which seemed to exercise the tp_attn_forward_ path. I'm not sure how I'd exercise the tp_attn_forward_paged_ path. Where inference was failing before due to this error, I'm now receiving coherent output that seems consistent with what I've seen from the model before.

@frenzybiscuit

Does this fix the massive performance loss on Windows 11?

Linux -> 21 tokens/second
Windows 11 -> 14 tokens/second

This is with 70B models and tensor parallelism. If I disable TP on Windows 11, performance jumps to around 18 tokens/second.

@MikeRoz47
Author

MikeRoz47 commented Mar 23, 2025

I don't know anything about that. This PR fixes a fatal error I received when performing inference.

EDIT: I'll also add that I received the error when using the dev branch, not master or any release.

@Ph0rk0z

Ph0rk0z commented Mar 23, 2025

I think you found it.

This looks suspiciously like the error I got when running the post-Gemma dev branch. It fails on test inference while loading the model in tabby.

Unfortunately I'm using the paged path, since these are all Ampere cards with FA.

@MikeRoz47
Author

Just so you know, the paged path should be fixed too; I just don't think my test exercised that path. But I'm running Ampere cards too, so maybe it did? I just know that my first failure was in trying to call tp_attn_forward_.

@Ph0rk0z

Ph0rk0z commented Mar 24, 2025

I couldn't see the top of the message because mosh only shows so many lines, but the tensor puke at the bottom looks about right.

In tabby it defaults to paged, and in textgen I think it mainly uses the classic engine. I should just try it when I get back.

Edit: It worked. The same model that failed before now loads.

@turboderp
Member

Sorry I missed this. (:

@turboderp merged commit 61450b4 into turboderp-org:dev on Apr 11, 2025
@EthanAndersonUSA

Huge thanks for fixing this!
