Does `jax.lax.linalg.tridiagonal_solve` correctly use the `cusparse` batched implementations when appropriate? #28371

jpbrodrick89 · 2025-04-29T01:43:29Z

I've been trying to follow through the source code to answer this myself, but I keep getting lost/confused.

While reading the cusparse library docs I noticed there are dedicated batched implementations, e.g. `cusparse<t>gtsv2StridedBatch()` and `cusparse<t>gtsvInterleavedBatch()`, which I assume are quite efficient. However, when profiling `tridiagonal_solve` with an array size of 200 and batch sizes up to 6400, I observe that the runtime scales linearly with batch size throughout, despite not reaching the memory or kernel limits of an A100. Alternative implementations using unrolled loops show that sub-linear scaling should be possible in this regime, which makes me think the batched implementations might not actually be used here. Is this a "bug", a "missing feature", or simply the current upstream behaviour of the batched cusparse routines? If it is not the latter, how complicated would it be to address?
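For readers who want to reproduce the scaling measurement, here is a minimal timing sketch. It is not the profiling script used above: the helper names `make_system` and `time_solve`, the random diagonally dominant test data, and the single right-hand side are all assumptions made for illustration. Only the system size of 200 and the batch sizes up to 6400 come from the post.

```python
import time

import jax
import jax.numpy as jnp
from jax.lax.linalg import tridiagonal_solve


def make_system(batch, m, seed=0):
    """Random diagonally dominant tridiagonal systems (hypothetical test data)."""
    k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(seed), 4)
    # tridiagonal_solve expects dl[..., 0] == 0 and du[..., -1] == 0.
    dl = jax.random.uniform(k1, (batch, m)).at[:, 0].set(0.0)
    du = jax.random.uniform(k2, (batch, m)).at[:, -1].set(0.0)
    d = 4.0 + jax.random.uniform(k3, (batch, m))
    b = jax.random.normal(k4, (batch, m, 1))  # one right-hand side per system
    return dl, d, du, b


solve = jax.jit(tridiagonal_solve)


def time_solve(batch, m=200, repeats=10):
    """Mean wall-clock seconds per call, excluding compilation."""
    args = make_system(batch, m)
    solve(*args).block_until_ready()  # compile and warm up
    start = time.perf_counter()
    for _ in range(repeats):
        solve(*args).block_until_ready()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    for batch in (100, 400, 1600, 6400):
        print(batch, time_solve(batch))
```

If the time per call grows in proportion to the batch size well before the device is saturated, that is consistent with the batch being processed one system at a time.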

Thank you!

dfm (Collaborator) · 2025-04-29T09:12:30Z

Good question! It looks like this op currently bottoms out in the unbatched implementation (the relevant backend code is here) with a loop over the batch dimensions. It seems like a good feature request to use the batched solvers when we can, and shouldn't be too hard to implement! Perhaps it's worth opening the feature request as an issue with more details about your specific use case?
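To make the contrast with a per-system loop concrete, the sketch below solves all systems in a batch together, so each elimination step is one vectorized operation across the batch. This is neither JAX's backend code nor a cusparse batched routine, and it is not the "unrolled loops" alternative mentioned in the question: it is a plain Thomas algorithm written with `jax.lax.scan`, without pivoting, so it assumes well-conditioned (e.g. diagonally dominant) systems. The name `thomas_batched` is hypothetical.

```python
import jax
import jax.numpy as jnp
from jax import lax


def thomas_batched(dl, d, du, b):
    """Thomas algorithm over a batch. dl, d, du: (..., m); b: (..., m, n).

    Follows the tridiagonal_solve convention: dl[..., 0] and du[..., -1]
    are ignored and should be zero. No pivoting is performed.
    """
    # Put the row index first so scan walks the rows of every system at once.
    dl_t, d_t, du_t = (jnp.moveaxis(a, -1, 0) for a in (dl, d, du))  # (m, ...)
    b_t = jnp.moveaxis(b, -2, 0)  # (m, ..., n)

    def forward(carry, row):
        c_prev, r_prev = carry  # modified super-diagonal and RHS of previous row
        lower, diag, upper, rhs = row
        denom = diag - lower * c_prev
        c = upper / denom
        r = (rhs - lower[..., None] * r_prev) / denom[..., None]
        return (c, r), (c, r)

    init = (jnp.zeros_like(d_t[0]), jnp.zeros_like(b_t[0]))
    _, (c, r) = lax.scan(forward, init, (dl_t, d_t, du_t, b_t))

    def backward(x_next, row):
        c_i, r_i = row
        x = r_i - c_i[..., None] * x_next
        return x, x

    # reverse=True runs from the last row to the first; outputs keep row order.
    _, x = lax.scan(backward, jnp.zeros_like(b_t[0]), (c, r), reverse=True)
    return jnp.moveaxis(x, 0, -2)
```

A function like this can serve as a reference point when timing `tridiagonal_solve`: its sequential depth is the system size, while the batch dimension is handled in parallel within each step.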
