
LinearSolve of Block-Tridiagonal matrix is slower than Sparse Arrays #225

Open
@rsenne

Description

So basically I need to do a linear solve on a block-tridiagonal matrix. I figured I would compare BlockBandedMatrices against SparseArrays and against a custom routine I just wrote. Here is an MWE.

using Random
using LinearAlgebra
using BlockBandedMatrices
using SparseArrays
using BlockArrays: getblock
using BenchmarkTools

# Parameters
k = 2       # block size
n = 5       # number of blocks

# Create SPD blocks
function make_spd_block(k)
    A = randn(k, k)
    return A'A + k * I
end

Bs = [make_spd_block(k) for _ in 1:n]         # diagonal blocks
Cs = [randn(k, k) for _ in 1:(n-1)]           # off-diagonal blocks

# Build full dense matrix for comparison
function build_full_block_tridiag(Bs, Cs)
    k, n = size(Bs[1], 1), length(Bs)
    A = zeros(k*n, k*n)
    for i in 1:n
        A[(k*(i-1)+1):(k*i), (k*(i-1)+1):(k*i)] .= Bs[i]
        if i < n
            A[(k*(i-1)+1):(k*i), (k*i+1):(k*(i+1))] .= Cs[i]'
            A[(k*i+1):(k*(i+1)), (k*(i-1)+1):(k*i)] .= Cs[i]
        end
    end
    return Symmetric(A)
end

A_dense = build_full_block_tridiag(Bs, Cs)
b = randn(k * n)

# Method 1: Custom block Cholesky solver
function custom_block_cholesky(Bs, Cs, b)
    n = length(Bs)
    k = size(Bs[1], 1)
    Ls = Vector{Matrix{Float64}}(undef, n)
    Loffs = Vector{Matrix{Float64}}(undef, n - 1)

    # Cholesky factorization
    Ls[1] = cholesky(Bs[1]).L
    for i in 2:n
        Loffs[i-1] = Cs[i-1] * inv(Ls[i-1]')
        S = Bs[i] - Loffs[i-1] * Loffs[i-1]'
        Ls[i] = cholesky(S).L
    end

    # Forward substitution
    y = zeros(k * n)
    for i in 1:n
        idx = (k*(i-1)+1):(k*i)
        y[idx] .= b[idx]
        if i > 1
            prev_idx = (k*(i-2)+1):(k*(i-1))
            y[idx] .-= Loffs[i-1] * y[prev_idx]
        end
        y[idx] .= Ls[i] \ y[idx]
    end

    # Backward substitution
    x = zeros(k * n)
    for i in n:-1:1
        idx = (k*(i-1)+1):(k*i)
        x[idx] .= y[idx]
        if i < n
            next_idx = (k*i+1):(k*(i+1))
            x[idx] .-= Loffs[i]' * x[next_idx]
        end
        x[idx] .= Ls[i]' \ x[idx]
    end

    return x
end

x_custom = custom_block_cholesky(Bs, Cs, b)

# Method 2: BlockBandedMatrix solve
block_sizes = fill(k, n)
A_bbm = BlockBandedMatrix{Float64}(undef, block_sizes, block_sizes, (1, 1))

for i in 1:n
    getblock(A_bbm, i, i) .= Bs[i]
end

for i in 1:(n-1)
    getblock(A_bbm, i+1, i) .= Cs[i]
    getblock(A_bbm, i, i+1) .= Cs[i]'
end

x_bbm = A_bbm \ b

# Method 3: Sparse matrix solve
A_sparse = sparse(A_dense)
x_sparse = A_sparse \ b
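
As a quick sanity check, something along these lines confirms the three solutions agree (a minimal sketch; ≈ is isapprox with its default relative tolerance):

# Sanity check: all three methods should agree to machine precision
@assert x_custom ≈ x_bbm ≈ x_sparse
@assert A_dense * x_custom ≈ b   # residual check against the dense system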

When doing this I found that all methods produced the same results to machine precision, and that my custom method was the fastest. Here are the results I got using the @btime macro:

Mine: 4.343 μs (178 allocations: 8.08 KiB)
BlockBandedMatrices: 23.400 μs (394 allocations: 412.88 KiB)
SparseArrays: 9.200 μs (60 allocations: 7.42 KiB)
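
For what it's worth, the numbers were gathered with calls roughly like the following (interpolating the globals with $ so global-variable lookup isn't part of the measurement):

@btime custom_block_cholesky($Bs, $Cs, $b);
@btime $A_bbm \ $b;
@btime $A_sparse \ $b;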

Obviously I'm not using any special matrix types to do this, just keeping the blocks I know exist in vectors and assuming symmetry. But I was curious whether this could potentially be a viable improvement over the current linear solving for this specific matrix form.

Edit

Also, I realize the use of Cholesky here is very brittle and I'm making some idealized assumptions. I assume someone with more numerical linear algebra experience could probably make this better :)
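
One way to at least fail gracefully instead of throwing would be a non-throwing Cholesky on each Schur complement; a minimal sketch, with safe_chol_L being a hypothetical helper name:

using LinearAlgebra

# Hypothetical helper: factor one diagonal block without throwing.
# cholesky(...; check=false) returns a factorization whose validity can be
# queried with issuccess, so a non-positive-definite Schur complement can
# trigger a fallback (e.g. to the LU version below) instead of a PosDefException.
function safe_chol_L(S::AbstractMatrix)
    F = cholesky(Symmetric(S); check=false)
    issuccess(F) || error("diagonal block is not positive definite")
    return F.L
end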

Edit 2

I discovered a more stable block LU factorization (essentially an unpivoted block Thomas algorithm); here is the code.

function custom_block_lu(As, Bs, Cs, b)
    n = length(As)      # number of block rows
    k = size(As[1], 1)  # block size

    # Storage for LU factors
    Us = Vector{Matrix{Float64}}(undef, n)      # diagonal blocks of U
    Ls = Vector{Matrix{Float64}}(undef, n - 1)  # subdiagonal blocks of L

    # LU Factorization Phase
    # ======================

    # First block: U₁ = A₁
    Us[1] = copy(As[1])

    for i = 1:n-1
        # Compute subdiagonal block: Lᵢ = Bᵢ * Uᵢ⁻¹
        Ls[i] = Bs[i] / Us[i]

        # Update next diagonal block: Uᵢ₊₁ = Aᵢ₊₁ - Lᵢ * Cᵢ
        Us[i+1] = As[i+1] - Ls[i] * Cs[i]
    end

    # Forward Substitution Phase (solve Ly = b)
    # =========================================

    y = zeros(k * n)

    for i = 1:n
        idx = (k*(i-1)+1):(k*i)
        y[idx] = b[idx]

        # Subtract contribution from previous block
        if i > 1
            prev_idx = (k*(i-2)+1):(k*(i-1))
            y[idx] -= Ls[i-1] * y[prev_idx]
        end
    end

    # Backward Substitution Phase (solve Ux = y)
    # ==========================================

    x = zeros(k * n)

    for i = n:-1:1
        idx = (k*(i-1)+1):(k*i)
        x[idx] = y[idx]

        # Subtract contribution from next block
        if i < n
            next_idx = (k*i+1):(k*(i+1))
            x[idx] -= Cs[i] * x[next_idx]
        end

        # Solve with diagonal block
        x[idx] = Us[i] \ x[idx]
    end

    return x
end
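
A driver along these lines exercises the routine at the sizes benchmarked below (the specific test matrices here are illustrative, an assumption on my part; the diagonal shift keeps the unpivoted factorization well conditioned):

using LinearAlgebra  # for I (already loaded above)

# Illustrative n = 50, k = 2 test system (my assumption, not the exact one used).
# Note the argument convention of custom_block_lu: As diagonal, Bs subdiagonal,
# Cs superdiagonal.
n, k = 50, 2
As = [randn(k, k) + 4k * I for _ in 1:n]   # diagonal blocks, not symmetric in general
Bs = [randn(k, k) for _ in 1:(n-1)]        # subdiagonal blocks
Cs = [randn(k, k) for _ in 1:(n-1)]        # superdiagonal blocks
b = randn(k * n)

x_lu = custom_block_lu(As, Bs, Cs, b)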

For a block-tridiagonal system with 50 blocks of dimension 2, here is the relative performance:

Cholesky: Fails (Not PD)
BBM: 829.400 μs (4445 allocations: 4.93 MiB)
Sparse: 282.000 μs (147 allocations: 223.47 KiB)
LU: 135.100 μs (1982 allocations: 89.11 KiB)
