Description
I need to do a linear solve on a block-tridiagonal matrix, so I figured I would compare SparseArrays and BlockBandedMatrices against a custom routine I just wrote. Here is an MWE.
using Random
using LinearAlgebra
using BlockBandedMatrices
using SparseArrays
using BlockArrays: getblock
using BenchmarkTools
# Parameters
k = 2 # block size
n = 5 # number of blocks
# Create SPD blocks
function make_spd_block(k)
    A = randn(k, k)
    return A'A + k * I
end
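(Why this constructor is safe: `A'A` is symmetric positive semidefinite, so adding `k*I` pushes every eigenvalue up to at least `k`. A standalone check, with names of my own choosing:)

```julia
using LinearAlgebra, Random

Random.seed!(42)
k = 2
A = randn(k, k)
B = A'A + k * I                  # Gram matrix (PSD) plus a k·I shift

vals = eigvals(Symmetric(B))
@show vals                       # every eigenvalue is ≥ k
@show isposdef(Symmetric(B))     # true
```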
Bs = [make_spd_block(k) for _ in 1:n] # diagonal blocks
Cs = [randn(k, k) for _ in 1:(n-1)] # off-diagonal blocks
# Build full dense matrix for comparison
function build_full_block_tridiag(Bs, Cs)
    k, n = size(Bs[1], 1), length(Bs)
    A = zeros(k*n, k*n)
    for i in 1:n
        A[(k*(i-1)+1):(k*i), (k*(i-1)+1):(k*i)] .= Bs[i]
        if i < n
            A[(k*(i-1)+1):(k*i), (k*i+1):(k*(i+1))] .= Cs[i]'
            A[(k*i+1):(k*(i+1)), (k*(i-1)+1):(k*i)] .= Cs[i]
        end
    end
    return Symmetric(A)
end
A_dense = build_full_block_tridiag(Bs, Cs)
b = randn(k * n)
# Method 1: Custom block Cholesky solver
function custom_block_cholesky(Bs, Cs, b)
    n = length(Bs)
    k = size(Bs[1], 1)
    Ls = Vector{Matrix{Float64}}(undef, n)        # diagonal Cholesky factors
    Loffs = Vector{Matrix{Float64}}(undef, n - 1) # off-diagonal factors
    # Cholesky factorization
    Ls[1] = cholesky(Symmetric(Bs[1])).L
    for i in 2:n
        Loffs[i-1] = Cs[i-1] / Ls[i-1]'       # triangular solve, avoids forming inv()
        S = Bs[i] - Loffs[i-1] * Loffs[i-1]'  # Schur complement
        Ls[i] = cholesky(Symmetric(S)).L
    end
    # Forward substitution (solve L y = b)
    y = zeros(k * n)
    for i in 1:n
        idx = (k*(i-1)+1):(k*i)
        y[idx] .= b[idx]
        if i > 1
            prev_idx = (k*(i-2)+1):(k*(i-1))
            y[idx] .-= Loffs[i-1] * y[prev_idx]
        end
        y[idx] .= Ls[i] \ y[idx]
    end
    # Backward substitution (solve L' x = y)
    x = zeros(k * n)
    for i in n:-1:1
        idx = (k*(i-1)+1):(k*i)
        x[idx] .= y[idx]
        if i < n
            next_idx = (k*i+1):(k*(i+1))
            x[idx] .-= Loffs[i]' * x[next_idx]
        end
        x[idx] .= Ls[i]' \ x[idx]
    end
    return x
end
x_custom = custom_block_cholesky(Bs, Cs, b)
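(As an aside, the Schur-complement recurrence in the factorization loop can be sanity-checked on a standalone two-block system; the names below are mine, not part of the solver. With `M = [B1 C1'; C1 B2]`, the factor `L = [L1 0; Loff L2]` should satisfy `L*L' == M` up to rounding:)

```julia
using LinearAlgebra, Random

Random.seed!(7)
k = 2
B1 = let A = randn(k, k); A'A + k*I end   # SPD diagonal blocks
B2 = let A = randn(k, k); A'A + k*I end
C1 = randn(k, k)                          # off-diagonal block

# One step of the block Cholesky recurrence
L1   = cholesky(Symmetric(B1)).L
Loff = C1 / L1'                           # = C1 * inv(L1'), via triangular solve
L2   = cholesky(Symmetric(B2 - Loff*Loff')).L

# Reassemble L and check that L*L' reproduces the original matrix
L = [Matrix(L1) zeros(k, k); Loff Matrix(L2)]
M = [B1 C1'; C1 B2]
@show norm(L*L' - M)   # machine-precision residual
```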
# Method 2: BlockBandedMatrix solve
block_sizes = fill(k, n)
A_bbm = BlockBandedMatrix{Float64}(undef, block_sizes, block_sizes, (1, 1))
for i in 1:n
    getblock(A_bbm, i, i) .= Bs[i]
end
for i in 1:(n-1)
    getblock(A_bbm, i+1, i) .= Cs[i]
    getblock(A_bbm, i, i+1) .= Cs[i]'
end
x_bbm = A_bbm \ b
# Method 3: Sparse matrix solve
A_sparse = sparse(A_dense)
x_sparse = A_sparse \ b
When doing this I found all methods produced the same results to machine precision, and my custom method was the fastest. Here are the timings I got with the `@btime` macro:
Mine: 4.343 μs (178 allocations: 8.08 KiB)
BlockBandedMatrices: 23.400 μs (394 allocations: 412.88 KiB)
SparseArrays: 9.200 μs (60 allocations: 7.42 KiB)
Obviously I'm not using any special matrix types here, just the blocks I know exist stored in plain vectors, plus the assumption of symmetry. But I was curious whether this could be a viable improvement over the current linear-solve machinery for this specific matrix structure.
Edit
Also, I realize the use of Cholesky here is very brittle and relies on some idealized assumptions (the matrix must be symmetric positive definite). I assume someone with more numerical linear algebra experience could probably make this better :)
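(To make the brittleness concrete: `cholesky` throws a `PosDefException` the moment a matrix, or a Schur complement along the way, stops being positive definite, while an LU-based solve does not care. A minimal standalone illustration:)

```julia
using LinearAlgebra

S = [1.0 2.0; 2.0 1.0]    # symmetric but indefinite (eigenvalues 3 and -1)
@show isposdef(S)         # false

chol_failed = false
try
    cholesky(S)
catch err
    chol_failed = err isa PosDefException
end
@show chol_failed         # true: Cholesky rejects the matrix

@show lu(S) \ [1.0, 0.0]  # LU with pivoting solves the system anyway
```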
Edit 2
I discovered a more stable block LU factorization; here is the code.
function custom_block_lu(As, Bs, Cs, b)
    n = length(As)     # number of block rows
    k = size(As[1], 1) # block size
    # Storage for LU factors
    Us = Vector{Matrix{Float64}}(undef, n)   # upper (diagonal) blocks
    Ls = Vector{Matrix{Float64}}(undef, n-1) # lower off-diagonal blocks
    # LU Factorization Phase
    # ======================
    # First block: U₁ = A₁
    Us[1] = copy(As[1])
    for i = 1:n-1
        # Compute lower off-diagonal: Lᵢ = Bᵢ * Uᵢ⁻¹
        Ls[i] = Bs[i] / Us[i]
        # Update next diagonal block: Uᵢ₊₁ = Aᵢ₊₁ - Lᵢ * Cᵢ
        Us[i+1] = As[i+1] - Ls[i] * Cs[i]
    end
    # Forward Substitution Phase (solve L y = b)
    # ==========================================
    y = zeros(k * n)
    for i = 1:n
        idx = (k*(i-1)+1):(k*i)
        y[idx] = b[idx]
        # Subtract contribution from previous block
        if i > 1
            prev_idx = (k*(i-2)+1):(k*(i-1))
            y[idx] -= Ls[i-1] * y[prev_idx]
        end
    end
    # Backward Substitution Phase (solve U x = y)
    # ===========================================
    x = zeros(k * n)
    for i = n:-1:1
        idx = (k*(i-1)+1):(k*i)
        x[idx] = y[idx]
        # Subtract contribution from next block
        if i < n
            next_idx = (k*i+1):(k*(i+1))
            x[idx] -= Cs[i] * x[next_idx]
        end
        # Solve with diagonal block
        x[idx] = Us[i] \ x[idx]
    end
    return x
end
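(Analogous to the Cholesky case, the LU recurrence can be sanity-checked on a standalone two-block system; the names below are mine. With `M = [A1 C1; B1 A2]`, the unit-lower factor `[I 0; L1 I]` times the upper factor `[U1 C1; 0 U2]` should reproduce `M`:)

```julia
using LinearAlgebra, Random

Random.seed!(3)
k = 2
A1 = randn(k, k) + 3I   # diagonal blocks (shifted so they're invertible, not SPD)
A2 = randn(k, k) + 3I
B1 = randn(k, k)        # subdiagonal block
C1 = randn(k, k)        # superdiagonal block

# One step of the block LU recurrence
U1 = copy(A1)
L1 = B1 / U1            # Lᵢ = Bᵢ * Uᵢ⁻¹
U2 = A2 - L1 * C1       # Schur-complement update

# Reassemble the factors and check the product against the original matrix
Ik = Matrix{Float64}(I, k, k)
Lmat = [Ik zeros(k, k); L1 Ik]
Umat = [U1 C1; zeros(k, k) U2]
M = [A1 C1; B1 A2]
@show norm(Lmat * Umat - M)   # machine-precision residual
```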
For a block-tridiagonal system with 50 blocks of dimension 2 here is the relative performance:
Cholesky: Fails (Not PD)
BBM: 829.400 μs (4445 allocations: 4.93 MiB)
Sparse: 282.000 μs (147 allocations: 223.47 KiB)
LU: 135.100 μs (1982 allocations: 89.11 KiB)