Description
Heya,
I am implementing a model based on a Python GPflow implementation and came across a difference between the variances calculated by GPflow and by ApproximateGPs for the `SparseVariationalApproximation`.
In the ApproximateGPs implementation, the variances at `x` under the approximate GP posterior are calculated as follows:

```julia
function Statistics.var(f::ApproxPosteriorGP{<:SparseVariationalApproximation}, x::AbstractVector)
    A = _A(f, x)
    return var(f.prior, x) - diag_At_A(A) + diag_At_A(f.data.B' * A)
end
```
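For context, `diag_At_A(A)` with `A = chol(Kuu).L \ Kux` is the usual Nyström correction `diag(Kxu Kuu⁻¹ Kux)`. A quick NumPy sanity check of that identity (a toy sketch with an RBF kernel, not library code from either package):

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X, Z):
    # Simple squared-exponential kernel with unit lengthscale.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

Z = rng.normal(size=(5, 1))  # inducing inputs
X = rng.normal(size=(7, 1))  # test inputs

Kuu = rbf(Z, Z) + 1e-8 * np.eye(5)
Kux = rbf(Z, X)
L = np.linalg.cholesky(Kuu)      # Kuu = L L'
A = np.linalg.solve(L, Kux)      # A = L^{-1} Kux

lhs = np.sum(A * A, axis=0)                       # diag(A' A)
rhs = np.diag(Kux.T @ np.linalg.solve(Kuu, Kux))  # diag(Kxu Kuu^{-1} Kux)
print(np.allclose(lhs, rhs))  # True
```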
The GPflow implementation does almost the same, except for a slight difference when the GP is not whitened.
See the following line, which marks the difference: https://github.com/GPflow/GPflow/blob/81fe1fb86a77d3d49d30ab4f8e739e25bbd71f56/gpflow/conditionals/util.py#L139
This would result in the following function for the `NonCentered` SVA:

```julia
function Statistics.var(f::ApproxPosteriorGP{<:SparseVariationalApproximation{NonCentered}}, x::AbstractVector)
    A = _A(f, x)
    chol_Kuu = _chol_cov(f.approx.fz)
    return var(f.prior, x) - diag_At_A(A) + diag_At_A(f.data.B' * (adjoint(_chol_lower(chol_Kuu)) \ A))
end
```
Note that the last term of the calculation uses a transformed matrix, `adjoint(_chol_lower(chol_Kuu)) \ A`, in place of `A`.
Implementing the above function gives me the same variance as produced by GPflow.
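For a library-independent reference point, here is a Monte Carlo check of the whitened predictive variance under one common convention (`u = L ε` with `q(ε) = N(m, C Cᵀ)`, where `Kuu = L Lᵀ`). This is a sketch of standard SVGP math, not code from either package, and which of the two Julia formulas it corresponds to depends on what exactly `f.data.B` stores, which I have not verified:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 3

def rbf(X, Z):
    # Simple squared-exponential kernel with unit lengthscale.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

Z = rng.normal(size=(M, 1))  # inducing inputs
X = rng.normal(size=(N, 1))  # test inputs
Kuu = rbf(Z, Z) + 1e-8 * np.eye(M)
Kux = rbf(Z, X)
Kxx = rbf(X, X) + 1e-8 * np.eye(N)

L = np.linalg.cholesky(Kuu)
A = np.linalg.solve(L, Kux)  # A = L^{-1} Kux

# Variational distribution q(eps) = N(m, C C') in whitened coordinates.
m = rng.normal(size=M)
C = np.tril(rng.normal(size=(M, M)))
np.fill_diagonal(C, np.abs(np.diag(C)) + 0.5)

# Closed form: k(x,x) - diag(A' A) + diag(A' C C' A).
closed_form = np.diag(Kxx) - np.sum(A * A, axis=0) + np.sum((C.T @ A) ** 2, axis=0)

# Monte Carlo: sample eps ~ q, then f | eps ~ N(A' eps, Kxx - A' A),
# since Kxu Kuu^{-1} L = A' under this convention.
S = 200_000
eps = m[:, None] + C @ rng.normal(size=(M, S))
cond_cov = Kxx - A.T @ A  # Schur complement, PSD for a valid kernel
Lc = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(N))
f = A.T @ eps + Lc @ rng.normal(size=(N, S))
mc_var = f.var(axis=1)
print(np.allclose(mc_var, closed_form, rtol=0.05, atol=0.01))  # True
```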
I am unsure where this difference comes from, as I am not that deep into the math behind GPs. Perhaps you know where it originates and which implementation is correct?
Cheers,
Alex