Two questions about the likelihood in ExactGPs, and fitting GPRs that smoothly account for noise
#2094
In issue #2092 I had a couple of questions about the difference between the posterior `model(x)` and the posterior predictive `likelihood(model(x))`. First, I'm curious why we have to call `likelihood(model(x))` rather than just `model(x)` when making predictions. Second, I was expecting samples from `model(x)` to account for the observation noise, but they don't. Do you have any tips for building a model whose posterior samples account for the noise? And do you have any references you could point me to to better understand the difference between noise modelling and the likelihood? Thanks!

For completeness, here's the code used to create the image above.

```python
"""
Just a slightly modified copy of
https://docs.gpytorch.ai/en/latest/examples/01_Exact_GPs/Simple_GP_Regression.html
"""
import numpy as np
import torch
from matplotlib import pyplot as plt

import gpytorch

noise_level = 1e-1
train_x = torch.linspace(0, 2 * np.pi, 100)
train_y = torch.sin(train_x) + noise_level * torch.randn(train_x.size())


# Everything's the same as in the tutorial, except
# we add a prior to encourage high lengthscales.
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y):
        likelihood = gpytorch.likelihoods.GaussianLikelihood()
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(
                lengthscale_prior=gpytorch.priors.GammaPrior(10.0, 1.0)
            )
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


model = ExactGPModel(train_x, train_y)

# Training
model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)
for i in range(100):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()

# Prediction
model.eval()
new_inputs = torch.linspace(0, 2 * np.pi, 50)
pred1 = model(new_inputs)
pred2 = model.likelihood(model(new_inputs))

# Plot the predictive means, the training data, and a few samples from each distribution
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(2 * 7, 7))
ax1.plot(new_inputs.detach().numpy(), pred1.mean.detach().numpy())
ax2.plot(new_inputs.detach().numpy(), pred2.mean.detach().numpy())
ax1.plot(train_x.detach().numpy(), train_y.detach().numpy(), "*k", alpha=0.2)
ax2.plot(train_x.detach().numpy(), train_y.detach().numpy(), "*k", alpha=0.2)
for _ in range(5):
    sample1 = pred1.sample().detach().numpy()
    ax1.plot(new_inputs.detach().numpy(), sample1, "--")
    sample2 = pred2.sample().detach().numpy()
    ax2.plot(new_inputs.detach().numpy(), sample2, "--")
ax1.set_title("Samples from model(new_data)")
ax2.set_title("Samples from likelihood(model(new_data))")
plt.tight_layout()
plt.show()
plt.close()
```
`model(x)` returns the posterior distribution

f(x*) | y ~ N( k^* (K + \sigma^2 I)^{-1} y,  k^{**} - k^* (K + \sigma^2 I)^{-1} k^{*T} )

`likelihood(model(x))` returns the posterior predictive distribution

y(x*) | y ~ N( k^* (K + \sigma^2 I)^{-1} y,  k^{**} - k^* (K + \sigma^2 I)^{-1} k^{*T} + \sigma^2 )

The difference between them is an additional \sigma^2 variance term. This is because y(x*) | y = f(x*) | y + \epsilon, where \epsilon ~ N(0, \sigma^2) is the observational noise.
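To make the extra \sigma^2 term concrete, here is a minimal sketch (assuming the `model` and likelihood trained in the code above; `test_x` is just an illustrative grid, and the `fast_pred_var` setting is optional) that compares the two predictive variances and draws samples that include the observation noise:

```python
# Minimal sketch, assuming `model` (with its GaussianLikelihood) has been trained as in the code above.
model.eval()

test_x = torch.linspace(0, 2 * np.pi, 50)

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    f_post = model(test_x)             # posterior over the latent function f(x*)
    y_post = model.likelihood(f_post)  # posterior predictive over the observations y(x*)

    # The predictive variance differs from the latent variance by the learned noise sigma^2.
    noise = model.likelihood.noise     # learned observational noise variance
    print(torch.allclose(y_post.variance, f_post.variance + noise, atol=1e-5))

    # Samples that account for the observation noise come from the posterior predictive.
    noisy_samples = y_post.sample(torch.Size([5]))  # shape: (5, 50)
```

So if you want posterior samples that account for the noise, sample from `likelihood(model(x))`; samples from `model(x)` are draws of the latent function f and will always look smooth.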