Fix training status of noise model of `HeteroskedasticNoise` after exceptions #2382

fjzzq2002 · 2023-07-20T18:24:38Z

In the current implementation of HeteroskedasticNoise.forward, the noise model's original training mode is restored with self.noise_model.train(training) only after self.noise_model() has returned its output. If self.noise_model() raises an exception, this reset never runs and self.noise_model is left in evaluation mode. This patch fixes that by wrapping the call in a try-finally block, so the mode is restored whether or not the call succeeds.
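A minimal sketch of the difference between the two patterns, assuming a hypothetical FlakyNoiseModel stand-in (not part of GPyTorch) that mimics the train()/eval()/training interface of torch.nn.Module and raises on every call. The helpers predict_noise_unsafe and predict_noise_safe are illustrative only; they are not the actual HeteroskedasticNoise.forward code.

```python
class FlakyNoiseModel:
    """Hypothetical stand-in for a noise model (not a GPyTorch class).

    Mimics the train()/eval()/training interface of torch.nn.Module and
    raises on every call when fail=True, like a numerically unstable GP.
    """

    def __init__(self, fail=False):
        self.training = True
        self.fail = fail

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)

    def __call__(self, x):
        if self.fail:
            raise RuntimeError("simulated numerical error")
        return x


def predict_noise_unsafe(noise_model, x):
    """Pre-fix pattern: the mode is restored only if the call succeeds."""
    training = noise_model.training
    noise_model.eval()
    output = noise_model(x)
    noise_model.train(training)  # never reached if noise_model(x) raises
    return output


def predict_noise_safe(noise_model, x):
    """try/finally pattern: the mode is restored on success and on error."""
    training = noise_model.training
    try:
        noise_model.eval()
        output = noise_model(x)
    finally:
        noise_model.train(training)  # always runs, even while an exception propagates
    return output


def mode_after_failure(predict):
    """Return noise_model.training after a failing call through `predict`."""
    model = FlakyNoiseModel(fail=True)
    try:
        predict(model, 1.0)
    except RuntimeError:
        pass
    return model.training


print(mode_after_failure(predict_noise_unsafe))  # False: left stuck in eval mode
print(mode_after_failure(predict_noise_safe))    # True: training mode restored
```

The finally clause does not swallow the exception: the caller still sees the error, but the noise model is no longer left in the wrong mode.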

The following script reproduces the problem:

```python
import gpytorch
import torch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

train_x = torch.tensor([[1.0], [2.0]])
train_y = torch.tensor([0.0, 0.0])
test_x = torch.tensor([[3.0]])
likelihood = gpytorch.likelihoods.GaussianLikelihood()
noise_model = ExactGPModel(train_x, train_y, likelihood).to(torch.double)
noise_model(train_x)
final_likelihood = gpytorch.likelihoods.HeteroskedasticNoise(noise_model)
assert noise_model.training and final_likelihood.training

# under a normal lengthscale, our likelihood works as expected
noise_model.covar_module.base_kernel.raw_lengthscale.data[[0]] = 0
print(final_likelihood(test_x).to_dense())

# now suppose an imperfect optimizer has driven the lengthscale extremely low
noise_model.covar_module.base_kernel.raw_lengthscale.data[[0]] = -720
assert 0 < noise_model.covar_module.base_kernel.lengthscale < 1e-310

# as a result, evaluating noise_model in eval mode raises a numerical error
noise_model.eval()
try:
    print(noise_model(test_x))
except Exception as e:
    print("Error:", e)
noise_model.train()

# calling final_likelihood now raises an error as well
try:
    print(final_likelihood(test_x).to_dense())
except Exception as e:
    print("Error:", e)

# after the failed call, noise_model is still in eval mode, so the cache is not cleared
assert final_likelihood.training and not noise_model.training

# even after resetting the lengthscale to a normal value, final_likelihood still does not give the correct result
noise_model.covar_module.base_kernel.raw_lengthscale.data[[0]] = 0
try:
    print(final_likelihood(test_x).to_dense())
except Exception as e:
    print("Error:", e)

# it works again once train() is called to clear the cache
noise_model.train()
print(final_likelihood(test_x).to_dense())
```

We also believe this resolves pytorch/botorch#1386: we replicated the example from pytorch/botorch#1386 (comment), and this patch fixed it.

Balandat

Thanks for the PR. This makes sense to me! Though I think the unit test needs updating.

test/likelihoods/test_noise_models.py

Balandat · 2023-07-22T16:35:56Z


+
+
+class TestNoiseModels(unittest.TestCase):
+    def test_heteroskedasticnoise_error(self):


Looks like this is just testing the NumericallyUnstableModelExample but not the actual code of the noise model?

fjzzq2002 · Jul 24, 2023

We are testing the HeteroskedasticNoise noise model here by wrapping it around NumericallyUnstableModelExample. Or should I rename this class?

Balandat · 2023-07-25T17:38:24Z

The issue was that the previous code didn't actually test that the noise model's training status was reset; that seems to be fixed now.
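The point raised in this thread, that a test must check the mode is actually reset rather than only that the error occurs, can be sketched as a standalone unittest. This is not the test added in the PR: FailingModel and call_in_eval_mode are hypothetical stand-ins for the noise model and for the try-finally logic in HeteroskedasticNoise.forward.

```python
import unittest


class FailingModel:
    """Hypothetical stand-in: nn.Module-style mode flag, raises on every call."""

    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)

    def __call__(self, x):
        raise RuntimeError("simulated numerical error")


def call_in_eval_mode(model, x):
    """Hypothetical helper mirroring the try/finally fix described in the PR."""
    training = model.training
    try:
        model.eval()
        return model(x)
    finally:
        model.train(training)


class TestModeRestoredAfterError(unittest.TestCase):
    def test_training_mode_restored(self):
        model = FailingModel()
        with self.assertRaises(RuntimeError):
            call_in_eval_mode(model, 0.0)
        # The key check: the flag is back to its value from before the call.
        self.assertTrue(model.training)

    def test_eval_mode_preserved(self):
        model = FailingModel().eval()
        with self.assertRaises(RuntimeError):
            call_in_eval_mode(model, 0.0)
        self.assertFalse(model.training)
```

Run it with python -m unittest or pytest. Checking both starting modes guards against a fix that simply forces train() after every call.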

fix HeteroskedasticNoise exception handling

45a96e8

Balandat reviewed Jul 22, 2023


update unit test of HeteroskedasticNoise

45b28a7

Balandat approved these changes Jul 25, 2023


Balandat enabled auto-merge July 25, 2023 17:47

Balandat merged commit 090d6e1 into cornellius-gp:master Jul 25, 2023

esantorella mentioned this pull request Jun 23, 2024

[Bug] fit_gpytorch_mll gives backward pass runtime exception on second model fit attempt with HeteroskedasticSingleTaskGP pytorch/botorch#2370 (closed)

