diff --git a/examples/08_Advanced_Usage/SVGP_Model_Updating.ipynb b/examples/08_Advanced_Usage/SVGP_Model_Updating.ipynb new file mode 100644 index 000000000..40e85535d --- /dev/null +++ b/examples/08_Advanced_Usage/SVGP_Model_Updating.ipynb @@ -0,0 +1,476 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "id": "3403d231", + "metadata": {}, + "outputs": [], + "source": [ + "import tqdm\n", + "import math\n", + "import torch\n", + "import gpytorch\n", + "from matplotlib import pyplot as plt\n", + "\n", + "# Make plots inline\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "id": "dd5cad91", + "metadata": {}, + "source": [ + "## SVGP Model Updating\n", + "\n", + "In this notebook, we will be describing a \"fantasy model\" strategy for stochastic variational GPs (SVGPs) analogous to fantasy modelling for exact GPs. \n", + "\n", + "To understand what a \"fantasy model\" is, we first think about exact GPs. Imagine, we have trained a GP on some data $\\mathcal{D} := \\{x_i, y_i\\}_{i=1}^N$, which is the same as saying that $\\mathbf{y} \\sim \\mathcal{GP}(\\mu(\\mathbf{x}), K(\\mathbf{x}, \\mathbf{x}'))$. 
\n", + "\n", + "If we observe some new data $\\mathcal{D}^*:= \\{x_j, y_j\\}_{j=1}^{N^*}$, then that data is easily incorporated into our GP model as $(\\mathbf{y}, \\mathbf{y}^*) \\sim \\mathcal{GP}(\\mu([\\mathbf{x}, \\mathbf{x}^*]), K([\\mathbf{x}, \\mathbf{x}^*], [\\mathbf{x}, \\mathbf{x}^*]'))$.\n", + "To compute predictions with this new model (conditional on the same hyper-parameters), we could use the following piece of code for an exact GP:\n", + "\n", + "```python\n", + "updated_model = deepcopy(model)\n", + "updated_model.set_train_data(torch.cat((train_x, new_x)), torch.cat((train_y, new_y)), strict=False)\n", + "```\n", + "\n", + "or we could take advantage of linear algebraic identities to efficiently produce the same model, which is the `get_fantasy_model` function for exact GPs in GPyTorch:\n", + "```python\n", + "updated_model = model.get_fantasy_model(new_x, new_y)\n", + "```\n", + "\n", + "The second approach is significantly more computationally efficient, costing $\\mathcal{O}((N^*)^2 N)$ time versus $\\mathcal{O}((N + N^*)^3)$ time.\n", + "\n", + "In this tutorial notebook, we describe the **online variational conditioning** (OVC) approach of [Maddox et al, '21](https://arxiv.org/abs/2110.15172) which provides a closed form method for updating SVGPs in the same manner as exact GPs are updated with respect to new data, via the usage of the `get_fantasy_model` method." + ] + }, + { + "cell_type": "markdown", + "id": "6bc1b924", + "metadata": {}, + "source": [ + "### Training Data\n", + "\n", + "First, we construct some training data, here $250$ data points from a noisy sine wave." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "81daa65c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0, 0.5, 'y')" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "train_x = torch.linspace(0, 3, 250).view(-1, 1).contiguous()\n", + "train_y = torch.sin(6. * train_x) + 0.3 * torch.randn_like(train_x)\n", + "\n", + "plt.scatter(train_x, train_y, marker = \"*\", color = \"black\")\n", + "plt.xlabel(\"x\")\n", + "plt.ylabel(\"y\")" + ] + }, + { + "cell_type": "markdown", + "id": "a1d8b8f0", + "metadata": {}, + "source": [ + "### Model definition\n", + "\n", + "Next, we define our model class definition. The only difference from a standard approximate GP is that we require the likelihood object to be a) Gaussian (for now) and b) to be stored inside of the `ApproximateGP` object." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "50f30949", + "metadata": {}, + "outputs": [], + "source": [ + "from gpytorch.models import ApproximateGP\n", + "from gpytorch.variational import CholeskyVariationalDistribution\n", + "from gpytorch.variational import VariationalStrategy\n", + "\n", + "class GPModel(ApproximateGP):\n", + " def __init__(self, inducing_points, likelihood):\n", + " variational_distribution = CholeskyVariationalDistribution(inducing_points.size(0))\n", + " variational_strategy = VariationalStrategy(self, inducing_points, variational_distribution, learn_inducing_locations=True)\n", + " super(GPModel, self).__init__(variational_strategy)\n", + " self.mean_module = gpytorch.means.ConstantMean()\n", + " self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())\n", + " self.likelihood = likelihood\n", + " \n", + " def forward(self, x):\n", + " mean_x = self.mean_module(x)\n", + " covar_x = self.covar_module(x)\n", + " return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)" + ] + }, + { + "cell_type": "markdown", + "id": "9c19020b", + "metadata": {}, + "source": [ + "We initialize the SVGP with $25$ inducing points." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9cfef053", + "metadata": {}, + "outputs": [], + "source": [ + "likelihood = gpytorch.likelihoods.GaussianLikelihood()\n", + "model = GPModel(torch.randn(25, 1) + 2., likelihood)\n" + ] + }, + { + "cell_type": "markdown", + "id": "11113b9d", + "metadata": {}, + "source": [ + "### Model Training\n", + "\n", + "As we don't have a lot of data, we train the model with full-batch (although this isn't a restriction) and for $500$ iterations (b/c our choice of inducing points may not have been very good)." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "990fae51", + "metadata": {}, + "outputs": [], + "source": [ + "model.train()\n", + "likelihood.train()\n", + "\n", + "optimizer = torch.optim.Adam([\n", + " {'params': model.parameters()},\n", + " # {'params': likelihood.parameters()},\n", + "], lr=0.1)\n", + "\n", + "# Our loss object. We're using the VariationalELBO\n", + "mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "25e5394a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Iteration: 0 \t Loss: 1.6754810810089111\n", + "Iteration: 50 \t Loss: 0.5079809427261353\n", + "Iteration: 100 \t Loss: 0.39197731018066406\n", + "Iteration: 150 \t Loss: 0.36815035343170166\n", + "Iteration: 200 \t Loss: 0.3656342625617981\n", + "Iteration: 250 \t Loss: 0.3653048574924469\n", + "Iteration: 300 \t Loss: 0.3654007315635681\n", + "Iteration: 350 \t Loss: 0.3680660128593445\n", + "Iteration: 400 \t Loss: 0.3646673262119293\n", + "Iteration: 450 \t Loss: 0.36463457345962524\n", + "Iteration: 500 \t Loss: 0.36551928520202637\n" + ] + } + ], + "source": [ + "iters = 500 + 1\n", + "\n", + "for i in range(iters):\n", + " optimizer.zero_grad()\n", + " output = model(train_x)\n", + " loss = -mll(output, train_y.squeeze())\n", + " loss.backward()\n", + 
" optimizer.step()\n", + " if i % 50 == 0:\n", + " print(\"Iteration: \", i, \"\\t Loss:\", loss.item())" + ] + }, + { + "cell_type": "markdown", + "id": "20163340", + "metadata": {}, + "source": [ + "### Model Evaluation\n", + "\n", + "Now, that we've trained our SVGP, we choose some data to evaluate it on -- here $250$ data points from $[0, 8]$ to illustrate the performance both for interpolation (on $[0,3]$) and extrapolation (on $[3, 8]$)." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "874a34f0", + "metadata": {}, + "outputs": [], + "source": [ + "model.eval()\n", + "likelihood.eval()\n", + "\n", + "test_x = torch.linspace(0, 8, 250).view(-1,1)\n", + "with torch.no_grad():\n", + " posterior = likelihood(model(test_x))" + ] + }, + { + "cell_type": "markdown", + "id": "9e2093f8", + "metadata": {}, + "source": [ + "As expected, the posterior model fits the training data well but reverts to a zero mean and high prediction outside of the region of the training data." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "d56cc9f1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0, 0.5, 'y')" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.plot(test_x, posterior.mean, color = \"blue\", label = \"Post Mean\")\n", + "plt.fill_between(test_x.squeeze(), *posterior.confidence_region(), color = \"blue\", alpha = 0.3, label = \"Post Conf Region\")\n", + "plt.scatter(train_x, train_y, color = \"black\", marker = \"*\", alpha = 0.5, label = \"Training Data\")\n", + "plt.legend()\n", + "plt.xlabel(\"x\")\n", + "plt.ylabel(\"y\")" + ] + }, + { + "cell_type": "markdown", + "id": "d3f79b90", + "metadata": {}, + "source": [ + "### Model Updating\n", + "\n", + "Now, we choose $25$ points to condition the model on -- imagining that these data points have just been acquired, perhaps from an active learning or Bayesian optimization loop." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "0896fe69", + "metadata": {}, + "outputs": [], + "source": [ + "val_x = torch.linspace(3, 5, 25).view(-1,1)\n", + "val_y = torch.sin(6. 
* val_x) + 0.3 * torch.randn_like(val_x)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "2bbfaf5f", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/wesleymaddox/Documents/GitHub/wjm_gpytorch/gpytorch/utils/cholesky.py:40: NumericalWarning: A not p.d., added jitter of 1.0e-06 to the diagonal\n", + " warnings.warn(\n", + "/Users/wesleymaddox/Documents/GitHub/wjm_gpytorch/gpytorch/utils/cholesky.py:40: NumericalWarning: A not p.d., added jitter of 1.0e-05 to the diagonal\n", + " warnings.warn(\n", + "/Users/wesleymaddox/Documents/GitHub/wjm_gpytorch/gpytorch/utils/cholesky.py:40: NumericalWarning: A not p.d., added jitter of 1.0e-04 to the diagonal\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "cond_model = model.variational_strategy.get_fantasy_model(inputs=val_x, targets=val_y.squeeze())" + ] + }, + { + "cell_type": "markdown", + "id": "fb931dc8", + "metadata": {}, + "source": [ + "Note that the updated model returned is an ExactGP class rather than a SVGP." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "ac5e171c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "_BaseExactGP(\n", + " (likelihood): GaussianLikelihood(\n", + " (noise_covar): HomoskedasticNoise(\n", + " (raw_noise_constraint): GreaterThan(1.000E-04)\n", + " )\n", + " )\n", + " (mean_module): ConstantMean()\n", + " (covar_module): ScaleKernel(\n", + " (base_kernel): RBFKernel(\n", + " (raw_lengthscale_constraint): Positive()\n", + " (distance_module): None\n", + " )\n", + " (raw_outputscale_constraint): Positive()\n", + " )\n", + ")" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cond_model" + ] + }, + { + "cell_type": "markdown", + "id": "4f929bf8", + "metadata": {}, + "source": [ + "We compute its posterior distribution on the same testing dataset as before." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "f67ed240", + "metadata": {}, + "outputs": [], + "source": [ + "with torch.no_grad():\n", + " updated_posterior = cond_model.likelihood(cond_model(test_x))" + ] + }, + { + "cell_type": "markdown", + "id": "52238773", + "metadata": {}, + "source": [ + "Finally, we plot the updated model, showing that the model has been updated to the newly observed data (grey) without forgetting the previous training data (black)." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "a914fd04", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0, 0.5, 'y')" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.plot(test_x, posterior.mean, color = \"blue\", label = \"Post Mean\")\n", + "plt.fill_between(test_x.squeeze(), *posterior.confidence_region(), color = \"blue\", alpha = 0.3, label = \"Post Conf Region\")\n", + "plt.scatter(train_x, train_y, color = \"black\", marker = \"*\", alpha = 0.5, label = \"Training Data\")\n", + "\n", + "plt.plot(test_x, updated_posterior.mean, color = \"orange\", label = \"Fant Mean\")\n", + "plt.fill_between(test_x.squeeze(), *updated_posterior.confidence_region(), color = \"orange\", alpha = 0.3, label = \"Fant Conf Region\")\n", + "\n", + "plt.scatter(val_x, val_y, color = \"grey\", marker = \"*\", alpha = 0.5, label = \"New Data\")\n", + "plt.legend()\n", + "plt.xlabel(\"x\")\n", + "plt.ylabel(\"y\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0a7b20f6", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/08_Advanced_Usage/index.rst b/examples/08_Advanced_Usage/index.rst index b6026d702..d97466760 100644 --- a/examples/08_Advanced_Usage/index.rst +++ b/examples/08_Advanced_Usage/index.rst @@ -59,6 +59,14 @@ See the `1D derivatives GP example`_ or the `2D derivatives GP example`_ for exa Simple_Batch_Mode_GP_Regression.ipynb: +Variational Fantasization +---------------------------------- +We also include an example of how to perform fantasy modelling (e.g. 
efficient, closed form updates) for variational +Gaussian process models, enabling their usage for lookahead optimization. + +.. _Variational fantasization: + SVGP_Model_Updating.ipynb + Converting Models to TorchScript ---------------------------------- @@ -73,3 +81,4 @@ how to convert both an exact GP and a variational GP to a ScriptModule that can TorchScript_Exact_Models.ipynb TorchScript_Variational_Models.ipynb + SVGP_Model_Updating.ipynb diff --git a/gpytorch/models/approximate_gp.py b/gpytorch/models/approximate_gp.py index 88c0022f0..85e2674f4 100644 --- a/gpytorch/models/approximate_gp.py +++ b/gpytorch/models/approximate_gp.py @@ -75,6 +75,33 @@ def pyro_model(self, input, beta=1.0, name_prefix=""): """ return super().pyro_model(input, beta=beta, name_prefix=name_prefix) + def get_fantasy_model(self, inputs, targets, **kwargs): + r""" + Returns a new GP model that incorporates the specified inputs and targets as new training data using + online variational conditioning (OVC). + + This function first casts the inducing points and variational parameters into pseudo-points before + returning an equivalent ExactGP model with a specialized likelihood. + + .. note:: + If `targets` is a batch (e.g. `b x m`), then the GP returned from this method will be a batch mode GP. + If `inputs` is of the same (or lesser) dimension as `targets`, then it is assumed that the fantasy points + are the same for each target batch. + + :param torch.Tensor inputs: (`b1 x ... x bk x m x d` or `f x b1 x ... x bk x m x d`) Locations of fantasy + observations. + :param torch.Tensor targets: (`b1 x ... x bk x m` or `f x b1 x ... x bk x m`) Labels of fantasy observations. + :return: An `ExactGP` model with `k + m` training examples, where the `m` fantasy examples have been added + and all test-time caches have been updated. Here `k` is the number of inducing points. 
+ :rtype: ~gpytorch.models.ExactGP + + Reference: "Conditioning Sparse Variational Gaussian Processes for Online Decision-Making," + Maddox, Stanton, Wilson, NeurIPS, '21 + https://papers.nips.cc/paper/2021/hash/325eaeac5bef34937cfdc1bd73034d17-Abstract.html + + """ + return self.variational_strategy.get_fantasy_model(inputs=inputs, targets=targets, **kwargs) + def __call__(self, inputs, prior=False, **kwargs): if inputs.dim() == 1: inputs = inputs.unsqueeze(-1) diff --git a/gpytorch/test/variational_test_case.py b/gpytorch/test/variational_test_case.py index 57943a323..077bf88a7 100644 --- a/gpytorch/test/variational_test_case.py +++ b/gpytorch/test/variational_test_case.py @@ -95,6 +95,28 @@ def _eval_iter(self, model, batch_shape=torch.Size([]), cuda=False): return output + def _fantasy_iter( + self, + model, + likelihood, + batch_shape=torch.Size([]), + cuda=False, + num_fant=10, + covar_module=None, + mean_module=None, + ): + model.likelihood = likelihood + val_x = torch.randn(*batch_shape, num_fant, 2).clamp(-2.5, 2.5) + val_y = torch.linspace(-1, 1, num_fant) + val_y = val_y.view(num_fant, *([1] * (len(self.event_shape) - 1))) + val_y = val_y.expand(*batch_shape, num_fant, *self.event_shape[1:]) + if cuda: + model = model.cuda() + val_x = val_x.cuda() + val_y = val_y.cuda() + updated_model = model.get_fantasy_model(val_x, val_y, covar_module=covar_module, mean_module=mean_module) + return updated_model + @abstractproperty def batch_shape(self): raise NotImplementedError @@ -272,3 +294,102 @@ def test_training_all_batch_zero_mean(self): expected_batch_shape=(torch.Size([3, 4]) + self.batch_shape), constant_mean=False, ) + + def test_fantasy_call( + self, + data_batch_shape=None, + inducing_batch_shape=None, + model_batch_shape=None, + expected_batch_shape=None, + constant_mean=True, + ): + # Batch shapes + model_batch_shape = model_batch_shape if model_batch_shape is not None else self.batch_shape + data_batch_shape = data_batch_shape if data_batch_shape is 
not None else self.batch_shape + inducing_batch_shape = inducing_batch_shape if inducing_batch_shape is not None else self.batch_shape + expected_batch_shape = expected_batch_shape if expected_batch_shape is not None else self.batch_shape + + num_inducing = 16 + num_fant = 10 + # Make model and likelihood + model, likelihood = self._make_model_and_likelihood( + batch_shape=model_batch_shape, + inducing_batch_shape=inducing_batch_shape, + distribution_cls=self.distribution_cls, + strategy_cls=self.strategy_cls, + constant_mean=constant_mean, + num_inducing=num_inducing, + ) + + # we iterate through the covar and mean module possible settings + covar_mean_options = [ + {"covar_module": None, "mean_module": None}, + {"covar_module": gpytorch.kernels.MaternKernel(), "mean_module": gpytorch.means.ZeroMean()}, + ] + for cm_dict in covar_mean_options: + fant_model = self._fantasy_iter( + model, likelihood, data_batch_shape, self.cuda, num_fant=num_fant, **cm_dict + ) + self.assertTrue(isinstance(fant_model, gpytorch.models.ExactGP)) + + # we check to ensure setting the covar_module and mean_modules are okay + if cm_dict["covar_module"] is None: + self.assertEqual(type(fant_model.covar_module), type(model.covar_module)) + else: + self.assertNotEqual(type(fant_model.covar_module), type(model.covar_module)) + if cm_dict["mean_module"] is None: + self.assertEqual(type(fant_model.mean_module), type(model.mean_module)) + else: + self.assertNotEqual(type(fant_model.mean_module), type(model.mean_module)) + + # now we check to ensure the shapes of the fantasy strategy are correct + self.assertTrue(fant_model.prediction_strategy is not None) + for key in fant_model.prediction_strategy._memoize_cache.keys(): + if key[0] == "mean_cache": + break + mean_cache = fant_model.prediction_strategy._memoize_cache[key] + self.assertEqual(mean_cache.shape, torch.Size([*expected_batch_shape, num_inducing + num_fant])) + + # we remove the mean_module and covar_module and check for errors + del 
model.mean_module + with self.assertRaises(ModuleNotFoundError): + self._fantasy_iter(model, likelihood, data_batch_shape, self.cuda, num_fant=num_fant) + + model.mean_module = gpytorch.means.ZeroMean() + del model.covar_module + with self.assertRaises(ModuleNotFoundError): + self._fantasy_iter(model, likelihood, data_batch_shape, self.cuda, num_fant=num_fant) + + # finally we check to ensure failure for a non-gaussian likelihood + with self.assertRaises(NotImplementedError): + self._fantasy_iter( + model, + gpytorch.likelihoods.BernoulliLikelihood(), + data_batch_shape, + self.cuda, + num_fant=num_fant, + ) + + def test_fantasy_call_batch_inducing(self): + return self.test_fantasy_call( + model_batch_shape=(torch.Size([3]) + self.batch_shape), + data_batch_shape=self.batch_shape, + inducing_batch_shape=(torch.Size([3]) + self.batch_shape), + expected_batch_shape=(torch.Size([3]) + self.batch_shape), + ) + + def test_fantasy_call_batch_data(self): + return self.test_fantasy_call( + model_batch_shape=self.batch_shape, + inducing_batch_shape=self.batch_shape, + data_batch_shape=(torch.Size([3]) + self.batch_shape), + expected_batch_shape=(torch.Size([3]) + self.batch_shape), + ) + + def test_fantasy_call_batch_model(self): + return self.test_fantasy_call( + model_batch_shape=(torch.Size([3]) + self.batch_shape), + inducing_batch_shape=self.batch_shape, + data_batch_shape=self.batch_shape, + expected_batch_shape=(torch.Size([3]) + self.batch_shape), + ) diff --git a/gpytorch/variational/_variational_strategy.py b/gpytorch/variational/_variational_strategy.py index faff22133..779d1fc04 100644 --- a/gpytorch/variational/_variational_strategy.py +++ b/gpytorch/variational/_variational_strategy.py @@ -1,14 +1,38 @@ #!/usr/bin/env python3 +import functools from abc import ABC, abstractproperty +from copy import deepcopy import torch from .. 
import settings from ..distributions import Delta, MultivariateNormal +from ..likelihoods import GaussianLikelihood +from ..models import ExactGP from ..module import Module from ..utils.broadcasting import _mul_broadcast_shape -from ..utils.memoize import cached, clear_cache_hook +from ..utils.memoize import add_to_cache, cached, clear_cache_hook + + +class _BaseExactGP(ExactGP): + def __init__(self, train_inputs, train_targets, likelihood, mean_module, covar_module): + super().__init__(train_inputs, train_targets, likelihood) + self.mean_module = mean_module + self.covar_module = covar_module + + def forward(self, x): + mean = self.mean_module(x) + covar = self.covar_module(x) + return MultivariateNormal(mean, covar) + + +def _add_cache_hook(tsr, pred_strat): + if tsr.grad_fn is not None: + wrapper = functools.partial(clear_cache_hook, pred_strat) + functools.update_wrapper(wrapper, clear_cache_hook) + tsr.grad_fn.register_hook(wrapper) + return tsr class _VariationalStrategy(Module, ABC): @@ -16,6 +40,8 @@ class _VariationalStrategy(Module, ABC): Abstract base class for all Variational Strategies. 
""" + has_fantasy_strategy = False + def __init__(self, model, inducing_points, variational_distribution, learn_inducing_locations=True): super().__init__() @@ -97,6 +123,158 @@ def kl_divergence(self): kl_divergence = torch.distributions.kl.kl_divergence(self.variational_distribution, self.prior_distribution) return kl_divergence + @cached(name="amortized_exact_gp") + def amortized_exact_gp(self, mean_module=None, covar_module=None): + mean_module = self.model.mean_module if mean_module is None else mean_module + covar_module = self.model.covar_module if covar_module is None else covar_module + + with torch.no_grad(): + # from here on down, we refer to the inducing points as pseudo_inputs + pseudo_target_covar, pseudo_target_mean = self.pseudo_points + pseudo_inputs = self.inducing_points.detach() + if pseudo_inputs.ndim < pseudo_target_mean.ndim: + pseudo_inputs = pseudo_inputs.expand(*pseudo_target_mean.shape[:-2], *pseudo_inputs.shape) + # TODO: add flag for conditioning into SGPR after building fantasy strategy for SGPR + new_covar_module = deepcopy(covar_module) + + # update inducing mean if necessary + pseudo_target_mean = pseudo_target_mean.squeeze() + mean_module(pseudo_inputs) + + inducing_exact_model = _BaseExactGP( + pseudo_inputs, + pseudo_target_mean, + mean_module=deepcopy(mean_module), + covar_module=new_covar_module, + likelihood=deepcopy(self.model.likelihood), + ) + + # now fantasize around this model + # as this model is new, we need to compute a posterior to construct the prediction strategy + # which uses the likelihood pseudo caches + faked_points = torch.randn( + *pseudo_target_mean.shape[:-2], + 1, + pseudo_inputs.shape[-1], + device=pseudo_inputs.device, + dtype=pseudo_inputs.dtype, + ) + inducing_exact_model.eval() + _ = inducing_exact_model(faked_points) + + # then we overwrite the likelihood to take into account the multivariate normal term + pred_strat = inducing_exact_model.prediction_strategy + pred_strat._memoize_cache = {} + with 
torch.no_grad(): + updated_lik_train_train_covar = pred_strat.train_prior_dist.lazy_covariance_matrix + pseudo_target_covar + pred_strat.lik_train_train_covar = updated_lik_train_train_covar + + # do the mean cache because the mean cache doesn't solve against lik_train_train_covar + train_mean = inducing_exact_model.mean_module(*inducing_exact_model.train_inputs) + train_labels_offset = (inducing_exact_model.prediction_strategy.train_labels - train_mean).unsqueeze(-1) + mean_cache = updated_lik_train_train_covar.inv_matmul(train_labels_offset).squeeze(-1) + mean_cache = _add_cache_hook(mean_cache, inducing_exact_model.prediction_strategy) + add_to_cache(pred_strat, "mean_cache", mean_cache) + # TODO: check to see if we need to do the covar_cache? + + inducing_exact_model.prediction_strategy = pred_strat + return inducing_exact_model + + def pseudo_points(self): + raise NotImplementedError("Each variational strategy must implement its own pseudo points method") + + def get_fantasy_model( + self, + inputs, + targets, + mean_module=None, + covar_module=None, + **kwargs, + ): + r""" + Performs the online variational conditioning (OVC) strategy of Maddox et al, '21 to return + an exact GP model that incorporates the inputs and targets alongside the variational model's inducing + points and targets. + + Currently, instead of directly updating the variational parameters (and inducing points), we instead + return an ExactGP model rather than an updated variational GP model. This is done primarily for + numerical stability. + + Unlike the ExactGP's call for get_fantasy_model, we enable options for mean_module and covar_module + that allow specification of the mean / covariance. We expect that either the mean and covariance + modules are attributes of the model itself called mean_module and covar_module respectively OR that you + pass them into this method explicitly. + + :param torch.Tensor inputs: (`b1 x ... x bk x m x d` or `f x b1 x ... 
x bk x m x d`) Locations of fantasy + observations. + :param torch.Tensor targets: (`b1 x ... x bk x m` or `f x b1 x ... x bk x m`) Labels of fantasy observations. + :param torch.nn.Module mean_module: torch module describing the mean function of the GP model. Optional if + `mean_module` is already an attribute of the variational GP. + :param torch.nn.Module covar_module: torch module describing the covariance function of the GP model. Optional + if `covar_module` is already an attribute of the variational GP. + :return: An `ExactGP` model with `k + m` training examples, where the `m` fantasy examples have been added + and all test-time caches have been updated. We assume that there are `k` inducing points in this variational + GP. Note that we return an `ExactGP` rather than a variational GP. + :rtype: ~gpytorch.models.ExactGP + + Reference: "Conditioning Sparse Variational Gaussian Processes for Online Decision-Making," + Maddox, Stanton, Wilson, NeurIPS, '21 + https://papers.nips.cc/paper/2021/hash/325eaeac5bef34937cfdc1bd73034d17-Abstract.html + """ + + # currently, we only support fantasization for CholeskyVariationalDistribution and + # whitened / unwhitened variational strategies + if not self.has_fantasy_strategy: + raise NotImplementedError( + "No fantasy model support for ", + self.__class__.__name__, + ". Only VariationalStrategy and UnwhitenedVariationalStrategy are currently supported.", + ) + if not isinstance(self.model.likelihood, GaussianLikelihood): + raise NotImplementedError( + "No fantasy model support for ", + self.model.likelihood, + ". Only GaussianLikelihoods are currently supported.", + ) + # we assume that either the user has given the model a mean_module and a covar_module + # or that it will be passed into the get_fantasy_model function. we check for these. 
+ if mean_module is None: + mean_module = getattr(self.model, "mean_module", None) + if mean_module is None: + raise ModuleNotFoundError( + "Either you must provide a mean_module as input to get_fantasy_model " + "or it must be an attribute of the model called mean_module.", + ) + if covar_module is None: + covar_module = getattr(self.model, "covar_module", None) + if covar_module is None: + # raise an error + raise ModuleNotFoundError( + "Either you must provide a covar_module as input to get_fantasy_model " + "or it must be an attribute of the model called covar_module.", + ) + + # first we construct an exact model over the inducing points with the inducing covariance + # matrix + inducing_exact_model = self.amortized_exact_gp(mean_module=mean_module, covar_module=covar_module) + + # then we update this model by adding in the inputs and pseudo targets + # finally we fantasize wrt targets + fantasy_model = inducing_exact_model.get_fantasy_model(inputs, targets, **kwargs) + fant_pred_strat = fantasy_model.prediction_strategy + + # first we update the lik_train_train_covar + # do the mean cache again because the mean cache resets the likelihood forward + train_mean = fantasy_model.mean_module(*fantasy_model.train_inputs) + train_labels_offset = (fant_pred_strat.train_labels - train_mean).unsqueeze(-1) + fantasy_lik_train_root_inv = fant_pred_strat.lik_train_train_covar.root_inv_decomposition() + mean_cache = fantasy_lik_train_root_inv.matmul(train_labels_offset).squeeze(-1) + mean_cache = _add_cache_hook(mean_cache, fant_pred_strat) + add_to_cache(fant_pred_strat, "mean_cache", mean_cache) + # TODO: should we update the covar_cache? + + fantasy_model.prediction_strategy = fant_pred_strat + return fantasy_model + def __call__(self, x, prior=False, **kwargs): # If we're in prior mode, then we're done! 
if prior: diff --git a/gpytorch/variational/unwhitened_variational_strategy.py b/gpytorch/variational/unwhitened_variational_strategy.py index f3985d473..bfe1c5d64 100644 --- a/gpytorch/variational/unwhitened_variational_strategy.py +++ b/gpytorch/variational/unwhitened_variational_strategy.py @@ -4,6 +4,8 @@ import torch +from gpytorch.variational.cholesky_variational_distribution import CholeskyVariationalDistribution + from .. import settings from ..distributions import MultivariateNormal from ..lazy import ( @@ -17,6 +19,7 @@ ) from ..utils.broadcasting import _mul_broadcast_shape from ..utils.cholesky import psd_safe_cholesky +from ..utils.errors import NotPSDError from ..utils.memoize import add_to_cache, cached from ._variational_strategy import _VariationalStrategy @@ -44,6 +47,7 @@ class UnwhitenedVariationalStrategy(_VariationalStrategy): the inducing point locations :math:`\mathbf Z` should be learned (i.e. are they parameters of the model). """ + has_fantasy_strategy = True @cached(name="cholesky_factor", ignore_args=True) def _cholesky_factor(self, induc_induc_covar): @@ -58,6 +62,58 @@ def prior_distribution(self): res = MultivariateNormal(out.mean, out.lazy_covariance_matrix.add_jitter()) return res + @property + @cached(name="pseudo_points_memo") + def pseudo_points(self): + # TODO: implement for other distributions + # retrieve the variational mean, m and covariance matrix, S. + if not isinstance(self._variational_distribution, CholeskyVariationalDistribution): + raise NotImplementedError( + "Only CholeskyVariationalDistribution has pseudo-point support currently, ", + "but your _variational_distribution is a ", + self._variational_distribution.__class__.__name__, + ) + + # retrieve the variational mean, m and covariance matrix, S. 
+ var_cov_root = TriangularLazyTensor(self._variational_distribution.chol_variational_covar) + var_cov = CholLazyTensor(var_cov_root) + var_mean = self.variational_distribution.mean # .unsqueeze(-1) + if var_mean.shape[-1] != 1: + var_mean = var_mean.unsqueeze(-1) + + # R = K - S + Kmm = self.model.covar_module(self.inducing_points) + res = Kmm - var_cov + + cov_diff = res + + # D_a = (S^{-1} - K^{-1})^{-1} = S + S R^{-1} S + # note that in the whitened case R = I - S, unwhitened R = K - S + # we compute (R R^{T})^{-1} R^T S for stability reasons as R is probably not PSD. + eval_lhs = var_cov.evaluate() + eval_rhs = cov_diff.transpose(-1, -2).matmul(eval_lhs) + inner_term = cov_diff.matmul(cov_diff.transpose(-1, -2)) + # TODO: flag the jitter here + inner_solve = inner_term.add_jitter(1e-3).inv_matmul(eval_rhs, eval_lhs.transpose(-1, -2)) + inducing_covar = var_cov + inner_solve + + # mean term: D_a S^{-1} m + # unwhitened: (S + S R^{-1} S) S^{-1} m = (I + S R^{-1}) m + rhs = cov_diff.transpose(-1, -2).matmul(var_mean) + inner_rhs_mean_solve = inner_term.add_jitter(1e-3).inv_matmul(rhs) + pseudo_target_mean = var_mean + var_cov.matmul(inner_rhs_mean_solve) + + # ensure inducing covar is psd + try: + pseudo_target_covar = CholLazyTensor(inducing_covar.add_jitter(1e-3).cholesky()).evaluate() + except NotPSDError: + from gpytorch.lazy import DiagLazyTensor + + evals, evecs = inducing_covar.symeig(eigenvectors=True) + pseudo_target_covar = evecs.matmul(DiagLazyTensor(evals + 1e-4)).matmul(evecs.transpose(-1, -2)).evaluate() + + return pseudo_target_covar, pseudo_target_mean + def forward(self, x, inducing_points, inducing_values, variational_inducing_covar=None): # If our points equal the inducing points, we're done if torch.equal(x, inducing_points): diff --git a/gpytorch/variational/variational_strategy.py b/gpytorch/variational/variational_strategy.py index 5addd5b38..c40301bf1 100644 --- a/gpytorch/variational/variational_strategy.py +++ 
b/gpytorch/variational/variational_strategy.py @@ -4,14 +4,26 @@ import torch +from gpytorch.variational._variational_strategy import _VariationalStrategy +from gpytorch.variational.cholesky_variational_distribution import CholeskyVariationalDistribution + from ..distributions import MultivariateNormal -from ..lazy import DiagLazyTensor, MatmulLazyTensor, RootLazyTensor, SumLazyTensor, TriangularLazyTensor, delazify +from ..lazy import ( + CholLazyTensor, + DiagLazyTensor, + MatmulLazyTensor, + RootLazyTensor, + SumLazyTensor, + TriangularLazyTensor, + delazify, +) from ..settings import _linalg_dtype_cholesky, trace_mode from ..utils.cholesky import psd_safe_cholesky -from ..utils.errors import CachingError +from ..utils.errors import CachingError, NotPSDError from ..utils.memoize import cached, clear_cache_hook, pop_from_cache_ignore_args from ..utils.warnings import OldVersionWarning -from ._variational_strategy import _VariationalStrategy + +# from ._variational_strategy import _VariationalStrategy def _ensure_updated_strategy_flag_set( @@ -67,6 +79,8 @@ def __init__(self, model, inducing_points, variational_distribution, learn_induc self.register_buffer("updated_strategy", torch.tensor(True)) self._register_load_state_dict_pre_hook(_ensure_updated_strategy_flag_set) + self.has_fantasy_strategy = True + @cached(name="cholesky_factor", ignore_args=True) def _cholesky_factor(self, induc_induc_covar): L = psd_safe_cholesky(delazify(induc_induc_covar).type(_linalg_dtype_cholesky.value())) @@ -84,6 +98,65 @@ def prior_distribution(self): res = MultivariateNormal(zeros, DiagLazyTensor(ones)) return res + @property + @cached(name="pseudo_points_memo") + def pseudo_points(self): + # TODO: have var_mean, var_cov come from a method of _variational_distribution + # while having Kmm_root be a root decomposition to enable CIQVariationalDistribution support. + + # retrieve the variational mean, m and covariance matrix, S. 
+ if not isinstance(self._variational_distribution, CholeskyVariationalDistribution): + raise NotImplementedError( + "Only CholeskyVariationalDistribution has pseudo-point support currently, ", + "but your _variational_distribution is a ", + self._variational_distribution.__name__, + ) + + var_cov_root = TriangularLazyTensor(self._variational_distribution.chol_variational_covar) + var_cov = CholLazyTensor(var_cov_root) + var_mean = self.variational_distribution.mean + if var_mean.shape[-1] != 1: + var_mean = var_mean.unsqueeze(-1) + + # compute R = I - S + cov_diff = var_cov.add_jitter(-1.0) + cov_diff = -1.0 * cov_diff + + # K^{1/2} + Kmm = self.model.covar_module(self.inducing_points) + Kmm_root = Kmm.cholesky() + + # D_a = (S^{-1} - K^{-1})^{-1} = S + S R^{-1} S + # note that in the whitened case R = I - S, unwhitened R = K - S + # we compute (R R^{T})^{-1} R^T S for stability reasons as R is probably not PSD. + eval_var_cov = var_cov.evaluate() + eval_rhs = cov_diff.transpose(-1, -2).matmul(eval_var_cov) + inner_term = cov_diff.matmul(cov_diff.transpose(-1, -2)) + # TODO: flag the jitter here + inner_solve = inner_term.add_jitter(1e-3).inv_matmul(eval_rhs, eval_var_cov.transpose(-1, -2)) + inducing_covar = var_cov + inner_solve + + inducing_covar = Kmm_root.matmul(inducing_covar).matmul(Kmm_root.transpose(-1, -2)) + + # mean term: D_a S^{-1} m + # whitened: (S + S R^{-1} S) S^{-1} m = (I + S R^{-1}) m = R^{-1} m, since R + S = I + rhs = cov_diff.transpose(-1, -2).matmul(var_mean) + # TODO: this jitter too + inner_rhs_mean_solve = inner_term.add_jitter(1e-3).inv_matmul(rhs) + pseudo_target_mean = Kmm_root.matmul(inner_rhs_mean_solve) + + # ensure inducing covar is psd + # TODO: make this be an explicit root decomposition + try: + pseudo_target_covar = CholLazyTensor(inducing_covar.add_jitter(1e-3).cholesky()).evaluate() + except NotPSDError: + from gpytorch.lazy import DiagLazyTensor + + evals, evecs = inducing_covar.symeig(eigenvectors=True) + pseudo_target_covar = 
evecs.matmul(DiagLazyTensor(evals + 1e-4)).matmul(evecs.transpose(-1, -2)).evaluate() + + return pseudo_target_covar, pseudo_target_mean + def forward(self, x, inducing_points, inducing_values, variational_inducing_covar=None, **kwargs): # Compute full prior distribution full_inputs = torch.cat([inducing_points, x], dim=-2) diff --git a/test/examples/test_svgp_gp_regression.py b/test/examples/test_svgp_gp_regression.py index 5624de89c..94e39df5e 100644 --- a/test/examples/test_svgp_gp_regression.py +++ b/test/examples/test_svgp_gp_regression.py @@ -114,6 +114,16 @@ def test_regression_error( # Make sure CG was called (or not), and no warnings were thrown self.assertFalse(cg_mock.called) + if distribution_cls is gpytorch.variational.CholeskyVariationalDistribution: + # finally test fantasization + # we only will check that tossing the entire training set into the model will reduce the mae + model.likelihood = likelihood + fant_model = model.get_fantasy_model(train_x, train_y) + fant_preds = fant_model.likelihood(fant_model(train_x)).mean.squeeze() + updated_abs_error = torch.mean(torch.abs(train_y - fant_preds) / 2) + # TODO: figure out why this error is worse than before + self.assertLess(updated_abs_error.item(), 0.15) + def test_predictive_ll_regression_error(self): return self.test_regression_error( mll_cls=gpytorch.mlls.PredictiveLogLikelihood, diff --git a/test/examples/test_unwhitened_svgp_regression.py b/test/examples/test_unwhitened_svgp_regression.py index 8120dadb4..c1bc686bd 100644 --- a/test/examples/test_unwhitened_svgp_regression.py +++ b/test/examples/test_unwhitened_svgp_regression.py @@ -81,6 +81,16 @@ def test_regression_error( mean_abs_error = torch.mean(torch.abs(train_y - test_preds) / 2) self.assertLess(mean_abs_error.item(), 0.014) + if distribution_cls is gpytorch.variational.CholeskyVariationalDistribution: + # finally test fantasization + # we only will check that tossing the entire training set into the model will reduce the mae + 
model.likelihood = likelihood + fant_model = model.get_fantasy_model(train_x, train_y) + fant_preds = fant_model.likelihood(fant_model(train_x)).mean.squeeze() + updated_abs_error = torch.mean(torch.abs(train_y - fant_preds) / 2) + # TODO: figure out why this error is worse than before + self.assertLess(updated_abs_error.item(), 0.15) + if __name__ == "__main__": unittest.main() diff --git a/test/variational/test_batch_decoupled_variational_strategy.py b/test/variational/test_batch_decoupled_variational_strategy.py index a592c8060..99acf289b 100644 --- a/test/variational/test_batch_decoupled_variational_strategy.py +++ b/test/variational/test_batch_decoupled_variational_strategy.py @@ -57,6 +57,11 @@ def test_eval_iteration(self, *args, **kwargs): self.assertEqual(cholesky_mock.call_count, 1) # One to compute cache, that's it! self.assertFalse(ciq_mock.called) + def test_fantasy_call(self, *args, **kwargs): + # with self.assertRaises(AttributeError): + # super().test_fantasy_call(*args, **kwargs) + pass + class TestBatchDecoupledPredictiveGP(TestBatchDecoupledVariationalGP): @property diff --git a/test/variational/test_ciq_variational_strategy.py b/test/variational/test_ciq_variational_strategy.py index 3bfc7cb3b..c72b17c77 100644 --- a/test/variational/test_ciq_variational_strategy.py +++ b/test/variational/test_ciq_variational_strategy.py @@ -37,6 +37,10 @@ def test_eval_iteration(self, *args, **kwargs): self.assertFalse(cholesky_mock.called) self.assertEqual(ciq_mock.call_count, 2) # One for each evaluation call + def test_fantasy_call(self, *args, **kwargs): + with self.assertRaises(AttributeError): + super().test_fantasy_call(*args, **kwargs) + class TestMeanFieldCiqVariationalGP(TestCiqVariationalGP): @property diff --git a/test/variational/test_grid_interpolation_variational_strategy.py b/test/variational/test_grid_interpolation_variational_strategy.py index 385b3ab8d..5747a7ca0 100644 --- a/test/variational/test_grid_interpolation_variational_strategy.py 
+++ b/test/variational/test_grid_interpolation_variational_strategy.py @@ -75,6 +75,10 @@ def test_eval_iteration(self, *args, **kwargs): self.assertFalse(cholesky_mock.called) self.assertFalse(ciq_mock.called) + def test_fantasy_call(self, *args, **kwargs): + with self.assertRaises(AttributeError): + super().test_fantasy_call(*args, **kwargs) + class TestGridPredictiveGP(TestGridVariationalGP): @property diff --git a/test/variational/test_independent_multitask_variational_strategy.py b/test/variational/test_independent_multitask_variational_strategy.py index ab88f52ed..e5f9aa09d 100644 --- a/test/variational/test_independent_multitask_variational_strategy.py +++ b/test/variational/test_independent_multitask_variational_strategy.py @@ -60,6 +60,10 @@ def test_eval_iteration(self, *args, expected_batch_shape=None, **kwargs): expected_batch_shape = expected_batch_shape[:-1] super().test_eval_iteration(*args, expected_batch_shape=expected_batch_shape, **kwargs) + def test_fantasy_call(self, *args, **kwargs): + with self.assertRaises(AttributeError): + super().test_fantasy_call(*args, **kwargs) + class TestMultitaskPredictiveGP(TestMultitaskVariationalGP): @property diff --git a/test/variational/test_lmc_variational_strategy.py b/test/variational/test_lmc_variational_strategy.py index 2e5255200..76eea8175 100644 --- a/test/variational/test_lmc_variational_strategy.py +++ b/test/variational/test_lmc_variational_strategy.py @@ -68,6 +68,10 @@ def test_eval_iteration(self, *args, expected_batch_shape=None, **kwargs): self.assertFalse(cg_mock.called) self.assertFalse(ciq_mock.called) + def test_fantasy_call(self, *args, **kwargs): + with self.assertRaises(AttributeError): + super().test_fantasy_call(*args, **kwargs) + class TestLMCPredictiveGP(TestLMCVariationalGP): @property diff --git a/test/variational/test_orthogonally_decoupled_variational_strategy.py b/test/variational/test_orthogonally_decoupled_variational_strategy.py index 873204eb1..c1b2847f7 100644 --- 
a/test/variational/test_orthogonally_decoupled_variational_strategy.py +++ b/test/variational/test_orthogonally_decoupled_variational_strategy.py @@ -57,6 +57,10 @@ def test_eval_iteration(self, *args, **kwargs): self.assertFalse(ciq_mock.called) self.assertEqual(cholesky_mock.call_count, 1) # One to compute cache, that's it! + def test_fantasy_call(self, *args, **kwargs): + with self.assertRaises(AttributeError): + super().test_fantasy_call(*args, **kwargs) + class TestOrthogonallyDecoupledPredictiveGP(TestOrthogonallyDecoupledVariationalGP): @property diff --git a/test/variational/test_unwhitened_variational_strategy.py b/test/variational/test_unwhitened_variational_strategy.py index 362d8f662..f7047c4cf 100644 --- a/test/variational/test_unwhitened_variational_strategy.py +++ b/test/variational/test_unwhitened_variational_strategy.py @@ -40,6 +40,14 @@ def test_eval_iteration(self, *args, **kwargs): self.assertFalse(ciq_mock.called) self.assertEqual(cholesky_mock.call_count, 1) # One to compute cache, that's it! + def test_fantasy_call(self, *args, **kwargs): + # we only want to check CholeskyVariationalDistribution + if self.distribution_cls is gpytorch.variational.CholeskyVariationalDistribution: + return super().test_fantasy_call(*args, **kwargs) + + with self.assertRaises(AttributeError): + super().test_fantasy_call(*args, **kwargs) + class TestUnwhitenedPredictiveGP(TestUnwhitenedVariationalGP): @property diff --git a/test/variational/test_variational_strategy.py b/test/variational/test_variational_strategy.py index 4b879af4f..569d92ed3 100644 --- a/test/variational/test_variational_strategy.py +++ b/test/variational/test_variational_strategy.py @@ -37,6 +37,14 @@ def test_eval_iteration(self, *args, **kwargs): self.assertEqual(cholesky_mock.call_count, 1) # One to compute cache, that's it! 
self.assertFalse(ciq_mock.called) + def test_fantasy_call(self, *args, **kwargs): + # we only want to check CholeskyVariationalDistribution + if self.distribution_cls is gpytorch.variational.CholeskyVariationalDistribution: + return super().test_fantasy_call(*args, **kwargs) + + with self.assertRaises(AttributeError): + super().test_fantasy_call(*args, **kwargs) + class TestPredictiveGP(TestVariationalGP): @property