
[Bug] Runtime error for indices not on the same device when running VNNGP example #2266


Closed
yw5aj opened this issue Feb 3, 2023 · 2 comments · Fixed by yw5aj/gpytorch#1 or #2267

yw5aj (Contributor) commented Feb 3, 2023

🐛 Bug

When running the VNNGP example, the call `output = model(x=None)` raises: `RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)`.
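For context, the underlying PyTorch rule is that advanced indexing requires the index tensor to live on the CPU or on the same device as the indexed tensor. In the traceback below, `kl_indices` is moved to CUDA (line 166 of `nearest_neighbor_variational_strategy.py`) while `self.nn_xinduce_idx` stays on the CPU, and the `.to(inducing_points.device)` on line 263 only runs after the indexing has already failed. A minimal sketch of the rule (not taken from the notebook), assuming a CUDA device is available:

```python
import torch

cpu_tensor = torch.randn(10, 3)                        # indexed tensor on the CPU
cuda_indices = torch.tensor([0, 2, 4], device="cuda")  # index tensor on the GPU

# Raises: RuntimeError: indices should be either on cpu or on the same
# device as the indexed tensor (cpu)
cpu_tensor[cuda_indices]
```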

To reproduce

**Code snippet to reproduce**
Simply run `04_Variational_and_Approximate_GPs/VNNGP.ipynb`.
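For reference, the failing cell boils down to this minibatch loop (reconstructed from the traceback below; `model`, `optimizer`, and `minibatch_iter` are set up in earlier cells of the notebook):

```python
for i in minibatch_iter:
    optimizer.zero_grad()
    output = model(x=None)  # <- raises the RuntimeError on a GPU setup
    # Obtain the indices for mini-batch data
    current_training_indices = model.variational_strategy.current_training_indices
```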

**Stack trace/error message**

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 20
     18 for i in minibatch_iter:
     19     optimizer.zero_grad()
---> 20     output = model(x=None)
     21     # Obtain the indices for mini-batch data
     22     current_training_indices = model.variational_strategy.current_training_indices

Cell In[4], line 34, in GPModel.__call__(self, x, prior, **kwargs)
     32     if x.dim() == 1:
     33         x = x.unsqueeze(-1)
---> 34 return self.variational_strategy(x=x, prior=False, **kwargs)

File ~\AppData\Local\mambaforge\envs\torch\lib\site-packages\gpytorch\variational\nearest_neighbor_variational_strategy.py:129, in NNVariationalStrategy.__call__(self, x, prior, **kwargs)
    127 if self.training:
    128     self._clear_cache()
--> 129     return self.forward(x, self.inducing_points, None, None)
    130 else:
    131     # Ensure inducing_points and x are the same size
    132     inducing_points = self.inducing_points

File ~\AppData\Local\mambaforge\envs\torch\lib\site-packages\gpytorch\variational\nearest_neighbor_variational_strategy.py:168, in NNVariationalStrategy.forward(self, x, inducing_points, inducing_values, variational_inducing_covar, **kwargs)
    165     if torch.cuda.is_available():
    166         kl_indices = kl_indices.cuda()
--> 168 kl = self._kl_divergence(kl_indices)
    169 add_to_cache(self, "kl_divergence_memo", kl)
    171 return MultivariateNormal(predictive_mean, DiagLinearOperator(predictive_var))

File ~\AppData\Local\mambaforge\envs\torch\lib\site-packages\gpytorch\variational\nearest_neighbor_variational_strategy.py:325, in NNVariationalStrategy._kl_divergence(self, kl_indices, compute_full, batch_size)
    323         kl = self._firstk_kl_helper() * self.M / self.k
    324     else:
--> 325         kl = self._stochastic_kl_helper(kl_indices) * self.M / len(kl_indices)
    326 return kl

File ~\AppData\Local\mambaforge\envs\torch\lib\site-packages\gpytorch\variational\nearest_neighbor_variational_strategy.py:263, in NNVariationalStrategy._stochastic_kl_helper(self, kl_indices)
    261 # Select a mini-batch of inducing points according to kl_indices, and their k-nearest neighbors
    262 inducing_points = self.inducing_points[..., kl_indices, :]
--> 263 nearest_neighbor_indices = self.nn_xinduce_idx[..., kl_indices - self.k, :].to(inducing_points.device)
    264 expanded_inducing_points_all = self.inducing_points.unsqueeze(-2).expand(
    265     *self._inducing_batch_shape, self.M, self.k, self.D
    266 )
    267 expanded_nearest_neighbor_indices = nearest_neighbor_indices.unsqueeze(-1).expand(
    268     *self._inducing_batch_shape, kl_bs, self.k, self.D
    269 )

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Expected Behavior

No error

System information

Please complete the following information:

  • GPyTorch 1.9.1
  • PyTorch 1.13.1
  • Windows 10 with GPU set up

Additional context

The `.mat` data file used by the notebook needs to be downloaded manually via a web browser.

yw5aj added the bug label Feb 3, 2023
yw5aj (Contributor, Author) commented Feb 3, 2023

After changing the line

```python
self.nn_xinduce_idx = self.nn_util.build_sequential_nn_idx(inducing_points_fl)
```

to

```python
self.nn_xinduce_idx = self.nn_util.build_sequential_nn_idx(inducing_points_fl).to(
    self.inducing_points.device
)
```

the code works on my end. Advice on whether that is the best place to make the edit would be welcome; if so, I'd be more than happy to submit a PR.

Balandat (Collaborator) commented Feb 3, 2023

A PR would be great. It probably makes sense to do this directly in the `build_sequential_nn_idx` function, though, by adding a `.to(device=x.device)` before returning:

```python
nn_idx = nn_idx.view(*self.batch_shape, N - self.k, self.k)
```
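Concretely, the end of `build_sequential_nn_idx` would then look something like the following sketch (not the merged fix; it assumes the function's input tensor argument is named `x`, as in the suggestion above):

```python
# Sketch of the suggested change: after building the neighbor-index
# tensor, move it to the device of the input tensor `x` before returning,
# so callers never index across devices.
nn_idx = nn_idx.view(*self.batch_shape, N - self.k, self.k)
return nn_idx.to(device=x.device)
```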
