Support for temporary/fantasy training data #177
Comments
Is there a way to do this efficiently without resetting the mean/covar caches? It seems like re-computing the full kernel matrix would be quite expensive if all we do is modify the data by adding a small number of fantasies.
So dealing with the mean caches is basically a case of dealing with linear systems involving bordered matrices, since we want to update the existing solve after appending a block of rows and columns. The covar cache is only used/computed with LOVE, which would take some thought on how to update. This is actually something that @andrewgordonwilson and I are actively researching: how to update LOVE in the setting where you add individual data points. I have some ideas about this, but they are a bit complicated for a github issue.
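To make the bordered-matrix idea concrete, here is a minimal numpy sketch (an illustration, not gpytorch code; the function name `update_bordered_solve` is made up). Given a cached solve of the original system, the bordered solve only needs m extra solves against the original matrix plus an m-by-m Schur-complement solve:

```python
import numpy as np

def update_bordered_solve(A, B, D, b, d, cache):
    """Update the solve of A x = b (with cache = A^{-1} b) after the
    system is bordered to [[A, B], [B.T, D]] @ [x1; x2] = [b; d].

    Uses the Schur complement S = D - B.T A^{-1} B, so the only work
    against the original n x n matrix is the m extra solves A^{-1} B.
    """
    AinvB = np.linalg.solve(A, B)             # n x m: m solves against A
    S = D - B.T @ AinvB                       # m x m Schur complement
    x2 = np.linalg.solve(S, d - B.T @ cache)  # small m x m solve
    x1 = cache - AinvB @ x2                   # correct the cached solution
    return np.concatenate([x1, x2])
```

The result matches a from-scratch solve of the bordered system, but reuses the cached `A^{-1} b` so the cost scales with the number of new points rather than with a full re-factorization.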
For what it's worth, doing the solve from scratch using CG has the same asymptotic complexity for exact GPs.
Being smart about warm-starting should probably be very helpful here as well, right? E.g. the initial guess could take the solution from the previous solve for the existing points, and something ad hoc, like the mean across the previous solution, for the new points.
Good point, initializing with the existing mean cache with a few extra zeros concatenated for the fantasy examples is a smart idea. Assuming we don't expect the training data to ever change radically, what do you think about making it the default behavior: whenever a mean cache already exists, expand it to match the training data size and always use it as the initialization?
That sounds good to me. Would you want to use zeros or the mean across the mean cache? |
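A minimal numpy sketch of the warm-start idea being discussed (illustrative only; `conjugate_gradient` and `warm_start_from_cache` are hypothetical names, not gpytorch APIs). The cached solution for the old points is padded with zeros for the fantasy points and used as the CG initial guess:

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Plain conjugate gradients for a symmetric positive-definite A,
    with an optional warm-start initial guess x0."""
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float).copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def warm_start_from_cache(mean_cache, n_fantasy):
    """Expand the cached solve with zeros for the new fantasy points."""
    return np.concatenate([mean_cache, np.zeros(n_fantasy)])
```

Since the old entries of the warm start already nearly satisfy the enlarged system, CG typically needs far fewer iterations than starting from all zeros.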
Alright, I gave this some more thought, and for temporary fantasy points and exact GPs specifically there is an O(kn + m^2) time approximate solution for updating the mean cache and covar cache IF we are already using LOVE, where m is the number of fantasy points and k is the rank of the decomposition used for LOVE. I'll implement this idea as the default behavior when we're already using LOVE. There will be some fairly involved internal changes with this, so I can either get started on this now or continue with the original plan of helping finish up priors first. cc @andrewgordonwilson, since the trick I'm talking about here is highly relevant to our discussion about updating the precomputed cache for LOVE.
Let's try to get the priors in first, so we can avoid working on diverging branches as much as possible.
This is related to the need for "fantasy" observations for BayesOpt, where, in addition to training data, we want to condition a model on extra temporary training data with sampled function values as labels.
Right now, this is technically supported via the `set_train_data` method, which can make arbitrary changes to the training data with `strict=False`, and would let us just directly append the fantasy data to the training data (`gpytorch/gpytorch/models/exact_gp.py`, line 50 in c16ec46).
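To show the bookkeeping this workaround pushes onto the user, here is a minimal pure-Python sketch (the `ToyExactGP` class is a stand-in invented for illustration, not gpytorch's actual model): the user appends fantasy data and must remember how many points to slice off later.

```python
import numpy as np

class ToyExactGP:
    """Hypothetical stand-in for a model whose training data can be
    swapped out wholesale, mimicking a set_train_data-style method."""

    def __init__(self, train_x, train_y):
        self.train_x = train_x
        self.train_y = train_y

    def set_train_data(self, inputs, targets):
        # With strict checking disabled, arbitrary size changes are allowed.
        self.train_x = inputs
        self.train_y = targets

# The user must track how many fantasy points were appended:
model = ToyExactGP(np.arange(5.0), np.arange(5.0) ** 2)
fantasy_x, fantasy_y = np.array([9.0, 10.0]), np.array([81.0, 100.0])
n_fantasy = len(fantasy_x)

model.set_train_data(np.concatenate([model.train_x, fantasy_x]),
                     np.concatenate([model.train_y, fantasy_y]))
# ... condition on the fantasies, evaluate an acquisition function, etc. ...

# ... and then slice them off again afterwards:
model.set_train_data(model.train_x[:-n_fantasy], model.train_y[:-n_fantasy])
```

Forgetting the slice-off step silently leaves the fantasies in the training set, which is exactly the failure mode a dedicated interface would avoid.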
If we'd prefer a better interface that doesn't involve the user tracking how many fantasy points they've added so they can remove them later, we could add a similar method (`set_fantasy_data`?) that modifies `self.fantasy_inputs` and `self.fantasy_targets` attributes (default `None`), similar to the `train_inputs` and `train_targets` ones modified there.

If we did this, then we'd basically need to update `__call__` in three ways:

1. First, concatenate in `fantasy_inputs` (`gpytorch/gpytorch/models/exact_gp.py`, lines 92 to 95 in c16ec46).
2. Then concatenate `fantasy_targets` on to `fantasy_labels` (`gpytorch/gpytorch/models/exact_gp.py`, line 104 in c16ec46).
3. Finally, update the `n_train` argument to account for the fantasy training data (`gpytorch/gpytorch/models/exact_gp.py`, line 110 in c16ec46).
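A rough sketch of what such an interface could look like (beyond `set_fantasy_data`, `fantasy_inputs`, and `fantasy_targets` from the proposal above, all names here are made up for illustration; this is not the actual gpytorch implementation):

```python
import numpy as np

class FantasyExactGP:
    """Sketch of the proposed interface: fantasy data lives in separate
    attributes (default None) and is concatenated on internally, so the
    user never tracks how many points to remove."""

    def __init__(self, train_inputs, train_targets):
        self.train_inputs = train_inputs
        self.train_targets = train_targets
        self.fantasy_inputs = None
        self.fantasy_targets = None

    def set_fantasy_data(self, inputs, targets):
        self.fantasy_inputs = inputs
        self.fantasy_targets = targets

    def clear_fantasy_data(self):
        self.fantasy_inputs = None
        self.fantasy_targets = None

    def _full_data(self):
        # Mirrors the three __call__ changes: concatenate inputs,
        # concatenate targets, and report the enlarged n_train.
        if self.fantasy_inputs is None:
            return self.train_inputs, self.train_targets, len(self.train_inputs)
        inputs = np.concatenate([self.train_inputs, self.fantasy_inputs])
        targets = np.concatenate([self.train_targets, self.fantasy_targets])
        return inputs, targets, len(inputs)
```

Discarding the fantasies is then a single `clear_fantasy_data()` call, and the real training data is never mutated.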